Pest Management Database
a pest control report parser and visualizer
Overview
My junior design project focused on understanding problems from the perspective of stakeholders, mapping the system thoroughly to identify and weigh intervention points, and delivering a viable solution on a constrained timeline. The project took a multifaceted approach to rodent presence and its impacts on Harvard’s campus. Integrating the perspectives of students, staff, and administrators, our team identified seven viable intervention points. We addressed these points through a three-part solution: a preventative approach in a redesigned trash can lid, a reactive approach in an active ultrasonic deterrent, and an analytical approach in an automated pest control report parser and database. During the development phase, my work was devoted to the automated report parser. The completed designs were presented to Harvard’s Environmental Health and Safety Department.
Design
The report parser was implemented in Python, with a SQLite database for storage. The PDFMiner module was used to extract each page as a hierarchy of layout objects (page, text box, text line, and individual character), each carrying its coordinates on the page. Because pest reports follow a standardized format, the parser first scanned each document to determine which subsections from a known bank were present. The contents of those subsections were then located using line order and bounding-box position cues. Finally, the content was organized into objects holding information such as the report location, comments, and pest control products applied.
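The subsection scan and the line-order/bounding-box cues can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual parser. In pdfminer.six, `extract_pages` yields page layouts whose text elements expose `get_text()` and the coordinates `x0`, `y0`, `x1`, `y1`; to keep the sketch runnable without a PDF, a hypothetical `TextLine` stand-in mirrors those attributes. The header bank (`KNOWN_HEADERS`), the `left_margin` cue, and the sample page are all invented for illustration, since the real report template is not described here.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """Hypothetical stand-in for a PDFMiner text line: text plus bounding box.

    PDF coordinates put the origin at the bottom-left, so a larger y1
    means the line sits higher on the page.
    """
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


# Hypothetical header bank; the real report template's headers are not given.
KNOWN_HEADERS = ("Service Location", "Comments", "Products Applied")


def reading_order(lines):
    """Sort lines top-to-bottom, then left-to-right."""
    return sorted(lines, key=lambda ln: (-ln.y1, ln.x0))


def find_sections(lines, headers=KNOWN_HEADERS):
    """Return the ordered lines and {header: index} for each header present."""
    ordered = reading_order(lines)
    found = {}
    for i, ln in enumerate(ordered):
        label = ln.text.strip().rstrip(":")
        if label in headers and label not in found:
            found[label] = i
    return ordered, found


def section_contents(lines, headers=KNOWN_HEADERS, left_margin=None):
    """Collect the lines between each present header and the next one.

    If left_margin is given, lines starting left of it are skipped: a simple
    bounding-box cue for ignoring margin text such as page labels.
    """
    ordered, found = find_sections(lines, headers)
    starts = sorted(found.items(), key=lambda kv: kv[1])
    contents = {}
    for n, (header, start) in enumerate(starts):
        end = starts[n + 1][1] if n + 1 < len(starts) else len(ordered)
        contents[header] = [
            ln.text.strip()
            for ln in ordered[start + 1:end]
            if left_margin is None or ln.x0 >= left_margin
        ]
    return contents


# Hypothetical page, deliberately listed out of reading order.
page = [
    TextLine("Comments:", 50, 600, 120, 612),
    TextLine("Droppings found near loading dock", 60, 585, 300, 597),
    TextLine("Service Location", 50, 700, 160, 712),
    TextLine("Building A, basement", 60, 685, 200, 697),
    TextLine("Products Applied", 50, 500, 170, 512),
    TextLine("Bait station refilled", 60, 485, 220, 497),
    TextLine("p. 1", 10, 480, 30, 490),
]

print(section_contents(page, left_margin=40))
# {'Service Location': ['Building A, basement'],
#  'Comments': ['Droppings found near loading dock'],
#  'Products Applied': ['Bait station refilled']}
```

Sorting by the top edge before slicing between headers is what makes line order usable as a cue even when the extractor returns boxes in an arbitrary order; the margin check shows how a coordinate test can filter out text that falls inside a section's span but not inside its column.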
The Python objects generated by the parser were used to populate a master table containing basic identifying information about each report, along with five supporting tables that organized the more complex report subsections in detail. Once populated, the database file was passed to other scripts, developed by my teammates, that let users search for forms and generate figures from their data.
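A master table linked to supporting tables can be sketched with Python's built-in `sqlite3` module. This is a hypothetical schema, not the project's: the `Report` fields and the table and column names are assumptions, and only two supporting tables (`comments`, `products_applied`) are shown because the names of the original five are not given. Each supporting row carries the report's key, so a single report can have any number of comments or product entries.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical parsed-report object; field names are illustrative."""
    report_id: str
    service_date: str
    location: str
    comments: list = field(default_factory=list)
    products: list = field(default_factory=list)  # (product, amount) pairs


# Hypothetical schema: one master table plus two of the supporting tables.
SCHEMA = """
CREATE TABLE reports (
    report_id    TEXT PRIMARY KEY,
    service_date TEXT,
    location     TEXT
);
CREATE TABLE comments (
    report_id TEXT REFERENCES reports(report_id),
    comment   TEXT
);
CREATE TABLE products_applied (
    report_id TEXT REFERENCES reports(report_id),
    product   TEXT,
    amount    TEXT
);
"""


def create_db(path=":memory:"):
    """Open a database and create the tables; foreign keys are opt-in in SQLite."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def insert_report(conn, report):
    """Write one report to the master table and its supporting tables."""
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute(
            "INSERT INTO reports VALUES (?, ?, ?)",
            (report.report_id, report.service_date, report.location),
        )
        conn.executemany(
            "INSERT INTO comments VALUES (?, ?)",
            [(report.report_id, c) for c in report.comments],
        )
        conn.executemany(
            "INSERT INTO products_applied VALUES (?, ?, ?)",
            [(report.report_id, p, amt) for p, amt in report.products],
        )


# Hypothetical sample report.
conn = create_db()
insert_report(conn, Report(
    "R-001", "2019-03-04", "Building A, basement",
    comments=["Droppings found near loading dock"],
    products=[("Bait block", "2 units")],
))

# A search script could then join the master table to a supporting table.
rows = conn.execute(
    """SELECT r.location, p.product
       FROM reports r JOIN products_applied p ON p.report_id = r.report_id"""
).fetchall()
print(rows)  # [('Building A, basement', 'Bait block')]
```

Wrapping each report's inserts in one transaction keeps the master row and its supporting rows consistent: if any subsection fails to insert, none of that report is written. Passing a file path to `create_db` instead of `":memory:"` would produce a database file that downstream search and plotting scripts could open.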