Last Updated: 08/02/2024
Plan Bay Area (PBA) is the region’s visionary long-range plan, including 35 strategies spread across the elements of transportation, housing, the economy and the environment that collectively seek to make the Bay Area more equitable for all residents and more resilient in the face of unexpected challenges. Over the next two and a half years, the plan will be updated in consultation with a wide range of partners, including federal, state, regional, county, local and Tribal governments, as well as community organizations, other stakeholders and the public. This project aims to facilitate a more informed and responsive approach to public feedback by developing and implementing a framework using embeddings and Large Language Model (LLM) prompt engineering techniques.
The pipeline has five main components:
- Component 1. Data Ingestion: Collecting and formatting public engagement comment data for analysis.
- Component 2. Topic/Subtopic Analysis: Utilizing LLMs in the classification of public comments into main and subtopics.
- Component 3. Theme Analysis using Embeddings: Extracting embeddings from comments and clustering them into groups with similar meaning, accompanied by automatically generated topic name recommendations.
- Component 4. Final Topic/Subtopic/Theme Assignment: A manual process of reviewing and finalizing the topic categories, given the suggestions from the earlier methods.
- Component 5. Present Results: Compiling and presenting the analysis methodologies as user-friendly tools for future public engagement initiatives.
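To make Component 3 more concrete, here is a minimal sketch of grouping comments by embedding similarity. It is not the repository's implementation: the function names, the greedy grouping strategy, the 0.8 threshold, and the three-dimensional toy vectors are all assumptions chosen for illustration. Real embeddings would come from an embedding model and have far more dimensions, and the actual clustering method may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def group_by_similarity(embeddings, threshold=0.8):
    """Greedy grouping (hypothetical stand-in for a real clustering step).

    Each vector joins the first group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    Returns a list of groups, each a list of indices into `embeddings`.
    """
    groups = []
    for i, vec in enumerate(embeddings):
        for group in groups:
            if cosine_similarity(vec, embeddings[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

# Toy data: made-up comments and hand-written vectors, not model output.
comments = [
    "Buses should run more often.",
    "Please add more train service.",
    "Rents are too high.",
    "We need more affordable homes.",
]
toy_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
]

groups = group_by_similarity(toy_embeddings)
for group in groups:
    print([comments[i] for i in group])
```

With these toy vectors, the two transit comments land in one group and the two housing comments in another. A reviewer in Component 4 would then accept, merge, or rename such groups by hand.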
- .vscode: Configuration settings for Visual Studio Code.
- Analysis Modules:
- Data Preparation (Component 1): Processes leveraging OpenAI to generate synthetic comments based on NextGen comments.
- Theme Analysis (Component 3): Processes leveraging embeddings in the grouping and naming of public comments.
- OpenAI Topic Tagging (Component 2): Processes leveraging OpenAI LLMs to automatically tag public comments.
- Prompt Setup (Components 2-4): Materials to assist in creating prompts for new uses of these tools.
- Demonstrations: Quick coding examples.
- configs (Components 2-4): Process configuration files (e.g. lists of tags).
- utilities (All Components): Utility functions supporting the processes.
- .env-example: Example definition of the OpenAI API key (filename must be changed to .env in practice).
- PBA50 Comment Processing Pipeline.ipynb: Notebook containing example functionality for all processes included in this repo.
- environment.yml: Conda environment configuration.
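As a rough illustration of how a tag list from configs might feed the topic tagging step, the sketch below builds a classification prompt and validates a model reply against the allowed tags. Everything here is hypothetical: the topic list is borrowed from the four plan elements named above, and `build_tagging_prompt`, `parse_tags`, and the prompt wording are not taken from the repository. The call to the LLM itself is omitted, so `example_reply` is a hand-written stand-in for a model response.

```python
# Hypothetical tag list; the real lists live in the configs directory.
ALLOWED_TOPICS = ["Transportation", "Housing", "Economy", "Environment"]

def build_tagging_prompt(comment, topics):
    """Assemble a plain-text prompt asking for a comma-separated tag list."""
    topic_list = "\n".join(f"- {t}" for t in topics)
    return (
        "Classify the public comment below into one or more of these topics.\n"
        f"{topic_list}\n"
        "Reply with a comma-separated list of topic names only.\n\n"
        f"Comment: {comment}"
    )

def parse_tags(reply, topics):
    """Keep only tags that match an allowed topic (case-insensitive)."""
    lookup = {t.lower(): t for t in topics}
    tags = []
    for piece in reply.split(","):
        key = piece.strip().lower()
        if key in lookup and lookup[key] not in tags:
            tags.append(lookup[key])
    return tags

prompt = build_tagging_prompt(
    "We need more frequent buses near new apartments.", ALLOWED_TOPICS
)

# Hand-written stand-in for a model reply; "Parking" is not an allowed tag.
example_reply = "transportation, Housing, Parking"
tags = parse_tags(example_reply, ALLOWED_TOPICS)
print(tags)
```

Filtering the reply against the allowed list means an unexpected label from the model is dropped rather than silently becoming a new category.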
After cloning the repository, create and activate your virtual environment by running the following commands (more comprehensive directions are in the project design document):
$ conda env create -f environment.yml
then
$ conda activate comment-analysis-env
Then create a .env file, modeled on .env-example, containing your OpenAI API key, and you should be all set to run the processes!
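If you want to check that your .env file is readable before running the notebook, a small standard-library sketch like the one below can help. It assumes the file uses simple `KEY=value` lines and that the key is named `OPENAI_API_KEY`; both are assumptions, so check .env-example for the actual variable name. The repository may load the file differently, and the helper names here are hypothetical.

```python
import os

def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def load_env_file(path=".env"):
    """Read `path` if it exists and copy its values into os.environ.

    Variables already set in the environment are left unchanged.
    """
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        values = parse_env_text(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values

# Demonstration on in-memory text with a placeholder, not a real key.
sample = '# example file\nOPENAI_API_KEY="sk-placeholder"\n'
parsed = parse_env_text(sample)
print(sorted(parsed))
```

Calling `load_env_file()` from the repository root and checking that the expected key appears in the returned dictionary is a quick way to catch a misnamed file or variable.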
