A guide to acquiring, installing, and deploying the search capability via API, for SQL and OpenSearch.
-
- If you encounter issues with Docker, try uninstalling and reinstalling it using Homebrew:
brew uninstall --cask docker --force
brew uninstall --formula docker --force
brew install --cask docker
- This works even if Docker was not originally installed using Homebrew.
-
- Install libpq via Homebrew:
brew install libpq
brew link --force libpq
- Install
-
- Register for an API key at the top of the page
Clone the Data-Product-Kit repository:
git clone https://github.com/mirrulations/Data-Product-Kit.git
Create a virtual environment, activate it, and install the requirements:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Create a .env file in the parent directory with the following fields:
OPENSEARCH_INITIAL_ADMIN_PASSWORD=<your_secure_password>
OPENSEARCH_HOST=opensearch-node1
OPENSEARCH_PORT=9200
S3_BUCKET_NAME=presentationbucketcs334s25
POSTGRES_DB=postgres
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password
POSTGRES_HOST=db
POSTGRES_PORT=5432
REGULATIONS_API_KEY=<your_API_key>
NOTE: Avoid using $ or ! as a special character in the OpenSearch password. When docker compose runs later on, the text following either character gets interpreted as a shell variable, which may lead to issues.
NOTE: When running locally, you can set the username and password to any desired credentials. Example credentials are included below:
OPENSEARCH_INITIAL_ADMIN_PASSWORD=C4nzUMkFu^e4N2
OPENSEARCH_HOST=opensearch-node1
OPENSEARCH_PORT=9200
S3_BUCKET_NAME=presentationbucketcs334s25
NOTE: The S3 bucket we are using for sample data is presentationbucketcs334s25. You can use your own bucket by changing the value of S3_BUCKET_NAME in the .env file.
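The password note above can be checked before any containers are started. The following is a minimal sketch, not part of the repository: the function name find_risky_password_chars is hypothetical, and it assumes the .env file uses plain KEY=VALUE lines as shown above.

```python
def find_risky_password_chars(env_text, key="OPENSEARCH_INITIAL_ADMIN_PASSWORD"):
    """Return the sorted risky characters ($ or !) found in the value of `key`."""
    risky = {"$", "!"}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return sorted(set(value) & risky)
    # Key not present: nothing to flag.
    return []


if __name__ == "__main__":
    # Example using the sample credentials from this guide.
    sample = "OPENSEARCH_INITIAL_ADMIN_PASSWORD=C4nzUMkFu^e4N2\nOPENSEARCH_PORT=9200\n"
    print(find_risky_password_chars(sample))
```

To check a real file, pass its contents, for example find_risky_password_chars(open(".env").read()). An empty list means neither character was found in the password.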
Run the following command to load the environment variables:
source .env
We now use a Docker network that allows OpenSearch and SQL to communicate within containers without being exposed to the local machine.
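Inside the Docker network, containers reach each other by the hostnames set in the .env file (opensearch-node1 and db). The sketch below shows one way a script running in a container might assemble those settings from the environment. The function name service_endpoints is hypothetical, and the project's own scripts may read these values differently.

```python
import os


def service_endpoints(env=None):
    """Build the OpenSearch URL and Postgres settings from environment variables.

    Defaults mirror the example .env values in this guide.
    """
    env = os.environ if env is None else env
    opensearch_url = "http://{}:{}".format(
        env.get("OPENSEARCH_HOST", "opensearch-node1"),
        env.get("OPENSEARCH_PORT", "9200"),
    )
    postgres = {
        "host": env.get("POSTGRES_HOST", "db"),
        "port": int(env.get("POSTGRES_PORT", "5432")),
        "dbname": env.get("POSTGRES_DB", "postgres"),
        "user": env.get("POSTGRES_USER", "postgres"),
    }
    return opensearch_url, postgres


if __name__ == "__main__":
    url, pg = service_endpoints()
    print(url)
    print(pg)
```

These hostnames resolve only from inside the Docker network, so the values are meant for code running in a container rather than on the local machine.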
- Ensure Docker is running.
- Start all services (OpenSearch, SQL, and Query):
docker compose build --no-cache
docker compose up -d
NOTE: This should start a total of 7 containers, along with the Docker network.
- Verify running containers:
docker compose ps
Note: On successful execution, 3 OpenSearch containers should be running. If they are not, revisit your password, because the containers will not start with a weak password. Visit the OpenSearch troubleshooting guide for further assistance.
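The container counts mentioned above can also be checked programmatically. The sketch below parses the output of docker compose ps --format json and counts running containers. It rests on a few assumptions: that each entry has Name and State fields, that the output is either a JSON array or one JSON object per line depending on the Docker Compose version, and that OpenSearch containers have "opensearch" in their names. The function names are hypothetical.

```python
import json


def parse_compose_ps(output):
    """Parse `docker compose ps --format json` output into a list of dicts."""
    text = output.strip()
    if not text:
        return []
    # Some Compose versions print a single JSON array.
    if text.startswith("["):
        return json.loads(text)
    # Others print one JSON object per line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(containers):
    """Return (running container count, running OpenSearch container count)."""
    running = [c for c in containers if c.get("State") == "running"]
    opensearch = [c for c in running if "opensearch" in c.get("Name", "").lower()]
    return len(running), len(opensearch)


if __name__ == "__main__":
    # Illustrative input only; real output contains many more fields.
    sample = "\n".join([
        json.dumps({"Name": "opensearch-node1", "State": "running"}),
        json.dumps({"Name": "db", "State": "running"}),
        json.dumps({"Name": "ingest", "State": "exited"}),
    ])
    print(summarize(parse_compose_ps(sample)))
```

On a healthy setup as described in this guide, the two numbers would be 7 and 3.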
docker-compose exec opensearch-node1 curl -X GET "http://opensearch-node1:9200"
- In the virtual environment, run the following command to create the index and ingest the data:
docker-compose exec ingest python /app/ingest.py
Note: This may take a few minutes.
- To query the data, run the following command:
docker-compose exec ingest python /app/query.py <search_term>
Note: Only dockets with matching comments will appear in the output.
- To delete data from the OpenSearch instance, run the following command:
docker-compose exec ingest python /app/delete_index.py
Troubleshooting tips are available in the SQL Troubleshooting Guide
- Create Tables & Insert Data:
docker-compose exec sql-client python CreateTables.py
- Ingest Data from S3 Bucket:
docker-compose exec sql-client python IngestFromBucket.py presentationbucketcs334s25
- Optional: Ingest Individual Docket from Mirrulations S3 Bucket:
docker-compose exec sql-client python IngestDocket.py <docket_id>
Example Docket:
DOS-2022-0004
- Optional: Ingest Agency Data From regulations.gov:
To check for and insert missing agency data into the agencies.txt file, run:
docker-compose exec sql-client python CheckAgencies.py
IMPORTANT: Additional documentation for all the scripts for SQL can be found here
- PSQL Interface:
docker-compose exec sql-client psql -h db -U postgres -d postgres
You can begin querying once the connection has been established.
To enhance readability, toggle expanded display:
\x
Exit PSQL:
\q
- Query Using Script:
- This command allows the user to input a SQL query:
docker-compose exec sql-client python /app/Query.py "SELECT docket_id FROM dockets;"
An example query is provided above.
NOTE: Queries are limited to SELECT statements and must be enclosed in quotation marks. The script must be rerun for each query; output is returned in JSON format.
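To illustrate the SELECT-only restriction and JSON output described above, here is a minimal sketch of how such a check and formatting step could look. It is not the implementation in Query.py: the function names is_select_only and rows_to_json are hypothetical, and the check is deliberately conservative (it rejects any query containing a semicolon other than a trailing one, even inside a string literal).

```python
import json


def is_select_only(query):
    """Return True if the query is a single statement that starts with SELECT."""
    stripped = query.strip().rstrip(";").strip()
    # Reject empty input and anything that looks like multiple statements.
    if not stripped or ";" in stripped:
        return False
    return stripped.split()[0].lower() == "select"


def rows_to_json(columns, rows):
    """Format result rows as a JSON array of objects keyed by column name."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


if __name__ == "__main__":
    print(is_select_only("SELECT docket_id FROM dockets;"))
    print(is_select_only("DROP TABLE dockets;"))
    print(rows_to_json(["docket_id"], [("DOS-2022-0004",)]))
```

A first-keyword check like this is only a first line of defense. Running queries through a read-only database role would be a stronger safeguard.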
- Drop Tables:
docker-compose exec sql-client python DropTables.py
To run a search query that integrates both OpenSearch and SQL:
docker compose exec queries python query.py "<search_term>"
This will:
- Query OpenSearch for docket IDs matching the search term.
- Fetch additional details from SQL for those docket IDs.
- Return combined results.
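The three steps above can be sketched as a small function. This is an illustration of the flow only, not the code in query.py: combined_search and both stand-in lookup functions are hypothetical, and the in-memory dictionaries hold placeholder values in place of the real OpenSearch index and SQL tables.

```python
def combined_search(term, search_docket_ids, fetch_docket_details):
    """Search for matching docket IDs, then attach SQL details to each one."""
    # Step 1: ask the search backend for docket IDs matching the term.
    docket_ids = search_docket_ids(term)
    # Step 2: fetch additional details for those IDs from the SQL backend.
    details = fetch_docket_details(docket_ids)
    # Step 3: combine, keeping only dockets that have details.
    return [{"docket_id": d, **details[d]} for d in docket_ids if d in details]


# Placeholder stand-ins for the two backends.
FAKE_COMMENTS = {"DOS-2022-0004": ["placeholder comment mentioning visas"]}
FAKE_DOCKETS = {"DOS-2022-0004": {"title": "placeholder title"}}


def fake_search(term):
    """Return docket IDs with at least one comment containing the term."""
    return [
        docket_id
        for docket_id, comments in FAKE_COMMENTS.items()
        if any(term.lower() in c.lower() for c in comments)
    ]


def fake_fetch(docket_ids):
    """Return details for the requested docket IDs that exist in the table."""
    return {d: FAKE_DOCKETS[d] for d in docket_ids if d in FAKE_DOCKETS}


if __name__ == "__main__":
    print(combined_search("visas", fake_search, fake_fetch))
```

Passing the two lookups in as functions keeps the combining logic independent of either backend, which makes it easy to test without running containers.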
To stop the docker containers, run the following command:
docker compose down
For additional help, reach out to the Data Product team.