A web scraper and REST API for New Mexico Oil Conservation Division (OCD) well data.
## Installation

### Option 1: uv

```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies and set up environment
uv sync
uv run playwright install chromium
```

### Option 2: pip and venv

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
playwright install chromium
```

### Option 3: Docker

```bash
docker compose up --build
```

Or, if a venv (or uv environment) is already activated:

```bash
python -m well_db docker up --build   # prefix with `uv run` when using uv
```

This starts the API server at http://localhost:8000. The Playwright browser is included in the image.
## Scraping Data

Populate the database from the CSV file containing API numbers:

```bash
# Using uv
uv run python -m well_db scrape

# With options
uv run python -m well_db scrape --concurrency 3 --missing
```

The `--missing` flag only scrapes APIs not already in the database. Use `--force` to re-scrape everything.
## Running the API Server

```bash
uv run python -m well_db serve
```

The server runs at http://127.0.0.1:8000. API docs are available at /docs.
## Polygon Search (CLI)

```bash
# Use the test polygon from the assignment
uv run python -m well_db polygon test -o results.csv

# Custom polygon
uv run python -m well_db polygon "[(32.81,-104.19),(32.66,-104.32),(32.54,-104.24)]" -o results.csv
```

## Utility Commands

```bash
uv run python -m well_db delete --yes    # Delete the database
uv run python -m well_db docker up       # Start Docker containers
uv run python -m well_db docker down     # Stop Docker containers
```

## API Endpoints

### Core

| Method | Endpoint | Description |
|---|---|---|
| GET | `/well?api_number=XX-XXX-XXXXX` | Get all data for a single well |
| GET | `/polygon-search?polygon=[(lat,lon),...]` | Find wells within a polygon |
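A quick way to exercise these two endpoints from Python — a sketch that assumes the server is running locally and returns JSON; `requests` is not a project dependency (any HTTP client works), and the API number shown is a made-up example:

```python
import requests

BASE = "http://localhost:8000"

# Fetch all data for a single well (30-015-12345 is a made-up
# API number in the XX-XXX-XXXXX format).
well = requests.get(f"{BASE}/well", params={"api_number": "30-015-12345"})
print(well.json())

# Find wells inside a polygon given as (lat, lon) pairs.
polygon = "[(32.81,-104.19),(32.66,-104.32),(32.54,-104.24)]"
hits = requests.get(f"{BASE}/polygon-search", params={"polygon": polygon})
print(hits.json())
```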
### Database and Scrape Management

| Method | Endpoint | Description |
|---|---|---|
| GET | `/db/status` | Database status and CSV comparison |
| POST | `/scrape/start` | Start background scrape job |
| GET | `/scrape/status` | Monitor scrape progress |
| POST | `/scrape/stop` | Stop running scrape job |
| GET | `/wells` | List all wells (paginated) |
| GET | `/wells/count` | Total well count |
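For example, a scrape job can be started and polled from Python. This is a sketch: the field name in the status payload (`running` here) is an assumption, so check `/docs` for the real schema:

```python
import time

import requests

BASE = "http://localhost:8000"

requests.post(f"{BASE}/scrape/start")  # kick off the background job

while True:
    status = requests.get(f"{BASE}/scrape/status").json()
    print(status)
    if not status.get("running", False):  # assumed field name
        break
    time.sleep(5)
```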
### Additional Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Health check |
| GET | `/well/scrape?api_number=...` | Force scrape a single well |
| GET | `/wells/random` | Get a random well (scrapes if missing) |
| POST | `/polygon-search` | Polygon search with JSON body |
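The POST variant accepts the polygon in the request body. A sketch, with the caveat that the JSON shape shown is an assumption (see `/docs` for the actual request schema):

```python
import requests

# Assumed body shape: a list of [lat, lon] pairs under a "polygon" key.
body = {"polygon": [[32.81, -104.19], [32.66, -104.32], [32.54, -104.24]]}
resp = requests.post("http://localhost:8000/polygon-search", json=body)
print(resp.json())
```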
## Scraper Design

The scraper uses Playwright with headless Chromium.

Key scraper features:

- Element-based waiting: uses `wait_for_selector("#general_information")` (the "General Well Information" section) instead of fixed delays, for reliable page-load detection (sketched below)
- Targeted extraction: extracts only the `fieldset.data_container` element rather than the full page text
- Worker-pool concurrency: uses an `asyncio.Queue` with configurable workers instead of a semaphore-based approach
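A minimal sketch of the first two features, using Playwright's async API; the well-page URL below is a placeholder, since the real one isn't shown here:

```python
from playwright.async_api import async_playwright

WELL_PAGE_URL = "https://example.invalid/well"  # placeholder for the real OCD URL

async def scrape_well(api_number: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(f"{WELL_PAGE_URL}?api={api_number}")
        # Wait for the "General Well Information" section instead of sleeping.
        await page.wait_for_selector("#general_information")
        # Extract only the data fieldset, not the full page text.
        text = await page.inner_text("fieldset.data_container")
        await browser.close()
        return text
```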
I initially tried `asyncio.gather()` with a semaphore, which caused request clustering. The final implementation uses a worker-pool pattern: N workers pull from a shared queue, each maintaining a 1.5-second delay between its own requests. This provides consistent rate limiting without overwhelming the target server.
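A sketch of that pattern, reusing the hypothetical `scrape_well()` from above:

```python
import asyncio

async def worker(queue: asyncio.Queue) -> None:
    while True:
        api_number = await queue.get()
        try:
            await scrape_well(api_number)  # hypothetical scrape coroutine from above
        except Exception as exc:
            print(f"{api_number} failed: {exc}")  # the real code retries with backoff
        finally:
            queue.task_done()
        await asyncio.sleep(1.5)  # per-worker delay between requests

async def run_pool(api_numbers: list[str], concurrency: int = 3) -> None:
    queue: asyncio.Queue = asyncio.Queue()
    for api in api_numbers:
        queue.put_nowait(api)
    workers = [asyncio.create_task(worker(queue)) for _ in range(concurrency)]
    await queue.join()  # blocks until every queued item is task_done()
    for w in workers:
        w.cancel()      # workers loop forever, so cancel them once the queue drains
```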
## Polygon Search Implementation

Uses GeoPandas with the WGS84 CRS (EPSG:4326) for geodetically correct point-in-polygon testing. A bounding-box pre-filter in SQL reduces the candidate set before the spatial join.
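A sketch of the containment test, assuming each candidate well row (already narrowed by the SQL bounding-box filter) carries latitude/longitude values; the column names are illustrative, and `within()` stands in for the spatial join:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

def wells_in_polygon(wells: list[dict], coords: list[tuple[float, float]]) -> gpd.GeoDataFrame:
    # coords are (lat, lon) pairs; shapely expects (x, y) = (lon, lat).
    poly = Polygon([(lon, lat) for lat, lon in coords])
    gdf = gpd.GeoDataFrame(
        wells,
        geometry=[Point(w["longitude"], w["latitude"]) for w in wells],  # assumed keys
        crs="EPSG:4326",  # WGS84
    )
    return gdf[gdf.geometry.within(poly)]
```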
## Data Model

Field ordering in the SQLAlchemy model matches the assignment specification. The `api` field serves as the primary key. Timestamps use `datetime.now(timezone.utc)` with lambda wrappers to avoid the deprecated `datetime.utcnow()`.
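The timestamp pattern looks roughly like this (a sketch in SQLAlchemy 2.0 style; the table name and columns other than `api` are illustrative):

```python
from datetime import datetime, timezone

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Well(Base):
    __tablename__ = "wells"  # assumed table name

    api: Mapped[str] = mapped_column(primary_key=True)
    well_name: Mapped[str | None]  # illustrative column
    created_at: Mapped[datetime] = mapped_column(
        # The lambda defers evaluation to insert time, and timezone.utc
        # avoids the deprecated datetime.utcnow().
        default=lambda: datetime.now(timezone.utc)
    )
```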
## Error Handling

The scraper implements exponential backoff with 3 retries per API. Failed APIs are tracked and reported but don't halt the batch. The API's `/scrape/start` endpoint runs scraping in a background task with status polling via `/scrape/status`.
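The retry loop has roughly this shape (the backoff delays are illustrative, and `scrape_well()` is the hypothetical coroutine from the scraper sketch above):

```python
import asyncio

async def scrape_with_retries(api_number: str, retries: int = 3) -> str | None:
    for attempt in range(retries):
        try:
            return await scrape_well(api_number)
        except Exception:
            if attempt == retries - 1:
                return None  # give up; the caller records the failed API
            await asyncio.sleep(2 ** (attempt + 1))  # exponential backoff: 2 s, then 4 s
```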