115 commits
a79b5a8
minor: update CLI command with params field and update Makefile to al…
dherincx92 Oct 18, 2025
dcf4c97
minor: update CLI with new fields command
dherincx92 Oct 18, 2025
b1ae038
Add end of file whitespace
dherincx92 Oct 18, 2025
9dfee86
Fix README
dherincx92 Oct 18, 2025
f8cefe7
minor: fix field with missing regex pattern and update data method to…
dherincx92 Dec 10, 2025
6af5067
fix Makefile
dherincx92 Dec 10, 2025
d23c526
Add scraper script to use with GHA to idenify new search fields
dherincx92 Dec 11, 2025
1959eb8
Fix dependencies; remove aiohttp --> httpx
dherincx92 Dec 14, 2025
ba867a2
Add WIP parser tests; still few enhancements remaining in parser
dherincx92 Dec 14, 2025
22c42be
Fix trailing comma on param utilities
dherincx92 Dec 14, 2025
5fbfb61
fix docstring for param utilities
dherincx92 Dec 14, 2025
be51cfb
Add new fpdsWriter to handle chunking and gzipping of records
dherincx92 Dec 14, 2025
e88e844
Updated uv.lock dependencies
dherincx92 Dec 14, 2025
2312f64
Update CLI and assc. tests
dherincx92 Dec 14, 2025
4c0ef88
Add ezsearch url to config
dherincx92 Dec 14, 2025
d31e1fc
Add docstrings to CLI files
dherincx92 Dec 14, 2025
5b1160f
Update dependencies
dherincx92 Dec 14, 2025
39dad34
add headless option to selenium scraper
dherincx92 Dec 14, 2025
f815fc6
Install scripts extra in pyproject.toml
dherincx92 Dec 14, 2025
e50f9cb
Run formatters
dherincx92 Dec 14, 2025
761dd9a
Fix return type
dherincx92 Dec 14, 2025
e005c79
Fix circular import
dherincx92 Dec 14, 2025
b1be329
Fixing type issues where relevant
dherincx92 Dec 14, 2025
434855d
Fix last type issue
dherincx92 Dec 14, 2025
1b07b73
update README and add new xml test file with no links
dherincx92 Dec 15, 2025
3442646
remove coment
dherincx92 Dec 15, 2025
bec69d1
test xml
dherincx92 Dec 15, 2025
57bc79f
minor: update docstring
dherincx92 Dec 15, 2025
04194e6
Adding GHA step for installing just and modified tests
dherincx92 Jan 7, 2026
9ff4073
Install webdriver manager package
dherincx92 Jan 7, 2026
6ffd2e3
add trigger for new fields PR
dherincx92 Jan 7, 2026
4000175
minor: remove webdriver_manager package
dherincx92 Jan 7, 2026
98cd318
Fixing yml file workflow
dherincx92 Jan 7, 2026
c414dcd
Add pprint to scraper
dherincx92 Jan 7, 2026
5b9ef2d
Using setup-chromedriver job
dherincx92 Jan 7, 2026
b66578f
Add step for installing uv
dherincx92 Jan 7, 2026
0af6de2
pprint new options to test script
dherincx92 Jan 7, 2026
1cf3104
minor: update scraper
dherincx92 Jan 7, 2026
724b102
Add EOL
dherincx92 Jan 7, 2026
1be61f2
Update GHA for new fields
dherincx92 Jan 7, 2026
6b1752f
Thanks to Leon; updating types based on PEP-0604
dherincx92 Jan 7, 2026
e02ec83
run formatters
dherincx92 Jan 7, 2026
cd1ea7c
Add PR step
dherincx92 Jan 7, 2026
038c780
Push formatting for tests
dherincx92 Jan 7, 2026
71fd013
add formatters step in test-package GHA workflow to ensure formatter …
dherincx92 Jan 7, 2026
c56f8f2
Activate environment
dherincx92 Jan 7, 2026
b296d7d
update scraper logic
dherincx92 Jan 7, 2026
02b3c41
Update PR title
dherincx92 Jan 7, 2026
ea67a7f
Syntax error on new fields yaml pipeline
dherincx92 Jan 7, 2026
ed72251
Fix package dependencies and capping typing-extensions at major versi…
dherincx92 Jan 7, 2026
9ec43e9
Convert all classes to pascal case
dherincx92 Jan 7, 2026
c7c2e86
Fix casing references across repo to be pascal case
dherincx92 Jan 7, 2026
e4d882d
patch: [skip ci] Run formatters
dherincx92 Jan 7, 2026
7fb3e86
Add additional configs to auto commit
dherincx92 Jan 7, 2026
6ed0927
Remove bad options
dherincx92 Jan 7, 2026
4aeaafc
Add trailing whitespace when writing new fields.json file
dherincx92 Jan 7, 2026
dfe04a4
comment pull request action
dherincx92 Jan 7, 2026
92d9185
Fix parse to ensure dir var always exists
dherincx92 Jan 7, 2026
0df4e8a
Update Justfile
dherincx92 Jan 8, 2026
5d4d332
Update src/fpds/utilities/writer.py
dherincx92 Jan 8, 2026
0d58d3d
Update src/fpds/utilities/writer.py
dherincx92 Jan 8, 2026
15e410e
Update README.md
dherincx92 Jan 8, 2026
3abcb5f
fix indent in pyproject.toml
mitchbregs Jan 7, 2026
750adde
Remove PR template markdown
dherincx92 Jan 9, 2026
dcb4bf1
minor: update README
dherincx92 Jan 9, 2026
a6d22b0
Attempt to fix ascii
dherincx92 Jan 9, 2026
dae30b6
Add pre tag
dherincx92 Jan 9, 2026
39d6c3e
major: closing in on the final commit
dherincx92 Jan 9, 2026
260c79a
typo removing ref to Makefile
dherincx92 Jan 9, 2026
65f7f4b
Add img/ dir
dherincx92 Jan 9, 2026
67d3c1f
Update README
dherincx92 Jan 9, 2026
dab8c30
Updating scraper now that we can get input boxes correctly
dherincx92 Jan 9, 2026
dea96dc
patch: [skip ci] Run formatters
dherincx92 Jan 9, 2026
07d15c6
re run new fields workflow
dherincx92 Jan 9, 2026
d0d3246
Merge branch 'minor/add-params-cli-command' of github.com:dherincx92/…
dherincx92 Jan 9, 2026
7c2ea2a
create separate formatters job due to clashing commits with matrix
dherincx92 Jan 9, 2026
a1fa638
Add conditional to jst execute commit on 3.13
dherincx92 Jan 9, 2026
142cf77
minor: updating README and removing duplicate text
dherincx92 Jan 9, 2026
1fbafea
inspired by Leon; using list instead of List
dherincx92 Jan 9, 2026
cc53bba
Change default chunk size to 10 MB instead of 100MB
dherincx92 Jan 9, 2026
ef7c9f9
Add scraper job to retrieve data dict url and publish it in GHA
dherincx92 Jan 10, 2026
646bee9
patch: [skip ci] Run formatters
dherincx92 Jan 10, 2026
4855e69
Update config with new workspace URL
dherincx92 Jan 10, 2026
77fe357
Merge branch 'minor/add-params-cli-command' of github.com:dherincx92/…
dherincx92 Jan 10, 2026
2c05387
Updating new fields workflow format and adding GHA output feed versio…
dherincx92 Jan 10, 2026
6d51948
patch: [skip ci] Run formatters
dherincx92 Jan 10, 2026
dd59b8b
minor: update uv.lock and removing unstructured dependency
dherincx92 Jan 10, 2026
43fff9c
Merge branch 'minor/add-params-cli-command' of github.com:dherincx92/…
dherincx92 Jan 10, 2026
d72915d
Add new github module
dherincx92 Jan 10, 2026
abbc881
patch: [skip ci] Run formatters
dherincx92 Jan 10, 2026
d177a68
Update set github output function to support multiline
dherincx92 Jan 10, 2026
377440e
Merge branch 'minor/add-params-cli-command' of github.com:dherincx92/…
dherincx92 Jan 10, 2026
2bf3b26
Add $ to ensure grid is rendered
dherincx92 Jan 10, 2026
ec12118
Update PR template
dherincx92 Jan 10, 2026
c9b5788
patch: [skip ci] Run formatters
dherincx92 Jan 10, 2026
d0d00f3
Expand text grid for new fields workflow
dherincx92 Jan 10, 2026
5167453
Merge branch 'minor/add-params-cli-command' of github.com:dherincx92/…
dherincx92 Jan 10, 2026
e0224fb
Add emoji for status
dherincx92 Jan 10, 2026
92b4d59
patch: [skip ci] Run formatters
dherincx92 Jan 10, 2026
daa4a97
update name/description to make them reusable
dherincx92 Jan 10, 2026
96be3f0
fix conflicts
dherincx92 Jan 10, 2026
25b1143
minor: finalize PR format
dherincx92 Jan 10, 2026
6e13c67
remove unused parser test
dherincx92 Jan 10, 2026
43a35e2
Update docstring for pattern in CLI command
dherincx92 Jan 10, 2026
fa49453
patch: [skip ci] Run formatters
dherincx92 Jan 10, 2026
adbcf83
Add docstring for parse command
dherincx92 Jan 10, 2026
9c245dc
Merge branch 'minor/add-params-cli-command' of https://github.com/dhe…
dherincx92 Jan 10, 2026
cf5cfe4
Cleaning up some code and updating docstring for core FPDSRequest class
dherincx92 Jan 10, 2026
1fb5ffe
update docstrings in core/xml
dherincx92 Jan 10, 2026
4e0da0e
update docstring for scraper
dherincx92 Jan 10, 2026
7ba5be5
add missing docstrings
dherincx92 Jan 10, 2026
13b917b
Cleaning up README
dherincx92 Jan 10, 2026
235b5ba
Updating cron time
dherincx92 Jan 10, 2026
e77e8a9
Running scraper locally to ensure we get new fields from FPDS site di…
dherincx92 Jan 10, 2026
ffc1082
Remove pull request trigger for new fields GHA workflow
dherincx92 Jan 10, 2026
57 changes: 57 additions & 0 deletions .github/workflows/pr-for-new-fields.yml
@@ -0,0 +1,57 @@
name: New fpds search fields PR

on:
  schedule:
    - cron: "0 5 * * *"
  workflow_dispatch:

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: nanasess/setup-chromedriver@v2

      - name: Install uv
        uses: astral-sh/setup-uv@v5
      - uses: extractions/setup-just@v3
        with:
          just-version: 1.46.0

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: 3.13

      - name: Scrape FPDS Site
        id: scrape
        run: just scrape

      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v6
        with:
          commit-message: "minor: updating fields.json with newly identified ezSearch fields"
          title: "minor: 🤖 [AUTOMATED] new ezSearch fields"
          body: |
            Automated updates from FPDS ezSearch scraper.

            ✅ ATOM Feed Version: ${{ steps.scrape.outputs.feed_version }}
            📃 Data Dictionary [here](${{ steps.scrape.outputs.data_dict_url }})

            Before merging, please review the data dictionary file above and generate the appropriate regex pattern
            for all new fields.

            🔴 **Failures** 🔴
            -----------------
            The following fields were found in the ezSearch page, but their format could not be determined. If the
            field is marked as ❌, you will need to find relevant metadata in the data dictionary file linked above.
            If the field is marked as ✅, you can ignore this warning.

            ${{ steps.scrape.outputs.grid }}

          branch: minor/new-ezsearch-fields
          base: master
          labels: automated, feature
          draft: false
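
Note on the workflow above: the PR body reads three outputs from the `scrape` step (`feed_version`, `data_dict_url` and `grid`), and `grid` is presumably multi-line. The scraper code is not part of this diff, so the following is only a sketch of how a Python script could publish such outputs through the `GITHUB_OUTPUT` file, using the `name<<DELIMITER` form that GitHub Actions documents for multi-line values. The function name, the `path` override and the sample values are assumptions for illustration, not the project's implementation.

```python
from __future__ import annotations

import os
import tempfile
import uuid


def set_github_output(name: str, value: str, path: str | None = None) -> None:
    """Append one step output to the file GitHub Actions reads after the step.

    `path` is a hypothetical override so the function can be tried locally;
    inside a workflow the runner provides the location via $GITHUB_OUTPUT.
    """
    target = path or os.environ["GITHUB_OUTPUT"]
    with open(target, "a", encoding="utf-8") as fh:
        if "\n" in value:
            # Multi-line values use a heredoc-style block with a delimiter
            # that is unlikely to appear inside the value itself.
            delimiter = f"ghadelim_{uuid.uuid4().hex}"
            fh.write(f"{name}<<{delimiter}\n{value}\n{delimiter}\n")
        else:
            fh.write(f"{name}={value}\n")


if __name__ == "__main__":
    # Local demonstration against a temporary file (sample values are made up).
    demo_path = os.path.join(tempfile.mkdtemp(), "github_output.txt")
    set_github_output("feed_version", "x.y.z", path=demo_path)
    set_github_output("grid", "| field | status |\n| A | ok |", path=demo_path)
    with open(demo_path, encoding="utf-8") as fh:
        print(fh.read())
```

A later step can then reference the values as `${{ steps.scrape.outputs.grid }}`, provided the step that wrote them has an `id` (here `scrape`).
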
10 changes: 7 additions & 3 deletions .github/workflows/publish.yml
@@ -59,16 +59,20 @@ jobs:
- name: Install uv
uses: astral-sh/setup-uv@v5

- uses: extractions/setup-just@v3
with:
just-version: 1.46.0

- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install Project
run: make install
run: just install

- name: Run Tests
run: make test
run: just test

publish:
needs: [cut-release, build-and-test]
@@ -85,4 +89,4 @@ jobs:
- name: Publish Package
env:
UV_PUBLISH_TOKEN: ${{ secrets.PYPI_API_TOKEN }}
run: make publish
run: just publish
20 changes: 15 additions & 5 deletions .github/workflows/test-package.yml
@@ -22,20 +22,30 @@ jobs:
- name: Install uv
uses: astral-sh/setup-uv@v5

- uses: extractions/setup-just@v3
with:
just-version: 1.46.0

- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install Project
run: make install
run: just install

- name: Run Tests
run: make test
- name: Run linters
run: just formatters

- uses: actions/checkout@v4
- uses: stefanzweifel/git-auto-commit-action@v5
if: ${{ matrix.python-version == '3.13' }}
with:
fetch-depth: 0
commit_message: "patch: [skip ci] Run formatters"
file_pattern: "*.py"
disable_globbing: true

- name: Run Tests
run: just test

test-package-version:
runs-on: ubuntu-latest
16 changes: 8 additions & 8 deletions Makefile → Justfile
@@ -1,8 +1,5 @@
.PHONY: help venv install clean formatters mypy test local-test package publish
.DEFAULT_GOAL := help

help:
@python -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST)
default:
just --list

venv: ## defaults to creating virtual environment in current directory under .venv
@if [ -d .venv ]; then \
@@ -11,8 +8,8 @@ venv: ## defaults to creating virtual environment in current directory under .ve
uv venv; \
fi

install: venv ## checks if uv.lock is up-to-date and manually syncs all deps + extras
uv lock --check
install: venv ## updates uv.lock if needed and manually syncs all deps + extras
uv lock
uv sync --extra all

clean: ## Remove test and coverage artifacts
@@ -27,7 +24,7 @@ formatters: venv ## https://docs.astral.sh/ruff/formatter/#line-breaks
uv tool run ruff format

mypy: ## Typechecking with mypy
uv tool run mypy src/
uv run mypy src/

test: venv install ## Run unit tests with coverage
uv run -m pytest
@@ -40,3 +37,6 @@ package: ## builds project + artifacts in dist/ directory

publish: package ## publishes package to pypi
uv publish

scrape: venv install
uv run python src/fpds/scripts/scraper.py
139 changes: 88 additions & 51 deletions README.md
@@ -1,90 +1,127 @@
# fpds
<div align="center">
<pre>
__________ ____ _____
/ ____/ __ \/ __ \/ ___/
/ /_ / /_/ / / / /\__ \
/ __/ / ____/ /_/ /___/ /
/_/ /_/ /_____//____/
Welcome to a more user-friendly FPDS 🚀
</pre>
</div>
A light-weight, pythonic parser for the Federal Procurement Data System (FPDS) ATOM Feed.
Reference [here](https://www.fpds.gov/fpdsng_cms/index.php/en/).


## Motivation
The FPDS ATOM feed limits each request to 10 records, which forces users to deal with pagination. Additonally, data is exported as XML, which proves annoying. `fpds` will handle all pagination and data
transformation to provide users with a nice JSON representation of the
equivalent XML data and attributes.
To make FPDS data more accessible to developers.

This library helps users by doing the following:
- Automatically handling pagination
- Converting XML into a flat JSON structure
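
To make the second bullet concrete, here is a rough sketch of what flattening nested XML into a single-level mapping can look like. The library's real key naming and its handling of repeated elements are not shown in this diff, so the `flatten` helper, the `__` separator and the sample document are assumptions for illustration only.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET


def flatten(element: ET.Element, prefix: str = "") -> dict[str, str]:
    """Collapse an element tree into {"parent__child": "text"} pairs.

    Simplification: repeated sibling tags overwrite each other here; a real
    parser would need an indexing scheme for them.
    """
    tag = element.tag.split("}")[-1]  # drop the "{namespace}" prefix, if any
    key = f"{prefix}__{tag}" if prefix else tag
    flat: dict[str, str] = {}
    for attr, value in element.attrib.items():
        flat[f"{key}__{attr}"] = value
    children = list(element)
    if not children:
        text = (element.text or "").strip()
        if text:
            flat[key] = text
    for child in children:
        flat.update(flatten(child, key))
    return flat


# Made-up sample loosely shaped like an ATOM entry.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example award</title>
  <content>
    <agency code="7504">EXAMPLE AGENCY</agency>
  </content>
</entry>"""

record = flatten(ET.fromstring(SAMPLE))
print(json.dumps(record, indent=2))
```

Element attributes become their own keys, which is one way to keep "data and attributes" side by side in a flat record.
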

## Setup
As of version 1.5.0, this library manages dependencies using `uv`. It is
_highly_ recommended since this library is tested with it.

This library is based on the FPDS ezSearch interface that can be found
[here](https://www.fpds.gov/ezsearch/search.do?indexName=awardfull&templateName=1.5.3&s=FPDS.GOV&q=).

### Installing `uv`

You can follow any of the methods found [here](https://docs.astral.sh/uv/getting-started/installation/). If on Linux or MacOS, we recommend using Homebrew:
## Prerequisites
As of version 1.5.0, this library manages dependencies using `uv`. It is
_highly_ recommended since this library is tested with it. Note that this
README assumes you will install `uv` and therefore runs all commands within
its context.

### `uv`
You can follow any of the methods found [here](https://docs.astral.sh/uv/getting-started/installation/).
If on MacOS, we recommend using Homebrew:
```
$ brew install uv
```

Once `uv` is installed, you can use the project Makefile to ensure your local environment is synced with the latest library installation. Start by running `make install` — this will check the status of the `uv.lock` file, and install all project dependencies + extras
### `just`
A command runner inspired by `Makefile`, written in Rust.
```
$ brew install just
```

### Local Development
Once `uv` is installed, you can use the project Justfile to ensure your local environment
is synced with the latest library installation. Start by running `just install` — this
will check the status of the `uv.lock` file, and install all project dependencies +
package extras.

For linting and formatting, we use `ruff`. See `pyproject.toml`
for specific configuration.
## Usage
For a list of valid search criteria parameters, consult FPDS documentation
found [here](https://www.fpds.gov/wiki/index.php/Atom_Feed_Usage).

```
$ make formatters
```
### CLI
### `fields`
Returns fields available for API requests.

You can clean the clutter and unwanted noise from tools using:
To display all available fields:

```
$ make clean
$ uv run fpds fields
```

### Testing
If successful, you should see a formatted table in your terminal:

![fields-cli-output](img/fpds-fields-cli-output.png)
<div align="center"><b>Figure 1 - Tabulated fields CLI output</b></div>
<br />


If you wanted to perform a more targeted search, add `-p`. For example, to
get all fields containing the text "vendor" anywhere in the name, you could
run the following:
```
$ make local-test
$ uv run fpds fields -p vendor
```

## Usage
For a list of valid search criteria parameters, consult FPDS documentation
found [here](https://www.fpds.gov/wiki/index.php/Atom_Feed_Usage). Parameters
will follow the `URL String` format shown in the link above, with the
following exceptions:
![fields-cli-vendor](img/fpds-fields-vendor.png)
<div align="center"><b>Figure 2 - Matching "vendor" fields</b></div>
<br />

+ Colons (:) will be replaced by equal signs (=)
+ Certain parameters enclose their value in quotations. `fpds` will
automatically determine if quotes are needed, so simply enclose your
entire criteria string in quotes.
### `parse`
Sends and parses records from an FPDS ATOM feed request

For example, `AGENCY_CODE:"3600"` should be used as `"AGENCY_CODE=3600"`.
Let's say you wanted records from the OFFICE OF THE INSPECTOR GENERAL.
Through your research, you identified that agency's code as 7504. For your
particular project, you are only interested in awards modified within the first
quarter of 2022. Using the `parse` command, you can easily retrieve records with
the following command:

Via CLI:
```
$ fpds parse "LAST_MOD_DATE=[2022/01/01, 2022/05/01]" "AGENCY_CODE=7504"
$ uv run fpds parse "LAST_MOD_DATE=[2022/01/01, 2022/03/31]" "AGENCY_CODE=7504"
```

By default, data will be dumped into an `.fpds` folder at the user's
`$HOME` directory. If you wish to override this behavior, provide the `-o`
option. The directory will be created if it doesn't exist.
With this command, you can specify as many filters as you want. Unfortunately due
to rate limitations with the ATOM feed, you _must_ have at least 1 filter.
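
As a rough illustration of the `NAME=value` filter format and the "at least 1 filter" rule, the sketch below splits such strings into a dictionary. It is not the library's CLI code; `parse_filters` is a hypothetical helper, and the real command may validate values further (for example against per-field regex patterns).

```python
from __future__ import annotations


def parse_filters(args: list[str]) -> dict[str, str]:
    """Turn ["NAME=value", ...] into {"NAME": "value", ...}."""
    if not args:
        raise ValueError("at least one NAME=value filter is required")
    filters: dict[str, str] = {}
    for arg in args:
        # Split on the first "=" only, so values may contain "=" themselves.
        name, sep, value = arg.partition("=")
        if not sep or not name.strip() or not value.strip():
            raise ValueError(f"expected NAME=value, got {arg!r}")
        filters[name.strip()] = value.strip()
    return filters


filters = parse_filters(
    ["LAST_MOD_DATE=[2022/01/01, 2022/03/31]", "AGENCY_CODE=7504"]
)
print(filters)
```

A mapping like this lines up with the keyword arguments used in the Python example in this README (`LAST_MOD_DATE=...`, `AGENCY_CODE=...`).
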

As of v1.5.0, you can opt out of regex validation by setting the `-k` flag
to `False` — this is helpful in scenarios when either the regex pattern has
been altered by the ATOM feed or a new parameter name is supported, but not
yet added to the configuration in this library.
By default, this command will output records to a directory named `.fpds` in
your home directory. If you wish to output to a different location, specify your
location with `-o`:

```
$ fpds parse "LAST_MOD_DATE=[2022/01/01, 2022/05/01]" "AGENCY_CODE=7504" -o ~/.my-preferred-dir
$ uv run fpds parse "LAST_MOD_DATE=[2022/01/01, 2022/03/31]" "AGENCY_CODE=7504" -o /Users/Desktop/data
```

Same request via python interpreter:
```
## Python
Core parsing and transformation classes are exposed as first-class citizens.

```{python}

import asyncio
from fpds import fpdsRequest
from fpds import FPDSRequest

request = fpdsRequest(
LAST_MOD_DATE="[2022/01/01, 2022/05/01]",
request = FPDSRequest(
LAST_MOD_DATE="[2022/01/01, 2022/03/31]",
AGENCY_CODE="7504"
)

# will return the initial HTTP request a user would make if using Postman
request_url = request.__url__()

# total number of pages in request
page_count = request.page_count


# returns records as an async generator
gen = request.iter_data()

@@ -97,11 +134,10 @@ async for entry in gen:
records = asyncio.run(request.data())
```
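
One practical note on the async generator: `async for` is only valid inside a coroutine (or an asyncio-aware REPL), so in a plain script the loop is usually wrapped in an `async def` function and driven with `asyncio.run`. The sketch below shows that pattern with a stand-in generator; `fake_iter_data` and its records are made up and only mimic the shape of an async generator, they are not the library's `iter_data`.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator


async def fake_iter_data(count: int) -> AsyncIterator[dict[str, str]]:
    """Stand-in async generator that yields one made-up record at a time."""
    for index in range(count):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield {"record": str(index)}


async def collect(count: int) -> list[dict[str, str]]:
    """Consume the async generator inside a coroutine."""
    records: list[dict[str, str]] = []
    async for entry in fake_iter_data(count):
        records.append(entry)
    return records


collected = asyncio.run(collect(3))
print(len(collected))
```
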


# Highlights

Between v1.2.1 and v1.3.0, significant improvements were made with `asyncio`. Here are some rough benchmarks in estimated data extraction + post-processing
times:
Between v1.2.1 and v1.3.0, significant improvements were made with `asyncio`. Here are
some rough benchmarks in estimated data extraction + post-processing times:

| v1.2.1 | v.1.3.0 |
-------- | --------
@@ -116,4 +152,5 @@ This equates to a <u>**84.89%**</u> decrease in completion time!

# Notes

Please be aware that this project is an after-hours passion of mine. I do my best to accomodate requests the best I can, but I receive no $$$ for any of the work I do here.
Please be aware that this project is an after-hours passion of mine. I do my best
to accommodate requests, but I receive no $$$ for any of the work I do here.
Binary file added img/fpds-fields-cli-output.png
Binary file added img/fpds-fields-vendor.png