This is a simple Python app that exposes REST API endpoints to store, retrieve, and delete CSV documents.
- Upload CSV documents and store content in memory / local file system (extendable).
- Store metadata in a SQL DB table.
- List all metadata with support for pagination and find metadata for a single document.
- Delete uploaded documents.
- Locate the docker-compose.yaml in the repository root, and use the docker-compose up command to start the setup. This will build a Docker image from the code and start a container alongside a PostgreSQL container, with everything wired automatically.
- Create a Python virtual environment and install the dependencies from the requirements.txt file using
pip install -r requirements.txt
The requirements.txt file is in the root of the repository.
- Start a PostgreSQL server and create a DB. The username, password, host, and port of the server and the DB name are required to start the application.
- Expose an environment variable named DB_URL with the DB connection URL:
export DB_URL='postgresql+psycopg://<username>:<password>@<host>:<port>/<dbname>'
The application uses this URL to connect to the DB on the DB server.
- Start the application using the command
uvicorn svc:app --host 0.0.0.0 --port 8080
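The DB_URL value described in the steps above can also be assembled programmatically. The following is a minimal sketch and not part of the application: the helper name build_db_url and the example credentials are hypothetical, and percent-encoding the username and password is an assumption that only matters when they contain characters such as @ or /.

```python
from urllib.parse import quote


def build_db_url(username: str, password: str, host: str, port: int, dbname: str) -> str:
    """Assemble a connection URL in the format shown in the setup steps."""
    # Percent-encode the credentials so characters like '@' or '/' are not
    # mistaken for URL delimiters (assumption: the URL parser decodes them).
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"postgresql+psycopg://{user}:{pwd}@{host}:{port}/{dbname}"


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    print(build_db_url("csv_user", "p@ss/word", "localhost", 5432, "csv_store"))
```

The printed value can then be exported as DB_URL before starting uvicorn.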
The base URL for the application is at http://localhost:8080/. The OpenAPI based web UI can be accessed at http://localhost:8080/docs via a browser.
It is possible to test the app with the 'Try it out' functionality of the OpenAPI based web UI.
The following curl commands can be used to invoke the application:
- Upload a CSV document:
curl -X 'POST' 'http://localhost:8080/files' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F 'file=@genes_human.csv'
This returns the unique ID assigned to the document, which can be used in the commands below.
- List metadata of all uploaded documents. This endpoint supports pagination for efficient access to the metadata of a large number of documents.
curl -X 'GET' 'http://localhost:8080/files?page=1&page_size=10' -H 'accept: application/json'
- List metadata for a document identified by the given ID:
curl -X 'GET' 'http://localhost:8080/files/ID/metadata' -H 'accept: application/json'
- List contents for a document identified by the given ID:
curl -X 'GET' 'http://localhost:8080/files/ID/data' -H 'accept: application/json'
- Delete a document identified by the given ID:
curl -X 'DELETE' 'http://localhost:8080/files/ID' -H 'accept: */*'
All the responses are returned in JSON format.
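The same endpoints can be called from Python. The sketch below mirrors the list, metadata, and delete curl commands using only the standard library. The function names are hypothetical, the response shapes are not specified here and are therefore not assumed, and send requires the application to be running at the base URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8080"  # base URL given in this README


def list_request(page: int = 1, page_size: int = 10) -> Request:
    """Build the paginated metadata listing request."""
    query = urlencode({"page": page, "page_size": page_size})
    return Request(f"{BASE_URL}/files?{query}",
                   headers={"accept": "application/json"}, method="GET")


def metadata_request(doc_id: str) -> Request:
    """Build the metadata request for a single document."""
    return Request(f"{BASE_URL}/files/{doc_id}/metadata",
                   headers={"accept": "application/json"}, method="GET")


def delete_request(doc_id: str) -> Request:
    """Build the delete request for a single document."""
    return Request(f"{BASE_URL}/files/{doc_id}",
                   headers={"accept": "*/*"}, method="DELETE")


def send(request: Request):
    """Send a request and decode the JSON body, if any (needs a running server)."""
    with urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None
```

With the server running, send(list_request(page=1, page_size=10)) performs the same call as the pagination curl command.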
- The implementation avoids loading the entire document into memory during saving and retrieval wherever possible. This can be observed in the graphs below, generated with the mpprof tool in uvicorn. In both scenarios, 25 users with 100 requests in total were used, where each request uploaded a 50 MB CSV.
- REST API endpoint parameters are validated using pydantic.
- The number of rows in a document can be counted exactly or approximated. The approximation reads up to 10 MB of content from large CSVs and calculates the row count proportionally from the total size of the document. This feature is disabled by default and can be enabled by setting the environment variable ESTIMATE_ROW_COUNT to True.
- Support resumable document uploads.
- Improve the handling of partial inserts and partial deletes of documents between the content store and the metadata table.
- Improve the document content retrieval such that for large documents, a link is provided to download directly.
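Two of the ideas described above, handling documents without loading them fully into memory and approximating the row count from a leading sample, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the application's actual code: the function names and the 1 MiB chunk size are hypothetical, rows are approximated by counting newline bytes (so quoted fields containing newlines would be over-counted), and the header line is counted like any other line.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024         # 1 MiB per read (assumed value)
SAMPLE_LIMIT = 10 * 1024 * 1024  # read up to 10 MB, as described above


def copy_in_chunks(source: BinaryIO, destination: BinaryIO,
                   chunk_size: int = CHUNK_SIZE) -> int:
    """Copy source to destination one chunk at a time; return bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        destination.write(chunk)
        total += len(chunk)
    return total


def estimate_row_count(stream: BinaryIO, total_size: int,
                       sample_limit: int = SAMPLE_LIMIT) -> int:
    """Count lines in a leading sample and scale by the total size in bytes."""
    sample = stream.read(sample_limit)
    if not sample:
        return 0
    lines = sample.count(b"\n")
    if len(sample) >= total_size:
        # The whole document fit in the sample, so the count is exact;
        # add one when the last line has no trailing newline.
        return lines + (0 if sample.endswith(b"\n") else 1)
    # Extrapolate: lines per sampled byte times the total number of bytes.
    return round(lines * total_size / len(sample))


if __name__ == "__main__":
    data = b"gene,score\n" * 1000
    stored = io.BytesIO()
    print(copy_in_chunks(io.BytesIO(data), stored, chunk_size=256))
    print(estimate_row_count(io.BytesIO(data), len(data), sample_limit=1100))
```

The estimate is only as good as the sample: it is accurate when line lengths in the first 10 MB are representative of the rest of the document.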
This app was tested with Python version 3.11.5. The unit and integration tests can be run with the python -m pytest tests command.

