isurulucky/csv-file-storage-service

CSV Persistence App using FastAPI

This is a simple Python app that exposes REST API endpoints to store, retrieve, and delete CSV documents.

Supported Functionalities

  • Upload CSV documents and store content in memory / local file system (extendable).
  • Store metadata in a SQL DB table.
  • List all metadata with support for pagination and find metadata for a single document.
  • Delete uploaded documents.

Installation

Docker Compose Based Setup
  • Locate docker-compose.yaml in the repository root and run the docker-compose up command to start the setup. This builds a Docker image from the code and starts a container alongside a PostgreSQL container, with everything wired together automatically.
Manual Installation
  • Create a Python virtual environment and install the dependencies with pip install -r requirements.txt. The requirements.txt file is in the repository root.
  • Start a PostgreSQL server and create a DB. The server's username, password, host, and port, together with the DB name, are required to start the application.
  • Set an environment variable named DB_URL to the DB connection URL: export DB_URL='postgresql+psycopg://<username>:<password>@<host>:<port>/<dbname>' The application uses this value to connect to the DB on the DB server.
  • Start the application using the command uvicorn svc:app --host 0.0.0.0 --port 8080.
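
A password that contains characters such as '@' or ':' can break the connection URL unless it is percent-encoded. The following is a minimal sketch of assembling DB_URL in the format shown above; the function name build_db_url and the credentials are hypothetical and not part of the repository.

```python
import os
from urllib.parse import quote


def build_db_url(username: str, password: str, host: str, port: int, dbname: str) -> str:
    """Assemble a connection URL in the postgresql+psycopg format.

    The username and password are percent-encoded so that reserved
    characters such as '@' or ':' do not confuse URL parsing.
    """
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"postgresql+psycopg://{user}:{secret}@{host}:{port}/{dbname}"


# Placeholder credentials, for illustration only.
db_url = build_db_url("csv_user", "p@ss:word", "localhost", 5432, "csv_db")

# Setting the variable here makes it visible to this process and its children.
os.environ["DB_URL"] = db_url
print(db_url)
```

Setting the variable from Python only affects the current process and any processes it starts, so the export command above remains the simpler option when launching uvicorn from a shell.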
Testing the Functionality

The base URL for the application is http://localhost:8080/. The OpenAPI-based web UI can be accessed at http://localhost:8080/docs in a browser, and its 'Try it out' functionality can be used to test the app.

Testing via CURL Tool

The following curl commands can be used to invoke the application:

  • Upload a CSV document: curl -X 'POST' 'http://localhost:8080/files' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F 'file=@genes_human.csv'

This returns the unique ID assigned to the document, which can be used in the commands below.

  • List metadata of all uploaded documents. This endpoint supports pagination for efficient access to the metadata of a large number of documents: curl -X 'GET' 'http://localhost:8080/files?page=1&page_size=10' -H 'accept: application/json'

  • List metadata for a document identified by the given ID: curl -X 'GET' 'http://localhost:8080/files/ID/metadata' -H 'accept: application/json'

  • List contents for a document identified by the given ID: curl -X 'GET' 'http://localhost:8080/files/ID/data' -H 'accept: application/json'

  • Delete a document identified by the given ID: curl -X 'DELETE' 'http://localhost:8080/files/ID' -H 'accept: */*'

All the responses are returned in JSON format.
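
The read and delete calls above can also be made from Python. The sketch below uses only the standard library and mirrors the paths from the curl commands; the helper names (list_url, metadata_url, data_url, file_url, call) are hypothetical, and the multipart upload is left out because the standard library has no convenient helper for it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8080"  # base URL from the section above


def list_url(page: int = 1, page_size: int = 10) -> str:
    """URL for the paginated metadata listing."""
    return f"{BASE_URL}/files?{urlencode({'page': page, 'page_size': page_size})}"


def metadata_url(file_id: str) -> str:
    """URL for the metadata of one document."""
    return f"{BASE_URL}/files/{file_id}/metadata"


def data_url(file_id: str) -> str:
    """URL for the contents of one document."""
    return f"{BASE_URL}/files/{file_id}/data"


def file_url(file_id: str) -> str:
    """URL used to delete one document."""
    return f"{BASE_URL}/files/{file_id}"


def call(method: str, url: str):
    """Send a request and decode the JSON body (None if the body is empty)."""
    request = Request(url, method=method, headers={"accept": "application/json"})
    with urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None


# Building the URLs needs no running server; call() does.
print(list_url(page=1, page_size=10))
print(metadata_url("ID"))
```

With the service running, call("GET", list_url()) fetches the first page of metadata and call("DELETE", file_url("ID")) removes a document, where "ID" stands for an ID returned by the upload command.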

Special Features
  • The implementation avoids loading the entire document into memory during saving and retrieval wherever possible. The effect can be seen in the graphs below, generated with the mprof tool while running uvicorn. In both scenarios, 25 users sent 100 requests in total, and each request uploaded a 50 MB CSV.
Without loading the entire file to memory

mem_consumption_streaming.png

Loading the entire file to memory

mem_consumption_in_mem.png
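
The difference between the two graphs comes down to reading the upload in bounded pieces rather than all at once. The following is a minimal sketch of that general technique, not the repository's actual code; the function name save_in_chunks and the 1 MiB chunk size are assumptions.

```python
import io
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an assumed value, not taken from the repository


def save_in_chunks(source, destination_path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy a binary file-like object to disk chunk by chunk.

    Only one chunk is held in memory at a time, so peak memory use does
    not grow with the size of the upload. Returns the bytes written.
    """
    total = 0
    with open(destination_path, "wb") as destination:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # an empty read means the source is exhausted
                break
            destination.write(chunk)
            total += len(chunk)
    return total


# Usage with an in-memory stand-in for an uploaded file.
upload = io.BytesIO(b"gene,score\n" + b"BRCA1,0.9\n" * 1000)
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "upload.csv")
    written = save_in_chunks(upload, target, chunk_size=4096)
    size_on_disk = os.path.getsize(target)
print(written, size_on_disk)
```

The same loop works for any object with a read(size) method, which is why it suits both the in-memory and the local file system stores mentioned earlier.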

  • REST API endpoint parameters are validated using pydantic.
  • The number of rows in a document can be counted exactly or approximated. The approximation reads up to 10 MB of content from a large CSV and scales the row count found there in proportion to the total size of the document. This feature is disabled by default and can be enabled by setting the environment variable ESTIMATE_ROW_COUNT to True.
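
The row count approximation described above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's code: the name estimate_row_count is hypothetical, every line (including any header) counts as a row, and quoted fields containing newlines are not handled.

```python
import os
import tempfile

SAMPLE_BYTES = 10 * 1024 * 1024  # sample limit of 10 MB, as described above


def estimate_row_count(path: str, sample_bytes: int = SAMPLE_BYTES) -> int:
    """Estimate the number of lines in a file from a leading sample."""
    total_size = os.path.getsize(path)
    if total_size == 0:
        return 0
    with open(path, "rb") as handle:
        sample = handle.read(sample_bytes)
    lines_in_sample = sample.count(b"\n")
    if len(sample) == total_size:
        # The whole file fit in the sample, so the count is exact.
        # A final line without a trailing newline still counts as a row.
        if not sample.endswith(b"\n"):
            lines_in_sample += 1
        return lines_in_sample
    # Scale the sampled line count by the ratio of total size to sample size.
    return round(lines_in_sample * total_size / len(sample))


# Usage: 1000 rows of 10 bytes each, sampled with a deliberately small limit.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "rows.csv")
    with open(csv_path, "wb") as handle:
        handle.write(b"BRCA1,0.9\n" * 1000)
    estimated = estimate_row_count(csv_path, sample_bytes=1000)
    exact = estimate_row_count(csv_path)
print(estimated, exact)
```

The estimate is only as good as the assumption that rows in the sample are about as long as rows in the rest of the file, which is the trade-off for not reading the whole document.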
Future Improvements
  • Support resumable document uploads.
  • Improve the handling of partial inserts and partial deletes of documents between the content store and the metadata table.
  • Improve document content retrieval so that, for large documents, a direct download link is provided.

This app was tested with Python 3.11.5. The unit and integration tests can be run with the python -m pytest tests command.