GitHub - isurulucky/csv-file-storage-service: CSV File Upload Service

CSV Persistence App using FastAPI

This is a simple Python app that exposes REST API endpoints to store, retrieve and delete CSV documents.

Supported Functionalities

Upload CSV documents and store content in memory / local file system (extendable).
Store metadata in a SQL DB table.
List all metadata with support for pagination and find metadata for a single document.
Delete uploaded documents.

Installation

Docker Compose Based Setup

Locate the docker-compose.yaml in the repository root and run the docker-compose up command to start the setup. This builds a Docker image from the code and starts a container alongside a PostgreSQL container, with everything wired together automatically.

Manual installation

Create a Python virtual environment and install the dependencies with pip install -r requirements.txt. The requirements.txt file is in the root of the repository.
Start a PostgreSQL server and create a DB. The username, password, host and port of the server, along with the DB name, are required to start the application.
Set an environment variable named DB_URL to the DB connection URL: export DB_URL='postgresql+psycopg://<username>:<password>@<host>:<port>/<dbname>'. The application uses this URL to connect to the DB on the DB server.
Start the application using the command uvicorn svc:app --host 0.0.0.0 --port 8080.
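
One detail worth checking when setting DB_URL: if the username or password contains reserved characters such as @, : or /, they generally need to be percent-encoded before being placed in the URL. A minimal sketch using only the standard library is shown here; build_db_url is a hypothetical helper for illustration and is not part of the repository, and the credentials are example values.

```python
from urllib.parse import quote


def build_db_url(username: str, password: str, host: str, port: int, dbname: str) -> str:
    """Assemble a postgresql+psycopg connection URL (hypothetical helper)."""
    # Percent-encode the credentials so reserved characters do not break URL parsing.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"postgresql+psycopg://{user}:{pwd}@{host}:{port}/{dbname}"


if __name__ == "__main__":
    # Example values only; substitute the details of your own server.
    print(build_db_url("csvuser", "p@ss:word", "localhost", 5432, "csvdb"))
```

The printed value can then be exported as DB_URL before starting uvicorn.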

Testing the Functionality

The base URL for the application is at http://localhost:8080/. The OpenAPI based web UI can be accessed at http://localhost:8080/docs via a browser. It is possible to test the app with the 'Try it out' functionality of the OpenAPI based web UI.

Testing via CURL Tool

The following curl commands can be used to invoke the application:

Upload a CSV document: curl -X 'POST' 'http://localhost:8080/files' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F 'file=@genes_human.csv'

This returns the unique ID assigned to the document, which can be used in the commands below.

List metadata of all uploaded documents. This endpoint supports pagination for efficiently accessing the metadata of a large number of documents: curl -X 'GET' 'http://localhost:8080/files?page=1&page_size=10' -H 'accept: application/json'
List metadata for a document identified by the given ID: curl -X 'GET' 'http://localhost:8080/files/ID/metadata' -H 'accept: application/json'
List contents for a document identified by the given ID: curl -X 'GET' 'http://localhost:8080/files/ID/data' -H 'accept: application/json'
Delete a document identified by the given ID: curl -X 'DELETE' 'http://localhost:8080/files/ID' -H 'accept: */*'

All the responses are returned in JSON format.
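
The read-only curl commands above can also be issued from Python. The following is a rough sketch using only the standard library; the helper names (list_files_url, metadata_url, get_json) are hypothetical, and the shape of the JSON responses is not assumed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8080"  # base URL stated in the section above


def list_files_url(page: int = 1, page_size: int = 10, base_url: str = BASE_URL) -> str:
    """Build the URL for the paginated metadata listing."""
    query = urlencode({"page": page, "page_size": page_size})
    return f"{base_url}/files?{query}"


def metadata_url(file_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for the metadata of a single document."""
    return f"{base_url}/files/{file_id}/metadata"


def get_json(url: str):
    """Send a GET request and decode the JSON body (requires the app to be running)."""
    request = Request(url, headers={"accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)
```

With the application running, get_json(list_files_url(page=1, page_size=10)) would fetch the first page of metadata.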

Special Features

The implementation minimizes loading the entire document into memory during saving and retrieval. This can be observed in the graphs below, generated with the mprof tool while running the app under uvicorn. In both scenarios, 25 users sent 100 requests in total, and each request uploaded a 50 MB CSV.

Without loading the entire file to memory (graph: mem_consumption_streaming.png)

Loading the entire file to memory (graph: mem_consumption_in_mem.png)
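
The general idea behind avoiding a full in-memory load can be sketched as a chunked copy from the upload stream to the destination. This is an illustration of the technique only, not the repository's actual code; save_stream and the 1 MiB chunk size are assumptions.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an assumed value, not taken from the repository


def save_stream(src: BinaryIO, dest: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy src to dest in fixed-size chunks and return the number of bytes written."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # An empty read signals the end of the stream.
            break
        dest.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # In-memory streams stand in for an uploaded file and a file on disk.
    source = io.BytesIO(b"id,name\n1,TP53\n2,BRCA1\n")
    target = io.BytesIO()
    save_stream(source, target, chunk_size=8)
    print(target.getvalue())
```

With this approach the copy itself holds at most one chunk at a time, regardless of the size of the document.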

REST API endpoint parameters are validated using pydantic.
The number of rows in a document can be counted exactly or approximated. The approximation reads up to 10 MB of content from large CSVs and scales the row count in that sample proportionally to the total size of the document. This feature is disabled by default and can be enabled by setting the environment variable ESTIMATE_ROW_COUNT to True.
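
The proportional row estimate described above could look roughly like the sketch below. It is an assumption-laden illustration rather than the repository's implementation: estimate_row_count is a hypothetical name, every line is counted (including any header line), and newlines inside quoted fields are not handled.

```python
import os

SAMPLE_BYTES = 10 * 1024 * 1024  # sample size described above (10 MB)


def estimate_row_count(path: str, sample_bytes: int = SAMPLE_BYTES) -> int:
    """Estimate the line count from a leading sample, scaled by the total file size."""
    total_size = os.path.getsize(path)
    if total_size == 0:
        return 0
    with open(path, "rb") as f:
        sample = f.read(sample_bytes)
    newlines = sample.count(b"\n")
    if len(sample) >= total_size:
        # The whole file fit in the sample, so the count is exact.
        # Add one for a final line that lacks a trailing newline.
        return newlines + (0 if sample.endswith(b"\n") else 1)
    # Scale the sampled line count by the ratio of total size to sample size.
    return round(newlines * total_size / len(sample))
```

The estimate is only as good as the assumption that rows in the sample are representative in length of the rest of the file.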

Future Improvements

Support resumable document uploads.
Improve the handling of partial inserts and partial deletes of documents between the content store and the metadata table.
Improve the document content retrieval such that for large documents, a link is provided to download directly.

This app was tested with Python version 3.11.5. The unit and integration tests can be run with the python -m pytest tests command.
