syppp

A tiny tool to parse PDF files, identify blocks of content, and annotate PDF files visually.

Getting Started

Setup

In Development

Run ./install.sh. The script:

Installs OS dependencies.
Creates a new Poetry lock file to fetch the latest dependency versions.
Runs poetry install.

For quick testing

Run poetry install to install only the Python dependencies, as pinned in the lock file.

Start the Server

In the root of the project, run: poetry run python -m uvicorn api.main_raw:app --host 0.0.0.0 --port 8090 --reload

The first start takes a while because several integrations load transformer models. Startup is complete when this log message appears: INFO: Application startup complete.

Now open your browser at http://127.0.0.1:8090/docs#/ to use the API.
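
Because the first start can be slow, a script that calls the API may want to wait until the server answers before sending requests. The following is a minimal sketch using only the Python standard library. The function names, the timeout values, and the probe hook are illustrative assumptions, not part of syppp; the only detail taken from this README is that the docs page is served on port 8090.

```python
import time
import urllib.error
import urllib.request


def _http_ok(url):
    """Return True if a GET on `url` succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the server is probably still loading.
        return False


def wait_until_ready(url, timeout_s=300.0, interval_s=2.0, probe=None):
    """Poll `url` until the probe succeeds or `timeout_s` seconds elapse.

    `probe` is a hypothetical hook (an assumption for this sketch) that takes
    a URL and returns True when the server is up. It defaults to a plain GET.
    """
    if probe is None:
        probe = _http_ok
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_until_ready("http://127.0.0.1:8090/docs") returns True once the docs page responds and False if the timeout passes first. The default timeout of 300 seconds is a guess; adjust it to how long the models take to load on your machine.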

Use the app

/parse-file accepts a PDF file and an eps value, and parses the file's content.

eps stands for epsilon, a parameter of the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm. It defines the maximum distance, in pixels, between two components for them to count as neighbors.
Choosing the right eps is crucial: too small, and nothing clusters; too large, and unrelated elements get grouped.
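
To make the effect of eps concrete, here is a small, self-contained DBSCAN sketch in plain Python. It is not the code syppp uses. The choice to cluster component center points, the min_samples value of 2, and the sample coordinates are all assumptions made for illustration.

```python
import math


def region_query(points, index, eps):
    """Return indices of all points within `eps` of points[index], itself included."""
    px, py = points[index]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_samples=2):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster_id += 1
    return labels


# Hypothetical component centers in pixels: two visually separate groups.
centers = [(10, 10), (14, 10), (12, 15), (100, 100), (104, 102), (101, 106)]

for eps in (1, 10, 200):
    print(eps, dbscan(centers, eps))
```

With these sample points, eps=1 marks every point as noise, eps=10 recovers the two groups, and eps=200 merges everything into a single cluster. This is the trade-off described above.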

/annotate merges the original PDF file with the output of the /parse-file endpoint. It generates a new PDF file with bounding boxes and comments inside.
