ClariESG 🌱

ClariESG is an end-to-end system designed to extract, clean, structure, and semantically query information contained in corporate sustainability reports (PDF). It integrates LLM-based language understanding, table extraction, numerical reasoning, and sector-aware contextualization through an interactive Gradio interface. The system supports ESG analysis, report comparison, automated table extraction, and RAG-based question answering.

✨ Main Features

PDF Upload & Processing

ClariESG automatically:

extracts the company name
identifies, cleans, and standardizes GRI-related tables
generates metadata
stores structured tables in table_dataset/
inserts dense and sparse embeddings into PostgreSQL

Table Dataset Management

For each processed report, the system generates a dedicated folder inside table_dataset/ containing:

cleaned CSV tables
extracted GRI indicators
metadata files
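
To get a quick overview of what has been generated so far, the dataset folder can be inspected from a shell. The sketch below assumes a POSIX shell and that the cleaned tables carry a .csv extension; the function name summarize_table_dataset is hypothetical and not part of ClariESG.

```shell
# Hypothetical helper: print each report folder with its number of CSV files.
summarize_table_dataset() {
  root="${1:-table_dataset}"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # skip when the glob matches nothing
    # Count CSV files at any depth below the report folder.
    count=$(find "$dir" -type f -name '*.csv' | wc -l | tr -d ' ')
    echo "$(basename "$dir"): $count CSV file(s)"
  done
}

# Run from the directory that contains table_dataset/.
summarize_table_dataset table_dataset
```

An empty output simply means that no report has been processed yet.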

RAG-based Chatbot

The chatbot allows users to:

query uploaded reports
query companies
query industrial sectors
perform numerical reasoning using Program-of-Thought
retrieve relevant tables and text segments

Retrieval uses a hybrid dense + sparse strategy powered by OpenAI, LangChain, and pgvector.

Fully Dockerized

A single Docker container includes the Python backend, the Gradio interface, PostgreSQL with pgvector, and the full processing pipeline.

📦 Installation & Setup (Docker Only)

Docker is the only setup method you need; cloning the Git repository is not necessary for normal usage. Before running ClariESG, install Docker Desktop on your machine. 🐋

Download it from:
https://www.docker.com/products/docker-desktop/

Docker Desktop is required in order to pull, run, and manage the ClariESG container. Once installed, make sure it is running before executing any Docker commands.

1. Create a Local Folder

Create a directory on your Desktop, for example clariesg/. Inside it, prepare the following structure:

clariesg/
│
├── .env
├── reports/
└── table_dataset/

reports/

Copy the entire reports/ folder from the GitHub repository into your local directory: on the repository page, click the Code button, choose Download ZIP, extract the archive, and keep only the reports/ folder. It contains the example sustainability reports used by the demo. You may also add your own PDF reports to this folder.

table_dataset/

Create this empty folder.
ClariESG will automatically populate it as you process reports.
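
The two folders can also be created from a shell. This is a minimal sketch, assuming a POSIX shell opened in the directory where clariesg/ should live (your Desktop, in the example above); the .env file is added in the next step.

```shell
# Create the project folder with its input and output subfolders.
mkdir -p clariesg/reports clariesg/table_dataset

# Show the result; reports/ stays empty until the example PDFs are copied in.
ls -A clariesg
```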

2. Create the `.env` File

Create a .env file inside your project folder with the following content. Make sure the file is named exactly .env, with no extension such as .txt.

POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=griqa
POSTGRES_PORT=5432
POSTGRES_EMB_TABLE_NAME=langchain_pg_embedding
POSTGRES_SPARSE_TABLE_NAME=sparse_table

DATABASE_URL=postgresql://postgres:postgres@127.0.0.1:5432/griqa

PYTHONHASHSEED=0

OPENAI_API_KEY=YOUR_OPENAI_KEY_HERE
OPENAI_MODEL_NAME=gpt-4o-mini
OPENAI_TEMPERATURE=0.2

Replace YOUR_OPENAI_KEY_HERE with your actual OpenAI API key. You can create one here: https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key
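
A missing variable or a forgotten placeholder is easy to overlook, so it can help to check the file before starting the container. The following is a sketch only, assuming a POSIX shell with grep; check_env_file is a hypothetical helper and not part of ClariESG.

```shell
# Hypothetical helper: verify that an env file defines the keys listed above.
check_env_file() {
  env_file="${1:-.env}"
  [ -f "$env_file" ] || { echo "missing file: $env_file" >&2; return 1; }
  rc=0
  for key in POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB POSTGRES_PORT \
             POSTGRES_EMB_TABLE_NAME POSTGRES_SPARSE_TABLE_NAME \
             DATABASE_URL PYTHONHASHSEED \
             OPENAI_API_KEY OPENAI_MODEL_NAME OPENAI_TEMPERATURE; do
    # A key counts as defined when a line starts with KEY= followed by a value.
    if ! grep -q "^${key}=." "$env_file"; then
      echo "missing or empty: $key" >&2
      rc=1
    fi
  done
  # The placeholder must have been replaced with a real key.
  if grep -q '^OPENAI_API_KEY=YOUR_OPENAI_KEY_HERE' "$env_file"; then
    echo "OPENAI_API_KEY still holds the placeholder" >&2
    rc=1
  fi
  return $rc
}
```

After defining the function, running check_env_file .env from inside clariesg/ prints nothing and returns success when every key is present.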

3. Pull the Docker Image

⚠️ System Requirement: AMD64 Architecture

Open a shell and run the following command:

docker pull --platform linux/amd64 martasantacroce/clariesg:latest
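
Whether the host matches the required architecture can be checked before pulling. This is a small sketch, assuming a Unix-like shell where uname is available; the messages are illustrative only.

```shell
# Report the host CPU architecture as seen by the operating system.
arch="$(uname -m)"

case "$arch" in
  x86_64|amd64)
    echo "host is AMD64: the image matches the host architecture" ;;
  *)
    # On other hosts Docker would have to emulate linux/amd64,
    # which may be slow or unsupported depending on the setup.
    echo "host is $arch: the linux/amd64 image would need emulation" ;;
esac
```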

4. Run the Docker Container

Open a shell inside your clariesg/ folder and run:

docker run --platform linux/amd64 \
  --name clariesg_container \
  --env-file .env \
  -v ./reports:/app/reports \
  -v ./table_dataset:/app/table_dataset \
  -p 7860:7860 \
  -p 5432:5432 \
  -p 8080:8080 \
  martasantacroce/clariesg:latest

This command:

loads your .env
mounts reports/ as input
mounts table_dataset/ as output
publishes the Gradio (7860) and PostgreSQL (5432) ports, plus port 8080
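
Because PostgreSQL is published on port 5432, the database can be inspected from the host once the container is running. The sketch below assumes that the psql client is installed on the host and reuses the credentials from the .env file above; list_clariesg_tables is a hypothetical helper name, and whether the embedding tables already exist at that point depends on the container's startup scripts.

```shell
# Hypothetical helper: list the tables in the ClariESG database.
list_clariesg_tables() {
  # Default to the DATABASE_URL value from the .env file in step 2.
  db_url="${1:-postgresql://postgres:postgres@127.0.0.1:5432/griqa}"
  # \dt is the psql meta-command that lists tables.
  psql "$db_url" -c '\dt'
}
```

Calling list_clariesg_tables with no argument connects with the default URL; if the tables named in POSTGRES_EMB_TABLE_NAME and POSTGRES_SPARSE_TABLE_NAME have been created, they should appear in the listing.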

To stop the container:

docker stop clariesg_container

Once the container has been created by the first docker run, you can start it again at any time with:

docker start clariesg_container
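
Note that docker run creates a new container each time, so repeating the command from step 4 fails with a name conflict once clariesg_container exists. The sketch below shows one way to pick the right command, assuming the container name used above; container_exists is a hypothetical helper.

```shell
# Return success only if a container with the given name already exists.
container_exists() {
  docker container inspect "$1" >/dev/null 2>&1
}

if container_exists clariesg_container; then
  echo "container exists: use 'docker start clariesg_container'"
else
  echo "no container yet: use the 'docker run' command from step 4"
fi
```

To recreate the container from scratch, for example after pulling a newer image, stop it and remove it with docker rm clariesg_container before running docker run again.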

5. Access the Web Interface

Open: http://localhost:7860
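
If the page does not load, a quick check from the shell shows whether the interface is answering. This is a minimal sketch, assuming curl is installed on the host; ui_is_up is a hypothetical helper name.

```shell
# Return success if the given URL answers with a non-error HTTP status.
ui_is_up() {
  url="${1:-http://localhost:7860}"
  # -f makes curl fail on HTTP errors; --max-time bounds the wait in seconds.
  curl -sSf --max-time 5 -o /dev/null "$url"
}

if ui_is_up; then
  echo "Gradio interface is reachable"
else
  echo "not reachable yet: check 'docker logs clariesg_container'"
fi
```

The container may need some time after startup before the interface responds, so a failed first attempt is not necessarily an error.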

▶️ Usage Summary

Create a folder (clariesg/).
Copy the entire reports/ folder from GitHub.
Add your own PDF reports if desired.
Create an empty table_dataset/ folder.
Add the .env file.
Pull the Docker image.
Run the container.
Use the Gradio interface to upload, process, and query ESG data.

📄 License

This project is released under the MIT License.
