databasegen

A professional web application that extracts text from PDF files and images, converts them to structured CSV format, and provides SQLite database export capabilities using AI-powered processing.

Features

Multi-Format Support - Process PDF files and images (PNG, JPG, JPEG, WebP)
AI-Powered Extraction - Uses Gemini 2.5 Flash for intelligent data structure recognition
Database Export - Convert extracted data to SQLite database format
Live Preview - View extracted data in formatted tables
Data Analysis Integration - Direct links to DataChat and QueryBot for data exploration
One-Click Downloads - Export CSV and SQLite database files
Modern Interface - Responsive Bootstrap-based design
Client-Side Processing - No server required, runs entirely in the browser
Real-Time Feedback - Visual progress indicators during processing

Technology Stack

Frontend: HTML5, JavaScript (ES6+), Bootstrap 5
PDF Processing: Pyodide + PyPDF2
Image Processing: AI vision for text extraction from images
Database: SQLite generation via pandas
AI Integration: Google Gemini 2.5 Flash API
Runtime: Browser-based Python execution via Pyodide

Quick Start

Setup

git clone https://github.com/prudhvi1709/databasegen.git
cd databasegen

Run Application

# Open index.html in a modern web browser
open index.html
# OR serve locally
python -m http.server 8000

Usage
- Upload PDF files or images
- Click "Process Files"
- Preview extracted data
- Download CSV or convert to SQLite database
- Use DataChat or QueryBot for data analysis

Project Structure

project/
├── index.html     # Main application interface
├── script.js      # Core application logic and utilities
├── README.md      # Documentation
├── LICENSE        # MIT License

How It Works

File Upload: Support for PDFs and image files
Text Extraction:
- PDFs: PyPDF2 extracts text content
- Images: AI vision analyzes and extracts tabular data
Data Processing: Gemini AI structures raw text into CSV format
Database Conversion: Pandas converts CSV to SQLite database
Export Options: Download CSV files or SQLite databases
Analysis Integration: Connect with external tools for data exploration

Data Analysis Options

After extracting your data, you can analyze it using:

For Small to Medium Files

DataChat: Upload CSV/DB files to datachat.straivedemo.com
Interactive web-based data analysis platform

For Large Files

QueryBot: Local data analysis tool
Installation: pip install querybot
Documentation: pypi.org/project/querybot

Configuration

The application uses the Gemini API for intelligent data extraction. The endpoint is configured in app.js:

const LLM_BASE_URL = "https://llmfoundry.straive.com/gemini/v1beta/models/gemini-2.5-flash:generateContent";

AI Processing Strategy

The system employs sophisticated prompts for optimal data extraction:

Identifies structured data patterns (tables, lists, key-value pairs)
Creates meaningful column headers
Handles missing values appropriately
Preserves data types and numerical accuracy
Combines multiple data sources intelligently

Browser Compatibility

Chrome 90+
Firefox 88+
Safari 14+
Edge 90+

Privacy and Security

Local Processing: File content processed in browser
Secure Communication: HTTPS for all API calls
No File Storage: Files not retained on servers
Minimal Data Transfer: Only processed text sent to AI service

Development

# Optional: Install development tools
npm install -g live-server prettier

# Start development server
python -m http.server

# Format code
prettier --write "**/*.js"

Troubleshooting

Pyodide Issues:

Ensure stable internet connection
Refresh page if loading fails
Check browser console for errors

Processing Errors:

Verify PDF is not password-protected
Ensure images contain clear, readable text/tables
Try smaller files for testing

Database Conversion:

Requires successful Pyodide initialization
Check browser support for WebAssembly

Contributing

Fork the repository
Create a feature branch
Implement changes with tests
Submit a pull request

License

This project is licensed under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

databasegen

Features

Technology Stack

Quick Start

Project Structure

How It Works

Data Analysis Options

For Small to Medium Files

For Large Files

Configuration

AI Processing Strategy

Browser Compatibility

Privacy and Security

Development

Troubleshooting

Contributing

License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
index.html		index.html
script.js		script.js

License

prudhvi1709/databasegen

Folders and files

Latest commit

History

Repository files navigation

databasegen

Features

Technology Stack

Quick Start

Project Structure

How It Works

Data Analysis Options

For Small to Medium Files

For Large Files

Configuration

AI Processing Strategy

Browser Compatibility

Privacy and Security

Development

Troubleshooting

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages