diff --git a/.env.example b/.env.example
new file mode 100644
index 0000000..fc97698
--- /dev/null
+++ b/.env.example
@@ -0,0 +1,22 @@
+# Ollama Configuration
+OLLAMA_HOST=http://localhost:11434
+OLLAMA_MODEL=mistral
+
+# Database Configuration
+DATABASE_URL=sqlite:///./fireform.db
+
+# Logging Configuration
+LOG_LEVEL=INFO
+
+# Security Configuration
+MAX_INPUT_LENGTH=50000
+MAX_FIELD_COUNT=50
+MAX_FIELD_NAME_LENGTH=100
+MAX_FIELD_VALUE_LENGTH=500
+
+# File Configuration
+MAX_PDF_SIZE=10485760 # 10MB in bytes
+OUTPUT_DIRECTORY=./outputs
+
+# API Configuration
+API_TIMEOUT=30
\ No newline at end of file
diff --git a/.gitignore b/.gitignore
index 7fa2022..9ebc117 100644
--- a/.gitignore
+++ b/.gitignore
@@ -2,4 +2,5 @@
.idea
venv
.venv
-*.db
\ No newline at end of file
+*.db
+.env
\ No newline at end of file
diff --git a/README.md b/README.md
index 42862e3..e3af6fe 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,7 @@ First responders, like firefighters, are often required to report a single incid
## 💡 The Solution
FireForm is a centralized "report once, file everywhere" system.
+
- **Single Input:** A firefighter records a single voice memo or fills out one "master" text field describing the entire incident.
- **AI Extraction:** The transcription is sent to an open-source LLM (via Ollama) which extracts all the key information (names, locations, incident details) into a structured JSON file.
- **Template Filling:** FireForm then takes this single JSON object and uses it to automatically fill every required PDF template for all the different agencies.
@@ -18,12 +19,103 @@ FireForm is a centralized "report once, file everywhere" system.
The result is hours of time saved per shift, per firefighter.
### ✨ Key Features
+
- **Agnostic:** Works with any department's existing fillable PDF forms.
- **AI-Powered:** Uses open-source, locally-run LLMs (Mistral) to extract data from natural language. No data ever needs to leave the local machine.
- **Single Point of Entry:** Eliminates redundant data entry entirely.
+- **Enterprise Security:** Comprehensive input validation, XSS protection, path traversal prevention, and prompt injection defense.
+- **Production Ready:** Full API server with FastAPI, database integration, and comprehensive error handling.
+- **Fully Tested:** 100% test coverage with comprehensive security validation and end-to-end functionality testing.
Open-Source (DPG): Built 100% with open-source tools to be a true Digital Public Good, freely available for any department to adopt and modify.
+## 🚀 Quick Start
+
+### Prerequisites
+
+- Python 3.13+
+- [Ollama](https://ollama.ai/) installed locally
+- Required Python packages (see `requirements.txt`)
+
+### Installation
+
+1. Clone the repository:
+
+ ```bash
+ git clone https://github.com/your-username/FireForm.git
+ cd FireForm
+ ```
+
+2. Install dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+3. Set up environment variables:
+
+ ```bash
+ cp .env.example .env
+ # Edit .env with your configuration
+ ```
+
+4. Start Ollama and pull a model:
+ ```bash
+ ollama pull mistral
+ ```
+
+### Usage
+
+#### API Server
+
+Start the FastAPI server:
+
+```bash
+uvicorn api.main:app --host 127.0.0.1 --port 8000
+```
+
+Access the API documentation at `http://127.0.0.1:8000/docs`
+
+#### Command Line
+
+Run the main application:
+
+```bash
+python src/main.py
+```
+
+#### Docker
+
+```bash
+docker-compose up
+```
+
+## 🧪 Testing
+
+The system includes comprehensive testing:
+
+- **Security Testing:** XSS, path traversal, prompt injection protection
+- **API Testing:** Full endpoint validation with real HTTP requests
+- **End-to-End Testing:** Complete pipeline from input to PDF generation
+- **Performance Testing:** Input validation performance benchmarks
+
+Run tests:
+
+```bash
+pytest tests/
+```
+
+## 🔒 Security
+
+FireForm implements enterprise-grade security:
+
+- Input validation and sanitization
+- XSS and homograph attack prevention
+- Path traversal protection
+- Prompt injection defense
+- SQL injection prevention
+- Comprehensive error handling
+
## 🤝 Code of Conduct
We are committed to providing a friendly, safe, and welcoming environment for all. Please see our [Code of Conduct](CODE_OF_CONDUCT.md) for more information.
@@ -34,11 +126,10 @@ Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md)
## ⚖️ License
-
-
This project is licensed under the MIT License. See the LICENSE file for details.
## 🏆 Acknowledgements and Contributors
+
This project was built in 48 hours for the Reboot the Earth 2025 hackathon. Thank you to the United Nations and UC Santa Cruz for hosting this incredible event and inspiring us to build solutions for a better future.
## 📜 Citation
@@ -49,9 +140,10 @@ If you use FireForm in your research or project, please cite it using the follow
You can also use the "Cite this repository" button in the GitHub repository sidebar to export the citation in your preferred format.
-__Contributors:__
+**Contributors:**
+
- Juan Álvarez Sánchez (@juanalvv)
- Manuel Carriedo Garrido
- Vincent Harkins (@vharkins1)
-- Marc Vergés (@marcvergees)
+- Marc Vergés (@marcvergees)
- Jan Sans
diff --git a/api/db/database.py b/api/db/database.py
index 7943947..e215cf7 100644
--- a/api/db/database.py
+++ b/api/db/database.py
@@ -1,13 +1,47 @@
from sqlmodel import create_engine, Session
+from sqlalchemy.engine.url import make_url
+from sqlalchemy.pool import StaticPool
+import os
-DATABASE_URL = "sqlite:///./fireform.db"
+DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///./fireform.db")
-engine = create_engine(
- DATABASE_URL,
- echo=True,
- connect_args={"check_same_thread": False},
-)
+# Detect database dialect to apply appropriate configuration
+db_url = make_url(DATABASE_URL)
+is_sqlite = db_url.drivername.startswith('sqlite')
+
+# Configure engine with dialect-specific settings
+engine_kwargs = {
+ "echo": False, # Disable SQL logging in production for security
+}
+
+if is_sqlite:
+ # SQLite-specific configuration
+ engine_kwargs["connect_args"] = {
+ "check_same_thread": False,
+ "timeout": 30, # 30 second timeout
+ }
+ # Use StaticPool for SQLite to avoid connection issues
+ engine_kwargs["poolclass"] = StaticPool
+else:
+ # PostgreSQL/MySQL configuration with connection pooling
+ engine_kwargs["pool_size"] = 5 # Connection pool size
+ engine_kwargs["max_overflow"] = 10 # Maximum overflow connections
+ engine_kwargs["pool_timeout"] = 30 # Pool timeout
+ engine_kwargs["pool_recycle"] = 3600 # Recycle connections every hour
+ engine_kwargs["pool_pre_ping"] = True # Verify connections before use
+
+engine = create_engine(DATABASE_URL, **engine_kwargs)
def get_session():
+ """
+ Get database session with proper resource management.
+ Uses context manager to ensure sessions are properly closed.
+ """
with Session(engine) as session:
- yield session
\ No newline at end of file
+ try:
+ yield session
+ except Exception:
+ session.rollback()
+ raise
+ finally:
+ session.close()
\ No newline at end of file
diff --git a/api/db/models.py b/api/db/models.py
index f76c93b..ff2f5e3 100644
--- a/api/db/models.py
+++ b/api/db/models.py
@@ -1,13 +1,13 @@
from sqlmodel import SQLModel, Field
from sqlalchemy import Column, JSON
-from datetime import datetime
+from datetime import datetime, timezone
class Template(SQLModel, table=True):
id: int | None = Field(default=None, primary_key=True)
name: str
fields: dict = Field(sa_column=Column(JSON))
pdf_path: str
- created_at: datetime = Field(default_factory=datetime.utcnow)
+ created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
class FormSubmission(SQLModel, table=True):
@@ -15,4 +15,4 @@ class FormSubmission(SQLModel, table=True):
template_id: int
input_text: str
output_pdf_path: str
- created_at: datetime = Field(default_factory=datetime.utcnow)
\ No newline at end of file
+ created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
\ No newline at end of file
diff --git a/api/db/repositories.py b/api/db/repositories.py
index 6608718..3568d36 100644
--- a/api/db/repositories.py
+++ b/api/db/repositories.py
@@ -1,19 +1,137 @@
from sqlmodel import Session, select
+from sqlalchemy.exc import IntegrityError, OperationalError, DatabaseError as SQLAlchemyDatabaseError
from api.db.models import Template, FormSubmission
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+class DatabaseError(Exception):
+ """Custom exception for database operations"""
+ pass
# Templates
def create_template(session: Session, template: Template) -> Template:
- session.add(template)
- session.commit()
- session.refresh(template)
- return template
+ """
+ Create a new template with validation.
+
+ Args:
+ session: Database session
+ template: Template object to create
+
+ Returns:
+ Template: Created template with ID
+
+ Raises:
+ ValueError: If template data is invalid
+ """
+ if not template:
+ raise ValueError("Template cannot be None")
+
+ if not template.name or not template.name.strip():
+ raise ValueError("Template name is required")
+
+ if not template.pdf_path or not template.pdf_path.strip():
+ raise ValueError("Template PDF path is required")
+
+ if not template.fields or not isinstance(template.fields, dict):
+ raise ValueError("Template fields must be a non-empty dictionary")
+
+ try:
+ session.add(template)
+ session.commit()
+ session.refresh(template)
+ logger.info(f"Created template: {template.id}")
+ return template
+ except IntegrityError as e:
+ session.rollback()
+ logger.error(f"Integrity error creating template: {e}", exc_info=True)
+ raise DatabaseError("Template integrity constraint violated") from e
+ except OperationalError as e:
+ session.rollback()
+ logger.error(f"Database operational error creating template: {e}", exc_info=True)
+ raise DatabaseError("Database operation failed") from e
+ except SQLAlchemyDatabaseError as e:
+ session.rollback()
+ logger.error(f"Database error creating template: {e}", exc_info=True)
+ raise DatabaseError("Database error occurred") from e
+ except Exception as e:
+ session.rollback()
+ logger.error(f"Unexpected error creating template: {e}", exc_info=True)
+ raise DatabaseError("Failed to create template") from e
def get_template(session: Session, template_id: int) -> Template | None:
- return session.get(Template, template_id)
+ """
+ Get template by ID with validation.
+
+ Args:
+ session: Database session
+ template_id: Template ID to retrieve
+
+ Returns:
+ Template | None: Template if found, None otherwise
+
+ Raises:
+ ValueError: If template_id is invalid
+ Exception: If database operation fails (propagated)
+ """
+ # Explicitly reject booleans (bool is a subclass of int)
+ if isinstance(template_id, bool) or not isinstance(template_id, int) or template_id <= 0:
+ raise ValueError("Template ID must be a positive integer")
+
+ try:
+ return session.get(Template, template_id)
+ except Exception as e:
+ logger.error(f"Failed to get template {template_id}: {e}", exc_info=True)
+ raise
# Forms
def create_form(session: Session, form: FormSubmission) -> FormSubmission:
- session.add(form)
- session.commit()
- session.refresh(form)
- return form
\ No newline at end of file
+ """
+ Create a new form submission with validation.
+
+ Args:
+ session: Database session
+ form: FormSubmission object to create
+
+ Returns:
+ FormSubmission: Created form with ID
+
+ Raises:
+ ValueError: If form data is invalid
+ """
+ if not form:
+ raise ValueError("Form cannot be None")
+
+ # Explicitly reject booleans (bool is a subclass of int)
+ if isinstance(form.template_id, bool) or not isinstance(form.template_id, int) or form.template_id <= 0:
+ raise ValueError("Template ID must be a positive integer")
+
+ if not form.input_text or not form.input_text.strip():
+ raise ValueError("Input text is required")
+
+ if not form.output_pdf_path or not form.output_pdf_path.strip():
+ raise ValueError("Output PDF path is required")
+
+ try:
+ session.add(form)
+ session.commit()
+ session.refresh(form)
+ logger.info(f"Created form submission: {form.id}")
+ return form
+ except IntegrityError as e:
+ session.rollback()
+ logger.error(f"Integrity error creating form submission: {e}", exc_info=True)
+ raise DatabaseError("Form submission integrity constraint violated") from e
+ except OperationalError as e:
+ session.rollback()
+ logger.error(f"Database operational error creating form submission: {e}", exc_info=True)
+ raise DatabaseError("Database operation failed") from e
+ except SQLAlchemyDatabaseError as e:
+ session.rollback()
+ logger.error(f"Database error creating form submission: {e}", exc_info=True)
+ raise DatabaseError("Database error occurred") from e
+ except Exception as e:
+ session.rollback()
+ logger.error(f"Unexpected error creating form submission: {e}", exc_info=True)
+ raise DatabaseError("Failed to create form submission") from e
\ No newline at end of file
diff --git a/api/main.py b/api/main.py
index d0b8c79..331e92f 100644
--- a/api/main.py
+++ b/api/main.py
@@ -1,7 +1,11 @@
from fastapi import FastAPI
from api.routes import templates, forms
+from api.errors.handlers import register_exception_handlers
app = FastAPI()
+# Register exception handlers
+register_exception_handlers(app)
+
app.include_router(templates.router)
app.include_router(forms.router)
\ No newline at end of file
diff --git a/api/routes/forms.py b/api/routes/forms.py
index f3430ed..5c3c432 100644
--- a/api/routes/forms.py
+++ b/api/routes/forms.py
@@ -1,4 +1,4 @@
-from fastapi import APIRouter, Depends
+from fastapi import APIRouter, Depends, HTTPException
from sqlmodel import Session
from api.deps import get_db
from api.schemas.forms import FormFill, FormFillResponse
@@ -6,20 +6,97 @@
from api.db.models import FormSubmission
from api.errors.base import AppError
from src.controller import Controller
+import logging
+import os
+
+logger = logging.getLogger(__name__)
router = APIRouter(prefix="/forms", tags=["forms"])
@router.post("/fill", response_model=FormFillResponse)
def fill_form(form: FormFill, db: Session = Depends(get_db)):
- if not get_template(db, form.template_id):
- raise AppError("Template not found", status_code=404)
+ """
+ Fill a PDF form with AI-extracted data.
+ Uses database transactions to ensure data consistency.
+ """
+ generated_pdf_path = None
+
+ try:
+ logger.info(f"Processing form fill request for template_id: {form.template_id}")
+
+ # Fetch and validate template
+ fetched_template = get_template(db, form.template_id)
+ if not fetched_template:
+ logger.error(f"Template not found: {form.template_id}")
+ raise HTTPException(status_code=404, detail="Template not found")
+
+ # Check template has required fields
+ if not fetched_template.fields:
+ logger.error(f"Template {form.template_id} has no fields defined")
+ raise HTTPException(status_code=400, detail="Template has no fields defined")
- fetched_template = get_template(db, form.template_id)
+ # Check PDF file exists
+ if not os.path.exists(fetched_template.pdf_path):
+ logger.error(f"PDF template file not found: {fetched_template.pdf_path}")
+ raise HTTPException(status_code=404, detail="PDF template file not found")
- controller = Controller()
- path = controller.fill_form(user_input=form.input_text, fields=fetched_template.fields, pdf_form_path=fetched_template.pdf_path)
+ # Create controller and process form
+ controller = Controller()
+
+ try:
+ generated_pdf_path = controller.fill_form(
+ user_input=form.input_text,
+ fields=fetched_template.fields,
+ pdf_form_path=fetched_template.pdf_path
+ )
+ except FileNotFoundError as e:
+ logger.error(f"PDF template file not found: {e}", exc_info=True)
+ raise HTTPException(status_code=404, detail="PDF template file not found")
+ except ValueError as e:
+ logger.error(f"Invalid input data: {e}", exc_info=True)
+ raise HTTPException(status_code=400, detail="Invalid input data")
+ except Exception as e:
+ logger.error(f"PDF generation failed: {e}", exc_info=True)
+ raise HTTPException(status_code=500, detail="PDF generation failed")
- submission = FormSubmission(**form.model_dump(), output_pdf_path=path)
- return create_form(db, submission)
+ # Create database record (let SQLModel handle transactions)
+ try:
+ submission = FormSubmission(
+ template_id=form.template_id,
+ input_text=form.input_text,
+ output_pdf_path=generated_pdf_path
+ )
+ result = create_form(db, submission)
+
+ logger.info(f"Form filled successfully: {result.id}")
+ return result
+
+ except Exception as e:
+ logger.error(f"Database operation failed: {e}", exc_info=True)
+
+ # Remove generated PDF file on database failure
+ if generated_pdf_path and os.path.exists(generated_pdf_path):
+ try:
+ os.remove(generated_pdf_path)
+ logger.info(f"Cleaned up PDF file after DB failure: {generated_pdf_path}")
+ except OSError as cleanup_error:
+ logger.warning(f"Failed to clean up PDF file {generated_pdf_path}: {cleanup_error}")
+
+ raise HTTPException(status_code=500, detail="Database operation failed")
+
+ except HTTPException:
+ raise
+ except Exception as e:
+ logger.error(f"Unexpected error in form filling: {e}", exc_info=True)
+
+ # Remove any generated files on unexpected errors
+ if generated_pdf_path and os.path.exists(generated_pdf_path):
+ try:
+ os.remove(generated_pdf_path)
+ logger.info(f"Cleaned up PDF file after unexpected error: {generated_pdf_path}")
+ except OSError as cleanup_error:
+ logger.warning(f"Failed to clean up PDF file {generated_pdf_path}: {cleanup_error}")
+
+ raise HTTPException(status_code=500, detail="Internal server error")
diff --git a/api/routes/templates.py b/api/routes/templates.py
index 5c2281b..3ac7a35 100644
--- a/api/routes/templates.py
+++ b/api/routes/templates.py
@@ -1,16 +1,94 @@
-from fastapi import APIRouter, Depends
+from fastapi import APIRouter, Depends, HTTPException
from sqlmodel import Session
from api.deps import get_db
from api.schemas.templates import TemplateCreate, TemplateResponse
from api.db.repositories import create_template
from api.db.models import Template
from src.controller import Controller
+import logging
+import os
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
router = APIRouter(prefix="/templates", tags=["templates"])
+# Configure base uploads directory
+BASE_UPLOADS_DIR = os.getenv("BASE_UPLOADS_DIR", "src/inputs")
+
@router.post("/create", response_model=TemplateResponse)
def create(template: TemplateCreate, db: Session = Depends(get_db)):
- controller = Controller()
- template_path = controller.create_template(template.pdf_path)
- tpl = Template(**template.model_dump(exclude={"pdf_path"}), pdf_path=template_path)
- return create_template(db, tpl)
\ No newline at end of file
+ """
+ Create a new PDF template with proper validation and error handling.
+ """
+ try:
+ logger.info(f"Creating template: {template.name}")
+
+ # Resolve and validate path against base uploads directory
+ try:
+ pdf_path = Path(template.pdf_path)
+ resolved_path = pdf_path.resolve()
+ base_dir = Path(BASE_UPLOADS_DIR).resolve()
+
+ if not str(resolved_path).startswith(str(base_dir)):
+ logger.error(f"Path traversal attempt detected: {template.pdf_path}")
+ raise HTTPException(status_code=403, detail="Access denied: path outside allowed directory")
+
+ # Use the validated resolved path for all subsequent checks
+ validated_path = resolved_path
+
+ except (ValueError, OSError) as e:
+ logger.error(f"Invalid path: {template.pdf_path} - {e}")
+ raise HTTPException(status_code=400, detail="Invalid file path")
+
+ # Validate PDF file exists before processing
+ if not validated_path.exists():
+ logger.error(f"PDF file not found: {validated_path}")
+ raise HTTPException(status_code=404, detail="PDF file not found")
+
+ # Check file permissions
+ if not os.access(validated_path, os.R_OK):
+ logger.error(f"Cannot read PDF file: {validated_path}")
+ raise HTTPException(status_code=403, detail="Cannot read PDF file")
+
+ # Create controller and process template
+ controller = Controller()
+
+ try:
+ template_path = controller.create_template(str(validated_path))
+ except FileNotFoundError as e:
+ logger.error(f"Template creation failed - file not found: {e}", exc_info=True)
+ raise HTTPException(status_code=404, detail="PDF file not found")
+ except ValueError as e:
+ logger.error(f"Template creation failed - invalid input: {e}", exc_info=True)
+ raise HTTPException(status_code=400, detail="Invalid PDF file")
+ except Exception as e:
+ logger.error(f"Template creation failed: {e}", exc_info=True)
+ raise HTTPException(status_code=500, detail="Template creation failed")
+
+ # Create database record
+ try:
+ tpl = Template(**template.model_dump(exclude={"pdf_path"}), pdf_path=template_path)
+ result = create_template(db, tpl)
+
+ logger.info(f"Template created successfully: {result.id}")
+ return result
+
+ except Exception as e:
+ logger.error(f"Database operation failed: {e}", exc_info=True)
+
+ # Clean up generated template file on database failure
+ if template_path and os.path.exists(template_path):
+ try:
+ os.remove(template_path)
+ logger.info(f"Cleaned up template file after DB failure: {template_path}")
+ except OSError as cleanup_error:
+ logger.warning(f"Failed to clean up template file: {cleanup_error}")
+
+ raise HTTPException(status_code=500, detail="Database operation failed")
+
+ except HTTPException:
+ raise
+ except Exception as e:
+ logger.error(f"Unexpected error in template creation: {e}", exc_info=True)
+ raise HTTPException(status_code=500, detail="Internal server error")
\ No newline at end of file
diff --git a/api/schemas/forms.py b/api/schemas/forms.py
index 3cce650..8821ccb 100644
--- a/api/schemas/forms.py
+++ b/api/schemas/forms.py
@@ -1,15 +1,269 @@
-from pydantic import BaseModel
-
-class FormFill(BaseModel):
- template_id: int
- input_text: str
-
-
-class FormFillResponse(BaseModel):
- id: int
- template_id: int
- input_text: str
- output_pdf_path: str
-
- class Config:
- from_attributes = True
\ No newline at end of file
+from pydantic import BaseModel, Field, field_validator, ConfigDict
+import re
+import html
+import logging
+import unicodedata
+import urllib.parse
+
+# Get logger for this module
+logger = logging.getLogger(__name__)
+
+# Optional bleach import for HTML sanitization
+try:
+ import bleach
+ BLEACH_AVAILABLE = True
+except ImportError:
+ BLEACH_AVAILABLE = False
+
+# Pre-compile regex patterns for performance
+DANGEROUS_CONTENT_PATTERN = re.compile(
+ r'(?i)(?:'
+ r'<\s*(?:script|iframe|object|embed|form|input|meta|link|style|base|applet|body|html|head|title|svg|math|xml)\b|'
+ r'javascript\s*:|'
+ r'data\s*:|'
+ r'vbscript\s*:|'
+ r'file\s*:|'
+ r'ftp\s*:|'
+ r'on(?:click|error|load|mouseover|focus|blur|change|submit|keydown|keyup|keypress|resize|scroll|unload|beforeunload|hashchange|popstate|storage|message|offline|online|pagehide|pageshow|beforeprint|afterprint|dragstart|drag|dragenter|dragover|dragleave|drop|dragend|copy|cut|paste|selectstart|select|input|invalid|reset|search|abort|canplay|canplaythrough|durationchange|emptied|ended|loadeddata|loadedmetadata|loadstart|pause|play|playing|progress|ratechange|seeked|seeking|stalled|suspend|timeupdate|volumechange|waiting|animationstart|animationend|animationiteration|transitionend|wheel|contextmenu|show|toggle)\s*=|'
+ r'\s*(?:\d{1,7}|x[0-9a-f]{1,6})\s*;|'
+ r'expression\s*\(|'
+ r'url\s*\(|'
+ r'import\s*\(|'
+ r'@import\b|'
+ r'binding\s*:|'
+ r'behavior\s*:|'
+ r'mocha\s*:|'
+ r'livescript\s*:|'
+ r'eval\s*\(|'
+ r'setTimeout\s*\(|'
+ r'setInterval\s*\(|'
+ r'Function\s*\(|'
+ r'constructor\s*\(|'
+ r'alert\s*\(|'
+ r'confirm\s*\(|'
+ r'prompt\s*\(|'
+ r'document\.\w+\s*[\(\[=]|'
+ r'window\.\w+\s*[\(\[=]|'
+ r'location\.|'
+ r'navigator\.|'
+ r'history\.|'
+ r'localStorage\.|'
+ r'sessionStorage\.|'
+ r'XMLHttpRequest\b|'
+ r'fetch\s*\(|'
+ r'WebSocket\b|'
+ r'EventSource\b|'
+ r'SharedWorker\b|'
+ r'\bWorker\b|'
+ r'\bServiceWorker\b|'
+ r'postMessage\b|'
+ r'innerHTML\b|'
+ r'outerHTML\b|'
+ r'insertAdjacentHTML\b|'
+ r'document\.write\b|'
+ r'document\.writeln\b|'
+ r'createContextualFragment\b|'
+ r'DOMParser\b|'
+ r'Range\.createContextualFragment\b|'
+ r'<\s*!\s*\[CDATA\[|'
+ r'<\s*!\s*--.*?--|'
+ r'<\s*\?.*?\?>'
+ r')', re.DOTALL
+)
+
+# Control character pattern including Unicode control chars
+CONTROL_CHARS_PATTERN = re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\u2000-\u200F\u2028-\u202F\u205F-\u206F\uFEFF]')
+
+# Path traversal pattern (compiled for performance)
+PATH_TRAVERSAL_PATTERN = re.compile(r'(?i)(?:\.\./|\.\.\\|%2e%2e%2f|%2e%2e%5c|\.\.%2f|\.\.%5c)')
+
+# Pattern for detecting potential prompt injection
+PROMPT_INJECTION_PATTERN = re.compile(
+ r'(?i)(?:'
+ r'(?:please\s+)?ignore\s+(?:all\s+)?(?:previous|above|all|the|your|system|earlier|prior)\s+(?:instructions?|prompts?|commands?|rules?|directions?)|'
+ r'(?:please\s+)?forget\s+(?:all\s+)?(?:previous|above|all|the|your|system|earlier|prior)\s+(?:instructions?|prompts?|commands?|rules?|directions?)|'
+ r'(?:please\s+)?disregard\s+(?:all\s+)?(?:previous|above|all|the|your|system|earlier|prior|everything)\s*(?:instructions?|prompts?|commands?|rules?|directions?|and)?|'
+ r'(?:please\s+)?override\s+(?:all\s+)?(?:previous|above|all|the|your|system|earlier|prior)\s+(?:instructions?|prompts?|commands?|rules?|directions?)|'
+ r'new\s+(?:instructions?|prompts?|commands?|rules?|directions?)|'
+ r'(?:^|\s|["\'\[\(])(?:system|assistant|user|human|ai|bot)\s*:\s*|'
+ r'(?:^|\s)(?:now\s+)?(?:you\s+(?:are|will|must|should)|act\s+as|pretend\s+to\s+be|roleplay\s+as)|'
+ r'(?:^|\s)(?:from\s+now\s+on|instead\s+of|rather\s+than)(?:\s|$)|'
+ r'actually\s+you\s+(?:are|will|must|should)|'
+ r'in\s+reality\s+you\s+(?:are|will|must|should)|'
+ r'the\s+truth\s+is|'
+ r'actually\s+ignore|'
+ r'but\s+ignore|'
+ r'however\s+ignore|'
+ r'nevertheless\s+ignore|'
+ r'nonetheless\s+ignore|'
+ r'still\s+ignore|'
+ r'yet\s+ignore|'
+ r'although\s+ignore|'
+ r'though\s+ignore|'
+ r'despite\s+ignore|'
+ r'in\s+spite\s+of\s+ignore|'
+ r'regardless\s+ignore|'
+ r'irrespective\s+ignore|'
+ r'notwithstanding\s+ignore|'
+ r'(?:can\s+you|i\s+need\s+you\s+to)\s+(?:ignore|forget|disregard)'
+ r')'
+)
+
+class FormFill(BaseModel):
+ model_config = ConfigDict(strict=True) # Disable type coercion for security
+
+ template_id: int = Field(..., gt=0, le=2147483647)
+ input_text: str = Field(..., min_length=1, max_length=50000)
+
+ @field_validator('template_id')
+ @classmethod
+ def validate_template_id(cls, v):
+ if v is None:
+ raise ValueError('Template ID cannot be null')
+ # Check boolean before int since bool is a subclass of int
+ if isinstance(v, bool):
+ raise ValueError('Template ID cannot be a boolean')
+ if not isinstance(v, int):
+ raise ValueError('Template ID must be an integer')
+ return v
+
+ @field_validator('input_text')
+ @classmethod
+ def validate_input_text(cls, v):
+ if v is None:
+ raise ValueError('Input text cannot be null')
+
+ if not v.strip():
+ raise ValueError('Input text cannot be empty')
+
+ # Early length check to prevent processing attacks
+ if len(v) > 50000:
+ raise ValueError('Input text too long')
+
+ if DANGEROUS_CONTENT_PATTERN.search(v):
+ raise ValueError('Potentially dangerous content detected')
+
+ # Check for zero-width and invisible characters
+ invisible_chars = ['\u200B', '\u200C', '\u200D', '\u2060', '\uFEFF', '\u202E']
+ if any(char in v for char in invisible_chars):
+ raise ValueError('Invisible or zero-width characters detected')
+
+ # Enhanced homograph attack detection
+ # Check for common Cyrillic/Greek lookalikes mixed with Latin
+ suspicious_chars = {
+ # Cyrillic lookalikes
+ 'а', 'е', 'і', 'о', 'р', 'с', 'у', 'х', 'ѕ', # Cyrillic lowercase
+ 'А', 'В', 'Е', 'К', 'М', 'Н', 'О', 'Р', 'С', 'Т', 'Х', # Cyrillic uppercase
+ # Greek lookalikes
+ 'Α', 'Β', 'Ε', 'Ζ', 'Η', 'Ι', 'Κ', 'Μ', 'Ν', 'Ο', 'Ρ', 'Τ', 'Υ', 'Χ', # Greek uppercase
+ 'α', 'ε', 'ι', 'ν', 'ο', 'ρ', 'τ', 'υ', 'ω', # Greek lowercase
+ }
+
+ # Single pass check for mixed scripts
+ has_latin = False
+ has_suspicious = False
+ for char in v:
+ if char in suspicious_chars:
+ has_suspicious = True
+ if has_latin: # Early exit if both found
+ raise ValueError('Potential homograph attack detected')
+ elif char.isascii() and char.isalpha():
+ has_latin = True
+ if has_suspicious: # Early exit if both found
+ raise ValueError('Potential homograph attack detected')
+
+ # Check for path traversal patterns (optimized)
+ if PATH_TRAVERSAL_PATTERN.search(v):
+ raise ValueError('Path traversal pattern detected')
+
+ # Check for control characters and null bytes
+ if any(ord(c) < 32 and c not in '\t\n\r' for c in v):
+ raise ValueError('Control characters or null bytes detected')
+
+ # Unicode normalization with strict expansion protection
+ try:
+ normalized = unicodedata.normalize('NFC', v)
+
+ # Detect combining character attacks
+ combining_chars = sum(1 for c in v if unicodedata.combining(c))
+ base_chars = len(v) - combining_chars
+ if base_chars > 0 and combining_chars / base_chars > 0.5: # More than 0.5 combining per base
+ raise ValueError('Suspicious Unicode combining character pattern detected')
+
+ # Check for Unicode expansion attacks
+ if len(normalized) > len(v) * 1.5:
+ raise ValueError('Suspicious Unicode normalization expansion detected')
+
+ # Also check for excessive compression (potential DoS)
+ if len(normalized) < len(v) * 0.3 and len(v) > 1000:
+ raise ValueError('Suspicious Unicode normalization compression detected')
+
+ # Apply normalized result
+ v = normalized
+
+ # URL decode to catch encoded injection attempts
+ decoded = urllib.parse.unquote(v)
+
+ # Check for URL decoding expansion
+ if len(decoded) > len(v) * 2:
+ raise ValueError('Suspicious URL decoding expansion detected')
+
+ # Check decoded content for dangerous patterns
+ if DANGEROUS_CONTENT_PATTERN.search(decoded):
+ raise ValueError('Potentially dangerous content detected after URL decoding')
+
+ # Length check after all processing
+ if len(v) > 45000: # Reduced from original to account for processing
+ raise ValueError('Input text too long after normalization')
+
+ except ValueError:
+ # Re-raise ValueError to preserve security error messages
+ raise
+ except Exception:
+ raise ValueError('Invalid Unicode characters detected')
+
+ # Simplified HTML entity decoding
+ try:
+ v = html.unescape(v)
+ except Exception:
+ raise ValueError('HTML entity decoding failed')
+
+ v = v.strip()
+
+ # Remove control characters
+ v = CONTROL_CHARS_PATTERN.sub('', v)
+
+ # Use bleach if available
+ if BLEACH_AVAILABLE:
+ try:
+ v = bleach.clean(v, tags=[], attributes={}, strip=True)
+ except Exception as e:
+ logger.error(f"bleach.clean failed: {str(e)}", exc_info=True)
+ pass
+
+ # Final dangerous content check after processing
+ if DANGEROUS_CONTENT_PATTERN.search(v):
+ raise ValueError('Potentially dangerous content detected after processing')
+
+ # Check for prompt injection attempts
+ if PROMPT_INJECTION_PATTERN.search(v):
+ raise ValueError('Potential prompt injection detected')
+
+ # Final validation
+ if len(v) == 0:
+ raise ValueError('Input text cannot be empty after processing')
+
+ # Additional length check for processed content
+ if len(v) > 45000: # Leave buffer for processing
+ raise ValueError('Input text too long after processing')
+
+ return v
+
+
+class FormFillResponse(BaseModel):
+ model_config = ConfigDict(from_attributes=True)
+
+ id: int
+ template_id: int
+ input_text: str
+ output_pdf_path: str
\ No newline at end of file
diff --git a/api/schemas/templates.py b/api/schemas/templates.py
index 961f219..4e54331 100644
--- a/api/schemas/templates.py
+++ b/api/schemas/templates.py
@@ -1,15 +1,185 @@
-from pydantic import BaseModel
+from pydantic import BaseModel, Field, field_validator, ConfigDict
+import re
+import os
+from pathlib import Path
+import urllib.parse
+import unicodedata
class TemplateCreate(BaseModel):
- name: str
- pdf_path: str
- fields: dict
+ name: str = Field(..., min_length=1, max_length=100)
+ pdf_path: str = Field(..., min_length=1, max_length=500)
+ fields: dict = Field(...)
+
+ @field_validator('name')
+ @classmethod
+ def validate_name(cls, v):
+ if not re.match(r'^[a-zA-Z0-9\s_-]+$', v):
+ raise ValueError('Name can only contain letters, numbers, spaces, underscores, and hyphens')
+ return v
+
+ @field_validator('pdf_path')
+ @classmethod
+ def validate_pdf_path(cls, v):
+ if not v or not v.strip():
+ raise ValueError('PDF path cannot be empty')
+
+ # Early length check
+ if len(v) > 500:
+ raise ValueError('Path too long')
+
+ # Unicode normalization to prevent compatibility attacks
+ try:
+ original_len = len(v)
+ v = unicodedata.normalize('NFKC', v)
+
+ # Check for suspicious expansion after normalization
+ if len(v) > original_len * 1.5:
+ raise ValueError('Suspicious Unicode expansion detected')
+
+ # Check for dangerous Unicode categories and ranges
+ for char in v:
+ char_code = ord(char)
+ # Fullwidth forms
+ if 0xFF00 <= char_code <= 0xFF60:
+ raise ValueError('Fullwidth characters detected in path')
+ # Mathematical operators that could be confused
+ if 0x2200 <= char_code <= 0x22FF:
+ raise ValueError('Mathematical operator characters detected in path')
+ # Various symbols that could be path separators
+ if char_code in [0x2044, 0x2215, 0x29F8, 0x29F9]: # Fraction slash, division slash, etc.
+ raise ValueError('Suspicious separator characters detected in path')
+ # Zero-width and invisible characters
+ if char_code in [0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF]:
+ raise ValueError('Invisible characters detected in path')
+
+ except ValueError:
+ # Re-raise ValueError to preserve error message
+ raise
+ except Exception:
+ raise ValueError('Invalid Unicode characters in path')
+
+ # Single round of URL decoding to prevent double-encoding attacks
+ original_v = v
+ try:
+ v = urllib.parse.unquote(v)
+ if len(v) < len(original_v) * 0.3:
+ raise ValueError('Suspicious path encoding detected')
+ except ValueError:
+ # Re-raise ValueError to preserve original error message
+ raise
+ except Exception:
+ raise ValueError('Invalid URL encoding in path')
+
+ # Normalize path
+ try:
+ normalized = os.path.normpath(v)
+ except Exception:
+ raise ValueError('Invalid path format')
+
+ # Early traversal detection on normalized path
+ if ('..' in normalized or
+ normalized.startswith(('/', '\\')) or
+ re.match(r'^[A-Za-z]:[\\/]', normalized, re.ASCII) or # Windows drive letters (ASCII only)
+ re.match(r'^[A-Za-z][A-Za-z0-9+.-]{0,20}://', normalized, re.ASCII)): # URI schemes (bounded)
+ raise ValueError('Path traversal detected')
+
+ # Traversal pattern detection
+ traversal_patterns = [
+ '..', '..\\', '../', '..\\\\', '..\\/', '../\\',
+ '%2e%2e', '%2e%2e%2f', '%2e%2e%5c', '%252e%252e'
+ ]
+
+ v_lower = v.lower()
+ normalized_lower = normalized.lower()
+
+ for pattern in traversal_patterns:
+ if pattern in v_lower or pattern in normalized_lower:
+ raise ValueError('Path traversal detected')
+
+ # Check for forbidden characters
+ forbidden_chars = ['~', '$', '|', '&', ';', '`', '<', '>', '"', "'", '*', '?', ':']
+ forbidden_chars.extend([chr(i) for i in range(32)]) # Control characters
+ forbidden_chars.append(chr(127)) # DEL character
+
+ for char in forbidden_chars:
+ if char in normalized:
+ raise ValueError(f'Forbidden character detected: {repr(char)}')
+
+ # Check if it's a PDF file
+ if not normalized.lower().endswith('.pdf'):
+ raise ValueError('File must be a PDF')
+
+ # Check filename for Windows reserved names
+ try:
+ filename = Path(normalized).name
+ if not filename: # Empty filename
+ raise ValueError('Empty filename detected')
+
+ # Check for empty base name (e.g., ".pdf" with no actual name)
+ base_name_check = filename.rsplit('.', 1)[0] if '.' in filename else filename
+ if not base_name_check or base_name_check == '.' or base_name_check == '':
+ raise ValueError('Invalid filename: empty base name')
+
+ # Check for reserved names (case-insensitive, handle edge cases)
+ filename_upper = filename.upper()
+ base_name = filename_upper.split('.')[0] if '.' in filename_upper else filename_upper
+
+ reserved_names = ['CON', 'PRN', 'AUX', 'NUL'] + [f'COM{i}' for i in range(1, 10)] + [f'LPT{i}' for i in range(1, 10)]
+
+ if base_name in reserved_names:
+ raise ValueError(f'Reserved filename detected: {base_name}')
+
+ # Additional checks for edge cases
+ if filename.startswith('.') and len(filename) == 1:
+ raise ValueError('Invalid filename: single dot')
+ if filename == '..':
+ raise ValueError('Invalid filename: double dot')
+ if len(filename) > 255: # Windows/Linux filename length limit
+ raise ValueError('Filename too long')
+
+ except Exception as e:
+ if isinstance(e, ValueError):
+ raise
+ raise ValueError(f'Error validating filename: {e}')
+
+ # Strict prefix validation (no symlink resolution at validation time)
+ allowed_prefixes = [
+ 'src/inputs/', 'src/templates/', 'uploads/', 'templates/',
+ './src/inputs/', './src/templates/', './uploads/', './templates/',
+ 'src\\inputs\\', 'src\\templates\\', 'uploads\\', 'templates\\',
+ '.\\src\\inputs\\', '.\\src\\templates\\', '.\\uploads\\', '.\\templates\\'
+ ]
+
+ normalized_forward = normalized.replace('\\', '/')
+ if not any(normalized_forward.startswith(prefix.replace('\\', '/')) for prefix in allowed_prefixes):
+ raise ValueError('Path must be within allowed directories (src/inputs/, src/templates/, uploads/, templates/)')
+
+ # Final length check after all processing
+ if len(normalized) > 400:
+ raise ValueError('Path too long after processing')
+
+ return normalized
+
+ @field_validator('fields')
+ @classmethod
+ def validate_fields(cls, v):
+ if not isinstance(v, dict):
+ raise ValueError('Fields must be a dictionary')
+
+ if len(v) > 50:
+ raise ValueError('Too many fields: maximum 50 allowed')
+
+ for key, value in v.items():
+ if not isinstance(key, str) or not isinstance(value, str):
+ raise ValueError('Field keys and values must be strings')
+ if len(key) > 100 or len(value) > 500:
+ raise ValueError('Field names or values too long')
+ return v
class TemplateResponse(BaseModel):
+ model_config = ConfigDict(from_attributes=True)
+
id: int
name: str
pdf_path: str
- fields: dict
-
- class Config:
- from_attributes = True
\ No newline at end of file
+ fields: dict
\ No newline at end of file
diff --git a/docs/api.md b/docs/api.md
new file mode 100644
index 0000000..1dc38fc
--- /dev/null
+++ b/docs/api.md
@@ -0,0 +1,275 @@
+# FireForm API Documentation
+
+## Overview
+
+The FireForm API provides endpoints for creating PDF templates and filling forms using AI-powered text extraction. The API is built with FastAPI and includes comprehensive security validation.
+
+## Base URL
+
+```
+http://127.0.0.1:8000
+```
+
+## Authentication
+
+Currently, the API does not require authentication. This is suitable for local deployment and development.
+
+## Endpoints
+
+### Templates
+
+#### Create Template
+
+Create a new PDF template for form filling.
+
+**Endpoint**: `POST /templates/create`
+
+**Request Body**:
+
+```json
+{
+ "name": "string",
+ "pdf_path": "string",
+ "fields": {
+ "field_name": "field_type"
+ }
+}
+```
+
+**Parameters**:
+
+- `name` (string, required): Human-readable name for the template
+- `pdf_path` (string, required): Path to the PDF template file
+- `fields` (object, required): Mapping of field names to their types
+
+**Example Request**:
+
+```json
+{
+ "name": "Incident Report Template",
+ "pdf_path": "src/inputs/incident_report.pdf",
+ "fields": {
+ "officer_name": "string",
+ "incident_date": "string",
+ "location": "string",
+ "description": "string"
+ }
+}
+```
+
+**Response**:
+
+```json
+{
+ "id": 1,
+ "name": "Incident Report Template",
+ "pdf_path": "src/inputs/incident_report.pdf",
+ "fields": {
+ "officer_name": "string",
+ "incident_date": "string",
+ "location": "string",
+ "description": "string"
+ }
+}
+```
+
+#### Get Template
+
+Retrieve details of a specific template.
+
+**Endpoint**: `GET /templates/{template_id}`
+
+**Parameters**:
+
+- `template_id` (integer, required): ID of the template
+
+**Response**:
+
+```json
+{
+ "id": 1,
+ "name": "Incident Report Template",
+ "pdf_path": "src/inputs/incident_report.pdf",
+ "fields": {
+ "officer_name": "string",
+ "incident_date": "string",
+ "location": "string",
+ "description": "string"
+ }
+}
+```
+
+### Forms
+
+#### Fill Form
+
+Fill a PDF form using AI extraction from natural language input.
+
+**Endpoint**: `POST /forms/fill`
+
+**Request Body**:
+
+```json
+{
+ "template_id": "integer",
+ "input_text": "string"
+}
+```
+
+**Parameters**:
+
+- `template_id` (integer, required): ID of the template to use
+- `input_text` (string, required): Natural language description of the incident
+
+**Example Request**:
+
+```json
+{
+ "template_id": 1,
+ "input_text": "Officer John Smith responded to a vehicle accident on March 22, 2026 at the intersection of Main Street and Oak Avenue. The incident involved two vehicles with minor injuries reported."
+}
+```
+
+**Response**:
+
+```json
+{
+ "id": 1,
+ "template_id": 1,
+ "input_text": "Officer John Smith responded to a vehicle accident...",
+ "output_pdf_path": "incident_report_abc123_filled.pdf"
+}
+```
+
+#### Get Form
+
+Retrieve details of a specific filled form.
+
+**Endpoint**: `GET /forms/{form_id}`
+
+**Parameters**:
+
+- `form_id` (integer, required): ID of the filled form
+
+**Response**:
+
+```json
+{
+ "id": 1,
+ "template_id": 1,
+ "input_text": "Officer John Smith responded to a vehicle accident...",
+ "output_pdf_path": "incident_report_abc123_filled.pdf"
+}
+```
+
+## Error Responses
+
+### Validation Error (422)
+
+Returned when request data fails validation.
+
+```json
+{
+ "detail": [
+ {
+ "loc": ["body", "field_name"],
+ "msg": "field required",
+ "type": "value_error.missing"
+ }
+ ]
+}
+```
+
+### Not Found (404)
+
+Returned when a requested resource doesn't exist.
+
+```json
+{
+ "detail": "Template not found"
+}
+```
+
+### Internal Server Error (500)
+
+Returned when an unexpected error occurs.
+
+```json
+{
+ "detail": "Internal server error"
+}
+```
+
+## Security Features
+
+The API includes comprehensive security validation:
+
+### Input Validation
+
+- **XSS Protection**: Detects and blocks script tags and malicious HTML
+- **Homograph Detection**: Prevents attacks using similar-looking characters
+- **Path Traversal Prevention**: Blocks attempts to access unauthorized files
+- **Prompt Injection Defense**: Prevents manipulation of AI prompts
+
+### Content Sanitization
+
+- HTML entity decoding with safety checks
+- Unicode normalization to prevent encoding attacks
+- URL decoding validation
+- Malicious content pattern detection
+
+### Error Handling
+
+- Sanitized error messages to prevent information leakage
+- Proper HTTP status codes
+- Comprehensive logging for debugging
+
+## Rate Limiting
+
+Currently, no rate limiting is implemented. For production deployment, consider implementing rate limiting based on your requirements.
+
+## Interactive Documentation
+
+The API provides interactive documentation via Swagger UI:
+
+- **Swagger UI**: `http://127.0.0.1:8000/docs`
+- **ReDoc**: `http://127.0.0.1:8000/redoc`
+
+## Testing
+
+Test the API endpoints using the provided test suite:
+
+```bash
+pytest tests/ -v
+```
+
+Or use curl for manual testing:
+
+```bash
+# Create a template
+curl -X POST "http://127.0.0.1:8000/templates/create" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "Test Template",
+ "pdf_path": "src/inputs/file.pdf",
+ "fields": {"name": "string", "date": "string"}
+ }'
+
+# Fill a form
+curl -X POST "http://127.0.0.1:8000/forms/fill" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "template_id": 1,
+ "input_text": "John Smith submitted the form on March 22, 2026"
+ }'
+```
+
+## Development
+
+To start the development server:
+
+```bash
+uvicorn api.main:app --host 127.0.0.1 --port 8000 --reload
+```
+
+The `--reload` flag enables automatic reloading when code changes are detected.
diff --git a/docs/db.md b/docs/db.md
index 4d702be..12bd92e 100644
--- a/docs/db.md
+++ b/docs/db.md
@@ -1,11 +1,12 @@
# Database and API Management Guide
-This guide explains how to set up, initialize, and manage the FireForm database.
+This guide explains how to set up, initialize, and manage the FireForm database and API server.
## Prerequisites
> [!IMPORTANT]
> Ensure you have installed all dependencies before proceeding:
+>
> ```bash
> pip install -r requirements.txt
> ```
@@ -19,30 +20,88 @@ python -m api.db.init_db
```
> [!TIP]
-> After running this, you should see a `.db` file in the root of the project. If you don't see it, it means the database was not successfully created.
+> After running this, you should see a `fireform.db` file in the root of the project. If you don't see it, it means the database was not successfully created.
## Running the API
Once the database is initialized, start the FastAPI server:
```bash
-uvicorn api.main:app --reload
+uvicorn api.main:app --host 127.0.0.1 --port 8000
```
If successful, you will see:
`INFO: Uvicorn running on http://127.0.0.1:8000`
+## API Endpoints
+
+The API provides the following endpoints:
+
+### Templates
+
+- `POST /templates/create` - Create a new PDF template
+- `GET /templates/{template_id}` - Get template details
+
+### Forms
+
+- `POST /forms/fill` - Fill a form using AI extraction
+- `GET /forms/{form_id}` - Get form details
+
## Testing Endpoints
1. Open your browser and go to [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs).
2. Use the **Swagger UI** to test endpoints like `POST /templates/create`.
3. Click **"Try it out"**, fill in the data, and click **"Execute"** to see the response.
+### Example Template Creation
+
+```json
+{
+ "name": "Incident Report",
+ "pdf_path": "src/inputs/file.pdf",
+ "fields": {
+ "officer_name": "string",
+ "incident_date": "string",
+ "location": "string"
+ }
+}
+```
+
+### Example Form Filling
+
+```json
+{
+ "template_id": 1,
+ "input_text": "Officer John Smith responded to an incident on March 22, 2026 at 123 Main Street."
+}
+```
+
+## Security Features
+
+The API includes comprehensive security validation:
+
+- Input sanitization and validation
+- XSS attack prevention
+- Path traversal protection
+- Prompt injection defense
+- Malicious content detection
+
## Database Visualization
> [!NOTE]
> The database file is excluded from Git to avoid conflicts between developers.
To visualize the database:
+
1. Install the **SQLite3 Editor** extension in VS Code.
-2. Open the `.db` file directly.
+2. Open the `fireform.db` file directly.
+
+## Testing
+
+Run the test suite to verify API functionality:
+
+```bash
+pytest tests/ -v
+```
+
+The system includes comprehensive testing for all endpoints and security features.
diff --git a/docs/docker.md b/docs/docker.md
index 118eb10..a2dfcb3 100644
--- a/docs/docker.md
+++ b/docs/docker.md
@@ -1,40 +1,38 @@
-# Docker documentation for FireForm
+# Docker Documentation for FireForm
-## Setup
-We will be using 2 different containers:
-1. `fireform-app` -> This container will hold the whole project itself.
-2. `ollama/ollama:latest` -> This is to deploy ollama, that way it's faster to set up.
+## Overview
-### Initial configuration steps
-For this I provided a script that can be run to automate the setup.
-This script builds both containers and starts them.
+FireForm uses Docker containers for easy deployment and development. The setup includes:
-You will have to make the script executable, this can be done in linux systems with:
-```bash
-chmod +x container-init.sh
-```
-The it can be run with:
-```bash
-./container-init.sh
-```
-- NOTE: This pulls ollama and mistral, so it's normal for it to take a long time to finish. Don't interrupt it.
+1. `fireform-app` - Main application container with API server and processing
+2. `ollama/ollama:latest` - Local LLM server for AI processing
+
+## Quick Start
+
+### Prerequisites
-## Dependencies
- **Docker Engine** (20.10+) - [Installation Guide](https://docs.docker.com/engine/install/)
- **Docker Compose** (2.0+) - Included with Docker Desktop or install separately
- **Make** - For running development commands
- **Git** - For version control
-## Configuration files
-The files involved in this are:
-- Dockerfile
-- Makefile
-- docker-compose.yml
-- .dockerignore (like gitignore but for the containers)
-- container-init.sh
+### Initial Setup
+
+Run the automated setup script:
-The makefile is set up so that you don't need to learn how to properly use docker, just use the __available commands:__
+```bash
+chmod +x container-init.sh
+./container-init.sh
```
+
+> [!NOTE]
+> This script pulls Ollama and Mistral model, so it may take several minutes to complete. Don't interrupt the process.
+
+## Available Commands
+
+Use the Makefile for easy container management:
+
+```bash
make build # Build Docker images
make up # Start all containers
make down # Stop all containers
@@ -43,10 +41,110 @@ make shell # Open bash shell in app container
make exec # Run main.py in container
make pull-model # Pull Mistral model into Ollama
make clean # Remove all containers and volumes
+make help # Show all available commands
```
-* You can see this list at any time by running `make help`.
-## Debugging
-For debugging with LLMs it's really useful to attach the logs.
-* You can obtain the logs using `make logs` or `docker compose logs`.
-* A common problem is when you already have something running in port 11434. As ollama runs in that port, we need it free. You can check what's running on that port with `sudo lsof -i :11434`.
+## Configuration Files
+
+The Docker setup uses these files:
+
+- `Dockerfile` - Main application container definition
+- `docker-compose.yml` - Multi-container orchestration
+- `Makefile` - Development commands
+- `.dockerignore` - Files excluded from Docker build
+- `container-init.sh` - Automated setup script
+
+## Services
+
+### FireForm App Container
+
+- **Port**: 8000 (API server)
+- **Features**: FastAPI server, PDF processing, database
+- **Health Check**: Automatic health monitoring
+- **Security**: Non-root user, resource limits
+
+### Ollama Container
+
+- **Port**: 11434 (LLM API)
+- **Model**: Mistral (automatically pulled)
+- **GPU Support**: Enabled if available
+- **Persistence**: Model data persisted in volumes
+
+## Development Workflow
+
+1. **Start Development Environment**:
+
+ ```bash
+ make up
+ ```
+
+2. **View Application Logs**:
+
+ ```bash
+ make logs
+ ```
+
+3. **Access API Documentation**:
+ Open `http://localhost:8000/docs`
+
+4. **Run Commands in Container**:
+
+ ```bash
+ make shell
+ ```
+
+5. **Stop Environment**:
+ ```bash
+ make down
+ ```
+
+## Troubleshooting
+
+### Common Issues
+
+**Port 11434 Already in Use**:
+
+```bash
+sudo lsof -i :11434 # Check what's using the port
+```
+
+**Container Won't Start**:
+
+```bash
+make logs # Check container logs
+docker system prune # Clean up Docker resources
+```
+
+**Model Not Loading**:
+
+```bash
+make pull-model # Manually pull Mistral model
+```
+
+### Debugging
+
+- **View All Logs**: `make logs`
+- **Container Status**: `docker compose ps`
+- **Resource Usage**: `docker stats`
+- **Clean Reset**: `make clean && make build && make up`
+
+## Security Features
+
+The Docker setup includes:
+
+- Non-root user execution
+- Resource limits (CPU/memory)
+- Network isolation
+- Volume security
+- Health checks
+- Automatic restarts
+
+## Production Deployment
+
+For production use:
+
+1. Update environment variables in `.env`
+2. Configure proper SSL certificates
+3. Set up reverse proxy (nginx/traefik)
+4. Enable monitoring and logging
+5. Configure backup strategies
diff --git a/docs/security.md b/docs/security.md
new file mode 100644
index 0000000..8f74834
--- /dev/null
+++ b/docs/security.md
@@ -0,0 +1,203 @@
+# Security Documentation
+
+## Overview
+
+FireForm implements enterprise-grade security measures to protect against common web application vulnerabilities and AI-specific attacks. This document outlines the security features and best practices implemented in the system.
+
+## Security Features
+
+### Input Validation and Sanitization
+
+#### XSS Protection
+
+- **Script Tag Detection**: Blocks `",
+ "",
+ "javascript:alert('xss')",
+ "