This project is an entry for the OpenAI to Z Challenge, a competition co-organized by Kaggle and OpenAI. After submitting our work, we decided to open-source the code, aiming to provide archaeologists, anthropologists, and enthusiasts with a free and easy-to-use AI-driven archaeological technology framework.
This pipeline uses AI and remote sensing data to explore potential archaeological sites in the Amazon rainforest. It analyzes deforestation patterns, satellite imagery, and elevation data to identify areas where ancient settlements might be hidden beneath the forest canopy.
Key Features:
- Deforestation pattern analysis to find optimal archaeological visibility
- Sentinel-2 satellite imagery processing for vegetation anomaly detection
- FABDEM elevation validation for subsurface feature confirmation
- OpenAI GPT integration for contextual analysis and interpretation
- Archaeology-themed UI for parameter management and pipeline execution
Feel free to contact us with any questions or ideas via the email below:
wangzifeng157@gmail.com
Cite this repo if it helps you in your work:
- APA: "Li, L., & Wang, Z. (2025). DualVectorFoil-AI-Archaeology (Version 1.0.0) [Computer software]. https://github.com/BostonListener/DualVectorFoil-AI-Archaeology/tree/main"
- BibTeX: @software{Li_DualVectorFoil-AI-Archaeology_2025, author = {Li, Linduo and Wang, Zifeng}, doi = {10.5281/zenodo.1234}, month = jun, title = {{DualVectorFoil-AI-Archaeology}}, url = {https://github.com/BostonListener/DualVectorFoil-AI-Archaeology}, version = {1.0.0}, year = {2025} }
We've developed a beautiful, archaeology-themed web interface that makes the pipeline accessible to non-technical users. The interface provides visual parameter editing and real-time pipeline monitoring without requiring manual YAML file editing.
- 🎨 Archaeological Theme: Professional earth-tone design with ancient-inspired visual elements
- ⚙️ Interactive Parameter Editor: Visual editing of all pipeline configuration parameters
- 🚀 One-Click Execution: Run setup, pipeline, checkpoint, and visualization with single clicks
- 📊 Real-Time Monitoring: Live console output and progress tracking via WebSocket
- 🔧 Zero Code Changes: Seamlessly integrates with existing pipeline scripts
The web interface provides comprehensive parameter editing across all pipeline stages.
Execute all pipeline stages with professional action buttons.
Monitor pipeline execution with live console output.
The complete archaeological detection workflow consists of three main stages:
# Install all dependencies
pip install rasterio geopandas shapely scikit-image scipy requests matplotlib pandas numpy folium python-dotenv pyyaml openai flask flask-socketio
Download and place these files:
PRODES Deforestation Data:
- Download from: https://terrabrasilis.dpi.inpe.br/en/download-files/
- File: Amazon Legal .gpkg file
- Location: data/input/prodes_amazonia_legal.gpkg
FABDEM Elevation Data:
- Download from: https://data.bris.ac.uk/data/dataset/s5hqmjcdj8yo2ibzi9b4ew3sn
- Files: FABDEM .zip files for your study area
- Location: data/input/DEM/
Create .env file:
USER_NAME=your_copernicus_username
USER_PASSWORD=your_copernicus_password
OPENAI_API_KEY=your_openai_api_key
Register for free accounts:
- Copernicus Data Space: https://dataspace.copernicus.eu/
- OpenAI API: https://platform.openai.com/api-keys
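Before launching anything, it helps to confirm the credentials are actually visible to Python. The sketch below is a hypothetical sanity check, not part of the pipeline: it uses a tiny stdlib-only `.env` parser (the pipeline itself relies on python-dotenv) and the three variable names from the example above.

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Minimal .env loader (illustrative; the pipeline uses python-dotenv)."""
    env = {}
    p = Path(path)
    if p.exists():
        for line in p.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

# Real environment variables override values from the file
creds = {**load_env(), **os.environ}

# The three variables the pipeline expects (names from the .env example above)
required = ["USER_NAME", "USER_PASSWORD", "OPENAI_API_KEY"]
missing = [name for name in required if not creds.get(name)]
print("Missing credentials:", missing or "none")
```

If anything is listed as missing, fix the `.env` file before running the pipeline.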
# Start the web interface
python run_ui.py
Open browser to: http://localhost:5000
- Configure Parameters: Edit all pipeline settings visually
- Run Setup: Initialize directories and validate configuration
- Run Pipeline: Execute the complete 3-stage archaeological detection
- Run Checkpoints: Validate competition compliance
- Run Visualization: Generate site visualizations
# Configure parameters manually
# Edit config/parameters.yaml for your study area
# Setup pipeline directories
python setup_pipeline.py
# Check dependencies
python run_pipeline.py --check
# Run complete pipeline
python run_pipeline.py --full
# Run checkpoints
python run_checkpoint.py
data/input/
├── prodes_amazonia_legal.gpkg # Deforestation polygons
└── DEM/
└── FABDEM_*.zip # Elevation tiles
data/
├── stage1/
│ ├── archaeological_candidates.csv # Ranked deforestation candidates
│ └── archaeological_candidates.shp # Geographic boundaries
├── stage2/
│ ├── downloads/ # Sentinel-2 satellite data
│ └── pattern_summary.csv # NDVI vegetation patterns
├── stage3/
│ ├── final_archaeological_sites.csv # Validated archaeological sites
│ ├── final_archaeological_sites.geojson
│ └── final_archaeological_sites.html # Interactive map
├── checkpoint2_outputs/
│ ├── five_anomaly_footprints.json # 5 candidate anomalies
│ └── checkpoint2_results.json
├── checkpoint3_outputs/
│ ├── best_site_evidence_package.json # Single best discovery
│ └── checkpoint3_notebook.json
└── checkpoint4_outputs/
├── two_page_presentation.json # Presentation materials
└── presentation_pdf_content.txt
Final Archaeological Sites (data/stage3/final_archaeological_sites.csv):
- Site coordinates and measurements
- Confidence assessments from multiple data sources
- Geometric properties and pattern classifications
Interactive Map (data/stage3/final_archaeological_sites.html):
- Visualization of all discovered sites
- Elevation and satellite imagery overlays
Checkpoint Results:
- 5 anomaly footprints for competition compliance
- Best site documentation with evidence
- Presentation materials for live demonstration
The web interface organizes all pipeline parameters into intuitive categories:
- Geographic bounds definition
- Region name and coordinate boundaries
- Area of interest specification
- Temporal range for PRODES data analysis
- Size filters for archaeological features
- Age parameters for optimal site visibility
- Shape and optimization criteria
- Satellite data download parameters
- Cloud cover thresholds and preferences
- NDVI analysis sensitivity settings
- Pattern detection parameters
- FABDEM analysis parameters
- Contour intervals and roughness thresholds
- Topographic validation criteria
- Buffer distances and pixel requirements
- Input data locations
- Output directory structure
- Stage-specific file paths
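The FABDEM validation criteria above reduce to simple statistics over a buffered elevation window. The sketch below is a hypothetical helper, not the pipeline's actual code: the threshold names mirror the `dem_validation` section of the configuration, and the plain lists stand in for a clipped FABDEM tile.

```python
import numpy as np

def passes_dem_validation(elevations, std_threshold=0.4, range_threshold=1.5):
    """Check whether an elevation clip shows the subtle relief expected of
    earthworks: enough variation to be a real feature rather than flat noise.
    Threshold names mirror dem_validation in config/parameters.yaml."""
    values = np.asarray(elevations, dtype=float)
    values = values[np.isfinite(values)]  # drop nodata pixels
    if values.size == 0:
        return False
    elev_std = values.std()
    elev_range = values.max() - values.min()
    return bool(elev_std >= std_threshold and elev_range >= range_threshold)

# A gently mounded surface passes; a flat clearing does not.
mound = [100.0, 100.4, 101.2, 101.8, 100.9, 100.1]
flat = [100.0, 100.05, 100.1, 100.02, 100.07]
print(passes_dem_validation(mound), passes_dem_validation(flat))
```

In the real pipeline the elevation window comes from rasterio reads of the FABDEM tiles, buffered by `buffer_distance_m` around each candidate.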
study_area:
  name: "Acre"
  bounds:
    min_lon: -68.5
    max_lon: -67.5
    min_lat: -10.6
    max_lat: -9.6

deforestation:
  start_year: 2017
  end_year: 2022
  min_age_years: 3
  max_age_years: 8
  min_size_ha: 2.5
  max_size_ha: 100

sentinel_download:
  max_candidates: 20
  cloud_cover_threshold: 75
  buffer_degrees: 0.01

sentinel_analysis:
  parameter_grid:
    ndvi_contrast_threshold: [0.05, 0.08, 0.12]
    geometry_threshold: [0.35, 0.50, 0.65]
    min_pattern_pixels: [5, 7, 9]

dem_validation:
  buffer_distance_m: 100
  elevation_std_threshold: 0.4
  elevation_range_threshold: 1.5
  patterns_to_validate: 25

# Copernicus Data Space credentials (required)
USER_NAME=your_email@example.com
USER_PASSWORD=your_password
# OpenAI API credentials (required)
OPENAI_API_KEY=sk-your-api-key-here

Typical Pipeline Output:
- Input: ~10,000 deforestation polygons
- Stage 1: 20 archaeological candidates
- Stage 2: 15 NDVI pattern detections
- Stage 3: 5-10 validated archaeological sites
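The configuration can also be inspected programmatically. Here is a minimal sketch using PyYAML (already in the dependency list); the inline string reuses the study-area structure from the example configuration above, where in practice you would `yaml.safe_load` the contents of `config/parameters.yaml`.

```python
import yaml  # PyYAML, already a pipeline dependency

# Same structure as the example configuration above
config_text = """
study_area:
  name: "Acre"
  bounds:
    min_lon: -68.5
    max_lon: -67.5
    min_lat: -10.6
    max_lat: -9.6
"""

config = yaml.safe_load(config_text)
bounds = config["study_area"]["bounds"]

# Rough size of the study window in degrees
width = bounds["max_lon"] - bounds["min_lon"]
height = bounds["max_lat"] - bounds["min_lat"]
print(f"{config['study_area']['name']}: {width:.1f} deg x {height:.1f} deg")
```

A quick check like this catches swapped min/max bounds before an expensive download stage starts.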
Our AI-powered pipeline identified several high-confidence archaeological sites; the five top-ranked discoveries are documented in the checkpoint outputs (data/checkpoint2_outputs/five_anomaly_footprints.json).
- Flask: RESTful API for parameter management and script execution
- WebSocket: Real-time communication for live console output
- Process Management: Subprocess orchestration with UTF-8 encoding
- Parameter Handling: YAML configuration file management
- Archaeological Theme: Earth tones, professional styling
- Interactive Forms: Dynamic parameter editing with validation
- Real-Time Updates: Live console output and status monitoring
- Responsive Design: Desktop and mobile compatibility
- Zero Modifications: Works with existing pipeline scripts unchanged
- Parameter Synchronization: Automatic YAML file updates
- Process Monitoring: Real-time execution tracking
- Error Handling: Comprehensive error reporting and recovery
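The subprocess orchestration described above can be sketched in a few lines. This is a hypothetical simplification of what the UI does, shown mainly for the UTF-8 handling: decoding with `errors="replace"` keeps a stray byte from crashing the run on Windows code pages.

```python
import subprocess
import sys

def run_script(args):
    """Run a pipeline script as a subprocess, capturing output as UTF-8.
    Illustrative sketch: decode with UTF-8 and replace undecodable bytes
    rather than crash on Windows code pages."""
    result = subprocess.run(
        args,
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # never die on a stray byte
    )
    return result.returncode, result.stdout

# Hypothetical usage: run any Python script and show its console output.
code, output = run_script([sys.executable, "-c", "print('stage complete')"])
print(code, output.strip())
```

The real interface additionally streams this output to the browser over WebSocket instead of waiting for the process to finish.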
"Cannot connect to server"
- Check if port 5000 is available
- Ensure Flask dependencies are installed: pip install flask flask-socketio
- Try restarting the interface: python run_ui.py
"Configuration not saving"
- Verify write permissions to the config/ directory
- Check for YAML syntax errors in manual edits
- Ensure all required parameters are filled
Authentication Errors:
- Verify Copernicus credentials in the .env file
- Check OpenAI API key format (starts with 'sk-')
Data Not Found:
- Ensure the PRODES .gpkg file is in the correct location
- Download FABDEM tiles covering your study area
No Patterns Detected:
- Reduce thresholds in sentinel_analysis.parameter_grid
- Increase max_candidates for more input data
- Check that the study area bounds cover deforested regions
Unicode/Encoding Errors:
- The web interface automatically handles UTF-8 encoding
- For command-line usage, set: set PYTHONIOENCODING=utf-8 (Windows) or export PYTHONIOENCODING=utf-8 (Linux/Mac)
Coordinate Issues:
- Verify study area bounds in parameters
- Ensure FABDEM tiles cover the study area
- Check that deforestation data exists in the region
- Configure Parameters: Use the web interface to set your study area and analysis parameters
- Run Setup: Initialize directories and validate configuration
- Execute Pipeline: Run the complete 3-stage archaeological detection workflow
- Run Checkpoints: Complete requirements with checkpoint analysis
- Review Results: Examine output files and interactive maps
- Generate Visualizations: Create professional site documentation
- Field Planning: Use coordinates for ground-truth validation
- Research Documentation: Analyze patterns and prepare academic publications
The pipeline implements a three-stage archaeological detection workflow enhanced with a modern web interface:
- Stage 1 - Deforestation Analysis: Identifies optimal areas for archaeological visibility through systematic analysis of TerraBrasilis PRODES data, applying temporal, spatial, and geometric filters to find areas where ancient settlements might be revealed.
- Stage 2 - Satellite Analysis: Downloads and processes Sentinel-2 imagery and computes NDVI patterns whose vegetation anomalies and geometric regularities can indicate subsurface archaeological features.
- Stage 3 - Elevation Validation: Uses FABDEM bare-earth elevation data to validate potential sites through statistical analysis of elevation signatures and terrain characteristics.
- Web Interface: Provides intuitive parameter management and real-time monitoring, making the pipeline accessible to non-technical users while maintaining full compatibility with command-line usage.
Each stage feeds into OpenAI GPT models for contextual interpretation and evidence synthesis, creating a comprehensive AI-enhanced archaeological discovery system.
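The NDVI at the heart of Stage 2 is a simple band ratio. Below is a minimal sketch with NumPy; the toy reflectance values are invented for illustration, and in the real pipeline the red and near-infrared bands come from Sentinel-2 rasters read with rasterio.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).
    Values near 1 indicate dense vegetation; anomalously low patches inside
    otherwise healthy canopy are the kind of signal Stage 2 flags."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)  # avoid /0
    return out

# Toy reflectances: healthy canopy vs. a stressed patch over a buried feature
healthy = ndvi(red=[0.05], nir=[0.45])   # high NDVI
stressed = ndvi(red=[0.15], nir=[0.30])  # noticeably lower NDVI
print(healthy[0], stressed[0])
```

The `ndvi_contrast_threshold` values in the parameter grid set how large a local NDVI dip must be before a pixel counts toward a candidate pattern.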
This pipeline contributes to the transformation of archaeological methodology in the AI era by:
- Scaling Discovery: Enables systematic exploration of previously inaccessible Amazon regions
- AI Integration: Demonstrates practical application of machine learning for heritage preservation
- Community Partnership: Supports indigenous communities in documenting cultural landscapes
- Methodological Innovation: Creates reproducible framework applicable to global archaeological research
- Accessibility: The web interface democratizes advanced AI-archaeological tools for researchers worldwide
The results contribute to understanding pre-Columbian civilizations while respecting indigenous rights and promoting collaborative research practices.
- CPU: Multi-core processor recommended for parallel processing
- RAM: 8GB minimum, 16GB recommended for large study areas
- Storage: 10-50GB depending on satellite data downloads
- Network: Stable internet connection for data downloads
- Study Area Size: Smaller areas (1°×1°) process faster than large regions
- Temporal Range: Limiting years reduces processing time
- Candidate Limits: Adjust max_candidates based on computational resources
- Parallel Processing: Multiple CPU cores are automatically utilized where possible
- v1.0.0 (2025): Initial release with complete pipeline and web interface
- Core archaeological detection algorithms
- Competition checkpoint compliance
- Professional web interface with real-time monitoring
- Comprehensive documentation and examples
We would like to express our sincere gratitude to our team members who contributed their valuable expertise in archaeology and anthropology:
- Yifan Wu: MS at UCL, Central Asian Studies
- Lienong Zhang: PhD student at University of Pittsburgh, Anthropology (society and technology in the Amazon Basin)
- Tianyu Yao: Freshman at Rutgers University, Anthropology
Special thanks to the open-source community and the organizations providing the essential datasets that make this research possible:
- TerraBrasilis/INPE for PRODES deforestation data
- ESA Copernicus for Sentinel-2 satellite imagery
- University of Bristol for FABDEM elevation data
- OpenAI for advanced language model capabilities