FusionFly is an open-source toolkit for processing and fusing GNSS (Global Navigation Satellite System) and IMU (Inertial Measurement Unit) data with Factor Graph Optimization (FGO). The system provides a modern web interface for uploading, processing, visualizing, and downloading standardized navigation data.
Check out the FusionFly demo video to see the system in action:
You can also:
- Download the Demo Video directly from the repository
Note: After cloning the repository, you can find the demo video in the public/assets directory.
FusionFly follows a standard client-server architecture with a React frontend, Express.js backend, and Redis job queue for processing large files.
┌─────────────────┐          ┌─────────────────┐          ┌─────────────────┐
│                 │          │                 │          │                 │
│  React Frontend │◄────────►│ Express Backend │◄────────►│   Redis Queue   │
│                 │   HTTP   │                 │   Jobs   │                 │
└────────┬────────┘          └────────┬────────┘          └────────┬────────┘
         │                            │                            │
         │                            │                            │
         ▼                            ▼                            ▼
┌─────────────────┐          ┌─────────────────┐          ┌─────────────────┐
│ User Interface  │          │  File Storage   │          │ Data Processing │
│ - File Upload   │          │ - Raw Files     │          │ - Conversion    │
│ - Visualization │          │ - Processed     │          │ - FGO           │
│ - Downloads     │          │ - Results       │          │ - Validation    │
└─────────────────┘          └─────────────────┘          └─────────────────┘
FusionFly processes data through a standardization pipeline:
┌───────────┐     ┌────────────┐     ┌───────────────┐     ┌───────────┐     ┌──────────────┐     ┌─────────────┐
│           │     │            │     │               │     │           │     │              │     │             │
│  Detect   │────►│ Process via│────►│  AI-Assisted  │────►│ Conversion│────►│    Schema    │────►│   Schema    │
│  Format   │     │  Standard  │     │    Parsing    │     │ Validation│     │  Conversion  │     │ Validation  │
│           │     │   Script   │     │  (if needed)  │     │           │     │              │     │             │
└───────────┘     └────────────┘     └───────────────┘     └───────────┘     └──────────────┘     └─────────────┘
                        │                 │                      │                 │                    │
                        │                 │                      │                 │                    │
                        ▼                 ▼                      ▼                 ▼                    ▼
                  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
                  │                                                                                             │
                  │                                   Automated Feedback Loop                                   │
                  │                                                                                             │
                  └─────────────────────────────────────────────────────────────────────────────────────────────┘
- Detect Format
  - Analyzes file extension and content to determine the data format
  - Identifies the appropriate processing pathway
- Try with standard script:
  - RINEX (.obs files)
    - Processes using the georinex library
    - Converts to a standardized format with timestamps
    - Falls back to AI-assisted parsing if standard conversion fails
  - NMEA (.nmea files)
    - Processes NMEA sentences using pynmea2 (a minimal parsing sketch follows this list)
    - Handles common message types (GGA, RMC)
    - Extracts timestamps and coordinates
    - Uses AI-assisted parsing when needed
  - Unknown Formats
    - Analyzes file content to determine structure
    - Generates appropriate conversion logic
    - Extracts relevant location data
- AI-Assisted Parsing
  - If the standard script fails, calls the Azure OpenAI service with a snippet of the data
  - Executes the generated conversion code automatically in the backend
  - Provides detailed error information to improve subsequent attempts
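As a concrete illustration of the standard-script pathway, here is a minimal NMEA-parsing sketch using pynmea2, the library named above. The output record fields (time, latitude_deg, longitude_deg, altitude_m) are illustrative assumptions, not FusionFly's actual intermediate schema:

```python
# Minimal sketch of the NMEA "standard script" pathway using pynmea2.
# Output field names are illustrative, not FusionFly's actual schema.
import json
import pynmea2

def nmea_to_jsonl(nmea_path: str, out_path: str) -> None:
    """Parse GGA sentences and emit one JSON record per fix."""
    with open(nmea_path) as src, open(out_path, "w") as dst:
        for line in src:
            try:
                msg = pynmea2.parse(line.strip())
            except pynmea2.ParseError:
                continue  # malformed sentence; the AI-assisted path would handle these
            if isinstance(msg, pynmea2.GGA):
                record = {
                    "time": str(msg.timestamp),        # UTC time of fix
                    "latitude_deg": msg.latitude,      # signed decimal degrees
                    "longitude_deg": msg.longitude,
                    "altitude_m": msg.altitude,
                }
                dst.write(json.dumps(record) + "\n")
```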
FusionFly implements robust validation and fallback mechanisms for each LLM step in the AI-assisted conversion pipeline:
Module 1: Format Conversion
- Script Generation: The LLM generates a complete Node.js script to process the input file
- Validation: The system executes the generated script and validates that the output conforms to the expected JSONL format
- Fallback Mechanism:
  - When the script fails to execute or produces invalid output, the system captures the specific errors
  - Error details are fed back to the LLM in a structured format for an improved retry
  - The LLM is instructed to fix the specific issues in its next script generation attempt
  - The system makes up to 3 attempts with increasingly detailed error feedback
Module 2: Location Data Extraction
- Script Generation: The LLM generates a specialized Node.js script to extract location data from the first-stage output
- Validation: The system executes the script and validates coordinates (latitude/longitude within valid ranges), timestamps, and the presence of required fields
- Fallback Mechanism:
  - Detects script execution errors or missing/invalid location data in the extraction output
  - Provides field-specific guidance to the LLM about conversion issues
  - Includes examples of proper formatting in the error feedback
  - Retries with progressively reinforced instructions (the shared retry loop is sketched below)
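Both modules share the same generate, execute, validate retry loop. The sketch below shows that pattern only; the three callables are hypothetical stand-ins for FusionFly's Azure OpenAI call, script runner, and output validator, which are not shown here:

```python
# Sketch of the shared generate -> execute -> validate retry loop. The
# injected callables are hypothetical stand-ins, not FusionFly's real code.
from typing import Callable, Optional, Tuple

def convert_with_fallback(
    data_snippet: str,
    input_path: str,
    generate_script: Callable[[str, str], str],
    execute_script: Callable[[str, str], Tuple[str, Optional[str]]],
    validate_output: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> str:
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        script = generate_script(data_snippet, feedback)      # LLM generation
        output_path, exec_error = execute_script(script, input_path)
        problem = exec_error or validate_output(output_path)  # capture errors
        if problem is None:
            return output_path                                # valid output
        # Structured feedback tells the LLM exactly what to fix next time.
        feedback += f"\nAttempt {attempt} failed: {problem}"
    raise RuntimeError(f"Conversion failed after {max_attempts} attempts:{feedback}")
```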
Module 3: Schema Conversion

Submodule 1: Direct Schema Conversion
- Takes a small sample of the location data produced by Module 2
- Directly converts it to the target schema format without generating code
- Validates the converted sample against the schema requirements
- Produces a set of high-quality examples to guide the second submodule
- The prompt focuses on direct data transformation, not code generation
Submodule 2: Transformation Script Generation
- Takes both the raw data and the converted examples from Submodule 1
- Generates a Node.js transformation script based on the pattern shown in the examples
- The script is executed to process the entire dataset
- The output is validated against the schema requirements
- The prompt includes both the input data and properly formatted examples for pattern matching
Enhanced Validation and Feedback Flow (sketched below):
- If Submodule 1 fails, the entire process fails (since its examples are required for Submodule 2)
- Detailed error messages identify specific schema non-compliance issues
- Output from both submodules is validated against the target schema
- The system retries with improved instructions based on the specific validation failures
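A hedged sketch of the two-submodule flow follows; every callable is a hypothetical stand-in for the LLM calls, script runner, and schema validator, and the sample size of 5 is an assumption:

```python
# Sketch of the two-submodule flow under the assumptions stated above.
from typing import Callable, List, Optional

def schema_convert(
    records: List[dict],
    direct_convert: Callable[[List[dict]], List[dict]],          # Submodule 1
    generate_transform: Callable[[List[dict], List[dict]], str], # Submodule 2
    run_transform: Callable[[str, List[dict]], List[dict]],
    validate: Callable[[List[dict]], Optional[str]],
    sample_size: int = 5,
) -> List[dict]:
    sample = records[:sample_size]
    examples = direct_convert(sample)              # direct conversion, no code
    error = validate(examples)
    if error:
        # Examples are required downstream, so the whole process fails here.
        raise RuntimeError(f"Submodule 1 produced non-compliant examples: {error}")
    script = generate_transform(sample, examples)  # pattern-matching script
    converted = run_transform(script, records)     # process the entire dataset
    error = validate(converted)
    if error:
        raise RuntimeError(f"Transformed output violates schema: {error}")
    return converted
```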
In the enhanced third LLM module, the system first attempts direct schema conversion with Submodule 1, then uses both the original data and the converted examples to generate a transformation script with Submodule 2. The script is then executed to process the entire dataset, with validation at each step ensuring compliance with the pre-defined schema.
End to end, the three modules form a single transformation pipeline: format conversion, then location extraction, then the final schema conversion, with each module contributing one stage of the overall process.
- Comprehensive test suite covers each LLM step with:
- Happy path tests with valid inputs and expected outputs
- Error handling tests with malformed inputs
- Edge case tests (empty files, missing fields, etc.)
- API error simulation and recovery tests
- Validation and fallback mechanism tests
- The entire pipeline implements a closed feedback loop where:
- Each step validates the output of the previous step
- Validation errors are captured in detail
- Structured error information guides the next LLM attempt
- System learns from previous failures to improve conversion quality
- Detailed logs are maintained for debugging and improvement
This multi-layer validation and fallback approach ensures robust processing even with challenging or unusual data formats, significantly improving the reliability of the AI-assisted conversion pipeline.
- Conversion Validation
  - Runs comprehensive unit tests on the converted data
  - Validates correct JSONL formatting and data integrity
  - If validation fails, feeds the error data back to the LLM
  - Regenerates conversion scripts up to 10 times until the data is correctly converted to JSONL
- Schema Conversion
  - After converting to JSONL, extracts data entries into the target schema
  - Uses the enhanced two-submodule approach:
    - First directly converts a small sample to the target schema
    - Then generates a script using both the raw data and the converted examples
  - Handles complex field mappings and data transformations
  - Applies data cleaning and normalization rules
  - Produces structurally consistent output conforming to the target schema
- Schema Validation (see the validation sketch after this list)
  - Performs rigorous validation against the required schema structure
  - Specifically verifies that entry names match the target schema exactly
  - Validates data types, required fields, and structural constraints
  - If validation fails, triggers the fallback mechanism:
    - Reports the specific validation errors to the LLM
    - Generates improved conversion code with corrected field mappings
    - Re-processes the data with the enhanced instructions
    - Repeats until the output fully conforms to the schema specifications
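A minimal sketch of per-entry schema validation, using the jsonschema package as an assumed validator (the backend's actual checks are not shown here). The target schema is illustrative, built from field names that appear elsewhere in this document:

```python
# Per-entry JSONL schema validation sketch using the jsonschema package.
import json
from jsonschema import Draft7Validator

TARGET_SCHEMA = {  # illustrative target schema, not FusionFly's actual one
    "type": "object",
    "required": ["time_unix", "position_lla"],
    "properties": {
        "time_unix": {"type": "number"},
        "position_lla": {
            "type": "object",
            "required": ["latitude_deg", "longitude_deg", "altitude_m"],
            "properties": {
                "latitude_deg": {"type": "number", "minimum": -90, "maximum": 90},
                "longitude_deg": {"type": "number", "minimum": -180, "maximum": 180},
                "altitude_m": {"type": "number"},
            },
        },
    },
}

def validate_jsonl(path: str) -> list:
    """Collect human-readable errors, suitable for feeding back to the LLM."""
    validator = Draft7Validator(TARGET_SCHEMA)
    errors = []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            for err in validator.iter_errors(json.loads(line)):
                where = ".".join(str(p) for p in err.path) or "(root)"
                errors.append(f"line {lineno}, {where}: {err.message}")
    return errors
```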
Each step includes comprehensive error handling and logging, allowing for detailed diagnostics and continuous improvement of the conversion process. The entire pipeline is designed to handle variations in input data formats while ensuring consistent, standardized output.
FusionFly processes data through a well-defined pipeline:
┌───────────┐     ┌───────────┐     ┌───────────┐     ┌───────────┐     ┌───────────┐
│           │     │           │     │           │     │           │     │           │
│   File    │────►│  Format   │────►│ Conversion│────►│  Process  │────►│  Output   │
│  Upload   │     │ Detection │     │ to JSONL  │     │ & Fusion  │     │  Results  │
│           │     │           │     │           │     │           │     │           │
└───────────┘     └───────────┘     └───────────┘     └───────────┘     └───────────┘
      │                 │                 │                 │                 │
      ▼                 ▼                 ▼                 ▼                 ▼
┌───────────────────────────────────────────────────────────────────────────────────┐
│                                                                                   │
│   Supported Input Formats                   │   Output Formats                    │
│   ───────────────────────                   │   ──────────────                    │
│   GNSS:                                     │   Standardized JSONL                │
│   - RINEX (.obs, .rnx, .21o)                │   Location Data                     │
│   - NMEA (.nmea, .gps, .txt)                │   Trajectory Visualization          │
│   - UBX (binary)                            │   Validation Reports                │
│   - JSON, CSV                               │                                     │
│                                             │                                     │
│   IMU:                                      │                                     │
│   - Raw IMU data (.imu)                     │                                     │
│   - CSV, JSON, TXT                          │                                     │
│                                             │                                     │
└───────────────────────────────────────────────────────────────────────────────────┘
The frontend is built with React and provides a modern user interface for interacting with the system. It includes:
- Home Page: Overview of the system and its capabilities
- Upload Interface:
- Drag-and-drop file upload for GNSS and IMU data
- Progress tracking for uploads and processing
- Format detection and validation
- Files Page:
- List of processed files with metadata
- Download options for processed data
- Cache management
- Results Visualization: (Coming soon)
- Trajectory visualization
- Error analysis
- Quality metrics
The backend provides the API endpoints and processing logic:
- API Layer:
- RESTful API for file operations
- Status reporting
- Error handling
- Processing Engine:
- Format detection and conversion
- GNSS data parsing (RINEX, NMEA, UBX)
- IMU data processing
- Data fusion with FGO (Factor Graph Optimization)
- Storage Management:
- File storage
- Processing results
- Cache management
Long-running processing tasks are handled by a Redis-backed job queue:
- Job Management:
- Job creation and tracking
- Progress reporting
- Error handling and retries
- Worker Processes:
- File conversion
- Data processing
- Result generation
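FusionFly's actual queue is Redis-backed and driven from the Express.js backend; the Python sketch below (using the rq package, an assumption chosen for brevity) only illustrates the enqueue-and-track pattern, not the real implementation:

```python
# Illustrative job-queue pattern only; FusionFly's real queue is driven
# from the Express.js backend. Requires a running Redis server and an
# `rq worker fusionfly-jobs` process to consume jobs.
from redis import Redis
from rq import Queue

def process_file(file_id: str) -> str:
    """Placeholder worker task: convert, process, and store results."""
    return f"processed:{file_id}"

queue = Queue("fusionfly-jobs", connection=Redis())
job = queue.enqueue(process_file, "upload-123")  # job creation
print(job.id, job.get_status())                  # job tracking
```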
FusionFly includes a comprehensive scientific benchmark system for evaluating the accuracy, robustness, and efficiency of navigation data transformation processes.
The benchmark system is organized as follows:
benchmark/
├── raw/              # Raw, unformatted navigation data
│   ├── gnss/         # GNSS data in various formats
│   └── imu/          # IMU data in various formats
├── standardized/     # Standardized data (ground truth)
├── test_cases/       # Test cases for benchmarking
│   ├── normal/       # Normal operating scenarios
│   └── edge_cases/   # Challenging data scenarios
├── metadata/         # Dataset and schema information
├── evaluation/       # Evaluation tools and metrics
└── results/          # Benchmark results
The benchmark includes carefully constructed test scenarios:
- Normal Test Cases:
  - Medium Urban Environment with NMEA
  - Medium Urban Environment with RINEX OBS
  - Tunnel Environment (IMU-only)
- Edge Cases:
  - Missing Data: files with various fields removed (5-30%)
  - Corrupted Data: files with invalid/extreme values
  - Format Variations: different field ordering, units, etc.
Accuracy Metrics
These metrics quantify how precisely the transformed data matches ground truth values across all navigation parameters.
Implementation Logic:
- Points from the ground truth and transformed datasets are matched by timestamp
- Field-by-field comparison is performed using dot notation (e.g., `position_lla.latitude_deg`)
- Statistical measures are calculated for each field
Core Metrics:
- Mean Absolute Error (MAE): Average absolute difference between original and transformed values
mae = np.mean(errors) # Simple, interpretable measure of error magnitude
- Root Mean Square Error (RMSE): Square root of average squared differences
rmse = np.sqrt(np.mean(np.array(errors) ** 2)) # Penalizes larger errors
- Normalized RMSE: RMSE normalized by the range of original values
nrmse = rmse / (max(gt_values) - min(gt_values)) # Makes errors comparable across different measurement types
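Putting the pieces together, here is a worked sketch of these three metrics, matching points by timestamp and resolving fields by dot notation as described under Implementation Logic. The millisecond rounding used for timestamp matching is an assumption, not the benchmark's actual code:

```python
# Worked sketch of MAE / RMSE / NRMSE for one field; timestamp matching
# by millisecond rounding is an assumption.
import numpy as np

def get_field(record: dict, dotted: str):
    """Resolve a dot-notation path such as 'position_lla.latitude_deg'."""
    for key in dotted.split("."):
        record = record[key]
    return record

def field_metrics(ground_truth: list, converted: list, field: str) -> dict:
    gt_by_time = {round(r["time_unix"], 3): r for r in ground_truth}
    errors, gt_values = [], []
    for rec in converted:
        gt = gt_by_time.get(round(rec["time_unix"], 3))
        if gt is None:
            continue  # no ground-truth point at this timestamp
        gt_val = get_field(gt, field)
        errors.append(abs(get_field(rec, field) - gt_val))
        gt_values.append(gt_val)
    errors = np.array(errors)
    mae = errors.mean()                               # magnitude of error
    rmse = np.sqrt((errors ** 2).mean())              # penalizes large errors
    nrmse = rmse / (max(gt_values) - min(gt_values))  # range-normalized
    return {"mae": mae, "rmse": rmse, "nrmse": nrmse}
```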
Specialized Accuracy Metrics:
- Coordinate transformation accuracy (ECEF↔LLA)
- Timestamp conversion precision (microseconds)
- Sampling rate preservation
- Structural schema compliance
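For the coordinate-transformation check, a round-trip ECEF↔LLA sketch using pyproj is shown below (an assumption; the benchmark's own transform code may differ). The ECEF coordinates are illustrative values near Hong Kong:

```python
# Round-trip ECEF <-> LLA accuracy check with pyproj (assumed tooling).
from pyproj import Transformer

ecef_to_lla = Transformer.from_crs("EPSG:4978", "EPSG:4979", always_xy=True)
lla_to_ecef = Transformer.from_crs("EPSG:4979", "EPSG:4978", always_xy=True)

x, y, z = -2418506.5, 5386316.6, 2405210.8   # ECEF metres (illustrative)
lon, lat, alt = ecef_to_lla.transform(x, y, z)
x2, y2, z2 = lla_to_ecef.transform(lon, lat, alt)
round_trip_m = max(abs(x - x2), abs(y - y2), abs(z - z2))
print(f"lat={lat:.8f} lon={lon:.8f} round-trip error={round_trip_m:.2e} m")
```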
Robustness Metrics
These metrics evaluate how well the system handles variations, edge cases, and challenging inputs.
Implementation Logic:
- Test files contain intentionally corrupted or challenging data
- System processes these files and results are compared to expected handling
- Specific types of data corruption are systematically introduced:
{ "time_unix": 1621218775.5489783, "linear_acceleration": { "x": 100.0, // Extreme value (corrupted) "y": -100.0, // Extreme value (corrupted) "z": 100.0 // Extreme value (corrupted) } }
Core Metrics:
- Success Rate: Percentage of edge cases successfully processed
- Error Recovery Rate: Percentage of corrupted data points properly handled
- Accuracy Under Stress: Field accuracy metrics on edge case data
Specialized Robustness Metrics:
- Missing data handling at 5%, 10%, 20%, and 30% levels
- Outlier detection and filtering performance
- Special value handling (NaN, infinity, null)
- Response to inconsistent sampling rates
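As a small example of the special-value handling these tests probe, the filter below rejects IMU records whose acceleration components are null, NaN, infinite, or outside a plausible range; the ±50 m/s² bound is an illustrative threshold, not FusionFly's actual limit:

```python
# Special-value and outlier filtering sketch; the limit is illustrative.
import math

def is_valid_imu(record: dict, limit: float = 50.0) -> bool:
    acc = record.get("linear_acceleration") or {}
    for axis in ("x", "y", "z"):
        value = acc.get(axis)
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            return False  # null, missing, NaN, or infinite component
        if abs(value) > limit:
            return False  # extreme value, e.g. the corrupted sample above
    return True

corrupted = {"time_unix": 1621218775.55,
             "linear_acceleration": {"x": 100.0, "y": -100.0, "z": 100.0}}
assert not is_valid_imu(corrupted)
```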
Efficiency Metrics
These metrics measure the computational resources required for data transformation.
Implementation Logic:
- Performance monitoring during transformation process
- Resource usage tracking using the `psutil` library
Core Metrics:
- Transformation Time: Processing time per data point
- Peak Memory Usage: Maximum memory consumption during processing
- CPU Utilization: Average and peak CPU usage percentage
- Size Ratio: Ratio of transformed data size to original data size
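Here is a sketch of how these measurements can be taken with psutil, the library named above; `transform_file` is a stand-in for the real pipeline, and post-run RSS is used as a simple proxy for peak memory:

```python
# Efficiency measurement sketch using psutil (stated assumptions above).
import os
import time
from typing import Callable

import psutil

def profile_transformation(transform_file: Callable[[], None], n_points: int) -> dict:
    proc = psutil.Process(os.getpid())
    psutil.cpu_percent(interval=None)               # prime the CPU counter
    start = time.perf_counter()
    transform_file()
    elapsed = time.perf_counter() - start
    return {
        "time_per_point_ms": 1000.0 * elapsed / max(n_points, 1),
        "memory_mb": proc.memory_info().rss / 1e6,  # RSS after the run (proxy)
        "cpu_percent": psutil.cpu_percent(interval=None),  # avg since priming
    }
```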
The benchmark system is fully automated:
# Run the complete benchmark suite
./run_benchmark.sh
# Evaluate specific test cases
python evaluation/metrics.py --ground-truth standardized/ --converted results/
python evaluation/benchmark.py --input-dir test_cases/normal/case1/ --output-dir results/

The benchmark generates comprehensive reports with:
- Statistical summaries of all metrics
- Field-by-field comparisons between original and transformed data
- Error distribution visualizations
- Overall quality score based on weighted metrics
For detailed technical information about metrics implementation, see Benchmark Metrics Implementation.
┌──────────────────────┐     ┌──────────────────────┐     ┌──────────────────────┐
│                      │     │                      │     │                      │
│        Client        │     │        Server        │     │      Processing      │
│                      │     │                      │     │                      │
│ 1. Select Files      │────►│ 1. Receive Files     │────►│ 1. Detect Format     │
│ 2. Upload            │     │ 2. Store Files       │     │ 2. Convert to JSONL  │
│ 3. Monitor Progress  │◄────│ 3. Create Job        │     │ 3. Extract Location  │
│ 4. View Results      │     │ 4. Return Job ID     │     │ 4. Validate Data     │
│ 5. Download Output   │◄────│ 5. Serve Results     │◄────│ 5. Generate Output   │
│                      │     │                      │     │                      │
└──────────────────────┘     └──────────────────────┘     └──────────────────────┘
FusionFly exposes the following RESTful API endpoints:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/files/upload` | POST | Upload GNSS and/or IMU files |
| `/api/files/status/:id` | GET | Check processing status for a job |
| `/api/files/list` | GET | List all processed files |
| `/api/files/download/:id` | GET | Download a processed file |
| `/api/files/clear-cache` | POST | Clear all cached files |
| `/api/health` | GET | Check API health |
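A hedged usage sketch of these endpoints with Python's requests package follows. The port, the multipart field name (`gnssFile`), and the response keys (`jobId`, `status`) are assumptions about the payload, not documented values:

```python
# Upload a file, poll its job status, then download the result.
# Port, field name, and response keys are assumptions (see above).
import time
import requests

BASE = "http://localhost:5000"  # assumed backend address

with open("rover.obs", "rb") as f:
    resp = requests.post(f"{BASE}/api/files/upload", files={"gnssFile": f})
job_id = resp.json()["jobId"]

while True:  # poll processing status for the job
    status = requests.get(f"{BASE}/api/files/status/{job_id}").json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)

if status["status"] == "completed":
    data = requests.get(f"{BASE}/api/files/download/{job_id}")
    with open("result.jsonl", "wb") as out:
        out.write(data.content)
```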
FusionFly uses Factor Graph Optimization (FGO) to fuse GNSS and IMU data. This approach:
- Creates a graph where nodes represent states (position, velocity, orientation)
- Adds edges representing constraints from sensor measurements
- Optimizes the graph to find the most likely trajectory
- Produces a consistent navigation solution robust to sensor errors
Benefits of FGO:
- Handles sensor outages and degraded signals
- Provides accurate positioning in challenging environments
- Combines complementary sensor characteristics:
- GNSS: Absolute positioning, drift-free
- IMU: High rate, orientation, robust to signal loss
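To make the idea concrete, here is a toy one-dimensional factor-graph optimization, independent of FusionFly's actual FGO backend: GNSS factors constrain absolute positions (noisy), IMU odometry factors constrain consecutive differences (precise), and weighted linear least squares optimizes the whole graph at once:

```python
# Toy 1-D factor-graph optimization (not FusionFly's backend): nodes are
# positions x_0..x_4; each factor becomes one weighted least-squares row.
import numpy as np

gnss = np.array([0.0, 1.3, 1.8, 3.2, 4.1])   # absolute fixes, sigma ~ 0.5 m
imu_delta = np.array([1.0, 1.0, 1.0, 1.0])   # step displacements, sigma ~ 0.05 m
w_gnss, w_imu = 1 / 0.5, 1 / 0.05            # rows weighted by 1/sigma

n = len(gnss)
rows, rhs = [], []
for i, g in enumerate(gnss):                 # GNSS factors: x_i ~ g_i
    row = np.zeros(n)
    row[i] = 1.0
    rows.append(w_gnss * row)
    rhs.append(w_gnss * g)
for i, d in enumerate(imu_delta):            # IMU factors: x_{i+1} - x_i ~ d_i
    row = np.zeros(n)
    row[i], row[i + 1] = -1.0, 1.0
    rows.append(w_imu * row)
    rhs.append(w_imu * d)

x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print(np.round(x, 3))  # smooth trajectory near [0.08, 1.08, 2.08, 3.08, 4.08]
```

The precise IMU factors keep the trajectory smooth while the noisy GNSS factors anchor its absolute position, which is exactly the complementary behavior described above.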
- Node.js (v16+)
- npm or yarn
- Redis server
- For production deployment:
- Azure Cosmos DB account
- Azure Blob Storage account
- Vercel account (optional)
- Clone the repository:
  git clone https://github.com/Thorkee/FusionFly.git
  cd FusionFly
- Install dependencies:
  npm run install:all
- Set up environment:
  cp backend/.env.example backend/.env
  # Edit .env with your configuration
- Start the development servers:
  npm run dev
If you wish to deploy the application to Vercel, follow these steps:
- Import the project in Vercel Dashboard
- Configure environment variables:
  - All variables from `backend/.env.example`
  - Set `USE_LOCAL_DB_FALLBACK=false` for production
  - Add your Cosmos DB and Blob Storage credentials
- Deploy the backend service
- Update `frontend/.env.production` with your backend URL
- Import the frontend project in Vercel Dashboard
- Deploy the frontend application
- Create containers in Azure Blob Storage:
  - `uploads`
  - `processed`
  - `results`
- Initialize Cosmos DB:
  - The application will automatically create the database and containers on first run
  - No manual initialization is required
- Navigate to `http://localhost:3000` in your browser
- Upload GNSS and/or IMU data files on the Upload page
- Monitor processing status
- View and download results from the Files page
FusionFly/
├── frontend/               # React frontend
│   ├── public/             # Static assets
│   └── src/                # React components and logic
│       ├── components/     # Reusable UI components
│       └── pages/          # Main application pages
├── backend/                # Express.js backend
│   └── src/
│       ├── controllers/    # API controllers
│       ├── services/       # Business logic
│       ├── routes/         # API routes
│       ├── models/         # Data models
│       └── utils/          # Utility functions
├── uploads/                # Uploaded and processed files
└── test-files/             # Test data for development
- Basic GNSS data processing (RINEX, NMEA, UBX)
- Multi-format conversion to standardized JSONL
- File upload and download functionality
- IMU data support
- Complete GNSS+IMU fusion with FGO
- Interactive trajectory visualization
- Batch processing
- User authentication and file management
- Performance optimizations for large datasets
- If file uploads fail, check your Blob Storage connection string
- For authentication issues, verify your JWT secret
- If you encounter Cosmos DB errors, ensure your endpoint and key are correct
This project is licensed under the MIT License with Citation Requirements. When using FusionFly, you must cite:
- The original research paper:
  M. J. L. Lee, J. Lin and L.-T. Hsu, "Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning," 2024 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN), Kowloon, Hong Kong, 2024, pp. 1-6, doi: 10.1109/IPIN62893.2024.10786123.
- The FusionFly repository:
  https://github.com/Thorkee/FusionFly
See the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request or open an Issue on GitHub.
- Built with React, Express.js, and Redis
- Uses Factor Graph Optimization techniques
- Inspired by modern GNSS+IMU fusion research



