Ask ocean data questions. Get answers, not files.
FloatChat is a conversational interface for exploring ARGO oceanographic data. Users can query oceanographic datasets in natural language and receive visualizations such as plots, maps, and summaries.
ARGO datasets are powerful but difficult to work with due to their size and format. FloatChat reduces this friction by providing a chat-based interface for data exploration instead of manual data processing.
The project was built to make oceanographic data more accessible for students, researchers, and analysts without requiring prior experience with NetCDF files or scripting workflows.
- Natural language querying over ARGO float data using a RAG pipeline
- Automatic SQL generation from natural language questions
- Interactive visualizations with Plotly (plots, maps, heatmaps)
- Semantic search across float metadata and measurements via ChromaDB
- User authentication with local accounts and Google OAuth
- Real-time chat interface with message history
- Session management for persistent conversations
- Kubernetes-ready with full deployment manifests
Screenshots: Hero Section | The Problem & Solution | Key Features | Why It Matters | Interactive Prompting Interface
Frontend (Next.js)
- Next.js 15 with React 19
- TypeScript
- TailwindCSS
- Plotly.js & React-Plotly for data visualization
- React Markdown for message formatting
- Framer Motion for animations
Backend API (Node.js)
- Express.js
- TypeScript
- Passport.js (Local & Google OAuth)
- PostgreSQL with pg driver
- Session management with connect-pg-simple
AI/ML Service (Python)
- FastAPI
- LangChain for RAG (Retrieval-Augmented Generation)
- Groq API (Llama 3.3 70B)
- HuggingFace Embeddings (all-MiniLM-L6-v2)
- LangChain SQL Agent for database queries
Databases
- PostgreSQL with PostGIS extension
- ChromaDB for vector embeddings
Data Processing
- Xarray for NetCDF files
- Pandas for data manipulation
- psycopg2 for database operations
Infrastructure
- Docker & Docker Compose
- Kubernetes manifests (k8s/)
FloatChat follows a microservices architecture with three main services:
Frontend (Next.js)
- Next.js application with React components
- Handles user authentication, chat interface, and data visualization
- Communicates with the backend API for user management
- Connects to the AI service for chat functionality
Backend API (Node.js)
- Express.js REST API
- Manages user authentication (local + Google OAuth)
- Handles session management with PostgreSQL
- Provides authentication endpoints
AI/ML Service (Python)
- FastAPI-based service
- Implements a RAG (Retrieval-Augmented Generation) pipeline
- Uses LangChain with Groq's Llama 3.3 70B model
- Queries the PostgreSQL database using natural language
- Retrieves context from the ChromaDB vector store
- Generates SQL queries and returns structured responses
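The retrieval step of the AI service can be illustrated without any of the real dependencies. The sketch below is a stand-in, not code from this repository: it swaps the all-MiniLM-L6-v2 embeddings and ChromaDB for a bag-of-words cosine similarity in plain Python. The profile summaries (including the float numbers) are made up, and `retrieve` and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter

# Made-up profile summaries; the real service embeds summaries
# generated from ARGO profiles and stores them in ChromaDB.
SUMMARIES = [
    "Float 1901234 profile in the Arabian Sea, surface temperature 28.1 C, salinity 36.2",
    "Float 2902345 profile in the Bay of Bengal, surface temperature 29.0 C, salinity 33.5",
    "Float 5904567 profile in the Southern Ocean, surface temperature 2.3 C, salinity 34.0",
]


def vectorize(text):
    """Turn text into a bag-of-words count vector (stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    query_vec = vectorize(query)
    ranked = sorted(
        documents, key=lambda doc: cosine(query_vec, vectorize(doc)), reverse=True
    )
    return ranked[:k]


def build_prompt(query, context):
    """Combine retrieved context and the user question into one LLM prompt."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}"


question = "salinity in the Arabian Sea"
top = retrieve(question, SUMMARIES, k=1)
print(build_prompt(question, top))
```

In the real service the prompt would then go to the LLM; here it is only printed, which keeps the example runnable offline.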
RAG & MCP Pipeline Architecture
- Data Ingestion: ARGO NetCDF files are downloaded and parsed using Xarray
- Data Storage: Extracted measurements, profiles, and float metadata are stored in PostgreSQL with PostGIS
- Embedding Generation: Profile summaries are generated and embedded using HuggingFace models
- Vector Storage: Embeddings are stored in ChromaDB for semantic search
- User Query: User sends natural language query through the Next.js frontend
- RAG Pipeline:
  - Query is processed by the LangChain agent
  - Relevant context is retrieved from ChromaDB
  - SQL queries are generated to fetch data from PostgreSQL
  - LLM synthesizes the response
- Visualization: Response is rendered with Plotly charts in the frontend
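The storage, SQL, and visualization steps can be sketched end to end with only the Python standard library. Everything here is an assumption made for illustration: `sqlite3` stands in for PostgreSQL with PostGIS, the tables are a simplified version of the floats, profiles, and measurements schema in this README (no `location` column), the data values are made up, and `GENERATED_SQL` is a hand-written example of the kind of query the agent might produce. The `figure` dictionary follows Plotly's figure JSON shape (`data` and `layout`).

```python
import json
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE floats (float_id INTEGER PRIMARY KEY, wmo_number TEXT);
    CREATE TABLE profiles (
        profile_id INTEGER PRIMARY KEY,
        float_id INTEGER REFERENCES floats(float_id),
        cycle_number INTEGER,
        latitude REAL,
        longitude REAL
    );
    CREATE TABLE measurements (
        measurement_id INTEGER PRIMARY KEY,
        profile_id INTEGER REFERENCES profiles(profile_id),
        pressure REAL,
        temperature REAL,
        salinity REAL
    );
    """
)

# Made-up sample data: one float, one profile, three depth levels.
conn.execute("INSERT INTO floats VALUES (1, '1901234')")
conn.execute("INSERT INTO profiles VALUES (1, 1, 12, 15.5, 65.0)")
conn.executemany(
    "INSERT INTO measurements (profile_id, pressure, temperature, salinity) "
    "VALUES (?, ?, ?, ?)",
    [(1, 5.0, 28.1, 36.2), (1, 100.0, 22.4, 35.9), (1, 500.0, 11.8, 35.3)],
)

# Hypothetical SQL for "show the temperature profile of float 1901234".
GENERATED_SQL = """
SELECT m.pressure, m.temperature
FROM measurements AS m
JOIN profiles AS p ON p.profile_id = m.profile_id
JOIN floats AS f ON f.float_id = p.float_id
WHERE f.wmo_number = ?
ORDER BY m.pressure
"""
rows = conn.execute(GENERATED_SQL, ("1901234",)).fetchall()

# Plotly-style figure: temperature on x, pressure on a reversed y axis
# so that depth increases downward, as in a typical ocean profile plot.
figure = {
    "data": [
        {
            "type": "scatter",
            "mode": "lines+markers",
            "x": [temperature for _, temperature in rows],
            "y": [pressure for pressure, _ in rows],
        }
    ],
    "layout": {
        "xaxis": {"title": {"text": "Temperature (deg C)"}},
        "yaxis": {"title": {"text": "Pressure (dbar)"}, "autorange": "reversed"},
    },
}
print(json.dumps(figure, indent=2))
```

A frontend using React-Plotly could pass `figure["data"]` and `figure["layout"]` straight to its plot component, which is one reason to return the chart as plain JSON rather than as an image.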
CREATE TABLE floats (
    float_id INTEGER PRIMARY KEY,
    wmo_number VARCHAR(20),
    data_center VARCHAR(50),
    platform_type VARCHAR(100),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE profiles (
    profile_id SERIAL PRIMARY KEY,
    float_id INTEGER REFERENCES floats(float_id),
    cycle_number INTEGER,
    profile_date TIMESTAMPTZ,
    latitude DECIMAL(10,7),
    longitude DECIMAL(10,7),
    location GEOGRAPHY(POINT, 4326),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE measurements (
    measurement_id SERIAL PRIMARY KEY,
    profile_id INTEGER REFERENCES profiles(profile_id),
    pressure DECIMAL(10,3),
    temperature DECIMAL(10,4),
    salinity DECIMAL(10,4),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

floatchat/
├── app/ # Next.js frontend application
│ ├── src/
│ │ ├── app/
│ │ ├── components/
│ │ ├── lib/
│ │ └── utils/
│ ├── public/
│ └── Dockerfile
│
├── api/ # Express.js backend API
│ ├── src/
│ │ ├── controllers/
│ │ ├── models/
│ │ ├── routes/
│ │ ├── passport.ts
│ │ └── server.ts
│ └── Dockerfile
│
├── ai/ # Python FastAPI AI service
│ ├── src/
│ │ ├── api/
│ │ ├── core/
│ │ ├── database/
│ │ ├── llm/
│ │ └── schemas/
│ ├── scripts/
│ ├── data/chroma_db/
│ └── Dockerfile
│
├── data_processing/ # ARGO data scraping scripts
│ └── scrape-argo-data.py
│
├── k8s/ # Kubernetes deployment manifests
│ ├── ai/
│ ├── api/
│ ├── app/
│ ├── db/
│ └── ingress.yaml
│
└── images/ # Project screenshots and assets
FloatChat was developed as part of the Smart India Hackathon.
The project cleared the internal hackathon at VIT and was shortlisted for the SIH finals from VIT Vellore.
This repository contains the final version submitted during the selection process.
Team BoyOhBuoy - Smart India Hackathon 2025
- Aakashdeep Singh - Backend, Agentic AI Development & Pipelining
- Upayan Mazumder - Frontend/Backend Integration with AI
- Saksham Dubey - Backend Development & Integration
- Maneet Gupta - AI Development
- Chitrita Gahlot - UI/UX Design
- Ashman Sodhi - Machine Learning
Mentor: Professor Manoov R



