Datakult0r/ai-data-science-team

AI Data Science Team

AI Data Science Team is a Python library of specialized agents for common data science workflows, plus a flagship app: AI Pipeline Studio. The Studio turns your work into a visual, reproducible pipeline, while the AI team handles data loading, cleaning, visualization, and modeling.

AI Pipeline Studio (Flagship App)

AI Pipeline Studio is the main example of the AI Data Science Team in action.

Highlights:

  • Pipeline-first workspace: Visual Editor, Table, Chart, EDA, Code, Model, Predictions, MLflow
  • Manual + AI steps with lineage and reproducible scripts
  • Multi-dataset handling and merge workflows
  • Project saves: metadata-only or full-data
  • Storage footprint controls and rehydrate workflows

Run it:

streamlit run apps/ai-pipeline-studio-app/app.py

Full app docs: apps/ai-pipeline-studio-app/README.md

Quickstart

Requirements

  • Python 3.10+
  • OpenAI API key (or Ollama for local models)
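The two requirements above can be checked before launching the app. The following is a minimal sketch, not part of the library: the helper name `check_requirements` is hypothetical, and it assumes the OpenAI key is supplied through the standard `OPENAI_API_KEY` environment variable.

```python
from __future__ import annotations

import os
import sys
from typing import Mapping, Sequence

MIN_PYTHON = (3, 10)  # minimum version from the Requirements list


def check_requirements(
    version: Sequence[int] = tuple(sys.version_info[:2]),
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged.

    Hypothetical helper for illustration only.
    """
    problems = []
    if tuple(version[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version[:2])
        problems.append(f"Python 3.10+ is required, found {found}")
    # Assumption: the key is provided via the OPENAI_API_KEY environment variable.
    if not env.get("OPENAI_API_KEY"):
        problems.append(
            "OPENAI_API_KEY is not set; set it or use Ollama for local models"
        )
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

A missing key is only a problem if you plan to use OpenAI; with Ollama that message can be ignored.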

Install the app and library

Clone the repo, change into its root directory, and install in editable mode:

pip install -e .

Run the AI Pipeline Studio app

streamlit run apps/ai-pipeline-studio-app/app.py

Library Overview

The repository includes both the AI Pipeline Studio app and the underlying AI Data Science Team library. The library provides agent building blocks and multi-agent workflows for:

  • Data loading and inspection
  • Cleaning, wrangling, and feature engineering
  • Visualization and EDA
  • Modeling and evaluation (H2O + MLflow tools)
  • SQL database interaction

Agents (Snapshot)

Agent examples live in examples/. Notable agents:

  • Data Loader Tools Agent
  • Data Wrangling Agent
  • Data Cleaning Agent
  • Data Visualization Agent
  • EDA Tools Agent
  • Feature Engineering Agent
  • SQL Database Agent
  • H2O ML Agent
  • MLflow Tools Agent
  • Multi-agent workflows (e.g., Pandas Data Analyst, SQL Data Analyst)
  • Supervisor Agent (oversees other agents)
  • Custom tools for data science tasks

Apps

See all apps in apps/. Notable apps:

  • AI Pipeline Studio: apps/ai-pipeline-studio-app/
  • EDA Explorer App: apps/exploratory-copilot-app/
  • Pandas Data Analyst App: apps/pandas-data-analyst-app/

Use OpenAI

from langchain_openai import ChatOpenAI

# The API key is read from the OPENAI_API_KEY environment variable.
llm = ChatOpenAI(
    model="gpt-4.1-mini",
)

Use Ollama (Local LLM)

Start the Ollama server and pull a model (shell commands):

ollama serve
ollama pull llama3.1:8b

Then create the model in Python:

from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.1:8b",
)
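If you switch between the two backends, the choice can be made at runtime. This is a rough sketch under stated assumptions, not something the repository provides: `choose_provider` and `make_llm` are hypothetical helpers, and the rule "use OpenAI when `OPENAI_API_KEY` is set, otherwise Ollama" is only one possible policy. The model names are the ones used in the snippets above.

```python
import os
from typing import Mapping


def choose_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return "openai" when an API key is present, otherwise "ollama".

    Hypothetical policy for illustration only.
    """
    return "openai" if env.get("OPENAI_API_KEY") else "ollama"


def make_llm(provider: str):
    """Build the chat model for the given provider.

    Imports are done lazily so only the selected integration package
    needs to be installed.
    """
    if provider == "openai":
        from langchain_openai import ChatOpenAI

        return ChatOpenAI(model="gpt-4.1-mini")
    if provider == "ollama":
        from langchain_ollama import ChatOllama

        # Assumes a local Ollama server is running and the model is pulled.
        return ChatOllama(model="llama3.1:8b")
    raise ValueError(f"unknown provider: {provider!r}")
```

Typical usage would be `llm = make_llm(choose_provider())`, after which `llm` can be used wherever the snippets above use it.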

About

An AI-powered data science team of agents to help you perform common data science tasks 10X faster.
