markebrown/data-tools

A non-exhaustive, personal list of "modern" Data Tools and Projects

Suggest a Data Tool!

No (file system) storage or (traditional) databases and, for now, no data science, virtualization, or streaming tools. Also excluded: all the embedded tools and services offered by the three main public Cloud providers (Google Cloud, Microsoft Azure, and AWS).

Data Architecture

Data Ingestion / Data Onboarding / ETL / ELT

  • Flatfile Data Onboarding platform
  • Fivetran Cloud data integration platform
  • Matillion Cloud data integration platform
  • Apache Gobblin Open Source distributed data integration framework
  • Singer "Open Source standard for writing scripts that move data"
  • Meltano Open Source ELT for DataOps
  • Airbyte Open Source data integration platform
  • Stitch Simple, extensible Cloud ETL platform (Talend)
  • Hevo No-code data pipeline as a service
  • Apache Hop Open Source data integration platform project
  • Meroxa Real-time data ingestion infrastructure
  • Portable Cloud-hosted ELT platform
  • Talend, StreamSets, Alooma (Google), Xplenty, Striim, Panoply, Stambia, HVR

Reverse ETL

  • Census Operational analytics platform that moves data from the data warehouse to apps
  • Hightouch Sync customer data to SaaS business platforms
  • Grouparoo Open Source framework to move data between databases and Cloud apps

Data Collection / Product Analytics / Customer Data

  • Segment Customer data platform (CDP) (Twilio)
  • RudderStack Customer data pipeline, event tracking
  • Snowplow Data collection platform
  • Freshpaint Collect, control, and deliver customer data
  • PostHog Open Source Product Analytics platform
  • Amplitude Product Analytics platform
  • Iteratively Product Analytics platform, "Capture customer data you trust"
  • Avo Product Analytics platform
  • Mixpanel Product analytics platform
  • Indicative Product analytics platform
  • Heap Product analytics platform
  • Supermetrics Get marketing data for reporting, analytics and storage

Transformation / Preparation / Cleaning / Wrangling

  • Trifacta Data Wrangling for Cloud (or Hadoop) platforms and storage
  • dbt Transform data with SQL from the command line (Open Source) or in the Cloud
  • Dataform Collaboration on SQL pipelines in Cloud data warehouses (Google)
  • Pano Open Source data preparation for Cloud data warehouses
  • Rasgo Data preparation for Data Scientists
  • Mito JupyterLab extension that generates pandas Python code from a spreadsheet
  • DataPrep Prepare data in Python
  • OpenRefine "A free, open source, powerful tool for working with messy data"

SQL Tools / Editors

  • Count "The BI notebook built for analysts"
  • PopSQL "Modern SQL editor"
  • DataGrip IDE for SQL (JetBrains)
  • DBeaver Free (or Enterprise and Cloud editions) universal database tool
  • sq "swiss-army knife for data", SQL on the command line for relational data
  • SqlDBM Develop Database Models
  • Querybook Open Source SQL query and Big Data IDE via a notebook interface
  • Soda SQL Data testing, monitoring, and profiling for SQL-accessible data
  • SQLFluff SQL Linting and Auto-formatting for Humans

SQL Engines

  • Trino Open Source, high-performance, distributed SQL query engine (formerly PrestoSQL)
  • Starburst Cloud or On-premises SQL engine (based on Trino)
  • AWS Athena Interactive SQL query service for Amazon S3 (based on Presto)
  • DataFusion Query execution engine using Apache Arrow as its in-memory format

BI / Reporting / Data Visualization

  • Metabase Open Source business intelligence tool
  • Apache Superset Open Source modern data exploration and visualization platform
  • Apache ECharts Open Source JavaScript Visualization Library
  • Cube.js Open Source Analytical API platform
  • Grafana Open Source analytics & monitoring solution
  • Looker BI and Analytics Platform (Google)
  • Redash Data visualization and dashboarding with SQL (Databricks)
  • Mode Collaborative data platform that combines SQL, R, Python, and visual analytics
  • Sigma Cloud analytics solution
  • Hex Collaborative SQL + Python-based notebooks
  • Lux Python library and API for Intelligent Visual Discovery
  • y42 "No-Code Business Intelligence" platform
  • Knowage Open Source Business Analytics Suite
  • Rakam Data platform for building analytics interface (dbt integration)
  • Datawrapper Enrich stories and articles with data visualization
  • D3 JavaScript library for visualizing data with HTML, SVG, and CSS
  • Lightdash Open Source BI tool fully integrated with dbt projects
  • Tableau, PowerBI, Sisense, Qlik, Spotfire, ThoughtSpot, Chartio (Atlassian), Domo, Toucan Toco

Data Quality / Profiling / Observability

  • Monte Carlo "Data Reliability Delivered"
  • Datafold Data Observability platform
  • Great Expectations Open Source data quality, profiling & validation
  • Bigeye Automatic data quality monitoring
  • Anomalo Validate and document your data warehouse
  • Trackplan "Schema Management for Behavioural Data Tracking"
  • lightup Cloud data quality indicators provider

Data Management / Lineage / Catalog / Governance

  • Datakin DataOps solution for Data Lineage
  • Marquez Open Source metadata and data governance project
  • DataHub Open Source metadata search & discovery tool
  • Amundsen Open Source data discovery and metadata engine
  • Data Galaxy Data Governance platform with Data Catalog and Data Lineage
  • Zeenea Cloud-native Data Catalog
  • Alation Data Governance and Data Catalog platform
  • Collibra Data Governance and Data Catalog platform
  • Secoda Data Discovery and Data Catalog
  • MANTA Data Lineage platform
  • data.world Cloud-native Data Catalog
  • Stemma SaaS managed version of Amundsen
  • Egeria Open Metadata and Governance

DataOps / Data Fabric

  • Altan "the modern data workspace", Data Management & DataOps
  • Nessie DataOps for Data Lakes, a "Git-Like Experience for your Data Lake"
  • Nexla DataOps platform "to deliver data for Analytics, AI and Operations"
  • Keboola DataOps platform
  • Saagie DataOps platform
  • DataKitchen DataOps platform
  • DAGsHub GitHub for data
  • Unravel DataOps platform
  • Upsolver "Compute and pipeline layer between your data lake and the analytics tools"
  • Cinchy "Autonomous Data Fabric" and Data Management platform

Orchestration / Workflow

  • Apache Airflow Open Source workflow scheduler platform
  • Dagster Open Source "Data orchestrator for machine learning, analytics, and ETL"
  • Prefect Workflow management system and platform for dataflow automation
  • Apache DolphinScheduler Distributed and visual workflow scheduler system
  • Luigi Python package to build complex pipelines of batch jobs

Storage / Database

  • DuckDB In-process SQL OLAP database (SQLite-like, column-oriented)
  • ClickHouse Open Source OLAP database management system
  • DoltHub "the true Git for data experience in a SQL database"
  • DVC Data Version Control
  • Materialize Event Streaming Database
  • Warp 10 Advanced Time Series Platform
  • Snowflake, Firebolt, BigQuery, Redshift, Apache Cassandra, MongoDB, InfluxDB, QuestDB, Neo4j, SingleStore (MemSQL)

Data Privacy / Security / Identity

  • Immuta "Self-Service Data Access with Automated Privacy Control"
  • Okera Cloud data security, "Universal Data Authorization"
  • Privacera SaaS Access Governance Solution
  • Apache Ranger Framework to enable, monitor and manage comprehensive data security
  • Baffle Cloud security with a "transparent data security mesh"
  • Privitar Enterprise Data Privacy Software
  • ReachFive Identity & Access Management
  • Okta Trusted platform to secure identities, from customers to workforce

Others

  • Opendatasoft Data sharing platform
  • Streamlit Turns data scripts into shareable data web apps
  • Transform Data Shared data interface and metrics repository
  • White Label Data Platform for building and deploying custom data applications
  • Flat Data Bring working datasets into your GitHub repositories and version them

And finally don't hesitate to:

Victor

