Tabular Database Systems (TaDa)

A DuckDB-based course on the fundamentals of relational database management systems and SQL.

Welcome!

This lecture material has been developed by Torsten Grust to support a 15-week course (coined TaDa) for undergraduate students of the Database Research Group at University of Tübingen (Germany). You are welcome to use this material in any way you may see fit: skim it, study it, send suggestions or corrections, or tear it apart to build your own lecture material based on it. I would be delighted to hear from you in any case:

E-Mail: torsten.grust@uni-tuebingen.de
Web: https://db.cs.uni-tuebingen.de/grust/
Bluesky: https://bsky.app/profile/teggy.org

A DuckDB-Based Introduction to Tabular Data and SQL

As a member of the decades-old family of relational database management systems, DuckDB is an expert in processing tabular data (or: tables, relations). DuckDB is a capable and very efficient SQL database system that can crunch billions of rows on commodity laptops. Its SQL dialect is versatile, complete, and remarkably friendly. DuckDB is a breeze to install ~~and maintain~~, open for tinkering and inspection, has developed an open and supportive community since its inception in 2019, and thus makes for an ideal vehicle for an introduction into the world of contemporary tabular database system technology.

15 weeks hardly suffice to exhaustively cover a field that has developed since the early 1970s, but I still hope that the topics covered in TaDa will pave a path from which you can easily branch off to get lost in the depths of Ted Codd's jungle. A future TaDa may see chapters added, merged, or removed but as of March 2026, the chapter layout reads as follows:

Tabular Data and Database Systems
Tabular Data in CSV Files
Reading Data at the Speed of ~~Light~~Memory
Columnar Table Storage
Database-External Data in Parquet Files
The Structured Query Language (SQL)
More SQL (Subqueries + Embedded SQL)
SQL: Grouping + Aggregation and Functional Dependencies

Here at U Tübingen, I walk students through these chapters front to back.

We do not assume any prior SQL skills. Chapters 03 and 07 will have you read and modify short pieces of C, Python, or awk code. There will be only fleeting glimpses at the internals of DuckDB. If the innards of The Duck catch your interest, you may find the companion course DiDi on selected internals of the DuckDB kernel helpful.

TaDa = Slides + Auxiliary Material

Chapter ‹N› of TaDa comes with a slide set in file slides/TaDa-‹N›.pdf (see the hierachy of relevant files below). Note that these slide sets literally only tell half of the story.

The other half is found in 50+ auxiliary files—mostly SQL scripts, but also code written in C, Python, and awk—collected in directory material/‹N›/ for Chapter ‹N›. The slides contain tags 📄#‹nnn› whenever a file named ‹nnn›-* contains relevant supporting material. Beyond code, these files contains plenty of commentary—you absolutely need to study (and ideally run, modify, play with) these files in material/ to obtain the intended and complete TaDa picture.

To run these files, change into the material/‹N›/ directory and invoke DuckDB, your Python/awk interpreter, or C compiler there. For example:

$ cd material/03
$ duckdb -f generate-lineitem.sql
$ ./013-sum-quantity.py lineitem.csv
$ duckdb -f 020-sum-quantity.sql
$ duckdb
D .read 020-sum-quantity.sql

I have found that students make best use of the SQL scripts when they cut & paste individual SQL commands and queries from the *.sql files right into a DuckDB CLI session.

Generating Sample Database Instances

Some of the SQL scripts operate over CSV files or persistent DuckDB database (stored in *.db files). Should such data sources be needed, the material/‹N›/ subdirectories contain generate-*.sql scripts that will generate the required CSV or database files in the current directory. Example:

$ cd material/07
$ duckdb -f generate-045-vehicles.sql
$ ls -lh 045-vehicles.db
-rw-r--r--  1 grust  staff   1.5M Mar 19 10:07 045-vehicles.db

Credits

The TaDa material stands on the shoulders of

a variety of scientific papers (which we mention and link to on the slides),
the DuckDB documentation at https://duckdb.org/docs/,
blog posts (mostly found on https://duckdb.org/news/),
discussions on the friendly DuckDB Discord (https://discord.duckdb.org/),
personal communication (over Discord and beers) with the awesome bunch of DuckDB developers at DuckDB Labs,
SQL references/standards,
experience, and best practices.

Chapter 03 (Reading Data at the Speed of ~~Light~~Memory) is an adaptation and extension of a discussion found in Thomas Neumann's fabulous lecture notes on Foundations in Data Engineering.

The slides were authored using (a heavily modified version of) Morgan McGuire's Markdown dialect Markdeep. I used Fabrizio Schiavi's fixed-width Pragmata Pro fonts for typesetting.

TaDa File Layout

.
└── slides
│   ├── TaDa-01.pdf
│   ├── TaDa-02.pdf
│   ├── TaDa-03.pdf
│   ├── TaDa-04.pdf
│   ├── TaDa-05.pdf
│   ├── TaDa-06.pdf
│   ├── TaDa-07.pdf
│   └── TaDa-08.pdf
├── material
│   ├── 01
│   │   ├── 001-create-table.sql
│   │   ├── 002-rock-paper-scissors.sql
│   │   ├── 003-primary-keys.sql
│   │   ├── 004-good-keys.sql
│   │   ├── 005-foreign-keys.sql
│   │   └── 006-finite-state-machine.sql
│   ├── 02
│   │   ├── 007-vehicles.csv -> vehicles.csv
│   │   ├── 008-read_csv.sql
│   │   ├── 009-gsheets.sql
│   │   ├── 010-copy-from-to.sql
│   │   ├── 011-csv-sniffing.sql
│   │   ├── band.csv
│   │   ├── capacity.tsv
│   │   ├── drivers.csv
│   │   ├── flights.csv
│   │   ├── generate-lineitem.sql
│   │   ├── StarWars-EpisodeIV.txt
│   │   └── vehicles.csv
│   ├── 03
│   │   ├── 012-sum-quantity.awk
│   │   ├── 013-sum-quantity.py
│   │   ├── 014-sum-quantity.c
│   │   ├── 015-sum-quantity-mmap.c
│   │   ├── 016-bit-twiddling.c
│   │   ├── 017-sum-quantity-mmap-block.c
│   │   ├── 018-sum-quantity-mmap-threads.c
│   │   ├── 019-aggregates.sql
│   │   ├── 020-sum-quantity.sql
│   │   └── generate-lineitem.sql
│   ├── 04
│   │   ├── 021-databases.sql
│   │   ├── 022-row-groups.sql
│   │   └── 023-compression.sql
│   ├── 05
│   │   ├── 024-parquet.sql
│   │   ├── 025-pushdown.sql
│   │   ├── 026-bloom-filters.sql
│   │   └── 027-hive-partitioning.sql
│   ├── 06
│   │   ├── 028-row-variables.sql
│   │   ├── 029-cross-products.sql
│   │   ├── 030-inner-join.sql
│   │   ├── 031-outer-join.sql
│   │   ├── 032-semi-anti-join.sql
│   │   ├── 033-join-plans.sql
│   │   ├── 034-bag-algebra.sql
│   │   └── 035-starwars.db
│   ├── 07
│   │   ├── 036-subqueries.sql
│   │   ├── 037-correlation.sql
│   │   ├── 038-decorrelation.sql
│   │   ├── 039-table-valued.sql
│   │   ├── 040-quantification.sql
│   │   ├── 042-embedded.py
│   │   ├── 043-embedded-types.py
│   │   ├── 044-construction.py
│   │   ├── 046-n-plus-one.py
│   │   ├── compositionality.py
│   │   ├── generate-041-railway.sql
│   │   └── generate-045-vehicles.sql
│   └── 08
│       ├── 047-grouping.sql
│       ├── 048-grouping-quiz.sql
│       ├── 049-granularity.sql
│       ├── 050-nyc-taxi.sql
│       ├── 052-grouping.py
│       ├── 053-grouping-and-fds.sql
│       ├── 054-fds-having.sql
│       ├── 055-fd-split.sql
│       ├── 056-views.sql
│       ├── generate-051-nyc-taxi.sql
│       └── nyc-taxi
│           ├── central-park-weather-2024.csv
│           ├── taxi_zone_lookup.csv
│           ├── taxi_zones.dbf
│           ├── taxi_zones.shp
│           └── taxi_zones.shx
├── README.md
└── LICENSE

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tabular Database Systems (TaDa)

Welcome!

A DuckDB-Based Introduction to Tabular Data and SQL

TaDa = Slides + Auxiliary Material

Generating Sample Database Instances

Credits

TaDa File Layout

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
material		material
slides		slides
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Tabular Database Systems (TaDa)

Welcome!

A DuckDB-Based Introduction to Tabular Data and SQL

TaDa = Slides + Auxiliary Material

Generating Sample Database Instances

Credits

TaDa File Layout

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages