Skip to content

Lightweight relational database from scratch. Uses custom-built B+ Tree indexing for key-value retrieval, secondary indexing, and write-ahead logging for crash recovery. It also integrates with Google Maps via Python scripts to import real-world "Saved Places" data into the database.

License

Notifications You must be signed in to change notification settings

yianan261/BPlusTree_Database_Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

92 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Database Project

This is a simulated relational database that efficiently retrieve key value storage. The database manages user-saved places, using a custom-built B+ Tree storage engine, and secondary indexing. It also integrates with Google Maps via Python scripts to import real-world "Saved Places" data into the database.


Project Components

  • BPlusNode: Represents a node (internal or leaf) in a B+ Tree. Handles splitting, merging, and traversal. Internal nodes store keys, key-value pairs (key and record pointers) are stored in leaf nodes , which are linked together using a doubly-linked list data structure.
  • BPlusTree: Manages the self-balancing B+ Tree structure (order 3). Provides insert, delete, search, range search, and prefix search operations. When the keys in a node overflows it will automatically split. When it underflows it will borrow or merge with sibling nodes.
  • BTreeIndex: A wrapper around the B+ Tree, abstracting the interface for database use.
  • LeaderDB: Main database engine. Inherits publicly from DBInstance abstract class. Manages tables (with indexes), executes CRUD operations in the CLI.
  • SecondaryIndex: Supports secondary indexes (non-primary keys) for attribute-value to list-of-PKs mapping to support faster attribute-based search (e.g., find by email, find by title).
  • CsvParser / FileUtils / DataInserter: Utilities for loading CSV files into tables and inserting structured place data.

Note

Currently Write Ahead Log implementations are incomplete, please disregard at the current time of project. We are keeping it to continue working on it later.

Complexity of search time

m = order = 3 for this implementation, meaning each internal node has at least m/2 children. Assume n total keys, then height h of tree is h = logn. At each internal node binary search is performed to search the local key array (O(logm)) time, but since m is fixed, this is done in O(1) time. Since we established height h of the tree is approximately logn, so search/insert/delete is O(logn) operation.

Inheritance and Polymorphism

  1. InternalNode and LeafNode inherit publicly from base class BPlusNode
  2. LeaderDB inherits publicly from abstract base class DBInstance. This class could extend to future DB nodes for multiple DB instances.

Database Schema

The database schema supports users importing their saved places lists from Google Maps:

1. Users Table (users)

  • email: Email address (used as key).
  • user_id(PK): Random unique generated integer ID.
  • created_at: creation timestamp.

2. Places Table (places)

  • hashedId (PK): The Google place_id is hashed to integers as the primary key, and the original place id will be kept as a string attribute.
  • place_id: original place id kept as a string attribute.
  • name: Name of the place (e.g., "Central Park").
  • address: Formatted address.
  • latitude: Latitude coordinate.
  • longitude: Longitude coordinate.
  • description: Description of the place.

3. SavedLists Table (savedlists)

  • list_id (PK): Random unique list ID.
  • user_id (FK): Belongs to a user.
  • title: Title of the list (e.g., "Favorites").
  • createdAt: timestamp.

4. ListPlaces Table (listplaces)

  • Combined Primary Key of list_id and place_id: int IDs
  • list_id: Mapped to list_id
  • place_Id: Links places to lists (many-to-many relationship)

Commands (Database Menu Options)

  1. get <key> Retrieve one record (all attributes) from the current table using its primary key.
get <key>
  1. create Create a new record or a new table. You will be prompted to specify if you want to create a table or a record.
create
// Program will ask if you want to create table or instance (record in a table)
(table/instance)
Create table or instance? table
Enter table name: Pets
Enter headers (comma separated): name,age
Table pets created with headers: name,  age
  1. update Update an existing record in the current table by specifying a key and providing new attributes.
update
Key: (Enter key)
Attributes (comma separated): (Enter comma separated attributes)
Record updated.
  1. delete <key> Delete a specific record from the current table using its primary key.
delete:
Enter key to delete: 5
Key removed.
  1. drop (table) Delete an entire table from the database (except the default table).
drop
Please enter table name to drop: places
Table dropped.
  1. use (table) Switch to a different table for subsequent operations.
use (tablename)
  1. tables List all existing tables in the database.

  2. load <filepath> Load records from a CSV file into the current table. The first line of the CSV should contain column headers.

load
Enter CSV file path: (enter path)
  1. save Save all tables into CSV files. Each table will be saved to its own CSV under the output/ directory.
save
Saved default to ./output/default.csv
  1. view View up to 10 records from the current table, formatted nicely in columns.
view
  1. createindex <col> Build a secondary index on a specified column to speed up select queries.
createindex 2
Enter column name: (column name)
Secondary index built on <column name>
  1. select <cols>|\* where <col>=<val> Query the current table by applying a filter condition (where) and optionally projecting specific columns.
select * where name=(name)
  1. createuser Create a new user account and optionally upload Google Maps Saved Places data into your database.
createuser
Enter user email: (email)
Would you like to upload your Saved Places? (yes/no):

Paste the relative path, the project use this relative path:

saved_places_dir/yian261_at_gmail_dot_com
  1. join Perform a join between two tables based on matching column values, with optional projection of columns.
join A.1 B.2
  1. help Display the full list of available commands and their descriptions.
help
  1. exit Exit the program.
exit

Building the Project

Make sure you have a C++17 compatible compiler installed.

  1. Clone the repository.
  2. In the project root, compile the program:
make clean
make

To Run the project

After compiling, run

./leaderdb

To Test with createuser uploads

Users may use the provided csv files in the filepath saved_places_dir/yian261_at_gmail_dot_com
The dataset_project files can be used for other table command testing


To Upload Google Takeout File (Optional)

1. Get Google API Key and keep in a .env file

GOOGLE_API_KEY=<your api key>

2. Create and activate Python virtual environment

python -m venv myenv
source myenv/bin/activate  # windows `myenv\Scripts\activate`
pip install -r requirements.txt

3. Install dependencies

Run

pip install -r requirements.txt

4. Get your Google takeout data and drop them in the takeout/Saved folder

5. Run Python script:

python fetch_places.py takeout/Saved <user email>

Credits

Authors:

Zirui Wen, Yian Chen

Dataset:

Google takeout data from Yian Chen

ASCII Art:

From Patorjk

About

Lightweight relational database from scratch. Uses custom-built B+ Tree indexing for key-value retrieval, secondary indexing, and write-ahead logging for crash recovery. It also integrates with Google Maps via Python scripts to import real-world "Saved Places" data into the database.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •