🚀 BDM2 - NoSQL Database Design with MongoDB

This project explores core concepts of document-oriented databases using MongoDB, one of the most widely adopted NoSQL systems. It demonstrates how to design and model document structures, import and manipulate data, and execute queries efficiently. The project also walks through setting up the environment and provides hands-on exercises that reinforce key principles of document store management, particularly in the context of Big Data.

👩🏻‍💻🧑‍💻 Project by Lucia Sauer & Julian Romero
🎓 Master in Data Science – Barcelona School of Economics

🗂️ Repository Structure

BDM2-Assignment2/
│
├── SectionA/                     # JSON structure definitions for each data model
│   └── model_structures.md      # Describes the document designs for Model 1, 2, and 3
│
├── SectionB/
│   └── project/
│       ├── main.py              # Orchestrates data generation, insertion, and querying
│       ├── data_generator.py    # Generates company and person data
│       ├── models/              # Model implementations
│       │   ├── model1.py        # Reference-based model (normalized)
│       │   ├── model2.py        # Embedded company inside person (denormalized)
│       │   └── model3.py        # Embedded persons inside company (denormalized)
│       └── utils.py             # Shared utilities (e.g., for measuring execution time)
│
└── L2-T01_submission.pdf        # Final report (Section C): performance evaluation and conclusions

🧪 How to Run

1. Ensure MongoDB is installed and running locally.
2. Navigate to the `SectionB/project` directory, where `main.py` lives.
3. Run the main script:
```
python main.py
```

This will:

- Generate a dataset of companies and associated persons.
- Insert the data according to the three data models.
- Run benchmark queries and display the average execution times.
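
The benchmark step reports average execution times. A minimal sketch of how such an average could be measured follows; `average_time` is a hypothetical helper and not necessarily what `utils.py` contains, and the lambda is a stand-in for a real MongoDB query.

```python
import time
from typing import Any, Callable, Tuple


def average_time(fn: Callable[[], Any], repeats: int = 5) -> Tuple[Any, float]:
    """Run fn `repeats` times; return its last result and the mean wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total = 0.0
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn()
        total += time.perf_counter() - start
    return result, total / repeats


# Stand-in workload; in the project this would be a query against one model.
result, avg_seconds = average_time(lambda: sum(range(10_000)), repeats=3)
print(f"average: {avg_seconds:.6f} s")
```

Averaging over several runs smooths out one-off effects such as a cold cache on the first query.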

📌 Key Findings

Each document model has its own strengths and limitations. The performance and maintainability of a MongoDB schema depend on the query patterns and update frequency.

| Model | Characteristics | ✅ Pros | ⚠️ Cons |
|-------|-----------------|---------|---------|
| M1 | Separate collections with references | Flexible updates, normalized structure | Requires joins and aggregations for related data |
| M2 | Company embedded in each person | Fast person-centric queries | Redundant company data across documents |
| M3 | Persons embedded in each company | Efficient company-centric queries | Updating individual person info is complex |
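
To illustrate the three layouts compared above, the sketch below builds each document shape from the same small dataset in plain Python. The sample records and field names (`name`, `domain`, `company_id`) are assumptions for illustration, not the project's actual schema.

```python
from copy import deepcopy

# Hypothetical sample data; field names are illustrative only.
companies = [{"_id": 1, "name": "Acme", "domain": "acme.example"}]
persons = [
    {"_id": 10, "name": "Ana", "company_id": 1},
    {"_id": 11, "name": "Ben", "company_id": 1},
]


def to_model1(companies, persons):
    """M1: two collections; each person keeps only a company_id reference."""
    return {"companies": deepcopy(companies), "persons": deepcopy(persons)}


def to_model2(companies, persons):
    """M2: one collection of persons, each embedding a full copy of its company."""
    by_id = {c["_id"]: c for c in companies}
    docs = []
    for p in persons:
        doc = {k: v for k, v in p.items() if k != "company_id"}
        doc["company"] = deepcopy(by_id[p["company_id"]])
        docs.append(doc)
    return docs


def to_model3(companies, persons):
    """M3: one collection of companies, each embedding its persons as an array."""
    docs = []
    for c in companies:
        doc = deepcopy(c)
        doc["persons"] = [
            {k: v for k, v in p.items() if k != "company_id"}
            for p in persons
            if p["company_id"] == c["_id"]
        ]
        docs.append(doc)
    return docs


m1 = to_model1(companies, persons)
m2 = to_model2(companies, persons)
m3 = to_model3(companies, persons)
```

Note how M2 stores the company once per person (the redundancy listed under its cons), while M3 stores each person only inside its company's array.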

📌 TL;DR:

- M1 is best when updates are frequent and relationships change often.
- M2 is optimal for read-heavy, person-focused applications.
- M3 excels when queries are company-centric and updates are infrequent.
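
The update trade-off summarized above can be made concrete by writing out the filter and update documents each model would need. This is a sketch under assumed names: the collections `companies` and `persons` and the fields `company._id` and `email` are hypothetical. `$set` and the positional `$` are standard MongoDB update operators, and the commented calls show where PyMongo's `update_one` and `update_many` would use them.

```python
# All collection and field names here are hypothetical.
company_id, new_name = 1, "Acme Corp"

# M1: the company name lives in exactly one company document.
m1_filter = {"_id": company_id}
m1_update = {"$set": {"name": new_name}}
# e.g. db.companies.update_one(m1_filter, m1_update)

# M2: every person embeds a copy, so all matching persons must be rewritten.
m2_filter = {"company._id": company_id}
m2_update = {"$set": {"company.name": new_name}}
# e.g. db.persons.update_many(m2_filter, m2_update)

# M3: changing one embedded person needs the positional operator `$`,
# which targets the first array element matched by the filter.
person_id, new_email = 10, "ana@acme.example"
m3_filter = {"_id": company_id, "persons._id": person_id}
m3_update = {"$set": {"persons.$.email": new_email}}
# e.g. db.companies.update_one(m3_filter, m3_update)
```

A company rename touches one document in M1 but as many documents as there are employees in M2, and a person edit in M3 has to address an element inside an array rather than a top-level document.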

📄 Submission

The final report L2-T01_submission.pdf contains:

- A breakdown of each model’s design choices.
- Query performance benchmarks.
- A detailed analysis of trade-offs between schema types.
