This repository was archived by the owner on Nov 9, 2025. It is now read-only.

Matheussoranco/Data_Lake_Foss


Data Lake FOSS Architecture

A data lake architecture built only with FOSS tools. It was part of my bachelor's thesis and is no longer being worked on.

Repository Sanitization

This repository has been sanitized for public sharing. All proprietary company references, specific server names, IP addresses, and internal domain names have been replaced with generic placeholders (e.g., .example.com). This ensures the repository can be safely shared and used as a reference without exposing internal infrastructure details.

What was sanitized:

  • Company names and references
  • Specific hostnames and domain names (.corpintra.net.example.com)
  • IP addresses and server names
  • Internal registry URLs
  • Proxy configurations
  • Certificate names in Kubernetes secrets

Placeholders used:

  • Domain: .example.com
  • Hostnames: your-host.example.com, your-registry.example.com, etc.
  • IPs: Generic placeholders like 192.168.1.100
  • Registry: your-registry.example.com/your-org/your-repo

When deploying, replace these placeholders with your actual infrastructure details.
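
The replacement step can be scripted. The following is a minimal sketch, not part of this repository: the placeholder strings on the left come from the list above, while every value on the right (`registry.internal.test`, `node1.internal.test`, `10.0.0.15`) and the function name `fill_placeholders` are hypothetical and should be swapped for your own details.

```python
import re

# Placeholders (keys) are the ones listed in this README.
# All values are made-up examples; replace them with your real infrastructure.
REPLACEMENTS = {
    "your-registry.example.com/your-org/your-repo": "registry.internal.test/data/lake",
    "your-registry.example.com": "registry.internal.test",
    "your-host.example.com": "node1.internal.test",
    "192.168.1.100": "10.0.0.15",
    ".example.com": ".internal.test",
}


def fill_placeholders(text, replacements):
    """Return text with every placeholder replaced by its configured value.

    Placeholders are tried longest first, so a specific entry such as
    'your-host.example.com' wins over the generic '.example.com' suffix.
    """
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: replacements[match.group(0)], text)


if __name__ == "__main__":
    sample = (
        "image: your-registry.example.com/your-org/your-repo:1.0\n"
        "host: your-host.example.com\n"
        "ip: 192.168.1.100\n"
    )
    print(fill_placeholders(sample, REPLACEMENTS))
```

Doing the substitution in a single regex pass avoids replacing text that an earlier replacement has already written. To apply it to real files, read each configuration file, pass its contents through `fill_placeholders`, and review the diff before writing anything back.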

Authors

This repository is maintained by Matheus Pullig Soranço de Carvalho.

License

This repository is licensed under the GNU Affero General Public License v3 (AGPL-3.0).

However, the included projects have their own licenses:

  • Apache Airflow: Apache License 2.0
  • Apache Bigtop: Apache License 2.0
  • Apache Ranger: Apache License 2.0
  • Trino: Apache License 2.0
  • dbt: Apache License 2.0
  • Kubernetes: Apache License 2.0
  • Apache Livy: Apache License 2.0

Note: There is a potential license compatibility issue between AGPL-3.0 and Apache 2.0. AGPL-3.0 is a strong copyleft license that requires derivative works to be licensed under the same terms, while Apache 2.0 is permissive. Since this repository only aggregates unmodified FOSS projects and provides configuration and setup instructions, it should not create derivative works. To avoid any ambiguity, treat this repository as a reference only and comply with each project's individual license terms.

About

A data lakehouse implemented only with FOSS tools
