Data Science Lab 2023: Group 5 Targaryen 🐉

This repository contains our project for phase 1 of the Practical Course: Data Science for Scientific Data at Karlsruhe Institute of Technology (KIT). The project addresses the 'Richter's Predictor: Modeling Earthquake Damage' competition (Link).

Group Members 👤

Forename	Surname	Matr.#
Nina	Mertins	xxxxxxx
Kevin	Hartmann	xxxxxxx
Alessio	Negrini	xxxxxxx

Folder Structure 🗂️

📦phase-1
 ┣ 📂configs                   <-- Configuration files for the pipeline.
 ┣ 📂data                      <-- Data used as input during development with Jupyter notebooks. 
 ┃ ┣ 📂predictions             <-- Contains the predictions built during development.
 ┃ ┣ 📂raw                     <-- Contains the raw data provided by the supervisors.
 ┃ ┗ 📂processed               <-- Contains the processed data built during development.
 ┣ 📂models                    <-- Models saved during development.
 ┣ 📂notebooks                 <-- Jupyter Notebooks used in development.
 ┃ ┗ 📂weekXX                  <-- Contains the notebooks for weekly subtasks and experimenting.
 ┣ 📂src                       <-- Source code.
 ┃ ┣ 📜data_cleaning.py        <-- Contains the functions for data cleaning.
 ┃ ┣ 📜feature_engineering.py  <-- Contains the functions for feature engineering.
 ┃ ┣ 📜feature_selection.py    <-- Contains the functions for feature selection.
 ┃ ┣ 🕹️main.py                 <-- Main file for running the pipeline.
 ┃ ┗ 📜modelling.py            <-- Contains the functions for model training.
 ┣ 📜.gitignore                <-- Specifies intentionally untracked files to ignore when using Git.
 ┣ 📜README.md                 <-- The top-level README for developers using this project. 
 ┗ 📜requirements.txt          <-- The requirements file for reproducing the environment, e.g. generated with 
                                    'pip freeze > requirements.txt'.

Setting up the environment and running the code ▶️

Clone the repository by running the following command in your terminal:
```
git clone https://git.scc.kit.edu/data-science-lab-2023/group-5-targaryen/phase-1.git
```
Navigate to the project root directory by running the following command in your terminal:
```
cd phase-1
```
[Optional] Create a virtual environment and activate it. For example, using the built-in venv module in Python:
```
python3 -m venv venv-phase-1
source venv-phase-1/bin/activate
```
Install the required packages by running the following command in your terminal:
```
pip install -r requirements.txt
```
Place the data in the phase-1/data/raw folder. The dataset can be downloaded from the competition website and used as is; the pipeline expects the files in that original format and structure.
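Before starting the pipeline, it can help to confirm that the raw files are where they are expected. The following is a minimal sketch and not part of this repository: the helper `find_missing_files` is hypothetical, and the file names in `EXPECTED_FILES` are an assumption about what the competition download contains, so adjust them to the files you actually received.

```python
from pathlib import Path

# Assumed file names from the competition download; adjust if yours differ.
EXPECTED_FILES = ("train_values.csv", "train_labels.csv", "test_values.csv")


def find_missing_files(raw_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in raw_dir."""
    raw_dir = Path(raw_dir)
    return [name for name in expected if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    # Run from the project root (phase-1).
    missing = find_missing_files("data/raw")
    if missing:
        print("Missing in data/raw:", ", ".join(missing))
    else:
        print("All expected raw files found.")
```

An empty result from `find_missing_files("data/raw")` only means the files exist; it does not validate their contents.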

Run the pipeline with the following command:
```
python3 src/main.py --config "configs/config.yml"
```

After these steps, the pipeline should run end to end on the data. Progress is logged to the terminal; if an error occurs, check the log output first when troubleshooting.
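For readers unfamiliar with how a `--config` flag and terminal logging typically fit together, here is a minimal sketch using Python's standard `argparse` and `logging` modules. It is an illustration only, not the code in `src/main.py`: the function names `parse_args` and `main`, the log format, and the exit codes are all assumptions, and only the `--config` flag itself is taken from the command above.

```python
import argparse
import logging
from pathlib import Path


def parse_args(argv=None):
    """Parse the command line; only --config is taken from the README."""
    parser = argparse.ArgumentParser(description="Run the pipeline.")
    parser.add_argument("--config", type=Path, required=True,
                        help="Path to the YAML configuration file.")
    return parser.parse_args(argv)


def main(argv=None):
    """Hypothetical entry point: returns 0 on success, 1 on a missing config."""
    args = parse_args(argv)
    # Log to the terminal with a timestamp and level on every line.
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    if not args.config.is_file():
        logging.error("Config file not found: %s", args.config)
        return 1
    logging.info("Using config %s", args.config)
    # The actual pipeline stages would be called here.
    return 0

# In a script, finish with: raise SystemExit(main())
```

Failing early on a missing config file keeps the error message close to its cause, which is usually easier to read than a traceback from deeper in the pipeline.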
