Data science portfolio

A repository of data science projects I completed for academic, self-learning, and hobby purposes, presented as Jupyter notebooks.

Setup

Create a virtual environment and install the project dependencies:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt

Alternatively, you can automate the above steps with:

make install

After setting up the environment, install the pre-commit hooks:

pre-commit install

Contributing

Branch Naming

Use descriptive branch names prefixed with feature/, fix/, or docs/.
Separate words with hyphens, e.g., feature/add-model-evaluation.
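
The naming rule above can be checked mechanically. The following is a minimal sketch, not part of this repository: the function name `is_valid_branch_name` is hypothetical, and restricting words to lowercase letters and digits is an assumption that goes beyond what the guideline states.

```python
import re

# Allowed prefixes come from the guideline: feature/, fix/, docs/.
# Assumption: words are lowercase letters or digits, joined by single hyphens.
BRANCH_PATTERN = re.compile(r"(feature|fix|docs)/[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if the branch name follows the prefix and hyphen convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("feature/add-model-evaluation", "feature/add_model_evaluation"):
        print(candidate, is_valid_branch_name(candidate))
```

A check like this could be wired into a local hook, but whether that is worthwhile depends on how strictly the convention is meant to be enforced.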

Commit Messages

Use the imperative mood in the subject line.
Keep the subject line under 50 characters.
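
The two commit message rules above can be approximated in code. This is a rough sketch under stated assumptions: `check_commit_subject` is a hypothetical helper, and the imperative-mood test is only a heuristic that flags a first word ending in "ed" or "ing" (it will misjudge some verbs, such as "Embed" or "Bring").

```python
MAX_SUBJECT_LENGTH = 50  # the guideline asks for fewer than 50 characters


def check_commit_subject(message: str) -> list[str]:
    """Return a list of problems found in the first line of a commit message."""
    lines = message.splitlines()
    subject = lines[0].strip() if lines else ""
    if not subject:
        return ["subject line is empty"]

    problems = []
    if len(subject) >= MAX_SUBJECT_LENGTH:
        problems.append(f"subject line has {len(subject)} characters, expected under 50")

    # Heuristic only (an assumption, not a real grammar check):
    # past-tense and gerund forms usually end in "ed" or "ing".
    first_word = subject.split()[0].lower()
    if first_word.endswith(("ed", "ing")):
        problems.append(f"'{first_word}' does not look imperative")
    return problems


if __name__ == "__main__":
    print(check_commit_subject("Add model evaluation notebook"))
    print(check_commit_subject("Added model evaluation notebook"))
```

An empty result means the subject line passed both checks.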

Pull Requests

Open PRs against the main branch.
Ensure all checks pass and request at least one review.
Address review feedback promptly.

Docker

Build the image:

docker build -t dsp .

Run the container:

docker run -p 8888:8888 dsp

This starts a Jupyter Notebook server at http://localhost:8888.

Building a Simple Language Model with Flask and NLTK: Implementation, testing, and evaluation of a simple n-gram language model using the NLTK library. The model predicts the next word in a sentence from the preceding words, using n-gram statistics. Both bigram (n=2) and trigram (n=3) models are explored to compare their performance.
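
To illustrate next-word prediction from n-gram statistics, here is a minimal sketch. It deliberately uses plain Python counting instead of NLTK so that it runs without extra dependencies, and it is not the project's implementation. The function names and the toy corpus are made up for the example.

```python
from collections import Counter, defaultdict


def train_ngram_model(tokens, n):
    """Count which word follows each (n-1)-word context in the token list."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        next_word = tokens[i + n - 1]
        model[context][next_word] += 1
    return model


def predict_next(model, context):
    """Return the most frequent follower of the context, or None if unseen."""
    counts = model.get(tuple(context))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat and the cat slept on the sofa".split()
bigram = train_ngram_model(corpus, 2)   # context is 1 word
trigram = train_ngram_model(corpus, 3)  # context is 2 words

if __name__ == "__main__":
    print(predict_next(bigram, ["the"]))         # "cat" follows "the" twice
    print(predict_next(trigram, ["sat", "on"]))  # only "the" follows "sat on"
```

The trigram model uses a longer context, so it sees fewer examples per context and returns `None` more often on unseen input. A real comparison would need smoothing and a held-out evaluation set, neither of which this sketch includes.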

Synthetic Oncology Data: Generate de-identified cancer patient records with Synthea and matching VCF variant profiles.

Folders and files

.devcontainer
.ipynb_checkpoints
AI_Career_coach
AI_workflow
Arrythmia_classification
Arrythmia_detection
CharityML
ETL_Pipeline/movies_reviews
GAN implementation
LLM_FLASK
News explorer hub
boston_housing
cat_spotter
mit-bih-arrhythmia-database-p-wave-annotations-1.0.0
streamlit app
synthetic_oncology
.dockerignore
.gitignore
.pre-commit-config.yaml
Dockerfile
Makefile
README.md
arrhythmia.csv
battedBallData.csv
final_df.csv
requirements.txt

Dislevekanku/datascienceprojects
