A set of generic templates for data preprocessing, exploratory analysis, text analysis and data visualisation. This repository is designed to demonstrate proficiency in key data analytics techniques and tools.
This repository is a curated collection of Jupyter Notebooks that serve as templates for various tasks in the data analytics process. The purpose of this repository is to:
- Showcase proficiency with Python for data-related tasks.
- Serve as a resource for common workflows in data analysis.
- Illustrate best practices in data preprocessing, EDA, and visualisation.
It is aimed at prospective employers and anyone interested in data analytics.
Exploratory Analysis
- Correlation Analysis
- Distribution Tests
- Similarity Analysis
Preprocessing
- Sampling Methods
- Aggregation
- Binarisation
- Handling Duplicates
- Extracting Nominal Categories
- Handling Missing Values
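A sketch of two of the preprocessing steps above, handling duplicates and missing values, using pandas; the column names and data are invented for illustration:

```python
import pandas as pd

# Toy dataset containing one duplicate row and one missing value.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "value": [10.0, 20.0, 20.0, None],
})

df = df.drop_duplicates()  # remove the repeated row
df["value"] = df["value"].fillna(df["value"].mean())  # impute missing value with the column mean
print(df)
```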
Text Analysis
- Case Folding
- Normalisation
- Stemming
- Stop Word Removal
- Tokenisation
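The text-analysis steps above can be sketched with the standard library alone; the stop-word list is a tiny illustrative set, and the suffix stripping is a naive stand-in for a real stemmer such as NLTK's PorterStemmer:

```python
import re

STOP_WORDS = {"the", "a", "is", "are", "of", "and"}  # tiny illustrative stop-word list

def preprocess(text: str) -> list[str]:
    text = text.casefold()                # case folding
    tokens = re.findall(r"[a-z]+", text)  # simple regex tokenisation
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word removal
    # Naive suffix stripping, standing in for a proper stemmer.
    return [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]

print(preprocess("The cats and the dog barked"))  # → ['cat', 'dog', 'bark']
```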
Visualisation
- Heatmaps
- Histograms
- Scatterplots
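A minimal matplotlib histogram as an example of the plot types above (the notebooks also use seaborn); the data, labels, and output filename are invented for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Illustrative data for a simple histogram.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

fig, ax = plt.subplots()
ax.hist(values, bins=5, edgecolor="black")
ax.set_xlabel("value")
ax.set_ylabel("count")
fig.savefig("histogram.png")  # write the plot to disk instead of opening a window
```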
Requires Python 3.11+ and Jupyter Notebook.
pip install pandas scipy scikit-learn ydata-profiling nltk seaborn matplotlib
(`collections` and `re` are part of the Python standard library and need no installation.)
- Clone the repository:
git clone https://github.com/spencerduberry/Data-Analytics-Tools_Python.git
- Navigate to the relevant folder based on your task (e.g. preprocessing).
- Run the scripts or notebooks:
jupyter notebook Aggregation.ipynb
Contributions are welcome! If you have a useful template or improvement, feel free to open a pull request. Steps to contribute:
- Fork the repository.
- Create a branch (git checkout -b feature/NewFeature).
- Commit your changes (git commit -m 'Add new feature').
- Push the branch (git push origin feature/NewFeature).
- Open a pull request.
Spencer Duberry
LinkedIn: www.linkedin.com/in/spencer-duberry-938233285
Email: spencerduberry@hotmail.co.uk