spencerduberry/data-analytics-tools

Data Analytics Tools - Python

A set of generic templates for data preprocessing, exploratory analysis, text analysis and data visualisation. This repository is designed to demonstrate proficiency in key data analytics techniques and tools.

Table of Contents

  1. About the Repository
  2. Contents
  3. Getting Started
  4. Usage
  5. Contributing
  6. Contact

About the Repository

This repository is a curated collection of Jupyter Notebooks that serve as templates for various tasks in the data analytics process. The purpose of this repository is to:

  • Showcase proficiency with Python for data-related tasks.
  • Serve as a resource for common workflows in data analysis.
  • Illustrate best practices in data preprocessing, EDA, and visualisation.

The repository is aimed at prospective employers and anyone interested in data analytics.

Contents

  1. Exploratory Data Analysis (EDA)

    • Correlation Analysis
    • Distribution Tests
    • Similarity Analysis
  2. Preprocessing

    • Sampling Methods
    • Aggregation
    • Binarisation
    • Handling Duplicates
    • Extracting Nominal Categories
    • Handling Missing Values
  3. Text Analysis

    • Case Folding
    • Normalisation
    • Stemming
    • Stop Word Removal
    • Tokenisation
  4. Visualisation

    • Heatmaps
    • Histograms
    • Scatterplots
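As a rough illustration of the kind of workflow the Text Analysis templates cover, the sketch below chains case folding, tokenisation and stop word removal using only the standard library. It is not taken from the notebooks: the `preprocess` function, the regular expression and the small `STOP_WORDS` set are assumptions made for this example, and the notebooks themselves may rely on NLTK instead.

```python
import re
from collections import Counter

# Hypothetical stop word list for illustration only;
# a real workflow might load a fuller list (e.g. from NLTK).
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def preprocess(text):
    """Case-fold the text, split it into tokens and drop stop words."""
    folded = text.casefold()                    # case folding
    tokens = re.findall(r"[a-z0-9]+", folded)   # simple regex tokenisation
    return [t for t in tokens if t not in STOP_WORDS]  # stop word removal


tokens = preprocess("The analysis of the data is the START of the story.")
counts = Counter(tokens)  # token frequencies for later analysis

print(tokens)
print(counts)
```

The regex tokeniser only keeps ASCII letters and digits, which keeps the sketch short; text with accented characters or contractions would need a more careful tokeniser.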

Getting Started

Prerequisites

  • Python 3.11+
  • Jupyter Notebook

Install required packages:

pip install pandas scipy scikit-learn ydata-profiling nltk seaborn matplotlib

The collections and re modules are part of Python's standard library and do not need to be installed.
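After installing, a quick way to confirm that the environment is ready is to check that each library can be found under its import name. The snippet below is a minimal sketch of such a check; the `missing_modules` helper and the `REQUIRED` list are names introduced here for illustration and are not part of the repository.

```python
from importlib.util import find_spec

# Import names of the third-party libraries (these can differ from the
# names used with pip, e.g. scikit-learn is imported as sklearn).
REQUIRED = [
    "pandas",
    "scipy",
    "sklearn",
    "ydata_profiling",
    "nltk",
    "seaborn",
    "matplotlib",
]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

The check only looks each module up without importing it, so it runs quickly even when the heavier libraries are installed.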

Usage

  1. Clone the repository:
    git clone https://github.com/spencerduberry/Data-Analytics-Tools_Python.git
  2. Navigate to the relevant folder based on your task (e.g. preprocessing)
  3. Run the scripts or notebooks:
    jupyter notebook Aggregation.ipynb

Contributing

Contributions are welcome! If you have a useful template or improvement, feel free to open a pull request. Steps to contribute:

  1. Fork the repository.
  2. Create a branch (git checkout -b feature/NewFeature).
  3. Commit your changes (git commit -m 'Add new feature').
  4. Push the branch (git push origin feature/NewFeature).
  5. Open a pull request.

Contact

Spencer Duberry
LinkedIn: www.linkedin.com/in/spencer-duberry-938233285
Email: spencerduberry@hotmail.co.uk
