Repo for learning ML via CodeSignal challenges, including algorithms and projects

teguhteja/learn-codesignal-ml

Learn CodeSignal ML

A comprehensive collection of machine learning and web development resources from CodeSignal's learning paths, implemented in Jupyter notebooks.

Project Overview

This repository contains Jupyter notebooks covering various topics from CodeSignal's course paths (https://learn.codesignal.com/course-paths). Initially focused on machine learning, the project has expanded to include web development with Python and advanced AI topics.

Complete Course Materials

1. Machine Learning & AI Fundamentals

  1. Journey into Machine Learning with Sklearn and Tensorflow

    • Data Cleaning and Preprocessing in Machine Learning
    • Foundational Machine Learning Models with Sklearn
    • Feature Engineering for Machine Learning
    • Intro to Model Optimization in Machine Learning
    • Introduction to Neural Networks with TensorFlow
  2. AI Theory and Coding

    • Regression and Gradient Descent
    • Classification Algorithms and Metrics
    • Gradient Descent: Building Optimization Algorithms from Scratch
    • Ensemble Methods from Scratch
    • Unsupervised Learning and Clustering
    • Neural Networks Basics from Scratch
  3. Introduction to Machine Learning with SciKit Learn

    • Training Your First Machine Learning Model from Scratch
    • Data Preprocessing For Machine Learning
    • Diving Deep into Regression
    • Cracking Classification
    • Deep Dive into Regression and Classification Metrics
    • Ensembles in Machine Learning
    • Hypertuning and Cross-Validation

2. Advanced Data Science & Analytics

  1. Mastering Dimensionality Reduction with Python

    • Navigating Data Simplification with PCA
    • Linear Landscapes of Dimensionality Reduction
    • Non-linear Dimensionality Reduction Techniques
    • Enigmatic Autoencoders for Dimensionality Reduction
    • Dimensionality Reduction with Feature Selection
  2. Deep Dive into Numpy and Pandas

    • NumPy Basics
    • Pandas Basics and DataFrame Manipulation
    • Introduction to Data Cleaning and Transformation
    • Advanced Data Analysis with Pandas
    • Data Transformation Techniques in Pandas
    • Comprehensive Data Wrangling and Analysis with Pandas and Numpy
  3. Mastering Clustering in Machine Learning

    • K-means Clustering Decoded
    • Hierarchical Clustering Deep Dive
    • Density-Based Clustering Simplified
    • Cluster Performance Unveiled

3. Deep Learning & Neural Networks

  1. Comprehensive Introduction to Tensorflow

    • Introduction to TensorFlow Basics
    • Building a Neural Network in TensorFlow
    • Modeling the Iris Dataset with TensorFlow
    • TensorFlow Techniques for Model Optimization
  2. Mathematical Foundations for Deep Learning

    • Introduction to Linear Algebra for Machine Learning
    • Introduction to Calculus for Machine Learning
    • Advanced Calculus for Machine Learning
    • Foundations of Optimization Algorithms
    • Introduction to Probability and Statistics for Machine Learning
  3. Building a Sketch Recognition System with CNN

    • Introduction to Drawing Recognition and CNN Fundamentals
    • Data Preparation for Drawing Recognition
    • Drawing Recognition with CNNs for Sketches
    • Building the UI for Drawing Recognition

4. Natural Language Processing

  1. Introduction to Natural Language Processing

    • Introduction to Text Data Exploration in Python
    • Text Data Preprocessing in Python
    • Introduction to TF-IDF Vectorization in Python
    • Building and Evaluating Text Classifiers in Python
  2. Text Classification with Natural Language Processing

    • Collecting and Preparing Textual Data for Classification
    • Feature Engineering for Text Classification
    • Introduction to Modeling Techniques for Text Classification
    • Advanced Modeling for Text Classification
  3. Token Classification in NLP using spaCy

    • Building an NLP Pipeline with spaCy for Token Classification
    • Linguistics for Token Classification in spaCy
    • Practical Applications of spaCy for Real-Life Tasks
  4. Data Processing for LLMs

    • Foundations of NLP Data Processing
    • Modern Tokenization Techniques for AI & LLMs
    • Optimized Data Preparation for Large-Scale LLMs
    • Chunking and Storing Text for Efficient LLM Processing

5. Generative AI & LLMs

  1. Prompt Engineering for Everyone

    • Understanding LLMs and Basic Prompting Techniques
    • Engineering Output Size with LLMs
    • Journey Into Format Control in Prompt Engineering
    • Prompt Engineering for Precise Text Modification
    • Advanced Techniques in Prompt Engineering
  2. Generative AI for Everyone in 2025

    • Generative AI in 2025 - Overview and Practice
    • Mastering Communication with AI Language Models
    • Applying Generative AI in Everyday Professional Tasks
    • Making Things Shine - Practice and Learn Image Generation with AI
    • Generative AI - The Next Frontier: Voice, Video, and More
  3. Foundations of Retrieval Augmented Generation (RAG) Systems

    • Introduction to RAG
    • Text Representation Techniques for RAG Systems
    • Scaling up RAG with Vector Databases
    • Beyond Basic RAG: Improving our Pipeline
  4. Talk to Your Documents with LangChain and Python

    • LangChain Chat Essentials in Python
    • Document Processing and Retrieval with LangChain in Python
    • Building a RAG-Powered Chatbot with LangChain and Python

6. AI Applications & Services

  1. Implementing Video Transcriber with OpenAI Whisper in Python

    • Getting Started with OpenAI Whisper API in Python
    • Transcribing Large Files in Python using FFmpeg
    • Scraping and Transcribing Remote Videos
  2. Building a Chatbot with FastAPI and OpenAI

    • Creating a Chatbot with OpenAI in Python
    • Building a Chatbot Service With FastAPI
    • Developing a Chatbot Web Application With FastAPI
  3. Building a Personal tutor with DeepSeek and FastAPI

    • Creating a Personal Tutor with DeepSeek in Python
    • Building a Personal Tutor Service With FastAPI
    • Developing a Personal Tutor Web Application With FastAPI
  4. MCP Servers Made Easy with Python and OpenAI Agents

    • Introduction to OpenAI Agents SDK in Python
    • Developing and Integrating an MCP Server in Python
    • Advanced MCP Server and Agent Integration in Python

7. Web Development & APIs

  1. Introduction to Django for Back-End Development

    • First Steps Into Back-End Engineering with Django
    • Managing Data with SQLite and Django ORM
    • Retrieving and Manipulating Data with Django ORM
    • Building a Full-Featured To-Do List Application
  2. APIs Made Easy with Python and Flask

    • Introduction to Flask Basics
    • Mastering Flask HTTP Methods
    • Flask Data Modeling with Marshmallow
    • Securing Flask Apps with JWT Authentication
  3. Mastering Web Scraping with Python and Beautiful Soup

    • Basic Python and Web Requests
    • Introduction to BeautifulSoup for Web Scraping
    • Advanced Web Scraping Techniques
    • Implementing Scalable Web Scraping with Python

8. Cloud & Production Systems

  1. Mastering Cloud Engineering with AWS and Python

    • Introduction to AWS SDK for Python
    • Mastering Amazon S3 with AWS SDK for Python
    • Introduction to DynamoDB with AWS SDK for Python
    • Mastering Messaging with AWS SDK for Python
    • AWS Secrets Management with AWS SDK for Python
  2. Deploying ML Models in Production

    • Building Reusable Pipeline Functions
    • Model Serving with FastAPI
    • Automating Retraining with Apache Airflow
  3. Learning and Mastering Redis with Python

    • Introduction to Redis with Python: The Basics
    • Mastering Redis Data Structures with Python: Beyond Basics
    • Mastering Redis Transactions and Efficiency with Python
    • Mastering Redis for High-Performance Applications with Python
    • Implementing a Redis-based Backend System with Python

9. Specialized Topics

  1. Intro to Machine Learning in Trading with $TSLA

    • Basic $TSLA Financial Data Handling in Pandas
    • Technical Indicators in Financial Analysis with Pandas
    • Preparing Financial Data for Machine Learning
    • Introduction to Machine Learning with Gradient Boosting Models
  2. Mastering Algorithms and Data Structures in Python

    • Hashing, Dictionaries, and Sets in Python
    • Sorting and Searching Algorithms in Python
    • Linked Lists, Stacks, and Queues in Python
    • Understanding and Using Trees in Python
    • Mastering Graphs in Python
  3. AI Interviews - Software Design, Architecture, and More

    • AI Interviews: Software Development and Methodologies
    • AI Interviews: System Architecture and Design
    • AI Interviews: Network and Data Management
    • AI Interviews: System Performance and Security
  4. Getting Started with SQL with Leo Messi

  5. Go Programming for Beginners

Why This Project?

  • Practical Learning: Hands-on implementation of theoretical concepts
  • Diverse Topics: Covers ML, AI, web development, and more
  • Industry Relevance: Focuses on in-demand skills and technologies
  • Open Source: Encourages collaboration and knowledge sharing

How to Use

  1. Clone the repository
  2. Open the Jupyter notebooks
  3. Explore the markdown explanations and code examples
  4. Experiment with the code and adapt it to your projects

Development Tools

This repository includes several automation tools to help with content creation and organization:

create_folder.py - Folder Structure Generator

Purpose: Automatically creates folder structures based on course text files.

How it works:

  • Reads a text file containing a course outline
  • Treats each line ending with "practices" (for example, "4 practices") as the marker for a course section
  • Creates a numbered folder for the section title found after each "practices" line
  • Useful for organizing course materials before converting them to notebooks

Usage:

python create_folder.py <text_file>

Example:

python create_folder.py course-outline.txt

Input format example:

Unit 1
4 practices
16 min
Introduction to Tokenization (Rule-Based Tokenization)
Preview
Tokenize Text with NLTK

Output: Creates folder "1. Introduction to Tokenization (Rule-Based Tokenization)"
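
The folder-creation behavior described above can be sketched as follows. This is a minimal illustration, not the actual contents of create_folder.py: the function names `section_titles` and `create_folders` are hypothetical, and skipping the "N min" line before taking the title is an assumption based on the input and output example.

```python
import re
from pathlib import Path

# Assumption: duration lines look like "16 min" and are not section titles.
DURATION_RE = re.compile(r"^\d+\s+min$")


def section_titles(lines):
    """Return the first title-like line that follows each '... practices' line."""
    titles = []
    awaiting_title = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith("practices"):
            # A section marker: the next non-duration line is taken as the title.
            awaiting_title = True
        elif awaiting_title and not DURATION_RE.match(line):
            titles.append(line)
            awaiting_title = False
    return titles


def create_folders(titles, root="."):
    """Create one numbered folder per title and return the created paths."""
    created = []
    for number, title in enumerate(titles, start=1):
        # Note: a "/" inside a title would produce nested folders.
        folder = Path(root) / f"{number}. {title}"
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_folders(section_titles(open("course-outline.txt")))` on the input above would, under these assumptions, create the folder shown in the output line.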

generate-ipynb.sh - Jupyter Notebook Generator

Purpose: Converts structured text files into Jupyter notebooks with proper formatting.

Features:

  • Smart Filtering: Automatically skips lines containing:
    • Number + "practices" (e.g., "4 practices")
    • Number + "min" (e.g., "16 min")
    • "Preview" (case-insensitive)
  • Intelligent Naming: Builds each notebook filename from the unit number and the first content line that is not filtered out
  • Content Structure: Converts text content to markdown cells with proper heading hierarchy
  • Last Line Handling: Ensures the last line of each unit becomes a heading in the notebook
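
The filtering rules listed above could be expressed in Python roughly as shown here. The script itself is a shell script, so this is only an illustrative sketch: the names `SKIP_PATTERNS`, `is_skipped`, and `content_lines` are hypothetical, and the exact patterns (including case-sensitivity for "practices" and "min") are assumptions.

```python
import re

# Assumed patterns for the three kinds of lines that get skipped.
SKIP_PATTERNS = [
    re.compile(r"\d+\s*practices"),           # e.g. "4 practices"
    re.compile(r"\d+\s*min\b"),               # e.g. "16 min"
    re.compile(r"preview", re.IGNORECASE),    # "Preview", any case
]


def is_skipped(line):
    """True if the line matches any of the skip patterns."""
    return any(pattern.search(line) for pattern in SKIP_PATTERNS)


def content_lines(lines):
    """Strip whitespace, drop blank lines, and drop filtered lines."""
    return [line.strip() for line in lines if line.strip() and not is_skipped(line)]
```

One consequence of matching lines that merely contain "Preview" is that a genuine title containing that word would also be dropped, which is worth keeping in mind when preparing input files.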

Usage:

./generate-ipynb.sh <input_text_file>

Example:

./generate-ipynb.sh course-content.txt

Input format example:

Unit 1
4 practices
16 min
Introduction to Tokenization (Rule-Based Tokenization)
Preview
Tokenize Text with NLTK
Sentence Tokenization with NLTK
Extract Monetary Values with Regex

Unit 2
3 practices  
13 min
Byte-Pair Encoding (BPE) – Subword Tokenization
Preview
Exploring Pre-trained Tokenizers with GPT-2
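
The input format above groups lines under "Unit N" headers. A minimal sketch of how such a file could be split into units is shown below; `UNIT_RE` and `split_units` are hypothetical names, and the header pattern is an assumption drawn from the example rather than from the script.

```python
import re

# Assumption: a unit header is a line of the form "Unit <number>".
UNIT_RE = re.compile(r"^Unit\s+(\d+)\s*$")


def split_units(text):
    """Map each unit number to the non-blank lines that follow its header."""
    units = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = UNIT_RE.match(line)
        if match:
            current = int(match.group(1))
            units[current] = []
        elif current is not None and line:
            units[current].append(line)
    return units
```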

Output:

  • 1.Introduction-to-Tokenization-(Rule-Based-Tokenization).ipynb
  • 2.Byte-Pair-Encoding-(BPE)-–-Subword-Tokenization.ipynb
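
The output names above are consistent with joining the unit number and the title with a dot and replacing spaces with hyphens. A small sketch of that rule, assuming no other characters are altered (the helper name `notebook_filename` is hypothetical):

```python
def notebook_filename(unit_number, title):
    """Build '<unit>.<title-with-hyphens>.ipynb' from a unit number and title."""
    return f"{unit_number}.{title.strip().replace(' ', '-')}.ipynb"
```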

Generated Notebook Structure:

  • Unit title as H1 heading
  • Content lines as H2 headings
  • Last content line of each unit included as final heading
  • Proper JSON structure for Jupyter notebooks
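
For reference, a notebook with this structure can be assembled with Python's standard `json` module. The sketch below is not the script's implementation: `markdown_cell` and `build_notebook` are hypothetical helpers, and the choice of notebook format version 4.4 is an assumption made to keep the cells minimal.

```python
import json


def markdown_cell(text):
    """Return a minimal markdown cell in the Jupyter notebook v4 format."""
    return {"cell_type": "markdown", "metadata": {}, "source": [text]}


def build_notebook(unit_title, content):
    """Unit title becomes an H1 cell; each content line becomes an H2 cell."""
    cells = [markdown_cell(f"# {unit_title}")]
    cells += [markdown_cell(f"## {line}") for line in content]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_notebook(path, notebook):
    """Serialize the notebook dictionary to an .ipynb file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1, ensure_ascii=False)
```

Because every content line, including the last one, is mapped to its own cell, the final line of a unit ends up as the final heading without special handling.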

Workflow Example

  1. Organize Structure: Use create_folder.py to create folder hierarchy
  2. Generate Content: Use generate-ipynb.sh to create notebook files
  3. Manual Enhancement: Add code cells, explanations, and examples to the generated notebooks

Contributions

Contributions are welcome! Feel free to submit pull requests with improvements, additional exercises, or bug fixes.

License

This project is open-source and available under the MIT License.
