A comprehensive collection of machine learning and web development resources from CodeSignal's learning paths, implemented in Jupyter notebooks.
This repository contains Jupyter notebooks covering various topics from CodeSignal's course paths (https://learn.codesignal.com/course-paths). Initially focused on machine learning, the project has expanded to include web development with Python and advanced AI topics.
- Journey into Machine Learning with Sklearn and Tensorflow
  - Data Cleaning and Preprocessing in Machine Learning
  - Foundational Machine Learning Models with Sklearn
  - Feature Engineering for Machine Learning
  - Intro to Model Optimization in Machine Learning
  - Introduction to Neural Networks with TensorFlow
- AI Theory and Coding
  - Regression and Gradient Descent
  - Classification Algorithms and Metrics
  - Gradient Descent: Building Optimization Algorithms from Scratch
  - Ensemble Methods from Scratch
  - Unsupervised Learning and Clustering
  - Neural Networks Basics from Scratch
- Introduction to Machine Learning with SciKit Learn
  - Training Your First Machine Learning Model from Scratch
  - Data Preprocessing For Machine Learning
  - Diving Deep into Regression
  - Cracking Classification
  - Deep Dive into Regression and Classification Metrics
  - Ensembles in Machine Learning
  - Hypertuning and Cross-Validation
- Mastering Dimensionality Reduction with Python
  - Navigating Data Simplification with PCA
  - Linear Landscapes of Dimensionality Reduction
  - Non-linear Dimensionality Reduction Techniques
  - Enigmatic Autoencoders for Dimensionality Reduction
  - Dimensionality Reduction with Feature Selection
- Deep Dive into Numpy and Pandas
  - NumPy Basics
  - Pandas Basics and DataFrame Manipulation
  - Introduction to Data Cleaning and Transformation
  - Advanced Data Analysis with Pandas
  - Data Transformation Techniques in Pandas
  - Comprehensive Data Wrangling and Analysis with Pandas and Numpy
- Mastering Clustering in Machine Learning
  - K-means Clustering Decoded
  - Hierarchical Clustering Deep Dive
  - Density-Based Clustering Simplified
  - Cluster Performance Unveiled
- Comprehensive Introduction to Tensorflow
  - Introduction to TensorFlow Basics
  - Building a Neural Network in TensorFlow
  - Modeling the Iris Dataset with TensorFlow
  - TensorFlow Techniques for Model Optimization
- Mathematical Foundations for Deep Learning
  - Introduction to Linear Algebra for Machine Learning
  - Introduction to Calculus for Machine Learning
  - Advanced Calculus for Machine Learning
  - Foundations of Optimization Algorithms
  - Introduction to Probability and Statistics for Machine Learning
- Building a Sketch Recognition System with CNN
  - Introduction to Drawing Recognition and CNN Fundamentals
  - Data Preparation for Drawing Recognition
  - Drawing Recognition with CNNs for Sketches
  - Building the UI for Drawing Recognition
- Introduction to Natural Language Processing
  - Introduction to Text Data Exploration in Python
  - Text Data Preprocessing in Python
  - Introduction to TF-IDF Vectorization in Python
  - Building and Evaluating Text Classifiers in Python
- Text Classification with Natural Language Processing
  - Collecting and Preparing Textual Data for Classification
  - Feature Engineering for Text Classification
  - Introduction to Modeling Techniques for Text Classification
  - Advanced Modeling for Text Classification
- Token Classification in NLP using spaCy
  - Building an NLP Pipeline with spaCy for Token Classification
  - Linguistics for Token Classification in spaCy
  - Practical Applications of spaCy for Real-Life Tasks
- Data Processing for LLMs
  - Foundations of NLP Data Processing
  - Modern Tokenization Techniques for AI & LLMs
  - Optimized Data Preparation for Large-Scale LLMs
  - Chunking and Storing Text for Efficient LLM Processing
- Prompt Engineering for Everyone
  - Understanding LLMs and Basic Prompting Techniques
  - Engineering Output Size with LLMs
  - Journey Into Format Control in Prompt Engineering
  - Prompt Engineering for Precise Text Modification
  - Advanced Techniques in Prompt Engineering
- Generative AI for Everyone in 2025
  - Generative AI in 2025 - Overview and Practice
  - Mastering Communication with AI Language Models
  - Applying Generative AI in Everyday Professional Tasks
  - Making Things Shine - Practice and Learn Image Generation with AI
  - Generative AI - The Next Frontier: Voice, Video, and More
- Foundations of Retrieval Augmented Generation (RAG) Systems
  - Introduction to RAG
  - Text Representation Techniques for RAG Systems
  - Scaling up RAG with Vector Databases
  - Beyond Basic RAG: Improving our Pipeline
- Talk to Your Documents with LangChain and Python
  - LangChain Chat Essentials in Python
  - Document Processing and Retrieval with LangChain in Python
  - Building a RAG-Powered Chatbot with LangChain and Python
- Implementing Video Transcriber with OpenAI Whisper in Python
  - Getting Started with OpenAI Whisper API in Python
  - Transcribing Large Files in Python using FFmpeg
  - Scraping and Transcribing Remote Videos
- Building a Chatbot with FastAPI and OpenAI
  - Creating a Chatbot with OpenAI in Python
  - Building a Chatbot Service With FastAPI
  - Developing a Chatbot Web Application With FastAPI
- Building a Personal tutor with DeepSeek and FastAPI
  - Creating a Personal Tutor with DeepSeek in Python
  - Building a Personal Tutor Service With FastAPI
  - Developing a Personal Tutor Web Application With FastAPI
- MCP Servers Made Easy with Python and OpenAI Agents
  - Introduction to OpenAI Agents SDK in Python
  - Developing and Integrating an MCP Server in Python
  - Advanced MCP Server and Agent Integration in Python
- Introduction to Django for Back-End Development
  - First Steps Into Back-End Engineering with Django
  - Managing Data with SQLite and Django ORM
  - Retrieving and Manipulating Data with Django ORM
  - Building a Full-Featured To-Do List Application
- APIs Made Easy with Python and Flask
  - Introduction to Flask Basics
  - Mastering Flask HTTP Methods
  - Flask Data Modeling with Marshmallow
  - Securing Flask Apps with JWT Authentication
- Mastering Web Scraping with Python and Beautiful Soup
  - Basic Python and Web Requests
  - Introduction to BeautifulSoup for Web Scraping
  - Advanced Web Scraping Techniques
  - Implementing Scalable Web Scraping with Python
- Mastering Cloud Engineering with AWS and Python
  - Introduction to AWS SDK for Python
  - Mastering Amazon S3 with AWS SDK for Python
  - Introduction to DynamoDB with AWS SDK for Python
  - Mastering Messaging with AWS SDK for Python
  - AWS Secrets Management with AWS SDK for Python
- Deploying ML Models in Production
  - Building Reusable Pipeline Functions
  - Model Serving with FastAPI
  - Automating Retraining with Apache Airflow
- Learning and Mastering Redis with Python
  - Introduction to Redis with Python: The Basics
  - Mastering Redis Data Structures with Python: Beyond Basics
  - Mastering Redis Transactions and Efficiency with Python
  - Mastering Redis for High-Performance Applications with Python
  - Implementing a Redis-based Backend System with Python
- Intro to Machine Learning in Trading with $TSLA
  - Basic $TSLA Financial Data Handling in Pandas
  - Technical Indicators in Financial Analysis with Pandas
  - Preparing Financial Data for Machine Learning
  - Introduction to Machine Learning with Gradient Boosting Models
- Mastering Algorithms and Data Structures in Python
  - Hashing, Dictionaries, and Sets in Python
  - Sorting and Searching Algorithms in Python
  - Linked Lists, Stacks, and Queues in Python
  - Understanding and Using Trees in Python
  - Mastering Graphs in Python
- AI Interviews - Software Design, Architecture, and More
  - AI Interviews: Software Development and Methodologies
  - AI Interviews: System Architecture and Design
  - AI Interviews: Network and Data Management
  - AI Interviews: System Performance and Security
- Getting Started with SQL with Leo Messi
- Go Programming for Beginners
Key features:
- Practical Learning: Hands-on implementation of theoretical concepts
- Diverse Topics: Covers ML, AI, web development, and more
- Industry Relevance: Focuses on in-demand skills and technologies
- Open Source: Encourages collaboration and knowledge sharing
Getting started:
- Clone the repository
- Open the Jupyter notebooks
- Explore the markdown explanations and code examples
- Experiment with the code and adapt it to your projects
This repository includes several automation tools to help with content creation and organization:
create_folder.py
Purpose: Automatically creates folder structures based on course text files.
How it works:
- Reads a text file containing a course outline
- Detects lines ending with "practices" as course section indicators
- Creates numbered folders for each section found after a "practices" line
- Useful for organizing course materials before converting to notebooks
Usage:
python create_folder.py <text_file>
Example:
python create_folder.py course-outline.txt
Input format example:
Unit 1
4 practices
16 min
Introduction to Tokenization (Rule-Based Tokenization)
Preview
Tokenize Text with NLTK
Output: Creates folder "1. Introduction to Tokenization (Rule-Based Tokenization)"
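The steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the actual create_folder.py: the names `find_section_titles` and `create_folders` are hypothetical, and it assumes, based on the input example, that the section title is the first line after a "practices" line that is not a duration such as "16 min". Numbering folders sequentially in order of appearance is also an assumption.

```python
import re
import tempfile
from pathlib import Path

# Assumption: a duration line looks like "16 min".
DURATION = re.compile(r"^\d+\s*min$", re.IGNORECASE)


def find_section_titles(lines):
    """Return the title that follows each line ending with 'practices'."""
    titles = []
    expecting_title = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.lower().endswith("practices"):
            # The next non-duration line is treated as the section title.
            expecting_title = True
        elif expecting_title and not DURATION.match(line):
            titles.append(line)
            expecting_title = False
    return titles


def create_folders(lines, base_dir):
    """Create one numbered folder per section title and return the paths."""
    created = []
    for number, title in enumerate(find_section_titles(lines), start=1):
        folder = Path(base_dir) / f"{number}. {title}"
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


outline = [
    "Unit 1",
    "4 practices",
    "16 min",
    "Introduction to Tokenization (Rule-Based Tokenization)",
    "Preview",
    "Tokenize Text with NLTK",
]

# A temporary directory keeps the demonstration from touching real folders.
with tempfile.TemporaryDirectory() as tmp:
    folder_names = [path.name for path in create_folders(outline, tmp)]

print(folder_names)
# ['1. Introduction to Tokenization (Rule-Based Tokenization)']
```

The sketch does not sanitize titles, so a title containing a path separator would need extra handling before being used as a folder name.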
generate-ipynb.sh
Purpose: Converts structured text files into Jupyter notebooks with proper formatting.
Features:
- Smart Filtering: Automatically skips lines containing:
- Number + "practices" (e.g., "4 practices")
- Number + "min" (e.g., "16 min")
- "Preview" (case-insensitive)
- Intelligent Naming: Creates notebook filenames using Unit number + first valid content line
- Content Structure: Converts text content to markdown cells with proper heading hierarchy
- Last Line Handling: Ensures the last line of each unit becomes a heading in the notebook
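The Smart Filtering rules above could look roughly like the following in Python. This is an illustrative sketch only, not the logic inside generate-ipynb.sh: the names `SKIP_PATTERNS`, `is_skipped`, and `content_lines` are hypothetical, and the regular expressions are inferred from the examples "4 practices" and "16 min". The sketch matches whole lines, which is stricter than "containing"; that choice is an assumption made so that a title such as "Preview of Results" would survive.

```python
import re

# Assumed patterns, inferred from the examples in the feature list.
SKIP_PATTERNS = [
    re.compile(r"^\d+\s+practices?$", re.IGNORECASE),  # e.g. "4 practices"
    re.compile(r"^\d+\s+min$", re.IGNORECASE),         # e.g. "16 min"
    re.compile(r"^preview$", re.IGNORECASE),           # "Preview", any case
]


def is_skipped(line):
    """Return True if the whole line matches one of the skip patterns."""
    text = line.strip()
    return any(pattern.match(text) for pattern in SKIP_PATTERNS)


def content_lines(lines):
    """Drop blank lines and skipped lines, keeping the original order."""
    return [line.strip() for line in lines if line.strip() and not is_skipped(line)]


sample = [
    "Unit 1",
    "4 practices",
    "16 min",
    "Introduction to Tokenization (Rule-Based Tokenization)",
    "Preview",
    "Tokenize Text with NLTK",
]

kept = content_lines(sample)
print(kept)
# ['Unit 1', 'Introduction to Tokenization (Rule-Based Tokenization)', 'Tokenize Text with NLTK']
```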
Usage:
./generate-ipynb.sh <input_text_file>
Example:
./generate-ipynb.sh course-content.txt
Input format example:
Unit 1
4 practices
16 min
Introduction to Tokenization (Rule-Based Tokenization)
Preview
Tokenize Text with NLTK
Sentence Tokenization with NLTK
Extract Monetary Values with Regex
Unit 2
3 practices
13 min
Byte-Pair Encoding (BPE) – Subword Tokenization
Preview
Exploring Pre-trained Tokenizers with GPT-2
Output:
1.Introduction-to-Tokenization-(Rule-Based-Tokenization).ipynb
2.Byte-Pair-Encoding-(BPE)-–-Subword-Tokenization.ipynb
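The filenames in this output are consistent with a simple rule: the unit number, a dot, the title with whitespace replaced by hyphens, and the .ipynb extension. The Python sketch below reproduces that rule. The function name `notebook_filename` is hypothetical and the rule is inferred from the two example filenames only, so the real script may treat other characters differently.

```python
def notebook_filename(unit_number, title):
    """Build a notebook filename from a unit number and its title line."""
    # Split on any whitespace and rejoin with hyphens; other characters,
    # such as parentheses and the en dash, are kept unchanged.
    slug = "-".join(title.split())
    return f"{unit_number}.{slug}.ipynb"


first = notebook_filename(1, "Introduction to Tokenization (Rule-Based Tokenization)")
second = notebook_filename(2, "Byte-Pair Encoding (BPE) – Subword Tokenization")
print(first)
print(second)
```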
Generated Notebook Structure:
- Unit title as H1 heading
- Content lines as H2 headings
- Last content line of each unit included as final heading
- Proper JSON structure for Jupyter notebooks
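For reference, a notebook with the structure listed above can be assembled as a plain dictionary and serialized with the standard json module. This is a sketch under assumptions rather than the script's actual output: the helper names `markdown_cell` and `build_notebook` are hypothetical, the metadata is left empty, and nbformat 4.4 is chosen because that version does not require per-cell ids.

```python
import json


def markdown_cell(text):
    """Return a minimal markdown cell in the nbformat 4 layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": [text]}


def build_notebook(unit_title, content):
    """Unit title becomes an H1 cell; each content line becomes an H2 cell."""
    cells = [markdown_cell(f"# {unit_title}")]
    cells += [markdown_cell(f"## {line}") for line in content]
    return {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = build_notebook(
    "Introduction to Tokenization (Rule-Based Tokenization)",
    [
        "Tokenize Text with NLTK",
        "Sentence Tokenization with NLTK",
        "Extract Monetary Values with Regex",
    ],
)

# The serialized text is what would be written to the .ipynb file.
notebook_json = json.dumps(notebook, indent=1, ensure_ascii=False)
print(notebook_json)
```

Because the last content line goes through the same loop as the others, it ends up as the final heading, matching the behavior described in the list.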
Workflow:
- Organize Structure: Use create_folder.py to create the folder hierarchy
- Generate Content: Use generate-ipynb.sh to create notebook files
- Manual Enhancement: Add code cells, explanations, and examples to the generated notebooks
Contributions are welcome! Feel free to submit pull requests with improvements, additional exercises, or bug fixes.
This project is open-source and available under the MIT License.