A collection of projects to help me learn about Natural Language Processing.
For a list of the technical and soft skills I've learned through this NLP portfolio, please check out my resume page.
I am enthusiastic about Natural Language Processing (NLP) and have developed a strong set of skills through my NLP portfolio. I have gained experience in various NLP techniques and concepts, including text processing, Part of Speech (POS) tagging, language modeling, web scraping, deep learning models, and vector databases, to name a few. I have also developed my soft skills in presenting experiments and research findings, scheduling, time and project management, and analytical thinking.
Moving forward, I plan to continue exploring NLP techniques and stay up-to-date with the rapidly changing field by reading research papers and engaging with online communities. I am also excited to work on personal projects that apply my NLP skills to real-world problems. One area of particular interest to me is working with Vector Databases and integrating them with Large Language Models such as ChatGPT to provide a more meaningful and personalized user experience. If you are interested in my skills and experience and have any potential employment opportunities or collaborations, please do not hesitate to reach out to me by filling out this form.
Introduction document summarizing historical and current approaches to NLP, as well as a reflection on my personal interest in NLP.
You can see the document here.
Simple program to get used to text processing in Python.
You can see the code here and a descriptive document here.
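A minimal, standard-library-only sketch of the kind of text processing this program practices (tokenizing, lowercasing, counting word frequencies); the sample sentence and the `tokenize` helper are illustrative, not taken from the actual code:

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

sample = "The cat sat on the mat. The mat was warm."
tokens = tokenize(sample)
counts = Counter(tokens)

print(counts.most_common(2))  # [('the', 3), ('mat', 2)]
```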
Program to get used to Part of Speech (POS) tagging with NLTK. Also includes a hangman-style word guessing game that chooses a word at random from the 50 most common lemmas occurring in the text.
You can see the code here and instructions on how to run it here.
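The word-selection step of the guessing game can be sketched as follows; the real program draws from the 50 most common NLTK lemmas, while this stand-alone sketch uses raw tokens and an illustrative `pick_secret_word` helper:

```python
import random
from collections import Counter

def pick_secret_word(tokens: list[str], top_n: int = 50) -> str:
    """Choose the hangman secret word at random from the top_n most common tokens."""
    common = [word for word, _ in Counter(tokens).most_common(top_n)]
    return random.choice(common)

tokens = "the quick brown fox jumps over the lazy dog the fox".split()
secret = pick_secret_word(tokens, top_n=3)
print(secret)  # one of the three most frequent tokens
```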
Assignment to demonstrate skills in WordNet and SentiWordNet, as well as finding collocations. The notebook can be found here.
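The assignment used NLTK's collocation tools; this standard-library sketch shows the underlying idea of scoring candidate collocations by pointwise mutual information (PMI), with a frequency filter like NLTK's so rare one-off pairs don't dominate. The sentence and function are illustrative:

```python
import math
from collections import Counter

def pmi_collocations(tokens, top_n=3, min_count=2):
    """Rank adjacent word pairs by PMI, keeping only pairs seen at least
    min_count times (akin to NLTK's apply_freq_filter)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)

    def pmi(pair):
        joint = bigrams[pair] / (n - 1)                       # P(w1, w2)
        indep = (unigrams[pair[0]] / n) * (unigrams[pair[1]] / n)  # P(w1) * P(w2)
        return math.log2(joint / indep)

    candidates = [pair for pair, count in bigrams.items() if count >= min_count]
    return sorted(candidates, key=pmi, reverse=True)[:top_n]

tokens = "new york is a big city and new york never sleeps".split()
print(pmi_collocations(tokens, top_n=1))  # [('new', 'york')]
```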
Assignment to gain experience in creating ngrams from text and building a language model from ngrams, as well as to reflect on the utility of ngram language models.
program1 generates the unigram and bigram dictionaries for the given English, French, and Italian training sets. It then pickles these dictionaries so that they can be used by program2.
program2 unpickles the dictionaries created by program1 and uses them to compute the most likely language for each line in the test data set. It then computes the accuracy using the solutions set.
The narrative provides a reflection on ngrams and the utility of ngram language models.
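The core idea behind program1/program2 can be sketched in a few lines of standard-library Python: train a smoothed bigram model per language, score each test line under every model, and pick the most probable language. This sketch uses character bigrams and one-line toy "corpora" for compactness, whereas the real programs built word-level dictionaries from full training sets:

```python
import math
from collections import Counter

def train(text):
    """Build character-level bigram and unigram counts for one language."""
    chars = list(text.lower())
    return Counter(zip(chars, chars[1:])), Counter(chars)

def log_prob(line, bigrams, unigrams, vocab_size):
    """Score a line under a bigram model with Laplace (add-one) smoothing."""
    chars = list(line.lower())
    score = 0.0
    for pair in zip(chars, chars[1:]):
        score += math.log((bigrams[pair] + 1) / (unigrams[pair[0]] + vocab_size))
    return score

# Toy one-line "training sets"; the real programs trained on full corpora.
corpora = {
    "english": "the cat is on the table and the dog is here",
    "french": "le chat est sur la table et le chien est ici",
    "italian": "il gatto e sul tavolo e il cane e qui",
}
models = {lang: train(text) for lang, text in corpora.items()}
vocab_size = len({ch for text in corpora.values() for ch in text})

def classify(line):
    """Return the language whose model assigns the line the highest probability."""
    return max(models, key=lambda lang: log_prob(line, *models[lang], vocab_size))

print(classify("the dog is on the table"))  # english
```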
Assignment to understand concepts related to sentence syntax, understand the three types of sentence parses (PSG, dependency, and SRL), and be able to use syntax parsers.
The document for this assignment can be found here.
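As a small taste of the PSG style of parse discussed in the document, here is a phrase-structure parse with NLTK's chart parser over a toy grammar I made up for illustration (real parsers use broad-coverage grammars):

```python
import nltk

# Toy phrase-structure grammar; purely illustrative.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
trees = list(parser.parse("the dog chased the cat".split()))
print(trees[0])
# (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det the) (N cat))))
```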
Assignment to understand the importance of corpora in NLP, understand how to extract information from websites using HTML, understand how websites work, and be able to do web scraping using Beautiful Soup.
The report for this assignment can be found here.
The code for this project includes the following python files:
- 1_link_scraper.py
- 2_text_scraper.py
- 3_clean_text.py
- 4_extract_important_terms.py
- 5_create_knowledge_base.py
Instructions on how to run the Web Crawler locally can be found here.
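The project's pipeline starts by scraping links with Beautiful Soup; the same first step (what 1_link_scraper.py does) can be sketched with only the standard library's `html.parser`, using an inline page for illustration:

```python
from html.parser import HTMLParser

class LinkScraper(HTMLParser):
    """Collect the href of every <a> tag, mirroring the link-scraping step."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<html><body><a href="https://example.com/a">A</a> <a href="/b">B</a></body></html>'
scraper = LinkScraper()
scraper.feed(page)
print(scraper.links)  # ['https://example.com/a', '/b']
```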
Assignment Goals:
- Gain experience with sklearn using text data
- Gain experience with text classification
What I did:
- Used this dataset to train models to classify a job posting as real or fraudulent
- Tried Naive Bayes, Logistic Regression, and Neural networks using sklearn
- Wrote up my analysis on the various approaches in the notebook
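A minimal sketch of the sklearn approach tried in the notebook (here Naive Bayes with bag-of-words counts); the four snippets and labels below are toy stand-ins I wrote for illustration, not rows from the actual dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for job-posting text; the real project used a full dataset.
texts = [
    "earn money fast from home no experience needed wire transfer",
    "send a registration fee to unlock this amazing work from home job",
    "software engineer position with benefits at an established company",
    "we are hiring a data analyst for our downtown office team",
]
labels = [1, 1, 0, 0]  # 1 = fraudulent, 0 = real

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)      # bag-of-words features
clf = MultinomialNB().fit(X, labels)     # Naive Bayes classifier

test = vectorizer.transform(["work from home send a fee"])
print(clf.predict(test)[0])  # 1 (flagged as fraudulent on this toy data)
```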
Assignment Goals:
- Get familiar with reading ACL Papers
- Learn how to stay updated with the rapidly evolving field of NLP
What I did:
- Read the following paper
- Wrote up an analysis on the authors' findings and contributions
The paper analysis can be found here.
Assignment Goals:
- Gain experience with Keras
- Gain experience with text classification
- Gain experience with deep learning model variations and embeddings
What I did:
- Experimented with several approaches to train a classifier model
- Tried basic sequential model
- Trained a Recurrent Neural Network (RNN)
- Trained a few variations on a Convolutional Neural Network (CNN) with different embedding techniques
- Evaluated and analyzed the results, comparing the performance of each model
Notebook containing the code can be found here.
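A minimal sketch of the kind of basic sequential Keras text classifier the notebook starts from, assuming TensorFlow's bundled Keras; the vocabulary size, sequence length, and layer widths are illustrative placeholders, and the RNN/CNN variants swap in recurrent or convolutional layers here:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 1000  # illustrative vocabulary size
MAX_LEN = 20       # illustrative padded sequence length

model = keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 16),   # learn word embeddings
    layers.GlobalAveragePooling1D(),    # pool embeddings over the sequence
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Forward pass on random token ids just to check the shapes line up.
x = np.random.randint(0, VOCAB_SIZE, size=(4, MAX_LEN))
preds = model.predict(x, verbose=0)
print(preds.shape)  # (4, 1)
```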
Assignment Goals:
- Create a chatbot using NLP techniques learned in class
- The chatbot should be able to carry on a limited conversation in a particular domain using a knowledge base or knowledge from the web, and knowledge it learns from the user
What I did:
- Extracted information on over 12K books from this dataset.
- Used weaviate to store the text information about the books as vectors in a vector database
- Created a chatbot integrated with OpenAI's chat API to gather user preferences
- Used the information gathered about the user to query the vector database and retrieve books that the user might like
The code and report can be found here.
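The retrieval step above boils down to nearest-neighbor search over vectors, which weaviate handles at scale. A standard-library sketch of that core idea, using made-up three-dimensional "embeddings" in place of the real book vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "embeddings" standing in for the vectors stored per book.
books = {
    "space opera epic": [0.9, 0.1, 0.0],
    "cozy mystery": [0.1, 0.8, 0.2],
    "hard sci-fi": [0.8, 0.0, 0.3],
}

def recommend(user_vector, top_n=2):
    """Return the top_n books whose vectors are most similar to the user's."""
    return sorted(books, key=lambda t: cosine(user_vector, books[t]), reverse=True)[:top_n]

print(recommend([1.0, 0.0, 0.1]))  # ['space opera epic', 'hard sci-fi']
```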