A collection of projects to help me learn about Natural Language Processing.
For a list of the technical and soft skills I've learned through this NLP portfolio, please check out my resume page.
I am enthusiastic about Natural Language Processing (NLP) and have developed a strong set of skills through my NLP portfolio. I have gained experience in various NLP techniques and concepts, including text processing, Part of Speech (POS) tagging, language modeling, web scraping, deep learning models, and vector databases, to name a few. I have also developed my soft skills in presenting experiments and research findings, scheduling, time and project management, and analytical thinking.
Moving forward, I plan to continue exploring NLP techniques and stay up-to-date with the rapidly changing field by reading research papers and engaging with online communities. I am also excited to work on personal projects that apply my NLP skills to real-world problems. One area of particular interest to me is working with Vector Databases and integrating them with Large Language Models such as ChatGPT to provide a more meaningful and personalized user experience. If you are interested in my skills and experience and have any potential employment opportunities or collaborations, please do not hesitate to reach out to me by filling out this form.
Introduction document summarizing historical and current approaches to NLP, as well as a reflection on my personal interest in NLP.
You can see the document here.
Simple program to get used to text processing in Python.
You can see the code here and a descriptive document here.
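A minimal, standard-library-only sketch of the kind of text processing this program practices (tokenizing, lowercasing, counting word frequencies); the sample sentence and the `tokenize` helper are illustrative, not taken from the actual code:

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

sample = "The cat sat on the mat. The mat was warm."
tokens = tokenize(sample)
counts = Counter(tokens)

print(counts.most_common(2))  # [('the', 3), ('mat', 2)]
```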
Program to get used to Part of Speech (POS) tagging with NLTK. Also includes a hangman-style word guessing game that chooses a word at random from the 50 most common lemmas occurring in the text.
You can see the code here and instructions on how to run it here.
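The word-selection step of the guessing game can be sketched as follows; the real program draws from the 50 most common NLTK lemmas, while this stand-alone sketch uses raw tokens and an illustrative `pick_secret_word` helper:

```python
import random
from collections import Counter

def pick_secret_word(tokens: list[str], top_n: int = 50) -> str:
    """Choose the hangman secret word at random from the top_n most common tokens."""
    common = [word for word, _ in Counter(tokens).most_common(top_n)]
    return random.choice(common)

tokens = "the quick brown fox jumps over the lazy dog the fox".split()
secret = pick_secret_word(tokens, top_n=3)
print(secret)  # one of the three most frequent tokens
```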
Assignment to demonstrate skills in WordNet and SentiWordNet, as well as finding collocations. The notebook can be found here.
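The assignment used NLTK's collocation tools; this standard-library sketch shows the underlying idea of scoring candidate collocations by pointwise mutual information (PMI), with a frequency filter like NLTK's so rare one-off pairs don't dominate. The sentence and function are illustrative:

```python
import math
from collections import Counter

def pmi_collocations(tokens, top_n=3, min_count=2):
    """Rank adjacent word pairs by PMI, keeping only pairs seen at least
    min_count times (akin to NLTK's apply_freq_filter)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)

    def pmi(pair):
        joint = bigrams[pair] / (n - 1)                       # P(w1, w2)
        indep = (unigrams[pair[0]] / n) * (unigrams[pair[1]] / n)  # P(w1) * P(w2)
        return math.log2(joint / indep)

    candidates = [pair for pair, count in bigrams.items() if count >= min_count]
    return sorted(candidates, key=pmi, reverse=True)[:top_n]

tokens = "new york is a big city and new york never sleeps".split()
print(pmi_collocations(tokens, top_n=1))  # [('new', 'york')]
```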
Assignment to gain experience in creating ngrams from text and building a language model from ngrams, as well as to reflect on the utility of ngram language models.
program1 generates the unigram and bigram dictionaries for the given English, French, and Italian training sets. It then pickles these dictionaries so that they can be used by program2.
program2 unpickles the dictionaries created by program1 and uses them to compute the most likely language for each line in the test data set. It then computes the accuracy using the solutions set.
The narrative provides a reflection on ngrams and the utility of ngram language models.
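The core idea behind program1/program2 can be sketched in a few lines of standard-library Python: train a smoothed bigram model per language, score each test line under every model, and pick the most probable language. This sketch uses character bigrams and one-line toy "corpora" for compactness, whereas the real programs built word-level dictionaries from full training sets:

```python
import math
from collections import Counter

def train(text):
    """Build character-level bigram and unigram counts for one language."""
    chars = list(text.lower())
    return Counter(zip(chars, chars[1:])), Counter(chars)

def log_prob(line, bigrams, unigrams, vocab_size):
    """Score a line under a bigram model with Laplace (add-one) smoothing."""
    chars = list(line.lower())
    score = 0.0
    for pair in zip(chars, chars[1:]):
        score += math.log((bigrams[pair] + 1) / (unigrams[pair[0]] + vocab_size))
    return score

# Toy one-line "training sets"; the real programs trained on full corpora.
corpora = {
    "english": "the cat is on the table and the dog is here",
    "french": "le chat est sur la table et le chien est ici",
    "italian": "il gatto e sul tavolo e il cane e qui",
}
models = {lang: train(text) for lang, text in corpora.items()}
vocab_size = len({ch for text in corpora.values() for ch in text})

def classify(line):
    """Return the language whose model assigns the line the highest probability."""
    return max(models, key=lambda lang: log_prob(line, *models[lang], vocab_size))

print(classify("the dog is on the table"))  # english
```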
Assignment to understand concepts related to sentence syntax, understand the three types of sentence parses (PSG, dependency, and SRL), and be able to use syntax parsers.
The document for this assignment can be found here.
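As a small taste of the PSG style of parse discussed in the document, here is a phrase-structure parse with NLTK's chart parser over a toy grammar I made up for illustration (real parsers use broad-coverage grammars):

```python
import nltk

# Toy phrase-structure grammar; purely illustrative.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
trees = list(parser.parse("the dog chased the cat".split()))
print(trees[0])
# (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det the) (N cat))))
```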
Assignment to understand the importance of corpora in NLP, understand how to extract information from websites using HTML, understand how websites work, and be able to do web scraping using Beautiful Soup.
The report for this assignment can be found here.
The code for this project includes the following python files:
- 1_link_scraper.py
- 2_text_scraper.py
- 3_clean_text.py
- 4_extract_important_terms.py
- 5_create_knowledge_base.py
Instructions on how to run the Web Crawler locally can be found here.
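The project's pipeline starts by scraping links with Beautiful Soup; the same first step (what 1_link_scraper.py does) can be sketched with only the standard library's `html.parser`, using an inline page for illustration:

```python
from html.parser import HTMLParser

class LinkScraper(HTMLParser):
    """Collect the href of every <a> tag, mirroring the link-scraping step."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<html><body><a href="https://example.com/a">A</a> <a href="/b">B</a></body></html>'
scraper = LinkScraper()
scraper.feed(page)
print(scraper.links)  # ['https://example.com/a', '/b']
```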
Assignment Goals:
- Gain experience with sklearn using text data
- Gain experience with text classification
What I did:
- Used this dataset to train models to classify a job posting as real or fraudulent
- Tried Naive Bayes, Logistic Regression, and Neural networks using sklearn
- Wrote up my analysis on the various approaches in the notebook
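A minimal sketch of the sklearn approach tried in the notebook (here Naive Bayes with bag-of-words counts); the four snippets and labels below are toy stand-ins I wrote for illustration, not rows from the actual dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for job-posting text; the real project used a full dataset.
texts = [
    "earn money fast from home no experience needed wire transfer",
    "send a registration fee to unlock this amazing work from home job",
    "software engineer position with benefits at an established company",
    "we are hiring a data analyst for our downtown office team",
]
labels = [1, 1, 0, 0]  # 1 = fraudulent, 0 = real

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)      # bag-of-words features
clf = MultinomialNB().fit(X, labels)     # Naive Bayes classifier

test = vectorizer.transform(["work from home send a fee"])
print(clf.predict(test)[0])  # 1 (flagged as fraudulent on this toy data)
```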
Assignment Goals:
- Get familiar with reading ACL Papers
- Learn how to stay updated with the rapidly evolving field of NLP
What I did:
- Read the following paper
- Wrote up an analysis on the authors' findings and contributions
The paper analysis can be found here.
Assignment Goals:
- Gain experience with Keras
- Gain experience with text classification
- Gain experience with deep learning model variations and embeddings
What I did:
- Experimented with several approaches to train a classifier model
- Tried basic sequential model
- Trained a Recurrent Neural Network (RNN)
- Trained a few variations on a Convolutional Neural Network (CNN) with different embedding techniques
- Evaluated and analyzed the results, comparing the performance of each model
Notebook containing the code can be found here.
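A minimal sketch of the kind of basic sequential Keras text classifier the notebook starts from, assuming TensorFlow's bundled Keras; the vocabulary size, sequence length, and layer widths are illustrative placeholders, and the RNN/CNN variants swap in recurrent or convolutional layers here:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 1000  # illustrative vocabulary size
MAX_LEN = 20       # illustrative padded sequence length

model = keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 16),   # learn word embeddings
    layers.GlobalAveragePooling1D(),    # pool embeddings over the sequence
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Forward pass on random token ids just to check the shapes line up.
x = np.random.randint(0, VOCAB_SIZE, size=(4, MAX_LEN))
preds = model.predict(x, verbose=0)
print(preds.shape)  # (4, 1)
```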
Assignment Goals:
- Create a chatbot using NLP techniques learned in class
- The chatbot should be able to carry on a limited conversation in a particular domain using a knowledge base or knowledge from the web, and knowledge it learns from the user
What I did:
- Extracted information on over 12K books from this dataset.
- Used weaviate to store the text information about the books as vectors in a vector database
- Created a chatbot integrated with OpenAI's chat API to gather user preferences
- Used the information gathered about the user to query the vector database and retrieve books that the user might like
The code and report can be found here.
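The retrieval step above boils down to nearest-neighbor search over vectors, which weaviate handles at scale. A standard-library sketch of that core idea, using made-up three-dimensional "embeddings" in place of the real book vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "embeddings" standing in for the vectors stored per book.
books = {
    "space opera epic": [0.9, 0.1, 0.0],
    "cozy mystery": [0.1, 0.8, 0.2],
    "hard sci-fi": [0.8, 0.0, 0.3],
}

def recommend(user_vector, top_n=2):
    """Return the top_n books whose vectors are most similar to the user's."""
    return sorted(books, key=lambda t: cosine(user_vector, books[t]), reverse=True)[:top_n]

print(recommend([1.0, 0.0, 0.1]))  # ['space opera epic', 'hard sci-fi']
```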