rigrergl/nlp_portfolio

NLP Portfolio

A collection of projects to help me learn about Natural Language Processing.

For a list of the technical and soft skills I've learned through this NLP portfolio, please check out my resume page. There I highlight my experience with various NLP techniques and concepts, as well as my soft skills in presenting experiments and research findings, scheduling, time and project management, and analytical thinking.

TL;DR

I am enthusiastic about Natural Language Processing (NLP) and have developed a strong set of skills through my NLP portfolio. I have gained experience in various NLP techniques and concepts, including text processing, Part of Speech (POS) tagging, language modeling, web scraping, deep learning models, and vector databases, to name a few. I have also developed my soft skills in presenting experiments and research findings, scheduling, time and project management, and analytical thinking.

Moving forward, I plan to continue exploring NLP techniques and stay up-to-date with the rapidly changing field by reading research papers and engaging with online communities. I am also excited to work on personal projects that apply my NLP skills to real-world problems. One area of particular interest to me is working with Vector Databases and integrating them with Large Language Models such as ChatGPT to provide a more meaningful and personalized user experience. If you are interested in my skills and experience and have any potential employment opportunities or collaborations, please do not hesitate to reach out to me by filling out this form.

Assignment 0

Introduction document summarizing historical and current approaches to NLP, as well as a reflection on my personal interest in NLP.

You can see the document here.

Program 1

Simple program to get used to text processing in Python.

You can see the code here and a descriptive document here.

Word Guessing Game

Program to get used to Part of Speech (POS) tagging with NLTK. Also includes a hangman-style word guessing game that chooses a word at random from the 50 most common lemmas occurring in the text.

You can see the code here and instructions on how to run it here.
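The game loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes the lemmas have already been extracted (in the project that step uses NLTK), and the scoring rule (+1 for a hit, -1 for a miss, starting at 5) and all function names are assumptions made for this example.

```python
import random
from collections import Counter


def most_common_words(lemmas, n=50):
    """Return the n most frequent items in a list of lemmas."""
    return [word for word, _ in Counter(lemmas).most_common(n)]


def reveal(secret, guessed):
    """Show guessed letters and mask the rest with underscores."""
    return " ".join(ch if ch in guessed else "_" for ch in secret)


def play_round(secret, guesses, score=5):
    """Apply a sequence of letter guesses and return (score, solved).

    Assumed scoring: +1 for a hit, -1 for a miss. The round ends when
    the word is solved or the score drops below zero.
    """
    guessed = set()
    for letter in guesses:
        if letter in guessed:
            continue  # repeated guesses are ignored
        guessed.add(letter)
        score += 1 if letter in secret else -1
        if score < 0:
            return score, False
        if set(secret) <= guessed:
            return score, True
    return score, False


# Hypothetical lemma list; in the project these would come from NLTK.
lemmas = ["ship", "sea", "ship", "captain", "sea", "ship", "storm"]
candidates = most_common_words(lemmas, n=2)
secret = random.choice(candidates)  # the word the player has to guess
```

Passing the guesses in as a sequence, rather than reading them with input(), keeps the round logic easy to test. An interactive version would feed letters typed by the player into the same function.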

WordNet

Assignment to demonstrate skills in WordNet and SentiWordNet, as well as finding collocations. The notebook can be found here.

Ngrams

Assignment to gain experience in creating ngrams from text and building a language model from ngrams, as well as to reflect on the utility of ngram language models.

program1 generates the unigram and bigram dictionaries for the given English, French, and Italian training sets. It then pickles these dictionaries so that they can be used by program2.

program2 unpickles the dictionaries created by program1 and uses them to compute the most likely language for each line in the test data set. It then computes the accuracy using the solutions set.

The narrative provides a reflection on ngrams and the utility of ngram language models.
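The two-program pipeline can be sketched in a few lines. This is a toy illustration under stated assumptions, not the repository's code: the training strings are made up, tokenization is a plain whitespace split, and Laplace (add-one) smoothing is assumed as the way to score unseen bigrams. The function names are hypothetical.

```python
import math
import pickle
from collections import Counter


def build_counts(text):
    """Return (unigram, bigram) count dicts for whitespace-split text."""
    tokens = text.lower().split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return dict(unigrams), dict(bigrams)


def log_prob(line, unigrams, bigrams, vocab_size):
    """Bigram log-probability of a line with assumed add-one smoothing."""
    tokens = line.lower().split()
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigrams.get((prev, cur), 0) + 1
        denominator = unigrams.get(prev, 0) + vocab_size
        total += math.log(numerator / denominator)
    return total


def most_likely_language(line, models, vocab_size):
    """Pick the language whose model gives the line the highest score."""
    return max(models, key=lambda lang: log_prob(line, *models[lang], vocab_size))


def accuracy(predictions, solutions):
    """Fraction of predictions that match the solutions."""
    correct = sum(p == s for p, s in zip(predictions, solutions))
    return correct / len(solutions)


# Toy training data (hypothetical stand-ins for the real training sets).
training = {
    "English": "the cat sat on the mat the dog sat on the rug",
    "French": "le chat est sur le tapis le chien est sur le lit",
    "Italian": "il gatto è sul tappeto il cane è sul letto",
}
models = {lang: build_counts(text) for lang, text in training.items()}

# Round-trip through pickle, mirroring the hand-off from program1 to program2.
models = pickle.loads(pickle.dumps(models))

# Shared vocabulary size across all languages, used by the smoothing term.
vocab_size = len({tok for uni, _ in models.values() for tok in uni})

test_lines = ["the cat sat on the rug", "le chien est sur le tapis"]
predictions = [most_likely_language(line, models, vocab_size) for line in test_lines]
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on long lines.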

Sentence Parsing

Assignment to understand concepts related to sentence syntax, understand the three types of sentence parses (PSG, dependency, and SRL), and be able to use syntax parsers.

The document for this assignment can be found here.

Web Crawler

Assignment to understand the importance of corpora in NLP, understand how to extract information from websites using HTML, understand how websites work, and be able to do web scraping using Beautiful Soup.

The report for this assignment can be found here.

The code for this project includes the following Python files:

  1. 1_link_scraper.py
  2. 2_text_scraper.py
  3. 3_clean_text.py
  4. 4_extract_important_terms.py
  5. 5_create_knowledge_base.py

Instructions on how to run the Web Crawler locally can be found here.
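As a rough illustration of what the first two steps (link scraping and text scraping) involve, here is a minimal sketch. It deliberately uses Python's standard-library html.parser in place of Beautiful Soup so that it runs without installing anything. The class name, function name, and sample HTML are hypothetical and are not taken from the project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collect absolute link URLs and visible text from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def scrape(html, base_url):
    """Return (links, text) extracted from an HTML string."""
    parser = LinkAndTextParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)


# Hypothetical page content used only for this example.
sample_html = """
<html><head><style>p {color: red}</style></head>
<body><p>Corpora matter in <a href="/nlp">NLP</a>.</p>
<a href="https://example.com/about">About</a></body></html>
"""
links, text = scrape(sample_html, "https://example.com/index.html")
```

A real crawler would download each page first and feed the response body to scrape(). The later steps (cleaning, term extraction, knowledge base) would then work on the returned text.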

Text Classification 1

Assignment Goals:

  • Gain experience with sklearn using text data
  • Gain experience with text classification

What I did:

  • Used this dataset to train models to classify a job posting as real or fraudulent
  • Tried Naive Bayes, Logistic Regression, and Neural Networks using sklearn
  • Wrote up my analysis on the various approaches in the notebook

ACL Paper Summary

Assignment Goals:

  • Get familiar with reading ACL Papers
  • Learn how to stay updated with the rapidly evolving field of NLP

What I did:

  • Read the following paper
  • Wrote up an analysis on the authors' findings and contributions

The paper analysis can be found here.

Text Classification 2

Assignment Goals:

  • Gain experience with Keras
  • Gain experience with text classification
  • Gain experience with deep learning model variations and embeddings

What I did:

  • Experimented with several approaches to train a classifier model
  • Tried a basic sequential model
  • Trained a Recurrent Neural Network (RNN)
  • Trained a few variations on a Convolutional Neural Network (CNN) with different embedding techniques
  • Evaluated and analysed the results, comparing the performance of each model

The notebook containing the code can be found here.

ChatBot

Assignment Goals:

  • Create a chatbot using NLP techniques learned in class
  • The chatbot should be able to carry on a limited conversation in a particular domain using a knowledge base or knowledge from the web, and knowledge it learns from the user

What I did:

  • Extracted information on over 12K books from this dataset
  • Used Weaviate to store the text information about the books as vectors in a vector database
  • Created a chatbot integrated with OpenAI's chat API to gather user preferences
  • Used the information gathered about the user to query the vector database and retrieve books that the user might like

The code and report can be found here.
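The retrieval step, matching a user's preferences against stored book vectors, can be illustrated with a small self-contained sketch. It does not call Weaviate or the OpenAI API. The three-dimensional vectors, book titles, and function names below are hypothetical stand-ins, and cosine similarity is assumed as the distance measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query_vector, books, k=2):
    """Return the titles of the k books most similar to the query vector."""
    ranked = sorted(
        books,
        key=lambda book: cosine_similarity(query_vector, book["vector"]),
        reverse=True,
    )
    return [book["title"] for book in ranked[:k]]


# Hypothetical toy "embeddings" with axes (fantasy, romance, science).
# A real system would store high-dimensional vectors from an embedding model.
books = [
    {"title": "Dragon Saga", "vector": [0.9, 0.1, 0.0]},
    {"title": "Love in Paris", "vector": [0.1, 0.9, 0.0]},
    {"title": "Cosmos Explained", "vector": [0.0, 0.1, 0.9]},
    {"title": "Starship Wizards", "vector": [0.6, 0.0, 0.6]},
]

# Hypothetical vector summarizing what the user said they enjoy.
user_preference = [0.8, 0.0, 0.3]
top_titles = recommend(user_preference, books, k=2)
```

A vector database performs the same kind of nearest-neighbor search, but with indexing that scales far beyond a linear scan over a list.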
