shoibolina/NYTimes-mining

NYTimes Mining Portfolio

A lightweight, end-to-end data mining pipeline that fetches news articles from the NYTimes Article Search API, preprocesses the text in Python (Pandas, NLTK, spaCy), uncovers latent topics with NMF, groups similar articles with KMeans clustering, and visualizes the results as word clouds.

Features

  • NYTimes API Integration: Retrieve news articles with custom queries and date ranges.
  • Secure API Key Management: Uses a .env file to keep your API key out of the code and the repository.
  • Text Preprocessing: Lowercases, cleans, and removes stopwords from article text.
  • NER & Topic Modeling: Extracts named entities and discovers latent topics.
  • Clustering: Groups similar articles for deeper analysis.
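As a rough illustration of the text preprocessing feature listed above, the sketch below lowercases a string, strips non-letter characters, and drops stopwords. It uses only the standard library so it runs anywhere; the tiny `STOPWORDS` set and the `preprocess` function name are assumptions for the example, and the notebook presumably relies on NLTK's much larger stopword list instead.

```python
import re

# A deliberately tiny, hypothetical stopword set; NLTK's English list is far longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "are", "for", "with"}


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase text, replace non-letters with spaces, and remove stopwords."""
    text = text.lower()
    # Keep only ASCII letters and whitespace; digits and punctuation become spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    return " ".join(token for token in tokens if token not in stopwords)


print(preprocess("The Senate voted on a new climate bill in 2024!"))
# → senate voted new climate bill
```

Stripping everything but ASCII letters is a blunt choice that also discards accented characters and numbers, so a real pipeline may want a gentler cleaning rule.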

Language & Libraries

Python Pandas NLTK spaCy scikit-learn WordCloud Matplotlib Seaborn python-dotenv

How to use this repository

Step 1: Clone this repository

Run the command below in your terminal:

git clone https://github.com/shoibolina/NYTimes-mining.git

Step 2: Obtain an API key and set up environment variables

From the project directory, create a .env file in the terminal to store the API key securely.

touch .env

Create a New York Times developer account and request an Article Search API key. Open the .env file you just created and add your API key as follows:

NYTIMES_API_KEY=your_nytimes_api_key

The .env file is listed in .gitignore, so the API key is never committed or shared.
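Once the key is in place, reading it and building a request might look like the sketch below. It assumes the key has already been loaded into the process environment (in the notebook this would typically be done by calling python-dotenv's `load_dotenv()` first). The helper names `get_api_key` and `build_search_url` are hypothetical; the endpoint and the `q`, `begin_date`, `end_date`, `page`, and `api-key` parameters follow the public Article Search API.

```python
import os
from urllib.parse import urlencode

ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def get_api_key():
    """Read NYTIMES_API_KEY from the environment, failing loudly if it is missing."""
    key = os.environ.get("NYTIMES_API_KEY")
    if not key:
        raise RuntimeError("NYTIMES_API_KEY is not set; check your .env file")
    return key


def build_search_url(query, begin_date, end_date, api_key, page=0):
    """Build an Article Search URL; dates use the YYYYMMDD format."""
    params = {
        "q": query,
        "begin_date": begin_date,
        "end_date": end_date,
        "page": page,
        "api-key": api_key,
    }
    return f"{ARTICLE_SEARCH_URL}?{urlencode(params)}"


# Example with a placeholder key; in practice pass get_api_key() instead.
url = build_search_url("climate change", "20240101", "20240131", "demo-key")
print(url)
```

Keeping URL construction separate from key lookup makes it easy to check the query and date range before sending any request.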


Step 3: Install libraries

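Install the Python libraries listed under Language & Libraries, for example with pip:

pip install pandas nltk spacy scikit-learn wordcloud matplotlib seaborn python-dotenv

NLTK and spaCy may also need their language data (such as a stopword list or a spaCy model) downloaded separately; check the notebook for which resources it loads.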


Step 4: Run the notebook

Open the notebook and follow its comments cell by cell. Have fun exploring!

Author

Shoibolina Kaushik
Master of Science, Computer Science (25G)
Emory University
