This project uses state-of-the-art NLP techniques for news summarization. Additionally, a Streamlit app is provided for easy interaction with the models.
To get started, install the required dependencies:

```bash
pip3 install -r requirements.txt
```

To run the Streamlit app:
- Navigate to the application directory:

  ```bash
  cd Code/app
  ```

- Download the fine-tuned models:

  ```bash
  wget https://storage.googleapis.com/nlp-grp-2-bucket/best_model_Multi_News_final.pt
  wget https://storage.googleapis.com/nlp-grp-2-bucket/bart-large-xsum-cnn_daily_final.zip
  unzip bart-large-xsum-cnn_daily_final.zip
  ```

- Update the News API key in `Code/app/utils.py` on line 7. You can get a free API key from newsapi.org.
- Start the Streamlit server:

  ```bash
  python3 -m streamlit run news_summarization_app.py --server.port=8888
  ```

To fine-tune the BART model on the CNN/Daily Mail dataset:
```bash
python3 Code/bart_xsum_fine_tuning.py
```

To use the `facebook/bart-large-xsum` model trained on the Multi News dataset, follow these steps:
- Run the `NLP_Project_Train_Multi_News_Final.py` script. It produces two artifacts:
  - `multi_label_test.csv`: the test set used for evaluation.
  - `best_model_Multi_News_final.pt`: the best-performing model trained on the Multi News dataset.
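The training script presumably keeps whichever epoch scores best on a validation split; a minimal sketch of that selection logic (the function name and loss list are illustrative, not the script's actual API):

```python
def best_epoch(val_losses):
    """Return the index of the epoch with the lowest validation loss.

    A training loop would call torch.save(model.state_dict(),
    "best_model_Multi_News_final.pt") whenever this epoch improves.
    """
    return min(range(len(val_losses)), key=val_losses.__getitem__)
```

For example, `best_epoch([2.3, 1.8, 1.9])` returns `1`: epoch 1's weights would be the ones checkpointed.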
To test the BART model trained on the Multi News dataset:
- Run the `NLP_Project_Test_Multi_News_Final.py` script.
- Use `multi_label_test.csv` as the test set.
- Validate the model's performance using `best_model_Multi_News_final.pt`.
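The test script presumably reports summarization quality with ROUGE; as a reference point, ROUGE-1 F1 reduces to unigram overlap between the generated and reference summaries (this helper is for intuition only — the actual script may use a library such as `rouge-score`):

```python
from collections import Counter

def rouge1_f(candidate, reference):
    """Unigram-overlap ROUGE-1 F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())   # shared unigrams, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

An identical candidate and reference score 1.0; summaries with no words in common score 0.0.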
To generate text using the BART model trained on the Multi News dataset:
- Run the `Generating_Text_Multi_News_Final.py` script.
- Use `best_model_Multi_News_final.pt` to generate the summaries.
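Multi News documents concatenate several source articles separated by a `"|||||"` token, and BART's encoder accepts at most 1024 tokens, so generation code typically splits and truncates the input first. A rough sketch (word counts only approximate the tokenizer's output, and the actual script may handle this differently):

```python
SEP = "|||||"  # Multi News joins its source articles with this delimiter

def split_sources(document):
    """Split a Multi News document into its individual source articles."""
    return [part.strip() for part in document.split(SEP) if part.strip()]

def truncate_words(text, max_words=900):
    """Crude length guard: stay well under BART's 1024-token encoder limit."""
    return " ".join(text.split()[:max_words])
```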
To generate a news summary using LLaMA 3, open `llama.ipynb` in a Jupyter notebook and update `access_token` with your Hugging Face access token, which you can create at huggingface.co/settings/tokens.
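Rather than pasting the token directly into the notebook, you can read it from an environment variable; a small sketch (the variable name `HF_TOKEN` is a convention chosen here, not something the notebook requires):

```python
import os

def get_hf_token(env_var="HF_TOKEN"):
    """Fetch a Hugging Face access token from the environment.

    Create a token at huggingface.co/settings/tokens, then
    `export HF_TOKEN=hf_...` before launching Jupyter.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to your Hugging Face access token")
    return token
```

This keeps the secret out of the notebook file, so it is not committed or shared by accident.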