QuestionAnswering

This repository contains the code for the Final Project of the Natural Language Processing course in the Artificial Intelligence master's degree at UniBo.

The goal of the project is to build an NLP system for Question Answering on the SQuAD dataset. The project was later extended to Open Domain Question Answering with an additional Dense Passage Retriever (DPR) module, as a 3 CFU Project Work for the same course.

Quick start

Clone the repository, create a virtual environment and install the requirements provided in requirements.txt.

python3 -m venv .env # or: conda create -n NLP python=3

Then, once the environment is active:

python3 -m pip install -r requirements.txt

Our normal model's weights can be downloaded from here, while the BERT model's weights can be downloaded from here. They must be placed in src/checkpoints. The DPR module's weights can be downloaded here and must be placed in src/checkpoints/training_dpr.

Another required step is to download spaCy's English language model:

python3 -m spacy download en_core_web_sm

Then, the model can be evaluated on a test dataset using python3 compute_answers.py *PATH_TO_TEST_JSON_FILE*.
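
Before running the evaluation, it can be useful to check that the test file parses and has the expected shape. The following is a minimal sketch, assuming the JSON files follow the standard SQuAD v1.1 layout (a top-level "data" list of articles, each with "paragraphs" that hold a "context" and a "qas" list). The helper name count_squad_examples is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def count_squad_examples(path):
    """Return (n_paragraphs, n_questions) for a SQuAD-style JSON file.

    Assumes the standard SQuAD v1.1 layout; this is an illustrative
    helper, not code taken from the repository.
    """
    with Path(path).open(encoding="utf-8") as f:
        dataset = json.load(f)

    n_paragraphs = 0
    n_questions = 0
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            n_paragraphs += 1
            n_questions += len(paragraph["qas"])
    return n_paragraphs, n_questions
```

For example, count_squad_examples("data/dev_set.json") would report the number of paragraphs and questions in the test split, provided the file sits at that path relative to the working directory.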

Organization of the repository

TaskExplanation.pdf contains the explanation of the task
data contains the JSON files of the training (training_set.json), validation (validation_set.json) and test (dev_set.json) sets, as well as some intermediate files for analysis.
src contains the code of our tests and experiments.
- Final Project (QA)
  - config.py and utils.py contain utility code that is used throughout all other files
  - checkpoints should contain the weights of the model
  - baselines.ipynb is a notebook containing the implementation of the baselines described in the report
  - data_analysis.ipynb contains an analysis of the dataset
  - error_analysis.ipynb contains an analysis of the mistakes that the model makes with respect to the ground truth
  - train.ipynb is a notebook containing all of the training experiments we conducted
  - evaluation_tests.ipynb contains the evaluations whose results we presented in the report
- 3 CFU Project Work (Open Domain QA)
  - tf_idf_retrieval_baseline.ipynb contains the implementation and analysis of a simple baseline for the OpenQA task, as explained in the report
  - dense_passage_retriever.ipynb contains the definition and training of the DPR
  - mixing_evaluations.ipynb contains the evaluations for the full OpenQA task using our different mixing methods
  - DPR_and_other_methods_analysis_over_k_and_epochs.ipynb is an analysis of the performance of the DPR and its mixing methods over the training epochs.
  - create_exact_over_retrieval_accuracy_graph.ipynb was used to obtain a graph included in the report.
report_final_project.pdf contains the report for the final project (Question Answering with DistilBert).
report_project_work.pdf contains the report for the project work (Open Domain QA with Sparse and Dense Representations).
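
As a rough illustration of the sparse retrieval idea behind the TF-IDF baseline listed above, the sketch below ranks passages by cosine similarity between TF-IDF vectors. It is not the implementation from tf_idf_retrieval_baseline.ipynb: the function names, the whitespace tokenizer and the smoothed IDF formula are all assumptions made for this example, and the notebook may well rely on a library and different weighting.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption of this sketch;
    # punctuation is not stripped).
    return text.lower().split()


def vectorize(tokens, idf):
    """Build an L2-normalised TF-IDF vector, ignoring out-of-vocabulary terms."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(passages):
    """Compute IDF weights and TF-IDF vectors for a list of passages."""
    tokenized = [tokenize(p) for p in passages]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(passages)
    # Smoothed IDF, so terms that occur in every passage keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [vectorize(tokens, idf) for tokens in tokenized]
    return idf, vectors


def retrieve(question, idf, vectors, k=1):
    """Return the indices of the k passages most similar to the question."""
    q = vectorize(tokenize(question), idf)
    # Vectors are unit length, so the dot product is the cosine similarity.
    scores = [sum(w * vec.get(t, 0.0) for t, w in q.items()) for vec in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

In an Open Domain QA pipeline, the passages returned by retrieve would then be handed to the reader model, which extracts the answer span from them.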

MarcelloCeresini/QuestionAnswering
