- Retrieving documents by scanning every document for the query terms is slow and does not scale. Information retrieval systems also usually need to rank results by their relevance to the query.
- Implemented a responsive information retrieval web app (responses in under 2 seconds) using PostgreSQL and Django.
- Represented files and queries as vectors using a term-document matrix (TDM).
- Retrieved matching documents using a dictionary data structure and displayed the result files in order of relevance.
- Designed a database schema in PostgreSQL using Django models.
- Configured the database connection and populated the database with text files.
- Pre-processed and cleaned the text using the Natural Language Toolkit (NLTK).
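For the NLTK pre-processing step, here is a dependency-free sketch of typical cleaning steps (lowercasing, tokenizing, stop-word removal). The regex tokenizer and the small `STOP_WORDS` set are assumptions standing in for NLTK's tokenizers and its stopwords corpus; they are not the project's actual code.

```python
import re

# Hypothetical, deliberately small stop-word list (an assumption);
# NLTK's stopwords corpus is far larger.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on"}

# Keep only runs of lowercase letters and digits; punctuation is dropped.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def preprocess(text: str) -> list[str]:
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The Retrieval of Documents, in 2 seconds!"))
```

In an NLTK-based version, a stemmer such as `nltk.stem.PorterStemmer` could additionally be applied to each token.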
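The TDM and dictionary-based retrieval steps can be sketched as follows. This assumes raw term counts as matrix weights and cosine similarity as the relevance score; the project's actual weighting and scoring are not stated, and `TDMIndex` is a hypothetical name. Storing the TDM sparsely as a dictionary of term → {document: count} means a query only visits documents that contain at least one of its terms.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class TDMIndex:
    """Sparse term-document matrix: term -> {doc_id: raw term count}.

    Assumption: raw counts as weights, cosine similarity as relevance.
    """

    def __init__(self, docs: dict[str, str]) -> None:
        self.tdm: dict[str, dict[str, int]] = defaultdict(dict)
        for doc_id, text in docs.items():
            for term, count in Counter(tokenize(text)).items():
                self.tdm[term][doc_id] = count
        # Euclidean length of each document's column vector, computed once.
        squares: dict[str, int] = defaultdict(int)
        for postings in self.tdm.values():
            for doc_id, count in postings.items():
                squares[doc_id] += count * count
        self.norms = {doc_id: math.sqrt(s) for doc_id, s in squares.items()}

    def search(self, query: str) -> list[tuple[str, float]]:
        """Return (doc_id, cosine similarity) pairs, most relevant first."""
        q_vec = Counter(tokenize(query))
        dots: dict[str, int] = defaultdict(int)
        # Dictionary lookups: only documents sharing a query term are visited.
        for term, q_count in q_vec.items():
            for doc_id, d_count in self.tdm.get(term, {}).items():
                dots[doc_id] += q_count * d_count
        q_norm = math.sqrt(sum(c * c for c in q_vec.values()))
        scores = [(d, dot / (self.norms[d] * q_norm)) for d, dot in dots.items()]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    index = TDMIndex({
        "a.txt": "postgres stores text",
        "b.txt": "django queries postgres postgres",
        "c.txt": "nltk cleans text",
    })
    # a.txt ranks first, then b.txt, then c.txt.
    print(index.search("postgres text"))
```

With raw counts, frequent words dominate the scores; weighting the matrix with TF-IDF is a common refinement.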
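A hypothetical `models.py` fragment showing what a Django schema for the stored text files might look like. The `Document` model and its field names are assumptions, not the project's actual schema. It must live inside a Django app listed in `INSTALLED_APPS`; `python manage.py makemigrations` followed by `python manage.py migrate` then creates the corresponding PostgreSQL table.

```python
from django.db import models


class Document(models.Model):
    """One text file from the dataset (hypothetical schema)."""

    # File name, e.g. "doc1.txt"; unique so each file is stored only once.
    name = models.CharField(max_length=255, unique=True)
    # Raw file contents; TextField maps to PostgreSQL's unbounded "text" type.
    content = models.TextField()

    def __str__(self) -> str:
        return self.name
```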
View the Jupyter Notebook here.
Step-1: Virtual Environment Setup
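- Create a `pipenv` virtual environment first (see the pipenv documentation).
- Install the dependencies from the `Pipfile`:
  `pipenv install`
- To install the exact versions pinned in `Pipfile.lock`, run:
  `pipenv install --ignore-pipfile`
- Run the app from inside the environment (for example, after `pipenv shell`):
  `python manage.py runserver`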
Step-2: PostgreSQL Database and Data Insertion
- The text datasets are in the `dataset` folder.
- First, set up a PostgreSQL database locally (see the PostgreSQL documentation).
- Insert the text data into the database using the script `push_to_postgres.py`.
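A sketch of the file-reading half of a population script, assuming the `dataset` folder holds plain `.txt` files. `load_documents` is a hypothetical helper, and the real `push_to_postgres.py` may work differently. The resulting `(name, content)` rows could then be inserted with, for example, psycopg2's `cursor.executemany` or Django's `bulk_create`.

```python
from pathlib import Path


def load_documents(folder: str) -> list[tuple[str, str]]:
    """Return (file name, text) pairs for every .txt file in `folder`, sorted by name.

    Hypothetical helper; the real push_to_postgres.py may differ.
    """
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps one badly encoded file from aborting the load.
        rows.append((path.name, path.read_text(encoding="utf-8", errors="replace")))
    return rows


if __name__ == "__main__":
    for name, text in load_documents("dataset"):
        print(name, len(text))
```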
I plan to Dockerize this soon.