bektade/InformationRetrival-WebApp

Information Retrieval WebApp (PostgreSQL + Django)

Problem:

  • Retrieving documents by scanning every document for the query terms is slow and does not scale. Information retrieval systems also usually need to order query results by their relevance to the query.

Accomplishments:

  • Implemented a responsive information retrieval WebApp (< 2 sec) using PostgreSQL + Django.
  • Vector representation of files and query results using a term-document matrix (TDM).
  • Retrieval of documents using a dictionary data structure, with search result files displayed in order of relevance.
  • Created a database schema in PostgreSQL using Django models.
  • Database connection, configuration, and population with text files.
  • Text pre-processing and cleaning using the Natural Language Toolkit (NLTK).
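
The term-document matrix and relevance ranking listed above can be sketched in plain Python. This is a minimal illustration and not the code from this repository: the toy corpus, the stop-word list, and the function names `tokenize`, `build_tdm`, and `rank` are all assumptions made for the example, and cosine similarity over raw term counts is one common scoring choice rather than a confirmed detail of the app. The regex tokenizer stands in for the NLTK cleaning step.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; the real app reads text files from the dataset folder.
DOCS = {
    "doc1.txt": "PostgreSQL stores the documents and the term counts.",
    "doc2.txt": "Django renders the search results for each query.",
    "doc3.txt": "The query terms are matched against the term counts of the documents.",
}

# Tiny stand-in stop-word list (assumption; the README uses NLTK for cleaning).
STOP_WORDS = {"the", "and", "for", "of", "are", "each", "against"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_tdm(docs):
    """Return {term: {doc_name: count}}, a sparse term-document matrix as a dictionary."""
    tdm = {}
    for name, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            tdm.setdefault(term, {})[name] = count
    return tdm


def rank(query, tdm, docs):
    """Rank documents by cosine similarity between query and document count vectors."""
    q_vec = Counter(tokenize(query))

    # Squared length of every document vector, accumulated from the matrix.
    doc_norms = {name: 0.0 for name in docs}
    for postings in tdm.values():
        for name, count in postings.items():
            doc_norms[name] += count * count

    # Dot products only touch documents that share at least one query term.
    dots = Counter()
    for term, q_count in q_vec.items():
        for name, count in tdm.get(term, {}).items():
            dots[name] += q_count * count

    q_norm = math.sqrt(sum(c * c for c in q_vec.values()))
    scores = {
        name: dot / (q_norm * math.sqrt(doc_norms[name]))
        for name, dot in dots.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tdm = build_tdm(DOCS)
results = rank("query terms", tdm, DOCS)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because the matrix is keyed by term, a query only visits the documents that contain its terms instead of scanning every file, which is the scalability problem described at the top of this README.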

View the Jupyter Notebook here.

How to run the app

Step-1: Virtual Environment Setup

  • First create a pipenv virtual environment (see the pipenv documentation).

  • Install the dependencies from the Pipfile:

    pipenv install
    
  • To install the exact versions pinned in Pipfile.lock, run:

    pipenv install --ignore-pipfile
    
  • Run the app:

    python manage.py runserver
    

Step-2: PostgreSQL database setup and data insertion

  • The text datasets are available inside the dataset folder.

  • First set up a PostgreSQL database locally (see the PostgreSQL documentation).

  • Insert the text data into the database using the script push_to_postgres.py.

    Screenshot 2024-01-29 at 3 55 38 PM

Promise!

I will dockerize this soon

Video

ir_video.mp4

System Architecture

Screenshot 2024-01-29 at 3 55 10 PM

Term-Document Matrix

Screenshot 2024-01-29 at 3 49 27 PM

Query processing

newplot (4)

Zipf's Law

newplot (1)
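
Zipf's law says that a term's frequency in a corpus is roughly inversely proportional to its frequency rank, so rank × frequency stays roughly constant for the most common terms. A minimal sketch of how the rank-frequency table behind such a plot could be computed, assuming a simple regex tokenizer; the function name `rank_frequency` and the sample sentence are hypothetical, and a sentence this short only shows the mechanics, not the law itself:

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, term, frequency) triples sorted by descending frequency."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # Sort by frequency (highest first); break ties alphabetically so the
    # output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, term, freq) for rank, (term, freq) in enumerate(ordered, start=1)]


# Hypothetical sample text; a real check would use the full dataset.
sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
for rank, term, freq in table:
    print(rank, term, freq, rank * freq)
```

Plotting rank against frequency on log-log axes for a large corpus should give an approximately straight line if the law holds.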

Web App's UI

Screenshot 2024-01-29 at 3 55 38 PM


About

Term Document Frequency based Information retrieval WebApp with Django + PostgreSQL
