Information Retrieval System – Inverted Index & TF-IDF Search

This project implements a lightweight Information Retrieval (IR) pipeline using Node.js, Express, and MongoDB. It was originally created as an academic / research project to explore concepts like document preprocessing, inverted index construction, TF-IDF computation, query ranking, and search with cosine-based scoring.

The system loads raw text documents, preprocesses them, builds an inverted index, and provides a search endpoint that returns relevant documents ranked by similarity score.

📁 Project Structure

/routes
    documents.js – Save documents, build inverted index, search
/models
    Document.js – Mongoose schema for processed docs
    FichierInverse.js – Schema for inverted index entries
/utils
    saveDocumentsFromFolder.js
    query.js – TF-IDF search + scoring
    creationfichierinverse.js
/Collection_TIME – Folder containing raw text files

🔍 Features

1. Document Loader

Reads .txt files from /Collection_TIME
Preprocesses text
Computes term frequencies
Saves documents into MongoDB
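
The preprocessing and term-frequency steps can be sketched roughly as follows. This is a minimal illustration, not the code in saveDocumentsFromFolder.js: the function names are hypothetical, the tokenizer (lowercase, ASCII letters and digits only) is an assumption, and `indexdoc` is assumed to be an array of `{ index, frequency }` pairs where `index` holds the term. Reading the files and saving to MongoDB are left out.

```javascript
// Hypothetical helpers; the repository's actual preprocessing may differ.

// Lowercase the text and split it into runs of ASCII letters and digits.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Count how often each token occurs in a single document.
// A Map is used so that tokens such as "constructor" cannot collide
// with properties inherited from Object.prototype.
function termFrequencies(tokens) {
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

// Build a plain object shaped like the Document model in this README
// (assumed layout: indexdoc is a list of { index, frequency } pairs).
function buildDocumentRecord(fileName, content) {
  const counts = termFrequencies(tokenize(content));
  const indexdoc = [...counts].map(([index, frequency]) => ({ index, frequency }));
  return { fileName, content, indexdoc };
}
```

Calling `buildDocumentRecord('a.txt', 'The cat saw the CAT.')` yields three `indexdoc` entries, with "the" and "cat" each counted twice.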

2. Inverted Index Builder

Creates TF-IDF-ready inverted index
Stores posting lists and document term frequencies
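
As a rough sketch of this step, the function below turns per-document term counts into index entries with the fields listed under the Inverted Index Model. The name `buildInvertedIndex` and the input layout (`indexdoc` as a list of `{ index, frequency }` pairs, each term listed once per document) are assumptions made for illustration; the logic in creationfichierinverse.js may be organized differently.

```javascript
// Hypothetical sketch: build one entry per term from document records
// shaped like { fileName, indexdoc: [{ index, frequency }] }.
function buildInvertedIndex(documents) {
  const entries = new Map();
  for (const doc of documents) {
    for (const { index: terme, frequency } of doc.indexdoc) {
      if (!entries.has(terme)) {
        entries.set(terme, { terme, nb_doc: 0, posting: [] });
      }
      const entry = entries.get(terme);
      entry.nb_doc += 1; // one more document contains this term
      entry.posting.push({ fileName: doc.fileName, frequency });
    }
  }
  return [...entries.values()];
}
```

Storing `nb_doc` alongside the posting list means the inverse document frequency can be derived at query time without rescanning the documents.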

3. TF-IDF Search Engine

Tokenizes user query
Computes TF-IDF score
Ranks documents by similarity
Returns most relevant content
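
One way to picture the ranking step is the sketch below. It is an assumption-laden illustration rather than the code in query.js: it weights each term as tf × ln(N / nb_doc), scores documents by cosine similarity between the query vector and each document vector, and uses the hypothetical name `rankDocuments`. Looking up each document's `content` for the response is omitted.

```javascript
// Hypothetical sketch of TF-IDF ranking with cosine similarity.
// invertedIndex: [{ terme, nb_doc, posting: [{ fileName, frequency }] }]
// totalDocs: number of documents in the collection (N).
function rankDocuments(query, invertedIndex, totalDocs) {
  const idf = (entry) => Math.log(totalDocs / entry.nb_doc);

  // Squared length of every document vector, over all indexed terms.
  const docNormSq = new Map();
  for (const entry of invertedIndex) {
    for (const { fileName, frequency } of entry.posting) {
      const weight = frequency * idf(entry);
      docNormSq.set(fileName, (docNormSq.get(fileName) || 0) + weight * weight);
    }
  }

  // Term frequencies of the query, tokenized like the documents (assumed).
  const queryTf = new Map();
  for (const token of query.toLowerCase().match(/[a-z0-9]+/g) || []) {
    queryTf.set(token, (queryTf.get(token) || 0) + 1);
  }

  // Accumulate dot products by walking only the query terms' postings.
  const byTerm = new Map(invertedIndex.map((entry) => [entry.terme, entry]));
  const dot = new Map();
  let queryNormSq = 0;
  for (const [token, tf] of queryTf) {
    const entry = byTerm.get(token);
    if (!entry) continue; // term not in the collection
    const queryWeight = tf * idf(entry);
    queryNormSq += queryWeight * queryWeight;
    for (const { fileName, frequency } of entry.posting) {
      const docWeight = frequency * idf(entry);
      dot.set(fileName, (dot.get(fileName) || 0) + queryWeight * docWeight);
    }
  }

  // Normalize to cosine similarity and sort by descending score.
  return [...dot]
    .map(([fileName, product]) => {
      const denom = Math.sqrt(queryNormSq * docNormSq.get(fileName));
      return { fileName, score: denom === 0 ? 0 : product / denom };
    })
    .sort((a, b) => b.score - a.score);
}
```

With this weighting, a term that occurs in every document gets an IDF of zero and contributes nothing to the score, which is the usual behavior of plain TF-IDF.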

📡 API Endpoints

POST /save

Loads and stores all documents.

POST /build-inverted-index

Creates the TF-IDF inverted index.

POST /search

Request: { "request": "your keywords" }

Response: { "results": [ { "fileName": "...", "content": "...", "score": 0.87 } ] }
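
For illustration, a client could call the search endpoint along these lines. The helper name `searchDocuments`, the `baseUrl` and `fetchImpl` parameters, and the error handling are assumptions, as is the idea that the route is mounted at the server root; only the request and response shapes come from this README. The global `fetch` used as the default requires Node.js 18 or later.

```javascript
// Hypothetical client helper for POST /search.
// fetchImpl is injectable so the helper can be exercised without a server.
async function searchDocuments(baseUrl, keywords, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(`${baseUrl}/search`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ request: keywords }),
  });
  if (!response.ok) {
    throw new Error(`Search failed with HTTP status ${response.status}`);
  }
  const { results } = await response.json();
  return results; // [{ fileName, content, score }, ...]
}
```

A call such as `searchDocuments('http://localhost:3000', 'your keywords')` would then resolve to the `results` array, assuming the server listens on port 3000 (the port is not stated in this README).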

🗄️ Data Models

Document Model

fileName
content
indexdoc.index
indexdoc.frequency

Inverted Index Model

terme
nb_doc
posting[fileName, frequency]
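
To make the two shapes concrete, here is a pair of sample records as plain objects. Only the field names come from this README; the values, the nesting of `indexdoc` and `posting` as arrays of objects, and the helper `postingMatchesDocument` are hypothetical.

```javascript
// Hypothetical sample values; only the field names come from this README.
const sampleDocument = {
  fileName: 'doc1.txt',
  content: 'the cat saw the cat',
  indexdoc: [
    { index: 'the', frequency: 2 },
    { index: 'cat', frequency: 2 },
    { index: 'saw', frequency: 1 },
  ],
};

const sampleIndexEntry = {
  terme: 'cat', // the indexed term
  nb_doc: 1, // assumed to be the number of documents containing the term
  posting: [{ fileName: 'doc1.txt', frequency: 2 }],
};

// Check that an index entry agrees with a document's own term counts.
function postingMatchesDocument(entry, doc) {
  const posting = entry.posting.find((p) => p.fileName === doc.fileName);
  const term = doc.indexdoc.find((t) => t.index === entry.terme);
  if (!posting || !term) return !posting && !term; // both absent is consistent
  return posting.frequency === term.frequency;
}
```

The same frequency appears in both records, so a check like `postingMatchesDocument` can flag an index that has drifted out of sync with the stored documents.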

⚙️ Technologies

Node.js / Express
MongoDB / Mongoose
TF-IDF scoring
Basic IR engine design

🚀 How to Run

npm install
npm start

Workflow:

1. Add documents to /Collection_TIME
2. POST /save
3. POST /build-inverted-index
4. POST /search
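
The order of the three calls matters, since the index is built from the saved documents and the search reads the index. A small driver could enforce that order as sketched below. The name `runWorkflow` and the `post(path, body)` callback are hypothetical, and sending an empty JSON body to the first two endpoints is an assumption, since this README does not describe their request bodies.

```javascript
// Hypothetical driver: run the three endpoints strictly in sequence.
// post(path, body) is any function that sends a POST request and
// returns a promise of the parsed JSON response.
async function runWorkflow(post, keywords) {
  await post('/save', {}); // load and store the documents first
  await post('/build-inverted-index', {}); // then index what was stored
  return post('/search', { request: keywords }); // finally query the index
}
```

Each `await` ensures the previous step has finished before the next request is sent.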

📌 Future Improvements

Add stemming & stop words
Improve cosine similarity
Add pagination & highlighting
Add frontend viewer

