A full-stack search engine with multi-threaded web crawling, TF-IDF ranking, and real-time search capabilities. Built with Spring Boot, React, and PostgreSQL, this project demonstrates modern search algorithms, efficient document indexing techniques, and a responsive user interface.
- Multi-threaded Web Crawler: Efficiently crawls websites with respect for robots.txt directives
- Text Processing Engine: Implements tokenization, stemming, and stop word removal
- Search API: Uses TF-IDF algorithms for accurate and fast search results
- PostgreSQL Database: Stores documents, indexes, and search metadata
- Modern React UI: Clean, responsive interface built with React 18
- Interactive Search: Real-time suggestions and highlighting
- Results Navigation: Pagination with relevance scoring
- Voice Search: Speech recognition for hands-free searching
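The TF-IDF ranking mentioned above can be sketched roughly as follows (a minimal illustration; the class and method names are assumptions for this sketch, not the project's actual Indexer/Search code):

```java
import java.util.List;

// Hypothetical sketch of TF-IDF scoring; all names here are illustrative.
class TfIdfScorer {

    // Term frequency: how often the term occurs relative to the document length.
    static double tf(List<String> docTokens, String term) {
        if (docTokens.isEmpty()) return 0.0;
        long count = docTokens.stream().filter(term::equals).count();
        return (double) count / docTokens.size();
    }

    // Inverse document frequency: terms that are rare across the corpus score higher.
    static double idf(List<List<String>> corpus, String term) {
        long docsWithTerm = corpus.stream().filter(d -> d.contains(term)).count();
        if (docsWithTerm == 0) return 0.0;
        return Math.log((double) corpus.size() / docsWithTerm);
    }

    // Combined TF-IDF weight of a term for one document in the corpus.
    static double tfIdf(List<String> docTokens, List<List<String>> corpus, String term) {
        return tf(docTokens, term) * idf(corpus, term);
    }
}
```

A document's relevance to a query is then the sum of TF-IDF weights of the query terms, which is what makes frequent-but-common words contribute less than rare, distinctive ones.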
- Backend: Java 21, Spring Boot 3.2.x, JPA/Hibernate, PostgreSQL
- Search Technology: Custom implementation with TF-IDF, PageRank, and vector space models
- Frontend: React 18, Styled Components, Axios
- DevOps: Docker, Docker Compose, GitHub Actions
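The PageRank component listed above can be illustrated with a basic power-iteration sketch (the damping factor, names, and dangling-page handling are assumptions, not the repository's implementation):

```java
import java.util.Arrays;

// Rough power-iteration PageRank; all names and parameters are illustrative.
class PageRank {

    // outLinks[i] holds the indices of the pages that page i links to.
    static double[] compute(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n); // random-jump contribution
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    // Dangling page: spread its rank evenly over all pages.
                    for (int j = 0; j < n; j++) next[j] += damping * rank[i] / n;
                } else {
                    for (int j : outLinks[i]) next[j] += damping * rank[i] / outLinks[i].length;
                }
            }
            rank = next;
        }
        return rank;
    }
}
```

In a hybrid scorer, a link-based score like this is typically blended with the content-based TF-IDF score to rank results.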
- Docker and Docker Compose
- Git
- Java 17+ (for local development only)
- Node.js 16+ (for local development only)
1. Clone the repository

   ```bash
   git clone https://github.com/yourusername/Search-Engine.git
   cd Search-Engine
   ```

2. Create the environment variables file

   ```bash
   cp .env.example .env
   # Edit .env with your preferred settings
   ```

3. Build and run with Docker Compose

   ```bash
   docker-compose up -d
   ```

4. Access the application

   - Frontend: http://localhost:3000
   - Backend API: http://localhost:8080
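For orientation, the compose setup assumes a `docker-compose.yml` along these lines (a hedged sketch only — service names, images, and credentials are illustrative, not the repository's actual file; the ports and database values mirror the ones used elsewhere in this README):

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_DB: searchengine
      POSTGRES_USER: searchuser
      POSTGRES_PASSWORD: password
  backend:
    build: ./searchengine
    ports:
      - "8080:8080"
    depends_on:
      - db
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    depends_on:
      - backend
```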
1. Configure the PostgreSQL database

   ```sql
   CREATE DATABASE searchengine;
   CREATE USER searchuser WITH PASSWORD 'password';
   GRANT ALL PRIVILEGES ON DATABASE searchengine TO searchuser;
   ```

2. Configure application properties

   ```bash
   cd searchengine
   cp src/main/resources/application.properties.example src/main/resources/application.properties
   # Edit application.properties with your database settings
   ```

3. Run the Spring Boot application

   ```bash
   ./mvnw spring-boot:run
   ```
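A minimal set of database properties might look like the following (values mirror the SQL above; the host, port, and credentials are assumptions to adjust for your environment):

```properties
spring.datasource.url=jdbc:postgresql://localhost:5432/searchengine
spring.datasource.username=searchuser
spring.datasource.password=password
spring.jpa.hibernate.ddl-auto=update
```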
1. Install dependencies

   ```bash
   cd frontend
   npm install
   ```

2. Configure the API endpoint

   ```bash
   cp .env.example .env
   # Edit .env to set REACT_APP_API_URL
   ```

3. Start the development server

   ```bash
   npm start
   ```
1. Start a new crawl with a specified number of threads:

   ```bash
   curl -X POST "http://localhost:8080/crawler?thread_num=16"
   ```

2. Monitor crawling progress:

   ```bash
   curl -X GET "http://localhost:8080/crawler/status"
   ```

   This returns JSON with crawling statistics, including:

   - Total pages crawled
   - Pages in queue
   - Crawling rate (pages/second)
   - Elapsed time
   - Estimated completion time
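An illustrative status response covering those fields might look like this (the field names and values are hypothetical, not the API's actual schema):

```json
{
  "totalPagesCrawled": 15000,
  "pagesInQueue": 3200,
  "crawlRatePagesPerSecond": 42.5,
  "elapsedSeconds": 353,
  "estimatedCompletionSeconds": 75
}
```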
3. Stop an active crawl:

   ```bash
   curl -X POST "http://localhost:8080/crawler/stop"
   ```
1. Trigger manual indexing of crawled documents:

   ```bash
   curl -X POST "http://localhost:8080/reindex"
   ```

2. Check indexing status:

   ```bash
   curl -X GET "http://localhost:8080/index-status"
   ```

   This endpoint provides information about:

   - Number of documents indexed
   - Current indexing progress
   - Index statistics (unique words, document count)
   - Estimated time remaining
Access the web interface at http://localhost:3000 and enter your search query.
The API endpoint is also available for direct integration:
```bash
curl -X GET "http://localhost:8080/search?q=your+search+query&page=0&size=10"
```

- Use quotes for exact phrases: `"artificial intelligence"`
- Use operators: `machine AND learning`, `python OR java`, `programming NOT javascript`
- Voice search: click the microphone icon and speak your query
- Filter by domain: `site:github.com python`
- Filter by date: `after:2023-01-01 before:2023-12-31 machine learning`
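The AND / OR / NOT operators above can be evaluated against a document's token set roughly like this (a hypothetical helper for single-operator queries, not the project's actual query parser):

```java
import java.util.Set;

// Illustrative evaluation of AND / OR / NOT search operators; names are assumptions.
class QueryOperators {

    static boolean matches(String query, Set<String> docTokens) {
        String[] parts = query.trim().split("\\s+");
        if (parts.length == 3) {
            String left = parts[0].toLowerCase();
            String right = parts[2].toLowerCase();
            switch (parts[1]) {
                case "AND": return docTokens.contains(left) && docTokens.contains(right);
                case "OR":  return docTokens.contains(left) || docTokens.contains(right);
                case "NOT": return docTokens.contains(left) && !docTokens.contains(right);
            }
        }
        // No recognized operator: require every term to be present.
        for (String p : parts) {
            if (!docTokens.contains(p.toLowerCase())) return false;
        }
        return true;
    }
}
```

A full engine would first apply a boolean filter like this to narrow candidates, then rank the surviving documents by relevance score.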
```
Search-Engine/
├── searchengine/              # Spring Boot backend
│   ├── src/main/java/
│   │   └── com/example/searchengine/
│   │       ├── Crawler/       # Web crawler components
│   │       ├── Indexer/       # Document indexing
│   │       └── Search/        # Search functionality
│   └── pom.xml
├── frontend/                  # React frontend
│   ├── src/
│   │   ├── components/        # UI components
│   │   └── App.js             # Main application
├── docker-compose.yml         # Docker configuration
└── README.md                  # Project documentation
```
- Database Indexing: Custom PostgreSQL indices for optimized searches
- Connection Pooling: HikariCP for efficient database connections
- Caching: In-memory caching of frequent searches
- Pagination: All results are paginated to improve performance
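The in-memory caching of frequent searches can be sketched with a small LRU cache built on `LinkedHashMap` (a hedged illustration; the project's actual cache implementation is not shown in this README):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch for frequent search results; names are illustrative.
class SearchCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    SearchCache(int capacity) {
        // accessOrder = true makes iteration order least-recently-accessed first,
        // which is exactly what LRU eviction needs.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once the capacity is exceeded.
        return size() > capacity;
    }
}
```

In practice the key would be the normalized query string and the value the serialized result page, so repeated popular queries skip the database entirely.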
- Docker container fails to start: check logs with `docker-compose logs backend` or `docker-compose logs frontend`
- Search results are not appearing: ensure you've started the crawler to populate the database with documents
- Frontend can't connect to backend: verify the API URL configuration in the frontend's `.env` file
- Database vacuum operation fails: this operation requires special permissions; if running locally, ensure your database user has the appropriate rights