Incepta Backend

A semantic search engine backend that enables intelligent searching across university technology databases and government grants using embeddings and LLM-powered explanations.

Features

Semantic Search: Advanced search capabilities using embeddings and cross-encoder reranking
Dual Search Modes:
- Patents search (Stanford University Technology Database)
- Grants search (Grants.gov and DOD SBIR/STTR)
Modern UI: Responsive web interface with gradient styling and intuitive search
LLM-Powered: Intelligent search powered by embeddings and GPT-4 explanations
Cross-Encoder Reranking: Enhanced result relevance using ms-marco-MiniLM-L-6-v2
About Page: Team information and mission statement

Project Structure

Incepta_backend/ |-- README.md # Project overview and setup instructions |-- requirements.txt # Python package dependencies |-- .env # Environment variables for API keys (TO BE ADDED) |-- config.py # Centralized configuration file (TO BE ADDED)

|-- deployment/ # Deployment-related configurations | |-- Dockerfile # Docker configuration (to be added) | |-- docker-compose.yml # Optional: Multi-container setup (to be added) | |-- gunicorn_config.py # Gunicorn config (to be added) | |-- nginx/ # NGINX configuration | |-- nginx.conf

Setup & Running

Install required packages:
- look at requirements.txt
Set up your API keys:
- Create a file for your Pinecone API key
- Create a file for your OpenAI API key
- Update the paths in search_service.py
Run the web application: cd Incepta_backend/main python app.py
Access the application:
- Open your browser and navigate to the URL displayed in the terminal
- Use the search bar to query patents or grants
- Toggle between Patents and Grants using the buttons
- Visit the About page to learn more about Incepta

Technical Details

Flask backend with async search capabilities
Modern UI with gradient styling and responsive design
Cross-encoder reranking for improved search relevance
GPT-4 powered result explanations
Pinecone vector database for efficient similarity search
NLTK for text processing
Sentence transformers for embedding generation

Data Sources

Stanford TechFinder database (scraped on 11/17/2024)
Grants.gov database (downloaded on 11/21/2024)
DOD SBIR/STTR database (scraped on 11/21/2024)
Embeddings stored in Pinecone indexes: stanford-techfinder-133-v1 and grants-2024-11-21

Development Notes

The application uses Flask for the backend server
Frontend built with modern HTML5 and CSS3
Search results returned via AJAX calls
Async/await pattern for efficient search operations
Cross-encoder reranking for better result quality

Notes

need to summarize and upsert all of stanford, not all was done before because of rate limit
need to scrape upenn again. Only got 15/21 pages. low priority though

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
__pycache__		__pycache__
data		data
main		main
scrapers		scrapers
tests		tests
.DS_Store		.DS_Store
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
.slugignore		.slugignore
Procfile		Procfile
README.md		README.md
application.py		application.py
columbia_12072024.csv		columbia_12072024.csv
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Incepta Backend

Features

Project Structure

Setup & Running

Technical Details

Data Sources

Development Notes

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Incepta Backend

Features

Project Structure

Setup & Running

Technical Details

Data Sources

Development Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages