Code Semantic Searcher

Status: Work in Progress

A local semantic search tool for querying codebases with natural language. Point it at a directory and it indexes everything; then you can ask things like "where do we parse config files" instead of grepping around.

Why I Built This

Mainly to learn. I wanted to get hands-on with:

Building a TUI that actually talks to a backend
Working with embeddings and vector search (FAISS specifically)
Wiring up a simple ML pipeline end-to-end
FastAPI for quick API prototyping

It's not meant to replace proper code search tools. It's a playground for understanding how semantic search works under the hood.

How It Works

Preprocess - Reads source files, strips comments, outputs clean text chunks
Embed - Runs each chunk through a sentence-transformer model (all-MiniLM-L6-v2)
Index - Builds a FAISS index for fast similarity lookup
Search - Query gets embedded, matched against the index, results come back ranked
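Here is a rough, pure-Python sketch of what the Index and Search steps compute. It assumes similarity means cosine similarity, which is the inner product of unit-length vectors. This is an illustration, not the project's code. FAISS does the same scan in optimized C++ (for example, an IndexFlatIP over vectors passed through faiss.normalize_L2). Real all-MiniLM-L6-v2 embeddings have 384 dimensions rather than the toy 3 used here. The function names are made up for the example.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    # Leave all-zero vectors untouched instead of dividing by zero.
    return [x / norm for x in vec] if norm else vec


def search(query: list[float], chunks: list[list[float]], k: int = 3) -> list[tuple[int, float]]:
    """Hypothetical helper: return (chunk_id, score) pairs for the k best chunks, best first."""
    q = normalize(query)
    scores = []
    for i, vec in enumerate(chunks):
        v = normalize(vec)
        # Inner product of two unit vectors = cosine similarity.
        scores.append((i, sum(a * b for a, b in zip(q, v))))
    # Brute force: score every chunk, then sort (what a flat index does exhaustively).
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

For example, with chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]], calling search([1.0, 0.1, 0.0], chunks, k=2) ranks chunk 0 first and chunk 2 second. An exhaustive scan like this grows linearly with the number of chunks. That is fine for a personal codebase. Approximate indexes such as IVF trade some accuracy for speed on larger collections.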

The API serves results, the TUI consumes them. Nothing fancy.

What I Picked Up Along the Way

Regex-based comment stripping is fragile but works for a prototype
FAISS IndexIVF has to be trained (k-means over sample vectors) before you can add data, which is awkward for tiny datasets
Keeping pipeline stages separate makes debugging way easier
Type hints pay off when you're wiring modules together
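To make the "fragile" point concrete, here is a minimal regex-based comment stripper in the same spirit. It is a hypothetical sketch, not the project's preprocess module. It assumes only `#`, `//` and `/* */` comment syntax.

```python
import re

# Hypothetical patterns: '/* ... */' block comments (C-family), plus '#' (Python/shell)
# and '//' (C-family) line comments. Real languages need many more cases.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"(#|//).*$", re.MULTILINE)


def strip_comments(source: str) -> str:
    """Remove comments with regexes, then drop lines that end up empty."""
    no_blocks = BLOCK_COMMENT.sub("", source)
    # Everything from the first '#' or '//' to end of line is treated as a comment,
    # even when it sits inside a string literal.
    no_lines = LINE_COMMENT.sub("", no_blocks)
    kept = [line.rstrip() for line in no_lines.splitlines() if line.strip()]
    return "\n".join(kept)
```

Where it breaks: strip_comments('url = "http://example.com"  # docs') returns `url = "http:`, because the `//` inside the string literal matches first. Python's floor-division operator `//` gets clobbered the same way. For Python files specifically, the standard-library tokenize module reports comments as separate tokens and avoids both problems.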

Known Issues

This is learning code, not production code:

No incremental updates (you rebuild the whole index on changes)
Chunking is per-file, not per-function
No auth on the API
Probably breaks on edge cases I haven't hit yet
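One possible direction for the per-function chunking issue, limited to Python files: the standard-library ast module can slice out each top-level function or class as its own chunk. This is a hypothetical sketch, not something the project does today. Other languages would need their own parser.

```python
import ast


def function_chunks(source: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split Python source into (name, code) chunks,
    one per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment slices the original text using the node's line/column positions.
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append((node.name, segment))
    return chunks
```

Module-level code between definitions is dropped here. Methods stay inside their class's chunk. A fuller version might also walk nested nodes or keep a small file-level chunk for imports and constants.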

Running It

export PYTHONPATH=$PWD

# Index your code
python3 -m src.preprocessing.preprocess --input ~/your-code --output data/processed
python3 -m src.embedding.embedder
python3 -m src.indexing.build_index

# Start the API
uvicorn src.api.server:app --reload

# Open another terminal, run the TUI
python3 tui/client.py

Needs Python 3.10+, sentence-transformers, faiss-cpu, fastapi, and uvicorn (to serve the API).

Project Structure

src/
  preprocessing/   # cleans code files
  embedding/       # generates vectors
  indexing/        # builds FAISS index
  search/          # runs queries
  api/             # FastAPI server
tui/
  client.py        # terminal interface

License

MIT
