🧠 NoteWeb

A local-first AI-powered tool to search, summarize, and understand your notes — built for students, by a student.
Uses chunking, vector embeddings, and LLaMA 3 via Ollama for real semantic understanding.

Features

Index PDFs, Word, Excel, and PowerPoint files into semantic chunks
Embed and store those chunks using vector search
Ask questions using LLaMA 3 via Ollama (offline + local)
Get answers with real context grounding (like the ChatGPT Retrieval Plugin)

- Follow-up question support with persistent memory
- Ingest multiple files at once from a folder
- Test-ready with test files/ folder

Folder Structure

noteweb/
├── main.py                 # CLI entrypoint
├── search.py              # Embedding search logic
├── llm_answerer.py        # Sends chunks to LLaMA via Ollama
├── embedder.py            # Generates embeddings
├── chunker.py             # Breaks files into semantic chunks
├── files_loader.py        # PDF loader (more formats soon)
├── generate_index.py      # Index generator (embeds + saves)
├── embeddings_index.json  # Your saved vector index
├── test files/            # Sample PDFs to test with
├── requirements.txt
└── venv/                  # Your virtual environment

Requirements

Python 3.11+
Ollama installed and running
```
brew install ollama
ollama run llama3
```
Install dependencies:
```
pip install -r requirements.txt
```

How to Use

1. Clone the Repository

git clone https://github.com/marcanjoul/noteweb.git
cd noteweb

Or, if you downloaded the ZIP, unzip it and navigate into the folder via:

cd ~/Desktop/noteweb-main  # or wherever you saved it

2. Set Up a Virtual Environment

We recommend using a virtual environment to keep dependencies clean:

python3 -m venv venv
source venv/bin/activate  # (for mac) # On Windows, use venv\Scripts\activate

3. Install Dependencies

Run the following to install all required packages:

pip install -r requirements.txt

If needed, manually install these extras:

pip install python-docx python-pptx openpyxl sentence-transformers

4. Add Your Files

Create or drop any files you want to search into the test files/ directory. Supported formats: .pdf .docx .pptx .xlsx

5. Run this to generate semantic embeddings from files in your folder

python generate_index.py

This will:

Load all .pdf, .docx, .xslx, and .pptx files from your test files/ folder
Chunk the content into semantically meaningful parts
Embed each chunk using sentence-transformers
Save everything to embeddings_index.json

6. Search Your Files with the AI

python search.py

This will:

Search your indexed chunks for relevant context
Pass top matches to LLaMA 3
Return an answer based on your notes
You can also ask follow-up questions, as NoteWeb remembers the context!

💡 Example Usage

What is the difference between supervised and unsupervised learning?

❗Requirements Make sure you have:
- Python 3.9+
- pip
- Optional: Ollama installed and running (for local LLaMA 3 support)
💡 Optional: Skip venv (Not Recommended) You can also run NoteWeb without using a virtual environment:
```
pip install -r requirements.txt
pip install python-docx python-pptx openpyxl sentence-transformers
```

7. Ask a question

python main.py --search "What is instruction-level parallelism?"

Why This Matters

NoteWeb simulates real retrieval-augmented generation (RAG) — the same strategy used in:

ChatGPT w/ File Uploads
Perplexity AI
Open-source RAG pipelines (like LangChain, LlamaIndex)

But here, it’s all:

Local
Educational
Hackable

Perfect for learning how vector search + LLMs work together.

File Support

✅ PDF

- ✅ DOCX (.docx)
- ✅ PowerPoint (.pptx)
- ✅ Excel (.xlsx)
- 🔜 TXT, Markdown, Web scraping

You can drop files into the test files/ folder!

Roadmap

Multi-file indexing (entire folders at once)
DOCX, PPTX, and XLSX support
Follow-up question support
Optional toggle in code for chunk visibility
Index caching to skip re-embedding unchanged files
Command-line UI / TUI
Web UI

Project Status

NoteWeb is my first AI-integrated project — made while learning:

How LLMs like LLaMA work
What “semantic search” really means
How chunking, embeddings, and vector stores come together

This is the foundation for bigger projects — search tools, academic companions, even personalized AI.

Credits

Built with 💻 and ☕ by @marcanjoul
PDF parsing via PyMuPDF
DOCX parsing via python-docx
PowerPoint parsing via python-pptx
Excel parsing via openpyxl
Embeddings via sentence-transformers
LLM answers via Ollama and Meta’s LLaMA 3

Want to improve or collaborate?

Open an issue, drop a PR, or fork it and make it your own.I'd appreciate any feedback!

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
test files		test files
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
chunker.py		chunker.py
embedder.py		embedder.py
files_loader.py		files_loader.py
generate_index.py		generate_index.py
llm_answerer.py		llm_answerer.py
requirements.txt		requirements.txt
search.py		search.py
test_chunker.py		test_chunker.py
test_embed.py		test_embed.py
test_loader.py		test_loader.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🧠 NoteWeb

Features

Folder Structure

Requirements

How to Use

1. Clone the Repository

2. Set Up a Virtual Environment

3. Install Dependencies

4. Add Your Files

5. Run this to generate semantic embeddings from files in your folder

6. Search Your Files with the AI

7. Ask a question

Why This Matters

File Support

Roadmap

Project Status

Credits

Want to improve or collaborate?

About

Uh oh!

Releases

Packages

Languages

marcanjoul/NoteWeb

Folders and files

Latest commit

History

Repository files navigation

🧠 NoteWeb

Features

Folder Structure

Requirements

How to Use

1. Clone the Repository

2. Set Up a Virtual Environment

3. Install Dependencies

4. Add Your Files

5. Run this to generate semantic embeddings from files in your folder

6. Search Your Files with the AI

7. Ask a question

Why This Matters

File Support

Roadmap

Project Status

Credits

Want to improve or collaborate?

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages