Disclaimer: This README was written by an LLM based on the provided code.
A Python-based bot that attempts to "speedrun" Wikipedia, navigating from a starting page to a target page using only the links available on the current page. It uses semantic embeddings to choose the most relevant link at each step.
The bot mimics the "Wikipedia Game" (or "Wiki Race") where the goal is to navigate from one article to another with the fewest clicks. Instead of random guessing, this bot uses Natural Language Processing (NLP) to calculate the semantic similarity between the available links on the current page and the target topic.
- Intelligent Navigation: Uses cosine similarity to determine the best path.
- Dual Embedding Modes:
  - Dynamic (Ollama): Uses the `bge-m3` model via Ollama. This mode is context-aware (considers surrounding text) and highly accurate but slower.
  - Static (Model2Vec): Uses the `minishlab/potion-base-32M` model via `model2vec`. This mode is extremely fast but relies on static embeddings.
- Visualization: Optional Selenium integration to watch the bot navigate in real-time using Chrome or Firefox.
- Interactive CLI: Simple command-line interface to configure the run.
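
To illustrate the link-selection idea from the feature list, here is a minimal sketch of ranking candidate links by cosine similarity to the target. The vectors are made-up toy values rather than real model output, and `cosine_similarity` and `best_link` are hypothetical helper names, not functions taken from this project.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def best_link(link_vectors: dict[str, list[float]], target: list[float]) -> str:
    """Pick the link whose embedding is most similar to the target embedding."""
    return max(
        link_vectors,
        key=lambda title: cosine_similarity(link_vectors[title], target),
    )


# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
target_vec = [0.9, 0.1, 0.0]
links = {
    "Cell (biology)": [0.1, 0.9, 0.2],
    "World War II": [0.8, 0.2, 0.1],
    "Banana": [0.0, 0.1, 0.9],
}

choice = best_link(links, target_vec)
print(choice)
```

In a real run the vectors would come from whichever embedding mode is selected; only the comparison step is shown here.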
- Python: Version 3.11 or higher.
- Ollama: Required if you plan to use the "Dynamic" mode. Download Ollama.
- uv: Recommended for fast Python package management.
- Clone the repository (if applicable) or navigate to the project directory.
- Install dependencies using `uv`: `uv sync`
- Pull the embedding model (for Dynamic mode): `ollama pull bge-m3`
The easiest way to run the bot is through the CLI:
`uv run cli.py`

It asks for the following settings:
- Start URL: The full Wikipedia URL to start from (e.g., https://en.wikipedia.org/wiki/Mitochondrion).
- End URL: The target Wikipedia URL (e.g., https://en.wikipedia.org/wiki/Adolf_Hitler).
- Visualize: `y` to open a browser window and watch the bot click links, `n` for a headless console-only run.
- Embedding Model:
  - `1` for Dynamic (Ollama)
  - `2` for Static (Model2Vec)
- `cli.py`: Entry point for the interactive command-line interface.
- `main.py`: Core logic for the speedrun loop and navigation.
- `bs_scraper.py`: Handles web scraping using `requests` and `BeautifulSoup`. Extracts links and context.
- `word_embedder.py`: Manages embedding generation using either Ollama or Model2Vec.
- `pyproject.toml`: Project dependencies and configuration.
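
The link-extraction step can be pictured with a small sketch. The project itself is described as using `requests` and `BeautifulSoup`; this example swaps in the standard-library `html.parser` and an inline HTML string so that it runs without extra packages or network access. The `WikiLinkParser` class and its filter (keep `/wiki/` links, skip namespaced pages such as `File:`) are assumptions for illustration, not the project's confirmed logic.

```python
from html.parser import HTMLParser


class WikiLinkParser(HTMLParser):
    """Collect (href, link text) pairs for internal article links."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []    # collected (href, text) tuples
        self._href = None  # href of the <a> tag currently being read
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumed filter: keep /wiki/ article links, skip namespaces like "File:".
        if href.startswith("/wiki/") and ":" not in href:
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


sample_html = """
<p>The <a href="/wiki/Mitochondrion">mitochondrion</a> produces
<a href="/wiki/Adenosine_triphosphate">ATP</a>.
See <a href="/wiki/File:Cell.png">image</a> and
<a href="https://example.org/">an external site</a>.</p>
"""

parser = WikiLinkParser()
parser.feed(sample_html)
print(parser.links)
```

Only the two article links survive the filter; the file page and the external URL are dropped.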
- The bot must start at the given Start URL.
- It can only click links present on the current page.
- It cannot go back to previously visited pages (to avoid loops).
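
The three rules above can be sketched as a greedy loop with a visited set. This is an illustration under stated assumptions, not the code in `main.py`: the `speedrun` signature, the `max_steps` cap, the toy link graph, and the hand-picked relevance scores are all hypothetical stand-ins for real pages and embedding similarities.

```python
from typing import Callable


def speedrun(
    start: str,
    target: str,
    get_links: Callable[[str], list[str]],
    score: Callable[[str, str], float],
    max_steps: int = 20,
) -> list[str]:
    """Greedily walk from start toward target, never revisiting a page."""
    path = [start]     # rule 1: the run begins at the start page
    visited = {start}
    current = start
    while current != target and len(path) <= max_steps:
        # Rule 2: only links on the current page; rule 3: skip visited pages.
        candidates = [p for p in get_links(current) if p not in visited]
        if not candidates:
            break  # dead end: no unvisited outgoing links remain
        current = max(candidates, key=lambda p: score(p, target))
        visited.add(current)
        path.append(current)
    return path


# Toy link graph standing in for real Wikipedia pages (assumed data).
graph = {
    "Mitochondrion": ["Cell", "Energy"],
    "Cell": ["Mitochondrion", "Biology"],
    "Energy": ["Physics", "Mitochondrion"],
    "Biology": ["Cell", "Science"],
    "Physics": ["Science", "Energy"],
    "Science": [],
}

# Toy relevance of each page to the target "Science" (assumed values).
relevance = {
    "Mitochondrion": 0.1,
    "Cell": 0.2,
    "Energy": 0.5,
    "Biology": 0.6,
    "Physics": 0.8,
    "Science": 1.0,
}

route = speedrun(
    "Mitochondrion",
    "Science",
    get_links=lambda page: graph.get(page, []),
    score=lambda page, _target: relevance[page],
)
print(" -> ".join(route))
```

Because the walk is greedy and cannot backtrack, it can hit a dead end or exhaust `max_steps` without reaching the target; in that case the sketch simply returns the partial path.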