yashasnadigsyn/wikipedia_speedrunner

Wikipedia Speedrunner Bot

Disclaimer: This README was written by an LLM based on the provided code.

A Python-based bot that attempts to "speedrun" Wikipedia, navigating from a starting page to a target page using only the links available on the current page. It utilizes semantic embeddings to intelligently choose the most relevant link at each step.

Overview

The bot mimics the "Wikipedia Game" (or "Wiki Race") where the goal is to navigate from one article to another with the fewest clicks. Instead of random guessing, this bot uses Natural Language Processing (NLP) to calculate the semantic similarity between the available links on the current page and the target topic.
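The selection step described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function names `cosine_similarity` and `best_link` and the toy 3-dimensional vectors are assumptions made for the example, and real embeddings would have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def best_link(link_vectors, target_vector):
    """Return the link title whose vector is most similar to the target."""
    return max(
        link_vectors,
        key=lambda title: cosine_similarity(link_vectors[title], target_vector),
    )


# Hypothetical toy embeddings, invented for illustration only.
links = {
    "Cell (biology)": [0.9, 0.1, 0.0],
    "World War II": [0.1, 0.9, 0.2],
    "Energy": [0.7, 0.2, 0.1],
}
target = [0.0, 1.0, 0.3]

print(best_link(links, target))
```

With these toy vectors, the link pointing in nearly the same direction as the target is the one selected, regardless of vector length.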

Features

  • Intelligent Navigation: Uses cosine similarity between embeddings to pick the most relevant link at each step.
  • Dual Embedding Modes:
    • Dynamic (Ollama): Uses the bge-m3 model via Ollama. This mode is context-aware (it considers the text surrounding each link) and more accurate, but slower.
    • Static (Model2Vec): Uses the minishlab/potion-base-32M model via model2vec. This mode is much faster but relies on static embeddings, so it does not take surrounding context into account.
  • Visualization: Optional Selenium integration to watch the bot navigate in real-time using Chrome or Firefox.
  • Interactive CLI: Simple command-line interface to configure the run.
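The two embedding modes could be wired up roughly as follows. This is a sketch under assumptions: it presumes the `ollama` Python client and the `model2vec` package are installed, and the function names `embed_dynamic`, `embed_static`, and `get_embedder` are hypothetical, not taken from word_embedder.py. The imports are deferred so that one mode can be used without the other's dependency.

```python
def embed_dynamic(texts):
    """Embed a list of strings with bge-m3 through a local Ollama server."""
    import ollama  # assumed client library; needs `ollama pull bge-m3` first

    response = ollama.embed(model="bge-m3", input=texts)
    return response["embeddings"]


def embed_static(texts):
    """Embed a list of strings with the static potion-base-32M model."""
    from model2vec import StaticModel

    # Loading on every call keeps the sketch short; a real implementation
    # would probably load the model once and reuse it.
    model = StaticModel.from_pretrained("minishlab/potion-base-32M")
    return model.encode(texts).tolist()


def get_embedder(choice):
    """Map the CLI answer ('1' or '2') to an embedding function."""
    embedders = {"1": embed_dynamic, "2": embed_static}
    if choice not in embedders:
        raise ValueError(f"unknown embedding mode: {choice!r}")
    return embedders[choice]
```

Both functions return one vector per input string, so the rest of the pipeline does not need to know which mode was chosen.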

Prerequisites

  • Python: Version 3.11 or higher.
  • Ollama: Required only if you plan to use the "Dynamic" mode.
  • uv: Recommended for fast Python package management.

Installation

  1. Clone the repository and navigate to the project directory.

  2. Install dependencies using uv:

    uv sync

  3. Pull the embedding model (for Dynamic mode):

    ollama pull bge-m3

Usage

The easiest way to run the bot is through the CLI:

uv run cli.py

Interactive Prompts

  1. Start URL: The full Wikipedia URL to start from (e.g., https://en.wikipedia.org/wiki/Mitochondrion).
  2. End URL: The target Wikipedia URL (e.g., https://en.wikipedia.org/wiki/Adolf_Hitler).
  3. Visualize: y to open a browser window and watch the bot click links, n for a headless console-only run.
  4. Embedding Model:
    • 1 for Dynamic (Ollama)
    • 2 for Static (Model2Vec)
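Because both URL prompts expect a full Wikipedia article URL, it may help to validate the input before starting a run. The helper below is an illustrative sketch using only the standard library; the name `article_title` and the exact checks are assumptions, not the behavior of cli.py.

```python
from urllib.parse import unquote, urlparse


def article_title(url):
    """Return the article title from a Wikipedia URL, or raise ValueError."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"not an http(s) URL: {url!r}")
    if host != "wikipedia.org" and not host.endswith(".wikipedia.org"):
        raise ValueError(f"not a Wikipedia host: {host!r}")
    if not parsed.path.startswith("/wiki/"):
        raise ValueError(f"not an article path: {parsed.path!r}")
    title = unquote(parsed.path[len("/wiki/"):])
    if not title:
        raise ValueError("URL has no article title")
    # Wikipedia uses underscores in URLs where titles have spaces.
    return title.replace("_", " ")


print(article_title("https://en.wikipedia.org/wiki/Mitochondrion"))
```

The returned title is also a convenient string to embed as the target topic.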

Project Structure

  • cli.py: Entry point for the interactive command-line interface.
  • main.py: Core logic for the speedrun loop and navigation.
  • bs_scraper.py: Handles web scraping using requests and BeautifulSoup. Extracts links and context.
  • word_embedder.py: Manages embedding generation using either Ollama or Model2Vec.
  • pyproject.toml: Project dependencies and configuration.
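To give a feel for the link-extraction step, here is a small standard-library sketch. Note the substitution: the project itself uses requests and BeautifulSoup, while this example uses `html.parser` so it runs without extra dependencies. The class name `WikiLinkParser`, the function `extract_links`, and the filtering rules are assumptions for illustration.

```python
from html.parser import HTMLParser


class WikiLinkParser(HTMLParser):
    """Collect internal article links (/wiki/...) from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Keep article links only; a colon marks namespaces such as
        # File: or Help:, which are not articles.
        if href.startswith("/wiki/") and ":" not in href:
            title = href[len("/wiki/"):].split("#")[0]  # drop section anchors
            if title and title not in self.links:
                self.links.append(title)


def extract_links(html):
    """Return de-duplicated article titles in document order."""
    parser = WikiLinkParser()
    parser.feed(html)
    return parser.links


sample = (
    '<p><a href="/wiki/Cell_(biology)">cell</a> '
    '<a href="/wiki/File:X.png">image</a> '
    '<a href="https://example.com">external</a> '
    '<a href="/wiki/Energy#History">energy</a></p>'
)
print(extract_links(sample))
```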

Rules

  1. The bot must start at the given Start URL.
  2. It can only click links present on the current page.
  3. It cannot go back to previously visited pages (to avoid loops).
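Taken together, the rules amount to a greedy walk with a visited set. The sketch below runs on a toy in-memory graph rather than live Wikipedia pages; `speedrun`, `get_links`, `score`, the step limit, and the shortcut of clicking the target as soon as it appears are all assumptions for illustration, not a description of main.py.

```python
def speedrun(start, target, get_links, score, max_steps=50):
    """Greedily follow the best-scoring unvisited link until the target is hit.

    Returns (path, reached_target).
    """
    path = [start]          # rule 1: begin at the start page
    visited = {start}
    current = start
    while current != target and len(path) - 1 < max_steps:
        # Rules 2 and 3: only links on the current page, never a visited one.
        candidates = [link for link in get_links(current) if link not in visited]
        if not candidates:
            return path, False  # dead end
        if target in candidates:
            current = target
        else:
            current = max(candidates, key=lambda link: score(link, target))
        visited.add(current)
        path.append(current)
    return path, current == target


# Hypothetical toy graph and similarity scores, invented for illustration.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["D"], "D": ["A", "Z"], "Z": []}
toy_scores = {"A": 0.1, "B": 0.2, "C": 0.5, "D": 0.8}

path, reached = speedrun(
    "A",
    "Z",
    get_links=lambda page: graph.get(page, []),
    score=lambda link, target: toy_scores.get(link, 0.0),
)
print(path, reached)
```

Because the walk is greedy and never backtracks, it can reach a dead end or take a longer route than the shortest possible path.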
