Urban Dictionary Archive – A continuously scraped dataset of slang definitions from Urban Dictionary, automatically updated every 5-15 minutes via GitHub Actions.
This repository maintains two complementary data storage formats:
- Daily archives (`data/`): chronological files containing all entries fetched on a specific date. Each file contains an array of definition objects.
- Alphabetical files (`dictionary/`): entries grouped by the first letter of the word. Each file is structured as a JSON object with words as keys and arrays of definitions as values.
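As a rough sketch, the two layouts differ like this (the field values below are made up purely for illustration):

```python
# data/2024-01-15.json -- a daily archive: a flat list of definition objects
daily_archive = [
    {"defid": 12345678, "word": "rizz", "definition": "...", "example": "...", "written_on": "..."},
    {"defid": 12345679, "word": "yeet", "definition": "...", "example": "...", "written_on": "..."},
]

# dictionary/R.json -- an alphabetical file: each word maps to a list of its definitions
alphabetical_file = {
    "rizz": [
        {"defid": 12345678, "word": "rizz", "definition": "...", "example": "...", "written_on": "..."},
    ],
}
```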
Each definition entry contains the following fields:
```json
{
  "defid": 12345678,
  "word": "rizz",
  "definition": "Slang for charisma; ability to attract.",
  "example": "He's got mad rizz.",
  "written_on": "2025-09-09T21:31:00.000Z"
}
```

- `defid`: Unique identifier for the definition (used for deduplication)
- `word`: The slang term being defined
- `definition`: The definition of the word
- `example`: Usage example of the word
- `written_on`: Timestamp when the definition was originally submitted
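For reference, the same schema expressed as a Python type hint (an illustrative sketch; the repository does not necessarily define such a type):

```python
from typing import TypedDict

class Definition(TypedDict):
    """One definition entry as stored in this archive."""
    defid: int          # unique identifier, used for deduplication
    word: str           # the slang term being defined
    definition: str     # the definition text
    example: str        # a usage example
    written_on: str     # ISO 8601 timestamp of the original submission
```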
To find all entries from a specific date:
```bash
# View entries from January 15, 2024
cat data/2024-01-15.json
```
To find all words starting with a specific letter:
```bash
# View all words starting with 'R'
cat dictionary/R.json
```
To look up a specific word:
```bash
# Look in the R.json file
jq '.rizz' dictionary/R.json
```
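The same lookups can be done programmatically; a minimal Python sketch using only the standard library:

```python
import json

# All entries fetched on January 15, 2024
with open("data/2024-01-15.json") as f:
    daily_entries = json.load(f)
print(f"{len(daily_entries)} entries collected on 2024-01-15")

# All definitions of "rizz" from the alphabetical file
with open("dictionary/R.json") as f:
    r_words = json.load(f)
for entry in r_words.get("rizz", []):
    print(entry["defid"], entry["definition"])
```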
Data collection details:
- Source: Urban Dictionary Random API (https://api.urbandictionary.com/v0/random)
- Frequency: Every 5-15 minutes via GitHub Actions
- Deduplication: Entries are deduplicated by `defid` to ensure data cleanliness
- Error Handling: API failures are handled gracefully with retry logic
- Storage: Dual storage system for both chronological and alphabetical access
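The retry behaviour described above might look roughly like the following sketch. This is not the actual fetch_ud.py code; the function name, retry counts, and use of the requests library are assumptions, and the `"list"` key reflects the usual shape of Urban Dictionary API responses:

```python
import time
import requests  # assumed to be available via requirements.txt

RANDOM_API = "https://api.urbandictionary.com/v0/random"

def fetch_random_batch(retries: int = 3, backoff: float = 2.0) -> list[dict]:
    """Fetch one batch of random definitions, retrying on transient failures."""
    for attempt in range(1, retries + 1):
        try:
            resp = requests.get(RANDOM_API, timeout=10)
            resp.raise_for_status()
            return resp.json().get("list", [])   # the API wraps results in a "list" array
        except requests.RequestException:
            if attempt == retries:
                raise                            # give up after the last attempt
            time.sleep(backoff * attempt)        # simple linear backoff before retrying
    return []
```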
To run the collector locally:
- Clone the repository:
```bash
git clone https://github.com/yourusername/UrbanArchive.git
cd UrbanArchive
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Run the fetcher:
```bash
python fetch_ud.py
```
Statistics are automatically updated with each data collection run.
- Data is collected every 5-15 minutes via GitHub Actions
- Each run fetches 50 batches with ~10 entries per batch
- Automatic deduplication prevents duplicate entries
- All activity is logged to the `logs/` directory (not committed to the repo)
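A single collection run, as described by the bullets above, could be structured along the following lines. This is an illustrative sketch, not the actual fetch_ud.py: the `collect_once` helper, the file-writing logic, and the handling of edge cases (e.g. words that do not start with a letter) are simplified assumptions, and the fetcher is passed in so the earlier `fetch_random_batch` sketch can be reused:

```python
import json
from datetime import date
from pathlib import Path
from typing import Callable

def collect_once(fetch_batch: Callable[[], list[dict]], batches: int = 50) -> None:
    """Fetch ~50 batches, deduplicate by defid, and update both storage formats."""
    new_entries: dict[int, dict] = {}
    for _ in range(batches):
        for entry in fetch_batch():
            new_entries.setdefault(entry["defid"], entry)   # dedup within the run

    # Daily archive: data/YYYY-MM-DD.json holds a flat list of the day's entries.
    daily_path = Path("data") / f"{date.today():%Y-%m-%d}.json"
    daily = json.loads(daily_path.read_text()) if daily_path.exists() else []
    seen = {e["defid"] for e in daily}
    daily.extend(e for defid, e in new_entries.items() if defid not in seen)
    daily_path.parent.mkdir(exist_ok=True)
    daily_path.write_text(json.dumps(daily, indent=2))

    # Alphabetical files: dictionary/<LETTER>.json maps each word to its definitions.
    for entry in new_entries.values():
        letter_path = Path("dictionary") / f"{entry['word'][0].upper()}.json"
        by_word = json.loads(letter_path.read_text()) if letter_path.exists() else {}
        defs = by_word.setdefault(entry["word"], [])
        if entry["defid"] not in {d["defid"] for d in defs}:
            defs.append(entry)
        letter_path.parent.mkdir(exist_ok=True)
        letter_path.write_text(json.dumps(by_word, indent=2))
```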
Data quality:
- Deduplication: No duplicate entries, based on `defid`
- Continuous Growth: New entries added every 5-15 minutes
- Dual Access: Both chronological (daily) and alphabetical organization
- JSON Validation: All data is validated before storage
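A validation step in that spirit might look like this minimal sketch (the actual checks performed by the repository may differ):

```python
REQUIRED_FIELDS = {"defid", "word", "definition", "example", "written_on"}

def is_valid_entry(entry: object) -> bool:
    """Accept only well-formed definition objects before they are stored."""
    if not isinstance(entry, dict):
        return False
    if not REQUIRED_FIELDS <= entry.keys():          # every required field must be present
        return False
    return isinstance(entry["defid"], int) and all(
        isinstance(entry[field], str)
        for field in ("word", "definition", "example", "written_on")
    )
```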
This is an automated data collection project. The main ways to contribute are:
- Improving the fetching script (`fetch_ud.py`)
- Enhancing data processing or storage formats
- Adding data analysis tools or utilities
- Reporting issues with the automation
This project is open source. The collected data comes from Urban Dictionary's public API.
This archive contains user-generated content from Urban Dictionary. The definitions and examples may contain explicit language, offensive terms, or inappropriate content. This repository is for research and archival purposes only.