R Weekly Podcast Scraper 🙃 🚴

I first started this project before agents and LLMs wrote code for us. I had to, god forbid, copy-paste regex and CSS selectors like in the old days...

What is it anyway? An automated project that aims to provide an easy-to-use database with all the goodies from the folks at the R Weekly Highlights podcast.
A full breakdown of every episode: description, show notes, and full transcript (where available).

What can it become? Whatever you make of it!


Show me the data 📊

R Users

repo <- "https://github.com/iamYannC/r-podcast/raw/main/outputs"

# R Binary (RDS)
con <- url(paste0(repo, "/snapshots/snapshot_latest.rds"))
snapshot <- readRDS(con)
close(con)


# SQLite Database (requires the DBI and RSQLite packages)
sqlite_url <- paste0(repo, "/exports/snapshot_sqlite.sqlite")
sqlite_file <- tempfile(fileext = ".sqlite")

download.file(sqlite_url, sqlite_file, mode = "wb")
con <- DBI::dbConnect(RSQLite::SQLite(), sqlite_file)
tables <- DBI::dbListTables(con)
# name the list so each element can be accessed by its table name
snapshot <- setNames(lapply(tables, \(tb) DBI::dbReadTable(con, tb)), tables)
DBI::dbDisconnect(con)

Python Users

# pip install pandas openpyxl
from pathlib import Path
import sqlite3
import urllib.request

import pandas as pd
repo = "https://github.com/iamYannC/r-podcast/raw/main/outputs"

# Excel Workbook
xlsx_url = f"{repo}/exports/snapshot_xlsx.xlsx"
xlsx_path = Path("snapshot_xlsx.xlsx")
urllib.request.urlretrieve(xlsx_url, xlsx_path)
meta_xlsx = pd.read_excel(xlsx_path, sheet_name="meta")

# SQLite Database
sqlite_url = f"{repo}/exports/snapshot_sqlite.sqlite"
sqlite_path = Path("snapshot_sqlite.sqlite")
urllib.request.urlretrieve(sqlite_url, sqlite_path)
con = sqlite3.connect(sqlite_path)
# list the available tables, then query whichever you need
tables = [row[0] for row in con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
con.close()
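The snippet above opens the SQLite export but leaves the actual reading to you. As a minimal stdlib-only sketch (the helper name `load_sqlite_tables` is mine, not part of this repo), here is one way to pull every table into plain Python dicts, mirroring what the R snippet does with `dbReadTable`:

```python
import sqlite3

def load_sqlite_tables(path):
    """Read every user table in a SQLite file into {table_name: [row dicts]}."""
    con = sqlite3.connect(path)
    con.row_factory = sqlite3.Row  # rows become dict-like, keyed by column name
    names = [
        r["name"]
        for r in con.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    data = {n: [dict(row) for row in con.execute(f'SELECT * FROM "{n}"')] for n in names}
    con.close()
    return data
```

After the download above, `snapshot = load_sqlite_tables("snapshot_sqlite.sqlite")` gives you everything in one pass; wrap any entry in `pd.DataFrame(...)` if you prefer data frames.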

Regular people

Just download the xlsx workbook.

Or find your preferred file type in outputs/snapshots (.rds) or outputs/exports (SQLite and xlsx).

🎉 Shout Out!

Imagine my surprise to see that someone forked my repo, and it wasn't even by accident!

Nils Indreiten built a cool AI chatbot based on (or inspired by) the previous version of this scraping project. Go check it out (but don't burn his API credits...) 👇


📂 Project Structure

Read more here

⚠️ Non-Affiliation

This project is not affiliated with or endorsed by the R Weekly team. It is an independent, fun project to make podcast data more accessible, and because before LLMs it was really good practice for web scraping! (It still is, but different...)

I encourage everyone to:

  • Use, tweak, copy & build whatever comes to mind. Just let me know about it! And if you find this useful, give it a star ⭐ - my mom will be proud!

💬 Let's Talk

All contact details 👉 🌐 www.yann-dev.io

About

A fun mini-project scraping the R-Weekly podcast
