I first started this project before agents and LLMs wrote code for us. I had to, god forbid, copy-paste regexes and CSS selectors like in the old days...
What is it anyway?
An automated project that provides an easy-to-use database with all the goodies from the folks at the R Weekly Highlights podcast.
Full episode breakdown: description, show notes, and full transcripts (where available) for each episode.
What can it become? Whatever you make of it!
```r
repo <- "https://github.com/iamYannC/r-podcast/raw/main/outputs"

# R Binary (RDS)
snapshot <- paste0(repo, "/snapshots/snapshot_latest.rds") |>
  url() |>
  readRDS()
closeAllConnections()
```
```r
# SQLite Database
sqlite_url <- paste0(repo, "/exports/snapshot_sqlite.sqlite")
sqlite_file <- tempfile(fileext = ".sqlite")
download.file(sqlite_url, sqlite_file, mode = "wb")
con <- DBI::dbConnect(RSQLite::SQLite(), sqlite_file)

# Read every table into a named list
tables <- DBI::dbListTables(con)
snapshot <- setNames(lapply(tables, \(tb) DBI::dbReadTable(con, tb)), tables)
DBI::dbDisconnect(con)
```
```python
# pip install pandas openpyxl
from pathlib import Path
import sqlite3
import urllib.request

import pandas as pd

repo = "https://github.com/iamYannC/r-podcast/raw/main/outputs"

# Excel Workbook
xlsx_url = f"{repo}/exports/snapshot_xlsx.xlsx"
xlsx_path = Path("snapshot_xlsx.xlsx")
urllib.request.urlretrieve(xlsx_url, xlsx_path)
meta_xlsx = pd.read_excel(xlsx_path, sheet_name="meta")

# SQLite Database
sqlite_url = f"{repo}/exports/snapshot_sqlite.sqlite"
sqlite_path = Path("snapshot_sqlite.sqlite")
urllib.request.urlretrieve(sqlite_url, sqlite_path)
con = sqlite3.connect(sqlite_path)

# Read every table into a dict of DataFrames, keyed by table name
tables = [row[0] for row in con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
snapshot = {tb: pd.read_sql(f"SELECT * FROM {tb}", con) for tb in tables}
con.close()
```

Just download the xlsx workbook.
Find your preferred file type in outputs/snapshots (.rds) or outputs/exports (SQLite and xlsx).
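The same load-everything pattern can be tried offline; here is a minimal sketch that runs against a throwaway in-memory database instead of the real snapshot (the `meta` table and its columns here are just stand-ins, not the actual schema):

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database standing in for snapshot_sqlite.sqlite.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE meta (episode INTEGER, title TEXT)")
con.executemany("INSERT INTO meta VALUES (?, ?)", [(1, "Episode 1"), (2, "Episode 2")])
con.commit()

# Load every table into a dict of DataFrames, keyed by table name.
tables = [row[0] for row in con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
snapshot = {tb: pd.read_sql(f"SELECT * FROM {tb}", con) for tb in tables}
con.close()

print(sorted(snapshot))       # table names found in the database
print(len(snapshot["meta"]))  # row count of the stand-in table
```

Swapping `":memory:"` for the downloaded `snapshot_sqlite.sqlite` path gives you the real data with the same few lines.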
Imagine my surprise when I saw that someone forked my repo, and it wasn't even by accident!
Nils Indreiten built a cool AI chatbot based on (or inspired by) the previous version of this scraping project. Go check it out (but don't burn his API credits...) 👇
This project is not affiliated with or endorsed by the R Weekly team. It is an independent, fun project to make podcast data more accessible. And because, before LLMs, it was really good web-scraping practice! (It still is, but different...)
I encourage everyone to:
- Use, tweak, copy & build whatever comes to mind. Just let me know about it! And if you find this useful, give it a star ⭐ - my mom will be proud!
All contact details 👉 🌐 www.yann-dev.io