https://tjkemper.github.io/jre/
```shell
# Update data
$ python3 ./youtube-scraper/youtube_scraper.py
# Deploy website
$ cd ./frontend/
$ npm run deploy
```

The web scraper has two parts:
- Get video ids
- Update video metadata
```python
import requests
from bs4 import BeautifulSoup
response = requests.get(video_url)  # fast: network fetch of one video page
soup = BeautifulSoup(response.content, "html.parser")  # slow: builds the parse tree
```

Learned facts:
- Creating the `BeautifulSoup` object is the most expensive operation.
- Web scraping requires sequential video access:
  - Get first video
  - Get next video
  - Get next video
  - ...
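Each step in this sequence only needs the id of the next video, which can come from cheap string handling rather than a full parse. As a minimal sketch, assuming links use YouTube's standard `watch?v=<id>` form, a hypothetical helper `video_id_from_url` (not part of this repository) could extract the id with the standard library alone:

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> Optional[str]:
    """Return the `v` query parameter of a watch URL, or None if absent.

    Hypothetical helper: it only splits the URL string, so it avoids
    the cost of building a BeautifulSoup parse tree.
    """
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None
```

For example, `video_id_from_url("https://www.youtube.com/watch?v=abc123&list=PLxyz")` returns `"abc123"`.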
We want to make the sequential part as inexpensive as possible, so it only collects video ids.
Once we have the video ids, updating video metadata can be done in parallel.
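Because each video's metadata update is independent of the others, the updates can be fanned out to a worker pool. A minimal sketch using the standard library's `concurrent.futures`; `update_in_parallel` and the per-video `fetch_metadata` callable are hypothetical names, not this repository's API:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def update_in_parallel(
    video_ids: Iterable[str],
    fetch_metadata: Callable[[str], dict],
    max_workers: int = 8,
) -> Dict[str, dict]:
    """Fetch metadata for every video id concurrently.

    fetch_metadata is a hypothetical per-video function (for example,
    one that downloads and parses a single video page).
    """
    ids = list(video_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so they line up with ids
        results = pool.map(fetch_metadata, ids)
        return dict(zip(ids, results))
```

Threads suit the network-bound `requests.get` call. Since building the `BeautifulSoup` object is CPU-bound, `ProcessPoolExecutor` (same interface, but `fetch_metadata` must be a picklable top-level function) may scale better for the parsing step.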
Get video ids (`get_video_ids.py`):
- Input: `playlist_url`, `datafile`
- Output: New video ids are saved to `datafile`

This step iterates through the entire playlist until the number of stored video ids equals the playlist length.
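The stopping rule can be sketched as a loop that walks the playlist one video at a time and stops once the stored id count reaches the playlist length. This is a minimal sketch under stated assumptions, not the repository's `get_video_ids`: the datafile is assumed to be a JSON object mapping video id to metadata, and `next_video_id` is a hypothetical callable standing in for the "get next video" step.

```python
import json
import os
from typing import Callable, Optional


def collect_video_ids(
    datafile: str,
    playlist_length: int,
    next_video_id: Callable[[Optional[str]], Optional[str]],
) -> int:
    """Walk the playlist until the datafile holds playlist_length ids.

    Assumed datafile format: a JSON object mapping video id -> metadata.
    next_video_id(None) returns the first video id; next_video_id(vid)
    returns the id after vid, or None at the end of the playlist.
    Returns the number of newly added ids.
    """
    data = {}
    if os.path.exists(datafile):
        with open(datafile) as f:
            data = json.load(f)

    added = 0
    current = None
    # Stop as soon as the stored count equals the playlist length
    while len(data) < playlist_length:
        current = next_video_id(current)
        if current is None:  # reached the end of the playlist
            break
        if current not in data:
            data[current] = {}  # new id; metadata is filled in later
            added += 1

    with open(datafile, "w") as f:
        json.dump(data, f, indent=2)
    return added
```

When the datafile is already complete, the loop body never runs, so a re-run costs no page requests.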
Update video metadata (`update_video_metadata.py`):
- Input: `datafile`, `full_update`
- Output: Video metadata saved to `datafile`

`full_update`: If `True`, update all videos; otherwise, only update videos that have no metadata.
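The `full_update` switch boils down to choosing which videos to refresh. A minimal sketch, assuming the data is a dict mapping video id to a metadata dict (empty when nothing has been fetched yet); `update_metadata` and `fetch_metadata` are hypothetical names, not the script's actual functions:

```python
from typing import Callable, Dict


def update_metadata(
    data: Dict[str, dict],
    fetch_metadata: Callable[[str], dict],
    full_update: bool = False,
) -> Dict[str, dict]:
    """Return a copy of data with metadata filled in.

    full_update=True refreshes every video; otherwise only videos whose
    metadata dict is empty are fetched. The input dict is not modified.
    """
    to_update = [vid for vid, meta in data.items() if full_update or not meta]
    updated = dict(data)
    for vid in to_update:
        updated[vid] = fetch_metadata(vid)
    return updated
```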
To update `./frontend/src/data/jre.json`, run:

```shell
$ python3 ./youtube-scraper/youtube_scraper.py
```

You can also run `get_video_ids.py` and `update_video_metadata.py` individually.
https://tjkemper.github.io/jre/
The website is served using GitHub Pages. The database is a JSON file.
Analytics provides several facts and graphs:
- Total views
- Number of videos
- Total time
- Most viewed video
- Longest video
- Views over time
- Keyword frequency
- Keyword views
- Family friendly
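Most of these facts are simple aggregations over the video list. A minimal sketch, assuming a non-empty list of records with hypothetical `title`, `views`, and `duration` (seconds) keys; the keyword definition (case-insensitive substring match on titles) is also an assumption, not necessarily what the site uses:

```python
from typing import Dict, List, Tuple


def summarize(videos: List[dict]) -> dict:
    """Headline stats; assumes a non-empty list with 'title',
    'views', and 'duration' (seconds) keys on every record."""
    return {
        "total_views": sum(v["views"] for v in videos),
        "num_videos": len(videos),
        "total_seconds": sum(v["duration"] for v in videos),
        "most_viewed": max(videos, key=lambda v: v["views"])["title"],
        "longest": max(videos, key=lambda v: v["duration"])["title"],
    }


def keyword_stats(videos: List[dict], keywords: List[str]) -> Dict[str, Tuple[int, int]]:
    """For each keyword: (number of matching titles, total views of those videos).

    Matching is a case-insensitive substring test on the title.
    """
    stats = {}
    for kw in keywords:
        matches = [v for v in videos if kw.lower() in v["title"].lower()]
        stats[kw] = (len(matches), sum(v["views"] for v in matches))
    return stats
```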
Searchable & sortable table.
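Searching and sorting a table amounts to a filter followed by a sort. A minimal sketch over the same kind of records, assuming a hypothetical `title` field for search and any numeric column (such as a hypothetical `views`) for sorting; `search_and_sort` is an illustrative name only:

```python
from typing import List


def search_and_sort(
    videos: List[dict], query: str, sort_key: str, descending: bool = True
) -> List[dict]:
    """Keep rows whose title contains query (case-insensitive),
    then sort them by the sort_key column."""
    q = query.lower()
    rows = [v for v in videos if q in v["title"].lower()]
    return sorted(rows, key=lambda v: v[sort_key], reverse=descending)
```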
Are you curious to know which videos are not family friendly?
Get random video.
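Both features reduce to small list operations. A minimal sketch, assuming each record carries a hypothetical boolean `family_friendly` field (records without it are treated as family friendly); `not_family_friendly` and `random_video` are illustrative names, not the site's code:

```python
import random
from typing import List, Optional


def not_family_friendly(videos: List[dict]) -> List[dict]:
    """Videos whose assumed 'family_friendly' flag is False."""
    return [v for v in videos if not v.get("family_friendly", True)]


def random_video(videos: List[dict], rng: Optional[random.Random] = None) -> Optional[dict]:
    """Pick one video uniformly at random, or None if the list is empty.

    Pass a seeded random.Random for reproducible picks.
    """
    if not videos:
        return None
    chooser = rng if rng is not None else random
    return chooser.choice(videos)
```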