This directory contains helper scripts for running and analyzing BIS Scraper operations.
run_full_scrape.sh runs the full BIS speech scraping and conversion process: it downloads speeches from the BIS website and converts them to text.
Usage:
    ./run_full_scrape.sh
Configuration: edit the variables at the top of the script to customize:
- Data directory
- Date range
- Institution filtering
- Force download options
- Speech limits
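As a rough illustration of what such a configuration block could look like, here is a minimal sketch. Every variable name, value, and date format below is hypothetical; check the top of run_full_scrape.sh for the names it actually uses. The default data directory is assumed to match the one analyze_results.sh falls back to.

```shell
#!/bin/sh
# Hypothetical configuration block. The real variable names and
# accepted values in run_full_scrape.sh may differ.

DATA_DIR="${DATA_DIR:-$HOME/bis_full_data}"  # where PDFs and text files are stored
START_DATE="2020-01-01"   # first speech date to include (format assumed)
END_DATE="2020-12-31"     # last speech date to include (format assumed)
INSTITUTIONS=""           # empty means no institution filtering (assumed)
FORCE_DOWNLOAD=false      # true would re-download existing files (assumed)
SPEECH_LIMIT=0            # 0 means no limit (assumed)

# Print the effective settings so they can be checked before a long run.
echo "Data directory: $DATA_DIR"
echo "Date range:     $START_DATE to $END_DATE"
echo "Institutions:   ${INSTITUTIONS:-all}"
echo "Force download: $FORCE_DOWNLOAD"
echo "Speech limit:   $SPEECH_LIMIT"
```

Printing the settings up front is a cheap way to catch a wrong date range or data directory before starting a multi-hour job.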
analyze_results.sh analyzes the results of the BIS scraping process. It reports statistics on the downloaded PDFs and the converted text files.
Usage:
    ./analyze_results.sh [path_to_data_dir]
If no data directory is specified, the script uses the default ($HOME/bis_full_data).
Output:
- Counts of PDF and text files
- Breakdown by institution
- Breakdown by year
- Conversion success rate
- Recent log entries
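The file counts and conversion success rate listed above can be approximated by hand. The sketch below is not the code in analyze_results.sh; it assumes that PDFs end in .pdf and converted files end in .txt somewhere under the data directory, and the helper names count_files and conversion_rate are made up for this example.

```shell
#!/bin/sh
# Count regular files with a given extension anywhere under a directory.
# $1 = directory, $2 = extension without the dot.
count_files() {
    find "$1" -type f -name "*.$2" | wc -l | tr -d ' '
}

# Print the share of PDFs that have a text file, as a whole percentage.
# $1 = number of PDFs, $2 = number of text files.
conversion_rate() {
    if [ "$1" -eq 0 ]; then
        echo "n/a"
    else
        echo "$(( 100 * $2 / $1 ))%"
    fi
}

# Same default as analyze_results.sh; the layout below it is assumed.
DATA_DIR="${1:-$HOME/bis_full_data}"

if [ -d "$DATA_DIR" ]; then
    pdf_count=$(count_files "$DATA_DIR" pdf)
    txt_count=$(count_files "$DATA_DIR" txt)
    echo "PDF files:       $pdf_count"
    echo "Text files:      $txt_count"
    echo "Conversion rate: $(conversion_rate "$pdf_count" "$txt_count")"
fi
```

The rate uses integer arithmetic and rounds down, which is enough for a quick sanity check.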
Workflow:
1. Configure the scraping job:
       nano scripts/run_full_scrape.sh
2. Run the scraping process:
       cd scripts
       ./run_full_scrape.sh
   For long-running jobs, consider using screen or tmux:
       screen -S bis_scraper
       cd scripts
       ./run_full_scrape.sh
       # Detach with Ctrl+A followed by D
       # Reattach later with: screen -r bis_scraper
3. Analyze the results:
       cd scripts
       ./analyze_results.sh
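For tmux, which is mentioned as an alternative to screen for long-running jobs, a roughly equivalent setup might look like the sketch below. The function name start_scrape_in_tmux is made up for this example, and the session name bis_scraper is simply reused from the screen commands. It assumes tmux is installed and that the function is called from the repository root.

```shell
#!/bin/sh
# Start the scrape in a detached tmux session named bis_scraper.
# The session closes on its own once the script exits.
start_scrape_in_tmux() {
    tmux new-session -d -s bis_scraper 'cd scripts && ./run_full_scrape.sh'
}

# Call the function to launch the job:
#   start_scrape_in_tmux
# Attach to watch progress:
#   tmux attach -t bis_scraper
# Detach again with Ctrl+B followed by D.
```

Because the session ends with the script, any output not written to a log file is lost; start an interactive session with `tmux new-session -s bis_scraper` instead if the terminal output should stay visible afterwards.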