Skip to content

nj7782/Webscraping-Rust

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Rust

Webscraping-Rust

In this post I'm going to write a Rust script that scrapes (web-scraping) information about the 10 most COVID affected countries with the details (total cases, total deaths, and total recovered) from worldometers.info to a CSV file?

Scraping Ecosystem

I will be using the following libraries

  • [reqwest]

    An easy and powerful Rust HTTP Client

  • [scraper]

    HTML parsing and querying with CSS selectors

  • [select.rs]

    A Rust library to extract useful data from HTML documents, suitable for web scraping

  • [tokio::main]

    A runtime for writing reliable network applications without compromising speed.

  • [csv::Writer]

    A CSV writer takes as input Rust values and writes those values in a valid CSV format as output.

The first script will perform a fairly basic task: grabbing all the covid data from the page. For this, we'll utilize reqwest and scraper. As you can see the syntax is fairly concise and straightforward.

extern crate reqwest;
extern crate scraper;

use scraper::{Selector, Html};
use csv::Writer;
use std::fs;
use std::fs::File;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {

    let resp = reqwest::get("https://www.worldometers.info/coronavirus/#countries").await?;
    assert!(resp.status().is_success());


    let body = resp.text().await?;

    let fragment = Html::parse_document(&body);

    let countries = Selector::parse("#main_table_countries_today tbody:nth-child(2) tr:not([style*=\"display: none\"]):not(.total_row_world)").unwrap();

    let mut count = 1;
    

I want top 10 covid countries data to wrap up in a csv file. So, I will create a path and select the data which I require.

   let mut wtr = Writer::from_path("data.csv")?;

   wtr.write_record(&["#", "Country", "Total Cases", "Total Deaths", "Total Recovered"])?;


   for country in fragment.select(&countries) {

       let mut country_txt = country.text().collect::<Vec<_>>();

       country_txt.retain(|x| *x != "\n" && !x.starts_with("+"));

       wtr.write_record(&[country_txt[0], country_txt[1], country_txt[2], country_txt[3], country_txt[4]])?;


       if cnt >= 10 { break; }
       else {cnt += 1;}
   }


   wtr.flush()?;
   Ok(())
}

The end result of running this script is as follows:

data

Thanks For Watching my repo:)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages