yangyuwang/wikiart_metadata

WikiArt Dataset Scraping

This repository contains scripts for scraping artist and artwork information from WikiArt using Selenium. The resulting dataset includes detailed information about artists and their artworks. The base list of artists to scrape comes from a Wikipedia-expanded dataset hosted on Kaggle.

Author: Yangyu Wang

Date: January 18, 2025

Contents

This Jupyter notebook contains the code for scraping artist information and their artworks from WikiArt. The main steps include:

  • Generating artist names from an existing dataset.
  • Opening Firefox using Selenium WebDriver.
  • Extracting artist information and artworks.
  • Saving the extracted data into CSV files.
  • Re-scraping items that were not found on the first pass.
  • Results: see artist_data_new.csv
  • Results: see artist_artwork.csv
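
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the URL pattern in `BASE_URL`, the slug rule in `artist_slug`, the output columns, and all function names are assumptions. The page fetcher is passed in as a callable so that the found / not-found bookkeeping can be run without a browser, and Selenium is imported only when a real Firefox session is requested.

```python
import csv
import re
import unicodedata

# Assumed URL pattern for artist pages; not taken from the notebook.
BASE_URL = "https://www.wikiart.org/en/"


def artist_slug(name: str) -> str:
    """Turn 'Vincent van Gogh' into 'vincent-van-gogh' (assumed slug rule)."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def scrape_artists(names, fetch_title):
    """Return (found_rows, not_found_names); fetch_title returns None on a miss."""
    found, not_found = [], []
    for name in names:
        url = BASE_URL + artist_slug(name)
        title = fetch_title(url)
        if title is None:
            not_found.append(name)  # kept for a later re-scraping pass
        else:
            found.append({"artist": name, "url": url, "page_title": title})
    return found, not_found


def save_rows(rows, path):
    """Write the scraped rows to a CSV file with a fixed header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["artist", "url", "page_title"])
        writer.writeheader()
        writer.writerows(rows)


def firefox_fetcher():
    """Build a fetch_title callable backed by Selenium and Firefox."""
    # Imported lazily so the helpers above run without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException

    driver = webdriver.Firefox()

    def fetch_title(url):
        try:
            driver.get(url)
            return driver.title or None
        except WebDriverException:
            return None

    return fetch_title, driver
```

A real scraper would need page-specific checks to recognise a missing artist, since an error page still has a title. The names returned in `not_found` can be passed to `scrape_artists` again for the re-scraping step, and `driver.quit()` should be called once scraping is finished.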

This Jupyter notebook focuses on scraping detailed information about artworks from WikiArt. The main steps include:

This Jupyter notebook is designed for scraping art images using the Requests library. The main steps include:

  • Loading artwork data from a CSV file.
  • Renaming columns for consistency.
  • Saving the modified data to a new CSV file.
  • Listing files in the target directory.
  • Downloading images using the img2dataset library.
  • Displaying an example of the scraped images.
  • Results: see Wikiart Images
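
As a rough sketch of the renaming and download steps above: the column names (`url`, `caption`), the image size, and both function names are assumptions rather than values from the notebook. The renaming helper uses the standard `csv` module to stay dependency-free, and `img2dataset` is imported only when a download is actually requested.

```python
import csv


def rename_columns(src_path, dst_path, mapping):
    """Copy a CSV, renaming header columns according to mapping."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        rows = list(reader)
    new_header = [mapping.get(col, col) for col in header]
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(new_header)
        writer.writerows(rows)
    return new_header


def download_images(csv_path, output_folder):
    """Download every image listed in csv_path into output_folder."""
    # Third-party dependency, imported lazily so rename_columns works without it.
    from img2dataset import download

    download(
        url_list=csv_path,
        input_format="csv",
        url_col="url",          # assumed column name after renaming
        caption_col="caption",  # assumed column name after renaming
        output_folder=output_folder,
        output_format="files",
        image_size=256,         # illustrative value
        processes_count=1,
        thread_count=16,
    )
```

For example, `rename_columns("artist_artwork.csv", "artwork_urls.csv", {"image": "url", "title": "caption"})` would prepare an input file for `download_images`; the output file name and the source column names in that call are hypothetical.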

This Jupyter notebook is dedicated to scraping artist information from Wikipedia. The main steps include:

  • Loading artist data from a CSV file.
  • Defining a function to scrape Wikipedia pages.
  • Iterating through the list of artist URLs.
  • Handling errors and saving the scraped data into HTML and TXT files.
  • Results: see artist_wikipedia_content
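
The loop described above might look something like the sketch below. It uses only the standard library (`urllib` and `html.parser`), which may differ from what the notebook uses; the function names, the User-Agent string, and the use of the artist name as the file name are all assumptions. Failures are recorded and skipped so that one bad URL does not stop the run.

```python
import urllib.request
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Reduce an HTML document to its text fragments, one per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_html(url):
    """Fetch a page over HTTP; the User-Agent value is a placeholder."""
    request = urllib.request.Request(url, headers={"User-Agent": "wikiart-metadata-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_pages(artist_urls, out_dir, fetch=fetch_html):
    """Save <name>.html and <name>.txt per artist; return the names that failed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for name, url in artist_urls.items():
        try:
            html = fetch(url)
        except Exception:
            failed.append(name)  # record the failure and keep going
            continue
        (out / f"{name}.html").write_text(html, encoding="utf-8")
        (out / f"{name}.txt").write_text(html_to_text(html), encoding="utf-8")
    return failed
```

Here `artist_urls` is a dictionary mapping a file-safe artist name to a Wikipedia URL. Names containing path separators would need sanitising before being used as file names.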

Dataset

The dataset generated from these scripts includes:

Requirements

  • Python 3.10.0
  • uv

After installing uv, run uv sync to install all of the dependencies needed for scraping. For the Jupyter notebooks, use the .venv environment generated by uv.

Notes

  • The scraping process may take a significant amount of time due to the large number of artists and artworks.
  • Ensure that the Geckodriver version is compatible with the installed Firefox version.
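
One way to see which versions are installed before checking them against the geckodriver release notes is sketched below. It assumes that both `firefox` and `geckodriver` are on the PATH and respond to `--version`; the helper names are illustrative. The sketch only reports versions and does not decide whether a given pair is compatible.

```python
import re
import subprocess


def parse_version(output):
    """Extract the first dotted version number as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    return tuple(int(part) for part in match.group(1).split(".")) if match else None


def tool_version(command):
    """Run '<command> --version' and parse the result; None if unavailable."""
    try:
        result = subprocess.run(
            [command, "--version"], capture_output=True, text=True, timeout=10
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    return parse_version(result.stdout)


if __name__ == "__main__":
    for tool in ("firefox", "geckodriver"):
        print(tool, tool_version(tool))
```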
