This is a simple tool to retrieve and store data from the European Union's public consultation platform "Have your say". Data is stored in an SQLite database. The tool can collect initiative and feedback data, download attached files and create CSV datasets of the collected data.
The "Have Your Say" API provides data on initiatives with publications nested within them. Publications can have feedback and attached files. Feedback can also have attached files. The data structure is as follows:
- Initiatives contain
  - Publications, which contain
    - Publication attachments
    - Feedback, which contains
      - Feedback attachments
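Schematically, a single initiative nests like the sketch below. Note that the key names here are illustrative placeholders, not the API's actual field names:

```python
# Illustrative sketch of the nested "Have Your Say" structure;
# key names are placeholders, not the API's actual fields.
initiative = {
    "id": 12970,
    "publications": [
        {
            "id": 1,
            "attachments": [{"filename": "proposal.pdf"}],
            "feedback": [
                {
                    "id": 101,
                    "attachments": [{"filename": "position_paper.pdf"}],
                }
            ],
        }
    ],
}

# Walking the hierarchy top-down mirrors the structure above.
for pub in initiative["publications"]:
    for fb in pub["feedback"]:
        for att in fb["attachments"]:
            print(att["filename"])
```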
The tool re-creates this structure in an SQLite database but stores only the necessary raw JSON responses and extracts the relevant information on the fly when needed.
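As a rough sketch of this design (the table and column names below are assumptions, not the tool's actual schema), raw responses can be stored as JSON text and parsed on demand:

```python
import json
import sqlite3

# Hypothetical schema: one table keeping the raw JSON response per initiative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE initiatives (id INTEGER PRIMARY KEY, raw TEXT)")

# Store the unmodified API response...
response = {"id": 12970, "shortTitle": "Example initiative"}
con.execute(
    "INSERT INTO initiatives VALUES (?, ?)",
    (response["id"], json.dumps(response)),
)

# ...and extract the relevant fields on the fly when needed.
(raw,) = con.execute("SELECT raw FROM initiatives WHERE id = ?", (12970,)).fetchone()
print(json.loads(raw)["shortTitle"])
```

Keeping the raw responses means no information is lost if the extraction logic changes later.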
- Clone the repository and navigate to the project folder:

  ```shell
  git clone https://github.com/ghxm/haveyoursay.git
  cd haveyoursay
  ```

- Install the dependencies:

  ```shell
  pip install -r requirements.txt
  ```

The tool provides a CLI interface with three modes of operation: `collect`, `download`, and `dataset`.
```shell
python haveyoursay.py [common options] <mode> [mode options]
```

The common options are:

- `--db` or `-d`: Path to the SQLite database file. Default is `haveyoursay.db`.
- `--verbose` or `-v`: Enable verbose output.
See this help message for more information:

```shell
python haveyoursay.py --help
```

Replace `<mode>` with one of the following options:
- `collect`: Collects data from the European Commission "Have Your Say" website. This should be run first. Use `--update` to only request data not already in the database and `--wait` to specify the number of seconds to wait between requests.
- `download`: Downloads publication and feedback attachments from the collected data.
  - Use `--directory` to specify the output directory for the attachments.
  - Use `--only` to specify the type(s) of documents to download (default is both publication and feedback attachments).
  - Attachments can be further filtered by `--publication-type` and `--language` to reduce the number of files to download.
- `dataset`: Creates `meta` and `text` datasets from the collected data and outputs them as CSV files.
  - An optional `<dataset_type>` argument can be specified (`meta` or `text`, default is `meta`), where `meta` produces datasets from the raw metadata retrieved via `collect` beforehand and `text` extracts text from the attachments downloaded via `download`.
  - Use `--directory` to specify the output directory for the dataset, `--attachments` to include attachment datasets, `--only` to specify the type(s) of documents to create datasets for, and `--merge` to merge all datasets into a single dataset (only valid for `meta` datasets).
  - For text datasets, `--input-directory` can be used to specify a custom directory for the text files.
See this help message for more information:

```shell
python haveyoursay.py <mode> --help
```
Note that the publications dataset contains ~35 duplicate publications (as of spring 2024). These are not removed from the dataset to preserve the original data as closely as possible. All publications can be uniquely identified by the `id` field in combination with the `initiative_id` field.
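If the duplicates get in the way of an analysis, rows can be dropped on that composite key. A minimal stdlib sketch (the sample rows below stand in for the publications CSV and are not real data):

```python
# Deduplicate publications on the composite key (initiative_id, id).
rows = [
    {"initiative_id": 1, "id": 10, "title": "Draft act"},
    {"initiative_id": 1, "id": 10, "title": "Draft act"},    # duplicate
    {"initiative_id": 2, "id": 10, "title": "Another act"},  # same id, different initiative
]

seen = set()
unique = []
for row in rows:
    key = (row["initiative_id"], row["id"])
    if key not in seen:  # keep only the first occurrence of each key
        seen.add(key)
        unique.append(row)

print(len(unique))  # 2
```

The same one-liner logic is available in pandas as `drop_duplicates(subset=["initiative_id", "id"])`.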
The tool will automatically create the necessary tables in the database if they do not exist and document all runs in a logfile.
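The idempotent table setup follows the standard SQLite pattern; sketched here with a hypothetical `runs` table, not the tool's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# CREATE TABLE IF NOT EXISTS makes repeated runs safe: the statement is a
# no-op when the table already exists, so it can be executed at every start.
for _ in range(2):  # running it twice raises no error
    con.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "  id INTEGER PRIMARY KEY,"
        "  started_at TEXT,"
        "  mode TEXT"
        ")"
    )

con.execute("INSERT INTO runs (started_at, mode) VALUES (datetime('now'), 'collect')")
print(con.execute("SELECT COUNT(*) FROM runs").fetchone()[0])  # 1
```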
The project is structured as follows:
- `haveyoursay.py` - the main script that contains the CLI interface
- `src/` - folder containing the modules with the main functionality
  - `collect.py` - the data collection module
  - `download.py` - the attachment download module
  - `dataset.py` - the dataset creation module
  - `utils.py` - utility functions