GuAPIB (Guardian-API-to-Broker) is an AWS-deployable Python script that retrieves articles from the Guardian API and relays them to a RabbitMQ-hosted message broker.
Although this tool will work on any operating system, the helpers for installing it are designed to run on Ubuntu Linux using bash. If you encounter errors using the Makefile, you will need to run the commands manually and fix errors as you find them.
In order to run the application, the .env file needs to be complete.
Rename the .env.example file to .env and use the steps below to get the required credentials.
Create an account on the Guardian API website. Make a note of the API key provided after email verification. This will be used to call the API.
Enter the API key as GUARDIAN_API_KEY in the .env file.
RabbitMQ is our message broker. You can self-host a server, but for simplicity we will use a free tier server hosted by CloudAMQP.
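The application reads its settings from the .env file. As a rough illustration only (a stdlib sketch, not necessarily how GuAPIB loads the file), the helper below parses simple KEY=value lines and reports which of the variable names mentioned in this guide are still empty. `parse_env`, `missing_keys` and `REQUIRED_KEYS` are hypothetical names.

```python
from pathlib import Path

# Variable names taken from this guide. The guide mentions five variables
# but only names four, so this list is an assumption.
REQUIRED_KEYS = [
    "GUARDIAN_API_KEY",
    "CLOUDAMQP_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so URLs containing "=" survive.
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env(env_path.read_text()) if env_path.exists() else {}
    print("Missing:", missing_keys(values) or "none")
```

Running it from the project root prints any variables that still need filling in.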
- Set up a 'Little Lemur' free tier account on CloudAMQP to host your queue.
- Follow the setup instructions in the CloudAMQP documentation.
- Make a note of the AMQPS connection URL provided in your dashboard. It will be in the form
amqp://<user>:<password>@<server>/<vhost>
Enter this into CLOUDAMQP_URL in the .env file.
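Once CLOUDAMQP_URL is set, publishing to the broker could look something like the sketch below. It assumes the `pika` client library (not confirmed as GuAPIB's choice) and uses RabbitMQ's default exchange, where the routing key is simply the queue name. `open_channel` and `publish_articles` are hypothetical helpers; the channel is passed in so the publishing logic can be tried without a live broker, and sending the whole article list as one JSON message is also an assumption.

```python
import json


def open_channel(url: str):
    """Open a blocking pika connection and channel from an AMQP(S) URL."""
    import pika  # imported lazily so the rest of this module works without pika

    connection = pika.BlockingConnection(pika.URLParameters(url))
    return connection, connection.channel()


def publish_articles(channel, queue_name: str, articles: list[dict]) -> None:
    """Declare the queue (idempotent) and publish the articles as one JSON message."""
    channel.queue_declare(queue=queue_name, durable=True)
    channel.basic_publish(
        exchange="",  # default exchange: routes directly to the named queue
        routing_key=queue_name,
        body=json.dumps(articles),
    )
```

To use it for real: `connection, channel = open_channel(os.environ["CLOUDAMQP_URL"])`, then `publish_articles(channel, "guardian_content", articles)`, then `connection.close()`. Declaring the queue as durable is an assumption; RabbitMQ rejects a redeclaration whose settings differ from an existing queue, so match whatever the application already uses.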
GuAPIB runs on AWS, so you will need an AWS account.
GuAPIB will use the following AWS services:
- IAM - setting permissions for application execution
- Lambda - running the package
- Secrets Manager - holding environment variables (your API key and server credentials)
Once you have created an AWS account, follow these steps to create an IAM user for Terraform to run as:
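Because the credentials live in Secrets Manager, the Lambda function has to read them back at run time. A minimal sketch using boto3's `get_secret_value` call follows. It assumes the secret is stored as a JSON string of key/value pairs; `load_secret` and the secret name in the usage comment are hypothetical (the real name is defined in the project's Terraform and Python code).

```python
import json


def load_secret(client, secret_id: str) -> dict:
    """Fetch a Secrets Manager secret and decode its JSON SecretString."""
    response = client.get_secret_value(SecretId=secret_id)
    # Assumes the secret was stored as a JSON object, e.g. {"GUARDIAN_API_KEY": "..."}
    return json.loads(response["SecretString"])


# Usage inside the Lambda (requires boto3; the region comes from the Lambda environment):
#   import boto3
#   client = boto3.client("secretsmanager")
#   secrets = load_secret(client, "guardian-api-to-broker-secrets")  # hypothetical name
```

Passing the client in keeps the function easy to exercise with a stub and lets the caller choose the region or profile.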
- Log in to the AWS console in the eu-west-2 region.
- In the search bar, type IAM and select this option.
- Click on Users in the left-hand menu.
- Click Create user. The user name should be something descriptive, e.g. GuAPIB-terraform-user.
- Select Attach policies directly and attach appropriate policies. If you are unsure, attach AdministratorAccess, which will allow Terraform access to everything. Note: once you have set up the remainder of your data pipeline, you should reduce the policies granted to Terraform to protect your account from malicious behaviour.
- Click Create.
- Open the new user and select the Security credentials tab.
- Under Access keys, click Create access key.
- Select Third-party service as your use case.
- Note your AWS Access key ID and AWS Secret access key. The secret key will not be displayed again.
Enter both of these keys into the .env file as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
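To confirm the new access keys work before running Terraform, one option is to ask AWS STS which identity they belong to. This sketch assumes boto3 is installed and that the values from .env have been exported into your shell, since boto3 reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment. `caller_arn` is a hypothetical helper.

```python
def caller_arn(sts_client) -> str:
    """Return the ARN of the IAM identity the current credentials belong to."""
    identity = sts_client.get_caller_identity()
    return identity["Arn"]


# Usage (requires boto3; credentials are taken from the environment):
#   import boto3
#   print(caller_arn(boto3.client("sts")))
```

If the keys are valid, the ARN should end in the user name you chose, e.g. `user/GuAPIB-terraform-user`; an authentication error points to missing or mistyped keys.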
Terraform is an Infrastructure as Code (IaC) tool and is used to provision everything in AWS for you automatically.
Install it by entering the following into a bash terminal:
make setup-machine
If this results in an error, install Terraform manually by following the instructions on the Terraform website.
As above, in order to run the application, the .env file needs to be complete. Rename the .env.example file to .env and fill in the required credentials.
The Makefile handles most of the application building.
- make deploy - builds your AWS infrastructure with Terraform.
- make plan - plans your AWS infrastructure with Terraform. Note: terraform init must be run at least once from within the project's terraform directory before this command can be used.
- make destroy - destroys all Terraform-created infrastructure in AWS.
To use the application, send an event JSON to the AWS Lambda function. This is intended to be done as part of a data pipeline, but the function can also be triggered directly from the AWS console if needed.
- Go to your Lambda dashboard.
- Select your function guardian-api-to-broker.
- Go to the Test tab.
- Enter your search arguments, e.g.
{
  "search_query": "data",
  "date_from": "2025-01-01",
  "queue_name": "guardian_content"
}
- Leave all other defaults as they are, and click Test.
Your code should now run successfully!
- Head to your CloudAMQP dashboard to see your message being received.
- Select your queue name on the Queues and Streams tab, and click Get Messages. Note that messages are only stored for a maximum of three days.
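The console test can also be reproduced from code, for example as one step of a data pipeline. The sketch below uses boto3's Lambda `invoke` call with the same example event. It assumes boto3 is available and that your credentials allow `lambda:InvokeFunction`; `invoke_guapib` is a hypothetical helper, and the default function name is taken from the steps above.

```python
import json


def invoke_guapib(lambda_client, event: dict,
                  function_name: str = "guardian-api-to-broker"):
    """Synchronously invoke the function and return its decoded JSON result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the handler to finish
        Payload=json.dumps(event).encode("utf-8"),
    )
    # Payload is a streaming body; read it fully before decoding.
    payload = json.loads(response["Payload"].read())
    if "FunctionError" in response:
        # Lambda reports handler errors in the response rather than raising.
        raise RuntimeError(f"Lambda returned an error: {payload}")
    return payload


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   result = invoke_guapib(boto3.client("lambda"), {
#       "search_query": "data",
#       "date_from": "2025-01-01",
#       "queue_name": "guardian_content",
#   })
```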
search_query - optional - word, phrase or collection of words. Supports AND, OR and NOT operators, and exact phrase queries using double quotes escaped with backslashes.
e.g. sausages, "pork sausages", sausages AND (mash OR chips), sausages AND NOT (saveloy OR battered)
date_from - optional - date in YYYY-MM-DD format. Returns only content published on or after this date.
e.g. 2024-02-16
queue_name - required - name of the message broker queue the articles are published to.
e.g. guardian_content
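A quick way to check an event before sending it is sketched below. It encodes only the rules listed above (queue_name required; search_query and date_from optional; date_from as YYYY-MM-DD). `validate_event` is a hypothetical helper, not necessarily how the Lambda handler validates its input.

```python
import re
from datetime import date

# Strict YYYY-MM-DD shape; date.fromisoformat alone accepts other forms on newer Pythons.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_event(event: dict) -> dict:
    """Return a cleaned copy of the event, or raise ValueError if it breaks the rules."""
    queue_name = event.get("queue_name")
    if not isinstance(queue_name, str) or not queue_name.strip():
        raise ValueError("queue_name is required")
    cleaned = {"queue_name": queue_name}

    search_query = event.get("search_query")
    if search_query:
        cleaned["search_query"] = str(search_query)

    date_from = event.get("date_from")
    if date_from:
        if not isinstance(date_from, str) or not DATE_PATTERN.fullmatch(date_from):
            raise ValueError("date_from must be in YYYY-MM-DD format")
        # Raises ValueError for impossible dates such as 2025-02-30.
        date.fromisoformat(date_from)
        cleaned["date_from"] = date_from

    return cleaned
```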
Example event:
{
  "search_query": "\"machine learning\"",
  "date_from": "2025-01-01",
  "queue_name": "guardian_content"
}
This will find Guardian articles containing the exact phrase 'machine learning' published on or after 1st January 2025, and will add them to a message broker queue named 'guardian_content'.
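An event like this maps naturally onto a request to the Guardian Content API search endpoint. The sketch below assumes the public endpoint https://content.guardianapis.com/search and its `q`, `from-date`, `order-by`, `page-size` and `api-key` query parameters; `build_search_url` and `extract_articles` are hypothetical helpers, and GuAPIB's own request code may differ. Ordering by newest with a page size of 10 is chosen to mirror the "up to 10 most recent articles" behaviour of the application.

```python
from urllib.parse import urlencode

GUARDIAN_SEARCH_URL = "https://content.guardianapis.com/search"


def build_search_url(event: dict, api_key: str, page_size: int = 10) -> str:
    """Map event fields onto Guardian Content API search parameters."""
    params = {"api-key": api_key, "order-by": "newest", "page-size": page_size}
    if event.get("search_query"):
        params["q"] = event["search_query"]  # quotes and spaces are URL-encoded
    if event.get("date_from"):
        params["from-date"] = event["date_from"]
    return f"{GUARDIAN_SEARCH_URL}?{urlencode(params)}"


def extract_articles(api_response: dict) -> list[dict]:
    """Keep only the three fields shown in the example output; [] if nothing matched."""
    results = api_response.get("response", {}).get("results", [])
    return [
        {
            "webPublicationDate": item["webPublicationDate"],
            "webTitle": item["webTitle"],
            "webUrl": item["webUrl"],
        }
        for item in results
    ]
```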
The application returns up to the 10 most recent matching articles. If fewer match, only those are returned; if none match, the return value is an empty list. An example return value:
[{
"webPublicationDate": "2025-10-08T14:00:32Z",
"webTitle": "'AI is here to stay and change things': Mad Max director George Miller on why he is taking part in an AI film festival",
"webUrl": "https://www.theguardian.com/film/2025/oct/09/ai-film-making-omni-festival-mad-max-director-george-miller-interview"
},{
"webPublicationDate": "2025-10-02T05:00:40Z",
"webTitle": "UK fifth-worst country in Europe for loss of green space to development",
"webUrl": "https://www.theguardian.com/environment/2025/oct/02/uk-fifth-worst-country-in-europe-for-loss-of-green-space-to-development"
}]
- If you are having authentication/credential issues when using Terraform, check that your AWS credentials are being loaded into your bash environment.
- Ensure you have filled in all five variables in your .env file, and that it is saved in your project root.
- This guide is designed for bash on Linux, and for the eu-west-1 AWS region.
- AWS is sometimes reluctant to delete stored secrets. This behaviour is overridden in this application, but is something to be aware of. If Terraform reports that a secret with the same name already exists or is scheduled for deletion, you may need to change the secret variable name in variables.tf and aws_secrets.py.
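Secrets Manager normally schedules a deleted secret for removal after a recovery window rather than deleting it immediately, and a new secret with the same name cannot be created in the meantime. As an alternative to renaming, a secret can be removed at once with the DeleteSecret API's `ForceDeleteWithoutRecovery` option, sketched here with boto3. `force_delete_secret` is a hypothetical helper; the deletion is irreversible, so check the secret name carefully.

```python
def force_delete_secret(client, secret_id: str) -> str:
    """Delete a secret immediately, skipping the recovery window. Irreversible."""
    response = client.delete_secret(
        SecretId=secret_id,
        ForceDeleteWithoutRecovery=True,
    )
    return response["Name"]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   force_delete_secret(boto3.client("secretsmanager"), "<your secret name>")
```

For reference, Terraform's aws_secretsmanager_secret resource exposes a comparable recovery_window_in_days setting, where 0 forces deletion without a recovery window.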