josephxtian/guardian-api-to-broker

GuAPIB

GuAPIB, aka Guardian-API-to-Broker, is an AWS-deployable Python script that retrieves articles from the Guardian API and relays them to a RabbitMQ-hosted message broker.

Though this tool will work on any operating system, the tools to help you install it are designed to run on Ubuntu Linux using bash. If you encounter errors using the Makefile, you will need to run the commands manually and fix errors as you find them.

Prerequisites

In order to run the application, the .env file needs to be complete.

Rename the .env.example file to .env and use the steps below to get the required credentials.

Guardian API

Create an account on the Guardian API website. Make a note of the API key provided after email verification. This will be used to call the API.

Input the API key to GUARDIAN_API_KEY in the .env file.

RabbitMQ

RabbitMQ is our message broker. You can self host a server but for simplicity we will make use of a free tier server hosted by CloudAMQP.

  1. Set up a 'Little Lemur' free tier account on CloudAMQP to host your queue.

  2. Follow the setup instructions in the CloudAMQP documentation.

  3. Make a note of the AMQPS connection URL provided to you in your dashboard. It will be in the form amqp://<user>:<password>@<server>/<vhost>. Input this into CLOUDAMQP_URL in the .env file.
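Before pasting the URL into .env, it can help to confirm it has the expected shape. The following is a minimal sketch, not part of GuAPIB itself: the function name check_amqp_url is hypothetical, and it assumes the URL uses either the amqp or amqps scheme.

```python
from urllib.parse import urlparse


def check_amqp_url(url: str) -> dict:
    """Split an AMQP(S) URL into its parts and fail early if one is missing.

    Hypothetical helper for illustration; GuAPIB may validate differently.
    """
    parts = urlparse(url)
    if parts.scheme not in ("amqp", "amqps"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not (parts.username and parts.password and parts.hostname):
        raise ValueError("URL must include user, password and server")
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "server": parts.hostname,
        # The vhost is the URL path without its leading slash.
        "vhost": parts.path.lstrip("/"),
    }
```

The password is deliberately left out of the returned dictionary so that the result can be printed or logged without exposing the credential.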

AWS - Amazon Web Services

GuAPIB runs using AWS. You will need an AWS account.

GuAPIB will use the following AWS services:

  • IAM - setting permissions for application execution
  • Lambda - running the package
  • Secrets Manager - holding environmental variables (your API key and server credentials)

Once you have made an AWS account, follow these steps to make a user to run terraform with:

  1. Log in to the eu-west-2 region console.
  2. Into the search bar, type IAM and select this option.
  3. Click on Users in the left-hand menu.
  4. Create User -> User name should be something descriptive, e.g. GuAPIB-terraform-user
  5. Select Attach policies directly and attach appropriate policies. If you are unsure, attach AdministratorAccess, which will allow Terraform access to everything. Note: Once you have set up the remainder of your data pipeline, you should reduce the policies you have granted to Terraform to protect your account from malicious behaviour.
  6. Click Create
  7. Select Security credentials tab
  8. Under Access keys click Create access key
  9. Select Third-party service as your Use case.
  10. Note your AWS Access key ID and AWS Secret access key. The secret key will not be displayed again.

Enter both of these keys into the .env file as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
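To confirm the .env file is complete before deploying, a small check along these lines may be useful. It is a sketch under assumptions: it assumes the file uses plain KEY=value lines, the helper names parse_env and missing_keys are hypothetical, and it only checks the four variable names mentioned in this README (your .env.example may list more).

```python
from pathlib import Path

# Only the four names mentioned in this README; .env.example may list more.
REQUIRED_KEYS = (
    "GUARDIAN_API_KEY",
    "CLOUDAMQP_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> list:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("Missing:", missing_keys(env_path.read_text()))
    else:
        print(".env not found in the current directory")
```

Run it from the project root. An empty list means every key named above has a value.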

Terraform

Terraform is an Infrastructure as Code (IaC) tool and is used to provision everything in AWS for you automatically.

Install it by entering the following into a bash terminal: make setup-machine. If this results in an error, please install it manually from the Terraform website.

Setup

As above, in order to run the application, the .env file needs to be complete. Rename the .env.example file to .env and fill in the required credentials.

The Makefile handles most of the application building.

make deploy - builds your AWS infrastructure with Terraform.

make plan - plans your AWS infrastructure with Terraform. Note: terraform init must be run at least once from within the terraform directory of the project before this command can be used.

make destroy - destroys all Terraform-created infrastructure in AWS.

Usage

To use the application, an event JSON should be sent via AWS Lambda. This is intended to be done as part of a data pipeline, but can also be triggered directly from the AWS console if needed.

  1. Go to your Lambda dashboard
  2. Select your function guardian-api-to-broker
  3. Go to the Test tab
  4. Enter your search arguments, e.g.
{
  "search_query": "data",
  "date_from": "2025-01-01",
  "queue_name": "guardian_content"
}
  5. Leave all other defaults as is. Click Test

Your code should successfully run!

  6. Head to your CloudAMQP dashboard to see your message being received. Select your queue name on the Queues and Streams tab, and click Get Messages. Note: messages are only stored for a maximum of three days.
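When the function is called from a data pipeline rather than the console, the same event can be sent programmatically. The sketch below assumes the boto3 package is installed and that AWS credentials and a region are already configured in your environment. The helper names are hypothetical; the function name guardian-api-to-broker is taken from the steps above.

```python
import json


def build_invoke_kwargs(event: dict,
                        function_name: str = "guardian-api-to-broker") -> dict:
    """Build the keyword arguments for a synchronous Lambda invocation."""
    return {
        "FunctionName": function_name,
        # RequestResponse waits for the function to finish and return.
        "InvocationType": "RequestResponse",
        "Payload": json.dumps(event).encode("utf-8"),
    }


def invoke(event: dict):
    """Send the event to Lambda and return the decoded response payload."""
    import boto3  # third-party; imported here so the helper above has no dependency

    client = boto3.client("lambda")
    response = client.invoke(**build_invoke_kwargs(event))
    return json.loads(response["Payload"].read())
```

For example, invoke({"search_query": "data", "date_from": "2025-01-01", "queue_name": "guardian_content"}) sends the same event as the console test above.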

Arguments

search_query - optional - word, phrase or collection of words. Supports AND, OR and NOT operators, and exact phrase queries using double quotes escaped with backslashes.

e.g. sausages, "pork sausages", sausages AND (mash OR chips), sausages AND NOT (saveloy OR battered)

date_from - optional - date in YYYY-MM-DD format (year first). Returns only content published on or after this date.

e.g. 2024-02-16

queue_name - required - queue reference for message broker.

e.g. guardian_content
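The argument rules above can be expressed as a small validation step. This is an illustrative sketch only: validate_event is a hypothetical name, and the application's own checks may differ.

```python
from datetime import datetime


def validate_event(event: dict) -> dict:
    """Check an event against the argument rules and return the cleaned values.

    Hypothetical helper: queue_name is required, while search_query and
    date_from are optional.
    """
    queue_name = event.get("queue_name")
    if not isinstance(queue_name, str) or not queue_name.strip():
        raise ValueError("queue_name is required")
    cleaned = {"queue_name": queue_name}

    search_query = event.get("search_query")
    if search_query is not None:
        cleaned["search_query"] = str(search_query)

    date_from = event.get("date_from")
    if date_from is not None:
        # strptime raises ValueError if the date is not in YYYY-MM-DD form.
        datetime.strptime(date_from, "%Y-%m-%d")
        cleaned["date_from"] = date_from

    return cleaned
```

Rejecting a malformed event before calling the API keeps the failure close to its cause, rather than surfacing later as an API or broker error.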

Example

{
  "search_query": "\"machine learning\"",
  "date_from": "2025-01-01",
  "queue_name": "guardian_content"
}

This will find Guardian articles containing the exact phrase 'machine learning' published on or after 1st January 2025, and will add them to a message broker queue named 'guardian_content'.

The application will return up to the 10 most recent articles. If fewer than 10 match, only those are returned; if none match, the return value will be an empty list.

[{
    "webPublicationDate": "2025-10-08T14:00:32Z",
    "webTitle": "'AI is here to stay and change things': Mad Max director George Miller on why he is taking part in an AI film festival",
    "webUrl": "https://www.theguardian.com/film/2025/oct/09/ai-film-making-omni-festival-mad-max-director-george-miller-interview"
},{
    "webPublicationDate": "2025-10-02T05:00:40Z",
    "webTitle": "UK fifth-worst country in Europe for loss of green space to development",
    "webUrl": "https://www.theguardian.com/environment/2025/oct/02/uk-fifth-worst-country-in-europe-for-loss-of-green-space-to-development"
}] 
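To illustrate how an event might map onto a Guardian Content API request and how the response might be reduced to the three fields shown above, here is a minimal sketch. The helper names are hypothetical, and the use of order-by=newest and page-size=10 is an assumption about how the "10 most recent" limit could be achieved, not a description of GuAPIB's internals.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://content.guardianapis.com/search"
FIELDS = ("webPublicationDate", "webTitle", "webUrl")


def build_search_url(api_key, search_query=None, date_from=None):
    """Build a search URL that asks for the 10 newest matching articles."""
    params = {"order-by": "newest", "page-size": 10, "api-key": api_key}
    if search_query:
        params["q"] = search_query
    if date_from:
        params["from-date"] = date_from
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def trim_results(api_response: dict) -> list:
    """Keep only the three output fields from each result, at most 10 items."""
    results = api_response.get("response", {}).get("results", [])
    return [{key: item.get(key) for key in FIELDS} for item in results[:10]]
```

A response with no results yields an empty list, which matches the behaviour described above.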

Common Debugging

  • If you have authentication or credential issues when using Terraform, check that your AWS credentials are available in your bash session.
  • Ensure you have all five variables filled in your .env file, and that it is saved in your project root.
  • This guide is designed for bash on Linux, and for the eu-west-1 AWS region.
  • AWS is sometimes reluctant to delete stored secrets. This behaviour is overridden in this application, but is something to be aware of. You may need to change the secret variable name in variables.tf and aws_secrets.py if you encounter this error.
