# Small EC2 Instance Airflow Tutorial

## EC2 Configuration
- **Instance Type**: T2.small
- **Key Pair**: Create a new key pair
- **Image**: Ubuntu

## Connect to EC2 Instance
After launching your EC2 instance, connect to it from the AWS console using EC2 Instance Connect (or via SSH with your key pair).


## Commands to Run

1. Update the package index:

        sudo apt update

2. Install Python 3 and pip:

        sudo apt install python3-pip

3. Install SQLite3:

        sudo apt install sqlite3

4. Install the Python 3.10 virtual environment package:

        sudo apt install python3.10-venv

5. Create a Python virtual environment:

        python3 -m venv venv

6. Activate the virtual environment:

        source venv/bin/activate

7. (Optional) Install the PostgreSQL development libraries (needed if `psycopg2` has to build from source):

        sudo apt-get install libpq-dev

8. Install Apache Airflow with PostgreSQL support. Note that the constraints file must match your Python version; Ubuntu 22.04 ships Python 3.10:

        pip install "apache-airflow[postgres]==2.5.0" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.0/constraints-3.10.txt"

9. Initialize the Airflow database (this first run uses SQLite and creates the `~/airflow` directory with `airflow.cfg`):

        airflow db init

10. Install PostgreSQL and its contrib package:

        sudo apt-get install postgresql postgresql-contrib

11. Switch to the PostgreSQL user:

        sudo -i -u postgres

12. Access the PostgreSQL shell:

        psql

13. Create the Airflow database and user:

        CREATE DATABASE airflow;
        CREATE USER airflow WITH PASSWORD 'airflow';
        GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;

14. Exit the psql shell (`\q` or Ctrl + D), then exit the postgres user session (Ctrl + D again).

15. Navigate to the Airflow directory:

        cd airflow

16. Replace the connection string in `airflow.cfg` so Airflow uses PostgreSQL instead of SQLite:

        sed -i 's#sqlite:////home/ubuntu/airflow/airflow.db#postgresql+psycopg2://airflow:airflow@localhost/airflow#g' airflow.cfg

17. Verify the SQLAlchemy connection string:

        grep sql_alchemy airflow.cfg

18. Check the executor configuration:

        grep executor airflow.cfg

19. Replace SequentialExecutor with LocalExecutor (SequentialExecutor only works with SQLite and runs one task at a time):

        sed -i 's#SequentialExecutor#LocalExecutor#g' airflow.cfg

20. Re-initialize the Airflow database, this time against PostgreSQL:

        airflow db init

21. Create an Airflow admin user:

        airflow users create -u airflow -f airflow -l airflow -r Admin -e airflow@gmail.com

    Enter `airflow` when prompted for the password (twice).

22. Update the security group inbound rules for your EC2 instance:

    - Type: Custom TCP
    - Port range: 8080
    - Source: Anywhere-IPv4 (0.0.0.0/0)
    - Save the rules.

23. Start the Airflow webserver in the background:

        airflow webserver &

24. Start the Airflow scheduler:

        airflow scheduler

25. Copy the public IPv4 DNS of your EC2 instance and open it in your browser on port 8080:

        http://your-ec2-public-ip:8080
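
The `sed` one-liners above edit `airflow.cfg` in place. If you want to preview what the substitution does before touching the real file, you can dry-run it on a scratch copy — a minimal sketch using a fabricated config line and a throwaway path:

    # Dry-run the connection-string rewrite on a scratch file (paths are illustrative).
    printf 'sql_alchemy_conn = sqlite:////home/ubuntu/airflow/airflow.db\n' > /tmp/airflow.cfg.test

    # Same substitution as the real step, applied to the scratch file.
    sed -i 's#sqlite:////home/ubuntu/airflow/airflow.db#postgresql+psycopg2://airflow:airflow@localhost/airflow#g' /tmp/airflow.cfg.test

    # Should now show the postgresql+psycopg2 URL instead of sqlite.
    grep sql_alchemy /tmp/airflow.cfg.test

The `#` delimiter is used instead of the usual `/` because the connection strings themselves are full of slashes.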
    

Use the DAG below as a test. Make sure to create the DAG folder first (`mkdir ~/airflow/dags`) and save this script there as a `.py` file.

    from airflow import DAG
    # EmptyOperator is the Airflow 2.x replacement for the deprecated
    # airflow.operators.dummy_operator.DummyOperator import path.
    from airflow.operators.empty import EmptyOperator
    from datetime import datetime

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2023, 1, 1),
        'retries': 1,
    }

    # A minimal two-task DAG: start -> end, with no real work in between.
    dag = DAG(
        'my_new_dag',
        default_args=default_args,
        schedule_interval='@daily',
    )

    start = EmptyOperator(task_id='start', dag=dag)
    end = EmptyOperator(task_id='end', dag=dag)

    start >> end
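
The deployment step can be sketched as follows, assuming the default `dags_folder` (`~/airflow/dags`) and an arbitrary filename `my_new_dag.py`:

    # Create Airflow's default DAG folder if it does not exist yet.
    mkdir -p ~/airflow/dags

    # Save the script there; the DAG id ('my_new_dag') comes from the
    # DAG() call, not from the filename.
    cat > ~/airflow/dags/my_new_dag.py <<'EOF'
    # ... paste the DAG code from above ...
    EOF

    # The scheduler parses this folder periodically; once picked up, the
    # DAG appears in the web UI and in: airflow dags list

If the DAG does not appear after a minute or so, check that the scheduler is still running and that the file really landed under `~/airflow/dags`.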
