Apache Airflow | devops | ~30 mins

DAG concept (Directed Acyclic Graph) in Apache Airflow - Mini Project: Build & Apply

Basic Airflow DAG Creation
📖 Scenario: You are a data engineer setting up a simple workflow to run daily tasks using Apache Airflow. You want to create a Directed Acyclic Graph (DAG) that defines the order of tasks.
🎯 Goal: Build a basic Airflow DAG with two tasks where the second task runs after the first. This will help you understand how to define workflows in Airflow using DAGs.
📋 What You'll Learn
Create a DAG object with a specific dag_id and schedule
Define two PythonOperator tasks with simple Python functions
Set task dependencies so the second task runs after the first
Print the DAG structure by listing task IDs in order
💡 Why This Matters
🌍 Real World
Airflow DAGs are used to automate and schedule workflows like data pipelines, report generation, and system maintenance tasks.
💼 Career
Understanding DAGs is essential for data engineers, DevOps engineers, and workflow automation specialists who manage task scheduling and orchestration.
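Step 1: Create the DAG object
Import DAG from airflow and datetime from the datetime module, then create a DAG object called simple_dag with dag_id='simple_dag' and schedule_interval='@daily'. Use start_date=datetime(2024, 1, 1) and catchup=False. Note that schedule_interval is the Airflow 2.x parameter name; releases from 2.4 onward prefer schedule, and Airflow 3 no longer accepts schedule_interval.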
Hint: Use the DAG class from airflow and pass the parameters exactly as shown.

Step 2: Define two PythonOperator tasks
Import PythonOperator from airflow.operators.python. Define two Python functions called task_one_func and task_two_func that each print a simple message. Then create two tasks called task_one and task_two using PythonOperator, with task_id='task_one' and task_id='task_two' respectively, and set python_callable to the matching function. Attach both tasks to the DAG by passing dag=simple_dag.
Hint: Define simple functions that print messages and use them as python_callable in PythonOperator.
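
Because python_callable accepts an ordinary Python function, the two callables can be written and checked on their own before they are wired into operators. A minimal sketch follows; the message text is an arbitrary choice, and the direct calls at the end are only a local sanity check, not part of the DAG file:

```python
def task_one_func():
    """Callable intended for the task_one PythonOperator."""
    print("Running task one")


def task_two_func():
    """Callable intended for the task_two PythonOperator."""
    print("Running task two")


# Local sanity check: calling the functions directly runs the same code
# that Airflow would run when it executes each task.
task_one_func()
task_two_func()
```

In the DAG file, each function would then be passed to an operator, for example PythonOperator(task_id='task_one', python_callable=task_one_func, dag=simple_dag).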

Step 3: Set task dependencies
Set the dependency so that task_two runs after task_one by writing task_one >> task_two.
Hint: Use the >> operator to set the order of tasks.
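
To see why task_one >> task_two reads as "task_one, then task_two", it can help to look at how Python lets a class overload the >> operator. The sketch below uses a hypothetical stand-in class, MiniTask, which is not Airflow's implementation; it only mimics the idea of recording a downstream task when >> is used.

```python
class MiniTask:
    """Hypothetical stand-in for an Airflow task (illustration only)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream_ids = set()  # IDs of tasks that must run after this one

    def __rshift__(self, other):
        # `self >> other` records `other` as downstream of `self`.
        self.downstream_ids.add(other.task_id)
        # Returning `other` allows chaining: a >> b >> c.
        return other


task_one = MiniTask("task_one")
task_two = MiniTask("task_two")

task_one >> task_two

print(task_one.downstream_ids)  # {'task_two'}
```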

Step 4: Print the DAG task order
Write a print statement that outputs the list of task IDs in the DAG in the order they will run. Use print([task.task_id for task in simple_dag.topological_sort()]). With the dependency from the previous step in place, task_one is listed before task_two.
Hint: Use simple_dag.topological_sort() to get tasks in order and print their task_id.
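
A topological sort orders the tasks so that every task appears after the tasks it depends on. As an Airflow-free illustration of the same idea, the sketch below uses graphlib.TopologicalSorter from the Python standard library (Python 3.9+). The dependencies dictionary is a hand-written stand-in for the graph that Airflow builds from task_one >> task_two.

```python
from graphlib import TopologicalSorter

# Map each task ID to the set of task IDs it depends on (its upstream tasks).
dependencies = {
    "task_one": set(),
    "task_two": {"task_one"},
}

# static_order() yields each node only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['task_one', 'task_two']
```

If the graph contained a cycle (for example, task_one also depending on task_two), no valid order would exist, which is why a workflow graph has to be acyclic.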