Apache Airflow · devops · ~10 mins

Creating a basic DAG file in Apache Airflow - Visual Walkthrough

Process Flow - Creating a basic DAG file
Import Airflow modules
Define default_args dict
Create DAG object with args
Define tasks as Operators
Set task dependencies
Save DAG file in airflow/dags folder
This flow shows the steps to create a DAG file: import modules, set default arguments, create the DAG, define tasks, set their order, and save the file.
Execution Sample
Apache Airflow
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG('my_first_dag', start_date=datetime(2024, 1, 1), schedule_interval='@daily') as dag:
    task1 = BashOperator(task_id='print_date', bash_command='date')
This code creates a DAG named 'my_first_dag' that runs daily and has one task that prints the date.
Process Table
| Step | Action | Evaluation | Result |
| --- | --- | --- | --- |
| 1 | Import Airflow modules | Modules imported | DAG, BashOperator available |
| 2 | Define DAG with name 'my_first_dag', start_date 2024-01-01, schedule '@daily' | DAG object created | DAG 'my_first_dag' ready |
| 3 | Define task1 as BashOperator with task_id 'print_date' and command 'date' | Task created | Task 'print_date' ready |
| 4 | Set task dependencies (none here) | No dependencies set | Single-task DAG |
| 5 | Save file in airflow/dags folder | File saved | DAG available to Airflow scheduler |
💡 All steps completed; DAG file is ready and recognized by Airflow
Status Tracker
| Variable | Start | After Step 2 | After Step 3 | Final |
| --- | --- | --- | --- | --- |
| dag | None | DAG object named 'my_first_dag' | Same DAG object | Same DAG object |
| task1 | None | None | BashOperator task 'print_date' | Same task object |
Key Moments - 3 Insights
Why do we need to import specific modules like DAG and BashOperator?
Because these modules provide the classes and functions needed to create and define the DAG and its tasks, as shown in step 1 of the process table.
What does the 'with DAG(...) as dag:' syntax do?
It creates the DAG object and ensures that tasks defined inside the block are automatically associated with this DAG, as seen in steps 2 and 3 of the process table.
Why is setting task dependencies important even if not shown here?
Dependencies define the order in which tasks run; without them, tasks run independently of one another. This example has only one task, so no dependencies are needed, as noted in step 4 of the process table.
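In real Airflow code, that ordering is expressed with the bit-shift syntax `task1 >> task2`, which works because operators implement Python's `__rshift__`. A stdlib-only sketch (the `Task` class here is a toy stand-in, not Airflow's actual API) shows how that operator can record downstream links:

```python
class Task:
    """Toy stand-in for an Airflow operator (illustrative only)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that must run after this one

    def __rshift__(self, other):
        # `a >> b` records b as downstream of a
        self.downstream.append(other)
        return other  # returning `other` enables chaining: a >> b >> c


t1 = Task("print_date")
t2 = Task("greet")
t3 = Task("cleanup")
t1 >> t2 >> t3  # chain: print_date -> greet -> cleanup

print([t.task_id for t in t1.downstream])  # prints ['greet']
print([t.task_id for t in t2.downstream])  # prints ['cleanup']
```

Chaining works because each `>>` returns its right-hand operand, so the next `>>` attaches to the task just added.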
Visual Quiz - 3 Questions
Test your understanding
Looking at the process table, what is the state of 'task1' after step 3?
A. Task 'print_date' not created yet
B. Task 'print_date' deleted
C. Task 'print_date' created as BashOperator
D. Task 'print_date' running
💡 Hint
Check the 'Result' column in the row for step 3 of the process table.
At which step is the DAG object created?
A. Step 1
B. Step 2
C. Step 3
D. Step 4
💡 Hint
Look at the 'Action' and 'Result' columns of the process table for step 2.
If you add a second task and set it to run after 'task1', which step would change?
A. Step 4 - Set task dependencies
B. Step 2 - Define DAG
C. Step 1 - Import modules
D. Step 5 - Save file
💡 Hint
Setting task order is done in step 4, according to the process table.
Concept Snapshot
Creating a basic Airflow DAG file:
1. Import DAG and Operators
2. Define DAG with name, start_date, schedule
3. Define tasks inside DAG context
4. Set task dependencies
5. Save file in airflow/dags folder
Airflow scheduler detects and runs the DAG
Full Transcript
To create a basic Airflow DAG file, first import the necessary modules like DAG and BashOperator. Then define the DAG object with a name, start date, and schedule interval. Inside the DAG context, define tasks such as BashOperator tasks. Optionally set dependencies between tasks to control execution order. Finally, save the file in the airflow/dags folder so Airflow can find and run it. This process ensures your workflow is automated and scheduled.
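The steps in the transcript can be condensed into a single complete DAG file. This is a hedged sketch assuming Airflow 2.x (where BashOperator lives in `airflow.operators.bash`); the second task and its `echo` command are illustrative additions to show the dependency step:

```python
# my_first_dag.py -- save in the airflow/dags folder (assumes Airflow 2.x)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    'my_first_dag',
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',
    catchup=False,  # skip backfilling runs between start_date and now
) as dag:
    # Step 3: define tasks inside the DAG context
    print_date = BashOperator(task_id='print_date', bash_command='date')
    greet = BashOperator(task_id='greet', bash_command='echo hello')  # illustrative second task

    # Step 4: bit-shift syntax -- 'greet' runs only after 'print_date' succeeds
    print_date >> greet
```

Once saved under airflow/dags, the scheduler parses the file and the DAG appears in the Airflow UI.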