How to Install Apache Airflow on Windows 10/11 without Docker

As data pipelines become increasingly complex and business demands for real-time insights grow, data engineers are turning to workflow management systems to orchestrate their ETL jobs efficiently and reliably. Apache Airflow has quickly become one of the most popular platforms for programmatically authoring, scheduling, and monitoring data workflows.

According to the 2021 Airflow Community Survey, Airflow's user base has grown by over 50% year-over-year, with adoption spreading beyond the technology industry into sectors like financial services, healthcare, and government. As a full-stack developer who has used Airflow in production for several years, I can attest to its power and flexibility for managing complex data pipelines at scale.

While Docker has become the de facto standard for deploying Airflow in production due to its isolation and portability benefits, there are still cases where installing Airflow directly on a Windows machine is preferred, such as:

  • Local development and testing, where the overhead of Docker may be unnecessary
  • Resource-constrained environments that can't support containerization
  • Organizations where Docker usage is restricted due to security or compliance policies

In this guide, we'll walk through how to install Apache Airflow on Windows step-by-step without using Docker. We'll cover the prerequisites, the installation process, configuring the Airflow environment, creating an example DAG, and some best practices and troubleshooting tips based on my experience as a professional Airflow developer. Let's dive in!

Prerequisites

Before we get started, ensure you have the following:

  • Windows 10 build 19041 or higher, or Windows 11
  • Windows Subsystem for Linux (WSL2) enabled
  • Python 3.7, 3.8, or 3.9 installed on WSL2
  • A code editor such as Visual Studio Code or PyCharm
  • Familiarity with using the command line

We'll be running Airflow inside WSL2 because Airflow is not natively supported on Windows. WSL2 provides a full Linux environment directly on Windows, which gives Airflow the Unix tooling and dependencies it expects.
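
If WSL2 is not set up yet, the quickest path on a recent Windows build is usually a single command from an elevated PowerShell prompt. This is a sketch only; the exact steps can vary by Windows version, and a reboot is required afterwards:

wsl --install -d Ubuntu   # installs WSL2 and the Ubuntu distribution
wsl -l -v                 # after rebooting, confirm Ubuntu is running under WSL version 2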

Step 1: Set Up a Python Virtual Environment

Using a virtual environment is a Python best practice that allows you to isolate project dependencies and avoid conflicts with other packages on your system. Open the Ubuntu terminal in WSL2 and run the following commands to create and activate a new virtual environment for Airflow:

sudo apt update
sudo apt install python3-virtualenv
virtualenv airflow-venv
source airflow-venv/bin/activate

Your command prompt should now be prefixed with (airflow-venv), indicating the virtual environment is active.

Step 2: Install Airflow and its Dependencies

With the virtual environment activated, we can install Airflow and its core dependencies using pip. To ensure compatibility, we'll specify a constraint file that pins the versions of Airflow's dependencies:

pip install "apache-airflow==2.3.2" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.3.2/constraints-3.7.txt"

Adjust the Airflow version, and make sure the Python version in the constraint file URL matches the interpreter inside your virtual environment (the example above assumes Python 3.7; the snippet below derives the URL automatically). As of writing, 2.3.2 is the latest stable release.
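
Because the constraint file is keyed to both the Airflow version and your Python version, a small shell snippet (a sketch following the pattern in the official installation docs) can build the correct URL automatically:

AIRFLOW_VERSION=2.3.2
PYTHON_VERSION="$(python3 --version | cut -d " " -f 2 | cut -d "." -f 1-2)"   # e.g. 3.7
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"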

Verify the installation succeeded by running:

airflow version

You should see output similar to:

2.3.2

Step 3: Initialize the Airflow Database

Airflow uses a SQL database to store metadata about DAGs, tasks, variables, and more. By default, it uses SQLite, which is sufficient for local development. For production, consider using a more robust database like PostgreSQL or MySQL.

To initialize the database, set the AIRFLOW_HOME environment variable to a directory where Airflow will create its configuration and store the database file. Then, run the db init command:

export AIRFLOW_HOME=~/airflow
airflow db init

This will create the necessary configuration files and initialize the Airflow database in the specified AIRFLOW_HOME directory.
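
If you decide to use PostgreSQL instead, point Airflow at it before running db init. The snippet below is a sketch only: it assumes a local PostgreSQL server with a hypothetical database airflow_db, user airflow, and password airflow_pass, and it relies on the postgres extra for the database driver (in Airflow 2.3 the connection string belongs to the [database] config section):

pip install "apache-airflow[postgres]==2.3.2"
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow_pass@localhost:5432/airflow_db"
airflow db init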

Step 4: Create an Admin User

To access Airflow's web interface, we need to create an admin user. Run the following command, replacing the details with your own:

airflow users create \
    --username admin \
    --password password \
    --firstname Jane \
    --lastname Doe \
    --role Admin \
    --email jane.doe@example.com

This user will have full permissions to manage and monitor DAGs and tasks in the Airflow UI.
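
You can confirm the account exists by asking Airflow to list its users:

airflow users list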

Step 5: Launch the Scheduler and Webserver

Airflow has two main components that need to run continuously in the background:

  1. The scheduler, which handles the scheduling and execution of tasks based on the defined DAG schedules
  2. The webserver, which provides the user interface for monitoring and managing DAGs, tasks, and logs
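
Both components must run with the virtual environment active and AIRFLOW_HOME set, and neither carries over to a new terminal automatically. In each window you open, repeat the setup first (adjust the venv path if you created it somewhere other than your home directory):

source ~/airflow-venv/bin/activate   # re-activate the virtual environment
export AIRFLOW_HOME=~/airflow        # point Airflow at the same home directory as before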

To start them, open two separate terminal windows. In the first, run:

airflow scheduler

In the second, run:

airflow webserver --port 8080

The scheduler and webserver will start up and begin processing any defined DAGs in the AIRFLOW_HOME/dags directory.

Open a web browser and navigate to http://localhost:8080. You should see the Airflow login page. Enter the admin credentials you created in Step 4 to access the Airflow UI.

Creating an Example DAG

Let's create an example DAG to see Airflow in action. DAGs are defined in Python files placed in the AIRFLOW_HOME/dags directory; if that directory doesn't exist yet, create it with mkdir -p ~/airflow/dags. Then create a new file named example_dag.py in it with the following contents:

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def print_hello():
    print("Hello from Python!")

with DAG(
    dag_id="example_dag",
    start_date=datetime(2023, 5, 1),
    schedule_interval="0 9 * * *",  # Run at 9AM UTC daily
    dagrun_timeout=timedelta(minutes=60),
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
    }
) as dag:

    t1 = BashOperator(
        task_id="print_date",
        bash_command="date",
    )

    t2 = PythonOperator(
        task_id="print_hello",
        python_callable=print_hello,
    )

    t1 >> t2  # Define task dependency

This DAG defines two simple tasks:

  1. print_date: Executes the date bash command to print the current timestamp
  2. print_hello: Calls a Python function that prints "Hello from Python!"

The >> operator defines a task dependency, specifying that print_date should run before print_hello.

After saving the file, the Airflow scheduler will detect the new DAG the next time it scans the DAGs folder (with default settings this can take a few minutes) and display it in the UI. Toggle the DAG to "On" to enable it, and manually trigger a run to see the tasks execute.
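
You can also exercise the DAG from the command line, which is convenient while iterating on the code (the DAG ID, task ID, and date below match the example above):

airflow dags list                                      # confirm example_dag was parsed without errors
airflow tasks test example_dag print_date 2023-05-01   # run a single task locally without recording state
airflow dags trigger example_dag                       # queue a full DAG run (the scheduler must be running)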

Best Practices and Tips

Here are some best practices and lessons learned from my experience as a full-stack Airflow developer:

  • Use a version control system like Git to manage your DAG files, plugins, and configuration
  • Follow Airflow's style guide for naming conventions, docstrings, and code organization
  • Keep your DAGs lean and modular by extracting complex logic into separate Python functions or hook/operator plugins
  • Use Airflow Variables to store sensitive information like database passwords or API keys, rather than hardcoding them in DAGs (see the example after this list)
  • Set up monitoring and alerting for critical DAGs to proactively detect issues (e.g., Slack alerts on DAG failures)
  • Regularly update Airflow and its dependencies to access new features, bug fixes, and security patches
  • Consider using Airflow Pools to limit the number of concurrently running tasks and avoid overloading source/destination systems
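
As referenced above, Variables can be managed from the CLI (or under Admin > Variables in the UI) and then read inside a DAG with Variable.get. A minimal sketch, using a hypothetical key named my_api_key:

airflow variables set my_api_key "s3cr3t-value"   # store the value outside your DAG code
airflow variables get my_api_key                  # verify it round-trips
airflow variables list                            # show every key currently defined

For credentials in particular, Airflow Connections or a dedicated secrets backend are often a better fit than plain Variables, since Variables live in the metadata database alongside everything else.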

Comparison to Other Workflow Management Tools

While Airflow has become increasingly popular, it's not the only workflow management tool available. Here's a comparison of Airflow's key features to some other popular platforms:

Feature | Airflow | Dagster | Prefect | Luigi
Language | Python | Python | Python | Python
License | Apache 2.0 | Apache 2.0 | Open-core | Apache 2.0
Scheduling | Cron-based | Time-based | Cron or Interval | Cron-based
UI | Web UI | Web UI | Web UI | No built-in UI
Task Dependencies | DAGs | Jobs & Ops | Flows & Tasks | Task Dependencies
Executors | Celery, K8s, LocalExecutor | Celery, K8s, LocalExecutor | DaskExecutor, LocalExecutor | Luigi Daemon
Sensors | Yes | Yes | Yes | No
Plugins/Extensions | Hooks, Operators | Libraries | Tasks | Contrib Packages
Dynamic DAG Generation | Yes | Yes | Yes | No
Deployment | Docker, VM | Docker, K8s | Docker, K8s, VM | Docker, Packages

As you can see, while there is overlap in core functionality, each tool has its own strengths and design paradigms. Airflow's rich plugin ecosystem, stable codebase, and extensive provider packages make it well-suited for large-scale production pipelines, particularly those involving multi-cloud and hybrid environments. However, newer entrants like Dagster and Prefect offer innovative features like data-aware scheduling, first-class support for streaming pipelines, and more flexible deployment options that are worth evaluating based on your specific use case.

Troubleshooting Common Issues

Even when following best practices, issues can still arise. Here are some common problems I've encountered and how to resolve them:

  • ModuleNotFoundError when importing DAGs: Ensure your PYTHONPATH includes the AIRFLOW_HOME/dags directory, or use absolute imports
  • DAGs not showing up in the UI: Check that the DAG file is in the correct location (AIRFLOW_HOME/dags) and has a .py extension. Ensure there are no syntax errors in the DAG code preventing Airflow from parsing it.
  • Tasks stuck in "running" state: Verify that the Airflow scheduler is running and has sufficient resources. Check the task logs for any errors or timeouts.
  • Webserver not starting: Ensure that the specified port (default 8080) is not in use by another application. You can change the port with the --port flag (see the commands after this list).
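
For the port conflict case specifically, these commands run inside WSL2 show whether something is already listening on 8080 and how to fall back to a different port (8081 below is just an example):

ss -ltnp | grep 8080            # list any process already listening on port 8080
airflow webserver --port 8081   # start the webserver on an alternative port instead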

Conclusion

In this guide, we covered how to install and configure Apache Airflow on Windows 10/11 without using Docker. While containerization has its benefits, installing Airflow directly can still be a viable option for local development or resource-constrained environments.

We walked through the key steps, including:

  1. Setting up a Python virtual environment
  2. Installing Airflow and its dependencies
  3. Initializing the Airflow database
  4. Creating an admin user
  5. Launching the scheduler and webserver
  6. Defining an example DAG

We also discussed some best practices, tips, and common issues based on real-world experience using Airflow in production.

Airflow is a powerful and flexible platform for orchestrating complex data workflows, but it's just one tool in the modern data engineer's toolkit. As a full-stack developer, it's important to evaluate a range of workflow management solutions and choose the one that best fits your team's skills, scale, and business requirements.

I hope this guide has been helpful in getting you up and running with Airflow on Windows. For more advanced topics like writing custom plugins, using XComs for inter-task communication, and deploying Airflow to production, check out the official Airflow documentation. You can also join the active Airflow community on Slack and Stack Overflow to learn from other practitioners and get support when needed.

Happy data pipelining!
