Installation

Getting Airflow

The easiest way to install the latest stable version of Airflow is with pip:

pip install apache-airflow

You can also install Airflow with support for extra features like gcp or postgres:

pip install 'apache-airflow[postgres,gcp]'

Extra Packages

The apache-airflow PyPI basic package only installs what’s needed to get started. Subpackages can be installed depending on what will be useful in your environment. For instance, if you don’t need connectivity with Postgres, you won’t have to go through the trouble of installing the postgres-devel yum package, or whatever equivalent applies on the distribution you are using.

Behind the scenes, Airflow does conditional imports of operators that require these extra dependencies.
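The conditional-import pattern described above can be sketched in plain Python. The `optional_import` helper below is a hypothetical illustration of the technique, not part of Airflow's API:

```python
import importlib

def optional_import(module_name):
    """Return the imported module if its dependency is installed, else None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

# A stdlib module is always importable, so this resolves to the module object.
json_mod = optional_import("json")

# A dependency that only ships with an extra resolves to None when it is
# missing, letting the caller skip registering the operators that need it.
missing = optional_import("module_that_is_not_installed")
```

Code that depends on an extra can then check the result and expose its operators only when the dependency is actually available.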

Here’s the list of the subpackages and what they enable:

| subpackage        | install command                                | enables                                                                 |
|-------------------|------------------------------------------------|-------------------------------------------------------------------------|
| all               | pip install 'apache-airflow[all]'              | All Airflow features known to man                                       |
| all_dbs           | pip install 'apache-airflow[all_dbs]'          | All database integrations                                               |
| atlas             | pip install 'apache-airflow[atlas]'            | Apache Atlas to use Data Lineage feature                                |
| async             | pip install 'apache-airflow[async]'            | Async worker classes for Gunicorn                                       |
| aws               | pip install 'apache-airflow[aws]'              | Amazon Web Services                                                     |
| azure             | pip install 'apache-airflow[azure]'            | Microsoft Azure                                                         |
| cassandra         | pip install 'apache-airflow[cassandra]'        | Cassandra related operators & hooks                                     |
| celery            | pip install 'apache-airflow[celery]'           | CeleryExecutor                                                          |
| cgroups           | pip install 'apache-airflow[cgroups]'          | Needed to use CgroupTaskRunner                                          |
| cloudant          | pip install 'apache-airflow[cloudant]'         | Cloudant hook                                                           |
| crypto            | pip install 'apache-airflow[crypto]'           | Encrypt connection passwords in metadata db                             |
| dask              | pip install 'apache-airflow[dask]'             | DaskExecutor                                                            |
| databricks        | pip install 'apache-airflow[databricks]'       | Databricks hooks and operators                                          |
| datadog           | pip install 'apache-airflow[datadog]'          | Datadog hooks and sensors                                               |
| devel             | pip install 'apache-airflow[devel]'            | Minimum dev tools requirements                                          |
| devel_hadoop      | pip install 'apache-airflow[devel_hadoop]'     | Airflow + dependencies on the Hadoop stack                              |
| doc               | pip install 'apache-airflow[doc]'              | Packages needed to build docs                                           |
| docker            | pip install 'apache-airflow[docker]'           | Docker hooks and operators                                              |
| druid             | pip install 'apache-airflow[druid]'            | Druid related operators & hooks                                         |
| elasticsearch     | pip install 'apache-airflow[elasticsearch]'    | Elastic Log Handler                                                     |
| gcp               | pip install 'apache-airflow[gcp]'              | Google Cloud Platform                                                   |
| github_enterprise | pip install 'apache-airflow[github_enterprise]' | GitHub Enterprise auth backend                                         |
| google_auth       | pip install 'apache-airflow[google_auth]'      | Google auth backend                                                     |
| grpc              | pip install 'apache-airflow[grpc]'             | Grpc hooks and operators                                                |
| hdfs              | pip install 'apache-airflow[hdfs]'             | HDFS hooks and operators                                                |
| hive              | pip install 'apache-airflow[hive]'             | All Hive related operators                                              |
| jdbc              | pip install 'apache-airflow[jdbc]'             | JDBC hooks and operators                                                |
| jira              | pip install 'apache-airflow[jira]'             | Jira hooks and operators                                                |
| kerberos          | pip install 'apache-airflow[kerberos]'         | Kerberos integration for Kerberized Hadoop                              |
| kubernetes        | pip install 'apache-airflow[kubernetes]'       | Kubernetes Executor and operator                                        |
| ldap              | pip install 'apache-airflow[ldap]'             | LDAP authentication for users                                           |
| mongo             | pip install 'apache-airflow[mongo]'            | Mongo hooks and operators                                               |
| mssql             | pip install 'apache-airflow[mssql]'            | Microsoft SQL Server operators and hook, support as an Airflow backend  |
| mysql             | pip install 'apache-airflow[mysql]'            | MySQL operators and hook, support as an Airflow backend. The version of MySQL server has to be 5.6.4+. The exact version upper bound depends on the version of the mysqlclient package. For example, mysqlclient 1.3.12 can only be used with MySQL server 5.6.4 through 5.7. |
| oracle            | pip install 'apache-airflow[oracle]'           | Oracle hooks and operators                                              |
| papermill         | pip install 'apache-airflow[papermill]'        | Papermill hooks and operators                                           |
| password          | pip install 'apache-airflow[password]'         | Password authentication for users                                       |
| pinot             | pip install 'apache-airflow[pinot]'            | Pinot DB hook                                                           |
| postgres          | pip install 'apache-airflow[postgres]'         | PostgreSQL operators and hook, support as an Airflow backend            |
| qds               | pip install 'apache-airflow[qds]'              | Enable QDS (Qubole Data Service) support                                |
| rabbitmq          | pip install 'apache-airflow[rabbitmq]'         | RabbitMQ support as a Celery backend                                    |
| redis             | pip install 'apache-airflow[redis]'            | Redis hooks and sensors                                                 |
| salesforce        | pip install 'apache-airflow[salesforce]'       | Salesforce hook                                                         |
| samba             | pip install 'apache-airflow[samba]'            | airflow.operators.hive_to_samba_operator.Hive2SambaOperator             |
| sendgrid          | pip install 'apache-airflow[sendgrid]'         | Send email using SendGrid                                               |
| segment           | pip install 'apache-airflow[segment]'          | Segment hooks and sensors                                               |
| slack             | pip install 'apache-airflow[slack]'            | airflow.operators.slack_operator.SlackAPIOperator                       |
| snowflake         | pip install 'apache-airflow[snowflake]'        | Snowflake hooks and operators                                           |
| ssh               | pip install 'apache-airflow[ssh]'              | SSH hooks and operator                                                  |
| statsd            | pip install 'apache-airflow[statsd]'           | Needed by StatsD metrics                                                |
| vertica           | pip install 'apache-airflow[vertica]'          | Vertica hook support as an Airflow backend                              |
| webhdfs           | pip install 'apache-airflow[webhdfs]'          | HDFS hooks and operators                                                |
| winrm             | pip install 'apache-airflow[winrm]'            | WinRM hooks and operators                                               |

Initializing the Airflow Database

Airflow requires a database to be initialized before you can run tasks. If you’re just experimenting and learning Airflow, you can stick with the default SQLite option. If you don’t want to use SQLite, take a look at Initializing a Database Backend to set up a different database.
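For example, pointing Airflow at a PostgreSQL metadata database is a matter of setting sql_alchemy_conn in airflow.cfg. The sketch below assumes a local Postgres instance; the credentials, host, and database name are placeholders:

```ini
[core]
# SQLAlchemy connection string for the metadata database.
# Username, password, host, and database name below are placeholders.
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow

# A non-SQLite backend also allows a parallel executor instead of the default
# SequentialExecutor.
executor = LocalExecutor
```

The postgres extra must be installed for the psycopg2 driver referenced in the connection string to be available.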

After configuration, you’ll need to initialize the database before you can run tasks:

airflow db init
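As a sketch, a typical first run ties these steps together (Airflow 2.x CLI; the username, password, and email below are placeholders, not recommended values):

```shell
# Create the tables in the configured metadata database.
airflow db init

# Create an admin user for the web UI.
airflow users create \
    --username admin --password admin \
    --firstname Ada --lastname Admin \
    --role Admin --email admin@example.com

# Start the web UI and the scheduler, each in its own terminal or service.
airflow webserver --port 8080
airflow scheduler
```
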