What is Apache Airflow
Apache Airflow is an open-source platform for authoring, scheduling, and monitoring data pipelines. It is written in Python, and each workflow is defined in Python code as a DAG (directed acyclic graph) of tasks. A scheduler runs the tasks in dependency order, and a web UI shows their status and history. This makes it possible to model complex workflows with relatively little effort. Apache Airflow is often described as a “data orchestration tool”: users create pipelines of data operations (such as moving data from input to output) that run against external systems, with the scheduler deciding when each step executes.
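To make the DAG idea concrete, here is a minimal sketch in plain Python, not Airflow's API: each task lists the tasks it depends on, and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) produces an order in which every task runs after its upstream tasks. The task names are hypothetical. In a real Airflow DAG file the same dependencies would be declared between operators, for example with `extract >> [clean, validate] >> load`.

```python
from graphlib import TopologicalSorter

# Upstream dependencies for each task: a task runs only after every
# task in its set has finished. The task names are hypothetical.
PIPELINE = {
    "extract": set(),
    "clean": {"extract"},
    "validate": {"extract"},
    "load": {"clean", "validate"},
}


def run_pipeline(dependencies, actions):
    """Execute each task once, in an order that respects its dependencies."""
    executed = []
    for task in TopologicalSorter(dependencies).static_order():
        actions[task]()          # run the task's work
        executed.append(task)    # record the order for inspection
    return executed


if __name__ == "__main__":
    # Each hypothetical action just prints its own name.
    actions = {name: (lambda n=name: print(f"running {n}")) for name in PIPELINE}
    print(run_pipeline(PIPELINE, actions))
```

`clean` and `validate` do not depend on each other, so either may run first; only `extract` is guaranteed to come first and `load` last.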
So, what exactly is a “data pipeline”?
A data pipeline is usually composed of the following elements:
* Sources: for example, reading data from a database or another service. A source can be an external system or just local files on your computer.
* Transformations: for example, converting data from one format to another, or aggregating raw data into more useful forms. Transformations can be SQL queries or arbitrary Python code.
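A minimal sketch of these two elements, assuming the source is a small CSV export (the data and the function names are made up for illustration): `read_source` plays the role of the source, and `total_by_region` is a transformation that aggregates raw rows into per-region totals. In an orchestrated pipeline, each function could become a separate task.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data standing in for a source such as a database export.
RAW_CSV = """region,amount
north,10
south,5
north,7
"""


def read_source(text):
    """Source: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def total_by_region(rows):
    """Transformation: convert amounts to int and sum them per region."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += int(row["amount"])
    return dict(totals)


if __name__ == "__main__":
    print(total_by_region(read_source(RAW_CSV)))  # {'north': 17, 'south': 5}
```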

What is NiFi
Apache NiFi is an open-source system, written in Java, for automating the flow of data between systems. Instead of writing pipeline code, you build a flow in a browser-based, drag-and-drop UI by connecting processors: components that fetch, route, transform, or deliver data. Data moves through the flow as FlowFiles, each made of content plus key/value attributes, and the connections between processors act as queues with back pressure. NiFi also exposes a REST API, so flows can be managed and monitored over HTTP with JSON requests and responses.
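The FlowFile model can be illustrated with a small Python sketch. It does not use NiFi's API (NiFi processors are typically written in Java); `FlowFile`, `uppercase`, and `route` are hypothetical names that mimic how a processor emits each FlowFile on a named relationship such as `success` or `failure`, and how each relationship feeds its own queue.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Models NiFi's unit of data: content bytes plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def uppercase(flowfile):
    """Hypothetical processor: returns (relationship, FlowFile)."""
    try:
        text = flowfile.content.decode("utf-8")
    except UnicodeDecodeError:
        # Content that cannot be decoded is routed unchanged to "failure".
        return "failure", flowfile
    # On success, emit new content and an extra attribute.
    new_attributes = {**flowfile.attributes, "transformed": "true"}
    return "success", FlowFile(text.upper().encode("utf-8"), new_attributes)


def route(flowfiles, processor):
    """Pass each FlowFile through the processor and queue it by relationship."""
    queues = {"success": [], "failure": []}
    for flowfile in flowfiles:
        relationship, output = processor(flowfile)
        queues[relationship].append(output)
    return queues


if __name__ == "__main__":
    inputs = [FlowFile(b"hello", {"filename": "a.txt"}), FlowFile(b"\xff")]
    queues = route(inputs, uppercase)
    print(len(queues["success"]), len(queues["failure"]))  # 1 1
```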
NiFi is not something you build and deploy as your own application. You download a release (or use the official Docker image), start the service, and design flows in the UI. Custom processors are written in Java and packaged as NAR (NiFi Archive) files that are added to the NiFi installation.
A basic installation makes two simple assumptions:
* A supported Java runtime is installed on the host (unless you run the Docker image, which bundles one).
* You can reach NiFi's web UI from a browser.
You can run NiFi on a wide range of infrastructure, including standalone servers (with or without Docker), multi-node clusters, cloud virtual machines on AWS or other providers, and hosts running a Hadoop distribution or Apache Mesos.

Which one should you use
You should use Airflow if you want to define batch workflows as Python code, run them on a schedule, and coordinate tasks that do their work in other systems. You should use NiFi if you need to move and transform data continuously between systems and prefer assembling flows in a visual UI over writing code. The choice does not depend on your hosting provider: both tools run on your own servers or on cloud platforms such as AWS. If your team works mainly in Python, Airflow is the more natural fit; if you want to extend the tool with custom components built with Java tooling such as Maven, NiFi's processor model supports that. If you’re already using Apache Airflow, you may not need a new data pipeline framework.
Slides
Slides can be downloaded here: [link]
Twitter: @jenniferdolton
Blog: http://blog.jennidemidolton.com/
Twitter: @jennidolton_d
Job: http://wipswitch.com/jenny.dolton
Twitter: @doltonjenny
GitHub: https://github.com/jennidolton
Pros and cons of each tool
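Pros
Airflow:
* It is open source.
* It has a large user base.
* It has a very active community that constantly works to improve the platform and add new features.
NiFi:
* It is open source.
* It is built for continuous data flow, with back pressure and data provenance (a record of where each piece of data came from and what happened to it) built in.
* It is an Apache Software Foundation project, not an AWS product; commercial support is available from vendors such as Cloudera.
* It works well with AWS services: it ships processors for services such as Amazon S3 (for example, PutS3Object and FetchS3Object).
* It has a very active community that constantly works to improve the platform and add new features.

Cons
Airflow:
* Its documentation is not as good as NiFi’s, but it’s still pretty good.
* It is not an AWS product. AWS does offer a managed service, Amazon Managed Workflows for Apache Airflow (MWAA), but self-hosted deployments rely on community support.
* Working with AWS services such as Lambda requires the separately installed Amazon provider package (apache-airflow-providers-amazon).
NiFi:
* Its user base is smaller than Airflow’s.
* Its speed is hard to compare directly with Airflow’s: Airflow schedules tasks that do their work in other systems, while NiFi moves the data itself. It runs on the JVM, so heavy flows may need memory tuning.
* Retries are configured per processor (for example, by routing the failure relationship back into the processor) rather than declared once per task as with Airflow’s retries setting, which takes extra care when a Lambda function times out or AWS API Gateway returns an error.
* Its documentation could be better.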
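The retry behaviour discussed above can be sketched in plain Python: a task is re-run a fixed number of times after a failure such as a timed-out Lambda call. `run_with_retries` and `make_flaky` are hypothetical helpers that mirror the idea behind Airflow's per-task `retries` and `retry_delay` settings; this is neither Airflow nor NiFi code.

```python
import time


def run_with_retries(task, retries=3, retry_delay=0.0):
    """Call task(); if it raises, try again up to `retries` more times.

    Hypothetical helper mirroring the idea behind Airflow's per-task
    `retries` and `retry_delay` settings; not Airflow or NiFi code.
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: re-raise the last error
            attempt += 1
            time.sleep(retry_delay)  # wait before the next attempt


def make_flaky(failures):
    """Build a task that times out `failures` times, then succeeds."""
    calls = {"count": 0}

    def task():
        calls["count"] += 1
        if calls["count"] <= failures:
            raise TimeoutError("simulated Lambda timeout")
        return calls["count"]  # number of calls it took to succeed

    return task


if __name__ == "__main__":
    print(run_with_retries(make_flaky(2), retries=3))  # 3
```

In Airflow itself the equivalent is passing arguments such as `retries=3` and `retry_delay=timedelta(minutes=5)` to an operator, so the policy is declared once per task instead of being wired into the flow.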
