Secured NiFi cluster with Terraform on the Google Cloud Platform
This story is a follow-up to my previous story about deploying a single secured NiFi instance, configured with OIDC, using Terraform on the Google Cloud Platform. This time, it’s about deploying a secured NiFi cluster.
In this story, we’ll use Terraform to quickly:
deploy a NiFi CA server as a convenient way to generate TLS certificates
deploy an external ZooKeeper instance to manage cluster coordination and state across the nodes
deploy X secured NiFi instances clustered together
configure NiFi to use OpenID Connect for authentication
configure an HTTPS load balancer with Client IP affinity in front of the NiFi cluster
Note: I assume you own a domain (you can get one with Google). It will be used to map a hostname to the web interface exposed by the NiFi cluster. In this post, I use my own domain, pierrevillard.com, and map nifi.pierrevillard.com to my NiFi cluster.
Disclaimer: the steps below should not be used for a production deployment. They can definitely get you started, but I’m only using them to start a secured cluster. None of the configuration one would expect for a production setup is included (a clustered ZooKeeper, dedicated disks for the repositories, etc).
If you don’t want to read the story and want to get straight into the code, it’s right here!
What is Terraform?
Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.
Configuration files describe to Terraform the components needed to run a single application or your entire datacenter. Terraform generates an execution plan describing what it will do to reach the desired state, and then executes it to build the described infrastructure. As the configuration changes, Terraform is able to determine what changed and create incremental execution plans which can be applied.
The infrastructure Terraform can manage includes low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features, etc.
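To make the idea of an execution plan more concrete, here is a minimal Python sketch of a desired-state diff. It is only an illustration of the concept, not how Terraform is actually implemented: the plan function and the resource names (zookeeper, nifi-1, nifi-2) are hypothetical and exist only for this example.

```python
def plan(current, desired):
    """Return the actions needed to move `current` to `desired`.

    Both arguments map a resource name to its configuration dict.
    This is a toy model of a plan, not Terraform's real algorithm.
    """
    actions = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Hypothetical resources, for illustration only.
current = {
    "zookeeper": {"machine_type": "n1-standard-1"},
    "nifi-1": {"machine_type": "n1-standard-1"},
}
desired = {
    "zookeeper": {"machine_type": "n1-standard-1"},
    "nifi-1": {"machine_type": "n1-standard-2"},
    "nifi-2": {"machine_type": "n1-standard-2"},
}

for action, name in plan(current, desired):
    print(f"{action}: {name}")
```

Only the resources that differ show up in the plan: the unchanged zookeeper entry produces no action, which is the "incremental" part of the description above.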
What is NiFi?
Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data. Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. In simpler words, Apache NiFi is a great tool to collect and move data around, process it, clean it and integrate it with other systems. As soon as you need to bring data in, you want to use Apache NiFi.
Why ZooKeeper?
It’s best to refer to the documentation but, in short, NiFi employs a Zero-Master Clustering paradigm. Each node in the cluster performs the same tasks on the data, but each operates on a different set of data. One of the nodes is automatically elected (via Apache ZooKeeper) as the Cluster Coordinator. All nodes in the cluster will then send heartbeat/status information to this node, and this node is responsible for disconnecting nodes that do not report any heartbeat status for some amount of time. Additionally, when a new node elects to join the cluster, the new node must first connect to the currently-elected Cluster Coordinator in order to obtain the most up-to-date flow.
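The heartbeat mechanism described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not NiFi's implementation: the ClusterCoordinator class, the 10-second timeout and the node names are all hypothetical, and the ZooKeeper-based election and the flow synchronization are not modeled at all.

```python
import time


class ClusterCoordinator:
    """Toy model of a coordinator that tracks node heartbeats."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # hypothetical value, not a NiFi default
        self.clock = clock
        self.last_heartbeat = {}  # node id -> time of the last heartbeat

    def heartbeat(self, node_id):
        """Record a heartbeat; an unknown node is treated as joining."""
        self.last_heartbeat[node_id] = self.clock()

    def disconnect_silent_nodes(self):
        """Drop and return the nodes whose last heartbeat is too old."""
        now = self.clock()
        silent = sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen > self.timeout_seconds
        )
        for node in silent:
            del self.last_heartbeat[node]
        return silent

    def connected_nodes(self):
        return sorted(self.last_heartbeat)


# Drive the model with a fake clock so the example is deterministic.
fake_time = [0.0]
coordinator = ClusterCoordinator(timeout_seconds=10, clock=lambda: fake_time[0])
coordinator.heartbeat("nifi-1")
coordinator.heartbeat("nifi-2")
fake_time[0] = 8.0
coordinator.heartbeat("nifi-1")  # nifi-2 stays silent
fake_time[0] = 15.0
print(coordinator.disconnect_silent_nodes())
print(coordinator.connected_nodes())
```

In this run, nifi-2 has not reported for 15 seconds and is disconnected, while nifi-1 reported 7 seconds ago and stays connected.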
OAuth Credentials
The first step is to create the OAuth credentials (at the time of writing, this cannot be done using Terraform).
In your GCP project, go to APIs & Services, then Credentials.
Click Create credentials, then OAuth client ID, and select Web application.
Once the credentials are created, you will get a client ID and a client secret that you will need in the Terraform variables.
When you create the credentials, your domain is automatically added to the list of “Authorized domains” in the OAuth consent screen configuration. This protects you and your users by ensuring that OAuth authentication only comes from authorized domains.
Download the NiFi binaries in Google Cloud Storage
In your GCP project, create a bucket in Google Cloud Storage. We are going to use the bucket to store the Apache NiFi & ZooKeeper binaries (instead of downloading directly from the Apache repositories at each deployment), and also as a way to retrieve the certificates that we’ll use for the HTTPS load balancer.
Note: you’ll need Apache ZooKeeper 3.5.5+.
You can download the binaries using the links below:
Note: you’ll need to use the NiFi Toolkit version 1.9.2.
Deploy NiFi with Terraform
Once you have completed the above prerequisites, installing your NiFi cluster will only take a few minutes. Open your Google Cloud Console in your GCP project and run: