Monitoring NiFi – Site2Site reporting tasks

Note – This article is part of a series discussing subjects around NiFi monitoring.

If we look at the development documentation about reporting tasks:

So far, we have mentioned little about how to convey to the outside world how NiFi and its components are performing. Is the system able to keep up with the incoming data rate? How much more can the system handle? How much data is processed at the peak time of day versus the least busy time of day?

In order to answer these questions, and many more, NiFi provides a capability for reporting status, statistics, metrics, and monitoring information to external services by means of the ReportingTask interface. ReportingTasks are given access to a host of information to determine how the system is performing.

Out of the box, quite a lot of reporting tasks are available. In this article, we are going to focus on a few of them (have a look at the other articles in the series to find out more about the others).

Before going into the details: since I am going to discuss Site-to-Site related reporting tasks, we first need to understand what Site-to-Site (S2S) is:

When sending data from one instance of NiFi to another, there are many different protocols that can be used. The preferred protocol, though, is the NiFi Site-to-Site Protocol. Site-to-Site makes it easy to securely and efficiently transfer data to/from nodes in one NiFi instance or data producing application to nodes in another NiFi instance or other consuming application.

In this case, we are going to use S2S reporting tasks: the reporting tasks run continually to collect data and send it to a remote NiFi instance. You can also use S2S to send the data back to the instance running the reporting task. This way, by using an input port on the canvas, you can receive the data generated by the reporting task and use it in a NiFi workflow. That’s really powerful because you can now use all the NiFi capabilities to process this data and do whatever you want with it.

Let’s go over some examples!

  • Monitoring bulletins

Starting with Apache NiFi 1.2.0, you can transform every bulletin into a flow file sent to any NiFi instance using Site-to-Site. This way, as soon as a processor, controller service, or reporting task generates a bulletin, it can be converted into a flow file that you can use to feed any system you want.

Let’s configure this reporting task to send the bulletins (as flow files) to an input port called “bulletinMonitoring” and use the flow files to send emails.

First, since my NiFi cluster is secured, I create a StandardSSLContextService in the Controller Services tab of the Controller Settings menu (this way, it can be used by reporting tasks).

[Screenshots: adding and configuring the StandardSSLContextService]

Then, I can define my reporting task by adding a new SiteToSiteBulletinReportingTask:

[Screenshots: adding and configuring the SiteToSiteBulletinReportingTask]

Before starting the task, on my canvas, I have the following:

[Screenshot: the canvas before starting the task]

Note – in a secured environment, you need to set the correct permissions on the components. You need to allow the NiFi nodes to receive data via Site-to-Site on the input port, and you also need to grant the correct permissions on the root process group so the nodes are able to view the component and to view and modify the data.

I configured my PutEmail processor to send emails using the Gmail SMTP server:

[Screenshot: PutEmail processor configuration]

Now, as soon as bulletins are generated, I’ll receive an email notification containing my message and the attributes of the flow file, with the bulletins attached to the email.

Obviously, instead of sending emails, you could, for example, use some other processors to automatically open tickets in your ticketing system (like JIRA using REST API).
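To give an idea of what such downstream processing could look like outside of NiFi, here is a minimal Python sketch that turns the content of one bulletin flow file into short summary lines, the kind of text you might put in a ticket. It assumes the flow file content is a JSON array of bulletins and that each bulletin carries the fields `bulletinLevel`, `bulletinSourceName` and `bulletinMessage`; these field names and the level names are assumptions on my side, so check them against the flow files you actually receive. The function `summarize_bulletins` is hypothetical and only there for illustration.

```python
import json

# Assumed ordering of bulletin levels, from least to most severe.
LEVEL_ORDER = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3}


def summarize_bulletins(flowfile_content, min_level="WARNING"):
    """Return one summary line per bulletin at or above min_level.

    flowfile_content is expected to be a JSON array of bulletin objects.
    The field names used below are assumptions, not a confirmed schema.
    """
    threshold = LEVEL_ORDER[min_level]
    lines = []
    for bulletin in json.loads(flowfile_content):
        level = str(bulletin.get("bulletinLevel", "")).upper()
        # Unknown levels are kept rather than silently dropped.
        if LEVEL_ORDER.get(level, threshold) >= threshold:
            source = bulletin.get("bulletinSourceName", "unknown source")
            message = bulletin.get("bulletinMessage", "")
            lines.append(f"[{level}] {source}: {message}")
    return lines


if __name__ == "__main__":
    sample = json.dumps([
        {"bulletinLevel": "ERROR", "bulletinSourceName": "PutEmail",
         "bulletinMessage": "Failed to send email"},
        {"bulletinLevel": "INFO", "bulletinSourceName": "GetFile",
         "bulletinMessage": "Started"},
    ])
    for line in summarize_bulletins(sample):
        print(line)
```

Inside NiFi itself, the same filtering would more naturally be done with processors on the canvas; the sketch is only meant to show the shape of the logic.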

  • Monitoring disk space

Now, using the task we previously set up, we can take advantage of the MonitorDiskUsage reporting task. This reporting task generates WARN logs (in the NiFi log file) and bulletins when the usage of the monitored disk partition goes over a custom threshold. If I want to monitor the Content Repository, I could configure my reporting task as below:

[Screenshots: disk usage reporting task configuration]

Using the combination of this reporting task and the SiteToSiteBulletinReportingTask, I’m able to generate flow files when the disk utilization reaches a critical point and to receive notifications using any processors I want.
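The check behind this kind of alert is simple. Here is a minimal Python sketch of the idea, not the actual implementation of the reporting task: it computes the percentage of a partition in use and returns a warning message when a threshold is exceeded. The function names and the 80% default are hypothetical values chosen for the example.

```python
import shutil


def used_percent(total_bytes, free_bytes):
    """Percentage of the partition in use, given total and free bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * (total_bytes - free_bytes) / total_bytes


def check_directory(path, threshold_percent=80.0):
    """Return a warning message if the partition holding path is over the
    threshold, otherwise None. The 80% default is an illustrative value."""
    usage = shutil.disk_usage(path)
    percent = used_percent(usage.total, usage.free)
    if percent > threshold_percent:
        return (f"Disk usage for {path} is {percent:.1f}%, "
                f"above the {threshold_percent:.1f}% threshold")
    return None


if __name__ == "__main__":
    # Point this at the directory you care about, for example the
    # content repository directory of your installation.
    print(check_directory("."))
```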

  • Monitoring memory use

The same approach can be used to monitor the memory utilization within the NiFi JVM using the MonitorMemory reporting task. Have a look at the documentation of this reporting task here.

  • Monitoring back pressure on connections

There is also the SiteToSiteStatusReportingTask, which sends details about the NiFi Controller status over Site-to-Site. This can be particularly useful to be notified when some processors are stopped or queues are full (and back pressure is enabled), or to build reports about NiFi performance. This reporting task will be slightly improved regarding back pressure (with NIFI-3791). In the meantime, if you want to receive notifications when back pressure is enabled on any connection, here is what you can do (assuming you know the back pressure thresholds):

[Screenshots: SiteToSiteStatusReportingTask configuration]

Note that I configured the task to only send data about the connections, but you can receive information for any kind of component.

I use the following workflow. My input port receives the flow files with the controller status (each containing a JSON array with one element per connection). I split the array using a SplitJson processor, then use EvaluateJsonPath to extract the values of queuedCount and queuedBytes as attributes. Finally, a RouteOnAttribute processor checks whether either of the two attributes is greater than or equal to my thresholds and, if that’s the case, I send the notification by email.

[Screenshot: the workflow described above]

My RouteOnAttribute configuration:

[Screenshot: RouteOnAttribute configuration]
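For readers who prefer code to screenshots, the split / extract / route logic of this workflow can be sketched in a few lines of Python. The fields `queuedCount` and `queuedBytes` are the ones used in the workflow; `componentName` and the threshold values (10,000 flow files and 1 GB) are assumptions for the example, so replace them with the names you see in your own status data and the thresholds configured on your connections.

```python
import json


def connections_over_threshold(status_json, max_count=10000,
                               max_bytes=1024 ** 3):
    """Return (name, queuedCount, queuedBytes) for every connection whose
    queue has reached one of the thresholds.

    status_json is expected to be a JSON array with one element per
    connection. The thresholds are example values, not read from NiFi.
    """
    alerts = []
    for connection in json.loads(status_json):       # SplitJson
        count = int(connection.get("queuedCount", 0))   # EvaluateJsonPath
        size = int(connection.get("queuedBytes", 0))
        if count >= max_count or size >= max_bytes:  # RouteOnAttribute
            # "componentName" is an assumed field name.
            name = connection.get("componentName", "unnamed connection")
            alerts.append((name, count, size))
    return alerts


if __name__ == "__main__":
    sample = json.dumps([
        {"componentName": "to PutEmail", "queuedCount": 10000,
         "queuedBytes": 2048},
        {"componentName": "to LogAttribute", "queuedCount": 3,
         "queuedBytes": 512},
    ])
    for name, count, size in connections_over_threshold(sample):
        print(f"Back pressure on '{name}': {count} flow files, {size} bytes")
```

The comparison uses "greater than or equal" on purpose, as in the RouteOnAttribute step: a queue that has exactly reached its threshold is the case we want to be notified about.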

Site-to-Site reporting tasks are really useful, and there are many ways to use the data they send for monitoring purposes.

  • SiteToSiteProvenanceReportingTask

Note that there is also a Site-to-Site reporting task that sends all the provenance events over S2S. This can be really useful if you want to send this information to an external system.

While monitoring a dataflow, users often need a way to determine what happened to a particular data object (FlowFile). NiFi’s Data Provenance page provides that information. Because NiFi records and indexes data provenance details as objects flow through the system, users may perform searches, conduct troubleshooting and evaluate things like dataflow compliance and optimization in real time. By default, NiFi updates this information every five minutes, but that is configurable.

Besides, with NIFI-3859, this could also be used in a monitoring approach to look only for specific events according to custom parameters. It could be used, for instance, to check how long it takes for an event to go through NiFi and to raise alerts when an event takes unusually long to be processed (have a look at the next article to see how this can be done differently).
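As a rough illustration of that last idea, here is a minimal Python sketch that flags provenance events whose data took too long to get through the flow. It assumes each event is a dictionary with `eventId`, `lineageStart` and `timestampMillis` (both timestamps in milliseconds); these field names are assumptions on my side, and the function `slow_events` is hypothetical, so adapt it to the events you actually receive.

```python
def slow_events(events, max_millis):
    """Return (eventId, elapsed_millis) for events whose data has been in
    the flow longer than max_millis.

    Elapsed time is computed as timestampMillis - lineageStart; both field
    names are assumptions about the provenance event format.
    """
    slow = []
    for event in events:
        start = event.get("lineageStart")
        seen = event.get("timestampMillis")
        if start is None or seen is None:
            continue  # skip events that do not carry both timestamps
        elapsed = seen - start
        if elapsed > max_millis:
            slow.append((event.get("eventId"), elapsed))
    return slow


if __name__ == "__main__":
    sample = [
        {"eventId": "e1", "lineageStart": 1000, "timestampMillis": 1500},
        {"eventId": "e2", "lineageStart": 1000, "timestampMillis": 9000},
    ]
    for event_id, elapsed in slow_events(sample, max_millis=5000):
        print(f"Event {event_id} took {elapsed} ms")
```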

As usual, feel free to ask questions and comment on this post.

16 thoughts on “Monitoring NiFi – Site2Site reporting tasks”

  1. Thanks Pierre, excellent article as usual. It’s very much appreciated; your articles are always interesting and a pleasure to read. A quick question regarding your statement:

    “In this case, we are going to use S2S reporting tasks, it means that the reporting tasks are continually running to collect data and to send this data to a remote NiFi instance, but you can use S2S to send data to the instance running the reporting task as well. This way, by using an input port on the canvas, you can actually receive the data generated by the reporting task and use it in a NiFi workflow. That’s really powerful because, now, you can use all the NiFi capabilities to process this data and do whatever you want with it.”

    I’m interpreting this as being able to use a single NiFi instance to talk to itself via the open S2S port and ingest reporting tasks running within it. Is this the correct interpretation of what you mean here or do I still need to configure a second NiFi instance before I can achieve this?

    With a single NiFi instance I always hit the following snag (sorry for the stack trace)…

    (single instance, unsecured)

    2017-08-04 21:15:45,561 INFO [Site-to-Site Worker Thread-0] o.a.nifi.remote.SocketRemoteSiteListener Received connection from localhost/127.0.0.1, User DN: null
    2017-08-04 21:15:45,563 ERROR [Site-to-Site Worker Thread-0] o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate with remote instance null due to org.apache.nifi.remote.exception.HandshakeException: Handshake with nifi://localhost:39540 failed because the Magic Header was not present; closing connection

    I’m pretty confident my S2S config is correct

    # Site to Site properties
    nifi.remote.input.host=localhost
    nifi.remote.input.secure=false
    nifi.remote.input.socket.port=10000
    nifi.remote.input.http.enabled=true
    nifi.remote.input.http.transaction.ttl=30 sec

    # web properties #
    nifi.web.war.directory=./lib
    nifi.web.http.host=localhost
    nifi.web.http.port=8080
    nifi.web.http.network.interface.default=
    nifi.web.https.host=
    nifi.web.https.port=


    • Hi Dwane,

      Your interpretation is correct, you can use the reporting task to send data to the NiFi instance running the task.

      What’s the configuration of the reporting task generating the error? What version of NiFi?

      Thanks!


  2. Ok thanks Pierre 🙂

    Apologies, I’m running v1.2.0, but I hit the same issue when running v1.3.0 in both secure and unsecure mode. However, I’ve just managed to get it working thanks to your prompt. My mistake was specifying the “Destination URL” in the SiteToSiteBulletinReportingTask using the Site-to-Site port number and not the NiFi HTTP web interface port (or HTTPS for secure mode). I swear I tried that numerous times, but with switching between versions and adding and removing security I must not have been as diligent in my troubleshooting as I thought I was. Thanks for the quick reply and the great examples you put together on this blog; it’s a tremendous resource and greatly appreciated.


  3. Pierre, whenever I try to edit receive data site to site/send data via site to site, the options are grayed out. Do you know what settings I need to change in order to be able to access these? I am logged in as the admin.


  4. I am trying to filter to just the disk related Bulletins. Version 1.3, the only items I see in the S2S Reporting Tasks, Component Type Regex are (Processor|ProcessGroup|RemoteProcessGroup|RootProcessGroup|Connection|InputPort|OutputPort), nothing about Disk.


    • Hi Steven. The idea is to use the MonitorDiskUsage reporting task that will raise bulletins when disk use is reaching the configured threshold. Then you’ll use the SiteToSiteBulletinReportingTask to get the bulletins as flow files. The reporting task you’re mentioning is to get controller status information. It’s useful if you want to receive information about the components of your workflows (hence the component types).


  5. Hi Pierre,

    thank you very much for the extremely valuable contents you share.

    I have a doubt about the very first step, setting up the StandardSSLContextService. You mention that you have a secured cluster, which is exactly my case. When I want to configure the Controller Service properties, I am not sure how to proceed. I can place all the keystore and truststore files in the same path and use the same filenames across all servers, so that “Keystore Filename” and “Truststore Filename” are valid across the cluster, but the passwords for them, at least if you use the “NiFi Toolkit” to generate them, will be different for each server. How can I configure the StandardSSLContextService in that case?

    Thanks in advance!!


  6. Hi Pierre. Very interesting read, but I have a question regarding bulletin monitoring: how do you deal with more than 5 bulletins at a time? Imagine 100 logs fail and produce bulletins… You’ll lose most of the bulletins (of course, the reason they fail might be the same for all). Also, there’s a bug in the S2S bulletin reporter, which sends a WARN when the board is empty, which doesn’t really help.


    • Hi – which version are you using? We did fix some issues. For the bulletins, we only show the last 5 on the UI, but I believe we keep track of the bulletins generated over the last 5 minutes. So it should be OK if the reporting task is scheduled more frequently than every five minutes.

