XML data processing with Apache NiFi

Note: have a look at the new features in NiFi 1.7+ about XML processing in this post.

I recently had to work on a NiFi workflow to process millions of XML documents per day. One of the steps is the conversion of the XML data into JSON. This raises the question of performance, and I will briefly present my observations in this post.

The two most natural approaches to convert XML data with Apache NiFi are:

  • Use the TransformXML processor with an XSLT file
  • Use a scripted processor or a custom Java processor relying on a library

There are a few XSLT files available on the internet that provide a generic way to transform any XML into a JSON document. That’s really convenient and easy to use. However, depending on your use case, you might need specific features.

In my case, I’m processing a lot of XML files based on the same input schema (XSD) and I want the output to be compliant with the same Avro schema (in order to use the record-oriented processors in NiFi). The main issue is forcing the generation of an array when you only have a single element in your input.

XSLT approach

Example #1:

<MyDocument>
  <MyList>
    <MyElement>
      <Text>Some text...</Text>
      <RecordID>1</RecordID>
    </MyElement>
    <MyElement>
      <Text>Some text...</Text>
      <RecordID>2</RecordID>
    </MyElement>
  </MyList>
</MyDocument>

This XML document will be converted into the following JSON:

{
   "MyDocument" : {
     "MyList" : {
       "MyElement" : [ {
           "Text" : "Some text...",
           "RecordID" : 1
         }, {
           "Text" : "Some text...",
           "RecordID" : 2
         } ]
      }
   }
}

Example #2:

However, if you have the following XML document:

<MyDocument>
  <MyList>
    <MyElement>
      <Text>Some text...</Text>
      <RecordID>1</RecordID>
    </MyElement>
  </MyList>
</MyDocument>

The document will be converted into:

{
  "MyDocument" : {
    "MyList" : {
      "MyElement" : {
        "Text" : "Some text...",
        "RecordID" : 1
      }
    }
  }
}
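
The difference between the two outputs can be reproduced with a small sketch. This is not the XSLT used in this post: it is a hypothetical, naive converter written in Python (the names element_to_value and xml_to_json are made up for the illustration) that only creates an array once it meets a second element with the same tag. It also keeps every value as a string, unlike the output shown above.

```python
import json
import xml.etree.ElementTree as ET


def element_to_value(element):
    """Return the text of a leaf element, or a dict for an element with children."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_value(child)
        if child.tag in result:
            # A second element with the same tag: promote the entry to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            # First (and maybe only) element with this tag: no array is created.
            result[child.tag] = value
    return result


def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_value(root)})


two = ("<MyList><MyElement><RecordID>1</RecordID></MyElement>"
       "<MyElement><RecordID>2</RecordID></MyElement></MyList>")
one = "<MyList><MyElement><RecordID>1</RecordID></MyElement></MyList>"

print(xml_to_json(two))  # "MyElement" is an array of objects
print(xml_to_json(one))  # "MyElement" is a single object
```

The same input schema therefore gives two different JSON shapes, depending only on how many elements the list contains.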

Force array

And this is where the problems start, because the two outputs no longer match the same Avro schema. That is why I recommend using the XSLT file provided by Bram Stein on GitHub. It provides a way to force the creation of an array. To do that, you need to insert an attribute into your XML input file. The attribute to insert is

json:force-array="true"

But for this attribute to be correctly interpreted, you also need to declare the corresponding namespace:

xmlns:json="http://json.org/"

In the end, using ReplaceText processors with regular expressions, you need to have the following input (for the example #2):

<MyDocument xmlns:json="http://json.org/">
  <MyList>
    <MyElement json:force-array="true">
      <Text>Some text...</Text>
      <RecordID>1</RecordID>
    </MyElement>
  </MyList>
</MyDocument>

And this will give you:

{
  "MyDocument" : {
    "MyList" : {
      "MyElement" : [ {
        "Text" : "Some text...",
        "RecordID" : 1
      } ]
    }
  }
}

Now I do have the same schema describing my JSON documents. Conclusion: you need to use regular expressions to add the namespace to the first tag of your document and to add the force-array attribute to every tag wrapping elements that should be part of an array.
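
To give an idea of the kind of regular expressions involved, here is a minimal sketch in Python. It is only an illustration of the two replacements, not the configuration of the ReplaceText processors: the function name add_force_array and its parameters are assumptions, and the patterns assume simple tag names without namespace prefixes.

```python
import re

NAMESPACE = 'xmlns:json="http://json.org/"'
FORCE_ARRAY = 'json:force-array="true"'


def add_force_array(xml_text, root_tag, array_tags):
    """Declare the json namespace on the root tag and flag the array elements."""
    # First occurrence only: the opening root tag. Closing tags start with "</"
    # and are therefore never matched.
    xml_text = re.sub(
        r"<%s(?=[\s>])" % re.escape(root_tag),
        "<%s %s" % (root_tag, NAMESPACE),
        xml_text,
        count=1,
    )
    # Every opening tag of an element that must always end up in an array.
    for tag in array_tags:
        xml_text = re.sub(
            r"<%s(?=[\s/>])" % re.escape(tag),
            "<%s %s" % (tag, FORCE_ARRAY),
            xml_text,
        )
    return xml_text


document = """<MyDocument>
  <MyList>
    <MyElement>
      <Text>Some text...</Text>
      <RecordID>1</RecordID>
    </MyElement>
  </MyList>
</MyDocument>"""

print(add_force_array(document, "MyDocument", ["MyElement"]))
```

The lookahead after the tag name prevents a tag such as MyElementList from being modified when only MyElement is targeted.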

Java approach

Now, let’s assume you’re not afraid of using scripted processors or developing your own custom processor. Then it’s really easy to have a processor doing the same using a Java library like org.json (note that this library is *NOT* Apache friendly in terms of licensing, and that’s why the following code cannot be released with Apache NiFi). Here is an example of a custom processor doing the conversion. And here is a Groovy version for the ExecuteScript processor.

What about arrays with this solution? Guess what… It’s much the same: you have to use a ReplaceText processor before and after the conversion to ensure that arrays are arrays in the JSON output, whatever the number of elements in your input. You might also have to do some other transformations, like removing the namespaces or replacing empty strings ("") with null values (by default, everything will be converted to an empty string although you might want a null record instead).

To force arrays, the easiest approach is to double every tag that should be converted into an array. With the example #2, I transform my input to have:

<MyDocument>
  <MyList>
    <MyElement /><MyElement>
      <Text>Some text...</Text>
      <RecordID>1</RecordID>
    </MyElement>
  </MyList>
</MyDocument>

It’ll give me the following JSON:

{
  "MyDocument" : {
    "MyList" : {
      "MyElement" : [ "", {
        "Text" : "Some text...",
        "RecordID" : 1
      } ]
    }
  }
}

And, then, I can use another ReplaceText processor to remove the unwanted empty strings created by the conversion.
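
As an illustration of this clean-up step, here is a small Python sketch. It swaps the regular expression for a different technique: it works on the parsed JSON instead of the raw text, which is not what the ReplaceText processor does. The function name clean and the empty_as_null flag are hypothetical. Note that it drops every empty string found in an array, so it is only suitable when an empty string in an array can never be a legitimate value.

```python
import json


def clean(value, empty_as_null=True):
    """Drop the "" placeholders from arrays and optionally turn other "" into null."""
    if isinstance(value, dict):
        return {key: clean(item, empty_as_null) for key, item in value.items()}
    if isinstance(value, list):
        # The doubled tag produced an empty string in the array: skip it.
        return [clean(item, empty_as_null) for item in value if item != ""]
    if value == "" and empty_as_null:
        return None
    return value


converted = json.loads("""
{
  "MyDocument" : {
    "MyList" : {
      "MyElement" : [ "", {
        "Text" : "",
        "RecordID" : 1
      } ]
    }
  }
}
""")

print(json.dumps(clean(converted), indent=2))
```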

Conclusion: with both approaches you’ll need to be a bit intrusive in your data to get the expected results. What about performance now?

Benchmark

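I remove the ReplaceText processors from the equation as I usually need the same amount of regular expression work in both cases. I want to focus only on the conversion step itself, using the three options described above: the TransformXML processor with the XSLT file, the custom Java processor, and the Groovy script in the ExecuteScript processor.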

I’ll compare the performance of each option using inputs of different sizes (data generated using a GenerateFlowFile processor) with the default configuration (one thread, no change to run duration, etc.) on my laptop.

Method: I’m generating as much data as possible (it’s always the same file during a single run) using the GenerateFlowFile processor. I wait at least 5 minutes to reach a constant processing rate, then I take the mean over a 5-minute window of constant processing.

For each run, I’m only running the GenerateFlowFile, one of the three processors I’m benchmarking, and the UpdateAttribute (used to only drop the data).

The input data used for the benchmark is a fairly complex XML document with arrays of arrays, lots of elements in the arrays, deeply nested records, etc. To reduce the size of the input, I’m not changing the structure but only removing elements from the arrays. In other words: the schema describing the output data remains the same for each run.

Note that the custom Java/Groovy option loads the full XML document in memory. To process very large XML documents, a streaming approach with another library would certainly be better suited.
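
To show what a streaming approach looks like, here is a minimal sketch in Python, using iterparse from the standard library rather than a Java library. It is not part of the benchmark and the function name iter_records is hypothetical. It emits one flat record per element as soon as its closing tag has been read, instead of building the whole document first.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one dict per <record_tag> element while the document is being read."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == record_tag:
            # On the "end" event all children of the element are available.
            yield {child.tag: child.text for child in element}
            # Free the content of the processed element. The emptied element
            # stays attached to its parent, so some memory is still retained.
            element.clear()


xml_bytes = b"""<MyDocument>
  <MyList>
    <MyElement><Text>Some text...</Text><RecordID>1</RecordID></MyElement>
    <MyElement><Text>Some text...</Text><RecordID>2</RecordID></MyElement>
  </MyList>
</MyDocument>"""

for record in iter_records(io.BytesIO(xml_bytes), "MyElement"):
    print(record)
```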

Here are the results with input data of 5KB, 10KB, 100KB, 500KB and 1000KB. The graph below gives the number of XML files processed per second based on the input size for each solution.

[Figure: number of XML files processed per second as a function of input size, for each solution]

It’s clear that the custom Java processor is the most efficient one. The XSLT option is really nice when you want to do very specific transformations, but it can quickly get slow. Using a generic XSLT file for XML to JSON transformation is easy and convenient but won’t be the most efficient option.

We can also notice that the Groovy option is a little bit less efficient than the Java one, but that’s expected. Nevertheless, the Groovy option provides pretty good performance and does not require building and compiling a custom processor: everything can be done directly from the NiFi UI.

To improve performance, it’s then possible to play with the “run duration” parameter and increase the number of concurrent tasks. Actually, it’s quite easy to reach the I/O limitations of the disks. Using a NiFi cluster and multiple disks for the content repository, it’s really easy to process hundreds of millions of XML documents per day.

If we display the performance ratio based on the file size between the XSLT solution and the Java based solution, we have:

[Figure: performance ratio between the XSLT solution and the Java-based solution as a function of file size]

We can see that with very small files, processing with the Java-based processor is about 13x more efficient than with the XSLT approach. But with files over 100KB, the Java solution is about 26x more efficient. That’s because the NiFi framework does a few things before and after a flow file is processed. When processing thousands of flow files per second, this creates a small overhead that explains the difference.
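
A toy model helps to see why a fixed overhead per flow file makes the ratio grow with the file size. The sketch below is in Python and all its numbers are hypothetical: they are not measurements and are not fitted to the benchmark. It assumes that the time spent on a file is a fixed framework cost plus a conversion cost proportional to the size.

```python
# Hypothetical costs, chosen only to illustrate the shape of the curve.
OVERHEAD_MS = 0.5        # fixed framework cost per flow file
JAVA_MS_PER_KB = 0.01    # conversion cost per KB for the faster option
XSLT_MS_PER_KB = 0.26    # conversion cost per KB for the slower option


def files_per_second(size_kb, cost_ms_per_kb, overhead_ms=OVERHEAD_MS):
    """Throughput when each file costs a fixed overhead plus a size-dependent part."""
    return 1000.0 / (overhead_ms + cost_ms_per_kb * size_kb)


def ratio(size_kb):
    """How many times faster the cheaper option is for a given file size."""
    return files_per_second(size_kb, JAVA_MS_PER_KB) / files_per_second(size_kb, XSLT_MS_PER_KB)


for size_kb in (5, 10, 100, 500, 1000):
    print(size_kb, round(ratio(size_kb), 1))
```

With small files the fixed cost dominates both options, so the ratio stays low. As the files get bigger, the fixed cost becomes negligible and the ratio approaches the ratio of the two conversion costs (26 with these made-up values).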

XML Record Reader

For a few versions now, Apache NiFi has contained record-oriented processors. They provide very powerful means to process record-oriented data. In particular, they allow users to process batches of data instead of processing “per file”. This provides very robust, high-rate processing. As I write this post, there is no reader for XML data yet. However, there is a JIRA for it, and it would provide a few interesting features:

  • By using a schema describing the XML data, it’d remove the need to use ReplaceText processors to handle the “array problem”.
  • It’d give the possibility to merge XML documents together to process much more data at once providing even better performances.

This effort can be tracked under NIFI-4366.
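
To illustrate the first point, here is a minimal Python sketch of a schema-driven conversion. It is not the NiFi XML reader and says nothing about how NIFI-4366 is implemented: the set array_fields is a hypothetical stand-in for what a real Avro schema would declare as array fields, and the function name to_record is made up.

```python
import xml.etree.ElementTree as ET


def to_record(element, array_fields):
    """Convert an element, creating arrays only for the fields listed in array_fields."""
    children = list(element)
    if not children:
        return element.text
    record = {}
    for child in children:
        value = to_record(child, array_fields)
        if child.tag in array_fields:
            # Declared as an array: always a list, even with a single element.
            record.setdefault(child.tag, []).append(value)
        else:
            record[child.tag] = value
    return record


single = ET.fromstring(
    "<MyDocument><MyList><MyElement>"
    "<Text>Some text...</Text><RecordID>1</RecordID>"
    "</MyElement></MyList></MyDocument>"
)

print(to_record(single, array_fields={"MyElement"}))
```

Because the decision comes from the schema and not from the number of elements found in the input, no change to the XML document is needed.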

As usual, feel free to post any comment/question/feedback.

https://gist.github.com/pvillard31/408c6ba3a9b53880c751a35cffa9ccea.js

20 thoughts on “XML data processing with Apache NiFi”

  1. Hi, We are also trying to convert XML data into Avro using XSDs. We get multiple XML message types based on different XSD definitions. Is there a way we can transform XML data using XSDs saved in the schema registry without writing XSLT for each message type? TransformXML processor needs custom XSLTs and EvaluateXPath needs the manual definition of attributes. Why can’t we apply XSDs to transform XML data in NiFI?

    • Hi Srini. With the upcoming version of NiFi 1.7.0 (should be released soon), there will be an XML reader & writer allowing you to use the *Record processors with XML data assuming you can provide the Avro schema corresponding to your data. That will be much easier and more efficient (you can already use it if you build the master branch). There is no way, at the moment, to use XSDs but that would be a nice improvement. Feel free to file a JIRA on the NiFi project (https://issues.apache.org/jira/projects/NIFI).

  2. Hi, I am working on transforming XML and load the values into Database table. How can i do that? Any idea/suggestion/articles will be greatly appreciated. Thank you.

    • Hi, the easiest way is to use NiFi 1.7.0 (to be released tomorrow) that will contain an XML reader/writer allowing you to use the Record processors. In particular, you’ll be able to use the PutDatabaseRecord processor in combination with the XML reader to read the data and send the values into a database. Obviously, if you have complex structures, you might need to use additional record processors to transform your data first.

      • hi, @pvillard31, Thanks for your reply. It means nifi 1.7.0 is coming with lots of processors which will make complex tasks easy. Really eagerly waiting for the new release. 🙂 thank you.

  3. Thanks, Pierre for the update. It is critical for our project and eagerly waiting to try it. It is a much-needed thing for real-time processing with XML event message data.
    Best Regards,
    Srini Alavala

    • Just to be clear – there is no issue at all for processing XML data at the moment using the approaches described in this post. I’ve successfully implemented workflows processing millions of XML files per day and it’s working completely fine in production environments. My comment regarding NiFi 1.7.0 and the XML reader/writer is just that things will be much easier and workflows will require fewer processors to achieve the same goal. Nevertheless, everything can already be done with versions below 1.7.0.

    • Hi pvillard,
      I am using NiFi 1.9.2 at Docker environment and can not find XML Reader/Writer from the processor list. But from your NiFi 1.7x, they should be available. Do you know the reason? Are not they available now?

  4. Hi
    I decided to create a generic solution for converting XML files into table based on an XSD file.
    My code has limitation and do not handle all XSD styles but if you use generic styling all works well.

    The concept is simple, I process the XSD to identify all branches that requires its own table, if it does not its elements and attributes will be part of the parent table.

    You can read my article series and access the code on GitHub
    http://max.bback.se/index.php/2018/06/30/xml-to-tables-csv-with-nifi-and-groovy-part-2-of-2/
    https://github.com/maxbback/nifi-xml

    /Max

  5. Hello,
    My NiFi is Version 1.7.0.3.2.0.0-520.
    I have some problem with TransformXML processor. I have sample XML like this:

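    <?xml version="1.0" encoding="utf-8"?>
    <root xmlns:json="http://json.org/">
      <set>
        <record>one</record>
      </set>
      <set>
        <record>one</record>
        <record>two</record>
        <record>three</record>
      </set>
    </root>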

    When i used ListFile processor to search new file in my directory and send from this processor xml file to TransformXML processor i have error:
    I used library and settings from https://github.com/bramstein/xsltjson

    2018-11-02 12:17:54,958 ERROR [Timer-Driven Process Thread-20] o.a.n.processors.standard.TransformXml TransformXml[id=9ae1b1ab-0166-1000-ffff-ffffabc6df1a] Unable to transform StandardFlowFileRecord[uuid=a60d9978-068f-4025-8ffd-97aea1bf9165,claim=,offset=0,name=n1.xml,size=0] due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from TransformXml[id=9ae1b1ab-0166-1000-ffff-ffffabc6df1a]: java.io.IOException: net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature end of file.: org.apache.nifi.processor.exception.ProcessException: IOException thrown from TransformXml[id=9ae1b1ab-0166-1000-ffff-ffffabc6df1a]: java.io.IOException: net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature end of file.
    org.apache.nifi.processor.exception.ProcessException: IOException thrown from TransformXml[id=9ae1b1ab-0166-1000-ffff-ffffabc6df1a]: java.io.IOException: net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature end of file.
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2906)
    at org.apache.nifi.processors.standard.TransformXml.onTrigger(TransformXml.java:236)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature end of file.
    at org.apache.nifi.processors.standard.TransformXml$2.process(TransformXml.java:263)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2885)
    … 12 common frames omitted
    Caused by: net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature end of file.
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:460)
    at net.sf.saxon.event.Sender.send(Sender.java:171)
    at net.sf.saxon.Controller.transform(Controller.java:1692)
    at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:547)
    at net.sf.saxon.jaxp.TransformerImpl.transform(TransformerImpl.java:179)
    at org.apache.nifi.processors.standard.TransformXml$2.process(TransformXml.java:261)
    … 13 common frames omitted
    Caused by: org.xml.sax.SAXParseException: Premature end of file.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1014)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:440)
    … 18 common frames omitted

    My TransformXML processor had set:
    XSLT file name /home/nifi/xsltjson/conf/xml-to-json.xsl

    What is wrong ?

    I have tried any other processor to XML like Validate and also have same problem.
    Caused by: net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Premature end of file.

      • If I try run shell script, everything is ok.
        [nifi@4gt-nifi-m1 test]$ /usr/jdk64/jdk1.8.0_112/bin/java -jar ../xsltjson/lib/saxon/saxon9.jar n1.xml ../xsltjson/conf/xml-to-json.xsl
        {"root":{"set":[{"record":"one"},{"record":["one","two","three"]}]}}

      • Yes, of course. I created public gists. Links to them are below:

        Input xml (n1.xml):

        <?xml version="1.0" encoding="utf-8"?>
        <root xmlns:json="http://json.org/">
          <set>
            <record>one</record>
          </set>
          <set>
            <record>one</record>
            <record>two</record>
            <record>three</record>
          </set>
        </root>

        Input xml (r.xml):

        <Log>
          <Transaction>
            <StoreID>240041</StoreID>
          </Transaction>
        </Log>

        Xslt:

        <?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:json="http://json.org/"> <xsl:output indent="no" omit-xml-declaration="yes" method="text" encoding="utf-8"/> <xsl:strip-space elements="*"/> <!– XSLTJSON v1.0.93. You can use these parameters to control the output by supplying them to stylesheet. Consult the manual of your XSLT processor for instructions on how to pass parameters to a stylesheet. * debug – Enable or disable the output of the temporary XML tree used to generate JSON output. 
* use-rabbitfish – Output basic JSON with a '@' to indicate XML attributes. * use-badgerfish – Use the BadgerFish (http://badgerfish.ning.com/) convention to output JSON without XML namespaces. * use-rayfish – Use the RayFish (http://onperl.org/blog/onperl/page/rayfish) convention to output JSON without XML namespaces. * use-namespaces – Output XML namespaces according to the BadgerFish convention. * skip-root – Skip the root XML element. * jsonp – Enable JSONP; the JSON output will be prepended with the value of the jsonp parameter and wrapped in parentheses. Credits: Chick Markley (chick@diglib.org) – Octal number & numbers with terminating period. Torben Schreiter (Torben.Schreiter@inubit.com) – Suggestions for skip root and node list. Michael Nilsson – Bug report and unit tests for json:force-array feature. Frank Schwichtenberg – Namespace prefix name bug. Wilson Cheung – Bug report and fix for invalid number serialization. Danny Cohn – Bug report and fix for invalid floating point number serialization. Copyright: 2006-2014, Bram Stein Licensed under the new BSD License. All rights reserved. –> <xsl:param name="debug" as="xs:boolean" select="false()"/> <xsl:param name="use-rabbitfish" as="xs:boolean" select="false()"/> <xsl:param name="use-badgerfish" as="xs:boolean" select="false()"/> <xsl:param name="use-namespaces" as="xs:boolean" select="false()"/> <xsl:param name="use-rayfish" as="xs:boolean" select="false()"/> <xsl:param name="jsonp" as="xs:string" select="''"/> <xsl:param name="skip-root" as="xs:boolean" select="false()"/> <!– If you import or include the stylesheet in your own stylesheet you can use this function to transform any XML node to JSON. 
–> <xsl:function name="json:generate" as="xs:string"> <xsl:param name="input" as="node()"/> <xsl:variable name="json-tree"> <json:object> <xsl:copy-of select="if (not($use-rayfish)) then json:create-node($input, false()) else json:create-simple-node($input)"/> </json:object> </xsl:variable> <xsl:variable name="json-mtree"> <xsl:choose> <xsl:when test="$skip-root"> <xsl:copy-of select="$json-tree/json:object/json:member/json:value/child::node()"/> </xsl:when> <xsl:otherwise> <xsl:copy-of select="$json-tree"/> </xsl:otherwise> </xsl:choose> </xsl:variable> <xsl:variable name="output"> <xsl:choose> <xsl:when test="normalize-space($jsonp)"> <xsl:value-of select="$jsonp"/><xsl:text>(</xsl:text><xsl:apply-templates select="$json-mtree" mode="json"/><xsl:text>)</xsl:text> </xsl:when> <xsl:otherwise> <xsl:text/><xsl:apply-templates select="$json-mtree" mode="json"/><xsl:text/> </xsl:otherwise> </xsl:choose> </xsl:variable> <xsl:sequence select="$output"/> </xsl:function> <!– Template to match the root node so that the stylesheet can also be used on the command line. –> <xsl:template match="/*"> <xsl:choose> <xsl:when test="$debug"> <xsl:variable name="json-tree"> <json:object> <xsl:copy-of select="if (not($use-rayfish)) then json:create-node(., false()) else json:create-simple-node(.)"/> </json:object> </xsl:variable> <debug> <xsl:copy-of select="$json-tree"/> </debug> <xsl:apply-templates select="$json-tree" mode="json"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="json:generate(.)"/> </xsl:otherwise> </xsl:choose> </xsl:template> <!– All methods below are private methods and should not be used standalone. 
–> <xsl:template name="json:build-tree"> <xsl:param name="input" as="node()"/> <json:object> <xsl:copy-of select="if (not($use-rayfish)) then json:create-node($input, false()) else json:create-simple-node($input/child::node())"/> </json:object> </xsl:template> <xsl:function name="json:create-simple-node-member" as="node()"> <xsl:param name="type" as="xs:string"/> <xsl:param name="value"/> <json:member> <json:name><xsl:value-of select="$type"/></json:name> <json:value><xsl:copy-of select="$value"/></json:value> </json:member> </xsl:function> <xsl:function name="json:create-simple-node" as="node()*"> <xsl:param name="node" as="node()"/> <xsl:copy-of select="json:create-simple-node-member('#name', $node/local-name())"/> <xsl:copy-of select="json:create-simple-node-member('#text', $node/child::text())"/> <xsl:variable name="empty-array"> <json:array/> </xsl:variable> <xsl:variable name="children"> <json:array> <xsl:for-each select="$node/@*"> <json:array-value> <json:value> <json:object> <xsl:copy-of select="json:create-simple-node-member('#name', concat('@',./local-name()))"/> <xsl:copy-of select="json:create-simple-node-member('#text', string(.))"/> <xsl:copy-of select="json:create-simple-node-member('#children', $empty-array)"/> </json:object> </json:value> </json:array-value> </xsl:for-each> <xsl:for-each select="$node/child::element()"> <json:array-value> <json:value> <json:object> <xsl:copy-of select="json:create-simple-node(.)"/> </json:object> </json:value> </json:array-value> </xsl:for-each> </json:array> </xsl:variable> <xsl:copy-of select="json:create-simple-node-member('#children', $children)"/> </xsl:function> <xsl:function name="json:create-node" as="node()"> <xsl:param name="node" as="node()"/> <xsl:param name="in-array" as="xs:boolean"/> <xsl:choose> <xsl:when test="$in-array"> <json:array-value> <json:value> <xsl:copy-of select="json:create-children($node)"/> </json:value> </json:array-value> </xsl:when> <xsl:otherwise> <json:member> <xsl:copy-of 
select="json:create-string($node)"/> <json:value> <xsl:copy-of select="json:create-children($node)"/> </json:value> </json:member> </xsl:otherwise> </xsl:choose> </xsl:function> <xsl:function name="json:create-children"> <xsl:param name="node" as="node()"/> <xsl:choose> <xsl:when test="exists($node/child::text()) and count($node/child::node()) eq 1"> <xsl:choose> <xsl:when test="(count($node/namespace::*) gt 0 and $use-namespaces) or count($node/@*[not(../@json:force-array) or count(.|../@json:force-array)=2]) gt 0"> <json:object> <xsl:copy-of select="json:create-namespaces($node)"/> <xsl:copy-of select="json:create-attributes($node)"/> <json:member> <json:name>$</json:name> <json:value><xsl:value-of select="$node"/></json:value> </json:member> </json:object> </xsl:when> <xsl:otherwise> <xsl:copy-of select="json:create-text-value($node)"/> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:when test="exists($node/child::text())"> <xsl:choose> <xsl:when test="(count($node/namespace::*) gt 0 and $use-namespaces) or count($node/@*[not(../@json:force-array) or count(.|../@json:force-array)=2]) gt 0"> <json:object> <xsl:copy-of select="json:create-namespaces($node)"/> <xsl:copy-of select="json:create-attributes($node)"/> <json:member> <json:name>$</json:name> <json:value> <xsl:copy-of select="json:create-mixed-array($node)"/> </json:value> </json:member> </json:object> </xsl:when> <xsl:otherwise> <xsl:copy-of select="json:create-mixed-array($node)"/> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:when test="exists($node/child::node()) or ((count($node/namespace::*) gt 0 and $use-namespaces) or count($node/@*[not(../@json:force-array) or count(.|../@json:force-array)=2]) gt 0)"> <json:object> <xsl:copy-of select="json:create-namespaces($node)"/> <xsl:copy-of select="json:create-attributes($node)"/> <xsl:for-each-group select="$node/child::node()" group-adjacent="local-name()"> <xsl:choose> <xsl:when test="count(current-group()) eq 1 and (not(exists(./@json:force-array)) or 
./@json:force-array eq 'false')"> <xsl:copy-of select="json:create-node(current-group()[1], false())"/> </xsl:when> <xsl:otherwise> <json:member> <json:name><xsl:value-of select="if($use-namespaces) then current-group()[1]/name() else current-group()[1]/local-name()"/></json:name> <json:value> <json:array> <xsl:for-each select="current-group()"> <xsl:copy-of select="json:create-node(.,true())"/> </xsl:for-each> </json:array> </json:value> </json:member> </xsl:otherwise> </xsl:choose> </xsl:for-each-group> </json:object> </xsl:when> </xsl:choose> </xsl:function> <xsl:function name="json:create-mixed-array" as="node()"> <xsl:param name="node" as="node()"/> <json:array> <xsl:for-each select="$node/child::node()"> <json:array-value> <json:value> <xsl:choose> <xsl:when test="self::text()"> <xsl:copy-of select="json:create-text-value(.)"/> </xsl:when> <xsl:otherwise> <json:object> <xsl:copy-of select="json:create-node(.,false())"/> </json:object> </xsl:otherwise> </xsl:choose> </json:value> </json:array-value> </xsl:for-each> </json:array> </xsl:function> <xsl:function name="json:create-text-value" as="node()"> <xsl:param name="node" as="node()"/> <xsl:choose> <xsl:when test="$use-badgerfish"> <json:object> <json:member> <json:name>$</json:name> <json:value> <xsl:value-of select="$node"/> </json:value> </json:member> </json:object> </xsl:when> <xsl:otherwise> <xsl:value-of select="$node"/> </xsl:otherwise> </xsl:choose> </xsl:function> <xsl:function name="json:create-string" as="node()"> <xsl:param name="node" as="node()"/> <xsl:choose> <xsl:when test="$use-namespaces"> <json:name><xsl:value-of select="$node/name()"/></json:name> </xsl:when> <xsl:otherwise> <json:name><xsl:value-of select="$node/local-name()"/></json:name> </xsl:otherwise> </xsl:choose> </xsl:function> <xsl:function name="json:create-attributes" as="node()*"> <xsl:param name="node" as="node()"/> <xsl:for-each select="$node/@*[not(../@json:force-array) or count(.|../@json:force-array)=2]"> <json:member> 
<json:name><xsl:if test="$use-badgerfish or $use-rabbitfish">@</xsl:if><xsl:value-of select="if($use-namespaces) then name() else local-name()"/></json:name> <json:value><xsl:value-of select="."/></json:value> </json:member> </xsl:for-each> </xsl:function> <xsl:function name="json:create-namespaces" as="node()*"> <xsl:param name="node" as="node()"/> <xsl:if test="$use-namespaces"> <xsl:if test="count($node/namespace::*) gt 0"> <json:member> <json:name><xsl:if test="$use-badgerfish or $use-rabbitfish">@</xsl:if>xmlns</json:name> <json:value> <json:object> <xsl:for-each select="$node/namespace::*"> <json:member> <xsl:choose> <xsl:when test="local-name(.) eq ''"> <json:name>$</json:name> </xsl:when> <xsl:otherwise> <json:name><xsl:value-of select="local-name(.)"/></json:name> </xsl:otherwise> </xsl:choose> <json:value><xsl:value-of select="."/></json:value> </json:member> </xsl:for-each> </json:object> </json:value> </json:member> </xsl:if> </xsl:if> </xsl:function> <!– These are output functions that transform the temporary tree to JSON. 
–> <xsl:template match="json:parameter" mode="json"> <xsl:variable name="parameters"><xsl:apply-templates mode="json"/></xsl:variable> <xsl:value-of select="string-join($parameters/parameter, ', ')"/> </xsl:template> <xsl:template match="json:object" mode="json"> <xsl:variable name="members"><xsl:apply-templates mode="json"/></xsl:variable> <parameter> <xsl:text/>{<xsl:text/> <xsl:value-of select="string-join($members/member,',')"/> <xsl:text/>}<xsl:text/> </parameter> </xsl:template> <xsl:template match="json:member" mode="json"> <xsl:text/><member><xsl:apply-templates mode="json"/></member><xsl:text/> </xsl:template> <xsl:function name="json:encode-string" as="xs:string"> <xsl:param name="string" as="xs:string"/> <xsl:sequence select="replace( replace( replace( replace( replace( replace( replace( replace( replace($string, '\\','\\\\'), '/', '\\/'), '&quot;', '\\&quot;'), ' ','\\n'), ' ','\\r'), ' ','\\t'), 'n','\\n'), 'r','\\r'), 't','\\t')"/> </xsl:function> <xsl:template match="json:name" mode="json"> <xsl:text/>"<xsl:value-of select="json:encode-string(.)"/>":<xsl:text/> </xsl:template> <xsl:template match="json:value" mode="json"> <xsl:choose> <xsl:when test="node() and not(text())"> <xsl:apply-templates mode="json"/> </xsl:when> <xsl:when test="text()"> <xsl:choose> <!– A value is considered a string if the following conditions are met: * There is whitespace/formatting around the value of the node. * The value is not a valid JSON number (i.e. '01', '+1', '1.', and '.5' are not valid JSON numbers.) * The value does not equal the any of the following strings: 'false', 'true', 'null'. –> <xsl:when test="normalize-space(.) ne . or not((string(.) castable as xs:integer and not(starts-with(string(.),'+')) and not(starts-with(string(.),'0') and not(. = '0'))) or (string(.) 
castable as xs:decimal and not(starts-with(string(.),'+')) and not(starts-with(.,'-.')) and not(starts-with(.,'.')) and not(starts-with(.,'-0') and not(starts-with(.,'-0.'))) and not(ends-with(.,'.')) and not(starts-with(.,'0') and not(starts-with(.,'0.'))) )) and not(. = 'false') and not(. = 'true') and not(. = 'null')"> <xsl:text/>"<xsl:value-of select="json:encode-string(.)"/>"<xsl:text/> </xsl:when> <xsl:otherwise> <xsl:text/><xsl:value-of select="."/><xsl:text/> </xsl:otherwise> </xsl:choose> </xsl:when> <xsl:otherwise> <xsl:text/>null<xsl:text/> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template match="json:array-value" mode="json"> <xsl:text/><value><xsl:apply-templates mode="json"/></value><xsl:text/> </xsl:template> <xsl:template match="json:array" mode="json"> <xsl:variable name="values"> <xsl:apply-templates mode="json"/> </xsl:variable> <xsl:text/>[<xsl:text/> <xsl:value-of select="string-join($values/value,',')"/> <xsl:text/>]<xsl:text/> </xsl:template> </xsl:stylesheet>
      • OK, I finally resolved my problem: I added a FetchFile processor between the ListFile and TransformXML processors, and TransformXML now outputs JSON 🙂
        As I understand it, ListFile only emits a FlowFile describing each file (its name and path as attributes, with no content), so the XSLT had nothing to read. FetchFile is what loads the file content that TransformXML needs.
        Thanks for the reply. I will now continue designing my project.
