{"meta":{"exported_on":1468438082,"version":"000"},"data":{"posts":[{"title":"Getting Started with Apache Nifi","slug":"getting-started-with-apache-nifi","markdown":"* Update April 8th 2015: The Vagrantfile has been updated to pull the release tag 0.0.2 for stability\n\n# Install options\n\n[Apache NiFi](https://nifi.apache.org) is currently in incubation and so does not have any releases, so to start we have to check out the project and build the code base. A user guide is available on the [nifi website](https://nifi.apache.org/quickstart.html) with requirements for building and running NiFi, mainly Java 7 and Maven 3.0.5+. OpenJDK will work, but currently the unit tests will not pass. A quick tl;dr can be found below.\n\n## Using Vagrant\n\nIf, like me, you like to keep your project dependencies separate, I've created a [Vagrantfile](https://gist.github.com/czobrisky/a1ed32d9429600f4e661#file-vagrantfile) that will get all dependencies and then build the project in the shared /vagrant/ folder. After building, it will start NiFi and you can point your browser to localhost:8080/nifi/ and skip to [Building a Simple Dataflow](#building-dataflow). 
The provisioning of the VM will take about 10-15 minutes depending on your hardware.\n\n<!-- more -->\n\n<script src=\"https://gist.github.com/czobrisky/a1ed32d9429600f4e661.js\"></script>\n\nIf you want to build it yourself, it normally takes 2-6 minutes depending on your system, with parallel builds enabled.\nClone from the Apache NiFi git repo:\n\n\tgit clone http://git-wip-us.apache.org/repos/asf/nifi.git\n\tcd nifi\n\t\t\nOnce you have the source, you first have to build the nar-maven-plugin.\n\n\tcd nifi-nar-maven-plugin\n\tmvn clean install\n\nNow build the rest of the source from the root directory.\n\n\tcd ../nifi\n\tmvn -T C2.0 clean install\n\nNiFi recently updated their build process so it now supports parallel building of components using the -T maven arg to speed up the build.\n\nThe last step is to copy the assembly produced by the build into a working directory and unpack the created tar.gz.\n\n\tmkdir -p ~/nifi-example/\n\tcp nifi-assembly/target/nifi-*-SNAPSHOT-bin.tar.gz ~/nifi-example/\n\tcd ~/nifi-example/\n\ttar -zxf nifi-*-SNAPSHOT-bin.tar.gz\n\nAfter unpacking, cd into the directory and start NiFi in the background using the nifi.sh script in the bin directory.\n\n\tcd nifi-*-SNAPSHOT\n\tbin/nifi.sh start\n\nNow open a browser and point to localhost:8080/nifi to view the NiFi flow UI.\n\n{% img /images/getting-started/nifi-home.png %}\n\n# <a name=\"building-dataflow\"></a>Building your first Apache Nifi dataflow\n\nThe home page for Apache NiFi is a grid with some options at the top of the page. This is where you build your data flows: the configuration of each processor and the relationships between processors. This is saved as a FlowFile and can be saved for later use or imported into another Apache NiFi instance. As an example, we'll fetch an XML document from an RSS feed and then, based on an attribute, move it to a predefined directory.\n\nTo start, drag the processor icon, {% img /images/getting-started/processor.png %} in the menu bar down to the canvas below. 
Select the GetHTTP processor from the list; you can use the search bar on the right side of the Add Processor modal or the Tags on the left to filter the list.\n{% img /images/getting-started/add-processor-highlight.png %}\n\nRight clicking on the processor brings up a dropdown that allows you to configure the processor along with a few other options.\n{% img right /images/getting-started/processor-dropdown.png %}\n\nFrom this selection a \"Configure Processor\" modal opens. Most default settings are fine. You can change the name to \"Fetch from XKCD\" if you would like, but look through the tabs just to see what they contain. Select the properties tab to configure the options of the processor.\n\n{% img /images/getting-started/configure-processor.png %}\n\nThe URL we are going to grab a file from is [XKCD's](http://xkcd.com) RSS feed, [rss.xml](http://xkcd.com/rss.xml). The GetHTTP processor is simple to configure and just needs the URL property to be set to http://xkcd.com/rss.xml. Click on the value across from the URL property and enter http://xkcd.com/rss.xml.\n\nNow drag down another processor, EvaluateXPath. Under the properties for this processor, set the following property-value pairs:\n\n* Destination\t-\tflowfile-attribute\n* Return Type\t-\tauto-detect\n\nNow add the following new properties by clicking the New Property symbol in the upper right of the Configure Processor modal.\n\n* pubDate\t-\tchannel/item[2]/pubDate\n* title \t-\tchannel/item[2]/title\n* link \t\t-\tchannel/item[2]/link\n\n{% img /images/getting-started/config-xpath.png %}\n\nAfter applying the changes, we can now connect the two processors we have. Click the middle of the GetHTTP processor and then drag to the EvaluateXPath processor. A new modal will appear to confirm the relationship; click OK. After connecting the two, the flow looks like this. {% img right /images/getting-started/connected-procs.png %}\n\nWe are almost done, just two more PutFile processors to go. 
Before we add the next two processors, we need to create two directories to hold matched and unmatched files. (This is really optional since the PutFile processor will create the directory by default if it does not exist, but it's easier to do in advance knowing the full path.) If you used the Vagrantfile that I linked to earlier, these directories are already created for you, so you can skip this part. If you did not, just create two directories called matched and unmatched and remember their full paths.\n\nNow add two PutFile processors and change one name to matched, the other to unmatched. Both PutFile processors will have auto-terminated relationships for failure and success. These are checked off on the Settings tab when configuring each processor. {% img /images/getting-started/auto-terminate-hg.png %} For the matched PutFile processor, change the directory property value to the directory you created earlier called matched, with the full path. If you are using the Vagrantfile, the directory is \"/vagrant/matched/\". For the unmatched PutFile processor, do the same thing for the unmatched directory you created, \"/vagrant/unmatched/\" if using the Vagrantfile.\n\nNow wire the last few processors, EvaluateXPath and the two PutFile processors. Make sure to connect the correct relationships from the EvaluateXPath for the matched and unmatched. Lastly, wire the failure relationship for the EvaluateXPath back to itself by left clicking on the processor as you normally would to connect processors, pulling away and then back to the original processor, and then selecting failure from the relationship modal. This allows failures to retry on the processor they failed on, EvaluateXPath in this example.\n\nYour final flow should look similar to this.\n\n{% img /images/getting-started/completed-flow.png %}\n\nAll processors should be stopped and have a red square in their top left. If they have an exclamation mark in a yellow triangle, then there is an issue with the processor. 
Hover over the yellow triangle to get more information. (To see an example, remove one of the auto-terminated relationships from a PutFile processor.)\n\nNow we can run our flow and see what happens! Run the flow by clicking the play button in the flow menu bar.\n\n{% img /images/getting-started/menu-bar-hg.png %}\n\nMake sure you don't have a processor selected or it will only start that one processor. After running, it should pull down the rss.xml file, parse the attributes, and route it to the matched directory via the PutFile processor. You can view the statistics on each processor from the flow, such as In, Out, and Tasks/Time. After it grabs the file, it should look similar to below. You can check the directory you specified in the matched PutFile processor and the file should be there!\n\n{% img /images/getting-started/finished-flow.png %}\n\nI hope this simple walkthrough of creating a flow was helpful and you can start to see the power behind Apache Nifi. In our next post we will show you how to create a custom processor and add it to your flow.","html":"<ul>\n<li>Update April 8th 2015: The Vagrantfile has been updated to pull the release tag 0.0.2 for stability</li>\n</ul>\n\n\n<h1>Install options</h1>\n\n<p><a href=\"https://nifi.apache.org\">Apache NiFi</a> is currently in incubation and so does not have any releases, so to start we have to check out the project and build the code base. A user guide is available on the <a href=\"https://nifi.apache.org/quickstart.html\">nifi website</a> with requirements for building and running NiFi, mainly Java 7 and Maven 3.0.5+. OpenJDK will work, but currently the unit tests will not pass. 
A quick tl;dr can be found below.</p>\n\n<h2>Using Vagrant</h2>\n\n<p>If, like me, you like to keep your project dependencies separate, I’ve created a <a href=\"https://gist.github.com/czobrisky/a1ed32d9429600f4e661#file-vagrantfile\">Vagrantfile</a> that will get all dependencies and then build the project in the shared /vagrant/ folder. After building, it will start NiFi and you can point your browser to localhost:8080/nifi/ and skip to <a href=\"#building-dataflow\">Building a Simple Dataflow</a>. The provisioning of the VM will take about 10-15 minutes depending on your hardware.</p>\n\n<!-- more -->\n\n\n\n\n<script src=\"https://gist.github.com/czobrisky/a1ed32d9429600f4e661.js\"></script>\n\n\n<p>If you want to build it yourself, it normally takes 2-6 minutes depending on your system, with parallel builds enabled.\nClone from the Apache NiFi git repo:</p>\n\n<pre><code>git clone http://git-wip-us.apache.org/repos/asf/nifi.git\ncd nifi\n</code></pre>\n\n<p>Once you have the source, you first have to build the nar-maven-plugin.</p>\n\n<pre><code>cd nifi-nar-maven-plugin\nmvn clean install\n</code></pre>\n\n<p>Now build the rest of the source from the root directory.</p>\n\n<pre><code>cd ../nifi\nmvn -T C2.0 clean install\n</code></pre>\n\n<p>NiFi recently updated their build process so it now supports parallel building of components using the -T maven arg to speed up the build.</p>\n\n<p>The last step is to copy the assembly produced by the build into a working directory and unpack the created tar.gz.</p>\n\n<pre><code>mkdir -p ~/nifi-example/\ncp nifi-assembly/target/nifi-*-SNAPSHOT-bin.tar.gz ~/nifi-example/\ncd ~/nifi-example/\ntar -zxf nifi-*-SNAPSHOT-bin.tar.gz\n</code></pre>\n\n<p>After unpacking, cd into the directory and start NiFi in the background using the nifi.sh script in the bin directory.</p>\n\n<pre><code>cd nifi-*-SNAPSHOT\nbin/nifi.sh start\n</code></pre>\n\n<p>Now open a browser and point to localhost:8080/nifi to view the NiFi flow UI.</p>\n\n<p>{% img /images/getting-started/nifi-home.png %}</p>\n\n<h1><a 
name=\"building-dataflow\"></a>Building your first Apache Nifi dataflow</h1>\n\n<p>The home page for Apache NiFi is a grid with some options at the top of the page. This is where you build your data flows: the configuration of each processor and the relationships between processors. This is saved as a FlowFile and can be saved for later use or imported into another Apache NiFi instance. As an example, we’ll fetch an XML document from an RSS feed and then, based on an attribute, move it to a predefined directory.</p>\n\n<p>To start, drag the processor icon, {% img /images/getting-started/processor.png %} in the menu bar down to the canvas below. Select the GetHTTP processor from the list; you can use the search bar on the right side of the Add Processor modal or the Tags on the left to filter the list.\n{% img /images/getting-started/add-processor-highlight.png %}</p>\n\n<p>Right clicking on the processor brings up a dropdown that allows you to configure the processor along with a few other options.\n{% img right /images/getting-started/processor-dropdown.png %}</p>\n\n<p>From this selection a “Configure Processor” modal opens. Most default settings are fine. You can change the name to “Fetch from XKCD” if you would like, but look through the tabs just to see what they contain. Select the properties tab to configure the options of the processor.</p>\n\n<p>{% img /images/getting-started/configure-processor.png %}</p>\n\n<p>The URL we are going to grab a file from is <a href=\"http://xkcd.com\">XKCD’s</a> RSS feed, <a href=\"http://xkcd.com/rss.xml\">rss.xml</a>. The GetHTTP processor is simple to configure and just needs the URL property to be set to <a href=\"http://xkcd.com/rss.xml\">http://xkcd.com/rss.xml</a>. Click on the value across from the URL property and enter <a href=\"http://xkcd.com/rss.xml\">http://xkcd.com/rss.xml</a>.</p>\n\n<p>Now drag down another processor, EvaluateXPath. 
Under the properties for this processor, set the following property-value pairs:</p>\n\n<ul>\n<li>Destination - flowfile-attribute</li>\n<li>Return Type - auto-detect</li>\n</ul>\n\n\n<p>Now add the following new properties by clicking the New Property symbol in the upper right of the Configure Processor modal.</p>\n\n<ul>\n<li>pubDate - channel/item[2]/pubDate</li>\n<li>title - channel/item[2]/title</li>\n<li>link - channel/item[2]/link</li>\n</ul>\n\n\n<p>{% img /images/getting-started/config-xpath.png %}</p>\n\n<p>After applying the changes, we can now connect the two processors we have. Click the middle of the GetHTTP processor and then drag to the EvaluateXPath processor. A new modal will appear to confirm the relationship; click OK. After connecting the two, the flow looks like this. {% img right /images/getting-started/connected-procs.png %}</p>\n\n<p>We are almost done, just two more PutFile processors to go. Before we add the next two processors, we need to create two directories to hold matched and unmatched files. (This is really optional since the PutFile processor will create the directory by default if it does not exist, but it’s easier to do in advance knowing the full path.) If you used the Vagrantfile that I linked to earlier, these directories are already created for you, so you can skip this part. If you did not, just create two directories called matched and unmatched and remember their full paths.</p>\n\n<p>Now add two PutFile processors and change one name to matched, the other to unmatched. Both PutFile processors will have auto-terminated relationships for failure and success. These are checked off on the Settings tab when configuring each processor. {% img /images/getting-started/auto-terminate-hg.png %} For the matched PutFile processor, change the directory property value to the directory you created earlier called matched, with the full path. If you are using the Vagrantfile, the directory is “/vagrant/matched/”. 
For the unmatched PutFile processor, do the same thing for the unmatched directory you created, “/vagrant/unmatched/” if using the Vagrantfile.</p>\n\n<p>Now wire the last few processors, EvaluateXPath and the two PutFile processors. Make sure to connect the correct relationships from the EvaluateXPath for the matched and unmatched. Lastly, wire the failure relationship for the EvaluateXPath back to itself by left clicking on the processor as you normally would to connect processors, pulling away and then back to the original processor, and then selecting failure from the relationship modal. This allows failures to retry on the processor they failed on, EvaluateXPath in this example.</p>\n\n<p>Your final flow should look similar to this.</p>\n\n<p>{% img /images/getting-started/completed-flow.png %}</p>\n\n<p>All processors should be stopped and have a red square in their top left. If they have an exclamation mark in a yellow triangle, then there is an issue with the processor. Hover over the yellow triangle to get more information. (To see an example, remove one of the auto-terminated relationships from a PutFile processor.)</p>\n\n<p>Now we can run our flow and see what happens! Run the flow by clicking the play button in the flow menu bar.</p>\n\n<p>{% img /images/getting-started/menu-bar-hg.png %}</p>\n\n<p>Make sure you don’t have a processor selected or it will only start that one processor. After running, it should pull down the rss.xml file, parse the attributes, and route it to the matched directory via the PutFile processor. You can view the statistics on each processor from the flow, such as In, Out, and Tasks/Time. After it grabs the file, it should look similar to below. 
You can check the directory you specified in the matched PutFile processor and the file should be there!</p>\n\n<p>{% img /images/getting-started/finished-flow.png %}</p>\n\n<p>I hope this simple walk through of creating a flow was helpful and you can start to see the power behind Apache Nifi. In our next post we will show you how to create a custom processor and add it to your flow.</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1421370483000,"created_by":1,"updated_at":1421370483000,"updated_by":1,"published_at":1421370483000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"how-tos"}]},{"title":"Navigating Apache Nifi","slug":"navigating-apache-nifi","markdown":"The main user interface of Apache Nifi is their web ui. This makes it much more enjoyable to use than a command line interface, but can still be hard to grasp quickly or know where certain things are when you first start using it. To help reduce the learning curve, we are going through and breaking down the web ui through a video showing how-to navigate Apache Nifi.\n\nOur main points will be the menu bar and building a data flow with some tips and tricks along the way. The data flow will be based off of our previous post [Getting Started with Apache Nifi]({{ site.url }}/getting-started-with-apache-nifi/).\n\n<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/FgTGAWLC170\" frameborder=\"0\" allowfullscreen></iframe>","html":"<p>The main user interface of Apache Nifi is their web ui. This makes it much more enjoyable to use than a command line interface, but can still be hard to grasp quickly or know where certain things are when you first start using it. 
To help reduce the learning curve, we are going through and breaking down the web ui through a video showing how-to navigate Apache Nifi.</p>\n\n<p>Our main points will be the menu bar and building a data flow with some tips and tricks along the way. The data flow will be based off of our previous post <a href=\"{{%20site.url%20}}/getting-started-with-apache-nifi/\">Getting Started with Apache Nifi</a>.</p>\n\n<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/FgTGAWLC170\" frameborder=\"0\" allowfullscreen></iframe>\n\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1422417500000,"created_by":1,"updated_at":1422417500000,"updated_by":1,"published_at":1422417500000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"videos"},{"name":"how-tos"}]},{"title":"What's in Nifi's first release: Nifi 0.0.1","slug":"whats-in-nifis-first-release-nifi-0-dot-0-1","markdown":"Apache Nifi just had their first release, [0.0.1](https://nifi.apache.org/download.html). It showed good movement forward with 75 bug fixes, 24 improvements, and 2 new features. A list of [Release Notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12329078) is on their Jira page. With the release, it gives the users source downloads to start from, which is great, but lets look at the highlights of this release.\n\n## Apache Nifi Release 0.0.1 Highlights\n\n* One of the biggest changes was the directory structure to allow parrallel maven builds. This took the build time down from 20-30 minutes to 3-7 minutes. [Nifi Jira 169](https://issues.apache.org/jira/browse/NIFI-169)\n* The improvement of the assembly process also allowed for more stream lined packaging. 
[Nifi Jira 228](https://issues.apache.org/jira/browse/NIFI-228)\n\nOutside of the build process, most of the updates were cleaning up the code base to make it more developer friendly and fixing alot of small bugs;\n\n* Updating libraries to their most current version\n* Adding documentation/users guides\n* Guarantee builds on varying OSes: OSX, Linux, Windows.\n\nI'm pretty excited to see what's in the next release, but also excited that I don't have to build nifi to use it. Release binaries are available on the [nifi website](https://nifi.apache.org/download.html).","html":"<p>Apache Nifi just had their first release, <a href=\"https://nifi.apache.org/download.html\">0.0.1</a>. It showed good movement forward with 75 bug fixes, 24 improvements, and 2 new features. A list of <a href=\"https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12329078\">Release Notes</a> is on their Jira page. With the release, it gives the users source downloads to start from, which is great, but lets look at the highlights of this release.</p>\n\n<h2>Apache Nifi Release 0.0.1 Highlights</h2>\n\n<ul>\n<li>One of the biggest changes was the directory structure to allow parrallel maven builds. This took the build time down from 20-30 minutes to 3-7 minutes. <a href=\"https://issues.apache.org/jira/browse/NIFI-169\">Nifi Jira 169</a></li>\n<li>The improvement of the assembly process also allowed for more stream lined packaging. 
<a href=\"https://issues.apache.org/jira/browse/NIFI-228\">Nifi Jira 228</a></li>\n</ul>\n\n\n<p>Outside of the build process, most of the updates were cleaning up the code base to make it more developer friendly and fixing alot of small bugs;</p>\n\n<ul>\n<li>Updating libraries to their most current version</li>\n<li>Adding documentation/users guides</li>\n<li>Guarantee builds on varying OSes: OSX, Linux, Windows.</li>\n</ul>\n\n\n<p>I’m pretty excited to see what’s in the next release, but also excited that I don’t have to build nifi to use it. Release binaries are available on the <a href=\"https://nifi.apache.org/download.html\">nifi website</a>.</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1423086897000,"created_by":1,"updated_at":1423086897000,"updated_by":1,"published_at":1423086897000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"release"}]},{"title":"Apache Nifi: What Processors are there?","slug":"apache-nifi-processors","markdown":"** Includes all processors through release 0.7.0 **\n\nI looked around at what can be done with Apache NiFi and didn't notice a list of processors without looking at the code or building the project. I think a list of available processors, the work horse of Apache Nifi, would greatly help decide if it is right for certain needs. So, I went into the usage guide in the Apache Nifi UI and pulled a list of processors and a quick description for those who want to know what possibilities there are before getting into nifi itself!\n\n# List of processors\nWith new releases of Nifi, the number of processors have increased from the original 53 to 154!\nHere is a list of all 154 processors, listed alphabetically, that are currently in Apache Nifi as of the most rescent release. Each one links to a description of the processor further down. 
The Usage documentation available in the web ui has much more detail about each processor, it's properties, modifiable attributes, and relationships and each processor has it's own page in the UI, so here is just a quick overview. Again, this content is taken directly from Nifi's Usage guide in their web UI and all credit/rights belong to them under the Apache 2.0 License.\n\nNifi has improved their documentation, which was originally only available when running apache nifi. The documentation now is produced through the build process and has been added to [their website](https://nifi.apache.org/docs.html). So if you need more information or more detail about each processor just check there.\n\n<!--more -->\n* [AttributesToJSON](#AttributesToJSON)\n* [Base64EncodeContent](#Base64EncodeContent)\n* [CompressContent](#CompressContent)\n* [ConsumeAMQP](#ConsumeAMQP)\n* [ConsumerJMS](#ConsumeJMS)\n* [ConsumeKafka](#ConsumeKafka)\n* [ConsumeMQTT](#ConsumeMQTT)\n* [ControlRate](#ControlRate)\n* [ConvertAvroSchema](#ConvertAvroSchema)\n* [ConvertAvroToJSON](#ConvertAvroToJSON)\n* [ConvertCharacterSet](#ConvertCharacterSet)\n* [ConvertCSVToAvro](#ConvertCSVToAvro)\n* [ConvertJSONToAvro](#ConvertJSONToAvro)\n* [ConvertCSVToSQL](#ConvertCSVToSQL) DEPRECATED - no longer available as of 0.5.0\n* [ConvertJSONToSQL](#ConvertJSONToSQL)\n* [CreateHadoopSequenceFile](#CreateHadoopSequenceFile)\n* [DebugFlow](#DebugFlow)\n* [DeleteS3Object](#DeleteS3Object)\n* [DeleteSQS](#DeleteSQS)\n* [DetectDuplicate](#DetectDuplicate)\n* [DistributeLoad](#DistributeLoad)\n* [DuplicateFlowFile](#DuplicateFlowFile)\n* [EncryptContent](#EncryptContent)\n* [EvaluateJSONPath](#EvaluateJSONPath)\n* [EvaluateRegularExpression](#EvaluateRegularExpression) \n* [ExtractMediaMetadata](#ExtractMediaMetadata)\nDEPRECATED-Use [ExtractText](#ExtractText)\n* [EvaluateXPath](#EvaluateXPath)\n* [EvaluateXQuery](#EvaluateXQuery)\n* [ExecuteFlumeSink](#ExecuteFlumeSink)\n* [ExecuteFlumeSource](#ExecuteFlumeSource)\n* 
[ExecuteProcess](#ExecuteProcess)\n* [ExecuteScript](#ExecuteScript)\n* [ExecuteSQL](#ExecuteSQL)\n* [ExecuteStreamCommand](#ExecuteStreamCommand)\n* [ExtractAvroMetadata](#ExtractAvroMetadata)\n* [ExtractHL7Attributes](#ExtractHL7Attributes)\n* [ExtractImageMetadata](#ExtractImageMetadata)\n* [ExtractText](#ExtractText)\n* [FetchDistributedMapCache](#FetchDistributedMapCache)\n* [FetchElasticSearch](#FetchElasticSearch)\n* [FetchFile](#FetchFile)\n* [FetchHDFS](#FetchHDFS)\n* [FetchS3Object](#FetchS3Object)\n* [FetchSFTP](#FetchSFTP)\n* [GenerateFlowFile](#GenerateFlowFile)\n* [GeoEnrichIP](#GeoEnrichIP)\n* [GetAzureEventHub](#GetAzureEventHub)\n* [GetCouchbaseKey](#GetCouchbaseKey)\n* [GetDynamoDB](#GetDynamoDB)\n* [GetFile](#GetFile)\n* [GetFTP](#GetFTP)\n* [GetHBase](#GetHBase)\n* [GetHDFS](#GetHDFS)\n* [GetHDFSEvents](#GetHDFSEvents)\n* [GetHDFSSequenceFile](#GetHDFSSequenceFile)\n* [GetHTMLElement](#GetHTMLElement)\n* [GetHTTP](#GetHTTP)\n* [GetJMSQueue](#GetJMSQueue)\n* [GetJMSTopic](#GetJMSTopic)\n* [GetKafka](#GetKafka)\n* [GetMongo](#GetMongo)\n* [GetSFTP](#GetSFTP)\n* [GetSNMP](#GetSNMP)\n* [GetSolr](#GetSolr)\n* [GetSplunk](#GetSplunk)\n* [GetSQS](#GetSQS)\n* [GetTwitter](#GetTwitter)\n* [HandleHttpRequest](#HandleHttpRequest)\n* [HandleHttpResponse](#HandleHttpResponse)\n* [HashAttribute](#HashAttribute)\n* [HashContent](#HashContent)\n* [IdentifyMimeType](#IdentifyMimeType)\n* [InferAvroShema](#InferAvroShema)\n* [InvokeHTTP](#InvokeHTTP)\n* [InvokeScriptedProcessor](#InvokeScriptedProcessor)\n* [JoltTransformJSON](#JoltTransformJSON)\n* [ListenHTTP](#ListenHTTP)\n* [ListenLumberjack](#ListenLumberjack)\n* [ListenRELP](#ListenRELP)\n* [ListenSyslog](#ListenSyslog)\n* [ListenTCP](#ListenTCP)\n* [ListenUDP](#ListenUDP)\n* [ListFile](#ListFile)\n* [ListHDFS](#ListHDFS)\n* [ListS3](#ListS3)\n* [ListSFTP](#ListSFTP)\n* [LogAttribute](#LogAttribute)\n* [MergeContent](#MergeContent)\n* [ModifyBytes](#ModifyBytes)\n* [ModifyHTMLElement](#ModifyHTMLElement)\n* 
[MonitorActivity](#MonitorActivity)\n* [ParseSyslog](#ParseSyslog)\n* [PostHTTP](#PostHTTP)\n* [PublishAMQP](#PublishAMQP)\n* [PublishJMS](#PublishJMS)\n* [PublishKafka](#PublishKafka)\n* [PublishMQTT](#PublishMQTT)\n* [PutAzureEventHub](#PutAzureEventHub)\n* [PutCassandraQL](#PutCassandraQL)\n* [PutCouchbaseKey](#PutCouchbaseKey)\n* [PutDistributedMapCache](#PutDistributedMapCache)\n* [PutDynamoDB](#PutDynamoDB)\n* [PutElasticsearch](#PutElasticsearch)\n* [PutEmail](#PutEmail)\n* [PutFile](#PutFile)\n* [PutFTP](#PutFTP)\n* [PutHBaseCell](#PutHBaseCell)\n* [PutHBaseJSON](#PutHBaseJSON)\n* [PutHDFS](#PutHDFS)\n* [PutHiveQL](#PutHiveQL)\n* [PutHTMLElement](#PutHTMLElement)\n* [PutJMS](#PutJMS)\n* [PutKafka](#PutKafka)\n* [PutKinesisFirehose](#PutKinesisFirehose)\n* [PutLambda](#PutLambda)\n* [PutMongo](#PutMongo)\n* [PutRiemann](#PutRiemann)\n* [PutS3Object](#PutS3Object)\n* [PutSlack](#PutSlack)\n* [PutSFTP](#PutSFTP)\n* [PutSNS](#PutSNS)\n* [PutSolrContentStream](#PutSolrContentStream)\n* [PutSplunk](#PutSplunk)\n* [PutSQL](#PutSQL)\n* [PutSQS](#PutSQS)\n* [PutSyslog](#PutSyslog)\n* [PutTCP](#PutTCP)\n* [PutUDP](#PutUDP)\n* [QueryCassandra](#QueryCassandra)\n* [QueryDatabaseTable](#QueryDatabaseTable)\n* [ReplaceText](#ReplaceText)\n* [ReplaceTextWithMapping](#ReplaceTextWithMapping)\n* [ResizeImage](#ResizeImage)\n* [RouteHL7](#RouteHL7)\n* [RouteOnAttribute](#RouteOnAttribute)\n* [RouteOnContent](#RouteOnContent)\n* [RouteText](#RouteText)\n* [ScanAttribute](#ScanAttribute)\n* [ScanContent](#ScanContent)\n* [SelectHiveQL](#SelectHiveQL)\n* [SegmentContent](#SegmentContent)\n* [SetSNMP](#SetSNMP)\n* [SplitAvro](#SplitAvro)\n* [SplitContent](#SplitContent)\n* [SpringContextProcessor](#SpringContextProcessor)\n* [SplitJson](#SplitJson)\n* [SplitText](#SplitText)\n* [SplitXML](#SplitXML)\n* [StoreInKiteDataset](#StoreInKiteDataset)\n* [TailFile](#TailFile)\n* [TransformXML](#TransformXML)\n* [UnpackContent](#UnpackContent)\n* [UpdateAttribute](#UpdateAttribute)\n* 
[ValidateXML](#ValidateXML)\n* [YandexTranslate](#YandexTranslate)\n\n### <a name=\"AttributesToJSON\"></a>AttributesToJSON\nGenerates a JSON representation of the input FlowFile Attributes. The resulting JSON can be written to either a new Attribute 'JSONAttributes' or written to the FlowFile as content.\n\n### <a name=\"Base64EncodeContent\"></a>Base64EncodeContent\nThis processor base64 encodes FlowFile content, or decodes FlowFile content from base64.\n\n### <a name=\"CompressContent\"></a>CompressContent\nThis processor compresses or decompresses the contents of FlowFiles using a user-specified compression algorithm and updates the mime.type attribute as appropriate\n\n### <a name=\"ConsumeAMQP\"></a>ConsumeAMQP\nConsumes AMQP Message transforming its content to a FlowFile and transitioning it to 'success' relationship\n\n### <a name=\"ConsumeJMS\"></a>ConsumeJMS\nConsumes JMS Message of type BytesMessage or TextMessage transforming its content to a FlowFile and transitioning it to 'success' relationship.\n\n### <a name=\"ConsumeKafka\"></a>ConsumeKafka\nThis Processors polls Apache Kafka for data using KafkaConsumer API available with Kafka 0.9+. When a message is received from Kafka, this Processor emits a FlowFile where the content of the FlowFile is the value of the Kafka message.\n\n### <a name=\"ConsumeMQTT\"></a>ConsumeMQTT\nSubscribes to a topic and receives messages from an MQTT broker\n\n### <a name=\"ControlRate\"></a>ControlRate\nThis processor controls the rate at which data is transferred to follow-on processors.\n\n### <a name=\"ConvertAvroSchema\"></a>ConvertAvroSchema\nConvert records from one Avro schema to another, including support for flattening and simple type conversions.\n\nThis processor is used to convert data between two Avro formats, such as those coming from the ConvertCSVToAvro or ConvertJSONToAvro processors. The input and output content of the flow files should be Avro data files. 
The processor includes support for the following basic type conversions:\n\n* Anything to String, using the data's default String representation\n* String types to numeric types int, long, double, and float\n* Conversion to and from optional Avro types\n\nIn addition, fields can be renamed or unpacked from a record type by using the dynamic properties.\n\n### <a name=\"ConvertAvroToJSON\"></a>ConvertAvroToJSON\nConverts a Binary Avro record into a JSON object. This processor provides a direct mapping of an Avro field to a JSON field, such that the resulting JSON will have the same hierarchical structure as the Avro document. Note that the Avro schema information will be lost, as this is not a translation from binary Avro to JSON formatted Avro. The output JSON is encoded using the UTF-8 encoding. If an incoming FlowFile contains a stream of multiple Avro records, the resultant FlowFile will contain a JSON Array containing all of the Avro records.\n\n### <a name=\"ConvertCharacterSet\"></a>ConvertCharacterSet\nThis processor converts a FlowFile's content from one character set to another.\n\n### <a name=\"ConvertCSVToAvro\"></a>ConvertCSVToAvro\nConverts CSV files to Avro according to an Avro Schema\n\n### <a name=\"ConvertJSONToAvro\"></a>ConvertJSONToAvro\nConverts JSON files to Avro according to an Avro Schema\n\n### <a name=\"ConvertCSVToSQL\"></a>ConvertCSVToSQL - DEPRECATED as of 0.5.0\nConverts a JSON-formatted FlowFile into an UPDATE or INSERT SQL statement. The incoming FlowFile is expected to be \"flat\" JSON message, meaning that it consists of a single JSON element and each field maps to a simple type. If a field maps to a JSON object, that JSON object will be interpreted as Text. If the input is an array of JSON elements, each element in the array is output as a separate FlowFile to the 'sql' relationship. 
Upon successful conversion, the original FlowFile is routed to the 'original' relationship and the SQL is routed to the 'sql' relationship.\n\n### <a name=\"ConvertJSONToSQL\"></a>ConvertJSONToSQL\nConverts a JSON-formatted FlowFile into an UPDATE or INSERT SQL statement. The incoming FlowFile is expected to be a \"flat\" JSON message, meaning that it consists of a single JSON element and each field maps to a simple type. If a field maps to a JSON object, that JSON object will be interpreted as Text. If the input is an array of JSON elements, each element in the array is output as a separate FlowFile to the 'sql' relationship. Upon successful conversion, the original FlowFile is routed to the 'original' relationship and the SQL is routed to the 'sql' relationship.\n\n### <a name=\"CreateHadoopSequenceFile\"></a>CreateHadoopSequenceFile\nThis processor is used to create a Hadoop Sequence File, which essentially is a file of key/value pairs. The key will be a file name and the value will be the flow file content. The processor will take either a merged (a.k.a. packaged) flow file or a singular flow file. Historically, this processor handled the merging by type and size or time prior to creating a SequenceFile output; it no longer does this. If creating a SequenceFile that contains multiple files of the same type is desired, precede this processor with a RouteOnAttribute processor to segregate files of the same type and follow that with a MergeContent processor to bundle up files. If the type of files is not important, just use the MergeContent processor. When using the MergeContent processor, the following Merge Formats are supported by this processor:\n\n* TAR\n* ZIP\n* FlowFileStream v3\n\nThe created SequenceFile is named the same as the incoming FlowFile with the suffix '.sf'. 
For incoming FlowFiles that are bundled, the keys in the SequenceFile are the individual file names, the values are the contents of each file.\nNOTE: The value portion of a key/value pair is loaded into memory. While there is a max size limit of 2GB, this could cause memory issues if there are too many concurrent tasks and the flow file sizes are large.\n\n### <a name=\"DebugFlow\"></a>DebugFlow\nThe DebugFlow processor aids testing and debugging the FlowFile framework by allowing various responses to be explicitly triggered in response to the receipt of a FlowFile or a timer event without a FlowFile if using timer or cron based scheduling. It can force responses needed to exercise or test various failure modes that can occur when a processor runs.\n\n### <a name=\"DeleteDynamoDB\"></a>DeleteDynamoDB\nDeletes a document from DynamoDB based on hash and range key. The key can be string or number. The request requires all the primary keys for the operation (hash or hash and range key)\n\n### <a name=\"DeleteS3Object\"></a>DeleteS3Object\nDeletes FlowFiles on an Amazon S3 Bucket. If attempting to delete a file that does not exist, FlowFile is routed to success.\n\n### <a name=\"DeleteSQS\"></a>DeleteSQS\nDeletes a message from an Amazon Simple Queuing Service Queue\n\n### <a name=\"DetectDuplicate\"></a>DetectDuplicate\nThis processor detects duplicate data by examining flow file attributes, thus allowing the user to configure what it means for two FlowFiles to be considered \"duplicates\". This processor does not read the contents of a flow file, and is typically preceded by another processor which computes a value based on the flow file content and adds that value to the flow file's attributes; e.g. HashContent. 
Because this Processor needs to be able to work within a NiFi cluster, it makes use of a distributed cache service to determine whether or not the data has been seen previously.\n\nIf the processor is to be run on a standalone instance of NiFi, that instance should have both a DistributedMapCacheClient and a DistributedMapCacheServer configured in its controller-services.xml file.\n\n### <a name=\"DistributeLoad\"></a>DistributeLoad\nThis processor distributes FlowFiles to downstream processors based on a distribution strategy. The user may select the strategy \"round robin\", the strategy \"next available\", or \"load distribution service\". If using the round robin strategy, the default is to assign each destination (i.e., relationship) a weighting of 1 (evenly distributed). However, the user may add optional properties to change this weighting. When adding a property, the name must be a positive integer between 1 and the number of relationships (inclusive). For example, if Number of Relationships has a value of 8 and a property is added with the name 5 and the value 10, then relationship 5 (among the 8) will receive 10 FlowFiles in each iteration instead of 1. All other relationships will receive 1 FlowFile in each iteration.\n\n### <a name=\"DuplicateFlowFile\"></a>DuplicateFlowFile\nIntended for load testing, this processor will create the configured number of copies of each incoming FlowFile\n\n### <a name=\"EncryptContent\"></a>EncryptContent\nEncrypts or Decrypts a FlowFile using either symmetric encryption with a password and randomly generated salt, or asymmetric encryption using a public and secret key.\n\n### <a name=\"EvaluateJsonPath\"></a>EvaluateJsonPath\nEvaluates one or more JsonPath expressions against the content of a FlowFile. The results of those expressions are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. 
JsonPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is flowfile-attribute; otherwise, the property name is ignored). The value of the property must be a valid JsonPath expression. If the JsonPath evaluates to a JSON array or JSON object and the Return Type is set to 'scalar' the FlowFile will be unmodified and will be routed to failure. A Return Type of JSON can return scalar values if the provided JsonPath evaluates to the specified value and will be routed as a match. If Destination is 'flowfile-content' and the JsonPath does not evaluate to a defined path, the FlowFile will be routed to 'unmatched' without having its contents modified. If Destination is flowfile-attribute and the expression matches nothing, attributes will be created with empty strings as the value, and the FlowFile will always be routed to 'matched'.\n\n### <a name=\"EvaluateRegularExpression\"></a>EvaluateRegularExpression\nWARNING: This has been deprecated and will be removed in 0.2.0. Use ExtractText instead.\n\n### <a name=\"EvaluateXPath\"></a>EvaluateXPath\nThis processor evaluates one or more XPaths against the content of a FlowFile. The results of those XPaths are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. XPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is flowfile-attribute; otherwise, the property name is ignored). The value of the property must be a valid XPath expression. If the XPath evaluates to more than one node and the Return Type is set to 'nodeset' (either directly, or via 'auto-detect' with a Destination of 'flowfile-content'), the FlowFile will be unmodified and will be routed to failure. 
If the XPath does not evaluate to a Node, the FlowFile will be routed to 'unmatched' without having its contents modified. If Destination is flowfile-attribute and the expression matches nothing, attributes will be created with empty strings as the value, and the FlowFile will always be routed to 'matched'.\n\n### <a name=\"EvaluateXQuery\"></a>EvaluateXQuery\nThis processor evaluates one or more XQueries against the content of a FlowFile. The results of those XQueries are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. XQueries are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is 'flowfile-attribute'; otherwise, the property name is ignored). The value of the property must be a valid XQuery. If the XQuery returns more than one result, new attributes or FlowFiles (for Destinations of 'flowfile-attribute' or 'flowfile-content' respectively) will be created for each result (attributes will have a '.n' one-up number appended to the specified attribute name). If any provided XQuery returns a result, the FlowFile(s) will be routed to 'matched'. If no provided XQuery returns a result, the FlowFile will be routed to 'unmatched'. If the Destination is 'flowfile-attribute' and the XQueries match nothing, no attributes will be applied to the FlowFile.\n\n### <a name=\"ExecuteFlumeSink\"></a>ExecuteFlumeSink\nThis processor executes a Flume sink. Each input FlowFile is converted into a Flume Event for processing by the sink.\n\n### <a name=\"ExecuteFlumeSource\"></a>ExecuteFlumeSource\nExecutes a Flume source. Each Flume Event is sent to the success relationship as a FlowFile.\n\n### <a name=\"ExecuteProcess\"></a>ExecuteProcess\nRuns an operating system command specified by the user and writes the output of that command to a FlowFile. 
If the command is expected to be long-running, the Processor can output the partial data on a specified interval. When this option is used, the output is expected to be in textual format, as it typically does not make sense to split binary data on arbitrary time-based intervals.\n\n### <a name=\"ExecuteScript\"></a>ExecuteScript\nExperimental - Executes a script given the flow file and a process session. The script is responsible for handling the incoming flow file (for example, transferring it to SUCCESS or removing it) as well as any flow files created by the script. If the handling is incomplete or incorrect, the session will be rolled back. Experimental: Impact of sustained usage not yet verified.\n\n### <a name=\"ExecuteSQL\"></a>ExecuteSQL\nExecutes the provided SQL select query. The query result will be converted to Avro format. Streaming is used, so arbitrarily large result sets are supported.\n\n### <a name=\"ExecuteStreamCommand\"></a>ExecuteStreamCommand\nThis processor executes an external command on the contents of a FlowFile, and creates a new FlowFile with the results of the command.\n\n### <a name=\"ExtractAvroMetadata\"></a>ExtractAvroMetadata\nExtracts metadata from the header of an Avro datafile.\n\n### <a name=\"ExtractHL7Attributes\"></a>ExtractHL7Attributes\nExtracts information from an HL7 (Health Level 7) formatted FlowFile and adds the information as FlowFile Attributes. The attributes are named as <Segment Name> <dot> <Field Index>. If the segment is repeating, the naming will be <Segment Name> <underscore> <Segment Index> <dot> <Field Index>. For example, we may have an attribute named \"MSH.12\" with a value of \"2.1\" and an attribute named \"OBX_11.3\" with a value of \"93000^CPT4\".\n\n### <a name=\"ExtractImageMetadata\"></a>ExtractImageMetadata\nExtracts the image metadata from flowfiles containing images. This processor relies on this metadata extractor library https://github.com/drewnoakes/metadata-extractor. 
It extracts a long list of metadata types including but not limited to EXIF, IPTC, XMP and Photoshop fields. For the full list visit the library's website. NOTE: The library being used loads the images into memory, so extremely large images may cause problems.\n\n### <a name=\"ExtractMediaMetadata\"></a>ExtractMediaMetadata\nExtracts the content metadata from flowfiles containing audio, video, image, and other file types. This processor relies on the Apache Tika project for file format detection and parsing. It extracts a long list of metadata types for media files including audio, video, and print media formats. NOTE: the attribute names and content extracted may vary across upgrades because parsing is performed by the external Tika tools, which in turn depend on other projects for metadata extraction. For more details and the list of supported file types, visit the library's website at http://tika.apache.org/.\n\n### <a name=\"ExtractText\"></a>ExtractText\nEvaluates one or more Regular Expressions against the content of a FlowFile. The results of those Regular Expressions are assigned to FlowFile Attributes. Regular Expressions are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed. The first capture group, if any is found, will be placed into that attribute name. All capture groups, including the matching string sequence itself, will also be provided at that attribute name with an index value appended. The value of the property must be a valid Regular Expression with one or more capturing groups. If the Regular Expression matches more than once, only the first match will be used. If any provided Regular Expression matches, the FlowFile(s) will be routed to 'matched'. 
If no provided Regular Expression matches, the FlowFile will be routed to 'unmatched' and no attributes will be applied to the FlowFile.\n\n### <a name=\"FetchDistributedMapCache\"></a>FetchDistributedMapCache\nComputes a cache key from FlowFile attributes for each incoming FlowFile, and fetches the value from the Distributed Map Cache associated with that key. The incoming FlowFile's content is replaced with the binary data received from the Distributed Map Cache. If there is no value stored under that key then the flow file will be routed to 'not-found'. Note that the processor will always attempt to read the entire cached value into memory before placing it in its destination. This could be problematic if the cached value is very large.\n\n### <a name=\"FetchElasticSearch\"></a>FetchElasticSearch\nRetrieves a document from Elasticsearch using the specified connection properties and the identifier of the document to retrieve. If the cluster has been configured for authorization and/or secure transport (SSL/TLS) and the Shield plugin is available, secure connections can be made. This processor supports Elasticsearch 2.x clusters.\n\n### <a name=\"FetchFile\"></a>FetchFile\nReads the contents of a file from disk and streams it into the contents of an incoming FlowFile. Once this is done, the file is optionally moved elsewhere or deleted to help keep the file system organized.\n\n### <a name=\"FetchHDFS\"></a>FetchHDFS\nRetrieves a file from HDFS. The content of the incoming FlowFile is replaced by the content of the file in HDFS. 
The file in HDFS is left intact without any changes being made to it.\n\n### <a name=\"FetchS3Object\"></a>FetchS3Object\nRetrieves the contents of an S3 Object and writes it to the content of a FlowFile.\n\n### <a name=\"FetchSFTP\"></a>FetchSFTP\nFetches the content of a file from a remote SFTP server and overwrites the contents of an incoming FlowFile with the content of the remote file.\n\n### <a name=\"GenerateFlowFile\"></a>GenerateFlowFile\nThis processor creates FlowFiles of random data to be used for load testing purposes.\n\n### <a name=\"GeoEnrichIP\"></a>GeoEnrichIP\nLooks up geolocation information for an IP address and adds the geo information to FlowFile attributes. The geo data is provided as a MaxMind database. The attribute that contains the IP address to look up is provided by the 'IP Address Attribute' property. If the name of the attribute provided is 'X', then the attributes added by enrichment will take the form X.geo.<fieldName>.\n\n### <a name=\"GetAzureEventHub\"></a>GetAzureEventHub\nReceives messages from a Microsoft Azure Event Hub, writing the contents of the Azure message to the content of the FlowFile.\n\n### <a name=\"GetCouchbaseKey\"></a>GetCouchbaseKey\nGets a document from Couchbase Server via Key/Value access. The ID of the document to fetch may be supplied by setting the <Document Id> property. NOTE: if the Document Id property is not set, the contents of the FlowFile will be read to determine the Document Id, which means that the contents of the entire FlowFile will be buffered in memory.\n\n### <a name=\"GetDynamoDB\"></a>GetDynamoDB\nRetrieves a document from DynamoDB based on hash and range key. The key can be string or number. For any get request all the primary keys are required (hash or hash and range based on the table keys). A Json Document ('Map') attribute of the DynamoDB item is read into the content of the FlowFile.\n\n### <a name=\"GetFile\"></a>GetFile\nThis processor obtains FlowFiles from a local directory. 
NiFi will need at least read permissions on the files it will pull; otherwise it will ignore them.\n\n### <a name=\"GetFTP\"></a>GetFTP\nThis processor fetches files from an FTP server and creates FlowFiles from them.\n\n### <a name=\"GetHBase\"></a>GetHBase\nThis Processor polls HBase for any records in the specified table. The processor keeps track of the timestamp of the cells that it receives, so that as new records are pushed to HBase, they will automatically be pulled. Each record is output in JSON format, as {\"row\": \"<row key>\", \"cells\": { \"<column 1 family>:<column 1 qualifier>\": \"<cell 1 value>\", \"<column 2 family>:<column 2 qualifier>\": \"<cell 2 value>\", ... }}. For each record received, a Provenance RECEIVE event is emitted with the format hbase://<table name>/<row key>, where <row key> is the UTF-8 encoded value of the row's key.\n\n### <a name=\"GetHDFS\"></a>GetHDFS\nFetches files from the Hadoop Distributed File System (HDFS) into FlowFiles. This Processor will delete the file from HDFS after fetching it.\n\n### <a name=\"GetHDFSEvents\"></a>GetHDFSEvents\nThis processor polls the notification events provided by the HdfsAdmin API. Because it uses the HdfsAdmin APIs, it is required to run as an HDFS super user. Currently there are six types of events (append, close, create, metadata, rename, and unlink). Please see the org.apache.hadoop.hdfs.inotify.Event documentation for full explanations of each event. This processor will poll for new events based on a defined duration. For each event received, a new flow file will be created with the expected attributes and the event itself serialized to JSON and written to the flow file's content. For example, if event.type is APPEND then the content of the flow file will be a JSON document containing the information about the append event. If successful, the flow files are sent to the 'success' relationship. Be careful of where the generated flow files are stored. 
If the flow files are stored in one of the processor's watch directories there will be a never-ending flow of events. It is also important to be aware that this processor must consume all events. The filtering must happen within the processor. This is because the HDFS admin's event notifications API does not have filtering.\n\n### <a name=\"GetHDFSSequenceFile\"></a>GetHDFSSequenceFile\nFetches sequence files from the Hadoop Distributed File System (HDFS) into FlowFiles.\n\n### <a name=\"GetHTMLElement\"></a>GetHTMLElement\nExtracts HTML element values from the incoming flowfile's content using a CSS selector. The incoming HTML is first converted into an HTML Document Object Model so that HTML elements may be selected in the same manner that CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then \"queried\" using the user defined CSS selector string. The result of \"querying\" the HTML DOM may produce 0-N results. If no results are found the flowfile will be transferred to the \"element not found\" relationship to indicate so to the end user. If N results are found a new flowfile will be created and emitted for each result. The query result will be placed either in the content of the new flowfile or in an attribute of the new flowfile. By default the result is written to an attribute. This can be controlled by the \"Destination\" property. Resulting query values may also have data prepended or appended to them by setting the value of the property \"Prepend Element Value\" or \"Append Element Value\". Prepended and appended values are treated as string values and concatenated to the result retrieved from the HTML DOM query operation. 
A more thorough reference for the CSS selector syntax can be found at \"http://jsoup.org/apidocs/org/jsoup/select/Selector.html\"\n\n### <a name=\"GetHTTP\"></a>GetHTTP\nThis processor fetches files via HTTP and creates FlowFiles from them.\n\n### <a name=\"GetJMSQueue\"></a>GetJMSQueue\nThis processor pulls messages from a JMS Queue, creating a FlowFile for each JMS message or bundle of messages, as configured.\n\n### <a name=\"GetJMSTopic\"></a>GetJMSTopic\nThis processor pulls messages from a JMS Topic, creating a FlowFile for each JMS message or bundle of messages, as configured.\n\n### <a name=\"GetKafka\"></a>GetKafka\nThis Processor polls Apache Kafka for data. When a message is received from Kafka, this Processor emits a FlowFile where the content of the FlowFile is the value of the Kafka message. If the message has a key associated with it, an attribute named kafka.key will be added to the FlowFile, with the value being the UTF-8 Encoded value of the Message's Key.\n\nKafka supports the notion of a Consumer Group when pulling messages in order to provide scalability while still offering a publish-subscribe interface. Each Consumer Group must have a unique identifier. The Consumer Group identifier that is used by NiFi is the UUID of the Processor. 
This means that all of the nodes within a cluster will use the same Consumer Group Identifier so that they do not receive duplicate data but multiple GetKafka Processors can be used to pull from multiple Topics, as each Processor will receive a different Processor UUID and therefore a different Consumer Group Identifier.\n\n### <a name=\"GetMongo\"></a>GetMongo\nCreates FlowFiles from documents in MongoDB\n\n### <a name=\"GetSFTP\"></a>GetSFTP\nThis processor pulls files from an SFTP server and creates FlowFiles to encapsulate them.\n\n### <a name=\"GetSNMP\"></a>GetSNMP\nRetrieves information from SNMP Agent and outputs a FlowFile with information in attributes and without any content.\n\n### <a name=\"GetSolr\"></a>GetSolr\nQueries Solr and outputs the results as a FlowFile\n\n### <a name=\"GetSplunk\"></a>GetSplunk\nRetrieves data from Splunk Enterprise.\n\n### <a name=\"GetSQS\"></a>GetSQS\nFetches messages from an Amazon Simple Queuing Service Queue\n\n### <a name=\"GetTwitter\"></a>GetTwitter\nPulls status changes from Twitter's streaming API\n\n### <a name=\"HandleHttpRequest\"></a>HandleHttpRequest\nThis processor starts an HTTP server and creates a FlowFile for each HTTP Request that it receives. The Processor leaves the HTTP Connection open and is intended to be used in conjunction with a HandleHttpResponse Processor.\n\nThe pairing of this Processor with a HandleHttpResponse Processor provides the ability to use NiFi to visually construct a web server that can carry out any functionality that is available through the existing Processors. For example, one could construct a Web-based front end to an SFTP Server by constructing a flow such as:\n\nHandleHttpRequest -> PutSFTP -> HandleHttpResponse\n\nThe HandleHttpRequest Processor provides several Properties to configure which methods are supported, the paths that are supported, and SSL configuration. 
The FlowFiles that are generated by this Processor have the following attributes added to them, providing powerful routing capabilities and traceability of all data.\n\n### <a name=\"HandleHttpResponse\"></a>HandleHttpResponse\nThis processor responds to an HTTP request that was received by the HandleHttpRequest Processor.\n\nThe pairing of this Processor with a HandleHttpRequest Processor provides the ability to use NiFi to visually construct a web server that can carry out any functionality that is available through the existing Processors. For example, one could construct a Web-based front end to an SFTP Server by constructing a flow such as:\n\nHandleHttpRequest -> PutSFTP -> HandleHttpResponse\n\nThis Processor must be configured with the same <HTTP Context Map> service as the corresponding HandleHttpRequest Processor. Otherwise, all FlowFiles will be routed to the 'failure' relationship.\n\nAll FlowFiles must have an attribute named http.context.identifier. The value of this attribute is used to lookup the HTTP Response so that the proper message can be sent back to the requestor. If this attribute is missing, the FlowFile will be routed to 'failure.'\n\n### <a name=\"HashAttribute\"></a>HashAttribute\nThis processor hashes together the key/value pairs of several FlowFile attributes and adds the hash as a new attribute. The user may add optional properties such that the name of each property is the name of a FlowFile attribute to consider and the value of the property is a regular expression that, if matched by the attribute value, causes that attribute to be used as part of the hash. 
If the regular expression contains a capturing group, only the value of the capturing group is used.\n\n### <a name=\"HashContent\"></a>HashContent\nThis processor calculates a hash value for the content of a FlowFile and puts the hash value on the FlowFile as an attribute whose name is determined by the Hash Attribute Name property.\n\n### <a name=\"IdentifyMimeType\"></a>IdentifyMimeType\nThis processor attempts to identify the MIME Type used for a FlowFile. If the MIME Type can be identified, an attribute with the name 'mime.type' is added with the value being the MIME Type. If the MIME Type cannot be determined, the value will be set to 'application/octet-stream'. In addition, the attribute mime.extension will be set if a common file extension for the MIME Type is known.\n\nThe following MIME Types are detected:\n\n* application/gzip\n* application/bzip2\n* application/flowfile-v3\n* application/flowfile-v1 (requires Identify TAR be set to true)\n* application/xml\n* video/mp4\n* video/x-m4v\n* video/mp4a-latm\n* video/quicktime\n* video/mpeg\n* audio/wav\n* audio/mp3\n* image/bmp\n* image/png\n* image/jpg\n* image/gif\n* image/tif\n* application/vnd.ms-works\n* application/msexcel\n* application/mspowerpoint\n* application/msaccess\n* application/x-ms-wmv\n* application/pdf\n* application/x-rpm\n* application/tar\n* application/x-7z-compressed\n* application/java-archive\n* application/zip\n* application/x-lzh\n\n### <a name=\"InferAvroShema\"></a>InferAvroSchema\nExamines the contents of the incoming FlowFile to infer an Avro schema. The processor will use the Kite SDK to make an attempt to automatically generate an Avro schema from the incoming content. When inferring the schema from JSON data the key names will be used in the resulting Avro schema definition. 
When inferring from CSV data, a \"header definition\" must either be present as the first line of the incoming data or be explicitly set in the property \"CSV Header Definition\". A \"header definition\" is simply a single comma-separated line defining the names of each column. The \"header definition\" is required in order to determine the names that should be given to each field in the resulting Avro definition. When inferring data types, the higher order data type is always used if there is ambiguity. For example, when examining numerical values the type may be set to \"long\" instead of \"integer\" since a long can safely hold the value of any \"integer\". Only CSV and JSON content is currently supported for automatically inferring an Avro schema. The type of content present in the incoming FlowFile is set by using the property \"Input Content Type\". The property can be explicitly set to CSV or JSON, or set to \"use mime.type value\", which will examine the value of the mime.type attribute on the incoming FlowFile to determine the type of content present.\n\n### <a name=\"InvokeHTTP\"></a>InvokeHTTP\nMakes requests to remote HTTP servers, supporting common HTTP methods. Results are stored as new flowfiles upon success, and errors are routed to failure.\n\nAn HTTP client processor that converts FlowFile attributes to HTTP headers with configurable HTTP method, URL, etc.\n\n### <a name=\"InvokeScriptedProcessor\"></a>InvokeScriptedProcessor\nExperimental - Invokes a script engine for a Processor defined in the given script. The script must define a valid class that implements the Processor interface, and it must set a variable 'processor' to an instance of the class. Processor methods such as onTrigger() will be delegated to the scripted Processor instance. Also any Relationships or PropertyDescriptors defined by the scripted processor will be added to the configuration dialog. 
Experimental: Impact of sustained usage not yet verified.\n\n### <a name=\"JoltTransformJSON\"></a>JoltTransformJSON\nApplies a list of Jolt specifications to the flowfile JSON payload. A new FlowFile is created with transformed content and is routed to the 'success' relationship. If the JSON transform fails, the original FlowFile is routed to the 'failure' relationship.\n\n### <a name=\"ListenHTTP\"></a>ListenHTTP\nThis processor starts an HTTP service that is used to receive FlowFiles from remote sources. The URL of the service is http://{hostname}:{port}/contentListener.\n\n### <a name=\"ListenLumberjack\"></a>ListenLumberjack\nListens for Lumberjack messages being sent to a given port over TCP. Each message will be acknowledged after successfully writing the message to a FlowFile. Each FlowFile will contain the data portion of one or more Lumberjack frames. In the case where the Lumberjack frames contain syslog messages, the output of this processor can be sent to a ParseSyslog processor for further processing.\n\n### <a name=\"ListenRELP\"></a>ListenRELP\nListens for RELP messages being sent to a given port over TCP. Each message will be acknowledged after successfully writing the message to a FlowFile. Each FlowFile will contain the data portion of one or more RELP frames. In the case where the RELP frames contain syslog messages, the output of this processor can be sent to a ParseSyslog processor for further processing.\n\n### <a name=\"ListenSyslog\"></a>ListenSyslog\nListens for Syslog messages being sent to a given port over TCP or UDP. Incoming messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The format of each message is: (<PRIORITY>)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) where version is optional. The timestamp can be an RFC5424 timestamp with a format of \"yyyy-MM-dd'T'HH:mm:ss.SZ\" or \"yyyy-MM-dd'T'HH:mm:ss.S+hh:mm\", or it can be an RFC3164 timestamp with a format of \"MMM d HH:mm:ss\". 
If an incoming message matches one of these patterns, the message will be parsed and the individual pieces will be placed in FlowFile attributes, with the original message in the content of the FlowFile. If an incoming message does not match one of these patterns it will not be parsed and the syslog.valid attribute will be set to false with the original message in the content of the FlowFile. Valid messages will be transferred on the success relationship, and invalid messages will be transferred on the invalid relationship.\n\n### <a name=\"ListenTCP\"></a>ListenTCP\nListens for incoming TCP connections and reads data from each connection using a line separator as the message demarcator. The default behavior is for each message to produce a single FlowFile; however, this can be controlled by increasing the Batch Size to a larger value for higher throughput. The Receive Buffer Size must be set as large as the largest messages expected to be received, meaning if every 100kb there is a line separator, then the Receive Buffer Size must be greater than 100kb.\n\n### <a name=\"ListenUDP\"></a>ListenUDP\nThis processor listens for Datagram Packets on a given port and concatenates the contents of those packets together, generating flow files.\n\n### <a name=\"ListFile\"></a>ListFile\nRetrieves a listing of files from the local filesystem. For each file that is listed, creates a FlowFile that represents the file so that it can be fetched in conjunction with FetchFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Unlike GetFile, this Processor does not delete any data from the local filesystem.\n\n### <a name=\"ListHDFS\"></a>ListHDFS\nThis processor retrieves a listing of files from HDFS. 
For each file that is listed in HDFS, creates a FlowFile that represents the HDFS file so that it can be fetched in conjunction with FetchHDFS. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Unlike GetHDFS, this Processor does not delete any data from HDFS.\n\n### <a name=\"ListS3\"></a>ListS3\nRetrieves a listing of objects from an S3 bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchS3Object. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data.\n\n### <a name=\"ListSFTP\"></a>ListSFTP\nPerforms a listing of the files residing on an SFTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchSFTP in order to fetch those files.\n\n### <a name=\"LogAttribute\"></a>LogAttribute\nThis processor reads the attributes on incoming FlowFiles and prints those attributes and their values to the log at the logging level specified by the user.\n\n### <a name=\"MergeContent\"></a>MergeContent\nThis processor merges a group of FlowFiles together into a \"Bundle\" based on a user-defined strategy and packages them into a single FlowFile. It is recommended that the processor be configured with only a single incoming connection, as groups of FlowFiles will not be created from FlowFiles in different connections. This processor updates the mime.type attribute as appropriate. 
After files have been merged by this processor, they can be unpackaged later using the UnpackContent processor.\n\n### <a name=\"ModifyBytes\"></a>ModifyBytes\nThis processor updates the content of a FlowFile by removing bytes from start or end of a file.\n\n### <a name=\"ModifyHTMLElement\"></a>ModifyHTMLElement\nModifies the value of an existing HTML element. The desired element to be modified is located by using CSS selector syntax. The incoming HTML is first converted into a HTML Document Object Model so that HTML elements may be selected in the similar manner that CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then \"queried\" using the user defined CSS selector string to find the element the user desires to modify. If the HTML element is found the element's value is updated in the DOM using the value specified \"Modified Value\" property. All DOM elements that match the CSS selector will be updated. Once all of the DOM elements have been updated the DOM is rendered to HTML and the result replaces the flowfile content with the updated HTML. 
A more thorough reference for the CSS selector syntax can be found at \"http://jsoup.org/apidocs/org/jsoup/select/Selector.html\".\n\n### <a name=\"MonitorActivity\"></a>MonitorActivity\nThis processor monitors the flow for activity and sends out an indicator when the flow has not had any data for some specified amount of time and again when the flow's activity is restored.\n\n### <a name=\"ParseSyslog\"></a>ParseSyslog\nParses the contents of a Syslog message and adds attributes to the FlowFile for each of the parts of the Syslog message.\n\n### <a name=\"PostHTTP\"></a>PostHTTP\nThis processor performs an HTTP post with the content of each incoming FlowFile.\n\n### <a name=\"PublishAMQP\"></a>PublishAMQP\nCreates an AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange. In a typical AMQP exchange model, the message that is sent to the AMQP Exchange will be routed based on the 'Routing Key' to its final destination in the queue (the binding). If, due to some misconfiguration, the binding between the Exchange, Routing Key and Queue is not set up, the message will have no final destination and will return (i.e., the data will not make it to the queue). If that happens, you will see a message to that effect in both the app log and a bulletin. Fixing the binding (normally done by the AMQP administrator) will resolve the issue.\n\n### <a name=\"PublishJMS\"></a>PublishJMS\nCreates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as a JMS BytesMessage.\n\n### <a name=\"PublishKafka\"></a>PublishKafka\nSends the contents of a FlowFile as a message to Apache Kafka. The messages to send may be individual FlowFiles or may be delimited, using a user-specified delimiter, such as a new-line.\n\n### <a name=\"PublishMQTT\"></a>PublishMQTT\nPublishes a message to an MQTT topic.\n\n### <a name=\"PutAzureEventHub\"></a>PutAzureEventHub\nSends the contents of a FlowFile to a Windows Azure Event Hub. 
Note: the content of the FlowFile will be buffered into memory before being sent, so care should be taken to avoid sending FlowFiles to this Processor that exceed the amount of Java Heap Space available.\n\n### <a name=\"PutCassandraQL\"></a>PutCassandraQL\nExecutes the provided Cassandra Query Language (CQL) statement on a Cassandra 1.x or 2.x cluster. The content of an incoming FlowFile is expected to be the CQL command to execute. The CQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention cql.args.N.type and cql.args.N.value, where N is a positive integer. The cql.args.N.type is expected to be a lowercase string indicating the Cassandra type.\n\n### <a name=\"PutCouchbaseKey\"></a>PutCouchbaseKey\nPuts a document to Couchbase Server via Key/Value access.\n\n### <a name=\"PutDistributedMapCache\"></a>PutDistributedMapCache\nGets the content of a FlowFile and puts it to a distributed map cache, using a cache key computed from FlowFile attributes. If the cache already contains the entry and the cache update strategy is 'keep original', the entry is not replaced.\n\n### <a name=\"PutDynamoDB\"></a>PutDynamoDB\nPuts a document into DynamoDB based on hash and range key. The table can have either hash and range keys or a hash key alone. Currently, the supported key types are string and number, and the value can be a JSON document. When the table uses both hash and range keys, both keys are required for the operation. The FlowFile content must be JSON. FlowFile content is mapped to the specified Json Document attribute in the DynamoDB item.\n\n### <a name=\"PutElasticsearch\"></a>PutElasticsearch\nWrites the contents of a FlowFile to Elasticsearch, using the specified parameters such as the index to insert into and the type of the document. If the cluster has been configured for authorization and/or secure transport (SSL/TLS) and the Shield plugin is available, secure connections can be made. 
This processor supports Elasticsearch 2.x clusters.\n\n### <a name=\"PutEmail\"></a>PutEmail\nThis processor sends an e-mail to configured recipients for each incoming FlowFile.\n\n### <a name=\"PutFile\"></a>PutFile\nThis processor writes FlowFiles to the local file system.\n\n### <a name=\"PutFTP\"></a>PutFTP\nThis processor sends FlowFiles via FTP to an FTP server.\n\n### <a name=\"PutHBaseCell\"></a>PutHBaseCell\nAdds the Contents of a FlowFile to HBase as the value of a single cell\n\n### <a name=\"PutHBaseJSON\"></a>PutHBaseJSON\nAdds rows to HBase based on the contents of incoming JSON documents. Each FlowFile must contain a single UTF-8 encoded JSON document, and any FlowFiles where the root element is not a single document will be routed to failure. Each JSON field name and value will become a column qualifier and value of the HBase row. Any fields with a null value will be skipped, and fields with a complex value will be handled according to the Complex Field Strategy. The row id can be specified either directly on the processor through the Row Identifier property, or can be extracted from the JSON document by specifying the Row Identifier Field Name property. This processor will hold the contents of all FlowFiles for the given batch in memory at one time.\n\n### <a name=\"PutHDFS\"></a>PutHDFS\nThis processor writes FlowFiles to an HDFS cluster. It will create directories in which to store files as needed based on the Directory property.\n\nWhen files are written to HDFS, the file's owner is the user identity of the NiFi process, the file's group is the group of the parent directory, and the read/write/execute permissions use the default umask. 
The owner can be overridden using the Remote Owner property, the group can be overridden using the Remote Group property, and the read/write/execute permissions can be overridden using the Permissions umask property.\n\nNOTE: This processor can change owner or group only if the user identity of the NiFi process has super user privilege in HDFS to do so.\n\nNOTE: The Permissions umask property cannot add execute permissions to regular files.\n\n### <a name=\"PutHiveQL\"></a>PutHiveQL\nExecutes a HiveQL DDL/DML command (UPDATE, INSERT, e.g.). The content of an incoming FlowFile is expected to be the HiveQL command to execute. The HiveQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention hiveql.args.N.type and hiveql.args.N.value, where N is a positive integer. The hiveql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.\n\n### <a name=\"PutHTMLElement\"></a>PutHTMLElement\nPlaces a new HTML element in the existing HTML DOM. The desired position for the new HTML element is specified by using CSS selector syntax. The incoming HTML is first converted into a HTML Document Object Model so that HTML DOM location may be located in a similar manner that CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then \"queried\" using the user defined CSS selector string to find the position where the user desires to add the new HTML element. Once the new HTML element is added to the DOM it is rendered to HTML and the result replaces the flowfile content with the updated HTML. 
A more thorough reference for the CSS selector syntax can be found at \"http://jsoup.org/apidocs/org/jsoup/select/Selector.html\".\n\n### <a name=\"PutJMS\"></a>PutJMS\nThis processor creates a JMS message from the contents of a FlowFile and sends the message to a JMS server.\n\n### <a name=\"PutKafka\"></a>PutKafka\nThis Processor puts the contents of a FlowFile to a Topic in Apache Kafka. The full contents of a FlowFile become the contents of a single message in Kafka. This message is optionally assigned a key by using the <Kafka Key> Property.\n\nThe Processor allows the user to configure an optional Message Delimiter that can be used to send many messages per FlowFile. For example, a \\n could be used to indicate that the contents of the FlowFile should be used to send one message per line of text. If the property is not set, the entire contents of the FlowFile will be sent as a single message. When using the delimiter, if some messages are successfully sent but other messages fail to send, the FlowFile will be FORKed into two child FlowFiles, with the successfully sent messages being routed to 'success' and the messages that could not be sent going to 'failure'.\n\n### <a name=\"PutKinesisFirehose\"></a>PutKinesisFirehose\nSends the contents to a specified Amazon Kinesis Firehose. In order to send data to Firehose, the Firehose delivery stream name has to be specified.\n\n### <a name=\"PutLambda\"></a>PutLambda\nSends the contents to a specified Amazon Lambda Function. The AWS credentials used for authentication must have permission to execute the Lambda function (lambda:InvokeFunction). The FlowFile content must be JSON.\n\n### <a name=\"PutMongo\"></a>PutMongo\nWrites the contents of a FlowFile to MongoDB.\n\n### <a name=\"PutRiemann\"></a>PutRiemann\nSends events to Riemann (http://riemann.io) when FlowFiles pass through this processor. 
You can use events to notify Riemann that a FlowFile passed through, or you can attach a more meaningful metric, such as, the time a FlowFile took to get to this processor. All attributes attached to events support the NiFi Expression Language.\n\n### <a name=\"PutS3Object\"></a>PutS3Object\nPuts FlowFiles to an Amazon S3 Bucket\n\n### <a name=\"PutSFTP\"></a>PutSFTP\nThis processor sends FlowFiles via SFTP to an SFTP server.\n\n### <a name=\"PutSlack\"></a>PutSlack\nThe PutSlack processor sends messages to Slack, a team-oriented messaging service.\n\nThis processor uses Slack's incoming webhooks custom integration to post messages to a specific channel. Before using PutSlack, your Slack team should be configured for the incoming webhooks custom integration, and you'll need to configure at least one incoming webhook.\n\n### <a name=\"PutSNS\"></a>PutSNS\nSends the content of a FlowFile as a notification to the Amazon Simple Notification Service\n\n### <a name=\"PutSolrContentStream\"></a>PutSolrContentStream\nThis processor streams the contents of a FlowFile to an Apache Solr update handler. Any properties added to this processor by the user are passed to Solr on the update request. If a parameter must be sent multiple times with different values, properties can follow a naming convention: name.number, where name is the parameter name and number is a unique number. Repeating parameters will be sorted by their property name.\n\nExample: To specify multiple 'f' parameters for indexing custom json, the following properties can be defined:\n\n* split: /exams\n* f.1: first:/first\n* f.2: last:/last\n* f.3: grade:/grade\nThis will result in sending the following url to Solr: \nsplit=/exams&f=first:/first&f=last:/last&f=grade:/grade\n\n### <a name=\"PutSplunk\"></a>PutSplunk\nSends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP. 
If a Message Delimiter is provided, then this processor will read messages from the incoming FlowFile based on the delimiter, and send each message to Splunk. If a Message Delimiter is not provided then the content of the FlowFile will be sent directly to Splunk as if it were a single message.\n\n### <a name=\"PutSQL\"></a>PutSQL\nExecutes a SQL UPDATE or INSERT command. The content of an incoming FlowFile is expected to be the SQL command to execute. The SQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.\n\n### <a name=\"PutSQS\"></a>PutSQS\nPublishes a message to an Amazon Simple Queuing Service Queue\n\n### <a name=\"PutSyslog\"></a>PutSyslog\nSends Syslog messages to a given host and port over TCP or UDP. Messages are constructed from the \"Message ___\" properties of the processor which can use expression language to generate messages from incoming FlowFiles. The properties are used to construct messages of the form: (<PRIORITY>)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) where version is optional. The constructed messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The timestamp can be an RFC5424 timestamp with a format of \"yyyy-MM-dd'T'HH:mm:ss.SZ\" or \"yyyy-MM-dd'T'HH:mm:ss.S+hh:mm\", or it can be an RFC3164 timestamp with a format of \"MMM d HH:mm:ss\". If a message is constructed that does not form a valid Syslog message according to the above description, then it is routed to the invalid relationship. 
Valid messages are sent to the Syslog server and successes are routed to the success relationship, failures routed to the failure relationship.\n\n### <a name=\"PutTCP\"></a>PutTCP\nThe PutTCP processor receives a FlowFile and transmits the FlowFile content over a TCP connection to the configured TCP server. By default, the FlowFiles are transmitted over the same TCP connection (or pool of TCP connections if multiple input threads are configured). To assist the TCP server with determining message boundaries, an optional \"Outgoing Message Delimiter\" string can be configured which is appended to the end of each FlowFiles content when it is transmitted over the TCP connection. An optional \"Connection Per FlowFile\" parameter can be specified to change the behaviour so that each FlowFiles content is transmitted over a single TCP connection which is opened when the FlowFile is received and closed after the FlowFile has been sent. This option should only be used for low message volume scenarios, otherwise the platform may run out of TCP sockets.\n\n### <a name=\"PutUDP\"></a>PutUDP\nThe PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet which is then transmitted to the configured UDP server. The user must ensure that the FlowFile content being fed to this processor is not larger than the maximum size for the underlying UDP transport. The maximum transport size will vary based on the platform setup but is generally just under 64KB. FlowFiles will be marked as failed if their content is larger than the maximum transport size.\n\n### <a name=\"QueryCassandra\"></a>QueryCassandra\nExecute provided Cassandra Query Language (CQL) select query on a Cassandra 1.x or 2.x cluster. Query result may be converted to Avro or JSON format. Streaming is used so arbitrarily large result sets are supported. 
This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute 'executecql.row.count' indicates how many rows were selected.\n\n### <a name=\"QueryDatabaseTable\"></a>QueryDatabaseTable\nExecute provided SQL select query. Query result will be converted to Avro format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute 'querydbtable.row.count' indicates how many rows were selected.\n\n### <a name=\"ReplaceText\"></a>ReplaceText\nThis processor updates the content of a FlowFile by evaluating a regular expression (regex) against the content and replacing the section of content that matches the regular expression with an alternate, user-defined, value.\n\n### <a name=\"ReplaceTextWithMapping\"></a>ReplaceTextWithMapping\nThis processor updates the content of a FlowFile by evaluating a Regular Expression against it and replacing the section of the content that matches the Regular Expression with some alternate value provided in a mapping file.\n\n### <a name=\"ResizeImage\"></a>ResizeImage\nResizes an image to user-specified dimensions. This Processor uses the image codecs registered with the environment that NiFi is running in. By default, this includes JPEG, PNG, BMP, WBMP, and GIF images.\n\n### <a name=\"RouteHL7\"></a>RouteHL7\nRoutes incoming HL7 data according to user-defined queries. To add a query, add a new property to the processor. 
The name of the property will become a new relationship for the processor, and the value is an HL7 Query Language query. If a FlowFile matches the query, a copy of the FlowFile will be routed to the associated relationship.\n\n### <a name=\"RouteOnAttribute\"></a>RouteOnAttribute\nThis processor routes FlowFiles based on their attributes using the NiFi Expression Language. Users add properties with valid NiFi Expression Language Expressions as the values. Each Expression must return a value of type Boolean (true or false).\n\nExample: The goal is to route all files with filenames that start with ABC down a certain path. Add a property with the following name and value:\n\n* property name: ABC\n* property value: ${filename:startsWith('ABC')}\nIn this example, all files with filenames that start with ABC will follow the ABC relationship.\n\n### <a name=\"RouteOnContent\"></a>RouteOnContent\nThis processor applies user-added regular expressions to the content of a FlowFile and routes a copy of the FlowFile to each destination whose regular expression matches. The user adds properties where the name is the relationship that the FlowFile should follow if it matches the regular expression, which is defined as the property's value. User-defined properties do support the NiFi Expression Language, but in such cases, the results are interpreted as literal values, not regular expressions.\n\n### <a name=\"RouteText\"></a>RouteText\nRoutes textual data based on a set of user-defined rules. Each line in an incoming FlowFile is compared against the values specified by user-defined Properties. The mechanism by which the text is compared to these user-defined properties is defined by the 'Matching Strategy'. 
The data is then routed according to these rules, routing each line of the text individually.\n\n### <a name=\"ScanAttribute\"></a>ScanAttribute\nThis processor scans the specified attributes of FlowFiles, checking to see if any of their values are present within the specified dictionary of terms.\n\n### <a name=\"ScanContent\"></a>ScanContent\nThis processor scans the content of FlowFiles for terms that are found in a user-supplied dictionary file. If a term is matched, the UTF-8 encoded version of the term is added to the FlowFile using the matching.term attribute. This allows for follow-on processors to use the value of the matching.term attribute to make routing decisions and so forth.\n\n### <a name=\"SegmentContent\"></a>SegmentContent\nThis processor segments a FlowFile into multiple smaller segments on byte boundaries. Each segment is given attributes that can then be used by the MergeContent processor to reconstruct the original FlowFile.\n\n### <a name=\"SelectHiveQL\"></a>SelectHiveQL\nExecutes the provided HiveQL SELECT query against a Hive database connection. Query result will be converted to Avro or CSV format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute 'selecthiveql.row.count' indicates how many rows were selected.\n\n### <a name=\"SetSNMP\"></a>SetSNMP\nBased on incoming FlowFile attributes, the processor will execute SNMP Set requests. When it finds attributes with names like snmp$<OID>, the processor will attempt to set the value of each attribute on the corresponding OID given in the attribute name.\n\n### <a name=\"SplitAvro\"></a>SplitAvro\nSplits a binary encoded Avro datafile into smaller files based on the configured Output Size. 
The Output Strategy determines if the smaller files will be Avro datafiles, or bare Avro records with metadata in the FlowFile attributes. The output will always be binary encoded.\n\n### <a name=\"SplitContent\"></a>SplitContent\nThis processor splits incoming FlowFiles by a specified byte sequence.\n\n### <a name=\"SplitJson\"></a>SplitJson\nThis processor splits a JSON File into multiple, separate FlowFiles for an array element specified by a JsonPath expression. Each generated FlowFile is comprised of an element of the specified array and transferred to relationship 'split,' with the original file transferred to the 'original' relationship. If the specified JsonPath is not found or does not evaluate to an array element, the original file is routed to 'failure' and no files are generated.\n\n### <a name=\"SplitText\"></a>SplitText\nThis processor splits a text file into multiple smaller text files on line boundaries, each having up to a configured number of lines.\n\n### <a name=\"SplitXML\"></a>SplitXML\nThis processor splits an XML file into multiple separate FlowFiles, each comprising a child or descendant of the original root element.\n\n### <a name=\"SpringContextProcessor\"></a>SpringContextProcessor\nA Processor that supports sending and receiving data from application defined in Spring Application Context via predefined in/out MessageChannels.\n\n### <a name=\"StoreInKiteDataset\"></a>StoreInKiteDataset\nStores Avro records in a Kite dataset.\n\n### <a name=\"TailFile\"></a>TailFile\n\"Tails\" a file, ingesting data from the file as it is written to the file. The file is expected to be textual. Data is ingested only when a new line is encountered (carriage return or new-line character or combination). 
If the file to tail is periodically \"rolled over\", as is generally the case with log files, an optional Rolling Filename Pattern can be used to retrieve data from files that have rolled over, even if the rollover occurred while NiFi was not running (provided that the data still exists upon restart of NiFi). It is generally advisable to set the Run Schedule to a few seconds, rather than running with the default value of 0 secs, as this Processor will consume a lot of resources if scheduled very aggressively. At this time, this Processor does not support ingesting files that have been compressed when 'rolled over'.\n\n### <a name=\"TransformXML\"></a>TransformXML\nThis processor transforms the contents of FlowFiles based on a user-specified XSLT stylesheet file. XSL versions 1.0 and 2.0 are supported.\n\n### <a name=\"UnpackContent\"></a>UnpackContent\nThis processor unpacks the content of FlowFiles that have been packaged with one of several different packaging formats, emitting one to many FlowFiles for each input FlowFile.\n\n### <a name=\"UpdateAttribute\"></a>UpdateAttribute\nThis processor updates the attributes of a FlowFile using properties or rules that are added by the user. There are two ways to use this processor to add or modify attributes. One way is the \"Basic Usage\"; this allows you to set default attribute changes that affect every FlowFile going through the processor. The second way is the \"Advanced Usage\"; this allows you to make conditional attribute changes that only affect a FlowFile if it meets certain conditions. 
It is possible to use both methods in the same processor at the same time.\n\n### <a name=\"ValidateXML\"></a>ValidateXML\nThis processor validates the contents of FlowFiles against a user-specified XML schema file.\n\n### <a name=\"YandexTranslate\"></a>YandexTranslate\nTranslates content and attributes from one language to another.\n\nIf you have questions about a processor, I'd encourage you to download the binaries and start up Apache Nifi. You can now also read about each processor in [their documentation](https://nifi.apache.org/docs.html), which is built with each release. If you really want more information, let me know and I'll try to compile a more complete post about each and every processor.","html":"<p><strong> Includes all processors through release 0.7.0 </strong></p>\n\n<p>I looked around at what can be done with Apache NiFi and didn’t notice a list of processors without looking at the code or building the project. I think a list of available processors, the workhorses of Apache Nifi, would greatly help people decide whether it is right for their needs. So, I went into the usage guide in the Apache Nifi UI and pulled a list of processors and a quick description for those who want to know what possibilities there are before getting into nifi itself!</p>\n\n<h1>List of processors</h1>\n\n<p>With new releases of Nifi, the number of processors has increased from the original 53 to 154!\nHere is a list of all 154 processors, listed alphabetically, that are currently in Apache Nifi as of the most recent release. Each one links to a description of the processor further down. The Usage documentation available in the web ui has much more detail about each processor, its properties, modifiable attributes, and relationships, and each processor has its own page in the UI, so here is just a quick overview. 
Again, this content is taken directly from Nifi’s Usage guide in their web UI and all credit/rights belong to them under the Apache 2.0 License.</p>\n\n<p>Nifi has improved their documentation, which was originally only available when running apache nifi. The documentation now is produced through the build process and has been added to <a href=\"https://nifi.apache.org/docs.html\">their website</a>. So if you need more information or more detail about each processor just check there.</p>\n\n<!--more -->\n\n\n<ul>\n<li><a href=\"#AttributesToJSON\">AttributesToJSON</a></li>\n<li><a href=\"#Base64EncodeContent\">Base64EncodeContent</a></li>\n<li><a href=\"#CompressContent\">CompressContent</a></li>\n<li><a href=\"#ConsumeAMQP\">ConsumeAMQP</a></li>\n<li><a href=\"#ConsumeJMS\">ConsumeJMS</a></li>\n<li><a href=\"#ConsumeKafka\">ConsumeKafka</a></li>\n<li><a href=\"#ConsumeMQTT\">ConsumeMQTT</a></li>\n<li><a href=\"#ControlRate\">ControlRate</a></li>\n<li><a href=\"#ConvertAvroSchema\">ConvertAvroSchema</a></li>\n<li><a href=\"#ConvertAvroToJSON\">ConvertAvroToJSON</a></li>\n<li><a href=\"#ConvertCharacterSet\">ConvertCharacterSet</a></li>\n<li><a href=\"#ConvertCSVToAvro\">ConvertCSVToAvro</a></li>\n<li><a href=\"#ConvertJSONToAvro\">ConvertJSONToAvro</a></li>\n<li><a href=\"#ConvertCSVToSQL\">ConvertCSVToSQL</a> DEPRECATED - no longer available as of 0.5.0</li>\n<li><a href=\"#ConvertJSONToSQL\">ConvertJSONToSQL</a></li>\n<li><a href=\"#CreateHadoopSequenceFile\">CreateHadoopSequenceFile</a></li>\n<li><a href=\"#DebugFlow\">DebugFlow</a></li>\n<li><a href=\"#DeleteS3Object\">DeleteS3Object</a></li>\n<li><a href=\"#DeleteSQS\">DeleteSQS</a></li>\n<li><a href=\"#DetectDuplicate\">DetectDuplicate</a></li>\n<li><a href=\"#DistributeLoad\">DistributeLoad</a></li>\n<li><a href=\"#DuplicateFlowFile\">DuplicateFlowFile</a></li>\n<li><a href=\"#EncryptContent\">EncryptContent</a></li>\n<li><a href=\"#EvaluateJSONPath\">EvaluateJSONPath</a></li>\n<li><a 
href=\"#EvaluateRegularExpression\">EvaluateRegularExpression</a></li>\n<li><a href=\"#ExtractMediaMetadata\">ExtractMediaMetadata</a>\nDEPRECATED-Use <a href=\"#ExtractText\">ExtractText</a></li>\n<li><a href=\"#EvaluateXPath\">EvaluateXPath</a></li>\n<li><a href=\"#EvaluateXQuery\">EvaluateXQuery</a></li>\n<li><a href=\"#ExecuteFlumeSink\">ExecuteFlumeSink</a></li>\n<li><a href=\"#ExecuteFlumeSource\">ExecuteFlumeSource</a></li>\n<li><a href=\"#ExecuteProcess\">ExecuteProcess</a></li>\n<li><a href=\"#ExecuteScript\">ExecuteScript</a></li>\n<li><a href=\"#ExecuteSQL\">ExecuteSQL</a></li>\n<li><a href=\"#ExecuteStreamCommand\">ExecuteStreamCommand</a></li>\n<li><a href=\"#ExtractAvroMetadata\">ExtractAvroMetadata</a></li>\n<li><a href=\"#ExtractHL7Attributes\">ExtractHL7Attributes</a></li>\n<li><a href=\"#ExtractImageMetadata\">ExtractImageMetadata</a></li>\n<li><a href=\"#ExtractText\">ExtractText</a></li>\n<li><a href=\"#FetchDistributedMapCache\">FetchDistributedMapCache</a></li>\n<li><a href=\"#FetchElasticSearch\">FetchElasticSearch</a></li>\n<li><a href=\"#FetchFile\">FetchFile</a></li>\n<li><a href=\"#FetchHDFS\">FetchHDFS</a></li>\n<li><a href=\"#FetchS3Object\">FetchS3Object</a></li>\n<li><a href=\"#FetchSFTP\">FetchSFTP</a></li>\n<li><a href=\"#GenerateFlowFile\">GenerateFlowFile</a></li>\n<li><a href=\"#GeoEnrichIP\">GeoEnrichIP</a></li>\n<li><a href=\"#GetAzureEventHub\">GetAzureEventHub</a></li>\n<li><a href=\"#GetCouchbaseKey\">GetCouchbaseKey</a></li>\n<li><a href=\"#GetDynamoDB\">GetDynamoDB</a></li>\n<li><a href=\"#GetFile\">GetFile</a></li>\n<li><a href=\"#GetFTP\">GetFTP</a></li>\n<li><a href=\"#GetHBase\">GetHBase</a></li>\n<li><a href=\"#GetHDFS\">GetHDFS</a></li>\n<li><a href=\"#GetHDFSEvents\">GetHDFSEvents</a></li>\n<li><a href=\"#GetHDFSSequenceFile\">GetHDFSSequenceFile</a></li>\n<li><a href=\"#GetHTMLElement\">GetHTMLElement</a></li>\n<li><a href=\"#GetHTTP\">GetHTTP</a></li>\n<li><a href=\"#GetJMSQueue\">GetJMSQueue</a></li>\n<li><a 
href=\"#GetJMSTopic\">GetJMSTopic</a></li>\n<li><a href=\"#GetKafka\">GetKafka</a></li>\n<li><a href=\"#GetMongo\">GetMongo</a></li>\n<li><a href=\"#GetSFTP\">GetSFTP</a></li>\n<li><a href=\"#GetSNMP\">GetSNMP</a></li>\n<li><a href=\"#GetSolr\">GetSolr</a></li>\n<li><a href=\"#GetSplunk\">GetSplunk</a></li>\n<li><a href=\"#GetSQS\">GetSQS</a></li>\n<li><a href=\"#GetTwitter\">GetTwitter</a></li>\n<li><a href=\"#HandleHttpRequest\">HandleHttpRequest</a></li>\n<li><a href=\"#HandleHttpResponse\">HandleHttpResponse</a></li>\n<li><a href=\"#HashAttribute\">HashAttribute</a></li>\n<li><a href=\"#HashContent\">HashContent</a></li>\n<li><a href=\"#IdentifyMimeType\">IdentifyMimeType</a></li>\n<li><a href=\"#InferAvroShema\">InferAvroSchema</a></li>\n<li><a href=\"#InvokeHTTP\">InvokeHTTP</a></li>\n<li><a href=\"#InvokeScriptedProcessor\">InvokeScriptedProcessor</a></li>\n<li><a href=\"#JoltTransformJSON\">JoltTransformJSON</a></li>\n<li><a href=\"#ListenHTTP\">ListenHTTP</a></li>\n<li><a href=\"#ListenLumberjack\">ListenLumberjack</a></li>\n<li><a href=\"#ListenRELP\">ListenRELP</a></li>\n<li><a href=\"#ListenSyslog\">ListenSyslog</a></li>\n<li><a href=\"#ListenTCP\">ListenTCP</a></li>\n<li><a href=\"#ListenUDP\">ListenUDP</a></li>\n<li><a href=\"#ListFile\">ListFile</a></li>\n<li><a href=\"#ListHDFS\">ListHDFS</a></li>\n<li><a href=\"#ListS3\">ListS3</a></li>\n<li><a href=\"#ListSFTP\">ListSFTP</a></li>\n<li><a href=\"#LogAttribute\">LogAttribute</a></li>\n<li><a href=\"#MergeContent\">MergeContent</a></li>\n<li><a href=\"#ModifyBytes\">ModifyBytes</a></li>\n<li><a href=\"#ModifyHTMLElement\">ModifyHTMLElement</a></li>\n<li><a href=\"#MonitorActivity\">MonitorActivity</a></li>\n<li><a href=\"#ParseSyslog\">ParseSyslog</a></li>\n<li><a href=\"#PostHTTP\">PostHTTP</a></li>\n<li><a href=\"#PublishAMQP\">PublishAMQP</a></li>\n<li><a href=\"#PublishJMS\">PublishJMS</a></li>\n<li><a href=\"#PublishKafka\">PublishKafka</a></li>\n<li><a 
href=\"#PublishMQTT\">PublishMQTT</a></li>\n<li><a href=\"#PutAzureEventHub\">PutAzureEventHub</a></li>\n<li><a href=\"#PutCassandraQL\">PutCassandraQL</a></li>\n<li><a href=\"#PutCouchbaseKey\">PutCouchbaseKey</a></li>\n<li><a href=\"#PutDistributedMapCache\">PutDistributedMapCache</a></li>\n<li><a href=\"#PutDynamoDB\">PutDynamoDB</a></li>\n<li><a href=\"#PutElasticsearch\">PutElasticsearch</a></li>\n<li><a href=\"#PutEmail\">PutEmail</a></li>\n<li><a href=\"#PutFile\">PutFile</a></li>\n<li><a href=\"#PutFTP\">PutFTP</a></li>\n<li><a href=\"#PutHBaseCell\">PutHBaseCell</a></li>\n<li><a href=\"#PutHBaseJSON\">PutHBaseJSON</a></li>\n<li><a href=\"#PutHDFS\">PutHDFS</a></li>\n<li><a href=\"#PutHiveQL\">PutHiveQL</a></li>\n<li><a href=\"#PutHTMLElement\">PutHTMLElement</a></li>\n<li><a href=\"#PutJMS\">PutJMS</a></li>\n<li><a href=\"#PutKafka\">PutKafka</a></li>\n<li><a href=\"#PutKinesisFirehose\">PutKinesisFirehose</a></li>\n<li><a href=\"#PutLambda\">PutLambda</a></li>\n<li><a href=\"#PutMongo\">PutMongo</a></li>\n<li><a href=\"#PutRiemann\">PutRiemann</a></li>\n<li><a href=\"#PutS3Object\">PutS3Object</a></li>\n<li><a href=\"#PutSlack\">PutSlack</a></li>\n<li><a href=\"#PutSFTP\">PutSFTP</a></li>\n<li><a href=\"#PutSNS\">PutSNS</a></li>\n<li><a href=\"#PutSolrContentStream\">PutSolrContentStream</a></li>\n<li><a href=\"#PutSplunk\">PutSplunk</a></li>\n<li><a href=\"#PutSQL\">PutSQL</a></li>\n<li><a href=\"#PutSQS\">PutSQS</a></li>\n<li><a href=\"#PutSyslog\">PutSyslog</a></li>\n<li><a href=\"#PutTCP\">PutTCP</a></li>\n<li><a href=\"#PutUDP\">PutUDP</a></li>\n<li><a href=\"#QueryCassandra\">QueryCassandra</a></li>\n<li><a href=\"#QueryDatabaseTable\">QueryDatabaseTable</a></li>\n<li><a href=\"#ReplaceText\">ReplaceText</a></li>\n<li><a href=\"#ReplaceTextWithMapping\">ReplaceTextWithMapping</a></li>\n<li><a href=\"#ResizeImage\">ResizeImage</a></li>\n<li><a href=\"#RouteHL7\">RouteHL7</a></li>\n<li><a href=\"#RouteOnAttribute\">RouteOnAttribute</a></li>\n<li><a 
href=\"#RouteOnContent\">RouteOnContent</a></li>\n<li><a href=\"#RouteText\">RouteText</a></li>\n<li><a href=\"#ScanAttribute\">ScanAttribute</a></li>\n<li><a href=\"#ScanContent\">ScanContent</a></li>\n<li><a href=\"#SelectHiveQL\">SelectHiveQL</a></li>\n<li><a href=\"#SegmentContent\">SegmentContent</a></li>\n<li><a href=\"#SetSNMP\">SetSNMP</a></li>\n<li><a href=\"#SplitAvro\">SplitAvro</a></li>\n<li><a href=\"#SplitContent\">SplitContent</a></li>\n<li><a href=\"#SpringContextProcessor\">SpringContextProcessor</a></li>\n<li><a href=\"#SplitJson\">SplitJson</a></li>\n<li><a href=\"#SplitText\">SplitText</a></li>\n<li><a href=\"#SplitXML\">SplitXML</a></li>\n<li><a href=\"#StoreInKiteDataset\">StoreInKiteDataset</a></li>\n<li><a href=\"#TailFile\">TailFile</a></li>\n<li><a href=\"#TransformXML\">TransformXML</a></li>\n<li><a href=\"#UnpackContent\">UnpackContent</a></li>\n<li><a href=\"#UpdateAttribute\">UpdateAttribute</a></li>\n<li><a href=\"#ValidateXML\">ValidateXML</a></li>\n<li><a href=\"#YandexTranslate\">YandexTranslate</a></li>\n</ul>\n\n\n<h3><a name=\"AttributesToJSON\"></a>AttributesToJSON</h3>\n\n<p>Generates a JSON representation of the input FlowFile Attributes. 
The resulting JSON can be written either to a new attribute named ‘JSONAttributes’ or to the FlowFile as content.</p>\n\n<h3><a name=\"Base64EncodeContent\"></a>Base64EncodeContent</h3>\n\n<p>This processor base64 encodes FlowFile content, or decodes FlowFile content from base64.</p>\n\n<h3><a name=\"CompressContent\"></a>CompressContent</h3>\n\n<p>This processor compresses or decompresses the contents of FlowFiles using a user-specified compression algorithm and updates the mime.type attribute as appropriate.</p>\n\n<h3><a name=\"ConsumeAMQP\"></a>ConsumeAMQP</h3>\n\n<p>Consumes an AMQP Message, transforming its content to a FlowFile and transitioning it to the ‘success’ relationship.</p>\n\n<h3><a name=\"ConsumeJMS\"></a>ConsumeJMS</h3>\n\n<p>Consumes a JMS Message of type BytesMessage or TextMessage, transforming its content to a FlowFile and transitioning it to the ‘success’ relationship.</p>\n\n<h3><a name=\"ConsumeKafka\"></a>ConsumeKafka</h3>\n\n<p>This processor polls Apache Kafka for data using the KafkaConsumer API available with Kafka 0.9+. When a message is received from Kafka, this Processor emits a FlowFile where the content of the FlowFile is the value of the Kafka message.</p>\n\n<h3><a name=\"ConsumeMQTT\"></a>ConsumeMQTT</h3>\n\n<p>Subscribes to a topic and receives messages from an MQTT broker.</p>\n\n<h3><a name=\"ControlRate\"></a>ControlRate</h3>\n\n<p>This processor controls the rate at which data is transferred to follow-on processors.</p>\n\n<h3><a name=\"ConvertAvroSchema\"></a>ConvertAvroSchema</h3>\n\n<p>Converts records from one Avro schema to another, including support for flattening and simple type conversions.</p>\n\n<p>This processor is used to convert data between two Avro formats, such as those coming from the ConvertCSVToAvro or ConvertJSONToAvro processors. The input and output content of the flow files should be Avro data files. 
The processor includes support for the following basic type conversions:</p>\n\n<ul>\n<li>Anything to String, using the data’s default String representation</li>\n<li>String types to numeric types int, long, double, and float</li>\n<li>Conversion to and from optional Avro types</li>\n</ul>\n\n<p>In addition, fields can be renamed or unpacked from a record type by using the dynamic properties.</p>\n\n<h3><a name=\"ConvertAvroToJSON\"></a>ConvertAvroToJSON</h3>\n\n<p>Converts a Binary Avro record into a JSON object. This processor provides a direct mapping of an Avro field to a JSON field, such that the resulting JSON will have the same hierarchical structure as the Avro document. Note that the Avro schema information will be lost, as this is not a translation from binary Avro to JSON formatted Avro. The output JSON is encoded using UTF-8. If an incoming FlowFile contains a stream of multiple Avro records, the resultant FlowFile will contain a JSON Array containing all of the Avro records.</p>\n\n<h3><a name=\"ConvertCharacterSet\"></a>ConvertCharacterSet</h3>\n\n<p>This processor converts a FlowFile’s content from one character set to another.</p>\n\n<h3><a name=\"ConvertCSVToAvro\"></a>ConvertCSVToAvro</h3>\n\n<p>Converts CSV files to Avro according to an Avro Schema.</p>\n\n<h3><a name=\"ConvertJSONToAvro\"></a>ConvertJSONToAvro</h3>\n\n<p>Converts JSON files to Avro according to an Avro Schema.</p>\n\n<h3><a name=\"ConvertCSVToSQL\"></a>ConvertCSVToSQL - DEPRECATED as of 0.5.0</h3>\n\n<p>Converts a JSON-formatted FlowFile into an UPDATE or INSERT SQL statement. The incoming FlowFile is expected to be a “flat” JSON message, meaning that it consists of a single JSON element and each field maps to a simple type. If a field maps to a JSON object, that JSON object will be interpreted as Text. If the input is an array of JSON elements, each element in the array is output as a separate FlowFile to the ‘sql’ relationship. 
Upon successful conversion, the original FlowFile is routed to the ‘original’ relationship and the SQL is routed to the ‘sql’ relationship.</p>\n\n<h3><a name=\"ConvertJSONToSQL\"></a>ConvertJSONToSQL</h3>\n\n<p>Converts a JSON-formatted FlowFile into an UPDATE or INSERT SQL statement. The incoming FlowFile is expected to be a “flat” JSON message, meaning that it consists of a single JSON element and each field maps to a simple type. If a field maps to a JSON object, that JSON object will be interpreted as Text. If the input is an array of JSON elements, each element in the array is output as a separate FlowFile to the ‘sql’ relationship. Upon successful conversion, the original FlowFile is routed to the ‘original’ relationship and the SQL is routed to the ‘sql’ relationship.</p>\n\n<h3><a name=\"CreateHadoopSequenceFile\"></a>CreateHadoopSequenceFile</h3>\n\n<p>This processor is used to create a Hadoop Sequence File, which essentially is a file of key/value pairs. The key will be a file name and the value will be the flow file content. The processor will take either a merged (a.k.a. packaged) flow file or a singular flow file. Historically, this processor handled the merging by type and size or time prior to creating a SequenceFile output; it no longer does this. If creating a SequenceFile that contains multiple files of the same type is desired, precede this processor with a RouteOnAttribute processor to segregate files of the same type and follow that with a MergeContent processor to bundle up files. If the type of files is not important, just use the MergeContent processor. When using the MergeContent processor, the following Merge Formats are supported by this processor:</p>\n\n<ul>\n<li>TAR</li>\n<li>ZIP</li>\n<li>FlowFileStream v3</li>\n</ul>\n\n<p>The created SequenceFile is named the same as the incoming FlowFile with the suffix ‘.sf’. 
For incoming FlowFiles that are bundled, the keys in the SequenceFile are the individual file names and the values are the contents of each file.</p>\n\n<p>NOTE: The value portion of a key/value pair is loaded into memory. While there is a max size limit of 2GB, this could cause memory issues if there are too many concurrent tasks and the flow file sizes are large.</p>\n\n<h3><a name=\"DebugFlow\"></a>DebugFlow</h3>\n\n<p>The DebugFlow processor aids testing and debugging the FlowFile framework by allowing various responses to be explicitly triggered in response to the receipt of a FlowFile or a timer event without a FlowFile if using timer or cron based scheduling. It can force responses needed to exercise or test various failure modes that can occur when a processor runs.</p>\n\n<h3><a name=\"DeleteDynamoDB\"></a>DeleteDynamoDB</h3>\n\n<p>Deletes a document from DynamoDB based on hash and range key. The key can be string or number. The request requires all the primary keys for the operation (hash or hash and range key).</p>\n\n<h3><a name=\"DeleteS3Object\"></a>DeleteS3Object</h3>\n\n<p>Deletes FlowFiles on an Amazon S3 Bucket. If attempting to delete a file that does not exist, the FlowFile is routed to success.</p>\n\n<h3><a name=\"DeleteSQS\"></a>DeleteSQS</h3>\n\n<p>Deletes a message from an Amazon Simple Queue Service (SQS) queue.</p>\n\n<h3><a name=\"DetectDuplicate\"></a>DetectDuplicate</h3>\n\n<p>This processor detects duplicate data by examining flow file attributes, thus allowing the user to configure what it means for two FlowFiles to be considered “duplicates”. This processor does not read the contents of a flow file, and is typically preceded by another processor which computes a value based on the flow file content and adds that value to the flow file’s attributes; e.g. HashContent. 
Because this Processor needs to be able to work within a NiFi cluster, it makes use of a distributed cache service to determine whether or not the data has been seen previously.</p>\n\n<p>If the processor is to be run on a standalone instance of NiFi, that instance should have both a DistributedMapCacheClient and a DistributedMapCacheServer configured in its controller-services.xml file.</p>\n\n<h3><a name=\"DistributeLoad\"></a>DistributeLoad</h3>\n\n<p>This processor distributes FlowFiles to downstream processors based on a distribution strategy. The user may select the strategy “round robin”, the strategy “next available”, or “load distribution service”. If using the round robin strategy, the default is to assign each destination (i.e., relationship) a weighting of 1 (evenly distributed). However, the user may add optional properties to change this weighting. When adding a property, the name must be a positive integer between 1 and the number of relationships (inclusive). For example, if Number of Relationships has a value of 8 and a property is added with the name 5 and the value 10, then relationship 5 (among the 8) will receive 10 FlowFiles in each iteration instead of 1. All other relationships will receive 1 FlowFile in each iteration.</p>\n\n<h3><a name=\"DuplicateFlowFile\"></a>DuplicateFlowFile</h3>\n\n<p>Intended for load testing, this processor will create the configured number of copies of each incoming FlowFile</p>\n\n<h3><a name=\"EncryptContent\"></a>EncryptContent</h3>\n\n<p>Encrypts or Decrypts a FlowFile using either symmetric encryption with a password and randomly generated salt, or asymmetric encryption using a public and secret key.</p>\n\n<h3><a name=\"EvaluateJsonPath\"></a>EvaluateJsonPath</h3>\n\n<p>Evaluates one or more JsonPath expressions against the content of a FlowFile. 
The results of those expressions are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. JsonPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is flowfile-attribute; otherwise, the property name is ignored). The value of the property must be a valid JsonPath expression. If the JsonPath evaluates to a JSON array or JSON object and the Return Type is set to ‘scalar’ the FlowFile will be unmodified and will be routed to failure. A Return Type of JSON can return scalar values if the provided JsonPath evaluates to the specified value and will be routed as a match. If Destination is ‘flowfile-content’ and the JsonPath does not evaluate to a defined path, the FlowFile will be routed to ‘unmatched’ without having its contents modified. If Destination is flowfile-attribute and the expression matches nothing, attributes will be created with empty strings as the value, and the FlowFile will always be routed to ‘matched’.</p>\n\n<h3><a name=\"EvaluateRegularExpression\"></a>EvaluateRegularExpression</h3>\n\n<p>WARNING: This has been deprecated and will be removed in 0.2.0. Use ExtractText instead.</p>\n\n<h3><a name=\"EvaluateXPath\"></a>EvaluateXPath</h3>\n\n<p>This processor evaluates one or more XPaths against the content of a FlowFile. The results of those XPaths are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. XPaths are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is flowfile-attribute; otherwise, the property name is ignored). The value of the property must be a valid XPath expression. 
If the XPath evaluates to more than one node and the Return Type is set to ‘nodeset’ (either directly, or via ‘auto-detect’ with a Destination of ‘flowfile-content’), the FlowFile will be unmodified and will be routed to failure. If the XPath does not evaluate to a Node, the FlowFile will be routed to ‘unmatched’ without having its contents modified. If Destination is flowfile-attribute and the expression matches nothing, attributes will be created with empty strings as the value, and the FlowFile will always be routed to ‘matched’.</p>\n\n<h3><a name=\"EvaluateXQuery\"></a>EvaluateXQuery</h3>\n\n<p>This processor evaluates one or more XQueries against the content of a FlowFile. The results of those XQueries are assigned to FlowFile Attributes or are written to the content of the FlowFile itself, depending on configuration of the Processor. XQueries are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed (if the Destination is ‘flowfile-attribute’; otherwise, the property name is ignored). The value of the property must be a valid XQuery. If the XQuery returns more than one result, new attributes or FlowFiles (for Destinations of ‘flowfile-attribute’ or ‘flowfile-content’ respectively) will be created for each result (attributes will have a ‘.n’ one-up number appended to the specified attribute name). If any provided XQuery returns a result, the FlowFile(s) will be routed to ‘matched’. If no provided XQuery returns a result, the FlowFile will be routed to ‘unmatched’. If the Destination is ‘flowfile-attribute’ and the XQueries match nothing, no attributes will be applied to the FlowFile.</p>\n\n<h3><a name=\"ExecuteFlumeSink\"></a>ExecuteFlumeSink</h3>\n\n<p>This processor executes a Flume sink. Each input FlowFile is converted into a Flume Event for processing by the sink.</p>\n\n<h3><a name=\"ExecuteFlumeSource\"></a>ExecuteFlumeSource</h3>\n\n<p>Execute a Flume source. 
Each Flume Event is sent to the success relationship as a FlowFile.</p>\n\n<h3><a name=\"ExecuteProcess\"></a>ExecuteProcess</h3>\n\n<p>Runs an operating system command specified by the user and writes the output of that command to a FlowFile. If the command is expected to be long-running, the Processor can output the partial data on a specified interval. When this option is used, the output is expected to be in textual format, as it typically does not make sense to split binary data on arbitrary time-based intervals.</p>\n\n<h3><a name=\"ExecuteScript\"></a>ExecuteScript</h3>\n\n<p>Experimental - Executes a script given the flow file and a process session. The script is responsible for handling the incoming flow file (for example, transferring it to SUCCESS or removing it) as well as any flow files created by the script. If the handling is incomplete or incorrect, the session will be rolled back. Experimental: Impact of sustained usage not yet verified.</p>\n\n<h3><a name=\"ExecuteSQL\"></a>ExecuteSQL</h3>\n\n<p>Executes the provided SQL select query. The query result will be converted to Avro format. Streaming is used, so arbitrarily large result sets are supported.</p>\n\n<h3><a name=\"ExecuteStreamCommand\"></a>ExecuteStreamCommand</h3>\n\n<p>This processor executes an external command on the contents of a FlowFile, and creates a new FlowFile with the results of the command.</p>\n\n<h3><a name=\"ExtractAvroMetadata\"></a>ExtractAvroMetadata</h3>\n\n<p>Extracts metadata from the header of an Avro datafile.</p>\n\n<h3><a name=\"ExtractHL7Attributes\"></a>ExtractHL7Attributes</h3>\n\n<p>Extracts information from an HL7 (Health Level 7) formatted FlowFile and adds the information as FlowFile Attributes. The attributes are named as &lt;Segment Name&gt; &lt;dot&gt; &lt;Field Index&gt;. If the segment is repeating, the naming will be &lt;Segment Name&gt; &lt;underscore&gt; &lt;Segment Index&gt; &lt;dot&gt; &lt;Field Index&gt;. 
For example, we may have an attribute named “MSH.12” with a value of “2.1” and an attribute named “OBX_11.3” with a value of “93000<sup>CPT4</sup>”.</p>\n\n<h3><a name=\"ExtractImageMetadata\"></a>ExtractImageMetadata</h3>\n\n<p>Extracts the image metadata from flowfiles containing images. This processor relies on this metadata extractor library: <a href=\"https://github.com/drewnoakes/metadata-extractor\">https://github.com/drewnoakes/metadata-extractor</a>. It extracts a long list of metadata types including but not limited to EXIF, IPTC, XMP and Photoshop fields. For the full list visit the library’s website. NOTE: The library being used loads the images into memory, so extremely large images may cause problems.</p>\n\n<h3><a name=\"ExtractMediaMetadata\"></a>ExtractMediaMetadata</h3>\n\n<p>Extracts the content metadata from flowfiles containing audio, video, image, and other file types. This processor relies on the Apache Tika project for file format detection and parsing. It extracts a long list of metadata types for media files including audio, video, and print media formats. NOTE: the attribute names and content extracted may vary across upgrades because parsing is performed by the external Tika tools, which in turn depend on other projects for metadata extraction. For more details and the list of supported file types, visit the library’s website at <a href=\"http://tika.apache.org/\">http://tika.apache.org/</a>.</p>\n\n<h3><a name=\"ExtractText\"></a>ExtractText</h3>\n\n<p>Evaluates one or more Regular Expressions against the content of a FlowFile. The results of those Regular Expressions are assigned to FlowFile Attributes. Regular Expressions are entered by adding user-defined properties; the name of the property maps to the Attribute Name into which the result will be placed. 
The first capture group, if found, will be placed into that attribute name. All capture groups, including the matching string sequence itself, will also be provided at that attribute name with an index value provided. The value of the property must be a valid Regular Expression with one or more capturing groups. If the Regular Expression matches more than once, only the first match will be used. If any provided Regular Expression matches, the FlowFile(s) will be routed to ‘matched’. If no provided Regular Expression matches, the FlowFile will be routed to ‘unmatched’ and no attributes will be applied to the FlowFile.</p>\n\n<h3><a name=\"FetchDistributedMapCache\"></a>FetchDistributedMapCache</h3>\n\n<p>Computes a cache key from FlowFile attributes for each incoming FlowFile, and fetches the value from the Distributed Map Cache associated with that key. The incoming FlowFile’s content is replaced with the binary data received from the Distributed Map Cache. If there is no value stored under that key then the flow file will be routed to ‘not-found’. Note that the processor will always attempt to read the entire cached value into memory before placing it in its destination. This could be problematic if the cached value is very large.</p>\n\n<h3><a name=\"FetchElasticSearch\"></a>FetchElasticSearch</h3>\n\n<p>Retrieves a document from Elasticsearch using the specified connection properties and the identifier of the document to retrieve. If the cluster has been configured for authorization and/or secure transport (SSL/TLS) and the Shield plugin is available, secure connections can be made. This processor supports Elasticsearch 2.x clusters.</p>\n\n<h3><a name=\"FetchFile\"></a>FetchFile</h3>\n\n<p>Reads the contents of a file from disk and streams it into the contents of an incoming FlowFile. 
Once this is done, the file is optionally moved elsewhere or deleted to help keep the file system organized.</p>\n\n<h3><a name=\"FetchHDFS\"></a>FetchHDFS</h3>\n\n<p>Retrieves a file from HDFS. The content of the incoming FlowFile is replaced by the content of the file in HDFS. The file in HDFS is left intact without any changes being made to it.</p>\n\n<h3><a name=\"FetchS3Object\"></a>FetchS3Object</h3>\n\n<p>Retrieves the contents of an S3 Object and writes it to the content of a FlowFile.</p>\n\n<h3><a name=\"FetchSFTP\"></a>FetchSFTP</h3>\n\n<p>Fetches the content of a file from a remote SFTP server and overwrites the contents of an incoming FlowFile with the content of the remote file.</p>\n\n<h3><a name=\"GenerateFlowFile\"></a>GenerateFlowFile</h3>\n\n<p>This processor creates FlowFiles of random data to be used for load testing purposes.</p>\n\n<h3><a name=\"GeoEnrichIP\"></a>GeoEnrichIP</h3>\n\n<p>Looks up geolocation information for an IP address and adds the geo information to FlowFile attributes. The geo data is provided as a MaxMind database. The attribute that contains the IP address to look up is provided by the ‘IP Address Attribute’ property. If the name of the attribute provided is ‘X’, then the attributes added by enrichment will take the form X.geo.&lt;fieldName&gt;.</p>\n\n<h3><a name=\"GetAzureEventHub\"></a>GetAzureEventHub</h3>\n\n<p>Receives messages from a Microsoft Azure Event Hub, writing the contents of the Azure message to the content of the FlowFile.</p>\n\n<h3><a name=\"GetCouchbaseKey\"></a>GetCouchbaseKey</h3>\n\n<p>Gets a document from Couchbase Server via Key/Value access. The ID of the document to fetch may be supplied by setting the &lt;Document Id&gt; property. 
NOTE: if the Document Id property is not set, the contents of the FlowFile will be read to determine the Document Id, which means that the contents of the entire FlowFile will be buffered in memory.</p>\n\n<h3><a name=\"GetDynamoDB\"></a>GetDynamoDB</h3>\n\n<p>Retrieves a document from DynamoDB based on hash and range key. The key can be string or number. For any get request all the primary keys are required (hash or hash and range based on the table keys). A Json Document (‘Map’) attribute of the DynamoDB item is read into the content of the FlowFile.</p>\n\n<h3><a name=\"GetFile\"></a>GetFile</h3>\n\n<p>This processor obtains FlowFiles from a local directory. NiFi needs at least read permission on the files it will pull; otherwise it will ignore them.</p>\n\n<h3><a name=\"GetFTP\"></a>GetFTP</h3>\n\n<p>This processor fetches files from an FTP server and creates FlowFiles from them.</p>\n\n<h3><a name=\"GetHBase\"></a>GetHBase</h3>\n\n<p>This processor polls HBase for any records in the specified table. The processor keeps track of the timestamp of the cells that it receives, so that as new records are pushed to HBase, they will automatically be pulled. Each record is output in JSON format, as {“row”: “&lt;row key&gt;”, “cells”: { “&lt;column 1 family&gt;:&lt;column 1 qualifier&gt;”: “&lt;cell 1 value&gt;”, “&lt;column 2 family&gt;:&lt;column 2 qualifier&gt;”: “&lt;cell 2 value&gt;”, … }}. For each record received, a Provenance RECEIVE event is emitted with the format hbase://&lt;table name&gt;/&lt;row key&gt;, where &lt;row key&gt; is the UTF-8 encoded value of the row’s key.</p>\n\n<h3><a name=\"GetHDFS\"></a>GetHDFS</h3>\n\n<p>Fetches files from Hadoop Distributed File System (HDFS) into FlowFiles. This Processor will delete the file from HDFS after fetching it.</p>\n\n<h3><a name=\"GetHDFSEvents\"></a>GetHDFSEvents</h3>\n\n<p>This processor polls the notification events provided by the HdfsAdmin API. Since this uses the HdfsAdmin APIs it is required to run as an HDFS super user. 
Currently there are six types of events (append, close, create, metadata, rename, and unlink). Please see the org.apache.hadoop.hdfs.inotify.Event documentation for full explanations of each event. This processor will poll for new events based on a defined duration. For each event received, a new flow file will be created with the expected attributes and the event itself serialized to JSON and written to the flow file’s content. For example, if event.type is APPEND then the content of the flow file will contain a JSON document with the information about the append event. If successful, the flow files are sent to the ‘success’ relationship. Be careful of where the generated flow files are stored. If the flow files are stored in one of the processor’s watch directories there will be a never-ending flow of events. It is also important to be aware that this processor must consume all events. The filtering must happen within the processor, because the HDFS admin’s event notifications API does not have filtering.</p>\n\n<h3><a name=\"GetHDFSSequenceFile\"></a>GetHDFSSequenceFile</h3>\n\n<p>Fetches sequence files from Hadoop Distributed File System (HDFS) into FlowFiles.</p>\n\n<h3><a name=\"GetHTMLElement\"></a>GetHTMLElement</h3>\n\n<p>Extracts HTML element values from the incoming flowfile’s content using a CSS selector. The incoming HTML is first converted into an HTML Document Object Model so that HTML elements may be selected in a similar manner to how CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then “queried” using the user-defined CSS selector string. The result of “querying” the HTML DOM may produce 0-N results. If no results are found, the flowfile will be transferred to the “element not found” relationship to indicate so to the end user. If N results are found, a new flowfile will be created and emitted for each result. The query result will either be placed in the content of the new flowfile or as an attribute of the new flowfile. 
By default the result is written to an attribute. This can be controlled by the “Destination” property. Resulting query values may also have data prepended or appended to them by setting the value of the property “Prepend Element Value” or “Append Element Value”. Prepended and appended values are treated as string values and concatenated to the result retrieved from the HTML DOM query operation. A more thorough reference for the CSS selector syntax can be found at “<a href=\"http://jsoup.org/apidocs/org/jsoup/select/Selector.html\">http://jsoup.org/apidocs/org/jsoup/select/Selector.html</a>”.</p>\n\n<h3><a name=\"GetHTTP\"></a>GetHTTP</h3>\n\n<p>This processor fetches files via HTTP and creates FlowFiles from them.</p>\n\n<h3><a name=\"GetJMSQueue\"></a>GetJMSQueue</h3>\n\n<p>This processor pulls messages from a JMS Queue, creating a FlowFile for each JMS message or bundle of messages, as configured.</p>\n\n<h3><a name=\"GetJMSTopic\"></a>GetJMSTopic</h3>\n\n<p>This processor pulls messages from a JMS Topic, creating a FlowFile for each JMS message or bundle of messages, as configured.</p>\n\n<h3><a name=\"GetKafka\"></a>GetKafka</h3>\n\n<p>This processor polls Apache Kafka for data. When a message is received from Kafka, this Processor emits a FlowFile where the content of the FlowFile is the value of the Kafka message. If the message has a key associated with it, an attribute named kafka.key will be added to the FlowFile, with the value being the UTF-8 Encoded value of the Message’s Key.</p>\n\n<p>Kafka supports the notion of a Consumer Group when pulling messages in order to provide scalability while still offering a publish-subscribe interface. Each Consumer Group must have a unique identifier. The Consumer Group identifier that is used by NiFi is the UUID of the Processor. 
This means that all of the nodes within a cluster will use the same Consumer Group Identifier so that they do not receive duplicate data but multiple GetKafka Processors can be used to pull from multiple Topics, as each Processor will receive a different Processor UUID and therefore a different Consumer Group Identifier.</p>\n\n<h3><a name=\"GetMongo\"></a>GetMongo</h3>\n\n<p>Creates FlowFiles from documents in MongoDB</p>\n\n<h3><a name=\"GetSFTP\"></a>GetSFTP</h3>\n\n<p>This processor pulls files from an SFTP server and creates FlowFiles to encapsulate them.</p>\n\n<h3><a name=\"GetSNMP\"></a>GetSNMP</h3>\n\n<p>Retrieves information from SNMP Agent and outputs a FlowFile with information in attributes and without any content.</p>\n\n<h3><a name=\"GetSolr\"></a>GetSolr</h3>\n\n<p>Queries Solr and outputs the results as a FlowFile</p>\n\n<h3><a name=\"GetSplunk\"></a>GetSplunk</h3>\n\n<p>Retrieves data from Splunk Enterprise.</p>\n\n<h3><a name=\"GetSQS\"></a>GetSQS</h3>\n\n<p>Fetches messages from an Amazon Simple Queuing Service Queue</p>\n\n<h3><a name=\"GetTwitter\"></a>GetTwitter</h3>\n\n<p>Pulls status changes from Twitter’s streaming API</p>\n\n<h3><a name=\"HandleHttpRequest\"></a>HandleHttpRequest</h3>\n\n<p>This processor starts an HTTP server and creates a FlowFile for each HTTP Request that it receives. The Processor leaves the HTTP Connection open and is intended to be used in conjunction with a HandleHttpResponse Processor.</p>\n\n<p>The pairing of this Processor with a HandleHttpResponse Processor provides the ability to use NiFi to visually construct a web server that can carry out any functionality that is available through the existing Processors. 
For example, one could construct a Web-based front end to an SFTP Server by constructing a flow such as:</p>\n\n<p>HandleHttpRequest -> PutSFTP -> HandleHttpResponse</p>\n\n<p>The HandleHttpRequest Processor provides several Properties to configure which methods are supported, the paths that are supported, and SSL configuration. The FlowFiles that are generated by this Processor have the following attributes added to them, providing powerful routing capabilities and traceability of all data.</p>\n\n<h3><a name=\"HandleHttpResponse\"></a>HandleHttpResponse</h3>\n\n<p>This processor responds to an HTTP request that was received by the HandleHttpRequest Processor.</p>\n\n<p>The pairing of this Processor with a HandleHttpRequest Processor provides the ability to use NiFi to visually construct a web server that can carry out any functionality that is available through the existing Processors. For example, one could construct a Web-based front end to an SFTP Server by constructing a flow such as:</p>\n\n<p>HandleHttpRequest -> PutSFTP -> HandleHttpResponse</p>\n\n<p>This Processor must be configured with the same &lt;HTTP Context Map&gt; service as the corresponding HandleHttpRequest Processor. Otherwise, all FlowFiles will be routed to the ‘failure’ relationship.</p>\n\n<p>All FlowFiles must have an attribute named http.context.identifier. The value of this attribute is used to look up the HTTP Response so that the proper message can be sent back to the requestor. If this attribute is missing, the FlowFile will be routed to ‘failure.’</p>\n\n<h3><a name=\"HashAttribute\"></a>HashAttribute</h3>\n\n<p>This processor hashes together the key/value pairs of several FlowFile attributes and adds the hash as a new attribute. 
The user may add optional properties such that the name of each property is the name of a FlowFile attribute to consider and the value of the property is a regular expression that, if matched by the attribute value, causes that attribute to be used as part of the hash. If the regular expression contains a capturing group, only the value of the capturing group is used.</p>\n\n<h3><a name=\"HashContent\"></a>HashContent</h3>\n\n<p>This processor calculates a hash value for the content of a FlowFile and puts the hash value on the FlowFile as an attribute whose name is determined by the Hash Attribute Name property.</p>\n\n<h3><a name=\"IdentifyMimeType\"></a>IdentifyMimeType</h3>\n\n<p>This processor attempts to identify the MIME Type used for a FlowFile. If the MIME Type can be identified, an attribute with the name ‘mime.type’ is added with the value being the MIME Type. If the MIME Type cannot be determined, the value will be set to ‘application/octet-stream’. In addition, the attribute mime.extension will be set if a common file extension for the MIME Type is known.</p>\n\n<p>The following MIME Types are detected:</p>\n\n<ul>\n<li>application/gzip</li>\n<li>application/bzip2</li>\n<li>application/flowfile-v3</li>\n<li>application/flowfile-v1 (requires Identify TAR be set to true)</li>\n<li>application/xml</li>\n<li>video/mp4</li>\n<li>video/x-m4v</li>\n<li>video/mp4a-latm</li>\n<li>video/quicktime</li>\n<li>video/mpeg</li>\n<li>audio/wav</li>\n<li>audio/mp3</li>\n<li>image/bmp</li>\n<li>image/png</li>\n<li>image/jpg</li>\n<li>image/gif</li>\n<li>image/tif</li>\n<li>application/vnd.ms-works</li>\n<li>application/msexcel</li>\n<li>application/mspowerpoint</li>\n<li>application/msaccess</li>\n<li>application/x-ms-wmv</li>\n<li>application/pdf</li>\n<li>application/x-rpm</li>\n<li>application/tar</li>\n<li>application/x-7z-compressed</li>\n<li>application/java-archive</li>\n<li>application/zip</li>\n<li>application/x-lzh</li>\n</ul>\n\n\n<h3><a 
name=\"InferAvroShema\"></a>InferAvroSchema</h3>\n\n<p>Examines the contents of the incoming FlowFile to infer an Avro schema. The processor will use the Kite SDK to attempt to automatically generate an Avro schema from the incoming content. When inferring the schema from JSON data the key names will be used in the resulting Avro schema definition. When inferring from CSV data a “header definition” must be present either as the first line of the incoming data or the “header definition” must be explicitly set in the property “CSV Header Definition”. A “header definition” is simply a single comma separated line defining the names of each column. The “header definition” is required in order to determine the names that should be given to each field in the resulting Avro definition. When inferring data types the higher order data type is always used if there is ambiguity. For example when examining numerical values the type may be set to “long” instead of “integer” since a long can safely hold the value of any “integer”. Only CSV and JSON content is currently supported for automatically inferring an Avro schema. The type of content present in the incoming FlowFile is set by using the property “Input Content Type”. The property can either be explicitly set to CSV, JSON, or “use mime.type value” which will examine the value of the mime.type attribute on the incoming FlowFile to determine the type of content present.</p>\n\n<h3><a name=\"InvokeHTTP\"></a>InvokeHTTP</h3>\n\n<p>Makes requests to remote HTTP servers using common HTTP methods. Results are stored as new FlowFiles upon success, and FlowFiles are routed to failure on error.</p>\n\n<p>An HTTP client processor that converts FlowFile attributes to HTTP headers with configurable HTTP method, URL, etc.</p>\n\n<h3><a name=\"InvokeScriptedProcessor\"></a>InvokeScriptedProcessor</h3>\n\n<p>Experimental - Invokes a script engine for a Processor defined in the given script. 
The script must define a valid class that implements the Processor interface, and it must set a variable ‘processor’ to an instance of the class. Processor methods such as onTrigger() will be delegated to the scripted Processor instance. Also any Relationships or PropertyDescriptors defined by the scripted processor will be added to the configuration dialog. Experimental: Impact of sustained usage not yet verified.</p>\n\n<h3><a name=\"JoltTransformJSON\"></a>JoltTransformJSON</h3>\n\n<p>Applies a list of Jolt specifications to the FlowFile JSON payload. A new FlowFile is created with transformed content and is routed to the ‘success’ relationship. If the JSON transform fails, the original FlowFile is routed to the ‘failure’ relationship.</p>\n\n<h3><a name=\"ListenHTTP\"></a>ListenHTTP</h3>\n\n<p>This processor starts an HTTP service that is used to receive FlowFiles from remote sources. The URL of the service is http://{hostname}:{port}/contentListener.</p>\n\n<h3><a name=\"ListenLumberjack\"></a>ListenLumberjack</h3>\n\n<p>Listens for Lumberjack messages being sent to a given port over TCP. Each message will be acknowledged after successfully writing the message to a FlowFile. Each FlowFile will contain the data portion of one or more Lumberjack frames. In the case where the Lumberjack frames contain syslog messages, the output of this processor can be sent to a ParseSyslog processor for further processing.</p>\n\n<h3><a name=\"ListenRELP\"></a>ListenRELP</h3>\n\n<p>Listens for RELP messages being sent to a given port over TCP. Each message will be acknowledged after successfully writing the message to a FlowFile. Each FlowFile will contain the data portion of one or more RELP frames. 
In the case where the RELP frames contain syslog messages, the output of this processor can be sent to a ParseSyslog processor for further processing.</p>\n\n<h3><a name=\"ListenSyslog\"></a>ListenSyslog</h3>\n\n<p>Listens for Syslog messages being sent to a given port over TCP or UDP. Incoming messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The format of each message is: (&lt;PRIORITY&gt;)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) where version is optional. The timestamp can be an RFC5424 timestamp with a format of “yyyy-MM-dd'T'HH:mm:ss.SZ” or “yyyy-MM-dd'T'HH:mm:ss.S+hh:mm”, or it can be an RFC3164 timestamp with a format of “MMM d HH:mm:ss”. If an incoming message matches one of these patterns, the message will be parsed and the individual pieces will be placed in FlowFile attributes, with the original message in the content of the FlowFile. If an incoming message does not match one of these patterns it will not be parsed and the syslog.valid attribute will be set to false with the original message in the content of the FlowFile. Valid messages will be transferred on the success relationship, and invalid messages will be transferred on the invalid relationship.</p>\n\n<h3><a name=\"ListenTCP\"></a>ListenTCP</h3>\n\n<p>Listens for incoming TCP connections and reads data from each connection using a line separator as the message demarcator. The default behavior is for each message to produce a single FlowFile, however this can be controlled by increasing the Batch Size to a larger value for higher throughput. 
The Receive Buffer Size must be set as large as the largest messages expected to be received, meaning if every 100kb there is a line separator, then the Receive Buffer Size must be greater than 100kb.</p>\n\n<h3><a name=\"ListenUDP\"></a>ListenUDP</h3>\n\n<p>This processor listens for Datagram Packets on a given port and concatenates the contents of those packets together to generate FlowFiles.</p>\n\n<h3><a name=\"ListFile\"></a>ListFile</h3>\n\n<p>Retrieves a listing of files from the local filesystem. For each file that is listed, creates a FlowFile that represents the file so that it can be fetched in conjunction with FetchFile. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Unlike GetFile, this Processor does not delete any data from the local filesystem.</p>\n\n<h3><a name=\"ListHDFS\"></a>ListHDFS</h3>\n\n<p>This processor retrieves a listing of files from HDFS. For each file that is listed in HDFS, creates a FlowFile that represents the HDFS file so that it can be fetched in conjunction with FetchHDFS. This Processor is designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data. Unlike GetHDFS, this Processor does not delete any data from HDFS.</p>\n\n<h3><a name=\"ListS3\"></a>ListS3</h3>\n\n<p>Retrieves a listing of objects from an S3 bucket. For each object that is listed, creates a FlowFile that represents the object so that it can be fetched in conjunction with FetchS3Object. This Processor is designed to run on Primary Node only in a cluster. 
If the primary node changes, the new Primary Node will pick up where the previous node left off without duplicating all of the data.</p>\n\n<h3><a name=\"ListSFTP\"></a>ListSFTP</h3>\n\n<p>Performs a listing of the files residing on an SFTP server. For each file that is found on the remote server, a new FlowFile will be created with the filename attribute set to the name of the file on the remote server. This can then be used in conjunction with FetchSFTP in order to fetch those files.</p>\n\n<h3><a name=\"LogAttribute\"></a>LogAttribute</h3>\n\n<p>This processor reads the attributes on incoming FlowFiles and prints those attributes and their values to the log at the logging level specified by the user.</p>\n\n<h3><a name=\"MergeContent\"></a>MergeContent</h3>\n\n<p>This processor merges a group of FlowFiles together into a “Bundle” based on a user-defined strategy and packages them into a single FlowFile. It is recommended that the processor be configured with only a single incoming connection, as groups of FlowFiles will not be created from FlowFiles in different connections. This processor updates the mime.type attribute as appropriate. After files have been merged by this processor, they can be unpackaged later using the UnpackContent processor.</p>\n\n<h3><a name=\"ModifyBytes\"></a>ModifyBytes</h3>\n\n<p>This processor updates the content of a FlowFile by removing bytes from the start or end of a file.</p>\n\n<h3><a name=\"ModifyHTMLElement\"></a>ModifyHTMLElement</h3>\n\n<p>Modifies the value of an existing HTML element. The desired element to be modified is located by using CSS selector syntax. The incoming HTML is first converted into a HTML Document Object Model so that HTML elements may be selected in a similar manner that CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then “queried” using the user defined CSS selector string to find the element the user desires to modify. 
If the HTML element is found the element’s value is updated in the DOM using the value specified in the “Modified Value” property. All DOM elements that match the CSS selector will be updated. Once all of the DOM elements have been updated the DOM is rendered to HTML and the result replaces the FlowFile content with the updated HTML. A more thorough reference for the CSS selector syntax can be found at “<a href=\"http://jsoup.org/apidocs/org/jsoup/select/Selector.html\">http://jsoup.org/apidocs/org/jsoup/select/Selector.html</a>”</p>\n\n<h3><a name=\"MonitorActivity\"></a>MonitorActivity</h3>\n\n<p>This processor monitors the flow for activity and sends out an indicator when the flow has not had any data for some specified amount of time and again when the flow’s activity is restored.</p>\n\n<h3><a name=\"ParseSyslog\"></a>ParseSyslog</h3>\n\n<p>Parses the contents of a Syslog message and adds attributes to the FlowFile for each of the parts of the Syslog message.</p>\n\n<h3><a name=\"PostHTTP\"></a>PostHTTP</h3>\n\n<p>This processor performs an HTTP post with the content of each incoming FlowFile.</p>\n\n<h3><a name=\"PublishAMQP\"></a>PublishAMQP</h3>\n\n<p>Creates an AMQP Message from the contents of a FlowFile and sends the message to an AMQP Exchange. In a typical AMQP exchange model, the message that is sent to the AMQP Exchange will be routed based on the ‘Routing Key’ to its final destination in the queue (the binding). If due to some misconfiguration the binding between the Exchange, Routing Key and Queue is not set up, the message will have no final destination and will return (i.e., the data will not make it to the queue). If that happens you will see a message to that effect in both the app log and a bulletin. 
Fixing the binding (normally done by AMQP administrator) will resolve the issue.</p>\n\n<h3><a name=\"PublishJMS\"></a>PublishJMS</h3>\n\n<p>Creates a JMS Message from the contents of a FlowFile and sends it to a JMS Destination (queue or topic) as JMS BytesMessage.</p>\n\n<h3><a name=\"PublishKafka\"></a>PublishKafka</h3>\n\n<p>Sends the contents of a FlowFile as a message to Apache Kafka. The messages to send may be individual FlowFiles or may be delimited, using a user-specified delimiter, such as a new-line.</p>\n\n<h3><a name=\"PublishMQTT\"></a>PublishMQTT</h3>\n\n<p>Publishes a message to an MQTT topic</p>\n\n<h3><a name=\"PutAzureEventHub\"></a>PutAzureEventHub</h3>\n\n<p>Sends the contents of a FlowFile to a Windows Azure Event Hub. Note: the content of the FlowFile will be buffered into memory before being sent, so care should be taken to avoid sending FlowFiles to this Processor that exceed the amount of Java Heap Space available.</p>\n\n<h3><a name=\"PutCassandraQL\"></a>PutCassandraQL</h3>\n\n<p>Execute provided Cassandra Query Language (CQL) statement on a Cassandra 1.x or 2.x cluster. The content of an incoming FlowFile is expected to be the CQL command to execute. The CQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention cql.args.N.type and cql.args.N.value, where N is a positive integer. The cql.args.N.type is expected to be a lowercase string indicating the Cassandra type.</p>\n\n<h3><a name=\"PutCouchbaseKey\"></a>PutCouchbaseKey</h3>\n\n<p>Put a document to Couchbase Server via Key/Value access.</p>\n\n<h3><a name=\"PutDistributedMapCache\"></a>PutDistributedMapCache</h3>\n\n<p>Gets the content of a FlowFile and puts it to a distributed map cache, using a cache key computed from FlowFile attributes. 
If the cache already contains the entry and the cache update strategy is ‘keep original’ the entry is not replaced.</p>\n\n<h3><a name=\"PutDynamoDB\"></a>PutDynamoDB</h3>\n\n<p>Puts a document into DynamoDB based on hash and range key. The table can have either hash and range or hash key alone. Currently the supported key types are string and number, and the value can be a JSON document. When both hash and range keys are used, both keys are required for the operation. The FlowFile content must be JSON. FlowFile content is mapped to the specified Json Document attribute in the DynamoDB item.</p>\n\n<h3><a name=\"PutElasticsearch\"></a>PutElasticsearch</h3>\n\n<p>Writes the contents of a FlowFile to Elasticsearch, using the specified parameters such as the index to insert into and the type of the document. If the cluster has been configured for authorization and/or secure transport (SSL/TLS) and the Shield plugin is available, secure connections can be made. This processor supports Elasticsearch 2.x clusters.</p>\n\n<h3><a name=\"PutEmail\"></a>PutEmail</h3>\n\n<p>This processor sends an e-mail to configured recipients for each incoming FlowFile.</p>\n\n<h3><a name=\"PutFile\"></a>PutFile</h3>\n\n<p>This processor writes FlowFiles to the local file system.</p>\n\n<h3><a name=\"PutFTP\"></a>PutFTP</h3>\n\n<p>This processor sends FlowFiles via FTP to an FTP server.</p>\n\n<h3><a name=\"PutHBaseCell\"></a>PutHBaseCell</h3>\n\n<p>Adds the Contents of a FlowFile to HBase as the value of a single cell.</p>\n\n<h3><a name=\"PutHBaseJSON\"></a>PutHBaseJSON</h3>\n\n<p>Adds rows to HBase based on the contents of incoming JSON documents. Each FlowFile must contain a single UTF-8 encoded JSON document, and any FlowFiles where the root element is not a single document will be routed to failure. Each JSON field name and value will become a column qualifier and value of the HBase row. 
Any fields with a null value will be skipped, and fields with a complex value will be handled according to the Complex Field Strategy. The row id can be specified either directly on the processor through the Row Identifier property, or can be extracted from the JSON document by specifying the Row Identifier Field Name property. This processor will hold the contents of all FlowFiles for the given batch in memory at one time.</p>\n\n<h3><a name=\"PutHDFS\"></a>PutHDFS</h3>\n\n<p>This processor writes FlowFiles to an HDFS cluster. It will create directories in which to store files as needed based on the Directory property.</p>\n\n<p>When files are written to HDFS, the file’s owner is the user identity of the NiFi process, the file’s group is the group of the parent directory, and the read/write/execute permissions use the default umask. The owner can be overridden using the Remote Owner property, the group can be overridden using the Remote Group property, and the read/write/execute permissions can be overridden using the Permissions umask property.</p>\n\n<p>NOTE: This processor can change owner or group only if the user identity of the NiFi process has super user privilege in HDFS to do so.</p>\n\n<p>NOTE: The Permissions umask property cannot add execute permissions to regular files.</p>\n\n<h3><a name=\"PutHiveQL\"></a>PutHiveQL</h3>\n\n<p>Executes a HiveQL DDL/DML command (e.g., UPDATE, INSERT). The content of an incoming FlowFile is expected to be the HiveQL command to execute. The HiveQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention hiveql.args.N.type and hiveql.args.N.value, where N is a positive integer. The hiveql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.</p>\n\n<h3><a name=\"PutHTMLElement\"></a>PutHTMLElement</h3>\n\n<p>Places a new HTML element in the existing HTML DOM. 
The desired position for the new HTML element is specified by using CSS selector syntax. The incoming HTML is first converted into a HTML Document Object Model so that the HTML DOM location may be found in a similar manner that CSS selectors are used to apply styles to HTML. The resulting HTML DOM is then “queried” using the user defined CSS selector string to find the position where the user desires to add the new HTML element. Once the new HTML element is added to the DOM it is rendered to HTML and the result replaces the FlowFile content with the updated HTML. A more thorough reference for the CSS selector syntax can be found at “<a href=\"http://jsoup.org/apidocs/org/jsoup/select/Selector.html\">http://jsoup.org/apidocs/org/jsoup/select/Selector.html</a>”</p>\n\n<h3><a name=\"PutJMS\"></a>PutJMS</h3>\n\n<p>This processor creates a JMS message from the contents of a FlowFile and sends the message to a JMS server.</p>\n\n<h3><a name=\"PutKafka\"></a>PutKafka</h3>\n\n<p>This Processor puts the contents of a FlowFile to a Topic in Apache Kafka. The full contents of a FlowFile become the contents of a single message in Kafka. This message is optionally assigned a key by using the &lt;Kafka Key&gt; Property.</p>\n\n<p>The Processor allows the user to configure an optional Message Delimiter that can be used to send many messages per FlowFile. For example, a \\n could be used to indicate that the contents of the FlowFile should be used to send one message per line of text. If the property is not set, the entire contents of the FlowFile will be sent as a single message. When using the delimiter, if some messages are successfully sent but other messages fail to send, the FlowFile will be FORKed into two child FlowFiles, with the successfully sent messages being routed to ‘success’ and the messages that could not be sent going to ‘failure’.</p>\n\n<h3><a name=\"PutKinesisFirehose\"></a>PutKinesisFirehose</h3>\n\n<p>Sends the contents to a specified Amazon Kinesis Firehose. 
In order to send data to Firehose, the Firehose delivery stream name has to be specified.</p>\n\n<h3><a name=\"PutLambda\"></a>PutLambda</h3>\n\n<p>Sends the contents to a specified Amazon Lambda Function. The AWS credentials used for authentication must have permission to execute the Lambda function (lambda:InvokeFunction). The FlowFile content must be JSON.</p>\n\n<h3><a name=\"PutMongo\"></a>PutMongo</h3>\n\n<p>Writes the contents of a FlowFile to MongoDB.</p>\n\n<h3><a name=\"PutRiemann\"></a>PutRiemann</h3>\n\n<p>Sends events to Riemann (<a href=\"http://riemann.io\">http://riemann.io</a>) when FlowFiles pass through this processor. You can use events to notify Riemann that a FlowFile passed through, or you can attach a more meaningful metric, such as the time a FlowFile took to get to this processor. All attributes attached to events support the NiFi Expression Language.</p>\n\n<h3><a name=\"PutS3Object\"></a>PutS3Object</h3>\n\n<p>Puts FlowFiles to an Amazon S3 Bucket.</p>\n\n<h3><a name=\"PutSFTP\"></a>PutSFTP</h3>\n\n<p>This processor sends FlowFiles via SFTP to an SFTP server.</p>\n\n<h3><a name=\"PutSlack\"></a>PutSlack</h3>\n\n<p>The PutSlack processor sends messages to Slack, a team-oriented messaging service.</p>\n\n<p>This processor uses Slack’s incoming webhooks custom integration to post messages to a specific channel. Before using PutSlack, your Slack team should be configured for the incoming webhooks custom integration, and you’ll need to configure at least one incoming webhook.</p>\n\n<h3><a name=\"PutSNS\"></a>PutSNS</h3>\n\n<p>Sends the content of a FlowFile as a notification to the Amazon Simple Notification Service.</p>\n\n<h3><a name=\"PutSolrContentStream\"></a>PutSolrContentStream</h3>\n\n<p>This processor streams the contents of a FlowFile to an Apache Solr update handler. Any properties added to this processor by the user are passed to Solr on the update request. 
If a parameter must be sent multiple times with different values, properties can follow a naming convention: name.number, where name is the parameter name and number is a unique number. Repeating parameters will be sorted by their property name.</p>\n\n<p>Example: To specify multiple ‘f’ parameters for indexing custom JSON, the following properties can be defined:</p>\n\n<ul>\n<li>split: /exams</li>\n<li>f.1: first:/first</li>\n<li>f.2: last:/last</li>\n<li>f.3: grade:/grade</li>\n</ul>\n\n<p>This will result in sending the following URL to Solr: split=/exams&amp;f=first:/first&amp;f=last:/last&amp;f=grade:/grade</p>\n\n<h3><a name=\"PutSplunk\"></a>PutSplunk</h3>\n\n<p>Sends logs to Splunk Enterprise over TCP, TCP + TLS/SSL, or UDP. If a Message Delimiter is provided, then this processor will read messages from the incoming FlowFile based on the delimiter, and send each message to Splunk. If a Message Delimiter is not provided then the content of the FlowFile will be sent directly to Splunk as if it were a single message.</p>\n\n<h3><a name=\"PutSQL\"></a>PutSQL</h3>\n\n<p>Executes a SQL UPDATE or INSERT command. The content of an incoming FlowFile is expected to be the SQL command to execute. The SQL command may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format.</p>\n\n<h3><a name=\"PutSQS\"></a>PutSQS</h3>\n\n<p>Publishes a message to an Amazon Simple Queuing Service Queue.</p>\n\n<h3><a name=\"PutSyslog\"></a>PutSyslog</h3>\n\n<p>Sends Syslog messages to a given host and port over TCP or UDP. Messages are constructed from the “Message ___” properties of the processor which can use expression language to generate messages from incoming FlowFiles. 
The properties are used to construct messages of the form: (&lt;PRIORITY&gt;)(VERSION )(TIMESTAMP) (HOSTNAME) (BODY) where version is optional. The constructed messages are checked against regular expressions for RFC5424 and RFC3164 formatted messages. The timestamp can be an RFC5424 timestamp with a format of “yyyy-MM-dd'T'HH:mm:ss.SZ” or “yyyy-MM-dd'T'HH:mm:ss.S+hh:mm”, or it can be an RFC3164 timestamp with a format of “MMM d HH:mm:ss”. If a message is constructed that does not form a valid Syslog message according to the above description, then it is routed to the invalid relationship. Valid messages are sent to the Syslog server; successes are routed to the success relationship, and failures are routed to the failure relationship.</p>\n\n<h3><a name=\"PutTCP\"></a>PutTCP</h3>\n\n<p>The PutTCP processor receives a FlowFile and transmits the FlowFile content over a TCP connection to the configured TCP server. By default, the FlowFiles are transmitted over the same TCP connection (or pool of TCP connections if multiple input threads are configured). To assist the TCP server with determining message boundaries, an optional “Outgoing Message Delimiter” string can be configured which is appended to the end of each FlowFile’s content when it is transmitted over the TCP connection. An optional “Connection Per FlowFile” parameter can be specified to change the behaviour so that each FlowFile’s content is transmitted over a single TCP connection which is opened when the FlowFile is received and closed after the FlowFile has been sent. This option should only be used for low message volume scenarios, otherwise the platform may run out of TCP sockets.</p>\n\n<h3><a name=\"PutUDP\"></a>PutUDP</h3>\n\n<p>The PutUDP processor receives a FlowFile and packages the FlowFile content into a single UDP datagram packet which is then transmitted to the configured UDP server. 
The user must ensure that the FlowFile content being fed to this processor is not larger than the maximum size for the underlying UDP transport. The maximum transport size will vary based on the platform setup but is generally just under 64KB. FlowFiles will be marked as failed if their content is larger than the maximum transport size.</p>\n\n<h3><a name=\"QueryCassandra\"></a>QueryCassandra</h3>\n\n<p>Execute provided Cassandra Query Language (CQL) select query on a Cassandra 1.x or 2.x cluster. Query result may be converted to Avro or JSON format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute ‘executecql.row.count’ indicates how many rows were selected.</p>\n\n<h3><a name=\"QueryDatabaseTable\"></a>QueryDatabaseTable</h3>\n\n<p>Execute provided SQL select query. Query result will be converted to Avro format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. 
FlowFile attribute ‘querydbtable.row.count’ indicates how many rows were selected.</p>\n\n<h3><a name=\"ReplaceText\"></a>ReplaceText</h3>\n\n<p>This processor updates the content of a FlowFile by evaluating a regular expression (regex) against the content and replacing the section of content that matches the regular expression with an alternate, user-defined, value.</p>\n\n<h3><a name=\"ReplaceTextWithMapping\"></a>ReplaceTextWithMapping</h3>\n\n<p>This processor updates the content of a FlowFile by evaluating a Regular Expression against it and replacing the section of the content that matches the Regular Expression with some alternate value provided in a mapping file.</p>\n\n<h3><a name=\"ResizeImage\"></a>ResizeImage</h3>\n\n<p>Resizes an image to user-specified dimensions. This Processor uses the image codecs registered with the environment that NiFi is running in. By default, this includes JPEG, PNG, BMP, WBMP, and GIF images.</p>\n\n<h3><a name=\"RouteHL7\"></a>RouteHL7</h3>\n\n<p>Routes incoming HL7 data according to user-defined queries. To add a query, add a new property to the processor. The name of the property will become a new relationship for the processor, and the value is an HL7 Query Language query. If a FlowFile matches the query, a copy of the FlowFile will be routed to the associated relationship.</p>\n\n<h3><a name=\"RouteOnAttribute\"></a>RouteOnAttribute</h3>\n\n<p>This processor routes FlowFiles based on their attributes using the NiFi Expression Language. Users add properties with valid NiFi Expression Language Expressions as the values. Each Expression must return a value of type Boolean (true or false).</p>\n\n<p>Example: The goal is to route all files with filenames that start with ABC down a certain path. 
Add a property with the following name and value:</p>\n\n<ul>\n<li>property name: ABC</li>\n<li>property value: ${filename:startsWith('ABC')}</li>\n</ul>\n\n<p>In this example, all files with filenames that start with ABC will follow the ABC relationship.</p>\n\n<h3><a name=\"RouteOnContent\"></a>RouteOnContent</h3>\n\n<p>This processor applies user-added regular expressions to the content of a FlowFile and routes a copy of the FlowFile to each destination whose regular expression matches. The user adds properties where the name is the relationship that the FlowFile should follow if it matches the regular expression, which is defined as the property’s value. User-defined properties do support the NiFi Expression Language, but in such cases, the results are interpreted as literal values, not regular expressions.</p>\n\n<h3><a name=\"RouteText\"></a>RouteText</h3>\n\n<p>Routes textual data based on a set of user-defined rules. Each line in an incoming FlowFile is compared against the values specified by user-defined Properties. The mechanism by which the text is compared to these user-defined properties is defined by the ‘Matching Strategy’. The data is then routed according to these rules, routing each line of the text individually.</p>\n\n<h3><a name=\"ScanAttribute\"></a>ScanAttribute</h3>\n\n<p>This processor scans the specified attributes of FlowFiles, checking to see if any of their values are present within the specified dictionary of terms.</p>\n\n<h3><a name=\"ScanContent\"></a>ScanContent</h3>\n\n<p>This processor scans the content of FlowFiles for terms that are found in a user-supplied dictionary file. If a term is matched, the UTF-8 encoded version of the term is added to the FlowFile using the matching.term attribute. 
This allows for follow-on processors to use the value of the matching.term attribute to make routing decisions and so forth.</p>\n\n<h3><a name=\"SegmentContent\"></a>SegmentContent</h3>\n\n<p>This processor segments a FlowFile into multiple smaller segments on byte boundaries. Each segment is given attributes that can then be used by the MergeContent processor to reconstruct the original FlowFile.</p>\n\n<h3><a name=\"SelectHiveQL\"></a>SelectHiveQL</h3>\n\n<p>Executes the provided HiveQL SELECT query against a Hive database connection. The query result will be converted to Avro or CSV format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query. FlowFile attribute ‘selecthiveql.row.count’ indicates how many rows were selected.</p>\n\n<h3><a name=\"SetSNMP\"></a>SetSNMP</h3>\n\n<p>Based on incoming FlowFile attributes, the processor will execute SNMP Set requests. When it finds attributes with names like snmp$&lt;OID&gt;, the processor will attempt to set the value of the attribute on the corresponding OID given in the attribute name.</p>\n\n<h3><a name=\"SplitAvro\"></a>SplitAvro</h3>\n\n<p>Splits a binary encoded Avro datafile into smaller files based on the configured Output Size. The Output Strategy determines if the smaller files will be Avro datafiles, or bare Avro records with metadata in the FlowFile attributes. The output will always be binary encoded.</p>\n\n<h3><a name=\"SplitContent\"></a>SplitContent</h3>\n\n<p>This processor splits incoming FlowFiles by a specified byte sequence.</p>\n\n<h3><a name=\"SplitJson\"></a>SplitJson</h3>\n\n<p>This processor splits a JSON File into multiple, separate FlowFiles for an array element specified by a JsonPath expression. 
Each generated FlowFile consists of one element of the specified array and is transferred to the ‘split’ relationship, with the original file transferred to the ‘original’ relationship. If the specified JsonPath is not found or does not evaluate to an array element, the original file is routed to ‘failure’ and no files are generated.</p>\n\n<h3><a name=\"SplitText\"></a>SplitText</h3>\n\n<p>This processor splits a text file into multiple smaller text files on line boundaries, each having up to a configured number of lines.</p>\n\n<h3><a name=\"SplitXML\"></a>SplitXML</h3>\n\n<p>This processor splits an XML file into multiple separate FlowFiles, each comprising a child or descendant of the original root element.</p>\n\n<h3><a name=\"SpringContextProcessor\"></a>SpringContextProcessor</h3>\n\n<p>A Processor that supports sending data to and receiving data from an application defined in a Spring Application Context via predefined in/out MessageChannels.</p>\n\n<h3><a name=\"StoreInKiteDataset\"></a>StoreInKiteDataset</h3>\n\n<p>Stores Avro records in a Kite dataset.</p>\n\n<h3><a name=\"TailFile\"></a>TailFile</h3>\n\n<p>“Tails” a file, ingesting data from the file as it is written to the file. The file is expected to be textual. Data is ingested only when a new line is encountered (carriage return or new-line character or combination). If the file to tail is periodically “rolled over”, as is generally the case with log files, an optional Rolling Filename Pattern can be used to retrieve data from files that have rolled over, even if the rollover occurred while NiFi was not running (provided that the data still exists upon restart of NiFi). It is generally advisable to set the Run Schedule to a few seconds, rather than running with the default value of 0 secs, as this Processor will consume a lot of resources if scheduled very aggressively. 
At this time, this Processor does not support ingesting files that have been compressed when ‘rolled over’.</p>\n\n<h3><a name=\"TransformXML\"></a>TransformXML</h3>\n\n<p>This processor transforms the contents of FlowFiles based on a user-specified XSLT stylesheet file. XSL versions 1.0 and 2.0 are supported.</p>\n\n<h3><a name=\"UnpackContent\"></a>UnpackContent</h3>\n\n<p>This processor unpacks the content of FlowFiles that have been packaged with one of several different packaging formats, emitting one to many FlowFiles for each input FlowFile.</p>\n\n<h3><a name=\"UpdateAttribute\"></a>UpdateAttribute</h3>\n\n<p>This processor updates the attributes of a FlowFile using properties or rules that are added by the user. There are two ways to use this processor to add or modify attributes. One way is the “Basic Usage”; this allows you to set default attribute changes that affect every FlowFile going through the processor. The second way is the “Advanced Usage”; this allows you to make conditional attribute changes that only affect a FlowFile if it meets certain conditions. It is possible to use both methods in the same processor at the same time.</p>\n\n<h3><a name=\"ValidateXML\"></a>ValidateXML</h3>\n\n<p>This processor validates the contents of FlowFiles against a user-specified XML schema file.</p>\n\n<h3><a name=\"YandexTranslate\"></a>YandexTranslate</h3>\n\n<p>Translates content and attributes from one language to another.</p>\n\n<p>If you have questions about a processor, I’d encourage you to download the binaries and start up Apache Nifi. You can also read about each processor in <a href=\"https://nifi.apache.org/docs.html\">the official documentation</a>, which is now built with each release. 
If you really want more information, let me know and I’ll try to compile a more complete post about each and every processor.</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1423105206000,"created_by":1,"updated_at":1423105206000,"updated_by":1,"published_at":1423105206000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"processors"}]},{"title":"Developing A Custom Apache Nifi Processor (JSON)","slug":"developing-a-custom-apache-nifi-processor-json","markdown":"The list of available Apache Nifi processors is extensive, as documented in [this post]({{ site.url }}/apache-nifi-processors/). There is still sometimes a need to develop your own: to pull data from a database, to process an uncommon file format, or to handle many other unique situations. To get you started, we will work through a basic processor that takes a JSON file as input and a JSON path as a parameter, and places the matched value into the flow file contents and an attribute. The full source is hosted on [Github](https://github.com/pcgrenier/nifi-examples).\n\n<!-- more -->\n\n## Setup\n\nStart by creating a simple Maven project in your favorite IDE. Then edit the pom.xml.\n\n{% gist f99d27d08c3903f9d50c pom.xml %}\n\nThis pom.xml includes a single plug-in for building a nifi nar, which is similar to a war: it bundles everything up in a way nifi can unpack. The nifi-api is the only other \"required\" dependency. The other nifi dependencies are really useful, as you will see.\n\nThe next important piece is telling nifi which classes to load and register. This is done in a single file located at /src/main/resources/META-INF/services/org.apache.nifi.processor.Processor\n\n{% gist f98e563e787c1b73c425 org.apache.nifi.processor.Processor %}\n\n## The JSON Processor\n\nNow that everything is defined and findable by Apache Nifi, let's build a processor. 
Define a simple Java class with the name registered during setup (rocks.nifi.examples.processors.JsonProcessor).\n\nTags are useful for finding your processor in the list of processors in the GUI. In this case you could just type 'json' in the search box and your processor will be found. The capability description is also displayed in the processor selection box. Nifi.rocks will make future posts on documenting your custom processors. Finally, most processors will just extend AbstractProcessor; for more complicated tasks it may be necessary to go a level deeper and extend AbstractSessionFactoryProcessor.\n\n{% codeblock lang:java Apache Nifi Processor Header https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java JsonProcessor.java %}\n@SideEffectFree\n@Tags({\"JSON\", \"NIFI ROCKS\"})\n@CapabilityDescription(\"Fetch value from json path.\")\npublic class JsonProcessor extends AbstractProcessor {\n{% endcodeblock %}\n\nNothing really interesting here. The properties field holds a list of all the properties that are exposed to the user. The relationships field holds the relationships the processor uses to direct flow files. For more details on relationships, properties, and components of an Apache Nifi flow please read the [official developer guide](https://nifi.apache.org/developer-guide.html). 
There is plenty of room to write custom validators, but the nifi-processor-utils package already provides a large selection.\n\n\n{% codeblock lang:java Variable Declaration https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java JsonProcessor.java %}\n\nprivate List<PropertyDescriptor> properties;\nprivate Set<Relationship> relationships;\n\npublic static final String MATCH_ATTR = \"match\";\n\npublic static final PropertyDescriptor JSON_PATH = new PropertyDescriptor.Builder()\n        .name(\"Json Path\")\n        .required(true)\n        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)\n        .build();\n\npublic static final Relationship SUCCESS = new Relationship.Builder()\n        .name(\"SUCCESS\")\n        .description(\"Success relationship\")\n        .build();\n{% endcodeblock %}\n\nThe init function is called at the start of Apache Nifi. Remember that this is a highly multi-threaded environment, so be careful what you do in this space. This is why both the list of properties and the set of relationships are set with unmodifiable collections. 
I put the getters for the properties and relationships here as well.\n\n{% codeblock lang:java Apache Nifi Init https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java JsonProcessor.java %}\n@Override\npublic void init(final ProcessorInitializationContext context){\n    List<PropertyDescriptor> properties = new ArrayList<>();\n    properties.add(JSON_PATH);\n    this.properties = Collections.unmodifiableList(properties);\n\n    Set<Relationship> relationships = new HashSet<>();\n    relationships.add(SUCCESS);\n    this.relationships = Collections.unmodifiableSet(relationships);\n}\n\n@Override\npublic Set<Relationship> getRelationships(){\n    return relationships;\n}\n\n@Override\npublic List<PropertyDescriptor> getSupportedPropertyDescriptors(){\n    return properties;\n}\n{% endcodeblock %}\n\nThe onTrigger method is called whenever a flow file is passed to the processor. For more details on the context and session variables please again refer to the [official developer guide](https://nifi.apache.org/developer-guide.html#flowfile).\n\n{% codeblock lang:java Apache Nifi OnTrigger https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java JsonProcessor.java %}\n\n@Override\npublic void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {\n    final ProcessorLog log = this.getLogger();\n    final AtomicReference<String> value = new AtomicReference<>();\n\n    FlowFile flowfile = session.get();\n    if (flowfile == null) {\n        // Nothing is queued for this processor yet\n        return;\n    }\n\n    session.read(flowfile, new InputStreamCallback() {\n        @Override\n        public void process(InputStream in) throws IOException {\n            try{\n                String json = IOUtils.toString(in);\n                String result = JsonPath.read(json, \"$.hello\");\n                value.set(result);\n            }catch(Exception ex){\n                ex.printStackTrace();\n                log.error(\"Failed to read json string.\");\n            }\n        }\n    });\n\n    // Write the results to an attribute\n    String results = value.get();\n    if(results != null && 
!results.isEmpty()){\n        flowfile = session.putAttribute(flowfile, \"match\", results);\n    }\n\n    // Write the results back out to the flow file\n    flowfile = session.write(flowfile, new OutputStreamCallback() {\n\n        @Override\n        public void process(OutputStream out) throws IOException {\n            out.write(value.get().getBytes());\n        }\n    });\n\n    session.transfer(flowfile, SUCCESS);\n}\n{% endcodeblock %}\n\nIn general, you pull the flow file out of the session, read from and write to it, and add attributes where needed. To work on flow files, nifi provides 3 callback interfaces.\n\n* InputStreamCallback: For reading the contents of the flow file through an input stream.\n\n    Here we use Apache Commons to read the input stream out to a string, then use JsonPath to read the JSON and set a value to pass on. If an exception occurs, it would normally be best practice to route the original flow file to an error relationship.\n\n{% codeblock lang:java Apache Nifi InputStreamCallback https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java JsonProcessor.java %}\nsession.read(flowfile, new InputStreamCallback() {\n    @Override\n    public void process(InputStream in) throws IOException {\n        try{\n            String json = IOUtils.toString(in);\n            String result = JsonPath.read(json, \"$.hello\");\n            value.set(result);\n        }catch(Exception ex){\n            ex.printStackTrace();\n            log.error(\"Failed to read json string.\");\n        }\n    }\n});\n{% endcodeblock %}\n\n* OutputStreamCallback: For writing to a flow file. This overwrites the existing content rather than appending to it.\n\n    We simply write out the value we received in the InputStreamCallback.\n\n{% codeblock lang:java Apache Nifi OutputStreamCallback https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java JsonProcessor.java %}\nflowfile = session.write(flowfile, new OutputStreamCallback() {\n    @Override\n    public void process(OutputStream out) 
throws IOException {\n        out.write(value.get().getBytes());\n    }\n});\n{% endcodeblock %}\n\n* StreamCallback: This is for both reading and writing to the same flow file. With both OutputStreamCallback and StreamCallback, remember to assign the result of session.write back to your flow file variable. This callback is not used in the example code, although it could have been; the choice was deliberate, to show a way of moving data out of callbacks and back in.\n\n{% codeblock lang:java Apache Nifi StreamCallback %}\nflowfile = session.write(flowfile, new StreamCallback() {\n    @Override\n    public void process(InputStream in, OutputStream out) throws IOException {\n        String json = IOUtils.toString(in);\n        String result = JsonPath.read(json, \"$.hello\");\n        out.write(result.getBytes());\n    }\n});\n{% endcodeblock %}\n\nFlow files can also carry metadata in attributes to pass between processors.\n\n{% codeblock lang:java Setting flow file attributes https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java JsonProcessor.java %}\n// Write the results to an attribute\nString results = value.get();\nif(results != null && !results.isEmpty()){\n    flowfile = session.putAttribute(flowfile, \"match\", results);\n}\n{% endcodeblock %}\n\nFinally, every flow file that is generated needs to be deleted or transferred.\n\n{% codeblock lang:java Session Transfer https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java JsonProcessor.java %}\nsession.transfer(flowfile, SUCCESS);\n{% endcodeblock %}\n\nAt this point you should be able to build with a simple\n\n{% codeblock lang:shell-session %}\nmvn clean install\n{% endcodeblock %}\n\n## Deployment\n\n1. Copy the target/examples-1.0-SNAPSHOT.nar to $NIFI_HOME/lib\n2. $NIFI_HOME/bin/nifi.sh stop\n3. 
$NIFI_HOME/bin/nifi.sh start\n\nAfter Nifi finishes starting you should be able to add the processor to your flow.\n\nNifi.rocks will follow up with how to generate unit tests and documentation for your custom processors soon.\n","html":"<p>The list of available Apache Nifi processors is extensive, as documented in <a href=\"{{%20site.url%20}}/apache-nifi-processors/\">this post</a>. There is still sometimes a need to develop your own: to pull data from a database, to process an uncommon file format, or to handle many other unique situations. To get you started, we will work through a basic processor that takes a JSON file as input and a JSON path as a parameter, and places the matched value into the flow file contents and an attribute. The full source is hosted on <a href=\"https://github.com/pcgrenier/nifi-examples\">Github</a>.</p>\n\n<!-- more -->\n\n\n<h2>Setup</h2>\n\n<p>Start by creating a simple Maven project in your favorite IDE. Then edit the pom.xml.</p>\n\n<p>{% gist f99d27d08c3903f9d50c pom.xml %}</p>\n\n<p>This pom.xml includes a single plug-in for building a nifi nar, which is similar to a war: it bundles everything up in a way nifi can unpack. The nifi-api is the only other “required” dependency. The other nifi dependencies are really useful, as you will see.</p>\n\n<p>The next important piece is telling nifi which classes to load and register. This is done in a single file located at /src/main/resources/META-INF/services/org.apache.nifi.processor.Processor</p>\n\n<p>{% gist f98e563e787c1b73c425 org.apache.nifi.processor.Processor %}</p>\n\n<h2>The JSON Processor</h2>\n\n<p>Now that everything is defined and findable by Apache Nifi, let’s build a processor. Define a simple Java class with the name registered during setup (rocks.nifi.examples.processors.JsonProcessor).</p>\n\n<p>Tags are useful for finding your processor in the list of processors in the GUI. In this case you could just type ‘json’ in the search box and your processor will be found. 
The capability description is also displayed in the processor selection box. Nifi.rocks will make future posts on documenting your custom processors. Finally, most processors will just extend AbstractProcessor; for more complicated tasks it may be necessary to go a level deeper and extend AbstractSessionFactoryProcessor.</p>\n\n<p>{% codeblock lang:java Apache Nifi Processor Header <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java\">https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java</a> JsonProcessor.java %}\n@SideEffectFree\n@Tags({\"JSON\", \"NIFI ROCKS\"})\n@CapabilityDescription(\"Fetch value from json path.\")\npublic class JsonProcessor extends AbstractProcessor {\n{% endcodeblock %}</p>\n\n<p>Nothing really interesting here. The properties field holds a list of all the properties that are exposed to the user. The relationships field holds the relationships the processor uses to direct flow files. For more details on relationships, properties, and components of an Apache Nifi flow please read the <a href=\"https://nifi.apache.org/developer-guide.html\">official developer guide</a>. 
There is plenty of room to expand on custom validators, but there is a large selection of validators in nifi-processor-utils package.</p>\n\n<p>{% codeblock lang:java Variable Declaration <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java\">https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java</a> JsonProcessor.java %}</p>\n\n<p>private List<PropertyDescriptor> properties;\nprivate Set<Relationship> relationships;</p>\n\n<p>public static final String MATCH_ATTR = “match”;</p>\n\n<p>public static final PropertyDescriptor JSON_PATH = new PropertyDescriptor.Builder()\n .name(“Json Path”)\n .required(true)\n .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)\n .build();</p>\n\n<p>public static final Relationship SUCCESS = new Relationship.Builder()\n .name(“SUCCESS”)\n .description(“Succes relationship”)\n .build();\n{% endcodeblock %}</p>\n\n<p>The init function is called at the start of Apache Nifi. Remember that this is a highly multi-threaded environment and be careful what you do in this space. This is why both the list of properties and the set of relationships are set with unmodifiable collections. 
I put the getters for the properties and relationships here as well.</p>\n\n<p>{% codeblock lang:java Apache Nifi Init <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java\">https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java</a> JsonProcessor.java %}\n@Override\npublic void init(final ProcessorInitializationContext context){\n List<PropertyDescriptor> properties = new ArrayList<>();\n properties.add(JSON_PATH);\n this.properties = Collections.unmodifiableList(properties);</p>\n\n<pre><code>Set<Relationship> relationships = new HashSet<>();\nrelationships.add(SUCCESS);\nthis.relationships = Collections.unmodifiableSet(relationships);\n</code></pre>\n\n<p>}</p>\n\n<p>@Override\npublic Set<Relationship> getRelationships(){\n return relationships;\n}</p>\n\n<p>@Override\npublic List<PropertyDescriptor> getSupportedPropertyDescriptors(){\n return properties;\n}\n{% endcodeblock %}</p>\n\n<p>The onTrigger method is called when ever a flow file is passed to the processor. 
For more details on the context and session variables please again refer to the <a href=\"https://nifi.apache.org/developer-guide.html#flowfile\">official developer guide</a>.</p>\n\n<p>{% codeblock lang:java Apache Nifi OnTrigger <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java\">https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java</a> JsonProcessor.java %}</p>\n\n<p>@Override\npublic void onTrigger(ProcessContext context, ProcessSession session) throws ProcessException {\n final ProcessorLog log = this.getLogger();\n final AtomicReference<String> value = new AtomicReference<>();</p>\n\n<pre><code>FlowFile flowfile = session.get();\n\nsession.read(flowfile, new InputStreamCallback() {\n @Override\n public void process(InputStream in) throws IOException {\n try{\n String json = IOUtils.toString(in);\n String result = JsonPath.read(json, \"$.hello\");\n value.set(result);\n }catch(Exception ex){\n ex.printStackTrace();\n log.error(\"Failed to read json string.\");\n }\n }\n});\n\n// Write the results to an attribute\nString results = value.get();\nif(results != null && !results.isEmpty()){\n flowfile = session.putAttribute(flowfile, \"match\", results);\n}\n\n// To write the results back out ot flow file\nflowfile = session.write(flowfile, new OutputStreamCallback() {\n\n @Override\n public void process(OutputStream out) throws IOException {\n out.write(value.get().getBytes());\n }\n});\n\nsession.transfer(flowfile, SUCCESS);\n</code></pre>\n\n<p>}\n{% endcodeblock %}</p>\n\n<p>In general you pull the flow file out of session. Read and write to the flow files and add attributes where needed. 
To work on flow files, nifi provides 3 callback interfaces.</p>\n\n<ul>\n<li><p>InputStreamCallback: For reading the contents of the flow file through an input stream.</p>\n\n<p>Here we use Apache Commons to read the input stream out to a string, then use JsonPath to read the JSON and set a value to pass on. If an exception occurs, it would normally be best practice to route the original flow file to an error relationship.</p></li>\n</ul>\n\n\n<p>{% codeblock lang:java Apache Nifi InputStreamCallback <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java\">https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java</a> JsonProcessor.java %}\nsession.read(flowfile, new InputStreamCallback() {\n @Override\n public void process(InputStream in) throws IOException {\n try{\n String json = IOUtils.toString(in);\n String result = JsonPath.read(json, \"$.hello\");\n value.set(result);\n }catch(Exception ex){\n ex.printStackTrace();\n log.error(\"Failed to read json string.\");\n }\n }\n});<br/>\n{% endcodeblock %}</p>\n\n<ul>\n<li><p>OutputStreamCallback: For writing to a flow file. This overwrites the existing content rather than appending to it.</p>\n\n<p>We simply write out the value we received in the InputStreamCallback.</p></li>\n</ul>\n\n\n<p>{% codeblock lang:java Apache Nifi OutputStreamCallback <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java\">https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java</a> JsonProcessor.java %}\nflowfile = session.write(flowfile, new OutputStreamCallback() {\n @Override\n public void process(OutputStream out) throws IOException {\n out.write(value.get().getBytes());\n }\n});\n{% endcodeblock %}</p>\n\n<ul>\n<li>StreamCallback: This is for both reading and writing 
to the same flow file. With both OutputStreamCallback and StreamCallback, remember to assign the result of session.write back to your flow file variable. This callback is not used in the example code, although it could have been; the choice was deliberate, to show a way of moving data out of callbacks and back in.</li>\n</ul>\n\n\n<p>{% codeblock lang:java Apache Nifi StreamCallback %}\nflowfile = session.write(flowfile, new StreamCallback() {\n @Override\n public void process(InputStream in, OutputStream out) throws IOException {\n String json = IOUtils.toString(in);\n String result = JsonPath.read(json, \"$.hello\");\n out.write(result.getBytes());\n }\n});\n{% endcodeblock %}</p>\n\n<p>Flow files can also carry metadata in attributes to pass between processors.</p>\n\n<p>{% codeblock lang:java Setting flow file attributes <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java\">https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java</a> JsonProcessor.java %}\n// Write the results to an attribute\nString results = value.get();\nif(results != null && !results.isEmpty()){\n flowfile = session.putAttribute(flowfile, \"match\", results);\n}\n{% endcodeblock %}</p>\n\n<p>Finally, every flow file that is generated needs to be deleted or transferred.</p>\n\n<p>{% codeblock lang:java Session Transfer <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java\">https://github.com/pcgrenier/nifi-examples/blob/master/src/main/java/rocks/nifi/examples/processors/JsonProcessor.java</a> JsonProcessor.java %}\nsession.transfer(flowfile, SUCCESS);\n{% endcodeblock %}</p>\n\n<p>At this point you should be able to build with a simple</p>\n\n<p>{% codeblock lang:shell-session %}\nmvn clean install\n{% endcodeblock %}</p>\n\n<h2>Deployment</h2>\n\n<ol>\n<li>Copy the target/examples-1.0-SNAPSHOT.nar to 
$NIFI_HOME/lib</li>\n<li>$NIFI_HOME/bin/nifi.sh stop</li>\n<li>$NIFI_HOME/bin/nifi.sh start</li>\n</ol>\n\n\n<p>After Nifi finishes starting you should be able to add the processor to your flow.</p>\n\n<p>Nifi.rocks will follow up with how to generate unit tests and documentation for your custom processors soon.</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1423275273000,"created_by":1,"updated_at":1423275273000,"updated_by":1,"published_at":1423275273000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"processors"}]},{"title":"Apache Nifi Release 0.0.2 Highlights","slug":"apache-nifi-release-0-dot-0-2-highlights","markdown":"Apache Nifi just kicked out their second release, [0.0.2](https://nifi.apache.org/download.html). It doesn't have too many new features in it, hence a patch release according to [semantic versioning](http://semver.org), but it is definitely an improvement over the previous version, with the counts of its bug fixes, improvements, and new features listed below.\n\n* [Bug Fixes](#bugfixes):\t\t35\n* [Improvements](#improvements):\t\t17\n* [New Features](#new-features):\t\t5\n\n<!--more-->\n### <a name=\"bugfixes\"></a>Bug Fixes\nThe nifi community came in with some big fixes with this release. The biggest ones, I feel, are the ability to now build on Mac OS X and the expanded documentation. There were quite a few other fixes that greatly improved the overall NiFi experience; for example, processors now stop correctly instead of continuing to be scheduled, and some clustering issues were resolved.\n\n### <a name=\"improvements\"></a>Improvements\nThe improvements in this release included a few maintenance-related tasks and, again, some overall usability tasks, with most of them falling on the usability side. 
The main improvements in Nifi release 0.0.2 were documentation-related and overall usability updates: a cleanup of the development documents, and more usability changes than I can name here. In this release, though, you should expect the following usability improvements:\n\n* Proxy support for the GetFTP Processor\n* Component toolbox improvements\n* Dragging/drawing relationships\n* Labeling with the same color for multiple items\n\n### <a name=\"new-features\"></a>New Features\nNow on to the good stuff, new features! This release only had a few additional features added, but from what I've read there are quite a few more coming. The highlights in this category are:\n\nMultiple new processors:\n\n* JSON Processors\n* ExecuteProcess processor\n* StoreInKiteDataset processor\n\nNew processors are great and mean more possibilities and use cases for Apache Nifi. We showed you how to create a JSON processor in a previous post, [Developing a custom Nifi Processor: JSON]({{site_url}}/developing-a-custom-apache-nifi-processor-json), and now someone has contributed a JSON processor to Nifi. For a full list of Nifi Processors, including the newly added ones, see our [previous post]({{ site_url }}/apache-nifi-processors).\n\nIt has been rumored that the next release will contain quite a few changes and be a minor release bumping up to version 0.1.0. With over 200 tickets open on their JIRA, Nifi could see a great amount of change in the coming months.\n\nThe full list of release notes can be found on Apache Nifi's [JIRA page](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12329373) and downloads for all of their releases, source code and binaries, are available from their [website](https://nifi.apache.org/download.html). Nifi is also making their releases available through Maven via Apache's [Maven repository](https://repository.apache.org/content/repositories/releases/org/apache/nifi/).\n\nAs always, check back for how-tos and Apache Nifi related updates. 
New videos and posts are coming!!","html":"<p>Apache Nifi just kicked out their second release, <a href=\"https://nifi.apache.org/download.html\">0.0.2</a>. It doesn’t have too many new features in it, hence a patch release according to <a href=\"http://semver.org\">semantic versioning</a>, but it is definitely an improvement over the previous version, with the counts of its bug fixes, improvements, and new features listed below.</p>\n\n<ul>\n<li><a href=\"#bugfixes\">Bug Fixes</a>: 35</li>\n<li><a href=\"#improvements\">Improvements</a>: 17</li>\n<li><a href=\"#new-features\">New Features</a>: 5</li>\n</ul>\n\n\n<!--more-->\n\n\n<h3><a name=\"bugfixes\"></a>Bug Fixes</h3>\n\n<p>The nifi community came in with some big fixes with this release. The biggest ones, I feel, are the ability to now build on Mac OS X and the expanded documentation. There were quite a few other fixes that greatly improved the overall NiFi experience; for example, processors now stop correctly instead of continuing to be scheduled, and some clustering issues were resolved.</p>\n\n<h3><a name=\"improvements\"></a>Improvements</h3>\n\n<p>The improvements in this release included a few maintenance-related tasks and, again, some overall usability tasks, with most of them falling on the usability side. The main improvements in Nifi release 0.0.2 were documentation-related and overall usability updates: a cleanup of the development documents, and more usability changes than I can name here. In this release, though, you should expect the following usability improvements:</p>\n\n<ul>\n<li>Proxy support for the GetFTP Processor</li>\n<li>Component toolbox improvements</li>\n<li>Dragging/drawing relationships</li>\n<li>Labeling with the same color for multiple items</li>\n</ul>\n\n\n<h3><a name=\"new-features\"></a>New Features</h3>\n\n<p>Now on to the good stuff, new features! This release only had a few additional features added, but from what I’ve read there are quite a few more coming. 
The highlights in this category are:</p>\n\n<p>Multiple new processors:</p>\n\n<ul>\n<li>JSON Processors</li>\n<li>ExecuteProcess processor</li>\n<li>StoreInKiteDataset processor</li>\n</ul>\n\n<p>New processors are great and mean more possibilities and use cases for Apache Nifi. We showed you how to create a JSON processor in a previous post, <a href=\"{{site_url}}/developing-a-custom-apache-nifi-processor-json\">Developing a custom Nifi Processor: JSON</a>, and now someone has contributed a JSON processor to Nifi. For a full list of Nifi Processors, including the newly added ones, see our <a href=\"{{%20site_url%20}}/apache-nifi-processors\">previous post</a>.</p>\n\n<p>It has been rumored that the next release will contain quite a few changes and be a minor release bumping up to version 0.1.0. With over 200 tickets open on their JIRA, Nifi could see a great amount of change in the coming months.</p>\n\n<p>The full list of release notes can be found on Apache Nifi’s <a href=\"https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12329373\">JIRA page</a> and downloads for all of their releases, source code and binaries, are available from their <a href=\"https://nifi.apache.org/download.html\">website</a>. Nifi is also making their releases available through Maven via Apache’s <a href=\"https://repository.apache.org/content/repositories/releases/org/apache/nifi/\">Maven repository</a>.</p>\n\n<p>As always, check back for how-tos and Apache Nifi related updates. 
New videos and posts are coming!!</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1426815911000,"created_by":1,"updated_at":1426815911000,"updated_by":1,"published_at":1426815911000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"release"}]},{"title":"Developing a Custom Apache Nifi Processor-Unit Tests (Part I)","slug":"developing-a-custom-apache-nifi-processor-unit-tests-partI","markdown":"The Apache Nifi framework has built-in unit testing with JUnit using test runners. They have a few examples in their code base, but learning firsthand really helps. In this post we'll go over adding unit tests to the [JSON Processor]({{ site.url }}/developing-a-custom-apache-nifi-processor-json/) that we developed previously.\n\nTo start, we'll check out the JSON Processor code from [GitHub](https://github.com/pcgrenier/nifi-examples) and then open it up in your favorite text editor. There is already a test package/folder in the project that contains a unit test, [rocks.nifi.examples.processors/JsonProcessorTest.java](https://github.com/pcgrenier/nifi-examples/blob/master/src/test/java/rocks/nifi/examples/processors/JsonProcessorTest.java).\n\nWhen unit testing in Apache Nifi, there are a few items on top of the normal JUnit annotations that are required. While you can unit test with just JUnit, using the built-in method in Apache Nifi makes it much easier. 
In a future post we'll show you how to unit test using Mockito and JUnit to test helper methods where actually invoking a full processor seems excessive, or if you just don't want to use Nifi's test runner.\n\n<!-- more -->\nThe first step in unit testing Apache Nifi is to grab the necessary dependency from Maven.\n\n{% codeblock lang:xml pom.xml https://github.com/pcgrenier/nifi-examples/blob/master/pom.xml#L42 %}\n\t\t<dependency>\n <groupId>org.apache.nifi</groupId>\n <artifactId>nifi-mock</artifactId>\n <version>${nifi.version}</version>\n <scope>test</scope>\n </dependency>\n\n{% endcodeblock %}\n\nAfter including the nifi-mock dependency, you will need to import a few classes from the org.apache.nifi.util package, mainly TestRunner, TestRunners and MockFlowFile.\n\nSince we are using JUnit, the usual annotations apply for defining a test function inside the class and we'll use the @Test annotation for this one, although the other annotations can be used. After adding the JUnit annotation, there are a few requirements that Nifi puts on you in order to utilize the test runners they have provided. The first is to create a test runner to use and then to set a flow file or flow file content in order to run the processor. In this example, we are using a ByteArrayInputStream as the flow file content, although you could use a JSON file in a resource folder just as well. \n\nOnce a test runner is created, you can set processor properties on it using the `runner.setProperty(PropertyDescriptor, String)` method and enqueue content using `runner.enqueue(content)`. Then you can run the test runner and make assertions.\n\nNifi makes testing easier with some built-in assertions for flowfiles. 
You can test that the flowfile was transferred to the appropriate relationship, and get the flowfile to test for expected attributes and content.\n\nBelow is how we test the JSON Processor, with a typical test setup for Apache Nifi.\n\n{% codeblock lang:java JSON Processor Unit Test https://github.com/pcgrenier/nifi-examples/blob/master/src/test/java/rocks/nifi/examples/processors/JsonProcessorTest.java JsonProcessorTest.java %}\n@org.junit.Test\n public void testOnTrigger() throws IOException {\n // Content to mock a JSON file\n InputStream content = new ByteArrayInputStream(\"{\\\"hello\\\":\\\"nifi rocks\\\"}\".getBytes());\n \n // Generate a test runner to mock a processor in a flow\n TestRunner runner = TestRunners.newTestRunner(new JsonProcessor());\n \n // Add properties\n runner.setProperty(JsonProcessor.JSON_PATH, \"$.hello\");\n \n // Add the content to the runner\n runner.enqueue(content);\n \n // Run the enqueued content, it also takes an int = number of contents queued\n runner.run(1);\n \n // All results were processed without failure\n runner.assertQueueEmpty();\n \n // If you need to read or do additional tests on results you can access the content\n List<MockFlowFile> results = runner.getFlowFilesForRelationship(JsonProcessor.SUCCESS);\n assertTrue(\"1 match\", results.size() == 1);\n MockFlowFile result = results.get(0);\n String resultValue = new String(runner.getContentAsByteArray(result));\n System.out.println(\"Match: \" + resultValue);\n \n // Test attributes and content\n result.assertAttributeEquals(JsonProcessor.MATCH_ATTR, \"nifi rocks\");\n result.assertContentEquals(\"nifi rocks\");\n \n }\n{% endcodeblock %}\n\nWatch for our next post about unit testing Apache Nifi with Mockito and JUnit without using Nifi's TestRunner class.","html":"<p>The Apache Nifi framework has built-in unit testing with JUnit using test runners. 
They have a few examples in their code base, but learning firsthand really helps. In this post we’ll go over adding unit tests to the <a href=\"{{%20site.url%20}}/developing-a-custom-apache-nifi-processor-json/\">JSON Processor</a> that we developed previously.</p>\n\n<p>To start, we’ll check out the JSON Processor code from <a href=\"https://github.com/pcgrenier/nifi-examples\">GitHub</a> and then open it up in your favorite text editor. There is already a test package/folder in the project that contains a unit test, <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/src/test/java/rocks/nifi/examples/processors/JsonProcessorTest.java\">rocks.nifi.examples.processors/JsonProcessorTest.java</a>.</p>\n\n<p>When unit testing in Apache Nifi, there are a few items on top of the normal JUnit annotations that are required. While you can unit test with just JUnit, using the built-in method in Apache Nifi makes it much easier. In a future post we’ll show you how to unit test using Mockito and JUnit to test helper methods where actually invoking a full processor seems excessive, or if you just don’t want to use Nifi’s test runner.</p>\n\n<!-- more -->\n\n\n<p>The first step in unit testing Apache Nifi is to grab the necessary dependency from Maven.</p>\n\n<p>{% codeblock lang:xml pom.xml <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/pom.xml#L42\">https://github.com/pcgrenier/nifi-examples/blob/master/pom.xml#L42</a> %}\n <dependency>\n <groupId>org.apache.nifi</groupId>\n <artifactId>nifi-mock</artifactId>\n <version>${nifi.version}</version>\n <scope>test</scope>\n </dependency></p>\n\n<p>{% endcodeblock %}</p>\n\n<p>After including the nifi-mock dependency, you will need to import a few classes from the org.apache.nifi.util package, mainly TestRunner, TestRunners and MockFlowFile.</p>\n\n<p>Since we are using JUnit, the usual annotations apply for defining a test function inside the class and we’ll use the @Test annotation for 
this one, although the other annotations can be used. After adding the JUnit annotation, there are a few requirements that Nifi puts on you in order to utilize the test runners they have provided. The first is to create a test runner to use and then to set a flow file or flow file content in order to run the processor. In this example, we are using a ByteArrayInputStream as the flow file content, although you could use a JSON file in a resource folder just as well.</p>\n\n<p>Once a test runner is created, you can set processor properties on it using the <code>runner.setProperty(PropertyDescriptor, String)</code> method and enqueue content using <code>runner.enqueue(content)</code>. Then you can run the test runner and make assertions.</p>\n\n<p>Nifi makes testing easier with some built-in assertions for flowfiles. You can test that the flowfile was transferred to the appropriate relationship, and get the flowfile to test for expected attributes and content.</p>\n\n<p>Below is how we test the JSON Processor, with a typical test setup for Apache Nifi.</p>\n\n<p>{% codeblock lang:java JSON Processor Unit Test <a href=\"https://github.com/pcgrenier/nifi-examples/blob/master/src/test/java/rocks/nifi/examples/processors/JsonProcessorTest.java\">https://github.com/pcgrenier/nifi-examples/blob/master/src/test/java/rocks/nifi/examples/processors/JsonProcessorTest.java</a> JsonProcessorTest.java %}\n@org.junit.Test\n public void testOnTrigger() throws IOException {\n // Content to mock a JSON file\n InputStream content = new ByteArrayInputStream(\"{\\\"hello\\\":\\\"nifi rocks\\\"}\".getBytes());</p>\n\n<pre><code> // Generate a test runner to mock a processor in a flow\n TestRunner runner = TestRunners.newTestRunner(new JsonProcessor());\n\n // Add properties\n runner.setProperty(JsonProcessor.JSON_PATH, \"$.hello\");\n\n // Add the content to the runner\n runner.enqueue(content);\n\n // Run the enqueued content, it also takes an int = number of contents queued\n runner.run(1);\n\n // 
All results were processed without failure\n runner.assertQueueEmpty();\n\n // If you need to read or do additional tests on results you can access the content\n List<MockFlowFile> results = runner.getFlowFilesForRelationship(JsonProcessor.SUCCESS);\n assertTrue(\"1 match\", results.size() == 1);\n MockFlowFile result = results.get(0);\n String resultValue = new String(runner.getContentAsByteArray(result));\n System.out.println(\"Match: \" + resultValue);\n\n // Test attributes and content\n result.assertAttributeEquals(JsonProcessor.MATCH_ATTR, \"nifi rocks\");\n result.assertContentEquals(\"nifi rocks\");\n\n}\n</code></pre>\n\n<p>{% endcodeblock %}</p>\n\n<p>Watch for our next post about unit testing Apache Nifi with Mockito and JUnit without using Nifi’s TestRunner class.</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1428166492000,"created_by":1,"updated_at":1428166492000,"updated_by":1,"published_at":1428166492000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"processors"},{"name":"unit tests"}]},{"title":"Apache Nifi Graduation","slug":"apache-nifi-graduation","markdown":"Apache Nifi graduated from a podling to a top-level Apache Software Foundation project a few months ago, which is great news! Congratulations to the NiFi team and all the contributors for making Apache Nifi great and meeting the requirements to move from an incubating project to a top-level project. \n\n<!-- more -->\n### So...what does this mean?\nIf you are not familiar with Apache, new projects are first accepted into the Apache Incubator. A list of these types of projects can be seen on [Apache's Incubator Page](http://incubator.apache.org/). The move from a podling, a project that is in incubation, to a Top Level Project (TLP), means Nifi can be taken more seriously and now has Apache behind it. 
It's not just some pet project where a guy decided to give out his source code; it is an Apache TLP, alongside the likes of Hadoop, Cassandra, etc. \n\nSo congratulations to Apache Nifi, looking forward to what will come next!","html":"<p>Apache Nifi graduated from a podling to a top-level Apache Software Foundation project a few months ago, which is great news! Congratulations to the NiFi team and all the contributors for making Apache Nifi great and meeting the requirements to move from an incubating project to a top-level project.</p>\n\n<!-- more -->\n\n\n<h3>So…what does this mean?</h3>\n\n<p>If you are not familiar with Apache, new projects are first accepted into the Apache Incubator. A list of these types of projects can be seen on <a href=\"http://incubator.apache.org/\">Apache’s Incubator Page</a>. The move from a podling, a project that is in incubation, to a Top Level Project (TLP), means Nifi can be taken more seriously and now has Apache behind it. It’s not just some pet project where a guy decided to give out his source code; it is an Apache TLP, alongside the likes of Hadoop, Cassandra, etc.</p>\n\n<p>So congratulations to Apache Nifi, looking forward to what will come next!</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1443489431000,"created_by":1,"updated_at":1443489431000,"updated_by":1,"published_at":1443489431000,"published_by":1,"tags":[{"name":"apache nifi"}]},{"title":"Apache Nifi Release 0.3.0 highlights","slug":"apache-nifi-release-0-dot-3-0-highlights","markdown":"Apache Nifi put out their third release, [0.3.0](https://nifi.apache.org/download.html), since graduating to a TLP. 
It was a minor release but still included a decent number of changes.\n\n* [Bug Fixes](#bugfixes): \t\t\t56\n* [Improvements](#improvements): \t20\n* [New Features](#new-features): \t5\n\nRelease Highlights according to nifi can be found on their [confluence page](https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.3.0), but are also listed below.\n\n* Performance improvements in handling large volumes of small files.\n* Performance improvements in Provenance repositories.\n* Added Reporting Task for Apache™ Ambari™.\n* Improved stability of nifi bootstrap.\n* Added Processors for working with images.\n* Support for interacting with Kerberos enabled Hadoop clusters\n* Added additional Avro capabilities - merging datafiles & converting to json\n* Added processors for performing INSERT, UPDATE, DELETE statements against relational databases\n\n<!--more-->\n### <a name=\"bugfixes\"></a>Bug Fixes\nThe bug fixes in this version are plenty. Quite a few were related to the management of a nifi instance, the starting and stopping and how the nifi.sh script works. It's good to see that all parts of the code are getting worked on and not just the core nifi framework. There were also quite a few fixes dealing with some of the current processors, most notably GetSFTP, GetFTP, GetKafka, GetTwitter, EncryptContent, ExecuteSQL, ExecuteFlumeSource, and the Nifi Spark Receiver. Processor fixes are great since they are the major part of the flow!\n\n### <a name=\"improvements\"></a>Improvements\nThe 20 improvements are again substantial considering the quick turnaround between release 0.2.1 and 0.3.0. Documentation was updated which is always good, both the admin guide and user guide.\n\n### <a name=\"new-features\"></a>New Features\nNew features are always the exciting stuff that you've been waiting for, and if you were waiting for Apache Flume support, this might just be the release for you! 
The 5 new features are listed below:\n\n* Add processors that can run Apache Flume sources/sinks\n* Create reporting task to deliver metrics to Apache Ambari\n* Add location bounding box filter to twitter processor\n* Create a NAR for handling images\n* Kerberos support for Hadoop processors\n\nAs always, new features are warmly welcomed. It's great to see the incorporation of more Apache projects into Apache Nifi processors, with both Flume and Ambari processors being included in this release.\n\nThe next release, 0.4.0, has 90 issues currently in [Jira](https://issues.apache.org/jira/browse/NIFI/fixforversion/12333070/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel). I'm not completely sure if Nifi has settled on firm release dates or not, but I imagine this one should be out in a month or two, or at the latest by the end of the year.\n\nThe full list of release notes can be found on [Apache Nifi's Jira](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12329653). This release is available through the [downloads section](https://nifi.apache.org/download.html) of Nifi's site, along with their previous releases.\n\nWe've missed a few of Nifi's releases, 0.2.0-incubating and 0.2.1, since they graduated, but hopefully we will continue to give you a breakdown of Nifi releases as they come!","html":"<p>Apache Nifi put out their third release, <a href=\"https://nifi.apache.org/download.html\">0.3.0</a>, since graduating to a TLP. 
It was a minor release but still included a decent number of changes.</p>\n\n<ul>\n<li><a href=\"#bugfixes\">Bug Fixes</a>: 56</li>\n<li><a href=\"#improvements\">Improvements</a>: 20</li>\n<li><a href=\"#new-features\">New Features</a>: 5</li>\n</ul>\n\n\n<p>Release Highlights according to nifi can be found on their <a href=\"https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.3.0\">confluence page</a>, but are also listed below.</p>\n\n<ul>\n<li>Performance improvements in handling large volumes of small files.</li>\n<li>Performance improvements in Provenance repositories.</li>\n<li>Added Reporting Task for Apache™ Ambari™.</li>\n<li>Improved stability of nifi bootstrap.</li>\n<li>Added Processors for working with images.</li>\n<li>Support for interacting with Kerberos enabled Hadoop clusters</li>\n<li>Added additional Avro capabilities - merging datafiles & converting to json</li>\n<li>Added processors for performing INSERT, UPDATE, DELETE statements against relational databases</li>\n</ul>\n\n\n<!--more-->\n\n\n<h3><a name=\"bugfixes\"></a>Bug Fixes</h3>\n\n<p>The bug fixes in this version are plenty. Quite a few were related to the management of a nifi instance, the starting and stopping and how the nifi.sh script works. It’s good to see that all parts of the code are getting worked on and not just the core nifi framework. There were also quite a few fixes dealing with some of the current processors, most notably GetSFTP, GetFTP, GetKafka, GetTwitter, EncryptContent, ExecuteSQL, ExecuteFlumeSource, and the Nifi Spark Receiver. Processor fixes are great since they are the major part of the flow!</p>\n\n<h3><a name=\"improvements\"></a>Improvements</h3>\n\n<p>The 20 improvements are again substantial considering the quick turnaround between release 0.2.1 and 0.3.0. 
Documentation was updated which is always good, both the admin guide and user guide.</p>\n\n<h3><a name=\"new-features\"></a>New Features</h3>\n\n<p>New features are always the exciting stuff that you’ve been waiting for, and if you were waiting for Apache Flume support, this might just be the release for you! The 5 new features are listed below:</p>\n\n<ul>\n<li>Add processors that can run Apache Flume sources/sinks</li>\n<li>Create reporting task to deliver metrics to Apache Ambari</li>\n<li>Add location bounding box filter to twitter processor</li>\n<li>Create a NAR for handling images</li>\n<li>Kerberos support for Hadoop processors</li>\n</ul>\n\n\n<p>As always, new features are warmly welcomed. It’s great to see the incorporation of more Apache projects into Apache Nifi processors, with both Flume and Ambari processors being included in this release.</p>\n\n<p>The next release, 0.4.0, has 90 issues currently in <a href=\"https://issues.apache.org/jira/browse/NIFI/fixforversion/12333070/?selectedTab=com.atlassian.jira.jira-projects-plugin:version-issues-panel\">Jira</a>. 
I’m not completely sure if Nifi has settled on firm release dates or not, but I imagine this one should be out in a month or two, or at the latest by the end of the year.</p>\n\n<p>The full list of release notes can be found on <a href=\"https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12329653\">Apache Nifi’s Jira</a>. This release is available through the <a href=\"https://nifi.apache.org/download.html\">downloads section</a> of Nifi’s site, along with their previous releases.</p>\n\n<p>We’ve missed a few of Nifi’s releases, 0.2.0-incubating and 0.2.1, since they graduated, but hopefully we will continue to give you a breakdown of Nifi releases as they come!</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1443489467000,"created_by":1,"updated_at":1443489467000,"updated_by":1,"published_at":1443489467000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"release"}]},{"title":"Apache Nifi Release 0.4.0 highlights","slug":"apache-nifi-release-0-dot-4-0-highlights","markdown":"Apache Nifi kicked out their second release since graduating this past Friday. With the release of 0.4.0, they are one step closer to a 1.0.0 release which will contain some very interesting things. While we look forward to the first big release of Apache Nifi, let's break down the changes in the current build. As always, release 0.4.0 is available on [Apache Nifi's download page](https://nifi.apache.org/download.html).\n\n* [Bug Fixes](#bugfixes): \t\t\t93\n* [Improvements](#improvements): \t44\n* [New Features](#new-features): \t13\n\nThe release notes are available on [Nifi's confluence page](https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.4.0) and a quick highlight can be found below. 
First let's just say, this is the biggest release so far, with almost double the number of bug fixes, and over double the number of improvements and new features! \n\n* General UI improvements to usability\n* Multiple Authentication Mechanisms\n* New Provenance event types and ability to search provenance events\n* Idle CPU usage was reduced\n* [New Processors!](#newprocessors)\n* Improved OS support\n\n<!--more-->\n### <a name=\"bugfixes\"></a>Bug Fixes\nBug fixes, bug fixes everywhere! So many bug fixes that it's almost impossible to list them all here. Some of the big ones include fixing quite a few issues with MergeContent (efficiencies with 10,000+ files, queue swapping and an issue with delimiters), improved S3 support with PutS3 and FetchS3 fixes (now works with java 8, logs the URL correctly, corrected provenance event) and HDFS now acts accordingly with failures (passes to failure relationship and acts as expected for permission failures). There were also fixes with the database connection pooling and with the PutSQL supporting multiple data types and booleans. ConvertJSONToSQL also had quite a few fixes including caching results and metadata. For the full list, either fire up nifi and check it out or look at the release notes! \n\n### <a name=\"improvements\"></a>Improvements\nThere are a few new provenance events with this release, including download event, fetch event, and REMOTE_INVOCATION event. ExecuteSQL can now be run periodically without an input FlowFile, allowing for expanded use of SQL tasks. InvokeHTTP has been improved; it now has unique ids across clusters and has additional unit tests. The management side of nifi has been refactored slightly, mainly to clean up the shell scripts and fix some whitespace bugs. SSL support has been added to the PutS3 processor. 
It's good to see improvements in so many different areas of nifi!\n\n### <a name=\"new-features\"></a>New Features\nThe 13 new features include a few <a name=\"newprocessors\"></a>new processors:\n\n* AttributesToJSON\n* DeleteS3Object\n* ExtractAvroMetadata\n* FetchFile\n* FetchSFTP\n* GetAzureEventHub\n* GetCouchbaseKey\n* GetHBase\n* ListenSyslog\n* ListFile\n* ListSFTP\n* ParseSyslog\n* PutAzureEventHub\n* PutCouchbaseKey\n* PutDistributedMapCache\n* PutHBaseCell\n* PutHBaseJSON\n* PutSyslog\n* RouteText\n* SplitAvro\n* TailFile\n\nTo see more information about all Nifi Processors, check out our full list of [nifi processors and their quick descriptions]({{ site.url }}/apache-nifi-processors/). You can also check [Nifi's Documentation](https://nifi.apache.org/docs.html) for a little more information. In addition to the new processors, Nifi now supports LDAP authentication on top of username/password and two way SSL. An enhancement to the UI now allows users to drop queued FlowFiles. Before, you had to stop the processor, add a new processor with a failure to discard the file and restart Nifi just to clear a queue! Now it's much easier! \n\nThere are already talks about the first major release of Apache Nifi, 1.0.0, and about redoing some of the decisions they made, which would break backwards compatibility. We look forward to their next release; releases seem to come about every 6-8 weeks. See you guys then!","html":"<p>Apache Nifi kicked out their second release since graduating this past Friday. With the release of 0.4.0, they are one step closer to a 1.0.0 release which will contain some very interesting things. While we look forward to the first big release of Apache Nifi, let’s break down the changes in the current build. 
As always, release 0.4.0 is available on <a href=\"https://nifi.apache.org/download.html\">Apache Nifi’s download page</a>.</p>\n\n<ul>\n<li><a href=\"#bugfixes\">Bug Fixes</a>: 93</li>\n<li><a href=\"#improvements\">Improvements</a>: 44</li>\n<li><a href=\"#new-features\">New Features</a>: 13</li>\n</ul>\n\n\n<p>The release notes are available on <a href=\"https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.4.0\">Nifi’s confluence page</a> and a quick highlight can be found below. First let’s just say, this is the biggest release so far, with almost double the number of bug fixes, and over double the number of improvements and new features!</p>\n\n<ul>\n<li>General UI improvements to usability</li>\n<li>Multiple Authentication Mechanisms</li>\n<li>New Provenance event types and ability to search provenance events</li>\n<li>Idle CPU usage was reduced</li>\n<li><a href=\"#newprocessors\">New Processors!</a></li>\n<li>Improved OS support</li>\n</ul>\n\n\n<!--more-->\n\n\n<h3><a name=\"bugfixes\"></a>Bug Fixes</h3>\n\n<p>Bug fixes, bug fixes everywhere! So many bug fixes that it’s almost impossible to list them all here. Some of the big ones include fixing quite a few issues with MergeContent (efficiencies with 10,000+ files, queue swapping and an issue with delimiters), improved S3 support with PutS3 and FetchS3 fixes (now works with java 8, logs the URL correctly, corrected provenance event) and HDFS now acts accordingly with failures (passes to failure relationship and acts as expected for permission failures). There were also fixes with the database connection pooling and with the PutSQL supporting multiple data types and booleans. ConvertJSONToSQL also had quite a few fixes including caching results and metadata. 
For the full list, either fire up nifi and check it out or look at the release notes!</p>\n\n<h3><a name=\"improvements\"></a>Improvements</h3>\n\n<p>There are a few new provenance events with this release, including download event, fetch event, and REMOTE_INVOCATION event. ExecuteSQL can now be run periodically without an input FlowFile, allowing for expanded use of SQL tasks. InvokeHTTP has been improved; it now has unique ids across clusters and has additional unit tests. The management side of nifi has been refactored slightly, mainly to clean up the shell scripts and fix some whitespace bugs. SSL support has been added to the PutS3 processor. It’s good to see improvements in so many different areas of nifi!</p>\n\n<h3><a name=\"new-features\"></a>New Features</h3>\n\n<p>The 13 new features include a few <a name=\"newprocessors\"></a>new processors:</p>\n\n<ul>\n<li>AttributesToJSON</li>\n<li>DeleteS3Object</li>\n<li>ExtractAvroMetadata</li>\n<li>FetchFile</li>\n<li>FetchSFTP</li>\n<li>GetAzureEventHub</li>\n<li>GetCouchbaseKey</li>\n<li>GetHBase</li>\n<li>ListenSyslog</li>\n<li>ListFile</li>\n<li>ListSFTP</li>\n<li>ParseSyslog</li>\n<li>PutAzureEventHub</li>\n<li>PutCouchbaseKey</li>\n<li>PutDistributedMapCache</li>\n<li>PutHBaseCell</li>\n<li>PutHBaseJSON</li>\n<li>PutSyslog</li>\n<li>RouteText</li>\n<li>SplitAvro</li>\n<li>TailFile</li>\n</ul>\n\n\n<p>To see more information about all Nifi Processors, check out our full list of <a href=\"{{%20site.url%20}}/apache-nifi-processors/\">nifi processors and their quick descriptions</a>. You can also check <a href=\"https://nifi.apache.org/docs.html\">Nifi’s Documentation</a> for a little more information. In addition to the new processors, Nifi now supports LDAP authentication on top of username/password and two way SSL. An enhancement to the UI now allows users to drop queued FlowFiles. 
Before, you had to stop the processor, add a new processor with a failure to discard the file and restart Nifi just to clear a queue! Now it’s much easier!</p>\n\n<p>There are already talks about the first major release of Apache Nifi, 1.0.0, and about redoing some of the decisions they made, which would break backwards compatibility. We look forward to their next release; releases seem to come about every 6-8 weeks. See you guys then!</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1450148456000,"created_by":1,"updated_at":1450148456000,"updated_by":1,"published_at":1450148456000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"release"}]},{"title":"Developing a Custom Apache Nifi Controller Service","slug":"developing-a-custom-apache-nifi-controller-service","markdown":"Controller services are shared between processors, other controller services and reporting tasks. Normally they provide access to a shared resource, such as a database or SSL context, or externally managed content. This post will cover the basics of a controller service through a simple example. This example will take a file path that contains one or more properties files, and provide a processor access to those properties. The full source is hosted on [GitHub](https://github.com/pcgrenier/nifi-examples/tree/sample-processor).\n\n<!-- more -->\n\n## Setup\n\nThis project will use a more advanced Maven structure than the simple one used in the [developing a custom processor]({{site.url}}/developing-a-custom-apache-nifi-processor-json) post. If you have looked at the processor post you'll see that most of the setup for services is very similar. 
\n\n{% codeblock Apache Nifi Controller Service Folder Structure %}\n./sample-bundle\n├── pom.xml\n├── sample-bundle-nar\n│ ├── pom.xml\n│ └── src\n├── sample-controller-service\n│ ├── pom.xml\n│ └── src\n├── sample-controller-service-api\n│ ├── pom.xml\n│ └── src\n├── sample-controller-service-api-nar\n│ ├── pom.xml\n│ └── src\n└── sample-processor\n ├── pom.xml\n └── src\n{% endcodeblock %}\n\nI won't go into details on the pom files, but the general idea is that you want a separate nar for the API interface and for the service itself; this allows something much smaller to be used as a dependency in your other bundles. So the sample-bundle-nar will pull in the sample-processor and sample-controller-service packages. The sample-controller-service-api-nar will just pull in the sample-controller-service-api.\n\n## The Controller Service API Interface\n\n{% codeblock lang:java Controller Service API https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service-api/src/main/java/rocks/nifi/examples/PropertiesFileService.java PropertiesFileService.java %}\npublic interface PropertiesFileService extends ControllerService{\n String getProperty(String key);\n}\n{% endcodeblock %}\n\nThis is just a simple interface that extends nifi's ControllerService. We also provide the only entry point to processors, the getProperty function. This is similar to the current services in Apache Nifi such as [DBCPService.java](https://raw.githubusercontent.com/apache/nifi/master/nifi-nar-bundles/nifi-standard-services/nifi-dbcp-service-api/src/main/java/org/apache/nifi/dbcp/DBCPService.java) providing the getConnection() function.\n\n## The Controller Service\n\nIf you have read the [developing a custom processor]({{site.url}}/developing-a-custom-apache-nifi-processor-json) post a lot of this will be review. Controller services provide the same interfaces for configuration and validation. 
The initialization method only differs in taking a ControllerServiceInitializationContext.\n\nJust like with the processors, tags are useful for finding your controller services. The capability description annotation provides a simple explanation of what the controller service will provide.\n\n{% codeblock lang:java Controller Service https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service/src/main/java/rocks/nifi/examples/StandardPropertiesFileService.java StandardPropertiesFileService.java %}\n@Tags({\"nifirocks\", \"properties\"})\n@CapabilityDescription(\"Provides a controller service to manage property files.\")\npublic class StandardPropertiesFileService extends AbstractControllerService implements PropertiesFileService{\n{% endcodeblock %}\n\nNext we create the property descriptors. One will take the file or directory holding the property files, the other will be how often to check. Unlike processors controller services do not contain relationships.\n\n{% codeblock lang:java Controller Service https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service/src/main/java/rocks/nifi/examples/StandardPropertiesFileService.java StandardPropertiesFileService.java %}\npublic static final PropertyDescriptor CONFIG_URI = new PropertyDescriptor.Builder()\n .name(\"Configuration Directory\")\n .description(\"Configuration directory for properties files.\")\n .defaultValue(null)\n .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)\n .build();\n\npublic static final PropertyDescriptor RELOAD_INTERVAL = new PropertyDescriptor.Builder()\n .name(\"Reload Interval\")\n .description(\"Time before looking for changes\")\n .defaultValue(\"60 min\")\n .required(true)\n .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)\n .build();\n\nprivate static final List<PropertyDescriptor> serviceProperties;\n\nstatic{\n final List<PropertyDescriptor> props = new ArrayList<>();\n props.add(CONFIG_URI);\n 
props.add(RELOAD_INTERVAL);\n serviceProperties = Collections.unmodifiableList(props);\n}\n\n{% endcodeblock %}\n\nThe next step is the onConfigured function which will read the properties set and call any other necessary functions needed to start the service. We just read the two properties we have, configUri and reloadIntervalMilli, and then call loadPropertiesFile(). After the properties file is loaded, we start up a file watcher and executer so that the properties can be dynamic and not just read in at startup. \n\n{% codeblock lang:java Conroller Service https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service/src/main/java/rocks/nifi/examples/StandardPropertiesFileService.java StandardPropertiesFileService.java %}\n@OnEnabled\npublic void onConfigured(final ConfigurationContext context) throws InitializationException{\n log.info(\"Starting properties file service\");\n configUri = context.getProperty(CONFIG_URI).getValue();\n reloadIntervalMilli = context.getProperty(RELOAD_INTERVAL).asTimePeriod(TimeUnit.MILLISECONDS);\n\n // Initialize the properties\n loadPropertiesFiles();\n\n fileWatcher = new SynchronousFileWatcher(Paths.get(configUri), new LastModifiedMonitor());\n executor = Executors.newSingleThreadScheduledExecutor();\n FilesWatcherWorker reloadTask = new FilesWatcherWorker();\n executor.scheduleWithFixedDelay(reloadTask, 0, reloadIntervalMilli, TimeUnit.MILLISECONDS);\n\n}\n\n{% endcodeblock %}\n\nTo see the other private functions you can refer to the github code. \n\nThe last step in the process is to create a processor to use the service. This is exactly the same as creating a normal processor, but in this instance we want to add some specifics to use the PropertiesFileService. To do this, we just grab a reference from context of the Controller Service, and then call the getProperty(propertyName) function. 
We are just going to get the property and add it to the nifi flowfile properties so it is available to other processors down the line for now. To specify which property we want we will add a PropertyDescriptor so that the user can set it in the Nifi UI, and another PropertyDescriptor to specify which PropertiesFileService to get the property value from.\n\n{% codeblock lang:java Controll Service Processor https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-processor/src/main/java/rocks/nifi/examples/processors/ControllerServiceProcessor.java ControllerServiceProcessor.java %}\n\npublic static final PropertyDescriptor PROPERTY_NAME = new PropertyDescriptor.Builder()\n .name(\"Property Name\")\n .required(true)\n .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)\n .build();\n\n public static final PropertyDescriptor PROPERTIES_SERVICE = new PropertyDescriptor.Builder()\n .name(\"Properties Service\")\n .description(\"System properties loader\")\n .required(false)\n .identifiesControllerService(PropertiesFileService.class)\n .build();\n.......\n\n@Override\npublic void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {\n final ProcessorLog log = this.getLogger();\n final AtomicReference<String> value = new AtomicReference<>();\n\n final String propertyName = context.getProperty(PROPERTY_NAME).getValue();\n final PropertiesFileService propertiesService = context.getProperty(PROPERTIES_SERVICE).asControllerService(PropertiesFileService.class);\n final String property = propertiesService.getProperty(propertyName);\n log.info(\"Property = \" + property);\n\n\n FlowFile flowfile = session.get();\n // Write the results to an attribute\n\n if(property != null && !property.isEmpty()){\n flowfile = session.putAttribute(flowfile, \"property\", property);\n }\n\n session.transfer(flowfile, SUCCESS);\n}\n\n{% endcodeblock %}\n\nOnce you have the actual service and processor written, if you have followed the 
directory setup in the source code, you are good to go! Just build your service project (mvn clean install) from the directory containing the root pom.xml. Then copy the nar from the sample-bundle-nar/target/ directory into the lib directory of your Apache Nifi instance. Once it is copied, start or restart Apache Nifi and your service will be available to use as usual!\n\n## Configuring the Service\n\nOnce you have deployed the service nar bundle, go to the Controller Settings in the upper right of the web GUI.\n\n{% img /images/nifi-controller-settings.png %}\n\nThen select the Controller Services tab and click the '+' button on the upper right of the modal. You can either search for the StandardPropertiesFileService, or just select it since there aren't many services. From there configure the service as needed.\n\n{% img /images/controller-service.png %}\n\nOnce the service is configured, add the ControllerServiceProcessor to the flow and configure its Property Name and Properties Service properties: the name of the property that you want from the Java properties file, and the PropertiesFileService that you just set up.\n\n{% img /images/controller-processor.png %}\n\nAnd that is pretty much it for configuring and setting up a service for use in a flow. Now you just use the flowfile attribute as you would any other attribute. This could be extended to grab multiple properties, maybe all of them from a file, and set them as flowfile attributes. This is just a basic example showing how you can create a controller service that fits your needs.\n\nIf you have any questions about custom services, let us know below or at info@nifi.rocks!","html":"<p>Controller services are shared between processors, other controller services and reporting tasks. Normally they provide access to a shared resource, such as a database or SSL context, or externally managed content. This post will cover the basics of a controller service through a simple example. 
This example will take a file path that contains one or more properties files, and provide a processor access to those properties. The full source is hosted on <a href=\"https://github.com/pcgrenier/nifi-examples/tree/sample-processor\">Github</a>.</p>\n\n<!-- more -->\n\n\n<h2>Setup</h2>\n\n<p>This project will use a more advanced maven structure than the simple one used in the <a href=\"{{site.url}}//developing-a-custom-apache-nifi-processor-json\">developing a custom processor</a> post. If you have looked at the processor post you’ll see that most of the setup for services is very similar.</p>\n\n<p>{% codeblock Apache Nifi Controller Servcie Folder Structure %}\n./sample-bundle\n├── pom.xml\n├── sample-bundle-nar\n│ ├── pom.xml\n│ └── src\n├── sample-controller-service\n│ ├── pom.xml\n│ └── src\n├── sample-controller-service-api\n│ ├── pom.xml\n│ └── src\n├── sample-controller-service-api-nar\n│ ├── pom.xml\n│ └── src\n└── sample-processor\n ├── pom.xml\n └── src\n{% endcodeblock %}</p>\n\n<p>I won’t go into details on the pom files but the general idea is that you want a seperate nar for the api interface and the service itself, this allows something much smaller to be used as a dependency in your other bundles. So the sample-bundle-nar will pull in the sample-processor and sample-controller-service packages. 
The sample-controller-service-api-nar will just pull in the sample-controller-service-api.</p>\n\n<h2>The Controller Service API Interface</h2>\n\n<p>{% codeblock lang:java Controller Service API <a href=\"https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service-api/src/main/java/rocks/nifi/examples/PropertiesFileService.java\">https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service-api/src/main/java/rocks/nifi/examples/PropertiesFileService.java</a> PropertiesFileService.java %}\npublic interface PropertiesFileService extends ControllerService{\n String getProperty(String key);\n}\n{% endcodeblock %}</p>\n\n<p>This is just a simple interface that extends nifi’s ControllerService. We also provide the only entry point to processors, the getProperty function. This is similar to the current services in Apache Nifi such as <a href=\"https://raw.githubusercontent.com/apache/nifi/master/nifi-nar-bundles/nifi-standard-services/nifi-dbcp-service-api/src/main/java/org/apache/nifi/dbcp/DBCPService.java\">DBCPService.java</a> providing the getConnection() function.</p>\n\n<h2>The Controller Service</h2>\n\n<p>If you have read the <a href=\"{{site.url}}//developing-a-custom-apache-nifi-processor-json\">developing a custom processor</a> post a lot of this will be review. Controller services provide the same interfaces for configuration and validation. The initialization method only differs in taking a ControllerServiceInitializationContext.</p>\n\n<p>Just like with the processors, tags are useful for finding your controller services. 
The capability description annotation provides a simple explanation of what the controller service will provide.</p>\n\n<p>{% codeblock lang:java Controller Service <a href=\"https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service/src/main/java/rocks/nifi/examples/StandardPropertiesFileService.java\">https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service/src/main/java/rocks/nifi/examples/StandardPropertiesFileService.java</a> StandardPropertiesFileService.java %}\n@Tags({“nifirocks”, “properties”})\n@CapabilityDescription(“Provides a controller service to manage property files.”)\npublic class StandardPropertiesFileService extends AbstractControllerService implements PropertiesFileService{\n{% endcodeblock %}</p>\n\n<p>Next we create the property descriptors. One will take the file or directory holding the property files, the other will be how often to check. Unlike processors controller services do not contain relationships.</p>\n\n<p>{% codeblock lang:java Controller Service <a href=\"https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service/src/main/java/rocks/nifi/examples/StandardPropertiesFileService.java\">https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service/src/main/java/rocks/nifi/examples/StandardPropertiesFileService.java</a> StandardPropertiesFileService.java %}\npublic static final PropertyDescriptor CONFIG_URI = new PropertyDescriptor.Builder()\n .name(“Configuration Directory”)\n .description(“Configuration directory for properties files.”)\n .defaultValue(null)\n .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)\n .build();</p>\n\n<p>public static final PropertyDescriptor RELOAD_INTERVAL = new PropertyDescriptor.Builder()\n .name(“Reload Interval”)\n .description(“Time before looking for changes”)\n .defaultValue(“60 min”)\n .required(true)\n .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)\n 
.build();</p>\n\n<p>private static final List<PropertyDescriptor> serviceProperties;</p>\n\n<p>static{\n final List<PropertyDescriptor> props = new ArrayList<>();\n props.add(CONFIG_URI);\n props.add(RELOAD_INTERVAL);\n serviceProperties = Collections.unmodifiableList(props);\n}</p>\n\n<p>{% endcodeblock %}</p>\n\n<p>The next step is the onConfigured function which will read the properties set and call any other necessary functions needed to start the service. We just read the two properties we have, configUri and reloadIntervalMilli, and then call loadPropertiesFile(). After the properties file is loaded, we start up a file watcher and executer so that the properties can be dynamic and not just read in at startup.</p>\n\n<p>{% codeblock lang:java Conroller Service <a href=\"https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service/src/main/java/rocks/nifi/examples/StandardPropertiesFileService.java\">https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-controller-service/src/main/java/rocks/nifi/examples/StandardPropertiesFileService.java</a> StandardPropertiesFileService.java %}\n@OnEnabled\npublic void onConfigured(final ConfigurationContext context) throws InitializationException{\n log.info(“Starting properties file service”);\n configUri = context.getProperty(CONFIG_URI).getValue();\n reloadIntervalMilli = context.getProperty(RELOAD_INTERVAL).asTimePeriod(TimeUnit.MILLISECONDS);</p>\n\n<pre><code>// Initialize the properties\nloadPropertiesFiles();\n\nfileWatcher = new SynchronousFileWatcher(Paths.get(configUri), new LastModifiedMonitor());\nexecutor = Executors.newSingleThreadScheduledExecutor();\nFilesWatcherWorker reloadTask = new FilesWatcherWorker();\nexecutor.scheduleWithFixedDelay(reloadTask, 0, reloadIntervalMilli, TimeUnit.MILLISECONDS);\n</code></pre>\n\n<p>}</p>\n\n<p>{% endcodeblock %}</p>\n\n<p>To see the other private functions you can refer to the github code.</p>\n\n<p>The last step in the 
process is to create a processor to use the service. This is exactly the same as creating a normal processor, but in this instance we want to add some specifics to use the PropertiesFileService. To do this, we just grab a reference from context of the Controller Service, and then call the getProperty(propertyName) function. We are just going to get the property and add it to the nifi flowfile properties so it is available to other processors down the line for now. To specify which property we want we will add a PropertyDescriptor so that the user can set it in the Nifi UI, and another PropertyDescriptor to specify which PropertiesFileService to get the property value from.</p>\n\n<p>{% codeblock lang:java Controll Service Processor <a href=\"https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-processor/src/main/java/rocks/nifi/examples/processors/ControllerServiceProcessor.java\">https://github.com/pcgrenier/nifi-examples/blob/sample-processor/sample-processor/src/main/java/rocks/nifi/examples/processors/ControllerServiceProcessor.java</a> ControllerServiceProcessor.java %}</p>\n\n<p>public static final PropertyDescriptor PROPERTY_NAME = new PropertyDescriptor.Builder()\n .name(“Property Name”)\n .required(true)\n .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)\n .build();</p>\n\n<pre><code>public static final PropertyDescriptor PROPERTIES_SERVICE = new PropertyDescriptor.Builder()\n .name(\"Properties Service\")\n .description(\"System properties loader\")\n .required(false)\n .identifiesControllerService(PropertiesFileService.class)\n .build();\n</code></pre>\n\n<p>…….</p>\n\n<p>@Override\npublic void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {\n final ProcessorLog log = this.getLogger();\n final AtomicReference<String> value = new AtomicReference<>();</p>\n\n<pre><code>final String propertyName = context.getProperty(PROPERTY_NAME).getValue();\nfinal PropertiesFileService 
propertiesService = context.getProperty(PROPERTIES_SERVICE).asControllerService(PropertiesFileService.class);\nfinal String property = propertiesService.getProperty(propertyName);\nlog.info(\"Property = \" + property);\n\n\nFlowFile flowfile = session.get();\n// Write the results to an attribute\n\nif(property != null && !property.isEmpty()){\n flowfile = session.putAttribute(flowfile, \"property\", property);\n}\n\nsession.transfer(flowfile, SUCCESS);\n</code></pre>\n\n<p>}</p>\n\n<p>{% endcodeblock %}</p>\n\n<p>Once you have the actual service and processor written, if you have followed the directory setup in the source code, you are good to go! Just build your service project (mvn clean install) from the directory containing the root pom.xml. Then copy the nar from the sample-bundle-nar/target/ directory into the lib directory of your Apache Nifi instance. Once it is copied, start or restart Apache Nifi and your service will be available to use as usual!</p>\n\n<h2>Configuring the Service</h2>\n\n<p>Once you have deployed the service nar bundle, go to the Controller Settings in the upper right of the web GUI.</p>\n\n<p>{% img /images/nifi-controller-settings.png %}</p>\n\n<p>Then select the Controller Services tab and click the ‘+’ button on the upper right of the modal. You can either search for the StandardPropertiesFileService, or just select it since there aren’t many services. From there configure the service as needed.</p>\n\n<p>{% img /images/controller-service.png %}</p>\n\n<p>Once the service is configured, add the ControllerServiceProcessor to the flow and configure its Property Name and Properties Service properties: the name of the property that you want from the Java properties file, and the PropertiesFileService that you just set up.</p>\n\n<p>{% img /images/controller-processor.png %}</p>\n\n<p>And that is pretty much it for configuring and setting up a service for use in a flow. Now you just use the flowfile attribute as you would any other attribute. 
This could be extended to grab multiple properties, maybe all of them from a file, and set them as flowfile attributes. This is just a basic example showing how you can create a controller service that fits your needs.</p>\n\n<p>If you have any questions about custom services, let us know below or at info@nifi.rocks!</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1454805477000,"created_by":1,"updated_at":1454805477000,"updated_by":1,"published_at":1454805477000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"controller service"}]},{"title":"Apache Nifi Release 0.5.0 Highlights","slug":"apache-nifi-release-0-dot-5-0-highlights","markdown":"The Apache Nifi community just voted on another release and has announced and released 0.5.0, probably its most interesting release to date. This release contains quite a few big features that expand what Nifi has to offer and also how you can use it, notably script execution in a flow. This new processor allows you to execute scripts in JavaScript, Groovy, JRuby, Jython, Lua, and Python. To grab the latest binaries, go to\n[Apache Nifi's download page](https://nifi.apache.org/download.html). The full release notes are also available on [Apache Nifi's wiki page](https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.5.0).\n\n* [Bug Fixes](#bugfixes): \t\t\t64\n* [Improvements](#improvements): \t28\n* [New Features](#new-features): \t10\n\nThis release is substantial and really expands Apache Nifi's usability, specifically for those users who don't feel comfortable with Java. Let's take a deeper look at what else is included!\n\n<!--more-->\n### <a name=\"bugfixes\"></a>Bug Fixes\nSince Nifi is quite a young project, less than a year old in the open source world, it still has quite a few bugs coming up as the user base expands. 
It's great to see that as bugs come up, they are getting the attention of the community and developers. Each release is getting better and better. For this release, quite a few of the fixes are under the covers, such as reworded error messages and documentation fixes. There are also some functionality bugs that were corrected in this release: Controller Service sorting by type and state now works, PutS3 now allows files larger than 5GB, unit tests perform better (no more timeouts in certain environments), and the PutSQL processor received a few fixes for improved functionality. The full list of bugs addressed in this release can be found in [Apache Nifi's Jira](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12334158).\n\n### <a name=\"improvements\"></a>Improvements\nApache Nifi Expression Language support has been added to a few fields for some processors with this release - the MergeContent Processor's Correlation Attribute Field and the xxxS3Object SECRET_KEY and ACCESS_KEY properties. The UI now deletes an object when it is selected and the backspace key is pressed, which is a step in the right direction to improve usability. There is talk of a new UI in the works that would allow for better user role management and use HTML5 and newer web technology. Other improvements include a new Expression Language method, getDelimitedField, and PutJMS now works as expected for the specified URI scheme.\n\n### <a name=\"new-features\"></a>New Features\nSome big changes for this release! As I talked about before, there are a few new processors - 14 total - and also one that was deprecated/removed - ConvertCSVToSql. The biggest is the ExecuteScript processor, which executes scripts and gives them access to the flowfile attributes. There is also a new processor for writing events to Riemann - the PutRiemann processor. ElasticSearch is also now supported through its own processors. 
The last big addition for this release is the Get/PutAMQP processors for AMQP connections - similar to JMS, for newer message servers such as RabbitMQ and [Apache Qpid](http://qpid.apache.org/). For a full list of processors, check out our [full updated list of nifi processors]({{site.url}}/apache-nifi-processors/).\n\nThe new processors, with links to their descriptions: \n\n* [ConsumeAMQP]({{site.url}}/apache-nifi-processors/#ConsumeAMQP)\n* [ConvertJSONToSQL]({{site.url}}/apache-nifi-processors/#ConvertJSONToSQL)\n* [ExecuteScript]({{site.url}}/apache-nifi-processors/#ExecuteScript)\n* [FetchDistributedMapCache]({{site.url}}/apache-nifi-processors/#FetchDistributedMapCache)\n* [FetchElasticSearch]({{site.url}}/apache-nifi-processors/#FetchElasticSearch)\n* [GetHTMLElement]({{site.url}}/apache-nifi-processors/#GetHTMLElement)\n* [InferAvroSchema]({{site.url}}/apache-nifi-processors/#InferAvroShema)\n* [InvokeScriptedProcessor]({{site.url}}/apache-nifi-processors/#InvokeScriptedProcessor)\n* [ListenRELP]({{site.url}}/apache-nifi-processors/#ListenRELP)\n* [ModifyHTMLElement]({{site.url}}/apache-nifi-processors/#ModifyHTMLElement)\n* [PutHTMLElement]({{site.url}}/apache-nifi-processors/#PutHTMLElement)\n* [PutRiemann]({{site.url}}/apache-nifi-processors/#PutRiemann)\n\n\nAnother big new feature in this release is the ability to inspect/interact with FlowFiles inside of the Apache Nifi UI - flow files inside of the flow! This is so awesome! You can now select a processor/connection and peek at what is happening, in a sense, and upload/remove files from a queue! Exciting stuff!!\n\nThis release contained quite a few new and interesting additions that we plan on following up on with future posts, so be sure to check back to see how these new features can be used.","html":"<p>The Apache Nifi community just voted on another release and has announced and released 0.5.0, probably its most interesting release to date. 
This release contains quite a few big features that expand what Nifi has to offer and also how you can use it, notably script execution in a flow. This new processor allows you to execute scripts in JavaScript, Groovy, JRuby, Jython, Lua, and Python. To grab the latest binaries, go to\n<a href=\"https://nifi.apache.org/download.html\">Apache Nifi’s download page</a>. The full release notes are also available on <a href=\"https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.5.0\">Apache Nifi’s wiki page</a>.</p>\n\n<ul>\n<li><a href=\"#bugfixes\">Bug Fixes</a>: 64</li>\n<li><a href=\"#improvements\">Improvements</a>: 28</li>\n<li><a href=\"#new-features\">New Features</a>: 10</li>\n</ul>\n\n\n<p>This release is substantial and really expands Apache Nifi’s usability, specifically for those users who don’t feel comfortable with Java. Let’s take a deeper look at what else is included!</p>\n\n<!--more-->\n\n\n<h3><a name=\"bugfixes\"></a>Bug Fixes</h3>\n\n<p>Since Nifi is quite a young project, less than a year old in the open source world, it still has quite a few bugs coming up as the user base expands. It’s great to see that as bugs come up, they are getting the attention of the community and developers. Each release is getting better and better. For this release, quite a few of the fixes are under the covers, such as reworded error messages and documentation fixes. There are also some functionality bugs that were corrected in this release: Controller Service sorting by type and state now works, PutS3 now allows files larger than 5GB, unit tests perform better (no more timeouts in certain environments), and the PutSQL processor received a few fixes for improved functionality. 
The full list of bugs addressed in this release can be found in <a href=\"https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12334158\">Apache Nifi’s Jira</a>.</p>\n\n<h3><a name=\"improvements\"></a>Improvements</h3>\n\n<p>Apache Nifi Expression Language support has been added to a few fields for some processors with this release - the MergeContent Processor’s Correlation Attribute Field and the xxxS3Object SECRET_KEY and ACCESS_KEY properties. The UI now deletes an object when it is selected and the backspace key is pressed, which is a step in the right direction to improve usability. There is talk of a new UI in the works that would allow for better user role management and use HTML5 and newer web technology. Other improvements include a new Expression Language method, getDelimitedField, and PutJMS now works as expected for the specified URI scheme.</p>\n\n<h3><a name=\"new-features\"></a>New Features</h3>\n\n<p>Some big changes for this release! As I talked about before, there are a few new processors - 14 total - and also one that was deprecated/removed - ConvertCSVToSql. The biggest is the ExecuteScript processor, which executes scripts and gives them access to the flowfile attributes. There is also a new processor for writing events to Riemann - the PutRiemann processor. ElasticSearch is also now supported through its own processors. The last big addition for this release is the Get/PutAMQP processors for AMQP connections - similar to JMS, for newer message servers such as RabbitMQ and <a href=\"http://qpid.apache.org/\">Apache Qpid</a>. 
For a full list of processors, check out our <a href=\"{{site.url}}/apache-nifi-processors/\">full updated list of nifi processors</a>.</p>\n\n<p>The new processors, with links to their descriptions:</p>\n\n<ul>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#ConsumeAMQP\">ConsumeAMQP</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#ConvertJSONToSQL\">ConvertJSONToSQL</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#ExecuteScript\">ExecuteScript</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#FetchDistributedMapCache\">FetchDistributedMapCache</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#FetchElasticSearch\">FetchElasticSearch</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#GetHTMLElement\">GetHTMLElement</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#InferAvroShema\">InferAvroSchema</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#InvokeScriptedProcessor\">InvokeScriptedProcessor</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#ListenRELP\">ListenRELP</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#ModifyHTMLElement\">ModifyHTMLElement</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PutHTMLElement\">PutHTMLElement</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PutRiemann\">PutRiemann</a></li>\n</ul>\n\n\n<p>Another big new feature in this release is the ability to inspect/interact with FlowFiles inside of the Apache Nifi UI - flow files inside of the flow! This is so awesome! You can now select a processor/connection and peek at what is happening, in a sense, and upload/remove files from a queue! 
Exciting stuff!!</p>\n\n<p>This release contained quite a few new and interesting additions that we plan on following up on with future posts, so be sure to check back to see how these new features can be used.</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1455653389000,"created_by":1,"updated_at":1455653389000,"updated_by":1,"published_at":1455653389000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"release"}]},{"title":"Apache Nifi Release 0.6.0 Highlights","slug":"apache-nifi-release-0-dot-6-0-highlights","markdown":"Apache Nifi released 0.6.0 and 0.6.1 recently and I wanted to go through and highlight some of the key changes that may affect you Nifi users. So far Nifi has kept pretty well to the 6 week release schedule that they have discussed. That's a really aggressive release schedule so we'll see if they can keep that up once they hit the 0.7.0/1.0.0 release, which they say is going to actually be pushed back a little. The 0.6.0 release was again substantial with 52 bug fixes, 15 improvements and 7 new features. 0.6.1 didn't take too long to follow with 11 bug fixes and 1 improvement. The release binaries are on the [Apache Nifi download page](https://nifi.apache.org/download.html). The full release notes are available on [Apache Nifi's wiki page](https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.6.1). The combined 0.6.* release information is below.\n\n* [Bug Fixes](#bugfixes): \t\t\t63\n* [Improvements](#improvements): \t16\n* [New Features](#new-features): \t7\n\n<!--more-->\n### <a name=\"bugfixes\"></a>Bug Fixes\nSo many bug fixes, yet again! Even though Apache Nifi has not released a major version, Apache Nifi is going strong and fixing those annoying issues. 
There are quite a few bug fixes that resolve issues with Apache Nifi Processors, which is great since these are the main part of Nifi: PutKafka, InferAvroSchema, GetHttp, ControlRate, PutElasticSearch, GetKafka, ListFile, and SplitText all had fixes! And this just touches on some of the bug fixes. If you really want to see if a specific bug fix was completed, you can check [here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12334372), but really no need, just rest assured the bugs are getting fixed!\n\n### <a name=\"improvements\"></a>Improvements\nThe big improvements in this release are proxy support for the AWS Processors and Nifi Expression Language support in RouteText for routing lines. The other improvements are pretty good too, but mostly under the covers; for example, ListHDFS now sets the correct state in ZooKeeper.\n\n### <a name=\"new-features\"></a>New Features\nLet's get straight to them, new features and new processors:\n\n* Processors for Apache [Cassandra]({{site.url}}/apache-nifi-processors/#PutCassandraQL)\n* Kerberos authentication\n* [PublishJMS Processor]({{site.url}}/apache-nifi-processors/#PublishJMS)\n* [ConsumeJMS Processor]({{site.url}}/apache-nifi-processors/#ConsumeJMS)\n* [AWS Lambda Processor]({{site.url}}/apache-nifi-processors/#PutLambda)\n\nThat's pretty much it. To see all processors, check out our [processor page]({{site.url}}/apache-nifi-processors/) which lists all current processors and their descriptions. Check back after the next release for a quick breakdown!","html":"<p>Apache Nifi released 0.6.0 and 0.6.1 recently and I wanted to go through and highlight some of the key changes that may affect you Nifi users. So far Nifi has kept pretty well to the 6 week release schedule that they have discussed. That’s a really aggressive release schedule so we’ll see if they can keep that up once they hit the 0.7.0/1.0.0 release, which they say is going to actually be pushed back a little. 
The 0.6.0 release was again substantial with 52 bug fixes, 15 improvements and 7 new features. 0.6.1 didn’t take too long to follow with 11 bug fixes and 1 improvement. The release binaries are on the <a href=\"https://nifi.apache.org/download.html\">Apache Nifi download page</a>. The full release notes are available on <a href=\"https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.6.1\">Apache Nifi’s wiki page</a>. The combined 0.6.* release information is below.</p>\n\n<ul>\n<li><a href=\"#bugfixes\">Bug Fixes</a>: 63</li>\n<li><a href=\"#improvements\">Improvements</a>: 16</li>\n<li><a href=\"#new-features\">New Features</a>: 7</li>\n</ul>\n\n\n<!--more-->\n\n\n<h3><a name=\"bugfixes\"></a>Bug Fixes</h3>\n\n<p>So many bug fixes, yet again! Even though Apache Nifi has not released a major version, Apache Nifi is going strong and fixing those annoying issues. There are quite a few bug fixes that resolve issues with Apache Nifi Processors, which is great since these are the main part of Nifi: PutKafka, InferAvroSchema, GetHTTP, ControlRate, PutElasticsearch, GetKafka, ListFile, and SplitText all had fixes! And this just touches on some of the bug fixes. If you really want to see if a specific bug fix was completed, you can check <a href=\"https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12334372\">here</a>, but really no need, just rest assured the bugs are getting fixed!</p>\n\n<h3><a name=\"improvements\"></a>Improvements</h3>\n\n<p>The big improvements in this release are proxy support for the AWS Processors and RouteText support for using Nifi Expression Language to route lines. 
The other improvements are pretty good too, but mostly under the covers, for example ListHDFS now setting the correct state in ZooKeeper.</p>\n\n<h3><a name=\"new-features\"></a>New Features</h3>\n\n<p>Let’s get straight to them, new features and new processors:</p>\n\n<ul>\n<li>Processors for Apache <a href=\"{{site.url}}/apache-nifi-processors/#PutCassandraQL\">Cassandra</a></li>\n<li>Kerberos authentication</li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PublishJMS\">PublishJMS Processor</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#ConsumeJMS\">ConsumeJMS Processor</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PutLambda\">AWS Lambda Processor</a></li>\n</ul>\n\n\n<p>That’s pretty much it. To see all processors, check out our <a href=\"{{site.url}}/apache-nifi-processors/\">processor page</a> which lists all current processors and their descriptions. Check back after the next release for a quick breakdown!</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1468290308000,"created_by":1,"updated_at":1468290308000,"updated_by":1,"published_at":1468290308000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"release"}]},{"title":"Apache Nifi Release 0.7.0 Highlights","slug":"apache-nifi-release-0-dot-7-0-highlights","markdown":"Apache Nifi's newest release is out, 0.7.0. You can grab the binaries from [their site](https://nifi.apache.org/download.html) as always. So let's dive in and see what to look for in this release!\n\n* [Bug Fixes](#bugfixes): \t\t\t72\n* [Improvements](#improvements): \t49\n* [New Features](#new-features): \t10\n\nAs always, a bunch of bug fixes, but this time there are quite a few improvements.\n\n<!--more-->\n### <a name=\"bugfixes\"></a>Bug Fixes\nWhere to begin! This release has the second most bug fixes we've seen. 
Again, there are a ton to cover so take a peek at the full [release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12335078) if you want to see more, since this is just a quick highlight. Lots of processor fixes, from NullPointerExceptions to non-responsive processors, a few Nifi startup fixes, and other fixes for OS-specific issues (looking at you, Windows bugs). UI interaction was fixed when many HDFS and HBase processors were on the graph. Bug fixes are hard to highlight, since they are so specific to what you might have run into, but it is nice to know that reported bugs are getting fixed.\n\n### <a name=\"improvements\"></a>Improvements\nI didn't know where to start for Bug fixes, but I really don't know where to begin for Improvements. I'll start by saying there are a few new processors that they included in this section, but I'm going to list them under New Features below to remain consistent. Other actual improvements were updates to HTTP Processors to allow proxy authentication, which I know someone on here asked about, so it's nice to see that it is now doable! You are now able to use SSL with AMQP processors. ExecuteScript processors can now be executed concurrently too. There are also performance improvements in StreamScanner and SSL support for the Mongo processors, and the Mock framework can now register FlowFile assertions.\n\nIt's good to mention that new documentation has been created as an in-depth dive for developers into Nifi and its design decisions. 
This is a great read and I will try to do a quick breakdown in the future of the points I think are really important for developers.\n\n### <a name=\"new-features\"></a>New Features\nA few new processors here, but outside of that Apache Nifi now supports custom properties in the Expression Language.\n\nNew Processors:\n\n* [ConsumeKafka]({{site.url}}/apache-nifi-processors/#ConsumeKafka)\n* [ConsumeMQTT]({{site.url}}/apache-nifi-processors/#ConsumeMQTT)\n* [DebugFlow]({{site.url}}/apache-nifi-processors/#DebugFlow)\n* [DeleteDynamoDB]({{site.url}}/apache-nifi-processors/#DeleteDynamoDB)\n* [ExtractMediaMetadata]({{site.url}}/apache-nifi-processors/#ExtractMediaMetadata)\n* [GetDynamoDB]({{site.url}}/apache-nifi-processors/#GetDynamoDB)\n* [GetHDFSEvents]({{site.url}}/apache-nifi-processors/#GetHDFSEvents)\n* [GetSNMP]({{site.url}}/apache-nifi-processors/#GetSNMP)\n* [JoltTransformJSON]({{site.url}}/apache-nifi-processors/#JoltTransformJSON)\n* [ListenLumberjack]({{site.url}}/apache-nifi-processors/#ListenLumberjack)\n* [ListS3]({{site.url}}/apache-nifi-processors/#ListS3)\n* [PublishKafka]({{site.url}}/apache-nifi-processors/#PublishKafka)\n* [PublishMQTT]({{site.url}}/apache-nifi-processors/#PublishMQTT)\n* [PutDynamoDB]({{site.url}}/apache-nifi-processors/#PutDynamoDB)\n* [PutHiveQL]({{site.url}}/apache-nifi-processors/#PutHiveQL)\n* [PutSlack]({{site.url}}/apache-nifi-processors/#PutSlack)\n* [PutTCP]({{site.url}}/apache-nifi-processors/#PutTCP)\n* [PutUDP]({{site.url}}/apache-nifi-processors/#PutUDP)\n* [SelectHiveQL]({{site.url}}/apache-nifi-processors/#SelectHiveQL)\n* [SetSNMP]({{site.url}}/apache-nifi-processors/#SetSNMP)\n\nMost of these could be seen in the pipeline since they had some similarities to current ones, for example AMQP or JMS consume and publish, with MQTT support now added. Apache Nifi now has more support for AWS services, Amazon's DynamoDB and S3, with the ListS3 and the Delete/Put DynamoDB Processors. 
The biggest upgrade, I think, for processors is the DebugFlow processor, which allows you to produce a behavior you'd like to test for a flow. This could be transferring a FlowFile to a success/failure relationship, rolling back a FlowFile to test behavior without penalty, or just throwing an exception.\nI encourage you to take a look at the descriptions of these processors on our [Processors Page]({{site.url}}/apache-nifi-processors/) which lists all current processors, and is updated after every release. \n\n","html":"<p>Apache Nifi’s newest release is out, 0.7.0. You can grab the binaries from <a href=\"https://nifi.apache.org/download.html\">their site</a> as always. So let’s dive in and see what to look for in this release!</p>\n\n<ul>\n<li><a href=\"#bugfixes\">Bug Fixes</a>: 72</li>\n<li><a href=\"#improvements\">Improvements</a>: 49</li>\n<li><a href=\"#new-features\">New Features</a>: 10</li>\n</ul>\n\n\n<p>As always, a bunch of bug fixes, but this time there are quite a few improvements.</p>\n\n<!--more-->\n\n\n<h3><a name=\"bugfixes\"></a>Bug Fixes</h3>\n\n<p>Where to begin! This release has the second most bug fixes we’ve seen. Again, there are a ton to cover so take a peek at the full <a href=\"https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12335078\">release notes</a> if you want to see more, since this is just a quick highlight. Lots of processor fixes, from NullPointerExceptions to non-responsive processors, a few Nifi startup fixes, and other fixes for OS-specific issues (looking at you, Windows bugs). UI interaction was fixed when many HDFS and HBase processors were on the graph. Bug fixes are hard to highlight, since they are so specific to what you might have run into, but it is nice to know that reported bugs are getting fixed.</p>\n\n<h3><a name=\"improvements\"></a>Improvements</h3>\n\n<p>I didn’t know where to start for Bug fixes, but I really don’t know where to begin for Improvements. 
I’ll start by saying there are a few new processors that they included in this section, but I’m going to list them under New Features below to remain consistent. Other actual improvements were updates to HTTP Processors to allow proxy authentication, which I know someone on here asked about, so it’s nice to see that it is now doable! You are now able to use SSL with AMQP processors. ExecuteScript processors can now be executed concurrently too. There are also performance improvements in StreamScanner and SSL support for the Mongo processors, and the Mock framework can now register FlowFile assertions.</p>\n\n<p>It’s good to mention that new documentation has been created as an in-depth dive for developers into Nifi and its design decisions. This is a great read and I will try to do a quick breakdown in the future of the points I think are really important for developers.</p>\n\n<h3><a name=\"new-features\"></a>New Features</h3>\n\n<p>A few new processors here, but outside of that Apache Nifi now supports custom properties in the Expression Language.</p>\n\n<p>New Processors:</p>\n\n<ul>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#ConsumeKafka\">ConsumeKafka</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#ConsumeMQTT\">ConsumeMQTT</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#DebugFlow\">DebugFlow</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#DeleteDynamoDB\">DeleteDynamoDB</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#ExtractMediaMetadata\">ExtractMediaMetadata</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#GetDynamoDB\">GetDynamoDB</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#GetHDFSEvents\">GetHDFSEvents</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#GetSNMP\">GetSNMP</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#JoltTransformJSON\">JoltTransformJSON</a></li>\n<li><a 
href=\"{{site.url}}/apache-nifi-processors/#ListenLumberjack\">ListenLumberjack</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#ListS3\">ListS3</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PublishKafka\">PublishKafka</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PublishMQTT\">PublishMQTT</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PutDynamoDB\">PutDynamoDB</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PutHiveQL\">PutHiveQL</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PutSlack\">PutSlack</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PutTCP\">PutTCP</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#PutUDP\">PutUDP</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#SelectHiveQL\">SelectHiveQL</a></li>\n<li><a href=\"{{site.url}}/apache-nifi-processors/#SetSNMP\">SetSNMP</a></li>\n</ul>\n\n\n<p>Most of these could be seen in the pipeline since they had some similarities to current ones, for example AMQP or JMS consume and publish, with MQTT support now added. Apache Nifi now has more support for AWS services, Amazon’s DynamoDB and S3, with the ListS3 and the Delete/Put DynamoDB Processors. The biggest upgrade, I think, for processors is the DebugFlow processor, which allows you to produce a behavior you’d like to test for a flow. 
This could be transferring a FlowFile to a success/failure relationship, rolling back a FlowFile to test behavior without penalty, or just throwing an exception.\nI encourage you to take a look at the descriptions of these processors on our <a href=\"{{site.url}}/apache-nifi-processors/\">Processors Page</a> which lists all current processors, and is updated after every release.</p>\n","image":null,"featured":0,"page":0,"status":"published","language":"de_DE","meta_title":null,"meta_description":null,"author_id":1,"created_at":1468375633000,"created_by":1,"updated_at":1468375633000,"updated_by":1,"published_at":1468375633000,"published_by":1,"tags":[{"name":"apache nifi"},{"name":"release"}]}],"tags":[{"id":1,"name":"apache nifi","slug":"apache nifi","description":""},{"id":2,"name":"release","slug":"release","description":""},{"id":3,"name":"controller service","slug":"controller service","description":""},{"id":4,"name":"processors","slug":"processors","description":""},{"id":5,"name":"unit tests","slug":"unit tests","description":""},{"id":6,"name":"videos","slug":"videos","description":""},{"id":7,"name":"how-tos","slug":"how-tos","description":""}]}}