{"id":1152,"date":"2016-05-14T14:55:55","date_gmt":"2016-05-14T22:55:55","guid":{"rendered":"http:\/\/www.developerscloset.com\/?page_id=1152"},"modified":"2018-05-14T15:08:18","modified_gmt":"2018-05-14T23:08:18","slug":"flume","status":"publish","type":"page","link":"https:\/\/www.developerscloset.com\/?page_id=1152","title":{"rendered":"Flume"},"content":{"rendered":"<p><a href=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/flume-image.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-1154 alignnone\" src=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/flume-image-300x225.png\" alt=\"\" width=\"233\" height=\"175\" srcset=\"https:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/flume-image-300x225.png 300w, https:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/flume-image.png 301w\" sizes=\"auto, (max-width: 233px) 100vw, 233px\" \/><\/a><\/p>\n<p>Flume collects and aggregates data from almost any source into a persistent store such as HDFS.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_79 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-69ea229b25a8e\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 
9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-69ea229b25a8e\"  aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.developerscloset.com\/?page_id=1152\/#Flume_Configuration\" >Flume Configuration<\/a><ul class='ez-toc-list-level-2' ><li class='ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.developerscloset.com\/?page_id=1152\/#Install_Flume\" >Install Flume<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.developerscloset.com\/?page_id=1152\/#Configure_Flume\" >Configure Flume<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.developerscloset.com\/?page_id=1152\/#Data_Flow_Model\" >Data Flow Model<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.developerscloset.com\/?page_id=1152\/#When_To_Use_Flume\" >When To Use Flume<\/a><\/li><\/ul><\/nav><\/div>\n<h1 id=\"Flume-FlumeConfiguration\"><span class=\"ez-toc-section\" id=\"Flume_Configuration\"><\/span>Flume Configuration<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2 id=\"Flume-InstallFlume\"><span class=\"ez-toc-section\" id=\"Install_Flume\"><\/span>Install Flume<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Cloudera Manager distributes Flume in CDH and offers the following services:<\/p>\n<ul>\n<li><strong>Flume Agent<\/strong>\u00a0\u2013 Flume is used to retrieve data. 
You do not need many Flume agents, and they can share a host with HDFS DataNodes, MapReduce TaskTrackers, and HBase RegionServers. However, we should keep Flume on its own server, on a network shared with the other Flume servers in their own cluster (the Flume service needs network access only to Cloudera Manager, ZooKeeper, HBase, and HDFS to write out to storage). Make sure there is an HBase Gateway on the Flume node to keep the HBase configuration up to date. Flume depends on the ZooKeeper service.<\/li>\n<\/ul>\n<h2 id=\"Flume-ConfigureFlume\"><span class=\"ez-toc-section\" id=\"Configure_Flume\"><\/span>Configure Flume<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div style=\"max-width:100%;margin:auto;overflow:hidden\">\n<div style=\"width:100%;overflow:auto\">\n<table>\n<thead>\n<tr class=\"tablesorter-headerRow\" role=\"row\">\n<th class=\"confluenceTh tablesorter-header sortableHeader tablesorter-headerUnSorted\" role=\"columnheader\" scope=\"col\">\n<div class=\"tablesorter-header-inner\">Configuration<\/div>\n<\/th>\n<th class=\"confluenceTh tablesorter-header sortableHeader tablesorter-headerUnSorted\" role=\"columnheader\" scope=\"col\">\n<div class=\"tablesorter-header-inner\">Description<\/div>\n<\/th>\n<th class=\"confluenceTh tablesorter-header sortableHeader tablesorter-headerUnSorted\" role=\"columnheader\" scope=\"col\">\n<div class=\"tablesorter-header-inner\">Value<\/div>\n<\/th>\n<th class=\"confluenceTh tablesorter-header sortableHeader tablesorter-headerUnSorted\" role=\"columnheader\" scope=\"col\">\n<div class=\"tablesorter-header-inner\">Calculation<\/div>\n<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr role=\"row\">\n<td class=\"confluenceTd\">Java Heap Size of Agent in Bytes<\/td>\n<td class=\"confluenceTd\">Maximum size in bytes for the Java Process heap memory. 
Passed to Java -Xmx.<\/td>\n<td class=\"confluenceTd\"><u>1 GB<\/u><\/td>\n<td class=\"confluenceTd\">Base this calculation on the largest file size to be consumed.\u00a0The only limitation on the data to be consumed by Flume is memory.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<h1><span class=\"ez-toc-section\" id=\"Data_Flow_Model\"><\/span>Data Flow Model<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>A Flume source consumes events delivered to it by an external source like a web server. When a Flume source receives an event, it stores it into one or more channels. The channel is a passive store that keeps the event until it\u2019s consumed by a Flume sink. The sink removes the event from the channel and puts it into an external repository like HDFS (via Flume HDFS sink) or forwards it to the Flume source of the next Flume agent (next hop) in the flow. The source and sink within the given agent run asynchronously with the events staged in the channel.<\/p>\n<p><a href=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/Flume-Flow-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"300\" height=\"107\" class=\" wp-image-1153 alignnone\" src=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/Flume-Flow-1-300x107.png\" alt=\"\" srcset=\"https:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/Flume-Flow-1-300x107.png 300w, https:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/Flume-Flow-1.png 620w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<h1><span class=\"ez-toc-section\" id=\"When_To_Use_Flume\"><\/span>When To Use Flume<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>If you need to ingest textual log data into Hadoop\/HDFS then Flume is the right fit for your problem, full stop. 
For other use cases, here are some guidelines:<\/p>\n<p>Flume is designed to transport and ingest regularly generated event data over relatively stable, potentially complex topologies. The notion of \u201cevent data\u201d is very broadly defined. To Flume, an event is just a generic blob of bytes. There are some limitations on how large an event can be &#8211; for instance, it cannot be larger than what you can store in memory or on disk on a single machine &#8211; but in practice, Flume events can be anything from textual log entries to image files. The key property of events is that they are generated in a continuous, streaming fashion. If your data is not regularly generated (for example, you are trying to do a single bulk load of data into a Hadoop cluster) then Flume will still work, but it is probably overkill for your situation.<\/p>\n<p>Flume likes relatively stable topologies. Your topologies do not need to be immutable, because Flume can deal with changes in topology without losing data and can also tolerate periodic reconfiguration due to fail-over or provisioning. It probably won\u2019t work well if you plan to change topologies every day, because reconfiguration takes some thought and overhead.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Flume collects and aggregates data from almost any source into a persistent store such as HDFS. 
Flume Configuration Install Flume Cloudera Manager distributes Flume in [&#8230;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"class_list":["post-1152","page","type-page","status-publish","hentry"],"jetpack_shortlink":"https:\/\/wp.me\/P1BQ8S-iA","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1152","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1152"}],"version-history":[{"count":2,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1152\/revisions"}],"predecessor-version":[{"id":1158,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1152\/revisions\/1158"}],"wp:attachment":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1152"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}