{"id":1127,"date":"2015-05-11T13:09:37","date_gmt":"2015-05-11T21:09:37","guid":{"rendered":"http:\/\/www.developerscloset.com\/?page_id=1127"},"modified":"2018-05-11T14:34:14","modified_gmt":"2018-05-11T22:34:14","slug":"impala","status":"publish","type":"page","link":"https:\/\/www.developerscloset.com\/?page_id=1127","title":{"rendered":"Impala"},"content":{"rendered":"<p><a href=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/impala.png\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-1128 alignnone\" src=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/impala-160x300.png\" alt=\"\" width=\"109\" height=\"204\" srcset=\"https:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/impala-160x300.png 160w, https:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/impala.png 182w\" sizes=\"auto, (max-width: 109px) 100vw, 109px\" \/><\/a><\/p>\n<p>Impala provides a real-time SQL query interface for data stored in HDFS and HBase. Impala requires the Hive service and shares the Hive Metastore with Hive and Hue. 
Impala also offers ODBC and JDBC connectors that external applications such as Tableau can use.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_79 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.developerscloset.com\/?page_id=1127\/#Configure_Impala\" >Configure Impala<\/a><ul class='ez-toc-list-level-2' ><li class='ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.developerscloset.com\/?page_id=1127\/#Install_Impala\" >Install Impala<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" 
href=\"https:\/\/www.developerscloset.com\/?page_id=1127\/#Impala_Configuration\" >Impala Configuration<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.developerscloset.com\/?page_id=1127\/#Administer_Impala\" >Administer Impala<\/a><ul class='ez-toc-list-level-2' ><li class='ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.developerscloset.com\/?page_id=1127\/#ODBC_Connector\" >ODBC Connector<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.developerscloset.com\/?page_id=1127\/#Impala_Query_Editor_Hue\" >Impala Query Editor (Hue)<\/a><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n<h1><span class=\"ez-toc-section\" id=\"Configure_Impala\"><\/span>Configure Impala<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2><span class=\"ez-toc-section\" id=\"Install_Impala\"><\/span>Install Impala<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Cloudera Manager deploys Impala as part of CDH and manages the following Impala roles:<\/p>\n<ul>\n<li><strong>Impala StateStore<\/strong>\u00a0\u2013 Tracks the location and status of all Impala Daemon\u00a0instances in the cluster. Run\u00a0<u>one<\/u>\u00a0instance of this daemon per cluster. Cloudera recommends running the StateStore on a\u00a0<u>separate<\/u>\u00a0server from the\u00a0Impala Daemons, preferably the server running the HDFS NameNode. Most production deployments run it on the NameNode server, often node #02.<\/li>\n<li><strong>Impala Catalog Server<\/strong>\u00a0\u2013 Cloudera recommends running the catalog server on the\u00a0<u>same<\/u>\u00a0host as the StateStore. 
The Impala component known as the catalog service relays metadata changes made by Impala SQL statements to all Impala Daemons in the cluster. The Impala Catalog Server\u00a0runs as a daemon process named catalogd, and you need only\u00a0one Impala Catalog Server in the cluster. Do not run the Impala Catalog Server on a host that also runs an Impala Daemon.<\/li>\n<li><strong>Impala Daemon<\/strong>\u00a0\u2013 The Impala Daemon plans and executes queries against HDFS and HBase data. Run one Impala Daemon on each\u00a0server in the cluster that hosts an HDFS\u00a0DataNode, but\u00a0<u>not<\/u>\u00a0on the node running the Impala StateStore. Also do\u00a0<u>not<\/u>\u00a0run an Impala Daemon on a server running an HDFS NameNode, because the combined memory use can be\u00a0too high. Impala Daemon memory use grows with the amount of data each query processes.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Impala_Configuration\"><\/span>Impala Configuration<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div style=\"max-width: 100%;margin: auto;overflow: hidden\">\n<div style=\"width: 100%;overflow: auto\">\n<table>\n<thead>\n<tr>\n<th>Configuration<\/th>\n<th>Description<\/th>\n<th>Small (&lt; 16 GB memory per node)<\/th>\n<th>Large (&gt; 16 GB memory per node)<\/th>\n<th>Calculation \/ Notes<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr role=\"row\">\n<td class=\"confluenceTd\"><strong>Impala Daemon Memory Limit<\/strong><br \/>\nmem_limit<\/td>\n<td class=\"confluenceTd\">Memory limit, in bytes, for the Impala Daemon, enforced by the daemon itself. If the limit is reached, queries running on that Impala Daemon may be killed. Leave it blank to let Impala pick its own limit. 
Use a value of -1 B to specify no limit.<\/td>\n<td class=\"confluenceTd\">256 MB<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">1 GB<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">For HDFS data, base the calculation on the number of HDFS blocks involved in Impala joins:<br \/>\n(block_count \/ 100,000) * 0.5 GB<\/td>\n<\/tr>\n<tr role=\"row\">\n<td class=\"confluenceTd\" colspan=\"1\"><strong>HBase RPC Timeout<\/strong><br \/>\nhbase.rpc.timeout<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Timeout, in seconds, for all HBase RPCs made by Impala. Overrides the setting in the HBase service.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">3 seconds<\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>9 seconds<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">On Azure we had to raise the timeout from 3 seconds to 9 seconds to tolerate Azure's higher network latency.<\/td>\n<\/tr>\n<tr role=\"row\">\n<td class=\"confluenceTd\" colspan=\"1\"><strong>Process Swap Memory Thresholds<\/strong><br \/>\nImpala Daemon Default Group<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">The health test thresholds for the swap memory usage of the process.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Critical:\u00a0<u>Never<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Critical:\u00a0<u>Never<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Swapping badly degrades Impala performance, but a warning is often enough, so the Critical threshold is set to Never.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p><em>More details:<\/em>\u00a0During join operations, portions of data from each joined table are loaded into memory. Data sets can be very large, so ensure your hardware has enough memory for the joins you expect to run.<\/p>\n<p><em>Even more details:<\/em>\u00a0While requirements vary with data set size, the following is generally recommended:<\/p>\n<ul>\n<li>Memory &#8211; 128 GB or more recommended, ideally 256 GB or more. 
If intermediate results during query processing on a particular node exceed the memory available to Impala on that node, the query writes temporary work data to disk, which can lead to long query times. Because the work is parallelized, and intermediate results for aggregate queries are typically smaller than the original data, Impala can query and join tables that are much larger than the memory available on any individual node.<\/li>\n<li>Storage &#8211; DataNodes with 12 or more disks each. Disk I\/O throughput is often the limiting factor for Impala performance. Ensure that you have enough disk space to store the data Impala will query.<\/li>\n<\/ul>\n<p><em>For even more details:\u00a0<\/em>Cluster Sizing Calculator:\u00a0<a class=\"external-link\" href=\"http:\/\/www.cloudera.com\/content\/cloudera\/en\/documentation\/cloudera-impala\/latest\/topics\/impala_cluster_sizing.html\" rel=\"nofollow\">http:\/\/www.cloudera.com\/content\/cloudera\/en\/documentation\/cloudera-impala\/latest\/topics\/impala_cluster_sizing.html<\/a><\/p>\n<h1><span class=\"ez-toc-section\" id=\"Administer_Impala\"><\/span>Administer Impala<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2><span class=\"ez-toc-section\" id=\"ODBC_Connector\"><\/span>ODBC Connector<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Connect to any Impala Daemon over ODBC. Version 2 and higher of the Impala ODBC driver connect to Impala on port 21050. 
For authentication,\u00a0Impala supports Kerberos with all supported versions of the driver; LDAP username\/password authentication requires version 2.05.13 or later of the ODBC driver for Impala.\u00a0Download the ODBC Connector:\u00a0<a href=\"https:\/\/www.cloudera.com\/downloads\/connectors\/impala\/odbc\/2-5-41.html\">https:\/\/www.cloudera.com\/downloads\/connectors\/impala\/odbc\/2-5-41.html<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Impala_Query_Editor_Hue\"><\/span>Impala Query Editor (Hue)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Hue\u00a0offers a stripped-down Impala query editor that\u00a0lets users browse databases, run queries, view EXPLAIN plans, and save scripts. While Hue&#8217;s query editor is limited, it can come in handy for a quick overview.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Impala provides a real-time SQL query interface for data stored in HDFS and HBase. Impala requires the Hive service and shares the Hive Metastore with Hive and Hue. 
[&#8230;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"class_list":["post-1127","page","type-page","status-publish","hentry"],"jetpack_shortlink":"https:\/\/wp.me\/P1BQ8S-ib","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1127","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1127"}],"version-history":[{"count":2,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1127\/revisions"}],"predecessor-version":[{"id":1132,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1127\/revisions\/1132"}],"wp:attachment":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1127"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}