{"id":1105,"date":"2017-05-11T09:16:28","date_gmt":"2017-05-11T17:16:28","guid":{"rendered":"http:\/\/www.developerscloset.com\/?page_id=1105"},"modified":"2018-06-12T13:29:35","modified_gmt":"2018-06-12T21:29:35","slug":"hive-2","status":"publish","type":"page","link":"https:\/\/www.developerscloset.com\/?page_id=1105","title":{"rendered":"Hive"},"content":{"rendered":"<p><a href=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/hive.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1103 alignnone\" src=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/hive.png\" alt=\"\" width=\"180\" height=\"158\" \/><\/a><\/p>\n<p>Hive is a data warehouse system that facilitates reading, writing, and managing large datasets residing in distributed storage. Structure can be projected onto data already in storage. Hive offers a SQL-like language called HiveQL. Hive also offers connectors for various external applications like Tableau.<\/p>\n<h1><span class=\"ez-toc-section\" id=\"Configure_Hive\"><\/span>Configure Hive<span 
class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2><span class=\"ez-toc-section\" id=\"Install_Hive\"><\/span>Install Hive<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Cloudera Manager distributes Hive in CDH and offers the following services:<\/p>\n<ul>\n<li><strong>Hive Metastore Server\u00a0<\/strong>(<u>required<\/u>) \u2013 The Hive Metastore service stores the metadata for Hive tables and partitions in a relational database, and provides clients (including Hive) access to this information via the Metastore service API. By default the Hive Metastore Server uses a Derby database. It is best to run the Hive Metastore Server on the same server as the database because large queries can cause performance issues if the database and the Hive Metastore Server are on separate servers. It is also\u00a0recommended to install the Hive Metastore Server on the\u00a0same node as an HDFS NameNode. Although Hive can be configured to use more than one Hive Metastore Server, Cloudera does not support having multiple Hive Metastore Servers as this may result in problems such as concurrency errors.<\/li>\n<li><strong>HiveServer2<\/strong>\u00a0(<u>required<\/u>\u00a0\u2013\u00a0<u>same node as the Hive Metastore Server<\/u>) &#8211; HiveServer2 (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results. HiveServer2 differs from HiveServer, and Beeline is the supported CLI to communicate with HiveServer2. HiveServer2 supports multiple clients making many simultaneous requests, which is an improvement over HiveServer.<\/li>\n<li><strong>Gateway<\/strong>\u00a0(<u>required<\/u>\u00a0\u2013 add a Hive Gateway to any host from which you will run the Hive CLI, such as an APP server and where the Hive Metastore Server is installed). 
If you are not sure, add the Hive Gateway to all APP servers and the server where you installed the Hive Metastore Server.<\/li>\n<li><strong>WebHCat Server<\/strong>\u00a0(<u>optional<\/u>) \u2013 can be installed onto any node. WebHCat is the REST API for HCatalog, a table and storage management layer for Hadoop. We do not use this service.<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Hive_Configuration\"><\/span>Hive Configuration<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div style=\"max-width: 100%;margin: auto;overflow: hidden\">\n<div style=\"width: 100%;overflow: auto\">\n<table>\n<tbody>\n<tr>\n<td><strong>Configuration<\/strong><\/td>\n<td><strong>Description<\/strong><\/td>\n<td><strong>Small\u00a0(8 GB Memory)<\/strong><\/td>\n<td><strong>Medium\u00a0(16 GB Memory)<\/strong><\/td>\n<td><strong>Large\u00a0(32 GB Memory)<\/strong><\/td>\n<td><strong>Calculation<\/strong><\/td>\n<\/tr>\n<tr>\n<td>Java Heap Size of HiveServer2<\/td>\n<td>Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx.<\/td>\n<td><u>256 MB<\/u><\/td>\n<td><u>2 GB<\/u><\/td>\n<td><u>4 GB<\/u><\/td>\n<td>We were able to reliably run against 70 GB files\u00a0on 256 MB of heap, but\u00a0when we went beyond we were often hit by out of memory errors. 
I&#8217;d recommend 2 GB of heap for each 1 TB of files you intend to query, but this is not a linear slope: heap demand is proportionally heavier at the lower end (around 100 GB of files) and grows more slowly as the file repository gets larger.<\/td>\n<\/tr>\n<tr>\n<td><strong>HiveServer2 Port<\/strong><\/p>\n<p>hive.server2.thrift.port<\/td>\n<td>Port on which HiveServer2 will listen for connections.<\/td>\n<td>10000<\/td>\n<td>10000<\/td>\n<td>10000<\/td>\n<td>Standard HiveServer2 port is 10000.<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"display-name\"><strong>Hive Metastore Connection Timeout<\/strong><\/p>\n<p class=\"property-name\">hive.metastore.client.socket.timeout<\/p>\n<\/td>\n<td>Timeout for requests to the Hive Metastore Server. Consider increasing this if you have tables with a lot of metadata and see timeout errors. Used by most Hive Metastore clients such as Hive CLI and HiveServer2, but not by Impala. Impala has a separately configured timeout.<\/td>\n<td>5 minutes<\/td>\n<td>5 minutes<\/td>\n<td><u>30 minutes<\/u><\/td>\n<td>Default = 5 minutes; use a larger value for bigger tables.<\/td>\n<\/tr>\n<tr>\n<td>Service Monitor Client Config Overrides<\/td>\n<td>For advanced use only, a list of configuration properties that will be used by the Service Monitor instead of the current client configuration for the service.<\/td>\n<td>60<\/td>\n<td>60<\/td>\n<td><u>&lt;property&gt;&lt;name&gt;hive.metastore.client.socket.timeout&lt;\/name&gt;&lt;value&gt;1800&lt;\/value&gt;&lt;\/property&gt;<\/u><\/td>\n<td>Default = 60 seconds. This configuration is set by Cloudera&#8217;s installer. We override the value to match<\/p>\n<p>hive.metastore.client.socket.timeout<\/p>\n<p>because the two values should match.<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"pre codeblock hljs\">hive.execution.engine<\/p>\n<\/td>\n<td>The default execution engine for running hive queries. 
Can be set to mr for MapReduce or spark for Spark.<\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td>Use this option to choose Hive-on-Spark or the default MapReduce engine.<\/p>\n<p class=\"p\">For Hive to work on Spark, you must deploy Spark gateway roles on the same machine that hosts HiveServer2. Otherwise, Hive on Spark cannot read Spark configurations and cannot submit Spark jobs.<\/p>\n<p class=\"p\">After installation, run <samp class=\"ph codeph\">set hive.execution.engine=spark;<\/samp> in Hive so that Spark is used as the back-end engine for all subsequent queries.<\/p>\n<p class=\"p\">Reference:\u00a0<a class=\"external-link\" href=\"https:\/\/www.cloudera.com\/documentation\/enterprise\/5-6-x\/topics\/admin_hos_config.html#hos_config\" rel=\"nofollow\">https:\/\/www.cloudera.com\/documentation\/enterprise\/5-6-x\/topics\/admin_hos_config.html#hos_config<\/a><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>hive.stats.fetch.column.stats<\/td>\n<td>Whether column stats for a table are fetched during explain.<\/td>\n<td>true<\/td>\n<td>true<\/td>\n<td>true<\/td>\n<td>For improved performance, Cloudera recommends that you configure this property for Hive.<\/td>\n<\/tr>\n<tr>\n<td>hive.optimize.index.filter<\/td>\n<td>Whether to use the indexing optimization for all queries.<\/td>\n<td>true<\/td>\n<td>true<\/td>\n<td>true<\/td>\n<td>For improved performance, Cloudera recommends that you configure this property for Hive.<\/td>\n<\/tr>\n<tr>\n<td>\n<p class=\"display-name\"><strong>Spark Executor Maximum Java Heap Size<\/strong><\/p>\n<p class=\"display-name\">spark.executor.memory<\/p>\n<\/td>\n<td>\n<p class=\"p\">Maximum size of each Spark executor&#8217;s Java heap memory when Hive is running on Spark.<\/p>\n<\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td>\n<ul>\n<li class=\"li\">Compute a memory size equal to\u00a0<samp class=\"ph codeph\">yarn.nodemanager.resource.memory-mb * (spark.executor.cores \/ yarn.nodemanager.resource.cpu-vcores)<\/samp>\u00a0and then split that between\u00a0<samp class=\"ph 
codeph\">spark.executor.memory<\/samp>\u00a0and\u00a0<samp class=\"ph codeph\">spark.yarn.executor.memoryOverhead<\/samp>.<\/li>\n<li class=\"li\"><samp class=\"ph codeph\">spark.yarn.executor.memoryOverhead<\/samp>\u00a0is 15-20% of the total memory size.<\/li>\n<\/ul>\n<p class=\"p\">For general Spark configuration recommendations, see\u00a0<a class=\"external-link\" href=\"https:\/\/www.cloudera.com\/documentation\/enterprise\/5-6-x\/topics\/cdh_ig_running_spark_on_yarn.html#spark_on_yarn_config_apps\" rel=\"nofollow\">Configuring Spark on YARN Applications<\/a>.<\/p>\n<p class=\"p\">Executor memory size can have a number of effects on Hive. Increasing executor memory increases the number of queries for which Hive can enable mapjoin optimization. However, if there&#8217;s too much executor memory, it takes longer to perform garbage collection. Also, some experiments show that HDFS doesn\u2019t handle concurrent writers well, so there is a risk of a race condition if there are too many executor cores.<\/p>\n<p class=\"p\">Cloudera recommends that you set the value for\u00a0<samp class=\"ph codeph\">spark.executor.cores<\/samp>\u00a0to\u00a0<samp class=\"ph codeph\">5<\/samp>,\u00a0<samp class=\"ph codeph\">6<\/samp>, or\u00a0<samp class=\"ph codeph\">7<\/samp>, depending on what the host&#8217;s core count is divisible by. For example, if\u00a0<samp class=\"ph codeph\">yarn.nodemanager.resource.cpu-vcores<\/samp>\u00a0is\u00a0<samp class=\"ph codeph\">19<\/samp>, then you would set the value to\u00a0<samp class=\"ph codeph\">6<\/samp>. Executors must have the same number of cores. If you set the value to\u00a0<samp class=\"ph codeph\">5<\/samp>, three executors with 5 cores each can be launched, leaving four cores unused. If you set the value to\u00a0<samp class=\"ph codeph\">7<\/samp>, only two executors are used, and five cores are unused. 
If the number of cores is 20, set the value to\u00a0<samp class=\"ph codeph\">5<\/samp>\u00a0so that four executors are launched with five cores each, and no cores are unused.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td><strong>Spark Executor Memory Overhead<\/strong><\/p>\n<p>spark.yarn.executor.memoryOverhead<\/td>\n<td>This is the amount of extra off-heap memory that can be requested from YARN, per executor process. This, together with spark.executor.memory, is the total memory that YARN can use to create a JVM for an executor process.<\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td>See\u00a0spark.executor.memory for more details.<\/td>\n<\/tr>\n<tr>\n<td>hive.stats.collect.rawdatasize<\/td>\n<td>Not found in Cloudera&#8217;s configuration.<\/td>\n<td>true<\/td>\n<td>true<\/td>\n<td>true<\/td>\n<td>Hive on Spark uses statistics to determine the threshold for converting common join to map join. There are two types of statistics about the size of data:<\/p>\n<ul class=\"ul\">\n<li class=\"li\"><samp class=\"ph codeph\">totalSize<\/samp>: The approximate size of data on the disk<\/li>\n<li class=\"li\"><samp class=\"ph codeph\">rawDataSize<\/samp>: The approximate size of data in memory<\/li>\n<\/ul>\n<p>When both metrics are available, Hive chooses\u00a0<samp class=\"ph codeph\">rawDataSize<\/samp>.<\/td>\n<\/tr>\n<tr>\n<td><strong>Hive Auto Convert Join Noconditional Size<\/strong><\/p>\n<p>hive.auto.convert.join.noconditionaltask.size<\/td>\n<td>If Hive auto convert join is on, and the sum of the size for n-1 of the tables\/partitions for an n-way join is smaller than the specified size, the join is directly converted to a MapJoin (there is no conditional task).<\/td>\n<td>20 MB<\/td>\n<td>20 MB<\/td>\n<td><u>256 MB<\/u><\/td>\n<td>The threshold for the sum of all the small table sizes (by default,\u00a0<samp class=\"ph codeph\">rawDataSize<\/samp>), for map join conversion. You can increase the value if you want better performance by converting more common joins to map joins. 
However, if you set this value too high, tasks may fail because too much memory is being used by data from small tables.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<h2><span class=\"ez-toc-section\" id=\"Table_Lock_Manager\"><\/span>Table Lock Manager<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Make the following configuration changes before using HiveServer2. Concurrency support is a must in databases, and Hive&#8217;s Table Lock Manager is what allows concurrent readers and writers. All locks are acquired implicitly, so a mechanism for discovering which locks are currently held is also useful. Failing to enable the Table Lock Manager may result in unpredictable behavior. To enable Hive&#8217;s Table Lock Manager:<\/p>\n<ol>\n<li>Open Cloudera Manager\u2019s UI, browse to Services &gt; Hive, and click Configuration.<\/li>\n<li>Expand HiveServer2 Default Group, click on Advanced, and add the following to the\u00a0HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml\u00a0(substitute the Zookeeper servers in the hive.zookeeper.quorum &lt;value&gt;):<\/li>\n<\/ol>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n&lt;property&gt;\r\n  &lt;name&gt;hive.support.concurrency&lt;\/name&gt;\r\n  &lt;description&gt;Enable Hive's Table Lock Manager Service&lt;\/description&gt;\r\n  &lt;value&gt;true&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n&lt;property&gt;\r\n  &lt;name&gt;hive.zookeeper.quorum&lt;\/name&gt;\r\n  &lt;description&gt;Zookeeper quorum used by Hive's Table Lock Manager&lt;\/description&gt;\r\n  &lt;value&gt;zk.servername12,zk.servername13&lt;\/value&gt;\r\n&lt;\/property&gt;\r\n<\/pre>\n<h2><span class=\"ez-toc-section\" id=\"Long_Live_and_Process_LLAP\"><\/span>Live Long and Process (LLAP)<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Live Long and Process (LLAP) functionality was added in Hive 2.0 (<a 
class=\"external-link\" href=\"https:\/\/issues.apache.org\/jira\/browse\/HIVE-7926\" rel=\"nofollow\">HIVE-7926<\/a>\u00a0and associated tasks).\u00a0<a class=\"external-link\" href=\"https:\/\/issues.apache.org\/jira\/browse\/HIVE-9850\" rel=\"nofollow\">HIVE-9850<\/a>\u00a0links to documentation, features, and issues for this enhancement.<\/p>\n<p>For configuration of LLAP, see the LLAP section of\u00a0<a class=\"external-link\" href=\"https:\/\/cwiki.apache.org\/confluence\/display\/Hive\/Configuration+Properties\" rel=\"nofollow\">Configuration Properties<\/a>.<\/p>\n<p><a class=\"external-link\" href=\"https:\/\/cwiki.apache.org\/confluence\/display\/Hive\/LLAP\" rel=\"nofollow\">https:\/\/cwiki.apache.org\/confluence\/display\/Hive\/LLAP<\/a><\/p>\n<p><a class=\"external-link\" href=\"http:\/\/hortonworks.com\/blog\/stinger-next-enterprise-sql-hadoop-scale-apache-hive\/\" rel=\"nofollow\">http:\/\/hortonworks.com\/blog\/stinger-next-enterprise-sql-hadoop-scale-apache-hive\/<\/a><\/p>\n<p>Test build:\u00a0<a class=\"external-link\" href=\"http:\/\/www.lewuathe.com\/blog\/2015\/08\/12\/try-hive-llap\/\" rel=\"nofollow\">http:\/\/www.lewuathe.com\/blog\/2015\/08\/12\/try-hive-llap\/<\/a><\/p>\n<h1><span class=\"ez-toc-section\" id=\"Administer_Hive\"><\/span>Administer Hive<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2 id=\"Hive-TestHive\"><span class=\"ez-toc-section\" id=\"Test_Hive\"><\/span>Test Hive<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Enter the Hive shell:<\/p>\n<p>hive shell<\/p>\n<p>show databases;<\/p>\n<p>Debug Hive using the following commands:<\/p>\n<p>hive -hiveconf hive.root.logger=ALL,console<\/p>\n<p>show databases;<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Query_Examples\"><\/span>Query Examples<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Query a Hive table and output the results to an HDFS folder:<\/p>\n<p>set mapred.job.name = operations-report-file;<br \/>\nINSERT OVERWRITE 
DIRECTORY '\/path\/to\/hdfs\/folder\/'<br \/>\nROW FORMAT DELIMITED FIELDS TERMINATED BY ','<br \/>\nselect<br \/>\nTO_DATE(from_unixtime(UNIX_TIMESTAMP(split(rowkey,':')[3], 'yyyy-MM-dd'))) AS datetime<br \/>\n, count(*) AS rowcount<br \/>\nFROM default.tablename<br \/>\nWHERE TO_DATE(from_unixtime(UNIX_TIMESTAMP(split(rowkey,':')[3], 'yyyy-MM-dd'))) &gt; date_sub(FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd'), 14)<br \/>\nGROUP BY split(rowkey,':')[3]<br \/>\nORDER BY datetime;<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Split_Function\"><\/span>Split Function<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>select split(rowkey,':')[3] from tablename limit 10;<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Create_Table\"><\/span>Create Table<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Create a Hive table using a file stored in HDFS:<\/p>\n<p>CREATE TABLE default.operations_report_file (<br \/>\n`date` DATE,<br \/>\nrowcount INT<br \/>\n) ROW FORMAT<br \/>\nDELIMITED FIELDS TERMINATED BY '\\t'<br \/>\nLINES TERMINATED BY '\\n' STORED AS TEXTFILE;<br \/>\nLOAD DATA INPATH '\/ops\/operations-report-file.tsv' INTO TABLE default.operations_report_file;<\/p>\n<p>Create a Hive table using a local file:<\/p>\n<p>CREATE DATABASE test;<br \/>\nCREATE EXTERNAL TABLE test.hivetest1<br \/>\n(<br \/>\nCustomerNumber INT,<br \/>\nDepartmentNumber INT,<br \/>\nCategoryNumber INT,<br \/>\nVisitTime INT<br \/>\n)<br \/>\nROW FORMAT DELIMITED FIELDS TERMINATED BY ',';<br \/>\nLOAD DATA LOCAL INPATH 'customers.csv'<br \/>\nOVERWRITE INTO TABLE test.hivetest1;<\/p>\n<p>Create a Hive table against an HBase table:<\/p>\n<p>CREATE EXTERNAL TABLE hbase_test6c( rowKey string,<br \/>\nac1 string,<br \/>\nac2 string,<br \/>\nac3 string,<br \/>\nac4 string,<br \/>\nac5 string,<br \/>\nac6 string<br \/>\n)<br \/>\nSTORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'<br \/>\nWITH SERDEPROPERTIES (\"hbase.columns.mapping\" = \":key,cf:c1,cf:c2,cf:c3,cf:c4,cf:c5,cf:c6\")<br \/>\nTBLPROPERTIES(\"hbase.table.name\" = \"test6c\");<\/p>\n<p>Additional example against an HBase table:<\/p>\n<p>create external table if not exists hbasepersontable (id INT, firstname STRING, lastname STRING )<br \/>\nstored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'<br \/>\nwith serdeproperties ('hbase.columns.mapping' = ':key, firstname:firstname, lastname:lastname')<br \/>\ntblproperties ('hbase.table.name' = 'PERSON');<\/p>\n<h2 id=\"Hive-ScriptingAgainstHive\"><span class=\"ez-toc-section\" id=\"Scripting_Against_Hive\"><\/span>Scripting Against Hive<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Return results from Hive using bash:<\/p>\n<p># the -S flag silences the connection messages<br \/>\n# use -e to pass a command<br \/>\nmy_value=`hive -S -e 'select count(*) from salestrigger.external_triggered_stores;'|tail -n1`;<br \/>\necho $my_value;<\/p>\n<h2 id=\"Hive-ODBCConnector\"><span class=\"ez-toc-section\" id=\"ODBC_Connector\"><\/span>ODBC Connector<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>I found a useful ODBC Connector to connect to Hive&#8217;s HiveServer2:\u00a0<a class=\"external-link\" href=\"http:\/\/kb.tableau.com\/articles\/knowledgebase\/hadoop-hive-connection\" 
rel=\"nofollow\">http:\/\/kb.tableau.com\/articles\/knowledgebase\/hadoop-hive-connection<\/a><\/p>\n<h1 id=\"Hive-Troubleshooting\"><span class=\"ez-toc-section\" id=\"Troubleshooting\"><\/span>Troubleshooting<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2 id=\"Hive-Hiveshelldebugmode\"><span class=\"ez-toc-section\" id=\"Hive_shell_debug_mode\"><\/span>Hive shell debug mode<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>You can enter a debug mode in the hive shell with the following command:<\/p>\n<blockquote><p>hive -hiveconf hive.root.logger=DEBUG,console<\/p><\/blockquote>\n<h2 id=\"Hive-HiveMetastoreServerswillnotstart-HiveMetastorecanaryfailedtocreateadatabase\"><span class=\"ez-toc-section\" id=\"Hive_Metastore_Servers_will_not_start_%E2%80%93_Hive_Metastore_canary_failed_to_create_a_database\"><\/span>Hive Metastore Servers will not start &#8211; Hive Metastore canary failed to create a database<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The Hive Server will not connect to the Hive Metastore, and Cloudera Manager reports that the canary failed to create a database. In the Hive Metastore logs you see the following error: Metastore Thrift Server threw an exception&#8230;<\/p>\n<p>MetaException(message:Hive Schema version 0.13.0 does not match metastore&#8217;s schema version 0.12.0 Metastore is not upgraded or corrupt)<\/p>\n<p><strong>Resolution:\u00a0<\/strong>You will need to update the database schema: an upgrade to the Hive Server likely did not reach the metastore database, or the database has become corrupt. Stop the Hive service; then, in Cloudera Manager, within the Hive Metastore instance, click Actions and select Update Database Schema. Without Cloudera Manager you&#8217;ll find the update-database-schema script in the bin folder. 
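<\/p>\n<p>As a sketch of the same upgrade with Hive&#8217;s schematool utility (the dbType and starting version below are illustrative assumptions, inferred from the PostgreSQL error and the schema versions shown in this log):<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n# preview the upgrade statements first, then apply them\r\nschematool -dbType postgres -upgradeSchemaFrom 0.12.0 -dryRun\r\nschematool -dbType postgres -upgradeSchemaFrom 0.12.0\r\n<\/pre>\n<p>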
After the schema has successfully been updated, start Hive Metastore and Hive Server.<\/p>\n<p><strong>From the Hive Metastore log:<\/strong><\/p>\n<p>8:26:23.133 AM ERROR\u00a0 org.apache.hadoop.hive.metastore.HiveMetaStore<\/p>\n<p>Metastore Thrift Server threw an exception&#8230;<\/p>\n<p>MetaException(message:Hive Schema version 0.13.0 does not match metastore&#8217;s schema version 0.12.0 Metastore is not upgraded or corrupt)<\/p>\n<p>at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6311)<\/p>\n<p>at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6282)<\/p>\n<p>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)<\/p>\n<p>&#8230;<\/p>\n<p>8:26:22.774 AM ERROR\u00a0 org.apache.hadoop.hive.metastore.MetaStoreDirectSql<\/p>\n<p>Database initialization failed; direct SQL is disabled<\/p>\n<p>javax.jdo.JDOException: Exception thrown when executing query<\/p>\n<p>at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)<\/p>\n<p>at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:230)<\/p>\n<p>at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.&lt;init&gt;(MetaStoreDirectSql.java:119)<\/p>\n<p>&#8230;<\/p>\n<p>org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:224)<\/p>\n<p>at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)<\/p>\n<p>at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)<\/p>\n<p>at org.apache.hadoop.hive.metastore.RawStoreProxy.&lt;init&gt;(RawStoreProxy.java:58)<\/p>\n<p>&#8230;<\/p>\n<p>NestedThrowablesStackTrace:<\/p>\n<p>org.postgresql.util.PSQLException:\u00a0ERROR: column A0.OWNER_NAME does not exist<\/p>\n<p>Position: 122<\/p>\n<p>at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)<\/p>\n<p>at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)<\/p>\n<p>at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)<\/p>\n<p>&#8230;<\/p>\n<p><strong>From the Hive Server log:<\/strong><\/p>\n<p>8:22:20.788 PM ERROR\u00a0 org.apache.hive.service.CompositeService<\/p>\n<p>Error starting services HiveServer2<\/p>\n<p>org.apache.hive.service.ServiceException: Unable to connect to MetaStore!<\/p>\n<p>at org.apache.hive.service.cli.CLIService.start(CLIService.java:140)<\/p>\n<p>at org.apache.hive.service.CompositeService.start(CompositeService.java:70)<\/p>\n<p>at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:73)<\/p>\n<p>&#8230;<\/p>\n<p>Caused by: MetaException(message:Got exception: org.apache.thrift.transport.TTransportException\u00a0java.net.SocketException: Broken pipe)<\/p>\n<p>at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1114)<\/p>\n<p>at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:839)<\/p>\n<p>at org.apache.hive.service.cli.CLIService.start(CLIService.java:138)<\/p>\n<p>&#8230; 9 more<\/p>\n<h2 id=\"Hive-OutofMemoryError-WithinHiveBeforebeingsubmittedtoYarn\"><span class=\"ez-toc-section\" id=\"Out_of_Memory_Error_%E2%80%93_Within_Hive_Before_being_submitted_to_Yarn\"><\/span>Out of Memory Error &#8211; Within Hive Before being submitted to Yarn<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><strong>Resolution:<\/strong><\/p>\n<p>It appears that the Hive query was not being submitted to Yarn. That points to an out of memory error from the HiveServer2. 
Increase memory under: Java Heap Size of HiveServer2<\/p>\n<p><strong>Error:<\/strong><\/p>\n<p>vi \/var\/log\/hive\/hadoop-cmf-hive-HIVESERVER2-servername12.log.out<\/p>\n<p>2015-05-07 10:30:09,925 WARN org.apache.hive.service.cli.thrift.ThriftCLIService: Error executing statement:<\/p>\n<p>java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space<\/p>\n<p>at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:84)<\/p>\n<p>at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)<\/p>\n<p>at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)<\/p>\n<p>&#8230;<\/p>\n<p>Caused by: java.lang.OutOfMemoryError: Java heap space<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hive is a data warehouse system that facilitates reading, writing, and managing large datasets residing in distributed storage. Structure can be projected onto data already [&#8230;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"class_list":["post-1105","page","type-page","status-publish","hentry"],"jetpack_shortlink":"https:\/\/wp.me\/P1BQ8S-hP","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1105","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1105"}],"version-history":[{"count":5,"href":"https:\/\/www.developerscloset.com\
/index.php?rest_route=\/wp\/v2\/pages\/1105\/revisions"}],"predecessor-version":[{"id":1416,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1105\/revisions\/1416"}],"wp:attachment":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1105"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}