{"id":1068,"date":"2018-03-06T14:35:10","date_gmt":"2018-03-06T22:35:10","guid":{"rendered":"http:\/\/www.developerscloset.com\/?p=1068"},"modified":"2018-03-06T15:16:57","modified_gmt":"2018-03-06T23:16:57","slug":"hbase-rowcount-script","status":"publish","type":"post","link":"https:\/\/www.developerscloset.com\/?p=1068","title":{"rendered":"HBase: RowCount Script"},"content":{"rendered":"<p>I wrote a quick script to count all rows in all tables in HBase. This works great for my Dev clusters that have ever-growing tables filled with clutter. The script launches a MapReduce RowCounter job against each HBase table, skipping any table whose name already appears in the list of running YARN applications. I have used this in Prod, but with mixed results: sometimes the HBase tables are too large for the MR jobs to finish within 24 hours.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n\r\n#!\/bin\/bash\r\n# Filename: rc-start-rowcount.sh\r\n# Description: start a row count for each table\r\n#\r\n# Example:\r\n# \/opt\/scripts\/rc-start-rowcount.sh\r\n\r\n# 1. check if the row count is already running\r\n# 2. if the row count is NOT running, then run a row count\r\n\r\ncd \/opt\/cloudera\/parcels\/CDH\/bin\r\n\r\nScriptDir=&quot;\/opt\/scripts\/&quot;;\r\nWorkingDir=&quot;\/opt\/scripts\/rc-work&quot;;\r\nTest=&quot;&quot;;\r\nListOfHBaseTables=&quot;rc-tables.txt&quot;;\r\nListOfRunningYarnJobs=&quot;rc-yarn-jobs.txt&quot;;\r\nScriptToRun=&quot;rc-script.sh&quot;;\r\nLogDir=&quot;\/var\/log\/scripts&quot;;\r\nLogFile=&quot;rc-start-rowcount.log&quot;;\r\n\r\necho &quot;`date`: Start&quot; &gt;&gt; $LogDir\/$LogFile;\r\n\r\n# exit early if rc-parse-rowcount.sh is still running\r\nStartTest=`ps ax|grep rc-parse-rowcount.sh|grep bash`\r\necho $StartTest\r\nif &#x5B;&#x5B; ! 
$StartTest == &quot;&quot; ]]; then\r\necho &quot;`date`: WARNING: rc-parse-rowcount.sh is running, exit&quot; &gt;&gt; $LogDir\/$LogFile;\r\necho $StartTest &gt;&gt; $LogDir\/$LogFile;\r\nexit;\r\nfi\r\n\r\n# create the script\r\necho &quot;#!\/bin\/bash&quot; &gt; $WorkingDir\/$ScriptToRun\r\n\r\necho 'list; quit;' | hbase shell &gt; $WorkingDir\/$ListOfHBaseTables\r\nsed -i '\/^$\/d' $WorkingDir\/$ListOfHBaseTables\r\nsed -i '$d' $WorkingDir\/$ListOfHBaseTables\r\n\r\n# get running applications from yarn\r\nyarn application -list &gt; $WorkingDir\/$ListOfRunningYarnJobs\r\n\r\nwhile read table; do\r\n#echo 'table:' $table\r\n# 1. check if row count is running\r\n# if Test is blank=NOT running, anything else=running\r\nTest=`grep $table $WorkingDir\/$ListOfRunningYarnJobs`;\r\nif &#x5B;&#x5B; $Test == &quot;&quot; ]]; then\r\n# 2. if the row count is NOT running, then run a row count\r\necho &quot;sleep 10;hbase org.apache.hadoop.hbase.mapreduce.RowCounter $table &gt; $WorkingDir\/$table.txt 2&gt;&amp;1 &amp;&quot; &gt;&gt; $WorkingDir\/$ScriptToRun 2&gt;&amp;1\r\n#echo 'run this table:' $table\r\necho &quot;`date`: Process Table: $table&quot; &gt;&gt; $LogDir\/$LogFile;\r\nfi\r\ndone &lt;$WorkingDir\/$ListOfHBaseTables\r\n\r\n# set the script to be executable\r\nchmod +x $WorkingDir\/$ScriptToRun\r\n\r\n# run the script that included all map reduce jobs\r\ncat $WorkingDir\/$ScriptToRun\r\n$WorkingDir\/$ScriptToRun\r\n\r\necho &quot;`date`: End&quot; &gt;&gt; $LogDir\/$LogFile;\r\n\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I wrote a quick script to count all rows in all tables in HBase. 
This works great for my Dev clusters that have ever-growing tables [&#8230;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[88,75,25],"tags":[],"class_list":["post-1068","post","type-post","status-publish","format-standard","hentry","category-bash","category-hbase","category-linux"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p1BQ8S-he","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/posts\/1068","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1068"}],"version-history":[{"count":1,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/posts\/1068\/revisions"}],"predecessor-version":[{"id":1069,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/posts\/1068\/revisions\/1069"}],"wp:attachment":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fme
dia&parent=1068"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1068"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1068"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}