{"id":1197,"date":"2015-07-13T08:38:33","date_gmt":"2015-07-13T16:38:33","guid":{"rendered":"http:\/\/www.developerscloset.com\/?page_id=1197"},"modified":"2018-08-14T14:29:34","modified_gmt":"2018-08-14T22:29:34","slug":"hdfs","status":"publish","type":"page","link":"https:\/\/www.developerscloset.com\/?page_id=1197","title":{"rendered":"HDFS"},"content":{"rendered":"<p><a href=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/06\/hdfs-logo.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1198 alignnone\" src=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/06\/hdfs-logo.jpg\" alt=\"\" width=\"211\" height=\"118\" \/><\/a><\/p>\n<p>Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute hosts throughout a cluster to enable reliable, extremely rapid computations.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_79 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-69ea210602210\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 
.5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-69ea210602210\"  aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Configure_Hdfs\" >Configure Hdfs<\/a><ul class='ez-toc-list-level-2' ><li class='ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Install_HDFS\" >Install HDFS<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_Configuration\" >HDFS Configuration<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Administer_HDFS\" >Administer HDFS<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Configure_HDFS_High_Availability\" >Configure HDFS High Availability<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Enabling_High_Availability_and_Automatic_Failover\" >Enabling High Availability and Automatic Failover<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Manually_Failing_Over_to_the_Standby_NameNode\" >Manually Failing Over to the Standby NameNode<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" 
href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Configure_Uber_in_HDFS\" >Configure Uber in HDFS<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#NFS_Gateway\" >NFS Gateway<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_Commands\" >HDFS Commands<\/a><ul class='ez-toc-list-level-2' ><li class='ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#File_System_Check\" >File System Check<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#List_Corrupt_Blocks\" >List Corrupt Blocks<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#DataNode_Report\" >DataNode Report<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Create_a_File_in_HDFS_using_touchz\" >Create a File in HDFS using touchz<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Move_a_File_from_Windows_to_HDFS\" >Move a File from Windows to HDFS<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#List_Folder_Structure\" >List Folder Structure<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-17\" 
href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Get_a_summary_of_the_file_sizes\" >Get a summary of the file sizes<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-18\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Copy_data_from_local_disk_to_HDFS\" >Copy data from local disk to HDFS<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-19\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Copy_data_from_HDFS_to_local\" >Copy data from HDFS to local<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-20\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Delete_Folder_Recursively\" >Delete Folder Recursively<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-21\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Set_Folder_Permissions\" >Set Folder Permissions<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-22\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Create_User_Home_Directory\" >Create User Home Directory<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-23\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#WebHDFS_and_HttpFS_API\" >WebHDFS and HttpFS API<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-24\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Change_Replication_Factor\" >Change Replication Factor<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-25\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Copy_data_from_one_Hadoop_cluster_to_another_Hadoop_cluster_using_DistCp\" >Copy data from one Hadoop cluster to another Hadoop cluster using DistCp<\/a><\/li><\/ul><\/li><li 
class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-26\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Recover_Under-Replicated_Missing_or_Corrupt_Blocks\" >Recover Under-Replicated, Missing, or Corrupt Blocks<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-27\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Checkpointing_in_HDFS\" >Checkpointing in HDFS<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-28\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_Balancer\" >HDFS Balancer<\/a><ul class='ez-toc-list-level-2' ><li class='ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-29\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Start_the_Balancer\" >Start the Balancer<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-30\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Stop_the_Balancer\" >Stop the Balancer<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-1'><a class=\"ez-toc-link ez-toc-heading-31\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Troubleshooting\" >Troubleshooting<\/a><ul class='ez-toc-list-level-2' ><li class='ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-32\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Manually_Start_HDFS\" >Manually Start HDFS<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-33\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_is_in_Safe_Mode\" >HDFS is in Safe Mode<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-34\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_Checkpoint_Age_has_become_bad\" >HDFS Checkpoint Age has become bad<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a 
class=\"ez-toc-link ez-toc-heading-35\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_Started_with_Bad_Health_Address_is_already_in_use\" >HDFS Started with Bad Health: Address is already in use<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-36\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_NameNode_Stopped_HDFS_Service_is_Down\" >HDFS NameNode Stopped: HDFS Service is Down<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-37\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_Error_Canary_test_failed_%E2%80%93_Permission_denied_error13\" >HDFS Error: Canary test failed \u2013 Permission denied: error=13<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-38\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_JournalNode_FileNotFoundException_No_such_file_or_directory\" >HDFS JournalNode:\u00a0FileNotFoundException:\u00a0No such file or directory<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-39\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_JournalNode_JournalNotFormattedException_Journal_Storage_Directory_not_formatted\" >HDFS JournalNode:\u00a0JournalNotFormattedException:\u00a0Journal Storage Directory * not formatted<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-40\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#Azure_Dev\" >Azure Dev<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-41\" href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_Under_Replicated_Blocks\" >HDFS Under Replicated Blocks<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-42\" 
href=\"https:\/\/www.developerscloset.com\/?page_id=1197\/#HDFS_Missing_Blocks\" >HDFS Missing Blocks<\/a><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n<h1 id=\"HDFS-ConfigureHdfs\"><span class=\"ez-toc-section\" id=\"Configure_Hdfs\"><\/span>Configure Hdfs<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2 id=\"HDFS-InstallHDFS\"><span class=\"ez-toc-section\" id=\"Install_HDFS\"><\/span>Install HDFS<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>HDFS is distributed by CDH and requires the following services:<\/p>\n<ul>\n<li><strong>NameNode<\/strong>\u00a0The NameNode contains a global map of where the data is stored, when DataNodes start they present a list of the blocks they are hosting to the NameNode. Do not place the NameNode on the same server as the MapReduce JobTracker or YARN ResourceManager as the memory needed to run these services is high.<\/li>\n<li><strong>SecondaryNameNode<\/strong>\u00a0(not on the same hosts as the NameNode, and not required if HDFS HA is configured). The SecondaryNameNode contains information necessary to rebuild the NameNode. The SecondaryNameNode is\u00a0NOT a backup NameNode as it only contains configurations.<\/li>\n<li><strong>JournalNode<\/strong>\u00a0\u2013 Used for HDFS HA \u2013 there must be at least three JournalNode daemons, since edit log modifications must be written to a majority of JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons can reasonably be collocated on machines with other Hadoop services. 
You can also run more than three JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JournalNodes.<\/li>\n<li><strong>Failover Controller<\/strong>\u00a0&#8211; Used for HDFS HA &#8211; there\u00a0is one Failover Controller for each NameNode;\u00a0the FC\u00a0MUST be placed on the same host as\u00a0its NameNode.\u00a0ZKFC uses the ZooKeeper Service for coordination in determining which is the Active NameNode and in determining when to fail over to the Standby NameNode.<\/li>\n<li><strong>Balancer<\/strong>\u00a0&#8211; required service &#8211; (it is OK to have the Balancer on the same host as the NameNode). The Balancer is a service that redistributes blocks among DataNodes to keep the cluster storage equal across all volumes. It is helpful to run the Balancer after a large amount of data has been deleted or after a new DataNode has been added.<\/li>\n<li><strong>DataNodes<\/strong>\u00a0&#8211; The DataNode can share a host with the HBase RegionServer and MapReduce TaskTracker or YARN NodeManager. For data locality, colocate the HBase RegionServer with a DataNode.<\/li>\n<li><strong>Gateway<\/strong>\u00a0&#8211; The Gateway stores configuration information about HDFS, including the network topology. Install a Gateway on all APP servers.<\/li>\n<li><strong>HttpFS<\/strong>\u00a0&#8211; HttpFS is a service that provides HTTP access to HDFS. HttpFS has a REST HTTP API supporting all HDFS File System operations (both read and write).<\/li>\n<li><strong>NFS Gateway<\/strong>\u00a0&#8211;\u00a0The NFS Gateway allows HDFS to be mounted as a local disk. The NFS Gateway server can be any host in the cluster, including the NameNode, a DataNode, or any HDFS client.
The client can be any NFSv3-client-compatible machine\u00a0(NOT supported on an Ubuntu client).<\/li>\n<\/ul>\n<h2 id=\"HDFS-HDFSConfiguration\"><span class=\"ez-toc-section\" id=\"HDFS_Configuration\"><\/span>HDFS Configuration<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div style=\"max-width: 100%;margin: auto;overflow: hidden\">\n<div style=\"width: 100%;overflow: auto\">\n<table class=\"wrapped confluenceTable\">\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/><\/colgroup>\n<tbody>\n<tr>\n<td class=\"confluenceTd\"><strong>Configuration<\/strong><\/td>\n<td class=\"confluenceTd\"><strong>Description<\/strong><\/td>\n<td class=\"confluenceTd\"><strong>Small (8 GB Memory)<\/strong><\/td>\n<td class=\"confluenceTd\"><strong>Medium (16 GB Memory)<\/strong><\/td>\n<td class=\"confluenceTd\"><strong>Large (28 GB Memory)<\/strong><\/td>\n<td class=\"confluenceTd\"><strong>Calculation<\/strong><\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\"><strong>DataNode Data Directory<\/strong><\/p>\n<p>dfs.data.dir, dfs.datanode.data.dir<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Comma-delimited list of directories on the local file system where the DataNode stores HDFS block data.<\/p>\n<p><em>Warning: be very careful when modifying this property. Removing or changing entries can result in data loss. 
If you want to hot swap drives, override the value of this property for the specific DataNode role instance whose drive is to be hot-swapped; do not modify the property value in the role group.<\/em><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">\/space1\/dfs\/dn<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">\/space1\/dfs\/dn, \/space2\/dfs\/dn<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">\/space1\/dfs\/dn, \/space2\/dfs\/dn<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Typical values are \/spaceN\/dfs\/dn for N = 1, 2, 3&#8230;<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\"><strong>NameNode Data Directories<\/strong><\/p>\n<p>dfs.name.dir, dfs.namenode.name.dir<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Determines where on the local file system the NameNode should store the name table (fsimage). For redundancy, enter a comma-delimited list of directories to replicate the name table in all of the directories.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">\/space1\/dfs\/nn<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">\/space1\/dfs\/nn, \/space2\/dfs\/nn<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">\/space1\/dfs\/nn, \/space2\/dfs\/nn<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Typical values are \/spaceN\/dfs\/nn where N=1..3.<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\"><strong>HDFS Checkpoint Directories<\/strong><\/p>\n<p>fs.checkpoint.dir, dfs.namenode.checkpoint.dir<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Determines where on the local file system the DFS SecondaryNameNode should store the temporary images to merge. 
For redundancy, enter a comma-delimited list of directories to replicate the image in all of the directories.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">\/space1\/dfs\/snn<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">\/space1\/dfs\/snn<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">\/space1\/dfs\/snn<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Typical values are \/spaceN\/dfs\/snn for N = 1, 2, 3&#8230;<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\">Java Heap Size of NameNode in Bytes<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>1 GB<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>1 GB<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>4\u00a0GB<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">HDFS NameNode memory rule of thumb: 1 GB of heap per 1,000,000 blocks, or on really slow disk, 1 GB per 100,000 blocks. In cases of slow disk (Azure), bump this up to compensate.<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\">Java Heap Size of DataNode in Bytes<\/td>\n<td class=\"confluenceTd\">Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx.<\/td>\n<td class=\"confluenceTd\"><u>768 MB<\/u><\/td>\n<td class=\"confluenceTd\"><u>1\u00a0GB<\/u><\/td>\n<td class=\"confluenceTd\"><u>2\u00a0GB<\/u><\/td>\n<td class=\"confluenceTd\">I have not seen a need for a heap size larger than 1 GB for 500,000 blocks.<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\">Java Heap Size of Secondary NameNode in Bytes<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Maximum size in bytes for the Java Process heap memory.
Passed to Java -Xmx.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>1\u00a0GB<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>1\u00a0GB<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>4\u00a0GB<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">On single-node developer servers, the SecondaryNameNode memory can be reduced. On larger clusters, keep the memory high enough to manage the edits (half the NameNode heap). Not used in HDFS HA.<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\">Reserved Space for Non DFS Use<\/td>\n<td class=\"confluenceTd\">Reserved space in bytes per volume for non Distributed File System (DFS) use.<\/td>\n<td class=\"confluenceTd\"><u>10 GB<\/u><\/td>\n<td class=\"confluenceTd\"><u>10 GB (for a single node VM)<br \/>\n<\/u><u>100 GB<\/u><\/td>\n<td class=\"confluenceTd\"><u>100 GB<br \/>\n<\/u><\/td>\n<td class=\"confluenceTd\">Each DataNode that also runs a MapReduce TaskTracker or YARN NodeManager must have the Reserved Space for Non DFS Use increased. The MapReduce services write too much intermediate data to local disk for the default to be safe.<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\"><strong>DataNode Failed Volumes Tolerated<\/strong><\/p>\n<p>dfs.datanode.failed.volumes.tolerated<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">The number of volumes that are allowed to fail before a DataNode stops offering service.
By default, any volume failure will cause a DataNode to shut down.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>0<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>0<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>0<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Protect HDFS from failed volumes (or what HDFS incorrectly assumes is a failed volume, like Azure shutting down a VM by first shutting down the volumes).<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\"><strong>DataNode Volume Choosing Policy<\/strong><\/p>\n<p>dfs.datanode.fsdataset.volume.choosing.policy<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">DataNode policy for picking which volume should get a new block.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>Available Space<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>Available Space\u00a0<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>Available Space\u00a0<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">By default, a DataNode writes new block replicas to disk volumes solely on a round-robin basis. Change this to the Available Space volume-choosing policy, which causes the DataNode to take into account how much space is available on each volume when deciding where to place a new replica.<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\"><strong>DataNode Data Directory Permissions<\/strong><\/p>\n<p>dfs.datanode.data.dir.perm<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Permissions for the directories on the local file system where the DataNode stores its blocks. The permissions must be octal.
755 and 700 are typical values.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>755<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>755\u00a0<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>755<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">We manage permissions by group so our service accounts can access the data.<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\"><strong>Default Umask<\/strong><\/p>\n<p>dfs.umaskmode, fs.permissions.umask-mode<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Default umask for file and directory creation, specified in an octal value (with a leading 0). Default is 022.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>002<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>002\u00a0<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>002<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">We manage permissions by group, which makes the umask important. We set the value of umask to 002, the default is 022. This allows users of the group to read and write.<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\"><strong>Automatically Restart Process<\/strong><\/p>\n<p>DataNode<\/p>\n<p>Failover Controller<\/p>\n<p>HttpFS<\/p>\n<p>JournalNode<\/p>\n<p>NameNode<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">When set, this role&#8217;s process is automatically (and transparently) restarted in the event of an unexpected failure.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">N\/A<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">N\/A<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Enabled<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">If HA is enabled, the Failover Controller should be set to restart on failure. 
This service intermittently dies on Azure\u00a0(it can be a victim of low heap on clusters with low memory).<\/td>\n<\/tr>\n<tr>\n<td class=\"confluenceTd\" colspan=\"1\">\n<div class=\"display-name\"><strong>HttpFS Proxy User Groups<\/strong><\/div>\n<div class=\"property-name\">hadoop.proxyuser.httpfs.groups<\/div>\n<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Comma-delimited list of groups to allow the HttpFS user to impersonate. The default &#8216;*&#8217; allows all groups. To disable entirely, use a string that does not correspond to a group name, such as &#8216;_no_group_&#8217;.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">linux-dsiq-webhdfs<\/p>\n<p>domain^users<\/p>\n<p>hue<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">linux-dsiq-webhdfs<\/p>\n<p>domain^users<\/p>\n<p>hue<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">linux-dsiq-webhdfs<\/p>\n<p>domain^users<\/p>\n<p>hue<\/td>\n<td class=\"confluenceTd\" colspan=\"1\">We use this configuration to lock down read\/write permissions on the HttpFS service.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<h2><span class=\"ez-toc-section\" id=\"Administer_HDFS\"><\/span>Administer HDFS<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Start the DataNode locally (no SSH):<\/p>\n<pre>\/opt\/hadoop\/hadoop-#\/sbin\/hadoop-daemon.sh --config ~\/opt\/hadoop\/etc\/hadoop\/ start datanode<\/pre>\n<h2 id=\"HDFS-ConfigureHDFSHighAvailability\"><span class=\"ez-toc-section\" id=\"Configure_HDFS_High_Availability\"><\/span>Configure HDFS High Availability<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>CM 5 and CDH 5 support Quorum-based Storage as the only HDFS HA implementation. Quorum-based Storage refers to the HA implementation that uses a Quorum Journal Manager (QJM).<\/p>\n<p>In order for the Standby NameNode to keep its state synchronized with the Active NameNode in this implementation, both nodes communicate with a group of separate daemons called JournalNodes.
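On CDH this QJM wiring is generated by Cloudera Manager rather than edited by hand, but it ultimately lands in hdfs-site.xml. A minimal sketch of the relevant properties follows; the nameservice ID and the namenode1/namenode2 and jn1/jn2/jn3 host names are illustrative placeholders, not values from this cluster:

```xml
<!-- Sketch of QJM-based HA properties in hdfs-site.xml.
     nameservice1, namenode1/namenode2, and jn1/jn2/jn3 are placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode1,namenode2</value>
</property>
<property>
  <!-- All JournalNodes appear in one qjournal:// URI; 8485 is the default port. -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/nameservice1</value>
</property>
```

Clients then address HDFS as hdfs://nameservice1, and (assuming the matching client failover proxy provider is also configured) requests are routed to whichever NameNode is currently active.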
When any namespace modification is performed by the Active NameNode, it durably logs a record of the modification to a majority of these JournalNodes. The Standby NameNode is capable of reading the edits from the JournalNodes, and is constantly watching them for changes to the edit log. As the Standby Node sees the edits, it applies them to its own namespace. In the event of a failover, the Standby will ensure that it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.<\/p>\n<p>In order to provide a fast failover, it is also necessary that the Standby NameNode has up-to-date information regarding the location of blocks in the cluster. In order to achieve this, the DataNodes are configured with the location of both NameNodes, and they send block location information and heartbeats to both.<\/p>\n<p>It is vital for the correct operation of an HA cluster that only one of the NameNodes be active at a time. Otherwise, the namespace state would quickly diverge between the two, risking data loss or other incorrect results. In order to ensure this property and prevent the so-called &#8220;split-brain scenario,&#8221; the JournalNodes will only ever allow a single NameNode to be a writer at a time. 
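Because edits must be acknowledged by a majority of JournalNodes, the ensemble size fixes how many JournalNode failures the cluster can survive. A quick arithmetic check, in plain POSIX shell with no cluster required:

```shell
# Majority-quorum arithmetic for a JournalNode ensemble of size n:
# an edit commits once floor(n/2) + 1 nodes acknowledge it, so at most
# (n - 1) / 2 JournalNodes may fail before writes stall.
for n in 3 5 7; do
  majority=$(( n / 2 + 1 ))
  tolerated=$(( (n - 1) / 2 ))
  echo "JournalNodes=$n majority=$majority tolerated_failures=$tolerated"
done
```

This is also why an even ensemble size buys nothing: four JournalNodes still tolerate only (4 - 1) / 2 = 1 failure, the same as three.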
During a failover, the NameNode which is to become active will simply take over the role of writing to the JournalNodes, which will effectively prevent the other NameNode from continuing in the Active state, allowing the new Active NameNode to safely proceed with failover.<\/p>\n<p>In order to deploy an HA cluster using Quorum-based Storage, you should prepare the following:<\/p>\n<ul>\n<li>NameNode machines &#8211; the machines on which you run the Active and Standby NameNodes should have equivalent hardware to each other, and equivalent hardware to what would be used in a non-HA cluster.<\/li>\n<li>JournalNode machines &#8211; the machines on which you run the JournalNodes.<\/li>\n<li>The JournalNode daemon is relatively lightweight, so these daemons can reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager.<\/li>\n<li>Cloudera recommends that you deploy the JournalNode daemons on the &#8220;master&#8221; host or hosts (NameNode, Standby NameNode, JobTracker, etc.) so the JournalNodes&#8217; local directories can use the reliable local storage on those machines. You should not use SAN or NAS storage for these directories.<\/li>\n<li>There must be at least three JournalNode daemons, since edit log modifications must be written to a majority of JournalNodes. This will allow the system to tolerate the failure of a single machine. You can also run more than three JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JournalNodes (three, five, seven, and so on). Note that when running with N JournalNodes, the system can tolerate at most (N &#8211; 1) \/ 2 failures and continue to function normally.
If the requisite quorum is not available, the NameNode will not format or start, and you will see an error similar to this:<\/li>\n<\/ul>\n<div>\n<blockquote><p>12\/10\/01 17:34:18 WARN namenode.FSEditLog: Unable to determine input streams from QJM to [10.0.1.10:8485, 10.0.1.10:8486, 10.0.1.10:8487]. Skipping.<\/p>\n<p>java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.<\/p><\/blockquote>\n<\/div>\n<p><em>Note: In an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. If you are reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled, you can reuse the hardware which you had previously dedicated to the Secondary NameNode.<\/em><\/p>\n<h3 id=\"HDFS-EnablingHighAvailabilityandAutomaticFailover\"><span class=\"ez-toc-section\" id=\"Enabling_High_Availability_and_Automatic_Failover\"><\/span>Enabling High Availability and Automatic Failover<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>In CM, the Enable High Availability workflow leads you through adding a second (standby) NameNode and configuring JournalNodes. During the workflow, Cloudera Manager creates a federated namespace.<\/p>\n<ol>\n<li>Go to the HDFS service.<\/li>\n<li>Click on Instances,\u00a0<strong>stop<\/strong>\u00a0the\u00a0<strong>Secondary NameNode<\/strong>\u00a0service. The Secondary NameNode MUST be stopped before continuing or the setup will fail.<\/li>\n<li>Select\u00a0<strong>Actions<\/strong>\u00a0&gt;\u00a0<strong>Enable High Availability<\/strong>. 
A screen showing the hosts that are eligible to run a standby NameNode and the JournalNodes displays.\n<ol>\n<li>Specify a name for the nameservice or accept the default name\u00a0<strong>nameservice1<\/strong>\u00a0and click\u00a0<strong>Continue<\/strong>.<\/li>\n<li>In the\u00a0<strong>NameNode Hosts<\/strong>\u00a0field, click\u00a0<strong>Select a host<\/strong>. The host selection dialog displays.<\/li>\n<li>Check the checkbox next to the host where you want the standby NameNode to be set up and click\u00a0<strong>OK<\/strong>. The standby NameNode cannot be on the same host as the active NameNode, and the host that is chosen should have the same hardware configuration (RAM, disk space, number of cores, and so on) as the active NameNode.<\/li>\n<li>In the\u00a0<strong>JournalNode Hosts<\/strong>\u00a0field, click\u00a0<strong>Select hosts<\/strong>. The host selection dialog displays.<\/li>\n<li>Check the checkboxes next to an odd number of hosts (a minimum of three) to act as JournalNodes and click\u00a0<strong>OK<\/strong>. JournalNodes should be hosted on hosts with hardware specifications similar to the NameNodes. It is recommended that you put one JournalNode on each of the hosts running the active and standby NameNodes, and the third JournalNode on similar hardware, such as the JobTracker.<\/li>\n<li><strong>Failover Controller<\/strong>\u00a0&#8211; make sure these are on the same host as each NameNode. There are two Failover Controllers, one for each of the two NameNodes.<\/li>\n<li>Click\u00a0<strong>Continue<\/strong>.<\/li>\n<li>In the\u00a0<strong>JournalNode Edits Directory<\/strong>\u00a0property, enter a directory location for the JournalNode edits directory into the fields for each JournalNode host (<strong>\/space1\/dfs\/jn<\/strong>).\n<ol>\n<li>You may enter only one directory for each JournalNode.
The paths do not need to be the same on every JournalNode.<\/li>\n<li>The directories you specify should be empty, and must have the appropriate permissions.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/li>\n<li><strong>Extra Options<\/strong>: Decide whether Cloudera Manager should clear existing data in ZooKeeper, standby NameNode, and JournalNodes. If the directories are not empty (for example, you are re-enabling a previous HA configuration), Cloudera Manager will not automatically delete the contents\u2014you can select to delete the contents by keeping the default checkbox selection. The recommended default is to clear the directories. If you choose not to do so, the data should be in sync across the edits directories of the JournalNodes and should have the same version data as the NameNodes.<\/li>\n<li>Click\u00a0<strong>Continue<\/strong>.<\/li>\n<\/ol>\n<p><em>Note: Cloudera Manager executes a set of commands that will stop the dependent services, delete, create, and configure roles and directories as appropriate, create a nameservice and failover controller, and restart the dependent services and deploy the new client configuration.<\/em><\/p>\n<p><em>Another Note: you may see the following warning:\u00a0The following manual steps must be performed after completing this wizard:\u00a0For each of the Hive service(s) Hive, stop the Hive service, back up the Hive Metastore Database to a persistent store, run the service command &#8220;Update Hive Metastore NameNodes&#8221;, then restart the Hive services.<\/em><\/p>\n<ol>\n<li>If you want to use Hive, Impala, or Hue in a cluster with HA configured, follow the procedures in\u00a0<a class=\"external-link\" href=\"http:\/\/www.cloudera.com\/content\/cloudera\/en\/documentation\/core\/latest\/topics\/cdh_hag_hdfs_ha_cdh_components_config.html#topic_2_6\" rel=\"nofollow\"><strong>Configuring Other CDH Components to Use HDFS HA<\/strong><\/a>. 
The following manual steps must be performed after completing this wizard:\n<ol>\n<li>Configure the HDFS Web Interface Role of\u00a0<strong>Hue<\/strong>\u00a0service(s) to be an HTTPFS role instead of a NameNode. Select the\u00a0<strong>Hue<\/strong>\u00a0server, click\u00a0<strong>Configuration<\/strong>, and select the\u00a0<strong>HTTPFS<\/strong>\u00a0role. Click\u00a0<strong>Save Changes<\/strong>, and start the Hue service.<\/li>\n<li>For each of the\u00a0<strong>Hive<\/strong>\u00a0service(s), stop the Hive service, back up the Hive Metastore Database to a persistent store, run the service command &#8220;Update Hive Metastore NameNodes&#8221;, then restart the Hive services.\n<ol>\n<li>Go to the\u00a0<strong>Hive<\/strong>\u00a0service.<\/li>\n<li>Select\u00a0<strong>Actions<\/strong>\u00a0&gt;\u00a0<strong>Stop<\/strong>.\u00a0<em>Note: You may want to stop the Hue and Impala services first, if present, as they depend on the Hive service.<\/em><\/li>\n<li>Click\u00a0<strong>Stop<\/strong>\u00a0to confirm the command.<\/li>\n<li>Back up the Hive metastore database.<\/li>\n<li>Select\u00a0<strong>Actions<\/strong>\u00a0&gt;\u00a0<strong>Update Hive Metastore NameNodes<\/strong>\u00a0and confirm the command.<\/li>\n<li>Select\u00a0<strong>Actions<\/strong>\u00a0&gt;\u00a0<strong>Start<\/strong>.<\/li>\n<li>Restart the\u00a0<strong>Hue<\/strong>\u00a0and Impala services if you stopped them prior to updating the metastore.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<\/li>\n<li>Configure\u00a0<strong>Oozie<\/strong>\u00a0to use the HDFS nameservice instead of the URI in the &lt;name-node&gt; element of the workflow:<\/li>\n<\/ol>\n<p><strong>Example<\/strong>:<\/p>\n<div>\n<pre>&lt;action name=\"mr-node\"&gt;<\/pre>\n<pre>\u00a0 &lt;map-reduce&gt;<\/pre>\n<pre>\u00a0\u00a0\u00a0 &lt;job-tracker&gt;${jobTracker}&lt;\/job-tracker&gt;<\/pre>\n<pre>\u00a0\u00a0\u00a0 &lt;name-node&gt;hdfs:\/\/nameservice1&lt;\/name-node&gt;<\/pre>\n<\/div>\n<p>where nameservice1 is the value of dfs.nameservices in 
hdfs-site.xml.<\/p>\n<h3 id=\"HDFS-ManuallyFailingOvertotheStandbyNameNode\"><span class=\"ez-toc-section\" id=\"Manually_Failing_Over_to_the_Standby_NameNode\"><\/span>Manually Failing Over to the Standby NameNode<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>If you are running an HDFS service with HA enabled, you can manually cause the active NameNode to fail over to the standby NameNode. This is useful for planned downtime\u2014for hardware changes, configuration changes, or software upgrades of your primary host.<\/p>\n<p>Manual failover:<\/p>\n<p>hdfs haadmin -failover nn2 nn1<\/p>\n<p>where nn2 is the NameNode to fail over from and nn1 is the NameNode to fail over to.<\/p>\n<p>Stop services on nn2<\/p>\n<p>Once you&#8217;ve made sure that the nn2 node is inactive, you can stop services on that node. Stop the NameNode, the ZKFC daemon if this is an automatic-failover deployment, and the JournalNode if you are moving it. Proceed as follows.<\/p>\n<p>Stop the NameNode daemon:<\/p>\n<p>$ sudo service hadoop-hdfs-namenode stop<\/p>\n<p>Stop the ZKFC daemon if it is running:<\/p>\n<p>$ sudo service hadoop-hdfs-zkfc stop<\/p>\n<p>Stop the JournalNode daemon if it is running:<\/p>\n<p>$ sudo service hadoop-hdfs-journalnode stop<\/p>\n<p>Make sure these services are not set to restart on boot. If you are not planning to use nn2 as a NameNode again, you may want to remove the services.<\/p>\n<p>In Cloudera Manager:<\/p>\n<ol>\n<li>Go to the HDFS service.<\/li>\n<li>Click the Instances tab.<\/li>\n<li>Select Actions &gt; Manual Failover. (This option does not appear if HA is not enabled for the cluster.)<\/li>\n<li>From the pop-up, select the NameNode that should be made active, then click Manual Failover.<\/li>\n<li>When all the steps have been completed, click Finish.<\/li>\n<\/ol>\n<p>Cloudera Manager transitions the NameNode you selected to be the active NameNode, and the other NameNode to be the standby NameNode. 
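Before and after a manual failover it helps to confirm which NameNode is active. A minimal sketch, assuming the NameNode IDs nn1 and nn2 used in the example above:

```shell
# Check the HA state of each NameNode (assumes NameNode IDs nn1 and nn2)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Fail over so that nn1 becomes active
hdfs haadmin -failover nn2 nn1

# Verify the transition completed
hdfs haadmin -getServiceState nn1
```

Note that when automatic failover is configured, some Hadoop versions reject hdfs haadmin -failover, in which case the Cloudera Manager Manual Failover action is the safer path.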
HDFS should never have two active NameNodes.<\/p>\n<p>Reference:<\/p>\n<p><a class=\"external-link\" href=\"http:\/\/www.cloudera.com\/content\/cloudera-content\/cloudera-docs\/CDH5\/latest\/CDH5-High-Availability-Guide\/cdh5hag_hdfs_ha_config.html\" rel=\"nofollow\">Configuring HDFS High Availability<\/a><\/p>\n<p><a class=\"external-link\" href=\"http:\/\/blog.cloudera.com\/blog\/2012\/03\/high-availability-for-the-hadoop-distributed-file-system-hdfs\/\" rel=\"nofollow\">High Availability for the Hadoop Distributed File System (HDFS)<\/a><\/p>\n<p>Rename NameNode or Replace NameNode<\/p>\n<p>If you rename the NameNode or replace it with a new NameNode, you will have to clear the failover controller&#8217;s (ZKFC) znodes in ZooKeeper.<\/p>\n<p>Clear Failover Controller ZooKeeper<\/p>\n<p>hdfs zkfc -formatZK<\/p>\n<p>Finally, refresh the DataNodes to pick up the new NameNodes.<\/p>\n<p>hdfs dfsadmin -refreshNamenodes &lt;datanode-host:ipc-port&gt;<\/p>\n<h2 id=\"HDFS-ConfigureUberinHDFS\"><span class=\"ez-toc-section\" id=\"Configure_Uber_in_HDFS\"><\/span>Configure Uber in HDFS<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The application master for MapReduce jobs is a Java application whose main class is MRAppMaster. It initializes the job by creating a number of bookkeeping objects to keep track of the job\u2019s progress, as it will receive progress and completion reports from the tasks. Next, it retrieves the input splits computed in the client from the shared filesystem. It then creates a map task object for each split, and a number of reduce task objects determined by the mapreduce.job.reduces property.<\/p>\n<p>The next thing the application master does is decide how to run the tasks that make up the MapReduce job. 
If the job is small, the application master may choose to run them in the same JVM as itself, since it judges the overhead of allocating new containers and running tasks in them as outweighing the gain to be had in running them in parallel, compared to running them sequentially on one node. (This is different from MapReduce 1, where small jobs are never run on a single tasktracker.) Such a job is said to be uberized, or run as an uber task.<\/p>\n<p>What qualifies as a small job? By default, one that has fewer than 10 mappers, only one reducer, and an input size smaller than one HDFS block. (These values may be changed for a job by setting mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces, and mapreduce.job.ubertask.maxbytes.) It\u2019s also possible to disable uber tasks entirely (by setting mapreduce.job.ubertask.enable to false).<\/p>\n<p>Reference:\u00a0<a class=\"external-link\" href=\"http:\/\/sungsoo.github.io\/2013\/12\/12\/yarn-mapreduce2.html\" rel=\"nofollow\">http:\/\/sungsoo.github.io\/2013\/12\/12\/yarn-mapreduce2.html<\/a><\/p>\n<h2 id=\"HDFS-NFSGateway\"><span class=\"ez-toc-section\" id=\"NFS_Gateway\"><\/span>NFS Gateway<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>NFS Gateway is not supported on Ubuntu &#8211; only on RHEL. 
There is a workaround for Ubuntu, but it is insecure &#8211; anyone can access HDFS from a remote machine if this workaround is in place.<\/p>\n<p>After mounting HDFS to his or her local filesystem, a user can:<\/p>\n<ul>\n<li>Browse the HDFS file system through the local file system<\/li>\n<li>Upload and download files from the HDFS file system to and from the local file system.<\/li>\n<li>Stream data directly to HDFS through the mount point.<\/li>\n<\/ul>\n<p>File append is supported, but random write is not.<\/p>\n<h1 id=\"HDFS-HDFSCommands\"><span class=\"ez-toc-section\" id=\"HDFS_Commands\"><\/span>HDFS Commands<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2 id=\"HDFS-FileSystemCheck\"><span class=\"ez-toc-section\" id=\"File_System_Check\"><\/span>File System Check<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_436078\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" title=\"Hint: double-click to select code\">\n<div class=\"line number1 index0 alt2\"><code class=\"bash functions\">sudo<\/code>\u00a0<code class=\"bash plain\">-u hdfs hdfs\u00a0<\/code><code class=\"bash functions\">fsck<\/code>\u00a0<code class=\"bash plain\">\/ -files -blocks -locations &gt; ~<\/code><code class=\"bash plain\">\/fsck-files-08162017<\/code><code class=\"bash plain\">.log<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>Example report:<\/p>\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_703225\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" 
title=\"Hint: double-click to select code\">\n<div class=\"line number1 index0 alt2\"><code class=\"bash plain\">Status: HEALTHY<\/code><\/div>\n<div class=\"line number2 index1 alt1\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Total size: 1242886634788 B (Total\u00a0<\/code><code class=\"bash functions\">open<\/code>\u00a0<code class=\"bash plain\">files size: 540 B)<\/code><\/div>\n<div class=\"line number3 index2 alt2\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Total\u00a0<\/code><code class=\"bash functions\">dirs<\/code><code class=\"bash plain\">: 24939<\/code><\/div>\n<div class=\"line number4 index3 alt1\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Total files: 347544<\/code><\/div>\n<div class=\"line number5 index4 alt2\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Total symlinks: 0 (Files currently being written: 9)<\/code><\/div>\n<div class=\"line number6 index5 alt1\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Total blocks (validated): 300935 (avg. 
block size 4130083 B) (Total\u00a0<\/code><code class=\"bash functions\">open<\/code>\u00a0<code class=\"bash functions\">file<\/code>\u00a0<code class=\"bash plain\">blocks (not validated): 8)<\/code><\/div>\n<div class=\"line number7 index6 alt2\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Minimally replicated blocks: 300935 (100.0 %)<\/code><\/div>\n<div class=\"line number8 index7 alt1\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Over-replicated blocks: 0 (0.0 %)<\/code><\/div>\n<div class=\"line number9 index8 alt2\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Under-replicated blocks: 0 (0.0 %)<\/code><\/div>\n<div class=\"line number10 index9 alt1\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Mis-replicated blocks: 0 (0.0 %)<\/code><\/div>\n<div class=\"line number11 index10 alt2\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Default replication factor: 3<\/code><\/div>\n<div class=\"line number12 index11 alt1\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Average block replication: 3.0217955<\/code><\/div>\n<div class=\"line number13 index12 alt2\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Corrupt blocks: 0<\/code><\/div>\n<div class=\"line number14 index13 alt1\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Missing replicas: 0 (0.0 %)<\/code><\/div>\n<div class=\"line number15 index14 alt2\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Number of data-nodes: 10<\/code><\/div>\n<div class=\"line number16 index15 alt1\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">Number of racks: 1<\/code><\/div>\n<div class=\"line number17 index16 alt2\"><code class=\"bash spaces\">\u00a0<\/code><code class=\"bash plain\">FSCK ended at Wed Aug 16 14:39:06 PDT 2017\u00a0<\/code><code class=\"bash keyword\">in<\/code>\u00a0<code 
class=\"bash plain\">16085 milliseconds<\/code><\/div>\n<div class=\"line number18 index17 alt1\"><\/div>\n<div class=\"line number19 index18 alt2\"><code class=\"bash plain\">The filesystem under path\u00a0<\/code><code class=\"bash string\">'\/'<\/code>\u00a0<code class=\"bash plain\">is HEALTHY<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3 id=\"HDFS-ListCorruptBlocks\"><span class=\"ez-toc-section\" id=\"List_Corrupt_Blocks\"><\/span>List Corrupt Blocks<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_539787\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" title=\"Hint: double-click to select code\">\n<div class=\"line number1 index0 alt2\"><code class=\"bash functions\">sudo<\/code>\u00a0<code class=\"bash plain\">-u hdfs hdfs\u00a0<\/code><code class=\"bash functions\">fsck<\/code>\u00a0<code class=\"bash plain\">-list-corruptfileblocks &gt; ~<\/code><code class=\"bash plain\">\/fsck-corruptfileblocks-08162017<\/code><code class=\"bash plain\">.log<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2 id=\"HDFS-DataNodeReport\" class=\"auto-cursor-target\"><span class=\"ez-toc-section\" id=\"DataNode_Report\"><\/span>DataNode Report<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_696358\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" title=\"Hint: double-click to select code\">\n<div class=\"line number1 index0 alt2\"><code class=\"bash 
functions\">sudo<\/code>\u00a0<code class=\"bash plain\">-u hdfs hdfs dfsadmin -report &gt; ~<\/code><code class=\"bash plain\">\/dfsadmin-report-08162017<\/code><code class=\"bash plain\">.log<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>Example report:<\/p>\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_117517\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" title=\"Hint: double-click to select code\">\n<div class=\"line number1 index0 alt2\"><code class=\"bash plain\">Configured Capacity: 68167702077440 (62.00 TB)<\/code><\/div>\n<div class=\"line number2 index1 alt1\"><code class=\"bash plain\">Present Capacity: 64158271559713 (58.35 TB)<\/code><\/div>\n<div class=\"line number3 index2 alt2\"><code class=\"bash plain\">DFS Remaining: 60359705043522 (54.90 TB)<\/code><\/div>\n<div class=\"line number4 index3 alt1\"><code class=\"bash plain\">DFS Used: 3798566516191 (3.45 TB)<\/code><\/div>\n<div class=\"line number5 index4 alt2\"><code class=\"bash plain\">DFS Used%: 5.92%<\/code><\/div>\n<div class=\"line number6 index5 alt1\"><code class=\"bash plain\">Under replicated blocks: 0<\/code><\/div>\n<div class=\"line number7 index6 alt2\"><code class=\"bash plain\">Blocks with corrupt replicas: 0<\/code><\/div>\n<div class=\"line number8 index7 alt1\"><code class=\"bash plain\">Missing blocks: 0<\/code><\/div>\n<div class=\"line number9 index8 alt2\"><code class=\"bash plain\">Missing blocks (with replication factor 1): 0<\/code><\/div>\n<div class=\"line number10 index9 alt1\"><\/div>\n<div class=\"line number11 index10 alt2\"><code class=\"bash plain\">-------------------------------------------------<\/code><\/div>\n<div class=\"line number12 index11 alt1\"><code class=\"bash plain\">Live 
datanodes (10):<\/code><\/div>\n<div class=\"line number13 index12 alt2\"><\/div>\n<div class=\"line number14 index13 alt1\"><code class=\"bash plain\">Name: 10.200.0.12:50010 (servername01)<\/code><\/div>\n<div class=\"line number15 index14 alt2\"><code class=\"bash plain\">Hostname: servername01<\/code><\/div>\n<div class=\"line number16 index15 alt1\"><code class=\"bash plain\">Rack:\u00a0<\/code><code class=\"bash plain\">\/rackname01<\/code><\/div>\n<div class=\"line number17 index16 alt2\"><code class=\"bash plain\">Decommission Status : Normal<\/code><\/div>\n<div class=\"line number18 index17 alt1\"><code class=\"bash plain\">Configured Capacity: 6816770207744 (6.20 TB)<\/code><\/div>\n<div class=\"line number19 index18 alt2\"><code class=\"bash plain\">DFS Used: 393658411357 (366.62 GB)<\/code><\/div>\n<div class=\"line number20 index19 alt1\"><code class=\"bash plain\">Non DFS Used: 396947193216 (369.69 GB)<\/code><\/div>\n<div class=\"line number21 index20 alt2\"><code class=\"bash plain\">DFS Remaining: 6026164603171 (5.48 TB)<\/code><\/div>\n<div class=\"line number22 index21 alt1\"><code class=\"bash plain\">DFS Used%: 5.77%<\/code><\/div>\n<div class=\"line number23 index22 alt2\"><code class=\"bash plain\">DFS Remaining%: 88.40%<\/code><\/div>\n<div class=\"line number24 index23 alt1\"><code class=\"bash plain\">Configured Cache Capacity: 4294967296 (4 GB)<\/code><\/div>\n<div class=\"line number25 index24 alt2\"><code class=\"bash plain\">Cache Used: 0 (0 B)<\/code><\/div>\n<div class=\"line number26 index25 alt1\"><code class=\"bash plain\">Cache Remaining: 4294967296 (4 GB)<\/code><\/div>\n<div class=\"line number27 index26 alt2\"><code class=\"bash plain\">Cache Used%: 0.00%<\/code><\/div>\n<div class=\"line number28 index27 alt1\"><code class=\"bash plain\">Cache Remaining%: 100.00%<\/code><\/div>\n<div class=\"line number29 index28 alt2\"><code class=\"bash plain\">Xceivers: 10<\/code><\/div>\n<div class=\"line number30 index29 alt1\"><code 
class=\"bash plain\">Last contact: Wed Aug 16 14:42:17 PDT 2017<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2 id=\"HDFS-CreateaFileinHDFSusingtouchz\"><span class=\"ez-toc-section\" id=\"Create_a_File_in_HDFS_using_touchz\"><\/span>Create a File in HDFS using touchz<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_316733\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" title=\"Hint: double-click to select code\">\n<div class=\"line number1 index0 alt2\"><code class=\"bash plain\">hadoop fs -<\/code><code class=\"bash functions\">ls<\/code>\u00a0<code class=\"bash plain\">\/user\/username<\/code><code class=\"bash plain\">\/<\/code><\/div>\n<div class=\"line number2 index1 alt1\"><code class=\"bash plain\">hadoop fs -touchz\u00a0<\/code><code class=\"bash plain\">\/user\/username<\/code><code class=\"bash plain\">\/test-file-10272017-1<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2 id=\"HDFS-MoveaFilefromWindowstoHDFS\"><span class=\"ez-toc-section\" id=\"Move_a_File_from_Windows_to_HDFS\"><\/span>Move a File from Windows to HDFS<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>1.\u00a0Create a connection using FileZilla.<\/p>\n<p>2. FileZilla moves the file from Windows to \/home\/username in Linux.<\/p>\n<p>3. Move from Linux to Hadoop:<\/p>\n<p>a. Bring up Cygwin and ssh to\u00a0servername01<\/p>\n<div>\n<blockquote><p>ssh -Y username@servername01<\/p><\/blockquote>\n<\/div>\n<p>b. Log in with ADM account user name and password<\/p>\n<p>c. I am currently in \/home\/username, view files in directory:<\/p>\n<div>\n<blockquote><p>ls<\/p><\/blockquote>\n<\/div>\n<p>d. 
Connect to Hadoop cluster<\/p>\n<div>\n<blockquote><p>hadoop fs -ls \/ # list files at root<\/p>\n<p>hadoop fs -ls \/user\/username # list a file that I want to upload data to in Hadoop<\/p>\n<p>hadoop fs -copyFromLocal \/home\/username\/20140713_TAB.txt \/user\/username<\/p>\n<p># the\u00a0-f option will overwrite the file if it exists<\/p><\/blockquote>\n<\/div>\n<p>4. Move from Hadoop back to Linux:<\/p>\n<div>\n<blockquote><p>hadoop fs -copyToLocal \/user\/username\/20140713_TAB.txt \/home\/username\/20140713_TAB_a.txt<\/p>\n<p>ls # to prove to myself it is there<\/p><\/blockquote>\n<\/div>\n<p>5. Move from Linux to Windows \u2013 use FileZilla<\/p>\n<p>Reference:\u00a0<a class=\"external-link\" href=\"http:\/\/hadoop.apache.org\/docs\/r0.18.3\/hdfs_shell.html\" rel=\"nofollow\">http:\/\/hadoop.apache.org\/docs\/r0.18.3\/hdfs_shell.html<\/a><\/p>\n<h2 id=\"HDFS-ListFolderStructure\"><span class=\"ez-toc-section\" id=\"List_Folder_Structure\"><\/span>List Folder Structure<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div>\n<blockquote><p>hadoop fs -ls \/asset\/<\/p><\/blockquote>\n<\/div>\n<h2 id=\"HDFS-Getasummaryofthefilesizes\"><span class=\"ez-toc-section\" id=\"Get_a_summary_of_the_file_sizes\"><\/span>Get a summary of the file sizes<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div>\n<blockquote><p>hadoop fs -du -s -h \/user\/*<\/p>\n<p>hadoop fs -du -s -h \/tmp\/temp* | grep T # filter for temp* files that are over 1 TB in size<\/p><\/blockquote>\n<\/div>\n<h2 id=\"HDFS-CopydatafromlocaldisktoHDFS\"><span class=\"ez-toc-section\" id=\"Copy_data_from_local_disk_to_HDFS\"><\/span>Copy data from local disk to HDFS<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div>\n<blockquote><p>hadoop fs -copyFromLocal \/tmp\/from\/linux\/testfile.txt \/tmp\/to\/hdfs\/folder\/<\/p><\/blockquote>\n<h2 id=\"HDFS-CopydatafromHDFStolocal\"><span class=\"ez-toc-section\" id=\"Copy_data_from_HDFS_to_local\"><\/span>Copy data from HDFS to local<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\n<\/div>\n<div>\n<blockquote><p>hadoop fs -copyToLocal \/tmp\/file.txt \/tmp\/local\/folder<\/p><\/blockquote>\n<\/div>\n<h2 id=\"HDFS-DeleteFolderRecursively\"><span class=\"ez-toc-section\" id=\"Delete_Folder_Recursively\"><\/span>Delete Folder Recursively<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><em>Note: Be very careful that there are NO SPACES between the server:port name and the file or you will DELETE the entire root of HDFS.<\/em><\/p>\n<div>\n<blockquote><p>hadoop fs -rm -r \/tmp\/folder-to-delete-temp\/<\/p><\/blockquote>\n<\/div>\n<h2 id=\"HDFS-SetFolderPermissions\"><span class=\"ez-toc-section\" id=\"Set_Folder_Permissions\"><\/span>Set Folder Permissions<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div>\n<blockquote><p>sudo -u hdfs hadoop fs -chmod -R 775 \/staging\/wm\/offer_scoring<\/p><\/blockquote>\n<\/div>\n<h2 id=\"HDFS-CreateUserHomeDirectory\"><span class=\"ez-toc-section\" id=\"Create_User_Home_Directory\"><\/span>Create User Home Directory<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To create a user&#8217;s home directory you&#8217;ll have to sudo as hdfs:<\/p>\n<div>\n<blockquote><p>sudo -u hdfs hadoop fs -mkdir \/user\/username<\/p>\n<p>sudo -u hdfs hadoop fs -chown -R username:Users \/user\/username<\/p><\/blockquote>\n<p>A little simpler:<\/p>\n<\/div>\n<blockquote>\n<div>USER=username;sudo -u hdfs hadoop fs -mkdir -p \/user\/$USER;sudo -u hdfs hadoop fs -chown -R $USER \/user\/$USER;<\/div>\n<\/blockquote>\n<h2 id=\"HDFS-WebHDFSandHttpFSAPI\"><span class=\"ez-toc-section\" id=\"WebHDFS_and_HttpFS_API\"><\/span>WebHDFS and HttpFS API<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Here is an example opening a file on a dev cluster:<\/p>\n<p>http:\/\/nn.servername01:50070\/webhdfs\/v1\/asset\/reference\/timeanddate\/sample\/romaniaholidays2014.json?op=OPEN<\/p>\n<p>Note that the VM name of the NameNode is hardcoded into the URL, this is not the best because the 
NameNode will change as we use HDFS HA.<\/p>\n<p>So another good API to use is the HttpFS \u2013 the centralized service understands HDFS HA and is a better option, here is an example:<\/p>\n<p>http:\/\/fs.servername01:14000\/webhdfs\/v1\/asset\/reference\/timeanddate\/sample\/romaniaholidays2014.json?user.name=username&#038;op=OPEN<\/p>\n<p>Try with the ?op=create and create a file, for example\u2026<\/p>\n<h2 id=\"HDFS-ChangeReplicationFactor\"><span class=\"ez-toc-section\" id=\"Change_Replication_Factor\"><\/span>Change Replication Factor<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>HDFS stores files as data blocks and distributes these blocks across the entire cluster. As HDFS was designed to be fault-tolerant and to run on commodity hardware, blocks are replicated a number of times to ensure high data availability. The\u00a0<strong>replication factor<\/strong>\u00a0is a property that can be set in the HDFS configuration file that will allow you to adjust the global replication factor for the entire cluster. For each block stored in HDFS, there will be\u00a0<strong>n \u2013 1<\/strong>\u00a0duplicated blocks distributed across the cluster. For example, if the replication factor was set to 3 (default value in HDFS) there would be one original block and two replicas.<\/p>\n<p>SSH\u00a0to the node and run the following command where -w is the new replication factor:<\/p>\n<div>\n<blockquote><p>sudo -u hdfs hadoop dfs -setrep -R -w 1 \/<\/p><\/blockquote>\n<\/div>\n<h2 id=\"HDFS-CopydatafromoneHadoopclustertoanotherHadoopclusterusingDistCp\"><span class=\"ez-toc-section\" id=\"Copy_data_from_one_Hadoop_cluster_to_another_Hadoop_cluster_using_DistCp\"><\/span>Copy data from one Hadoop cluster to another Hadoop cluster using DistCp<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>DistCp (distributed copy) is a tool used for large inter\/intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. 
It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list.<\/p>\n<p>Example usage:<\/p>\n<blockquote><p>hadoop distcp hdfs:\/\/<em>source-namenode:8020\/contents\/of\/folder\/from\/*<\/em>\u00a0hdfs:\/\/<em>destination-namenode:8020\/folder\/to\/<\/em><\/p><\/blockquote>\n<p>By default, distcp will skip files that already exist in the destination, but they can be overwritten by supplying the -overwrite option. You can also update only files that have changed using the -update option.<\/p>\n<p>distcp is implemented as a MapReduce job where the work of copying is done by maps that run in parallel across the cluster. There are no reducers. Each file is copied by a single map, and distcp tries to give each map approximately the same amount of data by bucketing files into roughly equal allocations.<\/p>\n<p>The following command will copy the folder contents from one\u00a0<u>Hadoop 4.# cluster<\/u>\u00a0to a folder on a\u00a0<u>Hadoop 5.# cluster<\/u>. The hftp scheme is necessary because Hadoop 4 and 5 are not wire-compatible. The command must be run on the destination cluster. Be sure your user has access to write to the destination folder.<\/p>\n<div>\n<blockquote><p>hadoop distcp -pb hftp:\/\/nn.servername01:50070\/user\/source\/* hdfs:\/\/fs.servername02:8020\/user\/destination\/<\/p><\/blockquote>\n<\/div>\n<p><em>Note: The -pb option will only preserve the block size.<\/em><\/p>\n<p><em>Double Note: For copying between two different versions of Hadoop we must use the HftpFileSystem, which is a read-only file system. 
So the distcp must be run on the\u00a0<strong>destination\u00a0<\/strong>cluster.<\/em><\/p>\n<p>The following command will copy the folder contents from one\u00a0<u>Hadoop 5.# cluster<\/u>\u00a0to a folder on another\u00a0<u>Hadoop 5.# cluster<\/u>.<\/p>\n<blockquote><p>hadoop distcp -pb hdfs:\/\/fs.servername01:8020\/user\/username\/* hdfs:\/\/fs.servername02:8020\/user\/username\/<\/p>\n<p>hadoop distcp -pb hdfs:\/\/hdfs-nn.servername01:8020\/user\/username\/* hdfs:\/\/hdfs-nn.servername02:8020\/user\/username\/<\/p><\/blockquote>\n<p>For additional options, use -p &lt;arg&gt; to preserve the following attributes: (rbugpcaxt):\u00a0(replication,\u00a0block-size, user, group, permission,\u00a0checksum-type, ACL, XATTR, timestamps).<\/p>\n<p><em>Note: If\u00a0-p is specified with no &lt;arg&gt;, then\u00a0preserves replication, block size, user,\u00a0group, permission, checksum type and\u00a0timestamps.<\/em><\/p>\n<h1 id=\"HDFS-RecoverUnder-Replicated,Missing,orCorruptBlocks\"><span class=\"ez-toc-section\" id=\"Recover_Under-Replicated_Missing_or_Corrupt_Blocks\"><\/span>Recover Under-Replicated, Missing, or Corrupt Blocks<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>If you run into a situation where there are a large number of under-replicated, missing, or corrupt blocks, you need to first make sure no volumes have failed. If all volumes are accounted for, then tackle the under-replicated blocks FIRST. As soon as all blocks have replicated you will have a clear picture of the missing or corrupt blocks. 
Next, take care of the missing blocks, and finally the corrupt ones.<\/p>\n<p>Read the following section fully before proceeding &#8211; you need to be careful with these instructions as you might delete blocks unnecessarily.<\/p>\n<p>Diagnose HDFS issues using fsck:<\/p>\n<div>\n<blockquote><p>sudo -u hdfs hdfs fsck \/<\/p>\n<p># save the results to a file<\/p>\n<p>sudo -u hdfs hdfs fsck \/ &gt; ~\/hdfs-fsck-09082014.txt<\/p>\n<p># report on under-replicated, missing, or corrupt blocks<\/p>\n<p>sudo -u hdfs hdfs dfsadmin -report<\/p><\/blockquote>\n<p>List only corrupt file blocks:<\/p>\n<blockquote><p>sudo -u hdfs hdfs fsck \/ -list-corruptfileblocks &gt; ~\/list-corruptfileblocks-10262017.txt<\/p><\/blockquote>\n<\/div>\n<p>To determine which files are having problems, look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbose, especially on a large HDFS filesystem, so I normally get down to the meaningful output with<\/p>\n<div>\n<blockquote><p>hadoop fsck \/ | egrep -v &#8216;^\\.+$&#8217; | grep -v eplica<\/p><\/blockquote>\n<\/div>\n<p>which ignores lines with nothing but dots and lines talking about replication.<\/p>\n<p>Once you find a file that is corrupt you can attempt to find out why it was marked as corrupt &#8211; where are the blocks, and what servers are the blocks located on?<\/p>\n<div>\n<blockquote><p>hadoop fsck \/path\/to\/corrupt\/file -locations -blocks -files<\/p><\/blockquote>\n<\/div>\n<p>Use that output to determine where blocks might live. If the file is larger than your block size it might have multiple blocks.<\/p>\n<p>You can use the reported block numbers to go around to the datanodes and the namenode logs searching for the machine or machines on which the blocks lived. Try looking for filesystem errors on those machines: missing mount points, the datanode not running, or a file system that was reformatted\/re-provisioned. 
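The log search described above can be sketched as follows; the NameNode log directory shown is an assumption (a common CDH default), so adjust it for your distribution:

```shell
# Show the blocks and last known locations for a suspect file
hadoop fsck /path/to/corrupt/file -files -blocks -locations

# Grep the NameNode logs for a block ID that fsck reported as missing,
# hoping to find the last DataNode that held it
# (log directory is an assumption -- adjust for your installation)
grep "blk_-8603098634897134795" /var/log/hadoop-hdfs/*NAMENODE*.log*
```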
If you can find a problem in that way and bring the block back online, that file will be healthy again.<\/p>\n<p>If the blocks are completely missing:<\/p>\n<div>\n<pre>0. blk_-6574099661639162407_21831 len=134217728 repl=3 [172.16.1.115:50010, 172.16.1.128:50010, 129.93.239.178:50010]<\/pre>\n<pre>1. blk_-8603098634897134795_21831 len=134217728 repl=3 MISSING!<\/pre>\n<pre>2. ...<\/pre>\n<\/div>\n<p>In the above log output, all possible sources of the second block are gone and the namenode has no knowledge of any host with it. This can happen when nodes are completely off or have no network connection to the namenode. In this case, the easiest solution is to grep for the block ID &#8220;8603098634897134795&#8221; in the namenode logs in hopes of seeing the last place that block lived. Provided you keep namenode logs around and the logging level is set high enough, you will hopefully find a datanode containing the block. If you are able to bring the datanode back up and the blocks are readable from the hard drive(s), the namenode will re-replicate them to the appropriate replication factor and the file corruption will be gone.<\/p>\n<p>Lather, rinse, and repeat until all files are healthy or you exhaust all alternatives looking for the blocks.<\/p>\n<p>Once you determine what happened and you cannot recover any more blocks, use the following command to DELETE the file with the permanently missing blocks.<\/p>\n<div>\n<blockquote><p>hadoop fs -rm \/path\/to\/file\/with\/permanently\/missing\/blocks<\/p><\/blockquote>\n<\/div>\n<p><em>Note: Deleting the file may simply move it into the Trash. The Trash protects you from accidental deletions &#8211; giving you one more chance to recover. 
To fully delete the file you must also remove it from your Trash.<\/em><\/p>\n<p>This should get your HDFS filesystem back to healthy so you can start tracking new errors as they occur.<\/p>\n<p>Reference:\u00a0<a class=\"external-link\" href=\"https:\/\/twiki.grid.iu.edu\/bin\/view\/Storage\/HadoopRecovery\" rel=\"nofollow\">https:\/\/twiki.grid.iu.edu\/bin\/view\/Storage\/HadoopRecovery<\/a><\/p>\n<h1 id=\"HDFS-CheckpointinginHDFS\"><span class=\"ez-toc-section\" id=\"Checkpointing_in_HDFS\"><\/span>Checkpointing in HDFS<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>The NameNode records every filesystem change as an entry in its edit log. A typical edit ranges from 10s to 100s of bytes, but over time enough edits can accumulate to become unwieldy. A couple of problems can arise from these large edit logs. In extreme cases, a large edit log can fill up all the available disk capacity on a node, but more subtly, it can substantially delay NameNode startup as the NameNode reapplies all the edits. This is where checkpointing comes in.<\/p>\n<p>Checkpointing is a process that takes an fsimage and edit log and compacts them into a new fsimage. This way, instead of replaying a potentially unbounded edit log, the NameNode can load the final in-memory state directly from the fsimage. This is a far more efficient operation and reduces NameNode startup time.<\/p>\n<p>However, creating a new fsimage is an I\/O- and CPU-intensive operation, sometimes taking minutes to perform. During a checkpoint, the namesystem also needs to restrict concurrent access from other users. So, rather than pausing the active NameNode to perform a checkpoint, HDFS defers it to either the SecondaryNameNode or Standby NameNode, depending on whether NameNode high-availability is configured.<\/p>\n<p>In either case though, checkpointing is triggered by one of two conditions: if enough time has elapsed since the last checkpoint (dfs.namenode.checkpoint.period), or if enough new edit log transactions have accumulated (dfs.namenode.checkpoint.txns). 
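For reference, these triggers (plus the check interval mentioned next) are set in hdfs-site.xml. A sketch with the stock defaults of a one-hour period, one million transactions, and a sixty-second check interval; verify these against the hdfs-default.xml shipped with your release before tuning:

```xml
<!-- Sketch: checkpoint triggers in hdfs-site.xml.
     Values shown are the common defaults, not recommendations. -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>    <!-- seconds between checkpoints -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value> <!-- edit-log transactions that force a checkpoint -->
</property>
<property>
  <name>dfs.namenode.checkpoint.check.period</name>
  <value>60</value>      <!-- seconds between condition checks -->
</property>
```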
The checkpointing node periodically checks whether either of these conditions is met (dfs.namenode.checkpoint.check.period), and if so, kicks off the checkpointing process.<\/p>\n<p>Reference:\u00a0<a class=\"external-link\" href=\"http:\/\/blog.cloudera.com\/blog\/2014\/03\/a-guide-to-checkpointing-in-hadoop\/\" rel=\"nofollow\">http:\/\/blog.cloudera.com\/blog\/2014\/03\/a-guide-to-checkpointing-in-hadoop\/<\/a><\/p>\n<h1 id=\"HDFS-HDFSBalancer\"><span class=\"ez-toc-section\" id=\"HDFS_Balancer\"><\/span>HDFS Balancer<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>The HDFS Balancer is a tool used to balance data across the DataNodes. If you add new DataNodes, you might notice that the data is not distributed equally across all nodes.<\/p>\n<h2 id=\"HDFS-StarttheBalancer\"><span class=\"ez-toc-section\" id=\"Start_the_Balancer\"><\/span>Start the Balancer<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To start the HDFS Balancer, select the HDFS service from Cloudera Manager, click on Instances, and click on the Balancer service. From within the Balancer service, click Actions, and click Rebalance.<\/p>\n<p>Find the Balancer\u2019s log to discover the status of the balancing process: \/run\/cloudera-scm-agent\/process\/<em>pid#<\/em>-hdfs-BALANCER\/logs\/stdout.log, using a command like this: ls -lrt \/run\/cloudera-scm-agent\/process\/|grep BALANCER<\/p>\n<p>In Cloudera Manager\u2019s Web UI you can find the log by clicking on the Stdout log under the Balancer service under the node Processes.<\/p>\n<p>Near the beginning of the log you will find how much work has to be done to balance the cluster and how many nodes are over- or underutilized. 
For example:<\/p>\n<p>2014-08-26 08:40:31,834 INFO\u00a0 [main] balancer.Balancer (Balancer.java:logNodes(907)) &#8211;\u00a01 over-utilized: [Source[192.168.210.202:50010, utilization=86.94839955017261]]<\/p>\n<p>2014-08-26 08:40:31,835 INFO\u00a0 [main] balancer.Balancer (Balancer.java:logNodes(907)) &#8211;\u00a04 underutilized: [BalancerDatanode[192.168.210.224:50010, utilization=10.070419522861195], BalancerDatanode[192.168.210.64:50010, utilization=1.4040975994717294E-5], BalancerDatanode[192.168.210.229:50010, utilization=3.730515639121875], BalancerDatanode[192.168.210.208:50010, utilization=15.298922338705484]]<\/p>\n<p>2014-08-26 08:40:31,861 INFO\u00a0 [main] balancer.Balancer (Balancer.java:run(1344)) &#8211;\u00a0Need to move 1.47 TB to make the cluster balanced.<\/p>\n<p>2014-08-26 08:40:31,863 INFO\u00a0 [main] balancer.Balancer (Balancer.java:chooseTarget(1038)) &#8211; Decided to move 10 GB bytes from 192.168.210.202:50010 to 192.168.210.224:50010<\/p>\n<p>2014-08-26 08:40:31,863 INFO\u00a0 [main] balancer.Balancer (Balancer.java:chooseSource(1086)) &#8211; Decided to move 10 GB bytes from 192.168.210.254:50010 to 192.168.210.64:50010<\/p>\n<p>2014-08-26 08:40:31,863 INFO\u00a0 [main] balancer.Balancer (Balancer.java:chooseSource(1086)) &#8211; Decided to move 10 GB bytes from 192.168.210.215:50010 to 192.168.210.229:50010<\/p>\n<p>2014-08-26 08:40:31,863 INFO\u00a0 [main] balancer.Balancer (Balancer.java:chooseSource(1086)) &#8211; Decided to move 10 GB bytes from 192.168.210.207:50010 to 192.168.210.208:50010<\/p>\n<p>2014-08-26 08:40:31,863 INFO\u00a0 [main] balancer.Balancer (Balancer.java:run(1358)) &#8211;\u00a0Will move 40 GB in this iteration<\/p>\n<p>Aug 26, 2014 8:40:31 AM\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a00\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0 B\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1.47 
TB\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 40 GB<\/p>\n<p>2014-08-26 08:40:32,101 INFO\u00a0 [pool-2-thread-3] balancer.Balancer (Balancer.java:dispatch(344)) &#8211; Moving block 8184095414603472774 from 192.168.210.207:50010 to 192.168.210.208:50010 through 192.168.210.254:50010 is succeeded.<\/p>\n<h2 id=\"HDFS-StoptheBalancer\"><span class=\"ez-toc-section\" id=\"Stop_the_Balancer\"><\/span>Stop the Balancer<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To stop the Balancer, click Running Commands in the upper right hand side of Cloudera Manager\u2019s web UI, find the Rebalance process, and click Abort.<\/p>\n<p>Reference:\u00a0<a class=\"external-link\" href=\"http:\/\/www.swiss-scalability.com\/2013\/08\/hadoop-hdfs-balancer-explained.html\" rel=\"nofollow\">http:\/\/www.swiss-scalability.com\/2013\/08\/hadoop-hdfs-balancer-explained.html<\/a><\/p>\n<p>Some notes:\u00a0<a class=\"external-link\" href=\"http:\/\/mail-archives.apache.org\/mod_mbox\/hadoop-common-user\/201108.mbox\/%3CE5C6ED175FFCE34D974527016DF712FD6A90C5D31C@AMRXM3113.dir.svc.accenture.com%3E\" rel=\"nofollow\">http:\/\/mail-archives.apache.org\/mod_mbox\/hadoop-common-user\/201108.mbox\/%3CE5C6ED175FFCE34D974527016DF712FD6A90C5D31C@AMRXM3113.dir.svc.accenture.com%3E<\/a><\/p>\n<p><span class=\"confluence-embedded-file-wrapper confluence-embedded-manual-size\"><img decoding=\"async\" class=\"confluence-embedded-image\" src=\"https:\/\/dsiqinc.atlassian.net\/wiki\/download\/thumbnails\/46858536\/cloudera-3.png?version=1&amp;modificationDate=1429227115562&amp;cacheVersion=1&amp;api=v2&amp;width=409&amp;height=400\" height=\"400\" \/><\/span><\/p>\n<h1 id=\"HDFS-Troubleshooting\"><span class=\"ez-toc-section\" id=\"Troubleshooting\"><\/span>Troubleshooting<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2 id=\"HDFS-ManuallyStartHDFS\"><span class=\"ez-toc-section\" id=\"Manually_Start_HDFS\"><\/span>Manually Start HDFS<span 
class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Not done very often, but once in a while, you may have to manually start HDFS:<\/p>\n<blockquote><p>for x in `cd \/etc\/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done<\/p><\/blockquote>\n<h2 id=\"HDFS-HDFSisinSafeMode\"><span class=\"ez-toc-section\" id=\"HDFS_is_in_Safe_Mode\"><\/span>HDFS is in Safe Mode<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In Cloudera Manager, browse to the NameNode service and click Actions. You will see an option in the menu to Enter and Leave Safe Mode.<\/p>\n<p>To get the safe mode status:<\/p>\n<p>sudo hdfs dfsadmin -safemode get<\/p>\n<p>To take HDFS out of safe mode:<\/p>\n<p>sudo -su hdfs hdfs dfsadmin -safemode leave<\/p>\n<p>To turn safe mode on:<\/p>\n<p>sudo -su hdfs hdfs dfsadmin -safemode enter<\/p>\n<h2 id=\"HDFS-HDFSCheckpointAgehasbecomebad\"><span class=\"ez-toc-section\" id=\"HDFS_Checkpoint_Age_has_become_bad\"><\/span>HDFS Checkpoint Age has become bad<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>If a Backup (Standby) or Secondary NameNode fails, the Active NameNode will have no other node to write checkpoints to (documenting changes to data) and\u00a0the two NameNodes will slowly becoming disparate. This could lead to data loss if the Active NameNode fails because we would be left with a NameNode that does not know about any of new or changed data. So an alert is thrown: NAME_NODE_HA_CHECKPOINT_AGE has become bad.<\/p>\n<p>To fix this problem, find the NameNode that is down, find out why it is down, and restart it. 
The Active NameNode will write its checkpoint and the alert will clear.<\/p>\n<p>In Cloudera Manager, here is what this problem\u00a0would look like:<\/p>\n<p><span class=\"confluence-embedded-file-wrapper\"><img decoding=\"async\" class=\"confluence-embedded-image\" src=\"https:\/\/dsiqinc.atlassian.net\/wiki\/download\/attachments\/46858536\/hdfs-checkpoint-1.png?version=2&amp;modificationDate=1442252766926&amp;cacheVersion=1&amp;api=v2\" \/><\/span><\/p>\n<p>In this screenshot you can see that although the Active NameNode\u2019s health is bad, it is actually still running and SHOULD NOT be restarted. The first NameNode in the list, with the \u2018down\u2019 symbol, should be restarted (the line through the red ball indicates that it is down).<\/p>\n<p>To fix this, check the box of the NameNode (with the red symbol with a white line) and select restart. The NameNode will come back up as a Backup and the checkpoint will be written.<\/p>\n<h2 id=\"HDFS-HDFSStartedwithBadHealth:Addressisalreadyinuse\"><span class=\"ez-toc-section\" id=\"HDFS_Started_with_Bad_Health_Address_is_already_in_use\"><\/span>HDFS Started with Bad Health: Address is already in use<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><strong>Error:\u00a0<\/strong>HDFS started with &#8216;bad&#8217; health. When you drill down to the DataNode and SecondaryNameNode, they are in &#8216;bad&#8217; health.<\/p>\n<p><strong>The log file contains the error:<\/strong><\/p>\n<p>2013-02-19 13:20:37,437 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain java.net.BindException: Problem binding to [servername01:50010] java.net.BindException: Address already in use; For more details see:\u00a0 http:\/\/wiki.apache.org\/hadoop\/BindException<\/p>\n<p>It turned out that ports 50010 and 50090, which the DataNode and SecondaryNameNode were trying to use, were already taken by other TCP listeners.<\/p>\n<p>This may also fail during configuration of a Single Node:<\/p>\n<p>Checking if the name directories of the NameNode are empty. 
Formatting HDFS only if empty.<\/p>\n<p>Failed to format NameNode.<\/p>\n<p>Starting HDFS Service<\/p>\n<p>Service did not start successfully; not all of the required roles started: Service hdfs1 does not have sufficient running NameNodes.<\/p>\n<p><strong>Solution:<\/strong><\/p>\n<div>\n<blockquote><p>vi\u00a0\/var\/log\/hadoop-hdfs\/hadoop-cmf-hdfs1-SECONDARYNAMENODE-servername01.log.out<\/p>\n<p>sudo netstat -a -t --numeric-ports -p | grep java | grep LISTEN | grep 500<\/p><\/blockquote>\n<\/div>\n<p><strong>OR<\/strong><\/p>\n<div>\n<blockquote><p>sudo lsof -P -n | grep LISTEN | grep 500<\/p>\n<p>sudo kill -9 1147<\/p>\n<p>restart the HDFS service<\/p>\n<p>sudo jps # to verify results<\/p><\/blockquote>\n<\/div>\n<p><strong>OR<\/strong><\/p>\n<div>\n<p>Stop the hdfs services:<\/p>\n<blockquote><p>sudo service hadoop-hdfs-namenode stop<\/p>\n<p>sudo service hadoop-hdfs-datanode stop<\/p>\n<p>sudo service hadoop-hdfs-secondarynamenode stop<\/p>\n<p>sudo service hadoop-0.20-mapreduce-tasktracker stop<\/p>\n<p>sudo service hadoop-0.20-mapreduce-jobtracker stop<\/p><\/blockquote>\n<\/div>\n<p><strong>Reference:\u00a0<\/strong><a class=\"external-link\" href=\"http:\/\/grokbase.com\/t\/cloudera\/scm-users\/133k2jtxc2\/unable-to-start-namenode-and-hbase-master-on-cloudera-manager\" rel=\"nofollow\">http:\/\/grokbase.com\/t\/cloudera\/scm-users\/133k2jtxc2\/unable-to-start-namenode-and-hbase-master-on-cloudera-manager<\/a><\/p>\n<h2 id=\"HDFS-HDFSNameNodeStopped:HDFSServiceisDown\"><span class=\"ez-toc-section\" id=\"HDFS_NameNode_Stopped_HDFS_Service_is_Down\"><\/span>HDFS NameNode Stopped: HDFS Service is Down<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The BETA cluster NameNode has stopped after I decommissioned a node from HDFS. I have never seen this behavior before. Restarting the HDFS cluster did not resolve the problem.<\/p>\n<p>I decommissioned HDFS DataNode blvbetahdp34. 
It seems that immediately after this task the NameNode, running on blvbetahdp02 crashed &#8211; although the service remained running on the server. This caused a situation where the \/space1{2}\/dfs\/nn\/in_use.lock file was locked by the running process. I moved the in_use.lock file to tmp, but that was unnecessary. I discovered the running NameNode by searching for the port and killed the running NameNode process. A restart of the NameNode initially failed on a canary test, but I restarted the entire cluster which worked.<\/p>\n<p>Some notes:<\/p>\n<p>2014-03-31 11:39:43,862 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join<\/p>\n<p>java.io.IOException: Cannot lock storage \/space1\/dfs\/nn. The directory is already locked<\/p>\n<p>at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:634)<\/p>\n<p>at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:457)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:292)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:207)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:728)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:521)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:445)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.&lt;init&gt;(NameNode.java:621)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.&lt;init&gt;(NameNode.java:606)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1177)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1241)<\/p>\n<p>2014-03-31 11:39:43,881 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1<\/p>\n<p>2014-03-31 11:39:43,884 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:<\/p>\n<p>\/************************************************************<\/p>\n<p>SHUTDOWN_MSG: Shutting down NameNode at servername01\/192.168.210.253<\/p>\n<p>************************************************************\/<\/p>\n<p>Move in_use.lock and restart the NameNode:<\/p>\n<p>2014-03-31 12:21:12,076 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.<\/p>\n<p>2014-03-31 12:21:12,077 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join<\/p>\n<p>java.net.BindException: Problem binding to [servername01:8022] java.net.BindException: Address already in use; For more details see:\u00a0\u00a0<span class=\"nolink\">http:\/\/wiki.apache.org\/hadoop\/BindException<\/span><\/p>\n<p>at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)<\/p>\n<p>at org.apache.hadoop.ipc.Server.bind(Server.java:403)<\/p>\n<p>at org.apache.hadoop.ipc.Server$Listener.(Server.java:501)<\/p>\n<p>at org.apache.hadoop.ipc.Server.(Server.java:1894)<\/p>\n<p>at org.apache.hadoop.ipc.RPC$Server.(RPC.java:970)<\/p>\n<p>at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:375)<\/p>\n<p>at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:350)<\/p>\n<p>at org.apache.hadoop.ipc.RPC.getServer(RPC.java:695)<\/p>\n<p>at org.apache.hadoop.ipc.RPC.getServer(RPC.java:684)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:221)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:468)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:447)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:621)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:606)<\/p>\n<p>at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1177)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1241)<\/p>\n<p>2014-03-31 12:21:12,093 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1<\/p>\n<p>2014-03-31 12:21:12,095 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:<\/p>\n<p>\/************************************************************<\/p>\n<p>SHUTDOWN_MSG: Shutting down NameNode at servername01\/192.168.210.253<\/p>\n<p>************************************************************\/<\/p>\n<div>\n<blockquote><p>netstat -a -t --numeric-ports -p | grep 8022<\/p><\/blockquote>\n<\/div>\n<p>tcp\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 0\u00a0\u00a0\u00a0\u00a0\u00a0 0 192.168.210.253:8022\u00a0\u00a0\u00a0 0.0.0.0:*\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 LISTEN\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<u>31689<\/u>\/java<\/p>\n<div>\n<blockquote><p>sudo kill 31689<\/p><\/blockquote>\n<\/div>\n<p>Restart NameNode using Cloudera Manager (the canary test might fail) &#8211; then restart the entire HDFS cluster.<\/p>\n<h2 id=\"HDFS-HDFSError:Canarytestfailed\u2013Permissiondenied:error=13\"><span class=\"ez-toc-section\" id=\"HDFS_Error_Canary_test_failed_%E2%80%93_Permission_denied_error13\"><\/span>HDFS Error: Canary test failed \u2013 Permission denied: error=13<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>1.\u00a0Initially this outage presented itself as an HDFS NameNode outage. The Canary test failed and the error messages that I could find pointed to the NameNode. From the logs in Cloudera Manager:<\/p>\n<p>Canary test failed to read file in directory \/tmp\/.cloudera_health_monitoring_canary_files<\/p>\n<p>The health test result for HDFS_HA_NAMENODE_HEALTH\u00a0 has become bad: The active NameNode&#8217;s health is bad.<\/p>\n<p>2. 
Looking deeper into the NameNode logs, I discovered a Permission Denied error when running a script in the \/run folder:<\/p>\n<p><span class=\"nolink\">http:\/\/servername01:50070\/logs\/hadoop-cmf-hdfs1-NAMENODE-servername01.log.out<\/span><\/p>\n<p>2014-04-10 10:46:56,637 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: \/tmp\/.cloudera_health_monitoring_canary_files\/.canary_file_2014_04_10-10_46_56 is closed by DFSClient_NONMAPREDUCE_244274405_96<\/p>\n<p>2014-04-10 10:46:56,675 WARN org.apache.hadoop.net.ScriptBasedMapping: Exception running \/run\/cloudera-scm-agent\/process\/1080-hdfs-NAMENODE\/topology.py 192.168.210.242<\/p>\n<p>java.io.IOException: Cannot run program &#8220;\/run\/cloudera-scm-agent\/process\/1080-hdfs-NAMENODE\/topology.py&#8221; (in directory &#8220;\/run\/cloudera-scm-agent\/process\/1080-hdfs-NAMENODE&#8221;):\u00a0<a class=\"external-link\" href=\"http:\/\/java.io\/\" rel=\"nofollow\">java.io<\/a>.IOException: error=13, Permission denied<\/p>\n<p>at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)<\/p>\n<p>at org.apache.hadoop.util.Shell.runCommand(Shell.java:206)<\/p>\n<p>at org.apache.hadoop.util.Shell.run(Shell.java:188)<\/p>\n<p>at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:381)<\/p>\n<p>at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:242)<\/p>\n<p>at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:180)<\/p>\n<p>at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)<\/p>\n<p>at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.sortLocatedBlocks(DatanodeManager.java:334)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1343)<\/p>\n<p>at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:413)<\/p>\n<p>at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:172)<\/p>\n<p>at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44938)<\/p>\n<p>at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)<\/p>\n<p>at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)<\/p>\n<p>at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)<\/p>\n<p>at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)<\/p>\n<p>at java.security.AccessController.doPrivileged(Native Method)<\/p>\n<p>at javax.security.auth.Subject.doAs(Subject.java:396)<\/p>\n<p>at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)<\/p>\n<p>at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)<\/p>\n<p>Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied<\/p>\n<p>at java.lang.UNIXProcess.&lt;init&gt;(UNIXProcess.java:148)<\/p>\n<p>at java.lang.ProcessImpl.start(ProcessImpl.java:65)<\/p>\n<p>at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)<\/p>\n<p>&#8230; 19 more<\/p>\n<p>Or narrow down the search with this command:<\/p>\n<div>\n<blockquote><p>tail -n 24 \/var\/log\/hadoop-hdfs\/hadoop-cmf-hdfs1-NAMENODE-servername01.log.out|grep \"WARN org.apache.hadoop\"<\/p><\/blockquote>\n<\/div>\n<p>If you see the WARN exception, you\u2019re in trouble\u2026<\/p>\n<p>3. This pointed me to evaluate our \/run mount, which is mounted on tmpfs (shared memory). I discovered that Ubuntu had changed tmpfs to noexec, which will not allow a script to execute. Executing scripts in shared memory is usually not a good thing anyway, and Ubuntu is locking down its OS accordingly. However, Cloudera still runs scripts out of the \/run folder on tmpfs. 
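A quick way to confirm whether the mount backing \/run carries noexec before touching fstab (a sketch; findmnt ships with util-linux on modern Linux distributions):

```shell
# Sketch: report whether the mount backing /run is noexec.
# findmnt resolves the containing mount via --target.
opts=$(findmnt -no OPTIONS --target /run)
case ",$opts," in
  *,noexec,*) echo "/run is mounted noexec" ;;
  *)          echo "/run allows exec" ;;
esac
```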
To fix this I mounted tmpfs without noexec by editing the \/etc\/fstab file in the following manner (Note: you can temporarily fix the problem by mounting \/run with the exec option (sudo mount -o remount,exec \/run), but that&#8217;s less than ideal):<\/p>\n<p>Running mount shows how tmpfs is already mounted:<\/p>\n<div>\n<blockquote><p>tmpfs on \/run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)<\/p><\/blockquote>\n<\/div>\n<p>Note the noexec flag on tmpfs.<\/p>\n<div>\n<blockquote><p>sudo vi \/etc\/fstab<\/p><\/blockquote>\n<\/div>\n<p># add this line to the bottom of the fstab file:<\/p>\n<div>\n<blockquote><p>tmpfs \/run tmpfs rw,nosuid,size=10%,mode=0755 0 0<\/p><\/blockquote>\n<\/div>\n<p>4. Save the file and remount the filesystem:<\/p>\n<div>\n<blockquote><p>sudo mount -o remount \/run<\/p><\/blockquote>\n<\/div>\n<p>5. Restart HDFS.<\/p>\n<p><em>Note: Look at the tmpfs settings:<\/em><\/p>\n<div>\n<blockquote><p>sudo cat \/lib\/init\/fstab<\/p><\/blockquote>\n<\/div>\n<p># These are the filesystems that are always mounted on boot, you can<\/p>\n<p># override any of these by copying the appropriate line from this file into<\/p>\n<p># \/etc\/fstab and tweaking it as you see fit.\u00a0 See fstab(5).<\/p>\n<h2 id=\"HDFS-HDFSJournalNode:FileNotFoundException:Nosuchfileordirectory\"><span class=\"ez-toc-section\" id=\"HDFS_JournalNode_FileNotFoundException_No_such_file_or_directory\"><\/span>HDFS JournalNode:\u00a0FileNotFoundException:\u00a0No such file or directory<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div>\n<div class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_892874\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" title=\"Hint: double-click to select code\">\n<div class=\"line 
number1 index0 alt2\"><code class=\"bash plain\">Aug 15, 2:34:50.515 PM\u00a0 INFO\u00a0\u00a0\u00a0 org.apache.hadoop.ipc.Server\u00a0\u00a0\u00a0<\/code><\/div>\n<div class=\"line number2 index1 alt1\"><code class=\"bash plain\">IPC Server handler 4 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.heartbeat from 161.170.176.104:39343 Call<\/code><code class=\"bash comments\">#631824 Retry#0<\/code><\/div>\n<div class=\"line number3 index2 alt2\"><code class=\"bash plain\">java.io.FileNotFoundException:\u00a0<\/code><code class=\"bash plain\">\/space1\/dfs\/jn\/nameservice1\/current\/last-promised-epoch<\/code><code class=\"bash plain\">.tmp (No such\u00a0<\/code><code class=\"bash functions\">file<\/code>\u00a0<code class=\"bash plain\">or directory)<\/code><\/div>\n<div class=\"line number4 index3 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at java.io.FileOutputStream.open0(Native Method)<\/code><\/div>\n<div class=\"line number5 index4 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at java.io.FileOutputStream.<\/code><code class=\"bash functions\">open<\/code><code class=\"bash plain\">(FileOutputStream.java:270)<\/code><\/div>\n<div class=\"line number6 index5 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:213)<\/code><\/div>\n<div class=\"line number7 index6 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:162)<\/code><\/div>\n<div class=\"line number8 index7 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.util.AtomicFileOutputStream.&lt;init&gt;(AtomicFileOutputStream.java:58)<\/code><\/div>\n<div class=\"line number9 index8 alt2\"><code class=\"bash 
spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.util.PersistentLongFile.writeFile(PersistentLongFile.java:78)<\/code><\/div>\n<div class=\"line number10 index9 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.util.PersistentLongFile.<\/code><code class=\"bash functions\">set<\/code><code class=\"bash plain\">(PersistentLongFile.java:64)<\/code><\/div>\n<div class=\"line number11 index10 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.qjournal.server.Journal.updateLastPromisedEpoch(Journal.java:327)<\/code><\/div>\n<div class=\"line number12 index11 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:435)<\/code><\/div>\n<div class=\"line number13 index12 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.qjournal.server.Journal.heartbeat(Journal.java:418)<\/code><\/div>\n<div class=\"line number14 index13 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.heartbeat(JournalNodeRpcServer.java:155)<\/code><\/div>\n<div class=\"line number15 index14 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.heartbeat(QJournalProtocolServerSideTranslatorPB.java:172)<\/code><\/div>\n<div class=\"line number16 index15 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25423)<\/code><\/div>\n<div 
class=\"line number17 index16 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)<\/code><\/div>\n<div class=\"line number18 index17 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)<\/code><\/div>\n<div class=\"line number19 index18 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)<\/code><\/div>\n<div class=\"line number20 index19 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)<\/code><\/div>\n<div class=\"line number21 index20 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at java.security.AccessController.doPrivileged(Native Method)<\/code><\/div>\n<div class=\"line number22 index21 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at javax.security.auth.Subject.doAs(Subject.java:422)<\/code><\/div>\n<div class=\"line number23 index22 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)<\/code><\/div>\n<div class=\"line number24 index23 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)<\/code><\/div>\n<div class=\"line number25 index24 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0\u00a0<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p class=\"auto-cursor-target\">Solution: The Storage Directory should have been created, 
but it was somehow deleted. You can recreate the folder manually:<\/p>\n<\/div>\n<\/div>\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_589764\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" title=\"Hint: double-click to select code\">\n<div class=\"line number1 index0 alt2\"><code class=\"bash functions\">sudo<\/code>\u00a0<code class=\"bash functions\">mkdir<\/code>\u00a0<code class=\"bash plain\">-p\u00a0<\/code><code class=\"bash plain\">\/space1\/dfs\/jn\/nameservice1\/current\/<\/code><\/div>\n<div class=\"line number2 index1 alt1\"><code class=\"bash functions\">sudo<\/code>\u00a0<code class=\"bash functions\">chown<\/code>\u00a0<code class=\"bash plain\">-R hdfs:hdfs\u00a0<\/code><code class=\"bash plain\">\/space1\/dfs\/jn\/<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p><em>Note: This solution will lead to the\u00a0JournalNotFormattedException, explained below.<\/em><\/p>\n<h2 id=\"HDFS-HDFSJournalNode:JournalNotFormattedException:JournalStorageDirectory*notformatted\"><span class=\"ez-toc-section\" id=\"HDFS_JournalNode_JournalNotFormattedException_Journal_Storage_Directory_not_formatted\"><\/span>HDFS JournalNode:\u00a0JournalNotFormattedException:\u00a0Journal Storage Directory * not formatted<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Error:<\/p>\n<div>\n<div class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_324046\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" title=\"Hint: double-click to select 
code\">\n<div class=\"line number1 index0 alt2\"><code class=\"bash plain\">Aug 15, 2:46:01.285 PM\u00a0 WARN\u00a0\u00a0\u00a0 org.apache.hadoop.security.UserGroupInformation<\/code><\/div>\n<div class=\"line number2 index1 alt1\"><code class=\"bash plain\">PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory\u00a0<\/code><code class=\"bash plain\">\/space1\/dfs\/jn\/nameservice1<\/code>\u00a0<code class=\"bash plain\">not formatted<\/code><\/div>\n<div class=\"line number3 index2 alt2\"><code class=\"bash plain\">Aug 15, 2:46:01.286 PM\u00a0 INFO\u00a0\u00a0\u00a0 org.apache.hadoop.ipc.Server\u00a0\u00a0\u00a0<\/code><\/div>\n<div class=\"line number4 index3 alt1\"><code class=\"bash plain\">IPC Server handler 3 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.getEditLogManifest from 161.170.176.10:37369 Call<\/code><code class=\"bash comments\">#83 Retry#0<\/code><\/div>\n<div class=\"line number5 index4 alt2\"><code class=\"bash plain\">org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory\u00a0<\/code><code class=\"bash plain\">\/space1\/dfs\/jn\/nameservice1<\/code>\u00a0<code class=\"bash plain\">not formatted<\/code><\/div>\n<div class=\"line number6 index5 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:472)<\/code><\/div>\n<div class=\"line number7 index6 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:655)<\/code><\/div>\n<div class=\"line number8 index7 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at 
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:186)<\/code><\/div>\n<div class=\"line number9 index8 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:236)<\/code><\/div>\n<div class=\"line number10 index9 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)<\/code><\/div>\n<div class=\"line number11 index10 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)<\/code><\/div>\n<div class=\"line number12 index11 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)<\/code><\/div>\n<div class=\"line number13 index12 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)<\/code><\/div>\n<div class=\"line number14 index13 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)<\/code><\/div>\n<div class=\"line number15 index14 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at java.security.AccessController.doPrivileged(Native Method)<\/code><\/div>\n<div class=\"line number16 index15 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at javax.security.auth.Subject.doAs(Subject.java:422)<\/code><\/div>\n<div 
class=\"line number17 index16 alt2\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)<\/code><\/div>\n<div class=\"line number18 index17 alt1\"><code class=\"bash spaces\">\u00a0\u00a0\u00a0\u00a0<\/code><code class=\"bash plain\">at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p class=\"auto-cursor-target\"><strong>Solution:<\/strong>\u00a0The JournalNode was missing the cluster configuration file (VERSION). Create the VERSION file in the Storage Directory and add the contents of the VERSION file from another JournalNode (I&#8217;ve pasted an example below).<\/p>\n<div>\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_661311\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" title=\"Hint: double-click to select code\">\n<div class=\"line number1 index0 alt2\"><code class=\"bash functions\">sudo<\/code>\u00a0<code class=\"bash functions\">mkdir<\/code>\u00a0<code class=\"bash plain\">-p\u00a0<\/code><code class=\"bash plain\">\/space1\/dfs\/jn\/nameservice1\/current\/<\/code><\/div>\n<div class=\"line number2 index1 alt1\"><code class=\"bash functions\">sudo<\/code>\u00a0<code class=\"bash functions\">chown<\/code>\u00a0<code class=\"bash plain\">-R hdfs:hdfs\u00a0<\/code><code class=\"bash plain\">\/space1\/dfs\/jn\/nameservice1\/<\/code><\/div>\n<div class=\"line number3 index2 alt2\"><code class=\"bash comments\"># create the file<\/code><\/div>\n<div class=\"line number4 index3 alt1\"><code class=\"bash functions\">sudo<\/code>\u00a0<code class=\"bash functions\">vi<\/code>\u00a0<code class=\"bash 
\/space1\/dfs">
plain\">\/space1\/dfs\/jn\/nameservice1\/current\/VERSION<\/code><\/div>\n<div class=\"line number5 index4 alt2\"><code class=\"bash functions\">sudo<\/code>\u00a0<code class=\"bash functions\">chown<\/code>\u00a0<code class=\"bash plain\">hdfs:hdfs\u00a0<\/code><code class=\"bash plain\">\/space1\/dfs\/jn\/nameservice1\/current\/VERSION<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h3 id=\"HDFS-AzureDev\" class=\"auto-cursor-target\"><span class=\"ez-toc-section\" id=\"Azure_Dev\"><\/span>Azure Dev<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>#Wed Jul 29 23:37:46 PDT 2015<br \/>\nnamespaceID=1236816086<br \/>\nclusterID=cluster18<br \/>\ncTime=1438238263778<br \/>\nstorageType=JOURNAL_NODE<br \/>\nlayoutVersion=-60<\/p>\n<h2 id=\"HDFS-HDFSUnderReplicatedBlocks\"><span class=\"ez-toc-section\" id=\"HDFS_Under_Replicated_Blocks\"><\/span>HDFS Under Replicated Blocks<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Details: This is an HDFS service-level health check that checks that the number of under-replicated blocks does not rise above some percentage of the cluster&#8217;s total blocks. A failure of this health check may indicate a loss of DataNodes. Use the HDFS fsck command to identify which files contain under-replicated blocks.<\/p>\n<p>There are legitimate reasons to manage the replication level of data on a running Hadoop system. For example, if blocks are not distributed evenly across your DataNodes, you can increase replication temporarily and then bring it back down.<\/p>\n<p>To set replication of an individual file to 4:<\/p>\n<div>\n<blockquote><p>sudo -u hdfs hadoop dfs -setrep -w 4 \/path\/to\/file<\/p><\/blockquote>\n<\/div>\n<p>You can also do this recursively. 
To change the replication factor of the entire HDFS filesystem to 1:<\/p>\n<div>\n<blockquote><p>sudo -u hdfs hadoop dfs -setrep -R -w 1 \/<\/p><\/blockquote>\n<\/div>\n<p>To script a fix for under-replicated blocks in HDFS, try the following:<\/p>\n<div class=\"code panel pdl conf-macro output-block\">\n<div class=\"codeContent panelContent pdl\">\n<div>\n<div id=\"highlighter_942533\" class=\"syntaxhighlighter sh-confluence nogutter bash\">\n<table border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td class=\"code\">\n<div class=\"container\" title=\"Hint: double-click to select code\">\n<div class=\"line number1 index0 alt2\"><code class=\"bash comments\">####Fix under-replicated blocks###<\/code><\/div>\n<div class=\"line number2 index1 alt1\"><code class=\"bash functions\">su<\/code>\u00a0<code class=\"bash plain\">- &lt;$hdfs_user&gt;<\/code><\/div>\n<div class=\"line number3 index2 alt2\"><\/div>\n<div class=\"line number4 index3 alt1\"><code class=\"bash plain\">hdfs\u00a0<\/code><code class=\"bash functions\">fsck<\/code>\u00a0<code class=\"bash plain\">\/ |\u00a0<\/code><code class=\"bash functions\">grep<\/code>\u00a0<code class=\"bash string\">'Under replicated'<\/code>\u00a0<code class=\"bash plain\">|\u00a0<\/code><code class=\"bash functions\">awk<\/code>\u00a0<code class=\"bash plain\">-F<\/code><code class=\"bash string\">':'<\/code>\u00a0<code class=\"bash string\">'{print $1}'<\/code>\u00a0<code class=\"bash plain\">&gt;&gt;\u00a0<\/code><code class=\"bash plain\">\/tmp\/under_replicated_files<\/code><\/div>\n<div class=\"line number5 index4 alt2\"><\/div>\n<div class=\"line number6 index5 alt1\"><code class=\"bash keyword\">for<\/code>\u00a0<code class=\"bash plain\">hdfsfile\u00a0<\/code><code class=\"bash keyword\">in<\/code>\u00a0<code class=\"bash plain\">`<\/code><code class=\"bash functions\">cat<\/code>\u00a0<code class=\"bash plain\">\/tmp\/under_replicated_files<\/code><code class=\"bash plain\">`;\u00a0<\/code><code class=\"bash 
keyword\">do<\/code>\u00a0<code class=\"bash functions\">echo<\/code>\u00a0<code class=\"bash string\">\"Fixing $hdfsfile :\"<\/code>\u00a0<code class=\"bash plain\">; hadoop fs -setrep 3 $hdfsfile;\u00a0<\/code><code class=\"bash keyword\">done<\/code><\/div>\n<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2 id=\"HDFS-HDFSMissingBlocks\"><span class=\"ez-toc-section\" id=\"HDFS_Missing_Blocks\"><\/span>HDFS Missing Blocks<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>To discover what blocks have not been replicated, browse to the NameNode Web UI under the hdfs Service in Cloudera Manager. Then click on the Warning link in the Cluster Summary section of the page.<\/p>\n<p>Or browse to the page locally using the w3m web browser:<\/p>\n<div>\n<blockquote><p>w3m &#8216;http:\/\/NameNodeServerName:50070\/corrupt_files.jsp&#8217;<\/p><\/blockquote>\n<\/div>\n<p>On the Warning page you will have a list of Reported Corrupt Files. You can either replicate these files (if you can locate them), or delete these files.<\/p>\n<p>To delete the files, copy the file name and run the following command on the NameNode server:<\/p>\n<div>\n<blockquote><p># delete a file<\/p>\n<p>hadoop fs -rm hdfs:\/\/servername01:8020\/user\/username\/asset\/attempt_1416875798797.txt<\/p>\n<p># delete a folder or file recursively<\/p>\n<p>sudo -u hdfs hadoop dfs -rmr hdfs:\/\/servername01:8020\/user\/oozie\/.Trash\/131007170000\/user\/oozie\/share\/lib\/sqoop\/hive-builtins-0.10.0-cdh4.4.0.jar<\/p><\/blockquote>\n<\/div>\n<p>You may have to run the command more than once if it places the file in the hdfs user\u2019s trash.<\/p>\n<p>The process of deleting missing blocks has been automated with the Hadoop\\Delete Missing Blocks from HDFS Orchestrator Runbook.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. 
HDFS creates multiple replicas of data blocks and distributes them on [&#8230;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"class_list":["post-1197","page","type-page","status-publish","hentry"],"jetpack_shortlink":"https:\/\/wp.me\/P1BQ8S-jj","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1197","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1197"}],"version-history":[{"count":5,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1197\/revisions"}],"predecessor-version":[{"id":1441,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1197\/revisions\/1441"}],"wp:attachment":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1197"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}