Developer's Closet
A place where I can put my PHP, SQL, Perl, JavaScript, and VBScript code.

1 Jul 2014
HBase All Regions in Transition: state=FAILED_OPEN

After I added a jar file to the HBase Master, I ran into a problem where regions failed to transition to a RegionServer. The errors are below; removing the jar file from the hbase/lib folder (full path: /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hbase/lib/) resolved the problem. What tipped me off was the missing class definition: Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/ipc/CoprocessorProtocol.
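If you run into the same thing, the fix is simply to pull the custom jar back out of the HBase lib directory and restart the service. A minimal sketch, where my-coprocessor.jar stands in for whatever jar you deployed:

# List the jars in the HBase lib folder and spot the one you added
ls -l /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hbase/lib/

# Move the custom jar out of the way (my-coprocessor.jar is a placeholder name)
sudo mv /opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hbase/lib/my-coprocessor.jar /tmp/

# Then restart the HBase service from Cloudera Manager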

Failed open of region=REGION.NAME,,4194066667839.6ea7d7ff9276f9c0e9b126c73e25bc54., starting to roll back the global memstore size.
java.lang.IllegalStateException: Could not instantiate a region instance.
at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3970)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4276)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4249)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4205)
at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4156)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:475)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:140)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedConstructorAccessor7.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3967)
... 10 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/ipc/CoprocessorProtocol
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
...

9:10:19.721 AM INFO org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler
Opening of region {ENCODED => 6ea7d7ff9276f9c0e9b126c73e25bc54, NAME => 'REGION.NAME,,4194066667839.6ea7d7ff9276f9c0e9b126c73e25bc54.', STARTKEY => '', ENDKEY => ''} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 28
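Once the jar was removed and the service restarted, the regions opened normally. To confirm nothing is still stuck in transition, the HBase shell's detailed status is handy (a quick check, not specific to this error):

# Print detailed cluster status, including any regions in transition
echo "status 'detailed'" | sudo -u hbase hbase shell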

24 Jun 2014
Cloudera Manager HBase Check HFile Version

An under-documented feature in Cloudera Manager is the HBase "Check HFile Version" action. When upgrading from CDH 4.7 to CDH 5.0, I ran across instructions stating that the HBase upgrade will not succeed if any HFiles with version 1 are present. Run "Check HFile Version" from the HBase service Actions menu to make sure HBase is ready for the upgrade.

After the check runs and Cloudera Manager reports "Process (###) has reached expected state", you are looking for the message that no files with v1 were found and HBase can be upgraded. Otherwise, the affected HBase regions will have to be compacted first; there's a sketch of that after the stderr output below, and I'll post more on it later.

In the Stderr output of the command, look for the following:

INFO util.HFileV1Detector: Count of HFileV1: 0
INFO util.HFileV1Detector: Count of corrupted files: 0
INFO util.HFileV1Detector: Count of Regions with HFileV1: 0
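If the check does report v1 HFiles, a major compaction rewrites a region's files in the current format. A minimal sketch from the HBase shell, with MY_TABLE standing in for whichever table still holds v1 files:

# Ask HBase to major compact the table so its HFiles are rewritten in the newer format
echo "major_compact 'MY_TABLE'" | sudo -u hbase hbase shell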
12 Jun 2014
Cannot Start HBase Master: SplitLogManager: Error Splitting

I could not start HBase within Cloudera Manager; the service reported errors. I was initially confused because I could start the Master when the RegionServers were stopped, but as soon as I started a RegionServer, the Master went down. I tracked the problem down to an unexpected reboot of the node running the HBase Master. After the Master restarted, HBase was not able to continue reading from the transaction log because the log had become unusable (corrupt). I had to delete the broken file before restarting the Master node.

Digging through the logs (sudo tail /var/log/hbase/hbase-cmf-hbase1-MASTER-ServerName.log.out), I discovered:

java.io.IOException: error or interrupted while splitting logs in [hdfs://ServerName:8020/hbase/.logs/ServerName,60020,1393982440484-splitting] Task = installed = 1 done = 0 error = 1

In the log file, look for the file that cannot be split:

hdfs://ServerName:8020/hbase/.logs/ServerName,60020,1393982440484-splitting
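Rather than scanning the log by eye, a quick grep for the splitting directory pulls out the relevant lines (assuming the same log file as above):

# Find the lines that reference the -splitting directory
sudo grep splitting /var/log/hbase/hbase-cmf-hbase1-MASTER-ServerName.log.out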

Then search HDFS for the file:

sudo -u hdfs hadoop fs -ls /hbase/.logs
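You can also check the size directly; hadoop fs -du reports the length in bytes, so the corrupt, empty log shows up as 0 (same path as above):

# Show the size in bytes of everything under /hbase/.logs
sudo -u hdfs hadoop fs -du /hbase/.logs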

Note that the file is 0 KB. Next, move the offending file:

sudo -u hdfs hadoop fs -mv /hbase/.logs/ServerName,60020,1393982440484-splitting /tmp/ServerName,60020,1393982440484-splitting.old

Restart the HBase Master service. The splitting log file can be replayed to recover any lost data, but I didn't look into that because there was no data to recover.
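If there had been edits worth recovering, the moved log could in principle be replayed against its table with the WALPlayer MapReduce job. A hedged sketch only, since I didn't need to run it; the directory and MY_TABLE are placeholders, and WALPlayer options vary by HBase version:

# Replay the write-ahead log edits from the moved directory into MY_TABLE
sudo -u hbase hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /tmp/ServerName,60020,1393982440484-splitting.old MY_TABLE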

Note: Here is a fantastic HBase command to identify and fix inconsistencies in HBase:

sudo -u hbase hbase hbck -fix
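To see what hbck would change before letting it repair anything, run it without -fix first; the -details flag prints a full report of every region (check hbase hbck -h on your version for the exact options):

# Report inconsistencies only, without repairing anything
sudo -u hbase hbase hbck -details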