HBase: ReplicationLogCleaner: Failed to get stat of replication rs node

The oldWALs folder in HBase has been growing for the past few days: it is now at 1 TB and still growing, because the old WALs are not being deleted. Looking into the Master logs, I found the following exception:

Error:

May 25, 9:13:17.118 AM ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper
ZooKeeper getData failed after 4 attempts
May 25, 9:13:17.118 AM WARN org.apache.hadoop.hbase.zookeeper.ZKUtil
replicationLogCleaner-0x65de7ccd821943e, quorum=zk.server01:2181,zk.server02:2181,zk.server03:2181, baseZNode=/hbase Unable to get data of znode /hbase/replication/rs
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)

May 25, 9:13:17.118 AM ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher
replicationLogCleaner-0x65de7ccd821943e, quorum=zk.server01:2181,zk.server02:2181,zk.server03:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)

May 25, 9:13:17.119 AM WARN org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner
ReplicationLogCleaner received abort, ignoring. Reason: Failed to get stat of replication rs node
May 25, 9:13:17.119 AM WARN org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner
Failed to read zookeeper, skipping checking deletable files
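
To check for this condition without reading the Master log by hand, a small script can count the two telltale messages from the excerpt above. This is only a rough sketch: the function name scan_master_log is made up for illustration, and the matched strings are copied from this particular log, so other HBase versions may word them differently.

```python
# Substrings copied from the Master log excerpt above.
CLEANER_SKIPPED = "Failed to read zookeeper, skipping checking deletable files"
SESSION_EXPIRED = "Session expired for /hbase/replication/rs"


def scan_master_log(lines):
    """Count log lines suggesting the ReplicationLogCleaner is stuck.

    `lines` is any iterable of strings, for example an open file handle.
    """
    counts = {"cleaner_skipped": 0, "session_expired": 0}
    for line in lines:
        if CLEANER_SKIPPED in line:
            counts["cleaner_skipped"] += 1
        if SESSION_EXPIRED in line:
            counts["session_expired"] += 1
    return counts


# Example usage (the log path is hypothetical):
#   with open("/var/log/hbase/hbase-master.log", errors="replace") as fh:
#       print(scan_master_log(fh))
```

If cleaner_skipped keeps increasing across runs, the cleaner is still skipping its deletion pass.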

Resolution:

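Restarting the HBase Master service should correct this problem. The log shows that the ReplicationLogCleaner's ZooKeeper session has expired and that the cleaner ignores the resulting abort, so it apparently does not reconnect on its own. If you are running HA (more than one HBase Master), you should not notice any impact on service. If not, only DDL changes will be affected while the Master is down, so the risk is low. After the restart, the ReplicationLogCleaner should be able to connect to ZooKeeper again and resume its work. The oldWALs should then start clearing shortly.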
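
To confirm that the folder is actually shrinking after the restart, its size can be sampled a few times with hdfs dfs -du -s. The following is a minimal sketch, not a tested monitoring tool. It assumes the hdfs command is on the PATH and that the HBase root directory is /hbase, so the old WALs live in /hbase/oldWALs; adjust the path to match your hbase.rootdir. The function names are made up for illustration.

```python
import subprocess

# Assumption: hbase.rootdir is /hbase; change this to match your cluster.
OLD_WALS_DIR = "/hbase/oldWALs"


def parse_du_summary(output):
    """Return the byte count from the first column of `hdfs dfs -du -s` output."""
    first_line = output.strip().splitlines()[0]
    return int(first_line.split()[0])


def old_wals_bytes(path=OLD_WALS_DIR):
    """Run `hdfs dfs -du -s <path>` and return the directory size in bytes."""
    result = subprocess.run(
        ["hdfs", "dfs", "-du", "-s", path],
        capture_output=True,
        text=True,
        check=True,  # raise if the hdfs command fails
    )
    return parse_du_summary(result.stdout)


# Example usage on a cluster node:
#   print(old_wals_bytes() / 1024**4, "TiB")
```

Running this a few minutes apart should show the number going down once the cleaner is working again.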
