Sqoop

Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. The version supported by Cloudera Manager is Sqoop 2.

Configure Sqoop

Install Sqoop

Sqoop requires a Sqoop 2 Server. We usually colocate the Sqoop 2 Server with Oozie or, if necessary, place it on the node running the HDFS NameNode. Try to keep the Sqoop service off nodes running YARN NodeManagers and HBase RegionServers, because memory on those nodes is already heavily committed.

  1. From Cloudera Manager, click Add a New Service.
  2. Select the Sqoop 2 service and add the service to a node, preferably a node running the HDFS NameNode or Oozie. Keep the Sqoop service off nodes running YARN NodeManagers and HBase RegionServers.
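
The placement guidance above can be expressed as a simple filter. The following is a minimal sketch only: the role names and the host inventory are hypothetical placeholders for illustration, not values read from Cloudera Manager.

```python
# Hypothetical role labels, chosen for illustration only.
PREFERRED = {"NAMENODE", "OOZIE_SERVER"}
AVOID = {"NODEMANAGER", "REGIONSERVER"}


def pick_sqoop_host(hosts):
    """Return a host name for the Sqoop 2 Server, or None if none qualifies.

    hosts maps a host name to the list of roles already running on it.
    """
    # Drop every host that runs a NodeManager or a RegionServer.
    candidates = [
        (name, roles) for name, roles in hosts.items()
        if not (set(roles) & AVOID)
    ]
    # Hosts running the NameNode or Oozie sort first; names break ties.
    candidates.sort(key=lambda item: (not (set(item[1]) & PREFERRED), item[0]))
    return candidates[0][0] if candidates else None


# Hypothetical inventory.
example_hosts = {
    "worker01": ["DATANODE", "NODEMANAGER"],
    "master01": ["NAMENODE"],
    "edge01": ["GATEWAY"],
}
print(pick_sqoop_host(example_hosts))  # master01
```

If no host passes the filter, the function returns None, which would mean the placement has to be decided by hand.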

Configure Sqoop

Configuration: Sqoop 2 Server Metastore Directory
Description: Directory where the Sqoop 2 Server places its metastore data. This is used only when Sqoop Repository Database Type is Derby.
Value: /space1/sqoop2
Calculation: Do not place the directory under /root, because the metastore files can grow unexpectedly.

Configuration: Java Heap Size of Sqoop 2 Server in Bytes
Description: Maximum size in bytes for the Java process heap memory. Passed to Java -Xmx.
Value: 1 GB
Calculation: Only a small amount of heap is required. Use the size of the largest file to be ingested as an indication of the heap size.

Configuration: Sqoop 2 Server Advanced Configuration Snippet (Safety Valve) for sqoop.properties
Description: A string to be inserted into sqoop.properties for this role only.
Value: org.apache.sqoop.connector.autoupgrade=true
Calculation: Allows Sqoop to upgrade its connectors automatically.
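
Because the heap setting is entered in bytes, it can help to work the number out explicitly. The sketch below is one possible reading of the sizing guidance, not an official formula: it assumes that "1 GB" means 1 GiB (1024^3 bytes) and that the heap is the larger of that floor and the largest file to be ingested. The function names and the 300 MiB example file are hypothetical.

```python
GIB = 1024 ** 3  # assumption: "1 GB" in the table means 1 GiB


def suggested_heap_bytes(largest_file_bytes, floor_bytes=GIB):
    """Return the larger of the floor and the largest file to ingest."""
    return max(floor_bytes, largest_file_bytes)


def xmx_flag(heap_bytes):
    """Format the value the way it would be passed to the JVM."""
    return f"-Xmx{heap_bytes}"


# Hypothetical example: the largest file to ingest is 300 MiB.
heap = suggested_heap_bytes(largest_file_bytes=300 * 1024 ** 2)
print(heap)            # 1073741824
print(xmx_flag(heap))  # -Xmx1073741824
```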

Troubleshooting

Sqoop Server Startup Failure: Upgrade required but not allowed

Problem: After an upgrade from CDH 5.0.2 to CDH 5.0.3, Sqoop failed to start with the following error: Server startup failure, Connector registration failed, Upgrade required but not allowed - Connector: generic-jdbc-connector.

Resolution: Add the following property to the Sqoop 2 Server Advanced Configuration Snippet (Safety Valve) for sqoop.properties, under Cloudera Manager, Sqoop Service, Configuration, Sqoop 2 Server Default Group, Advanced:

org.apache.sqoop.connector.autoupgrade=true

After the upgrade has completed successfully, the property can be removed.

Log File: /var/log/sqoop2/sqoop-cmf-sqoop-SQOOP_SERVER-servername01.log.out

Server startup failure
org.apache.sqoop.common.SqoopException: CONN_0007:Connector registration failed
at org.apache.sqoop.connector.ConnectorManager.registerConnectors(ConnectorManager.java:236)
at org.apache.sqoop.connector.ConnectorManager.initialize(ConnectorManager.java:197)
at org.apache.sqoop.connector.ConnectorManager.initialize(ConnectorManager.java:145)

Caused by: org.apache.sqoop.common.SqoopException: JDBCREPO_0026:Upgrade required but not allowed - Connector: generic-jdbc-connector
at org.apache.sqoop.repository.JdbcRepository$3.doIt(JdbcRepository.java:190)
at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:90)
at org.apache.sqoop.repository.JdbcRepository.doWithConnection(JdbcRepository.java:61)
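
To confirm that a startup failure matches this case, the log can be searched for the JDBCREPO_0026 signature. The following is a minimal sketch: the helper name is hypothetical, and the sample text is copied from the excerpt above rather than read from a live log file.

```python
import re

# Matches the error code and captures the connector name. The single
# separator character before "Connector:" is matched loosely, because it
# may appear as a hyphen or a dash depending on how the log was copied.
UPGRADE_ERROR = re.compile(
    r"JDBCREPO_0026:Upgrade required but not allowed\s+\S\s+Connector:\s*(\S+)"
)


def connectors_needing_upgrade(log_text):
    """Return the sorted, de-duplicated connector names blocked by the error."""
    return sorted(set(UPGRADE_ERROR.findall(log_text)))


sample = (
    "org.apache.sqoop.common.SqoopException: CONN_0007:Connector registration failed\n"
    "Caused by: org.apache.sqoop.common.SqoopException: JDBCREPO_0026:Upgrade required "
    "but not allowed - Connector: generic-jdbc-connector\n"
)

found = connectors_needing_upgrade(sample)
if found:
    print("Connectors blocked:", ", ".join(found))
    print("Add to the safety valve: org.apache.sqoop.connector.autoupgrade=true")
```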

Sqoop does not start on the Hadoop cluster after a Sqoop service restart

Resolution: Recreate the Sqoop database, then start the Sqoop 2 service.

Log File:

Can't fetch repository structure version.
org.apache.commons.dbcp.SQLNestedException: Borrow prepareStatement from pool failed
at org.apache.commons.dbcp.PoolingConnection.prepareStatement(PoolingConnection.java:113)
at org.apache.commons.dbcp.DelegatingConnection.prepareStatement(DelegatingConnection.java:281)
at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.prepareStatement(PoolingDataSource.java:313)

Caused by: java.sql.SQLSyntaxErrorException: Schema 'SQOOP' does not exist
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)

Caused by: java.sql.SQLException: Schema 'SQOOP' does not exist
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)

Caused by: ERROR 42Y07: Schema 'SQOOP' does not exist
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.getSchemaDescriptor(Unknown Source)
at org.apache.derby.impl.sql.compile.QueryTreeNode.getSchemaDescriptor(Unknown Source)
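
In a nested trace like this one, the last "Caused by:" line carries the root cause. The sketch below shows one way to extract it and to check for the missing-schema case. The helper names are hypothetical, and the sample is abbreviated from the excerpt above.

```python
def root_cause(log_text):
    """Return the text of the last 'Caused by:' line, or None if absent."""
    prefix = "Caused by:"
    cause = None
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            # Later matches overwrite earlier ones, so the innermost cause wins.
            cause = stripped[len(prefix):].strip()
    return cause


def is_missing_sqoop_schema(log_text):
    """True when the root cause is Derby error 42Y07 for the SQOOP schema."""
    cause = root_cause(log_text)
    return cause is not None and "42Y07" in cause and "SQOOP" in cause


sample = (
    "org.apache.commons.dbcp.SQLNestedException: Borrow prepareStatement from pool failed\n"
    "Caused by: java.sql.SQLSyntaxErrorException: Schema 'SQOOP' does not exist\n"
    "Caused by: java.sql.SQLException: Schema 'SQOOP' does not exist\n"
    "Caused by: ERROR 42Y07: Schema 'SQOOP' does not exist\n"
)

print(root_cause(sample))
print(is_missing_sqoop_schema(sample))
```

A True result would point to the resolution described in this section.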