Hue

Hue is a graphical user interface for Hadoop. Hue applications are collected into a desktop-style environment and delivered as a Web application, requiring no additional installation for individual users.

Configure Hue

Install Hue

Cloudera Manager distributes Hue in CDH and offers the following services:

  • Hue Server – For small clusters of fewer than 10 nodes, you can place the Hue service on the same node as the active HDFS NameNode. For larger clusters or for production, expect Hue to require more memory, and configure Hue to use MySQL instead of the default PostgreSQL database. To use Hue with HBase, make sure that the HBase Thrift service is installed (see HBase for more information about the HBase Thrift service).

Install and Configure Hue

  1. Browse to Cloudera Manager and select the down arrow next to the “host”.
  2. Select “Add Service”.
  3. Select Hue.
  4. On the “Add Service Wizard” page, click the box under Hue Service.
  5. Select the node running the active NameNode (NN).
  6. Click Continue.
  7. The service then installs and restarts.
  8. Deploy the client configuration (this will likely require a restart).
  9. To configure the service, click Hue on the Cloudera Manager site.
  10. Click Configuration.
  11. Search for each configuration below using the search box located under “Filters”.

Configure an LDAP Backend

On the main Cloudera Manager site, click Hue and select Configuration. Then click the Security category.

Configuration                      Property               Value
Authentication Backend             backend                desktop.auth.backend.LdapBackend
LDAP URL                           ldap_url               ldap://company.com
Enable LDAP TLS                    use_start_tls          True
Active Directory Domain            nt_domain              company.com
Create LDAP users on login         create_users_on_login  True
LDAP Search Base                   base_dn                OU=Organization,DC=company,DC=com
LDAP Bind User Distinguished Name  bind_dn                HUEServerName##
LDAP Bind Password                 bind_password          *****
LDAP User Filter                   user_filter            objectclass=*
LDAP Username Attribute            user_name_attr         sAMAccountName
LDAP Group Filter                  group_filter           objectclass=*
LDAP Group Name Attribute          group_name_attr        cn

For more information, refer to the Hue Installation Guide: http://cloudera.github.io/hue/docs-2.0.1/manual.html
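Before entering the LDAP values in Cloudera Manager, it can help to sanity-check them. The following is a minimal sketch, not part of Hue: the function name check_ldap_settings and the specific checks are assumptions chosen for illustration, and the values are the placeholders from the table above.

```python
from urllib.parse import urlparse

# Placeholder values copied from the LDAP configuration table; replace with your own.
LDAP_SETTINGS = {
    "backend": "desktop.auth.backend.LdapBackend",
    "ldap_url": "ldap://company.com",
    "use_start_tls": "True",
    "nt_domain": "company.com",
    "base_dn": "OU=Organization,DC=company,DC=com",
    "user_name_attr": "sAMAccountName",
}


def check_ldap_settings(settings):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    url = urlparse(settings.get("ldap_url", ""))
    if url.scheme not in ("ldap", "ldaps"):
        problems.append("ldap_url must start with ldap:// or ldaps://")
    if not url.hostname:
        problems.append("ldap_url has no host name")
    # StartTLS upgrades a plain ldap:// connection, so it is normally
    # not combined with an ldaps:// URL (assumption: typical LDAP client behavior).
    if url.scheme == "ldaps" and settings.get("use_start_tls") == "True":
        problems.append("use_start_tls should not be combined with an ldaps:// URL")
    # Every component of the search base should look like KEY=value.
    for part in settings.get("base_dn", "").split(","):
        if "=" not in part:
            problems.append("base_dn component %r is not of the form KEY=value" % part)
    return problems


print(check_ldap_settings(LDAP_SETTINGS))
```

These checks only catch formatting mistakes. They do not contact the LDAP server, so a clean result does not prove that the bind user or filters are correct.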

Test Hue

  1. Connect to http://hue.servername01:9090 (use the host and port of your own Hue server; Hue’s default web port is 8888).
  2. Log in with your Hue credentials

If the HBase Browser in the Hue UI fails to connect, there may be a problem with the HBase Thrift Server. After you install Hue, make sure that your HBase installation includes the HBase Thrift Server; otherwise the Hue HBase Browser reports this error: HBase browser couldn’t connect to localhost:9090

The reason: Hue 2.5.0 introduced a feature called “HBase Browser” for quickly browsing large tables and accessing HBase content. You can also create new tables, add data, modify existing cells, and filter data with the auto-completing search bar. If you click the “HBase Browser” icon and get “API error: couldn’t connect to localhost:9090”, you probably do not have an HBase Thrift Server running.

The fix: In Cloudera Manager, go to “All Services” -> “hbase1” -> “Instances”. Under “Role Instances”, click “Add”, choose a node to be the “HBase Thrift Server”, and then start the Thrift server. By default, Hue looks for the Thrift server on its own host (localhost) on port 9090, so make sure Hue knows which node runs the Thrift server.
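To tell a missing Thrift server apart from other problems, you can probe the port directly from the Hue node. This is a minimal sketch using only the Python standard library; the function name port_is_open is an assumption, and the host and port to probe are the ones from the error message (localhost:9090) or whichever node you assigned the Thrift role to.

```python
import socket


def port_is_open(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False


# Example (run on the Hue node):
#   port_is_open("localhost", 9090)
```

A True result only shows that something is listening on the port, not that it is a healthy HBase Thrift Server.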

Inspecting the Hue Database

Hue requires a SQL database to store small amounts of data, including user account information and the history of job submissions and Hive queries. By default, Hue is configured to use either PostgreSQL or the embedded SQLite database for this purpose, which should require no configuration or management by the administrator. However, MySQL is the recommended database to use; this section contains instructions for configuring Hue to access MySQL and other databases.

The default SQLite database used by Hue is located in /usr/share/hue/desktop/desktop.db (on the installation used in the examples in this document, it is /var/lib/hue/desktop.db). You can inspect this database from the command line using the sqlite3 program.

Pig scripts are stored in the following tables:

pig_document

pig_pigscript

For example:

# sqlite3 /var/lib/hue/desktop.db
SQLite version 3.6.22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .tables
sqlite> .schema auth_user
sqlite> select username from auth_user;
admin
test
sample
sqlite> .quit
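The same inspection can be scripted. The sketch below uses Python’s built-in sqlite3 module and opens the file read-only so that it cannot modify the Hue database; the function names list_tables and list_usernames are assumptions made for this example.

```python
import sqlite3


def _connect_read_only(db_path):
    # mode=ro makes SQLite refuse writes and fail if the file does not exist.
    return sqlite3.connect("file:%s?mode=ro" % db_path, uri=True)


def list_tables(db_path):
    """Return the names of all tables in the database, sorted."""
    conn = _connect_read_only(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


def list_usernames(db_path):
    """Return the user names stored in auth_user, sorted."""
    conn = _connect_read_only(db_path)
    try:
        rows = conn.execute(
            "SELECT username FROM auth_user ORDER BY username"
        ).fetchall()
        return [username for (username,) in rows]
    finally:
        conn.close()


# Example: list_usernames("/var/lib/hue/desktop.db")
```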

Troubleshooting

Hue Cannot See Pig Scripts After Upgrade

Missing Pig scripts: After upgrading from CDH 4.7 to CDH 5.1.0, the Hue landing page displays this error:

Server Error (500)

Sorry, there’s been an error. An email was sent to your administrators. Thank you for your patience.

More Info:

File Name   Line Number   Function Name
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/core/handlers/base.py   111   get_response
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/views.py   56   home
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/api.py   37   _get_docs
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/db/models/sql/compiler.py   763   results_iter
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/db/models/sql/compiler.py   818   execute_sql
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/db/backends/sqlite3/base.py   344   execute

Cause: The upgrade failed to create all the tables that Hue requires.

Resolution:

1. Go to the Hue directory:

cd /var/lib/hue

2. Back up the database:

cp desktop.db desktop.db.back

3. Sync the database by running syncdb:

/opt/cloudera/parcels/CDH/lib/hue/build/env/bin/hue syncdb --noinput

4. Run the migrations, deleting any ghost migration records:

/opt/cloudera/parcels/CDH/lib/hue/build/env/bin/hue migrate --delete-ghost-migrations
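The backup step and a before-and-after check for missing tables can be sketched in Python as follows. The function names and the EXPECTED_TABLES list are assumptions for illustration: the list contains only tables named in this document, not the full set Hue needs. Copying the file is safest while Hue is stopped, because a SQLite file copied during a write can be inconsistent.

```python
import shutil
import sqlite3
import time

# Only the tables mentioned in this document; Hue uses many more.
EXPECTED_TABLES = ["pig_document", "pig_pigscript", "desktop_documenttag"]


def backup_db(db_path):
    """Copy the database next to itself with a timestamp suffix; return the new path."""
    backup_path = "%s.%s.back" % (db_path, time.strftime("%Y%m%d-%H%M%S"))
    shutil.copy2(db_path, backup_path)
    return backup_path


def missing_tables(db_path, expected=EXPECTED_TABLES):
    """Return the expected tables that do not exist in the database."""
    conn = sqlite3.connect("file:%s?mode=ro" % db_path, uri=True)
    try:
        present = {
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
    finally:
        conn.close()
    return [table for table in expected if table not in present]


# Example: backup_db("/var/lib/hue/desktop.db"); missing_tables("/var/lib/hue/desktop.db")
```

An empty list from missing_tables after syncdb and migrate suggests that the tables named above now exist.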

Hue is Running Slow

On some installations the Hue service shares its node with the Cloudera Manager Service, which can use a large amount of memory. If Hue is running slowly, the node may be too busy. Restart the Cloudera Manager Service and watch memory usage. If the problem persists, consider reinstalling Hue on another node.

  1. Check whether memory is a problem on the node: browse to Cloudera Manager and select the node.
  2. Check how much memory is in use and whether the indicator is in the red or yellow zone. For example, 80% used is generally fine for Hue; much higher than that and you will notice slowness.
  3. If too much memory is in use, restart the Cloudera Manager Service as described in the next steps.
  4. In Cloudera Manager, click Clusters and select Cloudera Manager Service.
  5. Within the Cloudera Manager Service, click Actions, then Restart.
  6. Make sure the service comes back up. Memory usage should drop noticeably and Hue should be more responsive.
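The memory check in step 2 can also be done on the node itself. The sketch below computes the used-memory percentage from the text of /proc/meminfo on Linux; the function name memory_used_percent is an assumption, and the figure may differ slightly from what Cloudera Manager displays because the two may count caches differently.

```python
def memory_used_percent(meminfo_text):
    """Compute used memory as a percentage from the contents of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # sizes are reported in kB
    total = values["MemTotal"]
    # MemAvailable is absent on older kernels; fall back to free + buffers + cache.
    available = values.get(
        "MemAvailable",
        values.get("MemFree", 0) + values.get("Buffers", 0) + values.get("Cached", 0),
    )
    return 100.0 * (total - available) / total


# Example (on the Hue node):
#   with open("/proc/meminfo") as f:
#       print(memory_used_percent(f.read()))
```

A result well above the 80% mentioned in step 2 would be a reason to restart the Cloudera Manager Service or move Hue.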

Hue is not responding – DatabaseError: database is locked

Problem: Hue does not open; the browser spins but never presents a page.

Resolution: I had to restart the service twice. On the second attempt I took Hue completely down for about a minute to make sure the database had stopped completely. I then started the service and Hue was able to connect.

The log shows the following:

DatabaseError: database is locked
[14/Oct/2014 13:10:00 -0700] base         ERROR    Internal Server Error: /pig/dashboard/
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/oozie/src/oozie/views/dashboard.py", line 88, in decorate
return view_func(request, *args, **kwargs)
File "/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/pig/src/pig/views.py", line 58, in dashboard
hue_jobs = Document.objects.available(PigScript, request.user, with_history=True)

File "/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/models/query.py", line 445, in get_or_create
return self.get(**lookup), False
File "/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/models/sql/compiler.py", line 818, in execute_sql
cursor.execute(sql, params)
File "/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/backends/sqlite3/base.py", line 344, in execute
return Database.Cursor.execute(self, query, params)
DatabaseError: database is locked

[14/Oct/2014 15:00:51 -0700] api          ERROR    An error happen while watching the demo running: 'NoneType' object has no attribute 'group'
[14/Oct/2014 15:00:51 -0700] api          ERROR    An error happen while watching the demo running: 'NoneType' object has no attribute 'group'
[14/Oct/2014 15:00:52 -0700] api          ERROR    An error happen while watching the demo running: 'NoneType' object has no attribute 'group'
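To check whether another process still holds a write lock on the SQLite file, you can try to take the lock yourself with a short timeout. This is a sketch using Python’s sqlite3 module; the function name is_locked is an assumption, and the probe briefly takes and releases a write lock, so it is best run while Hue is stopped or idle.

```python
import sqlite3


def is_locked(db_path, timeout_s=1.0):
    """Return True if a write lock cannot be obtained within timeout_s seconds."""
    # isolation_level=None disables implicit transactions so BEGIN is explicit.
    conn = sqlite3.connect(db_path, timeout=timeout_s, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # requests the write lock right away
        conn.execute("ROLLBACK")         # release it without changing anything
        return False
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc):
            return True
        raise
    finally:
        conn.close()


# Example: is_locked("/var/lib/hue/desktop.db")
```

If this returns True after Hue has been stopped, some other process still has the database open, which would explain why a quick restart did not help.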

Hue: Cannot access Spark from Hue

An error happened with the Spark Server:

HTTPConnectionPool(host='localhost', port=8090): Max retries exceeded with url: /jobs (Caused by <class 'socket.error'>: [Errno 111] Connection refused)

In Cloudera Manager, open the Hue configuration and go to Advanced, then Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini.

Add the following section:

[spark]
# URL of the REST Spark Job Server.
server_url=http://spark.rest.servername01:18080/

See the Configure Spark section for more information.
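Before saving the snippet, you can confirm which host and port it actually points Hue at. The sketch below parses the snippet with Python’s configparser; the function name spark_endpoint is an assumption, and this approach only suits a flat snippet like the one above (a full hue.ini uses nested [[...]] sections that configparser does not understand).

```python
import configparser
from urllib.parse import urlparse

# The safety-valve snippet from this section.
SNIPPET = """
[spark]
# URL of the REST Spark Job Server.
server_url=http://spark.rest.servername01:18080/
"""


def spark_endpoint(ini_text):
    """Return (host, port) from the server_url setting in the [spark] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    url = urlparse(parser["spark"]["server_url"])
    return url.hostname, url.port


print(spark_endpoint(SNIPPET))
```

The error above shows Hue trying localhost:8090, so a result that names a different host confirms that the override is what Hue will use once the configuration is deployed.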

Hue: Cannot run Pig Scripts from Hue to YARN (using MRv2)

Resolution: YARN’s resources were set too low (memory was set to 50 MB when it should have been set to 1 GB).

I tried to narrow down the problem with running Pig scripts through Hue and YARN. Here is what I did:

1. Create a Pig script in Hue:

offers = LOAD '/tmp/datafile.txt' USING PigStorage AS (name:CHARARRAY);

The script succeeds.

2. However, when I add a dump to the script, as shown here, the job hangs:

offers = LOAD '/tmp/datafile.txt' USING PigStorage AS (name:CHARARRAY);
dump offers;

To see the log, click the status of the Pig job in the top right corner to open its Oozie workflow, then click the log icon to the right of the Pig action. The logs there are more informative.

For example, the log contains this line: 2014-08-14 16:24:35,692 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher  - More information at: http://node.servername05:50030/jobdetails.jsp?jobid=job_1408018429315_0002

The script never moves past 0% and repeats “Heart beat” over and over. The job displays in Oozie but never progresses (it is stuck on RUNNING). This same script worked in CDH 4.7 using MRv1. I could not find much in the logs to help identify the problem; the job simply never finishes.

Here is an excerpt from the job’s log:

2014-08-19 14:31:01,128 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://node.servername05:50030/jobdetails.jsp?jobid=job_1408403413938_0014
2014-08-19 14:31:01,227 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
Heart beat

The following did not work:

  • Reinstalling the Oozie sharelib: (1) stop Oozie, (2) under Actions, select Install Sharelib, (3) make sure that the sharelib is the one for YARN: oozie-sharelib-yarn.tar.gz
  • Clicking “Hue server”, stopping it, running “Synchronize Database”, and restarting Hue.
  • Applying the change from step #5 in the document http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/. It did not help, although the problem described there looks very similar to mine.

For information on how to configure YARN, see Configure Yarn, specifically Configure Yarn Resources.
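The arithmetic behind the resolution is simple enough to sketch. The example assumes that the 50 MB value was the memory YARN could hand out as containers on a node and that each container requests 1024 MB; this document does not say which YARN property was set, so both the interpretation and the function name containers_that_fit are assumptions.

```python
def containers_that_fit(node_memory_mb, container_memory_mb):
    """Return how many containers of the given size fit in the node's YARN memory."""
    if container_memory_mb <= 0:
        raise ValueError("container_memory_mb must be positive")
    return node_memory_mb // container_memory_mb


# With the values from the resolution above:
print(containers_that_fit(50, 1024))    # memory as originally configured
print(containers_that_fit(1024, 1024))  # memory after the fix
```

Under these assumptions the original setting leaves room for no containers at all, which would be consistent with a job that stays at 0% and only prints “Heart beat”.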

Hue: Cannot Open Workflow Editor

Problem: Hue cannot edit an Oozie workflow.

Open Hue, click Workflow, and select Editor.

Hue returns the error: Server Error (500)

Resolution: After some debugging (see below), the fix was to run the following from a bash shell on the node that is running Hue:

/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/bin/hue migrate --delete-ghost-migrations

Results from the migrate command:

Running migrations for desktop:
- Migrating forwards to 0007_auto__add_documentpermission__add_documenttag__add_document.
> desktop:0007_auto__add_documentpermission__add_documenttag__add_document
- Loading initial data for desktop.
Installed 0 object(s) from 0 fixture(s)

Running migrations for oozie:
- Migrating forwards to 0025_change_examples_path_format.
> oozie:0022_auto__chg_field_mapreduce_node_ptr__chg_field_start_node_ptr
> oozie:0023_auto__add_field_node_data__add_field_job_data
> oozie:0024_auto__chg_field_subworkflow_sub_workflow
> oozie:0025_change_examples_path_format
- Migration 'oozie:0025_change_examples_path_format' is marked for no-dry-run.
- Loading initial data for oozie.
Installed 0 object(s) from 0 fixture(s)

south.exceptions.GhostMigrations:
! These migrations are in the database but not on disk:
<oozie: 0022_change_examples_path_format>
! I'm not trusting myself; either fix this yourself by fiddling
! with the south_migrationhistory table, or pass --delete-ghost-migrations
! to South to have it delete ALL of these records (this may not be good).

The error points to a problem in Oozie:

[30/Jun/2014 09:14:17 -0700] base         ERROR    Internal Server Error: /oozie/list_workflows/
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/apps/oozie/src/oozie/views/editor.py", line 64, in list_workflows
data = Document.objects.available(Workflow, request.user)
File "/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/models/sql/compiler.py", line 818, in execute_sql
cursor.execute(sql, params)
File "/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/backends/sqlite3/base.py", line 344, in execute
return Database.Cursor.execute(self, query, params)
DatabaseError: no such table: desktop_documenttag

[30/Jun/2014 09:14:17 -0700] middleware   INFO     Processing exception: no such table: desktop_documenttag: Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/backends/sqlite3/base.py", line 344, in execute
return Database.Cursor.execute(self, query, params)
DatabaseError: no such table: desktop_documenttag

[30/Jun/2014 09:14:16 -0700] access       INFO     192.168.200.157 admin - "GET /oozie/list_workflows/ HTTP/1.1"
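The GhostMigrations message refers to the south_migrationhistory table, which you can inspect before letting South delete records. The following sketch lists migrations that are recorded in the database for one app but are not in a list of names you supply from disk; the function name ghost_migrations and the idea of passing the on-disk names by hand are assumptions for this example.

```python
import sqlite3


def ghost_migrations(db_path, app_name, on_disk):
    """Return migrations recorded for app_name that are not in the on_disk names."""
    conn = sqlite3.connect("file:%s?mode=ro" % db_path, uri=True)
    try:
        rows = conn.execute(
            "SELECT migration FROM south_migrationhistory "
            "WHERE app_name = ? ORDER BY migration",
            (app_name,),
        ).fetchall()
    finally:
        conn.close()
    known = set(on_disk)
    return [migration for (migration,) in rows if migration not in known]


# Example, using migration names taken from the output above:
#   ghost_migrations("/var/lib/hue/desktop.db", "oozie",
#                    ["0025_change_examples_path_format"])
```

The on_disk names are the migration file names without the .py extension. Reviewing this list first shows exactly which records --delete-ghost-migrations would remove.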

Hue: Cannot Create a New Workflow

Users receive a 500 Server error when they open the Workflow Editor and attempt to create a new workflow.

Error: User: httpfs is not allowed to impersonate hue (error 500)

On Hue’s web UI we see the following: 500 Server error: Sorry, there’s been an error. An email was sent to your administrators. Thank you for your patience.

Within Hue’s log file we see:

sudo less /var/log/hue/runcpserver.log

[12/Sep/2017 14:42:58 -0700] connectionpool INFO Resetting dropped connection: servername01
[12/Sep/2017 14:42:58 -0700] middleware INFO Processing exception: RemoteException: User: httpfs is not allowed to impersonate hue (error 500): Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hue/build/env/lib/python2.7/site-packages/Django-1.6.10-py2.7.egg/django/core/handlers/base.py", line 112, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hue/build/env/lib/python2.7/site-packages/Django-1.6.10-py2.7.egg/django/db/transaction.py", line 371, in inner
return func(*args, **kwargs)

WebHdfsException: RemoteException: User: httpfs is not allowed to impersonate hue (error 500)

Narrow down the error within httpfs:

less /var/log/hadoop-httpfs/hadoop-cmf-hdfs-HTTPFS-httpfs.servername01.log.out

2017-09-12 15:50:27,723 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hue (auth:PROXY) via httpfs (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: httpfs is not allowed to impersonate hue

Resolution:

The impersonation error from HttpFS gave me the clue. We set proxy groups in HDFS so that we can tighten permissions on this service. These permissions, in the form of an impersonation account, were added to protect our HttpFS service from unauthorized reads and writes.

Find the hadoop.proxyuser.httpfs.groups configuration in HDFS and add hue.
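To confirm the change took effect, you can check the deployed Hadoop configuration XML for the property. This sketch assumes the effective setting ends up in a file such as core-site.xml in the usual property/name/value layout; the function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def proxyuser_groups(config_xml, proxy_user):
    """Return the groups listed in hadoop.proxyuser.<proxy_user>.groups."""
    wanted = "hadoop.proxyuser.%s.groups" % proxy_user
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == wanted:
            value = prop.findtext("value", "") or ""
            return [g.strip() for g in value.split(",") if g.strip()]
    return []


def may_impersonate_group(config_xml, proxy_user, group):
    """Return True if the proxy user may impersonate members of the group."""
    groups = proxyuser_groups(config_xml, proxy_user)
    # A value of * allows every group.
    return "*" in groups or group in groups


# Example: may_impersonate_group(open("core-site.xml").read(), "httpfs", "hue")
```

If the check returns False for httpfs and hue, the “not allowed to impersonate” error is expected to persist.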