{"id":1156,"date":"2016-12-14T14:59:54","date_gmt":"2016-12-14T22:59:54","guid":{"rendered":"http:\/\/www.developerscloset.com\/?page_id=1156"},"modified":"2018-05-14T15:12:09","modified_gmt":"2018-05-14T23:12:09","slug":"solr","status":"publish","type":"page","link":"https:\/\/www.developerscloset.com\/?page_id=1156","title":{"rendered":"Solr"},"content":{"rendered":"<p><a href=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/solr-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1157 alignnone\" src=\"http:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/solr-1-300x152.png\" alt=\"\" width=\"225\" height=\"114\" srcset=\"https:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/solr-1-300x152.png 300w, https:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/solr-1-768x388.png 768w, https:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/solr-1-1024x517.png 1024w, https:\/\/www.developerscloset.com\/wp-content\/uploads\/2018\/05\/solr-1.png 1692w\" sizes=\"auto, (max-width: 225px) 100vw, 225px\" \/><\/a><\/p>\n<p>Solr, which Cloudera distributes as Cloudera Search, is a distributed search engine built on Lucene, used for indexing and searching data stored in HDFS.<\/p>\n<div class=\"toc-macro client-side-toc-macro conf-macro output-block hidden-outline\">\n<h1 id=\"Solr-ConfigureSolr\"><span class=\"ez-toc-section\" id=\"Configure_Solr\"><\/span>Configure Solr<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<h2 
id=\"Solr-InstallSolr\"><span class=\"ez-toc-section\" id=\"Install_Solr\"><\/span>Install Solr<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Cloudera Manager distributes Solr in CDH and offers the following services:<\/p>\n<ul>\n<li><strong>Solr Server &#8211;\u00a0<\/strong>Add a Solr Server to a host that is\u00a0<u>not<\/u>\u00a0hosting ZooKeeper or Oozie, because\u00a0<u>Solr uses a lot of CPU and memory<\/u>. You can collocate a Solr Server with a YARN NodeManager, an HBase RegionServer, and an HDFS DataNode. When collocating with a NodeManager, make sure that the resources of the machine are not oversubscribed. RegionServers use a large amount of memory, so if you collocate a Solr Server with a RegionServer, calculate memory carefully to avoid oversubscribing the server. Also, do not install a Solr Server on a node running a YARN ResourceManager or HBase Master.<\/li>\n<li><strong>Gateway<\/strong>\u00a0&#8211; Add a Solr Gateway to every application server where the CLI and a network map are required.<\/li>\n<\/ul>\n<h2 id=\"Solr-ConfigureSolr.1\"><span class=\"ez-toc-section\" id=\"Configure_Solr-2\"><\/span>Configure Solr<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<div style=\"max-width: 100%;margin: auto;overflow: hidden\">\n<div style=\"width: 100%;overflow: auto\">\n<table>\n<thead class=\"tableFloatingHeaderOriginal\">\n<tr class=\"tablesorter-headerRow\" role=\"row\">\n<th class=\"confluenceTh tablesorter-header sortableHeader tablesorter-headerUnSorted\" role=\"columnheader\" scope=\"col\">\n<div class=\"tablesorter-header-inner\">Configuration<\/div>\n<\/th>\n<th class=\"confluenceTh tablesorter-header sortableHeader tablesorter-headerUnSorted\" role=\"columnheader\" scope=\"col\">\n<div class=\"tablesorter-header-inner\">Description<\/div>\n<\/th>\n<th class=\"confluenceTh tablesorter-header sortableHeader tablesorter-headerUnSorted\" role=\"columnheader\" scope=\"col\">\n<div 
class=\"tablesorter-header-inner\">Value<\/div>\n<\/th>\n<th class=\"confluenceTh tablesorter-header sortableHeader tablesorter-headerUnSorted\" role=\"columnheader\" scope=\"col\">\n<div class=\"tablesorter-header-inner\">Calculation<\/div>\n<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr role=\"row\">\n<td class=\"confluenceTd\"><strong>Java Heap Size of Solr Server<\/strong><\/td>\n<td class=\"confluenceTd\">Maximum size in bytes for the Java Process heap memory. Passed to Java -Xmx.<\/td>\n<td class=\"confluenceTd\"><u>1 GB<\/u><\/td>\n<td class=\"confluenceTd\"><\/td>\n<\/tr>\n<tr role=\"row\">\n<td class=\"confluenceTd\" colspan=\"1\"><strong>Java Direct Memory Size of Solr Server<\/strong><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">Maximum amount of off-heap memory in bytes that may be allocated by the Java process. Passed to Java -XX:MaxDirectMemorySize. If unset, defaults to the size of the heap.<\/td>\n<td class=\"confluenceTd\" colspan=\"1\"><u>1 GB<\/u><\/td>\n<td class=\"confluenceTd\" colspan=\"1\">The amount of data in memory to be indexed and available to a search. In some cases can be MUCH higher than the Java heap. See my notes below.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p>Notes: To ensure an appropriate amount of memory, consider your requirements and experiment in your environment. 
In general:<\/p>\n<ul>\n<li>4 GB is sufficient for some smaller loads or for evaluation.<\/li>\n<li>12 GB is sufficient for some production environments.<\/li>\n<li>48 GB is sufficient for most situations.<\/li>\n<\/ul>\n<p>Here is Cloudera\u2019s current Solr guide:\u00a0<a class=\"external-link\" href=\"http:\/\/www.cloudera.com\/content\/cloudera-content\/cloudera-docs\/CDH5\/latest\/Search\/Cloudera-Search-User-Guide\/Cloudera-Search-User-Guide.html\" rel=\"nofollow\">http:\/\/www.cloudera.com\/content\/cloudera-content\/cloudera-docs\/CDH5\/latest\/Search\/Cloudera-Search-User-Guide\/Cloudera-Search-User-Guide.html<\/a><\/p>\n<p>To use Solr for the first time, you must create collections. The steps are described at\u00a0<a class=\"external-link\" href=\"http:\/\/www.cloudera.com\/content\/cloudera-content\/cloudera-docs\/CDH5\/latest\/Search\/Cloudera-Search-Installation-Guide\/csig_deploy_search_solrcloud.html\" rel=\"nofollow\">http:\/\/www.cloudera.com\/content\/cloudera-content\/cloudera-docs\/CDH5\/latest\/Search\/Cloudera-Search-Installation-Guide\/csig_deploy_search_solrcloud.html<\/a>, under the heading Creating Your First Solr Collection. You will then be able to create a new core.<\/p>\n<p>An article about managing distributed Solr servers:\u00a0<a class=\"external-link\" href=\"http:\/\/blog.mgm-tp.com\/2010\/09\/hadoop-log-management-part4\/\" rel=\"nofollow\">http:\/\/blog.mgm-tp.com\/2010\/09\/hadoop-log-management-part4\/<\/a><\/p>\n<p>With regard to resourcing the system for Solr, here is good insight from an expert:<\/p>\n<p>Whether or not you separate the Solr servers into their own cluster or collocate Solr with your existing Hadoop\/YARN nodes depends on the size of your search index. 
If the index fits into one Core, I would recommend using a dedicated Solr-Server separated from the Hadoop-Cluster.<\/p>\n<p>If on the other hand the index is too large for a single core and you need a kind of sharding, you might be able to reuse your cluster for Solr. But first you need to evaluate the use of your Hadoop Cluster. If the Cluster is also heavily used for Map\/Reduce-Jobs, you will not have enough resources for Solr.<\/p>\n<p>Bottom line: If your Hadoop cluster is primarily used for storage and has only a light Map\/Reduce load, you can reuse it for running Solr. In all other cases you are better off with a separate Solr Cluster.<\/p>\n<h1 id=\"Solr-CreatingYourFirstSolrCollection\"><span class=\"ez-toc-section\" id=\"Creating_Your_First_Solr_Collection\"><\/span>Creating Your First Solr Collection<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>By default, the Solr server comes up with no collections. Create your first collection from the\u00a0instancedir\u00a0that you provided to Solr in previous steps, using the same collection name. In the command below, numOfShards\u00a0is the number of SolrCloud shards to partition the collection across; it cannot exceed the total number of Solr servers in your SolrCloud cluster:<\/p>\n<div>\n<blockquote><p>solrctl collection --create collection1 -s {{numOfShards}}<\/p><\/blockquote>\n<\/div>\n<p>You should now be able to check that the collection is active. For example, navigate to\u00a0<a class=\"external-link\" href=\"http:\/\/servername:8983\/solr\/collection1\/select?q=*%3A*&amp;wt=json&amp;indent=true\" rel=\"nofollow\">http:\/\/<em>ServerName<\/em>:8983\/solr\/collection1\/select?q=*%3A*&amp;wt=json&amp;indent=true<\/a>\u00a0and verify that the query succeeds. 
Similarly, you can observe the topology of your SolrCloud at a URL similar to:\u00a0<a class=\"external-link\" href=\"http:\/\/servername:8983\/solr\/#\/~cloud\" rel=\"nofollow\">http:\/\/<em>ServerName<\/em>:8983\/solr\/#\/~cloud<\/a><\/p>\n<p>Reference:\u00a0<a class=\"external-link\" href=\"http:\/\/blog.cloudera.com\/blog\/2013\/11\/how-to-add-cloudera-search-to-your-cluster-using-cloudera-manager\/\" rel=\"nofollow\">http:\/\/blog.cloudera.com\/blog\/2013\/11\/how-to-add-cloudera-search-to-your-cluster-using-cloudera-manager\/<\/a><\/p>\n<h1 id=\"Solr-AddinganotherCollectionwithReplication\"><span class=\"ez-toc-section\" id=\"Adding_another_Collection_with_Replication\"><\/span>Adding another Collection with Replication<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<p>To support scaling for query load, create a second collection with replication. Having multiple servers with replicated collections distributes the request load for each shard. Create a collection with one shard and a replication factor of two. Your cluster must have at least two running servers to support this configuration, so ensure Cloudera Search is installed on at least two servers before continuing with this process. A replication factor of two causes two copies of the index files to be stored in two different locations.<\/p>\n<p>1.\u00a0Generate the config files for the collection:<\/p>\n<blockquote><p>solrctl instancedir --generate $HOME\/solr_configs2<\/p><\/blockquote>\n<p>2. Upload the instance directory to ZooKeeper:<\/p>\n<blockquote><p>solrctl instancedir --create collection2 $HOME\/solr_configs2<\/p><\/blockquote>\n<p>3. Create the second collection:<\/p>\n<blockquote>\n<div>solrctl collection --create collection2 -s 1 -r 2<\/div>\n<\/blockquote>\n<p>Verify the collection is live and that your one shard is being served by two nodes. 
For example, you should receive content from:\u00a0<a class=\"external-link\" href=\"http:\/\/servername:8983\/solr\/#\/~cloud\" rel=\"nofollow\">http:\/\/<em>ServerName<\/em>:8983\/solr\/#\/~cloud<\/a><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Solr, known as Cloudera Search by Cloudera, built on Lucene, is a distributed service engine, used for indexing and searching data stored in HDFS. Configure [&#8230;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"class_list":["post-1156","page","type-page","status-publish","hentry"],"jetpack_shortlink":"https:\/\/wp.me\/P1BQ8S-iE","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1156","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1156"}],"version-history":[{"count":1,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1156\/revisions"}],"predecessor-version":[{"id":1159,"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=\/wp\/v2\/pages\/1156\/revisions\/1159"}],"wp:attachment":[{"href":"https:\/\/www.developerscloset.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1156"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}