Real-Time Search With MongoDB and Solr

My latest project is 酷查网 Koolcha.com. It uses two major technologies:

  • Mean.js works as the skeleton of the website. It handles the business logic (Node.js), data storage (MongoDB), routing (Express) and user interface (AngularJS).
  • Solr powers a very important feature: search, which lets users find the information they want more efficiently. Try it here.

Rationale for choosing Solr:

MongoDB provides text indexes to support text search of string content in documents of a collection. So why do I bother with Solr?

MongoDB supports text search for various languages. Text indexes drop language-specific stop words (e.g. in English, “the”, “an”, “a”, “and”, etc.) and use simple language-specific suffix stemming. For a list of the supported languages, see Text Search Languages.

And it does NOT support Chinese!

Solr, on the other hand, supports Chinese and many more languages.

Simple decision to make.

Prerequisites:

Install MongoDB, Apache Solr and mongo-connector.

Note: The example in this article is set up for a local machine only.

Setup Mongo Replica Set

mongo-connector replicates operations from the MongoDB oplog, so a replica set must be running before startup.

The MongoDB Manual has a thorough explanation of how to do this. For development purposes, it's probably sufficient to use a single mongod instance:

1. Update mongod.conf

$ vim /usr/local/etc/mongod.conf

Set dbPath to your MongoDB data directory. In my case it's /data/db/

2. Update host file

Add your computer name to /etc/hosts:

127.0.0.1 <ComputerName>  
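
The name to use is normally whatever `hostname` prints. My understanding is that `rs.initiate()` called without arguments registers the member under the machine's hostname, which is why that name has to resolve locally. A minimal sketch for adding the entry only when it is missing; the helper names `hosts_entry` and `has_hosts_entry` are made up for this example:

```shell
# Hypothetical helper: print the /etc/hosts line that maps a name to loopback.
hosts_entry() {
  printf '127.0.0.1 %s\n' "$1"
}

# Hypothetical helper: succeed if the hosts file ($2) already maps 127.0.0.1 to the name ($1).
has_hosts_entry() {
  awk -v name="$1" '
    $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }
  ' "$2"
}

# Example (writing /etc/hosts needs root):
#   name="$(hostname)"
#   has_hosts_entry "$name" /etc/hosts || hosts_entry "$name" | sudo tee -a /etc/hosts
```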

3. Start MongoDB with a replica set

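$ mongod --replSet rs0   # start mongod as a member of replica set "rs0"; leave it running
$ mongo                  # in a second terminal, open the mongo shell
> rs.initiate()          // make this instance the only member of the replica set
> rs.status()            // check the replica set state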
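
The same check can be scripted, which is handy before launching mongo-connector. This is only a sketch: it assumes the legacy `mongo` shell used above is on the PATH and that mongod listens on the default port; `replset_ok` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when rs.status() reports ok: 1.
replset_ok() {
  [ "$(mongo --quiet --eval 'print(rs.status().ok)')" = "1" ]
}

# Example:
#   replset_ok && echo "replica set is up" || echo "replica set is not ready"
```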

Setup Solr

  • To learn more about Solr, please go to Solr Resources
  • Follow the docs on using mongo-connector with Solr

1. Start Solr and Create Core

$ cd solr-5.3.0
$ bin/solr start               # this starts solr
$ bin/solr create -c koolcha   # this creates a core called "koolcha"

To verify, go to:
http://localhost:8983/solr/#/koolcha
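
If you prefer the command line to the admin UI, Solr's CoreAdmin API reports the same information. A small sketch, assuming Solr runs on the default port 8983; `core_status` is a hypothetical helper name:

```shell
# Hypothetical helper: ask the CoreAdmin API for the status of one core ($1).
core_status() {
  curl -s "http://localhost:8983/solr/admin/cores?action=STATUS&core=$1&wt=json"
}

# Example:
#   core_status koolcha
```

An existing core shows up under "status" with its instance directory; for an unknown core the entry is empty.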

You can also stop and restart Solr by running the commands:

$ bin/solr stop
$ bin/solr restart

2. Configure solrconfig.xml

$ vim server/solr/koolcha/conf/solrconfig.xml

Add the following line to enable LukeRequestHandler, which mongo-connector's Solr doc manager queries to learn which fields the schema defines:

<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />  

To verify the change, go to:
http://localhost:8983/solr/#/koolcha/files?file=solrconfig.xml

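3. Configure Solr Schema

$ vim server/solr/koolcha/conf/managed-schema

Add the following fields. Note that a string field is indexed verbatim, so it only matches exact values; tokenized full-text matching needs a text field type.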

<!-- Mongo-Connector -->  
<field name="_ts" type="long" indexed="true" stored="true" />  
<field name="ns" type="string" indexed="true" stored="true"/>

<!-- Custom Fields, Search Against -->  
<field name="title" type="string" indexed="true" stored="true"/>  
<field name="description" type="string" indexed="true" stored="true"/>  
<field name="category" type="string" indexed="true" stored="true"/>  
<field name="city" type="string" indexed="true" stored="true"/>  

To verify the change:
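
One way to check from the command line is Solr's Schema API, which returns the definition of a single field. This is a sketch under the same assumptions as before (Solr on localhost:8983); `schema_field` is a hypothetical helper name. If a field is reported as missing, Solr may need a restart to pick up the edited file.

```shell
# Hypothetical helper: fetch the definition of field $2 from core $1 via the Schema API.
schema_field() {
  curl -s "http://localhost:8983/solr/$1/schema/fields/$2?wt=json"
}

# Example: check every field added above.
#   for f in _ts ns title description category city; do
#     schema_field koolcha "$f"
#   done
```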

4. Allow CORS (Optional)

This enables CORS so that a browser app can call Solr's RESTful service directly from another origin, e.g. a request from http://localhost:3000 (Mean.js) to http://localhost:8983 (Solr).

$ vim solr-5.3.0/server/etc/webdefault.xml

Add

<filter>  
  <filter-name>cross-origin</filter-name>
  <filter-class>org.eclipse.jetty.servlets.CrossOriginFilter</filter-class>
  <init-param>
    <param-name>chainPreflight</param-name>
    <param-value>false</param-value>
  </init-param>
</filter>

<filter-mapping>  
  <filter-name>cross-origin</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
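
After restarting Solr, you can check that the filter is active by sending a request with an Origin header and looking for the CORS response header. A minimal sketch; `cors_header` is a hypothetical helper name, and the URLs are the local defaults used in this article:

```shell
# Hypothetical helper: print the Access-Control-Allow-Origin header returned
# for origin $1 when requesting URL $2. Fails if the header is absent.
cors_header() {
  curl -s -o /dev/null -D - -H "Origin: $1" "$2" \
    | grep -i '^Access-Control-Allow-Origin'
}

# Example:
#   cors_header http://localhost:3000 "http://localhost:8983/solr/koolcha/select?q=*:*"
```

No output (and a non-zero exit status) suggests the filter is not being applied.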

Connect Mongo and Solr with mongo-connector

mongo-connector creates a pipeline from a MongoDB cluster to one or more target systems, such as Solr, Elasticsearch, or another MongoDB cluster. It synchronizes data in MongoDB to the target then tails the MongoDB oplog, keeping up with operations in MongoDB in real-time.


Once Solr and the MongoDB replica set are running, you may start mongo-connector. The simplest invocation resembles the following:

mongo-connector -m <mongodb server hostname>:<replica set port> \
                -t <replication endpoint URL, e.g. http://localhost:8983/solr> \
                -d <name of doc manager, e.g., solr_doc_manager>

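In my case (-n limits replication to one namespace, --auto-commit-interval=0 makes Solr commit after every write so changes are searchable right away, and --unique-key=id stores MongoDB's _id in Solr's id field):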

mongo-connector -v -m localhost:27017 -n <DB-Name>.<Collection-Name> \
                   -t http://localhost:8983/solr/koolcha --auto-commit-interval=0 \
                   -d solr_doc_manager --unique-key=id

mongo-connector has many other options besides those demonstrated above. To get a full listing with descriptions, try mongo-connector --help or read Config Options.
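
To confirm that documents are actually flowing, insert or update a document in MongoDB and then query the core. The sketch below assumes the local setup from this article; `solr_search` is a hypothetical helper name. Because the fields above are declared as string, a field query has to match the whole stored value.

```shell
# Hypothetical helper: run query $2 against core $1 and return JSON.
# -G sends the URL-encoded parameters as a GET query string.
solr_search() {
  curl -s -G "http://localhost:8983/solr/$1/select" \
       --data-urlencode "q=$2" \
       --data-urlencode "wt=json"
}

# Examples:
#   solr_search koolcha '*:*'                        # everything indexed so far
#   solr_search koolcha 'title:"some exact title"'   # exact match on a string field
```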

Happy searching!

Cont.

[1] If you're interested in how Solr indexes Chinese, read here.

Yang Zhao
