Building an Apache Solr Custom Index in Drupal

Apache Solr is a powerful search engine that can meet complex search requirements. While simple search needs in Drupal can be met out of the box, complex requirements, such as those involving multiple data structures with one-to-many and many-to-many relationships, need custom indexes built for the purpose.

A project I recently worked on was one such case. I found the following links helpful:

1. How do I make a custom entity update or create automatically update the solr index?

2. Using apachesolr to index custom data

Here’s a step-by-step process to build a custom Index in Apache Solr.

Step 1: Define a custom index

hook_entity_info_alter() is used by modules to alter the information that defines an entity.

In our case, the information in the custom table was needed only for indexing in Apache Solr, so we used hook_entity_info_alter() directly to define an index and a bundle name for a non-existent entity. Using this entity we also define a custom index.

/**
 * Implementation of hook_entity_info_alter().
 */
function mymodule_entity_info_alter(&$entity_info) {
  // Add my custom MySQL table as an entity.
  $entity_info['index_name'] = array(
    'apachesolr' => array('index' => TRUE),
    'label' => 'Index Name',
  );
  $entity_info['index_name']['bundles']['bundle_name'] = array(
    'apachesolr' => array('index' => TRUE),
    'label' => 'Bundle Name',
  );
}

Step 2: Implement hook_apachesolr_entity_info_alter()

This hook is used to add further information to the custom index defined in the previous step: a status callback, a document callback and a reindex callback. Once defined, the index will be available to us as a custom index in the Apache Solr index settings at admin/config/search/apachesolr.

Status callback:

Normally, for content types, the Apache Solr module maintains a table that records the status of every indexed node in the system. This table has the fields entity_type, entity_id, bundle, status and changed. The status field tracks whether the row needs to be updated in Apache Solr.

In our case we have added a custom column to our custom table to manage the same. Using a callback we manage the status of each row that needs to be indexed.

The status callback is called when the index entry for the custom entity needs updating. The status is set to 0 if the row needs indexing and 1 if it is up to date, and this value is stored in the custom column. Typically the Apache Solr module updates this information automatically on node events. For a custom table, the status has to be updated on custom events.
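A minimal sketch of such a status callback, using the callback name registered in Step 2. The parameter list and the column name solr_status are assumptions made for illustration, and the row is assumed to arrive as the third argument; a real implementation would load it from the custom table when it is missing. It follows the 0/1 convention described above and returns TRUE only for rows flagged as needing indexing, in line with the note in the Step 2 hook that a callback returning false keeps the row out of the index.

```php
<?php
/**
 * Sketch of a status callback for the custom 'index_name' entity.
 *
 * Assumptions (not from the article): the parameter list, and a custom
 * column named 'solr_status' exposed as a property on the row object.
 */
function apachesolr_index_index_name_status_callback($entity_id, $entity_type, $entity = NULL) {
  // Without a row to inspect we cannot decide, so do not index.
  if (!is_object($entity) || !isset($entity->solr_status)) {
    return FALSE;
  }
  // Convention used in this article: 0 = needs indexing, 1 = up to date.
  return (int) $entity->solr_status === 0;
}
```

The contract the module expects from a status callback may differ between versions of the Apache Solr module, so it is worth checking against the version in use.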

Document callback:

Builds the $document that defines the custom Apache Solr field index. This function is called for every row that is being indexed and maps the fields in our custom table to the Apache Solr index. Here we can also add computed fields that are needed only at search time. Refer to the field definitions in the schema.xml that ships with Drupal's Apache Solr module before defining field types using 'ts', 'is', 'tm', etc.

Remember that, while defining the document, each row must be associated with a unique id.

$document->id = $document->id . $entity->entity_id;

(The document id must be unique because it is the only mapping reference for the rows indexed in Apache Solr.)
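The document callback described above can be sketched roughly as follows, using the callback name registered in Step 2. The parameter list, the array return value and the row properties title and quantity are assumptions made for illustration. In Drupal, $document would be the document object built by the Apache Solr module; it is left untyped here so the function can be tried outside Drupal.

```php
<?php
/**
 * Sketch of a document callback for the custom 'index_name' entity.
 *
 * Assumptions (not from the article): the parameter list, the array
 * return value, and the row properties 'title' and 'quantity'.
 */
function apachesolr_index_index_name_solr_document($document, $entity, $entity_type, $env_id) {
  // Append the row id so that every document id is unique in the index.
  $document->id = $document->id . $entity->entity_id;

  // Map table columns to prefixed fields: 'ts_' for single-valued text,
  // 'is_' for a single-valued integer. Check schema.xml for the full list.
  $document->ts_title = $entity->title;
  $document->is_quantity = (int) $entity->quantity;

  // A computed field that exists only in the index, not in the table.
  $document->ts_display = $entity->title . ' (' . (int) $entity->quantity . ' in stock)';

  // Return a list so that one row could yield several documents.
  return array($document);
}
```

Calling it with a row whose entity_id is 42 and a document whose id ends in "index_name/" would give a document id ending in "index_name/42".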

The document callback is called by the following indexing functions in Apache Solr:

apachesolr_index_entity_to_documents($item, $env_id) - loads an entity queued for indexing and converts it into one or more documents that are sent to the Apache Solr server for indexing.

apachesolr_index_entities_document($row, $entity_type, $env_id) in file apachesolr.index.inc

You need to apply the attached patch to these indexing functions to make custom entities work. I have also created an issue for this patch at https://drupal.org/node/2201309

Reindex callback:

This is used to add any additional functionality at the time of reindexing.

/**
 * Implementation of hook_apachesolr_entity_info_alter().
 */
function mymodule_apachesolr_entity_info_alter(&$entity_info) {
  // Define the custom index for the entity.
  // REQUIRED VALUES
  // 'index_name' should be replaced with user/node/custom entity.
  $entity_info['index_name'] = array();
  // Set this entity as indexable.
  $entity_info['index_name']['indexable'] = TRUE;
  // Validate each entity if it can be indexed or not. Multiple callbacks are
  // allowed. If one of them returns false it won't be indexed.
  $entity_info['index_name']['status callback'][] = 'apachesolr_index_index_name_status_callback';
  // Build up a custom document.
  $entity_info['index_name']['document callback'][] = 'apachesolr_index_index_name_solr_document';
  // What to do when a reindex is issued. Most probably this will reset all the
  // items in the index_table.
  $entity_info['index_name']['reindex callback'] = 'apachesolr_index_index_name_solr_reindex';
  // OPTIONAL VALUES
  // Index in a separate table? Useful for huge datasets.
  $entity_info['index_name']['index_table'] = 'your custom table name';
}