Sitecore content search and LINQ (part 1)

With the rise of cloud services and today’s requirements for data retrieval, traditional relational- and tree data models with their query structures tend to be replaced with non-relational models and search-based query mechanisms. Within Sitecore this shift is noticable with the introduction of item buckets and the use of Sitecore contentsearch for larger amounts of data. Using a search engine based storage- and search mechanism does not only improve performance drastically but also improves flexibility of data retrieval. Since the release of Sitecore 7, Sitecore wants developers to favour indexes over the database for performance reasons. See http://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/06/sitecore-7-poco-explained.aspx

It is well known Sitecore may run into performance issues when a lot of items are stored in the content tree and searched by querying the traditional way. But also consider a scenario that’s not uncommon today:
Let’s say we have a user that got a gift card and wants to browse our webshop to see what’s available for the gift card amount. So he or she wants to enter something that translates to a query like “show me all products with a price tag less than …” or “all products added to the shop since …”. This cuts right through all categories so we’d have to query the whole tree structure, filtering out all products that do not apply. With a complex tree structure and a large number of product items, even Sitecore’s fast query will give performance issues.

By default Sitecore uses the open source Lucene engine but this can be replaced by the SOLR engine provider that comes with Sitecore, or another custom or 3rd party provider. When converting a folder in the content tree to an item bucket, it will use an index to retrieve items when you enter a search expression in the bucket’s search box. However you don’t need to use item buckets for using contentsearch; it works just as well with items stored in a conventional tree structure.

When showing our products on a public site, the information will normally come from the Sitecore web database, on which also a Lucene index is defined. However when using contentsearch, Sitecore recommends creating your own custom index instead of using the web index for a number of reasons:

  • The web index contains references for (almost) all items in the web database, decreasing performance;
  • By default the index does not store whole values, so you’d still have to retrieve individual items from the database;

By creating your own custom index (or multiple indices) you can specify to only contain the items you need, and store necessary data in the index so you don’t have to retrieve it from the database.

Creating a custom index

For creating a custom index you need to create two configuration files based on the web index. I’d recommend also installing Luke or some similar index viewer for troubleshooting. Luke is a Java application (.jar) so you need to install Java also.
There used to be a blog showing the minimum needed to create a custom index but that seems to be no longer online. So I’ll outline the process here:

  • Make a copy of these files in App_Config\Include in the website folder:
    • ContentSearch.Lucene.Index.Web.config ==> rename the copy to Sitecore.ContentSearch.Lucene.Index.MyIndex.config.
    • ContentSearch.Lucene.DefaultIndexConfiguration.config ==> rename the copy to Sitecore.ContentSearch.Lucene.MyIndexConfiguration.config.
  • In Sitecore.ContentSearch.Lucene.Index.MyIndex.config (= index definition):
    • Index node: rename the id from sitecore_web_index to sitecore_myindex.
    • Configuration node: set the ref atribute to “contentSearch/indexConfigurations/myIndexConfiguration”. We will create our own configuration in the other file under this XML path.
    • Note the settings like publishing strategy, database and root for the search. These can be left as-is or changed, i.e. use sitecore/Content/Products folder as root to only include items from this folder in the content tree. Leave the strategy to be “onPublishEndAsync”, which will update the index when publishing.
  • In Sitecore.ContentSearch.Lucene.MyIndexConfiguration.config (=index configuration)
    • Instead of keeping all sections in this file you can replace a lot of them with references to the original in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file. See Sitecore documentation and references for this.
    • Add the following directly under the <sitecore> tag:
<!--This section for database is so that the indexes get updated in any environment when an item changes -->
  <databases>
    <database id="web" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
      <Engines.HistoryEngine.Storage>
        <obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.Kernel">
          <param connectionStringName="$(id)" />
          <EntryLifeTime>30.00:00:00</EntryLifeTime>
        </obj>
      </Engines.HistoryEngine.Storage>
    <Engines.HistoryEngine.SaveDotNetCallStack>false</Engines.HistoryEngine.SaveDotNetCallStack>
  </database>
</databases>
  • Rename the “defaultLuceneIndexConfiguration” XML node to “myIndexConfiguration”.
  • Remove the “Settings” section since it is already in the default configuration file we copied from.
  • “IndexAllFields” must be left to true.
  • Remove the nodes under the “FieldNames” node, EXCEPT the “_uniqueid” one. The “_uniqueid” field is necessary for Sitecore.
  • In “FieldTypes” remove types you don’t need in the index. For the remaining change STORAGETYPE to YES to have the values stored for these fields in the index. When storing field values in the index you don’t need to retrieve them from the database. It will increase the size of your index but having to go to the database after each search would more or less nullify the performance you get from using search.
  • Uncomment the <include hint="list:IncludeTemplate"> node and remove the “BucketFolderTemplateId” node. We will specify the templates in here we want items to be indexed from.
  • Go through the other “field=” nodes to see if you’re ok with them (like excluding certain fields).

When done, log into Sitecore as admin, and go to the control panel and indexing manager. Your new index should be listed so you can (re)generate it (you may have to restart your site or webserver first). Once done it should have created a folder sitecore_myindex in your Sitecore data folder that you can open with Luke to see the new index contents. A number of fields (most of them starting with an underscore) will always be present as Sitecore indexes them by default.

The above creates a working custom index for a single server setup using Lucene. For multiple server setups SOLR may be a better solution than Lucene, and more configuration may be required depending on your environment.

Be aware that with using the “onPublishEndAsync” publishing strategy there may be a small delay between an item being published and the index being updated. This can result in an updated or newly created item not showing directly on your web site.

Storing information in the new index

We will be using our index for specific items. Let’s say we have to build a website showing product information, and defined a template “Product” in Sitecore for creating product items. In Sitecore.ContentSearch.Lucene.MyIndexConfiguration.config, in de <include hint="list:IncludeTemplate"> node, add a node:

<productTemplateId>{guid}</productTemplateId>

where {guid} should be the guid of your product template. Rebuild the index from the Sitecore control panel. Now when creating an item based on the “Product” template it should be present in the index after publishing. Using Luke, open your custom index and look for fields you defined in the template or check “_uniqueid” to see if the guid of the new item is present. Note that every time you add or change something to the configuration file or when changing a template you need to regenerate the index from Sitecore’s control panel!

Using the custom index

We can now use our custom index from code for retrieving data without going to the database, which is way faster and can be used to search through and retrieve large numbers of items. Assuming you have already set up a Visual Studio project for your Sitecore site, you need to add references to Sitecore.ContentSearch.dll and Sitecore.ContentSearch.Linq.dll to your project. Then create a class to reflect the product information from the index:

using System;
using System.ComponentModel;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

public class ProductSearchResult: SearchResultItem
{
  public string ProductName { get; set; }
  public string Description { get; set; }
  public string SerialNumber { get; set; }
  public DateTime Released { get; set; }
}

The properties have to correspond with the field names otherwise you will need to annotate them with attributes to map them, and they must have empty public setters. The class has to derive from SearchResultItem, which will also give it properties like ItemId and Name that will correspond with the item in the Sitecore database. Note the name “SearchResultItem” is misleading since the class itself has nothing to do with actual Sitecore items, and searchresults are not linked to a database in any way.

Now add a method somewhere in your code that will use this class for retrieving the data, which can look like this:

/// <summary>
/// Get the products from Sitecore, using search
/// </summary>
public List<ProductSearchResult> SearchProducts()
{
  ISearchIndex myIndex = ContentSearchManager.GetIndex("sitecore_myindex");
  using (var context = myIndex.CreateSearchContext())
  {
     var query = context.GetQueryable<ProductSearchResult>();
     var result = query.ToList();
     return result;
  }
}

First you need to get an ISearchIndex instance from ContentSearchManager by telling Sitecore what index to get. Then we create a search context from this instance. From this context we request a IQueryable for our type which we can cast to a list containing our data from the index.

Instead of using the query directly we can also call GetResults() to retrieve the data from the index along with metadata and faceting. Note that the GetResults() method is an extension method that resides in the Sitecore.ContentSearch.Linq.dll library. The result returned is an object containing a SearchHit collection. Each SearchHit has a property Document that is an object of the type specificied on the query, in our case ProductSearchResult. This is where the data of our search result can be found. So the last two lines of the above could be replaced by the following to return the same result:

var result = query.GetResults();
return result.Hits.Select(p=>p.Document).ToList();

The code from this example will return every entry in the index. Often we will store more than one type of item in an index by entering more than one template in the configuration. The above will always map the entries matching a query to the given type even if the corresponding Sitecore item is of a different type (template), so when having more than one type in the index you’ll have to filter on template. I’ll explain in part 2 of this post how you can request a selection from the index.

WARNING: do NOT use the Dispose() method on the index object or use it in a using context! You will end up with a corrupt index when you do. After filing a bug report Sitecore claimed this is by design and the Dispose() is intended for internal use only (even though there is no tooltip or any documentation about this).

Mapping index fields

The SearchResultItem class has a Fields collection to access the fields by name much similar to a Sitecore Item object. However accessing the field values directly this way bypasses any conversions and mappings done by Sitecore and you get the raw values. Since Lucene is a third party product and using optimizations for storage, you have to be aware of differences in formats between Sitecore and Lucene fields. Most noticably:

  • All field names in the index are in lowercase.
  • All IDs (Guids) are in short ID format (no brackets or hyphens). Sitecore contains operations on ID type objects to convert them.
  • Datetimes are in a format derived from ISO 8601 format.
  • The ItemId property which contains the corresponding item ID is stored in the “_group” field.

On the class derived from SearchResultItem, you can use the [Indexfield] attribute to map an index field to a property explicitly. Note you have to specify the Lucene index field name. The [TypeConverter] attribute can be used to explicitly convert an index field to a type. The Sitecore.ContentSearch.Converters namespace contains specific conversion types for Sitecore, like the [IndexFieldIDValueConverter] to map fields containing IDs.

As an example for our ProductSearchResult class:

[IndexField("released")]
[TypeConverter(typeof(DateTimeConverter))]
public DateTime ReleaseDate { get; set; }

Computed fields and related (media) items

You can store computed field values in an index and set them as property on your SearchResultItem-derived class. This is particularly handy for storing the reference paths to related items like media items, since fields like “image” only store the alt text in the index. By creating a computed field to store the media item reference in you can get your related media item references directly from the index. In Sitecore documentation you can find how to create computed fields. You can add computed fields to the index by adding them to the <fields hint="raw:AddComputedIndexField"> section of your index configuration file. Since you’re storing calculated values be aware of how and when Sitecore updates them or you end up with stale values!

As for the actual content of these media items you still have to get them from the Sitecore media library of course, or use a third party product with a connector to store your media items in. This goes beyond the scope of this article. You  can index the actual content of some types of media items like PDF files by using IFilters, as described by John West in his blog on http://www.sitecore.net/learn/blogs/technical-blogs/john-west-sitecore-blog/posts/2013/04/sitecore-7-indexing-media-with-ifilters.aspx.

I explain about filtering and using the LINQ interface for contentsearch in the next part on https://sionict.wordpress.com/2015/10/06/sitecore-content-search-and-linq-part-2/

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s