Sitecore content search and LINQ (part 2)

This is part 2 of my post about Sitecore content search and LINQ. For the first part see https://sionict.wordpress.com/2015/09/19/sitecore-content-search-and-linq-part-1/.

In the first part we set up our custom index and created a piece of code to use it. Now let’s say that we have not only products but also services to list on our site. We created a Sitecore template for them and want our Service items to be in the index as well. So we add our Service template ID to the index, as we did before with the product template ID:

<serviceTemplateId>{guid}</serviceTemplateId>

Don’t forget to regenerate your index and check with Luke that the contents are there after publishing your first Service items! Sometimes you may have to restart IIS and regenerate again to get the index working. If it keeps failing, clear the whole custom index folder in the Sitecore data folder and regenerate again.

We create a class for our service entries in our code:

using Sitecore.ContentSearch.SearchTypes;
namespace ScPredicateBuilderApp.DataObjects
{
   public class ServiceSearchResult:SearchResultItem
   {
      public string ServiceName { get; set; }
      public string Description { get; set; }
      public double Rate { get; set; }
   }
}

and some code to retrieve them from the index, similar to what we did for products but with the above class as the type parameter of GetQueryable(). Of course C# generics are very convenient for avoiding duplicate code, so you may want to change the SearchProducts() method of the previous part into something more generic:

/// <summary>
/// Get the index contents using search
/// </summary>
public List<T> SearchIndexContent<T>() where T : SearchResultItem
{
   ISearchIndex myIndex = ContentSearchManager.GetIndex("sitecore_myindex");
   using (var context = myIndex.CreateSearchContext())
   {
      var query = context.GetQueryable<T>();
      var result = query.ToList();
      return result;
   }
}

Now, when calling this method, pass either ProductSearchResult or ServiceSearchResult as the type parameter to get a list of the corresponding type.

However, as said before, Lucene entries are generic (“documents”), and Sitecore has no way of knowing from the index alone which entry belongs to which type (class) you specified. It will try to map ALL entries to the type you passed, leaving properties empty for fields it cannot map. So the first thing we have to do when using contentsearch is to add a template filter to the Lucene query.

Adding a template filter

Where and how we pass the template ID to filter on is a matter of conventions in your working environment or personal preference. You can put a base class between SearchResultItem and your content search classes that supplies the template ID (and defines common properties), or pass the ID as a parameter when calling the above method. Here I do the latter, so we add a parameter to the method:

public List<T> SearchIndexContent<T>(string templateId) where T : SearchResultItem

In this case I keep the Sitecore IDs as strings in the configuration of my project and pass them as such. Now we add a line to our using block and extend the query line with a filter on the template. In other words, replace the “var query = ....” line with:

var tId=new ID(templateId);
var query = context.GetQueryable<T>().Filter(t=>t.TemplateId==tId);

Filter() is part of the LINQ interface Sitecore has implemented for contentsearch. There is also a Where() implementation. The difference is that Where() returns a result that takes Lucene’s scoring into account, whereas Filter() just returns the matching result set. Since we don’t do anything with the scoring here, we use Filter().

As with any LINQ implementation on a data source, you should assign external values to a local variable before using them in a LINQ expression, to avoid so-called “modified closure” issues: a lambda captures the variable itself rather than its current value, and the query is only executed later. These issues can cause nasty bugs that give you no warning or error but return incorrect results. Also, if the value in the Filter() expression is the result of a function call or calculation, you have to assign it to a local variable first, or it will cause runtime exceptions. For example, combining the above into something like:

var query = context.GetQueryable<T>().Filter(t=>t.TemplateId== new ID(templateId));

will NOT work. So always use a local variable as parameter in LINQ expressions.
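The closure part of this advice is not specific to Sitecore, so it can be shown with plain LINQ-to-Objects. The sketch below uses no Sitecore types (ClosureDemo is a made-up name) and does not reproduce the Sitecore-specific runtime exception; it only shows how a deferred query that captures a changing variable returns something other than what you expect, and how a local copy fixes it.

```csharp
using System.Collections.Generic;
using System.Linq;

namespace ClosureDemoApp
{
    // Hypothetical demo class; no Sitecore involved.
    public static class ClosureDemo
    {
        private static readonly int[] Data = { 0, 1, 2 };

        // Each lambda captures the single loop variable i, not its value at that moment.
        public static List<int> CountsCapturingLoopVariable()
        {
            var queries = new List<IEnumerable<int>>();
            for (int i = 0; i < 3; i++)
            {
                queries.Add(Data.Where(x => x == i));
            }
            // The queries execute only here, when i has already reached 3.
            return queries.Select(q => q.Count()).ToList();
        }

        // Copying the value into a fresh local per iteration gives each lambda its own variable.
        public static List<int> CountsCapturingLocalCopy()
        {
            var queries = new List<IEnumerable<int>>();
            for (int i = 0; i < 3; i++)
            {
                int value = i;
                queries.Add(Data.Where(x => x == value));
            }
            return queries.Select(q => q.Count()).ToList();
        }
    }
}
```

The first method yields 0, 0, 0 (no element equals 3), the second 1, 1, 1, even though both look as if they filter on 0, 1 and 2.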

You can chain LINQ methods just like with any other LINQ implementation. Sitecore takes the expression and converts it to a Lucene query, returning the result as an IQueryable. Let’s say we want our products sorted by creation date; then something like:

var query = context.GetQueryable<T>().Filter(t=>t.TemplateId==tId).OrderByDescending(p=>p.CreatedDate);

will work. CreatedDate comes from a computed field that was already defined by Sitecore in our index configuration file as “__smallcreateddate”. It is one of the predefined properties on SearchResultItem.

Since our generic method only knows that T is a SearchResultItem, our product- or service-specific properties are not available here. However, you can refer to fields directly using their Lucene field names and the indexer. If we add a boolean field “Available” to our Product template, we can do something like:

var query = context.GetQueryable<T>().Filter(t=>t.TemplateId==tId).Filter(p=>p["available"]=="1");

Note that using the Fields collection (p=>p.Fields[“available”]) instead of the indexer uses a get_Item() method call internally, which causes an exception. It is one of many quirks you have to be aware of when using LINQ for contentsearch. Also, this way you refer to values as they are stored in Lucene, which means they’re not type-converted and might not give the results you expect. The above is therefore a string comparison.

Of course we now also have a problem with our ServiceSearchResult entries, since they don’t have an “available” field. This results in a null value for the field, so the above expression is always false for services.

Building LINQ expressions dynamically

So what if we want expressions that depend on our item type? We could go back to type-specific search methods, duplicating the template filtering expression (and probably more) in each one. Moreover, in real-world scenarios we may not know in advance which expressions will be needed (a user may or may not select a specific search option). In short, we want to build our search expression from separate parts.

In comes Sitecore’s PredicateBuilder. This class resides in the Sitecore.ContentSearch.Linq.Utilities namespace and can be used to create and manipulate (partial) LINQ expressions for a given type. You start by calling either the True() or the False() method: use True() for conditions that need to be combined with a logical “And” and False() for a logical “Or” (a true start value leaves the first And() condition unchanged, just as a false start value does for Or()). Then you use the And() or Or() methods to add the comparison expressions. The current version also has a Create<T>() method, but at this point I cannot confirm it works the same as the True/And and False/Or combinations under all circumstances.

These methods return an Expression<Func<T, bool>>, a type from the System.Linq.Expressions namespace. To make this all clear it is best to show an example. If we put the expression for available products in a separate method, it looks like this:

/// <summary>
/// Return an expression for filtering on available products only
/// </summary>
/// <returns></returns>
public Expression<Func<ProductSearchResult, bool>> GetAvailableProductsExpression()
{
   var predicate = PredicateBuilder.True<ProductSearchResult>();    //True for "And"
   predicate = predicate.And(p => p.Available == true);
   return predicate;
}
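To see what True(), False(), And() and Or() do with the expression trees, here is a minimal stand-alone sketch in plain .NET. It is not Sitecore’s implementation: MiniPredicateBuilder and Product are made-up names for illustration. It combines two lambdas by rewriting the second one to use the parameter of the first and joining the two bodies with AndAlso or OrElse.

```csharp
using System;
using System.Linq.Expressions;

namespace PredicateSketch
{
    // Hypothetical entry type, standing in for a search result class.
    public class Product
    {
        public string Name { get; set; }
        public bool Available { get; set; }
        public double Price { get; set; }
    }

    // Minimal PredicateBuilder-style helper (illustration only, not Sitecore's code).
    public static class MiniPredicateBuilder
    {
        // Neutral start value for And(): true && x == x
        public static Expression<Func<T, bool>> True<T>() { return x => true; }

        // Neutral start value for Or(): false || x == x
        public static Expression<Func<T, bool>> False<T>() { return x => false; }

        public static Expression<Func<T, bool>> And<T>(this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
        {
            return Combine(left, right, Expression.AndAlso);
        }

        public static Expression<Func<T, bool>> Or<T>(this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
        {
            return Combine(left, right, Expression.OrElse);
        }

        // Makes the right-hand body use the left-hand parameter, then merges both bodies into one lambda.
        private static Expression<Func<T, bool>> Combine<T>(
            Expression<Func<T, bool>> left,
            Expression<Func<T, bool>> right,
            Func<Expression, Expression, BinaryExpression> merge)
        {
            var parameter = left.Parameters[0];
            var rightBody = new ReplaceParameter(right.Parameters[0], parameter).Visit(right.Body);
            return Expression.Lambda<Func<T, bool>>(merge(left.Body, rightBody), parameter);
        }

        // Swaps one ParameterExpression for another throughout an expression tree.
        private sealed class ReplaceParameter : ExpressionVisitor
        {
            private readonly ParameterExpression _from;
            private readonly ParameterExpression _to;

            public ReplaceParameter(ParameterExpression from, ParameterExpression to)
            {
                _from = from;
                _to = to;
            }

            protected override Expression VisitParameter(ParameterExpression node)
            {
                return node == _from ? _to : base.VisitParameter(node);
            }
        }
    }
}
```

With this sketch, MiniPredicateBuilder.True<Product>().And(p => p.Available) produces a single expression equivalent to p => true && p.Available, which can be passed to Where() on any IQueryable<Product>.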

Now we add a parameter of type Expression<Func<T, bool>> to our SearchIndexContent method, and pass in the result of the above when searching for products. For services we don’t pass a specific expression, so the method falls back to a dummy expression that matches every entry. It now looks like this:

/// <summary>
/// Get the index contents using search
/// </summary>
public List<T> SearchIndexContent<T>(string templateId, Expression<Func<T, bool>> expression = null) where T : SearchResultItem
{
   ISearchIndex myIndex = ContentSearchManager.GetIndex(Constants.IndexName);
   using (var context = myIndex.CreateSearchContext())
   {
      var tId = new ID(templateId);
      // Dummy if null
      var exp = expression ?? PredicateBuilder.False<T>().Or(p => true);
      var query = context.GetQueryable<T>().Filter(t => t.TemplateId == tId).Filter(exp);
      var result = query.ToList();
      return result;
   }
}

So to get our products we call it with:

var products = SearchIndexContent<ProductSearchResult>(Constants.ProductTemplateId, GetAvailableProductsExpression());

And to get our services:

var services = SearchIndexContent<ServiceSearchResult>(Constants.ServiceTemplateId);

(Note for this example code I defined the IDs of the templates as strings in a static Constants class.)

Predicate expressions can be combined and nested. Instead of putting a boolean expression in the And() or Or() methods directly you can put the result of another PredicateBuilder in. This way you can logically combine “And” and “Or” expressions, create large complex expressions from smaller parts and “inject” expressions based on user filter selections for example.
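A stand-alone sketch of nesting, again in plain .NET and not Sitecore’s code: Offer and Predicates are hypothetical names. This version combines expressions with Expression.Invoke, a common alternative to parameter rewriting that works for LINQ-to-Objects but may not be translatable by every LINQ provider. It builds (Available AND Price < max) OR Name == featured and applies it to an in-memory list through AsQueryable(), as you might do when filtering a result list after the search.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

namespace NestedPredicateSketch
{
    // Hypothetical in-memory result type.
    public class Offer
    {
        public string Name { get; set; }
        public bool Available { get; set; }
        public double Price { get; set; }
    }

    public static class Predicates
    {
        public static Expression<Func<T, bool>> True<T>() { return x => true; }
        public static Expression<Func<T, bool>> False<T>() { return x => false; }

        // Invoke-based combinators: the right-hand lambda is called with the left-hand parameter.
        public static Expression<Func<T, bool>> And<T>(this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
        {
            var invoked = Expression.Invoke(right, left.Parameters.Cast<Expression>());
            return Expression.Lambda<Func<T, bool>>(Expression.AndAlso(left.Body, invoked), left.Parameters);
        }

        public static Expression<Func<T, bool>> Or<T>(this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
        {
            var invoked = Expression.Invoke(right, left.Parameters.Cast<Expression>());
            return Expression.Lambda<Func<T, bool>>(Expression.OrElse(left.Body, invoked), left.Parameters);
        }

        // (Available AND Price < maxPrice) OR Name == featuredName
        public static Expression<Func<Offer, bool>> BuildFilter(double maxPrice, string featuredName)
        {
            var cheapAndAvailable = True<Offer>()
                .And(o => o.Available)
                .And(o => o.Price < maxPrice);
            return False<Offer>()
                .Or(cheapAndAvailable)            // a whole predicate used as one operand
                .Or(o => o.Name == featuredName);
        }

        // Filters an in-memory result list with the combined expression.
        public static List<Offer> Apply(IEnumerable<Offer> results, Expression<Func<Offer, bool>> filter)
        {
            return results.AsQueryable().Where(filter).ToList();
        }
    }
}
```

For a list with an available offer at 10, an unavailable one at 5, an available one at 50 and an unavailable “Gift card” at 25, BuildFilter(20, "Gift card") keeps the first offer and the gift card.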

Using PredicateBuilder in general

As mentioned before, the resulting expression of a PredicateBuilder operation is a .NET type and not a Sitecore type. The PredicateBuilder is not tied to Sitecore’s content search itself. As far as I know you can use it with any collection that can be turned into an IQueryable using AsQueryable(). This can be quite handy, for example for post-search filtering using the facets from the GetResults() method mentioned in part 1 of this article, or for performing post-search operations on the result set that Sitecore’s LINQ parser cannot translate into a search query.

There’s a lot more to Sitecore’s content search, LINQ and the PredicateBuilder than can be covered by this article. Unfortunately Sitecore’s documentation about it is far from complete, so you’ll have to look around on the web and experiment to figure out all that’s there.

Getting the demo code

I have uploaded a sample application to GitHub at https://github.com/mcrvdriel/ScPredicateBuilderDemo. It contains a simple Visual Studio 2013 solution, and a folder containing a package you can import into Sitecore to get the templates and some items. Since Sitecore libraries are proprietary software they are not included and you have to add them to the solution yourself. You need to have a working Sitecore 8 on your environment to be able to use this code, and set the demo project to publish to it.


Sitecore content search and LINQ (part 1)

With the rise of cloud services and today’s requirements for data retrieval, traditional relational and tree data models with their query structures tend to be replaced by non-relational models and search-based query mechanisms. Within Sitecore this shift is noticeable in the introduction of item buckets and the use of Sitecore contentsearch for larger amounts of data. A storage and retrieval mechanism based on a search engine not only improves performance drastically but also makes data retrieval more flexible. Since the release of Sitecore 7, Sitecore wants developers to favour indexes over the database for performance reasons. See http://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/06/sitecore-7-poco-explained.aspx

It is well known that Sitecore may run into performance issues when a lot of items are stored in the content tree and searched by querying the traditional way. But also consider a scenario that’s not uncommon today:
Let’s say we have a user who got a gift card and wants to browse our webshop to see what’s available for the gift card amount. So he or she wants to enter something that translates to a query like “show me all products with a price tag less than …” or “all products added to the shop since …”. This cuts right through all categories, so we’d have to query the whole tree structure, filtering out all products that do not apply. With a complex tree structure and a large number of product items, even Sitecore’s fast query will give performance issues.

By default Sitecore uses the open source Lucene engine, but this can be replaced by the SOLR engine provider that comes with Sitecore, or by another custom or 3rd party provider. When a folder in the content tree is converted to an item bucket, Sitecore uses an index to retrieve items when you enter a search expression in the bucket’s search box. However, you don’t need item buckets to use contentsearch; it works just as well with items stored in a conventional tree structure.

When showing our products on a public site, the information will normally come from the Sitecore web database, which also has a Lucene index defined on it. However, when using contentsearch, Sitecore recommends creating your own custom index instead of using the web index, for a number of reasons:

  • The web index contains references for (almost) all items in the web database, decreasing performance;
  • By default the index does not store whole values, so you’d still have to retrieve individual items from the database.

By creating your own custom index (or multiple indices) you can specify to only contain the items you need, and store necessary data in the index so you don’t have to retrieve it from the database.

Creating a custom index

To create a custom index you need to create two configuration files based on the web index. I’d recommend also installing Luke or a similar index viewer for troubleshooting. Luke is a Java application (.jar), so you also need Java installed.
There used to be a blog showing the minimum needed to create a custom index but that seems to be no longer online. So I’ll outline the process here:

  • Make a copy of these files in App_Config\Include in the website folder:
    • ContentSearch.Lucene.Index.Web.config ==> rename the copy to Sitecore.ContentSearch.Lucene.Index.MyIndex.config.
    • ContentSearch.Lucene.DefaultIndexConfiguration.config ==> rename the copy to Sitecore.ContentSearch.Lucene.MyIndexConfiguration.config.
  • In Sitecore.ContentSearch.Lucene.Index.MyIndex.config (= index definition):
    • Index node: rename the id from sitecore_web_index to sitecore_myindex.
    • Configuration node: set the ref attribute to “contentSearch/indexConfigurations/myIndexConfiguration”. We will create our own configuration under this XML path in the other file.
    • Note settings like the publishing strategy, database and root for the search. These can be left as-is or changed, e.g. use the sitecore/Content/Products folder as root to only include items from this folder in the content tree. Leave the strategy set to “onPublishEndAsync”, which updates the index when publishing.
  • In Sitecore.ContentSearch.Lucene.MyIndexConfiguration.config (= index configuration):
    • Instead of keeping all sections in this file you can replace a lot of them with references to the original in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file. See Sitecore documentation and references for this.
    • Add the following directly under the <sitecore> tag:
<!-- This database section makes sure the indexes get updated in any environment when an item changes -->
<databases>
  <database id="web" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
    <Engines.HistoryEngine.Storage>
      <obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.Kernel">
        <param connectionStringName="$(id)" />
        <EntryLifeTime>30.00:00:00</EntryLifeTime>
      </obj>
    </Engines.HistoryEngine.Storage>
    <Engines.HistoryEngine.SaveDotNetCallStack>false</Engines.HistoryEngine.SaveDotNetCallStack>
  </database>
</databases>
    • Rename the “defaultLuceneIndexConfiguration” XML node to “myIndexConfiguration”.
    • Remove the “Settings” section, since it is already in the default configuration file we copied from.
    • Leave “IndexAllFields” set to true.
    • Remove the nodes under the “FieldNames” node, EXCEPT the “_uniqueid” one. The “_uniqueid” field is required by Sitecore.
    • In “FieldTypes”, remove the types you don’t need in the index. For the remaining ones, change STORAGETYPE to YES so the values of these fields are stored in the index. When field values are stored in the index you don’t need to retrieve them from the database. This increases the size of your index, but having to go to the database after each search would more or less nullify the performance gained from using search.
    • Uncomment the <include hint="list:IncludeTemplate"> node and remove the “BucketFolderTemplateId” node. This is where we will specify the templates whose items we want indexed.
    • Go through the other “field=” nodes to see if you’re ok with them (like excluding certain fields).

When done, log into Sitecore as admin, and go to the control panel and indexing manager. Your new index should be listed so you can (re)generate it (you may have to restart your site or webserver first). Once done it should have created a folder sitecore_myindex in your Sitecore data folder that you can open with Luke to see the new index contents. A number of fields (most of them starting with an underscore) will always be present as Sitecore indexes them by default.

The above creates a working custom index for a single server setup using Lucene. For multiple server setups SOLR may be a better solution than Lucene, and more configuration may be required depending on your environment.

Be aware that with the “onPublishEndAsync” publishing strategy there may be a small delay between an item being published and the index being updated. This can result in an updated or newly created item not showing up on your website right away.

Storing information in the new index

We will be using our index for specific items. Let’s say we have to build a website showing product information, and have defined a template “Product” in Sitecore for creating product items. In Sitecore.ContentSearch.Lucene.MyIndexConfiguration.config, in the <include hint="list:IncludeTemplate"> node, add a node:

<productTemplateId>{guid}</productTemplateId>

where {guid} should be the guid of your product template. Rebuild the index from the Sitecore control panel. Now, when you create an item based on the “Product” template, it should be present in the index after publishing. Using Luke, open your custom index and look for fields you defined in the template, or check “_uniqueid” to see if the guid of the new item is present. Note that every time you add or change something in the configuration file, or change a template, you need to regenerate the index from Sitecore’s control panel!

Using the custom index

We can now use our custom index from code for retrieving data without going to the database, which is way faster and can be used to search through and retrieve large numbers of items. Assuming you have already set up a Visual Studio project for your Sitecore site, you need to add references to Sitecore.ContentSearch.dll and Sitecore.ContentSearch.Linq.dll to your project. Then create a class to reflect the product information from the index:

using System;
using System.ComponentModel;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

public class ProductSearchResult: SearchResultItem
{
  public string ProductName { get; set; }
  public string Description { get; set; }
  public string SerialNumber { get; set; }
  public DateTime Released { get; set; }
}

The properties have to correspond with the field names; otherwise you need to annotate them with attributes to map them. They must also have plain public setters (auto-properties, as above). The class has to derive from SearchResultItem, which also gives it properties like ItemId and Name that correspond with the item in the Sitecore database. Note that the name “SearchResultItem” is misleading, since the class itself has nothing to do with actual Sitecore items, and search results are not linked to a database in any way.

Now add a method somewhere in your code that will use this class for retrieving the data, which can look like this:

/// <summary>
/// Get the products from Sitecore, using search
/// </summary>
public List<ProductSearchResult> SearchProducts()
{
  ISearchIndex myIndex = ContentSearchManager.GetIndex("sitecore_myindex");
  using (var context = myIndex.CreateSearchContext())
  {
     var query = context.GetQueryable<ProductSearchResult>();
     var result = query.ToList();
     return result;
  }
}

First you need to get an ISearchIndex instance from ContentSearchManager by telling Sitecore which index to get. Then we create a search context from this instance. From this context we request an IQueryable for our type, which we convert with ToList() into a list containing our data from the index.

Instead of using the query directly, we can also call GetResults() to retrieve the data from the index along with metadata and facets. Note that GetResults() is an extension method that resides in the Sitecore.ContentSearch.Linq.dll library. The returned result is an object containing a SearchHit collection. Each SearchHit has a Document property that is an object of the type specified on the query, in our case ProductSearchResult. This is where the data of our search result can be found. So the last two lines of the above could be replaced by the following to return the same result:

var result = query.GetResults();
return result.Hits.Select(p=>p.Document).ToList();

The code from this example will return every entry in the index. Often we will store more than one type of item in an index by entering more than one template in the configuration. The above will always map the entries matching a query to the given type even if the corresponding Sitecore item is of a different type (template), so when having more than one type in the index you’ll have to filter on template. I’ll explain in part 2 of this post how you can request a selection from the index.

WARNING: do NOT use the Dispose() method on the index object or use it in a using context! You will end up with a corrupt index when you do. After filing a bug report Sitecore claimed this is by design and the Dispose() is intended for internal use only (even though there is no tooltip or any documentation about this).

Mapping index fields

The SearchResultItem class has a Fields collection to access the fields by name, much like a Sitecore Item object. However, accessing the field values directly this way bypasses any conversions and mappings done by Sitecore, and you get the raw values. Since Lucene is a third-party product with its own storage optimizations, you have to be aware of format differences between Sitecore and Lucene fields. Most notably:

  • All field names in the index are in lowercase.
  • All IDs (Guids) are in short ID format (no brackets or hyphens). Sitecore contains operations on ID type objects to convert them.
  • Datetimes are in a format derived from ISO 8601 format.
  • The ItemId property which contains the corresponding item ID is stored in the “_group” field.
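A small plain .NET sketch of the ID conversion, assuming the index holds IDs as 32 lowercase hex digits without braces or hyphens (check the actual values with Luke). Sitecore itself has ID and ShortID types for this; the IndexIdFormat helper below is hypothetical and only uses System.Guid.

```csharp
using System;

namespace IndexFormatSketch
{
    // Hypothetical helper; in Sitecore code you would normally use Sitecore.Data.ID / ShortID.
    public static class IndexIdFormat
    {
        // "{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}" -> "110d559fdea542ea9c1c8a5df7e70ef9"
        public static string ToIndexFormat(string sitecoreId)
        {
            // Guid format "N": 32 digits, no braces or hyphens, lowercase
            return Guid.Parse(sitecoreId).ToString("N");
        }

        // "110d559fdea542ea9c1c8a5df7e70ef9" -> "{110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9}"
        public static string FromIndexFormat(string indexValue)
        {
            // Format "B" adds braces and hyphens; Sitecore usually displays IDs in uppercase
            return Guid.ParseExact(indexValue, "N").ToString("B").ToUpperInvariant();
        }
    }
}
```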

On the class derived from SearchResultItem, you can use the [IndexField] attribute to map an index field to a property explicitly. Note that you have to specify the Lucene index field name. The [TypeConverter] attribute can be used to explicitly convert an index field to a type. The Sitecore.ContentSearch.Converters namespace contains Sitecore-specific converters, like IndexFieldIDValueConverter for fields containing IDs.

As an example for our ProductSearchResult class:

[IndexField("released")]
[TypeConverter(typeof(DateTimeConverter))]
public DateTime ReleaseDate { get; set; }

Computed fields and related (media) items

You can store computed field values in an index and set them as property on your SearchResultItem-derived class. This is particularly handy for storing the reference paths to related items like media items, since fields like “image” only store the alt text in the index. By creating a computed field to store the media item reference in you can get your related media item references directly from the index. In Sitecore documentation you can find how to create computed fields. You can add computed fields to the index by adding them to the <fields hint="raw:AddComputedIndexField"> section of your index configuration file. Since you’re storing calculated values be aware of how and when Sitecore updates them or you end up with stale values!

As for the actual content of these media items, you still have to get it from the Sitecore media library of course, or use a third party product with a connector to store your media items in. This goes beyond the scope of this article. You can index the actual content of some types of media items, like PDF files, by using IFilters, as described by John West in his blog on http://www.sitecore.net/learn/blogs/technical-blogs/john-west-sitecore-blog/posts/2013/04/sitecore-7-indexing-media-with-ifilters.aspx.

I explain about filtering and using the LINQ interface for contentsearch in the next part on https://sionict.wordpress.com/2015/10/06/sitecore-content-search-and-linq-part-2/

ASP.NET session state and authentication

A few weeks after I rebuilt the security implementation of an existing ASP.NET webforms system, my client called to say one of their customers had lost their in-session data and was confronted with the system defaults. A quick look at the logs showed that the customer had left the system idle for a while, then returned after a session timeout had occurred. As expected the user was redirected to the logon screen, but somehow managed to get back into the system, bypassing the logon (although of course not as a different user).

Explanation

After some research looking into the configuration I found the installation of a third-party component we used in the system had added a line to the web.config:

<sessionState timeout="20" ...

As for the authentication, the system used ASP.NET forms authentication with the default timeout, which is 30 minutes. At this point it is important to realize that ASP.NET authentication is not connected to a particular session. ASP.NET configured for forms authentication creates an authentication ticket with a timeout that is usually stored in an authentication cookie (with default name “.ASPXAUTH”). Setting the timeout on the forms authentication does NOT set the session timeout, something that is often misunderstood or overlooked in ASP.NET applications.

Apparently the user in this case had a session timeout, but after being redirected to the logon page used the browser’s back button BEFORE the authentication timeout occurred. The difference between the session state timeout (20 minutes) and the authentication timeout (30 minutes) had left a 10-minute window in which a user without a session was still authenticated. Since the user still had a valid authentication ticket, the system simply created a new session, but of course the previously stored session information was lost, presenting the user with default settings.

Synchronize session and authentication

To prevent this situation, first of all set the authentication timeout and the session timeout to the same value. By default forms authentication uses sliding expiration unless configured otherwise, meaning the timeout is reset on user activity (but not necessarily on each request: the ticket is only renewed once more than half of the timeout has passed). Session state always uses a sliding timeout.
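As a minimal sketch, the relevant web.config settings could look like the following; both timeouts are in minutes. The loginUrl value is a hypothetical placeholder, and the 20-minute value simply matches the sessionState setting added by the third-party component mentioned above.

```xml
<configuration>
  <system.web>
    <!-- Forms authentication ticket: 20 minutes, renewed on activity (sliding expiration) -->
    <authentication mode="Forms">
      <forms loginUrl="~/Logon.aspx" timeout="20" slidingExpiration="true" />
    </authentication>
    <!-- Session state: the same 20-minute timeout -->
    <sessionState timeout="20" />
  </system.web>
</configuration>
```

Because forms authentication only renews its ticket after half of the timeout has passed, the two expirations can still drift apart a little, which is why the code below also synchronizes them explicitly.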

Depending on your requirements you can choose a strategy to avoid getting sessions out of sync with authentication. One way is to just reinitialize the session if it was expired and the user is still logged in. This is easy if no or little information is kept in relation to the session. Another way is to make sure session ending does end the authentication and vice versa.

Part 1: end authentication when session is expired

To implement ending the authentication after session expiration, first make sure the session sticks by entering something into it, otherwise the session will get renewed on every request. To do this, directly after authenticating the user store the session ID in a session variable. So in a logon form (ASP.NET webforms) or ASP.NET MVC controller it will look something like:

// ...
// Authentication, validation etc.
// ...

FormsAuthentication.SetAuthCookie(UserName, false);
// Store something in the session so it sticks and can be checked on later requests
Session["__MyAppSession"] = Session.SessionID;
// ...

Since we are going to bind our authentication to the session, it is pointless to set the createPersistentCookie parameter to true of course.

Now we can check on every request whether the session is still active, and if not, log out the user. Where exactly to do this is tricky and a frequent question on forums, but arguably the best location is the Application_AcquireRequestState event in global.asax.

Since there’s no guarantee we have a valid user or session in this event, we need to do a lot of null-checking. The code will look like this:

void Application_AcquireRequestState(object sender, EventArgs e)
{
    var session = System.Web.HttpContext.Current.Session;
    // Session state is not available for every request (e.g. static files)
    if (session == null || string.IsNullOrWhiteSpace(session.SessionID)) return;

    var userIsAuthenticated = User != null &&
                              User.Identity != null &&
                              User.Identity.IsAuthenticated;

    // Still authenticated, but the session marker is gone: the session has expired
    if (userIsAuthenticated && !session.SessionID.Equals(Session["__MyAppSession"]))
    {
        Logoff();
    }
    // part 2 gets here
}

private void Logoff()
{
    FormsAuthentication.SignOut();
    // Overwrite the authentication cookie with an expired copy
    var authCookie = new HttpCookie(FormsAuthentication.FormsCookieName, string.Empty) { Expires = DateTime.Now.AddYears(-1) };
    Response.Cookies.Add(authCookie);
    FormsAuthentication.RedirectToLoginPage();
}

Now, if a request arrives after the session has expired while the user is still authenticated, the session variable “__MyAppSession” will no longer be present and the user will be logged off. I will explain the cookie part below.

Part 2: end session when authentication ends (timeout)

This can be added quite easily. After the comment in the above event code, add the following:

if (!userIsAuthenticated && session.SessionID.Equals(Session["__MyAppSession"]))
{ 
    ClearSession();
}

And the ClearSession method:

private void ClearSession()
{
    Session.Abandon();
    var sessionCookie = new HttpCookie("ASP.NET_SessionId", string.Empty) { Expires = DateTime.Now.AddYears(-1) };
    Response.Cookies.Add(sessionCookie);
}

Now if a session exists with session information while the user is no longer authenticated, the session will be abandoned.
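Together, parts 1 and 2 boil down to a small decision table. The sketch below extracts that logic into a hypothetical pure function (SessionAuthSync.Decide, not part of ASP.NET) so it can be reasoned about and unit-tested without a web server; in global.asax, LogOff would map to Logoff() and ClearSession to ClearSession().

```csharp
namespace SessionSyncSketch
{
    public enum SyncAction { None, LogOff, ClearSession }

    // Hypothetical helper mirroring the checks in Application_AcquireRequestState.
    public static class SessionAuthSync
    {
        // sessionMarker is the value of Session["__MyAppSession"] (null in a fresh session).
        public static SyncAction Decide(bool userIsAuthenticated, string sessionId, object sessionMarker)
        {
            // No usable session: nothing to synchronize
            if (string.IsNullOrWhiteSpace(sessionId)) return SyncAction.None;

            // string.Equals(object) returns false for null, as in the original check
            bool markerMatches = sessionId.Equals(sessionMarker);

            // Part 1: authenticated, but the session expired and lost its marker
            if (userIsAuthenticated && !markerMatches) return SyncAction.LogOff;

            // Part 2: the session still has the marker, but authentication has ended
            if (!userIsAuthenticated && markerMatches) return SyncAction.ClearSession;

            return SyncAction.None;
        }
    }
}
```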

Part 3: full user logout

When a user actively logs off we have to clear both the authentication and the session. Since this is usually a single “fire and forget” operation that can be called from various places it’s usually implemented best as a static operation in a logical place.

It is recommended not only to call FormsAuthentication.SignOut() and Session.Abandon(), but also to actively overwrite the cookies with expired ones. So a full logoff looks like this:

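public static void Logoff()
{
    // A static method has no Session or Response of its own, so take them from the current request
    var context = HttpContext.Current;
    FormsAuthentication.SignOut();
    context.Session.Abandon();
    // Overwrite both cookies with expired copies so the browser discards them
    var authCookie = new HttpCookie(FormsAuthentication.FormsCookieName, string.Empty) { Expires = DateTime.Now.AddYears(-1) };
    context.Response.Cookies.Add(authCookie);
    var sessionCookie = new HttpCookie("ASP.NET_SessionId", string.Empty) { Expires = DateTime.Now.AddYears(-1) };
    context.Response.Cookies.Add(sessionCookie);
    FormsAuthentication.RedirectToLoginPage();
}

Because the method is static, Session and Response are taken from HttpContext.Current (System.Web) rather than directly from a controller or form.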

A note on ASP.NET MVC: MVC applications are recommended to be as stateless as possible and not to store information in the session. In that case it doesn’t really matter if the session gets renewed automatically while a user is still authenticated, since there shouldn’t be any persistent information in there anyway.

Sitecore 7.0 with Windows Identity Foundation 4.5 security

Recently I was on a project where my task was to implement sign-in for a new intranet platform based on Sitecore 7.0, using MVC and .NET 4.5 and running on the Windows Azure cloud platform. Per the requirements, the end customer didn’t want to maintain user information within Sitecore but to use multiple ADFS 2.0 and other domains for authentication. The Azure Access Control Service (ACS) would be the central gateway for user authentication.

The requirements called for a federated security model combined with Sitecore virtual users. Since the platform is all .NET 4.5, the logical choice for implementing the federated security was Windows Identity Foundation (WIF). Microsoft already provides a lot of examples for using WIF within a .NET application, many of which don’t even require code but rely on configuration only. However, almost all of these examples apply to a more or less standard .NET application and won’t work within a Sitecore environment. The main reason is the difference between the .NET 4.5 claims-based security implementation and the Sitecore security model. Also, with .NET 4.5 WIF has been fully integrated into the core of the .NET Framework and therefore differs in some respects from earlier versions.

WIF, ACS and Sitecore

In this post I’ll explain one way to implement WIF security in a Sitecore 7.0/MVC environment. I assume the reader is familiar with the terms and abbreviations used in federated security and WIF; if not, there’s plenty of information to be found on the net. I also won’t go into the details of setting up ACS itself or (the trusts with) ADFS 2.0 domains. A detailed explanation of how this works can be found at http://azure.microsoft.com/en-us/documentation/articles/active-directory-dotnet-how-to-use-access-control/ and related articles.

For the rest of this article you should have created and configured an ACS namespace for yourself or your organization with at least one identity provider. For legal reasons, any code, configuration and references here are examples and not taken from the actual project. Production environments require more exception and security handling.

Steps involved

In our implementation, when a user who is not yet authenticated goes to the site, the following main steps take place:

  1. Our system makes a request to the ACS to retrieve the list of configured identity providers. The user is redirected to our “login” page, which is similar to normal forms authentication except that the user is presented with the list of identity providers instead of a username/password form, and has to pick the provider of his or her choice;
  2. After a provider is picked, the system calls the login URL that came with the information from the ACS. If the user is already authenticated with this provider, it immediately returns a security token with a claimset. If not, the provider presents a login page or box and the user has to log in;
  3. The token and claimset are returned through the ACS, which may or may not transform them or add information, depending on how it is configured. The ACS returns a security token and the claimset to our system;
  4. WIF intercepts the returned information and performs the necessary checks and steps. See the MSDN pages on “WSFederationAuthenticationModule” for more information;
  5. Our system retrieves the information (claims) through the WIF modules;
  6. With this information we create a Sitecore virtual user, add the necessary roles and attributes to it and log in.

Step 1: Get a list of registered (trusted) IPs from ACS

The first step is to get the information on the configured identity providers from our ACS. This can be done by calling a JavaScript endpoint that exists on the ACS. The call accepts a number of query parameters, three of which are required:

  • protocol, in this case wsfederation
  • version, in this case 1.0
  • realm, the URL of your (future) web application that has been configured in your ACS portal as an RP (Relying Party) application.

Say we have configured http://localhost/ on our ACS as a relying party; the request then looks like this:


https://namespace.accesscontrol.windows.net/v2/metadata/IdentityProviders.js?protocol=wsfederation&version=1.0&realm=http://localhost

where namespace is the namespace that you registered with ACS for you or your organization. This call will return a JSON structure containing an array of objects (one for each configured identity provider) that translates to the following C# class:

[Serializable]
public class IdentityProvider
{
  public List<string> EmailAddressSuffixes { get; set; }
  public string ImageUrl { get; set; }
  public string LoginUrl { get; set; }
  public string LogoutUrl { get; set; }
  public string Name { get; set; }
}

Note that the returned JSON structure uses the C# (Pascal case) naming convention rather than the common JavaScript (camel case) convention, so the property names map directly onto the class above. When using JSON.NET to deserialize the response, the code for sending the request and getting the result looks like this:

 List<IdentityProvider> Providers;
 using (System.Net.WebClient webClient = new System.Net.WebClient())
 {
   webClient.Encoding = System.Text.Encoding.UTF8;
   string jsonResponse = webClient.DownloadString(requestString);
   Providers = JsonConvert.DeserializeObject<List<IdentityProvider>>(jsonResponse);
 }

Here requestString must contain the request URL shown before. We used a simple form with a dropdown box and a submit button. The dropdown listed the Name property of each provider and used the serialized provider object as its value, so we didn’t have to store anything in session variables or hidden fields, keeping our application stateless as recommended for MVC applications.
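Rather than hard-coding the request string, it can be composed from the ACS namespace and the realm. The helper below is a hypothetical sketch (AcsRequest and BuildIdentityProvidersUrl are illustration names, not part of WIF or any ACS SDK). Unlike the example URL above, it URL-encodes the realm as a precaution; that is an assumption of good practice, not something ACS is known to require.

```csharp
using System;

public static class AcsRequest
{
    // Hypothetical helper, not part of WIF or an ACS SDK: composes the
    // IdentityProviders.js request for an ACS namespace and a relying-party realm.
    public static string BuildIdentityProvidersUrl(string acsNamespace, string realm)
    {
        if (string.IsNullOrWhiteSpace(acsNamespace))
            throw new ArgumentException("An ACS namespace is required.", nameof(acsNamespace));
        if (string.IsNullOrWhiteSpace(realm))
            throw new ArgumentException("A realm is required.", nameof(realm));

        // protocol and version are fixed for WS-Federation; the realm is escaped so
        // characters such as '&', '#' or '?' in it cannot break the query string.
        return string.Format(
            "https://{0}.accesscontrol.windows.net/v2/metadata/IdentityProviders.js?protocol=wsfederation&version=1.0&realm={1}",
            acsNamespace,
            Uri.EscapeDataString(realm));
    }
}
```

With this in place, string requestString = AcsRequest.BuildIdentityProvidersUrl("namespace", "http://localhost/"); yields a request string for the WebClient code above.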

Step 2: Request authentication from chosen IP

Once the user has selected an identity provider (IP), we deserialize the posted value back into an IdentityProvider object and use the value of its LoginUrl property to request authentication from that IP. In an MVC environment this is easy: we simply return a Redirect action result to that URL. The LoginUrl property should contain the full URL (including the route through the ACS) with all required information. Say the user has selected an IP and the submit action calls this controller method (the SelectedIdentityProvider parameter contains the value of the chosen provider from the dropdown):

[HttpPost]
[System.Web.Mvc.AllowAnonymous] //The MVC attribute; System.Web.Http.AllowAnonymous only applies to Web API controllers
public ActionResult ProviderSelected(string SelectedIdentityProvider)
{
  IdentityProvider provider;
  …
  //Code to retrieve the SelectedIdentityProvider object and assign it to provider
  …
  return Redirect(provider.LoginUrl);
}

If the user is not yet authenticated, that IP presents a login box or screen. Once the user is authenticated, the IP returns the issued security token to our ACS namespace, which was set as the wreply parameter in the login URL.
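The serialized provider can be written into the dropdown value and read back when the form is posted. The sketch below uses System.Text.Json so it runs on its own (it ships with .NET Core 3.0 and later, not with .NET 4.5); with JSON.NET the equivalent calls are JsonConvert.SerializeObject and JsonConvert.DeserializeObject<IdentityProvider>. ProviderValue is a hypothetical helper name, not code from the project.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Same shape as the IdentityProvider class from Step 1.
public class IdentityProvider
{
    public List<string> EmailAddressSuffixes { get; set; }
    public string ImageUrl { get; set; }
    public string LoginUrl { get; set; }
    public string LogoutUrl { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper for round-tripping a provider through a dropdown option value.
public static class ProviderValue
{
    // Serializes the provider into a string usable as an option value.
    public static string ToOptionValue(IdentityProvider provider)
    {
        return JsonSerializer.Serialize(provider);
    }

    // Restores the provider from the posted value; returns null for an empty
    // or malformed value so the controller can show an error instead.
    public static IdentityProvider FromOptionValue(string value)
    {
        if (string.IsNullOrWhiteSpace(value)) return null;
        try
        {
            return JsonSerializer.Deserialize<IdentityProvider>(value);
        }
        catch (JsonException)
        {
            return null;
        }
    }
}
```

Keep in mind that the posted value comes from the browser and can be altered, including its LoginUrl; a more defensive variant would post only the provider Name and look the provider up again in the list retrieved from ACS.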

Step 3: Returning the security token and claims

On the ACS portal we should have configured our Sitecore application as a relying party and set its “Return URL” field to the URL of a controller method that handles the rest of the login. Optionally we can set an “Error URL” and implement an error handling controller method in case something goes wrong on the IP side.

The ACS calls back to our application on the return URL. This call is intercepted and processed by the WIF modules (see the next step), after which WIF actually calls the return URL on our application. This processing involves validating the returned token, creating a ClaimsPrincipal and using that to create a session security token. Because the WIF modules reside in the ASP.NET pipeline, this security can be implemented in a standard .NET application using configuration only. However, this ClaimsPrincipal is an IPrincipal implementation, and this is where the problem arises within a Sitecore 7.0 environment: Sitecore security and users are not (yet?) based on this claims model.

Step 4: Setting up WIF to process the returned security token and claims

The core modules here are the WSFederationAuthenticationModule and the SessionAuthenticationModule, both of which are exposed as properties of the static FederatedAuthentication class. In .NET 4.5 these classes reside in the System.IdentityModel.Services namespace. You need to add references to System.IdentityModel and System.identityModel.Services to your project. Note that the second reference may (accidentally?) contain a lowercase “i”, violating the usual Microsoft naming convention.

We derived our own ScFederationAuthenticationModule and ScSessionAuthenticationModule classes from these, since in both(!) classes we need to override the InitializeModule, InitializePropertiesFromConfiguration and OnAuthenticateRequest methods. On each of our derived classes we define two boolean fields, moduleInitialized and propertiesInitialized, so that the initialization is done only once. See also http://msdn.microsoft.com/en-us/library/system.identitymodel.services.httpmodulebase.init(v=vs.110).aspx for this.

protected override void InitializeModule(System.Web.HttpApplication context)
{
  if (this.moduleInitialized) return;
  this.moduleInitialized = true;
  base.InitializeModule(context);
}

protected override void InitializePropertiesFromConfiguration()
{
  if (this.propertiesInitialized) return;
  this.propertiesInitialized = true;
  base.InitializePropertiesFromConfiguration();
}

protected override void OnAuthenticateRequest(object sender, EventArgs args)
{
  // Skip event if Sitecore user already authenticated.
  if (Sitecore.Context.User != null && Sitecore.Context.User.IsAuthenticated)
  {
    return;
  }
  base.OnAuthenticateRequest(sender, args);
}

The overrides are necessary to prevent WIF from interfering after we have created and signed in our (virtual) user in Sitecore.

The WIF modules need to be in the ASP.NET pipeline so they need to be added to web.config. Under the <modules> node in <system.webServer> add the following 2 entries:

<add name="WSFederationAuthenticationModule" type="SitecoreFedSecurity.ScFederationAuthenticationModule, SitecoreFedSecurity" />
<add name="SessionAuthenticationModule" type="SitecoreFedSecurity.ScSessionAuthenticationModule, SitecoreFedSecurity" />

with SitecoreFedSecurity being the namespace and assembly name for our derived classes. These entries need to be right after the Sitecore.Nexus.Web.HttpModule entry.

We then need to declare two configuration sections for these modules in the <configSections> node:

<section name="system.identityModel" type="System.IdentityModel.Configuration.SystemIdentityModelSection, System.IdentityModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=B77A5C561934E089" />
<section name="system.identityModel.services" type="System.IdentityModel.Services.Configuration.SystemIdentityModelServicesSection, System.IdentityModel.Services, Version=4.0.0.0, Culture=neutral, PublicKeyToken=B77A5C561934E089" />

And the definitions of these sections:

<system.identityModel>
  <identityConfiguration>
    <audienceUris>
      <add value="http://localhost/" />
    </audienceUris>
    <securityTokenHandlers>
      <remove type="System.IdentityModel.Tokens.SessionSecurityTokenHandler, System.IdentityModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
      <add type="System.IdentityModel.Services.Tokens.MachineKeySessionSecurityTokenHandler, System.IdentityModel.Services, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
    </securityTokenHandlers>
    <certificateValidation certificateValidationMode="None" />
    <issuerNameRegistry type="System.IdentityModel.Tokens.ConfigurationBasedIssuerNameRegistry, System.IdentityModel, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">
      <trustedIssuers>
        <add thumbprint="[certificate thumbprint]" name="https://[namespace].accesscontrol.windows.net/" />
      </trustedIssuers>
    </issuerNameRegistry>
  </identityConfiguration>
</system.identityModel>
<system.identityModel.services>
  <federationConfiguration>
    <cookieHandler requireSsl="false" />
    <wsFederation passiveRedirectEnabled="false" issuer="https://[namespace].accesscontrol.windows.net/v2/wsfederation" realm="http://localhost/" requireHttps="false" persistentCookiesOnPassiveRedirects="false" />
  </federationConfiguration>
</system.identityModel.services>

This configuration is explained in various articles about WIF, so I won’t go into detail here. A note about the issuerNameRegistry node, though: there seem to be two variants, of which this is the one that currently seems to work. Previously you needed to add System.IdentityModel.Tokens.ValidatingIssuerNameRegistry from NuGet to your project (it has a different setup, which can be found in various examples on the net), but that no longer seems to work. See http://stackoverflow.com/questions/23692326/adfs-2-0-microsoft-identityserver-web-invalidrequestexception-msis7042.

Replace the references to localhost with your own application’s URL if it is different. Two other values are specific to your situation: [namespace] should be replaced with your ACS namespace, and [certificate thumbprint] is the thumbprint of the X.509 certificate for your ACS namespace. It can be found in the ACS portal under “Certificates and Keys”.
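Depending on where you copy it from (for example a Windows certificate dialog), a thumbprint can come with spaces or an invisible leading character, which may make the trusted issuer check fail. A hypothetical helper like the one below (the Thumbprint class is an illustration, not part of WIF) keeps only the hexadecimal digits, upper-cases them and checks for the 40 digits of a SHA-1 thumbprint before you paste the value into the configuration:

```csharp
using System;
using System.Linq;

public static class Thumbprint
{
    // Hypothetical helper: strips everything that is not a hexadecimal digit
    // (spaces, invisible characters) and returns the digits in upper case.
    public static string Normalize(string pasted)
    {
        if (pasted == null) throw new ArgumentNullException(nameof(pasted));

        string hex = new string(pasted.Where(c => Uri.IsHexDigit(c)).ToArray()).ToUpperInvariant();

        // A SHA-1 thumbprint is 20 bytes, i.e. 40 hexadecimal digits.
        if (hex.Length != 40)
            throw new FormatException("A SHA-1 thumbprint must contain exactly 40 hexadecimal digits.");
        return hex;
    }
}
```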

Aside from the above configuration, your application should expose a FederationMetadata.xml file, which simplifies maintenance on the ACS; see the Microsoft documentation on this. The location of this file and a few other paths need to be accessible to any (anonymous) user. See the Notes at the end of this post.

Step 5: Retrieving claims information

Now that we have WIF set up in our system, we can implement the Controller method we have set as return URL on the ACS to retrieve the information from the claimset. Since we need to create and authenticate a Sitecore user, we need to retrieve the necessary information from WIF and perform a few checks. In our case we named the controller method for the return URL SignIn:

public string SignIn()
{
  bool result = false;
  System.IdentityModel.Tokens.SessionSecurityToken sessionToken = null;
  System.Security.Claims.ClaimsPrincipal claimsPrincipal = null;
  try
  {
    result = ((ScSessionAuthenticationModule)FederatedAuthentication.SessionAuthenticationModule).TryReadSessionTokenFromCookie(out sessionToken);
    if (result) claimsPrincipal = sessionToken.ClaimsPrincipal;
  }
  catch (System.Exception ex)
  {
    return string.Format("Could not retrieve session security token cookie. Could not create user. Exception: {0}", ex.Message);
  }
  //Check status
  if ((claimsPrincipal == null) || (claimsPrincipal.Identity == null))
  {
    return "No claimsPrincipal is set. Could not create user";
  }
  if (!claimsPrincipal.Identity.IsAuthenticated)
  {
    return string.Format("Chosen identity provider did not authenticate identity {0}", claimsPrincipal.Identity.Name);
  }
  //TODO: Create a virtual user based on the principal
}

We access the security token cookie set by WIF through the TryReadSessionTokenFromCookie method of the SessionAuthenticationModule. Despite the “Try…” naming, the method still throws an exception if the cookie could not be read, so you need exception handling here. After getting the token, you need to verify that the principal is present and that the user has actually been authenticated by the IP.
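The status checks in SignIn() depend only on System.Security.Claims, not on Sitecore or WIF, so they could be pulled into a small helper that is easy to unit test. The sketch below is hypothetical (PrincipalCheck is not part of the original project); it returns the same messages as SignIn(), or null when the principal can be used to create a user.

```csharp
using System.Security.Claims;

public static class PrincipalCheck
{
    // Hypothetical helper mirroring the checks in SignIn(): returns an error
    // message, or null when the principal is present and authenticated.
    public static string Validate(ClaimsPrincipal claimsPrincipal)
    {
        // A principal without identities has a null Identity
        if (claimsPrincipal == null || claimsPrincipal.Identity == null)
            return "No claimsPrincipal is set. Could not create user";

        // An identity created without an authentication type is not authenticated
        if (!claimsPrincipal.Identity.IsAuthenticated)
            return string.Format("Chosen identity provider did not authenticate identity {0}", claimsPrincipal.Identity.Name);

        return null;
    }
}
```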

Step 6: Creating the (virtual) Sitecore user and logging in

Now that we have the information from the identity provider we can create and log in our Sitecore virtual user. Replace the “TODO” comment in the above code with the following:

string identifier = claimsPrincipal.Identity.Name;
if (string.IsNullOrWhiteSpace(identifier))
{
  //No name claim: fall back to the first claim in the set
  var firstClaim = claimsPrincipal.Claims.FirstOrDefault();
  if (firstClaim == null) return "Identity provider returned no claims. Could not create user";
  identifier = firstClaim.Value;
}
Sitecore.Security.Accounts.User user = Sitecore.Security.Authentication.AuthenticationManager.BuildVirtualUser("extranet\\" + identifier, true);
//Add any roles or attributes for the user here, before login
Sitecore.Security.Authentication.AuthenticationManager.LoginVirtualUser(user);
return string.Empty;

The Name property should be set by WIF from the corresponding claim, but not all identity providers include a name in the claimset, so it can be null. Windows Live, for example, only returns a unique ID. In this example we pick the first claim from the set, but which claim you need depends on your situation. We also haven’t set any roles or additional properties here, but that should be pretty straightforward using the Sitecore API.
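Which claim makes a good identifier depends on the provider. As an alternative to taking an arbitrary first claim, the hypothetical helper below prefers Identity.Name, then the standard nameidentifier claim (ClaimTypes.NameIdentifier), and only then the first non-empty claim. It assumes your providers send a nameidentifier claim, which you should verify against the claimset you actually receive.

```csharp
using System.Linq;
using System.Security.Claims;

public static class VirtualUserName
{
    // Hypothetical helper: picks the value used as the Sitecore virtual user name.
    // Preference: Identity.Name, then the nameidentifier claim, then the first
    // claim with a non-empty value. Returns null if nothing usable is found.
    public static string GetIdentifier(ClaimsPrincipal principal)
    {
        if (principal?.Identity == null) return null;

        if (!string.IsNullOrWhiteSpace(principal.Identity.Name))
            return principal.Identity.Name;

        // Assumption: the provider sends the standard nameidentifier claim type
        Claim nameIdentifier = principal.FindFirst(ClaimTypes.NameIdentifier);
        if (nameIdentifier != null && !string.IsNullOrWhiteSpace(nameIdentifier.Value))
            return nameIdentifier.Value;

        Claim first = principal.Claims.FirstOrDefault(c => !string.IsNullOrWhiteSpace(c.Value));
        return first?.Value;
    }
}
```

BuildVirtualUser("extranet\\" + identifier, true) would then only be called when GetIdentifier returns a non-null value.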

Signing out

Besides a login URL, the identity provider information also contains a sign-out URL (LogoutUrl) that we can use to sign the user out with the chosen IP. Completely signing out can involve four steps:

  • Sign out of Sitecore with AuthenticationManager.Logout();
  • Sign out of WIF with the SignOut() method on the WSFederationAuthenticationModule (or rather our derived class);
  • Sign out with the identity provider with WSFederationAuthenticationModule.FederatedSignOut(..), using the sign-out URL;
  • Sign out with ACS using WSFederationAuthenticationModule.GetFederationPassiveSignOutUrl(..).
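To illustrate what the last step sends to ACS, here is a sketch of a WS-Federation passive sign-out request: the issuer endpoint with wa=wsignout1.0 and, optionally, a wreply URL to return to. In the application itself you would let WIF’s GetFederationPassiveSignOutUrl build this; the helper below (WsFederationSignOut is hypothetical) only shows the general shape, and the exact query string WIF produces may differ.

```csharp
using System;

public static class WsFederationSignOut
{
    // Hypothetical helper showing the shape of a WS-Federation passive sign-out
    // request: the issuer URL plus wa=wsignout1.0 and an optional escaped wreply.
    public static string BuildSignOutUrl(string issuer, string reply)
    {
        if (string.IsNullOrWhiteSpace(issuer))
            throw new ArgumentException("An issuer URL is required.", nameof(issuer));

        // Append to an existing query string if the issuer URL already has one
        string url = issuer + (issuer.Contains("?") ? "&" : "?") + "wa=wsignout1.0";

        if (!string.IsNullOrWhiteSpace(reply))
            url += "&wreply=" + Uri.EscapeDataString(reply);
        return url;
    }
}
```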

Federated security and signing out can be problematic. It is up to the identity provider whether and how to process a sign-out request, and signing out may be impossible because the provider completely ignores these requests. Other providers abort the above sequence because they do not return after the sign-out request but display a message page instead. Microsoft has also received quite some criticism for providing plenty of examples of federated sign-in but few about signing out. Be aware that, especially in public environments, even after the above steps (and even after closing the browser, as some providers instruct you to do!) the user may still be signed in with the IP, so a subsequent session is authenticated automatically without having to log in.

Notes

A few things must be kept in mind when implementing this security model:

  • Sitecore’s built-in users must still be able to log in through the CMS login page. The FederationMetadata should also be accessible, and possibly some other paths containing styles, images or scripts. We use the <location> configuration element to give all users access to these folders:
    <location path="FederationMetadata">
      <system.web>
        <authorization>
          <allow users="*" />
        </authorization>
      </system.web>
    </location>
    

    Unfortunately, the <location> element takes only one path, so you need to create an entry like this for every path.

  • Make sure you have set up MVC routing properly for your Sitecore environment for the callbacks from the ACS to work.
  • Both WIF modules contain an OnAuthenticateRequest method. Its name is somewhat confusing, because it is called on every request; the actual check whether or not a request is an authentication request is done within the (base) implementation of this method.
  • When hooking up WIF events, be aware that the WIF modules stay alive across requests, since they are HTTP modules exposed as properties of the static FederatedAuthentication class, whereas MVC objects like controllers are disposed between calls. So when WIF is processing a request and firing its various events before calling the return URL, there is no MVC controller alive.
  • Since this all involves security and users, I got remarks that there should be an ASP.NET MembershipProvider somewhere. It is possible to implement certain parts in methods of a custom MembershipProvider if the user needs to be enriched with information from Sitecore or another local store. Keep in mind, though, that membership providers are nothing more than an abstraction between user information storage and applications, and within a federated security model this is all delegated to the identity provider.

My first post

So, I finally got around to starting a blog. Like so many things on the net nowadays, it is surprisingly easy to set up. So why didn’t I start one before?

The main reason is that, in my opinion, there’s already an overload of blogs, articles, forums and whatnot on the net that add little or nothing to the world. “Information overload” is one of the biggest issues we have to deal with today, so I never felt the urge to add more to that. Lately, however, I found myself increasingly in need of “publishing space” for information that others told me could be interesting to share.

My intention is to write posts just for sharing thoughts but also posts that will be of a quite technical nature. In fact the main reason to start this blog now comes from a fellow software engineer who gave me the advice to publish something about a technical issue I solved recently on a project. So here it is: my own blog.