Robust Search on E-Commerce Website by Chun Lin Goh

Azure Search is a search-as-a-service solution that helps e-commerce developers offer a robust search experience on their website without having to manage search infrastructure or understand search in depth.

In our e-commerce team, search matters a great deal: as the number of products grows, a good search function becomes a top-priority feature for us to implement. Azure Search is what we chose for it.

Upload Data to Index

For our e-commerce platform, data is stored separately in different systems. With Azure Search, we can compile that data ourselves and push it to populate the Azure Search index. With this push approach, it doesn’t matter what our data sources are.

To upload data to Azure Search from .NET, we first need to install the Microsoft Azure Search .NET SDK NuGet package.

nuget-microsoft-azure-search.png
Installed Azure Search NuGet package in an ASP.NET MVC project.

 

Secondly, we need to create an instance of the SearchServiceClient.

SearchServiceClient serviceClient = CreateSearchServiceClient();

...

private static SearchServiceClient CreateSearchServiceClient()
{
    return new SearchServiceClient(
        AZURE_SEARCH_SERVICE_NAME, 
        new SearchCredentials(AZURE_SEARCH_ADMIN_API_KEY)
    );
}

Both the Azure Search Service Name and Azure Search Admin API Key can be found in the Azure Portal.

service-name-and-admin-keys.png
Service Name (which is shown beside the Azure Search logo) and Admin Keys.

After that, we need to check whether the index already exists. If it does, we delete it before creating a new one. AZURE_SEARCH_INDEX_NAME is the name we give to our index in Azure Search, for example ecommerce-products-index.

if (await serviceClient.Indexes.ExistsAsync(AZURE_SEARCH_INDEX_NAME))
{
    await serviceClient.Indexes.DeleteAsync(AZURE_SEARCH_INDEX_NAME);
}

await serviceClient.Indexes.CreateAsync(new Index()
{
    Name = AZURE_SEARCH_INDEX_NAME,
    Fields = FieldBuilder.BuildForType<Product>()
});

After that, we need to retrieve the SearchIndexClient for the index.

var indexClient = serviceClient.Indexes.GetClient(AZURE_SEARCH_INDEX_NAME);

We can then proceed to upload data to Azure Search via an IndexBatch object, as shown in the code below.

IEnumerable<Product> products = ...; // Retrieve available products from data sources

var batch = IndexBatch.Upload(products);

try
{
    indexClient.Documents.Index(batch);
}
catch (IndexBatchException ex)
{
    // Exception handling... 
}

Upload is just one of the indexing actions. The others are Merge, MergeOrUpload, and Delete. Some of you may wonder why, given MergeOrUpload, we still delete and re-create the index every time we run the indexing. The reason is that MergeOrUpload never removes documents from the index, and we don’t want products that have since been deleted to remain searchable.

Take note that we are only allowed to include up to 1,000 documents in a single indexing request (or 16MB, whichever limit comes first).
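To stay within the 1,000-document limit, the product list can be split into chunks before each chunk is turned into an IndexBatch. The following is only a minimal sketch: BatchHelper and SplitIntoBatches are hypothetical names made up for illustration (they are not part of the Azure Search SDK), and the sketch only counts documents, so the 16MB payload limit would still need a separate check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchHelper
{
    // Document-count limit for a single indexing request, as stated in the text.
    public const int MaxDocumentsPerBatch = 1000;

    // Splits a sequence into consecutive chunks of at most batchSize items.
    public static IEnumerable<List<T>> SplitIntoBatches<T>(
        IEnumerable<T> source, int batchSize = MaxDocumentsPerBatch)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var current = new List<T>();
        foreach (var item in source)
        {
            current.Add(item);
            if (current.Count == batchSize)
            {
                yield return current;
                current = new List<T>();
            }
        }

        // Emit the last, partially filled chunk (if any).
        if (current.Count > 0)
        {
            yield return current;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for 2,500 products; only the count matters here.
        var serialNumbers = Enumerable.Range(1, 2500).Select(i => "SN-" + i);

        foreach (var chunk in BatchHelper.SplitIntoBatches(serialNumbers))
        {
            // In the uploader, each chunk would be passed to IndexBatch.Upload(...).
            Console.WriteLine(chunk.Count);
        }
        // Prints 1000, 1000 and 500 on separate lines.
    }
}
```

Each chunk can then go through the same IndexBatch.Upload and Documents.Index calls shown earlier, one request per chunk.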

There is more than one way to upload data to Azure Search for indexing, such as using the Azure Portal or the REST API. You can read more at Azure Search Documentation – Add Data.

Model Class – Product

To create the list of Field objects for the index, the FieldBuilder class needs a model class that defines the fields. In the sample code above, the model is called Product, and it looks as follows.

using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;
using System.ComponentModel.DataAnnotations;

...

[SerializePropertyNamesAsCamelCase]
public class Product
{
    [Key]
    public string SerialNumber { get; set; }

    [IsFilterable, IsSortable, IsSearchable]
    public string Name { get; set; }

    [IsFilterable, IsSortable, IsSearchable, IsFacetable]
    public string Category { get; set; }

    [IsSearchable]
    public string Summary { get; set; }

    [IsSearchable]
    public string Description { get; set; } 

    [IsFilterable, IsSortable, IsFacetable] 
    public int? Rating { get; set; }
}

The SerializePropertyNamesAsCamelCase attribute is defined in the Azure Search .NET SDK. Field names in Azure Search JSON documents are conventionally in camel case, while public properties of a .NET model use Pascal case, following the .NET naming guidelines. The attribute therefore maps the Pascal-case properties to camel-case fields in the Azure Search index documents.
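To make the mapping concrete, here is a small sketch of what happens to the property names of the Product model. FieldNaming.ToCamelCase is a hypothetical helper written for illustration only; it is a naive approximation that lower-cases just the first character, and the SDK's actual serializer may treat edge cases such as leading acronyms differently.

```csharp
using System;

public static class FieldNaming
{
    // Naive PascalCase -> camelCase conversion: lower-case only the first character.
    public static string ToCamelCase(string propertyName)
    {
        if (string.IsNullOrEmpty(propertyName)) return propertyName;
        return char.ToLowerInvariant(propertyName[0]) + propertyName.Substring(1);
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var name in new[] { "SerialNumber", "Name", "Category", "Rating" })
        {
            Console.WriteLine(name + " -> " + FieldNaming.ToCamelCase(name));
        }
        // Prints:
        // SerialNumber -> serialNumber
        // Name -> name
        // Category -> category
        // Rating -> rating
    }
}
```

The camel-case names are the ones stored in the index, so they are also the names to use when a filter or orderby expression refers to a field.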

When creating the index, we can define attributes for each field.

  • IsSearchable: Marks the field as full-text searchable. Searchable fields consume additional space in the Azure Search index because an extra tokenized version of the field value also has to be stored for full-text search.
  • IsFilterable: Allows the field to be referenced in filter queries. When an index is defined through the REST API, fields are filterable by default; in the model class above, the attribute is applied explicitly.
  • IsSortable: Allows the result documents to be sorted by the field in an orderby expression.
  • IsFacetable: Unlike a filter, which restricts which documents a query returns, a facet produces summaries of field values across the documents, such as the number of products in each category.

According to Microsoft Docs, neither the SDK nor the Azure Search service can help us make sure that no document contains a null value for a field bound to a non-nullable property. We therefore have no choice but to use nullable data types in the model, such as int? for Rating; otherwise errors will be thrown.

Query Index with Simple Query Syntax

Now we have useful information about our products in the Azure Search index. How do we allow customers to retrieve it?

Azure Search gives developers many ways to create powerful queries. There are two main types of query: Search and Filter. The only one we use on our e-commerce site is Search. A search query looks for terms in all searchable fields in the index.

There are two types of search query syntax. By default, Azure Search is using the Simple Query Syntax.

Query execution has four stages:

  1. Query Parsing: Separate query terms from query operators and create a query tree to be sent to the search engine;
  2. Lexical Analysis: Perform lexical analysis on query terms;
  3. Document Matching: Match documents containing any or all of the terms;
  4. Scoring: Score matching documents based on the contents of the inverted index.
search-request-processing-workflow
The components used to process a search request in Azure Search. (Source: Microsoft Docs)

In the Simple Query Syntax, we can include operators in our search query, such as + (AND), || (OR), and - (NOT). So if a customer would like to find a tour package that visits the Eiffel Tower but not any museum, they can simply key in Eiffel -museum in the search box.

eiffel--museum.png
All the tours in France that are going to Eiffel Tower but not any museum in France. (Source: Changi Recommends)

Another interesting thing I noticed: according to Microsoft Docs, the OR operator is |, but in my case the OR operation only works if I use ||.

Besides these operators, the famous Phrase Search Operator (" ") is also available. So "Eiffel Museum" will only return documents that contain the whole phrase, with the words together and in that order. Is there an Eiffel Museum in Paris? =)

To find out more about Simple Query Syntax and the operators available in it, please read Microsoft Docs.
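As an illustration of how these operators combine, the sketch below assembles a Simple Query Syntax string from a list of wanted and unwanted terms. SimpleQueryBuilder is a hypothetical helper, not something provided by the SDK, and it assumes the terms themselves contain no operator characters or quotes. One caveat worth checking against the documentation: with the default searchMode of any, a NOT term is treated as "OR NOT" and can widen the results, so a query like Eiffel -museum most likely needs searchMode set to all (SearchMode.All on SearchParameters in the .NET SDK) to behave as described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimpleQueryBuilder
{
    // Joins wanted terms with the AND operator (+) and prefixes unwanted terms with NOT (-).
    public static string Build(IEnumerable<string> allOf, IEnumerable<string> noneOf)
    {
        var parts = new List<string>();

        var wanted = allOf.Select(t => t.Trim()).Where(t => t.Length > 0).ToList();
        if (wanted.Count > 0)
        {
            parts.Add(string.Join(" + ", wanted));
        }

        parts.AddRange(noneOf
            .Select(t => t.Trim())
            .Where(t => t.Length > 0)
            .Select(t => "-" + t));

        return string.Join(" ", parts);
    }

    // Wraps the text in double quotes to request a phrase search.
    // Assumes the text itself contains no double quotes.
    public static string Phrase(string text)
    {
        return "\"" + text.Trim() + "\"";
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(SimpleQueryBuilder.Build(new[] { "Eiffel" }, new[] { "museum" }));
        // Prints: Eiffel -museum

        Console.WriteLine(SimpleQueryBuilder.Build(new[] { "Eiffel", "Paris" }, new[] { "museum" }));
        // Prints: Eiffel + Paris -museum

        Console.WriteLine(SimpleQueryBuilder.Phrase("Eiffel Museum"));
        // Prints: "Eiffel Museum"
    }
}
```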

Query Index with Lucene Query Syntax

Another query syntax is Lucene Query Syntax, the powerful and expressive query language developed as part of Apache Lucene.

For e-commerce, two operations supported by Lucene Query Syntax are my favourites:

  • Fuzzy Search;
  • Wildcard Search.

It’s quite common to make typos when searching. Fortunately, Azure Search provides a Fuzzy Search function based on the Damerau-Levenshtein Distance, a string metric that measures the edit distance between two sequences.

To do a Fuzzy Search, we just append the character ~ to a single word and optionally specify the edit distance, which is 2 by default. So Eifle~1 will also match documents that have “Eiffle” in their content.

I did a similar task, implementing a search function in JavaScript, when I was working at Easibook. By calculating the Levenshtein Distance between the user input and the records in the database, the small piece of JavaScript I wrote is able to suggest places even when the user keys in the place name wrongly.

The Damerau-Levenshtein Distance is a modified version of the Levenshtein Distance I used at Easibook: it additionally counts the transposition of two adjacent characters as a single edit.

levenshtein-distance-vs-damerau--levenshtein-distance.png
Levenshtein Distance vs. Damerau-Levenshtein Distance.
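The difference between the two metrics can be sketched in a few lines of C#. This is an illustration only: it implements the restricted "optimal string alignment" form of the Damerau-Levenshtein Distance, compares characters case-sensitively, and is not a claim about how Azure Search computes edit distance internally. EditDistance.Compute is a hypothetical name.

```csharp
using System;

public static class EditDistance
{
    // Edit distance between source and target. With countTranspositions = false this is
    // the plain Levenshtein Distance; with true, swapping two adjacent characters also
    // counts as a single edit (optimal string alignment variant of Damerau-Levenshtein).
    public static int Compute(string source, string target, bool countTranspositions)
    {
        var d = new int[source.Length + 1, target.Length + 1];
        for (int i = 0; i <= source.Length; i++) d[i, 0] = i; // delete every character
        for (int j = 0; j <= target.Length; j++) d[0, j] = j; // insert every character

        for (int i = 1; i <= source.Length; i++)
        {
            for (int j = 1; j <= target.Length; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                int best = Math.Min(
                    Math.Min(d[i - 1, j] + 1,      // deletion
                             d[i, j - 1] + 1),     // insertion
                    d[i - 1, j - 1] + cost);       // match or substitution

                if (countTranspositions && i > 1 && j > 1
                    && source[i - 1] == target[j - 2]
                    && source[i - 2] == target[j - 1])
                {
                    best = Math.Min(best, d[i - 2, j - 2] + 1); // adjacent transposition
                }

                d[i, j] = best;
            }
        }

        return d[source.Length, target.Length];
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(EditDistance.Compute("Eiffle", "Eiffel", false)); // Prints: 2
        Console.WriteLine(EditDistance.Compute("Eiffle", "Eiffel", true));  // Prints: 1
        Console.WriteLine(EditDistance.Compute("Eifle", "Eiffle", true));   // Prints: 1
    }
}
```

The swapped "le" in "Eiffle" costs two substitutions under the plain Levenshtein Distance but only one edit once transpositions are counted.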

Another query operation that I like is the Wildcard Search. By default, Azure Search matches whole words. However, customers sometimes hope that they don’t have to type the entire search term. For example, they expect “Desert” to be returned as a result when they search with the term “des”.

To accomplish this, I simply apply Wildcard Search, using the multiple-character (*) or single-character (?) wildcard; des* covers the case above. Take note that Lucene Query Syntax only supports wildcard symbols within a single search term, not a phrase.
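A small helper can turn what the customer typed into a fuzzy or prefix term before the query is sent. The sketch below is an assumption-laden illustration: LuceneQueryHelper and its methods are hypothetical names, the escaping covers the characters that the Lucene syntax treats as special, and the query itself would still need to opt in to the Lucene syntax (queryType=full in REST, or QueryType.Full on SearchParameters in the .NET SDK).

```csharp
using System;
using System.Linq;
using System.Text;

public static class LuceneQueryHelper
{
    // Characters with a special meaning in the Lucene query syntax.
    private const string SpecialCharacters = "+-&|!(){}[]^\"~*?:\\/";

    // Prefixes every special character with a backslash so it is treated literally.
    public static string Escape(string term)
    {
        var escaped = new StringBuilder(term.Length);
        foreach (char c in term)
        {
            if (SpecialCharacters.IndexOf(c) >= 0) escaped.Append('\\');
            escaped.Append(c);
        }
        return escaped.ToString();
    }

    // Builds a fuzzy term such as "Eifle~1". The edit distance must be between 0 and 2.
    public static string Fuzzy(string term, int maxEdits = 2)
    {
        if (maxEdits < 0 || maxEdits > 2)
            throw new ArgumentOutOfRangeException(nameof(maxEdits));
        return Escape(RequireSingleTerm(term)) + "~" + maxEdits;
    }

    // Builds a prefix (wildcard) term such as "des*".
    public static string Prefix(string partialTerm)
    {
        return Escape(RequireSingleTerm(partialTerm)) + "*";
    }

    // Fuzzy and wildcard operators apply to a single term, not to a phrase.
    private static string RequireSingleTerm(string term)
    {
        if (string.IsNullOrWhiteSpace(term) || term.Any(char.IsWhiteSpace))
            throw new ArgumentException("Expected a single word.", nameof(term));
        return term;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(LuceneQueryHelper.Fuzzy("Eifle", 1)); // Prints: Eifle~1
        Console.WriteLine(LuceneQueryHelper.Prefix("des"));     // Prints: des*
    }
}
```

Rejecting input that contains whitespace mirrors the single-term restriction mentioned above; a multi-word input would have to be split and handled word by word.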

To find out more about Lucene Query Syntax, please check out Microsoft Docs too.

Search Score

By default, the top 50 (up to 1,000 in a single response) matched documents are returned by Azure Search. Every document returned is assigned a Search Score. In the result set, the documents are ranked according to their Search Score, from highest to lowest.

Default Azure Search scoring is computed from statistical properties of the data and the query. Azure Search favours documents that contain many instances of the search term. According to Microsoft Docs, the search score goes even higher if the search term is rare across the data corpus but common within the document. This default scoring fits most of our e-commerce search cases, so even though customizing the scoring is possible, we haven’t done so.
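The "rare across the corpus, common within the document" behaviour is the idea behind classic TF-IDF scoring. The toy sketch below shows that idea on three tiny documents. It is not the exact formula Azure Search uses; the TfIdf class, the smoothing in the IDF term, and the sample documents are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TfIdf
{
    // Toy TF-IDF: (share of the document's words equal to the term)
    // multiplied by a smoothed inverse document frequency.
    public static double Score(string term, string[] document, IList<string[]> corpus)
    {
        if (document.Length == 0) return 0.0;

        double termFrequency = document.Count(w => w == term) / (double)document.Length;
        int documentsWithTerm = corpus.Count(d => d.Contains(term));
        double inverseDocumentFrequency =
            Math.Log((1.0 + corpus.Count) / (1.0 + documentsWithTerm)) + 1.0;

        return termFrequency * inverseDocumentFrequency;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up, already tokenized and lower-cased documents.
        var tourA = new[] { "eiffel", "tower", "tour", "eiffel" };
        var tourB = new[] { "museum", "tour" };
        var tourC = new[] { "city", "tour", "bus" };
        var corpus = new List<string[]> { tourA, tourB, tourC };

        double rareTerm = TfIdf.Score("eiffel", tourA, corpus);  // rare in corpus, frequent in tourA
        double commonTerm = TfIdf.Score("tour", tourA, corpus);  // appears in every document

        Console.WriteLine(rareTerm > commonTerm); // Prints: True
    }
}
```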

Pricing

The following pricing table reflects the rates charged by Azure Search for Southeast Asia customers in USD.

pricing-details.png
Pricing of Azure Search and Its Data Transfer Standard Rates (For latest rates please refer to Microsoft Azure)

Personal Demo

Half a year ago, when I had just started exploring Azure Search, I built a simple Windows console app as a personal hobby project to demonstrate what Azure Search can do. The code of the project is hosted on GitHub. Feel free to check it out and contribute to it. Thanks in advance! =)

azure-search-demo-console.png
The Azure Search Demo Console I built in Dec 2016 to search Singapore .NET Developers Community meetups.
