What can OData bring to ElasticSearch?

The key idea is that such an integration introduces a level of indirection between the logical schema, defined with OData, and the physical one, defined within ElasticSearch.

This makes it possible to apply processing strategies transparently, according to the mapping between these two schemas.

Throughout this post, we describe in detail the concepts behind an integration between OData and ElasticSearch. Note that the general concepts apply to most NoSQL databases.

Bridging logical and physical schemas

OData Entity Data Model

The central concepts in the EDM are entities and the relations between them. Entities are instances of Entity Types (for example, Customer, Employee, and so on), which are structured records consisting of named and typed properties and a key. Complex Types are structured types that also consist of a list of properties but have no key, and thus can only exist as a property of a containing entity. An Entity Key is formed from a subset of the properties of the Entity Type. It uniquely identifies instances of Entity Types and allows them to participate in relationships through navigation properties. Entities are grouped in Entity Sets. Finally, all instance containers like Entity Sets are grouped in an Entity Container.

ElasticSearch mapping

ElasticSearch defines metadata about the kinds of documents it manages within indices. This metadata defines the types of properties and possibly their formats, but also the way documents are handled during the indexing phase (stored or not, indexed or not, analyzers to apply, and so on).

The following snippet shows a sample mapping:

{
    "product": {
        "properties": {
            "name": {
                "type": "string",
                "index": "analyzed",
                "store": true,
                "index_name": "msg",
                "analyzer": "standard"
            },
            "description": {"type": "string"},
            "releaseDate": {"type": "date"},
            "discontinuedDate": {"type": "date"},
            "rating": {"type": "integer"},
            "price": {"type": "double"},
            "available": {"type": "boolean"},
            "hint": {"type": "string"}
        }
    }
}

Such hints are indexing-oriented: they don't define relations between elements, nor constraints.

Need for an intermediate schema

As we saw, the ElasticSearch mapping focuses on indexing and doesn't contain all the necessary hints to build an EDM. For this reason, an intermediate schema needs to be introduced.

It contains additional hints about types (cardinalities, relations, denormalization, and so on) and is used to deduce the corresponding EDM. Some hints won't be exposed through this model but will be useful when handling OData requests.

The following content describes the structure of this intermediate schema:

name (string)
pk (true | false)
minOccurs (0 | 1)
maxOccurs (integer or -1)
denormalizedFieldName (string)
notNull (boolean)
regexp (regexp)
uniqueBy (true | false)
autoGenerated (true | false)
indexed (true | false)
stored (true | false)
relationKind (parentChild | denormalized | reference)

In the case of ElasticSearch, this can be stored in the _meta field of type mappings, as shown below:

{
    "properties": {
        "age": { "type":"integer" },
        "gender":{ "type":"boolean" },
        "phone":{ "type":"string" },
        "address": {
            "type": "nested",
            "properties": {
                "street": { "type":"string" },
                "city": { "type":"string" },
                "state": { "type":"string" },
                "zipCode": { "type":"string" },
                "country": { "type":"string" }
            }
        }
    },
    "_meta":{
        "constraints":{
            "personId":{ "pk":true, "type":"integer" }
        }
    }
}

Another approach consists in defining it outside ElasticSearch, within the OData support for ElasticSearch, either programmatically or in a configuration file. Below is a possible solution:

MetadataBuilder builder = new MetadataBuilder();

// Complex type for the address: a list of properties, no key
TargetEntityType personDetailsAddressType
        = builder.addTargetComplexType("odata", "personDetailsAddress");
personDetailsAddressType.addField("street", "Edm.String");
personDetailsAddressType.addField("city", "Edm.String");
personDetailsAddressType.addField("state", "Edm.String");
personDetailsAddressType.addField("zipCode", "Edm.String");
personDetailsAddressType.addField("country", "Edm.String");

// Entity type, with personId as its primary key
TargetEntityType personDetailsType
        = builder.addTargetEntityType("odata", "personDetails");
personDetailsType.addPkField("personId", "Edm.Int32");
personDetailsType.addField("age", "Edm.Int32");
personDetailsType.addField("gender", "Edm.Boolean");
personDetailsType.addField("phone", "Edm.String");
// The address property references the complex type declared above
personDetailsType.addField("address", "odata.personDetailsAddress");
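
To illustrate how part of the EDM could be deduced from the physical schema, here is a minimal sketch that derives EDM primitive type names from ElasticSearch mapping types. The class EdmTypeDeducer is hypothetical and not part of any library. Only the string, integer and boolean correspondences come from the examples in this post; mapping double to Edm.Double and date to Edm.DateTimeOffset is an assumption (the latter supposes OData v4).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class EdmTypeDeducer {

    // Assumed correspondence between ElasticSearch types and EDM primitive types.
    // The "double" and "date" entries are assumptions of this sketch.
    private static final Map<String, String> ES_TO_EDM = Map.of(
            "string", "Edm.String",
            "integer", "Edm.Int32",
            "boolean", "Edm.Boolean",
            "double", "Edm.Double",
            "date", "Edm.DateTimeOffset");

    /** Maps each field (name -> ElasticSearch type) to an EDM type name, keeping field order. */
    static Map<String, String> deduce(Map<String, String> esFieldTypes) {
        Map<String, String> edmFieldTypes = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : esFieldTypes.entrySet()) {
            String edmType = ES_TO_EDM.get(field.getValue());
            if (edmType == null) {
                throw new IllegalArgumentException(
                        "No EDM type known for ElasticSearch type: " + field.getValue());
            }
            edmFieldTypes.put(field.getKey(), edmType);
        }
        return edmFieldTypes;
    }
}
```

Hints that the mapping can't express (primary key, cardinalities, relations) would still have to come from the intermediate schema.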

Data management

For data management, this level of indirection is valuable because the OData implementation for ElasticSearch can apply strategies according to the kind of data. We won't dive into details here, but we can distinguish the following use cases:

  • Handling primary keys. ElasticSearch manages the primary key of the entity by itself. The key isn't stored as a field in the document but in a special metadata field called _id, of type string. ElasticSearch lets you either provide the primary key value or let the database generate a unique string identifier for you. Note that only single-field primary keys are supported. The abstraction can pick the best way to handle the primary key and add support for primary keys of other types.
  • OData supports partial updates and single property updates out of the box. ElasticSearch also provides this feature using scripts. This approach can be hidden within the OData implementation for ElasticSearch.
  • OData provides the concept of navigation properties to manage links between entities. Although this isn't supported natively in ElasticSearch (as in most NoSQL databases), it can be simulated using parent / child support or denormalization. Based on the metadata collected for the schema, the OData implementation for ElasticSearch can adapt the processing to support such approaches transparently.

Queries

For queries, the OData abstraction makes it possible to adapt the underlying ElasticSearch queries according to the context and the elements they apply to.

Simple queries

The simplest queries involve the operators eq (equals) and ne (not equals). With ElasticSearch, we need to take care to avoid a classic pitfall. Such queries translate to term queries, but for document fields of type string this only works on non-analyzed fields; other types are natively supported. Since indexed string fields are analyzed by default, for those we would rather use the contains function, which runs a match query under the hood. On an analyzed field, a term query generally won't return the right result.

Here is how such queries are handled:

  • Operator eq (equals): name eq 'my name', quantity eq 12

The following ElasticSearch query will be executed:

{
    "term" : {
        "name" : "my name"
    }
}

  • Operator eq with null value: name eq null

The following ElasticSearch query will be executed:

{
    "filtered" : {
        "query" : {
            "match_all" : { }
        },
        "filter" : {
            "missing" : {
                "field" : "name"
            }
        }
    }
}

  • Operator ne (not equals): name ne 'my name', quantity ne 12

The following ElasticSearch query will be executed:

{
    "filtered" : {
        "query" : {
            "match_all" : { }
        },
        "filter" : {
            "not" : {
                "filter" : {
                    "query" : {
                        "term" : {
                            "name" : "my name"
                        }
                    }
                }
            }
        }
    }
}

  • Operator ne with null value: name ne null

The following ElasticSearch query will be executed:

{
    "filtered" : {
        "query" : {
            "match_all" : { }
        },
        "filter" : {
            "exists" : {
                "field" : "name"
            }
        }
    }
}
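
The four cases above can be sketched as a small translator. This is only an illustration: the class SimpleFilters and its method names are hypothetical, and the queries are represented as nested Map instances mirroring the JSON shown above rather than built with a real ElasticSearch client.

```java
import java.util.Map;

// Hypothetical translator for eq / ne, for illustration only.
final class SimpleFilters {

    /** Wraps a filter in the "filtered" query shape used in the examples. */
    static Map<String, Object> filtered(Map<String, Object> filter) {
        return Map.of("filtered", Map.of(
                "query", Map.of("match_all", Map.of()),
                "filter", filter));
    }

    /** field eq value; a null value is translated to a "missing" filter. */
    static Map<String, Object> eq(String field, Object value) {
        if (value == null) {
            return filtered(Map.of("missing", Map.of("field", field)));
        }
        return Map.of("term", Map.of(field, value));
    }

    /** field ne value; a null value is translated to an "exists" filter. */
    static Map<String, Object> ne(String field, Object value) {
        if (value == null) {
            return filtered(Map.of("exists", Map.of("field", field)));
        }
        Map<String, Object> term = Map.of("term", Map.of(field, value));
        return filtered(Map.of("not", Map.of("filter", Map.of("query", term))));
    }
}
```

For instance, SimpleFilters.eq("quantity", 12) yields the structure of a term query on quantity, while SimpleFilters.ne("name", null) yields the exists filter shown last.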

Canonical functions in queries

The contains function translates to a match query and is perfectly suited to analyzed fields.

  • Function contains: contains(name, 'my name')

The following ElasticSearch query will be executed:

{
    "match" : {
        "name" : {
            "query" : "my name",
            "type" : "boolean"
        }
    }
}

  • Function startswith: startswith(name, 'bre')

The following ElasticSearch query will be executed:

{
    "prefix" : {
        "name" : {
            "prefix" : "bre"
        }
    }
}
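
As a rough sketch, the two functions can be translated as follows. The class CanonicalFunctions is hypothetical; the generated structures simply mirror the match and prefix queries shown in this section.

```java
import java.util.Map;

// Hypothetical translator for OData canonical functions, for illustration only.
final class CanonicalFunctions {

    /** contains(field, 'text') becomes a match query of type "boolean". */
    static Map<String, Object> contains(String field, String text) {
        return Map.of("match", Map.of(field, Map.of("query", text, "type", "boolean")));
    }

    /** startswith(field, 'prefix') becomes a prefix query. */
    static Map<String, Object> startsWith(String field, String prefix) {
        return Map.of("prefix", Map.of(field, Map.of("prefix", prefix)));
    }
}
```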

Handling nested fields

OData queries can define paths with several levels. For example, expressions like this one are supported: address/city/name. There are several use cases, depending on the relations between fields.

For example, if the field city is contained within a nested field, we can transparently adapt the ElasticSearch query to wrap it within a nested one. This applies to all the queries described previously.

  • Operator eq (equals): address/city/name eq 'my name'

The following ElasticSearch query will be executed:

{
    "nested" : {
        "query" : {
            "term" : {
                "address.city.name" : "my name"
            }
        },
        "path" : "address"
    }
}
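
The wrapping logic can be sketched as below. The class NestedPaths is hypothetical, and the set of nested fields is assumed to come from the collected schema metadata. The sketch also assumes that fields inside a nested query are referenced by their full dotted path.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class NestedPaths {

    /**
     * Builds a term query for an OData path such as "address/city/name".
     * If a prefix of the path is declared as a nested field, the term query
     * is wrapped in a nested query on that prefix.
     */
    static Map<String, Object> termOnPath(String odataPath, Object value, Set<String> nestedFields) {
        // OData separates path segments with '/', ElasticSearch with '.'
        String esField = odataPath.replace('/', '.');
        Map<String, Object> term = Map.of("term", Map.of(esField, value));

        String[] segments = odataPath.split("/");
        String prefix = "";
        // Check every proper prefix of the path, shortest first
        for (int i = 0; i < segments.length - 1; i++) {
            prefix = prefix.isEmpty() ? segments[i] : prefix + "." + segments[i];
            if (nestedFields.contains(prefix)) {
                return Map.of("nested", Map.of("path", prefix, "query", term));
            }
        }
        return term; // no nested field involved: plain term query
    }
}
```

A path that doesn't cross a nested field, such as name, is left as a plain term query.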

We won't go further here, but parent / child relations and denormalization can be taken into account in the same way to deduce the ElasticSearch queries to execute.

Compound queries

OData queries also support operators like and, or and not to combine all the queries described previously.

  • Operator or: contains(name, 'my name') or contains(description, 'my description')

The following ElasticSearch query will be executed:

{
    "filtered" : {
        "query" : {
            "match_all" : { }
        },
        "filter" : {
            "or" : {
                "filters" : [ {
                    "query" : {
                        "match" : {
                            "name" : "my name"
                        }
                    }
                }, {
                    "query" : {
                        "match" : {
                            "description" : "my description"
                        }
                    }
                } ]
            }
        }
    }
}
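
Here is a minimal sketch of how sub-queries could be combined. The class CompoundFilters is hypothetical. The "or" shape follows the query above; using the same shape for "and" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class CompoundFilters {

    static Map<String, Object> or(List<Map<String, Object>> queries) {
        return combine("or", queries);
    }

    // Assumption: the "and" filter accepts the same "filters" array as "or".
    static Map<String, Object> and(List<Map<String, Object>> queries) {
        return combine("and", queries);
    }

    private static Map<String, Object> combine(String operator, List<Map<String, Object>> queries) {
        List<Map<String, Object>> filters = new ArrayList<>();
        for (Map<String, Object> query : queries) {
            // Each sub-query is wrapped so that it can be used as a filter
            filters.add(Map.of("query", query));
        }
        return Map.of("filtered", Map.of(
                "query", Map.of("match_all", Map.of()),
                "filter", Map.of(operator, Map.of("filters", filters))));
    }
}
```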

Handling relations

We saw previously that nested fields can be handled easily and transparently. The same goes for parent / child relations. When a navigation property is implemented in ElasticSearch with this feature, we can adapt the corresponding query to use a has_child query.

  • Operator eq (equals): address/street eq 'my street'

The following ElasticSearch query will be executed:

{
    "has_child": {
        "type": "address",
        "query": {
            "term": {
                "street": "my street"
            }
        }
    }
}
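
The choice of query can be driven by the relationKind hint of the intermediate schema. The sketch below is only one possible reading: the class RelationQueries is hypothetical, the denormalized case is assumed to query the duplicated field directly through its denormalizedFieldName, and the reference case is left out.

```java
import java.util.Map;

// Hypothetical helper, for illustration only.
final class RelationQueries {

    // Mirrors the relationKind hint of the intermediate schema
    enum RelationKind { PARENT_CHILD, DENORMALIZED, REFERENCE }

    /**
     * Translates "relation/field eq value" according to the relation kind.
     * For PARENT_CHILD, target is the child type name.
     * For DENORMALIZED, target is the denormalized field name (assumption).
     */
    static Map<String, Object> eq(RelationKind kind, String target, String field, Object value) {
        switch (kind) {
            case PARENT_CHILD:
                return Map.of("has_child", Map.of(
                        "type", target,
                        "query", Map.of("term", Map.of(field, value))));
            case DENORMALIZED:
                // The copy lives in the queried document itself
                return Map.of("term", Map.of(target, value));
            default:
                throw new UnsupportedOperationException("Not covered by this sketch: " + kind);
        }
    }
}
```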

Updating the denormalized data

Denormalized data is duplicated across several ElasticSearch types, in a single index or across several. This makes it possible to simulate joins and return a data graph in the results of a single query.

However, there is always one piece of data whose update triggers the update of its duplicates. It is the one that appears in a single place in the logical schema. Denormalized data doesn't appear in this schema, since it results from a design choice of the physical schema.

When this data is updated, the OData service builds a batch update request to update all the dependent copies. As we saw previously, the intermediate schema holds the hints about such denormalization links. With this approach, handling updates of denormalized data is completely transparent.
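
To make the idea concrete, here is a minimal sketch of how such a batch could be derived from the denormalization hints. Everything in it is hypothetical: the class names, the Copy descriptor and the shape of the operations are illustrative and do not correspond to the ElasticSearch bulk API syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class DenormalizedUpdates {

    /** Describes one duplicated copy, as known from the intermediate schema. */
    static final class Copy {
        final String type;                  // ElasticSearch type holding the copy
        final String keyField;              // field holding the key of the source entity
        final String denormalizedFieldName; // field holding the duplicated value

        Copy(String type, String keyField, String denormalizedFieldName) {
            this.type = type;
            this.keyField = keyField;
            this.denormalizedFieldName = denormalizedFieldName;
        }
    }

    /** Builds one abstract partial-update operation per copy of the updated value. */
    static List<Map<String, Object>> buildBatch(Object sourceKey, Object newValue, List<Copy> copies) {
        List<Map<String, Object>> operations = new ArrayList<>();
        for (Copy copy : copies) {
            operations.add(Map.of(
                    "type", copy.type,
                    // Which documents to update: those referring to the source entity
                    "where", Map.of("term", Map.of(copy.keyField, sourceKey)),
                    // What to change in them
                    "doc", Map.of(copy.denormalizedFieldName, newValue)));
        }
        return operations;
    }
}
```

The OData service would then turn each operation into actual ElasticSearch update requests.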
