Azure Cosmos DB

Azure Cosmos DB is Microsoft’s globally distributed, multi-model database. It lets you elastically and independently scale throughput and storage across any number of Azure’s geographic regions. It backs its throughput, latency, availability, and consistency guarantees with comprehensive service level agreements (SLAs), a combination Microsoft presents as unique among database services.

Azure DocumentDB

Azure DocumentDB is part of Azure Cosmos DB, which is a series of database services Microsoft is bundling together as a way to make creation of distributed databases simple.

DocumentDB is a NoSQL database service, built for fast and predictable performance, high availability, elastic scaling, and ease of development. As a schema-free NoSQL database, DocumentDB provides familiar SQL query capabilities with consistent low latencies on JSON data, ensuring that 99% of your reads are served in under 10 milliseconds and 99% of your writes are served in under 15 milliseconds. These benefits make DocumentDB a great fit for web, mobile, gaming, IoT, and many other applications that need seamless scale and global replication.

Administration

Add Users to DocumentDB

To add an account to DocumentDB, add the AD account to the appropriate AD group, for example:

Azure DocumentDB – Reader
Azure DocumentDB – Contributor
Azure DocumentDB – Operator

Query DocumentDB

For a quick browse of the documents stored in a Collection, take advantage of the Data Explorer. Sign on to Azure’s Portal, browse to CosmosDB, click on DocumentDB, and select the Data Explorer. Select your Collection and expand the reservoir. Click on Documents and you will see a list of unfiltered documents. Click Edit Filter to add where clauses, etc.

REST API

For the NiFi implementation of the Feeds, we are using the Cosmos DB REST API to write the documents. To date we have only used the SDK to access Cosmos DB. Below are my findings from working out how to create a document in Cosmos DB using the REST API.

Reference

https://docs.microsoft.com/en-us/rest/api/documentdb/access-control-on-documentdb-resources?redirectedfrom=MSDN#resource-tokens
https://docs.microsoft.com/en-us/rest/api/documentdb/common-documentdb-rest-request-headers
https://docs.microsoft.com/en-us/rest/api/documentdb/documents
https://docs.microsoft.com/en-us/rest/api/documentdb/users
https://docs.microsoft.com/en-us/rest/api/documentdb/permissions

Master Key Tokens

The master key token is the all-access key token that gives users full control of Cosmos DB resources in a particular account. The master key is created when the account is created. In our testing, master key tokens did NOT have access to collections and documents when using the REST API; they do have full access when using the SDK.

Resource Tokens

A Resource Token is created when users in a database are set up with access permissions for precise access control on a resource, also known as a permission resource. A permission resource contains a hash resource token specifically constructed with the information regarding the resource path and access type a user has access to. The permission resource token is time bound and the validity period can be overridden. When a permission resource is acted upon (by POST, GET, or PUT), a new resource token is generated.

Header

Three headers are required when using the REST API with Cosmos DB: Authorization, x-ms-date, and x-ms-version.

Authorization

All REST operations, whether you’re using a master key token or resource token, must include the authorization header with the authorization string in order to interact with a resource. The authorization string has the following format:
type={typeoftoken}&ver={tokenversion}&sig={hashsignature} (Example: type=master&ver=1.0&sig=5mDuQBYA0kb70WDJoTUzSBMTG3owkC0/cEN4fqa18/s=)
*This token must be URL Encoded

Here are the details for each variable used in the Authorization string:

type

Denotes the type of token: master or resource. Both must be lower case. Specify master if you are using the master key to create the signature, or resource if you are using a resource token, as shown below.

ver

Denotes the version of the token, currently 1.0. (Note: a value of 1 also works.)

sig

Denotes the hashed token signature.
The hash signature for a token is constructed from the following 4 parameters: Verb, ResourceType, ResourceLink, and Date.

  • Verb: The HTTP verb, such as GET, POST, or PUT.
  • ResourceType: The type of resource that the request is for, e.g. “dbs”, “colls”, “docs”, “users”, “permissions” (any resource name in Cosmos DB).
  • ResourceLink: The identity property of the resource that the request is directed at. ResourceLink must keep the case of the resource id.
    Example: For a collection it looks like “dbs/reservoirs/colls/contextFeeds”.
    What the Azure documentation does not explain clearly is that when you create a resource, the ResourceLink is the parent of the new resource.
    Example: To create a permission (permissions belong to a user):
        ResourceType = “permissions”
        ResourceLink = “dbs/reservoirs/users/<username>”
    Example: To create a document (documents belong to a collection):
        ResourceType = “docs”
        ResourceLink = “dbs/reservoirs/colls/contextFeeds”
  • Date: The UTC date and time the message was sent, in “HTTP-date” format as defined by RFC 7231 Date/Time Formats.
    Example: Tue, 01 Nov 1994 08:12:31 GMT

*Important:
When using a master key token, the date used in the signature must be the same date used in x-ms-date.
When using a resource token, the date used to create the signature is the current date, but the date you specify in the x-ms-date header is the date returned from the REST call that gets the Permission, which when acted upon creates a new resource token (as explained above in “Resource Tokens”).
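
The parent rule for ResourceLink can be sketched in a few lines of Python. This is an illustration only: the helper name signing_parts is hypothetical, and it assumes request paths always alternate between a resource type and a resource id, as in the examples in this section.

```python
def signing_parts(request_path):
    """Return (ResourceType, ResourceLink) for a request path such as
    'dbs/reservoirs/users/nifi_user/permissions'."""
    segments = request_path.strip("/").split("/")
    if len(segments) % 2 == 1:
        # Odd number of segments: the path ends in a resource type, as it does
        # when creating a resource. The link is the parent resource.
        return segments[-1], "/".join(segments[:-1])
    # Even number of segments: the path names one existing resource, so the
    # link is the full path and the type is the segment before the id.
    return segments[-2], "/".join(segments)


# Creating a document: type "docs", link is the parent collection.
print(signing_parts("dbs/reservoirs/colls/contextFeeds/docs"))
# Fetching an existing permission: type "permissions", link is the full path.
print(signing_parts("dbs/reservoirs/users/nifi_user/permissions/nifi_user_permission"))
```

The function leaves the case of the path untouched, because the ResourceLink must keep the case of each resource id.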

StringToSign = Verb.toLowerCase() + "\n" + ResourceType.toLowerCase() + "\n" + ResourceLink + "\n" + Date.toLowerCase() + "\n" + "" + "\n";

The hash signature is Base64 encoded (MIME, RFC 2045), and the complete authorization string must then be URL encoded before it is added to the REST request so that it contains no invalid characters.
*Important: The master key is itself Base64 encoded, so decode it (MIME, RFC 2045) before using it as the key for the hash signature.
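
As a rough Python sketch of the signing steps described above (not an official implementation): the function name build_auth_header and the placeholder key are assumptions made for illustration. It uses HMAC-SHA256, the same hash as the C# sample in this document.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, unquote


def build_auth_header(verb, resource_type, resource_link, date, master_key,
                      key_type="master", token_version="1.0"):
    """Return the URL-encoded authorization string for one request."""
    # Verb, ResourceType and Date are lower-cased; ResourceLink keeps its case.
    # The string ends with an empty line, hence the two trailing newlines.
    string_to_sign = (
        f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n{date.lower()}\n\n"
    )
    key = base64.b64decode(master_key)  # the master key is Base64 encoded
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # URL-encode the whole string, including '=', '&', '+' and '/'.
    return quote(f"type={key_type}&ver={token_version}&sig={signature}", safe="")


# Placeholder key for demonstration only; use the key from the Azure portal.
demo_key = base64.b64encode(b"not-a-real-key").decode("ascii")
header = build_auth_header("POST", "users", "dbs/reservoirs",
                           "Mon, 15 Jan 2018 21:15:43 GMT", demo_key)
decoded = unquote(header)
print(header)
print(decoded)
```

The same date string passed to the function must be sent in the x-ms-date header when a master key is used.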

x-ms-date

Provide the date of the request (the current date and time) in RFC 1123 date format, expressed in Coordinated Universal Time.
Example: Wed, 08 Apr 2015 03:52:31 GMT
When using a master key, this is the same date you specified when you created the signature.
When using a resource token, this is the timestamp (converted to RFC 1123 format) returned from the REST call when you created or fetched the resource.
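
One way to produce a date in this format is the Python standard library, shown here as a small sketch. The helper name http_date is an assumption; it expects a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_date(moment):
    """Format a timezone-aware datetime as an RFC 1123 date string in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


# Current time, as needed for a master key request.
now_header = http_date(datetime.now(timezone.utc))

# A fixed time, matching the request examples in this document.
fixed_header = http_date(datetime(2018, 1, 15, 21, 15, 43, tzinfo=timezone.utc))
print(fixed_header)  # Mon, 15 Jan 2018 21:15:43 GMT
```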

x-ms-version

This value is currently 2017-02-22; check the Microsoft documentation for a newer version when you need to.

Now that we know how to create the header keys for a REST API call, we can continue with creating a Resource Token to use in creating documents in Cosmos.

To create a document we must:
1) Create a user (one-time setup)
2) Create permissions for this user (one-time setup)
3) Get the Permission resource so that a new resource token is created, which is valid for 1 hour by default
4) Use this resource token to fetch, update, create, and delete documents

As stated above, a resource token is created when you grant a user permission, so we must first create a user. You can use the SDK for the one-time setup, or follow the steps below using the REST API:

Create a User

verb: POST
resourcetype: users
resourcelink: dbs/reservoirs
masterkey: <get the master key from “keys” in the Azure portal for the server you are accessing (example: servername01)>
keytype: master
version: 1.0
date: Mon, 15 Jan 2018 21:15:43 GMT <current datetime (C# example: utc_date = DateTime.UtcNow.ToString("r"))>

Using these parameters, generate the authorization key using one of the many code examples you can find online:

// Requires "using System;" and a reference to System.Web.
// utc_date must be the same value that is sent in the x-ms-date header.
authHeader = GenerateMasterKeyAuthorizationSignature(verb, resourceId, resourceType, utc_date, masterKey, "master", "1.0");

private static string GenerateMasterKeyAuthorizationSignature(string verb, string resourceId, string resourceType, string date, string key, string keyType, string tokenVersion)
    {
        // The master key is Base64 encoded, so decode it to get the HMAC key.
        using (var hmacSha256 = new System.Security.Cryptography.HMACSHA256 { Key = Convert.FromBase64String(key) })
        {
            // String to sign: verb, resource type, resource link, date, and an empty line.
            string payLoad = string.Format(System.Globalization.CultureInfo.InvariantCulture,
                "{0}\n{1}\n{2}\n{3}\n{4}\n",
                verb.ToLowerInvariant(),
                resourceType.ToLowerInvariant(),
                resourceId,
                date.ToLowerInvariant(),
                ""
            );
            byte[] hashPayLoad = hmacSha256.ComputeHash(System.Text.Encoding.UTF8.GetBytes(payLoad));
            string signature = Convert.ToBase64String(hashPayLoad);
            // URL-encode the whole authorization string before sending it.
            return System.Web.HttpUtility.UrlEncode(String.Format(System.Globalization.CultureInfo.InvariantCulture, "type={0}&ver={1}&sig={2}",
                keyType,
                tokenVersion,
                signature));
        }
    }

Using an HTTP tool (like Postman) enter the following parts and execute (you will need to generate your own values):
URL: https://servername01.documents.azure.com:443/dbs/reservoirs/users
Header:
Authorization: type%3dmaster%26ver%3d1.0%26sig%3dqdnr8SgZmqHs7AaQAXW2Y3lK%2f12xkFL690mOD%2b4BYSc%3d
x-ms-date: Mon, 15 Jan 2018 21:15:43 GMT
x-ms-version: 2017-02-22
Body:
{"id": "nifi_user"}
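
For scripting the same call instead of using Postman, the request can be assembled roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the key is a placeholder, and the Content-Type header is an assumption that does not appear in the listing above. The request is only built here, not sent.

```python
import base64
import hashlib
import hmac
import json
from email.utils import formatdate
from urllib.parse import quote
from urllib.request import Request


def master_auth(verb, resource_type, resource_link, date, master_key):
    """URL-encoded master key authorization string for one request."""
    to_sign = f"{verb.lower()}\n{resource_type.lower()}\n{resource_link}\n{date.lower()}\n\n"
    digest = hmac.new(base64.b64decode(master_key), to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return quote(f"type=master&ver=1.0&sig={signature}", safe="")


def create_user_request(account, database, user_id, master_key, date=None):
    """Build (but do not send) the POST request that creates a user."""
    date = date or formatdate(usegmt=True)  # current UTC time in RFC 1123 format
    headers = {
        # Creating a user: ResourceType is "users", ResourceLink is the parent database.
        "Authorization": master_auth("POST", "users", f"dbs/{database}", date, master_key),
        "x-ms-date": date,  # must be the same date that was signed
        "x-ms-version": "2017-02-22",
        "Content-Type": "application/json",  # assumption, not listed in the text
    }
    url = f"https://{account}.documents.azure.com:443/dbs/{database}/users"
    body = json.dumps({"id": user_id}).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder key for demonstration only; use the key from the Azure portal.
fake_key = base64.b64encode(b"placeholder-key").decode("ascii")
request = create_user_request("servername01", "reservoirs", "nifi_user", fake_key,
                              date="Mon, 15 Jan 2018 21:15:43 GMT")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; with a real account name and master key the response body should describe the new user.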

Create a Permission

    verb: POST
    resourcetype: permissions
    resourcelink: dbs/reservoirs/users/nifi_user
    masterkey: <get the master key from “keys” in the Azure portal for the server you are accessing (example: servername01)>
    keytype: master
    version: 1.0
    date: Mon, 15 Jan 2018 21:15:43 GMT <current datetime>

**Generate your authorization token using the code shown above

Using an HTTP tool (like Postman) enter the following parts and execute (you will need to generate your own values):
URL: https://servername01.documents.azure.com:443/dbs/reservoirs/users/nifi_user/permissions
Header:
    Authorization: type%3dmaster%26ver%3d1.0%26sig%3dqdnr8SgZmqHs7AaQAXW2Y3lK%2f12xkFL690mOD%2b4BYSc%3d
    x-ms-date: Mon, 15 Jan 2018 21:15:43 GMT
    x-ms-version: 2017-02-22
Body:
{
"id": "nifi_user_permission",
"permissionMode": "All",
"resource": "dbs/reservoirs/colls/contextFeeds"
}

Get Existing Permission

This creates and returns a new, time-bound resource token.
verb: GET
resourcetype: permissions
resourcelink: dbs/reservoirs/users/nifi_user/permissions/nifi_user_permission
masterkey: <get the master key from “keys” in the Azure portal for the server you are accessing (example: servername01)>
keytype: master
version: 1.0
date: Mon, 15 Jan 2018 21:15:43 GMT <current datetime>

**Generate your authorization token

Using an HTTP tool (like Postman) enter the following parts and execute (you will need to generate your own values):
URL: https://servername01.documents.azure.com:443/dbs/reservoirs/users/nifi_user/permissions/nifi_user_permission
Header:
    Authorization: type%3dmaster%26ver%3d1.0%26sig%3dqdnr8SgZmqHs7AaQAXW2Y3lK%2f12xkFL690mOD%2b4BYSc%3d
    x-ms-date: Mon, 15 Jan 2018 21:15:43 GMT
    x-ms-version: 2017-02-22
Body: (none, this is a GET request)

Response Body:
{
"id": "nifi_user_permission",
"permissionMode": "All",
"resource": "dbs/reservoirs/colls/contextFeeds",
"_rid": "mxUdAF77CQCTvzGG9CMpAA==",
"_self": "dbs/mxUdAA==/users/mxUdAF77CQA=/permissions/mxUdAF77CQCTvzGG9CMpAA==/",
"_etag": "\"00000900-0000-0000-0000-5a5d38980000\"",
"_ts": 1516058776,
"_token": "type=resource&ver=1&sig=cvEla+OX1DfXcgiL5Fx9lQ==;Gja/ev6itmGlnhmcVkvxlkFSIFv9+j/1AWe7YIomrNxd3dTho3m1J0brQjp9WmFMnteSgLT1KxTf7+Mmu1vKJFbp9zD7tnv1kKe45EMAs0XmTmkiX2TYcjRVr4u0/Bfd1T77HeBeV8w3nPjfqhv0+6tKDqlZZuo4UNXcvgucq5Ez2ca+K+Ft5rjGdCUA1j3s6zC1/idqqC1z5NAKqNyIxaJC8cCQobCgdulrUyWJccexVE+MChDSkCGuLs5CGwaH;"
}
Extract the _ts and _token values; you need both to create a document.
*Important: the _ts value is an epoch time in seconds. If your epoch converter expects milliseconds, add 3 zeros to the end before converting it to a valid datetime.
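
A short Python sketch of this extraction step follows. The function name token_and_date is hypothetical, and the sample response is abbreviated with a placeholder in place of the real signature. Python's formatdate takes epoch seconds directly, so no zeros need to be added here.

```python
import json
from email.utils import formatdate


def token_and_date(permission_json):
    """Return (_token, x-ms-date value) from a permission response body."""
    permission = json.loads(permission_json)
    # _ts is epoch seconds; format it as an RFC 1123 date in GMT.
    return permission["_token"], formatdate(permission["_ts"], usegmt=True)


# Abbreviated sample; "<signature>" stands in for the real token signature.
sample = json.dumps({
    "id": "nifi_user_permission",
    "_ts": 1516058776,
    "_token": "type=resource&ver=1&sig=<signature>",
})
token, ms_date = token_and_date(sample)
print(ms_date)  # Mon, 15 Jan 2018 23:26:16 GMT
```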

Create a Document

    verb: POST
Using an HTTP tool (like Postman) enter and execute the following:
URL: https://servername01.documents.azure.com:443/dbs/reservoirs/colls/contextFeeds/docs
Header:
Authorization: <the _token value from step 3>      type=resource&ver=1&sig=cvEla+OX1DfXcgiL5Fx9lQ==;Gja/ev6itmGlnhmcVkvxlkFSIFv9+j/1AWe7YIomrNxd3dTho3m1J0brQjp9WmFMnteSgLT1KxTf7+Mmu1vKJFbp9zD7tnv1kKe45EMAs0XmTmkiX2TYcjRVr4u0/Bfd1T77HeBeV8w3nPjfqhv0+6tKDqlZZuo4UNXcvgucq5Ez2ca+K+Ft5rjGdCUA1j3s6zC1/idqqC1z5NAKqNyIxaJC8cCQobCgdulrUyWJccexVE+MChDSkCGuLs5CGwaH;
x-ms-date: <the _ts value from step 3 converted to RFC 7231 Date/Time Format>
x-ms-version: 2017-02-22
Body: <your valid JSON document>
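
The document request can be assembled in Python along these lines. Treat it as a sketch: the function name is hypothetical, the Content-Type header is an assumption, the token is passed exactly as shown in the listing above (without further encoding), and the URL assumes the documents feed ends in /docs as it does in the Update a Document URL. The request is built but not sent.

```python
import json
from email.utils import formatdate
from urllib.request import Request


def create_document_request(account, database, collection, document, token, ts):
    """Build (but do not send) the POST request that creates a document."""
    headers = {
        "Authorization": token,                    # the _token value from the permission
        "x-ms-date": formatdate(ts, usegmt=True),  # the _ts value as an RFC 1123 date
        "x-ms-version": "2017-02-22",
        "Content-Type": "application/json",        # assumption, not listed in the text
    }
    url = (f"https://{account}.documents.azure.com:443"
           f"/dbs/{database}/colls/{collection}/docs")
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder token and a made-up document, for illustration only.
doc_request = create_document_request(
    "servername01", "reservoirs", "contextFeeds",
    {"id": "example-doc-1", "source": "nifi"},
    token="type=resource&ver=1&sig=<signature>",
    ts=1516058776,
)
print(doc_request.get_method(), doc_request.full_url)
```

An update would differ only in using PUT and appending the document id to the URL.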

Update a Document

    verb: PUT
Using an HTTP tool (like Postman) enter and execute the following:
URL: https://servername01.documents.azure.com:443/dbs/reservoirs/colls/contextFeeds/docs/<document id>
Header:
    Authorization: <the _token value from step 3> type=resource&ver=1&sig=cvEla+OX1DfXcgiL5Fx9lQ==;Gja/ev6itmGlnhmcVkvxlkFSIFv9+j/1AWe7YIomrNxd3dTho3m1J0brQjp9WmFMnteSgLT1KxTf7+Mmu1vKJFbp9zD7tnv1kKe45EMAs0XmTmkiX2TYcjRVr4u0/Bfd1T77HeBeV8w3nPjfqhv0+6tKDqlZZuo4UNXcvgucq5Ez2ca+K+Ft5rjGdCUA1j3s6zC1/idqqC1z5NAKqNyIxaJC8cCQobCgdulrUyWJccexVE+MChDSkCGuLs5CGwaH;
    x-ms-date: <the _ts value from step 3 converted to RFC 7231 Date/Time Format>
    x-ms-version: 2017-02-22
Body: <your valid JSON document with the same document id as specified in the URL>

Key Features

Azure DocumentDB offers the following key capabilities and benefits:

  • Elastically scalable throughput and storage: Easily scale up or scale down your DocumentDB JSON database to meet your application needs. Your data is stored on solid state disks (SSD) for low predictable latencies. DocumentDB supports containers for storing JSON data called collections that can scale to virtually unlimited storage sizes and provisioned throughput. You can elastically scale DocumentDB with predictable performance seamlessly as your application grows.
  • Multi-region replication: DocumentDB transparently replicates your data to all regions you’ve associated with your DocumentDB account, enabling you to develop applications that require global access to data while providing tradeoffs between consistency, availability and performance, all with corresponding guarantees. DocumentDB provides transparent regional failover with multi-homing APIs, and the ability to elastically scale throughput and storage across the globe. Learn more in Distribute data globally with DocumentDB.
  • Ad hoc queries with familiar SQL syntax: Store heterogeneous JSON documents within DocumentDB and query these documents through a familiar SQL syntax. DocumentDB utilizes a highly concurrent, lock free, log structured indexing technology to automatically index all document content. This enables rich real-time queries without the need to specify schema hints, secondary indexes, or views. Learn more in Query DocumentDB.
  • JavaScript execution within the database: Express application logic as stored procedures, triggers, and user defined functions (UDFs) using standard JavaScript. This allows your application logic to operate over data without worrying about the mismatch between the application and the database schema. DocumentDB provides full transactional execution of JavaScript application logic directly inside the database engine. The deep integration of JavaScript enables the execution of INSERT, REPLACE, DELETE, and SELECT operations from within a JavaScript program as an isolated transaction. Learn more in DocumentDB server-side programming.
  • Tunable consistency levels: For queries and read operations, select from four well-defined consistency levels: strong, bounded-staleness, session, and eventual. These granular levels allow you to make sound trade-offs between consistency, availability, and latency. Learn more in Using consistency levels to maximize availability and performance in DocumentDB.
  • Fully managed: Eliminate the need to manage database and machine resources. As a fully-managed Microsoft Azure service, you do not need to manage virtual machines, deploy and configure software, manage scaling, or deal with complex data-tier upgrades. Every database is automatically backed up and protected against regional failures. You can easily add a DocumentDB account and provision capacity as you need it, allowing you to focus on your application instead of operating and managing your database.
  • Open by design: Get started quickly by using existing skills and tools. Programming against DocumentDB is simple, approachable, and does not require you to adopt new tools or adhere to custom extensions to JSON or JavaScript. You can access all of the database functionality including CRUD, query, and JavaScript processing over a simple RESTful HTTP interface. DocumentDB embraces existing formats, languages, and standards while offering high value database capabilities on top of them.
  • Automatic indexing: By default, DocumentDB automatically indexes all the documents in the database and does not expect or require any schema or creation of secondary indices. If you do not want to index everything, you can exclude specific paths in your JSON documents from indexing.
  • Compatibility with MongoDB apps: With DocumentDB: API for MongoDB, you can use DocumentDB databases as the data store for apps written for MongoDB. This means that by using existing drivers for MongoDB databases, your application written for MongoDB can now communicate with DocumentDB and use DocumentDB databases instead of MongoDB databases. In many cases, you can switch from using MongoDB to DocumentDB by simply changing a connection string. Learn more in What is DocumentDB: API for MongoDB?

Price Estimate

At any scale, you can store data and provision throughput capacity. Each collection is billed hourly based on the amount of data stored (in GBs) and throughput reserved in units of 100 RUs/second, with a minimum of 400 RUs/second.

SSD storage                                           $0.25 per GB/month
Reserved RUs/second (per 100 RUs, 400 RUs minimum)    $0.008 per hour

Estimate Request Units and Data Storage

Using an assortment of examples and a small JSON document (15 lines):

Documents   Create/sec   Read/sec   Update/sec   Delete/sec   RU total/sec   Cost/month
1,000,000   1            1          100          1            1,000          $59.00
1,000,000   1            1          400          1            4,000          $240.00
1,000,000   1            1          1,000        1            10,600         $630.00
1,000,000   1            1          40,000       1            42,600         $2,535.55
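
The monthly figures can be approximated from the prices listed under Price Estimate. The sketch below is an estimate, not a billing formula: the function name is hypothetical, and the 744-hour month (31 days) is an assumption. That assumption reproduces the $2,535.55 figure for 42,600 RU/second when no storage is counted; the other figures in the table appear to be rounded.

```python
STORAGE_PRICE_PER_GB_MONTH = 0.25   # SSD storage
RU_PRICE_PER_100_PER_HOUR = 0.008   # reserved throughput, per 100 RU/second
MINIMUM_RU = 400                    # smallest billable throughput


def monthly_cost(ru_per_second, storage_gb=0.0, hours=744):
    """Estimate the monthly cost in dollars for one collection."""
    billed_ru = max(ru_per_second, MINIMUM_RU)
    blocks = -(-billed_ru // 100)  # round up to whole units of 100 RU/second
    throughput = blocks * RU_PRICE_PER_100_PER_HOUR * hours
    storage = storage_gb * STORAGE_PRICE_PER_GB_MONTH
    return throughput + storage


for ru in (1000, 4000, 10600, 42600):
    print(ru, round(monthly_cost(ru), 2))
```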

Support Options

Three levels of support are available:

  • Developer: Business-day-only support with web incident submission, unlimited break/fix 24×7, fastest response time <8 hours, $29.00/month
  • Standard: 24×7 support with web incident submission, phone support, fastest response time <2 hours, $300.00/month
  • Professional Direct: 24×7 support with priority handling, escalation phone line, fastest response time <1 hour, $1,000/month

References

Documentation: https://docs.microsoft.com/en-us/azure/documentdb/

Capacity Planner: https://www.documentdb.com/capacityplanner