azure-sdk-for-js/sdk/cosmosdb/cosmos
Manik Khandelwal cae13bb96a
[Cosmos] JS SDK support for partitioned DiskANN (#31839)
### Packages impacted by this PR
@azure/cosmos

### Describe the problem that is addressed by this PR
In this PR we are adding three optional parameters to the vector index
in the indexing policy to support partitioned DiskANN. The three
optional parameters are:

quantizationByteSize : The number of bytes used in product quantization
of the vectors. This is an optional parameter and applies to index types
DiskANN and quantizedFlat. The allowed range for this parameter is
between 1 and 200.
vectorIndexShardKey : The list of strings containing the shard keys used
for partitioning the vector indexes. This is an optional parameter and
applies to index types DiskANN and quantizedFlat.
indexingSearchListSize : The size of the candidate list of approximate
neighbors stored while building the DiskANN index as part of the
optimization process. This is an optional parameter and applies to
index type DiskANN only. The allowed range is between 25 and 500.
The changes have been tested using a Test account, since as of now the
backend changes aren't live yet.
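The three parameters above sit on an entry of the indexing policy's vectorIndexes list. The following is a minimal sketch, assuming the string values "diskANN" and "quantizedFlat" for the index type and the vectorEmbeddingPolicy shape shown; the container id, paths, and values are illustrative, and validateVectorIndex is a hypothetical helper that only mirrors the documented ranges on the client side.

```javascript
// Illustrative container definition for partitioned DiskANN. The id, paths,
// dimensions, and parameter values are assumptions chosen for the example.
const containerDefinition = {
  id: "vector-container",
  partitionKey: { paths: ["/id"] },
  vectorEmbeddingPolicy: {
    vectorEmbeddings: [
      { path: "/embedding", dataType: "float32", dimensions: 256, distanceFunction: "cosine" },
    ],
  },
  indexingPolicy: {
    vectorIndexes: [
      {
        path: "/embedding",
        type: "diskANN",
        quantizationByteSize: 64, // diskANN or quantizedFlat; 1 to 200
        indexingSearchListSize: 100, // diskANN only; 25 to 500
        vectorIndexShardKey: ["/tenantId"], // diskANN or quantizedFlat
      },
    ],
  },
};

const QUANTIZED_TYPES = ["diskANN", "quantizedFlat"];

// Hypothetical client-side check of the documented ranges. The service performs
// its own validation; this only catches obvious mistakes before a request is sent.
function validateVectorIndex(index) {
  const errors = [];
  const inRange = (value, min, max) => Number.isInteger(value) && value >= min && value <= max;

  if (index.quantizationByteSize !== undefined) {
    if (!QUANTIZED_TYPES.includes(index.type)) {
      errors.push("quantizationByteSize applies only to diskANN and quantizedFlat");
    } else if (!inRange(index.quantizationByteSize, 1, 200)) {
      errors.push("quantizationByteSize must be an integer between 1 and 200");
    }
  }
  if (index.vectorIndexShardKey !== undefined) {
    if (!QUANTIZED_TYPES.includes(index.type)) {
      errors.push("vectorIndexShardKey applies only to diskANN and quantizedFlat");
    } else if (
      !Array.isArray(index.vectorIndexShardKey) ||
      !index.vectorIndexShardKey.every((key) => typeof key === "string")
    ) {
      errors.push("vectorIndexShardKey must be a list of strings");
    }
  }
  if (index.indexingSearchListSize !== undefined) {
    if (index.type !== "diskANN") {
      errors.push("indexingSearchListSize applies only to diskANN");
    } else if (!inRange(index.indexingSearchListSize, 25, 500)) {
      errors.push("indexingSearchListSize must be an integer between 25 and 500");
    }
  }
  return errors;
}
```

An empty array means the entry is within the documented limits; the integer check is an added assumption, since the ranges are counts of bytes and candidates.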

### What are the possible designs available to address the problem? If
there are more than one possible design, why was the one in this PR
chosen?


### Are there test cases added in this PR? _(If not, why?)_
Yes

### Provide a list of related PRs _(if any)_


### Command used to generate this PR: _(Applicable only to SDK release
request PRs)_

### Checklists
- [ ] Added impacted package name to the issue description
- [ ] Does this PR need any fixes in the SDK Generator? _(If so,
create an Issue in the
[Autorest/typescript](https://github.com/Azure/autorest.typescript)
repository and link it here)_
- [ ] Added a changelog (if necessary)

---------

Co-authored-by: Ujjwal Soni <soniujjwal@microsoft.com>
Co-authored-by: Manik Khandelwal <mkhandelwal@microsoft.com>
2024-11-19 23:45:35 +05:30
..
.vscode
MultiRegionWriteSample
benchmarking
consumer-test
review [Cosmos] JS SDK support for partitioned DiskANN (#31839) 2024-11-19 23:45:35 +05:30
samples [Cosmos] JS SDK support for partitioned DiskANN (#31839) 2024-11-19 23:45:35 +05:30
samples-dev [Cosmos] JS SDK support for partitioned DiskANN (#31839) 2024-11-19 23:45:35 +05:30
src [Cosmos] JS SDK support for partitioned DiskANN (#31839) 2024-11-19 23:45:35 +05:30
test [Cosmos] JS SDK support for partitioned DiskANN (#31839) 2024-11-19 23:45:35 +05:30
.gitignore
CHANGELOG.md [Cosmos] JS SDK support for partitioned DiskANN (#31839) 2024-11-19 23:45:35 +05:30
Contributing.md
LICENSE
PoliCheckExclusions.txt
README.md
SDK + Samples Workspace.code-workspace
api-extractor.json
browser-test.js
bundle-types.js
consumer-test.js
dev.md
eslint.config.mjs
package.json Samples for fts (#31829) 2024-11-19 17:49:18 +05:30
prep-samples.js
sample.env
tsconfig.json
tsconfig.strict.json Full text and Hybrid search (#31690) 2024-11-18 23:33:15 +05:30
tsdoc.json

README.md

Azure Cosmos DB client library for JavaScript/TypeScript


Azure Cosmos DB is a globally distributed, multi-model database service that supports document, key-value, wide-column, and graph databases. This package is intended for JavaScript/TypeScript applications to interact with SQL API databases and the JSON documents they contain:

  • Create Cosmos DB databases and modify their settings
  • Create and modify containers to store collections of JSON documents
  • Create, read, update, and delete the items (JSON documents) in your containers
  • Query the documents in your database using SQL-like syntax


Getting started

Prerequisites

Azure Subscription and Cosmos DB SQL API Account

You must have an Azure Subscription, and a Cosmos DB account (SQL API) to use this package.

If you need a Cosmos DB SQL API account, you can use the Azure Cloud Shell to create one with this Azure CLI command:

az cosmosdb create --resource-group <resource-group-name> --name <cosmos-database-account-name>

Or you can create an account in the Azure Portal

NodeJS

This package is distributed via npm, which comes preinstalled with Node.js. Use an LTS version of Node.js.

CORS

You need to set up Cross-Origin Resource Sharing (CORS) rules for your Cosmos DB account if you need to develop for browsers. Follow the instructions in the linked document to create new CORS rules for your Cosmos DB.

Install this package

npm install @azure/cosmos

Get Account Credentials

You will need your Cosmos DB Account Endpoint and Key. You can find these in the Azure Portal or use the Azure CLI snippet below. The snippet is formatted for the Bash shell.

az cosmosdb show --resource-group <your-resource-group> --name <your-account-name> --query documentEndpoint --output tsv
az cosmosdb keys list --resource-group <your-resource-group> --name <your-account-name> --query primaryMasterKey --output tsv

Create an instance of CosmosClient

Interaction with Cosmos DB starts with an instance of the CosmosClient class.

const { CosmosClient } = require("@azure/cosmos");

const endpoint = "https://your-account.documents.azure.com";
const key = "<database account masterkey>";
const client = new CosmosClient({ endpoint, key });

async function main() {
  // The rest of the README samples are designed to be pasted into this function body
}

main().catch((error) => {
  console.error(error);
});

For simplicity, we have included the key and endpoint directly in the code, but you will likely want to load them from environment variables or from a file that is not in source control, using a project such as dotenv.

In production environments, secrets like keys should be stored in Azure Key Vault.
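Loading the endpoint and key from the environment can be as small as the sketch below. The helper name getCosmosConfig and the variable names COSMOS_ENDPOINT and COSMOS_KEY are hypothetical choices for illustration, not names the SDK reads on its own.

```javascript
// Hypothetical helper: read the endpoint and key from environment variables
// instead of hard-coding them. The variable names are assumptions.
function getCosmosConfig(env = process.env) {
  const endpoint = env.COSMOS_ENDPOINT;
  const key = env.COSMOS_KEY;
  if (!endpoint || !key) {
    // Fail fast with a clear message rather than sending unauthenticated requests.
    throw new Error("Set COSMOS_ENDPOINT and COSMOS_KEY before creating the client");
  }
  return { endpoint, key };
}

// Usage (requires @azure/cosmos):
// const { CosmosClient } = require("@azure/cosmos");
// const client = new CosmosClient(getCosmosConfig());
```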

Key concepts

Once you've initialized a CosmosClient, you can interact with the primary resource types in Cosmos DB:

  • Database: A Cosmos DB account can contain multiple databases. When you create a database, you specify the API you'd like to use when interacting with its documents: SQL, MongoDB, Gremlin, Cassandra, or Azure Table. Use the Database object to manage its containers.

  • Container: A container is a collection of JSON documents. You create (insert), read, update, and delete items in a container by using methods on the Container object.

  • Item: An Item is a JSON document stored in a container. Each Item must include an id key with a value that uniquely identifies the item within the container. If you do not provide an id, the SDK will generate one automatically.

For more information about these resources, see Working with Azure Cosmos databases, containers and items.

Examples

The following sections provide code snippets covering some of the most common Cosmos DB tasks.

Create a database

After authenticating your CosmosClient, you can work with any resource in the account. The code snippet below creates a NoSQL API database.

const { database } = await client.databases.createIfNotExists({ id: "Test Database" });
console.log(database.id);

Create a container

This example creates a container with default settings.

const { container } = await database.containers.createIfNotExists({ id: "Test Container" });
console.log(container.id);

Using Partition Keys

This example shows the supported types of partition key values.

await container.item("id", "1").read();        // string type
await container.item("id", 2).read();          // number type
await container.item("id", true).read();       // boolean type
await container.item("id", {}).read();         // None type
await container.item("id", undefined).read();  // None type
await container.item("id", null).read();       // null type

If the partition key consists of a single value, it can be supplied either as a literal value or as an array.

await container.item("id", "1").read();
await container.item("id", ["1"]).read();

If the partition key consists of more than one value, it should be supplied as an array.

await container.item("id", ["a", "b"]).read();
await container.item("id", ["a", 2]).read();
await container.item("id", [{}, {}]).read();
await container.item("id", ["a", {}]).read();
await container.item("id", [2, null]).read();
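The two rules above (a literal or a one-element array for a single path, an array for several paths) can be captured in a small normalizer. This is a hypothetical sketch for point operations that need the full key; normalizePartitionKey is not an SDK function, and the SDK applies its own handling internally.

```javascript
// Hypothetical helper: accept a literal or an array for a single-path key, and
// require one value per path for a hierarchical (multi-path) key.
function normalizePartitionKey(value, pathCount) {
  const values = Array.isArray(value) ? value : [value];
  if (values.length !== pathCount) {
    throw new Error(`Expected ${pathCount} partition key value(s), got ${values.length}`);
  }
  return values;
}

// "1" and ["1"] describe the same single-path key; ["a", 2] fits a two-path key.
```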

Insert items

To insert items into a container, pass an object containing your data to Items.create (or to Items.upsert to insert or replace). The Azure Cosmos DB service requires that each item have an id key. If you do not provide one, the SDK will generate an id automatically.

This example inserts several items into the container

const cities = [
  { id: "1", name: "Olympia", state: "WA", isCapitol: true },
  { id: "2", name: "Redmond", state: "WA", isCapitol: false },
  { id: "3", name: "Chicago", state: "IL", isCapitol: false }
];
for (const city of cities) {
  await container.items.create(city);
}
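If you prefer to control ids yourself rather than rely on the SDK's automatic generation, a sketch like the following assigns one before the item is sent. withId is a hypothetical helper, and using a random UUID is an assumption; the exact format the SDK generates is not relied on here.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical helper: add a random UUID as the id when the item has none,
// leaving items that already carry an id untouched.
function withId(item) {
  return item.id === undefined || item.id === null ? { ...item, id: randomUUID() } : item;
}

// Usage: await container.items.create(withId({ name: "Tacoma", state: "WA", isCapitol: false }));
```

Generating the id on the client also means you know it before the request completes, which helps when logging or retrying the insert.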

Read an item

To read a single item from a container, use Item.read. This is a less expensive operation than using SQL to query by id.

await container.item("1", "1").read();

CRUD on Container with hierarchical partition key

Create a Container with hierarchical partition key

const { PartitionKeyDefinitionVersion, PartitionKeyKind } = require("@azure/cosmos");

const containerDefinition = {
  id: "Test Container",
  partitionKey: {
    paths: ["/name", "/address/zip"],
    version: PartitionKeyDefinitionVersion.V2,
    kind: PartitionKeyKind.MultiHash,
  },
};
const { container } = await database.containers.createIfNotExists(containerDefinition);
console.log(container.id);

Insert an item into a container whose hierarchical partition key is defined as ["/name", "/address/zip"]:

const item = {
  id: "1",
  name: "foo",
  address: {
    zip: 100,
  },
  active: true,
};
await container.items.create(item);

To read a single item from a container whose hierarchical partition key is defined as ["/name", "/address/zip"], supply the full key as an array:

await container.item("1", ["foo", 100]).read();
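The key ["foo", 100] is simply the item's values at each partition key path, in order. The sketch below illustrates that mapping; extractPartitionKey is a hypothetical helper, and it is simplified (it does not handle escaped path segments). A missing value comes back as undefined, matching the None type shown earlier.

```javascript
// Hypothetical helper: walk each partition key path ("/address/zip") through the
// item and collect the values in path order.
function extractPartitionKey(item, paths) {
  return paths.map((path) =>
    path
      .split("/")
      .filter((segment) => segment.length > 0)
      .reduce((value, segment) => (value === undefined || value === null ? undefined : value[segment]), item),
  );
}

// extractPartitionKey(item, ["/name", "/address/zip"]) gives ["foo", 100] for the item above.
```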

To query items in a container whose hierarchical partition key is defined as ["/name", "/address/zip"], pass the partition key values in the query options:

const { resources } = await container.items
  .query("SELECT * from c WHERE c.active = true", {
    partitionKey: ["foo", 100],
  })
  .fetchAll();
for (const item of resources) {
  console.log(`${item.name}, ${item.address.zip} `);
}

Delete an item

To delete items from a container, use Item.delete.

// Delete the item with id "1"; the second argument is its partition key value
await container.item("1", "1").delete();

Query the database

A Cosmos DB SQL API database supports querying the items in a container with Items.query using SQL-like syntax:

const { resources } = await container.items
  .query("SELECT * from c WHERE c.isCapitol = true")
  .fetchAll();
for (const city of resources) {
  console.log(`${city.name}, ${city.state} is a capital`);
}

Perform parameterized queries by passing an object containing the parameters and their values to Items.query:

const { resources } = await container.items
  .query({
    query: "SELECT * from c WHERE c.isCapitol = @isCapitol",
    parameters: [{ name: "@isCapitol", value: true }]
  })
  .fetchAll();
for (const city of resources) {
  console.log(`${city.name}, ${city.state} is a capital`);
}
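When a query takes several parameters, it can help to build the { query, parameters } object from a plain object of values. buildQuerySpec below is a hypothetical helper that produces the same shape passed to Items.query above; it does not check that every placeholder actually appears in the query text.

```javascript
// Hypothetical helper: map { isCapitol: true } to [{ name: "@isCapitol", value: true }].
function buildQuerySpec(query, values) {
  return {
    query,
    parameters: Object.entries(values).map(([name, value]) => ({ name: `@${name}`, value })),
  };
}

// Usage:
// const { resources } = await container.items
//   .query(buildQuerySpec("SELECT * from c WHERE c.state = @state AND c.isCapitol = @isCapitol", { state: "WA", isCapitol: true }))
//   .fetchAll();
```

Keeping values in parameters, rather than concatenating them into the SQL string, avoids quoting mistakes and injection.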

For more information on querying Cosmos DB databases using the SQL API, see Query Azure Cosmos DB data with SQL queries.

Change Feed Pull Model

Change feed can be fetched for a partition key, a feed range or an entire container.

To process the change feed, create an instance of ChangeFeedPullModelIterator. When you create a ChangeFeedPullModelIterator, you can specify a changeFeedStartFrom value inside the ChangeFeedIteratorOptions, which consists of both the starting position for reading changes and the resource (a partition key or a FeedRange) for which changes are to be fetched. You can optionally use maxItemCount in ChangeFeedIteratorOptions to set the maximum number of items received per page.

Note: If no changeFeedStartFrom value is specified, the change feed is fetched for the entire container, starting from now.

There are four starting positions for change feed:

  • Beginning
// Signals the iterator to read the change feed from the beginning of time.
const options = {
  changeFeedStartFrom: ChangeFeedStartFrom.Beginning(),
};
const iterator = container.items.getChangeFeedIterator(options);
  • Time
// Signals the iterator to read the change feed from a particular point in time.
const time = new Date("2023/09/11"); // some sample date
const options = {
  changeFeedStartFrom: ChangeFeedStartFrom.Time(time),
};
  • Now
// Signals the iterator to read the change feed from this moment onward.
const options = {
  changeFeedStartFrom: ChangeFeedStartFrom.Now(),
};
  • Continuation
// Signals the iterator to read the change feed from a saved point.
const continuationToken = "some continuation token received from a previous request";
const options = {
  changeFeedStartFrom: ChangeFeedStartFrom.Continuation(continuationToken),
};

Here's an example of fetching the change feed for a partition key:

const partitionKey = "some-partition-Key-value";
const options = {
  changeFeedStartFrom: ChangeFeedStartFrom.Beginning(partitionKey),
};

const iterator = container.items.getChangeFeedIterator(options);

while (iterator.hasMoreResults) {
  const response = await iterator.readNext();
  // process this response
}

Because the change feed is effectively an infinite list of items that encompasses all future writes and updates, the value of hasMoreResults is always true. When you try to read the change feed and there are no new changes available, you receive a response with NotModified status.
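Because hasMoreResults never turns false, a long-running reader usually decides for itself when to pause. The sketch below assumes the response exposes statusCode, result, and continuationToken and that NotModified is HTTP 304; pollChangeFeed and createFakeIterator are hypothetical names, and the fake iterator only stands in for container.items.getChangeFeedIterator(options) so the loop can run without an account.

```javascript
const NOT_MODIFIED = 304;

// Hypothetical polling loop: process pages of changes, wait after each NotModified
// response, and stop after maxEmptyPolls consecutive empty polls. Returns the last
// continuation token so a later run could resume via ChangeFeedStartFrom.Continuation.
async function pollChangeFeed(iterator, handleItems, { maxEmptyPolls = 3, delayMs = 1000 } = {}) {
  let emptyPolls = 0;
  let lastToken;
  while (iterator.hasMoreResults && emptyPolls < maxEmptyPolls) {
    const response = await iterator.readNext();
    lastToken = response.continuationToken ?? lastToken;
    if (response.statusCode === NOT_MODIFIED) {
      emptyPolls += 1;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      continue;
    }
    emptyPolls = 0;
    await handleItems(response.result);
  }
  return lastToken;
}

// Fake iterator for illustration: returns the given pages, then NotModified forever.
function createFakeIterator(pages) {
  let i = 0;
  return {
    hasMoreResults: true,
    async readNext() {
      const page = pages[i++];
      return page
        ? { statusCode: 200, result: page, continuationToken: `token-${i}` }
        : { statusCode: NOT_MODIFIED, result: [], continuationToken: `token-${i}` };
    },
  };
}
```

Persisting the returned token is what lets a restarted process pick up where the previous one stopped instead of rereading from the beginning.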

More detailed usage guidelines and examples of change feed can be found here.

Error Handling

The SDK generates various types of errors that can occur during an operation.

  1. ErrorResponse is thrown if the response of an operation returns an error code of >=400.
  2. TimeoutError is thrown if the operation is aborted internally because of a timeout.
  3. AbortError is thrown if an abort signal passed in by the user caused the abort.
  4. RestError is thrown if an underlying system call fails because of network issues.
  5. Errors generated by any dependencies. For example, the @azure/identity package could throw CredentialUnavailableError.

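The following example shows how to handle errors of type ErrorResponse and RestError; TimeoutError, AbortError, and other errors can be handled in the same way.

const { ErrorResponse } = require("@azure/cosmos");
const { RestError } = require("@azure/core-rest-pipeline");

try {
  // some code
} catch (err) {
  if (err instanceof ErrorResponse) {
    // some specific error handling.
  } else if (err instanceof RestError) {
    // some specific error handling.
  } else {
    // handle other types of errors (for example TimeoutError or AbortError) in a similar way.
  }
}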

It's important to properly handle these errors to ensure that your application can gracefully recover from any failures and continue functioning as expected. More details about some of these errors and their possible solutions can be found here.

Troubleshooting

General

When you interact with Cosmos DB, errors returned by the service correspond to the same HTTP status codes returned for REST API requests:

HTTP Status Codes for Azure Cosmos DB

Conflicts

For example, if you try to create an item using an id that's already in use within the same logical partition of your container, a 409 error is returned, indicating the conflict. In the following snippet, the error is handled gracefully by catching the exception and displaying additional information about the error.

try {
  await container.items.create({ id: "existing-item-id" });
} catch (error) {
  if (error.code === 409) {
    console.log("There was a conflict with an existing item");
  }
}
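When "already exists" is an acceptable outcome, the 409 check can be wrapped once and reused. createIfAbsent is a hypothetical helper; it assumes the thrown error carries the status in error.code, as the snippet above does, and rethrows anything that is not a conflict.

```javascript
// Hypothetical wrapper: treat 409 Conflict as "already exists" and rethrow any
// other error. createItem stands in for () => container.items.create(doc).
async function createIfAbsent(createItem) {
  try {
    return await createItem();
  } catch (error) {
    if (error && error.code === 409) {
      return undefined; // an item with this id already exists in that partition
    }
    throw error;
  }
}

// Usage: const response = await createIfAbsent(() => container.items.create({ id: "existing-item-id" }));
```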

Transpiling

The Azure SDKs are designed to support ES5 JavaScript syntax and LTS versions of Node.js. If you need support for earlier JavaScript runtimes such as Internet Explorer or Node 6, you will need to transpile the SDK code as part of your build process.

Handle transient errors with retries

While working with Cosmos DB, you might encounter transient failures caused by rate limits enforced by the service, or other transient problems like network outages. For information about handling these types of failures, see Retry pattern in the Cloud Design Patterns guide, and the related Circuit Breaker pattern.
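A minimal sketch of the Retry pattern follows, assuming errors carry an HTTP-style code (429 throttling, 503 unavailable, 408 timeout) and an optional retryAfterInMs hint; withRetries and those field names are assumptions for illustration. The SDK already retries throttled requests on its own by default, so a wrapper like this only adds an outer layer for your own operations.

```javascript
// Status codes treated as transient in this sketch (an assumption, not an SDK list).
const TRANSIENT_CODES = new Set([408, 429, 503]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical retry wrapper: retry transient failures with exponential backoff
// plus jitter, preferring the service's retry-after hint when one is present.
async function withRetries(operation, { maxAttempts = 4, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const transient = Boolean(error) && TRANSIENT_CODES.has(error.code);
      if (!transient || attempt >= maxAttempts) throw error;
      const backoff = baseDelayMs * 2 ** (attempt - 1) + Math.random() * baseDelayMs;
      await sleep(error.retryAfterInMs ?? backoff);
    }
  }
}

// Usage: const { resource } = await withRetries(() => container.item("1", "1").read());
```

Non-transient errors such as 400 or 409 are rethrown immediately, since retrying them would only repeat the same failure.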

Logging

Enabling logging may help uncover useful information about failures. To see a log of HTTP requests and responses, set the AZURE_LOG_LEVEL environment variable to info. Alternatively, logging can be enabled at runtime by calling setLogLevel in @azure/logger. When using AZURE_LOG_LEVEL, make sure to set it before the logging library is initialized. Ideally, pass it through the command line; if you use a library like dotenv, make sure it is initialized before the logging library.

const { setLogLevel } = require("@azure/logger");
setLogLevel("info");

For more detailed instructions on how to enable logs, you can look at the @azure/logger package docs.

Diagnostics

The Cosmos Diagnostics feature provides enhanced insights into all your client operations. A CosmosDiagnostics object is added to the response of all client operations, such as:

  • Point lookup operation responses: item.read(), containers.create(), database.delete()
  • Query operation responses: queryIterator.fetchAll()
  • Bulk and Batch operations: items.batch()
  • Error/Exception response objects.

There are three Cosmos diagnostic levels: info, debug, and debug-unsafe. Only info is meant for production systems; debug and debug-unsafe are meant for development and debugging, since they consume significantly more resources. The Cosmos diagnostic level can be set in two ways:

  • Programmatically
  const client = new CosmosClient({ endpoint, key, diagnosticLevel: CosmosDbDiagnosticLevel.debug });
  • Using an environment variable. (A diagnostic level set through the environment variable takes priority over one set through client options.)
  export AZURE_COSMOSDB_DIAGNOSTICS_LEVEL="debug"
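The precedence rule above can be expressed as a small function. resolveDiagnosticLevel is a hypothetical sketch, not the SDK's implementation; falling back to the client option when the environment value is not a known level, and to "info" when neither is set, are assumptions.

```javascript
const DIAGNOSTIC_LEVELS = ["info", "debug", "debug-unsafe"];

// Hypothetical sketch: a valid AZURE_COSMOSDB_DIAGNOSTICS_LEVEL wins over the
// client option; otherwise use the client option, then "info" (assumed default).
function resolveDiagnosticLevel(env, clientOptions = {}) {
  const fromEnv = env.AZURE_COSMOSDB_DIAGNOSTICS_LEVEL;
  if (fromEnv !== undefined && DIAGNOSTIC_LEVELS.includes(fromEnv)) {
    return fromEnv;
  }
  return clientOptions.diagnosticLevel ?? "info";
}
```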

CosmosDiagnostics has three members:

  • ClientSideRequestStatistics: Contains aggregated diagnostic details, including metadata lookups, retries, endpoints contacted, and request and response statistics such as payload size and duration. (Always collected; can be used in production systems.)

  • DiagnosticNode: A tree-like structure that captures detailed diagnostic information, similar to the HAR recordings available in browsers. This feature is disabled by default and is intended for debugging in non-production environments only. (Collected at diagnostic levels debug and debug-unsafe.)

  • ClientConfig: Captures essential information about the client's configuration settings during client initialization. (Collected at diagnostic levels debug and debug-unsafe.)

Never set the diagnostic level to debug-unsafe in a production environment: at this level, CosmosDiagnostics captures request and response payloads, and if you log diagnostics (by default they are logged by @azure/logger at the verbose level), these payloads might end up in your log sinks.

Consuming Diagnostics

  • Since diagnostics are added to all Response objects, you can programmatically access CosmosDiagnostics as follows.
  // For point look up operations
  const { container, diagnostics: containerCreateDiagnostic } =
    await database.containers.createIfNotExists({
      id: containerId,
      partitionKey: {
        paths: ["/key1"],
      },
    });

  // For Batch operations
  const operations: OperationInput[] = [
    {
      operationType: BulkOperationType.Create,
      resourceBody: { id: "A", key: "A", school: "high" },
    },
  ];
  const response = await container.items.batch(operations, "A");
  const batchDiagnostics = response.diagnostics;

  // For query operations
  const queryIterator = container.items.query("select * from c");
  const { resources, diagnostics } = await queryIterator.fetchAll();

  // While error handling
  try {
    // Some operation that might fail
  } catch (err) {
    const diagnostics = err.diagnostics;
  }
  • You can also log diagnostics using @azure/logger. Diagnostics are always logged through @azure/logger at the verbose level, so if you set the diagnostic level to debug or debug-unsafe and the @azure/logger level to verbose, diagnostics will be logged.
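Rather than logging every diagnostics object, you might report only slow operations. The sketch assumes the diagnostics object exposes clientSideRequestStatistics.requestDurationInMs and locationEndpointsContacted; check the CosmosDiagnostics typings for your SDK version. summarizeIfSlow is a hypothetical helper.

```javascript
// Hypothetical helper: return a short summary when an operation took at least
// thresholdMs, or undefined otherwise. Field names are assumptions.
function summarizeIfSlow(diagnostics, thresholdMs) {
  const stats = diagnostics && diagnostics.clientSideRequestStatistics;
  if (!stats || stats.requestDurationInMs < thresholdMs) return undefined;
  return {
    durationMs: stats.requestDurationInMs,
    endpoints: stats.locationEndpointsContacted ?? [],
  };
}

// Usage:
// const { resources, diagnostics } = await queryIterator.fetchAll();
// const slow = summarizeIfSlow(diagnostics, 500);
// if (slow) console.warn("Slow query", slow);
```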

Next steps

More sample code

Several samples are available to you in the SDK's GitHub repository. These samples provide example code for additional scenarios commonly encountered while working with Cosmos DB:

  • Database Operations
  • Container Operations
  • Item Operations
  • Configuring Indexing
  • Reading a container Change Feed
  • Stored Procedures
  • Changing Database/Container throughput settings
  • Multi Region Write Operations

Limitations

Currently, the features below are not supported. For alternative options, check the Workarounds section below.

Data Plane Limitations:

  • Queries with COUNT from a DISTINCT subquery
  • Direct TCP Mode access
  • Aggregate cross-partition queries, like sorting, counting, and distinct, don't support continuation tokens. Streamable queries, like SELECT * FROM <table> WHERE <condition>, support continuation tokens. See the "Workarounds" section for executing non-streamable queries without a continuation token.
  • Change Feed: Processor
  • Change Feed: Read multiple partitions key values
  • Change Feed pull model support for partial hierarchical partition keys #27059
  • Cross-partition ORDER BY for mixed types

Control Plane Limitations:

  • Get CollectionSizeUsage, DatabaseUsage, and DocumentUsage metrics
  • Create Geospatial Index
  • Update Autoscale throughput

Workarounds

Continuation token for cross-partition queries

You can achieve cross-partition queries with continuation token support by using the Sidecar pattern. This pattern can also enable applications to be composed of heterogeneous components and technologies.

Executing non-streamable cross-partition queries

To execute non-streamable queries without the use of continuation tokens, you can create a query iterator with the required query specification and options. The following sample code demonstrates how to use a query iterator to fetch all results without the need for a continuation token:

const querySpec = {
  query: "SELECT c.status, COUNT(c.id) AS count FROM c GROUP BY c.status",
};
const queryOptions = {
  maxItemCount: 10, // maximum number of items to return per page
  enableCrossPartitionQuery: true,
};
const queryIterator = container.items.query(querySpec, queryOptions);
while (queryIterator.hasMoreResults()) {
  const { resources: result } = await queryIterator.fetchNext();
  // Do something with result
}

This approach can also be used for streamable queries.

Control Plane operations

Typically, you can use the Azure Portal, the Azure Cosmos DB Resource Provider REST API, the Azure CLI, or PowerShell for control plane operations that the SDK does not support.

Additional documentation

For more extensive documentation on the Cosmos DB service, see the Azure Cosmos DB documentation on docs.microsoft.com.

Contributing

If you'd like to contribute to this library, please read the contributing guide to learn more about how to build and test the code.
