PLEASE NOTE
The Azure Cosmos DB data migration tool is offered as a community-supported repo. That means:
- dt.exe and dtui.exe are provided here as a community-supported sample
- dt.exe and dtui.exe are not first-party/first-class tools maintained by Microsoft
Azure Cosmos DB data migration tool (dt.exe and dtui.exe)
The Azure Cosmos DB data migration tool is an open source solution to import data to Azure Cosmos DB endpoints from a variety of sources. Some source/endpoint pairs supported by this tool for migration include:
- Migrate a JSON file to Azure Cosmos DB SQL API
- Migrate a CSV file to Azure Cosmos DB SQL API
- Export from Azure Cosmos DB SQL API to a JSON file
- Migrate a SQL Server instance to Azure Cosmos DB SQL API
- And other scenarios.
Additionally, other data stores, including Amazon DynamoDB and RavenDB, are supported as sources and endpoints for migration. For other Azure Cosmos DB data migration scenarios, we recommend using Azure Data Factory (ADF) for small data migrations; see our guidance on Azure Cosmos DB SQL API and Azure Cosmos DB API for MongoDB migrations using ADF. For larger migrations, view our guide for ingesting data.
For help using the tool, or for guidance on migrating to Azure Cosmos DB APIs other than SQL API, please see the tutorial in the next section.
Tutorial: Use Data migration tool to migrate your data to Azure Cosmos DB
This tutorial provides instructions on using the Azure Cosmos DB data migration tool, which can import data from various sources into Azure Cosmos containers and tables. You can import from JSON files and CSV files (including but not limited to files stored in Azure Blob storage), as well as SQL Server, MongoDB, Azure Table storage, Amazon DynamoDB, and even Azure Cosmos DB SQL API collections. You migrate that data to collections and tables for use with Azure Cosmos DB. The Data migration tool can also be used when migrating from a single partition collection to a multi-partition collection for the SQL API.
This tutorial covers the following tasks:
- Installing the Data migration tool
- Importing data from different data sources
- Exporting from Azure Cosmos DB to JSON
Here we will place the greatest emphasis on the JSON, CSV, Blob storage, and SQL Server migration scenarios as these are the most popular use-cases. However, guidance on other scenarios is also provided toward the end of this tutorial.
[NOTE]
The Azure Cosmos DB data migration tool is an open source tool designed for small migrations. For larger migrations, view our guide for ingesting data.
The following is a summary of the level of support for different Azure Cosmos DB APIs in the Data migration tool:
- SQL API - You can use any of the source options provided in the Data migration tool to import data at a small scale. Learn about migration options for importing data at a large scale.
- Table API - You can use the Data migration tool or AzCopy to import data. For more information, see Import data for use with the Azure Cosmos DB Table API.
- Azure Cosmos DB's API for MongoDB - The Data migration tool doesn't support Azure Cosmos DB's API for MongoDB either as a source or as a target, although the tool does support migrations from MongoDB. If you want to migrate the data in or out of collections in Azure Cosmos DB, refer to How to migrate MongoDB data to a Cosmos database with Azure Cosmos DB's API for MongoDB for instructions. You can still use the Data migration tool to export data from MongoDB to Azure Cosmos DB SQL API collections for use with the SQL API.
- Cassandra API - The Data migration tool isn't a supported import tool for Cassandra API accounts. Learn about migration options for importing data into Cassandra API.
- Gremlin API - The Data migration tool isn't a supported import tool for Gremlin API accounts at this time. Learn about migration options for importing data into Gremlin API.
Prerequisites
Before following the instructions in this article, complete the following steps:
- Install Microsoft .NET Framework 4.5.1 or higher.
- Increase throughput: The duration of your data migration depends on the amount of throughput you set up for an individual collection or a set of collections. Be sure to increase the throughput for larger data migrations. After you've completed the migration, decrease the throughput to save costs. For more information about increasing throughput in the Azure portal, see performance levels and pricing tiers in Azure Cosmos DB.
- Create Azure Cosmos DB resources: Before you start migrating data, pre-create all your collections from the Azure portal. To migrate to an Azure Cosmos DB account that has database level throughput, provide a partition key when you create the Azure Cosmos containers.
[IMPORTANT] To make sure that the Data migration tool uses Transport Layer Security (TLS) 1.2 when connecting to your Azure Cosmos accounts, use the .NET Framework version 4.7 or follow the instructions found in this article.
Overview
The Data migration tool is an open-source solution that imports data to Azure Cosmos DB from a variety of sources, including:
- JSON files
- CSV files
- SQL Server
- MongoDB
- Azure Table storage
- Amazon DynamoDB
- HBase
- Azure Cosmos containers
While the import tool includes a graphical user interface (dtui.exe), it can also be driven from the command-line (dt.exe). In fact, there's an option to output the associated command after setting up an import through the UI. You can transform tabular source data, such as SQL Server or CSV files, to create hierarchical relationships (subdocuments) during import. Keep reading to learn more about source options, sample commands to import from each source, target options, and viewing import results.
[NOTE]
You should only use the Azure Cosmos DB migration tool for small migrations. For large migrations, view our guide for ingesting data.
Installation
Download executable package
- Download a zip of the latest signed dt.exe and dtui.exe Windows binaries here
- Unzip into any directory on your computer and open the extracted directory to find the binaries
Build from source
The migration tool source code is available on GitHub in this repository. You can download and compile the solution locally then run either:
- Dtui.exe: Graphical interface version of the tool
- Dt.exe: Command-line version of the tool
Setting up and starting migration
Please read the following three steps before getting started with the Data migration tool:
Choose your data source: Once you've installed the tool, it's time to import your data. What kind of data do you want to import or export?
- Import JSON files
- Export JSON files
- Import CSV files
- Import a supported file type from Azure Blob storage
- Import from SQL Server
- Import from any supported source and target, leveraging Bulk or Sequential operation on Azure Cosmos DB SQL API
- If your source is not mentioned above - look at Import or export from other source not mentioned
Additional settings: Optionally, please review these guidelines on additional configurations such as indexing and advanced settings.
Start migration: Once you have configured the Data migration tool, follow these steps to start the migration.
Import JSON files
The JSON file source importer option allows you to import one or more single document JSON files or JSON files that each have an array of JSON documents. When adding folders that have JSON files to import, you have the option of recursively searching for files in subfolders.
The connection string is in the following format:
AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>
- The <CosmosDB Endpoint> is the endpoint URI. You can get this value from the Azure portal. Navigate to your Azure Cosmos account. Open the Overview pane and copy the URI value.
- The <CosmosDB Key> is the "Password" or PRIMARY KEY. You can get this value from the Azure portal. Navigate to your Azure Cosmos account. Open the Connection Strings or Keys pane, and copy the "Password" or PRIMARY KEY value.
- The <CosmosDB Database> is the Azure Cosmos DB database name.
Example:
AccountEndpoint=https://myCosmosDBName.documents.azure.com:443/;AccountKey=wJmFRYna6ttQ79ATmrTMKql8vPri84QBiHTt6oinFkZRvoe7Vv81x9sn6zlVlBY10bEPMgGM982wfYXpWXWB9w==;Database=myDatabaseName
[NOTE]
Use the Verify command to ensure that the Cosmos DB account specified in the connection string field can be accessed.
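The connection string format above can be assembled and checked with a few lines of script before pasting it into the tool. The following is a minimal Python sketch, not part of the tool itself; the function names are hypothetical, and the endpoint and key values in the usage note are placeholders.

```python
def build_connection_string(endpoint: str, key: str, database: str) -> str:
    """Join the three fields in the order shown in the format above."""
    return f"AccountEndpoint={endpoint};AccountKey={key};Database={database}"


def parse_connection_string(value: str) -> dict:
    """Split a connection string into its named fields.

    Only the first '=' in each part separates name from value, so the
    '==' padding that typically ends an account key is preserved.
    """
    fields = {}
    for part in value.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        name, _, field_value = part.partition("=")
        fields[name.strip()] = field_value
    return fields
```

For example, `parse_connection_string(build_connection_string("https://<account>.documents.azure.com:443/", "<key>", "myDatabaseName"))["Database"]` gives back `myDatabaseName`, which is a quick way to confirm the Database field was not forgotten.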
Here are some command-line samples to import JSON files:
#Import a single JSON file
dt.exe /s:JsonFile /s.Files:.\Sessions.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Sessions /t.CollectionThroughput:2500
#Import a directory of JSON files
dt.exe /s:JsonFile /s.Files:C:\TESessions\*.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Sessions /t.CollectionThroughput:2500
#Import a directory (including sub-directories) of JSON files
dt.exe /s:JsonFile /s.Files:C:\LastFMMusic\**\*.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Music /t.CollectionThroughput:2500
#Import a directory (single), directory (recursive), and individual JSON files
dt.exe /s:JsonFile /s.Files:C:\Tweets\*.*;C:\LargeDocs\**\*.*;C:\TESessions\Session48172.json;C:\TESessions\Session48173.json;C:\TESessions\Session48174.json;C:\TESessions\Session48175.json;C:\TESessions\Session48177.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:subs /t.CollectionThroughput:2500
#Import a single JSON file and partition the data across 4 collections
dt.exe /s:JsonFile /s.Files:D:\CompanyData\Companies.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:comp[1-4] /t.PartitionKey:name /t.CollectionThroughput:2500
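If you drive dt.exe from a script, the samples above can be generated rather than typed by hand. The following Python sketch only builds the argument list; it uses the switches that appear in the samples (/s:JsonFile, /s.Files, /t:DocumentDBBulk, /t.ConnectionString, /t.Collection, /t.CollectionThroughput) and nothing else. The helper name and the default dt.exe path are assumptions, and how dt.exe handles arguments quoted by another launcher has not been verified here.

```python
def build_json_import_command(files, connection_string, collection,
                              throughput=2500, dt_path="dt.exe"):
    """Return the dt.exe argument list for a bulk JSON import.

    `files` is a list of paths or wildcard patterns; the samples above
    join several entries with ';' inside a single /s.Files switch.
    """
    return [
        dt_path,
        "/s:JsonFile",
        "/s.Files:" + ";".join(files),
        "/t:DocumentDBBulk",
        "/t.ConnectionString:" + connection_string,
        "/t.Collection:" + collection,
        "/t.CollectionThroughput:" + str(throughput),
    ]


command = build_json_import_command(
    files=[r"C:\Tweets\*.*", r"C:\TESessions\Session48172.json"],
    connection_string="AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;",
    collection="subs",
)
# To run it (on a machine where dt.exe is available):
#   import subprocess; subprocess.run(command, check=True)
```

Passing a list rather than one long string leaves the quoting of the connection string to the launcher, which avoids the stray spaces and unbalanced quotes that are easy to introduce when editing these long lines manually.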
Export to JSON file
The Azure Cosmos DB JSON exporter allows you to export any of the available source options to a JSON file that has an array of JSON documents. The tool handles the export for you. Alternatively, you can choose to view the resulting migration command and run the command yourself. The resulting JSON file may be stored locally or in Azure Blob storage.
You may optionally choose to prettify the resulting JSON. This action will increase the size of the resulting document while making the contents more human readable.
- Standard JSON export
  [{"id":"Sample","Title":"About Paris","Language":{"Name":"English"},"Author":{"Name":"Don","Location":{"City":"Paris","Country":"France"}},"Content":"Don's document in Azure Cosmos DB is a valid JSON document as defined by the JSON spec.","PageViews":10000,"Topics":[{"Title":"History of Paris"},{"Title":"Places to see in Paris"}]}]
- Prettified JSON export
  [
    {
      "id": "Sample",
      "Title": "About Paris",
      "Language": {
        "Name": "English"
      },
      "Author": {
        "Name": "Don",
        "Location": {
          "City": "Paris",
          "Country": "France"
        }
      },
      "Content": "Don's document in Azure Cosmos DB is a valid JSON document as defined by the JSON spec.",
      "PageViews": 10000,
      "Topics": [
        {
          "Title": "History of Paris"
        },
        {
          "Title": "Places to see in Paris"
        }
      ]
    }
  ]
Here is a command-line sample to export the JSON file to Azure Blob storage:
dt.exe /ErrorDetails:All /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB database_name>" /s.Collection:<CosmosDB collection_name> /t:JsonFile /t.File:"blobs://<Storage account key>@<Storage account name>.blob.core.windows.net:443/<Container_name>/<Blob_name>" /t.Overwrite
Import CSV files and convert CSV to JSON
The CSV file source importer option enables you to import one or more CSV files. When adding folders that have CSV files for import, you have the option of recursively searching for files in subfolders.
Similar to the SQL source, the nesting separator property may be used to create hierarchical relationships (sub-documents) during import. Consider the following CSV header row and data rows:
Note the aliases such as DomainInfo.Domain_Name and RedirectInfo.Redirecting. By specifying a nesting separator of '.', the import tool will create DomainInfo and RedirectInfo subdocuments during the import. Here is an example of a resulting document in Azure Cosmos DB:
{ "DomainInfo": { "Domain_Name": "ACUS.GOV", "Domain_Name_Address": "https://www.ACUS.GOV" }, "Federal Agency": "Administrative Conference of the United States", "RedirectInfo": { "Redirecting": "0", "Redirect_Destination": "" }, "id": "9cc565c5-ebcd-1c03-ebd3-cc3e2ecd814d" }
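To make the nesting separator concrete, here is a small Python sketch of the transformation, not the tool's actual implementation. The CSV header and data row are reconstructed from the resulting document shown above, so treat them as an assumption about the sample file; the generated id is left out because the tool adds it during import.

```python
import csv
import io


def nest_row(row: dict, separator: str = ".") -> dict:
    """Turn flat column names such as 'A.B.C' into nested subdocuments."""
    document = {}
    for column, value in row.items():
        *parents, leaf = column.split(separator)
        target = document
        for parent in parents:
            # Create the subdocument the first time a prefix is seen.
            target = target.setdefault(parent, {})
        target[leaf] = value
    return document


csv_text = (
    "DomainInfo.Domain_Name,DomainInfo.Domain_Name_Address,Federal Agency,"
    "RedirectInfo.Redirecting,RedirectInfo.Redirect_Destination\n"
    'ACUS.GOV,https://www.ACUS.GOV,Administrative Conference of the United States,"0",\n'
)

rows = list(csv.DictReader(io.StringIO(csv_text)))
document = nest_row(rows[0])
```

Columns without the separator, such as Federal Agency, stay at the top level, while every column that shares a prefix ends up in the same subdocument. The same function handles deeper aliases such as Address.Location.City, which become two levels of nesting.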
The import tool tries to infer type information for unquoted values in CSV files (quoted values are always treated as strings). Types are identified in the following order: number, datetime, boolean.
There are two other things to note about CSV import:
- By default, unquoted values are always trimmed for tabs and spaces, while quoted values are preserved as-is. This behavior can be overridden with the Trim quoted values checkbox or the /s.TrimQuoted command-line option.
- By default, an unquoted null is treated as a null value. This behavior can be overridden (that is, treat an unquoted null as a "null" string) with the Treat unquoted NULL as string checkbox or the /s.NoUnquotedNulls command-line option.
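The inference order and the two overrides described above can be summarized in a short Python sketch. This illustrates the documented rules only; the exact number and date formats that the tool accepts are not specified here, so the parsing calls below (Python's int, float, and datetime.fromisoformat) and the case-insensitive matching of null, true, and false are assumptions.

```python
from datetime import datetime


def infer_value(raw: str, quoted: bool,
                trim_quoted: bool = False, no_unquoted_nulls: bool = False):
    """Apply the CSV rules: quoted values stay strings, unquoted values
    are trimmed and then tried as number, datetime, boolean, in that order."""
    if quoted:
        # Quoted values are preserved unless trimming is requested.
        return raw.strip(" \t") if trim_quoted else raw

    text = raw.strip(" \t")
    if text.lower() == "null" and not no_unquoted_nulls:
        return None

    for parse in (int, float, datetime.fromisoformat):
        try:
            return parse(text)
        except ValueError:
            continue  # not this type, try the next one

    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return text  # fall back to a plain string
```

With these rules, an unquoted ` 42 ` becomes the number 42, while a quoted `"42"` stays the string "42"; setting `no_unquoted_nulls=True` mirrors /s.NoUnquotedNulls by returning the string "null" instead of a null value.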
Here is a command-line sample for CSV import:
dt.exe /s:CsvFile /s.Files:.\Employees.csv /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Employees /t.IdField:EntityID /t.CollectionThroughput:2500
Import from Azure Blob storage
The JSON file, CSV file, and MongoDB export file source importer options allow you to import one or more files from Azure Blob storage. After specifying a Blob container URL and Account Key, provide a regular expression to select the file(s) to import.
Here is a command-line sample to import JSON files from Azure Blob storage:
dt.exe /s:JsonFile /s.Files:"blobs://<account key>@account.blob.core.windows.net:443/importcontainer/.*" /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:doctest
Import from SQL Server
The SQL source importer option allows you to import from an individual SQL Server database and optionally filter the records to be imported using a query. In addition, you can modify the document structure by specifying a nesting separator (more on that in a moment).
The format of the connection string is the standard SQL connection string format.
[NOTE]
Use the Verify command to ensure that the SQL Server instance specified in the connection string field can be accessed.
The nesting separator property is used to create hierarchical relationships (sub-documents) during import. Consider the following SQL query:
select CAST(BusinessEntityID AS varchar) as Id, Name, AddressType as [Address.AddressType], AddressLine1 as [Address.AddressLine1], City as [Address.Location.City], StateProvinceName as [Address.Location.StateProvinceName], PostalCode as [Address.PostalCode], CountryRegionName as [Address.CountryRegionName] from Sales.vStoreWithAddresses WHERE AddressType='Main Office'
Which returns the following (partial) results:
Note the aliases such as Address.AddressType and Address.Location.StateProvinceName. By specifying a nesting separator of '.', the import tool creates Address and Address.Location subdocuments during the import. Here is an example of a resulting document in Azure Cosmos DB:
{ "id": "956", "Name": "Finer Sales and Service", "Address": { "AddressType": "Main Office", "AddressLine1": "#500-75 O'Connor Street", "Location": { "City": "Ottawa", "StateProvinceName": "Ontario" }, "PostalCode": "K4B 1S2", "CountryRegionName": "Canada" } }
Here are some command-line samples to import from SQL Server:
#Import records from SQL which match a query
dt.exe /s:SQL /s.ConnectionString:"Data Source=<server>;Initial Catalog=AdventureWorks;User Id=advworks;Password=<password>;" /s.Query:"select CAST(BusinessEntityID AS varchar) as Id, * from Sales.vStoreWithAddresses WHERE AddressType='Main Office'" /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:Stores /t.IdField:Id /t.CollectionThroughput:2500
#Import records from SQL which match a query and create hierarchical relationships
dt.exe /s:SQL /s.ConnectionString:"Data Source=<server>;Initial Catalog=AdventureWorks;User Id=advworks;Password=<password>;" /s.Query:"select CAST(BusinessEntityID AS varchar) as Id, Name, AddressType as [Address.AddressType], AddressLine1 as [Address.AddressLine1], City as [Address.Location.City], StateProvinceName as [Address.Location.StateProvinceName], PostalCode as [Address.PostalCode], CountryRegionName as [Address.CountryRegionName] from Sales.vStoreWithAddresses WHERE AddressType='Main Office'" /s.NestingSeparator:. /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:StoresSub /t.IdField:Id /t.CollectionThroughput:2500
Import to the SQL API from any source, leveraging Bulk or Sequential operation on Azure Cosmos DB
The following two topics are discussed in this section:
- Import to the SQL API (Bulk Import)
- Import to the SQL API (Sequential Record Import)
Import to the SQL API (Bulk Import)
The Azure Cosmos DB Bulk importer allows you to import from any of the available source options, using an Azure Cosmos DB stored procedure for efficiency. The tool supports import to one single-partitioned Azure Cosmos container. It also supports sharded import whereby data is partitioned across more than one single-partitioned Azure Cosmos container. For more information about partitioning data, see Partitioning and scaling in Azure Cosmos DB. The tool creates, executes, and then deletes the stored procedure from the target collection(s).
The format of the Azure Cosmos DB connection string is:
AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;
The Azure Cosmos DB account connection string can be retrieved from the Keys page of the Azure portal, as described in How to manage an Azure Cosmos DB account. However, the name of the database needs to be appended to the connection string in the following format:
Database=<CosmosDB Database>;
[NOTE]
Use the Verify command to ensure that the Azure Cosmos DB instance specified in the connection string field can be accessed.
To import to a single collection, enter the name of the collection to import data into and click the Add button. To import to more than one collection, either enter each collection name individually or use the following syntax to specify more than one collection: collection_prefix[start index - end index]. When specifying more than one collection using the aforementioned syntax, keep the following guidelines in mind:
- Only integer range name patterns are supported. For example, specifying collection[0-3] creates the following collections: collection0, collection1, collection2, collection3.
- You can use an abbreviated syntax: collection[3] creates the same set of collections as collection[0-3] in the previous guideline.
- More than one substitution can be provided. For example, collection[0-1] [0-9] generates 20 collection names with leading zeros (collection01, ..02, ..03).
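The naming rules above can be checked with a short Python sketch that expands a pattern into collection names. It illustrates the documented syntax and is not the tool's own parser; the function name is hypothetical, and the sketch assumes that bracket groups are written without a space between them (that is, collection[0-1][0-9]).

```python
import itertools
import re

# Matches "[3]" (abbreviated form) or "[0-3]" (explicit integer range).
_RANGE = re.compile(r"\[(\d+)(?:\s*-\s*(\d+))?\]")


def expand_collection_pattern(pattern: str) -> list:
    """Expand integer range groups into the full list of collection names."""
    parts = []      # each entry is a list of alternatives for that position
    position = 0
    for match in _RANGE.finditer(pattern):
        parts.append([pattern[position:match.start()]])  # literal text
        first, last = match.group(1), match.group(2)
        if last is None:
            start, end = 0, int(first)   # "[3]" is shorthand for "[0-3]"
        else:
            start, end = int(first), int(last)
        parts.append([str(i) for i in range(start, end + 1)])
        position = match.end()
    parts.append([pattern[position:]])   # trailing literal text, if any
    return ["".join(combo) for combo in itertools.product(*parts)]


print(expand_collection_pattern("collection[0-3]"))
# ['collection0', 'collection1', 'collection2', 'collection3']
print(len(expand_collection_pattern("collection[0-1][0-9]")))
# 20
```

A name without any bracket group is returned unchanged as a single-element list, so the same helper works for the single-collection case.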
Once the collection name(s) have been specified, choose the desired throughput of the collection(s) (400 RUs to 10,000 RUs). For best import performance, choose a higher throughput. For more information about performance levels, see Performance levels in Azure Cosmos DB.
[NOTE]
The performance throughput setting only applies to collection creation. If the specified collection already exists, its throughput won't be modified.
When you import to more than one collection, the import tool supports hash-based sharding. In this scenario, specify the document property you wish to use as the Partition Key. (If Partition Key is left blank, documents are sharded randomly across the target collections.)
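As an illustration of hash-based sharding, the Python sketch below routes each document to one of the target collections based on its partition key value, and falls back to a random choice when no partition key is given. The hash function used by the tool is not documented here; SHA-256 is used purely as a stand-in, and the function name is hypothetical.

```python
import hashlib
import random


def choose_collection(document: dict, collections: list, partition_key: str = "") -> str:
    """Pick the target collection for a document.

    With a partition key, documents that share a key value always land
    in the same collection. Without one, the choice is random.
    """
    if not partition_key:
        return random.choice(collections)
    value = str(document.get(partition_key))
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer hash.
    index = int.from_bytes(digest[:8], "big") % len(collections)
    return collections[index]


targets = ["comp1", "comp2", "comp3", "comp4"]
first = choose_collection({"name": "Contoso"}, targets, partition_key="name")
second = choose_collection({"name": "Contoso"}, targets, partition_key="name")
```

Here `first` and `second` are always equal, which is the property that matters: a stable hash keeps every document with the same partition key value together, whereas random placement spreads them across all targets.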
You may optionally specify which field in the import source should be used as the Azure Cosmos DB document ID property during the import. If documents don't have this property, then the import tool generates a GUID as the ID property value.
There are a number of advanced options available during import. First, while the tool includes a default bulk import stored procedure (BulkInsert.js), you may choose to specify your own import stored procedure.
Additionally, when importing date types (for example, from SQL Server or MongoDB), you can choose between three import options:
- String: Persist as a string value
- Epoch: Persist as an Epoch number value
- Both: Persist both string and Epoch number values. This option creates a subdocument, for example: "date_joined": { "Value": "2013-10-21T21:17:25.2410000Z", "Epoch": 1382390245 }
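The three date options can be illustrated with a short Python sketch that reproduces the date_joined example above. It is an illustration of the output shapes only; the function name is hypothetical, and truncating (rather than rounding) fractional seconds for the Epoch value is an assumption.

```python
from datetime import datetime, timezone


def persist_date(value: datetime, mode: str = "String"):
    """Render a timezone-aware datetime as String, Epoch, or Both."""
    utc = value.astimezone(timezone.utc)
    # %f yields six fractional digits; the example above shows seven,
    # so a trailing zero is appended before the 'Z' suffix.
    text = utc.strftime("%Y-%m-%dT%H:%M:%S.%f") + "0Z"
    epoch = int(utc.timestamp())  # whole seconds since 1970-01-01 UTC
    if mode == "String":
        return text
    if mode == "Epoch":
        return epoch
    if mode == "Both":
        return {"Value": text, "Epoch": epoch}
    raise ValueError(f"unknown date mode: {mode}")


date_joined = datetime(2013, 10, 21, 21, 17, 25, 241000, tzinfo=timezone.utc)
print(persist_date(date_joined, "Both"))
# {'Value': '2013-10-21T21:17:25.2410000Z', 'Epoch': 1382390245}
```

The Epoch form is convenient for range queries on numbers, while the String form stays human readable; Both stores the two side by side at the cost of a slightly larger document.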
The Azure Cosmos DB Bulk importer has the following additional advanced options:
- Batch Size: The tool defaults to a batch size of 50. If the documents to be imported are large, consider lowering the batch size. Conversely, if the documents to be imported are small, consider raising the batch size.
- Max Script Size (bytes): The tool defaults to a max script size of 512 KB.
- Disable Automatic Id Generation: If every document to be imported has an ID field, then selecting this option can increase performance. Documents missing a unique ID field aren't imported.
- Update Existing Documents: The tool defaults to not replacing existing documents with ID conflicts. Selecting this option allows overwriting existing documents with matching IDs. This feature is useful for scheduled data migrations that update existing documents.
- Number of Retries on Failure: Specifies how many times to retry the connection to Azure Cosmos DB during transient failures (for example, network connectivity interruption).
- Retry Interval: Specifies how long to wait between retrying the connection to Azure Cosmos DB in case of transient failures (for example, network connectivity interruption).
- Connection Mode: Specifies the connection mode to use with Azure Cosmos DB. The available choices are DirectTcp, DirectHttps, and Gateway. The direct connection modes are faster, while the gateway mode is more firewall friendly as it only uses port 443.
[TIP] The import tool defaults to connection mode DirectTcp. If you experience firewall issues, switch to connection mode Gateway, as it only requires port 443.
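To show how Batch Size, Number of Retries on Failure, and Retry Interval interact, here is a minimal Python sketch. It is not the tool's code: the default batch size of 50 comes from the list above, while the retry count, the interval, and the use of ConnectionError as the transient failure are placeholder assumptions.

```python
import time


def batches(documents, batch_size=50):
    """Yield lists of at most `batch_size` documents."""
    batch = []
    for document in documents:
        batch.append(document)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def with_retries(operation, retries=3, interval_seconds=1.0, sleep=time.sleep):
    """Call `operation`, retrying up to `retries` times on transient failures."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == retries:
                raise  # retries exhausted, surface the failure
            sleep(interval_seconds)
```

A caller would loop over `batches(documents)` and wrap each upload in `with_retries(...)`. Smaller batches keep each request under the size limit when documents are large, and a longer interval gives a throttled or briefly unreachable endpoint time to recover.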
Import to the SQL API (Sequential Record Import)
The Azure Cosmos DB sequential record importer allows you to import from an available source option on a record-by-record basis. You might choose this option if you're importing to an existing collection that has reached its quota of stored procedures. The tool supports import to a single Azure Cosmos container, whether single-partition or multi-partition. It also supports sharded import whereby data is partitioned across more than one single-partition or multi-partition Azure Cosmos container. For more information about partitioning data, see Partitioning and scaling in Azure Cosmos DB.
The format of the Azure Cosmos DB connection string is:
AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;
You can retrieve the connection string for the Azure Cosmos DB account from the Keys page of the Azure portal, as described in How to manage an Azure Cosmos DB account. However, the name of the database needs to be appended to the connection string in the following format:
Database=<Azure Cosmos database>;
[NOTE]
Use the Verify command to ensure that the Azure Cosmos DB instance specified in the connection string field can be accessed.
To import to a single collection, enter the name of the collection to import data into, and then click the Add button. To import to more than one collection, enter each collection name individually. You may also use the following syntax to specify more than one collection: collection_prefix[start index - end index]. When specifying more than one collection via the aforementioned syntax, keep the following guidelines in mind:
- Only integer range name patterns are supported. For example, specifying collection[0-3] creates the following collections: collection0, collection1, collection2, collection3.
- You can use an abbreviated syntax: collection[3] creates the same set of collections as collection[0-3] in the previous guideline.
- More than one substitution can be provided. For example, collection[0-1] [0-9] creates 20 collection names with leading zeros (collection01, ..02, ..03).
Once the collection name(s) have been specified, choose the desired throughput of the collection(s) (400 RUs to 1,000,000 RUs). For best import performance, choose a higher throughput. For more information about performance levels, see Performance levels in Azure Cosmos DB. Any import to collections with throughput >10,000 RUs requires a partition key. If you need more than 1,000,000 RUs, file a request in the portal to have your account's limit increased.
[NOTE]
The throughput setting only applies to collection or database creation. If the specified collection already exists, its throughput won't be modified.
When importing to more than one collection, the import tool supports hash-based sharding. In this scenario, specify the document property you wish to use as the Partition Key. (If Partition Key is left blank, documents are sharded randomly across the target collections.)
You may optionally specify which field in the import source should be used as the Azure Cosmos DB document ID property during the import. (If documents don't have this property, then the import tool generates a GUID as the ID property value.)
There are a number of advanced options available during import. First, when importing date types (for example, from SQL Server or MongoDB), you can choose between three import options:
- String: Persist as a string value
- Epoch: Persist as an Epoch number value
- Both: Persist both string and Epoch number values. This option creates a subdocument, for example: "date_joined": { "Value": "2013-10-21T21:17:25.2410000Z", "Epoch": 1382390245 }
The Azure Cosmos DB - Sequential record importer has the following additional advanced options:
- Number of Parallel Requests: The tool defaults to two parallel requests. If the documents to be imported are small, consider raising the number of parallel requests. If this number is raised too much, the import may experience rate limiting.
- Disable Automatic Id Generation: If every document to be imported has an ID field, then selecting this option can increase performance. Documents missing a unique ID field aren't imported.
- Update Existing Documents: The tool defaults to not replacing existing documents with ID conflicts. Selecting this option allows overwriting existing documents with matching IDs. This feature is useful for scheduled data migrations that update existing documents.
- Number of Retries on Failure: Specifies how many times to retry the connection to Azure Cosmos DB during transient failures (for example, network connectivity interruption).
- Retry Interval: Specifies how long to wait between retrying the connection to Azure Cosmos DB during transient failures (for example, network connectivity interruption).
- Connection Mode: Specifies the connection mode to use with Azure Cosmos DB. The available choices are DirectTcp, DirectHttps, and Gateway. The direct connection modes are faster, while the gateway mode is more firewall friendly as it only uses port 443.
[TIP] The import tool defaults to connection mode DirectTcp. If you experience firewall issues, switch to connection mode Gateway, as it only requires port 443.
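The Number of Parallel Requests option can be pictured as a small worker pool that writes one record per request. The Python sketch below is an illustration under that assumption, not the tool's implementation; `write_document` is a hypothetical callable standing in for a single-record write to Azure Cosmos DB.

```python
from concurrent.futures import ThreadPoolExecutor


def import_records(documents, write_document, parallel_requests=2):
    """Write documents one at a time, with a bounded number in flight.

    The default of two parallel requests mirrors the default described
    above. Results are returned in the same order as the input.
    """
    with ThreadPoolExecutor(max_workers=parallel_requests) as pool:
        return list(pool.map(write_document, documents))
```

Raising `parallel_requests` helps when documents are small and each request finishes quickly, but every extra worker also consumes throughput, which is why an overly high value can run into rate limiting.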
Other sources
The following other sources are supported by the Data migration tool:
- Import from MongoDB
- Import MongoDB export files
- Import from Azure Table storage
- Import from Amazon DynamoDB
- Import from a SQL API collection
- Import from HBase
Import from MongoDB
[IMPORTANT]
If you're importing to a Cosmos account configured with Azure Cosmos DB's API for MongoDB, follow these instructions for migration with Azure Database Migration Service.
With the MongoDB source importer option, you can import from a single MongoDB collection, optionally filter documents using a query, and modify the document structure by using a projection.
The connection string is in the standard MongoDB format:
mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database>
[NOTE]
Use the Verify command to ensure that the MongoDB instance specified in the connection string field can be accessed.
Enter the name of the collection from which data will be imported. You may optionally specify or provide a file for a query, such as {pop: {$gt:5000}}, or a projection, such as {loc:0}, to both filter and shape the data that you're importing.
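To show what the query and projection do to the imported data, the Python sketch below applies the two examples above to in-memory documents. In practice MongoDB evaluates them; this sketch only covers the $gt operator and exclusion projections, and the sample documents are illustrative values, not data from the source.

```python
def matches(document: dict, query: dict) -> bool:
    """Evaluate a query made of equality tests and $gt comparisons."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            for operator, operand in condition.items():
                if operator != "$gt":
                    raise NotImplementedError(operator)
                if value is None or not value > operand:
                    return False
        elif value != condition:
            return False
    return True


def project(document: dict, projection: dict) -> dict:
    """Apply an exclusion projection such as {"loc": 0}."""
    excluded = {field for field, flag in projection.items() if flag == 0}
    return {k: v for k, v in document.items() if k not in excluded}


zips = [
    {"_id": "00001", "city": "SMALLTOWN", "loc": [-72.0, 42.0], "pop": 1200},
    {"_id": "00002", "city": "BIGCITY", "loc": [-73.0, 41.0], "pop": 15000},
]
imported = [project(z, {"loc": 0}) for z in zips if matches(z, {"pop": {"$gt": 5000}})]
```

After this runs, `imported` contains only the second document, without its loc field: the query decides which documents are imported, and the projection decides which fields each imported document keeps.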
Here are some command-line samples to import from MongoDB:
#Import all documents from a MongoDB collection
dt.exe /s:MongoDB /s.ConnectionString:mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database> /s.Collection:zips /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:BulkZips /t.IdField:_id /t.CollectionThroughput:2500
#Import documents from a MongoDB collection which match the query and exclude the loc field
dt.exe /s:MongoDB /s.ConnectionString:mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database> /s.Collection:zips /s.Query:{pop:{$gt:50000}} /s.Projection:{loc:0} /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:BulkZipsTransform /t.IdField:_id /t.CollectionThroughput:2500
Import MongoDB export files
[IMPORTANT]
If you're importing to an Azure Cosmos DB account with support for MongoDB, follow these instructions for migrations using MongoDB native tools.
The MongoDB export JSON file source importer option allows you to import one or more JSON files produced from the mongoexport utility.
When adding folders that have MongoDB export JSON files for import, you have the option of recursively searching for files in subfolders.
Here is a command-line sample to import from MongoDB export JSON files:
dt.exe /s:MongoDBExport /s.Files:D:\mongoemployees.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:employees /t.IdField:_id /t.Dates:Epoch /t.CollectionThroughput:2500
Import from Azure Table storage
The Azure Table storage source importer option allows you to import from an individual Azure Table storage table. Optionally, you can filter the table entities to be imported.
You may output data that was imported from Azure Table storage to Azure Cosmos DB tables and entities for use with the Table API. Imported data can also be output to collections and documents for use with the SQL API. However, Table API is only available as a target in the command-line utility. You can't export to Table API by using the Data migration tool user interface. For more information, see Import data for use with the Azure Cosmos DB Table API.
The format of the Azure Table storage connection string is:
DefaultEndpointsProtocol=<protocol>;AccountName=<Account Name>;AccountKey=<Account Key>;
[NOTE]
Use the Verify command to ensure that the Azure Table storage instance specified in the connection string field can be accessed.
Enter the name of the Azure table to import from. You may optionally specify a filter.
The Azure Table storage source importer option has the following additional options:
- Include Internal Fields
  - All - Include all internal fields (PartitionKey, RowKey, and Timestamp)
  - None - Exclude all internal fields
  - RowKey - Only include the RowKey field
- Select Columns
  - Azure Table storage filters don't support projections. If you want to only import specific Azure Table entity properties, add them to the Select Columns list. All other entity properties are ignored.
Here is a command-line sample to import from Azure Table storage:
dt.exe /s:AzureTable /s.ConnectionString:"DefaultEndpointsProtocol=https;AccountName=<Account Name>;AccountKey=<Account Key>" /s.Table:metrics /s.InternalFields:All /s.Filter:"PartitionKey eq 'Partition1' and RowKey gt '00001'" /s.Projection:ObjectCount;ObjectSize /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:metrics /t.CollectionThroughput:2500
Import from Amazon DynamoDB
The Amazon DynamoDB source importer option allows you to import from a single Amazon DynamoDB table. It can optionally filter the entities to be imported. Several templates are provided so that setting up an import is as easy as possible.
The format of the Amazon DynamoDB connection string is:
ServiceURL=<Service Address>;AccessKey=<Access Key>;SecretKey=<Secret Key>;
[NOTE]
Use the Verify command to ensure that the Amazon DynamoDB instance specified in the connection string field can be accessed.
Here is a command-line sample to import from Amazon DynamoDB:
dt.exe /s:DynamoDB /s.ConnectionString:ServiceURL=https://dynamodb.us-east-1.amazonaws.com;AccessKey=<accessKey>;SecretKey=<secretKey> /s.Request:"{ """TableName""": """ProductCatalog""" }" /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<Azure Cosmos DB Endpoint>;AccountKey=<Azure Cosmos DB Key>;Database=<Azure Cosmos DB database>;" /t.Collection:catalogCollection /t.CollectionThroughput:2500
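The /s.Request value in the sample above is a JSON document in which every double quote is written three times and the whole value is wrapped in one pair of quotes. A small sketch of producing that form from a Python dictionary follows. The function name is hypothetical, and the sketch only mirrors the quoting seen in the sample; it makes no claim about values that themselves contain quotes or backslashes, or about shells other than the Windows command prompt.

```python
import json


def quote_json_for_cmd(request):
    """Serialize a request to JSON and triple each double quote, as in the sample."""
    body = json.dumps(request)
    return '"' + body.replace('"', '"""') + '"'


# Rebuild the /s.Request argument for the ProductCatalog table from the sample.
arg = "/s.Request:" + quote_json_for_cmd({"TableName": "ProductCatalog"})
print(arg)
```

Generating the argument this way avoids hand-counting quotes when the request grows beyond a single table name.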
Import from a SQL API collection
The Azure Cosmos DB source importer option allows you to import data from one or more Azure Cosmos containers and optionally filter documents using a query.
The format of the Azure Cosmos DB connection string is:
AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;
You can retrieve the Azure Cosmos DB account connection string from the Keys page of the Azure portal, as described in How to manage an Azure Cosmos DB account. However, the name of the database needs to be appended to the connection string in the following format:
Database=<CosmosDB Database>;
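As a minimal sketch of that step, the helper below appends the Database segment to an account connection string copied from the portal. The function name and the endpoint, key, and database values are placeholders invented for illustration.

```python
def with_database(account_connection_string, database):
    """Return the account connection string with Database=<name>; appended."""
    base = account_connection_string.strip()
    # The portal string may or may not end with a semicolon; normalize it first.
    if not base.endswith(";"):
        base += ";"
    return f"{base}Database={database};"


# Hypothetical account values, for illustration only.
portal_string = "AccountEndpoint=https://example.documents.azure.com:443/;AccountKey=abc=="
print(with_database(portal_string, "mydb"))
```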
[NOTE]
Use the Verify command to ensure that the Azure Cosmos DB instance specified in the connection string field can be accessed.
To import from a single Azure Cosmos container, enter the name of the collection to import data from. To import from more than one Azure Cosmos container, provide a regular expression to match one or more collection names (for example, collection01 | collection02 | collection03). You may optionally specify, or provide a file for, a query to both filter and shape the data that you're importing.
[NOTE]
Since the collection field accepts regular expressions, if you're importing from a single collection whose name has regular expression characters, then those characters must be escaped accordingly.
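One way to prepare such names is to escape each literal name and join them with the alternation character. The sketch below uses Python's re.escape; the function and collection names are hypothetical. The migration tool itself is a .NET application, so treat this as an approximation: the two regular expression dialects agree on common characters such as . and ( ), but may differ on others.

```python
import re


def collection_pattern(names):
    """Join literal collection names into one alternation, escaping regex characters."""
    return "|".join(re.escape(name) for name in names)


# Hypothetical collection names containing regular expression characters.
pattern = collection_pattern(["sales.2019", "sales(archive)"])
print(pattern)
```

Without escaping, the dot in sales.2019 would match any character, so a collection named sales-2019 would also be selected.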
The Azure Cosmos DB source importer option has the following advanced options:
- Include Internal Fields: Specifies whether or not to include Azure Cosmos DB document system properties in the export (for example, _rid, _ts).
- Number of Retries on Failure: Specifies the number of times to retry the connection to Azure Cosmos DB in case of transient failures (for example, network connectivity interruption).
- Retry Interval: Specifies how long to wait between retrying the connection to Azure Cosmos DB in case of transient failures (for example, network connectivity interruption).
- Connection Mode: Specifies the connection mode to use with Azure Cosmos DB. The available choices are DirectTcp, DirectHttps, and Gateway. The direct connection modes are faster, while the gateway mode is more firewall friendly as it only uses port 443.
[TIP]
The import tool defaults to connection mode DirectTcp. If you experience firewall issues, switch to connection mode Gateway, as it only requires port 443.
Here are some command-line samples to import from Azure Cosmos DB:
#Migrate data from one Azure Cosmos container to another Azure Cosmos container
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /s.Collection:TEColl /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:TESessions /t.CollectionThroughput:2500
#Migrate data from more than one Azure Cosmos container to a single Azure Cosmos container
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /s.Collection:comp1|comp2|comp3|comp4 /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:singleCollection /t.CollectionThroughput:2500
#Export an Azure Cosmos container to a JSON file
dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /s.Collection:StoresSub /t:JsonFile /t.File:StoresExport.json /t.Overwrite
[TIP]
The Azure Cosmos DB Data Import Tool also supports import of data from the Azure Cosmos DB Emulator. When importing data from a local emulator, set the endpoint to https://localhost:<port>.
Import from HBase
The HBase source importer option allows you to import data from an HBase table and optionally filter the data. Several templates are provided so that setting up an import is as easy as possible.
The format of the HBase Stargate connection string is:
ServiceURL=<server-address>;Username=<username>;Password=<password>
[NOTE]
Use the Verify command to ensure that the HBase instance specified in the connection string field can be accessed.
Here is a command-line sample to import from HBase:
dt.exe /s:HBase /s.ConnectionString:ServiceURL=<server-address>;Username=<username>;Password=<password> /s.Table:Contacts /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;" /t.Collection:hbaseimport
Additional configuration settings
This section discusses additional configuration settings for the Data migration tool: specifying an indexing policy and advanced configuration.
Specify an indexing policy
When you allow the migration tool to create Azure Cosmos DB SQL API collections during import, you can specify the indexing policy of the collections. In the advanced options section of the Azure Cosmos DB Bulk import and Azure Cosmos DB Sequential record options, navigate to the Indexing Policy section.
Using the Indexing Policy advanced option, you can select an indexing policy file, manually enter an indexing policy, or select from a set of default templates (by right-clicking in the indexing policy textbox).
The policy templates the tool provides are:
- Default. This policy is best when you perform equality queries against strings. It also works if you use ORDER BY, range, and equality queries for numbers. This policy has a lower index storage overhead than Range.
- Range. This policy is best when you use ORDER BY, range, and equality queries on both numbers and strings. This policy has a higher index storage overhead than Default or Hash.
[NOTE]
If you don't specify an indexing policy, then the default policy is applied. For more information about indexing policies, see Azure Cosmos DB indexing policies.
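For the option of manually entering a policy or selecting a policy file, the sketch below builds a policy as JSON. It assumes the older Azure Cosmos DB indexing policy schema, in which each included path lists indexes with a kind, dataType, and precision. This is an illustration of a range-style policy, not a copy of the tool's built-in Range template, and the field values should be checked against the Azure Cosmos DB indexing policy documentation before use.

```python
import json

# Assumed legacy-style policy: range indexes on numbers and strings for every path.
range_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {"kind": "Range", "dataType": "Number", "precision": -1},
                {"kind": "Range", "dataType": "String", "precision": -1},
            ],
        }
    ],
    "excludedPaths": [],
}

# The resulting text can be pasted into the textbox or saved to a policy file.
policy_json = json.dumps(range_policy, indent=2)
print(policy_json)
```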
Advanced configuration
In the Advanced configuration screen, specify the location of the log file to which you would like any errors written. The following rules apply to this page:
- If a file name isn't provided, then all errors are returned on the Results page.
- If a file name is provided without a directory, then the file is created (or overwritten) in the current environment directory.
- If you select an existing file, then the file is overwritten; there's no append option.
Then, choose whether to log all, critical, or no error messages. Finally, decide how frequently the on-screen transfer message is updated with its progress.
Start migration - confirm import settings and view command line
- After you specify the source information, target information, and advanced configuration, review the migration summary and view or copy the resulting migration command if you want. (Copying the command is useful to automate import operations.)
- Once you're satisfied with your source and target options, click Import. The elapsed time, transferred count, and failure information (if you didn't provide a file name in the Advanced configuration) update while the import is in progress. Once complete, you can export the results (for example, to deal with any import failures).
- You may also start a new import by either resetting all values or keeping the existing settings. (For example, you may choose to keep connection string information, source and target choice, and more.)
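As one way to automate a copied command, the sketch below rebuilds the JSON export sample from this tutorial as an argument list and runs it with Python's subprocess module. The function names are hypothetical, the connection string is a placeholder, and the sketch assumes dt.exe is on the PATH and reports failure through a non-zero exit code, which this document does not confirm.

```python
import subprocess


def build_export_args(connection_string, collection, out_file):
    """Build the dt.exe argument list for exporting one container to a JSON file."""
    return [
        "dt.exe",
        "/s:DocumentDB",
        f"/s.ConnectionString:{connection_string}",
        f"/s.Collection:{collection}",
        "/t:JsonFile",
        f"/t.File:{out_file}",
        "/t.Overwrite",
    ]


def run_export(args):
    """Run the command; raises CalledProcessError if the exit code is non-zero."""
    return subprocess.run(args, check=True)


# Placeholder connection string; replace the bracketed values before running.
args = build_export_args(
    "AccountEndpoint=<CosmosDB Endpoint>;AccountKey=<CosmosDB Key>;Database=<CosmosDB Database>;",
    "StoresSub",
    "StoresExport.json",
)
print(args)
# run_export(args)  # Uncomment on a machine where dt.exe is installed.
```

Passing the arguments as a list lets subprocess handle quoting, so the connection string doesn't need to be wrapped in quotes by hand.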
Next steps
In this tutorial, you've done the following tasks:
- Installed the Data migration tool
- Imported data from different data sources
- Exported from Azure Cosmos DB to JSON
You can now proceed to the next tutorial and learn how to query data using Azure Cosmos DB.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.