First version of a doc about API scope.

This commit is contained in:
Alain Jobart 2017-03-17 14:32:35 -07:00
Родитель ca81fda801
Коммит 6fa0db7d7b
1 изменённых файлов: 95 добавлений и 0 удалений

95
doc/APIScope.md Normal file
Просмотреть файл

@ -0,0 +1,95 @@
# API Scope
This document describes the scope of the Vitess APIs, and how they map to
traditional concepts like database and tables.
## Introduction
Vitess is exposed as a single database by clients, but can be composed of any
arbitrary number of databases, some of them possibly sharded with large numbers
of shards. It is not obvious to map this system with clients that expect only a
single database.
## Shard Names and Key Ranges
For unsharded keyspaces, or custom sharded keyspaces, the shard names have
traditionally been numbers e.g. `0`. (It is important to note that Vitess *does
not* interpret these shard names, including name `0`, as range-based shard.)
For keyspaces that are sharded by keyrange, we use the range as the shard
name. For instance `40-80` contains all records whose sharding key is between
0x40..... and 0x80....
The conventions Vitess follow is:
* if a single shard should be targeted, the shard name should be used. This
allows single-shard keyspaces and custom sharded keyspaces to be accessed.
* if a subset of the data should be targeted, and we are using ranged-based
sharding, then a key range should be used. Vitess is then responsible for
mapping the keyrange to a list of shards, and for any aggregation.
## Execute API
The main entry point of a Vitess cluster is the 'Execute' call (or StreamExecute
for streaming queries). It takes a keyspace, a tablet type, and a query. The
VSchema helps Vitess route the query to the right shard and tablet type. This is
the most transparent way of accessing Vitess. Keyspace is the database, and
inside the query, a table is referenced either just by name, or by
`keyspace.name`.
We are adding a `shard` parameter to this entry point. It will work as follows:
* TODO(sougou) document this.
We want to add support for DBA statements:
* DDL statements in the short term will be sent to all shards of a
keyspace. Note for complex schema changes, using Schema Swap is recommended.
Longer term, we want to instead trigger a workflow that will use the best
strategy for the schema change.
* Read-only statements (like `describe table`) will be sent to the first shard
of the keyspace. This somewhat assumes the schema is consistent across all
shards, which may or may not be true. Guaranteeing schema consistency across
shards is however out of scope for this.
* Read-only statements that return table statistics (like data size, number of
rows, ...) will need to be scattered and aggregated across all shards. The
first version of the API won't support this.
We also want to add support for routing sequence queries through this Execute
query (right now only the VSchema engine can use a sequence).
Then, we also want to support changing the VSchema via SQL-like statements, like
`ALTER TABLE ADD VINDEX()`.
## Client Connectors
Client connectors (like JDBC) usually have a connection string that describes
the connection. It usually includes the keyspace and the tablet type, and uses
the Execute API.
## MySQL Server Protocol API
vtgate now exposes an API endpoint that implements the regular MySQL server
protocol. All calls are forwarded to the Execute API. The database provided on
the connection is used as the keyspace, if any. The tablet type is also Master
for now.
## Update Stream and Message Stream
These APIs are meant to target a keyspace and optionally a subset of its shards.
* The keyspace is provided in the API.
* To specify the shard, two options are provided:
* a shard name, to target an individual shard by name.
* a key range, to target a subset of shards in a way that will survive a
resharding event.
Note the current implementation for these services only supports a key range
that exactly maps to one shard. We want to make that better and support
aggregating multiple shards in a keyrange.