This document defines common Vitess concepts and terminology.

## Keyspace

A *keyspace* is a logical database. In the unsharded case, it maps directly
to a MySQL database name, but it can also map to multiple MySQL databases.

Reading data from a keyspace is like reading from a MySQL database. However,
depending on the consistency requirements of the read operation, Vitess
might fetch the data from a master database or from a replica. By routing
each query to the appropriate database, Vitess allows your code to be
structured as if it were reading from a single MySQL database.

When a database is
[sharded](http://en.wikipedia.org/wiki/Shard_(database_architecture)),
a keyspace maps to multiple MySQL databases. In that case, a single query sent
to Vitess will be routed to one or more shards, depending on where the requested
data resides.
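
As a sketch of what this looks like from the application side, the Go snippet
below reads from a keyspace through VTGate's MySQL protocol. The host, port,
credentials, keyspace name (`commerce`), and table are placeholders, and it
assumes a VTGate that accepts a `keyspace@tablet_type` target as the database
name to choose between master and replica reads.

```go
// A minimal sketch of reading from a keyspace through VTGate's MySQL
// protocol. Host, port, credentials, keyspace, and table names are
// placeholders.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Targeting "commerce@replica" asks VTGate to route reads to replica
	// tablets; "commerce" alone (or "commerce@master") targets the master.
	db, err := sql.Open("mysql", "user:password@tcp(vtgate-host:15306)/commerce@replica")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var count int
	// The application issues plain SQL; VTGate decides which shard(s) and
	// which tablet actually serve the query.
	if err := db.QueryRow("SELECT COUNT(*) FROM customer").Scan(&count); err != nil {
		log.Fatal(err)
	}
	fmt.Println("customers:", count)
}
```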

## Keyspace ID

The *keyspace ID* is the value used to decide which shard a given record
lives on. [Range-based Sharding](http://vitess.io/user-guide/sharding.html#range-based-sharding)
refers to creating shards that each cover a particular range of keyspace IDs.

Often, the keyspace ID is computed as the hash of some column in your data,
such as the user ID, which spreads users roughly evenly across the
range-based shards.
Using this technique means you can split a given shard by replacing it with two
or more new shards that combine to cover the original range of keyspace IDs,
without having to move any records in other shards.

Previously, our resharding process required each table to store this value in a
`keyspace_id` column because it was computed by the application. However, this
column is no longer necessary when you allow VTGate to compute the keyspace ID
for you, for example by using a `hash` vindex.
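
The sketch below illustrates the idea rather than the actual `hash` vindex
algorithm: a user ID is hashed into an 8-byte keyspace ID, which is then
matched against shard boundaries. The hash function, shard names, and
boundaries are illustrative assumptions.

```go
// An illustrative sketch (not the actual `hash` vindex) of turning a column
// value into a keyspace ID and matching it to a range-based shard.
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// keyspaceID hashes a user ID into an 8-byte value. The real hash vindex
// uses a different function, but the idea is the same: the hash spreads
// users roughly evenly across the keyspace ID space.
func keyspaceID(userID uint64) []byte {
	h := fnv.New64a()
	binary.Write(h, binary.BigEndian, userID)
	ksid := make([]byte, 8)
	binary.BigEndian.PutUint64(ksid, h.Sum64())
	return ksid
}

// shardRange is a half-open interval [Start, End) of keyspace IDs.
// An empty Start means "from the beginning"; an empty End means "to the end".
type shardRange struct {
	Name       string
	Start, End []byte
}

// pickShard returns the name of the shard whose range contains ksid.
func pickShard(ksid []byte, shards []shardRange) string {
	for _, s := range shards {
		if (len(s.Start) == 0 || bytes.Compare(ksid, s.Start) >= 0) &&
			(len(s.End) == 0 || bytes.Compare(ksid, s.End) < 0) {
			return s.Name
		}
	}
	return ""
}

func main() {
	// Two example shards that split the keyspace ID space at 0x80.
	shards := []shardRange{
		{Name: "-80", End: []byte{0x80}},
		{Name: "80-", Start: []byte{0x80}},
	}
	ksid := keyspaceID(12345)
	fmt.Printf("keyspace id %x lives on shard %q\n", ksid, pickShard(ksid, shards))
}
```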

## Shard

A *shard* is a division within a keyspace. A shard typically contains one MySQL
master and many MySQL slaves.

Each MySQL instance within a shard has the same data (except for some
replication lag). The slaves can serve read-only traffic (with eventual
consistency guarantees), execute long-running data analysis tools, or perform
administrative tasks (backup, restore, diff, etc.).

A keyspace that does not use sharding effectively has one shard.
Vitess names the shard `0` by convention. When sharded, a keyspace has `N`
shards with non-overlapping data.

### Resharding

Vitess supports [dynamic resharding](http://vitess.io/user-guide/sharding.html#resharding),
in which the number of shards is changed on a live cluster. This can be either
splitting one or more shards into smaller pieces, or merging neighboring shards
into bigger pieces.

During dynamic resharding, the data in the source shards is copied into the
destination shards, allowed to catch up on replication, and then compared
against the original to ensure data integrity. Then the live serving
infrastructure is shifted to the destination shards, and the source shards are
deleted.

## Tablet

A *tablet* is a combination of a `mysqld` process and a corresponding `vttablet`
process, usually running on the same machine.

Each tablet is assigned a *tablet type*, which specifies what role it currently
performs.

### Tablet Types

* **master** - A *replica* tablet that happens to currently be the MySQL master
  for its shard.
* **replica** - A MySQL slave that is eligible to be promoted to *master*.
  Conventionally, these are reserved for serving live, user-facing
  requests (like from the website's frontend).
* **rdonly** - A MySQL slave that cannot be promoted to *master*.
  Conventionally, these are used for background processing jobs,
  such as taking backups, dumping data to other systems, heavy
  analytical queries, MapReduce, and resharding.
* **backup** - A tablet that has stopped replication at a consistent snapshot,
  so it can upload a new backup for its shard. After it finishes,
  it will resume replication and return to its previous type.
* **restore** - A tablet that has started up with no data, and is in the process
  of restoring itself from the latest backup. After it finishes,
  it will begin replicating at the GTID position of the backup,
  and become either *replica* or *rdonly*.
* **drained** - A tablet that has been reserved by a Vitess background
  process (such as rdonly tablets for resharding).
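
The conventions above can be summarized in code. The Go enum below is only an
illustration of this list, not the actual Vitess tablet type definition.

```go
// An illustrative summary of the tablet types listed above; not the real
// Vitess type definitions.
package main

import "fmt"

type TabletType int

const (
	Master  TabletType = iota // the current MySQL master for the shard
	Replica                   // slave eligible for promotion; serves live traffic
	RDOnly                    // slave not eligible for promotion; batch/analytics
	Backup                    // replication stopped to take a consistent backup
	Restore                   // starting up empty, restoring from the latest backup
	Drained                   // reserved by a Vitess background process
)

// ServesLiveTraffic reports whether a tablet of this type is conventionally
// used for user-facing queries.
func (t TabletType) ServesLiveTraffic() bool {
	return t == Master || t == Replica
}

func main() {
	fmt.Println(Replica.ServesLiveTraffic()) // true
	fmt.Println(Backup.ServesLiveTraffic())  // false
}
```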

<!-- TODO: Add pointer to complete list of types and explain how to update type? -->

## Keyspace Graph

The *keyspace graph* allows Vitess to decide which set of shards to use for a
given keyspace, cell, and tablet type.

### Partitions

During horizontal resharding (splitting or merging shards), there can be shards
with overlapping key ranges. For example, the source shard of a split may serve
`c0-d0` while its destination shards serve `c0-c8` and `c8-d0` respectively.

Since these shards need to exist simultaneously during the migration,
the keyspace graph maintains a list (called a *partitioning* or just a *partition*)
of shards whose ranges cover all possible keyspace ID values, while being
non-overlapping and contiguous. Shards can be moved in and out of this list to
determine whether they are active.
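
A minimal sketch of that invariant: the ranges in a partition must start at the
beginning of the keyspace ID space, end at its end, and line up exactly with no
gaps or overlaps. The key range values reuse the `c0-c8` / `c8-d0` example
above; the code is illustrative, not the actual Vitess implementation.

```go
// A minimal sketch of the invariant a partition must satisfy: its shards'
// key ranges are contiguous, non-overlapping, and together cover every
// possible keyspace ID.
package main

import (
	"bytes"
	"fmt"
)

// keyRange is a half-open byte interval; an empty Start means "beginning of
// the space" and an empty End means "end of the space".
type keyRange struct {
	Start, End []byte
}

// coversAll reports whether ranges (in order) form a contiguous,
// non-overlapping cover of the whole keyspace ID space.
func coversAll(ranges []keyRange) bool {
	if len(ranges) == 0 {
		return false
	}
	if len(ranges[0].Start) != 0 || len(ranges[len(ranges)-1].End) != 0 {
		return false // must start at the beginning and end at the end
	}
	for i := 1; i < len(ranges); i++ {
		// Each range must begin exactly where the previous one ends.
		if !bytes.Equal(ranges[i-1].End, ranges[i].Start) {
			return false
		}
	}
	return true
}

func main() {
	// During a split, the source shard c0-d0 coexists with its destinations
	// c0-c8 and c8-d0, but only one set is in the active partition at a time.
	before := []keyRange{
		{End: []byte{0xc0}},                      // -c0
		{Start: []byte{0xc0}, End: []byte{0xd0}}, // c0-d0
		{Start: []byte{0xd0}},                    // d0-
	}
	after := []keyRange{
		{End: []byte{0xc0}},                      // -c0
		{Start: []byte{0xc0}, End: []byte{0xc8}}, // c0-c8
		{Start: []byte{0xc8}, End: []byte{0xd0}}, // c8-d0
		{Start: []byte{0xd0}},                    // d0-
	}
	fmt.Println(coversAll(before), coversAll(after)) // true true
}
```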

The keyspace graph stores a separate partitioning for each `(cell, tablet type)` pair.
This allows migrations to proceed in phases: first migrate *rdonly* and
*replica* requests, one cell at a time, and finally migrate *master* requests.

### Served From

During vertical resharding (moving tables out of one keyspace to form a new
keyspace), there can be multiple keyspaces that contain the same table.

Since these multiple copies of the table need to exist simultaneously during
the migration, the keyspace graph supports keyspace redirects, called
`ServedFrom` records. That enables a migration flow like this:

1. Create `new_keyspace` and set its `ServedFrom` to point to `old_keyspace`.
1. Update the app to look for the tables to be moved in `new_keyspace`.
   Vitess will automatically redirect these requests to `old_keyspace`.
1. Perform a vertical split clone to copy data to the new keyspace and start
   filtered replication.
1. Remove the `ServedFrom` redirect to begin actually serving from `new_keyspace`.
1. Drop the now unused copies of the tables from `old_keyspace`.

There can be a different `ServedFrom` record for each `(cell, tablet type)` pair.
This allows migrations to proceed in phases: first migrate *rdonly* and
*replica* requests, one cell at a time, and finally migrate *master* requests.
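
As an illustration (not the real topology schema), the sketch below models
`ServedFrom` as a lookup keyed by `(cell, tablet type)`: combinations that
still point at `old_keyspace` are redirected, while already-migrated
combinations are served by `new_keyspace` directly. The cell names are
placeholders.

```go
// An illustrative sketch of per-(cell, tablet type) ServedFrom redirects,
// mid-way through a vertical split migration.
package main

import "fmt"

type target struct {
	Cell       string
	TabletType string
}

// servedFrom maps a (cell, tablet type) in new_keyspace to the keyspace that
// still actually serves it. A missing entry means new_keyspace serves the
// traffic itself.
var servedFrom = map[target]string{
	{"cell1", "master"}:  "old_keyspace",
	{"cell2", "master"}:  "old_keyspace",
	{"cell2", "replica"}: "old_keyspace",
	// cell1 replica traffic and all rdonly traffic have already been migrated.
}

// resolveKeyspace returns the keyspace that should serve a query sent to
// new_keyspace from the given cell and tablet type.
func resolveKeyspace(cell, tabletType string) string {
	if ks, ok := servedFrom[target{cell, tabletType}]; ok {
		return ks
	}
	return "new_keyspace"
}

func main() {
	fmt.Println(resolveKeyspace("cell1", "rdonly")) // new_keyspace
	fmt.Println(resolveKeyspace("cell2", "master")) // old_keyspace
}
```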

## Replication Graph

The *replication graph* identifies the relationships between master
databases and their respective replicas. During a master failover,
the replication graph enables Vitess to point all existing replicas
to a newly designated master database so that replication can continue.
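
A simplified sketch of the idea: per shard, the graph records a master and the
tablets replicating from it, and a failover repoints the remaining replicas at
the newly designated master. The tablet alias values are placeholders, and the
real graph lives in the topology service rather than in memory.

```go
// A simplified, in-memory illustration of the replication graph idea; not
// the actual Vitess data structures.
package main

import "fmt"

type shardReplication struct {
	Master   string
	Replicas []string
}

// failover promotes one replica to master and repoints the others at it.
// The old master is kept as a replica here for simplicity; in practice it
// may be unreachable.
func (s *shardReplication) failover(newMaster string) {
	replicas := []string{s.Master}
	for _, r := range s.Replicas {
		if r != newMaster {
			replicas = append(replicas, r)
		}
	}
	s.Master = newMaster
	s.Replicas = replicas
}

func main() {
	shard := shardReplication{
		Master:   "cell1-0000000100",
		Replicas: []string{"cell1-0000000101", "cell2-0000000102"},
	}
	shard.failover("cell1-0000000101")
	fmt.Printf("master=%s replicas=%v\n", shard.Master, shard.Replicas)
}
```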

## Topology Service

The *[Topology Service](/user-guide/topology-service.html)*
is a set of backend processes running on different servers.
Those servers store topology data and provide a distributed locking service.

Vitess uses a plug-in system to support various backends for storing topology
data, which are assumed to provide a distributed, consistent key-value store.
By default, our [local example](http://vitess.io/getting-started/local-instance.html)
uses the ZooKeeper plugin, and the [Kubernetes example](http://vitess.io/getting-started/)
uses etcd.
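
The Go interface below sketches the kind of contract such a backend plugin is
expected to satisfy. It is an illustration of the description above, not the
actual Vitess plugin API; the method names and signatures are assumptions.

```go
// Package topoexample sketches the contract a topology backend is expected
// to provide: a consistent key-value store plus distributed locking and
// watches. This is illustrative only, not the real Vitess plugin API.
package topoexample

import "context"

// Backend is the sort of interface a ZooKeeper- or etcd-backed plugin
// would implement.
type Backend interface {
	// Get returns the data stored at path, plus a version usable for
	// compare-and-swap style updates.
	Get(ctx context.Context, path string) (data []byte, version int64, err error)

	// Update writes data at path only if the stored version still matches,
	// providing the consistency the rest of Vitess relies on.
	Update(ctx context.Context, path string, data []byte, version int64) (newVersion int64, err error)

	// Lock acquires a distributed lock on a directory (for example a
	// keyspace) and returns a function that releases it.
	Lock(ctx context.Context, dirPath string) (unlock func(), err error)

	// Watch notifies the caller whenever the data at path changes, which is
	// how serving components notice topology updates.
	Watch(ctx context.Context, path string) (<-chan []byte, error)
}
```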

The topology service exists for several reasons:

* It enables tablets to coordinate among themselves as a cluster.
* It enables Vitess to discover tablets, so it knows where to route queries.
* It stores Vitess configuration provided by the database administrator that is
  needed by many different servers in the cluster, and that must persist between
  server restarts.

A Vitess cluster has one global topology service, and a local topology service
in each cell. Since *cluster* is an overloaded term, and one Vitess cluster is
distinguished from another by the fact that each has its own global topology
service, we refer to each Vitess cluster as a **toposphere**.

### Global Topology

The global topology stores Vitess-wide data that does not change frequently.
Specifically, it contains data about keyspaces and shards as well as the
master tablet alias for each shard.

The global topology is used for some operations, including reparenting and
resharding. By design, the global topology service is not in the critical
serving path and is consulted relatively rarely.

In order to survive any single cell going down, the global topology service
should have nodes in multiple cells, with enough to maintain quorum in the
event of a cell failure.

### Local Topology

Each local topology contains information related to its own cell.
Specifically, it contains data about tablets in the cell, the keyspace graph
for that cell, and the replication graph for that cell.

The local topology service must be available for Vitess to discover tablets
and adjust routing as tablets come and go. However, no calls to the topology
service are made in the critical path of serving a query at steady state.
That means queries are still served during temporary unavailability of the
topology service.

## Cell (Data Center)

A *cell* is a group of servers and network infrastructure collocated in an area,
and isolated from failures in other cells. It is typically either a full data
center or a subset of a data center, sometimes called a *zone* or *availability
zone*. Vitess gracefully handles cell-level failures, such as when a cell is
cut off from the network.

Each cell in a Vitess implementation has a [local topology service](#topology-service),
which is hosted in that cell. The topology service contains most of the
information about the Vitess tablets in its cell.
This enables a cell to be taken down and rebuilt as a unit.

Vitess limits cross-cell traffic for both data and metadata.
While it may be useful to also have the ability to route read traffic to
individual cells, Vitess currently serves reads only from the local cell.
Writes will go cross-cell when necessary, to wherever the master for that shard
resides.