# Reference Tables

This document describes the proposed design and implementation guidelines for
the `Reference Tables` Vitess feature.

The idea is to have a `reference keyspace` that contains a small number of
`reference tables`, and to replicate these tables to every shard of another
keyspace, the `destination keyspace`. Any update to the reference tables will be
propagated to the destination keyspace. The reference tables in the destination
keyspace can then be used directly, in read-only mode (in `JOIN` queries for
instance). This provides much better performance than cross-keyspace joins.

Since the data is replicated to every shard of the destination keyspace, the
write QPS on the reference keyspace is also applied to every shard of the
destination keyspace. The change rate in the reference keyspace therefore
cannot be very high; let's also assume the reference keyspace is not sharded.

Vitess already has all the right components to support this scenario; it's just
a matter of plumbing them together the right way. Let's explore the required
changes.

## Replication Setup

We can copy all the data and then set up `Filtered Replication` between the
reference keyspace and each shard of the destination keyspace. This is really
just a corner case of the vertical splits Vitess already supports.

Action items:

* First, this setup probably needs to be explicitly mentioned somewhere in the
  topology, not just as SourceShard objects in the destination keyspace, so
  Vitess can know about this setup at a higher level. Let's add a `repeated
  ReferenceKeyspace` field to the Keyspace object. Each `ReferenceKeyspace`
  object contains the name of the reference keyspace, the list of tables to
  copy, and the UID of the SourceShard entry (the same UID in all shards). By
  making this a repeated field, the destination keyspace should be able to
  support multiple reference keyspaces to copy data from, if necessary. (A
  sketch of this record follows the list.)

* `vtctl CopySchemaShard` can already be used to copy the schema from the
  reference keyspace to each destination shard.

* A new vtworker data copy job needs to be added. `vtworker VerticalSplitClone`
  would be a good start, but the new copy has a few special requirements: the
  destination keyspace needs the data in all its shards, and the write rate
  cannot cause the destination shards to be overloaded (or lag behind on
  replication). This job would also populate an entry in the
  `_vt/blp_checkpoint` table in the destination shards.

* Setting up Filtered Replication after the copy is easy: each destination
  shard just needs to have a SourceShard with the proper data, and after a
  RefreshTablet, the destination masters will start the replication.
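
To make the proposed topology record concrete, here is a minimal sketch of a
`ReferenceKeyspace` entry. In Vitess the Keyspace object is a protobuf message,
so the real change would be a proto field; the Go struct below only illustrates
the proposed shape, and all field names are assumptions.

```go
// ReferenceKeyspace is an illustrative sketch of the proposed topology
// record (assumed names). In Vitess proper this would be a protobuf
// message, added as a repeated field on the Keyspace object.
type ReferenceKeyspace struct {
	// Keyspace is the name of the (unsharded) reference keyspace the
	// data is copied from.
	Keyspace string

	// Tables lists the reference tables replicated into every shard of
	// the destination keyspace.
	Tables []string

	// Uid is the UID of the matching SourceShard entry. The same UID is
	// used in every destination shard, so it is effectively reserved and
	// must not be reused by other Filtered Replication streams.
	Uid uint32
}
```
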
All these steps can be supported by a vtctld workflow.

## Supporting Horizontal Resharding in the Destination Keyspace

We still need to support horizontal resharding in the destination keyspace
while the Reference Tables feature is enabled.

Action items:

* Each step of the process would know what to do because of the
  `ReferenceKeyspace` entries in the destination keyspace.

* `vtctl CopySchemaShard` needs to also copy the schema of the reference tables.

* `vtworker SplitClone` needs to also copy all of the reference table data, and
  the `_vt/blp_checkpoint` entry for the reference keyspace. It needs to do
  that copy from the first source shard to each destination shard only once. So
  in case of a split, the source shard data is copied to each destination
  shard. In case of a merge, only the first source shard data is copied to the
  destination shard.

* Enabling filtered replication on the destination shards needs to not use the
  same UIDs for replication as the reference keyspace entries. Right now, these
  UIDs are hardcoded to start at 0. But since the reserved UIDs are documented
  in the `ReferenceKeyspace` entries, picking non-conflicting ones is easy (see
  the sketch after this list).

* At this point, the destination shards will also replicate from the reference
  keyspace. When the `vtctl MigrateServedType master` command is issued, it
  needs to just remove the horizontal resharding Filtered Replication entries,
  not the `ReferenceKeyspace` entries.
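
As a minimal sketch of that UID selection, assuming the illustrative
`ReferenceKeyspace` type from the earlier sketch (the helper name is
hypothetical):

```go
// nextSourceShardUid is a hypothetical helper for the horizontal
// resharding setup: instead of hardcoding SourceShard UIDs to start at 0,
// it returns the lowest UID not reserved by a ReferenceKeyspace entry.
func nextSourceShardUid(refs []ReferenceKeyspace) uint32 {
	reserved := make(map[uint32]bool)
	for _, rk := range refs {
		reserved[rk.Uid] = true
	}
	uid := uint32(0)
	for reserved[uid] {
		uid++
	}
	return uid
}
```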

## Other Use Cases

Other scenarios might also need to be supported, or explicitly disabled:

* Simple schema changes, or a complicated Schema Swap, in the reference
  keyspace: they would also need to be applied to the destination keyspace, the
  same way.

* Vertical split of the reference keyspace: since it is replicated, splitting
  it will be more complicated.

## Query Routing

This would be handled by vtgate and the VSchema. Once the reference tables are
documented in the VSchema, vtgate will know to do the following (a routing
sketch follows this list):

* DMLs on the reference tables are routed to the reference keyspace.

* Select queries that touch only reference tables are also routed to the
  reference keyspace.

* JOIN queries between reference tables and destination keyspace tables can be
  routed to the destination keyspace only (to the right shards, based on that
  keyspace's sharding situation).
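
A minimal sketch of this routing decision, assuming a hypothetical helper that
classifies the tables a query touches; none of these names are the actual
vtgate API.

```go
// Hypothetical keyspace names, for illustration only.
const (
	referenceKeyspace   = "ref_keyspace"
	destinationKeyspace = "dest_keyspace"
)

// routeQuery sketches the three rules above. tables lists the tables the
// query touches; refTables marks which names are reference tables, as
// declared in the VSchema.
func routeQuery(tables []string, refTables map[string]bool) string {
	for _, t := range tables {
		if !refTables[t] {
			// At least one destination table is involved: run the whole
			// query (JOINs included) in the destination keyspace, using
			// its local read-only copies of the reference tables.
			return destinationKeyspace
		}
	}
	// DMLs and selects touching only reference tables go to the
	// reference keyspace itself.
	return referenceKeyspace
}
```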

Note this introduces some corner cases. For instance, the client may ask for a
JOIN between reference tables and destination keyspace tables, with tablet type
`master`. Routing this to the destination keyspace would satisfy the critical
read for the destination tables, but not for the reference tables (their local
copies can lag behind the reference master). vtgate may need to perform the
JOIN across both masters at this point.

Action Items:

* Find the right way to represent reference tables in the VSchema (one possible
  shape is sketched after this list).

* Implement the corresponding query routing.
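
As one possible shape, here is a hypothetical per-table VSchema record; these
field names are assumptions, not the actual VSchema format.

```go
// vschemaTable is a hypothetical per-table VSchema record, extended for
// reference tables; these fields are assumptions, not the real format.
type vschemaTable struct {
	Name string

	// IsReference marks the table as a reference table, telling vtgate
	// that a read-only copy exists in every shard of this keyspace.
	IsReference bool

	// SourceKeyspace names the reference keyspace that owns the table;
	// DMLs on the table are routed there.
	SourceKeyspace string
}
```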

## Notes

### Vitess Keyspace vs MySQL Database

This may force us to revisit the use of databases in our tablets. The current
assumption is that a keyspace only has one MySQL database (with a name usually
derived from the keyspace name with a `vt_` prefix, but that can also be
changed):

* When vttablet connects to MySQL for data queries, it uses that database name
  by default.

* The VSchema also maps tables to keyspaces, so it can just send queries that
  have no keyspace to the right shard (which in turn is configured properly for
  that database).

* Vitess' Filtered Replication only replicates data related to that single
  database. The database name has to be the same when we horizontally split a
  keyspace, so statements from the source shards can be applied on the
  destination shards.

* Vitess' Query Service only loads the schema for that single database.

Maybe it's time to change this assumption:

* A keyspace could be defined as a group of databases, each having a group of
  tables.

* When addressing a table, we could support the `keyspace.database.table`
  syntax (a parsing sketch follows this list).

* We could support moving databases from one keyspace to another.
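
A minimal, purely illustrative sketch of resolving that three-part syntax:

```go
import "strings"

// splitTableName sketches resolution of the proposed
// keyspace.database.table syntax; purely illustrative.
func splitTableName(name string) (keyspace, database, table string) {
	switch parts := strings.SplitN(name, ".", 3); len(parts) {
	case 3:
		return parts[0], parts[1], parts[2]
	case 2:
		// keyspace.table: today's behavior, the database is the
		// keyspace's single default database.
		return parts[0], "", parts[1]
	default:
		// Bare table name: resolved through the VSchema as today.
		return "", "", parts[0]
	}
}
```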

But maybe this is too many indirections for nothing? Saying one keyspace is one
database may be just the level of complexity we need.