It contains a Keyspace and Shard validator for now, but it's easy to add
new ones.
In the process:
- Fixing minor bugs in the vtctld2 web UI.
null and undefined are distinct values in TypeScript.
- Ignoring ErrNoNode in FindAllShardsInKeyspace.
It most likely means a shard creation was aborted in the middle, and we
don't want to error everything out when that happens.
- Using the vtgate-suffixed error for Execute.
We are computing the new error anyway, so we might as well return it. We
already do the withSuffix for streaming calls and begin / commit / ...
- Adding tests for vtgate error propagation, to verify the right
error code returned by TabletConn is returned to the caller.
- Replacing some error string parsing with codes.
Canonical error codes are much better to use for these.
- When receiving an error at the scatter conn layer, instead of parsing the
error string, we now look at the canonical error code to figure out whether
we need to roll back the entire transaction.
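The decision described above can be sketched as follows. This is a minimal illustration, not the real scatter conn code: the `ErrorCode` type, constant names, and the set of codes that force a rollback are all assumptions standing in for the actual canonical error codes.

```go
package main

import "fmt"

// ErrorCode stands in for the canonical (vtrpc-style) error codes.
// Names and values here are illustrative only.
type ErrorCode int

const (
	OK ErrorCode = iota
	BadInput
	ResourceExhausted
	NotInTx // e.g. the tablet lost the transaction
)

// mustRollback decides whether an error from one shard forces a rollback
// of the whole distributed transaction, by inspecting the canonical code
// instead of parsing the error string.
func mustRollback(code ErrorCode) bool {
	switch code {
	case ResourceExhausted, NotInTx:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(mustRollback(NotInTx))  // true
	fmt.Println(mustRollback(BadInput)) // false
}
```

Matching on a code is both faster and far less brittle than substring-matching error text, which breaks whenever a server changes its message wording.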
- Changing the canonical error code for MySQL errors DATA_TOO_LONG and
DATA_OUT_OF_RANGE from UNKNOWN to BAD_INPUT.
Note that the Python client doesn't differentiate these two anyway.
Fix for `Access denied; you need (at least one of) the RELOAD
privilege(s) for this operation` during master failover process,
which happens when Orchestrator tries to execute RESET SLAVE.
After https://github.com/youtube/vitess/pull/2138 the notifications sent by
HealthCheck changed so that the very first notification about a tablet
arrives with an empty Target field. But the TabletStatsCache code was reading
the keyspace/shard/type of a tablet from the Target field and then assumed that
this info didn't change later for a tablet with the same alias (until a
notification with Down=false). To fix that, I'm now reading these values from
the Tablet field instead of Target, because Tablet is always populated.
Fixes #2164.
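A minimal sketch of the fix, with stand-in structs instead of the real proto types (field names here are assumptions):

```go
package main

import "fmt"

// Stand-ins for the real proto messages.
type Target struct {
	Keyspace   string
	Shard      string
	TabletType string
}

type Tablet struct {
	Keyspace string
	Shard    string
	Type     string
	Alias    string
}

type TabletStats struct {
	Target *Target // may be empty on the very first notification
	Tablet *Tablet // always populated
}

// cacheKey builds the TabletStatsCache key from the Tablet field, which is
// always populated, instead of the Target field, which can be empty on the
// first update after AddTablet().
func cacheKey(ts *TabletStats) string {
	return fmt.Sprintf("%s/%s/%s", ts.Tablet.Keyspace, ts.Tablet.Shard, ts.Tablet.Type)
}

func main() {
	ts := &TabletStats{
		Target: &Target{}, // empty: first notification for this tablet
		Tablet: &Tablet{Keyspace: "ks", Shard: "-80", Type: "REPLICA", Alias: "cell-0001"},
	}
	fmt.Println(cacheKey(ts)) // ks/-80/REPLICA
}
```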
Adjust comments to show that split query no longer preserves the WHERE
clause. Add the ORDER BY clause to comments.
Adjust LIMIT to be numRowsPerQueryPart-1, 1 for correctness.
Some WHERE clauses are very sparse with respect to the PK, causing some
split queries to scan most or all of the table. The full scan
algorithm is more stable with no WHERE filters.
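To illustrate the LIMIT adjustment: `LIMIT offset, count` with an offset of numRowsPerQueryPart-1 and a count of 1 returns exactly the last row of a part of numRowsPerQueryPart rows, which serves as the next split boundary. The helper below is a sketch with illustrative table and column names, not the actual splitter code.

```go
package main

import "fmt"

// boundaryQuery builds the query that fetches the next split boundary:
// skip numRowsPerQueryPart-1 rows and read exactly one, so the returned
// PK value is the last row of the current query part.
func boundaryQuery(table, pk string, numRowsPerQueryPart int64) string {
	return fmt.Sprintf("SELECT %s FROM %s ORDER BY %s LIMIT %d, 1",
		pk, table, pk, numRowsPerQueryPart-1)
}

func main() {
	fmt.Println(boundaryQuery("t", "id", 1000))
	// SELECT id FROM t ORDER BY id LIMIT 999, 1
}
```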
- Adding workflow creation dialog in vtctld.
- Adding 'Stop' button to workflows in vtctld.
- Fixing exception for redirect keyspaces display.
- Adding skip_start to vtctld.
- Adding the UI side for showing non-running workflows.
- Adding a vtctld HTTP based long-polling mechanism.
- Releasing UI resources when a workflow is closed.
Also fixing a race condition in manager shutdown:
We were using (manager.ctx == nil) as a test to check whether the manager
was stopped. This is not reliable, as manager.ctx is only cleared after the
context cancellation and after taking the manager mutex. In the
meantime, a job could have already noticed its context was canceled and
started its own shutdown, which takes the manager mutex and checks for
manager.ctx == nil as well.
The new logic uses a dedicated 'stopped' flag instead to differentiate the
Manager shutdown from a workflow Stop. It's better documented too.
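The pattern can be sketched as below. This is a simplified model, not the real workflow manager: the method names and return strings are illustrative. The key point is that `stopped` is set under the mutex before any job can observe the cancellation's side effects, so a job holding the mutex always sees a consistent answer.

```go
package main

import (
	"fmt"
	"sync"
)

// Manager sketch: a 'stopped' flag, set under the mutex, distinguishes
// "the whole manager is shutting down" from "one workflow was stopped".
type Manager struct {
	mu      sync.Mutex
	stopped bool
}

// Stop marks the manager as stopped before any cancellation propagates,
// so concurrent jobs can never misread the manager's state.
func (m *Manager) Stop() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.stopped = true
	// ... cancel the context, wait for jobs, etc.
}

// jobDone is what a workflow would call when its context is canceled;
// it must behave differently depending on why the cancellation happened.
func (m *Manager) jobDone(name string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.stopped {
		return "manager shutdown: keep workflow state for restart"
	}
	return "workflow " + name + " stopped individually"
}

func main() {
	m := &Manager{}
	fmt.Println(m.jobDone("w1")) // workflow w1 stopped individually
	m.Stop()
	fmt.Println(m.jobDone("w1")) // manager shutdown: keep workflow state for restart
}
```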
This will make the type look more generic, so that it can be used for other
purposes as well, not only for a worker during resharding. In particular, I plan
to use the same tablet type for schema swap, reserving a seed vttablet to execute
the schema change.
I'm also adding a "drain_reason" tag that will be set by anyone setting the DRAINED
tablet type, so that it's easy to understand why a tablet is drained.
We found a situation where subqueries that used fully qualified (FQ) column
names would cause the V3 code to be confused about their vindexes. For
example, a query like:
select ... from (select t.id from t) as t1...
would end up creating a vindex named "t.id". But a reference
to t1.id would fail to match against "t.id" because the string
comparison would fail.
This PR fixes the issue by creating aliases named "id" along with
fully qualified names like "t.id", which allows any kind of
reference to correctly find the columns referenced.
This behavior still doesn't fully match MySQL's behavior. However,
it comes closer, while still preserving SQL name resolution rules.
The particular use case where we deviate is as follows:
select id as foo from t having id = 1
MySQL allows the above construct. However, we'll fail because you
can reference that expression either as 'foo' or 't.id', but not
as 'id'.
Implementation notes:
The colsym already had two values to match against. I have now changed
the meaning of those:
1. The Alias is the alias of the column. If one is not provided,
then we assign the base name of the column. Previously we used to
assign the FQ name.
2. The ExprName (now renamed to QualifiedName) always contains
the FQ name. This used to have a value only if there was already
an alias provided.
The matching logic has been changed accordingly. Some new errors
have been added to handle possible ambiguities that can now happen
because we match using base names. Otherwise, most things should
work like before.
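The matching rule described above can be sketched like this. The struct and function are simplified stand-ins (the real colsym carries more state), but the two-field matching and the `id`-vs-`foo` behavior from the HAVING example are shown directly:

```go
package main

import "fmt"

// colsym sketch: each result column carries both a short alias and a
// fully qualified name, so references like "id" and "t.id" both resolve.
type colsym struct {
	Alias         string // user-supplied alias, or the base column name
	QualifiedName string // always the FQ name, e.g. "t.id"
}

// resolve returns the index of the matching column, or -1. The real code
// additionally reports an ambiguity error if two columns share a base name.
func resolve(cols []colsym, ref string) int {
	for i, c := range cols {
		if ref == c.Alias || ref == c.QualifiedName {
			return i
		}
	}
	return -1
}

func main() {
	// select t.id from t: Alias defaults to the base name.
	cols := []colsym{{Alias: "id", QualifiedName: "t.id"}}
	fmt.Println(resolve(cols, "id"), resolve(cols, "t.id")) // 0 0

	// select id as foo from t: "id" matches neither "foo" nor "t.id",
	// which is the documented deviation from MySQL in the HAVING case.
	aliased := []colsym{{Alias: "foo", QualifiedName: "t.id"}}
	fmt.Println(resolve(aliased, "id")) // -1
}
```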
This change adds method TopologyWatcher.WaitForInitialTopology() that will wait
until the initial topology data is read and propagated to TabletRecorder through
AddTablet() method. This also adds HealthCheck.WaitForInitialStatsUpdates()
method which will wait until information about all tablets added via AddTablet()
method is propagated to the listener through StatsUpdate() method.
This will allow the resharding and schema swap code to be sure that, after they
create the topology watchers, all tablets have been propagated to
TabletStatsCache (or another health check listener).
The schema swap code is modified here to use these new methods. This code is not
tested (because schema swap code is not fully ready yet), but serves as an
example of how these new methods can be used.
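The intended call order might look like the sketch below. The types here are minimal stand-ins built on `sync.WaitGroup`, not the real `discovery` package implementations; only the synchronization pattern is the point.

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-in for TopologyWatcher: the goroutine models the first full
// topology read that calls AddTablet() for every discovered tablet.
type TopologyWatcher struct{ initial sync.WaitGroup }

func NewTopologyWatcher() *TopologyWatcher {
	tw := &TopologyWatcher{}
	tw.initial.Add(1)
	go func() {
		// ... read topology, call AddTablet() per tablet ...
		tw.initial.Done()
	}()
	return tw
}

// WaitForInitialTopology blocks until the first topology read has been
// forwarded to the TabletRecorder via AddTablet().
func (tw *TopologyWatcher) WaitForInitialTopology() { tw.initial.Wait() }

// Stand-in for HealthCheck with the matching wait method.
type HealthCheck struct{ initialStats sync.WaitGroup }

// WaitForInitialStatsUpdates blocks until every tablet added so far has
// had a StatsUpdate() delivered to the listener.
func (hc *HealthCheck) WaitForInitialStatsUpdates() { hc.initialStats.Wait() }

func main() {
	tw := NewTopologyWatcher()
	hc := &HealthCheck{}
	tw.WaitForInitialTopology()     // all tablets are now in the TabletRecorder
	hc.WaitForInitialStatsUpdates() // and visible in TabletStatsCache
	fmt.Println("safe to start resharding")
}
```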
If there are no backups at all, we assume the shard is being initialized
and allow tablets to start up empty without restoring anything.
If any backups are found, but none can be read successfully either
because they are incomplete or because of transient network errors,
we should not start up empty. The existence of even a partial backup
implies that the shard has already begun accepting data, so it's better
to fail until a complete backup is accessible.
Before restoring, we use checkNoDB() to make sure there isn't any
existing data. There are some internal tables that exist even on an
empty tablet, so we use a whitelist to check only for user data.
Previously, we assumed the user's database would start with "vt_" since
we default to using "vt_{keyspace}". If the `-init_db_name_override`
flag had been used to set a database name that didn't start with "vt_",
it could result in unnecessary restores upon vttablet restarts.
Now we check for the overridden database name instead if one is set.
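The two decisions above (whether to start empty, and what counts as existing user data) can be sketched as follows. Function names, the internal-schema list, and the error variable are illustrative assumptions, not the real restore code:

```go
package main

import (
	"errors"
	"fmt"
)

var errNoUsableBackup = errors.New("backups exist but none are usable")

// startupAction sketches the tablet startup decision:
//   - no backups at all          -> start empty (shard being initialized)
//   - backups exist, none usable -> fail rather than risk losing data
//   - a usable backup            -> restore it
func startupAction(backupCount, usableCount int) (string, error) {
	switch {
	case backupCount == 0:
		return "start empty", nil
	case usableCount == 0:
		return "", errNoUsableBackup
	default:
		return "restore", nil
	}
}

// hasUserData mimics the checkNoDB-style whitelist: internal schemas are
// ignored, and only the configured user database (the "vt_{keyspace}"
// default or the -init_db_name_override value) counts as existing data.
func hasUserData(dbNames []string, initDBName string) bool {
	internal := map[string]bool{
		"information_schema":  true,
		"performance_schema":  true,
		"mysql":               true,
		"_vt":                 true,
	}
	for _, db := range dbNames {
		if !internal[db] && db == initDBName {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(startupAction(0, 0)) // start empty <nil>
	fmt.Println(hasUserData([]string{"mysql", "mydb"}, "mydb")) // true
}
```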
We were actually showing the most recent entry in the deduplicated
history. Since deduplication was changed to ignore replication lag,
the lag reported under "Current status" on the vttablet status page
wasn't actually current.
When a new tablet is added via AddTablet(), HealthCheck currently notifies the
listener about the tablet only once the connection to the tablet is established
successfully and the first health message is received. But the tablet can be
unavailable for a long time, and all this time the listener will not be aware of
the tablet's existence at all. This can hinder proper use of things like
TabletStatsCache. To fix this, I'm changing HealthCheck to send an initial
notification with Serving == false immediately after AddTablet(), and only after
that wait for the first health message from the tablet.
Also, the current implementation of RemoveTablet() just cancels the context in
the healthCheckConn object, which causes the checkConn() function to return
immediately. But that means the listener may not receive the final notification
with Up == false for this tablet. I'm fixing this by making sure that any return
from checkConn() triggers the final notification to the listener about the
tablet being removed from health checking. As a bonus, this also means that
after Close() is called, HealthCheck will notify the listener about all tablets
being removed (Up == false). That's not strictly necessary, but I think it
doesn't hurt to do that.
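The notification lifecycle can be sketched as below. This is a toy model, not the real checkConn: the done channel stands in for the streaming health-check loop and its context, and the struct fields are simplified. The essential shape is the immediate initial notification plus a deferred final one that fires on every return path.

```go
package main

import "fmt"

type TabletStats struct {
	Name    string
	Up      bool
	Serving bool
}

// checkConn sketch: send an initial not-yet-serving notification right
// after AddTablet, and guarantee a final Up=false notification on any
// return path (RemoveTablet, Close, connection loss) via defer.
func checkConn(name string, notify func(TabletStats), done <-chan struct{}) {
	notify(TabletStats{Name: name, Up: true, Serving: false}) // initial
	defer func() {
		notify(TabletStats{Name: name, Up: false, Serving: false}) // final
	}()
	<-done // stand-in for the streaming health-check loop
}

func main() {
	var log []TabletStats
	done := make(chan struct{})
	close(done) // make checkConn return immediately, as RemoveTablet would
	checkConn("cell-0001", func(ts TabletStats) { log = append(log, ts) }, done)
	fmt.Println(log[0].Up, log[0].Serving) // true false
	fmt.Println(log[1].Up)                 // false
}
```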
Refreshing replicas is not necessary because they will do this themselves when they become the master.
The resharding.py test verifies this behavior for the PlannedReparentShard command. For this test, I've manually verified this refresh as well:
old master:
- one refresh triggered by vtworker SplitClone
new master:
- at least one refresh triggered by vtctl PlannedReparentShard
- one refresh triggered by vtctl MigrateServedTypes master
In case of a TabletExternallyReparented event, a refresh should happen as well because rpc_external_reparent.go calls "agent.updateState(...)".
A side-effect of this commit is that the "RefreshState" phase during the clone will finish faster if there are many replicas because they are no longer sequentially refreshed (per destination shard). Now this phase is also less prone to transient errors e.g. if any replica does get restarted.
It turns out we can receive an unrelated PREVIOUS_GTIDS_EVENT when
asking MySQL to stream from a given position. This is confusing our
state machine and has to be ignored.
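A sketch of the fix in the event loop follows. The dispatch function and return strings are illustrative, not the real binlog connection code; the event type values match the MySQL binlog format (PREVIOUS_GTIDS_LOG_EVENT is type 35).

```go
package main

import "fmt"

// Binlog event type codes from the MySQL binlog format.
const (
	eQuery         = 2  // QUERY_EVENT
	eGTID          = 33 // GTID_LOG_EVENT
	ePreviousGTIDs = 35 // PREVIOUS_GTIDS_LOG_EVENT
)

// handleEvent sketches the state-machine fix: a stray PREVIOUS_GTIDS_EVENT
// received while streaming from a given position is skipped instead of
// being fed into the transaction-assembly state machine.
func handleEvent(typ int) string {
	switch typ {
	case ePreviousGTIDs:
		return "ignored" // unrelated to the requested position; skip it
	case eGTID:
		return "begin-gtid"
	case eQuery:
		return "statement"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(handleEvent(ePreviousGTIDs)) // ignored
	fmt.Println(handleEvent(eGTID))          // begin-gtid
}
```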
The flag allowed initializing a tablet with a different keyspace/shard combination and was marked as "use with caution".
Instead of having such a flag, the user should just delete and re-add the tablet.
Removing this feature also allows removing the "UpdateTabletReplicationData()" call in the "InitTablet()" method in wrangler/tablet.go, because the previous "CreateTablet()" call already does this.
1. Changing initialization order in vttablet.
Mostly it's fixing the following bug: the tablet used to go into serving
mode, then go into restore when restoring a backup, then back into
serving. It should not go into serving mode before restoring the backup.
2. Fixing a corner case in CreateTablet.
If topo.CreateTablet succeeded but topo.UpdateShardReplication did not, it
would leave a Tablet without a ShardReplication entry. To self-correct
this, the next time we create the tablet, we now also fix the
ShardReplication anyway.