Commit Graph

10982 Commits

Author SHA1 Message Date
Alain Jobart 7bbe0f8217 Merge pull request #2161 from alainjobart/vtctld
Adding a Topology Validator workflow.
2016-10-20 08:38:08 -07:00
Alain Jobart 50a6f36e0f Adding Topo Validator Workflow.
It contains a Keyspace and Shard validator for now, but it's easy to add
new ones.

In the process:
- Fixing minor bugs in vtctld2 web ui.
null and undefined are different in TS.
- Ignoring ErrNoNode in FindAllShardsInKeyspace.
It most likely means a shard creation was aborted in the middle, and we
don't want to error everything out when that happens.
2016-10-20 08:01:05 -07:00
Alain Jobart 9f800fe4ad Merge pull request #2165 from alainjobart/vtgate
Vtgate errors.
2016-10-20 07:12:53 -07:00
Alain Jobart 51d0f91439 A few fixes to the error handling in vtgate.
- Using vtgate-suffixed error for execute.
We are computing the new error anyway, might as well return it. We
already do the withSuffix for streaming calls and begin / commit / ...

- Adding tests for vtgate error propagation, to verify the right
error code returned by TabletConn is returned to the caller.

- Replacing some error string parsing with codes.
Canonical error codes are much better to use for these.

- When receiving an error at scatter conn layer, instead of parsing the
error string, we now look at the canonical error code, to figure out if
we need to rollback the entire transaction.

- Changing the canonical error code for MySQL errors DATA_TOO_LONG and
DATA_OUT_OF_RANGE from UNKNOWN to BAD_INPUT.
Note the python client doesn't differentiate these two anyway.
2016-10-20 07:11:15 -07:00
Martin Fris e855c40083 RELOAD privilege added for orchestrator mysql user (#2167)
Fix for `Access denied; you need (at least one of) the RELOAD
privilege(s) for this operation` during master failover process,
which happens when Orchestrator tries to execute RESET SLAVE.
2016-10-19 17:41:10 -07:00
Pavel Ivanov c90357f1b6 Fix TabletStatsCache to properly cache topology info. (#2166)
After https://github.com/youtube/vitess/pull/2138 the notifications by
HealthCheck have changed so that the very first notification about tablet
happens with empty Target field. But TabletStatsCache code was reading info
about keyspace/shard/type of a tablet from Target field and then assumed that
the info didn't change later for a tablet with the same alias (until
notification with Down=false). To fix that I'm reading these values from Tablet
field instead of Target because Tablet is always populated.

Fixes #2164.
2016-10-19 14:15:23 -07:00
Anthony Yeh d7245784a1 pullapprove: Add erzel-bot 2016-10-19 11:01:53 -07:00
dumbunny 6e76e4f498 Merge pull request #2157 from dumbunny/fullscan
Remove WHERE filters from the full-scan queries.
2016-10-19 10:37:02 -07:00
dumbunny 470cbfea94 Merge pull request #2159 from dumbunny/entity
Make resolve.buildEntityIds use list bind vars.
2016-10-18 13:44:38 -07:00
Dean Yasuda 98cdfda569 Revert the numRowsPerQueryPart-1 change.
Each non-initial query includes the prev boundary row.
2016-10-18 13:04:02 -07:00
Dean Yasuda de0528c2ee Fix bad sentence in full_scan_algorithm comment. 2016-10-18 11:44:23 -07:00
Dean Yasuda 24a38cc202 Fix full_scan_algorithm comments.
Adjust comments to show that split query no longer preserves WHERE
clause. Add ORDER BY clause to comments.

Adjust LIMIT to be numRowsPerQueryPart-1, 1 for correctness.
2016-10-18 11:38:44 -07:00
sougou 92c1dca28f 2pc (#2153)
2pc: export grpc functions

* govet issue: call cancel funcs from contexts.
* Export 2PC functions as gRPC methods
2016-10-18 09:50:14 -07:00
Dean Yasuda aded9623f5 Make resolve.buildEntityIds use list bind vars.
Previous behavior used individual bind vars, which bloats the
original_sql and bind vars map.
2016-10-18 06:58:21 -07:00
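The difference between individual and list bind variables in commit aded9623f5 can be illustrated with a minimal sketch. It assumes Vitess's `:name` and `::name` placeholder syntax for scalar and list bind variables. The function names and the `_list` suffix are hypothetical and are not taken from `resolve.buildEntityIds`.

```python
from typing import Dict, List, Tuple


def build_with_individual_vars(column: str, ids: List[int]) -> Tuple[str, Dict[str, object]]:
    """One bind variable per ID: the SQL text and the map both grow with len(ids)."""
    names = [f"{column}{i}" for i in range(len(ids))]
    placeholders = ", ".join(":" + name for name in names)
    return f"{column} in ({placeholders})", dict(zip(names, ids))


def build_with_list_var(column: str, ids: List[int]) -> Tuple[str, Dict[str, object]]:
    """A single list bind variable: the SQL text has the same size for any number of IDs."""
    name = f"{column}_list"  # hypothetical naming scheme
    return f"{column} in ::{name}", {name: list(ids)}
```

With three IDs, the first form produces three placeholders and three map entries, while the second produces one placeholder and one map entry holding the whole list.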
Dean Yasuda 4e46b98f08 Remove WHERE filters from the full-scan queries.
Some WHERE clauses are very sparse WRT the PK, causing some
split queries to scan most or all of the table. The full scan
algorithm is more stable with no WHERE filters.
2016-10-17 17:41:27 -07:00
Alain Jobart 354fcd2c35 Merge pull request #2149 from alainjobart/plus
Various vtctld UI workflow improvements.
2016-10-17 15:38:43 -07:00
Anthony Yeh e811b26405 vtgateclienttest: Test Unicode in query echo. (#2155)
In languages with native support, insert Unicode and encode to UTF-8.
In others, directly insert UTF-8. Check that the echo is unmangled.
2016-10-17 15:35:16 -07:00
Alain Jobart 07a5a980ab Many changes to the workflow UI.
- Adding workflow creation dialog in vtctld.
- Adding 'Stop' button to workflows in vtctld.
- Fixing exception for redirect keyspaces display.
- Adding skip_start to vtctld.
- Adding UI side for showing non-running workflows.
- Adding a vtctld HTTP based long-polling mechanism.
- Releasing UI resources for workflow closure.

Also fixing a race condition in manager shutdown:
We were using (manager.ctx == nil) as a test to check out if the manager
was stopped. This is not reliable, as manager.ctx is updated after the
context cancelation and after taking the manager mutex. In the
meantime, a job could have also realized its context was canceled and
started its shutdown, which takes the manager mutex and checks for
manager.ctx == nil as well.
The new logic uses a new 'stopped' flag instead to differentiate the
Manager shutdown from a workflow Stop. It's better documented too.
2016-10-17 14:22:23 -07:00
Pavel Ivanov 3c90125150 Regenerate proto files to rename WORKER tablet type to DRAINED. 2016-10-14 15:09:50 -07:00
Pavel Ivanov 0135d7650a Rename WORKER tablet type into DRAINED
This will make the type look more generic, so that it could be used for other
purposes as well, not only for a worker during resharding. In particular I plan
to use the same tablet type for schema swap reserving a seed vttablet to execute
the schema change.
I also add a "drain_reason" tag that will be set by anyone setting the DRAINED
tablet type, so that it's easy to understand why it is drained.
2016-10-14 15:09:50 -07:00
sougou b320fcd548 v3: Improve symbol resolution for subqueries (#2147)
We found a situation where subqueries that used FQ column names
would cause V3 code to be confused about their vindexes. For
example, a query like:
select ... from (select t.id from t) as t1...
would end up creating a vindex named "t.id". But a reference
to t1.id would fail to match against "t.id" because the string
comparison would fail.

This PR fixes the issue by creating aliases named "id" along with
fully qualified names as "t.id", which would allow any kind of
reference to correctly find the columns referenced.

This behavior still doesn't fully match MySQL's behavior. However,
it comes closer, while still preserving SQL name resolution rules.
The particular use case where we deviate is as follows:
select id as foo from t having id = 1
MySQL allows the above construct. However, we'll fail because you
can reference that expression either as 'foo' or 't.id', but not
as 'id'.

Implementation notes:
The colsym already had two values to match against. I have now changed
the meaning of those:
1. The Alias is the alias of the column. If one is not provided,
then we assign the base name of the column. Previously we used to
assign the FQ name.
2. The ExprName (now renamed to QualifiedName) always contains
the FQ name. This used to have a value only if there was already
an alias provided.

The matching logic has been changed accordingly. Some new errors
have been added to handle possible ambiguities that can now happen
because we match using base names. Otherwise, most things should
work like before.
2016-10-14 01:15:36 -07:00
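The matching rules described in commit b320fcd548 can be sketched roughly as follows. This is a simplified illustration in Python, not the actual Go implementation: `ColSym`, `make_colsym`, `find_column`, and `AmbiguousSymbolError` are hypothetical names, and the sketch assumes a qualified reference is compared against the qualified name while a bare reference is compared against the alias.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColSym:
    alias: str           # explicit alias, or the base column name if none was given
    qualified_name: str  # always the fully qualified name, e.g. "t.id"


class AmbiguousSymbolError(Exception):
    """Raised when a bare name matches more than one column."""


def make_colsym(table: str, column: str, alias: Optional[str] = None) -> ColSym:
    # Without an explicit alias, the base name (not the FQ name) becomes the alias.
    return ColSym(alias=alias or column, qualified_name=f"{table}.{column}")


def find_column(symbols: List[ColSym], ref: str) -> Optional[ColSym]:
    """Qualified references match qualified_name; bare references match alias."""
    if "." in ref:
        matches = [s for s in symbols if s.qualified_name == ref]
    else:
        matches = [s for s in symbols if s.alias == ref]
    if len(matches) > 1:
        raise AmbiguousSymbolError(ref)
    return matches[0] if matches else None
```

Under these assumptions, a column selected as `t.id` inside a subquery is reachable as `id` from the outer query, while `select id as foo from t` exposes the column as `foo` or `t.id` but not as `id`, which is the deviation from MySQL noted in the message.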
Pavel Ivanov a16674a81f Allow to wait for all tablets to be propagated to TabletStatsCache (#2144)
This change adds method TopologyWatcher.WaitForInitialTopology() that will wait
until the initial topology data is read and propagated to TabletRecorder through
AddTablet() method. This also adds HealthCheck.WaitForInitialStatsUpdates()
method which will wait until information about all tablets added via AddTablet()
method is propagated to the listener through StatsUpdate() method.

This will allow resharding and schema swap code to be sure that after they
created the topology watchers all tablets have been propagated to
TabletStatsCache (or other health check listener).

The schema swap code is modified here to use these new methods. This code is not
tested (because schema swap code is not fully ready yet), but serves as an
example of how these new methods can be used.
2016-10-13 17:18:09 -07:00
Pavel Ivanov 890a717db0 Migrate deprecated usage of getMutable<Map> to map accessor. (#2152) 2016-10-13 15:54:51 -07:00
Pavel Ivanov 324fb4dff8 Regenerate php proto files. (#2150)
query.proto was changed at https://github.com/youtube/vitess/pull/2132 which
needs to go with these two new php files.
2016-10-13 15:52:25 -07:00
Amit Khare 2db3f4dbb3 enabling usage of dba password for mysql and mysqladmin (#2148) 2016-10-13 14:22:41 -07:00
Anthony Yeh 14d2320ab0 restore: Don't start up empty if backups exist. (#2146)
If there are no backups at all, we assume the shard is being initialized
and allow tablets to start up empty without restoring anything.

If any backups are found, but none can be read successfully either
because they are incomplete or because of transient network errors,
we should not start up empty. The existence of even a partial backup
implies that the shard has already begun accepting data, so it's better
to fail until a complete backup is accessible.
2016-10-13 11:26:43 -07:00
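The startup rule from commit 14d2320ab0 reduces to a three-way decision, sketched below. The names `RestoreDecision`, `decide_restore`, and `is_readable` are hypothetical; the real vttablet code is written in Go and is not reproduced here.

```python
from enum import Enum
from typing import Callable, List


class RestoreDecision(Enum):
    START_EMPTY = "start_empty"  # no backups at all: shard is being initialized
    RESTORE = "restore"          # at least one backup can be read
    FAIL = "fail"                # backups exist, but none is readable right now


def decide_restore(backups: List[str], is_readable: Callable[[str], bool]) -> RestoreDecision:
    """Decide what a tablet should do at startup, given the backups found."""
    if not backups:
        return RestoreDecision.START_EMPTY
    if any(is_readable(backup) for backup in backups):
        return RestoreDecision.RESTORE
    # Even a partial backup implies the shard already holds data, so refuse to start empty.
    return RestoreDecision.FAIL
```

For example, `decide_restore(["partial"], lambda b: False)` yields `RestoreDecision.FAIL`, whereas an empty backup list yields `RestoreDecision.START_EMPTY`.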
Anthony Yeh c18b009e40 Check for overridden database name before restoring. (#2143)
Before restoring, we use checkNoDB() to make sure there isn't any
existing data. There are some internal tables that exist even on an
empty tablet, so we use a whitelist to check only for user data.

Previously, we assumed the user's database would start with "vt_" since
we default to using "vt_{keyspace}". If the `-init_db_name_override`
flag had been used to set a database name that didn't start with "vt_",
it could result in unnecessary restores upon vttablet restarts.

Now we check for the overridden database name instead if one is set.
2016-10-12 16:54:43 -07:00
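A simplified sketch of the check changed in commit c18b009e40 might look like the following. It only compares database names and ignores whatever per-table inspection the real `checkNoDB()` performs; `has_user_database` and its parameters are hypothetical names.

```python
from typing import Iterable


def has_user_database(databases: Iterable[str], db_name_override: str = "") -> bool:
    """Return True if the tablet appears to hold a user database already."""
    names = list(databases)
    if db_name_override:
        # New behavior: look for the overridden name when one is set.
        return db_name_override in names
    # Default behavior: user databases are named "vt_{keyspace}".
    return any(name.startswith("vt_") for name in names)
```

With databases `["mysql", "_vt", "commerce"]`, the prefix check alone reports no user data, which is how an unnecessary restore could be triggered; passing `db_name_override="commerce"` makes the check find the existing database.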
thompsonja 81cc95efa9 Merge pull request #2141 from youtube/webdriver
Merge both webdriver tests. Since vtcombo currently can serve both
2016-10-12 13:34:37 -07:00
Anthony Yeh b71e295c2d vttablet: Show true current health status. (#2140)
We were actually showing the most recent entry in the deduplicated
history. Since deduplication was changed to ignore replication lag,
the lag reported under "Current status" on the vttablet status page
wasn't actually current.
2016-10-12 11:55:38 -07:00
Pavel Ivanov 2a8a464ff3 Fix HealthCheck notifications about Up/Down tablets. (#2138)
When a new tablet is added via AddTablet() HealthCheck currently will notify
listener about the tablet only when the connection to the tablet is established
successfully and when first health message is received. But the tablet can be
unavailable for a long time, and all this time the listener will not be aware of
the tablet existence at all. This may hinder a proper usage of things like
TabletStatsCache. To fix this I'm changing HealthCheck to send initial
notification with Serving == false immediately after AddTablet(), and only after
that wait for the first health message from the tablet.

Also the current implementation of RemoveTablet() just cancels context in the
healthCheckConn object which causes checkConn() function to return immediately.
But that means that listener may not receive the final notification with
Up == false for this tablet. I'm fixing this by making sure that any return
from checkConn() will trigger the final notification to the listener about the
tablet being removed from health checking. As a bonus, this also makes it so
that after Close() is called HealthCheck will notify listener about all tablets
being removed (Up == false). That's not strictly necessary, but I think it
doesn't hurt to do that.
2016-10-12 08:58:13 -07:00
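The notification ordering described in commit 2a8a464ff3 can be sketched as below. This is a synchronous simplification with hypothetical names (`HealthCheckSketch`, `TabletStats`, `health_stream`). It assumes the initial notification carries Up == true together with Serving == false, and it models the connection as an iterable of serving states rather than a real streaming RPC.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class TabletStats:
    alias: str
    up: bool
    serving: bool


class HealthCheckSketch:
    def __init__(self, listener: Callable[[TabletStats], None]) -> None:
        self.listener = listener

    def add_tablet(self, alias: str, health_stream: Iterable[bool]) -> None:
        # Tell the listener about the tablet immediately, before any health
        # message arrives, so an unreachable tablet is still known.
        self.listener(TabletStats(alias, up=True, serving=False))
        self._check_conn(alias, health_stream)

    def _check_conn(self, alias: str, health_stream: Iterable[bool]) -> None:
        try:
            for serving in health_stream:
                self.listener(TabletStats(alias, up=True, serving=serving))
        finally:
            # Any return path (normal end, cancellation, error) sends the
            # final notification with up=False.
            self.listener(TabletStats(alias, up=False, serving=False))
```

Placing the final notification in a `finally` clause mirrors the idea that every return from `checkConn()` triggers it, so the listener always sees a matching pair of first and last events for each tablet.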
Alain Jobart 6f43d290ad Merge pull request #2136 from alainjobart/updatestream
Fixing update_stream.py for MySQL 5.6.
2016-10-11 15:54:07 -07:00
Michael Berlin 5a8829a1ad Merge pull request #2134 from michael-berlin/vtworker_less_refreshstate_calls
worker: Run RefreshState only on the destination master to start filtered replication.
2016-10-11 15:46:00 -07:00
Anthony Yeh 2c0b915a1e vtctld: Add panic handler for HTTP API. (#2139) 2016-10-11 15:39:20 -07:00
Anthony Yeh ee5ea9e3d8 gcsbackupstorage: Deprecate unused project name flag. (#2137)
The GCS API no longer requires us to specify the project name for any of
the calls we use.
2016-10-11 14:29:23 -07:00
Joshua Thompson fd43f2450c Merge both webdriver tests. Since vtcombo currently can serve both at
the same time, simply perform both tests at once.
2016-10-11 14:06:42 -07:00
Alain Jobart f3efd16500 Fixing update_stream.py for MySQL 5.6.
Also fixing a flakiness, a test was reading a value that may not be
final. Now we re-read it every time.
2016-10-11 13:32:04 -07:00
Michael Berlin c190c2011c worker: Run RefreshState only on the destination master to start filtered replication.
Refreshing replicas is not necessary because they will do this themselves when they become the master.

The resharding.py test verifies this behavior for the PlannedReparentShard command. For this test, I've manually verified this refresh as well:

old master:
- one refresh triggered by vtworker SplitClone

new master:
- at least one refresh triggered by vtctl PlannedReparentShard
- one refresh triggered by vtctl MigrateServedTypes master

In case of a TabletExternallyReparented event, a refresh should happen as well because rpc_external_reparent.go calls "agent.updateState(...)".

A side-effect of this commit is that the "RefreshState" phase during the clone will finish faster if there are many replicas because they are no longer sequentially refreshed (per destination shard). Now this phase is also less prone to transient errors e.g. if any replica does get restarted.
2016-10-10 19:13:51 -07:00
Michael Berlin 7c04dbe797 Unify usage of concurrency.AllErrorRecorder.
Reference stored instance by value instead of by pointer.
2016-10-10 18:57:04 -07:00
Alain Jobart 7aae4f09d6 Merge pull request #2133 from alainjobart/mysql56fix
Fixing resharding.py test for MySQL 5.6
2016-10-10 14:34:21 -07:00
sougou 370444e5e9 2pc: MM functionality (#2132)
* New proto var TransactionMetadata.
* Use 0 value to signify no timeout.
* Core MM functionality. A few more will be added later.
2016-10-10 13:09:07 -07:00
Michael Berlin 9a968ee3ed Merge pull request #2130 from michael-berlin/wrangler_remove_override
vtctl: InitTablet: Deprecate "--allow_different_shard" flag.
2016-10-10 10:27:44 -07:00
Alain Jobart fe07e207aa Fixing resharding.py test for MySQL 5.6
It turns out we can receive an unrelated PREVIOUS_GTIDS_EVENT when
asking MySQL to stream from a given position. This is confusing our
state machine and has to be ignored.
2016-10-10 10:23:20 -07:00
Michael Berlin ef653bf4d2 vtctl: InitTablet: Deprecate "--allow_different_shard" flag.
The flag allowed initializing a tablet with a different keyspace/shard combination and was marked as "use with caution".

Instead of having such a flag, the user should just delete and re-add the tablet.

Removing this feature also allows removing the "UpdateTabletReplicationData()" call in the "InitTablet()" method in wrangler/tablet.go because the previous "CreateTablet()" call already does this.
2016-10-07 18:51:49 -07:00
Alain Jobart ee30d0ebd1 Merge pull request #2129 from michael-berlin/rename_unit_test
topo: Rename unit test to include "_test" suffix.
2016-10-07 18:44:38 -07:00
Michael Berlin c428070363 topo: Rename unit test to include "_test" suffix. 2016-10-07 18:27:26 -07:00
Michael Berlin bc331e56ff grpc: Clarify when we can remove the macOS workaround again. 2016-10-07 17:11:32 -07:00
Michael Berlin 3d9ec1c3fe Merge pull request #2128 from Rastusik/osx_sierra_protobuf_fix
Fix for OSX Sierra build of Protobuf bundled with Grpc
2016-10-07 17:08:35 -07:00
Alain Jobart 626352618b Merge pull request #2126 from alainjobart/tabletmanager
Tabletmanager fixes.
2016-10-07 16:59:10 -07:00
Alain Jobart 9ec14f5316 Fixing two problems in TabletManager.
1. Changing initialization order in vttablet.
Mostly it's fixing the following bug: the tablet used to go into serving
mode, then go into restore when restoring a backup, then back into
serving. It should not go into serving mode before restoring the backup.

2. Fixing a corner case in CreateTablet.
If topo.CreateTablet worked, but not topo.UpdateShardReplication, then
it would leave a Tablet without a ShardReplication. To self-correct
this, next time we create the tablet, we now also fix the
ShardReplication anyway.
2016-10-07 16:58:11 -07:00
Alain Jobart a815a5ffc4 Merge pull request #2127 from alainjobart/rebuild
Rebuilding keyspaces in parallel.
2016-10-07 16:31:57 -07:00