Some changes of sqltypes were not backward compatible:
- Sometimes we get nil bytes for valid string types.
- bsonrpc and grpc were different for Result.Rows.
If the machine a tablet is on loses contact with the cluster, a human or
automated cluster manager may launch a replacement tablet on another
machine. If the original tablet later regains contact, we need to
prevent its healthcheck from modifying the tablet record, which is now
owned by a new instance.
To do this, we rely on the fact that each tablet will set its IP and
primary port in topology on startup. The IP:port should never change for
the life of a process, and any new process that runs simultaneously with
the old one should always have a different IP:port tuple.
Thus, we say that whichever tablet has its IP:port in the tablet record
is the true owner of that record. The healthcheck of any other tablet is
not allowed to modify the record. Note that this check for ownership
applies ONLY to the healthcheck (which includes going SPARE on
shutdown). All other updates to tablet records are unaffected.
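The ownership rule above can be sketched roughly as follows. This is a minimal illustration, not the actual tablet manager code: TabletRecord, owns_record and healthcheck_update are hypothetical names, and the record is reduced to the fields needed for the check.

```python
from dataclasses import dataclass


@dataclass
class TabletRecord:
    """Hypothetical stand-in for the tablet record stored in topology."""
    ip: str
    port: int
    tablet_type: str


def owns_record(record: TabletRecord, my_ip: str, my_port: int) -> bool:
    """A tablet owns the record only if the record holds its own IP:port."""
    return (record.ip, record.port) == (my_ip, my_port)


def healthcheck_update(record, my_ip, my_port, new_type):
    """Apply a healthcheck change only when this process owns the record."""
    if not owns_record(record, my_ip, my_port):
        # A replacement tablet owns the record now; leave it alone.
        return False
    record.tablet_type = new_type
    return True
```

Only the healthcheck path would go through such a guard; other updates to the record stay unguarded, as described above.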
The new Value implementation is now based on the vitess types.
* The inner interface has now been replaced by typ and val.
* All Values are expected to be consistent with their types.
For example, an Int64 type must contain a number.
* The functions that build values generally ensure consistency.
* There is a set of 'Trusted' functions that can bypass this
consistency check. They should be used with care.
* The proto3 conversion functions build the correct Value types
based on the field types.
* The bson conversion function provides a Repair function that
allows you to fix up the types after the fact. This should be
deleted after bson is deprecated.
* The building of Values from a QueryResult is non-trivial because
the field info is not part of the QueryResult for streaming
queries. So, the API requires fields to be explicitly passed in.
* Functions that encode or convert to native types expect Value
to be consistent. If not, they panic.
* proto3.QueryResult is considered to be trusted. If it contains
inconsistent data, it will cause panics.
* The EventStreamer has been fixed to ensure that the fields and
rows it publishes are trustable: they can be used as parameters
to the Trusted API.
* The Raw() function usage has been minimized. We should see if
it can be deprecated. This way, we can make Result truly read-only.
There are a few more tweaks that need to be done:
* The Proto3ToResult call plumbing was hacked in to make everything
work. That part needs cleaning.
* The bind vars don't need to be converted to their native types
any more.
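The split between checked builders and 'Trusted' functions can be illustrated with a small sketch. It is written in Python for brevity and does not mirror the real sqltypes API: the type tags, build_value, make_trusted and to_native are all hypothetical names, and raising ValueError stands in for the panic described above.

```python
INT64 = "INT64"      # hypothetical type tags, not the real vitess constants
VARCHAR = "VARCHAR"


class Value:
    """Holds a type tag and raw bytes; the constructor performs no checks."""

    def __init__(self, typ, val):
        self.typ = typ
        self.val = val


def build_value(typ, val: bytes) -> Value:
    """Checked builder: rejects bytes that are inconsistent with typ."""
    if typ == INT64:
        int(val.decode("ascii"))  # raises ValueError if val is not a number
    return Value(typ, val)


def make_trusted(typ, val: bytes) -> Value:
    """Trusted builder: skips the check; the caller vouches for the data."""
    return Value(typ, val)


def to_native(v: Value):
    """Conversion assumes consistency and fails loudly otherwise."""
    if v.typ == INT64:
        return int(v.val.decode("ascii"))
    return v.val
```

The point of the sketch is that an inconsistent Value built through the trusted path is only detected later, at conversion time, which is why the trusted path should be reserved for data that is already known to be valid.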
These tests no longer use our test runner and therefore fail when
test.go supplies --skip-build by default.
Specifying an explicit command in the test config overrides the default
assumption that the test would use our custom test runner.
There were two almost identical methods in utils.py and in tabletmanager.py.
For the tablet type I'm using strings instead of the proto constants now because it's easier to read and shorter. The proto function which converts the string into an enum value will still check if the type is valid.
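As a rough illustration of why passing strings stays safe, the sketch below converts a string to an enum value and rejects unknown names. It uses Python's standard enum module as a stand-in for the proto-generated conversion function; the TabletType members and their numeric values are assumptions made for this example.

```python
import enum


class TabletType(enum.Enum):
    # Hypothetical subset of tablet types, for illustration only.
    MASTER = 1
    REPLICA = 2
    RDONLY = 3
    SPARE = 4


def parse_tablet_type(name: str) -> TabletType:
    """Convert a readable string such as 'replica' into an enum value."""
    try:
        return TabletType[name.upper()]
    except KeyError:
        raise ValueError("unknown tablet type: %r" % name)
```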
While porting tests from java_vtgate_test_helper to vtcombo, I found
that streaming queries that fetched more than ~300 rows would indicate
success, but receive fewer rows than expected. The number of rows actually
returned was always in the same ballpark, but would fluctuate from run
to run.
It turned out that since vtcombo returns results in-memory from vttablet
to vtgate, we were returning a *Result once the rows filled up the
buffer size, and then concurrently modifying the already-returned struct
to fill in the next set of rows.
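The failure mode can be reproduced in miniature. The sketch below is a single-threaded analogy in Python, not the vtcombo code: stream_shared keeps mutating the batch it has already handed to the receiver, while stream_fresh starts a new batch after each hand-off. Both function names are hypothetical.

```python
def stream_shared(rows, buffer_size):
    """Buggy pattern: yields the same list object and keeps mutating it."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= buffer_size:
            yield batch
            batch.clear()  # mutates the object the receiver may still hold
    if batch:
        yield batch


def stream_fresh(rows, buffer_size):
    """Safer pattern: hand off each batch and start a new list."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= buffer_size:
            yield batch
            batch = []  # the yielded list is never touched again
    if batch:
        yield batch
```

A receiver that collects the batches from stream_shared before reading them ends up with fewer rows than were sent, which matches the symptom of a successful query with missing rows.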
These blocks tearDown the environment when setUp fails. When the exception does not get re-raised, it becomes effectively swallowed and the test runs despite the failed setUp.
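A minimal sketch of the intended pattern, with hypothetical set_up and tear_down callables standing in for the test module's own functions:

```python
def set_up_module(set_up, tear_down):
    """Run set_up; on failure, clean up and re-raise so the tests do not run."""
    try:
        set_up()
    except Exception:
        tear_down()
        raise  # without this, the failure is swallowed
```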
Track the state of each Zookeeper connection in the "ZkCachedConn" variable instead.
Remove "ZkMetaConn" variable because "ZkCachedConn" has the same information now.
Update tabletmanager.py end-to-end test.
I'm removing states.go because it caused a deadlock in the worker.py test.
Removing it is no loss: the code was originally meant to detect flip-flopping, which is no longer necessary.
The observed deadlock occurred when a) somebody polls /debug/vars while b) we create a new Zookeeper connection and publish a "ZkCachedConn" variable for it.
When the connection is created, the "states" object is created as well. The same call eventually calls expvar.Publish(), and this is where the deadlock occurs. (It's a deadlock between a mutex in the "expvar" package and a mutex in the Zookeeper connection package.)
The test requires that some steps occur before shard_2 comes up,
so shard_2 has a separate setup function. However, the test was trying
to combine teardown of shard_2 with the rest, which was confusing and
resulted in attempting to init mysql twice without tearing it down.
This splits out teardown of shard_2 so it behaves correctly and is
easier to understand.
This gets rid of the opaque mysql-db-dir.tbz archive, replacing it with
a .sql file. The .sql file approach makes it clear what state the DB is
initialized with, and also makes it easy to customize.
The test does a planned reparent, but then later steps assume the old
tablet is still the master. After the reparent, we should update the
test's expectation of who the master is.
SrvKeyspace object doesn't exist. Fixing an issue with zktopo
that was not returning the correct error for a missing SrvKeyspace.
Also fixing some (but not all) pylint errors in the files
I touched.
- vttablet now uses query service to talk to other tablet for health check.
- making all retries and timeouts configurable (using short values in tests).
- doing a single manual health check on source tablets so their health is good.
The Makefile previously listed tests explicitly for groups like
site_test and worker_test. These lists got out of date when tests were
removed from test/config.json, and the make rules broke. Now the groups
are defined in config.json itself, so there is one place to update
everything.
This commit changes the following protocols:
- binlog_player_protocol
- vtctl_client_protocol
The only BSON protocol left is vtgate pending the implementation of the
gRPC vtgate client.
Note that we originally added this change in
https://github.com/youtube/vitess/pull/1230
However, we reverted it because the Kubernetes tutorial and images were
out of sync. Therefore, this commit technically is the revert of the
revert.
Revert "Revert "Change protocol defaults to grpc.""
This reverts commit 5e5f40a04e.
We assume the latest backup is the one that's at the end when the list
of directories is sorted. This assumption is violated if we put the
tablet alias first in the name.
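The sorting assumption is easy to demonstrate. The backup names below are hypothetical and the real directory format may differ; the sketch only shows how putting the alias first changes which entry sorts last.

```python
def latest_backup(names):
    """Assume the latest backup is last when the names are sorted."""
    return sorted(names)[-1]


# Hypothetical names: timestamp first, then tablet alias.
timestamp_first = [
    "2015-10-01.120000.test-0000000200",
    "2015-10-02.120000.test-0000000100",
]

# The same two backups with the tablet alias first.
alias_first = [
    "test-0000000200.2015-10-01.120000",
    "test-0000000100.2015-10-02.120000",
]
```

With timestamp_first the function picks the October 2 backup. With alias_first it picks the October 1 backup, because the higher alias sorts last regardless of the timestamp.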
It's been changed to use vttest, and there is a script
that brings up everything, and tears down at the end.
The one problem with the new scheme is that the querylog
doesn't tell you which tablet the query came from. Something
to think about in the future.
I've written a code generator that converts the python cases
into go. Some massage is needed after the conversion.
The downside is that it's a brainless port. No re-interpretation,
regrouping or gap analysis.
The race detector found a bug in the QueryExecutor. It probably
hasn't failed in production because the racy code paths were always
trying to set the variable to the same value.
- now rebuild serving graph in vtcombo to allow range queries.
- adding vtctl VtGateExecuteKeyspaceIds command.
- added an 's' to vtctl VtGateExecuteShard to be consistent.
- pass in an empty Session to Begin in vtcombo, otherwise panics.
Change the vtgateclienttest/services/callerid.go test to always return
an error when callerid is handled. The error will contain the word
"SUCCESS:" if callerid matches. This is necessary since falling through
breaks the expectations of the caller.
In the connection's _execute_batch call, some routing will be by
keyspace_ids, and some will be by shards. This means that an
_execute_batch may create both an ExecuteBatchKeyspaceIds and an
ExecuteBatchShard call. Merge the results back together.
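One way to picture the split and merge is the sketch below. It is not the actual connection code: by_keyspace_ids and by_shards are hypothetical callables standing in for the two RPCs, and a query is assumed to be a dict that carries a "keyspace_ids" key when it is routed by keyspace id.

```python
def execute_batch(queries, by_keyspace_ids, by_shards):
    """Split a mixed batch into two calls and merge results in input order."""
    kid_idx = [i for i, q in enumerate(queries) if "keyspace_ids" in q]
    shard_idx = [i for i, q in enumerate(queries) if "keyspace_ids" not in q]
    results = [None] * len(queries)
    if kid_idx:
        kid_results = by_keyspace_ids([queries[i] for i in kid_idx])
        for i, result in zip(kid_idx, kid_results):
            results[i] = result
    if shard_idx:
        shard_results = by_shards([queries[i] for i in shard_idx])
        for i, result in zip(shard_idx, shard_results):
            results[i] = result
    return results
```

Remembering the original index of each query lets the merged list keep the order in which the caller submitted the batch.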
Add tests for ExecuteBatchShard in python_client_test.py and
vtgatev2_test.py.
Since BatchVTGateCursor only adds executemany and nextset functionality,
just add that functionality to vtgate_cursor.VTGateCursor. Fix tests and
code that refer to BatchVTGateCursor.
To comply with PEP0249, remove the BatchVTGateCursor execute and flush
sequence and use executemany instead. The first argument to executemany
is sql if all commands share a common sql (this is supposed to be an
"operation" parameter). The params_list will supply sql if the first
argument is None.
Each params in the executemany params_list generates a result set.
Implement the nextset() to walk through the result_set.
Update tests to use the new BatchVTGateCursor API.
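The executemany and nextset behavior described above might look roughly like this. It is a sketch under assumptions, not the BatchVTGateCursor implementation: BatchCursorSketch and run_query are hypothetical, and the per-entry query is assumed to live under a "sql" key when the first argument is None.

```python
class BatchCursorSketch:
    """Hypothetical cursor; run_query stands in for the real execute path."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._result_sets = []
        self._index = 0
        self.results = None

    def executemany(self, sql, params_list):
        """Run one query per params entry, producing one result set each."""
        self._result_sets = []
        for params in params_list:
            params = dict(params)  # do not mutate the caller's dict
            if sql is None:
                query = params.pop("sql")  # assumed key name
            else:
                query = sql
            self._result_sets.append(self._run_query(query, params))
        self._index = 0
        self.results = self._result_sets[0] if self._result_sets else None

    def nextset(self):
        """PEP 249: return a true value if another set exists, else None."""
        if self._index + 1 >= len(self._result_sets):
            return None
        self._index += 1
        self.results = self._result_sets[self._index]
        return True
```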
Always open a new connection for every stream_execute. Reusing an
existing connection may cause problems if you are in a transaction; this
logic is simpler.
Also, change the cursor._get_conn() method into a cursor.connection property.
Only cursor.set_effective_caller_id(effective_caller_id) is now
supported. Sending effective_caller_id as an argument to PEP0249
functions will fail or be ignored.
In base cursor classes, make set_effective_caller_id a separate call
rather than passing effective_caller_id through interfaces (cleanup to
come later).
Remove some try-except log-and-raise wrappers in vtgatev2 test methods,
and put the logging in get_connection instead.
Vtgate connection stream_execute now detaches the BsonRPCClient from the
connection and closes it after the generator exits. The connection will
quietly open a new BsonRPCClient if another execute is called.
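The detach-and-close behavior can be sketched with a generator and a finally clause. This is an illustration only: StreamConnSketch, client_factory and the client's stream and close methods are hypothetical stand-ins for the vtgate connection and BsonRPCClient.

```python
class StreamConnSketch:
    """Hypothetical connection; client_factory stands in for BsonRPCClient."""

    def __init__(self, client_factory):
        self._client_factory = client_factory
        self.client = None

    def _ensure_client(self):
        """Quietly open a new client if the previous one was detached."""
        if self.client is None:
            self.client = self._client_factory()
        return self.client

    def stream_execute(self, sql):
        client = self._ensure_client()
        self.client = None  # detach: the stream owns this client now

        def rows():
            try:
                for row in client.stream(sql):
                    yield row
            finally:
                # Runs once a started generator is exhausted or closed.
                client.close()

        return rows()
```

Because the client is detached before the first row is read, a later execute on the same connection cannot interfere with a stream that is still being consumed.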