1. Add two flags in vtctld: schemaChangeDir and schemaChangeController.
schemaChangeDir specifies a parent directory that contains schema
changes for all keyspaces.
schemaChangeController controls how to get schema change SQLs from
schemaChangeDir and how to handle the relevant schema change events.
2. Add RegisterControllerFactory in schemamanager, which allows developers
to register different Controller implementations (see the sketch after this list).
3. Add a test case for when a schema change fails on some shards.
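A minimal sketch of what such a factory registry could look like; the
ControllerFactory signature and helper names here are assumptions, not the
exact schemamanager API:

    package schemamanager

    import "fmt"

    // Controller is the combined interface sketched further below; a
    // placeholder declaration keeps this snippet self-contained.
    type Controller interface {
        Open() error
        Close()
    }

    // ControllerFactory builds a Controller from free-form parameters,
    // e.g. the value of the schemaChangeDir flag.
    type ControllerFactory func(params map[string]string) (Controller, error)

    var controllerFactories = make(map[string]ControllerFactory)

    // RegisterControllerFactory registers a named factory. Registering the
    // same name twice is a programming error, so it panics.
    func RegisterControllerFactory(name string, factory ControllerFactory) {
        if _, ok := controllerFactories[name]; ok {
            panic(fmt.Sprintf("controller factory %v is already registered", name))
        }
        controllerFactories[name] = factory
    }

    // GetControllerFactory looks up a factory by name, e.g. the value of
    // the schemaChangeController flag.
    func GetControllerFactory(name string) (ControllerFactory, error) {
        factory, ok := controllerFactories[name]
        if !ok {
            return nil, fmt.Errorf("unknown controller factory: %v", name)
        }
        return factory, nil
    }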
The idea is that sometimes a Controller implementation will take care of all
keyspaces. In such a case, schemamanager.Executor cannot know the keyspace
ahead of time.
The DataSourcer and EventHandler interfaces are fine on their own, but a concrete
EventHandler implementation often needs information that only exists in the DataSourcer.
For example, a DataSourcer reads schema changes from a file system and an EventHandler
wants to move those files around in response to different schema change events.
That information exchange is hard to do with two separate interfaces, and this
level of abstraction introduces more boilerplate code.
This change combines the two interfaces into a single one, schemamanager.Controller,
which removes one level of abstraction and makes the code cleaner.
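A rough sketch of what the combined interface could look like, with
illustrative method names rather than the exact schemamanager API:

    package schemamanager

    // Controller merges the responsibilities of the former DataSourcer and
    // EventHandler: it supplies the schema change SQLs and is also notified
    // about what happened to them, so a file-based implementation can, for
    // example, move a change file into a "complete" or "error" directory in
    // response to the events it observes.
    type Controller interface {
        // Open prepares the underlying source, e.g. scans schemaChangeDir.
        Open() error
        // Read returns the schema change SQL statements to apply.
        Read() ([]string, error)
        // Close releases any resources held by the controller.
        Close()

        // Event callbacks; a concrete implementation can use them to move
        // files around, send notifications, and so on.
        OnReadSuccess() error
        OnReadFail(err error) error
        OnValidationSuccess() error
        OnValidationFail(err error) error
        OnExecutionComplete(sqls []string, err error) error
    }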
The automation framework makes it possible to automate cluster operations that
require a series of manual steps, e.g. resharding.
A Cluster Operation has a list of task containers which are processed
sequentially. Each task container can contain one or more tasks which
will be executed in parallel.
Here's an example of a cluster operation with two task containers. The
second task container has two tasks:
- step 1
- step 2a | step 2b
If the task container contains one task, the task can emit new task
containers which will be inserted after the current task container. This
mechanism is used to fully expand Cluster Operations through special tasks
which emit new task containers, e.g. "ReshardingTask".
This patchset implements the minimal steps to automate "resharding", while the
task implementations for "vtctl" and "vtworker" are still missing.
These will be added later, in separate commits.
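A minimal sketch of the data model described above; the type and field names
are assumptions, not the real automation framework API:

    package automation

    // Task is a single unit of work, e.g. a future vtctl or vtworker call.
    type Task struct {
        Name       string
        Parameters map[string]string
    }

    // TaskContainer groups tasks that are executed in parallel.
    type TaskContainer struct {
        ParallelTasks []*Task
    }

    // ClusterOperation is a list of task containers processed sequentially.
    type ClusterOperation struct {
        SerialTasks []*TaskContainer
    }

    // exampleOperation builds the example from above: one container with
    // "step 1", then a container whose tasks "step 2a" and "step 2b" run
    // in parallel. A special task such as "ReshardingTask" would append
    // further containers right after its own container.
    func exampleOperation() *ClusterOperation {
        return &ClusterOperation{
            SerialTasks: []*TaskContainer{
                {ParallelTasks: []*Task{{Name: "step 1"}}},
                {ParallelTasks: []*Task{{Name: "step 2a"}, {Name: "step 2b"}}},
            },
        }
    }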
1. Add a SplitColumn field to SplitQuery so that the caller can hint to the SplitQuery
endpoint which column is best for splitting the query.
2. The split column must be indexed; this is verified on the server side (sketched below).
3. Coding style fixes suggested by golint.
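A hedged sketch of the server-side check from item 2; validateSplitColumn and
the index representation are illustrative, not the real SplitQuery code:

    package main

    import "fmt"

    // validateSplitColumn returns an error unless splitColumn appears in at
    // least one of the table's indexes.
    func validateSplitColumn(splitColumn string, tableIndexes [][]string) error {
        if splitColumn == "" {
            // No hint was given; the server falls back to its default choice.
            return nil
        }
        for _, index := range tableIndexes {
            for _, col := range index {
                if col == splitColumn {
                    return nil
                }
            }
        }
        return fmt.Errorf("split column %q is not indexed", splitColumn)
    }

    func main() {
        indexes := [][]string{{"id"}, {"user_id", "created_at"}}
        fmt.Println(validateSplitColumn("user_id", indexes)) // <nil>
        fmt.Println(validateSplitColumn("name", indexes))    // error
    }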
- The worker error is remembered and displayed by the framework.
- Adding a StatusWorker that holds the common code for all workers
  that maintain a state protected by a mutex.
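A small sketch of the StatusWorker idea, shared code for workers that keep
their state behind a mutex; the type and method names are illustrative:

    package worker

    import "sync"

    // StatusWorker holds the state that every worker maintains.
    type StatusWorker struct {
        mu    sync.Mutex
        state string
        err   error
    }

    // SetState records the worker's current state under the mutex.
    func (sw *StatusWorker) SetState(state string) {
        sw.mu.Lock()
        defer sw.mu.Unlock()
        sw.state = state
    }

    // RecordError remembers the error so the framework can display it.
    func (sw *StatusWorker) RecordError(err error) {
        sw.mu.Lock()
        defer sw.mu.Unlock()
        sw.err = err
    }

    // StatusAsText returns the state (and error, if any) for display.
    func (sw *StatusWorker) StatusAsText() string {
        sw.mu.Lock()
        defer sw.mu.Unlock()
        if sw.err != nil {
            return sw.state + ": " + sw.err.Error()
        }
        return sw.state
    }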
The new integration test only checks that the backup appears to work,
nothing serious yet. A couple more things need to be refactored to make
it all much easier to test with unit tests.
1. Detect big schema changes based on table rows. A schema change is
considered big if 1) it alters more than 100,000 rows, or 2) it changes a
table with more than 2,000,000 rows (see the sketch after this list).
2. Add unit tests for TabletExecutor and also improve existing test cases.
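A hedged sketch of the "big change" heuristic from item 1; the real
TabletExecutor may obtain the row counts differently:

    package main

    import "fmt"

    const (
        maxAffectedRows  = 100000  // ALTERs touching more rows are "big"
        maxTableRowCount = 2000000 // changes to larger tables are "big"
    )

    // isBigSchemaChange flags a DDL as big if it alters more than 100,000
    // rows or targets a table with more than 2,000,000 rows.
    func isBigSchemaChange(affectedRows, tableRowCount int) bool {
        return affectedRows > maxAffectedRows || tableRowCount > maxTableRowCount
    }

    func main() {
        fmt.Println(isBigSchemaChange(50000, 500000))  // false
        fmt.Println(isBigSchemaChange(150000, 500000)) // true: too many altered rows
        fmt.Println(isBigSchemaChange(1000, 3000000))  // true: table too large
    }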
1. Make DiffSchema compare table views.
2. Add schema diffs in schemamanager. Each schema change has to change a
table structure, and schemamanager rejects a SQL statement that does not
change any table definition.
EmergencyReparent used to stop replication on all slaves; SetMaster would
then see that replication was off and not restart it.
Fixed unit and integration tests to cover this case.
The MysqlDaemon fake now has only one place to store the replication status.
API changes:
- SlaveStatus returns a myproto.ReplicationStatus (not a pointer any more)
- StopReplicationAndGetPosition is changed to StopReplicationAndGetStatus
- SetMaster has an extra forceStartSlave boolean.
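A hedged sketch of the changed method shapes; the surrounding interface and
parameter lists are simplified stand-ins, not the exact tabletmanager/mysqlctl
signatures:

    package sketch

    // ReplicationStatus stands in for myproto.ReplicationStatus.
    type ReplicationStatus struct {
        SlaveIORunning  bool
        SlaveSQLRunning bool
        Position        string
    }

    type replicationAPI interface {
        // SlaveStatus now returns a value, not a pointer.
        SlaveStatus() (ReplicationStatus, error)

        // StopReplicationAndGetStatus replaces StopReplicationAndGetPosition,
        // returning the full status instead of just the position.
        StopReplicationAndGetStatus() (ReplicationStatus, error)

        // SetMaster gains a forceStartSlave flag so a reparent can restart
        // replication even on slaves that had it stopped.
        SetMaster(masterAddr string, forceStartSlave bool) error
    }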
Remove the ReloadSchema, ValidateSchemaShard, ValidateSchemaKeyspace,
PreflightSchema, ApplySchemaShard and ApplySchemaKeyspace commands
from vtctl. Those endpoints are still present in the wrangler package
and will be removed once autoschema is mature.
1. Stop applying schema changes through VtGate; use tabletmanager instead.
2. Update the Executor API so it no longer accepts a list of shards as input. Schema
changes need to apply to all shards.
3. Fix topo in SimpleDataSourcer.
A USE database statement is executed if the caller specifies a dbname. However,
sometimes the given query is trying to create that very database, and an error
will be returned when executing the USE statement. It is okay to ignore this
error because the next ExecuteFetch will run the query, and if the database
really is missing it will fail there as well.
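A hedged sketch of that behaviour; conn and ExecuteFetch stand in for the dba
connection used by the real code:

    package sketch

    // queryResult stands in for the real query result type.
    type queryResult struct{}

    // fetcher is the subset of a dba connection this sketch needs.
    type fetcher interface {
        ExecuteFetch(query string, maxRows int, wantFields bool) (*queryResult, error)
    }

    // executeWithOptionalDb switches to dbName (if given) before running the
    // query, deliberately ignoring any error from the USE statement.
    func executeWithOptionalDb(conn fetcher, dbName, query string) (*queryResult, error) {
        if dbName != "" {
            // The database may not exist yet (the query itself may be the
            // CREATE DATABASE); if it is missing for any other reason, the
            // query below fails anyway, so the USE error can be dropped.
            conn.ExecuteFetch("USE "+dbName, 1, false)
        }
        return conn.ExecuteFetch(query, 10000, true)
    }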
1. Add an ExecuteFetchAsDba API in the tabletmanager server.
2. Rename the existing ExecuteFetch to ExecuteFetchAsApp.
3. ExecuteFetchAsDba creates a dba connection on demand and takes care
of enabling/disabling the binlog and reloading the schema (see the sketch after this list).
4. Add a GetDbaConnection func to the MysqlDaemon interface.
5. Make sure the fakesqldb package always stores queries in lower case.
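A hedged sketch of the flow in item 3; the helper types and the exact SQL used
to toggle binlogging are assumptions, not the real tabletmanager implementation:

    package sketch

    type dbaConn interface {
        ExecuteFetch(query string, maxRows int, wantFields bool) (result string, err error)
        Close()
    }

    // mysqlDaemon carries the new GetDbaConnection func from item 4.
    type mysqlDaemon interface {
        GetDbaConnection() (dbaConn, error)
    }

    type agent struct {
        mysqld       mysqlDaemon
        reloadSchema func() // placeholder for the tablet's schema reload hook
    }

    // ExecuteFetchAsDba opens a dba connection on demand, optionally disables
    // binlogging for the statement, runs it, and reloads the schema if asked.
    func (a *agent) ExecuteFetchAsDba(query string, disableBinlogs, wantSchemaReload bool) (string, error) {
        conn, err := a.mysqld.GetDbaConnection()
        if err != nil {
            return "", err
        }
        defer conn.Close()

        if disableBinlogs {
            if _, err := conn.ExecuteFetch("SET sql_log_bin = 0", 0, false); err != nil {
                return "", err
            }
        }
        result, err := conn.ExecuteFetch(query, 10000, true)
        if disableBinlogs {
            // Re-enable binlogging even if the query itself failed.
            conn.ExecuteFetch("SET sql_log_bin = 1", 0, false)
        }
        if err == nil && wantSchemaReload {
            a.reloadSchema()
        }
        return result, err
    }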
Why do they add 1 to the end value when storing and encoding intervals?
I don't know, but it was maddeningly unexpected. I'll never get those
hours of my life back...
1. Add EnableAutoCommit in queryservice, false by default.
2. If EnableAutoCommit is true, a DML outside a transaction will be accepted
by queryservice, which will start a transaction to execute the DML.
3. Turn on this flag in integration tests to make existing tests pass.
If they were replicating, we stop / set master / start; if they were
not, we just set master. A side effect is that ReparentTablet may not
start replication if it was stopped, but it's better that way.
In some flavors, it's possible that StripChecksum will be unable to
determine which bytes to strip. It should return an error in that case
to avoid the possibility of interpreting the data incorrectly.
The test was failing in a Docker image whose timezone setting was
different: it was using time.Unix(), which attaches local timezone info
after converting.
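A small sketch of the issue: time.Unix attaches the machine's local zone, so
formatted output differs between environments; converting to UTC keeps the
expected string stable:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        sec := int64(1400000000)

        local := time.Unix(sec, 0)        // rendered in the machine's local zone
        stable := time.Unix(sec, 0).UTC() // rendered the same everywhere

        fmt.Println(local.Format(time.RFC3339))
        fmt.Println(stable.Format(time.RFC3339)) // always 2014-05-13T16:53:20Z
    }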
For a single DML that does not run inside a transaction, queryservice
will wrap it in a Begin and a Commit. If the query plan is not
recognized, Rollback will be called.
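A hedged sketch of that wrapping; the txEngine interface and helper are
illustrative, not the real SqlQuery code:

    package sketch

    import "fmt"

    type txEngine interface {
        Begin() (txID int64, err error)
        Commit(txID int64) error
        Rollback(txID int64) error
    }

    // execAutoCommitDML wraps a single out-of-transaction DML in Begin/Commit,
    // rolling back if executing the statement (e.g. resolving its plan) fails.
    func execAutoCommitDML(tx txEngine, run func(txID int64) error) error {
        txID, err := tx.Begin()
        if err != nil {
            return err
        }
        if err := run(txID); err != nil {
            if rbErr := tx.Rollback(txID); rbErr != nil {
                return fmt.Errorf("exec failed: %v (rollback also failed: %v)", err, rbErr)
            }
            return err
        }
        return tx.Commit(txID)
    }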
reloadSchema is false by default. If it is set to true, ExecuteFetchAsDba
will reload the tablet schema once it finishes successfully. In addition, this
change also makes CopySchemaShard trigger a schema reload.
The idea is that users can apply schema changes (DDLs) via the schema manager tab.
The current code uses vtgate to do the schema change; this will be replaced by
an executor that applies the change directly to all vttablets.
Go's select will randomly pick a case if more than one of the communications can
proceed. In querylogz_test.go, the timeout is set to zero and then a
message is inserted into the channel. This sometimes triggers the timeout, so
the message is not retrieved and rendered.
1. Double-check the timeout condition when there is a message in the channel (sketched below).
2. Add a large timeout in the unit test to avoid running into the race condition.
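A hedged sketch of the double check from item 1; drainOne is an illustrative
helper, not the querylogz code itself:

    package sketch

    import "time"

    // drainOne returns a queued message if one is available, preferring the
    // message even when the timeout fires at the same moment.
    func drainOne(ch chan string, timeout time.Duration) (string, bool) {
        select {
        case msg := <-ch:
            return msg, true
        case <-time.After(timeout):
            // select picks a ready case at random, so the timeout can win even
            // when a message is already queued; check the channel once more.
            select {
            case msg := <-ch:
                return msg, true
            default:
                return "", false
            }
        }
    }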
codex_test.go uses the createTableInfo func to create a TableInfo instance
for testing; however, it uses a map to store the columns and then iterates
over that map to add the columns to the TableInfo instance. This is not
deterministic because Go's map iteration order is not guaranteed to match
the insertion order.
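A hedged sketch of one fix: sort the map keys so the columns are added in a
fixed order (the column type here is illustrative):

    package sketch

    import "sort"

    type column struct {
        Name string
        Type int
    }

    // columnsInOrder turns a map of columns into a deterministic slice.
    func columnsInOrder(cols map[string]column) []column {
        names := make([]string, 0, len(cols))
        for name := range cols {
            names = append(names, name)
        }
        sort.Strings(names) // fixed order, independent of map iteration
        ordered := make([]column, 0, len(cols))
        for _, name := range names {
            ordered = append(ordered, cols[name])
        }
        return ordered
    }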
1. remove the global stats variables defined in query_engine.go.
2. introduce a QueryServiceStats struct as a holder of these stats.
3. use dependency injection to pass the QueryServiceStats instance from
QueryEngine down to ConnPool, DBConn, TxPool, etc.
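A hedged sketch of the injection shape; the counters use expvar here just to
keep the example self-contained, and the constructor names are assumptions:

    package sketch

    import "expvar"

    // QueryServiceStats holds counters that used to be package-level globals.
    type QueryServiceStats struct {
        QueryCount *expvar.Int
        ErrorCount *expvar.Int
    }

    func NewQueryServiceStats() *QueryServiceStats {
        return &QueryServiceStats{
            QueryCount: new(expvar.Int),
            ErrorCount: new(expvar.Int),
        }
    }

    // ConnPool receives the stats holder from QueryEngine instead of reaching
    // for globals, which keeps unit tests isolated from each other.
    type ConnPool struct {
        stats *QueryServiceStats
    }

    func NewConnPool(stats *QueryServiceStats) *ConnPool {
        return &ConnPool{stats: stats}
    }

    // QueryEngine owns the stats and pushes the instance down to its parts.
    type QueryEngine struct {
        stats *QueryServiceStats
        pool  *ConnPool
    }

    func NewQueryEngine() *QueryEngine {
        stats := NewQueryServiceStats()
        return &QueryEngine{
            stats: stats,
            pool:  NewConnPool(stats),
        }
    }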
1. queryservice should not publish stats if the flag EnablePublishStats is
disabled or the var name is empty.
2. stats.NewInt should not call Publish if the name is empty.
3. mysqlctl.NewMysqld publishes dba stats only if dbaName is not empty.
4. mysqlctl.NewMysqld publishes app stats only if appName is not empty.
5. QueryEngine publishes stats only if flag EnablePublishStats is enabled.
6. RowCacheInvalidator publishes stats only if flag EnablePublishStats is enabled.
CachePool launches a memcache process; if that fails, it sleeps 100 ms
before the next attempt. This slows down the unit tests, since a fake memcache
service is guaranteed to succeed.
go/streamlog/streamlog.go starts a separate goroutine for each new logger instance.
querylogz_test.go calls logger.Send(logStats) and then assumes this logStats has
been delivered. It then calls querylogzHandler and verifies the result.
However, it is not guaranteed that the message has been delivered after the Send call,
because Send just puts the message into the dataQueue and only the goroutine delivers
it to each consumer.
We don't STOP SLAVE, RESET SLAVE in InitSlave any more.
We don't use BreakSlaves on the new master in InitMaster
any more; we just use a couple of SQL statements to init the binlogs.
Under load, Begin occasionally gets called just when the context
is about to expire. In such cases, the query gets killed as soon
as it's issued. However, the Begin call itself almost never fails
because it's too fast. So this just results in the connection
getting closed, which isn't seen until the next statement executes,
and results in a confusing 2006 error.
So, we give ourselves a 10ms grace period to make sure that
Begin has enough time to succeed. This way, we won't see as many
confusing error messages.
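A hedged sketch of the grace-period idea; the 10ms constant mirrors the text,
but the helper name and the way the deadline is applied are assumptions:

    package sketch

    import (
        "context"
        "time"
    )

    const beginGracePeriod = 10 * time.Millisecond

    // beginTimeout returns how long a Begin is allowed to run: the time left
    // on the context, but never less than the grace period, so a Begin issued
    // right before expiry is not killed the instant it goes out.
    func beginTimeout(ctx context.Context) time.Duration {
        deadline, ok := ctx.Deadline()
        if !ok {
            // No deadline on the context: let Begin run without a timeout.
            return 0
        }
        remaining := time.Until(deadline)
        if remaining < beginGracePeriod {
            return beginGracePeriod
        }
        return remaining
    }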
There's a race condition between context and resource pool.
If a context is already expired, there's still a chance that
a Get will succeed because both channels are ready to communicate.
This bug also masked another bug in dbconnpool and some other
bugs in our tests.
All fixed now.
As part of removing googleBinlogEvent parsing support, I reworked the
binlog_streamer_test to use fake events instead of real ones. The actual
parsing of real events is already tested in its own unit test. Using
fake events simplifies the binlog_streamer_test, which really is only
meant to test the logic for converting a stream of events into a stream
of transaction objects.
1. rename logzcss.go to logz_utils.go and move two util funcs from txlogz.go into it.
2. let querylogzHandler accept a streamlog.StreamLogger instance instead of always
using SqlQueryLogger.
3. add two params, timeout and limit, to /querylogz; this allows one to control how many
queries the querylogz page shows.
1. add newQueryServiceConfig, newDBConfigs and newMysqld to testutils_test.go.
2. group these funcs under the testUtils struct for better code readability.
transaction.go only has a single func, Commit; it might make more sense
to move Commit to QueryEngine since it uses the QueryEngine's tx pool
and schemaInfo.
This change effectively brings sqlquery.go test coverage to 99%.
1. add "queryCalled" in fakesqldb.DB so that tests could know number of
times a query hits a database.
2. add testUtils struct in testutils_test and this struct contains common
test funcs that could be used in tabletserver package.
3. add sqlquery tests to test SqlQuery.allowQueries failure modes.
4. add sqlquery tests to test SqlQuery.checkMySQL failure modes.
5. add sqlquery tests to test transaction failure modes.
6. add sqlquery tests to test SqlQuery.ExecuteBatch failure modes.
7. add sqlquery tests to test SqlQuery.SplitQuery failure modes.
1. use panic in the query rule info when registering an already-registered query rule.
2. fix a bug in query_rules.go so that an error is returned if "op" is < QR_IN or > QR_NOTIN.
3. add more test cases to cover query rule failure cases.
4. fix a comment so the returned error is "want bool for OnMismatch" when OnMismatch is not a boolean (instead of "want bool for OnAbsent").
1. Refactor stream_queryz.go, move registerStreamQueryzHandlers to queryctl.go.
2. Add two funcs: streamQueryzHandler and streamQueryzTerminateHandler.
1. Rename streamlogger.go to sqlstats_stats.go since this matches the file content better.
2. Rename the QUERY_SOURCE_* constants to use camel case as suggested by golint.
3. Create a testutils_test.go to contain common test utilities that can be shared among
all queryservice unit tests.