* Add FQDN fields to tablets/gates/vtctlds
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* Add discovery-consul options for gate/vtctld FQDN tmpls
staticfile disco doesn't need this, because it can just read this from
the config. **Note**: the main downside is that the json key is "FQDN"
rather than "fqdn" which 🤷 ?
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* Add FQDN impls to gates/vtctld for consul-disco
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* Propagate FQDN field through the api
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* Move per-cluster gate fetching to .... the cluster struct
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* nit: cleanup double imports
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* Extract repeated template=>buffer execution to a textutil function
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* Add fqdn template support to clusters and `parseTablet`
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* fix typo (ugh)
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* Ensure `New` parses the final tablet fqdn tmpl off the config
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* Fix typos, update comments
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* Cleanup TODOs
Signed-off-by: Andrew Mason <amason@slack-corp.com>
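For the template-to-buffer helper mentioned in the list above, a minimal sketch might look like the following. The names `ExecuteTemplate` and `renderFQDN`, the `Hostname` field, and the template text are assumptions made for illustration; they are not taken from the actual change.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// ExecuteTemplate (hypothetical name) renders tmpl with data into a buffer
// and returns the rendered text.
func ExecuteTemplate(tmpl *template.Template, data interface{}) (string, error) {
	buf := bytes.NewBuffer(nil)
	if err := tmpl.Execute(buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// renderFQDN (hypothetical) parses an FQDN template and fills in a hostname.
func renderFQDN(tmplText, hostname string) (string, error) {
	tmpl, err := template.New("fqdn").Parse(tmplText)
	if err != nil {
		return "", err
	}
	return ExecuteTemplate(tmpl, struct{ Hostname string }{hostname})
}

func main() {
	fqdn, err := renderFQDN("{{ .Hostname }}.example.com", "tablet-100")
	if err != nil {
		panic(err)
	}
	fmt.Println(fqdn)
}
```

Keeping the buffer handling in one function means each caller (gates, vtctlds, tablets) only has to supply a parsed template and its data.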
There are no functional changes in this commit, just a pure port with
the most minor changes to ensure identical behavior (like the callback
logger).
I also updated our `proto` target to assume a standard protoc
installation (which installs a stdlib of well-known types into
/usr/local/include). We can decide if this is okay to make a
requirement, or if we should try to make this more portable in PR
review.
I also updated the local example to use the vtctldclient port, and went
through the full example with no issues.
Signed-off-by: Andrew Mason <amason@slack-corp.com>
I've only added some simple keyspace-related getters for now.
No documentation or tests or anything yet, this is just a
proof-of-concept / example implementation.
Signed-off-by: Andrew Mason <amason@slack-corp.com>
* Added a MasterStatus() method and carried it through the entire gRPC
chain, similar to what we do for SlaveStatus(). Other WIP changes to be
updated after getting approval on the MasterStatus() parts.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Small fixes per review suggestions.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Descriptive comments added per review suggestions.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Added unit tests and small fixes.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Refactored per review suggestion. Under new design, it's not possible to unit test anymore, since we don't have mock connections. Removed unit test because of this.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Fix very odd rebase issue I've never seen before, where rebase didn't show me conflicts in various files that it allowed to continue without fixing known conflicts.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Added new MasterStatus return for DemoteMaster and carried it through the entire call chain.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Remove deprecated field everywhere we possibly can, and ensure compatibility for newer client interacting with older server.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Remove copying of XXX fields, per review suggestion.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Added the option to StopReplicationAndGetStatus() to stop only the IO Thread, and passed it all the way down the call chain to MysqlDaemon, where we can now call a new method (implemented in all flavors) which stops only the io thread, if that's what was requested.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Oops
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Remove hook per review suggestion.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Adjusted stop slave io thread to pass in a ctx because it's a new function. Adjusted StopReplicationAndGetStatus so that it stops the slave before getting slave status. This will ensure that the relay log information is correct.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Add back in logic to bail if slave is already stopped.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* We have to patch these in because after calling stop slave there won't be a master host and master port anymore when we grab slave status. Instead we need to patch in positions so we retain master host and master port, otherwise set master will assume the tablet is the master because it has no master host and master port. In retrospect it's probably a bad idea that we assume no master host and no master port means we've found the master (in set master).
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Refactored per offline discussions. We now return before and after slave status, so we are more explicit, and don't nest business logic into subfields of a hybrid struct.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Fix issues that cropped up after merge conflict
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Change way we get this state to also pull in the Connecting state. We are either running, or attempting to run. Either way we are not not running.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Changed references from slave to replica per review suggestion.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Embed StopReplicationStatus into StopReplicationAndGetStatusResponse and rename fields to make it clear.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Various fixes per review suggestions.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Let's try out issuing a stop no matter what and see what happens.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Adding back in bailouts. They are necessary.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Get rid of more slave references.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Changed stopIOThreadOnly to an enum so we can change the way in which we stop replication in the future.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Add a test to ensure that we can stop the io thread only.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Fix incorrect test methodology.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Used a generated enum from proto per convention for the stop replication mode.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
* Scrub more references of slave without obfuscating MySQL statements that are being called under the hood.
Signed-off-by: Peter Farr <Peter@PrismaPhonic.com>
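As a rough illustration of two items in the list above (the stop-replication mode enum, and treating the Connecting state as "not stopped"), here is a small sketch. The type and constant names are hand-written stand-ins, not the generated proto enum, and the helper names are hypothetical. The MySQL statements and the `Slave_IO_Running` values (`Yes`, `No`, `Connecting`) are standard MySQL.

```go
package main

import "fmt"

// StopReplicationMode is a hand-written stand-in for the generated enum.
type StopReplicationMode int

const (
	// IOAndSQLThread stops replication entirely.
	IOAndSQLThread StopReplicationMode = iota
	// IOThreadOnly stops only the IO thread.
	IOThreadOnly
)

// stopStatement (hypothetical) returns the MySQL statement for a mode.
func stopStatement(mode StopReplicationMode) string {
	if mode == IOThreadOnly {
		return "STOP SLAVE IO_THREAD"
	}
	return "STOP SLAVE"
}

// ioThreadActive (hypothetical) reports whether the IO thread is running or
// attempting to run, based on the Slave_IO_Running column.
func ioThreadActive(slaveIORunning string) bool {
	return slaveIORunning == "Yes" || slaveIORunning == "Connecting"
}

func main() {
	fmt.Println(stopStatement(IOThreadOnly))
	fmt.Println(ioThreadActive("Connecting"))
}
```

An enum rather than a boolean leaves room for additional stop modes later without another signature change.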
* Support set statement with user defined variables
Signed-off-by: Harshit Gangal <harshit@planetscale.com>
* Add needed bind variables
Signed-off-by: Andres Taylor <andres@planetscale.com>
* Changes the session UDV holder to use BindVariable instead of query.Value
Signed-off-by: Andres Taylor <andres@planetscale.com>
* Make sure to allow float values for UDVs
Signed-off-by: Andres Taylor <andres@planetscale.com>
* added udv set end2end
Signed-off-by: Harshit Gangal <harshit@planetscale.com>
* Allow @ in sql identifiers
Signed-off-by: Andres Taylor <andres@planetscale.com>
* Make sure to allow variable names that contain characters that are not letters or digits
Signed-off-by: Andres Taylor <andres@planetscale.com>
* Improved user defined variable testing
Signed-off-by: Andres Taylor <andres@planetscale.com>
* Minor clean ups
Signed-off-by: Andres Taylor <andres@planetscale.com>
* Make sure to go over all variables in the SET statement
Signed-off-by: Andres Taylor <andres@planetscale.com>
Co-authored-by: Harshit Gangal <harshit@planetscale.com>
There was an aspiration that applications would use an event token
to validate how fresh a replica read was. The feature was neither
very usable nor used by anyone.
This has now been deleted.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
The protoc generated code was not matching the goimports standard.
I've added a goimports step after the build.
This caused another breakage because time.proto did not match its
package name of vttime. I've fixed that also.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
* PlannedReparentShard: Fix more known-recoverable problems.
PlannedReparentShard should be able to fix replication as long as all
tablets are reachable and all replication positions are in a
mutually-consistent state.
PRS also no longer trusts that the shard record contains up-to-date
information on the master, because we update that record asynchronously
now. Instead, it looks at MasterTermStartTime values stored in each
master tablet's record, so it makes the same choice of master as
vtgates.
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* PlannedReparentShard: Add -lag_threshold flag.
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* Fix expected error in reparent test.
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* PRS: Add test case for graceful recovery.
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
* PRS: Measure replication progress instead of lag.
Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
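The master-selection rule from the first PlannedReparentShard item above can be sketched as follows. This assumes the rule is "among tablets whose record says master, pick the one with the latest MasterTermStartTime". The struct, the function name, and the use of Unix seconds are simplifications for illustration, not the real topology types.

```go
package main

import "fmt"

// tabletRecord is a simplified, hypothetical view of a tablet record.
type tabletRecord struct {
	Alias               string
	IsMaster            bool
	MasterTermStartTime int64 // Unix seconds; a simplification
}

// currentMaster returns the master-typed tablet with the most recent term
// start time. The bool is false when no tablet claims to be master.
func currentMaster(tablets []tabletRecord) (tabletRecord, bool) {
	var best tabletRecord
	found := false
	for _, t := range tablets {
		if !t.IsMaster {
			continue
		}
		if !found || t.MasterTermStartTime > best.MasterTermStartTime {
			best = t
			found = true
		}
	}
	return best, found
}

func main() {
	tablets := []tabletRecord{
		{Alias: "zone1-100", IsMaster: true, MasterTermStartTime: 1000},
		{Alias: "zone1-101", IsMaster: true, MasterTermStartTime: 2000},
		{Alias: "zone1-102"},
	}
	m, ok := currentMaster(tablets)
	fmt.Println(m.Alias, ok)
}
```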
Added two new shard sessions for commit ordering:
pre and post.
Added API to set the commit order and changed
tx conn to honor it.
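
A minimal sketch of the idea, assuming that "pre" sessions commit before the normal ones and "post" sessions commit after them. All names here (`CommitOrder`, `session`, `Begin`, `CommitSequence`) are hypothetical and only illustrate the ordering, not the real session proto or tx conn code.

```go
package main

import "fmt"

// CommitOrder says which group a newly opened shard session joins.
type CommitOrder int

const (
	Normal CommitOrder = iota
	Pre
	Post
)

// session tracks shard sessions in three groups.
type session struct {
	order             CommitOrder
	pre, normal, post []string
}

// SetCommitOrder selects the group for shard sessions opened from now on.
func (s *session) SetCommitOrder(o CommitOrder) { s.order = o }

// Begin records a new shard session in the currently selected group.
func (s *session) Begin(shard string) {
	switch s.order {
	case Pre:
		s.pre = append(s.pre, shard)
	case Post:
		s.post = append(s.post, shard)
	default:
		s.normal = append(s.normal, shard)
	}
}

// CommitSequence returns shards in commit order: pre, normal, then post.
func (s *session) CommitSequence() []string {
	seq := append([]string{}, s.pre...)
	seq = append(seq, s.normal...)
	return append(seq, s.post...)
}

func main() {
	s := &session{}
	s.SetCommitOrder(Post)
	s.Begin("lookup/-")
	s.SetCommitOrder(Normal)
	s.Begin("user/-80")
	s.SetCommitOrder(Pre)
	s.Begin("audit/-")
	fmt.Println(s.CommitSequence())
}
```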
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
This is the first part of the changes to implement #4790.
This part implements all the management functionality for
routing rules.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
* When auto-commit is on, passDML is on and ExecuteBatch is in a transaction,
there is no need to explicitly create a transaction. We can forward the DML
directly to the database.
* This optimization yielded significantly more throughput in vttablets. We got
around 25-30% improvement. Most of our queries are single point
inserts/updates that already use auto commit when coming from vtgates, so this
improvement is something that we've been wanting to do for a long time.
Signed-off-by: Rafael Chacon <rafael@slack-corp.com>
Although these are only spelling mistakes, they might affect readability.
Co-Authored-By: Nguyen Phuong An <AnNP@vn.fujitsu.com>
Signed-off-by: Kim Bao Long <longkb@vn.fujitsu.com>
The new lag tracking introduces the following changes:
* VStreamer sends its current time along with every event. This
allows for VPlayer to correct for any clock skew that may
exist between the two machines. This results in a more accurate
calculation of lag.
* If there are no events to send for a period of time, VStreamer
sends a heartbeat event. This allows VPlayer to
know for sure that it's still caught up.
* If VPlayer receives no event for its wait period, then it updates
the SecondsBehindMaster stat to indicate that it's actually falling
behind.
The VStreamer timeout for heartbeat is set slightly lower than the
VPlayer idle timeout. This ensures that VPlayer won't time out
exactly when it's about to receive the heartbeat event.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
To make this possible, some things are added:
- The capability to lock all tables on a tablet, to momentarily stop updates
- Once the database is locked, we can create multiple consistent snapshot
transactions that all share the same view of the data
- Adds the capability to have replication move forward to a specific point
in the transaction log
This commit also refactors tabletserver and tx_engine, moving logic of
state transitions into the tx engine.
Signed-off-by: Andres Taylor <antaylor@squareup.com>
Also had to add transmission of field info, which may come
in handy for encoding the values on the player end.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
If this flag is set for a table, we'll treat the column list
as authoritative and expand select *.
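
A small sketch of what this could mean in practice: `select *` is expanded only when the table's column list is marked authoritative. The struct, its field names, and `expandStar` are assumptions for illustration, not the real vschema types.

```go
package main

import "fmt"

// table is a hypothetical, simplified vschema table entry.
type table struct {
	Name                    string
	Columns                 []string
	ColumnListAuthoritative bool
}

// expandStar returns the explicit column list for `select *` when the list
// is authoritative. Otherwise it reports false and the star is left alone.
func expandStar(t table) ([]string, bool) {
	if !t.ColumnListAuthoritative {
		return nil, false
	}
	cols := make([]string, len(t.Columns))
	copy(cols, t.Columns)
	return cols, true
}

func main() {
	user := table{Name: "user", Columns: []string{"id", "name"}, ColumnListAuthoritative: true}
	cols, ok := expandStar(user)
	fmt.Println(cols, ok)
}
```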
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
To enable conveying out of band warnings, add a new proto definition
for a QueryWarning and add a repeated array to the vtgate session.
Signed-off-by: Michael Demmer <mdemmer@slack-corp.com>
MigrateServedTypes has been made idempotent: if it fails in
the middle, you can safely retry the operation. If the operation
has previously succeeded, retrying it will be a no-op (except
for master migration).
For master migration, a new Frozen field has been added to the
tablet control record. This field signifies the point of no
return. If a migrate fails before reaching this state, then
we undo everything and re-enable the source shards. Once we
go past the 'frozen' state, you can only go forward. If there
are failures after the frozen state, the migrate can be safely
retried until successful. Once successful, a retry will return
an error saying that there's no resharding in progress.
The resharding end to end test has been updated to demonstrate
these behaviors.
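
The master-migration behavior described above can be modeled as a tiny state machine. This is a toy sketch: the `migration` type, its fields, and the failure-injection flags are hypothetical and stand in for the real topology updates.

```go
package main

import (
	"errors"
	"fmt"
)

// migration is a toy model of a master migration.
type migration struct {
	SourceEnabled bool
	Frozen        bool // point of no return
	Done          bool
}

// run attempts the migration. The two flags inject a failure before or
// after the frozen state is reached.
func (m *migration) run(failBeforeFreeze, failAfterFreeze bool) error {
	if m.Done {
		return errors.New("no resharding in progress")
	}
	if !m.Frozen {
		m.SourceEnabled = false
		if failBeforeFreeze {
			// Before the freeze: undo everything and re-enable the source.
			m.SourceEnabled = true
			return errors.New("failed before freeze: changes undone")
		}
		m.Frozen = true
	}
	if failAfterFreeze {
		// After the freeze: keep state as is; the caller retries forward.
		return errors.New("failed after freeze: retry to move forward")
	}
	m.Done = true
	return nil
}

func main() {
	m := &migration{SourceEnabled: true}
	fmt.Println(m.run(true, false), m.SourceEnabled, m.Frozen)
	fmt.Println(m.run(false, true), m.SourceEnabled, m.Frozen)
	fmt.Println(m.run(false, false), m.Done)
	fmt.Println(m.run(false, false))
}
```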
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Pinning was an internal feature used for pinning the
dual table to shard 0. Exposing this ability in vschema
allows someone to pin an unsharded table to a specific
shard.
This is still not a full feature because we need to
think about the various use cases like resharding etc.
But it can be used for testing purposes.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
The number of vreplication control functions is too high. I counted
seven at the minimum. So, I'm now trying this new approach: a single
VReplicationExec function that can execute queries. The goal is to
accept SQL statements that access a 'vreplication' table which will
be interpreted by the engine, which will in turn update the
_vt.vreplication table and also accordingly act upon the change,
like start, stop etc.
For now, these queries will only be understood by VReplicationExec.
In the future, we can look at allowing these as part of the
QueryService API.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Added the following fields to vreplication:
workflow: who created and manages the replication
source: source info for the replication (proto)
state: Running, Stopped, etc
message: any message relevant to current state
BinlogSource contains the source info. It currently supports only
keyrange or table list, but can be extended to support
other rules. The information is stored in proto 'String' format
because its keyrange rendering is more readable than JSON.
The current change only populates the fields. Next change will
use the values.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
This change deprecates _vt.blp_checkpoint in favor
of vreplication, which stands for Vitess Replication.
The goal is to make vreplication a standalone sub-module
of vitess that other services can use, including the
resharding workflow.
This first step just changes to using vreplication instead of
blp_checkpoint. The table and APIs will be extended later to perform more
autonomous functions.
Also, split strategy flags were unsafe and untested. So, they
have been deleted. More holistic operations will be added
to better manage resharding and vreplication.
Signed-off-by: Sugu Sougoumarane <ssougou@gmail.com>
Also, simplify the protoc call by removing some intermediate tools and steps involving the python grpcio-tools wrapper.
The motivation for changing the build was that I had trouble getting the old build to work. It might just have been me.
This approach is arguably simpler because it involves fewer tools, no temp files, and no calling "sed" on the output.
Signed-off-by: David Weitzman <dweitzman@pinterest.com>
This makes RefreshConfig match the behavior of ReinitConfig.
Since the initial InitConfig happens in mysqlctld, it's important that
vttablet requests mysqlctld to regenerate config remotely, or else it
may not use the same settings.
This would allow re-using the current StreamHealthCheck to convey the
status of a group of tablets, between vtgate and l2vtgate (which will
be merged into vtgate).
This is the first step towards making VTGate more schema aware.
The VSchema has been extended to include a list of columns and
their types.
For now, this info has to be supplied as part of the VSchema.
In the future, this can be dynamically updated by vtgates
through a new protocol that will allow the vtgates to
subscribe to schema changes from the vttablets.
It's still useful to manually provide this information because
it can be used as fallback in case the auto-update mechanism
fails.
With this added awareness, two new features can be built:
1. If a column contains text, VTGate can use this information
for collating values correctly.
2. Add a strict mode where VTGate will reject a query if
unrecognized column names are used. Also, we can auto-resolve
column names to the correct table if they are unique in a query.
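
The second feature can be sketched as a lookup over the declared columns: a column that appears in exactly one table resolves to it, an unknown column is rejected (strict mode), and a column found in several tables is ambiguous. `resolveColumn` and the map-based schema are assumptions for illustration, not the VSchema API.

```go
package main

import (
	"fmt"
)

// resolveColumn finds the single table that declares col.
// tables maps a table name to its declared column names.
func resolveColumn(col string, tables map[string][]string) (string, error) {
	owner := ""
	matches := 0
	for name, cols := range tables {
		for _, c := range cols {
			if c == col {
				owner = name
				matches++
				break
			}
		}
	}
	switch matches {
	case 0:
		return "", fmt.Errorf("unrecognized column %q", col)
	case 1:
		return owner, nil
	default:
		return "", fmt.Errorf("column %q is ambiguous", col)
	}
}

func main() {
	tables := map[string][]string{
		"user":   {"id", "name"},
		"orders": {"id", "total"},
	}
	fmt.Println(resolveColumn("total", tables))
	fmt.Println(resolveColumn("id", tables))
}
```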
As part of this change, I added some additional wrappers in json2
for protos. This is because the default behavior does not encode
or decode enums in a user-friendly manner.