Our Kubernetes instructions use "go get" to compile vtctlclient.
Since our generated gRPC services are no longer compatible with the latest gRPC version, we need to update the vendored gRPC dependency.
See: https://github.com/youtube/vitess/issues/1811#issuecomment-258740349
I ran the following command to update the dependency:
$ govendor fetch google.golang.org/grpc/...@=v1.0.4
A prepared commit is not supposed to fail. But if it does, we have
to make a best effort to alert on it. This PR makes the necessary
changes:
* If a commit fails, that dtid is added to a list of failed commits.
A subsequent attempt to commit the same dtid will fail again. This
way, someone who retries won't accidentally get a succeeded message.
* A failed commit will also mark the transaction as failed in the redo
log. If a reparent happens, such dtids are repopulated into
the failed list. This will ensure that the info about the failed
commit survives reparents.
* A failed commit will also increment InternalErrors, which can be
tied to an alert.
* The marking of a transaction as failed could itself fail. At this
point, we've already raised an alert. So, there's not much we can
do beyond logging the failure.
* In the prep pool implementation, the failed list is called 'reserved'
because it also holds transactions while they're being committed.
This will prevent two racing commits from stepping on each other.
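The 'reserved' behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual prep pool code: the type preparedPool, its methods, and the error values are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Hypothetical error values for this sketch.
var (
	errCommitting  = errors.New("dtid is already being committed")
	errNotPrepared = errors.New("dtid is not prepared")
	errCommitIO    = errors.New("simulated commit failure")
)

// preparedPool is a simplified, hypothetical stand-in for the prep pool.
type preparedPool struct {
	mu       sync.Mutex
	prepared map[string]bool
	// reserved holds dtids that are being committed (nil value)
	// or whose commit has failed (non-nil value).
	reserved map[string]error
}

func newPreparedPool() *preparedPool {
	return &preparedPool{
		prepared: make(map[string]bool),
		reserved: make(map[string]error),
	}
}

// Prepare registers dtid as prepared.
func (p *preparedPool) Prepare(dtid string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.prepared[dtid] = true
}

// FetchForCommit reserves dtid for a commit attempt. It refuses dtids
// that are in flight or that have already failed.
func (p *preparedPool) FetchForCommit(dtid string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if err, ok := p.reserved[dtid]; ok {
		if err != nil {
			return fmt.Errorf("dtid %s failed earlier: %w", dtid, err)
		}
		return errCommitting
	}
	if !p.prepared[dtid] {
		return errNotPrepared
	}
	delete(p.prepared, dtid)
	p.reserved[dtid] = nil
	return nil
}

// SetFailed keeps dtid in the reserved list so that retries fail again.
func (p *preparedPool) SetFailed(dtid string, err error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.reserved[dtid] = err
}

// Forget releases dtid after a successful commit.
func (p *preparedPool) Forget(dtid string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.reserved, dtid)
}

func main() {
	p := newPreparedPool()
	p.Prepare("dtid1")
	fmt.Println(p.FetchForCommit("dtid1")) // first attempt reserves the dtid
	p.SetFailed("dtid1", errCommitIO)
	fmt.Println(p.FetchForCommit("dtid1")) // the retry is rejected
}
```

Keeping in-flight and failed dtids in one map means a single lookup covers both the race between two commits and the retry after a failure.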
We'll switch serving vitess.io:
- before: from the gh-pages branch
- now: from the docs directory out of the master branch
This way it is much easier to update the website:
- no switching between branches required
- no untracked files get in our way in the gh-pages branch
- generated pages can be written in place and do not have to be moved around
- website can be updated and published in the same pull request
For example, http://vitess.io/doc/TopologyService/ was published but
never reachable from any navigation entry.
No longer publishing /doc/ pages will also avoid duplication, e.g.
- http://vitess.io/doc/Reparenting/ vs.
- http://vitess.io/user-guide/reparenting.html
Instead, to publish a page, an explicit *.md file must be created in /vitess.io and it must be linked in the navigation file vitess.io/_includes/left-nav-menu.html.
I did this for TopologyService.md and created vitess.io/user-guide/topology-service.md because it is a relevant page which is linked from several other pages.
This change also allows us to remove the script replace_doc_link.py. It
replaced github.com links (e.g.
https://github.com/youtube/vitess/blob/master/doc/TopologyService.md)
with a vitess.io link. Since we're always using absolute links anyway
(e.g. /user-guide/topology-service.html), this is no longer necessary.
By getting rid of the tool and the extra doc files (published under
_post), we can also avoid creating a copy when
running preview-site.sh. This way, Jekyll can watch the original folder
and automatically generate pages when something changes.
To make this work, I had to add the following symlinks:
vitess.io/_includes/doc
vitess.io/_includes/index.md
vitess.io/_includes/README.md
After this change, the following files in doc/ are not published on vitess.io:
- LifeOfAQuery.md
- Monitoring.md
- Production.md
- ReplicationStreamTiming.md
- TestingOnARamDisk.md
- TwoPhaseCommit.md
- V3HighLevelDesign.md
- V3VindexDesign.md
- Vision.md
- VTGateV3Features.md
Except for TestingOnARamDisk.md, these files were not visible on vitess.io before either, because no other page linked to them.
This change clearly separates the data owned by the user of the workflow API from
the data owned by the workflow library itself. After this separation the user can
change its own data at any time and in any way it wants, but it then needs to
call the BroadcastChanges() method to copy that data into the workflow library and
broadcast it to browsers.
This eliminates the Node.Modify() method, which was a pretty awkward callback-based
way of modifying the Node data. It severely limited what API users could
do, and it would also race with any other Node changes made
at the same time. This also fixes the problem that a Node couldn't be
updated in the UI without updating its children as well. Now that can be done by
calling BroadcastChanges() with updateChildren = false.
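A rough sketch of the ownership split, assuming a heavily simplified Node: the field names, the snapshot type, and the return value are hypothetical and only illustrate the idea. The real BroadcastChanges() signature may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// snapshot is the library-owned copy that would be sent to browsers.
type snapshot struct {
	Name     string
	Message  string
	Children []snapshot
}

// Node is a hypothetical, pared-down workflow node.
type Node struct {
	// User-owned data: may be changed freely between broadcasts.
	Name     string
	Message  string
	Children []*Node

	// Library-owned data: only touched inside BroadcastChanges.
	mu        sync.Mutex
	published snapshot
}

// BroadcastChanges copies the user-owned data into the library-owned
// snapshot. With updateChildren = false the previously published
// children are kept as they are.
func (n *Node) BroadcastChanges(updateChildren bool) snapshot {
	n.mu.Lock()
	defer n.mu.Unlock()
	s := snapshot{
		Name:     n.Name,
		Message:  n.Message,
		Children: n.published.Children,
	}
	if updateChildren {
		s.Children = make([]snapshot, 0, len(n.Children))
		for _, c := range n.Children {
			s.Children = append(s.Children, c.BroadcastChanges(true))
		}
	}
	n.published = s
	return s
}

func main() {
	child := &Node{Name: "child", Message: "running"}
	root := &Node{Name: "root", Children: []*Node{child}}
	root.BroadcastChanges(true)

	// Change both nodes, but publish only the parent's own fields.
	root.Message = "half done"
	child.Message = "done"
	s := root.BroadcastChanges(false)
	fmt.Println(s.Message, s.Children[0].Message)
}
```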
- Make the first value of each enum (the one that is equal to 0) the unknown
one, so that using it is considered a programming error.
- Remove the "omitempty" tag from fields where a change from some value to the
empty value must be passed in an update to the web UI.
- Add "omitempty" to Children because that's an important special use case.
- Modify the javascript to assume that the name, state, style and message of an
action are always present.
- Modify the javascript to work when the 'children' value was not received.
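To illustrate the tag choices above, here is a small sketch using the standard encoding/json package. The struct and enum names are made up for the example and are not the actual workflow types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ActionState is a hypothetical enum; 0 is reserved for "unknown".
type ActionState int

const (
	ActionStateUnknown ActionState = iota // 0: never set on purpose
	ActionStateEnabled
	ActionStateDisabled
)

// Node is a hypothetical, simplified UI node.
type Node struct {
	Name  string      `json:"name"`
	State ActionState `json:"state"`
	// No omitempty: clearing the message must reach the web UI.
	Message string `json:"message"`
	// omitempty: a node without children sends no 'children' key.
	Children []*Node `json:"children,omitempty"`
}

// encode marshals n and returns the JSON as a string.
func encode(n *Node) string {
	b, err := json.Marshal(n)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	n := &Node{Name: "a", State: ActionStateEnabled}
	fmt.Println(encode(n)) // {"name":"a","state":1,"message":""}
}
```

The empty message is still transmitted, while the nil Children slice is dropped, which is why the javascript has to cope with a missing 'children' value.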
RollbackNonBusy gets called if a tablet was externally reparented.
If so, it's normal for transactions to still be outstanding. So,
they should simply be rolled back without raising any alarms.
It looks like something has changed recently: when WorkflowManager now
encodes a Node into json, it behaves similarly to proto3. When 'state' or 'style'
in an action has an enum value equal to 0, the field is omitted from the resulting
json. That caused the javascript code to skip adding the Action object to the
Node. I'm fixing the javascript to account for that and to treat a missing 'state'
or 'style' as being equal to 0.
Also fixing related semi-sync problems:
* restarting replication if needed in ChangeType.
* fixing a bug in backup.py that would make it not remove the
  right directories if -k is specified.
* fixing tabletmanager.py test:
  * we were setting up a master and a rdonly only in one shard. In that
    scenario, we have to disable semi-sync, otherwise we can't ack.
  * InitSlave called on an old master was initializing semi-sync using the
    master type, and that is wrong (can't commit anything, no slave).
    Using replica as a type for this case. Also adding a ChangeType to
    REPLICA for the tablet in that case, with test.
If we can't find a ShardReplication object, fall back to reading all the
tablets in a cell (canonical tablet records) to find potential matches.
Also adding a unit test for that logic.
And adding a new 'even_if_serving' flag to DeleteShard. That way we
won't delete a serving shard by accident.
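The fallback lookup can be pictured with the following sketch. The cellTopo type and its fields are hypothetical in-memory stand-ins, not the topology API.

```go
package main

import (
	"fmt"
	"sort"
)

// tablet is a hypothetical, minimal canonical tablet record.
type tablet struct {
	Alias    string
	Keyspace string
	Shard    string
}

// cellTopo is a hypothetical in-memory view of one cell.
type cellTopo struct {
	// shardReplication maps "keyspace/shard" to tablet aliases.
	shardReplication map[string][]string
	// tablets holds all canonical tablet records of the cell.
	tablets []tablet
}

// findShardTablets prefers the ShardReplication entry. If there is none,
// it falls back to scanning every tablet record in the cell.
func (c *cellTopo) findShardTablets(keyspace, shard string) []string {
	if aliases, ok := c.shardReplication[keyspace+"/"+shard]; ok {
		return aliases
	}
	var aliases []string
	for _, t := range c.tablets {
		if t.Keyspace == keyspace && t.Shard == shard {
			aliases = append(aliases, t.Alias)
		}
	}
	sort.Strings(aliases)
	return aliases
}

func main() {
	c := &cellTopo{
		tablets: []tablet{
			{Alias: "cell1-102", Keyspace: "ks", Shard: "0"},
			{Alias: "cell1-101", Keyspace: "ks", Shard: "0"},
			{Alias: "cell1-200", Keyspace: "ks", Shard: "1"},
		},
	}
	// No ShardReplication object exists, so the scan is used.
	fmt.Println(c.findShardTablets("ks", "0"))
}
```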
Note we do not use that support yet. It will be enabled in a separate
commit. Right now, we just handle querypb.BindVariable as a type in the
map[string]interface{} used with bind variables.
In the process, handling a bunch more types and adding tests for the
handling functions:
- Simplifying bindVariableToValue.
- Making vindexes.getNumber and getBytes better, with more tests.
- Also adding more types and test cases for query rules.
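As a rough picture of what handling one more type in the bind variable map looks like, here is a sketch of a number-extraction helper. The BindVariable struct is a hypothetical stand-in for querypb.BindVariable, and toInt64 is an illustrative name, not the actual getNumber code.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// BindVariable is a hypothetical stand-in for querypb.BindVariable.
type BindVariable struct {
	Value []byte
}

// toInt64 extracts an integer from the value types that may appear in
// a map[string]interface{} of bind variables.
func toInt64(v interface{}) (int64, error) {
	switch v := v.(type) {
	case int:
		return int64(v), nil
	case int32:
		return int64(v), nil
	case int64:
		return v, nil
	case uint64:
		if v > math.MaxInt64 {
			return 0, fmt.Errorf("value %d overflows int64", v)
		}
		return int64(v), nil
	case string:
		return strconv.ParseInt(v, 10, 64)
	case []byte:
		return strconv.ParseInt(string(v), 10, 64)
	case *BindVariable:
		if v == nil {
			return 0, fmt.Errorf("nil bind variable")
		}
		return strconv.ParseInt(string(v.Value), 10, 64)
	default:
		return 0, fmt.Errorf("unexpected type %T", v)
	}
}

func main() {
	bindVars := map[string]interface{}{
		"id":  int64(7),
		"uid": &BindVariable{Value: []byte("42")},
	}
	for _, name := range []string{"id", "uid"} {
		n, err := toInt64(bindVars[name])
		fmt.Println(name, n, err)
	}
}
```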