replication_reporter: Don't try to reparent to yourself.

We've seen a master tablet restart and come back as a replica. If the
shard record still says we are master, we might end up trying to
reparent to ourselves.

I don't know yet how the tablet is getting forced to replica type, but
in any case we should enforce the invariant that we don't try to
reparent to ourselves.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
Anthony Yeh 2019-09-03 23:47:15 -07:00
Parent ce89f736be
Commit 6f791c7a03
1 changed file: 9 additions and 0 deletions


@@ -30,6 +30,7 @@ import (
 	"vitess.io/vitess/go/vt/health"
 	"vitess.io/vitess/go/vt/log"
 	"vitess.io/vitess/go/vt/mysqlctl"
+	"vitess.io/vitess/go/vt/topo/topoproto"
 )
 var (
@@ -126,6 +127,14 @@ func repairReplication(ctx context.Context, agent *ActionAgent) error {
 		return fmt.Errorf("no master tablet for shard %v/%v", tablet.Keyspace, tablet.Shard)
 	}
+	if topoproto.TabletAliasEqual(si.MasterAlias, tablet.Alias) {
+		// The shard record says we are master, but we disagree; we wouldn't
+		// reach this point unless we were told to check replication as a slave
+		// type. Hopefully someone is working on fixing that, but in any case,
+		// we should not try to reparent to ourselves.
+		return fmt.Errorf("shard %v/%v record claims tablet %v is master, but its type is %v", tablet.Keyspace, tablet.Shard, topoproto.TabletAliasString(tablet.Alias), tablet.Type)
+	}
 	// If Orchestrator is configured and if Orchestrator is actively reparenting, we should not repairReplication
 	if agent.orc != nil {
 		re, err := agent.orc.InActiveShardRecovery(tablet)