replication_reporter: Don't try to reparent to yourself.

We've seen it happen that a master tablet restarts and becomes a replica. If the shard record still says we are master, we might end up trying to reparent to ourselves. I don't know yet how the tablet is getting forced to replica type, but in any case we should enforce the invariant that we don't try to reparent to ourselves.

Signed-off-by: Anthony Yeh <enisoc@planetscale.com>
2019-09-03 23:47:15 -07:00 · 2019-09-03 23:47:15 -07:00 · 6f791c7a03
--- a/go/vt/vttablet/tabletmanager/replication_reporter.go
+++ b/go/vt/vttablet/tabletmanager/replication_reporter.go
@@ -30,6 +30,7 @@ import (
 	"vitess.io/vitess/go/vt/health"
 	"vitess.io/vitess/go/vt/log"
 	"vitess.io/vitess/go/vt/mysqlctl"
+	"vitess.io/vitess/go/vt/topo/topoproto"
 )

 var (
@@ -126,6 +127,14 @@ func repairReplication(ctx context.Context, agent *ActionAgent) error {
 		return fmt.Errorf("no master tablet for shard %v/%v", tablet.Keyspace, tablet.Shard)
 	}

+	if topoproto.TabletAliasEqual(si.MasterAlias, tablet.Alias) {
+		// The shard record says we are master, but we disagree; we wouldn't
+		// reach this point unless we were told to check replication as a slave
+		// type. Hopefully someone is working on fixing that, but in any case,
+		// we should not try to reparent to ourselves.
+		return fmt.Errorf("shard %v/%v record claims tablet %v is master, but its type is %v", tablet.Keyspace, tablet.Shard, topoproto.TabletAliasString(tablet.Alias), tablet.Type)
+	}
+
 	// If Orchestrator is configured and if Orchestrator is actively reparenting, we should not repairReplication
 	if agent.orc != nil {
 		re, err := agent.orc.InActiveShardRecovery(tablet)