VTOrc Cleanup - Configs, APIs and old UI (#11356)

* feat: remove configurations that aren't needed like raft and initialization of database

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: use waitreplicastimeout for all timeouts related to waiting for execution on a replica

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove agents in vtorc since we don't use them

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove more unused configurations

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: cleanup post and pre processes that VTOrc doesn't intend to support

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove unused code and config for shell process

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove unused configurations

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: convert flags to pflag architecture

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove cli flags which aren't required

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove more unused configs

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove the old UI of orchestrator and cleanup app and http packages

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove web/vtorc files and fix Dockerfiles to not try to copy them

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: fix all the examples to not pass in the ListenAddress and not export port for the old UI

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove raft code and command applier

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove unused code and configs

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove more unused code

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove http configs that are no longer required

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove more unused configs and dead code

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove more unused configs

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove access tokens since they are unused

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove RecoveryPeriodBlockMinutes which was only for backward compatibility

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove unused package

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove replicas in instance since they weren't used anywhere

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: copy over vtorc things that vtgr was using to remove vtgr's dependency on vtorc package

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: change restart function to use tmc rpc calls

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove detection queries for vtorc since we read these from topo-server

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: improve read topology test to also verify the errantGTID set

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: use internal function to find difference in gtid sets instead of using the MySQL connection

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove population of group replication information

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove configs related to connecting to MySQL instances since it is no longer required

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove code to connect to MySQL backend since VTOrc now only uses SQLite3

Signed-off-by: Manan Gupta <manan@planetscale.com>

* cleanup: go mod tidy

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove unused/redundant flags, we are down to 0 :)

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove filtering configs

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove write buffering and associated config parameters

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove more miscellaneous configs

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove one more rejection parameter

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: move discovery queue parameters to be constants

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: move some more configs to be constants

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: add flags for all the configurations we have kept

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: fix imports in main.go

Signed-off-by: Manan Gupta <manan@planetscale.com>

* test: fix vtorc test output after so many changes

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: add release-notes docs for config changes

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove index definitions for tables that are already deleted

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: add API removal documentation as well to release notes

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: remove duplication of debug pages in release notes

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: use some flags in e2e test to verify they work

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: use sliceVar for the flag definition of cluster_to_watch

Signed-off-by: Manan Gupta <manan@planetscale.com>

Signed-off-by: Manan Gupta <manan@planetscale.com>
Manan Gupta 2022-09-30 22:11:02 +05:30 committed by GitHub
Parent e17b4e16f9
Commit e7f98f859c
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
188 changed files with 4681 additions and 20559 deletions

@@ -266,63 +266,93 @@ With this new `explain` format, you can get an output that is very similar to th
### VTOrc
#### Configuration Renames
#### Old UI Removal and Replacement
VTOrc configurations that had `Orchestrator` as a substring have been renamed to use `VTOrc` instead. The old configurations won't
work in this release, so if backward compatibility is desired, it is suggested to duplicate the old configurations
and also set them under the new names before upgrading.
The old UI that VTOrc inherited from `Orchestrator` has been removed. A replacement UI, more consistent with the other Vitess binaries, has been created.
In order to use the new UI, the `--port` flag has to be provided.
VTOrc ignores configurations that it doesn't understand, so the new configurations can be added while still running the previous release.
After the upgrade, the old configurations can be dropped.
Along with the UI, the old APIs have also been deprecated. However, some of them have been ported over to the new UI -
| Old Configuration | New Configuration |
|:--------------------------------------:|:-------------------------------:|
| MySQLOrchestratorHost | MySQLVTOrcHost |
| MySQLOrchestratorMaxPoolConnections | MySQLVTOrcMaxPoolConnections |
| MySQLOrchestratorPort | MySQLVTOrcPort |
| MySQLOrchestratorDatabase | MySQLVTOrcDatabase |
| MySQLOrchestratorUser | MySQLVTOrcUser |
| MySQLOrchestratorPassword | MySQLVTOrcPassword |
| MySQLOrchestratorCredentialsConfigFile | MySQLVTOrcCredentialsConfigFile |
| MySQLOrchestratorSSLPrivateKeyFile | MySQLVTOrcSSLPrivateKeyFile |
| MySQLOrchestratorSSLCertFile | MySQLVTOrcSSLCertFile |
| MySQLOrchestratorSSLCAFile | MySQLVTOrcSSLCAFile |
| MySQLOrchestratorSSLSkipVerify | MySQLVTOrcSSLSkipVerify |
| MySQLOrchestratorUseMutualTLS | MySQLVTOrcUseMutualTLS |
| MySQLOrchestratorReadTimeoutSeconds | MySQLVTOrcReadTimeoutSeconds |
| MySQLOrchestratorRejectReadOnly | MySQLVTOrcRejectReadOnly |
| Old API | New API | Additional notes |
|----------------------------------|----------------------------------|-----------------------------------------------------------------------|
| `/api/problems` | `/api/problems` | The new API also supports filtering using the keyspace and shard name |
| `/api/disable-global-recoveries` | `/api/disable-global-recoveries` | Functionally remains the same |
| `/api/enable-global-recoveries` | `/api/enable-global-recoveries` | Functionally remains the same |
| `/api/health` | `/debug/health` | Functionally remains the same |
| `/api/replication-analysis` | `/api/replication-analysis` | Functionally remains the same. Output is now JSON format. |
Apart from these APIs, we also now have `/debug/status`, `/debug/vars` and `/debug/liveness` available in the new UI.
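For illustration, a minimal sketch (in Go) of how a client might call the ported-over endpoints; the port and the `keyspace`/`shard` query-parameter names are assumptions for the example rather than a documented contract:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Fetch current problems from VTOrc, filtered by keyspace and shard.
	// Assumes VTOrc was started with --port 16000; the query parameters
	// shown here are illustrative.
	resp, err := http.Get("http://localhost:16000/api/problems?keyspace=ks&shard=0")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // JSON-encoded list of detected problems
}
```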
For example, if you have the following configuration -
#### Configuration Refactor and New Flags
Since VTOrc was forked from `Orchestrator`, it inherited a lot of configurations that don't make sense for the Vitess use-case.
All such configurations have been removed.
VTOrc ignores configurations that it doesn't understand, so old configurations can be kept around when upgrading and won't cause any issues.
They will just be ignored.
For all the configurations that are kept, flags have been added, and these flags are the preferred way to pass the configurations going forward.
The config file will be deprecated and removed in upcoming releases. The following is a list of all the configurations that are kept and the flags added for them.
| Configurations Kept | Flags Introduced |
|:-------------------------------------:|:-------------------------------------:|
| SQLite3DataFile | `--sqlite-data-file` |
| InstancePollSeconds | `--instance-poll-time` |
| SnapshotTopologiesIntervalHours | `--snapshot-topology-interval` |
| ReasonableReplicationLagSeconds | `--reasonable-replication-lag` |
| AuditLogFile | `--audit-file-location` |
| AuditToSyslog                         | `--audit-to-syslog`                   |
| AuditToBackendDB                      | `--audit-to-backend`                  |
| AuditPurgeDays | `--audit-purge-duration` |
| RecoveryPeriodBlockSeconds | `--recovery-period-block-duration` |
| PreventCrossDataCenterPrimaryFailover | `--prevent-cross-cell-failover` |
| LockShardTimeoutSeconds | `--lock-shard-timeout` |
| WaitReplicasTimeoutSeconds | `--wait-replicas-timeout` |
| TopoInformationRefreshSeconds | `--topo-information-refresh-duration` |
| RecoveryPollSeconds | `--recovery-poll-duration` |
Apart from configurations, some flags from VTOrc have also been removed -
- `sibling`
- `destination`
- `discovery`
- `skip-unresolve`
- `skip-unresolve-check`
- `noop`
- `binlog`
- `statement`
- `grab-election`
- `promotion-rule`
- `skip-continuous-registration`
- `enable-database-update`
- `ignore-raft-setup`
- `tag`
The ideal way to ensure backward compatibility is to remove the flags listed above while on the previous release. Then upgrade VTOrc.
After upgrading, remove the config file and instead pass the flags that are introduced.
#### Example Upgrade
If you are running VTOrc with the flags `--ignore-raft-setup --clusters_to_watch="ks/0" --config="path/to/config"` and the following configuration
```json
{
"MySQLOrchestratorHost": "host"
}
```
then you should change it to
```json
{
"MySQLOrchestratorHost": "host",
"MySQLVTOrcHost": "host"
}
```
while still on the old release. After changing the configuration, you can upgrade Vitess.
After upgrading, the old configurations can be dropped -
```json
{
"MySQLVTOrcHost": "host"
"Debug": true,
"ListenAddress": ":6922",
"MySQLTopologyUser": "orc_client_user",
"MySQLTopologyPassword": "orc_client_user_password",
"MySQLReplicaUser": "vt_repl",
"MySQLReplicaPassword": "",
"RecoveryPeriodBlockSeconds": 1,
"InstancePollSeconds": 1,
"PreventCrossDataCenterPrimaryFailover": true
}
```
First, drop the flag `--ignore-raft-setup` while on the previous release. So you'll be running VTOrc with `--clusters_to_watch="ks/0" --config="path/to/config"` and the same configuration listed above.
Now you can upgrade your VTOrc version, continuing to use the same flags and configurations, and it will continue to work just the same. If you wish to use the new UI, you can add the `--port` flag as well.
After upgrading, you can drop the configuration entirely and use the new flags like `--clusters_to_watch="ks/0" --recovery-period-block-duration=1s --instance-poll-time=1s --prevent-cross-cell-failover`.
#### Default Configuration Files
The default files that VTOrc searches for configurations in have also changed from `"/etc/orchestrator.conf.json", "conf/orchestrator.conf.json", "orchestrator.conf.json"` to
`"/etc/vtorc.conf.json", "conf/vtorc.conf.json", "vtorc.conf.json"`.
#### Debug Pages in VTOrc
Like the other Vitess binaries (`vtgate`, `vttablet`), `vtorc` now also takes a `--port` flag, on which it
displays the `/debug` pages, including `/debug/status` and the variables it tracks on `/debug/vars`.
This change is backward compatible and opt-in: not specifying the flag works like it used to, with
VTOrc running without displaying these pages.

@@ -51,8 +51,6 @@ ENV MYSQL_FLAVOR MariaDB103
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
# Create mount point for actual data (e.g. MySQL data dir)

@@ -51,8 +51,6 @@ ENV MYSQL_FLAVOR MariaDB
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
# Create mount point for actual data (e.g. MySQL data dir)

@@ -51,8 +51,6 @@ ENV MYSQL_FLAVOR MariaDB103
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
# Create mount point for actual data (e.g. MySQL data dir)

@@ -50,8 +50,6 @@ ENV PATH $VTROOT/bin:$PATH
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
# Create mount point for actual data (e.g. MySQL data dir)

@@ -51,8 +51,6 @@ ENV MYSQL_FLAVOR MySQL80
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
# Create mount point for actual data (e.g. MySQL data dir)

@@ -50,8 +50,6 @@ ENV PATH $VTROOT/bin:$PATH
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
# Create mount point for actual data (e.g. MySQL data dir)

@@ -51,8 +51,6 @@ ENV MYSQL_FLAVOR MySQL80
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
# Create mount point for actual data (e.g. MySQL data dir)

@@ -50,8 +50,6 @@ ENV PATH $VTROOT/bin:$PATH
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
# Create mount point for actual data (e.g. MySQL data dir)

@@ -78,8 +78,6 @@ ENV PATH $VTROOT/bin:$PATH
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
RUN mkdir -p /licenses

@@ -79,8 +79,6 @@ ENV MYSQL_FLAVOR MySQL80
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
RUN mkdir -p /licenses

@@ -69,8 +69,6 @@ ENV PATH $VTROOT/bin:$PATH
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
RUN mkdir -p /licenses

@@ -74,8 +74,6 @@ ENV MYSQL_FLAVOR MySQL80
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
RUN mkdir -p /licenses

@@ -86,8 +86,6 @@ ENV MYSQL_FLAVOR MySQL80
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
RUN mkdir -p /licenses

@@ -84,8 +84,6 @@ ENV MYSQL_FLAVOR MySQL80
# Copy artifacts from builder layer.
COPY --from=builder --chown=vitess:vitess /vt/install /vt
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/orchestrator /vt/web/orchestrator
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtorc /vt/web/vtorc
COPY --from=builder --chown=vitess:vitess /vt/src/vitess.io/vitess/web/vtadmin /vt/web/vtadmin
RUN mkdir -p /licenses

@@ -150,11 +150,8 @@ vitess/examples/compose$ ./lvtctl.sh Help
- vtgate web ui:
http://localhost:15099/debug/status
- vtorc web ui:
- vtorc ui:
http://localhost:13000
- vtorc debug ui:
http://localhost:13200
- Stream querylog
`curl -S localhost:15099/debug/querylog`

@@ -285,8 +285,7 @@ services:
- vtctld
- set_keyspace_durability_policy
ports:
- "13200:8080"
- "13000:3000"
- "13000:8080"
volumes:
- ".:/script"
environment:
@@ -304,7 +303,7 @@ services:
- DB_PASS
- DB_CHARSET
healthcheck:
test: ["CMD-SHELL","curl -s --fail --show-error localhost:3000/api/status"]
test: ["CMD-SHELL","curl -s --fail --show-error localhost:8080/debug/health"]
interval: 5s
timeout: 10s
retries: 15

@@ -184,8 +184,7 @@ services:
- DB_PASS=
image: vitess/lite:${VITESS_TAG:-latest}
ports:
- 13200:8080
- 13000:3000
- 13000:8080
volumes:
- .:/script
vttablet101:

@@ -201,8 +201,7 @@ services:
- DB_PASS=
image: vitess/lite:${VITESS_TAG:-latest}
ports:
- 13000:3000
- 13200:8080
- 13000:8080
volumes:
- .:/script
vttablet101:

@@ -750,8 +750,7 @@ func generateVTOrc(dbInfo externalDbInfo, keyspaceInfoMap map[string]keyspaceInf
- DB_USER=%[4]s
- DB_PASS=%[5]s
ports:
- "13000:3000"
- "13200:%[1]d"
- "13000:%[1]d"
command: ["sh", "-c", "/script/vtorc-up.sh"]
%[6]s
`, opts.webPort, opts.topologyFlags, externalDb, dbInfo.dbUser, dbInfo.dbPass, dependsOn)

@@ -42,5 +42,4 @@ exec /vt/bin/vtorc \
$TOPOLOGY_FLAGS \
--logtostderr=true \
--port $web_port \
--orc_web_dir=/vt/web/vtorc \
--config $config

@@ -1,7 +1,6 @@
{
"Debug": true,
"EnableSyslog": false,
"ListenAddress": ":3000",
"MySQLTopologyUser": "orc_client_user",
"MySQLTopologyPassword": "orc_client_user_password",
"MySQLReplicaUser": "vt_repl",

@@ -3,13 +3,10 @@
source ./env.sh
log_dir="${VTDATAROOT}/tmp"
web_dir="../../web/vtorc"
vtorc_web_port=16000
port=16001
port=16000
vtorc \
$TOPOLOGY_FLAGS \
--orc_web_dir "${web_dir}" \
--logtostderr \
--alsologtostderr \
--config="./vtorc/config.json" \
@@ -21,8 +18,7 @@ echo ${vtorc_pid} > "${log_dir}/vtorc.pid"
echo "\
vtorc is running!
- UI: http://localhost:${vtorc_web_port}
- Debug UI: http://localhost:${port}
- UI: http://localhost:${port}
- Logs: ${log_dir}/vtorc.out
- PID: ${vtorc_pid}
"

@@ -1,5 +1,4 @@
{
"ListenAddress": ":16000",
"MySQLTopologyUser": "orc_client_user",
"MySQLTopologyPassword": "orc_client_user_password",
"MySQLReplicaUser": "vt_repl",

@@ -3,13 +3,10 @@
source ./env.sh
log_dir="${VTDATAROOT}/tmp"
web_dir="../../web/vtorc"
vtorc_web_port=16000
port=16001
port=16000
vtorc \
$TOPOLOGY_FLAGS \
--orc_web_dir "${web_dir}" \
--logtostderr \
--alsologtostderr \
--config="./vtorc/config.json" \
@@ -21,8 +18,7 @@ echo ${vtorc_pid} > "${log_dir}/vtorc.pid"
echo "\
vtorc is running!
- UI: http://localhost:${vtorc_web_port}
- Debug UI: http://localhost:${port}
- UI: http://localhost:${port}
- Logs: ${log_dir}/vtorc.out
- PID: ${vtorc_pid}
"

@@ -1,5 +1,4 @@
{
"ListenAddress": ":16000",
"MySQLTopologyUser": "orc_client_user",
"MySQLTopologyPassword": "orc_client_user_password",
"MySQLReplicaUser": "vt_repl",

go.mod

@@ -49,9 +49,6 @@ require (
github.com/klauspost/pgzip v1.2.4
github.com/krishicks/yaml-patch v0.0.10
github.com/magiconair/properties v1.8.5
github.com/martini-contrib/auth v0.0.0-20150219114609-fa62c19b7ae8
github.com/martini-contrib/gzip v0.0.0-20151124214156-6c035326b43f
github.com/martini-contrib/render v0.0.0-20150707142108-ec18f8345a11
github.com/mattn/go-sqlite3 v1.14.14
github.com/minio/minio-go v0.0.0-20190131015406-c8a261de75c1
github.com/mitchellh/go-testing-interface v1.14.0 // indirect
@@ -59,7 +56,6 @@ require (
github.com/olekukonko/tablewriter v0.0.5-0.20200416053754-163badb3bac6
github.com/opentracing-contrib/go-grpc v0.0.0-20180928155321-4b5a12d3ff02
github.com/opentracing/opentracing-go v1.1.0
github.com/oxtoacart/bpool v0.0.0-20190530202638-03653db5a59c // indirect
github.com/patrickmn/go-cache v2.1.0+incompatible
github.com/philhofer/fwd v1.0.0 // indirect
github.com/pierrec/lz4 v2.6.1+incompatible

go.sum

@@ -506,12 +506,6 @@ github.com/mailru/easyjson v0.0.0-20190312143242-1de009706dbe/go.mod h1:C1wdFJiN
github.com/mailru/easyjson v0.0.0-20190614124828-94de47d64c63/go.mod h1:C1wdFJiN94OJF2b5HbByQZoLdCWB1Yqtg26g4irojpc=
github.com/mailru/easyjson v0.0.0-20190626092158-b2ccc519800e/go.mod h1:C1wdFJiN94OJF2b5HbByQZoLdCWB1Yqtg26g4irojpc=
github.com/mailru/easyjson v0.7.0/go.mod h1:KAzv3t3aY1NaHWoQz1+4F1ccyAH66Jk7yos7ldAVICs=
github.com/martini-contrib/auth v0.0.0-20150219114609-fa62c19b7ae8 h1:1ded5x5QpCLsyTH5ct62Rh1RXPFnn0/dubCqAeh+stU=
github.com/martini-contrib/auth v0.0.0-20150219114609-fa62c19b7ae8/go.mod h1:ahTFgV/NtzY/CALneRrC67m1dis5arHTQDfyIhKk69E=
github.com/martini-contrib/gzip v0.0.0-20151124214156-6c035326b43f h1:wVDxEVZP1eiPIlHVaafUAEUDtyl6ytjHv3egJVbyfOk=
github.com/martini-contrib/gzip v0.0.0-20151124214156-6c035326b43f/go.mod h1:jhUB0rZB2TPWqy0yGugKRRictO591eSO7If7O4MfCaA=
github.com/martini-contrib/render v0.0.0-20150707142108-ec18f8345a11 h1:YFh+sjyJTMQSYjKwM4dFKhJPJC/wfo98tPUc17HdoYw=
github.com/martini-contrib/render v0.0.0-20150707142108-ec18f8345a11/go.mod h1:Ah2dBMoxZEqk118as2T4u4fjfXarE0pPnMJaArZQZsI=
github.com/mattn/go-colorable v0.0.9/go.mod h1:9vuHe8Xs5qXnSaW/c/ABM9alt+Vo+STaOChaDxuIBZU=
github.com/mattn/go-colorable v0.1.4/go.mod h1:U0ppj6V5qS13XJ6of8GYAs25YV2eR4EVcfRqFIhoBtE=
github.com/mattn/go-colorable v0.1.6 h1:6Su7aK7lXmJ/U79bYtBjLNaha4Fs1Rg9plHpcH+vvnE=
@@ -587,8 +581,6 @@ github.com/opentracing-contrib/go-grpc v0.0.0-20180928155321-4b5a12d3ff02 h1:0R5
github.com/opentracing-contrib/go-grpc v0.0.0-20180928155321-4b5a12d3ff02/go.mod h1:JNdpVEzCpXBgIiv4ds+TzhN1hrtxq6ClLrTlT9OQRSc=
github.com/opentracing/opentracing-go v1.1.0 h1:pWlfV3Bxv7k65HYwkikxat0+s3pV4bsqf19k25Ur8rU=
github.com/opentracing/opentracing-go v1.1.0/go.mod h1:UkNAQd3GIcIGf0SeVgPpRdFStlNbqXla1AfSYxPUl2o=
github.com/oxtoacart/bpool v0.0.0-20190530202638-03653db5a59c h1:rp5dCmg/yLR3mgFuSOe4oEnDDmGLROTvMragMUXpTQw=
github.com/oxtoacart/bpool v0.0.0-20190530202638-03653db5a59c/go.mod h1:X07ZCGwUbLaax7L0S3Tw4hpejzu63ZrrQiUe6W0hcy0=
github.com/pascaldekloe/goe v0.0.0-20180627143212-57f6aae5913c/go.mod h1:lzWF7FIEvWOWxwDKqyGYQf6ZUaNfKdP144TG7ZOy1lc=
github.com/pascaldekloe/goe v0.1.0 h1:cBOtyMzM9HTpWjXfbbunk26uA6nG3a8n06Wieeh0MwY=
github.com/pascaldekloe/goe v0.1.0/go.mod h1:lzWF7FIEvWOWxwDKqyGYQf6ZUaNfKdP144TG7ZOy1lc=

@@ -31,17 +31,12 @@ import (
vtlog "vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/logutil"
"vitess.io/vitess/go/vt/servenv"
"vitess.io/vitess/go/vt/vtorc/app"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/inst"
"vitess.io/vitess/go/vt/vtorc/logic"
"vitess.io/vitess/go/vt/vtorc/server"
)
var (
GitCommit string
AppVersion string
)
// transformArgsForPflag turns a slice of raw args passed on the command line,
// possibly incompatible with pflag (because the user is expecting stdlib flag
// parsing behavior) and transforms them into the arguments that should have
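For context on the chunk above: a minimal sketch of the kind of rewriting `transformArgsForPflag` performs, assuming single-dash long options are promoted to pflag-style double-dash options (illustrative, not the exact implementation):

```go
package main

import "fmt"

// transformArgsSketch promotes stdlib-style single-dash long options
// ("-config=x") to pflag-style double-dash options ("--config=x").
// pflag would otherwise parse "-config" as a bundle of shorthand flags.
func transformArgsSketch(args []string) []string {
	out := make([]string, 0, len(args))
	for _, arg := range args {
		// Leave "-", "--", single-letter shorthands like "-s", and
		// already well-formed "--flag" arguments untouched.
		if len(arg) > 2 && arg[0] == '-' && arg[1] != '-' {
			arg = "-" + arg
		}
		out = append(out, arg)
	}
	return out
}

func main() {
	fmt.Println(transformArgsSketch([]string{"-config=a.json", "--port=16000", "-s", "host"}))
	// Output: [--config=a.json --port=16000 -s host]
}
```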
@@ -107,6 +102,9 @@ func main() {
grpccommon.RegisterFlags(fs)
vtlog.RegisterFlags(fs)
logutil.RegisterFlags(fs)
logic.RegisterFlags(fs)
server.RegisterFlags(fs)
config.RegisterFlags(fs)
servenv.RegisterDefaultFlags()
servenv.RegisterFlags()
acl.RegisterFlags(fs)
@@ -116,20 +114,6 @@
os.Args = os.Args[0:1]
configFile := fs.String("config", "", "config file name")
sibling := fs.StringP("sibling", "s", "", "sibling instance, host_fqdn[:port]")
destination := fs.StringP("destination", "d", "", "destination instance, host_fqdn[:port] (synonym to -s)")
discovery := fs.Bool("discovery", true, "auto discovery mode")
config.RuntimeCLIFlags.SkipUnresolve = fs.Bool("skip-unresolve", false, "Do not unresolve a host name")
config.RuntimeCLIFlags.SkipUnresolveCheck = fs.Bool("skip-unresolve-check", false, "Skip/ignore checking an unresolve mapping (via hostname_unresolve table) resolves back to same hostname")
config.RuntimeCLIFlags.Noop = fs.Bool("noop", false, "Dry run; do not perform destructive operations")
config.RuntimeCLIFlags.BinlogFile = fs.String("binlog", "", "Binary log file name")
config.RuntimeCLIFlags.Statement = fs.String("statement", "", "Statement/hint")
config.RuntimeCLIFlags.GrabElection = fs.Bool("grab-election", false, "Grab leadership (only applies to continuous mode)")
config.RuntimeCLIFlags.PromotionRule = fs.String("promotion-rule", "prefer", "Promotion rule for register-candidate (prefer|neutral|prefer_not|must_not)")
config.RuntimeCLIFlags.SkipContinuousRegistration = fs.Bool("skip-continuous-registration", false, "Skip cli commands performing continuous registration (to reduce orchestrator backend db load)")
config.RuntimeCLIFlags.EnableDatabaseUpdate = fs.Bool("enable-database-update", false, "Enable database update, overrides SkipVTOrcDatabaseUpdate")
config.RuntimeCLIFlags.IgnoreRaftSetup = fs.Bool("ignore-raft-setup", false, "Override RaftEnabled for CLI invocation (CLI by default not allowed for raft setups). NOTE: operations by CLI invocation may not reflect in all raft nodes.")
config.RuntimeCLIFlags.Tag = fs.String("tag", "", "tag to add ('tagname' or 'tagname=tagvalue') or to search ('tagname' or 'tagname=tagvalue' or comma separated 'tag0,tag1=val1,tag2' for intersection of all)")
os.Args = append(os.Args, transformArgsForPflag(fs, args[1:])...)
if !reflect.DeepEqual(args, os.Args) {
@@ -143,48 +127,22 @@ Please update your scripts before the next version, when this will begin to brea
}
servenv.ParseFlags("vtorc")
config.UpdateConfigValuesFromFlags()
if *destination != "" && *sibling != "" {
log.Fatalf("-s and -d are synonyms, yet both were specified. You're probably doing the wrong thing.")
}
switch *config.RuntimeCLIFlags.PromotionRule {
case "prefer", "neutral", "prefer_not", "must_not":
{
// OK
}
default:
{
log.Fatalf("--promotion-rule only supports prefer|neutral|prefer_not|must_not")
}
}
if *destination == "" {
*destination = *sibling
}
startText := "starting vtorc"
if AppVersion != "" {
startText += ", version: " + AppVersion
}
if GitCommit != "" {
startText += ", git commit: " + GitCommit
}
log.Info(startText)
log.Info("starting vtorc")
if len(*configFile) > 0 {
config.ForceRead(*configFile)
} else {
config.Read("/etc/vtorc.conf.json", "conf/vtorc.conf.json", "vtorc.conf.json")
}
if *config.RuntimeCLIFlags.EnableDatabaseUpdate {
config.Config.SkipOrchestratorDatabaseUpdate = false
}
if config.Config.AuditToSyslog {
inst.EnableAuditSyslog()
}
config.RuntimeCLIFlags.ConfiguredVersion = AppVersion
config.MarkConfigurationLoaded()
go app.HTTP(*discovery)
// Log final config values to debug if something goes wrong.
config.LogConfigValues()
server.StartVTOrcDiscovery()
server.RegisterVTOrcAPIEndpoints()
servenv.OnRun(func() {

@@ -1,75 +1,74 @@
Usage of vtorc:
--alsologtostderr log to standard error as well as files
--binlog string Binary log file name
--catch-sigpipe catch and ignore SIGPIPE on stdout and stderr if specified
--clusters_to_watch string Comma-separated list of keyspaces or keyspace/shards that this instance will monitor and repair. Defaults to all clusters in the topology. Example: "ks1,ks2/-80"
--config string config file name
-d, --destination string destination instance, host_fqdn[:port] (synonym to -s)
--discovery auto discovery mode (default true)
--enable-database-update Enable database update, overrides SkipVTOrcDatabaseUpdate
--grab-election Grab leadership (only applies to continuous mode)
--grpc_auth_static_client_creds string When using grpc_static_auth in the server, this file provides the credentials to use to authenticate with server.
--grpc_compression string Which protocol to use for compressing gRPC. Default: nothing. Supported: snappy
--grpc_enable_tracing Enable gRPC tracing.
--grpc_initial_conn_window_size int gRPC initial connection window size
--grpc_initial_window_size int gRPC initial window size
--grpc_keepalive_time duration After a duration of this time, if the client doesn't see any activity, it pings the server to see if the transport is still alive. (default 10s)
--grpc_keepalive_timeout duration After having pinged for keepalive check, the client waits for a duration of Timeout and if no activity is seen even after that the connection is closed. (default 10s)
--grpc_max_message_size int Maximum allowed RPC message size. Larger messages will be rejected by gRPC with the error 'exceeding the max size'. (default 16777216)
--grpc_prometheus Enable gRPC monitoring with Prometheus.
-h, --help display usage and exit
--ignore-raft-setup Override RaftEnabled for CLI invocation (CLI by default not allowed for raft setups). NOTE: operations by CLI invocation may not reflect in all raft nodes.
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
--log_rotate_max_size uint size in bytes at which logs are rotated (glog.MaxSize) (default 1887436800)
--logtostderr log to standard error instead of files
--noop Dry run; do not perform destructive operations
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--orc_web_dir string VTOrc http file location (default "web/vtorc")
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
--port int port for the server
--pprof strings enable profiling
--promotion-rule string Promotion rule for register-candidate (prefer|neutral|prefer_not|must_not) (default "prefer")
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--security_policy string the name of a registered security policy to use for controlling access to URLs - empty means allow all for anyone (built-in policies: deny-all, read-only)
--shutdown_wait_time duration maximum time to wait for vtorc to release all the locks that it is holding before shutting down on SIGTERM (default 30s)
-s, --sibling string sibling instance, host_fqdn[:port]
--skip-continuous-registration Skip cli commands performing continuous registration (to reduce orchestrator backend db load)
--skip-unresolve Do not unresolve a host name
--skip-unresolve-check Skip/ignore checking an unresolve mapping (via hostname_unresolve table) resolves back to same hostname
--statement string Statement/hint
--stderrthreshold severity logs at or above this threshold go to stderr (default 1)
--tablet_manager_grpc_ca string the server ca to use to validate servers when connecting
--tablet_manager_grpc_cert string the cert to use to connect
--tablet_manager_grpc_concurrency int concurrency to use to talk to a vttablet server for performance-sensitive RPCs (like ExecuteFetchAs{Dba,AllPrivs,App}) (default 8)
--tablet_manager_grpc_connpool_size int number of tablets to keep tmclient connections open to (default 100)
--tablet_manager_grpc_crl string the server crl to use to validate server certificates when connecting
--tablet_manager_grpc_key string the key to use to connect
--tablet_manager_grpc_server_name string the server name to use to validate server certificate
--tablet_manager_protocol string Protocol to use to make tabletmanager RPCs to vttablets. (default "grpc")
--tag string tag to add ('tagname' or 'tagname=tagvalue') or to search ('tagname' or 'tagname=tagvalue' or comma separated 'tag0,tag1=val1,tag2' for intersection of all)
--topo_etcd_lease_ttl int Lease TTL for locks and leader election. The client will use KeepAlive to keep the lease going. (default 30)
--topo_etcd_tls_ca string path to the ca to use to validate the server cert when connecting to the etcd topo server
--topo_etcd_tls_cert string path to the client cert to use to connect to the etcd topo server, requires topo_etcd_tls_key, enables TLS
--topo_etcd_tls_key string path to the client key to use to connect to the etcd topo server, enables TLS
--topo_global_root string the path of the global topology data in the global topology server
--topo_global_server_address string the address of the global topology server
--topo_implementation string the topology implementation to use
--topo_k8s_context string The kubeconfig context to use, overrides the 'current-context' from the config
--topo_k8s_kubeconfig string Path to a valid kubeconfig file. When running as a k8s pod inside the same cluster you wish to use as the topo, you may omit this and the below arguments, and Vitess is capable of auto-discovering the correct values. https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod
--topo_k8s_namespace string The kubernetes namespace to use for all objects. Default comes from the context or in-cluster config
--topo_zk_auth_file string auth to use when connecting to the zk topo server, file contents should be <scheme>:<auth>, e.g., digest:user:pass
--topo_zk_base_timeout duration zk base timeout (see zk.Connect) (default 30s)
--topo_zk_max_concurrency int maximum number of pending requests to send to a Zookeeper server. (default 64)
--topo_zk_tls_ca string the server ca to use to validate servers when connecting to the zk topo server
--topo_zk_tls_cert string the cert to use to connect to the zk topo server, requires topo_zk_tls_key, enables TLS
--topo_zk_tls_key string the key to use to connect to the zk topo server, enables TLS
-v, --v Level log level for V logs
--version print binary version
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
--alsologtostderr log to standard error as well as files
--audit-file-location string File location where the audit logs are to be stored
--audit-purge-duration duration Duration for which audit logs are held before being purged. Should be in multiples of days (default 168h0m0s)
--audit-to-backend Whether to store the audit log in the VTOrc database
--audit-to-syslog Whether to store the audit log in the syslog
--catch-sigpipe catch and ignore SIGPIPE on stdout and stderr if specified
--clusters_to_watch strings Comma-separated list of keyspaces or keyspace/shards that this instance will monitor and repair. Defaults to all clusters in the topology. Example: "ks1,ks2/-80"
--config string config file name
--grpc_auth_static_client_creds string When using grpc_static_auth in the server, this file provides the credentials to use to authenticate with server.
--grpc_compression string Which protocol to use for compressing gRPC. Default: nothing. Supported: snappy
--grpc_enable_tracing Enable gRPC tracing.
--grpc_initial_conn_window_size int gRPC initial connection window size
--grpc_initial_window_size int gRPC initial window size
--grpc_keepalive_time duration After a duration of this time, if the client doesn't see any activity, it pings the server to see if the transport is still alive. (default 10s)
--grpc_keepalive_timeout duration After having pinged for keepalive check, the client waits for a duration of Timeout and if no activity is seen even after that the connection is closed. (default 10s)
--grpc_max_message_size int Maximum allowed RPC message size. Larger messages will be rejected by gRPC with the error 'exceeding the max size'. (default 16777216)
--grpc_prometheus Enable gRPC monitoring with Prometheus.
-h, --help display usage and exit
--instance-poll-time duration Timer duration on which VTOrc refreshes MySQL information (default 5s)
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--lock-shard-timeout duration Duration for which a shard lock is held when running a recovery (default 30s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
--log_rotate_max_size uint size in bytes at which logs are rotated (glog.MaxSize) (default 1887436800)
--logtostderr log to standard error instead of files
--onclose_timeout duration wait no more than this for OnClose handlers before stopping (default 1ns)
--onterm_timeout duration wait no more than this for OnTermSync handlers before stopping (default 10s)
--pid_file string If set, the process will write its pid to the named file, and delete it on graceful shutdown.
--port int port for the server
--pprof strings enable profiling
--prevent-cross-cell-failover Prevent VTOrc from promoting a primary in a different cell than the current primary in case of a failover
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--reasonable-replication-lag duration Maximum replication lag on replicas which is deemed to be acceptable (default 10s)
--recovery-period-block-duration duration Duration for which a new recovery is blocked on an instance after running a recovery (default 30s)
--recovery-poll-duration duration Timer duration on which VTOrc polls its database to run a recovery (default 1s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--security_policy string the name of a registered security policy to use for controlling access to URLs - empty means allow all for anyone (built-in policies: deny-all, read-only)
--shutdown_wait_time duration Maximum time to wait for VTOrc to release all the locks that it is holding before shutting down on SIGTERM (default 30s)
--snapshot-topology-interval duration Timer duration on which VTOrc takes a snapshot of the current MySQL information it has in the database. Should be in multiple of hours
--sqlite-data-file string SQLite Datafile to use as VTOrc's database (default "file::memory:?mode=memory&cache=shared")
--stderrthreshold severity logs at or above this threshold go to stderr (default 1)
--tablet_manager_grpc_ca string the server ca to use to validate servers when connecting
--tablet_manager_grpc_cert string the cert to use to connect
--tablet_manager_grpc_concurrency int concurrency to use to talk to a vttablet server for performance-sensitive RPCs (like ExecuteFetchAs{Dba,AllPrivs,App}) (default 8)
--tablet_manager_grpc_connpool_size int number of tablets to keep tmclient connections open to (default 100)
--tablet_manager_grpc_crl string the server crl to use to validate server certificates when connecting
--tablet_manager_grpc_key string the key to use to connect
--tablet_manager_grpc_server_name string the server name to use to validate server certificate
--tablet_manager_protocol string Protocol to use to make tabletmanager RPCs to vttablets. (default "grpc")
--topo-information-refresh-duration duration Timer duration on which VTOrc refreshes the keyspace and vttablet records from the topology server (default 15s)
--topo_etcd_lease_ttl int Lease TTL for locks and leader election. The client will use KeepAlive to keep the lease going. (default 30)
--topo_etcd_tls_ca string path to the ca to use to validate the server cert when connecting to the etcd topo server
--topo_etcd_tls_cert string path to the client cert to use to connect to the etcd topo server, requires topo_etcd_tls_key, enables TLS
--topo_etcd_tls_key string path to the client key to use to connect to the etcd topo server, enables TLS
--topo_global_root string the path of the global topology data in the global topology server
--topo_global_server_address string the address of the global topology server
--topo_implementation string the topology implementation to use
--topo_k8s_context string The kubeconfig context to use, overrides the 'current-context' from the config
--topo_k8s_kubeconfig string Path to a valid kubeconfig file. When running as a k8s pod inside the same cluster you wish to use as the topo, you may omit this and the below arguments, and Vitess is capable of auto-discovering the correct values. https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod
--topo_k8s_namespace string The kubernetes namespace to use for all objects. Default comes from the context or in-cluster config
--topo_zk_auth_file string auth to use when connecting to the zk topo server, file contents should be <scheme>:<auth>, e.g., digest:user:pass
--topo_zk_base_timeout duration zk base timeout (see zk.Connect) (default 30s)
--topo_zk_max_concurrency int maximum number of pending requests to send to a Zookeeper server. (default 64)
--topo_zk_tls_ca string the server ca to use to validate servers when connecting to the zk topo server
--topo_zk_tls_cert string the cert to use to connect to the zk topo server, requires topo_zk_tls_key, enables TLS
--topo_zk_tls_key string the key to use to connect to the zk topo server, enables TLS
-v, --v Level log level for V logs
--version print binary version
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
--wait-replicas-timeout duration Duration for which to wait for replica's to respond when issuing RPCs (default 30s)

@@ -73,7 +73,7 @@ func parseInterval(s string) (interval, error) {
// parseMysql56GTIDSet is registered as a GTIDSet parser.
//
// https://dev.mysql.com/doc/refman/5.6/en/replication-gtids-concepts.html
func parseMysql56GTIDSet(s string) (GTIDSet, error) {
func parseMysql56GTIDSet(s string) (Mysql56GTIDSet, error) {
set := Mysql56GTIDSet{}
// gtid_set: uuid_set [, uuid_set] ...
@@ -656,5 +656,23 @@ func popInterval(dst *interval, s1, s2 *[]interval) bool {
}
func init() {
gtidSetParsers[Mysql56FlavorID] = parseMysql56GTIDSet
gtidSetParsers[Mysql56FlavorID] = func(s string) (GTIDSet, error) {
return parseMysql56GTIDSet(s)
}
}
// Subtract takes in two Mysql56GTIDSets as strings and subtracts the second from the first.
// The result is also a string.
// An error is returned if parsing fails for either GTID set.
func Subtract(lhs, rhs string) (string, error) {
lhsSet, err := parseMysql56GTIDSet(lhs)
if err != nil {
return "", err
}
rhsSet, err := parseMysql56GTIDSet(rhs)
if err != nil {
return "", err
}
diffSet := lhsSet.Difference(rhsSet)
return diffSet.String(), nil
}
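Two notes on the chunk above. The `init` now wraps `parseMysql56GTIDSet` in a closure because Go function values aren't covariant in their return type: a `func(string) (Mysql56GTIDSet, error)` can't be assigned directly where a `func(string) (GTIDSet, error)` is expected. And for illustration, a minimal usage sketch of the new helper (import path assumed to be the package this diff touches); this is what lets VTOrc compute errant GTIDs internally instead of asking a live MySQL connection to do the subtraction:

```go
package main

import (
	"fmt"

	"vitess.io/vitess/go/mysql"
)

func main() {
	// Everything in the first set that is absent from the second: here,
	// the transactions a replica has executed that its source has not.
	diff, err := mysql.Subtract(
		"8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1",
		"8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8",
	)
	if err != nil {
		panic(err) // either input failed to parse as a MySQL 5.6 GTID set
	}
	fmt.Println(diff) // 8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1
}
```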

@@ -17,6 +17,7 @@ limitations under the License.
package mysql
import (
"fmt"
"reflect"
"sort"
"strings"
@@ -576,3 +577,61 @@ func TestMySQL56GTIDSetLast(t *testing.T) {
assert.Equal(t, want, got)
}
}
func TestSubtract(t *testing.T) {
tests := []struct {
name string
lhs string
rhs string
difference string
wantErr string
}{
{
name: "Extra GTID set on left side",
lhs: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1",
rhs: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8",
difference: "8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1",
}, {
name: "Extra GTID set on right side",
lhs: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8",
rhs: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1",
difference: "",
}, {
name: "Empty left side",
lhs: "",
rhs: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8",
difference: "",
}, {
name: "Empty right side",
lhs: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1",
rhs: "",
difference: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1",
}, {
name: "Equal sets",
lhs: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1",
rhs: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8,8bc65cca-3fe4-11ed-bbfb-091034d48b3e:1",
difference: "",
}, {
name: "parsing error in left set",
lhs: "incorrect set",
rhs: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8",
wantErr: `invalid MySQL 5.6 GTID set ("incorrect set"): expected uuid:interval`,
}, {
name: "parsing error in right set",
lhs: "8bc65c84-3fe4-11ed-a912-257f0fcdd6c9:1-8",
rhs: "incorrect set",
wantErr: `invalid MySQL 5.6 GTID set ("incorrect set"): expected uuid:interval`,
},
}
for _, tt := range tests {
t.Run(fmt.Sprintf("%s: %s-%s", tt.name, tt.lhs, tt.rhs), func(t *testing.T) {
got, err := Subtract(tt.lhs, tt.rhs)
if tt.wantErr != "" {
assert.EqualError(t, err, tt.wantErr)
} else {
assert.NoError(t, err)
assert.Equal(t, tt.difference, got)
}
})
}
}

@@ -54,7 +54,6 @@ type VTOrcConfiguration struct {
MySQLReplicaUser string
MySQLReplicaPassword string
RecoveryPeriodBlockSeconds int
InstancePollSeconds int
PreventCrossDataCenterPrimaryFailover bool `json:",omitempty"`
LockShardTimeoutSeconds int `json:",omitempty"`
ReplicationLagQuery string `json:",omitempty"`
@@ -76,7 +75,6 @@ func (config *VTOrcConfiguration) AddDefaults(webPort int) {
if config.RecoveryPeriodBlockSeconds == 0 {
config.RecoveryPeriodBlockSeconds = 1
}
config.InstancePollSeconds = 1
config.ListenAddress = fmt.Sprintf(":%d", webPort)
}
@@ -111,6 +109,9 @@ func (orc *VTOrcProcess) Setup() (err error) {
"--topo_global_root", orc.TopoGlobalRoot,
"--config", orc.ConfigPath,
"--port", fmt.Sprintf("%d", orc.Port),
// This parameter is overridden by the config file; it is added here just to verify that we indeed use the config file parameter over the flag
"--recovery-period-block-duration", "10h",
"--instance-poll-time", "1s",
"--orc_web_dir", path.Join(os.Getenv("VTROOT"), "web", "vtorc"),
)
if *isCoverage {

@@ -24,9 +24,9 @@ import (
"vitess.io/vitess/go/test/endtoend/cluster"
"vitess.io/vitess/go/test/endtoend/vtorc/utils"
"vitess.io/vitess/go/vt/vtorc/app"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/inst"
"vitess.io/vitess/go/vt/vtorc/server"
_ "github.com/go-sql-driver/mysql"
_ "github.com/mattn/go-sqlite3"
@@ -49,30 +49,10 @@ func TestReadTopologyInstanceBufferable(t *testing.T) {
require.NoError(t, err)
err = flag.Set("topo_global_root", clusterInfo.ClusterInstance.VtctlProcess.TopoGlobalRoot)
require.NoError(t, err)
falseVal := false
emptyVal := ""
config.Config.Debug = true
config.Config.MySQLTopologyUser = "orc_client_user"
config.Config.MySQLTopologyPassword = "orc_client_user_password"
config.Config.MySQLReplicaUser = "vt_repl"
config.Config.MySQLReplicaPassword = ""
config.Config.RecoveryPeriodBlockSeconds = 1
config.Config.InstancePollSeconds = 1
config.RuntimeCLIFlags.SkipUnresolve = &falseVal
config.RuntimeCLIFlags.SkipUnresolveCheck = &falseVal
config.RuntimeCLIFlags.Noop = &falseVal
config.RuntimeCLIFlags.BinlogFile = &emptyVal
config.RuntimeCLIFlags.Statement = &emptyVal
config.RuntimeCLIFlags.GrabElection = &falseVal
config.RuntimeCLIFlags.SkipContinuousRegistration = &falseVal
config.RuntimeCLIFlags.EnableDatabaseUpdate = &falseVal
config.RuntimeCLIFlags.IgnoreRaftSetup = &falseVal
config.RuntimeCLIFlags.Tag = &emptyVal
config.MarkConfigurationLoaded()
go func() {
app.HTTP(true)
}()
server.StartVTOrcDiscovery()
primary := utils.ShardPrimaryTablet(t, clusterInfo, keyspace, shard0)
assert.NotNil(t, primary, "should have elected a primary")
@@ -87,7 +67,7 @@ func TestReadTopologyInstanceBufferable(t *testing.T) {
primaryInstance, err := inst.ReadTopologyInstanceBufferable(&inst.InstanceKey{
Hostname: utils.Hostname,
Port: primary.MySQLPort,
}, false, nil)
}, nil)
require.NoError(t, err)
require.NotNil(t, primaryInstance)
assert.Contains(t, primaryInstance.InstanceAlias, "zone1")
@@ -119,10 +99,14 @@ func TestReadTopologyInstanceBufferable(t *testing.T) {
assert.Equal(t, primaryInstance.ReplicationSQLThreadState, inst.ReplicationThreadStateNoThread)
assert.Equal(t, fmt.Sprintf("%v:%v", keyspace.Name, shard0.Name), primaryInstance.ClusterName)
// insert an errant GTID in the replica
_, err = utils.RunSQL(t, "insert into vt_insert_test(id, msg) values (10173, 'test 178342')", replica, "vt_ks")
require.NoError(t, err)
replicaInstance, err := inst.ReadTopologyInstanceBufferable(&inst.InstanceKey{
Hostname: utils.Hostname,
Port: replica.MySQLPort,
}, false, nil)
}, nil)
require.NoError(t, err)
require.NotNil(t, replicaInstance)
assert.Contains(t, replicaInstance.InstanceAlias, "zone1")
@@ -148,7 +132,7 @@ func TestReadTopologyInstanceBufferable(t *testing.T) {
assert.NotEmpty(t, replicaInstance.ExecutedGtidSet)
assert.Contains(t, replicaInstance.ExecutedGtidSet, primaryInstance.ServerUUID)
assert.Empty(t, replicaInstance.GtidPurged)
assert.Empty(t, replicaInstance.GtidErrant)
assert.Regexp(t, ".{8}-.{4}-.{4}-.{4}-.{12}:.*", replicaInstance.GtidErrant)
assert.True(t, replicaInstance.HasReplicationCredentials)
assert.Equal(t, replicaInstance.ReplicationIOThreadState, inst.ReplicationThreadStateRunning)
assert.Equal(t, replicaInstance.ReplicationSQLThreadState, inst.ReplicationThreadStateRunning)

@@ -18,7 +18,15 @@ package config
import (
"encoding/json"
"fmt"
"net/url"
"os"
"regexp"
"strings"
"gopkg.in/gcfg.v1"
"vitess.io/vitess/go/vt/log"
)
// VTGRConfig is the config for VTGR
@@ -56,3 +64,510 @@ func ReadVTGRConfig(file string) (*VTGRConfig, error) {
}
return vtgrCfg, nil
}
/*
Everything below has been copied over from the VTOrc package
*/
var (
envVariableRegexp = regexp.MustCompile("[$][{](.*)[}]")
)
const (
DefaultStatusAPIEndpoint = "/api/status"
)
const (
MySQLTopologyMaxPoolConnections = 3
)
// Configuration makes for orchestrator configuration input, which can be provided by user via JSON formatted file.
// Some of the parameters have reasonable default values, and some (like database credentials) are
// strictly expected from user.
// TODO(sougou): change this to yaml parsing, and possible merge with tabletenv.
type Configuration struct {
Debug bool // set debug mode (similar to --debug option)
EnableSyslog bool // Should logs be directed (in addition) to syslog daemon?
ListenAddress string // Where orchestrator HTTP should listen for TCP
ListenSocket string // Where orchestrator HTTP should listen for unix socket (default: empty; when given, TCP is disabled)
HTTPAdvertise string // optional, for raft setups, what is the HTTP address this node will advertise to its peers (potentially use where behind NAT or when rerouting ports; example: "http://11.22.33.44:3030")
AgentsServerPort string // port orchestrator agents talk back to
MySQLTopologyUser string // The user VTOrc will use to connect to MySQL instances
MySQLTopologyPassword string // The password VTOrc will use to connect to MySQL instances
MySQLReplicaUser string // User to set on replica MySQL instances while configuring replication settings on them. If set, use this credential instead of discovering from mysql. TODO(sougou): deprecate this in favor of fetching from vttablet
MySQLReplicaPassword string // Password to set on replica MySQL instances while configuring replication settings on them.
MySQLTopologyCredentialsConfigFile string // my.cnf style configuration file from where to pick credentials. Expecting `user`, `password` under `[client]` section
MySQLTopologySSLPrivateKeyFile string // Private key file used to authenticate with a Topology mysql instance with TLS
MySQLTopologySSLCertFile string // Certificate PEM file used to authenticate with a Topology mysql instance with TLS
MySQLTopologySSLCAFile string // Certificate Authority PEM file used to authenticate with a Topology mysql instance with TLS
MySQLTopologySSLSkipVerify bool // If true, do not strictly validate mutual TLS certs for Topology mysql instances
MySQLTopologyUseMutualTLS bool // Turn on TLS authentication with the Topology MySQL instances
MySQLTopologyUseMixedTLS bool // Mixed TLS and non-TLS authentication with the Topology MySQL instances
TLSCacheTTLFactor uint // Factor of InstancePollSeconds that we set as TLS info cache expiry
BackendDB string // EXPERIMENTAL: type of backend db; either "mysql" or "sqlite3"
SQLite3DataFile string // when BackendDB == "sqlite3", full path to sqlite3 datafile
SkipOrchestratorDatabaseUpdate bool // When true, do not check backend database schema nor attempt to update it. Useful when you may be running multiple versions of orchestrator, and you only wish certain boxes to dictate the db structure (or else any time a different orchestrator version runs it will rebuild database schema)
PanicIfDifferentDatabaseDeploy bool // When true, and this process finds the orchestrator backend DB was provisioned by a different version, panic
RaftEnabled bool // When true, setup orchestrator in a raft consensus layout. When false (default) all Raft* variables are ignored
RaftBind string
RaftAdvertise string
RaftDataDir string
DefaultRaftPort int // if a RaftNodes entry does not specify port, use this one
RaftNodes []string // Raft nodes to make initial connection with
ExpectFailureAnalysisConcensus bool
MySQLOrchestratorHost string
MySQLOrchestratorMaxPoolConnections int // The maximum size of the connection pool to the Orchestrator backend.
MySQLOrchestratorPort uint
MySQLOrchestratorDatabase string
MySQLOrchestratorUser string
MySQLOrchestratorPassword string
MySQLOrchestratorCredentialsConfigFile string // my.cnf style configuration file from where to pick credentials. Expecting `user`, `password` under `[client]` section
MySQLOrchestratorSSLPrivateKeyFile string // Private key file used to authenticate with the Orchestrator mysql instance with TLS
MySQLOrchestratorSSLCertFile string // Certificate PEM file used to authenticate with the Orchestrator mysql instance with TLS
MySQLOrchestratorSSLCAFile string // Certificate Authority PEM file used to authenticate with the Orchestrator mysql instance with TLS
MySQLOrchestratorSSLSkipVerify bool // If true, do not strictly validate mutual TLS certs for the Orchestrator mysql instances
MySQLOrchestratorUseMutualTLS bool // Turn on TLS authentication with the Orchestrator MySQL instance
MySQLOrchestratorReadTimeoutSeconds int // Number of seconds before backend mysql read operation is aborted (driver-side)
MySQLOrchestratorRejectReadOnly bool // Reject read only connections https://github.com/go-sql-driver/mysql#rejectreadonly
MySQLConnectTimeoutSeconds int // Number of seconds before connection is aborted (driver-side)
MySQLDiscoveryReadTimeoutSeconds int // Number of seconds before topology mysql read operation is aborted (driver-side). Used for discovery queries.
MySQLTopologyReadTimeoutSeconds int // Number of seconds before topology mysql read operation is aborted (driver-side). Used for all but discovery queries.
MySQLConnectionLifetimeSeconds int // Number of seconds the mysql driver will keep database connection alive before recycling it
DefaultInstancePort int // In case port was not specified on command line
ReplicationLagQuery string // custom query to check on replica lag (e.g. heartbeat table). Must return a single row with a single numeric column, which is the lag.
ReplicationCredentialsQuery string // custom query to get replication credentials. Must return a single row, with two text columns: 1st is username, 2nd is password. This is optional, and can be used by orchestrator to configure replication after primary takeover or setup of co-primary. You need to ensure the orchestrator user has the privileges to run this query
DiscoverByShowSlaveHosts bool // Attempt SHOW SLAVE HOSTS before PROCESSLIST
UseSuperReadOnly bool // Should orchestrator set super_read_only any time it sets read_only
InstancePollSeconds uint // Number of seconds between instance reads
InstanceWriteBufferSize int // Instance write buffer size (max number of instances to flush in one INSERT ODKU)
BufferInstanceWrites bool // Set to 'true' for write-optimization on backend table (compromise: writes can be stale and overwrite non-stale data)
InstanceFlushIntervalMilliseconds int // Max interval between instance write buffer flushes
UnseenInstanceForgetHours uint // Number of hours after which an unseen instance is forgotten
SnapshotTopologiesIntervalHours uint // Interval in hours between snapshot-topologies invocations. Default: 0 (disabled)
DiscoveryMaxConcurrency uint // Number of goroutines doing hosts discovery
DiscoveryQueueCapacity uint // Buffer size of the discovery queue. Should be greater than the number of DB instances being discovered
DiscoveryQueueMaxStatisticsSize int // The maximum number of individual secondly statistics taken of the discovery queue
DiscoveryCollectionRetentionSeconds uint // Number of seconds to retain the discovery collection information
DiscoverySeeds []string // Hard coded array of hostname:port, ensuring orchestrator discovers these hosts upon startup, assuming not already known to orchestrator
InstanceBulkOperationsWaitTimeoutSeconds uint // Time to wait on a single instance when doing bulk (many instances) operation
HostnameResolveMethod string // Method by which to "normalize" hostname ("none"/"default"/"cname")
MySQLHostnameResolveMethod string // Method by which to "normalize" hostname via MySQL server. ("none"/"@@hostname"/"@@report_host"; default "@@hostname")
SkipBinlogServerUnresolveCheck bool // Skip the double-check that an unresolved hostname resolves back to same hostname for binlog servers
ExpiryHostnameResolvesMinutes int // Number of minutes after which to expire hostname-resolves
RejectHostnameResolvePattern string // Regexp pattern for resolved hostname that will not be accepted (not cached, not written to db). This is done to avoid storing wrong resolves due to network glitches.
ReasonableReplicationLagSeconds int // Above this value is considered a problem
ProblemIgnoreHostnameFilters []string // Will minimize problem visualization for hostnames matching given regexp filters
VerifyReplicationFilters bool // Include replication filters check before approving topology refactoring
ReasonableMaintenanceReplicationLagSeconds int // Above this value move-up and move-below are blocked
CandidateInstanceExpireMinutes uint // Minutes after which a suggestion to use an instance as a candidate replica (to be preferably promoted on primary failover) is expired.
AuditLogFile string // Name of log file for audit operations. Disabled when empty.
AuditToSyslog bool // If true, audit messages are written to syslog
AuditToBackendDB bool // If true, audit messages are written to the backend DB's `audit` table (default: false)
AuditPurgeDays uint // Days after which audit entries are purged from the database
RemoveTextFromHostnameDisplay string // Text to strip off the hostname on cluster/clusters pages
ReadOnly bool
AuthenticationMethod string // Type of authentication to use, if any. "" for none, "basic" for BasicAuth, "multi" for advanced BasicAuth, "proxy" for forwarded credentials via reverse proxy, "token" for token based access
OAuthClientID string
OAuthClientSecret string
OAuthScopes []string
HTTPAuthUser string // Username for HTTP Basic authentication (blank disables authentication)
HTTPAuthPassword string // Password for HTTP Basic authentication
AuthUserHeader string // HTTP header indicating auth user, when AuthenticationMethod is "proxy"
PowerAuthUsers []string // On AuthenticationMethod == "proxy", list of users that can make changes. All others are read-only.
PowerAuthGroups []string // list of unix groups the authenticated user must be a member of to make changes.
AccessTokenUseExpirySeconds uint // Time by which an issued token must be used
AccessTokenExpiryMinutes uint // Time after which HTTP access token expires
ClusterNameToAlias map[string]string // map between regex matching cluster name to a human friendly alias
DetectClusterAliasQuery string // Optional query (executed on topology instance) that returns the alias of a cluster. Query will only be executed on cluster primary (though until the topology's primary is resolved it may execute on other/all replicas). If provided, must return one row, one column
DetectClusterDomainQuery string // Optional query (executed on topology instance) that returns the VIP/CNAME/Alias/whatever domain name for the primary of this cluster. Query will only be executed on cluster primary (though until the topology's primary is resolved it may execute on other/all replicas). If provided, must return one row, one column
DetectInstanceAliasQuery string // Optional query (executed on topology instance) that returns the alias of an instance. If provided, must return one row, one column
DetectPromotionRuleQuery string // Optional query (executed on topology instance) that returns the promotion rule of an instance. If provided, must return one row, one column.
DataCenterPattern string // Regexp pattern with one group, extracting the datacenter name from the hostname
RegionPattern string // Regexp pattern with one group, extracting the region name from the hostname
PhysicalEnvironmentPattern string // Regexp pattern with one group, extracting physical environment info from hostname (e.g. combination of datacenter & prod/dev env)
DetectDataCenterQuery string // Optional query (executed on topology instance) that returns the data center of an instance. If provided, must return one row, one column. Overrides DataCenterPattern and useful for installations where DC cannot be inferred by hostname
DetectRegionQuery string // Optional query (executed on topology instance) that returns the region of an instance. If provided, must return one row, one column. Overrides RegionPattern and useful for installations where Region cannot be inferred by hostname
DetectPhysicalEnvironmentQuery string // Optional query (executed on topology instance) that returns the physical environment of an instance. If provided, must return one row, one column. Overrides PhysicalEnvironmentPattern and useful for installations where env cannot be inferred by hostname
DetectSemiSyncEnforcedQuery string // Optional query (executed on topology instance) to determine whether semi-sync is fully enforced for primary writes (async fallback is not allowed under any circumstance). If provided, must return one row, one column, value 0 or 1.
SupportFuzzyPoolHostnames bool // Should "submit-pool-instances" command be able to pass list of fuzzy instances (fuzzy means non-fqdn, but unique enough to recognize). Defaults 'true', implies more queries on backend db
InstancePoolExpiryMinutes uint // Time after which entries in database_instance_pool are expired (resubmit via `submit-pool-instances`)
PromotionIgnoreHostnameFilters []string // Orchestrator will not promote replicas with hostname matching pattern (via -c recovery; for example, avoid promoting dev-dedicated machines)
ServeAgentsHTTP bool // Spawn another HTTP interface dedicated for orchestrator-agent
AgentsUseSSL bool // When "true" orchestrator will listen on agents port with SSL as well as connect to agents via SSL
AgentsUseMutualTLS bool // When "true" Use mutual TLS for the server to agent communication
AgentSSLSkipVerify bool // When using SSL for the Agent, should we ignore SSL certification error
AgentSSLPrivateKeyFile string // Name of Agent SSL private key file, applies only when AgentsUseSSL = true
AgentSSLCertFile string // Name of Agent SSL certification file, applies only when AgentsUseSSL = true
AgentSSLCAFile string // Name of the Agent Certificate Authority file, applies only when AgentsUseSSL = true
AgentSSLValidOUs []string // Valid organizational units when using mutual TLS to communicate with the agents
UseSSL bool // Use SSL on the server web port
UseMutualTLS bool // When "true" Use mutual TLS for the server's web and API connections
SSLSkipVerify bool // When using SSL, should we ignore SSL certification error
SSLPrivateKeyFile string // Name of SSL private key file, applies only when UseSSL = true
SSLCertFile string // Name of SSL certification file, applies only when UseSSL = true
SSLCAFile string // Name of the Certificate Authority file, applies only when UseSSL = true
SSLValidOUs []string // Valid organizational units when using mutual TLS
StatusEndpoint string // Override the status endpoint. Defaults to '/api/status'
StatusOUVerify bool // If true, try to verify OUs when Mutual TLS is on. Defaults to false
AgentPollMinutes uint // Minutes between agent polling
UnseenAgentForgetHours uint // Number of hours after which an unseen agent is forgotten
StaleSeedFailMinutes uint // Number of minutes after which a stale (no progress) seed is considered failed.
SeedAcceptableBytesDiff int64 // Difference in bytes between seed source & target data size that is still considered as successful copy
SeedWaitSecondsBeforeSend int64 // Number of seconds to wait before starting the send-data command on an agent
BinlogEventsChunkSize int // Chunk size (X) for SHOW BINLOG|RELAYLOG EVENTS LIMIT ?,X statements. Smaller means less locking and more work to be done
ReduceReplicationAnalysisCount bool // When true, replication analysis will only report instances where a handled problem is possible in the first place (e.g. will not report most leaf nodes, that are mostly uninteresting). When false, provides an entry for every known instance
FailureDetectionPeriodBlockMinutes int // The time for which an instance's failure discovery is kept "active", so as to avoid concurrent "discoveries" of the instance's failure; this precedes any recovery process, if any.
RecoveryPeriodBlockMinutes int // (supported for backwards compatibility but please use newer `RecoveryPeriodBlockSeconds` instead) The time for which an instance's recovery is kept "active", so as to avoid concurrent recoveries on the same instance as well as flapping
RecoveryPeriodBlockSeconds int // (overrides `RecoveryPeriodBlockMinutes`) The time for which an instance's recovery is kept "active", so as to avoid concurrent recoveries on the same instance as well as flapping
RecoveryIgnoreHostnameFilters []string // Recovery analysis will completely ignore hosts matching given patterns
RecoverPrimaryClusterFilters []string // Only do primary recovery on clusters matching these regexp patterns (of course the ".*" pattern matches everything)
RecoverIntermediatePrimaryClusterFilters []string // Only do IM recovery on clusters matching these regexp patterns (of course the ".*" pattern matches everything)
ProcessesShellCommand string // Shell that executes command scripts
OnFailureDetectionProcesses []string // Processes to execute when detecting a failover scenario (before making a decision whether to failover or not). May and should use some of these placeholders: {failureType}, {instanceType}, {isPrimary}, {isCoPrimary}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}, {autoPrimaryRecovery}, {autoIntermediatePrimaryRecovery}
PreFailoverProcesses []string // Processes to execute before doing a failover (aborting the operation should any one of them exit with a non-zero code; order of execution undefined). May and should use some of these placeholders: {failureType}, {instanceType}, {isPrimary}, {isCoPrimary}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterDomain}, {failedPort}, {countReplicas}, {replicaHosts}, {isDowntimed}
PostFailoverProcesses []string // Processes to execute after doing a failover (order of execution undefined). May and should use some of these placeholders: {failureType}, {instanceType}, {isPrimary}, {isCoPrimary}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}, {isSuccessful}, {lostReplicas}, {countLostReplicas}
PostUnsuccessfulFailoverProcesses []string // Processes to execute after a not-completely-successful failover (order of execution undefined). May and should use some of these placeholders: {failureType}, {instanceType}, {isPrimary}, {isCoPrimary}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}, {isSuccessful}, {lostReplicas}, {countLostReplicas}
PostPrimaryFailoverProcesses []string // Processes to execute after doing a primary failover (order of execution undefined). Uses same placeholders as PostFailoverProcesses
PostIntermediatePrimaryFailoverProcesses []string // Processes to execute after doing an intermediate primary failover (order of execution undefined). Uses same placeholders as PostFailoverProcesses
PostTakePrimaryProcesses []string // Processes to execute after a successful Take-Primary event has taken place
CoPrimaryRecoveryMustPromoteOtherCoPrimary bool // When 'false', anything can get promoted (and candidates are preferred over others). When 'true', orchestrator will promote the other co-primary or else fail
DetachLostReplicasAfterPrimaryFailover bool // Should replicas that are to be lost in primary recovery (i.e. were more up-to-date than promoted replica) be forcibly detached
ApplyMySQLPromotionAfterPrimaryFailover bool // Should orchestrator take upon itself to apply MySQL primary promotion: set read_only=0, detach replication, etc.
PreventCrossDataCenterPrimaryFailover bool // When true (default: false), cross-DC primary failovers are not allowed, orchestrator will do all it can to only fail over within same DC, or else not fail over at all.
PreventCrossRegionPrimaryFailover bool // When true (default: false), cross-region primary failovers are not allowed, orchestrator will do all it can to only fail over within same region, or else not fail over at all.
PrimaryFailoverLostInstancesDowntimeMinutes uint // Number of minutes to downtime any server that was lost after a primary failover (including failed primary & lost replicas). 0 to disable
PrimaryFailoverDetachReplicaPrimaryHost bool // Should orchestrator issue a detach-replica-primary-host on newly promoted primary (this makes sure the new primary will not attempt to replicate old primary if that comes back to life). Defaults 'false'. Meaningless if ApplyMySQLPromotionAfterPrimaryFailover is 'true'.
FailPrimaryPromotionOnLagMinutes uint // when > 0, fail a primary promotion if the candidate replica is lagging >= configured number of minutes.
FailPrimaryPromotionIfSQLThreadNotUpToDate bool // when true, and a primary failover takes place, if candidate primary has not consumed all relay logs, promotion is aborted with error
DelayPrimaryPromotionIfSQLThreadNotUpToDate bool // when true, and a primary failover takes place, if candidate primary has not consumed all relay logs, delay promotion until the sql thread has caught up
PostponeReplicaRecoveryOnLagMinutes uint // On crash recovery, replicas that are lagging more than given minutes are only resurrected late in the recovery process, after primary/IM has been elected and processes executed. Value of 0 disables this feature
OSCIgnoreHostnameFilters []string // OSC replicas recommendation will ignore replica hostnames matching given patterns
URLPrefix string // URL prefix to run orchestrator on non-root web path, e.g. /orchestrator to put it behind nginx.
DiscoveryIgnoreReplicaHostnameFilters []string // Regexp filters to apply to prevent auto-discovering new replicas. Usage: unreachable servers due to firewalls, applications which trigger binlog dumps
DiscoveryIgnorePrimaryHostnameFilters []string // Regexp filters to apply to prevent auto-discovering a primary. Usage: pointing your primary temporarily to replicate some data from external host
DiscoveryIgnoreHostnameFilters []string // Regexp filters to apply to prevent discovering instances of any kind
WebMessage string // If provided, will be shown on all web pages below the title bar
MaxConcurrentReplicaOperations int // Maximum number of concurrent operations on replicas
InstanceDBExecContextTimeoutSeconds int // Timeout on context used while calling ExecContext on instance database
LockShardTimeoutSeconds int // Timeout on context used to lock shard. Should be a small value because we should fail-fast
WaitReplicasTimeoutSeconds int // Timeout on amount of time to wait for the replicas in case of ERS. Should be a small value because we should fail-fast. Should not be larger than LockShardTimeoutSeconds since that is the total time we use for an ERS.
}
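// For illustration, the config file is plain JSON keyed by the field names
// above; unspecified fields keep the defaults assigned in newConfiguration
// below. A hypothetical minimal example:
//
//	{
//	  "Debug": true,
//	  "ListenAddress": ":3000",
//	  "BackendDB": "sqlite",
//	  "SQLite3DataFile": "/tmp/vtorc.sqlite3",
//	  "RecoveryPeriodBlockSeconds": 300
//	}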
// ToJSONString will marshal this configuration as JSON
func (config *Configuration) ToJSONString() string {
b, _ := json.Marshal(config)
return string(b)
}
// Config is *the* configuration instance, used globally to get configuration data
var Config = newConfiguration()
var readFileNames []string
func newConfiguration() *Configuration {
return &Configuration{
Debug: false,
EnableSyslog: false,
ListenAddress: ":3000",
ListenSocket: "",
HTTPAdvertise: "",
AgentsServerPort: ":3001",
StatusEndpoint: DefaultStatusAPIEndpoint,
StatusOUVerify: false,
BackendDB: "sqlite",
SQLite3DataFile: "file::memory:?mode=memory&cache=shared",
SkipOrchestratorDatabaseUpdate: false,
PanicIfDifferentDatabaseDeploy: false,
RaftBind: "127.0.0.1:10008",
RaftAdvertise: "",
RaftDataDir: "",
DefaultRaftPort: 10008,
RaftNodes: []string{},
ExpectFailureAnalysisConcensus: true,
MySQLOrchestratorMaxPoolConnections: 128, // limit concurrent conns to backend DB
MySQLOrchestratorPort: 3306,
MySQLTopologyUseMutualTLS: false,
MySQLTopologyUseMixedTLS: true,
MySQLOrchestratorUseMutualTLS: false,
MySQLConnectTimeoutSeconds: 2,
MySQLOrchestratorReadTimeoutSeconds: 30,
MySQLOrchestratorRejectReadOnly: false,
MySQLDiscoveryReadTimeoutSeconds: 10,
MySQLTopologyReadTimeoutSeconds: 600,
MySQLConnectionLifetimeSeconds: 0,
DefaultInstancePort: 3306,
TLSCacheTTLFactor: 100,
InstancePollSeconds: 5,
InstanceWriteBufferSize: 100,
BufferInstanceWrites: false,
InstanceFlushIntervalMilliseconds: 100,
UnseenInstanceForgetHours: 240,
SnapshotTopologiesIntervalHours: 0,
DiscoverByShowSlaveHosts: false,
UseSuperReadOnly: false,
DiscoveryMaxConcurrency: 300,
DiscoveryQueueCapacity: 100000,
DiscoveryQueueMaxStatisticsSize: 120,
DiscoveryCollectionRetentionSeconds: 120,
DiscoverySeeds: []string{},
InstanceBulkOperationsWaitTimeoutSeconds: 10,
HostnameResolveMethod: "default",
MySQLHostnameResolveMethod: "none",
SkipBinlogServerUnresolveCheck: true,
ExpiryHostnameResolvesMinutes: 60,
RejectHostnameResolvePattern: "",
ReasonableReplicationLagSeconds: 10,
ProblemIgnoreHostnameFilters: []string{},
VerifyReplicationFilters: false,
ReasonableMaintenanceReplicationLagSeconds: 20,
CandidateInstanceExpireMinutes: 60,
AuditLogFile: "",
AuditToSyslog: false,
AuditToBackendDB: false,
AuditPurgeDays: 7,
RemoveTextFromHostnameDisplay: "",
ReadOnly: false,
AuthenticationMethod: "",
HTTPAuthUser: "",
HTTPAuthPassword: "",
AuthUserHeader: "X-Forwarded-User",
PowerAuthUsers: []string{"*"},
PowerAuthGroups: []string{},
AccessTokenUseExpirySeconds: 60,
AccessTokenExpiryMinutes: 1440,
ClusterNameToAlias: make(map[string]string),
DetectClusterAliasQuery: "",
DetectClusterDomainQuery: "",
DetectInstanceAliasQuery: "",
DetectPromotionRuleQuery: "",
DataCenterPattern: "",
PhysicalEnvironmentPattern: "",
DetectDataCenterQuery: "",
DetectPhysicalEnvironmentQuery: "",
DetectSemiSyncEnforcedQuery: "",
SupportFuzzyPoolHostnames: true,
InstancePoolExpiryMinutes: 60,
PromotionIgnoreHostnameFilters: []string{},
ServeAgentsHTTP: false,
AgentsUseSSL: false,
AgentsUseMutualTLS: false,
AgentSSLValidOUs: []string{},
AgentSSLSkipVerify: false,
AgentSSLPrivateKeyFile: "",
AgentSSLCertFile: "",
AgentSSLCAFile: "",
UseSSL: false,
UseMutualTLS: false,
SSLValidOUs: []string{},
SSLSkipVerify: false,
SSLPrivateKeyFile: "",
SSLCertFile: "",
SSLCAFile: "",
AgentPollMinutes: 60,
UnseenAgentForgetHours: 6,
StaleSeedFailMinutes: 60,
SeedAcceptableBytesDiff: 8192,
SeedWaitSecondsBeforeSend: 2,
BinlogEventsChunkSize: 10000,
ReduceReplicationAnalysisCount: true,
FailureDetectionPeriodBlockMinutes: 60,
RecoveryPeriodBlockMinutes: 60,
RecoveryPeriodBlockSeconds: 3600,
RecoveryIgnoreHostnameFilters: []string{},
RecoverPrimaryClusterFilters: []string{"*"},
RecoverIntermediatePrimaryClusterFilters: []string{},
ProcessesShellCommand: "bash",
OnFailureDetectionProcesses: []string{},
PreFailoverProcesses: []string{},
PostPrimaryFailoverProcesses: []string{},
PostIntermediatePrimaryFailoverProcesses: []string{},
PostFailoverProcesses: []string{},
PostUnsuccessfulFailoverProcesses: []string{},
PostTakePrimaryProcesses: []string{},
CoPrimaryRecoveryMustPromoteOtherCoPrimary: true,
DetachLostReplicasAfterPrimaryFailover: true,
ApplyMySQLPromotionAfterPrimaryFailover: true,
PreventCrossDataCenterPrimaryFailover: false,
PreventCrossRegionPrimaryFailover: false,
PrimaryFailoverLostInstancesDowntimeMinutes: 0,
PrimaryFailoverDetachReplicaPrimaryHost: false,
FailPrimaryPromotionOnLagMinutes: 0,
FailPrimaryPromotionIfSQLThreadNotUpToDate: false,
DelayPrimaryPromotionIfSQLThreadNotUpToDate: true,
PostponeReplicaRecoveryOnLagMinutes: 0,
OSCIgnoreHostnameFilters: []string{},
URLPrefix: "",
DiscoveryIgnoreReplicaHostnameFilters: []string{},
WebMessage: "",
MaxConcurrentReplicaOperations: 5,
InstanceDBExecContextTimeoutSeconds: 30,
LockShardTimeoutSeconds: 30,
WaitReplicasTimeoutSeconds: 30,
}
}
func (config *Configuration) postReadAdjustments() error {
if config.MySQLOrchestratorCredentialsConfigFile != "" {
mySQLConfig := struct {
Client struct {
User string
Password string
}
}{}
err := gcfg.ReadFileInto(&mySQLConfig, config.MySQLOrchestratorCredentialsConfigFile)
if err != nil {
log.Fatalf("Failed to parse gcfg data from file: %+v", err)
} else {
log.Infof("Parsed orchestrator credentials from %s", config.MySQLOrchestratorCredentialsConfigFile)
config.MySQLOrchestratorUser = mySQLConfig.Client.User
config.MySQLOrchestratorPassword = mySQLConfig.Client.Password
}
}
{
// We accept password in the form "${SOME_ENV_VARIABLE}" in which case we pull
// the given variable from os env
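// (e.g. "MySQLOrchestratorPassword": "${ORC_DB_PASSWORD}" resolves to the
// value of the ORC_DB_PASSWORD environment variable; the variable name here
// is a hypothetical illustration)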
submatch := envVariableRegexp.FindStringSubmatch(config.MySQLOrchestratorPassword)
if len(submatch) > 1 {
config.MySQLOrchestratorPassword = os.Getenv(submatch[1])
}
}
if config.MySQLTopologyCredentialsConfigFile != "" {
mySQLConfig := struct {
Client struct {
User string
Password string
}
}{}
err := gcfg.ReadFileInto(&mySQLConfig, config.MySQLTopologyCredentialsConfigFile)
if err != nil {
log.Fatalf("Failed to parse gcfg data from file: %+v", err)
} else {
log.Infof("Parsed topology credentials from %s", config.MySQLTopologyCredentialsConfigFile)
config.MySQLTopologyUser = mySQLConfig.Client.User
config.MySQLTopologyPassword = mySQLConfig.Client.Password
}
}
{
// We accept password in the form "${SOME_ENV_VARIABLE}" in which case we pull
// the given variable from os env
submatch := envVariableRegexp.FindStringSubmatch(config.MySQLTopologyPassword)
if len(submatch) > 1 {
config.MySQLTopologyPassword = os.Getenv(submatch[1])
}
}
if config.RecoveryPeriodBlockSeconds == 0 && config.RecoveryPeriodBlockMinutes > 0 {
// RecoveryPeriodBlockSeconds is a newer addition that overrides RecoveryPeriodBlockMinutes
// The code no longer considers RecoveryPeriodBlockMinutes, but it is
// still supported in the config file for backwards compatibility
config.RecoveryPeriodBlockSeconds = config.RecoveryPeriodBlockMinutes * 60
}
if config.FailPrimaryPromotionIfSQLThreadNotUpToDate && config.DelayPrimaryPromotionIfSQLThreadNotUpToDate {
return fmt.Errorf("Cannot have both FailPrimaryPromotionIfSQLThreadNotUpToDate and DelayPrimaryPromotionIfSQLThreadNotUpToDate enabled")
}
if config.FailPrimaryPromotionOnLagMinutes > 0 && config.ReplicationLagQuery == "" {
return fmt.Errorf("nonzero FailPrimaryPromotionOnLagMinutes requires ReplicationLagQuery to be set")
}
if config.URLPrefix != "" {
// Ensure the prefix starts with "/" and has no trailing one.
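// (e.g. "orchestrator/" becomes "/orchestrator")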
config.URLPrefix = strings.TrimLeft(config.URLPrefix, "/")
config.URLPrefix = strings.TrimRight(config.URLPrefix, "/")
config.URLPrefix = "/" + config.URLPrefix
}
if config.IsSQLite() && config.SQLite3DataFile == "" {
return fmt.Errorf("SQLite3DataFile must be set when BackendDB is sqlite3")
}
if config.RaftEnabled && config.RaftDataDir == "" {
return fmt.Errorf("RaftDataDir must be defined since raft is enabled (RaftEnabled)")
}
if config.RaftEnabled && config.RaftBind == "" {
return fmt.Errorf("RaftBind must be defined since raft is enabled (RaftEnabled)")
}
if config.RaftAdvertise == "" {
config.RaftAdvertise = config.RaftBind
}
if config.HTTPAdvertise != "" {
u, err := url.Parse(config.HTTPAdvertise)
if err != nil {
return fmt.Errorf("Failed parsing HTTPAdvertise %s: %s", config.HTTPAdvertise, err.Error())
}
if u.Scheme == "" {
return fmt.Errorf("If specified, HTTPAdvertise must include scheme (http:// or https://)")
}
if u.Hostname() == "" {
return fmt.Errorf("If specified, HTTPAdvertise must include host name")
}
if u.Port() == "" {
return fmt.Errorf("If specified, HTTPAdvertise must include port number")
}
if u.Path != "" {
return fmt.Errorf("If specified, HTTPAdvertise must not specify a path")
}
}
if config.InstanceWriteBufferSize <= 0 {
config.BufferInstanceWrites = false
}
return nil
}
func (config *Configuration) IsSQLite() bool {
return strings.Contains(config.BackendDB, "sqlite")
}
func (config *Configuration) IsMySQL() bool {
return config.BackendDB == "mysql" || config.BackendDB == ""
}
// read reads configuration from the given file, which is expected to be in valid
// JSON format, or the function bails out.
func read(fileName string) (*Configuration, error) {
if fileName == "" {
return Config, fmt.Errorf("Empty file name")
}
file, err := os.Open(fileName)
if err != nil {
return Config, err
}
decoder := json.NewDecoder(file)
err = decoder.Decode(Config)
if err == nil {
log.Infof("Read config: %s", fileName)
} else {
log.Fatal("Cannot read config file:", fileName, err)
}
if err := Config.postReadAdjustments(); err != nil {
log.Fatal(err)
}
return Config, err
}
// ForceRead reads configuration from given file name or bails out if it fails
func ForceRead(fileName string) *Configuration {
_, err := read(fileName)
if err != nil {
log.Fatal("Cannot read config file:", fileName, err)
}
readFileNames = []string{fileName}
return Config
}
// CLIFlags stores some command line flags that are globally available in the process' lifetime
type CLIFlags struct {
Noop *bool
SkipUnresolve *bool
SkipUnresolveCheck *bool
BinlogFile *string
GrabElection *bool
Version *bool
Statement *string
PromotionRule *string
ConfiguredVersion string
SkipContinuousRegistration *bool
EnableDatabaseUpdate *bool
IgnoreRaftSetup *bool
Tag *string
}
var RuntimeCLIFlags CLIFlags
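A minimal sketch of how this package might be consumed (assuming the vtgr/config import path used elsewhere in this diff; the config file path is hypothetical):

package main

import (
	"fmt"

	"vitess.io/vitess/go/vt/vtgr/config"
)

func main() {
	// ForceRead bails out (log.Fatal) if the file cannot be read or parsed.
	// By this point postReadAdjustments has run: "${ENV_VAR}"-style passwords
	// are resolved and RecoveryPeriodBlockMinutes has been folded into
	// RecoveryPeriodBlockSeconds.
	cfg := config.ForceRead("/etc/vtgr/config.json")
	fmt.Println(cfg.ToJSONString())
}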


@@ -35,7 +35,7 @@ import (
"vitess.io/vitess/go/vt/vtctl/grpcvtctldserver/testutil"
"vitess.io/vitess/go/vt/vtgr/config"
"vitess.io/vitess/go/vt/vtgr/db"
- "vitess.io/vitess/go/vt/vtorc/inst"
+ "vitess.io/vitess/go/vt/vtgr/inst"
topodatapb "vitess.io/vitess/go/vt/proto/topodata"
)


@@ -28,8 +28,8 @@ import (
"vitess.io/vitess/go/stats"
"vitess.io/vitess/go/vt/servenv"
"vitess.io/vitess/go/vt/vtgr/db"
+ "vitess.io/vitess/go/vt/vtgr/inst"
"vitess.io/vitess/go/vt/vtgr/log"
- "vitess.io/vitess/go/vt/vtorc/inst"
)
var (


@@ -23,7 +23,7 @@ import (
"vitess.io/vitess/go/vt/vtgr/log"
"vitess.io/vitess/go/vt/vtgr/db"
- "vitess.io/vitess/go/vt/vtorc/inst"
+ "vitess.io/vitess/go/vt/vtgr/inst"
"github.com/stretchr/testify/assert"
)


@@ -33,8 +33,8 @@ import (
"vitess.io/vitess/go/vt/topo"
"vitess.io/vitess/go/vt/vtgr/config"
"vitess.io/vitess/go/vt/vtgr/db"
+ "vitess.io/vitess/go/vt/vtgr/inst"
"vitess.io/vitess/go/vt/vtgr/log"
- "vitess.io/vitess/go/vt/vtorc/inst"
)
var (


@@ -34,7 +34,7 @@ import (
"vitess.io/vitess/go/vt/vtctl/grpcvtctldserver/testutil"
"vitess.io/vitess/go/vt/vtgr/config"
"vitess.io/vitess/go/vt/vtgr/db"
- "vitess.io/vitess/go/vt/vtorc/inst"
+ "vitess.io/vitess/go/vt/vtgr/inst"
gomock "github.com/golang/mock/gomock"
"github.com/stretchr/testify/assert"

go/vt/vtgr/db/db.go (new file, 381 lines)

@@ -0,0 +1,381 @@
/*
Copyright 2014 Outbrain Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
This file has been copied over from VTOrc package
*/
package db
import (
"database/sql"
"fmt"
"strings"
"sync"
"time"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/vtgr/config"
"vitess.io/vitess/go/vt/vtgr/external/golib/sqlutils"
)
var (
EmptyArgs []any
Db DB = (*vtorcDB)(nil)
)
var mysqlURI string
var dbMutex sync.Mutex
type DB interface {
QueryOrchestrator(query string, argsArray []any, onRow func(sqlutils.RowMap) error) error
}
type vtorcDB struct {
}
var _ DB = (*vtorcDB)(nil)
func (m *vtorcDB) QueryOrchestrator(query string, argsArray []any, onRow func(sqlutils.RowMap) error) error {
return QueryOrchestrator(query, argsArray, onRow)
}
type DummySQLResult struct {
}
func (dummyRes DummySQLResult) LastInsertId() (int64, error) {
return 0, nil
}
func (dummyRes DummySQLResult) RowsAffected() (int64, error) {
return 1, nil
}
func getMySQLURI() string {
dbMutex.Lock()
defer dbMutex.Unlock()
if mysqlURI != "" {
return mysqlURI
}
mysqlURI = fmt.Sprintf("%s:%s@tcp(%s:%d)/%s?timeout=%ds&readTimeout=%ds&rejectReadOnly=%t&interpolateParams=true",
config.Config.MySQLOrchestratorUser,
config.Config.MySQLOrchestratorPassword,
config.Config.MySQLOrchestratorHost,
config.Config.MySQLOrchestratorPort,
config.Config.MySQLOrchestratorDatabase,
config.Config.MySQLConnectTimeoutSeconds,
config.Config.MySQLOrchestratorReadTimeoutSeconds,
config.Config.MySQLOrchestratorRejectReadOnly,
)
if config.Config.MySQLOrchestratorUseMutualTLS {
mysqlURI, _ = SetupMySQLOrchestratorTLS(mysqlURI)
}
return mysqlURI
}
// OpenDiscovery returns a DB instance to access a topology instance.
// It has lower read timeout than OpenTopology and is intended to
// be used with low-latency discovery queries.
func OpenDiscovery(host string, port int) (*sql.DB, error) {
return openTopology(host, port, config.Config.MySQLDiscoveryReadTimeoutSeconds)
}
// OpenTopology returns a DB instance to access a topology instance.
func OpenTopology(host string, port int) (*sql.DB, error) {
return openTopology(host, port, config.Config.MySQLTopologyReadTimeoutSeconds)
}
func openTopology(host string, port int, readTimeout int) (db *sql.DB, err error) {
uri := fmt.Sprintf("%s:%s@tcp(%s:%d)/?timeout=%ds&readTimeout=%ds&interpolateParams=true",
config.Config.MySQLTopologyUser,
config.Config.MySQLTopologyPassword,
host, port,
config.Config.MySQLConnectTimeoutSeconds,
readTimeout,
)
if config.Config.MySQLTopologyUseMutualTLS ||
(config.Config.MySQLTopologyUseMixedTLS && requiresTLS(host, port, uri)) {
if uri, err = SetupMySQLTopologyTLS(uri); err != nil {
return nil, err
}
}
if db, _, err = sqlutils.GetDB(uri); err != nil {
return nil, err
}
if config.Config.MySQLConnectionLifetimeSeconds > 0 {
db.SetConnMaxLifetime(time.Duration(config.Config.MySQLConnectionLifetimeSeconds) * time.Second)
}
db.SetMaxOpenConns(config.MySQLTopologyMaxPoolConnections)
db.SetMaxIdleConns(config.MySQLTopologyMaxPoolConnections)
return db, err
}
func openOrchestratorMySQLGeneric() (db *sql.DB, fromCache bool, err error) {
uri := fmt.Sprintf("%s:%s@tcp(%s:%d)/?timeout=%ds&readTimeout=%ds&interpolateParams=true",
config.Config.MySQLOrchestratorUser,
config.Config.MySQLOrchestratorPassword,
config.Config.MySQLOrchestratorHost,
config.Config.MySQLOrchestratorPort,
config.Config.MySQLConnectTimeoutSeconds,
config.Config.MySQLOrchestratorReadTimeoutSeconds,
)
if config.Config.MySQLOrchestratorUseMutualTLS {
uri, _ = SetupMySQLOrchestratorTLS(uri)
}
return sqlutils.GetDB(uri)
}
func IsSQLite() bool {
return config.Config.IsSQLite()
}
// OpenOrchestrator returns the DB instance for the orchestrator backend database
func OpenOrchestrator() (db *sql.DB, err error) {
var fromCache bool
if IsSQLite() {
db, fromCache, err = sqlutils.GetSQLiteDB(config.Config.SQLite3DataFile)
if err == nil && !fromCache {
log.Infof("Connected to orchestrator backend: sqlite on %v", config.Config.SQLite3DataFile)
}
if db != nil {
db.SetMaxOpenConns(1)
db.SetMaxIdleConns(1)
}
} else {
if db, fromCache, err := openOrchestratorMySQLGeneric(); err != nil {
log.Errorf(err.Error())
return db, err
} else if !fromCache {
// first time ever we talk to MySQL
query := fmt.Sprintf("create database if not exists %s", config.Config.MySQLOrchestratorDatabase)
if _, err := db.Exec(query); err != nil {
log.Errorf(err.Error())
return db, err
}
}
db, fromCache, err = sqlutils.GetDB(getMySQLURI())
if err == nil && !fromCache {
// do not show the password but do show what we connect to.
safeMySQLURI := fmt.Sprintf("%s:?@tcp(%s:%d)/%s?timeout=%ds", config.Config.MySQLOrchestratorUser,
config.Config.MySQLOrchestratorHost, config.Config.MySQLOrchestratorPort, config.Config.MySQLOrchestratorDatabase, config.Config.MySQLConnectTimeoutSeconds)
log.Infof("Connected to orchestrator backend: %v", safeMySQLURI)
if config.Config.MySQLOrchestratorMaxPoolConnections > 0 {
log.Infof("Orchestrator pool SetMaxOpenConns: %d", config.Config.MySQLOrchestratorMaxPoolConnections)
db.SetMaxOpenConns(config.Config.MySQLOrchestratorMaxPoolConnections)
}
if config.Config.MySQLConnectionLifetimeSeconds > 0 {
db.SetConnMaxLifetime(time.Duration(config.Config.MySQLConnectionLifetimeSeconds) * time.Second)
}
}
}
if err == nil && !fromCache {
if !config.Config.SkipOrchestratorDatabaseUpdate {
initOrchestratorDB(db)
}
// A low value here will trigger reconnects which could
// make the number of backend connections hit the tcp
// limit. That's bad. I could make this setting dynamic
// but then people need to know which value to use. For now
// allow up to 25% of MySQLOrchestratorMaxPoolConnections
// to be idle. That should provide a good number which
// does not keep the maximum number of connections open but
// at the same time does not trigger disconnections and
// reconnections too frequently.
maxIdleConns := int(config.Config.MySQLOrchestratorMaxPoolConnections * 25 / 100)
if maxIdleConns < 10 {
maxIdleConns = 10
}
log.Infof("Connecting to backend %s:%d: maxConnections: %d, maxIdleConns: %d",
config.Config.MySQLOrchestratorHost,
config.Config.MySQLOrchestratorPort,
config.Config.MySQLOrchestratorMaxPoolConnections,
maxIdleConns)
db.SetMaxIdleConns(maxIdleConns)
}
return db, err
}
func translateStatement(statement string) (string, error) {
if IsSQLite() {
statement = sqlutils.ToSqlite3Dialect(statement)
}
return statement, nil
}
// versionIsDeployed checks if given version has already been deployed
func versionIsDeployed(db *sql.DB) (result bool, err error) {
query := `
select
count(*) as is_deployed
from
orchestrator_db_deployments
where
deployed_version = ?
`
err = db.QueryRow(query, config.RuntimeCLIFlags.ConfiguredVersion).Scan(&result)
// err means the table 'orchestrator_db_deployments' does not even exist, in which case we proceed
// to deploy.
// If there's another error to this, like DB gone bad, then we're about to find out anyway.
return result, err
}
// registerOrchestratorDeployment updates the orchestrator_db_deployments table upon successful deployment
func registerOrchestratorDeployment(db *sql.DB) error {
query := `
replace into orchestrator_db_deployments (
deployed_version, deployed_timestamp
) values (
?, NOW()
)
`
if _, err := execInternal(db, query, config.RuntimeCLIFlags.ConfiguredVersion); err != nil {
log.Fatalf("Unable to write to orchestrator_metadata: %+v", err)
}
log.Infof("Migrated database schema to version [%+v]", config.RuntimeCLIFlags.ConfiguredVersion)
return nil
}
// deployStatements will issue given sql queries that are not already known to be deployed.
// This iterates both lists (to-run and already-deployed) and also verifies no contradictions.
func deployStatements(db *sql.DB, queries []string) error {
tx, err := db.Begin()
if err != nil {
log.Fatal(err.Error())
}
// Ugly workaround ahead.
// Origin of this workaround is the existence of some "timestamp NOT NULL," column definitions,
// which are invalid under the NO_ZERO_IN_DATE,NO_ZERO_DATE sql_mode (since the default is implicitly "0").
// This means installation of orchestrator fails on such configured servers, and in particular on 5.7
// where this setting is the default.
// For purpose of backwards compatibility, what we do is force sql_mode to be more relaxed, create the schemas
// along with the "invalid" definition, and then go ahead and fix those definitions via following ALTER statements.
// My bad.
originalSQLMode := ""
if config.Config.IsMySQL() {
_ = tx.QueryRow(`select @@session.sql_mode`).Scan(&originalSQLMode)
if _, err := tx.Exec(`set @@session.sql_mode=REPLACE(@@session.sql_mode, 'NO_ZERO_DATE', '')`); err != nil {
log.Fatal(err.Error())
}
if _, err := tx.Exec(`set @@session.sql_mode=REPLACE(@@session.sql_mode, 'NO_ZERO_IN_DATE', '')`); err != nil {
log.Fatal(err.Error())
}
}
for _, query := range queries {
query, err := translateStatement(query)
if err != nil {
log.Fatalf("Cannot initiate orchestrator: %+v; query=%+v", err, query)
return err
}
if _, err := tx.Exec(query); err != nil {
if strings.Contains(err.Error(), "syntax error") {
log.Fatalf("Cannot initiate orchestrator: %+v; query=%+v", err, query)
return err
}
if !sqlutils.IsAlterTable(query) && !sqlutils.IsCreateIndex(query) && !sqlutils.IsDropIndex(query) {
log.Fatalf("Cannot initiate orchestrator: %+v; query=%+v", err, query)
return err
}
if !strings.Contains(err.Error(), "duplicate column name") &&
!strings.Contains(err.Error(), "Duplicate column name") &&
!strings.Contains(err.Error(), "check that column/key exists") &&
!strings.Contains(err.Error(), "already exists") &&
!strings.Contains(err.Error(), "Duplicate key name") {
log.Errorf("Error initiating orchestrator: %+v; query=%+v", err, query)
}
}
}
if config.Config.IsMySQL() {
if _, err := tx.Exec(`set session sql_mode=?`, originalSQLMode); err != nil {
log.Fatal(err.Error())
}
}
if err := tx.Commit(); err != nil {
log.Fatal(err.Error())
}
return nil
}
// initOrchestratorDB attempts to create/upgrade the orchestrator backend database. It runs once in the
// application's lifetime.
func initOrchestratorDB(db *sql.DB) error {
log.Info("Initializing orchestrator")
versionAlreadyDeployed, err := versionIsDeployed(db)
if versionAlreadyDeployed && config.RuntimeCLIFlags.ConfiguredVersion != "" && err == nil {
// Already deployed with this version
return nil
}
if config.Config.PanicIfDifferentDatabaseDeploy && config.RuntimeCLIFlags.ConfiguredVersion != "" && !versionAlreadyDeployed {
log.Fatalf("PanicIfDifferentDatabaseDeploy is set. Configured version %s is not the version found in the database", config.RuntimeCLIFlags.ConfiguredVersion)
}
log.Info("Migrating database schema")
deployStatements(db, generateSQLBase)
deployStatements(db, generateSQLPatches)
registerOrchestratorDeployment(db)
if IsSQLite() {
ExecOrchestrator(`PRAGMA journal_mode = WAL`)
ExecOrchestrator(`PRAGMA synchronous = NORMAL`)
}
return nil
}
// execInternal executes the given statement on the given backend database, translating it to the SQLite dialect when needed
func execInternal(db *sql.DB, query string, args ...any) (sql.Result, error) {
var err error
query, err = translateStatement(query)
if err != nil {
return nil, err
}
res, err := sqlutils.ExecNoPrepare(db, query, args...)
return res, err
}
// ExecOrchestrator will execute given query on the orchestrator backend database.
func ExecOrchestrator(query string, args ...any) (sql.Result, error) {
var err error
query, err = translateStatement(query)
if err != nil {
return nil, err
}
db, err := OpenOrchestrator()
if err != nil {
return nil, err
}
res, err := sqlutils.ExecNoPrepare(db, query, args...)
return res, err
}
// QueryOrchestrator executes the given query on the orchestrator backend database and invokes onRow for every returned row
func QueryOrchestrator(query string, argsArray []any, onRow func(sqlutils.RowMap) error) error {
query, err := translateStatement(query)
if err != nil {
log.Fatalf("Cannot query orchestrator: %+v; query=%+v", err, query)
return err
}
db, err := OpenOrchestrator()
if err != nil {
return err
}
if err = sqlutils.QueryRowsMap(db, query, onRow, argsArray...); err != nil {
log.Warning(err.Error())
}
return err
}
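A usage sketch for the DB interface above (the query and cluster name are hypothetical; database_instance is one of the tables created by generateSQLBase below):

package main

import (
	"fmt"

	"vitess.io/vitess/go/vt/vtgr/db"
	"vitess.io/vitess/go/vt/vtgr/external/golib/sqlutils"
)

func main() {
	query := `select hostname, port from database_instance where cluster_name = ?`
	err := db.Db.QueryOrchestrator(query, []any{"commerce/0"}, func(row sqlutils.RowMap) error {
		// GetString/GetInt are accessors on sqlutils.RowMap.
		fmt.Printf("%s:%d\n", row.GetString("hostname"), row.GetInt("port"))
		return nil
	})
	if err != nil {
		fmt.Println("query failed:", err)
	}
}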


@@ -0,0 +1,862 @@
/*
Copyright 2017 Shlomi Noach, GitHub Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
This file has been copied over from VTOrc package
*/
package db
// generateSQLBase & generateSQLPatches are lists of SQL statements required to build the orchestrator backend
var generateSQLBase = []string{
`
CREATE TABLE IF NOT EXISTS database_instance (
hostname varchar(128) CHARACTER SET ascii NOT NULL,
port smallint(5) unsigned NOT NULL,
last_checked timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
last_seen timestamp NULL DEFAULT NULL,
server_id int(10) unsigned NOT NULL,
version varchar(128) CHARACTER SET ascii NOT NULL,
binlog_format varchar(16) CHARACTER SET ascii NOT NULL,
log_bin tinyint(3) unsigned NOT NULL,
log_replica_updates tinyint(3) unsigned NOT NULL,
binary_log_file varchar(128) CHARACTER SET ascii NOT NULL,
binary_log_pos bigint(20) unsigned NOT NULL,
source_host varchar(128) CHARACTER SET ascii NOT NULL,
source_port smallint(5) unsigned NOT NULL,
replica_sql_running tinyint(3) unsigned NOT NULL,
replica_io_running tinyint(3) unsigned NOT NULL,
source_log_file varchar(128) CHARACTER SET ascii NOT NULL,
read_source_log_pos bigint(20) unsigned NOT NULL,
relay_source_log_file varchar(128) CHARACTER SET ascii NOT NULL,
exec_source_log_pos bigint(20) unsigned NOT NULL,
replication_lag_seconds bigint(20) unsigned DEFAULT NULL,
replica_lag_seconds bigint(20) unsigned DEFAULT NULL,
num_replica_hosts int(10) unsigned NOT NULL,
replica_hosts text CHARACTER SET ascii NOT NULL,
cluster_name varchar(128) CHARACTER SET ascii NOT NULL,
PRIMARY KEY (hostname,port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX cluster_name_idx ON database_instance
`,
`
CREATE INDEX cluster_name_idx_database_instance ON database_instance(cluster_name)
`,
`
DROP INDEX last_checked_idx ON database_instance
`,
`
CREATE INDEX last_checked_idx_database_instance ON database_instance(last_checked)
`,
`
DROP INDEX last_seen_idx ON database_instance
`,
`
CREATE INDEX last_seen_idx_database_instance ON database_instance(last_seen)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_maintenance (
database_instance_maintenance_id int(10) unsigned NOT NULL AUTO_INCREMENT,
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
maintenance_active tinyint(4) DEFAULT NULL,
begin_timestamp timestamp NULL DEFAULT NULL,
end_timestamp timestamp NULL DEFAULT NULL,
owner varchar(128) CHARACTER SET utf8 NOT NULL,
reason text CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (database_instance_maintenance_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX maintenance_uidx ON database_instance_maintenance
`,
`
CREATE UNIQUE INDEX maintenance_uidx_database_instance_maintenance ON database_instance_maintenance (maintenance_active, hostname, port)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_long_running_queries (
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
process_id bigint(20) NOT NULL,
process_started_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
process_user varchar(16) CHARACTER SET utf8 NOT NULL,
process_host varchar(128) CHARACTER SET utf8 NOT NULL,
process_db varchar(128) CHARACTER SET utf8 NOT NULL,
process_command varchar(16) CHARACTER SET utf8 NOT NULL,
process_time_seconds int(11) NOT NULL,
process_state varchar(128) CHARACTER SET utf8 NOT NULL,
process_info varchar(1024) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (hostname,port,process_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX process_started_at_idx ON database_instance_long_running_queries
`,
`
CREATE INDEX process_started_at_idx_database_instance_long_running_queries ON database_instance_long_running_queries (process_started_at)
`,
`
CREATE TABLE IF NOT EXISTS audit (
audit_id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
audit_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
audit_type varchar(128) CHARACTER SET ascii NOT NULL,
hostname varchar(128) CHARACTER SET ascii NOT NULL DEFAULT '',
port smallint(5) unsigned NOT NULL,
message text CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (audit_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
`,
`
DROP INDEX audit_timestamp_idx ON audit
`,
`
CREATE INDEX audit_timestamp_idx_audit ON audit (audit_timestamp)
`,
`
DROP INDEX host_port_idx ON audit
`,
`
CREATE INDEX host_port_idx_audit ON audit (hostname, port, audit_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS host_agent (
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
token varchar(128) NOT NULL,
last_submitted timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
last_checked timestamp NULL DEFAULT NULL,
last_seen timestamp NULL DEFAULT NULL,
mysql_port smallint(5) unsigned DEFAULT NULL,
count_mysql_snapshots smallint(5) unsigned NOT NULL,
PRIMARY KEY (hostname)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX token_idx ON host_agent
`,
`
CREATE INDEX token_idx_host_agent ON host_agent (token)
`,
`
DROP INDEX last_submitted_idx ON host_agent
`,
`
CREATE INDEX last_submitted_idx_host_agent ON host_agent (last_submitted)
`,
`
DROP INDEX last_checked_idx ON host_agent
`,
`
CREATE INDEX last_checked_idx_host_agent ON host_agent (last_checked)
`,
`
DROP INDEX last_seen_idx ON host_agent
`,
`
CREATE INDEX last_seen_idx_host_agent ON host_agent (last_seen)
`,
`
CREATE TABLE IF NOT EXISTS agent_seed (
agent_seed_id int(10) unsigned NOT NULL AUTO_INCREMENT,
target_hostname varchar(128) NOT NULL,
source_hostname varchar(128) NOT NULL,
start_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
end_timestamp timestamp NOT NULL DEFAULT '1971-01-01 00:00:00',
is_complete tinyint(3) unsigned NOT NULL DEFAULT '0',
is_successful tinyint(3) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (agent_seed_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX target_hostname_idx ON agent_seed
`,
`
CREATE INDEX target_hostname_idx_agent_seed ON agent_seed (target_hostname,is_complete)
`,
`
DROP INDEX source_hostname_idx ON agent_seed
`,
`
CREATE INDEX source_hostname_idx_agent_seed ON agent_seed (source_hostname,is_complete)
`,
`
DROP INDEX start_timestamp_idx ON agent_seed
`,
`
CREATE INDEX start_timestamp_idx_agent_seed ON agent_seed (start_timestamp)
`,
`
DROP INDEX is_complete_idx ON agent_seed
`,
`
CREATE INDEX is_complete_idx_agent_seed ON agent_seed (is_complete,start_timestamp)
`,
`
DROP INDEX is_successful_idx ON agent_seed
`,
`
CREATE INDEX is_successful_idx_agent_seed ON agent_seed (is_successful, start_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS agent_seed_state (
agent_seed_state_id int(10) unsigned NOT NULL AUTO_INCREMENT,
agent_seed_id int(10) unsigned NOT NULL,
state_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
state_action varchar(127) NOT NULL,
error_message varchar(255) NOT NULL,
PRIMARY KEY (agent_seed_state_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX agent_seed_idx ON agent_seed_state
`,
`
CREATE INDEX agent_seed_idx_agent_seed_state ON agent_seed_state (agent_seed_id, state_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS host_attributes (
hostname varchar(128) NOT NULL,
attribute_name varchar(128) NOT NULL,
attribute_value varchar(128) NOT NULL,
submit_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
expire_timestamp timestamp NULL DEFAULT NULL,
PRIMARY KEY (hostname,attribute_name)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX attribute_name_idx ON host_attributes
`,
`
CREATE INDEX attribute_name_idx_host_attributes ON host_attributes (attribute_name)
`,
`
DROP INDEX attribute_value_idx ON host_attributes
`,
`
CREATE INDEX attribute_value_idx_host_attributes ON host_attributes (attribute_value)
`,
`
DROP INDEX submit_timestamp_idx ON host_attributes
`,
`
CREATE INDEX submit_timestamp_idx_host_attributes ON host_attributes (submit_timestamp)
`,
`
DROP INDEX expire_timestamp_idx ON host_attributes
`,
`
CREATE INDEX expire_timestamp_idx_host_attributes ON host_attributes (expire_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS hostname_resolve (
hostname varchar(128) NOT NULL,
resolved_hostname varchar(128) NOT NULL,
resolved_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (hostname)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX resolved_timestamp_idx ON hostname_resolve
`,
`
CREATE INDEX resolved_timestamp_idx_hostname_resolve ON hostname_resolve (resolved_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS active_node (
anchor tinyint unsigned NOT NULL,
hostname varchar(128) CHARACTER SET ascii NOT NULL,
token varchar(128) NOT NULL,
last_seen_active timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (anchor)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
INSERT IGNORE INTO active_node (anchor, hostname, token, last_seen_active)
VALUES (1, '', '', NOW())
`,
`
CREATE TABLE IF NOT EXISTS node_health (
hostname varchar(128) CHARACTER SET ascii NOT NULL,
token varchar(128) NOT NULL,
last_seen_active timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (hostname, token)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP VIEW IF EXISTS _whats_wrong
`,
`
DROP VIEW IF EXISTS whats_wrong
`,
`
DROP VIEW IF EXISTS whats_wrong_summary
`,
`
CREATE TABLE IF NOT EXISTS topology_recovery (
recovery_id bigint unsigned not null auto_increment,
hostname varchar(128) NOT NULL,
port smallint unsigned NOT NULL,
in_active_period tinyint unsigned NOT NULL DEFAULT 0,
start_active_period timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
end_active_period_unixtime int unsigned,
end_recovery timestamp NULL DEFAULT NULL,
processing_node_hostname varchar(128) CHARACTER SET ascii NOT NULL,
processcing_node_token varchar(128) NOT NULL,
successor_hostname varchar(128) DEFAULT NULL,
successor_port smallint unsigned DEFAULT NULL,
PRIMARY KEY (recovery_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX in_active_start_period_idx ON topology_recovery
`,
`
CREATE INDEX in_active_start_period_idx_topology_recovery ON topology_recovery (in_active_period, start_active_period)
`,
`
DROP INDEX start_active_period_idx ON topology_recovery
`,
`
CREATE INDEX start_active_period_idx_topology_recovery ON topology_recovery (start_active_period)
`,
`
DROP INDEX hostname_port_active_period_uidx ON topology_recovery
`,
`
CREATE UNIQUE INDEX hostname_port_active_period_uidx_topology_recovery ON topology_recovery (hostname, port, in_active_period, end_active_period_unixtime)
`,
`
CREATE TABLE IF NOT EXISTS hostname_unresolve (
hostname varchar(128) NOT NULL,
unresolved_hostname varchar(128) NOT NULL,
PRIMARY KEY (hostname)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX unresolved_hostname_idx ON hostname_unresolve
`,
`
CREATE INDEX unresolved_hostname_idx_hostname_unresolve ON hostname_unresolve (unresolved_hostname)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_pool (
hostname varchar(128) CHARACTER SET ascii NOT NULL,
port smallint(5) unsigned NOT NULL,
pool varchar(128) NOT NULL,
PRIMARY KEY (hostname, port, pool)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX pool_idx ON database_instance_pool
`,
`
CREATE INDEX pool_idx_database_instance_pool ON database_instance_pool (pool)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_topology_history (
snapshot_unix_timestamp INT UNSIGNED NOT NULL,
hostname varchar(128) CHARACTER SET ascii NOT NULL,
port smallint(5) unsigned NOT NULL,
source_host varchar(128) CHARACTER SET ascii NOT NULL,
source_port smallint(5) unsigned NOT NULL,
cluster_name tinytext CHARACTER SET ascii NOT NULL,
PRIMARY KEY (snapshot_unix_timestamp, hostname, port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX cluster_name_idx ON database_instance_topology_history
`,
`
CREATE INDEX cluster_name_idx_database_instance_topology_history ON database_instance_topology_history (snapshot_unix_timestamp, cluster_name(128))
`,
`
CREATE TABLE IF NOT EXISTS candidate_database_instance (
hostname varchar(128) CHARACTER SET ascii NOT NULL,
port smallint(5) unsigned NOT NULL,
last_suggested TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (hostname, port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX last_suggested_idx ON candidate_database_instance
`,
`
CREATE INDEX last_suggested_idx_candidate_database_instance ON candidate_database_instance (last_suggested)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_downtime (
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
downtime_active tinyint(4) DEFAULT NULL,
begin_timestamp timestamp DEFAULT CURRENT_TIMESTAMP,
end_timestamp timestamp NULL DEFAULT NULL,
owner varchar(128) CHARACTER SET utf8 NOT NULL,
reason text CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (hostname, port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE TABLE IF NOT EXISTS topology_failure_detection (
detection_id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
hostname varchar(128) NOT NULL,
port smallint unsigned NOT NULL,
in_active_period tinyint unsigned NOT NULL DEFAULT '0',
start_active_period timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
end_active_period_unixtime int unsigned NOT NULL,
processing_node_hostname varchar(128) NOT NULL,
processcing_node_token varchar(128) NOT NULL,
analysis varchar(128) NOT NULL,
cluster_name varchar(128) NOT NULL,
count_affected_replicas int unsigned NOT NULL,
replica_hosts text NOT NULL,
PRIMARY KEY (detection_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX hostname_port_active_period_uidx ON topology_failure_detection
`,
`
DROP INDEX in_active_start_period_idx ON topology_failure_detection
`,
`
CREATE INDEX in_active_start_period_idx_topology_failure_detection ON topology_failure_detection (in_active_period, start_active_period)
`,
`
CREATE TABLE IF NOT EXISTS hostname_resolve_history (
resolved_hostname varchar(128) NOT NULL,
hostname varchar(128) NOT NULL,
resolved_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (resolved_hostname)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX hostname ON hostname_resolve_history
`,
`
CREATE INDEX hostname_idx_hostname_resolve_history ON hostname_resolve_history (hostname)
`,
`
DROP INDEX resolved_timestamp_idx ON hostname_resolve_history
`,
`
CREATE INDEX resolved_timestamp_idx_hostname_resolve_history ON hostname_resolve_history (resolved_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS hostname_unresolve_history (
unresolved_hostname varchar(128) NOT NULL,
hostname varchar(128) NOT NULL,
last_registered TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (unresolved_hostname)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX hostname ON hostname_unresolve_history
`,
`
CREATE INDEX hostname_idx_hostname_unresolve_history ON hostname_unresolve_history (hostname)
`,
`
DROP INDEX last_registered_idx ON hostname_unresolve_history
`,
`
CREATE INDEX last_registered_idx_hostname_unresolve_history ON hostname_unresolve_history (last_registered)
`,
`
CREATE TABLE IF NOT EXISTS cluster_domain_name (
cluster_name varchar(128) CHARACTER SET ascii NOT NULL,
domain_name varchar(128) NOT NULL,
PRIMARY KEY (cluster_name)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX domain_name_idx ON cluster_domain_name
`,
`
CREATE INDEX domain_name_idx_cluster_domain_name ON cluster_domain_name (domain_name(32))
`,
`
CREATE TABLE IF NOT EXISTS primary_position_equivalence (
equivalence_id bigint unsigned not null auto_increment,
primary1_hostname varchar(128) CHARACTER SET ascii NOT NULL,
primary1_port smallint(5) unsigned NOT NULL,
primary1_binary_log_file varchar(128) CHARACTER SET ascii NOT NULL,
primary1_binary_log_pos bigint(20) unsigned NOT NULL,
primary2_hostname varchar(128) CHARACTER SET ascii NOT NULL,
primary2_port smallint(5) unsigned NOT NULL,
primary2_binary_log_file varchar(128) CHARACTER SET ascii NOT NULL,
primary2_binary_log_pos bigint(20) unsigned NOT NULL,
last_suggested TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (equivalence_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX equivalence_uidx ON primary_position_equivalence
`,
`
CREATE UNIQUE INDEX equivalence_uidx_primary_position_equivalence ON primary_position_equivalence (primary1_hostname, primary1_port, primary1_binary_log_file, primary1_binary_log_pos, primary2_hostname, primary2_port)
`,
`
DROP INDEX primary2_idx ON primary_position_equivalence
`,
`
CREATE INDEX primary2_idx_primary_position_equivalence ON primary_position_equivalence (primary2_hostname, primary2_port, primary2_binary_log_file, primary2_binary_log_pos)
`,
`
DROP INDEX last_suggested_idx ON primary_position_equivalence
`,
`
CREATE INDEX last_suggested_idx_primary_position_equivalence ON primary_position_equivalence (last_suggested)
`,
`
CREATE TABLE IF NOT EXISTS async_request (
request_id bigint unsigned NOT NULL AUTO_INCREMENT,
command varchar(128) charset ascii not null,
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
destination_hostname varchar(128) NOT NULL,
destination_port smallint(5) unsigned NOT NULL,
pattern text CHARACTER SET utf8 NOT NULL,
gtid_hint varchar(32) charset ascii not null,
begin_timestamp timestamp NULL DEFAULT NULL,
end_timestamp timestamp NULL DEFAULT NULL,
story text CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (request_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX begin_timestamp_idx ON async_request
`,
`
CREATE INDEX begin_timestamp_idx_async_request ON async_request (begin_timestamp)
`,
`
DROP INDEX end_timestamp_idx ON async_request
`,
`
CREATE INDEX end_timestamp_idx_async_request ON async_request (end_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS blocked_topology_recovery (
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
cluster_name varchar(128) NOT NULL,
analysis varchar(128) NOT NULL,
last_blocked_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
blocking_recovery_id bigint unsigned,
PRIMARY KEY (hostname, port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX cluster_blocked_idx ON blocked_topology_recovery
`,
`
CREATE INDEX cluster_blocked_idx_blocked_topology_recovery ON blocked_topology_recovery (cluster_name, last_blocked_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_last_analysis (
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
analysis_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
analysis varchar(128) NOT NULL,
PRIMARY KEY (hostname, port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX analysis_timestamp_idx ON database_instance_last_analysis
`,
`
CREATE INDEX analysis_timestamp_idx_database_instance_last_analysis ON database_instance_last_analysis (analysis_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_analysis_changelog (
changelog_id bigint unsigned not null auto_increment,
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
analysis_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
analysis varchar(128) NOT NULL,
PRIMARY KEY (changelog_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX analysis_timestamp_idx ON database_instance_analysis_changelog
`,
`
CREATE INDEX analysis_timestamp_idx_database_instance_analysis_changelog ON database_instance_analysis_changelog (analysis_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS node_health_history (
history_id bigint unsigned not null auto_increment,
hostname varchar(128) CHARACTER SET ascii NOT NULL,
token varchar(128) NOT NULL,
first_seen_active timestamp NOT NULL,
extra_info varchar(128) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (history_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX first_seen_active_idx ON node_health_history
`,
`
CREATE INDEX first_seen_active_idx_node_health_history ON node_health_history (first_seen_active)
`,
`
DROP INDEX hostname_token_idx ON node_health_history
`,
`
CREATE UNIQUE INDEX hostname_token_idx_node_health_history ON node_health_history (hostname, token)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_coordinates_history (
history_id bigint unsigned not null auto_increment,
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
recorded_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
binary_log_file varchar(128) NOT NULL,
binary_log_pos bigint(20) unsigned NOT NULL,
relay_log_file varchar(128) NOT NULL,
relay_log_pos bigint(20) unsigned NOT NULL,
PRIMARY KEY (history_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX hostname_port_recorded_timestmp_idx ON database_instance_coordinates_history
`,
`
CREATE INDEX hostname_port_recorded_idx_database_instance_coordinates_history ON database_instance_coordinates_history (hostname, port, recorded_timestamp)
`,
`
DROP INDEX recorded_timestmp_idx ON database_instance_coordinates_history
`,
`
CREATE INDEX recorded_timestmp_idx_database_instance_coordinates_history ON database_instance_coordinates_history (recorded_timestamp)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_binlog_files_history (
history_id bigint unsigned not null auto_increment,
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
binary_log_file varchar(128) NOT NULL,
binary_log_pos bigint(20) unsigned NOT NULL,
first_seen timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
last_seen timestamp NOT NULL DEFAULT '1971-01-01 00:00:00',
PRIMARY KEY (history_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX hostname_port_file_idx ON database_instance_binlog_files_history
`,
`
CREATE UNIQUE INDEX hostname_port_file_idx_database_instance_binlog_files_history ON database_instance_binlog_files_history (hostname, port, binary_log_file)
`,
`
DROP INDEX last_seen_idx ON database_instance_binlog_files_history
`,
`
CREATE INDEX last_seen_idx_database_instance_binlog_files_history ON database_instance_binlog_files_history (last_seen)
`,
`
CREATE TABLE IF NOT EXISTS access_token (
access_token_id bigint unsigned not null auto_increment,
public_token varchar(128) NOT NULL,
secret_token varchar(128) NOT NULL,
generated_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
generated_by varchar(128) CHARACTER SET utf8 NOT NULL,
is_acquired tinyint unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (access_token_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX public_token_idx ON access_token
`,
`
CREATE UNIQUE INDEX public_token_uidx_access_token ON access_token (public_token)
`,
`
DROP INDEX generated_at_idx ON access_token
`,
`
CREATE INDEX generated_at_idx_access_token ON access_token (generated_at)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_recent_relaylog_history (
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
current_relay_log_file varchar(128) NOT NULL,
current_relay_log_pos bigint(20) unsigned NOT NULL,
current_seen timestamp NOT NULL DEFAULT '1971-01-01 00:00:00',
prev_relay_log_file varchar(128) NOT NULL,
prev_relay_log_pos bigint(20) unsigned NOT NULL,
prev_seen timestamp NOT NULL DEFAULT '1971-01-01 00:00:00',
PRIMARY KEY (hostname, port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX current_seen_idx ON database_instance_recent_relaylog_history
`,
`
CREATE INDEX current_seen_idx_database_instance_recent_relaylog_history ON database_instance_recent_relaylog_history (current_seen)
`,
`
CREATE TABLE IF NOT EXISTS orchestrator_metadata (
anchor tinyint unsigned NOT NULL,
last_deployed_version varchar(128) CHARACTER SET ascii NOT NULL,
last_deployed_timestamp timestamp NOT NULL,
PRIMARY KEY (anchor)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE TABLE IF NOT EXISTS orchestrator_db_deployments (
deployed_version varchar(128) CHARACTER SET ascii NOT NULL,
deployed_timestamp timestamp NOT NULL,
PRIMARY KEY (deployed_version)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE TABLE IF NOT EXISTS global_recovery_disable (
disable_recovery tinyint unsigned NOT NULL COMMENT 'Insert 1 to disable recovery globally',
PRIMARY KEY (disable_recovery)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE TABLE IF NOT EXISTS topology_recovery_steps (
recovery_step_id bigint unsigned not null auto_increment,
recovery_uid varchar(128) CHARACTER SET ascii NOT NULL,
audit_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
message text CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (recovery_step_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE TABLE IF NOT EXISTS raft_store (
store_id bigint unsigned not null auto_increment,
store_key varbinary(512) not null,
store_value blob not null,
PRIMARY KEY (store_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE INDEX store_key_idx_raft_store ON raft_store (store_key)
`,
`
CREATE TABLE IF NOT EXISTS raft_log (
log_index bigint unsigned not null auto_increment,
term bigint not null,
log_type int not null,
data blob not null,
PRIMARY KEY (log_index)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE TABLE IF NOT EXISTS raft_snapshot (
snapshot_id bigint unsigned not null auto_increment,
snapshot_name varchar(128) CHARACTER SET utf8 NOT NULL,
snapshot_meta varchar(4096) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (snapshot_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE UNIQUE INDEX snapshot_name_uidx_raft_snapshot ON raft_snapshot (snapshot_name)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_peer_analysis (
peer varchar(128) NOT NULL,
hostname varchar(128) NOT NULL,
port smallint(5) unsigned NOT NULL,
analysis_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
analysis varchar(128) NOT NULL,
PRIMARY KEY (peer, hostname, port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE TABLE IF NOT EXISTS database_instance_tls (
hostname varchar(128) CHARACTER SET ascii NOT NULL,
port smallint(5) unsigned NOT NULL,
required tinyint unsigned NOT NULL DEFAULT 0,
PRIMARY KEY (hostname,port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE TABLE IF NOT EXISTS cluster_injected_pseudo_gtid (
cluster_name varchar(128) NOT NULL,
time_injected timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (cluster_name)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE TABLE IF NOT EXISTS hostname_ips (
hostname varchar(128) CHARACTER SET ascii NOT NULL,
ipv4 varchar(128) CHARACTER SET ascii NOT NULL,
ipv6 varchar(128) CHARACTER SET ascii NOT NULL,
last_updated timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (hostname)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE TABLE IF NOT EXISTS database_instance_tags (
hostname varchar(128) CHARACTER SET ascii NOT NULL,
port smallint(5) unsigned NOT NULL,
tag_name varchar(128) CHARACTER SET utf8 NOT NULL,
tag_value varchar(128) CHARACTER SET utf8 NOT NULL,
last_updated timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (hostname, port, tag_name)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE INDEX tag_name_idx_database_instance_tags ON database_instance_tags (tag_name)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_stale_binlog_coordinates (
hostname varchar(128) CHARACTER SET ascii NOT NULL,
port smallint(5) unsigned NOT NULL,
binary_log_file varchar(128) NOT NULL,
binary_log_pos bigint(20) unsigned NOT NULL,
first_seen timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (hostname, port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE INDEX first_seen_idx_database_instance_stale_binlog_coordinates ON database_instance_stale_binlog_coordinates (first_seen)
`,
`
CREATE TABLE IF NOT EXISTS vitess_tablet (
hostname varchar(128) CHARACTER SET ascii NOT NULL,
port smallint(5) unsigned NOT NULL,
keyspace varchar(128) CHARACTER SET ascii NOT NULL,
shard varchar(128) CHARACTER SET ascii NOT NULL,
cell varchar(128) CHARACTER SET ascii NOT NULL,
tablet_type smallint(5) NOT NULL,
primary_timestamp timestamp NOT NULL,
info varchar(512) CHARACTER SET ascii NOT NULL,
PRIMARY KEY (hostname, port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
CREATE INDEX cell_idx_vitess_tablet ON vitess_tablet (cell)
`,
`
CREATE INDEX ks_idx_vitess_tablet ON vitess_tablet (keyspace, shard)
`,
`
CREATE TABLE IF NOT EXISTS vitess_keyspace (
keyspace varchar(128) CHARACTER SET ascii NOT NULL,
keyspace_type smallint(5) NOT NULL,
durability_policy varchar(512) CHARACTER SET ascii NOT NULL,
PRIMARY KEY (keyspace)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
}
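These DDLs are written in MySQL syntax; on the SQLite3 backend each statement is first rewritten by the regular-expression dialect translator vendored later in this diff. A minimal sketch of an apply loop, assuming the sqlutils helpers shown further down; the initSchema name and the sqliteBackend flag are illustrative, not part of this PR:

import (
	"database/sql"

	"vitess.io/vitess/go/vt/vtgr/external/golib/sqlutils"
)

// initSchema applies each DDL in order, translating MySQL syntax into the
// sqlite3 dialect when the backend is SQLite.
func initSchema(db *sql.DB, statements []string, sqliteBackend bool) error {
	for _, statement := range statements {
		if sqliteBackend {
			statement = sqlutils.ToSqlite3Dialect(statement)
		}
		if _, err := sqlutils.ExecNoPrepare(db, statement); err != nil {
			return err
		}
	}
	return nil
}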


@@ -0,0 +1,583 @@
/*
Copyright 2017 Shlomi Noach, GitHub Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
This file has been copied over from VTOrc package
*/
package db
// generateSQLPatches contains DDLs for patching schema to the latest version.
// Add new statements at the end of the list so they form a changelog.
var generateSQLPatches = []string{
`
ALTER TABLE
database_instance
ADD COLUMN read_only TINYINT UNSIGNED NOT NULL AFTER version
`,
`
ALTER TABLE
database_instance
ADD COLUMN last_sql_error TEXT NOT NULL AFTER exec_source_log_pos
`,
`
ALTER TABLE
database_instance
ADD COLUMN last_io_error TEXT NOT NULL AFTER last_sql_error
`,
`
ALTER TABLE
database_instance
ADD COLUMN oracle_gtid TINYINT UNSIGNED NOT NULL AFTER replica_io_running
`,
`
ALTER TABLE
database_instance
ADD COLUMN mariadb_gtid TINYINT UNSIGNED NOT NULL AFTER oracle_gtid
`,
`
ALTER TABLE
database_instance
ADD COLUMN relay_log_file varchar(128) CHARACTER SET ascii NOT NULL AFTER exec_source_log_pos
`,
`
ALTER TABLE
database_instance
ADD COLUMN relay_log_pos bigint unsigned NOT NULL AFTER relay_log_file
`,
`
DROP INDEX source_host_port_idx ON database_instance
`,
`
ALTER TABLE
database_instance
ADD INDEX source_host_port_idx_database_instance (source_host, source_port)
`,
`
ALTER TABLE
database_instance
ADD COLUMN pseudo_gtid TINYINT UNSIGNED NOT NULL AFTER mariadb_gtid
`,
`
ALTER TABLE
database_instance
ADD COLUMN replication_depth TINYINT UNSIGNED NOT NULL AFTER cluster_name
`,
`
ALTER TABLE
database_instance
ADD COLUMN has_replication_filters TINYINT UNSIGNED NOT NULL AFTER replica_io_running
`,
`
ALTER TABLE
database_instance
ADD COLUMN data_center varchar(32) CHARACTER SET ascii NOT NULL AFTER cluster_name
`,
`
ALTER TABLE
database_instance
ADD COLUMN physical_environment varchar(32) CHARACTER SET ascii NOT NULL AFTER data_center
`,
`
ALTER TABLE
database_instance_maintenance
ADD KEY active_timestamp_idx (maintenance_active, begin_timestamp)
`,
`
ALTER TABLE
database_instance
ADD COLUMN is_co_primary TINYINT UNSIGNED NOT NULL AFTER replication_depth
`,
`
ALTER TABLE
database_instance_maintenance
ADD KEY active_end_timestamp_idx (maintenance_active, end_timestamp)
`,
`
ALTER TABLE
database_instance
ADD COLUMN sql_delay INT UNSIGNED NOT NULL AFTER replica_lag_seconds
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN analysis varchar(128) CHARACTER SET ascii NOT NULL
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN cluster_name varchar(128) CHARACTER SET ascii NOT NULL
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN count_affected_replicas int unsigned NOT NULL
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN replica_hosts text CHARACTER SET ascii NOT NULL
`,
`
ALTER TABLE hostname_unresolve
ADD COLUMN last_registered TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
`,
`
ALTER TABLE hostname_unresolve
ADD KEY last_registered_idx (last_registered)
`,
`
ALTER TABLE topology_recovery
ADD KEY cluster_name_in_active_idx (cluster_name, in_active_period)
`,
`
ALTER TABLE topology_recovery
ADD KEY end_recovery_idx (end_recovery)
`,
`
ALTER TABLE
database_instance
ADD COLUMN binlog_server TINYINT UNSIGNED NOT NULL AFTER version
`,
`
ALTER TABLE cluster_domain_name
ADD COLUMN last_registered TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
`,
`
ALTER TABLE cluster_domain_name
ADD KEY last_registered_idx (last_registered)
`,
`
ALTER TABLE
database_instance
ADD COLUMN supports_oracle_gtid TINYINT UNSIGNED NOT NULL AFTER oracle_gtid
`,
`
ALTER TABLE
database_instance
ADD COLUMN executed_gtid_set text CHARACTER SET ascii NOT NULL AFTER oracle_gtid
`,
`
ALTER TABLE
database_instance
ADD COLUMN server_uuid varchar(64) CHARACTER SET ascii NOT NULL AFTER server_id
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN is_successful TINYINT UNSIGNED NOT NULL DEFAULT 0 AFTER processcing_node_token
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN acknowledged TINYINT UNSIGNED NOT NULL DEFAULT 0
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN acknowledged_by varchar(128) CHARACTER SET utf8 NOT NULL
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN acknowledge_comment text CHARACTER SET utf8 NOT NULL
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN participating_instances text CHARACTER SET ascii NOT NULL after replica_hosts
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN lost_replicas text CHARACTER SET ascii NOT NULL after participating_instances
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN all_errors text CHARACTER SET ascii NOT NULL after lost_replicas
`,
`
ALTER TABLE audit
ADD COLUMN cluster_name varchar(128) CHARACTER SET ascii NOT NULL DEFAULT '' AFTER port
`,
`
ALTER TABLE candidate_database_instance
ADD COLUMN priority TINYINT SIGNED NOT NULL DEFAULT 1 comment 'positive promote, negative unpromotes'
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN acknowledged_at TIMESTAMP NULL after acknowledged
`,
`
ALTER TABLE
topology_recovery
ADD KEY acknowledged_idx (acknowledged, acknowledged_at)
`,
`
ALTER TABLE
blocked_topology_recovery
ADD KEY last_blocked_idx (last_blocked_timestamp)
`,
`
ALTER TABLE candidate_database_instance
ADD COLUMN promotion_rule enum('must', 'prefer', 'neutral', 'prefer_not', 'must_not') NOT NULL DEFAULT 'neutral'
`,
`
ALTER TABLE node_health /* sqlite3-skip */
DROP PRIMARY KEY,
ADD PRIMARY KEY (hostname, token)
`,
`
ALTER TABLE node_health
ADD COLUMN extra_info varchar(128) CHARACTER SET utf8 NOT NULL
`,
`
ALTER TABLE agent_seed /* sqlite3-skip */
MODIFY end_timestamp timestamp NOT NULL DEFAULT '1971-01-01 00:00:00'
`,
`
ALTER TABLE active_node /* sqlite3-skip */
MODIFY last_seen_active timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
`,
`
ALTER TABLE node_health /* sqlite3-skip */
MODIFY last_seen_active timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
`,
`
ALTER TABLE candidate_database_instance /* sqlite3-skip */
MODIFY last_suggested timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
`,
`
ALTER TABLE primary_position_equivalence /* sqlite3-skip */
MODIFY last_suggested timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
`,
`
ALTER TABLE
database_instance
ADD COLUMN last_attempted_check TIMESTAMP NOT NULL DEFAULT '1971-01-01 00:00:00' AFTER last_checked
`,
`
ALTER TABLE
database_instance /* sqlite3-skip */
MODIFY last_attempted_check TIMESTAMP NOT NULL DEFAULT '1971-01-01 00:00:00'
`,
`
ALTER TABLE
database_instance_analysis_changelog
ADD KEY instance_timestamp_idx (hostname, port, analysis_timestamp)
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN last_detection_id bigint unsigned NOT NULL
`,
`
ALTER TABLE
topology_recovery
ADD KEY last_detection_idx (last_detection_id)
`,
`
ALTER TABLE node_health_history
ADD COLUMN command varchar(128) CHARACTER SET utf8 NOT NULL
`,
`
ALTER TABLE node_health
ADD COLUMN command varchar(128) CHARACTER SET utf8 NOT NULL
`,
`
ALTER TABLE database_instance_topology_history
ADD COLUMN version varchar(128) CHARACTER SET ascii NOT NULL
`,
`
ALTER TABLE
database_instance
ADD COLUMN gtid_purged text CHARACTER SET ascii NOT NULL AFTER executed_gtid_set
`,
`
ALTER TABLE
database_instance_coordinates_history
ADD COLUMN last_seen timestamp NOT NULL DEFAULT '1971-01-01 00:00:00' AFTER recorded_timestamp
`,
`
ALTER TABLE
access_token
ADD COLUMN is_reentrant TINYINT UNSIGNED NOT NULL default 0
`,
`
ALTER TABLE
access_token
ADD COLUMN acquired_at timestamp NOT NULL DEFAULT '1971-01-01 00:00:00'
`,
`
ALTER TABLE
database_instance_pool
ADD COLUMN registered_at timestamp NOT NULL DEFAULT '1971-01-01 00:00:00'
`,
`
ALTER TABLE
database_instance
ADD COLUMN has_replication_credentials TINYINT UNSIGNED NOT NULL
`,
`
ALTER TABLE
database_instance
ADD COLUMN allow_tls TINYINT UNSIGNED NOT NULL AFTER sql_delay
`,
`
ALTER TABLE
database_instance
ADD COLUMN semi_sync_enforced TINYINT UNSIGNED NOT NULL AFTER physical_environment
`,
`
ALTER TABLE
database_instance
ADD COLUMN instance_alias varchar(128) CHARACTER SET ascii NOT NULL AFTER physical_environment
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN successor_alias varchar(128) DEFAULT NULL
`,
`
ALTER TABLE
database_instance /* sqlite3-skip */
MODIFY cluster_name varchar(128) NOT NULL
`,
`
ALTER TABLE
node_health
ADD INDEX last_seen_active_idx (last_seen_active)
`,
`
ALTER TABLE
database_instance_maintenance
ADD COLUMN processing_node_hostname varchar(128) CHARACTER SET ascii NOT NULL
`,
`
ALTER TABLE
database_instance_maintenance
ADD COLUMN processing_node_token varchar(128) NOT NULL
`,
`
ALTER TABLE
database_instance_maintenance
ADD COLUMN explicitly_bounded TINYINT UNSIGNED NOT NULL
`,
`
ALTER TABLE node_health_history
ADD COLUMN app_version varchar(64) CHARACTER SET ascii NOT NULL DEFAULT ""
`,
`
ALTER TABLE node_health
ADD COLUMN app_version varchar(64) CHARACTER SET ascii NOT NULL DEFAULT ""
`,
`
ALTER TABLE node_health_history /* sqlite3-skip */
MODIFY app_version varchar(64) CHARACTER SET ascii NOT NULL DEFAULT ""
`,
`
ALTER TABLE node_health /* sqlite3-skip */
MODIFY app_version varchar(64) CHARACTER SET ascii NOT NULL DEFAULT ""
`,
`
ALTER TABLE
database_instance
ADD COLUMN version_comment varchar(128) NOT NULL DEFAULT ''
`,
`
ALTER TABLE active_node
ADD COLUMN first_seen_active timestamp NOT NULL DEFAULT '1971-01-01 00:00:00'
`,
`
ALTER TABLE node_health
ADD COLUMN first_seen_active timestamp NOT NULL DEFAULT '1971-01-01 00:00:00'
`,
`
ALTER TABLE database_instance
ADD COLUMN major_version varchar(16) CHARACTER SET ascii NOT NULL
`,
`
ALTER TABLE
database_instance
ADD COLUMN binlog_row_image varchar(16) CHARACTER SET ascii NOT NULL
`,
`
ALTER TABLE topology_recovery
ADD COLUMN uid varchar(128) CHARACTER SET ascii NOT NULL
`,
`
CREATE INDEX uid_idx_topology_recovery ON topology_recovery(uid)
`,
`
CREATE INDEX recovery_uid_idx_topology_recovery_steps ON topology_recovery_steps(recovery_uid)
`,
`
ALTER TABLE
database_instance
ADD COLUMN last_discovery_latency bigint not null
`,
`
CREATE INDEX end_timestamp_idx_database_instance_downtime ON database_instance_downtime(end_timestamp)
`,
`
ALTER TABLE
topology_failure_detection
ADD COLUMN is_actionable tinyint not null default 0
`,
`
DROP INDEX hostname_port_active_period_uidx_topology_failure_detection ON topology_failure_detection
`,
`
CREATE UNIQUE INDEX host_port_active_recoverable_uidx_topology_failure_detection ON topology_failure_detection (hostname, port, in_active_period, end_active_period_unixtime, is_actionable)
`,
`
ALTER TABLE raft_snapshot
ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
`,
`
ALTER TABLE node_health
ADD COLUMN db_backend varchar(255) CHARACTER SET ascii NOT NULL DEFAULT ""
`,
`
ALTER TABLE node_health
ADD COLUMN incrementing_indicator bigint not null default 0
`,
`
ALTER TABLE
database_instance
ADD COLUMN semi_sync_primary_enabled TINYINT UNSIGNED NOT NULL
`,
`
ALTER TABLE
database_instance
ADD COLUMN semi_sync_replica_enabled TINYINT UNSIGNED NOT NULL
`,
`
ALTER TABLE
database_instance
ADD COLUMN gtid_mode varchar(32) CHARACTER SET ascii NOT NULL
`,
`
ALTER TABLE
database_instance
ADD COLUMN last_check_partial_success tinyint unsigned NOT NULL after last_attempted_check
`,
`
ALTER TABLE
database_instance
ADD COLUMN source_uuid varchar(64) CHARACTER SET ascii NOT NULL AFTER oracle_gtid
`,
`
ALTER TABLE
database_instance
ADD COLUMN gtid_errant text CHARACTER SET ascii NOT NULL AFTER gtid_purged
`,
`
ALTER TABLE
database_instance
ADD COLUMN ancestry_uuid text CHARACTER SET ascii NOT NULL AFTER source_uuid
`,
`
ALTER TABLE
database_instance
ADD COLUMN replication_sql_thread_state tinyint signed not null default 0 AFTER replica_io_running
`,
`
ALTER TABLE
database_instance
ADD COLUMN replication_io_thread_state tinyint signed not null default 0 AFTER replication_sql_thread_state
`,
`
ALTER TABLE
database_instance_tags /* sqlite3-skip */
DROP PRIMARY KEY,
ADD PRIMARY KEY (hostname, port, tag_name)
`,
`
ALTER TABLE
database_instance
ADD COLUMN region varchar(32) CHARACTER SET ascii NOT NULL AFTER data_center
`,
`
ALTER TABLE
database_instance
ADD COLUMN semi_sync_primary_timeout INT UNSIGNED NOT NULL DEFAULT 0 AFTER semi_sync_primary_enabled
`,
`
ALTER TABLE
database_instance
ADD COLUMN semi_sync_primary_wait_for_replica_count INT UNSIGNED NOT NULL DEFAULT 0 AFTER semi_sync_primary_timeout
`,
`
ALTER TABLE
database_instance
ADD COLUMN semi_sync_primary_status TINYINT UNSIGNED NOT NULL DEFAULT 0 AFTER semi_sync_primary_wait_for_replica_count
`,
`
ALTER TABLE
database_instance
ADD COLUMN semi_sync_replica_status TINYINT UNSIGNED NOT NULL DEFAULT 0 AFTER semi_sync_primary_status
`,
`
ALTER TABLE
database_instance
ADD COLUMN semi_sync_primary_clients INT UNSIGNED NOT NULL DEFAULT 0 AFTER semi_sync_primary_status
`,
`
ALTER TABLE /* sqlite3-skip */
database_instance
MODIFY semi_sync_primary_timeout BIGINT UNSIGNED NOT NULL DEFAULT 0
`,
// Fields related to Replication Group the instance belongs to
`
ALTER TABLE
database_instance
ADD COLUMN replication_group_name VARCHAR(64) CHARACTER SET ascii NOT NULL DEFAULT '' AFTER gtid_mode
`,
`
ALTER TABLE
database_instance
ADD COLUMN replication_group_is_single_primary_mode TINYINT UNSIGNED NOT NULL DEFAULT 1 AFTER replication_group_name
`,
`
ALTER TABLE
database_instance
ADD COLUMN replication_group_member_state VARCHAR(16) CHARACTER SET ascii NOT NULL DEFAULT '' AFTER replication_group_is_single_primary_mode
`,
`
ALTER TABLE
database_instance
ADD COLUMN replication_group_member_role VARCHAR(16) CHARACTER SET ascii NOT NULL DEFAULT '' AFTER replication_group_member_state
`,
`
ALTER TABLE
database_instance
ADD COLUMN replication_group_members text CHARACTER SET ascii NOT NULL AFTER replication_group_member_role
`,
`
ALTER TABLE
database_instance
ADD COLUMN replication_group_primary_host varchar(128) CHARACTER SET ascii NOT NULL DEFAULT '' AFTER replication_group_members
`,
`
ALTER TABLE
database_instance
ADD COLUMN replication_group_primary_port smallint(5) unsigned NOT NULL DEFAULT 0 AFTER replication_group_primary_host
`,
}
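Since the list forms a changelog, replaying it against an existing database makes already-applied patches fail with errors such as duplicate column or duplicate key. A hedged sketch of a tolerant apply loop in that spirit; the applyPatches name and the ignore-all-errors policy are illustrative rather than lifted from this PR:

import (
	"database/sql"

	"vitess.io/vitess/go/vt/vtgr/external/golib/sqlutils"
)

// applyPatches executes every patch in order. Errors are deliberately
// discarded: on a previously-deployed database most statements fail simply
// because an earlier run already applied them.
func applyPatches(db *sql.DB, patches []string) {
	for _, patch := range patches {
		stmt := sqlutils.ToSqlite3Dialect(patch)
		_, _ = sqlutils.ExecSilently(db, stmt) // ExecSilently also suppresses error logging
	}
}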


@@ -23,7 +23,7 @@ import (
gomock "github.com/golang/mock/gomock"
mysql "vitess.io/vitess/go/mysql"
inst "vitess.io/vitess/go/vt/vtorc/inst"
inst "vitess.io/vitess/go/vt/vtgr/inst"
)
// MockAgent is a mock of Agent interface


@@ -29,10 +29,9 @@ import (
"vitess.io/vitess/go/mysql"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/servenv"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/db"
"vitess.io/vitess/go/vt/vtorc/external/golib/sqlutils"
"vitess.io/vitess/go/vt/vtorc/inst"
"vitess.io/vitess/go/vt/vtgr/config"
"vitess.io/vitess/go/vt/vtgr/external/golib/sqlutils"
"vitess.io/vitess/go/vt/vtgr/inst"
)
var (
@@ -459,7 +458,7 @@ func execInstance(instanceKey *inst.InstanceKey, query string, args ...any) erro
if err := verifyInstance(instanceKey); err != nil {
return err
}
- sqlDb, err := db.OpenDiscovery(instanceKey.Hostname, instanceKey.Port)
+ sqlDb, err := OpenDiscovery(instanceKey.Hostname, instanceKey.Port)
if err != nil {
log.Errorf("error exec %v: %v", query, err)
return err
@@ -473,7 +472,7 @@ func execInstanceWithTopo(instanceKey *inst.InstanceKey, query string, args ...a
if err := verifyInstance(instanceKey); err != nil {
return err
}
- sqlDb, err := db.OpenTopology(instanceKey.Hostname, instanceKey.Port)
+ sqlDb, err := OpenTopology(instanceKey.Hostname, instanceKey.Port)
if err != nil {
log.Errorf("error exec %v: %v", query, err)
return err
@@ -487,7 +486,7 @@ func fetchInstance(instanceKey *inst.InstanceKey, query string, onRow func(sqlut
if err := verifyInstance(instanceKey); err != nil {
return err
}
- sqlDb, err := db.OpenDiscovery(instanceKey.Hostname, instanceKey.Port)
+ sqlDb, err := OpenDiscovery(instanceKey.Hostname, instanceKey.Port)
if err != nil {
return err
}


@@ -14,6 +14,10 @@
limitations under the License.
*/
+ /*
+ This file has been copied over from VTOrc package
+ */
package db
import (
@@ -28,10 +32,9 @@ import (
"github.com/patrickmn/go-cache"
"github.com/rcrowley/go-metrics"
"vitess.io/vitess/go/vt/vtorc/external/golib/sqlutils"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/ssl"
"vitess.io/vitess/go/vt/vtgr/config"
"vitess.io/vitess/go/vt/vtgr/external/golib/sqlutils"
"vitess.io/vitess/go/vt/vtgr/ssl"
)
const Error3159 = "Error 3159:"
@@ -40,8 +43,8 @@ const Error1045 = "Access denied for user"
// Track if a TLS has already been configured for topology
var topologyTLSConfigured = false
- // Track if a TLS has already been configured for VTOrc
- var vtorcTLSConfigured = false
+ // Track if a TLS has already been configured for Orchestrator
+ var orchestratorTLSConfigured = false
var requireTLSCache *cache.Cache = cache.New(time.Duration(config.Config.TLSCacheTTLFactor*config.Config.InstancePollSeconds)*time.Second, time.Second)
@@ -51,10 +54,10 @@ var readInstanceTLSCacheCounter = metrics.NewCounter()
var writeInstanceTLSCacheCounter = metrics.NewCounter()
func init() {
- _ = metrics.Register("instance_tls.read", readInstanceTLSCounter)
- _ = metrics.Register("instance_tls.write", writeInstanceTLSCounter)
- _ = metrics.Register("instance_tls.read_cache", readInstanceTLSCacheCounter)
- _ = metrics.Register("instance_tls.write_cache", writeInstanceTLSCacheCounter)
+ metrics.Register("instance_tls.read", readInstanceTLSCounter)
+ metrics.Register("instance_tls.write", writeInstanceTLSCounter)
+ metrics.Register("instance_tls.read_cache", readInstanceTLSCacheCounter)
+ metrics.Register("instance_tls.write_cache", writeInstanceTLSCacheCounter)
}
func requiresTLS(host string, port int, uri string) bool {
@@ -81,7 +84,7 @@ func requiresTLS(host string, port int, uri string) bool {
on duplicate key update
required=values(required)
`
- if _, err := ExecVTOrc(query, host, port, required); err != nil {
+ if _, err := ExecOrchestrator(query, host, port, required); err != nil {
log.Error(err)
}
writeInstanceTLSCounter.Inc(1)
@@ -124,31 +127,31 @@ func SetupMySQLTopologyTLS(uri string) (string, error) {
}
// Create a TLS configuration from the config supplied CA, Certificate, and Private key.
- // Register the TLS config with the mysql drivers as the "vtorc" config
+ // Register the TLS config with the mysql drivers as the "orchestrator" config
// Modify the supplied URI to call the TLS config
- func SetupMySQLVTOrcTLS(uri string) (string, error) {
- if !vtorcTLSConfigured {
- tlsConfig, err := ssl.NewTLSConfig(config.Config.MySQLVTOrcSSLCAFile, !config.Config.MySQLVTOrcSSLSkipVerify)
+ func SetupMySQLOrchestratorTLS(uri string) (string, error) {
+ if !orchestratorTLSConfigured {
+ tlsConfig, err := ssl.NewTLSConfig(config.Config.MySQLOrchestratorSSLCAFile, !config.Config.MySQLOrchestratorSSLSkipVerify)
// Drop to TLS 1.0 for talking to MySQL
tlsConfig.MinVersion = tls.VersionTLS10
if err != nil {
log.Fatalf("Can't create TLS configuration for VTOrc connection %s: %s", uri, err)
log.Fatalf("Can't create TLS configuration for Orchestrator connection %s: %s", uri, err)
return "", err
}
- tlsConfig.InsecureSkipVerify = config.Config.MySQLVTOrcSSLSkipVerify
- if (!config.Config.MySQLVTOrcSSLSkipVerify) &&
- config.Config.MySQLVTOrcSSLCertFile != "" &&
- config.Config.MySQLVTOrcSSLPrivateKeyFile != "" {
- if err = ssl.AppendKeyPair(tlsConfig, config.Config.MySQLVTOrcSSLCertFile, config.Config.MySQLVTOrcSSLPrivateKeyFile); err != nil {
+ tlsConfig.InsecureSkipVerify = config.Config.MySQLOrchestratorSSLSkipVerify
+ if (!config.Config.MySQLOrchestratorSSLSkipVerify) &&
+ config.Config.MySQLOrchestratorSSLCertFile != "" &&
+ config.Config.MySQLOrchestratorSSLPrivateKeyFile != "" {
+ if err = ssl.AppendKeyPair(tlsConfig, config.Config.MySQLOrchestratorSSLCertFile, config.Config.MySQLOrchestratorSSLPrivateKeyFile); err != nil {
log.Fatalf("Can't setup TLS key pairs for %s: %s", uri, err)
return "", err
}
}
if err = mysql.RegisterTLSConfig("vtorc", tlsConfig); err != nil {
log.Fatalf("Can't register mysql TLS config for vtorc: %s", err)
if err = mysql.RegisterTLSConfig("orchestrator", tlsConfig); err != nil {
log.Fatalf("Can't register mysql TLS config for orchestrator: %s", err)
return "", err
}
- vtorcTLSConfigured = true
+ orchestratorTLSConfigured = true
}
return fmt.Sprintf("%s&tls=vtorc", uri), nil
return fmt.Sprintf("%s&tls=orchestrator", uri), nil
}
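For context, a caller threads the returned URI back into the DSN it opens; a minimal usage sketch, where the DSN and the error handling are illustrative and not taken from this PR:

// Registers the "orchestrator" TLS config on first use and appends
// "&tls=orchestrator" to the DSN.
baseURI := "user:pass@tcp(db.example.com:3306)/orchestrator?timeout=1s"
tlsURI, err := SetupMySQLOrchestratorTLS(baseURI)
if err != nil {
	log.Fatalf("TLS setup failed: %v", err)
}
db, fromCache, err := sqlutils.GetDB(tlsURI)
if err != nil {
	log.Fatalf("open failed: %v", err)
}
_ = fromCache // true when the *sql.DB handle came from the package-level cache
_ = db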

go/vt/vtgr/external/golib/sqlutils/dialect.go (new vendored file, 53 lines)

@@ -0,0 +1,53 @@
/*
Copyright 2017 GitHub Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
This file has been copied over from VTOrc package
*/
package sqlutils
import (
"regexp"
"strings"
)
type regexpMap struct {
r *regexp.Regexp
replacement string
}
func (this *regexpMap) process(text string) (result string) {
return this.r.ReplaceAllString(text, this.replacement)
}
func rmap(regexpExpression string, replacement string) regexpMap {
return regexpMap{
r: regexp.MustCompile(regexpSpaces(regexpExpression)),
replacement: replacement,
}
}
func regexpSpaces(statement string) string {
return strings.Replace(statement, " ", `[\s]+`, -1)
}
func applyConversions(statement string, conversions []regexpMap) string {
for _, rmap := range conversions {
statement = rmap.process(statement)
}
return statement
}
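regexpSpaces is what makes every rule whitespace-tolerant: each literal space in a pattern becomes [\s]+, so a rule also matches DDL that is split across lines. A small in-package illustration, hand-evaluated rather than taken from the test suite:

m := rmap(`alter table ([\S]+) add (index|key) ([\S]+) (.+)`,
	`create index ${3}_${1} on $1 $4`)
// The pattern matches regardless of how much whitespace separates the tokens.
fmt.Println(m.process("alter table t add key k (c)"))
// prints: create index k_t on t (c)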

go/vt/vtgr/external/golib/sqlutils/sqlite_dialect.go (new vendored file, 134 lines)

@@ -0,0 +1,134 @@
/*
Copyright 2017 GitHub Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
This file has been copied over from VTOrc package
*/
// What's this about?
// This is a brute-force regular-expression based conversion from MySQL syntax to sqlite3 syntax.
// It is NOT meant to be a general purpose solution and is only expected & confirmed to run on
// queries issued by orchestrator. There are known limitations to this design.
// It's not even pretty.
// In fact...
// Well, it gets the job done at this time. Call it debt.
package sqlutils
import (
"regexp"
)
var sqlite3CreateTableConversions = []regexpMap{
rmap(`(?i) (character set|charset) [\S]+`, ``),
rmap(`(?i)int unsigned`, `int`),
rmap(`(?i)int[\s]*[(][\s]*([0-9]+)[\s]*[)] unsigned`, `int`),
rmap(`(?i)engine[\s]*=[\s]*(innodb|myisam|ndb|memory|tokudb)`, ``),
rmap(`(?i)DEFAULT CHARSET[\s]*=[\s]*[\S]+`, ``),
rmap(`(?i)[\S]*int( not null|) auto_increment`, `integer`),
rmap(`(?i)comment '[^']*'`, ``),
rmap(`(?i)after [\S]+`, ``),
rmap(`(?i)alter table ([\S]+) add (index|key) ([\S]+) (.+)`, `create index ${3}_${1} on $1 $4`),
rmap(`(?i)alter table ([\S]+) add unique (index|key) ([\S]+) (.+)`, `create unique index ${3}_${1} on $1 $4`),
rmap(`(?i)([\S]+) enum[\s]*([(].*?[)])`, `$1 text check($1 in $2)`),
rmap(`(?i)([\s\S]+[/][*] sqlite3-skip [*][/][\s\S]+)`, ``),
rmap(`(?i)timestamp default current_timestamp`, `timestamp default ('')`),
rmap(`(?i)timestamp not null default current_timestamp`, `timestamp not null default ('')`),
rmap(`(?i)add column (.*int) not null[\s]*$`, `add column $1 not null default 0`),
rmap(`(?i)add column (.* text) not null[\s]*$`, `add column $1 not null default ''`),
rmap(`(?i)add column (.* varchar.*) not null[\s]*$`, `add column $1 not null default ''`),
}
var sqlite3InsertConversions = []regexpMap{
rmap(`(?i)insert ignore ([\s\S]+) on duplicate key update [\s\S]+`, `insert or ignore $1`),
rmap(`(?i)insert ignore`, `insert or ignore`),
rmap(`(?i)now[(][)]`, `datetime('now')`),
rmap(`(?i)insert into ([\s\S]+) on duplicate key update [\s\S]+`, `replace into $1`),
}
var sqlite3GeneralConversions = []regexpMap{
rmap(`(?i)now[(][)][\s]*[-][\s]*interval [?] ([\w]+)`, `datetime('now', printf('-%d $1', ?))`),
rmap(`(?i)now[(][)][\s]*[+][\s]*interval [?] ([\w]+)`, `datetime('now', printf('+%d $1', ?))`),
rmap(`(?i)now[(][)][\s]*[-][\s]*interval ([0-9.]+) ([\w]+)`, `datetime('now', '-${1} $2')`),
rmap(`(?i)now[(][)][\s]*[+][\s]*interval ([0-9.]+) ([\w]+)`, `datetime('now', '+${1} $2')`),
rmap(`(?i)[=<>\s]([\S]+[.][\S]+)[\s]*[-][\s]*interval [?] ([\w]+)`, ` datetime($1, printf('-%d $2', ?))`),
rmap(`(?i)[=<>\s]([\S]+[.][\S]+)[\s]*[+][\s]*interval [?] ([\w]+)`, ` datetime($1, printf('+%d $2', ?))`),
rmap(`(?i)unix_timestamp[(][)]`, `strftime('%s', 'now')`),
rmap(`(?i)unix_timestamp[(]([^)]+)[)]`, `strftime('%s', $1)`),
rmap(`(?i)now[(][)]`, `datetime('now')`),
rmap(`(?i)cast[(][\s]*([\S]+) as signed[\s]*[)]`, `cast($1 as integer)`),
rmap(`(?i)\bconcat[(][\s]*([^,)]+)[\s]*,[\s]*([^,)]+)[\s]*[)]`, `($1 || $2)`),
rmap(`(?i)\bconcat[(][\s]*([^,)]+)[\s]*,[\s]*([^,)]+)[\s]*,[\s]*([^,)]+)[\s]*[)]`, `($1 || $2 || $3)`),
rmap(`(?i) rlike `, ` like `),
rmap(`(?i)create index([\s\S]+)[(][\s]*[0-9]+[\s]*[)]([\s\S]+)`, `create index ${1}${2}`),
rmap(`(?i)drop index ([\S]+) on ([\S]+)`, `drop index if exists $1`),
}
var (
sqlite3IdentifyCreateTableStatement = regexp.MustCompile(regexpSpaces(`(?i)^[\s]*create table`))
sqlite3IdentifyCreateIndexStatement = regexp.MustCompile(regexpSpaces(`(?i)^[\s]*create( unique|) index`))
sqlite3IdentifyDropIndexStatement = regexp.MustCompile(regexpSpaces(`(?i)^[\s]*drop index`))
sqlite3IdentifyAlterTableStatement = regexp.MustCompile(regexpSpaces(`(?i)^[\s]*alter table`))
sqlite3IdentifyInsertStatement = regexp.MustCompile(regexpSpaces(`(?i)^[\s]*(insert|replace)`))
)
func IsInsert(statement string) bool {
return sqlite3IdentifyInsertStatement.MatchString(statement)
}
func IsCreateTable(statement string) bool {
return sqlite3IdentifyCreateTableStatement.MatchString(statement)
}
func IsCreateIndex(statement string) bool {
return sqlite3IdentifyCreateIndexStatement.MatchString(statement)
}
func IsDropIndex(statement string) bool {
return sqlite3IdentifyDropIndexStatement.MatchString(statement)
}
func IsAlterTable(statement string) bool {
return sqlite3IdentifyAlterTableStatement.MatchString(statement)
}
func ToSqlite3CreateTable(statement string) string {
return applyConversions(statement, sqlite3CreateTableConversions)
}
func ToSqlite3Insert(statement string) string {
return applyConversions(statement, sqlite3InsertConversions)
}
func ToSqlite3Dialect(statement string) (translated string) {
if IsCreateTable(statement) {
return ToSqlite3CreateTable(statement)
}
if IsAlterTable(statement) {
return ToSqlite3CreateTable(statement)
}
statement = applyConversions(statement, sqlite3GeneralConversions)
if IsInsert(statement) {
return ToSqlite3Insert(statement)
}
return statement
}
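Two hand-checked examples of the translation this function produces under the rules above; the test file that follows covers many more cases:

fmt.Println(sqlutils.ToSqlite3Dialect("insert ignore into t (a) values (?)"))
// insert or ignore into t (a) values (?)
fmt.Println(sqlutils.ToSqlite3Dialect("drop index last_seen_idx on node_health"))
// drop index if exists last_seen_idx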

go/vt/vtgr/external/golib/sqlutils/sqlite_dialect_test.go (new vendored file, 246 lines)

@@ -0,0 +1,246 @@
/*
Copyright 2017 GitHub Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
This file has been copied over from VTOrc package
*/
package sqlutils
import (
"regexp"
"strings"
"testing"
"github.com/stretchr/testify/require"
)
var spacesRegexp = regexp.MustCompile(`[\s]+`)
func init() {
}
func stripSpaces(statement string) string {
statement = strings.TrimSpace(statement)
statement = spacesRegexp.ReplaceAllString(statement, " ")
return statement
}
func TestIsCreateTable(t *testing.T) {
require.True(t, IsCreateTable("create table t(id int)"))
require.True(t, IsCreateTable(" create table t(id int)"))
require.True(t, IsCreateTable("CREATE TABLE t(id int)"))
require.True(t, IsCreateTable(`
create table t(id int)
`))
require.False(t, IsCreateTable("where create table t(id int)"))
require.False(t, IsCreateTable("insert"))
}
func TestToSqlite3CreateTable(t *testing.T) {
{
statement := "create table t(id int)"
result := ToSqlite3CreateTable(statement)
require.Equal(t, result, statement)
}
{
statement := "create table t(id int, v varchar(123) CHARACTER SET ascii NOT NULL default '')"
result := ToSqlite3CreateTable(statement)
require.Equal(t, result, "create table t(id int, v varchar(123) NOT NULL default '')")
}
{
statement := "create table t(id int, v varchar ( 123 ) CHARACTER SET ascii NOT NULL default '')"
result := ToSqlite3CreateTable(statement)
require.Equal(t, result, "create table t(id int, v varchar ( 123 ) NOT NULL default '')")
}
{
statement := "create table t(i smallint unsigned)"
result := ToSqlite3CreateTable(statement)
require.Equal(t, result, "create table t(i smallint)")
}
{
statement := "create table t(i smallint(5) unsigned)"
result := ToSqlite3CreateTable(statement)
require.Equal(t, result, "create table t(i smallint)")
}
{
statement := "create table t(i smallint ( 5 ) unsigned)"
result := ToSqlite3CreateTable(statement)
require.Equal(t, result, "create table t(i smallint)")
}
}
func TestToSqlite3AlterTable(t *testing.T) {
{
statement := `
ALTER TABLE
database_instance
ADD COLUMN sql_delay INT UNSIGNED NOT NULL AFTER replica_lag_seconds
`
result := stripSpaces(ToSqlite3Dialect(statement))
require.Equal(t, result, stripSpaces(`
ALTER TABLE
database_instance
add column sql_delay int not null default 0
`))
}
{
statement := `
ALTER TABLE
database_instance
ADD INDEX source_host_port_idx (source_host, source_port)
`
result := stripSpaces(ToSqlite3Dialect(statement))
require.Equal(t, result, stripSpaces(`
create index
source_host_port_idx_database_instance
on database_instance (source_host, source_port)
`))
}
{
statement := `
ALTER TABLE
topology_recovery
ADD KEY last_detection_idx (last_detection_id)
`
result := stripSpaces(ToSqlite3Dialect(statement))
require.Equal(t, result, stripSpaces(`
create index
last_detection_idx_topology_recovery
on topology_recovery (last_detection_id)
`))
}
}
func TestCreateIndex(t *testing.T) {
{
statement := `
create index
source_host_port_idx_database_instance
on database_instance (source_host(128), source_port)
`
result := stripSpaces(ToSqlite3Dialect(statement))
require.Equal(t, result, stripSpaces(`
create index
source_host_port_idx_database_instance
on database_instance (source_host, source_port)
`))
}
}
func TestIsInsert(t *testing.T) {
require.True(t, IsInsert("insert into t"))
require.True(t, IsInsert("insert ignore into t"))
require.True(t, IsInsert(`
insert ignore into t
`))
require.False(t, IsInsert("where create table t(id int)"))
require.False(t, IsInsert("create table t(id int)"))
require.True(t, IsInsert(`
insert into
cluster_domain_name (cluster_name, domain_name, last_registered)
values
(?, ?, datetime('now'))
on duplicate key update
domain_name=values(domain_name),
last_registered=values(last_registered)
`))
}
func TestToSqlite3Insert(t *testing.T) {
{
statement := `
insert into
cluster_domain_name (cluster_name, domain_name, last_registered)
values
(?, ?, datetime('now'))
on duplicate key update
domain_name=values(domain_name),
last_registered=values(last_registered)
`
result := stripSpaces(ToSqlite3Dialect(statement))
require.Equal(t, result, stripSpaces(`
replace into
cluster_domain_name (cluster_name, domain_name, last_registered)
values
(?, ?, datetime('now'))
`))
}
}
func TestToSqlite3GeneralConversions(t *testing.T) {
{
statement := "select now()"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select datetime('now')")
}
{
statement := "select now() - interval ? second"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select datetime('now', printf('-%d second', ?))")
}
{
statement := "select now() + interval ? minute"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select datetime('now', printf('+%d minute', ?))")
}
{
statement := "select now() + interval 5 minute"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select datetime('now', '+5 minute')")
}
{
statement := "select some_table.some_column + interval ? minute"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select datetime(some_table.some_column, printf('+%d minute', ?))")
}
{
statement := "AND primary_instance.last_attempted_check <= primary_instance.last_seen + interval ? minute"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "AND primary_instance.last_attempted_check <= datetime(primary_instance.last_seen, printf('+%d minute', ?))")
}
{
statement := "select concat(primary_instance.port, '') as port"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select (primary_instance.port || '') as port")
}
{
statement := "select concat( 'abc' , 'def') as s"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select ('abc' || 'def') as s")
}
{
statement := "select concat( 'abc' , 'def', last.col) as s"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select ('abc' || 'def' || last.col) as s")
}
{
statement := "select concat(myself.only) as s"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select concat(myself.only) as s")
}
{
statement := "select concat(1, '2', 3, '4') as s"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select concat(1, '2', 3, '4') as s")
}
{
statement := "select group_concat( 'abc' , 'def') as s"
result := ToSqlite3Dialect(statement)
require.Equal(t, result, "select group_concat( 'abc' , 'def') as s")
}
}

go/vt/vtgr/external/golib/sqlutils/sqlutils.go (new vendored file, 433 lines)

@@ -0,0 +1,433 @@
/*
Copyright 2014 Outbrain Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
This file has been copied over from VTOrc package
*/
package sqlutils
import (
"database/sql"
"encoding/json"
"fmt"
"strconv"
"strings"
"sync"
"time"
"vitess.io/vitess/go/vt/log"
)
const DateTimeFormat = "2006-01-02 15:04:05.999999"
// RowMap represents one row in a result set. Its objective is to allow
// for easy, typed getters by column name.
type RowMap map[string]CellData
// Cell data is the result of a single (atomic) column in a single row
type CellData sql.NullString
func (this *CellData) MarshalJSON() ([]byte, error) {
if this.Valid {
return json.Marshal(this.String)
} else {
return json.Marshal(nil)
}
}
// UnmarshalJSON reads this object from JSON
func (this *CellData) UnmarshalJSON(b []byte) error {
var s string
if err := json.Unmarshal(b, &s); err != nil {
return err
}
(*this).String = s
(*this).Valid = true
return nil
}
func (this *CellData) NullString() *sql.NullString {
return (*sql.NullString)(this)
}
// RowData is the result of a single row, in positioned array format
type RowData []CellData
// MarshalJSON will marshal this row as JSON
func (this *RowData) MarshalJSON() ([]byte, error) {
cells := make([](*CellData), len(*this))
for i, val := range *this {
d := CellData(val)
cells[i] = &d
}
return json.Marshal(cells)
}
func (this *RowData) Args() []any {
result := make([]any, len(*this))
for i := range *this {
result[i] = (*(*this)[i].NullString())
}
return result
}
// ResultData is an ordered row set of RowData
type ResultData []RowData
type NamedResultData struct {
Columns []string
Data ResultData
}
var EmptyResultData = ResultData{}
func (this *RowMap) GetString(key string) string {
return (*this)[key].String
}
// GetStringD returns a string from the map, or a default value if the key does not exist
func (this *RowMap) GetStringD(key string, def string) string {
if cell, ok := (*this)[key]; ok {
return cell.String
}
return def
}
func (this *RowMap) GetInt64(key string) int64 {
res, _ := strconv.ParseInt(this.GetString(key), 10, 0)
return res
}
func (this *RowMap) GetNullInt64(key string) sql.NullInt64 {
i, err := strconv.ParseInt(this.GetString(key), 10, 0)
if err == nil {
return sql.NullInt64{Int64: i, Valid: true}
} else {
return sql.NullInt64{Valid: false}
}
}
func (this *RowMap) GetInt(key string) int {
res, _ := strconv.Atoi(this.GetString(key))
return res
}
func (this *RowMap) GetIntD(key string, def int) int {
res, err := strconv.Atoi(this.GetString(key))
if err != nil {
return def
}
return res
}
func (this *RowMap) GetUint(key string) uint {
res, _ := strconv.ParseUint(this.GetString(key), 10, 0)
return uint(res)
}
func (this *RowMap) GetUintD(key string, def uint) uint {
res, err := strconv.Atoi(this.GetString(key))
if err != nil {
return def
}
return uint(res)
}
func (this *RowMap) GetUint64(key string) uint64 {
res, _ := strconv.ParseUint(this.GetString(key), 10, 0)
return res
}
func (this *RowMap) GetUint64D(key string, def uint64) uint64 {
res, err := strconv.ParseUint(this.GetString(key), 10, 0)
if err != nil {
return def
}
return uint64(res)
}
func (this *RowMap) GetBool(key string) bool {
return this.GetInt(key) != 0
}
func (this *RowMap) GetTime(key string) time.Time {
if t, err := time.Parse(DateTimeFormat, this.GetString(key)); err == nil {
return t
}
return time.Time{}
}
// knownDBs is a DB cache by uri
var knownDBs map[string]*sql.DB = make(map[string]*sql.DB)
var knownDBsMutex = &sync.Mutex{}
// GetGenericDB returns a DB instance for the given driver and data source name.
// The bool result indicates whether the DB was returned from the cache.
func GetGenericDB(driverName, dataSourceName string) (*sql.DB, bool, error) {
knownDBsMutex.Lock()
defer func() {
knownDBsMutex.Unlock()
}()
var exists bool
if _, exists = knownDBs[dataSourceName]; !exists {
if db, err := sql.Open(driverName, dataSourceName); err == nil {
knownDBs[dataSourceName] = db
} else {
return db, exists, err
}
}
return knownDBs[dataSourceName], exists, nil
}
// GetDB returns a MySQL DB instance based on uri.
// The bool result indicates whether the DB was returned from the cache.
func GetDB(mysql_uri string) (*sql.DB, bool, error) {
return GetGenericDB("mysql", mysql_uri)
}
// GetSQLiteDB returns a SQLite DB instance based on DB file name.
// The bool result indicates whether the DB was returned from the cache.
func GetSQLiteDB(dbFile string) (*sql.DB, bool, error) {
return GetGenericDB("sqlite3", dbFile)
}
// RowToArray is a convenience function, typically not called directly, which maps a
// single database row that has been read into an array of NullStrings
func RowToArray(rows *sql.Rows, columns []string) []CellData {
buff := make([]any, len(columns))
data := make([]CellData, len(columns))
for i := range buff {
buff[i] = data[i].NullString()
}
rows.Scan(buff...)
return data
}
// ScanRowsToArrays is a convenience function, typically not called directly, which maps rows
already read from the database into arrays of NullString
func ScanRowsToArrays(rows *sql.Rows, on_row func([]CellData) error) error {
columns, _ := rows.Columns()
for rows.Next() {
arr := RowToArray(rows, columns)
err := on_row(arr)
if err != nil {
return err
}
}
return nil
}
func rowToMap(row []CellData, columns []string) map[string]CellData {
m := make(map[string]CellData)
for k, data_col := range row {
m[columns[k]] = data_col
}
return m
}
// ScanRowsToMaps is a convenience function, typically not called directly, which maps rows
already read from the database into RowMap entries.
func ScanRowsToMaps(rows *sql.Rows, on_row func(RowMap) error) error {
columns, _ := rows.Columns()
err := ScanRowsToArrays(rows, func(arr []CellData) error {
m := rowToMap(arr, columns)
err := on_row(m)
if err != nil {
return err
}
return nil
})
return err
}
// QueryRowsMap is a convenience function allowing querying a result set while providing a callback
// function activated per read row.
func QueryRowsMap(db *sql.DB, query string, on_row func(RowMap) error, args ...any) (err error) {
defer func() {
if derr := recover(); derr != nil {
err = fmt.Errorf("QueryRowsMap unexpected error: %+v", derr)
}
}()
var rows *sql.Rows
rows, err = db.Query(query, args...)
if rows != nil {
defer rows.Close()
}
if err != nil && err != sql.ErrNoRows {
log.Error(err)
return err
}
err = ScanRowsToMaps(rows, on_row)
return
}
// queryResultData returns a raw array of rows for a given query, optionally reading and returning column names
func queryResultData(db *sql.DB, query string, retrieveColumns bool, args ...any) (resultData ResultData, columns []string, err error) {
defer func() {
if derr := recover(); derr != nil {
err = fmt.Errorf("QueryRowsMap unexpected error: %+v", derr)
}
}()
var rows *sql.Rows
rows, err = db.Query(query, args...)
if err != nil && err != sql.ErrNoRows {
log.Error(err)
return EmptyResultData, columns, err
}
defer rows.Close()
if retrieveColumns {
// Don't pay if you don't want to
columns, _ = rows.Columns()
}
resultData = ResultData{}
err = ScanRowsToArrays(rows, func(rowData []CellData) error {
resultData = append(resultData, rowData)
return nil
})
return resultData, columns, err
}
// QueryResultData returns a raw array of rows
func QueryResultData(db *sql.DB, query string, args ...any) (ResultData, error) {
resultData, _, err := queryResultData(db, query, false, args...)
return resultData, err
}
// QueryNamedResultData returns a raw array of rows, with column names
func QueryNamedResultData(db *sql.DB, query string, args ...any) (NamedResultData, error) {
resultData, columns, err := queryResultData(db, query, true, args...)
return NamedResultData{Columns: columns, Data: resultData}, err
}
// QueryRowsMapBuffered reads data from the database into a buffer, and only then applies the given function per row.
// This allows the application to take its time with processing the data, albeit consuming as much memory as required by
// the result set.
func QueryRowsMapBuffered(db *sql.DB, query string, on_row func(RowMap) error, args ...any) error {
resultData, columns, err := queryResultData(db, query, true, args...)
if err != nil {
// Already logged
return err
}
for _, row := range resultData {
err = on_row(rowToMap(row, columns))
if err != nil {
return err
}
}
return nil
}
// ExecNoPrepare executes given query using given args on given DB, without using prepared statements.
func ExecNoPrepare(db *sql.DB, query string, args ...any) (res sql.Result, err error) {
defer func() {
if derr := recover(); derr != nil {
err = fmt.Errorf("ExecNoPrepare unexpected error: %+v", derr)
}
}()
res, err = db.Exec(query, args...)
if err != nil {
log.Error(err)
}
return res, err
}
// execInternal executes the given query using the given args on the given DB. It will safely prepare,
// execute and close the statement.
func execInternal(silent bool, db *sql.DB, query string, args ...any) (res sql.Result, err error) {
defer func() {
if derr := recover(); derr != nil {
err = fmt.Errorf("execInternal unexpected error: %+v", derr)
}
}()
var stmt *sql.Stmt
stmt, err = db.Prepare(query)
if err != nil {
return nil, err
}
defer stmt.Close()
res, err = stmt.Exec(args...)
if err != nil && !silent {
log.Error(err)
}
return res, err
}
// Exec executes the given query using the given args on the given DB. It will safely prepare,
// execute and close the statement.
func Exec(db *sql.DB, query string, args ...any) (sql.Result, error) {
return execInternal(false, db, query, args...)
}
// ExecSilently acts like Exec but does not log errors
func ExecSilently(db *sql.DB, query string, args ...any) (sql.Result, error) {
return execInternal(true, db, query, args...)
}
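A brief hypothetical sketch contrasting the two (statement and table are illustrative); ExecSilently suits best-effort statements whose failures are expected and should stay out of the logs:

// recordAudit is a hypothetical helper contrasting Exec and ExecSilently.
func recordAudit(db *sql.DB, message string) error {
	// Exec logs failures itself; the error is still propagated to the caller.
	if _, err := Exec(db, "insert into audit (message) values (?)", message); err != nil {
		return err
	}
	// Best-effort cleanup: a failure here is acceptable and is not logged.
	_, _ = ExecSilently(db, "delete from audit where message = ''")
	return nil
}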
// InClauseStringValues joins the given terms into a comma-separated list of quoted values,
// suitable for embedding in a SQL IN clause. Single quotes are escaped by doubling them.
func InClauseStringValues(terms []string) string {
	quoted := []string{}
	for _, s := range terms {
		quoted = append(quoted, fmt.Sprintf("'%s'", strings.Replace(s, "'", "''", -1)))
	}
	return strings.Join(quoted, ", ")
}
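For example (a hypothetical helper; table and values are illustrative), the result can be interpolated into an IN clause. Note the values are escaped by doubling single quotes rather than bound as placeholders:

// instancesIn is a hypothetical helper building an IN clause from a string slice.
func instancesIn(hosts []string) string {
	// InClauseStringValues([]string{"host1", "host2"}) yields "'host1', 'host2'"
	return fmt.Sprintf("select * from database_instance where hostname in (%s)",
		InClauseStringValues(hosts))
}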
// Args converts variadic arguments into an arguments array.
func Args(args ...any) []any {
	return args
}
// NilIfZero returns nil when i is zero, otherwise it returns i. Useful for writing SQL NULLs.
func NilIfZero(i int64) any {
if i == 0 {
return nil
}
return i
}
// ScanTable reads an entire table, returning its rows along with its column names.
func ScanTable(db *sql.DB, tableName string) (NamedResultData, error) {
query := fmt.Sprintf("select * from %s", tableName)
return QueryNamedResultData(db, query)
}
// WriteTable writes the given named rows into the given table via REPLACE statements;
// it returns the last error encountered, if any.
func WriteTable(db *sql.DB, tableName string, data NamedResultData) (err error) {
if len(data.Data) == 0 {
return nil
}
if len(data.Columns) == 0 {
return nil
}
placeholders := make([]string, len(data.Columns))
for i := range placeholders {
placeholders[i] = "?"
}
query := fmt.Sprintf(
`replace into %s (%s) values (%s)`,
tableName,
strings.Join(data.Columns, ","),
strings.Join(placeholders, ","),
)
for _, rowData := range data.Data {
if _, execErr := db.Exec(query, rowData.Args()...); execErr != nil {
err = execErr
}
}
return err
}
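Together the pair supports a simple table copy between two handles, sketched below as a hypothetical helper. Since WriteTable issues one REPLACE per row, the target table must share the source's column set:

// copyTable is a hypothetical helper copying one table between two databases.
func copyTable(src, dst *sql.DB, tableName string) error {
	data, err := ScanTable(src, tableName)
	if err != nil {
		return err
	}
	return WriteTable(dst, tableName, data)
}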

@@ -0,0 +1,125 @@
/*
Copyright 2015 Shlomi Noach, courtesy Booking.com
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
This file has been copied over from VTOrc package
*/
package inst
import (
"fmt"
"regexp"
"strings"
)
// InstanceKey is an instance indicator, identified by hostname and port
type InstanceKey struct {
Hostname string
Port int
}
var (
ipv4Regexp = regexp.MustCompile(`^([0-9]+)[.]([0-9]+)[.]([0-9]+)[.]([0-9]+)$`)
)
const detachHint = "//"
// Constant strings for Group Replication information
// See https://dev.mysql.com/doc/refman/8.0/en/replication-group-members-table.html for additional information.
const (
// Group member roles
GroupReplicationMemberRolePrimary = "PRIMARY"
GroupReplicationMemberRoleSecondary = "SECONDARY"
// Group member states
GroupReplicationMemberStateOnline = "ONLINE"
GroupReplicationMemberStateRecovering = "RECOVERING"
GroupReplicationMemberStateUnreachable = "UNREACHABLE"
GroupReplicationMemberStateOffline = "OFFLINE"
GroupReplicationMemberStateError = "ERROR"
)
// Equals tests equality between this key and another key
func (instanceKey *InstanceKey) Equals(other *InstanceKey) bool {
if other == nil {
return false
}
return instanceKey.Hostname == other.Hostname && instanceKey.Port == other.Port
}
// SmallerThan returns true if this key is dictionary-smaller than another.
// This is used for consistent sorting/ordering; there's nothing magical about it.
func (instanceKey *InstanceKey) SmallerThan(other *InstanceKey) bool {
if instanceKey.Hostname < other.Hostname {
return true
}
if instanceKey.Hostname == other.Hostname && instanceKey.Port < other.Port {
return true
}
return false
}
// IsDetached returns 'true' when this hostname is logically "detached"
func (instanceKey *InstanceKey) IsDetached() bool {
return strings.HasPrefix(instanceKey.Hostname, detachHint)
}
// IsValid uses simple heuristics to see whether this key represents an actual instance
func (instanceKey *InstanceKey) IsValid() bool {
if instanceKey.Hostname == "_" {
return false
}
if instanceKey.IsDetached() {
return false
}
return len(instanceKey.Hostname) > 0 && instanceKey.Port > 0
}
// DetachedKey returns an instance key whose hostname is detached: invalid, but recoverable
func (instanceKey *InstanceKey) DetachedKey() *InstanceKey {
if instanceKey.IsDetached() {
return instanceKey
}
return &InstanceKey{Hostname: fmt.Sprintf("%s%s", detachHint, instanceKey.Hostname), Port: instanceKey.Port}
}
// ReattachedKey returns an instance key whose hostname has been reattached: the original, valid key
func (instanceKey *InstanceKey) ReattachedKey() *InstanceKey {
if !instanceKey.IsDetached() {
return instanceKey
}
return &InstanceKey{Hostname: instanceKey.Hostname[len(detachHint):], Port: instanceKey.Port}
}
// StringCode returns an official string representation of this key
func (instanceKey *InstanceKey) StringCode() string {
return fmt.Sprintf("%s:%d", instanceKey.Hostname, instanceKey.Port)
}
// DisplayString returns a user-friendly string representation of this key
func (instanceKey *InstanceKey) DisplayString() string {
return instanceKey.StringCode()
}
// String returns a user-friendly string representation of this key
func (instanceKey InstanceKey) String() string {
return instanceKey.StringCode()
}
// IsIPv4 returns true when the hostname is a plain IPv4 address
func (instanceKey *InstanceKey) IsIPv4() bool {
return ipv4Regexp.MatchString(instanceKey.Hostname)
}
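A short hypothetical sketch of the detach/reattach round trip (hostname and port are illustrative):

// detachRoundTrip is a hypothetical helper illustrating DetachedKey and ReattachedKey.
func detachRoundTrip() {
	key := InstanceKey{Hostname: "replica1.example.com", Port: 3306}
	detached := key.DetachedKey() // "//replica1.example.com:3306": invalid, but recoverable
	fmt.Println(detached.IsValid())                    // false
	fmt.Println(detached.ReattachedKey().Equals(&key)) // true
}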

@@ -0,0 +1,67 @@
/*
Copyright 2014 Outbrain Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
This file has been copied over from VTOrc package
*/
package inst
import (
"testing"
"github.com/stretchr/testify/require"
"vitess.io/vitess/go/vt/vtgr/config"
)
func init() {
config.Config.HostnameResolveMethod = "none"
}
var key1 = InstanceKey{Hostname: "host1", Port: 3306}
func TestInstanceKeyEquals(t *testing.T) {
i1 := InstanceKey{
Hostname: "sql00.db",
Port: 3306,
}
i2 := InstanceKey{
Hostname: "sql00.db",
Port: 3306,
}
require.Equal(t, i1, i2)
i2.Port = 3307
require.NotEqual(t, i1, i2)
}
func TestInstanceKeyDetach(t *testing.T) {
require.False(t, key1.IsDetached())
detached1 := key1.DetachedKey()
require.True(t, detached1.IsDetached())
detached2 := key1.DetachedKey()
require.True(t, detached2.IsDetached())
require.True(t, detached1.Equals(detached2))
reattached1 := detached1.ReattachedKey()
require.False(t, reattached1.IsDetached())
require.True(t, reattached1.Equals(&key1))
reattached2 := reattached1.ReattachedKey()
require.False(t, reattached2.IsDetached())
require.True(t, reattached1.Equals(reattached2))
}

go/vt/vtgr/ssl/ssl.go Normal file
@@ -0,0 +1,208 @@
package ssl
import (
"crypto/tls"
"crypto/x509"
"encoding/pem"
"errors"
"fmt"
nethttp "net/http"
"os"
"strings"
"vitess.io/vitess/go/vt/log"
"github.com/go-martini/martini"
"github.com/howeyc/gopass"
"vitess.io/vitess/go/vt/vtgr/config"
)
/*
This file has been copied over from VTOrc package
*/
// HasString determines whether a string element is in a string array
func HasString(elem string, arr []string) bool {
for _, s := range arr {
if s == elem {
return true
}
}
return false
}
// NewTLSConfig returns an initialized TLS configuration suitable for client
// authentication. If caFile is non-empty, it will be loaded.
func NewTLSConfig(caFile string, verifyCert bool) (*tls.Config, error) {
var c tls.Config
// Set to TLS 1.2 as a minimum. This is overridden for mysql communication
c.MinVersion = tls.VersionTLS12
if verifyCert {
log.Info("verifyCert requested, client certificates will be verified")
c.ClientAuth = tls.VerifyClientCertIfGiven
}
caPool, err := ReadCAFile(caFile)
if err != nil {
return &c, err
}
c.ClientCAs = caPool
return &c, nil
}
// ReadCAFile returns a CA certificate pool. If caFile is non-empty, it will be loaded.
func ReadCAFile(caFile string) (*x509.CertPool, error) {
var caCertPool *x509.CertPool
if caFile != "" {
data, err := os.ReadFile(caFile)
if err != nil {
return nil, err
}
caCertPool = x509.NewCertPool()
if !caCertPool.AppendCertsFromPEM(data) {
return nil, errors.New("No certificates parsed")
}
log.Infof("Read in CA file: %v", caFile)
}
return caCertPool, nil
}
// Verify checks that the OU of the presented client certificate matches the list
// of valid OUs
func Verify(r *nethttp.Request, validOUs []string) error {
if strings.Contains(r.URL.String(), config.Config.StatusEndpoint) && !config.Config.StatusOUVerify {
return nil
}
if r.TLS == nil {
return errors.New("No TLS")
}
for _, chain := range r.TLS.VerifiedChains {
s := chain[0].Subject.OrganizationalUnit
log.Infof("All OUs:", strings.Join(s, " "))
for _, ou := range s {
log.Infof("Client presented OU:", ou)
if HasString(ou, validOUs) {
log.Infof("Found valid OU:", ou)
return nil
}
}
}
log.Error("No valid OUs found")
return errors.New("Invalid OU")
}
// TODO: make this testable?
func VerifyOUs(validOUs []string) martini.Handler {
return func(res nethttp.ResponseWriter, req *nethttp.Request, c martini.Context) {
log.Infof("Verifying client OU")
if err := Verify(req, validOUs); err != nil {
nethttp.Error(res, err.Error(), nethttp.StatusUnauthorized)
}
}
}
// AppendKeyPair loads the given TLS key pair and appends it to
// tlsConfig.Certificates.
func AppendKeyPair(tlsConfig *tls.Config, certFile string, keyFile string) error {
cert, err := tls.LoadX509KeyPair(certFile, keyFile)
if err != nil {
return err
}
tlsConfig.Certificates = append(tlsConfig.Certificates, cert)
return nil
}
// AppendKeyPairWithPassword reads in a keypair where the key is password protected
func AppendKeyPairWithPassword(tlsConfig *tls.Config, certFile string, keyFile string, pemPass []byte) error {
// Certificates aren't usually password protected, but we're kicking the password
// along just in case. It won't be used if the file isn't encrypted
certData, err := ReadPEMData(certFile, pemPass)
if err != nil {
return err
}
keyData, err := ReadPEMData(keyFile, pemPass)
if err != nil {
return err
}
cert, err := tls.X509KeyPair(certData, keyData)
if err != nil {
return err
}
tlsConfig.Certificates = append(tlsConfig.Certificates, cert)
return nil
}
// ReadPEMData reads a PEM file and decrypts it with the supplied password if needed
func ReadPEMData(pemFile string, pemPass []byte) ([]byte, error) {
pemData, err := os.ReadFile(pemFile)
if err != nil {
return pemData, err
}
	// We expect exactly one pem.Block here; if there's other
	// junk on the end, warn about it.
pemBlock, rest := pem.Decode(pemData)
if len(rest) > 0 {
log.Warning("Didn't parse all of", pemFile)
}
if x509.IsEncryptedPEMBlock(pemBlock) { //nolint SA1019
// Decrypt and get the ASN.1 DER bytes here
pemData, err = x509.DecryptPEMBlock(pemBlock, pemPass) //nolint SA1019
if err != nil {
return pemData, err
}
log.Infof("Decrypted %v successfully", pemFile)
// Shove the decrypted DER bytes into a new pem Block with blank headers
var newBlock pem.Block
newBlock.Type = pemBlock.Type
newBlock.Bytes = pemData
		// This is now like reading in an unencrypted key from a file and stuffing it
		// into a byte stream
pemData = pem.EncodeToMemory(&newBlock)
}
return pemData, nil
}
// GetPEMPassword prints a password prompt on the terminal and collects a password
func GetPEMPassword(pemFile string) []byte {
fmt.Printf("Password for %s: ", pemFile)
pass, err := gopass.GetPasswd()
if err != nil {
// We'll error with an incorrect password at DecryptPEMBlock
return []byte("")
}
return pass
}
// IsEncryptedPEM determines whether a PEM file is encrypted
func IsEncryptedPEM(pemFile string) bool {
pemData, err := os.ReadFile(pemFile)
if err != nil {
return false
}
	pemBlock, _ := pem.Decode(pemData)
	if pemBlock == nil || len(pemBlock.Bytes) == 0 {
return false
}
return x509.IsEncryptedPEMBlock(pemBlock) //nolint SA1019
}
// ListenAndServeTLS acts identically to http.ListenAndServeTLS, except that it
// expects an already-built TLS configuration.
// TODO: refactor so this is testable?
func ListenAndServeTLS(addr string, handler nethttp.Handler, tlsConfig *tls.Config) error {
if addr == "" {
// On unix Listen calls getaddrinfo to parse the port, so named ports are fine as long
// as they exist in /etc/services
addr = ":https"
}
l, err := tls.Listen("tcp", addr, tlsConfig)
if err != nil {
return err
}
return nethttp.Serve(l, handler)
}
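Wiring the helpers above into an HTTPS listener might look like the hypothetical sketch below (paths and address are assumptions):

// serveTLS is a hypothetical sketch combining NewTLSConfig, AppendKeyPair and ListenAndServeTLS.
func serveTLS(handler nethttp.Handler) error {
	tlsConfig, err := NewTLSConfig("/etc/ssl/ca.pem", true) // illustrative paths
	if err != nil {
		return err
	}
	if err := AppendKeyPair(tlsConfig, "/etc/ssl/server.pem", "/etc/ssl/server-key.pem"); err != nil {
		return err
	}
	return ListenAndServeTLS(":3000", handler, tlsConfig)
}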

go/vt/vtgr/ssl/ssl_test.go Normal file
@@ -0,0 +1,279 @@
package ssl_test
import (
"crypto/tls"
"crypto/x509"
"encoding/pem"
"fmt"
nethttp "net/http"
"os"
"reflect"
"strings"
"syscall"
"testing"
"vitess.io/vitess/go/vt/vtgr/config"
"vitess.io/vitess/go/vt/vtgr/ssl"
)
/*
This file has been copied over from VTOrc package
*/
func TestHasString(t *testing.T) {
elem := "foo"
a1 := []string{"bar", "foo", "baz"}
a2 := []string{"bar", "fuu", "baz"}
good := ssl.HasString(elem, a1)
if !good {
t.Errorf("Didn't find %s in array %s", elem, strings.Join(a1, ", "))
}
bad := ssl.HasString(elem, a2)
if bad {
t.Errorf("Unexpectedly found %s in array %s", elem, strings.Join(a2, ", "))
}
}
// TODO: Build a fake CA and make sure it loads up
func TestNewTLSConfig(t *testing.T) {
fakeCA := writeFakeFile(pemCertificate)
defer syscall.Unlink(fakeCA)
conf, err := ssl.NewTLSConfig(fakeCA, true)
if err != nil {
t.Errorf("Could not create new TLS config: %s", err)
}
if conf.ClientAuth != tls.VerifyClientCertIfGiven {
t.Errorf("Client certificate verification was not enabled")
}
if conf.ClientCAs == nil {
t.Errorf("ClientCA empty even though cert provided")
}
conf, err = ssl.NewTLSConfig("", false)
if err != nil {
t.Errorf("Could not create new TLS config: %s", err)
}
if conf.ClientAuth == tls.VerifyClientCertIfGiven {
t.Errorf("Client certificate verification was enabled unexpectedly")
}
if conf.ClientCAs != nil {
t.Errorf("Filling in ClientCA somehow without a cert")
}
}
func TestStatus(t *testing.T) {
var validOUs []string
url := fmt.Sprintf("http://example.com%s", config.Config.StatusEndpoint)
req, err := nethttp.NewRequest("GET", url, nil)
if err != nil {
t.Fatal(err)
}
config.Config.StatusOUVerify = false
if err := ssl.Verify(req, validOUs); err != nil {
t.Errorf("Failed even with verification off")
}
config.Config.StatusOUVerify = true
if err := ssl.Verify(req, validOUs); err == nil {
t.Errorf("Did not fail on with bad verification")
}
}
func TestVerify(t *testing.T) {
var validOUs []string
req, err := nethttp.NewRequest("GET", "http://example.com/foo", nil)
if err != nil {
t.Fatal(err)
}
if err := ssl.Verify(req, validOUs); err == nil {
t.Errorf("Did not fail on lack of TLS config")
}
pemBlock, _ := pem.Decode([]byte(pemCertificate))
cert, err := x509.ParseCertificate(pemBlock.Bytes)
if err != nil {
t.Fatal(err)
}
var tcs tls.ConnectionState
req.TLS = &tcs
if err := ssl.Verify(req, validOUs); err == nil {
t.Errorf("Found a valid OU without any being available")
}
// Set a fake OU
cert.Subject.OrganizationalUnit = []string{"testing"}
// Pretend our request had a certificate
req.TLS.PeerCertificates = []*x509.Certificate{cert}
req.TLS.VerifiedChains = [][]*x509.Certificate{req.TLS.PeerCertificates}
// Look for fake OU
validOUs = []string{"testing"}
if err := ssl.Verify(req, validOUs); err != nil {
t.Errorf("Failed to verify certificate OU")
}
}
func TestReadPEMData(t *testing.T) {
pemCertFile := writeFakeFile(pemCertificate)
defer syscall.Unlink(pemCertFile)
pemPKFile := writeFakeFile(pemPrivateKey)
defer syscall.Unlink(pemPKFile)
pemPKWPFile := writeFakeFile(pemPrivateKeyWithPass)
defer syscall.Unlink(pemPKWPFile)
_, err := ssl.ReadPEMData(pemCertFile, []byte{})
if err != nil {
t.Errorf("Failed to decode certificate: %s", err)
}
pemNoPassBytes, err := ssl.ReadPEMData(pemPKFile, []byte{})
if err != nil {
t.Errorf("Failed to decode private key: %s", err)
}
pemPassBytes, err := ssl.ReadPEMData(pemPKWPFile, []byte("testing"))
if err != nil {
t.Errorf("Failed to decode private key with password: %s", err)
}
if reflect.DeepEqual(pemPassBytes, pemNoPassBytes) {
t.Errorf("PEM encoding failed after password removal")
}
}
func TestAppendKeyPair(t *testing.T) {
c, err := ssl.NewTLSConfig("", false)
if err != nil {
t.Fatal(err)
}
pemCertFile := writeFakeFile(pemCertificate)
defer syscall.Unlink(pemCertFile)
pemPKFile := writeFakeFile(pemPrivateKey)
defer syscall.Unlink(pemPKFile)
if err := ssl.AppendKeyPair(c, pemCertFile, pemPKFile); err != nil {
t.Errorf("Failed to append certificate and key to tls config: %s", err)
}
}
func TestAppendKeyPairWithPassword(t *testing.T) {
c, err := ssl.NewTLSConfig("", false)
if err != nil {
t.Fatal(err)
}
pemCertFile := writeFakeFile(pemCertificate)
defer syscall.Unlink(pemCertFile)
pemPKFile := writeFakeFile(pemPrivateKeyWithPass)
defer syscall.Unlink(pemPKFile)
if err := ssl.AppendKeyPairWithPassword(c, pemCertFile, pemPKFile, []byte("testing")); err != nil {
t.Errorf("Failed to append certificate and key to tls config: %s", err)
}
}
func TestIsEncryptedPEM(t *testing.T) {
pemPKFile := writeFakeFile(pemPrivateKey)
defer syscall.Unlink(pemPKFile)
pemPKWPFile := writeFakeFile(pemPrivateKeyWithPass)
defer syscall.Unlink(pemPKWPFile)
if ssl.IsEncryptedPEM(pemPKFile) {
t.Errorf("Incorrectly identified unencrypted PEM as encrypted")
}
if !ssl.IsEncryptedPEM(pemPKWPFile) {
t.Errorf("Incorrectly identified encrypted PEM as unencrypted")
}
}
func writeFakeFile(content string) string {
f, err := os.CreateTemp("", "ssl_test")
if err != nil {
return ""
}
os.WriteFile(f.Name(), []byte(content), 0644)
return f.Name()
}
const pemCertificate = `-----BEGIN CERTIFICATE-----
MIIDtTCCAp2gAwIBAgIJAOxKC7FsJelrMA0GCSqGSIb3DQEBBQUAMEUxCzAJBgNV
BAYTAkFVMRMwEQYDVQQIEwpTb21lLVN0YXRlMSEwHwYDVQQKExhJbnRlcm5ldCBX
aWRnaXRzIFB0eSBMdGQwHhcNMTcwODEwMTQ0MjM3WhcNMTgwODEwMTQ0MjM3WjBF
MQswCQYDVQQGEwJBVTETMBEGA1UECBMKU29tZS1TdGF0ZTEhMB8GA1UEChMYSW50
ZXJuZXQgV2lkZ2l0cyBQdHkgTHRkMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIB
CgKCAQEA12vHV3gYy5zd1lujA7prEhCSkAszE6E37mViWhLQ63CuedZfyYaTAHQK
HYDZi4K1MNAySUfZRMcICSSsxlRIz6mzXrFsowaJgwx4cbMDIvXE03KstuXoTYJh
+xmXB+5yEVEtIyP2DvPqfCmwCZb3k94Y/VY1nAQDxIxciXrAxT9zT1oYd0YWr2yp
J2mgsfnY4c3zg7W5WgvOTmYz7Ey7GJjpUjGdayx+P1CilKzSWH1xZuVQFNLSHvcH
WXkEoCMVc0tW5mO5eEO1aNHo9MSjPF386l1rq+pz5OwjqCEZq2b1YxesyLnbF+8+
iYGfYmFaDLFwG7zVDwialuI4TzIIOQIDAQABo4GnMIGkMB0GA1UdDgQWBBQ1ubGx
Yvn3wN5VXyoR0lOD7ARzVTB1BgNVHSMEbjBsgBQ1ubGxYvn3wN5VXyoR0lOD7ARz
VaFJpEcwRTELMAkGA1UEBhMCQVUxEzARBgNVBAgTClNvbWUtU3RhdGUxITAfBgNV
BAoTGEludGVybmV0IFdpZGdpdHMgUHR5IEx0ZIIJAOxKC7FsJelrMAwGA1UdEwQF
MAMBAf8wDQYJKoZIhvcNAQEFBQADggEBALmm4Zw/4jLKDJciUGUYOcr5Xe9TP/Cs
afH7IWvaFUDfV3W6yAm9jgNfIy9aDLpuu2CdEb+0qL2hdmGLV7IM3y62Ve0UTdGV
BGsm1zMmIguew2wGbAwGr5LmIcUseatVUKAAAfDrBNwotEAdM8kmGekUZfOM+J9D
FoNQ62C0buRHGugtu6zWAcZNOe6CI7HdhaAdxZlgn8y7dfJQMacoK0NcWeUVQwii
6D4mgaqUGM2O+WcquD1vEMuBPYVcKhi43019E0+6LI5QB6w80bARY8K7tkTdRD7U
y1/C7iIqyuBVL45OdSabb37TfGlHZIPIwLaGw3i4Mr0+F0jQT8rZtTQ=
-----END CERTIFICATE-----`
const pemPrivateKey = `-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEA12vHV3gYy5zd1lujA7prEhCSkAszE6E37mViWhLQ63CuedZf
yYaTAHQKHYDZi4K1MNAySUfZRMcICSSsxlRIz6mzXrFsowaJgwx4cbMDIvXE03Ks
tuXoTYJh+xmXB+5yEVEtIyP2DvPqfCmwCZb3k94Y/VY1nAQDxIxciXrAxT9zT1oY
d0YWr2ypJ2mgsfnY4c3zg7W5WgvOTmYz7Ey7GJjpUjGdayx+P1CilKzSWH1xZuVQ
FNLSHvcHWXkEoCMVc0tW5mO5eEO1aNHo9MSjPF386l1rq+pz5OwjqCEZq2b1Yxes
yLnbF+8+iYGfYmFaDLFwG7zVDwialuI4TzIIOQIDAQABAoIBAHLf4pleTbqmmBWr
IC7oxhgIBmAR2Nbq7eyO2/e0ePxURnZqPwI0ZUekmZBKGbgvp3e0TlyNl+r5R+u4
RvosD/fNQv2IF6qH3eSoTcIz98Q40xD+4eNWjp5mnOFOMB/mo6VgaHWIw7oNkElN
4bX7b2LG2QSfaE8eRPQW9XHKp+mGhYFbxgPYxUmlIXuYZF61hVwxysDA6DP3LOi8
yUL6E64x6NqN9xtg/VoN+f6N0MOvsr4yb5+uvni1LVRFI7tNqIN4Y6P6trgKfnRR
EpZeAUu8scqyxE4NeqnnjK/wBuXxaeh3e9mN1V2SzT629c1InmmQasZ5slcCJQB+
38cswgECgYEA+esaLKwHXT4+sOqMYemi7TrhxtNC2f5OAGUiSRVmTnum2gl4wOB+
h5oLZAuG5nBEIoqbMEbI35vfuHqIe390IJtPdQlz4TGDsPufYj/gnnBBFy/c8f+n
f/CdRDRYrpnpKGwvUntLRB2pFbe2hlqqq+4YUqiHauJMOCJnPbOo1lECgYEA3KnF
VOXyY0fKD45G7ttfAcpw8ZI2gY99sCRwtBQGsbO61bvw5sl/3j7AmYosz+n6f7hb
uHmitIuPv4z3r1yfVysh80tTGIM3wDkpr3fLYRxpVOZU4hgxMQV9yyaSA/Hfqn48
vIK/NC4bERqpofNNdrIqNaGWkd87ZycvpRfa0WkCgYBztbVVr4RtWG9gLAg5IRot
KhD0pEWUdpiYuDpqifznI3r6Al6lNot+rwTNGkUoFhyFvZTigjNozFuFpz3fqAAV
RLNCJdFAF1O4spd1vst5r9GDMcbjSJG9u6KkvHO+y0XXUFeMoccUT4NEqd1ZUUsp
9T/PrXWdOA9AAjW4rKDkMQKBgQC9R4NVR8mbD8Frhoeh69qbFqO7E8hdalBN/3QN
hAAZ/imNnSEPVliwsvNSwQufbPzLAcDrhKrkY7JyhOERM0oa44zDvSESLbxszpvL
P97c9hoEEW9OYaIQgr1cvUES0S8ieBZxPVX11HazPUO0/5a68ijyyCD4D5xM53gf
DU9NwQKBgQCmVthQi65xcc4mgCIwXtBZWXeaPv5x0dLEXIC5EoN6eXLK9iW//7cE
hhawtJtl+J6laB+TkEGQsyhc4v85WcywdisyR7LR7CUqFYJMKeE/VtTVKnYbfq54
rHoQS9YotByBwPtRx0V93gkc+KWBOGmSBBxKj7lrBkYkcWAiRfpJjg==
-----END RSA PRIVATE KEY-----`
const pemPrivateKeyWithPass = `-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3-CBC,3EABF60A784F9065
IDGYvdRJXvBt5vEDI9caEYJ2vvVmoqmxTKvheNX0aLSXUl/p8hIZ25kd/4mpmI3m
irQdEe2JuNh4/fPDe6Agg6mX6mYCVbiupfXdFKkqJzndW/O5nEQ4yuRgi0fO4wcH
OM/kTS8/7UaKfCuWFa71ywh1WeStFDBwsMQqLdFFeuQ/JC6g2tZW6xzCBE0BVIkq
6OWXmWumXMufhOdpb9sNoc3lbdOi037V886o0cIRQp4qPepElhhhplrhaJZBSxiP
TUldExbtYCN1APhrgUp1RpxIWHNLezjhUYLGooxb6SqinpLd9ia2uFotwNDeX7/T
dMPQPtgdFwvoCtWn9oVWp+regdZPacABLsvtTD4NS8h13BKzBmAqtYfHJk44u/Tv
6PcCb9xHI7+YpNJznrHiCtALWkfG56mDjp0SP+OKjsYMjo317D+x892i2XT79k2T
0IM0OUPizVkN5c7uDQBHqxmE9JVQT7QFMy1P57nWPsmG5o7e9Y/klaPQzi04FWEh
YAEZrU5/FQlFziu3/Jw6WwQnm3IqJP6iMlnR9Y5iZCZQnLhcJNIxxOJ/+cVH4dVD
jIHztasHgbfld045Ua7nk91VyFP5pWRPFacJ74D+xm/1IjF/+9Uj3NQX88Swig0Q
Fi7+eJ1XtCI0YdUqiUdp8QaS1GnFzibSIcXCbLLEn0Cgh/3CFXUyh92M4GIgvmcI
/hi4nUDa3nLYDHyOZubFLERb+Zr3EFzNXX4Ga3fcNH0deluxW4tda+QCk0ud6k9N
y2bCcAVnvbB+yX2s7CSVq+eaT/4JLIJY5AlrISRwYtG57SR/DN9HuU99dD30k581
PmarIt4VAakjXo/Zqd1AMh+ofbC/Qm7jBwbPGPZAM/FjpnVsvaXsdChI19Az72v3
wiLOKEw8M23vV4/E7QwW3Pp/RPyUZk6HAlBuLXbcyZHOOV4WPsKrI46BBXL8Qf4X
5kpRITFFUaFu3aaO7mloVAoneEKusKJgKOAwWifRI3jf6fH9B8qDA0jQpWRNpLs4
3A2qrOyHQ9SMoBr7ya8Vs2BMdfqAmOyiUdVzLr2EjnRxa7f3/7/sdzD1aaIJa2TM
kjpKgFMq5B/FRVmuAvKyEF52A/b6L9EpinyB53DzWnIw9W5zdjjRkuxmGmv1R94A
gJvbONh955cinHft0rm0hdKo77wDvXZdX5ZeITjOwJ0d/VBHYDGUonDVgnAVLcz+
n1BS+oOS1xLG/EJOGqtNYihVuCkbIwwdAVhc7pKo3nIbLyrKFKFyh/Br11PPBris
nlWo8BWSoFv7gKOftkulHJFAVekisaXe4OIcYMATeLvDfAnBDJrNHZn0HcyHI51L
3EhCCPJrrmfNv+QMdPk6LTts5YIdhNRSV5PR2X8ZshChod7atyrw+Wm+LCcy3h1G
xIVNracpnna+Ic5M8EIJZgLOH7IjDFS1EcPjz5em0rVqGGsLDvxmRo2ZJTPSHlpM
8q6VJEIso5sfoauf+fX+y7xk1CpFG8NkXSplbiYmZXdB1zepV1a/ZiW2uU7hEAV7
oMEzoBEIw3wTuRasixjH7Z6i8PvF3eUKXCIt0UiwTmWdCCW37c5eqjguyp9aLDtc
-----END RSA PRIVATE KEY-----`

@@ -1,173 +0,0 @@
/*
Copyright 2014 Outbrain Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package app
import (
"flag"
"io/ioutil"
golog "log"
"net"
nethttp "net/http"
"path"
"strings"
"time"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/vtorc/collection"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/http"
"vitess.io/vitess/go/vt/vtorc/inst"
"vitess.io/vitess/go/vt/vtorc/logic"
"vitess.io/vitess/go/vt/vtorc/process"
"vitess.io/vitess/go/vt/vtorc/ssl"
"github.com/go-martini/martini"
"github.com/martini-contrib/auth"
"github.com/martini-contrib/gzip"
"github.com/martini-contrib/render"
)
const discoveryMetricsName = "DISCOVERY_METRICS"
// TODO(sougou): see if this can be embedded.
var webDir = flag.String("orc_web_dir", "web/vtorc", "VTOrc http file location")
var sslPEMPassword []byte
var agentSSLPEMPassword []byte
var discoveryMetrics *collection.Collection
// HTTP starts serving
func HTTP(continuousDiscovery bool) {
promptForSSLPasswords()
process.ContinuousRegistration(string(process.VTOrcExecutionHTTPMode), "")
martini.Env = martini.Prod
standardHTTP(continuousDiscovery)
}
// Iterate over the private keys and get passwords for them
// Don't prompt for a password a second time if the files are the same
func promptForSSLPasswords() {
if ssl.IsEncryptedPEM(config.Config.SSLPrivateKeyFile) {
sslPEMPassword = ssl.GetPEMPassword(config.Config.SSLPrivateKeyFile)
}
if ssl.IsEncryptedPEM(config.Config.AgentSSLPrivateKeyFile) {
if config.Config.AgentSSLPrivateKeyFile == config.Config.SSLPrivateKeyFile {
agentSSLPEMPassword = sslPEMPassword
} else {
agentSSLPEMPassword = ssl.GetPEMPassword(config.Config.AgentSSLPrivateKeyFile)
}
}
}
// standardHTTP starts serving HTTP or HTTPS (api/web) requests, to be used by normal clients
func standardHTTP(continuousDiscovery bool) {
m := martini.Classic()
// Make martini silent by setting its logger to a discard endpoint
m.Logger(golog.New(ioutil.Discard, "", 0))
switch strings.ToLower(config.Config.AuthenticationMethod) {
case "basic":
{
if config.Config.HTTPAuthUser == "" {
// Still allowed; may be disallowed in future versions
log.Warning("AuthenticationMethod is configured as 'basic' but HTTPAuthUser undefined. Running without authentication.")
}
m.Use(auth.Basic(config.Config.HTTPAuthUser, config.Config.HTTPAuthPassword))
}
case "multi":
{
if config.Config.HTTPAuthUser == "" {
				// Not allowed: 'multi' authentication requires a configured user
log.Fatal("AuthenticationMethod is configured as 'multi' but HTTPAuthUser undefined")
}
m.Use(auth.BasicFunc(func(username, password string) bool {
if username == "readonly" {
// Will be treated as "read-only"
return true
}
return auth.SecureCompare(username, config.Config.HTTPAuthUser) && auth.SecureCompare(password, config.Config.HTTPAuthPassword)
}))
}
default:
{
// We inject a dummy User object because we have function signatures with User argument in api.go
m.Map(auth.User(""))
}
}
m.Use(gzip.All())
// Render html templates from templates directory
m.Use(render.Renderer(render.Options{
Directory: *webDir,
Layout: "templates/layout",
HTMLContentType: "text/html",
}))
m.Use(martini.Static(path.Join(*webDir, "public"), martini.StaticOptions{Prefix: config.Config.URLPrefix}))
if config.Config.UseMutualTLS {
m.Use(ssl.VerifyOUs(config.Config.SSLValidOUs))
}
inst.SetMaintenanceOwner(process.ThisHostname)
if continuousDiscovery {
// start to expire metric collection info
discoveryMetrics = collection.CreateOrReturnCollection(discoveryMetricsName)
discoveryMetrics.SetExpirePeriod(time.Duration(config.Config.DiscoveryCollectionRetentionSeconds) * time.Second)
log.Info("Starting Discovery")
go logic.ContinuousDiscovery()
}
log.Info("Registering endpoints")
http.HTTPWeb.URLPrefix = config.Config.URLPrefix
http.HTTPWeb.RegisterRequests(m)
// Serve
if config.Config.ListenSocket != "" {
log.Infof("Starting HTTP listener on unix socket %v", config.Config.ListenSocket)
unixListener, err := net.Listen("unix", config.Config.ListenSocket)
if err != nil {
log.Fatal(err)
}
defer unixListener.Close()
if err := nethttp.Serve(unixListener, m); err != nil {
log.Fatal(err)
}
} else if config.Config.UseSSL {
log.Info("Starting HTTPS listener")
tlsConfig, err := ssl.NewTLSConfig(config.Config.SSLCAFile, config.Config.UseMutualTLS)
if err != nil {
log.Fatal(err)
}
tlsConfig.InsecureSkipVerify = config.Config.SSLSkipVerify
if err = ssl.AppendKeyPairWithPassword(tlsConfig, config.Config.SSLCertFile, config.Config.SSLPrivateKeyFile, sslPEMPassword); err != nil {
log.Fatal(err)
}
if err = ssl.ListenAndServeTLS(config.Config.ListenAddress, m, tlsConfig); err != nil {
log.Fatal(err)
}
} else {
log.Infof("Starting HTTP listener on %+v", config.Config.ListenAddress)
if err := nethttp.ListenAndServe(config.Config.ListenAddress, m); err != nil {
log.Fatal(err)
}
}
log.Info("Web server closed")
}

@@ -1,74 +0,0 @@
package app
import (
"bytes"
"io"
"log"
"net/http"
"os"
"sync"
"testing"
"time"
"github.com/stretchr/testify/require"
"vitess.io/vitess/go/vt/vtorc/config"
)
func captureOutput(f func()) string {
reader, writer, err := os.Pipe()
if err != nil {
panic(err)
}
stdout := os.Stdout
stderr := os.Stderr
defer func() {
os.Stdout = stdout
os.Stderr = stderr
log.SetOutput(os.Stderr)
}()
os.Stdout = writer
os.Stderr = writer
log.SetOutput(writer)
out := make(chan string)
wg := new(sync.WaitGroup)
wg.Add(1)
go func() {
var buf bytes.Buffer
wg.Done()
_, _ = io.Copy(&buf, reader)
out <- buf.String()
}()
wg.Wait()
f()
_ = writer.Close()
return <-out
}
func TestStandardHTTPLogging(t *testing.T) {
defaultListenAddress := config.Config.ListenAddress
listenAddress := ":17000"
config.Config.ListenAddress = listenAddress
defer func() {
config.Config.ListenAddress = defaultListenAddress
}()
logOutput := captureOutput(func() {
go standardHTTP(false)
time.Sleep(10 * time.Second)
		// Make an API request to check if something logged
makeAPICall(t, "http://localhost:17000/api/health")
})
require.NotContains(t, logOutput, "martini")
}
// makeAPICall is used to make an API call to the given url. It returns the status and the body of the response received
func makeAPICall(t *testing.T, url string) (status int, response string) {
t.Helper()
res, err := http.Get(url)
require.NoError(t, err)
bodyBytes, err := io.ReadAll(res.Body)
require.NoError(t, err)
body := string(bodyBytes)
return res.StatusCode, body
}

@@ -118,7 +118,7 @@ func CreateOrReturnCollection(name string) *Collection {
collection: nil,
done: make(chan struct{}),
// WARNING: use a different configuration name
-	expirePeriod: time.Duration(config.Config.DiscoveryCollectionRetentionSeconds) * time.Second,
+	expirePeriod: time.Duration(config.DiscoveryCollectionRetentionSeconds) * time.Second,
}
go qmc.StartAutoExpiration()

@@ -1,36 +0,0 @@
/*
Copyright 2015 Shlomi Noach, courtesy Booking.com
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package config
// CLIFlags stores some command line flags that are globally available in the process' lifetime
type CLIFlags struct {
Noop *bool
SkipUnresolve *bool
SkipUnresolveCheck *bool
BinlogFile *string
GrabElection *bool
Version *bool
Statement *string
PromotionRule *string
ConfiguredVersion string
SkipContinuousRegistration *bool
EnableDatabaseUpdate *bool
IgnoreRaftSetup *bool
Tag *string
}
var RuntimeCLIFlags CLIFlags

@@ -19,23 +19,16 @@ package config
import (
"encoding/json"
"fmt"
"net/url"
"os"
"regexp"
"strings"
"time"
"github.com/spf13/pflag"
"vitess.io/vitess/go/vt/log"
"gopkg.in/gcfg.v1"
)
var (
envVariableRegexp = regexp.MustCompile("[$][{](.*)[}]")
)
const (
LostInRecoveryDowntimeSeconds int = 60 * 60 * 24 * 365
DefaultStatusAPIEndpoint = "/api/status"
)
var configurationLoaded = make(chan bool)
@@ -46,184 +39,74 @@ const (
MaintenanceOwner = "vtorc"
AuditPageSize = 20
MaintenancePurgeDays = 7
MySQLTopologyMaxPoolConnections = 3
MaintenanceExpireMinutes = 10
DebugMetricsIntervalSeconds = 10
StaleInstanceCoordinatesExpireSeconds = 60
DiscoveryMaxConcurrency = 300 // Number of goroutines doing hosts discovery
DiscoveryQueueCapacity = 100000
DiscoveryQueueMaxStatisticsSize = 120
DiscoveryCollectionRetentionSeconds = 120
HostnameResolveMethod = "default"
UnseenInstanceForgetHours = 240 // Number of hours after which an unseen instance is forgotten
ExpiryHostnameResolvesMinutes = 60 // Number of minutes after which to expire hostname-resolves
CandidateInstanceExpireMinutes = 60 // Minutes after which a suggestion to use an instance as a candidate replica (to be preferably promoted on primary failover) is expired.
	FailureDetectionPeriodBlockMinutes = 60 // The time for which an instance's failure discovery is kept "active", so as to avoid concurrent "discoveries" of the instance's failure; this precedes any recovery process, if any.
)
var (
sqliteDataFile = "file::memory:?mode=memory&cache=shared"
instancePollTime = 5 * time.Second
snapshotTopologyInterval = 0 * time.Hour
reasonableReplicationLag = 10 * time.Second
auditFileLocation = ""
auditToBackend = false
auditToSyslog = false
auditPurgeDuration = 7 * 24 * time.Hour // Equivalent of 7 days
recoveryPeriodBlockDuration = 30 * time.Second
preventCrossCellFailover = false
lockShardTimeout = 30 * time.Second
waitReplicasTimeout = 30 * time.Second
topoInformationRefreshDuration = 15 * time.Second
recoveryPollDuration = 1 * time.Second
)
// RegisterFlags registers the flags required by VTOrc
func RegisterFlags(fs *pflag.FlagSet) {
fs.StringVar(&sqliteDataFile, "sqlite-data-file", sqliteDataFile, "SQLite Datafile to use as VTOrc's database")
fs.DurationVar(&instancePollTime, "instance-poll-time", instancePollTime, "Timer duration on which VTOrc refreshes MySQL information")
fs.DurationVar(&snapshotTopologyInterval, "snapshot-topology-interval", snapshotTopologyInterval, "Timer duration on which VTOrc takes a snapshot of the current MySQL information it has in the database. Should be in multiple of hours")
fs.DurationVar(&reasonableReplicationLag, "reasonable-replication-lag", reasonableReplicationLag, "Maximum replication lag on replicas which is deemed to be acceptable")
fs.StringVar(&auditFileLocation, "audit-file-location", auditFileLocation, "File location where the audit logs are to be stored")
fs.BoolVar(&auditToBackend, "audit-to-backend", auditToBackend, "Whether to store the audit log in the VTOrc database")
fs.BoolVar(&auditToSyslog, "audit-to-syslog", auditToSyslog, "Whether to store the audit log in the syslog")
fs.DurationVar(&auditPurgeDuration, "audit-purge-duration", auditPurgeDuration, "Duration for which audit logs are held before being purged. Should be in multiples of days")
fs.DurationVar(&recoveryPeriodBlockDuration, "recovery-period-block-duration", recoveryPeriodBlockDuration, "Duration for which a new recovery is blocked on an instance after running a recovery")
fs.BoolVar(&preventCrossCellFailover, "prevent-cross-cell-failover", preventCrossCellFailover, "Prevent VTOrc from promoting a primary in a different cell than the current primary in case of a failover")
fs.DurationVar(&lockShardTimeout, "lock-shard-timeout", lockShardTimeout, "Duration for which a shard lock is held when running a recovery")
fs.DurationVar(&waitReplicasTimeout, "wait-replicas-timeout", waitReplicasTimeout, "Duration for which to wait for replica's to respond when issuing RPCs")
fs.DurationVar(&topoInformationRefreshDuration, "topo-information-refresh-duration", topoInformationRefreshDuration, "Timer duration on which VTOrc refreshes the keyspace and vttablet records from the topology server")
fs.DurationVar(&recoveryPollDuration, "recovery-poll-duration", recoveryPollDuration, "Timer duration on which VTOrc polls its database to run a recovery")
}
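A consuming binary would register these flags on its own FlagSet before parsing, roughly as in the hypothetical sketch below:

// initFlags is a hypothetical sketch of wiring RegisterFlags into a pflag FlagSet.
func initFlags(args []string) error {
	fs := pflag.NewFlagSet("vtorc", pflag.ContinueOnError)
	RegisterFlags(fs)
	return fs.Parse(args) // e.g. args = []string{"--instance-poll-time", "10s"}
}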
// Configuration makes up vtorc's configuration input, which can be provided by the user via a JSON formatted file.
// Some of the parameters have reasonable default values, and some (like database credentials) are
// strictly expected from the user.
// TODO(sougou): change this to yaml parsing, and possible merge with tabletenv.
type Configuration struct {
Debug bool // set debug mode (similar to --debug option)
EnableSyslog bool // Should logs be directed (in addition) to syslog daemon?
ListenAddress string // Where vtorc HTTP should listen for TCP
ListenSocket string // Where vtorc HTTP should listen for unix socket (default: empty; when given, TCP is disabled)
HTTPAdvertise string // optional, for raft setups, what is the HTTP address this node will advertise to its peers (potentially use where behind NAT or when rerouting ports; example: "http://11.22.33.44:3030")
AgentsServerPort string // port vtorc agents talk back to
MySQLTopologyUser string // The user VTOrc will use to connect to MySQL instances
MySQLTopologyPassword string // The password VTOrc will use to connect to MySQL instances
MySQLReplicaUser string // User to set on replica MySQL instances while configuring replication settings on them. If set, use this credential instead of discovering from mysql. TODO(sougou): deprecate this in favor of fetching from vttablet
MySQLReplicaPassword string // Password to set on replica MySQL instances while configuring replication settings on them.
MySQLTopologyCredentialsConfigFile string // my.cnf style configuration file from where to pick credentials. Expecting `user`, `password` under `[client]` section
MySQLTopologySSLPrivateKeyFile string // Private key file used to authenticate with a Topology mysql instance with TLS
MySQLTopologySSLCertFile string // Certificate PEM file used to authenticate with a Topology mysql instance with TLS
MySQLTopologySSLCAFile string // Certificate Authority PEM file used to authenticate with a Topology mysql instance with TLS
MySQLTopologySSLSkipVerify bool // If true, do not strictly validate mutual TLS certs for Topology mysql instances
MySQLTopologyUseMutualTLS bool // Turn on TLS authentication with the Topology MySQL instances
MySQLTopologyUseMixedTLS bool // Mixed TLS and non-TLS authentication with the Topology MySQL instances
TLSCacheTTLFactor uint // Factor of InstancePollSeconds that we set as TLS info cache expiry
BackendDB string // EXPERIMENTAL: type of backend db; either "mysql" or "sqlite3"
SQLite3DataFile string // when BackendDB == "sqlite3", full path to sqlite3 datafile
SkipOrchestratorDatabaseUpdate bool // When true, do not check backend database schema nor attempt to update it. Useful when you may be running multiple versions of vtorc, and you only wish certain boxes to dictate the db structure (or else any time a different vtorc version runs it will rebuild database schema)
PanicIfDifferentDatabaseDeploy bool // When true, and this process finds the vtorc backend DB was provisioned by a different version, panic
RaftEnabled bool // When true, setup vtorc in a raft consensus layout. When false (default) all Raft* variables are ignored
RaftBind string
RaftAdvertise string
RaftDataDir string
DefaultRaftPort int // if a RaftNodes entry does not specify port, use this one
RaftNodes []string // Raft nodes to make initial connection with
ExpectFailureAnalysisConcensus bool
MySQLVTOrcHost string
MySQLVTOrcMaxPoolConnections int // The maximum size of the connection pool to the VTOrc backend.
MySQLVTOrcPort uint
MySQLVTOrcDatabase string
MySQLVTOrcUser string
MySQLVTOrcPassword string
MySQLVTOrcCredentialsConfigFile string // my.cnf style configuration file from where to pick credentials. Expecting `user`, `password` under `[client]` section
MySQLVTOrcSSLPrivateKeyFile string // Private key file used to authenticate with the VTOrc mysql instance with TLS
MySQLVTOrcSSLCertFile string // Certificate PEM file used to authenticate with the VTOrc mysql instance with TLS
MySQLVTOrcSSLCAFile string // Certificate Authority PEM file used to authenticate with the VTOrc mysql instance with TLS
MySQLVTOrcSSLSkipVerify bool // If true, do not strictly validate mutual TLS certs for the VTOrc mysql instances
MySQLVTOrcUseMutualTLS bool // Turn on TLS authentication with the VTOrc MySQL instance
MySQLVTOrcReadTimeoutSeconds int // Number of seconds before backend mysql read operation is aborted (driver-side)
MySQLVTOrcRejectReadOnly bool // Reject read only connections https://github.com/go-sql-driver/mysql#rejectreadonly
MySQLConnectTimeoutSeconds int // Number of seconds before connection is aborted (driver-side)
MySQLDiscoveryReadTimeoutSeconds int // Number of seconds before topology mysql read operation is aborted (driver-side). Used for discovery queries.
MySQLTopologyReadTimeoutSeconds int // Number of seconds before topology mysql read operation is aborted (driver-side). Used for all but discovery queries.
MySQLConnectionLifetimeSeconds int // Number of seconds the mysql driver will keep database connection alive before recycling it
DefaultInstancePort int // In case port was not specified on command line
	ReplicationLagQuery string // custom query to check on replica lag (e.g. heartbeat table). Must return a single row with a single numeric column, which is the lag.
ReplicationCredentialsQuery string // custom query to get replication credentials. Must return a single row, with two text columns: 1st is username, 2nd is password. This is optional, and can be used by vtorc to configure replication after primary takeover or setup of co-primary. You need to ensure the vtorc user has the privileges to run this query
DiscoverByShowSlaveHosts bool // Attempt SHOW SLAVE HOSTS before PROCESSLIST
	UseSuperReadOnly bool // Should vtorc set super_read_only any time it sets read_only
InstancePollSeconds uint // Number of seconds between instance reads
InstanceWriteBufferSize int // Instance write buffer size (max number of instances to flush in one INSERT ODKU)
BufferInstanceWrites bool // Set to 'true' for write-optimization on backend table (compromise: writes can be stale and overwrite non stale data)
InstanceFlushIntervalMilliseconds int // Max interval between instance write buffer flushes
UnseenInstanceForgetHours uint // Number of hours after which an unseen instance is forgotten
SnapshotTopologiesIntervalHours uint // Interval in hour between snapshot-topologies invocation. Default: 0 (disabled)
DiscoveryMaxConcurrency uint // Number of goroutines doing hosts discovery
DiscoveryQueueCapacity uint // Buffer size of the discovery queue. Should be greater than the number of DB instances being discovered
DiscoveryQueueMaxStatisticsSize int // The maximum number of individual secondly statistics taken of the discovery queue
DiscoveryCollectionRetentionSeconds uint // Number of seconds to retain the discovery collection information
DiscoverySeeds []string // Hard coded array of hostname:port, ensuring vtorc discovers these hosts upon startup, assuming not already known to vtorc
InstanceBulkOperationsWaitTimeoutSeconds uint // Time to wait on a single instance when doing bulk (many instances) operation
HostnameResolveMethod string // Method by which to "normalize" hostname ("none"/"default"/"cname")
MySQLHostnameResolveMethod string // Method by which to "normalize" hostname via MySQL server. ("none"/"@@hostname"/"@@report_host"; default "@@hostname")
SkipBinlogServerUnresolveCheck bool // Skip the double-check that an unresolved hostname resolves back to same hostname for binlog servers
ExpiryHostnameResolvesMinutes int // Number of minutes after which to expire hostname-resolves
RejectHostnameResolvePattern string // Regexp pattern for resolved hostname that will not be accepted (not cached, not written to db). This is done to avoid storing wrong resolves due to network glitches.
ReasonableReplicationLagSeconds int // Above this value is considered a problem
ProblemIgnoreHostnameFilters []string // Will minimize problem visualization for hostnames matching given regexp filters
VerifyReplicationFilters bool // Include replication filters check before approving topology refactoring
ReasonableMaintenanceReplicationLagSeconds int // Above this value move-up and move-below are blocked
CandidateInstanceExpireMinutes uint // Minutes after which a suggestion to use an instance as a candidate replica (to be preferably promoted on primary failover) is expired.
AuditLogFile string // Name of log file for audit operations. Disabled when empty.
AuditToSyslog bool // If true, audit messages are written to syslog
AuditToBackendDB bool // If true, audit messages are written to the backend DB's `audit` table (default: true)
AuditPurgeDays uint // Days after which audit entries are purged from the database
RemoveTextFromHostnameDisplay string // Text to strip off the hostname on cluster/clusters pages
ReadOnly bool
	AuthenticationMethod string // Type of authentication to use, if any. "" for none, "basic" for BasicAuth, "multi" for advanced BasicAuth, "proxy" for forwarded credentials via reverse proxy, "token" for token based access
OAuthClientID string
OAuthClientSecret string
OAuthScopes []string
HTTPAuthUser string // Username for HTTP Basic authentication (blank disables authentication)
HTTPAuthPassword string // Password for HTTP Basic authentication
AuthUserHeader string // HTTP header indicating auth user, when AuthenticationMethod is "proxy"
PowerAuthUsers []string // On AuthenticationMethod == "proxy", list of users that can make changes. All others are read-only.
PowerAuthGroups []string // list of unix groups the authenticated user must be a member of to make changes.
AccessTokenUseExpirySeconds uint // Time by which an issued token must be used
AccessTokenExpiryMinutes uint // Time after which HTTP access token expires
ClusterNameToAlias map[string]string // map between regex matching cluster name to a human friendly alias
	DetectClusterAliasQuery string // Optional query (executed on topology instance) that returns the alias of a cluster. Query will only be executed on cluster primary (though until the topology's primary is resolved it may execute on other/all replicas). If provided, must return one row, one column
	DetectClusterDomainQuery string // Optional query (executed on topology instance) that returns the VIP/CNAME/Alias/whatever domain name for the primary of this cluster. Query will only be executed on cluster primary (though until the topology's primary is resolved it may execute on other/all replicas). If provided, must return one row, one column
DetectInstanceAliasQuery string // Optional query (executed on topology instance) that returns the alias of an instance. If provided, must return one row, one column
DetectPromotionRuleQuery string // Optional query (executed on topology instance) that returns the promotion rule of an instance. If provided, must return one row, one column.
DataCenterPattern string // Regexp pattern with one group, extracting the datacenter name from the hostname
RegionPattern string // Regexp pattern with one group, extracting the region name from the hostname
PhysicalEnvironmentPattern string // Regexp pattern with one group, extracting physical environment info from hostname (e.g. combination of datacenter & prod/dev env)
DetectDataCenterQuery string // Optional query (executed on topology instance) that returns the data center of an instance. If provided, must return one row, one column. Overrides DataCenterPattern and useful for installments where DC cannot be inferred by hostname
DetectRegionQuery string // Optional query (executed on topology instance) that returns the region of an instance. If provided, must return one row, one column. Overrides RegionPattern and useful for installments where Region cannot be inferred by hostname
DetectPhysicalEnvironmentQuery string // Optional query (executed on topology instance) that returns the physical environment of an instance. If provided, must return one row, one column. Overrides PhysicalEnvironmentPattern and useful for installments where env cannot be inferred by hostname
DetectSemiSyncEnforcedQuery string // Optional query (executed on topology instance) to determine whether semi-sync is fully enforced for primary writes (async fallback is not allowed under any circumstance). If provided, must return one row, one column, value 0 or 1.
SupportFuzzyPoolHostnames bool // Should "submit-pool-instances" command be able to pass list of fuzzy instances (fuzzy means non-fqdn, but unique enough to recognize). Defaults 'true', implies more queries on backend db
InstancePoolExpiryMinutes uint // Time after which entries in database_instance_pool are expired (resubmit via `submit-pool-instances`)
PromotionIgnoreHostnameFilters []string // VTOrc will not promote replicas with hostname matching pattern (via -c recovery; for example, avoid promoting dev-dedicated machines)
ServeAgentsHTTP bool // Spawn another HTTP interface dedicated for vtorc-agent
AgentsUseSSL bool // When "true" vtorc will listen on agents port with SSL as well as connect to agents via SSL
AgentsUseMutualTLS bool // When "true" Use mutual TLS for the server to agent communication
AgentSSLSkipVerify bool // When using SSL for the Agent, should we ignore SSL certification error
AgentSSLPrivateKeyFile string // Name of Agent SSL private key file, applies only when AgentsUseSSL = true
AgentSSLCertFile string // Name of Agent SSL certification file, applies only when AgentsUseSSL = true
AgentSSLCAFile string // Name of the Agent Certificate Authority file, applies only when AgentsUseSSL = true
AgentSSLValidOUs []string // Valid organizational units when using mutual TLS to communicate with the agents
UseSSL bool // Use SSL on the server web port
UseMutualTLS bool // When "true" Use mutual TLS for the server's web and API connections
SSLSkipVerify bool // When using SSL, should we ignore SSL certification error
SSLPrivateKeyFile string // Name of SSL private key file, applies only when UseSSL = true
SSLCertFile string // Name of SSL certification file, applies only when UseSSL = true
SSLCAFile string // Name of the Certificate Authority file, applies only when UseSSL = true
SSLValidOUs []string // Valid organizational units when using mutual TLS
StatusEndpoint string // Override the status endpoint. Defaults to '/api/status'
StatusOUVerify bool // If true, try to verify OUs when Mutual TLS is on. Defaults to false
AgentPollMinutes uint // Minutes between agent polling
UnseenAgentForgetHours uint // Number of hours after which an unseen agent is forgotten
StaleSeedFailMinutes uint // Number of minutes after which a stale (no progress) seed is considered failed.
SeedAcceptableBytesDiff int64 // Difference in bytes between seed source & target data size that is still considered as successful copy
SeedWaitSecondsBeforeSend int64 // Number of seconds for waiting before start send data command on agent
	BinlogEventsChunkSize int // Chunk size (X) for SHOW BINLOG|RELAYLOG EVENTS LIMIT ?,X statements. Smaller means less locking and more work to be done
ReduceReplicationAnalysisCount bool // When true, replication analysis will only report instances where possibility of handled problems is possible in the first place (e.g. will not report most leaf nodes, that are mostly uninteresting). When false, provides an entry for every known instance
	FailureDetectionPeriodBlockMinutes int // The time for which an instance's failure discovery is kept "active", so as to avoid concurrent "discoveries" of the instance's failure; this precedes any recovery process, if any.
	RecoveryPeriodBlockMinutes int // (supported for backwards compatibility but please use newer `RecoveryPeriodBlockSeconds` instead) The time for which an instance's recovery is kept "active", so as to avoid concurrent recoveries on the same instance as well as flapping
	RecoveryPeriodBlockSeconds int // (overrides `RecoveryPeriodBlockMinutes`) The time for which an instance's recovery is kept "active", so as to avoid concurrent recoveries on the same instance as well as flapping
RecoveryIgnoreHostnameFilters []string // Recovery analysis will completely ignore hosts matching given patterns
RecoverPrimaryClusterFilters []string // Only do primary recovery on clusters matching these regexp patterns (of course the ".*" pattern matches everything)
RecoverIntermediatePrimaryClusterFilters []string // Only do IM recovery on clusters matching these regexp patterns (of course the ".*" pattern matches everything)
ProcessesShellCommand string // Shell that executes command scripts
OnFailureDetectionProcesses []string // Processes to execute when detecting a failover scenario (before making a decision whether to failover or not). May and should use some of these placeholders: {failureType}, {instanceType}, {isPrimary}, {isCoPrimary}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}, {autoPrimaryRecovery}, {autoIntermediatePrimaryRecovery}
PreFailoverProcesses []string // Processes to execute before doing a failover (aborting operation should any once of them exits with non-zero code; order of execution undefined). May and should use some of these placeholders: {failureType}, {instanceType}, {isPrimary}, {isCoPrimary}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterDomain}, {failedPort}, {countReplicas}, {replicaHosts}, {isDowntimed}
PostFailoverProcesses []string // Processes to execute after doing a failover (order of execution undefined). May and should use some of these placeholders: {failureType}, {instanceType}, {isPrimary}, {isCoPrimary}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}, {isSuccessful}, {lostReplicas}, {countLostReplicas}
PostUnsuccessfulFailoverProcesses []string // Processes to execute after a not-completely-successful failover (order of execution undefined). May and should use some of these placeholders: {failureType}, {instanceType}, {isPrimary}, {isCoPrimary}, {failureDescription}, {command}, {failedHost}, {failureCluster}, {failureClusterDomain}, {failedPort}, {successorHost}, {successorPort}, {successorAlias}, {countReplicas}, {replicaHosts}, {isDowntimed}, {isSuccessful}, {lostReplicas}, {countLostReplicas}
PostPrimaryFailoverProcesses []string // Processes to execute after doing a primary failover (order of execution undefined). Uses same placeholders as PostFailoverProcesses
PostIntermediatePrimaryFailoverProcesses []string // Processes to execute after doing a primary failover (order of execution undefined). Uses same placeholders as PostFailoverProcesses
PostTakePrimaryProcesses []string // Processes to execute after a successful Take-Primary event has taken place
	CoPrimaryRecoveryMustPromoteOtherCoPrimary bool // When 'false', anything can get promoted (and candidates are preferred over others). When 'true', vtorc will promote the other co-primary or else fail
DetachLostReplicasAfterPrimaryFailover bool // Should replicas that are not to be lost in primary recovery (i.e. were more up-to-date than promoted replica) be forcibly detached
ApplyMySQLPromotionAfterPrimaryFailover bool // Should vtorc take upon itself to apply MySQL primary promotion: set read_only=0, detach replication, etc.
PreventCrossDataCenterPrimaryFailover bool // When true (default: false), cross-DC primary failovers are not allowed, vtorc will do all it can to only fail over within the same DC, or else not fail over at all.
PreventCrossRegionPrimaryFailover bool // When true (default: false), cross-region primary failovers are not allowed, vtorc will do all it can to only fail over within the same region, or else not fail over at all.
PrimaryFailoverLostInstancesDowntimeMinutes uint // Number of minutes to downtime any server that was lost after a primary failover (including failed primary & lost replicas). 0 to disable
PrimaryFailoverDetachReplicaPrimaryHost bool // Should vtorc issue a detach-replica-primary-host on newly promoted primary (this makes sure the new primary will not attempt to replicate the old primary if that comes back to life). Defaults to 'false'. Meaningless if ApplyMySQLPromotionAfterPrimaryFailover is 'true'.
FailPrimaryPromotionOnLagMinutes uint // when > 0, fail a primary promotion if the candidate replica is lagging >= configured number of minutes.
FailPrimaryPromotionIfSQLThreadNotUpToDate bool // when true, and a primary failover takes place, if candidate primary has not consumed all relay logs, promotion is aborted with error
DelayPrimaryPromotionIfSQLThreadNotUpToDate bool // when true, and a primary failover takes place, if candidate primary has not consumed all relay logs, delay promotion until the sql thread has caught up
PostponeReplicaRecoveryOnLagMinutes uint // On crash recovery, replicas that are lagging more than given minutes are only resurrected late in the recovery process, after primary/IM has been elected and processes executed. Value of 0 disables this feature
OSCIgnoreHostnameFilters []string // OSC replicas recommendation will ignore replica hostnames matching given patterns
URLPrefix string // URL prefix to run vtorc on non-root web path, e.g. /vtorc to put it behind nginx.
DiscoveryIgnoreReplicaHostnameFilters []string // Regexp filters to apply to prevent auto-discovering new replicas. Usage: unreachable servers due to firewalls, applications which trigger binlog dumps
DiscoveryIgnorePrimaryHostnameFilters []string // Regexp filters to apply to prevent auto-discovering a primary. Usage: pointing your primary temporarily to replicate some data from an external host
DiscoveryIgnoreHostnameFilters []string // Regexp filters to apply to prevent discovering instances of any kind
WebMessage string // If provided, will be shown on all web pages below the title bar
MaxConcurrentReplicaOperations int // Maximum number of concurrent operations on replicas
InstanceDBExecContextTimeoutSeconds int // Timeout on context used while calling ExecContext on instance database
LockShardTimeoutSeconds int // Timeout on context used to lock shard. Should be a small value because we should fail-fast
WaitReplicasTimeoutSeconds int // Timeout on amount of time to wait for the replicas in case of ERS. Should be a small value because we should fail-fast. Should not be larger than LockShardTimeoutSeconds since that is the total time we use for an ERS.
TopoInformationRefreshSeconds int // Timer duration on which VTOrc refreshes the keyspace and vttablet records from the topo-server.
RecoveryPollSeconds int // Timer duration on which VTOrc recovery analysis runs
SQLite3DataFile string // full path to sqlite3 datafile
InstancePollSeconds uint // Number of seconds between instance reads
SnapshotTopologiesIntervalHours uint // Interval in hours between snapshot-topologies invocations. Default: 0 (disabled)
ReasonableReplicationLagSeconds int // Above this value is considered a problem
AuditLogFile string // Name of log file for audit operations. Disabled when empty.
AuditToSyslog bool // If true, audit messages are written to syslog
AuditToBackendDB bool // If true, audit messages are written to the backend DB's `audit` table (default: true)
AuditPurgeDays uint // Days after which audit entries are purged from the database
RecoveryPeriodBlockSeconds int // (overrides `RecoveryPeriodBlockMinutes`) The time for which an instance's recovery is kept "active", so as to avoid concurrent recoveries on the same instance as well as flapping
}
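The fields above are populated straight from a JSON config file, with the Go field names serving as keys. A minimal, self-contained sketch of that mapping (the file contents and the trimmed-down struct are illustrative, not the actual vtorc loader):

package main

import (
	"encoding/json"
	"fmt"
)

// miniConfig mirrors a small subset of the Configuration fields above,
// purely for illustration.
type miniConfig struct {
	SQLite3DataFile            string
	InstancePollSeconds        uint
	RecoveryPeriodBlockSeconds int
}

func main() {
	// Hypothetical config file contents; keys match the Go field names.
	raw := []byte(`{
		"SQLite3DataFile": "/tmp/vtorc.db",
		"InstancePollSeconds": 5,
		"RecoveryPeriodBlockSeconds": 30
	}`)
	var c miniConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c)
}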
// ToJSONString will marshal this configuration as JSON
@ -236,263 +119,67 @@ func (config *Configuration) ToJSONString() string {
var Config = newConfiguration()
var readFileNames []string
// UpdateConfigValuesFromFlags is used to update the config values from the flags defined.
// This is done before we read any configuration files from the user. So the config files take precedence.
func UpdateConfigValuesFromFlags() {
Config.SQLite3DataFile = sqliteDataFile
Config.InstancePollSeconds = uint(instancePollTime / time.Second)
Config.SnapshotTopologiesIntervalHours = uint(snapshotTopologyInterval / time.Hour)
Config.ReasonableReplicationLagSeconds = int(reasonableReplicationLag / time.Second)
Config.AuditLogFile = auditFileLocation
Config.AuditToBackendDB = auditToBackend
Config.AuditToSyslog = auditToSyslog
Config.AuditPurgeDays = uint(auditPurgeDuration / (time.Hour * 24))
Config.RecoveryPeriodBlockSeconds = int(recoveryPeriodBlockDuration / time.Second)
Config.PreventCrossDataCenterPrimaryFailover = preventCrossCellFailover
Config.LockShardTimeoutSeconds = int(lockShardTimeout / time.Second)
Config.WaitReplicasTimeoutSeconds = int(waitReplicasTimeout / time.Second)
Config.TopoInformationRefreshSeconds = int(topoInformationRefreshDuration / time.Second)
Config.RecoveryPollSeconds = int(recoveryPollDuration / time.Second)
}
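Note that each conversion above is integer division on a time.Duration, so any remainder below the target unit is silently truncated. A standalone sketch of that rounding behavior (the local variable stands in for the package-level flag variable):

package main

import (
	"fmt"
	"time"
)

func main() {
	// Stand-in for the package-level auditPurgeDuration flag variable,
	// set to a value that is not a whole number of days.
	auditPurgeDuration := 8*24*time.Hour + 4*time.Minute + time.Second

	// Same conversion as UpdateConfigValuesFromFlags: integer division
	// truncates, so 8 days + 4m1s rounds down to 8 days.
	auditPurgeDays := uint(auditPurgeDuration / (time.Hour * 24))
	fmt.Println(auditPurgeDays) // 8
}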
// LogConfigValues is used to log the config values.
func LogConfigValues() {
b, _ := json.MarshalIndent(Config, "", "\t")
log.Infof("Running with Configuration - %v", string(b))
}
func newConfiguration() *Configuration {
return &Configuration{
Debug: false,
EnableSyslog: false,
ListenAddress: ":3000",
ListenSocket: "",
HTTPAdvertise: "",
AgentsServerPort: ":3001",
StatusEndpoint: DefaultStatusAPIEndpoint,
StatusOUVerify: false,
BackendDB: "sqlite",
SQLite3DataFile: "file::memory:?mode=memory&cache=shared",
SkipOrchestratorDatabaseUpdate: false,
PanicIfDifferentDatabaseDeploy: false,
RaftBind: "127.0.0.1:10008",
RaftAdvertise: "",
RaftDataDir: "",
DefaultRaftPort: 10008,
RaftNodes: []string{},
ExpectFailureAnalysisConcensus: true,
MySQLVTOrcMaxPoolConnections: 128, // limit concurrent conns to backend DB
MySQLVTOrcPort: 3306,
MySQLTopologyUseMutualTLS: false,
MySQLTopologyUseMixedTLS: true,
MySQLVTOrcUseMutualTLS: false,
MySQLConnectTimeoutSeconds: 2,
MySQLVTOrcReadTimeoutSeconds: 30,
MySQLVTOrcRejectReadOnly: false,
MySQLDiscoveryReadTimeoutSeconds: 10,
MySQLTopologyReadTimeoutSeconds: 600,
MySQLConnectionLifetimeSeconds: 0,
DefaultInstancePort: 3306,
TLSCacheTTLFactor: 100,
InstancePollSeconds: 5,
InstanceWriteBufferSize: 100,
BufferInstanceWrites: false,
InstanceFlushIntervalMilliseconds: 100,
UnseenInstanceForgetHours: 240,
SnapshotTopologiesIntervalHours: 0,
DiscoverByShowSlaveHosts: false,
UseSuperReadOnly: false,
DiscoveryMaxConcurrency: 300,
DiscoveryQueueCapacity: 100000,
DiscoveryQueueMaxStatisticsSize: 120,
DiscoveryCollectionRetentionSeconds: 120,
DiscoverySeeds: []string{},
InstanceBulkOperationsWaitTimeoutSeconds: 10,
HostnameResolveMethod: "default",
MySQLHostnameResolveMethod: "none",
SkipBinlogServerUnresolveCheck: true,
ExpiryHostnameResolvesMinutes: 60,
RejectHostnameResolvePattern: "",
ReasonableReplicationLagSeconds: 10,
ProblemIgnoreHostnameFilters: []string{},
VerifyReplicationFilters: false,
ReasonableMaintenanceReplicationLagSeconds: 20,
CandidateInstanceExpireMinutes: 60,
AuditLogFile: "",
AuditToSyslog: false,
AuditToBackendDB: false,
AuditPurgeDays: 7,
RemoveTextFromHostnameDisplay: "",
ReadOnly: false,
AuthenticationMethod: "",
HTTPAuthUser: "",
HTTPAuthPassword: "",
AuthUserHeader: "X-Forwarded-User",
PowerAuthUsers: []string{"*"},
PowerAuthGroups: []string{},
AccessTokenUseExpirySeconds: 60,
AccessTokenExpiryMinutes: 1440,
ClusterNameToAlias: make(map[string]string),
DetectClusterAliasQuery: "",
DetectClusterDomainQuery: "",
DetectInstanceAliasQuery: "",
DetectPromotionRuleQuery: "",
DataCenterPattern: "",
PhysicalEnvironmentPattern: "",
DetectDataCenterQuery: "",
DetectPhysicalEnvironmentQuery: "",
DetectSemiSyncEnforcedQuery: "",
SupportFuzzyPoolHostnames: true,
InstancePoolExpiryMinutes: 60,
PromotionIgnoreHostnameFilters: []string{},
ServeAgentsHTTP: false,
AgentsUseSSL: false,
AgentsUseMutualTLS: false,
AgentSSLValidOUs: []string{},
AgentSSLSkipVerify: false,
AgentSSLPrivateKeyFile: "",
AgentSSLCertFile: "",
AgentSSLCAFile: "",
UseSSL: false,
UseMutualTLS: false,
SSLValidOUs: []string{},
SSLSkipVerify: false,
SSLPrivateKeyFile: "",
SSLCertFile: "",
SSLCAFile: "",
AgentPollMinutes: 60,
UnseenAgentForgetHours: 6,
StaleSeedFailMinutes: 60,
SeedAcceptableBytesDiff: 8192,
SeedWaitSecondsBeforeSend: 2,
BinlogEventsChunkSize: 10000,
ReduceReplicationAnalysisCount: true,
FailureDetectionPeriodBlockMinutes: 60,
RecoveryPeriodBlockMinutes: 60,
RecoveryPeriodBlockSeconds: 3600,
RecoveryIgnoreHostnameFilters: []string{},
RecoverPrimaryClusterFilters: []string{"*"},
RecoverIntermediatePrimaryClusterFilters: []string{},
ProcessesShellCommand: "bash",
OnFailureDetectionProcesses: []string{},
PreFailoverProcesses: []string{},
PostPrimaryFailoverProcesses: []string{},
PostIntermediatePrimaryFailoverProcesses: []string{},
PostFailoverProcesses: []string{},
PostUnsuccessfulFailoverProcesses: []string{},
PostTakePrimaryProcesses: []string{},
CoPrimaryRecoveryMustPromoteOtherCoPrimary: true,
DetachLostReplicasAfterPrimaryFailover: true,
ApplyMySQLPromotionAfterPrimaryFailover: true,
PreventCrossDataCenterPrimaryFailover: false,
PreventCrossRegionPrimaryFailover: false,
PrimaryFailoverLostInstancesDowntimeMinutes: 0,
PrimaryFailoverDetachReplicaPrimaryHost: false,
FailPrimaryPromotionOnLagMinutes: 0,
FailPrimaryPromotionIfSQLThreadNotUpToDate: false,
DelayPrimaryPromotionIfSQLThreadNotUpToDate: true,
PostponeReplicaRecoveryOnLagMinutes: 0,
OSCIgnoreHostnameFilters: []string{},
URLPrefix: "",
DiscoveryIgnoreReplicaHostnameFilters: []string{},
WebMessage: "",
MaxConcurrentReplicaOperations: 5,
InstanceDBExecContextTimeoutSeconds: 30,
LockShardTimeoutSeconds: 30,
WaitReplicasTimeoutSeconds: 30,
TopoInformationRefreshSeconds: 15,
RecoveryPollSeconds: 1,
}
}
func (config *Configuration) postReadAdjustments() error {
if config.MySQLVTOrcCredentialsConfigFile != "" {
mySQLConfig := struct {
Client struct {
User string
Password string
}
}{}
err := gcfg.ReadFileInto(&mySQLConfig, config.MySQLVTOrcCredentialsConfigFile)
if err != nil {
log.Fatalf("Failed to parse gcfg data from file: %+v", err)
} else {
log.Infof("Parsed vtorc credentials from %s", config.MySQLVTOrcCredentialsConfigFile)
config.MySQLVTOrcUser = mySQLConfig.Client.User
config.MySQLVTOrcPassword = mySQLConfig.Client.Password
}
}
{
// We accept password in the form "${SOME_ENV_VARIABLE}" in which case we pull
// the given variable from os env
submatch := envVariableRegexp.FindStringSubmatch(config.MySQLVTOrcPassword)
if len(submatch) > 1 {
config.MySQLVTOrcPassword = os.Getenv(submatch[1])
}
}
if config.MySQLTopologyCredentialsConfigFile != "" {
mySQLConfig := struct {
Client struct {
User string
Password string
}
}{}
err := gcfg.ReadFileInto(&mySQLConfig, config.MySQLTopologyCredentialsConfigFile)
if err != nil {
log.Fatalf("Failed to parse gcfg data from file: %+v", err)
} else {
log.Infof("Parsed topology credentials from %s", config.MySQLTopologyCredentialsConfigFile)
config.MySQLTopologyUser = mySQLConfig.Client.User
config.MySQLTopologyPassword = mySQLConfig.Client.Password
}
}
{
// We accept password in the form "${SOME_ENV_VARIABLE}" in which case we pull
// the given variable from os env
submatch := envVariableRegexp.FindStringSubmatch(config.MySQLTopologyPassword)
if len(submatch) > 1 {
config.MySQLTopologyPassword = os.Getenv(submatch[1])
}
}
if config.RecoveryPeriodBlockSeconds == 0 && config.RecoveryPeriodBlockMinutes > 0 {
// RecoveryPeriodBlockSeconds is a newer addition that overrides RecoveryPeriodBlockMinutes
// The code does not consider RecoveryPeriodBlockMinutes anymore, but RecoveryPeriodBlockMinutes
// is still supported in the config file for backwards compatibility
config.RecoveryPeriodBlockSeconds = config.RecoveryPeriodBlockMinutes * 60
}
if config.FailPrimaryPromotionIfSQLThreadNotUpToDate && config.DelayPrimaryPromotionIfSQLThreadNotUpToDate {
return fmt.Errorf("Cannot have both FailPrimaryPromotionIfSQLThreadNotUpToDate and DelayPrimaryPromotionIfSQLThreadNotUpToDate enabled")
}
if config.FailPrimaryPromotionOnLagMinutes > 0 && config.ReplicationLagQuery == "" {
return fmt.Errorf("nonzero FailPrimaryPromotionOnLagMinutes requires ReplicationLagQuery to be set")
}
if config.URLPrefix != "" {
// Ensure the prefix starts with "/" and has no trailing one.
config.URLPrefix = strings.TrimLeft(config.URLPrefix, "/")
config.URLPrefix = strings.TrimRight(config.URLPrefix, "/")
config.URLPrefix = "/" + config.URLPrefix
}
if config.IsSQLite() && config.SQLite3DataFile == "" {
return fmt.Errorf("SQLite3DataFile must be set when BackendDB is sqlite3")
}
if config.RaftEnabled && config.RaftDataDir == "" {
return fmt.Errorf("RaftDataDir must be defined since raft is enabled (RaftEnabled)")
}
if config.RaftEnabled && config.RaftBind == "" {
return fmt.Errorf("RaftBind must be defined since raft is enabled (RaftEnabled)")
}
if config.RaftAdvertise == "" {
config.RaftAdvertise = config.RaftBind
}
if config.HTTPAdvertise != "" {
u, err := url.Parse(config.HTTPAdvertise)
if err != nil {
return fmt.Errorf("Failed parsing HTTPAdvertise %s: %s", config.HTTPAdvertise, err.Error())
}
if u.Scheme == "" {
return fmt.Errorf("If specified, HTTPAdvertise must include scheme (http:// or https://)")
}
if u.Hostname() == "" {
return fmt.Errorf("If specified, HTTPAdvertise must include host name")
}
if u.Port() == "" {
return fmt.Errorf("If specified, HTTPAdvertise must include port number")
}
if u.Path != "" {
return fmt.Errorf("If specified, HTTPAdvertise must not specify a path")
}
}
if config.InstanceWriteBufferSize <= 0 {
config.BufferInstanceWrites = false
}
if config.SQLite3DataFile == "" {
return fmt.Errorf("SQLite3DataFile must be set")
}
return nil
}
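The ${SOME_ENV_VARIABLE} handling above hinges on a package-level envVariableRegexp. A self-contained sketch of the same expansion (the exact pattern vtorc compiles lives elsewhere in the package, so this regexp is an assumption):

package main

import (
	"fmt"
	"os"
	"regexp"
)

// Assumed shape of envVariableRegexp: match "${NAME}" and capture NAME.
var envVariableRegexp = regexp.MustCompile(`\$\{(.*)\}`)

// expandPassword returns the env var's value when password looks like
// "${NAME}", and returns the password unchanged otherwise.
func expandPassword(password string) string {
	if submatch := envVariableRegexp.FindStringSubmatch(password); len(submatch) > 1 {
		return os.Getenv(submatch[1])
	}
	return password
}

func main() {
	os.Setenv("TOPOLOGY_PASSWORD", "s3cret")
	fmt.Println(expandPassword("${TOPOLOGY_PASSWORD}")) // s3cret
	fmt.Println(expandPassword("plaintext"))            // plaintext
}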
// TODO: Simplify the callers and delete this function
func (config *Configuration) IsSQLite() bool {
return true
}
// TODO: Simplify the callers and delete this function
func (config *Configuration) IsMySQL() bool {
return false
}
// read reads configuration from given file, or silently skips if the file does not exist.


@ -1,111 +1,249 @@
/*
Copyright 2022 The Vitess Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package config
import (
"testing"
"time"
"github.com/stretchr/testify/require"
)
func init() {
Config.HostnameResolveMethod = "none"
}
func TestUpdateConfigValuesFromFlags(t *testing.T) {
t.Run("defaults", func(t *testing.T) {
// Restore the changes we make to the Config parameter
defer func() {
Config = newConfiguration()
}()
defaultConfig := newConfiguration()
UpdateConfigValuesFromFlags()
require.Equal(t, defaultConfig, Config)
})
func TestRecoveryPeriodBlock(t *testing.T) {
{
c := newConfiguration()
c.RecoveryPeriodBlockSeconds = 0
c.RecoveryPeriodBlockMinutes = 0
err := c.postReadAdjustments()
require.NoError(t, err)
require.EqualValues(t, 0, c.RecoveryPeriodBlockSeconds)
}
{
c := newConfiguration()
c.RecoveryPeriodBlockSeconds = 30
c.RecoveryPeriodBlockMinutes = 1
err := c.postReadAdjustments()
require.NoError(t, err)
require.EqualValues(t, 30, c.RecoveryPeriodBlockSeconds)
}
{
c := newConfiguration()
c.RecoveryPeriodBlockSeconds = 0
c.RecoveryPeriodBlockMinutes = 2
err := c.postReadAdjustments()
require.NoError(t, err)
require.EqualValues(t, 120, c.RecoveryPeriodBlockSeconds)
}
{
c := newConfiguration()
c.RecoveryPeriodBlockSeconds = 15
c.RecoveryPeriodBlockMinutes = 0
err := c.postReadAdjustments()
require.NoError(t, err)
require.EqualValues(t, 15, c.RecoveryPeriodBlockSeconds)
}
}
t.Run("override auditPurgeDuration", func(t *testing.T) {
oldAuditPurgeDuration := auditPurgeDuration
auditPurgeDuration = 8 * time.Hour * 24
auditPurgeDuration += time.Second + 4*time.Minute
// Restore the changes we make
defer func() {
Config = newConfiguration()
auditPurgeDuration = oldAuditPurgeDuration
}()
func TestRaft(t *testing.T) {
{
c := newConfiguration()
c.RaftBind = "1.2.3.4:1008"
c.RaftDataDir = "/path/to/somewhere"
err := c.postReadAdjustments()
require.NoError(t, err)
require.EqualValues(t, c.RaftAdvertise, c.RaftBind)
}
{
c := newConfiguration()
c.RaftEnabled = true
err := c.postReadAdjustments()
require.Error(t, err)
}
{
c := newConfiguration()
c.RaftEnabled = true
c.RaftDataDir = "/path/to/somewhere"
err := c.postReadAdjustments()
require.NoError(t, err)
}
{
c := newConfiguration()
c.RaftEnabled = true
c.RaftDataDir = "/path/to/somewhere"
c.RaftBind = ""
err := c.postReadAdjustments()
require.Error(t, err)
}
}
testConfig := newConfiguration()
// auditPurgeDuration is supposed to be in multiples of days.
// If it is not, then we round down to the nearest number of days.
testConfig.AuditPurgeDays = 8
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
func TestHttpAdvertise(t *testing.T) {
{
c := newConfiguration()
c.HTTPAdvertise = ""
err := c.postReadAdjustments()
require.NoError(t, err)
}
{
c := newConfiguration()
c.HTTPAdvertise = "http://127.0.0.1:1234"
err := c.postReadAdjustments()
require.NoError(t, err)
}
{
c := newConfiguration()
c.HTTPAdvertise = "http://127.0.0.1"
err := c.postReadAdjustments()
require.Error(t, err)
}
{
c := newConfiguration()
c.HTTPAdvertise = "127.0.0.1:1234"
err := c.postReadAdjustments()
require.Error(t, err)
}
{
c := newConfiguration()
c.HTTPAdvertise = "http://127.0.0.1:1234/mypath"
err := c.postReadAdjustments()
require.Error(t, err)
}
t.Run("override sqliteDataFile", func(t *testing.T) {
oldSqliteDataFile := sqliteDataFile
sqliteDataFile = "newVal"
// Restore the changes we make
defer func() {
Config = newConfiguration()
sqliteDataFile = oldSqliteDataFile
}()
testConfig := newConfiguration()
testConfig.SQLite3DataFile = "newVal"
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override instancePollTime", func(t *testing.T) {
oldInstancePollTime := instancePollTime
instancePollTime = 7 * time.Second
// Restore the changes we make
defer func() {
Config = newConfiguration()
instancePollTime = oldInstancePollTime
}()
testConfig := newConfiguration()
testConfig.InstancePollSeconds = 7
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override snapshotTopologyInterval", func(t *testing.T) {
oldSnapshotTopologyInterval := snapshotTopologyInterval
snapshotTopologyInterval = 1 * time.Hour
// Restore the changes we make
defer func() {
Config = newConfiguration()
snapshotTopologyInterval = oldSnapshotTopologyInterval
}()
testConfig := newConfiguration()
testConfig.SnapshotTopologiesIntervalHours = 1
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override reasonableReplicationLag", func(t *testing.T) {
oldReasonableReplicationLag := reasonableReplicationLag
reasonableReplicationLag = 15 * time.Second
// Restore the changes we make
defer func() {
Config = newConfiguration()
reasonableReplicationLag = oldReasonableReplicationLag
}()
testConfig := newConfiguration()
testConfig.ReasonableReplicationLagSeconds = 15
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override auditFileLocation", func(t *testing.T) {
oldAuditFileLocation := auditFileLocation
auditFileLocation = "newFile"
// Restore the changes we make
defer func() {
Config = newConfiguration()
auditFileLocation = oldAuditFileLocation
}()
testConfig := newConfiguration()
testConfig.AuditLogFile = "newFile"
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override auditToBackend", func(t *testing.T) {
oldAuditToBackend := auditToBackend
auditToBackend = true
// Restore the changes we make
defer func() {
Config = newConfiguration()
auditToBackend = oldAuditToBackend
}()
testConfig := newConfiguration()
testConfig.AuditToBackendDB = true
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override auditToSyslog", func(t *testing.T) {
oldAuditToSyslog := auditToSyslog
auditToSyslog = true
// Restore the changes we make
defer func() {
Config = newConfiguration()
auditToSyslog = oldAuditToSyslog
}()
testConfig := newConfiguration()
testConfig.AuditToSyslog = true
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override recoveryPeriodBlockDuration", func(t *testing.T) {
oldRecoveryPeriodBlockDuration := recoveryPeriodBlockDuration
recoveryPeriodBlockDuration = 5 * time.Minute
// Restore the changes we make
defer func() {
Config = newConfiguration()
recoveryPeriodBlockDuration = oldRecoveryPeriodBlockDuration
}()
testConfig := newConfiguration()
testConfig.RecoveryPeriodBlockSeconds = 300
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override preventCrossCellFailover", func(t *testing.T) {
oldPreventCrossCellFailover := preventCrossCellFailover
preventCrossCellFailover = true
// Restore the changes we make
defer func() {
Config = newConfiguration()
preventCrossCellFailover = oldPreventCrossCellFailover
}()
testConfig := newConfiguration()
testConfig.PreventCrossDataCenterPrimaryFailover = true
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override lockShardTimeout", func(t *testing.T) {
oldLockShardTimeout := lockShardTimeout
lockShardTimeout = 3 * time.Hour
// Restore the changes we make
defer func() {
Config = newConfiguration()
lockShardTimeout = oldLockShardTimeout
}()
testConfig := newConfiguration()
testConfig.LockShardTimeoutSeconds = 10800
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override waitReplicasTimeout", func(t *testing.T) {
oldWaitReplicasTimeout := waitReplicasTimeout
waitReplicasTimeout = 3*time.Minute + 4*time.Second
// Restore the changes we make
defer func() {
Config = newConfiguration()
waitReplicasTimeout = oldWaitReplicasTimeout
}()
testConfig := newConfiguration()
testConfig.WaitReplicasTimeoutSeconds = 184
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override topoInformationRefreshDuration", func(t *testing.T) {
oldTopoInformationRefreshDuration := topoInformationRefreshDuration
topoInformationRefreshDuration = 1 * time.Second
// Restore the changes we make
defer func() {
Config = newConfiguration()
topoInformationRefreshDuration = oldTopoInformationRefreshDuration
}()
testConfig := newConfiguration()
testConfig.TopoInformationRefreshSeconds = 1
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
t.Run("override recoveryPollDuration", func(t *testing.T) {
oldRecoveryPollDuration := recoveryPollDuration
recoveryPollDuration = 15 * time.Second
// Restore the changes we make
defer func() {
Config = newConfiguration()
recoveryPollDuration = oldRecoveryPollDuration
}()
testConfig := newConfiguration()
testConfig.RecoveryPollSeconds = 15
UpdateConfigValuesFromFlags()
require.Equal(t, testConfig, Config)
})
}
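Every subtest above repeats the same override-then-restore dance around the global Config. A hypothetical helper that distills the pattern (not part of the actual test file; restoring the flag variable itself stays with the caller's defer, as in the subtests):

// testFlagOverride runs one override scenario: the caller's mutate func
// sets a package-level flag variable and adjusts the expected
// Configuration to match, then we apply the flags, compare, and restore
// the global Config afterwards.
func testFlagOverride(t *testing.T, mutate func(expected *Configuration)) {
	t.Helper()
	defer func() { Config = newConfiguration() }()
	expected := newConfiguration()
	mutate(expected)
	UpdateConfigValuesFromFlags()
	require.Equal(t, expected, Config)
}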


@ -18,10 +18,7 @@ package db
import (
"database/sql"
"fmt"
"strings"
"sync"
"time"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/vtorc/config"
@ -29,13 +26,9 @@ import (
)
var (
EmptyArgs []any
Db DB = (*vtorcDB)(nil)
)
var mysqlURI string
var dbMutex sync.Mutex
type DB interface {
QueryVTOrc(query string, argsArray []any, onRow func(sqlutils.RowMap) error) error
}
@ -60,81 +53,6 @@ func (dummyRes DummySQLResult) RowsAffected() (int64, error) {
return 1, nil
}
func getMySQLURI() string {
dbMutex.Lock()
defer dbMutex.Unlock()
if mysqlURI != "" {
return mysqlURI
}
mysqlURI = fmt.Sprintf("%s:%s@tcp(%s:%d)/%s?timeout=%ds&readTimeout=%ds&rejectReadOnly=%t&interpolateParams=true",
config.Config.MySQLVTOrcUser,
config.Config.MySQLVTOrcPassword,
config.Config.MySQLVTOrcHost,
config.Config.MySQLVTOrcPort,
config.Config.MySQLVTOrcDatabase,
config.Config.MySQLConnectTimeoutSeconds,
config.Config.MySQLVTOrcReadTimeoutSeconds,
config.Config.MySQLVTOrcRejectReadOnly,
)
if config.Config.MySQLVTOrcUseMutualTLS {
mysqlURI, _ = SetupMySQLVTOrcTLS(mysqlURI)
}
return mysqlURI
}
// OpenDiscovery returns a DB instance to access a topology instance.
// It has lower read timeout than OpenTopology and is intended to
// be used with low-latency discovery queries.
func OpenDiscovery(host string, port int) (*sql.DB, error) {
return openTopology(host, port, config.Config.MySQLDiscoveryReadTimeoutSeconds)
}
// OpenTopology returns a DB instance to access a topology instance.
func OpenTopology(host string, port int) (*sql.DB, error) {
return openTopology(host, port, config.Config.MySQLTopologyReadTimeoutSeconds)
}
func openTopology(host string, port int, readTimeout int) (db *sql.DB, err error) {
uri := fmt.Sprintf("%s:%s@tcp(%s:%d)/?timeout=%ds&readTimeout=%ds&interpolateParams=true",
config.Config.MySQLTopologyUser,
config.Config.MySQLTopologyPassword,
host, port,
config.Config.MySQLConnectTimeoutSeconds,
readTimeout,
)
if config.Config.MySQLTopologyUseMutualTLS ||
(config.Config.MySQLTopologyUseMixedTLS && requiresTLS(host, port, uri)) {
if uri, err = SetupMySQLTopologyTLS(uri); err != nil {
return nil, err
}
}
if db, _, err = sqlutils.GetDB(uri); err != nil {
return nil, err
}
if config.Config.MySQLConnectionLifetimeSeconds > 0 {
db.SetConnMaxLifetime(time.Duration(config.Config.MySQLConnectionLifetimeSeconds) * time.Second)
}
db.SetMaxOpenConns(config.MySQLTopologyMaxPoolConnections)
db.SetMaxIdleConns(config.MySQLTopologyMaxPoolConnections)
return db, err
}
func openOrchestratorMySQLGeneric() (db *sql.DB, fromCache bool, err error) {
uri := fmt.Sprintf("%s:%s@tcp(%s:%d)/?timeout=%ds&readTimeout=%ds&interpolateParams=true",
config.Config.MySQLVTOrcUser,
config.Config.MySQLVTOrcPassword,
config.Config.MySQLVTOrcHost,
config.Config.MySQLVTOrcPort,
config.Config.MySQLConnectTimeoutSeconds,
config.Config.MySQLVTOrcReadTimeoutSeconds,
)
if config.Config.MySQLVTOrcUseMutualTLS {
uri, _ = SetupMySQLVTOrcTLS(uri)
}
return sqlutils.GetDB(uri)
}
func IsSQLite() bool {
return config.Config.IsSQLite()
}
@ -142,65 +60,14 @@ func IsSQLite() bool {
// OpenVTOrc returns the DB instance for the vtorc backend database
func OpenVTOrc() (db *sql.DB, err error) {
var fromCache bool
if IsSQLite() {
db, fromCache, err = sqlutils.GetSQLiteDB(config.Config.SQLite3DataFile)
if err == nil && !fromCache {
log.Infof("Connected to vtorc backend: sqlite on %v", config.Config.SQLite3DataFile)
}
if db != nil {
db.SetMaxOpenConns(1)
db.SetMaxIdleConns(1)
}
} else {
if db, fromCache, err := openOrchestratorMySQLGeneric(); err != nil {
log.Errorf(err.Error())
return db, err
} else if !fromCache {
// first time ever we talk to MySQL
query := fmt.Sprintf("create database if not exists %s", config.Config.MySQLVTOrcDatabase)
if _, err := db.Exec(query); err != nil {
log.Errorf(err.Error())
return db, err
}
}
db, fromCache, err = sqlutils.GetDB(getMySQLURI())
if err == nil && !fromCache {
// do not show the password but do show what we connect to.
safeMySQLURI := fmt.Sprintf("%s:?@tcp(%s:%d)/%s?timeout=%ds", config.Config.MySQLVTOrcUser,
config.Config.MySQLVTOrcHost, config.Config.MySQLVTOrcPort, config.Config.MySQLVTOrcDatabase, config.Config.MySQLConnectTimeoutSeconds)
log.Infof("Connected to vtorc backend: %v", safeMySQLURI)
if config.Config.MySQLVTOrcMaxPoolConnections > 0 {
log.Infof("VTOrc pool SetMaxOpenConns: %d", config.Config.MySQLVTOrcMaxPoolConnections)
db.SetMaxOpenConns(config.Config.MySQLVTOrcMaxPoolConnections)
}
if config.Config.MySQLConnectionLifetimeSeconds > 0 {
db.SetConnMaxLifetime(time.Duration(config.Config.MySQLConnectionLifetimeSeconds) * time.Second)
}
}
}
db, fromCache, err = sqlutils.GetSQLiteDB(config.Config.SQLite3DataFile)
if err == nil && !fromCache {
if !config.Config.SkipOrchestratorDatabaseUpdate {
_ = initVTOrcDB(db)
}
// A low value here will trigger reconnects which could
// make the number of backend connections hit the tcp
// limit. That's bad. I could make this setting dynamic
// but then people need to know which value to use. For now
// allow up to 25% of MySQLVTOrcMaxPoolConnections
// to be idle. That should provide a good number which
// does not keep the maximum number of connections open but
// at the same time does not trigger disconnections and
// reconnections too frequently.
maxIdleConns := int(config.Config.MySQLVTOrcMaxPoolConnections * 25 / 100)
if maxIdleConns < 10 {
maxIdleConns = 10
}
log.Infof("Connecting to backend %s:%d: maxConnections: %d, maxIdleConns: %d",
config.Config.MySQLVTOrcHost,
config.Config.MySQLVTOrcPort,
config.Config.MySQLVTOrcMaxPoolConnections,
maxIdleConns)
db.SetMaxIdleConns(maxIdleConns)
log.Infof("Connected to vtorc backend: sqlite on %v", config.Config.SQLite3DataFile)
_ = initVTOrcDB(db)
}
if db != nil {
db.SetMaxOpenConns(1)
db.SetMaxIdleConns(1)
}
return db, err
}
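Stripped of the removed MySQL branch, the surviving open path is just: get (or reuse) the shared SQLite handle, run schema init on first connect, and cap the pool at a single connection. A runnable sketch of that behavior using database/sql directly (vtorc itself goes through its sqlutils connection cache; the driver choice here is only for illustration):

package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // illustrative driver; vtorc's sqlutils picks its own
)

// openBackend mirrors the post-cleanup OpenVTOrc flow.
func openBackend(dataFile string) (*sql.DB, error) {
	db, err := sql.Open("sqlite3", dataFile)
	if err != nil {
		return nil, err
	}
	// Single connection, as in OpenVTOrc: SQLite allows one writer at a
	// time, so a pool larger than 1 only adds lock contention.
	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(1)
	return db, nil
}

func main() {
	db, err := openBackend("file::memory:?mode=memory&cache=shared")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	log.Println("connected to sqlite backend")
}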
@ -212,23 +79,6 @@ func translateStatement(statement string) (string, error) {
return statement, nil
}
// versionIsDeployed checks if given version has already been deployed
func versionIsDeployed(db *sql.DB) (result bool, err error) {
query := `
select
count(*) as is_deployed
from
vtorc_db_deployments
where
deployed_version = ?
`
err = db.QueryRow(query, config.RuntimeCLIFlags.ConfiguredVersion).Scan(&result)
// err means the table 'vtorc_db_deployments' does not even exist, in which case we proceed
// to deploy.
// If there's another error to this, like DB gone bad, then we're about to find out anyway.
return result, err
}
// registerVTOrcDeployment updates the vtorc_metadata table upon successful deployment
func registerVTOrcDeployment(db *sql.DB) error {
query := `
@ -238,10 +88,9 @@ func registerVTOrcDeployment(db *sql.DB) error {
?, NOW()
)
`
if _, err := execInternal(db, query, config.RuntimeCLIFlags.ConfiguredVersion); err != nil {
if _, err := execInternal(db, query, ""); err != nil {
log.Fatalf("Unable to write to vtorc_metadata: %+v", err)
}
log.Infof("Migrated database schema to version [%+v]", config.RuntimeCLIFlags.ConfiguredVersion)
return nil
}
@ -309,15 +158,6 @@ func deployStatements(db *sql.DB, queries []string) error {
// application's lifetime.
func initVTOrcDB(db *sql.DB) error {
log.Info("Initializing vtorc")
versionAlreadyDeployed, err := versionIsDeployed(db)
if versionAlreadyDeployed && config.RuntimeCLIFlags.ConfiguredVersion != "" && err == nil {
// Already deployed with this version
return nil
}
if config.Config.PanicIfDifferentDatabaseDeploy && config.RuntimeCLIFlags.ConfiguredVersion != "" && !versionAlreadyDeployed {
log.Fatalf("PanicIfDifferentDatabaseDeploy is set. Configured version %s is not the version found in the database", config.RuntimeCLIFlags.ConfiguredVersion)
}
log.Info("Migrating database schema")
_ = deployStatements(db, generateSQLBase)
_ = deployStatements(db, generateSQLPatches)
@ -391,42 +231,6 @@ func QueryVTOrc(query string, argsArray []any, onRow func(sqlutils.RowMap) error
return err
}
// QueryVTOrcRowsMapBuffered
func QueryVTOrcRowsMapBuffered(query string, onRow func(sqlutils.RowMap) error) error {
query, err := translateStatement(query)
if err != nil {
log.Fatalf("Cannot query vtorc: %+v; query=%+v", err, query)
return err
}
db, err := OpenVTOrc()
if err != nil {
return err
}
return sqlutils.QueryRowsMapBuffered(db, query, onRow)
}
// QueryVTOrcBuffered
func QueryVTOrcBuffered(query string, argsArray []any, onRow func(sqlutils.RowMap) error) error {
query, err := translateStatement(query)
if err != nil {
log.Fatalf("Cannot query vtorc: %+v; query=%+v", err, query)
return err
}
db, err := OpenVTOrc()
if err != nil {
return err
}
if argsArray == nil {
argsArray = EmptyArgs
}
if err = sqlutils.QueryRowsMapBuffered(db, query, onRow, argsArray...); err != nil {
log.Warning(err.Error())
}
return err
}
// ReadTimeNow reads and returns the current timestamp as string. This is an unfortunate workaround
// to support both MySQL and SQLite in all possible timezones. SQLite only speaks UTC where MySQL has
// timezone support. By reading the time as string we get the database's de-facto notion of the time,


@ -41,8 +41,6 @@ var generateSQLBase = []string{
exec_source_log_pos bigint(20) unsigned NOT NULL,
replication_lag_seconds bigint(20) unsigned DEFAULT NULL,
replica_lag_seconds bigint(20) unsigned DEFAULT NULL,
num_replica_hosts int(10) unsigned NOT NULL,
replica_hosts text CHARACTER SET ascii NOT NULL,
cluster_name varchar(128) CHARACTER SET ascii NOT NULL,
PRIMARY KEY (hostname,port)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
@ -349,20 +347,6 @@ var generateSQLBase = []string{
`
CREATE INDEX unresolved_hostname_idx_hostname_unresolve ON hostname_unresolve (unresolved_hostname)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_pool (
hostname varchar(128) CHARACTER SET ascii NOT NULL,
port smallint(5) unsigned NOT NULL,
pool varchar(128) NOT NULL,
PRIMARY KEY (hostname, port, pool)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX pool_idx ON database_instance_pool
`,
`
CREATE INDEX pool_idx_database_instance_pool ON database_instance_pool (pool)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_topology_history (
snapshot_unix_timestamp INT UNSIGNED NOT NULL,
@ -419,7 +403,6 @@ var generateSQLBase = []string{
analysis varchar(128) NOT NULL,
cluster_name varchar(128) NOT NULL,
count_affected_replicas int unsigned NOT NULL,
replica_hosts text NOT NULL,
PRIMARY KEY (detection_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
@ -665,29 +648,6 @@ var generateSQLBase = []string{
`
CREATE INDEX last_seen_idx_database_instance_binlog_files_history ON database_instance_binlog_files_history (last_seen)
`,
`
CREATE TABLE IF NOT EXISTS access_token (
access_token_id bigint unsigned not null auto_increment,
public_token varchar(128) NOT NULL,
secret_token varchar(128) NOT NULL,
generated_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
generated_by varchar(128) CHARACTER SET utf8 NOT NULL,
is_acquired tinyint unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (access_token_id)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
`,
`
DROP INDEX public_token_idx ON access_token
`,
`
CREATE UNIQUE INDEX public_token_uidx_access_token ON access_token (public_token)
`,
`
DROP INDEX generated_at_idx ON access_token
`,
`
CREATE INDEX generated_at_idx_access_token ON access_token (generated_at)
`,
`
CREATE TABLE IF NOT EXISTS database_instance_recent_relaylog_history (
hostname varchar(128) NOT NULL,


@ -122,11 +122,6 @@ var generateSQLPatches = []string{
topology_recovery
ADD COLUMN count_affected_replicas int unsigned NOT NULL
`,
`
ALTER TABLE
topology_recovery
ADD COLUMN replica_hosts text CHARACTER SET ascii NOT NULL
`,
`
ALTER TABLE hostname_unresolve
ADD COLUMN last_registered TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
@ -194,7 +189,7 @@ var generateSQLPatches = []string{
`
ALTER TABLE
topology_recovery
ADD COLUMN participating_instances text CHARACTER SET ascii NOT NULL after replica_hosts
ADD COLUMN participating_instances text CHARACTER SET ascii NOT NULL after count_affected_replicas
`,
`
ALTER TABLE
@ -310,21 +305,6 @@ var generateSQLPatches = []string{
database_instance_coordinates_history
ADD COLUMN last_seen timestamp NOT NULL DEFAULT '1971-01-01 00:00:00' AFTER recorded_timestamp
`,
`
ALTER TABLE
access_token
ADD COLUMN is_reentrant TINYINT UNSIGNED NOT NULL default 0
`,
`
ALTER TABLE
access_token
ADD COLUMN acquired_at timestamp NOT NULL DEFAULT '1971-01-01 00:00:00'
`,
`
ALTER TABLE
database_instance_pool
ADD COLUMN registered_at timestamp NOT NULL DEFAULT '1971-01-01 00:00:00'
`,
`
ALTER TABLE
database_instance


@ -82,7 +82,7 @@ func CreateOrReturnQueue(name string) *Queue {
name: name,
queuedKeys: make(map[inst.InstanceKey]time.Time),
consumedKeys: make(map[inst.InstanceKey]time.Time),
queue: make(chan inst.InstanceKey, config.Config.DiscoveryQueueCapacity),
queue: make(chan inst.InstanceKey, config.DiscoveryQueueCapacity),
}
go q.startMonitoring()
@ -119,8 +119,8 @@ func (q *Queue) collectStatistics() {
q.metrics = append(q.metrics, QueueMetric{Queued: len(q.queuedKeys), Active: len(q.consumedKeys)})
// remove old entries if we get too big
if len(q.metrics) > config.Config.DiscoveryQueueMaxStatisticsSize {
q.metrics = q.metrics[len(q.metrics)-config.Config.DiscoveryQueueMaxStatisticsSize:]
if len(q.metrics) > config.DiscoveryQueueMaxStatisticsSize {
q.metrics = q.metrics[len(q.metrics)-config.DiscoveryQueueMaxStatisticsSize:]
}
}
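The trim in collectStatistics keeps the metrics slice as a sliding window of at most DiscoveryQueueMaxStatisticsSize recent samples. The same idea in isolation:

package main

import "fmt"

const maxStatisticsSize = 120 // stand-in for config.DiscoveryQueueMaxStatisticsSize

type queueMetric struct{ Queued, Active int }

// appendBounded appends one sample and drops the oldest entries beyond
// the cap, mirroring the reslicing in collectStatistics above.
func appendBounded(metrics []queueMetric, m queueMetric) []queueMetric {
	metrics = append(metrics, m)
	if len(metrics) > maxStatisticsSize {
		metrics = metrics[len(metrics)-maxStatisticsSize:]
	}
	return metrics
}

func main() {
	var metrics []queueMetric
	for i := 0; i < 300; i++ {
		metrics = appendBounded(metrics, queueMetric{Queued: i})
	}
	fmt.Println(len(metrics)) // 120
}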


@ -1,66 +0,0 @@
/*
Copyright 2014 Outbrain Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package http
import (
"encoding/json"
"net/http"
)
// APIResponseCode is an OK/ERROR response code
type APIResponseCode int
const (
ERROR APIResponseCode = iota
OK
)
func (apiResponseCode *APIResponseCode) MarshalJSON() ([]byte, error) {
return json.Marshal(apiResponseCode.String())
}
func (apiResponseCode *APIResponseCode) String() string {
switch *apiResponseCode {
case ERROR:
return "ERROR"
case OK:
return "OK"
}
return "unknown"
}
// HTTPStatus returns the respective HTTP status for this response
func (apiResponseCode *APIResponseCode) HTTPStatus() int {
switch *apiResponseCode {
case ERROR:
return http.StatusInternalServerError
case OK:
return http.StatusOK
}
return http.StatusNotImplemented
}
// APIResponse is a response returned as JSON to various requests.
type APIResponse struct {
Code APIResponseCode
Message string
Details any
}
type API struct {
URLPrefix string
}
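For context, this is roughly how a handler would have turned these types into an HTTP reply (a hypothetical helper, not part of the original package; the actual martini handlers called r.JSON instead):

// writeAPIResponse is a hypothetical helper showing how the types above
// fit together: pick the HTTP status from the response code, then encode
// the APIResponse as JSON.
func writeAPIResponse(w http.ResponseWriter, code APIResponseCode, message string, details any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code.HTTPStatus())
	_ = json.NewEncoder(w).Encode(&APIResponse{Code: code, Message: message, Details: details})
}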


@ -1,140 +0,0 @@
/*
Copyright 2015 Shlomi Noach, courtesy Booking.com
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package http
import (
"fmt"
"net/http"
"strings"
"github.com/martini-contrib/auth"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/inst"
"vitess.io/vitess/go/vt/vtorc/os"
"vitess.io/vitess/go/vt/vtorc/process"
)
func getProxyAuthUser(req *http.Request) string {
for _, user := range req.Header[config.Config.AuthUserHeader] {
return user
}
return ""
}
// isAuthorizedForAction checks req to see whether authenticated user has write-privileges.
// This depends on configured authentication method.
func isAuthorizedForAction(req *http.Request, user auth.User) bool {
if config.Config.ReadOnly {
return false
}
switch strings.ToLower(config.Config.AuthenticationMethod) {
case "basic":
{
// The mere fact we're here means the user has passed authentication
return true
}
case "multi":
return string(user) != "readonly"
case "proxy":
{
authUser := getProxyAuthUser(req)
for _, configPowerAuthUser := range config.Config.PowerAuthUsers {
if configPowerAuthUser == "*" || configPowerAuthUser == authUser {
return true
}
}
// check the user's group is one of those listed here
if len(config.Config.PowerAuthGroups) > 0 && os.UserInGroups(authUser, config.Config.PowerAuthGroups) {
return true
}
return false
}
case "token":
{
cookie, err := req.Cookie("access-token")
if err != nil {
return false
}
cookieTokens := strings.Split(cookie.Value, ":")
if len(cookieTokens) != 2 {
return false
}
publicToken := cookieTokens[0]
secretToken := cookieTokens[1]
result, _ := process.TokenIsValid(publicToken, secretToken)
return result
}
case "oauth":
{
return false
}
default:
{
// Default: no authentication method
return true
}
}
}
func authenticateToken(publicToken string, resp http.ResponseWriter) error {
secretToken, err := process.AcquireAccessToken(publicToken)
if err != nil {
return err
}
cookieValue := fmt.Sprintf("%s:%s", publicToken, secretToken)
cookie := &http.Cookie{Name: "access-token", Value: cookieValue, Path: "/"}
http.SetCookie(resp, cookie)
return nil
}
// getUserID returns the authenticated user id, if available, depending on authentication method.
func getUserID(req *http.Request, user auth.User) string {
if config.Config.ReadOnly {
return ""
}
switch strings.ToLower(config.Config.AuthenticationMethod) {
case "basic":
{
return string(user)
}
case "multi":
{
return string(user)
}
case "proxy":
{
return getProxyAuthUser(req)
}
case "token":
{
return ""
}
default:
{
return ""
}
}
}
// figureClusterName is a convenience function to get a cluster name from hints
func figureClusterName(hint string) (clusterName string, err error) {
if hint == "" {
return "", fmt.Errorf("Unable to determine cluster name by empty hint")
}
instanceKey, _ := inst.ParseRawInstanceKey(hint)
return inst.FigureClusterName(hint, instanceKey, nil)
}


@ -1,463 +0,0 @@
/*
Copyright 2014 Outbrain Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package http
import (
"expvar"
"fmt"
"net/http"
"net/http/pprof"
"strconv"
"text/template"
"github.com/go-martini/martini"
"github.com/martini-contrib/auth"
"github.com/martini-contrib/render"
"github.com/rcrowley/go-metrics"
"github.com/rcrowley/go-metrics/exp"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/inst"
)
// Web is the web requests server, mapping each request to a web page
type Web struct {
URLPrefix string
}
var HTTPWeb = Web{}
func (httpWeb *Web) getInstanceKey(host string, port string) (inst.InstanceKey, error) {
instanceKey := inst.InstanceKey{Hostname: host}
var err error
if instanceKey.Port, err = strconv.Atoi(port); err != nil {
return instanceKey, fmt.Errorf("Invalid port: %s", port)
}
return instanceKey, err
}
func (httpWeb *Web) AccessToken(params martini.Params, r render.Render, req *http.Request, resp http.ResponseWriter, user auth.User) {
publicToken := template.JSEscapeString(req.URL.Query().Get("publicToken"))
err := authenticateToken(publicToken, resp)
if err != nil {
r.JSON(200, &APIResponse{Code: ERROR, Message: fmt.Sprintf("%+v", err)})
return
}
r.Redirect(httpWeb.URLPrefix + "/")
}
func (httpWeb *Web) Index(params martini.Params, r render.Render, req *http.Request, user auth.User) {
// Redirect index so that all web URLs begin with "/web/".
// We also redirect /web/ to /web/clusters so that
// the Clusters page has a single canonical URL.
r.Redirect(httpWeb.URLPrefix + "/web/clusters")
}
func (httpWeb *Web) Clusters(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/clusters", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "clusters",
"autoshow_problems": false,
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"removeTextFromHostnameDisplay": config.Config.RemoveTextFromHostnameDisplay,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) ClustersAnalysis(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/clusters_analysis", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "clusters",
"autoshow_problems": false,
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"removeTextFromHostnameDisplay": config.Config.RemoveTextFromHostnameDisplay,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) Cluster(params martini.Params, r render.Render, req *http.Request, user auth.User) {
clusterName, _ := figureClusterName(params["clusterName"])
r.HTML(200, "templates/cluster", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "cluster",
"clusterName": clusterName,
"autoshow_problems": true,
"contextMenuVisible": true,
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"removeTextFromHostnameDisplay": config.Config.RemoveTextFromHostnameDisplay,
"compactDisplay": template.JSEscapeString(req.URL.Query().Get("compact")),
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) ClusterByAlias(params martini.Params, r render.Render, req *http.Request, user auth.User) {
params["clusterName"] = params["clusterAlias"]
httpWeb.Cluster(params, r, req, user)
}
func (httpWeb *Web) ClusterByInstance(params martini.Params, r render.Render, req *http.Request, user auth.User) {
instanceKey, err := httpWeb.getInstanceKey(params["host"], params["port"])
if err != nil {
r.JSON(200, &APIResponse{Code: ERROR, Message: err.Error()})
return
}
instance, found, err := inst.ReadInstance(&instanceKey)
if (!found) || (err != nil) {
r.JSON(200, &APIResponse{Code: ERROR, Message: fmt.Sprintf("Cannot read instance: %+v", instanceKey)})
return
}
// Willing to accept the case of multiple clusters; we just present one
if instance.ClusterName == "" && err != nil {
r.JSON(200, &APIResponse{Code: ERROR, Message: fmt.Sprintf("%+v", err)})
return
}
params["clusterName"] = instance.ClusterName
httpWeb.Cluster(params, r, req, user)
}
func (httpWeb *Web) ClusterPools(params martini.Params, r render.Render, req *http.Request, user auth.User) {
clusterName, _ := figureClusterName(params["clusterName"])
r.HTML(200, "templates/cluster_pools", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "cluster pools",
"clusterName": clusterName,
"autoshow_problems": false, // because pool screen by default expands all hosts
"contextMenuVisible": true,
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"removeTextFromHostnameDisplay": config.Config.RemoveTextFromHostnameDisplay,
"compactDisplay": template.JSEscapeString(req.URL.Query().Get("compact")),
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) Search(params martini.Params, r render.Render, req *http.Request, user auth.User) {
searchString := params["searchString"]
if searchString == "" {
searchString = req.URL.Query().Get("s")
}
searchString = template.JSEscapeString(searchString)
r.HTML(200, "templates/search", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "search",
"searchString": searchString,
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) Discover(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/discover", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "discover",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) Audit(params martini.Params, r render.Render, req *http.Request, user auth.User) {
page, err := strconv.Atoi(params["page"])
if err != nil {
page = 0
}
r.HTML(200, "templates/audit", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "audit",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"page": page,
"auditHostname": params["host"],
"auditPort": params["port"],
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) AuditRecovery(params martini.Params, r render.Render, req *http.Request, user auth.User) {
page, err := strconv.Atoi(params["page"])
if err != nil {
page = 0
}
recoveryID, err := strconv.ParseInt(params["id"], 10, 0)
if err != nil {
recoveryID = 0
}
recoveryUID := params["uid"]
clusterAlias := params["clusterAlias"]
clusterName, _ := figureClusterName(params["clusterName"])
r.HTML(200, "templates/audit_recovery", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "audit-recovery",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"page": page,
"clusterName": clusterName,
"clusterAlias": clusterAlias,
"recoveryId": recoveryID,
"recoveryUid": recoveryUID,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) AuditFailureDetection(params martini.Params, r render.Render, req *http.Request, user auth.User) {
page, err := strconv.Atoi(params["page"])
if err != nil {
page = 0
}
detectionID, err := strconv.ParseInt(params["id"], 10, 0)
if err != nil {
detectionID = 0
}
clusterAlias := params["clusterAlias"]
r.HTML(200, "templates/audit_failure_detection", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "audit-failure-detection",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"page": page,
"detectionId": detectionID,
"clusterAlias": clusterAlias,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) Agents(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/agents", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "agents",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) Agent(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/agent", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "agent",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"agentHost": params["host"],
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) AgentSeedDetails(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/agent_seed_details", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "agent seed details",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"seedId": params["seedId"],
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) Seeds(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/seeds", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "seeds",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) Home(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/home", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "home",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) About(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/about", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "about",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) KeepCalm(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/keep-calm", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "Keep Calm",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) FAQ(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/faq", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "FAQ",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) Status(params martini.Params, r render.Render, req *http.Request, user auth.User) {
r.HTML(200, "templates/status", map[string]any{
"agentsHttpActive": config.Config.ServeAgentsHTTP,
"title": "status",
"authorizedForAction": isAuthorizedForAction(req, user),
"userId": getUserID(req, user),
"autoshow_problems": false,
"prefix": httpWeb.URLPrefix,
"webMessage": config.Config.WebMessage,
})
}
func (httpWeb *Web) registerWebRequest(m *martini.ClassicMartini, path string, handler martini.Handler) {
fullPath := fmt.Sprintf("%s/web/%s", httpWeb.URLPrefix, path)
if path == "/" {
fullPath = fmt.Sprintf("%s/", httpWeb.URLPrefix)
}
m.Get(fullPath, handler)
}
// RegisterRequests makes for the de-facto list of known Web calls
func (httpWeb *Web) RegisterRequests(m *martini.ClassicMartini) {
httpWeb.registerWebRequest(m, "access-token", httpWeb.AccessToken)
httpWeb.registerWebRequest(m, "", httpWeb.Index)
httpWeb.registerWebRequest(m, "/", httpWeb.Index)
httpWeb.registerWebRequest(m, "home", httpWeb.About)
httpWeb.registerWebRequest(m, "about", httpWeb.About)
httpWeb.registerWebRequest(m, "keep-calm", httpWeb.KeepCalm)
httpWeb.registerWebRequest(m, "faq", httpWeb.FAQ)
httpWeb.registerWebRequest(m, "status", httpWeb.Status)
httpWeb.registerWebRequest(m, "clusters", httpWeb.Clusters)
httpWeb.registerWebRequest(m, "clusters-analysis", httpWeb.ClustersAnalysis)
httpWeb.registerWebRequest(m, "cluster/:clusterName", httpWeb.Cluster)
httpWeb.registerWebRequest(m, "cluster/alias/:clusterAlias", httpWeb.ClusterByAlias)
httpWeb.registerWebRequest(m, "cluster/instance/:host/:port", httpWeb.ClusterByInstance)
httpWeb.registerWebRequest(m, "cluster-pools/:clusterName", httpWeb.ClusterPools)
httpWeb.registerWebRequest(m, "search/:searchString", httpWeb.Search)
httpWeb.registerWebRequest(m, "search", httpWeb.Search)
httpWeb.registerWebRequest(m, "discover", httpWeb.Discover)
httpWeb.registerWebRequest(m, "audit", httpWeb.Audit)
httpWeb.registerWebRequest(m, "audit/:page", httpWeb.Audit)
httpWeb.registerWebRequest(m, "audit/instance/:host/:port", httpWeb.Audit)
httpWeb.registerWebRequest(m, "audit/instance/:host/:port/:page", httpWeb.Audit)
httpWeb.registerWebRequest(m, "audit-recovery", httpWeb.AuditRecovery)
httpWeb.registerWebRequest(m, "audit-recovery/:page", httpWeb.AuditRecovery)
httpWeb.registerWebRequest(m, "audit-recovery/id/:id", httpWeb.AuditRecovery)
httpWeb.registerWebRequest(m, "audit-recovery/uid/:uid", httpWeb.AuditRecovery)
httpWeb.registerWebRequest(m, "audit-recovery/cluster/:clusterName", httpWeb.AuditRecovery)
httpWeb.registerWebRequest(m, "audit-recovery/cluster/:clusterName/:page", httpWeb.AuditRecovery)
httpWeb.registerWebRequest(m, "audit-recovery/alias/:clusterAlias", httpWeb.AuditRecovery)
httpWeb.registerWebRequest(m, "audit-recovery/alias/:clusterAlias/:page", httpWeb.AuditRecovery)
httpWeb.registerWebRequest(m, "audit-failure-detection", httpWeb.AuditFailureDetection)
httpWeb.registerWebRequest(m, "audit-failure-detection/:page", httpWeb.AuditFailureDetection)
httpWeb.registerWebRequest(m, "audit-failure-detection/id/:id", httpWeb.AuditFailureDetection)
httpWeb.registerWebRequest(m, "audit-failure-detection/alias/:clusterAlias", httpWeb.AuditFailureDetection)
httpWeb.registerWebRequest(m, "audit-failure-detection/alias/:clusterAlias/:page", httpWeb.AuditFailureDetection)
httpWeb.registerWebRequest(m, "audit-recovery-steps/:uid", httpWeb.AuditRecovery)
httpWeb.registerWebRequest(m, "agents", httpWeb.Agents)
httpWeb.registerWebRequest(m, "agent/:host", httpWeb.Agent)
httpWeb.registerWebRequest(m, "seed-details/:seedId", httpWeb.AgentSeedDetails)
httpWeb.registerWebRequest(m, "seeds", httpWeb.Seeds)
httpWeb.RegisterDebug(m)
}
// RegisterDebug adds handlers for /debug/vars (expvar) and /debug/pprof (net/http/pprof) support
func (httpWeb *Web) RegisterDebug(m *martini.ClassicMartini) {
m.Get(httpWeb.URLPrefix+"/debug/vars", func(w http.ResponseWriter, r *http.Request) {
// from expvar.go, since the expvarHandler isn't exported :(
w.Header().Set("Content-Type", "application/json; charset=utf-8")
fmt.Fprintf(w, "{\n")
first := true
expvar.Do(func(kv expvar.KeyValue) {
if !first {
fmt.Fprintf(w, ",\n")
}
first = false
fmt.Fprintf(w, "%q: %s", kv.Key, kv.Value)
})
fmt.Fprintf(w, "\n}\n")
})
// list all the /debug/ endpoints we want
m.Get(httpWeb.URLPrefix+"/debug/pprof", pprof.Index)
m.Get(httpWeb.URLPrefix+"/debug/pprof/cmdline", pprof.Cmdline)
m.Get(httpWeb.URLPrefix+"/debug/pprof/profile", pprof.Profile)
m.Get(httpWeb.URLPrefix+"/debug/pprof/symbol", pprof.Symbol)
m.Post(httpWeb.URLPrefix+"/debug/pprof/symbol", pprof.Symbol)
m.Get(httpWeb.URLPrefix+"/debug/pprof/block", pprof.Handler("block").ServeHTTP)
m.Get(httpWeb.URLPrefix+"/debug/pprof/heap", pprof.Handler("heap").ServeHTTP)
m.Get(httpWeb.URLPrefix+"/debug/pprof/goroutine", pprof.Handler("goroutine").ServeHTTP)
m.Get(httpWeb.URLPrefix+"/debug/pprof/threadcreate", pprof.Handler("threadcreate").ServeHTTP)
// go-metrics
m.Get(httpWeb.URLPrefix+"/debug/metrics", exp.ExpHandler(metrics.DefaultRegistry))
}
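Since the expvar dump is re-implemented verbatim, the endpoint stays compatible with any expvar consumer. A minimal client sketch; the listen address is an assumption, and "cmdline" is one of the variables expvar publishes by default:
package main
import (
	"encoding/json"
	"fmt"
	"net/http"
)
func main() {
	// Assumed address; the real port depends on how VTOrc is started.
	resp, err := http.Get("http://127.0.0.1:3000/debug/vars")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	var vars map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&vars); err != nil {
		panic(err)
	}
	fmt.Println(vars["cmdline"]) // published automatically by expvar
}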


@@ -136,7 +136,6 @@ type ReplicationAnalysis struct {
CountReplicasFailingToConnectToPrimary uint
CountDowntimedReplicas uint
ReplicationDepth uint
Replicas InstanceKeyMap
IsFailingToConnectToPrimary bool
ReplicationStopped bool
Analysis AnalysisCode
@@ -193,12 +192,6 @@ func (replicationAnalysis *ReplicationAnalysis) MarshalJSON() ([]byte, error) {
return json.Marshal(i)
}
// ReadReplicaHostsFromString parses and reads replica keys from comma delimited string
func (replicationAnalysis *ReplicationAnalysis) ReadReplicaHostsFromString(replicaHostsString string) error {
replicationAnalysis.Replicas = *NewInstanceKeyMap()
return replicationAnalysis.Replicas.ReadCommaDelimitedList(replicaHostsString)
}
// AnalysisString returns a human friendly description of all analysis issues
func (replicationAnalysis *ReplicationAnalysis) AnalysisString() string {
result := []string{}


@@ -18,7 +18,6 @@ package inst
import (
"fmt"
"regexp"
"time"
"vitess.io/vitess/go/vt/log"
@@ -149,13 +148,6 @@ func GetReplicationAnalysis(clusterName string, hints *ReplicationAnalysisHints)
0
) AS count_replicas_failing_to_connect_to_primary,
MIN(primary_instance.replication_depth) AS replication_depth,
GROUP_CONCAT(
concat(
replica_instance.Hostname,
':',
replica_instance.Port
)
) as replica_hosts,
MIN(
primary_instance.replica_sql_running = 1
AND primary_instance.replica_io_running = 0
@@ -419,9 +411,6 @@ func GetReplicationAnalysis(clusterName string, hints *ReplicationAnalysisHints)
a.IsBinlogServer = m.GetBool("is_binlog_server")
a.ClusterDetails.ReadRecoveryInfo()
a.Replicas = *NewInstanceKeyMap()
_ = a.Replicas.ReadCommaDelimitedList(m.GetString("replica_hosts"))
countValidOracleGTIDReplicas := m.GetUint("count_valid_oracle_gtid_replicas")
a.OracleGTIDImmediateTopology = countValidOracleGTIDReplicas == a.CountValidReplicas && a.CountValidReplicas > 0
countValidMariaDBGTIDReplicas := m.GetUint("count_valid_mariadb_gtid_replicas")
@@ -605,11 +594,6 @@ func GetReplicationAnalysis(clusterName string, hints *ReplicationAnalysisHints)
if a.Analysis == NoProblem && len(a.StructureAnalysis) == 0 && !hints.IncludeNoProblem {
return
}
for _, filter := range config.Config.RecoveryIgnoreHostnameFilters {
if matched, _ := regexp.MatchString(filter, a.AnalyzedInstanceKey.Hostname); matched {
return
}
}
if a.IsDowntimed {
a.SkippableDueToDowntime = true
}
@@ -772,43 +756,10 @@ func ExpireInstanceAnalysisChangelog() error {
where
analysis_timestamp < now() - interval ? hour
`,
config.Config.UnseenInstanceForgetHours,
config.UnseenInstanceForgetHours,
)
if err != nil {
log.Error(err)
}
return err
}
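This hunk shows the recurring shape of the cleanup: tunables that no longer need to be user-configurable move from fields on config.Config to plain package-level constants. A sketch of the target style; the names follow the diff, but the values are assumptions for illustration:
package config
const (
	// UnseenInstanceForgetHours is how long before an unseen instance is forgotten (assumed value).
	UnseenInstanceForgetHours = 240
	// CandidateInstanceExpireMinutes is how long a candidate suggestion stays fresh (assumed value).
	CandidateInstanceExpireMinutes = 60
)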
// ReadReplicationAnalysisChangelog
func ReadReplicationAnalysisChangelog() (res [](*ReplicationAnalysisChangelog), err error) {
query := `
select
hostname,
port,
analysis_timestamp,
analysis
from
database_instance_analysis_changelog
order by
hostname, port, changelog_id
`
analysisChangelog := &ReplicationAnalysisChangelog{}
err = db.QueryVTOrcRowsMap(query, func(m sqlutils.RowMap) error {
key := InstanceKey{Hostname: m.GetString("hostname"), Port: m.GetInt("port")}
if !analysisChangelog.AnalyzedInstanceKey.Equals(&key) {
analysisChangelog = &ReplicationAnalysisChangelog{AnalyzedInstanceKey: key, Changelog: []string{}}
res = append(res, analysisChangelog)
}
analysisEntry := fmt.Sprintf("%s;%s,", m.GetString("analysis_timestamp"), m.GetString("analysis"))
analysisChangelog.Changelog = append(analysisChangelog.Changelog, analysisEntry)
return nil
})
if err != nil {
log.Error(err)
}
return res, err
}
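The loop above leans on the query's ORDER BY: rows arrive sorted by hostname and port, so a new changelog entry is opened exactly when the instance key changes. A standalone sketch of that accumulation pattern, with string keys standing in for InstanceKey:
package main
import "fmt"
type group struct {
	key     string
	entries []string
}
// groupSorted opens a new group whenever the key of a (key, entry) pair
// differs from the previous one; input must already be sorted by key.
func groupSorted(rows [][2]string) []*group {
	var res []*group
	current := &group{}
	for _, r := range rows {
		if current.key != r[0] {
			current = &group{key: r[0]}
			res = append(res, current)
		}
		current.entries = append(current.entries, r[1])
	}
	return res
}
func main() {
	rows := [][2]string{{"h1:3306", "a"}, {"h1:3306", "b"}, {"h2:3306", "c"}}
	for _, g := range groupSorted(rows) {
		fmt.Println(g.key, g.entries) // h1:3306 [a b], then h2:3306 [c]
	}
}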


@@ -24,7 +24,6 @@ import (
)
func init() {
config.Config.HostnameResolveMethod = "none"
config.MarkConfigurationLoaded()
}


@@ -10,7 +10,6 @@ import (
var testCoordinates = BinlogCoordinates{LogFile: "mysql-bin.000010", LogPos: 108}
func init() {
config.Config.HostnameResolveMethod = "none"
config.MarkConfigurationLoaded()
}


@@ -20,7 +20,6 @@ import (
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/vtorc/external/golib/sqlutils"
"vitess.io/vitess/go/vt/vtctl/reparentutil/promotionrule"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/db"
)
@@ -60,7 +59,7 @@ func ExpireCandidateInstances() error {
_, err := db.ExecVTOrc(`
delete from candidate_database_instance
where last_suggested < NOW() - INTERVAL ? MINUTE
`, config.Config.CandidateInstanceExpireMinutes,
`, config.CandidateInstanceExpireMinutes,
)
if err != nil {
log.Error(err)
@@ -69,45 +68,3 @@ func ExpireCandidateInstances() error {
}
return ExecDBWriteFunc(writeFunc)
}
// BulkReadCandidateDatabaseInstance returns a slice of
// CandidateDatabaseInstance converted to JSON.
/*
root@myVTOrc [vtorc]> select * from candidate_database_instance;
+-------------------+------+---------------------+----------+----------------+
| hostname | port | last_suggested | priority | promotion_rule |
+-------------------+------+---------------------+----------+----------------+
| host1.example.com | 3306 | 2016-11-22 17:41:06 | 1 | prefer |
| host2.example.com | 3306 | 2016-11-22 17:40:24 | 1 | prefer |
+-------------------+------+---------------------+----------+----------------+
2 rows in set (0.00 sec)
*/
func BulkReadCandidateDatabaseInstance() ([]CandidateDatabaseInstance, error) {
var candidateDatabaseInstances []CandidateDatabaseInstance
// Read all promotion rules from the table
query := `
SELECT
hostname,
port,
promotion_rule,
last_suggested,
last_suggested + INTERVAL ? MINUTE AS promotion_rule_expiry
FROM
candidate_database_instance
`
err := db.QueryVTOrc(query, sqlutils.Args(config.Config.CandidateInstanceExpireMinutes), func(m sqlutils.RowMap) error {
cdi := CandidateDatabaseInstance{
Hostname: m.GetString("hostname"),
Port: m.GetInt("port"),
PromotionRule: promotionrule.CandidatePromotionRule(m.GetString("promotion_rule")),
LastSuggestedString: m.GetString("last_suggested"),
PromotionRuleExpiry: m.GetString("promotion_rule_expiry"),
}
// add to end of candidateDatabaseInstances
candidateDatabaseInstances = append(candidateDatabaseInstances, cdi)
return nil
})
return candidateDatabaseInstances, err
}


@@ -16,12 +16,6 @@
package inst
import (
"regexp"
"vitess.io/vitess/go/vt/vtorc/config"
)
// ClusterInfo makes for a cluster status/info summary
type ClusterInfo struct {
ClusterName string
@@ -34,21 +28,6 @@ type ClusterInfo struct {
// ReadRecoveryInfo
func (clusterInfo *ClusterInfo) ReadRecoveryInfo() {
clusterInfo.HasAutomatedPrimaryRecovery = clusterInfo.filtersMatchCluster(config.Config.RecoverPrimaryClusterFilters)
clusterInfo.HasAutomatedIntermediatePrimaryRecovery = clusterInfo.filtersMatchCluster(config.Config.RecoverIntermediatePrimaryClusterFilters)
}
// filtersMatchCluster will see whether the given filters match the given cluster details
func (clusterInfo *ClusterInfo) filtersMatchCluster(filters []string) bool {
for _, filter := range filters {
if filter == clusterInfo.ClusterName {
return true
}
if filter == "*" {
return true
} else if matched, _ := regexp.MatchString(filter, clusterInfo.ClusterName); matched && filter != "" {
return true
}
}
return false
clusterInfo.HasAutomatedPrimaryRecovery = true
clusterInfo.HasAutomatedIntermediatePrimaryRecovery = true
}


@@ -49,7 +49,7 @@ func ExpireClusterDomainName() error {
_, err := db.ExecVTOrc(`
delete from cluster_domain_name
where last_registered < NOW() - INTERVAL ? MINUTE
`, config.Config.ExpiryHostnameResolvesMinutes,
`, config.ExpiryHostnameResolvesMinutes,
)
if err != nil {
log.Error(err)


@@ -19,15 +19,10 @@ package inst
import (
"database/sql"
"encoding/json"
"fmt"
"strconv"
"strings"
"time"
math "vitess.io/vitess/go/vt/vtorc/util"
"vitess.io/vitess/go/vt/vtctl/reparentutil/promotionrule"
"vitess.io/vitess/go/vt/vtorc/config"
)
const ReasonableDiscoveryLatency = 500 * time.Millisecond
@@ -79,7 +74,6 @@ type Instance struct {
primaryExecutedGtidSet string // Not exported
ReplicationLagSeconds sql.NullInt64
Replicas InstanceKeyMap
ClusterName string
DataCenter string
Region string
@@ -101,7 +95,6 @@ type Instance struct {
IsUpToDate bool
IsRecentlyChecked bool
SecondsSinceLastSeen sql.NullInt64
CountMySQLSnapshots int
// Careful. IsCandidate and PromotionRule are used together
// and probably need to be merged. IsCandidate's value may
@@ -145,7 +138,6 @@ type Instance struct {
func NewInstance() *Instance {
return &Instance{
Replicas: make(map[InstanceKey]bool),
ReplicationGroupMembers: make(map[InstanceKey]bool),
Problems: []string{},
}
@@ -345,256 +337,7 @@ func (instance *Instance) UsingGTID() bool {
return instance.UsingOracleGTID || instance.UsingMariaDBGTID
}
// NextGTID returns the next (Oracle) GTID to be executed. Useful for skipping queries
func (instance *Instance) NextGTID() (string, error) {
if instance.ExecutedGtidSet == "" {
return "", fmt.Errorf("No value found in Executed_Gtid_Set; cannot compute NextGTID")
}
firstToken := func(s string, delimiter string) string {
tokens := strings.Split(s, delimiter)
return tokens[0]
}
lastToken := func(s string, delimiter string) string {
tokens := strings.Split(s, delimiter)
return tokens[len(tokens)-1]
}
// executed GTID set: 4f6d62ed-df65-11e3-b395-60672090eb04:1,b9b4712a-df64-11e3-b391-60672090eb04:1-6
executedGTIDsFromPrimary := lastToken(instance.ExecutedGtidSet, ",")
// executedGTIDsFromPrimary: b9b4712a-df64-11e3-b391-60672090eb04:1-6
executedRange := lastToken(executedGTIDsFromPrimary, ":")
// executedRange: 1-6
lastExecutedNumberToken := lastToken(executedRange, "-")
// lastExecutedNumber: 6
lastExecutedNumber, err := strconv.Atoi(lastExecutedNumberToken)
if err != nil {
return "", err
}
nextNumber := lastExecutedNumber + 1
nextGTID := fmt.Sprintf("%s:%d", firstToken(executedGTIDsFromPrimary, ":"), nextNumber)
return nextGTID, nil
}
// AddReplicaKey adds a replica to the list of this instance's replicas.
func (instance *Instance) AddReplicaKey(replicaKey *InstanceKey) {
instance.Replicas.AddKey(*replicaKey)
}
// AddGroupMemberKey adds a group member to the list of this instance's group members.
func (instance *Instance) AddGroupMemberKey(groupMemberKey *InstanceKey) {
instance.ReplicationGroupMembers.AddKey(*groupMemberKey)
}
// GetNextBinaryLog returns the binary log file, if any, that follows the given one
func (instance *Instance) GetNextBinaryLog(binlogCoordinates BinlogCoordinates) (BinlogCoordinates, error) {
if binlogCoordinates.LogFile == instance.SelfBinlogCoordinates.LogFile {
return binlogCoordinates, fmt.Errorf("Cannot find next binary log for %+v", binlogCoordinates)
}
return binlogCoordinates.NextFileCoordinates()
}
// IsReplicaOf returns true if this instance claims to replicate from given primary
func (instance *Instance) IsReplicaOf(primary *Instance) bool {
return instance.SourceKey.Equals(&primary.Key)
}
// IsPrimaryOf returns true if this instance is the supposed primary of the given replica
func (instance *Instance) IsPrimaryOf(replica *Instance) bool {
return replica.IsReplicaOf(instance)
}
// IsDescendantOf returns true if this instance replicates, directly or indirectly, from the other instance
func (instance *Instance) IsDescendantOf(other *Instance) bool {
for _, uuid := range strings.Split(instance.AncestryUUID, ",") {
if uuid == other.ServerUUID && uuid != "" {
return true
}
}
return false
}
// CanReplicateFrom uses heuristics to decide whether this instance can practically replicate from the other instance.
// Checks are made on binlog format, version number, binary logs, etc.
func (instance *Instance) CanReplicateFrom(other *Instance) (bool, error) {
if instance.Key.Equals(&other.Key) {
return false, fmt.Errorf("instance cannot replicate from itself: %+v", instance.Key)
}
if !other.LogBinEnabled {
return false, fmt.Errorf("instance does not have binary logs enabled: %+v", other.Key)
}
if other.IsReplica() {
if !other.LogReplicationUpdatesEnabled {
return false, fmt.Errorf("instance does not have log_slave_updates enabled: %+v", other.Key)
}
// OK for a primary to not have log_slave_updates
// Not OK for a replica, for it has to relay the logs.
}
if instance.IsSmallerMajorVersion(other) && !instance.IsBinlogServer() {
return false, fmt.Errorf("instance %+v has version %s, which is lower than %s on %+v ", instance.Key, instance.Version, other.Version, other.Key)
}
if instance.LogBinEnabled && instance.LogReplicationUpdatesEnabled {
if instance.IsSmallerBinlogFormat(other) {
return false, fmt.Errorf("Cannot replicate from %+v binlog format on %+v to %+v on %+v", other.BinlogFormat, other.Key, instance.BinlogFormat, instance.Key)
}
}
if config.Config.VerifyReplicationFilters {
if other.HasReplicationFilters && !instance.HasReplicationFilters {
return false, fmt.Errorf("%+v has replication filters", other.Key)
}
}
if instance.ServerID == other.ServerID && !instance.IsBinlogServer() {
return false, fmt.Errorf("Identical server id: %+v, %+v both have %d", other.Key, instance.Key, instance.ServerID)
}
if instance.ServerUUID == other.ServerUUID && instance.ServerUUID != "" && !instance.IsBinlogServer() {
return false, fmt.Errorf("Identical server UUID: %+v, %+v both have %s", other.Key, instance.Key, instance.ServerUUID)
}
if instance.SQLDelay < other.SQLDelay && int64(other.SQLDelay) > int64(config.Config.ReasonableMaintenanceReplicationLagSeconds) {
return false, fmt.Errorf("%+v has higher SQL_Delay (%+v seconds) than %+v does (%+v seconds)", other.Key, other.SQLDelay, instance.Key, instance.SQLDelay)
}
return true, nil
}
// HasReasonableMaintenanceReplicationLag returns true when the replica lag is reasonable, and maintenance operations should have a green light to go.
func (instance *Instance) HasReasonableMaintenanceReplicationLag() bool {
// replicas with SQLDelay are a special case
if instance.SQLDelay > 0 {
return math.AbsInt64(instance.SecondsBehindPrimary.Int64-int64(instance.SQLDelay)) <= int64(config.Config.ReasonableMaintenanceReplicationLagSeconds)
}
return instance.SecondsBehindPrimary.Int64 <= int64(config.Config.ReasonableMaintenanceReplicationLagSeconds)
}
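The SQLDelay branch above measures lag net of the configured delay, so an intentionally delayed replica is not flagged as lagging. A worked sketch of the arithmetic; the 20-second threshold stands in for ReasonableMaintenanceReplicationLagSeconds and is an assumed value:
package main
import "fmt"
// reasonableLag mirrors HasReasonableMaintenanceReplicationLag's arithmetic.
func reasonableLag(secondsBehind, sqlDelay, threshold int64) bool {
	if sqlDelay > 0 {
		d := secondsBehind - sqlDelay
		if d < 0 {
			d = -d
		}
		return d <= threshold
	}
	return secondsBehind <= threshold
}
func main() {
	fmt.Println(reasonableLag(125, 120, 20)) // true: |125-120| = 5s of net lag
	fmt.Println(reasonableLag(125, 0, 20))   // false: 125s exceeds the threshold
}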
// CanMove returns true if this instance's state allows it to be repositioned. For example,
// if this instance lags too much, it will not be moveable.
func (instance *Instance) CanMove() (bool, error) {
if !instance.IsLastCheckValid {
return false, fmt.Errorf("%+v: last check invalid", instance.Key)
}
if !instance.IsRecentlyChecked {
return false, fmt.Errorf("%+v: not recently checked", instance.Key)
}
if !instance.ReplicationSQLThreadState.IsRunning() {
return false, fmt.Errorf("%+v: instance is not replicating", instance.Key)
}
if !instance.ReplicationIOThreadState.IsRunning() {
return false, fmt.Errorf("%+v: instance is not replicating", instance.Key)
}
if !instance.SecondsBehindPrimary.Valid {
return false, fmt.Errorf("%+v: cannot determine replication lag", instance.Key)
}
if !instance.HasReasonableMaintenanceReplicationLag() {
return false, fmt.Errorf("%+v: lags too much", instance.Key)
}
return true, nil
}
// CanMoveAsCoPrimary returns true if this instance's state allows it to be repositioned.
func (instance *Instance) CanMoveAsCoPrimary() (bool, error) {
if !instance.IsLastCheckValid {
return false, fmt.Errorf("%+v: last check invalid", instance.Key)
}
if !instance.IsRecentlyChecked {
return false, fmt.Errorf("%+v: not recently checked", instance.Key)
}
return true, nil
}
// StatusString returns a human readable description of this instance's status
func (instance *Instance) StatusString() string {
if !instance.IsLastCheckValid {
return "invalid"
}
if !instance.IsRecentlyChecked {
return "unchecked"
}
if instance.IsReplica() && !instance.ReplicaRunning() {
return "nonreplicating"
}
if instance.IsReplica() && !instance.HasReasonableMaintenanceReplicationLag() {
return "lag"
}
return "ok"
}
// LagStatusString returns a human readable representation of current lag
func (instance *Instance) LagStatusString() string {
if instance.IsDetached {
return "detached"
}
if !instance.IsLastCheckValid {
return "unknown"
}
if !instance.IsRecentlyChecked {
return "unknown"
}
if instance.IsReplica() && !instance.ReplicaRunning() {
return "null"
}
if instance.IsReplica() && !instance.SecondsBehindPrimary.Valid {
return "null"
}
if instance.IsReplica() && instance.ReplicationLagSeconds.Int64 > int64(config.Config.ReasonableMaintenanceReplicationLagSeconds) {
return fmt.Sprintf("%+vs", instance.ReplicationLagSeconds.Int64)
}
return fmt.Sprintf("%+vs", instance.ReplicationLagSeconds.Int64)
}
func (instance *Instance) descriptionTokens() (tokens []string) {
tokens = append(tokens, instance.LagStatusString())
tokens = append(tokens, instance.StatusString())
tokens = append(tokens, instance.Version)
if instance.ReadOnly {
tokens = append(tokens, "ro")
} else {
tokens = append(tokens, "rw")
}
if instance.LogBinEnabled {
tokens = append(tokens, instance.BinlogFormat)
} else {
tokens = append(tokens, "nobinlog")
}
{
extraTokens := []string{}
if instance.LogBinEnabled && instance.LogReplicationUpdatesEnabled {
extraTokens = append(extraTokens, ">>")
}
if instance.UsingGTID() || instance.SupportsOracleGTID {
token := "GTID"
if instance.GtidErrant != "" {
token = fmt.Sprintf("%s:errant", token)
}
extraTokens = append(extraTokens, token)
}
if instance.SemiSyncPrimaryStatus {
extraTokens = append(extraTokens, "semi:primary")
}
if instance.SemiSyncReplicaStatus {
extraTokens = append(extraTokens, "semi:replica")
}
if instance.IsDowntimed {
extraTokens = append(extraTokens, "downtimed")
}
tokens = append(tokens, strings.Join(extraTokens, ","))
}
return tokens
}
// HumanReadableDescription returns a simple readable string describing the status, version,
// etc. properties of this instance
func (instance *Instance) HumanReadableDescription() string {
tokens := instance.descriptionTokens()
nonEmptyTokens := []string{}
for _, token := range tokens {
if token != "" {
nonEmptyTokens = append(nonEmptyTokens, token)
}
}
description := fmt.Sprintf("[%s]", strings.Join(nonEmptyTokens, ","))
return description
}
// TabulatedDescription returns a simple tabulated string of various properties
func (instance *Instance) TabulatedDescription(separator string) string {
tokens := instance.descriptionTokens()
description := strings.Join(tokens, separator)
return description
}


@@ -1,52 +0,0 @@
/*
Copyright 2014 Outbrain Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package inst
import (
"fmt"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/vtorc/db"
"vitess.io/vitess/go/vt/vtorc/external/golib/sqlutils"
)
func GetPreviousGTIDs(instanceKey *InstanceKey, binlog string) (previousGTIDs *OracleGtidSet, err error) {
if binlog == "" {
errMsg := fmt.Sprintf("GetPreviousGTIDs: empty binlog file name for %+v", *instanceKey)
log.Errorf(errMsg)
return nil, fmt.Errorf(errMsg)
}
db, err := db.OpenTopology(instanceKey.Hostname, instanceKey.Port)
if err != nil {
return nil, err
}
query := fmt.Sprintf("show binlog events in '%s' LIMIT 5", binlog)
err = sqlutils.QueryRowsMapBuffered(db, query, func(m sqlutils.RowMap) error {
eventType := m.GetString("Event_type")
if eventType == "Previous_gtids" {
var e error
if previousGTIDs, e = NewOracleGtidSet(m.GetString("Info")); e != nil {
return e
}
}
return nil
})
return previousGTIDs, err
}

(Diff between files not shown because of its large size.)

@@ -60,17 +60,17 @@ func TestMkInsertOdkuSingle(t *testing.T) {
version, major_version, version_comment, binlog_server, read_only, binlog_format,
binlog_row_image, log_bin, log_replica_updates, binary_log_file, binary_log_pos, source_host, source_port,
replica_sql_running, replica_io_running, replication_sql_thread_state, replication_io_thread_state, has_replication_filters, supports_oracle_gtid, oracle_gtid, source_uuid, ancestry_uuid, executed_gtid_set, gtid_mode, gtid_purged, gtid_errant, mariadb_gtid, pseudo_gtid,
source_log_file, read_source_log_pos, relay_source_log_file, exec_source_log_pos, relay_log_file, relay_log_pos, last_sql_error, last_io_error, replication_lag_seconds, replica_lag_seconds, sql_delay, num_replica_hosts, replica_hosts, cluster_name, data_center, region, physical_environment, replication_depth, is_co_primary, has_replication_credentials, allow_tls, semi_sync_enforced, semi_sync_primary_enabled, semi_sync_primary_timeout, semi_sync_primary_wait_for_replica_count, semi_sync_replica_enabled, semi_sync_primary_status, semi_sync_primary_clients, semi_sync_replica_status, instance_alias, last_discovery_latency, replication_group_name, replication_group_is_single_primary_mode, replication_group_member_state, replication_group_member_role, replication_group_members, replication_group_primary_host, replication_group_primary_port, last_seen)
source_log_file, read_source_log_pos, relay_source_log_file, exec_source_log_pos, relay_log_file, relay_log_pos, last_sql_error, last_io_error, replication_lag_seconds, replica_lag_seconds, sql_delay, cluster_name, data_center, region, physical_environment, replication_depth, is_co_primary, has_replication_credentials, allow_tls, semi_sync_enforced, semi_sync_primary_enabled, semi_sync_primary_timeout, semi_sync_primary_wait_for_replica_count, semi_sync_replica_enabled, semi_sync_primary_status, semi_sync_primary_clients, semi_sync_replica_status, instance_alias, last_discovery_latency, replication_group_name, replication_group_is_single_primary_mode, replication_group_member_state, replication_group_member_role, replication_group_members, replication_group_primary_host, replication_group_primary_port, last_seen)
VALUES
(?, ?, NOW(), NOW(), 1, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NOW())
(?, ?, NOW(), NOW(), 1, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NOW())
ON DUPLICATE KEY UPDATE
hostname=VALUES(hostname), port=VALUES(port), last_checked=VALUES(last_checked), last_attempted_check=VALUES(last_attempted_check), last_check_partial_success=VALUES(last_check_partial_success), server_id=VALUES(server_id), server_uuid=VALUES(server_uuid), version=VALUES(version), major_version=VALUES(major_version), version_comment=VALUES(version_comment), binlog_server=VALUES(binlog_server), read_only=VALUES(read_only), binlog_format=VALUES(binlog_format), binlog_row_image=VALUES(binlog_row_image), log_bin=VALUES(log_bin), log_replica_updates=VALUES(log_replica_updates), binary_log_file=VALUES(binary_log_file), binary_log_pos=VALUES(binary_log_pos), source_host=VALUES(source_host), source_port=VALUES(source_port), replica_sql_running=VALUES(replica_sql_running), replica_io_running=VALUES(replica_io_running), replication_sql_thread_state=VALUES(replication_sql_thread_state), replication_io_thread_state=VALUES(replication_io_thread_state), has_replication_filters=VALUES(has_replication_filters), supports_oracle_gtid=VALUES(supports_oracle_gtid), oracle_gtid=VALUES(oracle_gtid), source_uuid=VALUES(source_uuid), ancestry_uuid=VALUES(ancestry_uuid), executed_gtid_set=VALUES(executed_gtid_set), gtid_mode=VALUES(gtid_mode), gtid_purged=VALUES(gtid_purged), gtid_errant=VALUES(gtid_errant), mariadb_gtid=VALUES(mariadb_gtid), pseudo_gtid=VALUES(pseudo_gtid), source_log_file=VALUES(source_log_file), read_source_log_pos=VALUES(read_source_log_pos), relay_source_log_file=VALUES(relay_source_log_file), exec_source_log_pos=VALUES(exec_source_log_pos), relay_log_file=VALUES(relay_log_file), relay_log_pos=VALUES(relay_log_pos), last_sql_error=VALUES(last_sql_error), last_io_error=VALUES(last_io_error), replication_lag_seconds=VALUES(replication_lag_seconds), replica_lag_seconds=VALUES(replica_lag_seconds), sql_delay=VALUES(sql_delay), num_replica_hosts=VALUES(num_replica_hosts), replica_hosts=VALUES(replica_hosts), cluster_name=VALUES(cluster_name), data_center=VALUES(data_center), region=VALUES(region), physical_environment=VALUES(physical_environment), replication_depth=VALUES(replication_depth), is_co_primary=VALUES(is_co_primary), has_replication_credentials=VALUES(has_replication_credentials), allow_tls=VALUES(allow_tls),
hostname=VALUES(hostname), port=VALUES(port), last_checked=VALUES(last_checked), last_attempted_check=VALUES(last_attempted_check), last_check_partial_success=VALUES(last_check_partial_success), server_id=VALUES(server_id), server_uuid=VALUES(server_uuid), version=VALUES(version), major_version=VALUES(major_version), version_comment=VALUES(version_comment), binlog_server=VALUES(binlog_server), read_only=VALUES(read_only), binlog_format=VALUES(binlog_format), binlog_row_image=VALUES(binlog_row_image), log_bin=VALUES(log_bin), log_replica_updates=VALUES(log_replica_updates), binary_log_file=VALUES(binary_log_file), binary_log_pos=VALUES(binary_log_pos), source_host=VALUES(source_host), source_port=VALUES(source_port), replica_sql_running=VALUES(replica_sql_running), replica_io_running=VALUES(replica_io_running), replication_sql_thread_state=VALUES(replication_sql_thread_state), replication_io_thread_state=VALUES(replication_io_thread_state), has_replication_filters=VALUES(has_replication_filters), supports_oracle_gtid=VALUES(supports_oracle_gtid), oracle_gtid=VALUES(oracle_gtid), source_uuid=VALUES(source_uuid), ancestry_uuid=VALUES(ancestry_uuid), executed_gtid_set=VALUES(executed_gtid_set), gtid_mode=VALUES(gtid_mode), gtid_purged=VALUES(gtid_purged), gtid_errant=VALUES(gtid_errant), mariadb_gtid=VALUES(mariadb_gtid), pseudo_gtid=VALUES(pseudo_gtid), source_log_file=VALUES(source_log_file), read_source_log_pos=VALUES(read_source_log_pos), relay_source_log_file=VALUES(relay_source_log_file), exec_source_log_pos=VALUES(exec_source_log_pos), relay_log_file=VALUES(relay_log_file), relay_log_pos=VALUES(relay_log_pos), last_sql_error=VALUES(last_sql_error), last_io_error=VALUES(last_io_error), replication_lag_seconds=VALUES(replication_lag_seconds), replica_lag_seconds=VALUES(replica_lag_seconds), sql_delay=VALUES(sql_delay), cluster_name=VALUES(cluster_name), data_center=VALUES(data_center), region=VALUES(region), physical_environment=VALUES(physical_environment), replication_depth=VALUES(replication_depth), is_co_primary=VALUES(is_co_primary), has_replication_credentials=VALUES(has_replication_credentials), allow_tls=VALUES(allow_tls),
semi_sync_enforced=VALUES(semi_sync_enforced), semi_sync_primary_enabled=VALUES(semi_sync_primary_enabled), semi_sync_primary_timeout=VALUES(semi_sync_primary_timeout), semi_sync_primary_wait_for_replica_count=VALUES(semi_sync_primary_wait_for_replica_count), semi_sync_replica_enabled=VALUES(semi_sync_replica_enabled), semi_sync_primary_status=VALUES(semi_sync_primary_status), semi_sync_primary_clients=VALUES(semi_sync_primary_clients), semi_sync_replica_status=VALUES(semi_sync_replica_status),
instance_alias=VALUES(instance_alias), last_discovery_latency=VALUES(last_discovery_latency), replication_group_name=VALUES(replication_group_name), replication_group_is_single_primary_mode=VALUES(replication_group_is_single_primary_mode), replication_group_member_state=VALUES(replication_group_member_state), replication_group_member_role=VALUES(replication_group_member_role), replication_group_members=VALUES(replication_group_members), replication_group_primary_host=VALUES(replication_group_primary_host), replication_group_primary_port=VALUES(replication_group_primary_port), last_seen=VALUES(last_seen)
`
a1 := `i710, 3306, 710, , 5.6.7, 5.6, MySQL, false, false, STATEMENT,
FULL, false, false, , 0, , 0,
false, false, 0, 0, false, false, false, , , , , , , false, false, , 0, mysql.000007, 10, , 0, , , {0 false}, {0 false}, 0, 0, [], , , , , 0, false, false, false, false, false, 0, 0, false, false, 0, false, , 0, , false, , , [], , 0, `
false, false, 0, 0, false, false, false, , , , , , , false, false, , 0, mysql.000007, 10, , 0, , , {0 false}, {0 false}, 0, , , , , 0, false, false, false, false, false, 0, 0, false, false, 0, false, , 0, , false, , , [], , 0, `
sql1, args1, err := mkInsertOdkuForInstances(instances[:1], false, true)
test.S(t).ExpectNil(err)
@@ -83,22 +83,22 @@ func TestMkInsertOdkuThree(t *testing.T) {
// three instances
s3 := `INSERT INTO database_instance
(hostname, port, last_checked, last_attempted_check, last_check_partial_success, server_id, server_uuid, version, major_version, version_comment, binlog_server, read_only, binlog_format, binlog_row_image, log_bin, log_replica_updates, binary_log_file, binary_log_pos, source_host, source_port, replica_sql_running, replica_io_running, replication_sql_thread_state, replication_io_thread_state, has_replication_filters, supports_oracle_gtid, oracle_gtid, source_uuid, ancestry_uuid, executed_gtid_set, gtid_mode, gtid_purged, gtid_errant, mariadb_gtid, pseudo_gtid, source_log_file, read_source_log_pos, relay_source_log_file, exec_source_log_pos, relay_log_file, relay_log_pos, last_sql_error, last_io_error, replication_lag_seconds, replica_lag_seconds, sql_delay, num_replica_hosts, replica_hosts, cluster_name, data_center, region, physical_environment, replication_depth, is_co_primary, has_replication_credentials, allow_tls, semi_sync_enforced, semi_sync_primary_enabled, semi_sync_primary_timeout, semi_sync_primary_wait_for_replica_count,
(hostname, port, last_checked, last_attempted_check, last_check_partial_success, server_id, server_uuid, version, major_version, version_comment, binlog_server, read_only, binlog_format, binlog_row_image, log_bin, log_replica_updates, binary_log_file, binary_log_pos, source_host, source_port, replica_sql_running, replica_io_running, replication_sql_thread_state, replication_io_thread_state, has_replication_filters, supports_oracle_gtid, oracle_gtid, source_uuid, ancestry_uuid, executed_gtid_set, gtid_mode, gtid_purged, gtid_errant, mariadb_gtid, pseudo_gtid, source_log_file, read_source_log_pos, relay_source_log_file, exec_source_log_pos, relay_log_file, relay_log_pos, last_sql_error, last_io_error, replication_lag_seconds, replica_lag_seconds, sql_delay, cluster_name, data_center, region, physical_environment, replication_depth, is_co_primary, has_replication_credentials, allow_tls, semi_sync_enforced, semi_sync_primary_enabled, semi_sync_primary_timeout, semi_sync_primary_wait_for_replica_count,
semi_sync_replica_enabled, semi_sync_primary_status, semi_sync_primary_clients, semi_sync_replica_status, instance_alias, last_discovery_latency, replication_group_name, replication_group_is_single_primary_mode, replication_group_member_state, replication_group_member_role, replication_group_members, replication_group_primary_host, replication_group_primary_port, last_seen)
VALUES
(?, ?, NOW(), NOW(), 1, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NOW()),
(?, ?, NOW(), NOW(), 1, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NOW()),
(?, ?, NOW(), NOW(), 1, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NOW())
(?, ?, NOW(), NOW(), 1, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NOW()),
(?, ?, NOW(), NOW(), 1, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NOW()),
(?, ?, NOW(), NOW(), 1, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, NOW())
ON DUPLICATE KEY UPDATE
hostname=VALUES(hostname), port=VALUES(port), last_checked=VALUES(last_checked), last_attempted_check=VALUES(last_attempted_check), last_check_partial_success=VALUES(last_check_partial_success), server_id=VALUES(server_id), server_uuid=VALUES(server_uuid), version=VALUES(version), major_version=VALUES(major_version), version_comment=VALUES(version_comment), binlog_server=VALUES(binlog_server), read_only=VALUES(read_only), binlog_format=VALUES(binlog_format), binlog_row_image=VALUES(binlog_row_image), log_bin=VALUES(log_bin), log_replica_updates=VALUES(log_replica_updates), binary_log_file=VALUES(binary_log_file), binary_log_pos=VALUES(binary_log_pos), source_host=VALUES(source_host), source_port=VALUES(source_port), replica_sql_running=VALUES(replica_sql_running), replica_io_running=VALUES(replica_io_running), replication_sql_thread_state=VALUES(replication_sql_thread_state), replication_io_thread_state=VALUES(replication_io_thread_state), has_replication_filters=VALUES(has_replication_filters), supports_oracle_gtid=VALUES(supports_oracle_gtid), oracle_gtid=VALUES(oracle_gtid), source_uuid=VALUES(source_uuid), ancestry_uuid=VALUES(ancestry_uuid), executed_gtid_set=VALUES(executed_gtid_set), gtid_mode=VALUES(gtid_mode), gtid_purged=VALUES(gtid_purged), gtid_errant=VALUES(gtid_errant), mariadb_gtid=VALUES(mariadb_gtid), pseudo_gtid=VALUES(pseudo_gtid), source_log_file=VALUES(source_log_file), read_source_log_pos=VALUES(read_source_log_pos), relay_source_log_file=VALUES(relay_source_log_file), exec_source_log_pos=VALUES(exec_source_log_pos), relay_log_file=VALUES(relay_log_file), relay_log_pos=VALUES(relay_log_pos), last_sql_error=VALUES(last_sql_error), last_io_error=VALUES(last_io_error), replication_lag_seconds=VALUES(replication_lag_seconds), replica_lag_seconds=VALUES(replica_lag_seconds), sql_delay=VALUES(sql_delay), num_replica_hosts=VALUES(num_replica_hosts), replica_hosts=VALUES(replica_hosts), cluster_name=VALUES(cluster_name), data_center=VALUES(data_center), region=VALUES(region),
hostname=VALUES(hostname), port=VALUES(port), last_checked=VALUES(last_checked), last_attempted_check=VALUES(last_attempted_check), last_check_partial_success=VALUES(last_check_partial_success), server_id=VALUES(server_id), server_uuid=VALUES(server_uuid), version=VALUES(version), major_version=VALUES(major_version), version_comment=VALUES(version_comment), binlog_server=VALUES(binlog_server), read_only=VALUES(read_only), binlog_format=VALUES(binlog_format), binlog_row_image=VALUES(binlog_row_image), log_bin=VALUES(log_bin), log_replica_updates=VALUES(log_replica_updates), binary_log_file=VALUES(binary_log_file), binary_log_pos=VALUES(binary_log_pos), source_host=VALUES(source_host), source_port=VALUES(source_port), replica_sql_running=VALUES(replica_sql_running), replica_io_running=VALUES(replica_io_running), replication_sql_thread_state=VALUES(replication_sql_thread_state), replication_io_thread_state=VALUES(replication_io_thread_state), has_replication_filters=VALUES(has_replication_filters), supports_oracle_gtid=VALUES(supports_oracle_gtid), oracle_gtid=VALUES(oracle_gtid), source_uuid=VALUES(source_uuid), ancestry_uuid=VALUES(ancestry_uuid), executed_gtid_set=VALUES(executed_gtid_set), gtid_mode=VALUES(gtid_mode), gtid_purged=VALUES(gtid_purged), gtid_errant=VALUES(gtid_errant), mariadb_gtid=VALUES(mariadb_gtid), pseudo_gtid=VALUES(pseudo_gtid), source_log_file=VALUES(source_log_file), read_source_log_pos=VALUES(read_source_log_pos), relay_source_log_file=VALUES(relay_source_log_file), exec_source_log_pos=VALUES(exec_source_log_pos), relay_log_file=VALUES(relay_log_file), relay_log_pos=VALUES(relay_log_pos), last_sql_error=VALUES(last_sql_error), last_io_error=VALUES(last_io_error), replication_lag_seconds=VALUES(replication_lag_seconds), replica_lag_seconds=VALUES(replica_lag_seconds), sql_delay=VALUES(sql_delay), cluster_name=VALUES(cluster_name), data_center=VALUES(data_center), region=VALUES(region),
physical_environment=VALUES(physical_environment), replication_depth=VALUES(replication_depth), is_co_primary=VALUES(is_co_primary), has_replication_credentials=VALUES(has_replication_credentials), allow_tls=VALUES(allow_tls), semi_sync_enforced=VALUES(semi_sync_enforced),
semi_sync_primary_enabled=VALUES(semi_sync_primary_enabled), semi_sync_primary_timeout=VALUES(semi_sync_primary_timeout), semi_sync_primary_wait_for_replica_count=VALUES(semi_sync_primary_wait_for_replica_count), semi_sync_replica_enabled=VALUES(semi_sync_replica_enabled), semi_sync_primary_status=VALUES(semi_sync_primary_status), semi_sync_primary_clients=VALUES(semi_sync_primary_clients), semi_sync_replica_status=VALUES(semi_sync_replica_status),
instance_alias=VALUES(instance_alias), last_discovery_latency=VALUES(last_discovery_latency), replication_group_name=VALUES(replication_group_name), replication_group_is_single_primary_mode=VALUES(replication_group_is_single_primary_mode), replication_group_member_state=VALUES(replication_group_member_state), replication_group_member_role=VALUES(replication_group_member_role), replication_group_members=VALUES(replication_group_members), replication_group_primary_host=VALUES(replication_group_primary_host), replication_group_primary_port=VALUES(replication_group_primary_port), last_seen=VALUES(last_seen)
`
a3 := `
i710, 3306, 710, , 5.6.7, 5.6, MySQL, false, false, STATEMENT, FULL, false, false, , 0, , 0, false, false, 0, 0, false, false, false, , , , , , , false, false, , 0, mysql.000007, 10, , 0, , , {0 false}, {0 false}, 0, 0, [], , , , , 0, false, false, false, false, false, 0, 0, false, false, 0, false, , 0, , false, , , [], , 0,
i720, 3306, 720, , 5.6.7, 5.6, MySQL, false, false, STATEMENT, FULL, false, false, , 0, , 0, false, false, 0, 0, false, false, false, , , , , , , false, false, , 0, mysql.000007, 20, , 0, , , {0 false}, {0 false}, 0, 0, [], , , , , 0, false, false, false, false, false, 0, 0, false, false, 0, false, , 0, , false, , , [], , 0,
i730, 3306, 730, , 5.6.7, 5.6, MySQL, false, false, STATEMENT, FULL, false, false, , 0, , 0, false, false, 0, 0, false, false, false, , , , , , , false, false, , 0, mysql.000007, 30, , 0, , , {0 false}, {0 false}, 0, 0, [], , , , , 0, false, false, false, false, false, 0, 0, false, false, 0, false, , 0, , false, , , [], , 0,
i710, 3306, 710, , 5.6.7, 5.6, MySQL, false, false, STATEMENT, FULL, false, false, , 0, , 0, false, false, 0, 0, false, false, false, , , , , , , false, false, , 0, mysql.000007, 10, , 0, , , {0 false}, {0 false}, 0, , , , , 0, false, false, false, false, false, 0, 0, false, false, 0, false, , 0, , false, , , [], , 0,
i720, 3306, 720, , 5.6.7, 5.6, MySQL, false, false, STATEMENT, FULL, false, false, , 0, , 0, false, false, 0, 0, false, false, false, , , , , , , false, false, , 0, mysql.000007, 20, , 0, , , {0 false}, {0 false}, 0, , , , , 0, false, false, false, false, false, 0, 0, false, false, 0, false, , 0, , false, , , [], , 0,
i730, 3306, 730, , 5.6.7, 5.6, MySQL, false, false, STATEMENT, FULL, false, false, , 0, , 0, false, false, 0, 0, false, false, false, , , , , , , false, false, , 0, mysql.000007, 30, , 0, , , {0 false}, {0 false}, 0, , , , , 0, false, false, false, false, false, 0, 0, false, false, 0, false, , 0, , false, , , [], , 0,
`
sql3, args3, err := mkInsertOdkuForInstances(instances[:3], true, true)


@@ -21,8 +21,6 @@ import (
"regexp"
"strconv"
"strings"
"vitess.io/vitess/go/vt/vtorc/config"
)
// InstanceKey is an instance indicator, identified by hostname and port
@@ -79,7 +77,7 @@ func parseRawInstanceKey(hostPort string, resolve bool) (instanceKey *InstanceKe
return nil, fmt.Errorf("Cannot parse address: %s", hostPort)
}
if port == "" {
port = fmt.Sprintf("%d", config.Config.DefaultInstancePort)
port = "3306"
}
return newInstanceKeyStrings(hostname, port, resolve)
}
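With the DefaultInstancePort knob gone, the fallback is the hard-coded MySQL default. A hypothetical same-package test of that behavior (assumes resolve=false skips hostname resolution):
package inst
import "testing"
// TestParseRawInstanceKeyDefaultPort is a sketch, not part of the PR.
func TestParseRawInstanceKeyDefaultPort(t *testing.T) {
	key, err := parseRawInstanceKey("myhost.example.com", false)
	if err != nil {
		t.Fatal(err)
	}
	if key.Port != 3306 {
		t.Errorf("expected default port 3306, got %d", key.Port)
	}
}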


@@ -25,7 +25,6 @@ import (
)
func init() {
config.Config.HostnameResolveMethod = "none"
config.MarkConfigurationLoaded()
}


@@ -24,7 +24,6 @@ import (
)
func init() {
config.Config.HostnameResolveMethod = "none"
config.MarkConfigurationLoaded()
}


@@ -24,12 +24,10 @@ import (
)
func init() {
config.Config.HostnameResolveMethod = "none"
config.MarkConfigurationLoaded()
}
var instance1 = Instance{Key: key1}
var instance2 = Instance{Key: key2}
func TestIsSmallerMajorVersion(t *testing.T) {
i55 := Instance{Version: "5.5"}
@@ -69,158 +67,6 @@ func TestIsSmallerBinlogFormat(t *testing.T) {
test.S(t).ExpectFalse(iRow.IsSmallerBinlogFormat(iMixed))
}
func TestIsDescendant(t *testing.T) {
{
i57 := Instance{Key: key1, Version: "5.7"}
i56 := Instance{Key: key2, Version: "5.6"}
isDescendant := i57.IsDescendantOf(&i56)
test.S(t).ExpectEquals(isDescendant, false)
}
{
i57 := Instance{Key: key1, Version: "5.7", AncestryUUID: "00020192-1111-1111-1111-111111111111"}
i56 := Instance{Key: key2, Version: "5.6", ServerUUID: ""}
isDescendant := i57.IsDescendantOf(&i56)
test.S(t).ExpectEquals(isDescendant, false)
}
{
i57 := Instance{Key: key1, Version: "5.7", AncestryUUID: ""}
i56 := Instance{Key: key2, Version: "5.6", ServerUUID: "00020192-1111-1111-1111-111111111111"}
isDescendant := i57.IsDescendantOf(&i56)
test.S(t).ExpectEquals(isDescendant, false)
}
{
i57 := Instance{Key: key1, Version: "5.7", AncestryUUID: "00020193-2222-2222-2222-222222222222"}
i56 := Instance{Key: key2, Version: "5.6", ServerUUID: "00020192-1111-1111-1111-111111111111"}
isDescendant := i57.IsDescendantOf(&i56)
test.S(t).ExpectEquals(isDescendant, false)
}
{
i57 := Instance{Key: key1, Version: "5.7", AncestryUUID: "00020193-2222-2222-2222-222222222222,00020193-3333-3333-3333-222222222222"}
i56 := Instance{Key: key2, Version: "5.6", ServerUUID: "00020192-1111-1111-1111-111111111111"}
isDescendant := i57.IsDescendantOf(&i56)
test.S(t).ExpectEquals(isDescendant, false)
}
{
i57 := Instance{Key: key1, Version: "5.7", AncestryUUID: "00020193-2222-2222-2222-222222222222,00020192-1111-1111-1111-111111111111"}
i56 := Instance{Key: key2, Version: "5.6", ServerUUID: "00020192-1111-1111-1111-111111111111"}
isDescendant := i57.IsDescendantOf(&i56)
test.S(t).ExpectEquals(isDescendant, true)
}
}
func TestCanReplicateFrom(t *testing.T) {
i55 := Instance{Key: key1, Version: "5.5"}
i56 := Instance{Key: key2, Version: "5.6"}
var canReplicate bool
canReplicate, _ = i56.CanReplicateFrom(&i55)
test.S(t).ExpectEquals(canReplicate, false) //binlog not yet enabled
i55.LogBinEnabled = true
i55.LogReplicationUpdatesEnabled = true
i56.LogBinEnabled = true
i56.LogReplicationUpdatesEnabled = true
canReplicate, _ = i56.CanReplicateFrom(&i55)
test.S(t).ExpectEquals(canReplicate, false) //serverid not set
i55.ServerID = 55
i56.ServerID = 56
canReplicate, err := i56.CanReplicateFrom(&i55)
test.S(t).ExpectNil(err)
test.S(t).ExpectTrue(canReplicate)
canReplicate, _ = i55.CanReplicateFrom(&i56)
test.S(t).ExpectFalse(canReplicate)
iStatement := Instance{Key: key1, BinlogFormat: "STATEMENT", ServerID: 1, Version: "5.5", LogBinEnabled: true, LogReplicationUpdatesEnabled: true}
iRow := Instance{Key: key2, BinlogFormat: "ROW", ServerID: 2, Version: "5.5", LogBinEnabled: true, LogReplicationUpdatesEnabled: true}
canReplicate, err = iRow.CanReplicateFrom(&iStatement)
test.S(t).ExpectNil(err)
test.S(t).ExpectTrue(canReplicate)
canReplicate, _ = iStatement.CanReplicateFrom(&iRow)
test.S(t).ExpectFalse(canReplicate)
}
func TestNextGTID(t *testing.T) {
{
i := Instance{ExecutedGtidSet: "4f6d62ed-df65-11e3-b395-60672090eb04:1,b9b4712a-df64-11e3-b391-60672090eb04:1-6"}
nextGTID, err := i.NextGTID()
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(nextGTID, "b9b4712a-df64-11e3-b391-60672090eb04:7")
}
{
i := Instance{ExecutedGtidSet: "b9b4712a-df64-11e3-b391-60672090eb04:1-6"}
nextGTID, err := i.NextGTID()
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(nextGTID, "b9b4712a-df64-11e3-b391-60672090eb04:7")
}
{
i := Instance{ExecutedGtidSet: "b9b4712a-df64-11e3-b391-60672090eb04:6"}
nextGTID, err := i.NextGTID()
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(nextGTID, "b9b4712a-df64-11e3-b391-60672090eb04:7")
}
}
func TestRemoveInstance(t *testing.T) {
{
instances := [](*Instance){&instance1, &instance2}
test.S(t).ExpectEquals(len(instances), 2)
instances = RemoveNilInstances(instances)
test.S(t).ExpectEquals(len(instances), 2)
}
{
instances := [](*Instance){&instance1, nil, &instance2}
test.S(t).ExpectEquals(len(instances), 3)
instances = RemoveNilInstances(instances)
test.S(t).ExpectEquals(len(instances), 2)
}
{
instances := [](*Instance){&instance1, &instance2}
test.S(t).ExpectEquals(len(instances), 2)
instances = RemoveInstance(instances, &key1)
test.S(t).ExpectEquals(len(instances), 1)
instances = RemoveInstance(instances, &key1)
test.S(t).ExpectEquals(len(instances), 1)
instances = RemoveInstance(instances, &key2)
test.S(t).ExpectEquals(len(instances), 0)
instances = RemoveInstance(instances, &key2)
test.S(t).ExpectEquals(len(instances), 0)
}
}
func TestHumanReadableDescription(t *testing.T) {
i57 := Instance{Version: "5.7.8-log"}
{
desc := i57.HumanReadableDescription()
test.S(t).ExpectEquals(desc, "[unknown,invalid,5.7.8-log,rw,nobinlog]")
}
{
i57.UsingOracleGTID = true
i57.LogBinEnabled = true
i57.BinlogFormat = "ROW"
i57.LogReplicationUpdatesEnabled = true
desc := i57.HumanReadableDescription()
test.S(t).ExpectEquals(desc, "[unknown,invalid,5.7.8-log,rw,ROW,>>,GTID]")
}
}
func TestTabulatedDescription(t *testing.T) {
i57 := Instance{Version: "5.7.8-log"}
{
desc := i57.TabulatedDescription("|")
test.S(t).ExpectEquals(desc, "unknown|invalid|5.7.8-log|rw|nobinlog|")
}
{
i57.UsingOracleGTID = true
i57.LogBinEnabled = true
i57.BinlogFormat = "ROW"
i57.LogReplicationUpdatesEnabled = true
desc := i57.TabulatedDescription("|")
test.S(t).ExpectEquals(desc, "unknown|invalid|5.7.8-log|rw|ROW|>>,GTID")
}
}
func TestReplicationThreads(t *testing.T) {
{
test.S(t).ExpectFalse(instance1.ReplicaRunning())

(Diff between files not shown because of its large size.)

@@ -16,45 +16,11 @@
package inst
import (
"context"
"database/sql"
"fmt"
"strings"
"time"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/db"
"vitess.io/vitess/go/vt/vtorc/external/golib/sqlutils"
)
// Max concurrency for bulk topology operations
const topologyConcurrency = 128
var topologyConcurrencyChan = make(chan bool, topologyConcurrency)
type OperationGTIDHint string
const (
GTIDHintDeny OperationGTIDHint = "NoGTID"
GTIDHintNeutral OperationGTIDHint = "GTIDHintNeutral"
GTIDHintForce OperationGTIDHint = "GTIDHintForce"
)
const (
Error1201CouldnotInitializePrimaryInfoStructure = "Error 1201:"
)
// ExecInstance executes a given query on the given MySQL topology instance
func ExecInstance(instanceKey *InstanceKey, query string, args ...any) (sql.Result, error) {
db, err := db.OpenTopology(instanceKey.Hostname, instanceKey.Port)
if err != nil {
return nil, err
}
return sqlutils.ExecNoPrepare(db, query, args...)
}
// ExecuteOnTopology will execute given function while maintaining concurrency limit
// on topology servers. It is safe in the sense that we will not leak tokens.
func ExecuteOnTopology(f func()) {
@@ -62,881 +28,3 @@ func ExecuteOnTopology(f func()) {
defer func() { _ = recover(); <-topologyConcurrencyChan }()
f()
}
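The buffered channel above is a counting semaphore: a send acquires a token, and the deferred receive releases it even if f panics. A standalone sketch of the same pattern with an assumed limit of 4:
package main
import (
	"fmt"
	"sync"
)
const limit = 4 // assumed; the code above uses topologyConcurrency = 128
var tokens = make(chan bool, limit)
// run executes f with at most `limit` concurrent executions; the token is
// returned even if f panics, so tokens are never leaked.
func run(f func()) {
	tokens <- true
	defer func() { _ = recover(); <-tokens }()
	f()
}
func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		i := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			run(func() { fmt.Println("task", i) })
		}()
	}
	wg.Wait()
}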
// EmptyCommitInstance issues an empty COMMIT on a given instance
func EmptyCommitInstance(instanceKey *InstanceKey) error {
db, err := db.OpenTopology(instanceKey.Hostname, instanceKey.Port)
if err != nil {
return err
}
tx, err := db.Begin()
if err != nil {
return err
}
err = tx.Commit()
if err != nil {
return err
}
return err
}
// RefreshTopologyInstance will synchronously re-read topology instance
func RefreshTopologyInstance(instanceKey *InstanceKey) (*Instance, error) {
_, err := ReadTopologyInstance(instanceKey)
if err != nil {
return nil, err
}
inst, found, err := ReadInstance(instanceKey)
if err != nil || !found {
return nil, err
}
return inst, nil
}
// GetReplicationRestartPreserveStatements returns a sequence of statements that make sure a replica is stopped
// and then returned to the same state. For example, if the replica was fully running, this will issue
// a STOP on both io_thread and sql_thread, followed by START on both. If one of them is not running
// at the time this function is called, said thread will be neither stopped nor started.
// The caller may provide an injected statement, to be executed while the replica is stopped.
// This is useful for CHANGE MASTER TO commands, that unfortunately must take place while the replica
// is completely stopped.
func GetReplicationRestartPreserveStatements(instanceKey *InstanceKey, injectedStatement string) (statements []string, err error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
return statements, err
}
if instance.ReplicationIOThreadRuning {
statements = append(statements, SemicolonTerminated(`stop slave io_thread`))
}
if instance.ReplicationSQLThreadRuning {
statements = append(statements, SemicolonTerminated(`stop slave sql_thread`))
}
if injectedStatement != "" {
statements = append(statements, SemicolonTerminated(injectedStatement))
}
if instance.ReplicationSQLThreadRuning {
statements = append(statements, SemicolonTerminated(`start slave sql_thread`))
}
if instance.ReplicationIOThreadRuning {
statements = append(statements, SemicolonTerminated(`start slave io_thread`))
}
return statements, err
}
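For a replica with both threads running, the function brackets the injected statement with stops and starts in mirrored order. A standalone sketch of that ordering logic, with booleans standing in for the instance's thread states:
package main
import "fmt"
// restartPreserve mirrors the ordering in GetReplicationRestartPreserveStatements:
// stop whatever runs, inject, then restart in reverse order.
func restartPreserve(ioRunning, sqlRunning bool, injected string) []string {
	var stmts []string
	if ioRunning {
		stmts = append(stmts, "stop slave io_thread;")
	}
	if sqlRunning {
		stmts = append(stmts, "stop slave sql_thread;")
	}
	if injected != "" {
		stmts = append(stmts, injected+";")
	}
	if sqlRunning {
		stmts = append(stmts, "start slave sql_thread;")
	}
	if ioRunning {
		stmts = append(stmts, "start slave io_thread;")
	}
	return stmts
}
func main() {
	// The CHANGE MASTER statement is an illustrative example.
	for _, s := range restartPreserve(true, true, "CHANGE MASTER TO MASTER_HOST='new-primary'") {
		fmt.Println(s)
	}
}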
// FlushBinaryLogs attempts a 'FLUSH BINARY LOGS' statement on the given instance.
func FlushBinaryLogs(instanceKey *InstanceKey, count int) (*Instance, error) {
if *config.RuntimeCLIFlags.Noop {
return nil, fmt.Errorf("noop: aborting flush-binary-logs operation on %+v; signalling error but nothing went wrong", *instanceKey)
}
for i := 0; i < count; i++ {
_, err := ExecInstance(instanceKey, `flush binary logs`)
if err != nil {
log.Error(err)
return nil, err
}
}
log.Infof("flush-binary-logs count=%+v on %+v", count, *instanceKey)
_ = AuditOperation("flush-binary-logs", instanceKey, "success")
return ReadTopologyInstance(instanceKey)
}
// FlushBinaryLogsTo attempts to 'FLUSH BINARY LOGS' until given binary log is reached
func FlushBinaryLogsTo(instanceKey *InstanceKey, logFile string) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
distance := instance.SelfBinlogCoordinates.FileNumberDistance(&BinlogCoordinates{LogFile: logFile})
if distance < 0 {
errMsg := fmt.Sprintf("FlushBinaryLogsTo: target log file %+v is smaller than current log file %+v", logFile, instance.SelfBinlogCoordinates.LogFile)
log.Errorf(errMsg)
return nil, fmt.Errorf(errMsg)
}
return FlushBinaryLogs(instanceKey, distance)
}
// purgeBinaryLogsTo attempts to 'PURGE BINARY LOGS' until given binary log is reached
func purgeBinaryLogsTo(instanceKey *InstanceKey, logFile string) (*Instance, error) {
if *config.RuntimeCLIFlags.Noop {
return nil, fmt.Errorf("noop: aborting purge-binary-logs operation on %+v; signalling error but nothing went wrong", *instanceKey)
}
_, err := ExecInstance(instanceKey, "purge binary logs to ?", logFile)
if err != nil {
log.Error(err)
return nil, err
}
log.Infof("purge-binary-logs to=%+v on %+v", logFile, *instanceKey)
_ = AuditOperation("purge-binary-logs", instanceKey, "success")
return ReadTopologyInstance(instanceKey)
}
func RestartReplicationQuick(instanceKey *InstanceKey) error {
for _, cmd := range []string{`stop slave sql_thread`, `stop slave io_thread`, `start slave io_thread`, `start slave sql_thread`} {
if _, err := ExecInstance(instanceKey, cmd); err != nil {
errMsg := fmt.Sprintf("%+v: RestartReplicationQuick: '%q' failed: %+v", *instanceKey, cmd, err)
log.Errorf(errMsg)
return fmt.Errorf(errMsg)
}
log.Infof("%s on %+v as part of RestartReplicationQuick", cmd, *instanceKey)
}
return nil
}
// StopReplicationNicely stops a replica such that SQL_thread and IO_thread are aligned (i.e.
// SQL_thread consumes all relay log entries)
// It will actually START the sql_thread even if the replica is completely stopped.
func StopReplicationNicely(instanceKey *InstanceKey, timeout time.Duration) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if !instance.ReplicationThreadsExist() {
return instance, fmt.Errorf("instance is not a replica: %+v", instanceKey)
}
// stop io_thread, start sql_thread but catch any errors
for _, cmd := range []string{`stop slave io_thread`, `start slave sql_thread`} {
if _, err := ExecInstance(instanceKey, cmd); err != nil {
errMsg := fmt.Sprintf("%+v: StopReplicationNicely: '%q' failed: %+v", *instanceKey, cmd, err)
log.Errorf(errMsg)
return nil, fmt.Errorf(errMsg)
}
}
if instance.SQLDelay == 0 {
// Otherwise we don't bother.
if instance, err = WaitForSQLThreadUpToDate(instanceKey, timeout, 0); err != nil {
return instance, err
}
}
_, err = ExecInstance(instanceKey, `stop slave`)
if err != nil {
log.Error(err)
return instance, err
}
instance, err = ReadTopologyInstance(instanceKey)
log.Infof("Stopped replication nicely on %+v, Self:%+v, Exec:%+v", *instanceKey, instance.SelfBinlogCoordinates, instance.ExecBinlogCoordinates)
return instance, err
}
func WaitForSQLThreadUpToDate(instanceKey *InstanceKey, overallTimeout time.Duration, staleCoordinatesTimeout time.Duration) (instance *Instance, err error) {
var lastExecBinlogCoordinates BinlogCoordinates
if overallTimeout == 0 {
overallTimeout = 24 * time.Hour
}
if staleCoordinatesTimeout == 0 {
staleCoordinatesTimeout = time.Duration(config.Config.ReasonableReplicationLagSeconds) * time.Second
}
generalTimer := time.NewTimer(overallTimeout)
staleTimer := time.NewTimer(staleCoordinatesTimeout)
for {
instance, err := RetryInstanceFunction(func() (*Instance, error) {
return ReadTopologyInstance(instanceKey)
})
if err != nil {
log.Error(err)
return instance, err
}
if instance.SQLThreadUpToDate() {
// Woohoo
return instance, nil
}
if instance.SQLDelay != 0 {
errMsg := fmt.Sprintf("WaitForSQLThreadUpToDate: instance %+v has SQL Delay %+v. Operation is irrelevant", *instanceKey, instance.SQLDelay)
log.Errorf(errMsg)
return instance, fmt.Errorf(errMsg)
}
if !instance.ExecBinlogCoordinates.Equals(&lastExecBinlogCoordinates) {
// means we managed to apply binlog events. We made progress...
// so we reset the "staleness" timer
if !staleTimer.Stop() {
<-staleTimer.C
}
staleTimer.Reset(staleCoordinatesTimeout)
}
lastExecBinlogCoordinates = instance.ExecBinlogCoordinates
select {
case <-generalTimer.C:
errMsg := fmt.Sprintf("WaitForSQLThreadUpToDate timeout on %+v after duration %+v", *instanceKey, overallTimeout)
log.Errorf(errMsg)
return instance, fmt.Errorf(errMsg)
case <-staleTimer.C:
errMsg := fmt.Sprintf("WaitForSQLThreadUpToDate stale coordinates timeout on %+v after duration %+v", *instanceKey, staleCoordinatesTimeout)
log.Errorf(errMsg)
return instance, fmt.Errorf(errMsg)
default:
log.Infof("WaitForSQLThreadUpToDate waiting on %+v", *instanceKey)
time.Sleep(retryInterval)
}
}
}
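The Stop/drain/Reset sequence above is the standard idiom for re-arming a time.Timer that may already have fired: when Stop returns false, the tick is already sitting in the channel and must be drained, otherwise a later receive would return instantly and fake a staleness timeout. A minimal standalone sketch of the idiom:

package main

import (
	"fmt"
	"time"
)

func main() {
	staleTimer := time.NewTimer(50 * time.Millisecond)
	for i := 0; i < 3; i++ {
		time.Sleep(20 * time.Millisecond) // stand-in for observing replication progress
		// Re-arm safely: drain the pending tick if the timer already fired.
		if !staleTimer.Stop() {
			<-staleTimer.C
		}
		staleTimer.Reset(50 * time.Millisecond)
		fmt.Println("progress observed, staleness timer re-armed:", i)
	}
}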
// StopReplicas will stop replication concurrently on given set of replicas.
// It will potentially do nothing, or attempt to stop _nicely_ or just stop normally, all according to stopReplicationMethod
func StopReplicas(replicas [](*Instance), stopReplicationMethod StopReplicationMethod, timeout time.Duration) [](*Instance) {
if stopReplicationMethod == NoStopReplication {
return replicas
}
refreshedReplicas := [](*Instance){}
log.Infof("Stopping %d replicas via %s", len(replicas), string(stopReplicationMethod))
// use concurrency but wait for all to complete
barrier := make(chan *Instance)
for _, replica := range replicas {
replica := replica
go func() {
updatedReplica := &replica
// Signal completed replica
defer func() { barrier <- *updatedReplica }()
// Wait your turn to read a replica
ExecuteOnTopology(func() {
if stopReplicationMethod == StopReplicationNice {
_, _ = StopReplicationNicely(&replica.Key, timeout)
}
replica, _ = StopReplication(&replica.Key)
updatedReplica = &replica
})
}()
}
for range replicas {
refreshedReplicas = append(refreshedReplicas, <-barrier)
}
return refreshedReplicas
}
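StopReplicas fans out one goroutine per replica and collects results on an unbuffered barrier channel, so the caller blocks until exactly len(replicas) completions have been received. A standalone sketch of the same fan-out/fan-in shape, with strings standing in for *Instance:

package main

import (
	"fmt"
	"time"
)

func main() {
	replicas := []string{"replica-1", "replica-2", "replica-3"}
	barrier := make(chan string)
	for _, replica := range replicas {
		replica := replica // capture the loop variable (pre-Go 1.22 semantics)
		go func() {
			time.Sleep(10 * time.Millisecond) // stand-in for StopReplication
			barrier <- replica                // signal completion
		}()
	}
	for range replicas {
		fmt.Println("stopped:", <-barrier) // collect exactly len(replicas) results
	}
}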
// StopReplication stops replication on a given instance
func StopReplication(instanceKey *InstanceKey) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
_, err = ExecInstance(instanceKey, `stop slave`)
if err != nil {
log.Error(err)
return instance, err
}
instance, err = ReadTopologyInstance(instanceKey)
log.Infof("Stopped replication on %+v, Self:%+v, Exec:%+v", *instanceKey, instance.SelfBinlogCoordinates, instance.ExecBinlogCoordinates)
return instance, err
}
// waitForReplicationState waits for both replication threads to be either running or not running, together.
// This is useful post- `start slave` operation, ensuring both threads are actually running,
// or post `stop slave` operation, ensuring both threads are not running.
func waitForReplicationState(instanceKey *InstanceKey, expectedState ReplicationThreadState) (expectationMet bool, err error) {
waitDuration := time.Second
waitInterval := 10 * time.Millisecond
startTime := time.Now()
for {
// Since this is an incremental aggressive polling, it's OK if an occasional
// error is observed. We don't bail out on a single error.
if expectationMet, _ := expectReplicationThreadsState(instanceKey, expectedState); expectationMet {
return true, nil
}
if time.Since(startTime)+waitInterval > waitDuration {
break
}
time.Sleep(waitInterval)
waitInterval = 2 * waitInterval
}
return false, nil
}
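The polling above doubles its interval each round and gives up once the next sleep would overshoot the one-second budget; single read errors are tolerated because a later poll can still succeed. A minimal standalone sketch of that bounded exponential backoff, with waitUntil as an illustrative name:

package main

import (
	"fmt"
	"time"
)

// waitUntil polls check with a doubling interval until it returns true
// or the overall budget would be exceeded.
func waitUntil(check func() bool, budget time.Duration) bool {
	interval := 10 * time.Millisecond
	start := time.Now()
	for {
		if check() {
			return true
		}
		if time.Since(start)+interval > budget {
			return false
		}
		time.Sleep(interval)
		interval *= 2
	}
}

func main() {
	deadline := time.Now().Add(30 * time.Millisecond)
	ok := waitUntil(func() bool { return time.Now().After(deadline) }, time.Second)
	fmt.Println("expectation met:", ok) // expectation met: true
}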
// StartReplication starts replication on a given instance.
func StartReplication(instanceKey *InstanceKey) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if !instance.IsReplica() {
return instance, fmt.Errorf("instance is not a replica: %+v", instanceKey)
}
_, err = ExecInstance(instanceKey, `start slave`)
if err != nil {
log.Error(err)
return instance, err
}
log.Infof("Started replication on %+v", instanceKey)
_, _ = waitForReplicationState(instanceKey, ReplicationThreadStateRunning)
instance, err = ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if !instance.ReplicaRunning() {
return instance, ErrReplicationNotRunning
}
return instance, nil
}
// RestartReplication stops & starts replication on a given instance
func RestartReplication(instanceKey *InstanceKey) (instance *Instance, err error) {
instance, err = StopReplication(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
instance, err = StartReplication(instanceKey)
if err != nil {
log.Error(err)
}
return instance, err
}
func WaitForExecBinlogCoordinatesToReach(instanceKey *InstanceKey, coordinates *BinlogCoordinates, maxWait time.Duration) (instance *Instance, exactMatch bool, err error) {
startTime := time.Now()
for {
if maxWait != 0 && time.Since(startTime) > maxWait {
return nil, exactMatch, fmt.Errorf("WaitForExecBinlogCoordinatesToReach: reached maxWait %+v on %+v", maxWait, *instanceKey)
}
instance, err = ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, exactMatch, err
}
switch {
case instance.ExecBinlogCoordinates.SmallerThan(coordinates):
time.Sleep(retryInterval)
case instance.ExecBinlogCoordinates.Equals(coordinates):
return instance, true, nil
case coordinates.SmallerThan(&instance.ExecBinlogCoordinates):
return instance, false, nil
}
}
}
// StartReplicationUntilPrimaryCoordinates issues a START SLAVE UNTIL... statement on given instance
func StartReplicationUntilPrimaryCoordinates(instanceKey *InstanceKey, primaryCoordinates *BinlogCoordinates) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if !instance.IsReplica() {
return instance, fmt.Errorf("instance is not a replica: %+v", instanceKey)
}
if !instance.ReplicationThreadsStopped() {
return instance, fmt.Errorf("replication threads are not stopped: %+v", instanceKey)
}
log.Infof("Will start replication on %+v until coordinates: %+v", instanceKey, primaryCoordinates)
// MariaDB has a bug: a CHANGE MASTER TO statement does not work properly with prepared statements... :P
// See https://mariadb.atlassian.net/browse/MDEV-7640
// This is the reason for ExecInstance
_, err = ExecInstance(instanceKey, "start slave until master_log_file=?, master_log_pos=?",
primaryCoordinates.LogFile, primaryCoordinates.LogPos)
if err != nil {
log.Error(err)
return instance, err
}
instance, exactMatch, err := WaitForExecBinlogCoordinatesToReach(instanceKey, primaryCoordinates, 0)
if err != nil {
log.Error(err)
return instance, err
}
if !exactMatch {
return instance, fmt.Errorf("Start SLAVE UNTIL is past coordinates: %+v", instanceKey)
}
instance, err = StopReplication(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
return instance, err
}
// EnablePrimarySSL issues CHANGE MASTER TO MASTER_SSL=1
func EnablePrimarySSL(instanceKey *InstanceKey) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if instance.ReplicationThreadsExist() && !instance.ReplicationThreadsStopped() {
return instance, fmt.Errorf("EnablePrimarySSL: Cannot enable SSL replication on %+v because replication threads are not stopped", *instanceKey)
}
log.Infof("EnablePrimarySSL: Will attempt enabling SSL replication on %+v", *instanceKey)
if *config.RuntimeCLIFlags.Noop {
return instance, fmt.Errorf("noop: aborting CHANGE MASTER TO MASTER_SSL=1 operation on %+v; signaling error but nothing went wrong", *instanceKey)
}
_, err = ExecInstance(instanceKey, "change master to master_ssl=1")
if err != nil {
log.Error(err)
return instance, err
}
log.Infof("EnablePrimarySSL: Enabled SSL replication on %+v", *instanceKey)
instance, err = ReadTopologyInstance(instanceKey)
return instance, err
}
// See https://bugs.mysql.com/bug.php?id=83713
func workaroundBug83713(instanceKey *InstanceKey) {
log.Infof("workaroundBug83713: %+v", *instanceKey)
queries := []string{
`reset slave`,
`start slave IO_THREAD`,
`stop slave IO_THREAD`,
`reset slave`,
}
for _, query := range queries {
if _, err := ExecInstance(instanceKey, query); err != nil {
log.Infof("workaroundBug83713: error on %s: %+v", query, err)
}
}
}
// ChangePrimaryTo changes the given instance's primary according to given input.
// TODO(sougou): deprecate ReplicationCredentialsQuery, and all other credential discovery.
func ChangePrimaryTo(instanceKey *InstanceKey, primaryKey *InstanceKey, primaryBinlogCoordinates *BinlogCoordinates, skipUnresolve bool, gtidHint OperationGTIDHint) (*Instance, error) {
user, password := config.Config.MySQLReplicaUser, config.Config.MySQLReplicaPassword
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if instance.ReplicationThreadsExist() && !instance.ReplicationThreadsStopped() {
return instance, fmt.Errorf("ChangePrimaryTo: Cannot change primary on: %+v because replication threads are not stopped", *instanceKey)
}
log.Infof("ChangePrimaryTo: will attempt changing primary on %+v to %+v, %+v", *instanceKey, *primaryKey, *primaryBinlogCoordinates)
changeToPrimaryKey := primaryKey
if !skipUnresolve {
unresolvedPrimaryKey, nameUnresolved, err := UnresolveHostname(primaryKey)
if err != nil {
log.Infof("ChangePrimaryTo: aborting operation on %+v due to resolving error on %+v: %+v", *instanceKey, *primaryKey, err)
return instance, err
}
if nameUnresolved {
log.Infof("ChangePrimaryTo: Unresolved %+v into %+v", *primaryKey, unresolvedPrimaryKey)
}
changeToPrimaryKey = &unresolvedPrimaryKey
}
if *config.RuntimeCLIFlags.Noop {
return instance, fmt.Errorf("noop: aborting CHANGE MASTER TO operation on %+v; signalling error but nothing went wrong", *instanceKey)
}
var changePrimaryFunc func() error
changedViaGTID := false
if instance.UsingMariaDBGTID && gtidHint != GTIDHintDeny {
// Keep on using GTID
changePrimaryFunc = func() error {
_, err := ExecInstance(instanceKey, "change master to master_user=?, master_password=?, master_host=?, master_port=?",
user, password, changeToPrimaryKey.Hostname, changeToPrimaryKey.Port)
return err
}
changedViaGTID = true
} else if instance.UsingMariaDBGTID && gtidHint == GTIDHintDeny {
// Make sure to not use GTID
changePrimaryFunc = func() error {
_, err = ExecInstance(instanceKey, "change master to master_user=?, master_password=?, master_host=?, master_port=?, master_log_file=?, master_log_pos=?, master_use_gtid=no",
user, password, changeToPrimaryKey.Hostname, changeToPrimaryKey.Port, primaryBinlogCoordinates.LogFile, primaryBinlogCoordinates.LogPos)
return err
}
} else if instance.IsMariaDB() && gtidHint == GTIDHintForce {
// Is MariaDB; not using GTID, turn into GTID
mariadbGTIDHint := "slave_pos"
if !instance.ReplicationThreadsExist() {
// This instance is currently a primary. As per https://mariadb.com/kb/en/change-master-to/#master_use_gtid
// we should be using current_pos.
// See also:
// - https://github.com/openark/orchestrator/issues/1146
// - https://dba.stackexchange.com/a/234323
mariadbGTIDHint = "current_pos"
}
changePrimaryFunc = func() error {
_, err = ExecInstance(instanceKey, fmt.Sprintf("change master to master_user=?, master_password=?, master_host=?, master_port=?, master_use_gtid=%s", mariadbGTIDHint),
user, password, changeToPrimaryKey.Hostname, changeToPrimaryKey.Port)
return err
}
changedViaGTID = true
} else if instance.UsingOracleGTID && gtidHint != GTIDHintDeny {
// Is Oracle; already uses GTID; keep using it.
changePrimaryFunc = func() error {
_, err = ExecInstance(instanceKey, "change master to master_user=?, master_password=?, master_host=?, master_port=?",
user, password, changeToPrimaryKey.Hostname, changeToPrimaryKey.Port)
return err
}
changedViaGTID = true
} else if instance.UsingOracleGTID && gtidHint == GTIDHintDeny {
// Is Oracle; already uses GTID
changePrimaryFunc = func() error {
_, err = ExecInstance(instanceKey, "change master to master_user=?, master_password=?, master_host=?, master_port=?, master_log_file=?, master_log_pos=?, master_auto_position=0",
user, password, changeToPrimaryKey.Hostname, changeToPrimaryKey.Port, primaryBinlogCoordinates.LogFile, primaryBinlogCoordinates.LogPos)
return err
}
} else if instance.SupportsOracleGTID && gtidHint == GTIDHintForce {
// Is Oracle; not using GTID right now; turn into GTID
changePrimaryFunc = func() error {
_, err = ExecInstance(instanceKey, "change master to master_user=?, master_password=?, master_host=?, master_port=?, master_auto_position=1",
user, password, changeToPrimaryKey.Hostname, changeToPrimaryKey.Port)
return err
}
changedViaGTID = true
} else {
// Normal binlog file:pos
changePrimaryFunc = func() error {
_, err = ExecInstance(instanceKey, "change master to master_user=?, master_password=?, master_host=?, master_port=?, master_log_file=?, master_log_pos=?",
user, password, changeToPrimaryKey.Hostname, changeToPrimaryKey.Port, primaryBinlogCoordinates.LogFile, primaryBinlogCoordinates.LogPos)
return err
}
}
err = changePrimaryFunc()
if err != nil && instance.UsingOracleGTID && strings.Contains(err.Error(), Error1201CouldnotInitializePrimaryInfoStructure) {
log.Infof("ChangePrimaryTo: got %+v", err)
workaroundBug83713(instanceKey)
err = changePrimaryFunc()
}
if err != nil {
log.Error(err)
return instance, err
}
durability, err := GetDurabilityPolicy(*primaryKey)
if err != nil {
log.Error(err)
return instance, err
}
semiSync := IsReplicaSemiSync(durability, *primaryKey, *instanceKey)
if _, err := ExecInstance(instanceKey, `set global rpl_semi_sync_master_enabled = ?, global rpl_semi_sync_slave_enabled = ?`, false, semiSync); err != nil {
log.Error(err)
return instance, err
}
_ = ResetInstanceRelaylogCoordinatesHistory(instanceKey)
log.Infof("ChangePrimaryTo: Changed primary on %+v to: %+v, %+v. GTID: %+v", *instanceKey, primaryKey, primaryBinlogCoordinates, changedViaGTID)
instance, err = ReadTopologyInstance(instanceKey)
return instance, err
}
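The branch ladder above is a decision table over the replica's GTID mode and the caller's gtidHint. A hedged standalone sketch of just the Oracle-GTID rows; the MariaDB rows follow the same shape, and every name here is illustrative rather than the vtorc API:

package main

import "fmt"

type gtidHint string

const (
	hintNeutral gtidHint = "neutral"
	hintForce   gtidHint = "force"
	hintDeny    gtidHint = "deny"
)

// changeClause picks the CHANGE MASTER TO clause family the way the
// branch ladder above does, for the Oracle-GTID cases only.
func changeClause(usingOracleGTID, supportsOracleGTID bool, hint gtidHint) string {
	switch {
	case usingOracleGTID && hint != hintDeny:
		return "host/port only (keep auto-positioning)"
	case usingOracleGTID && hint == hintDeny:
		return "master_log_file/master_log_pos, master_auto_position=0"
	case supportsOracleGTID && hint == hintForce:
		return "master_auto_position=1"
	default:
		return "master_log_file/master_log_pos (classic file:pos)"
	}
}

func main() {
	fmt.Println(changeClause(true, true, hintDeny))   // file:pos, auto_position=0
	fmt.Println(changeClause(false, true, hintForce)) // auto_position=1
}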
// ResetReplication resets a replica, breaking the replication
func ResetReplication(instanceKey *InstanceKey) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if instance.ReplicationThreadsExist() && !instance.ReplicationThreadsStopped() {
return instance, fmt.Errorf("Cannot reset replication on: %+v because replication threads are not stopped", instanceKey)
}
if *config.RuntimeCLIFlags.Noop {
return instance, fmt.Errorf("noop: aborting reset-replication operation on %+v; signalling error but nothing went wrong", *instanceKey)
}
// MySQL's RESET SLAVE is done correctly; however SHOW SLAVE STATUS still returns old hostnames etc
// and only resets till after next restart. This leads to vtorc still thinking the instance replicates
// from old host. We therefore forcibly modify the hostname.
// RESET SLAVE ALL command solves this, but only as of 5.6.3
_, err = ExecInstance(instanceKey, `change master to master_host='_'`)
if err != nil {
log.Error(err)
return instance, err
}
_, err = ExecInstance(instanceKey, `reset slave /*!50603 all */`)
if err != nil && strings.Contains(err.Error(), Error1201CouldnotInitializePrimaryInfoStructure) {
log.Infof("ResetReplication: got %+v", err)
workaroundBug83713(instanceKey)
_, err = ExecInstance(instanceKey, `reset slave /*!50603 all */`)
}
if err != nil {
log.Error(err)
return instance, err
}
log.Infof("Reset replication %+v", instanceKey)
instance, err = ReadTopologyInstance(instanceKey)
return instance, err
}
// ResetPrimary issues a RESET MASTER statement on given instance. Use with extreme care!
func ResetPrimary(instanceKey *InstanceKey) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if instance.ReplicationThreadsExist() && !instance.ReplicationThreadsStopped() {
return instance, fmt.Errorf("Cannot reset primary on: %+v because replication threads are not stopped", instanceKey)
}
if *config.RuntimeCLIFlags.Noop {
return instance, fmt.Errorf("noop: aborting reset-primary operation on %+v; signalling error but nothing went wrong", *instanceKey)
}
_, err = ExecInstance(instanceKey, `reset master`)
if err != nil {
log.Error(err)
return instance, err
}
log.Infof("Reset primary %+v", instanceKey)
instance, err = ReadTopologyInstance(instanceKey)
return instance, err
}
// setGTIDPurged sets the gtid_purged system variable on the given instance
func setGTIDPurged(instance *Instance, gtidPurged string) error {
if *config.RuntimeCLIFlags.Noop {
return fmt.Errorf("noop: aborting set-gtid-purged operation on %+v; signalling error but nothing went wrong", instance.Key)
}
_, err := ExecInstance(&instance.Key, `set global gtid_purged := ?`, gtidPurged)
return err
}
// injectEmptyGTIDTransaction injects an empty transaction with the given GTID entry on the given instance
func injectEmptyGTIDTransaction(instanceKey *InstanceKey, gtidEntry *OracleGtidSetEntry) error {
db, err := db.OpenTopology(instanceKey.Hostname, instanceKey.Port)
if err != nil {
return err
}
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(config.Config.InstanceDBExecContextTimeoutSeconds)*time.Second)
defer cancel()
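// GTID_NEXT is a session variable, so pin a single connection from the pool:
// every statement below, including the empty transaction, must run on the
// same session that set it.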
conn, err := db.Conn(ctx)
if err != nil {
return err
}
defer func() {
_ = conn.Close()
}()
if _, err := conn.ExecContext(ctx, fmt.Sprintf(`SET GTID_NEXT="%s"`, gtidEntry.String())); err != nil {
return err
}
tx, err := conn.BeginTx(ctx, &sql.TxOptions{})
if err != nil {
return err
}
if err := tx.Commit(); err != nil {
return err
}
if _, err := conn.ExecContext(ctx, `SET GTID_NEXT="AUTOMATIC"`); err != nil {
return err
}
return nil
}
// skipQueryClassic skips a query in normal binlog file:pos replication
func skipQueryClassic(instance *Instance) error {
_, err := ExecInstance(&instance.Key, `set global sql_slave_skip_counter := 1`)
return err
}
// skipQueryOracleGtid skips a single query in an Oracle GTID replicating replica, by injecting an empty transaction
func skipQueryOracleGtid(instance *Instance) error {
nextGtid, err := instance.NextGTID()
if err != nil {
return err
}
if nextGtid == "" {
return fmt.Errorf("Empty NextGTID() in skipQueryGtid() for %+v", instance.Key)
}
if _, err := ExecInstance(&instance.Key, `SET GTID_NEXT=?`, nextGtid); err != nil {
return err
}
if err := EmptyCommitInstance(&instance.Key); err != nil {
return err
}
if _, err := ExecInstance(&instance.Key, `SET GTID_NEXT='AUTOMATIC'`); err != nil {
return err
}
return nil
}
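NextGTID (used above) derives the next GTID to inject from the instance's executed GTID set. A simplified standalone sketch for the single-UUID case; real executed sets can hold many UUIDs and intervals, and nextGTID here is an illustrative stand-in:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nextGTID takes the upper bound of the last interval and adds one,
// e.g. "uuid:1-41" -> "uuid:42".
func nextGTID(executed string) string {
	colon := strings.LastIndex(executed, ":")
	uuid, interval := executed[:colon], executed[colon+1:]
	bounds := strings.Split(interval, "-")
	last, _ := strconv.Atoi(bounds[len(bounds)-1])
	return fmt.Sprintf("%s:%d", uuid, last+1)
}

func main() {
	fmt.Println(nextGTID("00020192-1111-1111-1111-111111111111:1-41"))
	// 00020192-1111-1111-1111-111111111111:42
}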
// SkipQuery skips a single query on a failed replication instance
func SkipQuery(instanceKey *InstanceKey) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if !instance.IsReplica() {
return instance, fmt.Errorf("instance is not a replica: %+v", instanceKey)
}
if instance.ReplicationSQLThreadRuning {
return instance, fmt.Errorf("Replication SQL thread is running on %+v", instanceKey)
}
if instance.LastSQLError == "" {
return instance, fmt.Errorf("No SQL error on %+v", instanceKey)
}
if *config.RuntimeCLIFlags.Noop {
return instance, fmt.Errorf("noop: aborting skip-query operation on %+v; signalling error but nothing went wrong", *instanceKey)
}
log.Infof("Skipping one query on %+v", instanceKey)
if instance.UsingOracleGTID {
err = skipQueryOracleGtid(instance)
} else if instance.UsingMariaDBGTID {
errMsg := fmt.Sprintf("%+v is replicating with MariaDB GTID. To skip a query first disable GTID, then skip, then enable GTID again", *instanceKey)
log.Errorf(errMsg)
return instance, fmt.Errorf(errMsg)
} else {
err = skipQueryClassic(instance)
}
if err != nil {
log.Error(err)
return instance, err
}
_ = AuditOperation("skip-query", instanceKey, "Skipped one query")
return StartReplication(instanceKey)
}
// PrimaryPosWait issues a MASTER_POS_WAIT() on a given instance according to given coordinates.
func PrimaryPosWait(instanceKey *InstanceKey, binlogCoordinates *BinlogCoordinates) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
_, err = ExecInstance(instanceKey, `select master_pos_wait(?, ?)`, binlogCoordinates.LogFile, binlogCoordinates.LogPos)
if err != nil {
log.Error(err)
return instance, err
}
log.Infof("Instance %+v has reached coordinates: %+v", instanceKey, binlogCoordinates)
instance, err = ReadTopologyInstance(instanceKey)
return instance, err
}
// SetReadOnly sets or clears the instance's global read_only variable
func SetReadOnly(instanceKey *InstanceKey, readOnly bool) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if *config.RuntimeCLIFlags.Noop {
return instance, fmt.Errorf("noop: aborting set-read-only operation on %+v; signalling error but nothing went wrong", *instanceKey)
}
if _, err := ExecInstance(instanceKey, "set global read_only = ?", readOnly); err != nil {
log.Error(err)
return instance, err
}
if config.Config.UseSuperReadOnly {
if _, err := ExecInstance(instanceKey, "set global super_read_only = ?", readOnly); err != nil {
// We don't bail out here. super_read_only is only available on
// MySQL 5.7.8 and Percona Server 5.6.21-70
// At this time vtorc does not verify whether a server supports super_read_only or not.
// It makes a best effort to set it.
log.Error(err)
}
}
instance, err = ReadTopologyInstance(instanceKey)
log.Infof("instance %+v read_only: %t", instanceKey, readOnly)
_ = AuditOperation("read-only", instanceKey, fmt.Sprintf("set as %t", readOnly))
return instance, err
}
// KillQuery kills a running query on a given instance
func KillQuery(instanceKey *InstanceKey, process int64) (*Instance, error) {
instance, err := ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
if *config.RuntimeCLIFlags.Noop {
return instance, fmt.Errorf("noop: aborting kill-query operation on %+v; signalling error but nothing went wrong", *instanceKey)
}
_, err = ExecInstance(instanceKey, `kill query ?`, process)
if err != nil {
log.Error(err)
return instance, err
}
instance, err = ReadTopologyInstance(instanceKey)
if err != nil {
log.Error(err)
return instance, err
}
log.Infof("Killed query on %+v", *instanceKey)
_ = AuditOperation("kill-query", instanceKey, fmt.Sprintf("Killed query %d", process))
return instance, err
}
func GTIDSubtract(instanceKey *InstanceKey, gtidSet string, gtidSubset string) (gtidSubtract string, err error) {
db, err := db.OpenTopology(instanceKey.Hostname, instanceKey.Port)
if err != nil {
return gtidSubtract, err
}
err = db.QueryRow("select gtid_subtract(?, ?)", gtidSet, gtidSubset).Scan(&gtidSubtract)
return gtidSubtract, err
}
func ShowPrimaryStatus(instanceKey *InstanceKey) (primaryStatusFound bool, executedGtidSet string, err error) {
db, err := db.OpenTopology(instanceKey.Hostname, instanceKey.Port)
if err != nil {
return primaryStatusFound, executedGtidSet, err
}
err = sqlutils.QueryRowsMap(db, "show master status", func(m sqlutils.RowMap) error {
primaryStatusFound = true
executedGtidSet = m.GetStringD("Executed_Gtid_Set", "")
return nil
})
return primaryStatusFound, executedGtidSet, err
}
func ShowBinaryLogs(instanceKey *InstanceKey) (binlogs []string, err error) {
db, err := db.OpenTopology(instanceKey.Hostname, instanceKey.Port)
if err != nil {
return binlogs, err
}
err = sqlutils.QueryRowsMap(db, "show binary logs", func(m sqlutils.RowMap) error {
binlogs = append(binlogs, m.GetString("Log_name"))
return nil
})
return binlogs, err
}


@@ -1,570 +0,0 @@
package inst
import (
"math/rand"
"testing"
"vitess.io/vitess/go/vt/vtctl/reparentutil/promotionrule"
"vitess.io/vitess/go/vt/vtorc/config"
test "vitess.io/vitess/go/vt/vtorc/external/golib/tests"
)
var (
i710Key = InstanceKey{Hostname: "i710", Port: 3306}
i720Key = InstanceKey{Hostname: "i720", Port: 3306}
i730Key = InstanceKey{Hostname: "i730", Port: 3306}
i810Key = InstanceKey{Hostname: "i810", Port: 3306}
i820Key = InstanceKey{Hostname: "i820", Port: 3306}
i830Key = InstanceKey{Hostname: "i830", Port: 3306}
)
func init() {
config.Config.HostnameResolveMethod = "none"
config.MarkConfigurationLoaded()
}
func generateTestInstances() (instances [](*Instance), instancesMap map[string](*Instance)) {
i710 := Instance{Key: i710Key, ServerID: 710, ExecBinlogCoordinates: BinlogCoordinates{LogFile: "mysql.000007", LogPos: 10}}
i720 := Instance{Key: i720Key, ServerID: 720, ExecBinlogCoordinates: BinlogCoordinates{LogFile: "mysql.000007", LogPos: 20}}
i730 := Instance{Key: i730Key, ServerID: 730, ExecBinlogCoordinates: BinlogCoordinates{LogFile: "mysql.000007", LogPos: 30}}
i810 := Instance{Key: i810Key, ServerID: 810, ExecBinlogCoordinates: BinlogCoordinates{LogFile: "mysql.000008", LogPos: 10}}
i820 := Instance{Key: i820Key, ServerID: 820, ExecBinlogCoordinates: BinlogCoordinates{LogFile: "mysql.000008", LogPos: 20}}
i830 := Instance{Key: i830Key, ServerID: 830, ExecBinlogCoordinates: BinlogCoordinates{LogFile: "mysql.000008", LogPos: 30}}
instances = [](*Instance){&i710, &i720, &i730, &i810, &i820, &i830}
for _, instance := range instances {
instance.Version = "5.6.7"
instance.BinlogFormat = "STATEMENT"
}
instancesMap = make(map[string](*Instance))
for _, instance := range instances {
instancesMap[instance.Key.StringCode()] = instance
}
return instances, instancesMap
}
func applyGeneralGoodToGoReplicationParams(instances [](*Instance)) {
for _, instance := range instances {
instance.IsLastCheckValid = true
instance.LogBinEnabled = true
instance.LogReplicationUpdatesEnabled = true
}
}
func TestInitial(t *testing.T) {
test.S(t).ExpectTrue(true)
}
func TestSortInstances(t *testing.T) {
instances, _ := generateTestInstances()
sortInstances(instances)
test.S(t).ExpectEquals(instances[0].Key, i830Key)
test.S(t).ExpectEquals(instances[1].Key, i820Key)
test.S(t).ExpectEquals(instances[2].Key, i810Key)
test.S(t).ExpectEquals(instances[3].Key, i730Key)
test.S(t).ExpectEquals(instances[4].Key, i720Key)
test.S(t).ExpectEquals(instances[5].Key, i710Key)
}
func TestSortInstancesSameCoordinatesDifferingBinlogFormats(t *testing.T) {
instances, instancesMap := generateTestInstances()
for _, instance := range instances {
instance.ExecBinlogCoordinates = instances[0].ExecBinlogCoordinates
instance.BinlogFormat = "MIXED"
}
instancesMap[i810Key.StringCode()].BinlogFormat = "STATEMENT"
instancesMap[i720Key.StringCode()].BinlogFormat = "ROW"
sortInstances(instances)
test.S(t).ExpectEquals(instances[0].Key, i810Key)
test.S(t).ExpectEquals(instances[5].Key, i720Key)
}
func TestSortInstancesSameCoordinatesDifferingVersions(t *testing.T) {
instances, instancesMap := generateTestInstances()
for _, instance := range instances {
instance.ExecBinlogCoordinates = instances[0].ExecBinlogCoordinates
}
instancesMap[i810Key.StringCode()].Version = "5.5.1"
instancesMap[i720Key.StringCode()].Version = "5.7.8"
sortInstances(instances)
test.S(t).ExpectEquals(instances[0].Key, i810Key)
test.S(t).ExpectEquals(instances[5].Key, i720Key)
}
func TestSortInstancesDataCenterHint(t *testing.T) {
instances, instancesMap := generateTestInstances()
for _, instance := range instances {
instance.ExecBinlogCoordinates = instances[0].ExecBinlogCoordinates
instance.DataCenter = "somedc"
}
instancesMap[i810Key.StringCode()].DataCenter = "localdc"
SortInstancesDataCenterHint(instances, "localdc")
test.S(t).ExpectEquals(instances[0].Key, i810Key)
}
func TestSortInstancesGtidErrant(t *testing.T) {
instances, instancesMap := generateTestInstances()
for _, instance := range instances {
instance.ExecBinlogCoordinates = instances[0].ExecBinlogCoordinates
instance.GtidErrant = "00020192-1111-1111-1111-111111111111:1"
}
instancesMap[i810Key.StringCode()].GtidErrant = ""
sortInstances(instances)
test.S(t).ExpectEquals(instances[0].Key, i810Key)
}
func TestGetPriorityMajorVersionForCandidate(t *testing.T) {
{
instances, instancesMap := generateTestInstances()
priorityMajorVersion, err := getPriorityMajorVersionForCandidate(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(priorityMajorVersion, "5.6")
instancesMap[i810Key.StringCode()].Version = "5.5.1"
instancesMap[i720Key.StringCode()].Version = "5.7.8"
priorityMajorVersion, err = getPriorityMajorVersionForCandidate(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(priorityMajorVersion, "5.6")
instancesMap[i710Key.StringCode()].Version = "5.7.8"
instancesMap[i720Key.StringCode()].Version = "5.7.8"
instancesMap[i730Key.StringCode()].Version = "5.7.8"
instancesMap[i830Key.StringCode()].Version = "5.7.8"
priorityMajorVersion, err = getPriorityMajorVersionForCandidate(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(priorityMajorVersion, "5.7")
}
{
instances, instancesMap := generateTestInstances()
instancesMap[i710Key.StringCode()].Version = "5.6.9"
instancesMap[i720Key.StringCode()].Version = "5.6.9"
instancesMap[i730Key.StringCode()].Version = "5.7.8"
instancesMap[i810Key.StringCode()].Version = "5.7.8"
instancesMap[i820Key.StringCode()].Version = "5.7.8"
instancesMap[i830Key.StringCode()].Version = "5.6.9"
priorityMajorVersion, err := getPriorityMajorVersionForCandidate(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(priorityMajorVersion, "5.6")
}
// We test under the assumption that map iteration order is random.
for range rand.Perm(20) { // Just running many iterations to cover multiple possible map iteration ordering. Perm() is just used as an array generator here.
instances, _ := generateTestInstances()
for _, instance := range instances {
instance.Version = "5.6.9"
}
test.S(t).ExpectEquals(len(instances), 6)
// Randomly populating different elements of the array/map
perm := rand.Perm(len(instances))[0 : len(instances)/2]
for _, i := range perm {
instances[i].Version = "5.7.8"
}
// getPriorityMajorVersionForCandidate uses map iteration
priorityMajorVersion, err := getPriorityMajorVersionForCandidate(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(priorityMajorVersion, "5.6")
}
}
func TestGetPriorityBinlogFormatForCandidate(t *testing.T) {
{
instances, instancesMap := generateTestInstances()
priorityBinlogFormat, err := getPriorityBinlogFormatForCandidate(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(priorityBinlogFormat, "STATEMENT")
instancesMap[i810Key.StringCode()].BinlogFormat = "MIXED"
instancesMap[i720Key.StringCode()].BinlogFormat = "ROW"
priorityBinlogFormat, err = getPriorityBinlogFormatForCandidate(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(priorityBinlogFormat, "STATEMENT")
instancesMap[i710Key.StringCode()].BinlogFormat = "ROW"
instancesMap[i720Key.StringCode()].BinlogFormat = "ROW"
instancesMap[i730Key.StringCode()].BinlogFormat = "ROW"
instancesMap[i830Key.StringCode()].BinlogFormat = "ROW"
priorityBinlogFormat, err = getPriorityBinlogFormatForCandidate(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(priorityBinlogFormat, "ROW")
}
for _, lowBinlogFormat := range []string{"STATEMENT", "MIXED"} {
// We test under the assumption that map iteration order is random.
for range rand.Perm(20) { // Just running many iterations to cover multiple possible map iteration ordering. Perm() is just used as an array generator here.
instances, _ := generateTestInstances()
for _, instance := range instances {
instance.BinlogFormat = lowBinlogFormat
}
test.S(t).ExpectEquals(len(instances), 6)
// Randomly populating different elements of the array/map
perm := rand.Perm(len(instances))[0 : len(instances)/2]
for _, i := range perm {
instances[i].BinlogFormat = "ROW"
}
// getPriorityBinlogFormatForCandidate uses map iteration
priorityBinlogFormat, err := getPriorityBinlogFormatForCandidate(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(priorityBinlogFormat, lowBinlogFormat)
}
}
}
func TestIsGenerallyValidAsBinlogSource(t *testing.T) {
instances, _ := generateTestInstances()
for _, instance := range instances {
test.S(t).ExpectFalse(isGenerallyValidAsBinlogSource(instance))
}
applyGeneralGoodToGoReplicationParams(instances)
for _, instance := range instances {
test.S(t).ExpectTrue(isGenerallyValidAsBinlogSource(instance))
}
}
func TestIsGenerallyValidAsCandidateReplica(t *testing.T) {
instances, _ := generateTestInstances()
for _, instance := range instances {
test.S(t).ExpectFalse(isGenerallyValidAsCandidateReplica(instance))
}
for _, instance := range instances {
instance.IsLastCheckValid = true
instance.LogBinEnabled = true
instance.LogReplicationUpdatesEnabled = false
}
for _, instance := range instances {
test.S(t).ExpectFalse(isGenerallyValidAsCandidateReplica(instance))
}
applyGeneralGoodToGoReplicationParams(instances)
for _, instance := range instances {
test.S(t).ExpectTrue(isGenerallyValidAsCandidateReplica(instance))
}
}
func TestIsBannedFromBeingCandidateReplica(t *testing.T) {
{
instances, _ := generateTestInstances()
for _, instance := range instances {
test.S(t).ExpectFalse(IsBannedFromBeingCandidateReplica(instance))
}
}
{
instances, _ := generateTestInstances()
for _, instance := range instances {
instance.PromotionRule = promotionrule.MustNot
}
for _, instance := range instances {
test.S(t).ExpectTrue(IsBannedFromBeingCandidateReplica(instance))
}
}
{
instances, _ := generateTestInstances()
config.Config.PromotionIgnoreHostnameFilters = []string{
"i7",
"i8[0-9]0",
}
for _, instance := range instances {
test.S(t).ExpectTrue(IsBannedFromBeingCandidateReplica(instance))
}
config.Config.PromotionIgnoreHostnameFilters = []string{}
}
}
func TestChooseCandidateReplicaNoCandidateReplica(t *testing.T) {
instances, _ := generateTestInstances()
for _, instance := range instances {
instance.IsLastCheckValid = true
instance.LogBinEnabled = true
instance.LogReplicationUpdatesEnabled = false
}
_, _, _, _, _, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNotNil(err)
}
func TestChooseCandidateReplica(t *testing.T) {
instances, _ := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i830Key)
test.S(t).ExpectEquals(len(aheadReplicas), 0)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 5)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplica2(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instancesMap[i830Key.StringCode()].LogReplicationUpdatesEnabled = false
instancesMap[i820Key.StringCode()].LogBinEnabled = false
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i810Key)
test.S(t).ExpectEquals(len(aheadReplicas), 2)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 3)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaSameCoordinatesDifferentVersions(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
for _, instance := range instances {
instance.ExecBinlogCoordinates = instances[0].ExecBinlogCoordinates
}
instancesMap[i810Key.StringCode()].Version = "5.5.1"
instancesMap[i720Key.StringCode()].Version = "5.7.8"
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i810Key)
test.S(t).ExpectEquals(len(aheadReplicas), 0)
test.S(t).ExpectEquals(len(equalReplicas), 5)
test.S(t).ExpectEquals(len(laterReplicas), 0)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPriorityVersionNoLoss(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instancesMap[i830Key.StringCode()].Version = "5.5.1"
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i830Key)
test.S(t).ExpectEquals(len(aheadReplicas), 0)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 5)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPriorityVersionLosesOne(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instancesMap[i830Key.StringCode()].Version = "5.7.8"
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i820Key)
test.S(t).ExpectEquals(len(aheadReplicas), 1)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 4)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPriorityVersionLosesTwo(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instancesMap[i830Key.StringCode()].Version = "5.7.8"
instancesMap[i820Key.StringCode()].Version = "5.7.18"
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i810Key)
test.S(t).ExpectEquals(len(aheadReplicas), 2)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 3)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPriorityVersionHigherVersionOverrides(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instancesMap[i830Key.StringCode()].Version = "5.7.8"
instancesMap[i820Key.StringCode()].Version = "5.7.18"
instancesMap[i810Key.StringCode()].Version = "5.7.5"
instancesMap[i730Key.StringCode()].Version = "5.7.30"
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i830Key)
test.S(t).ExpectEquals(len(aheadReplicas), 0)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 3)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 2)
}
func TestChooseCandidateReplicaLosesOneDueToBinlogFormat(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
for _, instance := range instances {
instance.BinlogFormat = "ROW"
}
instancesMap[i730Key.StringCode()].BinlogFormat = "STATEMENT"
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i830Key)
test.S(t).ExpectEquals(len(aheadReplicas), 0)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 4)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 1)
}
func TestChooseCandidateReplicaPriorityBinlogFormatNoLoss(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
for _, instance := range instances {
instance.BinlogFormat = "MIXED"
}
instancesMap[i830Key.StringCode()].BinlogFormat = "STATEMENT"
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i830Key)
test.S(t).ExpectEquals(len(aheadReplicas), 0)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 5)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPriorityBinlogFormatLosesOne(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instancesMap[i830Key.StringCode()].BinlogFormat = "ROW"
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i820Key)
test.S(t).ExpectEquals(len(aheadReplicas), 1)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 4)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPriorityBinlogFormatLosesTwo(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instancesMap[i830Key.StringCode()].BinlogFormat = "ROW"
instancesMap[i820Key.StringCode()].BinlogFormat = "ROW"
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i810Key)
test.S(t).ExpectEquals(len(aheadReplicas), 2)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 3)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPriorityBinlogFormatRowOverrides(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instancesMap[i830Key.StringCode()].BinlogFormat = "ROW"
instancesMap[i820Key.StringCode()].BinlogFormat = "ROW"
instancesMap[i810Key.StringCode()].BinlogFormat = "ROW"
instancesMap[i730Key.StringCode()].BinlogFormat = "ROW"
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i830Key)
test.S(t).ExpectEquals(len(aheadReplicas), 0)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 3)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 2)
}
func TestChooseCandidateReplicaMustNotPromoteRule(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instancesMap[i830Key.StringCode()].PromotionRule = promotionrule.MustNot
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i820Key)
test.S(t).ExpectEquals(len(aheadReplicas), 1)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 4)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPreferNotPromoteRule(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
instancesMap[i830Key.StringCode()].PromotionRule = promotionrule.MustNot
instancesMap[i820Key.StringCode()].PromotionRule = promotionrule.PreferNot
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i820Key)
test.S(t).ExpectEquals(len(aheadReplicas), 1)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 4)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPreferNotPromoteRule2(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
for _, instance := range instances {
instance.PromotionRule = promotionrule.PreferNot
}
instancesMap[i830Key.StringCode()].PromotionRule = promotionrule.MustNot
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i820Key)
test.S(t).ExpectEquals(len(aheadReplicas), 1)
test.S(t).ExpectEquals(len(equalReplicas), 0)
test.S(t).ExpectEquals(len(laterReplicas), 4)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPromoteRuleOrdering(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
for _, instance := range instances {
instance.ExecBinlogCoordinates = instancesMap[i710Key.StringCode()].ExecBinlogCoordinates
instance.PromotionRule = promotionrule.Neutral
}
instancesMap[i830Key.StringCode()].PromotionRule = promotionrule.Prefer
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i830Key)
test.S(t).ExpectEquals(len(aheadReplicas), 0)
test.S(t).ExpectEquals(len(equalReplicas), 5)
test.S(t).ExpectEquals(len(laterReplicas), 0)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPromoteRuleOrdering2(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
for _, instance := range instances {
instance.ExecBinlogCoordinates = instancesMap[i710Key.StringCode()].ExecBinlogCoordinates
instance.PromotionRule = promotionrule.Prefer
}
instancesMap[i820Key.StringCode()].PromotionRule = promotionrule.Must
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i820Key)
test.S(t).ExpectEquals(len(aheadReplicas), 0)
test.S(t).ExpectEquals(len(equalReplicas), 5)
test.S(t).ExpectEquals(len(laterReplicas), 0)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}
func TestChooseCandidateReplicaPromoteRuleOrdering3(t *testing.T) {
instances, instancesMap := generateTestInstances()
applyGeneralGoodToGoReplicationParams(instances)
for _, instance := range instances {
instance.ExecBinlogCoordinates = instancesMap[i710Key.StringCode()].ExecBinlogCoordinates
instance.PromotionRule = promotionrule.Neutral
}
instancesMap[i730Key.StringCode()].PromotionRule = promotionrule.Must
instancesMap[i810Key.StringCode()].PromotionRule = promotionrule.Prefer
instancesMap[i830Key.StringCode()].PromotionRule = promotionrule.PreferNot
instances = sortedReplicas(instances, NoStopReplication)
candidate, aheadReplicas, equalReplicas, laterReplicas, cannotReplicateReplicas, err := ChooseCandidateReplica(instances)
test.S(t).ExpectNil(err)
test.S(t).ExpectEquals(candidate.Key, i730Key)
test.S(t).ExpectEquals(len(aheadReplicas), 0)
test.S(t).ExpectEquals(len(equalReplicas), 5)
test.S(t).ExpectEquals(len(laterReplicas), 0)
test.S(t).ExpectEquals(len(cannotReplicateReplicas), 0)
}


@@ -26,193 +26,6 @@ var (
DowntimeLostInRecoveryMessage = "lost-in-recovery"
)
// majorVersionsSortedByCount sorts (major) versions:
// - primary sort: by count appearances
// - secondary sort: by version
type majorVersionsSortedByCount struct {
versionsCount map[string]int
versions []string
}
func newMajorVersionsSortedByCount(versionsCount map[string]int) *majorVersionsSortedByCount {
versions := []string{}
for v := range versionsCount {
versions = append(versions, v)
}
return &majorVersionsSortedByCount{
versionsCount: versionsCount,
versions: versions,
}
}
func (majorVersionSorter *majorVersionsSortedByCount) Len() int {
return len(majorVersionSorter.versions)
}
func (majorVersionSorter *majorVersionsSortedByCount) Swap(i, j int) {
majorVersionSorter.versions[i], majorVersionSorter.versions[j] = majorVersionSorter.versions[j], majorVersionSorter.versions[i]
}
func (majorVersionSorter *majorVersionsSortedByCount) Less(i, j int) bool {
if majorVersionSorter.versionsCount[majorVersionSorter.versions[i]] == majorVersionSorter.versionsCount[majorVersionSorter.versions[j]] {
return majorVersionSorter.versions[i] > majorVersionSorter.versions[j]
}
return majorVersionSorter.versionsCount[majorVersionSorter.versions[i]] < majorVersionSorter.versionsCount[majorVersionSorter.versions[j]]
}
func (majorVersionSorter *majorVersionsSortedByCount) First() string {
return majorVersionSorter.versions[0]
}
// binlogFormatSortedByCount sorts binlog formats:
// - primary sort: by count appearances
// - secondary sort: by binlog format
type binlogFormatSortedByCount struct {
formatsCount map[string]int
formats []string
}
func newBinlogFormatSortedByCount(formatsCount map[string]int) *binlogFormatSortedByCount {
formats := []string{}
for v := range formatsCount {
formats = append(formats, v)
}
return &binlogFormatSortedByCount{
formatsCount: formatsCount,
formats: formats,
}
}
func (binlogFormatSorter *binlogFormatSortedByCount) Len() int {
return len(binlogFormatSorter.formats)
}
func (binlogFormatSorter *binlogFormatSortedByCount) Swap(i, j int) {
binlogFormatSorter.formats[i], binlogFormatSorter.formats[j] = binlogFormatSorter.formats[j], binlogFormatSorter.formats[i]
}
func (binlogFormatSorter *binlogFormatSortedByCount) Less(i, j int) bool {
if binlogFormatSorter.formatsCount[binlogFormatSorter.formats[i]] == binlogFormatSorter.formatsCount[binlogFormatSorter.formats[j]] {
return IsSmallerBinlogFormat(binlogFormatSorter.formats[j], binlogFormatSorter.formats[i])
}
return binlogFormatSorter.formatsCount[binlogFormatSorter.formats[i]] < binlogFormatSorter.formatsCount[binlogFormatSorter.formats[j]]
}
func (binlogFormatSorter *binlogFormatSortedByCount) First() string {
return binlogFormatSorter.formats[0]
}
// InstancesSorterByExec sorts instances by executed binlog coordinates
type InstancesSorterByExec struct {
instances [](*Instance)
dataCenter string
}
func NewInstancesSorterByExec(instances [](*Instance), dataCenter string) *InstancesSorterByExec {
return &InstancesSorterByExec{
instances: instances,
dataCenter: dataCenter,
}
}
func (instancesSorter *InstancesSorterByExec) Len() int { return len(instancesSorter.instances) }
func (instancesSorter *InstancesSorterByExec) Swap(i, j int) {
instancesSorter.instances[i], instancesSorter.instances[j] = instancesSorter.instances[j], instancesSorter.instances[i]
}
func (instancesSorter *InstancesSorterByExec) Less(i, j int) bool {
// Returning "true" in this function means [i] is "smaller" than [j],
// which will lead to [j] be a better candidate for promotion
// Sh*t happens. We just might get nil while attempting to discover/recover
if instancesSorter.instances[i] == nil {
return false
}
if instancesSorter.instances[j] == nil {
return true
}
if instancesSorter.instances[i].ExecBinlogCoordinates.Equals(&instancesSorter.instances[j].ExecBinlogCoordinates) {
// Secondary sorting: "smaller" if not logging replica updates
if instancesSorter.instances[j].LogReplicationUpdatesEnabled && !instancesSorter.instances[i].LogReplicationUpdatesEnabled {
return true
}
// Next sorting: "smaller" if of higher version (this will be reversed eventually)
// Idea is that given 5.6 & 5.7, both at the exact same position, we will want to promote
// the 5.6 on top of 5.7, as the other way around is invalid
if instancesSorter.instances[j].IsSmallerMajorVersion(instancesSorter.instances[i]) {
return true
}
// Next sorting: "smaller" if of larger binlog-format (this will be reversed eventually)
// Idea is that given ROW & STATEMENT, both at the exact same position, we will want to promote
// the STATEMENT on top of ROW, as the other way around is invalid
if instancesSorter.instances[j].IsSmallerBinlogFormat(instancesSorter.instances[i]) {
return true
}
// Prefer local datacenter:
if instancesSorter.instances[j].DataCenter == instancesSorter.dataCenter && instancesSorter.instances[i].DataCenter != instancesSorter.dataCenter {
return true
}
// Prefer if not having errant GTID
if instancesSorter.instances[j].GtidErrant == "" && instancesSorter.instances[i].GtidErrant != "" {
return true
}
// Prefer candidates:
if instancesSorter.instances[j].PromotionRule.BetterThan(instancesSorter.instances[i].PromotionRule) {
return true
}
}
return instancesSorter.instances[i].ExecBinlogCoordinates.SmallerThan(&instancesSorter.instances[j].ExecBinlogCoordinates)
}
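Less layers its tie-breaks behind a single primary key: only when executed coordinates are equal do the secondary preferences apply, each phrased as "j is preferable to i". A standalone sketch of the same layered-comparator shape using sort.Slice:

package main

import (
	"fmt"
	"sort"
)

type replica struct {
	pos        int  // stand-in for ExecBinlogCoordinates
	logUpdates bool // stand-in for LogReplicationUpdatesEnabled
}

func main() {
	rs := []replica{{10, false}, {10, true}, {7, true}}
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].pos == rs[j].pos {
			// Tie-break: i is "smaller" (a worse candidate) when only j
			// logs replica updates.
			return rs[j].logUpdates && !rs[i].logUpdates
		}
		return rs[i].pos < rs[j].pos
	})
	fmt.Println(rs) // [{7 true} {10 false} {10 true}]
}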
// filterInstancesByPattern will filter given array of instances according to regular expression pattern
func filterInstancesByPattern(instances [](*Instance), pattern string) [](*Instance) {
if pattern == "" {
return instances
}
filtered := [](*Instance){}
for _, instance := range instances {
if matched, _ := regexp.MatchString(pattern, instance.Key.DisplayString()); matched {
filtered = append(filtered, instance)
}
}
return filtered
}
// RemoveInstance will remove an instance from a list of instances
func RemoveInstance(instances [](*Instance), instanceKey *InstanceKey) [](*Instance) {
if instanceKey == nil {
return instances
}
for i := len(instances) - 1; i >= 0; i-- {
if instances[i].Key.Equals(instanceKey) {
instances = append(instances[:i], instances[i+1:]...)
}
}
return instances
}
// RemoveBinlogServerInstances will remove all binlog servers from the given list
func RemoveBinlogServerInstances(instances [](*Instance)) [](*Instance) {
for i := len(instances) - 1; i >= 0; i-- {
if instances[i].IsBinlogServer() {
instances = append(instances[:i], instances[i+1:]...)
}
}
return instances
}
// RemoveNilInstances removes nil entries from a list of instances
func RemoveNilInstances(instances [](*Instance)) [](*Instance) {
for i := len(instances) - 1; i >= 0; i-- {
if instances[i] == nil {
instances = append(instances[:i], instances[i+1:]...)
}
}
return instances
}
// SemicolonTerminated is a utility function that makes sure a statement is terminated with
// a semicolon, if it isn't already
func SemicolonTerminated(statement string) string {
statement = strings.TrimSpace(statement)
statement = strings.TrimRight(statement, ";")
statement = statement + ";"
return statement
}
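A standalone mirror of SemicolonTerminated, showing that any run of trailing semicolons collapses to exactly one; semicolonTerminated is a local copy for illustration:

package main

import (
	"fmt"
	"strings"
)

func semicolonTerminated(statement string) string {
	statement = strings.TrimSpace(statement)
	statement = strings.TrimRight(statement, ";")
	return statement + ";"
}

func main() {
	fmt.Println(semicolonTerminated("stop slave;;")) // stop slave;
	fmt.Println(semicolonTerminated("start slave")) // start slave;
}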
// MajorVersion returns a MySQL major version number (e.g. given "5.5.36" it returns "5.5")
func MajorVersion(version string) []string {
tokens := strings.Split(version, ".")


@@ -23,206 +23,8 @@ import (
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/db"
"vitess.io/vitess/go/vt/vtorc/external/golib/sqlutils"
"vitess.io/vitess/go/vt/vtorc/process"
"vitess.io/vitess/go/vt/vtorc/util"
)
// ReadActiveMaintenance returns the list of currently active maintenance entries
func ReadActiveMaintenance() ([]Maintenance, error) {
res := []Maintenance{}
query := `
select
database_instance_maintenance_id,
hostname,
port,
begin_timestamp,
unix_timestamp() - unix_timestamp(begin_timestamp) as seconds_elapsed,
maintenance_active,
owner,
reason
from
database_instance_maintenance
where
maintenance_active = 1
order by
database_instance_maintenance_id
`
err := db.QueryVTOrcRowsMap(query, func(m sqlutils.RowMap) error {
maintenance := Maintenance{}
maintenance.MaintenanceID = m.GetUint("database_instance_maintenance_id")
maintenance.Key.Hostname = m.GetString("hostname")
maintenance.Key.Port = m.GetInt("port")
maintenance.BeginTimestamp = m.GetString("begin_timestamp")
maintenance.SecondsElapsed = m.GetUint("seconds_elapsed")
maintenance.IsActive = m.GetBool("maintenance_active")
maintenance.Owner = m.GetString("owner")
maintenance.Reason = m.GetString("reason")
res = append(res, maintenance)
return nil
})
if err != nil {
log.Error(err)
}
return res, err
}
// BeginBoundedMaintenance will create a new maintenance entry for the given instanceKey.
func BeginBoundedMaintenance(instanceKey *InstanceKey, owner string, reason string, durationSeconds uint, explicitlyBounded bool) (int64, error) {
var maintenanceToken int64
if durationSeconds == 0 {
durationSeconds = config.MaintenanceExpireMinutes * 60
}
res, err := db.ExecVTOrc(`
insert ignore
into database_instance_maintenance (
hostname, port, maintenance_active, begin_timestamp, end_timestamp, owner, reason,
processing_node_hostname, processing_node_token, explicitly_bounded
) VALUES (
?, ?, 1, NOW(), NOW() + INTERVAL ? SECOND, ?, ?,
?, ?, ?
)
`,
instanceKey.Hostname,
instanceKey.Port,
durationSeconds,
owner,
reason,
process.ThisHostname,
util.ProcessToken.Hash,
explicitlyBounded,
)
if err != nil {
log.Error(err)
return maintenanceToken, err
}
if affected, _ := res.RowsAffected(); affected == 0 {
err = fmt.Errorf("Cannot begin maintenance for instance: %+v; maintenance reason: %+v", instanceKey, reason)
} else {
// success
maintenanceToken, _ = res.LastInsertId()
_ = AuditOperation("begin-maintenance", instanceKey, fmt.Sprintf("maintenanceToken: %d, owner: %s, reason: %s", maintenanceToken, owner, reason))
}
return maintenanceToken, err
}
// BeginMaintenance will create a new maintenance entry for the given instanceKey. Maintenance time is unbounded
func BeginMaintenance(instanceKey *InstanceKey, owner string, reason string) (int64, error) {
return BeginBoundedMaintenance(instanceKey, owner, reason, 0, false)
}
// EndMaintenanceByInstanceKey will terminate an active maintenance using given instanceKey as hint
func EndMaintenanceByInstanceKey(instanceKey *InstanceKey) (wasMaintenance bool, err error) {
res, err := db.ExecVTOrc(`
update
database_instance_maintenance
set
maintenance_active = NULL,
end_timestamp = NOW()
where
hostname = ?
and port = ?
and maintenance_active = 1
`,
instanceKey.Hostname,
instanceKey.Port,
)
if err != nil {
log.Error(err)
return wasMaintenance, err
}
if affected, _ := res.RowsAffected(); affected > 0 {
// success
wasMaintenance = true
_ = AuditOperation("end-maintenance", instanceKey, "")
}
return wasMaintenance, err
}
// InMaintenance checks whether a given instance is under maintenance
func InMaintenance(instanceKey *InstanceKey) (inMaintenance bool, err error) {
query := `
select
count(*) > 0 as in_maintenance
from
database_instance_maintenance
where
hostname = ?
and port = ?
and maintenance_active = 1
and end_timestamp > NOW()
`
args := sqlutils.Args(instanceKey.Hostname, instanceKey.Port)
err = db.QueryVTOrc(query, args, func(m sqlutils.RowMap) error {
inMaintenance = m.GetBool("in_maintenance")
return nil
})
if err != nil {
log.Error(err)
}
return inMaintenance, err
}
// ReadMaintenanceInstanceKey will return the instanceKey for active maintenance by maintenanceToken
func ReadMaintenanceInstanceKey(maintenanceToken int64) (*InstanceKey, error) {
var res *InstanceKey
query := `
select
hostname, port
from
database_instance_maintenance
where
database_instance_maintenance_id = ?
`
err := db.QueryVTOrc(query, sqlutils.Args(maintenanceToken), func(m sqlutils.RowMap) error {
instanceKey, merr := NewResolveInstanceKey(m.GetString("hostname"), m.GetInt("port"))
if merr != nil {
return merr
}
res = instanceKey
return nil
})
if err != nil {
log.Error(err)
}
return res, err
}
// EndMaintenance will terminate an active maintenance via maintenanceToken
func EndMaintenance(maintenanceToken int64) (wasMaintenance bool, err error) {
res, err := db.ExecVTOrc(`
update
database_instance_maintenance
set
maintenance_active = NULL,
end_timestamp = NOW()
where
database_instance_maintenance_id = ?
`,
maintenanceToken,
)
if err != nil {
log.Error(err)
return wasMaintenance, err
}
if affected, _ := res.RowsAffected(); affected > 0 {
// success
wasMaintenance = true
instanceKey, _ := ReadMaintenanceInstanceKey(maintenanceToken)
_ = AuditOperation("end-maintenance", instanceKey, fmt.Sprintf("maintenanceToken: %d", maintenanceToken))
}
return wasMaintenance, err
}
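Taken together, these functions form a small begin/check/end API. A hedged usage sketch: it assumes the VTOrc backend database is already initialized, abbreviates error handling, and uses illustrative hostnames and owner/reason strings:
package main

import (
	"vitess.io/vitess/go/vt/log"
	"vitess.io/vitess/go/vt/vtorc/inst"
)

func withMaintenance(key *inst.InstanceKey, doWork func()) {
	token, err := inst.BeginMaintenance(key, "alice", "rolling restart")
	if err != nil {
		log.Error(err)
		return
	}
	// Ending by token (rather than by key) cannot be confused with a
	// concurrent "end by key" issued by another operator.
	defer func() { _, _ = inst.EndMaintenance(token) }()

	if active, _ := inst.InMaintenance(key); active {
		doWork()
	}
}

func main() {
	withMaintenance(&inst.InstanceKey{Hostname: "db-1", Port: 3306}, func() {
		// operate on the instance while it is flagged as under maintenance
	})
}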
// ExpireMaintenance will remove the maintenance flag on old maintenances and on bounded maintenances
func ExpireMaintenance() error {
{

@ -1,79 +0,0 @@
/*
Copyright 2015 Shlomi Noach, courtesy Booking.com
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package inst
import (
"strings"
"time"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/vtorc/config"
)
// PoolInstancesMap lists instance keys per pool name
type PoolInstancesMap map[string]([]*InstanceKey)
type PoolInstancesSubmission struct {
CreatedAt time.Time
Pool string
DelimitedInstances string
RegisteredAt string
}
func NewPoolInstancesSubmission(pool string, instances string) *PoolInstancesSubmission {
return &PoolInstancesSubmission{
CreatedAt: time.Now(),
Pool: pool,
DelimitedInstances: instances,
}
}
// ClusterPoolInstance is an instance mapping a cluster, pool & instance
type ClusterPoolInstance struct {
ClusterName string
Pool string
Hostname string
Port int
}
func ApplyPoolInstances(submission *PoolInstancesSubmission) error {
if submission.CreatedAt.Add(time.Duration(config.Config.InstancePoolExpiryMinutes) * time.Minute).Before(time.Now()) {
// already expired; no need to persist
return nil
}
var instanceKeys [](*InstanceKey)
if submission.DelimitedInstances != "" {
instancesStrings := strings.Split(submission.DelimitedInstances, ",")
for _, instanceString := range instancesStrings {
instanceString = strings.TrimSpace(instanceString)
instanceKey, err := ParseResolveInstanceKey(instanceString)
if err != nil {
log.Error(err)
return err
}
if config.Config.SupportFuzzyPoolHostnames {
instanceKey = ReadFuzzyInstanceKeyIfPossible(instanceKey)
}
instanceKeys = append(instanceKeys, instanceKey)
}
}
log.Infof("submitting %d instances in %+v pool", len(instanceKeys), submission.Pool)
_ = writePoolInstances(submission.Pool, instanceKeys)
return nil
}

@ -1,135 +0,0 @@
/*
Copyright 2015 Shlomi Noach, courtesy Booking.com
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package inst
import (
"fmt"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/db"
"vitess.io/vitess/go/vt/vtorc/external/golib/sqlutils"
)
// writePoolInstances will write (and override) a single cluster name mapping
func writePoolInstances(pool string, instanceKeys [](*InstanceKey)) error {
writeFunc := func() error {
dbh, err := db.OpenVTOrc()
if err != nil {
log.Error(err)
return err
}
tx, _ := dbh.Begin()
if _, err := tx.Exec(`delete from database_instance_pool where pool = ?`, pool); err != nil {
_ = tx.Rollback()
log.Error(err)
return err
}
query := `insert into database_instance_pool (hostname, port, pool, registered_at) values (?, ?, ?, now())`
for _, instanceKey := range instanceKeys {
if _, err := tx.Exec(query, instanceKey.Hostname, instanceKey.Port, pool); err != nil {
_ = tx.Rollback()
log.Error(err)
return err
}
}
_ = tx.Commit()
return nil
}
return ExecDBWriteFunc(writeFunc)
}
// ReadClusterPoolInstances reads cluster-pool-instance associations for the given cluster and pool
func ReadClusterPoolInstances(clusterName string, pool string) (result [](*ClusterPoolInstance), err error) {
args := sqlutils.Args()
whereClause := ``
if clusterName != "" {
whereClause = `
where
database_instance.cluster_name = ?
and ? in ('', pool)
`
args = append(args, clusterName, pool)
}
query := fmt.Sprintf(`
select
cluster_name,
database_instance_pool.*
from
database_instance
join database_instance_pool using (hostname, port)
%s
`, whereClause)
err = db.QueryVTOrc(query, args, func(m sqlutils.RowMap) error {
clusterPoolInstance := ClusterPoolInstance{
ClusterName: m.GetString("cluster_name"),
Pool: m.GetString("pool"),
Hostname: m.GetString("hostname"),
Port: m.GetInt("port"),
}
result = append(result, &clusterPoolInstance)
return nil
})
if err != nil {
return nil, err
}
return result, nil
}
// ReadAllClusterPoolInstances returns all cluster-pool-instance associations
func ReadAllClusterPoolInstances() ([](*ClusterPoolInstance), error) {
return ReadClusterPoolInstances("", "")
}
// ReadClusterPoolInstancesMap returns association of pools-to-instances for a given cluster
// and potentially for a given pool.
func ReadClusterPoolInstancesMap(clusterName string, pool string) (*PoolInstancesMap, error) {
var poolInstancesMap = make(PoolInstancesMap)
clusterPoolInstances, err := ReadClusterPoolInstances(clusterName, pool)
if err != nil {
return nil, err
}
for _, clusterPoolInstance := range clusterPoolInstances {
if _, ok := poolInstancesMap[clusterPoolInstance.Pool]; !ok {
poolInstancesMap[clusterPoolInstance.Pool] = [](*InstanceKey){}
}
poolInstancesMap[clusterPoolInstance.Pool] = append(poolInstancesMap[clusterPoolInstance.Pool], &InstanceKey{Hostname: clusterPoolInstance.Hostname, Port: clusterPoolInstance.Port})
}
return &poolInstancesMap, nil
}
// ExpirePoolInstances cleans up the database_instance_pool table from expired items
func ExpirePoolInstances() error {
_, err := db.ExecVTOrc(`
delete
from database_instance_pool
where
registered_at < now() - interval ? minute
`,
config.Config.InstancePoolExpiryMinutes,
)
if err != nil {
log.Error(err)
}
return err
}

@ -20,7 +20,6 @@ import (
"errors"
"fmt"
"net"
"regexp"
"strings"
"sync"
"time"
@ -76,23 +75,17 @@ var hostnameResolvesLightweightCacheInit = &sync.Mutex{}
var hostnameResolvesLightweightCacheLoadedOnceFromDB = false
var hostnameIPsCache = cache.New(10*time.Minute, time.Minute)
func init() {
if config.Config.ExpiryHostnameResolvesMinutes < 1 {
config.Config.ExpiryHostnameResolvesMinutes = 1
}
}
func getHostnameResolvesLightweightCache() *cache.Cache {
hostnameResolvesLightweightCacheInit.Lock()
defer hostnameResolvesLightweightCacheInit.Unlock()
if hostnameResolvesLightweightCache == nil {
hostnameResolvesLightweightCache = cache.New(time.Duration(config.Config.ExpiryHostnameResolvesMinutes)*time.Minute, time.Minute)
hostnameResolvesLightweightCache = cache.New(time.Duration(config.ExpiryHostnameResolvesMinutes)*time.Minute, time.Minute)
}
return hostnameResolvesLightweightCache
}
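The lightweight cache is built lazily under a mutex; with the patrickmn/go-cache package used here, the first duration passed to cache.New is the default per-item TTL and the second is the janitor's cleanup interval. A standalone sketch of the same pattern (hypothetical cache, not the vtorc one):
package main

import (
	"fmt"
	"sync"
	"time"

	"github.com/patrickmn/go-cache"
)

var (
	mu sync.Mutex
	c  *cache.Cache
)

// getCache lazily creates a cache whose items expire after 10 minutes
// and whose janitor sweeps expired items once a minute.
func getCache() *cache.Cache {
	mu.Lock()
	defer mu.Unlock()
	if c == nil {
		c = cache.New(10*time.Minute, time.Minute)
	}
	return c
}

func main() {
	getCache().Set("host-1", "10.0.0.1", cache.DefaultExpiration)
	if v, ok := getCache().Get("host-1"); ok {
		fmt.Println(v) // 10.0.0.1
	}
}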
func HostnameResolveMethodIsNone() bool {
return strings.ToLower(config.Config.HostnameResolveMethod) == "none"
return strings.ToLower(config.HostnameResolveMethod) == "none"
}
// GetCNAME resolves an IP or hostname into a normalized valid CNAME
@ -106,7 +99,7 @@ func GetCNAME(hostname string) (string, error) {
}
func resolveHostname(hostname string) (string, error) {
switch strings.ToLower(config.Config.HostnameResolveMethod) {
switch strings.ToLower(config.HostnameResolveMethod) {
case "none":
return hostname, nil
case "default":
@ -157,14 +150,6 @@ func ResolveHostname(hostname string) (string, error) {
// Unfound: resolve!
log.Infof("Hostname unresolved yet: %s", hostname)
resolvedHostname, err := resolveHostname(hostname)
if config.Config.RejectHostnameResolvePattern != "" {
// Reject, don't even cache
if matched, _ := regexp.MatchString(config.Config.RejectHostnameResolvePattern, resolvedHostname); matched {
log.Warningf("ResolveHostname: %+v resolved to %+v but rejected due to RejectHostnameResolvePattern '%+v'", hostname, resolvedHostname, config.Config.RejectHostnameResolvePattern)
return hostname, nil
}
}
if err != nil {
// Problem. What we'll do is cache the hostname for just one minute, so as to avoid flooding requests
// on one hand, yet make it refresh shortly on the other hand. Anyway do not write to database.
@ -227,63 +212,10 @@ func FlushNontrivialResolveCacheToDatabase() error {
return nil
}
func ResetHostnameResolveCache() error {
err := deleteHostnameResolves()
getHostnameResolvesLightweightCache().Flush()
hostnameResolvesLightweightCacheLoadedOnceFromDB = false
return err
}
func HostnameResolveCache() (map[string]cache.Item, error) {
return getHostnameResolvesLightweightCache().Items(), nil
}
func UnresolveHostname(instanceKey *InstanceKey) (InstanceKey, bool, error) {
if *config.RuntimeCLIFlags.SkipUnresolve {
return *instanceKey, false, nil
}
unresolvedHostname, err := readUnresolvedHostname(instanceKey.Hostname)
if err != nil {
log.Error(err)
return *instanceKey, false, err
}
if unresolvedHostname == instanceKey.Hostname {
// unchanged. Nothing to do
return *instanceKey, false, nil
}
// We unresolved to a different hostname. We will now re-resolve to double-check!
unresolvedKey := &InstanceKey{Hostname: unresolvedHostname, Port: instanceKey.Port}
instance, err := ReadTopologyInstance(unresolvedKey)
if err != nil {
log.Error(err)
return *instanceKey, false, err
}
if instance.IsBinlogServer() && config.Config.SkipBinlogServerUnresolveCheck {
// Do nothing. Everything is assumed to be fine.
} else if instance.Key.Hostname != instanceKey.Hostname {
// Resolve(Unresolve(hostname)) != hostname ==> Bad; reject
if *config.RuntimeCLIFlags.SkipUnresolveCheck {
return *instanceKey, false, nil
}
errMsg := fmt.Sprintf("Error unresolving; hostname=%s, unresolved=%s, re-resolved=%s; mismatch. Skip/ignore with --skip-unresolve-check", instanceKey.Hostname, unresolvedKey.Hostname, instance.Key.Hostname)
log.Errorf(errMsg)
return *instanceKey, false, fmt.Errorf(errMsg)
}
return *unresolvedKey, true, nil
}
func RegisterHostnameUnresolve(registration *HostnameRegistration) (err error) {
if registration.Hostname == "" {
return DeleteHostnameUnresolve(&registration.Key)
}
if registration.CreatedAt.Add(time.Duration(config.Config.ExpiryHostnameResolvesMinutes) * time.Minute).Before(time.Now()) {
// already expired.
return nil
}
return WriteHostnameUnresolve(&registration.Key, registration.Hostname)
}
func extractIPs(ips []net.IP) (ipv4String string, ipv6String string) {
for _, ip := range ips {
if ip4 := ip.To4(); ip4 != nil {

@ -126,110 +126,13 @@ func ReadAllHostnameResolves() ([]HostnameResolve, error) {
return res, err
}
// ReadAllHostnameUnresolves returns the content of the hostname_unresolve table
func ReadAllHostnameUnresolves() ([]HostnameUnresolve, error) {
unres := []HostnameUnresolve{}
query := `
select
hostname,
unresolved_hostname
from
hostname_unresolve
`
err := db.QueryVTOrcRowsMap(query, func(m sqlutils.RowMap) error {
hostnameUnresolve := HostnameUnresolve{hostname: m.GetString("hostname"), unresolvedHostname: m.GetString("unresolved_hostname")}
unres = append(unres, hostnameUnresolve)
return nil
})
if err != nil {
log.Error(err)
}
return unres, err
}
// readUnresolvedHostname reverse-reads hostname resolve. It returns a hostname which matches given pattern and resolves to resolvedHostname,
// or, in the event no such hostname is found, the given resolvedHostname, unchanged.
func readUnresolvedHostname(hostname string) (string, error) {
unresolvedHostname := hostname
query := `
select
unresolved_hostname
from
hostname_unresolve
where
hostname = ?
`
err := db.QueryVTOrc(query, sqlutils.Args(hostname), func(m sqlutils.RowMap) error {
unresolvedHostname = m.GetString("unresolved_hostname")
return nil
})
readUnresolvedHostnameCounter.Inc(1)
if err != nil {
log.Error(err)
}
return unresolvedHostname, err
}
// WriteHostnameUnresolve upserts an entry in hostname_unresolve
func WriteHostnameUnresolve(instanceKey *InstanceKey, unresolvedHostname string) error {
writeFunc := func() error {
_, err := db.ExecVTOrc(`
insert into hostname_unresolve (
hostname,
unresolved_hostname,
last_registered)
values (?, ?, NOW())
on duplicate key update
unresolved_hostname=values(unresolved_hostname),
last_registered=now()
`, instanceKey.Hostname, unresolvedHostname,
)
if err != nil {
log.Error(err)
return err
}
_, _ = db.ExecVTOrc(`
replace into hostname_unresolve_history (
hostname,
unresolved_hostname,
last_registered)
values (?, ?, NOW())
`, instanceKey.Hostname, unresolvedHostname,
)
writeUnresolvedHostnameCounter.Inc(1)
return nil
}
return ExecDBWriteFunc(writeFunc)
}
// DeleteHostnameUnresolve removes an unresolve entry
func DeleteHostnameUnresolve(instanceKey *InstanceKey) error {
writeFunc := func() error {
_, err := db.ExecVTOrc(`
delete from hostname_unresolve
where hostname=?
`, instanceKey.Hostname,
)
if err != nil {
log.Error(err)
}
return err
}
return ExecDBWriteFunc(writeFunc)
}
// ExpireHostnameUnresolve expires hostname_unresolve entries that haven't been updated recently.
func ExpireHostnameUnresolve() error {
writeFunc := func() error {
_, err := db.ExecVTOrc(`
delete from hostname_unresolve
where last_registered < NOW() - INTERVAL ? MINUTE
`, config.Config.ExpiryHostnameResolvesMinutes,
`, config.ExpiryHostnameResolvesMinutes,
)
if err != nil {
log.Error(err)
@ -246,7 +149,7 @@ func ForgetExpiredHostnameResolves() error {
from hostname_resolve
where
resolved_timestamp < NOW() - interval ? minute`,
2*config.Config.ExpiryHostnameResolvesMinutes,
2*config.ExpiryHostnameResolvesMinutes,
)
return err
}
@ -290,15 +193,6 @@ func DeleteInvalidHostnameResolves() error {
return err
}
// deleteHostnameResolves completely erases the database cache
func deleteHostnameResolves() error {
_, err := db.ExecVTOrc(`
delete
from hostname_resolve`,
)
return err
}
// writeHostnameIPs stores an ipv4 and ipv6 associated with a hostname, if available
func writeHostnameIPs(hostname string, ipv4String string, ipv6String string) error {
writeFunc := func() error {

@ -1,133 +0,0 @@
/*
Copyright 2017 Simon J Mudd
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package inst
/*
query holds information about query metrics and records the time taken
waiting before doing the query plus the time taken executing the query.
*/
import (
"time"
"vitess.io/vitess/go/vt/vtorc/collection"
"vitess.io/vitess/go/vt/vtorc/config"
"github.com/montanaflynn/stats"
)
// WriteBufferMetric records query metrics of backend writes that go through
// a sized channel. It allows us to compare the time spent waiting to
// execute the query against the time needed to run it; in a
// "sized channel" the wait time may be significant and is worth
// measuring.
type WriteBufferMetric struct {
Timestamp time.Time // time the metric was started
Instances int // number of flushed instances
WaitLatency time.Duration // waiting before flush
WriteLatency time.Duration // time writing to backend
}
// When returns the timestamp of the start of the recording
func (m WriteBufferMetric) When() time.Time {
return m.Timestamp
}
type AggregatedWriteBufferMetric struct {
InstanceWriteBufferSize int // config setting
InstanceFlushIntervalMilliseconds int // config setting
CountInstances int
MaxInstances float64
MeanInstances float64
MedianInstances float64
P95Instances float64
MaxWaitSeconds float64
MeanWaitSeconds float64
MedianWaitSeconds float64
P95WaitSeconds float64
MaxWriteSeconds float64
MeanWriteSeconds float64
MedianWriteSeconds float64
P95WriteSeconds float64
}
// AggregatedSince returns the aggregated query metrics for the period
// given from the values provided.
func AggregatedSince(c *collection.Collection, t time.Time) AggregatedWriteBufferMetric {
// Initialise timing metrics
var instancesCounter []float64
var waitTimings []float64
var writeTimings []float64
// Retrieve values since the time specified
values, err := c.Since(t)
a := AggregatedWriteBufferMetric{
InstanceWriteBufferSize: config.Config.InstanceWriteBufferSize,
InstanceFlushIntervalMilliseconds: config.Config.InstanceFlushIntervalMilliseconds,
}
if err != nil {
return a // empty data
}
// generate the metrics
for _, v := range values {
instancesCounter = append(instancesCounter, float64(v.(*WriteBufferMetric).Instances))
waitTimings = append(waitTimings, v.(*WriteBufferMetric).WaitLatency.Seconds())
writeTimings = append(writeTimings, v.(*WriteBufferMetric).WriteLatency.Seconds())
a.CountInstances += v.(*WriteBufferMetric).Instances
}
// generate aggregate values
if s, err := stats.Max(stats.Float64Data(instancesCounter)); err == nil {
a.MaxInstances = s
}
if s, err := stats.Mean(stats.Float64Data(instancesCounter)); err == nil {
a.MeanInstances = s
}
if s, err := stats.Median(stats.Float64Data(instancesCounter)); err == nil {
a.MedianInstances = s
}
if s, err := stats.Percentile(stats.Float64Data(instancesCounter), 95); err == nil {
a.P95Instances = s
}
if s, err := stats.Max(stats.Float64Data(waitTimings)); err == nil {
a.MaxWaitSeconds = s
}
if s, err := stats.Mean(stats.Float64Data(waitTimings)); err == nil {
a.MeanWaitSeconds = s
}
if s, err := stats.Median(stats.Float64Data(waitTimings)); err == nil {
a.MedianWaitSeconds = s
}
if s, err := stats.Percentile(stats.Float64Data(waitTimings), 95); err == nil {
a.P95WaitSeconds = s
}
if s, err := stats.Max(stats.Float64Data(writeTimings)); err == nil {
a.MaxWriteSeconds = s
}
if s, err := stats.Mean(stats.Float64Data(writeTimings)); err == nil {
a.MeanWriteSeconds = s
}
if s, err := stats.Median(stats.Float64Data(writeTimings)); err == nil {
a.MedianWriteSeconds = s
}
if s, err := stats.Percentile(stats.Float64Data(writeTimings), 95); err == nil {
a.P95WriteSeconds = s
}
return a
}

@ -1,273 +0,0 @@
/*
Copyright 2017 Shlomi Noach, GitHub Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package logic
import (
"encoding/json"
"fmt"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/vtorc/inst"
)
// CommandApplier applies commands that were received through the raft consensus mechanism
type CommandApplier struct {
}
func NewCommandApplier() *CommandApplier {
applier := &CommandApplier{}
return applier
}
func (applier *CommandApplier) ApplyCommand(op string, value []byte) any {
switch op {
case "heartbeat":
return nil
case "register-node":
return applier.registerNode(value)
case "discover":
return applier.discover(value)
case "injected-pseudo-gtid":
return nil // deprecated
case "forget":
return applier.forget(value)
case "forget-cluster":
return applier.forgetCluster(value)
case "begin-downtime":
return applier.beginDowntime(value)
case "end-downtime":
return applier.endDowntime(value)
case "register-candidate":
return applier.registerCandidate(value)
case "ack-recovery":
return applier.ackRecovery(value)
case "register-hostname-unresolve":
return applier.registerHostnameUnresolve(value)
case "submit-pool-instances":
return applier.submitPoolInstances(value)
case "register-failure-detection":
return applier.registerFailureDetection(value)
case "write-recovery":
return applier.writeRecovery(value)
case "write-recovery-step":
return applier.writeRecoveryStep(value)
case "resolve-recovery":
return applier.resolveRecovery(value)
case "disable-global-recoveries":
return applier.disableGlobalRecoveries(value)
case "enable-global-recoveries":
return applier.enableGlobalRecoveries(value)
case "put-instance-tag":
return applier.putInstanceTag(value)
case "delete-instance-tag":
return applier.deleteInstanceTag(value)
case "set-cluster-alias-manual-override":
return applier.noop(value)
}
errMsg := fmt.Sprintf("Unknown command op: %s", op)
log.Errorf(errMsg)
return fmt.Errorf(errMsg)
}
func (applier *CommandApplier) registerNode(value []byte) any {
return nil
}
func (applier *CommandApplier) discover(value []byte) any {
instanceKey := inst.InstanceKey{}
if err := json.Unmarshal(value, &instanceKey); err != nil {
log.Error(err)
return err
}
DiscoverInstance(instanceKey, false /* forceDiscovery */)
return nil
}
func (applier *CommandApplier) forget(value []byte) any {
instanceKey := inst.InstanceKey{}
if err := json.Unmarshal(value, &instanceKey); err != nil {
log.Error(err)
return err
}
err := inst.ForgetInstance(&instanceKey)
return err
}
func (applier *CommandApplier) forgetCluster(value []byte) any {
var clusterName string
if err := json.Unmarshal(value, &clusterName); err != nil {
log.Error(err)
return err
}
err := inst.ForgetCluster(clusterName)
return err
}
func (applier *CommandApplier) beginDowntime(value []byte) any {
downtime := inst.Downtime{}
if err := json.Unmarshal(value, &downtime); err != nil {
log.Error(err)
return err
}
err := inst.BeginDowntime(&downtime)
return err
}
func (applier *CommandApplier) endDowntime(value []byte) any {
instanceKey := inst.InstanceKey{}
if err := json.Unmarshal(value, &instanceKey); err != nil {
log.Error(err)
return err
}
_, err := inst.EndDowntime(&instanceKey)
return err
}
func (applier *CommandApplier) registerCandidate(value []byte) any {
candidate := inst.CandidateDatabaseInstance{}
if err := json.Unmarshal(value, &candidate); err != nil {
log.Error(err)
return err
}
err := inst.RegisterCandidateInstance(&candidate)
return err
}
func (applier *CommandApplier) ackRecovery(value []byte) any {
ack := RecoveryAcknowledgement{}
err := json.Unmarshal(value, &ack)
if err != nil {
log.Error(err)
return err
}
if ack.AllRecoveries {
_, err = AcknowledgeAllRecoveries(ack.Owner, ack.Comment)
}
if ack.ClusterName != "" {
_, err = AcknowledgeClusterRecoveries(ack.ClusterName, ack.Owner, ack.Comment)
}
if ack.Key.IsValid() {
_, err = AcknowledgeInstanceRecoveries(&ack.Key, ack.Owner, ack.Comment)
}
if ack.ID > 0 {
_, err = AcknowledgeRecovery(ack.ID, ack.Owner, ack.Comment)
}
if ack.UID != "" {
_, err = AcknowledgeRecoveryByUID(ack.UID, ack.Owner, ack.Comment)
}
return err
}
func (applier *CommandApplier) registerHostnameUnresolve(value []byte) any {
registration := inst.HostnameRegistration{}
if err := json.Unmarshal(value, &registration); err != nil {
log.Error(err)
return err
}
err := inst.RegisterHostnameUnresolve(&registration)
return err
}
func (applier *CommandApplier) submitPoolInstances(value []byte) any {
submission := inst.PoolInstancesSubmission{}
if err := json.Unmarshal(value, &submission); err != nil {
log.Error(err)
return err
}
err := inst.ApplyPoolInstances(&submission)
return err
}
func (applier *CommandApplier) registerFailureDetection(value []byte) any {
analysisEntry := inst.ReplicationAnalysis{}
if err := json.Unmarshal(value, &analysisEntry); err != nil {
log.Error(err)
return err
}
_, err := AttemptFailureDetectionRegistration(&analysisEntry)
return err
}
func (applier *CommandApplier) writeRecovery(value []byte) any {
topologyRecovery := TopologyRecovery{}
if err := json.Unmarshal(value, &topologyRecovery); err != nil {
log.Error(err)
return err
}
if _, err := writeTopologyRecovery(&topologyRecovery); err != nil {
return err
}
return nil
}
func (applier *CommandApplier) writeRecoveryStep(value []byte) any {
topologyRecoveryStep := TopologyRecoveryStep{}
if err := json.Unmarshal(value, &topologyRecoveryStep); err != nil {
log.Error(err)
return err
}
err := writeTopologyRecoveryStep(&topologyRecoveryStep)
return err
}
func (applier *CommandApplier) resolveRecovery(value []byte) any {
topologyRecovery := TopologyRecovery{}
if err := json.Unmarshal(value, &topologyRecovery); err != nil {
log.Error(err)
return err
}
if err := writeResolveRecovery(&topologyRecovery); err != nil {
log.Error(err)
return err
}
return nil
}
func (applier *CommandApplier) disableGlobalRecoveries(value []byte) any {
err := DisableRecovery()
return err
}
func (applier *CommandApplier) enableGlobalRecoveries(value []byte) any {
err := EnableRecovery()
return err
}
func (applier *CommandApplier) putInstanceTag(value []byte) any {
instanceTag := inst.InstanceTag{}
if err := json.Unmarshal(value, &instanceTag); err != nil {
log.Error(err)
return err
}
err := inst.PutInstanceTag(&instanceTag.Key, &instanceTag.T)
return err
}
func (applier *CommandApplier) deleteInstanceTag(value []byte) any {
instanceTag := inst.InstanceTag{}
if err := json.Unmarshal(value, &instanceTag); err != nil {
log.Error(err)
return err
}
_, err := inst.Untag(&instanceTag.Key, &instanceTag.T)
return err
}
func (applier *CommandApplier) noop(value []byte) any {
return nil
}

@ -31,7 +31,7 @@ import (
// RefreshAllKeyspaces reloads the keyspace information for the keyspaces that vtorc is concerned with.
func RefreshAllKeyspaces() {
var keyspaces []string
if *clustersToWatch == "" { // all known keyspaces
if len(clustersToWatch) == 0 { // all known keyspaces
ctx, cancel := context.WithTimeout(context.Background(), *topo.RemoteOperationTimeout)
defer cancel()
var err error
@ -43,8 +43,7 @@ func RefreshAllKeyspaces() {
}
} else {
// Parse input and build list of keyspaces
inputs := strings.Split(*clustersToWatch, ",")
for _, ks := range inputs {
for _, ks := range clustersToWatch {
if strings.Contains(ks, "/") {
// This is a keyspace/shard specification
input := strings.Split(ks, "/")
@ -55,7 +54,7 @@ func RefreshAllKeyspaces() {
}
}
if len(keyspaces) == 0 {
log.Errorf("Found no keyspaces for input: %v", *clustersToWatch)
log.Errorf("Found no keyspaces for input: %+v", clustersToWatch)
return
}
}
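Each clustersToWatch entry is either a bare keyspace or a keyspace/shard pair. A standalone sketch of that parsing rule (hypothetical helper, not the actual implementation):
package main

import (
	"fmt"
	"strings"
)

// splitSpec turns "ks3/-80" into ("ks3", "-80") and "ks1" into ("ks1", "").
func splitSpec(spec string) (keyspace, shard string) {
	if strings.Contains(spec, "/") {
		parts := strings.SplitN(spec, "/", 2)
		return parts[0], parts[1]
	}
	return spec, ""
}

func main() {
	for _, spec := range []string{"ks1", "ks3/-80", "ks3/80-"} {
		ks, shard := splitSpec(spec)
		fmt.Printf("keyspace=%s shard=%q\n", ks, shard)
	}
}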

@ -80,8 +80,8 @@ func TestRefreshAllKeyspaces(t *testing.T) {
}
// Set clusters to watch to only watch ks1 and ks3
onlyKs1and3 := "ks1/-,ks3/-80,ks3/80-"
clustersToWatch = &onlyKs1and3
onlyKs1and3 := []string{"ks1/-", "ks3/-80", "ks3/80-"}
clustersToWatch = onlyKs1and3
RefreshAllKeyspaces()
// Verify that we only have ks1 and ks3 in vtorc's db.
@ -91,8 +91,7 @@ func TestRefreshAllKeyspaces(t *testing.T) {
verifyKeyspaceInfo(t, "ks4", nil, "keyspace not found")
// Set clusters to watch to watch all keyspaces
allKeyspaces := ""
clustersToWatch = &allKeyspaces
clustersToWatch = nil
// Change the durability policy of ks1
reparenttestutil.SetKeyspaceDurability(context.Background(), t, ts, "ks1", "semi_sync")
RefreshAllKeyspaces()

@ -118,14 +118,14 @@ func acceptSignals() {
log.Infof("Received SIGHUP. Reloading configuration")
_ = inst.AuditOperation("reload-configuration", nil, "Triggered via SIGHUP")
config.Reload()
discoveryMetrics.SetExpirePeriod(time.Duration(config.Config.DiscoveryCollectionRetentionSeconds) * time.Second)
discoveryMetrics.SetExpirePeriod(time.Duration(config.DiscoveryCollectionRetentionSeconds) * time.Second)
case syscall.SIGTERM:
log.Infof("Received SIGTERM. Starting shutdown")
atomic.StoreInt32(&hasReceivedSIGTERM, 1)
discoveryMetrics.StopAutoExpiration()
// probably should poke other go routines to stop cleanly here ...
_ = inst.AuditOperation("shutdown", nil, "Triggered via SIGTERM")
timeout := time.After(*shutdownWaitTime)
timeout := time.After(shutdownWaitTime)
func() {
for {
count := atomic.LoadInt32(&shardsLockCounter)
@ -154,7 +154,7 @@ func handleDiscoveryRequests() {
discoveryQueue = discovery.CreateOrReturnQueue("DEFAULT")
// create a pool of discovery workers
for i := uint(0); i < config.Config.DiscoveryMaxConcurrency; i++ {
for i := uint(0); i < config.DiscoveryMaxConcurrency; i++ {
go func() {
for {
instanceKey := discoveryQueue.Consume()
@ -182,10 +182,6 @@ func DiscoverInstance(instanceKey inst.InstanceKey, forceDiscovery bool) {
log.Infof("discoverInstance: skipping discovery of %+v because it is set to be forgotten", instanceKey)
return
}
if inst.RegexpMatchPatterns(instanceKey.StringCode(), config.Config.DiscoveryIgnoreHostnameFilters) {
log.Infof("discoverInstance: skipping discovery of %+v because it matches DiscoveryIgnoreHostnameFilters", instanceKey)
return
}
// create stopwatch entries
latency := stopwatch.NewNamedStopwatch()
@ -228,7 +224,7 @@ func DiscoverInstance(instanceKey inst.InstanceKey, forceDiscovery bool) {
discoveriesCounter.Inc(1)
// First we've ever heard of this instance. Continue investigation:
instance, err := inst.ReadTopologyInstanceBufferable(&instanceKey, config.Config.BufferInstanceWrites, latency)
instance, err := inst.ReadTopologyInstanceBufferable(&instanceKey, latency)
// panic can occur (IO stuff). Therefore it may happen
// that instance is nil. Check it, but first get the timing metrics.
totalLatency := latency.Elapsed("total")
@ -331,19 +327,6 @@ func onHealthTick() {
}
}
func injectSeeds(seedOnce *sync.Once) {
seedOnce.Do(func() {
for _, seed := range config.Config.DiscoverySeeds {
instanceKey, err := inst.ParseRawInstanceKey(seed)
if err == nil {
_ = inst.InjectSeed(instanceKey)
} else {
log.Errorf("Error parsing seed %s: %+v", seed, err)
}
}
})
}
// ContinuousDiscovery starts an asynchronous infinite discovery process where instances are
// periodically investigated and their status captured, and long since unseen instances are
// purged and forgotten.
@ -372,17 +355,11 @@ func ContinuousDiscovery() {
return time.Since(continuousDiscoveryStartTime) >= checkAndRecoverWaitPeriod
}
var seedOnce sync.Once
go func() {
_ = ometrics.InitMetrics()
}()
go acceptSignals()
if *config.RuntimeCLIFlags.GrabElection {
_ = process.GrabElection()
}
log.Infof("continuous discovery: starting")
for {
select {
@ -397,7 +374,6 @@ func ContinuousDiscovery() {
// as instance poll
if IsLeaderOrActive() {
go inst.ExpireDowntime()
go injectSeeds(&seedOnce)
}
}()
case <-caretakingTick:
@ -416,11 +392,9 @@ func ContinuousDiscovery() {
go inst.ExpireHostnameUnresolve()
go inst.ExpireClusterDomainName()
go inst.ExpireAudit()
go inst.ExpirePoolInstances()
go inst.FlushNontrivialResolveCacheToDatabase()
go inst.ExpireStaleInstanceBinlogCoordinates()
go process.ExpireNodesHistory()
go process.ExpireAccessTokens()
go process.ExpireAvailableNodes()
go ExpireFailureDetectionHistory()
go ExpireTopologyRecoveryHistory()

@ -19,17 +19,18 @@ package logic
import (
"context"
"errors"
"flag"
"strings"
"sync"
"sync/atomic"
"time"
"vitess.io/vitess/go/vt/log"
"github.com/spf13/pflag"
"google.golang.org/protobuf/encoding/prototext"
"google.golang.org/protobuf/proto"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/topo/topoproto"
"vitess.io/vitess/go/vt/vtorc/config"
topodatapb "vitess.io/vitess/go/vt/proto/topodata"
@ -44,13 +45,19 @@ import (
var (
ts *topo.Server
tmc tmclient.TabletManagerClient
clustersToWatch = flag.String("clusters_to_watch", "", "Comma-separated list of keyspaces or keyspace/shards that this instance will monitor and repair. Defaults to all clusters in the topology. Example: \"ks1,ks2/-80\"")
shutdownWaitTime = flag.Duration("shutdown_wait_time", 30*time.Second, "maximum time to wait for vtorc to release all the locks that it is holding before shutting down on SIGTERM")
clustersToWatch []string
shutdownWaitTime = 30 * time.Second
shardsLockCounter int32
// ErrNoPrimaryTablet is a fixed error message.
ErrNoPrimaryTablet = errors.New("no primary tablet found")
)
// RegisterFlags registers the flags required by VTOrc
func RegisterFlags(fs *pflag.FlagSet) {
fs.StringSliceVar(&clustersToWatch, "clusters_to_watch", clustersToWatch, "Comma-separated list of keyspaces or keyspace/shards that this instance will monitor and repair. Defaults to all clusters in the topology. Example: \"ks1,ks2/-80\"")
fs.DurationVar(&shutdownWaitTime, "shutdown_wait_time", shutdownWaitTime, "Maximum time to wait for VTOrc to release all the locks that it is holding before shutting down on SIGTERM")
}
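A hedged sketch of how a binary might wire these flags up with pflag (hypothetical main; the real vtorc binary registers many more flags through its server setup):
package main

import (
	"fmt"
	"os"

	"github.com/spf13/pflag"

	"vitess.io/vitess/go/vt/vtorc/logic"
)

func main() {
	fs := pflag.NewFlagSet("vtorc", pflag.ExitOnError)
	logic.RegisterFlags(fs)
	_ = fs.Parse(os.Args[1:])

	// e.g. vtorc --clusters_to_watch=ks1,ks2/-80 --shutdown_wait_time=45s
	watch, _ := fs.GetStringSlice("clusters_to_watch")
	fmt.Println(watch) // [ks1 ks2/-80]
}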
// OpenTabletDiscovery opens the vitess topo if enabled and returns a ticker
// channel for polling.
func OpenTabletDiscovery() <-chan time.Time {
@ -80,7 +87,7 @@ func refreshTabletsUsing(loader func(instanceKey *inst.InstanceKey), forceRefres
if !IsLeaderOrActive() {
return
}
if *clustersToWatch == "" { // all known clusters
if len(clustersToWatch) == 0 { // all known clusters
ctx, cancel := context.WithTimeout(context.Background(), *topo.RemoteOperationTimeout)
defer cancel()
cells, err := ts.GetKnownCells(ctx)
@ -103,8 +110,7 @@ func refreshTabletsUsing(loader func(instanceKey *inst.InstanceKey), forceRefres
} else {
// Parse input and build list of keyspaces / shards
var keyspaceShards []*topo.KeyspaceShard
inputs := strings.Split(*clustersToWatch, ",")
for _, ks := range inputs {
for _, ks := range clustersToWatch {
if strings.Contains(ks, "/") {
// This is a keyspace/shard specification
input := strings.Split(ks, "/")
@ -129,7 +135,7 @@ func refreshTabletsUsing(loader func(instanceKey *inst.InstanceKey), forceRefres
}
}
if len(keyspaceShards) == 0 {
log.Errorf("Found no keyspaceShards for input: %v", *clustersToWatch)
log.Errorf("Found no keyspaceShards for input: %+v", clustersToWatch)
return
}
refreshCtx, refreshCancel := context.WithTimeout(context.Background(), *topo.RemoteOperationTimeout)
@ -336,3 +342,38 @@ func shardPrimary(keyspace string, shard string) (primary *topodatapb.Tablet, er
}
return primary, err
}
// restartReplication restarts replication on the provided replicaKey. It also sets the correct semi-sync settings when it starts replication
func restartReplication(replicaKey *inst.InstanceKey) error {
replicaTablet, err := inst.ReadTablet(*replicaKey)
if err != nil {
log.Infof("Could not read tablet - %+v", replicaKey)
return err
}
primaryTablet, err := shardPrimary(replicaTablet.Keyspace, replicaTablet.Shard)
if err != nil {
log.Infof("Could not compute primary for %v/%v", replicaTablet.Keyspace, replicaTablet.Shard)
return err
}
durabilityPolicy, err := inst.GetDurabilityPolicy(replicaTablet)
if err != nil {
log.Infof("Could not read the durability policy for %v/%v", replicaTablet.Keyspace, replicaTablet.Shard)
return err
}
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(config.Config.WaitReplicasTimeoutSeconds)*time.Second)
defer cancel()
err = tmc.StopReplication(ctx, replicaTablet)
if err != nil {
log.Infof("Could not stop replication on %v", topoproto.TabletAliasString(replicaTablet.Alias))
return err
}
err = tmc.StartReplication(ctx, replicaTablet, inst.IsReplicaSemiSync(durabilityPolicy, primaryTablet, replicaTablet))
if err != nil {
log.Infof("Could not start replication on %v", topoproto.TabletAliasString(replicaTablet.Alias))
return err
}
return nil
}
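A hedged sketch of a caller inside the same logic package (ensureReplicationRunning is a hypothetical function; the real recovery code paths decide when a restart is warranted):
// ensureReplicationRunning is a hypothetical caller; illustrative only.
func ensureReplicationRunning(replicaKey *inst.InstanceKey) {
	if err := restartReplication(replicaKey); err != nil {
		// restartReplication already logged the specific step that failed.
		log.Errorf("could not restart replication on %+v: %v", replicaKey, err)
	}
}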

@ -21,7 +21,6 @@ import (
"encoding/json"
"fmt"
"math/rand"
goos "os"
"strings"
"time"
@ -35,12 +34,9 @@ import (
"vitess.io/vitess/go/vt/logutil"
"vitess.io/vitess/go/vt/topo/topoproto"
"vitess.io/vitess/go/vt/vtctl/reparentutil"
"vitess.io/vitess/go/vt/vtctl/reparentutil/promotionrule"
"vitess.io/vitess/go/vt/vtorc/attributes"
"vitess.io/vitess/go/vt/vtorc/config"
"vitess.io/vitess/go/vt/vtorc/inst"
"vitess.io/vitess/go/vt/vtorc/os"
"vitess.io/vitess/go/vt/vtorc/process"
"vitess.io/vitess/go/vt/vtorc/util"
"vitess.io/vitess/go/vt/vttablet/tmclient"
)
@ -110,14 +106,6 @@ type RecoveryAcknowledgement struct {
AllRecoveries bool
}
func NewRecoveryAcknowledgement(owner string, comment string) *RecoveryAcknowledgement {
return &RecoveryAcknowledgement{
CreatedAt: time.Now(),
Owner: owner,
Comment: comment,
}
}
// BlockedTopologyRecovery represents an entry in the blocked_topology_recovery table
type BlockedTopologyRecovery struct {
FailedInstanceKey inst.InstanceKey
@ -207,23 +195,6 @@ var emergencyReadTopologyInstanceMap *cache.Cache
var emergencyRestartReplicaTopologyInstanceMap *cache.Cache
var emergencyOperationGracefulPeriodMap *cache.Cache
// InstancesByCountReplicas sorts instances by number of replicas, descending
type InstancesByCountReplicas [](*inst.Instance)
func (instancesByCountReplicas InstancesByCountReplicas) Len() int {
return len(instancesByCountReplicas)
}
func (instancesByCountReplicas InstancesByCountReplicas) Swap(i, j int) {
instancesByCountReplicas[i], instancesByCountReplicas[j] = instancesByCountReplicas[j], instancesByCountReplicas[i]
}
func (instancesByCountReplicas InstancesByCountReplicas) Less(i, j int) bool {
if len(instancesByCountReplicas[i].Replicas) == len(instancesByCountReplicas[j].Replicas) {
// Secondary sorting: prefer more advanced replicas
return !instancesByCountReplicas[i].ExecBinlogCoordinates.SmallerThan(&instancesByCountReplicas[j].ExecBinlogCoordinates)
}
return len(instancesByCountReplicas[i].Replicas) < len(instancesByCountReplicas[j].Replicas)
}
func init() {
go initializeTopologyRecoveryPostConfiguration()
}
@ -256,311 +227,6 @@ func resolveRecovery(topologyRecovery *TopologyRecovery, successorInstance *inst
return writeResolveRecovery(topologyRecovery)
}
// prepareCommand replaces agreed-upon placeholders with analysis data
func prepareCommand(command string, topologyRecovery *TopologyRecovery) (result string, async bool) {
analysisEntry := &topologyRecovery.AnalysisEntry
command = strings.TrimSpace(command)
if strings.HasSuffix(command, "&") {
command = strings.TrimRight(command, "&")
async = true
}
command = strings.Replace(command, "{failureType}", string(analysisEntry.Analysis), -1)
command = strings.Replace(command, "{instanceType}", string(analysisEntry.GetAnalysisInstanceType()), -1)
command = strings.Replace(command, "{isPrimary}", fmt.Sprintf("%t", analysisEntry.IsPrimary), -1)
command = strings.Replace(command, "{isCoPrimary}", fmt.Sprintf("%t", analysisEntry.IsCoPrimary), -1)
command = strings.Replace(command, "{failureDescription}", analysisEntry.Description, -1)
command = strings.Replace(command, "{command}", analysisEntry.CommandHint, -1)
command = strings.Replace(command, "{failedHost}", analysisEntry.AnalyzedInstanceKey.Hostname, -1)
command = strings.Replace(command, "{failedPort}", fmt.Sprintf("%d", analysisEntry.AnalyzedInstanceKey.Port), -1)
command = strings.Replace(command, "{failureCluster}", analysisEntry.ClusterDetails.ClusterName, -1)
command = strings.Replace(command, "{failureClusterDomain}", analysisEntry.ClusterDetails.ClusterDomain, -1)
command = strings.Replace(command, "{countReplicas}", fmt.Sprintf("%d", analysisEntry.CountReplicas), -1)
command = strings.Replace(command, "{isDowntimed}", fmt.Sprint(analysisEntry.IsDowntimed), -1)
command = strings.Replace(command, "{autoPrimaryRecovery}", fmt.Sprint(analysisEntry.ClusterDetails.HasAutomatedPrimaryRecovery), -1)
command = strings.Replace(command, "{autoIntermediatePrimaryRecovery}", fmt.Sprint(analysisEntry.ClusterDetails.HasAutomatedIntermediatePrimaryRecovery), -1)
command = strings.Replace(command, "{orchestratorHost}", process.ThisHostname, -1)
command = strings.Replace(command, "{vtorcHost}", process.ThisHostname, -1)
command = strings.Replace(command, "{recoveryUID}", topologyRecovery.UID, -1)
command = strings.Replace(command, "{isSuccessful}", fmt.Sprint(topologyRecovery.SuccessorKey != nil), -1)
if topologyRecovery.SuccessorKey != nil {
command = strings.Replace(command, "{successorHost}", topologyRecovery.SuccessorKey.Hostname, -1)
command = strings.Replace(command, "{successorPort}", fmt.Sprintf("%d", topologyRecovery.SuccessorKey.Port), -1)
// As long as SuccessorKey != nil, we replace {successorAlias}.
// If SuccessorAlias is "", it's fine. We'll replace {successorAlias} with "".
command = strings.Replace(command, "{successorAlias}", topologyRecovery.SuccessorAlias, -1)
}
command = strings.Replace(command, "{lostReplicas}", topologyRecovery.LostReplicas.ToCommaDelimitedList(), -1)
command = strings.Replace(command, "{countLostReplicas}", fmt.Sprintf("%d", len(topologyRecovery.LostReplicas)), -1)
command = strings.Replace(command, "{replicaHosts}", analysisEntry.Replicas.ToCommaDelimitedList(), -1)
return command, async
}
// applyEnvironmentVariables sets the relevant environment variables for a recovery
func applyEnvironmentVariables(topologyRecovery *TopologyRecovery) []string {
analysisEntry := &topologyRecovery.AnalysisEntry
env := goos.Environ()
env = append(env, fmt.Sprintf("ORC_FAILURE_TYPE=%s", string(analysisEntry.Analysis)))
env = append(env, fmt.Sprintf("ORC_INSTANCE_TYPE=%s", string(analysisEntry.GetAnalysisInstanceType())))
env = append(env, fmt.Sprintf("ORC_IS_PRIMARY=%t", analysisEntry.IsPrimary))
env = append(env, fmt.Sprintf("ORC_IS_CO_PRIMARY=%t", analysisEntry.IsCoPrimary))
env = append(env, fmt.Sprintf("ORC_FAILURE_DESCRIPTION=%s", analysisEntry.Description))
env = append(env, fmt.Sprintf("ORC_COMMAND=%s", analysisEntry.CommandHint))
env = append(env, fmt.Sprintf("ORC_FAILED_HOST=%s", analysisEntry.AnalyzedInstanceKey.Hostname))
env = append(env, fmt.Sprintf("ORC_FAILED_PORT=%d", analysisEntry.AnalyzedInstanceKey.Port))
env = append(env, fmt.Sprintf("ORC_FAILURE_CLUSTER=%s", analysisEntry.ClusterDetails.ClusterName))
env = append(env, fmt.Sprintf("ORC_FAILURE_CLUSTER_DOMAIN=%s", analysisEntry.ClusterDetails.ClusterDomain))
env = append(env, fmt.Sprintf("ORC_COUNT_REPLICAS=%d", analysisEntry.CountReplicas))
env = append(env, fmt.Sprintf("ORC_IS_DOWNTIMED=%v", analysisEntry.IsDowntimed))
env = append(env, fmt.Sprintf("ORC_AUTO_PRIMARY_RECOVERY=%v", analysisEntry.ClusterDetails.HasAutomatedPrimaryRecovery))
env = append(env, fmt.Sprintf("ORC_AUTO_INTERMEDIATE_PRIMARY_RECOVERY=%v", analysisEntry.ClusterDetails.HasAutomatedIntermediatePrimaryRecovery))
env = append(env, fmt.Sprintf("ORC_ORCHESTRATOR_HOST=%s", process.ThisHostname))
env = append(env, fmt.Sprintf("ORC_VTORC_HOST=%s", process.ThisHostname))
env = append(env, fmt.Sprintf("ORC_IS_SUCCESSFUL=%v", (topologyRecovery.SuccessorKey != nil)))
env = append(env, fmt.Sprintf("ORC_LOST_REPLICAS=%s", topologyRecovery.LostReplicas.ToCommaDelimitedList()))
env = append(env, fmt.Sprintf("ORC_REPLICA_HOSTS=%s", analysisEntry.Replicas.ToCommaDelimitedList()))
env = append(env, fmt.Sprintf("ORC_RECOVERY_UID=%s", topologyRecovery.UID))
if topologyRecovery.SuccessorKey != nil {
env = append(env, fmt.Sprintf("ORC_SUCCESSOR_HOST=%s", topologyRecovery.SuccessorKey.Hostname))
env = append(env, fmt.Sprintf("ORC_SUCCESSOR_PORT=%d", topologyRecovery.SuccessorKey.Port))
// As long as SuccessorKey != nil, we set ORC_SUCCESSOR_ALIAS.
// If SuccessorAlias is "", it's fine. We'll set ORC_SUCCESSOR_ALIAS to "".
env = append(env, fmt.Sprintf("ORC_SUCCESSOR_ALIAS=%s", topologyRecovery.SuccessorAlias))
}
return env
}
func executeProcess(command string, env []string, topologyRecovery *TopologyRecovery, fullDescription string) (err error) {
// Log the command to be run and record how long it takes as this may be useful
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("Running %s: %s", fullDescription, command))
start := time.Now()
var info string
if err = os.CommandRun(command, env); err == nil {
info = fmt.Sprintf("Completed %s in %v", fullDescription, time.Since(start))
} else {
info = fmt.Sprintf("Execution of %s failed in %v with error: %v", fullDescription, time.Since(start), err)
log.Errorf(info)
}
_ = AuditTopologyRecovery(topologyRecovery, info)
return err
}
// executeProcesses executes a list of processes
func executeProcesses(processes []string, description string, topologyRecovery *TopologyRecovery, failOnError bool) (err error) {
if len(processes) == 0 {
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("No %s hooks to run", description))
return nil
}
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("Running %d %s hooks", len(processes), description))
for i, command := range processes {
command, async := prepareCommand(command, topologyRecovery)
env := applyEnvironmentVariables(topologyRecovery)
fullDescription := fmt.Sprintf("%s hook %d of %d", description, i+1, len(processes))
if async {
fullDescription = fmt.Sprintf("%s (async)", fullDescription)
}
if async {
// Ignore errors
go func() {
_ = executeProcess(command, env, topologyRecovery, fullDescription)
}()
} else {
if cmdErr := executeProcess(command, env, topologyRecovery, fullDescription); cmdErr != nil {
if failOnError {
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("Not running further %s hooks", description))
return cmdErr
}
if err == nil {
// Keep first error encountered
err = cmdErr
}
}
}
}
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("done running %s hooks", description))
return err
}
func PrimaryFailoverGeographicConstraintSatisfied(analysisEntry *inst.ReplicationAnalysis, suggestedInstance *inst.Instance) (satisfied bool, dissatisfiedReason string) {
if config.Config.PreventCrossDataCenterPrimaryFailover {
if suggestedInstance.DataCenter != analysisEntry.AnalyzedInstanceDataCenter {
return false, fmt.Sprintf("PreventCrossDataCenterPrimaryFailover: will not promote server in %s when failed server in %s", suggestedInstance.DataCenter, analysisEntry.AnalyzedInstanceDataCenter)
}
}
if config.Config.PreventCrossRegionPrimaryFailover {
if suggestedInstance.Region != analysisEntry.AnalyzedInstanceRegion {
return false, fmt.Sprintf("PreventCrossRegionPrimaryFailover: will not promote server in %s when failed server in %s", suggestedInstance.Region, analysisEntry.AnalyzedInstanceRegion)
}
}
return true, ""
}
// SuggestReplacementForPromotedReplica returns a server to take over the already
// promoted replica, if such a server is found and it improves on the promoted replica.
func SuggestReplacementForPromotedReplica(topologyRecovery *TopologyRecovery, deadInstanceKey *inst.InstanceKey, promotedReplica *inst.Instance, candidateInstanceKey *inst.InstanceKey) (replacement *inst.Instance, actionRequired bool, err error) {
candidateReplicas, _ := inst.ReadClusterCandidateInstances(promotedReplica.ClusterName)
candidateReplicas = inst.RemoveInstance(candidateReplicas, deadInstanceKey)
deadInstance, _, err := inst.ReadInstance(deadInstanceKey)
if err != nil {
deadInstance = nil
}
// So we've already promoted a replica.
// However, can we improve on our choice? Are there any replicas marked with "is_candidate"?
// Maybe we actually promoted such a replica. Does that mean we should keep it?
// Maybe we promoted a "neutral", and some "prefer" server is available.
// Maybe we promoted a "prefer_not"
// Maybe we promoted a server in a different DC than the primary
// There's many options. We may wish to replace the server we promoted with a better one.
_ = AuditTopologyRecovery(topologyRecovery, "checking if should replace promoted replica with a better candidate")
if candidateInstanceKey == nil {
_ = AuditTopologyRecovery(topologyRecovery, "+ checking if promoted replica is the ideal candidate")
if deadInstance != nil {
for _, candidateReplica := range candidateReplicas {
if promotedReplica.Key.Equals(&candidateReplica.Key) &&
promotedReplica.DataCenter == deadInstance.DataCenter &&
promotedReplica.PhysicalEnvironment == deadInstance.PhysicalEnvironment {
// Seems like we promoted a candidate in the same DC & ENV as the dead instance! Ideal! We're happy!
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("promoted replica %+v is the ideal candidate", promotedReplica.Key))
return promotedReplica, false, nil
}
}
}
}
// We didn't pick the ideal candidate; let's see if we can replace with a candidate from same DC and ENV
if candidateInstanceKey == nil {
// Try a candidate replica that is in same DC & env as the dead instance
_ = AuditTopologyRecovery(topologyRecovery, "+ searching for an ideal candidate")
if deadInstance != nil {
for _, candidateReplica := range candidateReplicas {
if canTakeOverPromotedServerAsPrimary(candidateReplica, promotedReplica) &&
candidateReplica.DataCenter == deadInstance.DataCenter &&
candidateReplica.PhysicalEnvironment == deadInstance.PhysicalEnvironment {
// This would make a great candidate
candidateInstanceKey = &candidateReplica.Key
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("no candidate was offered for %+v but vtorc picks %+v as candidate replacement, based on being in same DC & env as failed instance", *deadInstanceKey, candidateReplica.Key))
}
}
}
}
if candidateInstanceKey == nil {
// We cannot find a candidate in same DC and ENV as dead primary
_ = AuditTopologyRecovery(topologyRecovery, "+ checking if promoted replica is an OK candidate")
for _, candidateReplica := range candidateReplicas {
if promotedReplica.Key.Equals(&candidateReplica.Key) {
// Seems like we promoted a candidate replica (though not in same DC and ENV as dead primary)
satisfied, reason := PrimaryFailoverGeographicConstraintSatisfied(&topologyRecovery.AnalysisEntry, candidateReplica)
if satisfied {
// Good enough. No further action required.
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("promoted replica %+v is a good candidate", promotedReplica.Key))
return promotedReplica, false, nil
}
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("skipping %+v; %s", candidateReplica.Key, reason))
}
}
}
// Still nothing?
if candidateInstanceKey == nil {
// Try a candidate replica that is in same DC & env as the promoted replica (our promoted replica is not an "is_candidate")
_ = AuditTopologyRecovery(topologyRecovery, "+ searching for a candidate")
for _, candidateReplica := range candidateReplicas {
if canTakeOverPromotedServerAsPrimary(candidateReplica, promotedReplica) &&
promotedReplica.DataCenter == candidateReplica.DataCenter &&
promotedReplica.PhysicalEnvironment == candidateReplica.PhysicalEnvironment {
// OK, better than nothing
candidateInstanceKey = &candidateReplica.Key
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("no candidate was offered for %+v but vtorc picks %+v as candidate replacement, based on being in same DC & env as promoted instance", promotedReplica.Key, candidateReplica.Key))
}
}
}
// Still nothing?
if candidateInstanceKey == nil {
// Try a candidate replica (our promoted replica is not an "is_candidate")
_ = AuditTopologyRecovery(topologyRecovery, "+ searching for a candidate")
for _, candidateReplica := range candidateReplicas {
if canTakeOverPromotedServerAsPrimary(candidateReplica, promotedReplica) {
if satisfied, reason := PrimaryFailoverGeographicConstraintSatisfied(&topologyRecovery.AnalysisEntry, candidateReplica); satisfied {
// OK, better than nothing
candidateInstanceKey = &candidateReplica.Key
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("no candidate was offered for %+v but vtorc picks %+v as candidate replacement", promotedReplica.Key, candidateReplica.Key))
} else {
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("skipping %+v; %s", candidateReplica.Key, reason))
}
}
}
}
keepSearchingHint := ""
if satisfied, reason := PrimaryFailoverGeographicConstraintSatisfied(&topologyRecovery.AnalysisEntry, promotedReplica); !satisfied {
keepSearchingHint = fmt.Sprintf("Will keep searching; %s", reason)
} else if promotedReplica.PromotionRule == promotionrule.PreferNot {
keepSearchingHint = fmt.Sprintf("Will keep searching because we have promoted a server with prefer_not rule: %+v", promotedReplica.Key)
}
if keepSearchingHint != "" {
_ = AuditTopologyRecovery(topologyRecovery, keepSearchingHint)
neutralReplicas, _ := inst.ReadClusterNeutralPromotionRuleInstances(promotedReplica.ClusterName)
if candidateInstanceKey == nil {
// Still nothing? Then we didn't find a replica marked as "candidate". OK, further down the stream we have:
// find a neutral instance in the same DC & env as the dead primary
_ = AuditTopologyRecovery(topologyRecovery, "+ searching for a neutral server to replace promoted server, in same DC and env as dead primary")
for _, neutralReplica := range neutralReplicas {
if canTakeOverPromotedServerAsPrimary(neutralReplica, promotedReplica) &&
deadInstance.DataCenter == neutralReplica.DataCenter &&
deadInstance.PhysicalEnvironment == neutralReplica.PhysicalEnvironment {
candidateInstanceKey = &neutralReplica.Key
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("no candidate was offered for %+v but vtorc picks %+v as candidate replacement, based on being in same DC & env as dead primary", promotedReplica.Key, neutralReplica.Key))
}
}
}
if candidateInstanceKey == nil {
// find a neutral instance in the same DC & env as the promoted replica
_ = AuditTopologyRecovery(topologyRecovery, "+ searching for a neutral server to replace promoted server, in same DC and env as promoted replica")
for _, neutralReplica := range neutralReplicas {
if canTakeOverPromotedServerAsPrimary(neutralReplica, promotedReplica) &&
promotedReplica.DataCenter == neutralReplica.DataCenter &&
promotedReplica.PhysicalEnvironment == neutralReplica.PhysicalEnvironment {
candidateInstanceKey = &neutralReplica.Key
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("no candidate was offered for %+v but vtorc picks %+v as candidate replacement, based on being in same DC & env as promoted instance", promotedReplica.Key, neutralReplica.Key))
}
}
}
if candidateInstanceKey == nil {
_ = AuditTopologyRecovery(topologyRecovery, "+ searching for a neutral server to replace a prefer_not")
for _, neutralReplica := range neutralReplicas {
if canTakeOverPromotedServerAsPrimary(neutralReplica, promotedReplica) {
if satisfied, reason := PrimaryFailoverGeographicConstraintSatisfied(&topologyRecovery.AnalysisEntry, neutralReplica); satisfied {
// OK, better than nothing
candidateInstanceKey = &neutralReplica.Key
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("no candidate was offered for %+v but vtorc picks %+v as candidate replacement, based on promoted instance having prefer_not promotion rule", promotedReplica.Key, neutralReplica.Key))
} else {
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("skipping %+v; %s", neutralReplica.Key, reason))
}
}
}
}
}
// So do we have a candidate?
if candidateInstanceKey == nil {
// Found nothing. Stick with promoted replica
_ = AuditTopologyRecovery(topologyRecovery, "+ found no server to promote on top promoted replica")
return promotedReplica, false, nil
}
if promotedReplica.Key.Equals(candidateInstanceKey) {
// Sanity. It IS the candidate, nothing to promote...
_ = AuditTopologyRecovery(topologyRecovery, "+ sanity check: found our very own server to promote; doing nothing")
return promotedReplica, false, nil
}
replacement, _, err = inst.ReadInstance(candidateInstanceKey)
return replacement, true, err
}
// recoverPrimaryHasPrimary resets the replication on the primary instance
func recoverPrimaryHasPrimary(ctx context.Context, analysisEntry inst.ReplicationAnalysis, candidateInstanceKey *inst.InstanceKey, forceInstanceRecovery bool, skipProcesses bool) (recoveryAttempted bool, topologyRecovery *TopologyRecovery, err error) {
topologyRecovery, err = AttemptRecoveryRegistration(&analysisEntry, false, true)
@@ -657,58 +323,10 @@ func postErsCompletion(topologyRecovery *TopologyRecovery, analysisEntry inst.Re
// Success!
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("RecoverDeadPrimary: successfully promoted %+v", promotedReplica.Key))
if config.Config.PrimaryFailoverDetachReplicaPrimaryHost {
postponedFunction := func() error {
_ = AuditTopologyRecovery(topologyRecovery, "- RecoverDeadPrimary: detaching primary host on promoted primary")
_, _ = inst.DetachReplicaPrimaryHost(&promotedReplica.Key)
return nil
}
topologyRecovery.AddPostponedFunction(postponedFunction, fmt.Sprintf("RecoverDeadPrimary, detaching promoted primary host %+v", promotedReplica.Key))
}
_ = attributes.SetGeneralAttribute(analysisEntry.ClusterDetails.ClusterDomain, promotedReplica.Key.StringCode())
if !skipProcesses {
// Execute post primary-failover processes
_ = executeProcesses(config.Config.PostPrimaryFailoverProcesses, "PostPrimaryFailoverProcesses", topologyRecovery, false)
}
}
}
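// isGenerallyValidAsWouldBePrimary checks that a replica is fit to be promoted:
// its last check must be valid, binary logging must be enabled (and log
// replication updates too, when required), and it must be neither a binlog
// server nor banned from candidacy.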
func isGenerallyValidAsWouldBePrimary(replica *inst.Instance, requireLogReplicationUpdates bool) bool {
if !replica.IsLastCheckValid {
// something wrong with this replica right now. We shouldn't hope to be able to promote it
return false
}
if !replica.LogBinEnabled {
return false
}
if requireLogReplicationUpdates && !replica.LogReplicationUpdatesEnabled {
return false
}
if replica.IsBinlogServer() {
return false
}
if inst.IsBannedFromBeingCandidateReplica(replica) {
return false
}
return true
}
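// canTakeOverPromotedServerAsPrimary checks that wantToTakeOver is a generally
// valid promotion candidate, replicates directly from toBeTakenOver, and that
// toBeTakenOver could in turn replicate from it.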
func canTakeOverPromotedServerAsPrimary(wantToTakeOver *inst.Instance, toBeTakenOver *inst.Instance) bool {
if !isGenerallyValidAsWouldBePrimary(wantToTakeOver, true) {
return false
}
if !wantToTakeOver.SourceKey.Equals(&toBeTakenOver.Key) {
return false
}
if canReplicate, _ := toBeTakenOver.CanReplicateFrom(wantToTakeOver); !canReplicate {
return false
}
return true
}
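// Illustrative sketch (not part of this change; the field values below are
// hypothetical and inst.Instance carries many more): how the two predicates
// above compose when vetting a contender that replicates directly from the
// promoted server.
//
//	promoted := &inst.Instance{Key: inst.InstanceKey{Hostname: "host-a", Port: 3306}}
//	contender := &inst.Instance{
//		Key:                          inst.InstanceKey{Hostname: "host-b", Port: 3306},
//		SourceKey:                    promoted.Key, // must point at the promoted server
//		IsLastCheckValid:             true,
//		LogBinEnabled:                true,
//		LogReplicationUpdatesEnabled: true,
//	}
//	eligible := canTakeOverPromotedServerAsPrimary(contender, promoted)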
// checkAndRecoverLockedSemiSyncPrimary is the recovery function for a primary that is locked waiting on semi-sync ACKs; it is currently a no-op.
func checkAndRecoverLockedSemiSyncPrimary(ctx context.Context, analysisEntry inst.ReplicationAnalysis, candidateInstanceKey *inst.InstanceKey, forceInstanceRecovery bool, skipProcesses bool) (recoveryAttempted bool, topologyRecovery *TopologyRecovery, err error) {
return false, nil, nil
@@ -750,7 +368,7 @@ func emergentlyRestartReplicationOnTopologyInstance(instanceKey *inst.InstanceKe
return
}
go inst.ExecuteOnTopology(func() {
_ = inst.RestartReplicationQuick(instanceKey)
_ = restartReplication(instanceKey)
_ = inst.AuditOperation("emergently-restart-replication-topology-instance", instanceKey, string(analysisCode))
})
}
@@ -804,12 +422,7 @@ func checkAndExecuteFailureDetectionProcesses(analysisEntry inst.ReplicationAnal
return false, false, nil
}
log.Infof("topology_recovery: detected %+v failure on %+v", analysisEntry.Analysis, analysisEntry.AnalyzedInstanceKey)
// Execute on-detection processes
if skipProcesses {
return true, false, nil
}
err = executeProcesses(config.Config.OnFailureDetectionProcesses, "OnFailureDetectionProcesses", NewTopologyRecovery(analysisEntry), true)
return true, true, err
return true, false, nil
}
// getCheckAndRecoverFunctionCode gets the recovery function code to use for the given analysis.
@@ -1120,16 +733,6 @@ func executeCheckAndRecoverFunction(analysisEntry inst.ReplicationAnalysis, cand
// that we just completed because we would be using stale data.
DiscoverInstance(analysisEntry.AnalyzedInstanceKey, true)
}
if !skipProcesses {
if topologyRecovery.SuccessorKey == nil {
// Execute general unsuccessful post failover processes
_ = executeProcesses(config.Config.PostUnsuccessfulFailoverProcesses, "PostUnsuccessfulFailoverProcesses", topologyRecovery, false)
} else {
// Execute general post failover processes
_, _ = inst.EndDowntime(topologyRecovery.SuccessorKey)
_ = executeProcesses(config.Config.PostFailoverProcesses, "PostFailoverProcesses", topologyRecovery, false)
}
}
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("Waiting for %d postponed functions", topologyRecovery.PostponedFunctionsContainer.Len()))
topologyRecovery.Wait()
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("Executed %d postponed functions", topologyRecovery.PostponedFunctionsContainer.Len()))
@@ -1166,10 +769,6 @@ func CheckAndRecover(specificInstance *inst.InstanceKey, candidateInstanceKey *i
log.Error(err)
return false, nil, err
}
if *config.RuntimeCLIFlags.Noop {
log.Infof("--noop provided; will not execute processes")
skipProcesses = true
}
// intentionally iterating entries in random order
for _, j := range rand.Perm(len(replicationAnalysis)) {
analysisEntry := replicationAnalysis[j]
@@ -1206,210 +805,6 @@ func CheckAndRecover(specificInstance *inst.InstanceKey, candidateInstanceKey *i
return recoveryAttempted, promotedReplicaKey, err
}
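// forceAnalysisEntry fabricates a replication analysis entry for the given
// failed instance, forcing the supplied analysis code and command hint onto
// whatever GetReplicationAnalysis reports for the cluster.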
func forceAnalysisEntry(clusterName string, analysisCode inst.AnalysisCode, commandHint string, failedInstanceKey *inst.InstanceKey) (analysisEntry inst.ReplicationAnalysis, err error) {
clusterInfo, err := inst.ReadClusterInfo(clusterName)
if err != nil {
return analysisEntry, err
}
clusterAnalysisEntries, err := inst.GetReplicationAnalysis(clusterInfo.ClusterName, &inst.ReplicationAnalysisHints{IncludeDowntimed: true, IncludeNoProblem: true})
if err != nil {
return analysisEntry, err
}
for _, entry := range clusterAnalysisEntries {
if entry.AnalyzedInstanceKey.Equals(failedInstanceKey) {
analysisEntry = entry
}
}
analysisEntry.Analysis = analysisCode // we force this analysis
analysisEntry.CommandHint = commandHint
analysisEntry.ClusterDetails = *clusterInfo
analysisEntry.AnalyzedInstanceKey = *failedInstanceKey
return analysisEntry, nil
}
// ForceExecuteRecovery can be called to issue a recovery process even if analysis says there is no recovery case.
// The caller of this function injects the type of analysis it wishes the function to assume.
// By calling this function one takes responsibility for one's actions.
func ForceExecuteRecovery(analysisEntry inst.ReplicationAnalysis, candidateInstanceKey *inst.InstanceKey, skipProcesses bool) (recoveryAttempted bool, topologyRecovery *TopologyRecovery, err error) {
return executeCheckAndRecoverFunction(analysisEntry, candidateInstanceKey, true, skipProcesses)
}
// ForcePrimaryFailover *trusts* primary of given cluster is dead and initiates a failover
func ForcePrimaryFailover(clusterName string) (topologyRecovery *TopologyRecovery, err error) {
clusterPrimaries, err := inst.ReadClusterPrimary(clusterName)
if err != nil {
return nil, fmt.Errorf("Cannot deduce cluster primary for %+v", clusterName)
}
if len(clusterPrimaries) != 1 {
return nil, fmt.Errorf("Cannot deduce cluster primary for %+v", clusterName)
}
clusterPrimary := clusterPrimaries[0]
analysisEntry, err := forceAnalysisEntry(clusterName, inst.DeadPrimary, inst.ForcePrimaryFailoverCommandHint, &clusterPrimary.Key)
if err != nil {
return nil, err
}
recoveryAttempted, topologyRecovery, err := ForceExecuteRecovery(analysisEntry, nil, false)
if err != nil {
return nil, err
}
if !recoveryAttempted {
return nil, fmt.Errorf("Unexpected error: recovery not attempted. This should not happen")
}
if topologyRecovery == nil {
return nil, fmt.Errorf("Recovery attempted but with no results. This should not happen")
}
if topologyRecovery.SuccessorKey == nil {
return nil, fmt.Errorf("Recovery attempted yet no replica promoted")
}
return topologyRecovery, nil
}
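// Illustrative sketch (hypothetical caller, e.g. an API handler): driving a
// forced failover and reporting the promoted server. On success,
// ForcePrimaryFailover guarantees a non-nil SuccessorKey.
//
//	func forceFailoverExample(clusterName string) error {
//		topologyRecovery, err := ForcePrimaryFailover(clusterName)
//		if err != nil {
//			return err
//		}
//		log.Infof("promoted %+v", *topologyRecovery.SuccessorKey)
//		return nil
//	}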
// ForcePrimaryTakeover *trusts* primary of given cluster is dead and fails over to designated instance,
// which has to be its direct child.
func ForcePrimaryTakeover(clusterName string, destination *inst.Instance) (topologyRecovery *TopologyRecovery, err error) {
clusterPrimaries, err := inst.ReadClusterWriteablePrimary(clusterName)
if err != nil {
return nil, fmt.Errorf("Cannot deduce cluster primary for %+v", clusterName)
}
if len(clusterPrimaries) != 1 {
return nil, fmt.Errorf("Cannot deduce cluster primary for %+v", clusterName)
}
clusterPrimary := clusterPrimaries[0]
if !destination.SourceKey.Equals(&clusterPrimary.Key) {
return nil, fmt.Errorf("you may only promote a direct child of the primary %+v. The primary of %+v is %+v", clusterPrimary.Key, destination.Key, destination.SourceKey)
}
log.Infof("will demote %+v and promote %+v instead", clusterPrimary.Key, destination.Key)
analysisEntry, err := forceAnalysisEntry(clusterName, inst.DeadPrimary, inst.ForcePrimaryTakeoverCommandHint, &clusterPrimary.Key)
if err != nil {
return nil, err
}
recoveryAttempted, topologyRecovery, err := ForceExecuteRecovery(analysisEntry, &destination.Key, false)
if err != nil {
return nil, err
}
if !recoveryAttempted {
return nil, fmt.Errorf("Unexpected error: recovery not attempted. This should not happen")
}
if topologyRecovery == nil {
return nil, fmt.Errorf("Recovery attempted but with no results. This should not happen")
}
if topologyRecovery.SuccessorKey == nil {
return nil, fmt.Errorf("Recovery attempted yet no replica promoted")
}
return topologyRecovery, nil
}
// GracefulPrimaryTakeover will demote primary of existing topology and promote its
// direct replica instead.
// It expects that replica to have no siblings.
// This function is graceful in that it will first lock down the primary, then wait
// for the designated replica to catch up with last position.
// It will point the old primary at the newly promoted primary at the correct coordinates.
// All of this is accomplished via a PlannedReparentShard operation, which is idempotent; see its documentation for more detail.
func GracefulPrimaryTakeover(clusterName string, designatedKey *inst.InstanceKey) (topologyRecovery *TopologyRecovery, err error) {
log.Infof("GracefulPrimaryTakeover for shard %v", clusterName)
clusterPrimaries, err := inst.ReadClusterPrimary(clusterName)
if err != nil {
return nil, fmt.Errorf("Cannot deduce cluster primary for %+v; error: %+v", clusterName, err)
}
if len(clusterPrimaries) != 1 {
return nil, fmt.Errorf("Cannot deduce cluster primary for %+v. Found %+v potential primarys", clusterName, len(clusterPrimaries))
}
clusterPrimary := clusterPrimaries[0]
log.Infof("GracefulPrimaryTakeover for shard %v, current primary - %v", clusterName, clusterPrimary.InstanceAlias)
analysisEntry, err := forceAnalysisEntry(clusterName, inst.GraceFulPrimaryTakeover, inst.GracefulPrimaryTakeoverCommandHint, &clusterPrimary.Key)
if err != nil {
return nil, err
}
topologyRecovery, err = AttemptRecoveryRegistration(&analysisEntry, false /*failIfFailedInstanceInActiveRecovery*/, false /*failIfClusterInActiveRecovery*/)
if err != nil {
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("could not register recovery on %+v. Unable to issue PlannedReparentShard. err=%v", analysisEntry.AnalyzedInstanceKey, err))
return topologyRecovery, err
}
if topologyRecovery == nil {
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("could not register recovery on %+v. Unable to issue PlannedReparentShard. Recovery is nil", analysisEntry.AnalyzedInstanceKey))
return nil, nil
}
// recovery is now registered. From now on this process owns the recovery and other processes will not be able to run
// a recovery for some time.
// Let's audit anything that happens from this point on, including any early return
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("registered recovery on %+v. Recovery: %+v", analysisEntry.AnalyzedInstanceKey, topologyRecovery))
var promotedReplica *inst.Instance
// This has to be done in the end; whether successful or not, we should mark that the recovery is done.
// So that after the active period passes, we are able to run other recoveries.
defer func() {
_ = resolveRecovery(topologyRecovery, promotedReplica)
}()
primaryTablet, err := inst.ReadTablet(clusterPrimary.Key)
if err != nil {
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("rerror reading primary tablet %+v: %+v", clusterPrimary.Key, err))
return topologyRecovery, err
}
if designatedKey != nil && !designatedKey.IsValid() {
// An empty or invalid key is as good as no key
designatedKey = nil
}
var designatedTabletAlias *topodatapb.TabletAlias
if designatedKey != nil {
designatedTablet, err := inst.ReadTablet(*designatedKey)
if err != nil {
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("rerror reading designated tablet %+v: %+v", *designatedKey, err))
return topologyRecovery, err
}
designatedTabletAlias = designatedTablet.Alias
_ = AuditTopologyRecovery(topologyRecovery, fmt.Sprintf("started PlannedReparentShard, new primary will be %s.", topoproto.TabletAliasString(designatedTabletAlias)))
} else {
_ = AuditTopologyRecovery(topologyRecovery, "started PlannedReparentShard with automatic primary selection.")
}
// check for the constraint failure for cross cell promotion
if designatedTabletAlias != nil && designatedTabletAlias.Cell != primaryTablet.Alias.Cell && config.Config.PreventCrossDataCenterPrimaryFailover {
errorMessage := fmt.Sprintf("GracefulPrimaryTakeover: constraint failure - %s and %s are in different cells", topoproto.TabletAliasString(designatedTabletAlias), topoproto.TabletAliasString(primaryTablet.Alias))
_ = AuditTopologyRecovery(topologyRecovery, errorMessage)
return topologyRecovery, fmt.Errorf(errorMessage)
}
ev, err := reparentutil.NewPlannedReparenter(ts, tmclient.NewTabletManagerClient(), logutil.NewCallbackLogger(func(event *logutilpb.Event) {
level := event.GetLevel()
value := event.GetValue()
// we only log warnings and errors explicitly; everything gets logged as an informational message anyway when auditing the topology recovery
switch level {
case logutilpb.Level_WARNING:
log.Warningf("PRS - %s", value)
case logutilpb.Level_ERROR:
log.Errorf("PRS - %s", value)
}
_ = AuditTopologyRecovery(topologyRecovery, value)
})).ReparentShard(context.Background(),
primaryTablet.Keyspace,
primaryTablet.Shard,
reparentutil.PlannedReparentOptions{
NewPrimaryAlias: designatedTabletAlias,
WaitReplicasTimeout: time.Duration(config.Config.WaitReplicasTimeoutSeconds) * time.Second,
},
)
// here we need to forcefully refresh all the tablets because we know we have made a cluster-wide operation,
// and it affects the replication information for all the tablets
forceRefreshAllTabletsInShard(context.Background(), primaryTablet.Keyspace, primaryTablet.Shard)
if ev != nil && ev.NewPrimary != nil {
promotedReplica, _, _ = inst.ReadInstance(&inst.InstanceKey{
Hostname: ev.NewPrimary.MysqlHostname,
Port: int(ev.NewPrimary.MysqlPort),
})
}
postPrsCompletion(topologyRecovery, analysisEntry, promotedReplica)
return topologyRecovery, err
}
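// Illustrative sketch (hypothetical caller): a graceful takeover with an
// operator-designated successor; passing a nil key lets PlannedReparentShard
// pick the new primary. SuccessorKey is checked defensively since whether it
// is populated depends on resolveRecovery.
//
//	func gracefulTakeoverExample(clusterName string, designated *inst.InstanceKey) error {
//		topologyRecovery, err := GracefulPrimaryTakeover(clusterName, designated)
//		if err != nil {
//			return err
//		}
//		if topologyRecovery != nil && topologyRecovery.SuccessorKey != nil {
//			log.Infof("promoted %+v", *topologyRecovery.SuccessorKey)
//		}
//		return nil
//	}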
func postPrsCompletion(topologyRecovery *TopologyRecovery, analysisEntry inst.ReplicationAnalysis, promotedReplica *inst.Instance) {
if promotedReplica != nil {
message := fmt.Sprintf("promoted replica: %+v", promotedReplica.Key)


@@ -40,7 +40,6 @@ func AttemptFailureDetectionRegistration(analysisEntry *inst.ReplicationAnalysis
string(analysisEntry.Analysis),
analysisEntry.ClusterDetails.ClusterName,
analysisEntry.CountReplicas,
analysisEntry.Replicas.ToCommaDelimitedList(),
analysisEntry.IsActionableRecovery,
)
startActivePeriodHint := "now()"
@@ -61,7 +60,6 @@ func AttemptFailureDetectionRegistration(analysisEntry *inst.ReplicationAnalysis
analysis,
cluster_name,
count_affected_replicas,
replica_hosts,
is_actionable,
start_active_period
) values (
@@ -75,7 +73,6 @@ func AttemptFailureDetectionRegistration(analysisEntry *inst.ReplicationAnalysis
?,
?,
?,
?,
%s
)
`, startActivePeriodHint)
@@ -104,7 +101,7 @@ func ClearActiveFailureDetections() error {
in_active_period = 1
AND start_active_period < NOW() - INTERVAL ? MINUTE
`,
config.Config.FailureDetectionPeriodBlockMinutes,
config.FailureDetectionPeriodBlockMinutes,
)
if err != nil {
log.Error(err)
@@ -112,24 +109,6 @@ func ClearActiveFailureDetections() error {
return err
}
// clearAcknowledgedFailureDetections clears the "in_active_period" flag for detections
// that were acknowledged
func clearAcknowledgedFailureDetections(whereClause string, args []any) error {
query := fmt.Sprintf(`
update topology_failure_detection set
in_active_period = 0,
end_active_period_unixtime = UNIX_TIMESTAMP()
where
in_active_period = 1
and %s
`, whereClause)
_, err := db.ExecVTOrc(query, args...)
if err != nil {
log.Error(err)
}
return err
}
func writeTopologyRecovery(topologyRecovery *TopologyRecovery) (*TopologyRecovery, error) {
analysisEntry := topologyRecovery.AnalysisEntry
sqlResult, err := db.ExecVTOrc(`
@@ -147,7 +126,6 @@ func writeTopologyRecovery(topologyRecovery *TopologyRecovery) (*TopologyRecover
analysis,
cluster_name,
count_affected_replicas,
replica_hosts,
last_detection_id
) values (
?,
@@ -162,7 +140,6 @@ func writeTopologyRecovery(topologyRecovery *TopologyRecovery) (*TopologyRecover
?,
?,
?,
?,
(select ifnull(max(detection_id), 0) from topology_failure_detection where hostname=? and port=?)
)
`,
@@ -172,7 +149,7 @@ func writeTopologyRecovery(topologyRecovery *TopologyRecovery) (*TopologyRecover
process.ThisHostname, util.ProcessToken.Hash,
string(analysisEntry.Analysis),
analysisEntry.ClusterDetails.ClusterName,
analysisEntry.CountReplicas, analysisEntry.Replicas.ToCommaDelimitedList(),
analysisEntry.CountReplicas,
analysisEntry.AnalyzedInstanceKey.Hostname, analysisEntry.AnalyzedInstanceKey.Port,
)
if err != nil {
@@ -394,54 +371,6 @@ func acknowledgeRecoveries(owner string, comment string, markEndRecovery bool, w
return rows, err
}
// AcknowledgeAllRecoveries acknowledges all unacknowledged recoveries.
func AcknowledgeAllRecoveries(owner string, comment string) (countAcknowledgedEntries int64, err error) {
whereClause := `1 = 1`
return acknowledgeRecoveries(owner, comment, false, whereClause, sqlutils.Args())
}
// AcknowledgeRecovery acknowledges a particular recovery.
// This also implies clearing their active period, which in turn enables further recoveries on those topologies
func AcknowledgeRecovery(recoveryID int64, owner string, comment string) (countAcknowledgedEntries int64, err error) {
whereClause := `recovery_id = ?`
return acknowledgeRecoveries(owner, comment, false, whereClause, sqlutils.Args(recoveryID))
}
// AcknowledgeRecoveryByUID acknowledges a particular recovery by its UID.
// This also implies clearing their active period, which in turn enables further recoveries on those topologies
func AcknowledgeRecoveryByUID(recoveryUID string, owner string, comment string) (countAcknowledgedEntries int64, err error) {
whereClause := `uid = ?`
return acknowledgeRecoveries(owner, comment, false, whereClause, sqlutils.Args(recoveryUID))
}
// AcknowledgeClusterRecoveries marks active recoveries for given cluster as acknowledged.
// This also implies clearing their active period, which in turn enables further recoveries on those topologies
func AcknowledgeClusterRecoveries(clusterName string, owner string, comment string) (countAcknowledgedEntries int64, err error) {
{
whereClause := `cluster_name = ?`
args := sqlutils.Args(clusterName)
_ = clearAcknowledgedFailureDetections(whereClause, args)
count, err := acknowledgeRecoveries(owner, comment, false, whereClause, args)
if err != nil {
return count, err
}
countAcknowledgedEntries = countAcknowledgedEntries + count
}
return countAcknowledgedEntries, nil
}
// AcknowledgeInstanceRecoveries marks active recoveries for a given instance as acknowledged.
// This also implies clearing their active period, which in turn enables further recoveries on those topologies
func AcknowledgeInstanceRecoveries(instanceKey *inst.InstanceKey, owner string, comment string) (countAcknowledgedEntries int64, err error) {
whereClause := `
hostname = ?
and port = ?
`
args := sqlutils.Args(instanceKey.Hostname, instanceKey.Port)
_ = clearAcknowledgedFailureDetections(whereClause, args)
return acknowledgeRecoveries(owner, comment, false, whereClause, args)
}
// AcknowledgeInstanceCompletedRecoveries marks active and COMPLETED recoveries for a given instance as acknowledged.
// This also implies clearing their active period, which in turn enables further recoveries on those topologies
func AcknowledgeInstanceCompletedRecoveries(instanceKey *inst.InstanceKey, owner string, comment string) (countAcknowledgedEntries int64, err error) {
@@ -518,7 +447,6 @@ func readRecoveries(whereCondition string, limit string, args []any) ([]*Topolog
analysis,
cluster_name,
count_affected_replicas,
replica_hosts,
participating_instances,
lost_replicas,
all_errors,
@@ -551,7 +479,6 @@ func readRecoveries(whereCondition string, limit string, args []any) ([]*Topolog
topologyRecovery.AnalysisEntry.Analysis = inst.AnalysisCode(m.GetString("analysis"))
topologyRecovery.AnalysisEntry.ClusterDetails.ClusterName = m.GetString("cluster_name")
topologyRecovery.AnalysisEntry.CountReplicas = m.GetUint("count_affected_replicas")
_ = topologyRecovery.AnalysisEntry.ReadReplicaHostsFromString(m.GetString("replica_hosts"))
topologyRecovery.SuccessorKey = &inst.InstanceKey{}
topologyRecovery.SuccessorKey.Hostname = m.GetString("successor_hostname")
@@ -581,16 +508,6 @@ func readRecoveries(whereCondition string, limit string, args []any) ([]*Topolog
return res, err
}
// ReadActiveClusterRecovery reads active recovery entry/audit entries for a given cluster from topology_recovery
func ReadActiveClusterRecovery(clusterName string) ([]*TopologyRecovery, error) {
whereClause := `
where
in_active_period=1
and end_recovery is null
and cluster_name=?`
return readRecoveries(whereClause, ``, sqlutils.Args(clusterName))
}
// ReadInActivePeriodClusterRecovery reads recoveries (possibly complete!) that are in active period.
// (may be used to block further recoveries on this cluster)
func ReadInActivePeriodClusterRecovery(clusterName string) ([]*TopologyRecovery, error) {
@@ -601,15 +518,6 @@ func ReadInActivePeriodClusterRecovery(clusterName string) ([]*TopologyRecovery,
return readRecoveries(whereClause, ``, sqlutils.Args(clusterName))
}
// ReadRecentlyActiveClusterRecovery reads recently completed entries for a given cluster
func ReadRecentlyActiveClusterRecovery(clusterName string) ([]*TopologyRecovery, error) {
whereClause := `
where
end_recovery > now() - interval 5 minute
and cluster_name=?`
return readRecoveries(whereClause, ``, sqlutils.Args(clusterName))
}
// ReadInActivePeriodSuccessorInstanceRecovery reads completed recoveries for a given instance, where said instance
// was promoted as result, still in active period (may be used to block further recoveries should this instance die)
func ReadInActivePeriodSuccessorInstanceRecovery(instanceKey *inst.InstanceKey) ([]*TopologyRecovery, error) {
@@ -621,28 +529,6 @@ func ReadInActivePeriodSuccessorInstanceRecovery(instanceKey *inst.InstanceKey)
return readRecoveries(whereClause, ``, sqlutils.Args(instanceKey.Hostname, instanceKey.Port))
}
// ReadRecentlyActiveInstanceRecovery reads recently completed entries for a given instance
func ReadRecentlyActiveInstanceRecovery(instanceKey *inst.InstanceKey) ([]*TopologyRecovery, error) {
whereClause := `
where
end_recovery > now() - interval 5 minute
and
successor_hostname=? and successor_port=?`
return readRecoveries(whereClause, ``, sqlutils.Args(instanceKey.Hostname, instanceKey.Port))
}
// ReadRecovery reads completed recovery entry/audit entries from topology_recovery
func ReadRecovery(recoveryID int64) ([]*TopologyRecovery, error) {
whereClause := `where recovery_id = ?`
return readRecoveries(whereClause, ``, sqlutils.Args(recoveryID))
}
// ReadRecoveryByUID reads completed recovery entry/audit entries from topology_recovery
func ReadRecoveryByUID(recoveryUID string) ([]*TopologyRecovery, error) {
whereClause := `where uid = ?`
return readRecoveries(whereClause, ``, sqlutils.Args(recoveryUID))
}
// ReadRecentRecoveries reads latest recovery entries from topology_recovery
func ReadRecentRecoveries(clusterName string, unacknowledgedOnly bool, page int) ([]*TopologyRecovery, error) {
whereConditions := []string{}
@@ -665,125 +551,6 @@ func ReadRecentRecoveries(clusterName string, unacknowledgedOnly bool, page int)
return readRecoveries(whereClause, limit, args)
}
// readFailureDetections reads failure detection entry/audit entries from topology_failure_detection
func readFailureDetections(whereCondition string, limit string, args []any) ([]*TopologyRecovery, error) {
res := []*TopologyRecovery{}
query := fmt.Sprintf(`
select
detection_id,
hostname,
port,
in_active_period as is_active,
start_active_period,
end_active_period_unixtime,
processing_node_hostname,
processcing_node_token,
analysis,
cluster_name,
count_affected_replicas,
replica_hosts,
(select max(recovery_id) from topology_recovery where topology_recovery.last_detection_id = detection_id) as related_recovery_id
from
topology_failure_detection
%s
order by
detection_id desc
%s
`, whereCondition, limit)
err := db.QueryVTOrc(query, args, func(m sqlutils.RowMap) error {
failureDetection := TopologyRecovery{}
failureDetection.ID = m.GetInt64("detection_id")
failureDetection.IsActive = m.GetBool("is_active")
failureDetection.RecoveryStartTimestamp = m.GetString("start_active_period")
failureDetection.ProcessingNodeHostname = m.GetString("processing_node_hostname")
failureDetection.ProcessingNodeToken = m.GetString("processcing_node_token")
failureDetection.AnalysisEntry.AnalyzedInstanceKey.Hostname = m.GetString("hostname")
failureDetection.AnalysisEntry.AnalyzedInstanceKey.Port = m.GetInt("port")
failureDetection.AnalysisEntry.Analysis = inst.AnalysisCode(m.GetString("analysis"))
failureDetection.AnalysisEntry.ClusterDetails.ClusterName = m.GetString("cluster_name")
failureDetection.AnalysisEntry.CountReplicas = m.GetUint("count_affected_replicas")
_ = failureDetection.AnalysisEntry.ReadReplicaHostsFromString(m.GetString("replica_hosts"))
failureDetection.AnalysisEntry.StartActivePeriod = m.GetString("start_active_period")
failureDetection.RelatedRecoveryID = m.GetInt64("related_recovery_id")
failureDetection.AnalysisEntry.ClusterDetails.ReadRecoveryInfo()
res = append(res, &failureDetection)
return nil
})
if err != nil {
log.Error(err)
}
return res, err
}
// ReadRecentFailureDetections reads recent failure detection entries, optionally filtered by cluster name
func ReadRecentFailureDetections(clusterName string, page int) ([]*TopologyRecovery, error) {
whereClause := ""
args := sqlutils.Args()
if clusterName != "" {
whereClause = `where cluster_name = ?`
args = append(args, clusterName)
}
limit := `
limit ?
offset ?`
args = append(args, config.AuditPageSize, page*config.AuditPageSize)
return readFailureDetections(whereClause, limit, args)
}
// ReadFailureDetection reads the failure detection entry matching the given detection ID
func ReadFailureDetection(detectionID int64) ([]*TopologyRecovery, error) {
whereClause := `where detection_id = ?`
return readFailureDetections(whereClause, ``, sqlutils.Args(detectionID))
}
// ReadBlockedRecoveries reads blocked recovery entries, potentially filtered by cluster name (empty to unfilter)
func ReadBlockedRecoveries(clusterName string) ([]BlockedTopologyRecovery, error) {
res := []BlockedTopologyRecovery{}
whereClause := ""
args := sqlutils.Args()
if clusterName != "" {
whereClause = `where cluster_name = ?`
args = append(args, clusterName)
}
query := fmt.Sprintf(`
select
hostname,
port,
cluster_name,
analysis,
last_blocked_timestamp,
blocking_recovery_id
from
blocked_topology_recovery
%s
order by
last_blocked_timestamp desc
`, whereClause)
err := db.QueryVTOrc(query, args, func(m sqlutils.RowMap) error {
blockedTopologyRecovery := BlockedTopologyRecovery{}
blockedTopologyRecovery.FailedInstanceKey.Hostname = m.GetString("hostname")
blockedTopologyRecovery.FailedInstanceKey.Port = m.GetInt("port")
blockedTopologyRecovery.ClusterName = m.GetString("cluster_name")
blockedTopologyRecovery.Analysis = inst.AnalysisCode(m.GetString("analysis"))
blockedTopologyRecovery.LastBlockedTimestamp = m.GetString("last_blocked_timestamp")
blockedTopologyRecovery.BlockingRecoveryID = m.GetInt64("blocking_recovery_id")
res = append(res, blockedTopologyRecovery)
return nil
})
if err != nil {
log.Error(err)
}
return res, err
}
// writeTopologyRecoveryStep writes down a single step in a recovery process
func writeTopologyRecoveryStep(topologyRecoveryStep *TopologyRecoveryStep) error {
sqlResult, err := db.ExecVTOrc(`
@@ -804,35 +571,6 @@ func writeTopologyRecoveryStep(topologyRecoveryStep *TopologyRecoveryStep) error
return err
}
// ReadTopologyRecoverySteps reads recovery steps for a given recovery
func ReadTopologyRecoverySteps(recoveryUID string) ([]TopologyRecoveryStep, error) {
res := []TopologyRecoveryStep{}
query := `
select
recovery_step_id, recovery_uid, audit_at, message
from
topology_recovery_steps
where
recovery_uid=?
order by
recovery_step_id asc
`
err := db.QueryVTOrc(query, sqlutils.Args(recoveryUID), func(m sqlutils.RowMap) error {
recoveryStep := TopologyRecoveryStep{}
recoveryStep.RecoveryUID = recoveryUID
recoveryStep.ID = m.GetInt64("recovery_step_id")
recoveryStep.AuditAt = m.GetString("audit_at")
recoveryStep.Message = m.GetString("message")
res = append(res, recoveryStep)
return nil
})
if err != nil {
log.Error(err)
}
return res, err
}
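// Illustrative sketch (recoveryUID is assumed to come from a TopologyRecovery
// row): replaying the audit trail of a single recovery via the reader above.
//
//	steps, err := ReadTopologyRecoverySteps(recoveryUID)
//	if err != nil {
//		return err
//	}
//	for _, step := range steps {
//		log.Infof("[%s] %s", step.AuditAt, step.Message)
//	}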
// ExpireFailureDetectionHistory removes old rows from the topology_failure_detection table
func ExpireFailureDetectionHistory() error {
return inst.ExpireTableData("topology_failure_detection", "start_active_period")


@@ -1,92 +0,0 @@
/*
Copyright 2014 Outbrain Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package os
import (
"fmt"
"os"
"os/exec"
"strings"
"syscall"
"vitess.io/vitess/go/vt/log"
"vitess.io/vitess/go/vt/vtorc/config"
)
// CommandRun executes some text as a command. The text is assumed to be
// run by a shell, so we write it out to a temporary file, ask the shell
// to execute it, and then remove the temporary file.
func CommandRun(commandText string, env []string, arguments ...string) error {
// show the actual command we have been asked to run
log.Infof("CommandRun(%v,%+v)", commandText, arguments)
cmd, shellScript, err := generateShellScript(commandText, env, arguments...)
defer os.Remove(shellScript)
if err != nil {
log.Error(err)
return err
}
var waitStatus syscall.WaitStatus
log.Infof("CommandRun/running: %s", strings.Join(cmd.Args, " "))
cmdOutput, err := cmd.CombinedOutput()
log.Infof("CommandRun: %s\n", string(cmdOutput))
if err != nil {
// Did the command fail because of an unsuccessful exit code
if exitError, ok := err.(*exec.ExitError); ok {
waitStatus = exitError.Sys().(syscall.WaitStatus)
log.Errorf("CommandRun: failed. exit status %d", waitStatus.ExitStatus())
}
errMsg := fmt.Sprintf("(%s) %s", err.Error(), cmdOutput)
log.Error(errMsg)
return fmt.Errorf(errMsg)
}
// Command was successful
waitStatus = cmd.ProcessState.Sys().(syscall.WaitStatus)
log.Infof("CommandRun successful. exit status %d", waitStatus.ExitStatus())
return nil
}
// generateShellScript writes the given command to a temporary shell script
// and returns the exec.Cmd that will run it, together with the name of the
// script file so that the caller can remove it.
func generateShellScript(commandText string, env []string, arguments ...string) (*exec.Cmd, string, error) {
shell := config.Config.ProcessesShellCommand
commandBytes := []byte(commandText)
tmpFile, err := os.CreateTemp("", "vtorc-process-cmd-")
if err != nil {
errMsg := fmt.Sprintf("generateShellScript() failed to create TempFile: %v", err.Error())
log.Errorf(errMsg)
return nil, "", fmt.Errorf(errMsg)
}
// write commandText to temporary file
_ = os.WriteFile(tmpFile.Name(), commandBytes, 0640)
shellArguments := append([]string{}, tmpFile.Name())
shellArguments = append(shellArguments, arguments...)
cmd := exec.Command(shell, shellArguments...)
cmd.Env = env
return cmd, tmpFile.Name(), nil
}
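// Illustrative sketch of how this helper (deleted by this change) was driven:
// run a shell snippet with an augmented environment. Values are hypothetical.
//
//	env := append(os.Environ(), "VTORC_FAILED_HOST=host-a")
//	err := CommandRun("echo recovering $VTORC_FAILED_HOST", env)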
