vitess-gh/doc/MySQLServerProtocol.md

5.2 KiB

MySQL Binary Protocol

Vitess 2.1 is adding alpha support for the MySQL binary protocol (refered to as the protocol in the rest of this document). This allows existing applications to connect to Vitess directly without any change, or new driver / connector.

However, this also has limitations. The RPC protocol traditionnally exposed by Vitess is reacher in features, and allows finer grain control of the query behaviours.

This document explores the limitations of using this protocol.

Protocol Limitations

The following features are not represented in the protocol.

Bind Variables

In the traditional connector, queries are sent as plain text to the server. This protocol does not support bind variable.

The Prepared Statement part of the API can support bind variables, but it requires per-connection state, and using binary bind variables, which are much harder to implement. It is also not recommended.

Most database drivers (JDBC, go, ...) support bind variables. These end up being implemented as client-side bind variables, where the values are printed in the SQL statement by the client, and re-interpreted by the server.

A Vitess Connector, on the other end, can send the Bind Variable map to the server along with the query. The query plan is then cached by both vtgate and vttablet, providing much better execution times.

Note we added the normalize_queries command line parameter to vtgate to mitigate this problem. With this flag, vtgate will try to extract bind variables from full queries. This makes the vttablet side optimized, but still costs extra CPU on the vtgate side.

Tablet Type

The regular Vitess API provides the ability to specify the tablet type to target: master, replica, rdonly. The current MySQL connector we created only uses the master type.

Note we could implement a different policy for this.

Transaction Type

The regular Vitess API provides the ability to specify the transaction type: one shard only, or 2PC. The current MYSQL connector uses the transaction type provided to vtgate by the transaction_mode, which usually is the highest transaction level allowed by vtgate, not the default one.

Streaming Queries

The Vitess RPC protocol supports both non-streaming queries (for web-like traffic), and streaming queries (for data analysis traffic). The current MySQL connector only uses non-streaming queries.

This could be changed with a flag. Or we could try to be smart: anything within a transaction could be non-streaming query, anything outside a transaction could be streaming.

Extra Query Features

There are even more Vitess specific features in the API, that are not represented in the MySQL server API. Event Tokens for cache invalidation, Update Stream, Messages, ... These seem somewhat impossible to implement using the MySQL binary protocol.

Security

The current RPC protocol uses the security provided by the RPC protocol, TLS for gRPC. We then use the certificate name as the user name for our authorization (table ACLs).

The MySQL server connector requires user names and passwords, that need to maintained, rotated, ... We do not include TLS support yet, although it can be easily added. We also need to add the username as authenticated user to use for our authorization.

We have plans to provide an LDAP plug-in to authenticate the users (but no firm development plan for it yet).

Query Multiplexing

gRPC can multiplex multiple request / responses on the same TCP connection. The MySQL server protocol is very synchronous, and can only have one request in flight at any given time.

With all these limitations, why did we even bother implementing this? Well, there is something to be said for drop-in replacement and ease of use:

  • When migrating an existing application to Vitess, it is useful to run Vitess on top of an existing database, and only change the server address in the client code to point at vtgate, and not change anything else. This allows for an easier transition, and then the client code can be update later to use a better connector.

  • Some tools only support MySQL server protocol, and not connectors / not Vitess (yet!).

  • Most of the mentioned limitations are going to affect production systems running at a somewhat high load. For smaller workloads, none of this really matters.

  • A lot more programming languages have MySQL connectors than what we support already (we have connectors for Java, Python, PHP, Go). They can use the MySQL server protocol until a native connector for gRPC can be immplemented.

Future Features

We are thinking about the following features to make this connector more useful:

  • Add SSL support.

  • Provided more authentication schemes than mysql_native_password. In particular, sha256 seems easy.

  • Use an LDAP client to validate users and passwords on the server side.

  • Implement a lot more DBA features in the protocol. Statements like SHOW TABLES are not supported yet. They will make the MySQL binary protocol much more useful.

  • Possibly add hints to the queries so they can use more advanced features. Like provide a commit 2pc syntax to enable 2PC for some commits.

  • Provide direct SQL access for some features like Vitess Messages.