This is an alternative to 'make integration_test',
with the following advantages:
* Tests run in Docker, so no bootstrap is necessary.
* Tests are hermetic and can run in parallel.
* Test against different flavors just by setting a flag.
* Failing tests are retried to see if they are flaky.
* A failed test will be recorded for later inspection, while the script
continues to run other tests.
* A test that takes too long will be considered stuck and retried.
There's plenty of room for improvement, but now that we have something
in a more readable language than Makefile, we can iterate.