Occasional docker-compose errors will be easier to diagnose (#11835)
With this change we attempt to better diagnose some occasional docker-compose network issues that have been plaguing us after we solved or worked around other CI-related issues. Sometimes the docker-compose jobs fail while checking whether a container is up and running, with either of two errors:

* 'forward host lookup failed: Unknown host'
* 'DNS fwd/rev mismatch'

Usually this happens in the rabbitmq and openldap containers. Both errors indicate a problem with the DNS of the docker engine, or possibly remnants of a previous docker run that prevent those containers from starting.

This change introduces a few improvements:

* Adds `--volumes` to the `docker system prune` command, which might clean up anonymous volumes left by containers between runs.
* Removes the `docker-compose down --remove-orphans` command after failure, since we always run it a few lines earlier anyway (before the test). As a result, our mechanism for logging container logs after failure will likely give us more information when the root cause is the rabbitmq or openldap container failing to start.
* Increases the number of tries to 5 in case of failed containers.
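The retry behaviour described in this message can be sketched in isolation as follows. This is a minimal illustration, not the script from this commit: `run_with_retries`, `MAX_TRIES`, `RETRYABLE_EXIT_CODE` and `SLEEP_SECONDS` are hypothetical names, and the sketch assumes (as the diff suggests) that exit code 254 marks a failed integration start and that any other exit code should end the loop immediately.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the retry pattern; all names are illustrative.

MAX_TRIES=5                          # the commit raises the number of tries to 5
RETRYABLE_EXIT_CODE=254              # assumed to mean "integration failed to start"
SLEEP_SECONDS="${SLEEP_SECONDS:-5}"  # pause between tries

# run_with_retries CMD [ARGS...]
# Runs CMD up to MAX_TRIES times. Retries only when CMD exits with
# RETRYABLE_EXIT_CODE; any other exit code ends the loop at once.
# No cleanup is done between tries, so container logs stay available.
function run_with_retries() {
    local exit_code=0
    local try_num
    for try_num in $(seq 1 "${MAX_TRIES}"); do
        "$@"
        exit_code=$?
        if [[ ${exit_code} == "${RETRYABLE_EXIT_CODE}" && ${try_num} != "${MAX_TRIES}" ]]; then
            echo "Failed try num ${try_num}. Sleeping ${SLEEP_SECONDS} seconds for retry"
            sleep "${SLEEP_SECONDS}"
            continue
        fi
        break
    done
    return "${exit_code}"
}
```

On the last try the loop does not sleep or retry, so the final exit code is returned to the caller, which can then dump container logs before anything is torn down.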
Parent: a5d3176878
Commit: 2f4a3d48a8
@@ -31,17 +31,17 @@ function run_airflow_testing_in_docker() {
     set +u
     set +e
     local exit_code
-    for try_num in {1..3}
+    for try_num in {1..5}
     do
         echo
-        echo "Making sure docker-compose is down"
+        echo "Making sure docker-compose is down and remnants removed"
         echo
         docker-compose --log-level INFO -f "${SCRIPTS_CI_DIR}/docker-compose/base.yml" \
             down --remove-orphans --volumes --timeout 10
         echo
         echo "System-prune docker"
         echo
-        docker system prune --force
+        docker system prune --force --volumes
         echo
         echo "Check available space"
         echo
@@ -70,15 +70,9 @@ function run_airflow_testing_in_docker() {
             echo "Delete kerberos network"
             kerberos::delete_kerberos_network
         fi
-        if [[ ${exit_code} == 254 ]]; then
+        if [[ ${exit_code} == "254" && ${try_num} != "5" ]]; then
             echo
-            echo "Failed starting integration on ${try_num} try. Wiping-out docker-compose remnants"
-            echo
-            docker-compose --log-level INFO \
-                -f "${SCRIPTS_CI_DIR}/docker-compose/base.yml" \
-                down --remove-orphans -v --timeout 5
-            echo
-            echo "Sleeping 5 seconds"
+            echo "Failed try num ${try_num}. Sleeping 5 seconds for retry"
             echo
             sleep 5
             continue
@@ -21,5 +21,5 @@
 sudo swapoff -a
 sudo rm -f /swapfile
 sudo apt clean
-docker system prune --all
+docker system prune --all --force
 df -h