Benchmark - Fix torch.dist init issue with multiple models (#495)

Fix a potential barrier timeout in init_process_group caused by a race condition when the same port is reused. When multiple models run sequentially in one process, each model now uses a different port: the incremented port is written back to MASTER_PORT, so the next model picks up the next port. For example, running vgg11/13/16/19 uses ports 29501 through 29504, respectively.
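
A minimal sketch of the port sequence this produces, assuming MASTER_PORT starts at 29500 (the fallback value in the patch). The helper name `next_master_port` is hypothetical and not part of the codebase, and a plain dict stands in for `os.environ` so the snippet leaves the real environment untouched:

```python
def next_master_port(env, default="29500"):
    """Hypothetical helper mirroring the two patched lines.

    Reads MASTER_PORT (or the default), adds one, and writes the
    result back so the next call starts from the new value.
    """
    port = int(env.get("MASTER_PORT", default)) + 1
    env["MASTER_PORT"] = str(port)
    return port


# A plain dict stands in for os.environ in this illustration.
env = {"MASTER_PORT": "29500"}

# Models initialized one after another in the same process.
ports = {model: next_master_port(env) for model in ["vgg11", "vgg13", "vgg16", "vgg19"]}
print(ports)
# {'vgg11': 29501, 'vgg13': 29502, 'vgg16': 29503, 'vgg19': 29504}
```

Without the write-back, every call would read the same starting value and return 29501, which is the port reuse this change is meant to avoid.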
2023-03-21 20:35:03 +08:00 · 2023-03-21 20:35:03 +08:00 · 644b5395df
--- a/superbench/benchmarks/model_benchmarks/pytorch_base.py
+++ b/superbench/benchmarks/model_benchmarks/pytorch_base.py
@@ -70,7 +70,8 @@ class PytorchBase(ModelBenchmark):
                    )
                    return False
                # torch >= 1.9.0a0 torch.distributed.elastic is used by default
-                port = int(os.environ['MASTER_PORT']) + 1
+                port = int(os.environ.get('MASTER_PORT', '29500')) + 1
+                os.environ['MASTER_PORT'] = str(port)
                addr = os.environ['MASTER_ADDR']
                self._global_rank = int(os.environ['RANK'])
                self._local_rank = int(os.environ['LOCAL_RANK'])