updated Readme and removed dev reframe config
Parent
f73df0a5fc
Commit
da344b26a3
53
README.md
@@ -134,6 +134,54 @@ Whether it is a bash or a python script, anything executable can be a test, as l
- To receive a meaningful report on the error, you need to write the message to stdout
- If you want the report to contain more information than a single message can convey, you can make your script output a JSON string. Just make sure it has a "message" field, which is used to log the error. If you do this, everything except the message field will end up in the "extra-info" part of the report as valid JSON (please refer to the [Sample healthcheck report](#sample-healthcheck-report) section for an example). If there are any formatting issues, or you fail to include the "message" field, the whole JSON string becomes the reported message instead
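
As an illustration of the JSON output described above, here is a minimal sketch of a Python test. The check itself (`build_report`, the free-disk-space idea, the 10 GiB threshold, and the `path`, `free_gib`, and `min_free_gib` fields) is hypothetical; only the "message" field comes from the report format. Signalling failure with a non-zero exit status is also an assumption, not something this README states.

```python
import json
import shutil


def build_report(path, min_free_gib):
    """Return (ok, payload) for a hypothetical free-disk-space check."""
    free_gib = shutil.disk_usage(path).free / 2**30
    ok = free_gib >= min_free_gib
    payload = {
        # "message" is the field the report uses to log the error.
        "message": f"Free space on {path} is below {min_free_gib} GiB",
        # Any other field is expected to end up in "extra-info".
        "path": path,
        "free_gib": round(free_gib, 2),
        "min_free_gib": min_free_gib,
    }
    return ok, payload


def main():
    ok, payload = build_report("/", min_free_gib=10)
    if ok:
        return 0
    # Print exactly one JSON string to stdout so it can be parsed.
    print(json.dumps(payload))
    return 1  # assumed: a non-zero exit status marks the test as failed
```

To use this as a test script, end the file with `import sys; sys.exit(main())` and make it executable.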

## Running reframe test scripts

Update `healthchecks.custom.pattern` in the cluster-ini template to the pattern that the healthcheck should use to determine which test scripts to run.

![Alt](/images/reframe_pattern.png "Reframe pattern")

Alternatively, you can change the cluster template directly. This can be useful if you are planning to set up multiple clusters using that template:

```ini
[[[configuration healthchecks.reframe]]]
pattern = *.py
```
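
To preview which scripts a pattern such as `*.py` would select, a small sketch like the one below can help. It assumes the pattern is interpreted as a shell-style glob, which the `*.py` example suggests but this README does not state; `select_scripts` and the file names are made up for illustration and are not part of hcheck.

```python
from fnmatch import fnmatch


def select_scripts(filenames, pattern):
    """Return the file names that match a shell-style glob, sorted."""
    return sorted(name for name in filenames if fnmatch(name, pattern))


# Hypothetical contents of a test-script directory.
candidates = ["001_check_gpu.sh", "002_check_ib.py", "notes.txt", "reframe_test.py"]

# With "*.py" only the two Python files are kept.
print(select_scripts(candidates, "*.py"))
```

If hcheck matches patterns differently (for example as a plain suffix), the selection may differ from what this sketch prints.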

`003_run_reframe.sh` downloads Jon Shelley's reframe repository, installs ReFrame, and runs the ReFrame tests. Then, like the other tests, it uses the hcheck tool to send the log to CycleCloud and generate a report.

## Developer testing for reframe scripts

Example configuration for reframe tests for dev testing:

If you are using CentOS, you need to edit the `azure_centos_7.py` file in Jon Shelley's repo (https://github.com/JonShelley/reframe/blob/master/azure_nhc/config/azure_centos_7.py) to include the SKU configuration as follows:

```python
site_configuration = {
    'systems': [
        {
            'name': 'fs_v2',
            'descr': 'Azure FV2',
            'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
            'vm_size': 'F2s_v2',
            'hostnames': ['*'],
            'modules_system': 'tmod32',
            'partitions': [
                {
                    'name': 'hpc',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'max_jobs': 100,
                    'access': ['-p hpc'],
                    'environs': ['builtin'],
                    'prepare_cmds': ['source /etc/profile.d/modules.sh']
                }
            ]
        },
    ]
}
```
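
Before copying an edited config into place, it can be worth checking that every name listed under a partition's `environs` has a matching entry under `environments`. The sketch below is a hypothetical helper (`undefined_environs` is not part of ReFrame or hcheck), and it assumes the config file defines its own `environments` list, as the full configuration file removed in this commit does. The `example` dictionary is a trimmed stand-in, not the real file.

```python
def undefined_environs(site_configuration):
    """Return sorted (system, partition, environ) triples with no matching environment."""
    defined = {env["name"] for env in site_configuration.get("environments", [])}
    missing = []
    for system in site_configuration.get("systems", []):
        for partition in system.get("partitions", []):
            for environ in partition.get("environs", []):
                if environ not in defined:
                    missing.append((system["name"], partition["name"], environ))
    return sorted(missing)


# Trimmed-down stand-in for a site configuration, for illustration only.
example = {
    "systems": [
        {
            "name": "fs_v2",
            "hostnames": ["*"],
            "partitions": [
                {"name": "hpc", "scheduler": "slurm", "launcher": "srun",
                 "environs": ["builtin"]},
            ],
        },
    ],
    "environments": [{"name": "builtin", "cc": "cc", "cxx": "", "ftn": ""}],
}

# An empty list means every referenced environment is defined.
print(undefined_environs(example))
```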

In the `003_run_reframe.sh` script, replace the config file after the repo has been cloned, for example:
cp azure_nhc/config/azure_centos_7.py azure_nhc/config/azure_centos_7_backup.py
cp ${CYCLECLOUD_SPEC_PATH}/files/azure_centos_7.py azure_nhc/config/azure_centos_7.py

## Running the hcheck binary

You should never have to run the tool manually, but in case you want to, here is a list of the parameters the tool accepts:
@@ -148,7 +196,10 @@ You should never have to run the tool manually, but in the case you want to do s
| --nr | Number of reruns for the set of scripts | --nr 3 |
| --pt | Pattern for custom script detection | --pt .sh |
| --rpath | Path where the report will be generated | --rpath /tmp/log/report.json |
| --rscript | Path to the script reporting the results back to the portal | --rscript ./send_logs |
| --python | Path to the Python code that reports errors to App Insights | --python /send_log_appInsights |
| --reframe | Path to where ReFrame is installed | --reframe /reframe/bin/reframe |
| --config | Path to the ReFrame tests config file | --config /reframe/azure_nhc/config/azure_centos_7.py |
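
As a rough sketch of how these parameters combine, the snippet below assembles (but does not run) an hcheck command line that mirrors the invocation in the script changes later in this commit. The `-k`, `--append`, `--rpath`, `--reframe`, and `--config` flags are taken from that invocation; `build_hcheck_command`, the `/opt/healthcheck` install directory, and the report path are assumptions for illustration.

```python
import shlex


def build_hcheck_command(install_dir, report_path, reframe_cfg):
    """Assemble the hcheck argument list for a ReFrame run (does not execute it)."""
    reframe_root = f"{install_dir}/reframe"
    return [
        f"{install_dir}/linux-x64/hcheck",
        "-k", f"{reframe_root}/azure_nhc/run_level_2",
        "--append",
        "--rpath", report_path,
        "--reframe", f"{reframe_root}/bin/reframe",
        "--config", f"{reframe_root}/azure_nhc/config/{reframe_cfg}",
    ]


# Hypothetical install directory and report path, for illustration only.
argv = build_hcheck_command("/opt/healthcheck", "/tmp/log/report.json", "azure_centos_7.py")

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Keeping the arguments as a list avoids quoting problems if a path ever contains spaces; the list can be passed directly to `subprocess.run` when the command should actually be executed.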

## Changing the script for reporting errors

Binary file not shown.
After Width: | Height: | Size: 18 KiB
@@ -1,221 +0,0 @@
# Copyright 2016-2021 Swiss National Supercomputing Centre (CSCS/ETH Zurich)
# ReFrame Project Developers. See the top-level LICENSE file for details.
#
# SPDX-License-Identifier: BSD-3-Clause

#
# Generic fallback configuration
#

site_configuration = {
    'systems': [
        {
            'name': 'fs_v2',
            'descr': 'Azure FV2',
            'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
            'vm_size': 'F2s_v2',
            'hostnames': ['*'],
            'modules_system': 'tmod32',
            'partitions': [
                {
                    'name': 'hpc',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'max_jobs': 100,
                    'access': ['-p hpc'],
                    'environs': ['builtin'],
                    'prepare_cmds': ['source /etc/profile.d/modules.sh']
                }
            ]
        },
        {
            'name': 'hbrs',
            'descr': 'Azure HB',
            'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
            'vm_size': 'HB60rs',
            'hostnames': ['*_hb_*'],
            'modules_system': 'tmod32',
            'partitions': [
                {
                    'name': 'hb',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'max_jobs': 100,
                    'access': ['-p hb'],
                    'environs': ['gnu-azhpc-cos7'],
                    'prepare_cmds': ['source /etc/profile.d/modules.sh']
                }
            ]
        },
        {
            'name': 'hbrs_v2',
            'descr': 'Azure HBv2',
            'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
            'vm_size': 'HB120rs_v2',
            'hostnames': ['*_hbv2_*'],
            'modules_system': 'tmod32',
            'partitions': [
                {
                    'name': 'hbv2',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'max_jobs': 100,
                    'access': ['-p hbv2'],
                    'environs': ['gnu-azhpc-cos7'],
                    'prepare_cmds': ['source /etc/profile.d/modules.sh']
                }
            ]
        },
        {
            'name': 'hbrs_v3',
            'descr': 'Azure HBv3',
            'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
            'vm_size': 'HB120rs_v3',
            'hostnames': ['*_hbv3_*'],
            'modules_system': 'tmod32',
            'partitions': [
                {
                    'name': 'hbv3',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'max_jobs': 100,
                    'access': ['-p hbv3'],
                    'environs': ['gnu-azhpc-cos7'],
                    'prepare_cmds': ['source /etc/profile.d/modules.sh']
                }
            ]
        },
        {
            'name': 'hcrs',
            'descr': 'Azure HC',
            'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
            'hostnames': ['*_hc_*'],
            'modules_system': 'tmod32',
            'partitions': [
                {
                    'name': 'default',
                    'scheduler': 'local',
                    'launcher': 'local',
                    'environs': ['gnu-azhpc-cos7'],
                    'prepare_cmds': ['source /etc/profile.d/modules.sh']
                }
            ]
        },
        {
            'name': 'ndamsr_a100_v4',
            'descr': 'Azure NDm v4',
            'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
            'hostnames': [''],
            'modules_system': 'tmod4',
            'partitions': [
                {
                    'name': 'gpu',
                    'scheduler': 'local',
                    'launcher': 'local',
                    'environs': ['gnu-azhpc'],
                    'prepare_cmds': ['source /etc/profile.d/modules.sh']
                }
            ]
        },
        {
            'name': 'ndasr_v4',
            'descr': 'Azure ND v4',
            'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
            'hostnames': [''],
            'modules_system': 'tmod4',
            'partitions': [
                {
                    'name': 'gpu',
                    'scheduler': 'local',
                    'launcher': 'local',
                    'environs': ['gnu-azhpc'],
                    'prepare_cmds': ['source /etc/profile.d/modules.sh']
                }
            ]
        },
        {
            'name': 'generic',
            'descr': 'Generic example system',
            'hostnames': ['.*'],
            'partitions': [
                {
                    'name': 'default',
                    'scheduler': 'local',
                    'launcher': 'local',
                    'environs': ['builtin'],
                    'prepare_cmds': ['source /etc/profile.d/modules.sh']
                }
            ]
        }
    ],
    'environments': [
        {
            'name': 'builtin',
            'cc': 'cc',
            'cxx': '',
            'ftn': ''
        },
        {
            'name': 'gnu-azhpc',
            'modules': ['gcc-9.2.0', 'mpi/hpcx'],
            'cc': 'gcc',
            'cxx': 'g++',
            'ftn': 'gfortran'
        },
        {
            'name': 'gnu-azhpc-cos7',
            'modules': ['gcc-9.2.0', 'mpi/hpcx'],
            'cc': 'gcc',
            'cxx': 'g++',
            'ftn': 'gfortran'
        },
        {
            'name': 'gnu-azhpc-cos8',
            'modules': ['gcc-9.2.1', 'mpi/hpcx'],
            'cc': 'gcc',
            'cxx': 'g++',
            'ftn': 'gfortran'
        },
        {
            'name': 'gnu',
            'cc': 'gcc',
            'cxx': 'g++',
            'ftn': 'gfortran'
        },
    ],
    'logging': [
        {
            'handlers': [
                {
                    'type': 'stream',
                    'name': 'stdout',
                    'level': 'info',
                    'format': '%(message)s'
                },
                {
                    'type': 'file',
                    'level': 'debug',
                    'format': '[%(asctime)s] %(levelname)s: %(check_info)s: %(message)s',  # noqa: E501
                    'append': False
                }
            ],
            'handlers_perflog': [
                {
                    'type': 'filelog',
                    'prefix': '%(check_system)s/%(check_partition)s',
                    'level': 'info',
                    'format': (
                        '%(check_job_completion_time)s|reframe %(version)s|'
                        '%(check_info)s|jobid=%(check_jobid)s|'
                        '%(check_perf_var)s=%(check_perf_value)s|'
                        'ref=%(check_perf_ref)s '
                        '(l=%(check_perf_lower_thres)s, '
                        'u=%(check_perf_upper_thres)s)|'
                        '%(check_perf_unit)s'
                    ),
                    'append': True
                }
            ]
        }
    ],
}
@@ -19,8 +19,6 @@ if ! [[ -f $INSTALL_DIR/reframe/bin/reframe ]]
curl -L -k https://github.com/JonShelley/reframe/tarball/master | tar -xz --strip-components 1
./bootstrap.sh
./bin/reframe -V
cp azure_nhc/config/azure_centos_7.py azure_nhc/config/azure_centos_7_backup.py
cp ${CYCLECLOUD_SPEC_PATH}/files/azure_centos_7.py azure_nhc/config/azure_centos_7.py
else
echo "Warning: Did not install ReFrame (looks like it already has been installed)"
cd reframe
@@ -53,5 +51,4 @@ REPORT_PATH=$(jq -r '.report' ${HCHECK_SETTINGS_PATH})
APPLICATIONINSIGHTS_CONNECTION_STRING=$(jq -r '.appinsights.ConnectString' ${HCHECK_SETTINGS_PATH})
INSTRUMENTATION_KEY=$(jq -r '.appinsights.InstrumentationKey' ${HCHECK_SETTINGS_PATH})
$INSTALL_DIR/linux-x64/hcheck -k $INSTALL_DIR/reframe/azure_nhc/run_level_2 --append --rpath $REPORT_PATH --reframe $INSTALL_DIR/reframe/bin/reframe --config $INSTALL_DIR/reframe/azure_nhc/config/${reframe_cfg}
$INSTALL_DIR/linux-x64/hcheck --rpath $REPORT_PATH --fin --appin $INSTRUMENTATION_KEY --rscript $INSTALL_DIR/sbin/send_log
#exit $?
$INSTALL_DIR/linux-x64/hcheck --rpath $REPORT_PATH --fin --appin $INSTRUMENTATION_KEY --rscript $INSTALL_DIR/sbin/send_log
@@ -33,5 +33,4 @@ $INSTALL_DIR/linux-x64/hcheck --rpath $REPORT_PATH --fin --appin $INSTRUMENTATIO
# do
# jetpack log --level error "$line";
# done
# exit ${PIPESTATUS[0]}
#exit $?
# exit ${PIPESTATUS[0]}