updated Readme and removed dev reframe config

This commit is contained in:
nidhi0622 2023-02-08 13:18:50 -06:00
Parent f73df0a5fc
Commit da344b26a3
5 changed files with 54 additions and 228 deletions

View file

@ -134,6 +134,54 @@ Whether it is a bash or a python script, anything executable can be a test, as l
- To receive a meaningful report on the error, your script needs to write the message to stdout
- If you want the report to contain more information than a single message can convey, you can make your script output a JSON string. Just make sure it has a "message" field, which is used to log the error. In that case, everything except the "message" field ends up in the "extra-info" part of the report as valid JSON (please refer to the [Sample healthcheck report](#sample-healthcheck-report) section for an example). If there are any formatting issues or the "message" field is missing, the whole JSON string becomes the reported message instead
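
As a minimal sketch of the JSON output described above, a Python test could build its report like this. The helper name `build_report` and every field other than "message" are hypothetical and chosen only for illustration; how the script signals failure to the healthcheck is not shown here.

```python
import json


def build_report(message, **extra):
    """Return a JSON string with the "message" field plus any extra fields."""
    report = {"message": message}
    report.update(extra)
    return json.dumps(report)


# Hypothetical failure details; a real test would gather these from the node.
report = build_report(
    "GPU count mismatch",
    expected_gpus=8,
    found_gpus=7,
)
print(report)  # the report has to be written to stdout
```

With this shape, the two GPU fields would be expected to land in the "extra-info" part of the report, while "GPU count mismatch" becomes the logged message.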
## Running reframe test scripts
Update healthchecks.custom.pattern in the cluster-ini template to a pattern that the healthcheck will use to determine which test scripts to run.
![Alt](/images/reframe_pattern.png "Reframe pattern")
Alternatively, you can change the cluster template directly. This can be useful if you are planning to set up multiple clusters using that template:
```ini
[[[configuration healthchecks.reframe]]]
pattern = *.py
```
003_run_reframe.sh downloads Jon Shelley's reframe repo (https://github.com/JonShelley/reframe), installs reframe, and runs the reframe tests. Like the other tests, it then uses the hcheck project to send the log to CycleCloud and generate a report.
## Developer testing for reframe scripts
Example configuration of reframe tests for developer testing:
If you are using CentOS, edit the azure_centos_7.py file in Jon Shelley's repo (https://github.com/JonShelley/reframe/blob/master/azure_nhc/config/azure_centos_7.py) to include the SKU configuration as follows:
```python
site_configuration = {
    'systems': [
        {
            'name': 'fs_v2',
            'descr': 'Azure FV2',
            'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
            'vm_size': 'F2s_v2',
            'hostnames': ['*'],
            'modules_system': 'tmod32',
            'partitions': [
                {
                    'name': 'hpc',
                    'scheduler': 'slurm',
                    'launcher': 'srun',
                    'max_jobs': 100,
                    'access': ['-p hpc'],
                    'environs': ['builtin'],
                    'prepare_cmds': ['source /etc/profile.d/modules.sh']
                }
            ]
        },
    ]
}
```
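Entries for other SKUs would follow the same shape. As a rough sketch, a small helper such as the hypothetical `make_system` below could generate an entry with the keys used above. The helper name and the values it hard-codes are assumptions made for illustration and are not part of the reframe repo.

```python
def make_system(name, descr, vm_size, partition):
    """Build one 'systems' entry shaped like the example above (hypothetical helper)."""
    return {
        'name': name,
        'descr': descr,
        'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
        'vm_size': vm_size,
        'hostnames': ['*'],
        'modules_system': 'tmod32',
        'partitions': [
            {
                'name': partition,
                'scheduler': 'slurm',
                'launcher': 'srun',
                'max_jobs': 100,
                # Slurm partition flag, derived from the partition name
                'access': [f'-p {partition}'],
                'environs': ['builtin'],
                'prepare_cmds': ['source /etc/profile.d/modules.sh'],
            }
        ],
    }


# Reproduces the F2s_v2 entry from the example configuration.
site_configuration = {
    'systems': [make_system('fs_v2', 'Azure FV2', 'F2s_v2', 'hpc')]
}
```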
In the 003_run_reframe.sh script, replace the repo's config file with your edited copy after the repo has been downloaded, for example:
cp azure_nhc/config/azure_centos_7.py azure_nhc/config/azure_centos_7_backup.py
cp ${CYCLECLOUD_SPEC_PATH}/files/azure_centos_7.py azure_nhc/config/azure_centos_7.py
## Running the hcheck binary
You should never have to run the tool manually, but in the case you want to do so, here is a list of supported parameters the tool accepts
@ -148,7 +196,10 @@ You should never have to run the tool manually, but in the case you want to do s
| --nr | Number of reruns for the set of scripts | --nr 3 |
| --pt | Pattern for custom script detection | --pt .sh |
| --rpath | Path to where the report would be generated | --rpath /tmp/log/report.json |
| --rscript | Path to the script reporting the results back to the portal | --rscript ./send_logs |
| --rscript | Path to the script reporting the results back to the portal | --rscript ./send_logs |
| --python | Path to the Python code that reports errors to App Insights | --python /send_log_appInsights |
| --reframe | Path to the installed reframe executable | --reframe /reframe/bin/reframe |
| --config | Path to the reframe tests config file | --config /reframe/azure_nhc/config/azure_centos_7.py |
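
To illustrate how these parameters combine, here is a minimal sketch that assembles (but does not execute) an hcheck command line similar to the one used in 003_run_reframe.sh. The install directory and report path are placeholder assumptions; the flags are the ones that script passes to hcheck.

```python
import shlex

# Placeholder locations, assumed for illustration only.
install_dir = "/opt/healthcheck"
report_path = "/tmp/log/report.json"

reframe_dir = f"{install_dir}/reframe"
cmd = [
    f"{install_dir}/linux-x64/hcheck",
    "-k", f"{reframe_dir}/azure_nhc/run_level_2",
    "--append",
    "--rpath", report_path,
    "--reframe", f"{reframe_dir}/bin/reframe",
    "--config", f"{reframe_dir}/azure_nhc/config/azure_centos_7.py",
]

# Print the command instead of running it, so it can be inspected first.
print(shlex.join(cmd))
```

Keeping the arguments in a list rather than one string avoids quoting problems if any of the paths contain spaces.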
## Changing the script for reporting errors

Binary file
images/reframe_pattern.png Normal file

Binary file not shown.

Size: 18 KiB

View file

@ -1,221 +0,0 @@
# Copyright 2016-2021 Swiss National Supercomputing Centre (CSCS/ETH Zurich)
# ReFrame Project Developers. See the top-level LICENSE file for details.
#
# SPDX-License-Identifier: BSD-3-Clause
#
# Generic fallback configuration
#
site_configuration = {
'systems': [
{
'name': 'fs_v2',
'descr': 'Azure FV2',
'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
'vm_size': 'F2s_v2',
'hostnames': ['*'],
'modules_system': 'tmod32',
'partitions': [
{
'name': 'hpc',
'scheduler': 'slurm',
'launcher': 'srun',
'max_jobs': 100,
'access': ['-p hpc'],
'environs': ['builtin'],
'prepare_cmds': ['source /etc/profile.d/modules.sh']
}
]
},
{
'name': 'hbrs',
'descr': 'Azure HB',
'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
'vm_size': 'HB60rs',
'hostnames': ['*_hb_*'],
'modules_system': 'tmod32',
'partitions': [
{
'name': 'hb',
'scheduler': 'slurm',
'launcher': 'srun',
'max_jobs': 100,
'access': ['-p hb'],
'environs': ['gnu-azhpc-cos7'],
'prepare_cmds': ['source /etc/profile.d/modules.sh']
}
]
},
{
'name': 'hbrs_v2',
'descr': 'Azure HBv2',
'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
'vm_size': 'HB120rs_v2',
'hostnames': ['*_hbv2_*'],
'modules_system': 'tmod32',
'partitions': [
{
'name': 'hbv2',
'scheduler': 'slurm',
'launcher': 'srun',
'max_jobs': 100,
'access': ['-p hbv2'],
'environs': ['gnu-azhpc-cos7'],
'prepare_cmds': ['source /etc/profile.d/modules.sh']
}
]
},
{
'name': 'hbrs_v3',
'descr': 'Azure HBv3',
'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
'vm_size': 'HB120rs_v3',
'hostnames': ['*_hbv3_*'],
'modules_system': 'tmod32',
'partitions': [
{
'name': 'hbv3',
'scheduler': 'slurm',
'launcher': 'srun',
'max_jobs': 100,
'access': ['-p hbv3'],
'environs': ['gnu-azhpc-cos7'],
'prepare_cmds': ['source /etc/profile.d/modules.sh']
}
]
},
{
'name': 'hcrs',
'descr': 'Azure HC',
'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
'hostnames': ['*_hc_*'],
'modules_system': 'tmod32',
'partitions': [
{
'name': 'default',
'scheduler': 'local',
'launcher': 'local',
'environs': ['gnu-azhpc-cos7'],
'prepare_cmds': ['source /etc/profile.d/modules.sh']
}
]
},
{
'name': 'ndamsr_a100_v4',
'descr': 'Azure NDm v4',
'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
'hostnames': [''],
'modules_system': 'tmod4',
'partitions': [
{
'name': 'gpu',
'scheduler': 'local',
'launcher': 'local',
'environs': ['gnu-azhpc'],
'prepare_cmds': ['source /etc/profile.d/modules.sh']
}
]
},
{
'name': 'ndasr_v4',
'descr': 'Azure ND v4',
'vm_data_file': 'azure_nhc/vm_info/azure_vms_dataset.json',
'hostnames': [''],
'modules_system': 'tmod4',
'partitions': [
{
'name': 'gpu',
'scheduler': 'local',
'launcher': 'local',
'environs': ['gnu-azhpc'],
'prepare_cmds': ['source /etc/profile.d/modules.sh']
}
]
},
{
'name': 'generic',
'descr': 'Generic example system',
'hostnames': ['.*'],
'partitions': [
{
'name': 'default',
'scheduler': 'local',
'launcher': 'local',
'environs': ['builtin'],
'prepare_cmds': ['source /etc/profile.d/modules.sh']
}
]
}
],
'environments': [
{
'name': 'builtin',
'cc': 'cc',
'cxx': '',
'ftn': ''
},
{
'name': 'gnu-azhpc',
'modules': ['gcc-9.2.0', 'mpi/hpcx'],
'cc': 'gcc',
'cxx': 'g++',
'ftn': 'gfortran'
},
{
'name': 'gnu-azhpc-cos7',
'modules': ['gcc-9.2.0', 'mpi/hpcx'],
'cc': 'gcc',
'cxx': 'g++',
'ftn': 'gfortran'
},
{
'name': 'gnu-azhpc-cos8',
'modules': ['gcc-9.2.1', 'mpi/hpcx'],
'cc': 'gcc',
'cxx': 'g++',
'ftn': 'gfortran'
},
{
'name': 'gnu',
'cc': 'gcc',
'cxx': 'g++',
'ftn': 'gfortran'
},
],
'logging': [
{
'handlers': [
{
'type': 'stream',
'name': 'stdout',
'level': 'info',
'format': '%(message)s'
},
{
'type': 'file',
'level': 'debug',
'format': '[%(asctime)s] %(levelname)s: %(check_info)s: %(message)s', # noqa: E501
'append': False
}
],
'handlers_perflog': [
{
'type': 'filelog',
'prefix': '%(check_system)s/%(check_partition)s',
'level': 'info',
'format': (
'%(check_job_completion_time)s|reframe %(version)s|'
'%(check_info)s|jobid=%(check_jobid)s|'
'%(check_perf_var)s=%(check_perf_value)s|'
'ref=%(check_perf_ref)s '
'(l=%(check_perf_lower_thres)s, '
'u=%(check_perf_upper_thres)s)|'
'%(check_perf_unit)s'
),
'append': True
}
]
}
],
}

View file

@ -19,8 +19,6 @@ if ! [[ -f $INSTALL_DIR/reframe/bin/reframe ]]
curl -L -k https://github.com/JonShelley/reframe/tarball/master | tar -xz --strip-components 1
./bootstrap.sh
./bin/reframe -V
cp azure_nhc/config/azure_centos_7.py azure_nhc/config/azure_centos_7_backup.py
cp ${CYCLECLOUD_SPEC_PATH}/files/azure_centos_7.py azure_nhc/config/azure_centos_7.py
else
echo "Warning: Did not install ReFrame (looks like it already has been installed)"
cd reframe
@ -53,5 +51,4 @@ REPORT_PATH=$(jq -r '.report' ${HCHECK_SETTINGS_PATH})
APPLICATIONINSIGHTS_CONNECTION_STRING=$(jq -r '.appinsights.ConnectString' ${HCHECK_SETTINGS_PATH})
INSTRUMENTATION_KEY=$(jq -r '.appinsights.InstrumentationKey' ${HCHECK_SETTINGS_PATH})
$INSTALL_DIR/linux-x64/hcheck -k $INSTALL_DIR/reframe/azure_nhc/run_level_2 --append --rpath $REPORT_PATH --reframe $INSTALL_DIR/reframe/bin/reframe --config $INSTALL_DIR/reframe/azure_nhc/config/${reframe_cfg}
$INSTALL_DIR/linux-x64/hcheck --rpath $REPORT_PATH --fin --appin $INSTRUMENTATION_KEY --rscript $INSTALL_DIR/sbin/send_log
#exit $?
$INSTALL_DIR/linux-x64/hcheck --rpath $REPORT_PATH --fin --appin $INSTRUMENTATION_KEY --rscript $INSTALL_DIR/sbin/send_log

View file

@ -33,5 +33,4 @@ $INSTALL_DIR/linux-x64/hcheck --rpath $REPORT_PATH --fin --appin $INSTRUMENTATIO
# do
# jetpack log --level error "$line";
# done
# exit ${PIPESTATUS[0]}
#exit $?
# exit ${PIPESTATUS[0]}