chezhang
|
0731b03962
|
Fix a bug where the heartbeat thread may get stuck due to a deadlock
|
2020-07-27 16:48:13 +08:00 |
chezhang
|
6130db6eb7
|
Revise some log messages
|
2020-07-27 16:28:25 +08:00 |
chezhang
|
772b2f4012
|
Update version to 2.5.0.0
|
2020-07-17 21:51:02 +08:00 |
zclok010
|
cce1f68b9f
|
Improve GPU instance name readability in metric info by adding GPU name
|
2019-08-30 14:18:33 +08:00 |
zclok010
|
9695b14042
|
Fix an issue where a node with an FQDN host name may not be recognized by the scheduler
|
2019-08-20 11:44:21 +08:00 |
zclok010
|
3dbf5d342d
|
Fix build issue
|
2019-08-20 11:38:18 +08:00 |
zclok010
|
b369a14ff3
|
Move built-in execution filters to https://github.com/Azure-Samples/hpcpack-samples
|
2019-08-01 23:35:55 +08:00 |
zclok010
|
5ae6582161
|
add a missing built-in execution filter in OnTaskStart.sh
|
2019-07-31 15:33:59 +08:00 |
zclok010
|
1a575bccb3
|
suppress frequent warning log messages when the counter file of a Mellanox network driver is invalid
|
2019-07-31 15:32:31 +08:00 |
zclok010
|
dc73a1fb4d
|
git ignore VMExtension/*.zip
|
2019-07-26 00:04:41 +08:00 |
zclok010
|
ba4ca371d7
|
git ignore VMExtension\*.zip
|
2019-07-26 00:01:58 +08:00 |
zclok010
|
f91b5bae05
|
update config sample
|
2019-07-25 23:58:02 +08:00 |
Sunbin Zhu
|
6abe5e32d4
|
Merge branch 'v2' of https://github.com/Azure/hpcpack-linux-agent into v2
|
2019-07-25 23:37:09 +08:00 |
Sunbin Zhu
|
a544594e1e
|
Update hpcnodemanager.py
Add firewall rule to allow port 40002
|
2019-07-25 23:37:02 +08:00 |
zclok010
|
302720eb73
|
revise some version info
|
2019-07-23 12:03:32 +08:00 |
zclok010
|
0de36a6ff9
|
mitigate scheduler pressure when connection is poor by decreasing HTTP reporter retry frequency
|
2019-07-23 12:03:32 +08:00 |
zclok010
|
19fcca1a97
|
Support multiple instances monitoring with instance filter;
Separate network usage monitoring from total usage into usage of individual network instances
|
2019-07-23 12:03:32 +08:00 |
Sunbin Zhu
|
c7c071ffa0
|
Include VM extension code
|
2019-07-19 15:03:46 +08:00 |
zclok010
|
a2ff32a012
|
add comments on the known issue of runaway processes
|
2019-07-04 00:14:05 +08:00 |
zclok010
|
5760fb798b
|
Merge branch 'dockerTask' into v2
|
2019-07-03 14:49:55 +08:00 |
zclok010
|
16c5aaaecb
|
HpcData client location change in execution filter
|
2019-06-19 15:11:19 +08:00 |
zclok010
|
183c4d8ef2
|
update version info
|
2019-06-17 16:01:59 +08:00 |
zclok010
|
9d827bcac1
|
Fix node manager crash caused by an out-of-bounds array write when constructing a monitoring packet with too many data values
|
2019-06-13 19:32:41 +08:00 |
zclok010
|
9ff4b44466
|
docker task improvement
|
2019-06-13 15:31:44 +08:00 |
zclok010
|
a1dd952348
|
Add IB network usage factor
|
2019-05-31 16:59:55 +08:00 |
zclok010
|
ae73ce7624
|
Merge branch 'config' into v2
|
2019-05-27 16:24:51 +08:00 |
zclok010
|
29b2baa192
|
Merge branch 'ibNetwork' into v2
|
2019-05-27 16:22:24 +08:00 |
zclok010
|
43bfbca0c7
|
update sample config file
|
2019-05-23 23:42:45 +08:00 |
zclok010
|
1d3a59d886
|
monitor IB network usage
|
2019-05-23 18:26:14 +08:00 |
FAREAST\chezhang
|
9322ff980d
|
fix code defect
|
2019-05-21 17:54:38 +08:00 |
FAREAST\chezhang
|
332d7f9f3d
|
Merge branch 'dataIO' into v2
|
2019-05-15 14:21:04 +08:00 |
FAREAST\chezhang
|
c6da4b4edf
|
Merge branch 'su' into v2
|
2019-05-15 14:20:55 +08:00 |
FAREAST\chezhang
|
68c9e0cd57
|
Merge branch 'mpiFilter' into v2
|
2019-05-15 14:20:48 +08:00 |
FAREAST\chezhang
|
71d2753c4f
|
Support downloading input files and uploading output files for task with HpcData service
|
2019-05-13 14:52:45 +08:00 |
FAREAST\chezhang
|
418ae90cab
|
replace mpi command in the whole command line instead of just the beginning
|
2019-05-13 14:39:04 +08:00 |
FAREAST\chezhang
|
371138d162
|
Fix a code defect when CCP_SWITCH_USER is set
|
2019-05-10 19:52:42 +08:00 |
zclok010
|
6a7ef893c6
|
git ignore .vscode
|
2019-05-10 12:16:27 +08:00 |
zclok010
|
b130b4fd7f
|
Fix test case error introduced in commit b41e682 (execution filter change)
|
2019-05-09 15:15:48 +08:00 |
zclok010
|
d98eec6f15
|
Add CcpVersion and CustomProperties in register info
|
2019-05-07 10:43:33 +08:00 |
zclok010
|
0d44a14c3e
|
refine code
|
2019-05-06 20:04:10 +08:00 |
zclok010
|
d7a8e4a371
|
tidy log when cleaning up zombie tasks
|
2019-05-05 14:57:32 +08:00 |
zclok010
|
05138060e7
|
Fix a bug where processes in a task are not actually terminated after task cancellation if cgroup is not enabled
|
2019-04-30 20:26:31 +08:00 |
zclok010
|
844bb394b9
|
Fix a bug where zombie task cleanup would fail when nodemanager starts
|
2019-04-30 20:24:27 +08:00 |
zclok010
|
42b467d5af
|
tidy some code
|
2019-04-30 15:12:30 +08:00 |
FAREAST\chezhang
|
6483b16b0f
|
Update tasks' statistics before sending heartbeat to enable showing statistics when task is running
|
2019-04-26 10:19:23 +08:00 |
zclok010
|
b41e682ab3
|
Merge branch 'executionfilter' into v2
|
2019-04-16 15:42:04 +08:00 |
zclok010
|
2750349ba1
|
Add built-in execution filters to adjust task affinity in terms of core distribution across NUMA nodes and to modify the command in preparation for MPI tasks
|
2019-04-16 15:29:26 +08:00 |
zclok010
|
8deeffa1f0
|
get home dir by tilde expansion
|
2019-04-08 15:26:17 +08:00 |
zclok010
|
2fa7452e49
|
fix a bug where memory is limited to the first NUMA node when using cgroup
|
2019-03-28 17:26:35 +08:00 |
zclok010
|
4e0e4a2d1e
|
change the default working directory to home
|
2019-03-28 17:02:54 +08:00 |