Adding readme for health monitor
This commit is contained in:
Sourav Gupta 2022-09-12 11:11:51 +05:30 коммит произвёл GitHub
Родитель 35d4db0a22
Коммит 43e88d02f5
Не найден ключ, соответствующий данной подписи
Идентификатор ключа GPG: 4AEE18F83AFDEB23
5 изменённых файлов: 103 добавлений и 4 удалений

Просмотреть файл

@ -167,7 +167,7 @@ func buildCliParamForMonitor() []string {
case hmcommon.NetworkProfiler:
cliParams = append(cliParams, "--no-network-profiler")
case hmcommon.FileCacheMon:
cliParams = append(cliParams, "--no-cache-monitor")
cliParams = append(cliParams, "--no-file-cache-monitor")
default:
log.Debug("health-monitor::buildCliParamForMonitor: Invalid health monitor option %v", v)
}

Просмотреть файл

@ -128,7 +128,7 @@ health_monitor:
# list of monitors to be disabled
monitor-disable-list:
- blobfuse_stats <Disable blobfuse2 stats polling>
- cache_monitor <Disable file cache directory monitor>
- file_cache_monitor <Disable file cache directory monitor>
- cpu_profiler <Disable CPU monitoring on blobfuse2 process>
- memory_profiler <Disable memory monitoring on blobfuse2 process>
- network_profiler <Disable network monitoring on blobfuse2 process>

Просмотреть файл

@ -0,0 +1,99 @@
# Blobfuse2 Health Monitor (Preview)
## About
Blobfuse2 Health Monitor is a tool which will help in monitoring Blobfuse2 mounts. It supports the following types of monitors:
1. **Blobfuse2 Stats Monitor:** Monitor the different statistics of blobfuse2 components like,
- Total bytes uploaded and downloaded via blobfuse2
- Events like create, delete, rename, synchronize, truncate, etc. on files or directories in the mounted directory
- Progress of uploads or downloads of large files to/from Azure Storage
- Keep track of number of calls that were made to Azure Storage for operations like create, delete, rename, chmod, etc. in the mounted directory
- Total number of open handles on files
- Number of times an open file request was served from the file cache or downloaded from the Azure Storage
2. **CPU and Memory Monitor:** Monitor the CPU and memory usage of the Blobfuse2 process associated with the mount
3. **File Cache Monitor:** Monitor the file cache directory specified while mounting. This monitor does the following,
- Monitor the different events like create, delete, rename, chmod, etc. of files and directories in the cache
- Keep track of the cache consumption with respect to the cache size specified during mounting
> **Note:** Health Monitor runs as a separate process where one health monitor process is associated with monitoring one blobfuse2 mounted directory.
## Enable Health Monitor
The different configuration options for the health monitor are,
- `enable-monitoring: true|false`: Boolean parameter to enable health monitor. By default it is disabled
- `stats-poll-interval-sec: <TIME IN SECONDS>`: Blobfuse2 stats polling interval (in sec). Default is 10 seconds
- `process-monitor-interval-sec: <TIME IN SECONDS>`: CPU and memory usage polling interval (in sec). Default is 30 sec
- `output-path: <PATH>`: Path where health monitor will generate its output file. It takes the current directory as default, if not specified. Output file name will be `monitor_<pid>.json`
- `monitor-disable-list: <LIST OF MONITORS>`: List of monitors to be disabled. To disable a monitor, add its corresponding name in the list
- `blobfuse_stats` - Disable blobfuse2 stats polling
- `cpu_profiler` - Disable CPU monitoring on blobfuse2 process
- `memory_profiler` - Disable memory monitoring on blobfuse2 process
- `file_cache_monitor` - Disable file cache directory monitor
### Sample Config
Add the following section to your blobfuse2 config file. Here file cache and memory monitors are disabled.
```yaml
health_monitor:
enable-monitoring: true
stats-poll-interval-sec: 10
process-monitor-interval-sec: 30
output-path: outputReportsPath
monitor-disable-list:
- file_cache_monitor
- memory_profiler
```
## Output Reports
Health monitor will store its output reports in the path specified in the `output-path` config option. If this option is not specified, it takes the current directory as default. It stores the last 100MB of monitor data in 10 different files named as `monitor_<pid>_<index>.json` where `monitor_<pid>.json`(Zeroth index) is latest and `monitor_<pid>_9.json` is the oldest output file.
### Sample Output
```
{
"Timestamp": "t1",
"CPUUsage": "value in %",
"MemoryUsage": "value in bytes",
"BlobfuseStats": [
{
"componentName": "azstorage",
"value": {
"Bytes Downloaded": value in bytes,
"Bytes Uploaded": value in bytes,
"Chmod": count of chmod calls,
"StreamDir": count of stream dir calls
}
},
{
"componentName": "file_cache",
"value": {
"Cache Usage": "value in MB",
"Usage Percent": "value in %",
"Files Downloaded": count,
"Files served from cache": count
}
}
],
"FileCache": [
{
"cacheEvent": "CREATE",
"path": "filePath",
"isDir": false,
"cacheSize": value in bytes,
"cacheConsumed": "value in %",
"cacheFilesCount": count of files in cache,
"evictedFilesCount": count of files evicted from cache,
"value": {
"FileSize": "value in bytes"
}
}
]
}
```

Просмотреть файл

@ -39,7 +39,7 @@ import (
const (
BlobfuseStats = "blobfuse_stats"
FileCacheMon = "cache_monitor"
FileCacheMon = "file_cache_monitor"
CpuProfiler = "cpu_profiler"
MemoryProfiler = "memory_profiler"
NetworkProfiler = "network_profiler"

Просмотреть файл

@ -154,7 +154,7 @@ func init() {
flag.BoolVar(&hmcommon.NoCpuProf, "no-cpu-profiler", false, "Disable CPU monitoring on blobfuse2 process")
flag.BoolVar(&hmcommon.NoMemProf, "no-memory-profiler", false, "Disable memory monitoring on blobfuse2 process")
flag.BoolVar(&hmcommon.NoNetProf, "no-network-profiler", false, "Disable network monitoring on blobfuse2 process")
flag.BoolVar(&hmcommon.NoFileCacheMon, "no-cache-monitor", false, "Disable file cache directory monitor")
flag.BoolVar(&hmcommon.NoFileCacheMon, "no-file-cache-monitor", false, "Disable file cache directory monitor")
flag.StringVar(&hmcommon.TempCachePath, "cache-path", "", "path to local disk cache")
flag.Float64Var(&hmcommon.MaxCacheSize, "max-size-mb", 0, "maximum cache size allowed. Default - 0 (unlimited)")