Analyzer: Add Feature - Add multi-rules feature for data diagnosis (#289)

**Description**
Add multi-rules feature for data diagnosis to support combined checks across multiple rules.

**Major Revision**
- revise rule design to support combined checks across multiple rules
- update related code and tests
Committed by Yuting Jiang on 2022-02-20 16:59:38 +08:00 (via GitHub)
Parent: 1f48268bf5
Commit: 97ed12f97f
7 changed files: 369 additions and 96 deletions


@@ -54,6 +54,7 @@ superbench:
${rule_name}:
function: string
criteria: string
store: (optional)bool
categories: string
metrics:
- ${benchmark_name}/regex
@@ -108,11 +109,29 @@ superbench:
- bert_models/pytorch-bert-base/throughput_train_float(32|16)
- bert_models/pytorch-bert-large/throughput_train_float(32|16)
- gpt_models/pytorch-gpt-large/throughput_train_float(32|16)
rule4:
function: variance
criteria: "lambda x:x<-0.05"
store: True
categories: CNN
metrics:
- resnet_models/pytorch-resnet.*/throughput_train_.*
rule5:
function: variance
criteria: "lambda x:x<-0.05"
store: True
categories: CNN
metrics:
- vgg_models/pytorch-vgg.*/throughput_train_.*
rule6:
function: multi_rules
criteria: 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False'
categories: CNN
```
This rule file describes the rules used for data diagnosis.
They are organized by rule name, and each rule mainly includes several elements:
#### `metrics`
@@ -124,21 +143,29 @@ The categories belong to this rule.
#### `criteria`
The criterion used for this rule, which indicates how to compare the data with the baseline value for each metric. The format should be a lambda function supported by Python.
#### `store`
True if the current rule is not used on its own to flag a defective machine but its results will be consumed by subsequent rules; False (default) if this rule is used to label the defective machine directly.
#### `function`
The function used for this rule.
3 types of rules are supported currently:
- `variance`: the rule checks whether the variance between the raw data and the baseline violates the criteria, where variance = (raw data - baseline) / baseline (see the sketch after this list).
For example, if the 'criteria' is `lambda x:x>0.05`, the rule is that if the variance is larger than 5%, the metric should be considered defective.
- `value`: the rule checks whether the raw data violates the criteria.
For example, if the 'criteria' is `lambda x:x>0`, the rule is that if the raw data is larger than 0, the metric should be considered defective.
- `multi_rules`: the rule checks whether the combined results of multiple previous rules and their metrics violate the criteria.
For example, if the 'criteria' is 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False', the rule is that if the sum of labeled metrics in rule4 and rule5 is greater than or equal to 2, the node should be considered defective.
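To make the variance arithmetic concrete, here is a minimal Python sketch; the numbers are illustrative, not taken from a real run:
```python
# Illustrative values for a single metric, e.g. mem-bw/D2H_Mem_BW.
baseline = 24.3                      # baseline value from the baseline json file
val = 10.0                           # raw data reported by a node
var = (val - baseline) / baseline    # -0.5885, i.e. -58.85%
criteria = eval('lambda x:x<-0.05')  # same lambda form as in the rule file above
assert criteria(var)                 # variance below -5% -> the metric is labeled
```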
`Tips`: you must include a default rule for `${benchmark_name}/return_code`, as in the example above, which is used to identify failed tests; a sketch of such a rule follows.
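A minimal sketch of such a default rule, assuming the `value` function on the return-code metrics; the rule name and category here are illustrative:
```yaml
failure_rule:
  function: value
  criteria: 'lambda x:x>0'
  categories: FailedTest
  metrics:
    - kernel-launch/return_code
    - mem-bw/return_code
```
A non-zero return code then labels the test as failed on that node.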


@@ -19,7 +19,7 @@ class DataDiagnosis():
def __init__(self):
"""Init function."""
self._sb_rules = {}
self._benchmark_metrics_dict = {}
def _get_metrics_by_benchmarks(self, metrics_list):
"""Get mappings of benchmarks:metrics of metrics_list.
@@ -65,10 +65,13 @@ class DataDiagnosis():
logger.log_and_raise(exception=Exception, msg='invalid criteria format')
if 'categories' not in rule:
logger.log_and_raise(exception=Exception, msg='{} lack of category'.format(name))
if rule['function'] != 'multi_rules':
    if 'metrics' not in rule:
        logger.log_and_raise(exception=Exception, msg='{} lack of metrics'.format(name))
    if isinstance(rule['metrics'], str):
        rule['metrics'] = [rule['metrics']]
if 'store' in rule and not isinstance(rule['store'], bool):
logger.log_and_raise(exception=Exception, msg='{} store must be bool type'.format(name))
return rule
def _get_baseline_of_metric(self, baseline, metric):
@@ -93,53 +96,67 @@ class DataDiagnosis():
logger.warning('DataDiagnosis: get baseline - {} baseline not found'.format(metric))
return -1
def __get_metrics_and_baseline(self, rule, benchmark_rules, baseline):
    """Get metrics with baseline in the rule.
    Parse metric regex in the rule, and store the (baseline, metric) pair
    in _sb_rules[rule]['metrics'] and the metric in _enable_metrics.
    Args:
        rule (str): the name of the rule
        benchmark_rules (dict): the dict of rules
        baseline (dict): the dict of baseline of metrics
    """
if self._sb_rules[rule]['function'] == 'multi_rules':
return
metrics_in_rule = benchmark_rules[rule]['metrics']
benchmark_metrics_dict_in_rule = self._get_metrics_by_benchmarks(metrics_in_rule)
for benchmark_name in benchmark_metrics_dict_in_rule:
if benchmark_name not in self._benchmark_metrics_dict:
logger.warning('DataDiagnosis: get criteria failed - {}'.format(benchmark_name))
continue
# get rules and criteria for each metric
for metric in self._benchmark_metrics_dict[benchmark_name]:
# metric full name in baseline
if metric in metrics_in_rule:
self._sb_rules[rule]['metrics'][metric] = self._get_baseline_of_metric(baseline, metric)
self._enable_metrics.add(metric)
continue
# metric full name not in baseline, use regex to match
for metric_regex in benchmark_metrics_dict_in_rule[benchmark_name]:
if re.search(metric_regex, metric):
self._sb_rules[rule]['metrics'][metric] = self._get_baseline_of_metric(baseline, metric)
self._enable_metrics.add(metric)
def _parse_rules_and_baseline(self, rules, baseline):
"""Parse and merge rules and baseline read from file.
Args:
rules (dict): rules from rule yaml file
baseline (dict): baseline of metrics from baseline json file
Returns:
bool: return True if successfully get the criteria for all rules, otherwise False.
"""
try:
if not rules:
logger.error('DataDiagnosis: get criteria failed')
return False
self._sb_rules = {}
self._enable_metrics = set()
benchmark_rules = rules['superbench']['rules']
for rule in benchmark_rules:
benchmark_rules[rule] = self._check_rules(benchmark_rules[rule], rule)
self._sb_rules[rule] = {}
self._sb_rules[rule]['name'] = rule
self._sb_rules[rule]['function'] = benchmark_rules[rule]['function']
self._sb_rules[rule]['store'] = True if 'store' in benchmark_rules[
rule] and benchmark_rules[rule]['store'] is True else False
self._sb_rules[rule]['criteria'] = benchmark_rules[rule]['criteria']
self._sb_rules[rule]['categories'] = benchmark_rules[rule]['categories']
self._sb_rules[rule]['metrics'] = {}
    self.__get_metrics_and_baseline(rule, benchmark_rules, baseline)
self._enable_metrics = sorted(list(self._enable_metrics))
except Exception as e:
logger.error('DataDiagnosis: get criteria failed - {}'.format(str(e)))
return False
@@ -166,15 +183,22 @@ class DataDiagnosis():
issue_label = False
details = []
categories = set()
violation = {}
summary_data_row = pd.Series(index=self._enable_metrics, name=node, dtype=float)
# Check each rule
for rule in self._sb_rules:
# Get rule op function and run the rule
function_name = self._sb_rules[rule]['function']
rule_op = RuleOp.get_rule_func(DiagnosisRuleType(function_name))
violated_num = 0
if rule_op == RuleOp.multi_rules:
    violated_num = rule_op(self._sb_rules[rule], details, categories, violation)
else:
    violated_num = rule_op(data_row, self._sb_rules[rule], summary_data_row, details, categories)
# label the node as defective one
if self._sb_rules[rule]['store']:
    violation[rule] = violated_num
elif violated_num:
    issue_label = True
if issue_label:
# Add category information
@@ -210,7 +234,9 @@ class DataDiagnosis():
logger.error('DataDiagnosis: empty raw data')
return data_not_accept_df, label_df
# get criteria
rules = file_handler.read_rules(rule_file)
baseline = file_handler.read_baseline(baseline_file)
if not self._parse_rules_and_baseline(rules, baseline):
return data_not_accept_df, label_df
# run diagnosis rules for each node
for node in self._raw_data_df.index:
@@ -242,7 +268,7 @@ class DataDiagnosis():
"""
try:
self._raw_data_df = file_handler.read_raw_data(raw_data_file)
self._benchmark_metrics_dict = self._get_metrics_by_benchmarks(list(self._raw_data_df.columns))
logger.info('DataDiagnosis: Begin to process {} nodes'.format(len(self._raw_data_df)))
data_not_accept_df, label_df = self.run_diagnosis_rules(rule_file, baseline_file)
logger.info('DataDiagnosis: Process finished')
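Putting the pieces together, a minimal sketch of the diagnosis flow over raw data, mirroring the unit tests below; the file paths are hypothetical, and the imports assume the package layout used by the tests:
```python
from superbench.analyzer import DataDiagnosis, file_handler

diag = DataDiagnosis()
# Hypothetical input files in the formats shown above.
diag._raw_data_df = file_handler.read_raw_data('results-summary.jsonl')
diag._benchmark_metrics_dict = diag._get_metrics_by_benchmarks(list(diag._raw_data_df.columns))
rules = file_handler.read_rules('rules.yaml')
baseline = file_handler.read_baseline('baseline.json')
if diag._parse_rules_and_baseline(rules, baseline):
    for node in diag._raw_data_df.index:
        details_row, summary_data_row = diag._run_diagnosis_rules_for_single_node(node)
        if details_row:  # non-empty details -> the node violated some non-store rule
            print(node, details_row)
```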


@@ -16,6 +16,7 @@ class DiagnosisRuleType(Enum):
VARIANCE = 'variance'
VALUE = 'value'
MULTI_RULES = 'multi_rules'
class RuleOp:
@@ -54,37 +55,77 @@ class RuleOp:
return None
@staticmethod
def check_criterion_with_a_value(rule):
"""Check if the criterion is valid with a numeric variable and return bool type.
Args:
rule (dict): rule including function, criteria, metrics with their baseline values and categories
"""
# parse criteria and check if valid
if not isinstance(eval(rule['criteria'])(0), bool):
logger.log_and_raise(exception=Exception, msg='invalid criteria format')
@staticmethod
def miss_test(metric, rule, data_row, details, categories):
"""Check if the metric in the rule missed test and if so add details and categories.
Args:
metric (str): the name of the metric
data_row (pd.Series): raw data of the metrics
rule (dict): rule including function, criteria, metrics with their baseline values and categories
details (list): details about violated rules and related data
categories (set): categories of violated rules
Returns:
bool: if the metric in the rule missed test, return True, otherwise return False
"""
# metric not in raw_data or the value is none, miss test
if metric not in data_row or pd.isna(data_row[metric]):
RuleOp.add_categories_and_details(metric + '_miss', rule['categories'], details, categories)
return True
return False
@staticmethod
def add_categories_and_details(detail, category, details, categories):
"""Add details and categories.
Args:
detail (str): violated rule and related data
category (str): category of violated rule
details (list): list of details about violated rules and related data
categories (set): set of categories of violated rules
"""
details.append(detail)
categories.add(category)
@staticmethod
def variance(data_row, rule, summary_data_row, details, categories):
"""Rule op function of variance.
Each metric in the rule will calculate the variance ((val - baseline) / baseline),
and use the criteria in the rule to determine whether the metric's variance meets the criteria;
if any metric meets the criteria, the rule is not passed.
Args:
data_row (pd.Series): raw data of the metrics
rule (dict): rule including function, criteria, metrics with their baseline values and categories
summary_data_row (pd.Series): results of the metrics processed after the function
details (list): details about violated rules and related data
categories (set): categories of violated rules
Returns:
number: the number of the metrics that violate the rule if the rule is not passed, otherwise 0
"""
violated_metric_num = 0
RuleOp.check_criterion_with_a_value(rule)
# every metric should pass the rule
for metric in rule['metrics']:
# metric not in raw_data or the value is none, miss test
if RuleOp.miss_test(metric, rule, data_row, details, categories):
    violated_metric_num += 1
else:
    violate_metric = False
# check if metric pass the rule
val = data_row[metric]
baseline = rule['metrics'][metric]
@@ -95,13 +136,12 @@ class RuleOp:
violate_metric = eval(rule['criteria'])(var)
# add issued details and categories
if violate_metric:
    violated_metric_num += 1
    info = '(B/L: {:.4f} VAL: {:.4f} VAR: {:.2f}% Rule:{})'.format(
        baseline, val, var * 100, rule['criteria']
    )
    RuleOp.add_categories_and_details(metric + info, rule['categories'], details, categories)
return violated_metric_num
@staticmethod
def value(data_row, rule, summary_data_row, details, categories):
@@ -109,43 +149,63 @@ class RuleOp:
Each metric in the rule will use the criteria in the rule
to determine whether the metric's value meets the criteria;
if any metric meets the criteria, the rule is not passed.
Args:
data_row (pd.Series): raw data of the metrics
rule (dict): rule including function, criteria, metrics with their baseline values and categories
summary_data_row (pd.Series): results of the metrics processed after the function
details (list): details about violated rules and related data
categories (set): categories of violated rules
Returns:
number: the number of the metrics that violate the rule if the rule is not passed, otherwise 0
"""
violated_metric_num = 0
RuleOp.check_criterion_with_a_value(rule)
# every metric should pass the rule
for metric in rule['metrics']:
# metric not in raw_data or the value is none, miss test
if RuleOp.miss_test(metric, rule, data_row, details, categories):
    violated_metric_num += 1
else:
    violate_metric = False
# check if metric pass the rule
val = data_row[metric]
summary_data_row[metric] = val
violate_metric = eval(rule['criteria'])(val)
# add issued details and categories
if violate_metric:
    violated_metric_num += 1
    info = '(VAL: {:.4f} Rule:{})'.format(val, rule['criteria'])
    RuleOp.add_categories_and_details(metric + info, rule['categories'], details, categories)
return violated_metric_num
@staticmethod
def multi_rules(rule, details, categories, violation):
"""Rule op function of multi_rules.
The criteria in this rule will use the combined results of multiple previous rules and their metrics,
which have been stored in advance, to determine whether this rule is passed.
Args:
rule (dict): rule including function, criteria, metrics with their baseline values and categories
details (list): details about violated rules and related data
categories (set): categories of violated rules
violation (dict): the number of the metrics that violate the rules
Returns:
number: 0 if the rule is passed, otherwise 1
"""
violated = eval(rule['criteria'])(violation)
if not isinstance(violated, bool):
logger.log_and_raise(exception=Exception, msg='invalid upper criteria format')
if violated:
info = '{}:{}'.format(rule['name'], rule['criteria'])
RuleOp.add_categories_and_details(info, rule['categories'], details, categories)
return 1 if violated else 0
RuleOp.add_rule_func(DiagnosisRuleType.VARIANCE)(RuleOp.variance)
RuleOp.add_rule_func(DiagnosisRuleType.VALUE)(RuleOp.value)
RuleOp.add_rule_func(DiagnosisRuleType.MULTI_RULES)(RuleOp.multi_rules)
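For illustration, a minimal sketch of how `multi_rules` consumes the violation counts stored by earlier `store: True` rules; the rule dict mirrors rule6 from the documentation above, and the import assumes the package layout used by the tests:
```python
from superbench.analyzer import RuleOp

# Suppose rule4 and rule5 ran with store: True and each labeled one metric.
violation = {'rule4': 1, 'rule5': 1}
rule = {
    'name': 'rule6',
    'categories': 'CNN',
    'criteria': 'lambda label:True if label["rule4"]+label["rule5"]>=2 else False',
    'function': 'multi_rules',
}
details, categories = [], set()
# 1 + 1 >= 2, so the combined rule is violated and multi_rules returns 1.
assert RuleOp.multi_rules(rule, details, categories, violation) == 1
assert categories == {'CNN'}
assert details == ['rule6:lambda label:True if label["rule4"]+label["rule5"]>=2 else False']
```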


@@ -1,9 +1,9 @@
{
    "kernel-launch/event_overhead": 0.00596,
    "kernel-launch/wall_overhead": 0.01026,
    "kernel-launch/return_code": 0,
    "mem-bw/H2D_Mem_BW": 25.6,
    "mem-bw/D2H_Mem_BW": 24.3,
    "mem-bw/D2D_Mem_BW": 1118.0,
    "mem-bw/return_code": 0
}


@@ -39,16 +39,16 @@ class TestDataDiagnosis(unittest.TestCase):
test_baseline_file = str(self.parent_path / 'test_baseline.json')
diag1 = DataDiagnosis()
diag1._raw_data_df = file_handler.read_raw_data(test_raw_data)
diag1._benchmark_metrics_dict = diag1._get_metrics_by_benchmarks(list(diag1._raw_data_df))
assert (len(diag1._raw_data_df) == 3)
# Negative case
test_raw_data_fake = str(self.parent_path / 'test_results_fake.jsonl')
test_rule_file_fake = str(self.parent_path / 'test_rules_fake.yaml')
diag2 = DataDiagnosis()
diag2._raw_data_df = file_handler.read_raw_data(test_raw_data_fake)
diag2._benchmark_metrics_dict = diag2._get_metrics_by_benchmarks(list(diag2._raw_data_df))
assert (len(diag2._raw_data_df) == 0)
assert (len(diag2._benchmark_metrics_dict) == 0)
metric_list = [
'gpu_temperature', 'gpu_power_limit', 'gemm-flops/FP64',
'bert_models/pytorch-bert-base/steptime_train_float32'
@@ -124,21 +124,24 @@ class TestDataDiagnosis(unittest.TestCase):
assert (diag1._get_baseline_of_metric(baseline, 'kernel-launch/event_overhead:0') == 0.00596)
assert (diag1._get_baseline_of_metric(baseline, 'kernel-launch/return_code') == 0)
assert (diag1._get_baseline_of_metric(baseline, 'mem-bw/H2D:0') == -1)
# Test - _parse_rules_and_baseline
# Negative case
fake_rules = file_handler.read_rules(test_rule_file_fake)
baseline = file_handler.read_baseline(test_baseline_file)
assert (diag2._parse_rules_and_baseline(fake_rules, baseline) is False)
diag2 = DataDiagnosis()
diag2._raw_data_df = file_handler.read_raw_data(test_raw_data)
diag2._benchmark_metrics_dict = diag2._get_metrics_by_benchmarks(list(diag2._raw_data_df))
p = Path(test_rule_file)
with p.open() as f:
rules = yaml.load(f, Loader=yaml.SafeLoader)
rules['superbench']['rules']['fake'] = false_rules[0]
with open(test_rule_file_fake, 'w') as f:
yaml.dump(rules, f)
assert (diag1._parse_rules_and_baseline(fake_rules, baseline) is False)
# Positive case
rules = file_handler.read_rules(test_rule_file)
assert (diag1._parse_rules_and_baseline(rules, baseline))
# Test - _run_diagnosis_rules_for_single_node
(details_row, summary_data_row) = diag1._run_diagnosis_rules_for_single_node('sb-validation-01')
assert (details_row)
@@ -211,3 +214,80 @@ class TestDataDiagnosis(unittest.TestCase):
with Path(expect_result_file).open() as f:
expect_result = f.read()
assert (data_not_accept_read_from_json == expect_result)
def test_multi_rules(self):
"""Test multi rules check feature."""
diag1 = DataDiagnosis()
# test _check_rules
false_rules = [
{
'criteria': 'lambda x:x>0',
'categories': 'KernelLaunch',
'store': 'true',
'metrics': ['kernel-launch/event_overhead:\\d+']
}
]
metric = 'kernel-launch/event_overhead:0'
for rules in false_rules:
self.assertRaises(Exception, diag1._check_rules, rules, metric)
# Positive case
true_rules = [
{
'categories': 'KernelLaunch',
'criteria': 'lambda x:x>0.05',
'store': True,
'function': 'variance',
'metrics': ['kernel-launch/event_overhead:\\d+']
}, {
'categories': 'CNN',
'function': 'multi_rules',
'criteria': 'lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
}
]
for rules in true_rules:
assert (diag1._check_rules(rules, metric))
# test _run_diagnosis_rules_for_single_node
rules = {
'superbench': {
'rules': {
'rule1': {
'categories': 'CNN',
'criteria': 'lambda x:x<-0.5',
'store': True,
'function': 'variance',
'metrics': ['mem-bw/D2H_Mem_BW']
},
'rule2': {
'categories': 'CNN',
'criteria': 'lambda x:x<-0.5',
'function': 'variance',
'store': True,
'metrics': ['kernel-launch/wall_overhead']
},
'rule3': {
'categories': 'CNN',
'function': 'multi_rules',
'criteria': 'lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
}
}
}
}
baseline = {
'kernel-launch/wall_overhead': 0.01026,
'mem-bw/D2H_Mem_BW': 24.3,
}
data = {'kernel-launch/wall_overhead': [0.005, 0.005], 'mem-bw/D2H_Mem_BW': [25, 10]}
diag1._raw_data_df = pd.DataFrame(data, index=['sb-validation-04', 'sb-validation-05'])
diag1._benchmark_metrics_dict = diag1._get_metrics_by_benchmarks(list(diag1._raw_data_df.columns))
diag1._parse_rules_and_baseline(rules, baseline)
(details_row, summary_data_row) = diag1._run_diagnosis_rules_for_single_node('sb-validation-04')
assert (not details_row)
(details_row, summary_data_row) = diag1._run_diagnosis_rules_for_single_node('sb-validation-05')
assert (details_row)
assert ('CNN' in details_row[0])
assert (
details_row[1] == 'kernel-launch/wall_overhead(B/L: 0.0103 VAL: 0.0050 VAR: -51.27% Rule:lambda x:x<-0.5),'
+ 'mem-bw/D2H_Mem_BW(B/L: 24.3000 VAL: 10.0000 VAR: -58.85% Rule:lambda x:x<-0.5),' +
'rule3:lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
)

File diff suppressed because one or more lines are too long


@@ -99,21 +99,101 @@ class TestRuleOp(unittest.TestCase):
# variance
data = {'kernel-launch/event_overhead:0': 3.1, 'kernel-launch/event_overhead:1': 2}
data_row = pd.Series(data)
violated_metric_num = rule_op(data_row, true_baselines[0], summary_data_row, details, categories)
assert (violated_metric_num == 1)
assert (categories == {'KernelLaunch'})
assert (details == ['kernel-launch/event_overhead:0(B/L: 2.0000 VAL: 3.1000 VAR: 55.00% Rule:lambda x:x>0.5)'])
data = {'kernel-launch/event_overhead:0': 1.5, 'kernel-launch/event_overhead:1': 1.5}
data_row = pd.Series(data)
violated_metric_num = rule_op(data_row, true_baselines[1], summary_data_row, details, categories)
assert (violated_metric_num == 0)
assert (categories == {'KernelLaunch'})
# value
rule_op = RuleOp.get_rule_func(DiagnosisRuleType.VALUE)
violated_metric_num = rule_op(data_row, true_baselines[2], summary_data_row, details, categories)
assert (categories == {'KernelLaunch', 'KernelLaunch2'})
assert ('kernel-launch/event_overhead:0(VAL: 1.5000 Rule:lambda x:x>0)' in details)
assert ('kernel-launch/event_overhead:0(B/L: 2.0000 VAL: 3.1000 VAR: 55.00% Rule:lambda x:x>0.5)' in details)
def test_multi_rules_op(self):
"""multi-rule check."""
details = []
categories = set()
data_row = pd.Series()
summary_data_row = pd.Series(index=['kernel-launch/event_overhead:0'], dtype=float)
false_baselines = [
{
'categories': 'KernelLaunch',
'criteria': 'lambda label:True if label["rule2"]>=2 else False',
'function': 'multi_rules'
}
]
label = {}
for rule in false_baselines:
self.assertRaises(Exception, RuleOp.multi_rules, rule, details, categories, label)
true_baselines = [
{
'name': 'rule1',
'categories': 'CNN',
'criteria': 'lambda x:x<-0.5',
'store': True,
'function': 'variance',
'metrics': {
'resnet_models/pytorch-resnet152/throughput_train_float32': 300,
}
}, {
'name': 'rule2',
'categories': 'CNN',
'criteria': 'lambda x:x<-0.5',
'store': True,
'function': 'variance',
'metrics': {
'vgg_models/pytorch-vgg11/throughput_train_float32': 300
}
}, {
'name': 'rule3',
'categories': 'KernelLaunch',
'criteria': 'lambda label:True if label["rule1"]+label["rule2"]>=2 else False',
'store': False,
'function': 'multi_rules'
}
]
# label["rule1"]+label["rule2"]=1, rule3 pass
data = {
'resnet_models/pytorch-resnet152/throughput_train_float32': 300,
'vgg_models/pytorch-vgg11/throughput_train_float32': 100
}
data_row = pd.Series(data)
rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[0]['function']))
label[true_baselines[0]['name']] = rule_op(data_row, true_baselines[0], summary_data_row, details, categories)
label[true_baselines[1]['name']] = rule_op(data_row, true_baselines[1], summary_data_row, details, categories)
rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[2]['function']))
violated_metric_num = rule_op(true_baselines[2], details, categories, label)
assert (violated_metric_num == 0)
# label["rule1"]+label["rule2"]=2, rule3 not pass
data = {
'resnet_models/pytorch-resnet152/throughput_train_float32': 100,
'vgg_models/pytorch-vgg11/throughput_train_float32': 100
}
data_row = pd.Series(data)
details = []
categories = set()
rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[0]['function']))
label[true_baselines[0]['name']] = rule_op(data_row, true_baselines[0], summary_data_row, details, categories)
label[true_baselines[1]['name']] = rule_op(data_row, true_baselines[1], summary_data_row, details, categories)
rule_op = RuleOp.get_rule_func(DiagnosisRuleType(true_baselines[2]['function']))
violated_metric_num = rule_op(true_baselines[2], details, categories, label)
assert (violated_metric_num)
assert ('CNN' in categories)
assert (
details == [
'resnet_models/pytorch-resnet152/throughput_train_float32' +
'(B/L: 300.0000 VAL: 100.0000 VAR: -66.67% Rule:lambda x:x<-0.5)',
'vgg_models/pytorch-vgg11/throughput_train_float32' +
'(B/L: 300.0000 VAL: 100.0000 VAR: -66.67% Rule:lambda x:x<-0.5)',
'rule3:lambda label:True if label["rule1"]+label["rule2"]>=2 else False'
]
)