mirror of https://github.com/microsoft/LightGBM.git
[python][docs] more detailed docs for trees_to_dataframe(), create_tree_digraph(), plot_tree() (#3618)
* [python] more detailed docs for trees_to_dataframe(), create_tree_digraph(), plot_tree()
* fixing warnings
* fix warnings
* undo unnecessary space
* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* single line, better weight descriptions
* Apply suggestions from code review

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

* column names
* Update python-package/lightgbm/plotting.py

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Parent: f38f118ce2
Commit: eb035013c3
@@ -2202,6 +2202,24 @@ class Booster(object):
def trees_to_dataframe(self):
"""Parse the fitted model and return it in an easy-to-read pandas DataFrame.

The returned DataFrame has the following columns.

- ``tree_index`` : int64, which tree a node belongs to. 0-based, so a value of ``6``, for example, means "this node is in the 7th tree".
- ``node_depth`` : int64, how far a node is from the root of the tree. The root node has a value of ``1``, its direct children are ``2``, etc.
- ``node_index`` : string, unique identifier for a node.
- ``left_child`` : string, ``node_index`` of the child node to the left of a split. ``None`` for leaf nodes.
- ``right_child`` : string, ``node_index`` of the child node to the right of a split. ``None`` for leaf nodes.
- ``parent_index`` : string, ``node_index`` of this node's parent. ``None`` for the root node.
- ``split_feature`` : string, name of the feature used for splitting. ``None`` for leaf nodes.
- ``split_gain`` : float64, gain from adding this split to the tree. ``NaN`` for leaf nodes.
- ``threshold`` : float64, value of the feature used to decide which side of the split a record goes down. ``NaN`` for leaf nodes.
- ``decision_type`` : string, logical operator describing how to compare a value to ``threshold``. For example, ``split_feature = "Column_10", threshold = 15, decision_type = "<="`` means that records where ``Column_10 <= 15`` follow the left side of the split, and all other records follow the right side. ``None`` for leaf nodes.
- ``missing_direction`` : string, split direction that missing values should go to. ``None`` for leaf nodes.
- ``missing_type`` : string, describes what types of values are treated as missing.
- ``value`` : float64, predicted value for this leaf node, multiplied by the learning rate.
- ``weight`` : float64 or int64, sum of the hessian (second-order derivative of the objective) over all observations that fall into this node.
- ``count`` : int64, number of records in the training data that fall into this node.

Returns
-------
result : pandas DataFrame
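The column descriptions above are enough to walk one record from the root of a tree down to a leaf. The following is an illustrative sketch, not LightGBM's prediction code: it uses hand-written rows (plain dicts instead of a pandas DataFrame) with made-up node_index values, handles only numerical "<=" splits, and assumes that a NaN or an absent feature counts as missing and follows missing_direction. The helper name leaf_value is hypothetical.

```python
import math

# Hand-written rows shaped like a few columns of Booster.trees_to_dataframe()
# output for one tree with a single split. The node_index values are illustrative.
ROWS = [
    {"node_index": "0-S0", "left_child": "0-L0", "right_child": "0-L1",
     "split_feature": "Column_10", "threshold": 15.0,
     "decision_type": "<=", "missing_direction": "left"},
    {"node_index": "0-L0", "left_child": None, "right_child": None,
     "split_feature": None, "value": -0.25},
    {"node_index": "0-L1", "left_child": None, "right_child": None,
     "split_feature": None, "value": 0.4},
]


def leaf_value(rows, root_index, record):
    """Follow splits from root_index down to a leaf and return its 'value'.

    Sketch only: handles numerical '<=' splits, and treats an absent feature
    or a NaN as missing, sending it along 'missing_direction'.
    """
    nodes = {row["node_index"]: row for row in rows}
    node = nodes[root_index]
    while node["left_child"] is not None:  # leaf nodes have no children
        if node["decision_type"] != "<=":
            raise NotImplementedError("only '<=' splits are handled here")
        x = record.get(node["split_feature"], math.nan)
        if x is None or math.isnan(x):
            go_left = node["missing_direction"] == "left"
        else:
            go_left = x <= node["threshold"]
        node = nodes[node["left_child"] if go_left else node["right_child"]]
    return node["value"]


print(leaf_value(ROWS, "0-S0", {"Column_10": 12.0}))  # -0.25
print(leaf_value(ROWS, "0-S0", {"Column_10": 20.0}))  # 0.4
print(leaf_value(ROWS, "0-S0", {}))                   # -0.25 (missing goes left)
```

Because value is documented as already multiplied by the learning rate, the sketch returns it as-is and applies no extra shrinkage factor.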
@@ -474,6 +474,16 @@ def create_tree_digraph(booster, tree_index=0, show_info=None, precision=3,
orientation='horizontal', **kwargs):
"""Create a digraph representation of the specified tree.

Each node in the graph represents a node in the tree.

Non-leaf nodes have labels like ``Column_10 <= 875.9``, which means
"this node splits on the feature named ``Column_10``, with threshold 875.9".

Leaf nodes have labels like ``leaf 2: 0.422``, which means "this node is a
leaf node, and the predicted value for records that fall into this node
is 0.422". The number (``2``) is an internal unique identifier and doesn't
have any special meaning.

.. note::

For more information please visit
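A small helper makes the two label formats concrete. It is a hypothetical sketch, not the code create_tree_digraph() actually runs: node_label is an invented name, it reads a dict with trees_to_dataframe()-style fields plus an assumed leaf_index field, and it interprets precision as rounding to that many decimal places with Python's round().

```python
def node_label(row, precision=3):
    """Return a node label in the 'Column_10 <= 875.9' / 'leaf 2: 0.422' style.

    Hypothetical helper: 'leaf_index' is an assumed field, and precision is
    applied with round(); precision=None leaves numbers unrounded.
    """
    def fmt(value):
        return str(value if precision is None else round(value, precision))

    if row.get("split_feature") is None:  # leaf node
        return f"leaf {row['leaf_index']}: {fmt(row['value'])}"
    return f"{row['split_feature']} {row['decision_type']} {fmt(row['threshold'])}"


print(node_label({"split_feature": "Column_10", "decision_type": "<=", "threshold": 875.9}))
# Column_10 <= 875.9
print(node_label({"split_feature": None, "leaf_index": 2, "value": 0.42187}))
# leaf 2: 0.422
```

The leaf number is printed only as an identifier; as the docstring notes, it carries no further meaning.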
@@ -487,9 +497,14 @@ def create_tree_digraph(booster, tree_index=0, show_info=None, precision=3,
The index of a target tree to convert.
show_info : list of strings or None, optional (default=None)
What information should be shown in nodes.
Possible values of list items:
'split_gain', 'internal_value', 'internal_count', 'internal_weight',
'leaf_count', 'leaf_weight', 'data_percentage'.

- ``'split_gain'`` : gain from adding this split to the model
- ``'internal_value'`` : raw predicted value that would be produced by this node if it were a leaf node
- ``'internal_count'`` : number of records from the training data that fall into this non-leaf node
- ``'internal_weight'`` : total weight (sum of hessian) of all observations that fall into this non-leaf node
- ``'leaf_count'`` : number of records from the training data that fall into this leaf node
- ``'leaf_weight'`` : total weight (sum of hessian) of all observations that fall into this leaf node
- ``'data_percentage'`` : percentage of training data that fall into this node
precision : int or None, optional (default=3)
Used to restrict the display of floating point values to a certain precision.
orientation : string, optional (default='horizontal')
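The weight entries in show_info are sums of per-observation hessians, while the count and percentage entries are plain row counts. The sketch below illustrates those definitions and is not LightGBM code: it assumes no sample weights, takes the indices of the rows that reach a node as given, and uses two textbook hessians, 1 for squared error written as 0.5 * (y - score)**2 and p * (1 - p) for binary log loss on a raw score, where p is the sigmoid of that score. The function names are hypothetical.

```python
import math


def l2_hessian(raw_score):
    # Squared error 0.5 * (y - raw_score)**2 has second derivative 1.
    return 1.0


def logloss_hessian(raw_score):
    # Binary log loss on a raw score: second derivative is p * (1 - p),
    # where p = sigmoid(raw_score).
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p * (1.0 - p)


def node_stats(rows_in_node, n_train, hessians):
    """Return (count, weight, data_percentage) for one node.

    rows_in_node: indices of training rows that reach the node (assumed input).
    hessians: per-row hessians at the iteration when the tree was grown.
    """
    count = len(rows_in_node)
    weight = sum(hessians[i] for i in rows_in_node)
    data_percentage = 100.0 * count / n_train
    return count, weight, data_percentage


raw_scores = [0.0, 0.0, 2.0, -1.0]
print(node_stats([0, 1], 4, [l2_hessian(s) for s in raw_scores]))       # (2, 2.0, 50.0)
print(node_stats([0, 1], 4, [logloss_hessian(s) for s in raw_scores]))  # (2, 0.5, 50.0)
```

Under these assumptions, a squared-error hessian of 1 makes weight equal to count, whereas for log loss the weight shrinks as predictions become confident (p near 0 or 1).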
@@ -536,6 +551,16 @@ def plot_tree(booster, ax=None, tree_index=0, figsize=None, dpi=None,
show_info=None, precision=3, orientation='horizontal', **kwargs):
"""Plot the specified tree.

Each node in the graph represents a node in the tree.

Non-leaf nodes have labels like ``Column_10 <= 875.9``, which means
"this node splits on the feature named ``Column_10``, with threshold 875.9".

Leaf nodes have labels like ``leaf 2: 0.422``, which means "this node is a
leaf node, and the predicted value for records that fall into this node
is 0.422". The number (``2``) is an internal unique identifier and doesn't
have any special meaning.

.. note::

It is preferable to use ``create_tree_digraph()`` because of its lossless quality
@@ -556,9 +581,14 @@ def plot_tree(booster, ax=None, tree_index=0, figsize=None, dpi=None,
Resolution of the figure.
show_info : list of strings or None, optional (default=None)
What information should be shown in nodes.
Possible values of list items:
'split_gain', 'internal_value', 'internal_count', 'internal_weight',
'leaf_count', 'leaf_weight', 'data_percentage'.

- ``'split_gain'`` : gain from adding this split to the model
- ``'internal_value'`` : raw predicted value that would be produced by this node if it were a leaf node
- ``'internal_count'`` : number of records from the training data that fall into this non-leaf node
- ``'internal_weight'`` : total weight (sum of hessian) of all observations that fall into this non-leaf node
- ``'leaf_count'`` : number of records from the training data that fall into this leaf node
- ``'leaf_weight'`` : total weight (sum of hessian) of all observations that fall into this leaf node
- ``'data_percentage'`` : percentage of training data that fall into this node
precision : int or None, optional (default=3)
Used to restrict the display of floating point values to a certain precision.
orientation : string, optional (default='horizontal')