---
title:
  "Basic Walkthrough"
description: >
  This vignette describes how to train a LightGBM model for binary classification.
output:
  markdown::html_format:
    options:
      toc: true
      number_sections: true
vignette: >
  %\VignetteIndexEntry{Basic Walkthrough}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE
  , comment = "#>"
  , warning = FALSE
  , message = FALSE
)
```
## Introduction
Welcome to the world of [LightGBM](https://lightgbm.readthedocs.io/en/latest/), a highly efficient gradient boosting implementation (Ke et al. 2017).
```{r}
library(lightgbm)
```
```{r, include=FALSE}
# limit number of threads used, to be respectful of CRAN's resources when it checks this vignette
data.table::setDTthreads(1L)
setLGBMthreads(2L)
```
This vignette will guide you through its basic usage. It will show how to build a simple binary classification model based on a subset of the `bank` dataset (Moro, Cortez, and Rita 2014). You will use the two input features "age" and "balance" to predict whether a client has subscribed to a term deposit.
## The dataset
The dataset looks as follows.
```{r}
data(bank, package = "lightgbm")
bank[1L:5L, c("y", "age", "balance")]
# Distribution of the response
table(bank$y)
```
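If you prefer relative frequencies over raw counts, base R's `prop.table()` turns the table above into proportions:

```{r}
# Share of "no" and "yes" responses in the data
prop.table(table(bank$y))
```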
## Training the model
The R package of LightGBM offers two functions to train a model:
- `lgb.train()`: This is the main training logic. It offers full flexibility but requires a `Dataset` object created by the `lgb.Dataset()` function.
- `lightgbm()`: Simpler, but less flexible. Data can be passed without having to bother with `lgb.Dataset()`.
### Using the `lightgbm()` function
As a first step, you need to convert the data to numeric. Afterwards, you are ready to fit the model with the `lightgbm()` function.
```{r}
# Numeric response and feature matrix
y <- as.numeric(bank$y == "yes")
X <- data.matrix(bank[, c("age", "balance")])

# Train
fit <- lightgbm(
  data = X
  , label = y
  , params = list(
    num_leaves = 4L
    , learning_rate = 1.0
    , objective = "binary"
  )
  , nrounds = 10L
  , verbose = -1L
)

# Result
summary(predict(fit, X))
```
It seems to have worked! And the predictions are indeed probabilities between 0 and 1.
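Since the model returns probabilities, turning them into class labels is up to you. A minimal sketch, assuming the conventional 0.5 cutoff (any other threshold works the same way):

```{r}
# Convert predicted probabilities into 0/1 labels at a 0.5 threshold
pred_class <- as.integer(predict(fit, X) > 0.5)

# In-sample accuracy (optimistic, since we predict on the training data)
mean(pred_class == y)
```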
### Using the `lgb.train()` function
Alternatively, you can use the more flexible `lgb.train()` interface. Here, as an additional step, you need to prepare `y` and `X` with LightGBM's data API, `lgb.Dataset()`. Parameters are passed to `lgb.train()` as a named list.
```{r}
# Data interface
dtrain <- lgb.Dataset(X, label = y)

# Parameters
params <- list(
  objective = "binary"
  , num_leaves = 4L
  , learning_rate = 1.0
)

# Train
fit <- lgb.train(
  params
  , data = dtrain
  , nrounds = 10L
  , verbose = -1L
)
```
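The fitted booster supports the same `predict()` interface as before, so you can check that this model also produces probabilities between 0 and 1:

```{r}
# Predictions from the lgb.train() model
summary(predict(fit, X))
```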
Try it out! If stuck, visit LightGBM's [documentation](https://lightgbm.readthedocs.io/en/latest/R/index.html) for more details.
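One more detail worth knowing: a fitted booster can be written to a text file and restored later. A minimal sketch using `lgb.save()` and `lgb.load()` (the file name `lightgbm.model` is arbitrary and is cleaned up at the end of this vignette):

```{r}
# Save the booster to disk and load it back
lgb.save(fit, "lightgbm.model")
fit_loaded <- lgb.load("lightgbm.model")

# The restored model gives the same predictions
summary(predict(fit_loaded, X))
```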
```{r, echo = FALSE, results = "hide"}
# Cleanup
if (file.exists("lightgbm.model")) {
  file.remove("lightgbm.model")
}
```
## References
Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." In *Advances in Neural Information Processing Systems 30 (NIPS 2017)*.

Moro, Sérgio, Paulo Cortez, and Paulo Rita. 2014. "A Data-Driven Approach to Predict the Success of Bank Telemarketing." *Decision Support Systems* 62: 22–31.