refactoring logistic regression r text
Parent: 84c4b706ea
Commit: ab24e44a52
@@ -230,6 +230,27 @@ Now that we have an idea of the relationship between the binary categories of co
## 3. Build your model
Let's begin by splitting the data into `training` and `test` sets. The training set is used to train a classifier so that it learns a statistical relationship between the features and the label value.

It is best practice to hold out some of your data for **testing**. Comparing the labels the model predicts for the test set with the labels already known there gives a better estimate of how the model will perform on new data. [rsample](https://rsample.tidymodels.org/), a package in Tidymodels, provides infrastructure for efficient data splitting and resampling:
```{r split_data}
# Split data into 80% for training and 20% for testing
set.seed(2056)
pumpkins_split <- pumpkins_select %>%
  initial_split(prop = 0.8)

# Extract the data in each split
pumpkins_train <- training(pumpkins_split)
pumpkins_test <- testing(pumpkins_split)

# Print out the first 5 rows of the training set
pumpkins_train %>%
  slice_head(n = 5)
```
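
To make the hold-out idea concrete, here is a minimal base R sketch of what an 80/20 split and a test-set comparison amount to. It is not how `rsample` is implemented. The `toy` data frame, its values, and the "always predict the majority label" stand-in model are assumptions made purely for illustration.

```r
# Toy data standing in for pumpkins_select (made-up values, for illustration only)
toy <- data.frame(
  id    = 1:10,
  color = rep(c("ORANGE", "WHITE"), times = c(7, 3))
)

set.seed(2056)

# Randomly pick 80% of the row indices for training; the rest is held out
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.8 * nrow(toy)))
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# A trivial stand-in "model": always predict the most common training label
majority_label <- names(which.max(table(toy_train$color)))
predicted <- rep(majority_label, nrow(toy_test))

# Compare the predicted labels with the known labels in the test set
accuracy <- mean(predicted == toy_test$color)
accuracy
```

Because the test rows were never seen when choosing `majority_label`, `accuracy` is an estimate of performance on new data rather than a measure of how well the training rows were memorised.
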
🙌 We are now ready to train a model by fitting the training features to the training label (color).

We'll begin by creating a recipe that specifies the preprocessing steps needed to get our data ready for modelling, for example encoding categorical variables into a set of integers. Just like `baked_pumpkins`, we create a `pumpkins_recipe`, but we do not `prep` and `bake` it because it will be bundled into a workflow, which you will see in just a few steps.
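
As a rough sketch of what such a recipe could look like, the chunk below defines one on a small made-up data frame. The `toy_train` data, the `toy_recipe` name, and the choice of `step_integer()` for the integer encoding are assumptions for illustration; the lesson's actual `pumpkins_recipe` may use different steps.

```r
library(tidymodels)

# Toy stand-in for pumpkins_train (made-up values, all columns are factors)
toy_train <- data.frame(
  variety   = factor(c("HOWDEN", "MINIATURE", "HOWDEN", "MINIATURE")),
  item_size = factor(c("lge", "sml", "med", "sml")),
  color     = factor(c("ORANGE", "WHITE", "ORANGE", "WHITE"))
)

# Declare color as the outcome and every other column as a predictor,
# then add a step that encodes the predictors as integers.
# This is only a specification: prep() and bake() are not called here.
toy_recipe <- recipe(color ~ ., data = toy_train) %>%
  step_integer(all_predictors(), zero_based = TRUE)

# Inspect the role assigned to each column
summary(toy_recipe)
```

Leaving the recipe unprepped keeps it a plain description of the preprocessing, which is the form a workflow expects when it bundles preprocessing and model together.
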