wpa/R/flex_index.R

350 строки
13 KiB
R

# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Compute a Flexibility Index based on the Hourly Collaboration Query
#'
#' @description
#' `r lifecycle::badge('experimental')`
#'
#' Pass an Hourly Collaboration query and compute a Flexibility Index for the
#' entire population. The Flexibility Index is a quantitative measure of the
#' freedom for employees to work at a time of their choice.
#'
#' @details
#' The **Flexibility Index** is a metric that has been developed to quantify and
#' measure flexibility using behavioural data from Viva Insights. Flexibility
#' here refers to the freedom of employees to adopt a working arrangement of
#' their own choice, and more specifically refers to **time** flexibility
#' (_whenever_ I want) as opposed to **geographical** flexibility (_wherever_ I
#' want).
#'
#' The **Flexibility Index** is a score between 0 and 1, and is calculated based
#' on three component measures:
#'
#' - `ChangeHours`: this represents the freedom to define work start and end
#' time. Teams that embrace flexibility allow members to start and end their
#' workday at different times.
#'
#' - `TakeBreaks`: this represents the freedom define one's own schedule. In
#' teams that embrace flexibility, some members will choose to organize / split
#' their day in different ways (e.g. take a long lunch-break, disconnect in the
#' afternoon and reconnect in the evening, etc.).
#'
#' - `ControlHours`: this represents the freedom to switch off. Members who
#' choose alternative arrangements should be able to maintain a workload that is
#' broadly equivalent to those that follow standard arrangements.
#'
#' The **Flexibility Index** returns with one single score for each person-week,
#' plus the **three** sub-component binary variables (`TakeBreaks`,
#' `ChangeHours`, `ControlHours`). At the person-week level, each score can only
#' have the values 0, 0.33, 0.66, and 1. The Flexibility Index should only be
#' interpreted as a **group** of person-weeks, e.g. the average Flexibility
#' Index of a team of 6 over time, where the possible values would range from 0
#' to 1.
#'
#' @section Context:
#' The central feature of flexible working arrangements is
#' that it is the employee rather the employer who chooses the working
#' arrangement. _Observed flexibility_ serves as a proxy to assess whether a
#' flexible working arrangement are in place. The Flexibility Index is an
#' attempt to create such a proxy for quantifying and measuring flexibility,
#' using behavioural data from Viva Insights.
#'
#' @section Recurring disconnection time:
#' The key component of `TakeBreaks` in the Flexibility Index is best
#' interpreted as 'recurring disconnection time'. This denotes an hourly block
#' where there is consistently no activity occurring throughout the week. Note
#' that this applies a stricter criterion compared to the common definition of
#' a break, which is simply a time interval where no active work is being
#' done, and thus the more specific terminology 'recurring disconnection time'
#' is preferred.
#'
#' @param data Hourly Collaboration query to be passed through as data frame.
#'
#' @param hrvar A string specifying the HR attribute to cut the data by.
#' Defaults to NULL. This only affects the function when "table" is returned.
#'
#' @param signals Character vector to specify which collaboration metrics to
#' use:
#' - a combination of signals, such as `c("email", "IM")` (default)
#' - `"email"` for emails only
#' - `"IM"` for Teams messages only
#' - `"unscheduled_calls"` for Unscheduled Calls only
#' - `"meetings"` for Meetings only
#'
#' @param active_threshold A numeric value specifying the minimum number of
#' signals to be greater than in order to qualify as _active_. Defaults to 0.
#'
#' @param start_hour A character vector specifying starting hours, e.g. `"0900"`
#'
#' @param end_hour A character vector specifying end hours, e.g. `"1700"`
#'
#' @param return String specifying what to return. This must be one of the
#' following strings:
#' - `"plot"`
#' - `"data"`
#' - `"table"`
#'
#' See `Value` for more information.
#'
#' @param plot_method Character string for determining which plot to return.
#' - `"sample"` plots a sample of ten working pattern
#' - `"common"` plots the ten most common working patterns
#' - `"time"` plots the Flexibility Index for the group over time
#'
#' @param mode String specifying aggregation method for plot. Only applicable
#' when `return = "plot"`. Valid options include:
#' - `"binary"`: convert hourly activity into binary blocks. In the plot, each
#' block would display as solid.
#' - `"prop"`: calculate proportion of signals in each hour over total signals
#' across 24 hours, then average across all work weeks. In the plot, each
#' block would display as a heatmap.
#'
#' @return
#' A different output is returned depending on the value passed to the `return`
#' argument:
#' - `"plot"`: 'ggplot' object. A random of ten working patterns are displayed,
#' with diagnostic data and the Flexibility Index shown on the plot.
#' - `"data"`: data frame. The original input data appended with the
#' Flexibility Index and the component scores. Can be used with
#' `plot_flex_index()` to recreate visuals found in `flex_index()`.
#' - `"table"`: data frame. A summary table for the metric.
#'
#'
#' @import dplyr
#' @importFrom data.table ":=" "%like%" "%between%"
#'
#' @examples
#' # Create a sample small dataset
#' orgs <- c("Customer Service", "Financial Planning", "Biz Dev")
#' em_data <- em_data[em_data$Organization %in% orgs, ]
#'
#' # Examples of how to test the plotting options individually
#' # Sample of 10 work patterns
#' em_data %>%
#' flex_index(return = "plot", plot_method = "sample")
#'
#' # 10 most common work patterns
#' em_data %>%
#' flex_index(return = "plot", plot_method = "common")
#'
#' # Plot Flexibility Index over time
#' \donttest{
#' em_data %>%
#' flex_index(return = "plot", plot_method = "time")
#' }
#'
#' # Return a summary table with the computed Flexibility Index
#' \donttest{
#' em_data %>%
#' flex_index(hrvar = "Organization", return = "table")
#' }
#'
#' @section Returning the raw data:
#' The raw data containing the computed Flexibility Index can be returned with
#' the following:
#' ```
#' em_data %>%
#' flex_index(return = "data")
#' ```
#'
#' @family Working Patterns
#'
#' @export
flex_index <- function(data,
hrvar = NULL,
signals = c("email", "IM"),
active_threshold = 0,
start_hour = "0900",
end_hour = "1700",
return = "plot",
plot_method = "common",
mode = "binary"){
## Bindings for variables
TakeBreaks <- NULL
ChangeHours <- NULL
ControlHours <- NULL
FlexibilityIndex <- NULL
Signals_Break_hours <- NULL
## Make sure data.table knows we know we're using it
.datatable.aware = TRUE
## Save original
start_hour_o <- start_hour
end_hour_o <- end_hour
## Coerce to numeric, remove trailing zeros
start_hour <- as.numeric(gsub(pattern = "00$", replacement = "", x = start_hour))
end_hour <- as.numeric(gsub(pattern = "00$", replacement = "", x = end_hour))
norm_span <- end_hour - start_hour
## convert to data.table
data2 <-
data %>%
dplyr::mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
data.table::as.data.table() %>%
data.table::copy()
## Text replacement only for allowed values
if(any(signals %in% c("email", "IM", "unscheduled_calls", "meetings"))){
signal_set <- gsub(pattern = "email", replacement = "Emails_sent", x = signals) # case-sensitive
signal_set <- gsub(pattern = "IM", replacement = "IMs_sent", x = signal_set)
signal_set <- gsub(pattern = "unscheduled_calls", replacement = "Unscheduled_calls", x = signal_set)
signal_set <- gsub(pattern = "meetings", replacement = "Meetings", x = signal_set)
} else {
stop("Invalid input for `signals`.")
}
## Create 24 summed `Signals_sent` columns
signal_cols <- purrr::map(0:23, ~combine_signals(data, hr = ., signals = signal_set))
signal_cols <- bind_cols(signal_cols)
## Use names for matching
input_var <- names(signal_cols)
## Signals sent by Person and Date
signals_df <-
data2 %>%
.[, c("PersonId", "Date")] %>%
cbind(signal_cols)
## Save original `signals_df` before manipulating ------------------------
## Rename `Signals_sent` columns to prevent conflict
signals_df_o <- signals_df %>%
purrr::set_names(
nm = gsub(x = names(.),
replacement = "_ori_",
pattern = "_sent_")
)
## Signal label
sig_label <- ifelse(length(signal_set) > 1, "Signals_sent", signal_set)
## Create binary variable 0 or 1
num_cols <- names(which(sapply(signals_df, is.numeric))) # Get numeric columns
## Create Signals Total and binary values
signals_df <-
signals_df %>%
data.table::as.data.table() %>%
# active_threshold: minimum signals to qualify as active
.[, (num_cols) := lapply(.SD, function(x) ifelse(x > active_threshold, 1, 0)), .SDcols = num_cols] %>%
.[, ("Signals_Total") := apply(.SD, 1, sum), .SDcols = input_var]
## Classify PersonId-Signal data by time of day
WpA_classify <-
signals_df %>%
tidyr::gather(!!sym(sig_label), sent, -PersonId, -Date, -Signals_Total) %>%
data.table::as.data.table()
WpA_classify[, StartEnd := gsub(pattern = "[^[:digit:]]", replacement = "", x = get(sig_label))]
WpA_classify[, Start := as.numeric(substr(StartEnd, start = 1, stop = 2))]
WpA_classify[, End := as.numeric(substr(StartEnd, start = 3, stop = 4))]
WpA_classify[, Before_start := Start < (start_hour)] # Earlier than start hour
WpA_classify[, After_end := End > (end_hour)] # Later than start hour
WpA_classify[, Within_hours := (Start >= start_hour & End <= end_hour)]
WpA_classify[, HourType := NA_character_]
WpA_classify[After_end == TRUE, HourType := "After_end"]
WpA_classify[Before_start == TRUE, HourType := "Before_start"]
WpA_classify[Within_hours == TRUE, HourType := "Within_hours"]
WpA_classify <-
WpA_classify[, c("PersonId", "Date", "Signals_Total", "HourType", "sent")] %>%
.[, .(sent = sum(sent)), by = c("PersonId", "Date", "Signals_Total", "HourType")] %>%
tidyr::spread(HourType, sent) %>%
left_join(WpA_classify %>% ## Calculate first and last activity for day_span
filter(sent>0) %>%
group_by(PersonId,Date) %>%
summarise(First_signal = min(Start),
Last_signal = max(End),
.groups = "drop_last"),
by = c("PersonId","Date")) %>%
mutate(Day_Span = Last_signal-First_signal,
Signals_Break_hours = Day_Span-Signals_Total) %>%
select(-Signals_Total)
## hrvar treatment
if(is.null(hrvar)){
hr_dt <- data2[, c("PersonId", "Date")]
hr_dt <- hr_dt[, Total := "Total"]
hrvar <- "Total"
hr_dt <- as.data.frame(hr_dt)
} else {
temp_str <- c("PersonId", "Date", hrvar)
# hr_dt <- data2[, ..temp_str] # double dot prefix
hr_dt <-
data2 %>%
as.data.frame() %>%
select(temp_str)
}
## Bind calculated columns with original Signals df
calculated_data <-
WpA_classify %>%
left_join(signals_df, by = c("PersonId","Date")) %>%
left_join(hr_dt, by = c("PersonId","Date")) %>%
left_join(signals_df_o, by = c("PersonId","Date")) %>%
filter(Signals_Total >= 3) %>% # At least 3 signals required
## Additional calculations for Flexibility Index
mutate(TakeBreaks = (Signals_Break_hours > 0),
ChangeHours = (First_signal != start_hour),
ControlHours = (Day_Span <= norm_span)) %>%
mutate(FlexibilityIndex = select(., TakeBreaks, ChangeHours, ControlHours) %>%
apply(1, mean))
## Plot return
sig_label_ <- paste0(sig_label, "_")
## Summary Table for Return
## Applies groups
returnTable <-
calculated_data %>%
group_by(!!sym(hrvar)) %>%
summarise_at(vars(TakeBreaks, ChangeHours, ControlHours),
~mean(.), .groups = "drop_last") %>%
mutate(FlexibilityIndex =
select(., TakeBreaks, ChangeHours, ControlHours) %>%
apply(1, mean))
## Main plot
if(return == "plot"){
plot_flex_index(data = calculated_data,
sig_label = sig_label_,
start_hour = start_hour,
end_hour = end_hour,
method = plot_method,
mode = mode)
} else if(return == "data"){
calculated_data
} else if(return == "table"){
returnTable
} else {
stop("Check input for `return`.")
}
}