% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/workpatterns_classify.R
\name{workpatterns_classify}
\alias{workpatterns_classify}
\title{Classify working pattern personas using a rule-based algorithm}
\usage{
workpatterns_classify(
data,
hrvar = "Organization",
values = "percent",
signals = c("email", "IM"),
start_hour = "0900",
end_hour = "1700",
exp_hours = NULL,
mingroup = 5,
active_threshold = 0,
method = "bw",
return = "plot"
)
}
\arguments{
\item{data}{A data frame containing data from the Hourly Collaboration query.}
\item{hrvar}{A string specifying the HR attribute to cut the data by.
Defaults to \code{"Organization"}. This only affects the function when \code{"table"} is
returned, and is only applicable for \code{method = "bw"}.}
\item{values}{Only valid if using \code{pav} method. Character vector to specify
whether to return percentages or absolute values in \code{"data"} and \code{"plot"}.
Valid values are \code{"percent"} (default) and \code{"abs"}.}
\item{signals}{Character vector to specify which collaboration metrics to
use. Defaults to \code{c("email", "IM")}. Valid values are:
\itemize{
\item \code{"email"} for emails only
\item \code{"IM"} for Teams messages only
\item \code{"unscheduled_calls"} for Unscheduled Calls only
\item \code{"meetings"} for Meetings only
\item or a combination of signals, such as \code{c("email", "IM")}
}}
\item{start_hour}{A character vector specifying the starting hour, e.g.
\code{"0900"}. Note that this currently only supports \strong{hourly} increments. If
the official hours specify checking in at 9 AM and checking out at 5
PM, then \code{"0900"} should be supplied here.}
\item{end_hour}{A character vector specifying the ending hour, e.g. \code{"1700"}.
Note that this currently only supports \strong{hourly} increments. If the
official hours specify checking in at 9 AM and checking out at 5 PM,
then \code{"1700"} should be supplied here.}
\item{exp_hours}{Numeric value representing the number of hours the
population is expected to be active for throughout the workday. By default,
this uses the difference between \code{end_hour} and \code{start_hour}. Only
applicable with the 'bw' method.}
\item{mingroup}{Numeric value setting the privacy threshold / minimum group
size. Defaults to 5.}
\item{active_threshold}{A numeric value specifying the minimum number of
signals to be greater than in order to qualify as \emph{active}. Defaults to 0.
Only applicable for the binary-week method.}
\item{method}{String specifying which method to use for
classification. By default, a binary week-based (\code{bw}) method is used, with
the option to use the person-average volume-based (\code{pav}) method instead.}
\item{return}{String specifying what to return. This must be one of the
following strings:
\itemize{
\item \code{"plot"}
\item \code{"data"}
\item \code{"table"}
\item \code{"plot-area"}
\item \code{"plot-hrvar"} (only for \code{bw} method)
\item \code{"plot-dist"} (only for \code{bw} method)
}
See \code{Value} for more information.}
}
\value{
A different output is returned depending on the value passed to the
\code{return} argument:
\itemize{
\item \code{"plot"}: ggplot object. With the \code{bw} method, this returns a grid
showing the distribution of archetypes by 'breaks' and number of active
hours (default). With the \code{pav} method, this returns a faceted bar plot
which shows the percentage of signals sent in each hour, with each facet
representing an archetype.
\item \code{"data"}: data frame. The raw data with the classified archetypes.
\item \code{"table"}: data frame. A summary table of the archetypes.
\item \code{"plot-area"}: ggplot object. With the \code{bw} method, this returns an area
plot of the percentages of archetypes shown over time. With the \code{pav}
method, this returns an area chart which shows the percentage of signals
sent in each hour, with each line representing an archetype.
\item \code{"plot-hrvar"}: ggplot object. A bar plot showing the count of archetypes,
faceted by the supplied HR attribute. This is only available for the \code{bw}
method.
\item \code{"plot-dist"}: ggplot object. A heatmap of signal distribution by hour and
archetype. This is only available for the \code{bw} method.
}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}
Apply a rule-based algorithm to emails or instant messages sent by hour of
day. Uses a binary week-based ('bw') method by default, with the option to use
the person-average volume-based ('pav') method instead.
}
\details{
The working patterns archetypes are a set of segments created based on the
aggregated hourly activity of employees. One motivation for creating these
archetypes is to capture the diversity in working patterns, where for
instance employees may choose to take multiple or extended breaks throughout
the day, or choose to start or end earlier/later than their standard working
hours. Two methods have been developed to capture the different working
patterns.
This function is a wrapper around \code{workpatterns_classify_bw()} and
\code{workpatterns_classify_pav()}, and calls each function depending on what is
supplied to the \code{method} argument. Both methods implement a rule-based
classification of either \strong{person-weeks} or \strong{persons} that separates
different working patterns.
See individual sections below for details on the two different
implementations.
}
\section{Binary Week method}{
This method classifies each \strong{person-week} into one of the eight
archetypes:
\itemize{
\item \strong{0 Low Activity (< 3 hours on)}: fewer than 3 active hours
\item \strong{1.1 Standard continuous (expected schedule)}: active hours equal to
\emph{expected hours}, with all activity confined within the expected start and
end time
\item \strong{1.2 Standard continuous (shifted schedule)}: active hours equal to
\emph{expected hours}, with activity occurring beyond either the expected start
or end time.
\item \strong{2.1 Standard flexible (expected schedule)}: active hours less than or
equal to \emph{expected hours}, with all activity confined within the expected
start and end time
\item \strong{2.2 Standard flexible (shifted schedule)}: active hours less than or
equal to \emph{expected hours}, with activity occurring beyond either the
expected start or end time.
\item \strong{3 Long flexible workday}: number of active hours exceeds \emph{expected
hours}, with breaks occurring throughout
\item \strong{4 Long continuous workday}: number of active hours exceeds \emph{expected
hours}, with activity happening in a continuous block (no breaks)
\item \strong{5 Always on (13h+)}: number of active hours is greater than or equal to
13
}
\emph{Standard} here means that the total number of active hours does not
exceed the expected total number of hours, as supplied by
\code{exp_hours}. \emph{Continuous} refers to the behaviour of \emph{not} taking breaks,
i.e. no inactive hours between the first and last active hours of the day,
whereas \emph{flexible} refers to the contrary.
This is the recommended method over \code{pav} for several reasons:
\enumerate{
\item \code{bw} ignores \emph{volume effects}, whereas with \code{pav} activity volume can
still bias the results towards the 'standard working hours'.
\item It captures the intuition that each individual can have 'light' and
'heavy' weeks with respect to workload.
}
The notion of 'breaks' in the 'binary-week' method is best understood as
'recurring disconnection time'. This denotes an hourly block where there is
consistently no activity occurring throughout the week. Note that this
applies a stricter criterion compared to the common definition of a break,
which is simply a time interval where no active work is being done, and thus
the more specific terminology 'recurring disconnection time' is preferred.
In the standard plot output, the archetypes have been abbreviated to show the
following:
\itemize{
\item \strong{Low Activity} - archetype 0
\item \strong{Standard} - archetypes 1.1 and 1.2
\item \strong{Flexible} - archetypes 2.1 and 2.2
\item \strong{Long continuous} - archetype 4
\item \strong{Long flexible} - archetype 3
\item \strong{Always On} - archetype 5
}
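As a rough illustration of how the rules above fit together, the sketch
below classifies a single person-week from 24 binary hourly flags. It is not
the code used by \code{workpatterns_classify_bw()}: the function name
\code{classify_person_week()}, its arguments, and the order in which the
rules are checked are all assumptions made for illustration.
\preformatted{
# Hypothetical helper: `active` is a logical vector of length 24, where
# element h + 1 is TRUE if hour h (0 to 23) counts as active in that week.
classify_person_week <- function(active, start = 9, end = 17,
                                 exp_hours = end - start) {
  stopifnot(is.logical(active), length(active) == 24)
  hours <- which(active) - 1
  n_active <- length(hours)

  # Assumed precedence: the two extremes are checked first
  if (n_active < 3) return("0 Low Activity (< 3 hours on)")
  if (n_active >= 13) return("5 Always on (13h+)")

  # A break is an inactive hour between the first and last active hours
  span <- max(hours) - min(hours) + 1
  has_breaks <- n_active < span

  if (n_active > exp_hours) {
    if (has_breaks) return("3 Long flexible workday")
    return("4 Long continuous workday")
  }

  # Expected schedule: all activity falls inside [start, end)
  expected <- min(hours) >= start && max(hours) < end
  if (!has_breaks && n_active == exp_hours) {
    if (expected) return("1.1 Standard continuous (expected schedule)")
    return("1.2 Standard continuous (shifted schedule)")
  }
  if (expected) return("2.1 Standard flexible (expected schedule)")
  "2.2 Standard flexible (shifted schedule)"
}

# Active in every hour from 09:00 up to 17:00
week <- rep(FALSE, 24)
week[10:17] <- TRUE
classify_person_week(week)
}
Under this reading, a week with no breaks but fewer active hours than
expected falls into the 'Standard flexible' archetypes, because the
'Standard continuous' archetypes require active hours to equal the expected
hours.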
}
\section{Person Average method}{
This method classifies each \strong{person} (based on unique \code{PersonId}) into
one of the six archetypes:
\itemize{
\item \strong{Absent}: Fewer than 10 signals over the week.
\item \strong{Extended Hours - Morning}: 15\%+ of collaboration before start hours and
less than 70\% within standard hours, and less than 15\% of collaboration after
end hours
\item \strong{Extended Hours - Evening}: Less than 15\% of collaboration before start
hours and less than 70\% within standard hours, and 15\%+ of collaboration
after end hours
\item \strong{Overnight workers}: less than 30\% of collaboration happens within
standard hours
\item \strong{Standard Hours}: over 70\% of collaboration within standard hours
\item \strong{Always On}: over 15\% of collaboration happens before the start hour and
over 15\% after the end hour (both conditions must be satisfied), and less
than 70\% of collaboration happens within standard hours
}
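The rules above can be sketched for a single person as follows. This is an
illustration only, not the code used by \code{workpatterns_classify_pav()}:
the function name \code{classify_person()}, its arguments, the order of the
checks, and the treatment of values that fall exactly on a threshold are
assumptions.
\preformatted{
# Hypothetical helper: `total` is the number of signals over the week;
# `before`, `within` and `after` are the shares of collaboration (0 to 1)
# before start hours, within standard hours and after end hours.
classify_person <- function(total, before, within, after) {
  if (total < 10) return("Absent")
  if (within < 0.30) return("Overnight workers")
  if (within > 0.70) return("Standard Hours")
  # Exactly 70 percent within standard hours is not covered by the rules
  if (within >= 0.70) return(NA_character_)
  # Assumption: "15 percent or more" is used for both thresholds
  if (before >= 0.15 && after >= 0.15) return("Always On")
  if (before >= 0.15) return("Extended Hours - Morning")
  if (after >= 0.15) return("Extended Hours - Evening")
  NA_character_
}

classify_person(total = 120, before = 0.25, within = 0.65, after = 0.10)
}
Because 'Overnight workers' overlaps with the other archetypes whenever less
than 30\% of collaboration falls within standard hours, some precedence has
to be chosen; checking it early is one possible choice.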
}
\section{Flexibility Index}{
The Working Patterns archetypes as calculated using the binary-week method
share many similarities with the Flexibility Index (see
\code{flex_index()}):
\itemize{
\item Both are computed directly from the Hourly Collaboration Flexible Query.
\item Both apply the same binary conversion of activity on the signals from the
Hourly Collaboration Flexible Query.
}
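The binary conversion mentioned above can be illustrated with a minimal
sketch. The vector \code{signals_by_hour} is made-up data, and the comparison
assumes the \code{active_threshold} semantics described in the arguments (an
hour is active when its signal count is greater than the threshold).
\preformatted{
# Made-up signal counts for six consecutive hours
signals_by_hour <- c(0, 0, 3, 1, 0, 5)
active_threshold <- 0

# 1 where the hour counts as active, 0 otherwise
active_flags <- as.integer(signals_by_hour > active_threshold)
active_flags
}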
}
\examples{
\donttest{
# Returns a plot by default
em_data \%>\% workpatterns_classify(method = "bw")
# Return an area plot
# With custom expected hours
em_data \%>\%
workpatterns_classify(
method = "bw",
return = "plot-area",
exp_hours = 7
)
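# Return a summary table of the archetypes
em_data \%>\% workpatterns_classify(method = "bw", return = "table")
# Use the person-average volume-based method (returns a plot by default)
em_data \%>\% workpatterns_classify(method = "pav")
# Return an area plot with the person-average volume-based method
em_data \%>\% workpatterns_classify(method = "pav", return = "plot-area")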
}
}
\seealso{
Other Clustering:
\code{\link{personas_hclust}()},
\code{\link{workpatterns_hclust}()}
Other Working Patterns:
\code{\link{flex_index}()},
\code{\link{identify_shifts}()},
\code{\link{identify_shifts_wp}()},
\code{\link{plot_flex_index}()},
\code{\link{workpatterns_area}()},
\code{\link{workpatterns_classify_bw}()},
\code{\link{workpatterns_classify_pav}()},
\code{\link{workpatterns_hclust}()},
\code{\link{workpatterns_rank}()},
\code{\link{workpatterns_report}()}
}
\author{
Ainize Cidoncha \href{mailto:ainize.cidoncha@microsoft.com}{ainize.cidoncha@microsoft.com}
Carlos Morales Torrado \href{mailto:carlos.morales@microsoft.com}{carlos.morales@microsoft.com}
Martin Chan \href{mailto:martin.chan@microsoft.com}{martin.chan@microsoft.com}
}
\concept{Clustering}
\concept{Working Patterns}