% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/workpatterns_classify.R
\name{workpatterns_classify}
\alias{workpatterns_classify}
\title{Classify working pattern personas using a rule-based algorithm}
\usage{
workpatterns_classify(
  data,
  hrvar = "Organization",
  values = "percent",
  signals = c("email", "IM"),
  start_hour = "0900",
  end_hour = "1700",
  exp_hours = NULL,
  mingroup = 5,
  active_threshold = 0,
  method = "bw",
  return = "plot"
)
}
\arguments{
\item{data}{A data frame containing data from the Hourly Collaboration query.}

\item{hrvar}{A string specifying the HR attribute to cut the data by.
Defaults to \code{"Organization"}. This only affects the function when \code{"table"} is
returned, and is only applicable for \code{method = "bw"}.}

\item{values}{Only valid if using \code{pav} method. Character vector to specify
whether to return percentages or absolute values in \code{"data"} and \code{"plot"}.
Valid values are \code{"percent"} (default) and \code{"abs"}.}

\item{signals}{Character vector to specify which collaboration metrics to
use:
\itemize{
\item \code{"email"} for emails only
\item \code{"IM"} for Teams messages only
\item \code{"unscheduled_calls"} for Unscheduled Calls only
\item \code{"meetings"} for Meetings only
\item or a combination of signals, such as \code{c("email", "IM")} (default)
}}

\item{start_hour}{A character vector specifying the start hour, e.g.
\code{"0900"}. Note that this currently only supports \strong{hourly} increments. If
the official hours specify checking in at 9 AM and checking out at 5
PM, then \code{"0900"} should be supplied here.}

\item{end_hour}{A character vector specifying the end hour, e.g. \code{"1700"}.
Note that this currently only supports \strong{hourly} increments. If the
official hours specify checking in at 9 AM and checking out at 5 PM,
then \code{"1700"} should be supplied here.}

\item{exp_hours}{Numeric value representing the number of hours the
population is expected to be active for throughout the workday. By default,
this uses the difference between \code{end_hour} and \code{start_hour}. Only
applicable with the 'bw' method.}

\item{mingroup}{Numeric value setting the privacy threshold / minimum group
size. Defaults to 5.}

\item{active_threshold}{A numeric value specifying the number of signals
that must be exceeded in order to qualify as \emph{active}. Defaults to 0.
Only applicable for the binary-week method.}

\item{method}{String to pass through specifying which method to use for
classification. By default, a binary week-based (\code{bw}) method is used, with
the option to use the person-average volume-based (\code{pav}) method.}

\item{return}{String specifying what to return. This must be one of the
following strings:
\itemize{
\item \code{"plot"}
\item \code{"data"}
\item \code{"table"}
\item \code{"plot-area"}
\item \code{"plot-hrvar"} (only for \code{bw} method)
\item \code{"plot-dist"} (only for \code{bw} method)
}

See \code{Value} for more information.}
}
\value{
A different output is returned depending on the value passed to the
\code{return} argument:
\itemize{
\item \code{"plot"}: ggplot object. With the \code{bw} method, this returns a grid
showing the distribution of archetypes by 'breaks' and number of active
hours (default). With the \code{pav} method, this returns a faceted bar plot
which shows the percentage of signals sent in each hour, with each facet
representing an archetype.
\item \code{"data"}: data frame. The raw data with the classified archetypes.
\item \code{"table"}: data frame. A summary table of the archetypes.
\item \code{"plot-area"}: ggplot object. With the \code{bw} method, this returns an area
plot of the percentages of archetypes shown over time. With the \code{pav}
method, this returns an area chart which shows the percentage of signals
sent in each hour, with each line representing an archetype.
\item \code{"plot-hrvar"}: ggplot object. A bar plot showing the count of archetypes,
faceted by the supplied HR attribute. This is only available for the \code{bw}
method.
\item \code{"plot-dist"}: ggplot object. A heatmap plot of signal distribution by hour and
archetype. This is only available for the \code{bw} method.
}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}

Apply a rule-based algorithm to emails or instant messages sent by hour of
day. Uses a binary week-based ('bw') method by default, with the option to use
the person-average volume-based ('pav') method.
}
\details{
The working patterns archetypes are a set of segments created based on the
aggregated hourly activity of employees. One motivation for creating these
archetypes is to capture the diversity in working patterns, where for
instance employees may choose to take multiple or extended breaks throughout
the day, or choose to start or end earlier/later than their standard working
hours. Two methods have been developed to capture the different working
patterns.

This function is a wrapper around \code{workpatterns_classify_bw()} and
\code{workpatterns_classify_pav()}, and calls each function depending on what is
supplied to the \code{method} argument. Both methods implement a rule-based
classification of either \strong{person-weeks} or \strong{persons} that pulls apart
different working patterns.

See individual sections below for details on the two different
implementations.
}
\section{Binary Week method}{


This method classifies each \strong{person-week} into one of the eight
archetypes:
\itemize{
\item \strong{0 Low Activity (< 3 hours on)}: fewer than 3 active hours
\item \strong{1.1 Standard continuous (expected schedule)}: active hours equal to
\emph{expected hours}, with all activity confined within the expected start and
end time
\item \strong{1.2 Standard continuous (shifted schedule)}: active hours equal to
\emph{expected hours}, with activity occurring beyond either the expected start
or end time.
\item \strong{2.1 Standard flexible (expected schedule)}: active hours less than or
equal to \emph{expected hours}, with all activity confined within the expected
start and end time
\item \strong{2.2 Standard flexible (shifted schedule)}: active hours less than or
equal to \emph{expected hours}, with activity occurring beyond either the
expected start or end time.
\item \strong{3 Long flexible workday}: number of active hours exceeds \emph{expected
hours}, with breaks occurring throughout
\item \strong{4 Long continuous workday}: number of active hours exceeds \emph{expected
hours}, with activity happening in a continuous block (no breaks)
\item \strong{5 Always on (13h+)}: number of active hours greater than or equal to
13
}

\emph{Standard} here denotes a total number of active hours that does not
exceed the expected total number of hours, as supplied by
\code{exp_hours}. \emph{Continuous} refers to the behaviour of \emph{not} taking breaks,
i.e. no inactive hours between the first and last active hours of the day,
whereas \emph{flexible} refers to the contrary.

This is the recommended method over \code{pav} for two reasons:
\enumerate{
\item \code{bw} ignores \emph{volume effects}, whereas under \code{pav} activity volume can
still bias the results towards the 'standard working hours'.
\item It captures the intuition that each individual can have 'light' and
'heavy' weeks with respect to workload.
}

The notion of 'breaks' in the 'binary-week' method is best understood as
'recurring disconnection time'. This denotes an hourly block where there is
consistently no activity occurring throughout the week. Note that this
applies a stricter criterion compared to the common definition of a break,
which is simply a time interval where no active work is being done, and thus
the more specific terminology 'recurring disconnection time' is preferred.

In the standard plot output, the archetypes have been abbreviated to show the
following:
\itemize{
\item \strong{Low Activity} - archetype 0
\item \strong{Standard} - archetypes 1.1 and 1.2
\item \strong{Flexible} - archetypes 2.1 and 2.2
\item \strong{Long continuous} - archetype 4
\item \strong{Long flexible} - archetype 3
\item \strong{Always On} - archetype 5
}
}

\section{Person Average method}{


This method classifies each \strong{person} (based on unique \code{PersonId}) into
one of the six archetypes:
\itemize{
\item \strong{Absent}: Fewer than 10 signals over the week.
\item \strong{Extended Hours - Morning}: 15\%+ of collaboration before start hours and
less than 70\% within standard hours, and less than 15\% of collaboration after
end hours
\item \strong{Extended Hours - Evening}: Less than 15\% of collaboration before start
hours and less than 70\% within standard hours, and 15\%+ of collaboration
after end hours
\item \strong{Overnight workers}: less than 30\% of collaboration happens within
standard hours
\item \strong{Standard Hours}: over 70\% of collaboration within standard hours
\item \strong{Always On}: over 15\% of collaboration happens before the start hour and
over 15\% after the end hour (both conditions must be satisfied), and less than
70\% of collaboration within standard hours
}
}

\section{Flexibility Index}{

The Working Patterns archetypes as calculated
using the binary-week method share many similarities with the Flexibility
Index (see \code{flex_index()}):
\itemize{
\item Both are computed directly from the Hourly Collaboration Flexible Query.
\item Both apply the same binary conversion of activity on the signals from the
Hourly Collaboration Flexible Query.
}
}

\examples{
\donttest{
# Returns a plot by default
em_data \%>\% workpatterns_classify(method = "bw")

# Return an area plot
# With custom expected hours
em_data \%>\%
  workpatterns_classify(
    method = "bw",
    return = "plot-area",
    exp_hours = 7
  )

# Return a summary table
em_data \%>\% workpatterns_classify(method = "bw", return = "table")
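
# Illustrative sketch (not part of the original examples): pass the HR
# attribute and privacy threshold explicitly, using the documented defaults,
# and return the bar plot faceted by HR attribute (`bw` method only)
em_data \%>\%
  workpatterns_classify(
    method = "bw",
    hrvar = "Organization",
    mingroup = 5,
    return = "plot-hrvar"
  )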

# Use the person-average volume-based method
em_data \%>\% workpatterns_classify(method = "pav")

# Return an area plot with the person-average volume-based method
em_data \%>\% workpatterns_classify(method = "pav", return = "plot-area")
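
# Illustrative sketch (not part of the original examples): classify on
# emails only, assuming an 8 AM to 4 PM schedule (hypothetical values chosen
# for illustration), and return the data with the classified archetypes
em_data \%>\%
  workpatterns_classify(
    method = "bw",
    signals = "email",
    start_hour = "0800",
    end_hour = "1600",
    return = "data"
  )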

}

}
\seealso{
Other Clustering:
\code{\link{personas_hclust}()},
\code{\link{workpatterns_hclust}()}

Other Working Patterns:
\code{\link{flex_index}()},
\code{\link{identify_shifts}()},
\code{\link{identify_shifts_wp}()},
\code{\link{plot_flex_index}()},
\code{\link{workpatterns_area}()},
\code{\link{workpatterns_classify_bw}()},
\code{\link{workpatterns_classify_pav}()},
\code{\link{workpatterns_hclust}()},
\code{\link{workpatterns_rank}()},
\code{\link{workpatterns_report}()}
}
\author{
Ainize Cidoncha \href{mailto:ainize.cidoncha@microsoft.com}{ainize.cidoncha@microsoft.com}

Carlos Morales Torrado \href{mailto:carlos.morales@microsoft.com}{carlos.morales@microsoft.com}

Martin Chan \href{mailto:martin.chan@microsoft.com}{martin.chan@microsoft.com}
}
\concept{Clustering}
\concept{Working Patterns}