% Mirror of https://github.com/microsoft/wpa.git
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/remove_outliers.R
\name{remove_outliers}
\alias{remove_outliers}
\title{Remove outliers from a person query across time}
\usage{
remove_outliers(data, metric = "Collaboration_hours")
}
\arguments{
\item{data}{A Standard Person Query dataset in the form of a data frame.}

\item{metric}{Character string containing the name of the metric,
e.g. "Collaboration_hours".}
}
\value{
Returns a new data frame, "cleaned_data", containing all metrics, with
person-weeks removed where the selected metric falls 2 or more standard
deviations below that individual's own mean (i.e. only person-weeks with
z > -2 are kept).
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}

This function takes in a selected metric and uses the z-score (number of
standard deviations from the mean) to identify and remove outlier weeks for
individuals across time. One application is removing weeks with abnormally low
collaboration activity, e.g. holidays. Only person-weeks with z > -2 are retained.

The function is based on \code{identify_outlier()}, but implements a more elaborate
approach, as the outliers are identified and removed \strong{with respect to each
individual}, as opposed to the group. Note that \code{remove_outliers()} has a
longer runtime than \code{identify_outlier()}.
}
\details{
For mature functions to remove common outliers, please see the following:
\itemize{
\item \code{identify_holidayweeks()}
\item \code{identify_nkw()}
\item \code{identify_inactiveweeks()}
}
}
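\examples{
# Illustrative sketch only, under stated assumptions. This is NOT the
# package's implementation of remove_outliers(); it mimics the idea described
# above in base R: compute a z-score per individual and keep rows with z > -2.
# The helper `keep_above_z()`, the toy data frame `toy`, and the use of a
# `PersonId` column as the individual identifier are hypothetical.
keep_above_z <- function(data, metric = "Collaboration_hours",
                         id = "PersonId", threshold = -2) {
  x <- data[[metric]]
  # Per-individual mean and standard deviation, recycled to row length
  person_mean <- ave(x, data[[id]], FUN = mean)
  person_sd <- ave(x, data[[id]], FUN = sd)
  z <- (x - person_mean) / person_sd
  # Rows with an undefined z (a single week, or zero variance) are kept
  data[is.na(z) | z > threshold, , drop = FALSE]
}

# Person A has one abnormally low week; person B has no outliers
toy <- data.frame(
  PersonId = rep(c("A", "B"), each = 10),
  Collaboration_hours = c(rep(20, 9), 0,
                          18, 19, 20, 21, 22, 18, 19, 20, 21, 22)
)

cleaned <- keep_above_z(toy)
nrow(cleaned)
}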
\seealso{
Other Data Validation:
\code{\link{check_query}()},
\code{\link{extract_hr}()},
\code{\link{flag_ch_ratio}()},
\code{\link{flag_em_ratio}()},
\code{\link{flag_extreme}()},
\code{\link{flag_outlooktime}()},
\code{\link{hr_trend}()},
\code{\link{hrvar_count_all}()},
\code{\link{hrvar_count}()},
\code{\link{hrvar_trend}()},
\code{\link{identify_churn}()},
\code{\link{identify_holidayweeks}()},
\code{\link{identify_inactiveweeks}()},
\code{\link{identify_nkw}()},
\code{\link{identify_outlier}()},
\code{\link{identify_privacythreshold}()},
\code{\link{identify_query}()},
\code{\link{identify_shifts_wp}()},
\code{\link{identify_shifts}()},
\code{\link{identify_tenure}()},
\code{\link{standardise_pq}()},
\code{\link{subject_validate_report}()},
\code{\link{subject_validate}()},
\code{\link{track_HR_change}()},
\code{\link{validation_report}()}
}
\author{
Mark Powers \href{mailto:mark.powers@microsoft.com}{mark.powers@microsoft.com}
}
\concept{Data Validation}