% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/remove_outliers.R
\name{remove_outliers}
\alias{remove_outliers}
\title{Remove outliers from a person query across time}
\usage{
remove_outliers(data, metric = "Collaboration_hours")
}
\arguments{
\item{data}{A Standard Person Query dataset in the form of a data frame.}
\item{metric}{Character string containing the name of the metric,
e.g. "Collaboration_hours" (the default).}
}
\value{
Returns a new data frame, "cleaned_data", with all metrics,
having removed the person-weeks in which the selected metric is 2 or more
standard deviations below that individual's own mean.
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}
This function takes in a selected metric and uses z-scores (number of standard
deviations from the mean) to identify and remove outlier weeks for individuals
across time. One application is removing weeks with abnormally low
collaboration activity, e.g. holidays. Retains person-weeks with z > -2.
The function is based on \code{identify_outlier()}, but implements a more elaborate
approach, as the outliers are identified and removed \strong{with respect to each
individual}, as opposed to the group. Note that \code{remove_outliers()} has a
longer runtime than \code{identify_outlier()}.
}
\details{
For mature functions to remove common outliers, please see the following:
\itemize{
\item \code{identify_holidayweeks()}
\item \code{identify_nkw()}
\item \code{identify_inactiveweeks()}
}
}
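\section{Illustrative sketch}{
The per-individual z-score filter described above can be sketched in base R as
follows. This is a minimal illustration and not necessarily how
\code{remove_outliers()} is implemented. The toy data frame and its column names
(\code{PersonId}, \code{Collaboration_hours}) are assumptions made for the
example.

\preformatted{
# Hypothetical toy data: two people, eight weeks each.
# Column names are assumptions for illustration only.
toy <- data.frame(
  PersonId = rep(c("A", "B"), each = 8),
  Collaboration_hours = c(20, 21, 19, 20, 22, 21, 19, 2,
                          10, 11, 9, 10, 12, 11, 9, 10)
)

# z-score of each person-week relative to that person's own mean and sd
toy$z <- ave(
  toy$Collaboration_hours,
  toy$PersonId,
  FUN = function(x) (x - mean(x)) / sd(x)
)

# Retain only person-weeks with z > -2
cleaned <- toy[toy$z > -2, ]
}

In this toy data, person A's 2-hour week has a z-score of roughly -2.4 and is
dropped, while all of person B's weeks are retained. Computing the z-score
within each person, rather than across the whole group, is what distinguishes
this approach from a group-level outlier check.
}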
\seealso{
Other Data Validation:
\code{\link{check_query}()},
\code{\link{extract_hr}()},
\code{\link{flag_ch_ratio}()},
\code{\link{flag_em_ratio}()},
\code{\link{flag_extreme}()},
\code{\link{flag_outlooktime}()},
\code{\link{hr_trend}()},
\code{\link{hrvar_count_all}()},
\code{\link{hrvar_count}()},
\code{\link{hrvar_trend}()},
\code{\link{identify_churn}()},
\code{\link{identify_holidayweeks}()},
\code{\link{identify_inactiveweeks}()},
\code{\link{identify_nkw}()},
\code{\link{identify_outlier}()},
\code{\link{identify_privacythreshold}()},
\code{\link{identify_query}()},
\code{\link{identify_shifts_wp}()},
\code{\link{identify_shifts}()},
\code{\link{identify_tenure}()},
\code{\link{standardise_pq}()},
\code{\link{subject_validate_report}()},
\code{\link{subject_validate}()},
\code{\link{track_HR_change}()},
\code{\link{validation_report}()}
}
\author{
Mark Powers \href{mailto:mark.powers@microsoft.com}{mark.powers@microsoft.com}
}
\concept{Data Validation}