This commit is contained in:
Martin Chan 2020-10-26 21:21:24 +00:00
Commit 020c455952
306 changed files with 17550 additions and 0 deletions

1
.github/.gitignore Vendored Normal file
View File

@@ -0,0 +1 @@
*.html

106
.github/developer_guide.md Vendored Normal file
View File

@@ -0,0 +1,106 @@
# :hammer: Developer Guide
This is a guide for "developers", or anyone who is looking to contribute code to the {wpa} R package.
## Pre-requisites
You should have the following installed before starting:
1. [R](https://www.r-project.org/)
2. [RStudio Desktop](https://rstudio.com/products/rstudio/download/#download)
3. [GitHub Desktop](https://desktop.github.com/) (Highly recommended, unless you are very familiar with git)
You should also have the following packages installed. These should all be available from CRAN:
1. {devtools}
2. {tidyverse}
3. {roxygen2}
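A quick way to install all three from CRAN (assuming default repositories) is:
```R
install.packages(c("devtools", "tidyverse", "roxygen2"))
```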
Once these are all installed, you should be ready to roll!
---
## :hammer: Quick start guide - For developers
### Updating the package and installing it locally
Once you've made changes to a script, here is how to update the package and install it locally on your computer.
#### 1. Generating documentation
```R
roxygen2::roxygenise()
```
`roxygenise()` generates documentation files for the R package, and makes them available to the user.
#### 2. R CMD Checks
Next, run this to build and check the package. You will need {devtools} installed.
```R
devtools::check()
```
This step runs an R CMD check against the package, to verify that there will be no errors or issues if somebody installs it. The check can take around five minutes, so it is usually worth running it only after you have implemented all the desired changes, towards the end of your workflow.
#### 3. Install the package
If everything runs smoothly, run this final line to install the local package. Make sure that your working directory is set to the package directory!
```R
devtools::install()
```
---
## To increment or change a package version
Each package release has a corresponding package version (e.g. 1.4.2). To change the package version number for a particular release, simply edit the `DESCRIPTION` file at the project root folder of the {wpa} package. You can edit this with any notepad software or RStudio.
Change the values that come after `Version: ` to change the package version. Prior to incrementing the package version, please align with the main developer team on the agreed version release details.
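For reference, the version is stored as a single field in `DESCRIPTION`; the relevant line looks like the following (the number shown is only illustrative):
```
Version: 1.3.0
```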
## How to release a package version
Our current protocol for package release is to distribute it as a zipped file or a tarball. The advantage of this approach is that users do not need to have a GitHub account, and the package release can be distributed via SharePoint or even email. Our current approach is to offer **both** options for installation.
Once you are happy with the package and all checks have passed, you can run the following to generate a **zipped** distribution of the package. If your platform is Windows, this should generate a zipped file one level above your project directory:
```R
devtools::build(binary = TRUE)
```
The code to generate a **tarball** distribution is similar:
```R
devtools::build(binary = FALSE)
```
After the **zipped file** and the **tarball** files have been generated, they are ready for distribution.
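Users can then install the package directly from either file using `install.packages()` with `repos = NULL`. A minimal sketch, assuming the generated files are named as below for version 1.3.0:
```R
# Install from the tarball (source) distribution
install.packages("wpa_1.3.0.tar.gz", repos = NULL, type = "source")

# Install from the zipped (Windows binary) distribution
install.packages("wpa_1.3.0.zip", repos = NULL, type = "win.binary")
```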
## How to build the PDF documentation
It is good practice to distribute a package with an updated version of the documentation, so that users are not left referring to outdated documentation. You can generate the familiar PDF documentation by running the following:
```R
devtools::build_manual()
```
## A short guide to adding a function to the package
1. Let us assume that you have written a new function, e.g. a checking function that returns TRUE if PersonId exists in the data, and FALSE if it does not.
1. Once you check the function and ensure that it works, save this into a R file under the "R" folder of the package directory.
- Ensure that your current working directory is set to the package directory. You can either open the RStudio session with the package .Rproj file, or run `setwd()` to change your working directory. Run `getwd()` to check what your current working directory is.
- Ensure that the relevant roxygen headers are present (see [here](https://roxygen2.r-lib.org/articles/rd.html) for more reference); a sketch of a complete function file is shown after this list.
- Whilst it is possible to have more than one function saved in a single R file, we recommend saving them in their own files with their own names for clarity. For instance, the function `collaboration_dist()` would be saved in a file called _collaboration_dist.R_.
1. Follow the steps in the Developer Guide above to generate the documentation (`roxygen2::roxygenise()`), check the package (`devtools::check()`), and install the package (`devtools::install()`). 
1. Once you are sure that everything works, open **GitHub Desktop**, and **check that you have selected the relevant branch**. If you are unsure about the impact of your changes, it is always recommended that you work on a new branch so that you do not introduce any breaking changes to the master branch.
1. _Fetch / pull_ from the branch. This ensures that your local package is updated with changes that others have made to the code.
1. _Commit_ your changes, and add an intuitive commit message so others will know what you have changed in the code. Please see the wiki page of this repo for a style guide on the commit messages.
1. _Push_ your changes to the branch. If there are no errors, you should be able to see the changes on the GitHub repository!
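As an illustration of steps 1 to 3 above, here is a minimal sketch of what such a new function file might look like with its roxygen headers. The function name, the file name (_check_personid.R_) and the implementation are hypothetical:
```R
#' @title Check whether a query contains a PersonId column
#'
#' @description
#' Returns TRUE if the `PersonId` variable exists in the data,
#' and FALSE if it does not.
#'
#' @param data A person-level query in the form of a data frame.
#'
#' @examples
#' check_personid(sq_data)
#'
#' @export
check_personid <- function(data){
  "PersonId" %in% names(data)
}
```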
## Using GitHub
*It's on its way...*
## Documenting functions with {roxygen2}
*It's on its way...*
## :link: Useful links
- [Sharing Internal R Packages](https://support.rstudio.com/hc/en-us/articles/115000239587-Sharing-Internal-R-Packages)
- [Writing an R package from scratch by Hilary Parker](https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/)
- [Semantic versioning](https://semver.org/)

19
.github/pull_request_template.md Vendored Normal file
View File

@@ -0,0 +1,19 @@
# Summary
This branch *<succinct summary of the purpose>*
# Changes
The changes made in this PR are:
1. Change 1
1. Change 2
...
# Check
- [ ] Check 1
- [ ] Check 2
...
# (OPTIONAL) Note
This "fixes #<issue_number>"
*<other things, such as how to incorporate new changes>*
*<brief summary of the purpose of this pull request>*

18
.github/workflows/R-CMD-check.yaml Vendored Normal file
View File

@@ -0,0 +1,18 @@
on: [push, pull_request]
name: R-CMD-check
jobs:
R-CMD-check:
runs-on: macOS-latest
steps:
- uses: actions/checkout@v2
- uses: r-lib/actions/setup-r@master
- name: Install dependencies
run: |
install.packages(c("remotes", "rcmdcheck"))
remotes::install_deps(dependencies = TRUE)
shell: Rscript {0}
- name: Check
run: rcmdcheck::rcmdcheck(args = "--no-manual", error_on = "error")
shell: Rscript {0}

12
.gitignore Vendored Normal file
View File

@@ -0,0 +1,12 @@
.Rproj.user
.Rhistory
.Rbuildignore
_data/
_development/
.RData
doc
Meta
.RDataTmp
SQ-overview.html
wpa export 20200427_131327.png
*.html

14
CONTRIBUTING.md Normal file
View File

@@ -0,0 +1,14 @@
# Contributing
This project welcomes contributions and suggestions. Most contributions require you to
agree to a Contributor License Agreement (CLA) declaring that you have the right to,
and actually do, grant us the rights to use your contribution. For details, visit
https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need
to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the
instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

65
DESCRIPTION Normal file
View File

@@ -0,0 +1,65 @@
Package: wpa
Type: Package
Title: Tools for Analysing and Visualising Workplace Analytics data
Version: 1.3.0
Authors@R: c(
person(given = "Martin", family = "Chan", role = c("aut", "cre"), email = "martin.chan@microsoft.com"),
person(given = "Carlos", family = "Morales", role = "ctb", email = "carlos.morales@microsoft.com"),
person(given = "Mark", family = "Powers", role = "ctb", email = "mark.powers@microsoft.com"),
person(given = "Ainize", family = "Cidoncha", role = "ctb", email = "ainize.cidoncha@microsoft.com"),
person(given = "Rosamary", family = "Ochoa Vargas", role = "ctb", email = "rosamary.ochoa@microsoft.com"),
person(given = "Tannaz", family = "Sattari", role = "ctb", email = "tannaz.sattari@microsoft.com"))
Description: Opinionated functions that enable easier and faster
analysis of Workplace Analytics data. There are three main types of functions in wpa:
(i) Standard functions create a ggplot visual or a summary table based on a specific
Workplace Analytics metric; (ii) Report Generation functions generate HTML reports on
a specific analysis area, e.g. Collaboration; (iii) Other miscellaneous functions cover
more specific applications (e.g. Subject Line text mining) of Workplace Analytics data.
This package adheres to tidyverse principles and works well with the pipe syntax.
wpa is built with beginner-to-intermediate R users in mind, and is optimised for
simplicity.
URL: https://github.com/martinctc/wpa
BugReports: https://github.com/martinctc/wpa/issues
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends:
R (>= 3.1.2)
Imports:
base,
dplyr,
stats,
utils,
tibble,
tidyr,
tidyselect (>= 1.0.0),
magrittr,
purrr,
reshape2,
ggplot2,
ggrepel,
Information,
scales,
htmltools,
knitr,
markdown,
rmarkdown,
networkD3,
DT,
tidytext,
ggraph,
igraph,
widyr,
proxy,
rlang,
GGally,
network,
ggwordcloud,
methods,
data.table
RoxygenNote: 7.1.1
Roxygen: list(markdown = TRUE)
VignetteBuilder: knitr
Suggests:
extrafont
Language: en-US

2
LICENSE Normal file
View File

@@ -0,0 +1,2 @@
YEAR: 2020
COPYRIGHT HOLDER: Microsoft Corporation

23
LICENSE.txt Normal file
View File

@@ -0,0 +1,23 @@
wpa
Copyright (c) Microsoft Corporation.
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

170
NAMESPACE Normal file
View File

@@ -0,0 +1,170 @@
# Generated by roxygen2: do not edit by hand
export("%>%")
export(IV_by_period)
export(IV_report)
export(afterhours_dist)
export(afterhours_distribution)
export(afterhours_rank)
export(afterhours_sum)
export(afterhours_summary)
export(afterhours_trend)
export(analysis_scope)
export(camel_clean)
export(capacity_report)
export(check_inputs)
export(check_query)
export(coaching_report)
export(collab_sum)
export(collab_summary)
export(collaboration_area)
export(collaboration_dist)
export(collaboration_distribution)
export(collaboration_fizz)
export(collaboration_line)
export(collaboration_rank)
export(collaboration_report)
export(collaboration_sum)
export(collaboration_summary)
export(collaboration_trend)
export(comma)
export(connectivity_report)
export(copy_df)
export(create_IV)
export(create_bar)
export(create_bar_asis)
export(create_boxplot)
export(create_dist)
export(create_dt)
export(create_fizz)
export(create_line)
export(create_period_scatter)
export(create_rank)
export(create_scatter)
export(create_stacked)
export(cut_hour)
export(email_dist)
export(email_fizz)
export(email_line)
export(email_rank)
export(email_summary)
export(email_trend)
export(export)
export(external_network_plot)
export(extract_date_range)
export(extract_hr)
export(flag_ch_ratio)
export(flag_em_ratio)
export(flag_extreme)
export(flag_outlooktime)
export(g2g_network)
export(generate_report)
export(hr_trend)
export(hrvar_count)
export(hrvar_count_all)
export(identify_churn)
export(identify_holidayweeks)
export(identify_inactiveweeks)
export(identify_nkw)
export(identify_outlier)
export(identify_privacythreshold)
export(identify_query)
export(identify_shifts)
export(identify_tenure)
export(import_wpa)
export(internal_network_plot)
export(is_date_format)
export(keymetrics_scan)
export(meeting_dist)
export(meeting_distribution)
export(meeting_fizz)
export(meeting_line)
export(meeting_quality)
export(meeting_rank)
export(meeting_skim)
export(meeting_summary)
export(meeting_tm_report)
export(meeting_trend)
export(meetingtype_dist)
export(meetingtype_dist_ca)
export(meetingtype_dist_mt)
export(meetingtype_summary)
export(mgrcoatt_dist)
export(mgrrel_matrix)
export(one2one_dist)
export(one2one_line)
export(one2one_rank)
export(one2one_sum)
export(one2one_trend)
export(period_change)
export(personas_hclust)
export(read_preamble)
export(remove_outliers)
export(rgb2hex)
export(standardise_pq)
export(subject_validate)
export(subject_validate_report)
export(theme_wpa)
export(theme_wpa_basic)
export(tm_clean)
export(tm_cooc)
export(tm_freq)
export(tm_wordcloud)
export(totals_bind)
export(totals_reorder)
export(track_HR_change)
export(tstamp)
export(validation_report)
export(workloads_dist)
export(workloads_distribution)
export(workloads_line)
export(workloads_summary)
export(workloads_trend)
export(workpatterns_area)
export(workpatterns_hclust)
export(workpatterns_rank)
export(wrap)
import(DT)
import(Information)
import(dplyr)
import(ggplot2)
import(ggraph)
import(ggrepel)
import(reshape2)
import(scales)
import(tidyselect)
importFrom(data.table,"%between%")
importFrom(data.table,"%like%")
importFrom(data.table,":=")
importFrom(dplyr,`%>%`)
importFrom(dplyr,mutate_if)
importFrom(htmltools,HTML)
importFrom(igraph,graph_from_data_frame)
importFrom(magrittr,"%>%")
importFrom(markdown,markdownToHTML)
importFrom(methods,is)
importFrom(proxy,dist)
importFrom(purrr,map_if)
importFrom(purrr,pmap)
importFrom(purrr,reduce)
importFrom(rlang,"!!")
importFrom(rlang,sym)
importFrom(scales,percent)
importFrom(scales,wrap_format)
importFrom(stats,cutree)
importFrom(stats,hclust)
importFrom(stats,median)
importFrom(stats,na.omit)
importFrom(stats,rect.hclust)
importFrom(stats,reorder)
importFrom(stats,sd)
importFrom(stats,setNames)
importFrom(stats,wilcox.test)
importFrom(tidyr,gather)
importFrom(tidyr,replace_na)
importFrom(tidyr,spread)
importFrom(tidyselect,all_of)
importFrom(tidytext,unnest_tokens)
importFrom(utils,write.csv)
importFrom(utils,write.table)
importFrom(widyr,pairwise_count)

154
R/IV_by_Period.R Normal file
View File

@@ -0,0 +1,154 @@
#' @title Identify the WPA metrics that have the biggest change between two periods.
#'
#' @description
#' This function uses the Information Value algorithm to predict which WPA metrics are most explained by the change in dates.
#'
#' @param data Person Query as a data frame, including a date column named `Date`.
#' This function assumes the date format is MM/DD/YYYY, as is standard in a WpA query output.
#' @param before_start Start date of "before" time period in YYYY-MM-DD. Defaults to earliest date in dataset.
#' @param before_end End date of "before" time period in YYYY-MM-DD
#' @param after_start Start date of "after" time period in YYYY-MM-DD. Defaults to day after before_end.
#' @param after_end End date of "after" time period in YYYY-MM-DD. Defaults to latest date in dataset.
#' @param mybins Number of bins to cut the data into for Information Value analysis. Defaults to 10.
#' @param return Specify whether to return a summary table or detailed Excel and PDF files.
#' Valid inputs are "table" and "detailed". Defaults to "table".
#'
#' @import dplyr
#' @import Information
#'
#'
#' @family Flexible Input
#'
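#' @examples
#' ## Sketch of typical usage with the `sq_data` sample dataset; the date
#' ## below is a placeholder and assumes it falls within the dataset's range.
#' \dontrun{
#' IV_by_period(sq_data, before_end = "2019-12-31")
#' }
#'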
#' @export
IV_by_period <-
function(data,
before_start = min(as.Date(data$Date, "%m/%d/%Y")),
before_end,
after_start = as.Date(before_end) + 1,
after_end = max(as.Date(data$Date, "%m/%d/%Y")),
mybins = 10,
return = "table") {
## Check inputs
required_variables <- c("Date",
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
daterange_1_start <- as.Date(before_start)
daterange_1_end <- as.Date(before_end)
daterange_2_start <- as.Date(after_start)
daterange_2_end <- as.Date(after_end)
WpA_dataset <- data %>% mutate(Date = as.Date(Date, "%m/%d/%Y"))
# Check for dates in data file
if (daterange_1_start < min(WpA_dataset$Date) |
daterange_1_start > max(WpA_dataset$Date) |
daterange_1_end < min(WpA_dataset$Date) |
daterange_1_end > max(WpA_dataset$Date) |
daterange_2_start < min(WpA_dataset$Date) |
daterange_2_start > max(WpA_dataset$Date) |
daterange_2_end < min(WpA_dataset$Date) |
daterange_2_end > max(WpA_dataset$Date)) {
stop('Dates not found in dataset')
geterrmessage()
}
# Create variable => Period
WpA_dataset_table <-
WpA_dataset %>%
mutate(
Period = case_when(
Date >= daterange_1_start &
Date <= daterange_1_end ~ "Before",
Date >= daterange_2_start &
Date <= daterange_2_end ~ "After"
)
) %>% filter(Period == "Before" | Period == "After")
WpA_dataset_table <-
WpA_dataset_table %>% mutate(outcome = case_when(Period == "Before" ~ "0",
Period == 'After' ~ "1"))
# Make the use of exponential notation less likely
options(scipen = 10)
# De-select character columns
train <-
WpA_dataset_table %>%
transform(outcome = as.numeric(outcome)) %>%
select_if(is.numeric)
# Filter out NAs
train <- train %>%
filter(rowSums(is.na(.[, ])) < 1)
# Rename Outcome Variable
# train <- transform(train, outcome = as.numeric(outcome))
train <- rename(train, 'Outcome' = "outcome")
colnames(train)
# Calculate Odds
odds <-
sum(train$Outcome) / (length(train$Outcome) - sum(train$Outcome))
lnodds <- log(odds)
# IV Analysis
IV <- create_infotables(data = train, y = "Outcome", bins = mybins)
# if(return == "detailed"){
# # Ranking variables using IV
# wb <- createWorkbook()
# addWorksheet(wb, "Ranking")
# writeDataTable(wb, "Ranking", x = data.frame(IV$Summary))
#
# # Export Individual Tables
# for(i in names(IV$Tables)){
# print(i)
# addWorksheet(wb, substr(i, start = nchar(i) - 30, stop = nchar(i)))
# temp <- IV$Tables[[i]]
# temp$ODDS <- exp(temp$WOE + lnodds)
# temp$PROB <- (temp$ODDS / (temp$ODDS + 1))
# writeDataTable(wb, substr(i, start = nchar(i) - 30, stop = nchar(i)) , x = data.frame(temp))
# }
#
# # Save Workbook
# saveWorkbook(wb, "Output_IV_v2.xlsx", overwrite = TRUE)
#
# # Plot Graph
# pdf("Output_IV_v2.pdf")
# plot_infotables(IV, IV$Summary$Variable[], same_scale=TRUE)
# dev.off()
# } else
if (return == "table") {
# Store all individual dataframes
Tables <- c()
Summary <- data.frame(IV$Summary)
Tables$Summary <- Summary
for (i in names(IV$Tables)) {
temp <- IV$Tables[[i]]
temp$ODDS <- exp(temp$WOE + lnodds)
temp$PROB <- (temp$ODDS / (temp$ODDS + 1))
Tables[[i]] <- create_dt(temp, rounding = 2)
}
# Return ranking table
return(create_dt(Tables$Summary, rounding = 2))
# print("Access individual metrics via Outputs[[metric_name]], e.g., Outputs[[Workweek_span]]")
# # Store each variable's plot
# plots <- c()
# for (i in names(IV$Tables)) {
# plots[[i]] <- plot_infotables(IV, i)
# }
} else {
stop("Please enter a valid input for `return`, either detailed or table.")
}
}

72
R/IV_report.R Normal file
View File

@@ -0,0 +1,72 @@
#' @title Information Value HTML Report
#'
#' @description
#' The function generates an interactive HTML report using
#' Standard Query data as an input. The report contains a full Information Value analysis, a data exploration technique that helps determine which columns in a data set have predictive power or influence on the value of a specified dependent variable.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param predictors A character vector specifying the columns to be used as predictors.
#' Defaults to NULL, where all numeric vectors in the data will be used as predictors.
#' @param outcome A string specifying a binary variable, i.e. can only contain
#' the values 1 or 0.
#' @param bins Number of bins to use in `Information::create_infotables()`, defaults to 5.
#' @param path Pass the file path and the desired file name, _excluding the file extension_.
#' For example, "IV report".
#' @param timestamp Logical vector specifying whether to include a timestamp in the file name.
#' Defaults to TRUE.
#'
#' @family Reports
#'
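#' @examples
#' ## Sketch of typical usage: `X` is a hypothetical binary (1/0) outcome
#' ## derived from `Collaboration_hours` purely for illustration.
#' \dontrun{
#' sq_data %>%
#'   dplyr::mutate(X = ifelse(Collaboration_hours > 12, 1, 0)) %>%
#'   IV_report(outcome = "X")
#' }
#'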
#' @export
IV_report <- function(data,
predictors = NULL,
outcome,
bins = 5,
path = "IV report",
timestamp = TRUE){
## Create timestamped path (if applicable)
if(timestamp == TRUE){
newpath <- paste(path, wpa::tstamp())
} else {
newpath <- path
}
table_list <-
data %>%
create_IV(outcome = outcome,
predictors = predictors,
bins = bins,
return = "list")
table_names <- gsub("_", " ", x = names(table_list))
output_list <-
list(data %>% check_query(return = "text") %>% md2html(),
data %>% create_IV(outcome = outcome, predictors = predictors, bins = bins),
data %>% create_IV(outcome = outcome, predictors = predictors, bins = bins, return = "summary"),
data %>% create_IV(outcome = outcome, predictors = predictors, bins = bins, return = "plot-WOE")) %>%
c(table_list) %>%
purrr::map_if(is.data.frame, create_dt)
title_list <- c("Data Overview", "Top Predictors", "", "WOE Analysis", table_names)
n_title <- length(title_list)
title_levels <- c(2,2,4,2, rep(3, n_title-4))
generate_report(title = "Information Value Report",
filename = newpath,
outputs = output_list,
titles = title_list,
subheaders = rep("", n_title),
echos = rep(FALSE, n_title),
levels = title_levels,
theme = "cosmo",
preamble = read_preamble("IV_report.md"))
}

64
R/afterhours_dist.R Normal file
View File

@@ -0,0 +1,64 @@
#' @title After-Hours distribution
#' @name afterhours_dist
#'
#' @description Analyse the distribution of weekly after-hours collaboration time.
#' Returns a stacked bar plot by default.
#' Additional options available to return a table with distribution elements.
#'
#' @details
#' Uses the metric \code{After_hours_collaboration_hours}.
#'
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size, defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#' @param cut A vector specifying the cuts to use for the data,
#' accepting "default" or "range-cut" as a character vector,
#' or a numeric vector of length three to specify the exact breaks to use, e.g. c(1, 3, 5).
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom tidyr spread
#' @importFrom stats median
#' @importFrom stats sd
#'
#' @family After-Hours
#'
#' @examples
#' \dontrun{
#' ## Return a plot
#' afterhours_dist(sq_data, hrvar = "Organization")
#'
#' ## Return a table
#' afterhours_dist(sq_data, hrvar = "Organization", return = "table")
#'
#' ## Return result with custom specified breaks
#' afterhours_dist(sq_data, hrvar = "LevelDesignation", cut = c(4, 7, 9))
#' }
#' @export
afterhours_dist <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot",
cut = c(1, 2, 3)) {
create_dist(data = data,
metric = "After_hours_collaboration_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return,
cut = cut,
dist_colours = c("#FE7F4F",
"#ffdfd3",
"#bed6f2",
"#e9f1fb"))
}
#' @rdname afterhours_dist
#' @export
afterhours_distribution <- afterhours_dist

52
R/afterhours_rank.R Normal file
View File

@@ -0,0 +1,52 @@
#' @title Rank groups with high After-Hours Collaboration Hours
#'
#' @description
#' This function scans a Standard Person Query for groups with high levels of After-Hours Collaboration.
#' Returns a table with all groups (across multiple HR attributes) ranked by After-Hours Collaboration Hours.
#'
#' @details
#' Uses the metric \code{After_hours_collaboration_hours}.
#'
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar A list of HR Variables to consider in the scan.
#' Defaults to all HR attributes identified.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size.
#' Defaults to 5.
#' @param return A character vector specifying what to return.
#' Valid values include "table" (default, returning a data frame)
#' and "html" (returning an interactive DataTable).
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats reorder
#'
#' @family After-Hours
#'
#' @return
#' Returns a data frame by default, where 'table' is passed in `return`.
#' When 'html' is passed, an interactive DataTable is returned.
#'
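#' @examples
#' ## Example usage with the `sq_data` sample dataset:
#' ## return the ranked output as a data frame
#' afterhours_rank(sq_data, return = "table")
#'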
#' @export
afterhours_rank <- function(data,
hrvar = extract_hr(data),
mingroup = 5,
return = "table"){
output <-
data %>%
create_rank(metric = "After_hours_collaboration_hours",
hrvar = hrvar,
mingroup = mingroup,
return = "table")
if(return == "html"){
return(create_dt(output))
} else if(return == "table"){
return(output)
} else {
stop("Invalid `return` argument.")
}
}

47
R/afterhours_summary.R Normal file
View File

@@ -0,0 +1,47 @@
#' @title Summary of After-Hours Collaboration Hours
#'
#' @description
#' Provides an overview analysis of after-hours collaboration time.
#' Returns a bar plot showing average weekly after-hours collaboration hours by default.
#' Additional options available to return a summary table.
#'
#' @details
#' Uses the metric \code{After_hours_collaboration_hours}.
#'
#' @inheritParams create_bar
#'
#' @family After-Hours
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @examples
#' # Return a ggplot bar chart
#' afterhours_summary(sq_data, hrvar = "LevelDesignation")
#'
#' # Return a summary table
#' afterhours_summary(sq_data, hrvar = "LevelDesignation", return = "table")
#' @export
afterhours_summary <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
create_bar(data = data,
metric = "After_hours_collaboration_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return,
bar_colour = "alert")
}
#' @rdname afterhours_summary
#' @export
afterhours_sum <- afterhours_summary

106
R/afterhours_trend.R Normal file
View File

@@ -0,0 +1,106 @@
#' @title After-Hours Time Trend
#'
#' @description
#' Provides a week by week view of after-hours collaboration time.
#' By default returns a week by week heatmap, highlighting the points in time with most activity.
#' Additional options available to return a summary table.
#'
#' @details
#' Uses the metric \code{After_hours_collaboration_hours}.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#'
#' @family After-Hours
#'
#' @examples
#' ## Run plot
#' afterhours_trend(sq_data)
#'
#' ## Run table
#' afterhours_trend(sq_data, hrvar = "LevelDesignation", return = "table")
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @export
afterhours_trend <- function(data, hrvar = "Organization", mingroup=5, return = "plot"){
## Check inputs
required_variables <- c("Date",
"After_hours_collaboration_hours",
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
myTable <-
data %>%
mutate(Date=as.Date(Date, "%m/%d/%Y")) %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
select(PersonId, Date, group, After_hours_collaboration_hours) %>%
group_by(group) %>%
mutate(Employee_Count = n_distinct(PersonId)) %>%
filter(Employee_Count >= mingroup) # Keep only groups above privacy threshold
myTable <-
myTable %>%
group_by(Date, group) %>%
summarize(Employee_Count=mean(Employee_Count),
After_hours_collaboration_hours=mean(After_hours_collaboration_hours))
myTable_plot <-
myTable %>%
select(Date, group, After_hours_collaboration_hours)
myTable_return <-
myTable_plot %>%
spread(Date, After_hours_collaboration_hours)
plot_object <-
myTable_plot %>%
ggplot(aes(x =Date , y = group , fill = After_hours_collaboration_hours)) +
geom_tile(height=.5) +
scale_fill_gradient(name="Hours", low = "white", high = "red") +
theme_classic() +
theme(axis.text=element_text(size=12),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14), legend.position = "right",
legend.justification = "right",
legend.title=element_text(size=14),
legend.text=element_text(size=14)) +
labs(title = "After-Hours Collaboration",
subtitle = paste("Meeting and Email Time Outside Working Hours by", hrvar)) +
xlab("Date") +
ylab(hrvar) +
labs(caption = extract_date_range(data, return = "text"))
if(return == "table"){
myTable_return %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

101
R/capacity_report.R Normal file
View File

@@ -0,0 +1,101 @@
#' @title Generate a Capacity report in HTML
#'
#' @description
#' The function generates an interactive HTML report using the
#' Standard Person Query data as an input. The report contains a series
#' of summary analysis and visualisations relating to key **capacity**
#' metrics in Workplace Analytics, including length of week and
#' time in after-hours collaboration.
#'
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param path Pass the file path and the desired file name, _excluding the file extension_.
#' For example, "capacity report".
#' @param timestamp Logical vector specifying whether to include a timestamp in the file name.
#' Defaults to TRUE.
#'
#' @importFrom purrr map_if
#' @importFrom dplyr `%>%`
#'
#' @family Reports
#'
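#' @examples
#' ## Example usage with the `sq_data` sample dataset;
#' ## this writes an HTML file to the working directory.
#' \dontrun{
#' capacity_report(sq_data, hrvar = "Organization")
#' }
#'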
#' @export
capacity_report <- function(data,
hrvar = "Organization",
mingroup = 5,
path = "capacity report",
timestamp = TRUE){
## Create timestamped path (if applicable)
if(timestamp == TRUE){
newpath <- paste(path, wpa::tstamp())
} else {
newpath <- path
}
# Set outputs
output_list <-
list(data %>% check_query(return = "text") %>% md2html(),
md2html(text = read_preamble("workloads_section.md")), # Header
data %>% workloads_summary(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% workloads_summary(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% workloads_dist(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% workloads_dist(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% workloads_trend(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% workloads_trend(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% workloads_line(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% workloads_line(hrvar = hrvar, mingroup = mingroup, return = "table"),
md2html(text = read_preamble("afterhours_section.md")), # Header
data %>% afterhours_summary(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% afterhours_summary(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% afterhours_dist(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% afterhours_dist(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% afterhours_trend(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% afterhours_trend(hrvar = hrvar, mingroup = mingroup, return = "table")) %>%
purrr::map_if(is.data.frame, create_dt)
# Set header titles
title_list <-
c("Data Overview",
"Workloads",
"Workloads Summary - Plot",
"Workloads Summary - Table",
"Workloads Distribution - Plot",
"Workloads Distribution - Table",
"Workloads Trend - Plot",
"Workloads Trend - Table",
"Workloads over time - Plot",
"Workloads over time - Table",
"After hours",
"After hours Summary - Plot",
"After hours Summary - Table",
"After hours Distribution - Plot",
"After hours Distribution - Table",
"After hours Trend - Plot",
"After hours Trend - Table")
# Set header levels
n_title <- length(title_list)
levels_list <- rep(3, n_title)
levels_list[c(1, 2, 11)] <- 2 # Section header
generate_report(title = "Capacity Report",
filename = newpath,
outputs = output_list,
titles = title_list,
subheaders = rep("", n_title),
echos = rep(FALSE, n_title),
levels = levels_list,
theme = "cosmo",
preamble = read_preamble("capacity_report.md"))
}

206
R/check_query.R Normal file
View File

@@ -0,0 +1,206 @@
#' @title Check a query to ensure that it is suitable for analysis
#'
#' @description
#' Prints diagnostic data about the data query to the R console, with information
#' such as date range, number of employees, HR attributes identified, etc.
#'
#' @details
#' This can be used with any person-level query, such as the standard person query,
#' collaboration assessment query, and the hourly collaboration query. When run,
#' this prints diagnostic data to the R console.
#'
#' @param data A person-level query in the form of a data frame, including the standard person query,
#' collaboration assessment query, and the hourly collaboration query.
#' @param return Character string to specify whether to return a console message ("message"),
#' a character string ("text"), or a data frame. Defaults to "message".
#' @param validation Logical value to specify whether to return a check used by the `validation_report()`.
#' Defaults to FALSE.
#'
#' @examples
#' check_query(sq_data)
#'
#' @family Data Validation
#'
#' @export
check_query <- function(data, return = "message", validation = FALSE){
if(!is.data.frame(data)){
stop("Input is not a data frame.")
}
if("PersonId" %in% names(data)){
if(validation == FALSE){
check_person_query(data = data, return = return)
} else if(validation == TRUE){
# Different displays required for validation_report()
check_query_validation(data = data, return = return)
}
} else {
message("Note: checks are currently unavailable for a non-Person query")
}
}
#' @title Check a Person Query to ensure that it is suitable for analysis
#'
#' @description
#' Prints diagnostic data about the data query to the R console, with information
#' such as date range, number of employees, HR attributes identified, etc.
#'
#' @inheritParams check_query
#'
#' @details Used as part of `check_query()`.
#'
#'
check_person_query <- function(data, return){
## Query Type - Initialise
## Uses `identify_query()`
main_chunk <- paste0("The data used is a ", identify_query(data), ".")
## PersonId
if(!("PersonId" %in% names(data))){
stop("There is no `PersonId` variable in the input.")
} else {
new_chunk <- paste("There are", dplyr::n_distinct(data$PersonId), "employees in this dataset.")
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
}
## Date
if(!("Date" %in% names(data))){
stop("There is no `Date` variable in the input.")
} else {
data$Date <- as.Date(data$Date, "%m/%d/%Y")
new_chunk <- paste0("Date ranges from ", min(data$Date), " to ", max(data$Date), ".")
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
}
## HR Variables
hr_chr <- extract_hr(data, max_unique = 200) %>% wrap(wrapper = "`")
new_chunk <- paste("There are", length(hr_chr), "(estimated) HR attributes in the data:" )
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
new_chunk <- paste(hr_chr, collapse = ", ")
main_chunk <- paste(main_chunk, new_chunk, sep = "\n")
## `IsActive` flag
if(!("IsActive" %in% names(data))){
new_chunk <- "The `IsActive` flag is not present in the data."
main_chunk <- paste(main_chunk, new_chunk, sep = "\n")
} else {
data$IsActive <- as.logical(data$IsActive) # Force to logical
active_n <- dplyr::n_distinct(data[data$IsActive == TRUE, "PersonId"])
new_chunk <- paste0("There are ", active_n, " active employees out of all in the dataset.")
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
}
## Variable check header
new_chunk <- "Variable name check:"
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
## Collaboration_hours
if(!("Collaboration_hours" %in% names(data)) &
("Collaboration_hrs" %in% names(data))){
new_chunk <- "`Collaboration_hrs` is used instead of `Collaboration_hours` in the data."
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
} else if(!("Collaboration_hrs" %in% names(data)) &
("Collaboration_hours" %in% names(data))){
new_chunk <- "`Collaboration_hours` is used instead of `Collaboration_hrs` in the data."
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
} else {
new_chunk <- "No collaboration hour metric exists in the data."
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
}
## Instant_Message_hours
if(!("Instant_message_hours" %in% names(data)) &
("Instant_Message_hours" %in% names(data))){
new_chunk <- "`Instant_Message_hours` is used instead of `Instant_message_hours` in the data."
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
} else if(!("Instant_Message_hours" %in% names(data)) &
("Instant_message_hours" %in% names(data))){
new_chunk <- "`Instant_message_hours` is used instead of `Instant_Message_hours` in the data."
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
} else {
new_chunk <- "No instant message hour metric exists in the data."
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
}
## Return
if(return == "message"){
main_chunk <- paste("", main_chunk, sep = "\n")
message(main_chunk)
} else if(return == "text"){
main_chunk
} else {
stop("Please check inputs for `return`")
}
}
#' @title Perform a query check for the validation report
#'
#' @description
#' Prints diagnostic data about the data query to the R console, with information
#' such as date range, number of employees, HR attributes identified, etc.
#' Optimised for the `validation_report()`.
#'
#' @inheritParams check_query
#'
#' @details Used as part of `check_query()`.
#'
check_query_validation <- function(data, return){
## Query Type - Initialise
main_chunk <- ""
## PersonId
if(!("PersonId" %in% names(data))){
stop("There is no `PersonId` variable in the input.")
} else {
new_chunk <- paste("There are", dplyr::n_distinct(data$PersonId), "employees in this dataset.")
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
}
## Date
if(!("Date" %in% names(data))){
stop("There is no `Date` variable in the input.")
} else {
data$Date <- as.Date(data$Date, "%m/%d/%Y")
new_chunk <- paste0("Date ranges from ", min(data$Date), " to ", max(data$Date), ".")
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
}
## HR Variables
hr_chr <- extract_hr(data, max_unique = 200) %>% wrap(wrapper = "`")
new_chunk <- paste("There are", length(hr_chr), "(estimated) HR attributes in the data:" )
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
new_chunk <- paste(hr_chr, collapse = ", ")
main_chunk <- paste(main_chunk, new_chunk, sep = "\n")
## `IsActive` flag
if(!("IsActive" %in% names(data))){
new_chunk <- "The `IsActive` flag is not present in the data."
main_chunk <- paste(main_chunk, new_chunk, sep = "\n")
} else {
data$IsActive <- as.logical(data$IsActive) # Force to logical
active_n <- dplyr::n_distinct(data[data$IsActive == TRUE, "PersonId"])
new_chunk <- paste0("There are ", active_n, " active employees out of all in the dataset.")
main_chunk <- paste(main_chunk, new_chunk, sep = "\n\n")
}
## Return
if(return == "message"){
main_chunk <- paste("", main_chunk, sep = "\n")
message(main_chunk)
} else if(return == "text"){
main_chunk
} else {
stop("Please check inputs for `return`")
}
}

81
R/coaching_report.R Normal file
View File

@@ -0,0 +1,81 @@
#' @title Generate a Coaching report in HTML
#'
#' @description
#' The function generates an interactive HTML report using
#' Standard Query data as an input. The report contains a series
#' of summary analysis and visualisations relating to key **coaching**
#' metrics in Workplace Analytics, specifically relating to the time
#' spent between managers and their direct reports.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "LevelDesignation"
#' but accepts any character vector, e.g. "Organization"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param path Pass the file path and the desired file name, _excluding the file extension_.
#' For example, "coaching report".
#' @param timestamp Logical vector specifying whether to include a timestamp in the file name.
#' Defaults to TRUE.
#'
#' @importFrom purrr map_if
#' @importFrom dplyr `%>%`
#'
#' @family Reports
#'
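#' @examples
#' ## Example usage with the `sq_data` sample dataset;
#' ## this writes an HTML file to the working directory.
#' \dontrun{
#' coaching_report(sq_data, hrvar = "LevelDesignation")
#' }
#'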
#' @export
coaching_report <- function(data,
hrvar = "LevelDesignation",
mingroup = 5,
path = "coaching report",
timestamp = TRUE){
## Create timestamped path (if applicable)
if(timestamp == TRUE){
newpath <- paste(path, wpa::tstamp())
} else {
newpath <- path
}
output_list <-
list(data %>% check_query(return = "text") %>% md2html(),
data %>% mgrrel_matrix(hrvar = hrvar, return = "plot"), # no mingroup arg
data %>% mgrrel_matrix(hrvar = hrvar, return = "table"), # no mingroup arg
data %>% one2one_sum(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% one2one_sum(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% one2one_dist(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% one2one_dist(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% one2one_trend(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% one2one_trend(hrvar = hrvar, mingroup = mingroup, return = "table")) %>%
purrr::map_if(is.data.frame, create_dt)
title_list <-
c("Data Overview",
"Manager Relation Style - Plot",
"Manager Relation Style - Table",
"1-to-1 Summary - Plot",
"1-to-1 Summary - Table",
"1-to-1 Distribution - Plot",
"1-to-1 Distribution - Table",
"1-to-1 Trend - Plot",
"1-to-1 Trend - Table")
n_title <- length(title_list)
generate_report(title = "Coaching Report",
filename = newpath,
outputs = output_list,
titles = title_list,
subheaders = rep("", n_title),
echos = rep(FALSE, n_title),
levels = rep(3, n_title),
theme = "cosmo",
preamble = read_preamble("coaching_report.md"))
}

134
R/collaboration_area.R Normal file
View File

@@ -0,0 +1,134 @@
#' @title Collaboration - Stacked Area Plot
#'
#' @description
#' Provides an overview analysis of 'Weekly Digital Collaboration'.
#' Returns a stacked area plot of Email and Meeting Hours by default.
#' Additional options available to return a summary table.
#'
#' @details
#' Uses the metrics `Meeting_hours`, `Email_hours`, `Unscheduled_Call_hours`,
#' and `Instant_Message_hours`.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' A Collaboration assessment dataset may also be provided, in which
#' case Unscheduled call hours would be included in the output.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#'
#' @family Collaboration
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
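#' @examples
#' ## Example usage with the `sq_data` sample dataset:
#' \dontrun{
#' ## Return a stacked area plot
#' collaboration_area(sq_data, hrvar = "Organization")
#'
#' ## Return a summary table
#' collaboration_area(sq_data, hrvar = "Organization", return = "table")
#' }
#'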
#' @export
collaboration_area <- function(data,
hrvar = "Organization",
mingroup=5,
return = "plot"){
data$Date <- as.Date(data$Date, format = "%m/%d/%Y")
if("Instant_message_hours" %in% names(data)){
data <- rename(data, Instant_Message_hours = "Instant_message_hours")
}
if("Unscheduled_call_hours" %in% names(data)){
data <- rename(data, Unscheduled_Call_hours = "Unscheduled_call_hours")
}
if("Unscheduled_Call_hours" %in% names(data)){
main_vars <- c("Meeting_hours",
"Email_hours",
"Instant_Message_hours",
"Unscheduled_Call_hours")
} else {
main_vars <- c("Meeting_hours",
"Email_hours",
"Instant_Message_hours")
}
myTable <-
data %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
select(PersonId,
Date,
group,
main_vars) %>%
group_by(Date, group) %>%
summarise_at(vars(main_vars), ~mean(.)) %>%
left_join(hrvar_count(data, hrvar, return = "table"),
by = c("group" = hrvar)) %>%
rename(Employee_Count = "n") %>%
filter(Employee_Count >= mingroup) %>%
ungroup()
myTable_long <-
myTable %>%
select(Date, group, ends_with("_hours")) %>%
gather(Metric, Hours, -Date, -group) %>%
mutate(Metric = sub(pattern = "_hours", replacement = "", x = Metric))
## Levels
level_chr <- sub(pattern = "_hours", replacement = "", x = main_vars)
## Colour definitions
colour_defs <-
c("Meeting" = rgb2hex(174, 239, 168),
"Email" = rgb2hex(192, 117, 205),
"Instant_Message" = rgb2hex(113, 200, 234),
"Unscheduled_Call" = rgb2hex(252, 161, 124))
colour_defs <- colour_defs[names(colour_defs) %in% level_chr]
plot_object <-
myTable_long %>%
mutate(Metric = factor(Metric, levels = level_chr)) %>%
ggplot(aes(x = Date, y = Hours, colour = Metric)) +
geom_area(aes(fill = Metric), alpha = 1.0, position = 'stack') +
theme_wpa_basic() +
scale_y_continuous(labels = round) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_colour_manual(values = colour_defs) +
scale_fill_manual(values = colour_defs) +
facet_wrap(.~group) +
labs(title = "Total Collaboration Hours",
subtitle = paste("Weekly collaboration hours by", camel_clean(hrvar))) +
labs(caption = extract_date_range(data, return = "text"))
if(return == "table"){
myTable %>%
as_tibble() %>%
mutate(Collaboration_hours = select(., main_vars) %>%
apply(1, sum, na.rm = TRUE))
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

40
R/collaboration_dist.R Normal file
View File

@@ -0,0 +1,40 @@
#' @title Collaboration Hours distribution
#'
#' @description
#' Analyze the distribution of Collaboration Hours.
#' Returns a stacked bar plot by default.
#' Additional options available to return a table with distribution elements.
#'
#' @details
#' Uses the metric `Collaboration_hours`.
#'
#' @inheritParams create_dist
#' @family Collaboration
#'
#' @examples
#' ## Return a plot
#' collaboration_dist(sq_data, hrvar = "Organization")
#'
#' ## Return a table
#' collaboration_dist(sq_data, hrvar = "Organization", return = "table")
#' @export
collaboration_dist <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot",
cut = c(15, 20, 25)) {
create_dist(data = data,
metric = "Collaboration_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return,
cut = cut)
}
#' @rdname collaboration_dist
#' @export
collaboration_distribution <- collaboration_dist

30
R/collaboration_fizz.R Normal file
View File

@@ -0,0 +1,30 @@
#' @title Distribution of Collaboration Hours (Fizzy Drink plot)
#'
#' @description
#' Analyze weekly collaboration hours distribution, and returns
#' a 'fizzy' scatter plot by default.
#' Additional options available to return a table with distribution elements.
#'
#' @details
#' Uses the metric `Collaboration_hours`.
#'
#' @inheritParams create_fizz
#'
#' @family Collaboration
#'
#' @examples
#' collaboration_fizz(sq_data, hrvar = "Organization", return = "table")
#' @export
collaboration_fizz <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
create_fizz(data = data,
metric = "Collaboration_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return)
}

61
R/collaboration_line.R Normal file
View File

@@ -0,0 +1,61 @@
#' @title Collaboration Time Trend - Line Chart
#'
#' @description
#' Provides a week by week view of collaboration time, visualised as line charts.
#' By default returns a line chart for collaboration hours,
#' with a separate panel per value in the HR attribute.
#' Additional options available to return a summary table.
#'
#' @details
#' Uses the metric `Collaboration_hours`.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#'
#' @family Collaboration
#'
#' @examples
#'
#' ## Return a line plot
#' collaboration_line(sq_data, hrvar = "LevelDesignation")
#'
#'
#' ## Return a table
#' collaboration_line(sq_data, hrvar = "LevelDesignation", return = "table")
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @export
collaboration_line <- function(data,
hrvar = "Organization",
mingroup=5,
return = "plot"){
## Inherit arguments
output <- create_line(data = data,
metric = "Collaboration_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return)
if(return == "plot"){
output +
labs(title = "Collaboration Hours")
} else if(return == "table"){
output
} else {
stop("Invalid `return` value")
}
}

50
R/collaboration_rank.R Normal file
View File

@@ -0,0 +1,50 @@
#' @title Collaboration Ranking
#'
#' @description
#' This function scans a standard query output for groups with high levels of 'Weekly Digital Collaboration'.
#' Returns a table with all groups (across multiple HR attributes) ranked by hours of digital collaboration.
#'
#' @details
#' Uses the metric `Collaboration_hours`.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar A list of HR Variables to consider in the scan.
#' Defaults to all HR attributes identified.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size.
#' Defaults to 5.
#' @param return A character vector specifying what to return.
#' Valid values include "table" (default, returning a data frame)
#' and "html" (returning an interactive DataTable).
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats reorder
#'
#' @family Collaboration
#'
#' @return
#' Returns a data frame by default, where 'table' is passed in `return`.
#' When 'html' is passed, an interactive DataTable is returned.
#'
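#' @examples
#' ## Example usage with the `sq_data` sample dataset:
#' ## return the ranked output as a data frame
#' collaboration_rank(sq_data, return = "table")
#'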
#' @export
collaboration_rank <- function(data,
hrvar = extract_hr(data),
mingroup = 5,
return = "table"){
output <-
data %>% create_rank(metric="Collaboration_hours", hrvar = hrvar,
mingroup = mingroup,
return = "table")
if(return == "html"){
return(create_dt(output))
} else if(return == "table"){
return(output)
} else {
stop("Invalid `return` argument.")
}
}

125
R/collaboration_report.R Normal file
View File

@@ -0,0 +1,125 @@
#' @title Generate a Collaboration report in HTML
#'
#' @description
#' The function generates an interactive HTML report using
#' Standard Query data as an input. The report contains a series
#' of summary analysis and visualisations relating to key **collaboration**
#' metrics in Workplace Analytics, including email and meeting hours.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param path Pass the file path and the desired file name, _excluding the file extension_.
#' For example, "collaboration report".
#' @param timestamp Logical vector specifying whether to include a timestamp in the file name.
#' Defaults to TRUE.
#'
#' @importFrom purrr map_if
#' @importFrom dplyr `%>%`
#'
#' @family Reports
#'
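#' @examples
#' ## Example usage with the `sq_data` sample dataset;
#' ## this writes an HTML file to the working directory.
#' \dontrun{
#' collaboration_report(sq_data, hrvar = "Organization")
#' }
#'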
#' @export
collaboration_report <- function(data,
hrvar = "Organization",
mingroup = 5,
path = "collaboration report",
timestamp = TRUE){
## Create timestamped path (if applicable)
if(timestamp == TRUE){
newpath <- paste(path, wpa::tstamp())
} else {
newpath <- path
}
# Set outputs
output_list <-
list(data %>% check_query(return = "text") %>% md2html(),
md2html(text = read_preamble("collaboration_section.md")), # Header
data %>% collaboration_sum(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% collaboration_sum(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% afterhours_sum(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% afterhours_sum(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% collaboration_dist(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% collaboration_dist(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% collaboration_trend(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% collaboration_trend(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% collaboration_line(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% collaboration_line(hrvar = hrvar, mingroup = mingroup, return = "table"),
md2html(text = read_preamble("email_section.md")), # Header
data %>% email_summary(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% email_summary(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% email_dist(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% email_dist(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% email_trend(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% email_trend(hrvar = hrvar, mingroup = mingroup, return = "table"),
md2html(text = read_preamble("meeting_section.md")), # Header
data %>% meeting_summary(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% meeting_summary(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% meeting_dist(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% meeting_dist(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% meeting_trend(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% meeting_trend(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% meeting_quality(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% meeting_quality(hrvar = hrvar, mingroup = mingroup, return = "table")) %>%
purrr::map_if(is.data.frame, create_dt)
# Set header titles
title_list <-
c("Data Overview",
"Collaboration", # Section header
"Collaboration Summary - Plot",
"Collaboration Summary - Table",
"Afterhours Summary - Plot",
"Afterhours Summary - Table",
"Collaboration Distribution - Plot",
"Collaboration Distribution - Table",
"Collaboration Trend - Plot",
"Collaboration Trend - Table",
"Collaboration over time - Plot",
"Collaboration over time - Table",
"Email", # Section header
"Email Summary - Plot",
"Email Summary - Table",
"Email Distribution - Plot",
"Email Distribution - Table",
"Email Trend - Plot",
"Email Trend - Table",
"Meeting", # Section header
"Meeting Summary - Plot",
"Meeting Summary - Table",
"Meeting Distribution - Plot",
"Meeting Distribution - Table",
"Meeting Trend - Plot",
"Meeting Trend - Table",
"Meeting Quality - Plot",
"Meeting Quality - Table")
# Set header levels
n_title <- length(title_list)
levels_list <- rep(3, n_title)
levels_list[c(1, 2, 13, 20)] <- 2 # Section header
# Generate report
generate_report(title = "Collaboration Report",
filename = newpath,
outputs = output_list,
titles = title_list,
subheaders = rep("", n_title),
echos = rep(FALSE, n_title),
levels = levels_list,
theme = "cosmo",
preamble = read_preamble("collaboration_report.md"))
}

R/collaboration_sum.R Normal file

@ -0,0 +1,79 @@
#' @title Collaboration Summary
#'
#' @description
#' Provides an overview analysis of 'Weekly Digital Collaboration'.
#' Returns a stacked bar plot of Email and Meeting Hours by default.
#' Additional options available to return a summary table.
#'
#' @details
#' Uses the metrics `Meeting_hours`, `Email_hours`, `Unscheduled_Call_hours`,
#' and `Instant_Message_hours` where available; if unscheduled call data is not
#' present in the query, only `Meeting_hours` and `Email_hours` are used.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats reorder
#'
#' @family Collaboration
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
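#' @examples
#' \dontrun{
#' # Illustrative usage with the built-in `sq_data` sample dataset
#' collaboration_sum(sq_data, hrvar = "LevelDesignation", return = "plot")
#'
#' # Return a summary table instead of a plot
#' collaboration_sum(sq_data, hrvar = "LevelDesignation", return = "table")
#' }
#'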
#' @export
collaboration_sum <- function(data,
hrvar = "Organization",
mingroup=5,
return = "plot"){
if("Instant_message_hours" %in% names(data)){
data <- rename(data, Instant_Message_hours = "Instant_message_hours")
}
if("Unscheduled_Call_hours" %in% names(data)){
main_vars <- c("Meeting_hours",
"Email_hours",
"Instant_Message_hours",
"Unscheduled_Call_hours")
} else {
main_vars <- c("Meeting_hours",
"Email_hours")
}
create_stacked(data = data,
hrvar = hrvar,
metrics = main_vars,
mingroup = mingroup,
return = return)
}
#' @rdname collaboration_sum
#' @export
collab_sum <- collaboration_sum
#' @rdname collaboration_sum
#' @export
collaboration_summary <- collaboration_sum
#' @rdname collaboration_sum
#' @export
collab_summary <- collaboration_sum

R/collaboration_trend.R Normal file

@ -0,0 +1,100 @@
#' @title Collaboration Time Trend
#'
#' @description
#' Provides a week by week view of collaboration time.
#' By default returns a week by week heatmap, highlighting the points in time with most activity.
#' Additional options available to return a summary table.
#'
#' @details
#' Uses the metric `Collaboration_hours`.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#'
#' @family Collaboration
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
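#' @examples
#' \dontrun{
#' # Illustrative usage with the built-in `sq_data` sample dataset
#' collaboration_trend(sq_data, hrvar = "LevelDesignation")
#'
#' # Return a week-by-week table instead of a heatmap
#' collaboration_trend(sq_data, hrvar = "LevelDesignation", return = "table")
#' }
#'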
#' @export
collaboration_trend <- function(data,
hrvar = "Organization",
mingroup=5,
return = "plot"){
## Check inputs
required_variables <- c("Date",
"Collaboration_hours",
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
myTable <-
data %>%
mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
select(PersonId, Date, group, Collaboration_hours) %>%
group_by(group) %>%
mutate(Employee_Count = n_distinct(PersonId)) %>%
filter(Employee_Count >= mingroup) # Keep only groups above privacy threshold
myTable <-
myTable %>%
group_by(Date, group) %>%
summarize(Employee_Count = mean(Employee_Count),
Collaboration_hours = mean(Collaboration_hours))
myTable_plot <- myTable %>% select(Date, group, Collaboration_hours)
myTable_return <- myTable_plot %>% spread(Date, Collaboration_hours)
plot_object <-
myTable_plot %>%
ggplot(aes(x = Date , y = group , fill = Collaboration_hours)) +
geom_tile(height=.5) +
scale_fill_gradient(name="Hours", low = "white", high = "red") +
theme_classic() +
theme(axis.text=element_text(size=12),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14),
legend.position = "right",
legend.justification = "right",
legend.title=element_text(size=14),
legend.text=element_text(size=14)) +
labs(title = "Collaboration Hours",
subtitle = paste("Total meeting and email time by", tolower(hrvar))) +
xlab("Date") +
ylab(hrvar) +
labs(caption = extract_date_range(data, return = "text"))
if(return == "table"){
myTable_return %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

R/connectivity_report.R Normal file

@ -0,0 +1,73 @@
#' @title Generate a Connectivity report in HTML
#'
#' @description
#' The function generates an interactive HTML report using
#' Standard Query data as an input. The report contains a series
#' of summary analysis and visualisations relating to key **connectivity**
#' metrics in Workplace Analytics, including external/internal network size
#' vs breadth.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param path Pass the file path and the desired file name, _excluding the file extension_.
#' For example, "collaboration report".
#' @param timestamp Logical vector specifying whether to include a timestamp in the file name.
#' Defaults to TRUE.
#'
#' @importFrom purrr map_if
#' @importFrom dplyr `%>%`
#'
#' @family Reports
#'
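#' @examples
#' \dontrun{
#' # Illustrative usage only (not run): writes an HTML report to the working
#' # directory, using the built-in `sq_data` sample dataset as the input query
#' connectivity_report(sq_data, hrvar = "LevelDesignation")
#' }
#'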
#' @export
connectivity_report <- function(data,
hrvar = "LevelDesignation",
mingroup = 5,
path = "connectivity report",
timestamp = TRUE){
## Create timestamped path (if applicable)
if(timestamp == TRUE){
newpath <- paste(path, wpa::tstamp())
} else {
newpath <- path
}
output_list <-
list(data %>% check_query(return = "text") %>% md2html(),
data %>% external_network_plot(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% external_network_plot(hrvar = hrvar, mingroup = mingroup, return = "table"),
data %>% internal_network_plot(hrvar = hrvar, mingroup = mingroup, return = "plot"),
data %>% internal_network_plot(hrvar = hrvar, mingroup = mingroup, return = "table")) %>%
purrr::map_if(is.data.frame, create_dt)
title_list <-
c("Data Overview",
"External network - Plot",
"External network - Table",
"Internal network - Plot",
"Internal network - Table")
n_title <- length(title_list)
generate_report(title = "Connectivity Report",
filename = newpath,
outputs = output_list,
titles = title_list,
subheaders = rep("", n_title),
echos = rep(FALSE, n_title),
levels = rep(3, n_title),
theme = "cosmo",
preamble = read_preamble("connectivity_report.md"))
}

R/copy_df.R Normal file

@ -0,0 +1,32 @@
#' @title Copy a data frame to clipboard for pasting in Excel
#'
#' @description
#' This is a pipe-optimised function that feeds into `wpa::export()`,
#' but can be used as a stand-alone function.
#'
#' Based on the original function at http://www.github.com/martinctc/surveytoolbox.
#'
#' @param x Data frame to be passed through. Cannot contain list-columns or nested data frames.
#' @param row.names A logical value specifying whether to include row names. Defaults to FALSE.
#' @param col.names A logical value specifying whether to include column names. Defaults to TRUE.
#' @param quietly Set this to TRUE to suppress printing the data frame to the console. Defaults to FALSE.
#' @param ... Additional arguments for write.table().
#'
#' @importFrom utils write.table
#'
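#' @examples
#' \dontrun{
#' # Illustrative usage only (not run): copies a summary table to the clipboard
#' # for pasting into Excel; requires an interactive session with clipboard access
#' sq_data %>%
#' collaboration_sum(return = "table") %>%
#' copy_df()
#' }
#'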
#' @export
copy_df <- function(x,
row.names = FALSE,
col.names = TRUE,
quietly = FALSE, ...) {
utils::write.table(x,"clipboard-50000",
sep="\t",
row.names=row.names,
col.names=col.names,...)
if(quietly==FALSE) print(x)
}

R/create_IV.R Normal file

@ -0,0 +1,92 @@
#' @title Calculate Information Value for a selected outcome variable
#'
#' @description
#' Specify an outcome variable and return IV outputs.
#' By default, all numeric variables in the dataset are used as predictor variables.
#'
#' @param data A Person Query dataset in the form of a data frame.
#' @param predictors A character vector specifying the columns to be used as predictors.
#' Defaults to NULL, where all numeric vectors in the data will be used as predictors.
#' @param outcome A string specifying a binary variable, i.e. can only contain
#' the values 1 or 0.
#' @param bins Number of bins to use in `Information::create_infotables()`, defaults to 5.
#' @param return String specifying what output to return.
#' Defaults to "plot", which returns a bar plot summarising the information value.
#' "summary" returns a summary table, "list" returns a list of outputs for all the
#' input variables, and "plot-WOE" returns Weight of Evidence (WOE) plots for the predictors.
#'
#' @import dplyr
#'
#' @family Information Value
#'
#' @examples
#' \dontrun{
#' sq_data %>%
#' mutate(X = ifelse(Collaboration_hours > 12, 1, 0)) %>%
#' create_IV(outcome = "X")
#' sq_data %>%
#' mutate(X = ifelse(Collaboration_hours > 12, 1, 0)) %>%
#' create_IV(outcome = "X",
#' predictors = c("Email_hours", "Meeting_hours"),
#' return = "list")
#' }
#'
#' @export
create_IV <- function(data,
predictors = NULL,
outcome,
bins = 5,
return = "plot"){
if(is.null(predictors)){
train <-
data %>%
rename(outcome = outcome) %>%
select_if(is.numeric) %>%
tidyr::drop_na()
} else {
train <-
data %>%
rename(outcome = outcome) %>%
select(predictors, outcome) %>%
tidyr::drop_na()
}
# Calculate Odds
odds <- sum(train$outcome) / (length(train$outcome) - sum(train$outcome))
lnodds <- log(odds)
# IV Analysis
IV <- Information::create_infotables(data = train, y = "outcome", bins = bins)
IV_summary <- IV$Summary
IV_names <- names(IV$Tables)
# Output list
output_list <-
IV_names %>%
purrr::map(function(x){
IV$Tables[[x]] %>%
mutate(ODDS = exp(WOE + lnodds),
PROB = ODDS / (ODDS + 1))
}) %>%
purrr::set_names(IV_names)
if(return == "summary"){
IV_summary
} else if(return == "plot"){
IV_summary %>%
utils::head(12) %>%
create_bar_asis(group_var = "Variable",
bar_var = "IV",
title = "Information Value (IV)",
subtitle = "Showing top 12 only")
} else if(return == "plot-WOE"){
Information::plot_infotables(IV, IV$Summary$Variable, same_scale = TRUE) %>% grDevices::recordPlot()
} else if(return == "list"){
output_list
} else {
stop("Please enter a valid input for `return`.")
}
}

R/create_bar.R Normal file

@ -0,0 +1,168 @@
#' @title Mean Bar Plot (General Purpose)
#'
#' @description
#' Provides an overview analysis of a selected metric by calculating a mean per metric.
#' Returns a bar plot showing the average of a selected metric by default.
#' Additional options available to return a summary table.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param metric Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#' @param bar_colour String to specify colour to use for bars.
#' In-built accepted values include "default" (default), "alert" (red), and
#' "darkblue". Otherwise, hex codes are also accepted. You can also supply
#' RGB values via `rgb2hex()`.
#' @param na.rm A logical value indicating whether NA values should be stripped
#' before the computation proceeds. Defaults to FALSE.
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @importFrom scales wrap_format
#' @importFrom stats reorder
#'
#' @family General
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @examples
#' # Return a ggplot bar chart
#' create_bar(sq_data, metric = "Collaboration_hours", hrvar = "LevelDesignation")
#' create_bar(sq_data, metric = "Generated_workload_email_hours")
#' create_bar(sq_data, metric = "After_hours_collaboration_hours")
#'
#' # Return a summary table
#' create_bar(sq_data, metric = "Collaboration_hours", hrvar = "LevelDesignation", return = "table")
#' @export
create_bar <- function(data,
metric,
hrvar = "Organization",
mingroup = 5,
return = "plot",
bar_colour = "default",
na.rm = FALSE){
## Check inputs
required_variables <- c("Date",
metric,
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = " ", x = metric)
## Data for bar plot
plot_data <-
data %>%
rename(group = !!sym(hrvar)) %>%
group_by(PersonId, group) %>%
summarise(!!sym(metric) := mean(!!sym(metric), na.rm = na.rm)) %>%
ungroup() %>%
left_join(data %>%
rename(group = !!sym(hrvar)) %>%
group_by(group) %>%
summarise(Employee_Count = n_distinct(PersonId)),
by = "group") %>%
filter(Employee_Count >= mingroup)
## Colour bar override
if(bar_colour == "default"){
bar_colour <- rgb2hex(0, 120, 212)
} else if(bar_colour == "alert"){
bar_colour <- rgb2hex(215, 59, 1)
} else if(bar_colour == "darkblue"){
bar_colour <- rgb2hex(0, 32, 80)
}
## Employee count / base size table
plot_legend <-
plot_data %>%
group_by(group) %>%
summarize(Employee_Count = first(Employee_Count)) %>%
mutate(Employee_Count = paste("n=",Employee_Count))
## Data for bar plot
plot_table <-
plot_data %>%
group_by(group) %>%
summarise_at(metric, ~mean(., na.rm = na.rm)) %>%
arrange(desc(!!sym(metric)))
## Table for annotation
annot_table <-
plot_legend %>%
dplyr::left_join(plot_table, by = "group")
## Location attribute for x axis
location <- plot_table %>% select(!!sym(metric)) %>% max()
## Bar plot
plot_object <-
plot_table %>%
ggplot(aes(x = stats::reorder(group, !!sym(metric)), y = !!sym(metric))) +
geom_bar(stat = "identity",
fill = bar_colour) +
geom_text(aes(label = round(!!sym(metric), 1)),
hjust = 1.3,
color = "#FFFFFF",
fontface = "bold",
size = 4) +
scale_y_continuous(limits = c(0, location * 1.25)) +
annotate("text",
x = plot_legend$group,
y = location * 1.15,
label = plot_legend$Employee_Count) +
annotate("rect", xmin = 0.5, xmax = length(plot_legend$group) + 0.5, ymin = location * 1.05, ymax = location * 1.25, alpha = .2) +
coord_flip() +
theme_classic() +
theme(axis.text = element_text(size=12),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14),
legend.position = "top",
legend.justification = "right",
legend.title = element_text(size=14),
legend.text = element_text(size=14)) +
labs(title = clean_nm,
subtitle = paste("Average", clean_nm, "by", camel_clean(hrvar))) +
xlab(camel_clean(hrvar)) +
ylab(paste("Average weekly", clean_nm)) +
labs(caption = extract_date_range(data, return = "text"))
summary_table <-
plot_data %>%
select(group, !!sym(metric)) %>%
group_by(group) %>%
summarise(!!sym(metric) := mean(!!sym(metric)),
n = n())
if(return == "table"){
return(summary_table)
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

R/create_bar_asis.R Normal file

@ -0,0 +1,87 @@
#' @title Create a bar chart without aggregation
#'
#' @description
#' This function creates a bar chart directly from the aggregated / summarised data.
#' Unlike `create_bar()`, which performs a person-level aggregation, `create_bar_asis()`
#' performs no calculation and renders the values exactly as they are passed into
#' the function.
#'
#' @param data Plotting data as a data frame.
#' @param group_var String containing name of variable for the group.
#' @param bar_var String containing name of variable representing the value of the bars.
#' @param title Title of the plot.
#' @param subtitle Subtitle of the plot.
#' @param caption Caption of the plot.
#' @param ylab Y-axis label for the plot (group axis)
#' @param xlab X-axis label of the plot (bar axis).
#' @param percent Logical value to determine whether to show labels as percentage signs. Defaults to FALSE.
#' @param bar_colour String to specify colour to use for bars.
#' In-built accepted values include "default" (default), "alert" (red), and
#' "darkblue". Otherwise, hex codes are also accepted. You can also supply
#' RGB values via `rgb2hex()`.
#'
#' @import ggplot2
#' @import dplyr
#'
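#' @examples
#' \dontrun{
#' # A minimal sketch using a small, made-up summary data frame
#' org_summary <- data.frame(Organization = c("Sales", "Finance", "HR"),
#' Email_hours = c(12.4, 9.1, 7.8))
#'
#' create_bar_asis(org_summary,
#' group_var = "Organization",
#' bar_var = "Email_hours",
#' title = "Average email hours",
#' subtitle = "Illustrative data only")
#' }
#'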
#' @export
create_bar_asis <- function(data,
group_var,
bar_var,
title = NULL,
subtitle = NULL,
caption = NULL,
ylab = group_var,
xlab = bar_var,
percent = FALSE,
bar_colour = "default"){
## Colour bar override
if(bar_colour == "default"){
bar_colour <- rgb2hex(0, 120, 212)
} else if(bar_colour == "alert"){
bar_colour <- rgb2hex(215, 59, 1)
} else if(bar_colour == "darkblue"){
bar_colour <- rgb2hex(0, 32, 80)
}
up_break <- max(data[[bar_var]], na.rm = TRUE) * 1.3
if(percent == FALSE){
returnPlot <-
data %>%
ggplot(aes(x = reorder(!!sym(group_var), !!sym(bar_var)), y = !!sym(bar_var))) +
geom_col(fill = bar_colour) +
geom_text(aes(label = round(!!sym(bar_var), 2)),
hjust = -0.25,
color = "#000000",
fontface = "bold",
size = 4)
} else if(percent == TRUE){
returnPlot <-
data %>%
ggplot(aes(x = reorder(!!sym(group_var), !!sym(bar_var)), y = !!sym(bar_var))) +
geom_col(fill = bar_colour) +
geom_text(aes(label = scales::percent(!!sym(bar_var), accuracy = 1)),
hjust = -0.25,
color = "#000000",
fontface = "bold",
size = 4)
}
returnPlot +
scale_y_continuous(limits = c(0, up_break)) +
coord_flip() +
labs(title = title,
subtitle = subtitle,
caption = caption,
y = xlab,
x = ylab) +
theme_wpa_basic()
}

R/create_boxplot.R Normal file

@ -0,0 +1,148 @@
#' @title Box Plot (General Purpose)
#'
#' @description
#' Analyzes a selected metric and returns a box plot by default.
#' Additional options available to return a table with distribution elements.
#'
#' @details
#' This is a general purpose function that powers all the functions
#' in the package that produce box plots.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param metric Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size, defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats median
#' @importFrom stats sd
#'
#' @family General
#'
#' @examples
#' ## Create a fizzy plot for Work Week Span by Level Designation
#' create_boxplot(sq_data, metric = "Workweek_span", hrvar = "LevelDesignation", return = "plot")
#'
#' ## Create a summary statistics table for Work Week Span by Organization
#' create_boxplot(sq_data, metric = "Workweek_span", hrvar = "Organization", return = "table")
#'
#' ## Create a fizzy plot for Collaboration Hours by Level Designation
#' create_boxplot(sq_data, metric = "Collaboration_hours", hrvar = "LevelDesignation", return = "plot")
#' @export
create_boxplot <- function(data,
metric,
hrvar = "Organization",
mingroup = 5,
return = "plot") {
## Check inputs
required_variables <- c("Date",
metric,
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = " ", x = metric)
plot_data <-
data %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
group_by(PersonId, group) %>%
summarise(!!sym(metric) := mean(!!sym(metric))) %>%
ungroup() %>%
left_join(data %>%
rename(group = !!sym(hrvar)) %>%
group_by(group) %>%
summarise(Employee_Count = n_distinct(PersonId)),
by = "group") %>%
filter(Employee_Count >= mingroup)
## Get max value
max_point <- max(plot_data[[metric]]) * 1.2
plot_legend <-
plot_data %>%
group_by(group) %>%
summarize(Employee_Count = first(Employee_Count)) %>%
mutate(Employee_Count = paste("n=",Employee_Count))
## summary table
summary_table <-
plot_data %>%
select(group, tidyselect::all_of(metric)) %>%
group_by(group) %>%
summarise(mean = mean(!!sym(metric)),
median = median(!!sym(metric)),
sd = sd(!!sym(metric)),
min = min(!!sym(metric)),
max = max(!!sym(metric)),
range = max - min,
n = n())
## group order
group_ord <-
summary_table %>%
arrange(desc(mean)) %>%
pull(group)
plot_object <-
plot_data %>%
mutate(group = factor(group, levels = group_ord)) %>%
ggplot(aes(x = group, y = !!sym(metric))) +
geom_boxplot(color = "#578DB8") +
ylim(0, max_point) +
annotate("text", x = plot_legend$group, y = 0, label = plot_legend$Employee_Count) +
scale_x_discrete(labels = scales::wrap_format(10)) +
theme_classic() +
theme(axis.text=element_text(size=12),
axis.text.x = element_text(angle = 30, hjust = 1),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14),
legend.position = "top",
legend.justification = "right",
legend.title=element_text(size=14),
legend.text=element_text(size=14)) +
labs(title = clean_nm,
subtitle = paste("Distribution of",
tolower(clean_nm),
"by",
camel_clean(hrvar))) +
xlab(hrvar) +
ylab(paste("Average", clean_nm)) +
labs(caption = extract_date_range(data, return = "text"))
if(return == "table"){
summary_table %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else if(return == "data"){
plot_data %>%
mutate(group = factor(group, levels = group_ord)) %>%
arrange(desc(group))
} else {
stop("Please enter a valid input for `return`.")
}
}

R/create_dist.R Normal file

@ -0,0 +1,152 @@
#' @title Horizontal 100 percent stacked bar plot (General Purpose)
#'
#' @description
#' Provides an analysis of the distribution of a selected metric.
#' Returns a stacked bar plot by default.
#' Additional options available to return a table with distribution elements.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param metric Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size, defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#' @param cut A numeric vector of length three to specify the breaks for the distribution,
#' e.g. c(10, 15, 20)
#' @param dist_colours A character vector of length four to specify colour
#' codes for the stacked bars.
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom tidyr spread
#' @importFrom stats median
#' @importFrom stats sd
#'
#' @family Collaboration
#'
#' @examples
#' ## Return a plot
#' create_dist(sq_data, metric = "Collaboration_hours", hrvar = "Organization")
#'
#' ## Return a table
#' create_dist(sq_data, metric = "Collaboration_hours", hrvar = "Organization", return = "table")
#' @export
create_dist <- function(data,
metric,
hrvar = "Organization",
mingroup = 5,
return = "plot",
cut = c(15, 20, 25),
dist_colours = c("#FE7F4F",
"#ffdfd3",
"#bed6f2",
"#e9f1fb")) {
## Check inputs
required_variables <- c("Date",
metric,
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = " ", x = metric)
## Basic Data for bar plot
plot_data <-
data %>%
rename(group = !!sym(hrvar)) %>%
group_by(PersonId, group) %>%
summarise(!!sym(metric) := mean(!!sym(metric))) %>%
ungroup() %>%
left_join(data %>%
rename(group = !!sym(hrvar)) %>%
group_by(group) %>%
summarise(Employee_Count = n_distinct(PersonId)),
by = "group") %>%
filter(Employee_Count >= mingroup)
## Create buckets of collaboration hours
plot_data <-
plot_data %>%
mutate(bucket_hours = cut_hour(!!sym(metric),
cuts = cut))
## Employee count / base size table
plot_legend <-
plot_data %>%
group_by(group) %>%
summarize(Employee_Count=first(Employee_Count)) %>%
mutate(Employee_Count = paste("n=",Employee_Count))
## Data for bar plot
plot_table <-
plot_data %>%
group_by(group, bucket_hours) %>%
summarize(Employees=n(),
Employee_Count=first(Employee_Count),
percent= Employees / Employee_Count ) %>%
arrange(group, desc(bucket_hours))
## Table for annotation
annot_table <-
plot_legend %>%
dplyr::left_join(plot_table, by = "group")
## Bar plot
plot_object <-
plot_table %>%
ggplot(aes(x = group, y=Employees, fill = bucket_hours)) +
geom_bar(stat = "identity", position = position_fill(reverse = TRUE)) +
scale_y_continuous(labels = function(x) paste0(x*100, "%")) +
coord_flip() +
annotate("text", x = plot_legend$group, y = -.05, label = plot_legend$Employee_Count ) +
scale_fill_manual(name="",
values = rev(dist_colours)) +
theme_wpa_basic() +
labs(title = clean_nm,
subtitle = paste("Distribution of", clean_nm, "by", camel_clean(hrvar))) +
xlab(camel_clean(hrvar)) +
ylab("Fraction of employees") +
labs(caption = extract_date_range(data, return = "text"))
## Table to return
return_table <-
plot_table %>%
select(group, bucket_hours, percent) %>%
spread(bucket_hours, percent) %>%
left_join(data %>%
rename(group = !!sym(hrvar)) %>%
group_by(group) %>%
summarise(Employee_Count = n_distinct(PersonId)),
by = "group")
if(return == "table"){
return_table %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}
#' @rdname collaboration_dist
#' @export
collaboration_distribution <- collaboration_dist

R/create_dt.R Normal file

@ -0,0 +1,47 @@
#' @title Create interactive tables in HTML with 'download' buttons.
#'
#' @description
#' See https://martinctc.github.io/blog/vignette-downloadable-tables-in-rmarkdown-with-the-dt-package/ for more.
#'
#' @param x Data frame to be passed through.
#' @param rounding Numeric value specifying the number of decimal places to display. Defaults to 1.
#' @param freeze Number of columns from the left to 'freeze'. Defaults to 2,
#' which includes the row number column.
#'
#' @import DT
#' @importFrom dplyr mutate_if
#'
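#' @examples
#' \dontrun{
#' # Illustrative usage: render an interactive table from a summary output,
#' # using the built-in `sq_data` sample dataset
#' sq_data %>%
#' collaboration_sum(return = "table") %>%
#' create_dt(rounding = 1, freeze = 2)
#' }
#'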
#' @export
create_dt <- function(x, rounding = 1, freeze = 2){
# Round all numeric to "rounding" number of dp
num_cols <- dplyr::select_if(x, is.numeric) %>% names()
if(length(num_cols) == 0){ # No numeric columns
DT::datatable(x,
extensions = c('Buttons',
'FixedColumns'),
options = list(dom = 'Blfrtip',
fixedColumns = list(leftColumns = freeze),
scrollX = TRUE,
buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All"))))
} else {
DT::datatable(x,
extensions = c('Buttons',
'FixedColumns'),
options = list(dom = 'Blfrtip',
fixedColumns = list(leftColumns = freeze),
scrollX = TRUE,
buttons = c('copy', 'csv', 'excel', 'pdf', 'print'),
lengthMenu = list(c(10,25,50,-1),
c(10,25,50,"All")))) %>%
DT::formatRound(columns = num_cols, rounding)
}
}

R/create_fizz.R Normal file

@ -0,0 +1,136 @@
#' @title Fizzy Drink / Jitter Scatter Plot (General Purpose)
#'
#' @description
#' Analyzes a selected metric and returns a 'fizzy' scatter plot by default.
#' Additional options available to return a table with distribution elements.
#'
#' @details
#' This is a general purpose function that powers all the functions
#' in the package that produce 'fizzy drink' / jitter scatter plots.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param metric Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size, defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats median
#' @importFrom stats sd
#'
#' @family General
#'
#' @examples
#' ## Create a fizzy plot for Work Week Span by Level Designation
#' create_fizz(sq_data, metric = "Workweek_span", hrvar = "LevelDesignation", return = "plot")
#'
#' ## Create a summary statistics table for Work Week Span by Organization
#' create_fizz(sq_data, metric = "Workweek_span", hrvar = "Organization", return = "table")
#'
#' ## Create a fizzy plot for Collaboration Hours by Level Designation
#' create_fizz(sq_data, metric = "Collaboration_hours", hrvar = "LevelDesignation", return = "plot")
#' @export
create_fizz <- function(data,
metric,
hrvar = "Organization",
mingroup = 5,
return = "plot") {
## Check inputs
required_variables <- c("Date",
metric,
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = " ", x = metric)
## Plot data
plot_data <-
data %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
group_by(PersonId, group) %>%
summarise(!!sym(metric) := mean(!!sym(metric))) %>%
ungroup() %>%
left_join(data %>%
rename(group = !!sym(hrvar)) %>%
group_by(group) %>%
summarise(Employee_Count = n_distinct(PersonId)),
by = "group") %>%
filter(Employee_Count >= mingroup)
## Get max value
max_point <- max(plot_data[[metric]]) * 1.2
plot_legend <-
plot_data %>%
group_by(group) %>%
summarize(Employee_Count = first(Employee_Count)) %>%
mutate(Employee_Count = paste("n=",Employee_Count))
plot_object <-
plot_data %>%
ggplot(aes(x = group, y = !!sym(metric))) +
geom_point(size = 1,
alpha = 1/5,
color = "#578DB8",
position = position_jitter(width = 0.1, height = 0.1)) +
ylim(0, max_point) +
annotate("text", x = plot_legend$group, y = 0, label = plot_legend$Employee_Count) +
scale_x_discrete(labels = scales::wrap_format(10)) +
theme_classic() +
theme(axis.text=element_text(size=12),
axis.text.x = element_text(angle = 30, hjust = 1),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14),
legend.position = "top",
legend.justification = "right",
legend.title=element_text(size=14),
legend.text=element_text(size=14)) +
labs(title = clean_nm,
subtitle = paste("Distribution of",
tolower(clean_nm),
"by",
camel_clean(hrvar))) +
xlab(hrvar) +
ylab(paste("Average", clean_nm)) +
labs(caption = extract_date_range(data, return = "text"))
summary_table <-
plot_data %>%
select(group, tidyselect::all_of(metric)) %>%
group_by(group) %>%
summarise(mean = mean(!!sym(metric)),
median = median(!!sym(metric)),
sd = sd(!!sym(metric)),
min = min(!!sym(metric)),
max = max(!!sym(metric)),
range = max - min,
n = n())
if(return == "table"){
summary_table %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

R/create_line.R Normal file

@ -0,0 +1,140 @@
#' @title Time Trend - Line Chart (General Purpose)
#'
#' @description
#' Provides a week by week view of a selected metric, visualised as line charts.
#' By default returns a line chart for the defined metric,
#' with a separate panel per value in the HR attribute.
#' Additional options available to return a summary table.
#'
#' @details
#' This is a general purpose function that powers all the functions
#' in the package that produce faceted line plots.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param metric Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom tidyselect all_of
#'
#' @family General
#'
#' @examples
#' \dontrun{
#' sq_data %>% create_line(metric = "Email_hours", return = "plot")
#'
#' sq_data %>% create_line(metric = "Collaboration_hours", return = "plot")
#'
#' sq_data %>% create_line(metric = "Workweek_span", hrvar = "LevelDesignation")
#' }
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @export
create_line <- function(data,
metric,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
## Check inputs
required_variables <- c("Date",
metric,
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = " ", x = metric)
myTable <-
data %>%
mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
select(PersonId, Date, group, all_of(metric)) %>%
group_by(group) %>%
mutate(Employee_Count = n_distinct(PersonId)) %>%
filter(Employee_Count >= mingroup) # Keep only groups above privacy threshold
myTable <-
myTable %>%
group_by(Date, group) %>%
summarize(Employee_Count = mean(Employee_Count),
!!sym(metric) := mean(!!sym(metric)))
## Data frame to return
myTable_return <-
myTable %>%
select(Date, group, all_of(metric)) %>%
spread(Date, !!sym(metric))
## Data frame for creating plot
myTable_plot <-
myTable %>%
select(Date, group, all_of(metric)) %>%
group_by(Date, group) %>%
summarise_at(vars(all_of(metric)), ~mean(., na.rm = TRUE)) %>%
ungroup()
plot_object <-
myTable_plot %>%
ggplot(aes(x = Date, y = !!sym(metric))) +
geom_line(colour = "grey40") +
facet_wrap(.~group) +
scale_fill_gradient(name="Hours", low = "white", high = "red") +
theme_classic() +
theme(plot.title = element_text(color = "grey40",
face = "bold",
size = 18),
plot.subtitle = element_text(size = 14),
strip.background = element_rect(color = "grey40",
fill = "grey40"),
strip.text = element_text(size = 10,
colour = "#FFFFFF",
face = "bold"),
axis.text = element_text(size = 8, face = "bold"),
axis.line = element_line(colour = "grey40"),
legend.position = "right",
legend.justification = "right",
legend.title=element_text(size = 10),
legend.text=element_text(size = 10)) +
labs(title = clean_nm,
subtitle = paste("Total",
tolower(clean_nm),
"by",
camel_clean(hrvar))) +
xlab("Date") +
ylab("Weekly hours") +
labs(caption = extract_date_range(data, return = "text"))
if(return == "table"){
myTable_return %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

R/create_period_scatter.R Normal file

@ -0,0 +1,185 @@
#' @title Period comparison scatter plot (General Purpose)
#'
#' @description
#' Returns two side-by-side scatter plots representing two selected metrics,
#' using colour to map an HR attribute and size to represent number of employees.
#' Returns a faceted scatter plot by default, with additional options
#' to return a summary table.
#'
#' @details
#' This is a general purpose function that powers all the functions
#' in the package that produce faceted scatter plots.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector,
#' defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param metric_x Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param metric_y Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param before_start Start date of "before" time period in YYYY-MM-DD
#' @param before_end End date of "before" time period in YYYY-MM-DD
#' @param after_start Start date of "after" time period in YYYY-MM-DD
#' @param after_end End date of "after" time period in YYYY-MM-DD
#' @param before_label String to specify a label for the "before" period. Defaults to "Period 1".
#' @param after_label String to specify a label for the "after" period. Defaults to "Period 2".
#' @param mingroup Numeric value setting the privacy threshold / minimum group size.
#' Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#'
#' @family General
#'
#' @examples
#' \dontrun{
#'
#' create_period_scatter(sq_data,
#'   hrvar = "LevelDesignation",
#'   before_start = "2019-11-03",
#'   before_end = "2019-12-31",
#'   after_start = "2020-01-01",
#'   after_end = "2020-01-26")
#'
#'
#' create_period_scatter(sq_data, before_end = "2019-12-31", return = "table")
#' }
#'
#' @export
create_period_scatter <- function(data,
hrvar = "Organization",
metric_x = "Multitasking_meeting_hours",
metric_y = "Meeting_hours",
before_start = min(as.Date(data$Date, "%m/%d/%Y")),
before_end,
after_start = as.Date(before_end) + 1,
after_end = max(as.Date(data$Date, "%m/%d/%Y")),
before_label = "Period 1",
after_label = "Period 2",
mingroup = 5,
return = "plot"){
## Check inputs
## Update these column names as per appropriate
required_variables <- c("Date",
hrvar,
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
daterange_1_start <- as.Date(before_start)
daterange_1_end <- as.Date(before_end)
daterange_2_start <- as.Date(after_start)
daterange_2_end <- as.Date(after_end)
# Fix dates format for WpA Queries
WpA_dataset <- data %>% mutate(Date = as.Date(Date, "%m/%d/%Y"))
# Check for dates in data file
if (daterange_1_start < min(WpA_dataset$Date) |
daterange_1_start > max(WpA_dataset$Date) |
daterange_1_end < min(WpA_dataset$Date) |
daterange_1_end > max(WpA_dataset$Date) |
daterange_2_start < min(WpA_dataset$Date) |
daterange_2_start > max(WpA_dataset$Date) |
daterange_2_end < min(WpA_dataset$Date) |
daterange_2_end > max(WpA_dataset$Date)) {
stop("Dates not found in dataset")
}
## Employee count
emp_count <-
WpA_dataset %>%
group_by(!!sym(hrvar)) %>%
summarise(n = n_distinct(PersonId))
data_p1 <-
WpA_dataset %>%
rename(group = hrvar) %>%
filter(between(Date, daterange_1_start, daterange_1_end)) %>%
group_by(PersonId, group) %>%
summarise_at(vars(!!sym(metric_x), !!sym(metric_y)), ~mean(.)) %>%
ungroup() %>%
group_by(group) %>%
summarise_at(vars(!!sym(metric_x), !!sym(metric_y)), ~mean(., na.rm = TRUE)) %>%
mutate(Period = before_label) %>%
left_join(emp_count, by = c(group = hrvar)) %>%
filter(n >= mingroup)
data_p2 <-
WpA_dataset %>%
rename(group = hrvar) %>%
filter(between(Date, daterange_2_start, daterange_2_end)) %>%
group_by(PersonId, group) %>%
summarise_at(vars(!!sym(metric_x), !!sym(metric_y)), ~mean(.)) %>%
ungroup() %>%
group_by(group) %>%
summarise_at(vars(!!sym(metric_x), !!sym(metric_y)), ~mean(., na.rm = TRUE)) %>%
mutate(Period = after_label) %>%
left_join(emp_count, by = c(group = hrvar)) %>%
filter(n >= mingroup)
## bind data
data_both <- rbind(data_p1, data_p2)
date_range_str <-
paste("Data from",
daterange_1_start,
"to",
daterange_1_end,
"and",
daterange_2_start,
"to",
daterange_2_end)
clean_x <- gsub(pattern = "_", replacement = " ", x = metric_x)
clean_y <- gsub(pattern = "_", replacement = " ", x = metric_y)
plot_title <-
paste(clean_x, "and", clean_y,
"over", before_label,
"and", after_label)
plot_object <-
data_both %>%
ggplot(aes(x = !!sym(metric_x),
y = !!sym(metric_y),
colour = group,
size = n)) +
geom_point(alpha = 0.5) +
scale_size(range = c(1, 20)) +
facet_wrap(.~Period) +
guides(size = FALSE) +
theme(legend.position = "bottom") +
ggtitle(plot_title,
subtitle = paste("Weekly data by", hrvar)) +
ylab(clean_y) +
xlab(clean_x) +
labs(caption = date_range_str) +
wpa::theme_wpa_basic()
if(return == "table"){
# return(myTable_return)
return(data_both)
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

R/create_rank.R Normal file

@ -0,0 +1,70 @@
#' @title Create Ranking
#'
#' @description
#' This function scans a standard query output for groups with high levels of a given Workplace Analytics Metric.
#' Returns a table with all groups (across multiple HR attributes) ranked by the specified metric.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param metric Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param hrvar A list of HR Variables to consider in the scan.
#' Defaults to all HR attributes identified.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size.
#' Defaults to 5.
#' @param return A character vector specifying what to return.
#' Valid values include "table" (default) and "df" (data frame)
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats reorder
#'
#' @family General
#'
#' @return
#' When 'table' (default) is passed in `return`, a data frame is returned with
#' one row per group across all the HR attributes scanned, ranked by the specified metric.
#'
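#' @examples
#' \dontrun{
#' # Illustrative usage: rank all identified HR attributes by average
#' # collaboration hours, using the built-in `sq_data` sample dataset
#' create_rank(sq_data, metric = "Collaboration_hours", return = "table")
#' }
#'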
#' @export
create_rank <- function(data,
metric,
hrvar = extract_hr(data),
mingroup = 5,
return = "table"){
results <-
data %>% create_bar(
metric = metric,
hrvar = hrvar[1],
mingroup = mingroup,
return = return, bar_colour = "default")
results$hrvar <- ""
results <- results[0,]
for (p in hrvar) {
table1 <-
data %>%
create_bar(metric = metric,
hrvar = p,
mingroup = mingroup,
return = "table", bar_colour = "default")
table1$hrvar <- p
results <- rbind(results,table1)
}
output <-
results %>% arrange(desc(get(metric))) %>%
select(hrvar, everything())
if(return == "table"){
return(output)
} else {
stop("Invalid `return` argument.")
}
}

R/create_scatter.R Normal file

@ -0,0 +1,118 @@
#' @title Scatter plot (General Purpose)
#'
#' @description
#' Returns a scatter plot of two selected metrics, using colour to map
#' an HR attribute.
#' Returns a scatter plot by default, with additional options
#' to return a summary table.
#'
#' @details
#' This is a general purpose function that powers all the functions
#' in the package that produce scatter plots.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param metric_x Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param metric_y Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import scales
#'
#' @family General
#'
#' @examples
#' create_scatter(sq_data,
#' "Internal_network_size",
#' "External_network_size",
#' "Organization")
#'
#' create_scatter(sq_data,
#' "Generated_workload_call_hours",
#' "Generated_workload_email_hours",
#' "Organization", mingroup = 100, return = "plot")
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @export
create_scatter <- function(data,
metric_x = "Internal_network_size",
metric_y = "External_network_size",
hrvar = "Organization",
mingroup = 5,
return = "plot"){
## Check inputs
required_variables <- c(hrvar,
metric_x,
metric_y,
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
## Extract values violating privacy threshold
violate_thres_chr <-
data %>%
group_by(!!sym(hrvar)) %>%
summarise(n = n_distinct(PersonId)) %>%
filter(n < mingroup) %>%
pull(!!sym(hrvar))
clean_x <- gsub(pattern = "_", replacement = " ", x = metric_x)
clean_y <- gsub(pattern = "_", replacement = " ", x = metric_y)
myTable <-
data %>%
filter(!(!!sym(hrvar) %in% violate_thres_chr)) %>%
group_by(PersonId, !!sym(hrvar)) %>%
summarise_at(vars(!!sym(metric_x),
!!sym(metric_y)),
~mean(.)) %>%
ungroup()
plot_object <-
myTable %>%
ggplot(aes(x = !!sym(metric_x),
y = !!sym(metric_y),
colour = !!sym(hrvar))) +
geom_point(alpha = 0.5) +
labs(title = paste0(clean_x, " and\n", clean_y, " by ", camel_clean(hrvar)),
caption = extract_date_range(data, return = "text")) +
xlab(clean_x) +
ylab(clean_y) +
theme_wpa_basic()
myTable_return <-
myTable %>%
group_by(!!sym(hrvar)) %>%
summarise_at(vars(!!sym(metric_x),
!!sym(metric_y)),
~mean(.))
if(return == "table"){
return(myTable_return)
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

R/create_stacked.R Normal file

@ -0,0 +1,174 @@
#' @title Horizontal stacked bar plot (General Purpose)
#'
#' @description
#' Creates a sum total calculation using selected metrics,
#' where the typical use case is to create different definitions
#' of collaboration hours.
#' Returns a stacked bar plot by default.
#' Additional options available to return a summary table.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param metrics A character vector to specify variables to be used
#' in calculating the "Total" value, e.g. c("Meeting_hours", "Email_hours").
#' The order of the variable names supplied determine the order in which they
#' appear on the stacked plot.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#' @param stack_colours
#' A character vector to specify the colour codes for the stacked bar charts.
#' @param plot_title An option to override plot title.
#' @param plot_subtitle An option to override plot subtitle.
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats reorder
#'
#' @family General
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @examples
#' \dontrun{
#' sq_data %>%
#' create_stacked(hrvar = "LevelDesignation",
#' metrics = c("Meeting_hours", "Email_hours"),
#' return = "plot")
#'
#' sq_data %>%
#' create_stacked(hrvar = "FunctionType",
#' metrics = c("Meeting_hours",
#' "Email_hours",
#' "Call_hours",
#' "Instant_Message_hours"),
#' return = "plot")
#'
#' sq_data %>%
#' create_stacked(hrvar = "FunctionType",
#' metrics = c("Meeting_hours",
#' "Email_hours",
#' "Call_hours",
#' "Instant_Message_hours"),
#' return = "table")
#'}
#' @export
create_stacked <- function(data,
hrvar = "Organization",
metrics = c("Meeting_hours",
"Email_hours"),
mingroup = 5,
return = "plot",
stack_colours = c("#203864",
"#5191DD",
"darkgrey",
"lightgrey"),
plot_title = "Collaboration Hours",
plot_subtitle = "Weekly collaboration hours"){
## Check inputs
required_variables <- c("Date",
metrics,
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
n_count <-
data %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
group_by(group) %>%
summarise(Employee_Count = n_distinct(PersonId))
## Person level table
myTable <-
data %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
select(PersonId, group, metrics) %>%
group_by(PersonId, group) %>%
summarise_at(vars(metrics), ~mean(.)) %>%
ungroup() %>%
mutate(Total = select(., metrics) %>% apply(1, sum)) %>%
left_join(n_count, by = "group") %>%
# Keep only groups above privacy threshold
filter(Employee_Count >= mingroup)
myTableReturn <-
myTable %>%
group_by(group) %>%
summarise_at(vars(metrics, Total), ~mean(.)) %>%
left_join(n_count, by = "group")
plot_table <-
myTable %>%
select(PersonId, group, metrics, Total) %>%
gather(Metric, Value, -PersonId, -group)
totalTable <-
plot_table %>%
filter(Metric == "Total") %>%
group_by(group) %>%
summarise(Total = mean(Value))
myTable_legends <-
n_count %>%
filter(Employee_Count >= mingroup) %>%
mutate(Employee_Count = paste("n=",Employee_Count)) %>%
left_join(totalTable, by = "group")
## Get maximum value
location <- max(myTable_legends$Total)
plot_object <-
plot_table %>%
filter(Metric != "Total") %>%
mutate(Metric = factor(Metric, levels = rev(metrics))) %>%
group_by(group, Metric) %>%
summarise_at(vars(Value), ~mean(.)) %>%
ggplot(aes(x = stats::reorder(group, Value, mean), y = Value, fill = Metric)) +
geom_bar(position = "stack", stat = "identity") +
geom_text(aes(label = round(Value, 1)),
position = position_stack(vjust = 0.5),
color = "#FFFFFF",
fontface = "bold") +
scale_y_continuous(limits = c(0, location * 1.25)) +
annotate("text",
x = myTable_legends$group,
y = location * 1.15,
label = myTable_legends$Employee_Count) +
annotate("rect", xmin = 0.5, xmax = length(myTable_legends$group) + 0.5, ymin = location * 1.05, ymax = location * 1.25, alpha = .2) +
scale_fill_manual(name="",
values = stack_colours,
breaks = metrics,
labels = gsub("_", " ", metrics)) +
coord_flip() +
theme_wpa_basic() +
labs(title = plot_title,
subtitle = paste(plot_subtitle, "by", camel_clean(hrvar)),
caption = extract_date_range(data, return = "text")) +
xlab(hrvar) +
ylab("Average weekly hours")
if(return == "table"){
myTableReturn
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

R/cut_hour.R Normal file

@ -0,0 +1,43 @@
#' @title Convert a numeric variable for hours into categorical
#'
#' @description
#' Supply a numeric variable, e.g. `Collaboration_hours`,
#' and the function returns a categorical (factor) variable, with the hours
#' binned according to the supplied cut points.
#'
#' @param metric A numeric variable representing hours.
#' @param cuts A numeric variable of length 3 to represent the
#' cut points required.
#'
#' @family General
#'
#' @examples
#' ## Direct use
#' cut_hour(1:30, cuts = c(15, 20, 25))
#'
#' ## Use on a query
#' cut_hour(sq_data$Collaboration_hours, cuts = c(10, 15, 20))
#'
#' @export
cut_hour <- function(metric, cuts){
label1 <- paste0("< ", cuts[1], " hours")
label2 <- paste0(cuts[1], " - ", cuts[2], " hours")
label3 <- paste0(cuts[2], " - ", cuts[3], " hours")
label4 <- paste0(cuts[3], "+ hours")
out <-
cut(metric,
breaks = c(0, cuts, 100),
include.lowest = TRUE,
labels = c(label1,
label2,
label3,
label4))
# out <- as.character(out)
return(out)
}

R/dv_data.R Normal file

@ -0,0 +1,81 @@
#' @title Sample Standard Query dataset for Data Validation
#'
#' @description
#' A dataset generated from a Standard Query from WpA.
#'
#' @format A data frame with 897 rows and 69 variables:
#' \describe{
#' \item{PersonId}{ }
#' \item{Date}{ }
#' \item{Workweek_span}{ }
#' \item{Meetings_with_skip_level}{ }
#' \item{Meeting_hours_with_skip_level}{ }
#' \item{Generated_workload_email_hours}{ }
#' \item{Generated_workload_email_recipients}{ }
#' \item{Generated_workload_instant_messages_hours}{ }
#' \item{Generated_workload_instant_messages_recipients}{ }
#' \item{Generated_workload_call_hours}{ }
#' \item{Generated_workload_call_participants}{ }
#' \item{Generated_workload_calls_organized}{ }
#' \item{External_network_size}{ }
#' \item{Internal_network_size}{ }
#' \item{Networking_outside_company}{ }
#' \item{Networking_outside_organization}{ }
#' \item{After_hours_meeting_hours}{ }
#' \item{Open_1_hour_block}{ }
#' \item{Open_2_hour_blocks}{ }
#' \item{Total_focus_hours}{ }
#' \item{Low_quality_meeting_hours}{ }
#' \item{Total_emails_sent_during_meeting}{ }
#' \item{Meetings}{ }
#' \item{Meeting_hours}{ }
#' \item{Conflicting_meeting_hours}{ }
#' \item{Multitasking_meeting_hours}{ }
#' \item{Redundant_meeting_hours__lower_level_}{ }
#' \item{Redundant_meeting_hours__organizational_}{ }
#' \item{Time_in_self_organized_meetings}{ }
#' \item{Meeting_hours_during_working_hours}{ }
#' \item{Generated_workload_meeting_attendees}{ }
#' \item{Generated_workload_meeting_hours}{ }
#' \item{Generated_workload_meetings_organized}{ }
#' \item{Manager_coaching_hours_1_on_1}{ }
#' \item{Meetings_with_manager}{ }
#' \item{Meeting_hours_with_manager}{ }
#' \item{Meetings_with_manager_1_on_1}{ }
#' \item{Meeting_hours_with_manager_1_on_1}{ }
#' \item{After_hours_email_hours}{ }
#' \item{Emails_sent}{ }
#' \item{Email_hours}{ }
#' \item{Working_hours_email_hours}{ }
#' \item{After_hours_instant_messages}{ }
#' \item{Instant_messages_sent}{ }
#' \item{Instant_Message_hours}{ }
#' \item{Working_hours_instant_messages}{ }
#' \item{After_hours_collaboration_hours}{ }
#' \item{Collaboration_hours}{ }
#' \item{Collaboration_hours_external}{ }
#' \item{Working_hours_collaboration_hours}{ }
#' \item{After_hours_in_calls}{ }
#' \item{Total_calls}{ }
#' \item{Call_hours}{ }
#' \item{Working_hours_in_calls}{ }
#' \item{Domain}{ }
#' \item{FunctionType}{ }
#' \item{LevelDesignation}{ }
#' \item{Layer}{ }
#' \item{Region}{ }
#' \item{Organization}{ }
#' \item{zId}{ }
#' \item{attainment}{ }
#' \item{TimeZone}{ }
#' \item{HourlyRate}{ }
#' \item{IsInternal}{ }
#' \item{IsActive}{ }
#' \item{HireDate}{ }
#' \item{WorkingStartTimeSetInOutlook}{ }
#' \item{WorkingEndTimeSetInOutlook}{ }
#'
#' ...
#' }
#' @source \url{https://workplaceanalytics-demo.office.com/en-us/Home}
"dv_data"

R/em_data.R Normal file

@ -0,0 +1,71 @@
#' @title Sample Hourly Collaboration data with Email and IMs
#'
#' @description
#' A sample dataset generated from an Hourly Collaboration query
#' from the WpA demo tenant. The data is grouped by week and contains
#' columns for both IMs sent and Emails sent.
#'
#' @format A data frame with 41567 rows and 56 variables:
#' \describe{
#' \item{PersonId}{ }
#' \item{Date}{ }
#' \item{IMs_sent_23_24}{ }
#' \item{IMs_sent_22_23}{ }
#' \item{IMs_sent_21_22}{ }
#' \item{IMs_sent_20_21}{ }
#' \item{IMs_sent_19_20}{ }
#' \item{IMs_sent_18_19}{ }
#' \item{IMs_sent_17_18}{ }
#' \item{IMs_sent_16_17}{ }
#' \item{IMs_sent_15_16}{ }
#' \item{IMs_sent_14_15}{ }
#' \item{IMs_sent_13_14}{ }
#' \item{IMs_sent_12_13}{ }
#' \item{IMs_sent_11_12}{ }
#' \item{IMs_sent_10_11}{ }
#' \item{IMs_sent_09_10}{ }
#' \item{IMs_sent_08_09}{ }
#' \item{IMs_sent_07_08}{ }
#' \item{IMs_sent_06_07}{ }
#' \item{IMs_sent_05_06}{ }
#' \item{IMs_sent_04_05}{ }
#' \item{IMs_sent_03_04}{ }
#' \item{IMs_sent_02_03}{ }
#' \item{IMs_sent_01_02}{ }
#' \item{IMs_sent_00_01}{ }
#' \item{Emails_sent_23_24}{ }
#' \item{Emails_sent_22_23}{ }
#' \item{Emails_sent_21_22}{ }
#' \item{Emails_sent_20_21}{ }
#' \item{Emails_sent_19_20}{ }
#' \item{Emails_sent_18_19}{ }
#' \item{Emails_sent_17_18}{ }
#' \item{Emails_sent_16_17}{ }
#' \item{Emails_sent_15_16}{ }
#' \item{Emails_sent_14_15}{ }
#' \item{Emails_sent_13_14}{ }
#' \item{Emails_sent_12_13}{ }
#' \item{Emails_sent_11_12}{ }
#' \item{Emails_sent_10_11}{ }
#' \item{Emails_sent_09_10}{ }
#' \item{Emails_sent_08_09}{ }
#' \item{Emails_sent_07_08}{ }
#' \item{Emails_sent_06_07}{ }
#' \item{Emails_sent_05_06}{ }
#' \item{Emails_sent_04_05}{ }
#' \item{Emails_sent_03_04}{ }
#' \item{Emails_sent_02_03}{ }
#' \item{Emails_sent_01_02}{ }
#' \item{Emails_sent_00_01}{ }
#' \item{Domain}{ }
#' \item{LevelDesignation}{ }
#' \item{Layer}{ }
#' \item{Organization}{ }
#' \item{IsInternal}{ }
#' \item{IsActive}{ }
#'
#' ...
#' }
#' @source \url{https://workplaceanalytics-demo.office.com/en-us/Home}
"em_data"

R/email_dist.R Normal file

@ -0,0 +1,37 @@
#' @title Email Hours distribution
#'
#' @description
#' Analyze Email Hours distribution.
#' Returns a stacked bar plot by default.
#' Additional options available to return a table with distribution elements.
#'
#' @inheritParams create_dist
#'
#' @family Emails
#'
#' @examples
#' ## Return a plot
#' email_dist(sq_data, hrvar = "Organization")
#'
#' ## Return a table
#' email_dist(sq_data, hrvar = "Organization", return = "table")
#'
#' ## Return result with custom-specified breaks
#' email_dist(sq_data, hrvar = "LevelDesignation", cut = c(4, 7, 9))
#'
#' @export
email_dist <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot",
cut = c(5, 10, 15)) {
create_dist(data = data,
metric = "Email_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return,
cut = cut)
}

27
R/email_fizz.R Normal file

@ -0,0 +1,27 @@
#' @title Distribution of Email Hours (Fizzy Drink plot)
#'
#' @description
#' Analyze weekly email hours distribution, and returns
#' a 'fizzy' scatter plot by default.
#' Additional options available to return a table with distribution elements.
#'
#' @inheritParams create_fizz
#'
#' @family Emails
#'
#' @examples
#' email_fizz(sq_data, hrvar = "Organization", return = "table")
#' @export
email_fizz <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
create_fizz(data = data,
metric = "Email_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return)
}

37
R/email_line.R Normal file

@ -0,0 +1,37 @@
#' @title Email Time Trend - Line Chart
#'
#' @description
#' Provides a week by week view of email time, visualised as line charts.
#' By default returns a line chart for email hours,
#' with a separate panel per value in the HR attribute.
#' Additional options available to return a summary table.
#'
#' @inheritParams create_line
#'
#' @family Emails
#'
#' @examples
#'
#' ## Return a line plot
#' email_line(sq_data, hrvar = "LevelDesignation")
#'
#'
#' ## Return a table
#' email_line(sq_data, hrvar = "LevelDesignation", return = "table")
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @export
email_line <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
## Inherit arguments
create_line(data = data,
metric = "Email_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return)
}

47
R/email_rank.R Normal file

@ -0,0 +1,47 @@
#' @title Email Hours Ranking
#'
#' @description
#' This function scans a standard query output for groups with high levels of 'Weekly Email Collaboration'.
#' Returns a table with all groups (across multiple HR attributes) ranked by average weekly email hours.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar A list of HR Variables to consider in the scan.
#' Defaults to all HR attributes identified.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size.
#' Defaults to 5.
#' @param return A character vector specifying what to return.
#' Valid values include "table" (default, returning a data frame)
#' and "html" (returning an interactive DataTable).
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats reorder
#'
#' @family Emails
#'
#' @return
#' Returns a summary table (data frame) by default, where 'table' is passed in `return`.
#' When 'html' is passed, an interactive DataTable is returned.
#'
#' @export
email_rank <- function(data,
hrvar = extract_hr(data),
mingroup = 5,
return = "table"){
output <-
data %>% create_rank(metric="Email_hours", hrvar = hrvar,
mingroup = mingroup,
return = "table")
if(return == "html"){
return(create_dt(output))
} else if(return == "table"){
return(output)
} else {
stop("Invalid `return` argument.")
}
}
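For illustration only (not part of the original file): a minimal sketch of the two return modes, assuming the installed {wpa} package and its bundled `sq_data` demo dataset.

```R
library(wpa)

# Rank groups across all detected HR attributes by average weekly email hours
email_rank(sq_data, return = "table")

# Render the same ranking as an interactive DataTable (opens in the viewer/browser)
email_rank(sq_data, return = "html")
```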

45
R/email_summary.R Normal file

@ -0,0 +1,45 @@
#' @title Email Summary
#'
#' @description
#' Provides an overview analysis of weekly email hours.
#' Returns a bar plot showing average weekly email hours by default.
#' Additional options available to return a summary table.
#'
#' @inheritParams create_bar
#'
#'
#' @family Emails
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @examples
#' # Return a ggplot bar chart
#' email_summary(sq_data, hrvar = "LevelDesignation")
#'
#' # Return a summary table
#' email_summary(sq_data, hrvar = "LevelDesignation", return = "table")
#' @export
email_summary <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
create_bar(data = data,
metric = "Email_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return,
bar_colour = "darkblue")
}

99
R/email_trend.R Normal file

@ -0,0 +1,99 @@
#' Email Hours Time Trend
#'
#' Provides a week by week view of email time.
#' By default returns a week by week heatmap, highlighting the points in time with most activity.
#' Additional options available to return a summary table.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#'
#' @family Emails
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @export
email_trend <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
## Check inputs
required_variables <- c("Date",
"Email_hours",
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
## Date range data frame
myPeriod <- extract_date_range(data)
myTable <-
data %>%
mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
select(PersonId, Date, group, Email_hours) %>%
group_by(group) %>%
mutate(Employee_Count = n_distinct(PersonId)) %>%
filter(Employee_Count >= mingroup) # Keep only groups above privacy threshold
myTable <-
myTable %>%
group_by(Date, group) %>%
summarize(Employee_Count = mean(Employee_Count),
Email_hours = mean(Email_hours))
myTable_plot <- myTable %>% select(Date, group, Email_hours)
myTable_return <- myTable_plot %>% spread(Date, Email_hours)
plot_object <-
myTable_plot %>%
ggplot(aes(x = Date , y = group , fill = Email_hours)) +
geom_tile(height=.5) +
scale_fill_gradient(name="Hours", low = "white", high = "red") +
theme_classic() +
theme(axis.text=element_text(size=12),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14),
legend.position = "right",
legend.justification = "right",
legend.title=element_text(size=14),
legend.text=element_text(size=14)) +
labs(title = "Email Hours",
subtitle = paste("Total email time by", tolower(hrvar))) +
xlab("Date") +
ylab(hrvar) +
labs(caption = paste("Data from week of", myPeriod$Start, "to week of", myPeriod$End))
if(return == "table"){
myTable_return %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}
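A minimal usage sketch (illustrative, not from the source), assuming the bundled `sq_data` demo dataset:

```R
library(wpa)

# Week-by-week heatmap of average email hours, split by level designation
email_trend(sq_data, hrvar = "LevelDesignation")

# The same averages in wide format, one column per week
email_trend(sq_data, hrvar = "LevelDesignation", return = "table")
```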

76
R/export.R Normal file

@ -0,0 +1,76 @@
#' @title Export {wpa} outputs to CSV, clipboard, or save as images
#'
#' @description
#' A general use function to export {wpa} outputs to CSV,
#' clipboard, or save as images. By default, `export()` copies
#' a data frame to the clipboard. If the input is a ggplot object,
#' the default behaviour is to export a PNG.
#'
#' @param x Data frame or ggplot object to be passed through.
#' @param method Character string specifying the method of export.
#' Valid inputs include "clipboard" (default), "csv", "png", and "svg".
#' @param path If exporting a file, enter the path and the desired file name, _excluding the file extension_.
#' For example, "Analysis/SQ Overview".
#' @param timestamp Logical value specifying whether to include a timestamp in the file name. Defaults to TRUE.
#' @param width Width of the exported plot in inches.
#' @param height Height of the exported plot in inches.
#'
#' @importFrom utils write.csv
#'
#' @export
export <- function(x,
method = "clipboard",
path = "wpa export",
timestamp = TRUE,
width = 12,
height = 9){
## Create timestamped path (if applicable)
if(timestamp == TRUE){
newpath <- paste(path, wpa::tstamp())
} else {
newpath <- path
}
## Force method to png if is.ggplot and method not appropriate
if(is.ggplot(x) & method %in% c("clipboard", "csv")){
message("Input is a ggplot object. Defaulted to exporting as PNG...")
method <- "png"
}
## Main export function
if(method == "clipboard"){
copy_df(x)
message(c("Data frame copied to clipboard.\n",
"You may paste the contents directly to Excel."))
## Export option: CSV
} else if(method == "csv"){
newpath <- paste0(newpath, ".csv")
write.csv(x = x, file = newpath)
## Export option: PNG
} else if(method == "png"){
newpath <- paste0(newpath, ".png")
ggsave(filename = newpath, plot = x, width = width, height = height)
## Export option: SVG
} else if(method == "svg"){
newpath <- paste0(newpath, ".svg")
ggsave(filename = newpath, plot = x, width = width, height = height)
} else {
stop("Please check inputs. Enter `?export` for more details.")
}
}
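As a rough sketch of the export routes described above (the file names below are hypothetical; files are written to the working directory), assuming the bundled `sq_data` demo dataset:

```R
library(wpa)

p <- email_summary(sq_data, hrvar = "Organization")          # a ggplot object
export(p)                                                    # ggplot input: falls back to PNG
export(p, method = "svg", path = "email_summary", timestamp = FALSE)

tb <- email_summary(sq_data, hrvar = "Organization", return = "table")
export(tb)                                                   # data frame: copied to clipboard
export(tb, method = "csv", path = "email_summary")
```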

111
R/external_network_plot.R Normal file

@ -0,0 +1,111 @@
#' @title Plot the external network metrics for a HR variable
#'
#' @description
#' Plot the external network metrics for a HR variable.
#'
#' @param data Person Query as a data frame, including a date column named "Date".
#' This function assumes the date format is MM/DD/YYYY, as is standard in a WpA query output.
#' @param hrvar WpA variable for an HR attribute to group networks by,
#' for example "Layer".
#' @param mingroup Numeric vector for minimum group size for aggregation
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#' @param bubble_size A numeric vector of length two to specify the size range of the bubbles
#'
#' @import dplyr
#' @import reshape2
#' @import ggplot2
#' @import ggrepel
#'
#' @examples
#' \dontrun{
#' external_network_plot(sq_data)
#' }
#'
#' @family Connectivity
#'
#' @export
external_network_plot <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot",
bubble_size = c(1, 5)){
plot_data <-
data %>%
rename(group = !!sym(hrvar))
plot_data <-
plot_data %>%
group_by(group) %>%
summarize(Ext_network_size = mean(External_network_size),
Ext_network_breadth = mean(Networking_outside_company),
Employee_count = n_distinct(PersonId)
) %>%
filter(Employee_count >= mingroup)
summary_table <- plot_data %>% arrange(., desc(Ext_network_size))
ext_network_plot <- ggplot(plot_data,
aes(x=Ext_network_size,
y=Ext_network_breadth)
) +
geom_point(aes(size=Employee_count),
color = rgb2hex(0, 120, 212),
alpha = 0.5) +
geom_text_repel(label=plot_data$group) +
scale_x_continuous(name = "External Network Size") +
scale_y_continuous(name = "External Network Breadth") +
scale_size(range = bubble_size) +
theme_classic() +
theme(
axis.text = element_text(size = 10),
axis.text.x = element_text(
angle = 90,
hjust = 1,
vjust = 0.5
),
plot.title = element_text(
color = "grey40",
face = "bold",
size = 18
),
plot.subtitle = element_text(size = 14)
) +
labs(
size = "Size"
) +
labs(
title = paste("External network metrics by", hrvar),
subtitle = paste(
"Network size is number of people, breadth is number of organizations"
)
) +
labs(
caption = paste(
"Total employees =",
sum(plot_data$Employee_count),
"| Data from",
min(as.Date(data$Date, "%m/%d/%Y")),
"to",
max(as.Date(data$Date, "%m/%d/%Y"))
)
)
if(return == "table"){
summary_table %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(ext_network_plot)
} else {
stop("Please enter a valid input for `return`.")
}
}

48
R/extract_hr.R Normal file

@ -0,0 +1,48 @@
#' @title Extract HR attribute variables
#'
#' @description
#' This function uses a combination of variable class and
#' number of unique values to extract HR / organisational
#' attributes from a data frame, excluding date columns and Outlook working-time settings.
#'
#' @param data A data frame to be passed through.
#' @param max_unique A numeric value representing the maximum
#' number of unique values to accept for an HR attribute. Defaults to 50.
#' @param return Character string with values "names" (default)
#' or "vars" to determine whether the function returns a character
#' vector of variable names or a data frame containing the HR
#' attributes.
#'
#' @family General
#'
#' @examples
#' sq_data %>% extract_hr(return = "names")
#'
#' sq_data %>% extract_hr(return = "vars")
#'
#' @export
extract_hr <- function(data,
max_unique = 50,
return = "names"){
hr_var <-
data %>%
dplyr::select_if(~(is.character(.) | is.logical(.) | is.factor(.))) %>%
dplyr::select_if(~(dplyr::n_distinct(.) < max_unique)) %>%
dplyr::select_if(~!all(is_date_format(.))) %>%
names() %>%
.[.!= "WorkingStartTimeSetInOutlook"] %>%
.[.!= "WorkingEndTimeSetInOutlook"] %>%
.[.!= "WorkingDaysSetInOutlook"]
if(return == "names"){
return(hr_var)
} else if(return == "vars"){
return(dplyr::select(data, tidyselect::all_of(hr_var)))
} else {
stop("Invalid input for return")
}
}

80
R/flag_ch_ratio.R Normal file

@ -0,0 +1,80 @@
#' @title Flag unusual high collaboration hours to after-hours collaboration hours ratio
#'
#' @description This function flags persons who have an unusual ratio
#' of collaboration hours to after-hours collaboration hours.
#' Returns a character string by default.
#'
#' @import dplyr
#'
#' @param data A data frame containing a Person Query.
#' @param threshold Numeric vector of length two specifying the lower and upper thresholds for flagging. Defaults to c(1, 30).
#' @param return String to specify what to return. Options include "message" for console return, and "text" for string return.
#'
#' @family Data Validation
#'
#' @examples
#' flag_ch_ratio(sq_data)
#'
#' \dontrun{
#' tibble(PersonId = c("Alice", "Bob"),
#' Collaboration_hours = c(30, 0.5),
#' After_hours_collaboration_hours = c(0.5, 30)) %>%
#' flag_ch_ratio()
#' }
#'
#' @export
flag_ch_ratio <- function(data, threshold = c(1, 30), return = "message"){
min_thres <- min(threshold, na.rm = TRUE)
max_thres <- max(threshold, na.rm = TRUE)
## Check for high collab hours but lower afterhour collab hours
## Because of faulty outlook settings
ch_summary <-
data %>%
group_by(PersonId) %>%
summarise_at(vars(Collaboration_hours, After_hours_collaboration_hours), ~mean(.)) %>%
mutate(CH_ratio = Collaboration_hours / After_hours_collaboration_hours) %>%
arrange(desc(CH_ratio)) %>%
mutate(CH_FlagLow = ifelse(CH_ratio < min_thres, TRUE, FALSE),
CH_FlagHigh = ifelse(CH_ratio > max_thres, TRUE, FALSE),
CH_Flag = ifelse(CH_ratio > max_thres | CH_ratio < min_thres, TRUE, FALSE))
## Percent of people with high collab hours + low afterhour collab hours
CHFlagN <- sum(ch_summary$CH_Flag, na.rm = TRUE)
CHFlagProp <- mean(ch_summary$CH_Flag, na.rm = TRUE)
CHFlagProp2 <- paste(round(CHFlagProp * 100), "%") # Formatted
CHFlagMessage_Warning <- paste0("[Warning] The ratio of after-hours collaboration to total collaboration hours is outside the expected threshold for ", CHFlagN, " employees (", CHFlagProp2, " of the total).")
CHFlagMessage_Pass_Low <- paste0("[Pass] The ratio of after-hours collaboration to total collaboration hours is outside the expected threshold for only ", CHFlagN, " employees (", CHFlagProp2, " of the total).")
CHFlagMessage_Pass_Zero <- paste0("[Pass] The ratio of after-hours collaboration to total collaboration hours falls within the expected threshold for all employees.")
CHFlagLowN <- sum(ch_summary$CH_FlagLow, na.rm = TRUE)
CHFlagLowProp <- mean(ch_summary$CH_FlagLow, na.rm = TRUE)
CHFlagLowProp2 <- paste(round(CHFlagLowProp * 100), "%") # Formatted
CHFlagLowMessage <- paste0("- ", CHFlagLowN, " employees (", CHFlagLowProp2,
") have an unusually low after-hours collaboration")
CHFlagHighN <- sum(ch_summary$CH_FlagHigh, na.rm = TRUE)
CHFlagHighProp <- mean(ch_summary$CH_FlagHigh, na.rm = TRUE)
CHFlagHighProp2 <- paste(round(CHFlagHighProp * 100), "%") # Formatted
CHFlagHighMessage <- paste0("- ", CHFlagHighN, " employees (", CHFlagHighProp2 , ") have an unusually high after-hours collaboration (relative to weekly collaboration hours)")
if(CHFlagProp >= .05){
CHFlagMessage <- paste(CHFlagMessage_Warning, CHFlagHighMessage, CHFlagLowMessage, sep = "\n")
} else if(CHFlagProp < .05 & CHFlagProp > 0){
CHFlagMessage <- paste(CHFlagMessage_Pass_Low, CHFlagHighMessage, CHFlagLowMessage, sep = "\n")
} else if(CHFlagProp==0){
CHFlagMessage <- CHFlagMessage_Pass_Zero
}
## Print diagnosis
## Should implement options to return the PersonIds or a full data frame
if(return == "message"){
message(CHFlagMessage)
} else if(return == "text"){
CHFlagMessage
}
}

43
R/flag_em_ratio.R Normal file

@ -0,0 +1,43 @@
#' @title Flag unusual high Email hours to emails sent ratio
#'
#' @description This function flags persons who have an unusual ratio
#' of email hours to emails sent.
#' Returns a summary table by default.
#'
#' @import dplyr
#'
#' @family Data Validation
#'
#' @param data A data frame containing a Person Query.
#' @param threshold Numeric value specifying the threshold for flagging. Defaults to 1.
#'
#' @examples
#' flag_em_ratio(sq_data)
#'
#' @export
flag_em_ratio <- function(data, threshold = 1){
## Check for high collab hours but lower afterhour collab hours
## Because of faulty outlook settings
em_summary <-
data %>%
group_by(PersonId) %>%
summarise_at(vars(Email_hours, Emails_sent), ~mean(.)) %>%
mutate(Email_ratio = Email_hours / Emails_sent) %>%
arrange(desc(Email_ratio)) %>%
mutate(Email_Flag = ifelse(Email_ratio > threshold, TRUE, FALSE))
## Percent of people with high collab hours + low afterhour collab hours
EmailFlagN <- sum(em_summary$Email_Flag, na.rm = TRUE)
EmailFlagProp <- mean(em_summary$Email_Flag, na.rm = TRUE)
EmailFlagProp2 <- paste(round(EmailFlagProp * 100), "%") # Formatted
EmailFlagMessage <- paste0(EmailFlagProp2, " (", EmailFlagN, ") ",
"of the population have an unusually high email hours to emails sent ratio.")
## Print diagnosis
## Should implement options to return the PersonIds or a full data frame
# EmailFlagMessage
em_summary
}

88
R/flag_extreme.R Normal file

@ -0,0 +1,88 @@
#' @title Warn if a certain metric exceeds an arbitrary threshold
#'
#' @description
#' This is used as part of data validation to check if there are extreme values
#' in the dataset.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param metric A character string specifying the metric to test.
#' @param person A logical value to specify whether to calculate person-averages.
#' Defaults to TRUE (person-averages calculated).
#' @param threshold Numeric value specifying the threshold for flagging.
#' @param return A character string specifying what to return.
#' Valid options include "message" (default), "text", and "table".
#'
#' @family Data Validation
#'
#' @import dplyr
#'
#' @examples
#' \dontrun{
#' # The threshold values are intentionally set low to trigger messages.
#' flag_extreme(sq_data, "Email_hours", threshold = 15)
#' flag_extreme(sq_data, "Email_hours", threshold = 15, return = "table")
#' flag_extreme(sq_data, "Email_hours", person = FALSE, threshold = 15)
#' }
#'
#' @export
flag_extreme <- function(data,
metric,
person = TRUE,
threshold,
return = "message"){
## Data frame containing the extreme values
if(person == TRUE){
extreme_df <-
data %>%
rename(metric = !!sym(metric)) %>%
group_by(PersonId) %>%
summarise_at(vars(metric), ~mean(.)) %>%
filter(metric > threshold) %>%
rename(!!sym(metric) := "metric")
} else if(person == FALSE){
extreme_df <-
data %>%
rename(metric = !!sym(metric)) %>%
filter(metric > threshold) %>%
rename(!!sym(metric) := "metric")
}
## Clean names for pretty printing
metric_nm <- gsub(pattern = "_", replacement = " ", x = metric)
metric_nm <- camel_clean(metric_nm)
## Define MessageLevel
if(person == TRUE){
MessageLevel <- " persons where their average "
} else if(person == FALSE){
MessageLevel <- " rows where their value of "
}
## Define FlagMessage
if(nrow(extreme_df) == 0){
FlagMessage <-
paste0("[Pass] There are no ",
MessageLevel,
metric_nm,
" exceeds ",
threshold, ".")
} else {
FlagMessage <-
paste0("[Warning] There are ",
nrow(extreme_df),
MessageLevel,
metric_nm,
" exceeds ",
threshold, ".")
}
if(return == "text"){
FlagMessage
} else if(return == "message"){
message(FlagMessage)
} else if(return == "table"){
extreme_df
}
}

105
R/flag_outlooktime.R Normal file

@ -0,0 +1,105 @@
#' @title Flag unusual outlook time settings for work day start and end time
#'
#' @description This function flags unusual outlook calendar settings for
#' start and end time of work day.
#'
#' @import dplyr
#'
#' @param data A data frame containing a Person Query.
#' @param threshold A numeric vector of length two, specifying the hour threshold for flagging.
#' Defaults to c(4, 15).
#' @param return String to specify what to return.
#' Valid options include "message" (default), "text", and "data".
#'
#' @family Data Validation
#'
#' @examples
#' \dontrun{
#' flag_outlooktime(sq_data)
#' }
#' @export
flag_outlooktime <- function(data, threshold = c(4, 15), return = "message"){
# pad_times <- function(x){
# if(nchar(x) == 1){
# x <- paste0("0", x, "00")
# } else if(nchar(x) == 2){
# x <- paste0(x, "00")
# } else if(nchar(x) == 3){
# x <- paste0("0", x)
# } else {
# x
# }
# }
#
# pad_times <- Vectorize(pad_times)
if(any(nchar(data$WorkingStartTimeSetInOutlook) != 5 | nchar(data$WorkingEndTimeSetInOutlook) != 5)){
stop("Please check data format for `WorkingStartTimeSetInOutlook` or `WorkingEndTimeSetInOutlook.\n
These variables must be character vectors, and have the format `%H:%M`, such as `07:30` or `23:00`.")
}
clean_times <- function(x){
out <- gsub(pattern = ":", replacement = "", x = x)
# out <- pad_times(out)
strptime(out, format = "%H%M")
}
flagged_data <-
data %>%
# mutate_at(vars(WorkingStartTimeSetInOutlook, WorkingEndTimeSetInOutlook), ~clean_times(.)) %>%
mutate_at(vars(WorkingStartTimeSetInOutlook, WorkingEndTimeSetInOutlook), ~gsub(pattern = ":", replacement = "", x = .)) %>%
mutate_at(vars(WorkingStartTimeSetInOutlook, WorkingEndTimeSetInOutlook), ~strptime(., format = "%H%M")) %>%
mutate(WorkdayRange = as.numeric(WorkingEndTimeSetInOutlook - WorkingStartTimeSetInOutlook, units = "hours"),
WorkdayFlag1 = WorkdayRange < threshold[[1]],
WorkdayFlag2 = WorkdayRange > threshold[[2]],
WorkdayFlag = WorkdayRange < threshold[[1]] | WorkdayRange > threshold[[2]]) %>%
select(PersonId, WorkdayRange, WorkdayFlag, WorkdayFlag1, WorkdayFlag2)
## Short working hour settings
FlagN1 <- sum(flagged_data$WorkdayFlag1, na.rm = TRUE)
FlagProp1 <- mean(flagged_data$WorkdayFlag1, na.rm = TRUE)
FlagProp1F <- paste0(round(FlagProp1 * 100, 1), "%") # Formatted
## Long working hour settings
FlagN2 <- sum(flagged_data$WorkdayFlag2, na.rm = TRUE)
FlagProp2 <- mean(flagged_data$WorkdayFlag2, na.rm = TRUE)
FlagProp2F <- paste0(round(FlagProp2 * 100, 1), "%") # Formatted
## Short or long working hour settings
FlagN <- sum(flagged_data$WorkdayFlag, na.rm = TRUE)
FlagProp <- mean(flagged_data$WorkdayFlag, na.rm = TRUE)
FlagPropF <- paste0(round(FlagProp * 100, 1), "%") # Formatted
## Flag Messages
Warning_Message <- paste0("[Warning] ", FlagPropF, " (", FlagN, ") ", "of the person-date rows in the data have extreme Outlook settings.")
Pass_Message1 <- paste0("[Pass] Only ", FlagPropF, " (", FlagN, ") ", "of the person-date rows in the data have extreme Outlook settings.")
Pass_Message2 <- paste0("There are no extreme Outlook settings in this dataset (working hours shorter than ", threshold[[1]], " hours, or longer than ", threshold[[2]], " hours).")
Detail_Message <- paste0(FlagProp1F, " (", FlagN1, ") ", " have an Outlook workday shorter than ", threshold[[1]], " hours, while ",
FlagProp2F, " (", FlagN2, ") ", "have a workday longer than ", threshold[[2]], " hours.")
if(FlagProp >= .05){
FlagMessage <- paste(Warning_Message, Detail_Message, sep = "\n")
} else if(FlagProp < .05 & FlagProp > 0){
FlagMessage <- paste(Pass_Message1, Detail_Message, sep = "\n")
} else if(FlagProp==0){
FlagMessage <- Pass_Message2
}
## Print diagnosis
## Should implement options to return the PersonIds or a full data frame
if(return == "text"){
FlagMessage
} else if(return == "message"){
message(FlagMessage)
} else if(return == "data"){
flagged_data[flagged_data$WorkdayFlag == TRUE,]
} else {
stop("Error: please check inputs for `return`")
}
}
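A sketch of the remaining `threshold` and `return` options, assuming the bundled `sq_data` demo dataset (like the example above, this is best run interactively):

```R
library(wpa)

flag_outlooktime(sq_data)                          # pass / warning message
flag_outlooktime(sq_data, threshold = c(6, 12))    # stricter definition of a usual workday
flag_outlooktime(sq_data, return = "data")         # only the flagged person rows
```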

19
R/g2g_data.R Normal file

@ -0,0 +1,19 @@
#' @title Sample Group-to-Group dataset
#'
#' @description
#' A dataset generated from a Group-to-Group Query from WpA.
#'
#' @format A data frame with 3517 rows and 7 variables:
#' \describe{
#' \item{TimeInvestors_Organization}{ }
#' \item{Collaborators_Organization}{ }
#' \item{Date}{ }
#' \item{Meetings}{ }
#' \item{Meeting_hours}{ }
#' \item{Email_hours}{ }
#' \item{Collaboration_hours}{ }
#'
#' ...
#' }
#' @source \url{https://workplaceanalytics-demo.office.com/en-us/Home}
"g2g_data"

93
R/g2g_network.R Normal file

@ -0,0 +1,93 @@
#' @title Create a network plot with the G2G query
#'
#' @description
#' Pass a data frame containing a G2G query and return a network
#' plot.
#' Automatically handles "Collaborators_within_group" and "Other_collaborators" within query data.
#'
#' @param data Data frame containing a G2G query.
#' @param time_investor String containing the variable name for the Time Investor column.
#' @param collaborator String containing the variable name for the Collaborator column.
#' @param metric String containing the variable name for metric.
#' @param exc_threshold Exclusion threshold to apply, as a proportion of each node's total collaboration; edges at or below this proportion are not displayed. Defaults to 0.1.
#' @param subtitle String to override default plot subtitle.
#' @param ... Additional arguments to pass to `GGally::ggnet2()`
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#'
#' @import ggplot2
#' @import dplyr
#'
#' @examples
#' \dontrun{
#' g2g_data %>%
#' g2g_network(time_investor = "TimeInvestors_Organization",
#' collaborator = "Collaborators_Organization",
#' metric = "Collaboration_hours")
#' }
#'
#' @export
g2g_network <- function(data,
time_investor,
collaborator,
metric,
exc_threshold = 0.1,
subtitle = "Collaboration Across Organizations",
return = "plot",
...){
plot_data <-
data %>%
rename(TimeInvestorOrg = time_investor,
CollaboratorOrg = collaborator,
Metric = metric) %>%
mutate(CollaboratorOrg = case_when(CollaboratorOrg == "Collaborators Within Group" ~ TimeInvestorOrg,
TRUE ~ CollaboratorOrg)) %>%
group_by(TimeInvestorOrg, CollaboratorOrg) %>%
filter(TimeInvestorOrg != "Other_Collaborators" &
CollaboratorOrg!="Other_Collaborators") %>%
summarise_at("Metric", ~mean(.)) %>%
group_by(TimeInvestorOrg) %>%
mutate(metric_prop = Metric / sum(Metric, na.rm = TRUE)) %>%
select(TimeInvestorOrg, CollaboratorOrg, metric_prop) %>%
ungroup()
if(return == "table"){
plot_data
} else if(return == "plot"){
mynet_em <-
plot_data %>%
filter(metric_prop > exc_threshold) %>%
mutate_at(vars(TimeInvestorOrg, CollaboratorOrg), ~sub(pattern = " ", replacement = "\n", x = .)) %>%
mutate(metric_prop = metric_prop * 10) %>%
network::network(matrix.type = "edgelist",
ignore.eval = FALSE,
names.eval = "weights")
mynet_em %>%
GGally::ggnet2(size = 12,
color = "lightblue",
label = TRUE,
label.size = 4,
label.color = "black",
edge.size = "weights",
edge.alpha = .5,
...) +
ggtitle("Group to Group Collaboration",
subtitle = subtitle) +
xlab(label = "") +
ylab(label = "") +
theme_wpa_basic() +
labs(caption = paste("Displays only collaboration above ", exc_threshold * 100, "% of node's total collaboration", sep = "")) +
theme(axis.line = element_blank())
} else {
stop("Please enter a valid input for `return`.")
}
}

196
R/generate_report.R Normal file

@ -0,0 +1,196 @@
#' @title Generate HTML report with list inputs
#'
#' @description
#' This is a support function using a list-pmap workflow to
#' create a HTML document, using RMarkdown as the engine.
#'
#' @param title Character string to specify the title of the chunk.
#' @param filename File name to be used in the exported HTML.
#' @param outputs A list of outputs to be added to the HTML report.
#' Note that `outputs`, `titles`, `subheaders`, `echos`, and `levels` must all have the same length.
#' @param titles A list/vector of character strings to specify the title of the chunks.
#' @param subheaders A list/vector of character strings to specify the subheaders for each chunk.
#' @param echos A list/vector of logical values to specify whether to display code.
#' @param levels A list/vector of numeric value to specify the header level of the chunk.
#' @param theme Character vector to specify theme to be used for the report.
#' E.g. "united", "default".
#' @param preamble A preamble to appear at the beginning of the report, passed as a text string.
#'
#' @importFrom purrr pmap
#' @importFrom purrr reduce
#'
#' @family Reports
#'
#' @examples
#' \dontrun{
#' # Step 1: Define Content
#' output_list <-
#' list(sq_data %>% workloads_summary(return = "plot"),
#' sq_data %>%
#' workloads_summary(return = "table")) %>%
#' purrr::map_if(is.data.frame, create_dt)
#'
#' # Step 2: Add Corresponding Titles
#' title_list <- c("Workloads Summary - Plot", "Workloads Summary - Table")
#' n_title <- length(title_list)
#'
#'# Step 3: Generate Report
#' generate_report(title = "My First Report",
#' filename = "My First Report",
#' outputs = output_list,
#' titles = title_list,
#' subheaders = rep("", n_title),
#' echos = rep(FALSE, n_title),
#' levels = rep(3, n_title))
#'
#' }
#' @export
generate_report <- function(title = "My minimal HTML generator",
filename = "minimal_html",
outputs = output_list,
titles,
subheaders,
echos,
levels,
theme = "united",
preamble = ""){
## Title of document
title_chr <- paste0('title: \"', title, '\"')
## chunk loopage
## merged to create `chunk_merged`
chunk_merged <-
list(output = outputs,
title = titles,
subheader = subheaders,
echo = echos,
level = levels,
id = seq(1, length(outputs))) %>%
purrr::pmap(function(output, title, subheader, echo, level, id){
generate_chunks(level = level,
title = title,
subheader = subheader,
echo = echo,
object = paste0("outputs[[", id, "]]"))
}) %>%
purrr::reduce(c)
# wpa_logo <- system.file("logos/logo.PNG", package = "wpa")
## markdown object
markobj <- c('---',
title_chr,
'output: ',
' html_document:',
paste0(' theme: ', theme),
# ' theme: united',
' toc: true',
' toc_float:',
' collapsed: false',
' smooth_scroll: true',
'---',
# paste0('![]("', wpa_logo, '")'),
'',
preamble,
'',
chunk_merged)
writeLines(markobj, paste0(filename, ".Rmd"))
rmarkdown::render(paste0(filename, ".Rmd"))
## Load in browser
utils::browseURL(paste0(filename, ".html"))
## Deletes specified files
unlink(c(paste0(filename, ".Rmd"),
paste0(filename, ".md")))
}
#' @title Generate chunk strings
#'
#' @description This is used as a supporting function for `generate_report()`
#' and not directly used. `generate_report()` works by creating a
#' loop structure around generate_chunks(), and binds them together
#' to create a report.
#'
#' @details
#' `generate_chunks()` is primarily a wrapper around paste() functions,
#' to create a structured character vector that will form the individual
#' chunks. No plots 'exist' within the environment of `generate_chunks()`.
#'
#' @param level Numeric value to specify the header level of the chunk.
#' @param title Character string to specify the title of the chunk.
#' @param subheader Character string to specify the subheader of the chunk.
#' @param echo Logical value to specify whether to display code.
#' @param object Character string to specify name of the object to show.
generate_chunks <- function(level = 3,
title,
subheader = "",
echo = FALSE,
object){
level_hash <- paste(rep('#', level), collapse = "")
obj <- c(paste(level_hash, title),
subheader,
paste0('```{r, echo=',
echo,
', fig.height=9, fig.width=12}'),
object,
'```',
' ')
return(obj)
}
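To make the chunk structure concrete, a small sketch of what `generate_chunks()` assembles. Since it is an internal helper (not exported), this assumes the package source is loaded, e.g. via `devtools::load_all()`; the title and object names here are invented.

```R
chunk <- generate_chunks(level = 3,
                         title = "Email Summary - Plot",
                         subheader = "Average weekly email hours",
                         echo = FALSE,
                         object = "outputs[[1]]")

# Prints a '###'-level header, the subheader, and an R chunk that evaluates outputs[[1]]
cat(chunk, sep = "\n")
```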
#' Read preamble
#'
#' Read in a preamble to be used within each individual reporting function.
#' Reads from the Markdown file installed with the package.
#'
#' @param path Text string containing the path for the appropriate Markdown file.
#'
#' @export
read_preamble <- function(path){
full_path <- paste0("/preamble/", path)
complete_path <- paste0(path.package("wpa"), full_path)
text <- suppressWarnings(readLines(complete_path))
return(text)
}
#' Display HTML fragment in RMarkdown chunk, from Markdown text
#'
#' @description
#' This is a wrapper around `markdown::markdownToHTML()`, where
#' the default behaviour is to produce a HTML fragment.
#' `htmltools::HTML()` is then used to evaluate the HTML code
#' within a RMarkdown chunk.
#'
#' @importFrom htmltools HTML
#' @importFrom markdown markdownToHTML
#'
#' @param text Character vector containing Markdown text
#'
md2html <- function(text){
html_chunk <- markdown::markdownToHTML(text = text,
fragment.only = TRUE)
htmltools::HTML(html_chunk)
}
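A tiny sketch of `md2html()` on invented Markdown text; like `generate_chunks()`, it is an internal helper, so this assumes the package source is loaded.

```R
preamble_text <- c("## About this report",
                   "Generated with the **{wpa}** R package.")

md2html(preamble_text)   # returns an HTML fragment wrapped with htmltools::HTML()
```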

184
R/globals.R Normal file

@ -0,0 +1,184 @@
###################################################################
## Global Variables
## This file is added to minimize the false positives flagged during R CMD check.
## Example: afterhours_trend: no visible binding for global variable 'Date'
###################################################################
utils::globalVariables(
c(
"Date",
"Start",
"End",
"PersonId",
"group",
"Period",
"outcome",
".",
"After_hours_collaboration_hours",
"Employee_Count",
"bins",
"Metric",
"Hours",
"Collaboration_hours",
"bucket_hours",
"Employees",
"Total",
"Value",
"Email_hours",
"External_network_size",
"Networking_outside_company",
"Ext_network_size",
"Ext_network_breadth",
"CH_ratio",
"Emails_sent",
"Email_ratio",
"WorkingEndTimeSetInOutlook",
"WorkingStartTimeSetInOutlook",
"WorkdayRange",
"WorkdayFlag",
"WorkdayFlag1",
"WorkdayFlag2",
"TimeInvestorOrg",
"CollaboratorOrg",
"metric_prop",
"output_list",
"n_unique",
"attribute",
"NA_per",
"no visible global function definition for head",
"mean_collab",
"holidayweek",
"ymin",
"ymax",
"z_score",
"Organization",
"flag_nkw",
"perc",
"tenure_years",
"odd_tenure",
"Internal_network_size",
"Networking_outside_organization",
"Int_network_size",
"Int_network_breadth",
"Workweek_span",
"Meeting_hours",
"Meetings",
"After_hours_meeting_hours",
"Low_quality_meeting_hours",
"Generated_workload_email_recipients",
"After_hours_email_hours",
"Total_focus_hours",
"variable",
"Multitasking_meeting_hours",
"Redundant_meeting_hours__organizational_",
"Conflicting_meeting_hours",
"Meeting_count",
"perc_after_hours_m",
"perc_low_quality",
"perc_Multitasking",
"perc_Redundant",
"perc_conflicting",
"Redundant_meeting_hours__lower_level_",
"HourType",
"extract_prop",
"extract_raw",
"RawHours",
"MeetingType",
"Attendee_meeting_hours",
"AttendeeMeetingHours",
"Prop",
"Duration",
"Attendees",
"Percent",
"value2",
"x",
"y",
"Meeting_hours_with_manager",
"coattendman_rate",
"bucket_coattendman_rate",
"Meeting_hours_with_manager_1_on_1",
"mgr1on1",
"coattendande",
"mgrRel",
"xmin",
"xmax",
"bucket_manager_1_on_1",
"Minutes_with_manager_1_on_1",
"Metrics",
"Values",
"Checks",
"Variables",
"UniqueValues",
"MissingValues",
"Examples",
"KPI",
"After",
"Before",
"delta",
"perc_diff",
"Mybins",
"Employee_perc",
"cluster",
"key",
"clusters",
"label",
"Signals",
"StartEnd",
"Before_start",
"Within_hours",
"After_end",
"Cases",
"subjectFlag",
"Subject",
"line",
"text",
"word",
"name",
"freq",
"pre_group",
"Utilization_hrs",
"UH_bin",
"UH_over_45",
"h_clust",
"dist_m",
"Signals_Total",
"Signals_sent",
"Personas",
"Employee_count",
"external_network_plot",
"total",
"value",
"prop",
"ODDS",
"..input_var2",
".N",
".SD",
"Freq",
"PersonCount",
"Shifts",
"Shoulders",
"per",
"sent",
"Fill",
"PersonWeekId",
"WeekCount",
"WeekPercentage",
"patternRank",
"Count",
"Day_Span",
"First_signal",
"Last_signal",
"MeetingHoursInLongOrLargeMeetings",
"Percentage",
"TenureYear",
"calculation",
"perc_nkw",
"value_rescaled",
"values",
"cleaned_data",
"zscore"
)
)

58
R/hr_trend.R Normal file

@ -0,0 +1,58 @@
#' @title Employee count over time
#'
#' @description Returns a line chart showing the change in
#' employee count over time. Part of a data validation process to check
#' for unusual license growth / declines over time.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#'
#' @examples
#' \dontrun{
#' hr_trend(sq_data)
#' hr_trend(dv_data)
#' hr_trend(dv_data, return = "table")
#' }
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @export
hr_trend <- function(data, return = "plot"){
options(dplyr.summarise.inform = FALSE)
data$Date <- as.Date(data$Date, format = "%m/%d/%Y")
## Date range data frame
myPeriod <- extract_date_range(data)
plot_data <-
data %>%
group_by(Date) %>%
summarise(n = n_distinct(PersonId)) %>%
ungroup()
if(return == "plot"){
plot_data %>%
ggplot(aes(x = Date, y = n)) +
geom_line(size = 1) +
labs(title = "Population over time",
subtitle = "Unique licensed population by week",
caption = paste("Data from week of", myPeriod$Start, "to week of", myPeriod$End)) +
ylab("Employee count") +
xlab("Date") +
scale_y_continuous(labels = round, limits = c(0,NA)) +
theme_wpa_basic()
} else if(return == "table"){
plot_data
} else {
stop("Please enter a valid input for `return`.")
}
}

102
R/hrvar_count.R Normal file

@ -0,0 +1,102 @@
#' @title Create a count of distinct people in a specified HR variable
#'
#' @description
#' This function enables you to create a count of the distinct people
#' by the specified HR attribute. The default behaviour is to return a
#' bar chart as typically seen in 'Analysis Scope'.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation".
#' If a vector with more than one value is provided, the HR attributes are automatically
#' concatenated.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import ggplot2
#' @import dplyr
#' @importFrom data.table ":=" "%like%" "%between%"
#'
#' @family General
#' @family Data Validation
#'
#' @examples
#' # Return a bar plot
#' hrvar_count(sq_data, hrvar = "LevelDesignation")
#'
#' # Return a summary table
#' hrvar_count(sq_data, hrvar = "LevelDesignation", return = "table")
#'
#'@export
hrvar_count <- function(data,
hrvar = "Organization",
return = "plot"){
## Allow multiple HRvar inputs
if(length(hrvar) > 1){
hrvar_flat <- paste(hrvar, collapse = ", ")
summary_table <-
data %>%
select(PersonId, all_of(hrvar)) %>%
mutate(!!sym(hrvar_flat) := select(., hrvar) %>%
apply(1, paste, collapse = ", ")) %>%
group_by(!!sym(hrvar_flat)) %>%
summarise(n = n_distinct(PersonId)) %>%
arrange(desc(n))
# Single reference for single and multiple org attributes
hrvar_label <- hrvar_flat
} else {
summary_table <-
data %>%
select(PersonId, all_of(hrvar)) %>%
group_by(!!sym(hrvar)) %>%
summarise(n = n_distinct(PersonId)) %>%
arrange(desc(n))
# Single reference for single and multiple org attributes
hrvar_label <- hrvar
}
if(return == "table"){
data %>%
data.table::as.data.table() %>%
.[, .(n = n_distinct(PersonId)), by = hrvar] %>%
as_tibble() %>%
arrange(desc(n))
} else if(return == "plot"){
## This is re-run to enable multi-attribute grouping without concatenation
summary_table %>%
ggplot(aes(x = stats::reorder(!!sym(hrvar_label), -n),
y = n)) +
geom_col(fill = rgb2hex(0, 120, 212)) +
geom_text(aes(label = n),
vjust = -1,
fontface = "bold",
size = 4)+
theme_wpa_basic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = paste("People by", camel_clean(hrvar_label))) +
scale_y_continuous(limits = c(0, max(summary_table$n) * 1.1)) +
xlab(camel_clean(hrvar_label)) +
ylab("Number of employees")
} else {
stop("Please enter a valid input for `return`.")
}
}
#' @rdname hrvar_count
#' @export
analysis_scope <- hrvar_count
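A short sketch of the single- and multi-attribute behaviour described above, assuming the bundled `sq_data` demo dataset (the concatenated group label in the comment is illustrative):

```R
library(wpa)

# Single attribute: bar chart of distinct people per level
hrvar_count(sq_data, hrvar = "LevelDesignation")

# Two attributes are concatenated into one grouping, e.g. "<Organization>, <LevelDesignation>"
hrvar_count(sq_data, hrvar = c("Organization", "LevelDesignation"), return = "table")
```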

105
R/hrvar_count_all.R Normal file

@ -0,0 +1,105 @@
#' @title Create count of distinct fields and percentage of employees with NAs for all HR variables
#'
#' @description
#' This function enables you to create a summary table to validate organizational data.
#' The table provides a summary of the data found in the WpA Sources page,
#' listing the count of distinct fields per HR attribute and the percentage of
#' employees with missing values for that attribute.
#' See the `hrvar_count()` function for more detail on a specific HR attribute of interest.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param n_var Number of HR variables to include in the report as rows. Defaults to 50.
#' @param return String to specify what to return.
#' Valid options include "message" (default), "text", and "table".
#' @param threshold The max number of unique values allowed for any attribute. Default is 100.
#' @param maxna The max percentage of NAs allowable for any column. Default is 20.
#'
#' @import dplyr
#'
#' @family Data Validation
#'
#' @return
#' Returns a console message by default, where 'message' is passed in `return`,
#' indicating which attributes exceed the specified thresholds.
#' When 'table' is passed, a summary table listing the number of distinct fields and the percentage of NAs for each HR attribute is returned.
#' When 'text' is passed, the same diagnostic message is returned as a character string.
#'
#' @export
hrvar_count_all <- function(data,
n_var = 50,
return = "message",
threshold = 100,
maxna = 20
){
## Character vector of HR attributes
extracted_chr <- extract_hr(data, return = "names")
summary_table_n <-
data %>%
select(PersonId, extracted_chr) %>%
summarise_at(vars(extracted_chr), ~n_distinct(.,na.rm = TRUE)) # Excludes NAs from unique count
## Note: WPA here is used for a matching separator
results <-
data %>%
select(PersonId, extracted_chr) %>%
summarise_at(vars(extracted_chr),
list(`WPAn_unique` = ~n_distinct(., na.rm = TRUE), # Excludes NAs from unique count
`WPAper_na` = ~(sum(is.na(.))/ nrow(data) * 100),
`WPAsum_na` = ~sum(is.na(.)) # Number of missing values
)) %>% # % of missing values
tidyr::gather(attribute, values) %>%
tidyr::separate(col = attribute, into = c("attribute", "calculation"), sep = "_WPA") %>%
tidyr::spread(calculation, values)
## Single print message
if(sum(results$n_unique >= threshold)==0){
printMessage <- paste("No attributes have greater than", threshold, "unique values.")
}
if(sum(results$per_na >= maxna)==0){
newMessage <- paste("No attributes have more than", maxna, "percent NA values.")
printMessage <- paste(printMessage, newMessage, collapse = "\n")
}
for (i in 1:nrow(results)) {
if(results$n_unique[i] >= threshold){
newMessage <- paste0("The attribute '",results$attribute[i],"' has a large amount of unique values. Please check.")
printMessage <- paste(printMessage, newMessage, collapse = "\n")
}
if(results$per_na[i]>=maxna){
newMessage <- paste0("The attribute '",results$attribute[i],"' has a large amount of NA values. Please check.")
printMessage <- paste(printMessage, newMessage, collapse = "\n")
}
}
if(return == "table"){
results <-
results %>%
select(Attributes = "attribute",
`Unique values` = "n_unique",
`Total missing values` = "sum_na",
`% missing values` = "per_na")
return(utils::head(results, n_var))
} else if(return == "text"){
printMessage
} else if(return == "message"){
message(printMessage)
} else {
stop("Error: please check inputs for `return`")
}
}
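A brief sketch of the validation workflow, assuming the bundled `sq_data` demo dataset:

```R
library(wpa)

# Console message flagging attributes with too many unique values or too many NAs
hrvar_count_all(sq_data)

# Summary table: unique values, missing counts and missing-value share per HR attribute
hrvar_count_all(sq_data, return = "table")
```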

113
R/identify_churn.R Normal file

@ -0,0 +1,113 @@
#' @title Identify employees who have churned from the dataset
#'
#' @description
#' This function identifies and counts the number of employees who have churned from
#' the dataset by measuring whether an employee who is present in the first `n` (n1) weeks
#' of the data is present in the last `n` (n2) weeks of the data.
#'
#' @details
#' An additional use case of this function is the ability to identify "new-joiners" by using
#' the argument `flip`.
#'
#' @param data A Person Query as a data frame. Must contain a `PersonId`.
#' @param n1 A numeric value specifying the number of weeks at the beginning of the period
#' that defines the measured employee set. Defaults to 6.
#' @param n2 A numeric value specifying the number of weeks at the end of the period
#' to calculate whether employees have churned from the data. Defaults to 6.
#' @param return String specifying what to return. Defaults to "message", with options to
#' return a character string ("text") or the `PersonId` of employees who have been identified
#' as churned ("data").
#' @param flip Logical, defaults to FALSE. This determines whether to reverse the logic of identifying the
#' non-overlapping set. If set to `TRUE`, this effectively identifies new-joiners, or those
#' who were not present in the first n weeks of the data but were present in the final n weeks.
#'
#' @details
#' If an employee is present in the first `n` weeks of the data but not present in the last
#' `n` weeks of the data, the function considers the employee as churned. As the measurement period
#' is defined by the number of weeks from the start and the end of the passed data frame, you
#' may consider filtering the dates accordingly before running this function.
#'
#' Another assumption that is in place is that any employee whose `PersonId` is not available in the
#' data has churned. Note that there may be other reasons why an employee's `PersonId` may not
#' be present, e.g. maternity/paternity leave, Workplace Analytics license has been removed,
#' shift to a low-collaboration role (to the extent that he/she becomes inactive).
#' @examples
#' \dontrun{
#' sq_data %>% identify_churn(n1 = 3, n2 = 3, return = "message")
#' }
#' @export
identify_churn <- function(data,
n1 = 6,
n2 = 6,
return = "message",
flip = FALSE){
data$Date <- as.Date(data$Date, format = "%m/%d/%Y") # Ensure correct format
unique_dates <- unique(data$Date) # Vector of unique dates
nlen <- length(unique_dates) # Total number of unique dates
# First and last n weeks
firstnweeks <- sort(unique_dates)[1:n1]
lastnweeks <- sort(unique_dates, decreasing = TRUE)[1:n2]
## People in the first week
first_peeps <-
data %>%
dplyr::filter(Date %in% firstnweeks) %>%
dplyr::pull(PersonId) %>%
unique()
## People in the last week
final_peeps <-
data %>%
dplyr::filter(Date %in% lastnweeks) %>%
dplyr::pull(PersonId) %>%
unique()
if(flip == FALSE){
## In first, not in last
churner_id <- setdiff(first_peeps, final_peeps)
## Message
printMessage <-
paste0("Churn:\nThere are ", length(churner_id),
" employees from ", min(firstnweeks), " to ",
max(firstnweeks), " (", n1, " weeks)",
" who are no longer present in ",
min(lastnweeks), " to ", max(lastnweeks),
" (", n2, " weeks).")
} else if(flip == TRUE){
## In last, not in first
## new joiners
churner_id <- dplyr::setdiff(final_peeps, first_peeps)
## Message
printMessage <-
paste0("New joiners:\nThere are ", length(churner_id),
" employees from ", min(lastnweeks), " to ",
max(lastnweeks), " (", n2, " weeks)",
" who were not present in ",
min(firstnweeks), " to ", max(firstnweeks),
" (", n1, " weeks).")
} else {
stop("Invalid argument for `flip`")
}
if(return == "message"){
message(printMessage)
} else if(return == "text"){
printMessage
} else if(return == "data"){
churner_id
} else {
stop("Invalid `return`")
}
}
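To complement the example above, a sketch of the `flip` behaviour that surfaces new-joiners rather than churners (again using the bundled `sq_data`):

```R
library(wpa)

# Employees present in the first six weeks but absent from the last six weeks
identify_churn(sq_data)

# Flip the comparison: present at the end but not at the start ("new-joiners")
identify_churn(sq_data, flip = TRUE, return = "data")
```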

105
R/identify_holidayweeks.R Normal file

@ -0,0 +1,105 @@
#' @title Identify Holiday Weeks
#'
#' @description
#' This function scans a standard query output for weeks where average collaboration hours are far below the mean.
#' Returns a list of weeks that appear to be holiday weeks and optionally an edited dataframe with outliers removed.
#' By default, missing values are excluded.
#'
#' As best practice, run this function prior to any analysis to remove atypical collaboration weeks from your dataset.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param sd The standard deviation below the mean for collaboration hours that should define an outlier week. Enter a positive number. Default is 1 standard deviation.
#' @param return String to specify what to return
#'
#' @import dplyr
#' @import ggplot2
#' @importFrom methods is
#'
#' @family Data Validation
#'
#' @return
#' Returns a message by default (`return` = "message").
#' When 'message' is passed, a message is printed identifying holiday weeks.
#' When 'data' is passed, a dataset with outlier weeks flagged in a new column is returned as a dataframe.
#' When 'data_cleaned' is passed, a dataset with outlier weeks removed is returned as a dataframe.
#' When 'data_dirty' is passed, a dataset with only outlier weeks is returned as a dataframe.
#' When 'plot' is passed, a plot with holiday weeks highlighted is returned as a ggplot object.
#' When 'text' is passed, the diagnostic message is returned as a character string.
#'
#' @export
identify_holidayweeks <- function(data, sd = 1, return = "message"){
## Ensure date is formatted
if(all(is_date_format(data$Date))){
data$Date <- as.Date(data$Date, format = "%m/%d/%Y")
} else if(is(data$Date, "Date")){
# Do nothing
} else {
stop("`Date` appears not to be properly formatted.\n
It needs to be in the format MM/DD/YYYY.\n
Also check for missing values or stray values with inconsistent formats.")
}
Calc <-
data %>%
group_by(Date) %>%
summarize(mean_collab = mean(Collaboration_hours, na.rm = TRUE),.groups = 'drop') %>%
mutate(z_score = (mean_collab - mean(mean_collab, na.rm = TRUE))/ sd(mean_collab, na.rm = TRUE))
Outliers = (Calc$Date[Calc$z_score < -sd])
mean_collab_hrs <- mean(Calc$mean_collab, na.rm = TRUE)
Message <- paste0("The weeks where collaboration was ",
sd,
" standard deviations below the mean (",
round(mean_collab_hrs, 1),
") are: \n",
paste(wrap(Outliers, wrapper = "`"),collapse = ", "))
myTable_plot <-
data %>%
mutate(holidayweek = (Date %in% Outliers)) %>%
select("Date", "holidayweek", "Collaboration_hours") %>%
group_by(Date) %>%
summarise(Collaboration_hours=mean(Collaboration_hours), holidayweek=first(holidayweek)) %>%
mutate(Date=as.Date(Date, format = "%m/%d/%Y"))
myTable_plot_shade <-
myTable_plot %>%
filter(holidayweek == TRUE) %>%
mutate(min = Date - 3 , max = Date + 3 , ymin = -Inf, ymax = +Inf)
plot <-
myTable_plot %>%
ggplot(aes(x = Date, y = Collaboration_hours, group = 1)) +
geom_line(colour = "grey40") +
theme_wpa_basic() +
geom_rect(data = myTable_plot_shade,
aes(xmin = min, xmax = max, ymin = ymin, ymax = ymax),
color="transparent", fill="steelblue", alpha=0.3) +
labs(title = "Holiday Weeks",
subtitle = "Showing average collaboration hours over time")+
ylab("Collaboration Hours") +
xlab("Date")
if(return == "text"){
return(Message)
} else if(return == "message"){
message(Message)
}else if(return %in% c("data_clean", "data_cleaned")){
return(data %>% filter(!(Date %in% Outliers)) %>% data.frame())
} else if(return == "data_dirty"){
return(data %>% filter((Date %in% Outliers)) %>% data.frame())
} else if(return == "data"){
return(data %>% mutate(holidayweek = (Date %in% Outliers)) %>% data.frame())
} else if(return == "plot"){
return(plot)
} else {
stop("Error: please check inputs for `return`")
}
}
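A short usage sketch (illustrative only), assuming the bundled `sq_data` demo dataset:

```R
library(wpa)

identify_holidayweeks(sq_data)                                       # message listing outlier weeks
identify_holidayweeks(sq_data, return = "plot")                      # line chart with those weeks shaded
cleaned <- identify_holidayweeks(sq_data, return = "data_cleaned")   # drop them before analysis
```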


@ -0,0 +1,55 @@
#' @title Identify Inactive Weeks
#'
#' @description
#' This function scans a standard query output for weeks where a person's collaboration hours are far below that person's own mean.
#' Returns a list of weeks that appear to be inactive weeks and optionally an edited dataframe with outliers removed.
#'
#' As best practice, run this function prior to any analysis to remove atypical collaboration weeks from your dataset.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param sd The standard deviation below the mean for collaboration hours that should define an outlier week. Enter a positive number. Default is 1 standard deviation.
#' @param return String to specify what to return.
#'
#' @import dplyr
#'
#' @family Data Validation
#'
#' @return
#' Returns a diagnostic message by default, where 'text' is passed in `return`.
#' When 'data_cleaned' is passed, a dataset with outlier weeks removed is returned as a dataframe.
#' When 'data_dirty' is passed, a dataset with only outlier weeks is returned as a dataframe.
#' When 'data' is passed, the full dataset is returned with an additional `inactiveweek` flag column.
#'
#' @export
identify_inactiveweeks <- function(data, sd = 2, return = "text"){
Calc <-
data %>%
group_by(PersonId) %>%
mutate(z_score = (Collaboration_hours - mean(Collaboration_hours))/sd(Collaboration_hours)) %>%
filter(z_score<= -sd) %>%
select(PersonId,Date,z_score) %>%
data.frame()
Message <- paste0("There are ",nrow(Calc), " rows of data with weekly collaboration hours more than ",sd," standard deviations below the mean.")
if(return == "text"){
return(Message)
} else if(return == "data_dirty"){
data %>% group_by(PersonId) %>% mutate(
z_score = (Collaboration_hours - mean(Collaboration_hours))/sd(Collaboration_hours)) %>%
filter(z_score<= -sd) %>% select(-z_score) %>% data.frame()
} else if(return == "data_cleaned"){
data %>% group_by(PersonId) %>% mutate(
z_score = (Collaboration_hours - mean(Collaboration_hours))/sd(Collaboration_hours)) %>%
filter(z_score> -sd) %>% select(-z_score) %>% data.frame()
} else if(return == "data"){
data %>% group_by(PersonId) %>% mutate(
z_score = (Collaboration_hours - mean(Collaboration_hours))/sd(Collaboration_hours)) %>%
mutate(inactiveweek = (z_score<= -sd)) %>% select(-z_score) %>% data.frame()
} else {
stop("Error: please check inputs for `return`")
}
}
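Similarly, a sketch for removing person-weeks that look inactive, assuming the bundled `sq_data`:

```R
library(wpa)

identify_inactiveweeks(sq_data)                                   # diagnostic text
cleaned <- identify_inactiveweeks(sq_data, sd = 2, return = "data_cleaned")
```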

96
R/identify_nkw.R Normal file

@ -0,0 +1,96 @@
#' @title Identify Non-Knowledge workers (nkw)
#'
#' @description
#' This function scans a standard query output to identify employees with consistently low collaboration signals.
#' Returns the % of non-knowledge workers identified by Organization, and optionally an edited dataframe with non-knowledge workers removed, or the full dataframe with the kw/nkw flag added.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param collab_threshold The collaboration hours threshold that should be exceeded as an average for the entire analysis period for the employee to be categorized as a knowledge worker ("kw").
#' Enter a positive number. Default is set to 5 collaboration hours.
#' @param return String to specify what to return
#'
#' @import dplyr
#'
#' @family Data Validation
#'
#' @return
#' Returns a summary table by default, where 'data_summary' is passed in `return`,
#' listing the number and % of non-knowledge workers by organization.
#' When 'data_with_flag' is passed, a copy of your original dataframe with an additional column containing the kw/nkw flag will be returned.
#' When 'data_clean' is passed, a full dataframe excluding "nkw" employees will be returned.
#' When 'text' is passed, a diagnostic message is returned as a character string.
#'
#' @export
identify_nkw <- function(data, collab_threshold = 5, return = "data_summary"){
options(dplyr.summarise.inform = FALSE)
summary_byPersonId <-
data %>%
group_by(PersonId, Organization) %>%
summarize(mean_collab = mean(Collaboration_hours))%>%
mutate(flag_nkw = case_when(mean_collab > collab_threshold ~ "kw",
TRUE ~ "nkw"))
data_with_flag <- left_join(data,
summary_byPersonId %>%
dplyr::select(PersonId,flag_nkw), by = 'PersonId')
summary_byOrganization <-
summary_byPersonId %>%
group_by(Organization, flag_nkw)%>%
summarise(total = n())%>%
group_by(Organization)%>%
mutate(perc = total/sum(total))%>% #need to format to %
filter(flag_nkw == "nkw")%>%
rename(n_nkw = total, perc_nkw = perc)%>%
select(-flag_nkw) %>%
ungroup()
## Number of NKW identified
n_nkw <- sum(summary_byPersonId$flag_nkw == "nkw")
if(n_nkw == 0){
flagMessage <- paste0("[Pass] There are no non-knowledge workers identified",
" (average collaboration hours below ",
collab_threshold,
" hours).")
} else {
flagMessage <-
paste0("[Warning] Out of a population of ", n_distinct(data$PersonId),
", there are ", n_nkw,
" employees who may be non-knowledge workers (average collaboration hours below ",
collab_threshold, " hours).")
}
if(return == "data_with_flag"){
return(data_with_flag)
} else if(return %in% c("data_clean", "data_cleaned")){
return(data_with_flag %>% filter(flag_nkw == "kw") %>% data.frame())
} else if(return == "text"){
flagMessage
} else if(return =="data_summary"){
summary_byOrganization %>%
mutate(perc_nkw = scales::percent(perc_nkw, accuracy = 1)) %>%
rename(`Non-knowledge workers (count)` = "n_nkw",
`Non-knowledge workers (%)` = "perc_nkw")
} else {
stop("Error: please check inputs for `return`")
}
}

51
R/identify_outlier.R Normal file

@ -0,0 +1,51 @@
#' @title Identify outliers across time
#'
#' @description
#' This function takes in a selected metric and uses
#' z-score (number of standard deviations) to identify outliers
#' across time. There are applications in this for identifying
#' weeks with abnormally low collaboration activity, e.g. holidays.
#' Time as a grouping variable can be overridden with the `group_var`
#' argument.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param group_var A string with the name of the grouping variable.
#' Defaults to `Date`.
#' @param metric Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#'
#' @import dplyr
#'
#' @examples
#' identify_outlier(sq_data, metric = "Collaboration_hours")
#'
#' @return
#' Returns a data frame with `Date` (if grouping variable is not set),
#' the metric, and the corresponding z-score.
#'
#' @family General
#' @family Data Validation
#'
#' @export
identify_outlier <- function(data,
group_var = "Date",
metric = "Collaboration_hours"){
## Check inputs
required_variables <- c(group_var,
"PersonId",
metric)
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
main_table <-
data %>%
group_by(!!sym(group_var)) %>%
summarise_at(vars(!!sym(metric)), ~mean(.)) %>%
mutate(zscore = (!!sym(metric) - mean(!!sym(metric)))/sd(!!sym(metric)))
return(main_table)
}

81
R/identify_privacythreshold.R Normal file

@ -0,0 +1,81 @@
#' @title Identify groups under privacy threshold
#'
#' @description
#' This function scans a standard query output for groups of employees under the privacy threshold.
#' The method consists of reviewing each individual HR attribute and counting the distinct people within each group.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar A list of HR Variables to consider in the scan.
#' Defaults to all HR attributes identified.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size.
#' Defaults to 5.
#' @param return A character vector specifying what to return.
#' Valid values include "table" (default), "message", and "text".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats reorder
#'
#' @family Data Validation
#'
#' @return
#' Returns a summary table by default, where 'table' is passed in `return`.
#' When 'message' is passed, a diagnostic message is printed to the console;
#' when 'text' is passed, the same message is returned as a character string.
#'
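#' @examples
#' \dontrun{
#' # Minimal usage sketch: scan all identified HR attributes against a threshold of 5.
#' # Assumes the built-in `sq_data` demo Standard Person Query is available.
#' identify_privacythreshold(sq_data, mingroup = 5, return = "table")
#' }
#'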
#' @export
identify_privacythreshold <- function(data,
hrvar = extract_hr(data),
mingroup = 5,
return = "table"){
results <-
data %>% hrvar_count(
hrvar = hrvar[1],
return = "table")
results$hrvar <- ""
results <- results[0,]
for (p in hrvar) {
table1 <-
data %>%
hrvar_count(hrvar = p,
return = "table")
table1$hrvar <- p
colnames(table1)[1] <- "group"
results <- rbind(results,table1)
}
output <- results %>% arrange(n) %>% select(hrvar, everything())
groups_under <- results %>% filter(n<mingroup) %>% nrow()
MinGroupFlagMessage_Warning <- paste0("[Warning] There are ", groups_under, " groups under the minimum group size privacy threshold of ", mingroup, ".")
MinGroupFlagMessage_Low <- paste0("[Pass] There is only ", groups_under, " group under the minimum group size privacy threshold of ", mingroup, ".")
MinGroupFlagMessage_Zero <- paste0("[Pass] There are no groups under the minimum group size privacy threshold of ", mingroup, ".")
if(groups_under > 1){
MinGroupFlagMessage <- MinGroupFlagMessage_Warning
} else if(groups_under == 1 ){
MinGroupFlagMessage <- MinGroupFlagMessage_Low
} else if(groups_under ==0){
MinGroupFlagMessage <- MinGroupFlagMessage_Zero
}
if(return == "table"){
return(output)
} else if(return == "message"){
message(MinGroupFlagMessage)
} else if(return == "text"){
MinGroupFlagMessage
} else {
stop("Invalid `return` argument.")
}
}

70
R/identify_query.R Normal file

@ -0,0 +1,70 @@
#' @title Identify the query type of the passed data frame
#'
#' @description
#' Pass a Workplace Analytics dataset and return the identified
#' query type as a string. This function uses variable name string
#' matching to 'guess' the query type of the data frame.
#'
#' @param data A Workplace Analytics dataset in the form of a data frame.
#' If the data is not identified as a Workplace Analytics dataset, the function
#' will return an error.
#' @param threshold Debugging use only. Increase to raise the 'strictness' of the
#' guessing algorithm. Defaults to 2.
#'
#' @family Data Validation
#'
#' @examples
#' \dontrun{
#' identify_query(sq_data) # Standard query
#' identify_query(mt_data) # Meeting query
#' identify_query(em_data) # Hourly collaboration query
#' identify_query(iris) # Will return an error
#' identify_query(mtcars) # Will return an error
#' }
#'
#' @export
identify_query <- function(data, threshold = 2){
## variables to check for in each query type
spq_var <- c("PersonId", "Collaboration_hours", "Instant_Message_hours") # Standard Person query
caq_var <- c("PersonId", "Collaboration_hrs", "Instant_message_hours") # Collaboration Assessment query
smq_var <- c("MeetingId", "Date", "Attendees") # Standard Meeting Query
shc_var <- c("PersonId", "Emails_sent_00_01", "IMs_sent_23_24") # Standard Hourly Collaboration
## see if there are columns which do not exist
spq_check <- check_inputs(input = data, requirements = spq_var, return = "names")
caq_check <- check_inputs(input = data, requirements = caq_var, return = "names")
smq_check <- check_inputs(input = data, requirements = smq_var, return = "names")
shc_check <- check_inputs(input = data, requirements = shc_var, return = "names")
## length of checks
spq_check_n <- length(spq_check)
caq_check_n <- length(caq_check)
smq_check_n <- length(smq_check)
shc_check_n <- length(shc_check)
total_check_vec <- c(spq_check_n, caq_check_n, smq_check_n, shc_check_n)
## should never be zero
total_check_n <- sum(total_check_vec, na.rm = TRUE)
## Labels
qlabels <- c("Person Query",
"Collaboration Assessment Query",
"Meeting Query",
"Hourly Collaboration Query")
## Minimum number of non-matches
min_nm <- min(total_check_vec)
## Final guess
f_guess <- qlabels[which.min(total_check_vec)]
if(total_check_n == 0){
stop("Error: please check if query data is properly formatted query data.")
} else if(min_nm > threshold){
stop("Column mismatches: please check if query data is properly formatted query data.")
} else {
f_guess
}
}

63
R/identify_shifts.R Normal file

@ -0,0 +1,63 @@
#' @title Identify shifts based on outlook time settings for work day start and end time
#'
#' @description
#' This function uses outlook calendar settings for start and end time of work day to identify
#' work shifts. The relevant variables are `WorkingStartTimeSetInOutlook` and
#' `WorkingEndTimeSetInOutlook`.
#'
#' @param data A data frame containing data from the Hourly Collaboration query.
#'
#' @param return Character vector to specify what to return.
#' "plot" (default) returns a bar plot of the ten most frequent shifts.
#' "table" returns a summary table of the counts of shifts.
#' "data" returns the original input data frame with the `Shifts` column appended.
#'
#' @importFrom data.table ":=" "%like%" "%between%"
#'
#' @examples
#' dv_data %>% identify_shifts()
#'
#' @export
identify_shifts <- function(data, return = "plot"){
clean_times <- function(x){
out <- gsub(pattern = ":00", replacement = "", x = x)
as.numeric(out)
}
data <- data.table::as.data.table(data)
# data <- data.table::copy(data)
# Make sure data.table knows we know we're using it
.datatable.aware = TRUE
data[, Shifts := paste(WorkingStartTimeSetInOutlook, WorkingEndTimeSetInOutlook, sep = "-")]
# outputTable <- data[, .(count = .N), by = Shifts]
outputTable <- data[, list(WeekCount = .N,
PersonCount = dplyr::n_distinct(PersonId)), by = Shifts]
outputTable <- data.table::setorder(outputTable, -PersonCount)
if(return == "table"){
dplyr::as_tibble(outputTable)
} else if(return == "plot"){
outputTable %>%
utils::head(10) %>%
create_bar_asis(group_var = "Shifts",
bar_var = "WeekCount",
title = "Most frequent outlook shifts",
subtitle = "Showing top 10 only",
caption = extract_date_range(data, return = "text"),
ylab = "Shifts",
xlab = "Frequency")
} else if(return == "data"){
output_data <- data
dplyr::as_tibble(output_data)
}
}

114
R/identify_tenure.R Normal file

@ -0,0 +1,114 @@
#' @title Tenure calculation based on different input dates, returns data summary table or histogram
#'
#' @description
#' This function calculates employee tenure based on different input dates.
#' `identify_tenure` uses the latest date available if the user selects "Date",
#' but also has the flexibility to select a specific date, e.g. "1/1/2020".
#'
#' @family Data Validation
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param end_date A string specifying the name of the date variable representing the latest date. Defaults to "Date".
#' @param beg_date A string specifying the name of the date variable representing the hire date. Defaults to "HireDate".
#' @param maxten A numeric value representing the maximum tenure.
#' If the tenure exceeds this threshold, it would be accounted for in the flag message.
#' @param return String to specify what to return.
#' Defaults to "message".
#' Other valid values include "text", "plot", "data_cleaned", "data_dirty", and "data".
#' For "data", a data frame with the `PersonId` and a calculated variable called `TenureYear` is returned.
#'
#' @examples
#' \dontrun{
#' # Add HireDate to sq_data - method #1
#' sq_data2 <- sq_data %>% mutate(HireDate = as.Date("1/1/2015", format = "%m/%d/%Y" ))
#' identify_tenure(sq_data2)
#'
#' # Add HireDate to sq_data - method #2
#' sq_data$HireDate <-
#' rep(sample(seq(as.Date('1975/01/01'),
#' as.Date('2019/11/01'), by="day"), 15119),2)
#'
#' identify_tenure(sq_data)
#' }
#'
#' @export
identify_tenure <- function(data,
end_date = "Date",
beg_date = "HireDate",
maxten = 40,
return = "message"){
required_variables <- c("HireDate")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
data_prep <-
data %>%
mutate(Date = as.Date(Date, format= "%m/%d/%Y"), # Re-format `Date`
end_date = as.Date(!!sym(end_date), format= "%m/%d/%Y"), # Access a symbol, not a string
beg_date = as.Date(!!sym(beg_date), format= "%m/%d/%Y")) %>% # Access a symbol, not a string
arrange(end_date) %>%
mutate(End = last(end_date))
last_date <- data_prep$End
# graphing data
tenure_summary <-
data_prep %>%
filter(Date == last_date) %>%
mutate(tenure_years = (Date - beg_date)/365) %>%
group_by(tenure_years)%>%
summarise(n = n(),.groups = 'drop')
# Identify person IDs with unusually long (odd) tenure values
oddpeople <-
data_prep %>%
filter(Date == last_date) %>%
mutate(tenure_years = (Date - beg_date)/365) %>%
filter(tenure_years >= maxten) %>%
select(PersonId)
# message
Message <- paste0("The mean tenure is ",round(mean(tenure_summary$tenure_years,na.rm = TRUE),1)," years.\nThe max tenure is ",
round(max(tenure_summary$tenure_years,na.rm = TRUE),1),".\nThere are ",
length(tenure_summary$tenure_years[tenure_summary$tenure_years>=maxten])," employees with a tenure greater than ",maxten," years.")
if(return == "text"){
return(Message)
} else if(return == "message"){
message(Message)
} else if(return == "plot"){
suppressWarnings(
ggplot(data = tenure_summary,aes(x = as.numeric(tenure_years))) +
geom_density() +
labs(title = "Tenure - Density",
subtitle = "Calculated with `HireDate`") +
xlab("Tenure in Years") +
ylab("Density - number of employees") +
theme_wpa_basic()
)
} else if(return == "data_cleaned"){
return(data %>% filter(!(PersonId %in% oddpeople$PersonId)) %>% data.frame())
} else if(return == "data_dirty"){
return(data %>% filter((PersonId %in% oddpeople$PersonId)) %>% data.frame())
} else if(return == "data"){
data_prep %>%
filter(Date == last_date) %>%
mutate(TenureYear = as.numeric((Date - beg_date)/365)) %>%
select(PersonId, TenureYear)
} else {
stop("Error: please check inputs for `return`")
}
}

52
R/import_wpa.R Normal file

@ -0,0 +1,52 @@
#' @title Import a Workplace Analytics Query
#'
#' @description
#' Import a Workplace Analytics Query from a local CSV File, with variable classifications optimised
#' for other **wpa** functions.
#'
#' @details
#' `import_wpa()` uses `data.table::fread()` to import CSV files for speed,
#' and by default `stringsAsFactors` is set to FALSE.
#' A data frame is returned by the function (not a `data.table`).
#'
#' @param x String containing the path to the Workplace Analytics query to be imported.
#' The input file must be a CSV file, and the file extension must be explicitly entered,
#' e.g. "/files/standard query.csv"
#' @param standardise Logical. If TRUE, `import_wpa()` runs `standardise_pq()` to make a Collaboration
#' Assessment query's column names standard and consistent with a Standard Person Query. Note that this
#' will have no effect if the query being imported is not a Collaboration Assessment query. Defaults
#' to FALSE.
#'
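#' @examples
#' \dontrun{
#' # Minimal usage sketch; the file path below is purely illustrative.
#' person_query <- import_wpa("/files/standard query.csv")
#' }
#'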
#' @export
import_wpa <- function(x, standardise = FALSE){
return_data <- data.table::fread(x, stringsAsFactors = FALSE) %>% as.data.frame()
# Columns which are Dates
dateCols <- sapply(return_data, function(x) all(is_date_format(x)))
dateCols <- dateCols[dateCols == TRUE]
return_data <-
return_data %>%
dplyr::mutate_at(dplyr::vars(names(dateCols)), ~as.Date(., format = "%m/%d/%Y"))
message("Query has been imported successfully!")
## Query check only available for Person Queries
if(identify_query(return_data) == "Person Query"){
check_query(return_data)
}
## Standardise query if `standardise == TRUE`
if(standardise == TRUE & identify_query(return_data) == "Collaboration Assessment Query"){
message("Standardising column names for a Collaboration Assessment query to
a Person query...")
return_data <- standardise_pq(return_data)
}
dplyr::as_tibble(return_data)
}

7
R/init.R Normal file

@ -0,0 +1,7 @@
.onAttach <- function(libname, pkgname) {
message <- c("\n Thank you for using the {wpa} R package!",
"\n \n Our analysts have taken every care to ensure that this package runs smoothly and bug-free.",
"\n However, if you do happen to encounter any, please email mac@microsoft.com to report any issues.",
"\n \n Happy coding!")
packageStartupMessage(message)
}

111
R/internal_network_plot.R Normal file

@ -0,0 +1,111 @@
#' @title Plot the internal network metrics for an HR variable
#'
#' @description
#' Plot the internal network metrics for an HR variable.
#'
#' @param data Person Query as a dataframe, including a date column named "Date".
#' This function assumes the date format is MM/DD/YYYY, as is standard in a WpA query output.
#' @param hrvar WpA variable for an HR attribute to group networks by,
#' e.g. "Layer"
#' @param mingroup Numeric vector for minimum group size for aggregation
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#' @param bubble_size A numeric vector of length two to specify the size range of the bubbles
#'
#' @import dplyr
#' @import reshape2
#' @import ggplot2
#' @import ggrepel
#'
#' @examples
#' \dontrun{
#' internal_network_plot(sq_data)
#' }
#'
#' @family Connectivity
#'
#' @export
internal_network_plot <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot",
bubble_size = c(1, 5))
{
plot_data <-
data %>%
rename(group = !!sym(hrvar))
plot_data <-
plot_data %>%
group_by(group) %>%
summarize(Int_network_size = mean(Internal_network_size),
Int_network_breadth = mean(Networking_outside_organization),
Employee_count = n_distinct(PersonId)
) %>%
filter(Employee_count >= mingroup)
# Create summary table
summary_table <- plot_data %>% arrange(., desc(Int_network_size))
# Create plot
int_network_plot <-ggplot(plot_data,
aes(x=Int_network_size,
y=Int_network_breadth)
) +
geom_point(aes(size=Employee_count),
color = rgb2hex(0, 120, 212),
alpha = 0.5) +
geom_text_repel(label=plot_data$group) +
scale_x_continuous(name = "Internal Network Size") +
scale_y_continuous(name = "Internal Network Breadth") +
scale_size(range = bubble_size) +
theme_classic() +
theme(
axis.text = element_text(size = 10),
axis.text.x = element_text(
angle = 90,
hjust = 1,
vjust = 0.5
),
plot.title = element_text(
color = "grey40",
face = "bold",
size = 18
),
plot.subtitle = element_text(size = 14)
) +
labs(
title = paste("Internal network metrics by", hrvar),
subtitle = paste(
"Network size is number of people, breadth is number of organizations"
)
) +
labs(
caption = paste(
"Total employees =",
sum(plot_data$Employee_count),
"| Data from",
min(as.Date(data$Date, "%m/%d/%Y")),
"to",
max(as.Date(data$Date, "%m/%d/%Y"))
)
)
if(return == "table"){
summary_table %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(int_network_plot)
} else {
stop("Please enter a valid input for `return`.")
}
}

19
R/is_date_format.R Normal file

@ -0,0 +1,19 @@
#' @title Identify whether string is a date format
#'
#' @description
#' This function uses a regular expression to determine whether
#' a string is of the format "mdy", separated by "-", "/",
#' or ".", returning a logical vector.
#'
#' @param string Character string to test whether is a date format.
#'
#' @examples
#' \dontrun{
#' is_date_format("1/5/2020")
#' }
#'
#' @export
is_date_format <- function(string){
grepl("^\\d{1,2}[- /.]\\d{1,2}[- /.]\\d{1,4}$",
string)
}

183
R/keymetrics_report.R Normal file

@ -0,0 +1,183 @@
# #' @title Create an exploratory overview of Standard Query dataset
# #'
# #' @description
# #' Input a Standard Query dataset as a data frame in the argument, returning an overview of the key metrics.
# #'
# #' @param data A Standard Query dataset in the form of a data frame.
# #' @param mingroup Numeric value setting the privacy threshold / minimum group size.
# #'
# #' @import dplyr
# #' @import ggplot2
# #' @import reshape2
# #'
# #' @return
# #' Returns a HTML report containing an overview of the key metrics in a Standard Person Query.
# #'
# #' @export
#
# keymetrics_report <- function(data, mingroup = 10){
# ## Table by Person (Period Averages)
# data %>%
# group_by(PersonId) %>%
# summarise(periods = n(),
# Organization = first(Organization),
# LevelDesignation = first(LevelDesignation)) -> mydata_person
#
# ## Plot available data by Organization
# mydata_person %>% count(Organization, name = "count") -> mydata_OrgLevel
#
# mydata_OrgLevel %>%
# ggplot(aes(x = Organization, y = 1, label=count)) +
# geom_point(aes(size = count), colour = "steelblue", alpha = 0.5) +
# scale_size_continuous(range = c(20, 50)) +
# geom_label() +
# ggtitle("Measured Individuals (by Organization)") +
# ylab("") +
# xlab("Organization") +
# theme(legend.position="none",
# axis.text.x = element_text(angle = 90, hjust = 1),
# axis.title.y = element_blank(),
# axis.text.y = element_blank(),
# axis.ticks.y = element_blank()) -> plot_OrgLevel
#
# ## Plot available data by LevelDesignation
# mydata_person %>% count(LevelDesignation, name = "count") -> mydata_LevelDesignation
#
# mydata_LevelDesignation %>%
# ggplot(aes(x=LevelDesignation, y=1, label=count)) +
# geom_point(aes(size = count), colour = "steelblue", alpha = 0.5) +
# scale_size_continuous(range = c(20, 50)) +
# geom_label() +
# ggtitle("Measured Individuals (by Level Designation)") +
# ylab("") + xlab("Level Designation") +
# theme(legend.position="none",
# axis.text.x = element_text(angle = 90, hjust = 1),
# axis.title.y = element_blank(),
# axis.text.y = element_blank(),
# axis.ticks.y = element_blank(),
# plot.title = element_text(color="grey40", face="bold", size=20)) -> plot_LevelDesignation
#
# ## Plot available data by Organization and Level
# mydata_person %>% count(Organization, LevelDesignation, name = "count") -> mydata_OrgLevelDesignation
#
# mydata_OrgLevelDesignation %>%
# ggplot(aes(x = Organization, y = LevelDesignation)) +
# geom_tile(aes(fill = count), colour = "grey50") +
# geom_text(aes(label = round(count, 1))) +
# scale_fill_gradient(low = "lightblue", high = "blue",
# limits=c(mingroup,NA)) +
# labs(fill = "Count") +
# ggtitle("Group Sizes (by Organization and Level)") +
# ylab("Level Designation") +
# theme(axis.text.x = element_text(angle = 45, hjust = 1),
# plot.title = element_text(color="grey40", face="bold", size=20)) -> plot_OrgLevelDesignation
#
# ## Pipe in Manager Relationship
# data %>% mgrrel_matrix(return = "plot") -> plot_ManagerRelationship
#
# ## Pipe in WITL
# data %>% keymetrics_scan(hrvar = "Organization", return = "plot") -> plot_WITL
#
# ## Pipe in Meeting Habits
# data %>% meeting_summary(hrvar = "Organization", return = "plot") -> plot_MeetingHabits
#
# ## Generate kables
#
# mydata_OrgLevel %>%
# knitr::kable(caption = "Data by Organization") %>%
# kableExtra::kable_styling(bootstrap_options = "striped", full_width = TRUE) -> tb_OrgLevel
#
# mydata_LevelDesignation %>%
# knitr::kable(caption = "Data by Level Designation") %>%
# kableExtra::kable_styling(bootstrap_options = "striped", full_width = TRUE) -> tb_LevelDesignation
#
# mydata_OrgLevelDesignation %>%
# spread(LevelDesignation, count) %>%
# knitr::kable(caption = "Data by Level Designation") %>%
# kableExtra::kable_styling(bootstrap_options = "striped", full_width = TRUE) -> tb_OrgLevelDesignation
#
# ## Pipe in Manager Relationship
# data %>%
# mgrrel_matrix(return = "table") %>%
# knitr::kable(caption = "Distribution of Manager-Direct Relationship") %>%
# kableExtra::kable_styling(bootstrap_options = "striped", full_width = TRUE) -> tb_ManagerRelationship
#
# ## Pipe in WITL
# data %>%
# keymetrics_scan(hrvar = "Organization", return = "table") %>%
# mutate_at(vars(-variable), ~round(., 1)) %>%
# knitr::kable(caption = "Week in the Life - by Organization") %>%
# kableExtra::kable_styling(bootstrap_options = "striped", full_width = TRUE) -> tb_WITL
#
# ## Pipe in Meeting Habits
# data %>%
# meeting_summary(hrvar = "Organization", return = "table") %>%
# mutate_at(vars(-variable), ~round(., 1)) %>%
# knitr::kable(caption = "Meeting Habits - by Organization") %>%
# kableExtra::kable_styling(bootstrap_options = "striped", full_width = TRUE) -> tb_MeetingHabits
#
# markobj <- c('---',
# 'title: "WPA Standard Query Overview"',
# 'output: ',
# ' html_document:',
# ' theme: united',
# ' toc: true',
# ' toc_float:',
# ' collapsed: false',
# ' smooth_scroll: true',
#
# '---',
# '',
# '',
#
# '### Week in the Life',
# '```{r, echo=FALSE}',
# 'plot_WITL',
# '```',
# '```{r, echo=FALSE}',
# 'tb_WITL',
# '```',
#
# '### Organization Level',
#
# '```{r, echo=FALSE}',
# 'tb_OrgLevel',
# '```',
#
# '### Level Designation',
# '```{r, echo=FALSE}',
# 'plot_LevelDesignation',
# '```',
# '```{r, echo=FALSE}',
# 'tb_LevelDesignation',
# '```',
#
# '### Organization and Level Designation',
# '```{r, echo=FALSE}',
# 'plot_OrgLevelDesignation',
# '```',
# '```{r, echo=FALSE}',
# 'tb_OrgLevelDesignation',
# '```',
#
# '### Meeting Habits',
# '```{r, echo=FALSE}',
# 'plot_MeetingHabits',
# '```',
# '```{r, echo=FALSE}',
# 'tb_MeetingHabits',
# '```',
#
# '### Manager-direct relationship Distribution',
# '```{r, echo=FALSE}',
# 'plot_ManagerRelationship',
# '```',
# '```{r, echo=FALSE}',
# 'tb_ManagerRelationship')
#
# writeLines(markobj, "SQ-overview.Rmd")
# rmarkdown::render("SQ-overview.Rmd")
# utils::browseURL("SQ-overview.html")
# unlink(c("SQ-overview.Rmd",
# "SQ-overview.md"))
# }

118
R/keymetrics_scan.R Normal file

@ -0,0 +1,118 @@
#' @title Run a summary of Key Metrics from the Standard Query data
#'
#' @description
#' Returns a heatmap of the key metrics by default, with an option to return a summary table.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector, e.g. "Organization"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param metrics A character vector containing the variable names to calculate averages of.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#' @param textsize A numeric value specifying the text size to show in the plot.
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @importFrom stats reorder
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @examples
#' keymetrics_scan(sq_data, hrvar = "LevelDesignation", return = "table")
#'
#' @export
keymetrics_scan <- function(data,
hrvar = "Organization",
mingroup = 5,
metrics = c("Workweek_span",
"Collaboration_hours",
"After_hours_collaboration_hours",
"Meetings",
"Meeting_hours",
"After_hours_meeting_hours",
"Low_quality_meeting_hours",
"Meeting_hours_with_manager_1_on_1",
"Meeting_hours_with_manager",
"Emails_sent",
"Email_hours",
"After_hours_email_hours",
"Generated_workload_email_hours",
"Total_focus_hours",
"Internal_network_size",
"Networking_outside_organization",
"External_network_size",
"Networking_outside_company"),
return = "plot",
textsize = 2){
myTable <-
data %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
group_by(group, PersonId) %>%
summarise_at(vars(metrics), ~mean(., na.rm = TRUE)) %>%
group_by(group) %>%
summarise_at(vars(metrics), ~mean(., na.rm = TRUE)) %>%
left_join(hrvar_count(data, hrvar = hrvar, return = "table") %>%
rename(Employee_Count = "n"),
by = c("group" = hrvar)) %>%
filter(Employee_Count >= mingroup) # Keep only groups above privacy threshold
myTable %>%
reshape2::melt(id.vars = "group") %>%
reshape2::dcast(variable ~ group) -> myTable_wide
myTable_long <- reshape2::melt(myTable, id.vars=c("group")) %>%
mutate(variable = factor(variable)) %>%
group_by(variable) %>%
# Heatmap by row
mutate(value_rescaled = value/mean(value)) %>%
ungroup()
# Underscore to space
us_to_space <- function(x){
gsub(pattern = "_", replacement = " ", x = x)
}
plot_object <-
myTable_long %>%
filter(variable != "Employee_Count") %>%
ggplot(aes(x = group,
y = stats::reorder(variable, desc(variable)))) +
geom_tile(aes(fill = value_rescaled)) +
geom_text(aes(label=round(value, 1)), size = textsize) +
scale_fill_distiller(palette = "Blues", direction = 1) +
scale_x_discrete(position = "top") +
scale_y_discrete(labels = us_to_space) +
theme_light() +
labs(title = "Key Workplace Analytics metrics",
subtitle = paste("Weekly average by", camel_clean(hrvar)),
y =" ",
x =" ",
caption = extract_date_range(data, return = "text")) +
theme(axis.text.x = element_text(angle = 90, hjust = 0),
plot.title = element_text(color="grey40", face="bold", size=20)) +
guides(fill=FALSE)
if(return == "table"){
myTable_wide %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

41
R/meeting_dist.R Normal file

@ -0,0 +1,41 @@
#' @title Meeting Hours distribution
#'
#' @description
#' Analyze Meeting Hours distribution.
#' Returns a stacked bar plot by default.
#' Additional options available to return a table with distribution elements.
#'
#' @inheritParams create_dist
#'
#' @family Meeting Culture
#'
#' @examples
#' ## Return a plot
#' meeting_dist(sq_data, hrvar = "Organization")
#'
#' ## Return a table
#' meeting_dist(sq_data, hrvar = "Organization", return = "table")
#'
#' ## Return result with custom-specified breaks
#' meeting_dist(sq_data, hrvar = "LevelDesignation", cut = c(4, 7, 9))
#'
#'
#' @export
meeting_dist <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot",
cut = c(5, 10, 15)) {
create_dist(data = data,
metric = "Meeting_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return,
cut = cut)
}
#' @rdname meeting_dist
#' @export
meeting_distribution <- meeting_dist

27
R/meeting_fizz.R Normal file

@ -0,0 +1,27 @@
#' @title Distribution of Meeting Hours (Fizzy Drink plot)
#'
#' @description
#' Analyze weekly meeting hours distribution, and returns
#' a 'fizzy' scatter plot by default.
#' Additional options available to return a table with distribution elements.
#'
#' @inheritParams create_fizz
#'
#' @family Meetings
#'
#' @examples
#' meeting_fizz(sq_data, hrvar = "Organization", return = "table")
#' @export
meeting_fizz <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
create_fizz(data = data,
metric = "Meeting_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return)
}

37
R/meeting_line.R Normal file

@ -0,0 +1,37 @@
#' @title Meeting Time Trend - Line Chart
#'
#' @description
#' Provides a week by week view of meeting time, visualised as line charts.
#' By default returns a line chart for meeting hours,
#' with a separate panel per value in the HR attribute.
#' Additional options available to return a summary table.
#'
#' @inheritParams create_line
#'
#' @family Meetings
#'
#' @examples
#'
#' ## Return a line plot
#' meeting_line(sq_data, hrvar = "LevelDesignation")
#'
#'
#' ## Return a table
#' meeting_line(sq_data, hrvar = "LevelDesignation", return = "table")
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @export
meeting_line <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
## Inherit arguments
create_line(data = data,
metric = "Meeting_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return)
}

106
R/meeting_quality.R Normal file

@ -0,0 +1,106 @@
#' @title Run a meeting habits / meeting quality analysis
#'
#' @description
#' Returns a scatter plot of average weekly meeting hours against the percentage of
#' low-quality meeting hours by group. Additional option available to return a summary table.
#'
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector, e.g. "Organization"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" or "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @family Meeting Culture
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
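#' @examples
#' \dontrun{
#' # Minimal usage sketch (assumes the built-in `sq_data` demo dataset).
#' meeting_quality(sq_data, hrvar = "Organization", return = "plot")
#' }
#'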
#' @export
meeting_quality <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
## Date range data frame
myPeriod <- extract_date_range(data)
## Prepare Table
data %>%
rename(group = !!sym(hrvar)) %>% # Rename hrvar to `group`
group_by(PersonId, group) %>%
summarize(Meeting_count = mean(Meetings),
Meeting_hours = mean(Meeting_hours),
Low_quality_meeting_hours = mean(Low_quality_meeting_hours),
perc_after_hours_m = 100*mean(After_hours_meeting_hours)/mean(Meeting_hours),
perc_low_quality = 100*mean(Low_quality_meeting_hours)/mean(Meeting_hours),
perc_Multitasking = 100*mean(Multitasking_meeting_hours)/mean(Meeting_hours),
perc_Redundant = 100*mean(Redundant_meeting_hours__organizational_)/mean(Meeting_hours),
perc_conflicting = 100*mean(Conflicting_meeting_hours)/mean(Meeting_hours)) %>%
group_by(group) %>%
summarise_at(vars(Meeting_count,
Meeting_hours,
Low_quality_meeting_hours,
perc_after_hours_m,
perc_low_quality,
perc_Multitasking,
perc_Redundant,
perc_conflicting),
~mean(.)) %>%
left_join(data %>%
rename(group = !!sym(hrvar)) %>%
group_by(group) %>%
summarise(Employee_Count = n_distinct(PersonId)),
by = "group") %>%
filter(Employee_Count >= mingroup) -> myTable
myTable_wide <- myTable %>%
reshape2::melt(id.vars = "group") %>%
reshape2::dcast(variable ~ group)
myTable_long <- reshape2::melt(myTable, id.vars=c("group"))
myTable_plot <- myTable %>% select(group, perc_low_quality, Meeting_hours)
## Bar plot
plot_object <-
myTable_plot %>%
ggplot(aes(x = perc_low_quality, y = Meeting_hours, size=2)) +
geom_point(stat = "identity",
fill = "#203864", alpha=0.1) +
geom_text(aes(label = group),
hjust = 0.2,
color = "black",
fontface = "bold",
size = 4)+
ylim(min(myTable_plot$Meeting_hours),max(myTable_plot$Meeting_hours) + 2) +
xlim(min(myTable_plot$perc_low_quality),max(myTable_plot$perc_low_quality) + 2) +
theme_wpa_basic() +
theme(axis.text=element_text(size=12),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14),
legend.position = "none",
legend.justification = "right",
legend.title=element_text(size=14),
legend.text=element_text(size=14)) +
labs(title = "Meeting Quality",
subtitle = paste("Meeting time and low-quality percentage by", tolower(hrvar))) +
ylab("Average weekly meeting hours") +
xlab("Average weekly percentage of low-quality meeting hours") +
labs(caption = paste("Data from week of", myPeriod$Start, "to week of", myPeriod$End))
if(return == "table"){
myTable %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

47
R/meeting_rank.R Normal file

@ -0,0 +1,47 @@
#' @title Meeting Hours Ranking
#'
#' @description
#' This function scans a standard query output for groups with high levels of weekly meeting hours.
#' Returns a table with all groups (across multiple HR attributes) ranked by meeting hours.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar A list of HR Variables to consider in the scan.
#' Defaults to all HR attributes identified.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size.
#' Defaults to 5.
#' @param return A character vector specifying what to return.
#' Valid values include "table" (default, returning a data frame)
#' and "html" (an interactive DataTable).
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats reorder
#'
#' @family Meetings
#'
#' @return
#' Returns a data frame by default, where 'table' is passed in `return`.
#' When 'html' is passed, an interactive DataTable is returned.
#'
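#' @examples
#' \dontrun{
#' # Minimal usage sketch: rank all HR attribute groups by meeting hours.
#' # Assumes the built-in `sq_data` demo dataset is available.
#' meeting_rank(sq_data, return = "table")
#' }
#'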
#' @export
meeting_rank <- function(data,
hrvar = extract_hr(data),
mingroup = 5,
return = "table"){
output <-
data %>% create_rank(metric="Meeting_hours", hrvar = hrvar,
mingroup = mingroup,
return = "table")
if(return == "html"){
return(create_dt(output))
} else if(return == "table"){
return(output)
} else {
stop("Invalid `return` argument.")
}
}

101
R/meeting_skim.R Normal file

@ -0,0 +1,101 @@
#' @title Produce a skim summary of meeting hours
#'
#' @description
#' This function returns a skim summary in the console
#' when provided a standard person query as the input.
#'
#' @param data A standard person query data in the form of a data frame.
#' @param return A character string to specify what to return.
#' Defaults to "message". Valid options include "message", "text", and
#' "table".
#'
#' @import dplyr
#' @examples
#' meeting_skim(sq_data)
#'
#' @export
meeting_skim <- function(data, return = "message"){
key_output <-
data %>%
select(PersonId,
Meeting_hours,
Conflicting_meeting_hours,
Multitasking_meeting_hours,
Redundant_meeting_hours__lower_level_,
Redundant_meeting_hours__organizational_,
Low_quality_meeting_hours) %>%
summarise_at(vars(-PersonId), ~sum(.)) %>% # sum total
tidyr::gather(HourType, Hours, -Meeting_hours) %>%
mutate_at(vars(Hours), ~./Meeting_hours) %>%
mutate(RawHours = Hours * Meeting_hours)
mh_total <- round(key_output$Meeting_hours[1])
mh_lowqualityprop <-
key_output %>%
filter(HourType == "Low_quality_meeting_hours") %>%
pull(Hours) %>%
`*`(100) %>%
round() %>%
paste0("%")
bracket <- function(text){
paste0("(", text, ")")
}
extract_prop <- function(filt_chr){
key_output %>%
filter(HourType == filt_chr) %>%
pull(Hours) %>%
`*`(100) %>%
round() %>%
paste0("%")
}
extract_raw <- function(filt_chr){
key_output %>%
filter(HourType == filt_chr) %>%
pull(RawHours) %>%
round()
}
print_text <-
paste("There are",
mh_total,
"total meeting hours across the analysis population.\n",
">>>",
extract_raw("Low_quality_meeting_hours"),
"are low quality",
bracket(extract_prop("Low_quality_meeting_hours")),
"\n",
">>>",
extract_raw("Redundant_meeting_hours__organizational_"),
"are redundant",
bracket(extract_prop("Redundant_meeting_hours__organizational_")),
"\n",
">>>",
extract_raw("Conflicting_meeting_hours"),
"are conflicting",
bracket(extract_prop("Conflicting_meeting_hours")),
"\n",
">>>",
extract_raw("Multitasking_meeting_hours"),
"are multitasking.",
bracket(extract_prop("Multitasking_meeting_hours")))
if(return == "message"){
message(print_text)
} else if(return == "text"){
print_text <- gsub(pattern = ">>>", replacement = " - ", x = print_text)
print_text
} else if(return == "table"){
key_output
} else {
stop("Please check `return`")
}
}

42
R/meeting_summary.R Normal file

@ -0,0 +1,42 @@
#' @title Meeting Summary
#'
#' @description
#' Provides an overview analysis of weekly meeting hours.
#' Returns a bar plot showing average weekly meeting hours by default.
#' Additional options available to return a summary table.
#'
#' @inheritParams create_bar
#'
#' @family Meetings
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @examples
#' # Return a ggplot bar chart
#' meeting_summary(sq_data, hrvar = "LevelDesignation")
#'
#' # Return a summary table
#' meeting_summary(sq_data, hrvar = "LevelDesignation", return = "table")
#' @export
meeting_summary <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
create_bar(data = data,
metric = "Meeting_hours",
hrvar = hrvar,
mingroup = mingroup,
return = return,
bar_colour = "darkblue")
}

78
R/meeting_tm_report.R Normal file

@ -0,0 +1,78 @@
#' @title Generate a Meeting Text Mining report in HTML
#'
#' @description
#' Create a text mining report in HTML based on Meeting Subject Lines
#'
#' @param data A Meeting Query dataset in the form of a data frame.
#' @param path Pass the file path and the desired file name, _excluding the file extension_.
#' For example, "meeting text mining report".
#' @param stopwords A single-column data frame labelled 'word' containing custom stopwords to remove.
#' @param timestamp Logical vector specifying whether to include a timestamp in the file name.
#' Defaults to TRUE.
#' @param keep A numeric vector specifying maximum number of words to keep.
#' @param seed A numeric vector to set seed for random generation.
#'
#' @family Text-mining
#'
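#' @examples
#' \dontrun{
#' # Minimal usage sketch; writes an HTML report to the working directory.
#' # Assumes `mt_data`, the demo Meeting Query dataset with subject lines, is available.
#' meeting_tm_report(mt_data, path = "meeting text mining report")
#' }
#'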
#' @export
meeting_tm_report <- function(data,
path = "meeting text mining report",
stopwords = NULL,
timestamp = TRUE,
keep = 100,
seed = 100){
## Create timestamped path (if applicable)
if(timestamp == TRUE){
newpath <- paste(path, wpa::tstamp())
} else {
newpath <- path
}
# Set outputs
output_list <-
list(md2html(text = read_preamble("blank.md")), # Header
data %>% tm_wordcloud(stopwords = stopwords, keep = keep),
data %>% tm_freq(token = "words", stopwords = stopwords, keep = keep),
data %>% tm_freq(token = "words", stopwords = stopwords, keep = keep, return = "table"),
data %>% tm_freq(token = "ngrams", stopwords = stopwords, keep = keep),
data %>% tm_freq(token = "ngrams", stopwords = stopwords, keep = keep, return = "table"),
data %>% tm_cooc(stopwords = stopwords, seed = seed),
data %>% tm_cooc(stopwords = stopwords, seed = seed, return="table")) %>%
purrr::map_if(is.data.frame, create_dt)
# Set header titles
title_list <-
c("Text Mining Report", # Section header
"Word cloud",
"Word Frequency",
"",
"Phrase Frequency",
"",
"Word Co-occurrence",
"")
# Set header levels
n_title <- length(title_list)
levels_list <- rep(3, n_title)
levels_list[c(1)] <- 2 # Section header
# Generate report
generate_report(title = "Analysis of Meeting Subject Lines",
filename = newpath,
outputs = output_list,
titles = title_list,
subheaders = rep("", n_title),
echos = rep(FALSE, n_title),
levels = levels_list,
theme = "cosmo",
preamble = read_preamble("meeting_tm_report.md"))
}

105
R/meeting_trend.R Normal file

@ -0,0 +1,105 @@
#' @title Meeting Hours Time Trend
#'
#' @description
#' Provides a week by week view of meeting time.
#' By default returns a week by week heatmap, highlighting the points in time with most activity.
#' Additional options available to return a summary table.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#'
#' @family Collaboration
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
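#' @examples
#' \dontrun{
#' # Minimal usage sketch (assumes the built-in `sq_data` demo dataset).
#' meeting_trend(sq_data, hrvar = "LevelDesignation")
#' }
#'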
#' @export
meeting_trend <- function(data,
hrvar = "Organization",
mingroup=5,
return = "plot"){
## Check inputs
required_variables <- c("Date",
"Meeting_hours",
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
myPeriod <-
data %>%
mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
arrange(Date) %>%
mutate(Start = first(Date), End=last(Date)) %>%
filter(row_number() == 1) %>%
select(Start, End)
myTable <-
data %>%
mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
select(PersonId, Date, group, Meeting_hours) %>%
group_by(group) %>%
mutate(Employee_Count = n_distinct(PersonId)) %>%
filter(Employee_Count >= mingroup) # Keep only groups above privacy threshold
myTable <-
myTable %>%
group_by(Date, group) %>%
summarize(Employee_Count = mean(Employee_Count),
Meeting_hours = mean(Meeting_hours))
myTable_plot <- myTable %>% select(Date, group, Meeting_hours)
myTable_return <- myTable_plot %>% spread(Date, Meeting_hours)
plot_object <-
myTable_plot %>%
ggplot(aes(x = Date , y = group , fill = Meeting_hours)) +
geom_tile(height=.5) +
scale_fill_gradient(name="Hours", low = "white", high = "red") +
theme_classic() +
theme(axis.text=element_text(size=12),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14),
legend.position = "right",
legend.justification = "right",
legend.title=element_text(size=14),
legend.text=element_text(size=14)) +
labs(title = "Meeting Hours",
subtitle = paste("Total meeting time by", tolower(hrvar))) +
xlab("Date") +
ylab(hrvar) +
labs(caption = paste("Data from week of", myPeriod$Start, "to week of", myPeriod$End))
if(return == "table"){
myTable_return %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

40
R/meetingtype_dist.R Normal file

@ -0,0 +1,40 @@
#' @title Meeting Type Distribution
#'
#' @description
#' Calculate the hour distribution of internal meeting types.
#' This is a wrapper around `meetingtype_dist_mt()` and
#' `meetingtype_dist_ca()`, depending on whether a Meeting Query or a Collaboration Assessment Query
#' is passed as an input.
#'
#' @param data Data frame. If a Meeting Query, must contain the variables `Attendees` and `DurationHours`.
#' @param hrvar Character string to specify the HR attribute to split the data by.
#' Note that this is only applicable if a Collaboration Assessment query is passed to the function. If a Meeting Query
#' is passed instead, this argument is ignored.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' As with `hrvar`, this is only applicable if a Collaboration Assessment query is passed to the function.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import ggplot2
#' @import dplyr
#' @importFrom stats setNames
#'
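#' @examples
#' \dontrun{
#' # Minimal usage sketch; dispatches on the query type of the input.
#' # Assumes `mt_data`, the demo Meeting Query dataset, is available.
#' meetingtype_dist(mt_data)
#' }
#'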
#' @export
meetingtype_dist <- function(data,
hrvar = NULL,
mingroup = 5,
return = "plot"){
if("MeetingId" %in% names(data)){
message("Calculating results using a Meeting Query...")
meetingtype_dist_mt(data, return = return)
} else if("PersonId" %in% names(data)){
message("Calculating results using a Collaboration Assessment Query...")
meetingtype_dist_ca(data, hrvar = hrvar, mingroup = mingroup, return = return)
} else {
stop("Please check query type. Must be either a Collaboration Assessment Query or a Meeting Query.")
}
}

230
R/meetingtype_dist_ca.R Normal file

@ -0,0 +1,230 @@
#' @title Meeting Type Distribution (Collaboration Assessment Query)
#'
#' @description
#' Calculate the hour distribution of internal meeting types,
#' using a Collaboration Assessment Query with core WpA variables as an input.
#'
#' @param data A Collaboration Assessment query in the form of a data frame.
#' Must contain the core WpA meeting type hour variables, e.g. `Bloated_meeting_hours`.
#' @param hrvar Character string to specify the HR attribute to split the data by. Defaults to NULL.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import ggplot2
#' @import dplyr
#' @importFrom stats setNames
#'
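#' @examples
#' \dontrun{
#' # Minimal usage sketch; `ca_query_df` is a placeholder for a Collaboration
#' # Assessment query containing the core meeting type hour variables.
#' meetingtype_dist_ca(ca_query_df, hrvar = "Organization")
#' }
#'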
#' @export
meetingtype_dist_ca <- function(data,
hrvar = NULL,
mingroup = 5,
return = "plot"){
mt_dist_str <- c("Bloated_meeting_hours",
"Lengthy_meeting_hours",
"Workshop_meeting_hours",
"All_hands_meeting_hours",
"Status_update_meeting_hours",
"Decision_making_meeting_hours",
"One_on_one_meeting_hours")
mt_dist_str_clean <-
mt_dist_str %>%
gsub(pattern = "_meeting_hours", replacement = "", x = .) %>%
gsub(pattern = "_", replacement = " ", x = .)
## Add dummy "Total" column if hrvar = NULL
if(is.null(hrvar)){
data <- mutate(data, Total = "Total")
hrvar <- "Total"
}
## No org splits
if(hrvar == "Total"){
myResultsTable <-
data %>%
summarise_at(vars(mt_dist_str), ~sum(., na.rm = TRUE)) %>%
gather(MeetingType, AttendeeMeetingHours) %>%
mutate(Prop = AttendeeMeetingHours / sum(AttendeeMeetingHours),
Percent = paste(round(Prop * 100), "%")) %>%
mutate(MeetingType = gsub(pattern = "_meeting_hours", replacement = "", x = MeetingType)) %>%
mutate(MeetingType = gsub(pattern = "_", replacement = " ", x = MeetingType))
## Only for creating the bottom row data
myResultsTableTotal <-
data %>%
summarise_at(vars(mt_dist_str), ~sum(., na.rm = TRUE)) %>%
gather(MeetingType, AttendeeMeetingHours) %>%
mutate(MeetingType = "Total") %>%
group_by(MeetingType) %>%
summarise(AttendeeMeetingHours = sum(AttendeeMeetingHours, na.rm = TRUE)) %>%
mutate(Prop = AttendeeMeetingHours / sum(AttendeeMeetingHours),
Percent = paste(round(Prop * 100), "%"))
outputTable <-
rbind(myResultsTable,
myResultsTableTotal)
} else {
## Group by hrvar
myResultsTable <-
data %>%
group_by(!!sym(hrvar)) %>%
summarise_at(vars(mt_dist_str), ~sum(., na.rm = TRUE)) %>%
left_join(data %>% hrvar_count(hrvar = hrvar, return = "table"),
by = hrvar) %>%
filter(n >= mingroup) %>%
gather(MeetingType, AttendeeMeetingHours, -!!sym(hrvar), -n) %>%
group_by(!!sym(hrvar)) %>%
mutate(Prop = AttendeeMeetingHours / sum(AttendeeMeetingHours),
Percent = paste(round(Prop * 100), "%")) %>%
mutate(MeetingType = gsub(pattern = "_meeting_hours", replacement = "", x = MeetingType)) %>%
mutate(MeetingType = gsub(pattern = "_", replacement = " ", x = MeetingType))
outputTable <-
myResultsTable %>%
ungroup() %>%
select(!!sym(hrvar), MeetingType, Prop) %>%
spread(MeetingType, Prop)
}
if(hrvar == "Total"){
base_df <-
data.frame(id = 1:7,
value = c("All hands",
"Bloated",
"Status update",
"Decision making",
"One on one",
"Lengthy",
"Workshop"))
duration <-
c(0, 0, 3, 3,
0, 0, 2, 2,
0, 0, 1, 1,
0, 0, 1, 1,
0, 0, 1, 1,
1, 1, 2, 2,
2, 2, 3, 3)
attendees <-
c(4, 5, 5, 4,
3, 4, 4, 3,
2, 3, 3, 2,
1, 2, 2, 1,
0, 1, 1, 0,
0, 3, 3, 0,
0, 4, 4, 0)
main_plot_df <-
rbind(base_df,
base_df,
base_df,
base_df) %>%
arrange(id) %>%
mutate(Duration = duration,
Attendees = attendees)
label_df <-
main_plot_df %>%
group_by(id, value) %>%
summarise(x = mean(Duration),
y = mean(Attendees)) %>%
mutate(value2 = sub(" ", "\n", value),
text = ifelse(id %in% c(1, 2, 6, 7),"#FFFFFF", "black"),
fill = ifelse(id %in% c(1, 2, 6, 7),"#31617C", "#9FE5FF")) %>%
left_join(select(myResultsTable, MeetingType, Percent),
by = c("value" = "MeetingType")) %>%
mutate(Percent = ifelse(is.na(Percent), "0 %", Percent)) %>%
mutate(value2 = paste(value2, Percent, sep = "\n"))
colo_v <- setNames(label_df$fill, nm = base_df$value)
text_v <- setNames(label_df$text, nm = label_df$value)
plot_object <-
main_plot_df %>%
ggplot(aes(x = Duration, y = Attendees)) +
geom_polygon(aes(fill = value, group = id),
colour = "#FFFFFF") +
scale_fill_manual(values = colo_v) +
geom_text(data = label_df, aes(x = x, y = y, label = value2, colour = value), size = 3) +
scale_colour_manual(values = text_v) +
scale_x_continuous(breaks = 0:3,
labels = c("0", "1", "2+", "")) +
scale_y_continuous(breaks = 0:5,
labels = c("2", "3-8", "9-18",
"19-50", "51+", "")) +
labs(title = "Meeting types by attendees and duration",
subtitle = "% of total time spent in meetings",
ylab = "Attendees",
xlab = "Duration (hours)",
caption = extract_date_range(data, return = "text")) +
theme_wpa_basic() +
theme(legend.position = "none")
} else {
plot_object <-
myResultsTable %>%
mutate(MeetingType = factor(MeetingType, levels = rev(mt_dist_str_clean))) %>%
mutate(Fill = case_when(MeetingType == "Bloated" ~ rgb2hex(180,180,180),
MeetingType == "Lengthy" ~ rgb2hex(120,120,120),
MeetingType == "Workshop" ~ rgb2hex(90,90,90),
MeetingType == "All hands" ~ rgb2hex(60,60,60),
MeetingType == "Status update" ~ rgb2hex(45,160,160),
MeetingType == "Decision making" ~ rgb2hex(25,120,140),
MeetingType == "One on one" ~ rgb2hex(5,85,100))) %>%
ggplot(aes(x = !!sym(hrvar), y = Prop, group = MeetingType, fill = Fill)) +
geom_bar(position = "stack", stat = "identity") +
geom_text(aes(label = paste(round(Prop * 100), "%")),
position = position_stack(vjust = 0.5),
color = "#FFFFFF",
fontface = "bold") +
scale_fill_identity(name = "Coaching styles",
breaks = c(
rgb2hex(180,180,180),
rgb2hex(120,120,120),
rgb2hex(90,90,90),
rgb2hex(60,60,60),
rgb2hex(45,160,160),
rgb2hex(25,120,140),
rgb2hex(5,85,100)
),
labels = mt_dist_str_clean,
guide = "legend") +
coord_flip() +
theme_wpa_basic() +
labs(title = "Meeting types by attendees and duration",
subtitle = "% of total time spent in meetings",
y = "Percentage",
caption = extract_date_range(data, return = "text"))
}
if(return == "table"){
outputTable
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

143
R/meetingtype_dist_mt.R Normal file

@ -0,0 +1,143 @@
#' @title Meeting Type Distribution (Meeting Query)
#'
#' @description
#' Calculate the hour distribution of internal meeting types,
#' using a Meeting Query with core WpA variables as an input.
#'
#' @param data Meeting Query data frame. Must contain the variables `Attendees`, `DurationHours` and `Attendee_meeting_hours`.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import ggplot2
#' @import dplyr
#' @importFrom stats setNames
#'
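#' @examples
#' \dontrun{
#' # Minimal usage sketch; assumes `mt_data`, the demo Meeting Query dataset,
#' # carries `Attendees`, `DurationHours` and `Attendee_meeting_hours`.
#' meetingtype_dist_mt(mt_data)
#' }
#'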
#' @export
meetingtype_dist_mt <- function(data, return = "plot"){
## Date range data frame
myPeriod <- extract_date_range(data)
data_typed <-
data %>%
mutate(MeetingType = case_when(
Attendees == 2 & DurationHours <= 1 ~ "One on one",
Attendees >= 3 & Attendees <= 8 & DurationHours <= 1 ~ "Decision making",
Attendees >= 9 & Attendees <= 18 & DurationHours <= 1 ~ "Status update",
Attendees >= 19 & Attendees <= 50 & DurationHours <= 2 ~ "Bloated",
Attendees <= 18 & DurationHours >= 1 & DurationHours <= 2 ~ "Lengthy",
Attendees >= 51 ~ "All hands",
Attendees <= 50 & DurationHours >= 2 ~ "Workshop",
TRUE ~ NA_character_))
myResultsTable <-
data_typed %>%
group_by(MeetingType) %>%
summarise(AttendeeMeetingHours = sum(Attendee_meeting_hours),
TotalMeetings = n()) %>%
mutate(Prop = AttendeeMeetingHours / sum(AttendeeMeetingHours)) %>%
mutate(Percent = paste(round(Prop * 100), "%"))
myResultsTableTotal <-
data_typed %>%
summarise(AttendeeMeetingHours = sum(Attendee_meeting_hours),
TotalMeetings = n()) %>%
mutate(Prop = AttendeeMeetingHours / sum(AttendeeMeetingHours),
Percent = paste(round(Prop * 100), "%"),
MeetingType = "Total") %>%
select(MeetingType, tidyselect::everything())
base_df <-
data.frame(id = 1:7,
value = c("All hands",
"Bloated",
"Status update",
"Decision making",
"One on one",
"Lengthy",
"Workshop"))
duration <-
c(0, 0, 3, 3,
0, 0, 2, 2,
0, 0, 1, 1,
0, 0, 1, 1,
0, 0, 1, 1,
1, 1, 2, 2,
2, 2, 3, 3)
attendees <-
c(4, 5, 5, 4,
3, 4, 4, 3,
2, 3, 3, 2,
1, 2, 2, 1,
0, 1, 1, 0,
0, 3, 3, 0,
0, 4, 4, 0)
main_plot_df <-
rbind(base_df,
base_df,
base_df,
base_df) %>%
arrange(id) %>%
mutate(Duration = duration,
Attendees = attendees)
label_df <-
main_plot_df %>%
group_by(id, value) %>%
summarise(x = mean(Duration),
y = mean(Attendees)) %>%
mutate(value2 = sub(" ", "\n", value),
text = ifelse(id %in% c(1, 2, 6, 7),"#FFFFFF", "black"),
fill = ifelse(id %in% c(1, 2, 6, 7),"#31617C", "#9FE5FF")) %>%
left_join(select(myResultsTable, MeetingType, Percent),
by = c("value" = "MeetingType")) %>%
mutate(Percent = ifelse(is.na(Percent), "0 %", Percent)) %>%
mutate(value2 = paste(value2, Percent, sep = "\n"))
colo_v <- setNames(label_df$fill, nm = base_df$value)
text_v <- setNames(label_df$text, nm = label_df$value)
plot_object <-
main_plot_df %>%
ggplot(aes(x = Duration, y = Attendees)) +
geom_polygon(aes(fill = value, group = id),
colour = "#FFFFFF") +
scale_fill_manual(values = colo_v) +
geom_text(data = label_df, aes(x = x, y = y, label = value2, colour = value), size = 3) +
scale_colour_manual(values = text_v) +
scale_x_continuous(breaks = 0:3,
labels = c("0", "1", "2+", "")) +
scale_y_continuous(breaks = 0:5,
labels = c("2", "3-8", "9-18",
"19-50", "51+", "")) +
labs(title = "Meeting types by attendees and duration",
subtitle = "% of total time spent in meetings",
ylab = "Attendees",
xlab = "Duration (hours)",
caption = paste("Data from week of",
myPeriod$Start,
"to week of",
myPeriod$End)) +
theme_wpa_basic() +
theme(legend.position = "none")
if(return == "table"){
rbind(myResultsTable,
myResultsTableTotal)
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

69
R/meetingtype_summary.R Normal file

@ -0,0 +1,69 @@
#' @title Create a summary bar chart of the proportion of Meeting Hours spent in Long or Large Meetings
#'
#' @description
#' This function creates a bar chart showing the percentage of meeting hours which are spent in
#' long or large meetings.
#'
#' @param data Collaboration Assessment query in the form of a data frame. Requires the following variables:
#' - `Bloated_meeting_hours`
#' - `Lengthy_meeting_hours`
#' - `Workshop_meeting_hours`
#' - `All_hands_meeting_hours`
#' - `Status_update_meeting_hours`
#' - `Decision_making_meeting_hours`
#' - `One_on_one_meeting_hours`
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import ggplot2
#' @import dplyr
#'
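#' @examples
#' \dontrun{
#' # Minimal usage sketch; `ca_query_df` is a placeholder for a Collaboration
#' # Assessment query containing the meeting type hour variables listed above.
#' meetingtype_summary(ca_query_df, hrvar = "Organization")
#' }
#'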
#' @export
meetingtype_summary <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
mt_dist_str <- c("Bloated_meeting_hours",
"Lengthy_meeting_hours",
"Workshop_meeting_hours",
"All_hands_meeting_hours",
"Status_update_meeting_hours",
"Decision_making_meeting_hours",
"One_on_one_meeting_hours")
returnTable <-
data %>%
group_by(!!sym(hrvar)) %>%
summarise_at(vars(mt_dist_str), ~sum(., na.rm = TRUE)) %>%
gather(MeetingType, AttendeeMeetingHours, -!!sym(hrvar)) %>%
mutate(MeetingType = gsub(pattern = "_meeting_hours", replacement = "", x = MeetingType)) %>%
mutate(MeetingType = gsub(pattern = "_", replacement = " ", x = MeetingType)) %>%
group_by(!!sym(hrvar)) %>%
mutate(AttendeeMeetingHours = AttendeeMeetingHours / sum(AttendeeMeetingHours)) %>%
spread(MeetingType, AttendeeMeetingHours) %>%
left_join(hrvar_count(data, hrvar, return = "table"), by = hrvar) %>%
filter(n >= mingroup) %>%
ungroup() %>%
mutate(MeetingHoursInLongOrLargeMeetings = select(., c("All hands", "Bloated", "Lengthy", "Workshop")) %>%
apply(1, sum, na.rm = TRUE)) %>%
select(!!sym(hrvar), MeetingHoursInLongOrLargeMeetings, n)
if(return == "plot"){
returnTable %>%
create_bar_asis(group_var = hrvar,
bar_var = "MeetingHoursInLongOrLargeMeetings",
title = "% of Meeting Hours\nin Long or Large Meetings",
subtitle = paste0("By ", camel_clean(hrvar)),
caption = extract_date_range(data, return = "text"),
percent = TRUE,
bar_colour = "alert")
} else if(return == "table"){
returnTable
} else {
stop("Please enter a valid input for `return`.")
}
}

120
R/mgrcoatt_dist.R Normal file
View file

@ -0,0 +1,120 @@
#' @title Manager meeting coattendance distribution
#'
#' @description
#' Analyze the degree of meeting coattendance between employees and their managers.
#' Returns a stacked bar plot of different buckets of coattendance.
#' Additional options available to return a table with distribution elements.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics. Defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size, defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats median
#' @importFrom stats sd
#'
#' @family Managerial Relations
#' @family Meeting Culture
#'
#' @examples
#' mgrcoatt_dist(sq_data, hrvar = "Organization", return = "table")
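#' ## Illustrative variation (a sketch, assuming `sq_data` carries the
#' ## manager coattendance metrics used by this function):
#' \dontrun{
#' mgrcoatt_dist(sq_data, hrvar = "LevelDesignation", return = "plot")
#' }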
#' @export
mgrcoatt_dist <- function(data, hrvar = "Organization", mingroup = 5, return = "plot") {
myPeriod <-
data %>%
mutate(Date=as.Date(Date, "%m/%d/%Y")) %>%
arrange(Date) %>%
mutate(Start=first(Date), End=last(Date)) %>%
filter(row_number()==1) %>%
select(Start, End)
## Basic Data for bar plot
plot_data <-
data %>%
rename(group = !!sym(hrvar)) %>%
group_by(PersonId) %>%
filter(Meeting_hours>0) %>%
mutate(coattendman_rate = Meeting_hours_with_manager / Meeting_hours) %>%
summarise(periods = n(),
group = first(group), coattendman_rate=mean(coattendman_rate)) %>%
group_by(group) %>%
mutate(Employee_Count = n_distinct(PersonId)) %>%
filter(Employee_Count >= mingroup)
## Create buckets of coattendance time
plot_data <-
plot_data %>%
mutate(bucket_coattendman_rate = case_when(coattendman_rate>=0 & coattendman_rate<.20 ~ "0 - 20%",
coattendman_rate>=.20 & coattendman_rate<.4 ~ "20 - 40%",
coattendman_rate>=.40 & coattendman_rate<.6 ~ "40 - 60%",
coattendman_rate>=.6 ~ "60% +"))
## Employee count / base size table
plot_legend <-
plot_data %>%
group_by(group) %>%
summarize(Employee_Count=first(Employee_Count)) %>%
mutate(Employee_Count = paste("n=",Employee_Count))
## Data for bar plot
plot_table <-
plot_data %>%
group_by(group, bucket_coattendman_rate) %>%
summarize(Employees=n(),
Employee_Count=first(Employee_Count), percent= Employees / Employee_Count) %>%
arrange(group, bucket_coattendman_rate)
## Table for annotation
annot_table <-
plot_legend %>%
dplyr::left_join(plot_table, by = "group")
## Bar plot
plot_object <-
plot_table %>%
ggplot(aes(x = group, y=Employees, fill = bucket_coattendman_rate)) +
geom_bar(stat = "identity", position = position_fill(reverse = TRUE)) +
coord_flip() +
scale_y_continuous(labels = function(x) paste0(x*100, "%")) +
annotate("text", x = plot_legend$group, y = -.05, label = plot_legend$Employee_Count ) +
scale_fill_manual(name="", values = c("#bed6f2", "#e9f1fb","#ffdfd3","#FE7F4F")) +
theme_classic() +
theme(axis.text=element_text(size=12),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14), legend.position = "top", legend.justification = "right",
legend.title=element_text(size=14), legend.text=element_text(size=14)) +
labs(title = "Time with Manager", subtitle = paste("Meeting Co-attendance Rate by", hrvar)) +
xlab(hrvar) +
ylab("Fraction of Employees") +
labs(caption = paste("Data from week of", myPeriod$Start, "to week of", myPeriod$End))
## Table to return
return_table <-
plot_table %>%
select(group, bucket_coattendman_rate, percent) %>%
spread(bucket_coattendman_rate, percent)
if(return == "table"){
return_table %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

223
R/mgrrel_matrix.R Normal file
View file

@ -0,0 +1,223 @@
#' @title Manager Relationship 2x2 Matrix
#'
#' @description
#' Generate the Manager-Relationship 2x2 matrix, returning a ggplot object by default.
#' Additional options available to return a "wide" or "long" summary table.
#'
#' @param data Standard Query data to pass through. Accepts a data frame.
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector,
#' e.g. "Organization". Defaults to NULL.
#' @param return A character vector specifying whether to return a matrix plot or a table.
#' Defaults to the 2 by 2 matrix. Valid values are "plot", "table", and "chartdata".
#' @param plot_colors
#' Pass a character vector of length 4 containing HEX codes to specify colors to use in plotting.
#' @param threshold
#' Specify a numeric value to determine threshold (in minutes) for 1:1 manager hours.
#' Defaults to 15.
#'
#' @import dplyr
#' @import reshape2
#' @import ggplot2
#' @importFrom scales percent
#'
#' @family Managerial Relations
#' @family Meeting Culture
#'
#' @examples
#' mgrrel_matrix(sq_data)
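#' ## Illustrative variations (a sketch, using parameters documented above):
#' \dontrun{
#' mgrrel_matrix(sq_data, hrvar = "Organization", return = "table")
#' mgrrel_matrix(sq_data, threshold = 30)
#' }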
#'
#' @export
mgrrel_matrix <- function(data,
hrvar = NULL,
return = "plot",
plot_colors = c("#FE7F4F", "#C3E5FF", "#FFD4C4", "#FEAA8A"),
threshold = 15){
## Add dummy "Total" column if hrvar = NULL
if(is.null(hrvar)){
data <- mutate(data, Total = "Total")
hrvar <- "Total"
}
## Check inputs
required_variables <- c("Date",
hrvar,
"PersonId",
"Meeting_hours_with_manager",
"Meeting_hours",
"Meeting_hours_with_manager_1_on_1")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
## Create a Person Weekly Average
data1 <-
data %>%
mutate(coattendman_rate = Meeting_hours_with_manager / Meeting_hours) %>% # Coattendance Rate with Manager
filter(!is.na(coattendman_rate)) %>%
group_by(PersonId, !!sym(hrvar)) %>%
summarise_at(vars(Meeting_hours_with_manager,
Meeting_hours,
Meeting_hours_with_manager_1_on_1,
coattendman_rate),
~mean(.)) %>%
ungroup()
## Threshold
thres_low_chr <- paste("<", threshold, "min")
thres_top_chr <- paste(">", threshold, "min")
## Create key variables
data2 <-
data1 %>%
mutate(coattendande = ifelse(coattendman_rate < 0.5, "<50%", ">50%"),
mgr1on1 = ifelse(Meeting_hours_with_manager_1_on_1 * 60 < threshold,
thres_low_chr,
thres_top_chr))
## Grouping variable split
if(hrvar == "Total"){
data2 %>%
count(mgr1on1, coattendande) %>%
mutate(perc = n / sum(n)) %>% # Calculate percentages
mutate(xmin = ifelse(mgr1on1 == thres_low_chr, -sqrt(perc), 0),
xmax = ifelse(mgr1on1 == thres_top_chr, sqrt(perc), 0),
ymin = ifelse(coattendande == "<50%", -sqrt(perc), 0),
ymax = ifelse(coattendande == ">50%", sqrt(perc), 0),
mgrRel = case_when(mgr1on1 == thres_low_chr & coattendande == "<50%" ~ "Under-coached",
mgr1on1 == thres_low_chr & coattendande == ">50%" ~ "Co-attending",
mgr1on1 == thres_top_chr & coattendande == ">50%" ~ "Highly managed",
TRUE ~ "Coaching")) %>%
mutate_at("mgrRel", ~as.factor(.)) -> chart
chart %>%
select(mgrRel, n, perc) %>%
group_by(mgrRel) %>%
summarise_all(~sum(., na.rm = TRUE)) -> clean_tb
} else if(hrvar != "Total"){
data2 %>%
count(!!sym(hrvar), mgr1on1, coattendande) %>%
group_by(!!sym(hrvar)) %>%
mutate(perc = n / sum(n)) %>% # Calculate percentages
mutate(xmin = ifelse(mgr1on1 == thres_low_chr, -sqrt(perc), 0),
xmax = ifelse(mgr1on1 == thres_top_chr, sqrt(perc), 0),
ymin = ifelse(coattendande == "<50%", -sqrt(perc), 0),
ymax = ifelse(coattendande == ">50%", sqrt(perc), 0),
mgrRel = case_when(mgr1on1 == thres_low_chr & coattendande == "<50%" ~ "Under-coached",
mgr1on1 == thres_low_chr & coattendande == ">50%" ~ "Co-attending",
mgr1on1 == thres_top_chr & coattendande == ">50%" ~ "Highly managed",
TRUE ~ "Coaching")) %>%
ungroup() %>%
mutate_at("mgrRel", ~as.factor(.)) -> chart
chart %>%
select(mgrRel, !!sym(hrvar), n, perc) %>%
group_by(mgrRel, !!sym(hrvar)) %>%
summarise_all(~sum(., na.rm = TRUE)) -> clean_tb
}
## Sort colours out
# Legacy variable names
myColors <- plot_colors
names(myColors) <- levels(chart$mgrRel)
## Show stacked bar chart if multiple groups
if(hrvar == "Total"){
plot <-
chart %>%
ggplot() +
geom_rect(aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, fill = mgrRel), color = "white") +
scale_fill_manual(name = "mgrRel", values = myColors) +
geom_text(aes(x = xmin + 0.5*sqrt(perc),
y = ymin + 0.5*sqrt(perc),
label = scales::percent(perc, accuracy = 1))) +
coord_equal() +
xlab("Weekly 1:1 time with manager") +
scale_x_continuous(breaks = c(-max(abs(chart$xmin),abs(chart$xmax))/2,max(abs(chart$xmin),abs(chart$xmax))/2),
labels = c(thres_low_chr, thres_top_chr),
limits = c(-max(abs(chart$xmin), abs(chart$xmax)), max(abs(chart$xmin), abs(chart$xmax)))) +
ylab("Employee and manager coattend") +
scale_y_continuous(breaks = c(-max(abs(chart$ymin), abs(chart$ymax))/2, max(abs(chart$ymin), abs(chart$ymax))/2),
labels = c("<50%", ">50%"),
limits = c(-max(abs(chart$ymin), abs(chart$ymax)), max(abs(chart$ymin), abs(chart$ymax)))) +
theme_minimal() +
theme(panel.grid = element_blank(),
plot.title = element_text(color="grey40", face="bold", size=20),
axis.line = element_line(),
legend.position = "bottom",
legend.title = element_blank(),
axis.title = element_text(size = 12),
axis.text = element_text(size = 12),
legend.text = element_text(size = 12),
strip.text.x = element_text(color = "grey40", face = "bold", size = 14)) +
labs(caption = extract_date_range(data, return = "text"))
} else if(hrvar != "Total"){
plot <-
chart %>%
mutate(Fill = case_when(mgrRel == "Co-attending" ~ rgb2hex(68,151,169),
mgrRel == "Coaching" ~ rgb2hex(95,190,212),
mgrRel == "Highly managed" ~ rgb2hex(49,97,124),
mgrRel == "Under-coached" ~ rgb2hex(89,89,89))) %>%
ggplot(aes(x = !!sym(hrvar), y = perc, group = mgrRel, fill = Fill)) +
geom_bar(position = "stack", stat = "identity") +
geom_text(aes(label = paste(round(perc * 100), "%")),
position = position_stack(vjust = 0.5),
color = "#FFFFFF",
fontface = "bold") +
scale_fill_identity(name = "Coaching styles",
breaks = c(rgb2hex(68,151,169),
rgb2hex(95,190,212),
rgb2hex(49,97,124),
rgb2hex(89,89,89)),
labels = c("Co-attending",
"Coaching",
"Highly managed",
"Under-coached"),
guide = "legend") +
coord_flip() +
theme_wpa_basic() +
labs(title = "Distribution of types of \nmanager-direct relationship across organizations",
subtitle = "Based on manager 1:1 time and percentage of overall time spent with managers")
}
if(return == "plot"){
plot +
labs(title = "Distribution of types of \nmanager-direct relationship",
subtitle = "Based on manager 1:1 time and percentage of\noverall time spent with managers")
} else if(return == "table"){
clean_tb %>%
as_tibble() %>%
return()
} else if(return == "chartdata"){
chart %>%
as_tibble() %>%
return()
} else {
stop("Please enter a valid input for `return`.")
}
}

42
R/mt_data.R Normal file
View file

@ -0,0 +1,42 @@
#' @title Sample Meeting Query dataset
#'
#' @description
#' A dataset generated from a Meeting Query from WpA.
#'
#' @format A data frame with 2000 rows and 30 variables:
#' \describe{
#' \item{MeetingId}{ }
#' \item{StartDate}{ }
#' \item{StartTimeUTC}{ }
#' \item{EndDate}{ }
#' \item{EndTimeUTC}{ }
#' \item{Attendee_meeting_hours}{ }
#' \item{Attendees}{ }
#' \item{Organizer_Domain}{ }
#' \item{Organizer_FunctionType}{ }
#' \item{Organizer_LevelDesignation}{ }
#' \item{Organizer_Layer}{ }
#' \item{Organizer_Region}{ }
#' \item{Organizer_Organization}{ }
#' \item{Organizer_zId}{ }
#' \item{Organizer_attainment}{ }
#' \item{Organizer_TimeZone}{ }
#' \item{Organizer_HourlyRate}{ }
#' \item{Organizer_IsInternal}{ }
#' \item{Organizer_PersonId}{ }
#' \item{IsCancelled}{ }
#' \item{DurationHours}{ }
#' \item{IsRecurring}{ }
#' \item{Subject}{ }
#' \item{TotalAccept}{ }
#' \item{TotalNoResponse}{ }
#' \item{TotalDecline}{ }
#' \item{TotalNoEmailsDuringMeeting}{ }
#' \item{TotalNoDoubleBooked}{ }
#' \item{TotalNoAttendees}{ }
#' \item{MeetingResources}{ }
#'
#' ...
#' }
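#' @examples
#' ## Minimal illustrative check of the sample dataset (a sketch; assumes
#' ## {dplyr} is available):
#' \dontrun{
#' dplyr::glimpse(mt_data)
#' }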
#' @source \url{https://workplaceanalytics-demo.office.com/en-us/Home}
"mt_data"

110
R/one2one_dist.R Normal file
View file

@ -0,0 +1,110 @@
#' @title Manager 1:1 Time Trend distribution
#'
#' @description
#' Analyze Manager 1:1 Time distribution.
#' Returns a stacked bar plot of different buckets of 1:1 time.
#' Additional options available to return a table with distribution elements.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics. Defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size, defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats median
#' @importFrom stats sd
#'
#' @family Managerial Relations
#'
#' @examples
#' one2one_dist(sq_data, hrvar = "Organization", return = "table")
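#' ## Illustrative default usage returning the stacked bar plot (a sketch):
#' \dontrun{
#' one2one_dist(sq_data)
#' }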
#' @export
one2one_dist <- function(data, hrvar = "Organization", mingroup = 5, return = "plot") {
## Date range data frame
myPeriod <- extract_date_range(data)
## Basic Data for bar plot
plot_data <-
data %>%
rename(group = !!sym(hrvar)) %>%
group_by(PersonId, group) %>%
summarise(Minutes_with_manager_1_on_1 = mean(Meeting_hours_with_manager_1_on_1 * 60)) %>%
ungroup() %>%
left_join(data %>%
rename(group = !!sym(hrvar)) %>%
group_by(group) %>%
summarise(Employee_Count = n_distinct(PersonId)),
by = "group") %>%
filter(Employee_Count >= mingroup)
## Create buckets of 1:1 time
plot_data <-
plot_data %>%
mutate(bucket_manager_1_on_1 = case_when(Minutes_with_manager_1_on_1 == 0 ~ "0 minutes",
Minutes_with_manager_1_on_1 > 0 & Minutes_with_manager_1_on_1 < 15 ~ "1 - 15 minutes",
Minutes_with_manager_1_on_1 >= 15 & Minutes_with_manager_1_on_1 < 30 ~ "15 - 30 minutes",
Minutes_with_manager_1_on_1 >= 30 ~ "30 min +"))
## Employee count / base size table
plot_legend <-
plot_data %>%
group_by(group) %>%
summarize(Employee_Count=first(Employee_Count)) %>%
mutate(Employee_Count = paste("n=",Employee_Count))
## Data for bar plot
plot_table <-
plot_data %>%
group_by(group, bucket_manager_1_on_1) %>%
summarize(Employees = n(),
Employee_Count = first(Employee_Count),
percent = Employees / Employee_Count) %>%
arrange(group, desc(bucket_manager_1_on_1))
## Table for annotation
annot_table <-
plot_legend %>%
dplyr::left_join(plot_table, by = "group")
## Bar plot
plot_object <-
plot_table %>%
ggplot(aes(x = group, y=Employees, fill = bucket_manager_1_on_1)) +
geom_bar(stat = "identity", position = position_fill(reverse = TRUE)) +
scale_y_continuous(labels = function(x) paste0(x*100, "%")) +
coord_flip() +
annotate("text", x = plot_legend$group, y = -.05, label = plot_legend$Employee_Count ) +
scale_fill_manual(name="", values = c("#FE7F4F", "#ffdfd3", "#bed6f2", "#e9f1fb")) +
theme_wpa_basic() +
labs(title = "Time with Manager", subtitle = paste("Scheduled 1:1 weekly meeting minutes by", tolower(hrvar))) +
xlab(hrvar) +
ylab("Fraction of employees") +
labs(caption = paste("Data from week of", myPeriod$Start, "to week of", myPeriod$End))
## Table to return
return_table <- plot_table %>% select(group, bucket_manager_1_on_1, percent) %>% spread(bucket_manager_1_on_1, percent)
if(return == "table"){
return_table %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

47
R/one2one_line.R Normal file
View file

@ -0,0 +1,47 @@
#' @title Manager 1:1 Time Trend - Line Chart
#'
#' @description
#' Provides a week by week view of 1:1 time with managers, visualised as line charts.
#' By default returns a line chart for 1:1 meeting hours,
#' with a separate panel per value in the HR attribute.
#' Additional options available to return a summary table.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#'
#' @family Managerial Relations
#'
#' @examples
#'
#' ## Return a line plot
#' one2one_line(sq_data, hrvar = "LevelDesignation")
#'
#'
#' ## Return a table
#' one2one_line(sq_data, hrvar = "LevelDesignation", return = "table")
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @export
one2one_line <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
## Inherit arguments
create_line(data = data,
metric = "Meeting_hours_with_manager_1_on_1",
hrvar = hrvar,
mingroup = mingroup,
return = return)
}

47
R/one2one_rank.R Normal file
View file

@ -0,0 +1,47 @@
#' @title Manager 1:1 Time Ranking
#'
#' @description
#' This function scans a standard query output for groups with high levels of 'Manager 1:1 Time'.
#' Returns a table with all groups (across multiple HR attributes) ranked by manager 1:1 meeting hours.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar A list of HR Variables to consider in the scan.
#' Defaults to all HR attributes identified.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size.
#' Defaults to 5.
#' @param return A character vector specifying what to return.
#' Valid values include "html" (default, returning an interactive DataTable)
#' and "df" (data frame)
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#' @importFrom stats reorder
#'
#' @family Managerial Relations
#'
#' @return
#' Returns a summary table as a data frame by default, where 'table' is passed in `return`.
#' When 'html' is passed, an interactive DataTable is returned.
#'
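#' @examples
#' ## Illustrative usage (a sketch): rank groups across all identified HR
#' ## attributes by scheduled manager 1:1 meeting hours.
#' \dontrun{
#' one2one_rank(sq_data, return = "table")
#' }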
#' @export
one2one_rank <- function(data,
hrvar = extract_hr(data),
mingroup = 5,
return = "table"){
output <-
data %>% create_rank(metric="Meeting_hours_with_manager_1_on_1", hrvar = hrvar,
mingroup = mingroup,
return = "table")
if(return == "html"){
return(create_dt(output))
} else if(return == "table"){
return(output)
} else {
stop("Invalid `return` argument.")
}
}

45
R/one2one_sum.R Normal file
View file

@ -0,0 +1,45 @@
#' @title Manager 1:1 Time Summary
#'
#' @description
#' Provides an overview analysis of Manager 1:1 Time.
#' Returns a bar plot showing average weekly minutes of Manager 1:1 Time by default.
#' Additional options available to return a summary table.
#'
#' @inheritParams create_bar
#'
#' @family Managerial Relations
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
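#' @examples
#' ## Illustrative usage (a sketch, using the package sample dataset):
#' \dontrun{
#' one2one_sum(sq_data, hrvar = "LevelDesignation")
#' }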
#' @export
one2one_sum <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot"){
cleaned_data <-
data %>%
mutate(`Scheduled 1:1 meeting minutes with manager` = Meeting_hours_with_manager_1_on_1 * 60)
create_bar(data = cleaned_data,
hrvar = hrvar,
mingroup = mingroup,
metric = "Scheduled 1:1 meeting minutes with manager",
return = return,
bar_colour = "darkblue")
}

86
R/one2one_trend.R Normal file
View file

@ -0,0 +1,86 @@
#' @title Manager 1:1 Time Trend
#'
#' @description
#' Provides a week by week view of scheduled manager 1:1 Time.
#' By default returns a week-by-week heatmap, highlighting the points in time with the most activity.
#' Additional options available to return a summary table.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization" but accepts any character vector (e.g. "LevelDesignation")
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @import scales
#'
#' @family Managerial Relations
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
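#' @examples
#' ## Illustrative usage (a sketch): weekly heatmap and summary table.
#' \dontrun{
#' one2one_trend(sq_data, hrvar = "Organization", return = "plot")
#' one2one_trend(sq_data, return = "table")
#' }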
#' @export
one2one_trend <- function(data, hrvar = "Organization", mingroup=5, return = "plot"){
## Date range data frame
myPeriod <- extract_date_range(data)
myTable <-
data %>%
mutate(Date=as.Date(Date, "%m/%d/%Y")) %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`
select(PersonId, Date, group, Meeting_hours_with_manager_1_on_1) %>%
group_by(group) %>%
mutate(Employee_Count = n_distinct(PersonId)) %>%
filter(Employee_Count >= mingroup) # Keep only groups above the privacy threshold
myTable <-
myTable %>%
group_by(Date, group) %>%
summarize(Employee_Count=mean(Employee_Count),
Minutes_with_manager_1_on_1 = mean(Meeting_hours_with_manager_1_on_1 * 60))
myTable_plot <-
myTable %>%
select(Date, group, Minutes_with_manager_1_on_1)
myTable_return <- myTable_plot %>% spread(Date, Minutes_with_manager_1_on_1)
plot_object <-
myTable_plot %>%
ggplot(aes(x =Date , y = group , fill = Minutes_with_manager_1_on_1)) +
geom_tile(height=.5) +
scale_fill_gradient(name="Minutes", low = "white", high = "red") +
theme_wpa_basic() +
labs(title = "Time with Manager",
subtitle = paste("Average scheduled 1:1 meeting minutes by", tolower(hrvar))) +
xlab("Date") +
ylab(hrvar) +
labs(caption = paste("Data from week of", myPeriod$Start, "to week of", myPeriod$End))
if(return == "table"){
myTable_return %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

Some files were not shown because too many files have changed.