Merge branch 'main' into IV_with_p_value

This commit is contained in:
Martin Chan 2021-01-07 16:53:17 +00:00
Parents 25ccb4ce43 2118a61a1a
Commit 5cc7cdedd2
177 changed files with 3343 additions and 728 deletions


@ -12,5 +12,9 @@
^\.github$
^man/figures/wpa cheatsheet_20201116.pdf$
^man/figures/wpa cheatsheet_20201116.png$
^man/figures/wpa cheatsheet.pdf$
^man/figures/wpa cheatsheet.png$
^vignettes/best-practices.Rmd$
^vignettes/text-mining-capabilities.Rmd$
^vignettes/IV-report.Rmd$
^CONTRIBUTORS.svg$

.github/AnalystGuideOutline.txt (new file)

@ -0,0 +1,26 @@
Analyst Guide
1. Introduction
- Welcome Note
- Why use R for Workplace Analytics?
- Pre-Requisites
- Useful Resources
- Analysis Guide Contents
2. Getting Started
- Installation Instructions
- Importing Workplace Analytics Data
- Exploring the dataset
- Function Structure and additional parameters
- Exporting Plots and Tables
3. Summary Functions
4. Distribution Functions
5. Trend Functions
6. Network Functions

.github/analyst_guide.md

@ -1,19 +1,43 @@
# Introduction
Welcome to the Analyst Guide of the **wpa** R package. This document will guide you through the installation process, explain the package functionality and structure, and get you comfortable with some common functions for analysis.
## Before we begin...
Make sure you have:
1. Analyst access to [Workplace Analytics](https://www.microsoft.com/en-us/microsoft-365/business/workplace-analytics)
2. [R](https://www.r-project.org/) installed on your local device.
An IDE is optional, but we recommend either [RStudio Desktop](https://rstudio.com/products/rstudio/download/#download) or [VS Code](https://code.visualstudio.com/).
## Why use R for Workplace Analytics?
There are multiple reasons:
1. **Cutting edge data science**: R is an open-source language that is known for its active user community and a wide range of packages that together enable the quick and effective implementation of data science techniques.
2. **Reproducibility**: Code-based workflows help facilitate reproducible analysis, which is the notion that analysis should be built in a way that is replicable by others. R as a tool promotes this good practice.
3. **Efficiency / scalability**: R scales relatively well in the context of large datasets. The application of functions and automated processes also helps cut down routine analysis time.
4. **Integration**: If you already use R as part of your analysis toolkit, adopting this package as part of the workflow will be seamless and easy.
5. **Extensibility**: One of the most appealing features of R is the access it offers to a wide range of packages. For instance, clustering and text mining can be done very easily as part of an R workflow, and both are available in this package.
## Guide contents
This guide is organized in five key sections:
1. [**Getting Started**](analyst_guide_getting_started.html): This section contains the detailed installation instructions, and a general overview of how functions work.
2. [**Summary Functions**](analyst_guide_summary.html): This section introduces functions that calculate averages and draw comparisons across groups.
3. [**Distribution Functions**](analyst_guide_distribution.html): This section describes functions that help you explore distributions across groups.
4. [**Trend Functions**](analyst_guide_trend.html): This section explains functions that explore time dynamics across a wide range of metrics.
5. [**Network Functions**](analyst_guide_network.html): This section explores functions that help you plot and analyse networks.
## Additional resources
To get the most out of **wpa**, make sure to leverage these additional resources:
1. Our official [**wpa** cheat sheet](https://github.com/microsoft/wpa/blob/main/man/figures/wpa%20cheatsheet.pdf).
2. A growing list of [articles](https://microsoft.github.io/wpa/articles/) with detailed walkthroughs, written by multiple contributors.
## Ready to go?
Let's begin with the [**Getting Started**](analyst_guide_getting_started.html) section.

.github/analyst_guide_distribution.md

@ -1,3 +1,107 @@
# Distribution Functions
**Distribution** functions allow you to drill deeper into groups, to diagnose if the averages are truly representative of that group, or if they are hiding pockets of individuals with divergent behaviour (outliers).
## Grouping individuals
The `collaboration_dist()` function produces a 100% stacked bar chart, with all individuals classified in pre-defined groups according to defined ranges of collaboration hours. This function uses a Standard Person Query as input:
```R
sq_data %>% collaboration_dist()
```
You can specify which HR attribute/variable to use as a grouping variable with the `hrvar` argument, and what output to obtain (either "plot" or "table") with the `return` argument.
```R
sq_data %>% collaboration_dist(hrvar = "LevelDesignation", return="table")
```
**wpa** includes a wide range of equivalent functions that create 100% stacked bar plots for specific metrics. These include:
- `email_dist()`
- `meeting_dist()`
- `one2one_dist()`
- `afterhours_dist()`
- `workloads_dist()`
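Each of these works like `collaboration_dist()`, taking a Standard Person Query and accepting the `hrvar` and `return` arguments. A quick sketch using the demo data (assuming **wpa** is loaded and the pipe is available):

```R
library(wpa)

# 100% stacked bar chart of after-hours collaboration, by level
sq_data %>% afterhours_dist(hrvar = "LevelDesignation")

# The same analysis returned as a summary table instead of a plot
sq_data %>% afterhours_dist(hrvar = "LevelDesignation", return = "table")
```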
## Fizzy Drink Plots
You can also explore distributions by using jittered ('fizzy drink') scatter plots. The `collaboration_fizz()` function illustrates how different individuals fall on a common scale:
```R
sq_data %>% collaboration_fizz() # Fizzy drink plot
```
As usual, you can specify an HR attribute grouping variable with `hrvar`, and create a table with the `return` argument.
```R
sq_data %>% collaboration_fizz(hrvar = "LevelDesignation", return = "table")
```
Other examples of distribution fizzy drink plots include:
- `email_fizz()`
- `meeting_fizz()`
- `one2one_fizz()`
- `afterhours_fizz()`
- `workloads_fizz()`
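These behave like `collaboration_fizz()`; for instance, a sketch with the demo data:

```R
library(wpa)

# Jittered scatter plot of meeting hours, grouped by organization
sq_data %>% meeting_fizz(hrvar = "Organization")
```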
## Custom bar charts and tables
Not all metrics include their own dist and fizz functions. However, you can obtain the equivalent graphs using the `create_dist()` and `create_fizz()` functions. These functions require you to include a character string containing the name of the metric you want to analyze. Additionally, the `create_boxplot()` function allows you to create a box plot.
For example, using "Generated_workload_email_hours":
```R
sq_data %>% create_dist(metric = "Generated_workload_email_hours")
sq_data %>% create_fizz(metric = "Generated_workload_email_hours")
sq_data %>% create_boxplot(metric = "Generated_workload_email_hours")
```
## Exploring changes over time
In our next section, we will cover [**Trend Functions**](analyst_guide_trend.html), which are useful to explore time dynamics across a wide range of metrics.
## Gallery
<html>
<head>
<style>
div.gallery {
margin: 5px;
border: 1px solid #ccc;
float: left;
width: 180px;
}
div.gallery:hover {
border: 1px solid #777;
}
div.gallery img {
width: 100%;
height: auto;
}
div.desc {
padding: 15px;
text-align: center;
}
</style>
</head>
<body>
<div class="gallery">
<a target="_blank" href="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/email_dist.png">
<img src="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/email_dist.png" alt="Email Distribution Function" width="600" height="400">
</a>
<div class="desc">email_dist()</div>
</div>
</body>
</html>

.github/analyst_guide_getting_started.md (new file)

@ -0,0 +1,189 @@
# Getting started
This section contains the detailed installation instructions and a first overview of how functions work in the **wpa** package.
## Installation
The latest stable version of **wpa** is available on GitHub. You can automatically download and install it on your local device by running the following code in R:
```R
# Check if devtools is installed, if not then install it
if(!"devtools" %in% installed.packages()){
install.packages("devtools")
}
devtools::install_git(url = "https://github.com/microsoft/wpa.git")
```
If you are prompted to update your packages, we recommend updating all CRAN packages, unless you are aware of significant breaking changes in the packages that would be updated. You may need to restart your R session both **before** and **after** the package updates prior to re-running the above code.
**wpa** is not yet released on CRAN, and therefore `install.packages()` will not work. If you prefer to proceed with a local installation, you can download an installation file [here](https://github.com/microsoft/wpa/releases).
## Loading the wpa package
Once the installation is complete, you can load the package with:
```R
library(wpa)
```
You only need to install the package once; however, you will need to load it every time you start a new R session.
**wpa** is designed to work side by side with other data science R packages from the [tidyverse](https://www.tidyverse.org/). We generally recommend loading that package too:
```R
library(tidyverse)
```
## Importing Workplace Analytics data
To read data into R, you can use the `import_wpa()` function. This function accepts any query file in CSV format and performs variable type conversions optimized for Workplace Analytics.
Assuming you have a file called *myquery.csv* on your desktop, you can import it into R using:
```R
setwd("C:/Users/myuser/Desktop/")
person_data <- import_wpa("myquery.csv")
```
In the code above, `setwd()` will set the working directory to the Desktop, then `import_wpa()` will read the source CSV. Note that file paths in R must be written with forward slashes (`/`) or escaped backslashes (`\\`).
As an alternative to `setwd()`, you may also consider using [RStudio Projects](https://martinctc.github.io/blog/rstudio-projects-and-working-directories-a-beginner's-guide/), which enable you to use relative paths within the working directory _instead_ of `setwd()` and full file paths.
The contents will be saved to the object `person_data` (using `<-` as an [Assignment Operator](https://stat.ethz.ch/R-manual/R-devel/library/base/html/assignOps.html)).
## Demo data
The **wpa** package includes a set of demo Workplace Analytics datasets that you can use to explore the functionality of this package. We will also use them extensively in this guide. The included datasets are:
1. `sq_data`: A Standard Person Query
2. `dv_data`: A Standard Person Query with outliers
3. `mt_data`: A Standard Meeting Query
4. `em_data`: An Hourly Collaboration Query
5. `g2g_data`: A Group-to-Group Query
See [here](https://docs.microsoft.com/en-us/Workplace-Analytics/tutorials/query-basics) for a full documentation of the queries in Workplace Analytics.
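Because the demo datasets load with the package, you can inspect them straight away; for example, a sketch using base R:

```R
library(wpa)

# Dimensions and column names of the demo Standard Person Query
dim(sq_data)
names(sq_data)

# Count the distinct individuals covered by the query
length(unique(sq_data$PersonId))
```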
## Exploring a person query
We can explore the `sq_data` person query using the `analysis_scope()` function. This function creates a basic bar plot with the count of distinct individuals for each group (groups are defined by an HR attribute in your query).
For example, if we want to know the number of individuals in `sq_data` per organization, we can use:
```R
analysis_scope(sq_data, hrvar = "Organization")
```
This function requires you to provide a person query (`sq_data`) and specify which HR variable will be used to slice the data (`hrvar`). As we have indicated that the `Organization` attribute should be used, the resulting bar chart will show the number of individuals for each organization in the database.
The same R code can be written using a Forward-Pipe Operator (`%>%`) to feed our query into the function. This notation is common in R data science applications, and is the one we will use moving forward.
```R
sq_data %>% analysis_scope(hrvar = "Organization")
```
Let's now use this function to explore other groups. For example:
```R
sq_data %>% analysis_scope(hrvar = "LevelDesignation")
sq_data %>% analysis_scope(hrvar = "TimeZone")
```
We can expand this analysis by using the `filter()` function from **dplyr**, which allows us to drill into a specific subset of the data. This is where the Forward-Pipe Operator (`%>%`) becomes very useful, as we can write a single pipeline that takes the original data, applies a filter, and then creates the plot:
```R
sq_data %>%
filter(LevelDesignation == "Support") %>%
analysis_scope(hrvar = "Organization")
```
Most functions in **wpa** create a plot by default, but you can change their behaviour by adding a `return` argument. If you add `return = "table"` to this function, it will produce a table with the count of the distinct individuals by group.
```R
sq_data %>% analysis_scope(hrvar = "LevelDesignation", return = "table")
```
If at any point you would like to understand more about the functions, you can:
- Run the function name prefixed with a question mark in the R console, e.g. `?analysis_scope`
- View the underlying source code of the function with `View()`, e.g. `View(analysis_scope)`
- Visit the reference page online: https://microsoft.github.io/wpa/reference
## Function structure
All functions in **wpa** follow a similar behaviour, including many common arguments. The following illustrates the basic API of standard analysis functions:
<img src="https://raw.githubusercontent.com/microsoft/wpa/main/man/figures/api-demo.png" align="center" width=80% />
So far we have explored the `hrvar` and `return` arguments. We will use the `mingroup` argument in the next section.
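As a preview, the three common arguments can be combined in a single call. This sketch assumes `analysis_scope()` accepts `mingroup` in the same way as the other analysis functions:

```R
library(wpa)

# Group by organization, hide groups with fewer than ten people,
# and return a summary table rather than a plot
sq_data %>%
  analysis_scope(hrvar = "Organization",
                 mingroup = 10,
                 return = "table")
```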
## Exporting plots and tables
Tables and plots can be saved with the `export()` function, which allows you to save plots and tables to your local drive.
Once again, adding an additional Forward-Pipe Operator (`%>%`), we can write:
```R
sq_data %>%
analysis_scope(hrvar = "Organization") %>%
export()
```
## Four steps from data to output
The examples above illustrate how the use of **wpa** can be summarized in four simple steps: load the package, read in query data, run functions, and export results. The script below illustrates this workflow:
```R
library(wpa) # Step 1
person_data <- import_wpa("myquery.csv") # Step 2
person_data %>% analysis_scope() # Step 3
person_data %>%
analysis_scope() %>%
export() # Step 4
```
## Ready to learn more?
Let's go to the [**Summary Functions**](analyst_guide_summary.html) section, to see how we can analyse different Workplace Analytics Metrics in a similar way.
## Gallery
<html>
<head>
<style>
div.gallery {
margin: 5px;
border: 1px solid #ccc;
float: left;
width: 180px;
}
div.gallery:hover {
border: 1px solid #777;
}
div.gallery img {
width: 100%;
height: auto;
}
div.desc {
padding: 15px;
text-align: center;
}
</style>
</head>
<body>
<div class="gallery">
<a target="_blank" href="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/analysis_scope.png">
<img src="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/analysis_scope.png" alt="Analysis Scope" width="600" height="400">
</a>
<div class="desc">analysis_scope()</div>
</div>
</body>
</html>

.github/analyst_guide_summary.md

@ -1,34 +1,145 @@
# Summary functions
This section describes the use of summary functions. These functions allow you to compare averages across the groups defined by an organizational attribute, for many Workplace Analytics metrics.
## Exploring key metrics
The `keymetrics_scan()` function allows you to produce a summary table with a wide range of metrics from a Standard Person Query. Just like the function we studied in the previous section, you can specify which HR attribute/variable to use as a grouping variable with the `hrvar` argument, and which output to obtain (either "plot" or "table") with the `return` argument.
```R
sq_data %>% keymetrics_scan(hrvar = "Organization", return = "plot")
sq_data %>% keymetrics_scan(hrvar = "Organization", return = "table")
```
The resulting table provides averages for 18 key Workplace Analytics metrics. You can customize which indicators to include with the `metrics` argument:
```R
sq_data %>% keymetrics_scan(hrvar = "Organization", return = "plot",
                            metrics = c("Workweek_span", "Collaboration_hours", "After_hours_collaboration_hours"))
```
The `keymetrics_scan()` function is a great starting point for exploratory data analysis, before you dive deeper into particular metrics.
## Average email and meeting hours
The `collaboration_summary()` function generates a stacked bar plot summarising the email and meeting hours by an HR attribute you specify. If no HR attribute is specified, "organization" will be used by default:
```R
sq_data %>% collaboration_summary()
```
By changing the `hrvar` argument, you can easily change the groups being shown:
```R
sq_data %>% collaboration_summary(hrvar = "LevelDesignation")
```
By default, all summary functions exclude groups with fewer than five individuals. This can be adjusted using the `mingroup` argument:
```R
sq_data %>% collaboration_summary(hrvar = "LevelDesignation", mingroup = 10)
```
Finally, you can also use "table" in the `return` argument to obtain a summary table instead of a plot. The `export()` function will copy all contents to the clipboard.
```R
sq_data %>% collaboration_summary(hrvar = "LevelDesignation", return = "table")
```
## Other summary functions
The package includes a wide range of summary functions that create bar plots for specific metrics. These include:
- `email_summary()`: Bar plot summarising email hours by an HR attribute.
- `meeting_summary()`: Bar plot summarising meeting hours by an HR attribute.
- `one2one_summary()`: Bar plot summarising manager one-to-one meeting hours, by an HR attribute.
- `workloads_summary()`: Bar plot summarising workweek span by an HR attribute.
- `afterhours_summary()`: Bar plot summarising after-hours collaboration hours by an HR attribute.
All of these functions work equivalently to the `collaboration_summary()` function: they use Standard Person Query data as an input, and accept `hrvar`, `return` and `mingroup` arguments.
For example:
```R
sq_data %>% afterhours_summary()
```
## Custom bar charts and tables
For other metrics, the `create_bar()` function is a good way to obtain a summary bar chart for any metric. This function requires you to include a character string containing the name of the metric you want to analyze, e.g. "Generated_workload_email_hours":
```R
sq_data %>% create_bar(metric = "Generated_workload_email_hours")
```
The `create_bar()` function also accepts `hrvar`, `return` and `mingroup` arguments.
## Customizing plots
All plots in **wpa** are [ggplot objects](https://rafalab.github.io/dsbook/ggplot2.html). This means you can customize them by adding ggplot arguments and layers. For instance, you can change the title of a `collaboration_summary()` plot:
```R
sq_data %>% collaboration_summary() + ggtitle("This is a custom title")
```
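For instance, a sketch that appends standard **ggplot2** layers to restyle a plot (`labs()` and `theme_minimal()` come from ggplot2, not wpa):

```R
library(wpa)
library(ggplot2)

# Restyle a wpa plot with ordinary ggplot2 layers
sq_data %>%
  collaboration_summary() +
  labs(caption = "Source: demo Standard Person Query") +
  theme_minimal()
```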
## Going beyond averages
Let's continue to [**Distribution Functions**](analyst_guide_distribution.html) to explore how we can analyse distributions from different Workplace Analytics Metrics with unique functions.
## Gallery
<html>
<head>
<style>
div.gallery {
margin: 5px;
border: 1px solid #ccc;
float: left;
width: 180px;
}
div.gallery:hover {
border: 1px solid #777;
}
div.gallery img {
width: 100%;
height: auto;
}
div.desc {
padding: 15px;
text-align: center;
}
</style>
</head>
<body>
<div class="gallery">
<a target="_blank" href="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/keymetrics_scan.png">
<img src="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/keymetrics_scan.png" alt="Key Metrics Scan" width="600" height="400">
</a>
<div class="desc">keymetrics_scan()</div>
</div>
<div class="gallery">
<a target="_blank" href="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/collab_sum.png">
<img src="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/collab_sum.png" alt="Collaboration Summary" width="600" height="400">
</a>
<div class="desc">collaboration_summary()</div>
</div>
<div class="gallery">
<a target="_blank" href="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/afterhours_sum.png">
<img src="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/afterhours_sum.png" alt="After-Hours Summary" width="600" height="400">
</a>
<div class="desc">afterhours_summary()</div>
</div>
</body>
</html>

.github/analyst_guide_trend.md

@ -1,3 +1,36 @@
# Trend
**Trend** functions analyze Workplace Analytics metrics over time.
<img src="https://raw.githubusercontent.com/microsoft/wpa/main/.github/gallery/collab_area.png" align="center" width=80%>
## Functions
Trend functions include:
### Trend
- `collaboration_trend()`
- `meeting_trend()`
- `email_trend()`
- `one2one_trend()`
- `afterhours_trend()`
- `workloads_trend()`
### Line
- `collaboration_line()`
- `meeting_line()`
- `email_line()`
- `one2one_line()`
- `afterhours_line()`
- `workloads_line()`
### Flexible functions
- `create_trend()`
- `create_line()`
### Other
- `collab_area()`
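All of these accept a Standard Person Query, plus the usual `hrvar` and `return` arguments; a brief sketch using the demo data:

```R
library(wpa)

# Weekly trend of collaboration hours, by organization
sq_data %>% collaboration_trend(hrvar = "Organization")

# Flexible equivalent for any metric, drawn as a line chart
sq_data %>% create_line(metric = "Collaboration_hours",
                        hrvar = "Organization")
```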

.github/developer_guide.md

@ -1,6 +1,23 @@
# :hammer: Developer Guide
This is a guide for "developers", or anyone who is looking to contribute code to the **wpa** R package.
Please also read our [CONTRIBUTING document](CONTRIBUTING.html) for details regarding our Contributor License Agreement (CLA).
## Pre-requisites
You should have the following installed before starting:
1. [R](https://www.r-project.org/)
2. [RStudio Desktop](https://rstudio.com/products/rstudio/download/#download)
3. [GitHub Desktop](https://desktop.github.com/) (Highly recommended, unless you are very familiar with git)
You should also have the following packages installed. These should all be available from CRAN:
1. **devtools**
2. **tidyverse**
3. **roxygen2**
Once these are all installed, you should be ready to get started.
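If any of these packages are missing, they can be installed from CRAN in one call:

```R
# One-off installation of the development prerequisites
install.packages(c("devtools", "tidyverse", "roxygen2"))
```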
## Design Principles
@ -12,45 +29,79 @@ Please try to adhere to the following package design principles before proceedin
- **Parsimony**: in creating the package, as much of the existing code should be re-used if possible to minimise duplication of work and to make analysis reproducible.
- **Tidy**: the functions from the package are designed to be consistent with tidy principles, and work well with a **dplyr** pipe (`%>%`) workflow.
---
## :hammer: Quick start guide - For developers
### 1. Cloning or forking the package
To get started, you will need to _fork_ and _clone_ the **wpa** repository on your local machine. You will need to have either **git** or **GitHub Desktop** set up on your local machine (see Pre-requisites).
Here are the steps if you do not have repository access to `microsoft/wpa` (if you have been granted direct access, you can clone directly):
1. On the top right of the repository page, click 'Fork'.
1. This will create a repository called `<YOURUSERNAME>/wpa`.
1. After this, you can either use **GitHub Desktop** or **git** to clone this repository to your local machine.
When creating a fork on GitHub, you may be asked whether you are looking to contribute to the parent project or for your own purposes (see screenshot). Please select **contribute to the parent project**.
<img src ="https://raw.githubusercontent.com/microsoft/wpa/main/man/figures/fork-repo-example.png" height=300></img>
### Using GitHub Desktop
1. Click on the big green `Code` button on the repository page, and click 'Open with GitHub Desktop'.
2. Follow the instructions on GitHub Desktop to clone the repository to a folder on your local machine.
### Using git
You may prefer to use **git** because it is more flexible, or because you have repository access and you are trying to directly clone the package from `microsoft/wpa` and are failing to do so with GitHub Desktop. There is a known issue with GitHub Desktop that it could fail with 2FA authentication, in which case you can use **git** to authenticate properly.
Here are the steps:
1. Ensure you have git installed. If not, you can download this from [Git - Downloads (git-scm.com)](https://git-scm.com/downloads).
2. Once installed, open up Git Bash. This should open up a Command Line interface.
3. Decide which folder you would like to clone the `wpa` repository, and copy the path to the folder. It is highly recommended that you create a new folder and name it `wpa`. An example of this path would be `C:\My Documents\GitHub\wpa`.
4. Type `cd `, and followed (separated with a space) by the path to the local folder above. You will need to add quotes, so the full command would look something like `cd "C:/My Documents/GitHub/wpa"`. See [How to change folder with git bash? - Stack Overflow](https://stackoverflow.com/questions/8961334/how-to-change-folder-with-git-bash) if you have issues changing directories.
5. Once Git Bash is showing the correct file path to your `wpa` repository, run the command `git init`. This command initializes your folder as a git repository. In other words, it sets it up to do version control.
6. Next, run `git remote add origin https://github.com/microsoft/wpa.git`. This will "connect" your local folder with the Git repository. You may get some prompts for authentication somewhere around here. Run `git remote -v` to check whether this has been successfully applied.
7. You can now use your standard git commands to commit, pull, push, etc. See [git cheat sheet](https://education.github.com/git-cheat-sheet-education.pdf) for some useful git commands.
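Steps 3 to 6 above can be sketched as a single Git Bash session (the folder path is illustrative):

```shell
# Move into the local folder that will hold the repository
cd "C:/My Documents/GitHub/wpa"

# Initialise the folder as a git repository
git init

# Connect the folder to the GitHub remote, then verify
git remote add origin https://github.com/microsoft/wpa.git
git remote -v
```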
If you are in the situation where you are simply trying to overcome the authentication issue with GitHub Desktop, please read this paragraph. Once the authentication through Git Bash is set up, you may now go back to the GitHub Desktop GUI window. Go to `File > Add local repository` and choose the `wpa` folder that you have just set up. You may be asked to login again, but this should now allow you to fetch or pull from the repository. You are now set up!
### 2. Adding / changing a function
After creating a copy of the repository on your local machine and having installed the repository version, you can start working with R.
1. Let us assume that you have written a new function, e.g. a checking function that returns TRUE if `PersonId` exists and FALSE if it does not.
1. Once you have checked the function and ensured that it works, save it into an R file under the "R" folder of the package directory.
- Ensure that your current working directory is set to the package directory. You can either open the RStudio session with the package .Rproj file, or run `setwd()` to change your working directory. Run `getwd()` to check what your current working directory is.
- Ensure that the relevant roxygen headers are present (see [here](https://roxygen2.r-lib.org/articles/rd.html) for more reference).
- Whilst it is possible to have more than one function saved in a single R file, we recommend saving them in their own files with their own names for clarity. For instance, the function `collaboration_dist()` would be saved in a file called _collaboration_dist.R_.
1. Run `devtools::load_all()`. This will simulate loading the package with the function in it. Run the function with some example data to ensure that it works.
1. Follow the steps in the next section to generate the documentation (`roxygen2::roxygenise()`), check the package (`devtools::check()`), and install the package (`devtools::install()`).
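As an illustration of steps 1 and 2, here is a minimal sketch of such a checking function with its roxygen header (the name `has_person_id()` is hypothetical, not an existing package function):

```R
#' Check for a PersonId column
#'
#' @description Returns TRUE if the `PersonId` column exists in the
#' supplied data frame, and FALSE if it does not.
#'
#' @param data A data frame, e.g. a Standard Person Query.
#'
#' @export
has_person_id <- function(data) {
  "PersonId" %in% names(data)
}
```

You would save this as _R/has_person_id.R_, then run `devtools::load_all()` and test it against `sq_data`.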
### 3. Update package and installing to local directory
There are certain tricks which are helpful to know with developing or changing an R function, such as how to test the function and how to generate R documentation. This section will provide more information.
Once you've made changes to a script, here is how to update the package and install it locally on your computer. Run:
#### 3.1. Generating documentation
```R
roxygen2::roxygenise()
```
`roxygenise()` generates documentation files for the R package, and makes them available to the user.
#### 3.2. R CMD Checks
Next, run this to build and check the package. You will need **devtools** installed.
```R
devtools::check()
```
This step runs an R CMD check against the package, to check that there will be no errors or issues if somebody installs the package. This step can take around five minutes, and it is usually worth running only after you have implemented all the desired changes.
#### 3.3. Install the package
If everything runs smoothly, run this final line to install the local package. Make sure that your working directory is set to the package!
@ -58,11 +109,38 @@ If everything runs smoothly, run this final line to install the local package. M
devtools::install()
```
#### Other installation methods
Note that `devtools::install()` would install the package directly from the directory in your local machine, i.e. the version that you are developing. If you would like to install the stable GitHub version from the `main` branch, you will need to run:
```R
devtools::install_git(url = "https://github.com/microsoft/wpa.git")
```
If you'd like to install from a particular branch or install vignettes, you can leverage the additional arguments available with `devtools::install_git()`, for example:
```R
devtools::install_git(url = "https://github.com/microsoft/wpa.git",
branch = "feature/network-functions",
build_vignettes = TRUE)
```
### 4. Pushing changes to GitHub
Once you are sure that everything works, you can push your changes to GitHub with either git or GitHub Desktop. Depending on the set-up, the method to merge differs:
- If you are pushing to your forked repository, you can push directly to the master/main branch.
- If you are pushing to the official repository at `microsoft/wpa` (assuming you have access), you are recommended to either create a new branch or push the changes to an existing patch branch that is **not** the main repository. It is recommended that you never push changes directly to the `main` branch so that you do not introduce any breaking changes.
For commit messages, we would ask you to add an intuitive commit message so others will know what you have changed to the code. Please see the wiki page of this repo for a style guide on the commit messages.
The next step is to create a pull request to merge your changes to `microsoft/wpa`. The repository itself provides a pull request template on what information you should include in the pull request. A maintainer will reply and review your pull request once it has been successfully submitted.
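As an illustrative sketch, the corresponding git commands might look like the following; the branch name, file name, and commit message here are hypothetical:

```shell
# Create and switch to a feature branch; never work directly on main
git checkout -b feature/my-fix

# Fetch / pull so your local branch is up to date with others' changes
git pull origin main

# Stage and commit with an intuitive commit message
git add R/collaboration_dist.R
git commit -m "Fix sorting of groups in collaboration_dist()"

# Push the branch, then open a pull request against microsoft/wpa
git push -u origin feature/my-fix
```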
---
## To increment or change a package version
Each package release has a corresponding package version (e.g. 1.4.2). To change the package version number for a particular release, simply edit the `DESCRIPTION` file at the project root folder of the **wpa** package. You can edit this with any notepad software or RStudio.
Change the values that come after `Version: ` to change the package version. Prior to incrementing the package version, please align with the main developer team on the agreed version release details.
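For reference, the first few lines of the `DESCRIPTION` file look like this (the version number shown is illustrative):

```
Package: wpa
Type: Package
Title: Tools for Analysing and Visualising Workplace Analytics data
Version: 1.3.1
```

Editing the `Version: ` field and re-installing the package is all that is required to bump the version locally.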
It is good practice to distribute a package with an updated version of the documentation. To build the package manual, run:

```R
devtools::build_manual()
```
## A short guide to adding a function to the package
1. Let us assume that you have written a new function (e.g. a checking function that returns `TRUE` if `PersonId` exists, and `FALSE` if it does not).
1. Once you have checked the function and ensured that it works, save it into an R file under the "R" folder of the package directory.
- Ensure that your current working directory is set to the package directory. You can either open the RStudio session with the package .Rproj file, or run `setwd()` to change your working directory. Run `getwd()` to check what your current working directory is.
- Ensure that the relevant roxygen headers are present (see [here](https://roxygen2.r-lib.org/articles/rd.html) for more reference).
- Whilst it is possible to have more than one function saved in a single R file, we recommend saving them in their own files with their own names for clarity. For instance, the function `collaboration_dist()` would be saved in a file called _collaboration_dist.R_.
1. Follow the steps in the Developer Guide above to generate the documentation (`roxygen2::roxygenise()`), check the package (`devtools::check()`), and install the package (`devtools::install()`). 
1. Once you are sure that everything works, open **GitHub Desktop**, and **check that you have selected the relevant branch**. If you are unsure what your changes are, it is always recommended that you work on a new branch so that you do not introduce any breaking changes to the `main` branch.
1. _Fetch / pull_ from the branch. This ensures that your local package is updated with changes that others have made to the code.
1. _Commit_ your changes, and add an intuitive commit message so others will know what you have changed to the code. Please see the wiki page of this repo for a style guide on the commit messages.
1. _Push_ your changes to the branch. If there are no errors, you should be able to see the changes on the GitHub repository!
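To make the steps above concrete, here is a minimal sketch of what such a file could contain; the function name `check_personid` and its documentation are hypothetical, shown only to illustrate the roxygen2 header layout:

```R
#' @title Check that a PersonId variable exists
#'
#' @description
#' Returns `TRUE` if the `PersonId` column is present in the data frame,
#' and `FALSE` if it is not.
#'
#' @param data A person-level query in the form of a data frame.
#'
#' @examples
#' check_personid(sq_data)
#'
#' @export
check_personid <- function(data){
  "PersonId" %in% names(data)
}
```

Following the convention above, this would be saved as _check_personid.R_ under the "R" folder.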
## Using GitHub
*It's on its way...*
## Documenting functions with {roxygen2}

We recommend Karl Broman's guide on [Writing documentation with Roxygen2](https://kbroman.org/pkg_primer/pages/docs.html).
## :link: Useful links

Binary data added under `.github/gallery/` (new files; binary content not displayed):

- `afterhours_sum.png` (159 KiB)
- `analysis_scope.png` (95 KiB)
- `collab_area.png` (78 KiB)
- `collab_sum.png` (180 KiB)
- `email_dist.png` (100 KiB)
- `identify_holidayweeks.png` (92 KiB)
- `keymetrics_scan.png` (306 KiB)
- `meetingtype_dist.png` (82 KiB)
- `mgrrel_matrix.png` (109 KiB)
- `network_g2g.png` (157 KiB)
- `tm_cooc.png` (676 KiB)
- `tm_freq.png` (208 KiB)
- `tm_wordcloud.png` (272 KiB)
- `workpatterns_rank.png` (89 KiB)
`.github/package_structure.md` (2 changed lines)
@ -1,4 +1,4 @@
# :package: Package Structure
There are four main types of functions in **wpa**:
1. Standard Analysis

`.github/reporting_security_issues.md` (4 changed lines)
@ -1,4 +1,4 @@
# Security
Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).
@ -34,4 +34,4 @@ We prefer all communications to be in English.
## Policy
Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd).

`.github/workflows/contributors.yaml` (new file, 15 lines)
@ -0,0 +1,15 @@
name: Contributors
on:
schedule:
- cron: '0 1 * * 0' # At 01:00 on Sunday.
push:
branches:
- main
jobs:
contributors:
runs-on: ubuntu-latest
steps:
- uses: bubkoo/contributors-list@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
round: true

`CONTRIBUTORS.svg` (new file, 9 lines, 474 KiB; file diff hidden because one or more lines are too long)

@ -1,7 +1,7 @@
Package: wpa
Type: Package
Title: Tools for Analysing and Visualising Workplace Analytics data
Version: 1.3.0
Version: 1.3.1
Authors@R: c(
person(given = "Martin", family = "Chan", role = c("aut", "cre"), email = "martin.chan@microsoft.com"),
person(given = "Carlos", family = "Morales", role = "aut", email = "carlos.morales@microsoft.com"),
@ -9,8 +9,10 @@ Authors@R: c(
person(given = "Ainize", family = "Cidoncha", role = "ctb", email = "ainize.cidoncha@microsoft.com"),
person(given = "Rosamary", family = "Ochoa Vargas", role = "ctb", email = "rosamary.ochoa@microsoft.com"),
person(given = "Tannaz", family = "Sattari", role = "ctb", email = "tannaz.sattari@microsoft.com"),
person(given = "Lucas", family = "Hogner", role = "ctb", email = "lucas.hogner@microsoft.com"),
person(given = "Ellen", family = "Trinklein", role = "ctb")
person(given = "Lucas", family = "Hogner", role = "ctb", email = "lucas.hogner@microsoft.com"),
person(given = "Jasminder", family = "Thind", role = "ctb", email = "jasminder.thind@microsoft.com"),
person(given = "Simone", family = "Liebal", role = "ctb", email = "simone.liebal@microsoft.com"),
person(given = "Ellen", family = "Trinklein", role = "ctb")
)
Description: Opinionated functions that enable easier and faster
analysis of Workplace Analytics data. There are three main types of functions in wpa:
@ -52,11 +54,8 @@ Imports:
tidytext,
ggraph,
igraph,
widyr,
proxy,
rlang,
GGally,
network,
ggwordcloud,
methods,
data.table
@ -64,5 +63,6 @@ RoxygenNote: 7.1.1
Roxygen: list(markdown = TRUE)
VignetteBuilder: knitr
Suggests:
extrafont
extrafont,
leiden
Language: en-US

@ -41,6 +41,7 @@ export(create_IV)
export(create_bar)
export(create_bar_asis)
export(create_boxplot)
export(create_bubble)
export(create_dist)
export(create_dt)
export(create_fizz)
@ -48,6 +49,7 @@ export(create_line)
export(create_line_asis)
export(create_period_scatter)
export(create_rank)
export(create_sankey)
export(create_scatter)
export(create_stacked)
export(create_trend)
@ -69,6 +71,7 @@ export(flag_extreme)
export(flag_outlooktime)
export(g2g_network)
export(generate_report)
export(heat_colours)
export(hr_trend)
export(hrvar_count)
export(hrvar_count_all)
@ -85,6 +88,7 @@ export(import_wpa)
export(internal_network_plot)
export(is_date_format)
export(keymetrics_scan)
export(maxmin)
export(meeting_dist)
export(meeting_fizz)
export(meeting_line)
@ -102,6 +106,11 @@ export(meetingtype_sum)
export(meetingtype_summary)
export(mgrcoatt_dist)
export(mgrrel_matrix)
export(network_describe)
export(network_g2g)
export(network_leiden)
export(network_louvain)
export(network_p2p)
export(one2one_dist)
export(one2one_fizz)
export(one2one_line)
@ -110,6 +119,7 @@ export(one2one_sum)
export(one2one_summary)
export(one2one_trend)
export(p_test)
export(pairwise_count)
export(period_change)
export(personas_hclust)
export(read_preamble)
@ -126,9 +136,11 @@ export(tm_cooc)
export(tm_freq)
export(tm_wordcloud)
export(totals_bind)
export(totals_col)
export(totals_reorder)
export(track_HR_change)
export(tstamp)
export(us_to_space)
export(validation_report)
export(workloads_dist)
export(workloads_fizz)
@ -146,16 +158,19 @@ import(Information)
import(dplyr)
import(ggplot2)
import(ggraph)
import(ggrepel)
import(reshape2)
import(scales)
import(stats)
import(tidyr)
import(tidyselect)
importFrom(data.table,"%between%")
importFrom(data.table,"%like%")
importFrom(data.table,":=")
importFrom(data.table,as.data.table)
importFrom(data.table,rbindlist)
importFrom(dplyr,`%>%`)
importFrom(dplyr,mutate_if)
importFrom(grDevices,rainbow)
importFrom(htmltools,HTML)
importFrom(igraph,graph_from_data_frame)
importFrom(magrittr,"%>%")
@ -185,4 +200,3 @@ importFrom(tidyselect,all_of)
importFrom(tidytext,unnest_tokens)
importFrom(utils,write.csv)
importFrom(utils,write.table)
importFrom(widyr,pairwise_count)

@ -23,6 +23,7 @@
#'
#'
#' @family Flexible Input
#' @family Information Value
#'
#' @export

@ -7,9 +7,9 @@
#'
#' @description
#' The function generates an interactive HTML report using
#' Standard Query data as an input. The report contains a full Information Value analysis, a data exploration technique that helps determine which columns in a data set have predictive power or influence on the value of a specified dependent variable.
#' Standard Person Query data as an input. The report contains a full Information Value analysis, a data exploration technique that helps determine which columns in a data set have predictive power or influence on the value of a specified dependent variable.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param predictors A character vector specifying the columns to be used as predictors.
#' Defaults to NULL, where all numeric vectors in the data will be used as predictors.
#' @param outcome A string specifying a binary variable, i.e. can only contain
@ -20,13 +20,21 @@
#' @param timestamp Logical vector specifying whether to include a timestamp in the file name.
#' Defaults to TRUE.
#'
#' @examples
#' \dontrun{
#' sq_data %>%
#' mutate(CH_binary = ifelse(Collaboration_hours > 12, 1, 0)) %>% # Simulate binary variable
#' IV_report(outcome = "CH_binary", predictors = c("Email_hours", "Workweek_span"))
#' }
#'
#' @family Reports
#' @family Information Value
#'
#' @export
IV_report <- function(data,
predictors = NULL,
outcome,
bins= 5,
bins = 5,
path = "IV report",
timestamp = TRUE){

@ -11,11 +11,11 @@
#'
#' @details
#' This can be used with any person-level query, such as the standard person query,
#' collaboration assessment query, and the hourly collaboration query. When run,
#' Ways of Working assessment query, and the hourly collaboration query. When run,
#' this prints diagnostic data to the R console.
#'
#' @param data A person-level query in the form of a data frame, including the standard person query,
#' collaboration assessment query, and the hourly collaboration query.
#' Ways of Working assessment query, and the hourly collaboration query.
#' @param return Character string to specify whether to return a console message ("message"),
#' a character string ("text"), or a data frame. Defaults to "message".
#' @param validation Logical value to specify whether to return a check used by the `validation_report()`.

@ -7,12 +7,12 @@
#'
#' @description
#' The function generates an interactive HTML report using
#' Standard Query data as an input. The report contains a series
#' Standard Person Query data as an input. The report contains a series
#' of summary analysis and visualisations relating to key **coaching**
#' metrics in Workplace Analytics, specifically relating to the time
#' spent between managers and their direct reports.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "LevelDesignation"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.

@ -6,7 +6,7 @@
#' @title Collaboration - Stacked Area Plot
#'
#' @description
#' Provides an overview analysis of 'Weekly Digital Collaboration'.
#' Provides an overview analysis of Weekly Digital Collaboration.
#' Returns a stacked area plot of Email and Meeting Hours by default.
#' Additional options available to return a summary table.
#'
@ -14,8 +14,8 @@
#' Uses the metrics `Meeting_hours`, `Email_hours`, `Unscheduled_Call_hours`,
#' and `Instant_Message_hours`.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' A Collaboration assessment dataset may also be provided, in which
#' @param data A Standard Person Query dataset in the form of a data frame.
#' A Ways of Working assessment dataset may also be provided, in which
#' Unscheduled call hours would be included in the output.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
@ -30,6 +30,9 @@
#'
#' @family Collaboration
#'
#' @examples
#' collaboration_area(sq_data)
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.

@ -14,7 +14,7 @@
#' @details
#' Uses the metric `Collaboration_hours`.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.

@ -7,11 +7,11 @@
#'
#' @description
#' The function generates an interactive HTML report using
#' Standard Query data as an input. The report contains a series
#' Standard Person Query data as an input. The report contains a series
#' of summary analysis and visualisations relating to key **collaboration**
#' metrics in Workplace Analytics, including email and meeting hours.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.

@ -14,7 +14,7 @@
#' Uses the metrics `Meeting_hours`, `Email_hours`, `Unscheduled_Call_hours`,
#' and `Instant_Message_hours`.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".

@ -3,18 +3,34 @@
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Combine signals from Emails and IMs
#' @title Combine signals from the Hourly Collaboration query
#'
#' @description
#' Takes in an Hourly Collaboration Data, and for each hour sums and stores the
#' signal for `Emails_sent` and `IMs_sent` in `Signals_sent`. This is an internal
#' Takes in Hourly Collaboration data, and for each hour sums and aggregates the
#' signals (e.g. `Emails_sent` and `IMs_sent`) into `Signals_sent`. This is an internal
#' function used in the Working Patterns functions.
#'
#' @param data Data containing Emails_sent and IMs_sent variables
#' @param data Hourly Collaboration query containing signal variables (e.g. `Emails_sent_00_01`)
#' @param hr Numeric value between 0 to 23 to iterate through
#' @param signals Character vector for specifying which signal types to combine.
#' Defaults to c("Emails_sent", "IMs_sent"). Other valid values include "Unscheduled_calls" and
#' "Meetings".
#'
#' @details
#' `combine_signals` uses string matching to aggregate columns.
#'
#' @examples
#' # Demo using simulated variables
#' sim_data <-
#' data.frame(Emails_sent_09_10 = sample(1:5, size = 10, replace = TRUE),
#' Unscheduled_calls_09_10 = sample(1:5, size = 10, replace = TRUE))
#'
#' combine_signals(sim_data, hr = 9, signals = c("Emails_sent", "Unscheduled_calls"))
#'
#' @export
combine_signals <- function(data, hr){
combine_signals <- function(data,
hr,
signals = c("Emails_sent", "IMs_sent")){
if(!is.numeric(hr) | hr < 0 | hr > 23){
@ -22,15 +38,29 @@ combine_signals <- function(data, hr){
}
# End hour
hr_two <- hr + 1
# String pad to two digits
hr1 <- ifelse(nchar(hr) == 1, paste0(0, hr), hr)
hr2 <- ifelse(nchar(hr_two) == 1, paste0(0, hr_two), hr_two)
full_string <- paste0("Signals_sent_", hr1, "_", hr2)
em_string <- paste0("Emails_sent_", hr1, "_", hr2)
im_string <- paste0("IMs_sent_", hr1, "_", hr2)
# Create string vectors
# Use original supplied string if length of signals == 1
if(length(signals) == 1){
full_string <- paste0(signals, "_", hr1, "_", hr2)
dplyr::transmute(data, !!sym(full_string) := !!sym(em_string) + !!sym(im_string))
} else {
full_string <- paste0("Signals_sent_", hr1, "_", hr2)
}
input_string <- paste0(signals, "_", hr1, "_", hr2) # Should be length > 1
# Sum columns and only return `Signals_sent_` prefixed column
data %>%
dplyr::transmute(!!sym(full_string) := select(., input_string) %>%
apply(1, sum, na.rm = TRUE))
}

@ -7,12 +7,12 @@
#'
#' @description
#' The function generates an interactive HTML report using
#' Standard Query data as an input. The report contains a series
#' Standard Person Query data as an input. The report contains a series
#' of summary analysis and visualisations relating to key **connectivity**
#' metrics in Workplace Analytics, including external/internal network size
#' vs breadth.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.

@ -108,7 +108,13 @@ create_IV <- function(data,
subtitle = "Showing top 12 only")
} else if(return == "plot-WOE"){
if (length(IV$Summary$Variable[]) >9) {
Information::plot_infotables(IV, IV$Summary$Variable[1:9], same_scale=TRUE) %>% grDevices::recordPlot()
}
else {
Information::plot_infotables(IV, IV$Summary$Variable[], same_scale=TRUE) %>% grDevices::recordPlot()
}
} else if(return == "list"){
output_list

@ -64,8 +64,14 @@ create_bar <- function(data,
data %>%
check_inputs(requirements = required_variables)
## Handling NULL values passed to hrvar
if(is.null(hrvar)){
data <- totals_col(data)
hrvar <- "Total"
}
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = " ", x = metric)
clean_nm <- us_to_space(metric)
## Data for bar plot
plot_data <-

@ -57,8 +57,14 @@ create_boxplot <- function(data,
data %>%
check_inputs(requirements = required_variables)
## Handling NULL values passed to hrvar
if(is.null(hrvar)){
data <- totals_col(data)
hrvar <- "Total"
}
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = "", x = metric)
clean_nm <- us_to_space(metric)
plot_data <-
data %>%

`R/create_bubble.R` (new file, 120 lines)
@ -0,0 +1,120 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Create a bubble plot with two selected Workplace Analytics metrics (General Purpose), with size
#' representing the number of employees in the group.
#'
#' @description
#' Returns a bubble plot of two selected metrics, using size to map the number of employees.
#'
#' @details
#' This is a general purpose function that powers all the functions
#' in the package that produce bubble plots.
#'
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param metric_x Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param metric_y Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#' @param bubble_size A numeric vector of length two to specify the size range of the bubbles
#'
#' @import dplyr
#' @import ggplot2
#' @import scales
#'
#' @family General
#'
#' @examples
#' create_bubble(sq_data,
#' "Internal_network_size",
#' "External_network_size",
#' "Organization")
#'
#' create_bubble(sq_data,
#' "Generated_workload_call_hours",
#' "Generated_workload_email_hours",
#' "Organization", mingroup = 100, return = "plot")
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @export
create_bubble <- function(data,
metric_x,
metric_y,
hrvar = "Organization",
mingroup = 5,
return = "plot",
bubble_size = c(1, 10)){
## Check inputs
required_variables <- c(hrvar,
metric_x,
metric_y,
"PersonId")
## Error message if variables are not present
## Nothing happens if all present
data %>%
check_inputs(requirements = required_variables)
## Handling NULL values passed to hrvar
if(is.null(hrvar)){
data <- totals_col(data)
hrvar <- "Total"
}
## Clean metric names
clean_x <- us_to_space(metric_x)
clean_y <- us_to_space(metric_y)
myTable <-
data %>%
group_by(PersonId, !!sym(hrvar)) %>%
summarise_at(vars(!!sym(metric_x), !!sym(metric_y)), ~mean(., na.rm = TRUE)) %>%
group_by(!!sym(hrvar)) %>%
summarise_at(vars(!!sym(metric_x), !!sym(metric_y)), ~mean(., na.rm = TRUE)) %>%
ungroup() %>%
left_join(hrvar_count(data, hrvar = hrvar, return = "table"),
by = hrvar) %>%
filter(n >= mingroup)
plot_object <-
myTable %>%
ggplot(aes(x = !!sym(metric_x),
y = !!sym(metric_y),
label = !!sym(hrvar))) +
geom_point(alpha = 0.5, color = rgb2hex(0, 120, 212), aes(size = n)) +
ggrepel::geom_text_repel(size = 3) +
labs(title = paste0(clean_x, " and ", clean_y),
subtitle = paste("By", camel_clean(hrvar)),
caption = paste("Total employees =", sum(myTable$n), "|",
extract_date_range(data, return = "text"))) +
xlab(clean_x) +
ylab(clean_y) +
scale_size(range = bubble_size) +
theme_wpa_basic()
if(return == "table"){
return(myTable)
} else if(return == "plot"){
return(plot_object)
} else {
stop("Please enter a valid input for `return`.")
}
}

@ -64,7 +64,13 @@ create_dist <- function(data,
check_inputs(requirements = required_variables)
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = " ", x = metric)
clean_nm <- us_to_space(metric)
## Handling NULL values passed to hrvar
if(is.null(hrvar)){
data <- totals_col(data)
hrvar <- "Total"
}
## Basic Data for bar plot
plot_data <-

@ -57,8 +57,14 @@ create_fizz <- function(data,
data %>%
check_inputs(requirements = required_variables)
## Handling NULL values passed to hrvar
if(is.null(hrvar)){
data <- totals_col(data)
hrvar <- "Total"
}
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = "", x = metric)
clean_nm <- us_to_space(metric)
## Plot data
plot_data <-
@ -93,15 +99,7 @@ create_fizz <- function(data,
ylim(0, max_point) +
annotate("text", x = plot_legend$group, y = 0, label = plot_legend$Employee_Count) +
scale_x_discrete(labels = scales::wrap_format(10)) +
theme_classic() +
theme(axis.text=element_text(size=12),
axis.text.x = element_text(angle = 30, hjust = 1),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14),
legend.position = "top",
legend.justification = "right",
legend.title=element_text(size=14),
legend.text=element_text(size=14)) +
theme_wpa_basic() +
labs(title = clean_nm,
subtitle = paste("Distribution of",
tolower(clean_nm),

@ -61,9 +61,14 @@ create_line <- function(data,
data %>%
check_inputs(requirements = required_variables)
## Handling NULL values passed to hrvar
if(is.null(hrvar)){
data <- totals_col(data)
hrvar <- "Total"
}
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = " ", x = metric)
clean_nm <- us_to_space(metric)
myTable <-
data %>%

@ -15,7 +15,7 @@
#' This is a general purpose function that powers all the functions
#' in the package that produce faceted scatter plots.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector,
#' defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param metric_x Character string containing the name of the metric,
@ -143,8 +143,8 @@ create_period_scatter <- function(data,
"to",
daterange_2_end)
clean_x <- gsub(pattern = "_", replacement = " ", x = metric_x)
clean_y <- gsub(pattern = "_", replacement = " ", x = metric_y)
clean_x <- us_to_space(metric_x)
clean_y <- us_to_space(metric_y)
plot_title <-
paste(clean_x, "and", clean_y,

@ -19,6 +19,9 @@
#' @param return A character vector specifying what to return.
#' Valid values include "table" (default). Features are being considered for alternative return options but are currently
#' unavailable.
#' @param plot_mode Numeric vector to determine which plot mode to return:
#' 1. Top and bottom five groups across the data population are highlighted
#' 2. Top and bottom groups _per_ organizational attribute are highlighted
#'
#' @import dplyr
#' @import ggplot2
@ -38,7 +41,8 @@ create_rank <- function(data,
metric,
hrvar = extract_hr(data),
mingroup = 5,
return = "table"){
return = "table",
plot_mode = 1){
results <-
create_bar(data,
@ -73,7 +77,115 @@ create_rank <- function(data,
select(hrvar, everything())
if(return == "table"){
return(output)
} else if(return == "plot"){
# Company average
avg_ch <-
data %>%
create_bar(hrvar = NULL, metric = metric, return = "table") %>%
pull(metric)
if(plot_mode == 1){
# Main plot
output %>%
mutate(Rank = rev(rank(!!sym(metric), ties.method = "max"))) %>%
mutate(Group =
case_when(Rank %in% 1:5 ~ "Top 5",
Rank %in% nrow(.):(nrow(.) - 5) ~ "Bottom 5",
TRUE ~ "Middle")) %>%
group_by(hrvar) %>%
mutate(OrgGroup =
case_when(!!sym(metric) == max(!!sym(metric), na.rm = TRUE) ~ "Top",
!!sym(metric) == min(!!sym(metric), na.rm = TRUE) ~ "Bottom",
TRUE ~ "Middle")) %>%
mutate(top_group = max(!!sym(metric), na.rm = TRUE)) %>%
ungroup() %>%
ggplot(aes(x = !!sym(metric),
y = reorder(hrvar, top_group))) + # Sort by top group
geom_point(aes(fill = Group,
size = n),
colour = "black",
pch = 21,
alpha = 0.8) +
labs(title = us_to_space(metric),
subtitle = "Lowest and highest values, by org. attribute",
y = "",
x = "") +
ggrepel::geom_text_repel(aes(x = !!sym(metric),
y = hrvar,
label = ifelse(Group %in% c("Top 5", "Bottom 5"), group, "")),
size = 3) +
scale_x_continuous(position = "top") +
scale_fill_manual(name = "Group",
values = c(rgb2hex(68,151,169),
"white",
"#FE7F4F"),
guide = "legend") +
theme_wpa_basic() +
scale_size(guide = "none", range = c(1, 15)) +
theme(
axis.line=element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(colour = "#D9E7F7", size = 3), # lightblue bar
panel.grid.minor.x = element_line(color="gray"),
strip.placement = "outside",
strip.background = element_blank(),
strip.text = element_blank()
) +
geom_vline(xintercept = avg_ch, colour = "red")
} else if(plot_mode == 2){
output %>%
group_by(hrvar) %>%
mutate(OrgGroup =
case_when(!!sym(metric) == max(!!sym(metric), na.rm = TRUE) ~ "Top",
!!sym(metric) == min(!!sym(metric), na.rm = TRUE) ~ "Bottom",
TRUE ~ "Middle")) %>%
mutate(top_group = max(!!sym(metric), na.rm = TRUE)) %>%
ungroup() %>%
ggplot(aes(x = !!sym(metric),
y = reorder(hrvar, top_group))) + # Sort by top group
geom_point(aes(fill = OrgGroup,
size = n),
colour = "black",
pch = 21,
alpha = 0.8) +
labs(title = us_to_space(metric),
subtitle = "Group averages by organizational attribute",
y = "Organizational attributes",
x = us_to_space(metric)) +
ggrepel::geom_text_repel(aes(x = !!sym(metric),
y = hrvar,
label = ifelse(OrgGroup %in% c("Top", "Bottom"), group, "")),
size = 3) +
scale_x_continuous(position = "top") +
scale_fill_manual(name = "Group",
values = c(rgb2hex(68,151,169),
"white",
"#FE7F4F"),
guide = "legend") +
theme_wpa_basic() +
scale_size(guide = "none", range = c(1, 8)) +
theme(
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(colour = "#D9E7F7", size = 3), # lightblue bar
strip.placement = "outside",
strip.background = element_blank(),
strip.text = element_blank()
) +
geom_vline(xintercept = avg_ch, colour = "red")
} else {
stop("Invalid plot_mode argument.")
}
} else {
stop("Invalid `return` argument.")
}

R/create_sankey.R Normal file

@ -0,0 +1,72 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Create a sankey chart from a two-column count table
#'
#' @description
#' Create a networkD3 style sankey chart based on a long count table
#' with two variables. The input data should have three columns, where
#' each row is a unique group:
#' 1. Variable 1
#' 2. Variable 2
#' 3. Count
#'
#' @param data Data frame of the long count table.
#' @param var1 String containing the name of the variable to be shown on the left.
#' @param var2 String containing the name of the variable to be shown on the right.
#' @param count String containing the name of the count variable.
#'
#' @import dplyr
#'
#' @examples
#' \donttest{
#' sq_data %>%
#' dplyr::count(Organization, FunctionType) %>%
#' create_sankey(var1 = "Organization", var2 = "FunctionType")
#' }
#'
#' @export
create_sankey <- function(data, var1, var2, count = "n"){
## Rename
data$pre_group <- data[[var1]]
data$group <- data[[var2]]
## Set up `nodes`
group_source <- unique(data$pre_group)
group_target <- paste0(unique(data$group), " ")
groups <- c(group_source, group_target)
nodes_source <- tibble(name = group_source)
nodes_target <- tibble(name = group_target)
nodes <- rbind(nodes_source, nodes_target) %>% mutate(node = 0:(nrow(.) - 1))
## Set up `links`
links <-
data %>%
mutate(group = paste0(group, " ")) %>%
select(source = "pre_group",
target = "group",
value = count)
nodes_source <- nodes_source %>% select(name) # Make `nodes` a single column data frame
nodes_target <- nodes_target %>% select(name) # Make `nodes` a single column data frame
links <-
links %>%
left_join(nodes %>% rename(IDsource = "node"), by = c("source" = "name")) %>%
left_join(nodes %>% rename(IDtarget = "node"), by = c("target" = "name"))
networkD3::sankeyNetwork(Links = as.data.frame(links),
Nodes = as.data.frame(nodes),
Source = 'IDsource', # Change reference to IDsource
Target = 'IDtarget', # Change reference to IDtarget
Value = 'value',
NodeID = 'name',
units="count",
sinksRight = FALSE)
}
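The zero-based node/link indexing that `create_sankey()` assembles for `networkD3::sankeyNetwork()` can be sketched standalone in base R. The data frame `df` below is invented purely for illustration and is not part of the package:

```r
# Hypothetical two-column count table, as produced by dplyr::count()
df <- data.frame(var1 = c("A", "A", "B"),
                 var2 = c("X", "Y", "X"),
                 n    = c(10, 5, 8))

# Right-hand groups get a trailing space, so a category appearing in
# both columns becomes two distinct sankey nodes
nodes <- data.frame(name = c(unique(df$var1), paste0(unique(df$var2), " ")))

# networkD3 expects zero-based integer indices into `nodes`
links <- data.frame(IDsource = match(df$var1, nodes$name) - 1L,
                    IDtarget = match(paste0(df$var2, " "), nodes$name) - 1L,
                    value    = df$n)
links
```

The trailing-space trick mirrors what the function does with `group_target`; without it, a value present on both sides would collapse into a single node and produce a self-loop.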


@ -74,9 +74,9 @@ create_scatter <- function(data,
filter(n < mingroup) %>%
pull(!!sym(hrvar))
clean_x <- gsub(pattern = "_", replacement = " ", x = metric_x)
clean_y <- gsub(pattern = "_", replacement = " ", x = metric_y)
## Clean metric names
clean_x <- us_to_space(metric_x)
clean_y <- us_to_space(metric_y)
myTable <-
data %>%


@ -12,7 +12,7 @@
#' Returns a stacked bar plot by default.
#' Additional options available to return a summary table.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param metrics A character vector to specify variables to be used
#' in calculating the "Total" value, e.g. c("Meeting_hours", "Email_hours").
#' The order of the variable names supplied determine the order in which they
@ -39,7 +39,6 @@
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @examples
#' \dontrun{
#' sq_data %>%
#' create_stacked(hrvar = "LevelDesignation",
#' metrics = c("Meeting_hours", "Email_hours"),
@ -60,7 +59,6 @@
#' "Call_hours",
#' "Instant_Message_hours"),
#' return = "table")
#'}
#' @export
create_stacked <- function(data,
hrvar = "Organization",
@ -85,6 +83,12 @@ create_stacked <- function(data,
data %>%
check_inputs(requirements = required_variables)
## Handling NULL values passed to hrvar
if(is.null(hrvar)){
data <- totals_col(data)
hrvar <- "Total"
}
n_count <-
data %>%
rename(group = !!sym(hrvar)) %>% # Rename HRvar to `group`


@ -11,7 +11,7 @@
#' Additional options available to return a summary table.
#'
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param metric Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
@ -27,6 +27,9 @@
#'
#' @family Flexible
#'
#' @examples
#' create_trend(sq_data, metric = "Collaboration_hours", hrvar = "LevelDesignation")
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
#' When 'table' is passed, a summary table is returned as a data frame.
@ -49,8 +52,14 @@ create_trend <- function(data,
data %>%
check_inputs(requirements = required_variables)
## Handling NULL values passed to hrvar
if(is.null(hrvar)){
data <- totals_col(data)
hrvar <- "Total"
}
## Clean metric name
clean_nm <- gsub(pattern = "_", replacement = " ", x = metric)
clean_nm <- us_to_space(metric)
myTable <-
data %>%


@ -3,10 +3,10 @@
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Sample Standard Query dataset for Data Validation
#' @title Sample Standard Person Query dataset for Data Validation
#'
#' @description
#' A dataset generated from a Standard Query from WpA.
#' A dataset generated from a Standard Person Query from WpA.
#'
#' @format A data frame with 897 rows and 69 variables:
#' \describe{


@ -13,7 +13,7 @@
#'
#' @param x Data frame or ggplot object to be passed through.
#' @param method Character string specifying the method of export.
#' Valid inputs include "clipboard" (default), "csv", "png", and "svg".
#' Valid inputs include "clipboard" (default), "csv", "png", "svg", "jpeg", and "pdf".
#' @param path If exporting a file, enter the path and the desired file name, _excluding the file extension_.
#' For example, "Analysis/SQ Overview".
#' @param timestamp Logical vector specifying whether to include a timestamp in the file name. Defaults to TRUE.
@ -58,18 +58,10 @@ export <- function(x,
write.csv(x = x, file = newpath)
## Export option: PNG
} else if(method == "png"){
## Export option: any ggsave methods
} else if(method %in% c("png", "svg", "jpeg", "pdf")){
newpath <- paste0(newpath, ".png")
ggsave(filename = newpath, plot = x, width = width, height = height)
## Export option: SVG
} else if(method == "svg"){
newpath <- paste0(newpath, ".svg")
newpath <- paste0(newpath, ".", method)
ggsave(filename = newpath, plot = x, width = width, height = height)


@ -10,24 +10,11 @@
#' External Network Breadth as the vertical axis and External Network Size as the
#' horizontal axis.
#'
#' @param data Person Query as a dataframe including date column named "Date"
#' This function assumes the data format is MM/DD/YYYY as is standard in a WpA query output.
#' @param hrvar WpA variable for an HR variable to group networks by
#' For example, "Layer"
#' @param mingroup Numeric vector for minimum group size for aggregation
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#' @param bubble_size A numeric vector of length two to specify the size range of the bubbles
#'
#' @import dplyr
#' @import reshape2
#' @import ggplot2
#' @import ggrepel
#' @inheritParams create_bubble
#'
#' @examples
#' \dontrun{external_network_plot(sq_data)
#
#' }
#' ## Return a plot
#' sq_data %>% external_network_plot(return = "plot")
#'
#' @family Connectivity
#'
@ -38,52 +25,15 @@ external_network_plot <- function(data,
mingroup = 5,
return = "plot",
bubble_size = c(1, 8)){
plot_data <-
data %>%
rename(group = !!sym(hrvar))
plot_data <-
plot_data %>%
group_by(group) %>%
summarize(Ext_network_size = mean(External_network_size),
Ext_network_breadth = mean(Networking_outside_company),
Employee_count = n_distinct(PersonId)
) %>%
filter(Employee_count >= mingroup)
summary_table <- plot_data %>% arrange(., desc(Ext_network_size))
ext_network_plot <- ggplot(plot_data,
aes(x=Ext_network_size,
y=Ext_network_breadth)
) +
geom_point(aes(size=Employee_count),
color = rgb2hex(0, 120, 212),
alpha = 0.5) +
geom_text_repel(label=plot_data$group) +
scale_x_continuous(name = "External Network Size") +
scale_y_continuous(name = "External Network Breadth") +
scale_size(range = bubble_size) +
theme_wpa_basic() +
labs(title = paste("External network metrics by", hrvar),
subtitle = paste("Network size is number of people, breadth is number of organizations"),
size = "Size") +
labs(caption = paste("Total employees =", sum(plot_data$Employee_count), "|", extract_date_range(data, return = "text")))
if(return == "table"){
summary_table %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(ext_network_plot)
} else {
stop("Please enter a valid input for `return`.")
}
data %>%
rename(`External Network Size` = "External_network_size",
`External Network Breadth` = "Networking_outside_company") %>%
create_bubble(hrvar = hrvar,
mingroup = mingroup,
metric_x = "External Network Size",
metric_y = "External Network Breadth",
return = return,
bubble_size = bubble_size)
}


@ -9,7 +9,7 @@
#' This is used as part of data validation to check if there are extreme values
#' in the dataset.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param metric A character string specifying the metric to test.
#' @param person A logical value to specify whether to calculate person-averages.
#' Defaults to TRUE (person-averages calculated).
@ -21,12 +21,15 @@
#' @import dplyr
#'
#' @examples
#' \dontrun{
#' # The threshold values are intentionally set low to trigger messages.
#' flag_extreme(sq_data, "Email_hours", threshold = 15)
#'
#' # Return a summary table
#' flag_extreme(sq_data, "Email_hours", threshold = 15, return = "table")
#'
#' # Person-week level
#' flag_extreme(sq_data, "Email_hours", person = FALSE, threshold = 15)
#' }
#'
#'
#' @export
flag_extreme <- function(data,
@ -54,8 +57,7 @@ flag_extreme <- function(data,
## Clean names for pretty printing
metric_nm <- gsub(pattern = "_", replacement = " ", x = metric)
metric_nm <- camel_clean(metric_nm)
metric_nm <- metric %>% us_to_space() %>% camel_clean()
## Define MessageLevel
if(person == TRUE){


@ -39,7 +39,7 @@
#' title_list <- c("Workloads Summary - Plot", "Workloads Summary - Table")
#' n_title <- length(title_list)
#'
#'# Step 3: Generate Report
#' # Step 3: Generate Report
#' generate_report(title = "My First Report",
#' filename = "My First Report",
#' outputs = output_list,


@ -184,6 +184,9 @@ utils::globalVariables(
"value_rescaled",
"values",
"cleaned_data",
"zscore"
"zscore",
"StrongTieType",
"TieOrigin_PersonId",
"TieDestination_PersonId"
)
)

R/heat_colours.R Normal file

@ -0,0 +1,37 @@
#' @title
#' Generate a vector of `n` contiguous colours, as a red-yellow-green palette.
#'
#' @param n the number of colors (≥ 1) to be in the palette.
#' @param alpha an alpha-transparency level in the range of 0 to 1
#' (0 means transparent and 1 means opaque)
#' @param rev logical indicating whether the ordering of the colors should be reversed.
#'
#' @examples
#' barplot(rep(10, 50), col = heat_colours(n = 50), border = NA)
#'
#' barplot(rep(10, 50), col = heat_colours(n = 50, alpha = 0.5, rev = TRUE), border = NA)
#'
#' @export
heat_colours <- function(n, alpha = NULL, rev = FALSE) {
## Move from red to green
pre_h <- seq(from = 0, to = 0.3, length.out = n - 1)
h <- c(1, pre_h)
## Less bright
s <- rep(0.69, length(h))
## Increasingly low value (darker)
v <- seq(from = 1, to = 0.8, length.out = n)
cols <- grDevices::hsv(h = h, s = s, v = v, alpha = alpha)
if(rev){
rev(cols)
} else {
cols
}
}
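To see the hue ramp the function builds, the vectors can be computed directly. A standalone sketch of the logic above, with `n = 5` chosen for illustration (hues near 1 and 0 are red, 0.15 is yellow, 0.3 is green):

```r
n <- 5
pre_h <- seq(from = 0, to = 0.3, length.out = n - 1)  # 0.0, 0.1, 0.2, 0.3
h <- c(1, pre_h)                                      # leading 1 is also red
# Fixed saturation, decreasing value (progressively darker)
cols <- grDevices::hsv(h = h, s = 0.69, v = seq(1, 0.8, length.out = n))
cols  # five hex colour strings, red through green
```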


@ -9,7 +9,7 @@
#' employee count over time. Part of a data validation process to check
#' for unusual license growth / declines over time.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'


@ -10,7 +10,7 @@
#' by the specified HR attribute. The default behaviour is to return a
#' bar chart as typically seen in 'Analysis Scope'.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation".
#' If a vector with more than one value is provided, the HR attributes are automatically


@ -12,7 +12,7 @@
#' employees with NAs for that attribute.
#' See hrvar_count function for more detail on the specific HR attribute of interest.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param n_var number of HR variables to include in report as rows. Default is set to 10 HR variables.
#' @param return String to specify what to return
#' @param threshold The max number of unique values allowed for any attribute. Default is 100.
@ -22,6 +22,10 @@
#'
#' @family Data Validation
#'
#' @examples
#' # Return a summary table of all HR attributes
#' hrvar_count_all(sq_data, return = "table")
#'
#' @return
#' Returns an error message by default, where 'text' is passed in `return`.
#' When 'table' is passed, a summary table listing the number of distinct fields and percentage of NAs for the specified number of HR attributes will be returned.
@ -36,7 +40,7 @@ hrvar_count_all <- function(data,
){
## Character vector of HR attributes
extracted_chr <- extract_hr(data, return = "names")
extracted_chr <- extract_hr(data, return = "names", max_unique = threshold)
summary_table_n <-
data %>%


@ -12,9 +12,15 @@
#'
#' As best practice, run this function prior to any analysis to remove atypical collaboration weeks from your dataset.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param sd The standard deviation below the mean for collaboration hours that should define an outlier week. Enter a positive number. Default is 1 standard deviation.
#' @param return String to specify what to return
#' @param return String to specify what to return. Returns a message by default (`return` = "message").
#' Valid options are:
#' - When 'message' is passed, a message is printed identifying holiday weeks.
#' - When 'data' is passed, a dataset with outlier weeks flagged in a new column is returned as a dataframe.
#' - When 'data_cleaned' is passed, a dataset with outlier weeks removed is returned as a dataframe.
#' - When 'data_dirty' is passed, a dataset with only outlier weeks is returned as a dataframe.
#' - When 'plot' is passed, a plot with holiday weeks highlighted is returned as a ggplot object.
#'
#' @import dplyr
#' @import ggplot2
@ -22,13 +28,13 @@
#'
#' @family Data Validation
#'
#' @return
#' Returns a message by default (`return` = "message").
#' When 'message' is passed, a message is printed identifying holiday weeks.
#' When 'data' is passed, a dataset with outlier weeks flagged in a new column is returned as a dataframe.
#' When 'data_cleaned' is passed, a dataset with outlier weeks removed is returned as a dataframe.
#' when 'data_dirty' is passed, a dataset with only outlier weeks is returned as a dataframe.
#' when 'plot' is passed, a plot with holiday weeks highlighted is returned as a ggplot object.
#' @examples
#' # Return a message by default
#' identify_holidayweeks(sq_data)
#'
#' # Return a plot
#' identify_holidayweeks(sq_data, return = "plot")
#'
#'
#' @export
identify_holidayweeks <- function(data, sd = 1, return = "message"){


@ -11,7 +11,7 @@
#'
#' As best practice, run this function prior to any analysis to remove atypical collaboration weeks from your dataset.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param sd The standard deviation below the mean for collaboration hours that should define an outlier week. Enter a positive number. Default is 1 standard deviation.
#' @param return String to specify what to return.
#'


@ -9,7 +9,7 @@
#' This function scans a standard query output to identify employees with consistently low collaboration signals.
#' Returns the % of non-knowledge workers identified by Organization, and optionally an edited dataframe with non-knowledge workers removed, or the full dataframe with the kw/nkw flag added.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param collab_threshold The collaboration hours threshold that should be exceeded as an average for the entire analysis period for the employee to be categorized as a knowledge worker ("kw").
#' Enter a positive number. Default is set to 5 collaboration hours.
#' @param return String to specify what to return


@ -12,7 +12,7 @@
#' Time as a grouping variable can be overridden with the `group_var`
#' argument.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param group_var A string with the name of the grouping variable.
#' Defaults to `Date`.
#' @param metric Character string containing the name of the metric,


@ -9,7 +9,7 @@
#' This function scans a standard query output for groups of employees under the privacy threshold.
#' The method consists of reviewing each individual HR attribute and counting the distinct people within each group.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar A list of HR Variables to consider in the scan.
#' Defaults to all HR attributes identified.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size.


@ -32,7 +32,7 @@ identify_query <- function(data, threshold = 2){
## variables to check for in each query type
spq_var <- c("PersonId", "Collaboration_hours", "Instant_Message_hours") # Standard Person query
caq_var <- c("PersonId", "Collaboration_hrs", "Instant_message_hours") # Collaboration Assessment query
caq_var <- c("PersonId", "Collaboration_hrs", "Instant_message_hours") # Ways of Working Assessment query
smq_var <- c("MeetingId", "Date", "Attendees") # Standard Meeting Query
shc_var <- c("PersonId", "Emails_sent_00_01", "IMs_sent_23_24") # Standard Hourly Collaboration
@ -54,7 +54,7 @@ identify_query <- function(data, threshold = 2){
## Labels
qlabels <- c("Person Query",
"Collaboration Assessment Query",
"Ways of Working Assessment Query",
"Meeting Query",
"Hourly Collaboration Query")


@ -12,7 +12,7 @@
#'
#' @family Data Validation
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param end_date A string specifying the name of the date variable representing the latest date. Defaults to "Date".
#' @param beg_date A string specifying the name of the date variable representing the hire date. Defaults to "HireDate".
#' @param maxten A numeric value representing the maximum tenure.


@ -19,7 +19,7 @@
#' e.g. "/files/standard query.csv"
#' @param standardise logical. If TRUE, `import_wpa()` runs `standardise_pq()` to make a Collaboration
#' Assessment query's column names standard and consistent with a Standard Person Query. Note that this
#' will have no effect if the query being imported is not a Collaboration Assessment query. Defaults
#' will have no effect if the query being imported is not a Ways of Working Assessment query. Defaults
#' as FALSE.
#'
#' @export
@ -43,8 +43,8 @@ import_wpa <- function(x, standardise = FALSE){
}
## Standardise query if `standardise == TRUE`
if(standardise == TRUE & identify_query(return_data) == "Collaboration Assessment Query"){
message("Standardising column names for a Collaboration Assessment query to
if(standardise == TRUE & identify_query(return_data) == "Ways of Working Assessment Query"){
message("Standardising column names for a Ways of Working Assessment query to
a Person query...")
return_data <- standardise_pq(return_data)
}


@ -10,24 +10,11 @@
#' Internal Network Breadth as the vertical axis and Internal Network Size as the
#' horizontal axis.
#'
#' @param data Person Query as a dataframe including date column named "Date"
#' This function assumes the data format is MM/DD/YYYY as is standard in a WpA query output.
#' @param hrvar WpA variable for an HR variable to group networks by
#' For example, "Layer"
#' @param mingroup Numeric vector for minimum group size for aggregation
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#' @param bubble_size A numeric vector of length two to specify the size range of the bubbles
#'
#' @import dplyr
#' @import reshape2
#' @import ggplot2
#' @import ggrepel
#' @inheritParams create_bubble
#'
#' @examples
#' \dontrun{
#' internal_network_plot(sq_data)
#' }
#' ## Return a plot
#' sq_data %>% internal_network_plot(return = "plot")
#'
#' @family Connectivity
#'
@ -37,60 +24,17 @@ internal_network_plot <- function(data,
hrvar = "Organization",
mingroup = 5,
return = "plot",
bubble_size = c(1, 8))
{
plot_data <-
data %>%
rename(group = !!sym(hrvar))
bubble_size = c(1, 8)) {
plot_data <-
plot_data %>%
group_by(group) %>%
summarize(Int_network_size = mean(Internal_network_size),
Int_network_breadth = mean(Networking_outside_organization),
Employee_count = n_distinct(PersonId)
) %>%
filter(Employee_count >= mingroup)
data %>%
rename(`Internal Network Size` = "Internal_network_size",
`Internal Network Breadth` = "Networking_outside_organization") %>%
create_bubble(hrvar = hrvar,
mingroup = mingroup,
metric_x = "Internal Network Size",
metric_y = "Internal Network Breadth",
return = return,
bubble_size = bubble_size)
# Create summary table
summary_table <- plot_data %>% arrange(., desc(Int_network_size))
# Create plot
int_network_plot <-ggplot(plot_data,
aes(x=Int_network_size,
y=Int_network_breadth)
) +
geom_point(aes(size=Employee_count),
color = rgb2hex(0, 120, 212),
alpha = 0.5) +
geom_text_repel(label=plot_data$group) +
scale_x_continuous(name = "Internal Network Size") +
scale_y_continuous(name = "Internal Network Breadth") +
scale_size(range = bubble_size) +
theme_wpa_basic() +
labs(
title = paste("Internal network metrics by", hrvar),
subtitle = paste("Network size is number of people, breadth is number of organizations"),
size = "Size"
) +
labs(caption = paste("Total employees =", sum(plot_data$Employee_count), "|", extract_date_range(data, return = "text")))
if(return == "table"){
summary_table %>%
as_tibble() %>%
return()
} else if(return == "plot"){
return(int_network_plot)
} else {
stop("Please enter a valid input for `return`.")
}
}


@ -3,12 +3,12 @@
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Run a summary of Key Metrics from the Standard Query data
#' @title Run a summary of Key Metrics from the Standard Person Query data
#'
#' @description
#' Returns a heatmapped table by default, with an option to return a summary table.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector, e.g. "Organization"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param metrics A character vector containing the variable names to calculate averages of.
@ -74,34 +74,38 @@ keymetrics_scan <- function(data,
mutate(variable = factor(variable)) %>%
group_by(variable) %>%
# Heatmap by row
mutate(value_rescaled = value/mean(value)) %>%
mutate(value_rescaled = maxmin(value)) %>%
ungroup()
# Underscore to space
us_to_space <- function(x){
gsub(pattern = "_", replacement = " ", x = x)
}
plot_object <-
myTable_long %>%
filter(variable != "Employee_Count") %>%
ggplot(aes(x = group,
y = stats::reorder(variable, desc(variable)))) +
geom_tile(aes(fill = value_rescaled)) +
geom_tile(aes(fill = value_rescaled),
colour = "#FFFFFF",
size = 2) +
geom_text(aes(label=round(value, 1)), size = textsize) +
scale_fill_distiller(palette = "Blues", direction = 1) +
# Fill is contingent on max-min scaling
scale_fill_gradient2(low = rgb2hex(7, 111, 161),
mid = rgb2hex(241, 204, 158),
high = rgb2hex(216, 24, 42),
midpoint = 0.5,
breaks = c(0, 0.5, 1),
labels = c("Minimum", "", "Maximum"),
limits = c(0, 1)) +
scale_x_discrete(position = "top") +
scale_y_discrete(labels = us_to_space) +
theme_light() +
labs(title = "Key Workplace Analytics metrics",
theme_wpa_basic() +
theme(axis.line = element_line(color = "#FFFFFF")) +
labs(title = "Key metrics",
subtitle = paste("Weekly average by", camel_clean(hrvar)),
y =" ",
x =" ",
fill = " ",
caption = extract_date_range(data, return = "text")) +
theme(axis.text.x = element_text(angle = 90, hjust = 0),
plot.title = element_text(color="grey40", face="bold", size=20)) +
guides(fill=FALSE)
plot.title = element_text(color="grey40", face="bold", size=20))
if(return == "table"){

R/maxmin.R Normal file

@ -0,0 +1,23 @@
#' @title Max-Min Scaling Function
#'
#' @description This function allows you to scale vectors or an entire data frame using the max-min scaling method.
#' A numeric vector is always returned. Originally implemented in https://github.com/martinctc/surveytoolbox.
#'
#' @details This is used within `keymetrics_scan()` to enable row-wise heatmapping.
#'
#' @param x Pass a vector or the required columns of a data frame through this argument.
#' @keywords max-min
#' @examples
#' numbers <- c(15, 40, 10, 2)
#' maxmin(numbers)
#' @export
maxmin <- function(x){
if(any(is.na(x))){
warning("Vector contains missing values. Those values will return as NA.")
}
maxs <- max(x, na.rm = TRUE)
mins <- min(x, na.rm = TRUE)
as.numeric(scale(x, center = mins, scale = maxs - mins))
}
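The scaling above can be illustrated with a standalone base-R sketch (the function name `maxmin_demo` is invented here and is not part of the package):

```r
# Max-min scaling maps a numeric vector linearly onto [0, 1]:
# the minimum becomes 0, the maximum becomes 1
maxmin_demo <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

scaled <- maxmin_demo(c(15, 40, 10, 2))
scaled
```

Because every row of the `keymetrics_scan()` table is rescaled to the same [0, 1] range, the heatmap fill becomes comparable across metrics with very different units.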


@ -8,16 +8,12 @@
#' @description Return an analysis of Meeting Quality with a bubble plot, using a Standard Person
#' Query as an input.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector, e.g. "Organization"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" or "table".
#' @inheritParams create_bubble
#'
#' @import dplyr
#' @import ggplot2
#' @import reshape2
#' @family Meeting Culture
#'
#' @examples
#' meeting_quality(sq_data, return = "plot")
#'
#' @return
#' Returns a ggplot object by default, where 'plot' is passed in `return`.
@ -30,85 +26,35 @@ meeting_quality <- function(data,
mingroup = 5,
return = "plot"){
## Date range data frame
myPeriod <- extract_date_range(data)
## Prepare Table
data %>%
rename(group = !!sym(hrvar)) %>% # Rename hrvar to `group`
group_by(PersonId, group) %>%
summarize(Meeting_count = mean(Meetings),
Meeting_hours = mean(Meeting_hours),
Low_quality_meeting_hours = mean(Low_quality_meeting_hours),
perc_after_hours_m = 100*mean(After_hours_meeting_hours)/mean(Meeting_hours),
perc_low_quality = 100*mean(Low_quality_meeting_hours)/mean(Meeting_hours),
perc_Multitasking = 100*mean(Multitasking_meeting_hours)/mean(Meeting_hours),
perc_Redundant = 100*mean(Redundant_meeting_hours__organizational_)/mean(Meeting_hours),
perc_conflicting = 100*mean(Conflicting_meeting_hours)/mean(Meeting_hours)) %>%
group_by(group) %>%
summarise_at(vars(Meeting_count,
Meeting_hours,
Low_quality_meeting_hours,
perc_after_hours_m,
perc_low_quality,
perc_Multitasking,
perc_Redundant,
perc_conflicting),
~mean(.)) %>%
left_join(data %>%
rename(group = !!sym(hrvar)) %>%
group_by(group) %>%
summarise(Employee_Count = n_distinct(PersonId)),
by = "group") %>%
filter(Employee_Count >= mingroup) -> myTable
myTable_wide <- myTable %>%
reshape2::melt(id.vars = "group") %>%
reshape2::dcast(variable ~ group)
myTable_long <- reshape2::melt(myTable, id.vars=c("group"))
myTable_plot <- myTable %>% select(group, perc_low_quality, Meeting_hours)
## Bar plot
plot_object <-
myTable_plot %>%
ggplot(aes(x = perc_low_quality, y = Meeting_hours, size=2)) +
geom_point(stat = "identity",
fill = "#203864", alpha=0.1) +
geom_text(aes(label = group),
hjust = 0.2,
color = "black",
fontface = "bold",
size = 4)+
ylim(min(myTable_plot$Meeting_hours),max(myTable_plot$Meeting_hours) + 2) +
xlim(min(myTable_plot$perc_low_quality),max(myTable_plot$perc_low_quality) + 2) +
theme_wpa_basic() +
theme(axis.text=element_text(size=12),
plot.title = element_text(color="grey40", face="bold", size=18),
plot.subtitle = element_text(size=14),
legend.position = "none",
legend.justification = "right",
legend.title=element_text(size=14),
legend.text=element_text(size=14)) +
labs(title = "Meeting Quality",
subtitle = paste("Meeting time and low-quality percentage by", tolower(hrvar))) +
ylab("Average weekly meeting hours") +
xlab("Average weekly percentage of low-quality meeting hours") +
labs(caption = paste("Data from week of", myPeriod$Start, "to week of", myPeriod$End))
## Wrapper around summary table
if(return == "table"){
myTable %>%
as_tibble() %>%
return()
meeting_chr <-
c("After_hours_meeting_hours",
"Low_quality_meeting_hours",
"Conflicting_meeting_hours",
"Multitasking_meeting_hours",
"Meetings",
"Meeting_hours")
} else if(return == "plot"){
return(plot_object)
data %>%
rename(group = !!sym(hrvar)) %>% # Rename hrvar to `group`
group_by(PersonId, group) %>%
summarise_at(vars(meeting_chr), ~mean(., na.rm = TRUE)) %>%
group_by(group) %>%
summarise_at(vars(meeting_chr), ~mean(., na.rm = TRUE)) %>%
left_join(hrvar_count(data, hrvar, return = "table"), by = c("group" = hrvar)) %>%
filter(n >= mingroup)
} else {
stop("Please enter a valid input for `return`.")
data %>%
mutate(Percentage_of_Low_quality_meeting_hours = Low_quality_meeting_hours / Meeting_hours) %>%
create_bubble(hrvar = hrvar,
mingroup = mingroup,
metric_x = "Percentage_of_Low_quality_meeting_hours",
metric_y = "Meeting_hours",
return = return)
}
}


@@ -3,15 +3,15 @@
#' @description
#' Calculate the hour distribution of internal meeting types.
#' This is a wrapper around `meetingtype_dist_mt()` and
#' `meetingtype_dist_ca()`, depending on whether a Meeting Query or a Collaboration Assessment Query
#' `meetingtype_dist_ca()`, depending on whether a Meeting Query or a Ways of Working Assessment Query
#' is passed as an input.
#'
#' @param data Data frame. If a meeting query, must contain the variables `Attendee` and `DurationHours`.
#' @param hrvar Character string to specify the HR attribute to split the data by.
#' Note that this is only applicable if a Collaboration Assessment query is passed to the function. If a Meeting Query
#' Note that this is only applicable if a Ways of Working Assessment query is passed to the function. If a Meeting Query
#' is passed instead, this argument is ignored.
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.
#' Only applicable when using a Collaboration Assessment query.
#' Only applicable when using a Ways of Working Assessment query.
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
@@ -19,6 +19,10 @@
#' @import dplyr
#' @importFrom stats setNames
#'
#' @examples
#' ## Implementation using Standard Meeting Query
#' meetingtype_dist(mt_data)
#'
#' @export
meetingtype_dist <- function(data,
@@ -33,10 +37,10 @@ meetingtype_dist <- function(data,
} else if("PersonId" %in% names(data)){
message("Calculating results using a Collaboration Assessment Query...")
message("Calculating results using a Ways of Working Assessment Query...")
meetingtype_dist_ca(data, hrvar = hrvar, mingroup = mingroup, return = return)
} else {
stop("Please check query type. Must be either a Collaboration Assessment Query or a Meeting Query.")
stop("Please check query type. Must be either a Ways of Working Assessment Query or a Meeting Query.")
}
}
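
The wrapper above can be driven with the bundled sample Meeting Query `mt_data` (referenced in the roxygen example); a brief usage sketch:

```r
library(wpa)

## Dispatches to meetingtype_dist_mt(), since `mt_data` is a Meeting Query
meetingtype_dist(mt_data, return = "plot")

## Request the underlying summary table instead of a plot
meetingtype_dist(mt_data, return = "table")
```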


@@ -1,8 +1,8 @@
#' @title Meeting Type Distribution (Collaboration Assessment Query)
#' @title Meeting Type Distribution (Ways of Working Assessment Query)
#'
#' @description
#' Calculate the hour distribution of internal meeting types,
#' using a Collaboration Assessment Query with core WpA variables as an input.
#' using a Ways of Working Assessment Query with core WpA variables as an input.
#'
#' @param data Meeting Query data frame. Must contain the variables `Attendee` and `DurationHours`
#' @param hrvar Character string to specify the HR attribute to split the data by.
@@ -32,8 +32,7 @@ meetingtype_dist_ca <- function(data,
mt_dist_str_clean <-
mt_dist_str %>%
gsub(pattern = "_meeting_hours", replacement = "", x = .) %>%
gsub(pattern = "_", replacement = " ", x = .)
us_to_space()
## Add dummy "Total" column if hrvar = NULL
if(is.null(hrvar)){
@@ -53,7 +52,7 @@ meetingtype_dist_ca <- function(data,
mutate(Prop = AttendeeMeetingHours / sum(AttendeeMeetingHours),
Percent = paste(round(Prop * 100), "%")) %>%
mutate(MeetingType = gsub(pattern = "_meeting_hours", replacement = "", x = MeetingType)) %>%
mutate(MeetingType = gsub(pattern = "_", replacement = " ", x = MeetingType))
mutate(MeetingType = us_to_space(MeetingType))
## Only for creating the bottom row data
myResultsTableTotal <-
@@ -86,7 +85,7 @@ meetingtype_dist_ca <- function(data,
mutate(Prop = AttendeeMeetingHours / sum(AttendeeMeetingHours),
Percent = paste(round(Prop * 100), "%")) %>%
mutate(MeetingType = gsub(pattern = "_meeting_hours", replacement = "", x = MeetingType)) %>%
mutate(MeetingType = gsub(pattern = "_", replacement = " ", x = MeetingType))
mutate(MeetingType = us_to_space(MeetingType))
outputTable <-
myResultsTable %>%


@@ -4,7 +4,7 @@
#' This function creates a bar chart showing the percentage of meeting hours which are spent in
#' long or large meetings.
#'
#' @param data Collaboration Assessment query in the form of a data frame. Requires the following variables:
#' @param data Ways of Working Assessment query in the form of a data frame. Requires the following variables:
#' - `Bloated_meeting_hours`
#' - `Lengthy_meeting_hours`
#' - `Workshop_meeting_hours`
@@ -43,7 +43,7 @@ meetingtype_summary <- function(data,
summarise_at(vars(mt_dist_str), ~sum(., na.rm = TRUE)) %>%
gather(MeetingType, AttendeeMeetingHours, -!!sym(hrvar)) %>%
mutate(MeetingType = gsub(pattern = "_meeting_hours", replacement = "", x = MeetingType)) %>%
mutate(MeetingType = gsub(pattern = "_", replacement = " ", x = MeetingType)) %>%
mutate(MeetingType = us_to_space(MeetingType)) %>%
group_by(!!sym(hrvar)) %>%
mutate(AttendeeMeetingHours = AttendeeMeetingHours / sum(AttendeeMeetingHours)) %>%
spread(MeetingType, AttendeeMeetingHours) %>%


@@ -10,7 +10,7 @@
#' Returns a stacked bar plot of different buckets of coattendance.
#' Additional options available to return a table with distribution elements.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector, defaults to "Organization" but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size, defaults to 5.
#' @param return Character vector specifying what to return, defaults to "plot".


@@ -9,7 +9,7 @@
#' Generate the Manager-Relationship 2x2 matrix, returning a ggplot object by default.
#' Additional options available to return a "wide" or "long" summary table.
#'
#' @param data Standard Query data to pass through. Accepts a data frame.
#' @param data Standard Person Query data to pass through. Accepts a data frame.
#' @param hrvar HR Variable by which to split metrics. Accepts a character vector,
#' e.g. "Organization". Defaults to NULL.
#' @param return A character vector specifying whether to return a matrix plot or a table.

R/network_describe.R (new file, 125 lines)

@@ -0,0 +1,125 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Uncover HR attributes which best represent a population for a Person to Person query
#'
#' @author Tannaz Sattari Tabrizi <Tannaz.Sattari@microsoft.com>
#'
#' @description
#' Returns a data frame that gives a percentage of the group combinations that best represent
#' the population provided. Uses a person to person query.
#'
#' @param data Data frame for a person to person query.
#' @param hrvar Character vector of length 3 containing the HR attributes to be used.
#'
#' @import dplyr
#' @import tidyr
#'
#' @export
network_describe <- function(data, hrvar = c("Organization", "LevelDesignation", "FunctionType")){
if(length(hrvar) != 3){
stop("Please provide a character vector of length 3 for `hrvar`")
}
## De-duplicated data containing only TieOrigins
filtered_Data <- unique(select(data, starts_with("TieOrigin_")))
## Select features
features <- select(filtered_Data, paste0("TieOrigin_", hrvar))
## Feature set: 1
max_percentages_1f <-
features %>%
colnames() %>%
purrr::map(function(c){
agg <-
features %>%
group_by_at(.vars = vars(c)) %>%
summarise(count = n(), .groups = "drop") %>%
mutate(percentage = count / sum(count, na.rm = TRUE))
agg %>%
arrange(desc(percentage)) %>%
slice(1) %>% # Extract first row
mutate(feature_1 = c,
feature_1_value = !!sym(c)) %>%
select(feature_1, feature_1_value, Percentage = "percentage")
}) %>%
bind_rows()
## Feature set: 2
max_percentages_2f <-
list(c1 = colnames(features),
c2 = colnames(features)) %>%
expand.grid(stringsAsFactors = FALSE) %>%
filter(c1 != c2) %>%
purrr::pmap(function(c1, c2){
agg <-
features %>%
group_by_at(.vars=vars(c1, c2)) %>%
summarise(count = n(), .groups = "drop") %>%
mutate(percentage = count / sum(count, na.rm = TRUE))
agg %>%
arrange(desc(percentage)) %>%
slice(1) %>% # Extract first row
mutate(feature_1 = c1,
feature_1_value = !!sym(as.character(c1)),
feature_2 = c2,
feature_2_value = !!sym(as.character(c2))) %>%
select(feature_1,
feature_1_value,
feature_2,
feature_2_value,
Percentage = "percentage")
}) %>%
bind_rows()
## Feature set: 3
max_percentages_3f <-
list(c1 = colnames(features),
c2 = colnames(features),
c3 = colnames(features)) %>%
expand.grid(stringsAsFactors = FALSE) %>%
filter(c1 != c2,
c2 != c3,
c3 != c1) %>%
purrr::pmap(function(c1, c2, c3){
agg <-
features %>%
group_by_at(.vars=vars(c1, c2, c3)) %>%
summarise(count = n(), .groups = "drop") %>%
mutate(percentage = count / sum(count, na.rm = TRUE))
agg %>%
arrange(desc(percentage)) %>%
slice(1) %>% # Extract first row
mutate(feature_1 = c1,
feature_1_value = !!sym(c1),
feature_2 = c2,
feature_2_value = !!sym(c2),
feature_3 = c3,
feature_3_value = !!sym(c3)) %>%
select(feature_1,
feature_1_value,
feature_2,
feature_2_value,
feature_3,
feature_3_value,
Percentage = "percentage")
}) %>%
bind_rows()
list(max_percentages_1f,
max_percentages_2f,
max_percentages_3f) %>%
bind_rows() %>%
select(starts_with("feature"), Percentage)
}
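
A minimal sketch of how the function above might be called; `p2p_data` is a hypothetical person-to-person query containing `TieOrigin_` columns for the three default HR attributes:

```r
## `p2p_data` is a placeholder name, not a bundled dataset
p2p_data %>%
  network_describe(hrvar = c("Organization", "LevelDesignation", "FunctionType")) %>%
  dplyr::arrange(dplyr::desc(Percentage)) %>%  # most representative combinations first
  head()
```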


@@ -3,11 +3,10 @@
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Create a network plot with the G2G query
#' @title Create a network plot with the group-to-group query
#'
#' @description
#' Pass a data frame containing a G2G query and return a network
#' plot.
#' Pass a data frame containing a group-to-group query and return a network plot.
#' Automatically handles "Collaborators_within_group" and "Other_collaborators" within query data.
#'
#' @param data Data frame containing a G2G query.
@@ -16,31 +15,30 @@
#' @param metric String containing the variable name for metric.
#' @param exc_threshold Exclusion threshold to apply.
#' @param subtitle String to override default plot subtitle.
#' @param ... Additional arguments to pass to `GGally::ggnet2()`
#' @param return Character vector specifying what to return, defaults to "plot".
#' Valid inputs are "plot" and "table".
#'
#' Valid inputs include:
#' - "plot": return a network plot.
#' - "table": return a raw data table used to plot the network.
#' - "network": return an **igraph** object
#'
#' @import ggplot2
#' @import dplyr
#'
#' @examples
#' \dontrun{
#' ## Return a network plot
#' g2g_data %>%
#' g2g_network(time_investor = "TimeInvestors_Organization",
#' network_g2g(time_investor = "TimeInvestors_Organization",
#' collaborator = "Collaborators_Organization",
#' metric = "Collaboration_hours")
#' }
#'
#' @export
g2g_network <- function(data,
network_g2g <- function(data,
time_investor,
collaborator,
metric,
exc_threshold = 0.1,
subtitle = "Collaboration Across Organizations",
return = "plot",
...){
return = "plot"){
plot_data <-
data %>%
@@ -62,33 +60,40 @@ g2g_network <- function(data,
plot_data
} else if(return == "plot"){
} else if(return %in% c("plot", "network")){
## Network object
mynet_em <-
plot_data %>%
filter(metric_prop > exc_threshold) %>%
mutate_at(vars(TimeInvestorOrg, CollaboratorOrg), ~sub(pattern = " ", replacement = "\n", x = .)) %>%
mutate(metric_prop = metric_prop * 10) %>%
network::network(matrix.type = "edgelist",
ignore.eval = FALSE,
names.eval = "weights")
igraph::graph_from_data_frame(directed = FALSE)
mynet_em %>%
GGally::ggnet2(size = 12,
color = "lightblue",
label = TRUE,
label.size = 4,
label.color = "black",
edge.size = "weights",
edge.alpha = .5,
...) +
ggtitle("Group to Group Collaboration",
subtitle = subtitle) +
xlab(label = "") +
ylab(label = "") +
theme_wpa_basic() +
labs(caption = paste("Displays only collaboration above ", exc_threshold * 100, "% of node's total collaboration", sep = "")) +
theme(axis.line = element_blank())
if(return == "network"){
mynet_em # Return igraph object
} else {
## Plot object
mynet_em %>%
ggraph::ggraph(layout = "fr") +
ggraph::geom_edge_link(aes(edge_width = metric_prop * 1), edge_alpha = 0.5, edge_colour = "grey") +
ggraph::geom_node_point(size = 20, colour = "lightblue") +
ggraph::geom_node_text(aes(label = name), size = 3, repel = FALSE) +
ggplot2::theme(panel.background = ggplot2::element_rect(fill = 'white'), legend.position = "none") +
theme_wpa_basic() +
labs(title = "Group to Group Collaboration",
subtitle = subtitle,
x = "",
y = "",
caption = paste("Displays only collaboration above ", exc_threshold * 100, "% of node's total collaboration", sep = "")) +
theme(axis.line = element_blank(),
axis.text = element_blank(),
legend.position = "none")
}
} else {
@@ -96,3 +101,7 @@ g2g_network <- function(data,
}
}
#' @rdname network_g2g
#' @export
g2g_network <- network_g2g
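
With the new `"network"` return option, the retained graph can be inspected directly with **igraph**; a sketch using the bundled `g2g_data` sample:

```r
g <- network_g2g(g2g_data,
                 time_investor = "TimeInvestors_Organization",
                 collaborator = "Collaborators_Organization",
                 metric = "Collaboration_hours",
                 return = "network")

igraph::vcount(g)     # number of groups kept after applying `exc_threshold`
igraph::edge_attr(g)  # edge attributes carried over from the plot data
```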

R/network_leiden.R (new file, 225 lines)

@@ -0,0 +1,225 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Implement the Leiden community detection on a Person to Person network query
#'
#' @description
#' Take a P2P network query and implement the Leiden community detection method. To run
#' this function, you will need the prerequisites of the **leiden** package installed,
#' which include Python and **reticulate**.
#'
#' @param data Data frame containing a Person to Person query.
#' @param hrvar String containing the HR attribute to be matched in the dataset.
#' @param bg_fill String to specify background fill colour.
#' @param font_col String to specify font and link colour.
#' @param node_alpha A numeric value between 0 and 1 to specify the transparency of the nodes.
#' @param path File path for saving the PDF output. Defaults to "network_p2p_leiden".
#' Since the network outputs are computationally intensive, the default behaviour is to save time by
#' saving the plot output directly as a PDF in the specified path. To override this behaviour and return
#' a plot object instead, you can pass `NULL` to `path`. What is passed to `path` makes no difference
#' if returning anything other than "plot-leiden" or "plot-hrvar".
#'
#' @param algorithm String to specify the node placement algorithm to be used. Defaults to "mds" to perform
#' a multidimensional scaling of nodes using a shortest path, which is also a deterministic method.
#' See <https://rdrr.io/cran/ggraph/man/layout_tbl_graph_igraph.html> for a full list of options.
#'
#' @param res Resolution parameter to be passed to `leiden::leiden()`. Defaults to 0.5.
#' @param desc_hrvar Character vector of length 3 containing the HR attributes to use when returning the
#' "describe" output. See `network_describe()`.
#' @param return String specifying what output to return. Valid return options include:
#' - 'plot-leiden': return a network plot coloured by Leiden communities.
#' - 'plot-hrvar': return a network plot coloured by HR attribute.
#' - 'plot-sankey': return a sankey plot combining communities and HR attribute.
#' - 'table': return a vertex summary table with counts in communities and HR attribute.
#' - 'data': return a vertex data file that matches vertices with communities and HR attributes.
#' - 'describe': return a list of data frames which describe each of the identified communities.
#' - 'network': return igraph object.
#'
#' @import dplyr
#'
#' @export
network_leiden <- function(data,
hrvar,
bg_fill = "#000000",
font_col = "#FFFFFF",
algorithm = "mds",
path = "network_p2p_leiden",
node_alpha = 0.8,
res = 0.5,
desc_hrvar = c("Organization", "LevelDesignation", "FunctionType"),
return){
## Set variables
TO_hrvar <- paste0("TieOrigin_", hrvar)
TD_hrvar <- paste0("TieDestination_", hrvar)
## Set edges df
edges <-
data %>%
select(from = "TieOrigin_PersonId",
to = "TieDestination_PersonId",
weight = "StrongTieScore")
## Vertices data frame to provide meta-data
vert_ft <-
rbind(
# TieOrigin
edges %>%
select(from) %>% # Single column
unique() %>% # Remove duplications
left_join(select(data, TieOrigin_PersonId, TO_hrvar),
by = c("from" = "TieOrigin_PersonId")) %>%
select(node = "from", !!sym(hrvar) := TO_hrvar),
# TieDestination
edges %>%
select(to) %>% # Single column
unique() %>% # Remove duplications
left_join(select(data, TieDestination_PersonId, TD_hrvar),
by = c("to" = "TieDestination_PersonId")) %>%
select(node = "to", !!sym(hrvar) := TD_hrvar)
)
## Create igraph object
g_raw <-
igraph::graph_from_data_frame(edges,
directed = TRUE, # Keep directed for partitioning; direction is not shown in plots
vertices = unique(vert_ft)) # remove duplicates
## Return a numeric vector of partitions / clusters / modules
## Set a low resolution parameter to have fewer groups
ld <- leiden::leiden(g_raw, resolution_parameter = res) # create partitions
## Add cluster
g <-
g_raw %>%
# Add leiden partitions to graph object
igraph::set_vertex_attr("cluster", value = as.character(ld)) %>%
igraph::simplify()
## Create vertex table
vertex_tb <-
g %>%
igraph::get.vertex.attribute() %>%
as_tibble()
g_layout <-
g %>%
ggraph::ggraph(layout = "igraph", algorithm = algorithm)
## Return
if(return == "plot-leiden"){
plot_output <-
g_layout +
ggraph::geom_edge_link(colour = "lightgrey", edge_width = 0.01, alpha = 0.15) +
ggraph::geom_node_point(aes(colour = cluster),
alpha = node_alpha,
pch = 16) +
theme_void() +
theme(legend.position = "bottom",
legend.background = element_rect(fill = bg_fill),
plot.background = element_rect(fill = bg_fill),
text = element_text(colour = font_col),
axis.line = element_blank()) +
labs(title = "Person to person collaboration with Community Detection",
subtitle = "Based on Leiden algorithm and Strong Tie Score",
y = "",
x = "")
# Default PDF output unless NULL supplied to path
if(is.null(path)){
plot_output
} else {
ggsave(paste0(path, tstamp(), ".pdf"),
plot = plot_output,
width = 16,
height = 9)
}
} else if(return == "plot-hrvar"){
plot_output <-
g_layout +
ggraph::geom_edge_link(colour = "lightgrey", edge_width = 0.01, alpha = 0.15) +
ggraph::geom_node_point(aes(colour = !!sym(hrvar)),
alpha = node_alpha,
pch = 16) +
theme_void() +
theme(legend.position = "bottom",
legend.background = element_rect(fill = bg_fill),
plot.background = element_rect(fill = bg_fill),
text = element_text(colour = font_col),
axis.line = element_blank()) +
labs(title = "Person to person collaboration",
subtitle = paste0("Showing ", hrvar),
y = "",
x = "")
# Default PDF output unless NULL supplied to path
if(is.null(path)){
plot_output
} else {
ggsave(paste0(path, tstamp(), ".pdf"),
plot = plot_output,
width = 16,
height = 9)
}
} else if(return == "table"){
vertex_tb %>%
count(!!sym(hrvar), cluster)
} else if(return == "data"){
vertex_tb
} else if(return == "network"){
g
} else if(return == "plot-sankey"){
create_sankey(data = vertex_tb %>% count(!!sym(hrvar), cluster),
var1 = hrvar,
var2 = "cluster",
count = "n")
} else if(return == "describe"){
describe_tb <-
vertex_tb %>%
left_join(select(data, starts_with("TieOrigin_")),
by = c("name" = "TieOrigin_PersonId"))
desc_str <-
describe_tb %>%
pull(cluster) %>%
unique()
desc_str %>%
purrr::map(function(x){
describe_tb %>%
filter(cluster == x) %>%
network_describe(hrvar = desc_hrvar)
}) %>%
setNames(nm = desc_str)
} else {
stop("Please enter a valid input for `return`.")
}
}
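
Assuming the **leiden** prerequisites (Python and **reticulate**) are in place, the function above might be used as follows; `p2p_data` stands in for a person-to-person query:

```r
## Count community membership against the HR attribute
p2p_data %>%
  network_leiden(hrvar = "Organization", return = "table")

## Return the plot object directly rather than saving a PDF
p2p_data %>%
  network_leiden(hrvar = "Organization",
                 path = NULL,
                 return = "plot-leiden")
```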

R/network_louvain.R (new file, 222 lines)

@@ -0,0 +1,222 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Implement the Louvain community detection on a Person to Person network query
#'
#' @description
#' Take a P2P network query and implement the Louvain community detection method. The
#' **igraph** implementation of the Louvain method is used.
#'
#' @param data Data frame containing a Person to Person query.
#' @param hrvar String containing the HR attribute to be matched in the dataset.
#' @param bg_fill String to specify background fill colour.
#' @param font_col String to specify font and link colour.
#' @param node_alpha A numeric value between 0 and 1 to specify the transparency of the nodes.
#' @param algorithm String to specify the node placement algorithm to be used. Defaults to "mds" to perform
#' a multidimensional scaling of nodes using a shortest path, which is also a deterministic method.
#' See <https://rdrr.io/cran/ggraph/man/layout_tbl_graph_igraph.html> for a full list of options.
#' @param path File path for saving the PDF output. Defaults to "network_p2p_louvain".
#' Since the network outputs are computationally intensive, the default behaviour is to save time by
#' saving the plot output directly as a PDF in the specified path. To override this behaviour and return
#' a plot object instead, you can pass `NULL` to `path`. What is passed to `path` makes no difference
#' if returning anything other than "plot-louvain" or "plot-hrvar".
#' @param desc_hrvar Character vector of length 3 containing the HR attributes to use when returning the
#' "describe" output. See `network_describe()`.
#'
#' @param return String specifying what output to return. Valid return options include:
#' - 'plot-louvain': return a network plot coloured by Louvain communities.
#' - 'plot-hrvar': return a network plot coloured by HR attribute.
#' - 'plot-sankey': return a sankey plot combining communities and HR attribute.
#' - 'table': return a vertex summary table with counts in communities and HR attribute.
#' - 'data': return a vertex data file that matches vertices with communities and HR attributes.
#' - 'describe': returns a list of data frames which describe each of the identified communities.
#' - 'network': return igraph object.
#'
#' @import ggraph
#' @import dplyr
#'
#' @export
network_louvain <- function(data,
hrvar,
bg_fill = "#000000",
font_col = "#FFFFFF",
node_alpha = 0.8,
algorithm = "mds",
path = "network_p2p_louvain",
desc_hrvar = c("Organization", "LevelDesignation", "FunctionType"),
return){
## Set variables
TO_hrvar <- paste0("TieOrigin_", hrvar)
TD_hrvar <- paste0("TieDestination_", hrvar)
## Set edges df
edges <-
data %>%
select(from = "TieOrigin_PersonId",
to = "TieDestination_PersonId",
weight = "StrongTieScore")
## Vertices data frame to provide meta-data
vert_ft <-
rbind(
# TieOrigin
edges %>%
select(from) %>% # Single column
unique() %>% # Remove duplications
left_join(select(data, TieOrigin_PersonId, TO_hrvar),
by = c("from" = "TieOrigin_PersonId")) %>%
select(node = "from", !!sym(hrvar) := TO_hrvar),
# TieDestination
edges %>%
select(to) %>% # Single column
unique() %>% # Remove duplications
left_join(select(data, TieDestination_PersonId, TD_hrvar),
by = c("to" = "TieDestination_PersonId")) %>%
select(node = "to", !!sym(hrvar) := TD_hrvar)
)
## Create igraph object
g_raw <-
igraph::graph_from_data_frame(edges,
directed = FALSE, # Set to undirected for clustering
vertices = unique(vert_ft)) # remove duplicates
## Return a numeric vector of partitions / clusters / modules
## Set a low resolution parameter to have fewer groups
lc <- igraph::cluster_louvain(g_raw)
## Add cluster
g <-
g_raw %>%
# Add louvain partitions to graph object
igraph::set_vertex_attr("cluster", value = as.character(igraph::membership(lc))) %>% # Return membership - diff from Leiden
igraph::simplify()
## Create vertex table
vertex_tb <-
g %>%
igraph::get.vertex.attribute() %>%
as_tibble()
g_layout <-
g %>%
ggraph::ggraph(layout = "igraph", algorithm = algorithm)
## Return
if(return == "plot-louvain"){
plot_output <-
g_layout +
ggraph::geom_edge_link(colour = "lightgrey", edge_width = 0.01, alpha = 0.15) +
ggraph::geom_node_point(aes(colour = cluster),
alpha = node_alpha,
pch = 16) +
theme_void() +
theme(legend.position = "bottom",
legend.background = element_rect(fill = bg_fill),
plot.background = element_rect(fill = bg_fill),
text = element_text(colour = font_col),
axis.line = element_blank()) +
labs(title = "Person to person collaboration with Community Detection",
subtitle = "Based on Louvain algorithm and Strong Tie Score",
y = "",
x = "")
# Default PDF output unless NULL supplied to path
if(is.null(path)){
plot_output
} else {
ggsave(paste0(path, tstamp(), ".pdf"),
plot = plot_output,
width = 16,
height = 9)
}
} else if(return == "plot-hrvar"){
plot_output <-
g_layout +
ggraph::geom_edge_link(colour = "lightgrey", edge_width = 0.01, alpha = 0.15) +
ggraph::geom_node_point(aes(colour = !!sym(hrvar)),
alpha = node_alpha,
pch = 16) +
theme_void() +
theme(legend.position = "bottom",
legend.background = element_rect(fill = bg_fill),
plot.background = element_rect(fill = bg_fill),
text = element_text(colour = font_col),
axis.line = element_blank()) +
labs(title = "Person to person collaboration",
subtitle = paste0("Showing ", hrvar),
y = "",
x = "")
# Default PDF output unless NULL supplied to path
if(is.null(path)){
plot_output
} else {
ggsave(paste0(path, tstamp(), ".pdf"),
plot = plot_output,
width = 16,
height = 9)
}
} else if(return == "table"){
vertex_tb %>%
count(!!sym(hrvar), cluster)
} else if(return == "data"){
vertex_tb
} else if(return == "network"){
g
} else if(return == "plot-sankey"){
create_sankey(data = vertex_tb %>% count(!!sym(hrvar), cluster),
var1 = hrvar,
var2 = "cluster",
count = "n")
} else if(return == "describe"){
describe_tb <-
vertex_tb %>%
left_join(select(data, starts_with("TieOrigin_")),
by = c("name" = "TieOrigin_PersonId"))
desc_str <-
describe_tb %>%
pull(cluster) %>%
unique()
desc_str %>%
purrr::map(function(x){
describe_tb %>%
filter(cluster == x) %>%
network_describe(hrvar = desc_hrvar)
}) %>%
setNames(nm = desc_str)
} else {
stop("Please enter a valid input for `return`.")
}
}
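
The Louvain variant relies only on **igraph**, so it runs without a Python setup; again with `p2p_data` as a hypothetical person-to-person query:

```r
p2p_data %>%
  network_louvain(hrvar = "Organization",
                  path = NULL,            # return the plot object instead of a PDF
                  return = "plot-louvain")

## Describe each detected community by its best-representing attribute combinations
p2p_data %>%
  network_louvain(hrvar = "Organization", return = "describe")
```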

R/network_p2p.R (new file, 160 lines)

@@ -0,0 +1,160 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Create a network plot with the person-to-person query
#'
#' @description
#' Pass a data frame containing a person-to-person query and save a network
#' plot as a PDF file.
#'
#' @param data Data frame containing a person-to-person query.
#' @param hrvar String containing the label for the HR attribute.
#' @param return Character vector specifying what to return, defaults to "pdf".
#' Valid inputs are:
#' - "pdf": saves the network plot as a PDF in the specified path. See `path`. This is the recommended
#' output format as large networks can be slow in other formats.
#' - "plot": returns a ggplot object of the network plot. It is not recommended that you run this without
#' assigning the output to an object as plotting to the R console can be slow for large networks.
#' - "table": returns the edgelist data frame used in the network.
#' - "network": returns the igraph object used to create the network plot.
#' @param path File path for saving the PDF output. Defaults to "network_p2p".
#' @param bg_fill String to specify background fill colour.
#' @param font_col String to specify font and link colour.
#' @param legend_pos String to specify position of legend. Defaults to "bottom". See `ggplot2::theme()`.
#' @param palette Function for generating a colour palette with a single argument `n`. Uses "rainbow" by default.
#' @param node_alpha A numeric value between 0 and 1 to specify the transparency of the nodes.
#' @param algorithm String to specify the node placement algorithm to be used. Defaults to "fr" for the force-directed
#' algorithm of Fruchterman and Reingold. See <https://rdrr.io/cran/ggraph/man/layout_tbl_graph_igraph.html> for a
#' full list of options.
#'
#' @examples
#' ## Simulate simple P2P network
#' sim_net <-
#' data.frame(TieOrigin_PersonId = sample(seq(1, 100), size = 100, replace = TRUE),
#' TieDestination_PersonId = sample(seq(1, 100), size = 100, replace = TRUE)) %>%
#' dplyr::mutate(TieOrigin_Organization = ifelse(TieOrigin_PersonId >= 50, "A", "B"),
#' TieDestination_Organization = ifelse(TieDestination_PersonId >= 50, "A", "B"),
#' StrongTieScore = rep(1, 100))
#'
#' ## Run plot
#' ## ONLY return 'plot' instead of 'pdf' when data size is small
#' network_p2p(data = sim_net,
#' hrvar = "Organization",
#' return = "plot")
#'
#' @import ggplot2
#' @import dplyr
#' @importFrom grDevices rainbow
#'
#' @export
network_p2p <- function(data,
hrvar,
return = "pdf",
path = "network_p2p",
bg_fill = "#000000",
font_col = "#FFFFFF",
legend_pos = "bottom",
palette = "rainbow",
node_alpha = 0.7,
algorithm = "fr"){
## No filtering
tieorigin_var <- paste0("TieOrigin_", hrvar)
tiedestin_var <- paste0("TieDestination_", hrvar)
## Set edges df
edges <-
data %>%
select(from = "TieOrigin_PersonId",
to = "TieDestination_PersonId",
weight = "StrongTieScore") %>%
select(-weight) # Drop weights; edges carry no extra information in this plot
## Vertices data frame to provide meta-data
vert_ft <-
rbind(
# TieOrigin
edges %>%
left_join(select(data, TieOrigin_PersonId, tieorigin_var),
by = c("from" = "TieOrigin_PersonId")) %>%
select(node = "from", !!sym(hrvar) := tieorigin_var),
# TieDestination
edges %>%
left_join(select(data, TieDestination_PersonId, tiedestin_var),
by = c("to" = "TieDestination_PersonId")) %>%
select(node = "to", !!sym(hrvar) := tiedestin_var)
)
## Create igraph object
g <-
igraph::graph_from_data_frame(edges,
directed = FALSE, # Underlying ties are directed, but treated as undirected for visualisation
vertices = unique(vert_ft)) %>% # remove duplicates
igraph::simplify()
## Palette
## Create tibble
pal <-
tibble(!!sym(hrvar) := g %>%
igraph::get.vertex.attribute(hrvar) %>%
unique())
## Apply palette function
col_pal <- do.call(what = palette, args = list(nrow(pal)))
## named character vector
pal <-
pal %>%
mutate(Colours = col_pal) %>%
tibble::deframe()
if(return == "table"){
edges
} else if(return == "network"){
g
} else if(return %in% c("plot", "pdf")){
outputPlot <-
g %>%
ggraph::ggraph(layout = "igraph", algorithm = algorithm) +
ggraph::geom_edge_link(colour = "lightgrey", edge_width = 0.01, alpha = 0.15) +
ggraph::geom_node_point(aes(colour = !!sym(hrvar)), alpha = node_alpha) +
scale_colour_discrete(type = pal) +
theme_void() +
theme(legend.position = legend_pos,
legend.background = element_rect(fill = bg_fill),
plot.background = element_rect(fill = bg_fill),
text = element_text(colour = font_col),
axis.line = element_blank()) +
labs(y = "",
x = "")
## Inner conditional
if(return == "pdf"){
fn <- paste0(path, "_", tstamp(), ".pdf")
ggsave(filename = fn,
plot = outputPlot,
width = 12,
height = 9)
} else if(return == "plot"){
outputPlot
}
} else {
stop("Please enter a valid input for `return`.")
}
}
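
The simulated query from the roxygen example can also exercise the `"network"` return, feeding further **igraph** analysis (a sketch under the same simulation assumptions):

```r
set.seed(42)  # make the simulated edge list reproducible
sim_net <-
  data.frame(TieOrigin_PersonId = sample(seq(1, 100), size = 100, replace = TRUE),
             TieDestination_PersonId = sample(seq(1, 100), size = 100, replace = TRUE)) %>%
  dplyr::mutate(TieOrigin_Organization = ifelse(TieOrigin_PersonId >= 50, "A", "B"),
                TieDestination_Organization = ifelse(TieDestination_PersonId >= 50, "A", "B"),
                StrongTieScore = rep(1, 100))

g <- network_p2p(sim_net, hrvar = "Organization", return = "network")
igraph::vcount(g)  # unique people appearing in the simulated network
```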

R/pairwise_count.R (new file, 57 lines)

@@ -0,0 +1,57 @@
#' @title Perform a pairwise count of words by id
#'
#' @description This is a **data.table** implementation that mimics the output of
#' `widyr::pairwise_count()` to reduce package dependency. This is used internally
#' within `tm_cooc()`.
#'
#' @param data Data frame output from `tm_clean()`.
#' @param id String to represent the id variable. Defaults to "line".
#' @param word String to represent the word variable. Defaults to "word".
#'
#' @importFrom data.table ":=" "%like%" "%between%" rbindlist as.data.table
#'
#' @examples
#' td <- data.frame(line = c(1, 1, 2, 2),
#' word = c("work", "meeting", "catch", "up"))
#'
#' pairwise_count(td, id = "line", word = "word")
#'
#' @export
pairwise_count <- function(data,
id = "line",
word = "word"){
# Make sure data.table knows we know we're using it
.datatable.aware = TRUE
data <-
data %>%
dplyr::rename(word := !!sym(word),
id := !!sym(id))
DT <- data.table::as.data.table(data)
# convert to character
DT[, word := as.character(word)]
# subset those with >1 per id
DT2 <- DT[, N := .N, by = id][N>1]
# create all combinations of 2
# return as a data.table with these as columns `V1` and `V2`
# then count the numbers in each id
out_data <-
DT2[, rbindlist(utils::combn(word,2,
FUN = function(x) as.data.table(as.list(x)),
simplify = FALSE)), by = id] %>%
.[, .N, by = list(V1,V2)]
# format and sort
out_data %>%
dplyr::as_tibble() %>%
dplyr::rename(item1 = "V1",
item2 = "V2",
n = "N") %>%
dplyr::arrange(desc(n))
}
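The data.table pipeline at the core of `pairwise_count()` — expand each id's words into all 2-combinations with `utils::combn()`, stack them with `rbindlist()`, then count — can be exercised in isolation on the toy input from the roxygen example:

```r
library(data.table)

DT <- data.table(id = c(1, 1, 2, 2),
                 word = c("work", "meeting", "catch", "up"))

# keep only ids contributing more than one word
DT2 <- DT[, N := .N, by = id][N > 1]

# all unordered word pairs within each id, stacked as columns V1/V2
pairs <- DT2[, rbindlist(utils::combn(word, 2,
                 FUN = function(x) as.data.table(as.list(x)),
                 simplify = FALSE)), by = id]

# count each pair across ids
counts <- pairs[, .N, by = .(V1, V2)]
counts
# → two rows: (work, meeting) and (catch, up), each with N = 1
```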


@@ -21,7 +21,7 @@
#' - `identify_nkw()`
#' - `identify_inactiveweeks`
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param metric Character string containing the name of the metric,
#' e.g. "Collaboration_hours"
#'


@@ -3,10 +3,10 @@
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Sample Standard Query dataset
#' @title Sample Standard Person Query dataset
#'
#' @description
#' A dataset generated from a Standard Query from WpA.
#' A dataset generated from a Standard Person Query from WpA.
#'
#' @format A data frame with 13442 rows and 66 variables:
#' \describe{


@@ -7,7 +7,7 @@
#'
#' @description
#' This function standardises the variable names to a Standard Person
#' Query, where the standard use case is to pass a Collaboration Assessment
#' Query, where the standard use case is to pass a Ways of Working Assessment
#' Query to the function.
#'
#' @details
@@ -15,13 +15,13 @@
#' - `Collaboration_hrs` -> `Collaboration_hours`
#' - `Instant_message_hours` -> `Instant_Message_hours`
#'
#' @param data A Collaboration Assessment query to pass through as a data frame.
#' @param data A Ways of Working Assessment query to pass through as a data frame.
#'
#' @export
standardise_pq <- function(data){
if(identify_query(data) != "Collaboration Assessment Query"){
stop("Currently only Collaboration Assessment Query to Standard Person Query
if(identify_query(data) != "Ways of Working Assessment Query"){
stop("Currently only Ways of Working Assessment Query to Standard Person Query
conversions are supported.")
}


@@ -83,18 +83,18 @@ theme_wpa_basic <- function(font_size = 12){
text_small_dark <- element_text(size = font_size - 2, colour = text_colour, face = "plain")
text_small_light <- element_text(size = font_size - 2, colour = "#FFFFFF", face = "plain")
text_normal <- element_text(size = font_size + 2, colour = text_colour, face = "plain")
text_italic <- element_text(size = font_size + 2, colour = text_colour, face = "italic")
text_normal <- element_text(size = font_size + 0, colour = text_colour, face = "plain")
text_italic <- element_text(size = font_size + 0, colour = text_colour, face = "italic")
text_bold <- element_text(size = font_size + 2, colour = text_colour, face = "bold")
text_title <- element_text(size = font_size + 8, colour = text_colour, face = "bold")
text_bold <- element_text(size = font_size + 0, colour = text_colour, face = "bold")
text_title <- element_text(size = font_size + 2, colour = text_colour, face = "bold")
theme_minimal() +
theme(plot.background = element_blank(),
# plot.background = element_rect(fill = bg_colour),
text = text_normal,
plot.title = text_title,
plot.subtitle = text_italic,
plot.subtitle = text_normal,
axis.title = text_normal,
axis.text = text_small_dark,
@@ -106,7 +106,7 @@ theme_wpa_basic <- function(font_size = 12){
axis.ticks = element_blank(),
legend.position = "bottom",
legend.title = element_blank(),
# legend.title = element_blank(),
panel.grid = element_blank(),
strip.background = element_rect(fill = bg_colour2, colour = bg_colour2),


@@ -26,15 +26,16 @@
#' @import ggplot2
#' @import ggraph
#' @importFrom igraph graph_from_data_frame
#' @importFrom widyr pairwise_count
#' @importFrom tidytext unnest_tokens
#'
#' @family Text-mining
#'
#' @examples
#' \dontrun{
#' tm_cooc(mt_data,lmult = 0.01)
#' }
#' # Demo using a subset of `mt_data`
#' mt_data %>%
#' dplyr::slice(1:20) %>%
#' tm_cooc(lmult = 0.01)
#'
#' @export
tm_cooc <- function(data,
stopwords = NULL,
@@ -50,10 +51,7 @@ tm_cooc <- function(data,
# Calculate frequency of pairs
title_word_pairs <-
text_df %>%
widyr::pairwise_count(word,
line,
sort = TRUE,
upper = FALSE)
pairwise_count(id = "line", word = "word")
# Graph networks
set.seed(seed)


@@ -11,14 +11,9 @@
#' row. See example below on usage.
#'
#' @examples
#' \dontrun{
#' sq_data %>%
#' totals_bind(target_col = "LevelDesignation",
#' target_value = "Total") %>%
#' collab_sum(hrvar = "LevelDesignation",
#' return = "table")
#'
#' }
#' sq_data %>%
#'   totals_bind(target_col = "LevelDesignation", target_value = "Total") %>%
#'   collab_sum(hrvar = "LevelDesignation", return = "table")
#'
#' @param data data frame
#' @param target_col Character value of the column in which to impute "Total".

R/totals_col.R (new file)
@@ -0,0 +1,33 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE.txt in the project root for license information.
# --------------------------------------------------------------------------------------------
#' @title Fabricate a 'Total' HR variable
#'
#' @description Create a 'Total' column of character type comprising exactly one unique value.
#' This is a convenience function for returning a no-HR attribute view when NULL is supplied to
#' the `hrvar` argument in functions.
#'
#' @examples
#' ## Create a visual without HR attribute breaks
#' sq_data %>%
#' totals_col() %>%
#' collab_fizz(hrvar = "Total")
#'
#' @param data data frame
#' @param total_value Character value defining the name and the value of the "Total" column.
#' Defaults to "Total". An error is returned if an existing variable has the same name as the
#' supplied value.
#'
#' @export
totals_col <- function(data, total_value = "Total"){
if(total_value %in% names(data)){
stop(paste("Column", wrap(total_value, wrapper = "`"), "already exists. Please supply a different
value to `total_value`"))
}
data %>%
dplyr::mutate(!!sym(total_value) := total_value)
}
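A base-R sketch of what `totals_col()` does (the real function uses `dplyr::mutate()` with `!!sym()` so the column name is chosen at run time, plus the internal `wrap()` helper in its error message; `totals_col_base` here is illustrative only):

```r
totals_col_base <- function(data, total_value = "Total") {
  if (total_value %in% names(data)) {
    stop("Column `", total_value, "` already exists. ",
         "Please supply a different value to `total_value`")
  }
  # constant character column: every row gets the same value
  data[[total_value]] <- total_value
  data
}

df <- data.frame(PersonId = c("a", "b"))
totals_col_base(df)$Total
# → "Total" "Total"
```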


@@ -32,9 +32,9 @@
#' When 'table' is passed, a summary table is returned as a data frame.
#'
#' @examples
#' \dontrun{
#'
#' dv_data %>% track_HR_change()
#'}
#'
#' @export
track_HR_change <- function(data,
start_date = min(data$Date),

R/us_to_space.R (new file)
@@ -0,0 +1,13 @@
#' @title Replace underscore with space
#'
#' @description Convenience function to convert underscores to spaces
#'
#' @param x String to replace all occurrences of `_` with a single space
#'
#' @examples
#' us_to_space("Meeting_hours_with_manager_1_on_1")
#'
#' @export
us_to_space <- function(x){
gsub(pattern = "_", replacement = " ", x = x)
}
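Since the function is a thin wrapper around `gsub()`, its behaviour is easy to see directly:

```r
# replace every underscore with a single space
us_to_space <- function(x) {
  gsub(pattern = "_", replacement = " ", x = x)
}

us_to_space("Meeting_hours_with_manager_1_on_1")
# → "Meeting hours with manager 1 on 1"
```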


@@ -7,7 +7,7 @@
#'
#' @description
#' The function generates an interactive HTML report using
#' Standard Query data as an input. The report contains a checks on
Standard Person Query data as an input. The report contains checks on
#' Workplace Analytics query outputs, to provide diagnostic information
#' for the Analyst pre-analysis.
#'
@@ -20,6 +20,9 @@
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "Organization"
#' @param path Pass the file path and the desired file name, _excluding the file extension_.
#' @param hrvar_threshold Numeric value determining the maximum number of unique values
#' to be allowed to qualify as a HR variable. This is passed directly to the `threshold`
#' argument within `hrvar_count_all()`.
#' @param timestamp Logical vector specifying whether to include a timestamp in the file name.
#' Defaults to TRUE.
#'
@@ -41,6 +44,7 @@ validation_report <- function(data,
meeting_data = NULL,
hrvar = "Organization",
path = "validation report",
hrvar_threshold = 150,
timestamp = TRUE){
## Create timestamped path (if applicable)
@@ -114,7 +118,7 @@ validation_report <- function(data,
read_preamble("organizational_data_quality.md"), #13, Header - 2. Organizational Data Quality
read_preamble("attributes_available.md"),#14
data %>% hrvar_count_all(return = "table"),
data %>% hrvar_count_all(return = "table", threshold = hrvar_threshold),
read_preamble("groups_under_privacy_threshold_1.md"), #16, Header - 2.2 Groups under Privacy Threshold
paste(">", data %>% identify_privacythreshold(return="text")),


@@ -11,7 +11,7 @@
#' with a separate panel per value in the HR attribute.
#' Additional options available to return a summary table.
#'
#' @param data A Standard Query dataset in the form of a data frame.
#' @param data A Standard Person Query dataset in the form of a data frame.
#' @param hrvar HR Variable by which to split metrics, defaults to "Organization"
#' but accepts any character vector, e.g. "LevelDesignation"
#' @param mingroup Numeric value setting the privacy threshold / minimum group size. Defaults to 5.


@@ -28,7 +28,7 @@
#' and "abs".
#' @param signals Character vector to specify which collaboration metrics to use:
#' You may use "email" (default) for emails only, "IM" for Teams messages only,
#' or a combination of the two `c("email", "IM")`.
#' "unscheduled_calls" for Unscheduled Calls only, or a combination, such as `c("email", "IM")`.
#' @param start_hour A character vector specifying starting hours,
#' e.g. "0900"
#' @param end_hour A character vector specifying ending hours,
@@ -49,10 +49,16 @@
#'
#' @examples
#' \dontrun{
#' workpatterns_hclust(em_data, k = 4, return = "plot", signals = c("IM", "email"))
#' # Run clusters with all three signal types, return plot
#' workpatterns_hclust(em_data,
#' k = 4,
#' return = "plot",
#' signals = c("IM", "email", "unscheduled_calls"))
#'
#' # Run clusters, return raw data
#' workpatterns_hclust(em_data, k = 4, return = "data")
#'
#' # Run clusters for instant messages only, return hclust object
#' workpatterns_hclust(em_data, k = 4, return = "hclust", signals = c("IM"))
#' }
#'
@@ -68,57 +74,13 @@ workpatterns_hclust <- function(data,
start_hour = "0900",
end_hour = "1700"){
## Select input variable names
if("email" %in% signals & "IM" %in% signals){
# Text replacement only for allowed values
## Create 24 summed `Signals_sent` columns
signal_cols <-
purrr::map(0:23,
~combine_signals(data,
hr = .)) %>%
bind_cols()
if(any(signals %in% c("email", "IM", "unscheduled_calls"))){
## Use names for matching
input_var <- names(signal_cols)
## Average signals sent by Person
signals_df <-
data %>%
select(PersonId) %>%
cbind(signal_cols) %>%
group_by(PersonId) %>%
summarise_all(~mean(.))
## Signal label
sig_label <- "Signals_sent"
} else if(signals == "IM"){
match_index <- grepl(pattern = "^IMs_sent", x = names(data))
input_var <- names(data)[match_index]
## Average signals sent by Person
signals_df <-
data %>%
select(PersonId, all_of(input_var)) %>%
group_by(PersonId) %>%
summarise_all(~mean(.))
sig_label <- "IMs_sent"
} else if(signals == "email"){
match_index <- grepl(pattern = "^Emails_sent", x = names(data))
input_var <-names(data)[match_index]
## Average signals sent by Person
signals_df <-
data %>%
select(PersonId, all_of(input_var)) %>%
group_by(PersonId) %>%
summarise_all(~mean(.))
sig_label <- "Emails_sent"
signal_set <- gsub(pattern = "email", replacement = "Emails_sent", x = signals) # case-sensitive
signal_set <- gsub(pattern = "IM", replacement = "IMs_sent", x = signal_set)
signal_set <- gsub(pattern = "unscheduled_calls", replacement = "Unscheduled_calls", x = signal_set)
} else {
@@ -126,6 +88,82 @@ workpatterns_hclust <- function(data,
}
## Create 24 summed `Signals_sent` columns
signal_cols <- purrr::map(0:23, ~combine_signals(data, hr = ., signals = signal_set))
signal_cols <- bind_cols(signal_cols)
## Use names for matching
input_var <- names(signal_cols)
## Average signals sent by Person
signals_df <-
data %>%
select(PersonId) %>%
cbind(signal_cols) %>%
group_by(PersonId) %>%
summarise_all(~mean(.))
## Signal label
sig_label <- ifelse(length(signal_set) > 1, "Signals_sent", signal_set)
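The `gsub()` chain that builds `signal_set`, together with the `sig_label` rule above, can be checked in isolation (replacements are case-sensitive, so e.g. `"Emails_sent"` is not touched by the `"IM"` pass):

```r
signals <- c("email", "IM", "unscheduled_calls")

# case-sensitive text replacement, mirroring the code above
signal_set <- gsub("email", "Emails_sent", signals)
signal_set <- gsub("IM", "IMs_sent", signal_set)
signal_set <- gsub("unscheduled_calls", "Unscheduled_calls", signal_set)
signal_set
# → "Emails_sent" "IMs_sent" "Unscheduled_calls"

# collapse to a single label when more than one signal type is used
sig_label <- ifelse(length(signal_set) > 1, "Signals_sent", signal_set)
sig_label
# → "Signals_sent"
```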
# ## Select input variable names
# if("email" %in% signals & "IM" %in% signals){
#
# ## Create 24 summed `Signals_sent` columns
# signal_cols <-
# purrr::map(0:23,
# ~combine_signals(data, hr = .,
# signals = c("Emails_sent", "IMs_sent"))) %>%
# bind_cols()
#
# ## Use names for matching
# input_var <- names(signal_cols)
#
# ## Average signals sent by Person
# signals_df <-
# data %>%
# select(PersonId) %>%
# cbind(signal_cols) %>%
# group_by(PersonId) %>%
# summarise_all(~mean(.))
#
# ## Signal label
# sig_label <- "Signals_sent"
#
# } else if(signals == "IM"){
#
# match_index <- grepl(pattern = "^IMs_sent", x = names(data))
# input_var <- names(data)[match_index]
#
# ## Average signals sent by Person
# signals_df <-
# data %>%
# select(PersonId, all_of(input_var)) %>%
# group_by(PersonId) %>%
# summarise_all(~mean(.))
#
# sig_label <- "IMs_sent"
#
# } else if(signals == "email"){
#
# match_index <- grepl(pattern = "^Emails_sent", x = names(data))
# input_var <-names(data)[match_index]
#
# ## Average signals sent by Person
# signals_df <-
# data %>%
# select(PersonId, all_of(input_var)) %>%
# group_by(PersonId) %>%
# summarise_all(~mean(.))
#
# sig_label <- "Emails_sent"
#
# } else {
#
# stop("Invalid input for `signals`.")
#
# }
## Normalised pattern data
ptn_data_norm <-
signals_df %>%


@@ -9,8 +9,11 @@ This is an R package for analyzing and visualizing data from [Microsoft Workplac
## With the **wpa** package, you can...
1. **Run prebuilt analysis and visualizations** off Workplace Analytics data with settings for HR variables, privacy threshold, etc.
2. **Generate prebuilt interactive HTML reports**, which cover specific areas e.g. collaboration, connectivity
3. Leverage **advanced analytics functions**, such as text mining and hierarchical clustering, which are built for Workplace Analytics metrics
4. Integrate analysis of Workplace Analytics data with your R workflow seamlessly
Here is an example of **wpa** in action:
@@ -19,53 +22,16 @@ Here is an example of **wpa** in action:
---
## :rocket: Quick start guide - For users
## :rocket: Users
### Installing the package
To get started with the package, please see the following links:
You can install the GitHub version of **wpa** with the following in R:
```R
# Check if devtools is installed, if not then install it
if(!"devtools" %in% installed.packages()){
install.packages("devtools")
}
devtools::install_git(url = "https://github.com/microsoft/wpa.git")
```
Note that **wpa** is not yet released on CRAN, and therefore `install.packages()` will not work.
- [Installation and Set-up](https://microsoft.github.io/wpa/analyst_guide_getting_started.html)
- [Full function list](https://microsoft.github.io/wpa/reference/index.html)
### Examples
Also check out our package cheat sheet for a quick glimpse of what **wpa** offers:
The package ships with a sample Standard Query dataset (`sq_data`), so you can start exploring functions without having to read in any data. Most functions in **wpa** share a consistent API and can return results as either a **plot** or a **table** (data frame):
```R
collaboration_sum(sq_data, return = "plot")
```
<img src="man/figures/collaboration_sum2.jpg" align="center" width=80% />
By passing 'table' to the `return` argument, the function will return a data frame with the relevant summary statistics.
The following illustrates the basic API of standard analysis functions:
<img src="man/figures/api-demo.png" align="center" width=80% />
---
## :package: Package Structure
For information on the package structure, please see the relevant section in our [Analyst Guide](.github/analyst_guide.md).
Also check out our package cheat sheet for more information:
<a href="https://github.com/microsoft/wpa/blob/main/man/figures/wpa%20cheatsheet_20201116.pdf"><img src="https://raw.githubusercontent.com/microsoft/wpa/main/man/figures/wpa%20cheatsheet_20201116.png" align="center" width=50% /></a>
---
## Vignette
You can browse the vignette by running the following in R:
```R
vignette(topic = "intro-to-wpa", package = "wpa")
```
<a href="https://github.com/microsoft/wpa/blob/main/man/figures/wpa%20cheatsheet.pdf"><img src="https://raw.githubusercontent.com/microsoft/wpa/main/man/figures/wpa%20cheatsheet.png" align="center" width=50% /></a>
---
@@ -95,3 +61,15 @@ We would ask you to please read the [Microsoft Open Source Code of Conduct](http
**Trademarks**
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
---
## Finding this project useful?
:star: Please star this repository to keep us going!
Contributors to the GitHub repo:
![](https://raw.githubusercontent.com/microsoft/wpa/main/CONTRIBUTORS.svg)
See full list [here](https://microsoft.github.io/wpa/authors.html).


@@ -26,13 +26,15 @@ navbar:
menu:
- text: Introduction
href: analyst_guide.html
- text: Summary analysis
- text: Getting Started
href: analyst_guide_getting_started.html
- text: Summary Functions
href: analyst_guide_summary.html
- text: Distribution analysis
href: analyst_guide_distribution.html
- text: Trend analysis
- text: Distribution Functions
href: analyst_guide_distribution.html
- text: Trend Functions
href: analyst_guide_trend.html
- text: Network analysis
- text: Network Functions
href: analyst_guide_network.html
reference:
text: Reference
@@ -61,4 +63,8 @@ navbar:
href: articles/intro-to-wpa.html
- text: Text-mining Meeting Subject lines
href: articles/text-mining-capabilities.html
- text: Best Practices when using wpa
href: articles/best-practices.html
- text: Introduction to Information Value
href: articles/IV-report.html


@@ -37,5 +37,10 @@ This function uses the Information Value algorithm to predict which WPA metrics
\seealso{
Other Flexible Input:
\code{\link{period_change}()}
Other Information Value:
\code{\link{IV_report}()},
\code{\link{create_IV}()}
}
\concept{Flexible Input}
\concept{Information Value}


@@ -14,7 +14,7 @@ IV_report(
)
}
\arguments{
\item{data}{A Standard Query dataset in the form of a data frame.}
\item{data}{A Standard Person Query dataset in the form of a data frame.}
\item{predictors}{A character vector specifying the columns to be used as predictors.
Defaults to NULL, where all numeric vectors in the data will be used as predictors.}
@@ -32,7 +32,15 @@ Defaults to TRUE.}
}
\description{
The function generates an interactive HTML report using
Standard Query data as an input. The report contains a full Information Value analysis, a data exploration technique that helps determine which columns in a data set have predictive power or influence on the value of a specified dependent variable.
Standard Person Query data as an input. The report contains a full Information Value analysis, a data exploration technique that helps determine which columns in a data set have predictive power or influence on the value of a specified dependent variable.
}
\examples{
\dontrun{
sq_data \%>\%
mutate(CH_binary = ifelse(Collaboration_hours > 12, 1, 0)) \%>\% # Simulate binary variable
IV_report(outcome = "CH_binary", predictors = c("Email_hours", "Workweek_span"))
}
}
\seealso{
Other Reports:
@@ -42,5 +50,10 @@ Other Reports:
\code{\link{connectivity_report}()},
\code{\link{generate_report}()},
\code{\link{validation_report}()}
Other Information Value:
\code{\link{IV_by_period}()},
\code{\link{create_IV}()}
}
\concept{Information Value}
\concept{Reports}

Some files were not shown because too many files changed in this diff.