README.md

Package development

To add to datamations code or work within the package repository itself, first clone the repository and open the project in RStudio.

renv

This package uses renv to manage dependencies during the development of datamations and to keep package versions consistent, both on your own machine and across other machines. renv isolates dependencies so that, for example, the set (and versions) of libraries used to develop datamations can differ from the libraries used in your other analyses.

This article on collaborating with renv outlines the workflow. The setup has already been done, so the key is step #4 under "Collaborating with renv":

When a collaborator first launches in this project, renv should automatically bootstrap itself, thereby downloading and installing the appropriate version of renv into the project library. After this has completed, they can then use renv::restore() to restore the project library locally on their machine.

So, all you need to do is renv::restore() and renv will install all the correct packages and versions for you!
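In practice, the whole setup reduces to a single call after cloning (a sketch; renv normally bootstraps itself the first time the project is opened in RStudio):

```r
# After cloning the repository and opening the project in RStudio:
renv::restore()  # installs the package versions recorded in renv.lock
```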

Loading, developing, and updating code

The devtools package contains all the functions needed to load, document, test, and check a package.

In particular, devtools::load_all() "simulates installing and reloading your package": it loads all R code in R/ as well as any data files in data/, letting you access all functions and data without actually installing the package.

So, for example, to launch the demo app, load the package and then call run_app():

devtools::load_all(".")
#  Loading datamations
run_app()

And to run examples, load the package and dplyr, then run the code:

library(dplyr)
devtools::load_all(".")
#  Loading datamations

"small_salary %>% 
  group_by(Degree) %>%
  summarize(mean = mean(Salary))" %>%
  datamation_sanddance()

If you make changes to functions, just run devtools::load_all(".") again and the changes will be available to use.
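A typical edit–reload loop with devtools looks like this (a sketch; devtools::document() and devtools::test() are only needed when documentation or tests have changed):

```r
devtools::load_all(".")  # reload the package after editing functions in R/
devtools::document()     # regenerate man/ pages from roxygen comments
devtools::test()         # run the testthat suite in tests/testthat
```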

Continuous integration and testing

Package testing and R CMD Check can be run interactively via devtools::check(). This runs a series of standardized checks for R packages, covering documentation, metadata, issues with the R code, dependencies, and conventions, as well as any tests written with the testthat framework, which live in tests/testthat.

These checks and tests are also run in continuous integration via GitHub Actions any time there is a pull request into the main branch. The .github/workflows/check_standard.yml file dictates how the checks are run, from installing dependencies to running the checks and tests themselves. Rather than just running the tests on the developer's machine, these tests are run on Linux, Mac, and Windows, and on the latest release of R as well as the development version.

Package documentation

The package's website is created using pkgdown. The site is built on GitHub Actions, and the build runs only on commits to the main branch (i.e. commits made directly to the branch or merged in via a PR). The rendered documentation itself lives in the gh-pages branch.

Functions

datamation_sanddance()

datamation_sanddance() is the main function that a user calls to generate a datamation. The code is documented to walk through each step, but at a high level it:

  • uses parse_pipeline() to parse the pipeline into steps,

  • uses snake() to evaluate what the data looks like at each step,

  • uses parse_functions() to extract the actual function names from the pipeline steps,

  • checks that all functions are supported, then extracts the function arguments,

  • generates the "mapping" (facets, x, etc.) using generate_mapping_from_plot() if the code contains a ggplot2 specification, and finishes generating the mapping via generate_mapping(),

  • loops over all of the steps in the pipeline and uses the function (e.g. group_by, summarize, filter), data, and arguments to generate the specs for each step, via the prep_specs_*() functions described below,

  • finally, returns all of the specs and passes them to datamationSandDance(), which hands them off to the JavaScript code.
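For instance, the example pipeline from earlier flows through these steps (an illustrative sketch of the flow, not the package's internal representation):

```r
pipeline <- "small_salary %>% group_by(Degree) %>% summarize(mean = mean(Salary))"

# Conceptually, parsing splits the string into one entry per step:
#   "small_salary"
#   "group_by(Degree)"
#   "summarize(mean = mean(Salary))"
# and each verb is then matched to its prep_specs_*() counterpart,
# e.g. group_by -> prep_specs_group_by(), summarize -> prep_specs_summarize().
pipeline %>% datamations::datamation_sanddance()
```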

prep_specs_data()

  • Preps encoding based on the mapping from the main function
  • Generates specs for an ungrouped icon array, using generate_vega_specs()

prep_specs_group_by()

  • Preps encoding based on the mapping from the main function

  • Generates specs for each grouping variable, in the following order: column -> row -> x/color, based on what actually exists in the mapping, all using generate_vega_specs()

    • Step 1: Generates specs for a grouped icon array, by column

      • Sends meta.parse = "grid" to indicate to the JS that this is not a real vega lite spec, and needs to be parsed into one
      • Sends a "colour" variable, only if colour is the same as the column facet variable (so they're animated in the same step)
    • Step 2: Generates specs for a grouped icon array, by row (and column, if it needs to be done)

      • Sends meta.parse = "grid" to indicate to the JS that this is not a real vega lite spec, and needs to be parsed into one
      • If the X variable is the same as the row facet variable, send meta.splitField = mapping$x (whatever the x variable is in the mapping), to indicate to the JS that the info grid needs to be split "within" a facet frame
      • Sends a "colour" variable, only if colour is the same as the row facet variable (so they're animated in the same step)
    • Step 3: Generates specs for grouped icon array, by x (and column/row if they're done)

      • Sends meta.parse = "grid" to indicate to the JS that this is not a real vega lite spec, and needs to be parsed into one
      • Sends meta.splitField = mapping$x, to indicate to the JS that the info grid needs to be split "within" a facet frame
      • Sends meta.axes = TRUE if there are faceting variables, to indicate that "fake facets" need to be drawn
      • Sends a "colour" variable, only if colour is the same as the x variable (so they're animated in the same step)
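Putting the grid steps together, the meta fields attached to a spec have roughly this shape (an illustrative sketch using the field names from the steps above, not the package's exact internals):

```r
# Hypothetical spec fragment for the "by x" step of a faceted grouping
spec$meta <- list(
  parse = "grid",          # not a real Vega-Lite spec; the JS must parse it into one
  splitField = mapping$x,  # split the info grid "within" each facet frame
  axes = TRUE              # faceting variables exist, so "fake facets" are drawn
)
```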

prep_specs_summarize()

  • Preps encoding based on the mapping from the main function

  • Generates center points for X values, to be used as the center of jittering when distributions are shown, as well as an expression to convert these numeric X values to actual labels (e.g. 1 = Male, 2 = Female) in generate_labelsExpr() and the full X domain (with 0.5 padding on left and right) via generate_x_domain()

  • Step 1: Generates specs for scatter plot (with any grouping), via generate_vega_specs()

    • Sends meta.parse = "jitter" to indicate to the JS that the x values need to be jittered

    • Sends meta.axes = TRUE if the plot has "fake facets" and axes need to be drawn, and FALSE if not (so the real axes can be used for X and Y)

    • Sends meta.splitField = mapping$x if there's a grouping variable on the x-axis, since each jitter field needs to be split by that X

    • Sends meta.xAxisLabels with the actual values of the X variable if there are facets, because if they are, they're fake and occupying the real axes, and so we need to actually send xAxisLabels to get drawn on

      • TODO for me: this has flag !has_facets, but I think it should be just has_facets (i.e. the opposite)
  • Step 2: Generate specs for summary plot, e.g. mean, with any grouping, via generate_vega_specs()

    • Just sends meta.axes = TRUE if the plot has "fake facets" and axes need to be drawn, and FALSE if not (so the real axes can be used for X and Y)
  • Step 3: Generate specs for errorbars plot (only if the summary function is mean), via generate_vega_specs()

    • Passes both the summarized and "raw" y-values to vega lite, since the errorbar is calculated by vega lite and needs the raw y-values to calculate this
    • Just sends meta.axes = TRUE if the plot has "fake facets" and axes need to be drawn, and FALSE if not (so the real axes can be used for X and Y)
  • Step 4: Generate specs for zoomed plot

    • If the summary function is the mean and there are error bars, calculate the error bars manually to get the range of the plot
    • Otherwise, just do the range of the y-values
    • Again, just sends meta.axes = TRUE if the plot has "fake facets" and axes need to be drawn, and FALSE if not (so the real axes can be used for X and Y)

prep_specs_filter()

  • Reuse specs of previous frame (whether it was from the initial data, group_by, or summarize step)
  • Get gemini_ids of rows that are filtered in based on the operation in filter()
  • Update specs to have transform.filter = {"field": "gemini_id", "oneOf": [1, 2, ...]} if there are multiple IDs (or transform.filter = {datum.gemini_id == 1} if there is only one)
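Before serialization to Vega-Lite JSON, those two filter transforms can be pictured as R lists, roughly like this (illustrative only; the IDs are made up):

```r
# Multiple rows pass the filter: keep gemini_ids 1, 2, and 5
list(filter = list(field = "gemini_id", oneOf = c(1, 2, 5)))

# A single row passes the filter: an expression on datum
list(filter = "datum.gemini_id == 1")
```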

shiny app

To embed a datamation_sanddance() object in a Shiny app, use renderDatamationSandDance() in the server function, and datamationSandDanceOutput() in the UI function.
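A minimal embedding might look like this (a sketch assuming the datamations package is installed; the pipeline string is the example used earlier in this README):

```r
library(shiny)

ui <- fluidPage(
  # placeholder in the UI for the rendered datamation
  datamations::datamationSandDanceOutput("dm")
)

server <- function(input, output, session) {
  output$dm <- datamations::renderDatamationSandDance({
    datamations::datamation_sanddance(
      "small_salary %>% group_by(Degree) %>% summarize(mean = mean(Salary))"
    )
  })
}

# shinyApp(ui, server)  # uncomment to launch
```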

For the actual "datamations" Shiny app:

  • app.R (in the main package directory, not in this R/ subdirectory) actually runs the app by calling run_app(). It needs to live there in order to be deployed on shinyapps.io.

  • run_app() creates a shiny app by calling the UI function (app_ui()) and the server function (app_server())

  • mod_inputs.R contains the module for the app inputs (data set, group by variables, summary function and variable)

  • mod_pipeline.R contains the module for constructing and displaying the tidyverse pipeline, generated from the inputs

  • mod_datamation_sanddance.R generates the actual datamation

  • mod_data_tabs.R generates the tabs that show the data at each stage