README.md

Package development

To add to datamations code or work within the package repository itself, first clone the repository and open the project in RStudio.

renv

This package uses renv to manage dependencies during the development of datamations and to keep package versions consistent, both on your own machine and across other machines. renv isolates dependencies so that, for example, the set (and versions) of libraries used to develop datamations can differ from the libraries used in your other analyses.

This article on collaborating with renv outlines the workflow. The setup has already been done, so the key is step #4 under "Collaborating with renv":

When a collaborator first launches in this project, renv should automatically bootstrap itself, thereby downloading and installing the appropriate version of renv into the project library. After this has completed, they can then use renv::restore() to restore the project library locally on their machine.

So, all you need to do is renv::restore() and renv will install all the correct packages and versions for you!
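In practice, the whole setup reduces to a single call after cloning (a sketch; renv normally bootstraps itself the first time the project is opened in RStudio):

```r
# After cloning the repository and opening the project in RStudio:
renv::restore()  # installs the package versions recorded in renv.lock
```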

Loading, developing, and updating code

The devtools package contains all the functions needed to load, document, test, and check a package.

In particular, devtools::load_all() "simulates installing and reloading your package": it loads all R code in R/ as well as any data files in data/, letting you access all functions and data without actually installing the package.

So, for example, to launch the demo app, load the package and then call run_app():

devtools::load_all(".")
#  Loading datamations
run_app()

And to run examples, load the package and dplyr, then run the code:

library(dplyr)
devtools::load_all(".")
#  Loading datamations

"small_salary %>% 
  group_by(Degree) %>%
  summarize(mean = mean(Salary))" %>%
  datamation_sanddance()

If you make changes to functions, just run devtools::load_all(".") again and the changes will be available to use.
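A typical edit–reload loop with devtools looks like this (a sketch; devtools::document() and devtools::test() are only needed when documentation or tests have changed):

```r
devtools::load_all(".")  # reload the package after editing functions in R/
devtools::document()     # regenerate man/ pages from roxygen comments
devtools::test()         # run the testthat suite in tests/testthat
```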

Continuous integration and testing

Package testing and R CMD Check can be run interactively via devtools::check(). This runs a series of standardized checks for R packages, covering documentation, metadata, issues with the R code, dependencies, and conventions, as well as any tests written with the testthat framework, which live in tests/testthat.

These checks and tests are also run in continuous integration via GitHub Actions any time there is a pull request into the main branch. The .github/workflows/check_standard.yml file dictates how the checks are run, from installing dependencies to running the checks and tests themselves. Rather than just running the tests on the developer's machine, these tests are run on Linux, Mac, and Windows, and on the latest release of R as well as the development version.

Package documentation

The package's website is created using pkgdown. The site is built on GitHub Actions, and the build runs only on commits to the main branch (i.e. commits made directly to the branch or merged in via a PR). The rendered documentation itself lives in the gh-pages branch.

Functions

datamation_sanddance()

datamation_sanddance() is the main function that a user calls to generate a datamation. The code is documented to walk through each step, but at a high level it:

  • uses parse_pipeline() to parse the pipeline into steps,

  • uses snake() to evaluate what the data looks like at each step,

  • uses parse_functions() to extract the actual function names from the pipeline steps,

  • checks that all functions are supported, then extracts the function arguments,

  • generates the "mapping" (facets, x, etc.) using generate_mapping_from_plot() if the code contains a ggplot2 specification, and finishes generating the mapping via generate_mapping(),

  • loops over all of the steps in the pipeline and uses the function (e.g. group_by, summarize, filter), data, and arguments to generate the specs for each step, via the prep_specs_*() functions described below,

  • finally, returns all of the specs and passes them to datamationSandDance(), which hands them off to the JavaScript code.
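For instance, the example pipeline from earlier flows through these steps (an illustrative sketch of the flow, not the package's internal representation):

```r
pipeline <- "small_salary %>% group_by(Degree) %>% summarize(mean = mean(Salary))"

# Conceptually, parsing splits the string into one entry per step:
#   "small_salary"
#   "group_by(Degree)"
#   "summarize(mean = mean(Salary))"
# and each verb is then matched to its prep_specs_*() counterpart,
# e.g. group_by -> prep_specs_group_by(), summarize -> prep_specs_summarize().
pipeline %>% datamations::datamation_sanddance()
```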

prep_specs_data()

  • Preps encoding based on the mapping from the main function
  • Generates specs for an ungrouped icon array, using generate_vega_specs()

prep_specs_group_by()

  • Preps encoding based on the mapping from the main function

  • Generates specs for each grouping variable, in the following order: column -> row -> x/color, based on what actually exists in the mapping, all using generate_vega_specs()

    • Step 1: Generates specs for a grouped icon array, by column

      • Sends meta.parse = "grid" to indicate to the JS that this is not a real vega lite spec, and needs to be parsed into one
      • Sends a "colour" variable, only if colour is the same as the column facet variable (so they're animated in the same step)
    • Step 2: Generates specs for a grouped icon array, by row (and column, if it needs to be done)

      • Sends meta.parse = "grid" to indicate to the JS that this is not a real vega lite spec, and needs to be parsed into one
      • If the X variable is the same as the row facet variable, send meta.splitField = mapping$x (whatever the x variable is in the mapping), to indicate to the JS that the info grid needs to be split "within" a facet frame
      • Sends a "colour" variable, only if colour is the same as the row facet variable (so they're animated in the same step)
    • Step 3: Generates specs for grouped icon array, by x (and column/row if they're done)

      • Sends meta.parse = "grid" to indicate to the JS that this is not a real vega lite spec, and needs to be parsed into one
      • Sends meta.splitField = mapping$x, to indicate to the JS that the info grid needs to be split "within" a facet frame
      • Sends meta.axes = TRUE if there are faceting variables, to indicate that "fake facets" need to be drawn
      • Sends a "colour" variable, only if colour is the same as the x variable (so they're animated in the same step)
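Putting the grid steps together, the meta fields attached to a spec have roughly this shape (an illustrative sketch using the field names from the steps above, not the package's exact internals):

```r
# Hypothetical spec fragment for the "by x" step of a faceted grouping
spec$meta <- list(
  parse = "grid",          # not a real Vega-Lite spec; the JS must parse it into one
  splitField = mapping$x,  # split the info grid "within" each facet frame
  axes = TRUE              # faceting variables exist, so "fake facets" are drawn
)
```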

prep_specs_summarize()

  • Preps encoding based on the mapping from the main function

  • Generates center points for X values, to be used as the center of jittering when distributions are shown, as well as an expression to convert these numeric X values to actual labels (e.g. 1 = Male, 2 = Female) in generate_labelsExpr() and the full X domain (with 0.5 padding on left and right) via generate_x_domain()

  • Step 1: Generates specs for scatter plot (with any grouping), via generate_vega_specs()

    • Sends meta.parse = "jitter" to indicate to the JS that the x values need to be jittered

    • Sends meta.axes = TRUE if the plot has "fake facets" and axes need to be drawn, and FALSE if not (so the real axes can be used for X and Y)

    • Sends meta.splitField = mapping$x if there's a grouping variable on the x-axis, since each jitter field needs to be split by that X

    • Sends meta.xAxisLabels with the actual values of the X variable if there are facets, because if they are, they're fake and occupying the real axes, and so we need to actually send xAxisLabels to get drawn on

      • TODO for me: this has flag !has_facets, but I think it should be just has_facets (i.e. the opposite)
  • Step 2: Generate specs for summary plot, e.g. mean, with any grouping, via generate_vega_specs()

    • Just sends meta.axes = TRUE if the plot has "fake facets" and axes need to be drawn, and FALSE if not (so the real axes can be used for X and Y)
  • Step 3: Generate specs for errorbars plot (only if the summary function is mean), via generate_vega_specs()

    • Passes both the summarized and "raw" y-values to vega lite, since the errorbar is calculated by vega lite and needs the raw y-values to calculate this
    • Just sends meta.axes = TRUE if the plot has "fake facets" and axes need to be drawn, and FALSE if not (so the real axes can be used for X and Y)
  • Step 4: Generate specs for zoomed plot

    • If the summary function is the mean and there are error bars, calculate the error bars manually to get the range of the plot
    • Otherwise, just do the range of the y-values
    • Again, just sends meta.axes = TRUE if the plot has "fake facets" and axes need to be drawn, and FALSE if not (so the real axes can be used for X and Y)

prep_specs_filter()

  • Reuse specs of previous frame (whether it was from the initial data, group_by, or summarize step)
  • Get gemini_ids of rows that are filtered in based on the operation in filter()
  • Update specs to have transform.filter = {"field": "gemini_id", "oneOf": [1, 2, ...]} if there are multiple IDs (or transform.filter = {datum.gemini_id == 1} if there is only one)
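Before serialization to Vega-Lite JSON, those two filter transforms can be pictured as R lists, roughly like this (illustrative only; the IDs are made up):

```r
# Multiple rows pass the filter: keep gemini_ids 1, 2, and 5
list(filter = list(field = "gemini_id", oneOf = c(1, 2, 5)))

# A single row passes the filter: an expression on datum
list(filter = "datum.gemini_id == 1")
```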

shiny app

To embed a datamation_sanddance() object in a Shiny app, use renderDatamationSandDance() in the server function, and datamationSandDanceOutput() in the UI function.
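A minimal embedding might look like this (a sketch assuming the datamations package is installed; the pipeline string is the example used earlier in this README):

```r
library(shiny)

ui <- fluidPage(
  # placeholder in the UI for the rendered datamation
  datamations::datamationSandDanceOutput("dm")
)

server <- function(input, output, session) {
  output$dm <- datamations::renderDatamationSandDance({
    datamations::datamation_sanddance(
      "small_salary %>% group_by(Degree) %>% summarize(mean = mean(Salary))"
    )
  })
}

# shinyApp(ui, server)  # uncomment to launch
```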

For the actual "datamations" Shiny app:

  • app.R (in the main package directory, not in this R/ subdirectory) actually runs the app by calling run_app(). It needs to live there in order to be deployed on shinyapps.io.

  • run_app() creates a shiny app by calling the UI function (app_ui()) and the server function (app_server())

  • mod_inputs.R contains the module for the app inputs (data set, group by variables, summary function and variable)

  • mod_pipeline.R contains the module for constructing and displaying the tidyverse pipeline, generated from the inputs

  • mod_datamation_sanddance.R generates the actual datamation

  • mod_data_tabs.R generates the tabs that show the data at each stage