Processing engine and React components for constructing configuration-based data transformation and processing pipelines.
Chris Trevino 276acf70bf
Merge pull request #476 from microsoft/fix/resource_rename
2023-01-09 22:06:20 -08:00
.devcontainer remove unused container configs 2022-12-01 18:56:50 +00:00
.github Update poetry version + publish on demand 2022-12-16 09:49:58 -06:00
.vscode Fix unit tests 2022-12-08 19:17:14 -08:00
.yarn update semver 2023-01-09 16:30:35 -08:00
docs remove python fetch 2022-10-19 16:49:28 -07:00
javascript docs update 2023-01-09 17:36:55 -08:00
python format bin.py 2022-12-21 03:14:45 +00:00
schema Fix covid-19 fixture (add codebook for typing) 2023-01-06 20:45:01 +00:00
scripts Create versions task 2022-10-31 15:53:09 -07:00
.eslintignore work around issues with importing json using assertions 2022-09-22 14:10:39 -07:00
.eslintrc Split examples into individual stories 2022-09-20 15:59:07 -07:00
.gitattributes update essex scripts 2022-10-11 11:42:47 -07:00
.gitignore add typings to generated versions file 2022-09-22 14:22:35 -07:00
.prettierignore ignore generated schema json files 2022-10-05 14:42:05 -07:00
.prettierrc update explorer file-bundling 2022-05-27 17:12:40 +00:00
.vsts-ci.yml Initial rename 2022-08-10 16:25:58 -07:00
.yarnrc.yml udpate yarn 2022-12-20 16:23:31 -08:00
CODEOWNERS add basic, barely-functional story; remove react-router dom dependency 2022-11-15 12:09:09 -08:00
CODE_OF_CONDUCT.md CODE_OF_CONDUCT.md committed 2021-11-22 14:36:49 -08:00
LICENSE LICENSE committed 2021-11-22 14:36:51 -08:00
README.md update readme link 2022-10-11 13:40:22 -07:00
SECURITY.md Initial import 2021-12-01 19:10:08 -08:00
SUPPORT.md Initial import 2021-12-01 19:10:08 -08:00
package.json print git status on is_clean failure 2023-01-09 17:35:53 -08:00
turbo.json Fix root turbo execution 2022-10-31 16:08:38 -07:00
yarn.lock remove polyfills from source, polyfill tests correctly 2023-01-09 16:28:43 -08:00

README.md

datashaper

This project provides a collection of web components for doing lightweight data wrangling.

There are four goals of the project:

  1. Create a shareable client/server schema for serialized wrangling instructions. This is in the ./schema folder. TypeScript types and JSONSchema generation are in javascript/schema, and published schemas are copied out to ./schema along with test cases that are executed by the JavaScript and Python builds to ensure parity.
  2. Maintain an implementation of a basic client-side wrangling engine (largely based on Arquero). This is in the ./javascript folder.
  3. Maintain a Python implementation using common wrangling libraries (e.g., pandas) for backend or data science deployments. This is in the ./python folder.
  4. Provide some reusable React components so wrangling operations can be incorporated into webapps easily. This is in the ./javascript/react folder.
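The parity idea in goal 1 can be illustrated with a small sketch: a shared fixture bundles a serialized workflow, an input table, and the expected output, and each implementation is checked against it. Everything here is an assumption made for illustration: the field names (`workflow`, `steps`, `verb`, `args`), the toy `filter` verb, and the helpers `reference_run` and `check_fixture` are hypothetical and do not reflect the published DataShaper schema or test harness.

```python
import json

# Hypothetical fixture: these field names are illustrative assumptions,
# not the published DataShaper schema.
FIXTURE_JSON = """
{
  "workflow": {
    "steps": [{"verb": "filter", "args": {"column": "age", "min": 18}}]
  },
  "input": [{"name": "a", "age": 12}, {"name": "b", "age": 30}],
  "expected": [{"name": "b", "age": 30}]
}
"""


def reference_run(workflow, rows):
    """Toy stand-in for one implementation (for example, a Python build)."""
    for step in workflow["steps"]:
        if step["verb"] != "filter":
            raise ValueError(f"unsupported verb: {step['verb']}")
        column = step["args"]["column"]
        minimum = step["args"]["min"]
        # Keep only rows whose value in `column` is at least `minimum`.
        rows = [row for row in rows if row[column] >= minimum]
    return rows


def check_fixture(run, fixture_text):
    """Return True when `run` reproduces the fixture's expected output."""
    fixture = json.loads(fixture_text)
    actual = run(fixture["workflow"], fixture["input"])
    return actual == fixture["expected"]
```

Because the fixture is plain JSON, the same file could in principle be loaded by a JavaScript test runner and a Python one, with each asserting that its engine yields the `expected` rows.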

Individual documentation for the JavaScript and Python implementations can be found in their respective folders. Broad documentation about building pipelines and the available verbs is available in the docs folder.

We currently have six primary JavaScript packages:

  • react - this is a set of React components, one for each verb, that you can include in web apps to enable transformation pipeline building.
  • schema - this is a set of core types and associated JSONSchema definitions for formalizing our data package and resource models (including the definitions for table parsing, Codebooks, and Workflows).
  • tables - this is the primary set of utilities for loading and parsing data tables, using Arquero under the hood.
  • utilities - this is a set of helpers for working with files, etc., to ease building data wrangling applications.
  • webapp - this is the deployable DataShaper that includes all of the verb components and allows creation, execution, and saving of pipeline JSON files.
  • workflow - this is the primary engine for pipeline execution. It includes low-level operational primitives to execute a wide variety of relational algebra transformations over Arquero tables.
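To make the notion of a configuration-driven pipeline engine concrete, here is a minimal sketch in which a list of steps (each naming a verb and its arguments) is applied in order to a table. It is not the DataShaper workflow package or its API: the verb names, argument names, the `VERBS` registry, and `run_pipeline` are hypothetical, and tables are plain lists of dictionaries rather than Arquero or pandas tables.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]

# Hypothetical verb registry: maps a verb name to a function that
# takes a table plus keyword arguments and returns a new table.
VERBS: Dict[str, Callable[..., Table]] = {}


def verb(name: str):
    """Decorator that registers a function under a verb name."""
    def register(fn):
        VERBS[name] = fn
        return fn
    return register


@verb("select")
def select(table: Table, columns: List[str]) -> Table:
    # Keep only the named columns.
    return [{c: row[c] for c in columns} for row in table]


@verb("rename")
def rename(table: Table, mapping: Dict[str, str]) -> Table:
    # Rename columns listed in `mapping`; leave the others unchanged.
    return [{mapping.get(k, k): v for k, v in row.items()} for row in table]


@verb("orderby")
def orderby(table: Table, column: str, descending: bool = False) -> Table:
    # Sort rows by a single column.
    return sorted(table, key=lambda row: row[column], reverse=descending)


def run_pipeline(steps: List[dict], table: Table) -> Table:
    """Apply each configured step in order, feeding each output forward."""
    for step in steps:
        if step["verb"] not in VERBS:
            raise ValueError(f"unknown verb: {step['verb']}")
        table = VERBS[step["verb"]](table, **step.get("args", {}))
    return table


# Usage: the pipeline is pure data, so it can be saved and reloaded as JSON.
steps = [
    {"verb": "select", "args": {"columns": ["city", "pop"]}},
    {"verb": "rename", "args": {"mapping": {"pop": "population"}}},
    {"verb": "orderby", "args": {"column": "population", "descending": True}},
]
table = [
    {"city": "A", "pop": 5, "area": 1},
    {"city": "B", "pop": 9, "area": 2},
]
result = run_pipeline(steps, table)
```

Keeping the steps as plain data, separate from the verb implementations, is what would let a UI build a pipeline, serialize it, and hand it to a different engine for execution.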

Building

  • You need Node.js and Yarn installed
  • Run: yarn
  • Then: yarn build
  • Run the webapp locally: yarn start

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.