Processing engine and React components for constructing configuration-based data transformation and processing pipelines.


Chris Trevino 276acf70bf Merge pull request #476 from microsoft/fix/resource_rename		2023-01-09 22:06:20 -08:00
.devcontainer	remove unused container configs	2022-12-01 18:56:50 +00:00
.github	Update poetry version + publish on demand	2022-12-16 09:49:58 -06:00
.vscode	Fix unit tests	2022-12-08 19:17:14 -08:00
.yarn	update semver	2023-01-09 16:30:35 -08:00
docs	remove python fetch	2022-10-19 16:49:28 -07:00
javascript	docs update	2023-01-09 17:36:55 -08:00
python	format bin.py	2022-12-21 03:14:45 +00:00
schema	Fix covid-19 fixture (add codebook for typing)	2023-01-06 20:45:01 +00:00
scripts	Create versions task	2022-10-31 15:53:09 -07:00
.eslintignore	work around issues with importing json using assertions	2022-09-22 14:10:39 -07:00
.eslintrc	Split examples into individual stories	2022-09-20 15:59:07 -07:00
.gitattributes	update essex scripts	2022-10-11 11:42:47 -07:00
.gitignore	add typings to generated versions file	2022-09-22 14:22:35 -07:00
.prettierignore	ignore generated schema json files	2022-10-05 14:42:05 -07:00
.prettierrc	update explorer file-bundling	2022-05-27 17:12:40 +00:00
.vsts-ci.yml	Initial rename	2022-08-10 16:25:58 -07:00
.yarnrc.yml	udpate yarn	2022-12-20 16:23:31 -08:00
CODEOWNERS	add basic, barely-functional story; remove react-router dom dependency	2022-11-15 12:09:09 -08:00
CODE_OF_CONDUCT.md	CODE_OF_CONDUCT.md committed	2021-11-22 14:36:49 -08:00
LICENSE	LICENSE committed	2021-11-22 14:36:51 -08:00
README.md	update readme link	2022-10-11 13:40:22 -07:00
SECURITY.md	Initial import	2021-12-01 19:10:08 -08:00
SUPPORT.md	Initial import	2021-12-01 19:10:08 -08:00
package.json	print git status on is_clean failure	2023-01-09 17:35:53 -08:00
turbo.json	Fix root turbo execution	2022-10-31 16:08:38 -07:00
yarn.lock	remove polyfills from source, polyfill tests correctly	2023-01-09 16:28:43 -08:00

README.md

datashaper

This project provides a collection of web components for doing lightweight data wrangling.

There are four goals of the project:

Create a shareable client/server schema for serialized wrangling instructions. This is in the ./schema folder. TypeScript types and JSONSchema generation are in javascript/schema, and published schemas are copied out to ./schema along with test cases that are executed by the JavaScript and Python builds to ensure parity.
Maintain an implementation of a basic client-side wrangling engine (largely based on Arquero). This is in the ./javascript folder.
Maintain a Python implementation using common wrangling libraries (e.g., pandas) for backend or data science deployments. This is in the ./python folder.
Provide some reusable React components so wrangling operations can be incorporated into webapps easily. This is in the ./javascript/react folder.
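
To make the idea of "serialized wrangling instructions" concrete, here is a minimal sketch of a JSON round trip between a client and a server. The `StepSpec` and `WorkflowSpec` shapes and the `isWorkflowSpec` guard are hypothetical names invented for this illustration; they are not the project's published schema, which lives in ./schema and remains the authoritative definition.

```typescript
// Hypothetical shapes for illustration only; the schemas in ./schema are authoritative.
interface StepSpec {
  verb: string; // name of the operation, e.g. "filter"
  args: Record<string, unknown>; // verb-specific arguments
}

interface WorkflowSpec {
  name: string;
  steps: StepSpec[];
}

// Runtime guard: checks that parsed JSON has the shape assumed above.
function isWorkflowSpec(value: unknown): value is WorkflowSpec {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { name?: unknown; steps?: unknown };
  if (typeof candidate.name !== "string") return false;
  if (!Array.isArray(candidate.steps)) return false;
  return candidate.steps.every((step: unknown) => {
    if (typeof step !== "object" || step === null) return false;
    const s = step as { verb?: unknown; args?: unknown };
    return typeof s.verb === "string" && typeof s.args === "object" && s.args !== null;
  });
}

// Serialize on one side (for example, a browser client) ...
const workflow: WorkflowSpec = {
  name: "example",
  steps: [{ verb: "filter", args: { column: "age", min: 18 } }],
};
const wire: string = JSON.stringify(workflow);

// ... then parse and validate on the other (for example, a server process).
const received: unknown = JSON.parse(wire);
const roundTripOk: boolean = isWorkflowSpec(received);
```

Because the instructions are plain JSON, any runtime that can parse JSON and validate it against the same schema can consume them, which is what allows separate JavaScript and Python implementations to be tested against shared cases.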

Individual documentation for the JavaScript and Python implementations can be found in their respective folders. Broad documentation about building pipelines and the available verbs is available in the docs folder.

We currently have six primary JavaScript packages:

react - this is a set of React components, one for each verb, that you can include in web apps to enable transformation pipeline building.
schema - this is a set of core types and associated JSONSchema definitions for formalizing our data package and resource models (including the definitions for table parsing, Codebooks, and Workflows).
tables - this is the primary set of utilities for loading and parsing data tables, using Arquero under the hood.
utilities - this is a set of helpers for working with files, etc., to ease building data wrangling applications.
webapp - this is the deployable DataShaper that includes all of the verb components and allows creation, execution, and saving of pipeline JSON files.
workflow - this is the primary engine for pipeline execution. It includes low-level operational primitives to execute a wide variety of relational algebra transformations over Arquero tables.
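
As a rough illustration of what executing a pipeline of verbs means, the sketch below applies three relational-algebra style steps (selection, projection, and rename) to an in-memory array of rows. It does not use the workflow package or Arquero; plain arrays stand in for tables, and the `Row`, `Step`, `applyStep`, and `runPipeline` names, as well as the step argument shapes, are assumptions made up for this example.

```typescript
// Plain objects stand in for table rows; the real engine operates on Arquero tables.
type Row = Record<string, string | number>;

// Hypothetical step shapes for this sketch only.
type Step =
  | { verb: "filter"; column: string; equals: string | number }
  | { verb: "select"; columns: string[] }
  | { verb: "rename"; from: string; to: string };

// Apply a single step, always returning a new array of new row objects.
function applyStep(rows: Row[], step: Step): Row[] {
  switch (step.verb) {
    case "filter": {
      // Selection: keep rows whose column equals the given value.
      const { column, equals } = step;
      return rows.filter((row) => row[column] === equals);
    }
    case "select": {
      // Projection: keep only the listed columns.
      const { columns } = step;
      return rows.map((row) => {
        const out: Row = {};
        for (const name of columns) {
          const value = row[name];
          if (value !== undefined) out[name] = value;
        }
        return out;
      });
    }
    case "rename": {
      // Rename: copy every column, swapping one column name.
      const { from, to } = step;
      return rows.map((row) => {
        const out: Row = {};
        for (const [key, value] of Object.entries(row)) {
          out[key === from ? to : key] = value;
        }
        return out;
      });
    }
  }
}

// Run the steps in order, feeding each step's output into the next.
function runPipeline(input: Row[], steps: Step[]): Row[] {
  return steps.reduce((rows, step) => applyStep(rows, step), input);
}

const people: Row[] = [
  { name: "Ada", age: 36, city: "London" },
  { name: "Grace", age: 45, city: "New York" },
  { name: "Alan", age: 41, city: "London" },
];

const londoners = runPipeline(people, [
  { verb: "filter", column: "city", equals: "London" },
  { verb: "select", columns: ["name", "age"] },
  { verb: "rename", from: "name", to: "person" },
]);
// londoners is [{ person: "Ada", age: 36 }, { person: "Alan", age: 41 }]
```

Keeping each step a pure function of its input rows makes the step list itself the only state that needs to be saved, which fits the description above of pipelines being stored as JSON files.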

Building

You need node and yarn installed
Run: yarn
Then: yarn build
Run the webapp locally: yarn start

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.