## Interested in contributing to MMLSpark?  We're excited to work with you.

### You can contribute in many ways:

-   Use the library and give feedback: report bugs, request features.
-   Add sample Jupyter notebooks, Python or Scala code examples, documentation
    pages.
-   Fix bugs and issues.
-   Add new features, such as data transformations or machine learning algorithms.
-   Review pull requests from other contributors.

### How to contribute?

You can give feedback, report bugs, and request new features at any time by
opening an issue.  You can also up-vote or comment on existing issues.

If you want to add code, examples or documentation to the repository, follow
this process:

#### Propose a contribution

-   Preferably, get started by tackling existing issues to get yourself acquainted
    with the library source and the process.
-   Open an issue, or comment on an existing issue, to discuss your contribution
    and its design.  This ensures your contribution is a good fit and doesn't
    duplicate ongoing work.
-   Any algorithm you're planning to contribute should be well known and accepted
    for production use, and backed by research papers.
-   Algorithms should be highly scalable and suitable for very large datasets.
-   All contributions need to comply with the MIT License.  Contributors external
    to Microsoft need to sign a Contributor License Agreement (CLA).

#### Implement your contribution

-   Fork the MMLSpark repository.
-   Implement your algorithm in Scala, using our wrapper generation mechanism to
    produce PySpark bindings.
-   Use SparkML `PipelineStage`s so your algorithm can be used as part of a
    pipeline.
-   Use `MMLParam`s for parameters.
-   Implement model saving and loading by extending SparkML `MLReadable`.
-   Use good Scala style.
-   Binary dependencies should be on Maven Central.
-   See this [pull request](https://github.com/Azure/mmlspark/pull/22) for an
    example contribution.
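
The points above can be sketched as a minimal pipeline stage.  This is only an
illustration, not code from the repository: `SuffixAppender` and its parameters
are hypothetical, it assumes Spark 2.x or later, and it uses plain SparkML
`Param`s because the `MMLParam` API is not described here.  Saving and loading
come from `DefaultParamsWritable` and `DefaultParamsReadable`, which extend
SparkML's `MLWritable` and `MLReadable`.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, concat, lit}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical stage: appends a fixed suffix to every value of a string column.
class SuffixAppender(override val uid: String)
    extends Transformer with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("SuffixAppender"))

  // Plain SparkML params (assumption: a real contribution would use MMLParams).
  val inputCol = new Param[String](this, "inputCol", "name of the input string column")
  val outputCol = new Param[String](this, "outputCol", "name of the output string column")
  val suffix = new Param[String](this, "suffix", "text appended to every input value")

  setDefault(suffix -> "_x")

  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)
  def setSuffix(value: String): this.type = set(suffix, value)
  def getSuffix: String = $(suffix)

  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema) // validate the input before doing any work
    dataset.withColumn($(outputCol), concat(col($(inputCol)), lit($(suffix))))
  }

  override def transformSchema(schema: StructType): StructType = {
    require(schema($(inputCol)).dataType == StringType,
      s"Column ${$(inputCol)} must be of type StringType")
    StructType(schema.fields :+ StructField($(outputCol), StringType, nullable = true))
  }

  override def copy(extra: ParamMap): SuffixAppender = defaultCopy(extra)
}

// The companion object provides SuffixAppender.load(path).
object SuffixAppender extends DefaultParamsReadable[SuffixAppender]
```

Because the class is a `Transformer`, it can be placed in a SparkML `Pipeline`
alongside other stages, and `write.save(path)` followed by
`SuffixAppender.load(path)` restores its parameters.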

#### Implement tests

-   Set up the build environment.  Use a Linux machine or VM (we use Ubuntu, but
    other distros should work too), and install the environment using the
    [`runme` script](runme).
-   Test your code locally.
-   Add tests using ScalaTest.  Unit tests are required.
-   A sample notebook is required as an end-to-end test.
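
As a rough illustration of the unit-test requirement, a ScalaTest suite might
look like the sketch below.  `TextCleaner` and its `normalize` method are
hypothetical stand-ins for the code under test, and the `AnyFunSuite` import
assumes ScalaTest 3.1 or later (older versions use `org.scalatest.FunSuite`).
The repository may provide its own test base classes; check existing tests
before copying this pattern.

```scala
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical helper under test; stands in for your contribution's logic.
object TextCleaner {
  // Trims the input, lower-cases it, and collapses runs of whitespace.
  def normalize(s: String): String =
    s.trim.toLowerCase.replaceAll("\\s+", " ")
}

class TextCleanerSuite extends AnyFunSuite {

  test("normalize trims, lower-cases, and collapses whitespace") {
    assert(TextCleaner.normalize("  Hello   World ") == "hello world")
  }

  test("normalize leaves an empty string unchanged") {
    assert(TextCleaner.normalize("") == "")
  }
}
```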

#### Implement documentation

-   Add a [sample Jupyter notebook](notebooks/samples) that shows the intended use
    case of your algorithm, with step-by-step instructions.  (The same notebook
    could be used for testing the code.)
-   Add in-line ScalaDoc comments to your source code, to generate the [API
    reference documentation](https://mmlspark.azureedge.net/docs/pyspark/).
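
For reference, in-line ScalaDoc comments generally take the shape sketched
below.  The `Scaling` object and its `minMaxScale` method are hypothetical and
exist only to show where the comment goes and which tags are commonly used.

```scala
/** Hypothetical numeric helpers, shown only to illustrate ScalaDoc layout. */
object Scaling {

  /** Rescales values linearly to the range [0, 1].
    *
    * If all values are equal, every output is 0.0.
    *
    * @param values the numbers to rescale; may be empty
    * @return a sequence of the same length with each value mapped to [0, 1]
    */
  def minMaxScale(values: Seq[Double]): Seq[Double] =
    if (values.isEmpty) Seq.empty
    else {
      val lo = values.min
      val hi = values.max
      if (hi == lo) values.map(_ => 0.0)
      else values.map(v => (v - lo) / (hi - lo))
    }
}
```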

#### Open a pull request

-   In most cases, you should squash your commits into one.
-   Open a pull request, and link it to the discussion issue you created earlier.
-   An MMLSpark core team member will trigger a build to test your changes.
-   Fix any build failures.  (The pull request will have comments from the build
    with useful links.)
-   Wait for code reviews from core team members and others.
-   Fix issues found in code review, and iterate until the review passes.

#### Build and check-in

-   Wait for a core team member to merge your code in.
-   Your feature will be available through a Docker image and script installation
    in the next release, which typically happens around once a month.  You can try
    out your features sooner by using build artifacts for the version that has
    your changes merged in (such versions end with `.devN`).

If in doubt about how to do something, see how it was done in existing code or
pull requests, and don't hesitate to ask.