Use Issues for Writing Design Proposals & Move Away from Design Proposal PRs (#340)

Rahul Potharaju 2021-01-28 16:16:20 -08:00, committed by GitHub
Parent 053b6b1e00
Commit b64ef850b3
No known key found for this signature
GPG Key ID: 4AEE18F83AFDEB23
8 changed files with 33 additions and 449 deletions

.github/ISSUE_TEMPLATE/design-template.md (32 changes)

@@ -7,14 +7,36 @@ assignees: ''
---
**Describe the problem**
## Problem Statement
A clear and concise description of what the problem is e.g., *I have this scenario and Hyperspace does not work [...]*
**Describe your proposed solution**
A clear and concise sketch of the solution you want to propose.
## Background and Motivation
An introduction of the necessary background and the problem being solved by the proposed change.
**Describe alternatives you've considered**
## Proposed Solution
A clear and concise sketch of the solution you want to propose. Please keep this short. There is a section below for writing a detailed solution.
## Alternatives
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
## Known/Potential Compatibility Issues
A discussion of the change with regard to the compatibility guidelines.
## Design
A description of the proposed design/algorithm. This should include a discussion of how the work fits into Hyperspace's roadmap.
## Implementation
A description of the steps in the implementation, who will do them, and when.
## Performance Implications (if applicable)
A discussion of impact on performance and any corner cases that the author is aware of. If there is a negative impact on performance, please make sure to capture an issue in the next section. This section may be omitted if there are none.
**Open issues (if applicable)**
A discussion of issues relating to this proposal for which the author does not know the solution. If you have already opened the corresponding issues, please link to them here. This section may be omitted if there are none.
- This is the first issue (issue-link)
- This is the second issue (issue-link)
- ...
**Additional context (if applicable)**
Add any other context or screenshots about the feature request here.


@@ -31,31 +31,18 @@ To learn about the motivation behind Hyperspace, see the talk [Hyperspace: An In
The process outlined below is for reviewing a proposal and reaching a decision about whether to accept/decline a proposal.
1. The proposal author [creates a brief issue](https://github.com/microsoft/hyperspace/issues/new?assignees=&labels=untriaged%2C+proposal&template=design-template.md&title=%5BPROPOSAL%5D%3A+) describing the proposal.
> **Note:** There is no need for a design document at this point.
1. The proposal author [creates a design proposal issue](https://github.com/microsoft/hyperspace/issues/new?assignees=&labels=untriaged%2C+proposal&template=design-template.md&title=%5BPROPOSAL%5D%3A+) describing the proposal.
2. A discussion on the issue will aim to triage the proposal into one of three outcomes:
- Accept proposal
- Decline proposal
- Ask for a design doc
If the proposal is accepted/declined, the process is done. Otherwise, the discussion is expected to identify concerns that should be addressed in a more detailed design document.
3. The proposal author [writes a design doc](#writing-a-design-document) to work out details of the proposed design and address the concerns raised in the initial discussion.
4. Once comments and revisions on the design document are complete, there is a final discussion on the issue to reach one of two outcomes:
- Ask for more details
If the proposal is accepted/declined, the process is done. Otherwise, the discussion is expected to identify concerns that should be addressed by updating the proposal.
3. Once comments and revisions on the design proposal issue are complete, there is a final discussion on the issue to reach one of two outcomes:
- Accept proposal
- Decline proposal
After the proposal is accepted or declined (e.g., after Step 2 or Step 4), implementation work proceeds in the same way as any other contribution.
> **Tip:** If you are an experienced committer and are certain that a design doc will be required for a particular proposal, you can skip Step 2 and just include the doc PR with the initial issue.
### Writing a Design Document
As noted [above](#the-proposal-process), some (but not all) proposals need to be elaborated in a design document.
- The design document should follow the template outlined [here](./docs/design/TEMPLATE.md) and must be named as `docs/design/GITHUB-ISSUE-NUMBER-shortname.md`.
> Note: To obtain the `GITHUB-ISSUE-NUMBER`, you must first open a GitHub issue. Since you are reading this section, it is assumed that you have already gone through an initial round of discussion on the issue and were explicitly asked to write a design document.
- Once you have the document ready and have addressed any specific concerns raised during the initial discussion, please open a PR.
- Address any additional feedback/questions and update your PR as needed. New design doc authors may be paired with a design doc *shepherd* to help work on the doc.
- Once all the comments are addressed, you can check in the design doc. The design doc is expected to go through multiple checked-in revisions, so feel free to open subsequent PRs to update it or add more information.
After the proposal is accepted or declined, implementation work proceeds in the same way as any other contribution.
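The issue link in step 1 pre-fills the new-issue form through its query string. As an illustrative sketch (it uses only Python's standard `urllib.parse`; the variable names are introduced here and are not part of the repository), decoding the link shows which labels, template, and title prefix it applies:

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from step 1 of the proposal process.
ISSUE_URL = (
    "https://github.com/microsoft/hyperspace/issues/new"
    "?assignees=&labels=untriaged%2C+proposal"
    "&template=design-template.md&title=%5BPROPOSAL%5D%3A+"
)

# keep_blank_values=True retains the empty `assignees` parameter.
params = parse_qs(urlsplit(ISSUE_URL).query, keep_blank_values=True)

# Each parameter appears once, so take the first value of each list.
prefill = {key: values[0] for key, values in params.items()}

for key, value in prefill.items():
    print(f"{key!r}: {value!r}")
```

The decoded values indicate that a new proposal issue starts with the `untriaged` and `proposal` labels, the `design-template.md` template, and a title beginning with `[PROPOSAL]: `.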
### Proposal Review


@@ -1,180 +0,0 @@
# Proposal: Incremental Index Maintenance for File/Partition Mutable Datasets
Discussion at https://github.com/microsoft/hyperspace/issues/136.
## Abstract
[A short summary of the proposal.]
## Background
Hyperspace supports indexing immutable datasets. When the underlying data
changes, users have to invoke `refresh` which will fully rebuild the index.
While this approach is simple and clean, it is not scalable, especially
for large datasets.
## Proposal
In this design document, we propose an enhancement to the existing `refresh`
API in Hyperspace which allows users to perform *incremental maintenance*
of their indexes i.e., ways to avoid a full index rebuild.
## Rationale
[A discussion of alternate approaches and the trade offs, advantages, and disadvantages of the specified approach.]
TBD
## Compatibility
[A discussion of the change with regard to the
[compatibility guidelines](../../COMPATIBILITY.md).]
TBD
## Design
This design directly corresponds to Tracks 2, 3, and 4 from [Hyperspace's roadmap](../ROADMAP.md).
<table>
<thead>
<tr>
<th></th>
<th></th>
<th><b>Full Rebuild</b></th>
<th><b>Read Optimized (Quick Query)</b></th>
<th><b>Write Optimized (Fast Refresh)</b></th>
</tr>
</thead>
<tbody>
<tr>
<td><b>Append</b></td>
<td><b>Characteristic</b></td>
<td>Slowest refresh/fastest query</td>
<td>Slow refresh/fast query</td>
<td>Fast refresh/slow query</td>
</tr>
<tr>
<td></td>
<td><b>API</b></td>
<td>hs.refreshIndex(mode="full")</td>
<td>hs.refreshIndex(mode="smart")</td>
<td><b>**hs.refreshIndex(mode="quick")**</b></td>
</tr>
<tr>
<td></td>
<td><b>What it does?</b></td>
<td>Will rebuild the entire index by scanning the underlying source data</td>
<td>Will build index on newly added data and also optimizes on-the-fly small index files</td>
<td>Will build index ONLY on newly added data</td>
</tr>
<tr>
<td></td>
<td><b>When to use?</b></td>
<td>Underlying source data is relatively stable</td>
<td>Frequently appending new data</td>
<td>Infrequently appending new data</td>
</tr>
<tr>
<td colspan="5"></td>
</tr>
<tr>
<td><b>Delete</b></td>
<td><b>Characteristic</b></td>
<td rowspan="4">Same as above; <br><br>Creates a new index (and consequently, reshuffles the source data)</td>
<td>Slow refresh/fast query</td>
<td>Fast refresh/slow query</td>
</tr>
<tr>
<td></td>
<td><b>API</b></td>
<td>hs.refreshIndex(mode="smart")</td>
<td><b>**hs.refreshIndex(mode="quick")**</b></td>
</tr>
<tr>
<td></td>
<td><b>What it does?</b></td>
<td>
<ul>
<li>Deletes entries from index immediately</li>
<li>DOES NOT shuffle the underlying source data</li>
<li>Operates on lineage</li>
</ul>
</td>
<td>Captures file/partition predicates and deletes entries at query time</td>
</tr>
<tr>
<td></td>
<td><b>When to use?</b></td>
<td>Lots of underlying data getting deleted</td>
<td>Little data getting removed from the underlying data</td>
</tr>
<tr>
<td colspan="5"></td>
</tr>
<tr>
<td><b>Optimize</b></td>
<td></td>
<td></td>
<td><b>Faster Optimize Speed (Quick)</b></td>
<td><b>Slower Optimize Speed (Full)</b></td>
</tr>
<tr>
<td></td>
<td><b>API</b></td>
<td></td>
<td>hs.optimizeIndex(mode="quick")</td>
<td><b>**hs.optimizeIndex(mode="full")**</b></td>
</tr>
<tr>
<td></td>
<td><b>What it does?</b></td>
<td></td>
<td>
<ul>
<li>Changes the physical layout of the index to improve perf but across multiple DELTA indexes (i.e., d___=x index directories)</li>
<li>May have multiple files per bucket which means it does a best-effort merge of small files</li>
<li>DOES NOT refresh the index</li>
</ul>
</td>
<td>
<ul>
<li>Changes the physical layout of the index to improve perf</li>
<li>Create a single file per bucket by merging both small and large files</li>
<li>DOES NOT refresh the index</li>
</ul>
</td>
</tr>
<tr>
<td></td>
<td><b>When to use?</b></td>
<td></td>
<td>When perf starts degrading</td>
<td>When perf starts degrading</td>
</tr>
<tr>
<td colspan="5">Legend: <b>**DEFAULT**</b></td>
</tr>
</tbody>
</table>
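The "When to use?" rows of the table above can be read as a small decision rule. The following is a minimal Python sketch of that reading, not Hyperspace code: the function `choose_refresh_mode` and its two boolean parameters are hypothetical names introduced for illustration, and only the mode strings (`"full"`, `"smart"`, `"quick"`) and the trade-off descriptions come from the proposal.

```python
# Hypothetical helper: encodes one reading of the "When to use?" rows of the
# proposal's table. It does not call Hyperspace; only the mode strings and
# trade-off descriptions are taken from the proposal.

# Trade-offs as stated in the "Characteristic" row for appends.
TRADE_OFFS = {
    "full": "slowest refresh/fastest query",
    "smart": "slow refresh/fast query",
    "quick": "fast refresh/slow query",
}


def choose_refresh_mode(source_is_stable: bool, heavy_changes: bool) -> str:
    """Pick a mode string for the proposed refreshIndex API.

    source_is_stable: the underlying source data is relatively stable.
    heavy_changes: data is appended frequently or deleted in large amounts.
    """
    if source_is_stable:
        return "full"
    if heavy_changes:
        return "smart"
    # "quick" is marked as the default in the table's legend.
    return "quick"


if __name__ == "__main__":
    for stable, heavy in [(True, False), (False, True), (False, False)]:
        mode = choose_refresh_mode(stable, heavy)
        print(f"stable={stable}, heavy={heavy} -> {mode} ({TRADE_OFFS[mode]})")
```

Falling through to `"quick"` mirrors the table's legend, which marks that mode as the default for both appends and deletes.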
## Implementation
[A description of the steps in the implementation, who will do them, and when.]
> Note: If you want to use any images, please upload both the .svg and the .png/.jpg files to `/docs/design/img/` and link to them here.
## Impact on Performance (if applicable)
[A discussion of impact on performance and any corner cases that the author is aware of. If there is a negative impact on performance, please make sure
to capture an issue in the next section. This section may be omitted if there are none.]
## Open issues (if applicable)
[A discussion of issues relating to this proposal for which the author does not
know the solution. If you have already opened the corresponding issues, please link
to them here. This section may be omitted if there are none.]
- This is the first issue ([issue-link]())
- This is the second issue ([issue-link]())
- ...


@@ -1,64 +0,0 @@
# Proposal: Hybrid Scan for File/Partition Mutable Datasets
Discussion at https://github.com/microsoft/hyperspace/issues/148.
## Abstract
[A short summary of the proposal.]
## Background
Hyperspace supports indexing immutable and mutable unmanaged datasets.
For managed datasets such as Delta Lake, further optimizations are possible.
For instance, instead of scanning the data lake for detecting changes to
the source data, we can "peek" into the transaction log of these systems
to determine and detect changes.
## Proposal
In this design document, we propose an end-to-end solution for supporting
managed tables such as [Delta Lake](https://delta.io).
## Rationale
[A discussion of alternate approaches and the trade offs, advantages, and disadvantages of the specified approach.]
TBD
## Compatibility
[A discussion of the change with regard to the
[compatibility guidelines](../../COMPATIBILITY.md).]
TBD
## Design
This design directly corresponds to Tracks 2, 3, and 4 from [Hyperspace's roadmap](../ROADMAP.md).
### Overview
### Change Detection
### Handling Appends and Deletes
## Implementation
[A description of the steps in the implementation, who will do them, and when.]
> Note: If you want to use any images, please upload both the .svg and the .png/.jpg files to `/docs/design/img/` and link to them here.
## Impact on Performance (if applicable)
[A discussion of impact on performance and any corner cases that the author is aware of. If there is a negative impact on performance, please make sure
to capture an issue in the next section. This section may be omitted if there are none.]
## Open issues (if applicable)
[A discussion of issues relating to this proposal for which the author does not
know the solution. If you have already opened the corresponding issues, please link
to them here. This section may be omitted if there are none.]
- This is the first issue ([issue-link]())
- This is the second issue ([issue-link]())
- ...


@@ -1,67 +0,0 @@
# Proposal: Hybrid Scan for File/Partition Mutable Datasets
Discussion at https://github.com/microsoft/hyperspace/issues/150.
## Abstract
[A short summary of the proposal.]
## Background
Hyperspace supports indexing immutable datasets. When the underlying data
changes, users have to invoke `refresh` which will either fully or
incrementally rebuild the index. While these approaches are simple and clean,
for relatively small operations on the underlying datasets, it is cumbersome
for the user to remember to invoke index maintenance operations (failing which,
Hyperspace disables index usage).
## Proposal
In this design document, we propose an enhancement to the existing Hyperspace
Optimization process through a technique called **Hybrid Scan**. Hybrid Scan
allows users to continue benefiting from indexes even when their underlying
data changes, without having to manually invoke index maintenance operations.
## Rationale
[A discussion of alternate approaches and the trade offs, advantages, and disadvantages of the specified approach.]
TBD
## Compatibility
[A discussion of the change with regard to the
[compatibility guidelines](../../COMPATIBILITY.md).]
TBD
## Design
This design directly corresponds to Tracks 2, 3, and 4 from [Hyperspace's roadmap](../ROADMAP.md).
### Overview
### Change Detection
### Hybrid Scan Strategy
## Implementation
[A description of the steps in the implementation, who will do them, and when.]
> Note: If you want to use any images, please upload both the .svg and the .png/.jpg files to `/docs/design/img/` and link to them here.
## Impact on Performance (if applicable)
[A discussion of impact on performance and any corner cases that the author is aware of. If there is a negative impact on performance, please make sure
to capture an issue in the next section. This section may be omitted if there are none.]
## Open issues (if applicable)
[A discussion of issues relating to this proposal for which the author does not
know the solution. If you have already opened the corresponding issues, please link
to them here. This section may be omitted if there are none.]
- This is the first issue ([issue-link]())
- This is the second issue ([issue-link]())
- ...


@@ -1,61 +0,0 @@
[This is a template for Hyperspace's design proposal process, documented [here](../../CONTRIBUTING.md).]
# Proposal: [Title]
Discussion at https://github.com/microsoft/hyperspace/XXXXX.
## Abstract
[A short summary of the proposal.]
## Background
[An introduction of the necessary background and the problem being solved by the proposed change.]
## Proposal
[A precise statement of the proposed change.]
## Rationale
[A discussion of alternate approaches and the trade offs, advantages, and disadvantages of the specified approach.]
## Compatibility
[A discussion of the change with regard to the
[compatibility guidelines](../../COMPATIBILITY.md).]
## Design
[A description of the proposed design/algorithm. This should include a discussion of how the work fits
into [Hyperspace's roadmap](../ROADMAP.md).]
> Note: If you want to use any images, please upload both the .svg and the .png/.jpg files to `/docs/design/img/` and link to them here.
Here's a sample image:
<img src="./img/sample.png"
alt="Sample image"
style="float: center; margin-right: 10px;"
width= "100px" />
## Implementation
[A description of the steps in the implementation, who will do them, and when.]
> Note: If you want to use any images, please upload both the .svg and the .png/.jpg files to `/docs/design/img/` and link to them here.
## Impact on Performance (if applicable)
[A discussion of impact on performance and any corner cases that the author is aware of. If there is a negative impact on performance, please make sure
to capture an issue in the next section. This section may be omitted if there are none.]
## Open issues (if applicable)
[A discussion of issues relating to this proposal for which the author does not
know the solution. If you have already opened the corresponding issues, please link
to them here. This section may be omitted if there are none.]
- This is the first issue ([issue-link]())
- This is the second issue ([issue-link]())
- ...

docs/design/img/sample.png
Binary file not shown. Size before: 120 KiB.

File diff suppressed because one or more lines are too long. Size before: 34 KiB.