Updates to Readme and Roadmap (#48)
This commit is contained in:
Родитель
5cda89c23b
Коммит
49586b2e08
17
README.md
17
README.md
|
@ -22,11 +22,18 @@ Please review our [contribution guide](CONTRIBUTING.md).
|
|||
|
||||
This project would not have been possible without the outstanding work from the following communities:
|
||||
|
||||
- [Apache Spark](https://spark.apache.org/): Unified Analytics Engine for Big Data, the engine that Hyperspace builds on top of.
|
||||
- [Delta Lake](https://delta.io): Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and big data workloads. Hyperspace derives quite a bit of inspiration from the way the Delta Lake community operates, their novel use of optimistic concurrency.
|
||||
- [Databricks](https://databricks.com/): Unified analytics platform. Many thanks to all the inspiration they have provided us.
|
||||
- [.NET for Apache Spark™](https://github.com/dotnet/spark): Hyperspace offers .NET bindings for developers, thanks to the efforts of this team in collaborating and releasing the bindings just-in-time.
|
||||
- [Minimal Mistakes](https://github.com/mmistakes/minimal-mistakes): The awesome theme behind Hyperspace documentation.
|
||||
- [Apache Spark](https://spark.apache.org/): Unified Analytics Engine for Big Data, the engine that
|
||||
Hyperspace builds on top of.
|
||||
- [Delta Lake](https://delta.io): Delta Lake is an open-source storage layer that brings ACID
|
||||
transactions to Apache Spark™ and big data workloads. Hyperspace derives quite a bit of inspiration
|
||||
from the way the Delta Lake community operates and pioneering of some surrounding ideas in the
|
||||
context of data lakes (e.g., their novel use of optimistic concurrency).
|
||||
- [Databricks](https://databricks.com/): Unified analytics platform. Many thanks to all the inspiration
|
||||
they have provided us.
|
||||
- [.NET for Apache Spark™](https://github.com/dotnet/spark): Hyperspace offers .NET bindings for
|
||||
developers, thanks to the efforts of this team in collaborating and releasing the bindings just-in-time.
|
||||
- [Minimal Mistakes](https://github.com/mmistakes/minimal-mistakes): The awesome theme behind
|
||||
Hyperspace documentation.
|
||||
|
||||
## Code of Conduct
|
||||
|
||||
|
|
91
ROADMAP.md
91
ROADMAP.md
|
@ -1,11 +1,94 @@
|
|||
# Hyperspace Roadmap
|
||||
|
||||
TODO
|
||||
This document defines a high level roadmap for Hyperspace development and upcoming releases.
|
||||
Community and contributor involvement is vital for successfully implementing all desired
|
||||
items for each release. We hope that the items listed below will inspire further engagement
|
||||
from the community to keep Hyperspace progressing and shipping exciting and valuable features.
|
||||
|
||||
**Note**: Any dates listed below and the specific issues that will ship in a given milestone
|
||||
are subject to change but should give a general idea of what we are planning. We use the
|
||||
milestone feature in Github so look there for the most up-to-date and issue plan.
|
||||
|
||||
If you see a problem on this, or something is not clear, please consider
|
||||
[opening an issue](https://github.com/microsoft/hyperspace/issues).
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Zooming Out](#zooming-out)
|
||||
- [Short-term](#short-term)
|
||||
- [Long-term](#long-term)
|
||||
|
||||
## Zooming Out
|
||||
|
||||
Before we discuss the short and long term plans, it is useful to look at the big picture.
|
||||
The figure below shows the various investments in Hyperspace.
|
||||
|
||||
![Icon](https://github.com/rapoth/hyperspace/blob/master/docs/assets/images/hyperspace-roadmap.png?raw=true)
|
||||
|
||||
In short, there are **six** parallel tracks of work happening within Hyperspace. While we use the
|
||||
word *track* to imply that work can happen mostly independently without depending on design
|
||||
decisions from the other tracks, should the community notice such a hard dependency, we should
|
||||
work towards making the tracks independent, to the extent possible.
|
||||
|
||||
The best way to understand the tracks shown in the figure is:
|
||||
- **Track 1**: This is the foundation where we ensure that the meta-data we are
|
||||
storing is enough to capture all the context for an index. For instance, there are a number
|
||||
of questions that pop up when using indexes **instead** of the original base data:
|
||||
- Is the index up-to-date?
|
||||
- Were there any transformations applied?
|
||||
- **Tracks 2,3,4**: These tracks deal with the immutability aspect of the underlying
|
||||
data. The current focus of Hyperspace is on supporting indexing for *immutable datasets*
|
||||
(meaning that the only way to refresh the index would be to rebuild it in full). In
|
||||
the upcoming months, we hope to focus on the other aspects allowing us to index data
|
||||
that is getting updated (either append-only, or updated like how it happens in Delta Lake).
|
||||
- **Track 5**: One of the most frequently asked questions with indexing subsystems is
|
||||
*what to index?*. This track of Hyperspace focuses on finding a reasonable answer to this
|
||||
question. The current focus is on rule-based recommender systems, which are simple
|
||||
and effective, although not the best. Subsequently, the focus of Hyperspace would explore
|
||||
more sophisticated approaches like hypothetical indexes.
|
||||
- **Track 6**: Since indexes are a relatively new concept to data lakes, there are lots
|
||||
of gotchas in terms of how one would create and maintain them. In addition, there are
|
||||
lots of design decisions that go into supporting an index. An important challenge that
|
||||
Hyperspace primarily deals with is ensuring there is a balance between customizability
|
||||
and simplicity. Hyperspace chooses to provide simple APIs, with options to customize.
|
||||
This aspect of Hyperspace focuses on documenting all the surrounding concepts and
|
||||
writing accessible tutorials for users.
|
||||
|
||||
We will refer to each of these tracks in the format **T<num>** e.g., T1, where applicable.
|
||||
|
||||
## Short Term
|
||||
|
||||
TODO
|
||||
In the short-term (i.e., next 3-6 months), the focus of Hyperspace would be on work that
|
||||
spans the following categories:
|
||||
|
||||
## Longer Term
|
||||
- **Bug fixes** - We **do not** recommend using Hyperspace in production. However, we do
|
||||
encourage trying it out on your workloads and telling us what you like vs. what you would
|
||||
like to see. Our primary focus is on fixing any usage bugs that are reported in real-world
|
||||
usage.
|
||||
- **Stability improvements** - Includes aspects such as challenging the API design to ensure
|
||||
it is robust enough for evolving to support other index types, ensuring backward/forward
|
||||
compatibility of the meta-data (**T1**) and verifying that Hyperspace works correctly and consistently.
|
||||
- **Optimizer enhancements** - Hyperspace implements optimizer rules to perform index matching.
|
||||
It may be possible that there is definitely scope of improving these rules to achieve better
|
||||
optimizations.
|
||||
- **Robust support for Immutable (T2) & Append-only (T2) Datasets** - Includes aspects such
|
||||
as providing incremental indexing support with the necessary foundational APIs for optimizations.
|
||||
|
||||
TODO
|
||||
## Long Term
|
||||
|
||||
The long-term (i.e., next 6-12 months), the focus of Hyperspace would be on work that validates the
|
||||
underlying idea, obtaining community feedback on figuring out ways to decentralize the development
|
||||
of Hyperspace (e.g., across tracks) to make consistent progress. That being said, the immediate
|
||||
next focus would be on work that spans the following categories:
|
||||
|
||||
- **Index Recommendation** - To make Hyperspace immediately usable, it is critical to invest
|
||||
in index recommendation (even if it is a simpler rule-based engine). This aspect of Hyperspace
|
||||
would also include necessary work for doing cost-benefit analysis and laying down the necessary
|
||||
foundation.
|
||||
- **Robust support for Updateable (T3) Datasets** - Users are increasingly resorting to using
|
||||
engines such as Delta Lake and Hudi to deal with update semantics on data lakes. Hyperspace
|
||||
intends to first explore the cost of providing support for these engines and potentially
|
||||
providing first-class support.
|
||||
- **More index types** - Hyperspace implements support for one specific type of an index called
|
||||
the covering index. The long-term focus of Hyperspace would entail supporting other kinds of
|
||||
index types (or hyperspaces).
|
||||
|
|
Двоичный файл не отображается.
После Ширина: | Высота: | Размер: 359 KiB |
Двоичные данные
docs/assets/images/hyperspace-small-banner.png
Двоичные данные
docs/assets/images/hyperspace-small-banner.png
Двоичный файл не отображается.
До Ширина: | Высота: | Размер: 66 KiB После Ширина: | Высота: | Размер: 66 KiB |
Загрузка…
Ссылка в новой задаче