# Hyperspace Roadmap
This document defines a high level roadmap for Hyperspace development and upcoming releases.
Community and contributor involvement is vital for successfully implementing all desired
items for each release. We hope that the items listed below will inspire further engagement
from the community to keep Hyperspace progressing and shipping exciting and valuable features.

**Note**: Any dates listed below, and the specific issues that will ship in a given milestone,
are subject to change but should give a general idea of what we are planning. We use the
milestone feature in GitHub, so look there for the most up-to-date issue plan.
If you spot a problem in this document, or something is not clear, please consider
[opening an issue](https://github.com/microsoft/hyperspace/issues).
## Table of Contents
- [Zooming Out](#zooming-out)
- [Short-term](#short-term)
- [Long-term](#long-term)
## Zooming Out
Before we discuss the short and long term plans, it is useful to look at the big picture.
The figure below shows the various investments in Hyperspace.

![Icon](https://github.com/rapoth/hyperspace/blob/master/docs/assets/images/hyperspace-roadmap.png?raw=true)

In short, there are **six** parallel tracks of work happening within Hyperspace. We use the
word *track* to imply that work can happen mostly independently, without depending on design
decisions from the other tracks. Should the community notice a hard dependency between tracks,
we should work towards making them independent, to the extent possible.
The best way to understand the tracks shown in the figure is:
- **Track 1**: This is the foundation where we ensure that the meta-data we are
  storing is enough to capture all the context for an index. For instance, there are a number
  of questions that pop up when using indexes **instead** of the original base data:
  - Is the index up-to-date?
  - Were there any transformations applied?
- **Tracks 2, 3, 4**: These tracks deal with the immutability aspect of the underlying
data. The current focus of Hyperspace is on supporting indexing for *immutable datasets*
(meaning that the only way to refresh the index is to rebuild it in full). In
the upcoming months, we hope to focus on the other aspects, allowing us to index data
that is being updated (either append-only, or updated in place as in Delta Lake).
- **Track 5**: One of the most frequently asked questions about indexing subsystems is
*what to index?* This track of Hyperspace focuses on finding a reasonable answer to this
question. The current focus is on rule-based recommender systems, which are simple
and effective, although not optimal. Subsequently, Hyperspace will explore
more sophisticated approaches such as hypothetical indexes.
- **Track 6**: Since indexes are a relatively new concept to data lakes, there are lots
of gotchas in terms of how one would create and maintain them. In addition, there are
lots of design decisions that go into supporting an index. An important challenge that
Hyperspace primarily deals with is ensuring there is a balance between customizability
and simplicity. Hyperspace chooses to provide simple APIs, with options to customize.
This aspect of Hyperspace focuses on documenting all the surrounding concepts and
writing accessible tutorials for users.

We will refer to each of these tracks in the format **T\<num\>**, e.g., T1, where applicable.
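
To make the Track 1 questions concrete, here is a minimal sketch of how stored meta-data could answer "is the index up-to-date?". It assumes the index records a fingerprint of the source file listing at build time. The names `IndexMetadata`, `fingerprint`, and `is_up_to_date` are hypothetical and are not Hyperspace's actual schema or API.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical description of one source file: (path, size in bytes, modification time).
FileInfo = Tuple[str, int, int]


def fingerprint(files: Iterable[FileInfo]) -> str:
    """Hash the sorted file listing so any added, removed, or changed file alters the result."""
    digest = hashlib.sha256()
    for path, size, mtime in sorted(files):
        digest.update(f"{path}|{size}|{mtime}\n".encode("utf-8"))
    return digest.hexdigest()


@dataclass(frozen=True)
class IndexMetadata:
    """Illustrative record; the field names are assumptions, not Hyperspace's schema."""
    name: str
    indexed_columns: Tuple[str, ...]
    source_fingerprint: str                 # fingerprint of the data when the index was built
    transformations: Tuple[str, ...] = ()   # e.g. filters applied before indexing


def is_up_to_date(meta: IndexMetadata, current_files: Iterable[FileInfo]) -> bool:
    """The index is current only if the source listing still hashes to the recorded value."""
    return meta.source_fingerprint == fingerprint(current_files)


files_at_build = [("part-0.parquet", 1024, 100), ("part-1.parquet", 2048, 100)]
meta = IndexMetadata("idx_orders", ("order_id",), fingerprint(files_at_build))

# Listing order does not matter; a newly appended file does.
unchanged = is_up_to_date(meta, list(reversed(files_at_build)))
after_append = is_up_to_date(meta, files_at_build + [("part-2.parquet", 512, 200)])
```

Recording the applied transformations next to the fingerprint lets the same record answer the second question as well.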
## Short Term
In the short term (i.e., the next 3-6 months), the focus of Hyperspace will be on work that
spans the following categories:
- **Bug fixes** - We **do not** recommend using Hyperspace in production. However, we do
encourage trying it out on your workloads and telling us what you like vs. what you would
like to see. Our primary focus is on fixing any bugs that are reported from real-world
usage.
- **Stability improvements** - Includes aspects such as challenging the API design to ensure
it is robust enough for evolving to support other index types, ensuring backward/forward
compatibility of the meta-data (**T1**) and verifying that Hyperspace works correctly and consistently.
- **Optimizer enhancements** - Hyperspace implements optimizer rules to perform index matching.
There is likely scope for improving these rules to achieve better
optimizations.
- **Robust support for Immutable (T2) & Append-only (T2) Datasets** - Includes aspects such
as providing incremental indexing support with the necessary foundational APIs for optimizations.
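
As an illustration of what incremental indexing could mean for append-only data, the sketch below indexes only the files it has not seen before instead of rebuilding in full. It is a toy in-memory model, and `IncrementalIndex` and `refresh` are hypothetical names, not Hyperspace APIs.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory stand-ins: a "dataset" maps file name -> rows,
# and each row is a (key, payload) tuple.
Row = Tuple[int, str]
Dataset = Dict[str, List[Row]]


class IncrementalIndex:
    """Toy index for append-only data: key -> list of (file, row position)."""

    def __init__(self) -> None:
        self.entries: Dict[int, List[Tuple[str, int]]] = {}
        self.indexed_files: Set[str] = set()

    def refresh(self, dataset: Dataset) -> List[str]:
        """Index only files not seen before; return the files processed in this call."""
        new_files = sorted(set(dataset) - self.indexed_files)
        for name in new_files:
            for position, (key, _payload) in enumerate(dataset[name]):
                self.entries.setdefault(key, []).append((name, position))
            self.indexed_files.add(name)
        return new_files


data: Dataset = {"f1": [(1, "a"), (2, "b")]}
index = IncrementalIndex()
first = index.refresh(data)          # initial build: processes f1

data["f2"] = [(2, "c"), (3, "d")]    # an append adds a new file; old files never change
second = index.refresh(data)         # incremental refresh: processes only f2
```

The sketch relies on the append-only guarantee: because existing files never change, entries built from them remain valid and only new files need work.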
## Long Term
In the long term (i.e., the next 6-12 months), the focus of Hyperspace will be on work that validates the
underlying idea and on obtaining community feedback to figure out ways to decentralize the development
of Hyperspace (e.g., across tracks) so that it makes consistent progress. That being said, the immediate
next focus will be on work that spans the following categories:
- **Index Recommendation** - To make Hyperspace immediately usable, it is critical to invest
in index recommendation (even if it is a simpler rule-based engine). This aspect of Hyperspace
also includes the work needed for cost-benefit analysis and for laying down the necessary
foundation.
- **Robust support for Updateable (T3) Datasets** - Users are increasingly turning to
engines such as Delta Lake and Hudi to deal with update semantics on data lakes. Hyperspace
intends to first explore the cost of supporting these engines and then, potentially,
to provide first-class support.
- **More index types** - Hyperspace currently supports one specific type of index, called
the covering index. The long-term focus of Hyperspace entails supporting other kinds of
index types (or hyperspaces).
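
For readers new to the term, a covering index stores every column a query needs, so the query can be answered from the index alone. The sketch below models that check under a simplified matching rule chosen for illustration. `CoveringIndex`, `covers`, and `pick_index` are hypothetical names, and the rule is an assumption rather than Hyperspace's actual optimizer logic.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class CoveringIndex:
    """Illustrative model: columns the index is organized by, plus extra columns it stores."""
    name: str
    indexed_columns: FrozenSet[str]
    included_columns: FrozenSet[str]

    def covers(self, filter_columns: Iterable[str], output_columns: Iterable[str]) -> bool:
        """True if the query can be answered from the index alone.

        Simplified rule (an assumption for this sketch): the filter must touch at least
        one indexed column, and every referenced column must be stored in the index.
        """
        filters = set(filter_columns)
        referenced = filters | set(output_columns)
        stored = self.indexed_columns | self.included_columns
        return bool(filters & self.indexed_columns) and referenced <= stored


def pick_index(indexes: Iterable[CoveringIndex],
               filter_columns: Iterable[str],
               output_columns: Iterable[str]) -> Optional[CoveringIndex]:
    """Return the first index that covers the query, or None to fall back to the base data."""
    filters, outputs = list(filter_columns), list(output_columns)
    for index in indexes:
        if index.covers(filters, outputs):
            return index
    return None


idx = CoveringIndex("idx_orders", frozenset({"customer_id"}), frozenset({"amount"}))

# SELECT amount ... WHERE customer_id = ?  -> answerable from the index
hit = pick_index([idx], ["customer_id"], ["amount"])
# SELECT amount, status ...                -> "status" is not stored, so no match
miss = pick_index([idx], ["customer_id"], ["amount", "status"])
```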