2020-06-15 20:33:04 +03:00
|
|
|
# Hyperspace Roadmap
|
|
|
|
|
2020-06-24 20:04:48 +03:00
|
|
|
This document defines a high level roadmap for Hyperspace development and upcoming releases.
|
|
|
|
Community and contributor involvement is vital for successfully implementing all desired
|
|
|
|
items for each release. We hope that the items listed below will inspire further engagement
|
|
|
|
from the community to keep Hyperspace progressing and shipping exciting and valuable features.
|
|
|
|
|
|
|
|
**Note**: Any dates listed below and the specific issues that will ship in a given milestone
|
|
|
|
are subject to change but should give a general idea of what we are planning. We use the
|
|
|
|
milestone feature in Github so look there for the most up-to-date and issue plan.
|
|
|
|
|
|
|
|
If you see a problem on this, or something is not clear, please consider
|
|
|
|
[opening an issue](https://github.com/microsoft/hyperspace/issues).
|
|
|
|
|
|
|
|
## Table of Contents
|
|
|
|
|
|
|
|
- [Zooming Out](#zooming-out)
|
|
|
|
- [Short-term](#short-term)
|
|
|
|
- [Long-term](#long-term)
|
|
|
|
|
|
|
|
## Zooming Out
|
|
|
|
|
|
|
|
Before we discuss the short and long term plans, it is useful to look at the big picture.
|
|
|
|
The figure below shows the various investments in Hyperspace.
|
|
|
|
|
|
|
|
![Icon](https://github.com/rapoth/hyperspace/blob/master/docs/assets/images/hyperspace-roadmap.png?raw=true)
|
|
|
|
|
|
|
|
In short, there are **six** parallel tracks of work happening within Hyperspace. While we use the
|
|
|
|
word *track* to imply that work can happen mostly independently without depending on design
|
|
|
|
decisions from the other tracks, should the community notice such a hard dependency, we should
|
|
|
|
work towards making the tracks independent, to the extent possible.
|
|
|
|
|
|
|
|
The best way to understand the tracks shown in the figure is:
|
|
|
|
- **Track 1**: This is the foundation where we ensure that the meta-data we are
|
|
|
|
storing is enough to capture all the context for an index. For instance, there are a number
|
|
|
|
of questions that pop up when using indexes **instead** of the original base data:
|
|
|
|
- Is the index up-to-date?
|
|
|
|
- Were there any transformations applied?
|
|
|
|
- **Tracks 2,3,4**: These tracks deal with the immutability aspect of the underlying
|
|
|
|
data. The current focus of Hyperspace is on supporting indexing for *immutable datasets*
|
|
|
|
(meaning that the only way to refresh the index would be to rebuild it in full). In
|
|
|
|
the upcoming months, we hope to focus on the other aspects allowing us to index data
|
|
|
|
that is getting updated (either append-only, or updated like how it happens in Delta Lake).
|
|
|
|
- **Track 5**: One of the most frequently asked questions with indexing subsystems is
|
|
|
|
*what to index?*. This track of Hyperspace focuses on finding a reasonable answer to this
|
|
|
|
question. The current focus is on rule-based recommender systems, which are simple
|
|
|
|
and effective, although not the best. Subsequently, the focus of Hyperspace would explore
|
|
|
|
more sophisticated approaches like hypothetical indexes.
|
|
|
|
- **Track 6**: Since indexes are a relatively new concept to data lakes, there are lots
|
|
|
|
of gotchas in terms of how one would create and maintain them. In addition, there are
|
|
|
|
lots of design decisions that go into supporting an index. An important challenge that
|
|
|
|
Hyperspace primarily deals with is ensuring there is a balance between customizability
|
|
|
|
and simplicity. Hyperspace chooses to provide simple APIs, with options to customize.
|
|
|
|
This aspect of Hyperspace focuses on documenting all the surrounding concepts and
|
|
|
|
writing accessible tutorials for users.
|
|
|
|
|
|
|
|
We will refer to each of these tracks in the format **T<num>** e.g., T1, where applicable.
|
2020-06-15 20:33:04 +03:00
|
|
|
|
|
|
|
## Short Term
|
|
|
|
|
2020-06-24 20:04:48 +03:00
|
|
|
In the short-term (i.e., next 3-6 months), the focus of Hyperspace would be on work that
|
|
|
|
spans the following categories:
|
|
|
|
|
|
|
|
- **Bug fixes** - We **do not** recommend using Hyperspace in production. However, we do
|
|
|
|
encourage trying it out on your workloads and telling us what you like vs. what you would
|
|
|
|
like to see. Our primary focus is on fixing any usage bugs that are reported in real-world
|
|
|
|
usage.
|
|
|
|
- **Stability improvements** - Includes aspects such as challenging the API design to ensure
|
|
|
|
it is robust enough for evolving to support other index types, ensuring backward/forward
|
|
|
|
compatibility of the meta-data (**T1**) and verifying that Hyperspace works correctly and consistently.
|
|
|
|
- **Optimizer enhancements** - Hyperspace implements optimizer rules to perform index matching.
|
|
|
|
It may be possible that there is definitely scope of improving these rules to achieve better
|
|
|
|
optimizations.
|
|
|
|
- **Robust support for Immutable (T2) & Append-only (T2) Datasets** - Includes aspects such
|
|
|
|
as providing incremental indexing support with the necessary foundational APIs for optimizations.
|
|
|
|
|
|
|
|
## Long Term
|
2020-06-15 20:33:04 +03:00
|
|
|
|
2020-06-24 20:04:48 +03:00
|
|
|
The long-term (i.e., next 6-12 months), the focus of Hyperspace would be on work that validates the
|
|
|
|
underlying idea, obtaining community feedback on figuring out ways to decentralize the development
|
|
|
|
of Hyperspace (e.g., across tracks) to make consistent progress. That being said, the immediate
|
|
|
|
next focus would be on work that spans the following categories:
|
2020-06-15 20:33:04 +03:00
|
|
|
|
2020-06-24 20:04:48 +03:00
|
|
|
- **Index Recommendation** - To make Hyperspace immediately usable, it is critical to invest
|
|
|
|
in index recommendation (even if it is a simpler rule-based engine). This aspect of Hyperspace
|
|
|
|
would also include necessary work for doing cost-benefit analysis and laying down the necessary
|
|
|
|
foundation.
|
|
|
|
- **Robust support for Updateable (T3) Datasets** - Users are increasingly resorting to using
|
|
|
|
engines such as Delta Lake and Hudi to deal with update semantics on data lakes. Hyperspace
|
|
|
|
intends to first explore the cost of providing support for these engines and potentially
|
|
|
|
providing first-class support.
|
|
|
|
- **More index types** - Hyperspace implements support for one specific type of an index called
|
|
|
|
the covering index. The long-term focus of Hyperspace would entail supporting other kinds of
|
|
|
|
index types (or hyperspaces).
|