5.7 KiB
Hyperspace Roadmap
This document defines a high level roadmap for Hyperspace development and upcoming releases. Community and contributor involvement is vital for successfully implementing all desired items for each release. We hope that the items listed below will inspire further engagement from the community to keep Hyperspace progressing and shipping exciting and valuable features.
Note: Any dates listed below and the specific issues that will ship in a given milestone are subject to change but should give a general idea of what we are planning. We use the milestone feature in Github so look there for the most up-to-date and issue plan.
If you see a problem on this, or something is not clear, please consider opening an issue.
Table of Contents
Zooming Out
Before we discuss the short and long term plans, it is useful to look at the big picture. The figure below shows the various investments in Hyperspace.
In short, there are six parallel tracks of work happening within Hyperspace. While we use the word track to imply that work can happen mostly independently without depending on design decisions from the other tracks, should the community notice such a hard dependency, we should work towards making the tracks independent, to the extent possible.
The best way to understand the tracks shown in the figure is:
- Track 1: This is the foundation where we ensure that the meta-data we are
storing is enough to capture all the context for an index. For instance, there are a number
of questions that pop up when using indexes instead of the original base data:
- Is the index up-to-date?
- Were there any transformations applied?
- Tracks 2,3,4: These tracks deal with the immutability aspect of the underlying data. The current focus of Hyperspace is on supporting indexing for immutable datasets (meaning that the only way to refresh the index would be to rebuild it in full). In the upcoming months, we hope to focus on the other aspects allowing us to index data that is getting updated (either append-only, or updated like how it happens in Delta Lake).
- Track 5: One of the most frequently asked questions with indexing subsystems is what to index?. This track of Hyperspace focuses on finding a reasonable answer to this question. The current focus is on rule-based recommender systems, which are simple and effective, although not the best. Subsequently, the focus of Hyperspace would explore more sophisticated approaches like hypothetical indexes.
- Track 6: Since indexes are a relatively new concept to data lakes, there are lots of gotchas in terms of how one would create and maintain them. In addition, there are lots of design decisions that go into supporting an index. An important challenge that Hyperspace primarily deals with is ensuring there is a balance between customizability and simplicity. Hyperspace chooses to provide simple APIs, with options to customize. This aspect of Hyperspace focuses on documenting all the surrounding concepts and writing accessible tutorials for users.
We will refer to each of these tracks in the format T e.g., T1, where applicable.
Short Term
In the short-term (i.e., next 3-6 months), the focus of Hyperspace would be on work that spans the following categories:
- Bug fixes - We do not recommend using Hyperspace in production. However, we do encourage trying it out on your workloads and telling us what you like vs. what you would like to see. Our primary focus is on fixing any usage bugs that are reported in real-world usage.
- Stability improvements - Includes aspects such as challenging the API design to ensure it is robust enough for evolving to support other index types, ensuring backward/forward compatibility of the meta-data (T1) and verifying that Hyperspace works correctly and consistently.
- Optimizer enhancements - Hyperspace implements optimizer rules to perform index matching. It may be possible that there is definitely scope of improving these rules to achieve better optimizations.
- Robust support for Immutable (T2) & Append-only (T2) Datasets - Includes aspects such as providing incremental indexing support with the necessary foundational APIs for optimizations.
Long Term
The long-term (i.e., next 6-12 months), the focus of Hyperspace would be on work that validates the underlying idea, obtaining community feedback on figuring out ways to decentralize the development of Hyperspace (e.g., across tracks) to make consistent progress. That being said, the immediate next focus would be on work that spans the following categories:
- Index Recommendation - To make Hyperspace immediately usable, it is critical to invest in index recommendation (even if it is a simpler rule-based engine). This aspect of Hyperspace would also include necessary work for doing cost-benefit analysis and laying down the necessary foundation.
- Robust support for Updateable (T3) Datasets - Users are increasingly resorting to using engines such as Delta Lake and Hudi to deal with update semantics on data lakes. Hyperspace intends to first explore the cost of providing support for these engines and potentially providing first-class support.
- More index types - Hyperspace implements support for one specific type of an index called the covering index. The long-term focus of Hyperspace would entail supporting other kinds of index types (or hyperspaces).