More documentation + minor styling fixes (#45)
This commit is contained in:
Parent
77f8824823
Commit
9367b0ac95
|
@ -17,22 +17,28 @@ docs:
|
|||
url: /docs/ug-configuration/
|
||||
- title: "Release Notes"
|
||||
url: https://github.com/microsoft/hyperspace/releases
|
||||
- title: "Best Practices"
|
||||
url: /docs/ug-best-practices/
|
||||
# Needs to be written later
|
||||
# - title: "Best Practices"
|
||||
# url: /docs/ug-best-practices/
|
||||
- title: "Frequently Asked Questions"
|
||||
url: /docs/ug-faqs/
|
||||
- title: "Upgrading"
|
||||
url: /docs/ug-upgrading/
|
||||
# Needs to be written for next version
|
||||
# - title: "Upgrading"
|
||||
# url: /docs/ug-upgrading/
|
||||
- title: Developer Guide
|
||||
children:
|
||||
- title: "Building from Source"
|
||||
url: /docs/dg-building-from-source/
|
||||
- title: "Understanding Code Structure"
|
||||
- title: "Code Structure"
|
||||
url: /docs/dg-code-structure/
|
||||
- title: "Hacking Hyperspace"
|
||||
url: /docs/dg-hack-hyperspace/
|
||||
# - title: "Hacking Hyperspace"
|
||||
# url: /docs/dg-hack-hyperspace/
|
||||
- title: "Roadmap"
|
||||
url: https://github.com/microsoft/hyperspace/blob/master/ROADMAP.md
|
||||
- title: "Contributing"
|
||||
url: /docs/dg-contributing/
|
||||
url: https://github.com/microsoft/hyperspace/blob/master/CONTRIBUTING.md
|
||||
- title: "Code of Conduct"
|
||||
url: https://github.com/microsoft/hyperspace/blob/master/CODE_OF_CONDUCT.md
|
||||
- title: Tour of Hyperspace
|
||||
children:
|
||||
- title: "Introduction"
|
||||
|
@ -45,19 +51,29 @@ docs:
|
|||
url: /docs/toh-overview/
|
||||
- title: "Indexes on the Lake"
|
||||
url: /docs/toh-indexes-on-the-lake/
|
||||
- title: Serverless Index Management
|
||||
- title: Meta
|
||||
children:
|
||||
- title: "Metadata on Lake"
|
||||
url: /docs/sim-metadata-on-lake/
|
||||
- title: "Index State Management"
|
||||
url: /docs/sim-index-state-management/
|
||||
- title: "Concurrency Control"
|
||||
url: /docs/sim-concurrency-control/
|
||||
- title: Query Processing
|
||||
children:
|
||||
- title: "Intuition"
|
||||
url: /docs/qp-intuition/
|
||||
- title: "Rule-based Optimization"
|
||||
url: /docs/qp-rule-based-optimization/
|
||||
- title: "Hyperspace & Spark's Catalyst"
|
||||
url: /docs/qp-integration-with-spark-catalyst/
|
||||
- title: "License"
|
||||
url: https://github.com/microsoft/hyperspace/blob/master/LICENSE
|
||||
- title: "Notice"
|
||||
url: https://github.com/microsoft/hyperspace/blob/master/NOTICE.txt
|
||||
- title: "Security"
|
||||
url: https://github.com/microsoft/hyperspace/blob/master/SECURITY.md
|
||||
|
||||
# Need to be written
|
||||
# - title: Serverless Index Management
|
||||
# children:
|
||||
# - title: "Metadata on Lake"
|
||||
# url: /docs/sim-metadata-on-lake/
|
||||
# - title: "Index State Management"
|
||||
# url: /docs/sim-index-state-management/
|
||||
# - title: "Concurrency Control"
|
||||
# url: /docs/sim-concurrency-control/
|
||||
# - title: Query Processing
|
||||
# children:
|
||||
# - title: "Intuition"
|
||||
# url: /docs/qp-intuition/
|
||||
# - title: "Rule-based Optimization"
|
||||
# url: /docs/qp-rule-based-optimization/
|
||||
# - title: "Hyperspace & Spark's Catalyst"
|
||||
# url: /docs/qp-integration-with-spark-catalyst/
|
||||
|
|
|
@ -4,6 +4,8 @@ permalink: /docs/ug-quick-start-guide/
|
|||
excerpt: "How to quickly get started with Hyperspace for use with Apache Spark™."
|
||||
last_modified_at: 2020-06-23
|
||||
toc: true
|
||||
toc_label: "Quick-Start Guide"
|
||||
toc_icon: "shipping-fast"
|
||||
---
|
||||
|
||||
This guide helps you quickly get started with Hyperspace with Apache Spark™.
|
||||
|
|
|
@ -3,7 +3,8 @@ title: "Configuration"
|
|||
permalink: /docs/ug-configuration/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-23
|
||||
toc: true
|
||||
toc: false
|
||||
classes: wide
|
||||
---
|
||||
| Property name | Default | Meaning | Since Version |
|
||||
|------------------------------------------------------|--------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|---------------|
|
||||
|
|
|
@ -1,8 +0,0 @@
|
|||
---
|
||||
title: "Best Practices"
|
||||
permalink: /docs/ug-best-practices/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
---
|
||||
|
|
@ -1,8 +0,0 @@
|
|||
---
|
||||
title: "Upgrading"
|
||||
permalink: /docs/ug-upgrading/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
---
|
||||
|
|
@ -33,7 +33,7 @@ The test code has a similar structure as the source code and each package in the
|
|||
“docs” folder contains useful documentation on different aspects of Hyperspace including coding, contribution and formatting guidelines, and information on how to build the project in various environments. You can access the documentation using the [Hyperspace Documentation](https://microsoft.github.io/hyperspace/docs/ug-quick-start-guide/) site.
|
||||
|
||||
### dev
|
||||
“dev” folder contains required resources for contributing code to the Hyperspace project as a developer. Currently, it includes the Scala formatting configuration file that is needed to make sure any new code change complies with the Hyperspace’s [coding guidelines](https://microsoft.github.io/hyperspace/docs/dg-contributing/).
|
||||
“dev” folder contains required resources for contributing code to the Hyperspace project as a developer. Currently, it includes the Scala formatting configuration file that is needed to make sure any new code change complies with the Hyperspace’s [coding guidelines](https://github.com/microsoft/hyperspace/blob/master/CONTRIBUTING.md).
|
||||
|
||||
### build
|
||||
Hyperspace uses “sbt” as the build tool for building its code. “build” folder contains sbt build scripts and configurations which are used to create an artifact from Hyperspace source code.
|
||||
|
|
|
@ -1,8 +0,0 @@
|
|||
---
|
||||
title: "Hack Hyperspace"
|
||||
permalink: /docs/dg-hack-hyperspace/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
---
|
||||
|
|
@ -1,8 +0,0 @@
|
|||
---
|
||||
title: "Contributing"
|
||||
permalink: /docs/dg-contributing/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
---
|
||||
|
|
@ -1,8 +1,37 @@
|
|||
---
|
||||
title: "Configuration"
|
||||
title: "Introduction"
|
||||
permalink: /docs/toh-introduction/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
excerpt: "Introduction"
|
||||
last_modified_at: 2020-06-23
|
||||
toc: false
|
||||
classes: wide
|
||||
---
|
||||
|
||||
At large companies, it has now become typical to store datasets ranging in size
|
||||
from a few GBs to 100s of PBs in data lakes. The scope of analytics
|
||||
on these datasets ranges from traditional batch-style queries
|
||||
(e.g., OLAP) to explorative, *finding a needle in a haystack* type of
|
||||
queries (e.g., point-lookups, summarization etc.). Resorting to
|
||||
linear scans of these large datasets with huge clusters for every
|
||||
simple query is prohibitively expensive and not the top choice for
|
||||
many of our customers, who are constantly exploring
|
||||
ways to reduce their operational costs – incurring unchecked
|
||||
expenses is their worst nightmare. One way to alleviate this
|
||||
issue would be to bring in ‘indexing’ capabilities (which come
|
||||
*de facto* in the traditional database systems world) into Apache Spark™.
|
||||
|
||||
Among the many ways to improve query performance and lower resource
|
||||
consumption in database systems, indexes are particularly efficient in
|
||||
providing tremendous acceleration for certain workloads since they could
|
||||
reduce the amount of data scanned for a given query and thus also result
|
||||
in lowering resource costs.
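
As a rough illustration of why an index reduces the amount of data scanned, the sketch below compares a linear scan with a lookup through a simple in-memory index. It is a toy model, not Hyperspace's implementation: the data, the `dept` column, and all function names are assumptions made up for this example.

```python
from collections import defaultdict

# Toy dataset (hypothetical): 10,000 rows spread over 100 departments.
rows = [{"id": i, "dept": f"dept-{i % 100}"} for i in range(10_000)]


def scan_lookup(rows, dept):
    """Linear scan: every row is examined, whether it matches or not."""
    examined = 0
    hits = []
    for row in rows:
        examined += 1
        if row["dept"] == dept:
            hits.append(row)
    return hits, examined


def build_index(rows, column):
    """Map each value of `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return index


def index_lookup(rows, index, value):
    """Index lookup: only the rows listed under `value` are touched."""
    positions = index.get(value, [])
    return [rows[p] for p in positions], len(positions)


dept_index = build_index(rows, "dept")
scan_hits, scan_examined = scan_lookup(rows, "dept-7")
index_hits, index_examined = index_lookup(rows, dept_index, "dept-7")

# Both strategies return the same rows; the index touches far fewer of them.
print(scan_examined, index_examined)
```

Building the index has its own cost, so the saving only pays off when the same column is queried repeatedly, which is the kind of workload the paragraph above has in mind.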
|
||||
|
||||
Hyperspace is envisioned to be an indexing subsystem for Apache Spark
|
||||
that introduces the ability for users to build, maintain (through a
|
||||
multi-user concurrency model) and leverage indexes (automatically,
|
||||
without any changes to their existing code) on their data (e.g., CSV,
|
||||
JSON, Parquet etc.) for query/workload acceleration.
|
||||
|
||||
The rest of the documentation covers the necessary foundations behind Hyperspace
|
||||
including the API design, and how it leverages the Apache Spark™ Catalyst optimizer to
|
||||
provide a transparent user experience.
|
||||
|
|
|
@ -4,7 +4,7 @@ permalink: /docs/toh-indexes/
|
|||
excerpt: "Indexes, derived datasets and the whole mile of index types"
|
||||
last_modified_at: 2020-06-23
|
||||
toc: true
|
||||
toc_label: "FAQs"
|
||||
toc_label: "Indexes et al."
|
||||
toc_icon: "brain"
|
||||
---
|
||||
|
||||
|
|
|
@ -1,8 +1,34 @@
|
|||
---
|
||||
title: "Indexes on the Lake"
|
||||
permalink: /docs/toh-indexes-on-the-lake/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
excerpt: "Index organization on the lake."
|
||||
last_modified_at: 2020-06-23
|
||||
toc: false
|
||||
classes: wide
|
||||
---
|
||||
|
||||
To accomplish the goals outlined in [Design Goals](/hyperspace/docs/toh-design-goals/),
|
||||
Hyperspace stores all index metadata in the lake, without any external dependencies.
|
||||
|
||||
The following shows the organization of the index metadata on the data lake file
|
||||
system.
|
||||
|
||||
```bash
|
||||
Filesystem Root
|
||||
├── /indexes/
|
||||
| └── <index name>
|
||||
| ├── _hyperspace_log # Hyperspace operation log
|
||||
| | ├── create (active) # Entry indicating that the index got created
|
||||
| | ├── refresh (inc) # Entry indicating that the index is getting refreshed
|
||||
| | └── active # Entry indicating that the index is active again
|
||||
| ├── ...
|
||||
| ├── <index-directory-1> # First 'version' of the index ---
|
||||
| ├── <index-directory-2> # -------------------------------- |- Second version
|
||||
```
|
||||
|
||||
As shown, we store all indexes (or *derived datasets*, in the more general sense)
|
||||
at the root of the file system. An alternative
|
||||
design we considered was to co-locate the index with the dataset. However,
|
||||
since Hyperspace intends to support a more general notion of indexes such as
|
||||
materialized views (which can span datasets), Hyperspace decouples the index
|
||||
location from the original data location.
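
The layout described above can be sketched as a few path helpers. This is a minimal sketch under stated assumptions: only the `/indexes/` root and the `_hyperspace_log` directory name come from the listing; the helper names, the sample index name, and the sample dataset path are hypothetical and are not part of any Hyperspace API.

```python
from pathlib import PurePosixPath

# Taken from the listing: all indexes live under a single root on the lake.
INDEXES_ROOT = PurePosixPath("/indexes")


def index_root(index_name):
    """Directory holding everything that belongs to one index."""
    return INDEXES_ROOT / index_name


def log_dir(index_name):
    """Directory holding the operation log entries for one index."""
    return index_root(index_name) / "_hyperspace_log"


def version_dir(index_name, directory_name):
    """Directory holding one version of the index data.

    `directory_name` stands in for a placeholder such as <index-directory-1>;
    the real naming scheme is not specified in the text.
    """
    return index_root(index_name) / directory_name


def is_decoupled(index_path, dataset_path):
    """True when the index does not live at or under the dataset's location."""
    return dataset_path != index_path and dataset_path not in index_path.parents


# Hypothetical index name and dataset location, for illustration only.
sample_index = index_root("myIndex")
sample_dataset = PurePosixPath("/data/sales")
print(log_dir("myIndex"))
print(is_decoupled(sample_index, sample_dataset))
```

Because the index path is derived only from the index name, the same helpers work for an index that spans several datasets, which is the motivation given for decoupling the two locations.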
|
||||
|
|
|
@ -1,8 +0,0 @@
|
|||
---
|
||||
title: "Metadata on Lake"
|
||||
permalink: /docs/sim-metadata-on-lake/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
---
|
||||
|
|
@ -1,8 +0,0 @@
|
|||
---
|
||||
title: "Index State Management"
|
||||
permalink: /docs/sim-index-state-management/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
---
|
||||
|
|
@ -1,8 +0,0 @@
|
|||
---
|
||||
title: "Concurrency Control"
|
||||
permalink: /docs/sim-concurrency-control/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
---
|
||||
|
|
@ -1,8 +0,0 @@
|
|||
---
|
||||
title: "Intuition"
|
||||
permalink: /docs/qp-intuition/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
---
|
||||
|
|
@ -1,8 +0,0 @@
|
|||
---
|
||||
title: "Rule-based Optimization"
|
||||
permalink: /docs/qp-rule-based-optimization/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
---
|
||||
|
|
@ -1,8 +0,0 @@
|
|||
---
|
||||
title: "Integration with Spark Catalyst"
|
||||
permalink: /docs/qp-integration-with-spark-catalyst/
|
||||
excerpt: "How to configure Hyperspace for your needs."
|
||||
last_modified_at: 2020-06-20
|
||||
toc: true
|
||||
---
|
||||
|
|
@ -22,6 +22,29 @@ html {
|
|||
}
|
||||
}
|
||||
|
||||
.page__inner-wrap {
|
||||
font-size: 14px; // originally 16px
|
||||
@include breakpoint($medium) {
|
||||
font-size: 14px; // originally 18px
|
||||
}
|
||||
|
||||
@include breakpoint($large) {
|
||||
font-size: 14px; // originally 20px
|
||||
}
|
||||
|
||||
@include breakpoint($x-large) {
|
||||
font-size: 16px; // originally 22px
|
||||
}
|
||||
}
|
||||
|
||||
.toc .nav__title {
|
||||
color:#fff;
|
||||
font-size:.75em;
|
||||
background:black;
|
||||
border-top-left-radius:4px;
|
||||
border-top-right-radius:4px
|
||||
}
|
||||
|
||||
.feature__wrapper___small {
|
||||
@include clearfix();
|
||||
margin-bottom: 1em;
|
||||
|
|