More documentation + minor styling fixes (#45)

Rahul Potharaju 2020-06-24 00:45:52 -07:00 committed by GitHub
Parent 77f8824823
Commit 9367b0ac95
No key found matching this signature
GPG key ID: 4AEE18F83AFDEB23
18 changed files: 130 additions and 113 deletions

View file

@ -17,22 +17,28 @@ docs:
url: /docs/ug-configuration/
- title: "Release Notes"
url: https://github.com/microsoft/hyperspace/releases
- title: "Best Practices"
url: /docs/ug-best-practices/
# Needs to be written later
# - title: "Best Practices"
# url: /docs/ug-best-practices/
- title: "Frequently Asked Questions"
url: /docs/ug-faqs/
- title: "Upgrading"
url: /docs/ug-upgrading/
# Needs to be written for next version
# - title: "Upgrading"
# url: /docs/ug-upgrading/
- title: Developer Guide
children:
- title: "Building from Source"
url: /docs/dg-building-from-source/
- title: "Understanding Code Structure"
- title: "Code Structure"
url: /docs/dg-code-structure/
- title: "Hacking Hyperspace"
url: /docs/dg-hack-hyperspace/
# - title: "Hacking Hyperspace"
# url: /docs/dg-hack-hyperspace/
- title: "Roadmap"
url: https://github.com/microsoft/hyperspace/blob/master/ROADMAP.md
- title: "Contributing"
url: /docs/dg-contributing/
url: https://github.com/microsoft/hyperspace/blob/master/CONTRIBUTING.md
- title: "Code of Conduct"
url: https://github.com/microsoft/hyperspace/blob/master/CODE_OF_CONDUCT.md
- title: Tour of Hyperspace
children:
- title: "Introduction"
@ -45,19 +51,29 @@ docs:
url: /docs/toh-overview/
- title: "Indexes on the Lake"
url: /docs/toh-indexes-on-the-lake/
- title: Serverless Index Management
- title: Meta
children:
- title: "Metadata on Lake"
url: /docs/sim-metadata-on-lake/
- title: "Index State Management"
url: /docs/sim-index-state-management/
- title: "Concurrency Control"
url: /docs/sim-concurrency-control/
- title: Query Processing
children:
- title: "Intuition"
url: /docs/qp-intuition/
- title: "Rule-based Optimization"
url: /docs/qp-rule-based-optimization/
- title: "Hyperspace & Spark's Catalyst"
url: /docs/qp-integration-with-spark-catalyst/
- title: "License"
url: https://github.com/microsoft/hyperspace/blob/master/LICENSE
- title: "Notice"
url: https://github.com/microsoft/hyperspace/blob/master/NOTICE.txt
- title: "Security"
url: https://github.com/microsoft/hyperspace/blob/master/SECURITY.md
# Need to be written
# - title: Serverless Index Management
# children:
# - title: "Metadata on Lake"
# url: /docs/sim-metadata-on-lake/
# - title: "Index State Management"
# url: /docs/sim-index-state-management/
# - title: "Concurrency Control"
# url: /docs/sim-concurrency-control/
# - title: Query Processing
# children:
# - title: "Intuition"
# url: /docs/qp-intuition/
# - title: "Rule-based Optimization"
# url: /docs/qp-rule-based-optimization/
# - title: "Hyperspace & Spark's Catalyst"
# url: /docs/qp-integration-with-spark-catalyst/

View file

@ -4,6 +4,8 @@ permalink: /docs/ug-quick-start-guide/
excerpt: "How to quickly get started with Hyperspace for use with Apache Spark™."
last_modified_at: 2020-06-23
toc: true
toc_label: "Quick-Start Guide"
toc_icon: "shipping-fast"
---
This guide helps you quickly get started with Hyperspace with Apache Spark™.

View file

@ -3,7 +3,8 @@ title: "Configuration"
permalink: /docs/ug-configuration/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-23
toc: true
toc: false
classes: wide
---
| Property name | Default | Meaning | Since Version |
|------------------------------------------------------|--------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|---------------|

View file

@ -1,8 +0,0 @@
---
title: "Best Practices"
permalink: /docs/ug-best-practices/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
---

View file

@ -1,8 +0,0 @@
---
title: "Upgrading"
permalink: /docs/ug-upgrading/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
---

View file

@ -33,7 +33,7 @@ The test code has a similar structure as the source code and each package in the
The “docs” folder contains documentation on different aspects of Hyperspace, including coding, contribution, and formatting guidelines, and information on how to build the project in various environments. You can access the documentation on the [Hyperspace Documentation](https://microsoft.github.io/hyperspace/docs/ug-quick-start-guide/) site.
### dev
The “dev” folder contains the resources required for contributing code to the Hyperspace project as a developer. Currently, it includes the Scala formatting configuration file needed to make sure any new code change complies with the Hyperspace [coding guidelines](https://microsoft.github.io/hyperspace/docs/dg-contributing/).
The “dev” folder contains the resources required for contributing code to the Hyperspace project as a developer. Currently, it includes the Scala formatting configuration file needed to make sure any new code change complies with the Hyperspace [coding guidelines](https://github.com/microsoft/hyperspace/blob/master/CONTRIBUTING.md).
### build
Hyperspace uses “sbt” as its build tool. The “build” folder contains the sbt build scripts and configurations used to create an artifact from the Hyperspace source code.

View file

@ -1,8 +0,0 @@
---
title: "Hack Hyperspace"
permalink: /docs/dg-hack-hyperspace/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
---

View file

@ -1,8 +0,0 @@
---
title: "Contributing"
permalink: /docs/dg-contributing/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
---

View file

@ -1,8 +1,37 @@
---
title: "Configuration"
title: "Introduction"
permalink: /docs/toh-introduction/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
excerpt: "Introduction"
last_modified_at: 2020-06-23
toc: false
classes: wide
---
At large companies, it is now typical to store datasets ranging in size
from a few GB to hundreds of PB in data lakes. The scope of analytics
on these datasets ranges from traditional batch-style queries
(e.g., OLAP) to exploratory, *needle in a haystack* queries
(e.g., point lookups, summarization, etc.). Resorting to
linear scans of these large datasets with huge clusters for every
simple query is prohibitively expensive and not the top choice for
many of our customers, who are constantly exploring
ways to reduce their operational costs; unchecked
expenses are their worst nightmare. One way to alleviate this
issue is to bring indexing capabilities (which are
*de facto* standard in the traditional database systems world) into Apache Spark™.

Among the many ways to improve query performance and lower resource
consumption in database systems, indexes are particularly effective at
accelerating certain workloads, since they can
reduce the amount of data scanned for a given query and thus also
lower resource costs.

Hyperspace is envisioned as an indexing subsystem for Apache Spark™
that lets users build, maintain (through a
multi-user concurrency model), and leverage indexes (automatically,
without any changes to their existing code) on their data (e.g., CSV,
JSON, Parquet, etc.) for query/workload acceleration.

The rest of the documentation covers the foundations behind Hyperspace,
including the API design and how it leverages the Apache Spark™ Catalyst optimizer to
provide a transparent user experience.

View file

@ -4,7 +4,7 @@ permalink: /docs/toh-indexes/
excerpt: "Indexes, derived datasets and the whole mile of index types"
last_modified_at: 2020-06-23
toc: true
toc_label: "FAQs"
toc_label: "Indexes et al."
toc_icon: "brain"
---

View file

@ -1,8 +1,34 @@
---
title: "Indexes on the Lake"
permalink: /docs/toh-indexes-on-the-lake/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
excerpt: "Index organization on the lake."
last_modified_at: 2020-06-23
toc: false
classes: wide
---
To accomplish the goals outlined in [Design Goals](/hyperspace/docs/toh-design-goals/),
Hyperspace stores all index metadata in the lake, without any external dependencies.
The following shows the organization of the index metadata on the data lake file
system.
```bash
Filesystem Root
├── /indexes/
|   └── <index name>
|       ├── _hyperspace_log        # Hyperspace operation log
|       |   ├── create (active)    # Entry indicating that the index got created
|       |   ├── refresh (inc)      # Entry indicating that the index is getting refreshed
|       |   └── active             # Entry indicating that the index is active again
|       ├── ...
|       ├── <index-directory-1>    # First 'version' of the index ---
|       ├── <index-directory-2>    # -------------------------------- |- Second version
```
As shown, we store all indexes (or *derived datasets*, in the more general sense)
at the root of the file system. An alternative
design we considered was to co-locate the index with the dataset. However,
since Hyperspace intends to support a more general notion of indexes such as
materialized views (which can span datasets), Hyperspace decouples the index
location from the original data location.
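
The layout above can be sketched as a few path helpers. This is only an illustration under stated assumptions, not Hyperspace's actual API: the function names and the `/indexes` root constant are hypothetical, and only the directory names `indexes` and `_hyperspace_log` come from the tree shown earlier. Note that none of the helpers takes the source dataset's location as input, which mirrors the decoupling described above.

```python
from pathlib import PurePosixPath

# Directory names taken from the layout above; everything else here
# (constant and function names) is hypothetical and for illustration only.
INDEX_ROOT = PurePosixPath("/indexes")
LOG_DIR_NAME = "_hyperspace_log"


def index_path(index_name: str) -> PurePosixPath:
    """Directory that holds the log and all versions of one index."""
    return INDEX_ROOT / index_name


def log_path(index_name: str) -> PurePosixPath:
    """Directory that holds the operation log entries for one index."""
    return index_path(index_name) / LOG_DIR_NAME


def version_path(index_name: str, version_dir: str) -> PurePosixPath:
    """Directory that holds one version of the index data.

    The caller supplies the version directory name, because the naming
    scheme for version directories is not specified in the layout above.
    """
    return index_path(index_name) / version_dir
```

Because the index location depends only on the index name, the same scheme would apply to a derived dataset that spans several source datasets.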

View file

@ -1,8 +0,0 @@
---
title: "Metadata on Lake"
permalink: /docs/sim-metadata-on-lake/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
---

View file

@ -1,8 +0,0 @@
---
title: "Index State Management"
permalink: /docs/sim-index-state-management/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
---

View file

@ -1,8 +0,0 @@
---
title: "Concurrency Control"
permalink: /docs/sim-concurrency-control/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
---

View file

@ -1,8 +0,0 @@
---
title: "Intuition"
permalink: /docs/qp-intuition/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
---

View file

@ -1,8 +0,0 @@
---
title: "Rule-based Optimization"
permalink: /docs/qp-rule-based-optimization/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
---

View file

@ -1,8 +0,0 @@
---
title: "Integration with Spark Catalyst"
permalink: /docs/qp-integration-with-spark-catalyst/
excerpt: "How to configure Hyperspace for your needs."
last_modified_at: 2020-06-20
toc: true
---

View file

@ -22,6 +22,29 @@ html {
}
}
.page__inner-wrap {
  font-size: 14px; // originally 16px
  @include breakpoint($medium) {
    font-size: 14px; // originally 18px
  }
  @include breakpoint($large) {
    font-size: 14px; // originally 20px
  }
  @include breakpoint($x-large) {
    font-size: 16px; // originally 22px
  }
}
.toc .nav__title {
  color: #fff;
  font-size: .75em;
  background: black;
  border-top-left-radius: 4px;
  border-top-right-radius: 4px;
}
.feature__wrapper___small {
@include clearfix();
margin-bottom: 1em;