Commit Graph

213 Commits

Author SHA1 Message Date
Apoorve Dave 2923e77408 add distinct on file ids (Co-authored-by: EJ Song <51077614+sezruby@users.noreply.github.com>) 2021-03-26 17:39:42 -07:00
Apoorve Dave ecc455bd84 add filtering of conditions to choose only compatible conditions with the index 2021-03-25 11:13:34 -07:00
Apoorve Dave 87dce8c626 PEFilterIndexRule initial commit 2021-03-24 15:10:39 -07:00
Apoorve Dave 3ccb0ea4df [Gold Standard]: Initial code for spark only setup with a single query (#384) 2021-03-16 14:58:12 -07:00
Apoorve Dave dc3cf8f747 [Gold Standard] Add resources files for spark queries from spark's plan stability suite (#383) 2021-03-12 16:40:47 -08:00
EJ Song c5714e4fcd Fix Iceberg lineage for Windows (#375) 2021-03-09 12:56:49 -08:00
EJ Song 2b229d4505 Fix azure-pipelines.yml to use ubuntu-18.04 (#371) 2021-03-02 19:20:13 -08:00
Terry Kim 66076e0fa4 Add a sbt-buildinfo plugin to generate BuildInfo object (#364) 2021-02-22 22:04:50 -08:00
Andrei Ionescu b5c9b46575 Fix for doc of lineagePairs method of IcebergRelation (#362) 2021-02-22 12:52:19 -08:00
Andrei Ionescu 29ebdde030 Support Iceberg table format (#358) 2021-02-22 09:14:11 -08:00
Terry Kim 0fef99705b Introduce SourceRelation/FileBasedRelation traits to remove direct dependency on LogicalRelation from actions/rules (#355) (Co-authored-by: Andrei Ionescu <webdev.andrei@gmail.com>) 2021-02-12 13:44:24 -08:00
Terry Kim 097eb0fcc2 Fix e2e tests on managed tables and enable external table tests. (#349) 2021-02-09 10:38:13 -08:00
EJ Song 88f1b43147 Add config to use bucketed scan for filter indexes (#329) 2021-02-06 00:30:15 -08:00
EJ Song 9ddf44b887 Update user documentation for quick refresh (#348) 2021-02-02 09:32:37 -08:00
Terry Kim c5ec640b11 Update developer list in build.sbt (#346) 2021-01-29 11:35:22 -08:00
Terry Kim 9e7d26dddd Update home.md to reflect the 0.4.0 release (#344) 2021-01-29 11:08:01 -08:00
EJ Song 7a793f0fd2 Update user documentation for Delta Lake (#343) 2021-01-29 10:37:00 -08:00
EJ Song 5ea6c8a7b3 Update user documentation for Hybrid Scan threshold configs (#339) 2021-01-29 09:48:59 -08:00
Terry Kim 066fdbfccc Setting version to 0.5.0-SNAPSHOT 2021-01-29 09:41:44 -08:00
Terry Kim 4d4d3057ac Setting version to 0.4.0 2021-01-29 09:40:00 -08:00
EJ Song cd9a632e3a Add new IndexLogEntryTags to cache InMemoryFileIndex (#324) 2021-01-28 16:54:28 -08:00
Rahul Potharaju b64ef850b3 Use Issues for Writing Design Proposals & Move Away from Design Proposal PRs (#340) 2021-01-28 16:16:20 -08:00
Andrei Ionescu 053b6b1e00 Use SparkSession's Hadoop configuration (#327) 2021-01-28 15:51:00 -08:00
EJ Song 6a3bf0b99d Fix wrong use of map of Set for summation (#336) 2021-01-26 20:13:58 -08:00
Terry Kim 0d8030b7c8 Update azure-pipelines.yml for parallel builds to speed up CI pipeline. (#337) 2021-01-26 18:53:38 -08:00
EJ Song 7f36568ebc Add new IndexLogEntryTags to avoid duplicate calculation in getCandidateIndexes (#293) 2021-01-25 23:24:03 -08:00
EJ Song 9192481992 Introduce IndexHadoopFsRelation to show applied index name & version in query plan (#323) 2021-01-25 18:28:32 -08:00
EJ Song ec1dfb0f9a Remove deleted threshold condition for append-only Hybrid Scan (#330) 2021-01-21 23:38:17 -08:00
Terry Kim 22b48cee16 Remove logical plan serde related code (#325) 2021-01-19 20:49:47 -08:00
Terry Kim 39781c5eb0 Speed up HybridScanForDeltaLakeTest (#326) 2021-01-19 20:47:48 -08:00
EJ Song efe33ec596 Refactor Hybrid Scan test suites (#274) 2021-01-18 18:50:01 -08:00
EJ Song 08650ffc73 Support incremental refresh for Delta Lake (#301) 2021-01-13 22:48:19 -08:00
Gurleen Singh 70320cec53 Use Hadoop yarn clock utils for all time based calculations (#313) 2021-01-13 21:39:00 -08:00
EJ Song 363d234e5f Add similarity thresholds for Hybrid Scan (#300) 2021-01-13 21:02:05 -08:00
Apoorve Dave 117a070661 Support incremental refresh index with hive-partition columns (#281) 2021-01-12 15:59:52 -08:00
EJ Song c7b97aac16 Update notebooks - Hybrid Scan and Incremental Refresh (#294) 2021-01-11 13:42:28 -08:00
Terry Kim 2a1f62d060 Expose hyperspace.index(indexName) for Python binding (#316) 2021-01-07 22:23:16 -08:00
EJ Song b2b99320da Rename test classes with ending "Test" and test traits with ending "Suite" (#319) 2021-01-07 22:19:40 -08:00
Terry Kim 57745275ae Clean up unused import/code by enabling -Ywarn-unused in scalacOptions (#315) 2021-01-07 09:18:48 -08:00
Terry Kim 5bd1de78bc Fix the usage of getting the commons source size tag in JoinIndexRanker (#314) 2021-01-06 19:59:47 -08:00
EJ Song 573639057b Fix for setting COMMON_SOURCE_SIZE_IN_BYTES tag (#310) 2021-01-06 19:59:12 -08:00
Andrei Ionescu 5d3a4c57e1 Support Hyperspace with Databricks Runtime 5.5 LTS & 6.4 (#303) 2021-01-05 21:53:34 -08:00
EJ Song 3472bd4b49 Improve ranking algorithm for Hybrid Scan (#164) 2021-01-05 20:27:54 -08:00
Pouria Pirzadeh 8eb14f6205 Obtain Enhanced Index Statistics (#286) 2021-01-04 16:17:19 -08:00
Apoorve Dave 931c39d16e Add globbing pattern instructions to quick start guide (#299) 2021-01-04 13:08:41 -08:00
kaustubhkhare b2e778e629 Remove duplicate getFileIdTracker() in tests (#304) 2021-01-04 10:06:32 -08:00
Gurleen Singh cafaa91389 Fix sample notebooks regarding index and included columns (#296) 2020-12-14 18:34:11 -08:00
EJ Song 2f513d761f Add hasParquetAsSourceFormat API in source provider (#291) 2020-12-10 19:42:27 -08:00
EJ Song 8fdf28b001 Implement Delta Lake file-based source provider (#265) 2020-12-08 17:54:51 -08:00
Apoorve Dave 82e02cff7a Support globbing patterns in dataframes for creation/maintenance/usage of indexes. (#269) 2020-12-04 12:28:12 -08:00