R interface to Azure Data Explorer, aka Kusto
Перейти к файлу
Alex Kyllo 60e7721f44 Vignette (#46)
* start vignette

* draft vignette

* start vignette

* draft vignette

* update copyright year to 2018-2019

* remove author line from vignette

* update readme, desc

* oops
2019-04-16 15:00:19 +10:00
R add compute and copy_to verbs (#45) 2019-03-27 05:45:12 +11:00
man add compute and copy_to verbs (#45) 2019-03-27 05:45:12 +11:00
tests add compute and copy_to verbs (#45) 2019-03-27 05:45:12 +11:00
vignettes Vignette (#46) 2019-04-16 15:00:19 +10:00
.Rbuildignore initial commit 2018-12-11 14:05:23 +11:00
.gitattributes initial commit 2018-12-11 14:05:23 +11:00
.gitignore Vignette (#46) 2019-04-16 15:00:19 +10:00
.travis.yml Create .travis.yml 2019-01-29 21:51:02 +11:00
AzureKusto.Rproj initial name change 2019-01-09 17:33:34 +11:00
AzureKusto.rxproj initial name change 2019-01-09 17:33:34 +11:00
AzureKusto.sln initial name change 2019-01-09 17:33:34 +11:00
DESCRIPTION Vignette (#46) 2019-04-16 15:00:19 +10:00
LICENSE Vignette (#46) 2019-04-16 15:00:19 +10:00
LICENSE.md Vignette (#46) 2019-04-16 15:00:19 +10:00
NAMESPACE add compute and copy_to verbs (#45) 2019-03-27 05:45:12 +11:00
README.md Vignette (#46) 2019-04-16 15:00:19 +10:00

README.md

AzureKusto

R interface to Kusto, also known as Azure Data Explorer, a fast and highly scalable data exploration service.

Installation

You can install the development version from GitHub. Note that if you are using Microsoft R, AzureKusto requires recent versions of some packages which will likely not be in your default MRAN snapshot. You can set the repository to CRAN before installing.

options(repos="https://cloud.r-project.org")
devtools::install_github("cloudyr/AzureKusto")

Example usage

Kusto endpoint interface

Connect to a Kusto cluster by instantiating a kusto_database_endpoint object with the cluster URI and database name.


library(AzureKusto)

Samples <- kusto_database_endpoint(server="https://help.kusto.windows.net", database="Samples")

# To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FPD8GZPY9 to authenticate.
# Waiting for device code in browser...
# Press Esc/Ctrl + C to abort
# Authentication complete.

Now you can issue queries to the Kusto database with run_query and get the results back as a data.frame.


res <- run_query(Samples, "StormEvents | summarize EventCount = count() by State | order by State asc")

head(res)

##            State EventCount
## 1        ALABAMA       1315
## 2         ALASKA        257
## 3 AMERICAN SAMOA         16
## 4        ARIZONA        340
## 5       ARKANSAS       1028
## 6 ATLANTIC NORTH        188

run_query() also supports query parameters. Simply pass your parameters as additional keyword arguments and they will be escaped and interpolated into the query string.


res <- run_query(Samples, "MyFunction(lim)", lim=10L)

Command statements work much the same way, except that they do not accept parameters.


res <- run_query(Samples, ".show tables | count")

dplyr Interface

The package also implements a dplyr-style interface for building a query upon a tbl_kusto object and then running it on the remote Kusto database and returning the result as a regular tibble object with collect().


library(dplyr)

StormEvents <- tbl_kusto(Samples, "StormEvents")

q <- StormEvents %>%
    group_by(State) %>%
    summarize(EventCount=n()) %>%
    arrange(State)

show_query(q)

## <KQL> database('Samples').['StormEvents']
## | summarize ['EventCount'] = count() by ['State']
## | order by ['State'] asc

collect(q)

## # A tibble: 67 x 2
##    State          EventCount
##    <chr>               <dbl>
##  1 ALABAMA              1315
##  2 ALASKA                257
##  3 AMERICAN SAMOA         16
##  4 ARIZONA               340
##  5 ARKANSAS             1028
##  6 ATLANTIC NORTH        188
##  7 ATLANTIC SOUTH        193
##  8 CALIFORNIA            898
##  9 COLORADO             1654
## 10 CONNECTICUT           148
## # ... with 57 more rows

tbl_kusto also accepts query parameters, in case the Kusto source table is a parameterized function:


MyFunctionDate <- tbl_kusto(Samples, "MyFunctionDate(dt)", dt=as.Date("2019-01-01"))

MyFunctionDate %>%
    select(StartTime, EndTime, EpisodeId, EventId, State) %>%
    head() %>%
    collect()

## # A tibble: 6 x 5
##   StartTime           EndTime             EpisodeId EventId State         
##   <dttm>              <dttm>                  <int>   <int> <chr>         
## 1 2007-09-29 08:11:00 2007-09-29 08:11:00     11091   61032 ATLANTIC SOUTH
## 2 2007-09-18 20:00:00 2007-09-19 18:00:00     11074   60904 FLORIDA       
## 3 2007-09-20 21:57:00 2007-09-20 22:05:00     11078   60913 FLORIDA       
## 4 2007-12-30 16:00:00 2007-12-30 16:05:00     11749   64588 GEORGIA       
## 5 2007-12-20 07:50:00 2007-12-20 07:53:00     12554   68796 MISSISSIPPI   
## 6 2007-12-20 10:32:00 2007-12-20 10:36:00     12554   68814 MISSISSIPPI   

DBI interface

AzureKusto implements a subset of the DBI specification for interacting with databases. It should be noted that Kusto is quite different to the SQL databases that DBI targets, which affects the behaviour of certain DBI methods and renders other moot.

library(DBI)

# connect to the server: basically a wrapper for kusto_database_endpoint()
Samples <- dbConnect(AzureKusto(),
                     server="https://help.kusto.windows.net",
                     database="Samples")

dbListTables(Samples)

## [1] "StormEvents"       "demo_make_series1" "demo_series2"     
## [4] "demo_series3"      "demo_many_series1"

dbExistsTable(Samples, "StormEvents")

##[1] TRUE

dbGetQuery(Samples, "StormEvents | summarize ct = count()")

##      ct
## 1 59066

Azure Resource Manager interface

On the admin side, AzureKusto extends the framework supplied by the AzureRMR to support Kusto. Methods are provided to create and delete clusters and databases, and manage database principals.

# create a new Kusto cluster
az <- AzureRMR::get_azure_login()
ku <- az$
    get_subscription("sub_id")$
    get_resource_group("rgname")$
    create_kusto_cluster("mykustocluster")

# create a new database
db1 <- ku$create_database("database1")

# add a user
db1$add_principals("myusername", role="User", fqn="aaduser=username@mydomain")

cloudyr project logo