AzureKusto

R interface to Kusto, also known as Azure Data Explorer, a fast and highly scalable data exploration service.

Installation

You can install the development version from GitHub via devtools::install_github("cloudyr/AzureKusto").

Example Usage

Kusto Endpoint Interface

Connect to a Kusto cluster by instantiating a kusto_query_endpoint object with the cluster URI, database name, and an AzureRMR::AzureToken object, which you can obtain via the get_kusto_token helper function.


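library(AzureKusto)

# Connect to the Samples database on the public "help" cluster.
# get_kusto_token() authenticates with Azure Active Directory and may prompt
# you to sign in with a device code, as in the output below.
Samples <- kusto_query_endpoint(server="https://help.kusto.windows.net",
    database="Samples",
    fed=TRUE,
    .azure_token=get_kusto_token("help", tenant="microsoft"))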

# To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FPD8GZPY9 to authenticate.
# AADSTS70016: Pending end-user authorization. 
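The example passes the full cluster URI to kusto_query_endpoint() but only the short name "help" to get_kusto_token(), so the two have to agree. As a minimal sketch, assuming (as the call above suggests) that the token helper expects the leading host label of the URI, a small base-R function can derive one from the other. cluster_name_from_uri() is a hypothetical helper, not part of AzureKusto.

```r
# Hypothetical helper (not part of AzureKusto): extract the leading host label
# of a Kusto cluster URI, e.g. "help" from "https://help.kusto.windows.net".
cluster_name_from_uri <- function(uri) {
    pattern <- "^https?://([^./]+)\\..*$"
    if (!grepl(pattern, uri))
        stop("not a recognisable cluster URI: ", uri)
    sub(pattern, "\\1", uri)
}

server <- "https://help.kusto.windows.net"
cluster_name_from_uri(server)  # "help"
```

With it, the token request above could be written as get_kusto_token(cluster_name_from_uri(server), tenant="microsoft"). This relies on the assumption stated above; regional clusters, whose URIs look like https://mycluster.westus.kusto.windows.net, may need extra arguments depending on your package version.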

Now you can issue queries to the Kusto database with run_query and get the results back as a data.frame.


res <- run_query(Samples, "StormEvents | summarize EventCount = count() by State | order by State asc")

# Print the data.frame as a tibble (as_tibble() comes from the tibble package).
tibble::as_tibble(res)

## # A tibble: 67 x 2
##    State          EventCount     
##    <chr>          <S3: integer64>
##  1 ALABAMA        1315           
##  2 ALASKA         " 257"         
##  3 AMERICAN SAMOA "  16"         
##  4 ARIZONA        " 340"         
##  5 ARKANSAS       1028           
##  6 ATLANTIC NORTH " 188"         
##  7 ATLANTIC SOUTH " 193"         
##  8 CALIFORNIA     " 898"         
##  9 COLORADO       1654           
## 10 CONNECTICUT    " 148"         
## # ... with 57 more rows
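Because run_query() returns an ordinary data.frame, the result can be post-processed locally with base R. The sketch below is self-contained: res_sample is a hypothetical stand-in built from the ten rows printed above, with EventCount stored as plain integers rather than the integer64 type (from the bit64 package) that the real column uses.

```r
# Stand-in for `res`: the first ten rows printed above, with EventCount held
# as plain integers (the real column is bit64 integer64).
res_sample <- data.frame(
    State = c("ALABAMA", "ALASKA", "AMERICAN SAMOA", "ARIZONA", "ARKANSAS",
              "ATLANTIC NORTH", "ATLANTIC SOUTH", "CALIFORNIA", "COLORADO",
              "CONNECTICUT"),
    EventCount = c(1315L, 257L, 16L, 340L, 1028L, 188L, 193L, 898L, 1654L, 148L),
    stringsAsFactors = FALSE
)

# Ordinary data.frame operations apply: the three states with the most events
# among these rows (COLORADO, ALABAMA and ARKANSAS, in that order).
top3 <- head(res_sample[order(res_sample$EventCount, decreasing = TRUE), ], 3)
top3
```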

dplyr Interface

The package also implements a dbplyr-style interface: you build a query on a tbl_kusto object, and collect() runs it on the remote Kusto database and returns the result as a regular tibble.


library(dplyr)  # provides %>%, group_by(), summarize(), arrange(), show_query() and collect()

StormEvents <- tbl_kusto(Samples, "StormEvents")

q <- StormEvents %>%
    group_by(State) %>%
    summarize(EventCount=n()) %>%
    arrange(State)

show_query(q)

# <KQL> database('Samples').StormEvents
# | summarize EventCount = count() by State
# | order by State asc

collect(q)

## # A tibble: 67 x 2
##    State          EventCount     
##    <chr>          <S3: integer64>
##  1 ALABAMA        1315           
##  2 ALASKA         " 257"         
##  3 AMERICAN SAMOA "  16"         
##  4 ARIZONA        " 340"         
##  5 ARKANSAS       1028           
##  6 ATLANTIC NORTH " 188"         
##  7 ATLANTIC SOUTH " 193"         
##  8 CALIFORNIA     " 898"         
##  9 COLORADO       1654           
## 10 CONNECTICUT    " 148"         
## # ... with 57 more rows
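The show_query() output above shows the shape of the translation: group_by() and summarize() collapse into a single KQL summarize ... by operator, arrange() becomes order by, and each step is appended to the table reference as a new | line. The following toy sketch mimics that shape with plain string building in base R. kql_pipeline(), kql_summarize_count() and kql_order_asc() are hypothetical names; this only illustrates the output format and is not AzureKusto's translation code.

```r
# Hypothetical toy builder (not AzureKusto's translator): a KQL query is a
# table reference followed by one "| operator" line per step.
kql_pipeline <- function(database, table, steps) {
    header <- sprintf("database('%s').%s", database, table)
    paste(c(header, paste("|", steps)), collapse = "\n")
}

# group_by(by) + summarize(name = n()) maps to a single "summarize ... by" step
kql_summarize_count <- function(name, by) {
    sprintf("summarize %s = count() by %s", name, by)
}

# arrange(col) maps to an ascending "order by" step
kql_order_asc <- function(col) {
    sprintf("order by %s asc", col)
}

kql <- kql_pipeline("Samples", "StormEvents",
                    c(kql_summarize_count("EventCount", "State"),
                      kql_order_asc("State")))

# Prints the same three lines as show_query(q) above, without the <KQL> prefix.
writeLines(kql)
```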
