TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.
Updated 2024-12-01 22:09:05 +03:00
This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.
Updated 2024-12-01 17:00:36 +03:00
🐙 Receives data from the survey_client, evaluates and visualizes it
Updated 2024-11-17 21:58:32 +03:00
Benchmark to evaluate performance of Azure Real-Time Services including Azure SignalR and Azure Web PubSub
Updated 2024-11-11 13:45:03 +03:00
The DSB benchmark is designed for evaluating both workload-driven and traditional database systems on modern decision support workloads. DSB is adapted from the widely used industry-standard TPC-DS benchmark. It enhances the TPC-DS benchmark with complex data distributions and challenging yet semantically meaningful query templates. DSB also introduces configurable and dynamic workloads to assess the adaptability of database systems. Since workload-driven and traditional database systems have different performance dimensions, including the additional resources required for tuning and maintaining the systems, we provide guidelines on evaluation methodology and metrics to report.
Updated 2024-11-08 05:29:20 +03:00
A JSON-based rules engine with extensive dynamic expression support
Updated 2024-11-05 00:45:12 +03:00
This is an open-source implementation of the ITU P.808 standard for "Subjective evaluation of speech quality with a crowdsourcing approach" (see https://www.itu.int/rec/T-REC-P.808/en). It uses Amazon Mechanical Turk as the crowdsourcing platform. It includes implementations for Absolute Category Rating (ACR), Degradation Category Rating (DCR), and Comparison Category Rating (CCR).
Updated 2024-05-23 20:22:42 +03:00
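In Absolute Category Rating (ACR), each listener rates a clip on a five-point scale (1 = Bad through 5 = Excellent), and the clip's result is usually reported as a mean opinion score (MOS). The snippet below is a minimal sketch of that aggregation step, assuming the ratings for one clip have already been collected and screened. The function name `mos_with_ci` is hypothetical and not part of the P.808 implementation, and the confidence interval uses a simple normal approximation (z = 1.96) that may differ from what the toolkit itself reports.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

# Five-point ACR listening-quality scale (score -> label).
ACR_SCALE = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}


def mos_with_ci(ratings: Sequence[int], z: float = 1.96) -> Tuple[float, float]:
    """Hypothetical helper: return (MOS, half-width of an approximate CI).

    The half-width is z * sample standard deviation / sqrt(n), i.e. a
    normal approximation chosen for simplicity.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    for r in ratings:
        if r not in ACR_SCALE:
            raise ValueError(f"rating {r!r} is outside the 1-5 ACR scale")
    mos = float(mean(ratings))
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


if __name__ == "__main__":
    score, ci = mos_with_ci([4, 5, 3, 4, 4])
    print(f"MOS = {score:.2f} +/- {ci:.2f}")
```

With few votes per clip the interval is wide, which is one reason crowdsourced tests collect many ratings per item.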
Azure Data Explorer can provide valuable insights into your IoT workloads. In the following hands-on lab, we look at thermostat IoT devices located in three different office buildings.
Updated 2024-04-30 01:41:27 +03:00
Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation [ICML 2021]
Updated 2024-03-27 20:07:44 +03:00
A GitHub Action which evaluates twoslash bug reproductions in GitHub Issues
Updated 2024-02-15 03:56:40 +03:00
Translation quality evaluation for Firefox Translations models
Updated 2023-10-24 00:14:07 +03:00
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
Updated 2023-09-07 07:38:34 +03:00
Towards Neural Phrase-based Machine Translation
Updated 2023-06-27 16:02:11 +03:00
A test framework to evaluate SSDs and HDDs
Updated 2023-06-13 02:29:06 +03:00
Truly Conversational Search is the next logical step in the journey to generate intelligent and useful AI. To understand what this may mean, researchers have voiced a continuous desire to study how people currently converse with search engines. Traditionally, the desire to produce such a comprehensive dataset has been limited because those who have this data (search engines) have a responsibility to their users to maintain their privacy and cannot share the data publicly in a way that upholds the trust users place in them. Given these two powerful forces, we believe we have a dataset and paradigm that meets both sets of needs: an artificial public dataset that approximates the true data, and the ability to evaluate model performance on real user behavior. Concretely, we are releasing a public dataset generated by creating artificial sessions using embedding similarity, and models will be tested on the original data. To say this again: we are not releasing any private user data, but we are releasing what we believe to be a good representation of true user interactions.
Updated 2023-06-12 21:21:58 +03:00
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage ranking. A variant of this task will be part of TREC and AFIRM 2019. For updates about TREC 2019, please follow this repository. Passage Reranking task: given a query q and the 1000 most relevant passages P = p1, p2, p3, ..., p1000 as retrieved by BM25, a successful system is expected to rank the most relevant passage as high as possible. For this task, not every set of 1000 candidate passages contains a human-labeled relevant passage. Evaluation will be done using MRR (mean reciprocal rank).
Updated 2023-06-12 21:21:58 +03:00
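MRR averages, over queries, the reciprocal of the rank at which the first relevant passage appears, counting 0 for a query whose ranking contains no relevant passage. The snippet below is a minimal sketch of that computation, not the official MS MARCO evaluation script. It assumes rankings and relevance labels are held in plain dictionaries; the function name, the dictionary layout, and the default cutoff `k=10` (the commonly used MRR@10 variant) are illustrative assumptions.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Hypothetical helper: mean of 1/rank of the first relevant passage in the top k.

    rankings: query id -> passage ids ordered best-first (e.g. a reranked BM25 top-1000).
    relevant: query id -> set of human-labeled relevant passage ids.
    Queries with no relevant passage in their top k contribute 0.
    The average is taken over every query present in `rankings`.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked in rankings.items():
        labeled = relevant.get(qid, set())
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in labeled:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


if __name__ == "__main__":
    runs = {"q1": ["p3", "p1", "p2"], "q2": ["p5", "p4"]}
    qrels = {"q1": {"p1"}, "q2": {"p5"}}
    # q1 finds its relevant passage at rank 2 (0.5), q2 at rank 1 (1.0).
    print(mean_reciprocal_rank(runs, qrels))
```

Because only the first relevant hit counts, MRR rewards placing one relevant passage near the top, which matches the task goal of ranking the most relevant passage as high as possible.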
Record-and-replay tools are indispensable for quality assurance of mobile applications. However, after conducting an empirical study of various existing tools in industrial settings, researchers concluded that none of the tools under evaluation is sufficient for industrial applications. In this project, we present a record-and-replay tool called SARA that aims to bridge this gap and achieve wide adoption.
Updated 2023-06-12 21:21:33 +03:00
A tool which evaluates whether or not PR activity in Azure REST API Specs repository meets our SLA.
Updated 2023-03-28 19:45:34 +03:00
Updated 2023-03-10 02:59:53 +03:00
Tiny expression evaluator
Updated 2023-03-05 23:05:27 +03:00
Performance Robustness Evaluation for Statistical Classifiers
Updated 2023-01-26 02:24:27 +03:00
STM32Cube MCU Full Package for the STM32F7 series - (HAL + LL Drivers, CMSIS Core, CMSIS Device, MW libraries plus a set of Projects running on all boards provided by ST (Nucleo, Evaluation and Discovery Kits))
Updated 2023-01-23 20:23:12 +03:00
Welcome to the Azure Stack HCI Evaluation Guide!
Updated 2023-01-17 23:45:00 +03:00
AuctionGym is a simulation environment that enables reproducible evaluation of bandit and reinforcement learning methods for online advertising auctions.
Updated 2022-12-22 18:06:09 +03:00
Repository that contains code related to artifact policy evaluation, for use in Azure Pipelines
Updated 2022-12-08 00:24:32 +03:00
A .NET framework for composing, evaluating, inspecting and persisting computational experiments which are represented as a dataflow.
Updated 2022-11-28 22:13:18 +03:00
Code for the ACL 2021 paper "GLGE: A New General Language Generation Evaluation Benchmark"
Updated 2022-10-26 08:50:01 +03:00
JavaScript Expression Language: powerful context-based expression parser and evaluator
Updated 2022-10-07 18:17:31 +03:00