Explore - Git

microsoft / torchgeo

Python 0 0

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data

deep-learning pytorch datasets earth-observation models remote-sensing torchvision transforms

Updated 2024-11-21 10:30:46 +03:00

microsoft / HiTab

Python 0 0

[ACL 2022] A hierarchical table dataset for question answering and data-to-text generation.

Updated 2024-11-19 03:32:30 +03:00

microsoft / qlib

Python 0 0

Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.

machine-learning deep-learning python platform research finance algorithmic-trading auto-quant fintech investment paper quant quant-dataset quant-models quantitative-finance quantitative-trading research-paper stock-data

Updated 2024-11-13 06:41:06 +03:00

microsoft / vision-datasets

Python 0 0

Updated 2024-11-06 04:21:03 +03:00

microsoft / subseasonal_data

Python 0 0

Data access package for the SubseasonalClimateUSA dataset

Updated 2024-11-05 16:57:54 +03:00

github / transparency

Python 0 0

Structured data files for topics covered by GitHub's Transparency Report

open-data dataset data transparency

Updated 2024-09-30 19:42:54 +03:00

microsoft / ORBIT-Dataset

Python 0 0

The ORBIT dataset is a collection of videos of objects in clean and cluttered scenes recorded by people who are blind/low-vision on a mobile phone. The dataset is presented with a teachable object recognition benchmark task which aims to drive few-shot learning on challenging real-world data.

microsoft machine-learning computer-vision video benchmark dataset classification few-shot-learning meta-learning object-recognition

Updated 2024-08-13 03:27:45 +03:00

github / government-open-source-policies

Python 0 0

Dataset of Government Open Source Policies

open-source open-data government policies

Updated 2024-07-06 05:15:10 +03:00

microsoft / vert-papers

Python 0 0

This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).

nlp ml ner named-entity-recognition entity-extraction entity-linking entity-resolution grn language-understanding linkingpark nlp-resources unitrans xl-ner bertel can-ner cross-lingual-ner entity-disambiguation

Updated 2024-03-16 09:53:11 +03:00

microsoft / PythonProgrammingPuzzles

Python 0 0

A Dataset of Python Challenges for AI Research

ai program-synthesis programming-competitions puzzles

Updated 2023-12-21 00:10:56 +03:00

microsoft / table-transformer

Python 0 0

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.

table-detection table-extraction table-functional-analysis table-structure-recognition

Updated 2023-09-07 07:38:34 +03:00

microsoft / mimic_sepsis

Python 0 0

Sepsis cohort from MIMIC dataset

Updated 2023-07-07 01:16:14 +03:00

microsoft / MS-Lumos

Python 0 0

Tools to compare metrics between datasets, accounting for population differences and invariant features.

Updated 2023-07-07 00:36:40 +03:00

microsoft / FERPlus

Python 0 0

These are the FER+ new label annotations for the Emotion FER dataset.

Updated 2023-06-12 23:52:53 +03:00

microsoft / Optimal-Freshness-Crawl-Scheduling

Python 0 0

Dataset and code for three Web crawling-related papers from SIGIR-2019, NeurIPS-2019, and ICML-2020.

Updated 2023-06-12 21:21:59 +03:00

microsoft / MSMARCO-Conversational-Search

Python 0 0

Truly Conversational Search is the next logical step in the journey to generate intelligent and useful AI. To understand what this may mean, researchers have voiced a continuous desire to study how people currently converse with search engines. Traditionally, the desire to produce such a comprehensive dataset has been limited because those who have this data (Search Engines) have a responsibility to their users to maintain their privacy and cannot share the data publicly in a way that upholds the trust users have in the Search Engines. Given these two powerful forces, we believe we have a dataset and paradigm that meets both sets of needs: an artificial public dataset that approximates the true data and an ability to evaluate model performance on the real user behavior. What this means is we released a public dataset which is generated by creating artificial sessions using embedding similarity and will test on the original data. To say this again: we are not releasing any private user data but are releasing what we believe to be a good representation of true user interactions.

Updated 2023-06-12 21:21:58 +03:00

microsoft / MSMARCO-Question-Answering

Python 0 0

MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension and question answering.

Updated 2023-06-12 21:21:58 +03:00

microsoft / OpenKP

Python 0 0

Automatically extracting keyphrases that are salient to the document meanings is an essential step to semantic document understanding. An effective keyphrase extraction (KPE) system can benefit a wide range of natural language processing and information retrieval tasks. Recent neural methods formulate the task as a document-to-keyphrase sequence-to-sequence task. These seq2seq learning models have shown promising results compared to previous KPE systems. The recent progress in neural KPE is mostly observed in documents originating from the scientific domain. In real-world scenarios, most potential applications of KPE deal with diverse documents originating from sparse sources. These documents are unlikely to have the structure or polished prose of scientific papers. They often have a much more diverse document structure and reside in various domains whose contents target much wider audiences than scientists. To encourage the research community to develop a powerful neural model for keyphrase extraction on open domains, we have created OpenKP: a dataset of over 150,000 documents with the most relevant keyphrases generated by expert annotation.

Updated 2023-06-12 21:21:58 +03:00

microsoft / human-pose-estimation.pytorch

Python 0 0

The project is an official implementation of our ECCV 2018 paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/abs/1804.06208).

deep-learning human-pose-estimation coco-keypoints-detection mpii-dataset mscoco-keypoint

Updated 2022-11-28 22:11:04 +03:00

microsoft / otdd

Python 0 0

Optimal Transport Dataset Distance

Updated 2022-02-17 20:49:13 +03:00

microsoft / methods2test

Python 0 0

methods2test is a supervised dataset consisting of Test Cases and their corresponding Focal Methods from a set of Java software repositories

machine-learning automated-testing

Updated 2022-01-20 03:56:30 +03:00

microsoft / FS-Mol

Python 0 0

FS-Mol is a few-shot learning dataset of molecules, containing molecular compounds with measurements of activity against a variety of protein targets. The dataset is presented with a model evaluation benchmark which aims to drive few-shot learning research in the domain of molecules and graph-structured data.

Updated 2022-01-06 19:18:51 +03:00

microsoft / MSMARCO-Document-Ranking

Python 0 0

MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage/document ranking.

Updated 2022-01-03 19:13:01 +03:00

WorldWideTelescope / wwt-hips-list-importer

Python 0 0

Create WTML from lists of HiPS datasets

Updated 2021-12-27 18:51:34 +03:00

microsoft / debiasing-item2item

Python 0 0

Implementation of "Debiasing Item-to-Item Recommendations With Small Annotated Datasets" (RecSys '20)

Updated 2020-10-13 21:31:30 +03:00

mozilla / addons_daily

Python 0 0

ETL code that produces the addons_daily derived dataset.

Updated 2019-07-30 21:35:44 +03:00

mozilla / overscripted-explorer

Python 0 0

Explorer for the OverScripted dataset

Updated 2019-03-31 02:45:05 +03:00

mozilla / parquet2hive

Python 0 0

Hive import statement generator for Parquet datasets

Updated 2018-11-29 01:22:00 +03:00

Azure / batchai_retinanet_horovod_coco

Python 0 0

Sample code showing how to perform distributed training of a Fizyr Keras-RetinaNet model on the COCO dataset using Horovod on Batch AI

Updated 2018-08-10 20:56:45 +03:00