TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
Updated 2024-11-06 19:56:28 +03:00
Updated 2024-11-06 04:21:03 +03:00
Data access package for the SubseasonalClimateUSA dataset
Updated 2024-11-05 16:57:54 +03:00
Structured data files for topics covered by GitHub's Transparency Report
Updated 2024-09-30 19:42:54 +03:00
Microsoft Azure Traces
Updated 2024-09-28 20:49:03 +03:00
Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
Updated 2024-09-12 18:44:27 +03:00
A high-performance modern set of graph rendering components, which enables users to visualize large graph datasets on the web.
Updated 2024-08-29 15:29:55 +03:00
A dataset of real DNA traces for benchmarking trace reconstruction algorithms
Updated 2024-08-13 21:17:32 +03:00
The ORBIT dataset is a collection of videos of objects in clean and cluttered scenes recorded by people who are blind/low-vision on a mobile phone. The dataset is presented with a teachable object recognition benchmark task which aims to drive few-shot learning on challenging real-world data.
Updated 2024-08-13 03:27:45 +03:00
Notebooks and documentation for AI-for-Earth-managed datasets on Azure
Updated 2024-07-25 14:51:04 +03:00
Dataset of Government Open Source Policies
Updated 2024-07-06 05:15:10 +03:00
[ACL 2022] A hierarchical table dataset for question answering and data-to-text generation.
Updated 2024-04-09 04:30:40 +03:00
InnerEye dataset creation tool for the InnerEye-DeepLearning library. Transforms DICOM data into masks for training deep learning models.
Updated 2024-03-21 12:52:00 +03:00
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
Updated 2024-03-16 09:53:11 +03:00
C# and F# language binding and extensions to Apache Spark
Updated 2024-01-30 22:45:57 +03:00
A Dataset of Python Challenges for AI Research
Updated 2023-12-21 00:10:56 +03:00
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
Updated 2023-09-07 07:38:34 +03:00
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
Updated 2023-07-07 01:41:03 +03:00
Sepsis cohort from MIMIC dataset
Updated 2023-07-07 01:16:14 +03:00
Normalized Trend Filtering for Biomedical Datasets
Updated 2023-07-07 01:07:14 +03:00
Tools to compare metrics between datasets, accounting for population differences and invariant features.
Updated 2023-07-07 00:36:40 +03:00
Code Hunt is a serious educational game which has been played by over 140,000 students and enthusiasts over the past year. In the process we have collected over 1.5M programs. We hope that researchers will embark on research into the data. Please fill out our quick survey to let us know how you are using the dataset, and to get updates about new releases of Code Hunt data. See more on our Code Hunt Research page.
Updated 2023-06-27 16:09:36 +03:00
This repo contains a walkthrough of how to use RServer for HDInsight with large data sets like Criteo.
Updated 2023-06-27 16:07:15 +03:00
These are the new FER+ label annotations for the Emotion FER dataset.
Updated 2023-06-12 23:52:53 +03:00
This project was created to analyze, compare, and identify whale tails from the Kaggle competition dataset "Humpback Whale Identification Challenge". It is written in Python and uses the Keras API with a TensorFlow backend. The project implements both a Siamese network and a softmax classifier with center loss.
Updated 2023-06-12 22:29:42 +03:00
Dataset and code for three Web crawling-related papers from SIGIR-2019, NeurIPS-2019, and ICML-2020.
Updated 2023-06-12 21:21:59 +03:00
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage ranking. A variant of this task will be part of TREC and AFIRM 2019. For updates about TREC 2019, please follow this repository. Passage reranking task: given a query q and the 1000 most relevant passages P = p1, p2, p3, ..., p1000, as retrieved by BM25, a successful system is expected to rerank the most relevant passage as high as possible. For this task, not all 1000 relevant items have a human-labeled relevant passage. Evaluation will be done using MRR.
Updated 2023-06-12 21:21:58 +03:00
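The MS MARCO entry above says reranked passages are scored with MRR (mean reciprocal rank). The following is a minimal sketch of that metric, not the repository's official evaluation script. The function name `mean_reciprocal_rank`, the dictionary layout, and the optional `cutoff` parameter are assumptions made for illustration; the listing does not specify a rank cutoff.

```python
from typing import Dict, List, Optional, Set


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    cutoff: Optional[int] = None,  # hypothetical parameter; None means no cutoff
) -> float:
    """Average, over queries, of 1/rank of the first relevant passage.

    rankings: query id -> passage ids ordered from best to worst.
    relevant: query id -> set of passage ids labeled relevant.
    A query with no relevant passage in its (truncated) ranking contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, ranked_passages in rankings.items():
        labeled = relevant.get(query_id, set())
        considered = ranked_passages if cutoff is None else ranked_passages[:cutoff]
        for rank, passage_id in enumerate(considered, start=1):
            if passage_id in labeled:
                total += 1.0 / rank
                break  # only the first relevant passage counts
    return total / len(rankings)


# Toy data, invented for illustration only.
example_rankings = {
    "q1": ["p3", "p1", "p2"],  # first relevant passage at rank 2
    "q2": ["p5", "p4"],        # first relevant passage at rank 1
    "q3": ["p9"],              # no relevant passage retrieved
}
example_relevant = {"q1": {"p1"}, "q2": {"p5"}, "q3": {"p0"}}
example_mrr = mean_reciprocal_rank(example_rankings, example_relevant)
```

With this toy data the reciprocal ranks are 1/2, 1, and 0, so `example_mrr` is 0.5. Only the first relevant passage per query affects the score, which matches the task's goal of ranking the most relevant passage as high as possible.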