TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
Updated 2024-11-06 19:56:28 +03:00
Updated 2024-11-06 04:21:03 +03:00
Data access package for the SubseasonalClimateUSA dataset
Updated 2024-11-05 16:57:54 +03:00
Structured data files for topics covered by GitHub's Transparency Report
Updated 2024-09-30 19:42:54 +03:00
Qlib is an AI-oriented quantitative investment platform that aims to realize the potential, empower research, and create value using AI technologies in quantitative investment, from exploring ideas to implementing productions. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and RL.
Updated 2024-09-12 18:44:27 +03:00
The ORBIT dataset is a collection of videos of objects in clean and cluttered scenes recorded by people who are blind/low-vision on a mobile phone. The dataset is presented with a teachable object recognition benchmark task which aims to drive few-shot learning on challenging real-world data.
Updated 2024-08-13 03:27:45 +03:00
Dataset of Government Open Source Policies
Updated 2024-07-06 05:15:10 +03:00
[ACL 2022] A hierarchical table dataset for question answering and data-to-text generation.
Updated 2024-04-09 04:30:40 +03:00
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
Updated 2024-03-16 09:53:11 +03:00
A Dataset of Python Challenges for AI Research
Updated 2023-12-21 00:10:56 +03:00
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
Updated 2023-09-07 07:38:34 +03:00
Sepsis cohort from MIMIC dataset
Updated 2023-07-07 01:16:14 +03:00
Tools to compare metrics between datasets, accounting for population differences and invariant features.
Updated 2023-07-07 00:36:40 +03:00
These are the FER+ new label annotations for the Emotion FER dataset.
Updated 2023-06-12 23:52:53 +03:00
Dataset and code for three Web crawling-related papers from SIGIR-2019, NeurIPS-2019, and ICML-2020.
Updated 2023-06-12 21:21:59 +03:00
Truly Conversational Search is the next logical step in the journey to generate intelligent and useful AI. To understand what this may mean, researchers have voiced a continuous desire to study how people currently converse with search engines. Traditionally, the desire to produce such a comprehensive dataset has been limited because those who have this data (search engines) have a responsibility to their users to maintain their privacy and cannot share the data publicly in a way that upholds the trust users have in them. Given these two powerful forces, we believe we have a dataset and paradigm that meets both sets of needs: an artificial public dataset that approximates the true data, and an ability to evaluate model performance on the real user behavior. What this means is that we have released a public dataset, generated by creating artificial sessions using embedding similarity, and will test on the original data. To say this again: we are not releasing any private user data but are releasing what we believe to be a good representation of true user interactions.
Updated 2023-06-12 21:21:58 +03:00
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension and question answering.
Updated 2023-06-12 21:21:58 +03:00
Automatically extracting keyphrases that are salient to a document's meaning is an essential step in semantic document understanding. An effective keyphrase extraction (KPE) system can benefit a wide range of natural language processing and information retrieval tasks. Recent neural methods formulate the task as a document-to-keyphrase sequence-to-sequence task. These seq2seq learning models have shown promising results compared to previous KPE systems. The recent progress in neural KPE is mostly observed in documents originating from the scientific domain. In real-world scenarios, most potential applications of KPE deal with diverse documents originating from sparse sources. These documents are unlikely to have the structure and prose quality of scientific papers. They often have a much more diverse document structure and reside in various domains whose contents target much wider audiences than scientists. To encourage the research community to develop a powerful neural model for keyphrase extraction on open domains, we have created OpenKP: a dataset of over 150,000 documents with the most relevant keyphrases generated by expert annotation.
Updated 2023-06-12 21:21:58 +03:00
This project is the official implementation of our ECCV 2018 paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/abs/1804.06208).
Updated 2022-11-28 22:11:04 +03:00
Optimal Transport Dataset Distance
Updated 2022-02-17 20:49:13 +03:00
methods2test is a supervised dataset consisting of Test Cases and their corresponding Focal Methods from a set of Java software repositories
Updated 2022-01-20 03:56:30 +03:00
FS-Mol is a few-shot learning dataset of molecules, containing molecular compounds with measurements of activity against a variety of protein targets. The dataset is presented with a model evaluation benchmark which aims to drive few-shot learning research in the domain of molecules and graph-structured data.
Updated 2022-01-06 19:18:51 +03:00
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage/document ranking.
Updated 2022-01-03 19:13:01 +03:00
Create WTML from lists of HiPS datasets
Updated 2021-12-27 18:51:34 +03:00
Implementation of "Debiasing Item-to-Item Recommendations With Small Annotated Datasets" (RecSys '20)
Updated 2020-10-13 21:31:30 +03:00
ETL code that produces the addons_daily derived dataset.
Updated 2019-07-30 21:35:44 +03:00
Explorer for the OverScripted dataset
Updated 2019-03-31 02:45:05 +03:00
Hive import statement generator for Parquet datasets
Updated 2018-11-29 01:22:00 +03:00
Sample code showing how to perform distributed training of a Fizyr Keras-RetinaNet model on the COCO dataset using Horovod on Batch AI
Updated 2018-08-10 20:56:45 +03:00