TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
Updated 2024-11-06 19:56:28 +03:00
Updated 2024-11-06 04:21:03 +03:00
Data access package for the SubseasonalClimateUSA dataset
Updated 2024-11-05 16:57:54 +03:00
Structured data files for topics covered by GitHub's Transparency Report
Updated 2024-09-30 19:42:54 +03:00
Microsoft Azure Traces
Updated 2024-09-28 20:49:03 +03:00
Perception toolkit for sim2real training and validation in Unity
machine-learning
computer-vision
deep-learning
detection
domain-randomization
object-detection
perception
pose-estimation
segmentation
synthetic-dataset-generation
Updated 2024-09-23 21:19:23 +03:00
Qlib is an AI-oriented quantitative investment platform that aims to realize the potential of, empower research on, and create value with AI technologies in quantitative investment, from exploring ideas to implementing them in production. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning.
machine-learning
deep-learning
python
platform
research
finance
algorithmic-trading
auto-quant
fintech
investment
paper
quant
quant-dataset
quant-models
quantitative-finance
quantitative-trading
research-paper
stock-data
Updated 2024-09-12 18:44:27 +03:00
A high-performance, modern set of graph rendering components that enables users to visualize large graph datasets on the web.
Updated 2024-08-29 15:29:55 +03:00
A dataset of real DNA traces for benchmarking trace reconstruction algorithms
Updated 2024-08-13 21:17:32 +03:00
The ORBIT dataset is a collection of videos of objects in clean and cluttered scenes recorded by people who are blind/low-vision on a mobile phone. The dataset is presented with a teachable object recognition benchmark task which aims to drive few-shot learning on challenging real-world data.
microsoft
machine-learning
computer-vision
video
benchmark
dataset
classification
few-shot-learning
meta-learning
object-recognition
Updated 2024-08-13 03:27:45 +03:00
Notebooks and documentation for AI-for-Earth-managed datasets on Azure
Updated 2024-07-25 14:51:04 +03:00
Dataset of Government Open Source Policies
Updated 2024-07-06 05:15:10 +03:00
[ACL 2022] A hierarchical table dataset for question answering and data-to-text generation.
Updated 2024-04-09 04:30:40 +03:00
InnerEye dataset creation tool for the InnerEye-DeepLearning library. Transforms DICOM data into masks for training deep learning models.
Updated 2024-03-21 12:52:00 +03:00
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
nlp
ml
ner
named-entity-recognition
entity-extraction
entity-linking
entity-resolution
grn
language-understanding
linkingpark
nlp-resources
unitrans
xl-ner
bertel
can-ner
cross-lingual-ner
entity-disambiguation
Updated 2024-03-16 09:53:11 +03:00
Unity's privacy-preserving human-centric synthetic data generator
unity
unity3d
deep-learning
computer-vision
pose-estimation
object-detection
synthetic-data
perception
synthetic-dataset-generation
billing-5160
synthetic-datasets
applied-ml-research
human-activity-recognition
human-centric-ml
human-pose-estimation
icml-2022
labeling
owner-machine-learning
synthetic-data-generation
transfer-learning
Updated 2024-03-05 04:05:37 +03:00
C# and F# language bindings and extensions to Apache Spark
csharp
fsharp
spark
dataset
bigdata
spark-streaming
streaming
apache-spark
rdd
dataframe
dstream
eventhubs
kafka-streaming
mapreduce
mobius
near-real-time
Updated 2024-01-30 22:45:57 +03:00
A Dataset of Python Challenges for AI Research
Updated 2023-12-21 00:10:56 +03:00
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.
Updated 2023-09-07 07:38:34 +03:00
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
Updated 2023-07-07 01:41:03 +03:00
Sepsis cohort from MIMIC dataset
Updated 2023-07-07 01:16:14 +03:00
Normalized Trend Filtering for Biomedical Datasets
Updated 2023-07-07 01:07:14 +03:00
Tools to compare metrics between datasets, accounting for population differences and invariant features.
Updated 2023-07-07 00:36:40 +03:00
SynthDet - An end-to-end object detection pipeline using synthetic data
machine-learning
deep-learning
computer-vision
pose-estimation
synthetic-data
synthetic-dataset-generation
detection
domain-randomization
object-detection
synthetic-dataset
Updated 2023-07-05 23:50:31 +03:00
Code Hunt is a serious educational game that has been played by over 140,000 students and enthusiasts over the past year, in the process collecting over 1.5M programs. We hope that researchers will embark on research into the data. Please fill out our quick survey to let us know how you are using the dataset and to get updates about new releases of Code Hunt data. See more on our Code Hunt Research page.
Updated 2023-06-27 16:09:36 +03:00
This repo contains a walkthrough of how to use R Server for HDInsight with large datasets such as Criteo.
Updated 2023-06-27 16:07:15 +03:00
This is the FER+ set of new label annotations for the FER emotion dataset.
Updated 2023-06-12 23:52:53 +03:00
This project was created to analyze, compare, and identify whale tails from the Kaggle competition dataset "Humpback Whale Identification Challenge". It is written in Python and uses the Keras API with a TensorFlow backend. The project implements both a Siamese network and a softmax classifier with center loss.
Updated 2023-06-12 22:29:42 +03:00
Dataset and code for three Web crawling-related papers from SIGIR 2019, NeurIPS 2019, and ICML 2020.
Updated 2023-06-12 21:21:59 +03:00
MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage ranking. A variant of this task will be part of TREC and AFIRM 2019; for updates about TREC 2019, please follow this repository. Passage reranking task: given a query q and the 1000 most relevant passages P = p1, p2, p3, ..., p1000 as retrieved by BM25, a successful system is expected to rerank the most relevant passages as high as possible. Not all 1000 retrieved items have a human-labeled relevant passage. Evaluation is done using MRR.
Updated 2023-06-12 21:21:58 +03:00
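The MRR metric mentioned in the MS MARCO entry above (Mean Reciprocal Rank) can be sketched as follows. This is an illustrative implementation, not code from the MS MARCO repository; the function and variable names are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, relevant_ids):
    """Compute MRR over a set of queries.

    ranked_lists: one ranked list of passage ids per query.
    relevant_ids: one set of relevant passage ids per query.
    Each query contributes 1/rank of its first relevant passage
    (0 if no relevant passage appears in the ranking).
    """
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_ids):
        for rank, pid in enumerate(ranking, start=1):
            if pid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)

# Two queries: first relevant passage at rank 2 and rank 1.
score = mean_reciprocal_rank([["p3", "p1"], ["p7", "p9"]], [{"p1"}, {"p7"}])
print(score)  # (1/2 + 1/1) / 2 = 0.75
```

Rankings that bury the first relevant passage deeper are penalized harmonically, which is why the metric rewards systems that "rerank the most relevant passages as high as possible."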