Classification datasets

| File | Description | Source link (with details) | Preprocessing applied | Label column |
| --- | --- | --- | --- | --- |
| `generated.csv` | Automatically generated dataset containing data samples separated into very well-delineated categories. This can be considered a "best-case scenario" test case. | | | `label` |
| `defaults.csv` | Defaults on credit card payments | UCI | Minor (column name reformatting) | `defaulted` |
| `winequality.csv` | Quality ratings of Portuguese white wines | UCI | Added binarized label column `recommend` indicating quality >= 7 | `recommend` |
| `vehicles.csv` | Recognizing vehicle type from its silhouette | OpenML | None | `Class` |
| `eeg.csv` | EEG eye state measurements | OpenML | Dropped a few outlier rows | `Class` |
| `kick_starter.csv` | Kickstarter project state | Kaggle | Dropped unnamed columns; minor column name reformatting; calculated the duration of the project and dropped start and end dates; dropped some rows with wrong input type; dropped main category column and kept category column; randomly sampled 30% of the data; filled NA with 0 for numeric values | `state` |
| `mushrooms.csv` | Classifying mushroom edibility based on physical features | UCI | Renamed the column `class` to `edibility` for descriptiveness | `edibility` |
| `Surgical-deepnet.csv` | Surgical cases related to complication | Kaggle | None | `complication` |
| `gender_classification.csv` | Using hobbies to guess gender | Kaggle | None | `Gender` |

These can all be loaded using Pandas:

```python
import pandas as pd

# Replace "file.csv" with the path to any of the files listed above.
dataset = pd.read_csv("file.csv")
```
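
Since each file has its own label column, it can be convenient to separate the label from the features right after loading. The following is a minimal sketch of one way to do that. The `LABEL_COLUMNS` mapping is copied from the table above, while the helper name `load_features_and_label` is hypothetical and not part of PRESC.

```python
import pandas as pd

# Label column for each file, copied from the table above.
LABEL_COLUMNS = {
    "generated.csv": "label",
    "defaults.csv": "defaulted",
    "winequality.csv": "recommend",
    "vehicles.csv": "Class",
    "eeg.csv": "Class",
    "kick_starter.csv": "state",
    "mushrooms.csv": "edibility",
    "Surgical-deepnet.csv": "complication",
    "gender_classification.csv": "Gender",
}


def load_features_and_label(path, label_column):
    """Read a CSV file and return a (features, labels) pair.

    Hypothetical helper for illustration only; not part of PRESC.
    """
    dataset = pd.read_csv(path)
    if label_column not in dataset.columns:
        raise KeyError(f"{label_column!r} is not a column of {path}")
    # The label becomes a Series; every other column stays in the features frame.
    labels = dataset[label_column]
    features = dataset.drop(columns=[label_column])
    return features, labels
```

For example, assuming the working directory is this folder, `load_features_and_label("winequality.csv", LABEL_COLUMNS["winequality.csv"])` would return the wine features together with the `recommend` labels.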