tech.ml.dataset

A Clojure high performance data processing system

Clojure · 624 · epl-1.0

16 days ago

clojure, csv, dataframe

dataset

Easy-to-use data handling for SQL data stores with support for implicit table cr

Python · 4671 · mit

7 months ago

database, python, sql

dataset-api

The ApolloScape Open Dataset for Autonomous Driving and its Application.

Jupyter Notebook · 522 · apache-2.0

10 months ago

3d-car-instance, 3d-lidar, apolloscape-dataset

awesome-public-datasets

A topic-centric list of HQ open datasets.

57483 · mit

3 months ago

aaron-swartz, awesome-public-datasets, datasets

datasets-games

Datasets from a variety of games.

12

5 months ago

data, game

quickdraw-dataset

Documentation on how to access and use the Quick, Draw! Dataset.

5882 · other

7 months ago

dataset, quickdraw-dataset

awesome-json-datasets

A curated list of awesome JSON datasets that don't require authentication.

JavaScript · 3133 · cc0-1.0

8 months ago

awesome, awesome-list, data

awesome-dataset-tools

🔧 A curated list of awesome dataset tools

763 · mit

10 months ago

annotation-tool, annotations, awsome

game-datasets

:video_game: A curated list of awesome game datasets, and tools to artificial in

617 · cc-by-4.0

4 months ago

artificial-intelligence, awesome, awesome-game

dataset-packed-pe

Dataset of packed PE samples

Python · 21

15 days ago

binary-analysis, dataset, executable-packing

transfermarkt-datasets

⚽️ Extract, prepare and publish Transfermarkt datasets.

Python · 146 · cc0-1.0

3 months ago

analytics, dataset, dbt

dataset-serialize

JSON to DataSet and DataSet to JSON converter for Delphi and Lazarus (FPC)

Pascal · 560 · mit

4 months ago

child-dataset, converter, dataset

tablecloth

Dataset manipulation library built on the top of tech.ml.dataset

HTML · 259 · mit

18 days ago

clojure, dataframe, dataset

AIF360

A comprehensive set of fairness metrics for datasets and machine learning models

Python · 2209 · apache-2.0

4 months ago

ai, artificial-intelligence, bias

waymo-open-dataset

Waymo Open Dataset

Python · 2496 · other

7 days ago

autonomous-driving, dataset

netex-gtfs-converter-java

Convert a NeTEx dataset into a GTFS dataset

Java · 6 · eupl-1.2

last month

ror

TCPD

The Turing Change Point Dataset - A collection of time series for the evaluation

Python · 123 · mit

8 months ago

change-detection, change-point, change-point-detection

congresstweets

Datasets of the daily Twitter output of Congress.

SCSS · 91 · mit

9 months ago

congress, house, house-of-representatives

kitti360Scripts

This repository contains utility scripts for the KITTI-360 dataset.

Python · 343 · mit

11 months ago

AutoViz

Automatically Visualize any dataset, any size with a single line of code. Creat

Python · 1586 · apache-2.0

last month

auto-sklearn, automated-machine-learning, automl

sweetviz

Visualize and compare datasets, target values and associations, with one line of

Python · 2791 · mit

4 months ago

data-analysis, data-exploration, data-profiling

ffridataset-scripts

Make datasets like FFRI Dataset

Python · 9 · apache-2.0

9 months ago

AVData

Autonomous Vehicle Seasonal Dataset

C++ · 289 · mit

10 months ago

agridat

Agricultural datasets

R · 101 · other

2 months ago

data, rstats

MaleX

A curated dataset of malware and benign Windows executable samples for malware r

HTML · 31 · gpl-3.0

3 months ago

deep-learning, image-classification, machine-learning

medal

Large medical text dataset curated for abbreviation disambiguation, designed for

Python · 191

5 months ago

combine_gtfs_feeds

A tool to combine gtfs datasets into one feed.

Python · 4 · mit

11 months ago

pyboreas

Devkit for the Boreas autonomous driving dataset.

Python · 79 · bsd-3-clause

4 days ago

frostline

A dataset, API, and parser for USDA plant hardiness zones.

Python · 139 · mit

4 months ago

api, farm, garden

JAX-Flax-Tutorial-Image-Classification-with-Linen

How to use the Flax Linen API to build a convolutional neural network model and

Jupyter Notebook · 21

7 months ago

duecredit

Automated collection and reporting of citations for used software/methods/datase

Python · 231 · other

last month

wormtable

Write-once-read-many table for large datasets.

Python · 26 · lgpl-3.0

6 months ago

docker-packing-box

Docker image gathering packers and tools for making datasets of packed executabl

Python · 42 · gpl-3.0

6 days ago

binary-analysis, dataset-generation, docker-image

FirstCourseNetworkScience

Tutorials, datasets, and other material associated with textbook "A First Course

Jupyter Notebook · 314 · other

5 months ago

datasets, indiana-university, network-science

DataProfiler

What's in your data? Extract schema, statistics and entities from datasets

Python · 1347 · apache-2.0

10 days ago

avro, csv, data-analysis

utbm_robocar_dataset

EU Long-term Dataset with Multiple Sensors for Autonomous Driving

C++ · 207

10 months ago

autonomous-driving, dataset, lidar-odometry

perspective

A data visualization and analytics component, especially well-suited for large a

C++ · 7329 · apache-2.0

24 days ago

analytics, bi, data-visualization

tablib

Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.

Python · 4305 · mit

5 months ago

meteor-tabular

Reactive datatables for large or small datasets

JavaScript · 363 · mit

5 months ago

mirdata

Python library for working with Music Information Retrieval datasets

Python · 319 · bsd-3-clause

4 months ago

audio, dataset, mir

tonic

Publicly available event datasets and transforms.

Python · 183 · gpl-3.0

19 days ago

augmentation, datasets, event-based

maptable

JS library which converts any dataset to an interactive set of components: a cho

JavaScript · 57 · mit

10 months ago

xarray

N-D labeled arrays and datasets in Python

Python · 3352 · apache-2.0

30 days ago

dask, netcdf, numpy

ambrosia

clean up your LLM datasets

Go · 104 · mit

10 months ago

Chatito

🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or

TypeScript · 859 · mit

7 months ago

chatbot, chatbots, chatito

networkdata

R package containing several network datasets

R · 139 · other

5 days ago

dataset, network-analysis, rpackage

pointcloudset

Efficient analysis of large datasets of point clouds recorded over time

Python · 40 · mit

11 months ago

3d, 4d, 4d-point-cloud

audiomate

Python library for handling audio datasets.

Python · 124 · mit

9 months ago

audio, audio-datasets, corpus-tools

Deep-Learning-for-Tracking-and-Detection

Collection of papers, datasets, code and other resources for object tracking and

HTML · 2276

4 months ago

code-collection, deep-learning, detection

deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for dat

Scala · 3068 · apache-2.0

last month

dataquality, scala, spark

LLVIP

LLVIP: A Visible-infrared Paired Dataset for Low-light Vision

Jupyter Notebook · 538

10 months ago

cnn, computer-vision, deep-learning

3d-tiles

Specification for streaming massive heterogeneous 3D geospatial datasets :earth_

Batchfile · 1974

3 days ago

3d-models, 3d-tiles, geospatial

awesome-transit

Community list of transit APIs, apps, datasets, research, and software :bus::sta

1140 · cc0-1.0

2 months ago

awesome, awesome-list, bus

embedchain

Framework to easily create LLM powered bots over any dataset.

Python · 3887 · apache-2.0

7 months ago

ai, chatbot, chatgpt

Analytics-Cloud-Dataset-Utils

Friendly utility to load your on-prem data, whether large or small, to Einstein

JavaScript · 127 · other

9 months ago

lightning-bolts

Toolbox of models, callbacks, and datasets for AI/ML researchers.

Python · 1621 · apache-2.0

2 months ago

ai, gan, image-processing

hollow

Hollow is a java library and toolset for disseminating in-memory datasets from a

Java · 1132 · apache-2.0

2 months ago

python-dsff

DataSet File Format (DSFF)

Python · 0 · gpl-3.0

6 months ago

dataset, dsff, file-format

potree

WebGL point cloud viewer for large datasets

JavaScript · 4190 · other

22 days ago

domains

World’s single largest Internet domains dataset

HTML · 606 · bsd-3-clause

6 months ago

colly, dataset, internet-domains

awesome-remote-sensing-change-detection

List of datasets, codes, and contests related to remote sensing change detection

1205

8 months ago

awesome, change-detection, dataset

batchflow

BatchFlow helps you conveniently work with random or sequential batches of your

Python · 195 · apache-2.0

2 months ago

data-science, machine-learning, pipeline

implicit

Fast Python Collaborative Filtering for Implicit Feedback Datasets

Python · 3333 · mit

4 months ago

collaborative-filtering, machine-learning, matrix-factorization

entwine

Entwine - point cloud organization for massive datasets

C++ · 415 · other

29 days ago

vue-dataset

A set of Vue.js components to display datasets (lists) with filtering, paging, a

JavaScript · 218 · mit

5 months ago

datagrid, dataset, datatable

torchxrayvision

TorchXRayVision: A library of chest X-ray datasets and models. Classifiers, segm

Jupyter Notebook · 815 · apache-2.0

14 days ago

chest-radiographs, chest-xray, chest-xray-images

nuscenes-devkit

The devkit of the nuScenes dataset.

Python · 2069 · other

10 hours ago

vision

Datasets, Transforms and Models specific to Computer Vision

Python · 14866 · bsd-3-clause

4 months ago

computer-vision, machine-learning

imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

Python · 6572 · mit

4 months ago

data-analysis, data-science, machine-learning

sigsep-mus-db

Python parser and tools for MUSDB18 Music Separation Dataset

Python · 137 · mit

4 months ago

dataset, mus, musdb

argoverse-api

Official GitHub repository for Argoverse dataset

Python · 798 · other

3 months ago

NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including

Python · 22086 · mit

3 months ago

dialogue, machine-learning, machine-translation

awesome-lidar

😎 Awesome LIDAR list. The list includes LIDAR manufacturers, datasets, point cl

739 · cc0-1.0

4 months ago

3d, 3d-lidar, autonomous-driving

extract-gtfs-pathways

Command-line tool to extract pathways from a GTFS dataset.

JavaScript · 3 · isc

3 months ago

accessibility, cli, geojson

AgML

AgML is a centralized framework for agricultural machine learning. AgML provides

Python · 137 · apache-2.0

last month

agriculture, computer-vision, dataset

bloomfilter-rb

BloomFilter(s) in Ruby: Native counting filter + Redis counting/non-counting fil

C · 468

6 days ago

awesome-transit

Real-time transit information for the Puget Sound region and beyo

1176 · cc0-1.0

23 days ago

awesome, awesome-list, bus

dstc8-schema-guided-dialogue

The Schema-Guided Dialogue Dataset

Python · 494 · cc-by-sa-4.0

8 months ago

assistant, dataset, dialogue

clothing-dataset

Clothing dataset, all classes

87 · cc0-1.0

3 years ago

awesome-satellite-imagery-datasets

🛰️ List of satellite image training datasets with annotations for computer visi

3231 · mit

2 years ago

computer-vision, deep-learning, earth-observation

dataset

Crop/Weed Field Image Dataset

128

9 years ago

agriculture, annotations, classification

dbfc-dataset

Single DBFC Dataset

Jupyter Notebook · 20 · cc-by-4.0

2 years ago

catalyst, chemistry, data

nlp-datasets

A list of datasets/corpora for NLP tasks, in reverse chronological order.

923

4 years ago

nlp-datasets

Alphabetical list of free/public domain datasets with text data for use in Natur

5544

last year

dataset

The Open Images dataset

Python · 4202 · apache-2.0

3 years ago

dataset-packed-elf

Dataset of packed ELF samples

12

last year

binary-analysis, dataset, elf-binaries

skytrax-reviews-dataset

An air travel dataset consisting of user reviews from Skytrax (www.airlinequalit

Python · 70 · cc0-1.0

9 years ago

Brno-Urban-Dataset

Navigation and localisation dataset for self driving cars and autonomous robots

142 · mit

2 years ago

torch-datasets

A collection of machine learning datasets for use with Torch7.

Lua · 37 · bsd-3-clause

10 years ago

lemon-dataset

Lemons quality control dataset

96

4 years ago

dataset, lemonade, machine-learning

awesome-robotics-datasets

A collection of useful datasets for robotics and computer vision

295

3 years ago

computer-vision, dataset, robotics

gtfs-rt-differential-to-full-dataset

Transform a differential GTFS Realtime feed into a full dataset/dump.

JavaScript · 3 · isc

2 years ago

differential, gtfs-realtime, gtfs-rt

Objectron

Objectron is a dataset of short, object-centric video clips. In addition, the vi

Jupyter Notebook · 2202 · other

2 years ago

3d, 3d-reconstruction, 3d-vision

gta-3d-dataset

A dataset of 2D imagery, 3D point cloud data, and 3D vehicle bounding box labels

Python · 123

2 years ago

gbfs-tools-and-resources

Community list of shared micromobility APIs, apps, datasets, research, and softw

31 · cc0-1.0

last year

Hyperopt-Keras-CNN-CIFAR-100

Auto-optimizing a neural net (and its architecture) on the CIFAR-100 dataset. Co

Python · 106 · other

6 years ago

cnn, cnn-keras, hyperopt

All-Age-Faces-Dataset

All-Age-Faces (AAF) Database.

168

5 years ago

ocl-dataset

A Data Set of OCL Expressions on GitHub

Java · 4

last year

Gekko-Datasets

Gekko Trading Bot dataset dumps. Ready to use and download history files in SQLi

Perl · 163 · mit

6 years ago

backtest, backtester, backtesting

kitti_ros

A ROS-based player to replay KiTTI dataset. http://www.cvlibs.net/datasets/kitti

Python · 29

3 years ago

CryptoKnight

Cryptographic Dataset Generation & Modelling Framework

Python · 37 · apache-2.0

4 years ago

academic-project, cryptography, deep-learning

PyPackerDetect

A malware dataset curation tool which helps identify packed samples.

Python · 27 · agpl-3.0

5 years ago

malware, packer, pefile

rc-data

Question answering dataset featured in "Teaching Machines to Read and Comprehend

Python · 1283 · apache-2.0

7 years ago

mongo_smasher

A small tool to generate randomized datasets

C++ · 33 · mit

8 years ago

kitti_to_rosbag

Dataset tools for working with the KITTI dataset raw data ( http://www.cvlibs.ne

C++ · 242

5 years ago

aledataset

Scripts to generate a dataset with static frames from the Arcade Learning Enviro

Lua · 18

10 years ago

rc-data

Question answering dataset featured in "Teaching Machines to Read and Comprehend

Python · 1287 · apache-2.0

7 years ago

mzdata

Multi-extract and Multi-level Dataset of Mozilla Issue Tracking History

Python · 7 · apache-2.0

8 years ago

B3FD

Biometrically Filtered Famous Figure Dataset

5

2 years ago

textgenrnn

Easily train your own text-generating neural network of any size and complexity

Python · 4938 · other

2 years ago

deep-learning, keras, python

forex.analytics

Node.js native library performing technical analysis over an OHLC dataset with u

C · 180 · mit

5 years ago

steam_reviews

Video game review datasets scraped from the Steam website (http://store.steampow

38 · mit

8 years ago

mongolastic

:traffic_light: A dataset migration tool from MongoDB to Elasticsearch and vice

Java · 138 · mit

3 years ago

connector, converter, elasticsearch

ScrollableGraphView

An adaptive scrollable graph view for iOS to visualise simple discrete datasets.

Swift · 5308 · mit

3 years ago

3w_dataset

The first realistic and public dataset with rare undesirable real events in oil

Jupyter Notebook · 102 · mit

2 years ago

classification, dataset, event-management

FakeNewsCorpus

A dataset of millions of news articles scraped from a curated list of data sourc

366 · apache-2.0

4 years ago

artificial-intelligence, corpus, database

SOREL-20M

Sophos-ReversingLabs 20 million sample dataset

Python · 593 · apache-2.0

3 years ago

kitti2bag

Convert KITTI dataset to ROS bag file the easy way!

Python · 680 · mit

last year

converter, kitti, kitti-data

StarData

Starcraft AI Research Dataset

Python · 561 · other

3 years ago

alex_context_nlg_dataset

Dataset for NLG which contains preceding context along with each generation inst

23

8 years ago

CubePlusPlus

Cube++ is a novel dataset collected for illumination estimation problem. It has

Python · 48

3 years ago

color-constancy, dataset, illumination-estimation

quizzn

This is an open-source quiz application. The current dataset is for world capit

Java · 17 · other

9 years ago

waymo_ros

This is a ROS package to connect Waymo open dataset to ROS

Jupyter Notebook · 11 · mit

4 years ago

GraphQuestions

A characteristic-rich dataset for factoid question answering described in the pa

ReScript · 88 · other

last year

newsqa

Tools for using Maluuba's NewsQA Dataset (public version)

Python · 250 · other

last year

38-Cloud-A-Cloud-Segmentation-Dataset

This data set includes Landsat 8 images and their manually extracted pixel-level

MATLAB · 134 · apache-2.0

4 years ago

visualize_ML

Python package for consolidated and extensive Univariate,Bivariate Data Analysis

Python · 187 · mit

8 years ago

data-analysis, machine-learning, matplotlib

DZNEmptyDataSet

A drop-in UITableView/UICollectionView superclass category for showing empty dat

Objective-C · 12105 · mit

2 years ago

narrativeqa

This repository contains the NarrativeQA dataset. It includes the list of docume

Shell · 414 · apache-2.0

4 years ago

DBNet

DBNet: A Large-Scale Dataset for Driving Behavior Learning, CVPR 2018

Python · 211 · apache-2.0

5 years ago

autonomous-driving, benchmark, cvpr2018

pem-dataset1

Proton Exchange Membrane (PEM) Fuel Cell Dataset

Jupyter Notebook · 74 · cc-by-4.0

2 years ago

activation-procedure, chemistry, data

ELI5

Scripts and links to recreate the ELI5 dataset.

Python · 310 · other

3 years ago

narrativeqa

This repository contains the NarrativeQA dataset. It includes the list of docume

Shell · 431 · apache-2.0

4 years ago

HAR-stacked-residual-bidir-LSTMs

Using deep stacked residual bidirectional LSTM cells (RNN) with TensorFlow, we d

Python · 308 · apache-2.0

last year

bidirectional-lstm-cells, human-activity-recognition, lstm

awesome-nlp-polish

A curated list of resources dedicated to Natural Language Processing (NLP) in po

274 · mit

3 years ago

datasets, nlp, nlp-machine-learning

lightning-bolts

Toolbox of models, callbacks, and datasets for AI/ML researchers.

Python · 1509 · apache-2.0

last year

ai, gan, image-processing

fma

FMA: A Dataset For Music Analysis

Jupyter Notebook · 2056 · mit

last year

dataset, deep-learning, music-analysis

MORED

A Moroccan Buildings’ Electricity Consumption Dataset. MORED is made available

9

2 years ago

consumption-data, dataset, electricity-consumption

awesome-face

😎 face related algorithm, dataset and paper

871 · mit

5 years ago

dataset, face, face-detection

revise-tool

REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets --- https://

Jupyter Notebook · 107 · mit

2 years ago

GTFS-Data-Pipeline-TfNSW-Bus

GTFS Data Pipeline for TfNSW Bus Datasets

Jupyter Notebook · 6

last year

data-pipeline, datapipeline, gtfs

gtfs-fares-v2-validator

Validates GTFS fares-v2 datasets

Python · 6 · mit

2 years ago

DDAD

Dense Depth for Autonomous Driving (DDAD) dataset.

Python · 469 · other

3 years ago

goldeneye

Python implementation of the goldeneye algorithm to investigate how classifiers

Python · 2 · mit

6 years ago

data-science, model-explanation

GTFS-viz

Converts a GTFS dataset into a SQLite DB + GeoJSONs / KMLs

Ruby · 85 · mit

8 years ago

CODAH

Repository for the CODAH dataset

Python · 21

last year

covid-19-data

COVID-19 datasets are constructed entirely from primary (government and public a

108 · other

3 years ago

2019-ncov, coronavirus, covid-19

Imitation-Learning-Dagger-Torcs

A Simple Example for Imitation Learning with Dataset Aggregation (DAGGER) on Tor

Python · 69

7 years ago

dagger, gym-torcs, imitation-learning

geojson-join

Join a stream of GeoJSON against a dataset.

JavaScript · 35

6 years ago

dnddata

Weekly updated dataset of D&D characters submitted to https://oganm.com/shiny/pr

R · 99 · mit

2 years ago

5e, dnd, dnd-characters

argoverse-api

Official GitHub repository for Argoverse dataset

Python · 661 · other

last year

PASS

The PASS dataset: pretrained models and how to get the data

Python · 258 · mit

2 years ago

computer-vision, representation-learning, self-supervised-learning

LSTM-Human-Activity-Recognition

Human Activity Recognition example using TensorFlow on smartphone sensors datase

Jupyter Notebook · 3256 · mit

last year

activity-recognition, deep-learning, human-activity-recognition

Awsome-Deep-Learning-for-Video-Analysis

Papers, code and datasets about deep learning and multi-modal learning for video

702 · mit

2 years ago

deep-learning, machine-learning, multimodal-learning

extract-gtfs-shapes

Extract shapes from a GTFS dataset.

JavaScript · 5 · isc

3 years ago

cli, geojson, gtfs

gtfs-utils

Read & analyze GTFS datasets using Node.js.

JavaScript · 33 · isc

2 years ago

gtfs, public-transport, transit

ICON

R package that provides complex systems datasets from the Colorado Index of Comp

R · 6 · other

3 years ago

beginner-friendly, beginners-friendly, beginners-welcome

WLEmptyState

WLEmptyState is an iOS based component that lets you customize the view when the

Swift · 316 · mit

2 years ago

empty-state, emptydataset, ios

covid-19

Novel Coronavirus 2019 time series data on cases

Python · 1154

2 years ago

coronavirus, coronavirus-disease, covid