tech.ml.dataset

A Clojure high performance data processing system

Clojure 681 epl-1.0

5 days ago

clojure csv dataframe

dataset-api

The ApolloScape Open Dataset for Autonomous Driving and its Application.

Jupyter Notebook 550 apache-2.0

7 months ago

3d-car-instance 3d-lidar apolloscape-dataset

dbfc-dataset

Single DBFC Dataset

Jupyter Notebook 23 cc-by-4.0

8 months ago

catalyst chemistry data

dataset-packed-pe

Dataset of packed PE samples

Python 23

4 months ago

binary-analysis dataset executable-packing

dataset-serialize

JSON to DataSet and DataSet to JSON converter for Delphi and Lazarus (FPC)

Pascal 649 mit

last month

child-dataset converter dataset

tablecloth

Dataset manipulation library built on top of tech.ml.dataset

HTML 303 mit

2 months ago

clojure dataframe dataset

waymo-open-dataset

Waymo Open Dataset

Python 2637 other

4 months ago

autonomous-driving dataset

vue-dataset

A set of Vue.js components to display datasets (lists) with filtering, paging, a

JavaScript 220 mit

5 months ago

datagrid dataset datatable

netex-gtfs-converter-java

Convert a NeTEx dataset into a GTFS dataset

Java 5 eupl-1.2

4 months ago

ror

TCPD

The Turing Change Point Dataset - A collection of time series for the evaluation

Python 136 mit

last month

change-detection change-point change-point-detection

ffridataset-scripts

Make datasets like FFRI Dataset

Python 10 apache-2.0

4 months ago

B3FD

Biometrically Filtered Famous Figure Dataset

7

3 months ago

MaleX

A curated dataset of malware and benign Windows executable samples for malware r

HTML 37 gpl-3.0

11 months ago

deep-learning image-classification machine-learning

kitti2bag

Convert KITTI dataset to ROS bag file the easy way!

Python 710 mit

5 months ago

converter kitti kitti-data

pyboreas

Devkit for the Boreas autonomous driving dataset.

Python 89 bsd-3-clause

4 months ago

frostline

A dataset, API, and parser for USDA plant hardiness zones.

Python 146 mit

last year

api farm garden

dockstring

A Python package for molecular docking with an extensive, highly-curated dataset

Python 146 apache-2.0

4 months ago

utbm_robocar_dataset

EU Long-term Dataset with Multiple Sensors for Autonomous Driving

C++ 219

4 months ago

autonomous-driving dataset lidar-odometry

maptable

JS library which converts any dataset to an interactive set of components: a cho

JavaScript 58 mit

5 months ago

transfermarkt-datasets

⚽️ Extract, prepare and publish Transfermarkt datasets.

Python 247 cc0-1.0

2 days ago

analytics dataset dbt

awesome-public-datasets

A topic-centric list of HQ open datasets.

61095 mit

8 days ago

aaron-swartz awesome-public-datasets datasets

LLVIP

LLVIP: A Visible-infrared Paired Dataset for Low-light Vision

Jupyter Notebook 641

10 months ago

cnn computer-vision deep-learning

pem-dataset1

Proton Exchange Membrane (PEM) Fuel Cell Dataset

Jupyter Notebook 86 cc-by-4.0

8 months ago

activation-procedure chemistry data

game-datasets

:video_game: A curated list of awesome game datasets, and tools to artificial in

744 cc-by-4.0

11 days ago

artificial-intelligence awesome awesome-game

domains

World’s single largest Internet domains dataset

HTML 712 bsd-3-clause

3 days ago

colly dataset internet-domains

nuscenes-devkit

The devkit of the nuScenes dataset.

Python 2213 other

4 months ago

argoverse-api

Official GitHub repository for Argoverse dataset

Python 836 other

11 months ago

extract-gtfs-pathways

Command-line tool to extract pathways from a GTFS dataset.

JavaScript 3 isc

5 months ago

accessibility cli geojson

duecredit

Automated collection and reporting of citations for used software/methods/datase

Python 234 other

4 months ago

DataProfiler

What's in your data? Extract schema, statistics and entities from datasets

Python 1434 apache-2.0

7 days ago

avro csv data-analysis

covid-19

Novel Coronavirus 2019 time series data on cases

Python 1162

27 days ago

coronavirus coronavirus-disease covid

perspective

A data visualization and analytics component, especially well-suited for large a

C++ 8117 apache-2.0

4 months ago

analytics bi data-visualization

meteor-tabular

Reactive datatables for large or small datasets

JavaScript 363 mit

4 months ago

blaze datatable meteorjs

tonic

Publicly available event datasets and transforms.

Python 211 gpl-3.0

3 months ago

augmentation datasets event-based

xarray

N-D labeled arrays and datasets in Python

Python 3517 apache-2.0

4 months ago

dask netcdf numpy

Deep-Learning-for-Tracking-and-Detection

Collection of papers, datasets, code and other resources for object tracking and

HTML 2439

6 months ago

code-collection deep-learning detection

deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for dat

Scala 3310 apache-2.0

last month

dataquality scala spark

3d-tiles

Specification for streaming massive heterogeneous 3D geospatial datasets :earth_

Batchfile 2064

4 months ago

3d-models 3d-tiles geospatial

bloomfilter-rb

BloomFilter(s) in Ruby: Native counting filter + Redis counting/non-counting fil

C 472

8 months ago

lightning-bolts

Toolbox of models, callbacks, and datasets for AI/ML researchers.

Python 1695 apache-2.0

16 days ago

ai gan image-processing

hollow

Hollow is a java library and toolset for disseminating in-memory datasets from a

Java 1188 apache-2.0

4 months ago

potree

WebGL point cloud viewer for large datasets

JavaScript 4425 other

4 months ago

docker-packing-box

Docker image gathering packers and tools for making datasets of packed executabl

Python 44 gpl-3.0

4 months ago

binary-analysis dataset-generation docker-image

entwine

Entwine - point cloud organization for massive datasets

C++ 434 other

6 months ago

torchxrayvision

TorchXRayVision: A library of chest X-ray datasets and models. Classifiers, segm

Jupyter Notebook 863 apache-2.0

6 months ago

chest-radiographs chest-xray chest-xray-images

NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including

Python 22720 mit

4 months ago

dialogue machine-learning machine-translation

awesome-lidar

😎 Awesome LIDAR list. The list includes LIDAR manufacturers, datasets, point cl

865 cc0-1.0

4 months ago

3d 3d-lidar autonomous-driving

awesome-transit

Description Real-time transit information for the Puget Sound region and beyo

1297 cc0-1.0

5 months ago

awesome awesome-list bus

kitti_to_rosbag

Dataset tools for working with the KITTI dataset raw data ( http://www.cvlibs.ne

C++ 248

6 years ago

clothing-dataset

Clothing dataset, all classes

105 cc0-1.0

4 years ago

dataset

Crop/Weed Field Image Dataset

131

10 years ago

agriculture annotations classification

Analytics-Cloud-Dataset-Utils

Friendly utility to load your on-prem data, whether large or small, to Einstein

JavaScript 128 other

last year

quickdraw-dataset

Documentation on how to access and use the Quick, Draw! Dataset.

6178 other

last year

dataset quickdraw-dataset

awesome-dataset-tools

🔧 A curated list of awesome dataset tools

856 mit

last year

annotation-tool annotations awsome

dataset

The Open Images dataset

Python 4264 apache-2.0

3 years ago

dataset-packed-elf

Dataset of packed ELF samples

14

2 years ago

binary-analysis dataset elf-binaries

skytrax-reviews-dataset

An air travel dataset consisting of user reviews from Skytrax (www.airlinequalit

Python 72 cc0-1.0

9 years ago

Brno-Urban-Dataset

Navigation and localisation dataset for self driving cars and autonomous robots

146 mit

3 years ago

lemon-dataset

Lemons quality control dataset

102

4 years ago

dataset lemonade machine-learning

gtfs-rt-differential-to-full-dataset

Transform a differential GTFS Realtime feed into a full dataset/dump.

JavaScript 3 isc

3 years ago

differential gtfs-realtime gtfs-rt

Objectron

Objectron is a dataset of short, object-centric video clips. In addition, the vi

Jupyter Notebook 2226 other

2 years ago

3d 3d-reconstruction 3d-vision

gta-3d-dataset

A dataset of 2D imagery, 3D point cloud data, and 3D vehicle bounding box labels

Python 134

2 years ago

Hyperopt-Keras-CNN-CIFAR-100

Auto-optimizing a neural net (and its architecture) on the CIFAR-100 dataset. Co

Python 106 other

7 years ago

cnn cnn-keras hyperopt

All-Age-Faces-Dataset

All-Age-Faces (AAF) Database.

181

6 years ago

ocl-dataset

A Data Set of OCL Expressions on GitHub

Java 4

2 years ago

CryptoKnight

Cryptographic Dataset Generation & Modelling Framework

Python 38 apache-2.0

5 years ago

academic-project cryptography deep-learning

kitti360Scripts

This repository contains utility scripts for the KITTI-360 dataset.

Python 374 mit

2 years ago

PyPackerDetect

A malware dataset curation tool which helps identify packed samples.

Python 28 agpl-3.0

6 years ago

malware packer pefile

AVData

Autonomous Vehicle Seasonal Dataset

C++ 291 mit

last year

rc-data

Question answering dataset featured in "Teaching Machines to Read and Comprehend

Python 1293 apache-2.0

8 years ago

mzdata

Multi-extract and Multi-level Dataset of Mozilla Issue Tracking History

Python 7 apache-2.0

9 years ago

medal

Large medical text dataset curated for abbreviation disambiguation, designed for

Python 246

last year

textgenrnn

Easily train your own text-generating neural network of any size and complexity

Python 4945 other

2 years ago

deep-learning keras python

forex.analytics

Node.js native library performing technical analysis over an OHLC dataset with u

C 182 mit

5 years ago

mongolastic

:traffic_light: A dataset migration tool from MongoDB to Elasticsearch and vice

Java 137 mit

4 years ago

connector converter elasticsearch

3w_dataset

The first realistic and public dataset with rare undesirable real events in oil

Jupyter Notebook 110 mit

2 years ago

classification dataset event-management

FakeNewsCorpus

A dataset of millions of news articles scraped from a curated list of data sourc

386 apache-2.0

5 years ago

artificial-intelligence corpus database

SOREL-20M

Sophos-ReversingLabs 20 million sample dataset

Python 621 apache-2.0

4 years ago

StarData

Starcraft AI Research Dataset

Python 569 other

3 years ago

alex_context_nlg_dataset

Dataset for NLG which contains preceding context along with each generation inst

23

8 years ago

CubePlusPlus

Cube++ is a novel dataset collected for illumination estimation problem. It has

Python 51

4 years ago

color-constancy dataset illumination-estimation

quizzn

This is an open-source quiz application. The current dataset is for world capit

Java 17 other

10 years ago

waymo_ros

This is a ROS package to connect Waymo open dataset to ROS

Jupyter Notebook 11 mit

4 years ago

GraphQuestions

A characteristic-rich dataset for factoid question answering described in the pa

ReScript 92 other

2 years ago

newsqa

Tools for using Maluuba's NewsQA Dataset (public version)

Python 253 other

2 years ago

38-Cloud-A-Cloud-Segmentation-Dataset

This data set includes Landsat 8 images and their manually extracted pixel-level

MATLAB 150 apache-2.0

4 years ago

kitti_ros

A ROS-based player to replay KiTTI dataset. http://www.cvlibs.net/datasets/kitti

Python 29

4 years ago

Chatito

🎯🗯 Dataset generation for AI chatbots, NLP tasks, named entity recognition or

TypeScript 877 mit

last year

chatbot chatbots chatito

awesome-robotics-datasets

A collection of useful datasets for robotics and computer vision

373

3 years ago

computer-vision dataset robotics

datasets-games

Datasets from a variety of games.

13

last year

data game

DBNet

DBNet: A Large-Scale Dataset for Driving Behavior Learning, CVPR 2018

Python 214 apache-2.0

6 years ago

autonomous-driving benchmark cvpr2018

ELI5

Scripts and links to recreate the ELI5 dataset.

Python 319 other

3 years ago

narrativeqa

This repository contains the NarrativeQA dataset. It includes the list of docume

Shell 459 apache-2.0

5 years ago

dstc8-schema-guided-dialogue

The Schema-Guided Dialogue Dataset

Python 549 cc-by-sa-4.0

last year

assistant dataset dialogue

nlp-datasets

A list of datasets/corpora for NLP tasks, in reverse chronological order.

919

5 years ago

fma

FMA: A Dataset For Music Analysis

Jupyter Notebook 2248 mit

2 years ago

dataset deep-learning music-analysis

MORED

A Moroccan Buildings’ Electricity Consumption Dataset. MORED is made available

13

2 years ago

consumption-data dataset electricity-consumption

nlp-datasets

Alphabetical list of free/public domain datasets with text data for use in Natur

5780

2 years ago

python-dsff

DataSet File Format (DSFF)

Python 1 gpl-3.0

last year

dataset dsff file-format

awesome-face

😎 face related algorithm, dataset and paper

894 mit

5 years ago

dataset face face-detection

DDAD

Dense Depth for Autonomous Driving (DDAD) dataset.

Python 487 other

4 years ago

goldeneye

Python implementation of the goldeneye algorithm to investigate how classifiers

Python 2 mit

7 years ago

data-science model-explanation

GTFS-viz

Converts a GTFS dataset into a SQLite DB + GeoJSONs / KMLs

Ruby 85 mit

9 years ago

CODAH

Repository for the CODAH dataset

Python 22

2 years ago

Gekko-Datasets

Gekko Trading Bot dataset dumps. Ready to use and download history files in SQLi

Perl 170 mit

6 years ago

backtest backtester backtesting

Imitation-Learning-Dagger-Torcs

A Simple Example for Imitation Learning with Dataset Aggregation (DAGGER) on Tor

Python 71

7 years ago

dagger gym-torcs imitation-learning

geojson-join

Join a stream of GeoJSON against a dataset.

JavaScript 35

6 years ago

dnddata

Weekly updated dataset of D&D characters submitted to https://oganm.com/shiny/pr

R 108 mit

2 years ago

5e dnd dnd-characters

PASS

The PASS dataset: pretrained models and how to get the data

Python 262 mit

2 years ago

computer-vision representation-learning self-supervised-learning

LSTM-Human-Activity-Recognition

Human Activity Recognition example using TensorFlow on smartphone sensors datase

Jupyter Notebook 3350 mit

2 years ago

activity-recognition deep-learning human-activity-recognition

extract-gtfs-shapes

Extract shapes from a GTFS dataset.

JavaScript 5 isc

4 years ago

cli geojson gtfs

WLEmptyState

WLEmptyState is an iOS based component that lets you customize the view when the

Swift 318 mit

2 years ago

empty-state emptydataset ios

congresstweets

Datasets of the daily Twitter output of Congress.

SCSS 105 mit

last year

congress house house-of-representatives

mongo_smasher

A small tool to generate randomized datasets

C++ 34 mit

9 years ago

gbfs-tools-and-resources

Community list of shared micromobility APIs, apps, datasets, research, and softw

31 cc0-1.0

2 years ago

steam_reviews

Video game review datasets scraped from the Steam website (http://store.steampow

40 mit

9 years ago

ScrollableGraphView

An adaptive scrollable graph view for iOS to visualise simple discrete datasets.

Swift 5317 mit

4 years ago

combine_gtfs_feeds

A tool to combine GTFS datasets into one feed.

Python 6 mit

2 years ago

wormtable

Write-once-read-many table for large datasets.

Python 28 lgpl-3.0

last year

visualize_ML

Python package for consolidated and extensive Univariate, Bivariate Data Analysis

Python 200 mit

8 years ago

data-analysis machine-learning matplotlib

DZNEmptyDataSet

A drop-in UITableView/UICollectionView superclass category for showing empty dat

Objective-C 12086 mit

2 years ago

ambrosia

clean up your LLM datasets

Go 114 mit

last year

pointcloudset

Efficient analysis of large datasets of point clouds recorded over time

Python 43 mit

2 years ago

3d 4d 4d-point-cloud

JAX-Flax-Tutorial-Image-Classification-with-Linen

How to use the Flax Linen API to build a convolutional neural network model and

Jupyter Notebook 24

last year

HAR-stacked-residual-bidir-LSTMs

Using deep stacked residual bidirectional LSTM cells (RNN) with TensorFlow, we d

Python 319 apache-2.0

2 years ago

bidirectional-lstm-cells human-activity-recognition lstm

awesome-nlp-polish

A curated list of resources dedicated to Natural Language Processing (NLP) in po

294 mit

3 years ago

datasets nlp nlp-machine-learning

revise-tool

REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets --- https://

Jupyter Notebook 111 mit

2 years ago

GTFS-Data-Pipeline-TfNSW-Bus

GTFS Data Pipeline for TfNSW Bus Datasets

Jupyter Notebook 7

2 years ago

data-pipeline datapipeline gtfs

gtfs-fares-v2-validator

Validates GTFS fares-v2 datasets

Python 6 mit

2 years ago

covid-19-data

COVID-19 datasets are constructed entirely from primary (government and public a

110 other

4 years ago

2019-ncov coronavirus covid-19

Awsome-Deep-Learning-for-Video-Analysis

Papers, code and datasets about deep learning and multi-modal learning for video

764 mit

3 years ago

deep-learning machine-learning multimodal-learning

gtfs-utils

Read & analyze GTFS datasets using Node.js.

JavaScript 35 isc

3 years ago

gtfs public-transport transit

ICON

R package that provides complex systems datasets from the Colorado Index of Comp

R 7 other

4 years ago

beginner-friendly beginners-friendly beginners-welcome