elasticsearch-hadoop
:elephant: Elasticsearch real-time search and analytics natively integrated wit
Java · 1920 · apache-2.0
2 months ago
spatial-framework-for-hadoop
The Spatial Framework for Hadoop allows developers and data scientists to use th
Java · 359 · apache-2.0
last year
data-management, spatial-analysis
hiho
Hadoop Data Integration with various databases, ftp servers, salesforce. Increme
Java · 90 · apache-2.0
11 years ago
awesome-hadoop
A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources
1071
2 years ago
gis-tools-for-hadoop
The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of b
509 · apache-2.0
2 years ago
hadoop, spatial-analysis
HadoopConcatGz
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
Java · 9
6 years ago
hadoop, spark, warc
bigfatlm
Hadoop MapReduce training of modified Kneser-Ney smoothed language models
Java · 30 · lgpl-3.0
6 years ago
crunch
A fast to develop, fast to run, Go based toolkit for ETL and feature extraction
Go · 213
9 years ago
elephantdb
Distributed database specialized in exporting key/value data from Hadoop
Java · 555 · bsd-3-clause
10 years ago
DuctileDB
Ductile DB is a graph database based on Hadoop/HBase which provides a vast set o
Java · 13 · apache-2.0
6 years ago
daudit
🌲 Configuration flaws detector for Hadoop, MongoDB, MySQL, and more!
Python · 104 · mit
4 years ago
auditing, bigdata, hadoop-spark
elephant-bird
Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and H
Java · 1137 · apache-2.0
last year
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library
C++ · 25350 · apache-2.0
2 months ago
distributed-systems, gbdt, gbm
glow
Glow is an easy-to-use distributed computation system written in Go, similar to
Go · 3179
6 years ago
webarchive-indexing
Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.
Python · 40 · mit
6 years ago
netlytics
NetLytics is a Hadoop-powered framework for performing advanced analytics on var
Python · 10 · gpl-3.0
6 years ago
luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. I
Python · 17120 · apache-2.0
3 months ago
hadoop, luigi, orchestration-framework
seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and da
Go · 21088 · apache-2.0
3 days ago
blob-storage, cloud-drive, distributed-file-system
schedoscope
Schedoscope is a scheduling framework for painfree agile development, testing, (
Scala · 95 · apache-2.0
4 years ago
dev-setup
macOS development environment setup: Easy-to-understand instructions with autom
Python · 6014 · other
last year
android-development, aws, bash
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras),
Python · 25938 · other
7 months ago
aws, big-data, caffe