spark2014
SPARK 2014 is the new version of SPARK, a software development technology specif
Ada · 248 stars · License: gpl-3.0
24 days ago
awesome-spark
A curated list of awesome Apache Spark packages and resources.
Shell · 1722 stars · License: cc0-1.0
29 days ago
Topics: apache-spark, awesome, pyspark
spark-gotchas
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
360 stars · License: other
7 years ago
Topics: apache-spark, book, guide
spark-sklearn
(Deprecated) Scikit-learn integration package for Apache Spark
Python · 1079 stars · License: apache-2.0
5 years ago
Topics: apache-spark, grid-search, machine-learning
spark-cassandra-connector
DataStax Connector for Apache Spark to Apache Cassandra
Scala · 1943 stars · License: apache-2.0
3 months ago
Topics: cassandra, scala, spark
spark-cassandra-stress
A tool for testing the DataStax Spark Connector against Apache Cassandra or DSE
Scala · 25 stars · License: apache-2.0
2 years ago
spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers
C# · 2025 stars · License: mit
4 months ago
Topics: analytics, apache-spark, azure
shc
The Apache Spark - Apache HBase Connector is a library to support Spark accessin
Scala · 552 stars · License: apache-2.0
4 years ago
dbscan-on-spark
An implementation of DBSCAN running on top of Apache Spark
Scala · 183 stars · License: apache-2.0
7 years ago
spark-nlp
State of the Art Natural Language Processing
Scala · 3772 stars · License: apache-2.0
4 months ago
Topics: albert, bert, entity-extraction
jpmml-evaluator-spark
PMML evaluator library for the Apache Spark cluster computing system (http://spa
Java · 94 stars · License: agpl-3.0
3 years ago
EMR_Spark_Automation
A repository for deploying an AWS EMR cluster and submitting Spark jobs on it. Bo
Python · 8 stars · License: apache-2.0
7 years ago
kotlin-spark-api
This project gives Kotlin bindings and several extensions for Apache Spark. We
Kotlin · 461 stars · License: apache-2.0
5 months ago
Topics: bigdata, kotlin, nullability
spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and
Scala · 436 stars · License: mit
13 days ago
Topics: spark, testing-framework
spark-fast-tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and
Scala · 421 stars · License: mit
7 months ago
Topics: spark, testing-framework
neo4j-spark-connector
Neo4j Connector for Apache Spark, which provides bi-directional read/write acces
Scala · 312 stars · License: apache-2.0
2 months ago
Topics: bolt, cypher, hacktoberfest
neo4j-spark-connector
Neo4j Connector for Apache Spark, which provides bi-directional read/write acces
Scala · 313 stars · License: apache-2.0
8 days ago
Topics: bolt, cypher, hacktoberfest
spark-connect-rs
Apache Spark Connect Client for Rust
Rust · 90 stars · License: apache-2.0
20 days ago
Topics: grpc-client, spark, spark-connect
spark-notebook
Interactive and Reactive Data Science using Scala and Spark.
JavaScript · 3150 stars · License: apache-2.0
2 years ago
Topics: apache-spark, data-science, notebook
deep-spark
Connecting Apache Spark with different data stores [DEPRECATED]
Java · 197 stars · License: apache-2.0
8 years ago
spark-by-example
SPARK by Example is an adaptation of ACSL by Example for SPARK 2014, a programmi
Ada · 152 stars
2 years ago
Topics: ada, formal-methods, formal-specification
spark-riak-connector
The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV
Scala · 60 stars · License: apache-2.0
8 years ago
gatling-sql
Gatling Extension for JDBC or Spark Thrift Server stress tests
Scala · 6 stars · License: apache-2.0
4 years ago
Topics: gatling, jdbc, stress-testing
sample-SparkJobserverCassandra
Simple sample job illustrating the use of Spark Jobserver to execute Apache Spar
Scala · 2 stars · License: apache-2.0
9 years ago
Topics: netapp-public
delight
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Del
Scala · 342 stars · License: other
6 months ago
Topics: apache-spark, cpu, dashboard
ada_language_server
Server implementing the Microsoft Language Protocol for Ada and SPARK
Ada · 235 stars · License: gpl-3.0
25 days ago
twut
An open-source toolkit for analyzing line-oriented JSON Twitter archives with Ap
Scala · 9 stars · License: apache-2.0
2 years ago
Topics: apache-spark, spark, spark-packages
gneiss
Framework for platform-independent SPARK components
Ada · 22 stars · License: agpl-3.0
4 years ago
Topics: ada, component-based, embedded
tensorframes
[DEPRECATED] Tensorflow wrapper for DataFrames on Apache Spark
Scala · 749 stars · License: apache-2.0
4 months ago
magellan
Geo Spatial Data Analytics on Spark
Scala · 533 stars · License: apache-2.0
3 years ago
Topics: big-data, geojson, geometric-algorithms
ArchiveSpark
An Apache Spark framework for easy data processing, extraction as well as deriva
Scala · 145 stars · License: mit
2 months ago
Topics: archivespark, internet-archive, spark
sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
Scala · 28 stars · License: apache-2.0
5 years ago
Topics: datapipeline, spark, spark-sql
sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Python · 1329 stars · License: other
8 days ago
Topics: cluster, jupyter, jupyter-notebook
Mobius
C# and F# language binding and extensions to Apache Spark
C# · 941 stars · License: mit
10 months ago
Topics: apache-spark, bigdata, csharp
flintrock
A command-line tool for launching Apache Spark clusters.
Python · 638 stars · License: apache-2.0
5 months ago
Topics: apache-spark, apache-spark-cluster, ec2
sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Scala · 524 stars · License: apache-2.0
5 years ago
Topics: analytics, hdfs, kafka
benchm-ml
A minimal benchmark for scalability, speed and accuracy of commonly used open so
R · 1869 stars · License: mit
2 years ago
Topics: data-science, deep-learning, gradient-boosting-machine
pyspark-stubs
Apache (Py)Spark type annotations (stub files).
Python · 115 stars · License: apache-2.0
2 years ago
Topics: apache-spark, mypy, pep484
geni
A Clojure dataframe library that runs on Spark
Clojure · 286 stars · License: apache-2.0
12 months ago
Topics: big-data, clojure, clojure-library
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Sp
Scala · 130 stars · License: apache-2.0
4 years ago
Topics: ai, artificial-intelligence, big-data
spindle
Next-generation web analytics processing with Scala, Spark, and Parquet.
JavaScript · 331 stars · License: apache-2.0
10 years ago
datacompy
Pandas, Polars, and Spark DataFrame comparison for humans and more!
Python · 473 stars · License: apache-2.0
2 months ago
Topics: compare, dask, data
wildfire
🔥From a little spark may burst a flame.
CSS · 178 stars · License: gpl-3.0
6 years ago
Topics: comment-plugin, comments, firebase
osm4scala
Scala and Spark library focused on reading OpenStreetMap Pbf files.
Scala · 79 stars · License: mit
last year
Topics: gis, openstreetmap, openstreetmap-pbf-files
crossdata
DISCONTINUED - Easy access to big things. Library for Apache Spark extending and
Scala · 169 stars · License: apache-2.0
5 years ago
pysparkling
A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Python · 260 stars · License: other
2 years ago
Topics: apache-spark, data-processing, data-science
strong-together
A starter project to build single page Vue.js apps as stand-alone or for Laravel
CSS · 89 stars · License: mit
7 years ago
itachi
A library that brings useful functions from various modern database management s
Scala · 56 stars · License: apache-2.0
last year
Topics: hive, postgres, presto
deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for dat
Scala · 3310 stars · License: apache-2.0
last month
Topics: dataquality, scala, spark
glow
Glow is an easy-to-use distributed computation system written in Go, similar to
Go · 3193 stars
6 years ago
koalas
Koalas: pandas API on Apache Spark
Python · 3339 stars · License: apache-2.0
8 months ago
Topics: big-data, data-science, dataframe
delta
An open-source storage framework that enables building a Lakehouse architecture
Scala · 7607 stars · License: apache-2.0
3 days ago
Topics: acid, analytics, big-data
sample-KafkaSparkCassandra
Introductory sample scala app using Apache Spark Streaming to accept data from K
Scala · 23 stars
6 years ago
Topics: netapp-public
LiFT
The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the m
Scala · 168 stars · License: bsd-2-clause
2 years ago
Topics: fairness, fairness-ai, fairness-ml
awesome-ada
A curated list of awesome resources related to the Ada and SPARK programming lan
615 stars · License: cc0-1.0
4 months ago
Topics: ada, ada-binding, ada-framework
RoaringBitmap
A better compressed bitset in Java: used by Apache Spark, Netflix Atlas, Apache
Java · 3548 stars · License: apache-2.0
14 days ago
Topics: bitset, druid, java
streaming-benchmarks
Benchmarks for Low Latency (Streaming) solutions including Apache Storm, Apache
Jupyter Notebook · 633 stars · License: apache-2.0
11 months ago
Topics: benchmarks, low-latency, streaming
TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Python · 3873 stars · License: apache-2.0
last year
Topics: cluster, featured, machine-learning
kafka-sparkstreaming-cassandra
Docker container for Kafka - Spark Streaming - Cassandra
Jupyter Notebook · 98 stars
5 years ago
neo4j-mazerunner
Mazerunner extends a Neo4j graph database to run scheduled big data graph comput
Java · 381 stars · License: apache-2.0
2 years ago
incubator-livy
Apache Livy is an open source REST interface for interacting with Apache Spark f
Scala · 889 stars · License: apache-2.0
9 days ago
apachelivybigdatalivy
dist-keras
Distributed Deep Learning, with a focus on distributed training, using Keras and
Python · 623 stars · License: gpl-3.0
6 years ago
Topics: apache-spark, data-parallelism, data-science
livy
Livy is an open source REST interface for interacting with Apache Spark from any
Scala · 1010 stars
2 years ago
sparkling-water
Sparkling Water provides H2O functionality inside Spark cluster
Scala · 968 stars · License: apache-2.0
2 days ago
Topics: big-data, h2o, integration
vue-info-card
Simple and beautiful card component with an elegant spark line, for VueJS.
JavaScript · 192 stars · License: mit
2 years ago
Topics: card, card-component, component
oryx
Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large sc
Java · 1787 stars · License: apache-2.0
3 years ago
Topics: apache-kafka, apache-spark, cloudera
ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and
Python · 12270 stars · License: mit
4 months ago
Topics: big-data-analytics, data-analysis, data-exploration
adam
ADAM is a genomics analysis platform with specialized file formats built using A
Scala · 1003 stars · License: apache-2.0
30 days ago
Topics: avro, big-data, bioinformatics
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras),
Python · 27493 stars · License: other
8 months ago
Topics: aws, big-data, caffe
dev-setup
macOS development environment setup: Easy-to-understand instructions with autom
Python · 6130 stars · License: other
2 years ago
Topics: android-development, aws, bash
scylla-migrator
Migrate data extract using Spark to Scylla, normally from Cassandra/parquet file
Scala · 58 stars · License: apache-2.0
2 months ago
Topics: alternator, dynamodb, migration
clj-kondo
Static analyzer and linter for Clojure code that sparks joy
Clojure · 1681 stars · License: epl-1.0
4 months ago
Topics: clojure, clojurescript, graalvm
pyspark-cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
451 stars · License: mit
2 years ago
Topics: cheat, cheatsheet, cheatsheets
Kimera-Semantics
Real-Time 3D Semantic Reconstruction from 2D data
C++ · 635 stars · License: bsd-2-clause
12 months ago
Topics: 3d-reconstruction, cpu, depth-image
lang
List of 126 languages for Laravel Framework, Laravel Jetstream, Laravel Fortify,
PHP · 7499 stars · License: mit
5 days ago
Topics: i18n, language, laravel
awesome
Description PBS KIDS Games makes learning fun & safe with 250+ educational ga
532 stars · License: cc0-1.0
2 years ago
Topics: awesome, awesome-list, craft