An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
License
MIT License
Related apps
HadoopConcatGz
A splittable Hadoop InputFormat for concatenated GZIP files and *.(w)arc.gz
Java · 9
7 years ago
hadoop, spark, warc
WarcPartitioner
Partition (W)ARC Files by MIME Type and Year
Java · 1 · MIT
8 years ago
hadoop, warc, web-archiving
Web2Warc
An easy-to-use and highly customizable crawler that enables you to create your o
Scala · 24 · MIT
7 years ago