HadoopConcatGz

A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

Creator

helgeho

Related apps

ArchiveSpark

An Apache Spark framework for easy data processing, extraction as well as deriva

Scala, 140, MIT

3 months ago

archivespark, internet-archive, spark

WarcPartitioner

Partition (W)ARC Files by MIME Type and Year

Java, 1, MIT

7 years ago

hadoop, warc, web-archiving

Web2Warc

An easy-to-use and highly customizable crawler that enables you to create your o

Scala, 24, MIT

7 years ago