HadoopConcatGz

A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

License

Creator

helgeho

Related apps

WarcPartitioner

Partition (W)ARC Files by MIME Type and Year

Java · 1 · MIT

8 years ago

hadoop, warc, web-archiving

Web2Warc

An easy-to-use and highly customizable crawler that enables you to create your o

Scala · 24 · MIT

7 years ago