WarcPartitioner

Partition (W)ARC Files by MIME Type and Year

License

MIT License

Creator

helgeho

Related apps

ArchiveSpark

An Apache Spark framework for easy data processing, extraction as well as deriva

Scala · 140 · MIT

3 months ago

archivespark, internet-archive, spark

HadoopConcatGz

A Splitable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

Java · 9

6 years ago

hadoop, spark, warc

Web2Warc

An easy-to-use and highly customizable crawler that enables you to create your o

Scala · 24 · MIT

7 years ago