Web2Warc

An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)

    License

    MIT License

    Creator

    helgeho

    Related apps

    ArchiveSpark

    An Apache Spark framework for easy data processing, extraction as well as deriva

    Scala · 140 · MIT

    3 months ago

    archivespark, internet-archive, spark

    HadoopConcatGz

    A Splitable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

    Java · 9

    6 years ago

    hadoop, spark, warc

    WarcPartitioner

    Partition (W)ARC Files by MIME Type and Year

    Java · 1 · MIT

    7 years ago

    hadoop, warc, web-archiving