Web2Warc

An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)

    License

    MIT License

    Creator

    helgeho

    Related apps

    ArchiveSpark

    An Apache Spark framework for easy data processing, extraction as well as deriva

    Scala · 145 · MIT

    2 months ago

    archivespark, internet-archive, spark

    HadoopConcatGz

    A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

    Java · 9

    7 years ago

    hadoop, spark, warc

    WarcPartitioner

    Partition (W)ARC Files by MIME Type and Year

    Java · 1 · MIT

    8 years ago

    hadoop, warc, web-archiving