warc-safe

A tool for detecting viruses and NSFW material in WARC files

Python · 11

3 months ago

antivirus, nsfw-classifier, warc

grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic igno

Python · 1322 · other

5 months ago

archiving, crawl, crawler

warcat

Tool and library for handling Web ARChive (WARC) files.

Python · 150 · gpl-3.0

last month

python

jwarc

Java library for reading and writing WARC files with a typed API

Java · 46 · apache-2.0

5 months ago

warc2html

Converts WARC files to static HTML

Java · 39 · apache-2.0

5 months ago

warcprox

WARC writing MITM HTTP/S proxy

Python · 375

4 months ago

solrwayback

A search interface and wayback machine for the UKWA Solr based warc-indexer fram

Java · 102 · apache-2.0

10 days ago

warcio

Streaming WARC/ARC library for fast web archive IO

Python · 386 · apache-2.0

9 days ago

python, pywb, warc

webarchive-discovery

WARC and ARC indexing and discovery tools.

Java · 117

3 months ago

wasapi-downloader

Java application to download WARCs from WASAPI

Java · 6 · other

3 days ago

application, infrastructure, java

node-warc

Parse and create Web ARChive (WARC) files with Node.js

JavaScript · 93 · mit

2 years ago

chrome-remote-interface, pupeteer, warc

Web2Warc

An easy-to-use and highly customizable crawler that enables you to create your o

Scala · 24 · mit

7 years ago

warctools

Command line tools and libraries for handling and manipulating WARC files (and H

Python · 152 · mit

4 years ago

html2warc

Simple script to convert web resources to a single WARC file

Python · 18 · mit

2 years ago

webarchive-indexing

Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.

Python · 41 · mit

7 years ago

webarchive

Go readers for the ARC and WARC web archive formats

Go · 20 · apache-2.0

2 years ago