node-warc

Parse And Create Web ARChive (WARC) files with node.js

JavaScript · 93 · mit

2 years ago

chrome-remote-interface, pupeteer, warc

grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic igno

Python · 1322 · other

4 months ago

archiving, crawl, crawler

warcat

Tool and library for handling Web ARChive (WARC) files.

Python · 143 · gpl-3.0

2 years ago

python

Web2Warc

An easy-to-use and highly customizable crawler that enables you to create your o

Scala · 24 · mit

7 years ago

jwarc

Java library for reading and writing WARC files with a typed API

Java · 46 · apache-2.0

4 months ago

warc2html

Converts WARC files to static HTML

Java · 38 · apache-2.0

4 months ago

warcprox

WARC writing MITM HTTP/S proxy

Python · 375

3 months ago

warctools

Command line tools and libraries for handling and manipulating WARC files (and H

Python · 147 · mit

4 years ago

solrwayback

A search interface and wayback machine for the UKWA Solr based warc-indexer fram

Java · 99 · apache-2.0

3 months ago

html2warc

simple script to convert web resources to a single warc file

Python · 18 · mit

last year

warcio

Streaming WARC/ARC library for fast web archive IO

Python · 364 · apache-2.0

5 months ago

python, pywb, warc

webarchive-indexing

Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.

Python · 41 · mit

7 years ago

webarchive

golang readers for ARC and WARC webarchive formats

Go · 19 · apache-2.0

2 years ago

webarchive-discovery

WARC and ARC indexing and discovery tools.

Java · 113

7 months ago

wasapi-downloader

Java application to download WARCs from WASAPI

Java · 6 · other

4 months ago

application, infrastructure, java

warc-safe

A tool for detecting viruses and NSFW material in WARC files

Python · 9

3 months ago

antivirus, nsfw-classifier, warc
