warc-safe
A tool for detecting viruses and NSFW material in WARC files
Python · 11 · updated 3 months ago
Topics: antivirus, nsfw-classifier, warc

grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic igno…
Python · 1322 · license: other · updated 5 months ago
Topics: archiving, crawl, crawler

solrwayback
A search interface and wayback machine for the UKWA Solr-based warc-indexer fram…
Java · 102 · license: apache-2.0 · updated 10 days ago

wasapi-downloader
Java application to download WARCs from WASAPI
Java · 6 · license: other · updated 3 days ago
Topics: application, infrastructure, java

node-warc
Parse and create Web ARChive (WARC) files with Node.js
JavaScript · 93 · license: mit · updated 2 years ago
Topics: chrome-remote-interface, pupeteer, warc

Web2Warc
An easy-to-use and highly customizable crawler that enables you to create your o…
Scala · 24 · license: mit · updated 7 years ago

warctools
Command-line tools and libraries for handling and manipulating WARC files (and H…
Python · 152 · license: mit · updated 4 years ago

webarchive-indexing
Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.
Python · 41 · license: mit · updated 7 years ago