node-warc
Parse And Create Web ARChive (WARC) files with node.js
JavaScript · 93 · mit
2 years ago
chrome-remote-interface, pupeteer, warc
grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic igno
Python · 1322 · other
5 months ago
archiving, crawl, crawler
Web2Warc
An easy-to-use and highly customizable crawler that enables you to create your o
Scala · 24 · mit
7 years ago
warctools
Command line tools and libraries for handling and manipulating WARC files (and H
Python · 152 · mit
4 years ago
solrwayback
A search interface and wayback machine for the UKWA Solr based warc-indexer fram
Java · 102 · apache-2.0
13 days ago
warcio
Streaming WARC/ARC library for fast web archive IO
Python · 386 · apache-2.0
11 days ago
python, pywb, warc
webarchive-indexing
Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.
Python · 41 · mit
7 years ago
wasapi-downloader
Java application to download WARCs from WASAPI
Java · 6 · other
6 days ago
application, infrastructure, java
warc-safe
A tool for detecting viruses and NSFW material in WARC files
Python · 11
3 months ago
antivirus, nsfw-classifier, warc