https://github.com/internetarchive/warctools
Python · 147
4 years ago
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
MIT License
WARC-writing MITM HTTP/S proxy
Python · 375
3 months ago
brozzler - distributed browser-based web crawler
Python · 637 · apache-2.0
Internet Archive's Sparkling Data Processing Library
Scala · 11 · mit
5 months ago