webarchiving

Here are 46 public repositories matching this topic...

httpreserve / workbench

Client app for httpreserve pkg that generates CSV, JSON, HTTP, and BoltDB

archives boltdb code4lib internetarchive webarchiving digitalpreservation digital-repositories

Updated May 26, 2023
JavaScript

atomotic / pywb-recorder-tor

pywb recorder over Tor; anonymously records the web (Docker image).

tor webrecorder webarchiving

Updated Dec 10, 2018
CSS

mijho / crawl-log2xml

Parse a Heritrix crawl.log into an XML sitemap

sitemap crawl sitemap-generator sitemap-xml webarchive heritrix webarchiving deno heritrix3

Updated Sep 30, 2023
TypeScript

ibnesayeed / archival-tests

A set of web archival replay test cases

testing memento webarchive webarchiving replay-tests archival-replay

Updated Oct 25, 2021
HTML

httpreserve / wayback

A restricted API in Golang for the (semi-)exposed functions of the Internet Archive.

archives code4lib internetarchive webarchiving digitalpreservation

Updated Dec 22, 2021
Go

httpreserve / million-dollar-webpage

HTTPreserve Analysis of Million Dollar Web Page

digital-humanities web-archiving code4lib harvard webarchiving computing-history boingboing

Updated Jun 2, 2021

ibnesayeed / awesome-web-archiving

An Awesome List for getting started with web archiving

awesome awesome-list webarchiving

Updated Mar 23, 2020

atomotic / webrecorder-chrome-extension

Record the currently active tab on webrecorder.io

chrome-extension webrecorder webarchiving

Updated May 9, 2017
JavaScript

arquivo / dspace-link-extractor

Extracts links from DSpace repositories

java tika sitemaps webarchiving

Updated Nov 2, 2023
Java

mgunn001 / WebArchiving-SeminarCourse

Class page for ODU CS 791 / 891 Web Archiving Seminar

webarchiving archived-webpages

Updated Dec 6, 2017
HTML

datacoon / metawarc

metawarc: a command-line tool for extracting metadata from files stored in WARC (Web ARChive) files

metadata osint warc webarchiving warc-files osint-python

Updated May 31, 2024
Python

exponential-decay / moonshine

Given four bytes, download a random file from web archives implementing the UKWA Shine interface

archives glam code4lib digipres webarchiving warclight file-formats ukwa

Updated Sep 8, 2023
Go

athenekilta / arkisto

Digital archive of web pages related to the Guild of Information Networks

html php archive webarchiving

Updated Feb 9, 2024
HTML

shawnmjones / government-sites-archive-projects

This repository contains work done to determine how much of www.guideline.gov and qualitymeasures.ahrq.gov was archived.

archiving-datasets webarchiving webarchive-discovery

Updated Jul 16, 2018
Jupyter Notebook

httpreserve / phantomjsscreenshot

A wrapper for phantom.js commands for headless screenshots.

code4lib webarchiving digitalpreservation websnapshot httpreserve

Updated Jun 2, 2021
Go

pierlauro / MDBubing

From WARC records to MongoDB documents

crawler crawling warc webarchive webarchiving warc-files warc-format warc-record bubing

Updated Nov 3, 2020
Java

TarekJor / wpull

Wget-compatible web downloader and crawler.

crawler backup bookmarks wget web-archiving browsers preservation web-page webarchiving wpull web-downloader web-pages web-browsers

Updated Dec 20, 2017
HTML

gitdev-bash / webArchiver

An archiving utility with an interface for web servers.

webserver archiving webarchive webarchiving

Updated Aug 3, 2021
Python

PaladhiDinesh / Web-Science

https://bl.ocks.org/PaladhiDinesh/raw/56e1843c31960ecfe919/ The assignments are mainly based on crawling data from websites, web archiving, analyzing the data, and writing reports using Python, LaTeX, and R. Includes studies of the Web's properties, protocols, algorithms, and societal effects.

python latex twitter-api d3js webcrawler webscraping webarchiving

Updated May 12, 2017
Python

MozillaCZ / phpbbcrawler

Link crawler for a phpBB forum

crawler tool phpbb wayback-archiver webarchiving

Updated Jun 19, 2022
Java
