How-To: Import raw files out of a wget dump or similar website dump #1387
Hello, I've found myself having to download a website from the web archive, and I've got a dump of it, but I don't know how to import it into ArchiveBox.
Replies: 1 comment
You can technically accomplish this, but it's a very labor-intensive process right now, as it's not one of our main use cases. You have to create a `Snapshot` for each URL, and it's recommended to add an `ArchiveResult` entry for each item you want to import.

1. Go to `/add/` and add each URL to create a `Snapshot` entry (you can deselect all the archive methods except `title` to keep it from trying to archive the page on its own).
2. Go to `/admin/core/archiveresult/` and create a new `ArchiveResult` for each file you want to store for that URL, and point it to the `Snapshot` you created earlier for that URL.
3. Copy your files into the snapshot dir, e.g. `mp4` files into `./archive/<timestamp>/media/`, …

You can technically skip step (2): ArchiveBox will leave your files alone inside the snapshot dir, but it won't show them in the UI like it does its own files unless you mimic its normal structure and add `ArchiveResult` entries.

We might support automatically importing existing archives in the future; subscribe to this issue for progress updates: #160
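Copying the dumped files into a snapshot folder by hand gets tedious for a large dump, so here is a minimal Python sketch of that file-copying step. It is not part of ArchiveBox: the function name `copy_into_snapshot` and the `SUBDIR_BY_EXT` mapping are hypothetical, and only the `mp4` to `media/` pairing comes from this thread. Any other extension-to-folder pairing is something you would have to check against a snapshot ArchiveBox created itself.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from file extension to snapshot subfolder.
# Only ".mp4" -> "media" is taken from this thread; add other entries
# yourself after checking how ArchiveBox lays out its own snapshots.
SUBDIR_BY_EXT = {".mp4": "media"}


def copy_into_snapshot(dump_dir, snapshot_dir, subdir_by_ext=SUBDIR_BY_EXT):
    """Copy files from a website dump into a snapshot dir, sorted by extension.

    Returns the list of destination paths that were written.
    """
    dump_dir = Path(dump_dir)
    snapshot_dir = Path(snapshot_dir)
    copied = []
    for src in sorted(dump_dir.rglob("*")):
        if not src.is_file():
            continue
        subdir = subdir_by_ext.get(src.suffix.lower())
        if subdir is None:
            continue  # unmapped file type: leave it for manual handling
        dest = snapshot_dir / subdir / src.name
        if dest.exists():
            continue  # never overwrite a file that is already in the snapshot
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves modification times
        copied.append(dest)
    return copied
```

You would call it once per snapshot, e.g. `copy_into_snapshot("./my-dump", "./archive/<timestamp>")`, replacing `<timestamp>` with the folder name of the `Snapshot` you created for that URL (`./my-dump` is a placeholder path). Files are copied rather than moved so the original dump stays intact if something goes wrong, and the script does not touch the database, so the `ArchiveResult` entries still have to be created separately.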