mirror of
https://github.com/birbwatcher/wayback-machine-downloader.git
synced 2026-02-10 22:52:23 +00:00
114 lines
3.1 KiB
Markdown
114 lines
3.1 KiB
Markdown
# Wayback Machine Downloader JS
|
|
|
|

|
|
|
|
A script written in **Node.js** for downloading websites from [Web Archive](https://web.archive.org/).
|
|
|
|
Intended for use by:
|
|
- **Webmasters** — to restore their lost or hacked projects
|
|
- **OSINT researchers** — for local work with resources that no longer exist
|
|
|
|
This webarchive website downloader has an interactive interface, supports downloading with either original links preserved or rewritten into relative ones (for local usage).
|
|
|
|
---
|
|
|
|
## Features of Web Archive Website Downloader
|
|
|
|
1. Download entire websites or individual pages from the archive, including HTML, images, scripts, styles, and other assets.
|
|
2. Rewrite internal links for correct local browsing.
|
|
3. Multithreading support.
|
|
4. Save results into a chosen folder while keeping the original structure.
|
|
5. Ability to download external assets (e.g., images or scripts from a CDN).
|
|
|
|
#### Special Features
|
|
|
|
- The script fixes parameterized file names such as `main.css?ver=1.2` into `main.css` for proper local work.
|
|
|
|
---
|
|
|
|
## Requirements
|
|
|
|
- Node.js version 18.x or higher
|
|
|
|
---
|
|
|
|
## Installation
|
|
|
|
```bash
|
|
git clone https://github.com/birbwatcher/wayback-machine-downloader.git
|
|
cd wayback-machine-downloader
|
|
|
|
# Install dependencies
|
|
npm install
|
|
```
|
|
|
|
---
|
|
|
|
## Run
|
|
|
|
```bash
|
|
node downloader.js
|
|
```
|
|
|
|
After launching, an interactive menu will appear with the following questions:
|
|
|
|
- base URL (e.g., https://example.com)
|
|
- date range (from/to)
|
|
- number of threads
|
|
- link rewriting mode (keep as-is or convert to relative)
|
|
- whether to remove `rel=canonical` from the downloaded site
|
|
- whether to download external assets
|
|
- directory for saving the files
|
|
|
|
---
|
|
|
|
## Example
|
|
|
|
```bash
|
|
node downloader.js
|
|
```
|
|
|
|
Dialog example:
|
|
|
|
```bash
|
|
Enter base URL to archive (e.g., https://example.com): https://example.com
|
|
From timestamp (YYYYMMDDhhmmss) or leave blank: 20200101000000
|
|
To timestamp (YYYYMMDDhhmmss) or leave blank: 20201231235959
|
|
Rewrite links? (yes=relative / no=as-is, default no): yes
|
|
Canonical: "keep" (default) or "remove": keep
|
|
How many download threads? (default 3): 5
|
|
Only exact URL (no wildcard /*)? (yes/no, default no): no
|
|
Target directory (leave blank for default websites/<host>/):
|
|
Download external assets? (yes/no, default no): no
|
|
```
|
|
|
|
After this, the archive download will begin.
|
|
|
|
---
|
|
|
|
## Common Issues
|
|
|
|
#### Script downloads only the homepage
|
|
**Answer:** try specifying the base URL with `/*` at the end.
|
|
For example: `https://example.com/*`, or try downloading a different time range.
|
|
|
|
---
|
|
|
|
## (Important) Download responsibly
|
|
|
|
Please note that downloading third-party websites may violate copyright laws.
|
|
Use this tool responsibly and make sure not to break the law.
|
|
|
|
---
|
|
|
|
## Contributing
|
|
|
|
Pull requests are welcome!
|
|
For major changes, please open an issue first to discuss what you would like to change.
|
|
|
|
1. Fork the project
|
|
2. Create your feature branch (`git checkout -b feature/fooBar`)
|
|
3. Commit your changes (`git commit -am 'Add some fooBar'`)
|
|
4. Push to the branch (`git push origin feature/fooBar`)
|
|
5. Create a new Pull Request
|