mirror of
https://github.com/birbwatcher/wayback-machine-downloader.git
synced 2026-04-23 14:33:51 +00:00
init: initial project setup
This commit is contained in:
114
README.md
114
README.md
@@ -1 +1,113 @@
|
||||
# wayback-machine-downloader
|
||||
# Wayback Machine Downloader JS
|
||||
|
||||

|
||||
|
||||
A script written in **Node.js** for downloading websites from [Web Archive](https://web.archive.org/).
|
||||
|
||||
Intended for use by:
|
||||
- **Webmasters** — to restore their lost or hacked projects
|
||||
- **OSINT researchers** — for local work with resources that no longer exist
|
||||
|
||||
This webarchive website downloader has an interactive interface, supports downloading with either original links preserved or rewritten into relative ones (for local usage).
|
||||
|
||||
---
|
||||
|
||||
## Features of Web Archive Website Downloader
|
||||
|
||||
1. Download entire websites or individual pages from the archive, including HTML, images, scripts, styles, and other assets.
|
||||
2. Rewrite internal links for correct local browsing.
|
||||
3. Multithreading support.
|
||||
4. Save results into a chosen folder while keeping the original structure.
|
||||
5. Ability to download external assets (e.g., images or scripts from a CDN).
|
||||
|
||||
#### Special Features
|
||||
|
||||
- The script fixes parameterized file names such as `main.css?ver=1.2` into `main.css` for proper local work.
|
||||
|
||||
---
|
||||
|
||||
## Requirements
|
||||
|
||||
- Node.js version 18.x or higher
|
||||
|
||||
---
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
git clone https://github.com/birbwatcher/wayback-machine-downloader.git
|
||||
cd wayback-machine-downloader
|
||||
|
||||
# Install dependencies
|
||||
npm install
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Run
|
||||
|
||||
```bash
|
||||
node downloader.js
|
||||
```
|
||||
|
||||
After launching, an interactive menu will appear with the following questions:
|
||||
|
||||
- base URL (e.g., https://example.com)
|
||||
- date range (from/to)
|
||||
- number of threads
|
||||
- link rewriting mode (keep as-is or convert to relative)
|
||||
- whether to remove `rel=canonical` from the downloaded site
|
||||
- whether to download external assets
|
||||
- directory for saving the files
|
||||
|
||||
---
|
||||
|
||||
## Example
|
||||
|
||||
```bash
|
||||
node downloader.js
|
||||
```
|
||||
|
||||
Dialog example:
|
||||
|
||||
```bash
|
||||
Enter base URL to archive (e.g., https://example.com): https://example.com
|
||||
From timestamp (YYYYMMDDhhmmss) or leave blank: 20200101000000
|
||||
To timestamp (YYYYMMDDhhmmss) or leave blank: 20201231235959
|
||||
Rewrite links? (yes=relative / no=as-is, default no): yes
|
||||
Canonical: "keep" (default) or "remove": keep
|
||||
How many download threads? (default 3): 5
|
||||
Only exact URL (no wildcard /*)? (yes/no, default no): no
|
||||
Target directory (leave blank for default websites/<host>/):
|
||||
Download external assets? (yes/no, default no): no
|
||||
```
|
||||
|
||||
After this, the archive download will begin.
|
||||
|
||||
---
|
||||
|
||||
## Common Issues
|
||||
|
||||
#### Script downloads only the homepage
|
||||
**Answer:** try specifying the base URL with `/*` at the end.
|
||||
For example: `https://example.com/*`, or try downloading a different time range.
|
||||
|
||||
---
|
||||
|
||||
## (Important) Download responsibly
|
||||
|
||||
Please note that downloading third-party websites may violate copyright laws.
|
||||
Use this tool responsibly and make sure not to break the law.
|
||||
|
||||
---
|
||||
|
||||
## Contributing
|
||||
|
||||
Pull requests are welcome!
|
||||
For major changes, please open an issue first to discuss what you would like to change.
|
||||
|
||||
1. Fork the project
|
||||
2. Create your feature branch (`git checkout -b feature/fooBar`)
|
||||
3. Commit your changes (`git commit -am 'Add some fooBar'`)
|
||||
4. Push to the branch (`git push origin feature/fooBar`)
|
||||
5. Create a new Pull Request
|
||||
|
||||
Reference in New Issue
Block a user