init: initial project setup

2026-04-23 14:33:51 +00:00 · 2025-09-27 16:41:13 +02:00
parent dbd2f0adda
commit 7541f2ebee
4 changed files with 656 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1 +1,113 @@
-# wayback-machine-downloader
+# Wayback Machine Downloader JS
+
+![Web Archive Website Downloader](assets/webarchive-downloader.jpg)
+
+A script written in **Node.js** for downloading websites from the [Web Archive](https://web.archive.org/).  
+
+Intended for use by:
+- **Webmasters** — to restore their lost or hacked projects  
+- **OSINT researchers** — for local work with resources that no longer exist  
+
+This Web Archive website downloader has an interactive interface and can download a site either with its original links preserved or with links rewritten into relative ones (for local use).  
+
+---
+
+## Features of Web Archive Website Downloader
+
+1. Download entire websites or individual pages from the archive, including HTML, images, scripts, styles, and other assets.  
+2. Rewrite internal links for correct local browsing.  
+3. Parallel downloads with a configurable number of threads.  
+4. Save results into a chosen folder while keeping the original structure.  
+5. Ability to download external assets (e.g., images or scripts from a CDN).  
+
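One common way to get the parallel downloads listed above in Node.js is to run a fixed number of asynchronous workers over a shared queue. The sketch below illustrates that idea only. It assumes the "threads" are concurrent async tasks rather than OS threads, and `runWithConcurrency` is a hypothetical name, not a function from this project.

```javascript
// Hypothetical helper: run async task functions with at most `limit` in flight.
// This is an illustration, not the downloader's actual implementation.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker keeps pulling the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  // Start up to `limit` workers and wait for all of them to finish.
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: five simulated downloads, at most three running at once.
const fakeDownload = (name) => async () => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `${name} done`;
};

runWithConcurrency(['a', 'b', 'c', 'd', 'e'].map(fakeDownload), 3)
  .then((results) => console.log(results));
```

Results are stored by task index, so the output order matches the input order even when tasks finish out of order.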
+#### Special Features
+
+- The script renames parameterized file names such as `main.css?ver=1.2` to `main.css` so that they load correctly from the local copy.  
+
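The file name fix described above can be sketched as a small string operation. This is a minimal illustration, assuming that everything from the first `?` (or `#`) onward is dropped. `stripQueryFromFileName` is a hypothetical name, and the script's real logic may differ.

```javascript
// Hypothetical helper, not taken from the project's source.
// Removes a query string or fragment from a file name so it can be saved
// under a plain name on disk.
function stripQueryFromFileName(name) {
  // Position of the first "?" or "#", or -1 if neither is present.
  const cut = name.search(/[?#]/);
  return cut === -1 ? name : name.slice(0, cut);
}

console.log(stripQueryFromFileName('main.css?ver=1.2')); // main.css
console.log(stripQueryFromFileName('js/app.js'));        // js/app.js
```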
+---
+
+## Requirements
+
+- Node.js version 18.x or higher  
+
+---
+
+## Installation
+
+```bash
+git clone https://github.com/birbwatcher/wayback-machine-downloader.git
+cd wayback-machine-downloader
+
+# Install dependencies
+npm install
+```
+
+---
+
+## Run
+
+```bash
+node downloader.js
+```
+
+After launch, the script interactively asks for the following:
+
+- base URL (e.g., https://example.com)  
+- date range (from/to)  
+- number of threads  
+- link rewriting mode (keep as-is or convert to relative)  
+- whether to remove `rel=canonical` from the downloaded site  
+- whether to download external assets  
+- directory for saving the files  
+
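To show what "convert to relative" means for the link rewriting mode above, here is a minimal sketch that turns a site-root path into a link relative to the page that references it. It assumes both paths are site-root paths starting with `/`. `toRelativeLink` is a hypothetical name and not necessarily how the script rewrites links.

```javascript
// Hypothetical helper, for illustration only.
// fromPage: site-root path of the HTML page containing the link.
// toAsset:  site-root path of the linked file.
function toRelativeLink(fromPage, toAsset) {
  // Directory segments of the page (the file name itself is dropped).
  const fromParts = fromPage.split('/').filter(Boolean).slice(0, -1);
  const toParts = toAsset.split('/').filter(Boolean);

  // Count leading directory segments shared by both paths.
  // The target's file name (last segment) is never treated as shared.
  let common = 0;
  while (
    common < fromParts.length &&
    common < toParts.length - 1 &&
    fromParts[common] === toParts[common]
  ) {
    common++;
  }

  // Go up once per remaining page directory, then down into the target.
  const up = fromParts.slice(common).map(() => '..');
  return [...up, ...toParts.slice(common)].join('/');
}

console.log(toRelativeLink('/blog/post/index.html', '/css/main.css')); // ../../css/main.css
console.log(toRelativeLink('/index.html', '/img/logo.png'));           // img/logo.png
```

With relative links like these, the saved pages can be opened straight from the local folder, which is the local-use case the README mentions.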
+---
+
+## Example
+
+```bash
+node downloader.js
+```
+
+Dialog example:  
+
+```text
+Enter base URL to archive (e.g., https://example.com): https://example.com
+From timestamp (YYYYMMDDhhmmss) or leave blank: 20200101000000
+To timestamp (YYYYMMDDhhmmss) or leave blank: 20201231235959
+Rewrite links? (yes=relative / no=as-is, default no): yes
+Canonical: "keep" (default) or "remove": keep
+How many download threads? (default 3): 5
+Only exact URL (no wildcard /*)? (yes/no, default no): no
+Target directory (leave blank for default websites/<host>/): 
+Download external assets? (yes/no, default no): no
+```
+
+After this, the archive download will begin.  
+
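The `From timestamp` and `To timestamp` prompts in the dialog above expect 14-digit `YYYYMMDDhhmmss` values. The sketch below builds such a value from a JavaScript `Date`, assuming the timestamps are read as UTC, which is the usual convention for Wayback Machine timestamps. `toWaybackTimestamp` is a hypothetical helper, not part of the script.

```javascript
// Hypothetical helper: format a Date as YYYYMMDDhhmmss using UTC fields.
function toWaybackTimestamp(date) {
  // Left-pad a number with zeros to the given width.
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return (
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1) + // getUTCMonth() is zero-based
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds())
  );
}

// The range used in the dialog example: all of 2020.
console.log(toWaybackTimestamp(new Date(Date.UTC(2020, 0, 1, 0, 0, 0))));      // 20200101000000
console.log(toWaybackTimestamp(new Date(Date.UTC(2020, 11, 31, 23, 59, 59)))); // 20201231235959
```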
+---
+
+## Common Issues
+
+#### Script downloads only the homepage
+**Answer:** Try adding `/*` to the end of the base URL, for example `https://example.com/*`.  
+If that does not help, try a different time range.  
+
+---
+
+## (Important) Download responsibly
+
+Please note that downloading third-party websites may violate copyright laws.  
+Use this tool responsibly and make sure not to break the law.  
+
+---
+
+## Contributing
+
+Pull requests are welcome!  
+For major changes, please open an issue first to discuss what you would like to change.  
+
+1. Fork the project  
+2. Create your feature branch (`git checkout -b feature/fooBar`)  
+3. Commit your changes (`git commit -am 'Add some fooBar'`)  
+4. Push to the branch (`git push origin feature/fooBar`)  
+5. Create a new Pull Request