Metadata-Version: 2.1
Name: crystal-web
Version: 1.5.0b0.post1
Summary: Downloads websites for long-term archival.
Home-page: https://github.com/davidfstr/Crystal-Web-Archiver
License: Proprietary
Author: David Foster
Author-email: david@dafoster.net
Requires-Python: >=3.8,<3.12
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: MacOS X
Classifier: Environment :: Win32 (MS Windows)
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: Other/Proprietary License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Internet :: WWW/HTTP :: Site Management
Classifier: Topic :: Software Development :: Version Control
Classifier: Topic :: System :: Archiving :: Backup
Classifier: Topic :: System :: Archiving :: Mirroring
Requires-Dist: appdirs (>=1.4.4,<2.0.0)
Requires-Dist: beautifulsoup4 (>=4.9.3,<5.0.0)
Requires-Dist: certifi
Requires-Dist: colorama (>=0.4.4,<0.5.0)
Requires-Dist: lxml (>=4.9.1,<5.0.0)
Requires-Dist: overrides (>=3.1.0,<4.0.0)
Requires-Dist: py2app (>=0.23,<0.24) ; sys_platform == "darwin"
Requires-Dist: py2exe (>=0.13.0.0,<0.14.0.0) ; sys_platform == "win32"
Requires-Dist: tinycss2 (>=1.1.0,<2.0.0)
Requires-Dist: tzlocal (>=4.2,<5.0)
Requires-Dist: wxPython (==4.1.1)
Project-URL: Repository, https://github.com/davidfstr/Crystal-Web-Archiver
Description-Content-Type: text/markdown

Crystal Web Archiver
====================

<img src="README/logo.png" title="Crystal Web Archiver icon" align="right" />

Crystal is a tool that downloads high fidelity copies of websites for long-term archival.

It works best on traditional websites made of distinct pages which make limited
use of JavaScript (such as blogs, wikis, and other static websites)
although it can also download more dynamic sites which have infinitely 
scrolling feeds of content (such as social media sites).

If you are an early adopter and want to get started creating your first project
with Crystal, please see the Tutorial below.
Additional documentation will be available once Crystal is no longer **in beta**.

Download ⬇︎
--------

* [macOS 10.14 and later](https://github.com/davidfstr/Crystal-Web-Archiver/releases/download/v1.5.0b/crystal-mac-1.5.0b.dmg)
    * You will need to [right-click or Control-click on the application and select "Open" to open it for the first time](https://github.com/davidfstr/Crystal-Web-Archiver/issues/20).
* [Windows 7, 8, 10](https://github.com/davidfstr/Crystal-Web-Archiver/releases/download/v1.5.0b/crystal-win-1.5.0b.exe)
* Linux
    * Install Python >=3.8,<3.12 and pip from your package manager
        * Ubuntu 22.04: `apt-get update; apt-get install -y python3 python3-pip python3-venv`
        * Fedora 37: `yum update -y; yum install -y python3 python3-pip`
    * Install dependencies of wxPython from your package manager
        * Ubuntu 22.04: `apt-get install -y libgtk-3-dev`
        * Fedora 37: `yum install -y wxGTK-devel gcc gcc-c++ which python3-devel`
    * Install pipx
        * `pip install pipx`
    * Install Crystal with pipx
        * NOTE: The following step will take a long time (10+ minutes)
          because installing wxPython (which is a dependency of Crystal)
          will need to be built from source, since it does not offer
          precompiled wheels for Linux.
        * `pipx install crystal-web`
    * Run Crystal:
        * `crystal`


Tutorial ⭐
--------

### To download a static website (ex: [xkcd]):

* Download the binary for your operating system. See the Download section above.
* Open the program and create a new project, call it "xkcd".
* Click the "+ URL" button to add the "https://xkcd.com/1/" URL, named "First Comic".
* Expand the new "First Comic" node to download the page and display its links.
* Click the "+ Group" button to add a new group called "Comics" with the pattern
  "https://xkcd.com/#/". The "#" is a wildcard that matches any number.
  Make sure it also has "First Comic" selected as the Source.
    * If you click the "Preview Members" button in the dialog, you should see a list of
      several URLs, including "https://xkcd.com/1/" and "https://xkcd.com/2/".
* Close the "First Comic" node so that you can see the new "Comics" node at the root level.
* Select the "Comics" node and press the "Download" button.
  This will download all xkcd comics.
* Expand the "Comics" node to see a list of all comic pages.
* Select any comic page you'd like to see and press the "View" button.
  Your default web browser should open and display the downloaded page.
* Congratulations! You've downloaded your first website with Crystal!

### To download a dynamic website (ex: [The Pragmatic Engineer]):

* Start Crystal from the [command-line], so that we will have access to the
  server log in the Terminal (macOS) or the Command Prompt (Windows) in later steps.
    * In a future version of Crystal it will be possible to 
      [access the server log from the regular UI], 
      without needing to run Crystal from the command-line.
* Press the "+ URL" button and add: `https://newsletter.pragmaticengineer.com/` -- Home
* Select the added "Home" and press the "Download" button. Wait for it to finish downloading.
* With "Home" still selected, press the "View" button.
  A web browser should open and display the downloaded home page.
* While browsing a downloaded site from a web browser,
  Crystal's server will log information to the terminal about requests it 
  receives from the web browser. For example:
    * `"GET /_/https/newsletter.pragmaticengineer.com/ HTTP/1.1" 200 -`
        * This line says the web browser did try to fetch the
          <https://newsletter.pragmaticengineer.com/> URL from Crystal.
* Notice in the server log that many red lines did appear saying
  "Requested resource not in archive".
    * Since these were fetched immediately when loading the page,
      they must be a kind of resource that is "embedded" into the page.
      When Crystal downloads a page it also downloads all embedded
      resources it can find statically, but these embedded resources 
      must have been fetched *dynamically* by JavaScript code running on the page.
* We want to eliminate those red lines that appear when viewing the home page.

Eliminate red lines:

* Let's start by eliminating the "Requested resource not in archive" red lines
  related to URLs like `https://bucketeer-*/**.png`
* Press the "+ Group" button and add: `https://bucketeer-*/**.png` -- Bucketeer PNG
* Reload the home page in the web browser.
* Notice in the server log that many green lines did appear saying
  "*** Dynamically downloading existing resource in group 'Bucketeer PNG':"
  and that there are no more red lines related to `https://bucketeer-*/**.png`.

Eliminate more red lines:

* However there are still "Requested resource not in archive" red lines 
  related to URLs like `https://substackcdn.com/**.png`. Let's eliminate them too.
* Press the "+ Group" button and add: `https://substackcdn.com/**.png` -- Substack CDN PNG
* Reload the home page in the web browser.
* Again, all red lines related to `https://substackcdn.com/**.png` should be gone.

Eliminate last two red lines:

* There should be only two red lines left:
    * `*** Requested resource not in archive: https://newsletter.pragmaticengineer.com/api/v1/archive?sort=new&search=&offset=12&limit=12`
    * `*** Requested resource not in archive: https://newsletter.pragmaticengineer.com/api/v1/firehose?`...
* Eliminate the first one by creating a group: `https://newsletter.pragmaticengineer.com/api/v1/archive?**` -- Archive API
* Eliminate the second one by creating a group: `https://newsletter.pragmaticengineer.com/api/v1/firehose?**` -- Firehose API
* Reload the home page in the web browser.
* There should be no red lines left.

Eliminate "Page not found" message:

* However there's a strange "Page not found" message displayed at the top of
  the home page.
    * The Pragmatic Engineer is a [Single Page Application] (SPA), 
      a particularly advanced kind of dynamic website.
    * SPAs can get confused when the URL in the browser has a path component
      that isn't what they expected:
        * When loading the real <https://newsletter.pragmaticengineer.com/>,
          the path component of the URL is: `/`
        * When loading the archived version at <http://localhost:2797/_/https/newsletter.pragmaticengineer.com/>,
          the path component of the URL is: `/_/https/newsletter.pragmaticengineer.com/`
* The "Page not found" message is probably caused by the SPA's routing code
  getting confused by the path component of the archived URL not matching
  the path component of the real URL.
* We can alter the path component of the archived URL to be more realistic
  and match the path component of the real URL by setting the
  Default URL Prefix of the project to `https://newsletter.pragmaticengineer.com`.
* Right-click (or Control-Click) on the "Home" URL and select
  "Set as Default URL Prefix" from the contextual menu.
* With the "Home" URL selected, press the "View" button to open it again in the web browser.
* It should have opened in the web browser at URL <http://localhost:2797/>,
  with a path component of `/` just like the real URL.
* There also should be no further "Page not found" messages.

Final testing:

* If you click the "Let me read it first" link at the bottom of the page,
  a list of article links should appear.
* Congratulations! You've fully downloaded the page! 🎉

### To download a website that requires login (ex: [The Pragmatic Engineer]):

* Using a browser like Chrome, login to the website you want to download.
* Right-click anywhere on the page and choose Inspect to open the Chrome Developer Tools.
* Switch to the Network pane and enable the Doc filter.
* Reload the page by pressing the ⟳ button.
* Select the page's URL in the Network pane.
* Scroll down to see the "Request Headers" section and look for a "cookie" request header.
* Copy the value of the "cookie" request header to a text file for safekeeping.
* Open Crystal, either creating a new project or opening an existing project.
* Click the "Preferences..." button, paste the cookie value in the text box, and click "OK".
    * This cookie value will be remembered only while the project remains open.
      If you reopen Crystal again later you'll need to paste the cookie value in again.
* Now download pages using Crystal as you would normally. The specified cookie
  header value (which logs you in to the remote server) will be used as you
  download pages.

[xkcd]: https://xkcd.com
[The Pragmatic Engineer]: https://newsletter.pragmaticengineer.com/

[command-line]: https://github.com/davidfstr/Crystal-Web-Archiver/wiki/Command-Line-Interface
[access the server log from the regular UI]: https://github.com/davidfstr/Crystal-Web-Archiver/issues/44
[Single Page Application]: https://developer.mozilla.org/en-US/docs/Glossary/SPA


History 📖
-------

I wrote Crystal originally in 2011 because other website downloaders
I tried didn't work well for me and because I wanted to write a large
Python program, as Python was a new language for me at the time.

Every few years I revisit Crystal to add features allowing me to archive 
more sites that I care about, and to bring Crystal up-to-date for the latest
operating systems.


Design 📐
------

A few unique characteristics of Crystal:

* The Crystal project file format (`*.crystalproj`) is suitable for long-term archival:
    * Downloaded pages are stored in their original form as downloaded
      from the web including all HTTP headers.
    * Metadata is stored in a [SQLite database].

* To download pages automatically, the user must define "groups" of pages with similar
  URLs (ex: "Blog Posts", "Archive Pages") and specify rules for finding links to members
  of the group.
    * Once a group has been defined in this way, it is possible for the user to
      instruct Crystal to simply download the group. This involves finding links to all
      members of the group (possibly by downloading other groups) and then downloading
      each member of the group, in parallel.

The design is intended for the future addition of the following features:

* Intelligently updating the pages in websites that have already been downloaded.
    * This would be done by defining rules on groups that specify how often its members
      are updated. For example the set of "Archive Pages" on WordPress blogs is expected
      to change monthly. And the most recently added member of the "Archive Pages" group
      may change daily, whereas the other members are expected to never change.
    * Multiple revisions per downloaded resource are supported to allow multiple
      versions of the same resource to be tracked over time.

[SQLite database]: https://sqlite.org/lts.html


Contributing ⚒
------------

If you'd like to request a feature, report a bug, or ask a question, please create
[a new GitHub Issue](https://github.com/davidfstr/Crystal-Web-Archiver/issues/new),
with either the `type-feature`, `type-bug`, or `type-question` tag.

If you'd like to help work on coding new features, please see
the [code contributor workflow]. If you'd like to help moderate the community
please see the [maintainer workflow].

[code contributor workflow]: https://github.com/davidfstr/Crystal-Web-Archiver/wiki/Contributor-Workflows#code-contributors
[maintainer workflow]: https://github.com/davidfstr/Crystal-Web-Archiver/wiki/Contributor-Workflows#maintainers

### Code Contributors

To **run the code locally**,
run `poetry install` once in Terminal (Mac) or in Command Prompt (Windows), and
`poetry run python src/main.py` thereafter.

To **build new binaries** for Mac or Windows, follow the instructions at [COMPILING.txt].

To **run non-UI tests**, run `poetry run pytest` in Terminal (Mac) or in Command Prompt (Windows).

To **run UI tests**, run `poetry run python src/main.py --test` in Terminal (Mac) or in Command Prompt (Windows).

To **typecheck**, run `poetry run mypy` in Terminal (Mac) or in Command Prompt (Windows).

[COMPILING.txt]: COMPILING.txt


Related Projects ⎋
----------------

* [webcrystal]: An alternative website archiving tool that focuses on making it
  easy for automated crawlers (rather than for humans) to download websites.

[webcrystal]: http://dafoster.net/projects/webcrystal/


Release Notes ⋮
-------------

### Future

* See the [Roadmap].
* See open [high-priority issues] and [medium-priority issues].

[Roadmap]: https://github.com/davidfstr/Crystal-Web-Archiver/wiki/Roadmap
[high-priority issues]: https://github.com/davidfstr/Crystal-Web-Archiver/issues?q=is%3Aopen+is%3Aissue+label%3Apriority-high
[medium-priority issues]: https://github.com/davidfstr/Crystal-Web-Archiver/issues?q=is%3Aopen+is%3Aissue+label%3Apriority-medium

### v1.5.0b <small>(April 2, 2023)</small>

This release focuses on making it easy to install Crystal from PyPI,
adds support for running on Linux from source (but not from a binary),
and fixes many bugs with the built-in CLI shell.

Additionally items in the main window are easier to understand
because icons and tooltips have been added for all tree nodes.

* Distribution improvements
    * Can install Crystal using pipx and pip, from PyPI:
        * `pipx install crystal-web`
        * `crystal`
    * Can run Crystal using `crystal` binary:
        * `poetry run crystal`
    * Can run Crystal using `python -m crystal`:
        * `poetry run python -m crystal`
    * Add support for Linux platform (Ubuntu 22.04, Fedora 37)

* CLI improvements
    * Fixed shell to not hang if exited before UI exited, under certain circumstances.
    * Fixed {help, exit, quit} functions to be available when Crystal runs as an .app or .exe.
    * Altered exiting message while windows open to be more accurate.
    * Pinned the public API of `Project` and `MainWindow`.

* Testing improvements
    * Tests are much faster now that download delays are minimized while running tests.
    * Failure messages are improved whenever a WaitTimedOut.
    * A screenshot is taken whenever a test fails.
    * Several race conditions related to accessing the foreground thread are fixed.

* UI Improvements
    * Icons and tooltips added to all tree nodes in the main window,
      clarifying the different types of entities, links, and tasks that exist.
        * Easy to distinguish between URLs and groups.
        * Easy to see whether a URL was downloaded,
          and whether it was downloaded successfully.
    * URL clusters now show in their title how many members they contain.
    * Fixed "Offsite" cluster nodes to update children appropriately whenever
      the Default URL Prefix is changed.
    * Fixed right-click on non-URL node to no longer print a traceback.
    * Fixed attempt to download a group with no source to no longer print a traceback.

### v1.4.0b <small>(August 22, 2022)</small>

This release adds early support for [incrementally redownloading sites
with new page versions](https://github.com/davidfstr/Crystal-Web-Archiver/issues/80).

It is also now possible to download sites requiring login from the UI
and a tutorial has been added showing how to do that.

There are also many stability improvements, with fewer wxPython-related
Segmentation Faults and dramatically improved automated test coverage.

* Downloading improvements
    * Can redownload newer versions of existing URLs using the UI or `--stale-before` CLI option.
    * Can download sites that require cookie-based login using the UI.
    * Fix to send URL path and query rather than absolute URL in HTTP GET requests,
      improving conformance to RFC 2616 (HTTP/1.1).
        * This helps download WordPress sites successfully.
    * Give up if it takes more than 10 seconds to start downloading an URL.
        * This helps automatically skip extremely slow URLs,
          which tend to be dead links.

* Parsing improvements
    * Can identify rel="stylesheet" references inside CSS that don't end in .css
    * Can identify URL references inside Atom feeds and RSS feeds.

* CLI improvements
    * The [shell] now runs commands on the foreground thread by default,
      making it easy to interact with the `project` and `window` variables.

* Stability improvements
    * Two issues that could cause Crystal to crash with a Segmentation Fault
      were fixed:
        * Updates to tasks now do manipulate the related tree nodes
          on the foreground thread correctly.
        * Crashes that occur in wx.Bind() event handlers no longer
          destabilize the program.
    * Various errors of the form
      `wrapped C/C++ object of type X has been deleted` that could be raised
      while Crystal is closing a project are now handled correctly.
    * Automated UI tests pass consistently:
        * Tests no longer rely on network access or real websites.
        * Did workaround wxDialog.ShowModal() hang on macOS.
        * Did workaround deadlock that can happen when closing a main window
          while there are still lingering tasks running.
        * Did add longer timeouts to accomodate slow test VMs on GitHub Actions.

* Documentation improvements
    * Improved introduction in the README.
    * Added tutorial: To download a dynamic website

### v1.3.0b <small>(July 10, 2022)</small>

This release allows more kinds of advanced sites to be downloaded,
including sites requiring login and sites relying on JSON APIs,
especially those with infinitely scrolling pages.

Projects can now be opened in a [read-only mode] such that
browsing existing downloaded content will never attempt to
dynamically download additional content.

Advanced manipulation of projects can now be done from a
[shell] launched from the command-line interface.

Last but not least, [Substack]-based sites are now recognized specially and can be
downloaded effectively without creating an explosion of URL combinations.

* Regular downloading improvements
    * Can download sites that require cookie-based login
      using the `--cookie` CLI option.

* Dynamic downloading improvements
    * Can identify URL references inside JSON responses.
        * In particular URLs that occur within JSON API endpoint responses
          are recognized correctly, which improves support for dynamically
          downloading infinitely scrolling pages.
    * Browsing to an URL that is a member of an existing resource group
      or matches an existing root URL will download it automatically.
    * Downloads now fail with a timeout error if an origin server fails
      to respond promptly rather than hanging the download operation forever.

* Parsing improvements
    * Whitespace is now stripped from relative URLs obtained from HTML tags,
      which allows the related linked URLs to be discovered correctly.

* Serving improvements
    * Downloaded sites will be served with shortened URLs if a
      Default URL Prefix is defined for a project.
    * Served sites will pin the value of `Date.now()` (and similar date/time
      functions in JavaScript) to always return the same date/time
      from when the page was originally downloaded, which helps ensure
      that any JavaScript on the page behaves in a consistent fashion.
        * In particular if there is JavaScript code that is using the
          current date/time to construct & fetch a URL,
          it will now generate a consistent URL (which can be downloaded
          to the project) rather than an inconsistent URL which cannot
          be cached properly.
    * Files that an origin server provided with a custom
      download filename via the [Content-Disposition] HTTP header
      are now correctly served with that filename.
    * Ignore early disconnection errors when a browser downloads a served URL.

* Archival improvements
    * It is now possible to open a project in [read-only mode],
      and this is done automatically for projects that are marked as
      Locked (on macOS), Read Only (on Windows), or reside on a read-only
      filesystem (such as on a DVD, CD, or optical disc).

* UI improvements
    * Main Window: Alter buttons to use more words and less symbols
    * Main Window: Fix splitter to be visible
    * Task Panel: Use white background on macOS (rather than invisible gray)
    * Main Window: Add version number

* CLI improvements
    * A `--serve` CLI option is added which automatically starts serving
      a project immediately after it is opened.
    * A `--shell` CLI option opens a Python shell that can be used to
      interact with projects in an advanced manner.

* Stability improvements
    * Two issues that could cause Crystal to crash with a Segmentation Fault
      were fixed:
        * [Fixed a crash related to attempting to set the icon of a tree node
          that no longer exists](https://github.com/davidfstr/Crystal-Web-Archiver/issues/52)
        * [Fixed a crash related to sorting of wx.TreeCtrl nodes](https://github.com/davidfstr/Crystal-Web-Archiver/commit/a90ca150b0fe76a9f584290a6a16c43b2ffd480a)
    * Automated UI tests now exist, and are run continuously with GitHub Actions.

[read-only mode]: https://github.com/davidfstr/Crystal-Web-Archiver/wiki/Read-Only-Projects
[shell]: https://github.com/davidfstr/Crystal-Web-Archiver/wiki/Shell
[Substack]: https://substack.com/
[Content-Disposition]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition

### v1.2.0b <small>(April 12, 2021)</small>

This release primarily features better support for large projects and groups.
Downloads of large groups are dramatically faster and now only require a
constant amount of memory no matter how large the group is. Also a progress bar
is now displayed when opening a large project.

A few more link types in CSS and `<script>` tags are now recognized.

Last but not least, phpBB forums are now recognized specially and can be
downloaded effectively without creating an explosion of URL combinations.
phpBB support is still experimental and likely requires additional tuning.

* Performance & memory usage improvements
    * Don't hold resource revisions of group members in memory while downloading
      other members of the same group.
        * Drastically reduces memory usage while downloading large groups,
          and keeps memory usage mostly constant over time.
    * Don't attempt to reparse and redownload embedded resources for resources
      that were already downloaded in the current session of Crystal.
        * Speeds up downloading large groups where many members embed the
          same expensive subresource (like a soft 404 page).
    * Enumerate resource group members in constant time rather than linear time.
        * Drastically speeds up creating new resources and other operations.

* Parsing improvements
    * Can identify `@import "*";` references inside CSS.
    * Can identify //... references inside `<script>` tags.
    * Fix links that contain spaces and other characters to be percent-encoded.
    * Don't try to rewrite `data:` URLs

* Crawling improvements
    * Don't recurse infinitely if resource identifies ancestor as a
      self-embedded resource.
    * Don't download embedded resources of HTTP 4xx and 5xx error pages.

* Serving improvements
    * When dynamically downloading HTML pages, wait for embedded resources too.
      Avoids rendering such pages with a bunch of missing images.

* Miscellaneous
    * Specially recognize and normalize phpBB URLs.
    * Disallow delete of Resource if it is referenced by a RootResource.

### v1.1.1b <small>(April 2, 2021)</small>

Several first-time-launch issues were fixed. And domains are now recognized
in a case-insensitive fashion, eliminating duplicate URLs within some sites.

* macOS Fixes
    * Fix argument processing issue that prevented app launch on 
      macOS 10.14 Mojave.
    * Bundle HTTPS certificates from the 
      [certifi](https://pypi.org/project/certifi/) project.

* Windows Fixes
    * Embed VCRUNTIME140.dll so that Crystal does install reliably on
      a fresh Windows 7 machine.

* Serving & link-rewriting improvements
    * Treat domain names in a case-insensitive fashion.

* Miscellaneous
    * Can delete entire resources from the Crystal CLI, 
      in addition to resource revisions.

### v1.1.0b <small>(March 22, 2021)</small>

Our first beta release brings support for downloading more complex static sites,
recognizing vastly more link types than ever before. It also supports various 
kinds of *dynamic* link-rewriting (🧠), beyond the usual static link-rewriting.

Additionally the code has been modernized to work properly on the latest
operating systems and use newer versions of the BeautifulSoup parser and
the wxWidgets UI library. Unfortunately this has meant dropping support for
some older macOS versions and Windows XP.

* Parsing improvements
    * Recognize `url(*)` and `url("*")` references inside CSS!
    * Recognize http(s):// references inside `<script>` tags! 🧠
    * Recognize http(s):// references inside custom and unknown attribute types! 🧠
    * Recognize many more link types:
        * Recognize `<* background=*>` links
        * Recognize favicon links
    * Fix scoping issue that made detection of *multiple* links of the format
      `<input type='button' onclick='*.location = "*";'>` unreliable.
    * Fix Content-Type and Location headers to be recognized in case-insensitive fashion,
      fixing redirects and encoding issues on many archived sites.
    * Support rudimentary parsing of pages containing frames (and `<frameset>` tags),
      with a new "basic" parser that can be used instead of the "soup" parser.
    * Fix infinite recursion if a resource identifies itself as a self-embedded resource.

* Downloading improvements
    * Save download errors in archive more reliably

* Serving & link-rewriting improvements
    * Dynamically rewrite incoming links from unparseable site-relative and 
      protocol-relative URLs in archived resource revisions! 🧠
        * Did require altering the request URL format to be more distinct: **(Breaking Change)**
            * Old format: `http://localhost:2797/http/www.example.com/index.html`
            * New format: `http://localhost:2797/_/http/www.example.com/index.html`
    * Dynamically download accessed resources that are a member of an existing
      resource group. 🧠
        * Does allow many unparseable resource-relative URLs in archived
          resources to be recognized and downloaded successfully.
    * Better header processing:
        * Recognize many more headers:
            * Recognize standard headers related to CORS, Timing, Cookies, 
              HTTPS & Certificates, Logging, Referer, Protocol Upgrades,
              and X-RateLimit.
            * Recognize vendor-specific headers from AWS Cloudfront, 
              Cloudflare, Fastly, and Google Cloud.
        * Match headers against the header whitelist and blacklist in case-insensitive fashion,
          allowing more headers to be served correctly and reducing unknown-header warnings.
    * Fix to serve appropriate error page when viewing resource in archive
      that was fetched with an error, rather than crashing.
    * Fix transformed HTML and CSS documents to be reported as charset=utf-8 correctly.
    * Automatically fixup URLs lacking a path to have a / path.
    * Don't attempt to rewrite mailto or javascript URLs.
    * Don't print error if browser drops connection early.
    * Avoid printing binary data to console when handling incoming binary protocol message.
        * This can happen if archived JavaScript attempts to fetch a 
          archived resource over HTTPS from an http:// URL.
    * Colorize logged output by default. 🎨

* Modernize codebase
    * Upgrade Python 2.7 -> 3.8
    * Upgrade wxPython 2.x -> 4
    * Upgrade BeautifulSoup 2.x -> 4
    * Track and pin dependencies with Poetry
    * Change supported operating system versions **(Breaking Change)**
        * Drop support for Windows XP. Only Windows 7, 8, and 10 are now supported.
        * Drop support for Mac OS X 10.7 - 10.13. Only macOS 10.14+ is now supported.

* Miscellaneous
    * User-Agent: Alter to advertise correct version and project URL.
    * Logging changes:
        * Mac: Redirect stdout and stderr to file when running as a binary.
        * Windows: Alter location of stdout and stderr log files to be in %APPDATA%
          rather than beside the .exe, to enable logging even when Crystal is running
          from a locked volume.
    * Other fixes:
        * Mac: Fix wxPython warning around inserting an empty list of items to a list.
        * Fix closing the initial welcome dialog to be correctly interpreted as Quit.
    * Documentation improvements to the README
    * Upgrade development status from Alpha -> Beta 🎉

### v1.0.0a <small>(January 24, 2012)</small>

* Initial version

