2020-06-19 01:40:13 -04:00
|
|
|
# deHTTPS-Proxy - Access HTTPS Websites from Retro Computers
|
|
|
|
|
|
|
|
## Purpose
|
|
|
|
|
|
|
|
Today's web is based around Transport Layer Security (TLS) protocols, namely
|
|
|
|
HTTPS. In the last couple of years many sites that were previously available
|
|
|
|
in plaintext have been made HTTPS-only. This renders them inaccessable to
|
|
|
|
retro computers such as the Apple II, which are not capabable of handling the
|
|
|
|
cryptography.
|
|
|
|
|
|
|
|
I had the idea of building an HTTP to HTTPS gateway, which allows requests to
|
|
|
|
be submitted in plaintext HTTP and forwards them in HTTPS, acting as a kind
|
|
|
|
of proxy.
|
|
|
|
|
|
|
|
This particular implementation `dehttps-proxy.py` is a quick and dirty hack
|
|
|
|
using Python 3. The error handling is rather rough-and-ready but it does
|
|
|
|
work fairly well in practice.
|
|
|
|
|
|
|
|
## Configuration
|
|
|
|
|
|
|
|
Edit `dehttps-proxy.py` and set the `PORT` to the desired value. This will
|
|
|
|
usually be port 80, the normal HTTP port. You may also want to add any
|
|
|
|
top level domains you use to the list `tlds = [ ... ]`. This currently just
|
|
|
|
has the common ones (`.com`, `.net`, `.gov` etc.)
|
|
|
|
|
2020-06-19 01:43:28 -04:00
|
|
|
If you are using port 80 you will have to run the script as root:
|
2020-06-19 01:40:13 -04:00
|
|
|
|
|
|
|
```
|
|
|
|
sudo ./dehttps-proxy.py
|
|
|
|
```
|
|
|
|
|
|
|
|
## Using the Proxy
|
|
|
|
|
|
|
|
I used the Contiki `WEBBROWS.SYSTEM` program on the Apple II for testing,
|
|
|
|
but you can also use a modern browser such as Chromium or Firefox.
|
|
|
|
|
|
|
|
Suppose you want to browse `https://www.example.com/path/to/page`. Enter
|
|
|
|
the following as the URL in your browser:
|
|
|
|
`http://pi/www.example.com/path/to/page`, where `pi` is the hostname of
|
|
|
|
your Raspberry Pi (or whatever system you are using to host the proxy.)
|
|
|
|
|
|
|
|
Your request will go to port 80 (HTTP) on the server `pi` where it will be
|
|
|
|
handled by `dehttps-proxy.py`. The Python script adds `https:/` to the path
|
2020-06-19 01:43:28 -04:00
|
|
|
that was passed in (`/www.example.com/path/to/page` in this instance) and
|
2020-06-19 01:40:13 -04:00
|
|
|
performs the HTTPS request. The data that is obtained is returned to the
|
|
|
|
original HTTP requester in plaintext.
|
|
|
|
|
|
|
|
## Trick to Make Links Work (within a site)
|
|
|
|
|
|
|
|
Many links on webpages are internal to the site. These links are specified
|
|
|
|
using a relative rather than a fully qualified URL. `dehttps-proxy.py` uses
|
|
|
|
a trick to make these internal links work. When a fully qualified URL like
|
|
|
|
`www.example.com/path/to/page` is requested, the code keeps track of the
|
|
|
|
website domain `www.example.com`. The `https://www.example.com/` prefix is
|
|
|
|
added to Subsequent relative links with URLs such as `path/to/another_page`.
|
|
|
|
|
|
|
|
This trick does not work for links to other websites. If you click on a
|
|
|
|
link `https:://foo.com` then you will have to edit it to read
|
|
|
|
`http://pi/foo.com` and resubmit it in the browser.
|
|
|
|
|
2020-06-19 01:43:28 -04:00
|
|
|
## Error Handling Sucks
|
|
|
|
|
|
|
|
Error handling consists of catch the error and hope for the best. This
|
|
|
|
could be improved.
|
|
|
|
|