How I built this site

I've written articles previously about how I built MyFeed, but I've never spoken about how this site is built.

jasongorman.uk is my sandbox, a place for me to write down my thoughts and test out ideas. I host it in a range of places, mainly because having a backup is good, but also to answer the question "I wonder how hard it would be to host here?"

How??

This "primary" version of the site is statically generated at build time, and hosted via AWS S3 proxying through Cloudflare.

Statically generated

Static site generation (SSG) is an old technique: your source files are a combination of Markdown files, templates and data files, and a build step translates them into HTML files and assets arranged into a folder structure representing your website.

Jekyll (written in Ruby) made the technique popular, and in recent years static site generators have been written in all sorts of languages, such as Eleventy (JavaScript) and Hugo (Go). Regardless of the language the SSG is written in, they all produce the same kind of output: HTML files and assets arranged into a folder structure representing your website.

What I like about the technique is that it fits a read-heavy site or blog really well: put the effort in up front at build time, and the output is easy to work with at runtime.

For this site I chose Eleventy (11ty); I found it the most flexible of the three. It also performs well enough that build times shouldn't balloon out of control as the site grows.
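As an illustration, a minimal Eleventy setup is little more than a config file plus a folder of Markdown and templates. This is just a sketch; the folder names are assumptions rather than how this site is actually laid out:

// .eleventy.js — a minimal config sketch; folder names are illustrative
module.exports = function (eleventyConfig) {
    // copy static assets (e.g. a css folder) straight through to the output
    eleventyConfig.addPassthroughCopy("css");

    return {
        dir: {
            output: "_site", // the generated HTML and assets end up here
        },
    };
};

Running npx @11ty/eleventy then pairs each Markdown file with its layout and data files and writes the finished HTML into _site, ready to upload anywhere that can serve static files.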

So there's a bunch of HTML files and assets; how do you get them online?

Hosting

As mentioned, I use AWS S3 as the "hosting provider", but what, why and how?

S3 is described as "Object storage built to retrieve any amount of data from anywhere"; essentially it's a giant key-value database managed by AWS with 99.99% availability. So I'm pretty confident that my website isn't going to throw 500 errors, get hacked via some plugin, be knocked offline by a DDoS, or have its server hijacked to send out spam.

But how does a key-value store serve a website? That's an interesting one and it fits the concept of the web really well.

Think about a URL: what does it represent? A webpage, or a resource. And what's the value behind that URL? A string of HTML, or CSS, or a binary blob for an image. So the idea of a URL (key) mapping to a resource (value) is exactly how the web works.

Because of this, we can use an S3 bucket (the table) to represent all the URLs that make up this website.
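Purely as an illustration of that mapping (Cloudflare and S3's website endpoint handle this for real; nothing like this runs for the site), here's what looking up a URL's value in the bucket could look like with the AWS SDK for JavaScript:

// lookup.mjs — illustrative sketch only; assumes AWS credentials are configured
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-west-1" });

async function resolve(urlPath) {
    // "/" and "/some-page/" style URLs map to an index.html object key
    const key = (urlPath.endsWith("/") ? urlPath + "index.html" : urlPath)
        .replace(/^\//, "");

    const { Body, ContentType } = await s3.send(
        new GetObjectCommand({ Bucket: "jasongorman.uk", Key: key })
    );
    return { contentType: ContentType, body: await Body.transformToString() };
}

// resolve("/") returns the HTML stored under the "index.html" key
console.log((await resolve("/")).contentType);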

What about web servers like NGINX and Apache? How does this translate to them?

It's very similar: rather than a key-value database in the sense of Redis, Memcached or S3, the web server maps the URL path onto the key-value store it already has available, the file system.
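The same idea in miniature, with the file system as the key-value store (the web root path here is made up):

// Sketch: how a web server treats the file system as its key-value store
import { readFile } from "node:fs/promises";
import path from "node:path";

const webRoot = "/var/www/jasongorman.uk"; // hypothetical document root

async function resolveFromDisk(urlPath) {
    // real servers also normalise the path and reject ".." traversal
    const key = urlPath.endsWith("/") ? urlPath + "index.html" : urlPath;
    return readFile(path.join(webRoot, key)); // key -> value (file contents)
}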

How to set this up?

First, create a bucket named after the host name of the website (e.g. jasongorman.uk).

Next, upload the output of the SSG to that bucket.
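A rough sketch of that upload with the AWS SDK for JavaScript v3 (the output folder name and content-type list are assumptions; in practice the AWS CLI's "aws s3 sync" command does the same job in one line):

// upload.mjs — sketch: push the SSG output folder into the bucket
import { readdirSync, readFileSync } from "node:fs";
import path from "node:path";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-west-1" });
const bucket = "jasongorman.uk";
const outputDir = "_site"; // Eleventy's default output folder

// a tiny content-type lookup; a library like mime-types covers far more
const types = {
    ".html": "text/html", ".css": "text/css", ".js": "text/javascript",
    ".png": "image/png", ".jpg": "image/jpeg", ".xml": "application/xml",
};

// recursively list every file under a directory
function walk(dir) {
    return readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
        const full = path.join(dir, entry.name);
        return entry.isDirectory() ? walk(full) : [full];
    });
}

for (const file of walk(outputDir)) {
    // the object key is the path relative to the output folder, e.g. "about/index.html"
    const key = path.relative(outputDir, file).split(path.sep).join("/");
    await s3.send(new PutObjectCommand({
        Bucket: bucket,
        Key: key,
        Body: readFileSync(file),
        ContentType: types[path.extname(file)] || "application/octet-stream",
    }));
    console.log("uploaded", key);
}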

In the bucket properties, enable "Static website hosting" mode and adjust the read access to public.
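Both of those can be set in the console; here's a sketch of the same settings with the SDK (the 404 page name is an assumption about what the SSG emits):

// configure.mjs — sketch: turn on website hosting and allow a public bucket policy
import {
    S3Client,
    PutBucketWebsiteCommand,
    PutPublicAccessBlockCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "eu-west-1" });
const bucket = "jasongorman.uk";

// serve index.html for directory-style URLs, and a 404 page for missing keys
await s3.send(new PutBucketWebsiteCommand({
    Bucket: bucket,
    WebsiteConfiguration: {
        IndexDocument: { Suffix: "index.html" },
        ErrorDocument: { Key: "404.html" }, // assumes the build emits a 404.html
    },
}));

// relax the "block public access" defaults just enough for the bucket policy
// in the next step to take effect; ACL-based public access stays blocked
await s3.send(new PutPublicAccessBlockCommand({
    Bucket: bucket,
    PublicAccessBlockConfiguration: {
        BlockPublicAcls: true,
        IgnorePublicAcls: true,
        BlockPublicPolicy: false,
        RestrictPublicBuckets: false,
    },
}));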

Then adjust the "Bucket policy" to only allow Cloudflare's IP address ranges. You could skip this step, but it stops bots bypassing Cloudflare and hitting the S3 bucket directly. 10 million requests costs around $4, so you can imagine how a bot attack could start racking up a bill if it's happening non-stop.

It should look something like the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CloudflareOnlyAccess"
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::jasongorman.uk/*",
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": [
                        "173.245.48.0/20",
                        "103.21.244.0/22",
                        "103.22.200.0/22",
                        "103.31.4.0/22",
                        "141.101.64.0/18",
                        "108.162.192.0/18",
                        "190.93.240.0/20",
                        "188.114.96.0/20",
                        "197.234.240.0/22",
                        "198.41.128.0/17",
                        "162.158.0.0/15",
                        "104.16.0.0/13",
                        "104.24.0.0/14",
                        "172.64.0.0/13",
                        "131.0.72.0/22"
                    ]
                }
            }
        }
    ]
}

The list of IP addresses can be found here: https://www.cloudflare.com/ips/
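Those ranges change occasionally, and Cloudflare also publishes them as plain-text lists linked from that page. As a sketch (the exact endpoint URLs are an assumption; check the page above if they move), you could regenerate the array rather than hand-copying it:

// cloudflare-ips.mjs — print the current ranges ready to paste into the
// bucket policy's aws:SourceIp array
const urls = [
    "https://www.cloudflare.com/ips-v4", // assumed plain-text list endpoints
    "https://www.cloudflare.com/ips-v6",
];

const ranges = [];
for (const url of urls) {
    const body = await (await fetch(url)).text(); // Node 18+ has a global fetch
    ranges.push(...body.trim().split("\n"));
}

console.log(JSON.stringify(ranges, null, 4));

The policy above only lists the IPv4 ranges; aws:SourceIp accepts IPv6 CIDRs too if you want to include them.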

Next, take the bucket's static website endpoint and add it as a proxied CNAME record in Cloudflare's DNS settings.

jasongorman.uk CNAME jasongorman.uk.s3-website-eu-west-1.amazonaws.com

That's it!

Requests to the site resolve to Cloudflare, which acts as a reverse proxy: it makes GET requests to the S3 bucket and returns the results.

Updating the site involves:

  1. Changing the source files
  2. Running a build
  3. Uploading the output to the S3 bucket
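Steps 2 and 3 are just a couple of commands, so they're easy to wire together. As a sketch, a scripts section in package.json could do it (assuming @11ty/eleventy is installed as a dev dependency, and with upload.mjs being a hypothetical upload script like the one earlier):

{
    "scripts": {
        "build": "eleventy",
        "deploy": "npm run build && node upload.mjs"
    }
}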

This can be automated via pipelines such as GitHub Actions, so as soon as a change lands on the main branch, the pipeline triggers a build and pushes the output to S3.

One thing to keep an eye out for is caching rules, ensuring that Cloudflare caches files well, but also regularly checks with S3 for the latest version. I've made some handy notes here about the rules that should be applied.

Hopefully this has been insightful; I plan to expand on how I host this site in other places soon.