Shifting a Legacy Site to a CDN, Part 1

Submitted by Josh on

In my day job, we've been steadily moving into the cloud for a while now, for all the same reasons everyone else does: it gives us less hardware to maintain ourselves, it lets us distribute worldwide more efficiently, it saves us money in the long run, and so on. At home it's a different story, though most of the goals are similar. So now that I have some working knowledge of using Amazon Web Services to create a content delivery network (more commonly known as a CDN), I decided it was time to apply it to my largest and oldest hobby site too.

Why CDN?

The reasons in the lede cover it fairly well, but here are some more specific ones for this use case. First, the git repository that drives the development and production sites currently has all the images in it, pushing the repo well over a gigabyte. For static images that virtually never change, that's a big waste (and it causes Bitbucket to complain). Moving them to a CDN opens up the option of splitting the images into a separate repository and slimming down the actual code repo.
 
Additionally, a well-built CDN will almost certainly speed up performance in two ways. First, serving from AWS edge locations (the spots around the globe where Amazon keeps its massive server farms) gives me access to high-availability, high-bandwidth servers that can handle compression and delivery far better than I possibly could from my server's limited resources. Beyond that, the caching systems available to the CDN take much of the guesswork of caching out of the browser's control and put it at the CDN, a layer specifically designed to be smarter about delivering content to end users.
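To make that caching idea concrete, here's a minimal sketch in Python using the boto3 library: it uploads a single static image to S3 with a long-lived Cache-Control header so the CDN (and browsers) can cache it aggressively. The bucket name and file path are placeholders for illustration, not my actual setup.

    # Minimal sketch: push one static image to S3 with a long-lived
    # Cache-Control header. Bucket and key names are hypothetical.
    import boto3

    s3 = boto3.client("s3")

    with open("images/layout/header.png", "rb") as f:
        s3.put_object(
            Bucket="my-cdn-assets",          # placeholder bucket behind the CDN
            Key="images/layout/header.png",  # path the site will reference
            Body=f,
            ContentType="image/png",
            # One year, immutable: reasonable for images that virtually never change.
            CacheControl="public, max-age=31536000, immutable",
        )

With a header like that set at upload time, the CDN and the browser both know they can hold onto the file essentially forever, instead of guessing.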
 
Longer term, offloading all of this content to a new, purpose-built instance will let me take on other infrastructure projects that improve the experience of the website. Having a slimmer main repository might allow me to reduce the overall size of the current server container, or even move it into the cloud separately on a pay-for-what-you-use model of webhosting like Amazon Lightsail or Google Cloud Platform. Paying less in fixed costs would free up budget to spend specifically on performance improvements.

How CDN?

I assume everyone reading this isn't doing it just for fun, and has already pieced together that I'm using the Amazon ecosystem for my new CDN and wants to know more about how. It's what I was already used to from work, and in November of 2021, Amazon announced price reductions that give individual users more resources per month on the AWS free tier of service. By my estimates (made with Amazon's not-very-intuitive but useful pricing calculator), I could shift to CDN service for only a few dollars a month, a reasonable outlay to start unlocking opportunities to reduce overall cost.
 
Specifically, I decided to narrow my focus to what needed to move into the CDN right away. I immediately chose the site's now-defunct podcast archive, as the MP3 output for its 30+ episodes totaled 1.3GB. The podcast had been defunct for a number of years, so traffic to those files was quite low, but in aggregate they made up a large chunk of potential bandwidth that would very much benefit from being hosted offsite.

Next, I focused on the images that make up the main site layout, since those are used by nearly every page of the site and would pay off handsomely for both me and my users, in overall bandwidth and in being served from a high-availability, high-speed service. After that, I moved over the image content for each site subsection; most of those sections have hundreds to thousands of small files that are used less frequently, but get used repeatedly by individual users once they dive into a content section. For site sections that include map navigators that work like Google Maps, the count can easily approach a thousand images on a single page, so while each individual tile is quite small, they add up quickly: one section I checked while writing this post had 84MB of images just to represent the map.

I did leave a number of things unmoved for this phase of the project, primarily the non-default site layout skins and the images created by our user-generated art sections; switching those would be relatively burdensome, and the traffic they currently drive makes them slightly less vital to convert right away. The overall goal of the project was to make an immediate impact for the users while giving me a large enough sample size to confirm, after a couple of months of testing, that the pricing model would work for me, and the rest of the list suited those goals just fine.
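As a rough illustration of what a bulk move like the podcast archive looks like (the real steps come in part 2), here's a hedged Python sketch that walks a local directory of MP3s and uploads each file to S3, preserving relative paths. The directory layout, bucket name, and key prefix are all assumptions for the example, not my actual configuration.

    # Rough sketch: walk a local directory of podcast MP3s and push each one
    # to S3 under a prefix. All names here are placeholders.
    from pathlib import Path

    import boto3

    s3 = boto3.client("s3")

    BUCKET = "my-cdn-assets"          # placeholder bucket behind the CDN
    PREFIX = "podcast/"               # placeholder key prefix for the archive
    LOCAL_ROOT = Path("media/podcast")

    for mp3 in LOCAL_ROOT.rglob("*.mp3"):
        key = PREFIX + mp3.relative_to(LOCAL_ROOT).as_posix()
        s3.upload_file(
            str(mp3),
            BUCKET,
            key,
            ExtraArgs={
                "ContentType": "audio/mpeg",
                # The archive is defunct, so these files never change either.
                "CacheControl": "public, max-age=31536000, immutable",
            },
        )
        print(f"uploaded {mp3} -> s3://{BUCKET}/{key}")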
 
So, what did I need to make this move, and in what order? That's what's next, and since it gets more technical (and more than a bit long), I'm splitting it off into a second post.