Zach Leatherman

The Sneaky Costs of Scaling Serverless

August 05, 2024 #1 Popular

Up front disclosures: I am a former Netlify employee (I left over a year ago) and I currently receive sponsored hosting services from both Vercel and Netlify.

If I’m completely frank, the 11ty Screenshots API has been a bit of a maintenance annoyance over the years: it’s a beefy bundle and a bit of a resource hog. Historically Netlify has graciously provided hosting, but I’ve been getting increasingly uneasy about having all of my eggs in one hosting basket.

So I decided to take the plunge and migrate it elsewhere, mostly to see what it would really cost. I learned a few things along the way (and made a few mistakes)—hopefully writing them up can help you save some money on your hosting bill, too.

First stop, Vercel

The 11ty Screenshots API uses headless Chromium and Puppeteer, provided via a community fork of Chromium to fit inside the various bundle file size limits of hosting providers (Netlify is 50 MB compressed, Vercel is 250 MB uncompressed).
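
For context, the function itself boils down to launching that slimmed-down Chromium from inside the serverless handler. Here's a minimal sketch of that shape, assuming the @sparticuz/chromium community build and puppeteer-core (the real service adds sizing options, output formats, caching, and error handling):

```js
// Minimal screenshot helper: a sketch, not the project's actual code.
// Assumes the @sparticuz/chromium community build and puppeteer-core.
import chromium from "@sparticuz/chromium";
import puppeteer from "puppeteer-core";

export async function screenshot(url, { width = 1200, height = 630 } = {}) {
  const browser = await puppeteer.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    headless: chromium.headless,
    defaultViewport: { width, height },
  });

  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle0", timeout: 8000 });
    return await page.screenshot({ type: "jpeg", quality: 80 });
  } finally {
    await browser.close();
  }
}
```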

The deployment to Vercel was quick and easy, without hiccups. The live-updating Domains panel in the Project Settings of the Vercel app is very impressive: it tells you exactly what records to add at your DNS provider and recognized immediately when I had configured things correctly. This was especially appreciated, as I tend to sweat when making DNS changes.

After letting it run exclusively on Vercel for a bit, I started to get an idea of the production usage cost. The Pro Tier ($20/month) includes 1000 GB-Hrs of serverless function usage, and in 12 days this service had already eaten 494.2 GB-Hrs of it! Given a 31-day billing cycle, this service alone was projected to use ~1276 GB-Hrs per month. Combined with my other serverless function usage, this would have resulted in about $160 per month extra on the Vercel bill (overage is charged at $0.18/GB-Hr).
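
For the record, the back-of-the-envelope math behind that projection looks like this (figures from my Vercel usage dashboard; the final bill also depends on how much of the included quota my other functions consume):

```js
// Projecting this service's Vercel usage from 12 days of data.
const measuredGbHrs = 494.2;              // GB-Hrs used in the first 12 days
const perDay = measuredGbHrs / 12;        // ≈ 41.2 GB-Hrs/day
const projected = perDay * 31;            // ≈ 1276.7 GB-Hrs per 31-day cycle

// The Pro tier includes 1000 GB-Hrs (shared across all functions on the
// account); anything beyond that is billed at $0.18/GB-Hr.
console.log(projected.toFixed(1));
```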

Sorry folks, I can’t afford to pay $2000 a year to run this!

Next stop, AWS Lambda

Next, I bravely stepped into the vast barren desert of Amazon Web Services. This is a desolate place where developer experience goes to die. Instead of the lovely automated processes on Netlify or Vercel, where a single project is bundled and deployed for me whenever I push a commit to GitHub, I had to do everything manually. There were so many steps. There were completely arbitrary hurdles placed in my way. But I did it. And I didn’t even stay at a Holiday Inn Express the night before.

Here’s a short summary of the steps:

  1. Create an AWS Lambda function.
  2. Create a Layer for the larger Chromium dependency (for greater efficiency).
    • To create the Layer, you have to upload it via S3 because it’s larger than 10 MB.
    • But also make sure your S3 bucket is in the same region as your Lambda because of “reasons”.
  3. Create a JavaScript serverless bundle (a sketch of a build script follows this list).
    • I used esbuild to create the single bundled JavaScript file.
    • The actual deployment artifact was a zip file (~2 MB) containing that single JavaScript file.
    • Upload and deploy the serverless bundle.
  4. Add API Gateway so that HTTP requests can reach the Lambda function.
    • Configure API Gateway so that the correct URL paths are mapped to your Lambda function.
  5. Add a caching layer via CloudFront so that repeat requests to the screenshots service (in the same region) are cached.
    • This is where you decide how unique your cache keys should be. I went as aggressive as possible: the cache key is the URL path only, discarding headers, query strings, and cookies, with a default TTL of one month and a maximum TTL of one year.
  6. Use a Vercel Rewrite to point the existing domain at the CloudFront distribution.
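
For step 3, the bundling can be scripted with esbuild's JavaScript API. This is a sketch of that setup (the entry point, Node target, and the external Chromium package are assumptions; the real project may differ):

```js
// build.mjs — sketch of bundling the Lambda handler with esbuild.
import { build } from "esbuild";

await build({
  entryPoints: ["src/handler.js"],    // hypothetical entry point
  bundle: true,
  platform: "node",
  target: "node18",
  format: "cjs",
  external: ["@sparticuz/chromium"],  // the Chromium build ships in the Layer, not the bundle
  outfile: "dist/index.js",
});

// dist/index.js then gets zipped (~2 MB) and uploaded as the function's
// deployment artifact.
```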

After watching this in production for 4 short days, I think AWS is likely its new permanent home. Even though Lambda only gives you 111.11 GB-Hrs of usage on the free tier (400,000 GB-seconds per month), my current projected usage is only 90% of that! Mind-boggling that the same service would use ~1276 GB-Hrs per month on Vercel and ~101 GB-Hrs on AWS.

Lessons Learned

AWS is great for this style of service

This service will have very few updates, will be heavily cached, and the content is not personalized. If this was something I had to update regularly, I would need to automate some of the things I did on AWS (or just pay the convenience tax on another host). Luckily, I barely ever deploy this thing and this tradeoff feels like a good one.

Even though the deployment process was downright painful, it’s nice to have a better understanding of how to configure different caching approaches (based on request params, cookies, headers, etc.) for the future.
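
As a concrete example of one of those caching approaches: the aggressive policy from step 5 above boils down to roughly the following if you express it with the AWS SDK instead of clicking through the console (a sketch; the policy name and exact values are assumptions that mirror my console setup):

```js
// Sketch of a CloudFront cache policy keyed on the URL path only.
import {
  CloudFrontClient,
  CreateCachePolicyCommand,
} from "@aws-sdk/client-cloudfront";

const cloudfront = new CloudFrontClient({ region: "us-east-1" });

await cloudfront.send(
  new CreateCachePolicyCommand({
    CachePolicyConfig: {
      Name: "screenshots-aggressive-cache",   // hypothetical name
      Comment: "Cache key is the URL path only",
      MinTTL: 0,
      DefaultTTL: 60 * 60 * 24 * 30,          // one month
      MaxTTL: 60 * 60 * 24 * 365,             // one year
      ParametersInCacheKeyAndForwardedToOrigin: {
        EnableAcceptEncodingGzip: true,
        EnableAcceptEncodingBrotli: true,
        HeadersConfig: { HeaderBehavior: "none" },
        CookiesConfig: { CookieBehavior: "none" },
        QueryStringsConfig: { QueryStringBehavior: "none" },
      },
    },
  })
);
```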

Vercel Memory Defaults

Vercel’s Pro Tier defaults to higher memory usage and higher cost (1 vCPU). This might make sense if you’re serving very request-unique, personalized content and don’t want to lean into server-side caching. Since I’m porting a heavily-cached public service, it makes sense to scale back to 0.6 vCPU (1024 MB memory) and save some money by leveraging the cache (make sure you run a deploy for this change to take effect).

Vercel claims that higher vCPU can translate to cost savings via reduced execution time, but that very broad claim seems like a stretch that I would need more data to buy into.

  • Vercel: 1024 MB (default for Hobby tier), 1769 MB (default for Pro tier), or 3009 MB memory
  • Netlify: 1024 MB (default)
  • AWS Lambda: anywhere from 128 MB (default) up to 10 GB
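
If you’d rather pin that memory setting in the repository than in the dashboard, Vercel’s vercel.json supports a per-function memory value; something like the following (the glob pattern here is an assumption about project layout):

```json
{
  "functions": {
    "api/**/*.js": {
      "memory": 1024
    }
  }
}
```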

Why was Vercel’s Usage so inflated?

Host                  Invocations (~per day)                     Usage Duration (~per day)
Netlify (14 days)     6611.9 invocations                         3.659 GB-Hrs
Vercel (12 days)      40462.0 invocations                        41.183 GB-Hrs
AWS (4 days so far)   25383.75 requests / 5960.7 invocations     4.436 GB-Hrs

Invocation counts on Vercel were about 6× higher than on either of the other providers. Confusingly, Vercel’s invocation count (uncached requests only) was 1.6× the AWS request count (cached + uncached). Huh?

The source code for the Vercel variant of this project is on a v1-vercel branch on GitHub (if anyone wants to take a look—maybe I misconfigured something).

I kept expecting the Vercel server cache to kick in and reduce the invocation count (and execution duration) over time, but it never did. I set all of my cache timeouts to the maximum (1 year), and Vercel’s usage statistics state that invocation counts do not include cache hits (related discussion on Mastodon).
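
For reference, opting a function response into that server cache happens via response headers; the sketch below shows the general shape (not the project’s literal code, and the screenshot() helper is hypothetical):

```js
// Sketch of a function response that opts into long-lived server/CDN caching.
export default async function handler(req, res) {
  const image = await screenshot(decodeURIComponent(req.query.url)); // hypothetical helper
  res.setHeader("Content-Type", "image/jpeg");
  // Cache in the shared cache for up to a year; serve stale while revalidating.
  res.setHeader(
    "Cache-Control",
    "public, s-maxage=31536000, stale-while-revalidate"
  );
  res.status(200).send(image);
}
```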

[Figure: bar graph of serverless invocations on Vercel, showing that invocation counts did not decrease over time]

Also note: the only usage stats I could still get out of Netlify (30 days of retention) were from the tail end of using it, so the server cache was about as warm as it could possibly be; the service had been running there without a deploy for over a year. Confusingly, the AWS CloudFront cache was ice cold and still had much lower usage than Vercel.

Usage-based Pricing is Sneaky

  • Vercel’s vCPU is an unnecessary abstraction that further complicates pricing. Because usage is priced in GB-Hrs, you’d expect 1 vCPU to map to 1 GB of memory. Instead, 1 vCPU comes with 1.7 GB of memory, so 1000 GB-Hrs is actually only ~588 hours of usage.
  • Vercel rounds execution time up to the nearest 50ms. Update September 2024: Vercel’s docs are currently a little confusing here—the 50 ms execution unit is only applied for Edge Functions (not Serverless Functions). Lee from Vercel has passed along that they will update the docs for better clarity here. AWS charges you by the millisecond. (I’m not sure what Netlify does here)

I put together a little spreadsheet that shows how costs grow across different serverless providers based on hours of usage at various memory configurations.

Serverless Costs (shared) Google Sheet

It’s worth noting that Netlify charges $25 per site once that site passes the 100 GB-Hrs usage mark (so don’t assume you can add up the usage for a bunch of projects and read it off this same graph).
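
The core of the spreadsheet is simple overage arithmetic, roughly like this (Vercel’s included quota and rate are taken from above; other providers plug in their own numbers, and Netlify’s per-site $25 step is its own rule):

```js
// Rough overage model behind the spreadsheet.
function gbHours(hoursOfExecution, memoryGb) {
  return hoursOfExecution * memoryGb;
}

// Vercel Pro: 1000 GB-Hrs included, then $0.18 per GB-Hr.
function vercelOverageDollars(hoursOfExecution, memoryGb) {
  return Math.max(0, gbHours(hoursOfExecution, memoryGb) - 1000) * 0.18;
}

console.log(gbHours(588, 1.7));                           // ≈ 999.6 (the included quota)
console.log(vercelOverageDollars(800, 1.7).toFixed(2));   // ≈ 64.80
```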

TL;DR

  • I’d recommend switching to 0.6 vCPU (1024 MB memory) on Vercel Functions if you’re on the Pro tier (unless you know that your project will benefit directly from more memory). Otherwise, Vercel’s pricing quickly goes to the moon after ~500 hours of usage, so make sure you watch it closely.
  • Is anyone else seeing inflated invocation count metrics on Vercel?
  • AWS is a huge pain to set up, but it’s nice to have a fallback plan that isn’t going to cost an arm and a leg.
  • Netlify’s pricing (and overage pricing) are very good!


Zach Leatherman is a builder for the web and the creator/maintainer of Eleventy (11ty), an award-winning open source site generator. At one point he became entirely too fixated on web fonts. He has given 83 talks in nine different countries at events like Beyond Tellerrand, Smashing Conference, Jamstack Conf, CSSConf, and The White House. Formerly part of CloudCannon, Netlify, Filament Group, NEJS CONF, and NebraskaJS. Learn more about Zach »


32 Comments
  1. Chris Hayes

    @zachleat Another thing with Vercel is their services out-of-the-box are often not price optimized. For example, "Speed Insights" is great, but by default they assume you want 100% of user visits. You can; however, manually reduce that. Their network monitoring tool is w… Truncated

  2. Chris Hayes

    @zachleat Agreed on the Vercel higher vCPU is not needed unless you're memory constrained. For a passive screenshot tool, time is not a problem. I've had to use the performance vCPU for applications where puppeteer was running on a user action. (https://gist.github.com/ke… Truncated

  3. James Doc

    @zachleat Thanks for the write up here. Appreciate it.

  4. Paweł Grzybek

    @zachleat this was a very interesting read Zach. At the company that I work for, we received a surprisingly high bill from Vercel and after looking closely into the usage breakdown we noticed that the number of used cache units is so so so high. Too high to be even possible taki… Truncated

  5. James Miller

    @zachleat There are so many ways to do things in AWS! If you eventually want a friendlier first-party package/deploy story that isn't the result of clicking around the console, check out SAM. Also not sure how many requests per month are hitting it, but you could eliminate AP… Truncated

  6. Zach Leatherman :11ty:

    @james oh fascinating. I’ll definitely have a look at those—I’ve only just started!

  7. Zach Leatherman :11ty:

    @chris_hayes yes! I’ve been looking into some things that help with AWS deployment too (e.g. arc.codes or serverless.com)

  8. Raymond Camden

    @zachleat query - you shared an estimated cost on Vercel, what is the estimated cost of Lambda?

  9. Zach Leatherman :11ty:

    @jamesdoc you’re welcome!

  10. Zach Leatherman :11ty:

    @pawelgrzybek yeah, more data is needed to justify some of the numbers I was seeing. They didn’t compare with my experience on Netlify or AWS

  11. Zach Leatherman :11ty:

    @raymondcamden so far the projected usage is still under the free tier for the month!

  12. Raymond Camden

    @zachleat nice! i dont think you mentioned that, i would. (its also 99% possible you did and i missed it.) btw, _damn_ good article!

  13. John Christopher

    @zachleat @james https://sst.dev is pretty nice as well. SST

  14. Zach Leatherman :11ty:

    @raymondcamden thank you! I just pushed a small note about the AWS cost

  15. Zach Leatherman :11ty:

    @jgchristopher @james very nice! I’ve also been pointed at https://www.serverless.com/ and https://arc.codes/ which seem like similar abstractions Serverless: Zero-Friction Serverless Apps On AWS Lambda & Beyond.

  16. Cory :prami_pride_demi:

@zachleat @raymondcamden I'm over here with Cloudflare Pages, Digital Ocean for my admin, Supabase, B2, bunny.net, Echofeed, Feedpress, Plausible analytics

  17. Raymond Camden

    @cory @zachleat I've liked Cloudflares serverless stuff a lot. Their hosting has been on my "to test and blog" list for a while

  18. Cory :prami_pride_demi:

    @raymondcamden @zachleat yeah, I quite like their workers stuff and I'm well below the paid threshold for those. I've got one for analytics, contact form, now playing display, schedule rebuilds and music tracking — 100,000 runs a day is super generous for the *tiny* scale… Truncated

  19. Stuart Langridge

    @zachleat “ because usage is priced via GB-Hrs, 1 vCPU should use 1 GB of memory. Instead, 1 vCPU is 1.7 GB of memory and 1000 GB-Hrs is actually 588 Hours of usage.” You know how washing powder people tell you “use one cup” but the cup they provide you has an almost invisible m… Truncated

  20. Testaroli in Production!

    @zachleat @baldur wait... why are you rendering screenshots /that/ often? that's a lot of hours. shouldn't you render images once on build (inside CICD)? or, shouldn't the lambda only invoke once (again, on build, preferably)

  21. Joe Lanman

    @zachleat very nice! Small thing - was playing around and this syntax in your documentation always gives me 404 https://v1.screenshot.11ty.dev/:url/ eg https://v1.screenshot.11ty.dev/https%3A%2F%2Fgov.uk/ it works if I add the other url params (size)

  22. Joe Lanman

    @zachleat also wondering if you considered Cloudflare, I agree AWS is painful https://developers.cloudflare.com/browser-rendering/get-started/screenshots/ Deploy a Browser Rendering Worker · Browser Rendering docs

  23. Sia Karamalegos

    @zachleat "This is a desolate place where developer experience goes to die." Lol ????

  24. Stefan Baumgartner

@zachleat Still baffled on how different the GB-Hrs count is on Vercel

  25. Stefan Baumgartner

    @zachleat Btw. what about the equivalent in EC2 instances that run all year?

Brian LeRoux

    @deadparrot @zachleat roughly $20/mo

  27. Zach Leatherman :11ty:

@brianleroux @deadparrot I haven’t used or priced EC2 so I’ll defer to Brian

  28. Zach Leatherman :11ty:

    @deadparrot Same. *Something* is wrong there.

  29. Zach Leatherman :11ty:

    @sia ????

  30. Zach Leatherman :11ty:

    @joelanman a little bit! I know they have their own fork of puppeteer that might have worked nicely here https://github.com/cloudflare/puppeteer GitHub - cloudflare/puppeteer: Puppeteer Core fork that works with Cloudflare Browser Workers

Brian LeRoux

    @zachleat @deadparrot no need these days! Maybe cheaper at high scale (though arguably not given maintenance cost way higher)

  32. Scott Jehl

    @zachleat I'm sure there's a good answer to this but is there a reason SVG isn't supported by opengraph? Seems like it'd be a nice way to point to a composed graphic and avoid these screenshot services (however genuinely cool they are)


These are webmentions via the IndieWeb and webmention.io.
