How to Set Up a Free Web App Status Page and On-Call System: A Step-by-Step Guide
Several times throughout my career I’ve set up or operated systems to handle production site monitoring and alerting.
The end goal is always the same: ensure that if there is a massive issue, it gets dealt with quickly.
What often changes are the secondary requirements such as frequency of monitoring, on-call scheduling, how to notify people, etc…
I recently got an interesting challenge: build a system that costs $0 to operate.
OK, challenge accepted!
Let’s look at it, but first a couple of caveats.
Cost was the primary factor, but ease of setup came a close second, since I only had a few hours to hack something together.
I also wanted to explore open-source options, but given limited time I’m sure I missed some good options, so if any of you have some good recommendations I’m happy to hear about them.
Let’s break it down…
The System
At a minimum, a health monitor, status page, and incident management tools are needed to construct a functional on-call system.
Here are my picks that satisfy the requirements:
Health monitor/status page: Upptime
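This open-source tool ended up being a slam dunk. By making a copy of the repo I was able to leverage GitHub Actions to periodically monitor multiple URLs, which is manageable on the free plan’s 2,000 minutes per month (and Actions minutes are not metered at all for public repos on standard GitHub-hosted runners).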
The other major feature I enjoyed was being able to respond and update incident status on the site instantly by using GitHub issues. Any time an outage is detected, an issue will be created allowing the development team to communicate the current situation easily in a workflow (which they are likely to be familiar with).
Since it is deployed to GitHub Pages, hosting is also free for public repos. GitHub Pages can also be pointed to a custom domain for further customization.
There also appears to be a high ceiling for customization, allowing users to customize their page to the same degree as paid competitors.
There are some limitations, however. Scheduled GitHub Actions workflows can run at most once every 5 minutes, so if you need more frequent checks, this option is probably unsuitable.
There is also no mechanism to allow external stakeholders to subscribe to status updates like you would get from products like Atlassian’s status page.
As long as you follow the Upptime documentation closely, setup should be fairly straightforward.
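To put the Actions minutes in perspective, here is a rough back-of-the-envelope estimate in Python. It assumes each uptime run is billed as one minute (GitHub rounds each job up to a whole minute, but your real run time may differ) and it only counts the uptime check itself. The function name and defaults are mine, not Upptime's. GitHub only meters these minutes for private repos, so treat this as a sanity check in case the repo ever stops being public.

```python
import math


def monthly_action_minutes(interval_minutes: int,
                           minutes_per_run: float = 1.0,
                           days: int = 30) -> int:
    """Estimate billed Actions minutes per month for one scheduled workflow."""
    if interval_minutes < 5:
        # Scheduled workflows cannot run more often than every 5 minutes.
        raise ValueError("interval must be at least 5 minutes")
    runs = (days * 24 * 60) // interval_minutes
    # Assumption: each run is a single job, rounded up to a whole minute.
    return runs * math.ceil(minutes_per_run)


for interval in (5, 10, 30):
    print(f"every {interval} min -> {monthly_action_minutes(interval)} minutes/month")
# every 5 min -> 8640 minutes/month
# every 10 min -> 4320 minutes/month
# every 30 min -> 1440 minutes/month
```

Under these assumptions a private repo would blow through 2,000 minutes at a 10-minute cadence, which is one more reason to keep the repo public.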
Incident management tool: Opsgenie (free tier)
This was a tougher call.
At the outset, I was hoping that the free plan for Atlassian’s status page would connect easily with the free plan of Atlassian’s Opsgenie, but surprisingly it didn’t work the way I needed.
I think Atlassian needs to have a chat internally, because why do Opsgenie and Status Page each have a webhook feature for the other, but only once you upgrade?
They probably would have had me as a long-term customer for both products if hooking the two up had been easier while it was free, but now I’m not going to.
Gripes aside, it was an easy setup with Opsgenie so partial marks to Atlassian. Free plus easy is why I stuck with this tool.
The Glue: Zapier (free plan)
The biggest challenge was figuring out how to get Upptime to communicate with Opsgenie without needing a feature behind a paywall.
After wrestling with various options and methods (email, webhook, Slack, etc…) I figured out that Zapier could do what I wanted: listen to Upptime via webhook, then pass the alert along to Opsgenie via an API call.
I picked Zapier because their free tier provided sufficient resources for the on-call system. It was also easy to get set up.
I’m limited to 100 uses per month on the Zap I set up on the free plan, but if I’m triggering over 100 incidents in a month I’ve got bigger problems!
The Step-by-step guide
Upptime:
1. Create a new repo from Upptime
In this step ensure the “Include all branches” option is selected (this is required!).
Secondly, note if the repo owner is an individual or an organization (this will have minor implications later).
Additionally, make sure you set it to a Public repository or else this will cease being a free endeavour (GitHub pages are not a free feature for private repos).
Finally, click “Create Repository” — it will take a minute to copy everything into your new repo.
Enable publishing for your new repo. Follow the steps on this link. If you have ever used GitHub Pages, the process is the same.
2. Add Repository Secrets
Following the steps outlined in Upptime’s docs tripped me up a bit, so I’ll add some additional info here that I hope helps.
If you set yourself as the owner in Step 1, following their documentation as written should work. If that’s what you did, skip down to the next step.
If you set an organization as the owner of the repo in Step 1, make sure that the organization has the “Allow access via fine-grained personal access tokens” setting enabled; otherwise, the GitHub Actions won’t work. To do that, follow these steps.
Once that is enabled, create a new fine-grained personal access token, making sure that the Resource owner option is set to the organization rather than your personal GitHub account.
Now do as the Upptime documentation says.
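If the Actions keep failing and you suspect the token, a quick way to rule out a rejected token is to call GitHub's REST API with it. This is a minimal sketch, not something from the Upptime docs: the helper name, owner, repo and token values are placeholders of mine. A 401 response means GitHub rejected the token outright. A 200 on a public repo only shows the token is valid, not that it has the write permissions Upptime needs.

```python
import urllib.request


def build_repo_check_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the repo using the token."""
    url = f"https://api.github.com/repos/{owner}/{repo}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own owner, repo and token.
    req = build_repo_check_request("my-organization", "my-upptime-repo", "github_pat_EXAMPLE")
    print(req.get_method(), req.full_url)
    # To actually send it:
    #   with urllib.request.urlopen(req) as resp:
    #       print(resp.status)
```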
Next, update the configuration file (.upptimerc.yml), which you will find in the root of the repository. For this example it is the only file we will edit, but deeper customizations will involve additional files.
The Upptime documentation does a decent job describing all of the customizations available for this file, but for simplicity here is mine (minus my personal information):
owner: my-organization # Your GitHub organization or username, where this repository lives
repo: my-upptime-repo # The name of your repository copy
sites:
  - name: some App
    url: https://app.example.com
  - name: some Site
    url: https://example.com
status-website:
  # Add your custom domain name, or remove the `cname` line if you don’t have a domain
  # Uncomment the `baseUrl` line if you don’t have a custom domain and add your repo name there
  # cname: demo.upptime.js.org
  baseUrl: /my-upptime-repo
  logoUrl: https://app.example.com/path/to/logo.svg
  faviconSvg: https://example.com/path/to/favicon.png
  name: My site’s Status page
  introTitle: My site is awesome! So I care that it’s always working
  introMessage: This is a status page which uses **real-time** data to ensure high availability
  navbar:
    - title: Status
      href: /my-upptime-repo
    # - title: GitHub
    #   href: https://github.com/$OWNER/$REPO
workflowSchedule:
  uptime: "*/10 * * * *" # sets to run uptime check every 10 minutes
  # uptime: "*/5 * * * *" # sets to run uptime check every 5 minutes
Some highlights on this file to mention:
First, under the “sites” key notice the multiple URLs listed. You can list a variety of sites here and they will all be checked in the same periodic GitHub Actions job (it won’t use double/triple/etc. the build minutes, which is nice).
Secondly, the workflowSchedule key can override the 5-minute default cadence. I wanted to include it for anyone unsure where to add it. It follows the same syntax as cron on Linux systems. As a side note, 5 minutes is the shortest cadence that GitHub allows.
Again, there are further customizations you can add, but this is enough to get started.
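If you want to sanity-check a schedule before committing it, a minimal Python sketch like this works. It only understands the simple "*/N * * * *" form used in my config, and both function names are hypothetical helpers of mine, not part of Upptime or GitHub.

```python
def uptime_interval_minutes(cron: str) -> int:
    """Return N for a schedule of the form '*/N * * * *'."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    minute, rest = fields[0], fields[1:]
    if not minute.startswith("*/") or rest != ["*"] * 4:
        raise ValueError("this sketch only handles '*/N * * * *' schedules")
    return int(minute[2:])


def is_allowed_by_github(cron: str) -> bool:
    """GitHub will not run scheduled workflows more often than every 5 minutes."""
    return uptime_interval_minutes(cron) >= 5


print(uptime_interval_minutes("*/10 * * * *"))  # 10
print(is_allowed_by_github("*/5 * * * *"))      # True
print(is_allowed_by_github("*/2 * * * *"))      # False
```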
Opsgenie:
1. Get an Opsgenie account
Ensure you have the free plan active. In their onboarding flow, you should be asked to add your contact info and to confirm that a test alert works and reaches you appropriately.
2. Create a team and add yourself to that team. If you have other team members to add, feel free to do it now, but this can be done at any time.
3. In the team, go to the integrations settings and add an API integration. Note the given API key (also called the GenieKey); we will make use of this shortly.
4. Feel free to jump to the subscription page and explicitly opt into their free plan. This helps prevent accidental use of premium features that’ll break once the free trial expires (and prevents tears later).
Zapier (plus some futzing with the other tools):
1. Create a Zapier account. Similar to Opsgenie, feel free to explicitly opt for their free plan (let's not mince words, we’re just here for the free stuff!)
2. Create a new Zap. This Zap will only require 2 steps.
Part 1 of Zap: create a catch hook. We’ll grab the created webhook and return to our GitHub Upptime repo.
We’ll need to add this webhook as a repository secret. Specifically these two: (more about additional notification settings here)
Part 2 of Zap: triggering the Opsgenie API. Here we’ll want to take the alert input from the webhook step and forward it into the API call. We’ll also call the specific Opsgenie API and add the GenieKey to our API call as a header. Once set, here is what the API call should look like:
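For anyone who prefers to see the request spelled out, here is a minimal Python sketch of the equivalent HTTP call. It assumes Opsgenie's Alert API "create alert" endpoint on the default (non-EU) instance and the GenieKey authorization header. The function name, the P1 priority and the message text are my own placeholders, so check the field names against Opsgenie's Alert API docs before relying on them. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Assumption: default Opsgenie instance; EU accounts use a different host.
OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_opsgenie_alert(genie_key: str, message: str,
                         description: str = "") -> urllib.request.Request:
    """Build (but do not send) a create-alert request for Opsgenie."""
    payload = {
        "message": message[:130],    # Opsgenie caps the message at 130 characters
        "description": description,
        "priority": "P1",            # placeholder; pick what suits your team
    }
    return urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"GenieKey {genie_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_opsgenie_alert("YOUR-GENIE-KEY", "some Site is down")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

In the Zap itself, the same three pieces apply: the URL, the Authorization header with the GenieKey, and a JSON body whose message comes from the webhook step.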
3. Full test:
Now that everything is set up, let’s trigger an outage alert. The easiest way to do this is to put an incorrect URL into the configuration YAML in the Upptime repo:
owner: my-organization # Your GitHub organization or username, where this repository lives
repo: my-upptime-repo # The name of your repository copy
sites:
  - name: some App
    url: https://app.example.com
  - name: some Site
    url: https://<SOME OBVIOUSLY NON-WORKING URL HERE>.com
status-website:
  # Add your custom domain name, or remove the `cname` line if you don’t have a domain
  # Uncomment the `baseUrl` line if you don’t have a custom domain and add your repo name there
  # cname: demo.upptime.js.org
  baseUrl: /my-upptime-repo
  logoUrl: https://app.example.com/path/to/logo.svg
  faviconSvg: https://example.com/path/to/favicon.png
  name: My site’s Status page
  introTitle: My site is awesome! So I care that it’s always working
  introMessage: This is a status page which uses **real-time** data to ensure high availability
  navbar:
    - title: Status
      href: /my-upptime-repo
    # - title: GitHub
    #   href: https://github.com/$OWNER/$REPO
workflowSchedule:
  uptime: "*/10 * * * *" # sets to run uptime check every 10 minutes
  # uptime: "*/5 * * * *" # sets to run uptime check every 5 minutes
Doing this should quickly trigger the flow and you should receive a notification on your phone. If that all works, congratulations, you’ve set up a simple alerting system that costs nothing to operate!
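If you are unsure what to use as the deliberately broken URL, one option is a hostname under the reserved .invalid top-level domain, which is set aside so that it never resolves. The small Python sketch below checks that a candidate hostname really fails DNS lookup. The hostname is a made-up example of mine, and some networks rewrite failed lookups, so treat the result as a hint rather than a guarantee.

```python
import socket

# Hypothetical test hostname; any name under .invalid should behave the same.
TEST_HOST = "upptime-outage-test.invalid"


def resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to an address, False otherwise."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when lookup fails.
        return False


status = "resolves" if resolves(TEST_HOST) else "does not resolve"
print(f"https://{TEST_HOST} {status}")
```

Remember to put the real URL back once the alert has reached your phone.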
Overall this is a “good enough” solution, though I can see it running into scaling issues.
If you need something free and “good enough”, this should work for a startup until your team has the time and budget to implement something more robust.
I expect this should only take about an hour to set up unless you’re experimenting, but let me know what you discover or if you find better options.
I hope you found this guide handy! If you found this valuable, please follow the blog, where I’ll continue to post on tech topics and more!