Here at Money Alive, we set out to create a platform that was both scalable and resilient by design. We wanted to build something that could grow with increasing traffic and scale seamlessly without affecting the end client.
We decided to go with AWS (Amazon Web Services), which hosts some of the biggest websites in the world. AWS clients include, but are not limited to, Netflix, Twitch, LinkedIn, Airbnb and Apple, and the list goes on, as AWS holds around 33% of the cloud market. We decided to join them because we felt AWS provides great services at a great price, with scalability and redundancy built into everything it offers.
Because of the infrastructure we have built on AWS, we have maintained an average uptime of 99.8%, regularly hitting 100% uptime in a given week, with an average CPU usage of 2.2% across our infrastructure and an average load time of 428 milliseconds.
Here is a general overview of how our infrastructure works and how it helps us maintain scalability and redundancy.
We use MariaDB hosted on AWS via their RDS service, with our database stored in the London region, and we take full advantage of RDS's redundancy features. Our database has Multi-AZ enabled, meaning it lives in a given “Availability Zone” (AZ) but has a standby copy in a different AZ. If for any reason our active AZ has technical issues and becomes unavailable, the standby instance kicks in and takes over from the original until the issue is resolved. This changeover is automatic and happens in minutes. No changes are required on our end, because we point to a single endpoint that automatically sends traffic to the correct database instance, whether that is the primary or the failover.
Our databases also have storage autoscaling, so if our database grows and approaches its storage limit, the allocated storage automatically scales up to accommodate the increase in data.
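As an illustration, here is a minimal AWS CDK sketch (TypeScript) of a Multi-AZ MariaDB instance with storage autoscaling. The stack name, engine version and storage sizes are hypothetical, not our actual configuration:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as rds from 'aws-cdk-lib/aws-rds';

export class PlatformStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Hypothetical VPC spanning three AZs in eu-west-2 (London).
    const vpc = new ec2.Vpc(this, 'PlatformVpc', { maxAzs: 3 });

    new rds.DatabaseInstance(this, 'PlatformDb', {
      engine: rds.DatabaseInstanceEngine.mariaDb({
        version: rds.MariaDbEngineVersion.VER_10_6, // illustrative version
      }),
      vpc,
      multiAz: true,            // standby copy in a second AZ; failover is automatic
      allocatedStorage: 100,    // initial storage in GiB (illustrative)
      maxAllocatedStorage: 500, // RDS grows storage automatically up to this ceiling
    });
  }
}
```

The application only ever sees the instance's single DNS endpoint; during a failover, RDS repoints that endpoint at the standby, which is why no changes are needed on our side.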
Our databases are backed up nightly and can be restored to within a 15-minute window. These backups are stored in multiple locations throughout our infrastructure, with multiple redundancies in case any of those locations should fail for any reason.
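For a flavour of what a restore looks like, here is a hedged sketch using the AWS SDK for JavaScript v3: RDS point-in-time recovery creates a new instance from the automated backups. The instance identifiers and timestamp are made up for illustration:

```typescript
import {
  RDSClient,
  RestoreDBInstanceToPointInTimeCommand,
} from '@aws-sdk/client-rds';

const client = new RDSClient({ region: 'eu-west-2' });

// Restore a fresh instance from automated backups to a specific moment.
// Both instance identifiers and the timestamp are hypothetical.
await client.send(
  new RestoreDBInstanceToPointInTimeCommand({
    SourceDBInstanceIdentifier: 'platform-db',
    TargetDBInstanceIdentifier: 'platform-db-restored',
    RestoreTime: new Date('2021-06-01T02:15:00Z'),
  })
);
```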
Our server infrastructure automatically scales to accommodate increases in demand and has redundancy built in to account for any technical issues that may occur.
Our web servers are protected by Cloudflare, so when a user visits our platform, their request first passes through Cloudflare. This blocks malicious requests from reaching our servers, including attacks such as DDoS.
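A common companion to this kind of setup (a sketch of one way to do it, not necessarily our exact configuration) is to only allow Cloudflare's published IP ranges to reach the origin, so requests cannot bypass Cloudflare's filtering. This continues inside the same CDK stack as the earlier sketch; the two ranges shown are examples only, and the full current list lives at cloudflare.com/ips:

```typescript
import * as ec2 from 'aws-cdk-lib/aws-ec2';

// Example ranges only; see cloudflare.com/ips for Cloudflare's full current list.
const cloudflareRanges = ['173.245.48.0/20', '103.21.244.0/22'];

// `vpc` and `this` come from the stack constructor in the earlier sketch.
const albSg = new ec2.SecurityGroup(this, 'AlbSg', {
  vpc,
  allowAllOutbound: true,
});

// Only accept traffic from Cloudflare's edge. Plain HTTP keeps the sketch
// minimal; production would use 443 with TLS.
for (const range of cloudflareRanges) {
  albSg.addIngressRule(ec2.Peer.ipv4(range), ec2.Port.tcp(80), 'Cloudflare edge');
}
```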
Our servers sit within an autoscaling group, and there are always at least three servers available to handle traffic. If requests start to increase and the servers need a little help, a new server is automatically provisioned into the autoscaling group. It takes a couple of minutes to enter the group and start taking traffic and load off the other servers.
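In CDK terms, and continuing in the same stack, that autoscaling group might be sketched like this; the instance type, AMI and CPU threshold are illustrative assumptions:

```typescript
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';

// `vpc` comes from the earlier sketch.
const asg = new autoscaling.AutoScalingGroup(this, 'WebAsg', {
  vpc,
  instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM),
  machineImage: ec2.MachineImage.latestAmazonLinux2(),
  minCapacity: 3,  // never fewer than three servers handling traffic
  maxCapacity: 10, // illustrative ceiling
});

// When average CPU rises above the target, a new server is provisioned
// automatically; when load drops, the group scales back in.
asg.scaleOnCpuUtilization('CpuScaling', {
  targetUtilizationPercent: 60, // illustrative threshold
});
```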
A load balancer sits between Cloudflare and our servers. It sends requests to the backend servers, distributing the load evenly across all of them so that no single server handles too much traffic. When a new server is added to the autoscaling group, the load balancer is aware of it and starts sending traffic to the new server once it has been fully provisioned.
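Wiring the load balancer to the autoscaling group might look like the sketch below (same stack, same caveats). Registering the group as a target is what makes new servers start receiving traffic automatically once they pass health checks:

```typescript
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';

// The load balancer fronts the autoscaling group defined above.
const alb = new elbv2.ApplicationLoadBalancer(this, 'WebAlb', {
  vpc,
  internetFacing: true,
  securityGroup: albSg, // the Cloudflare-only security group from earlier
});

const listener = alb.addListener('Web', {
  port: 80,    // production would use 443 with an ACM certificate
  open: false, // keep the Cloudflare-only ingress rules intact
});

// New instances register with this target group automatically and take
// traffic once their health checks pass.
listener.addTargets('WebServers', {
  port: 80,
  targets: [asg],
  healthCheck: { path: '/health' }, // hypothetical health endpoint
});
```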
Our autoscaling group also automatically provisions servers across different Availability Zones, with always at least one server in each zone. This means that if any one or two zones fail, there will still be a server available to handle requests, and the group can autoscale by provisioning more servers in the remaining zones. These Availability Zones are in different data centres, physically separated by a meaningful distance (many kilometres) from one another, though all within 100 km (60 miles) of each other.
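This multi-AZ spread falls out of the earlier sketches: the VPC was defined with `maxAzs: 3`, so both the autoscaling group and the load balancer span three zones by default, and with a minimum of three instances EC2 Auto Scaling balances roughly one per zone. Here is a hedged AWS SDK v3 sketch for checking that spread; the group name and output are hypothetical:

```typescript
import {
  AutoScalingClient,
  DescribeAutoScalingGroupsCommand,
} from '@aws-sdk/client-auto-scaling';

const client = new AutoScalingClient({ region: 'eu-west-2' });

// Hypothetical group name; prints which AZ each instance landed in,
// confirming the one-server-per-zone spread.
const { AutoScalingGroups } = await client.send(
  new DescribeAutoScalingGroupsCommand({ AutoScalingGroupNames: ['WebAsg'] })
);

for (const instance of AutoScalingGroups?.[0]?.Instances ?? []) {
  console.log(instance.InstanceId, instance.AvailabilityZone);
}
```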