Most web applications are fundamentally similar. They take input from users, then process and store it, and usually end up showing the processed input back to users as output. Such applications are commonly called CRUD applications (create, read, update and delete). This means that the infrastructure (the core components that work together) needed to support a typical product can stay almost the same for the majority of startups, whatever their kind. In this article, I show what those core components are and how to design the infrastructure so that it is efficient, scalable (it grows along with the product) and cost-effective.
The Core Components
Here are the elements that most web applications need.
Web Server
This is where the code is executed. There are many options to choose from, but going with one of the big cloud providers, such as Microsoft (Azure Virtual Machines) or Amazon (EC2), is typically a safe choice. They are well supported and easy to use, and when your product grows, it is easy to migrate to a faster but more expensive option from the same provider.
Application Dependency Manager
The dependencies of the application (such as the background processing worker, cache database and HTTP server) have to be managed. You can either do it manually (which is error-prone and hard to scale or upgrade) or use containers (usually Docker containers), which is a bit harder at the beginning but more cost-effective in the long run. When using containers, you can think of dependencies as black boxes with clearly defined rules of cooperation.
Runtime Environment
It may be obvious, but it is still worth remembering that the code itself needs a programming language interpreter or virtual machine. The environment your code needs (like Ruby or Node.js) is a major factor when choosing the web server: some technologies, like Ruby, are fairly platform-agnostic (you can run them almost everywhere), while others, like .NET, usually work better on a particular platform.
Database
The data has to be stored somewhere. There are two major decisions to make here.
Should the database be on the same machine as the application?
I’d argue that it shouldn’t. The optimal machine configuration differs between a web server and a database, but the main reason is that it is hard to beat the quality of managed services designed specifically for databases, such as Amazon RDS and Azure SQL Database. They are run by some of the best experts in database administration. If you can afford them (which is very likely), use them from the start. They let you scale performance as needed and set up backups just by changing settings in the dashboard. Hosting the database on the same server as the application also makes it more prone to data loss if that server fails. Safety-wise, a database should always live on a separate machine (or service).
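Keeping the database on a separate service mostly means the application must never assume the database sits on `localhost`. A minimal sketch, assuming connection details arrive through a `DATABASE_URL` environment variable (a common convention, not something this article prescribes); `DatabaseConfig`, `load_database_config` and the host name are hypothetical. Moving from a self-hosted database to a managed one such as Amazon RDS then only changes configuration, not code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class DatabaseConfig:
    host: str
    port: int
    name: str
    user: str
    password: str


def load_database_config(env: Optional[Mapping[str, str]] = None) -> DatabaseConfig:
    """Read connection settings from DATABASE_URL (an assumed convention).

    Expected shape: postgres://user:password@host:port/dbname
    """
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    parts = urlsplit(url)
    return DatabaseConfig(
        host=parts.hostname or "",
        port=parts.port or 5432,  # PostgreSQL's default port if none is given
        name=parts.path.lstrip("/"),
        user=parts.username or "",
        password=parts.password or "",
    )


# Hypothetical endpoint: only this string changes when the database moves.
config = load_database_config(
    {"DATABASE_URL": "postgres://app:secret@db.example.internal:5432/shop"}
)
print(config.host, config.port, config.name)
```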
Should you use a relational or non-relational database?
To avoid starting a flame war, I’d say that the data in most applications can be organised in one or more tables (or "relations") of columns and rows. Accordingly, choosing a relational database as the default is usually a safe choice. Choosing a non-relational database should be backed by research into whether it suits the particular use case.
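To illustrate how typical CRUD data fits into related tables, here is a sketch using Python's built-in `sqlite3` module as a stand-in for a production relational database such as PostgreSQL or Azure SQL. The `users`/`orders` schema is hypothetical; it walks through all four CRUD operations and reads the two tables together with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the example
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL REFERENCES users(id),
        total_cents INTEGER NOT NULL
    );
    """
)

# Create
conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (1, "ana@example.com"))
conn.executemany(
    "INSERT INTO orders (user_id, total_cents) VALUES (?, ?)",
    [(1, 1999), (1, 500)],
)
# Update
conn.execute("UPDATE orders SET total_cents = ? WHERE total_cents = ?", (2500, 500))
conn.commit()

# Read: the relation between the two tables is expressed with a JOIN
rows = conn.execute(
    """
    SELECT u.email, COUNT(o.id), SUM(o.total_cents)
    FROM users u JOIN orders o ON o.user_id = u.id
    GROUP BY u.id
    """
).fetchall()
print(rows)  # [('ana@example.com', 2, 4499)]

# Delete
conn.execute("DELETE FROM orders WHERE user_id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```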
Background Processing Worker
This is where all calculations that take more than a second to complete are processed. Think of it this way: without a background processing worker, every calculation the application performs leaves someone staring at a spinning circle in their browser, waiting for the page to load while the result is computed. You don’t want that for calculations that take more than a second.
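A minimal in-process sketch of the idea, using Python's standard `queue` and `threading` modules. In production the queue would usually live in a separate service (for example Redis with a job library) and the worker would run as its own process; `handle_request` and `generate_report` are hypothetical names. The request handler only enqueues the job and returns at once, so nobody waits on a spinner.

```python
import queue
import threading
import time

jobs = queue.Queue()  # pending work; None is used as a shutdown signal
results = {}          # job id -> result, written by the worker


def generate_report(job_id):
    """Hypothetical slow task standing in for a multi-second calculation."""
    time.sleep(0.1)
    return f"report-{job_id}"


def worker():
    # Runs in the background and processes jobs one by one.
    while True:
        job_id = jobs.get()
        if job_id is None:  # shutdown signal
            jobs.task_done()
            break
        results[job_id] = generate_report(job_id)
        jobs.task_done()


def handle_request(job_id):
    """Web handler: enqueue the slow work and respond immediately."""
    jobs.put(job_id)
    return {"status": "accepted", "job_id": job_id}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_request(42)  # returns without waiting for the report
print(response)

jobs.join()  # demo only: wait until the worker has finished the job
print(results[42])
jobs.put(None)
thread.join()
```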
Should the background processing worker run on a different machine from the application?
Probably yes, but you can start by running it on the same machine, because migrating it later (when needed) will be easy if you’re using Docker.
In-Memory Database
Databases are fast nowadays, but data that doesn’t have to be stored persistently is better kept in memory. Random Access Memory (RAM) can be read from or written to in nanoseconds, while hard drive access times are measured in milliseconds. Besides not being persistent, RAM is also more expensive than hard disk storage, which is why only some of the data is kept in memory. Whenever data doesn’t need to be persistent, storing it in memory greatly improves the speed of the application.
You can host an in-memory database like Redis on the server where you have your application. When your application outgrows this solution, it is very easy to switch to external services like AWS ElastiCache.
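Redis itself can't be demonstrated without a running server, so here is a small stand-in in plain Python: an in-memory key-value store whose entries expire after a time-to-live, used in the cache-aside pattern (check memory first, fall back to the slow source, then store the result). `TTLCache` and `load_user_profile` are hypothetical; in Redis the same expiry behaviour comes from the `SET` command's `EX` option.

```python
import time


class TTLCache:
    """Tiny in-memory key-value store with per-entry expiry (a Redis stand-in)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at)
        self._clock = clock  # injectable so expiry can be tested

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value


db_calls = 0


def load_user_profile(cache, user_id):
    """Cache-aside: try memory first, fall back to the (slow) database."""
    global db_calls
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        db_calls += 1  # pretend this is an expensive database query
        profile = {"id": user_id, "name": f"user{user_id}"}
        cache.set(key, profile, ttl_seconds=60)
    return profile


now = [0.0]
cache = TTLCache(clock=lambda: now[0])
load_user_profile(cache, 7)
load_user_profile(cache, 7)  # served from memory
print(db_calls)              # 1
now[0] = 61.0                # jump past the 60-second TTL
load_user_profile(cache, 7)  # expired, so the database is queried again
print(db_calls)              # 2
```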
Email Delivery Service
Email delivery is hard, and outsourcing the problem is the most cost-effective solution. It’s not expensive, and the time saved can be spent on more valuable things. Not only will your emails get delivered, you will also get a lot of useful statistics and reduce the risk of your address or domain being blacklisted for spam. Use an external service such as SendGrid or Mailgun to send emails.
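Both SendGrid and Mailgun accept mail over standard SMTP as well as through their own HTTP APIs, so one provider-neutral way to hand email off is Python's built-in `email` and `smtplib` modules. A sketch, assuming SMTP relay credentials from your provider: the host name and credentials in the commented-out call are placeholders, and `build_welcome_email`/`send_via_relay` are hypothetical helper names. Port 587 with STARTTLS is the usual submission setup, but check your provider's documentation.

```python
import smtplib
from email.message import EmailMessage


def build_welcome_email(sender, recipient, name):
    """Compose a plain-text message; the provider only relays it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Welcome aboard"
    msg.set_content(f"Hi {name},\n\nThanks for signing up!\n")
    return msg


def send_via_relay(msg, host, username, password, port=587):
    """Send through an external provider's SMTP relay (placeholder settings)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.starttls()  # upgrade the connection to TLS before logging in
        smtp.login(username, password)
        smtp.send_message(msg)


msg = build_welcome_email("hello@example.com", "ana@example.com", "Ana")
print(msg["Subject"])
# send_via_relay(msg, "smtp.your-provider.example", "YOUR_USERNAME", "YOUR_API_KEY")
```

Swapping SendGrid for Mailgun then only changes the relay host and credentials.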
Static File Storage
If you need to store a lot of static files, a dedicated storage service is a must. It will be cheaper for you and faster for your users. Let your web server handle the business logic and the storage service handle delivery of static files. Solutions like Amazon S3 or Azure Blob Storage also keep redundant copies of your files. They are inexpensive and designed for extremely high durability: Amazon quotes 99.999999999% ("eleven nines") durability for S3.
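To put a durability figure like S3's quoted 99.999999999% into perspective, a quick calculation with exact fractions: the expected fraction of objects lost per year is 1 minus the durability. The object count is an arbitrary example.

```python
from fractions import Fraction

durability = Fraction(99_999_999_999, 100_000_000_000)  # 99.999999999% (11 nines)
annual_loss_rate = 1 - durability                      # expected fraction lost per year
objects_stored = 10_000_000                            # example: ten million files
expected_losses_per_year = objects_stored * annual_loss_rate
years_per_lost_object = 1 / expected_losses_per_year

print(annual_loss_rate, expected_losses_per_year, years_per_lost_object)
# prints: 1/100000000000 1/10000 10000
```

In other words, storing ten million objects, you could expect on average to lose a single object about once every 10,000 years. Note that durability is not availability: it says how unlikely data loss is, not how often the service is reachable.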
Content Delivery Network
Most startups need one or two web servers to handle all their traffic. But even if the web servers are fast, the data still has to travel over the wire to the users. That’s why it is important to have a CDN, which (almost automatically, without much configuration) serves most of the content to each user from a location close to them. A CDN will also help your product survive attacks such as DDoS. The most common solutions include Cloudflare and Amazon CloudFront.
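Why geography matters can be estimated with back-of-the-envelope arithmetic: light in optical fibre travels at roughly 200,000 km/s (about two thirds of its speed in a vacuum), so distance alone sets a floor on round-trip time before any processing happens. The two distances below are hypothetical, and real cable routes are longer than straight lines, so actual latency is higher.

```python
FIBRE_SPEED_KM_PER_S = 200_000  # approx. speed of light in optical fibre


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time imposed by distance alone, in milliseconds."""
    return 2 * distance_km * 1000 / FIBRE_SPEED_KM_PER_S


origin_km = 10_000  # hypothetical: user on another continent from the web server
edge_km = 300       # hypothetical: the same user to a nearby CDN edge location

print(min_round_trip_ms(origin_km))  # 100.0
print(min_round_trip_ms(edge_km))    # 3.0
```

Loading a page usually takes several round trips (connection and TLS setup, then the requests themselves), so serving cacheable content from a nearby edge saves that difference on each of them.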
Logging
This is a very wide topic. In the old days, there was a log file (or files) stored on the server and accessible only by connecting to that server. Nowadays, there are so many logging tools that you can (and should) log pretty much everything. When you have several machines, it is worth aggregating their logs in a centralised tool such as Papertrail, Logmatic or the ELK Stack, so you can browse through them with ease.
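Centralised tools are much easier to search when every log line is structured. A minimal sketch using Python's standard `logging` module that writes one JSON object per line and tags each record with the host name, so entries from different machines can be told apart once aggregated. The field names are an assumption, and shipping the lines to Papertrail, Logmatic or an ELK Stack is left to the tool's own agent.

```python
import io
import json
import logging
import socket


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "host": socket.gethostname(),  # identifies the machine in aggregated logs
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


stream = io.StringIO()  # stand-in for stdout or a file picked up by a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("shop.orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("order %s created", 1234)

entry = json.loads(stream.getvalue().splitlines()[0])
print(entry["level"], entry["message"])  # INFO order 1234 created
```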
Error tracking software
The first priority is knowing whether the users of your web application run into any problems. A service like Rollbar can be easily integrated into the application to catch errors and notify you whenever they occur. Adding an application performance monitoring tool like New Relic will help you find bottlenecks within your application.
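Error-tracking SDKs typically hook into the application so that unhandled exceptions are captured with some context and forwarded to the service. The sketch below shows that mechanism with a hypothetical `report_error` function standing in for the SDK call; it deliberately does not use Rollbar's actual API, whose setup is described in its own documentation.

```python
import functools
import traceback

reported = []  # stand-in for the error-tracking service's inbox


def report_error(exc, context):
    """Hypothetical reporter; a real SDK would send this to the service."""
    reported.append({
        "type": type(exc).__name__,
        "message": str(exc),
        "context": context,
        "traceback": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    })


def track_errors(handler):
    """Decorator: report any exception raised by a request handler, then re-raise."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception as exc:
            report_error(exc, {"handler": handler.__name__, "args": args})
            raise  # keep normal error handling (e.g. an HTTP 500 response)
    return wrapper


@track_errors
def checkout(cart_total):
    if cart_total < 0:
        raise ValueError("cart total cannot be negative")
    return "ok"


checkout(10)
try:
    checkout(-5)
except ValueError:
    pass

print(len(reported), reported[0]["type"])  # 1 ValueError
```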
It’s crucial to record not only your application’s logs but also your server’s logs. Ideally, the service that aggregates the logs can also analyse the data automatically and notify you whenever unexpected events take place.
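Hosted tools do this analysis for you, but the underlying idea can be as simple as comparing the latest count of error events against a recent baseline. A sketch with a hypothetical `should_alert` helper and arbitrary thresholds: the latest minute is flagged when it exceeds three times the recent average and a minimum absolute count. Real services use more sophisticated models.

```python
from statistics import mean


def should_alert(errors_per_minute, window=10, factor=3.0, min_errors=5):
    """Flag the latest minute if its error count is far above the recent average.

    errors_per_minute: counts aggregated from the logs, oldest first.
    The thresholds are arbitrary example values.
    """
    if len(errors_per_minute) < 2:
        return False  # not enough history to establish a baseline
    *history, latest = errors_per_minute
    baseline = mean(history[-window:])
    return latest >= min_errors and latest > factor * baseline


print(should_alert([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2]))   # False: normal noise
print(should_alert([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 25]))  # True: sudden spike
```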
Having a good logging infrastructure will not only notify you about a potential or existing problem, but also considerably decrease the time needed to fix it.
Connecting All Components
Having so many components may seem complicated at first, but in reality an experienced person can set it all up within two days (or even a couple of hours). Using many external tools isn’t necessarily more expensive than trying to do everything yourself.
Here’s a draft of an infrastructure that uses the services we think are best for most scenarios. You may need additional components, or you may not need some of the ones I have included. This draft will suit most products because it is not only cost-effective but can also scale as your product grows. And if your product is not a typical CRUD application or has special needs, it is an excellent starting point for adjusting the infrastructure to the exact requirements of your product.
You can swap particular providers for others, add more web servers (in which case you’ll also need a load balancer and potentially an autoscaling mechanism) or add another database. You know your infrastructure is well planned if you can easily change one part of it without changing the whole concept.
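Load balancers are normally a managed service (for example AWS Elastic Load Balancing) rather than code you write, but the basic distribution strategy is simple. A sketch of round-robin selection that skips servers failing their health check; `RoundRobinBalancer` and the server addresses are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backend servers in turn, skipping ones marked unhealthy."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)
        self.healthy = set(self._servers)  # updated by periodic health checks

    def next_server(self):
        # Try each server at most once per call so we never loop forever.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["10.0.0.11:8080", "10.0.0.12:8080", "10.0.0.13:8080"])
print([lb.next_server() for _ in range(4)])
# ['10.0.0.11:8080', '10.0.0.12:8080', '10.0.0.13:8080', '10.0.0.11:8080']

lb.healthy.discard("10.0.0.12:8080")  # simulate a failed health check
print([lb.next_server() for _ in range(2)])
# ['10.0.0.13:8080', '10.0.0.11:8080']
```

Adding a web server then only means adding its address to the list: one part changes without changing the whole concept.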
We use the infrastructure I described (or a very similar one) in most of our applications, and it has proved to be an excellent framework to start with and expand as needed. If you have any questions about this infrastructure or want us to create one for your product, don’t hesitate to give us a shout at firstname.lastname@example.org.