While we want error messages to be valuable enough to help our customers navigate our system, being too specific may expose that system to threats. Finding a balance between readable server messages and protection against attacks is key.
Error messages have always been a part of programming. For example, common beginner mistakes include dividing by zero or unknowingly using a null value. There is no progress in learning without making mistakes.
But what if our systems encounter a problem? Should developers make it clear to users that something went wrong, or hide it without any information? What should users know to avoid repeating the same error pattern? How much error data can developers share without handing critical information to hackers?
Modern development has yet to develop concrete guidelines and principles, so in this article, we’ll share what we’ve learned in over a decade of operation and present the basic do’s and don’ts of error messages.
Don’t mention what applications / stack your system is using
Sometimes, a developer may think that returning database details would help the user understand why something went wrong. For example, an error stating that a user cannot be assigned to a project because it’s already in relation with another database row may seem helpful, but it exposes internals that can cause problems down the road.
It’s easy to implement this kind of error message because, most of the time, we can copy the message returned from the database and shuffle its content to make it sound “smart” without copying the whole database stack trace. But we need to remember that off-the-shelf solutions can have known vulnerabilities, and error messages like this give potential hackers “hints” about which ones to try.
Mentioning the applications and stack we use may give hackers too much information and make attacks easier to craft. A project using SQL may be vulnerable to SQL injection. If hackers find out we use Postgres, it’s even easier for them to pick a query designed specifically for it. The same applies to other tools like Airflow, Redis, CircleCI and so on.
Don’t forward server errors without any mapping
Even if our system fails, we should never return error messages coming straight from our application engine or tools. First, as already mentioned, it is dangerous to let a hacker know what technology we are using. Second, these errors don’t provide anything readable or essential for the end user.
If something uncontrolled happens, we should return status code 500 and inform the end user that the error occurred because of an application issue and has already been reported to the development team. In that case, we should avoid giving any specific explanation. Users need to know that they didn’t cause the error and that we know it happened, so they don’t need to contact the support team about it.
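As a minimal sketch of this rule, the wrapper below catches anything a handler did not anticipate, writes the raw details to an internal log, and returns only a generic 500 response. The names `handle_request`, `broken_handler`, and `GENERIC_MESSAGE` are hypothetical and chosen for illustration; a real web framework would offer its own hook for this.

```python
import logging

logger = logging.getLogger("api")

# The only text an end user sees when something unexpected breaks.
GENERIC_MESSAGE = (
    "Something went wrong on our side. "
    "The problem has been reported to our development team."
)


def handle_request(handler, *args):
    """Run a handler and hide any unexpected failure from the client."""
    try:
        return 200, handler(*args)
    except Exception:
        # The stack trace and driver message stay in the internal log.
        name = getattr(handler, "__name__", "handler")
        logger.exception("Unhandled error in %s", name)
        return 500, {"message": GENERIC_MESSAGE}


def broken_handler():
    # Simulates a raw database driver error (illustrative text only).
    raise RuntimeError('ForeignKeyViolation: insert on table "assignments"')


status, body = handle_request(broken_handler)
```

Here `status` is 500 and `body` carries only the generic message; the driver text never leaves the server.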
Don’t send error codes that don’t match error messages
HTTP defines many different status codes that a REST API can return. The basic split is between the 4xx range for user faults and the 5xx range for server faults, which already gives some basic knowledge to the frontend team or the end user.
We can say a lot about the system itself via status codes:
- 404 - the given value does not exist.
- 409 - we cannot add the value because it conflicts with existing data, for example a value that must be unique.
- 403 - we don’t have the proper role to access the resource.
A typical issue with poorly designed / managed apps is that no matter what happens inside the application, a 400 status code is always returned. This status code means BAD REQUEST, and it always points strictly to a fault on the user's side.
Whether the application hits an error, an unhandled bug occurs, or inserting a given value would create a conflict in the database, returning status code 400 says that it’s always the user's fault, without specifying what exactly was broken.
Additionally, when error messages are vague or missing altogether, our system becomes a “black box” in which users don’t feel safe. It also makes other team members work harder: the frontend team will be blamed for displaying vague errors on the website, and testing the system properly will become almost impossible for the QA team.
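One way to avoid the "always 400" trap is to give each kind of failure its own exception type and translate it to a status code in a single place. The sketch below assumes four hypothetical exception classes; anything that is not explicitly mapped is treated as a server fault rather than blamed on the user.

```python
class ValidationError(Exception):
    """The request itself is malformed."""


class ForbiddenError(Exception):
    """The caller lacks the required role."""


class NotFoundError(Exception):
    """The requested value does not exist."""


class ConflictError(Exception):
    """The value clashes with existing data, e.g. a unique field."""


# Single source of truth for error-to-status translation.
STATUS_BY_ERROR = {
    ValidationError: 400,
    ForbiddenError: 403,
    NotFoundError: 404,
    ConflictError: 409,
}


def status_for(error: Exception) -> int:
    """Return the HTTP status for an error; unknown errors are 5xx."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(error, error_type):
            return status
    return 500
```

With this in place, `status_for(ConflictError("email taken"))` yields 409, while an unexpected `KeyError` yields 500 instead of a misleading 400.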
Do: Make errors readable for non-technical users
It's common for support teams to get 'bombarded' with the same questions about incomprehensible errors returned by the system. Remember that most people prefer to avoid writing emails and jumping through hoops to understand an application. Indecipherable error codes harm a product's perceived convenience, leading to unsatisfied clients and a higher churn rate.
That said, users tend to search for answers by themselves before asking the support team. While answers may sometimes appear in the FAQ section of an application, this is not always the case, especially for new errors.
Using the "user project assignment" error as an example, one could applaud the error message for being detailed; it makes the program sound intelligent and technical. But how could someone without a Computer Science background know what a relation in a database is? Why should they have to know?
The same error message could be written more clearly and provide the same benefit.
Instead of "user cannot be assigned to a project because it's already in relation with another database row", we could say "user has already been assigned to another project" and provide a URL that shows precisely which project the user belongs to.
Though this solution will require constant testing and writing error mappers, it will help avoid creating repeated work for the support team.
Every possible error that follows from a specific business requirement should be mapped by our system and made readable. If we know that some values in our database must be unique, we should tell our end users that they cannot pick a value that already exists.
We shouldn't just say, "you cannot insert these values" or "we could not insert provided data" because users will try the same approach a couple of times and then contact the support team. Users deserve to understand how to use our system correctly.
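An error mapper for the project assignment case could look roughly like the sketch below. The exception class `UserAlreadyAssigned`, the `details_url` field, and the `/projects/<id>` path are assumptions made for illustration, not part of any particular framework.

```python
class UserAlreadyAssigned(Exception):
    """Hypothetical business error raised by the assignment logic."""

    def __init__(self, user_id: int, project_id: int):
        super().__init__(f"user {user_id} is linked to project {project_id}")
        self.user_id = user_id
        self.project_id = project_id


def to_user_message(error: Exception) -> dict:
    """Translate an internal error into wording an end user can act on."""
    if isinstance(error, UserAlreadyAssigned):
        return {
            "message": "This user has already been assigned to another project.",
            # Assumed URL scheme pointing at the project the user belongs to.
            "details_url": f"/projects/{error.project_id}",
        }
    # Unmapped errors never expose their internal text.
    return {"message": "We could not complete your request."}


response = to_user_message(UserAlreadyAssigned(user_id=7, project_id=42))
```

The response tells the user what happened and where to look, with no mention of rows or relations. Each new business rule would add its own branch (or registry entry) to the mapper.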
Do: Keep response times similar for different error causes
Knowing which programs and applications people use is a powerful tool for hackers. Phishing mail becomes more convincing if it pretends to come from a company the recipient actually uses. If a user is subscribed to a specific cell phone company, it is more probable that they will open a fake bill that “comes” from there.
Besides retrieving user information through data leaks, scammers may obtain it in a lesser-known way: application response times.
Most services take the same approach to user login. First, the application searches the database for the user. If the user doesn't exist, the app returns a message that the email or password is invalid.
We deliberately avoid saying that this person is not in the database; instead, we say that something provided (a password or email) is not valid. However, if the person does exist in the database, we also need to check that the password that was sent matches the hashed value stored in our database.
This extra step takes milliseconds longer to execute. Though an end user won't typically feel this time difference, a hacker measuring response times in milliseconds is able to spot the discrepancy and learn which emails are registered. How can we handle that? By making sure that endpoints with sensitive data have the same response time no matter which error occurred.
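One common way to narrow this timing gap is to run the password check even when the user is unknown, against a dummy hash. The sketch below uses an in-memory `USERS` dictionary, PBKDF2 from Python's standard library, and made-up credentials; all of these are assumptions for illustration. It reduces the difference between the two paths but does not by itself guarantee identical response times.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Hypothetical user store: email -> (salt, password hash).
_salt = os.urandom(16)
USERS = {"ann@example.com": (_salt, hash_password("correct horse", _salt))}

# Dummy credentials so unknown emails cost the same hashing work.
DUMMY_SALT = os.urandom(16)
DUMMY_HASH = hash_password("dummy-password", DUMMY_SALT)


def login(email: str, password: str) -> bool:
    """Return True only for a known email with the matching password."""
    salt, stored = USERS.get(email, (DUMMY_SALT, DUMMY_HASH))
    candidate = hash_password(password, salt)  # always executed
    # compare_digest avoids leaking how many bytes matched.
    matches = hmac.compare_digest(candidate, stored)
    return matches and email in USERS
```

Both an unknown email and a wrong password go through one hash computation and one comparison, and the caller can answer both with the same "email or password is invalid" message.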
Do: Create and send error IDs to provide technical info to QA and other developers
Having quick, immediate lines of communication between QA and the developer team is key to building a responsible system. Custom error codes can be long random number sequences that don't mean anything to people on the outside. At the same time, internal team members can match each code to a Confluence page and work together to solve problems efficiently.
For example, users will only see a 500 status code, a note that the error code was XXX, and confirmation that the error was reported to the support team. Testers, on the other hand, can use that code to learn that the database was down during the request and that they need to check its logs.
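A rough sketch of this idea: each internal failure type maps to an opaque code, and a separate internal catalog (which could live on a wiki page) explains what the code means. The codes, the catalog text, and the function name `public_error` are all made up for illustration.

```python
import logging

logger = logging.getLogger("api.errors")

# Internal-only catalog: what each opaque code means for QA and developers.
INTERNAL_CATALOG = {
    "48151623": "Database unavailable during the request; check its logs.",
    "10029384": "Unclassified internal error; check the application logs.",
}

# Which internal exception produces which opaque code.
CODE_BY_ERROR = {ConnectionError: "48151623"}
FALLBACK_CODE = "10029384"


def public_error(error: Exception) -> dict:
    """Build the response the user sees; details go to the log only."""
    code = next(
        (c for t, c in CODE_BY_ERROR.items() if isinstance(error, t)),
        FALLBACK_CODE,
    )
    logger.error("error_code=%s detail=%r", code, error)
    return {
        "status": 500,
        "error_code": code,
        "message": "An internal error occurred and was reported to our team.",
    }


response = public_error(ConnectionError("database refused connection"))
```

The user can quote `response["error_code"]` to support, while a tester looks the same code up in `INTERNAL_CATALOG` and knows to inspect the database logs.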
Do: Unify error messages between different microservices
Users don’t have to know what architecture we are using. From the outside, they should see our project as a whole, meaning every error message returned should look like the others. Communication coming from different services should follow the same rules.
From a security perspective, knowing the system architecture gives hackers an idea of what tools they can use to get to our data. What’s more, inconsistent formatting in client communication can give the impression that the company behind the product is unprofessional and poorly managed.
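One possible approach is a shared error envelope that every service's errors are converted into before they reach the client, for instance at an API gateway. The two raw formats below belong to hypothetical "user" and "billing" services and are invented purely to show the normalization step.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ErrorResponse:
    """The single error shape exposed to clients."""
    status: int
    code: str
    message: str


def from_user_service(raw: dict) -> ErrorResponse:
    # Assumed raw shape: {"httpStatus": ..., "errorKey": ..., "text": ...}
    return ErrorResponse(raw["httpStatus"], raw["errorKey"], raw["text"])


def from_billing_service(raw: dict) -> ErrorResponse:
    # Assumed raw shape: {"error": {"status": ..., "type": ..., "detail": ...}}
    err = raw["error"]
    return ErrorResponse(err["status"], err["type"], err["detail"])


user_error = asdict(from_user_service(
    {"httpStatus": 404, "errorKey": "USER_NOT_FOUND", "text": "User not found."}
))
billing_error = asdict(from_billing_service(
    {"error": {"status": 409, "type": "INVOICE_EXISTS",
               "detail": "This invoice already exists."}}
))
```

Both dictionaries end up with the same three keys, so the client cannot tell from the format which service produced the error.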
Building valuable server messages and secure systems is a constant balancing act
Even though projects are judged by front-end designs and usage speed, we should never forget about user experience when it comes to unhappy paths. Users should feel that we have maximum control over the project - that the customer journey flows well and without issue.
Readability is a big part of client-developer communication, and vague descriptions will only break user trust. Users don’t need to know the technical aspects of our app; they just need to know whether they are responsible for the error and, if so, how to avoid it next time.