FlatBuffers vs. Protobufs: How They Are Used in Java

Updated Sep 28, 2023 • 15 min read

If you create distributed applications, you’ve probably heard of FlatBuffers and Protobufs. Below, we explain what they are, their key differences, and how they stack up against JSON.

Modern applications are increasingly built as distributed systems that must communicate efficiently with one another. Unfortunately, data serialization remains a serious problem: if done improperly, it can lead to information leakage, misuse, or misinterpretation. Data consistency issues or the use of raw structs, for example, can be very costly because they require heavy data clean-up and waste precious time.

Many applications spend a significant share of their CPU time on some kind of serialization. With this in mind, it is no surprise that faster ways to serialize and deserialize structured data have become a high priority.

Protobufs and FlatBuffers present themselves as solutions to this problem. Both are open-source projects from Google that aim to provide an efficient, language-neutral serialization mechanism that improves on JSON.

This is how their libraries work:

We provide a schema using their definition syntax (.proto and .fbs files)
We generate code from the schema in our favorite language (for example, Java 😊) using their compilers (protoc and flatc)
We use the generated code and the provided libraries to serialize objects into binary format

Although there are already established formats for sending data over the wire, such as JSON, XML, and Apache Thrift, having more options for solving a particular problem is always a good idea.

The key is to understand these newer options well enough to avoid costly mistakes in the long run.

In this article, we dive into the details of FlatBuffers and Protobufs. Our goal is to dissect their toolsets and features so that we can make an informed choice for our next project.

What is a FlatBuffer file?

From official documentation, “A FlatBuffer is a binary buffer containing nested objects (structs, tables, vectors,..) organized using offsets so that the data can be traversed in-place just like any pointer-based data structure.”

FlatBuffers is a zero-copy serialization library from Google that focuses on the memory cost of serialization. It uses strict alignment rules to deliver consistent cross-platform performance with much less memory. Because data is stored in tables described by a schema, an application can read only the subset of the data it needs, without unpacking the whole buffer, which reduces memory use and access time and improves performance and user experience.
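To make the idea of offset-based, in-place access concrete, here is a minimal Java sketch that uses only java.nio. The class OffsetTable and its layout are hypothetical and far simpler than the real FlatBuffers format (which adds vtables, relative offsets, and alignment rules); the point is only that a reader jumps straight to one field through a stored offset instead of parsing everything that comes before it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical toy layout, NOT the real FlatBuffers wire format (little-endian):
 *   [0..4)      int: number of fields N
 *   [4..4+4N)   int: absolute offset of each field
 *   each field: int length, followed by that many UTF-8 bytes
 */
final class OffsetTable {
    private final ByteBuffer buf;

    OffsetTable(ByteBuffer buf) {
        this.buf = buf.order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Serializes string fields into the toy layout. */
    static ByteBuffer build(String... fields) {
        byte[][] encoded = new byte[fields.length][];
        int size = 4 + 4 * fields.length;
        for (int i = 0; i < fields.length; i++) {
            encoded[i] = fields[i].getBytes(StandardCharsets.UTF_8);
            size += 4 + encoded[i].length;
        }
        ByteBuffer out = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(fields.length);
        int offset = 4 + 4 * fields.length;   // first field starts right after the offset table
        for (byte[] e : encoded) {
            out.putInt(offset);
            offset += 4 + e.length;
        }
        for (byte[] e : encoded) {
            out.putInt(e.length).put(e);
        }
        out.flip();
        return out;
    }

    /** Reads field i directly from the buffer; no other field is decoded. */
    String field(int i) {
        int offset = buf.getInt(4 + 4 * i);   // absolute reads leave the buffer position untouched
        int len = buf.getInt(offset);
        byte[] bytes = new byte[len];
        for (int k = 0; k < len; k++) {
            bytes[k] = buf.get(offset + 4 + k);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For example, new OffsetTable(OffsetTable.build("Ada", "ada@example.com")).field(1) returns the e-mail without looking at the name. The sketch still copies the one field it returns into a String, much as FlatBuffers' Java API does when you read a string field.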

What is a FlatBuffer used for?

FlatBuffers' main focus is performance in games and mobile applications. Memory and memory bandwidth are pervasive problems for many mobile developers, and the FlatBuffers wire format was designed with these users in mind.

Because its offset-based layout of nested objects allows zero-copy reads with little or no additional allocation, FlatBuffers is also a strong candidate for servers where memory, bandwidth, and speed are the main concerns.

Google also publishes general guidelines on what to do and what to avoid. For example, FlatBuffers should not be used when the data has to change after it is built. On the other hand, this makes them well suited to read-heavy and read-only workloads.

Because FlatBuffers can access parts of the data without deserializing the entire dataset, it can be more efficient than Protobufs for such workloads.

What is a Protobuf (Protocol Buffer) file?

Let’s turn to the official documentation again and look for a definition.

“Protocol buffers provide a language-neutral, platform-neutral, extensible mechanism to decode and serialize structured data in a forward-compatible and backward-compatible way. It’s like JSON, except it’s smaller and faster, and it generates native language bindings.”

Protobufs present themselves as a language-neutral serialization mechanism that is more efficient than JSON and easy to adopt thanks to the large body of available documentation. When Protobufs were first introduced, many developers compared them extensively with XML.

What are Protobufs used for?

The documentation does not single out specific use cases, and unlike FlatBuffers, the library was not created with one particular kind of application in mind.

Today, Protobufs remain the most common data format at Google, where they are used for both inter-server communication and on-disk data storage. Real-world projects that rely on them include gRPC, Google Cloud, and Envoy Proxy.

This makes sense: Google wanted to create something better than JSON, and the end product eventually replaced the text-based data formats common in the early 2000s.

Comparing FlatBuffers and Protobufs

There are clear similarities and differences between the two serialization libraries, but we should not stop there. There are more specific features and key points to compare and contrast, such as:

Schema

We use schemas to describe the messages that will be serialized, and they are required by both FlatBuffers and Protobufs. Compared with simply dumping raw structs, a schema definition reduces the probability of errors and opens up many optimization opportunities. The two schema languages are structured quite similarly, so switching between them takes little effort.

Here is the same address book model, first as a .fbs file (FlatBuffers) and then as a .proto file (Protobufs).


namespace com.example.demoproto.model.fbs;

enum PhoneType:byte { MOBILE, HOME, WORK }

table PhoneNumber {
 number:string;
 type:PhoneType;
}

enum Position:byte { DEVELOPER, MANAGER, DIRECTOR }

table Person {
 name:string;
 email:string;
 position:Position;
 phones:[PhoneNumber];
}

table AddressBook {
 name:string;
 people:[Person];
 id:int;
}

root_type AddressBook;


syntax = "proto3";
package demoproto;

option java_multiple_files = true;
option java_package = "com.example.demoproto.model.protos";
option java_outer_classname = "AddressBookProtos";

enum PhoneType {
 MOBILE = 0;
 HOME = 1;
 WORK = 2;
}

message PhoneNumber {
 string number = 1;
 PhoneType type = 2;
}

enum Position {
 DEVELOPER = 0;
 MANAGER = 1;
 DIRECTOR = 2;
}

message Person {
 string name = 1;
 string email = 3;
 Position position = 4;
 repeated PhoneNumber phones = 5;
}

message AddressBook {
 string name = 1;
 repeated Person people = 2;
 int32 id = 3;
}
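To see why field numbers matter, here is a hand-rolled Java sketch of how two AddressBook fields from the .proto schema above are laid out on the wire. WireSketch is a hypothetical helper written for illustration only; real code should use the classes generated by protoc. Each field is written as a key, (field number << 3) | wire type, followed by its value: integers as base-128 varints, strings as a length followed by UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical, hand-rolled illustration of the Protobuf wire format; not a replacement for protoc output. */
final class WireSketch {
    static final int WIRETYPE_VARINT = 0;
    static final int WIRETYPE_LEN = 2;

    /** Unsigned base-128 varint: 7 bits per byte, high bit set means "more bytes follow". */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Every field starts with a key that combines its number and wire type. */
    static void writeKey(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    /** Encodes `string name = 1` and `int32 id = 3` of the AddressBook message. */
    static byte[] encodeAddressBook(String name, int id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        writeKey(out, 1, WIRETYPE_LEN);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        writeKey(out, 3, WIRETYPE_VARINT);
        // int -> long widening sign-extends, so a negative id takes 10 bytes, as with protobuf int32
        writeVarint(out, id);
        return out.toByteArray();
    }
}
```

Encoding a book named "Ada" with id 150 yields 8 bytes: 0x0A 0x03 'A' 'd' 'a' for the name, then 0x18 0x96 0x01 for the id. Field names never appear on the wire, only numbers. Note that proto3 normally omits fields that hold their default value (such as an empty string or 0); this sketch always writes both.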

Schema features

In both FlatBuffers and Protobufs, the schema syntax looks strikingly similar to that of the C family of languages.

Schema features shared by the two serialization libraries include arrays (vectors in FlatBuffers, repeated fields in Protobufs), enumerations, and built-in scalar and non-scalar types of various bit widths. Both formats also let you import schema definitions from other files.

Protobufs supports maps natively with the map<key, value> syntax, but FlatBuffers has no built-in map type. Instead, if the keys are enumerable, a developer can add a table with one field per key. If they are not enumerable, create a table with both a key and a value, then store a vector of those. For efficient lookups, sort this vector by key before adding it to the FlatBuffer; the key attribute in FlatBuffers schemas supports this pattern with binary-search lookups.
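Here is a minimal sketch of the sorted key/value vector approach in plain Java, with no FlatBuffers dependency. SortedEntries and Entry are hypothetical names; in real FlatBuffers code you would mark the key field with the key attribute and let the FlatBuffers tooling handle sorting and lookup.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: a "map" stored as a vector of key/value entries sorted by key. */
final class SortedEntries {
    /** One key/value pair, analogous to a small table with a key field and a value field. */
    static final class Entry {
        final String key;
        final String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private static final Comparator<Entry> BY_KEY = Comparator.comparing(e -> e.key);
    private final Entry[] entries;

    /** Copies and sorts the entries once, at build time. */
    SortedEntries(Entry... input) {
        this.entries = input.clone();
        Arrays.sort(this.entries, BY_KEY);
    }

    /** Binary search by key: O(log n) per lookup, no hash map needed. Returns null if absent. */
    String lookup(String key) {
        int i = Arrays.binarySearch(entries, new Entry(key, null), BY_KEY);
        return i >= 0 ? entries[i].value : null;
    }
}
```

Sorting once when the data is built keeps reads cheap, which matches FlatBuffers' build-once, read-many design.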

For more schema features, check out the official documentation for FlatBuffers and Protobufs.

Handling the Schema

Both interface description languages have their own rules for schema changes. Fortunately, both are quite forgiving overall, so schema evolution rarely presents a huge issue. Here are the most important rules to keep in mind for each of the two serialization systems.

FlatBuffers

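In FlatBuffers, new fields must be added at the end of a table definition. This keeps compatibility in both directions: older code reading newer data simply ignores the new field, and newer code reading older data gets the field's default value. If you want to declare fields in a different order, you can instead assign ids manually with the id attribute. Renaming fields and tables is also safe for the binary format, since names are not stored in the data, although code that uses the old names must be updated.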

Another thing to note is that you cannot delete fields that are no longer in use. Instead, mark them as deprecated and simply stop using them. Review any code that still touches those fields, because accessors for deprecated fields are no longer generated and that code will break.
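Because FlatBuffers identifies fields by position rather than by name, a small model makes the deletion rule easier to see. The sketch below is a hypothetical toy (FieldIds and Field are not FlatBuffers APIs): it assigns implicit ids in declaration order, as happens when no id attribute is given, so you can compare deleting a field of the Person table from the schema above with deprecating it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of implicit FlatBuffers field ids (declaration order). */
final class FieldIds {
    /** A declared field; a deprecated field keeps its slot but gets no id in the live mapping. */
    static final class Field {
        final String name;
        final boolean deprecated;

        Field(String name, boolean deprecated) {
            this.name = name;
            this.deprecated = deprecated;
        }
    }

    /** Maps each live field name to its implicit id; deprecated fields still occupy a position. */
    static Map<String, Integer> assignIds(List<Field> declaredFields) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (int id = 0; id < declaredFields.size(); id++) {
            Field field = declaredFields.get(id);
            if (!field.deprecated) {
                ids.put(field.name, id);
            }
        }
        return ids;
    }
}
```

With name, email, position, and phones declared in that order, position has id 2. Deleting email shifts position to id 1, so new code would read old data from the wrong slot; deprecating email keeps position at id 2.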

Protobufs

Like FlatBuffers, Protobufs allows new fields to be added freely. However, changing the field number of an existing field breaks compatibility with data written using the old number.

Unlike FlatBuffers, Protobufs lets you remove fields without repercussions. The only requirement is that the removed field numbers are never used again; declaring them as reserved lets the compiler enforce this.
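The reason adding and removing fields is safe is that a reader can skip any field it does not recognize using only the wire type in the field's key. The following hypothetical sketch (TolerantReader is not a protobuf-java class) plays the role of an "old" reader that only knows field 1, a string, and skips everything else. It also shows why field numbers must never be changed or reused: the reader identifies data purely by number.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical "old" reader that knows only field 1 (string name) and skips unknown fields. */
final class TolerantReader {
    private final byte[] data;
    private int pos;

    TolerantReader(byte[] data) {
        this.data = data;
    }

    /** Decodes one base-128 varint starting at pos. */
    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = data[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    /** Skips the value of an unknown field based on its wire type alone. */
    private void skip(int wireType) {
        switch (wireType) {
            case 0: readVarint(); break;   // VARINT
            case 1: pos += 8; break;       // I64 (fixed64, double)
            case 2: {                      // LEN: read the length first, then advance past the payload
                int len = (int) readVarint();
                pos += len;
                break;
            }
            case 5: pos += 4; break;       // I32 (fixed32, float)
            default: throw new IllegalArgumentException("unsupported wire type " + wireType);
        }
    }

    /** Returns field 1 as a string, ignoring every field number this reader does not know. */
    String readName() {
        String name = "";
        while (pos < data.length) {
            long key = readVarint();
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            if (fieldNumber == 1 && wireType == 2) {
                int len = (int) readVarint();
                name = new String(data, pos, len, StandardCharsets.UTF_8);
                pos += len;
            } else {
                skip(wireType);
            }
        }
        return name;
    }
}
```

Given bytes written by a newer schema that added field 6 (a varint) and field 7 (a string) around the name, this reader still returns the name and silently skips the rest.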

If you’re unsure which type changes are wire-compatible (for example, between int32 and int64), visit the official documentation for a detailed guide.

Performance

There are some benchmarks online comparing these serialization techniques, including one dedicated to JVM languages. However, these benchmarks should not be followed blindly.

The effectiveness of these serializers depends largely on how well we define our schema and model our data. Using a particular tool does not automatically make things faster or better, so performance should be measured on your exact use cases!
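Since the advice is to measure on your own workload, here is a deliberately simple timing helper. QuickBench is a hypothetical name; for trustworthy numbers on the JVM, a dedicated harness such as JMH (Java Microbenchmark Harness) handles warm-up, dead-code elimination, and statistics far more carefully than this sketch.

```java
import java.util.function.Supplier;

/** Hypothetical, deliberately simple timing helper; prefer JMH for real measurements. */
final class QuickBench {
    // Writing each result to a volatile field makes it harder for the JIT to optimize the work away.
    private static volatile Object blackhole;

    /** Runs the task warmupRuns times, then returns the average nanoseconds per call over measuredRuns. */
    static double averageNanos(Supplier<?> task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            blackhole = task.get();   // warm-up: give the JIT a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            blackhole = task.get();
        }
        return (double) (System.nanoTime() - start) / measuredRuns;
    }
}
```

For example, averageNanos(() -> message.toByteArray(), 10_000, 100_000) would time Protobuf serialization of a generated message object (message is a placeholder here); wrap the equivalent FlatBuffers building code the same way and compare both on your real data.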

Limitations

As with everything else, nothing is entirely perfect, so let’s discuss some of the limitations of both Protobufs and FlatBuffers.

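Protobufs are not designed for very large messages. Some implementations have historically applied a default 64MB limit when parsing, and because the whole message is parsed into memory at once, decoding very large messages can require a great deal of RAM, a resource not many businesses can afford to dedicate to one data point. In fact, the official guidance is that once messages grow beyond about a megabyte each, it may be time to consider another strategy, such as splitting the data into multiple chunks that can be parsed separately.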

This limit can be raised manually if needed (in Java, for example, with CodedInputStream.setSizeLimit). Be aware, though, that other software reading the same messages with default settings may then reject them.

Both FlatBuffers and Protobufs also have a hard limit of 2GB, because their implementations use 32-bit signed arithmetic for sizes and offsets. With plans for a 64-bit extension, this hard limit could be raised.
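Splitting data into chunks usually means splitting at the data-model level (for example, one Person per message) rather than slicing the bytes of one giant message, because every chunk must be a complete message on its own. The hypothetical ChunkFraming helper below sketches only the framing part: it writes each serialized chunk with a 4-byte length prefix so a reader can pull chunks out one at a time. protobuf-java offers writeDelimitedTo and parseDelimitedFrom for the same purpose, using a varint length prefix instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical length-prefixed framing for many small, independently parseable chunks. */
final class ChunkFraming {
    static final int MAX_CHUNK = 1 << 20;   // 1 MiB, following the rule of thumb above

    /** Writes each chunk as a 4-byte big-endian length followed by its bytes. */
    static byte[] writeChunks(List<byte[]> chunks) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (byte[] chunk : chunks) {
                if (chunk.length > MAX_CHUNK) {
                    throw new IllegalArgumentException("chunk too large: " + chunk.length);
                }
                out.writeInt(chunk.length);
                out.write(chunk);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the chunks back one at a time; each could be handed to a parser on its own. */
    static List<byte[]> readChunks(byte[] framed) {
        List<byte[]> chunks = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            while (in.available() > 0) {
                byte[] chunk = new byte[in.readInt()];
                in.readFully(chunk);
                chunks.add(chunk);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }
}
```

In a real stream you would read and parse each chunk as it arrives instead of collecting them all in a list, which is what keeps peak memory low.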

Use in Java

Both libraries support Java. Their compilers (protoc and flatc) generate Java classes from the schema definitions, and both projects document how to read and write messages in their binary formats from Java.

In practice, the main difference is that Protobufs enjoys broader ecosystem support. For example, Spring has a dedicated HTTP message converter (ProtobufHttpMessageConverter) that automatically converts the HTTP body into Java objects when it detects Protobuf content. To get the same conversion with FlatBuffers, you must write the code yourself.

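Protobufs also makes it easier to construct messages, thanks to the convenient builder methods in its generated classes. Creating the same message with FlatBuffers takes considerably more code, because objects must be built bottom-up with a FlatBufferBuilder: strings and vectors first, then the tables that reference them.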

FlatBuffers and Protobufs: Pros and Cons

To help you decide between the two services, here are the prominent advantages and disadvantages of both serialization systems.

FlatBuffers

Pros	Cons
No need to deserialize the whole dataset if only a part of it is needed	Not intended for mutable state
Perfect for performance-critical applications	Not developer friendly and requires a lot of code to build the messages

As a serialization library, FlatBuffers improves the performance of performance-critical applications because it can read only the parts of a dataset that are needed. Data can be loaded into memory as is and accessed without parsing or unpacking.

FlatBuffers is a faster and more compact alternative to other serialization techniques, and it is supported in many languages. Its downside is that the serialized data is not human-readable, so it is of little use without the schema and the generated code.

However, you should know that, at its core, FlatBuffers was designed not to be modified after creation; it is a serialization format, after all. The library also isn’t very developer friendly, as many lines of code are required to build messages. This is a deliberate trade-off by Google: the low-level builder API lets developers specify their storage format in great detail.

Protobufs

Pros	Cons
Well supported library, backed by Google	Parses the whole message before accessing it
Developer friendly and easy to construct messages	Limited message size: works best with messages of at most a few MB

Compared with FlatBuffers, Protobufs are much more developer friendly. Because the library is used to store and exchange all kinds of structured data, it has been widely adopted and has massive community support, which in turn has produced an extensive body of documentation.

The developers at Google placed a strong emphasis on simplicity and performance when they designed the library. Getting started with Protobufs is simple: downloading and installing the Protobufs compiler and a quick read through the overview and tutorial should set you on the right path.

However, Protobufs can be slower than FlatBuffers because the whole message must be parsed before any part of it can be accessed. And because it works best with relatively small messages, large payloads have to be split up, which is another problem Protobufs users might face.

That said, the two serialization mechanisms do share similarities, as seen below:

Efficiency - Both libraries provide a faster and smaller alternative to other serialization techniques, even compared with fast JSON parsers such as RapidJSON.
Quicker Time To Develop - Their compilers generate source code in the target language from the schema, which makes it easy to write and read structured data.
Cross-Language Compatibility - From C++ to Dart, from Java to Python, Protobufs and FlatBuffers both support a wide array of languages.
Strongly typed - Because the generated code is typed, Protobufs and FlatBuffers catch many errors at compile time that would otherwise only surface at run time.
Schema management - Both libraries are strict about how their schemas are set up. Take extra care to ensure that everything is in sync.
Schema dependency - The serialized data is not human readable, unlike JSON or XML, so the data is useless without proper schema management. On top of this, both libraries need a well-constructed schema to be effective.
Unreadable - FlatBuffers suffers from the same drawback as Protobufs: neither format has a human-readable representation.

FlatBuffers or Protobufs? Improving your data models

After having played with them for a while, we can say that there are specific use cases where one serializer is better than the other and vice versa. Is that enough to decide which is the best overall? As always, the answer is not that simple.

It makes sense that each format has its own set of benefits and disadvantages, and that each is better suited to some use cases than others; this is especially true of FlatBuffers. Here’s where we stand on the comparisons:

Use JSON if we want the data streams to be readable (like storing in a database for easy queries) or if there are plenty of resources. It is also easier to use and doesn’t require schema management. If you need speed in C++, RapidJSON is one of the fastest JSON parsers and worth a look.
Use Protobufs if we want to be more efficient and the message is not that big (1 MB or less).
Use FlatBuffers if we want to be more efficient with larger messages. FlatBuffers is also the better choice for read-only query messages, where reading in place saves both time and memory.
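Purely as a restatement of the rules of thumb above, here is how they could be written down in Java. FormatChooser and its parameters are hypothetical, and the 1 MB threshold comes from the Protobufs guideline in this article, not from any measurement; real decisions should still rest on benchmarks of your own workload.

```java
/** Hypothetical encoding of this article's rules of thumb; the thresholds are guidelines, not measurements. */
final class FormatChooser {
    enum Format { JSON, PROTOBUF, FLATBUFFERS }

    static final long ONE_MB = 1L << 20;

    /**
     * @param needsHumanReadable       data must be readable or queryable as text
     * @param messageBytes             typical serialized message size
     * @param readMostlyPartialAccess  data is built once and mostly read, often only in part
     */
    static Format choose(boolean needsHumanReadable, long messageBytes, boolean readMostlyPartialAccess) {
        if (needsHumanReadable) {
            return Format.JSON;
        }
        if (readMostlyPartialAccess || messageBytes > ONE_MB) {
            return Format.FLATBUFFERS;
        }
        return Format.PROTOBUF;   // efficient, developer friendly, and messages of 1 MB or less
    }
}
```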

This may be a bold claim - but one we stand by whenever we plan for our next project.

As software engineers, we are responsible for analyzing the situations. Only after we gather all possible information can we make the most educated decision - a decision like choosing the best serialization library for our projects.