Data Validation with Pydantic

Karol Szuster

Updated Feb 21, 2024 • 14 min read

Some of the most common issues developers face stem from unforeseen actions or incorrect data: software scenarios that haven’t been considered, unusual user behavior, wrong data types, or database communication errors.

To make your software as fail-safe as possible, it’s good to have control over system processes, the data, and their types, and to ensure incorrect data doesn’t disrupt operations. One way to do that is by validating the types of the variables your software works with, which is where Pydantic comes into the equation.

Pydantic is a Python tool that’s primarily a parsing library, as opposed to a validation library. According to the documentation, it “guarantees the types and constraints of the output model, not the input data. Although validation isn’t the main purpose of Pydantic, you can use this library for custom validation.”

What is Pydantic?

Pydantic is a Python package that provides you with two main functionalities:

  • data validation
  • settings management

We’ll discuss the data validation functionality in the article below.

The Pydantic documentation states: “Pydantic enforces type hints at runtime, and provides user-friendly errors when data is invalid.”

So, in layman’s terms, Pydantic is a set of tools for controlling the format and type of input and output data. It uses Python type hints, so there’s no need to learn a domain-specific language.

Why should you use Pydantic?

Pydantic is handy for two main reasons. Firstly, your code becomes more readable. When someone’s working on the code (or coming back to it after a long break), Pydantic models clearly show the structure and type of data expected or required.

Secondly, data passed to functions is validated, saving you from undesirable actions caused by wrong data types. Sometimes, you can’t be sure what kind of data is passed to your program, so it's better to protect yourself.

Sounds like a Python dataclass?

Yes and no. Pydantic is similar because it helps you determine the type of data processed. With both dataclass and Pydantic, you define the type of expected data with type hints, and it looks like this:

from dataclasses import dataclass
from pydantic import BaseModel

@dataclass
class Bird:
   name: str
   wingspan: int
  
class PydanticBird(BaseModel):
   name: str
   wingspan: int

>>> bird_data = {"name": "Alcedo", "wingspan": 25}
>>> alcedo = Bird(**bird_data)
>>> pydantic_alcedo = PydanticBird(**bird_data)
>>> alcedo
Bird(name='Alcedo', wingspan=25)
>>> pydantic_alcedo
PydanticBird(name='Alcedo', wingspan=25)

Both seem to work the same. But what if you pass the wrong data type?


>>> bird_data = {"name": "Alcedo", "wingspan": "blue"}
>>> alcedo = Bird(**bird_data)
>>> pydantic_alcedo = PydanticBird(**bird_data)


Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for PydanticBird
wingspan
  value is not a valid integer (type=type_error.integer)

As you can see in the traceback, Pydantic doesn’t allow you to create class instances with the wrong data type, but a dataclass does: the Bird instance was created without any complaint. As such, Pydantic is a useful tool for protecting software from undesirable behavior.
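
To see why the dataclass call above raised nothing, here’s a small sketch that uses only the standard library. CheckedBird and its __post_init__ check are hypothetical names added for illustration; they show the kind of manual checking you’d have to write yourself to get a similar guarantee without Pydantic.

```python
from dataclasses import dataclass


@dataclass
class Bird:
    name: str
    wingspan: int  # a type hint only; dataclasses don't enforce it


@dataclass
class CheckedBird:
    """Hypothetical variant that checks the type by hand."""

    name: str
    wingspan: int

    def __post_init__(self) -> None:
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(self.wingspan, bool) or not isinstance(self.wingspan, int):
            raise TypeError("wingspan must be an integer")


# The plain dataclass stores the wrong value without complaint.
unchecked = Bird(name="Alcedo", wingspan="blue")

# The hand-written check rejects the same data.
try:
    CheckedBird(name="Alcedo", wingspan="blue")
    rejected = False
except TypeError:
    rejected = True
```

Every field you want to protect needs its own check in this approach, which is the boilerplate a Pydantic model takes off your hands.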

Be careful: validation or parsing?

If you go deeper into the topic, you may come across information saying Pydantic doesn’t really validate, but parses. What’s the difference? In the previous example, Pydantic worked excellently and rejected the wrong value. Before you start coding with Pydantic, though, remember that the difference between validation and parsing is crucial.

Let’s look at the next example.

>>> bird_data = {"name": "Alcedo", "wingspan": 25.4}
>>> pydantic_alcedo = PydanticBird(**bird_data)
>>> pydantic_alcedo
PydanticBird(name='Alcedo', wingspan=25)

Didn't we want to validate the "wingspan" variable so that it always contains an integer? Looking at the output, everything’s technically correct: we have an integer. Pydantic parsed the float to an int, turning 25.4 into 25. When Pydantic gets data, it tries to parse the data to the specified type.

But what if you want to avoid such parsing and have Pydantic accept only integers? Pydantic offers Strict Types, such as:

  • StrictStr
  • StrictBytes
  • StrictInt
  • StrictFloat
  • StrictBool

from pydantic import BaseModel, StrictInt

class PydanticBird(BaseModel):
    name: str
    wingspan: StrictInt

>>> bird_data = {"name": "Alcedo", "wingspan": 25.4}
>>> pydantic_alcedo = PydanticBird(**bird_data)

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for PydanticBird
wingspan
  value is not a valid integer (type=type_error.integer)

Now, Pydantic expects exactly an integer type. You should therefore think carefully about what data you want to accept. For example, if you specify StrictFloat and at some point the software converts a floating-point number to an integer (for example, 3.0 to 3), an error is thrown, which means the validation is working as intended.
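
The contrast between a regular and a strict field can be sketched as below. LaxBird, StrictBird, and the accepts() helper are hypothetical names used only for this illustration. The fractional case (25.4) is left out on purpose, because how a non-strict int field treats it may depend on the Pydantic version you have installed.

```python
from pydantic import BaseModel, StrictInt, ValidationError


class LaxBird(BaseModel):
    name: str
    wingspan: int  # input is parsed to int where possible


class StrictBird(BaseModel):
    name: str
    wingspan: StrictInt  # only real integers are accepted


def accepts(model, **data) -> bool:
    """Return True if the model can be built from the data."""
    try:
        model(**data)
    except ValidationError:
        return False
    return True


# A numeric string is parsed by the regular field...
lax_takes_string = accepts(LaxBird, name="Alcedo", wingspan="25")
# ...but rejected by the strict one, as is a float.
strict_takes_string = accepts(StrictBird, name="Alcedo", wingspan="25")
strict_takes_float = accepts(StrictBird, name="Alcedo", wingspan=25.0)
strict_takes_int = accepts(StrictBird, name="Alcedo", wingspan=25)
```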

In addition to Python types, thanks to Pydantic you can also validate a variety of other useful data types such as:

  • IP addresses
  • Email addresses
  • Path to file
  • Path to directory
  • Color
  • JSON
  • URL
  • UUID
  • Payment card number (and more)
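
As a rough sketch of two entries from this list, the model below uses the standard-library UUID and IPv4Address classes as field types, which Pydantic knows how to parse from strings. The Sighting model and its field names are made up for this example.

```python
from ipaddress import IPv4Address
from uuid import UUID

from pydantic import BaseModel, ValidationError


class Sighting(BaseModel):
    """Hypothetical model: a bird sighting reported from some machine."""

    sighting_id: UUID
    reported_from: IPv4Address


# Strings in the right format are parsed into proper objects.
sighting = Sighting(
    sighting_id="12345678-1234-5678-1234-567812345678",
    reported_from="192.168.0.1",
)

# A string that isn't a valid IPv4 address is rejected.
try:
    Sighting(
        sighting_id="12345678-1234-5678-1234-567812345678",
        reported_from="999.1.1.1",
    )
    bad_address_rejected = False
except ValidationError:
    bad_address_rejected = True
```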

Function arguments and class attributes validation

Validating arguments passed to functions is handy and elegant. All you have to do is add Python type hints and decorate the function with validate_arguments, imported from the Pydantic library.

from pydantic import StrictFloat, validate_arguments

@validate_arguments
def check_if_alcedo_has_regular_wingspan(wingspan: StrictFloat):
    if 23 < wingspan < 27:
        return "Regular Alcedo"

>>> check_if_alcedo_has_regular_wingspan(25.0)
'Regular Alcedo'
>>> check_if_alcedo_has_regular_wingspan(25)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module> 
  File "pydantic/decorator.py", line 40, in pydantic.decorator.validate_arguments.validate.wrapper_function
  File "pydantic/decorator.py", line 133, in pydantic.decorator.ValidatedFunction.call
  File "pydantic/decorator.py", line 130, in pydantic.decorator.ValidatedFunction.init_model_instance
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CheckIfAlcedoHasRegularWingspan
wingspan
  value is not a valid float (type=type_error.float)

In addition to validating the type of variables that are passed to a function, you can also set rules for a variable, such as number ranges, the maximum and minimum number of objects in a list, the length of strings, etc. To do that, use the Field function, which you can combine with Annotated from Python’s typing module.

from typing import Annotated
from pydantic import Field, validate_arguments

@validate_arguments
def get_only_regular_alcedo(wingspan: Annotated[float, Field(gt=23, le=27)]):
    # gt=23 means strictly greater than 23; le=27 means at most 27
    return "Regular Alcedo"

>>> get_only_regular_alcedo(21)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module> 
  File "pydantic/decorator.py", line 40, in pydantic.decorator.validate_arguments.validate.wrapper_function
  File "pydantic/decorator.py", line 133, in pydantic.decorator.ValidatedFunction.call
  File "pydantic/decorator.py", line 130, in pydantic.decorator.ValidatedFunction.init_model_instance
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for GetOnlyRegularAlcedo
wingspan
  ensure this value is greater than 23 (type=value_error.number.not_gt; limit_value=23)

The Field function can also be used in a model class when defining its fields, in the following way:

class PydanticBird(BaseModel):
    name: str
    wingspan: Annotated[float, Field(gt=0, lt=35)]

Or this way:

class PydanticBird(BaseModel):
    name: str
    wingspan: float = Field(gt=0, lt=35)
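
Here’s a short sketch of how such constraints behave at their boundaries. The length limits on name and the is_valid() helper are assumptions added for the example; they’re not part of the model above.

```python
from pydantic import BaseModel, Field, ValidationError


class PydanticBird(BaseModel):
    # min_length / max_length constrain the string length (example values)
    name: str = Field(min_length=1, max_length=30)
    # gt / lt are exclusive bounds: 0 and 35 themselves are rejected
    wingspan: float = Field(gt=0, lt=35)


def is_valid(**data) -> bool:
    """Return True if the data passes all constraints."""
    try:
        PydanticBird(**data)
    except ValidationError:
        return False
    return True


inside_range = is_valid(name="Alcedo", wingspan=34.9)
on_upper_bound = is_valid(name="Alcedo", wingspan=35)
on_lower_bound = is_valid(name="Alcedo", wingspan=0)
empty_name = is_valid(name="", wingspan=25)
```

If you want a bound to be inclusive, use ge or le instead of gt or lt, as in the get_only_regular_alcedo example.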

You can catch errors by handling ValidationError, imported from the pydantic module. Printing the caught error gives you the message in a friendlier format, without the traceback.

>>> from pydantic import ValidationError
>>> try:
...     get_only_regular_alcedo(21)
... except ValidationError as error:
...     print(error)
...
1 validation error for GetOnlyRegularAlcedo
wingspan
  ensure this value is greater than 23 (type=value_error.number.not_gt; limit_value=23)
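
Besides printing the error, you can also inspect it programmatically: ValidationError has an errors() method that returns a list of dictionaries, each with a "loc" entry pointing at the field that failed. The failed_fields() helper below is a hypothetical name for a minimal sketch of that idea.

```python
from pydantic import BaseModel, Field, ValidationError


class PydanticBird(BaseModel):
    name: str
    wingspan: float = Field(gt=0, lt=35)


def failed_fields(data: dict) -> set:
    """Return the names of the fields that failed validation."""
    try:
        PydanticBird(**data)
    except ValidationError as error:
        # "loc" is a tuple; its first item is the field name here
        return {item["loc"][0] for item in error.errors()}
    return set()


only_wingspan = failed_fields({"name": "Alcedo", "wingspan": 40})
both_fields = failed_fields({"wingspan": -1})  # name is missing, too
nothing_failed = failed_fields({"name": "Alcedo", "wingspan": 25})
```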

Pydantic also gives you the ability to create your own validators, so you can adapt the tool to your own needs.

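from pydantic import BaseModel, Field, validator

class PydanticBird(BaseModel):
    name: str
    wingspan: float = Field(gt=0, lt=35)

    @validator("name")
    def name_cannot_contain_non_alphabetic_characters(cls, name: str):
        # str.isalpha() is False for digits, spaces, and punctuation
        if not name.isalpha():
            raise ValueError("cannot contain non alphabetic characters")
        # the returned value replaces the original field value
        return name.title()

>>> bird = PydanticBird(name="Alcedo 5", wingspan=20)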
...
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module> 
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for PydanticBird
name
  cannot contain non alphabetic characters (type=value_error)

Sometimes you don’t need to validate data, because it’s already been validated or comes from a trusted source. In this kind of situation, you can use the built-in construct() method to create objects. What’s the benefit? Pydantic documentation states: “It’s generally around 30 times faster than creating a model with full validation.”

>>> pydantic_alcedo = PydanticBird.construct(**bird_data)
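
The flip side of that speed is that nothing is checked. The sketch below feeds the same bad data to the regular constructor and to construct(), which is the method name from the Pydantic 1.x API used throughout this article.

```python
from pydantic import BaseModel, ValidationError


class PydanticBird(BaseModel):
    name: str
    wingspan: int


bad_data = {"name": "Alcedo", "wingspan": "blue"}

# The regular constructor validates and rejects the data.
try:
    PydanticBird(**bad_data)
    validated_ok = True
except ValidationError:
    validated_ok = False

# construct() skips validation, so the wrong value is stored as-is.
unvalidated = PydanticBird.construct(**bad_data)
```

That’s why construct() is only a good fit for data you really trust.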

Handling JSON

Another method that can be called on a Pydantic model is json(). Of course, there’s no problem getting JSON from a traditional class object, but it’s handier with Pydantic, as the following code shows:

>>> import json
>>> json.dumps(alcedo.__dict__)
'{"name": "Alcedo", "wingspan": 25}'
>>> pydantic_alcedo.json()
'{"name": "Alcedo", "wingspan": 25}'
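
The __dict__ trick above only works while every attribute is itself JSON-friendly. As a standard-library-only sketch, the example below nests one dataclass inside another; Nest is a hypothetical class added for the illustration. Here json.dumps on __dict__ fails, and dataclasses.asdict is needed to convert the nested object first.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Bird:
    name: str
    wingspan: int


@dataclass
class Nest:
    """Hypothetical container holding another dataclass."""

    location: str
    owner: Bird


nest = Nest(location="riverbank", owner=Bird(name="Alcedo", wingspan=25))

# __dict__ still contains a Bird object, which json can't serialize.
try:
    json.dumps(nest.__dict__)
    dict_trick_works = True
except TypeError:
    dict_trick_works = False

# asdict() converts nested dataclasses to plain dicts recursively.
nest_json = json.dumps(asdict(nest))
```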

If you’d like a more detailed description of the Pydantic model, you can call its schema() or schema_json() method:

>>> PydanticBird.schema()
{
   "title":"PydanticBird",
   "type":"object",
   "properties":{
      "name":{
         "title":"Name",
         "type":"string"
      },
      "wingspan":{
         "title":"Wingspan",
         "exclusiveMinimum":0,
         "exclusiveMaximum":35,
         "type":"number"
      }
   },
   "required":[
      "name",
      "wingspan"
   ]
}

It’s worth mentioning that Pydantic offers features that help handle objects, such as methods to parse objects from a JSON string, a dict (dictionary), or a file.

  • parse_obj takes a dict and throws an error when the argument isn’t a dict type

>>> bird_data = {"name": "Alcedo", "wingspan": 25}
>>> PydanticBird.parse_obj(bird_data)
PydanticBird(name='Alcedo', wingspan=25.0)

  • parse_raw takes a string or bytes as an argument and parses it as JSON

>>> bird_data = '{"name": "Alcedo", "wingspan": 25}'
>>> PydanticBird.parse_raw(bird_data)
PydanticBird(name='Alcedo', wingspan=25.0)
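
Put together, these methods give you a round trip: a model serialized with json() can be read back with parse_raw(). The sketch below assumes the Pydantic 1.x method names used in this article.

```python
import json

from pydantic import BaseModel


class PydanticBird(BaseModel):
    name: str
    wingspan: float


# Build a model from a dict; the int 25 is parsed to a float.
original = PydanticBird.parse_obj({"name": "Alcedo", "wingspan": 25})

# Serialize to a JSON string and parse it back into a model.
as_json = original.json()
restored = PydanticBird.parse_raw(as_json)

# The JSON string can also be read with the standard json module.
as_dict = json.loads(as_json)
```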

During software development, you may need to create an object that you want to remain unchanged: after its creation, any editing attempts should fail. Pydantic makes this possible with the allow_mutation flag. With a dataclass, you can set the frozen keyword argument to True to receive an immutable object.

@dataclass(frozen=True)
class Bird:
    name: str
    wingspan: int
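
A quick way to confirm the dataclass really is immutable is to try changing a field. With frozen=True, the assignment raises FrozenInstanceError from the dataclasses module:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Bird:
    name: str
    wingspan: int


alcedo = Bird(name="Alcedo", wingspan=25)

# Any attempt to assign to a field of a frozen dataclass fails.
try:
    alcedo.wingspan = 30
    mutated = True
except FrozenInstanceError:
    mutated = False
```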

To achieve the same result with Pydantic, you have to set the allow_mutation flag to False in the Config class inside the model class.

class PydanticBird(BaseModel):
    name: str
    wingspan: float

    class Config:
        allow_mutation = False

Recursive models

Recursive models are also a useful mechanism. When you create a more complex model that contains other models, Pydantic recognizes the declared data types and builds the nested objects from the data passed in. You can then refer to the model's attributes instead of dictionary keys, which is what you’d be left with when creating such structures via dataclasses or in the most traditional way.

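from typing import List, Optional
from pydantic import BaseModel

class Bird(BaseModel):
    name: str
    wingspan: Optional[float] = None  # optional, defaults to None

class Props(BaseModel):
    one_species: bool = True
    migrating: bool = True

class Flock(BaseModel):
    properties: Props
    birds: List[Bird]  # each dict in the list is parsed into a Bird

>>> flock = Flock(properties={'migrating': False}, birds=[{'name': 'Alcedo_1'}, {'name': 'Alcedo_2'}])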
>>> flock.properties
Props(one_species=True, migrating=False)
>>> flock.properties.migrating
False
>>> flock.birds
[Bird(name='Alcedo_1', wingspan=None), Bird(name='Alcedo_2', wingspan=None)]
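
Validation applies to the nested data as well. In the sketch below, the second bird is missing its name, and the error’s "loc" entry tells you exactly where in the structure the problem is. The first_error_location() helper is a hypothetical name used for the illustration, and the Flock model is trimmed down to the birds field.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Bird(BaseModel):
    name: str
    wingspan: Optional[float] = None


class Flock(BaseModel):
    birds: List[Bird]


def first_error_location(data: dict):
    """Return the location of the first validation error, or None."""
    try:
        Flock(**data)
    except ValidationError as error:
        return error.errors()[0]["loc"]
    return None


# Valid nested dicts become Bird instances.
flock = Flock(birds=[{"name": "Alcedo_1"}])

# The bird at index 1 has no name, so validation fails there.
location = first_error_location(
    {"birds": [{"name": "Alcedo_1"}, {"wingspan": 20}]}
)
```

The location reads as a path: the birds field, then index 1 in the list, then the name field of that bird.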

ORM mode

If you’re working with databases, you probably know what ORM is: object-relational mapping. In a Pydantic class, you can turn on ORM mode (orm_mode in the Config class), informing the Pydantic model that the data it receives may be an ORM model rather than a dictionary, so values are also read from the object's attributes. With this config, you’ll receive all data related to this model. When ORM mode isn’t enabled and you return an ORM object, the relationship data isn’t included, even if those relationships are declared in your Pydantic models.

Proof of its usefulness is that FastAPI, a modern web framework, is based on Pydantic, and it’s common to use this ORM mode with FastAPI applications.

Pydantic data validation

To sum up, what do you gain using Pydantic?

  • Better data validation and thus greater control over how your software works
  • A tool with a high level of customization, making it extremely versatile
  • A tool based on Python syntax, so there’s no need to learn a new programming language
  • A bunch of handy methods to handle objects
  • Serialization of Pydantic models to JSON and deserialization from it
  • Settings management

Karol Szuster

Python Developer at Netguru