Effortless Data Validation in Python with Pydantic
Learn how to handle data validation correctly in Python using Pydantic. From basic validation to advanced field customization and comparison with traditional OOP.
Introduction
Handling data correctly is fundamental in software development. We need to ensure that the data we receive, process, and store matches the structure and types we expect. Failing to do so can lead to bugs, security issues, and frustrated users. Python's flexibility is great, but it doesn't automatically enforce these rules. That's where Pydantic comes in.
What is Pydantic?
Pydantic is a Python library designed for data validation and settings management using Python type annotations. It parses your data based on the types you define (int, str, bool, etc.) and validates it. If the data doesn't match, Pydantic raises clear, helpful errors. It can even perform type coercion when it makes sense (like converting "1" to 1). This helps you build more reliable applications faster by catching data errors early.
Getting Started: Basic Validation
Let's see Pydantic in action. We'll define a simple User model using standard Python type hints.
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
    age: int
Creating an Instance:
Now, let's create an instance of this User. Pydantic validates the data upon initialization.
# This works perfectly because the data types match the model definition
user = User(id=1, name="John Doe", email="john.doe@example.com", age=20)
print(user)
# Output: id=1 name='John Doe' email='john.doe@example.com' age=20
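Data often arrives as a dictionary or a JSON string rather than as keyword arguments. The following is a minimal sketch, assuming Pydantic v2 (where `model_validate` and `model_validate_json` are available); the payload values are made up for illustration.

```python
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str
    email: str
    age: int


# Validate a plain dictionary (for example, a decoded request body)
payload = {"id": 1, "name": "John Doe", "email": "john.doe@example.com", "age": 20}
user_from_dict = User.model_validate(payload)

# Validate a raw JSON string in one step
raw_json = '{"id": 2, "name": "Jane Doe", "email": "jane.doe@example.com", "age": 31}'
user_from_json = User.model_validate_json(raw_json)

print(user_from_dict.name, user_from_json.age)
```

Both calls run the same validation as the constructor, so invalid input raises the same errors.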
Handling Invalid Data:
What happens if we provide incorrect data? Pydantic steps in and tells us exactly what's wrong.
from pydantic import ValidationError
try:
    # Attempt to create a user missing the 'age' field
    invalid_user_2 = User(id=1, name="John Doe", email="john.doe@example.com")
except ValidationError as e:
    print("--- Missing Field Error ---")
    print(e)
"""
Expected Output:
--- Missing Field Error ---
1 validation error for User
age
  Field required [type=missing, input_value={'id': 1, 'name': 'John Doe', 'email': 'john.doe@example.com'}, input_type=dict]
"""

try:
    # Attempt to create a user with 'age' as a string "twenty"
    invalid_user_4 = User(id=1, name="John Doe", email="john.doe@example.com", age="twenty")
except ValidationError as e:
    print("\n--- Incorrect Type Error ---")
    print(e)
"""
Expected Output:
--- Incorrect Type Error ---
1 validation error for User
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='twenty', input_type=str]
"""
# You can try other invalid cases from the introduction.py file as well:
# invalid_user_3 = User(id=1, name="John Doe", age=20) # Missing email
# invalid_user_5 = User(id=1, name="John Doe", email=5, age=20) # Email is not a string
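Besides printing the exception, the error details can be read programmatically, which is useful when you want to return structured error responses. A small sketch, assuming Pydantic v2, where `ValidationError.errors()` returns one dictionary per failing field:

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    email: str
    age: int


error_details = []
try:
    # Two problems at once: 'email' is missing and 'age' is not a number
    User(id=1, name="John Doe", age="twenty")
except ValidationError as e:
    error_details = e.errors()  # list of dicts, one per failing field

for err in error_details:
    # 'loc' is a tuple path to the field, 'type' is a machine-readable code
    print(err["loc"], err["type"], err["msg"])
```

Pydantic collects all failures in a single pass, so both fields are reported together rather than one at a time.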
Automatic Type Conversion:
Pydantic is smart enough to convert types when it's unambiguous. For example, it can convert a string containing a number into an actual integer or float.
# Pydantic converts age="20" to age=20 (int)
user_2 = User(id=1, name="John Doe", email="john.doe@example.com", age="20")
print(f"Converted Age: {user_2.age} (type: {type(user_2.age)})")
# Output: Converted Age: 20 (type: <class 'int'>)
# Pydantic converts id="1" to id=1 (int)
user_3 = User(id="1", name="John Doe", email="john.doe@example.com", age=20)
print(f"Converted ID: {user_3.id} (type: {type(user_3.id)})")
# Output: Converted ID: 1 (type: <class 'int'>)
# Another example with a Price model
class Price(BaseModel):
    amount: float
    currency: str

# Works fine
price_1 = Price(amount=100, currency="USD")

# Pydantic converts amount="100" to amount=100.0 (float)
price_2 = Price(amount="100", currency="USD")
print(f"Converted Amount: {price_2.amount} (type: {type(price_2.amount)})")
# Output: Converted Amount: 100.0 (type: <class 'float'>)
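Coercion has limits, and it can be switched off per field. The sketch below assumes Pydantic v2; `LaxUser` and `StrictUser` are hypothetical names used only for this illustration. It shows that the default mode refuses a lossy conversion (20.5 to int), and that `Field(strict=True)` rejects even the string "20".

```python
from pydantic import BaseModel, Field, ValidationError


class LaxUser(BaseModel):
    age: int  # default (lax) mode: "20" is coerced to 20


class StrictUser(BaseModel):
    age: int = Field(strict=True)  # strict mode: only real ints are accepted


print(LaxUser(age="20").age)  # numeric string is coerced
print(LaxUser(age=20.0).age)  # float with no fractional part is accepted

rejected = []
for model, value in [(LaxUser, 20.5), (StrictUser, "20")]:
    try:
        model(age=value)
    except ValidationError as e:
        rejected.append((model.__name__, value))
        print(f"{model.__name__}(age={value!r}) rejected: {e.errors()[0]['msg']}")
```

Strict fields are worth considering when silently accepting "20" for an integer would hide a bug upstream.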
Diving Deeper: Field Customization with Field
For more control over validation, Pydantic provides the Field function. You can use it to set default values, add constraints, define aliases, and more.
Default Values:
You can specify default values directly in the model definition or using Field.
from pydantic import BaseModel

class UserWithDefaults(BaseModel):
    id: int
    name: str = "ENES"
    email: str
    age: int = 22

# Uses the default name 'ENES' and age 22
user_with_defaults = UserWithDefaults(id=1, email="enes@example.com")
print(user_with_defaults)
# Output: id=1 name='ENES' email='enes@example.com' age=22
# Overrides defaults
user_override = UserWithDefaults(id=1, name="John Doe", email="john.doe@example.com", age=30)
print(user_override)
# Output: id=1 name='John Doe' email='john.doe@example.com' age=30
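The same defaults can be expressed through Field, which also supports `default_factory` for values that should be built fresh for each instance. This is a sketch assuming Pydantic v2 and Python 3.9+; the `UserWithFieldDefaults` model and its `tags` field are hypothetical additions for illustration.

```python
from pydantic import BaseModel, Field


class UserWithFieldDefaults(BaseModel):
    id: int
    # Equivalent to writing: name: str = "ENES"
    name: str = Field(default="ENES")
    # default_factory is called once per instance to build the default
    tags: list[str] = Field(default_factory=list)


first = UserWithFieldDefaults(id=1)
second = UserWithFieldDefaults(id=2)
first.tags.append("admin")  # does not affect 'second'

print(first)
print(second)
```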
Constraints (Numeric, Text, etc.):
Field is powerful for adding constraints. Let's look at numeric and text constraints.
from pydantic import BaseModel, Field, ValidationError

class ProductNumeric(BaseModel):
    id: int
    name: str
    quantity_in_stock: int = Field(gt=0, le=1000)  # Must be > 0 and <= 1000

# Valid: quantity is 25 (between 1 and 1000)
product_valid_qty = ProductNumeric(id=1, name="Gadget", quantity_in_stock=25)
print(product_valid_qty)
# Output: id=1 name='Gadget' quantity_in_stock=25
# Invalid: quantity is 0 (fails gt=0)
try:
    invalid_product = ProductNumeric(id=2, name="Thing", quantity_in_stock=0)
except ValidationError as e:
    print("\n--- Numeric Constraint Error (quantity) ---")
    print(e)
"""
Expected Output:
--- Numeric Constraint Error (quantity) ---
1 validation error for ProductNumeric
quantity_in_stock
  Input should be greater than 0 [type=greater_than, input_value=0, input_type=int]
"""
# ----------------------------------------------------------
# TEXT FIELD FEATURES
# ----------------------------------------------------------
class UserProfile(BaseModel):
    # Username must be at least 3 characters
    username: str = Field(min_length=3)
    # Bio can be at most 10 characters
    bio: str = Field(max_length=10)
    # Phone number may contain only digits (note: this regex also accepts an empty string)
    phone_number: str = Field(pattern=r'^\d*$')

# Valid profile
user_prof = UserProfile(username="john", bio="developer", phone_number="1234567890")
print(f"\nValid Profile: {user_prof}")
# Output: Valid Profile: username='john' bio='developer' phone_number='1234567890'
# Invalid profile (multiple errors possible)
try:
    # username 'jo' is too short, bio is too long, phone_number contains 'ABC'
    invalid_user = UserProfile(username="jo", bio="this bio is way too long for the field", phone_number="ABC")
except ValidationError as e:
    print("\n--- Text Constraint Errors ---")
    print(e)  # Will show errors for username, bio, and phone_number
"""
Expected Output:
--- Text Constraint Errors ---
3 validation errors for UserProfile
username
  String should have at least 3 characters [type=string_too_short, input_value='jo', input_type=str]
bio
  String should have at most 10 characters [type=string_too_long, input_value='this bio is way too long for the field', input_type=str]
phone_number
  String should match pattern '^\\d*$' [type=string_pattern_mismatch, input_value='ABC', input_type=str]
"""
Pydantic offers many other field options. For a full list, check the official documentation: Pydantic Fields.
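Aliases, mentioned earlier as one of the things Field can do, are handy when incoming data uses different key names than your Python code. A minimal sketch, assuming Pydantic v2; the `ApiUser` model and the `userName` key are hypothetical.

```python
from pydantic import BaseModel, Field


class ApiUser(BaseModel):
    # Incoming data uses "userName"; Python code uses "user_name"
    user_name: str = Field(alias="userName")


api_user = ApiUser.model_validate({"userName": "john"})

print(api_user.user_name)
print(api_user.model_dump())               # keys are the Python field names
print(api_user.model_dump(by_alias=True))  # keys are the aliases
```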
Pydantic vs. Traditional OOP
Let's compare defining a data structure with Pydantic versus a standard Python class.
from pydantic import BaseModel, ValidationError
import json
# ----------------------------------------------------------
# PYDANTIC MODEL
# ----------------------------------------------------------
class User(BaseModel):
    id: int
    name: str
    email: str
    age: float

# ----------------------------------------------------------
# TRADITIONAL OOP MODEL
# ----------------------------------------------------------
class TraditionalUser:
    def __init__(self, id: int, name: str, email: str, age: float):
        # NO AUTOMATIC VALIDATION HERE!
        # We *could* add manual checks, e.g., isinstance(name, str)
        self.id = id
        self.name = name
        self.email = email
        self.age = age

    def __str__(self):
        return f"TraditionalUser(id={self.id}, name='{self.name}', email='{self.email}', age={self.age})"

    # Manual methods needed for serialization
    def to_dict(self):
        return {
            "id": self.id,
            "name": self.name,
            "email": self.email,
            "age": self.age
        }

    def to_json(self):
        return json.dumps(self.to_dict())
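To see the two approaches side by side, here is a short sketch that feeds the same invalid data to both classes. It assumes Pydantic v2 and repeats trimmed versions of the two classes so it runs on its own; the `bad_data` dictionary is made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    email: str
    age: float


class TraditionalUser:
    def __init__(self, id: int, name: str, email: str, age: float):
        # Type hints are not enforced at runtime
        self.id = id
        self.name = name
        self.email = email
        self.age = age


bad_data = {"id": 1, "name": "John Doe", "email": "john.doe@example.com", "age": "twenty"}

# The plain class stores the invalid value without complaint
traditional = TraditionalUser(**bad_data)
print(type(traditional.age))

# The Pydantic model rejects the same data
try:
    User(**bad_data)
    pydantic_accepted = True
except ValidationError:
    pydantic_accepted = False
print(pydantic_accepted)

# Serialization is built in for the Pydantic model
user = User(id=1, name="John Doe", email="john.doe@example.com", age=20)
print(user.model_dump())
print(user.model_dump_json())
```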
Key Conceptual Differences:
The core difference lies in the primary purpose and built-in capabilities. Pydantic models are specifically designed for data validation and parsing, using type hints declaratively to enforce structure and types automatically. Standard OOP classes can hold data, but they do not validate it on their own; you must implement validation logic imperatively (e.g., inside __init__). Pydantic also provides out-of-the-box utilities for common data-related tasks such as serialization (.model_dump(), .model_dump_json()), which require manual implementation in traditional classes. Pydantic focuses on the data's shape and correctness, whereas traditional OOP focuses more broadly on behavior and state.
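For a sense of what the imperative route involves, here is a rough sketch of a plain class with hand-written type checks. The `ValidatedTraditionalUser` class is hypothetical and uses only the standard library; it covers type checks only, not the coercion or detailed error reporting that Pydantic adds.

```python
class ValidatedTraditionalUser:
    """Plain class with hand-written checks (hypothetical example)."""

    def __init__(self, id: int, name: str, email: str, age: float):
        # bool is a subclass of int, so exclude it explicitly
        if not isinstance(id, int) or isinstance(id, bool):
            raise TypeError("id must be an int")
        if not isinstance(name, str):
            raise TypeError("name must be a str")
        if not isinstance(email, str):
            raise TypeError("email must be a str")
        if isinstance(age, bool) or not isinstance(age, (int, float)):
            raise TypeError("age must be a number")
        self.id = id
        self.name = name
        self.email = email
        self.age = float(age)


message = ""
try:
    ValidatedTraditionalUser(id=1, name="John Doe", email="john.doe@example.com", age="twenty")
except TypeError as exc:
    message = str(exc)

print(message)
```

Every new field means another check to write and maintain, and this version stops at the first failure instead of reporting all of them.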
Conclusion
Pydantic offers a Pythonic and efficient way to handle data validation and serialization. By using type hints you're likely already familiar with, it helps create more robust, maintainable, and reliable applications. It reduces boilerplate code, provides clear error messages, and integrates smoothly into modern Python development workflows.
Want to dive deeper?
- Explore the complete code examples featured here on GitHub.
- Watch a detailed explanation (in Turkish) on my YouTube channel.