
Pydantic: How It's Just My Bread and Butter for Everything AI

By Mohammad Kaamil Mirza · August 20, 2025 · 9 min read · AI


TL;DR
LLMs are basically fancy autocomplete on steroids that love to hallucinate exactly when you need them to be precise. Pydantic is the adult supervision they desperately need: it turns the chaotic word salad they produce into validated data you can actually trust in production. It's become my go-to solution for literally everything AI-related, from simple data validation to complex multi-step ML pipelines.

1. "Problem with LLMs" - The illusion of autocomplete

These autocomplete machines have taken over the world and are being used in far more areas than anyone should be comfortable with, especially software engineering, but it is what it is and it is here to stay. They are trained to predict the next token in a sequence, and that is all they are good at. Extremely good at, I mean. They have no real understanding of anything, and how they mimic our work so well is still a black box. As someone said, English is the hottest programming language. That sounds great on the surface, but as a programming language? It would be absolutely terrible: no structure, no safety, nothing.

Why is this an issue? These LLMs are clueless statistical parrots that produce very believable, very convincing-sounding text. When you ask an LLM to generate JSON, it will give you something like:

{
    "username": "kaamil",  // Should be a valid username
    "email": "mirzakaamil@gmail.com",  // Should be a valid email
    "preferences": null,  // Should be an array
    "confidence": 1.2  // Should be in the range 0-1
}

Something like this gets spewed out with high confidence, and it becomes a nightmare to deal with in production apps: the username might not adhere to the rules you have laid out, the email might not have the right domain, the preferences could be all over the place, and so on. I know what you are thinking: a beautifully crafted, well-written prompt can fix all of this. But that is exactly the point. A prompt fixing such stuff is not a certainty, only a high probability.
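
In fact, the snippet above is not even valid JSON: those // comments alone will kill a strict parser, and even if they were stripped, nothing checks the values. A quick sketch (the raw_output variable is hypothetical, standing in for whatever the model returned):

import json

# Hypothetical raw model output, copied verbatim from the example above
raw_output = """{
    "username": "kaamil",  // Should be a valid username
    "email": "mirzakaamil@gmail.com",  // Should be a valid email
    "preferences": null,  // Should be an array
    "confidence": 1.2  // Should be in the range 0-1
}"""

try:
    data = json.loads(raw_output)
except json.JSONDecodeError as e:
    # The // comments are not legal JSON, so parsing fails outright.
    # And even with the comments removed, json.loads would happily
    # accept preferences=null and confidence=1.2 without complaint.
    print(f"Parse failed: {e}")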

2. "Problems with Expectations with LLMs" - The Golden Hammer Syndrome

I had to look up who coined the proverb. I went to the internet and literally searched "hammer that can hit every nail", Google's Gemini did its little search-and-summarize thingy, and it gave me the right name for this title as well as the paragraph I was about to write. The proverb is by Abraham Maslow, and it goes: "If the only tool you have is a hammer, you tend to see every problem as a nail." And that can very well sum up the last two years. We have almost perfected the art of using darn good autocomplete to drive automation pipelines and other very tedious work, and every business now has one final gun in its pocket called "let's throw GPT at it".

I’ve seen people try to use LLMs for:

  • Business Lalas searching for business ideas, getting back whatever ideas the model is statistically most trained on
  • Software engineers asking for implementation code for a library released after the model's training cutoff
  • Project managers plotting deadlines while Dave only works Mondays after the 4th Tuesday, our little secret.

So let me tell you what is happening on the ground: business stakeholders come across videos of LLMs writing Shakespeare-like poems and solving coding problems, and now they want the LLMs to reliably extract structured data from their messy, disgusting Excel files (yes, it's always that Excel file), without any supervision, completely automated, with the firing of 1k employees added to the TODOs for delivery day.

Using raw LLM outputs in production is like playing dice in a burning building to decide which exit to take. People have got to use them for what they are good at, which would be text retrieval and transformation over a known, understood source, and not much beyond that (for lack of any better use case; sorry, I cannot think of anything else right now).

3. How Pydantic became my Prince Charming

In simple terms, it serves us structured order on a silver platter, which is exactly what we need. But be aware: Pydantic alone is not the final solution and cannot solve everything for you, but it is an important piece of the puzzle. It is sort of like a bouncer at the gate of the club, making sure only the people with the right dress code get in.

from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field

class MedicalInsuranceForm(BaseModel):
    # This is where the magic happens (Pydantic v2 style config)
    model_config = ConfigDict(validate_assignment=True, use_enum_values=True)

    patient_name: str = Field(..., description="Full legal name of the patient")
    policy_number: str = Field(..., pattern=r'^INS\d{6}$')  # v2 uses pattern=, not regex=
    date_of_birth: datetime
    coverage_amount: int = Field(..., gt=0, description="Coverage amount in dollars")
    pre_existing_conditions: List[str] = Field(default_factory=list)
    emergency_contact: Optional[str] = None
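
And here is the payoff: feed it bad data and Pydantic refuses politely instead of letting garbage through. A minimal sketch with made-up values:

from pydantic import ValidationError

try:
    MedicalInsuranceForm(
        patient_name="John Doe",
        policy_number="BAD-123",     # fails the ^INS\d{6}$ pattern
        date_of_birth="1990-05-17",  # fine: ISO strings are coerced to datetime
        coverage_amount=-500,        # fails the gt=0 constraint
    )
except ValidationError as e:
    print(e)  # pinpoints exactly which fields failed and why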

4. Why Pydantic is Perfect for LLMs

  1. JSON Schema Generation: Pydantic automatically generates JSON schemas that can be passed directly to LLM function-calling APIs, so it slots straight into our existing flows (see the sketch after this list)
  2. Automatic Validation: Input validation happens automatically, catching LLM mistakes before they propagate
  3. Type Safety: Your IDE knows exactly which fields exist and what their types are, because we are literally defining a schema
  4. Error Messages: When validation fails, you get clear error messages you can feed back to the LLM
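
To make point 1 concrete: every model can emit its own JSON schema, which is the same format OpenAI-style function/tool calling expects. A minimal sketch reusing the MedicalInsuranceForm model from above (the tool name and description here are made up for illustration):

import json

# Pydantic v2: generate a JSON schema straight from the model
schema = MedicalInsuranceForm.model_json_schema()
print(json.dumps(schema, indent=2))

# Wiring it into an OpenAI-style tool definition
tool = {
    "type": "function",
    "function": {
        "name": "extract_insurance_form",
        "description": "Extract a medical insurance form from free text",
        "parameters": schema,
    },
}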

A real-world example of production code would be:

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field

client = instructor.from_openai(OpenAI())

class ExtractedData(BaseModel):
    patient_name: str = Field(description="Patient's full legal name")
    policy_number: str = Field(pattern=r'^[A-Z]{3}\d{7}$')
    claim_amount: float = Field(gt=0, description="Claim amount in USD")

def extract_insurance_data(text: str) -> ExtractedData:
    return client.chat.completions.create(
        model="gpt-4",
        response_model=ExtractedData,
        messages=[
            {"role": "user", "content": f"Extract insurance data: {text}"}
        ]
    )
# This GUARANTEES structured output or raises a validation error
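
Point 4 from the list above can even be wired into the call itself: instructor accepts a max_retries argument, and on a validation failure it re-prompts the model with the error details. A sketch of the same function with retries (the retry count is arbitrary):

def extract_insurance_data_with_retries(text: str) -> ExtractedData:
    return client.chat.completions.create(
        model="gpt-4",
        response_model=ExtractedData,
        # On a ValidationError, instructor feeds the error messages back
        # to the model and asks again, up to this many attempts
        max_retries=3,
        messages=[
            {"role": "user", "content": f"Extract insurance data: {text}"}
        ]
    )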

So what Pydantic is essentially doing is bringing "type safety and validation to AI applications", as its creator puts it. But more than that, it provides a framework that reduces the need to write exactingly well-crafted prompts, as well as the parsers needed to pull structure out of the text an LLM throws back.

So it is not just the glorified parsing and validation; it is the entire experience as a whole that is really helpful. And that is why it is one of the most downloaded packages on PyPI.

5. Pydantic in action

A more realistic use of Pydantic for structured LLM output looks something like the following. Let's look at it from the point of view of a business owner using it alongside all of their existing stack.

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import List, Literal

class ReviewAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"] = Field(description="Overall sentiment")
    rating: int = Field(ge=1, le=5, description="Star rating from 1-5")
    key_points: List[str] = Field(max_length=3, description="Main points mentioned")
    would_recommend: bool = Field(description="Would customer recommend this product")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
parser = PydanticOutputParser(pydantic_object=ReviewAnalysis)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You analyze customer reviews. Extract key information precisely.\\n{format_instructions}"),
    ("human", "Analyze this review: {review_text}")
]).partial(format_instructions=parser.get_format_instructions())

review_analyzer = prompt | llm | parser

messy_review = """ honestly this product is pretty good i guess. shipped fast which was nice.  the quality is ok not amazing but not terrible either. would probably buy again  if its on sale. customer service was helpful when i had questions. 4/5 stars """
result = review_analyzer.invoke({"review_text": messy_review})
print(f"Sentiment: {result.sentiment}")
print(f"Rating: {result.rating}")
print(f"Key points: {result.key_points}")
print(f"Recommend: {result.would_recommend}")   

Why is this pattern always so good and reliable?

  1. No more regex parsing - The LLM handles the natural language understanding, Pydantic handles the structure
  2. Type safety everywhere - Your IDE knows exactly what fields exist and their types
  3. Automatic validation - If the LLM tries to return rating=6, Pydantic catches it (demonstrated right after this list)
  4. Business rules enforced - The ge=1, le=5 constraint on rating means you'll never get invalid data
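
Point 3 is easy to verify directly, no LLM required. A quick sketch with hand-fed invalid data:

from pydantic import ValidationError

try:
    ReviewAnalysis(
        sentiment="positive",
        rating=6,  # violates the le=5 constraint
        key_points=["fast shipping"],
        would_recommend=True,
    )
except ValidationError as e:
    print(e)  # "Input should be less than or equal to 5"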

Now imagine a pot where you can mix all of these ideas together at once for an even crazier use:

# Process 1000 reviews in a batch
reviews = load_customer_reviews()  # Your data source
analyzed_reviews = []
for review in reviews:
    try:
        analysis = review_analyzer.invoke({"review_text": review})
        analyzed_reviews.append(analysis.model_dump())  # Convert to dict for storage
    except Exception as e:
        print(f"Failed to analyze review: {e}")
        # Log the problematic review for manual inspection

# Now you have clean, structured data for your dashboard
positive_reviews = [r for r in analyzed_reviews if r['sentiment'] == 'positive']
average_rating = (
    sum(r['rating'] for r in analyzed_reviews) / len(analyzed_reviews)
    if analyzed_reviews else 0  # guard against the case where every review failed
)

Also, just to remind you: before Pydantic, it looked something like this

# The old, painful way
import re

def parse_review_manually(text):
    sentiment = "neutral"  # Default guess
    rating = 3  # Another guess
    
    if "love" in text.lower() or "great" in text.lower():
        sentiment = "positive"
    elif "hate" in text.lower() or "terrible" in text.lower():
        sentiment = "negative"
    
    # Try to find a rating with regex... good luck with that
    rating_match = re.search(r'(\d)/5', text)
    if rating_match:
        rating = int(rating_match.group(1))
    
    # This breaks constantly and misses tons of cases
    return {"sentiment": sentiment, "rating": rating}

With Pydantic + LLM, it just works. The LLM understands context and nuance, while Pydantic ensures the output is exactly what your code expects. It's like having a smart intern who never makes formatting mistakes.

The beauty is in the simplicity: you define your data structure once, and everything else just flows from there. Whether it's customer reviews, support tickets, or user feedback, the pattern stays the same: messy text goes in, clean structured data comes out. Use cases like these are here to stay, and the old manual versions are gone for good, since AI is already doing, performing, and delivering on them.

The next time you're building an AI system, remember: raw LLM outputs are like unfiltered tap water. In some countries it might be fine, but why risk it when you can so easily make it safe?

Somewhere else I read something along the lines of "Pydantic is the hero that we do not deserve, but the one that LLMs desperately need."

Trust me, your future self (and your production monitoring dashboard) will thank you.

Find anything wrong? Let me know. Want to geek out and develop something cool? Let me know.

Thanks for reading!