JSON Fields for Nested Pydantic Models? #1925

scuervo91 · 2021-08-31T13:31:10Z

First Check

I added a very descriptive title to this issue.
I used the GitHub search to find a similar issue and didn't find it.
I searched the SQLModel documentation, with the integrated search.
I already searched in Google "How to X in SQLModel" and didn't find any information.
I already read and followed all the tutorial in the docs and didn't find an answer.
I already checked if it is not related to SQLModel but to Pydantic.
I already checked if it is not related to SQLModel but to SQLAlchemy.

Commit to Help

I commit to help with one of those options 👆

Example Code

from tortoise.models import Model 
from tortoise.fields import UUIDField, DatetimeField,CharField, BooleanField, JSONField, ForeignKeyField, CharEnumField, IntField
from tortoise.contrib.pydantic import pydantic_model_creator

class Schedule(Model):
    id = UUIDField(pk=True)
    created_at = DatetimeField(auto_now_add=True)
    modified_at = DatetimeField(auto_now=True)
    case = JSONField()
    type = CharEnumField(SchemasEnum,description='Schedule Types')
    username = ForeignKeyField('models.Username')
    description = CharField(100)
    
schedule_pydantic = pydantic_model_creator(Schedule,name='Schedule')

Description

I have already implemented an API using FastAPI to store Pydantic models. These models are themselves nested Pydantic models, so they are stored in a Postgres database through a JSONField. I've been using Tortoise ORM, as the example shows.

Is there an equivalent model in SQLModel?

Operating System

Linux

Operating System Details

WSL 2 Ubuntu 20.04

SQLModel Version

0.0.4

Python Version

3.8

Additional Context

No response

OXERY · 2021-09-03T06:59:50Z

I also wondered how to store JSON objects without converting them to strings. SQLAlchemy supports storing these directly.

TheJedinator · 2021-09-09T17:38:24Z

@OXERY and @scuervo91: I was able to get something that works using this:

regions: dict = Field(sa_column=Column(JSON), default={'all': 'true'})

That said, this is a PostgreSQL JSONB column in my database, but it works.

For a nested object you could use a Pydantic model as the type and do it the same way. I hope this helps, as I was having a difficult time figuring out a solution as well :)

OXERY · 2021-09-10T05:01:27Z

I also got it working, on SQLite and PostgreSQL:

from typing import Any, Dict
from sqlmodel import Field, SQLModel, Column, JSON

mygreatfield: Dict[Any, Any] = Field(index=False, sa_column=Column(JSON))

psarka · 2021-12-01T17:10:42Z

@TheJedinator Could you help a bit more with the nested object? I tried to "use the pydantic model as the Type" but I can't get it to work :( Here is my snippet:

from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB
from sqlmodel import Field
from sqlmodel import Session
from sqlmodel import SQLModel

from engine import get_sqlalchemy_engine


class J(SQLModel):
    j: int


class A(SQLModel, table=True):
    a: int = Field(primary_key=True)
    b: J = Field(sa_column=Column(JSONB))


engine = get_sqlalchemy_engine()
SQLModel.metadata.create_all(engine)

with Session(engine) as session:
    a = A(a=1, b=J(j=1))
    session.add(a)
    session.commit()
    session.refresh(a)

Throws an error

sqlalchemy.exc.StatementError: (builtins.TypeError) Object of type J is not JSON serializable
[SQL: INSERT INTO a (b, a) VALUES (%(b)s, %(a)s)]
[parameters: [{'a': 1, 'b': J(j=1)}]]

psarka · 2021-12-01T21:50:41Z

Thank you! Unfortunately I get the same error :(

I found one workaround - registering a custom_serializer for the sqlalchemy engine, like so:

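import json
from sqlalchemy import create_engine

def custom_serializer(d):
    # v.json() returns a str, so models are stored as JSON-encoded strings; v.dict() would store objects.
    return json.dumps(d, default=lambda v: v.json())

def get_sqlalchemy_engine():
    # get_conn is the poster's own connection factory (not shown here).
    return create_engine("postgresql+psycopg2://", creator=get_conn, json_serializer=custom_serializer)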

But if there is a cleaner way, I would gladly use that instead.
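One detail of this workaround can be checked without a database: what `json.dumps` produces when `default=` returns a string versus a dict. The sketch below uses `FakeModel`, a hypothetical stand-in for a Pydantic v1 model that only mimics `.json()` and `.dict()`, so that it runs with the standard library alone.

```python
import json


class FakeModel:
    """Hypothetical stand-in for a Pydantic v1 model exposing .json() and .dict()."""

    def __init__(self, j: int) -> None:
        self.j = j

    def dict(self) -> dict:
        return {"j": self.j}

    def json(self) -> str:
        return json.dumps(self.dict())


def serialize_with_json(value):
    # Same shape as the serializer in the comment above: default= returns a str.
    return json.dumps(value, default=lambda v: v.json())


def serialize_with_dict(value):
    # Variant: default= returns a dict, which json.dumps encodes as a JSON object.
    return json.dumps(value, default=lambda v: v.dict())


payload = {"a": 1, "b": FakeModel(j=1)}
as_string = serialize_with_json(payload)
as_object = serialize_with_dict(payload)

print(as_string)  # "b" holds a JSON-encoded string
print(as_object)  # "b" holds a nested JSON object
```

With the `.json()` variant the nested value is stored as a string inside the JSON document, so JSON operators in the database cannot reach into it. The `.dict()` variant keeps it as an object.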

TheJedinator · 2021-12-02T00:54:59Z

Hey @psarka

I actually just tried what I told you, and I'm sorry to have misled you... I did get a working solution though 😄

It was actually the opposite function that you need to use, here's the example you supplied with the amendments to make it work:

with Session(engine) as session:
    j = J(j=1)
    j_dumped = J.json(j)
    a = A(a=1, b=j_dumped)
    session.add(a)
    session.commit()
    session.refresh(a)

psarka · 2021-12-02T05:19:35Z

Hmm, this doesn't (or at least shouldn't) typecheck :)

But I see what you did there: essentially it's the same as registering a custom serializer, but done manually.

TheJedinator · 2021-12-02T11:53:40Z

It does type check when you create the J object (which it should), so if you tried to supply a string, as in J(j="foo"), it would fail.

This allows for type checking of the object; the A class requires a serialized version of J in order for it to be entered into the database.

It is essentially the same as registering a custom serializer, but it allows you to be explicit about using it.

HenningScheufler · 2022-01-09T15:37:30Z

A hacky method with type checking that works with SQLite is:

from sqlalchemy import Column
from typing import List
# from sqlalchemy.dialects.postgresql import JSONB
from sqlmodel import Field
from sqlmodel import Session
from pydantic import validator
from sqlmodel import SQLModel, JSON,create_engine

# from engine import get_sqlalchemy_engine
sqlite_file_name = "test.db"
sqlite_url = f"sqlite:///{sqlite_file_name}"

engine = create_engine(sqlite_url)


class J2(SQLModel):
    test: List[int]

class J(SQLModel):
    j: int
    nested: J2


class A(SQLModel, table=True):
    a: int = Field(primary_key=True)
    b: J = Field(sa_column=Column(JSON))

    @validator('b')
    def val_b(cls, val):
        return val.dict()

SQLModel.metadata.create_all(engine)

with Session(engine) as session:
    a = A(a=1, b=J(j=1,nested=J2(test=[100,100,100])))
    session.add(a)
    session.commit()
    session.refresh(a)

hakanoktay · 2022-02-10T13:06:29Z

hi,
I created a "JSON Field" based on what is written here. I am using SQLite.

from sqlmodel import SQLModel,Relationship,Field,JSON
from typing import Optional,List, Dict
from sqlalchemy import Column
from pydantic import validator


#
class J2(SQLModel):
    id: int
    title:str

#
class Companies(SQLModel, table=True):
    id:Optional[int]=Field(default=None,primary_key=True)
    name:str
    adddresses: List['J2'] = Field(sa_column=Column(JSON))


    @validator('adddresses')
    def val_b(cls, val):
        print(val)
        return val.dict()

This gives the error:

TypeError: Type is not JSON serializable: J2

When I print the value, it returns:

[J2(id=1, title='address1'), J2(id=2, title='address2')]

How can I handle that? Why is this J2 added, and how can I get rid of it? I can't convert it with .dict() and I can't serialise it. Can you give me an idea?

HenningScheufler · 2022-02-10T13:16:17Z

Does this work?

    @validator('adddresses')
    def val_b(cls, value):
        print(value)
        return [v.dict() for v in value]

hakanoktay · 2022-02-10T13:20:52Z

Does this work?

    @validator('adddresses')
    def val_b(cls, value):
        print(value)
        return [v.dict() for v in value]

@HenningScheufler thank you for your help, it worked perfectly.

MaximilianFranz · 2022-03-16T11:03:53Z

Hey all,

Thanks for the great advice here. Creating the object using the classes and writing it to the DB works as expected and writes the data as a dict into a JSON field.

See this example:

class ComplexHeroField(SQLModel, table=False):
    some: str
    other: float
    more: Optional[List[str]]

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    complex_field: ComplexHeroField = Field(sa_column=Column(JSON))
    name: str
    secret_name: str
    age: Optional[int] = None

    @validator('complex_field')
    def val_complex(cls, val: ComplexHeroField):
        # Used in order to store pydantic models as dicts
        return val.dict()

    class Config:
        arbitrary_types_allowed = True

However, when reading the model from the DB using a select(), I want the JSON field to be read into a ComplexHeroField instance using Pydantic's parse_raw or parse_obj. With the way it's currently done (with the validator), this happens:

statement = select(Hero)
results = session.exec(statement)
for hero in results:
    print(hero.complex_field.some)

# AttributeError: 'dict' object has no attribute 'some'

Any hint how that could be achieved? Maybe via the custom-serialiser mentioned by @psarka ?

Thanks already!

MaximilianFranz · 2022-03-16T11:09:27Z

Something like this works, but obviously doesn't scale if we have multiple nested models instead of just the ComplexHeroField:


import json

def custom_serializer(d):
    return json.dumps(d, default=lambda v: v.json())

def custom_deserialiser(d):
    return ComplexHeroField.parse_raw(d)

engine = create_engine(url_string, echo=True, json_serializer=custom_serializer, json_deserializer=custom_deserialiser)

complex_value = ComplexHeroField(some="value", other=5, more=["dd", "sdf"])
hero_1 = Hero(name="Deadpond", secret_name="Dive Wilson", complex_field=complex_value)
session.add(hero_1)
session.commit()

statement = select(Hero)
results = session.exec(statement)
for hero in results:
    print(hero.complex_field.some)
    # value

Instead, we would need more context in the deserialiser, i.e. access to the type hint of the field we're trying to deserialise, so that we can use UseType.parse_raw().

Any hint where and how I could achieve that kind of access to the deserialisation process?

Thanks :)

Seluj78 · 2025-03-07T10:58:18Z

Here is a minimum reproducible example:

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB
import sqlmodel as sm
import uuid
import typing as t

class TestTable(sm.SQLModel, table=True):  # type: ignore[call-arg]
    uuid_id: uuid.UUID = sm.Field(  # type: ignore[call-overload]
        default_factory=uuid.uuid4,
        sa_column=sa.Column("id", sa.UUID, unique=True, nullable=False, primary_key=True),
    )
    conversation_transcript: t.Optional[t.List[dict]] = sm.Field(sa_column=sa.Column(JSONB, nullable=True), default=None)


SYNC_SQLALCHEMY_URL = "postgresql://postgres:postgres@localhost:5432/postgres"

SYNC_DB_ENGINE = sa.create_engine(SYNC_SQLALCHEMY_URL, pool_pre_ping=True)  # type: ignore
SYNC_SESSION_MAKER = sa.orm.sessionmaker(bind=SYNC_DB_ENGINE, class_=sa.orm.Session, expire_on_commit=False, autoflush=True)

sm.SQLModel.metadata.create_all(SYNC_DB_ENGINE)

with SYNC_SESSION_MAKER() as session:
    session.add(TestTable(conversation_transcript=[{"text": "Hello"}]))
    session.commit()

with SYNC_SESSION_MAKER() as session:
    items = session.query(TestTable).all()
    item = items[0]
    print(f"Before update: {item.conversation_transcript} (ID: {item.uuid_id})")
    item_id = item.uuid_id
    item.conversation_transcript.append({"text": "World"})
    print(f"After update before save: {item.conversation_transcript} (ID: {item.uuid_id})")
    session.add(item)
    session.commit()

with SYNC_SESSION_MAKER() as session:
    item = session.get(TestTable, item_id)
    print(f"After update after load: {item.conversation_transcript} (ID: {item.uuid_id})")

fny · 2025-03-10T21:08:15Z

Following @MaximilianFranz and @Anudorannador, I made it work for a list of nested objects. It works perfectly for JSON, as I need more readable content in the DB.

T = TypeVar('T')

#63

def pydantic_column_type(pydantic_type):
    class PydanticJSONType(TypeDecorator, Generic[T]):
        impl = JSON()

        def __init__(
            self, json_encoder=json,
        ):
            self.json_encoder = json_encoder
            super(PydanticJSONType, self).__init__()

        def bind_processor(self, dialect):
            impl_processor = self.impl.bind_processor(dialect)
            dumps = self.json_encoder.dumps
            if impl_processor:
                def process(value: T):
                    if value is not None:
                        if isinstance(value, list) and isinstance(pydantic_type, ModelMetaclass):
                            value_to_dump = [pydantic_type.model_validate(item) for item in value]
                        elif isinstance(pydantic_type, ModelMetaclass):
                            value_to_dump = pydantic_type.model_validate(value)
                        else:
                            value_to_dump = value
                        value = jsonable_encoder(value_to_dump)
                    return impl_processor(value)
            else:
                def process(value):
                    if isinstance(value, list) and isinstance(pydantic_type, ModelMetaclass):
                        value_to_dump = [pydantic_type.model_validate(item) for item in value]
                    elif isinstance(pydantic_type, ModelMetaclass):
                        value_to_dump = pydantic_type.model_validate(value)
                    else:
                        value_to_dump = value
                    value = dumps(jsonable_encoder(value_to_dump))
                    return value
            return process

        def result_processor(self, dialect, coltype) -> T:
            impl_processor = self.impl.result_processor(dialect, coltype)
            if impl_processor:
                def process(value):
                    value = impl_processor(value)
                    if value is None:
                        return None
                    data = value
                    # Explicitly use the generic directly, not type(T)
                    if isinstance(data, list) and isinstance(pydantic_type, ModelMetaclass):
                        full_obj = [pydantic_type.model_validate(item) for item in data]
                    elif isinstance(pydantic_type, ModelMetaclass):
                        full_obj = pydantic_type.model_validate(data)
                    else:
                        full_obj = data
                    return full_obj
            else:
                def process(value):
                    if value is None:
                        return None
                    # Explicitly use the generic directly, not type(T)
                    if isinstance(value, list) and isinstance(pydantic_type, ModelMetaclass):
                        full_obj = [pydantic_type.model_validate(item) for item in value]
                    elif isinstance(pydantic_type, ModelMetaclass):
                        full_obj = pydantic_type.model_validate(value)
                    else:
                        full_obj = value
                    return full_obj
            return process

        def compare_values(self, x, y):
            return x == y

    return PydanticJSONType

Example

from sqlmodel import SQLModel,Relationship,Field,JSON
from typing import Optional,List, Dict
from sqlalchemy import Column
from pydantic import validator

class J2(SQLModel):
    id: int
    title: str

class Companies(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str
    adddresses: List['J2'] = Field(sa_column=Column(pydantic_column_type(J2)))

No need to set up an engine-level serializer and deserializer, as bind_processor defines the serialization behavior and result_processor the deserialization.

Great work, but it seems to be broken: (1) it runs validations on nested types, which slows down reads, and (2) it somehow triggers other records to be downloaded.

Seluj78 · 2025-03-10T21:10:39Z

@fny I know, it's certainly not great. If I get a better version I will be sure to post it here, and you can do the same as well, while we wait for an official implementation

fny · 2025-03-12T21:21:10Z

Hi all, below is a better implementation based on @iloveitaly's ActiveModel project. My version is more robust than ActiveModel's. It also assumes all records coming out of the database are valid (i.e. it uses Model.model_construct(...) instead of Model(...)). Additionally, it can handle deeply nested models and other recursive structures.

This tremendously speeds up loading records since validations are skipped. I anticipate this should handle 80% of use cases.

@tiangolo You might find this useful too.

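import types
from typing import Any, Union, get_args, get_origin

from pydantic import BaseModel
from sqlalchemy.orm import reconstructor


def is_union_type(origin: Any) -> bool:
    # Helper not shown in the original post; assumed to detect Optional[X], Union[...] and X | Y.
    return origin is Union or origin is getattr(types, "UnionType", Union)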

def convert_field_value(annotation: Any, raw_value: Any) -> Any:
    origin = get_origin(annotation)
    args = get_args(annotation)

    if is_union_type(origin):
        if raw_value is None:
            return None
            # The above is optimistic: it's possible that None is not a valid
            # value for the annotation. The code below would account for that.
            #
            # if type(None) in args:
            #     return None
            # else:
            #     raise ValueError(f"None is not a valid value for the annotation {annotation}")
        for arg in args:
            converted = convert_field_value(arg, raw_value)
            if converted is not None:
                return converted
        return None

    if origin is list:
        return [convert_field_value(args[0], item) for item in raw_value]

    if origin is dict:
        return {
            key: convert_field_value(args[1], value) for key, value in raw_value.items()
        }

    if origin is tuple:
        if len(args) != len(raw_value):
            return raw_value
        return tuple(
            convert_field_value(arg, item) for arg, item in zip(args, raw_value)
        )

    try:
        if issubclass(annotation, BaseModel):
            attrs = {
                field_name: convert_field_value(
                    field_info.annotation, raw_value.get(field_name)
                )
                for field_name, field_info in annotation.model_fields.items()
            }

            return annotation.model_construct(**attrs)
    except TypeError as e:
        if "issubclass()" not in str(e):
            raise e

    return raw_value


class SQLModelJSONMixin:
    @reconstructor
    def init_on_load(self):
        for field_name, field_info in self.model_fields.items():

            raw_value = getattr(self, field_name)
            converted = convert_field_value(field_info.annotation, raw_value)
            setattr(self, field_name, converted)

Usage:

class MyModel(SQLModel, SQLModelJSONMixin):
    nested_model: Optional[NestedModel] = Field(sa_type=JSON(), nullable=True)

Other note: you need to call record.init_on_load() after you commit the record to the database, otherwise sqlalchemy will overwrite the field with a dict-like object.

I'm sure there's something I could add to the mixin, but I haven't had time to investigate.
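The trade-off behind this approach (trusting what comes out of the database and skipping validation) can be seen in isolation with Pydantic v2's `model_construct`. The `Address` model below is hypothetical and only serves the illustration.

```python
from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    # Hypothetical nested model used only for this sketch.
    street: str
    zip_code: int


# model_construct builds the instance without running any validation,
# so even a value of the wrong type is accepted as-is.
trusted = Address.model_construct(street="Main St", zip_code="not-an-int")

# The regular constructor validates and rejects the same input.
try:
    Address(street="Main St", zip_code="not-an-int")
    rejected = False
except ValidationError:
    rejected = True

print(trusted.zip_code, rejected)
```

This is why the approach is fast on reads, and also why it is only safe when every row was validated before it was written.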

DaanRademaker · 2025-03-13T06:56:34Z

^^ Very nice! I tested the above and it seems to work great so far!

Also, I had to add a couple of checks so that it works when None is returned for a field whose type is an array or list (this is technically possible).

    if not origin:  # not a container type (e.g. int, untyped list, None, datetime)
        return raw_value
    if raw_value is None:
        return None

Seluj78 · 2025-03-13T07:42:47Z

@fny looks great! Can you provide an example of usage and migrations (with Alembic)?

fny · 2025-03-13T17:55:30Z

@DaanRademaker I just realized that error myself. I updated my version to make it more robust and also avoid an issue where setattr(...) was triggering validations. @Seluj78: I added an example. Migrations will work without any additional changes.

Other note: you need to call record.init_on_load() after you commit the record to the database, otherwise sqlalchemy will overwrite the field with a dict-like object.

I'm sure there's something I could add to the mixin, but I haven't had time to investigate.

iloveitaly · 2025-03-15T17:56:30Z

@fny would love to merge these updates into the active model project if you're up for submitting a PR

amanmibra · 2025-03-16T12:37:06Z

I am here to bump this. I would love to see this added!

pporcher · 2025-03-16T14:46:34Z

Here is how I do it using pydantic's TypeAdapter.

from sqlalchemy import TypeDecorator
from sqlmodel import JSON
from pydantic import TypeAdapter

class PydanticJson(TypeDecorator):
    impl = JSON()
    cache_ok = True

    def __init__(self, pt):
        super().__init__()
        self.pt = TypeAdapter(pt)
        self.coerce_compared_value = self.impl.coerce_compared_value

    def bind_processor(self, dialect):
        return lambda value: self.pt.dump_json(value) if value is not None else None

    def result_processor(self, dialect, coltype):
        return lambda value: self.pt.validate_json(value) if value is not None else None

And how to use it.

from sqlalchemy import Column
from pydantic import BaseModel
from sqlmodel import SQLModel, Field

class Nested(BaseModel):
    value: str

class Parent(SQLModel, table=True):
    id: int = Field(primary_key=True, default=None)
    nested: Nested | None = Field(sa_column=Column(PydanticJson(Nested)))
    nested_list: list[Nested] = Field(sa_column=Column(PydanticJson(list[Nested])))
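The two `TypeAdapter` calls that this type relies on can be tried without a database. A minimal sketch, reusing the `Nested` model from the usage example:

```python
from pydantic import BaseModel, TypeAdapter


class Nested(BaseModel):
    value: str


# One adapter handles the whole annotated type, here a list of models.
adapter = TypeAdapter(list[Nested])

# What bind_processor does: models -> JSON bytes.
raw = adapter.dump_json([Nested(value="a"), Nested(value="b")])

# What result_processor does: JSON bytes (or str) -> validated models.
restored = adapter.validate_json(raw)

print(raw)
print(restored)
```

Note that `dump_json` returns `bytes`, which matters for drivers that expect `str` (see the asyncpg variation later in the thread).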

fny · 2025-04-15T13:25:14Z

@fny would love to merge these updates into the active model project if you're up for submitting a PR

Hey @iloveitaly! Sorry for the late response. I just saw this. I'll try to get around to it this week.

Dude29 · 2025-06-02T15:44:43Z

I tried @pporcher's solution and it works great for creating the tables and getting up and running.
But I hit another problem further along when autogenerating migrations with Alembic.

It generated the migration file with code like this:
sa.Column('accessories', a.very.long.path.to.the.class.PydanticJson(), nullable=False),

Which has two issues:

1. That long path is not recognized in the migration file, though I can work around that by adding import statements.
2. Alembic doesn't know how to use PydanticJson() and doesn't pass the Pydantic model to its constructor.

I guess the solution is to just tell Alembic to create the column as type JSON and go from there, but I'm not sure how to do that.
Does anyone know how to tell Alembic to map one type to another (if this is even possible)?

Alex-S-H-P · 2025-08-26T21:34:11Z

For the specific use case of storing a list of Pydantic model instances in a JSON column, I built on @pporcher's answer:

# sql.py
from typing import Any, Generic, Self, TypeVar
from sqlalchemy import Column, Dialect, TypeDecorator
from sqlalchemy.sql.operators import OperatorType
from sqlalchemy.sql.type_api import _BindProcessorType, _ResultProcessorType
from sqlmodel import JSON
from pydantic import BaseModel, TypeAdapter


T = TypeVar("T", bound=BaseModel)


class PydanticColumn(TypeDecorator, Generic[T]):
    impl = JSON
    cache_ok = True

    def __init__(self, pt: type[T]):
        super().__init__()
        self.adapter: TypeAdapter[T] = TypeAdapter(pt)

    def coerce_compared_value(self, op: OperatorType | None, value: Any) -> Any:
        return self.impl.coerce_compared_value(self, op, value)  # type: ignore

    def bind_processor(self, dialect: Dialect) -> _BindProcessorType | None:
        def processor(value: T | None) -> bytes | None:
            if value is None:
                return None
            return self.adapter.dump_json(value)
        return processor

    def result_processor(self, dialect: Dialect, coltype: Any) -> _ResultProcessorType | None:
        def processor(value: bytes | str | None):
            if value is None:
                return None
            return self.adapter.validate_json(value)
        return processor

    @classmethod
    def col(cls, model: type[T]) -> Column[Self]:
        return Column(cls(model))

from collections.abc import Sequence
from typing import overload

from pydantic import RootModel, BaseModel
from sqlmodel import Field, SQLModel
from .sql import PydanticColumn

class Item(BaseModel):
    item: str

class List(RootModel[list[Item]], Sequence[Item]):
    root: list[Item]
    def __len__(self) -> int:
        return len(self.root)

    @overload
    def __getitem__(self, item: int) -> Item:
        ...

    @overload
    def __getitem__(self, item: slice[int | None, int | None, int | None]) -> Sequence[Item]:
        ...

    def __getitem__(self, item: int | slice[int | None, int | None, int | None]) -> Item | Sequence[Item]:
        return self.root[item]

class MyTable(SQLModel, table=True):
    ...  # other columns here
    custom_list: List = Field(default_factory=list, sa_column=PydanticColumn.col(List))

I added the PydanticColumn.col(...) classmethod, which is a quicker way to write Column(PydanticColumn(...)).

iloveitaly · 2026-04-09T13:11:42Z

I recently added JSONB field mutation tracking to activemodel. This works both for JSONB fields that render as Pydantic objects and for plain old Python objects.

CHC383 · 2026-05-11T06:14:08Z

Here is another variation for PostgreSQL + asyncpg + JSONB, based on @pporcher's solution and @Alex-S-H-P's solution.

- It returns str in bind_processor, as the postgresql + asyncpg dialect expects str for encoding.
- It uses TypeAdapter.validate_python in result_processor to work with Python objects, since the postgresql + asyncpg dialect loads the result into a Python object by default.
- Since result_processor processes Python objects, it works directly with a Pydantic model or a list of Pydantic models.

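# Needed on Python < 3.14 so the TYPE_CHECKING-only names used in annotations are not evaluated at runtime.
from __future__ import annotations

from typing import TYPE_CHECKING, Any, cast, override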

from pydantic import TypeAdapter
from sqlalchemy import Dialect, TypeDecorator
from sqlalchemy.dialects.postgresql import JSONB

if TYPE_CHECKING:
    from sqlalchemy.sql.type_api import _BindProcessorType, _ResultProcessorType


class PydanticJSONB[T](TypeDecorator):
    impl = JSONB()
    cache_ok = True

    def __init__(self, pydantic_type: type[T]) -> None:
        super().__init__()
        self.adapter = TypeAdapter(pydantic_type)
        self.coerce_compared_value = cast(
            "JSONB", PydanticJSONB.impl
        ).coerce_compared_value

    @override
    def bind_processor(self, dialect: Dialect) -> _BindProcessorType:
        def processor(value: T | None) -> str | None:
            if value is None:
                return None
            return self.adapter.dump_json(value).decode("utf-8")

        return processor

    @override
    def result_processor(self, dialect: Dialect, coltype: Any) -> _ResultProcessorType:
        def processor(value: Any) -> T:
            if value is None:
                return None
            return self.adapter.validate_python(value)

        return processor

Inspired by #1324 (comment), the alembic hook render_item in env.py can be used to generate the migration scripts properly. @Dude29 you could either replace the new class with the original sqlalchemy type in the migration script, or add import for the new class through autogen_context.imports.add

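# Needed on Python < 3.14 so the TYPE_CHECKING-only AutogenContext annotation is not evaluated at runtime.
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Literal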

if TYPE_CHECKING:
    from alembic.autogenerate.api import AutogenContext

def render_item(
    type_: str,
    obj: Any,  # noqa: ANN401
    autogen_context: AutogenContext,
) -> str | Literal[False]:
    if type_ == "type" and isinstance(obj, PydanticJSONB):
        autogen_context.imports.add("import sqlalchemy as sa")
        autogen_context.imports.add("from sqlalchemy.dialects import postgresql")
        return "postgresql.JSONB(astext_type=sa.Text())"
    return False

Inspired by #1324, here is another implementation which is less hacky and follows the sqlalchemy.TypeDecorator API to override process_bind_param and process_result_value.

- TypeDecorator.bind_processor: runs process_bind_param to convert the value before calling impl.bind_processor.
- TypeDecorator.result_processor: runs process_result_value after calling impl.result_processor to convert the value. For the asyncpg case, impl.result_processor is None, so process_result_value is called directly; see JSONB.result_processor below for details.
- JSONB.bind_processor: takes a Python object and returns a JSON string.
- JSONB.result_processor: takes a JSON string/bytes and returns a Python object. This is overridden in AsyncpgJSONB (as of sqlalchemy v2.0.49) because asyncpg provides dialect-level deserialization.

So effectively the process is:

- Serialization: (collections of) Pydantic models -> process_bind_param converts to Python objects -> the JSONB bind processor converts to str -> the asyncpg dialect converts to bytes
- Deserialization: the asyncpg dialect converts bytes to Python objects -> process_result_value converts to (collections of) Pydantic models

As a result, this implementation is less performant than overriding bind_processor directly due to double serialization, i.e. (collections of) Pydantic models ---> Python objects ---> str instead of (collections of) Pydantic models ---> str. Deserialization is not affected.

from typing import Any, cast, override

from pydantic import TypeAdapter
from sqlalchemy import Dialect, TypeDecorator
from sqlalchemy.dialects.postgresql import JSONB


class PydanticJSONB[T](TypeDecorator):
    impl = JSONB()
    cache_ok = True

    def __init__(self, pydantic_type: type[T]) -> None:
        super().__init__()
        self.adapter = TypeAdapter(pydantic_type)
        self.coerce_compared_value = cast(
            "JSONB", PydanticJSONB.impl
        ).coerce_compared_value

    @override
    def process_bind_param(self, value: T | None, dialect: Dialect) -> Any:
        if value is None:
            return None
        return self.adapter.dump_python(value)

    @override
    def process_result_value(self, value: Any, dialect: Dialect) -> T | None:
        if value is None:
            return None
        return self.adapter.validate_python(value)
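One thing to watch with the `process_bind_param` variant: `dump_python` in its default mode keeps values such as dates as Python objects, which the standard `json` module cannot encode. Whether that matters depends on the JSON serializer configured on the engine. A possible adjustment, sketched here outside SQLAlchemy with a hypothetical `Event` model, is `dump_python(value, mode="json")`.

```python
import json
from datetime import date

from pydantic import BaseModel, TypeAdapter


class Event(BaseModel):
    # Hypothetical model with a field that is not JSON-native.
    name: str
    day: date


adapter = TypeAdapter(list[Event])
events = [Event(name="launch", day=date(2026, 5, 11))]

python_mode = adapter.dump_python(events)            # keeps datetime.date objects
json_mode = adapter.dump_python(events, mode="json")  # converts them to ISO strings

try:
    json.dumps(python_mode)
    python_mode_ok = True
except TypeError:
    # The stdlib encoder does not know how to encode datetime.date.
    python_mode_ok = False

encoded = json.dumps(json_mode)
print(python_mode_ok, encoded)
```

For models that contain only JSON-native types (str, int, float, bool, None, lists, dicts) both modes produce the same result, so the version in the comment above works unchanged.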

Replies: 69 comments
