
If you listen real close you can hear the type whispers in there.
Peewee 4.x is a picture, in code, of me eating my hat. We got async
and are shipping a significantly better type whispering
stub based on typeshed's original. Beyond those, I've added a core JSONField
which exposes a nice cross-backend API for working with JSON data, and a new
declarative API for eager-loading relations.
Peewee 4 is broadly compatible with the 3.x releases, check the changelog for full details.
AsyncIO
To implement asyncio there are a couple of paths authors of synchronous
libraries can take. Django chose the "just shit it into a threadpool" approach,
and as I'm writing this still doesn't support transactions in its async
endpoints, among other quirks like running everything serialized onto a
dedicated thread via sync_to_async (with thread-sensitive set). redis-py uses
the "copy/paste/adjust-to-taste" approach of making a parallel coroutine-based
API, and they have thousands of lines of near-duplication.
Other projects like psycopg do code-gen
to generate sync code from async implementations. All this is to say that
handling async and sync from one codebase often requires making choices which
come with tradeoffs (thread-safety issues, maintenance burden, codegen
stability, etc).
Luckily, Mike Bayer of SQLAlchemy came up with an approach that works well with
libraries like Peewee. This approach uses the greenlet library (not gevent
dammit!) to yield to the event loop whenever I/O occurs. Since Peewee I/O
occurs through a single method (execute_sql), the implementation ended up
being simple: Peewee takes the coroutine, switches control to the async layer
and awaits it on the event loop, and when it's done the result gets passed back
into Peewee transparently.
Example:
import asyncio, time
from playhouse.pwasyncio import AsyncPostgresqlDatabase
db = AsyncPostgresqlDatabase('peewee_test', pool_size=30)
async def slow_query():
async with db: # Acquire a connection for this task.
await db.aexecute_sql('select pg_sleep(1)')
print('finished sleeping')
async def main():
start = time.perf_counter()
await asyncio.gather(*(slow_query() for _ in range(30)))
print(round(time.perf_counter() - start, 2))
await db.close_pool()
asyncio.run(main())
# Prints 1.05 on my machine.
Changing the pool size to 10 correctly prints ~3.05, for example.
In this example, each task checks out its own connection from the pool, hands
pg_sleep(1) to asyncpg, and awaits it, so all thirty tasks run concurrently
on one thread, with no monkeypatching or any nonsense. Connections in pwasyncio
are task-local and maintain their own transactional state, so transactions are
isolated and won't interleave across concurrently-running tasks. Since queries
are awaited like any other coroutine, they also cancel cleanly.
The best part is that the core stays synchronous and, as every query funnels through execute_sql, running queries bridges to the loop in one spot. In short, Peewee uses greenlets to pass coroutines out of synchronous code, so they can be await-ed, at the cost of two lightweight context switches. For more detail check out my earlier post.
My end-goal with pwasyncio was to enable Peewee to work smoothly with
async-first frameworks like FastAPI, Quart, etc. At the time of writing, the
FastAPI docs don't really discuss using an async ORM. SQLModel in the tutorial
uses synchronous endpoints throughout (the threadpool again). With pwasyncio
and the new pydantic helpers,
a fully async FastAPI demo looks like this:
# fastapi_example.py
from fastapi import Depends, FastAPI, HTTPException
from contextlib import asynccontextmanager
from peewee import *
from playhouse.pwasyncio import AsyncPostgresqlDatabase
from playhouse.pydantic_utils import to_pydantic
db = AsyncPostgresqlDatabase('peewee_test')
class User(db.Model):
name = CharField(verbose_name='Full Name', help_text='Display name')
email = CharField(unique=True)
status = IntegerField(default=1, choices=(
(1, 'Active'),
(2, 'Inactive'),
(3, 'Deleted')))
# Generate pydantic schemas suitable for create and responses.
# Schemas include metadata derived from verbose_name, help_text, choices, and
# default value.
UserCreate = to_pydantic(User, model_name='UserCreate')
UserResponse = to_pydantic(User, exclude_autofield=False, model_name='UserResponse')
async def get_db():
# Hold a pooled connection open for the duration of the request.
async with db:
yield db
@asynccontextmanager
async def lifespan(app):
# Create tables (if they don't exist) at application startup.
async with db:
await db.acreate_tables([User])
yield
await db.close_pool() # Shut-down pool and exit.
app = FastAPI(lifespan=lifespan)
@app.get('/users', response_model=list[UserResponse])
async def list_users(db=Depends(get_db)):
rows = await User.select().dicts().aexecute()
return [UserResponse(**row) for row in rows]
@app.post('/users', response_model=UserResponse)
async def create_user(data: UserCreate, db=Depends(get_db)):
user = await User.acreate(**data.model_dump())
return UserResponse.model_validate(user)
@app.get('/users/{user_id}', response_model=UserResponse)
async def get_user(user_id: int, db=Depends(get_db)):
try:
user = await db.get(User.select().where(User.id == user_id))
except User.DoesNotExist:
raise HTTPException(status_code=404, detail='User not found')
return UserResponse.model_validate(user)
To learn more, check out the async documentation.
The framework integration doc
shows other FastAPI patterns such as Dependency Injection, as well as how to
use Peewee with other popular async frameworks like Starlette, Quart, etc.
JSON Field
Peewee 4 adds a core JSONField
which provides a unified API for reading, writing, querying and modifying JSON
data. Prior to this, the JSON fields were scattered across backend-specific playhouse
extensions, and they each had subtly different APIs and behaviors. The biggest
pain-point is differentiating between SQL types and JSON types, e.g.:
-- sqlite
WITH t(j) AS (VALUES ('{"k": "v"}'), ('{"k": 1}'))
SELECT j,
j->'k' AS json_val,
typeof(j->'k') AS json_type,
j->>'k' AS sql_val,
typeof(j->>'k') AS sql_type
FROM t;
╭────────────┬──────────┬───────────┬─────────┬──────────╮
│ j │ json_val │ json_type │ sql_val │ sql_type │
╞════════════╪══════════╪═══════════╪═════════╪══════════╡
│ {"k": "v"} │ "v" │ text │ v │ text │
│ {"k": 1} │ 1 │ text │ 1 │ integer │
╰────────────┴──────────┴───────────┴─────────┴──────────╯
SQLite's ->> returns a native integer for the numeric key, while -> keeps
it as JSON-typed text. Postgres is subtly different: -> always returns
jsonb and ->> always returns text, regardless of the value.
-- postgres
WITH t(j) AS (VALUES ('{"k": "v"}'::jsonb), ('{"k": 1}'::jsonb))
SELECT j,
j->'k' AS json_val,
pg_typeof(j->'k') AS json_type,
j->>'k' AS sql_val,
pg_typeof(j->>'k') AS sql_type
FROM t;
j │ json_val │ json_type │ sql_val │ sql_type
────────────┼──────────┼───────────┼─────────┼──────────
{"k": "v"} │ "v" │ jsonb │ v │ text
{"k": 1} │ 1 │ jsonb │ 1 │ text
MySQL 8 has a real JSON type too, so -> comes back as json. MariaDB has no
JSON type at all (it's an alias for longtext), so everything you extract is
plain text.
The core JSONField provides a consistent API that works across SQLite (3.38+),
Postgres (jsonb), MySQL 8 and MariaDB (pass mariadb=True to MySQLDatabase).
The complexities around SQL-vs-JSON are mostly avoided by treating everything
as a JSON value (-> in SQLite/Postgres, as opposed to ->>). This isn't
perfect, as it requires explicit casts in some places to force numeric ordering
(as opposed to lexicographic), but on the whole it works well and is an
improvement over the pre-existing playhouse implementations.
Here's the model and data I'll use in the examples:
from peewee import *
db = PostgresqlDatabase('peewee_test') # or SqliteDatabase / MySQLDatabase.
class Pet(db.Model):
name = TextField()
data = JSONField()
with db.atomic():
db.create_tables([Pet])
Pet.create(name='Huey', data={
'species': 'cat',
'age': 14,
'toys': ['hard-fur-mouse', 'ball'],
'vitals': {'weight': 8, 'vet': 'Dr. Quinn'}})
Pet.create(name='Mickey', data={
'species': 'dog',
'age': 10,
'toys': ['bone', 'rope'],
'vitals': {'weight': 40}})
Pet.create(name='Zaizee', data={
'species': 'cat',
'age': 2,
'toys': ['feather'],
'vitals': {'weight': 15, 'vet': 'Dr. Quinn'}})
Peewee supports reading the data back, traversing paths, and doing comparisons:
Pet.get(Pet.name == 'Huey').data
# Poor Huey, he's getting to be an old boy. He goes kinda slow down
# the stairs these days.
{'age': 14,
'toys': ['hard-fur-mouse', 'ball'],
'vitals': {'vet': 'Dr. Quinn', 'weight': 8},
'species': 'cat'}
# Each pet's best toy.
query = Pet.select(Pet.name, Pet.data['toys'][0])
list(query.tuples())
# Result:
[('Huey', 'hard-fur-mouse'), ('Mickey', 'bone'), ('Zaizee', 'feather')]
# How many of each species do we have? My Aunt and Uncle in Connecticut
# used to have like 30 pets, including a (real) prairie dog! Could I
# have a future in animal hoarding? At what point do they just start
# eating each other, I wonder?
query = (Pet
.select(Pet.data['species'].alias('species'),
fn.COUNT(Pet.id).alias('count'))
.group_by(Pet.data['species']))
[(p.species, p.count) for p in query]
# Result (amateur numbers):
[('cat', 2), ('dog', 1)]
# What cats do we have, again?
[p.name for p in Pet.select().where(Pet.data['species'] == 'cat')]
# Result:
['Huey', 'Zaizee']
# In order to avoid lexicographic comparison (SQLite and MariaDB), when
# comparing against numbers add `.as_int()` (or `.as_float()`):
[p.name for p in Pet.select().where(Pet.data['age'].as_int() > 5)]
# Result:
['Huey', 'Mickey']
There are also helpers for doing atomic mutations on JSON values:
# Give Huey some yarn, since he's getting old.
(Pet.update(data=Pet.data['toys'].append('yarn'))
.where(Pet.name == 'Huey')
.execute())
# Huey's new prescription food has more nutrition in it, I think.
(Pet.update(data=Pet.data['vitals']['weight'].set(9))
.where(Pet.name == 'Huey')
.execute())
Pet.get(Pet.name == 'Huey').data
# Result:
{'age': 14,
'toys': ['hard-fur-mouse', 'ball', 'yarn'],
'vitals': {'vet': 'Dr. Quinn', 'weight': 9},
'species': 'cat'}
Finally, MySQL, MariaDB and Postgres offer containment operations, so you can
search by key or partial document.
has_vet = Pet.select().where(Pet.data['vitals'].has_key('vet'))
[p.name for p in has_vet]
# Cats are, generally, sicklier than dogs:
['Huey', 'Zaizee']
# Partial-doc containment:
cats = Pet.select().where(Pet.data.contains({'species': 'cat'}))
[p.name for p in cats]
# Result:
['Huey', 'Zaizee']
Instead of learning one API per database, there's now one canonical JSON API.
Unfortunately, though, there are still some little quirks that ended up being
too awkward to paper over. The main one to be aware of is that Postgres ships a
shallow update, while SQLite and MySQL offer full merge/patch nested updates.
The backend-specific fields in playhouse aren't going anywhere, if you prefer
to continue using those, however the new JSONField is more robust and consistent
in how it differentiates between SQL and JSON-typed values, and should be
preferred whenever possible.
For more details, see the JSONField docs.
Eager Loading
For as long as I can remember, Peewee has provided eager-loading of relations
via JOINs across the "forward" direction of a foreign-key. For example, when
displaying a list of Tweets, also display the author's username:
query = (Tweet
.select(Tweet, User)
.join(User))
for tweet in query:
print(tweet.user.username, '->', tweet.content)
But the opposite direction has always been a bit trickier: for each user in a
list, also list out their tweets. I added prefetch() many years ago to
address this need. Behind-the-scenes, prefetch executes one query per relation
being fetched, and then merges the related rows back together in Python
efficiently. The implementation is reasonably solid, but it suffers from a few
defects. First, prefetch is not declarative - it evaluates its parameters
immediately and doesn't play very nicely with other ORM APIs. Second, a very
common scenario is to only prefetch X number of rows per parent. This is
particularly tricky to get right, as it requires using CTEs.
For Peewee 4, I've added a new declarative API for controlling eager-loading of
relations. The with_related()
method, together with Load(), supports nestable eager-loading using one of
several strategies, and also supports a limit-per-parent on the child relation.
The examples below will use my old standbys:
class User(db.Model):
username = TextField()
class Tweet(db.Model):
user = ForeignKeyField(User, backref='tweets')
content = TextField()
timestamp = TimestampField()
class Favorite(db.Model):
tweet = ForeignKeyField(Tweet, backref='favorites')
user = ForeignKeyField(User, backref='favorites')
For the first example, let's fetch a list of users and their tweets. To do
this, we'll specify that we want to Load() the backref User.tweets:
query = User.select().with_related(Load(User.tweets))
for user in query:
print(user.username, [t.content for t in user.tweets])
# Huey ['meow', 'purr', 'hiss']
# Mickey ['woof', 'bark']
# Zaizee []
# We can also specify a query to filter the relation. Let's ignore
# Mickey and see everyone else along with what they're saying in 2026.
tweets_query = Tweet.select().where(Tweet.timestamp.year >= 2026)
query = (User
.select()
.where(User.username != 'Mickey')
.with_related(Load(User.tweets, tweets_query)))
These examples run two queries no matter how many users are found. The query
on the Tweets table is filtered by the Users from the outer query:
-- First example.
SELECT "id", "username" FROM "user";
SELECT * FROM "tweet" WHERE "user_id" IN (
SELECT "id" FROM "user"
);
-- Second example.
SELECT "id", "username" FROM "user" WHERE "username" != 'Mickey';
SELECT * FROM "tweet" WHERE "timestamp" >= ... AND "user_id" IN (
SELECT "id" FROM "user"
WHERE "username" != 'Mickey'
);
The primary use-case is for eagerly-loading backrefs, but the Load helper can
also fetch forward foreign-key references. The rows are loaded once, the first
time the query is evaluated, regardless of whether you're iterating, calling
get(), indexing, taking len(), etc., so it plays nicely with the rest of
Peewee's APIs.
To go deeper than one level, relations can be nested with Load.then(). The
example below efficiently fetches Users, their tweets, and the favorites on
each tweet:
query = User.select().with_related(
Load(User.tweets).then(
Load(Tweet.favorites)))
for user in query:
for tweet in user.tweets:
print(user.username, tweet.content, len(tweet.favorites))
# Huey meow 1
# Huey purr 0
# Huey hiss 2
# Mickey woof 1
# Mickey bark 0
Each Load can also specify its own query for filtering, ordering and joining:
# Each tweet's favorites, with the favoriting user loaded in the same query:
favorites = Favorite.select(Favorite, User).join(User)
query = (Tweet
.select()
.with_related(Load(Tweet.favorites, favorites))
.order_by(Tweet.content))
for tweet in query:
print(tweet.content, [f.user.username for f in tweet.favorites])
# bark []
# hiss ['Mickey', 'Zaizee']
# meow ['Zaizee']
# purr []
# woof ['Huey']
Because the favorites query is a real query, accessing f.user.username is
free since the favoriting users were joined and selected.
The piece I'm happiest with is per_parent. A plain LIMIT on an eager load
applies to the relation as a whole, so limit(2) gives you two tweets total, not
two per user. Getting the two most-recent tweets for each user is a top-N-per-group
problem, and, as I mentioned, those are fiddly to write by hand. per_parent=n
does this with a window function, still in a single query:
recent = Tweet.select().order_by(Tweet.timestamp.desc())
query = User.select().with_related(
Load(User.tweets, recent, per_parent=2))
for user in query:
print(user.username, [t.content for t in user.tweets])
# Huey ['hiss', 'purr']
# Mickey ['bark', 'woof']
# Zaizee []
The last piece is the loading strategy. The default strategy embeds the parent
query as an IN (SELECT ...) subquery. PREFETCH_TYPE.JOIN joins against the
parent query as a derived table instead, which you need when paginating the
parent on MySQL/MariaDB (since neither will accept a LIMIT inside an IN
subquery). PREFETCH_TYPE.MATERIALIZE skips the subquery entirely and sends
the parent keys it already holds in memory as a list of values:
Load(User.tweets, strategy=PREFETCH_TYPE.MATERIALIZE)
# SELECT * FROM "tweet" WHERE "user_id" IN (1, 2, 3)
Materialize avoids re-running the parent query, at the cost of one bind
parameter per parent, so it is suitable for expensive parent queries that
return a manageable number of rows.
For more details, see the eager-loading docs.
Types
The final piece is also the one I deserve the most flak for not doing sooner,
but yeah, type whispers have finally
arrived in-tree... Well, kinda. It's a long story, but trying to get typing
into Peewee is actually a pain-in-the-ass because we ship a py-module instead
of a package, so there's no place to put a py.typed marker. I originally
began by making some improvements to the typeshed stub, but my patch was just
sitting for a while, so I started digging and figured out we could just vendor
the stub in a top-level peewee-stubs/ package built as part of the
wheel/sdist. mypy and pyright both pick it up and immediately start using it.
As of 4.1.1, Peewee will ship with this improved type stub, which (finally)
properly handles differentiating between Model.field and instance.field:
class User(Model):
username = CharField()
age = IntegerField(null=True) # Type stub respects null= option.
class Tweet(Model):
user = ForeignKeyField(User)
reveal_type(User.username) # CharField[str] (querying, etc)
reveal_type(User().username) # str (the actual username value)
reveal_type(User.age) # IntegerField[int | None]
reveal_type(User().age) # int | None
reveal_type(Tweet.user) # ForeignKeyField[User]
reveal_type(Tweet().user) # User
Anecdotally, type-checking for ORMs like Peewee is kinda tricky. Django has an
elaborate mypy plugin, and
SQLAlchemy chose to redesign their surface to better accommodate checkers in 2.0.
While the stub isn't perfect, my hope is that it hits the 80% use-case neatly,
and provides a stable base for future improvement.
What's next
I plan to continue to improve the asyncio implementation if adoption grows and
requests start showing up in the issue tracker. I'm still a little dubious
whether it will actually see adoption, or if people will continue using
the synchronous endpoint + threadpool pattern with frameworks like FastAPI.
Beyond that, my plan is to continue improving Peewee one piece at a time.
Maybe migrations, although I don't know, that one's pretty fraught. At work we
have schema changes every release, and it seems to always be the case that
there are subtle wrinkles that prevent a naive approach: huge tables that
require adding the column, default value, and constraint in steps, indexes that
need to be added concurrently outside the main DDL transaction, business logic
to migrate or update data in new or altered columns... We'll see, though, there
is certainly room for some improvement.
I've been happy to make concessions to async and typing. For better or worse,
they are mainstream Python, and I don't want to diminish Peewee's utility for
an aesthetic pose. One fashion I will continue to reject,
though, is the cottage industry that's grown up around importing Spring Data,
Fowler, and Enterprise Java Patterns into Python. Repositories
around sessions, services around repositories, DTO around models, generics
at every layer, all to avoid User.select(). Nobody wants spaghetti code, but
check out the lasagna and the baking-sheets covered in tiny dry raviolis.
Peewee's philosophy from the start is that the ORM is the abstraction over the
database, and business logic lives in a class or function wrapped with db.atomic().
That's one hat I will not eat, unless possibly it was made out of candy.