Building a Pythonic REST Client Without Pydantic, dataclasses, or Code Generation
GoFigr's Python client has survived four major API versions, a data model that grew from 5 entity types to 15+, and the daily demands of data scientists who don't care about REST conventions -- they just want workspace.analyses to work. The entire framework that makes this possible is ~500 lines of code, and adding a new entity type takes about ten lines and ten minutes.
This post is about how we got there. Most Python API clients land in one of two places. The first is a thin wrapper that returns raw dictionaries, forcing users to memorize key names and manually chase nested objects:
```python
response = client.get("/workspace/abc-123/")
workspace = response.json()
for analysis_id in workspace["analyses"]:
    analysis = client.get(f"/analysis/{analysis_id}/").json()
    for figure_id in analysis["figures"]:
        figure = client.get(f"/figure/{figure_id}/").json()
        print(figure["name"])
```

The second is a client generated from an OpenAPI spec using tools like openapi-generator or openapi-python-client. These produce thousands of lines of boilerplate where every model is a rigid class with dozens of fields, no relationship navigation, and an interface that feels like it was designed for Java.
We built something different: a small framework using a custom Field system, a ModelMixin base class, Python descriptors, and a runtime model-binding trick to create a client that reads like an ORM:
```python
gf = GoFigr(api_key="...")

# Navigate the hierarchy naturally
for workspace in gf.workspaces:
    for analysis in workspace.analyses:
        for figure in analysis.figures:
            print(figure.name)

# Find-or-create is a single call
analysis = workspace.get_analysis(name="Experiment 1", create=True)

# Create objects with bound references -- no passing the client around
fig = gf.Figure(name="Results", analysis=analysis)
fig.create()
```

And the developer experience is just as concise. Adding a new model type to the client is a field list and an endpoint string -- everything else (CRUD, serialization, relationship navigation, lazy loading) is inherited:
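The original listing isn't reproduced here, so the following is a sketch of what such a definition looks like. The stub base classes stand in for the real GoFigr framework, the exact field list is an assumption based on entities named in this post, and the extra mixins mentioned below (sharing, activity logs) are omitted:

```python
# Sketch only: minimal stubs standing in for the real framework classes.
class Field:
    def __init__(self, name, derived=False):
        self.name, self.derived = name, derived

class LinkedEntityField(Field):
    def __init__(self, name, entity_type, many=False,
                 backlink_property=None, derived=False):
        super().__init__(name, derived=derived)
        self.entity_type = entity_type
        self.many = many
        self.backlink_property = backlink_property

class ModelMixin:
    fields, endpoint = [], ""

# The actual work of adding a model: a field list and an endpoint string.
class gf_Analysis(ModelMixin):
    endpoint = "analysis/"
    fields = ["api_id", "name", "description",
              LinkedEntityField("workspace", lambda gf: gf.Workspace),
              LinkedEntityField("figures", lambda gf: gf.Figure, many=True,
                                backlink_property="analysis", derived=True)]
```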
That's the complete definition. Seven lines give you a fully functional model with .fetch(), .save(), .create(), .delete(), .to_json(), .from_json(), bidirectional relationship navigation (analysis.workspace, analysis.figures), find-or-create on child collections, link sharing, activity logs, and thumbnails. The mixins compose orthogonally, and adding a new entity to the system takes minutes, not hours.
This post walks through how it works, why we built it this way instead of using Pydantic or code generation, and the specific Python techniques that make it possible.
The design constraints
GoFigr's data model is a hierarchy: Workspaces contain Analyses, which contain Figures, which contain Revisions. Revisions hold polymorphic Data objects (images, code, tables, files, text). There are also cross-cutting concerns: sharing, memberships, activity logs, thumbnails, and assets.
The client library needs to:
- Navigate relationships in both directions -- figure.analysis and analysis.figures
- Handle polymorphic data -- a revision's data list contains a mix of images, code snippets, tables, and files, each with type-specific properties
- Support lazy and eager loading -- list endpoints return shallow objects (API IDs only for children), detail endpoints return nested objects
- Serialize and deserialize transparently -- timestamps, base64 blobs, UUIDs, and nested entities all need bidirectional conversion
- Avoid passing the client instance everywhere -- gf.Figure(name="x") should work without Figure(name="x", client=gf)
- Stay small -- this is a pip-installable library for data scientists, not a framework
Why not Pydantic?
Pydantic is the obvious first choice for Python data modeling. It handles validation, serialization, and type coercion well. But it creates friction in several places for an API client like this:
Mutable objects with server-side state. Pydantic models are designed to validate data at construction time. API entities are different: you create a Figure(name="Results") with most fields unset, call .create(), and the server fills in api_id, created_on, created_by, and so on. This partial-construction pattern fights Pydantic's validation-at-init design: you'd need Optional on nearly every field and lose the validation benefits, or split the models into a domain layer and a DTO layer, which would just double the boilerplate.
Bidirectional relationships with lazy loading. When the server returns a workspace, analyses might be a list of full objects (detail endpoint) or a list of API ID strings (list endpoint). The same field needs to hold either form and seamlessly resolve one to the other. Pydantic's validators can handle this, but the model definitions become cluttered with Union[str, Analysis, None] types and custom validators that obscure the simple underlying structure.
Client binding. Every model instance needs a reference to the GoFigr client to make HTTP requests (.fetch(), .save(), .create()). Pydantic models don't naturally carry a reference to their "service layer." You either inject it post-construction (awkward), thread it through a context variable (implicit), or design a separate repository/service layer (more code). GoFigr's approach -- binding it at class creation time -- is more direct.
Polymorphic deserialization. A revision's data field contains a mix of ImageData, CodeData, TableData, FileData, and TextData objects. The type is determined by a type field in the JSON. Pydantic v2 supports discriminated unions, but the ergonomics aren't ideal when you want revision.image_data to return a filtered, typed list and revision.image_data = [...] to replace only the image entries.
None of this means Pydantic is wrong for API clients in general. For read-heavy clients with immutable responses, it's excellent. But for a client that's also a builder, navigator, and mutator of a complex entity graph, the impedance mismatch adds up.
Why not code generation?
OpenAPI code generators (openapi-generator, openapi-python-client, Fern, Stainless) produce clients from a spec. This guarantees API coverage and type correctness, but:
The generated code is the wrong abstraction level. Generated clients mirror the HTTP API: one method per endpoint, flat parameter lists, no relationship navigation. client.list_analyses(workspace_id="...") instead of workspace.analyses. The user mentally translates between the object graph they're thinking in and the flat API they're calling.
Maintenance is bidirectional. Every API change requires regenerating the client, reviewing the diff, and resolving conflicts with any hand-written extensions. For a small team, maintaining the generator configuration, custom templates, and post-generation patches costs more than maintaining 500 lines of framework code.
No room for domain-specific sugar. workspace.get_analysis(name="x", create=True) -- find by name, create if missing -- is one line in the GoFigr client. In a generated client, it's a try/catch around a list call, a filter, and a conditional create call. The difference matters when your users are data scientists who want to focus on their analysis, not on API plumbing.
The Field system
The foundation is a Field class hierarchy that handles bidirectional conversion between Python objects and JSON. The base Field is an identity transform; subclasses override the two conversion methods:
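The original listing isn't shown here, so this is a minimal self-contained sketch of the idea. The method names to_representation / to_internal_value and the derived flag match those described in this post, but the bodies are assumptions:

```python
import base64
from datetime import datetime

class Field:
    """Identity transform between wire format and Python objects."""
    def __init__(self, name, derived=False):
        self.name = name
        self.derived = derived  # server-sent, but never echoed back

    def to_representation(self, value):      # Python -> JSON
        return value

    def to_internal_value(self, gf, value):  # JSON -> Python (gf = client, for context)
        return value

class Timestamp(Field):
    def to_representation(self, value):
        return value.isoformat() if value is not None else None

    def to_internal_value(self, gf, value):
        return datetime.fromisoformat(value) if value is not None else None

class Base64Field(Field):
    def to_representation(self, value):
        return base64.b64encode(value).decode("ascii") if value is not None else None

    def to_internal_value(self, gf, value):
        return base64.b64decode(value) if value is not None else None
```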
This is deliberately simpler than both Pydantic field types and Django REST Framework serializer fields. There's no validation, no schema generation, no OpenAPI integration. Each field does exactly one thing: convert between wire format and Python objects. The derived flag marks fields that come from the server but shouldn't be sent back (like computed properties or reverse relationships).
Why not use descriptors for all fields?
Python descriptors (__get__/__set__/__delete__) would be a natural fit for fields, and GoFigr does use them for one specific case (covered below). But the main field system needs something descriptors don't provide: fields that can be iterated as a collection.
Serialization and deserialization both work by looping over fields. to_json() iterates self.fields.values() and calls to_representation() on each one. _update_properties() does the reverse, iterating and calling to_internal_value(). With descriptors, you'd need a separate registry alongside them to enumerate which attributes participate in serialization -- two parallel structures to keep in sync.
There's a second problem: deserialization needs context that descriptors don't receive. LinkedEntityField.to_internal_value(gf, data) takes the GoFigr client instance as an argument so it can resolve model classes (e.g., lambda gf: gf.Workspace). A descriptor's __set__ only receives (self, instance, value) -- there's no natural way to pass that context. You'd have to split deserialization out of the field objects entirely, losing the cohesion of having each field own its own conversion logic.
Finally, half the fields don't need descriptor behavior at all. Fields like "api_id", "name", and "description" are defined as bare strings -- shorthand for Field("name"), an identity transform. They're just regular attributes that happen to participate in serialization.
So instead of descriptors, each ModelMixin.__init__ clones the field specifications and stores them as an instance-level dictionary:
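In sketch form, assuming copy.copy as the cloning mechanism (the real code may clone differently, and the backlink bookkeeping is omitted here):

```python
import copy

class Field:
    def __init__(self, name, derived=False):
        self.name, self.derived = name, derived

class ModelMixin:
    fields = []  # class-level schema definition

    def __init__(self, **kwargs):
        # Normalize bare strings to identity Fields, then clone each spec
        # so this instance owns its own field objects.
        self.fields = {}
        for spec in type(self).fields:
            field = Field(spec) if isinstance(spec, str) else copy.copy(spec)
            self.fields[field.name] = field
            setattr(self, field.name, kwargs.get(field.name))

class gf_Workspace(ModelMixin):
    fields = ["api_id", "name", Field("description")]
```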
The class-level fields list is the schema definition. At init time, it's cloned into an instance-level dictionary so that each instance has its own field objects -- important because LinkedEntityField stores a reference to its owning instance for setting up backlinks on child collections.
Linked entities and the mini-ORM
The most interesting field type is LinkedEntityField, which handles relationships between API entities:
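The listing is omitted here; the following sketch shows the shape of the class under stated assumptions (the stand-in Analysis and Client classes are hypothetical, and only the behaviors described in this post -- the deferred entity_type callable, the many flag, and shallow/deep handling -- are taken from the source):

```python
class LinkedEntityField:
    """Sketch of a relationship field. entity_type is deferred: a callable
    mapping the client instance to a bound model class."""
    def __init__(self, name, entity_type, many=False,
                 backlink_property=None, derived=False):
        self.name = name
        self.entity_type = entity_type  # e.g. lambda gf: gf.Analysis
        self.many = many
        self.backlink_property = backlink_property
        self.derived = derived

    def to_internal_value(self, gf, value):
        model = self.entity_type(gf)  # resolve the model class at runtime
        def build(item):
            if isinstance(item, str):   # shallow: API ID only
                return model(api_id=item)
            return model(**item)        # deep: full JSON object
        if self.many:
            return [build(v) for v in (value or [])]
        return build(value) if value is not None else None

# Minimal stand-ins for a bound model class and the client:
class Analysis:
    def __init__(self, api_id=None, name=None):
        self.api_id, self.name = api_id, name

class Client:
    Analysis = Analysis

gf = Client()
field = LinkedEntityField("analyses", lambda gf: gf.Analysis, many=True)
shallow = field.to_internal_value(gf, ["uuid-1", "uuid-2"])
deep = field.to_internal_value(gf, [{"api_id": "uuid-3", "name": "Exp 1"}])
```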
The entity_type parameter is a callable that takes a GoFigr instance and returns a model class -- a lambda like lambda gf: gf.Workspace. This deferred resolution is necessary because model classes don't exist yet when field definitions are evaluated (they're created at runtime by _bind_models -- more on this below).
When deserializing, LinkedEntityField handles both shallow (API ID string) and deep (full JSON object) representations:
If the server sends "analyses": ["uuid-1", "uuid-2"], each becomes an Analysis(api_id="uuid-1") -- a shallow object that can be .fetch()'d later. If the server sends full objects ("analyses": [{...}, {...}]), each is fully populated immediately.
For one-to-many relationships (many=True), entities are wrapped in a LinkedEntityCollection:
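A self-contained sketch of the collection wrapper, with a stub Analysis class standing in for a real model (find_or_create's exact signature is an assumption):

```python
class LinkedEntityCollection:
    """Sketch of an ORM-like collection with parent backlinks."""
    def __init__(self, entities, backlink_property=None, parent=None):
        self._entities = list(entities)
        self._backlink_property = backlink_property
        self._parent = parent

    def __iter__(self):
        return iter(self._entities)

    def __len__(self):
        return len(self._entities)

    def find(self, **criteria):
        """Search locally by exact attribute match."""
        for entity in self._entities:
            if all(getattr(entity, k, None) == v for k, v in criteria.items()):
                return entity
        return None

    def create(self, entity):
        """Set the parent backlink, persist server-side, append locally."""
        if self._backlink_property:
            setattr(entity, self._backlink_property, self._parent)
        entity.create()  # stands in for the HTTP POST
        self._entities.append(entity)
        return entity

    def find_or_create(self, entity, **criteria):
        return self.find(**criteria) or self.create(entity)

class Analysis:
    def __init__(self, name):
        self.name, self.workspace, self.created = name, None, False
    def create(self):
        self.created = True

workspace = "the-parent-workspace"
analyses = LinkedEntityCollection([Analysis("Experiment 1")],
                                  backlink_property="workspace",
                                  parent=workspace)
new = analyses.find_or_create(Analysis("Experiment 2"), name="Experiment 2")
```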
This gives collections an ORM-like interface. workspace.analyses.find(name="Experiment 1") searches locally; workspace.analyses.create(gf.Analysis(name="New")) creates on the server and appends locally. The backlink_property automatically sets the parent reference -- when you create an analysis through workspace.analyses, the analysis's workspace field points back to the workspace.
Here's how a model definition looks in practice:
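The original definition isn't shown; a plausible reconstruction, with compact stub base classes (assumptions) so it runs standalone and a field list inferred from entities named elsewhere in this post:

```python
# Compact stubs (assumptions) standing in for the real framework.
class Field:
    def __init__(self, name, derived=False):
        self.name, self.derived = name, derived

class Timestamp(Field):
    pass

class LinkedEntityField(Field):
    def __init__(self, name, entity_type, many=False,
                 backlink_property=None, derived=False):
        super().__init__(name, derived)
        self.entity_type = entity_type
        self.many = many
        self.backlink_property = backlink_property

class ModelMixin:
    fields, endpoint = [], ""

class gf_Workspace(ModelMixin):
    endpoint = "workspace/"
    fields = [
        "api_id", "name", "description",
        Timestamp("created_on", derived=True),
        Timestamp("updated_on", derived=True),
        LinkedEntityField("analyses", lambda gf: gf.Analysis, many=True,
                          backlink_property="workspace", derived=True),
    ]
```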
The entire data model -- workspaces, analyses, figures, revisions, assets, data objects, invitations, API keys -- is defined this way. Each model class is ~10-20 lines: a field list, an endpoint string, and any entity-specific methods. Compare this with the typical Pydantic model where you'd need field definitions, validators, serialization config, and separate service methods.
The binding trick
Every model needs a reference to the GoFigr client to make HTTP requests. When you call figure.save(), the model has to know which server to talk to and which credentials to use. The question is how that reference gets there.
The obvious approaches all have drawbacks:
- Passing the client explicitly: Figure(name="Results", client=gf) -- verbose, easy to forget, and every constructor call has to carry this boilerplate
- Module-level singleton: gofigr.configure(api_key="...") then Figure(name="Results") -- implicit global state, and you can't have two clients talking to different servers
- Context variable: with gf.context(): Figure(name="Results") -- fragile in async code, easy to leak across scopes
- Factory methods: gf.create_figure(name="Results") -- no polymorphism, and the client class explodes with one method per entity type
GoFigr solves this with _bind_models, called in GoFigr.__init__:
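The listing isn't reproduced here, but the trick can be sketched in a self-contained form. The real implementation scans the gofigr models module; this sketch scans globals() instead so it runs standalone, and the model classes are illustrative stubs:

```python
class ModelMixin:
    _gf = None  # bound client, filled in per-client by _bind_models

# Example unbound model classes, named with the gf_ prefix by convention.
class gf_Workspace(ModelMixin): pass
class gf_Figure(ModelMixin): pass

class GoFigr:
    def __init__(self, api_key=None):
        self.api_key = api_key
        self._bind_models()

    def _bind_models(self):
        for name, cls in list(globals().items()):
            if isinstance(cls, type) and issubclass(cls, ModelMixin) \
                    and name.startswith("gf_"):
                # Create a subclass with _gf pre-bound to this client,
                # renamed for clean display (gf_Figure -> Figure).
                bound = type(name.replace("gf_", ""), (cls,), {"_gf": self})
                setattr(self, bound.__name__, bound)

# Two clients coexist, each with its own set of bound model classes.
gf1, gf2 = GoFigr(api_key="key1"), GoFigr(api_key="key2")
```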
This iterates over every ModelMixin subclass in the module (named with a gf_ prefix by convention), creates a new subclass with _gf bound to the current GoFigr instance, renames it for clean display, and attaches it as an attribute of the client.
After this runs, gf.Figure is not gf_Figure -- it's a dynamically created subclass of gf_Figure where _gf is already set. When you write gf.Figure(name="Results"), the __init__ inherited from ModelMixin has access to self._gf without you passing it.
This is a form of partial application at the class level. Instead of functools.partial on a function, we're creating a "partially applied class" where one constructor argument is pre-filled. You get the ergonomics of a singleton with the correctness of dependency injection. Multiple GoFigr instances can coexist, each with its own set of bound model classes. And because _Bound is a real subclass, isinstance(gf.Figure(...), gf_Figure) returns True -- type checking works as expected.
Why the gf_ prefix?
Model classes are defined with a gf_ prefix (gf_Workspace, gf_Figure) that gets stripped during binding. This is a practical convention, not an aesthetic choice:

- It prevents name collisions with common names (Data, Asset, and Figure are all common in scientific Python).
- It makes the binding code simple: name.replace("gf_", "") is more reliable than trying to filter a list of "model-ish" names.
- The prefix classes are internal -- users only interact with the bound versions (gf.Figure, gf.Workspace).
Descriptors for embedded metadata
While most fields use the Field system, one case calls for Python descriptors: fields that live inside a metadata JSON blob rather than as top-level attributes.
Data objects (images, code, tables, files) share a common structure with api_id, name, type, data (bytes), and a metadata JSON field. But each data type has its own metadata keys: images have format and is_watermarked, code has language and encoding, tables have format and encoding.
MetadataProxyField is a descriptor that reads from and writes to the nested metadata dictionary, so usage stays clean:
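A minimal self-contained sketch of both the descriptor and its use; the class layout is an assumption, and only the behavior described here (proxying into metadata["..."]) is taken from the post:

```python
class MetadataProxyField:
    """Descriptor proxying attribute access into instance.metadata[key]."""
    def __init__(self, key, default=None):
        self.key = key
        self.default = default

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return (instance.metadata or {}).get(self.key, self.default)

    def __set__(self, instance, value):
        if instance.metadata is None:
            instance.metadata = {}
        instance.metadata[self.key] = value

class ImageData:
    format = MetadataProxyField("format")
    is_watermarked = MetadataProxyField("is_watermarked", default=False)

    def __init__(self, metadata=None):
        self.metadata = metadata or {}

img = ImageData({"format": "png"})
```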
From the user's perspective, image.format and image.is_watermarked look like regular attributes. Under the hood, they read from and write to image.metadata["format"] and image.metadata["is_watermarked"]. When the object is serialized, the metadata dict is sent as a single JSON field -- the server doesn't need to know about the individual keys.
This pattern is useful whenever you have a structured blob (JSON, HSTORE, protocol buffer Any) where you want attribute-style access to known keys without losing the ability to store arbitrary additional metadata. Django's JSONField has a similar need, and libraries like django-jsonfield-backport use comparable approaches.
Polymorphic data objects
Revisions store a flat list of data objects with mixed types. The API returns them all as generic Data objects with a type discriminator. After deserialization, _update_properties in RevisionMixin specializes each one by calling the specialize() method on gf_Data, which acts as a factory:
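A sketch of the factory idea. As noted below, the real specialize() resolves classes through the bound client (self.client.ImageData); this sketch uses plain classes and a hypothetical TYPES map for self-containment:

```python
class Data:
    TYPES = {}  # filled in below: discriminator string -> subclass

    def __init__(self, type=None, metadata=None, **fields):
        self.type, self.metadata, self.fields = type, metadata or {}, fields

    def specialize(self):
        """Re-wrap this generic object as its type-specific subclass."""
        cls = self.TYPES.get(self.type, Data)
        return cls(type=self.type, metadata=self.metadata, **self.fields)

class ImageData(Data): pass
class CodeData(Data): pass
class TableData(Data): pass

Data.TYPES = {"image": ImageData, "code": CodeData, "table": TableData}

# What RevisionMixin._update_properties would do after deserialization:
raw = [Data(type="image", name="fig.png"), Data(type="code", name="run.py")]
specialized = [d.specialize() for d in raw]
```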
Note that this uses self.client.ImageData (the bound class), not gf_ImageData directly. This ensures the specialized instances carry the correct _gf reference.
The revision then provides filtered property access, and the setter uses _replace_data_type, which preserves all data of other types:
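In sketch form, with stub data classes (the real client discriminates by the type field rather than by Python class, so this is an approximation of the mechanism, not the actual source):

```python
class ImageData:
    def __init__(self, name): self.name = name

class CodeData:
    def __init__(self, name): self.name = name

class Revision:
    """Sketch: a flat data list with typed, filtered views."""
    def __init__(self, data=None):
        self.data = list(data or [])

    def _replace_data_type(self, data_type, new_items):
        # Keep everything NOT of this type, then append the new items.
        self.data = [d for d in self.data
                     if not isinstance(d, data_type)] + list(new_items)

    @property
    def image_data(self):
        return [d for d in self.data if isinstance(d, ImageData)]

    @image_data.setter
    def image_data(self, items):
        self._replace_data_type(ImageData, items)

rev = Revision([ImageData("old.png"), CodeData("run.py")])
rev.image_data = [ImageData("new.png")]
```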
This means revision.image_data = [new_image] replaces only images, leaving code, tables, and files untouched. The flat internal list acts as a union type, while the typed properties provide a discriminated view. Compare this to Pydantic's discriminated unions, which enforce type at construction but don't provide filtered views or partial replacement.
Shallow vs. deep: a bandwidth optimization
API entities exist in two states. A shallow object has an api_id but no payload data -- it's a reference that can be resolved later. A deep object has all its fields populated. The API uses this distinction for efficiency: list endpoints return shallow children, detail endpoints return deep objects.
The _restore_data pattern in RevisionMixin handles a subtle interaction between shallow responses and local state:
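The listing isn't shown, but the snapshot-and-restore idea can be sketched like this (every name other than _restore_data is a stand-in, and the fake round-trip simulates the server's shallow response):

```python
class Data:
    def __init__(self, api_id, payload=None):
        self.api_id, self.data = api_id, payload  # payload: raw bytes

class Revision:
    def __init__(self, data):
        self.data = data

    def save(self):
        # Snapshot local payloads keyed by api_id before the round-trip.
        snapshot = {d.api_id: d.data for d in self.data}
        self._server_round_trip()
        # The server answered with shallow copies (no bytes); restore them.
        self._restore_data(snapshot)

    def _server_round_trip(self):
        # Stand-in for the HTTP call: the response overwrites local state
        # with shallow data objects whose payloads are empty.
        self.data = [Data(d.api_id, payload=None) for d in self.data]

    def _restore_data(self, snapshot):
        for d in self.data:
            if d.data is None and d.api_id in snapshot:
                d.data = snapshot[d.api_id]

rev = Revision([Data("uuid-1", payload=b"\x89PNG...")])
rev.save()
```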
When you create or save a revision, you send full data objects (with bytes). The server responds with a shallow copy (no bytes) to save bandwidth. Without _restore_data, the response would overwrite the local data with empty shells, and subsequent operations would fail. The snapshot-and-restore pattern preserves the local byte payloads across save round-trips.
What this adds up to
The complete framework is roughly:
- ~110 lines for the Field hierarchy (Field, Timestamp, Base64Field, JSONField, LinkedEntityField, DataField)
- ~180 lines for ModelMixin (CRUD, serialization, field management)
- ~80 lines for LinkedEntityCollection (find, find_or_create, create)
- ~45 lines for MetadataProxyField (descriptor)
- ~20 lines for _bind_models (runtime class creation)
Total: ~435 lines of framework, supporting 15+ model classes with full CRUD, relationship navigation, polymorphic data, lazy loading, and serialization.
The equivalent in Pydantic + a service layer would likely be 2-3x the code. A generated client would be 10-20x but with less functionality (no relationship navigation, no find-or-create, no bound instances). The tradeoff is that this is a custom micro-framework -- new team members need to learn it. But at ~435 lines, reading the source is faster than reading Pydantic's documentation.
When to use this pattern
This approach works well when:
- Your API has a rich entity graph with relationships, not just flat resources
- Entities are mutable and go through lifecycles (create → modify → save)
- You need polymorphic types that the server discriminates
- Your user base values ergonomics over strict type safety
- The team is small enough that a custom mini-framework is maintainable
It works less well when:
- You need strict runtime validation (use Pydantic or attrs)
- You want static type checking with mypy/pyright (the dynamic binding defeats inference)
- The API surface is huge and changes frequently (use code generation)
- Multiple languages need clients from the same spec (use OpenAPI)
For GoFigr, the tradeoff has been worth it. The client reads like an ORM, the source fits in two files, and we've maintained it across 4 major API versions with minimal churn. Sometimes the right abstraction is one you build yourself.
What we give up
Honesty demands a section on what this approach costs. Building your own model layer means forgoing a real ecosystem, and some of the gaps sting.
No auto-generated documentation. This is the big one. With Pydantic models backing a FastAPI or Django Ninja API, you get Swagger / OpenAPI docs for free -- interactive, always up-to-date, and explorable by anyone with a browser. Our API documentation is hand-written. It's accurate, but it requires discipline to keep in sync, and it doesn't let you click "Try it out" to fire a request. For a public API with external consumers, this would be a serious liability. For a client library consumed primarily through Python and R, it's manageable but still a tax we pay on every release.
No static type checking. The _bind_models trick defeats every type checker. gf.Figure is dynamically created at runtime, so mypy and pyright see GoFigr as having no Figure attribute. IDE autocompletion works intermittently -- some editors pick up the dynamic attributes after running the code, most don't. We could add a .pyi stub file listing all bound model classes, but that's another artifact to maintain, and it would lie about the implementation (claiming class attributes that are actually instance attributes). For a library used in Jupyter notebooks where users rely on tab completion, this is a real friction point.
No cross-language client generation. OpenAPI's greatest strength is that one spec produces clients in Python, JavaScript, Java, Go, and a dozen other languages. GoFigr's R client was built from scratch with its own patterns (environments instead of classes, factory functions instead of model binding). The two clients work well independently, but there's no shared contract enforcing that they handle the same fields and endpoints. When we add a field to the Python client, we manually update the R client. This has been fine at our scale, but it wouldn't survive a third or fourth language.
No runtime validation. Pydantic catches malformed data at construction time. Our models accept anything you pass them and fail later -- when you call .save() and the server rejects the payload, or when you access a field that should be a datetime but is actually a string because you forgot to pass parse=True. The errors are clear enough, but they arrive late. Pydantic's "fail fast" model is genuinely better for catching bugs early, and we occasionally miss it.
No schema evolution tooling. Pydantic and OpenAPI ecosystems have tools for schema diffing, backward compatibility checking, and migration. We have tests. The tests are thorough -- integration tests that round-trip every entity type through create/fetch/update/delete -- but they don't tell you "this change will break clients on version 1.2." We track compatibility manually through API versioning on the server side, which works but requires vigilance.
These are real costs, not hypothetical ones. We've accepted them because the alternative costs (generated code maintenance, Pydantic impedance mismatch, loss of relationship navigation) were higher for our specific situation. A team with different constraints -- a public API, multiple consumer languages -- should weigh them differently.
Source code
The full source is on GitHub: models.py contains the Field system, ModelMixin, and all entity definitions; __init__.py contains the GoFigr client class and _bind_models.