Capturing Plots in R and Python: A Tale of Two Architectures
In building GoFigr, we had to solve the same problem in two very different languages (R and Python):
Automatically capture every plot a user creates and publish it, without requiring any changes to the user's code beyond a single setup call.
In Python (Jupyter), this works reliably across matplotlib, plotly, plotnine, py3Dmol, and anything else that uses IPython's display system. In R, auto-publish is experimental -- it works in many common cases (ggplot2, lattice, ComplexHeatmap, and other libraries that produce plot objects rendered via print()), but the R graphics architecture introduces enough exceptions and edge cases that we recommend R users pipe plots to gofigR::publish() explicitly. This turns out to be both more reliable and more idiomatic given how R users typically work with plotting code.
This post explains why. The answer lies in a fundamental architectural difference in how these two ecosystems handle display output.
The core problem
An auto-publish system needs to:
Detect that a plot was created.
Obtain the plot object (not just the rendered pixels) so we can re-render it in multiple formats, apply watermarks, extract titles, and attach metadata.
Capture context -- source code, execution environment, data frames -- to make the figure reproducible.
In an ideal world, all plot output would flow through a single chokepoint we can instrument. Python's Jupyter ecosystem gives us exactly that. R does not.
Python: A centralized display layer
IPython's DisplayPublisher
Every rich output in Jupyter -- whether it's a matplotlib figure, a Plotly chart, a pandas DataFrame, or an HTML widget -- ultimately flows through a single object: the IPython shell's display_pub, an instance of DisplayPublisher. When any code calls display(obj) or when a cell's return value is auto-displayed, IPython serializes the object into a dictionary of MIME types (image/png, text/html, application/vnd.plotly.v1+json, etc.) and hands it to display_pub.publish().
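You can see this chokepoint from any notebook using only IPython's public attributes (the concrete publisher class depends on the frontend; under ipykernel it is typically ZMQDisplayPublisher):

```python
# Peek at the display chokepoint from inside a Jupyter notebook.
from IPython import get_ipython

shell = get_ipython()
print(type(shell.display_pub))   # e.g. ipykernel.zmqshell.ZMQDisplayPublisher

# display(obj) ultimately ends up as a call equivalent to:
# shell.display_pub.publish(data={"image/png": ..., "text/plain": ...}, metadata={})
```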
This is the chokepoint.
Wrapping the publisher
GoFigr replaces IPython's native DisplayPublisher with a thin wrapper:
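In sketch form (illustrative rather than GoFigr's exact class -- the real wrapper carries more bookkeeping):

```python
# A DisplayPublisher that forwards everything to the original publisher,
# but lets us inspect each MIME bundle first.
from IPython.core.displaypub import DisplayPublisher

class DisplayTrap(DisplayPublisher):
    def __init__(self, wrapped, on_display):
        super().__init__()
        self.wrapped = wrapped          # the original publisher we delegate to
        self.on_display = on_display    # GoFigr's capture callback

    def publish(self, data, metadata=None, **kwargs):
        self.on_display(data, metadata)                            # auto-publish hook sees the MIME dict
        self.wrapped.publish(data, metadata=metadata, **kwargs)    # normal display proceeds
```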
The installation is one line in the extension's register_hooks():
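Roughly (a sketch; auto_publish_hook stands in for GoFigr's actual callback):

```python
from IPython import get_ipython

def register_hooks():
    shell = get_ipython()
    # Swap in the trap, delegating to whatever publisher was active before.
    shell.display_pub = DisplayTrap(wrapped=shell.display_pub,
                                    on_display=auto_publish_hook)
```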
From this point on, every single display call in the notebook session passes through our wrapper. matplotlib, plotly, py3Dmol -- it doesn't matter. If it calls display(), we see it.
From MIME data back to figure objects
Intercepting the display call gives us the MIME dictionary (data), but we need the actual figure object -- a matplotlib.figure.Figure or a plotly.graph_objects.Figure -- for re-rendering, watermarking, and serialization.
GoFigr uses stack inspection to find it. When the display trap fires, we walk up the call stack looking for the originating library's display/show function, then extract the figure from its arguments:
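A stripped-down version of the idea (GoFigr's real backends look for specific frames, such as matplotlib's inline show(), rather than scanning every frame's locals):

```python
import inspect
from matplotlib.figure import Figure

def find_figure_in_stack():
    """Walk up the call stack and return the first matplotlib Figure found
    among any frame's locals. A simplified sketch of the approach."""
    for frame_info in inspect.stack():
        for value in frame_info.frame.f_locals.values():
            if isinstance(value, Figure):
                return value
    return None
```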
The get_all_function_arguments helper iterates over all positional args, *args, and **kwargs of a stack frame:
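A sketch of what such a helper can look like, built on inspect.getargvalues (the real helper may differ in detail):

```python
import inspect

def get_all_function_arguments(frame):
    """Yield every argument value passed to the function executing in `frame`:
    declared parameters, the *args tuple, and **kwargs values."""
    info = inspect.getargvalues(frame)
    for name in info.args:                     # declared parameters
        yield info.locals[name]
    if info.varargs:                           # *args
        yield from info.locals[info.varargs]
    if info.keywords:                          # **kwargs
        yield from info.locals[info.keywords].values()
```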
This is admittedly a hack -- we're relying on the internal call structure of matplotlib and plotly. But it's a stable hack: these libraries' display paths haven't changed in years, and if they do, the failure mode is graceful (we just don't find the figure and skip auto-publish).
The full pipeline
Here's the complete flow for a single figure:
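In condensed form, wiring together the pieces above (an illustrative sketch, not the production code; SuppressDisplayTrap and publish are GoFigr's, find_figure_in_stack is the earlier sketch):

```python
def auto_publish_hook(data, metadata):
    fig = find_figure_in_stack()                      # 1. recover the live figure object
    if fig is None or getattr(fig, "_gf_is_published", False):
        return                                        #    unsupported library or already published
    with SuppressDisplayTrap():                       # 2. don't re-trigger our own trap
        publish(fig)                                  # 3. watermark, re-render, upload, attach metadata
    fig._gf_is_published = True                       # 4. mark to prevent duplicates
```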
Key details:
SuppressDisplayTrap temporarily disables the trap during publication to prevent infinite recursion (publishing a watermarked figure would itself trigger display()).
Duplicate prevention: Each figure object is tagged with _gf_is_published = True after publication. The auto_publish_hook checks this flag before publishing.
Deferred revisions: If publish() is called before pre_run_cell fires (e.g. the extension is loaded in the same cell as the first plot), the revision is deferred and annotated with cell metadata in post_run_cell.
Why this works so well
The architecture has a single interception point that every visualization library must pass through. Adding support for a new library requires only a new GoFigrBackend subclass with a find_figures method -- no monkey-patching, no library-specific hooks, no changes to the core pipeline.
R: No centralized display layer
How R graphics work
R's graphics system is built on graphics devices. A graphics device is a C-level struct (DevDesc) that implements ~30 callback functions for primitive drawing operations: line(), rect(), circle(), polygon(), text(), raster(), newPage(), and so on.
When you call plot(x, y) in R, the following happens:
The plot() S3 generic dispatches to a method (e.g. plot.default).
That method calls low-level graphics functions (plot.new(), plot.window(), axis(), points(), etc.).
Each of those calls the active graphics device's C callbacks directly.
There is no equivalent to IPython's DisplayPublisher. There is no single function that all plot output flows through. The graphics device receives a stream of drawing primitives and has no knowledge of which library is calling it, what the source code was, or even when a "plot" begins and ends.
The key difference is stark: in Python, there's a funnel. In R, there's a fan-out directly to the device.
Object-oriented vs. imperative plotting
This difference is compounded by a split in R's plotting paradigms:
ggplot2 (and lattice) are object-oriented: they construct a plot object that is rendered when print() is called. In ggplot2's case, print.ggplot() is the method that actually draws to the device. This means we can intercept print() and capture the object before it's rendered.
Base R graphics are imperative: plot(x, y) immediately draws to the active device. There is no "plot object" to intercept. The plot exists only as a sequence of drawing commands that have already been executed. You can retroactively capture the device state via recordPlot(), but that gives you a snapshot of the entire device, not a structured object you can re-render at different sizes or in different formats.
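The contrast fits in a few lines of R:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()  # nothing is drawn yet; p is an object
print(p)                                          # print.ggplot() renders it -- this call is interceptable

plot(mtcars$wt, mtcars$mpg)                       # draws immediately; there is no object to intercept
snapshot <- recordPlot()                          # only a replayable snapshot of the whole device
```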
GoFigr's approach in R
Given these constraints, GoFigr's R client offers two paths. The recommended approach is explicit: pipe your plot to gofigR::publish(). This is idiomatic R -- users are accustomed to composing pipelines, and an explicit publish call gives full control over figure names, metadata, and data attachments. The client also offers an experimental auto-publish mode that overrides print() in the global environment to intercept supported plot objects before they reach the device.
The core mechanism:
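A simplified sketch (the real gf_print and intercept_base carry more logic):

```r
gf_print <- function(x, ...) {
  if (is_supported(x)) {
    gofigR::publish(x)     # capture and publish the plot object
  } else {
    base::print(x, ...)    # not a supported plot: fall back to normal printing
  }
  invisible(x)
}

intercept_base <- function() {
  # Shadow print() in the global environment, which R searches before base.
  assign("print", gf_print, envir = .GlobalEnv)
}
```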
When enable(auto_publish=TRUE) is called, intercept_base() assigns the wrapped version of print to .GlobalEnv. From that point on, any print() call in user code resolves to gf_print (because .GlobalEnv is searched before base), which checks if the object is a supported plot type and publishes it if so.
The is_supported check uses ggplotify to test whether an object can be converted to a ggplot:
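In sketch form (simplified; the real check may special-case certain classes):

```r
is_supported <- function(obj) {
  # Already a ggplot, or convertible to one via ggplotify?
  if (inherits(obj, "ggplot")) return(TRUE)
  converted <- tryCatch(ggplotify::as.ggplot(obj), error = function(e) NULL)
  inherits(converted, "ggplot")
}
```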
What auto-publish catches and what it misses
Auto-publish works for any object that satisfies two conditions: it goes through print() for rendering, and ggplotify::as.ggplot() can convert it. This covers a lot of common cases -- ggplot2, lattice (trellis), ComplexHeatmap, grid grobs, patchwork, and others. But the exceptions and edge cases add up:
| Plot type | Auto-published? | Why |
| --- | --- | --- |
| ggplot2 | Yes | print.ggplot() is intercepted via the global print override |
| lattice, ComplexHeatmap, grid grobs | Yes | Rendered via print() and convertible by ggplotify |
| plot(x, y), hist(), barplot() | No | Writes directly to the device, never calls print() |
| pheatmap::pheatmap() | Duplicated | Draws to the device AND returns an object; intercepting print() causes it to render twice |
| Anything called inside a function that uses base::print explicitly | No | Namespaced calls bypass .GlobalEnv |
Auto-publish works in many common workflows, but the edge cases are numerous enough that we recommend the explicit path instead. Piping to publish() works uniformly regardless of the plotting library and sidesteps all of the above:
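For example, using R's native pipe (magrittr's %>% works the same way; gofigR setup such as enable() is assumed to have been run already):

```r
library(gofigR)
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  ggtitle("Fuel efficiency vs. weight")

p |> publish()   # explicit publication; the same pattern works for lattice, ComplexHeatmap, etc.
```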
This is idiomatic R. Users are already accustomed to piping results through transformations, and an explicit publish() call gives full control over figure names, attached data, and metadata.
Why a custom graphics device won't help
A natural question is: if the graphics device is the one place all output converges, why not write a custom GoFigr device?
R's graphics device API is a C struct with callbacks like:
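Excerpted and simplified from R's own R_ext/GraphicsDevice.h, the drawing callbacks look like this:

```c
/* The device is a struct of C function pointers for primitive drawing operations. */
struct _DevDesc {
    /* ... device state fields omitted ... */
    void (*line)(double x1, double y1, double x2, double y2,
                 const pGEcontext gc, pDevDesc dd);
    void (*rect)(double x0, double y0, double x1, double y1,
                 const pGEcontext gc, pDevDesc dd);
    void (*circle)(double x, double y, double r,
                   const pGEcontext gc, pDevDesc dd);
    void (*polygon)(int n, double *x, double *y,
                    const pGEcontext gc, pDevDesc dd);
    void (*text)(double x, double y, const char *str, double rot, double hadj,
                 const pGEcontext gc, pDevDesc dd);
    void (*newPage)(const pGEcontext gc, pDevDesc dd);
    /* ... roughly 30 callbacks in total, plus raster, clip, metricInfo, ... */
};
```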
A GoFigr device would proxy these calls to a real backend (PNG, SVG) while recording the output. The problems:
No high-level context. The device sees drawLine(x1, y1, x2, y2) and drawText("Hello", x, y). It has zero knowledge of source code, figure titles, data frames, or which library is calling it. This is a devastating loss -- the whole point of GoFigr is linking figures to their provenance.
Plot boundary detection. The device API has newPage() but no "plot finished" signal. With par(mfrow=c(2,2)), four sub-plots share one page. With incremental drawing (plot(x); lines(y); legend(...)), there's no way to know when the user is done composing.
Device conflicts. If a user opens a png("output.png") device while the GoFigr device is active, the new device becomes dev.cur(). All subsequent drawing commands go to the user's device, not to GoFigr. We'd need to also intercept png(), pdf(), svg(), dev.set(), dev.off(), etc. -- multiplying edge cases.
knitr conflicts. knitr opens its own devices to capture chunk output. A GoFigr device would compete with knitr's, creating unpredictable behavior in the most important use case (R Markdown / Quarto).
Implementation cost. Writing a correct graphics device is a substantial C/C++ project. Packages like ragg and svglite are thousands of lines of C++ for a single output format. We'd be maintaining this alongside the R package itself.
The cost/complexity ratio is extremely unfavorable, and the result would be strictly worse than the current approach for everything except base graphics capture.
What about knitr hooks?
knitr has its own plot capture system -- it evaluates chunks using the evaluate package, which records plots via recordPlot(), and then renders them through configurable hooks. We could, in principle, intercept at the knitr level using knit_hooks$set(plot = ...):
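Something along these lines (a sketch of the road not taken; publish_image() is hypothetical):

```r
library(knitr)

default_plot_hook <- knit_hooks$get("plot")

knit_hooks$set(plot = function(x, options) {
  # x is the path of the image file knitr just wrote -- no live plot object here.
  # A hypothetical publish_image() would have to work from pixels + chunk options:
  # gofigR::publish_image(x, name = options$label)
  default_plot_hook(x, options)   # then produce the usual chunk markup
})
```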
This would catch ALL plot types in R Markdown, because knitr captures everything. But it creates significant problems:
Divergent code paths. The knitr hook is a fundamentally different capture mechanism (image-file-in, no plot object) that shares almost none of the existing publish pipeline. Every feature -- watermarking, multi-format export, metadata attachment -- would need to be reimplemented for the image-only path.
No runtime introspection. The hook receives a file path and chunk options. It has no access to the execution environment. You can't attach data frames, inspect the plot object, extract structured titles, or do anything that requires the live R session state.
Single image format. You get whatever knitr rendered -- typically a PNG. Re-rendering to EPS, SVG, or PDF from a raster image is meaningless. knitr can be configured to produce multiple formats via dev = c("png", "svg"), but this pushes complexity onto the user and creates correlation problems (matching multiple hook calls to the same logical figure).
These trade-offs aren't worth the gain.
The fundamental asymmetry
The difference between Python and R plot capture is not a matter of implementation cleverness. It's a consequence of architectural decisions made decades ago.
Python's Jupyter ecosystem was designed around a message-passing protocol (the Jupyter messaging spec) where all rich output is serialized into MIME bundles and sent through a well-defined channel. Every visualization library adopted this protocol. IPython's DisplayPublisher is the programmatic interface to that channel, and it's a clean place to hook into.
R's graphics system was designed around the graphics device abstraction -- a low-level, C-based interface inspired by pen-plotter metaphors. Libraries write directly to the device using drawing primitives. There is no intermediate representation, no message protocol, no single point of serialization.
ggplot2 and lattice added an object-oriented layer on top of this, where plots are first-class objects that render on print(). But base R graphics, and many domain-specific libraries (pheatmap, ComplexHeatmap, corrplot, etc.), still use the imperative model.
GoFigr's R client embraces this reality rather than fighting it. We offer experimental auto-publish that works in many common cases -- ggplot2, lattice, ComplexHeatmap, and other object-oriented plotting libraries -- but the edge cases are numerous enough that our recommended path is explicit: pipe your plots to gofigR::publish(). This works with every plotting library, avoids the edge cases inherent in print() interception, and fits naturally into R's pipe-oriented idioms. A custom graphics device or knitr hooks would add substantial complexity without meaningfully closing the architectural gap.
Sometimes the right engineering decision is to recognize a platform limitation for what it is and design an ergonomic workflow around it, rather than building increasingly elaborate workarounds that create their own problems.