PaleobiologyDB.jl is a Julia package for paleobiological informatics and analysis.
The primary package provides an interface to the Paleobiology Database (PBDB) Web API: every endpoint has a corresponding function, keyword arguments map directly to API parameters, and all results are returned as DataFrames ready for the Julia data ecosystem.
Beyond the API wrapper, the package includes a Taxonomy submodule for both data curation (resolving synonyms, filtering by quality) and biodiversity exploration and instruction (navigating hierarchies, building subtrees, classroom-ready querying); integrated caching designed for bandwidth-limited settings such as workshops and shared classroom networks; a PhyloPic integration layer for enriching DataFrames and plots with silhouette images; and Makie-based extensions for interactive dendrogram visualisation and range-chart overlays.
using Pkg
Pkg.add("PaleobiologyDB")
Development version (PhyloPicMakie must be added first, because it is a required dependency of PaleobiologyDB and is not yet in the registry):
using Pkg
Pkg.add(url = "https://github.com/jeetsukumaran/PhyloPicMakie.jl")
Pkg.add(url = "https://github.com/jeetsukumaran/PaleobiologyDB.jl")
using PaleobiologyDB: pbdb_occurrences
# Fossil occurrences — returns a DataFrame
canids = pbdb_occurrences(base_name = "Canidae", interval = "Miocene", show = "full")
using PaleobiologyDB: pbdb_taxon
# Taxonomic record for a name
canis = pbdb_taxon(name = "Canis", show = ["attr", "app", "size"])
using PaleobiologyDB: pbdb_collection
# A single collection with location and stratigraphic detail
coll = pbdb_collection("col:1003", show = ["loc", "stratext"], extids = true)
PaleobiologyDB.Taxonomy serves two audiences.
For researchers, it provides data quality and curation tools: filtering occurrences to well-resolved and authority-validated names, augmenting DataFrames with full taxonomic lineages, and querying the PBDB taxonomic hierarchy.
For educators and students, it offers a programmatic environment for biodiversity exploration and instruction: navigating taxonomic trees, building subtrees from any node, and filtering datasets by clade membership.
# Quality filtering — keep only genus-resolved, authority-recognised occurrences
using PaleobiologyDB: pbdb_occurrences
using PaleobiologyDB.Taxonomy: drop_unresolved_taxa, drop_unrecognized_taxa
df = pbdb_occurrences(base_name = "Carnivora", interval = "Miocene", show = "full")
df = drop_unresolved_taxa(df, :genus) # drop rows not resolved to genus level
df = drop_unrecognized_taxa(df, :genus) # drop rows with names not in PBDB authority
# Augment occurrences with full lineage columns, then filter by clade
using PaleobiologyDB: pbdb_occurrences
using PaleobiologyDB.Taxonomy: augment_taxonomy, contains_taxon, taxon_occursin
df = pbdb_occurrences(base_name = "Canidae", interval = "Neogene", show = "full")
df2 = augment_taxonomy(df) # adds taxonomy_genus, taxonomy_family, …, taxonomy_clades
# Filters for taxon => "Canis"
df2[contains_taxon(df2, "Canis"), :]
df2[taxon_occursin("Canis", df2), :]
# Build and navigate a taxonomic subtree
using PaleobiologyDB.Taxonomy: taxon_subtree, root_taxon, leaf_taxa
tree = taxon_subtree("Carnivora"; leaf_rank = "family")
root_taxon(tree).name # → "Carnivora"
root_taxon(tree).rank # → "order"
# Names of all family-level leaves
leaf_taxa(tree) .|> (n -> n.name)
# → ["Ailuridae", "Amphicyonidae", "Canidae", "Felidae", …]
# The underlying Graphs.jl SimpleDiGraph is accessible for further algorithms
Three functions map PBDB taxon names to PhyloPic silhouette images:
| Function | Returns | Use when |
|---|---|---|
| acquire_phylopic | NamedTuple or DataFrame | One representative image record per taxon |
| augment_phylopic | DataFrame | Enrich an occurrences DataFrame in one call |
| phylopic_images_dataframe | DataFrame | All available images for a taxon or clade |
# Enrich an occurrences DataFrame with PhyloPic image columns
using CairoMakie # triggers TaxonomyMakie extension
using PaleobiologyDB: pbdb_occurrences, acquire_phylopic, augment_phylopic
df = pbdb_occurrences(base_name = "Ceratopsia", interval = "Cretaceous", show = "full")
pics = acquire_phylopic(df) # DataFrame: one row per occurrence row
pics.phylopic_thumbnail # → Vector of thumbnail URLs / missings
pics.phylopic_license # → Vector of licence strings
enriched = augment_phylopic(df) # original columns + 14 phylopic_ columns
# Browse all available images for a taxon or clade
using CairoMakie # triggers TaxonomyMakie extension
using PaleobiologyDB: phylopic_images_dataframe
using DataFrames: nrow # nrow comes from DataFrames.jl and is needed below
imgs = phylopic_images_dataframe("Carnivora")
nrow(imgs) # → hundreds (all images within Carnivora clade)
imgs.phylopic_thumbnail[1:5] # thumbnail URLs
imgs.phylopic_raster[1:5] # full-resolution PNG URLs
# Restrict to images tagged to exactly the Carnivora node (far fewer)
imgs_node = phylopic_images_dataframe("Carnivora"; filter = :node)
# Anchor a PhyloPic glyph at each taxon's first appearance on a range chart
using CairoMakie: Figure, Axis, lines!, xlims!, display
using PaleobiologyDB: augment_phylopic_ranges!
taxa = ["Tyrannosaurus", "Triceratops", "Ankylosaurus",
"Pachycephalosaurus", "Edmontosaurus"]
first_app = [68.0, 68.0, 70.0, 74.0, 76.0]
last_app = [66.0, 66.0, 66.0, 66.0, 66.0]
fig = Figure(size = (800, 420))
ax = Axis(fig[1, 1]; xlabel = "Age (Ma)", xreversed = true,
yticks = (1:length(taxa), taxa))
for (i, (fa, la)) in enumerate(zip(first_app, last_app))
lines!(ax, [fa, la], [i, i]; linewidth = 6, color = :gray30)
end
augment_phylopic_ranges!(
ax, first_app, last_app, collect(1.0:length(taxa));
taxon = taxa,
at = :start,
glyph_size = 0.38,
)
xlims!(ax, 78, 64)
display(fig)
# PhyloPic thumbnail gallery
using CairoMakie: display
using PaleobiologyDB: phylopic_thumbnail_grid
# Single taxon — all clade images
fig = phylopic_thumbnail_grid("Felis"; image_filter = :clade, ncols = 4)
display(fig)
# Multiple taxa — primary image per taxon
fig2 = phylopic_thumbnail_grid(
["Felis", "Canis", "Panthera", "Lynx"];
image_filter = :primary,
ncols = 2,
)
display(fig2)
See the TaxonomyMakie PhyloPic guide for the full API and layout options.
using CairoMakie: Figure, Axis, save, display
using PaleobiologyDB.Taxonomy: taxon_subtree
using PaleobiologyDB: taxonomytreeplot, taxonomytreeplot!, set_rank_axis_ticks!
# Build a subtree and render it — branches and nodes coloured by taxonomic rank
tree = taxon_subtree("Carnivora"; leaf_rank = "family")
fig, ax, p = taxonomytreeplot(tree; showtips = true, color_by_rank = true, ladderize = true)
save("carnivora_families.png", fig)
# Compose into an existing axis
fig2 = Figure(size = (1000, 700))
ax2 = Axis(fig2[1, 1]; title = "Canidae genera")
tree2 = taxon_subtree("Canidae"; leaf_rank = "genus")
taxonomytreeplot!(ax2, tree2; showtips = true, ladderize = true)
set_rank_axis_ticks!(ax2, tree2)
display(fig2)
using CairoMakie: Figure, Axis, save
using PaleobiologyDB.Taxonomy: taxon_subtree
using PaleobiologyDB: taxonomytreeplot, augment_tip_phylopic!
tree = taxon_subtree("Carnivora"; leaf_rank = "family")
fig, ax, p = taxonomytreeplot(tree; showtips = true, color_by_rank = true, ladderize = true)
# Overlay PhyloPic silhouettes at each leaf tip
augment_tip_phylopic!(ax, p; xoffset = 0.5)
save("carnivora_phylopic.png", fig)
See the TaxonomyMakie guide for the full attribute reference and worked examples.
Teaching workshops and classroom sessions often place dozens of students on a shared network, all querying the same PBDB endpoints simultaneously. Research workflows repeatedly re-run the same queries during analysis. PaleobiologyDB.jl integrates DataCaches.jl to address both scenarios: pre-fetch all data once, then serve every subsequent identical call from a local cache with no network round-trip.
# Enable transparent autocaching for all API functions
using PaleobiologyDB: pbdb_occurrences
using DataCaches: set_autocaching!
set_autocaching!(true)
canids = pbdb_occurrences(base_name = "Canidae", interval = "Miocene") # fetched + cached
canids = pbdb_occurrences(base_name = "Canidae", interval = "Miocene") # instant cache hit
set_autocaching!(false) # disable when no longer needed
# Cache a specific function only
set_autocaching!(true, pbdb_occurrences)
# Explicit per-call caching
using PaleobiologyDB: pbdb_occurrences, pbdb_taxa
using DataCaches: @memcache, @filecache
# In-session memoisation (lost on Julia exit)
canids = @memcache pbdb_occurrences(base_name = "Canidae", interval = "Miocene")
taxa = @memcache pbdb_taxa(name = "Dinosauria")
# Persistent file cache (survives across sessions)
canids = @filecache pbdb_occurrences(base_name = "Canidae", interval = "Miocene")
See the Caching guide for file cache configuration, per-function control, and classroom pre-fetch patterns.
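The idea behind in-session memoisation can be illustrated with a few lines of plain Julia. The sketch below is not how DataCaches.jl is implemented; `MEMO`, `CALLS`, `fake_query`, and `cached_call` are hypothetical names, and `fake_query` stands in for a network request without making one. It only shows why a repeated call with identical arguments can be answered without another round-trip.

```julia
# Illustrative only: a tiny in-memory memoiser keyed on function and keywords.
# This is NOT the DataCaches.jl implementation; all names are hypothetical.
const MEMO = Dict{Any, Any}()
const CALLS = Ref(0)  # counts how often the "expensive" function really runs

# Stand-in for a network query; no request is made.
function fake_query(; kwargs...)
    CALLS[] += 1
    return "result for " * join(("$(k)=$(v)" for (k, v) in pairs(kwargs)), ", ")
end

# Look the call up by (function, sorted keyword pairs); run it only on a miss.
function cached_call(f; kwargs...)
    key = (f, Tuple(sort(collect(pairs(kwargs)); by = first)))
    return get!(MEMO, key) do
        f(; kwargs...)
    end
end

a = cached_call(fake_query; base_name = "Canidae", interval = "Miocene")  # miss: runs fake_query
b = cached_call(fake_query; interval = "Miocene", base_name = "Canidae")  # hit: served from MEMO
println("underlying calls: ", CALLS[])
```

Sorting the keyword pairs before building the key means the order in which arguments are written does not produce a second cache entry. A persistent file cache would follow the same lookup pattern with the dictionary replaced by on-disk storage.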
| Category | Functions |
|---|---|
| Occurrences | pbdb_occurrence, pbdb_occurrences, pbdb_ref_occurrences |
| Collections | pbdb_collection, pbdb_collections, pbdb_collections_geo, pbdb_ref_collections |
| Taxa | pbdb_taxon, pbdb_taxa, pbdb_taxa_auto, pbdb_ref_taxa, pbdb_opinions_taxa |
| Intervals / Scales | pbdb_interval, pbdb_intervals, pbdb_scale, pbdb_scales |
| Strata | pbdb_strata, pbdb_strata_auto |
| References | pbdb_reference, pbdb_references |
| Specimens | pbdb_specimen, pbdb_specimens, pbdb_ref_specimens, pbdb_measurements |
| Opinions | pbdb_opinion, pbdb_opinions |
| Counts | pbdb_count |
All functions accept keyword arguments that map directly to PBDB API parameters.
Use pbdb_count to count records without downloading them.
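As a rough illustration of what "keyword arguments map directly to API parameters" means, the sketch below serialises keywords into a query string. It assumes the PBDB data service 1.2 URL layout (`https://paleobiodb.org/data1.2/<endpoint>.<format>`); `build_query_url` and `format_value` are hypothetical helpers written for this example and are not part of PaleobiologyDB.jl. No request is sent and no URL escaping is performed.

```julia
# Hypothetical helpers, NOT part of PaleobiologyDB.jl.
const PBDB_BASE = "https://paleobiodb.org/data1.2"

# Vectors become comma-separated lists; everything else is stringified.
format_value(v::AbstractVector) = join(string.(v), ",")
format_value(v) = string(v)

# Turn keyword arguments into `name=value` pairs joined by `&`.
# Note: values are not URL-escaped in this sketch.
function build_query_url(endpoint::AbstractString; format::AbstractString = "csv", kwargs...)
    params = ["$(k)=$(format_value(v))" for (k, v) in pairs(kwargs)]
    base = string(PBDB_BASE, "/", endpoint, ".", format)
    return isempty(params) ? base : string(base, "?", join(params, "&"))
end

url = build_query_url("occs/list"; base_name = "Canidae", interval = "Miocene",
                      show = ["class", "coords"])
println(url)
```

Each keyword name is used verbatim as the parameter name, which is why the parameter documentation for the web API doubles as documentation for the keyword arguments.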
| Category | Functions |
|---|---|
| Quality filtering | drop_unresolved_taxa, drop_unresolved_taxa!, drop_unrecognized_taxa, drop_unrecognized_taxa! |
| Augmentation | augment_taxonomy |
| Row filtering | taxon_occursin, contains_taxon |
| Hierarchy queries | child_taxa, parent_taxa, registered_taxa, taxonomic_ranks |
| Name validation | istaxon, audit_taxonomy |
| Tree graphs | taxon_subtree, root_taxon, leaf_taxa, taxa_at_rank |
| Types | TaxonNode, TaxonomyTree |
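To make the tree-graph and row-filtering ideas concrete, here is a minimal sketch on toy data. It does not use the package's `TaxonomyTree` type or query PBDB; the `parent_of` table and the helpers `children_map`, `leaves`, and `in_clade` are hypothetical and exist only for this illustration.

```julia
# Hypothetical toy data: child => parent edges. Real lineages come from PBDB.
parent_of = Dict(
    "Canidae" => "Carnivora",
    "Felidae" => "Carnivora",
    "Canis"   => "Canidae",
    "Vulpes"  => "Canidae",
    "Felis"   => "Felidae",
)

# Invert the edges into parent => sorted list of children.
function children_map(parent_of)
    children = Dict{String, Vector{String}}()
    for (child, parent) in parent_of
        push!(get!(children, parent, String[]), child)
    end
    foreach(sort!, values(children))
    return children
end

# Depth-first collection of the leaf names below `root`.
function leaves(children, root)
    kids = get(children, root, String[])
    isempty(kids) && return [root]
    return reduce(vcat, [leaves(children, k) for k in kids])
end

# Clade membership: walk up the parent chain until `clade` or the top is reached.
function in_clade(parent_of, taxon, clade)
    while true
        taxon == clade && return true
        haskey(parent_of, taxon) || return false
        taxon = parent_of[taxon]
    end
end

children = children_map(parent_of)
println(leaves(children, "Carnivora"))
println(in_clade(parent_of, "Canis", "Carnivora"))
```

Filtering a table by clade then amounts to keeping the rows whose taxon passes a membership test of this kind.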
| Category | Functions |
|---|---|
| Data acquisition | acquire_phylopic, augment_phylopic, phylopic_images_dataframe, phylopic_images, phylopic_node |
| Makie overlays | augment_phylopic!, augment_phylopic_ranges!, augment_phylopic_ranges |
| Gallery | phylopic_thumbnail_grid, phylopic_thumbnail_grid! |
| Symbol | Description |
|---|---|
| taxonomytreeplot | Standalone figure; returns (Figure, Axis, TaxonomyTreePlot) |
| taxonomytreeplot! | Add dendrogram to an existing axis |
| set_rank_axis_ticks! | Label x-axis with rank names at their depth positions |
| tip_positions | Extract leaf-tip coordinates from a tree or plot |
| augment_tip_phylopic! | Add PhyloPic silhouettes at each leaf tip |
| TaxonomyTreePlot | Plot type (for dispatch and attribute access) |
- Quick Start — examples for all endpoint types, advanced query options
- Caching — file, memory, and autocaching; classroom pre-fetch patterns
- TaxonomyTreeMakie — dendrogram visualisation guide and attribute reference
- PhyloPicPBDB — PhyloPic overlay and gallery guide
- Contributing — testing, development, and external resources
API reference
- Occurrences
- Collections
- Taxa
- Specimens
- Other endpoints — intervals, scales, strata, references, opinions
- Taxonomy — queries and filtering
- Taxonomy — row filtering
- Taxonomy — tree graphs
- Taxonomy — search
- PhyloPic acquisition
- PhyloPicPBDB Makie API
- TaxonomyTreeMakie API
- Interactive Help — REPL-based parameter and field discovery
- Depot — local data snapshot management
julia --project -e 'using Pkg; Pkg.test()'
Enable live API tests:
PBDB_LIVE=1 julia --project -e 'using Pkg; Pkg.test()'
Contributions are welcome. Please fork the repository, add tests for new functionality, and submit a pull request.
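When adding tests, network-dependent cases can be kept behind an environment switch so that the default run stays offline. The sketch below shows one way such a gate could look, assuming live tests are enabled when `PBDB_LIVE` is set to `"1"`; the helper `live_tests_enabled` is hypothetical and the package's actual test suite may be organised differently.

```julia
using Test

# Assumption: live tests are enabled when PBDB_LIVE is set to "1".
# `live_tests_enabled` is a hypothetical helper for this sketch.
live_tests_enabled(env = ENV) = get(env, "PBDB_LIVE", "0") == "1"

@testset "offline checks" begin
    # The gate itself can be tested without touching the real environment.
    @test live_tests_enabled(Dict("PBDB_LIVE" => "1"))
    @test !live_tests_enabled(Dict{String, String}())
end

if live_tests_enabled()
    @testset "live API" begin
        # Network-dependent tests would go here.
    end
else
    @info "Skipping live API tests; set PBDB_LIVE=1 to enable them."
end
```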
If you use PaleobiologyDB.jl in your research, please cite both this package and the Paleobiology Database:
@misc{PaleobiologyDB.jl,
author = {Jeet Sukumaran},
title = {PaleobiologyDB.jl: A Julia interface to the Paleobiology Database},
url = {https://github.com/jeetsukumaran/PaleobiologyDB.jl},
year = {2025},
doi = {10.5281/zenodo.17043157}
}
@article{Peters2016,
author = {Shanan E. Peters and Michael McClennen},
title = {The Paleobiology Database application programming interface},
journal = {Paleobiology},
volume = {42},
number = {1},
pages = {1--7},
year = {2016},
doi = {10.1017/pab.2015.39}
}