EEB Student Dissertation Defense - Computational Morphometrics of Half a Million Herbarium Specimens Reveals Lineage-Specific Evolutionary Dynamics in Climate-Leaf Trait Relationships
Will Weaver
Abstract:
Digitized herbarium collections facilitate access to hundreds of millions of preserved plant specimens, enabling quantitative analysis of leaf morphology across taxonomic and geographic scales. This dissertation presents a computational framework that leverages machine learning, computer vision, large language models, and novel mathematical methods to extract, analyze, and interpret leaf traits at scale. By combining lineage‐specific approaches with broad comparative analyses, it establishes procedures for measuring traits across these vast datasets.
Chapter 2 introduces LeafMachine2, a modular machine‐learning pipeline that automates quantitative trait extraction from digitized herbarium specimens. Trained on nearly half a million manual annotations from over two thousand species, LeafMachine2 employs multiple neural networks to detect and segment individual leaves, recognize rulers and labels, and identify pseudo‐landmarks such as midvein length, petiole length, and lobe positions. This tool processes thousands of images per hour, calculates pixel‐to‐metric conversion factors for most common scalebar types, and generates standardized measurements (area, perimeter, Fourier descriptors, and landmark‐based metrics) even when specimens contain occluded or overlapping leaves. By automating leaf extraction and measurement procedures, LeafMachine2 removes a bottleneck in morphological data acquisition and lays the foundation for large‐scale analyses.
Chapter 3 presents FieldPrism, a versatile hardware and software suite designed to extend quantitative trait collection to field photographs. FieldPrism employs a photogrammetric background template (FieldSheet) containing machine‐readable scale markers and optional QR codes for specimen identification. Coupled with a Raspberry Pi-based mobile imaging apparatus (FieldStation), FieldPrism automatically corrects image distortion, computes pixel‐to‐metric conversion, decodes QR codes to assign meaningful file names, and embeds GPS metadata. Validation tests across smartphone, mirrorless, and machine‐vision cameras demonstrate high accuracy in scale conversion and reliable QR code decoding at typical working distances. FieldPrism enables the creation of “snapshot vouchers” that complement physical voucher specimens, opening new possibilities for citizen science and field‐based trait collection.
Chapter 4 explores the integration of large language models (LLMs) into herbarium label transcription. VoucherVision combines optical character recognition with LLM‐based parsing to transform unstructured OCR text into structured spreadsheet entries. In practice, VoucherVision increased transcription productivity by 25 percent over manual workflows. This chapter also examines potential risks, including job displacement, copyright and data sovereignty issues, and model permanence, and it outlines a collaborative benchmarking initiative to develop fine‐tuned, community‐driven LLM solutions that respect data ownership and ensure reproducibility.
Chapter 5 applies LeafMachine2, the Euler characteristic transform (ECT), and a novel geomorphon‐based classification to analyze 4.6 million individual leaf outlines from nearly half a million herbarium specimens, covering 12 angiosperm families and close to seven thousand species. ECT matrices capture topological signatures of each leaf outline, while geomorphon analysis distills leaf margin features into continuous and categorical metrics that quantify toothedness and lobedness. By integrating WorldClim bioclimatic layers with dated phylogenies and analyzing the dataset using linear mixed-effects and phylogenetic regression models, I demonstrate that the canonical climate–leaf trait associations (larger, entire-margined leaves in warmer, wetter environments and more toothed or lobed leaves in cooler, drier locations) are clade-specific rather than universal across angiosperms. Instead, family‐ and genus‐specific evolutionary regimes emerge. Phylogenetic correction often weakens climate–leaf area associations while strengthening leaf shape associations with minimum temperatures. Correlated‐trait analyses reveal that margin‐type evolution is climate‐dependent in some woody clades but independent in many herbaceous groups. Overall, leaf traits are shaped by a dynamic interaction of developmental constraints, phylogenetic history, growth habit, and climate.
This dissertation demonstrates that integrating machine learning, artificial intelligence, and innovative mathematical approaches transforms digitized herbarium specimens into a rigorous analytical foundation for addressing fundamental questions in leaf evolution and biogeography. Providing scalable, streamlined workflows that operate at previously unattainable scales, these contributions enable researchers to move beyond isolated case studies and achieve a holistic synthesis of how environment, development, and phylogenetic history interact to generate the extraordinary diversity of angiosperm leaf forms.
Building: | Biological Sciences Building |
---|---|
Event Type: | Workshop / Seminar |
Tags: | biological science, Biosciences, Bsbsigns, Data Science, department of ecology and evolutionary biology, Dissertation, ecology, Ecology & Biology, Ecology And Evolutionary Biology, eeb, Free, Graduate School, Graduate Students, Herbarium, Research, Research Museums Center |
Source: | Happening @ Michigan from Ecology and Evolutionary Biology, EEB Defenses |