At U-M’s Herbarium, Will Weaver is harnessing machine learning to transform historical specimen labels into searchable data — a breakthrough that’s changing how we preserve and share the past for the future.


A herbarium specimen holds far more information than the pressed plant itself. Each specimen also carries a label, often in decades-old handwriting, that records where, when, and by whom the plant was collected, and occasionally a few extra historical details. That label is a treasure trove of scientific information, but turning it into searchable data is painstaking, time-consuming work.

A Schmidt Sciences AI Postdoc with MIDAS (and affiliated with the Herbarium, SEAS, EES, and EEB), Will is helping change that with VoucherVision, a machine learning tool designed to read and transcribe specimen labels. Using large language models — the same type of AI technology behind ChatGPT — VoucherVision can quickly extract and organize label information into structured datasets.

So far, the transcriptions have proven highly accurate. “If the label is typed, the transcription is basically as accurate as a human,” says Will. “For handwritten labels, the quality depends on the handwriting, but the technology has improved significantly in the past couple of years.”

The payoff is big. VoucherVision boosts productivity by an estimated 25%, allowing staff to focus their time on challenging or historically important specimens. It can also scan existing image archives, unlocking information from tens of thousands of specimens that might otherwise have sat unexamined for years.

Beyond speed, AI offers a new kind of access to hidden knowledge. Many herbarium labels contain ethnobotanical details — such as how local communities used a plant — that were never transcribed in older digitization projects. Large language models can recognize and search for these details even if they’re described in varied ways, opening up new research possibilities.

The tool is transforming workflows. “It’s not about taking humans out of the loop,” Will explains. “It’s about augmenting our work — letting AI handle the repetitive parts so we can spend more time on the complex, interesting challenges.”

Currently, VoucherVision is used by more than 20 partner institutions, and there are plans to integrate it directly into common collections management software, such as Specify, which is used at U-M’s Herbarium, for an even smoother process. In the future, Will hopes to run the tool on secure, locally hosted servers to protect sensitive data and reduce costs.

The tool is already helping rediscover forgotten collections and connect researchers with specimens that might otherwise remain hidden. “It’s like opening a time capsule,” Will says. “We’re finding stories in our collections that we didn’t even know were there.”