Case Study

Legacy data has a second life

Bringing six-year-old TiZrNi work into the lab notebook, four years after the project wound down

Oleksiy Penkov · Tenured Faculty, Zhejiang University · May 8, 2026

Cross-sectional TEM of an annealed Ti41Zr39Ni20 coating showing columnar microstructure

A Word document with a critical number in it

There is a file on a hard drive in my office called Specimens_List_4_(26-28).docx. It is three pages long, mostly a table somebody copied out of a lab notebook in 2021. One column of that table records the thickness of a tungsten underlayer beneath three specimens. After going through the project folder twice I am fairly sure that document is the only place those three numbers are written down.

Those three specimens were the bridge between two papers. One of them, in Thin Solid Films in 2022, reported a hardness of 25 GPa for an annealed Ti₄₁Zr₃₉Ni₂₀ coating — the headline result of four years of work on quasicrystalline thin films. The other, in Energy Engineering a year later, reported the thermal conductivity of the same composition in its amorphous state. Specimens #26, #27, and #28 show up in both. Until March 2026, four years after the project wound down, nothing on any disk anywhere said so. The linkage lived in three students’ heads and one Word document, and it was getting harder to recover every year.

In March I poured the archive into the lab notebook. Project 15, “TiNiZr.” Two weeks of evenings, eighty-six items, three hundred files, fifty-three measured values. The papers are out. The work is finished. No new samples are coming. The previous story in this series is about building the substrate going forward, sample by sample, as you go. This one is the opposite case: whether it’s worth doing the work after the fact, on a project that will never produce another publication.

I think it is.

What was on the hard drive

The project ran from 2019 to 2022, with three students taking turns on it across four cities: Kharkiv, Seoul, Singapore, Hangzhou. By the time it wound down the folder tree had grown organically, with three different naming conventions stacked one on top of the other. Three batches of samples on two substrates, stainless steel and leucosapphire. Twenty-eight specimens, hand-numbered #1 through #28. About three hundred files.

The instrument outputs were the usual mix. .mit files from the UNHT-3 nanoindenter for the hardness work. Friction traces from an NST-2 reciprocating tester and a DaVinci 1000 micro-tribometer for the wear study. Origin projects with names like ALL_COF.opj and H&WR_vs_Temp.opj. Micrographs from a Sigma 300 SEM and a JEOL 3000 TEM. 3D profiles from a Filmetrics rig. The TSF paper drew on the indentation and microscopy across the full anneal range, 200 to 850 °C; the Energy Engineering study added frequency-domain thermoreflectance on the amorphous films and a Wiedemann–Franz analysis pinning electrons to about 80 % of the thermal conductivity. The overlap between the two studies was Ong Weeliat’s indentation work on the bilayer specimens. It fed both papers and is credited as such in neither file tree.
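For readers who have not met the Wiedemann–Franz step mentioned above, the textbook form of the law is sketched here. The symbols are generic notation rather than the paper's own, and the Sommerfeld value of the Lorenz number is an assumption; the Energy Engineering study may have used a different one.

```latex
\kappa_e = L_0 \, \sigma \, T,
\qquad
L_0 = \frac{\pi^2}{3}\left(\frac{k_B}{e}\right)^2
    \approx 2.44 \times 10^{-8}\ \mathrm{W\,\Omega\,K^{-2}},
\qquad
\frac{\kappa_e}{\kappa_{\mathrm{total}}} \approx 0.8
```

Here σ is the electrical conductivity of the film, T the absolute temperature, and κ_total the thermal conductivity measured by thermoreflectance. The last ratio is the "about 80 %" electronic share quoted above.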

None of the data was bad. All of it was filed somewhere. The problem was that “somewhere” depended on who had been working on which batch when, and the index lived in three students’ heads. And in a Word document.

Pouring it into the notebook

The unsexy part of the story.

Three folders for the three batches. Twenty-eight specimens entered as Items, each with anneal state, substrate, and W-underlayer thickness written into the description as plain prose. That keeps them searchable later without retrofitting structured fields the original lab notebook never had. Every measurement sits under its specimen as an Action, with the equipment named: UNHT-3 for indentation, NST-2 and DaVinci 1000 for tribology, Sigma 300 and JEOL 3000 for microscopy, Filmetrics 3D for profiling.

Project 15. Three batches, 28 specimens, every measurement nested under the specimen that produced it.

Fifty-three hardness and reduced-modulus values came out of the .mit files into the structured value system. Each one carries the original filename and indent count in its description: “From weeliat N26.mit (20 indents, STD 3.85).” The filenames are not pretty. That’s the point. In five years they will be the only thing left that ties a number back to who measured it and when. Standard deviations went in alongside the means, so a specimen whose indentation runs were noisy declares itself on the page rather than getting laundered into a clean-looking row.
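The description format quoted above is regular enough to read back into fields when needed. What follows is a minimal sketch in Python, assuming every description follows the pattern of the one example shown. The `Provenance` class, the `parse_description` function, and the regular expression are illustrative names, not part of the notebook's API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    filename: str   # original instrument file name, kept verbatim
    indents: int    # number of indents behind the recorded mean
    std: float      # standard deviation recorded with the value


# Assumed pattern, modelled on the single example in the text:
# "From weeliat N26.mit (20 indents, STD 3.85)"
_PATTERN = re.compile(
    r"From\s+(?P<filename>.+?\.mit)\s+"
    r"\((?P<indents>\d+)\s+indents?,\s+STD\s+(?P<std>\d+(?:\.\d+)?)\)"
)


def parse_description(text: str) -> Optional[Provenance]:
    """Return the provenance fields, or None if the text does not match."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return Provenance(
        filename=match["filename"],
        indents=int(match["indents"]),
        std=float(match["std"]),
    )


example = parse_description("From weeliat N26.mit (20 indents, STD 3.85).")
print(example)
```

Descriptions that do not match come back as `None` instead of raising, so a hand-written note in the same field does not break a batch pass.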

Friction-test conditions — 0.98 N normal load, 1000 cycles, 2 mm stroke, 4 mm/s — went in as structured values on each tribotest action. Same conditions across the series, but recorded once per action. The point is to query, not to remember.
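One thing structured conditions make possible is computing derived quantities instead of recalling them. The sketch below is a hypothetical illustration: the `TriboConditions` class is not the notebook's schema, and it assumes one cycle means a forward plus a return stroke, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriboConditions:
    load_n: float       # normal load, N
    cycles: int         # number of reciprocating cycles
    stroke_mm: float    # stroke length, mm
    speed_mm_s: float   # sliding speed, mm/s

    def sliding_distance_m(self) -> float:
        # Assumption: one cycle is a forward plus a return stroke.
        return 2 * self.stroke_mm * self.cycles / 1000.0

    def duration_s(self) -> float:
        # Ignores any dwell or reversal time at the stroke ends.
        return self.sliding_distance_m() * 1000.0 / self.speed_mm_s


# The conditions quoted in the text for the whole series.
series = TriboConditions(load_n=0.98, cycles=1000, stroke_mm=2.0, speed_mm_s=4.0)
print(series.sliding_distance_m(), series.duration_s())
```

If a cycle on these testers is a single pass, not a round trip, both figures halve, which is one more reason to keep the raw conditions on every action.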

Ten hours per paper, give or take.

What you get back

The thing I had not predicted, going in, was how much cross-paper lineage falls out once the specimens live in one place.

Specimens #26 through #28 — the bilayer trio with the W underlayers — now sit in folder 23 with descriptions that name both papers they appeared in. A wiki page in the Analysis section ties both DOIs to the specimens that produced the numbers. Someone reading the Energy Engineering paper a year from now can land on specimen #27 and see the structural characterisation that didn’t make it into that publication: diffraction, indentation, micrographs, all under the same item.

The phase ladder for the system — amorphous below 460 °C, single-phase quasicrystal at 460 °C / 19 h, layered phases by 600 °C, 2/1 approximant plus Laves plus Zr-Ti above 770 °C / 14 h — now lives in the wiki page, not scattered across two methods sections and three students’ OneNotes. None of that is new information. It just hadn’t been written down anywhere together.

The wiki page. Phase ladder, two DOIs, the specimens behind each number — one document where there used to be none.

That was the surprise. Not a result, not a publishable insight. A project I thought I knew well turned out to be more legible to me in March 2026 than it had been in 2022, when I was working on it every day.

Queries the archive now answers

The notebook is reachable through an MCP server, so the same questions you would put to a senior student can be put to an LLM connected to the database. Three real ones from the past few weeks.

“Plot hardness vs anneal temperature for the steel-substrate series.” It pulls the specimens, reads the H values out of the structured fields, and returns the table that Figure 3 of the TSF paper was built from. Not a regenerated figure — the underlying numbers, traceable back to the .mit file each one came from.
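Stripped of the LLM, that query is a filter and a sort. The sketch below assumes hardness and its spread are structured values while substrate and anneal state are prose in the description, as described earlier. The `Specimen` class, the helper names, and every row of sample data are placeholders for illustration, not project values.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Specimen:
    name: str
    description: str     # free prose: substrate and anneal state live here
    hardness_gpa: float  # structured value
    std_gpa: float       # structured value
    source_file: str     # original instrument file name


_TEMP = re.compile(r"(\d+(?:\.\d+)?)\s*°C")


def anneal_temp_c(description: str) -> Optional[float]:
    """Pull the first 'NNN °C' out of the prose, if there is one."""
    match = _TEMP.search(description)
    return float(match.group(1)) if match else None


def hardness_vs_anneal(
    specimens: List[Specimen], substrate: str
) -> List[Tuple[float, float, float, str]]:
    """Rows of (anneal °C, H, STD, source file), sorted by temperature."""
    rows = []
    for s in specimens:
        temp = anneal_temp_c(s.description)
        if temp is not None and substrate.lower() in s.description.lower():
            rows.append((temp, s.hardness_gpa, s.std_gpa, s.source_file))
    return sorted(rows, key=lambda row: row[0])


# Placeholder rows for illustration only; not values from the project.
specimens = [
    Specimen("X1", "stainless steel substrate, annealed at 600 °C", 12.0, 1.0, "x1.mit"),
    Specimen("X2", "leucosapphire substrate, annealed at 200 °C", 9.0, 0.5, "x2.mit"),
    Specimen("X3", "stainless steel substrate, annealed at 200 °C", 8.0, 0.7, "x3.mit"),
    Specimen("X4", "stainless steel substrate, as deposited", 7.0, 0.6, "x4.mit"),
]

table = hardness_vs_anneal(specimens, "steel")
for row in table:
    print(row)
```

Specimens with no temperature in the prose are skipped here. Whether an as-deposited film belongs on the plot is a judgment the sketch leaves to the person asking.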

“Which specimens in any project were measured by Ong Weeliat?” A grep across value descriptions finds the seven weeliat-prefixed indentation runs. Two minutes. In 2022 it would have been a Slack thread.
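The grep itself is small. A sketch, assuming descriptions begin with "From <operator> " as in the example quoted earlier; the function name and the dictionary of value IDs are hypothetical, and the second entry is an invented placeholder.

```python
from typing import Dict, List


def values_by_operator(descriptions: Dict[str, str], operator: str) -> List[str]:
    """Return IDs of values whose description starts with 'From <operator> '."""
    needle = f"from {operator.lower()} "
    return sorted(
        value_id
        for value_id, text in descriptions.items()
        if text.lower().startswith(needle)
    )


# Placeholder index: the first entry mirrors the text, the second is invented.
descriptions = {
    "v1": "From weeliat N26.mit (20 indents, STD 3.85)",
    "v2": "From someone N01.mit (10 indents, STD 1.00)",
}

hits = values_by_operator(descriptions, "weeliat")
print(hits)
```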

“What’s the maximum published hardness for TiZrNi in our lab?” 25 GPa. With the DOI.

The wiki summary itself was written by an LLM session in March, from the project’s structured contents — list_items(project_id=15) plus a Zotero collection lookup, published back through a wiki_create_page call. That’s the closest the story gets to an AI beat. The model did not extract structure from the raw files. It read structure we had spent two weeks putting in.

A few judgment calls

Carry the standard deviations across, even when they’re large. The temptation when ingesting is to record the means and quietly drop the spreads. Don’t. A noisy value with its noise attached tells you which specimens to trust, which to revisit, which conditions push the measurement past what the instrument can reliably resolve. A clean-looking table that hides the noise is worse than the raw file. The raw file at least knows it’s raw.
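In code the rule amounts to never returning a mean on its own. A minimal sketch follows; the function name, the 15 % coefficient-of-variation threshold, and the sample numbers are assumptions for illustration, and the source does not say whether its STD values are sample or population deviations.

```python
from statistics import mean, stdev
from typing import List, Tuple


def summarise(indents_gpa: List[float], cv_limit: float = 0.15) -> Tuple[float, float, bool]:
    """Return (mean, sample standard deviation, noisy?) for one indentation run."""
    m = mean(indents_gpa)
    s = stdev(indents_gpa)  # sample standard deviation (n - 1 denominator)
    # Flag the run when the spread is large relative to the mean.
    return m, s, (s / m) > cv_limit


# Placeholder hardness readings in GPa; not measurements from the project.
noisy = summarise([10.0, 12.0, 14.0])
quiet = summarise([9.5, 10.0, 10.5])
print(noisy, quiet)
```

The flag is a reading aid, not a filter: both runs keep their mean and their spread, and the noisy one is simply easier to spot.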

Don’t invent retro-fields. The original lab notebook did not have a structured “substrate” field. It had whatever the student wrote. We kept it that way: substrate is text in the specimen description, not a structured key. Retro-fitting structure the original work didn’t have means making decisions you don’t have authority to make six years later. Search will find prose. Search cannot rescue a wrong structured value.

Use the original filenames in value descriptions. They are the only thing that survives if a student leaves and the source files migrate to a different drive. weeliat N26.mit is ugly. weeliat N26.mit (20 indents, STD 3.85) will still be readable in 2031.

Tie specimens to DOIs the moment you ingest. The lineage between a specimen and the paper that reported its numbers lives in your head right now. It will not be there in five years. The wiki entry takes ten minutes per paper. Future-you will use it.

The project went live for read access on March 23, five collaborators in one click. Most of them had never seen the underlying samples. What I had not expected was how fast they reached for it — and what for. Not the published results. The specimens that didn’t make it in.

Try this in your lab

If you have a finished project on a hard drive that produced a paper or two, you can do this in roughly the budget the TiZrNi ingestion took. Ten hours per paper. A couple of weeks of evenings.

The work pays off most where specimens were shared — between papers, between students, between instruments. Anywhere a measurement on one sample shows up in two contexts and the linkage is currently in someone’s head. If the project is single-paper, single-student, single-instrument, the case is weaker.

You don’t need to ingest everything. The hour spent on the published specimens is worth more than the hour spent on scratch work nobody cited. Include the almost-published ones too — the samples that nearly made it into a figure. Future questions keep returning to those.

And the moment to do it, if you’re going to, is now. In another two years one of the students will move, a hard drive will fail, and the W-underlayer thickness on the bilayer specimens will not exist anywhere on Earth except in someone’s memory. Right now it’s in a Word document called Specimens_List_4_(26-28).docx. That is not a stable state. The notebook is.
