Certifying outputs and detecting drift
Source: vignettes/certify-and-drift.Rmd
This vignette covers Tier 2 of the reproducr workflow in depth: certify(), check_drift(), and list_certs(). Together, these three functions form the baseline and drift detection system.
The problem they solve — a real scenario
Scenario — The revision drift problem
You submit a paper in March. Before submission you run the analysis and note the key results: hazard ratio 0.582 (95% CI: 0.446–0.760, p < 0.001).
In May a reviewer asks for a revision. While working on the response
you upgrade your packages — including lme4, which adjusted
its default optimizer tolerances between versions 1.1.29 and 1.1.30. You
re-run the analysis: hazard ratio 0.591 (95% CI: 0.452–0.768).
The numbers are slightly different. No error was thrown. The code is identical. Without a record of what the March run produced, you would not know whether the change came from your revision or from the package upgrade.
[DRIFTED] hr: 0.582 → 0.591
[DRIFTED] ci_lower: 0.446 → 0.452
[DRIFTED] ci_upper: 0.760 → 0.768
With certify() and check_drift(), the change is flagged as soon as you re-run the analysis, so you can investigate before sending the revision to the reviewer.
More broadly, packages change hands, maintainers push silent fixes, platform-level libraries (BLAS, LAPACK) get updated by system administrators, and R itself has changed RNG defaults between minor versions (the default sampling algorithm used by sample() changed in R 3.6.0, for example). Any of these can alter your numerical results without producing an error.
certify() and check_drift() detect this.
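As a small base R illustration (not reproducr code) of how a result can shift without any error: floating-point addition is not associative, so a library that sums the same numbers in a different order can return a value that differs in its last bits. The sketch below uses a hand-picked example to show the effect.

```r
# Two mathematically equal sums, evaluated in a different order.
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

# Exact comparison: the two results are not bit-for-bit identical.
identical(a, b)

# Comparison within a numeric tolerance: the two results agree.
isTRUE(all.equal(a, b))

# The difference is tiny but nonzero.
abs(a - b)
```

A difference this small is invisible in a printed table, which is why an exact, automated comparison against a stored baseline is useful.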
The idea is simple:
- After a successful analysis run, hash the key outputs and store the hashes.
- Later — after any change to the environment — re-run the analysis and compare the new hashes against the stored ones.
- Any mismatch is reported explicitly, by output name.
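The three steps above can be sketched in base R. This is an illustration of the idea only, not how reproducr is implemented: hash_object() and key_outputs() are hypothetical helpers, and MD5 over serialized bytes is an assumption chosen because it needs no extra packages. The serialized bytes can also carry metadata such as the R version that wrote them, so a real tool would need to normalise that before hashing.

```r
# Hypothetical helper (not part of reproducr): MD5 of an object's serialized bytes.
hash_object <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(x, connection = NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical helper: the key outputs of one analysis run.
key_outputs <- function(data) {
  model <- lm(mpg ~ wt + cyl, data = data)
  list(coefs = coef(model), n_groups = length(unique(data$cyl)))
}

# Step 1: hash the outputs of the baseline run and keep the hashes.
baseline <- vapply(key_outputs(mtcars), hash_object, character(1))

# Step 2: re-run after a change (here: one row dropped) and hash again.
current <- vapply(key_outputs(mtcars[-1, ]), hash_object, character(1))

# Step 3: report mismatches by output name.
changed <- names(baseline)[baseline != current[names(baseline)]]
changed
```

Here the coefficients change because the data changed, while the number of cylinder groups stays the same, so only coefs is reported.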
certify() — creating a baseline
What gets hashed
Pass a fully named list of any R objects you want to protect. Common choices:
# Assumed setup (not shown in the original): load the package and
# point cert_file at a certification store.
library(reproducr)
cert_file <- tempfile()

model <- lm(mpg ~ wt + cyl, data = mtcars)
certify(
  outputs = list(
    coefs       = coef(model),
    r_squared   = summary(model)$r.squared,
    sigma       = sigma(model),
    n_obs       = nrow(mtcars),
    n_complete  = sum(complete.cases(mtcars)),
    group_means = aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
  ),
  tag    = "baseline-v1",
  script = "analysis.R",
  file   = cert_file
)
#> reproducr: certified 6 output(s) [2026-06-10] under tag 'baseline-v1'
Choosing what to certify
Certify outputs that are:
- Conclusions — the numbers that appear in your paper or report
- Stable — not random session artefacts like timestamps or row ordering
- Interpretable — so a drift report tells you something meaningful
Avoid certifying objects that are expected to differ across runs by
design, such as proc.time() outputs or
Sys.time() values.
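Row ordering is a common source of spurious drift. The base R sketch below shows two data frames with the same content in a different row order, and one way to normalise them before certifying. normalise() is a hypothetical helper, not a reproducr function.

```r
# Same group means, but the rows arrive in a different order.
means_a <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
means_b <- means_a[order(-means_a$cyl), ]

# The two objects are not identical, so their hashes would differ too.
identical(means_a, means_b)

# Hypothetical normaliser: sort by a key column and reset the row names.
normalise <- function(df, key) {
  out <- df[order(df[[key]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# After normalisation the two objects compare as identical.
identical(normalise(means_a, "cyl"), normalise(means_b, "cyl"))
```

Passing the normalised object to certify() and check_drift() keeps the comparison focused on the values rather than on incidental ordering.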
Tags and the certification store
Every certification requires a tag — a human-readable
label:
certify(
  outputs = list(coefs = coef(model)),
  tag     = "pre-peer-review",
  file    = cert_file
)
#> reproducr: certified 1 output(s) [2026-06-10] under tag 'pre-peer-review'
certify(
  outputs = list(coefs = coef(model)),
  tag     = "post-revision",
  file    = cert_file
)
#> reproducr: certified 1 output(s) [2026-06-10] under tag 'post-revision'
Passing a duplicate tag overwrites the existing record with a warning:
certify(
  outputs = list(coefs = coef(model)),
  tag     = "baseline-v1",
  file    = cert_file
)
#> Warning: Tag 'baseline-v1' already exists in '/tmp/RtmpVu4Wu1/file1c70519a52e'.
#> Overwriting.
#> reproducr: certified 1 output(s) [2026-06-10] under tag 'baseline-v1'
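Because a repeated tag replaces the earlier record, one way to avoid accidental overwrites is to build tags that include a date. The helper below is a hypothetical convenience in base R, not part of reproducr.

```r
# Hypothetical helper: combine a label with a date to form a tag.
make_tag <- function(label, date = Sys.Date()) {
  paste(label, format(date, "%Y-%m-%d"), sep = "-")
}

make_tag("submitted", as.Date("2026-01-15"))
#> [1] "submitted-2026-01-15"
```

The resulting string can be passed as the tag argument of certify(). Two certifications made on the same day with the same label would still collide.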
list_certs() — inspecting the store
list_certs(file = cert_file)
#> tag timestamp r_version os
#> 1 baseline-v1 2026-06-10T21:15:02+0000 4.6.0 Linux 6.17.0-1015-azure
#> 2 pre-peer-review 2026-06-10T21:15:01+0000 4.6.0 Linux 6.17.0-1015-azure
#> 3 post-revision 2026-06-10T21:15:01+0000 4.6.0 Linux 6.17.0-1015-azure
#> n_outputs script
#> 1 1 <NA>
#> 2 1 <NA>
#> 3 1 <NA>
check_drift() — comparing against a baseline
Basic usage
model2 <- lm(mpg ~ wt + cyl, data = mtcars)
result <- check_drift(
  outputs = list(
    coefs       = coef(model2),
    r_squared   = summary(model2)$r.squared,
    sigma       = sigma(model2),
    n_obs       = nrow(mtcars),
    n_complete  = sum(complete.cases(mtcars)),
    group_means = aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
  ),
  against = "baseline-v1",
  file    = cert_file
)
#>
#> -- reproducr drift check vs 'baseline-v1' --
#>
#> Verdict : ALL OUTPUTS MATCH
#> OK : 1
#> Drifted : 0
#> Missing : 0
#> New : 5
Five outputs are reported as New because the 'baseline-v1' tag was overwritten earlier with a certification that contains only coefs, so the other five outputs have no stored hash to compare against.
The four statuses
certify(
  outputs = list(
    stays_same  = 42L,
    will_change = coef(lm(mpg ~ wt, data = mtcars)),
    will_vanish = "this output disappears next run"
  ),
  tag  = "four-statuses",
  file = cert_file
)
#> reproducr: certified 3 output(s) [2026-06-10] under tag 'four-statuses'
demo_result <- check_drift(
  outputs = list(
    stays_same  = 42L,
    will_change = coef(lm(mpg ~ hp, data = mtcars)),
    brand_new   = "this output is new"
  ),
  against = "four-statuses",
  file    = cert_file
)
#>
#> -- reproducr drift check vs 'four-statuses' --
#>
#> Verdict : DRIFT DETECTED
#> OK : 1
#> Drifted : 1
#> Missing : 1
#> New : 1
#>
#> Drifted outputs:
#> - will_change
print(demo_result)
#>
#> -- reproducr drift report --
#>
#> [OK] stays_same
#> [DRIFT] will_change
#> Hash mismatch (numeric tolerance check requires stored values).
#> [NEW] brand_new
#> Not present in the baseline certification.
#> [MISSING] will_vanish
#> Present in baseline but not supplied to check_drift().

| Status | Meaning |
|---|---|
| ok | Hash matches the baseline exactly |
| drifted | Hash differs; the output has changed |
| missing | Present in baseline, not supplied to check_drift() |
| new | Supplied to check_drift(), not in baseline |
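The four statuses amount to a comparison of two named sets of hashes. The sketch below reproduces that logic in base R on made-up hash strings. classify_outputs() is a hypothetical function written for illustration and is not how check_drift() is implemented.

```r
# Hypothetical classifier: compare two named character vectors of hashes.
classify_outputs <- function(baseline, current) {
  all_names <- union(names(baseline), names(current))
  status <- vapply(all_names, function(nm) {
    if (!nm %in% names(current)) return("missing")
    if (!nm %in% names(baseline)) return("new")
    if (identical(baseline[[nm]], current[[nm]])) "ok" else "drifted"
  }, character(1))
  data.frame(
    output = all_names,
    status = unname(status),
    stringsAsFactors = FALSE
  )
}

# Made-up hashes standing in for stored and freshly computed values.
baseline <- c(stays_same = "a1", will_change = "b2", will_vanish = "c3")
current  <- c(stays_same = "a1", will_change = "zz", brand_new = "d4")

statuses <- classify_outputs(baseline, current)
statuses
```

With these inputs, stays_same is ok, will_change is drifted, will_vanish is missing, and brand_new is new, mirroring the demo report above.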
Using "latest"
certify(outputs = list(x = 1L), tag = "run-1", file = cert_file)
#> reproducr: certified 1 output(s) [2026-06-10] under tag 'run-1'
certify(outputs = list(x = 1L), tag = "run-2", file = cert_file)
#> reproducr: certified 1 output(s) [2026-06-10] under tag 'run-2'
certify(outputs = list(x = 1L), tag = "run-3", file = cert_file)
#> reproducr: certified 1 output(s) [2026-06-10] under tag 'run-3'
check_drift(outputs = list(x = 1L), against = "latest", file = cert_file)
#> reproducr: comparing against latest tag: 'run-3'
#>
#> -- reproducr drift check vs 'run-3' --
#>
#> Verdict : ALL OUTPUTS MATCH
#> OK : 1
#> Drifted : 0
#> Missing : 0
#> New : 0
Using drift results programmatically
# current_outputs is a named list of freshly computed outputs.
result <- check_drift(outputs = current_outputs, against = "latest")
n_drifted <- sum(result$status == "drifted")
if (n_drifted > 0L) {
  drifted_names <- result$output[result$status == "drifted"]
  stop(sprintf(
    "%d output(s) have drifted since last certification: %s",
    n_drifted,
    paste(drifted_names, collapse = ", ")
  ))
}
Recommended workflow
After reviewer comments
# `data` and compute_d() stand in for your own data set and
# effect-size function.
check_drift(
  outputs = list(
    primary_coef = coef(model)[2],
    primary_pval = summary(model)$coefficients[2, 4],
    n            = nrow(data),
    effect_size  = compute_d(model)
  ),
  against = "submitted-2026-01-15"
)
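In a revision workflow it can help to turn the result of a call like the one above into a hard failure, so that a drifted or missing output stops the script before any tables are regenerated. The sketch below assumes only that the result has the output and status columns used earlier in this vignette. mock_result and assert_no_drift() are hypothetical and written for illustration.

```r
# Hypothetical stand-in for a check_drift() result; assumes only the
# `output` and `status` columns used earlier in this vignette.
mock_result <- data.frame(
  output = c("primary_coef", "primary_pval", "n"),
  status = c("ok", "drifted", "missing"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop if any output has a status listed in `fail_on`.
assert_no_drift <- function(result, fail_on = c("drifted", "missing")) {
  bad <- result$status %in% fail_on
  if (any(bad)) {
    details <- sprintf("%s [%s]", result$output[bad], result$status[bad])
    stop(sprintf(
      "%d output(s) failed the drift check: %s",
      sum(bad),
      paste(details, collapse = ", ")
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Capture the error message instead of halting, for demonstration.
msg <- tryCatch(assert_no_drift(mock_result), error = conditionMessage)
msg
```

Treating missing outputs as failures is a design choice: an output that silently drops out of the comparison is no longer protected, so it is flagged here alongside drifted ones.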