This vignette describes measure’s internal class system. While most users won’t need to interact with these internals directly, understanding them is useful if you’re:
Early versions of measure relied on a column named
.measures to store spectral data. This worked but had
limitations:
.measures column
Following Issue #16,
measure now uses custom S3 classes. This enables robust detection via
inherits() and supports multiple measure columns per
dataset (see Multiple Measure
Columns below).
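To make the robustness argument concrete, here is a minimal base R sketch. The class name toy_measure_list and the data are hypothetical stand-ins, not measure's actual constructors. The only point is that an inherits() check keeps working after a column is renamed, while a name check does not.

```r
# Hypothetical stand-in for a measure column: a list of per-sample data frames
# tagged with an S3 class (measure's real classes are built differently).
spectra <- structure(
  list(
    data.frame(location = 1:3, value = c(2.61, 2.62, 2.63)),
    data.frame(location = 1:3, value = c(2.80, 2.81, 2.83))
  ),
  class = c("toy_measure_list", "list")
)

samples <- data.frame(id = 1:2)
samples$.measures <- spectra

# Name-based detection breaks as soon as the column is renamed ...
names(samples)[names(samples) == ".measures"] <- "nir_spectrum"
found_by_name <- ".measures" %in% names(samples)

# ... while class-based detection does not depend on the name at all.
found_by_class <- any(
  vapply(samples, inherits, logical(1), what = "toy_measure_list")
)
```

After the rename, found_by_name is FALSE and found_by_class is TRUE.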
measure uses a two-level class hierarchy:
measure_tbl
A single measurement - a tibble with location and
value columns:
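library(measure)  # step_measure_*() functions and class helpers
library(recipes)  # recipe(), update_role(), prep(), bake()
library(dplyr)    # vars()

# After preprocessing, each row's .measures element is a measure_tbl
rec <- recipe(water + fat + protein ~ ., data = meats_long) |>
  update_role(id, new_role = "id") |>
  step_measure_input_long(transmittance, location = vars(channel)) |>
  prep()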
result <- bake(rec, new_data = NULL)
# Extract one measurement
one_measurement <- result$.measures[[1]]
one_measurement
#> <measure_tbl [100 x 2]>
#> # A tibble: 100 × 2
#> location value
#> <int> <dbl>
#> 1 1 2.62
#> 2 2 2.62
#> 3 3 2.62
#> 4 4 2.62
#> 5 5 2.62
#> 6 6 2.62
#> 7 7 2.62
#> 8 8 2.62
#> 9 9 2.63
#> 10 10 2.63
#> # ℹ 90 more rows
# Check the class
class(one_measurement)
#> [1] "measure_tbl" "tbl_df" "tbl" "data.frame"
is_measure_tbl(one_measurement)
#> [1] TRUE
measure_list
A list column containing multiple measure_tbl objects -
one per row in your data:
# The .measures column itself is a measure_list
class(result$.measures)
#> [1] "measure_list" "vctrs_list_of" "vctrs_vctr" "list"
is_measure_list(result$.measures)
#> [1] TRUE
# Nice printing in tibbles
result
#> # A tibble: 215 × 6
#> id water fat protein .measures channel
#> <int> <dbl> <dbl> <dbl> <meas> <list>
#> 1 1 60.5 22.5 16.7 [100 × 2] <int [100]>
#> 2 2 46 40.1 13.5 [100 × 2] <int [100]>
#> 3 3 71 8.4 20.5 [100 × 2] <int [100]>
#> 4 4 72.8 5.9 20.7 [100 × 2] <int [100]>
#> 5 5 58.3 25.5 15.5 [100 × 2] <int [100]>
#> 6 6 44 42.7 13.7 [100 × 2] <int [100]>
#> 7 7 44 42.7 13.7 [100 × 2] <int [100]>
#> 8 8 69.3 10.6 19.3 [100 × 2] <int [100]>
#> 9 9 61.4 19.9 17.7 [100 × 2] <int [100]>
#> 10 10 61.4 19.9 17.7 [100 × 2] <int [100]>
#> # ℹ 205 more rows
measure provides helper functions to find and validate measure columns:
is_measure_list() and is_measure_tbl()
Test if an object has the appropriate class:
find_measure_cols()
Find all measure columns in a data frame:
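Conceptually, finding measure columns amounts to a class check on every column. The base R sketch below illustrates that idea with several list columns in one data frame. Both find_cols_by_class() and toy_measure_list() are hypothetical names invented for this illustration, and measure's own implementation may differ.

```r
# Hypothetical helper: names of all columns inheriting from a given class.
find_cols_by_class <- function(data, class_name) {
  is_match <- vapply(data, inherits, logical(1), what = class_name)
  names(data)[is_match]
}

# Toy constructor for a classed list column (not measure's real constructor).
toy_measure_list <- function(...) {
  structure(list(...), class = c("toy_measure_list", "list"))
}

samples <- data.frame(id = 1:2)
samples$.uv_spectrum <- toy_measure_list(
  data.frame(location = 1:2, value = c(0.10, 0.12)),
  data.frame(location = 1:2, value = c(0.20, 0.22))
)
samples$.ms_spectrum <- toy_measure_list(
  data.frame(location = 1:2, value = c(5, 7)),
  data.frame(location = 1:2, value = c(6, 8))
)
# A plain list column carries no special class, so it is not matched.
samples$notes <- list("ok", "rerun")

measure_like_cols <- find_cols_by_class(samples, "toy_measure_list")
```

Here measure_like_cols contains ".uv_spectrum" and ".ms_spectrum" but not "notes", even though all three are list columns.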
The class-based approach provides several benefits:
inherits(x, "measure_list") instead of checking column names
<meas [100]> instead of raw list output
If you’re writing functions that work with measure data:
my_function <- function(data) {
  # Validate input has measure columns
  has_measure_col(data)
  # Find measure columns
  meas_cols <- find_measure_cols(data)
  # Work with the measure_list
  for (col in meas_cols) {
    measurements <- data[[col]]
    # Each element is a measure_tbl with $location and $value
  }
}
The helper functions measure_to_matrix() and
matrix_to_measure() in R/helpers.R convert
between measure lists and matrices for bulk operations.
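The following base R sketch shows the general shape of such a round trip, assuming every sample shares the same locations. The helpers to_matrix_sketch() and from_matrix_sketch() are hypothetical and only illustrate the idea; the real measure_to_matrix() and matrix_to_measure() may take different arguments and return different types.

```r
# Hypothetical helpers for illustration only.
to_matrix_sketch <- function(measurements) {
  # One row per sample, one column per location
  do.call(rbind, lapply(measurements, function(m) m$value))
}

from_matrix_sketch <- function(mat, locations) {
  # Rebuild one location/value data frame per matrix row
  lapply(seq_len(nrow(mat)), function(i) {
    data.frame(location = locations, value = mat[i, ])
  })
}

measurements <- list(
  data.frame(location = 1:3, value = c(1, 2, 3)),
  data.frame(location = 1:3, value = c(4, 5, 6))
)

mat <- to_matrix_sketch(measurements)

# Bulk operation: subtract each location's mean across samples
centered <- sweep(mat, 2, colMeans(mat))

round_trip <- from_matrix_sketch(
  centered,
  locations = measurements[[1]]$location
)
```

Working on the matrix lets a column-wise operation such as sweep() touch all samples at once instead of looping over the list.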
While recipe steps are the primary interface for production pipelines, measure provides utility functions for interactive exploration and debugging.
When developing a custom transformation, use
measure_map() to test it interactively:
# Apply a transformation to each sample's measurements
centered <- measure_map(result, ~ {
  .x$value <- .x$value - mean(.x$value)
  .x
})
# Check the result
mean(centered$.measures[[1]]$value) # Should be ~0
#> [1] -1.599431e-16
Important: measure_map() is for
exploration only. Once your transformation works, move it to
step_measure_map() for reproducible pipelines:
When exploring data that might have problematic samples, use the safer variant:
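The general idea behind a safe mapping (apply a function to each sample, but catch failures instead of aborting the whole run) can be sketched in base R. This is not measure's safer variant: safe_map_sketch() and center() are hypothetical helpers written only for this illustration.

```r
# Hypothetical helper: apply fn to each sample, keeping failures visible.
safe_map_sketch <- function(measurements, fn) {
  lapply(measurements, function(m) {
    tryCatch(
      fn(m),
      error = function(e) {
        # Keep the original sample and record why the transformation failed
        attr(m, "map_error") <- conditionMessage(e)
        m
      }
    )
  })
}

# Example transformation that refuses samples with missing values
center <- function(m) {
  if (anyNA(m$value)) stop("sample contains missing values")
  m$value <- m$value - mean(m$value)
  m
}

measurements <- list(
  data.frame(location = 1:3, value = c(1, 2, 3)),
  data.frame(location = 1:3, value = c(4, NA, 6))  # problematic sample
)

out <- safe_map_sketch(measurements, center)

# Flag the samples whose transformation failed
failed <- vapply(out, function(m) !is.null(attr(m, "map_error")), logical(1))
```

The first sample is centered as usual, while the second is returned unchanged and flagged in failed, so one bad sample does not stop the exploration.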
Compute summary statistics across all samples at each measurement location:
# Default: mean and SD at each location
stats <- measure_summarize(result)
head(stats)
#> # A tibble: 6 × 3
#> location mean sd
#> <int> <dbl> <dbl>
#> 1 1 2.81 0.411
#> 2 2 2.81 0.413
#> 3 3 2.81 0.416
#> 4 4 2.82 0.418
#> 5 5 2.82 0.421
#> 6 6 2.82 0.424
This is useful for:
measure supports multiple measure columns in a single dataset. This is useful when you have different types of measurements (e.g., UV and MS spectra) that need separate processing.
Use the col_name parameter in input steps:
By default, processing steps operate on all measure columns:
To process specific columns, use the measures
parameter:
When multiple measure columns exist, output steps require you to specify which column to output:
rec <- rec |>
  step_measure_output_wide(measures = ".uv_spectrum", prefix = "uv_") |>
  step_measure_output_wide(measures = ".ms_spectrum", prefix = "ms_")
If you don’t specify and multiple columns exist, you’ll get a helpful error message telling you which columns are available.
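The column-resolution behaviour described here can be sketched in base R as follows. The function resolve_measure_col_sketch() and its error messages are hypothetical; measure's actual checks and wording may differ.

```r
# Hypothetical helper: pick the measure column to use, or fail informatively.
resolve_measure_col_sketch <- function(available, measures = NULL) {
  if (length(available) == 0L) {
    stop("No measure columns found.", call. = FALSE)
  }
  if (!is.null(measures)) {
    if (!measures %in% available) {
      stop(
        "Column '", measures, "' is not a measure column. Available: ",
        paste(available, collapse = ", "),
        call. = FALSE
      )
    }
    return(measures)
  }
  # Nothing requested: only unambiguous when exactly one column exists
  if (length(available) == 1L) {
    return(available)
  }
  stop(
    "Multiple measure columns found; specify one of: ",
    paste(available, collapse = ", "),
    call. = FALSE
  )
}

cols <- c(".uv_spectrum", ".ms_spectrum")

# Explicit choice succeeds
chosen <- resolve_measure_col_sketch(cols, measures = ".uv_spectrum")

# Ambiguous call fails and lists the candidates
ambiguous_msg <- tryCatch(
  resolve_measure_col_sketch(cols),
  error = function(e) conditionMessage(e)
)
```

Listing the available columns in the error message means the caller can correct the call without inspecting the data first.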