---
title: "Reference: Multi-Dimensional Measurements"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Reference: Multi-Dimensional Measurements}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(measure)
library(recipes)
library(dplyr)
```

## Introduction

Many analytical techniques produce multi-dimensional data. Examples include:

- **LC-DAD**: Liquid chromatography with diode array detection (time × wavelength)
- **GC×GC**: Comprehensive two-dimensional gas chromatography (time₁ × time₂)
- **EEM**: Excitation-emission matrix fluorescence (excitation × emission wavelength)
- **2D NMR**: Two-dimensional nuclear magnetic resonance (chemical shift × chemical shift)

The `measure` package provides native support for n-dimensional measurement data through the `measure_nd_tbl` and `measure_nd_list` classes.

## Creating 2D Measurement Data

Let's create synthetic LC-DAD data with retention time and wavelength dimensions:

```{r create-data}
set.seed(42)

# Simulate 3 samples with LC-DAD measurements
# 10 time points × 4 wavelengths = 40 data points per sample
lc_dad_data <- tibble(
  sample_id = rep(1:3, each = 40),
  retention_time = rep(rep(seq(0, 9, by = 1), each = 4), 3),
  wavelength = rep(c(254, 280, 320, 350), 30),
  absorbance = rnorm(120, mean = 100, sd = 10),
  concentration = rep(c(10, 25, 50), each = 40)
)

head(lc_dad_data, 12)
```

This is the typical "long format" for 2D analytical data, where each row represents a single measurement at a specific (time, wavelength) coordinate.
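
Every sample covers the same 10 × 4 set of coordinates, so its rows can be laid out as a time × wavelength matrix. The base-R sketch below only illustrates that layout. It does not use `measure`, and the absorbance values are made up. `tapply()` places each value in its (time, wavelength) cell. Any combination missing from the long table would appear as `NA`.

```{r long-to-matrix-sketch}
# One illustrative sample laid out like lc_dad_data:
# 10 retention times x 4 wavelengths, absorbance values 1 to 40
one_sample <- data.frame(
  retention_time = rep(0:9, each = 4),
  wavelength = rep(c(254, 280, 320, 350), times = 10),
  absorbance = seq_len(40)
)

# tapply() drops each value into its (retention_time, wavelength) cell;
# a combination missing from the long table would show up as NA
abs_matrix <- tapply(
  one_sample$absorbance,
  one_sample[c("retention_time", "wavelength")],
  sum
)
dim(abs_matrix)
abs_matrix["3", "280"]
```

This sketch starts from a complete table, so the matrix has no `NA` cells. That is the situation the package describes as a regular grid.
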

## Ingesting 2D Data

Use `step_measure_input_long()` with multiple location columns to create a 2D measure column:

```{r ingest-2d}
rec <- recipe(concentration ~ ., data = lc_dad_data) |>
  update_role(sample_id, new_role = "id") |>
  step_measure_input_long(
    absorbance,
    location = vars(retention_time, wavelength),
    dim_names = c("time", "wavelength"),
    dim_units = c("min", "nm")
  ) |>
  prep()

result <- bake(rec, new_data = NULL)
result
```

The `.measures` column is now a `measure_nd_list` that holds one 2D measurement per sample:

```{r inspect-class}
class(result$.measures)
measure_ndim(result$.measures)
```

## Inspecting 2D Measurements

Each element of the measure column is a `measure_nd_tbl`:

```{r inspect-element}
# First sample's measurement
m1 <- result$.measures[[1]]
class(m1)
m1
```

Dimension metadata is preserved:

```{r metadata}
measure_dim_names(m1)
measure_dim_units(m1)
```

## Grid Information

The `measure_grid_info()` function provides detailed information about the measurement grid:

```{r grid-info}
info <- measure_grid_info(m1)
info$ndim
info$shape
info$n_points
info$is_regular
```

A grid is "regular" when every combination of location values is present, so that the points form a complete rectangular grid. Each sample in `lc_dad_data` covers all 10 × 4 = 40 (time, wavelength) combinations.

## Applying 1D Operations to 2D Data

The `measure_apply()` function lets existing 1D preprocessing operations work on n-dimensional data by applying them along specified dimensions.
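
In base-R terms, applying a 1D function along dimension 1 of a time × wavelength matrix resembles `apply(x, 2, f)`: each wavelength column is passed to `f` as a time series. The sketch below is an analogy only and does not show how `measure_apply()` works internally. The matrix values are illustrative, and `moving_avg()` is a hypothetical helper.

```{r apply-along-sketch}
# Illustrative 10 (time) x 4 (wavelength) matrix with values 1 to 40
abs_grid <- matrix(
  as.numeric(seq_len(40)), nrow = 10, ncol = 4,
  dimnames = list(time = 0:9, wavelength = c(254, 280, 320, 350))
)

# Hypothetical helper: centered moving average; stats::filter() with
# sides = 2 returns NA at the edges where the window is incomplete
moving_avg <- function(v, window = 3) {
  as.numeric(stats::filter(v, rep(1 / window, window), sides = 2))
}

# MARGIN = 2 hands each wavelength column (one time series) to moving_avg()
smoothed_m <- apply(abs_grid, 2, moving_avg)
dim(smoothed_m)
smoothed_m[, "254"]
```

The `smooth_1d()` helper used with `measure_apply()` in this section drops its edge points. This version keeps the `NA` edges instead, because a base-R matrix must stay rectangular.
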

```{r define-smooth}
# Define a simple 1D smoothing function: a centered moving average that
# drops the edge points where the filter window is incomplete
smooth_1d <- function(x, window = 3) {
  if (nrow(x) < window) return(x)
  smoothed <- stats::filter(x$value, rep(1/window, window), sides = 2)
  valid <- !is.na(smoothed)
  new_measure_tbl(
    location = x$location[valid],
    value = as.numeric(smoothed[valid])
  )
}
```

Apply smoothing along the time dimension (dimension 1):

```{r apply-smooth}
# Apply to a single 2D measurement
smoothed <- measure_apply(m1, smooth_1d, along = 1, window = 3)

# Original had 40 points (10 time × 4 wavelength)
nrow(m1)

# Smoothed has fewer points: with window = 3, each wavelength slice
# loses its first and last time point (10 -> 8)
nrow(smoothed)
```

The function was applied independently to each wavelength slice, treating time as the 1D axis.

## Converting Back to Long Format

Use `step_measure_output_long()` to convert the nested measure back to long format:

```{r output-long}
output_rec <- recipe(concentration ~ ., data = lc_dad_data) |>
  update_role(sample_id, new_role = "id") |>
  step_measure_input_long(
    absorbance,
    location = vars(retention_time, wavelength)
  ) |>
  step_measure_output_long(
    values_to = "absorbance",
    location_to = "loc"
  ) |>
  prep()

output_result <- bake(output_rec, new_data = NULL)
head(output_result)
```

For 2D data, the location columns are named with dimension suffixes (`loc_1`, `loc_2`).

## Irregular Grids

Not all 2D data forms a regular rectangular grid.
The package also accepts irregular grids, in which some location combinations are missing:

```{r irregular}
# Create irregular data (different wavelengths sampled at different times)
irregular_data <- tibble(
  sample_id = rep(1, 7),
  time = c(0, 0, 0, 5, 5, 10, 10),
  wavelength = c(254, 280, 320, 254, 280, 254, 350),
  value = rnorm(7),
  outcome = 1
)

irr_rec <- recipe(outcome ~ ., data = irregular_data) |>
  update_role(sample_id, new_role = "id") |>
  step_measure_input_long(value, location = vars(time, wavelength)) |>
  prep()

irr_result <- bake(irr_rec, new_data = NULL)

# Check regularity: only 7 of the 3 × 4 = 12 possible (time, wavelength)
# combinations are present, so the grid is not regular
measure_is_regular(irr_result$.measures[[1]])
```

## Dimension Reduction Operations

The package provides several operations for reducing the dimensionality of nD data.

### Unfolding and Folding

`measure_unfold()` flattens nD data to 1D for modeling techniques that expect vectors:

```{r unfold}
# Unfold 2D to 1D
m1d <- measure_unfold(m1)
m1d

# The fold metadata is preserved
attr(m1d, "fold_info")$ndim
```

`measure_fold()` reconstructs the original nD structure:

```{r fold}
# Reconstruct 2D from 1D
m2d_restored <- measure_fold(m1d)
measure_ndim(m2d_restored)
```

### Slicing

`measure_slice()` extracts subsets at specific coordinates:

```{r slice}
# Extract data at wavelength = 254
slice_254 <- measure_slice(m1, wavelength = 254)
slice_254

# Extract multiple wavelengths (drop = FALSE keeps the 2D structure)
slice_uv <- measure_slice(m1, wavelength = c(254, 280), drop = FALSE)
measure_ndim(slice_uv)
```

### Projection

`measure_project()` aggregates across dimensions:

```{r project}
# Average across wavelengths to get a time trace
time_trace <- measure_project(m1, along = "wavelength")
time_trace

# Sum across time to get total absorbance per wavelength
wl_total <- measure_project(m1, along = "time", fn = sum)
wl_total
```

## Multi-Channel Operations

When you work with multiple detector channels (e.g., UV + RI in SEC, or multiple wavelengths in LC-DAD), the package provides steps for aligning and combining channels and for computing ratios between them.
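
As a rough base-R illustration of what aligning and ratioing channels involves, the sketch below uses two hypothetical signals recorded on different time grids. The signals and variable names are made up. Linear interpolation with `approx()` is an assumption and not necessarily what the package uses to fill missing values.

```{r channel-align-sketch}
# Two hypothetical detector channels recorded on different time grids
uv_time <- seq(0, 10, by = 0.5)
ri_time <- seq(0, 10, by = 1)
uv_signal <- exp(-(uv_time - 5)^2)
ri_signal <- 2 * exp(-(ri_time - 5)^2)

# "Intersection": keep only the time points both channels share
shared <- base::intersect(uv_time, ri_time)

# "Union": use every time point from either channel and linearly
# interpolate each channel onto that common grid (an assumption)
union_grid <- sort(base::union(uv_time, ri_time))
uv_aligned <- approx(uv_time, uv_signal, xout = union_grid)$y
ri_aligned <- approx(ri_time, ri_signal, xout = union_grid)$y

# With both channels on one grid, a channel ratio is element-wise division
uv_ri_ratio <- uv_aligned / ri_aligned
length(shared)
length(union_grid)
```

Intersection discards the half-minute UV points. Union keeps them and invents RI values by interpolation. That trade-off is the main thing to weigh when choosing an alignment method.
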

### Channel Alignment

`step_measure_channel_align()` aligns multiple measure columns to a common grid:

```{r channel-align, eval = FALSE}
# Align UV and RI detector signals to the same time grid
rec <- recipe(outcome ~ ., data = sec_data) |>
  step_measure_input_wide(starts_with("uv_"), col_name = "uv") |>
  step_measure_input_wide(starts_with("ri_"), col_name = "ri") |>
  step_measure_channel_align(uv, ri, method = "intersection") |>
  prep()
```

Methods include:

- `"intersection"`: Keep only locations present in all channels
- `"union"`: Include all locations, interpolating missing values
- `"reference"`: Align all channels to a reference channel's grid

### Channel Combination

`step_measure_channel_combine()` merges multiple channels:

```{r channel-combine, eval = FALSE}
# Stack channels into a single 2D measure (location x channel)
rec <- recipe(outcome ~ ., data = multi_detector_data) |>
  step_measure_input_wide(starts_with("uv_"), col_name = "uv") |>
  step_measure_input_wide(starts_with("ri_"), col_name = "ri") |>
  step_measure_channel_align(uv, ri) |>
  step_measure_channel_combine(uv, ri, strategy = "stack") |>
  prep()
```

Strategies include:

- `"stack"`: Create a 2D measure with channel as a dimension
- `"concat"`: Concatenate into a single 1D vector
- `"mean"` or `"weighted_sum"`: Combine into a single channel

### Channel Ratios

`step_measure_channel_ratio()` computes ratios between channels:

```{r channel-ratio, eval = FALSE}
# Compute UV/RI ratio for each sample
rec <- recipe(outcome ~ ., data = sec_data) |>
  step_measure_input_wide(starts_with("uv_"), col_name = "uv") |>
  step_measure_input_wide(starts_with("ri_"), col_name = "ri") |>
  step_measure_channel_align(uv, ri) |>
  step_measure_channel_ratio(numerator = uv, denominator = ri) |>
  prep()
```

## Multi-Way Analysis

To extract interpretable components from 2D or 3D measurement data, the package provides multi-way decomposition methods.
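
The multi-way methods work on the collection of per-sample 2D measurements viewed as a 3-way array (sample × time × wavelength). The following base-R sketch makes the Tucker model concrete. It computes a truncated higher-order SVD (HOSVD), one simple, non-iterative way to obtain a Tucker decomposition. `unfold()`, `fold()`, `mode_product()`, `hosvd_sketch()` and `tucker_reconstruct()` are hypothetical helpers. This is not necessarily the algorithm that `step_measure_tucker()` uses.

```{r hosvd-sketch}
# Mode-n unfolding: rows index dimension `mode`, columns run over the others
unfold <- function(X, mode) {
  perm <- c(mode, base::setdiff(seq_along(dim(X)), mode))
  matrix(aperm(X, perm), nrow = dim(X)[mode])
}

# Inverse of unfold(): reshape a mode-n matrix back into an array of size `dims`
fold <- function(M, mode, dims) {
  perm <- c(mode, base::setdiff(seq_along(dims), mode))
  aperm(array(M, dim = dims[perm]), order(perm))
}

# n-mode product: multiply every mode-n fiber of X by the matrix M
mode_product <- function(X, M, mode) {
  dims <- dim(X)
  dims[mode] <- nrow(M)
  fold(M %*% unfold(X, mode), mode, dims)
}

# Truncated HOSVD: factor n holds the leading left singular vectors of the
# mode-n unfolding; the core is X projected onto every factor
hosvd_sketch <- function(X, ranks) {
  factors <- lapply(seq_along(ranks), function(n) {
    svd(unfold(X, n), nu = ranks[n], nv = 0)$u
  })
  core <- X
  for (n in seq_along(factors)) core <- mode_product(core, t(factors[[n]]), n)
  list(core = core, factors = factors)
}

# Rebuild the (approximated) array from the core and the factor matrices
tucker_reconstruct <- function(core, factors) {
  for (n in seq_along(factors)) core <- mode_product(core, factors[[n]], n)
  core
}

# Illustrative 3 samples x 10 time points x 4 wavelengths array, built as a
# single concentration x elution profile x spectrum outer product
conc <- c(10, 25, 50)
elution <- dnorm(0:9, mean = 4.5, sd = 1.5)
spectrum <- c(1, 0.8, 0.3, 0.1)
X <- outer(outer(conc, elution), spectrum)

tucker_fit <- hosvd_sketch(X, ranks = c(1, 1, 1))
dim(tucker_fit$core)
max(abs(tucker_reconstruct(tucker_fit$core, tucker_fit$factors) - X))
```

The example array is one outer product, so a 1 × 1 × 1 core reproduces it up to rounding error. Real data need larger ranks. Iterative refinements such as HOOI, started from the HOSVD, can improve the fit for a given set of ranks.
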

### PARAFAC Decomposition

`step_measure_parafac()` performs Parallel Factor Analysis, extracting trilinear components:

```{r parafac, eval = FALSE}
# Extract 3 PARAFAC components from EEM fluorescence data
rec <- recipe(concentration ~ ., data = eem_data) |>
  step_measure_input_long(
    fluorescence,
    location = vars(excitation, emission)
  ) |>
  step_measure_parafac(n_components = 3) |>
  prep()

# Result contains parafac_1, parafac_2, parafac_3 score columns
baked <- bake(rec, new_data = NULL)
```

PARAFAC is particularly useful for:

- EEM fluorescence (excitation x emission matrices)
- Resolving overlapping chromatographic peaks
- Identifying underlying chemical species in mixtures

### Tucker Decomposition

`step_measure_tucker()` provides more flexibility by allowing a separate rank for each mode:

```{r tucker, eval = FALSE}
# Tucker decomposition with different ranks for each dimension
rec <- recipe(concentration ~ ., data = lc_dad_data) |>
  step_measure_input_long(
    absorbance,
    location = vars(retention_time, wavelength)
  ) |>
  step_measure_tucker(ranks = c(5, 3)) |> # 5 time, 3 wavelength components
  prep()
```

### MCR-ALS (Experimental)

`step_measure_mcr_als()` implements Multivariate Curve Resolution by Alternating Least Squares (MCR-ALS):

```{r mcr-als, eval = FALSE}
# MCR-ALS with non-negativity constraints
rec <- recipe(concentration ~ ., data = chrom_data) |>
  step_measure_input_long(
    absorbance,
    location = vars(time, wavelength)
  ) |>
  step_measure_mcr_als(
    n_components = 3,
    non_negativity = TRUE
  ) |>
  prep()
```

> **Note:** MCR-ALS is marked as experimental. The implementation uses a simple ALS algorithm suitable for exploratory analysis.
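
The base-R sketch below shows what an alternating least squares loop does for a single time × wavelength matrix, D ≈ C Sᵀ. Non-negativity is imposed by zeroing negative entries after each unconstrained least-squares update. This is a common shortcut; dedicated implementations often use non-negative least squares instead. `mcr_als_sketch()`, its arguments and the synthetic data are hypothetical. None of this is the package's implementation.

```{r mcr-als-sketch}
# Hypothetical helper: factor D (time x wavelength) as C %*% t(S) by
# alternating least squares, zeroing negative entries after each update
mcr_als_sketch <- function(D, n_comp, n_iter = 200, seed = 1) {
  set.seed(seed) # random non-negative starting spectra
  S <- matrix(runif(ncol(D) * n_comp), nrow = ncol(D), ncol = n_comp)
  # Tiny ridge keeps the normal equations invertible if a component collapses
  ridge <- diag(1e-8, nrow = n_comp)
  for (i in seq_len(n_iter)) {
    # Concentration profiles given spectra: C = D S (S'S)^-1
    C <- D %*% S %*% solve(crossprod(S) + ridge)
    C[C < 0] <- 0
    # Spectra given concentration profiles: S = D'C (C'C)^-1
    S <- t(D) %*% C %*% solve(crossprod(C) + ridge)
    S[S < 0] <- 0
  }
  # Relative lack of fit: ||D - C S'|| / ||D||
  lack_of_fit <- sqrt(sum((D - C %*% t(S))^2) / sum(D^2))
  list(C = C, S = S, lack_of_fit = lack_of_fit)
}

# Synthetic two-component chromatogram: 30 time points x 4 wavelengths
tt <- 1:30
C_true <- cbind(dnorm(tt, mean = 10, sd = 2), dnorm(tt, mean = 18, sd = 3))
S_true <- cbind(c(1, 0.6, 0.2, 0.05), c(0.1, 0.4, 0.9, 0.7))
D <- C_true %*% t(S_true)

mcr_fit <- mcr_als_sketch(D, n_comp = 2)
mcr_fit$lack_of_fit
```

The recovered profiles are only defined up to scaling and the order of the components. ALS can also stall in a local solution, so it is worth comparing several `seed` values.
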

## Summary

Key functions for multi-dimensional measurement data:

| Function | Purpose |
|----------|---------|
| `step_measure_input_long()` | Ingest nD data with multiple location columns |
| `step_measure_output_long()` | Convert nD data back to long format |
| `measure_ndim()` | Get number of dimensions |
| `measure_dim_names()` | Get semantic dimension names |
| `measure_dim_units()` | Get dimension units |
| `measure_is_regular()` | Check if grid is regular/rectangular |
| `measure_grid_info()` | Get detailed grid information |
| `measure_apply()` | Apply 1D functions along specified dimensions |
| `measure_unfold()` | Convert nD to 1D with fold metadata |
| `measure_fold()` | Reconstruct nD from unfolded 1D |
| `measure_slice()` | Extract slices at specific coordinates |
| `measure_project()` | Aggregate across dimensions |
| `step_measure_channel_align()` | Align channels to common grid |
| `step_measure_channel_combine()` | Combine multiple channels |
| `step_measure_channel_ratio()` | Compute ratios between channels |
| `step_measure_parafac()` | PARAFAC decomposition |
| `step_measure_tucker()` | Tucker decomposition |
| `step_measure_mcr_als()` | MCR-ALS decomposition (experimental) |

## See Also

- [Getting Started](../articles/tutorial-getting-started.html) - Introduction to measure workflows
- [Preprocessing Reference](../articles/reference-preprocessing.html) - Guide to preprocessing techniques
- [Internal Class System](explanation-internals.html) - How measure data structures work