--- title: "Model fitting for partially observed functional data" output: rmarkdown::html_vignette: toc: yes fig_width: 6 vignette: > %\VignetteIndexEntry{Model fitting for partially observed functional data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Introduction Partially observed functional data arise when each curve is recorded only on part of its domain, leaving one or more gaps. This is common when sensors fail for a period, when measurements are taken only while a condition holds, or when images are partially occluded. Discarding the incomplete curves or imputing the missing parts can bias the analysis, so it is preferable to work directly with the observed information. The `VDPO` package implements a scalar-on-function regression model for this setting. The functional covariate and the functional coefficient are represented with B-spline bases, and the model is estimated through the mixed model representation of penalized splines, which also takes care of the smoothing parameters. The approach is described in Hernandez-Amaro et al. (2025). This vignette shows how to simulate partially observed data, fit the model with `po_fit`, and inspect the estimated functional coefficient together with its pointwise confidence intervals. ``` r library(VDPO) ``` ## Data generation The `data_generator_po_1d` function simulates a scalar response together with a partially observed functional covariate. Its main arguments control the number of curves (`n`), the number of points in the common grid (`grid_points`), the response distribution (`response_type`), the shape of the true coefficient (`linear_predictor`) and the number of unobserved segments per curve (`n_missing`). ``` r set.seed(123) sim <- data_generator_po_1d(n = 150, grid_points = 80) ``` The returned list contains the observed curves in `noisy_curves_miss`, with `NA` on the unobserved part of each domain, the indices of the unobserved points in `missing_points`, the common grid in `grid`, the scalar `response`, and the true functional coefficient in `beta`, which we will use to assess the estimate. ``` r mean(is.na(sim$noisy_curves_miss)) #> [1] 0.2 ``` ## Model fitting The functional covariate enters the model through the `ffpo` constructor. Its arguments are the matrix of observed curves (`X`), the list of unobserved indices (`missing_points`), the common `grid`, and the basis specification: `nbasis` gives the number of basis functions for the curve representation and for the coefficient. The model is fitted with `po_fit`, which uses the same formula interface as the other model-fitting functions in the package. ``` r fit <- po_fit( response ~ ffpo(X = sim$noisy_curves_miss, missing_points = sim$missing_points, grid = sim$grid, nbasis = c(30, 30)), data = list(response = sim$response, X = sim$noisy_curves_miss, grid = sim$grid, missing_points = sim$missing_points) ) ``` The fitted object is a list of class `po_fit`. It returns the estimated functional coefficient (`Beta`), the intercept, the basis coefficients (`theta`) and their covariance (`covar_theta`), the observed-domain information (`M`), and the evaluations of the functional term (`ffpo_evals`). ``` r str(fit, max.level = 1) #> List of 7 #> $ fit :List of 15 #> ..- attr(*, "class")= chr "sop" #> $ Beta :List of 1 #> $ intercept : num 0.105 #> $ theta : num [1:30, 1] -0.01223 0.00181 0.01579 0.02877 0.03864 ... #> $ covar_theta: num [1:30, 1:30] 2.13e-04 1.20e-04 4.76e-05 6.97e-06 -8.03e-06 ... #> $ M :List of 1 #> $ ffpo_evals :List of 1 #> - attr(*, "class")= chr "po_fit" #> - attr(*, "N")= int 150 ``` ## Estimated coefficient The element `Beta` holds, for each functional term, a data frame with the grid (`t`), the estimated coefficient (`beta`), its standard error (`se`) and the lower and upper limits of the pointwise confidence interval (`lower` and `upper`). ``` r b <- fit$Beta[[1]] head(b) #> t beta se lower upper #> 1 0.0000000 0.001797685 0.008967647 -0.015778581 0.01937395 #> 2 0.1265823 0.006577162 0.007498188 -0.008119017 0.02127334 #> 3 0.2531646 0.011312337 0.006318194 -0.001071096 0.02369577 #> 4 0.3797468 0.015964972 0.005450567 0.005282057 0.02664789 #> 5 0.5063291 0.020487481 0.004870483 0.010941509 0.03003345 #> 6 0.6329114 0.024802314 0.004556353 0.015872026 0.03373260 ``` The coefficient is displayed together with its confidence band. ``` r plot(b$t, b$beta, type = "n", xlab = "t", ylab = expression(beta(t)), ylim = range(c(b$lower, b$upper))) polygon(c(b$t, rev(b$t)), c(b$lower, rev(b$upper)), col = grDevices::adjustcolor("#2c7fb8", 0.25), border = NA) lines(b$t, b$beta, col = "#2c7fb8", lwd = 2) lines(b$t, b$lower, col = "#2c7fb8", lty = 2) lines(b$t, b$upper, col = "#2c7fb8", lty = 2) legend("topright", legend = c("Estimate", "95% CI"), col = "#2c7fb8", lty = c(1, 2), lwd = c(2, 1), bty = "n") ``` ![](includes/VDPO-03-unnamed-chunk-7-1.png) Because the data are simulated, the estimate can be compared with the true coefficient. The correlation between the two is high, which confirms that the shape of the coefficient is recovered well. ``` r true_beta <- sim$beta[seq_along(b$beta)] cor(b$beta, true_beta) #> [1] 0.9938624 ``` ## The two-dimensional case When the functional observations are surfaces that are partially observed, the same workflow applies through the analogous functions `data_generator_po_2d`, `ffpo_2d` and `po_2d_fit`. The fitted object has the same structure, with the estimated coefficient returned as a surface. ## References Hernandez-Amaro et al. (2025). A new scalar on function generalized additive model for partially observed functional data. .