Package 'VDPO' reference manual

Title:	Working with and Analyzing Functional Data of Varying Lengths
Description:	Comprehensive set of tools for analyzing and manipulating functional data with non-uniform lengths. This package addresses two common scenarios in functional data analysis: Variable Domain Data, where the observation domain differs across samples, and Partially Observed Data, where observations are incomplete over the domain of interest. 'VDPO' enhances the flexibility and applicability of functional data analysis in 'R'. See Amaro et al. (2024) <doi:10.48550/arXiv.2401.05839>.
Authors:	Pavel Hernandez [aut, cre], Jose Ignacio Diez [ctr], Maria Durban [ctb], Maria del Carmen Aguilera-Morillo [ctb]
Maintainer:	Pavel Hernandez <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.0
Built:	2025-04-02 06:38:04 UTC
Source:	https://github.com/pavel-hernadez-amaro/vdpo

Grid adder for dataframes

Description

Prepares partially observed data to be passed to the ffpo function. Use this function only when the bidimensional_grid parameter of ffpo is FALSE.

Usage

add_grid(df, grid)

Arguments

`df`	`data.frame` object to which the grid will be added.
`grid`	Grid vector.

Value

data.frame with the grid added.

Generate 1D functional data for simulation studies

Description

Creates synthetic 1D functional data with optional noise components and different coefficient patterns. Uses the trapezoidal rule for numerical integration.
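As an illustration of the trapezoidal rule mentioned above, the following base R sketch integrates the product of a curve and a coefficient function over a grid. The helper `trapz()`, the curve, and the coefficient function are hypothetical and are not part of 'VDPO'; the package's internal implementation may differ.

```r
# Trapezoidal rule on a (possibly uneven) grid.
# trapz() is a hypothetical helper, not a 'VDPO' function.
trapz <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(x)
  sum(diff(x) * (y[-n] + y[-1]) / 2)
}

grid <- seq(0, 1, length.out = 100)  # same size as the default grid_points
curve <- sin(2 * pi * grid)          # illustrative curve
beta <- grid^2                       # illustrative coefficient function

# Approximate the integral of curve(t) * beta(t) over [0, 1]
inner_product <- trapz(grid, curve * beta)
```

The exact value of this particular integral is -1/(2 * pi), so the approximation can be checked against it.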

Usage

data_generator_po_1d(
  n = 100,
  grid_points = 100,
  noise_sd = 0.015,
  rsq = 0.95,
  beta_type = c("sin", "gaussian"),
  n_missing = 1,
  min_distance = NULL
)

Arguments

`n`	Number of samples to generate. Default is 100.
`grid_points`	Number of points in the grid. Default is 100.
`noise_sd`	Standard deviation of measurement noise. Default is 0.015.
`rsq`	Desired R-squared value for the response. Default is 0.95.
`beta_type`	Type of coefficient function ("sin" or "gaussian"). Default is "sin".
`n_missing`	Number of missing segments per curve. Default is 1.
`min_distance`	Minimum length of missing segments. Default is NULL (auto-calculated).
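One common way to reach a desired R-squared in a simulation is to scale the error variance relative to the variance of the noiseless signal. The base R sketch below shows that calculation; it is an assumption about how `rsq` might be used and is not taken from the 'VDPO' source. The `signal` vector is a stand-in for the noiseless response.

```r
set.seed(1)
rsq <- 0.95

# Stand-in for the noiseless response (hypothetical, for illustration only)
signal <- rnorm(10000)

# Solve rsq = var(signal) / (var(signal) + sigma2) for sigma2
sigma2 <- var(signal) * (1 - rsq) / rsq

# Add Gaussian errors with that variance
response <- signal + rnorm(length(signal), sd = sqrt(sigma2))

# Achieved ratio of signal variance to total variance
achieved <- var(signal) / var(response)
```

The achieved ratio matches `rsq` only approximately, because the simulated errors are a finite sample.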

Value

A list containing:

curves: List of n true (noiseless) curves
noisy_curves: List of n observed (noisy) curves
noisy_curves_miss: List containing curves with missing values
response: Vector of n response values
grid: Grid points
beta: True coefficient function
stochastic_components: Vector of a values used for each curve

A list containing:

curves: Matrix of n true (noiseless) curves, each as a row.
noisy_curves: Matrix of n observed (noisy) curves, each as a row.
noisy_curves_miss: Matrix of noisy curves with missing values.
miss_points: Indices of the missing segments in the noisy curves.
missing_points: Details of the missing segments for each curve.
response: Vector of n response values.
grid: Grid points on which the curves are defined.
beta: Coefficient function applied to the curves.
stochastic_components: List of stochastic coefficients used for each curve.

Examples

# Generate basic 1D functional data with default parameters
data <- data_generator_po_1d(n = 10)

# Generate data with a Gaussian-shaped coefficient function
data <- data_generator_po_1d(n = 2, beta_type = "gaussian")

# Generate data with higher grid resolution
data <- data_generator_po_1d(n = 2, grid_points = 200)

# Generate data with larger measurement noise
data <- data_generator_po_1d(n = 2, noise_sd = 0.05)

# Introduce missing segments in the curves
data <- data_generator_po_1d(n = 2, n_missing = 3, min_distance = 10)

# Generate data with low R-squared value
data <- data_generator_po_1d(n = 2, rsq = 0.8)

Generate 2D functional data for simulation studies

Description

Creates synthetic 2D functional data with optional noise components and different coefficient patterns. Uses Simpson's rule for numerical integration.
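For reference, composite Simpson's rule can be sketched in base R as below, first in one dimension and then nested to integrate over a rectangle. The helpers `simpson()` and `simpson_2d()` are hypothetical and are not 'VDPO' functions; the default of 50 intervals mirrors the documented default of `sub_response`, but the package's own routine may be organised differently.

```r
# Composite Simpson's rule with n (even) intervals on [a, b].
# simpson() is a hypothetical helper; f must be vectorised.
simpson <- function(f, a, b, n = 50) {
  stopifnot(n %% 2 == 0, n >= 2)
  x <- seq(a, b, length.out = n + 1)
  w <- c(1, rep(c(4, 2), length.out = n - 1), 1)  # weights 1, 4, 2, ..., 4, 1
  (b - a) / (3 * n) * sum(w * f(x))
}

# Nested Simpson's rule over the rectangle [ax, bx] x [ay, by]
simpson_2d <- function(f, ax, bx, ay, by, n = 50) {
  inner <- function(y) {
    sapply(y, function(yy) simpson(function(x) f(x, yy), ax, bx, n))
  }
  simpson(inner, ay, by, n)
}

# Illustrative surface: the integral of x^2 * y over the unit square is 1/6
surface_integral <- simpson_2d(function(x, y) x^2 * y, 0, 1, 0, 1)
```

Simpson's rule is exact for polynomials up to degree three, so this example recovers 1/6 up to floating-point error.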

Usage

data_generator_po_2d(
  n = 20,
  grid_x = 20,
  grid_y = 20,
  noise_sd = 0.015,
  rsq = 0.95,
  beta_type = c("saddle", "exp"),
  response_type = c("gaussian", "binomial"),
  a1 = NULL,
  a2 = NULL,
  sub_response = 50,
  n_missing = 1,
  min_distance_x = NULL,
  min_distance_y = NULL
)

Arguments

`n`	Number of samples to generate. Default is 20.
`grid_x`	Number of points in x-axis grid. Default is 20.
`grid_y`	Number of points in y-axis grid. Default is 20.
`noise_sd`	Standard deviation of measurement noise. Default is 0.015.
`rsq`	Desired R-squared value for the response. Default is 0.95.
`beta_type`	Type of coefficient surface ("saddle" or "exp"). Default is "saddle".
`response_type`	Type of the response variable ("gaussian" or "binomial"). Default is "gaussian".
`a1`	Optional fixed value for first stochastic component. If provided, a2 must also be provided.
`a2`	Optional fixed value for second stochastic component. If provided, a1 must also be provided.
`sub_response`	Number of intervals for Simpson integration. Default is 50.
`n_missing`	Number of holes (missing regions) in every surface. Default is 1.
`min_distance_x`	Length of the holes along the x-axis. Default is NULL.
`min_distance_y`	Length of the holes along the y-axis. Default is NULL.

Value

A list containing:

surfaces: List of n true (noiseless) surfaces
noisy_surfaces: List of n observed (noisy) surfaces
response: Vector of n response values
grid_x: x-axis grid points
grid_y: y-axis grid points
beta: True coefficient surface
stochastic_components: Matrix of a1 and a2 values used for each surface

Examples

# Generate basic 2D functional data with default parameters
data <- data_generator_po_2d(n = 2)

# Generate data with custom grid size and Gaussian response
data <- data_generator_po_2d(n = 2, grid_x = 30, grid_y = 30, response_type = "gaussian")

# Generate data with binomial response and saddle-shaped coefficient surface
data <- data_generator_po_2d(n = 2, response_type = "binomial", beta_type = "saddle")

# Generate data with fixed stochastic components
data <- data_generator_po_2d(n = 2, a1 = 0.1, a2 = -0.2)

# Introduce missing data with holes along curves
data <- data_generator_po_2d(n = 2, n_missing = 3, min_distance_x = 5, min_distance_y = 5)

Data generator function for the variable domain case

Description

Generates data from a variable domain functional regression model.

Usage

data_generator_vd(
  N = 100,
  J = 100,
  nsims = 1,
  Rsq = 0.95,
  aligned = TRUE,
  multivariate = FALSE,
  beta_index = 1,
  use_x = FALSE,
  use_f = FALSE
)

Arguments

`N`	Number of subjects.
`J`	Maximum number of observations per subject.
`nsims`	Number of simulations in the simulation study.
`Rsq`	R-squared value of the model.
`aligned`	If TRUE, the generated data are aligned.
`multivariate`	If TRUE, the data are generated with 2 functional variables.
`beta_index`	Index of the beta (functional coefficient) to use.
`use_x`	If TRUE, the data are generated with a non-functional covariate (x1).
`use_f`	If TRUE, the data are generated with a smooth term (x2).

Value

A list containing the following components:

y: vector of length N containing the response variable.
X_s: matrix of non-noisy functional data for the first functional covariate.
X_se: matrix of noisy functional data for the first functional covariate.
Y_s: matrix of non-noisy functional data for the second functional covariate (if multivariate).
Y_se: matrix of noisy functional data for the second covariate (if multivariate).
x1: vector of length N containing the non-functional covariate (if use_x is TRUE).
x2: vector of length N containing the observed values of the smooth term (if use_f is TRUE).
smooth_term: vector of length N containing a smooth term (if use_f is TRUE).
Beta: array containing the true functional coefficients.

Examples

# Basic usage with default parameters
sim_data <- data_generator_vd()

# Generate data with non-aligned domains
non_aligned_data <- data_generator_vd(N = 150, J = 120, aligned = FALSE)

# Generate multivariate functional data
multivariate_data <- data_generator_vd(N = 200, J = 100, multivariate = TRUE)

# Generate data with non-functional covariates and smooth term
complex_data <- data_generator_vd(
  N = 100,
  J = 150,
  use_x = TRUE,
  use_f = TRUE
)

# Generate data with a different beta function and R-squared value
custom_beta_data <- data_generator_vd(
  N = 80,
  J = 80,
  beta_index = 2,
  Rsq = 0.8
)

# Access components of the generated data
y <- sim_data$y # Response variable
X_s <- sim_data$X_s # Noise-free functional covariate
X_se <- sim_data$X_se # Noisy functional covariate

Defining partially observed functional data terms in VDPO formulae

Description

Auxiliary function used to define ffpo terms within VDPO model formulae.

Usage

ffpo(X, grid, bidimensional_grid = FALSE, nbasis = c(30, 30), bdeg = c(3, 3))

Arguments

`X`	partially observed functional covariate `matrix`.
`grid`	observation grid of the covariate.
`bidimensional_grid`	boolean value that specifies if the grid should be treated as 1-dimensional or 2-dimensional. The default value is `FALSE` (1-dimensional). See also 'Details'.
`nbasis`	number of basis functions to use.
`bdeg`	degree of the basis functions.

Details

When the same observation points are used for every functional covariate, we end up with a vector observation grid. Imagine plotting multiple curves, each representing a functional covariate, all measured at the same time instances.

Conversely, if the observation points differ for each functional covariate, we have a matrix observation grid. Picture a matrix where each row represents a functional covariate, and the columns denote distinct observation points. Varying observation points introduce complexity, as each covariate might be sampled at different time instances.
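The two grid layouts described above can be pictured with a small base R sketch. The object names are hypothetical, and the mapping to `bidimensional_grid` (vector grid with `FALSE`, matrix grid with `TRUE`) is an assumption based on the argument description rather than on the package source.

```r
n_curves <- 3
n_points <- 5

# Vector grid: every curve is observed at the same points
grid_vector <- seq(0, 1, length.out = n_points)

# Matrix grid: one row of observation points per curve
set.seed(1)
grid_matrix <- t(replicate(n_curves, sort(runif(n_points))))

length(grid_vector)  # one shared set of n_points observation points
dim(grid_matrix)     # n_curves rows, n_points columns
```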

Value

The function is interpreted in the formula of a VDPO model. It returns a list containing the following elements:

B_ffpo: design matrix.
Phi: B-spline basis used for the functional coefficient.
M: vector or matrix object indicating the observed domain of the data.
nbasis: number of basis functions used.

Defining partially observed bidimensional functional data terms in VDPO formulae

Description

Auxiliary function used to define ffpo_2d terms within VDPO model formulae.

Usage

ffpo_2d(X, miss_points, missing_points, nbasis = rep(15, 4), bdeg = rep(3, 4))

Arguments

`X`	partially observed bidimensional functional covariate `matrix`.
`miss_points`, `missing_points`	`list` of missing observation points. See 'Details' for more information about the difference in structure between both.
`nbasis`	number of basis functions to use.
`bdeg`	degree of the basis functions.

Details

The difference between miss_points and missing_points is the format in which the data is presented.

miss_points is a list of lists in which each inner list corresponds to the observation points on the y-axis and contains the x-axis observation points of the missing values. It serves as a guide for locating missing observations in the functional data and is used to compute the inner product matrix correctly.

missing_points is a list where each element is a matrix containing the missing observation points.

Value

The function is interpreted in the formula of a VDPO model. It returns a list containing the following elements:

B_ffpo2d: design matrix.
Phi_ffpo2d: bidimensional B-spline basis used for the functional coefficient.
M_ffpo2d: the missing_points used as input in the function.
nbasis: number of basis functions used.

Defining variable domain functional data terms in vd_fit formulae

Description

Auxiliary function used to define ffvd terms within vd_fit model formulae. This term represents a functional predictor where each function is observed over a domain of varying length. The formulation is $\frac{1}{T_i} \int_1^{T_i} X_i(t)\beta(t,T_i)\,dt$, where $X_i(t)$ is a functional covariate of length $T_i$, and $\beta(t,T_i)$ is an unknown bivariate functional coefficient. The functional basis used to model this term is the B-spline basis.
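To make the integral in the formulation concrete, the base R sketch below evaluates it numerically for a single curve on the unit-spaced grid 1, ..., T_i using the trapezoidal rule. The helper `ffvd_term()` and the coefficient functions are hypothetical, and the convention that unobserved points are stored as trailing `NA` values is an assumption. In the package itself the coefficient is unknown and is estimated through B-spline bases, so this only illustrates the quantity being modelled.

```r
# Numerical value of (1 / T_i) * integral from 1 to T_i of X_i(t) * beta(t, T_i) dt
# for one curve. ffvd_term() is a hypothetical helper, not a 'VDPO' function.
ffvd_term <- function(x, beta_fun) {
  T_i <- sum(!is.na(x))        # domain length, assuming trailing NA padding
  t <- seq_len(T_i)            # unit-spaced grid 1, ..., T_i
  integrand <- x[t] * beta_fun(t, T_i)
  # Trapezoidal rule with unit spacing
  integral <- sum((integrand[-1] + integrand[-T_i]) / 2)
  integral / T_i
}

# Curve observed at 10 of 15 possible points, constant value 2
x <- c(rep(2, 10), rep(NA, 5))

# Constant coefficient: (1 / 10) * integral from 1 to 10 of 2 dt = 1.8
term_constant <- ffvd_term(x, function(t, T) rep(1, length(t)))

# Coefficient depending on both t and the domain length T
term_ratio <- ffvd_term(rep(1, 5), function(t, T) t / T)
```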

Usage

ffvd(X, grid, nbasis = c(30, 50, 30), bdeg = c(3, 3, 3))

Arguments

`X`	variable domain functional covariate `matrix`.
`grid`	observation points of the variable domain functional covariate. If not provided, it will be `1:ncol(X)`.
`nbasis`	number of B-spline basis functions to use.
`bdeg`	degree of the B-spline basis functions.

Value

The function is interpreted in the formula of a VDPO model. It returns a list containing the following elements:

B: design matrix.
X_hat: smoothed functional covariate.
L_Phi and B_T: 1-dimensional marginal B-spline bases used for the functional coefficient.
M: matrix object indicating the observed domain of the data.
nbasis: number of basis functions used.

Examples

# Generate sample data
set.seed(123)
data <- data_generator_vd(beta_index = 1, use_x = FALSE, use_f = FALSE)
X <- data$X_se

# Specifying a custom grid
custom_grid <- seq(0, 1, length.out = ncol(X))
ffvd_term_custom_grid <- ffvd(X, grid = custom_grid, nbasis = c(10, 10, 10))

# Customizing the number of basis functions
ffvd_term_custom_basis <- ffvd(X, nbasis = c(10, 10, 10))

# Customizing both basis functions and degrees
ffvd_term_custom <- ffvd(X, nbasis = c(10, 10, 10), bdeg = c(3, 3, 3))

Plot Functional Curves with Confidence Intervals

Description

Generates a plot of functional Beta estimates for specified curves, along with their 95% confidence intervals. This function computes the 95% confidence intervals for each curve based on the covariance matrix and the fitted values from the provided object. The resulting plot includes estimated curves, confidence interval ribbons, and a legend distinguishing the curves.
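A pointwise 95% confidence interval of this kind is commonly obtained from a normal approximation: if the estimate is a basis matrix times the coefficient vector, its variance at each point is the corresponding diagonal entry of B Sigma B'. The base R sketch below shows that calculation with a toy basis `B` and made-up values for `theta` and `covar_theta` (named after the items returned by vd_fit). Whether plot_ci computes its intervals in exactly this way is an assumption.

```r
# Toy basis matrix evaluated at 20 points (hypothetical: intercept plus slope)
B <- cbind(1, seq(0, 1, length.out = 20))

# Made-up coefficient estimates and their covariance matrix
theta <- c(0.5, 2)
covar_theta <- matrix(c(0.04, 0.01,
                        0.01, 0.09), nrow = 2, byrow = TRUE)

# Point estimates and pointwise standard errors
beta_hat <- drop(B %*% theta)
se <- sqrt(rowSums((B %*% covar_theta) * B))  # sqrt of diag(B Sigma B')

# Normal-approximation 95% confidence limits
z <- qnorm(0.975)
ci <- data.frame(
  estimate = beta_hat,
  lower = beta_hat - z * se,
  upper = beta_hat + z * se
)
```

Computing the diagonal with `rowSums()` avoids forming the full 20 x 20 matrix, which matters when the evaluation grid is large.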

Usage

plot_ci(object, beta_index = 1, curves)

Arguments

`object`	An object of class `'vd_fit'` or similar, containing the fitted model results, Beta estimates, and evaluation details.
`beta_index`	An integer specifying which Beta coefficient matrix to use. Default is 1.
`curves`	A numeric vector specifying the indices of the curves (rows) to plot.

Value

A ggplot2 object displaying the Beta estimates and confidence intervals for the specified curves.

Examples

## Not run: 
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Assuming `model_object` is an object of class 'vd_fit'
  plot_ci(model_object, beta_index = 1, curves = c(50, 70, 100))
}

## End(Not run)

Title

Description

Title

Usage

po_2d_fit(formula, data, family = stats::gaussian(), offset = NULL)

Arguments

`formula`	.
`data`	.
`family`	.
`offset`	.

Value

Title

Description

Title

Usage

po_fit(formula, data, family = stats::gaussian(), offset = NULL)

Arguments

`formula`	.
`data`	.
`family`	.
`offset`	.

Value

Estimation of the generalized additive functional regression models for variable domain functional data

Description

The vd_fit function fits generalized additive functional regression models for variable domain functional data.

Usage

vd_fit(formula, data, family = stats::gaussian(), offset = NULL)

Arguments

`formula`	a formula object with at least one `ffvd` term.
`data`	a `list` object containing the response variable and the covariates as the components of the list.
`family`	a `family` object specifying the distribution from which the data originates. The default distribution is `gaussian`.
`offset`	An offset vector. The default value is `NULL`.
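The `family` argument takes a standard family object from the 'stats' package. The short base R sketch below inspects two such objects to show what they carry (distribution name, link, and inverse link). Which families vd_fit supports beyond the Gaussian default is not stated in this manual, so the binomial case is shown only as an example of a family object.

```r
# Default family used by vd_fit
fam <- stats::gaussian()
fam$family  # "gaussian"
fam$link    # "identity"

# A binomial family uses the logit link by default
bin <- stats::binomial()
bin$link    # "logit"

# The inverse link maps a linear predictor to the mean scale
p_at_zero <- bin$linkinv(0)
```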

Value

An object of class vd_fit. It is a list containing the following items:

An item named fit of class sop. See sop.fit.
An item named Beta which is the estimated functional coefficient.
An item named theta which is the basis coefficient of Beta.
An item named covar_theta which is the covariance matrix of theta.
An item named M which is the number of observation points for each curve.
An item named ffvd_evals which is the result of the evaluations of the ffvd terms in the formula.

Examples

# VARIABLE DOMAIN FUNCTIONAL DATA EXAMPLE

# set seed for reproducibility
set.seed(42)

# generate example data
data <- data_generator_vd(
  N = 100,
  J = 100,
  beta_index = 1,
  use_x = TRUE,
  use_f = TRUE,
)

# Define a formula object that specifies the model behavior.
# The formula includes a functional form of the variable 'X_se' using 'ffvd'
# with a non-default number of basis functions ('nbasis' is set to c(10, 10, 10)).
# Additionally, it includes a smooth function 'f' applied to 'x2' with 10 segments ('nseg = 10'),
# a second-order penalty ('pord = 2'), and cubic splines ('degree = 3').
# The model also contains the linear term 'x1'.
formula <- y ~ ffvd(X_se, nbasis = c(10, 10, 10)) + f(x2, nseg = 10, pord = 2, degree = 3) + x1

# We can fit the model using the data and the formula
res <- vd_fit(formula = formula, data = data)

# Some important components of the model can be accessed as follows
res$Beta # variable domain functional coefficient
res$fit$fitted.values # estimated response variable

# Also, a summary of the fit can be accessed using the summary function
summary(res)

# And a heatmap for a specific beta can be obtained using the plot function
plot(res, beta_index = 1)
