Package 'VDPO'

Title: Working with and Analyzing Functional Data of Varying Lengths
Description: Comprehensive set of tools for analyzing and manipulating functional data with non-uniform lengths. This package addresses two common scenarios in functional data analysis: Variable Domain Data, where the observation domain differs across samples, and Partially Observed Data, where observations are incomplete over the domain of interest. 'VDPO' enhances the flexibility and applicability of functional data analysis in 'R'. See Amaro et al. (2024) <doi:10.48550/arXiv.2401.05839>.
Authors: Pavel Hernandez [aut, cre], Jose Ignacio Diez [ctr], Maria Durban [ctb], Maria del Carmen Aguilera-Morillo [ctb]
Maintainer: Pavel Hernandez <[email protected]>
License: MIT + file LICENSE
Version: 0.1.0
Built: 2024-11-25 11:19:07 UTC
Source: https://github.com/pavel-hernadez-amaro/vdpo


Grid adder for dataframes

Description

Prepares partially observed data for input to the ffpo function. Use this function only when the bidimensional_grid parameter of ffpo is FALSE.

Usage

add_grid(df, grid)

Arguments

df

data.frame object to which the grid will be added.

grid

Grid vector.

Value

data.frame with the grid added.

See Also

ffpo


Data generator function for the variable domain case

Description

Generates data from a variable domain functional regression model.

Usage

data_generator_vd(
  N = 100,
  J = 100,
  nsims = 1,
  Rsq = 0.95,
  aligned = TRUE,
  multivariate = FALSE,
  beta_index = 1,
  use_x = FALSE,
  use_f = FALSE
)

Arguments

N

Number of subjects.

J

Maximum number of observations per subject.

nsims

Number of simulations in the simulation study.

Rsq

R-squared value of the model.

aligned

If TRUE, the generated data are aligned.

multivariate

If TRUE, the data is generated with 2 functional variables.

beta_index

Index selecting which beta (true functional coefficient) is used.

use_x

If TRUE, the data are generated with the non-functional covariate x1.

use_f

If TRUE, the data are generated with a smooth term (returned as x2 and smooth_term).

Value

A list containing the following components:

  • y: vector of length N containing the response variable.

  • X_s: matrix of non-noisy functional data for the first functional covariate.

  • X_se: matrix of noisy functional data for the first functional covariate.

  • Y_s: matrix of non-noisy functional data for the second functional covariate (if multivariate).

  • Y_se: matrix of noisy functional data for the second covariate (if multivariate).

  • x1: vector of length N containing the non-functional covariate (if use_x is TRUE).

  • x2: vector of length N containing the observed values of the smooth term (if use_f is TRUE).

  • smooth_term: vector of length N containing a smooth term (if use_f is TRUE).

  • Beta: array containing the true functional coefficients.

Examples

# Basic usage with default parameters
sim_data <- data_generator_vd()

# Generate data with non-aligned domains
non_aligned_data <- data_generator_vd(N = 150, J = 120, aligned = FALSE)

# Generate multivariate functional data
multivariate_data <- data_generator_vd(N = 200, J = 100, multivariate = TRUE)

# Generate data with non-functional covariates and smooth term
complex_data <- data_generator_vd(
  N = 100,
  J = 150,
  use_x = TRUE,
  use_f = TRUE
)

# Generate data with a different beta function and R-squared value
custom_beta_data <- data_generator_vd(
  N = 80,
  J = 80,
  beta_index = 2,
  Rsq = 0.8
)

# Access components of the generated data
y <- sim_data$y # Response variable
X_s <- sim_data$X_s # Noise-free functional covariate
X_se <- sim_data$X_se # Noisy functional covariate

Defining partially observed functional data terms in VDPO formulae

Description

Auxiliary function used to define ffpo terms within VDPO model formulae.

Usage

ffpo(X, grid, bidimensional_grid = FALSE, nbasis = c(30, 30), bdeg = c(3, 3))

Arguments

X

partially observed functional covariate matrix.

grid

observation grid of the covariate.

bidimensional_grid

boolean value that specifies if the grid should be treated as 1-dimensional or 2-dimensional. The default value is FALSE (1-dimensional). See also 'Details'.

nbasis

number of basis functions to be used.

bdeg

degree of the basis to be used.

Details

When the same observation points are used for every functional covariate, we end up with a vector observation grid. Imagine plotting multiple curves, each representing a functional covariate, all measured at the same time instances.

Conversely, if the observation points differ for each functional covariate, we have a matrix observation grid. Picture a matrix where each row represents a functional covariate, and the columns denote distinct observation points. Varying observation points introduce complexity, as each covariate might be sampled at different time instances.
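
As a rough illustration of the two grid layouts described above, the following base R sketch builds a vector grid shared by all curves and a matrix grid with one row per curve. The object names (grid_vec, grid_mat, X_shared, X_varying) and the toy sine curves are assumptions made for this example and are not part of the package.

```r
# Toy setting: 3 curves observed at 5 points each (illustrative values only)
n_curves <- 3
n_points <- 5

# Vector grid: every curve shares the same observation points
grid_vec <- seq(0, 1, length.out = n_points)

# Matrix grid: row i holds the observation points of curve i
grid_mat <- rbind(
  seq(0.0, 1.0, length.out = n_points),
  seq(0.1, 0.9, length.out = n_points),
  seq(0.2, 1.0, length.out = n_points)
)

# Curves evaluated on each layout (sin is a stand-in for real data)
X_shared <- t(sapply(seq_len(n_curves), function(i) sin(2 * pi * grid_vec)))
X_varying <- sin(2 * pi * grid_mat)

dim(X_shared)  # one row per curve, one column per shared grid point
dim(X_varying) # one row per curve, one column per curve-specific grid point
```

Under this reading, the vector layout presumably corresponds to bidimensional_grid = FALSE and the matrix layout to bidimensional_grid = TRUE.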

Value

The function is interpreted in the formula of a VDPO model. It returns a list containing the following elements:

  • B_ffpo: design matrix.

  • Phi: B-spline basis used for the functional coefficient.

  • M: vector or matrix object indicating the observed domain of the data.

  • nbasis: number of basis functions used.

See Also

add_grid


Defining partially observed bidimensional functional data terms in VDPO formulae

Description

Auxiliary function used to define ffpo_2d terms within VDPO model formulae.

Usage

ffpo_2d(X, miss_points, missing_points, nbasis = rep(15, 4), bdeg = rep(3, 4))

Arguments

X

partially observed bidimensional functional covariate matrix.

miss_points, missing_points

list of missing observation points. See 'Details' for more information about the difference in structure between both.

nbasis

number of basis functions to be used.

bdeg

degree of the basis to be used.

Details

The difference between miss_points and missing_points is the format in which the data is presented.

miss_points is a list of lists where each inner list corresponds to the observation points in the y-axis and contains the observation points of the missing values for the x-axis. miss_points acts as a guide for identifying and addressing missing observations in functional data and is used for properly calculating the inner product matrix.

missing_points is a list where each element is a matrix containing the missing observation points.

Value

The function is interpreted in the formula of a VDPO model. It returns a list containing the following elements:

  • B_ffpo2d: design matrix.

  • Phi_ffpo2d: bidimensional B-spline basis used for the functional coefficient.

  • M_ffpo2d: the missing_points used as input in the function.

  • nbasis: number of basis functions used.


Defining variable domain functional data terms in vd_fit formulae

Description

Auxiliary function used to define ffvd terms within vd_fit model formulae. This term represents a functional predictor where each function is observed over a domain of varying length. The formulation is $\frac{1}{T_i} \int_1^{T_i} X_i(t)\beta(t,T_i)\,dt$, where $X_i(t)$ is a functional covariate of length $T_i$, and $\beta(t,T_i)$ is an unknown bivariate functional coefficient. The functional basis used to model this term is the B-spline basis.
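
To make the variable domain term above concrete, here is a minimal base R sketch that approximates the scaled integral with a simple Riemann sum on a unit-spaced grid. It assumes each curve is observed from the first grid point up to its own length and padded with NA afterwards. The coefficient surface beta_fun is hypothetical, and none of this is the package's internal code, which works with B-spline bases.

```r
# Toy data: 3 curves on the grid t = 1, ..., 6; NA marks unobserved points
X <- rbind(
  c(1.0, 2.0, 3.0, NA,  NA,  NA),
  c(0.5, 1.0, 1.5, 2.0, 2.5, NA),
  c(2.0, 2.0, 2.0, 2.0, 2.0, 2.0)
)

# Domain length T_i: number of observed points in each curve
T_i <- rowSums(!is.na(X))

# Hypothetical coefficient surface beta(t, T_i); any bivariate function would do
beta_fun <- function(t, Ti) sin(pi * t / Ti)

# Riemann-sum approximation of (1 / T_i) * integral of X_i(t) * beta(t, T_i) dt
ffvd_term <- sapply(seq_len(nrow(X)), function(i) {
  t_obs <- seq_len(T_i[i])
  sum(X[i, t_obs] * beta_fun(t_obs, T_i[i])) / T_i[i]
})

ffvd_term # one scalar contribution per curve
```

Dividing by T_i keeps the contributions of short and long curves on a comparable scale.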

Usage

ffvd(X, grid, nbasis = c(30, 50, 30), bdeg = c(3, 3, 3))

Arguments

X

variable domain functional covariate matrix.

grid

observation points of the variable domain functional covariate. If not provided, it will be 1:ncol(X).

nbasis

number of B-spline basis functions to be used.

bdeg

degree of the B-spline basis used.

Value

The function is interpreted in the formula of a VDPO model. It returns a list containing the following elements:

  • An item named B, the design matrix.

  • An item named X_hat, the smoothed functional covariate.

  • Items named L_Phi and B_T, the 1-dimensional marginal B-spline bases used for the functional coefficient.

  • An item named M, a matrix object indicating the observed domain of the data.

  • An item named nbasis, the number of basis functions used.

Examples

# Generate sample data
set.seed(123)
data <- data_generator_vd(beta_index = 1, use_x = FALSE, use_f = FALSE)
X <- data$X_se

# Specifying a custom grid
custom_grid <- seq(0, 1, length.out = ncol(X))
ffvd_term_custom_grid <- ffvd(X, grid = custom_grid, nbasis = c(10, 10, 10))

# Customizing the number of basis functions
ffvd_term_custom_basis <- ffvd(X, nbasis = c(10, 10, 10))

# Customizing both basis functions and degrees
ffvd_term_custom <- ffvd(X, nbasis = c(10, 10, 10), bdeg = c(3, 3, 3))

Generate 1D functional data for simulation studies

Description

Creates synthetic 1D functional data with optional noise components and different coefficient patterns. Uses trapezoidal rule for integration.
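
The trapezoidal rule mentioned above can be sketched in a few lines of base R. The helper trapz and the stand-in curve and coefficient below are assumptions for illustration. They are not the generator's actual internals.

```r
# Trapezoidal rule for values y observed at (possibly unequally spaced) points x
trapz <- function(x, y) {
  n <- length(x)
  sum(diff(x) * (y[-n] + y[-1]) / 2)
}

t_grid <- seq(0, 1, length.out = 100) # mirrors the default grid_points = 100
curve_vals <- cos(2 * pi * t_grid)    # stand-in for one simulated curve
beta_vals <- sin(2 * pi * t_grid)     # stand-in for a "sin" coefficient function

# Scalar-on-function signal: integral of curve(t) * beta(t) over the grid
signal <- trapz(t_grid, curve_vals * beta_vals)
```

A response with a target R-squared could then be obtained by adding noise of suitable variance to such signals.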

Usage

generate_1d_po_functional_data(
  n = 100,
  grid_points = 100,
  noise_sd = 0.015,
  rsq = 0.95,
  beta_type = c("sin", "gaussian"),
  n_missing = 1,
  min_distance = NULL
)

Arguments

n

Number of samples to generate

grid_points

Number of points in the grid. Default is 100

noise_sd

Standard deviation of measurement noise. Default is 0.015

rsq

Desired R-squared value for the response. Default is 0.95

beta_type

Type of coefficient function ("sin" or "gaussian"). Default is "sin"

n_missing

Number of missing segments per curve. Default is 1

min_distance

Minimum length of missing segments. Default is NULL (auto-calculated)

Value

A list containing:

  • curves: List of n true (noiseless) curves

  • noisy_curves: List of n observed (noisy) curves

  • noisy_curves_miss: List containing curves with missing values

  • response: Vector of n response values

  • grid: Grid points

  • beta: True coefficient function

  • stochastic_components: Vector of a values used for each curve
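
As a hedged illustration of what a curve with one missing segment might look like, the base R sketch below blanks out a contiguous stretch of a toy curve with NA. The segment length, its random placement, and the variable names are assumptions for this example. The generator's own rules for n_missing and min_distance may differ.

```r
set.seed(1)

grid_points <- 100
curve_vals <- sin(2 * pi * seq(0, 1, length.out = grid_points))

# Assumed segment length, playing the role of min_distance
min_distance <- 10

# Pick a start index so that the whole segment fits inside the grid
start <- sample(seq_len(grid_points - min_distance + 1), 1)

# Replace the segment with NA to mimic a partially observed curve
curve_miss <- curve_vals
curve_miss[start:(start + min_distance - 1)] <- NA

sum(is.na(curve_miss)) # number of missing points in the curve
```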


Generate 2D functional data for simulation studies

Description

Creates synthetic 2D functional data with optional noise components and different coefficient patterns. Uses Simpson's rule for accurate integration.
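
For readers unfamiliar with Simpson's rule, the following base R sketch implements the composite rule in one dimension and applies it iteratively over a rectangle. The function names simpson and simpson_2d are hypothetical. The default of 50 intervals simply mirrors the documented default of sub_response, and the package's own implementation may be organized differently.

```r
# Composite Simpson's rule on [a, b] with an even number of intervals
simpson <- function(f, a, b, n_intervals = 50) {
  stopifnot(n_intervals %% 2 == 0)
  x <- seq(a, b, length.out = n_intervals + 1)
  h <- (b - a) / n_intervals
  # Weights follow the pattern 1, 4, 2, 4, ..., 2, 4, 1
  w <- c(1, rep(c(4, 2), length.out = n_intervals - 1), 1)
  h / 3 * sum(w * f(x))
}

# Iterated Simpson's rule: integrate over x for each y, then over y
simpson_2d <- function(f, ax, bx, ay, by, n_intervals = 50) {
  inner <- function(y) {
    sapply(y, function(yy) simpson(function(x) f(x, yy), ax, bx, n_intervals))
  }
  simpson(inner, ay, by, n_intervals)
}

# The exact integral of x * y over the unit square is 0.25
approx_val <- simpson_2d(function(x, y) x * y, 0, 1, 0, 1)
```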

Usage

generate_2d_po_functional_data(
  n = 20,
  grid_x = 20,
  grid_y = 20,
  noise_sd = 0.015,
  rsq = 0.95,
  beta_type = c("saddle", "exp"),
  response_type = c("gaussian", "binomial"),
  a1 = NULL,
  a2 = NULL,
  sub_response = 50,
  n_missing = 1,
  min_distance_x = NULL,
  min_distance_y = NULL
)

Arguments

n

Number of samples to generate.

grid_x

Number of points in x-axis grid. Default is 20.

grid_y

Number of points in y-axis grid. Default is 20.

noise_sd

Standard deviation of measurement noise. Default is 0.015.

rsq

Desired R-squared value for the response. Default is 0.95.

beta_type

Type of coefficient surface ("saddle" or "exp"). Default is "saddle".

response_type

Type of the response variable ("gaussian" or "binomial"). Default is "gaussian".

a1

Optional fixed value for first stochastic component. If provided, a2 must also be provided.

a2

Optional fixed value for second stochastic component. If provided, a1 must also be provided.

sub_response

Number of intervals for Simpson integration. Default is 50.

n_missing

Number of holes (missing regions) in every surface. Default is 1.

min_distance_x

Length of the holes in the x axis. Default is NULL.

min_distance_y

Length of the holes in the y axis. Default is NULL.

Value

A list containing:

  • surfaces: List of n true (noiseless) surfaces

  • noisy_surfaces: List of n observed (noisy) surfaces

  • response: Vector of n response values

  • grid_x: x-axis grid points

  • grid_y: y-axis grid points

  • beta: True coefficient surface

  • stochastic_components: Matrix of a1 and a2 values used for each surface


Plot Beta Estimates with Confidence Intervals

Description

Generates a line plot of Beta estimates with their 95% confidence intervals for a specified curve. This function computes the 95% confidence intervals for Beta estimates using the fitted values and covariance matrix from the vd_fit object. The resulting plot displays the Beta estimates, lower confidence bounds, and upper confidence bounds as separate lines.
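
The kind of computation described above can be sketched with a normal approximation in base R. The basis matrix B, the coefficients theta, and the covariance covar_theta below are made-up inputs standing in for the fitted quantities. The exact steps taken by plot_beta_with_ci are not documented here, so treat this as an assumption-laden sketch.

```r
# Hypothetical inputs standing in for the fitted quantities
B <- cbind(1, seq(0, 1, length.out = 20)) # 20 evaluation points x 2 basis columns
theta <- c(0.5, 2)                        # basis coefficients
covar_theta <- matrix(c(0.04, 0.01,
                        0.01, 0.09), nrow = 2) # covariance matrix of theta

# Point estimates of the coefficient function at the evaluation points
beta_hat <- as.vector(B %*% theta)

# Pointwise standard errors: sqrt(diag(B %*% covar_theta %*% t(B)))
se <- sqrt(rowSums((B %*% covar_theta) * B))

# 95% pointwise confidence bounds under a normal approximation
z <- qnorm(0.975)
ci <- data.frame(
  estimate = beta_hat,
  lower = beta_hat - z * se,
  upper = beta_hat + z * se
)

head(ci)
```

The rowSums form avoids building the full 20 x 20 covariance matrix when only its diagonal is needed.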

Usage

plot_beta_with_ci(object, curve = 1)

Arguments

object

An object of class 'vd_fit' containing the fitted model results.

curve

An integer specifying the row (curve) of Beta to plot. Default is 1.

Value

A ggplot2 object representing the plot of Beta estimates and confidence intervals.

Examples

## Not run: 
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Assuming `fit` is an object of class 'vd_fit'
  plot_beta_with_ci(fit, curve = 1)
}

## End(Not run)

Estimation of the generalized additive functional regression models for variable domain functional data

Description

The vd_fit function fits generalized additive functional regression models for variable domain functional data.

Usage

vd_fit(formula, data, family = stats::gaussian(), offset = NULL)

Arguments

formula

a formula object with at least one ffvd term.

data

a list object containing the response variable and the covariates as the components of the list.

family

a family object specifying the distribution from which the data originates. The default distribution is gaussian.

offset

An offset vector. The default value is NULL.

Value

An object of class vd_fit. It is a list containing the following items:

  • An item named fit of class sop. See sop.fit.

  • An item named Beta which is the estimated functional coefficient.

  • An item named theta which is the basis coefficient of Beta.

  • An item named covar_theta which is the covariance matrix of theta.

  • An item named M which is the number of observation points for each curve.

  • An item named ffvd_evals which is the result of the evaluations of the ffvd terms in the formula.

See Also

ffvd

Examples

# VARIABLE DOMAIN FUNCTIONAL DATA EXAMPLE

# set seed for reproducibility
set.seed(42)

# generate example data
data <- data_generator_vd(
  N = 100,
  J = 100,
  beta_index = 1,
  use_x = TRUE,
  use_f = TRUE
)

# Define a formula object that specifies the model behavior.
# The formula includes a functional form of the variable 'X_se' using 'ffvd'
# with a non-default number of basis functions ('nbasis' is set to c(10, 10, 10)).
# Additionally, it includes a smooth function 'f' applied to 'x2' with 10 segments ('nseg = 10'),
# a second-order penalty ('pord = 2'), and cubic splines ('degree = 3').
# The model also contains the linear term 'x1'.
formula <- y ~ ffvd(X_se, nbasis = c(10, 10, 10)) + f(x2, nseg = 10, pord = 2, degree = 3) + x1

# We can fit the model using the data and the formula
res <- vd_fit(formula = formula, data = data)

# Some important parameters of the model can be accessed as follows
res$Beta # variable domain functional coefficient
res$fit$fitted.values # estimated response variable

# Also, a summary of the fit can be accessed using the summary function
summary(res)

# And a heatmap for a specific beta can be obtained using the plot function
plot(res, beta_index = 1)