Title: | Working with and Analyzing Functional Data of Varying Lengths |
---|---|
Description: | Comprehensive set of tools for analyzing and manipulating functional data with non-uniform lengths. This package addresses two common scenarios in functional data analysis: Variable Domain Data, where the observation domain differs across samples, and Partially Observed Data, where observations are incomplete over the domain of interest. 'VDPO' enhances the flexibility and applicability of functional data analysis in 'R'. See Amaro et al. (2024) <doi:10.48550/arXiv.2401.05839>. |
Authors: | Pavel Hernandez [aut, cre], Jose Ignacio Diez [ctr], Maria Durban [ctb], Maria del Carmen Aguilera-Morillo [ctb] |
Maintainer: | Pavel Hernandez <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-11-25 11:19:07 UTC |
Source: | https://github.com/pavel-hernadez-amaro/vdpo |
Prepares the partially observed data to be input to the ffpo function.
This function should only be used when the bidimensional_grid parameter of the ffpo function is FALSE.
add_grid(df, grid)
df | |
grid | Grid vector. |
data.frame with the grid added.
Generates a variable domain functional regression model
data_generator_vd( N = 100, J = 100, nsims = 1, Rsq = 0.95, aligned = TRUE, multivariate = FALSE, beta_index = 1, use_x = FALSE, use_f = FALSE )
N | Number of subjects. |
J | Maximum number of observations per subject. |
nsims | Number of simulations in the simulation study. |
Rsq | Variance of the model. |
aligned | Whether the generated data are aligned. |
multivariate | If TRUE, the data are generated with 2 functional variables. |
beta_index | Index for the beta. |
use_x | Whether the data are generated with x. |
use_f | Whether the data are generated with f. |
A list containing the following components:
y: vector of length N containing the response variable.
X_s: matrix of non-noisy functional data for the first functional covariate.
X_se: matrix of noisy functional data for the first functional covariate.
Y_s: matrix of non-noisy functional data for the second functional covariate (if multivariate).
Y_se: matrix of noisy functional data for the second functional covariate (if multivariate).
x1: vector of length N containing the non-functional covariate (if use_x is TRUE).
x2: vector of length N containing the observed values of the smooth term (if use_f is TRUE).
smooth_term: vector of length N containing a smooth term (if use_f is TRUE).
Beta: array containing the true functional coefficients.
# Basic usage with default parameters
sim_data <- data_generator_vd()

# Generate data with non-aligned domains
non_aligned_data <- data_generator_vd(N = 150, J = 120, aligned = FALSE)

# Generate multivariate functional data
multivariate_data <- data_generator_vd(N = 200, J = 100, multivariate = TRUE)

# Generate data with non-functional covariates and smooth term
complex_data <- data_generator_vd(
  N = 100,
  J = 150,
  use_x = TRUE,
  use_f = TRUE
)

# Generate data with a different beta function and R-squared value
custom_beta_data <- data_generator_vd(
  N = 80,
  J = 80,
  beta_index = 2,
  Rsq = 0.8
)

# Access components of the generated data
y <- sim_data$y       # Response variable
X_s <- sim_data$X_s   # Noise-free functional covariate
X_se <- sim_data$X_se # Noisy functional covariate
Auxiliary function used to define ffpo terms within VDPO model formulae.
ffpo(X, grid, bidimensional_grid = FALSE, nbasis = c(30, 30), bdeg = c(3, 3))
X | partially observed functional covariate. |
grid | observation grid of the covariate. |
bidimensional_grid | boolean value that specifies whether the grid should be treated as 1-dimensional or 2-dimensional. The default value is FALSE. |
nbasis | number of basis functions to be used. |
bdeg | degree of the basis to be used. |
When the same observation points are used for every functional covariate, we end up with a vector observation grid. Imagine plotting multiple curves, each representing a functional covariate, all measured at the same time instances.
Conversely, if the observation points differ for each functional covariate, we have a matrix observation grid. Picture a matrix where each row represents a functional covariate, and the columns denote distinct observation points. Varying observation points introduce complexity, as each covariate might be sampled at different time instances.
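The two grid layouts described above can be sketched in base R. This is only an illustration with made-up numbers: the object names (grid_vec, grid_mat) are hypothetical and nothing here calls the package itself.

```r
# Hypothetical toy setting: 3 curves, 5 observation points each.
n_curves <- 3
n_points <- 5

# Vector grid: every curve is observed at the same points.
grid_vec <- seq(0, 1, length.out = n_points)

# Matrix grid: row i holds the observation points of curve i,
# so each curve may be sampled at different locations.
grid_mat <- rbind(
  seq(0.0, 1.0, length.out = n_points),
  seq(0.1, 0.9, length.out = n_points),
  seq(0.2, 0.8, length.out = n_points)
)

# A shared vector grid is the special case of a matrix grid
# whose rows are all identical.
grid_vec_as_mat <- matrix(grid_vec, nrow = n_curves, ncol = n_points, byrow = TRUE)
```

In the vector case a single set of points describes all curves, while in the matrix case each row has to be tracked separately.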
The function is interpreted in the formula of a VDPO model.
list containing the following elements:
B_ffpo: design matrix.
Phi: B-spline basis used for the functional coefficient.
M: vector or matrix object indicating the observed domain of the data.
nbasis: number of basis functions used.
Auxiliary function used to define ffpo_2d terms within VDPO model formulae.
ffpo_2d(X, miss_points, missing_points, nbasis = rep(15, 4), bdeg = rep(3, 4))
X | partially observed bidimensional functional covariate. |
miss_points, missing_points | |
nbasis | number of basis functions to be used. |
bdeg | degree of the basis to be used. |
The difference between miss_points and missing_points is the format in which the data is presented.
miss_points is a list of lists where each inner list corresponds to the observation points in the y-axis and contains the observation points of the missing values for the x-axis. miss_points acts as a guide for identifying and addressing missing observations in functional data and is used for properly calculating the inner product matrix.
missing_points is a list where each element is a matrix containing the missing observation points.
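To make the two layouts more concrete, here is a small base R sketch for a single surface. It is one plausible reading of the description, not the package's verified input format: the names miss_by_y and missing_mat are hypothetical, and the use of integer indices (rather than grid coordinates) is an assumption.

```r
set.seed(1)

# Hypothetical 4 x 5 surface (rows = x-axis points, columns = y-axis points)
# with a rectangular hole of missing values.
surface <- matrix(rnorm(20), nrow = 4, ncol = 5)
surface[2:3, 3:4] <- NA

# List-of-lists layout (assumed): one inner list per y-axis point,
# holding the x-axis indices that are missing at that y.
miss_by_y <- lapply(seq_len(ncol(surface)), function(j) {
  list(which(is.na(surface[, j])))
})

# Matrix layout (assumed): one row per missing cell, as (x, y) index pairs.
missing_mat <- which(is.na(surface), arr.ind = TRUE)
colnames(missing_mat) <- c("x", "y")
```

For a sample of several surfaces, each of these objects would be wrapped in an outer list with one element per surface.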
The function is interpreted in the formula of a VDPO model.
list containing the following elements:
B_ffpo2d: design matrix.
Phi_ffpo2d: bidimensional B-spline basis used for the functional coefficient.
M_ffpo2d: the missing_points used as input in the function.
nbasis: number of basis functions used.
Auxiliary function used to define ffvd terms within vd_fit model formulae.
This term represents a functional predictor where each function is observed over a domain of varying length.
The formulation is $\frac{1}{T_i}\int_{1}^{T_i} X_i(t)\,\beta(t, T_i)\,dt$, where $X_i(t)$ is a functional covariate of length $T_i$, and $\beta(t, T_i)$ is an unknown bivariate functional coefficient.
The functional basis used to model this term is the B-spline basis.
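As a rough illustration of what such a term computes, the sketch below approximates a domain-averaged integral of a curve against a bivariate coefficient, $\frac{1}{T_i}\int X_i(t)\,\beta(t, T_i)\,dt$, by a simple Riemann sum. Everything in it is an assumption made for the example: the NA-padded matrix storage, the unit-spaced observation points, and the coefficient surface beta_fun. It does not reproduce the B-spline machinery used by ffvd.

```r
set.seed(1)

# Hypothetical variable-domain data: 3 curves stored row-wise in a matrix,
# padded with NA beyond each curve's own length T_i.
T_i <- c(4, 6, 5)
X <- matrix(NA_real_, nrow = length(T_i), ncol = max(T_i))
for (i in seq_along(T_i)) {
  X[i, seq_len(T_i[i])] <- rnorm(T_i[i])
}

# Hypothetical bivariate coefficient beta(t, Ti): an arbitrary smooth surface.
beta_fun <- function(t, Ti) sin(pi * t / Ti)

# Riemann-sum approximation of (1 / T_i) * integral X_i(t) * beta(t, T_i) dt
# over the unit-spaced points t = 1, ..., T_i of each curve.
vd_term <- vapply(seq_along(T_i), function(i) {
  t <- seq_len(T_i[i])
  sum(X[i, t] * beta_fun(t, T_i[i])) / T_i[i]
}, numeric(1))
```

Each curve contributes one scalar, and the coefficient is evaluated along a different slice beta(., T_i) depending on the length of that curve's domain.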
ffvd(X, grid, nbasis = c(30, 50, 30), bdeg = c(3, 3, 3))
X | variable domain functional covariate. |
grid | observation points of the variable domain functional covariate. If not provided, it will be |
nbasis | number of B-spline basis functions to be used. |
bdeg | degree of the B-spline basis used. |
The function is interpreted in the formula of a VDPO model.
list containing the following elements:
An item named B: design matrix.
An item named X_hat: smoothed functional covariate.
Items named L_Phi and B_T: 1-dimensional marginal B-spline bases used for the functional coefficient.
An item named M: matrix object indicating the observed domain of the data.
An item named nbasis: number of basis functions used.
# Generate sample data
set.seed(123)
data <- data_generator_vd(beta_index = 1, use_x = FALSE, use_f = FALSE)
X <- data$X_se

# Specifying a custom grid
custom_grid <- seq(0, 1, length.out = ncol(X))
ffvd_term_custom_grid <- ffvd(X, grid = custom_grid, nbasis = c(10, 10, 10))

# Customizing the number of basis functions
ffvd_term_custom_basis <- ffvd(X, nbasis = c(10, 10, 10))

# Customizing both basis functions and degrees
ffvd_term_custom <- ffvd(X, nbasis = c(10, 10, 10), bdeg = c(3, 3, 3))
Creates synthetic 1D functional data with optional noise components and different coefficient patterns. Uses trapezoidal rule for integration.
generate_1d_po_functional_data( n = 100, grid_points = 100, noise_sd = 0.015, rsq = 0.95, beta_type = c("sin", "gaussian"), n_missing = 1, min_distance = NULL )
n | Number of samples to generate |
grid_points | Number of points in the grid. Default is 100 |
noise_sd | Standard deviation of measurement noise. Default is 0.015 |
rsq | Desired R-squared value for the response. Default is 0.95 |
beta_type | Type of coefficient function ("sin" or "gaussian"). Default is "sin" |
n_missing | Number of missing segments per curve. Default is 1 |
min_distance | Minimum length of missing segments. Default is NULL (auto-calculated) |
A list containing:
curves: List of n true (noiseless) curves
noisy_curves: List of n observed (noisy) curves
noisy_curves_miss: List containing curves with missing values
response: Vector of n response values
grid: Grid points
beta: True coefficient function
stochastic_components: Vector of a values used for each curve
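The description mentions trapezoidal integration and a target R-squared. The sketch below shows, in base R, how a scalar response could be built from curves in that way. It is a minimal illustration under stated assumptions, not the package's internal code: the helper trapz, the curve shape, and the noise calibration formula are all choices made for this example.

```r
set.seed(1)

# Trapezoidal rule for values y observed at grid points x.
trapz <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

grid <- seq(0, 1, length.out = 100)
beta <- sin(2 * pi * grid)  # hypothetical "sin"-type coefficient function

# Hypothetical curves: a random amplitude times a fixed shape.
n <- 200
a <- rnorm(n, mean = 1, sd = 0.5)
curves <- lapply(a, function(ai) ai * sin(2 * pi * grid))

# Noise-free response: integral of X_i(t) * beta(t) over the grid.
signal <- vapply(curves, function(x) trapz(grid, x * beta), numeric(1))

# Pick the error variance so that var(signal) / var(response) is close to rsq
# (an assumed calibration, commonly used in simulation studies).
rsq <- 0.95
sigma2 <- var(signal) * (1 - rsq) / rsq
response <- signal + rnorm(n, sd = sqrt(sigma2))
```

With this calibration, the squared correlation between signal and response is close to the requested rsq in large samples, although it varies from one simulated data set to the next.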
Creates synthetic 2D functional data with optional noise components and different coefficient patterns. Uses Simpson's rule for accurate integration.
generate_2d_po_functional_data( n = 20, grid_x = 20, grid_y = 20, noise_sd = 0.015, rsq = 0.95, beta_type = c("saddle", "exp"), response_type = c("gaussian", "binomial"), a1 = NULL, a2 = NULL, sub_response = 50, n_missing = 1, min_distance_x = NULL, min_distance_y = NULL )
n | Number of samples to generate. |
grid_x | Number of points in x-axis grid. Default is 20. |
grid_y | Number of points in y-axis grid. Default is 20. |
noise_sd | Standard deviation of measurement noise. Default is 0.015. |
rsq | Desired R-squared value for the response. Default is 0.95. |
beta_type | Type of coefficient surface ("saddle" or "exp"). Default is "saddle". |
response_type | Type of the response variable ("gaussian" or "binomial"). Default is "gaussian". |
a1 | Optional fixed value for first stochastic component. If provided, a2 must also be provided. |
a2 | Optional fixed value for second stochastic component. If provided, a1 must also be provided. |
sub_response | Number of intervals for Simpson integration. Default is 50. |
n_missing | Number of holes in every curve. |
min_distance_x | Length of the holes in the x axis. |
min_distance_y | Length of the holes in the y axis. |
A list containing:
surfaces: List of n true (noiseless) surfaces
noisy_surfaces: List of n observed (noisy) surfaces
response: Vector of n response values
grid_x: x-axis grid points
grid_y: y-axis grid points
beta: True coefficient surface
stochastic_components: Matrix of a1 and a2 values used for each surface
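For readers unfamiliar with Simpson's rule, the following base R sketch shows the composite rule in one dimension and its nested use for a double integral. The function names simpson and simpson_2d are hypothetical and the code is a generic textbook version, not the routine used inside generate_2d_po_functional_data.

```r
# Composite Simpson's rule on [a, b]; requires an even number of intervals.
simpson <- function(f, a, b, n_intervals = 50) {
  stopifnot(n_intervals %% 2 == 0)
  x <- seq(a, b, length.out = n_intervals + 1)
  h <- (b - a) / n_intervals
  # Weights 1, 4, 2, 4, ..., 2, 4, 1
  w <- c(1, rep(c(4, 2), length.out = n_intervals - 1), 1)
  h / 3 * sum(w * f(x))
}

# Double integral over [ax, bx] x [ay, by] as nested 1D Simpson rules.
simpson_2d <- function(f, ax, bx, ay, by, n_intervals = 50) {
  inner <- function(x) {
    vapply(
      x,
      function(xi) simpson(function(y) f(xi, y), ay, by, n_intervals),
      numeric(1)
    )
  }
  simpson(inner, ax, bx, n_intervals)
}

# Example: integrate a saddle-shaped surface over the unit square.
saddle_integral <- simpson_2d(function(x, y) x^2 - y^2, 0, 1, 0, 1)
```

Simpson's rule is exact for polynomials up to degree three, which is why an even interval count such as the default of 50 for sub_response is a natural fit for smooth surfaces.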
Generates a line plot of Beta estimates with their 95% confidence intervals for a specified curve.
This function computes the 95% confidence intervals for Beta estimates using the fitted values and covariance matrix from the vd_fit object. The resulting plot displays the Beta estimates, lower confidence bounds, and upper confidence bounds as separate lines.
plot_beta_with_ci(object, curve = 1)
object | An object of class vd_fit. |
curve | An integer specifying the row (curve) of Beta to plot. Default is 1. |
A ggplot2 object representing the plot of Beta estimates and confidence intervals.
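As background, pointwise 95% intervals for a coefficient function expressed in a basis are often obtained from the basis coefficients and their covariance matrix. The sketch below shows that generic calculation in base R. The basis Phi, the coefficients theta, and the covariance covar_theta are made-up stand-ins, and the normal approximation is an assumption; the exact computation inside plot_beta_with_ci may differ.

```r
# Hypothetical ingredients (not taken from a real vd_fit object):
# a basis matrix Phi, basis coefficients theta, and their covariance.
t_grid <- seq(0, 1, length.out = 50)
Phi <- cbind(1, t_grid, t_grid^2)  # stand-in for a B-spline basis
theta <- c(0.5, -1, 2)
covar_theta <- diag(c(0.01, 0.04, 0.09))

# Point estimate beta(t) = Phi %*% theta and its pointwise standard error,
# i.e. the square root of diag(Phi %*% covar_theta %*% t(Phi)).
beta_hat <- drop(Phi %*% theta)
se_beta <- sqrt(rowSums((Phi %*% covar_theta) * Phi))

# Pointwise 95% interval under a normal approximation.
z <- qnorm(0.975)
ci <- data.frame(
  t = t_grid,
  beta = beta_hat,
  lower = beta_hat - z * se_beta,
  upper = beta_hat + z * se_beta
)
```

A data frame like ci can then be drawn with three lines (estimate, lower bound, upper bound), which matches the kind of plot described above.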
## Not run:
if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Assuming `fit` is an object of class 'vd_fit'
  plot_beta_with_ci(fit, curve = 1)
}
## End(Not run)
The vd_fit function fits generalized additive functional regression models for variable domain functional data.
vd_fit(formula, data, family = stats::gaussian(), offset = NULL)
formula | a formula object with at least one ffvd term. |
data | a |
family | a |
offset | An offset vector. The default value is NULL. |
An object of class vd_fit. It is a list containing the following items:
An item named fit of class sop. See sop.fit.
An item named Beta, which is the estimated functional coefficient.
An item named theta, which is the basis coefficient of Beta.
An item named covar_theta, which is the covariance matrix of theta.
An item named M, which is the number of observation points for each curve.
An item named ffvd_evals, which is the result of the evaluations of the ffvd terms in the formula.
# VARIABLE DOMAIN FUNCTIONAL DATA EXAMPLE

# set seed for reproducibility
set.seed(42)

# generate example data
data <- data_generator_vd(
  N = 100,
  J = 100,
  beta_index = 1,
  use_x = TRUE,
  use_f = TRUE
)

# Define a formula object that specifies the model behavior.
# The formula includes a functional form of the variable 'X_se' using 'ffvd'
# with a non-default number of basis functions ('nbasis' is set to c(10, 10, 10)).
# Additionally, it includes a smooth function 'f' applied to 'x2' with 10 segments ('nseg = 10'),
# a second-order penalty ('pord = 2'), and cubic splines ('degree = 3').
# The model also contains the linear term 'x1'.
formula <- y ~ ffvd(X_se, nbasis = c(10, 10, 10)) + f(x2, nseg = 10, pord = 2, degree = 3) + x1

# We can fit the model using the data and the formula
res <- vd_fit(formula = formula, data = data)

# Some important parameters of the model can be accessed as follows
res$Beta              # variable domain functional coefficient
res$fit$fitted.values # estimated response variable

# Also, a summary of the fit can be accessed using the summary function
summary(res)

# And a heatmap for a specific beta can be obtained using the plot function
plot(res, beta_index = 1)