Title: Practical Smoothing with P-Splines
Description: Functions and data to reproduce all plots in the book "Practical Smoothing. The Joys of P-splines" by Paul H.C. Eilers and Brian D. Marx (2021, ISBN: 978-1108482950).
Authors: Paul Eilers [aut, cre], Brian Marx [aut], Bin Li [aut], Jutta Gampe [aut], Maria Xose Rodriguez-Alvarez [aut]
Maintainer: Paul Eilers <[email protected]>
License: GPL-2 | GPL-3
Version: 0.1.19
Built: 2025-02-26 04:15:56 UTC
Source: https://github.com/cran/JOPS
Compute a B-spline basis matrix using evenly spaced knots.
bbase(x, xl = min(x), xr = max(x), nseg = 10, bdeg = 3)

x: a vector of argument values at which the B-spline basis functions are to be evaluated.
xl: the lower limit of the domain of x; default is min(x).
xr: the upper limit of the domain of x; default is max(x).
nseg: the number of equally sized segments between xl and xr; default is 10.
bdeg: the degree of the splines, usually 1, 2, or 3 (default).

If xl is larger than min(x), it will be adjusted to min(x) and a warning will be given. If xr is smaller than max(x), it will be adjusted to max(x) and a warning will be given. The values of the design parameters x, xl, xr, nseg, bdeg and type = 'bbase' are added to the list of attributes of the matrix.

A matrix with length(x) rows and nseg + bdeg columns.
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties (with comments and rejoinder), Statistical Science, 11: 89-121.
Eilers, P.H.C. and B.D. Marx (2010). Splines, knots and penalties. Wiley Interdisciplinary Reviews: Computational Statistics. Wiley: NY. DOI: 10.1002/wics.125
# Compute and plot a B-spline basis matrix
x = seq(0, 360, by = 2)
B = bbase(x, 0, 360, nseg = 8, bdeg = 3)
matplot(x, B, type = 'l', lty = 1, lwd = 2, xlab = 'x', ylab = '')
Translates number vector to bin index, given lower and upper limits of the domain and number of bins. A support function for (smoothing) histograms.
binit(x, xmin = min(x), xmax = max(x), nbin = 100)

x: a numerical vector.
xmin: the lower limit of the domain.
xmax: the upper limit of the domain.
nbin: the number of bins (default = 100).

A list with components:
xbin: a vector of bin numbers, one per element of x.
xgrid: a vector of bin midpoints, of length nbin.
nbin: the number of bins.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
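A minimal sketch of binit() as a support function for a histogram, assuming that xbin holds the bin numbers and xgrid the bin midpoints, as described above:

# Bin the Old Faithful eruption durations and plot the resulting counts
data(faithful)
x = faithful$eruptions
bins = binit(x, xmin = 1, xmax = 6, nbin = 50)
counts = tabulate(bins$xbin, nbins = bins$nbin)
plot(bins$xgrid, counts, type = 'h', xlab = 'Eruption length (min)', ylab = 'Count')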
Relative spinal bone mineral density measurements on 261 North
American adolescents. Each value is the difference in spnbmd
taken on two consecutive visits, divided by the average. The age is
the average age over the two visits.
data(bone_data)

A dataframe with four columns:
idnum: ID of the child.
age: average age over the two visits.
gender: male or female.
spnbmd: relative spinal bone mineral density.
https://web.stanford.edu/~hastie/ElemStatLearn/datasets/bone.data
Bachrach, L.K., Hastie, T., Wang, M.-C., Narasimhan, B., Marcus, R. (1999). Bone Mineral Acquisition in Healthy Asian, Hispanic, Black and Caucasian Youth. A Longitudinal Study. J Clin Endocrinol Metab 84, 4702-12.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
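A small illustration of the data, using the columns described above:

data(bone_data)
plot(bone_data$age, bone_data$spnbmd, pch = 16, cex = 0.6, col = 'grey40',
     xlab = 'Age', ylab = 'Relative spinal BMD change')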
Computes a circular B-spline basis matrix using evenly spaced knots.
cbase(x, xl = min(x), xr = max(x), nseg = 10, bdeg = 3)

x: a vector of argument values at which the B-spline basis functions are to be evaluated.
xl: the lower limit of the domain of x; default is min(x).
xr: the upper limit of the domain of x; default is max(x).
nseg: the number of B-spline segments (default 10) between xl and xr.
bdeg: the degree of the basis, usually 1, 2, or 3 (default).

If xl is larger than min(x), it will be adjusted to min(x) and a warning will be given. If xr is smaller than max(x), it will be adjusted to max(x) and a warning will be given. The design parameters x, xl, xr, nseg, bdeg and type = 'cbase' are added to the list of attributes.

In a circular basis, the B-splines are wrapped around the boundaries of the domain. Use a circular basis for data like directions or angles. It should be combined with a circular penalty matrix, as computed by cdiff().

A matrix with length(x) rows and nseg columns.
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
# Compute and plot a circular B-spline basis matrix
x = seq(0, 360, by = 2)
B = cbase(x, 0, 360, nseg = 8, bdeg = 3)
matplot(x, B, type = 'l', lty = 1, lwd = 2, xlab = 'x', ylab = '')
title('Note how the ends meet smoothly at the boundaries')
Compute the difference matrix used for circular penalties.

cdiff(n)

n: number of rows (and columns) of the square differencing matrix.

A square matrix with n rows and columns.
Paul Eilers
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
# Compare standard and circular differencing matrices
n = 8
D1 = diff(diag(n), diff = 2)
D2 = cdiff(n)
oldpar = par(no.readonly = TRUE)
on.exit(par(oldpar))
par(mfrow = c(1, 2))
image(t(D1))
title('Linear differencing matrix')
image(t(D2))
title('Circular differencing matrix')
A crude simulation of comparative genomic hybridization (CGH) data.
data(CGHsim)

A data frame with 400 rows and two columns:
y: log R ratio.
x: genomic position (in fact the row number).

The simulation program can no longer be located, but the data have a very simple structure.
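A small illustration, using the two columns described above:

data(CGHsim)
plot(CGHsim$x, CGHsim$y, pch = 16, cex = 0.5,
     xlab = 'Genomic position', ylab = 'Log R ratio')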
Extract the basis parameters from an existing B-spline basis matrix and use them to compute a new basis at new values of x.

clone_base(B, x)

B: a B-spline basis matrix, computed with bbase() or cbase().
x: a vector of new argument values.

If values in x are outside the domain used for computing B, they will be discarded, with a warning.

A matrix with length(x) rows.
Paul Eilers
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
x = seq(0, 10, length = 20)
n = length(x)
y = sin(x / 2) + rnorm(n) * 0.2
B = bbase(x)
nb = ncol(B)
D = diff(diag(nb), diff = 2)
lambda = 1
a = solve(t(B) %*% B + lambda * t(D) %*% D, t(B) %*% y)
# Clone basis on finer grid
xg = seq(0, 10, length = 200)
Bg = clone_base(B, xg)
yg = Bg %*% a
plot(x, y)
lines(xg, yg, col = 'blue')
Environmental complaints about odors from the Rijnmond region (near Rotterdam in the Netherlands) in 1988.
data(Complaints)

A dataframe with two columns:
freq: the daily number of complaints.
count: the number of days this complaint frequency occurred.
In 1988, the Rijnmond Environmental Agency registered approximately 20,000 complaints about odors from regional inhabitants.
Personal information from Paul Eilers.
plot(Complaints$freq, Complaints$count, type = 'h',
     xlab = 'Number of complaints per day', ylab = 'Frequency')
Count the number of occurrences of pairs of positive integers in two vectors, producing a matrix.
count2d(xb, yb, nb)

xb: a vector of integers.
yb: a vector of integers.
nb: a vector of length 2 that provides the number of bins for the 2D histogram in the two directions.

This function builds a two-dimensional histogram, based on two vectors of bin numbers (obtained with binit). Rows where x[i] > nb[1] or y[i] > nb[2] are discarded without a warning.

A matrix with nb[1] rows and nb[2] columns, containing counts. It serves as the input for two-dimensional histogram smoothing.
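A minimal sketch that combines binit() and count2d() to build a 2D count matrix; it assumes the binit() output components described earlier (xbin for bin numbers, xgrid for bin midpoints):

data(faithful)
binx = binit(faithful$eruptions, xmin = 1, xmax = 6, nbin = 40)
biny = binit(faithful$waiting, xmin = 40, xmax = 100, nbin = 40)
H = count2d(binx$xbin, biny$xbin, nb = c(40, 40))
image(binx$xgrid, biny$xgrid, H,
      xlab = 'Eruption length (min)', ylab = 'Waiting time (min)')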
Calculates the deviance and returns the ML estimated dispersion parameter for a variety of response distributions for P-spline fitting within the GLM framework.
dev_calc(family = "gaussian", y, mu, m_binomial = 0 * y + 1, r_gamma = 0 * y + 1)

family: the response distribution, e.g. "gaussian" (default), binomial, Poisson, or gamma.
y: the glm response vector of length m.
mu: the P-spline estimated mean for the glm response vector, also of length m.
m_binomial: a vector of binomial trials of length m, used when the family is binomial; default is a vector of ones.
r_gamma: a vector of gamma shape parameters, used when the family is gamma; default is a vector of ones.

A list with two fields:
dev: the estimated deviance.
dispersion_parm: the ML estimated dispersion parameter.
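A minimal sketch of dev_calc() for a Gaussian response, assuming the usual GLM deviance definitions (for the Gaussian family the deviance is the residual sum of squares):

y = c(1.2, 0.8, 1.5, 2.1, 1.9)
mu = c(1.0, 1.0, 1.6, 2.0, 2.0)  # fitted means from some P-spline fit
out = dev_calc(family = "gaussian", y = y, mu = mu)
out$dev              # deviance
out$dispersion_parm  # ML dispersion estimate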
Prices and capacities of hard disk drives, as advertised in a Dutch computer monthly in 1999. Prices are given in Dutch guilders; the Euro did not yet exist.
data(Disks)

A dataframe with six columns:
Year: 1999-2000.
Month: month, 1-12.
Size: capacity in Gb.
Buffer: buffer size (Mb).
RPM: rotating speed (rpm).
PriceDG: price in Dutch guilders; divide by 2.2 for euros.
Personal information from Paul Eilers.
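A small illustration, using the columns described above:

data(Disks)
plot(Disks$Size, Disks$PriceDG, log = 'xy',
     xlab = 'Capacity (Gb)', ylab = 'Price (Dutch guilders)')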
The data set includes two signals, respiration and the ECG. Both signals are distorted by strong 60Hz interference from the mains power.
data(ECG)

A data frame with three columns:
- time in seconds
- respiration, arbitrary units
- ECG, arbitrary units
https://physionet.org/content/fantasia/1.0.0/
Iyengar N, Peng C-K, Morin R, Goldberger AL, Lipsitz LA. Age-related alterations in the fractal scaling of cardiac interbeat interval dynamics. Am J Physiol, 1996; 271: 1078-1084.
Standard citation for PhysioNet: Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals (2003). Circulation. 101(23):e215-e220.
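A small illustration; it assumes the columns appear in the order listed above (time, respiration, ECG):

data(ECG)
plot(ECG[, 1], ECG[, 3], type = 'l',
     xlab = 'Time (s)', ylab = 'ECG (arbitrary units)')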
There are two functions for fitting the expectile bundle model: fitasy, for estimating the asymmetry parameters, and fitampl (this function), for estimating the amplitude function. See the details below.

fitampl(y, B, alpha, p, a, pord = 2, lambda)

y: a response vector.
B: a proper B-spline basis matrix, see bbase().
alpha: a vector of B-spline coefficients.
p: a vector of asymmetries.
a: a vector of asymmetry parameters.
pord: the order of the difference penalty, default is 2.
lambda: the positive tuning parameter for the penalty.
The expectile bundle model determines a set of expectile curves for a point cloud with data vectors x and y, as mu_j(x) = a_j * g(x). Here a_j is the asymmetry parameter corresponding to a given asymmetry p_j. A vector of asymmetries with all 0 < p_j < 1 is specified by the user.

The asymmetric least squares objective function is

  sum_j sum_i w_ij * (y_i - a_j * g(x_i))^2.

The function g(.) is called the amplitude. The weights depend on the residuals: w_ij = p_j if y_i > a_j * g(x_i), and w_ij = 1 - p_j otherwise.

The amplitude function g is a sum of B-splines with coefficients alpha. There is no direct solution, so alpha and the asymmetry parameters a must be updated alternatingly. See the example.
a vector of estimated B-spline coefficients.
This is a simplification of the model described in the reference. There is no explicit term for the trend.
Paul Eilers
Schnabel, S.K. and Eilers, P.H.C. (2013) A location-scale model for non-crossing expectile curves. Stat 2: 171–183.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
# Get the data data(bone_data) x = bone_data$age y = bone_data$spnbmd m <- length(x) # Set asymmetry levels p = c(0.005, 0.01, 0.02, 0.05, 0.2, 0.5, 0.8, 0.9, 0.95, 0.98, 0.99, 0.995) np <- length(p) # Set P-spline parameters x0 <- 5 x1 <- 30 ndx <- 20 bdeg <- 3 pord <- 2 # Compute bases B <- bbase(x, x0, x1, ndx, bdeg) xg <- seq(from = min(x), to = max(x), length = 100) Bg <- clone_base(B, xg) n <- ncol(B) lambda = 1 alpha <- rep(1,n) a = p for (it in 1:20){ alpha <- fitampl(y, B, alpha, p, a, pord, lambda) alpha <- alpha / sqrt(mean(alpha ^ 2)) anew <- fitasy(y, B, alpha, p, a) da = max(abs(a - anew)) a = anew cat(it, da, '\n') if (da < 1e-6) break } # Compute bundle on grid ampl <- Bg %*% alpha Z <- ampl %*% a # Plot data and bundle plot(x, y, pch = 15, cex = 0.7, col = 'grey', xlab = 'Age', ylab = 'Density') cols = colorspace::rainbow_hcl(np, start = 10, end = 350) matlines(xg, Z, lty = 1, lwd = 2, col = cols)
There are two functions for fitting the expectile bundle model: fitasy (this function), for estimating the asymmetry parameters, and fitampl, for estimating the amplitude function. See the details below.

fitasy(y, B, b, p, c0)

y: a response vector.
B: a proper B-spline basis matrix, see bbase().
b: a vector of B-spline coefficients.
p: a vector of asymmetries with values between 0 and 1.
c0: a vector of current asymmetry parameters.
The expectile bundle model determines a set of expectile curves for a point cloud with data vectors x and y, as mu_j(x) = a_j * g(x). Here a_j is the asymmetry parameter corresponding to a given asymmetry p_j. A vector of asymmetries with all 0 < p_j < 1 is specified by the user.

The asymmetric least squares objective function is

  sum_j sum_i w_ij * (y_i - a_j * g(x_i))^2.

The function g(.) is called the amplitude. The weights depend on the residuals: w_ij = p_j if y_i > a_j * g(x_i), and w_ij = 1 - p_j otherwise.

The amplitude function g is a sum of B-splines with coefficients alpha. There is no direct solution, so alpha and the asymmetry parameters a must be updated alternatingly. See the example.
A vector of estimated asymmetry parameters.

This is a simplification of the model described in the reference. There is no explicit term for the trend.
Paul Eilers
Schnabel, S.K. and Eilers, P.H.C. (2013) A location-scale model for non-crossing expectile curves. Stat 2: 171–183.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
# Get the data data(bone_data) x = bone_data$age y = bone_data$spnbmd m <- length(x) # Set asymmetry levels p = c(0.005, 0.01, 0.02, 0.05, 0.2, 0.5, 0.8, 0.9, 0.95, 0.98, 0.99, 0.995) np <- length(p) # Set P-spline parameters x0 <- 5 x1 <- 30 ndx <- 20 bdeg <- 3 pord <- 2 # Compute bases B <- bbase(x, x0, x1, ndx, bdeg) xg <- seq(from = min(x), to = max(x), length = 100) Bg <- clone_base(B, xg) n <- ncol(B) lambda = 1 alpha <- rep(1,n) a = p for (it in 1:20){ alpha <- fitampl(y, B, alpha, p, a, pord, lambda) alpha <- alpha / sqrt(mean(alpha ^ 2)) anew <- fitasy(y, B, alpha, p, a) da = max(abs(a - anew)) a = anew cat(it, da, '\n') if (da < 1e-6) break } # Compute bundle on grid ampl <- Bg %*% alpha Z <- ampl %*% a # Plot data and bundle plot(x, y, pch = 15, cex = 0.7, col = 'grey', xlab = 'Age', ylab = 'Density') cols = colorspace::rainbow_hcl(np, start = 10, end = 350) matlines(xg, Z, lty = 1, lwd = 2, col = cols)
An extract of the data set G519 in the Bioconductor package Vega, for chromosome 18.

data(G519C18)

A dataframe with two columns:
x: probe position.
y: log R ratio (LRR).
https://www.bioconductor.org/packages/release/bioc/html/Vega.html
plot(G519C18$x, G519C18$y, type = 'l', ylab = 'LRR', xlab = 'Position',
     main = 'Chromosome 18')
Deaths in Greece in 1960.
data(Greece_deaths)

A dataframe with three columns:
Age: 0 - 85.
Male: male deaths.
Female: female deaths.
All counts for ages above 84 have been grouped to one number for age 85.
Personal information from Aris Perperoglou.
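A small illustration, using the columns described above:

data(Greece_deaths)
plot(Greece_deaths$Age, Greece_deaths$Male, type = 'h',
     xlab = 'Age', ylab = 'Male deaths', main = 'Greece, 1960')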
Prevalence of Hepatitis among a sample of Bulgarian males.
data(Hepatitis)

A data frame with three columns:
Age: age in years.
Infected: number of infected persons.
Sampled: number of sampled persons.
Table 2 in Keiding (1991).
N. Keiding (1991) Age-Specific Incidence and Prevalence: A Statistical Perspective. JRSS-A 154, 371-396.
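A small illustration of the observed prevalence, using the columns described above:

data(Hepatitis)
plot(Hepatitis$Age, Hepatitis$Infected / Hepatitis$Sampled,
     xlab = 'Age (years)', ylab = 'Fraction infected')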
Compute a two-dimensional histogram from two vectors x and y of the same length.

hist2d(x, y, nb = c(100, 100), xlim = range(x), ylim = range(y))

x: a numeric vector.
y: a numeric vector of the same length as x.
nb: a vector of length 2 giving the number of bins in the two directions; default c(100, 100).
xlim: a vector of length 2 giving the limits of the histogram domain for x; default range(x).
ylim: a vector of length 2 giving the limits of the histogram domain for y; default range(y).

If nb is a scalar, it is extended to c(nb, nb), so that both dimensions will have the same number of bins.

Elements of x (y) that fall outside the range specified by xlim (ylim) are not counted.

A list with components:
H: a matrix of dimension nb[1] by nb[2] containing the counts.
xgrid: a vector of length nb[1] with the bin midpoints along x.
ygrid: a vector of length nb[2] with the bin midpoints along y.
xbin: a vector giving the bin number of each element of x.
ybin: a vector giving the bin number of each element of y.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
data(faithful)
x = faithful$eruptions
y = faithful$waiting
C = hist2d(x, y, c(50, 50))
image(C$xgrid, C$ygrid, C$H, xlab = 'Eruption length (min)',
      ylab = 'Waiting time (min)')
title('Old Faithful geyser')
Fit a 2D smooth P-spline surface to a matrix of counts, assuming Poisson distributed observations.
hist2dsm(Y, nsegx = 10, nsegy = nsegx, bdeg = 3, lambdax = 10, lambday = lambdax,
         dx = 3, dy = dx, Mu = Y + 0.01, kappa = 1e-04, tol = 1e-05)

Y: a matrix of counts.
nsegx: the number of evenly spaced knots along x (default 10).
nsegy: the number of evenly spaced knots along y (default nsegx).
bdeg: the degree of the basis, default is 3.
lambdax: the positive tuning parameter along x (default 10).
lambday: the positive tuning parameter along y (default lambdax).
dx: the order of the difference penalty along x (default 3).
dy: the order of the difference penalty along y (default dx).
Mu: the initialization of the mean (default Y + 0.01).
kappa: a (small, positive) ridge tuning parameter to stabilize estimation (default 1e-04).
tol: the convergence criterion (default 1e-05).

A list with elements:
ed: the effective dimension of the smooth 2D surface.
Mu: a matrix with the smooth estimates, with the dimensions of Y.
pen: the numerical value of the penalty.
Paul Eilers
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
x = faithful$eruptions
y = faithful$waiting
h = hist2d(x, y, c(100, 100))
sm = hist2dsm(h$H, nsegx = 25, nsegy = 25, bdeg = 3, lambdax = 10, lambday = 10)
image(h$xgrid, h$ygrid, sm$Mu, xlab = 'Eruption length (min)',
      ylab = 'Waiting time (min)', main = 'Old Faithful')
An X-ray diffractogram.
data(indiumoxide)

A matrix with two columns:
angle: the angles (degrees) of diffraction.
count: corresponding photon counts.
An X-ray diffractogram of Indium-Tin oxide.
These data have been taken from the source of package Diffractometry, which is no longer available from CRAN in binary form.
P.L. Davies, U. Gather, M. Meise, D. Mergel, T. Mildenberger (2008). Residual based localization and quantification of peaks in x-ray diffractograms, Annals of Applied Statistics, Vol. 2, No. 3, 861-886.
angle = indiumoxide[, 1]
photon = indiumoxide[, 2]
plot(angle, photon, type = 'l', xlab = 'Angle', ylab = 'Photon count')
Inverse link function, used for GLM fitting.
inverse_link(x, link)

x: scalar, vector, or matrix input.
link: the link function, specified as a character string (e.g. "identity" or "logit").

The inverse link function applied to x. If link is not one of the allowed names, NULL will be returned.
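A minimal sketch, assuming "logit" and "identity" are among the allowed link names:

eta = c(-2, 0, 2)
inverse_link(eta, "logit")     # 1 / (1 + exp(-eta))
inverse_link(eta, "identity")  # eta unchanged
inverse_link(eta, "cubic")     # an unsupported name returns NULL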
A package for working with and learning about P-splines. P-splines combine B-splines with discrete penalties to build very flexible and effective smooth models. They can handle non-normal data in the style of generalized linear models.
This package provides functions for constructing B-spline bases and penalty matrices. It solves the penalized likelihood equations efficiently.
Several methods are provided to determine the values of penalty parameters automatically, using cross-validation, AIC, mixed models or fast Bayesian algorithms.
This package is a companion to the book by Eilers and Marx (2021). The book presents the underlying theory and contains many examples; the R code for each example is available on the website https://psplines.bitbucket.io.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties (with comments and rejoinder), Statistical Science, 11: 89-121.
Custom color ramp.
JOPS_colors(n)

n: number of steps.

A custom color ramp.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Custom size and color of points.
JOPS_point(s_size = 1.5)

s_size: point size parameter for ggplot2 (default = 1.5).

A theming function for ggplot2 features.
Set a ggplot theme in black and white, with centered titles.

JOPS_theme(h_just = 0.5)

h_just: horizontal justification for ggplot2.

A custom theme for ggplot.

Custom theming function used to unify ggplot features.

Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
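A minimal sketch of how these helpers are meant to be combined with ggplot2, assuming JOPS_point() and JOPS_theme() return components that can be added to a plot with +:

library(ggplot2)
library(JOPS)
data(bone_data)
# Scatterplot with the package's point styling and black-and-white theme
ggplot(bone_data, aes(x = age, y = spnbmd)) +
  JOPS_point(1.2) +
  ggtitle("Spinal bone mineral density") +
  JOPS_theme()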
Bayesian density estimation with P-splines and Laplace approximation.
LAPS_dens(B, P, y, loglambdas, tol = 1e-05, mon = FALSE)

B: matrix (m by n) with the B-spline basis, see bbase().
P: penalty matrix (n by n).
y: vector (length m) of counts, usually a histogram.
loglambdas: a vector of values of the logarithm of lambda to explore.
tol: convergence tolerance (relative change in the coefficients), default 1e-5.
mon: TRUE or FALSE to monitor the iteration history (default FALSE).

The B-spline basis should be based on the midpoints of the histogram bins. See the example below. This function is based on the paper of Gressani and Lambert (2018) and code input by Oswaldo Gressani.

A list with elements:
alpha: P-spline coefficients of length n.
weights: weights from the Laplace approximation, which sum to 1 and have the same length as loglambdas.
mu: a vector of length m with the estimated means (fitted counts).
Cov: covariance matrix (n by n) of alpha.
lambda: the penalty parameter.
ed: the effective model dimension.
Paul Eilers
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Gressani, O. and Lambert, P. (2018). Fast Bayesian inference using Laplace approximations in a flexible promotion time cure model based on P-splines. Computational Statistics and Data Analysis 124, 151-167.
# Smoothing a histogram of Old Faithful eruption durations
data(faithful)
durations = faithful[, 1]  # Eruption length
# Histogram with narrow bin widths
bw = 0.05
hst = hist(durations, breaks = seq(1, 6, by = bw), plot = TRUE)
x = hst$mids
y = hst$counts
# B-spline basis matrices, for fitting and plotting
nseg = 30
B = bbase(x, nseg = nseg)
xg = seq(min(x), max(x), by = 0.01)
Bg = bbase(xg, nseg = nseg)
n = ncol(B)
# Penalty matrix
D2 = diff(diag(n), diff = 2)
P2 = t(D2) %*% D2
# Fit the model
loglambs = seq(-1, 2, by = 0.05)
laps2 = LAPS_dens(B, P2, y, loglambs, mon = FALSE)
fhat2 = exp(Bg %*% laps2$alpha)
lines(xg, fhat2, col = "blue", lwd = 2)
The mixture data were obtained in an unpublished experiment in 2001 by Zhenyu Wang at University of Amsterdam, under the supervision of Age Smilde. We are grateful for the permission to use the data.
data(Mixture)

A list consisting of the following:
fractions: a 34 x 3 matrix of mixture fractions (rows sum to unity), with components Water (subboiled demi water, self made), 1,2ethanediol (99.8%, Sigma-Aldrich, Germany), and 3amino1propanol (99%, Merck Schuchardt, Germany).
xspectra: spectra array, 34 (observations) x 401 (wavelength channels) x 12 (temperatures (C): 30, 35, 37.5, 40, 45, 47.5, 50, 55, 60, 62.5, 65, 70).
wl: wavelengths for the spectra, 700 to 1100 nm, by 1 nm.
The following instruments and chemicals were used in the experiment: HP 8453 spectrophotometer (Hewlett-Packard, Palo Alto, CA); 2cm closed quartz cuvette with glass thermostatable jacket; Pt-100 temperature sensor; Neslab microprocessor EX-111 circulator bath; UV-visible Chemstation software (Rev A.02.04) on a Hewlett-Packard Vectra XM2 PC.
Eilers, P. H. C., and Marx, B. D. (2003). Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemometrics and Intelligent Laboratory Systems, 66, 159–174.
Marx, B. D., Eilers, P. H. C., and Li, B. (2011). Multidimensional single-index signal regression. Chemometrics and Intelligent Laboratory Systems, 109(2), 120–130. [see the Appendix within]
Zhenyu Wang and Age Smilde, University of Amsterdam, The Netherlands. Personal communication.
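A small illustration of the structure described above: the 12 temperature spectra of the first mixture, plotted against wavelength.

data(Mixture)
matplot(Mixture$wl, Mixture$xspectra[1, , ], type = 'l', lty = 1,
        xlab = 'Wavelength (nm)', ylab = 'Spectrum (first mixture, 12 temperatures)')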
Ovarian cancer data
data(ova)
A dataframe with five columns:
Diameter
FIGO
Karnofsky
time
death
death
tba
Fit a smooth latent distribution using the penalized composite link model (PCLM).
pclm(y, C, B, lambda = 1, pord = 2, itmax = 50, show = FALSE)

y: a vector of counts, of length m.
C: a composition matrix, of dimension m by q.
B: a B-spline basis matrix, of dimension q by n.
lambda: the penalty parameter.
pord: the order of the difference penalty (default = 2).
itmax: the maximum number of iterations (default = 50).
show: set to TRUE or FALSE to display the iteration history (default = FALSE).

The composite link model assumes that E(y) = mu = C %*% gamma, where gamma = exp(B %*% alpha) is a latent discrete distribution, usually on a finer grid than that for y.

Note that sum(gamma) == sum(mu).

A list with the following items:
alpha: the estimated B-spline coefficients, of length n.
gamma: the estimated latent distribution, of length q.
mu: the estimated values of E(y), of length m.
dev: the deviance of the model.
ed: the effective model dimension.
aic: Akaike's Information Criterion.
Paul Eilers and Jutta Gampe
Eilers, P. H. C. (2007). Ill-posed problems with counts, the composite link model and penalized likelihood. Statistical Modelling, 7(3), 239–254.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
# Left and right boundaries, and counts, of wide intervals of the data
cb <- c( 0, 20, 30, 40, 50, 60)
ce <- c(20, 30, 40, 50, 60, 70)
y <- c(79, 54, 19, 1, 1, 0)
# Construct the composition matrix
m <- length(y)
n <- max(ce)
C <- matrix(0, m, n)
for (i in 1:m) C[i, cb[i]:ce[i]] <- 1
mids = (cb + ce) / 2 - 0.5
widths = ce - cb + 1
dens = y / widths / sum(y)
x = (1:n) - 0.5
B = bbase(x)
fit = pclm(y, C, B, lambda = 2, pord = 2, show = TRUE)
gamma = fit$gamma / sum(fit$gamma)
# Plot density estimate and data
plot(x, gamma, type = 'l', lwd = 2, xlab = "Lead Concentration", ylab = "Density")
rect(cb, 0, ce, dens, density = rep(10, 6), angle = rep(45, 6))
ps2DGLM

Plotting function for 2D P-spline (GLM) smoothing (using ps2DGLM with class ps2dglm).

## S3 method for class 'ps2dglm'
plot(x, ..., xlab = " ", ylab = " ", Resol = 100, se = 2)

x: the P-spline object, usually from ps2DGLM.
...: other parameters.
xlab: label for the x-axis, e.g. "my x" (quotes required).
ylab: label for the y-axis, e.g. "my y" (quotes required).
Resol: resolution for plotting, default Resol = 100.
se: a scalar, e.g. se = 2 to produce twice-standard-error surfaces; set se = 0 to suppress them.
Plot: a plot of the mean (inverse link) 2D P-spline (GLM) smooth surface.
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
library(fields)
library(JOPS)
# Extract data
library(rpart)
Kyphosis <- kyphosis$Kyphosis
Age <- kyphosis$Age
Start <- kyphosis$Start
y <- 1 * (Kyphosis == "present")  # make y 0/1
fit <- ps2DGLM(
  Data = cbind(Start, Age, y),
  Pars = rbind(c(1, 18, 10, 3, .1, 2), c(1, 206, 10, 3, .1, 2)),
  family = "binomial"
)
plot(fit, xlab = "Start", ylab = "Age")
# title(main = "Probability of Kyphosis")
ps2DNormal

Plotting function for 2D P-spline smoothing (using ps2DNormal with class ps2dnormal).

## S3 method for class 'ps2dnormal'
plot(x, ..., xlab = " ", ylab = " ", Resol = 100)

x: the P-spline object, usually from ps2DNormal.
...: other parameters.
xlab: label for the x-axis, e.g. "my x" (quotes required).
ylab: label for the y-axis, e.g. "my y" (quotes required).
Resol: resolution for plotting, default Resol = 100.
Plot: a plot of the smooth 2D P-spline surface.
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
library(SemiPar) library(fields) library(spam) library(JOPS) # Get the data data(ethanol) x <- ethanol$C y <- ethanol$E z <- ethanol$NOx # Set parameters for domain xlo <- 7 xhi <- 19 ylo <- 0.5 yhi <- 1.25 # Set P-spline parameters, fit and compute surface xpars <- c(xlo, xhi, 10, 3, 3, 1) ypars <- c(ylo, yhi, 10, 3, 3, 1) Pars1 <- rbind(xpars, ypars) fit <- ps2DNormal(cbind(x, y, z), Pars = Pars1) plot(fit, xlab = "C", ylab = "E")
ps2DSignal

Plotting function for 2D P-spline signal regression coefficients (using ps2DSignal with class ps2dsignal). Although standard error surface bands can be computed, they are intentionally left out, as they are not interpretable and there is generally little data to steer such a high-dimensional parameterization.

## S3 method for class 'ps2dsignal'
plot(x, ..., xlab = " ", ylab = " ", Resol = 200)

x: the P-spline object, usually from ps2DSignal.
...: other parameters.
xlab: label for the x-axis, e.g. "my x" (quotes required).
ylab: label for the y-axis, e.g. "my y" (quotes required).
Resol: resolution of bgrid, used for plotting (default Resol = 200).
Plot: a plot of the 2D P-spline signal coefficient surface.
Paul Eilers and Brian Marx
Marx, B.D. and Eilers, P.H.C. (2005). Multidimensional penalized signal regression, Technometrics, 47: 13-22.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(fields) library(JOPS) # Get the data x0 <- Sugar$X x0 <- x0 - apply(x0, 1, mean) # center Signal y <- as.vector(Sugar$y[, 3]) # Response is Ash # Inputs for two-dimensional signal regression nseg <- c(7, 37) pord <- c(3, 3) min_ <- c(230, 275) max_ <- c(340, 560) M1_index <- rev(c(340, 325, 305, 290, 255, 240, 230)) M2_index <- seq(from = 275, to = 560, by = .5) p1 <- length(M1_index) p2 <- length(M2_index) # Fit optimal model based on LOOCV opt_lam <- c(8858.6679, 428.1332) # Found via svcm Pars_opt <- rbind( c(min_[1], max_[1], nseg[1], 3, opt_lam[1], pord[1]), c(min_[2], max_[2], nseg[2], 3, opt_lam[2], pord[2])) fit <- ps2DSignal(y, x0, p1, p2, "unfolded", M1_index, M2_index, Pars_opt, int = FALSE, ridge_adj = 1e-4 ) # Plotting coefficient image plot(fit)
psNormal, psPoisson, psBinomial

Plotting function for P-spline smooths with normal, Poisson, or binomial responses (class pspfit), with or without standard error bands.

## S3 method for class 'pspfit'
plot(x, ..., se = 2, xlab = "", ylab = "", col = "black", pch = 1)

x: the P-spline object, usually from psNormal, psPoisson, or psBinomial.
...: other parameters.
se: a scalar, e.g. se = 2 to produce twice-standard-error bands; set se = 0 to suppress them.
xlab: label for the x-axis.
ylab: label for the y-axis.
col: color for points.
pch: point character.
Plot: a plot of the mean (inverse link) smoothed normal, Poisson, or binomial responses, with or without se bands.
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
library(JOPS) #Extract data library(MASS) # Get the data data(mcycle) x = mcycle$times y = mcycle$accel fit1 = psNormal(x, y, nseg = 20, bdeg = 3, pord = 2, lambda = .8) plot(fit1, se = 2, xlab = "time (ms)", ylab = "accel") library(JOPS) library(boot) # Extract the data Count = hist(boot::coal$date, breaks=c(1851:1963), plot = FALSE)$counts Year = c(1851:1962) xl = min(Year) xr = max(Year) # Poisson smoothing nseg = 20 bdeg = 3 fit1=psPoisson(Year, Count, xl, xr, nseg, bdeg, pord = 2, lambda = 1) names(fit1) plot(fit1, xlab = "Year", ylab = "Count", se = 2) library(JOPS) #Extract data library(rpart) Kyphosis = kyphosis$Kyphosis Age =kyphosis$Age y = 1 * (Kyphosis == "present") # make y 0/1 # Binomial smoothing fit1 = psBinomial(Age, y, xl = min(Age), xr = max(Age), nseg = 20, bdeg = 3, pord = 2, lambda = 1) names(fit1) plot(fit1, xlab = "Age", ylab = '0/1', se = 2)
psSignal

Plotting function for signal regression P-spline smooth coefficients (using psSignal with class pssignal), with or without standard error bands.

## S3 method for class 'pssignal'
plot(x, ..., se = 2, xlab = "", ylab = "", col = "black", lty = 1)

x: the P-spline object, usually from psSignal.
...: other parameters.
se: a scalar, e.g. se = 2 to produce twice-standard-error bands; set se = 0 to suppress them.
xlab: label for the x-axis, e.g. "my x" (quotes required).
ylab: label for the y-axis, e.g. "my y" (quotes required).
col: color.
lty: line type for plotting, e.g. lty = 1 (solid, default) or lty = 2 (dashed).
Plot: a plot of the smooth P-spline signal coefficient vector, with or without standard error bands.
Paul Eilers and Brian Marx
Marx, B.D. and Eilers, P.H.C. (1999). Generalized linear regression for sampled signals and curves: A P-spline approach. Technometrics, 41(1): 1-13.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(JOPS) # Get the data library(fds) data(nirc) iindex=nirc$x X=nirc$y sel= 50:650 #1200 <= x & x<= 2400 X=X[sel, ] iindex=iindex[sel] dX=diff(X) diindex=iindex[-1] y=as.vector(labc[1,1:40]) oout = 23 dX=t(dX[,-oout]) y=y[-oout] fit2 = psSignal(y, dX, diindex, nseg = 25,lambda = 0.0001) plot(fit2, se = 2, xlab = 'Coefficient Index', ylab= "ps Smooth Coeff") title(main='25 B-spline segments with tuning=0.0001') names(fit2)
psVCSignal

Plotting function for varying-coefficient signal regression P-spline smooth coefficients (using psVCSignal with class psvcsignal). Although se surface bands can be computed, they are intentionally left out, as they are not interpretable and there is generally little data to steer such a high-dimensional parameterization.

## S3 method for class 'psvcsignal'
plot(x, ..., xlab = " ", ylab = " ", Resol = 100)

x: the P-spline object, usually from psVCSignal.
...: other parameters.
xlab: label for the x-axis, e.g. "my x" (quotes required).
ylab: label for the y-axis, e.g. "my y" (quotes required).
Resol: resolution for plotting, default Resol = 100.
Plot: a two panel plot, one of the 2D P-spline signal coefficient surface and another that displays several slices of the smooth coefficient vectors at fixed levels of the varying index.
Paul Eilers and Brian Marx
Eilers, P. H. C. and Marx, B. D. (2003). Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemometrics and Intelligent Laboratory Systems, 66, 159–174.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(fds) data(nirc) iindex <- nirc$x X <- nirc$y sel <- 50:650 # 1200 <= x & x<= 2400 X <- X[sel, ] iindex <- iindex[sel] dX <- diff(X) diindex <- iindex[-1] y <- as.vector(labc[1, 1:40]) # percent fat t_var <- as.vector(labc[4, 1:40]) # percent flour oout <- 23 dX <- t(dX[, -oout]) y <- y[-oout] t_var = t_var[-oout] Pars = rbind(c(min(diindex), max(diindex), 25, 3, 1e-7, 2), c(min(t_var), max(t_var), 20, 3, 0.0001, 2)) fit1 <- psVCSignal(y, dX, diindex, t_var, Pars = Pars, family = "gaussian", link = "identity", int = TRUE) plot(fit1, xlab = "Coefficient Index", ylab = "VC: % Flour") names(fit1)
sim_psr

Plotting function for single-index signal regression with tensor product P-splines (using sim_psr with class simpsr).

## S3 method for class 'simpsr'
plot(x, ..., xlab = " ", ylab = " ", Resol = 100)

x: the P-spline object, usually from sim_psr.
...: other parameters.
xlab: label for the x-axis, e.g. "my x" (quotes required).
ylab: label for the y-axis, e.g. "my y" (quotes required).
Resol: resolution for plotting, default Resol = 100.
Plot: a two panel plot, one for the estimated P-spline signal coefficient vector, and another for the estimated (unknown) P-spline smooth link function.
Paul Eilers, Brian Marx, and Bin Li
Eilers, P.H.C., B. Li, B.D. Marx (2009). Multivariate calibration with single-index signal regression, Chemometrics and Intelligent Laboratory Systems, 96(2), 196-202.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(JOPS) # Get the data library(fds) data(nirc) iindex <- nirc$x X <- nirc$y sel <- 50:650 # 1200 <= x & x<= 2400 X <- X[sel, ] iindex <- iindex[sel] dX <- diff(X) diindex <- iindex[-1] y <- as.vector(labc[1, 1:40]) oout <- 23 dX <- t(dX[, -oout]) y <- y[-oout] pords <- c(2, 2) nsegs <- c(27, 7) bdegs = c(3, 3) lambdas <- c(1e-6, .1) max_iter <- 100 # Single-index model fit <- sim_psr(y, dX, diindex, nsegs, bdegs, lambdas, pords, max_iter) plot(fit, xlab = "Wavelength (nm)", ylab = " ")
sim_vcpsr

Plotting function for varying-coefficient single-index signal regression using tensor product P-splines (using sim_vcpsr with class simvcpsr).

## S3 method for class 'simvcpsr'
plot(x, ..., xlab = " ", ylab = " ", Resol = 100)

x: the P-spline object, usually from sim_vcpsr.
...: other parameters.
xlab: label for the x-axis, e.g. "my x" (quotes required).
ylab: label for the y-axis, e.g. "my y" (quotes required).
Resol: resolution for plotting, default Resol = 100.
Plot: a plot of the estimated 2D P-spline signal coefficient surface along with the companion plot of the estimated 2D P-spline varying link function surface. Slices of these plots, at fixed levels of the indexing covariate, are also provided.
Paul Eilers and Brian Marx
Marx, B. D. (2015). Varying-coefficient single-index signal regression. Chemometrics and Intelligent Laboratory Systems, 143, 111–121.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
# Load libraries library(fields) # Needed for plotting # Get the data Dat <- Mixture # Dimensions: observations, temperature index, signal m <- 34 p1 <- 401 p2 <- 12 # Stacking mixture data, each mixture has 12 signals stacked # The first differenced spectra are also computed. mixture_data <- matrix(0, nrow = p2 * m, ncol = p1) for (ii in 1:m) { mixture_data[((ii - 1) * p2 + 1):(ii * p2), 1:p1] <- t(as.matrix(Dat$xspectra[ii, , ])) d_mixture_data <- t(diff(t(mixture_data))) } # Response (typo fixed) and index for signal y_mixture <- Dat$fractions y_mixture[17, 3] <- 0.1501 index_mixture <- Dat$wl # Select response and replicated for the 12 temps # Column 1: water; 2: ethanediol; 3: amino-1-propanol y <- as.vector(y_mixture[, 2]) y <- rep(y, each = p2) bdegs = c(3, 3, 3, 3) pords <- c(2, 2, 2, 2) nsegs <- c(12, 5, 5, 5) # Set to c(27, 7, 7 ,7) for given lambdas mins <- c(700, 30) maxs <- c(1100, 70) lambdas <- c(1e-11, 100, 0.5, 1) # based on svcm search x_index <- seq(from = 701, to = 1100, by = 1) # for dX t_var_sub <- c(30, 35, 37.5, 40, 45, 47.5, 50, 55, 60, 62.5, 65, 70) t_var <- rep(t_var_sub, m) max_iter <- 2 # Set higher in practice, e.g. 100 int <- TRUE # Defining x as first differenced spectra, number of channels. x <- d_mixture_data # Single-index VC model using optimal tuning fit <- sim_vcpsr(y, x, t_var, x_index, nsegs, bdegs, lambdas, pords, max_iter = max_iter, mins = mins, maxs = maxs) plot(fit, xlab = "Wavelength (nm)", ylab = "Temp C")
ps2DGLM

Prediction function which returns both linear predictor and inverse link predictions at arbitrary (x, y) data locations (using ps2DGLM with class ps2dglm).

## S3 method for class 'ps2dglm'
predict(object, ..., XY, type = "mu")

object: an object using ps2DGLM.
...: other parameters.
XY: a matrix of arbitrary (x, y) locations for prediction.
type: the mean value type = "mu" (default) or the linear predictor type = "eta".
pred: the estimated mean (inverse link function) (default), or the linear predictor prediction with type = "eta".
Paul Eilers and Brian Marx
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(fields) library(JOPS) # Extract data library(rpart) Kyphosis <- kyphosis$Kyphosis Age <- kyphosis$Age Start <- kyphosis$Start y <- 1 * (Kyphosis == "present") # make y 0/1 fit <- ps2DGLM( Data = cbind(Start, Age, y), Pars = rbind(c(1, 18, 10, 3, .1, 2), c(1, 206, 10, 3, .1, 2)), family = "binomial", link = "logit") predict(fit, XY = cbind(Start, Age)[1:5,])
ps2DNormal

Prediction function which returns linear predictions at arbitrary (x, y) data locations (using ps2DNormal with class ps2dnormal).

## S3 method for class 'ps2dnormal'
predict(object, ..., XY)

object: an object using ps2DNormal.
...: other parameters.
XY: a matrix of arbitrary (x, y) locations for prediction.
pred: the estimated mean at the (x, y) locations.
Paul Eilers and Brian Marx
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(SemiPar) library(fields) library(spam) library(JOPS) # Get the data data(ethanol) x <- ethanol$C y <- ethanol$E z <- ethanol$NOx # Set parameters for domain xlo <- 7 xhi <- 19 ylo <- 0.5 yhi <- 1.25 # Set P-spline parameters, fit and compute surface xpars <- c(xlo, xhi, 10, 3, 0.01, 1) ypars <- c(ylo, yhi, 10, 3, 0.1, 1) Pars1 <- rbind(xpars, ypars) fit <- ps2DNormal(cbind(x, y, z), Pars = Pars1) predict(fit, XY = cbind(x, y)[1:5, ])
ps2DSignal

Prediction function which returns both linear predictor and inverse link predictions for arbitrary 2D signals (using ps2DSignal with class ps2dsignal).

## S3 method for class 'ps2dsignal'
predict(object, ..., M_pred, M_type = "unfolded", type = "mu")

object: an object using ps2DSignal.
...: other parameters.
M_pred: a matrix of new 2D signals for prediction, stacked or unfolded.
M_type: "stacked" or "unfolded" (default).
type: the mean value type = "mu" (default) or the linear predictor type = "eta".
pred: the estimated mean (inverse link function), or the linear predictor prediction with type = "eta".
Paul Eilers and Brian Marx
Marx, B.D. and Eilers, P.H.C. (2005). Multidimensional penalized signal regression, Technometrics, 47: 13-22.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(fields) library(JOPS) # Get the data x0 <- Sugar$X x0 <- x0 - apply(x0, 1, mean) # center Signal y <- as.vector(Sugar$y[, 3]) # Response is Ash # Inputs for two-dimensional signal regression nseg <- c(7, 37) pord <- c(3, 3) min_ <- c(230, 275) max_ <- c(340, 560) M1_index <- rev(c(340, 325, 305, 290, 255, 240, 230)) M2_index <- seq(from = 275, to = 560, by = .5) p1 <- length(M1_index) p2 <- length(M2_index) # Fit optimal model based on LOOCV opt_lam <- c(8858.6679, 428.1332) # Found via svcm Pars_opt <- rbind( c(min_[1], max_[1], nseg[1], 3, opt_lam[1], pord[1]), c(min_[2], max_[2], nseg[2], 3, opt_lam[2], pord[2]) ) fit <- ps2DSignal(y, x0, p1, p2, "unfolded", M1_index, M2_index, Pars_opt,int = TRUE, ridge_adj = 0.0001, M_pred = x0 ) predict(fit, M_pred= x0, type = "mu", M_type = "unfolded")
psNormal, psBinomial, psPoisson

Prediction function which returns both linear predictor and inverse link predictions at arbitrary data locations (using psNormal, psBinomial, or psPoisson with class pspfit).

## S3 method for class 'pspfit'
predict(object, ..., x, type = "mu")

object: an object using psNormal, psBinomial, or psPoisson.
...: other parameters.
x: a scalar or vector of arbitrary x locations for prediction.
type: the mean value type = "mu" (default) or the linear predictor type = "eta".
pred: the estimated mean (inverse link function) (default), or the linear predictor prediction with type = "eta".
Paul Eilers and Brian Marx
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(JOPS) library(boot) # Extract the data Count <- hist(boot::coal$date, breaks = c(1851:1963), plot = FALSE)$counts Year <- c(1851:1962) xl <- min(Year) xr <- max(Year) # Poisson smoothing nseg <- 20 bdeg <- 3 fit1 <- psPoisson(Year, Count, xl, xr, nseg, bdeg, pord = 2, lambda = 1) names(fit1) plot(fit1, xlab = "Year", ylab = "Count", se = 2) predict(fit1, x = fit1$x[1:5]) predict(fit1, x = fit1$x[1:5], type = "eta")
psSignal
Prediction function which returns both linear predictor and inverse link predictions, for an arbitrary matrix of signals (using psSignal with class pssignal).
## S3 method for class 'pssignal' predict(object, ..., X_pred, type = "mu")
object |
an object using |
... |
other parameters. |
X_pred |
a matrix of arbitrary signals with |
type |
the mean value |
pred |
the estimated mean (inverse link function) (default)
or the linear predictor prediction with |
Paul Eilers and Brian Marx
Marx, B.D. and Eilers, P.H.C. (1999). Generalized linear regression for sampled signals and curves: A P-spline approach. Technometrics, 41(1): 1-13.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(JOPS) # Get the data library(fds) data(nirc) iindex=nirc$x X=nirc$y sel= 50:650 #1200 <= x & x<= 2400 X=X[sel,] iindex=iindex[sel] dX=diff(X) diindex=iindex[-1] y=as.vector(labc[1,1:40]) oout=23 dX=t(dX[,-oout]) y=y[-oout] fit1 = psSignal(y, dX, diindex, nseg = 25,lambda = 0.0001) predict(fit1, X_pred = dX[1:5, ]) predict(fit1, X_pred = dX[1:5, ], type = 'eta')
psVCSignal
Prediction function which returns both linear predictor and inverse link predictions for an arbitrary matrix of signals with their vector of companion indexing covariates (using psVCSignal with class psvcsignal).
## S3 method for class 'psvcsignal' predict(object, ..., X_pred, t_pred, type = "mu")
object |
an object using |
... |
other parameters. |
X_pred |
a matrix of |
t_pred |
a |
type |
the mean value |
pred |
the estimated mean (inverse link function) (default)
or the linear predictor prediction with |
Paul Eilers and Brian Marx
Eilers, P. H. C. and Marx, B. D. (2003). Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemometrics and Intelligent Laboratory Systems, 66, 159–174.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(fds) data(nirc) iindex <- nirc$x X <- nirc$y sel <- 50:650 # 1200 <= x & x<= 2400 X <- X[sel, ] iindex <- iindex[sel] dX <- diff(X) diindex <- iindex[-1] y <- as.vector(labc[1, 1:40]) # percent fat t_var <- as.vector(labc[4, 1:40]) # percent flour oout <- 23 dX <- t(dX[, -oout]) y <- y[-oout] t_var = t_var[-oout] Pars = rbind(c(min(diindex), max(diindex), 25, 3, 1e-7, 2), c(min(t_var), max(t_var), 20, 3, 0.0001, 2)) fit1 <- psVCSignal(y, dX, diindex, t_var, Pars = Pars, family = "gaussian", link = "identity", int = TRUE) predict(fit1, X_pred = dX[1:5,], t_pred = t_var[1:5])
sim_psr
Prediction function which returns single-index inverse link linear predictions at arbitrary data locations (using sim_psr with class simpsr).
## S3 method for class 'simpsr' predict(object, ..., X_pred)
object |
an object using |
... |
other parameters. |
X_pred |
a matrix of arbitrary signals with |
pred |
the estimated (inverse single-index) mean for the signals in |
Paul Eilers and Brian Marx
Eilers, P.H.C., B. Li, B.D. Marx (2009). Multivariate calibration with single-index signal regression, Chemometrics and Intelligent Laboratory Systems, 96(2), 196-202.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(JOPS) # Get the data library(fds) data(nirc) iindex <- nirc$x X <- nirc$y sel <- 50:650 # 1200 <= x & x<= 2400 X <- X[sel, ] iindex <- iindex[sel] dX <- diff(X) diindex <- iindex[-1] y <- as.vector(labc[1, 1:40]) oout <- 23 dX <- t(dX[, -oout]) y <- y[-oout] pords <- c(2, 2) nsegs <- c(27, 7) bdegs = c(3, 3) lambdas <- c(1e-6, .1) max_iter <- 100 # Single-index model fit <- sim_psr(y, dX, diindex, nsegs, bdegs, lambdas, pords, max_iter) predict(fit, X_pred = dX)
sim_vcpsr
Prediction function which returns varying-coefficient single-index inverse link linear predictions at arbitrary data locations (using sim_vcpsr with class simvcpsr).
## S3 method for class 'simvcpsr' predict(object, ..., X_pred, t_pred)
object |
an object using |
... |
other parameters. |
X_pred |
a matrix of arbitrary signals with |
t_pred |
a |
pred |
the estimated (inverse single-index) mean for the signals in the matrix |
Paul Eilers and Brian Marx
Marx, B. D. (2015). Varying-coefficient single-index signal regression. Chemometrics and Intelligent Laboratory Systems, 143, 111–121.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
# Load libraries library(fields) # Needed for plotting # Get the data Dat <- Mixture # Dimensions: observations, temperature index, signal m <- 34 p1 <- 401 p2 <- 12 # Stacking mixture data, each mixture has 12 signals stacked # The first differenced spectra are also computed. mixture_data <- matrix(0, nrow = p2 * m, ncol = p1) for (ii in 1:m) { mixture_data[((ii - 1) * p2 + 1):(ii * p2), 1:p1] <- t(as.matrix(Dat$xspectra[ii, , ])) d_mixture_data <- t(diff(t(mixture_data))) } # Response (typo fixed) and index for signal y_mixture <- Dat$fractions y_mixture[17, 3] <- 0.1501 index_mixture <- Dat$wl # Select response and replicated for the 12 temps # Column 1: water; 2: ethanediol; 3: amino-1-propanol y <- as.vector(y_mixture[, 2]) y <- rep(y, each = p2) bdegs = c(3, 3, 3, 3) pords <- c(2, 2, 2, 2) nsegs <- c(12, 5, 5, 5) # Set to c(27, 7, 7 ,7) for given lambdas mins <- c(700, 30) maxs <- c(1100, 70) lambdas <- c(1e-11, 100, 0.5, 1) # based on svcm search x_index <- seq(from = 701, to = 1100, by = 1) # for dX t_var_sub <- c(30, 35, 37.5, 40, 45, 47.5, 50, 55, 60, 62.5, 65, 70) t_var <- rep(t_var_sub, m) max_iter <- 2 # Set higher in practice, e.g. 100 int <- TRUE # Defining x as first differenced spectra, number of channels. x <- d_mixture_data # Single-index VC model using optimal tuning fit <- sim_vcpsr(y, x, t_var, x_index, nsegs, bdegs, lambdas, pords, max_iter = max_iter, mins = mins, maxs = maxs) predict(fit, X_pred = x, t_pred = t_var)
ps2D_PartialDeriv provides the partial derivative of the P-spline surface along x, with anisotropic penalization of tensor product B-splines.
ps2D_PartialDeriv( Data, Pars = rbind(c(min(Data[, 1]), max(Data[, 1]), 10, 3, 1, 2), c(min(Data[, 2]), max(Data[, 2]), 10, 3, 1, 2)), XYpred = cbind(Data[, 1], Data[, 2]) )
Data |
a matrix of 3 columns |
Pars |
a matrix of 2 rows, where the first and second rows set the P-spline parameters for |
XYpred |
a matrix with two columns |
This is a support function for sim_vcpsr.
coef |
a vector of length |
B |
the tensor product B-spline matrix of dimensions |
fit |
a vector of |
pred |
a vector of length |
d_coef |
a vector of length |
B_d |
the tensor product B-spline matrix of dimensions |
d_fit |
a vector of |
d_pred |
a vector of length |
Pars |
a matrix of 2 rows, where the first (second) row sets the P-spline parameters for |
cv |
root leave-one-out CV or root average PRESS. |
XYpred |
a matrix with two columns |
Brian Marx
Marx, B. D. (2015). Varying-coefficient single-index signal regression. Chemometrics and Intelligent Laboratory Systems, 143, 111–121.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
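The help page above has no worked example; the sketch below is not part of the original documentation and simply reuses the ethanol data from the ps2DNormal example, with illustrative (assumed) Pars values, to show how the partial-derivative surface can be obtained.
# Hypothetical example: partial derivative along x for the ethanol data
library(SemiPar)
library(JOPS)
data(ethanol)
Data <- cbind(ethanol$C, ethanol$E, ethanol$NOx)
# Pars rows give (min, max, nseg, bdeg, lambda, pord) for x and y; values are illustrative
Pars <- rbind(c(7, 19, 10, 3, 1, 2), c(0.5, 1.25, 10, 3, 1, 2))
fit <- ps2D_PartialDeriv(Data, Pars = Pars, XYpred = Data[, 1:2])
head(fit$d_pred)  # partial derivative of the fitted surface along x at the prediction points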
ps2DGLM is used to smooth scattered normal or non-normal responses, with anisotropic penalization of tensor product P-splines.
ps2DGLM( Data, Pars = rbind(c(min(Data[, 1]), max(Data[, 1]), 10, 3, 1, 2), c(min(Data[, 2]), max(Data[, 2]), 10, 3, 1, 2)), ridge_adj = 0, XYpred = Data[, 1:2], z_predicted = NULL, se_pred = 2, family = "gaussian", link = "default", m_binomial = rep(1, nrow(Data)), wts = rep(1, nrow(Data)), r_gamma = rep(1, nrow(Data)) )
Data |
a matrix of 3 columns |
Pars |
a matrix of 2 rows, where the first and second rows set the P-spline parameters for |
ridge_adj |
a ridge penalty tuning parameter, usually set to a small value, e.g. |
XYpred |
a matrix with two columns |
z_predicted |
a vector of responses associated with |
se_pred |
a scalar, default |
family |
|
link |
the link function, one of |
m_binomial |
vector of binomial trials, default is vector of ones with |
wts |
non-negative weights, which can be zero (default ones). |
r_gamma |
gamma scale parameter, default is vector ones with |
Support functions needed: pspline_fitter, bbase, and pspline_2dchecker.
pcoef |
a vector of length |
mu |
a vector of |
dev |
the deviance of fit. |
eff_df |
the approximate effective dimension of fit. |
aic |
AIC. |
df_resid |
approximate df residual. |
cv |
leave-one-out standard error prediction, when |
cv_predicted |
standard error prediction for |
avediff_pred |
mean absolute difference prediction, when |
Pars |
the design and tuning parameters (see arguments above). |
dispersion_parm |
estimate of dispersion, |
summary_predicted |
inverse link prediction vectors, and |
eta_predicted |
estimated linear predictor of |
press_mu |
leave-one-out prediction of mean, when |
bin_percent_correct |
percent correct classification based on 0.5 cut-off (when |
Data |
a matrix of 3 columns |
Q |
the tensor product B-spline basis. |
qr |
the Q-R of the model. |
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
ps2DNormal
library(fields) library(JOPS) # Extract data library(rpart) Kyphosis <- kyphosis$Kyphosis Age <- kyphosis$Age Start <- kyphosis$Start y <- 1 * (Kyphosis == "present") # make y 0/1 fit <- ps2DGLM( Data = cbind(Start, Age, y), Pars = rbind(c(1, 18, 10, 3, .1, 2), c(1, 206, 10, 3, .1, 2)), family = "binomial", link = "logit") plot(fit, xlab = "Start", ylab = "Age") #title(main = "Probability of Kyphosis")
ps2DNormal is used to smooth scattered (normal) data, with anisotropic penalization of tensor product P-splines.
ps2DNormal( Data, Pars = rbind(c(min(Data[, 1]), max(Data[, 1]), 10, 3, 1, 2), c(min(Data[, 2]), max(Data[, 2]), 10, 3, 1, 2)), XYpred = expand.grid(Data[, 1], Data[, 2]) )
Data |
a matrix of 3 columns |
Pars |
a matrix of 2 rows, where the first and second rows set the P-spline parameters for |
XYpred |
a matrix with two columns |
Support functions needed: pspline_fitter, bbase, and pspline_2dchecker.
coef |
a vector of length |
fit |
a vector of |
pred |
a vector of length |
Pars |
the design and tuning parameters (see arguments above). |
cv |
leave-one-out standard error of prediction or root average PRESS. |
h |
"hat" diagonals of tensor P-spline fit. |
B |
tensor product B-spline basis used for fitting. |
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
ps2DGLM
library(SemiPar) library(fields) library(spam) library(JOPS) # Get the data data(ethanol) x <- ethanol$C y <- ethanol$E z <- ethanol$NOx # Set parameters for domain xlo <- 7 xhi <- 19 ylo <- 0.5 yhi <- 1.25 # Set P-spline parameters, fit and compute surface xpars <- c(xlo, xhi, 10, 3, 3, 1) ypars <- c(ylo, yhi, 10, 3, 3, 1) Pars1 <- rbind(xpars, ypars) fit <- ps2DNormal(cbind(x, y, z), Pars = Pars1) plot(fit, xlab = "C", ylab = "E")
ps2DSignal is a function used to regress a (glm) response onto a two-dimensional signal or image, with anisotropic penalization of tensor product P-splines.
ps2DSignal( y, M, p1, p2, M_type = "stacked", M1_index = c(1:p1), M2_index = c(1:p2), Pars = rbind(c(1, p1, 10, 3, 1, 2), c(1, p2, 10, 3, 1, 2)), ridge_adj = 1e-06, M_pred = M, y_predicted = NULL, family = "gaussian", link = "default", m_binomial = 1 + 0 * y, wts = 1 + 0 * y, r_gamma = 1 + 0 * y, int = TRUE, se_pred = 2 )
y |
a response vector of length |
M |
The signal/image regressors, which are either "stacked" or "unfolded",
with dimensions ( |
p1 |
the row dimension of the image. |
p2 |
the column dimension of the image. |
M_type |
"stacked" (signal as matrix) or "unfolded" (signal as vector). |
M1_index |
an index of length |
M2_index |
an index of length |
Pars |
a matrix of 2 rows, where the first and second rows set the P-spline parameters for |
ridge_adj |
a ridge penalty tuning parameter (usually set to a small value, default |
M_pred |
(e.g. stacked ( |
y_predicted |
a vector of responses from a cv data set (assoc. with |
family |
the response distribution, e.g.
|
link |
the link function, one of |
m_binomial |
a vector of binomial trials having |
wts |
the weight vector of |
r_gamma |
a vector of gamma shape parameters. Default is 1 vector for |
int |
set to TRUE or FALSE to include intercept term in linear predictor (default |
se_pred |
a scalar, e.g. |
Support functions needed: pspline_fitter, bbase, and pspline_2dchecker.
pcoef |
a vector of length |
summary_predicted |
inverse link prediction vectors, and standard error surfaces. |
dev |
deviance of fit. |
eff_df |
the approximate effective dimension of fit. |
aic |
AIC. |
df_resid |
approximate df residual. |
cv |
leave-one-out standard error prediction, when |
cv_predicted |
standard error prediction for |
avediff_pred |
mean absolute difference prediction, when |
Pars |
design and tuning parameters (see above arguments). |
Dispersion_parm |
estimate of dispersion, |
summary_predicted |
inverse link prediction vectors at |
eta_predicted |
estimated linear predictor of |
press_mu |
leave-one-out prediction of mean, when |
bin_percent_correct |
percent correct classification based on 0.5 cut-off,
when |
B |
Tensor basis ( |
Q |
Effective regressors ( |
Ahat |
smooth P-spline coefficient vector of length |
M |
the signal/image regressors. |
y |
the response vector. |
M1index |
index of length |
M2index |
index of length |
M_type |
"stacked" or "unfolded". |
w |
GLM weight vector of length |
h |
"hat" diagonals. |
ridge_adj |
additional ridge tuning parameter to stabilize estimation. |
Paul Eilers and Brian Marx
Marx, B.D. and Eilers, P.H.C. (2005). Multidimensional penalized signal regression, Technometrics, 47: 13-22.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(fields) library(JOPS) # Get the data x0 <- Sugar$X x0 <- x0 - apply(x0, 1, mean) # center Signal y <- as.vector(Sugar$y[, 3]) # Response is Ash # Inputs for two-dimensional signal regression nseg <- c(7, 37) pord <- c(3, 3) min_ <- c(230, 275) max_ <- c(340, 560) M1_index <- rev(c(340, 325, 305, 290, 255, 240, 230)) M2_index <- seq(from = 275, to = 560, by = .5) p1 <- length(M1_index) p2 <- length(M2_index) # Fit optimal model based on LOOCV opt_lam <- c(8858.6679, 428.1332) # Found via svcm Pars_opt <- rbind( c(min_[1], max_[1], nseg[1], 3, opt_lam[1], pord[1]), c(min_[2], max_[2], nseg[2], 3, opt_lam[2], pord[2]) ) fit <- ps2DSignal(y, x0, p1, p2, "unfolded", M1_index, M2_index, Pars_opt,int = TRUE, ridge_adj = 0.0001, M_pred = x0 ) # Plotting coefficient image plot(fit)
psBinomial is used to smooth scattered binomial data using P-splines with a logit link function.
psBinomial( x, y, xl = min(x), xr = max(x), nseg = 10, bdeg = 3, pord = 2, lambda = 1, ntrials = 0 * y + 1, wts = NULL, show = FALSE, iter = 100, xgrid = 100 )
x |
the vector for the continuous regressor of |
y |
the response vector, usually 0/1 or binomial counts. |
xl |
the lower limit for the domain of |
xr |
the upper limit for the domain of |
nseg |
the number of evenly spaced segments between xl and xr. |
bdeg |
the number of the degree of the basis, usually 1, 2, or 3 (default). |
pord |
the number of the order of the difference penalty, usually 1, 2 (default), or 3. |
lambda |
the (positive) number for the tuning parameter for the penalty. |
ntrials |
the vector for the number of binomial trials (default = 1). |
wts |
the vector of weights, default is 1, zeros allowed. |
show |
Set to TRUE or FALSE to display iteration history. |
iter |
a scalar to set the maximum number of iterations, default |
xgrid |
a scalar or a vector that gives the |
pcoef |
a vector of length |
p |
a vector of length |
muhat |
a vector of length |
dev |
deviance |
effdim |
effective dimension of the smooth. |
aic |
AIC |
wts |
a vector of preset weights (default = 1). |
nseg |
the number of B-spline segments. |
bdeg |
the degree of the B-spline basis. |
pord |
the order of the difference penalty. |
family |
the GLM family (response distribution). |
link |
the link function. |
y |
the binomial response. |
x |
the regressor on which the basis is constructed. |
P |
"half" of the penalty matrix, |
B |
the B-spline basis. |
lambda |
the positive tuning parameter. |
dispersion |
dispersion parameter estimated |
xgrid |
gridded |
ygrid |
gridded fitted linear predictor values, useful for plotting. |
pgrid |
gridded (inverse link) fitted probability values, useful for plotting. |
se_eta |
gridded standard errors for the linear predictor. |
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
library(JOPS) # Extract data library(rpart) Kyphosis <- kyphosis$Kyphosis Age <- kyphosis$Age y <- 1 * (Kyphosis == "present") # make y 0/1 fit1 <- psBinomial(Age, y, xl = min(Age), xr = max(Age), nseg = 20, bdeg = 3, pord = 2, lambda = 10 ) names(fit1) plot(fit1, xlab = "Age", ylab = "0/1", se = 2)
psNormal is used to smooth scattered (normal) data using P-splines (with identity link function).
psNormal( x, y, xl = min(x), xr = max(x), nseg = 10, bdeg = 3, pord = 2, lambda = 1, wts = NULL, xgrid = 100 )
x |
the vector for the continuous regressor of |
y |
the response vector, usually continuous data. |
xl |
the number for the min along |
xr |
the number for the max along |
nseg |
the number of evenly spaced segments between |
bdeg |
the number of the degree of the basis, usually 1, 2, or 3 (default). |
pord |
the number of the order of the difference penalty, usually 1, 2 (default), or 3. |
lambda |
the (positive) number for the tuning parameter for the penalty (default 1). |
wts |
the vector of general weights, default is 1; zero allowed. |
xgrid |
a scalar or a vector that gives the |
pcoeff |
a vector of length |
muhat |
a vector of length |
B |
a matrix of dimension |
wts |
a vector of length |
effdim |
estimated effective dimension. |
ed_resid |
approximate df residual. |
sigma |
square root of MSE. |
cv |
standard error of leave-one-out prediction or root average PRESS. |
nseg |
the number of B-spline segments. |
bdeg |
the degree of the B-spline basis. |
pord |
the order of the difference penalty. |
lambda |
the positive tuning parameter. |
xgrid |
gridded x values, useful for plotting. |
ygrid |
gridded fitted mean values, useful for plotting. |
se_eta |
gridded standard errors for the fitted mean values, useful for plotting. |
P |
"half" of the penalty, such that |
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
library(JOPS) library(MASS) data(mcycle) x <- mcycle$times y <- mcycle$accel fit1 <- psNormal(x, y, nseg = 20, bdeg = 3, pord = 2, lambda = .8) plot(fit1, se = 2, xlab = "Time (ms)", ylab = "Acceleration")
psNormal_Deriv provides the derivative of the P-spline fit along x.
psNormal_Deriv( x, y, xl = min(x), xr = max(x), nseg = 10, bdeg = 3, pord = 2, lambda = 1, wts = rep(1, length(y)), xgrid = x )
x |
the vector for the continuous regressor of |
y |
the response vector, usually continuous data. |
xl |
the number for the min along |
xr |
the number for the max along |
nseg |
the number of evenly spaced segments between |
bdeg |
the number of the degree of the basis, usually 1, 2, or 3 (default). |
pord |
the number of the order of the difference penalty, usually 1, 2 (default), or 3. |
lambda |
the positive tuning parameter (default 1). |
wts |
the vector of weights, default is 1; 0/1 allowed. |
xgrid |
a scalar or a vector that gives the |
This is also a support function needed for sim_psr and sim_vcpsr; see SISR (Eilers, Li, Marx, 2009).
coef |
a vector of |
B |
The B-spline matrix of dimensions |
fit |
a vector of |
pred |
a vector of |
d_coef |
a vector of |
B_d |
The first derivative B-spline matrix of dimensions |
d_fit |
a vector of |
d_pred |
a vector of length |
xl |
the number for the min along |
xr |
the number for the max along |
nseg |
the number of evenly spaced segments between |
bdeg |
the number of the degree of the basis, usually 1, 2, or 3 (default). |
pord |
the number of the order of the difference penalty, usually 1, 2 (default), or 3. |
lambda |
the positive tuning parameter (default 1). |
Paul Eilers and Brian Marx
Marx, B. D. (2015). Varying-coefficient single-index signal regression. Chemometrics and Intelligent Laboratory Systems, 143, 111–121.
Eilers, P.H.C., B. Li, B.D. Marx (2009). Multivariate calibration with single-index signal regression, Chemometrics and Intelligent Laboratory Systems, 96(2), 196-202.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
sim_psr, sim_vcpsr
pspline_checker checks to see if all the inputs associated with P-splines are properly defined.
pspline_checker(family, link, bdeg, pord, nseg, lambda, ridge_adj, wts)
family |
the response distribution, e.g. |
link |
the link function, one of |
bdeg |
the degree of B-splines. |
pord |
the order of the penalty. |
nseg |
the number of evenly spaced B-spline segments. |
lambda |
the positive tuning parameter for the difference penalty. |
ridge_adj |
the positive tuning parameter for the ridge penalty. |
wts |
the weight vector, separate from GLM weights. |
list |
same as inputs, with warnings if required. |
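A minimal usage sketch (not part of the original help page); the argument values below are arbitrary assumptions chosen only to show a direct call.
# Hypothetical example: validate a set of P-spline inputs before fitting
library(JOPS)
chk <- pspline_checker(family = "poisson", link = "log", bdeg = 3, pord = 2,
  nseg = 20, lambda = 1, ridge_adj = 1e-6, wts = rep(1, 100))
str(chk)  # the (possibly adjusted) inputs are returned, with warnings if needed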
pspline_fitter applies the method of scoring to a variety of response distributions and link functions for P-spline fitting within the GLM framework.
pspline_fitter( y, B, family = "gaussian", link = "identity", P, P_ridge = 0 * diag(ncol(B)), wts = 0 * y + 1, m_binomial = 0 * y + 1, r_gamma = 0 * y + 1 )
y |
the glm response vector of length |
B |
The effective P-spline regressors, e.g. |
family |
the response distribution, e.g. |
link |
the link function, one of |
P |
P-spline ("half") penalty matrix for data augmentation, such that |
P_ridge |
ridge ("half") penalty for data augmentation, usually |
wts |
the weight vector of |
m_binomial |
a vector of binomial trials having |
r_gamma |
a vector of gamma shape parameters, when |
coef |
the estimated P-spline coefficient regressor, using the effective regressors. |
w |
|
f |
the |
eta |
the linear predictor from |
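A minimal sketch (not from the original documentation) of calling pspline_fitter directly for a Poisson smooth; psPoisson wraps this machinery, and the penalty construction below is an assumption based on the description of P as a 'half' penalty.
# Hypothetical example: direct use of pspline_fitter with a B-spline basis
library(JOPS)
set.seed(123)
x <- seq(0, 1, length = 100)
y <- rpois(100, lambda = exp(2 + sin(4 * x)))
B <- bbase(x, nseg = 20, bdeg = 3)
D <- diff(diag(ncol(B)), differences = 2)  # second-order difference matrix
lambda <- 1
P <- sqrt(lambda) * D                      # 'half' penalty: t(P) %*% P = lambda * t(D) %*% D
fit <- pspline_fitter(y, B, family = "poisson", link = "log", P = P)
plot(x, y)
lines(x, exp(B %*% fit$coef), col = "blue", lwd = 2)  # fitted mean from the coefficients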
pspline_2dchecker checks to see if all the 2D tensor inputs associated with P-splines are properly defined.
pspline2d_checker( family, link, bdeg1, bdeg2, pord1, pord2, nseg1, nseg2, lambda1, lambda2, ridge_adj, wts )
family |
the response distribution, e.g. |
link |
the link function, one of |
bdeg1 |
the degree of B-splines. |
bdeg2 |
the degree of B-splines. |
pord1 |
the order of the penalty. |
pord2 |
the order of the penalty. |
nseg1 |
the number of evenly spaced B-spline segments. |
nseg2 |
the number of evenly spaced B-spline segments. |
lambda1 |
the positive tuning parameter for the difference penalty. |
lambda2 |
the positive tuning parameter for the difference penalty. |
ridge_adj |
the positive tuning parameter for the ridge penalty. |
wts |
the weight vector, separate from GLM weights. |
list |
same as inputs, with warnings if required. |
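A minimal usage sketch (not part of the original help page); all values are illustrative assumptions.
# Hypothetical example: validate 2D tensor-product P-spline inputs
library(JOPS)
chk2 <- pspline2d_checker(family = "gaussian", link = "identity",
  bdeg1 = 3, bdeg2 = 3, pord1 = 2, pord2 = 2,
  nseg1 = 10, nseg2 = 10, lambda1 = 1, lambda2 = 1,
  ridge_adj = 1e-6, wts = rep(1, 50))
str(chk2)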
psPoisson is used to smooth scattered Poisson data using P-splines with a log link function.
psPoisson( x, y, xl = min(x), xr = max(x), nseg = 10, bdeg = 3, pord = 2, lambda = 1, wts = NULL, show = FALSE, iter = 100, xgrid = 100 )
x |
the vector for the continuous regressor of |
y |
the response vector, usually count data. |
xl |
the number for the min along |
xr |
the number for the max along |
nseg |
the number of evenly spaced segments between |
bdeg |
the number of the degree of the basis, usually 1, 2, or 3 (default). |
pord |
the number of the order of the difference penalty, usually 1, 2 (default), or 3. |
lambda |
the (positive) number for the tuning parameter for the penalty (default 1). |
wts |
the vector of general weights, zeros are allowed (default 1). |
show |
Set to TRUE or FALSE to display iteration history (default FALSE). |
iter |
a scalar to set the maximum number of iterations, default |
xgrid |
a scalar or a vector that gives the |
pcoef |
a vector of length |
muhat |
a vector of length |
B |
the |
dev |
deviance of fit. |
effdim |
effective dimension of fit. |
aic |
AIC. |
wts |
the vector of given prior weights. |
nseg |
the number of B-spline segments. |
bdeg |
the degree of the B-spline basis. |
pord |
the order of the difference penalty. |
lambda |
the positive tuning parameter. |
family |
the family of the response (
|
link |
the link function used ( |
xgrid |
gridded x values, useful for plotting. |
ygrid |
gridded fitted linear predictor values, useful for plotting. |
mugrid |
gridded (inverse link) fitted mean values, useful for plotting. |
se_eta |
gridded standard errors for the linear predictor. |
dispersion |
Dispersion parameter estimated |
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
Eilers, P.H.C., Marx, B.D., and Durban, M. (2015). Twenty years of P-splines, SORT, 39(2): 149-186.
library(JOPS) library(boot) # Extract the data Count <- hist(boot::coal$date, breaks = c(1851:1963), plot = FALSE)$counts Year <- c(1851:1962) xl <- min(Year) xr <- max(Year) # Poisson smoothing nseg <- 20 bdeg <- 3 fit1 <- psPoisson(Year, Count, xl, xr, nseg, bdeg, pord = 2, lambda = 1) plot(fit1, xlab = "Year", ylab = "Count", se = 2)
Smooth signal (multivariate calibration) regression using P-splines.
psSignal( y, x_signal, x_index = c(1:ncol(x_signal)), nseg = 10, bdeg = 3, pord = 3, lambda = 1, wts = 1 + 0 * y, family = "gaussian", link = "default", m_binomial = 1 + 0 * y, r_gamma = wts, y_predicted = NULL, x_predicted = x_signal, ridge_adj = 0, int = TRUE )
y |
a (glm) response vector, usually continuous, binomial or count data. |
x_signal |
a matrix of continuous regressor with |
x_index |
a vector of length |
nseg |
the number of evenly spaced segments between |
bdeg |
the degree of the basis, usually 1, 2, or 3 (default). |
pord |
the order of the difference penalty, usually 1, 2, or 3 (default). |
lambda |
the (positive) tuning parameter for the penalty (default 1). |
wts |
the weight vector of |
family |
the response distribution, e.g.
|
link |
the link function, one of |
m_binomial |
a vector of binomial trials having length(y); default is 1 vector for |
r_gamma |
a vector of gamma shape parameters. Default is 1 vector for |
y_predicted |
a vector of responses associated
with |
x_predicted |
a matrix of external signals to yield external prediction. |
ridge_adj |
a ridge penalty tuning parameter, which can be set to a small value, e.g. |
int |
set to TRUE or FALSE to include intercept term in linear predictor (default TRUE). |
Support functions needed: pspline_fitter, bbase, and pspline_checker.
coef |
a vector with |
mu |
a vector with |
eta |
a vector of |
B |
the B-spline basis (for the coefficients), with dimension |
deviance |
the deviance of fit. |
eff_df |
the approximate effective dimension of fit. |
aic |
AIC. |
df_resid |
approximate df residual. |
beta |
a vector of length |
std_beta |
a vector of length |
cv |
leave-one-out standard error prediction, when |
cv_predicted |
standard error prediction for |
nseg |
the number of evenly spaced B-spline segments. |
bdeg |
the degree of B-splines. |
pord |
the order of the difference penalty. |
lambda |
the positive tuning parameter. |
family |
the family of the response. |
link |
the link function. |
y_intercept |
the estimated y-intercept (when |
int |
a logical variable related to use of y-intercept in model. |
dispersion_param |
estimate of dispersion, |
summary_predicted |
inverse link prediction vectors, and twice se bands. |
eta_predicted |
estimated linear predictor of |
press_mu |
leave-one-out prediction of mean, when |
bin_percent_correct |
percent correct classification based on 0.5 cut-off, when |
x_index |
a vector of length |
Brian Marx
Marx, B.D. and Eilers, P.H.C. (1999). Generalized linear regression for sampled signals and curves: A P-spline approach. Technometrics, 41(1): 1-13.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(JOPS) # Get the data library(fds) data(nirc) iindex <- nirc$x X <- nirc$y sel <- 50:650 # 1200 <= x & x<= 2400 X <- X[sel, ] iindex <- iindex[sel] dX <- diff(X) diindex <- iindex[-1] y <- as.vector(labc[1, 1:40]) # percent fat oout <- 23 dX <- t(dX[, -oout]) y <- y[-oout] fit1 <- psSignal(y, dX, diindex, nseg = 25, bdeg = 3, lambda = 0.0001, pord = 2, family = "gaussian", link = "identity", x_predicted = dX, int = TRUE) plot(fit1, xlab = "Coefficient Index", ylab = "ps Smooth Coeff") title(main = "25 B-spline segments with tuning = 0.0001") names(fit1)
psVCSignal is used to regress a (glm) response onto a signal such that the signal coefficients can vary over another covariate t. Anisotropic penalization of tensor product B-splines produces a 2D coefficient surface that can be sliced at t.
Support functions needed: pspline_fitter, pspline_2dchecker, and bbase.
psVCSignal( y, X, x_index, t_var, Pars = rbind(c(min(x_index), max(x_index), 10, 3, 1, 2), c(min(t_var), max(t_var), 10, 3, 1, 2)), family = "gaussian", link = "default", m_binomial = 1 + 0 * y, wts = 1 + 0 * y, r_gamma = 1 + 0 * y, X_pred = X, t_pred = t_var, y_predicted = NULL, ridge_adj = 1e-08, int = TRUE )
y |
a glm response vector of length |
X |
a |
x_index |
|
t_var |
|
Pars |
a matrix with 2 rows, each with P-spline parameters:
|
family |
the response distribution, e.g.
|
link |
the link function, one of |
m_binomial |
a vector of binomial trials having |
wts |
a |
r_gamma |
a vector of gamma shape parameters. Default is 1 vector for |
X_pred |
a matrix of signals with |
t_pred |
a vector for the VC indexing variable with length |
y_predicted |
a vector for the responses associated with |
ridge_adj |
a small ridge penalty tuning parameter to regularize estimation (default |
int |
set to TRUE or FALSE to include an intercept term in the linear predictor (default TRUE). |
pcoef |
a vector of length |
summary_predicted |
inverse link prediction vectors, and twice se bands. |
dev |
the deviance of fit. |
eff_dim |
the approximate effective dimension of fit. |
family |
the family of the response. |
link |
the link function. |
aic |
AIC. |
df_resid |
approximate df residual. |
cv |
leave-one-out standard error prediction when |
cv_predicted |
standard error prediction for |
Pars |
design and tuning parameters; see arguments above. |
dispersion_parm |
estimate of dispersion, |
summary_predicted |
inverse link prediction vectors, and twice se bands. |
eta_predicted |
estimated linear predictor of |
press_mu |
leave-one-out prediction of mean when |
bin_percent_correct |
percent correct classification based on 0.5 cut-off when |
Bx |
B-spline basis matrix of dimension |
By |
B-spline basis matrix of dimension |
Q |
Modified tensor basis ( |
yint |
the estimated y-intercept (when |
int |
a logical variable related to use of y-intercept in model. |
Paul Eilers and Brian Marx
Eilers, P.H.C. and Marx, B.D. (2003). Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemometrics and Intelligent Laboratory Systems, 66, 159–174.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(fds) data(nirc) iindex <- nirc$x X <- nirc$y sel <- 50:650 # 1200 <= x & x<= 2400 X <- X[sel, ] iindex <- iindex[sel] dX <- diff(X) diindex <- iindex[-1] y <- as.vector(labc[1, 1:40]) # percent fat t_var <- as.vector(labc[4, 1:40]) # percent flour oout <- 23 dX <- t(dX[, -oout]) y <- y[-oout] t_var = t_var[-oout] Pars = rbind(c(min(diindex), max(diindex), 25, 3, 1e-7, 2), c(min(t_var), max(t_var), 20, 3, 0.0001, 2)) fit1 <- psVCSignal(y, dX, diindex, t_var, Pars = Pars, family = "gaussian", link = "identity", int = TRUE) plot(fit1, xlab = "Coefficient Index", ylab = "VC: % Flour") names(fit1)
Observations on the widths of red blood cell distributions (RDW).
data(rdw)
A vector.
Erasmus University Medical Centre, Rotterdam, The Netherlands
data(rdw) hist(rdw, breaks = 20)
Compute the row tensor product of two matrices with identical numbers of rows.
rowtens(X, Y = X)
X |
a numeric matrix. |
Y |
a numeric matrix (if missing, |
The input matrices must have the same number of rows, say m. If their numbers of columns are n1 and n2, the result is a matrix with m rows and n1 * n2 columns. Each row of the result is the Kronecker product of the corresponding rows of X and Y.
The row-wise tensor product of the two matrices.
Paul Eilers
Eilers, P.H.C., Currie, I.D., and Durban, M. (2006). Fast and compact smoothing on large multidimensional grids. CSDA, 50, 61–76.
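A small sketch (not in the original documentation) illustrating the dimensions described above.
# Hypothetical example: row-wise tensor product of two small matrices
library(JOPS)
X <- matrix(1:6, nrow = 3)   # 3 x 2
Y <- matrix(1:9, nrow = 3)   # 3 x 3
Z <- rowtens(X, Y)
dim(Z)                       # 3 x 6; row i contains the products X[i, j] * Y[i, k]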
Save a plot as a PDF file in a (default) folder. The present default is determined by the folder structure for the production of the book.
save_PDF( fname = "scratch", folder = "../../Graphs", show = T, width = 6, height = 4.5 )
fname |
the file name without the extension PDF (default: |
folder |
the folder for saving PDF plots (default |
show |
a logical parameter; if TRUE the full file name will be displayed. |
width |
figure width in inches (default = 6). |
height |
figure height in inches (default = 4.5). |
Saves a plot as a PDF file.
Paul Eilers
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
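A usage sketch (not part of the original help page); the save_PDF() call is commented out because it writes a file, and the folder shown is the book's default, which will usually need to be changed.
# Hypothetical example: draw a plot, then write it to <folder>/scratch.pdf
library(JOPS)
plot(cars)
# save_PDF(fname = "scratch", folder = "../../Graphs", width = 6, height = 4.5)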
Adapt margins and axes layout for multiple panels.
set_panels(rows = 1, cols = 1)
rows |
number of rows. |
cols |
number of columns. |
Prepares the graphics layout for multiple panels.
Paul Eilers
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
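A minimal sketch (not from the original documentation) of a 2 x 2 layout.
# Hypothetical example: four panels, then reset to a single panel
library(JOPS)
set_panels(2, 2)
for (k in 1:4) plot(rnorm(20), main = paste("Panel", k))
set_panels(1, 1)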
Open a window for graphics, with specified width and height.
set_window(width = 6, height = 4.5, kill = TRUE, noRStudioGD = TRUE)
width |
figure width in inches (default = 6). |
height |
figure height in inches (default = 4.5). |
kill |
if TRUE (default) closes all graphics windows. Works only for Windows. |
noRStudioGD |
if TRUE: do not use the RStudio device (which does not accept width and height). |
Opens a graphics window.
Currently only works for Windows!
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
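A usage sketch (not in the original help page); note the Windows-only caveat above.
# Hypothetical example: open a 6 x 4.5 inch device and draw into it
library(JOPS)
set_window(width = 6, height = 4.5)
plot(sin, 0, 2 * pi)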
sim_psr is a single-index signal regression model that estimates both the signal coefficient vector and the unknown link function using P-splines.
sim_psr( y, X, x_index = c(1:ncol(X)), nsegs = rep(10, 2), bdegs = rep(3, 3), lambdas = rep(1, 2), pords = rep(2, 2), max_iter = 100 )
y |
a response vector of length |
X |
The signal regressors with dimension |
x_index |
an index of length |
nsegs |
a vector of length 2 containing
the number of evenly spaced segments between min and max, for each
the coefficient vector and the (unknown) link function,
resp. (default |
bdegs |
a vector of length 2 containing
the degree of B-splines, for the coefficient vector and
the (unknown) link function, resp. (default cubic or |
lambdas |
a vector of length 2 containing
the positive tuning parameters, for each
the coefficient vector and the (unknown) link function, resp. (default |
pords |
a vector of length 2 containing
the difference penalty order, for each
the coefficient vector and the (unknown) link function, resp. (default |
max_iter |
a scalar for the maximum number of iterations (default 100). |
y |
the response vector of length |
alpha |
the P-spline coefficient vector of length |
iter |
the number of iterations used for the single-index fit. |
yint |
the estimated y-intercept for the single-index model. |
B |
the B-spline matrix built along the signal index, using |
Q |
the effective regressors from the |
nsegs |
a vector of length 2 containing the number of evenly spaced segments between min and max, for each the coefficient vector and the link function, resp. |
bdegs |
a vector of length 2 containing the degree of B-splines, for each the coefficient vector and the link function, resp. |
lambdas |
a vector of length 2 containing the positive tuning parameters, for each the coefficient vector and the link function, resp. |
pords |
a vector of length 2 containing the difference penalty order, for each the coefficient vector and the link function, resp. |
eta |
the estimated linear predictor for the single-index fit. |
cv |
the leave-one-out cross-validation statistic or the standard error of prediction for the single-index fit. |
delta_alpha |
change measure in signal-coefficient parameters at convergence. |
x_index |
the index of length |
f_fit |
the |
f_eta |
the predicted values of the link function estimated with |
Paul Eilers, Brian Marx, and Bin Li
Eilers, P.H.C., B. Li, B.D. Marx (2009). Multivariate calibration with single-index signal regression, Chemometrics and Intelligent Laboratory Systems, 96(2), 196-202.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(JOPS) # Get the data library(fds) data(nirc) iindex <- nirc$x X <- nirc$y sel <- 50:650 # 1200 <= x & x<= 2400 X <- X[sel, ] iindex <- iindex[sel] dX <- diff(X) diindex <- iindex[-1] y <- as.vector(labc[1, 1:40]) oout <- 23 dX <- t(dX[, -oout]) y <- y[-oout] pords <- c(2, 2) nsegs <- c(27, 7) bdegs = c(3, 3) lambdas <- c(1e-6, .1) max_iter <- 100 # Single-index model fit <- sim_psr(y, dX, diindex, nsegs, bdegs, lambdas, pords, max_iter) plot(fit, xlab = "Wavelength (nm)", ylab = " ")
sim_vcpsr is a varying-coefficient single-index signal regression approach that allows both the signal coefficients and the unknown link function to vary with an indexing variable t, e.g. temperature. Two surfaces are estimated (coefficient and link) that can be sliced at arbitrary t. Anisotropic penalization with P-splines is used on both.
sim_vcpsr( y, X, t_var, x_index = c(1:ncol(X)), nsegs = rep(10, 4), bdegs = rep(3, 4), lambdas = rep(1, 4), pords = rep(2, 4), max_iter = 100, mins = c(min(x_index), min(t_var)), maxs = c(max(x_index), max(t_var)) )
y |
a response vector of length |
X |
the signal regressors with dimension |
t_var |
the varying coeffient indexing variable of length |
x_index |
an index of length |
nsegs |
a vector of length 4 containing
the number of evenly spaced segments between min and max, for each
the coefficient surface (row and col) and
link surface (row and col), resp. (default |
bdegs |
a vector of length 4 containing
the degree of B-splines, for each
the coefficient surface (row and col) and link surface (row and col), resp.
(default cubic |
lambdas |
a vector of length 4 containing
the positive tuning parameters, for each
the coefficient surface (row and col) and link surface (row and col), resp.
(default |
pords |
a vector of length 4 containing
the difference penalty order, for each
the coefficient surface (row and col) and link surface (row and col), resp.
(default |
max_iter |
a scalar for the maximum number of iterations (default 100) |
mins |
A vector length 2, containing min for signal index and |
maxs |
A vector length 2, containing max for signal index and |
y |
the response vector of length |
alpha |
the P-spline coefficient vector (unfolded) of length |
iter |
the number of iterations used for the single-index fit. |
yint |
the estimated y-intercept for the single-index model. |
Bx |
the B-spline matrix built along the signal index, using |
By |
the B-spline matrix built along the |
Q |
the effective regressors from the |
t_var |
the VC indexing variable of length |
nsegs |
a vector of length 4 containing the number of evenly spaced segments between min and max, for each the coefficient surface (row and col) and link surface (row and col). |
bdegs |
a vector of length 4 containing the degree of B-splines, for each the coefficient surface (row and col) and link surface (row and col). |
lambdas |
a vector of length 4 containing the positive tuning parameters, for each the coefficient surface (row and col) and link surface (row and col). |
pords |
a vector of length 4 containing the difference penalty order, for each the coefficient surface (row and col) and link surface (row and col). |
mins |
a vector length 2, containing min for signal index and |
maxs |
a vector length 2, containing max for signal index and |
eta |
the estimated linear predictor for the single-index fit. |
Pars |
a matrix of 2 rows associated with the signal coefficient surface
design parameters, each row: |
pPars |
a matrix of 2 rows associated with the link function
design parameters, each row: |
cv |
the leave-one-out cross-validation statistic or the standard error of prediction for the single-index fit. |
delta_alpha |
change measure in signal-coefficient parameters at convergence. |
fit2D |
|
Paul Eilers and Brian Marx
Marx, B. D. (2015). Varying-coefficient single-index signal regression. Chemometrics and Intelligent Laboratory Systems, 143, 111–121.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
# Load libraries library(fields) # Needed for plotting # Get the data Dat <- Mixture # Dimensions: observations, temperature index, signal m <- 34 p1 <- 401 p2 <- 12 # Stacking mixture data, each mixture has 12 signals stacked # The first differenced spectra are also computed. mixture_data <- matrix(0, nrow = p2 * m, ncol = p1) for (ii in 1:m) { mixture_data[((ii - 1) * p2 + 1):(ii * p2), 1:p1] <- t(as.matrix(Dat$xspectra[ii, , ])) d_mixture_data <- t(diff(t(mixture_data))) } # Response (typo fixed) and index for signal y_mixture <- Dat$fractions y_mixture[17, 3] <- 0.1501 index_mixture <- Dat$wl # Select response and replicated for the 12 temps # Column 1: water; 2: ethanediol; 3: amino-1-propanol y <- as.vector(y_mixture[, 2]) y <- rep(y, each = p2) bdegs = c(3, 3, 3, 3) pords <- c(2, 2, 2, 2) nsegs <- c(12, 5, 5, 5) # Set to c(27, 7, 7 ,7) for given lambdas mins <- c(700, 30) maxs <- c(1100, 70) lambdas <- c(1e-11, 100, 0.5, 1) # based on svcm search x_index <- seq(from = 701, to = 1100, by = 1) # for dX t_var_sub <- c(30, 35, 37.5, 40, 45, 47.5, 50, 55, 60, 62.5, 65, 70) t_var <- rep(t_var_sub, m) max_iter <- 2 # Set higher in practice, e.g. 100 int <- TRUE # Defining x as first differenced spectra, number of channels. x <- d_mixture_data # Single-index VC model using optimal tuning fit <- sim_vcpsr(y, x, t_var, x_index, nsegs, bdegs, lambdas, pords, max_iter = max_iter, mins = mins, maxs = maxs) plot(fit, xlab = "Wavelength (nm)", ylab = "Temp C")
Two-dimensional smoothing of scattered data points with tensor product P-splines.
SpATS.nogeno( response, spatial, fixed = NULL, random = NULL, data, family = gaussian(), offset = 0, weights = NULL, control = list(maxit = 100) )
response |
a character string with the name of the variable that contains the response variable of interest. |
spatial |
a right hand |
fixed |
an optional right hand |
random |
an optional right hand |
data |
a data frame containing the variables. |
family |
object of class |
offset |
an optional numerical vector containing an a priori known component to be included in the linear predictor during fitting. |
weights |
an optional numerical vector of weights to be used in the fitting process. By default, the weights are considered to be one. |
control |
a list of control values. |
This function is a modified version of the function SpATS in the package SpATS. The difference is that genotypes have been removed.
A list with the following components:
call |
the matched call. |
data |
the original supplied data argument with a new column with the weights used during the fitting process. |
model |
a list with the model components: response, spatial, fixed and/or random. |
fitted |
a numeric vector with the fitted values. |
residuals |
a numeric vector with deviance residuals. |
psi |
a two-length vector with the values of the dispersion parameters at convergence. For Gaussian responses both elements coincide, being the (REML) estimate of the dispersion parameter. For non-Gaussian responses, the result depends on the argument update.psi in the control list. |
var.comp |
a numeric vector with the (REML) variance component estimates. This vector contains the variance components associated with the spatial trend, as well as those related with the random model terms. |
eff.dim |
a numeric vector with the estimated effective dimension (or effective degrees of freedom) for each model component (spatial, fixed and/or random). |
dim |
a numeric vector with the (model) dimension of each model component (spatial, fixed and/or random). This value corresponds to the number of parameters to be estimated. |
dim.nom |
a numeric vector with the (nominal) dimension of each component (spatial, fixed and/or random). For the random terms of the model, this value corresponds to the upper bound for the effective dimension (i.e., the maximum effective dimension a random term can achieve). |
nobs |
number of observations used to fit the model. |
niterations |
number of iterations of the EM algorithm. |
deviance |
the (REML) deviance at convergence (i.e., -2 times the restricted log-likelihood). |
coeff |
a numeric vector with the estimated fixed and random effect coefficients. |
terms |
a list with the model terms: response, spatial, fixed and/or random. The information provided here is useful for printing and prediction purposes. |
vcov |
inverse of the coefficient matrix of the mixed model equations. The inverse is needed for the computation of standard errors. For computational reasons, the inverse is returned as a list: C22_inv corresponds to the coefficient matrix associated with the spatial, the fixed and the random components. |
Maria-Xose Rodriguez-Alvarez and Paul Eilers
Rodriguez-Alvarez, M.X, Boer, M.P., van Eeuwijk, F.A., and Eilers, P.H.C. (2018). Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics, 23, 52 - 71. https://doi.org/10.1016/j.spasta.2017.10.003.
# Get the data
library(SemiPar)
data(ethanol)

# Fit the PS-ANOVA model
ps2d <- SpATS.nogeno(response = "NOx",
                     spatial = ~PSANOVA(E, C, nseg = c(20, 20), nest.div = c(2, 2)),
                     data = ethanol,
                     control = list(maxit = 100, tolerance = 1e-05,
                                    monitoring = 0, update.psi = FALSE))

# Report effective dimensions, if desired
# print(summary(ps2d))

# Compute the component surfaces and their sum on a fine grid
Tr = obtain.spatialtrend(ps2d, grid = c(100, 100))

# Plot surface and contours
image(Tr$row.p, Tr$col.p, Tr$fit, col = terrain.colors(100), xlab = 'C', ylab = 'E')
contour(Tr$row.p, Tr$col.p, Tr$fit, add = TRUE, col = 'blue')
points(ethanol$C, ethanol$E, pch = '+')
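As a short follow-up to the example above, the components documented under the return value can be inspected directly from the fitted list. The snippet below is a sketch that assumes the object ps2d created in the example.

# Inspect fitted-model components (names as documented above)
ps2d$eff.dim     # effective dimensions per model component
ps2d$var.comp    # REML variance component estimates
ps2d$deviance    # REML deviance at convergence
plot(ps2d$fitted, ps2d$residuals,
     xlab = 'Fitted values', ylab = 'Deviance residuals')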
Constructs a sparse B-spline basis on evenly spaced knots.
spbase(x, xl = min(x), xr = max(x), nseg = 10, bdeg = 3)
x |
a vector of argument values, at which the B-spline basis functions are to be evaluated. |
xl |
the lower limit of the domain of x; default is min(x). |
xr |
the upper limit of the domain of x; default is max(x). |
nseg |
the number of evenly spaced segments between xl and xr; default is 10. |
bdeg |
the degree of the basis, usually 1, 2, or 3 (default). |
A sparse matrix (in spam format) with length(x) rows and nseg + bdeg columns.
Paul Eilers
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties (with comments and rejoinder), Statistical Science, 11: 89-121.
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(JOPS)

# Basis on grid
x = seq(0, 4, length = 1000)
B = spbase(x, 0, 4, nseg = 50, bdeg = 3)
nb1 = ncol(B)
matplot(x, B, type = 'l', lty = 1, lwd = 1, xlab = 'x', ylab = '')
cat('Dimensions of B:', nrow(B), 'by', ncol(B), 'with',
    length(B@entries), 'non-zero elements')
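For orientation, the sketch below shows how a basis from spbase can be plugged into the usual P-spline penalized least squares equations. The simulated data, the choice lambda = 1, and the reliance on the spam package's sparse matrix products and solve() method are assumptions made for illustration only.

library(JOPS)
library(spam)

# Simulated data (assumed for illustration)
set.seed(2021)
x = seq(0, 4, length = 500)
y = sin(2 * x) + rnorm(500, sd = 0.2)

# Sparse B-spline basis and second-order difference penalty
B = spbase(x, 0, 4, nseg = 50, bdeg = 3)
n = ncol(B)
D = diff(diag(n), diff = 2)
P = as.spam(t(D) %*% D)

# Penalized least squares with an assumed tuning parameter
lambda = 1
a = solve(t(B) %*% B + lambda * P, t(B) %*% y)

plot(x, y, col = 'grey', xlab = 'x', ylab = '')
lines(x, B %*% a, col = 'blue', lwd = 2)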
Sugar was sampled continuously over eight hours to make a mean sample representative of one "shift" (eight-hour period). Samples were taken during the three months of operation (the so-called campaign) in late autumn from a sugar plant in Scandinavia, giving a total of 268 samples. The sugar was sampled directly from the final unit operation (centrifuge) of the process.
data(Sugar)
A list consisting of the following:
y
a 268 x 3 matrix of quality parameters: date, color, ash*1000
X
fluorescence array, 268 (observations) x [571 (emission channels) x 7 (excitation channels)]
Lab
Lab information
DimX
array dimension for X
Yidx
names (id) for y
EmAx
Emission levels for axis (nm)
ExAx
Excitation levels for axis (nm)
Additional list components: time, readmetime, Lname, LabNumber, ProcNumber, Proc, DimLab, DimProc
https://ucphchemometrics.com/sugar-process-data/
R. Bro, Exploratory study of sugar production using fluorescence spectroscopy and multi-way analysis, Chemom. Intell. Lab. Syst., 1999, (46), 133-147.
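No example accompanies this data entry. The sketch below only illustrates how the stacked fluorescence matrix X might be unfolded for a single sample, using the component names listed above; the unfolding order (emission channels varying fastest) is an assumption, not something documented here.

library(JOPS)
data(Sugar)

# Quality parameters and stacked fluorescence data
dim(Sugar$y)   # 268 x 3
dim(Sugar$X)   # 268 x (571 * 7)

# Unfold the first sample into an emission-by-excitation matrix
# (assumed ordering: emission channels vary fastest)
X1 = matrix(Sugar$X[1, ], nrow = 571, ncol = 7)
image(X1, xlab = 'Emission (scaled index)', ylab = 'Excitation (scaled index)')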
The dataset comprises lengths (in days) of psychiatric treatment spells for patients used as controls in a study of suicide risks.
data(Suicide)
data(Suicide)
A dataframe with one column: y.
Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall.
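This entry has no example; a minimal sketch for a first look at the spell lengths follows (the number of histogram bins is an arbitrary choice).

library(JOPS)
data(Suicide)

# Treatment spell lengths in days
summary(Suicide$y)
hist(Suicide$y, breaks = 50, xlab = 'Length of treatment spell (days)', main = '')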
Compute a truncated power function.
tpower(x, knot, p)
x |
a vector on which the basis is calculated. |
knot |
a scalar giving the truncation point. |
p |
a scalar power for the basis, e.g. p = 3 for a cubic basis. |
a vector with the truncated power function.
Paul Eilers
Eilers, P.H.C. and Marx, B.D. (2021). Practical Smoothing, The Joys of P-splines. Cambridge University Press.
library(JOPS)

# Basis on grid
x = seq(0, 4, length = 500)
knots = 0:3
Y = outer(x, knots, tpower, 1)
matplot(x, Y, type = 'l', lwd = 2, xlab = 'x', ylab = '',
        main = 'Linear TPF basis')
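As a follow-up to the example above, the sketch below fits a piecewise linear (broken-stick) regression with the same linear TPF basis; the simulated data and the use of lm.fit are illustrative assumptions, not part of this entry.

library(JOPS)

# Simulated data with a kink at x = 2 (assumed for illustration)
set.seed(2021)
x = seq(0, 4, length = 200)
y = 1 + 0.5 * x - 1.2 * pmax(x - 2, 0) + rnorm(200, sd = 0.1)

# Design matrix: intercept plus linear TPF terms at the knots
knots = 0:3
X = cbind(1, outer(x, knots, tpower, 1))
fit = lm.fit(X, y)

plot(x, y, col = 'grey', xlab = 'x', ylab = '')
lines(x, X %*% fit$coefficients, col = 'blue', lwd = 2)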
Brightness of a variable star.
data(Varstar)
data(Varstar)
A dataframe with eleven columns (V1-V11):
V1
day index
V2
brightness
V3-V11
Paul Eilers, personal communication.
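No example is given for this dataset; a minimal sketch of the raw brightness record (column meanings as listed above) is:

library(JOPS)
data(Varstar)

# Brightness against day index
plot(Varstar$V1, Varstar$V2, pch = 16, cex = 0.6,
     xlab = 'Day index', ylab = 'Brightness')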
Profile of a sanded piece of wood.
data(Woodsurf)
data(Woodsurf)
A data frame with one column: y.
Pandit, S.M. and Wu, S.M. (1993). Time Series and System Analysis with Applications. Krieger Publishing Company.
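This entry also lacks an example; the sketch below simply plots the profile in measurement order (treating the row number as the position index is an assumption).

library(JOPS)
data(Woodsurf)

# Surface profile in measurement order
plot(Woodsurf$y, type = 'l', xlab = 'Position (row number)', ylab = 'Profile')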