Semi-supervised approach to event time annotation using longitudinal electronic health records

Published in Lifetime Data Analysis

Abstract

Large clinical datasets derived from insurance claims and electronic health record (EHR) systems are valuable sources for precision medicine research. These datasets can be used to develop models for personalized prediction of risk or treatment response. Efficiently deriving prediction models using real world data, however, faces practical and methodological challenges. Precise information on important clinical outcomes such as time to cancer progression is not readily available in these databases. The true clinical event times typically cannot be approximated well based on simple extracts of billing or procedure codes, while annotating event times manually is prohibitively time- and resource-intensive. In this paper, we propose a two-step semi-supervised multi-modal automated time annotation (MATA) method leveraging multi-dimensional longitudinal EHR encounter records. In step I, we employ a functional principal component analysis approach to estimate the underlying intensity functions based on observed point processes from the unlabeled patients. In step II, we fit a penalized proportional odds model to the event time outcomes with features derived in step I in the labeled data, where the non-parametric baseline function is approximated using B-splines. Under regularity conditions, the resulting estimator of the feature effect vector is shown to be root-n consistent. We demonstrate the superiority of our approach relative to existing approaches through simulations and a real data example on annotating lung cancer recurrence in an EHR cohort of lung cancer patients from the Veterans Health Administration.


Corresponding author

Correspondence to Tianxi Cai.


Appendices

In Appendix A, we present additional simulation studies with Gamma intensities, as well as extra information on the simulation settings. In Appendix B, we offer additional details on the data example of lung cancer recurrence with VACCR data. In Appendix C, we provide the theoretical properties for the derived features. In Appendix D, we provide the theoretical properties for the MATA estimator based on the proportional odds model. In Appendix F, we provide the detailed algorithm for optimization of the log-likelihood \(l_n\).

Appendix A Additional simulation details

1.1 A1 Simulation settings for the Gaussian intensities

We first simulate a Gaussian-shaped density, i.e., \(f_i^{\scriptscriptstyle [j]}\) is the density function of \(\hbox {Normal}(\mu _{ij},\sigma _{ij}^2)\) truncated at 0.

Set \(\mu _{ij}\) to be \({F_j}^{-1}\{\varPhi (\nu _{ij})\}\), where \(F_j\) is the CDF of \(\hbox {Gamma}(k_{1j},\theta _{1j})\), with \(k_{1j}\sim \hbox {Uniform}(3,6)\) and \(\theta _{1j}\sim \hbox {Uniform}(2,3)\) for \(j=1,\cdots ,q\), and \({\varvec{\nu }}_i = (\nu _{i1},\cdots ,\nu _{iq})^\mathsf{\scriptscriptstyle T}\sim \hbox {MNormal}(\mathbf{0},\varSigma _{{\varvec{\nu }}})\), i.e., the multivariate normal distribution with mean \(\mathbf{0}\) and variance \(\varSigma _{{\varvec{\nu }}}\). For simplicity, we set \(\varSigma _{\varvec{\nu }}=\varSigma _{\varvec{\iota }}\). We further set \(\mu _{ij}\) to be one if it is less than one.

Simulate \(\sigma _{ij}\sim \hbox {Uniform}(0.5,s_j)\) with \(s_j=\mathrm{min}\{0.9\mu _{ij}, {F_j}^{-1}(0.5)\}\), where \(F_j\) is the CDF of \(\hbox {Gamma}(k_{1j},\theta _{1j})\). The way we simulate \(\mu _{ij}\) and \(\sigma _{ij}\) guarantees that the largest change in the intensity functions only occurs after patients enter the study, i.e., \(\mu _{ij}-\sigma _{ij}>0\), as expected in practice. Besides, the simulated \(\sigma _{ij}\) is not only controlled by the value of \(\mu _{ij}\) but also the median of \(\hbox {Gamma}(k_{1j},\theta _{1j})\). Thus \(\sigma _{ij}\) will not get too extreme even with a large peak time \(\mu _{ij}\). In other words, the corresponding largest change in the intensity function \(\mu _{ij}-\sigma _{ij}\) is more likely to occur near the peak time \(\mu _{ij}\) than much earlier than \(\mu _{ij}\) .

Finally, we set \(\alpha _c\), the constant in the nonparametric function \(\alpha (t)\), to be 7.5 and 1.1 to obtain approximately \(30\%\) and \(70\%\) censoring rates.
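The draw of \((\mu _{ij},\sigma _{ij})\) described above can be sketched as follows. This is a minimal illustration with numpy/scipy; the function and variable names are ours, not from the paper's code, and the covariance matrix is passed in by the caller:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_gaussian_intensity_params(n, q, Sigma_nu):
    """Simulate peak times mu_ij and spreads sigma_ij as in Appendix A1.

    Illustrative sketch: the shape/scale draws, the Gaussian copula via
    F_j^{-1}{Phi(nu_ij)}, and the floor at 1 follow the text.
    """
    k1 = rng.uniform(3, 6, size=q)       # Gamma shape k_{1j}
    theta1 = rng.uniform(2, 3, size=q)   # Gamma scale theta_{1j}
    nu = rng.multivariate_normal(np.zeros(q), Sigma_nu, size=n)
    # mu_ij = F_j^{-1}{Phi(nu_ij)} with F_j the Gamma(k_{1j}, theta_{1j}) CDF
    mu = stats.gamma.ppf(stats.norm.cdf(nu), a=k1, scale=theta1)
    mu = np.maximum(mu, 1.0)             # floor mu_ij at 1
    # sigma_ij ~ Uniform(0.5, s_j) with s_j = min{0.9 mu_ij, F_j^{-1}(0.5)}
    med = stats.gamma.ppf(0.5, a=k1, scale=theta1)
    s = np.minimum(0.9 * mu, med)
    sigma = rng.uniform(0.5, s)
    return mu, sigma
```

Because \(\sigma _{ij}<0.9\mu _{ij}\) by construction, the guarantee \(\mu _{ij}-\sigma _{ij}>0\) holds for every simulated pair.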

1.2 A2 Simulation settings for the Gamma intensity functions

We also consider a Gamma-shaped density, i.e., \(f_i^{\scriptscriptstyle [j]}(t)\) is the density function of \(\hbox {Gamma}(k_{ij},\theta _{ij})\), truncated at 0. Set \(k_{ij}=F_j^{-1}\{\varPhi (\nu _{ij})\}\), where \(F_j\) is the CDF of \(\hbox {Uniform}(k_{\ell ,j},k_{u,j})\), with \(k_{\ell ,j}\sim \hbox {Uniform}(2,4)\), \(k_{u,j}\sim \hbox {Uniform}(4,6)\), and \({\varvec{\nu }}_i=(\nu _{i1},\cdots ,\nu _{iq})^\mathsf{\scriptscriptstyle T}\sim \hbox {MNormal}(\mathbf{0},\varSigma _{{\varvec{\nu }}})\). Generate \(\theta _{ij}\) from \(\hbox {Gamma}(a_{j},b_{j})\) truncated at its third quartile, with \(a_j\sim \hbox {Uniform}(3,6)\) and \(b_j\sim \hbox {Uniform}(2,4)\). We set \(\alpha _c=6.8\) and 1.9 to obtain the approximate 30% and 70% censoring rates.
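The Gamma-intensity analogue can be sketched the same way; the truncation of \(\theta _{ij}\) at the third quartile is implemented by inverse-CDF sampling on \([0, 0.75]\). Names are again illustrative, not from the paper's code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_gamma_intensity_params(n, q, Sigma_nu):
    """Simulate (k_ij, theta_ij) for the Gamma intensities of Appendix A2.

    Illustrative sketch under the stated distributions; hyperparameters
    are drawn once per process j, as in the text.
    """
    k_l = rng.uniform(2, 4, size=q)      # lower endpoints k_{l,j}
    k_u = rng.uniform(4, 6, size=q)      # upper endpoints k_{u,j}
    nu = rng.multivariate_normal(np.zeros(q), Sigma_nu, size=n)
    # k_ij = F_j^{-1}{Phi(nu_ij)}, with F_j the Uniform(k_{l,j}, k_{u,j}) CDF
    k = stats.uniform.ppf(stats.norm.cdf(nu), loc=k_l, scale=k_u - k_l)
    # theta_ij ~ Gamma(a_j, b_j) truncated at its third quartile:
    # inverse-CDF sampling with U restricted to [0, F(Q3)] = [0, 0.75]
    a = rng.uniform(3, 6, size=q)
    b = rng.uniform(2, 4, size=q)
    q3 = stats.gamma.ppf(0.75, a=a, scale=b)
    u = rng.uniform(0, 0.75, size=(n, q))
    theta = stats.gamma.ppf(u, a=a, scale=b)
    return k, theta, q3
```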

1.3 A3 Results for the Gamma intensity setting

For the true feature sets, we report the bias and standard error (se) of the non-zero coefficients, i.e., \({\varvec{\beta }}_1=(\beta _{11},\beta _{12})^\mathsf{\scriptscriptstyle T}\), from MATA and NPMLE in Table 5. Similar to the Gaussian intensities settings, we find that the MATA procedure performs well with a small sample size regardless of the censoring rate, the correlation structure between groups of encounters, and the family of the intensity curves. MATA generally leads to both smaller bias and smaller standard error compared to the NPMLE. In the extreme case when \(n=200\) and the censoring rate reaches 70%, both estimators deteriorate. However, the resulting 95% confidence interval of MATA covers the truth, as the absolute bias is less than 1.96 times the standard error. In contrast, the NPMLE tends to be numerically unstable. We observe that the estimation bias of NPMLE in the \(n=400\) setting is larger than its own standard error and than the bias in the \(n=200\) setting. These results are consistent with Theorem 2.

Table 5 Displayed are the bias and standard error of the estimates of \({\varvec{\beta }}_1=(\beta _{11},\beta _{12})^\mathsf{\scriptscriptstyle T}\) fitted with the true features from 400 simulations, each with \(N+n=4,000\) and \(n=200\) or 400. Two methods, MATA and NPMLE, are contrasted. Panels from top to bottom are Gamma intensities with the subject-specific follow-up duration under 30% and 70% censoring rates, as discussed in Sect. A2. The results under independent groups of encounters are shown on the left, whereas the results for correlated ones are shown on the right

For both true and estimated feature sets, we computed the out-of-sample accuracy measures discussed in Sect. 2.3 on a validation data set. The accuracy measures, i.e., the Kendall’s-\(\tau \) type rank correlation summary measures \({\mathscr {C}}_{u}, {\mathscr {C}}_{u}^+\) and the absolute prediction error \(\text{ APE}_u\), depend on u, which is easy to control for MATA and NPMLE but not for Tree and Logi. We therefore minimize the cross-validation error for the Tree approach and the misclassification rate for the Logi approach at their first step, i.e., classifying the censoring status \(\varDelta \). For MATA and NPMLE, we calculate these accuracy measures at \(u=0.02\ell \) for \(\ell =0,1,\cdots ,50\) and pick the u with minimum \(\text{ APE}_u\). We then compare these measures at the selected u with the Tree and Logi methods in Tables 6 and 7.
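The grid search over u amounts to the following sketch, where `ape_fn` stands in for a user-supplied routine computing the out-of-sample absolute prediction error at u; its implementation is problem-specific and not reproduced here:

```python
import numpy as np

def pick_u(ape_fn, grid=None):
    """Pick the landmark u minimizing APE_u over u = 0.02*l, l = 0,...,50.

    `ape_fn` is any callable u -> APE_u (assumed, not from the paper's
    code); returns the minimizing u and the attained APE.
    """
    if grid is None:
        grid = 0.02 * np.arange(51)          # u = 0, 0.02, ..., 1.00
    ape = np.array([ape_fn(u) for u in grid])
    return grid[int(np.argmin(ape))], float(ape.min())
```

The same selected u is then used when contrasting MATA/NPMLE with the Tree and Logi benchmarks.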

Table 6 True features, Gamma. Kendall’s-\(\tau \) type rank correlation summary measures (\({\mathscr {C}}\) and \({\mathscr {C}}^+\)), and absolute prediction error (\(\text{ APE }\)) are computed from four methods, MATA, NPMLE, Tree, and Logi, under \(q=10\) Gamma intensities over 400 simulations each with \(n+N=4,000\) and \(n=200\) or 400
Table 7 Estimated features, Gamma. Kendall’s-\(\tau \) type rank correlation summary measures (\({\mathscr {C}}\) and \({\mathscr {C}}^+\)), and absolute prediction error (\(\text{ APE }\)) are computed from four methods, MATA, NPMLE, Tree, and Logi, under \(q=10\) Gamma intensities over 400 simulations each with \(n+N=4,000\) and \(n=200\) or 400

Similar to the Gaussian intensities setting, the performance of the MATA estimator when fitted with the true features largely dominates that of NPMLE, Tree, and Logi, with higher \({\mathscr {C}}, {\mathscr {C}}^+\) and lower \(\text{ APE }\) in all cases except when the encounters are simulated from independent Gamma counting processes with a 30% censoring rate. In this exceptional case, our MATA estimator has only a minor advantage in \({\mathscr {C}}^+\) compared to NPMLE, and remains better in terms of \({\mathscr {C}}\) and \(\text{ APE }\). When fitted with the estimated features, there is no clear winner among the four methods when the labeled data size is \(n=200\); however, when the labeled data size increases to \(n=400\), MATA generally outperforms the other three approaches in terms of \(\text{ APE }\).

Supplementary Results on Simulations:

We show the sparsity in the simulated data in Table 8 and the average model size and estimation MSE in Table 9.

Table 8 Estimated probability of having zero or \(\le 3\) encounter arrival times under each counting process \({\mathcal {N}}^{\scriptscriptstyle [j]}\) for \(j=1,\cdots ,10\) from a simulation with sample size 500, 000
Table 9 Average model sizes selected by MATA

Appendix B Additional details on data example

We show the sparsity of the features in Table 10. Radiotherapy, medication for systemic therapies, and biopsy/excision have zero-code rates of 77.8%, 70.7%, and 96.4%, respectively. Consequently, for most patients the estimated peak time and largest-increase time of these features are identical to the associated first occurrence time. Thus, only the first occurrence time and the total number of diagnosis and procedure codes are considered for these features.

Table 10 Sparsity of the nine groups of medical encounter data analyzed in Sect. 5

We show the MATA and NPMLE coefficients for \(n=1000, 400, 200\) in Tables 11, 12 and 13. As in Sect. 4, our MATA estimator has smaller bootstrap standard errors compared to the NPMLE. For the analysis with \(n=1000\), both MATA and NPMLE showed a significant impact of the first arrival time and peak time of the lung cancer code, the first arrival time and first FPCA score of the chemotherapy code, the first arrival time of the radiotherapy code, the total number of secondary malignant neoplasm codes, the peak and change point times of palliative or hospice care in medical notes, the first FPCA score and total number of mentions of recurrence in medical notes, and the first arrival time of biopsy or excision. MATA additionally finds the change point time of the lung cancer code to have a strong association with high risk of lung cancer recurrence. Furthermore, MATA excludes stage II cancer, which coincides with the large p-values on those four groups of encounters under NPMLE. For the analyses with \(n=200\) and \(n=400\), MATA excludes cancer stage, age at diagnosis, and medication for systemic therapies, which coincides with the groups without any significant feature in the \(n=1000\) NPMLE analysis.

Table 11 Analysis with \(n=1000\). Estimated coefficient (“est”), bootstrap standard error (“boot.se”), and p-value (“pval”) over 400 bootstraps for the extracted feature sets, including first code time (1stCode), peak time (Pk), change point time (ChP), first FPC score (1stScore), and log of total number of codes (logN), from the nine groups of medical encounter data in Sect. 5
Table 12 Analysis with \(n=400\). Estimated coefficient (“est”), bootstrap standard error (“boot.se”), and p-value (“pval”) over 400 bootstraps for the extracted feature sets, including first code time (1stCode), peak time (Pk), change point time (ChP), first FPC score (1stScore), and log of total number of codes (logN), from the nine groups of medical encounter data in Sect. 5
Table 13 Analysis with \(n=200\). Estimated coefficient (“est”), bootstrap standard error (“boot.se”), and p-value (“pval”) over 400 bootstraps for the extracted feature sets, including first code time (1stCode), peak time (Pk), change point time (ChP), first FPC score (1stScore), and log of total number of codes (logN), from the nine groups of medical encounter data in Sect. 5

Appendix C Convergence rate of derived features

Instead of deriving asymptotic properties for the truncated density \(f_{C_i}\), i.e., the random density \(f_i\) truncated on \([0,C_i]\), we focus on the scaled density \(f_{C_i,\mathrm{scaled}}\), which is \(f_{C_i}\) rescaled to [0, 1]. As we assume the censoring time \(C_i\) has finite support \([0,{{{\mathcal {E}}}}]\) with \({{{\mathcal {E}}}}<\infty \), \(f_{C_i,\mathrm{scaled}}\) and \(f_{C_i}\) share the same asymptotic properties.

Let \(f^{\scriptscriptstyle [j]}_{\mu ,\mathrm{scaled}}(t)=E\{f_{C,\mathrm{scaled}}^{\scriptscriptstyle [j]}(t)\}\) and \(G_{\mathrm{scaled}}^{\scriptscriptstyle [j]}(t,s) = \mathrm{cov}\{f_{C,\mathrm{scaled}}^{\scriptscriptstyle [j]}(t),f_{C,\mathrm{scaled}}^{\scriptscriptstyle [j]}(s)\}\). The Karhunen-Loève theorem (Stark and Woods 1986) states

$$\begin{aligned} f_{C,\mathrm{scaled}}^{\scriptscriptstyle [j]}(t)= f^{\scriptscriptstyle [j]}_{\mu ,\mathrm{scaled}}(t)+\sum _{k=1}^{\infty }\zeta ^{\scriptscriptstyle [j]}_{k,\mathrm{scaled}}\phi ^{\scriptscriptstyle [j]}_{k,\mathrm{scaled}}(t), \mathrm{~ for~} t\in [0,1], \end{aligned}$$

where \(\{\phi ^{\scriptscriptstyle [j]}_{k,\mathrm{scaled}}(t)\}\) are the orthonormal eigenfunctions of \(G^{\scriptscriptstyle [j]}_{\mathrm{scaled}}(t,s)\), \(\{\zeta ^{\scriptscriptstyle [j]}_{k,\mathrm{scaled}}\}\) are pairwise uncorrelated random variables with mean 0 and variance \(\lambda _{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}\), and \(\{\lambda _{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}\}\) are eigenvalues of \(G^{\scriptscriptstyle [j]}_{\mathrm{scaled}}(t,s)\).

For the i-th patient, conditional on \(f^{\scriptscriptstyle [j]}_{C_i}(t)\) and \(M_i^{\scriptscriptstyle [j]}= {\mathcal {N}}^{\scriptscriptstyle [j]}([0,C_i])\), the observed event times \(t_{i1}^{\scriptscriptstyle [j]},\cdots , t_{iM_i^{\scriptscriptstyle [j]}}^{\scriptscriptstyle [j]}\) are assumed to be an i.i.d. sample from \(f_{C_i}^{\scriptscriptstyle [j]}(t)\). Equivalently, the scaled observed event times \(t_{i1}^{\scriptscriptstyle [j]}/C_i,\cdots , t_{iM_i^{\scriptscriptstyle [j]}}^{\scriptscriptstyle [j]}/C_i\overset{\mathrm{iid}}{\sim } f_{C_i,\mathrm{scaled}}^{\scriptscriptstyle [j]}(t)\). Following Wu et al. (2013), we estimate \(f^{\scriptscriptstyle [j]}_{\mu ,\mathrm{scaled}}(t)\) and \(G^{\scriptscriptstyle [j]}_{\mathrm{scaled}}(t,s)\), the mean and covariance functions of the scaled density \(f_{C, \mathrm{scaled}}^{\scriptscriptstyle [j]}(t)\), as

$$\begin{aligned} {\widehat{f}}^{\scriptscriptstyle [j]}_{\mu ,\mathrm{scaled}}(t)= & {} (M^{\scriptscriptstyle [j]}_{\scriptscriptstyle \mathsf +})^{-1}\sum _{i=1}^{n+N}\sum _{\ell =1}^{M^{\scriptscriptstyle [j]}_i} \kappa _{\scriptscriptstyle \mu }^{\scriptscriptstyle h_{\mu }^{\scriptscriptstyle [j]}}(t-t_{i\ell }^{\scriptscriptstyle [j]}/C_i);\\ {\widehat{G}}^{\scriptscriptstyle [j]}_{\mathrm{scaled}}(t,s)= & {} {\widehat{g}}_{\mathrm{scaled}}^{\scriptscriptstyle [j]}(t,s)-\widehat{f}^{\scriptscriptstyle [j]}_{\mu ,\mathrm{scaled}}(t){\widehat{f}}^{\scriptscriptstyle [j]}_{\mu ,\mathrm{scaled}}(s), \end{aligned}$$

for \(t,s\in [0,1]\), where

$$\begin{aligned} {\widehat{g}}_{\mathrm{scaled}}^{\scriptscriptstyle [j]}(t,s)= & {} (M^{\scriptscriptstyle [j]}_{\scriptscriptstyle \mathsf ++})^{-1} \sum _{i=1}^{n+N}I(M^{\scriptscriptstyle [j]}_i\ge 2) \sum _{1 \le \ell \ne k \le M_{i}^{\scriptscriptstyle [j]}} \kappa _{\scriptscriptstyle G}^{\scriptscriptstyle h_{g}^{\scriptscriptstyle [j]}}\left( t-t^{\scriptscriptstyle [j]}_{i\ell }/C_i,s-t^{\scriptscriptstyle [j]}_{ik}/C_i\right) . \end{aligned}$$

Here \(M^{\scriptscriptstyle [j]}_{\scriptscriptstyle \mathsf +}=\sum _{i=1}^{n+N} M^{\scriptscriptstyle [j]}_i\) is the total number of encounters. \(M^{\scriptscriptstyle [j]}_{\scriptscriptstyle \mathsf ++}=\sum _{i=1,M^{\scriptscriptstyle [j]}_i\ge 2}^{n+N} M^{\scriptscriptstyle [j]}_i(M^{\scriptscriptstyle [j]}_i-1)\) is the total number of pairs. \(\kappa _{\scriptscriptstyle \mu }\) and \(\kappa _{\scriptscriptstyle G}\) are symmetric univariate and bivariate probability density functions, respectively, with \(\kappa _{\scriptscriptstyle \mu }^h(x) = \kappa _{\scriptscriptstyle \mu }(x/h)/h\), \(\kappa _{\scriptscriptstyle G}^h(x_1,x_2) = \kappa _{\scriptscriptstyle G}(x_1/h, x_2/h)/h^{2}\). \(h_{\mu }^{\scriptscriptstyle [j]}\) and \(h_g^{\scriptscriptstyle [j]}\) are bandwidth parameters.
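A direct transcription of these two kernel estimators might look as follows; Gaussian kernels stand in for the generic \(\kappa _{\mu }\) and \(\kappa _{G}\) (a product kernel for the bivariate one), the bandwidths are assumed given, and all names are illustrative:

```python
import numpy as np

def gauss(x):
    """Standard Gaussian kernel (our choice; the paper leaves the kernels generic)."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def fhat_mu(t, scaled_times, h):
    """Pooled kernel estimate of the mean scaled density f_{mu,scaled}(t).

    `scaled_times` is a list of arrays of t_{il}/C_i, one per patient;
    the pooled sum is normalized by M_+ (total number of encounters).
    """
    all_t = np.concatenate(scaled_times)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return gauss((t[:, None] - all_t[None, :]) / h).sum(axis=1) / (all_t.size * h)

def ghat(t, s, scaled_times, h):
    """Raw covariance-surface estimate g_scaled(t, s) from within-patient
    pairs (l != k); patients with fewer than two encounters are skipped,
    and the normalizer is M_++ (total number of ordered pairs)."""
    num, M_pp = 0.0, 0
    for ti in scaled_times:
        if ti.size < 2:
            continue
        M_pp += ti.size * (ti.size - 1)
        K1, K2 = gauss((t - ti) / h), gauss((s - ti) / h)
        # sum over l != k  =  (full double sum)  -  (diagonal l == k)
        num += K1.sum() * K2.sum() - (K1 * K2).sum()
    return num / (M_pp * h ** 2)
```

The centered estimate \({\widehat{G}}_{\mathrm{scaled}}(t,s)\) is then `ghat(t, s, ...) - fhat_mu(t, ...) * fhat_mu(s, ...)`, exactly as in the display above.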

The estimates of eigenfunctions and eigenvalues, denoted by \({\widehat{\phi }}_{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}(x)\) and \({\widehat{\lambda }}_{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}\) respectively, are solutions to

$$\begin{aligned} \int _0^1 \widehat{G}_{\mathrm{scaled}}^{\scriptscriptstyle [j]}(s,t){\widehat{\phi }}_{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}(s)ds = {\widehat{\lambda }}_{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}{\widehat{\phi }}_{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}(t), \end{aligned}$$

with constraints \(\int _0^1 {\widehat{\phi }}_{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}(s)^2ds=1\) and \(\int _0^1 {\widehat{\phi }}_{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}(s){\widehat{\phi }}_{\ell ,\mathrm{scaled}}^{\scriptscriptstyle [j]}(s)ds=0\). One can obtain estimated eigenfunctions \({\widehat{\phi }}_{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}(x)\) and eigenvalues \({\widehat{\lambda }}_{k,\mathrm{scaled}}^{\scriptscriptstyle [j]}\) by numerical spectral decomposition on a properly discretized version of the smooth covariance function \({\widehat{G}}_{\mathrm{scaled}}^{\scriptscriptstyle [j]}(t,s)\) (Rice and Silverman 1991; Capra and Müller 1997). Subsequently, we estimate

$$\begin{aligned} \zeta ^{\scriptscriptstyle [j]}_{ik,\mathrm{scaled}}= & {} \int \{f^{\scriptscriptstyle [j]}_{C_i,\mathrm{scaled}}(t)-f^{\scriptscriptstyle [j]}_{\mu ,\mathrm{scaled}}(t)\}\phi ^{\scriptscriptstyle [j]}_{k,\mathrm{scaled}}(t)dt, \end{aligned}$$

by

$$\begin{aligned} {\widehat{\zeta }}^{\scriptscriptstyle [j]}_{ik,\mathrm{scaled}}=\frac{1}{M^{\scriptscriptstyle [j]}_i}\sum _{\ell =1}^{M^{\scriptscriptstyle [j]}_i}{\widehat{\phi }}^{\scriptscriptstyle [j]}_{k,\mathrm{scaled}}(t_{i\ell }^{\scriptscriptstyle [j]}/C_i)-\int {\widehat{f}}^{\scriptscriptstyle [j]}_{\mu ,\mathrm{scaled}}(t){\widehat{\phi }}^{\scriptscriptstyle [j]}_{k,\mathrm{scaled}}(t)dt. \end{aligned}$$

Let \({\widetilde{\zeta }}^{\scriptscriptstyle [j]}_{ik,\mathrm{scaled}} =( M^{\scriptscriptstyle [j]}_i)^{-1}\sum _{\ell =1}^{M^{\scriptscriptstyle [j]}_i}\phi ^{\scriptscriptstyle [j]}_{k,\mathrm{scaled}}(t_{i\ell }^{\scriptscriptstyle [j]}/C_i)-\int f^{\scriptscriptstyle [j]}_{\mu ,\mathrm{scaled}}(t)\phi ^{\scriptscriptstyle [j]}_{k,\mathrm{scaled}}(t)dt\) be the population counterpart of \({\widehat{\zeta }}^{\scriptscriptstyle [j]}_{ik,\mathrm{scaled}}\) constructed with the true eigenfunctions. We show in Lemma A3 that \(\mathrm{max}_i |{\widehat{\zeta }}^{\scriptscriptstyle [j]}_{ik,\mathrm{scaled}}-{\widetilde{\zeta }}^{\scriptscriptstyle [j]}_{ik,\mathrm{scaled}}|\) goes to zero for any k as long as \(Nh_\mu ^2\rightarrow \infty \) and \(Nh_g^4\rightarrow \infty \).
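The eigen-decomposition and score-estimation steps above can be sketched numerically as follows. An equally spaced grid with a rectangle-rule quadrature is assumed, and the rescaling by \(1/\sqrt{w}\) enforces the constraint \(\int _0^1 \phi _k^2 = 1\); names are illustrative:

```python
import numpy as np

def fpca_eigen(G_grid, grid):
    """Eigenvalues/eigenfunctions of a covariance surface evaluated on `grid`,
    via spectral decomposition of the discretized integral operator
    (Rice and Silverman 1991). Sketch with rectangle-rule quadrature."""
    w = grid[1] - grid[0]                       # equal spacing assumed
    evals, evecs = np.linalg.eigh(G_grid * w)   # discretized operator
    order = np.argsort(evals)[::-1]             # descending eigenvalues
    evals, evecs = evals[order], evecs[:, order]
    phis = evecs / np.sqrt(w)                   # so that int phi_k^2 dt = 1
    return evals, phis

def fpc_score(scaled_times_i, phi_k, fmu_vals, grid):
    """Estimated k-th FPC score for patient i: the average of phi_k at the
    patient's scaled encounter times minus the quadrature approximation of
    int f_{mu,scaled}(t) phi_k(t) dt, matching the display above."""
    w = grid[1] - grid[0]
    proj = np.sum(fmu_vals * phi_k(grid)) * w
    return phi_k(np.asarray(scaled_times_i)).mean() - proj
```

Note that eigenfunctions are only identified up to sign, so downstream features built from the scores should be interpreted accordingly.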

We then estimate the scaled density \(f^{\scriptscriptstyle [j]}_{C_i,\mathrm{scaled}}(t)\) as

$$\begin{aligned} \textstyle {\widehat{f}}^{\scriptscriptstyle [j]}_{iK,\mathrm{scaled}}(t)= \mathrm{max}\left\{ 0, {\widehat{f}}^{\scriptscriptstyle [j]}_{\mu ,\mathrm{scaled}}(t)+\sum _{k=1}^{K^{\scriptscriptstyle [j]}}{\widehat{\zeta }}^{\scriptscriptstyle [j]}_{ik,\mathrm{scaled}}{\widehat{\phi }}^{\scriptscriptstyle [j]}_{k,\mathrm{scaled}}(t) \right\} , \end{aligned}$$

and the truncated density \(f^{\scriptscriptstyle [j]}_{C_i}(t)\) as

$$\begin{aligned} \widehat{f}^{\scriptscriptstyle [j]}_{iK}(t)= {\widehat{f}}^{\scriptscriptstyle [j]}_{iK,\mathrm{scaled}}(t/C_i)/ \int _0^{C_i}\widehat{f}^{\scriptscriptstyle [j]}_{iK,\mathrm{scaled}}(t/C_i) dt. \end{aligned}$$

For the i-th patient and its j-th point process \({\mathcal {N}}_i^{\scriptscriptstyle [j]}\), we only observe one realization of its expected number of encounters on \([0,C_i]\), i.e., \(M_i={\mathcal {N}}_i^{\scriptscriptstyle [j]}([0,C_i])\). Following Wu et al. (2013), we approximate the expected numbers of encounters with observed encounters, and estimate \(\lambda _i(t)\) as \({\widehat{\lambda }}_i^{\scriptscriptstyle [j]}(t)=M_i{\widehat{f}}_{iK}^{\scriptscriptstyle [j]}(t)\), for \(t\in [0,C_i]\). We further estimate the derived feature \(\mathbf{W}_i^{\scriptscriptstyle [j]}\) as \({\widehat{\mathbf{W}}}_i^{\scriptscriptstyle [j]}= {{\mathcal {G}}}\circ {\widehat{f}}_{iK}^{\scriptscriptstyle [j]}\).
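Putting the pieces together, the rank-K reconstruction with truncation at zero, the renormalization to a proper density on \([0,C_i]\), and the plug-in intensity \({\widehat{\lambda }}_i(t)=M_i{\widehat{f}}_{iK}(t)\) can be sketched as below. Inputs are the estimated mean density and eigenfunctions evaluated on a grid over \([0,1]\); names are ours:

```python
import numpy as np

def reconstruct_intensity(scaled_times_i, C_i, fmu_grid, phis, scores, grid):
    """Rank-K density reconstruction and intensity estimate on [0, C_i].

    fmu_grid: f_{mu,scaled} on `grid`; phis: (len(grid), K) eigenfunction
    values; scores: length-K FPC scores for patient i. Sketch only.
    """
    w = grid[1] - grid[0]
    f_scaled = np.maximum(0.0, fmu_grid + phis @ scores)  # truncate at 0
    f_scaled = f_scaled / (f_scaled.sum() * w)            # renormalize to a density
    M_i = len(scaled_times_i)
    # back to the original scale: f(t) = f_scaled(t/C_i)/C_i for t in [0, C_i],
    # and lambda_i(t) = M_i * f(t); returned on the grid mapped to grid * C_i
    return M_i * f_scaled / C_i
```

By construction the returned intensity integrates to \(M_i\) over \([0,C_i]\), i.e., the observed number of encounters reproduces the expected count.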

For notational simplicity, we drop the superscript \(^{\scriptscriptstyle [j]}\), the index for the j-th counting process, \(j=1,\cdots ,q\), throughout the rest of the appendix.

Derivatives of the Mean and Covariance Functions:

The nonparametric estimates of the mean and covariance functions of the scaled densities are

$$\begin{aligned} {\widehat{f}}_{\mu ,\mathrm{scaled}}(t)= & {} (M_{\scriptscriptstyle \mathsf +})^{-1}\sum _{i=1}^{n+N}\sum _{\ell =1}^{M_i}\kappa _{\scriptscriptstyle \mu }^{h_\mu }(t-t_{i\ell }/C_i);\\ {\widehat{G}}_{\mathrm{scaled}}(t,s)= & {} {\widehat{g}}_{\mathrm{scaled}}(t,s)-\widehat{f}_{\mu ,\mathrm{scaled}}(t){\widehat{f}}_{\mu ,\mathrm{scaled}}(s), \end{aligned}$$

for \(t,s\in [0,1]\), where

$$\begin{aligned} {\widehat{g}}_{\mathrm{scaled}}(t,s)= & {} (M_{\scriptscriptstyle \mathsf ++})^{-1} \sum _{i=1}^{n+N}I(M_i\ge 2) \sum _{1 \le \ell \ne k \le M_{i} } \kappa _{\scriptscriptstyle G}^{h_g}\left( t-t_{i\ell }/C_i,s-t_{ik}/C_i\right) . \end{aligned}$$

Here \(M_{\scriptscriptstyle \mathsf +}=\sum _{i=1}^{n+N} M_i\) is the total number of encounters. \(M_{\scriptscriptstyle \mathsf ++}=\sum _{i=1,M_i\ge 2}^{n+N} M_i(M_i-1)\) is the total number of pairs. \(\kappa _{\scriptscriptstyle \mu }\) and \(\kappa _{\scriptscriptstyle G}\) are symmetric univariate and bivariate probability density functions, respectively, with \(\kappa _{\scriptscriptstyle \mu }^h(x) = \kappa _{\scriptscriptstyle \mu }(x/h)/h\), \(\kappa _{\scriptscriptstyle G}^h(x_1,x_2) = \kappa _{\scriptscriptstyle G}(x_1/h, x_2/h)/h^{2}\). \(h_{\mu }\) and \(h_g\) are bandwidth parameters.

Their derivatives are

$$\begin{aligned} {{{\widehat{f}}}'_{\mu ,\mathrm{scaled}}}(t)= & {} \frac{1}{M_{\scriptscriptstyle \mathsf +}(h_\mu )^2}\sum _{i=1}^{n+N}\sum _{\ell =1}^{M_i}\kappa _1'\left( \frac{t-t_{i\ell }/C_i}{h_\mu }\right) ,\\ {{\widehat{G}}_{\mathrm{scaled}}}{}^{(0,1)}(t,s)= & {} {{\widehat{g}}_{\mathrm{scaled}}}{}^{(0,1)}(t,s)-{\widehat{f}}_{\mu ,\mathrm{scaled}}(t){{\widehat{f}}'_{\mu ,\mathrm{scaled}}}(s),\\ {{\widehat{G}}_{\mathrm{scaled}}}{}^{(1,0)}(t,s)= & {} {{\widehat{g}}_{\mathrm{scaled}}}{}^{(1,0)}(t,s)-{{{\widehat{f}}}'_{\mu ,\mathrm{scaled}}}(t)\widehat{f}_{\mu ,\mathrm{scaled}}(s),\\ \end{aligned}$$

with

$$\begin{aligned} {{\widehat{g}}_{\mathrm{scaled}}}{}^{(\nu ,u)}(t,s)= & {} \frac{1}{M_{\scriptscriptstyle \mathsf ++}{h_g}^3}\sum _{i=1,M_i\ge 2}^{n+N}\sum _{\ell =1}^{M_i}\sum _{k=1,k\ne \ell }^{M_i} \kappa _2^{(\nu ,u)}\left( \frac{t-t_{i\ell }/C_i}{h_g},\frac{s-t_{ik}/C_i}{h_g}\right) , \end{aligned}$$

for \(\nu =0,u=1\) and \(\nu =1,u=0\), where for an arbitrary bivariate function h, \(h^{(\nu ,u)}(x,y)=\partial ^{\nu +u} h(x,y)/\partial x^{\nu }\partial y^{u}.\)

Assume the following regularity conditions hold.

  1. (A1)

The scaled random densities \(f_{C_i,\mathrm{scaled}}\), their mean density \(f_{\mu ,\mathrm{scaled}}\), the covariance function \(g_{\mathrm{scaled}}\), and the eigenfunctions \(\phi _{k,\mathrm{scaled}}(x)\) are thrice continuously differentiable.

  2. (A2)

    \(f_{C_i,\mathrm{scaled}}\), \(f_{\mu ,\mathrm{scaled}}\) and their first three derivatives are bounded, where the bounds hold uniformly across the set of random densities.

  3. (A3)

\(\kappa _1(\cdot )\) and \(\kappa _2(\cdot ,\cdot )\) are symmetric univariate and bivariate density functions satisfying

    $$\begin{aligned}&\int u\kappa _1(u)du=\int u\kappa _2(u,v)dudv=\int v\kappa _2(u,v)dudv=0,\\&\int u^2\kappa _1(u)du<\infty ,\int u^2\kappa _2(u,v)dudv<\infty , \int v^2\kappa _2(u,v)dudv<\infty . \end{aligned}$$
  4. (A4)

Denote the Fourier transforms \(\chi _1(t) = \int \hbox {exp}(-iut)\kappa _1(u)du\) and \(\chi _2(s,t)= \int \hbox {exp}(-ius-ivt)\kappa _2(u,v)dudv\). We require \(\int |\chi _1(u)|du<\infty \) and \(\int |u\chi _1(u)|du<\infty \), as well as \(\int |\chi _2(u,v)|dudv<\infty \), \(\int |u\chi _2(u,v)|dudv<\infty \), and \(\int |v\chi _2(u,v)|dudv<\infty \).

  5. (A5)

The numbers of observations \(M_i\), \(i=1,\cdots ,n+N\), are i.i.d. random variables that are independent of the densities \(f_i\) and satisfy

    $$\begin{aligned} E(N/M_{\scriptscriptstyle \mathsf +})<\infty , ~ E\{N/M_{\scriptscriptstyle \mathsf ++}\}<\infty . \end{aligned}$$
  6. (A6)

    \(h_\mu \rightarrow 0, h_g\rightarrow 0, N{h_\mu }^4\rightarrow \infty ,N{h_g}^6\rightarrow \infty \) as \(N\rightarrow \infty \).

  7. (A7)

    \(M_i,i=1,\cdots ,n+N\) are i.i.d positive r.v. generated from a truncated-Poisson distribution with rate \(\tau _N\), such that \(\hbox {pr}(M_i=0)=0\), and \(\hbox {pr}(M_i=k)={\tau _N}^k\hbox {exp}(-\tau _N)/[k!\{1-\hbox {exp}(-\tau _N)\}]\) for \(k\ge 1\).

  8. (A8)

    \(\omega _i = E(M_i\mid C_i) = E(N_i[0,C_i]\mid C_i)\) and \(f_{C_i,\mathrm{scaled}},i=1,\cdots ,n+N\) are independent. \(E{\omega _i}^{-1/2}=O(\alpha _N)\), where \(\alpha _N\rightarrow 0\) as \(N\rightarrow \infty \) for \(j=1,\cdots ,q\).

  9. (A9)

    The number of eigenfunctions and functional principal components \(K_i\) is a r.v. with \(K_i\overset{d}{=}K\), and for any \(\epsilon >0\), there exists \(K_\epsilon ^*<\infty \) such that \(\hbox {pr}(K>K_\epsilon ^*)<\epsilon \) for \(j=1,\cdots ,q\).
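The zero-truncated Poisson of condition A7 is straightforward to sample by simple rejection, i.e., redrawing any zeros; a sketch (not code from the paper):

```python
import numpy as np

def sample_truncated_poisson(tau, size, rng=None):
    """Sampler for the zero-truncated Poisson of condition A7:
    pr(M = k) = tau^k exp(-tau) / [k! {1 - exp(-tau)}] for k >= 1.

    Rejection sampling: drawing Poisson(tau) and redrawing zeros yields
    exactly the conditional distribution of M given M >= 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = rng.poisson(tau, size)
    while (m == 0).any():
        zeros = m == 0
        m[zeros] = rng.poisson(tau, zeros.sum())
    return m
```

The resulting mean is \(\tau _N/\{1-\hbox {exp}(-\tau _N)\}\), slightly above \(\tau _N\), which is consistent with conditioning on at least one encounter.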

Lemma A1

Under the regularity conditions A1–A6,

$$\begin{aligned} \hbox {sup}_x |{\widehat{f}}_{\mu ,\mathrm{scaled}}(x)-f_{\mu ,\mathrm{scaled}}(x)|= & {} O_p\left( h_\mu ^2+\frac{1}{\sqrt{N}h_\mu }\right) ,\end{aligned}$$
(A.1)
$$\begin{aligned} \hbox {sup}_x |{{\widehat{f}}}'_{\mu ,\mathrm{scaled}}(x)-{f'_{\mu ,\mathrm{scaled}}}(x)|= & {} O_p\left( h_\mu ^2+\frac{1}{\sqrt{N}h^2_\mu }\right) ,\end{aligned}$$
(A.2)
$$\begin{aligned} \hbox {sup}_{x,y} |{\widehat{g}}_{\mathrm{scaled}}(x,y)-g_{\mathrm{scaled}}(x,y)|= & {} O_p\left( h_g^2+\frac{1}{\sqrt{N}h_g^2}\right) ,\end{aligned}$$
(A.3)
$$\begin{aligned} \hbox {sup}_{x,y} |\triangledown {\widehat{g}}_{\mathrm{scaled}}(x,y)-\triangledown {g_{\mathrm{scaled}}}(x,y)|= & {} O_p\left( h_g^2+\frac{1}{\sqrt{N}h_g^3}\right) ,\end{aligned}$$
(A.4)
$$\begin{aligned} \hbox {sup}_{x,y} |{\widehat{G}}_{\mathrm{scaled}}(x,y)-G_{\mathrm{scaled}}(x,y)|= & {} O_p\left( h_g^2+\frac{1}{\sqrt{N}h_g^2}+h_\mu ^2+\frac{1}{\sqrt{N}h_\mu }\right) ,\nonumber \\ \end{aligned}$$
(A.5)
$$\begin{aligned} \hbox {sup}_{x,y} |\triangledown {\widehat{G}}_{\mathrm{scaled}}(x,y)-\triangledown {G_{\mathrm{scaled}}}(x,y)|= & {} O_p\left( h_g^2+\frac{1}{\sqrt{N}h_g^3}+h_\mu ^2+\frac{1}{\sqrt{N}h^2_\mu }\right) .\nonumber \\ \end{aligned}$$
(A.6)

Proof

The proofs for the mean density and the covariance function can be found in Wu et al. (2013). Here we only present the proof for the derivative of the mean density function; the proof for the derivative of the covariance function is similar.

Under conditions A1 and A2, we have

$$\begin{aligned} E\{{{\widehat{f}}}'_{\mu ,\mathrm{scaled}} (x)\}= & {} E\left[ \frac{1}{M_{\scriptscriptstyle \mathsf +}h_\mu ^2}\sum _{i=1}^{n+N}M_i E\left\{ \kappa _1'\left( \frac{x-t_{i\ell }/C_i}{h_{\mu }}\right) \mid M_i, f_{C_i,\mathrm{scaled}}\right\} \right] \\= & {} E\left[ \frac{1}{M_{\scriptscriptstyle \mathsf +}}\sum _{i=1}^{n+N}M_i E\{f'_{C_i,\mathrm{scaled}}(x)+\frac{1}{2}f'''_{C_i,\mathrm{scaled}}(x)\sigma _{\kappa _1}^2h_\mu ^2+o(h_\mu ^2)\mid M_i\}\right] \\= & {} E\left[ \frac{1}{M_{\scriptscriptstyle \mathsf +}}\sum _{i=1}^{n+N}M_i \{f'_{\mu ,\mathrm{scaled}}(x)+\frac{1}{2}{f'''_{\mu ,\mathrm{scaled}}}(x)\sigma _{\kappa _1}^2h_\mu ^2+o(h_\mu ^2)\}\right] \\= & {} f_{\mu ,\mathrm{scaled}}'(x)+O(h_\mu ^2). \end{aligned}$$

Hence, \(\hbox {sup}_x|E{{\widehat{f}}}'_{\mu ,\mathrm{scaled}}(x)-f'_{\mu ,\mathrm{scaled}}(x)|=O(h_\mu ^2)\).

With inverse Fourier transformation \(\kappa _1(t)=(2\pi )^{-1}\int \hbox {exp}(iut)\chi _1(u)du\), we have

$$\begin{aligned} \kappa _1'(t)=(2\pi )^{-1}i\int u\hbox {exp}(iut)\chi _1(u)du. \end{aligned}$$

We further insert this equation into \({\widehat{f}}'_{\mu }\),

$$\begin{aligned} {{\widehat{f}}}'_{\mu ,\mathrm{scaled}}(t)= & {} \frac{1}{M_{\scriptscriptstyle \mathsf +}h_{\mu }^2}\sum _{k=1}^{n+N}\sum _{\ell =1}^{M_k}\kappa _1'\left( \frac{t-t_{k\ell }/C_k}{h_{\mu }}\right) \\= & {} \frac{1}{M_{\scriptscriptstyle \mathsf +}h_{\mu }^2}\sum _{k=1}^{n+N}\sum _{\ell =1}^{M_k}(2\pi )^{-1}i\int u\hbox {exp}\{iu(t-t_{k\ell }/C_k)/h_\mu \}\chi _1(u)du\\= & {} \frac{1}{M_{\scriptscriptstyle \mathsf +}}\sum _{k=1}^{n+N}\sum _{\ell =1}^{M_k}(2\pi )^{-1}i\int u\hbox {exp}\{iu(t-t_{k\ell }/C_k)\}\chi _1(uh_\mu )du\\= & {} (2\pi )^{-1}i\int \varsigma (u)u\hbox {exp}(iut)\chi _1(uh_\mu )du, \end{aligned}$$

where

$$\begin{aligned} \varsigma (u) = \frac{1}{M_{\scriptscriptstyle \mathsf +}}\sum _{k=1}^{n+N}\sum _{\ell =1}^{M_k} \hbox {exp}\{-iut_{k\ell }/C_k\}. \end{aligned}$$

Therefore,

$$\begin{aligned}&|{{\widehat{f}}}'_{\mu ,\mathrm{scaled}}(t)-E{{\widehat{f}}}'_{\mu ,\mathrm{scaled}}(t)|\\&\quad = |(2\pi )^{-1}i\int \{\varsigma (u)-E\varsigma (u)\}u\hbox {exp}(iut)\chi _1(uh_\mu )du|\\&\quad \le (2\pi )^{-1}\int |\varsigma (u)-E\varsigma (u)||u\chi _1(uh_\mu )|du. \end{aligned}$$

Note that the right-hand side of the above inequality is free of t. Thus,

$$\begin{aligned} \hbox {sup}_t|{{\widehat{f}}}'_{\mu ,\mathrm{scaled}}(t)-E{\widehat{f}}'_{\mu ,\mathrm{scaled}}(t)| \le (2\pi )^{-1}\int |\varsigma (u)-E\varsigma (u)||u\chi _1(uh_\mu )|du. \end{aligned}$$

As an intermediate result of the Proof of Theorem 1 in Wu et al. (2013), we have

$$\begin{aligned} \hbox {var}\{\varsigma (u)\}\le \frac{1}{n+N}\left\{ 1+2E\left( \frac{n+N}{M_{\scriptscriptstyle \mathsf +}}\right) \right\} . \end{aligned}$$

This further leads to

$$\begin{aligned}&E\{\hbox {sup}_t|{{\widehat{f}}}'_{\mu ,\mathrm{scaled}}(t)-E{\widehat{f}}'_{\mu ,\mathrm{scaled}}(t)|\}\\&\quad \le (2\pi )^{-1}E\{\int |\varsigma (u)-E\varsigma (u)||u\chi _1(uh_\mu )|du\}\\&\quad = (2\pi )^{-1}\int E\{|\varsigma (u)-E\varsigma (u)|\}|u\chi _1(uh_\mu )|du\\&\quad \le (2\pi )^{-1}\int [\hbox {var}\{\varsigma (u)\}]^{1/2}|u\chi _1(uh_\mu )|du\\&\quad \le (2\pi )^{-1}\sqrt{\frac{1}{n+N}\left\{ 1+2E\left( \frac{n+N}{M_{\scriptscriptstyle \mathsf +}}\right) \right\} }\int |u\chi _1(uh_\mu )|du\\&\quad = O\left( \frac{1}{\sqrt{N}h_\mu ^2}\right) . \end{aligned}$$

Thus, \(\hbox {sup}_t|{{\widehat{f}}}'_{\mu ,\mathrm{scaled}}(t)-E{{\widehat{f}}}'_{\mu ,\mathrm{scaled}}(t)|=O_p\left( \frac{1}{\sqrt{N}h_\mu ^2}\right) \). Furthermore,

$$\begin{aligned} \hbox {sup}_t|{\widehat{f}}'_{\mu ,\mathrm{scaled}}(t)-f'_{\mu ,\mathrm{scaled}}(t)|\le & {} \hbox {sup}_x|{\widehat{f}}'_{\mu ,\mathrm{scaled}}(t)-E\widehat{f}'_{\mu ,\mathrm{scaled}}(t)|\\&+\hbox {sup}_t|E{\widehat{f}}'_{\mu ,\mathrm{scaled}}(t)-f'_{\mu ,\mathrm{scaled}}(t)|\\= & {} O_p\left( h_\mu ^2+\frac{1}{\sqrt{N}h_\mu ^2}\right) . \end{aligned}$$

\(\square \)
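The kernel derivative estimator analyzed in Lemma A1 can be illustrated numerically. The sketch below uses a Gaussian kernel and simulated scaled encounter times; both choices, and the bandwidth, are assumptions for illustration, not the paper's data or tuning.

```python
import numpy as np

def fhat_mu_scaled(t, s, h):
    """Pooled Gaussian-kernel density estimate of the scaled times s = t_{kl}/C_k."""
    x = (t - s[:, None]) / h
    return np.mean(np.exp(-x**2 / 2), axis=0) / (np.sqrt(2 * np.pi) * h)

def fhat_mu_scaled_deriv(t, s, h):
    """Derivative estimator (1/(M+ h^2)) * sum kappa'((t - s)/h), kappa'(x) = -x kappa(x)."""
    x = (t - s[:, None]) / h
    return np.mean(-x * np.exp(-x**2 / 2), axis=0) / (np.sqrt(2 * np.pi) * h**2)

rng = np.random.default_rng(0)
s = rng.beta(2, 5, size=5000)      # simulated scaled encounter times in [0, 1]
t = np.linspace(0.1, 0.9, 9)
h = 0.1

# the analytic derivative should agree with a central finite difference of fhat
eps = 1e-6
fd = (fhat_mu_scaled(t + eps, s, h) - fhat_mu_scaled(t - eps, s, h)) / (2 * eps)
max_gap = np.max(np.abs(fhat_mu_scaled_deriv(t, s, h) - fd))
```

The finite-difference comparison is only a consistency check of the estimator's closed form; the Fourier representation in the proof is used purely for the theoretical bound.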

Derivative of the Eigenfunctions:

Lemma A2

Under the regularity conditions A1–A6,

$$\begin{aligned} |{\widehat{\lambda }}_{k,\mathrm{scaled}}-\lambda _{k,\mathrm{scaled}}|= & {} O_p\left( h_g^2+\frac{1}{\sqrt{N}h_g^2}+h_\mu ^2+\frac{1}{\sqrt{N}h_\mu }\right) , \end{aligned}$$
(A.7)
$$\begin{aligned} \hbox {sup}_x|{\widehat{\phi }}_{k,\mathrm{scaled}}(x)-\phi _{k,\mathrm{scaled}}(x)|= & {} O_p\left( h_g^2+\frac{1}{\sqrt{N}h_g^2}+h_\mu ^2+\frac{1}{\sqrt{N}h_\mu }\right) , \end{aligned}$$
(A.8)
$$\begin{aligned} \hbox {sup}_x|{\widehat{\phi }}'_{k,\mathrm{scaled}}(x)-\phi '_{k,\mathrm{scaled}}(x)|= & {} O_p\left( h_g^2+\frac{1}{\sqrt{N}h_g^3}+h_\mu ^2+\frac{1}{\sqrt{N}h^2_\mu }\right) . \end{aligned}$$
(A.9)

Proof

The first two equations are direct results of Theorem 2 in Yao et al. (2005). Note that

$$\begin{aligned} {\widehat{\lambda }}_{k,\mathrm{scaled}}{\widehat{\phi }}'_{k,\mathrm{scaled}}(x)= & {} \int \widehat{G}_{\mathrm{scaled}}^{(1,0)}(x,y){\widehat{\phi }}_{k,\mathrm{scaled}}(y)dy,\\ {\lambda _{k,\mathrm{scaled}}}{\phi _{k,\mathrm{scaled}}}'(x)= & {} \int G_{\mathrm{scaled}}^{(1,0)}(x,y)\phi _{k,\mathrm{scaled}}(y)dy, \end{aligned}$$

where \(G_{\mathrm{scaled}}^{(1,0)}(x,y)=\partial G_{\mathrm{scaled}}(x,y)/\partial x.\) Thus,

$$\begin{aligned}&\bigg |{\widehat{\lambda }}_{k,\mathrm{scaled}}{\widehat{\phi }}'_{k,\mathrm{scaled}}(x)-{\lambda _{k,\mathrm{scaled}}}{\phi _{k,\mathrm{scaled}}}'(x)\bigg | \\&\quad = \bigg |\int {\widehat{G}}_{\mathrm{scaled}}^{(1,0)}(x,y){\widehat{\phi }}_{k,\mathrm{scaled}}(y)dy-\int G_{\mathrm{scaled}}^{(1,0)}(x,y)\phi _{k,\mathrm{scaled}}(y)dy\bigg | \\&\quad \le \int |{\widehat{G}}_{\mathrm{scaled}}^{(1,0)}(x,y)- G^{(1,0)}_{\mathrm{scaled}}(x,y)||{\widehat{\phi }}_{k,\mathrm{scaled}}(y)|dy\\&\qquad +\int |G_{\mathrm{scaled}}^{(1,0)}(x,y)||{\widehat{\phi }}_{k,\mathrm{scaled}}(y)-\phi _{k,\mathrm{scaled}}(y)|dy \\&\quad \le \left\{ \int |{\widehat{G}}_{\mathrm{scaled}}^{(1,0)}(x,y)- G_{\mathrm{scaled}}^{(1,0)}(x,y)|^2dy\right\} ^{1/2}\\&\qquad +\left\{ \int |G_{\mathrm{scaled}}^{(1,0)}(x,y)|^2dy\right\} ^{1/2}\left\{ \int |{\widehat{\phi }}_{k,\mathrm{scaled}}(y)-\phi _{k,\mathrm{scaled}}(y)|^2dy\right\} ^{1/2}. \end{aligned}$$

Without loss of generality, assume \(\lambda _{k,\mathrm{scaled}}>0\); then

$$\begin{aligned}&\hbox {sup}_x|({\widehat{\lambda }}_{k,\mathrm{scaled}}/\lambda _{k,\mathrm{scaled}}){\widehat{\phi }}'_{k,\mathrm{scaled}}(x)-\phi '_{k,\mathrm{scaled}}(x)|\\&\quad = O_p\left( h_g^2+\frac{1}{\sqrt{N}h_g^3}+h_\mu ^2+\frac{1}{\sqrt{N}h^2_\mu }\right) . \end{aligned}$$

Then (A.9) follows by applying (A.7). \(\square \)
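The identity \(\lambda _k\phi '_k(x)=\int G^{(1,0)}(x,y)\phi _k(y)dy\) underlying this proof can be verified numerically on a synthetic covariance. The rank-2 cosine eigenbasis and the eigenvalues below are assumptions chosen purely for illustration:

```python
import numpy as np

n = 2001
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
lam = np.array([1.0, 0.25])
# sqrt(2) * cos(k * pi * x) is orthonormal on [0, 1]
phi = np.stack([np.sqrt(2) * np.cos(k * np.pi * x) for k in (1, 2)])
dphi = np.stack([-np.sqrt(2) * k * np.pi * np.sin(k * np.pi * x) for k in (1, 2)])

# rank-2 covariance G(x, y) = sum_k lam_k phi_k(x) phi_k(y)
G = lam[0] * np.outer(phi[0], phi[0]) + lam[1] * np.outer(phi[1], phi[1])
dG = np.gradient(G, dx, axis=0)            # partial G / partial x

w = np.full(n, dx)                          # trapezoid quadrature weights
w[0] = w[-1] = dx / 2
# compare lam_k * phi_k'(x) with int dG/dx(x, y) phi_k(y) dy away from the edges,
# where np.gradient is only first-order accurate
errs = [np.max(np.abs(lam[k] * dphi[k] - (dG * phi[k]) @ w)[5:-5]) for k in range(2)]
```

The grid spacing and the interior restriction are numerical conveniences; the lemma itself works with the kernel-smoothed covariance estimate rather than an exact rank-2 surface.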

Derivative of the Estimated Density Functions:

Lemma A3

Under regularity conditions A1–A9, for any \(\epsilon >0\), there exists an event \(A_\epsilon \) with \(\hbox {pr}(A_\epsilon )\ge 1-\epsilon \) such that on \(A_\epsilon \) it holds that

$$\begin{aligned} |{\widehat{\zeta }}_{ik,\mathrm{scaled}}-\zeta _{ik,\mathrm{scaled}}|= & {} O_p\left( \alpha _N+\frac{1}{\sqrt{N}h_g^2}+\frac{1}{\sqrt{N}h_\mu }\right) , \end{aligned}$$
(A.10)
$$\begin{aligned} \hbox {sup}_x|{\widehat{f}}_{C_i,\mathrm{scaled}}(x)-f_{C_i,\mathrm{scaled}}(x)|= & {} O_p\left( \alpha _N+\frac{1}{\sqrt{N}h_g^2}+\frac{1}{\sqrt{N}h_\mu }\right) , \end{aligned}$$
(A.11)
$$\begin{aligned} \hbox {sup}_x|{\widehat{f}}'_{C_i,\mathrm{scaled}}(x)-f'_{C_i,\mathrm{scaled}}(x)|= & {} O_p\left( \alpha _N+h_g^2+\frac{1}{\sqrt{N}h_g^3}+h_\mu ^2+\frac{1}{\sqrt{N}h_\mu ^2}\right) .\nonumber \\ \end{aligned}$$
(A.12)

Proof

The existence of \(A_\epsilon \) for (A.10)–(A.11) is guaranteed by Theorem 3 in Wu et al. (2013). We follow their definition of \(A_\epsilon \), i.e., \(A_\epsilon ^c = \{K>K_\epsilon ^*\}\cup \{M_i=1,i=1,\cdots ,n+N\}\), and prove (A.12).

Note that

$$\begin{aligned} |{\widehat{f}}'_{C_i,\mathrm{scaled}}(x)-f'_{C_i,\mathrm{scaled}}(x)|\le & {} |{\widehat{f}}'_{C_i,\mathrm{scaled}}(x)-f^{'K}_{C_i,\mathrm{scaled}}(x)|\\&\quad +|f^{'K}_{C_i,\mathrm{scaled}}(x)-f'_{C_i,\mathrm{scaled}}(x)|. \end{aligned}$$

We have

$$\begin{aligned} \hbox {sup}_x E|f^{'K}_{C_i,\mathrm{scaled}}(x)-f'_{C_i,\mathrm{scaled}}(x)|^2= & {} \hbox {sup}_x E|\sum _{k=K+1}^\infty \zeta _{ik,\mathrm{scaled}}{\phi _{k,\mathrm{scaled}}}'(x)|^2\\= & {} \hbox {sup}_x\sum _{k=K+1}^\infty \lambda _{k,\mathrm{scaled}}|{\phi _{k,\mathrm{scaled}}}'(x)|^2\rightarrow 0, \end{aligned}$$

as \(K\rightarrow \infty \). Hence, \(|f^{'K}_{C_i,\mathrm{scaled}}(x)-f'_{C_i,\mathrm{scaled}}(x)|=o_p(1)\).

Furthermore, on \(A_\epsilon \)

$$\begin{aligned}&\hbox {sup}_x|{\widehat{f}}'_{C_i,\mathrm{scaled}}(x)-f^{'K}_{C_i,\mathrm{scaled}}(x)|\\&\quad \le \hbox {sup}_x|{\widehat{f}}'_{\mu ,\mathrm{scaled}}(x)-f'_{\mu ,\mathrm{scaled}}(x)|+\sum _{k=1}^K\hbox {sup}_x|{\widehat{\zeta }}_{ik,\mathrm{scaled}} {\widehat{\phi }}'_{k,\mathrm{scaled}}(x)-\zeta _{ik,\mathrm{scaled}}\phi '_{k,\mathrm{scaled}}(x)|\\&\quad \le \hbox {sup}_x|{\widehat{f}}'_{\mu ,\mathrm{scaled}}(x)-f'_{\mu ,\mathrm{scaled}}(x)|+\sum _{k=1}^K\hbox {sup}_x|{\widehat{\zeta }}_{ik,\mathrm{scaled}}-\zeta _{ik,\mathrm{scaled}}| |{\widehat{\phi }}'_{k,\mathrm{scaled}}(x)|\\&\qquad + \sum _{k=1}^K\hbox {sup}_x|\zeta _{ik,\mathrm{scaled}}||{\widehat{\phi }}'_{k,\mathrm{scaled}}(x)-\phi '_{k,\mathrm{scaled}}(x)|\\&\quad =O_p\left( h_\mu ^2+\frac{1}{\sqrt{N}h_\mu ^2}\right) + O_p\left( \alpha _N+\frac{1}{\sqrt{N}h_g^2}+\frac{1}{\sqrt{N}h_\mu }\right) \\&\qquad + O_p\left( h_g^2+\frac{1}{\sqrt{N}h_g^3}+h_\mu ^2+\frac{1}{\sqrt{N}h_\mu ^2}\right) \\&\quad =O_p\left( \alpha _N+h_g^2+\frac{1}{\sqrt{N}h_g^3}+h_\mu ^2+\frac{1}{\sqrt{N}h_\mu ^2}\right) . \end{aligned}$$

Therefore (A.12) holds. \(\square \)

Peaks and Change Points:

Assume \(f_{C_i,\mathrm{scaled}}\) is locally unimodal, i.e., \(f'_{C_i,\mathrm{scaled}}(x)=0\) has a unique solution, denoted by \(x_{i0}\), in a neighbourhood of \(x_{i0}\), denoted by \(\mathcal{B}(x_{i0}) = (x_{i0}-\varDelta x_{i0}, x_{i0} + \varDelta x_{i0})\). Further assume \(|f''_{C_i,\mathrm{scaled}}|\) is bounded away from 0 in \(\bigcup _{x_{i0}: f'_{C_i,\mathrm{scaled}}(x_{i0})=0}{{{\mathcal {B}}}}(x_{i0})\), and the bound holds uniformly across \(i=1,\cdots ,n+N\). Let \(\widehat{x}_{i0}\) be the solution of \({\widehat{f}}'_{C_i,\mathrm{scaled}}(x)=0\) that is closest to \(x_{i0}\). Then

$$\begin{aligned} 0= & {} {\widehat{f}}'_{C_i,\mathrm{scaled}}({\widehat{x}}_{i0})\\= & {} f'_{C_i,\mathrm{scaled}}({\widehat{x}}_{i0}) + O_p\left( \alpha _N+h_g^2+\frac{1}{\sqrt{N}h_g^3}+h_\mu ^2+\frac{1}{\sqrt{N}h_\mu ^2}\right) \\= & {} f''_{C_i,\mathrm{scaled}}({x_{i0}}^*)({\widehat{x}}_{i0}-x_{i0})+ O_p\left( \alpha _N+h_g^2+\frac{1}{\sqrt{N}h_g^3}+h_\mu ^2+\frac{1}{\sqrt{N}h_\mu ^2}\right) , \end{aligned}$$

where \({x_{i0}}^*\) is an intermediate value between \(x_{i0}\) and \({\widehat{x}}_{i0}\).

Thus, \(|\widehat{x}_{i0}-x_{i0}|=O_p\left( \alpha _N+h_g^2+\frac{1}{\sqrt{N}h_g^3}+h_\mu ^2+\frac{1}{\sqrt{N}h_\mu ^2}\right) \). This further implies that \({\widehat{x}}_{i0}\) is the only solution of \(\widehat{f}'_{C_i,\mathrm{scaled}}(x)=0\) in \({{{\mathcal {B}}}}(x_{i0})\). In other words, there is a one-to-one correspondence between the estimated peaks and the true peaks, and the estimated peaks converge to the true peaks uniformly.

The derivation for the change point is similar; here we only state the order of the absolute difference between the estimated change point \({\widehat{y}}_{i0}\) and the true change point \(y_{i0}\):

$$\begin{aligned} |\widehat{y}_{i0}-y_{i0}|=O_p\left( \alpha _N+h_g^2+\frac{1}{\sqrt{N}h_g^4}+h_\mu ^2+\frac{1}{\sqrt{N}h_\mu ^3}\right) . \end{aligned}$$
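The peak \(\widehat{x}_{i0}\) can be located in practice as the root of the estimated density derivative. The sketch below does this by bisection for a single pooled sample; the Gaussian kernel, the Beta-distributed scaled times with true mode 0.5, and the bracketing interval are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.beta(4.0, 4.0, size=20_000)       # symmetric density, true mode at 0.5
h = 0.08

def fprime(t):
    """Gaussian-kernel estimate of f'(t) from the pooled sample s."""
    x = (t - s) / h
    return np.mean(-x * np.exp(-x**2 / 2)) / (np.sqrt(2 * np.pi) * h**2)

# f' is positive left of the mode and negative right of it, so bisect for the root
lo, hi = 0.3, 0.7
for _ in range(60):
    mid = (lo + hi) / 2
    if fprime(mid) > 0:
        lo = mid
    else:
        hi = mid
x_hat = (lo + hi) / 2
```

The bisection converges to the unique sign change of the smooth estimate in the bracket, mirroring the uniqueness argument inside \({{{\mathcal {B}}}}(x_{i0})\) above.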

Remark A1

For peak and change point, the approximation error would decay faster than \(n^{-1/2}\) when the unlabeled data expand with \(\alpha _N \ll n^{-1/2}\) in follow-up duration and \(N \gg n^3\) in sample size. In that case, we may choose \((n/N)^{1/8} \ll h_g \ll n^{-1/4}\) and \((n/N)^{1/6} \ll h_{\mu } \ll n^{-1/4} \) so that Assumption (C5) is satisfied.

Appendix D B-spline approximation and profile-likelihood estimation

Some Definitions on Vector and Matrix Norms:

For any vector \(\mathbf{a}=(a_{1},\ldots ,a_{s})^\mathsf{\scriptscriptstyle T}\in R^s \), denote the norm \(\Vert \mathbf{a}\Vert _r=(|a_1|^r+\dots +|a_s|^r)^{1/r}\), \(1\le r\le \infty \). For positive numbers \(a_n\) and \(b_n\), \(n>1\), let \(a_n\asymp b_n\) denote that \(\lim _{n\rightarrow \infty }a_n/b_n=c\), where c is some nonzero constant. Denote the space of the \(q^{th}\) order smooth functions as \(\mathbf{C}^{(q)}([0,{{\mathcal {E}}}] )=\left\{ \phi : \phi ^{(q)}\in \mathbf{C}[0,{{\mathcal {E}}}] \right\} \). For any \(s\times s\) symmetric matrix \(\mathbf{A}\), denote its \(L_q\) norm as \(\Vert \mathbf{A}\Vert _q =\mathrm{max}_{\mathbf{v}\in R^s,\mathbf{v}\ne 0}\Vert \mathbf{A}\mathbf{v}\Vert _q\Vert \mathbf{v}\Vert _q^{-1}\). Let \(\Vert \mathbf{A}\Vert _\infty =\mathrm{max}_{1\le i\le s}\sum _{j=1}^s|a_{ij}|\). For a vector \(\mathbf{a}\), let \(\Vert \mathbf{a}\Vert _{\infty }=\mathrm{max}_{1\le i\le s}|a_i|\).
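The vector and matrix norms defined above correspond to standard numerical routines; a minimal check (the particular vector and matrix are arbitrary examples):

```python
import numpy as np

a = np.array([3.0, -4.0])
A = np.array([[1.0, -2.0],
              [3.0, 4.0]])

a_norm2 = np.linalg.norm(a, 2)            # (|3|^2 + |-4|^2)^{1/2} = 5
a_norminf = np.linalg.norm(a, np.inf)     # max_i |a_i| = 4
A_norminf = np.linalg.norm(A, np.inf)     # maximum absolute row sum = |3| + |4| = 7
row_sum = np.max(np.abs(A).sum(axis=1))   # direct computation of the same quantity
```

This makes concrete that \(\Vert \mathbf{A}\Vert _\infty \) is the maximum absolute row sum, i.e., the operator norm induced by \(\Vert \cdot \Vert _\infty \) on vectors.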

Some Definitions on Scores and Hessian Matrices:

Define

$$\begin{aligned} \mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }},{\varvec{\gamma }})= & {} \frac{\partial \hbox {log}\widetilde{H}_i({\varvec{\beta }},{\varvec{\gamma }})}{\partial {\varvec{\gamma }}} = \varDelta _i\mathbf{B}_r(X_i)- (1+\varDelta _i) \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\} \mathbf{B}_r(u) du }{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du},\\ \mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }},m)= & {} \varDelta _i\mathbf{B}_r(X_i)- (1+\varDelta _i) \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du }{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du},\\ \mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }},{\varvec{\gamma }})= & {} \frac{\partial \hbox {log}\widetilde{H}_i({\varvec{\beta }},{\varvec{\gamma }})}{\partial {\varvec{\beta }}} = \varDelta _i\mathbf{Z}_i- (1+\varDelta _i) \frac{\mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du}{ 1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du},\\ \mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }},m)= & {} \varDelta _i\mathbf{Z}_i- (1+\varDelta _i) \frac{\mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}{ 1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}. \end{aligned}$$

Further, define

$$\begin{aligned} \mathbf{S}_{{\varvec{\beta }}{\varvec{\beta }},i}({\varvec{\beta }},{\varvec{\gamma }})\equiv & {} \frac{\partial \mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }},{\varvec{\gamma }})}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}} =-(1+\varDelta _i) \frac{\mathbf{Z}_i^{\otimes 2}\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du]^2},\\ \mathbf{S}_{{\varvec{\beta }}{\varvec{\beta }},i}({\varvec{\beta }},m)\equiv & {} \frac{\partial \mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }},{\varvec{\gamma }})}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}} =-(1+\varDelta _i) \frac{\mathbf{Z}_i^{\otimes 2}\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2},\\ \mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}({\varvec{\beta }},{\varvec{\gamma }})\equiv & {} \frac{\partial \mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }},{\varvec{\gamma }})}{\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\\= & {} -(1+\varDelta _i) \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\} \mathbf{B}_r(u)^{\otimes 2} du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du}\\&+(1+\varDelta _i)\frac{\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}) [\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\} 
\mathbf{B}_r(u) du]^{\otimes 2}}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du]^2},\\ \mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}({\varvec{\beta }},m)\equiv & {} -(1+\varDelta _i) \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u)^{\otimes 2} du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}\\&+(1+\varDelta _i)\frac{\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}) [\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du]^{\otimes 2}}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2},\\ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }},{\varvec{\gamma }})\equiv & {} \frac{\partial \mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }},{\varvec{\gamma }})}{\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}} = \frac{-(1+\varDelta _i) \mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du]^2},\\ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }},m)\equiv & {} \frac{-(1+\varDelta _i)\mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}. \end{aligned}$$

Note that

$$\begin{aligned} \frac{\partial l_n({\varvec{\beta }},{\varvec{\gamma }})}{\partial {\varvec{\gamma }}}=\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }},{\varvec{\gamma }}), \ \ \ \frac{\partial l_n({\varvec{\beta }},{\varvec{\gamma }})}{\partial {\varvec{\beta }}}=\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }},{\varvec{\gamma }}). \end{aligned}$$

For \(u\in [0,{{\mathcal {E}}}] \), define

$$\begin{aligned} {\widehat{\sigma }}^2(u,{\varvec{\beta }}) =\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u) \{\mathbf{V}_n({\varvec{\beta }}_0)\}^{-1}\{ n^{-2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)^{\otimes 2}\} \{\mathbf{V}_n({\varvec{\beta }}_0)\}^{-1} \mathbf{B}_r(u), \qquad \end{aligned}$$
(A.13)

where

$$\begin{aligned} \mathbf{V}_n({\varvec{\beta }})=-E\{\mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}({\varvec{\beta }},m)\}. \end{aligned}$$

Approximation Error from \({\widehat{\mathbf{W}}}\):

We first assess the approximation error from using the estimated features \({\widehat{\mathbf{W}}}\) in \(l_n\). Once the identifiability of \(l_n\) is established in the proof of Lemma 1, the approximation of the losses translates to the approximation of their optimizers.

Lemma A4

Let

$$\begin{aligned} l_n^*({\varvec{\beta }},{\varvec{\gamma }})= & {} \sum _{i=1}^n\left[ \varDelta _i\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(X_i){\varvec{\gamma }}+\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}\}\right. \\&\left. - (1+ \varDelta _i)\hbox {log}\left\{ 1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}) \int _0^{X_i}\hbox {exp}\{{\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\mathbf{B}_r(t)\}dt\right\} \right] \end{aligned}$$

with \(\mathbf{Z}_i = (\mathbf{U}_i^\mathsf{\scriptscriptstyle T},\mathbf{W}_i{^{\scriptscriptstyle [1]}}^\mathsf{\scriptscriptstyle T},\ldots , \mathbf{W}_i{^{\scriptscriptstyle [q]}}^\mathsf{\scriptscriptstyle T})^\mathsf{\scriptscriptstyle T}\) be the loss with true features from the intensity functions. Let \(\varOmega \) be a sufficiently large compact neighborhood of

$$\begin{aligned} {\varvec{\theta }}_0 = ({\varvec{\beta }}_0^\mathsf{\scriptscriptstyle T}, {\varvec{\gamma }}_0^\mathsf{\scriptscriptstyle T})^\mathsf{\scriptscriptstyle T}= \text{ argmax}_{{\varvec{\theta }}} E\{n^{-1}l_n^*({\varvec{\beta }},{\varvec{\gamma }})\}. \end{aligned}$$

We have

$$\begin{aligned} \hbox {sup}_{{\varvec{\theta }}\in \varOmega }\frac{1}{n}\left| l_n^*({\varvec{\beta }},{\varvec{\gamma }}) - l_n({\varvec{\beta }},{\varvec{\gamma }})\right| \lesssim \hbox {sup}_{i=1,\dots ,n}\Vert {\widehat{\mathbf{W}}}_i - \mathbf{W}_i\Vert . \end{aligned}$$
(A.14)

Proof

By the mean value theorem, we may express the difference as

$$\begin{aligned}&\frac{1}{n}\left\{ l_n^*({\varvec{\beta }},{\varvec{\gamma }}) - l_n({\varvec{\beta }},{\varvec{\gamma }})\right\} \nonumber \\&\quad = \underbrace{\frac{1}{n}\sum _{i=1}^n \varDelta _i{\varvec{\beta }}_{-1}^\mathsf{\scriptscriptstyle T}\left( \mathbf{W}_i - {\widehat{\mathbf{W}}}_i\right) }_{T_1} \nonumber \\&\quad - \underbrace{\frac{1}{n}\sum _{i=1}^n (1+\varDelta _i) \frac{\hbox {exp}({\tilde{\mathbf{Z}}}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}) \int _0^{X_i}\hbox {exp}\{{\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\mathbf{B}_r(t)\}dt}{1+\hbox {exp}({\tilde{\mathbf{Z}}}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}) \int _0^{X_i}\hbox {exp}\{{\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\mathbf{B}_r(t)\}dt} {\varvec{\beta }}_{-1}^\mathsf{\scriptscriptstyle T}\left( \mathbf{W}_i - {\widehat{\mathbf{W}}}_i\right) }_{T_2} \end{aligned}$$
(A.15)

for \({\tilde{\mathbf{Z}}}_i\) between \({\widehat{\mathbf{Z}}}_i\) and \(\mathbf{Z}_i\). Since \(\varDelta _i\) is binary and \(\Vert {\varvec{\beta }}\Vert \) is bounded on the compact set \(\varOmega \), we have

$$\begin{aligned} |T_1| \le \Vert {\varvec{\beta }}\Vert \hbox {sup}_{i=1,\dots ,n}\Vert {\widehat{\mathbf{W}}}_i - \mathbf{W}_i\Vert \lesssim \hbox {sup}_{i=1,\dots ,n}\Vert {\widehat{\mathbf{W}}}_i - \mathbf{W}_i\Vert . \end{aligned}$$
(A.16)

For \(T_2\), we apply the bounds on \(\varDelta _i\) and \(\Vert {\varvec{\beta }}\Vert \) together with the fact that \(e^x/(1+e^x) \in [0,1]\):

$$\begin{aligned} |T_2| \le \Vert {\varvec{\beta }}\Vert \hbox {sup}_{i=1,\dots ,n}\Vert {\widehat{\mathbf{W}}}_i - \mathbf{W}_i\Vert \lesssim \hbox {sup}_{i=1,\dots ,n}\Vert {\widehat{\mathbf{W}}}_i - \mathbf{W}_i\Vert . \end{aligned}$$
(A.17)

Thus, we obtain (A.14) by applying (A.16) and (A.17) to (A.15). \(\square \)
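The losses \(l_n\) and \(l_n^*\) compared in Lemma A4 share one functional form; a direct numerical evaluation of that form is sketched below, with a generic log-baseline \(m\) standing in for \(\mathbf{B}_r^\mathsf{\scriptscriptstyle T}{\varvec{\gamma }}\) and made-up data (all values are illustrative, not the paper's fitted quantities):

```python
import numpy as np

def prop_odds_loglik(beta, m, X, Delta, Z, ngrid=400):
    """Evaluate sum_i [Delta_i {m(X_i) + Z_i' beta}
    - (1 + Delta_i) log(1 + exp(Z_i' beta) * int_0^{X_i} exp{m(u)} du)],
    with the inner integral computed by the trapezoid rule."""
    total = 0.0
    for Xi, Di, Zi in zip(X, Delta, Z):
        u = np.linspace(0.0, Xi, ngrid)
        y = np.exp(m(u))
        integral = np.sum((y[:-1] + y[1:]) / 2 * np.diff(u))
        eta = float(np.dot(Zi, beta))
        total += Di * (float(m(Xi)) + eta) - (1 + Di) * np.log1p(np.exp(eta) * integral)
    return total

# with m == 0 the inner integral is X_i, so a closed form is available to compare to
beta = np.array([0.5])
X = [2.0, 1.0]
Delta = [1.0, 0.0]
Z = [np.array([1.0]), np.array([-1.0])]
m0 = lambda u: 0.0 * u
val = prop_odds_loglik(beta, m0, X, Delta, Z)
expected = sum(d * (zi @ beta) - (1 + d) * np.log1p(np.exp(zi @ beta) * xi)
               for xi, d, zi in zip(X, Delta, Z))
```

Evaluating this loss at \({\widehat{\mathbf{W}}}_i\) versus \(\mathbf{W}_i\) amounts to swapping the entries of `Z`, which is exactly the perturbation bounded in (A.14).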

In the following theorems, we establish the consistency and asymptotic normality of our procedure.

Proof of Lemma 1

By Lemma A4, the loss with estimated features deviates from the loss with true features by at most \(\hbox {sup}_{i=1,\dots ,n}\Vert {\widehat{\mathbf{W}}}_i - \mathbf{W}_i\Vert \). Under Assumption (C5), this error decays faster than \(n^{-1/2}\). Thus, if either loss produces an estimator identifying the true parameter at the \(n^{-1/2}\) rate, the two losses produce asymptotically equivalent consistent estimators. We therefore focus on the loss with true features in the following.

For \(m\in C^q[0,{{\mathcal {E}}}] \), there exists \({\varvec{\gamma }}_0\in R^{P_n}\), such that

$$\begin{aligned} \hbox {sup}_{u\in [0,{{\mathcal {E}}}] }| m(u)-\widetilde{m} (u)| =O(h^q), \end{aligned}$$
(A.18)

where \(\widetilde{m}(u)=\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}_0\) (de Boor 2001). In the following, we prove the results for the nonparametric estimator \( \widehat{m}(u,{\varvec{\beta }})\) in Theorem 1 when \({\varvec{\beta }}={\varvec{\beta }}_0\). Then the results also hold when \({\varvec{\beta }}\) is a \(\sqrt{n}\)-consistent estimator of \({\varvec{\beta }}_0\), since the nonparametric convergence rate in Theorem 1 is slower than \(n^{-1/2}\). Define the distance between neighboring knots as \(h_p=\xi _{p+1}-\xi _p,r\le p\le R_n+r\), and \(h=\mathrm{max}_{r\le p\le R_n+r}h_p\). Let \(\rho _n=n^{-1/2}h^{-1}+h^{q-1/2}\). We will show that for any given \(\epsilon >0\), for n sufficiently large, there exists a large constant \(C>0\) such that

$$\begin{aligned} \hbox {pr}\{\hbox {sup}_{\Vert {\varvec{\tau }}\Vert _{2}=C}l_n({\varvec{\beta }}_0, {\varvec{\gamma }}_0+\rho _n{\varvec{\tau }})<l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)\}\ge 1-6\epsilon . \end{aligned}$$
(A.19)

This implies that for n sufficiently large, with probability at least \( 1-6\epsilon \), there exists a local maximum for (2) in the ball \(\{ {\varvec{\gamma }}_0+\rho _n{\varvec{\tau }}:\Vert {\varvec{\tau }}\Vert _2\le C\} \). Hence, there exists a local maximizer such that \(\Vert {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)-{\varvec{\gamma }}_0\Vert _2=O_p(\rho _n)\). Note that

$$\begin{aligned} \frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }})}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}} =\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}) \end{aligned}$$

and

$$\begin{aligned} \mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}({\varvec{\beta }},{\varvec{\gamma }})= & {} -(1+\varDelta _i) \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\} \mathbf{B}_r(u)^{\otimes 2} du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du]^2}\\&-(1+\varDelta _i) \frac{ \hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\} \mathbf{B}_r(u)^{\otimes 2} du \int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du]^2}\\&+(1+\varDelta _i)\frac{\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}) [\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\} \mathbf{B}_r(u) du]^{\otimes 2}}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du]^2}. \end{aligned}$$

The first term above is negative-definite, and the sum of the last two terms is negative semi-definite by the Cauchy–Schwarz inequality; hence \(\mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }})\) is negative-definite. Thus, \(l_n({\varvec{\beta }}_0,{\varvec{\gamma }})\) is a concave function of \({\varvec{\gamma }}\), so the local maximizer is the global maximizer of (2), which establishes the convergence of \({\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)\) to \({\varvec{\gamma }}_0\).

By Taylor expansion, we have

$$\begin{aligned} l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0+\rho _n{\varvec{\tau }})-l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0) =\frac{\partial l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}} \rho _n{\varvec{\tau }}- \left\{ -\frac{1}{2} \rho _n{\varvec{\tau }}^\mathsf{\scriptscriptstyle T}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\rho _n{\varvec{\tau }}\right\} , \nonumber \\ \end{aligned}$$
(A.20)

where \({\varvec{\gamma }}^{*}=\rho ({\varvec{\gamma }}_0+\rho _n{\varvec{\tau }})+(1-\rho ){\varvec{\gamma }}_0\) for some \( \rho \in (0,1)\). Moreover,

$$\begin{aligned}&\left| \frac{\partial l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}} \rho _n{\varvec{\tau }}\right| \le \rho _n\left\| \frac{\partial l_n({\varvec{\beta }}_0, {\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}}\right\| _2\left\| {\varvec{\tau }}\right\| _2 =C\rho _n\left\| \frac{\partial l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}} \right\| _2 \\&\quad =C\rho _n\Vert \mathbf{T}_{n1}+\mathbf{T}_{n2}\Vert _2, \end{aligned}$$

where

$$\begin{aligned} \mathbf{T}_{n1}= & {} \sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\\ \mathbf{T}_{n2}= & {} \sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0)-\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m). \end{aligned}$$

Recall that \(S_C(\cdot )\) and \(f_C(\cdot )\) are the survival and density functions of the censoring process, respectively. We have

$$\begin{aligned}&E\{\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\mid \mathbf{Z}_i\}\\&\quad =E\left[ \varDelta _i\mathbf{B}_r(X_i)- (1+\varDelta _i) \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du }{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] \\&\quad =\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{{\mathcal {E}}} \left[ \mathbf{B}_r(X_i)-\frac{2\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du }{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] \\&\qquad \times \frac{\hbox {exp}\{m(X_i)\}}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}S_C(X_i\mid \mathbf{Z}_i)dX_i\\&\qquad -\int _0^{{{\mathcal {E}}}-} \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du }{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}f_C(X_i\mid \mathbf{Z}_i)dX_i\\&\qquad - \frac{\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du }{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\}du]^2}S_C({{\mathcal {E}}}-\mid \mathbf{Z}_i)\\&\quad = \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\left[ \int _0^{{{\mathcal {E}}}-} \frac{\hbox {exp}\{m(X_i)\}\mathbf{B}_r(X_i)}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}S_C(X_i\mid \mathbf{Z}_i)dX_i\right. 
\\&\qquad -\int _0^{{{\mathcal {E}}}-}\frac{2 \hbox {exp}\{m(X_i)\} \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du }{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^3}S_C(X_i\mid \mathbf{Z}_i)dX_i \\&\qquad \left. -\int _0^{{{\mathcal {E}}}-} \frac{\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}f_C(X_i\mid \mathbf{Z}_i) dX_i\right] \\&\qquad - \frac{\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du }{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\}du]^2}S_C({{\mathcal {E}}}-\mid \mathbf{Z}_i)\\&\quad =\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\left[ \int _0^{{{\mathcal {E}}}-}S_C(X_i\mid \mathbf{Z}_i) \frac{\partial }{\partial X_i}\frac{\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2} dX_i\right. \\&\qquad \left. 
-\int _0^{{{\mathcal {E}}}-} \frac{\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}f_C(X_i\mid \mathbf{Z}_i) dX_i\right] \\&\qquad - \frac{\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du }{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\}du]^2}S_C({{\mathcal {E}}}-\mid \mathbf{Z}_i)\\&\quad =\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\left[ \int _0^{{{\mathcal {E}}}-} \frac{\partial }{\partial X_i}\frac{S_C(X_i\mid \mathbf{Z}_i)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}dX_i \right] \\&\qquad - \frac{\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du }{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\}du]^2}S_C({{\mathcal {E}}}-\mid \mathbf{Z}_i)\\&\quad =\mathbf{0}. \end{aligned}$$

In the following, all the integrals are calculated on \([0,{{\mathcal {E}}}]\), unless otherwise specified.
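The mean-zero property of the score just derived can be checked by simulation. The sketch below uses a constant log-baseline \(m(u)=0\), a scalar direction \(b(u)=u\) standing in for one B-spline coordinate, a fixed linear predictor, and administrative censoring at \({{\mathcal {E}}}=2\) only; all of these are illustrative simplifications of the setting in the proof:

```python
import numpy as np

rng = np.random.default_rng(2)
eta, cal_E = 0.3, 2.0                  # linear predictor Z'beta and follow-up horizon
n = 500_000

# proportional-odds survival S(t) = 1 / (1 + e^eta * t) since m = 0 gives Lambda(t) = t;
# invert F(t) = e^eta * t / (1 + e^eta * t) to sample event times
U = rng.uniform(size=n)
T = U / (np.exp(eta) * (1.0 - U))
X = np.minimum(T, cal_E)               # administrative censoring at cal_E
Delta = (T <= cal_E).astype(float)

Lam = X                                # int_0^X exp{m(u)} du
int_b = X**2 / 2                       # int_0^X exp{m(u)} b(u) du with b(u) = u
score = Delta * X - (1 + Delta) * np.exp(eta) * int_b / (1 + np.exp(eta) * Lam)
score_mean = score.mean()
```

Under the correctly specified model the score has expectation zero, so the Monte Carlo mean should vanish up to sampling noise of order \(n^{-1/2}\).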

Thus, \(E(\mathbf{T}_{n1})=\mathbf{0}\). Further

$$\begin{aligned}&E[\{\mathbf{e}_p^\mathsf{\scriptscriptstyle T}\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\}^2\vert \mathbf{Z}_i]\\&\quad =E\left( \left[ \varDelta _iB_{r,p}(X_i)- (1+\varDelta _i) \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} B_{r,p}(u) du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] ^2\Bigg \vert \mathbf{Z}_i\right) \\&\quad = \int \left[ B_{r,p}(X_i)-2 \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} B_{r,p}(u) du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] ^2 f_T(X_i\mid \mathbf{Z}_i) S_C(X_i\mid \mathbf{Z}_i) dX_i\\&\qquad +\int \left[ \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} B_{r,p}(u) du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] ^2 f_C(X_i\mid \mathbf{Z}_i) S_T(X_i\mid \mathbf{Z}_i) dX_i\\&\quad =\int \left[ B_{r,p}(X_i)-2 \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} B_{r,p}(u) du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] ^2 f_T(X_i\mid \mathbf{Z}_i) S_C(X_i\mid \mathbf{Z}_i) dX_i\\&\qquad +\int _0^{{{\mathcal {E}}}-}\left[ \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} B_{r,p}(u) du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] ^2 f_C(X_i\mid \mathbf{Z}_i) S_T(X_i\mid \mathbf{Z}_i) dX_i\\&\qquad +\left[ \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{{{\mathcal {E}}}}\hbox 
{exp}\{m(u)\} B_{r,p}(u) du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\}du}\right] ^2 S_C({{\mathcal {E}}}-\mid \mathbf{Z}_i) S_T({{\mathcal {E}}}\mid \mathbf{Z}_i)\\&\qquad \le C_1''\left( \int \left[ B_{r,p}(X_i)-2 \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} B_{r,p}(u) du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] ^2 dX_i\right. \\&\qquad \left. +\int _0^{{{\mathcal {E}}}-}\left[ \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} B_{r,p}(u) du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] ^2 dX_i\right. \\&\qquad \left. +\left[ \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\} B_{r,p}(u) du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\}du}\right] ^2\right) \\&\qquad \le C_1''\left( 2\int B_{r,p}(X_i)^2 dX_i+ 9\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int \left[ \int \hbox {exp}\{m(u)\} B_{r,p}(u) du\right] ^2 dX_i\right. \\&\qquad \left. +\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0) \left[ \int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\} B_{r,p}(u) du\right] ^2\right) \\&\qquad \le C_1''\left( 2\int B_{r,p}(X_i)^2 dX_i+ 9{{\mathcal {E}}}\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\left[ \int \hbox {exp}\{m(u)\} B_{r,p}(u) du\right] ^2\right. \\&\qquad \left. 
+\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0) \left[ \int _0^{{{\mathcal {E}}}}\hbox {exp}\{m(u)\} B_{r,p}(u) du\right] ^2\right) \\&\quad \le C_1''\left( 2\int B_{r,p}(X_i)^2 dX_i+ (9{{\mathcal {E}}}+1)\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\left[ \int \hbox {exp}\{2m(u)\}du\int B_{r,p}^2(u) du\right] \right) \\&\quad \le C_1'h, \end{aligned}$$

for some constant \(0<C_1'<\infty \) by Condition (C4). Thus, \(E(\Vert n^{-1}\mathbf{T}_{n1}\Vert _{2}^{2})\le P_nn^{-1}C_1'h\). By Condition (C3), we have \(h\asymp P_n^{-1}\), so \(E(\Vert n^{-1}\mathbf{T}_{n1}\Vert _{2}^{2})\le C_1n^{-1}\) for some constant \(0<C_1<\infty \). Hence, for any \(\epsilon >0\), Chebyshev’s inequality gives \(\hbox {pr}(\Vert n^{-1}\mathbf{T}_{n1}\Vert _{2}\ge \sqrt{n^{-1}C_1\epsilon ^{-1}})\le \epsilon \), or equivalently

$$\begin{aligned} \hbox {pr}(\Vert \mathbf{T}_{n1}\Vert _{2}\ge \sqrt{nC_1\epsilon ^{-1}})\le \epsilon . \end{aligned}$$
(A.21)
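The Chebyshev step leading to (A.21) only uses the second-moment bound \(E(\Vert \mathbf{T}_{n1}\Vert _2^2)\le nC_1\). As a minimal numerical sanity check (illustrative only, not part of the argument; the toy vector below is hypothetical and simply has \(E\Vert \mathbf{T}\Vert _2^2=nd\), so we take \(C_1=d\)):

```python
import math
import random

# Toy check of the Chebyshev step: if E||T||_2^2 <= n*C1, then
# pr(||T||_2 >= sqrt(n*C1/eps)) <= eps.  Here T is a hypothetical sum of
# n i.i.d. mean-zero vectors with d standard-normal components, so
# E||T||_2^2 = n*d exactly (take C1 = d).
random.seed(0)
n, d, eps = 200, 5, 0.1
C1 = float(d)
threshold = math.sqrt(n * C1 / eps)

exceed = 0
reps = 2000
for _ in range(reps):
    # Each component of T is a sum of n independent N(0, 1) draws.
    T = [sum(random.gauss(0.0, 1.0) for _ in range(n)) for _ in range(d)]
    if math.sqrt(sum(t * t for t in T)) >= threshold:
        exceed += 1

print(exceed / reps)  # empirical exceedance frequency, at most eps
```

The empirical frequency is far below \(\epsilon \) here, as expected: Chebyshev's bound is conservative.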

Moreover, by (A.18), we have \(\hbox {sup}_u|\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}_0 -m(u)|=O(h^q)\). Denote

$$\begin{aligned}&T_{ip}\\&\quad =\mathbf{e}_p^\mathsf{\scriptscriptstyle T}\{\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0)- \mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0, m)\}\\&\quad = (1+\varDelta _i)\left[ \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}B_{r,p}(u) du }{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right. \\&\quad -\left. \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\} B_{r,p}(u) du }{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du}\right] \\&\quad = \frac{(1+\varDelta _i) \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i} [\hbox {exp}\{m(u)\} -\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}] B_{r,p}(u) du }{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du] [1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du] }\\&\qquad +(1+\varDelta _i) \hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\\&\qquad \left[ \frac{ \int _0^{X_i}\hbox {exp}\{m(u)\}B_{r,p}(u) du \int _0^{X_i}[\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\} -\hbox {exp}\{m(u)\}]du }{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du] [1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du] }\right] \\&\quad +(1+\varDelta _i) 
\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\\&\quad \left[ \frac{ \int _0^{X_i}[\hbox {exp}\{m(u)\}-\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\} ]B_{r,p}(u) du \int _0^{X_i}\hbox {exp}\{m(u)\}du }{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du] [1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}du] }\right] , \end{aligned}$$

then

$$\begin{aligned} |T_{ip}|\le & {} 2 \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i} |\hbox {exp}\{m(u)\} -\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}| B_{r,p}(u) du\\&+2 \hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}) \int _0^{X_i}\hbox {exp}\{m(u)\}B_{r,p}(u) du \int _0^{X_i}|\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\} -\hbox {exp}\{m(u)\}|du\\&+2\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}) \int _0^{X_i}|\hbox {exp}\{m(u)\}-\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}\}|B_{r,p}(u) du \int _0^{X_i}\hbox {exp}\{m(u)\}du\\\le & {} C_2'h^{q+1} \end{aligned}$$

for a constant \(0<C_2'<\infty \) under Condition (C4). Therefore, \(E(\Vert \mathbf{T}_{n2}\Vert _2) \le \{P_n(C_2'h^{q+1}n)^2\}^{1/2} =P_n^{1/2}C_2'nh^{q+1}\le C_2nh^{q+1/2} \) for a constant \(0<C_2<\infty \), and \(E(\Vert \mathbf{T}_{n2}\Vert _{2}^2) \le P_n(C_2'h^{q+1}n)^2 \le (C_2nh^{q+1/2})^2\). Again by Chebyshev’s inequality, for \(0<\epsilon <1/4\), we have

$$\begin{aligned}&\hbox {pr}(\Vert \mathbf{T}_{n2}\Vert _{2}\ge \epsilon ^{-1/2}C_2nh^{q+1/2})\nonumber \\&\quad \le \hbox {pr}\{ |\Vert \mathbf{T}_{n2}\Vert _2-E(\Vert \mathbf{T}_{n2}\Vert _{2})|\ge \epsilon ^{-1/2}C_2nh^{q+1/2}/2\}\nonumber \\&\quad + \hbox {pr}\{E(\Vert \mathbf{T}_{n2}\Vert _2)\ge \epsilon ^{-1/2}C_2nh^{q+1/2}/2\}\nonumber \\&\quad \le \hbox {pr}(|\Vert \mathbf{T}_{n2}\Vert _2-E(\Vert \mathbf{T}_{n2}\Vert _2)| \ge \epsilon ^{-1/2}\{\hbox {var}(\Vert \mathbf{T}_{n2}\Vert _2)\}^{1/2} /2) \nonumber \\&\quad + \hbox {pr}( C_2nh^{q+1/2} \ge \epsilon ^{-1/2}C_2nh^{q+1/2}/2)\nonumber \\&\quad = \hbox {pr}(|\Vert \mathbf{T}_{n2}\Vert _2-E(\Vert \mathbf{T}_{n2}\Vert _2)| \ge \epsilon ^{-1/2}\{\hbox {var}(\Vert \mathbf{T}_{n2}\Vert _2)\}^{1/2} /2) \nonumber \\&\quad \le 4\epsilon . \end{aligned}$$
(A.22)

Combining (A.21) and (A.22), with probability at least \(1-5\epsilon \),

$$\begin{aligned} |\{\partial l_n({\varvec{\beta }}_0, {\varvec{\gamma }}_0)/\partial {\varvec{\gamma }}\}^\mathsf{\scriptscriptstyle T}\rho _n{\varvec{\tau }}|\le & {} C\rho _n (\Vert \mathbf{T}_{n1}\Vert _{2}+\Vert \mathbf{T}_{n2}\Vert _{2}) \nonumber \\\le & {} C\rho _n\left( \sqrt{C_1\epsilon ^{-1}}n^{1/2}+\epsilon ^{-1/2}C_2nh^{q+1/2}\right) . \qquad \end{aligned}$$
(A.23)

Moreover, Lemma A5 implies there exists a constant \(0<C_3<\infty \) such that

$$\begin{aligned} -\frac{1}{2}{\varvec{\tau }}^\mathsf{\scriptscriptstyle T}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}{\varvec{\tau }}\ge nC_3C^2h \end{aligned}$$

for \(n\) sufficiently large, with probability approaching 1. Thus, for any \(\epsilon >0\), with probability at least \(1-\epsilon \),

$$\begin{aligned} -2^{-1} (\rho _n{\varvec{\tau }})^\mathsf{\scriptscriptstyle T}\{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^{*})/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\}(\rho _n{\varvec{\tau }})\ge \rho _n^2C_3C^2nh. \end{aligned}$$
(A.24)

Therefore, by (A.20), (A.23) and (A.24), for n sufficiently large, with probability at least \(1-6\epsilon \),

$$\begin{aligned}&l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0+\rho _n{\varvec{\tau }})-l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0) \\&\quad \le C\rho _n\left( \sqrt{C_1\epsilon ^{-1}}n^{1/2}+\epsilon ^{-1/2}C_2nh^{q+1/2}\right) -\rho _n^{2}C_3C^2nh \\&\quad = C\rho _nh\left( \sqrt{C_{1}\epsilon ^{-1}}n^{1/2}h^{-1}+\epsilon ^{-1/2}C_2nh^{q-1/2}-CC_3n\rho _n\right) \\&\quad = C\rho _nh\left( \sqrt{C_{1}\epsilon ^{-1}}n^{1/2}h^{-1}+\epsilon ^{-1/2}C_2nh^{q-1/2}-CC_3n^{1/2}h^{-1}-CC_3nh^{q-1/2}\right) \\&\quad <0, \end{aligned}$$

when \(C>\mathrm{max}(C_3^{-1}\sqrt{C_1\epsilon ^{-1}},\epsilon ^{-1/2}C_3^{-1}C_2)\). This shows (A.19). Hence, we have \(\Vert {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)-{\varvec{\gamma }}_0\Vert _{2}=O_p(\rho _n)=O_p(n^{-1/2}h^{-1}+h^{q-1/2})=o_p(1)\) under Condition (C3).
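For reference, the vanishing of the rate \(\rho _n\) can be spelled out (a small filler step; we assume, as Condition (C3) is understood to ensure, that \(nh^2\rightarrow \infty \), \(h\rightarrow 0\), and \(q>1/2\)):

```latex
\rho_n \;=\; n^{-1/2}h^{-1} + h^{q-1/2} \;\longrightarrow\; 0,
\qquad\text{since } n^{-1/2}h^{-1} = (nh^2)^{-1/2} \to 0
\text{ and } h^{q-1/2} \to 0 \text{ for } q > 1/2.
```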

It is easily seen that \(E\{\Vert \mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\Vert _\infty ^d\}\le C_4^d h\) for a constant \(1<C_4<\infty \) and any \(d\ge 1\). By Bernstein’s inequality, under Condition (C3), we have

$$\begin{aligned} \Vert n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\Vert _\infty =O_p[h+\{h\hbox {log}(n)\}^{1/2}n^{-1/2}]=O_p(h). \end{aligned}$$
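The simplification of the rate in the last display is a one-line computation (a small filler step; it assumes, as Condition (C3) is understood to guarantee, that \(\log (n)/(nh)=O(1)\)):

```latex
\{h\log(n)\}^{1/2} n^{-1/2}
  \;=\; \left\{ \frac{h\log(n)}{n} \right\}^{1/2}
  \;=\; h \left\{ \frac{\log(n)}{n h} \right\}^{1/2}
  \;=\; O(h).
```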

Also, it is easy to check that

$$\begin{aligned} \Vert n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m) -n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0) \Vert _\infty =O_p(h^{q+1}). \end{aligned}$$

Thus, combining this with Lemmas A7 and A8, we have

$$\begin{aligned}&\left| \mathbf{B}_r(u)^\mathsf{\scriptscriptstyle T}\left[ \left\{ -n^{-1} \frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} ^{-1}\left\{ n^{-1}\frac{\partial l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}} \right\} - \mathbf{V}_n({\varvec{\beta }}_0)^{-1}n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\right] \right| \nonumber \\&\quad \le r\left\{ \Vert \mathbf{B}_r(u)\Vert _{\infty }\left\| \left\{ -n^{-1} \frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} ^{-1}\right\| _\infty \Vert n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0) \right. \nonumber \\&\qquad \left. -n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\Vert _\infty \nonumber \right. \\&\qquad \left. +\Vert \mathbf{B}_r(u)\Vert _\infty \left\| \left\{ -n^{-1} \frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} ^{-1} - \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\right\| _\infty \Vert n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m) \Vert _\infty \right\} \nonumber \\&\quad = O_p(h^{-1})O_p(h^{q+1})+O_p(h^{q-1}+n^{-1/2}h^{-1})O_p(h)\nonumber \\&\quad = O_p(h^{q}+n^{-1/2}), \end{aligned}$$
(A.25)

where the inequality above uses the fact that, for any \(u\), at most \(r\) elements of \(\mathbf{B}_r(u)\) are non-zero.
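The local-support fact invoked here, that at most \(r\) of the order-\(r\) B-spline basis functions are non-zero at any point, can be checked directly with a small Cox–de Boor evaluation. This is an illustrative sketch, not code from the paper; the clamped cubic knot vector is an arbitrary example:

```python
def bspline_basis(u, knots, order):
    """Evaluate all order-`order` B-spline basis functions at u
    via the Cox-de Boor recursion (0/0 conventionally treated as 0)."""
    # Order 1: piecewise-constant indicators of the knot intervals.
    B = [1.0 if knots[i] <= u < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    for k in range(2, order + 1):
        newB = []
        for i in range(len(knots) - k):
            left = right = 0.0
            if knots[i + k - 1] > knots[i]:
                left = (u - knots[i]) / (knots[i + k - 1] - knots[i]) * B[i]
            if knots[i + k] > knots[i + 1]:
                right = (knots[i + k] - u) / (knots[i + k] - knots[i + 1]) * B[i + 1]
            newB.append(left + right)
        B = newB
    return B

# Cubic splines (order r = 4) on [0, 1] with a clamped knot vector.
r = 4
knots = [0.0] * r + [0.25, 0.5, 0.75] + [1.0] * r
vals = bspline_basis(0.3, knots, r)   # 7 basis functions in total
nonzero = sum(v > 0 for v in vals)

print(nonzero)                         # at most r of them are non-zero at u = 0.3
print(round(sum(vals), 10))            # partition of unity: the values sum to 1.0
```

The same count holds at any interior point, which is exactly what makes the bound above lose only a factor of \(r\) rather than \(P_n\).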

Let \(\widehat{\mathbf{e}}=\mathbf{V}_n({\varvec{\beta }}_0)^{-1} n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\). Let \(\mathbf{Z}=(\mathbf{Z}_1^\mathsf{\scriptscriptstyle T},\dots , \mathbf{Z}_n^\mathsf{\scriptscriptstyle T})^\mathsf{\scriptscriptstyle T}\). By the central limit theorem,

$$\begin{aligned} \left[ \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)\text {var}\left( {\widehat{\mathbf{e}}}|\mathbf{Z}\right) \mathbf{B}_r(u)\right] ^{-1/2}\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\widehat{\mathbf{e}}}\rightarrow \hbox {Normal}(0,1), \end{aligned}$$

where \(\hbox {var}( {\widehat{\mathbf{e}}}|\mathbf{Z}) =\{\mathbf{V}_n({\varvec{\beta }}_0)\}^{-1}\{ n^{-2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)^{\otimes 2}\} \{\mathbf{V}_n({\varvec{\beta }}_0)\}^{-1} \) and \(\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)\hbox {var}( {\widehat{\mathbf{e}}} | \mathbf{Z})\mathbf{B}_r(u)={\widehat{\sigma }}^2(u,{\varvec{\beta }}_0)\). By Lemmas A7 and A9, \(c_5(nh)^{-1} \Vert \mathbf{B}_r(u)\Vert _2^2\le \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)\hbox {var}( {\widehat{\mathbf{e}}}|\mathbf{Z})\mathbf{B}_r(u) \le C_5(nh)^{-1} \Vert \mathbf{B}_r(u)\Vert _2^2 \) for some constants \(0<c_5\le C_5<\infty \). Hence there exist constants \(0<c_\sigma \le C_\sigma <\infty \) such that, with probability approaching 1 and for \(n\) large enough,

$$\begin{aligned} c_\sigma (nh)^{-1/2}\le \inf _{u\in [0,{{\mathcal {E}}}] }{\widehat{\sigma }}(u,{\varvec{\beta }}_0)\le \hbox {sup}_{u\in [0,{{\mathcal {E}}}] }{\widehat{\sigma }}(u,{\varvec{\beta }}_0)\le C_\sigma (nh)^{-1/2}. \end{aligned}$$
(A.26)

Thus \(\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\widehat{\mathbf{e}}}=O_p\left\{ (nh)^{-1/2}\right\} \) uniformly in \(u\in [0,{{\mathcal {E}}}] \), and hence

$$\begin{aligned} \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)\left\{ -\partial ^{2}l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0) /\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\right\} ^{-1}\{\partial l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)/\partial {\varvec{\gamma }}\}= & {} O_p\left\{ (nh)^{-1/2}+h^{q}+n^{-1/2}\right\} \\= & {} O_p(h^{q}+n^{-1/2}h^{-1/2}) \end{aligned}$$

uniformly in \(u\in [0,{{\mathcal {E}}}] \) as well.

By Taylor expansion,

$$\begin{aligned} \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)\{\widehat{{\varvec{\gamma }}}({\varvec{\beta }}_0)-{\varvec{\gamma }}_0\}= & {} \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)\left\{ -\partial ^{2}l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\right\} ^{-1}\{\partial l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)/\partial {\varvec{\gamma }}\}\{1+o_p(1)\}\nonumber \\= & {} \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)\left\{ -\partial ^{2}l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\right\} ^{-1}\{\partial l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)/\partial {\varvec{\gamma }}\}\nonumber \\&+o_p(h^q+n^{-1/2}h^{-1/2}). \end{aligned}$$
(A.27)

Thus by (A.25), (A.26), (A.27) and Condition (C3),

$$\begin{aligned}&\hbox {sup}_{u\in [0,{{\mathcal {E}}}] }|{\widehat{\sigma }}(u,{\varvec{\beta }}_0)^{-1}\left[ \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)\left\{ {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)-{\varvec{\gamma }}_0\right\} -\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\widehat{\mathbf{e}}}\right] |\\= & {} O_p\{(nh)^{1/2}\}\{O_p(h^{q}+n^{-1/2})+o_p(h^q+n^{-1/2}h^{-1/2})\}\\= & {} O_p(n^{1/2}h^{q+1/2}+h^{1/2})+o_p(1)\\= & {} o_p(1). \end{aligned}$$

Therefore, by Slutsky’s theorem, \({\widehat{\sigma }}^{-1}(u,{\varvec{\beta }}_0)\left\{ \widehat{m}(u,{\varvec{\beta }}_0)-\widetilde{m}(u)\right\} \rightarrow \hbox {Normal} (0,1)\) and \(\widehat{m}(u,{\varvec{\beta }}_0)-\widetilde{m}(u)=O_p\left\{ (nh)^{-1/2}\right\} \) uniformly in \(u\in [0,{{\mathcal {E}}}] \). Since \(\hbox {sup}_{u\in [0,{{\mathcal {E}}}] }|m(u)-\widetilde{m}(u)|=O(h^q)\), we have \(|\widehat{m}(u,{\varvec{\beta }}_0)-m(u)|=O_p\{(nh)^{-1/2}+h^q\}\) uniformly in \(u\in [0,{{\mathcal {E}}}]\). Again by Slutsky’s theorem and Condition (C3), we have

$$\begin{aligned} {\widehat{\sigma }}^{-1}(u,{\varvec{\beta }}_0)\left\{ \widehat{m}(u,{\varvec{\beta }}_0)-m(u)\right\} \rightarrow \hbox {Normal}(0,1). \end{aligned}$$

\(\square \)
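The standardized-limit claims above are instances of the classical pattern "centered estimator divided by its standard error tends to \(\hbox {Normal}(0,1)\)". A toy Monte Carlo sketch of that pattern (illustrative only; it uses an arbitrary exponential population, unrelated to the model in the paper):

```python
import math
import random

# A mean of m i.i.d. draws, centered at the true mean and scaled by the
# true standard error, is approximately Normal(0, 1); its replicates
# should therefore have variance close to 1.
random.seed(1)
m, reps = 400, 4000
zs = []
for _ in range(reps):
    xs = [random.expovariate(1.0) for _ in range(m)]  # mean 1, variance 1
    xbar = sum(xs) / m
    zs.append((xbar - 1.0) * math.sqrt(m))            # scaled by true sd = 1

var_z = sum(z * z for z in zs) / reps
print(round(var_z, 2))  # close to 1
```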

Proof of Lemma 2

Because \(\mathbf{S}_{{\varvec{\beta }}{\varvec{\beta }},i}({\varvec{\beta }},{\varvec{\gamma }})\) is negative definite and \(E\{\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,m)\}=\mathbf{0}\), a derivation similar to, but simpler than, that for Theorem 1 can be used to show the consistency of the maximizer \({\widehat{{\varvec{\beta }}}}\).

Because \(\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}\{{\varvec{\beta }},{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }})\}=\mathbf{0}\) at any \({\varvec{\beta }}\), we have

$$\begin{aligned} \mathbf{0}= & {} \sum _{i=1}^n\frac{\partial \mathbf{S}_{{\varvec{\gamma }},i}\{{\varvec{\beta }},{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }})\}}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}} +\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}\{{\varvec{\beta }},{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }})\} \frac{\partial {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }})}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}}\\= & {} \sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}\{{\varvec{\beta }},{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }})\} +\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}\{{\varvec{\beta }},{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }})\} \frac{\partial {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }})}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}}. \end{aligned}$$

so

$$\begin{aligned} \frac{\partial {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}}= & {} - [n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}\{{\varvec{\beta }}_0,{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)\}]^{-1} n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}\{{\varvec{\beta }}_0,{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)\}\nonumber \\= & {} \mathbf{V}_n({\varvec{\beta }}_0)^{-1} E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}({\varvec{\beta }}_0,m)\right\} +\mathbf{r}_1, \end{aligned}$$
(A.28)

where the residual term \(\mathbf{r}_1\) is of smaller order than \(\mathbf{V}_n({\varvec{\beta }}_0)^{-1} E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}({\varvec{\beta }}_0,m)\right\} \) componentwise. Note that \(\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }},{\varvec{\gamma }})=O_p(h)\) uniformly elementwise. Hence,

$$\begin{aligned} \Vert \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}({\varvec{\beta }}_0,m)\Vert _2 = \Vert \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\Vert _2 = O_p(h^{1/2}),\\ \Vert \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}({\varvec{\beta }}_0,m)\Vert _\infty = O_p(h),\\ \Vert \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\Vert _\infty = O_p(1). \end{aligned}$$

Subsequently, we have

$$\begin{aligned} \Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1} E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}({\varvec{\beta }}_0,m)\right\} \Vert _2\le & {} \Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _2 \Vert E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}({\varvec{\beta }}_0,m)\right\} \Vert _2\\= & {} O_p(h^{-1})O_p(h^{1/2}) = O_p(h^{-1/2}), \end{aligned}$$

and

$$\begin{aligned} \Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1} E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}({\varvec{\beta }}_0,m)\right\} \Vert _\infty\le & {} \Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _\infty \Vert E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}({\varvec{\beta }}_0,m)\right\} \Vert _\infty \\= & {} O_p(h^{-1})O_p(h) = O_p(1). \end{aligned}$$

Here we use the fact that \(\Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _2 = O_p(h^{-1})\) and \(\Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _\infty =O_p(h^{-1})\), where the former one is a direct corollary of Lemma A5 and the latter one is shown in Lemma A8. Therefore, \(\Vert \mathbf{r}_1\Vert _2 = o_p(h^{-1/2})\) and \(\Vert \mathbf{r}_1\Vert _\infty =o_p(1)\).
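The norm bounds in this step repeatedly use submultiplicativity of induced operator norms, \(\Vert AB\Vert \le \Vert A\Vert \,\Vert B\Vert \). For the \(\infty \)-norm (maximum absolute row sum) this is easy to verify numerically; a small pure-Python sketch with arbitrary random matrices:

```python
import random

def mat_mul(A, B):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def norm_inf(A):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

random.seed(2)
ok = True
for _ in range(100):
    A = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    B = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(4)]
    # Submultiplicativity: ||AB||_inf <= ||A||_inf * ||B||_inf.
    if norm_inf(mat_mul(A, B)) > norm_inf(A) * norm_inf(B) + 1e-12:
        ok = False

print(ok)
```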

By Taylor expansion, for \({\varvec{\beta }}^*=\rho {\varvec{\beta }}_0+(1-\rho ){\widehat{{\varvec{\beta }}}}\), \(0<\rho <1\),

$$\begin{aligned} \mathbf{0}= & {} n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}\{{\widehat{{\varvec{\beta }}}},{\widehat{{\varvec{\gamma }}}}({\widehat{{\varvec{\beta }}}})\}\nonumber \\= & {} n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}\{{\varvec{\beta }}_0,{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)\}+n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }}{\varvec{\beta }},i}\{{\varvec{\beta }}^*,{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}^*)\}n^{1/2}({\widehat{{\varvec{\beta }}}}-{\varvec{\beta }}_0)\nonumber \\&+n^{-1}\sum _{i=1}^n\left[ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}\{{\varvec{\beta }}^*,{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}^*)\} \frac{\partial {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}^*)}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}} \right] n^{1/2}({\widehat{{\varvec{\beta }}}}-{\varvec{\beta }}_0)\nonumber \\= & {} n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}\{{\varvec{\beta }}_0,{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)\}+\left[ E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\beta }},i}({\varvec{\beta }}_0,m)\right\} +o_p(1)\right] n^{1/2}({\widehat{{\varvec{\beta }}}}-{\varvec{\beta }}_0)\nonumber \\&+\left[ E\{\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\} \frac{\partial {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}}+\mathbf{r}_2 \right] n^{1/2}({\widehat{{\varvec{\beta }}}}-{\varvec{\beta }}_0), \end{aligned}$$
(A.29)

where the residual term \(\mathbf{r}_2\) is of smaller order than \(E\{\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\}{\partial {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)}/{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}}\) componentwise. We claim that \(\mathbf{r}_2\) satisfies \(\Vert \mathbf{r}_2\Vert _2=o_p(1)\) and \(\Vert \mathbf{r}_2\Vert _\infty =o_p(1)\). This is because

$$\begin{aligned} \left\| E\{\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\} \frac{\partial {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}}\right\| _2\le & {} \Vert E\{\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\}\Vert _2 \left\| \frac{\partial {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}}\right\| _2\\= & {} O_p(h^{1/2})O_p(h^{-1/2}) = O_p(1), \end{aligned}$$

and

$$\begin{aligned} \left\| E\{\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\} \frac{\partial {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}}\right\| _\infty\le & {} \Vert E\{\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\}\Vert _\infty \left\| \frac{\partial {\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)}{\partial {\varvec{\beta }}^\mathsf{\scriptscriptstyle T}}\right\| _\infty \\= & {} O_p(1)O_p(1) = O_p(1), \end{aligned}$$

which leads to the claimed order of the residual \(\mathbf{r}_2\) in (A.29).

We further use Taylor expansion to write

$$\begin{aligned}&n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}\{{\varvec{\beta }}_0,{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)\}\\&\quad =n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0) +n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}^*) \{{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)-{\varvec{\gamma }}_0\}\\&\quad =n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0)\\&\qquad +n^{-1/2}\sum _{i=1}^n\frac{-(1+\varDelta _i) \mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}^*\} \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)\{{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)-{\varvec{\gamma }}_0\}du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}^*\}du]^2}\\&\quad = n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0)\\&\qquad +n^{-1/2}\sum _{i=1}^n\frac{-(1+\varDelta _i) \mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)\{{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)-{\varvec{\gamma }}_0\}du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}\\&\qquad +n^{1/2}O_p(h^q)O_p(h^q+n^{-1/2}h^{-1/2})\\&\quad =n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0)\\&\qquad +\left( E\left[ \frac{-(1+\varDelta _i) \mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)du}{[1+\hbox 
{exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}\right] +\mathbf{r}\right) n^{1/2}\{{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)-{\varvec{\gamma }}_0\}\\&\qquad +o_p(1)\\&\quad = n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0) +E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\right\} n^{1/2}\{{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)-{\varvec{\gamma }}_0\} +o_p(1), \end{aligned}$$

where \({\varvec{\gamma }}^*=\rho {\varvec{\gamma }}_0+(1-\rho ){\widehat{{\varvec{\gamma }}}}\), \(0<\rho <1\), and the residual term \(\mathbf{r}\) in the second-to-last equality satisfies \(\Vert \mathbf{r}\Vert _\infty =O_p(n^{-1/2})\) and \(\Vert \mathbf{r}\Vert _2=O_p(n^{-1/2}h^{1/2})\).

Plugging this and (A.28) into (A.29), and recalling that

$$\begin{aligned} \mathbf{A}=E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\beta }},i}({\varvec{\beta }}_0,m)\right\} -E\{\mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\} [E\{\mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\}]^{-1} E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T}({\varvec{\beta }}_0,m)\right\} , \end{aligned}$$

we get

$$\begin{aligned}&n^{1/2}\{-\mathbf{A}+o_p(1)\}({\widehat{{\varvec{\beta }}}}-{\varvec{\beta }}_0)\\&\quad = n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0) +E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\right\} n^{1/2}\{{\widehat{{\varvec{\gamma }}}}({\varvec{\beta }}_0)-{\varvec{\gamma }}_0\} +o_p(1)\\&\quad = n^{-1/2}\sum _{i=1}^n\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0) +n^{-1/2}\sum _{i=1}^nE\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\right\} \mathbf{V}_n({\varvec{\beta }}_0)^{-1} \mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0) +o_p(1). \end{aligned}$$

It is straightforward to check that

$$\begin{aligned}&E\{\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,m)\}\\&\quad = E\left[ \varDelta _i\mathbf{Z}_i- (1+\varDelta _i) \frac{\mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}{ 1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] \\&\quad = \int \left[ \mathbf{Z}_i-2 \frac{\mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] \frac{\hbox {exp}[\{m(X_i)+\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}\}]S_C(X_i,\mathbf{Z}_i)}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}dX_i\\&\qquad -\int \left[ \frac{\mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] \frac{f_C(X_i,\mathbf{Z}_i)}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du} dX_i\\&\quad = \int \frac{\partial }{\partial X_i}\left[ \frac{\mathbf{Z}_i\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du S_C(X_i,\mathbf{Z}_i)}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}\right] dX_i\\&\quad =\mathbf{0}, \end{aligned}$$

and we already have \(E\{\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\}=\mathbf{0}\). Thus,

$$\begin{aligned}&E[\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0) +E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\right\} \mathbf{V}_n({\varvec{\beta }}_0)^{-1} \mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0)]\\&\quad = E\left[ \mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0)-\mathbf{S}_{{\varvec{\beta }},i}({\varvec{\beta }}_0,m) +E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\right\} \mathbf{V}_n({\varvec{\beta }}_0)^{-1} \right. \\&\left. \{\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0)-\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\}\right] \\&\quad =O(h^{q+1})+O_p(\Vert E\left\{ \mathbf{S}_{{\varvec{\beta }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\right\} \Vert _\infty \Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _\infty \Vert E\{\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0)\\&\qquad -\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)\}\Vert _\infty )\\&\quad =O(h^{q+1})+O_p(1)O_p(h^{-1})O_p(h^{q+1})\\&\quad =O(h^{q}). \end{aligned}$$

By the central limit theorem, \(n^{1/2}({\widehat{{\varvec{\beta }}}}-{\varvec{\beta }}_0)\rightarrow \hbox {Normal}\{\mathbf{0},\mathbf{A}^{-1}\varvec{\varSigma }(\mathbf{A}^{-1})^\mathsf{\scriptscriptstyle T}\}\), where \(\varvec{\varSigma }\) is given in Theorem 2. \(\square \)

Proof of Theorem 1

We prove the theorem in two steps. First we derive the asymptotic distribution of the solution \({\tilde{{\varvec{\theta }}}}\) by restricting \({\varvec{\theta }}\) to the oracle group selection set \({\mathcal {S}}\). Then we verify that \({\tilde{{\varvec{\theta }}}}\) satisfies the optimality condition of the original problem (6). Without loss of generality, we rearrange the covariates so that the nonzero groups come first, which gives the simpler notation \({\mathcal {S}} = \{1,\dots , card({\mathcal {S}})\}\). We denote the Hessian and its limit by

$$\begin{aligned} \hat{{\mathbf {H}}} = -n^{-1}\ell _n''({\widehat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE}), \, {\mathbf {H}} = E \left( \begin{array}{cc} \mathbf{S}_{{\varvec{\beta }},{\varvec{\beta }},i} &{} \mathbf{S}_{{\varvec{\beta }},{\varvec{\gamma }},i} \\ \mathbf{S}_{{\varvec{\gamma }},{\varvec{\beta }},i} &{} \mathbf{S}_{{\varvec{\gamma }},{\varvec{\gamma }},i} \end{array}\right) , \end{aligned}$$

and use the sub-matrix notation \(A_{{\mathcal {S}},\cdot }\) for selecting rows, \(A_{\cdot ,{\mathcal {S}}}\) for selecting columns, and \(A_{{\mathcal {S}},{\mathcal {S}}}\) for selecting rows and columns in \({\mathcal {S}}\cup \{p+1,\dots , p+P_n\}\). We denote the variance of the score by \( \mathbf{V}= E \left\{ (\mathbf{S}_{{\varvec{\beta }},i}^\mathsf{\scriptscriptstyle T}, \mathbf{S}_{{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T})^\mathsf{\scriptscriptstyle T}(\mathbf{S}_{{\varvec{\beta }},i}^\mathsf{\scriptscriptstyle T}, \mathbf{S}_{{\varvec{\gamma }},i}^\mathsf{\scriptscriptstyle T})\right\} . \)

Define the oracle selection subspace \(R^{{\mathcal {S}}} = \{{\varvec{\theta }}\in R^{p+P_n}: \theta _j = 0, \text { for } j\le p, j\notin {\mathcal {S}}\}\) and the estimator under oracle selection

$$\begin{aligned} {\tilde{{\varvec{\theta }}}} =\underset{{\varvec{\theta }}\in R^{{\mathcal {S}}}}{\text{ argmin }} ({\varvec{\theta }}- {\widehat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE})^\mathsf{\scriptscriptstyle T}{\hat{{\mathbf {H}}}}({\varvec{\theta }}- {\widehat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE})+\lambda \sum _g \frac{\left\| {\varvec{\beta }}^{\scriptscriptstyle [g]}\right\| _2}{\Vert {\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf MLE,[g]}\Vert _2}. \end{aligned}$$
(A.30)

Since \({\mathcal {S}}\) contains only groups with nonzero coefficients in \({\varvec{\beta }}_0\), and \({\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf MLE}\) is consistent for \({\varvec{\beta }}_0\) by Lemma 2, the denominators \(\Vert {\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf MLE,[g]}\Vert _2\) of the penalty terms in (A.30) are bounded away from zero. Choosing \(\lambda = o(n^{-1/2})\), the solution satisfies

$$\begin{aligned} {\tilde{{\varvec{\theta }}}}_{{\mathcal {S}}} = {\hat{{\mathbf {H}}}}_{{\mathcal {S}},{\mathcal {S}}}^{-1} {\hat{{\mathbf {H}}}}_{{\mathcal {S}},\cdot } {\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE}+ o_p\left( n^{-1/2}\right) . \end{aligned}$$
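As a numerical sanity check (outside the formal argument), the closed form above is the minimizer of the quadratic in (A.30) without the penalty over the subspace \(R^{{\mathcal {S}}}\): the gradient restricted to the free coordinates must vanish there. A minimal sketch with hypothetical, randomly generated inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 8                                   # total dimension (hypothetical)
S = [0, 1, 2, 5, 6, 7]                  # free coordinates: oracle set plus spline block

M = rng.standard_normal((p, p))
H = M @ M.T + p * np.eye(p)             # random positive-definite "Hessian"
theta_hat = rng.standard_normal(p)      # stand-in for the unrestricted MLE

# Closed form: theta_S = H_{S,S}^{-1} H_{S,.} theta_hat
theta_S = np.linalg.solve(H[np.ix_(S, S)], H[S, :] @ theta_hat)
theta = np.zeros(p)
theta[S] = theta_S

# The gradient of (theta - theta_hat)' H (theta - theta_hat), restricted to S,
# must vanish at the constrained minimizer
assert np.allclose(2 * H[S, :] @ (theta - theta_hat), 0, atol=1e-8)
```

Since \(\theta \) is supported on \({\mathcal {S}}\), the restricted stationarity condition \({\hat{{\mathbf {H}}}}_{{\mathcal {S}},{\mathcal {S}}}\theta _{{\mathcal {S}}} = {\hat{{\mathbf {H}}}}_{{\mathcal {S}},\cdot }{\hat{\theta }}\) reproduces the displayed formula exactly.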

Using the identity

$$\begin{aligned} {\hat{{\mathbf {H}}}}_{{\mathcal {S}},{\mathcal {S}}}^{-1} {\hat{{\mathbf {H}}}}_{{\mathcal {S}},\cdot }{\varvec{\theta }}_0 = {\hat{{\mathbf {H}}}}_{{\mathcal {S}},{\mathcal {S}}}^{-1} {\hat{{\mathbf {H}}}}_{{\mathcal {S}},{\mathcal {S}}}{\varvec{\theta }}_{\scriptscriptstyle 0, {\mathcal {S}}}= {\varvec{\theta }}_{\scriptscriptstyle 0, {\mathcal {S}}}, \end{aligned}$$

we may derive the estimation error of \({\tilde{{\varvec{\theta }}}}\) as

$$\begin{aligned} \sqrt{n}({\tilde{{\varvec{\theta }}}}_{\scriptscriptstyle {\mathcal {S}}}- {\varvec{\theta }}_{\scriptscriptstyle 0, {\mathcal {S}}}) =&\sqrt{n}{\hat{{\mathbf {H}}}}_{{\mathcal {S}},{\mathcal {S}}}^{-1} {\hat{{\mathbf {H}}}}_{{\mathcal {S}},\cdot } ({\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE}- {\varvec{\theta }}_0) + \underbrace{{\hat{{\mathbf {H}}}}_{{\mathcal {S}},{\mathcal {S}}}^{-1} {\hat{{\mathbf {H}}}}_{{\mathcal {S}},\cdot }{\varvec{\theta }}_0 - {\varvec{\theta }}_{\scriptscriptstyle 0, {\mathcal {S}}}}_{=0} + O_p\left( \sqrt{n}\lambda \right) \nonumber \\ =&\,\, {\hat{{\mathbf {H}}}}_{{\mathcal {S}},{\mathcal {S}}}^{-1} {\hat{{\mathbf {H}}}}_{{\mathcal {S}},\cdot } \sqrt{n}({\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE}- {\varvec{\theta }}_0) + o_p(1). \end{aligned}$$
(A.31)

In the proof of Lemma 2, we established, for \(h^q \ll n^{-1/2}\), the asymptotic normality of \({\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE}\) and the consistency of the Hessian:

$$\begin{aligned} \sqrt{n}({\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE}- {\varvec{\theta }}_0) \rightarrow \mathrm {Normal}\left( {\mathbf {0}}, {\mathbf {H}}^{-1} \mathbf{V}{\mathbf {H}}^{-1}\right) , \; \Vert {\widehat{{\mathbf {H}}}} - {\mathbf {H}}\Vert = O_p\left( n^{-1/2}\right) . \end{aligned}$$
(A.32)

Applying (A.32) to (A.31), we obtain

$$\begin{aligned} \sqrt{n}({\tilde{{\varvec{\theta }}}}_{\scriptscriptstyle {\mathcal {S}}}- {\varvec{\theta }}_{\scriptscriptstyle 0, {\mathcal {S}}}) \rightarrow&\mathrm {Normal}\left( {\mathbf {0}}, {\mathbf {H}}_{{\mathcal {S}},{\mathcal {S}}}^{-1} {\mathbf {H}}_{{\mathcal {S}},\cdot }{\mathbf {H}}^{-1} \mathbf{V}{\mathbf {H}}^{-1}{\mathbf {H}}_{\cdot ,{\mathcal {S}}} {\mathbf {H}}_{{\mathcal {S}},{\mathcal {S}}}^{-1} \right) \\ =&\mathrm {Normal}\left( {\mathbf {0}}, {\mathbf {H}}_{{\mathcal {S}},{\mathcal {S}}}^{-1} \mathbf{V}_{{\mathcal {S}},{\mathcal {S}}} {\mathbf {H}}_{{\mathcal {S}},{\mathcal {S}}}^{-1} \right) . \end{aligned}$$
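The simplification of the sandwich variance in the last display uses only the block identities \({\mathbf {H}}_{{\mathcal {S}},\cdot }{\mathbf {H}}^{-1} = \mathbf{I}_{{\mathcal {S}},\cdot }\) and \({\mathbf {H}}^{-1}{\mathbf {H}}_{\cdot ,{\mathcal {S}}} = \mathbf{I}_{\cdot ,{\mathcal {S}}}\), so the middle factor collapses to \(\mathbf{V}_{{\mathcal {S}},{\mathcal {S}}}\). A quick numerical confirmation with arbitrary (hypothetical) matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 7
S = [0, 2, 3, 6]                        # hypothetical selected coordinates

A = rng.standard_normal((p, p))
H = A @ A.T + p * np.eye(p)             # positive-definite Hessian limit
B = rng.standard_normal((p, p))
V = B @ B.T                             # symmetric score-variance matrix

Hinv = np.linalg.inv(H)
H_SS_inv = np.linalg.inv(H[np.ix_(S, S)])

# H_{S,.} H^{-1} = I_{S,.} and H^{-1} H_{.,S} = I_{.,S} collapse the sandwich
lhs = H_SS_inv @ H[S, :] @ Hinv @ V @ Hinv @ H[:, S] @ H_SS_inv
rhs = H_SS_inv @ V[np.ix_(S, S)] @ H_SS_inv
assert np.allclose(lhs, rhs)
```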

Profiling out \({\varvec{\gamma }}\) components as in Lemma 2, we have

$$\begin{aligned} \sqrt{n}({\tilde{{\varvec{\beta }}}}_{\scriptscriptstyle {\mathcal {S}}}- {\varvec{\beta }}_{\scriptscriptstyle 0, {\mathcal {S}}}) \rightarrow \mathrm {Normal}\left( {\mathbf {0}}, \mathbf{A}_{{\mathcal {S}}, {\mathcal {S}}}^{-1}\varvec{\varSigma }_{{\mathcal {S}}, {\mathcal {S}}}\mathbf{A}_{{\mathcal {S}}, {\mathcal {S}}}^{-1} \right) . \end{aligned}$$
(A.33)

The optimality condition for original problem (6) is

$$\begin{aligned} \text {if } \left\| {\varvec{\beta }}^{[g]}\right\| _2> 0,&\quad 2{\hat{{\mathbf {H}}}}_{[g],\cdot } {\varvec{\theta }}- 2{\hat{{\mathbf {H}}}}_{[g],\cdot }{\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE}+ \frac{\lambda {\varvec{\beta }}^{[g]}}{\left\| {\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf MLE,[g]}\right\| _2\left\| {\varvec{\beta }}^{[g]}\right\| _2}={\mathbf {0}}, \nonumber \\ \text {if } \left\| {\varvec{\beta }}^{[g]}\right\| _2 = 0,&\quad 2\left\| {\hat{{\mathbf {H}}}}_{[g],\cdot } {\varvec{\theta }}- {\hat{{\mathbf {H}}}}_{[g],\cdot }{\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE}\right\| _2 \le \frac{\lambda }{\left\| {\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf MLE,[g]}\right\| _2}, \nonumber \\ \text {for } j > p,&\quad 2{\hat{{\mathbf {H}}}}_{j,\cdot } {\varvec{\theta }}- 2{\hat{{\mathbf {H}}}}_{j,\cdot }{\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE}=0. \end{aligned}$$
(A.34)
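The conditions in (A.34) are the Karush-Kuhn-Tucker conditions of the adaptively weighted group-lasso problem. As an illustration outside the proof, take \({\hat{{\mathbf {H}}}}\) to be the identity (a simplifying assumption); the minimizer is then given by groupwise soft-thresholding, and both the stationarity equation for nonzero groups and the subgradient inequality for zero groups can be checked directly. All inputs below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
groups = {0: slice(0, 3), 1: slice(3, 5), 2: slice(5, 9)}  # hypothetical grouping
theta_hat = rng.standard_normal(9)      # stand-in for the unpenalized estimate
theta_hat[groups[1]] *= 0.05            # shrink group 1 so thresholding can zero it
lam = 0.4

# Adaptive weights lambda_g = lam / ||theta_hat_g||_2, as in the penalty of (A.30)
theta = np.zeros_like(theta_hat)
for idx in groups.values():
    z = theta_hat[idx]
    lam_g = lam / np.linalg.norm(z)
    theta[idx] = z * max(0.0, 1.0 - lam_g / (2 * np.linalg.norm(z)))

# Verify the optimality conditions of (A.34) with H-hat = identity
for idx in groups.values():
    z, t = theta_hat[idx], theta[idx]
    lam_g = lam / np.linalg.norm(z)
    if np.linalg.norm(t) > 0:           # nonzero group: stationarity equation
        assert np.allclose(2 * (t - z) + lam_g * t / np.linalg.norm(t), 0)
    else:                               # zero group: subgradient inequality
        assert 2 * np.linalg.norm(z) <= lam_g + 1e-12
```

The thresholding zeroes a group exactly when \(2\Vert z_g\Vert _2 \le \lambda _g\), which is why a small adaptive denominator (a group whose unpenalized estimate is near zero) forces selection consistency.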

The oracle selection estimator \({\tilde{{\varvec{\theta }}}}\) must satisfy the conditions in (A.34) for positions in \(R^{{\mathcal {S}}}\), by the same set of optimality conditions for (A.30). We only need to verify that \({\tilde{{\varvec{\theta }}}}\) also satisfies the conditions in (A.34) for \(j \in {\mathcal {S}}^c = \{1,\dots ,p\}\setminus {\mathcal {S}}\). By Lemma 2 and the definition of \({\mathcal {S}}\), we have

$$\begin{aligned} {\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE,[g]}= O_p\left( n^{-1/2}\right) , \; \text { for } g:\, {\varvec{\beta }}_{0,[g]} = {\mathbf {0}}. \end{aligned}$$
(A.35)

For \(\lambda \gg n^{-1}\), the penalty factor for each zero group g satisfies

$$\begin{aligned} \frac{\lambda }{\Vert {\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE,[g]}\Vert _2} \gg n^{-1/2}, \text { for } g:\, {\varvec{\beta }}_{0,[g]} = {\mathbf {0}}. \end{aligned}$$
(A.36)

Since \({\tilde{{\varvec{\theta }}}} \in R^{{\mathcal {S}}}\) by definition, the \({\mathcal {S}}^c\) components of \({\tilde{{\varvec{\theta }}}}\) are all zero,

$$\begin{aligned} {\tilde{{\varvec{\theta }}}}_{[g]} = {\mathbf {0}}, \text { for } g:\, {\varvec{\beta }}_{0,[g]} = {\mathbf {0}}. \end{aligned}$$
(A.37)

Combining (A.35)-(A.37), we establish that the optimality conditions in (A.34) hold asymptotically

$$\begin{aligned} 2\left\| {\hat{{\mathbf {H}}}}_{[g],\cdot } {\tilde{{\varvec{\theta }}}} - {\hat{{\mathbf {H}}}}_{[g],\cdot }{\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE}\right\| _2 \asymp n^{-1/2} \ll \frac{\lambda }{\Vert {\hat{{\varvec{\theta }}}}_{\scriptscriptstyle \mathsf MLE,[g]}\Vert _2} \end{aligned}$$

for \(g:\, {\varvec{\beta }}_{0,[g]} = {\mathbf {0}}\), i.e. all elements in \({\mathcal {S}}^c\). Therefore, we conclude that \({\hat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf glasso}= {\tilde{{\varvec{\beta }}}}\) with probability approaching 1. The asymptotic distribution of \({\tilde{{\varvec{\beta }}}}\) in (A.33) is thus the asymptotic distribution of \({\hat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf glasso}\). \(\square \)

Proof of Corollary 1

By the delta method, it is seen that

$$\begin{aligned}&{\hat{F}}(t|{\hat{\mathbf{Z}}}) - F(t|\mathbf{Z})\\&\quad \asymp F(t|\mathbf{Z})\{1-F(t|\mathbf{Z})\} \int _0^t e^{{\varvec{\beta }}_0^\mathsf{\scriptscriptstyle T}\mathbf{Z}+ m(u)} \\&\qquad \left\{ \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\widehat{{\varvec{\gamma }}}}_{\scriptscriptstyle \mathsf glasso}- m(u) + ({\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf glasso}- {\varvec{\beta }}_0)^\mathsf{\scriptscriptstyle T}\mathbf{Z}+ {\varvec{\beta }}_0^\mathsf{\scriptscriptstyle T}({\widehat{\mathbf{Z}}} - \mathbf{Z}) \right\} du \\&\quad \asymp F(t|\mathbf{Z})\{1-F(t|\mathbf{Z})\} \int _0^t e^{{\varvec{\beta }}_0^\mathsf{\scriptscriptstyle T}\mathbf{Z}+ m(u)} \left\{ \mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u)({\widehat{{\varvec{\gamma }}}}_{\scriptscriptstyle \mathsf glasso}-{\varvec{\gamma }}_0) + ({\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf glasso}- {\varvec{\beta }}_0)^\mathsf{\scriptscriptstyle T}\mathbf{Z}\right\} du \\&\qquad + O_p(h^q) + O_p\left( \Vert {\widehat{\mathbf{Z}}} - \mathbf{Z}\Vert \right) \end{aligned}$$

Applying the \(\sqrt{n}\) asymptotic normality of \({\widehat{{\varvec{\gamma }}}}_{\scriptscriptstyle \mathsf glasso}\) and \({\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf glasso}\) established in the proof of Theorem 1, along with Assumption (C5), we conclude that

$$\begin{aligned} {\hat{F}}(t|{\hat{\mathbf{Z}}}) - F(t|\mathbf{Z}) \asymp n^{-1/2} + h^q \end{aligned}$$

and \(\sqrt{n}\) asymptotically normal when \(h \ll n^{-1/(2q)}\). \(\square \)

Appendix E: Matrix norms

Lemma A5

There exist constants \(0<c<C<\infty \) such that, for n sufficiently large, with probability approaching 1,

$$\begin{aligned} ch<\left\| -n^{-1}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\| _2<Ch,\\ ch<\left\| -n^{-1}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\| _\infty<Ch,\\ ch<\left\| \mathbf{V}_n({\varvec{\beta }}_0)\right\| _2<Ch,\\ ch<\left\| \mathbf{V}_n({\varvec{\beta }}_0)\right\| _\infty <Ch, \end{aligned}$$

where \({\varvec{\gamma }}^*\) is an arbitrary vector in \(R^{P_n}\) with \(\Vert {\varvec{\gamma }}^*-{\varvec{\gamma }}_0\Vert _2 = o_p(1)\). Furthermore, for arbitrary \(\mathbf{a}\in R^{P_n}\),

$$\begin{aligned}&ch\Vert \mathbf{a}\Vert _2^2<\mathbf{a}^\mathsf{\scriptscriptstyle T}\left\{ -n^{-1}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} \mathbf{a}<Ch\Vert \mathbf{a}\Vert _2^2,\\&ch\Vert \mathbf{a}\Vert _2^2<\mathbf{a}^\mathsf{\scriptscriptstyle T}\mathbf{V}_n({\varvec{\beta }}_0)\mathbf{a}<Ch\Vert \mathbf{a}\Vert _2^2. \end{aligned}$$

Proof

We only prove the result for \(\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\). The proof for \(\mathbf{V}_n({\varvec{\beta }}_0)\) can be obtained similarly. We have

$$\begin{aligned}&\quad -n^{-1}\mathbf{a}^\mathsf{\scriptscriptstyle T}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\mathbf{a}\nonumber \\&\quad =n^{-1}\sum _{i=1}^n\mathbf{a}^\mathsf{\scriptscriptstyle T}\left( (1+\varDelta _i) \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}^*\} \mathbf{B}_r(u)^{\otimes 2} du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}^*\}du}\right. \nonumber \\&\quad \quad \left. -(1+\varDelta _i)\frac{\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0) [\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}^*\} \mathbf{B}_r(u) du]^{\otimes 2}}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}^*\}du]^2}\right) \mathbf{a}\nonumber \\&\quad \ge n^{-1}\sum _{i=1}^n\mathbf{a}^\mathsf{\scriptscriptstyle T}\left( (1+\varDelta _i) \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}^*\} \mathbf{B}_r(u)^{\otimes 2} du}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}^*\}du]^2}\right) \mathbf{a}\nonumber \\&\quad \ge c_1' n^{-1}\sum _{i=1}^n\mathbf{a}^\mathsf{\scriptscriptstyle T}\left\{ (1+\varDelta _i)\int _0^{X_i} \mathbf{B}_r(u)^{\otimes 2} du\right\} \mathbf{a}\nonumber \\&\quad \rightarrow c_1'E\mathbf{a}^\mathsf{\scriptscriptstyle T}\left\{ (1+\varDelta _i)\int _0^{X_i}\mathbf{B}_r(u)^{\otimes 2} du \right\} \mathbf{a}\nonumber \\&\quad \ge c_1'\mathbf{a}^\mathsf{\scriptscriptstyle T}\left\{ \int _0^{{{\mathcal {E}}}} \int _0^{x}\mathbf{B}_r(u)^{\otimes 2} du f_C(x)S_T(x)dx \right\} \mathbf{a}\nonumber \\&\quad = c_1'\mathbf{a}^\mathsf{\scriptscriptstyle T}\left\{ \int _0^{{{\mathcal {E}}}-} \int _0^{x}\mathbf{B}_r(u)^{\otimes 2} du f_C(x)S_T(x)dx +\int _0^{{{\mathcal {E}}}}\mathbf{B}_r(u)^{\otimes 2} du S_C({{\mathcal {E}}}-)S_T({{\mathcal {E}}})\right\} \mathbf{a}\nonumber \\&\quad \ge S_T({{\mathcal {E}}})S_C({{\mathcal {E}}}-)c_1'\mathbf{a}^\mathsf{\scriptscriptstyle T}\left\{ \int _0^{{{\mathcal {E}}}}\mathbf{B}_r(u)^{\otimes 2} du \right\} \mathbf{a}\nonumber \\&\quad \ge c_1h\Vert \mathbf{a}\Vert _2^2, \end{aligned}$$
(A.38)

for positive constants \(0<c_1', c_1<\infty \) by conditions (C1) and (C4).

Following a similar proof, we can further obtain

$$\begin{aligned}&\mathbf{a}^\mathsf{\scriptscriptstyle T}\left\{ -n^{-1}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} \mathbf{a}\nonumber \\&\quad \le n^{-1}\sum _{i=1}^n\mathbf{a}^\mathsf{\scriptscriptstyle T}\left[ (1+\varDelta _i) \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}^*\} \mathbf{B}_r(u)^{\otimes 2} du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}^*\}du}\right] \mathbf{a}\nonumber \\&\quad \le C_1' n^{-1}\sum _{i=1}^n\mathbf{a}^\mathsf{\scriptscriptstyle T}\int _0^{X_i}\mathbf{B}_r(u)^{\otimes 2} du\mathbf{a}\nonumber \\&\quad \le C_1' \mathbf{a}^\mathsf{\scriptscriptstyle T}\int _0^{{{\mathcal {E}}}}\mathbf{B}_r(u)^{\otimes 2} du\mathbf{a}\nonumber \\&\quad \le C_1h\Vert \mathbf{a}\Vert _2^2, \end{aligned}$$
(A.39)

for some constants \(0<C_1',C_1<\infty \), because \(\int _0^{{{\mathcal {E}}}}\mathbf{B}_r(u)^{\otimes 2} du\) is an r-banded matrix whose diagonal and \(j^{\mathrm{th}}\) off-diagonal elements, \(j=1,\cdots ,r-1\), are of order O(h) uniformly, with all other elements equal to 0.
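The banded structure and the O(h) scaling of this Gram matrix are easy to confirm numerically. A minimal sketch (hypothetical uniform knot sequence on [0, 1], cubic splines so r = 4) that also exhibits the eigenvalue scaling used in Lemma A5:

```python
import numpy as np
from scipy.interpolate import BSpline

r = 4                                     # spline order (cubic); hypothetical choice
m = 25                                    # interior intervals, knot spacing h = 1/m
h = 1.0 / m
knots = np.r_[np.zeros(r - 1), np.linspace(0.0, 1.0, m + 1), np.ones(r - 1)]
P = len(knots) - r                        # number of B-spline basis functions

x = np.linspace(0.0, 1.0, 20001)
B = np.array([BSpline(knots, np.eye(P)[j], r - 1)(x) for j in range(P)])

# Gram matrix G_{jk} = int_0^1 B_j(u) B_k(u) du, trapezoidal quadrature weights
w = np.full_like(x, x[1] - x[0])
w[[0, -1]] /= 2
G = (B * w) @ B.T

j, k = np.indices(G.shape)
assert np.abs(G[np.abs(j - k) >= r]).max() < 1e-12   # r-banded: zero beyond |j-k|=r-1
assert G.max() < 2 * h                                # nonzero entries are O(h)
evs = np.linalg.eigvalsh(G)
assert 0 < evs.min() and evs.max() < 1.01 * h         # eigenvalues scale with h
```

The eigenvalue check reflects the quadratic-form bounds \(ch\Vert \mathbf{a}\Vert _2^2<\mathbf{a}^\mathsf{\scriptscriptstyle T}G\mathbf{a}<Ch\Vert \mathbf{a}\Vert _2^2\): since the basis functions form a non-negative partition of unity, every row sum of G equals \(\int B_j \le h\), which caps the largest eigenvalue at h.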

Combining (A.38) and (A.39), we have

$$\begin{aligned} c_1h<\left\| -n^{-1}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\| _2<C_1h. \end{aligned}$$

Next, we investigate the order of \(\Vert -n^{-1}\{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^{*})/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\}\Vert _\infty \). We have

$$\begin{aligned}&\left\| -n^{-1}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^{*})}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\| _\infty \\&\quad = \mathrm{max}_{1\le j\le P_n}\sum _{k=1}^{P_n} |\left[ -n^{-1}\{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^{*})/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\}\right] _{jk}|\\&\quad \ge \sum _{k=1}^{P_n} \left| \left[ -n^{-1}\{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^{*})/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\}\right] _{1k}\right| \\&\quad \ge c_2'\sum _{k=1}^{P_n} \left| n^{-1}\sum _{i=1}^n\int _0^{X_i} B_{r,1}(u)B_{r,k}(u)du\right| \\&\quad = c_2'n^{-1}\sum _{i=1}^n\int _0^{X_i} \sum _{k=1}^{P_n} B_{r,1}(u)B_{r,k}(u)du\\&\quad = c_2'n^{-1}\sum _{i=1}^n\int _0^{X_i} \sum _{k=1}^{r} B_{r,1}(u)B_{r,k}(u)du\\&\quad \rightarrow c_2'E\int _0^{X_i} \sum _{k=1}^{r} B_{r,1}(u)B_{r,k}(u)du\\&\quad = c_2 h \end{aligned}$$

with probability 1 as \(n\rightarrow \infty \), where \(0<c_2,c_2'<\infty \) are constants. Here, for an arbitrary matrix \(\mathbf{A}\), we use \(\mathbf{A}_{jk}\) to denote its element in the \(j^{\mathrm{th}}\) row and the \(k^{\mathrm{th}}\) column. In the above inequalities, we use the fact that the B-spline basis functions are all non-negative and that each is non-zero on no more than r consecutive intervals formed by its knots.

On the other hand,

$$\begin{aligned}&\Vert -n^{-1}\{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^{*})/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\}\Vert _\infty \\&\quad = \mathrm{max}_{1\le j\le P_n}\sum _{k=1}^{P_n} |\left[ -n^{-1}\{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^{*})/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\}\right] _{jk}|\\&\quad \le C_2'\mathrm{max}_{1\le j\le P_n}\sum _{k=1}^{P_n} \left[ \left| \left\{ n^{-1}\sum _{i=1}^n\int _0^{X_i}\mathbf{B}_r(u)^{\otimes 2}du\right\} _{jk}\right| +\left| n^{-1}\sum _{i=1}^n\left\{ \int _0^{X_i}\mathbf{B}_{r}(u)du\right\} ^{\otimes 2}_{jk}\right| \right] \\&\quad \le C_2'n^{-1}\sum _{i=1}^n\mathrm{max}_{1\le j\le P_n}\sum _{k=1}^{P_n}\left[ \left| \left\{ \int _0^{X_i}\mathbf{B}_r(u)^{\otimes 2}du\right\} _{jk}\right| +\left| \left\{ \int _0^{X_i}\mathbf{B}_{r}(u)du\right\} ^{\otimes 2}_{jk}\right| \right] \\&\quad = C_2'n^{-1}\sum _{i=1}^n\mathrm{max}_{1\le j\le P_n}\sum _{k=1}^{P_n} \left[ \left\{ \int _0^{X_i}\mathbf{B}_{r,j}(u)\mathbf{B}_{r,k}(u)du\right\} + \int _0^{X_i}\mathbf{B}_{r,j}(u)du\int _0^{X_i}\mathbf{B}_{r,k}(u)du \right] \\&\quad \le C_2'n^{-1}\sum _{i=1}^n\left[ \mathrm{max}_{1\le j\le P_n}\sum _{k=\mathrm{max}(1,j-r+1)}^{\mathrm{min}(j+r-1,P_n)} \left\{ \int _0^{X_i}\mathbf{B}_{r,j}(u)\mathbf{B}_{r,k}(u)du\right\} \right. \\&\quad \left. + \mathrm{max}_{1\le j\le P_n} \sum _{k=1}^{P_n}\int _0^{X_i}\mathbf{B}_{r,j}(u)du\int _0^{X_i}\mathbf{B}_{r,k}(u)du\right] \\&\quad \le C_2'n^{-1}\sum _{i=1}^n\left[ \mathrm{max}_{1\le j\le P_n}\sum _{k=\mathrm{max}(1,j-r+1)}^{\mathrm{min}(j+r-1,P_n)} \left\{ \int _0^{{{\mathcal {E}}}}\mathbf{B}_{r,j}(u)\mathbf{B}_{r,k}(u)du\right\} \right. \\&\quad \left. + \mathrm{max}_{1\le j\le P_n} \sum _{k=1}^{P_n}\int _0^{{{\mathcal {E}}}}\mathbf{B}_{r,j}(u)du\int _0^{{{\mathcal {E}}}}\mathbf{B}_{r,k}(u)du\right] \\&\quad \le C_2h. \end{aligned}$$

Hence, \(c_2h\le \Vert -n^{-1}\{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^{*})/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\}\Vert _\infty \le C_2h\).

Therefore, Lemma A5 holds for \(c = \mathrm{min}(c_1,c_2)\) and \(C = \mathrm{max}(C_1,C_2)\). \(\square \)

Lemma A6

$$\begin{aligned} \left\| -n^{-1}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}-\mathbf{V}_n({\varvec{\beta }}_0)\right\| _2 = O_p(h^{q+1}+n^{-1/2}h),\\ \left\| -n^{-1}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}-\mathbf{V}_n({\varvec{\beta }}_0)\right\| _\infty = O_p(h^{q+1}+n^{-1/2}h). \end{aligned}$$

Proof

Similarly to the previous derivations,

$$\begin{aligned}&\left\| -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}-\mathbf{V}_n({\varvec{\beta }}_0)\right\| _\infty \nonumber \\&\quad =\Vert -n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }}{\varvec{\gamma }},i}({\varvec{\beta }}_0,{\varvec{\gamma }}_0) -\mathbf{V}_n({\varvec{\beta }}_0)\Vert _\infty \nonumber \\&\quad \le \left\| n^{-1}\sum _{i=1}^n(1+\varDelta _i)\left( \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}_0\} \mathbf{B}_r(u)^{\otimes 2} du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}_0\}du}\right. \right. \nonumber \\&\qquad - \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u)^{\otimes 2} du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\nonumber \\&\qquad -\frac{\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0) [\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}_0\} \mathbf{B}_r(u) du]^{\otimes 2}}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{\mathbf{B}_r^\mathsf{\scriptscriptstyle T}(u){\varvec{\gamma }}_0\}du]^2}\nonumber \\&\qquad \left. \left. 
+\frac{\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0) [\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du]^{\otimes 2}}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2} \right) \right\| _\infty \nonumber \\&\qquad +\left\| n^{-1}\sum _{i=1}^n(1+\varDelta _i)\left( \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u)^{\otimes 2} du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right. \right. \nonumber \\&\qquad \left. \left. -\frac{\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0) [\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du]^{\otimes 2}}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2} \right) \right. \nonumber \\&\qquad -E\left\{ (1+\varDelta _i)\left( \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u)^{\otimes 2} du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right. \right. \nonumber \\&\qquad \left. \left. \left. +\frac{\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0) [\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du]^{\otimes 2}}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2} \right) \right\} \right\| _\infty \nonumber \\&\quad = O_p(h^{q+1}+n^{-1/2}h), \end{aligned}$$
(A.40)

where the second term \(O_p(n^{-1/2}h)\) in the last equality is obtained using the Central Limit Theorem together with the fact that the matrices above are banded to the first order. Specifically, \(-n^{-1}\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}-\mathbf{V}_n({\varvec{\beta }}_0)\) has diagonal and \(j^{\mathrm{th}}\) off-diagonal elements of order \(O_p(h^{q+1}+n^{-1/2}h)\) for \(j=1,\cdots ,r-1\), and all other elements of order \(O_p(h^{q+2}+n^{-1/2}h^2)\). Further,

$$\begin{aligned} \left\| -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}- \mathbf{V}_n({\varvec{\beta }}_0)\right\| _2=O_p(h^{q+1}+n^{-1/2}h), \end{aligned}$$

again because the matrices are banded to the first order. In fact, for an arbitrary vector \(\mathbf{a}\in R^{P_n}\),

$$\begin{aligned}&\left| \mathbf{a}^\mathsf{\scriptscriptstyle T}\left\{ -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}- \mathbf{V}_n({\varvec{\beta }}_0)\right\} \mathbf{a}\right| \\&\quad \le \sum _{j,k}|a_j| \left| \left\{ -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}- \mathbf{V}_n({\varvec{\beta }}_0)\right\} _{jk}\right| |a_k| \\&\quad = \sum _{|j-k|\le 2r-1}|a_j| \left| \left\{ -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}- \mathbf{V}_n({\varvec{\beta }}_0)\right\} _{jk}\right| |a_k| \\&\qquad +\sum _{|j-k|>2r-1}|a_j| \left| \left\{ -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}- \mathbf{V}_n({\varvec{\beta }}_0)\right\} _{jk}\right| |a_k|\\&\quad \le C'(h^{q+1}+n^{-1/2}h)\sum _{|j-k|\le {2r-1}} |a_j| |a_k| +C_3''(h^{q+2}+n^{-1/2}h^2)\sum _{|j-k|> {2r-1}} |a_j| |a_k|\\&\quad \le C'(h^{q+1}+n^{-1/2}h)\sum _{|j-k|\le {2r-1}} (a_j^2+ a_k^2)/2 +C_3''(h^{q+2}+n^{-1/2}h^2)\sum _{|j-k|> {2r-1}} (a_j^2+ a_k^2)/2\\&\quad \le C'(h^{q+1}+n^{-1/2}h)(2r+hP_n) \Vert \mathbf{a}\Vert _2^2\\&\quad \le C(h^{q+1}+n^{-1/2}h)\Vert \mathbf{a}\Vert _2^2, \end{aligned}$$

where \(0<C',C_3'',C<\infty \) are constants. \(\square \)
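The key inequality in the display above is elementary: for a symmetric matrix that is banded with bandwidth w and has entries uniformly bounded by \(\varepsilon \), the quadratic form is at most \((2w+1)\varepsilon \Vert \mathbf{a}\Vert _2^2\) by the arithmetic-geometric mean bound \(|a_j||a_k|\le (a_j^2+a_k^2)/2\). A quick check on a hypothetical random banded matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
P, w, eps = 40, 3, 0.01                  # dimension, bandwidth, entry size (hypothetical)
j, k = np.indices((P, P))
M = rng.uniform(-eps, eps, (P, P)) * (np.abs(j - k) <= w)
M = (M + M.T) / 2                        # symmetrize, keeping the band structure

a = rng.standard_normal(P)
quad = abs(a @ M @ a)
# Banded + uniformly small entries => |a' M a| <= (2w+1) * eps * ||a||^2
assert quad <= (2 * w + 1) * eps * (a @ a)
```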

Lemma A7

There exist constants \(0<c<C<\infty \) such that, for n sufficiently large, with probability approaching 1,

$$\begin{aligned} ch^{-1/2}<\left\| \left\{ -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} ^{-1}\right\| _\infty<Ch^{-1},\\ ch^{-1/2}<\Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _\infty <Ch^{-1}. \end{aligned}$$

Proof

We have \(\mathbf{V}_n({\varvec{\beta }}_0) = h\mathbf{V}_0({\varvec{\beta }}_0)-h^2\mathbf{V}_1({\varvec{\beta }}_0)\), where

$$\begin{aligned} \mathbf{V}_0({\varvec{\beta }}_0)=h^{-1}E\left[ (1+\varDelta _i) \frac{ \hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u)^{\otimes 2} du}{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du} \right] \end{aligned}$$

is a banded matrix with each nonzero element of order O(1) uniformly and

$$\begin{aligned} \mathbf{V}_1({\varvec{\beta }}_0) = h^{-2}E\left\{ (1+\varDelta _i)\frac{\hbox {exp}(2\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0) [\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du]^{\otimes 2}}{[1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}_0)\int _0^{X_i}\hbox {exp}\{m(u)\}du]^2}\right\} \end{aligned}$$

is a matrix with all elements of order O(1) uniformly. It is easily seen that \(\mathbf{V}_0({\varvec{\beta }})\) is positive definite and \(\mathbf{V}_1({\varvec{\beta }})\) is positive semi-definite.

According to Demko (1977) and Theorem 4.3 in Chapter 13 of DeVore and Lorentz (1993), we have

$$\begin{aligned} \Vert \mathbf{V}_0({\varvec{\beta }}_0)^{-1}\Vert _\infty \le C', \end{aligned}$$

for some constant \(0<C'<\infty \). Furthermore, there exist constants \(0<C''<\infty \) and \(0<\lambda <1\) such that \(|\{\mathbf{V}_0({\varvec{\beta }})^{-1}\}_{jk}|\le C''\lambda ^{|j-k|}\) for \(j,k=1,\cdots ,P_n\).
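This Demko-type exponential off-diagonal decay of the inverse of a banded, well-conditioned matrix can be illustrated numerically. Below, a tridiagonal symmetric positive-definite Toeplitz matrix serves as a hypothetical stand-in for \(\mathbf{V}_0({\varvec{\beta }}_0)\); its true decay rate is \(2-\sqrt{3}\approx 0.27\), so the crude rate \(\lambda = 0.5\) bounds it comfortably:

```python
import numpy as np

n = 60
# Tridiagonal SPD Toeplitz matrix, a hypothetical stand-in for the banded V_0
A = 2.0 * np.eye(n) - 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
Ainv = np.linalg.inv(A)

# Maximum entry on each off-diagonal of the inverse decays geometrically
decay = np.array([np.abs(np.diag(Ainv, k)).max() for k in range(n)])
lam = 0.5                                 # crude decay rate; true rate is 2 - sqrt(3)
assert np.all(decay <= 1.01 * decay[0] * lam ** np.arange(n))
assert np.linalg.norm(Ainv, np.inf) < 2.0  # row sums of the inverse stay bounded
```

The bounded \(\Vert \mathbf{A}^{-1}\Vert _\infty \) here is the analogue of \(\Vert \mathbf{V}_0({\varvec{\beta }}_0)^{-1}\Vert _\infty \le C'\) above: summing the geometric decay along each row gives a bound independent of the dimension.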

We want to show that

$$\begin{aligned} \Vert \{\mathbf{I}-h\mathbf{V}_0({\varvec{\beta }}_0)^{-1}\mathbf{V}_1({\varvec{\beta }}_0)\}^{-1}\Vert _\infty \end{aligned}$$

is bounded. As a result,

$$\begin{aligned} \Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _\infty= & {} h^{-1}\Vert \{\mathbf{I}-h\mathbf{V}_0({\varvec{\beta }}_0)^{-1}\mathbf{V}_1({\varvec{\beta }}_0)\}^{-1}\mathbf{V}_0({\varvec{\beta }}_0)^{-1}\Vert _\infty \\\le & {} h^{-1}\Vert \{\mathbf{I}-h\mathbf{V}_0({\varvec{\beta }}_0)^{-1}\mathbf{V}_1({\varvec{\beta }}_0)\}^{-1}\Vert _\infty \Vert \mathbf{V}_0({\varvec{\beta }}_0)^{-1}\Vert _\infty \\= & {} O_p(h^{-1}). \end{aligned}$$

Denote \(\mathbf{W}= -\mathbf{V}_0({\varvec{\beta }}_0)^{-1}\mathbf{V}_1({\varvec{\beta }}_0)\). There exists a constant \(0<\kappa <\infty \) such that \(|\{\mathbf{V}_1({\varvec{\beta }}_0)\}_{jk}|<\kappa \) for \(j,k=1,\cdots ,P_n\). Hence,

$$\begin{aligned} |\mathbf{W}_{jk}|= & {} |\{\mathbf{V}_0({\varvec{\beta }}_0)^{-1}\mathbf{V}_1({\varvec{\beta }}_0)\}_{jk}| \\= & {} \bigg |\sum _{\ell =1}^{P_n} \{\mathbf{V}_0({\varvec{\beta }}_0)^{-1}\}_{j\ell }\{\mathbf{V}_1({\varvec{\beta }}_0)\}_{\ell k}\bigg |\\\le & {} \sum _{\ell =1}^{P_n} C'' \lambda ^{|j-\ell |}\kappa \\\le & {} 2C''\kappa (1-\lambda )^{-1}\le \kappa _1, \end{aligned}$$

where \(\kappa _1 = \mathrm{max}\{1,2C''\kappa (1-\lambda )^{-1}\}\ge 1\).

Let \(P_n h \le \kappa _2\), where \(1\le \kappa _2<\infty \) is a constant. A similar derivation as before shows that there exist constants \(0<{\widetilde{c}}<{\widetilde{C}}<\infty \) such that, for arbitrary \(\mathbf{a}\in R^{P_n}\), \({\widetilde{c}}\Vert \mathbf{a}\Vert _2^2<\mathbf{a}^\mathsf{\scriptscriptstyle T}\mathbf{V}_0({\varvec{\beta }}_0)\mathbf{a}<{\widetilde{C}}\Vert \mathbf{a}\Vert _2^2\) and \({\widetilde{c}}\Vert \mathbf{a}\Vert _2^2<\mathbf{a}^\mathsf{\scriptscriptstyle T}\{\mathbf{V}_0({\varvec{\beta }}_0)-h\mathbf{V}_1({\varvec{\beta }}_0)\}\mathbf{a}<\widetilde{C}\Vert \mathbf{a}\Vert _2^2\). Hence,

$$\begin{aligned} \Vert (\mathbf{I}+h\mathbf{W})^{-1}\Vert _2= & {} \Vert \{\mathbf{V}_0({\varvec{\beta }}_0)-h\mathbf{V}_1({\varvec{\beta }}_0)\}^{-1}\mathbf{V}_0({\varvec{\beta }}_0)\Vert _2\\\le & {} \Vert \{\mathbf{V}_0({\varvec{\beta }}_0)-h\mathbf{V}_1({\varvec{\beta }}_0)\}^{-1}\Vert _2\Vert \mathbf{V}_0({\varvec{\beta }}_0)\Vert _2\\\le & {} \kappa _3\equiv {\widetilde{C}}/{\widetilde{c}}. \end{aligned}$$

where \(1\le \kappa _3<\infty \).

In the following, we will use induction to show that

$$\begin{aligned} a_{P_n}\equiv & {} |\mathrm{det}(\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|\le (1+h\kappa _4)^{P_n-1},\\ b_{P_n}\equiv & {} |\mathrm{det}(\mathbf{J}_{P_n}+h\mathbf{W}_{P_n})|\le (\kappa _1+2\kappa _1^2\kappa _2\kappa _3)h (1+h\kappa _4)^{P_n-2}, \end{aligned}$$

where \(\mathbf{J}_{P_n} = (\mathbf{J}_{ij})_{1\le i,j\le P_n}\) with \(\mathbf{J}_{ij}=1\) if \(j-i=1\) and \(\mathbf{J}_{ij}=0\) otherwise. Here \(\kappa _4=4\kappa _1^2\kappa _2(1+\kappa _1\kappa _2\kappa _3)\).

When \(P_n=2\),

$$\begin{aligned} a_2= & {} |\mathrm{det}(\mathbf{I}_2+h\mathbf{W}_2)| \\= & {} |(1+h\mathbf{W}_{11})(1+h\mathbf{W}_{22})-h^2\mathbf{W}_{12}\mathbf{W}_{21}|\\\le & {} |(1+h\mathbf{W}_{11})(1+h\mathbf{W}_{22})|+|h^2\mathbf{W}_{12}\mathbf{W}_{21}|\\\le & {} (1+h\kappa _1)^2+h^2\kappa _1^2\\= & {} 1+2h\kappa _1+2h^2\kappa _1^2\\\le & {} 1+4h\kappa _1^2 \le 1+h\kappa _4. \end{aligned}$$

Similarly, we have

$$\begin{aligned} b_2= & {} |\mathrm{det}(\mathbf{J}_2+h\mathbf{W}_2)| \\\le & {} h^2\kappa _1^2+h(1+h\kappa _1)\kappa _1\\\le & {} (\kappa _1+2\kappa _1^2)h\\\le & {} (\kappa _1+2\kappa _1^2\kappa _2\kappa _3)h. \end{aligned}$$
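The two base-case bounds can be verified numerically (an illustrative check, assuming \(\kappa _1\ge 1\), \(h\le 1\), and entries of \(\mathbf{W}_2\) bounded by \(\kappa _1\) in absolute value):

```python
import numpy as np

rng = np.random.default_rng(1)
k1, h = 1.5, 0.01                        # kappa_1 >= 1 and a small mesh size h
J2 = np.array([[0.0, 1.0], [0.0, 0.0]])  # the shift matrix J_2
for _ in range(1000):
    W = rng.uniform(-k1, k1, size=(2, 2))
    a2 = abs(np.linalg.det(np.eye(2) + h * W))
    b2 = abs(np.linalg.det(J2 + h * W))
    assert a2 <= 1 + 4 * h * k1**2 + 1e-12       # a_2 <= 1 + 4 h kappa_1^2
    assert b2 <= h * (k1 + 2 * k1**2) + 1e-12    # b_2 <= (kappa_1 + 2 kappa_1^2) h
```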

Assume the result holds for \(2,\cdots ,P_n-1\). Then for \(P_n\), denoting \(\mathbf{W}_{P_n,-P_n} = (W_{P_n 1},\cdots ,W_{P_n (P_n-1)})^\mathsf{\scriptscriptstyle T}\) and \(\mathbf{W}_{-P_n,P_n} = (W_{1 P_n},\cdots ,W_{(P_n-1) P_n})^\mathsf{\scriptscriptstyle T}\), we have

$$\begin{aligned} a_{P_n}= & {} |\mathrm{det} (\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|\\= & {} |\mathrm{det} (\mathbf{I}_{P_n-1}+h\mathbf{W}_{P_n-1})| |1+hW_{P_n P_n}-h^2\mathbf{W}_{P_n,-P_n}^\mathsf{\scriptscriptstyle T}(\mathbf{I}_{P_n-1}+h\mathbf{W}_{P_n-1})^{-1} \mathbf{W}_{-P_n,P_n}|\\\le & {} a_{P_n-1} \{1+h\kappa _1+ h^2(P_n-1)\kappa _1^2\Vert (\mathbf{I}_{P_n-1}+h\mathbf{W}_{P_n-1})^{-1}\Vert _2\}\\\le & {} a_{P_n-1} \{1+h\kappa _1+ h \kappa _1^2\kappa _2\kappa _3\}\\\le & {} (1+h\kappa _4)^{P_n-1}, \end{aligned}$$

and

$$\begin{aligned} b_{P_n}= & {} |\mathrm{det} (\mathbf{J}_{P_n}+h\mathbf{W}_{P_n})|\\= & {} |\{hW_{P_n 1}-h^2 \mathbf{W}_{P_n,-1}^\mathsf{\scriptscriptstyle T}(\mathbf{I}_{P_n-1}+h\mathbf{W}_{P_n-1})^{-1}\mathbf{W}_{-P_n,1}\}\mathrm{det} (\mathbf{I}_{P_n-1}+h\mathbf{W}_{P_n-1})|\\\le & {} \{h\kappa _1 +h^2\kappa _1^2(P_n-1)\Vert (\mathbf{I}_{P_n-1}+h\mathbf{W}_{P_n-1})^{-1}\Vert _2\}a_{P_n-1}\\\le & {} h(\kappa _1 +\kappa _1^2\kappa _2\kappa _3) a_{P_n-1}\\\le & {} h(\kappa _1 +2\kappa _1^2\kappa _2\kappa _3) (1+h\kappa _4)^{P_n-2}, \end{aligned}$$

where \(\mathbf{W}_{P_n,-1}=(W_{P_n 2},\cdots ,W_{P_n P_n})^\mathsf{\scriptscriptstyle T}\) and \(\mathbf{W}_{-P_n,1}=(W_{1 1},\cdots ,W_{(P_n-1) 1})^\mathsf{\scriptscriptstyle T}\).

Therefore,

$$\begin{aligned}&\Vert (\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})^{-1}\Vert _\infty \\&\quad =\mathrm{max}_j\sum _{k}|\{(\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})^{-1}\}_{jk}|\\&\quad \le \frac{a_{P_n-1}+b_{P_n-1}(P_n-1)}{|\mathrm{det} (\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|}\\&\quad \le \frac{(1+h\kappa _4)^{P_n}+(\kappa _1 +2\kappa _1^2\kappa _2\kappa _3)\kappa _2 (1+h\kappa _4)^{P_n-2}}{|\mathrm{det} (\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|}\\&\quad \le \frac{2\kappa _4(1+h\kappa _4)^{\frac{\kappa _2\kappa _4}{h\kappa _4}}}{|\mathrm{det} (\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|}, \end{aligned}$$

where the numerator converges to \(2\kappa _4\hbox {exp}(\kappa _2\kappa _4)\) as \(h\rightarrow 0\), or equivalently, as \(P_n\rightarrow \infty \). In the first inequality above we use the fact that, up to sign, the \((j,k)^{\mathrm{th}}\) element of the matrix \((\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})^{-1}\) is the determinant of the matrix \(\mathbf{I}_{P_n}+h\mathbf{W}_{P_n}\) with its \(j^{\mathrm{th}}\) column and \(k^{\mathrm{th}}\) row removed, divided by the determinant of \(\mathbf{I}_{P_n}+h\mathbf{W}_{P_n}\) itself. Specifically, when \(j=k\), the absolute value of that \((j,k)^{\mathrm{th}}\) element is \(|\mathrm{det}( \mathbf{I}_{P_n-1}+h\mathbf{W}_{P_n-1})|/|\mathrm{det} (\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|=a_{P_n-1}/|\mathrm{det} (\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|\); when \(j\ne k\), with certain column operations, we obtain \(|\mathrm{det} (\mathbf{J}_{P_n-1}+h\mathbf{W}_{P_n-1})|/|\mathrm{det} (\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|=b_{P_n-1}/|\mathrm{det} (\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|\).
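The cofactor representation of the inverse used above is the standard adjugate identity \((A^{-1})_{jk}=(-1)^{j+k}\,\mathrm{det}(A_{-k,-j})/\mathrm{det}(A)\). A minimal numerical illustration, with a random matrix standing in for \(\mathbf{I}_{P_n}+h\mathbf{W}_{P_n}\):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 5
A = np.eye(p) + 0.1 * rng.standard_normal((p, p))  # plays I + hW
Ainv = np.linalg.inv(A)
detA = np.linalg.det(A)
for j in range(p):
    for k in range(p):
        # minor: delete row k and column j, per the adjugate convention
        minor = np.delete(np.delete(A, k, axis=0), j, axis=1)
        cof = (-1) ** (j + k) * np.linalg.det(minor)
        assert np.isclose(Ainv[j, k], cof / detA)
```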

Now it remains to show that there exists \(\kappa _5>0\), such that

$$\begin{aligned} a_{P_n}=|\mathrm{det}(\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|\ge \kappa _5, \end{aligned}$$

for \(P_n\) sufficiently large. This can be seen from

$$\begin{aligned} a_{P_n}= & {} |\mathrm{det}(\mathbf{I}_{P_n}+h\mathbf{W}_{P_n})|\\= & {} |\mathrm{det} (\mathbf{I}_{P_n-1}+h\mathbf{W}_{P_n-1})| |1+hW_{P_n P_n}\\&-h^2\mathbf{W}_{P_n,-P_n}^\mathsf{\scriptscriptstyle T}(\mathbf{I}_{P_n-1}+h\mathbf{W}_{P_n-1})^{-1} \mathbf{W}_{-P_n,P_n}|\\\ge & {} \{1-h\kappa _1-h^2(P_n-1)\kappa _1^2\kappa _3\}a_{P_n-1}\\\ge & {} (1-h\kappa _1-h\kappa _1^2\kappa _2\kappa _3)a_{P_n-1}\\\ge & {} (1-h\kappa _4)a_{P_n-1}\\\ge & {} (1-h\kappa _4)^{P_n-3}a_2\\\ge & {} (1-h\kappa _4)^{P_n-2}\\\ge & {} (1-h\kappa _4)^{\frac{\kappa _2\kappa _4}{h\kappa _4}}\rightarrow \hbox {exp}(-\kappa _2\kappa _4). \end{aligned}$$

Thus the result holds for \(\kappa _5=\hbox {exp}(-\kappa _2\kappa _4)\).
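The limit used for \(\kappa _5\) is the elementary fact \((1-h\kappa _4)^{\kappa _2/h}\rightarrow \hbox {exp}(-\kappa _2\kappa _4)\) as \(h\rightarrow 0^+\). A quick numerical check with arbitrary stand-in values for \(\kappa _2\) and \(\kappa _4\):

```python
import math

k2, k4 = 1.5, 2.0                 # arbitrary stand-ins for kappa_2, kappa_4
limit = math.exp(-k2 * k4)
errs = []
for h in (1e-2, 1e-4, 1e-6):
    errs.append(abs((1 - h * k4) ** (k2 / h) - limit))
assert errs[0] > errs[1] > errs[2]   # error shrinks monotonically as h -> 0
assert errs[-1] < 1e-4
```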

Therefore, we have \(\Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _\infty \le Ch^{-1}\) for some constant \(0<C<\infty \).

On the other hand, we have

$$\begin{aligned} \Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _\infty \ge P_n^{-1/2} \Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _2\ge ch^{-1/2}, \end{aligned}$$

for some constant \(0<c<\infty \) (Horn and Johnson 1990; Golub and Van Loan 1996).
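The matrix-norm inequality invoked here, \(\Vert A\Vert _\infty \ge P_n^{-1/2}\Vert A\Vert _2\), is a standard norm-equivalence bound; a small numerical illustration with a random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 40
A = rng.standard_normal((p, p))
inf_norm = np.linalg.norm(A, np.inf)   # maximum absolute row sum
two_norm = np.linalg.norm(A, 2)        # largest singular value
# ||A||_2 <= sqrt(p) ||A||_inf, i.e. ||A||_inf >= p^{-1/2} ||A||_2
assert inf_norm >= two_norm / np.sqrt(p) - 1e-12
```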

The proof for \(\left\{ -n^{-1}\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)/\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\right\} ^{-1}\) is similar, and hence is omitted. \(\square \)

Lemma A8

$$\begin{aligned} \left\| \left\{ -n^{-1}\frac{\partial ^2l_n({\varvec{\beta }}_0,{\varvec{\gamma }}^*)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} ^{-1}-\mathbf{V}_n({\varvec{\beta }}_0)^{-1}\right\| _\infty =O_p(h^{q-1}+n^{-1/2}h^{-1}). \end{aligned}$$

Proof of Lemma A8

According to Lemmas A6 and A7, we have

$$\begin{aligned}&\left\| \left\{ -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} ^{-1}- \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\right\| _\infty \\&\quad =\left\| \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\left[ \mathbf{V}_n({\varvec{\beta }}_0)-\left\{ -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} \right] \left\{ -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} ^{-1}\right\| _{\infty }\\&\quad \le \left\| \left\{ -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}\right\} ^{-1}\right\| _{\infty } \Vert \mathbf{V}_n({\varvec{\beta }}_0)^{-1}\Vert _{\infty } \left\| -n^{-1}\frac{\partial ^2 l_n({\varvec{\beta }}_0,{\varvec{\gamma }}_0)}{\partial {\varvec{\gamma }}\partial {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}}- \mathbf{V}_n({\varvec{\beta }}_0)\right\| _{\infty }\\&\quad =O_p(h^{-2})O_p(h^{q+1}+n^{-1/2}h)\\&\quad =O_p(h^{q-1}+n^{-1/2}h^{-1}). \end{aligned}$$

\(\square \)

Lemma A9

There exist constants \(0<c<C<\infty \) such that for \(n\) sufficiently large, with probability approaching 1, for arbitrary \(\mathbf{a}\in R^{P_n}\),

$$\begin{aligned} ch\Vert \mathbf{a}\Vert _2^2<\mathbf{a}^\mathsf{\scriptscriptstyle T}\left\{ n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)^{\otimes 2}\right\} \mathbf{a}<Ch\Vert \mathbf{a}\Vert _2^2. \end{aligned}$$

Proof

We have

$$\begin{aligned}&n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)^{\otimes 2}\\= & {} n^{-1}\sum _{i=1}^n\left[ \varDelta _i\mathbf{B}_r(X_i)- (1+\varDelta _i) \frac{\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\} \mathbf{B}_r(u) du }{1+\hbox {exp}(\mathbf{Z}_i^\mathsf{\scriptscriptstyle T}{\varvec{\beta }})\int _0^{X_i}\hbox {exp}\{m(u)\}du}\right] ^{\otimes 2}\\\le & {} n^{-1}\sum _{i=1}^nC'\left[ \mathbf{B}_r(X_i)^{\otimes 2} +\left\{ \int \mathbf{B}_r(u) du\right\} ^{\otimes 2}\right] , \end{aligned}$$

for some constant \(0<C'<\infty \). A similar derivation leads to

$$\begin{aligned} ch\Vert \mathbf{a}\Vert _2^2\le \mathbf{a}^\mathsf{\scriptscriptstyle T}\{n^{-1}\sum _{i=1}^n\mathbf{S}_{{\varvec{\gamma }},i}({\varvec{\beta }}_0,m)^{\otimes 2}\}\mathbf{a}\le Ch\Vert \mathbf{a}\Vert _2^2, \end{aligned}$$

for some constants \(0<c<C<\infty \). \(\square \)
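The order-\(h\) eigenvalue scaling in Lemma A9 reflects a general property of B-spline Gram matrices. As a minimal illustration (using interior linear hat functions on a uniform grid of spacing \(h\) as a simple stand-in for \(\mathbf{B}_r\)), the exact Gram matrix is \((h/6)\,\mathrm{tridiag}(1,4,1)\), whose eigenvalues all lie in \([h/3,\,h]\):

```python
import numpy as np

# Gram matrix of interior linear B-splines (hat functions), knot spacing h:
# integral B_p^2 = 2h/3 and integral B_p B_{p+1} = h/6, so G = (h/6) tridiag(1,4,1).
P = 50
h = 1.0 / (P + 1)
G = (h / 6) * (np.diag(4 * np.ones(P))
               + np.diag(np.ones(P - 1), 1)
               + np.diag(np.ones(P - 1), -1))
eigs = np.linalg.eigvalsh(G)
# eigenvalues of tridiag(1,4,1) are 4 + 2cos(k pi/(P+1)), hence in (2, 6)
assert eigs.min() >= h / 3 - 1e-12 and eigs.max() <= h + 1e-12
```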

Appendix F Algorithm for optimization

Here we provide the detailed algorithm for optimization of \(l_n\).

  1.

    Obtain the initial estimator \(\widehat{{\varvec{\beta }}}_{\scriptscriptstyle \mathsf init}\) from the U-statistic estimating equation (Cheng et al. 1995),

    $$\begin{aligned} \sum _{i=1}^n \sum _{j=1}^n ({\widehat{\mathbf{Z}}}_i - {\widehat{\mathbf{Z}}}_j)\left[ \frac{\varDelta _j I(X_i \ge X_j)}{{\hat{G}}^2(X_j)} - \xi \left( ({\widehat{\mathbf{Z}}}_i - \widehat{\mathbf{Z}}_j)^\mathsf{\scriptscriptstyle T}{\varvec{\beta }}\right) \right] = 0, \end{aligned}$$

    where \({\hat{G}}\) is the Kaplan-Meier or empirical distribution estimator of the censoring time distribution, and \(\xi (s) = \{e^s(s-1)+1\}/(e^s-1)^2\). We solve this equation by the classical Newton's method. Then, we calculate the initial estimator of the baseline function, \({\widehat{\alpha }}_{\scriptscriptstyle \mathsf init}\), by solving

    $$\begin{aligned} \sum _{i=1}^n \left[ \frac{\varDelta _i I(X_i \le t)}{{\hat{G}}(X_i)} - \frac{\hbox {exp}\{{\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf init}^\mathsf{\scriptscriptstyle T}\mathbf{Z}_i\}\alpha (t)}{1+\hbox {exp}\{{\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf init}^\mathsf{\scriptscriptstyle T}\mathbf{Z}_i\}\alpha (t)} \right] = 0. \end{aligned}$$
  2.

    Update \(\widehat{{\varvec{\beta }}}_{\scriptscriptstyle \mathsf update}\) and \({\widehat{\alpha }}_{\scriptscriptstyle \mathsf update}\) from \(\widehat{{\varvec{\beta }}}_{\scriptscriptstyle \mathsf init}\) and \({\widehat{\alpha }}_{\scriptscriptstyle \mathsf init}\) with the alternative B-spline approximation

    $$\begin{aligned} \alpha (t) = \hbox {exp}\left\{ {\tilde{{\varvec{\gamma }}}} ^\mathsf{\scriptscriptstyle T}\int _0^t \mathbf{B}_r(s)ds\right\} . \end{aligned}$$

    Setting the initial value \({\widehat{{\varvec{\beta }}}}^{[0]} = \widehat{{\varvec{\beta }}}_{\scriptscriptstyle \mathsf init}\) and \({\widehat{\alpha }}^{[0]} = {\widehat{\alpha }}_{\scriptscriptstyle \mathsf init}\), we perform the iterative algorithm:

    (a)

      Calculate

      $$\begin{aligned} {\widehat{\pi }}^{[k]}_i = \frac{e^{{\widehat{{\varvec{\beta }}}}^{[k-1] \mathsf \scriptscriptstyle T}{\widehat{\mathbf{Z}}}_i}{\widehat{\alpha }}^{[k-1]}(X_i)}{1+e^{{\widehat{{\varvec{\beta }}}}^{[k-1] \mathsf \scriptscriptstyle T}{\widehat{\mathbf{Z}}}_i}{\widehat{\alpha }}^{[k-1]}(X_i)} \end{aligned}$$

      and update \(\alpha \) by a Breslow-type estimator

      $$\begin{aligned}&\widehat{{\tilde{\gamma }}}_p^{[k]} = \frac{\sum _{i=1}^n\varDelta _i B_{r,p}(X_i)}{\sum _{i=1}^n \left[ (1+\varDelta _i){\widehat{\pi }}^{[k]}_i - \varDelta _i\right] \int _0^{X_i}B_{r,p}(t)dt},\, {\widehat{\alpha }}^{[k]}(t)\nonumber \\&= \hbox {exp}\left\{ \widehat{{\tilde{{\varvec{\gamma }}}}}^{[k] \mathsf \scriptscriptstyle T} \int _0^t \mathbf{B}_r(s)ds\right\} . \end{aligned}$$
    (b)

      Update \({\varvec{\beta }}\) by a pseudo logistic regression:

      • If \(\varDelta _i = 0\), observation \(i\) contributes one entry to the pseudo data, response 0 with covariates \({\widehat{\mathbf{Z}}}_i\) and offset \(\hbox {log}\left\{ {\widehat{\alpha }}^{[k]}(X_i)\right\} \).

      • If \(\varDelta _i = 1\), observation \(i\) contributes two entries to the pseudo data, responses 0 and 1, each with covariates \({\widehat{\mathbf{Z}}}_i\) and offset \(\hbox {log}\left\{ {\widehat{\alpha }}^{[k]}(X_i)\right\} \).

      The solution is \({\widehat{{\varvec{\beta }}}}^{[k]}\).

    During this step, we compute the integrals \(\int _0^{X_i}B_{r,p}(t)dt\) once at initialization and reuse them in every iteration. The parameters at convergence are \(\widehat{{\varvec{\beta }}}_{\scriptscriptstyle \mathsf update}\), \({\widehat{\alpha }}_{\scriptscriptstyle \mathsf update}\) and \(\widehat{{\tilde{{\varvec{\gamma }}}}}_{\scriptscriptstyle \mathsf update}\).

  3.

    Obtain the final MLE estimators \({\widehat{{\varvec{\beta }}}}_{\scriptscriptstyle \mathsf MLE}\) and \({\widehat{{\varvec{\gamma }}}}_{\scriptscriptstyle \mathsf MLE}\). We use \({\widehat{{\varvec{\beta }}}}^{[0]} = \widehat{{\varvec{\beta }}}_{\scriptscriptstyle \mathsf update}\) as the initial value for \({\varvec{\beta }}\) and calculate the initial value for \({\varvec{\gamma }}\) from the linear regression

    $$\begin{aligned} {\widehat{{\varvec{\gamma }}}}^{[0]} =&\text{ argmin}_{{\varvec{\gamma }}\in {\mathbb {R}}^{P_n}} \sum _{i=1}^n \left\{ \hbox {log}\left( {\widehat{\alpha }}_{\scriptscriptstyle \mathsf update}'(X_i) \right) - {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\mathbf{B}_r(X_i) \right\} ^2 \\ =&\text{ argmin}_{{\varvec{\gamma }}\in {\mathbb {R}}^{P_n}} \sum _{i=1}^n \left\{ \hbox {log}\left( {\widehat{\alpha }}_{\scriptscriptstyle \mathsf update}(X_i) \right) + \hbox {log}\left( \widehat{{\tilde{{\varvec{\gamma }}}}}_{\scriptscriptstyle \mathsf update}^\mathsf{\scriptscriptstyle T}\mathbf{B}_r(X_i) \right) - {\varvec{\gamma }}^\mathsf{\scriptscriptstyle T}\mathbf{B}_r(X_i) \right\} ^2. \end{aligned}$$

    We perform the iterative algorithm:

    (a)

      Update \({\varvec{\gamma }}\) by a one-step Newton update

      $$\begin{aligned} {\widehat{{\varvec{\gamma }}}}^{[k]} = {\widehat{{\varvec{\gamma }}}}^{[k-1]} - \left\{ \frac{\partial ^2}{\partial {\varvec{\gamma }}^{\otimes 2}} l_n\left( {\widehat{{\varvec{\beta }}}}^{[k-1]},\widehat{\varvec{\gamma }}^{[k-1]}\right) \right\} ^{-1}\frac{\partial }{\partial {\varvec{\gamma }}} l_n\left( {\widehat{{\varvec{\beta }}}}^{[k-1]},{\widehat{{\varvec{\gamma }}}}^{[k-1]}\right) . \end{aligned}$$
    (b)

      Update \({\varvec{\beta }}\) by the pseudo logistic regression as in Step 2.
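To make Step 2(b) concrete, the sketch below builds the pseudo logistic data and solves for \({\varvec{\beta }}\) by Newton/IRLS with an offset. All quantities here (the simulated \(\mathbf{Z}\), \(\varDelta \), and baseline values) are hypothetical stand-ins for illustration, not the estimator's actual inputs:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical toy inputs: covariates Z, event indicators Delta, and
# baseline values log{alpha(X_i)} evaluated at the observed times.
n = 500
Z = rng.standard_normal((n, 2))
Delta = rng.integers(0, 2, size=n)
log_alpha = rng.uniform(-1.0, 1.0, size=n)      # plays log{alpha^[k](X_i)}

# Build the pseudo data: every observation contributes a row with response 0;
# events (Delta_i = 1) contribute an additional row with response 1.
rows, ys, offs = [], [], []
for i in range(n):
    rows.append(Z[i]); ys.append(0.0); offs.append(log_alpha[i])
    if Delta[i] == 1:
        rows.append(Z[i]); ys.append(1.0); offs.append(log_alpha[i])
Zp, y, off = np.array(rows), np.array(ys), np.array(offs)

# Logistic regression with offset, solved by Newton's method (IRLS).
beta = np.zeros(2)
for _ in range(25):
    eta = np.clip(Zp @ beta + off, -30, 30)     # guard against overflow
    p = 1.0 / (1.0 + np.exp(-eta))
    grad = Zp.T @ (y - p)                       # score
    H = Zp.T @ ((p * (1 - p))[:, None] * Zp)    # observed information
    beta += np.linalg.solve(H, grad)
```

At convergence `beta` plays the role of \({\widehat{{\varvec{\beta }}}}^{[k]}\); in the actual algorithm the offset would come from \({\widehat{\alpha }}^{[k]}\) of Step 2(a) rather than being simulated.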


Cite this article

Liang, L., Hou, J., Uno, H. et al. Semi-supervised approach to event time annotation using longitudinal electronic health records. Lifetime Data Anal 28, 428–491 (2022). https://doi.org/10.1007/s10985-022-09557-5
