The association of gene expression with clinical outcome was analyzed in pediatric patients treated on the AML 87, AML 91, and AML 97 protocols conducted at St. Jude Children’s Research Hospital and included in the microarray study. After exclusion of PML-RARa cases, 98 cases had sufficient follow-up for evaluation. Time to relapse or progression was defined as zero for patients who never achieved complete remission. For all other patients, time to relapse or progression was defined as the time elapsed from the study enrollment date to relapse, death, or most recent follow-up. Patients still living and disease free at last follow-up were considered censored in this analysis. Additionally, patients who died while in first complete remission were censored at the date of death.
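The outcome coding described above can be sketched as follows. This is a minimal illustration, not the study's code: the `Patient` record and its field names are hypothetical, time is measured in days as an assumption, and patients who never achieved complete remission are assumed to count as failures at time zero.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Patient:
    # Hypothetical record layout; the field names are illustrative only.
    enrolled: date
    achieved_cr: bool                        # ever achieved complete remission
    relapse_or_progression: Optional[date]   # None if never observed
    death: Optional[date]                    # None if alive at last follow-up
    last_follow_up: date


def time_and_event(p: Patient) -> Tuple[int, int]:
    """Return (time in days, event indicator); event = 1 is a failure, 0 is censored."""
    if not p.achieved_cr:
        # Never in remission: time is defined as zero (assumed to be a failure).
        return 0, 1
    if p.relapse_or_progression is not None:
        return (p.relapse_or_progression - p.enrolled).days, 1
    if p.death is not None:
        # Death in first complete remission: censored at the date of death.
        return (p.death - p.enrolled).days, 0
    # Alive and disease free: censored at most recent follow-up.
    return (p.last_follow_up - p.enrolled).days, 0
```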
After application of the variation filter, 16,134 probe sets remained. A protocol-stratified randomization divided the 98 patients into a training cohort (n=68) and a validation cohort (n=30). For each probe set and within each protocol, a generalized Mantel statistic (GMS) measured the strength of the association of expression with time to progression or relapse in the training cohort. ^{6} Our implementation of the GMS is concisely described by analogy to the log-rank test. ^{7} At each unique observed failure time, the log-rank test computes a contingency-table chi-square test statistic comparing the distribution of group memberships among the individuals known to have failed with that among the individuals known not to have failed before that time. Our implementation of the GMS replaces the series of chi-square tests with a series of rank-sum tests comparing the median expression of those having failed to that of those known not to have failed. ^{8} We assessed the significance of the observed GMS by simulating the null hypothesis in a series of 10,000 independent replications. Each replication computed the GMS for data created by coupling randomly generated “expression” values with the observed failure times and censoring indicators. The p-value for an observed GMS is the proportion of simulated GMS values with greater or equal absolute value.
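The following sketch illustrates one way such a statistic and its simulated p-value could be computed for a single probe set within a single protocol. It is an illustration under stated assumptions, not the implementation used in the study: the way the per-time rank-sum statistics are pooled (summing the centered rank sums and dividing by the square root of the summed null variances) and the use of uniform random numbers as the simulated "expression" values are both assumptions, and the function names are hypothetical.

```python
import math
import random


def midranks(values):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def gms(expr, time, event):
    """Pooled rank-sum statistic over the unique failure times (assumed pooling rule)."""
    num, var = 0.0, 0.0
    for t in sorted({ti for ti, ei in zip(time, event) if ei}):
        at_risk = [i for i, ti in enumerate(time) if ti >= t]
        failed = [k for k, i in enumerate(at_risk) if time[i] == t and event[i]]
        n, d = len(at_risk), len(failed)
        if d == n:
            continue                   # nobody left to compare against
        ranks = midranks([expr[i] for i in at_risk])
        # Rank sum of those failing at t, centered at its null expectation.
        num += sum(ranks[k] for k in failed) - d * (n + 1) / 2
        # Null variance of the rank sum (no correction for ties).
        var += d * (n - d) * (n + 1) / 12
    return num / math.sqrt(var) if var > 0 else 0.0


def simulated_p_value(expr, time, event, n_rep=10_000, seed=0):
    """Proportion of null replications whose |GMS| is at least the observed |GMS|."""
    rng = random.Random(seed)
    observed = abs(gms(expr, time, event))
    hits = 0
    for _ in range(n_rep):
        fake = [rng.random() for _ in expr]   # random "expression" values
        if abs(gms(fake, time, event)) >= observed:
            hits += 1
    return hits / n_rep
```

With this sign convention, a positive value indicates that patients who fail tend to have higher expression than those still at risk.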
For each probe set, one p-value represented the significance of the association of expression with outcome under each protocol. The three protocol-specific p-values were then combined into an across-study summary p-value by comparing the negative sum of the logs of the three p-values with the gamma distribution that describes the distribution of three similarly transformed independent uniform (0,1) random variables. ^{9}
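As a worked illustration of this combination rule: the negative log of a uniform (0,1) variable is exponential with mean 1, so the negative sum of k such logs follows a gamma distribution with shape k and scale 1, whose upper tail has a closed form for integer k. The sketch below uses that closed form; the function name is hypothetical, and the handling of p-values equal to zero (possible with simulation-based p-values) is not described in the text and is left out.

```python
import math


def combine_p_values(p_values):
    """Summary p-value from independent p-values via the negative sum of logs.

    Assumes every p-value is strictly greater than zero.
    """
    k = len(p_values)
    x = -sum(math.log(p) for p in p_values)
    # Upper tail of Gamma(shape=k, scale=1) for integer k:
    # P(X >= x) = exp(-x) * sum_{i=0}^{k-1} x^i / i!
    return math.exp(-x) * sum(x ** i / math.factorial(i) for i in range(k))


# Three protocol-specific p-values of 0.05 each combine to roughly 0.0063.
summary = combine_p_values([0.05, 0.05, 0.05])
```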
The spacings LOESS histogram was used to estimate the conditional false discovery rate (cFDR) corresponding to each of the summary p-values. ^{10,11} Table S15 lists the 50 most significant probe sets and their corresponding summary p-values and cFDR estimates. The cFDR estimates imply that approximately half of these probe sets represent false discoveries arising solely due to chance mechanisms. However, these cFDR estimates also clearly indicate that several probe sets’ expressions may be truly associated with time to relapse or progression. Therefore, a leave-one-out jackknife was used to identify probe sets whose significance (in the traditional sense) was robust against the exclusion of one patient from the analysis. ^{12} The jackknife identified three probe sets having p-values less than or equal to α = 0.001 in all 68 leave-one-out GMS assessments; these are indicated by an asterisk in Table S15.
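The leave-one-out selection rule can be sketched generically as below. This is an illustration only: `p_value_fn` is a hypothetical stand-in for whatever procedure produces the p-value for one probe set from a set of patients, and `patients` is assumed to be a list with one entry per training-cohort patient.

```python
def jackknife_robust(p_value_fn, patients, alpha=0.001):
    """True if the p-value stays <= alpha whenever any single patient is left out."""
    return all(
        p_value_fn(patients[:i] + patients[i + 1:]) <= alpha
        for i in range(len(patients))
    )
```

With 68 training patients, this evaluates the p-value 68 times per probe set, and a probe set is retained only if every one of those assessments meets the threshold.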
A multivariable, protocol-stratified Cox proportional hazards regression model simultaneously examined the association of the three jackknife-selected probe sets with time to progression or relapse within the training cohort. ^{7} The multivariable Cox analysis found that increased expression of the probe sets 60471_at and 203063_at was significantly associated with decreased time to relapse or progression (p < 0.0001 and p = 0.0409, respectively). A prognostic score function based on these two probe sets’ expressions was developed by using them as outcome predictors in a second Cox model fit to the training cohort data. An increased score was found to be significantly associated (p = 0.0200) with decreased time to relapse or progression in the validation cohort. More specifically, a unit increase in the score was associated with a 1.54-fold increase in the hazard of relapse or progression in the validation cohort (95% CI = 1.05 - 2.27).
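To make the interpretation of the hazard ratio concrete, the sketch below assumes the usual form of a Cox-based score, a weighted sum of the two expression values with the fitted coefficients as weights. The coefficients themselves are not reported in this section, so they are left as arguments; the function names are hypothetical. Under the proportional hazards model, a score difference of Δ units multiplies the hazard by 1.54 raised to the power Δ.

```python
HR_PER_UNIT = 1.54  # hazard ratio per unit increase in score (validation cohort)


def prognostic_score(expression, coefficients):
    """Linear predictor: sum of coefficient * expression (coefficients not given here)."""
    return sum(b * x for b, x in zip(coefficients, expression))


def relative_hazard(score_difference, hr_per_unit=HR_PER_UNIT):
    """Hazard ratio implied by a given difference in score between two patients."""
    return hr_per_unit ** score_difference


# A patient scoring two units higher has about 1.54**2, or roughly 2.37 times, the hazard.
two_unit_ratio = relative_hazard(2)
```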
The association of the score with time to relapse or progression in the adult cohort was also examined. The power of this analysis was severely limited by the small sample size. A total of 6 patients were excluded: three t(15;17) patients, two patients who refused therapy, and one patient with an extremely rare and complex karyotype (containing both BCR-ABL and CBFb-MYH11). Consequently, only 14 adult patients were available for analysis. Nevertheless, Cox regression analysis suggested that time to relapse or progression in adults also tended to decrease as the score increased (p = 0.0837).