Supplementary Information

Classification of Pediatric Acute Lymphoblastic Leukemia by Gene Expression Profile

Section II: Methods

Hybridization of microarrays

Hybridization solutions from our previous U95A study had been stored at -80°C since their initial use. These solutions were thawed at 45°C, then microcentrifuged for 2 minutes to remove any insoluble material from the mixture. The hybridization solutions were added to U133A chips and allowed to hybridize for 16 hours at 45°C. At the end of the incubation period, the hybridization solution was removed from each U133A chip and refrozen. Subsequently, the hybridization solutions were thawed again and hybridized to the U133B chips.

After the hybridization solution was removed, a non-stringent wash buffer (6X SSPE, 0.01% Tween 20) was added to each chip cassette, and the cassette was allowed to equilibrate to room temperature. The microarray cassettes were then placed on the fluidics station and the antibody amplification protocol was performed. The arrays were washed at 25°C with the non-stringent buffer, followed by a more stringent wash at 50°C with 100 mM MES, 0.1 M NaCl, 0.01% Tween 20. The arrays were then stained with streptavidin-phycoerythrin (SAPE, Molecular Probes, Eugene, OR) for 10 minutes at 25°C. Following another non-stringent wash, the arrays were hybridized for 10 minutes at 25°C with an antibody solution (100 mM MES, 1 M [Na+], 0.05% Tween 20, 2 mg/ml BSA, 0.1 mg/ml goat IgG, and 3 mg/ml biotinylated antibody). This solution was removed and the cassettes were restained with the SAPE solution.

Arrays were scanned on a laser confocal scanner (Agilent, Palo Alto, CA) and then analyzed with Affymetrix Microarray Suite 5.0 (MAS 5.0). Detection values (present, marginal or absent) were determined using default parameters, and signal values were scaled by global methods to a target value of 500. After the scans were completed, the arrays were visually inspected for defects, and Affymetrix internal controls were used to monitor the success of the hybridization, washing, and staining procedures.
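
As an illustration of global scaling, the following Python sketch multiplies every signal on an array by a single factor so that a trimmed mean of the signals equals the target value of 500. The function name and the 2% trim fraction are assumptions made for this example; the text above states only that scaling was global and that the target value was 500.

```python
def scale_to_target(signals, target=500.0, trim=0.02):
    """Return (scaled_signals, factor) so the trimmed mean of signals equals target.

    trim is the fraction dropped from each tail before averaging
    (an assumed value; the methods text does not state it).
    """
    ordered = sorted(signals)
    k = int(len(ordered) * trim)          # number of values dropped from each tail
    kept = ordered[k:len(ordered) - k]    # middle of the distribution
    factor = target / (sum(kept) / len(kept))
    return [s * factor for s in signals], factor


# Toy array of four signal values (made-up numbers, not study data).
scaled, factor = scale_to_target([100.0, 200.0, 300.0, 400.0])
print(factor, scaled)
```

Because one factor is applied to the whole array, the relative ordering of probe sets within an array is unchanged; only the overall intensity level is brought to a common scale across arrays.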

Statistical methods

The chi-square metric and the k-NN and ANN supervised learning algorithms have been previously described. For more information see http://www.stjuderesearch.org/data/ALL1/. The SVM supervised learning algorithm that was used in this study is available as part of the software package R version 1.6.0.

To determine the performance of each model using ANN, a confidence threshold was built for each diagnostic subtype using a modification of the method described by Khan et al.² Models were built in a decision tree format in which each level of the tree contains only two possible distinctions: class and non-class (for example, T versus non-T). At each level, using only samples in the training set, 3 ANN models were built by 3-fold cross validation. The training set samples were then shuffled and 3 additional ANN models were built. This model-building process was repeated a total of 100 times at each step of the decision tree. An empirical probability distribution of the ANN output node value was then built only for the subtype under study (for example, T-ALL at the first step of the decision tree). Only node values greater than 0.5 for each subtype were included. For each individual sample in the training set, the 100 validation subtype node values were averaged and compared to the threshold. A sample was assigned to the subtype under study only when its average subtype node value was greater than the 95% confidence threshold. For samples in the test set, subtype node values were averaged over all models generated in the 3-fold cross validation. A test sample was assigned to the class under study when its average subtype node value was greater than the 95% confidence level defined on the training set. A sample not assigned to the subtype progressed to the next level of the decision tree, where the entire process was repeated.
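
The resampling and assignment logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: `fit_and_score` is a hypothetical stand-in for training an ANN on the training folds and returning the subtype output node value for each held-out sample, and the per-subtype thresholds are passed in as given numbers because the text does not spell out how the 95% threshold is read off the empirical distribution.

```python
import random
from statistics import mean


def repeated_cv_folds(sample_ids, n_folds=3, n_repeats=100, seed=0):
    """Yield (train_ids, held_out_ids) for every fold of every shuffled repeat."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    for _ in range(n_repeats):
        rng.shuffle(ids)                                  # reshuffle before each repeat
        folds = [ids[i::n_folds] for i in range(n_folds)]
        for i, held_out in enumerate(folds):
            train = [s for j, fold in enumerate(folds) if j != i for s in fold]
            yield train, held_out


def average_validation_outputs(sample_ids, fit_and_score, n_folds=3, n_repeats=100):
    """Average each sample's node value over the folds in which it was held out."""
    outputs = {s: [] for s in sample_ids}
    for train, held_out in repeated_cv_folds(sample_ids, n_folds, n_repeats):
        # fit_and_score is hypothetical: it returns {sample_id: node value}.
        for sample, value in fit_and_score(train, held_out).items():
            outputs[sample].append(value)
    return {s: mean(values) for s, values in outputs.items()}


def assign_subtype(avg_values_by_level, thresholds):
    """Walk the tree levels in order; assign at the first level whose average
    node value exceeds that subtype's threshold, otherwise return None."""
    for subtype, avg_value in avg_values_by_level:
        if avg_value > thresholds[subtype]:
            return subtype
    return None
```

With 3 folds and 100 repeats, each training sample is held out exactly once per repeat, which yields the 100 validation node values that are averaged per sample. A call such as `assign_subtype([("T-ALL", 0.31), ("subtype B", 0.97)], {"T-ALL": 0.8, "subtype B": 0.9})` (all values and the second subtype name are made up) passes the sample over at the T-ALL level and assigns it at the next level.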