
Supplementary Information

Classification of Pediatric Acute Lymphoblastic Leukemia by Gene Expression Profile

Section IV: Diagnostic Accuracy

The class-discriminating genes were identified as described above and then used in an ANN-based supervised learning algorithm. As previously discussed (Section II, Supplementary Information), class assignment followed a differential diagnostic tree format and required that the node value for assignment exceed a statistically defined confidence level. The results of this analysis are shown in Table 4 of the paper and are reproduced below for the reader's convenience.

Table 4. ALL subgroup prediction accuracies using the top 50 chi-square-selected genes from U133A and B and an artificial neural network (ANN) in decision-tree format.

Subgroup             Training Setᵃ         Test Setᵇ
                     Apparent Accuracyᶜ    True Accuracyᵈ    Sensitivityᵉ    Specificityᶠ
T-ALL                100%                  100%              100%            100%
E2A-PBX1             100%                  100%              100%            100%
TEL-AML1             98%                   100%              100%            100%
BCR-ABL              100%                  95%               75%             100%
MLL rearrangement    100%                  100%              100%            100%
Hyperdiploid >50     100%                  100%              100%            100%

ᵃ Training set consisted of 100 cases with distribution: [T-ALL 12, E2A-PBX1 13, TEL-AML1 15, BCR-ABL 11, MLL 15, HD>50 13, other 21].
ᵇ Blinded test set consisted of 32 cases: [T-ALL 2, E2A-PBX1 5, TEL-AML1 5, BCR-ABL 4, MLL 5, HD>50 4, other 7].
ᶜ Apparent accuracy determined by 3-fold cross-validation.
ᵈ True accuracy determined by class prediction on the blinded test set.
ᵉ Sensitivity = (the number of positive cases predicted)/(the number of true positives).
ᶠ Specificity = (the number of negative cases predicted)/(the number of true negatives).
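
To make the decision-tree assignment concrete, a minimal sketch is given below. It is illustrative only and not the original implementation: scikit-learn's MLPClassifier stands in for the ANN, the fixed 0.95 cutoff is an assumed placeholder for the statistically defined confidence level, and the node order, the variable names (X, y), and the fall-through "other" call are assumptions chosen to mirror the subgroups in Table 4.

# Minimal sketch of the differential-diagnostic decision-tree classifier.
# Assumptions (not from the paper): scikit-learn's MLPClassifier stands in for
# the ANN, the per-node cutoff is a fixed 0.95, and X/y are a NumPy
# samples-by-genes expression matrix and an array of subtype labels.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Each subtype is tested in turn; cases not confidently assigned at any node
# fall through to "other".
TREE_ORDER = ["T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid>50"]
CONFIDENCE = 0.95  # assumed stand-in for the statistically defined confidence level


def fit_tree(X, y):
    """Fit one binary ANN per node, using only cases not handled by upstream nodes."""
    nodes, mask = {}, np.ones(len(y), dtype=bool)
    for subtype in TREE_ORDER:
        clf = MLPClassifier(hidden_layer_sizes=(4,), max_iter=5000, random_state=0)
        clf.fit(X[mask], (y[mask] == subtype).astype(int))
        nodes[subtype] = clf
        mask &= (y != subtype)  # downstream nodes never see this subtype
    return nodes


def predict_tree(nodes, X):
    """Walk each case down the tree; assign it at the first confident node."""
    calls = np.array(["other"] * len(X), dtype=object)
    unassigned = np.ones(len(X), dtype=bool)
    for subtype in TREE_ORDER:
        if not unassigned.any():
            break
        prob = nodes[subtype].predict_proba(X[unassigned])[:, 1]
        hit_idx = np.flatnonzero(unassigned)[prob >= CONFIDENCE]
        calls[hit_idx] = subtype
        unassigned[hit_idx] = False
    return calls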

To control for over-fitting of the data, we performed 10 additional rounds of analysis. For each round, new training and test sets were generated, and the discriminating probe sets were reselected using only the new training set. The top 20 and top 50 probe sets were then used in the ANN-based supervised learning algorithm, and true accuracy was assessed on the corresponding new test set. This resulted in an average class-assignment accuracy of 97% (range 93.8%-100%) using 20 probe sets per class. The results of these analyses are shown in Tables S15 and S16; the numbers listed under the individual leukemia subtypes represent the number of misclassified cases in the training and test sets, and the overall accuracies are listed on the right.
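
A minimal sketch of this resampling procedure is shown below, for illustration only. scikit-learn is assumed; a single multiclass chi-square selection and a single multiclass MLP stand in for the per-class probe-set selection and the ANN decision tree described above; X (a samples-by-probe-sets matrix of non-negative signal values), y (subtype labels), and the parameter defaults are illustrative.

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def resampling_rounds(X, y, n_rounds=10, n_probes=20, test_size=32, seed=0):
    """Repeat the split / re-select / re-train / re-test cycle; collect test accuracies."""
    accuracies = []
    for r in range(n_rounds):
        # Draw a new training/test split for every round (stratified by subtype).
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed + r)
        # Re-select the discriminating probe sets from the new training set only.
        selector = SelectKBest(chi2, k=n_probes).fit(X_tr, y_tr)
        X_tr_k, X_te_k = selector.transform(X_tr), selector.transform(X_te)
        # Train on the reduced training set and assess true accuracy on the test set.
        clf = MLPClassifier(hidden_layer_sizes=(4,), max_iter=5000, random_state=r)
        clf.fit(X_tr_k, y_tr)
        accuracies.append(clf.score(X_te_k, y_te))
    return float(np.mean(accuracies)), accuracies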

Table S15. Training and Test Set Permutation Results - Errors per group using 20 probe sets

Round   T-ALL           E2A-PBX1        TEL-AML1        BCR-ABL         MLL             Hyperdip>50     Overall Accuracy
        Training Test   Training Test   Training Test   Training Test   Training Test   Training Test   Training Test
1       0¹       0      0        0      0        1      0        1      0        0      0        0      100      93.8
2       0        0      0        0      0        0      1        1      0        0      0        1      99       93.8
3       0        0      0        0      0        0      2        0      0        0      0        0      98       100
4       0        0      0        0      0        1      2        0      0        0      0        0      98       96.9
5       0        0      0        0      0        0      0        1      0        0      0        0      100      96.9
6       0        0      0        0      1        0      0        0      0        0      0        0      99       100
7       0        0      0        0      0        0      0        0      0        0      0        0      100      100
8       0        0      0        0      0        0      1        1      0        0      0        0      99       96.9
9       0        0      0        0      0        0      0        0      0        0      0        0      100      100
10      0        0      0        0      0        0      0        1      0        0      0        1      99       93.8

¹ The number of misclassified cases obtained when diagnosing the indicated leukemia subtype.


Table S16. Training and Test Set Permutation Results - Errors per group using 50 probe sets

Round   T-ALL           E2A-PBX1        TEL-AML1        BCR-ABL         MLL             Hyperdip>50     Overall Accuracy
        Training Test   Training Test   Training Test   Training Test   Training Test   Training Test   Training Test
1       0¹       0      0        0      1        0      0        0      0        0      0        0      99       100
2       0        0      0        0      0        1      1        1      0        0      0        1      99       90.6
3       0        0      0        0      1        0      2        1      0        0      0        0      97       96.9
4       0        0      0        0      1        0      2        1      0        0      0        0      97       96.9
5       0        0      0        0      1        0      0        1      0        2      1        0      98       90.6
6       0        0      0        0      2        0      0        0      0        0      0        0      98       100
7       0        0      0        0      1        0      0        0      0        0      0        0      99       100
8       0        0      0        0      1        0      0        0      0        0      0        0      99       100
9       0        0      0        0      0        0      0        1      0        0      0        0      100      96.9
10      0        0      0        0      2        0      1        1      0        0      0        1      97       93.8

¹ The number of misclassified cases obtained when diagnosing the indicated leukemia subtype.

Comparison of supervised learning algorithms

The performance of other supervised learning algorithms was compared with that of the ANN. Using the original training and test sets, chi-square analysis was used to select the desired number of probe sets, and the selected probe sets were then used to build models with ANN, support vector machine (SVM), and k-nearest neighbors (k-NN). The ANN was run with a single hidden layer of 4 nodes and 5,000 backpropagation epochs; the SVM used a linear kernel, and k was set to 3 for k-NN. The comparison was performed using the top 20 and top 50 probe sets, as well as the top 20 and top 50 genes, and the results are shown in Table S17 below. The numbers correspond to the number of errors made in the training or test set for each class under each condition. Overall, ANN and SVM performed comparably, while k-NN gave slightly poorer results.
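
For readers who wish to reproduce this kind of comparison, a minimal sketch is given below. scikit-learn is assumed for illustration; MLPClassifier's max_iter stands in for the 5,000 backpropagation epochs; and X_train, y_train, X_test, y_test and the value of k (20 or 50 selected features) are assumed to come from the steps described above.

from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Parameters as stated above: one hidden layer of 4 nodes and 5,000 epochs for
# the ANN, a linear kernel for the SVM, and k = 3 for k-NN.
MODELS = {
    "ANN": MLPClassifier(hidden_layer_sizes=(4,), max_iter=5000, random_state=0),
    "SVM": SVC(kernel="linear"),
    "k-NN": KNeighborsClassifier(n_neighbors=3),
}


def compare_models(X_train, y_train, X_test, y_test, k=20):
    """Select the top-k features by chi-square on the training data, then fit each model."""
    selector = SelectKBest(chi2, k=k).fit(X_train, y_train)
    X_tr, X_te = selector.transform(X_train), selector.transform(X_test)
    results = {}
    for name, model in MODELS.items():
        model.fit(X_tr, y_train)
        results[name] = {"training accuracy": model.score(X_tr, y_train),
                         "test accuracy": model.score(X_te, y_test)}
    return results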

Table S17. Comparison of supervised learning algorithms

                   ANN             SVM             k-NN
                   Training Test   Training Test   Training Test
Top 20 probes
T-ALL              0        0      0        0      0        0
E2A-PBX1           0        0      0        0      0        0
TEL-AML1           0        0      0        0      0        0
BCR-ABL            1        2      1        2      2        1
MLL                0        0      0        0      0        1
Hyperdiploid >50   0        0      0        0      0        0

Top 50 probes
T-ALL              0        0      0        0      0        0
E2A-PBX1           0        0      0        0      0        0
TEL-AML1           1        0      0        0      0        0
BCR-ABL            0        1      1        1      2        1
MLL                0        0      0        0      0        1
Hyperdiploid >50   0        0      0        0      0        0

Top 20 genes
T-ALL              0        0      0        0      0        0
E2A-PBX1           0        0      0        0      0        0
TEL-AML1           1        0      0        0      0        0
BCR-ABL            1        2      1        2      1        1
MLL                0        0      0        0      0        1
Hyperdiploid >50   0        0      0        0      0        0

Top 50 genes
T-ALL              0        0      0        0      0        0
E2A-PBX1           0        0      0        0      0        0
TEL-AML1           1        0      0        0      0        0
BCR-ABL            0        1      1        1      3        1
MLL                0        0      0        0      0        0
Hyperdiploid >50   0        0      0        1      0        0