The class-discriminating genes were identified as described above and then used in an ANN-based supervised learning algorithm. As previously discussed (section II, Supplemental Information), class assignment followed a differential diagnostic tree format and required that the node value for assignment exceed a statistically defined confidence level. The results of this analysis are shown in Table 4 of the paper and are included below for the reader's convenience.
| Subgroup | Apparent accuracyᶜ (training setᵃ) | True accuracyᵈ (test setᵇ) | Sensitivityᵉ | Specificityᶠ |
| T-ALL | 100% | 100% | 100% | 100% |
| E2A-PBX1 | 100% | 100% | 100% | 100% |
| TEL-AML1 | 98% | 100% | 100% | 100% |
| BCR-ABL | 100% | 95% | 75% | 100% |
| MLL rearrangement | 100% | 100% | 100% | 100% |
| Hyperdiploid >50 | 100% | 100% | 100% | 100% |
ᵃ Training set consisted of 100 cases with distribution: [T-ALL 12, E2A-PBX1 13, TEL-AML1 15, BCR-ABL 11, MLL 15, HD>50 13, other 21].
ᵇ Blinded test set consisted of 32 cases: [T-ALL 2, E2A-PBX1 5, TEL-AML1 5, BCR-ABL 4, MLL 5, HD>50 4, other 7].
ᶜ Apparent accuracy determined by 3-fold cross-validation.
ᵈ True accuracy determined by class prediction on the blinded test set.
ᵉ Sensitivity = (the number of positive cases predicted)/(the number of true positives).
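As a worked illustration of footnote e, the sketch below computes sensitivity, reading the footnote as correctly predicted positives over all actual positives. It also computes specificity under its standard definition (true negatives over all actual negatives), which is an assumption here because no definition of specificity appears above. The function names and the true-negative count are illustrative only. The BCR-ABL counts follow from the 4 BCR-ABL cases in the test set and the reported 75% sensitivity.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual cases of a subtype that were predicted as that subtype."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of cases outside a subtype that were not assigned to it
    (standard definition, assumed here)."""
    return true_neg / (true_neg + false_pos)


# BCR-ABL in the blinded test set: 4 cases, of which 3 were detected.
bcr_abl_sensitivity = sensitivity(true_pos=3, false_neg=1)

# With no false positives, specificity is 100% whatever the number of
# true negatives; 28 is an illustrative count, not a figure from the source.
bcr_abl_specificity = specificity(true_neg=28, false_pos=0)

print(f"sensitivity = {bcr_abl_sensitivity:.0%}, specificity = {bcr_abl_specificity:.0%}")
```

One missed case out of four is enough to lower sensitivity to 75%, which shows how strongly the small BCR-ABL test group influences this figure.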
To control for over-fitting of the data, we performed 10 additional rounds of analysis. For each round, new training and test sets were developed, and discriminating probe sets were reselected using only the new training sets. The top 20 and top 50 probe sets were then used in an ANN-based supervised learning algorithm, and their true accuracy was assessed on the new test sets. Using 20 probes per class, the average accuracy of class assignment was 97% (range 93.8%-100%). The results of these analyses are shown in Tables S15 and S16. The numbers listed under the individual leukemia subtypes are the numbers of misclassified cases in the training and test sets. The overall accuracies are listed on the right.
Table S15 (top 20 probe sets)
| Round | T-ALL | | E2A-PBX1 | | TEL-AML1 | | BCR-ABL | | MLL | | Hyperdip>50 | | Overall Accuracy (%) | |
| | Training | Test | Training | Test | Training | Test | Training | Test | Training | Test | Training | Test | Training | Test |
| 1 | 0¹ | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 100 | 93.8 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 99 | 93.8 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 98 | 100 |
| 4 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 98 | 96.9 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 100 | 96.9 |
| 6 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 99 | 100 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 100 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 99 | 96.9 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 100 |
| 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 99 | 93.8 |
¹ The number of misclassified cases obtained when diagnosing the indicated leukemia subtype.
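The overall test-set accuracies in the table above can be checked against the error counts with simple arithmetic. The sketch below assumes that each round's test set again contained 32 cases, which is not stated explicitly but is consistent with the listed values of 93.8 and 96.9. The function and variable names are illustrative.

```python
def accuracy_pct(n_errors: int, n_cases: int) -> float:
    """Overall accuracy in percent, rounded to one decimal place as in the table."""
    return round(100.0 * (n_cases - n_errors) / n_cases, 1)


TEST_SET_SIZE = 32  # assumption: same size as the original blinded test set

two_errors = accuracy_pct(2, TEST_SET_SIZE)  # e.g. round 1: two test errors
one_error = accuracy_pct(1, TEST_SET_SIZE)   # e.g. round 5: one test error

# Test-set accuracies of the ten rounds, copied from the table above.
test_accuracies = [93.8, 93.8, 100, 96.9, 96.9, 100, 100, 96.9, 100, 93.8]
mean_accuracy = sum(test_accuracies) / len(test_accuracies)
accuracy_range = (min(test_accuracies), max(test_accuracies))

print(two_errors, one_error, round(mean_accuracy, 1), accuracy_range)
```

The mean and range computed this way agree with the reported average of 97% and range of 93.8%-100%.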
Table S16 (top 50 probe sets)
| Round | T-ALL | | E2A-PBX1 | | TEL-AML1 | | BCR-ABL | | MLL | | Hyperdip>50 | | Overall Accuracy (%) | |
| | Training | Test | Training | Test | Training | Test | Training | Test | Training | Test | Training | Test | Training | Test |
| 1 | 0¹ | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 99 | 100 |
| 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 99 | 90.6 |
| 3 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 97 | 96.9 |
| 4 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 97 | 96.9 |
| 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | 1 | 0 | 98 | 90.6 |
| 6 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 98 | 100 |
| 7 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 99 | 100 |
| 8 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 99 | 100 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 100 | 96.9 |
| 10 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 97 | 93.8 |
¹ The number of misclassified cases obtained when diagnosing the indicated leukemia subtype.
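The resampling procedure described above (new training and test sets in each round, with probe sets reselected from the training set only) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The stratified split is an assumption. The selector `select_by_mean_spread` is hypothetical. A nearest-centroid classifier stands in for the ANN, only to keep the example free of dependencies.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, rng):
    """Split case indices into train/test, sampling within each class (assumed)."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def resample_rounds(X, y, n_rounds, test_fraction, select, fit, seed=0):
    """Return the test-set accuracy (%) of each round."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_rounds):
        train_idx, test_idx = stratified_split(y, test_fraction, rng)
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Features are reselected from the training cases only, so the test
        # cases never influence which probe sets the classifier sees.
        features = select(X_train, y_train)
        predict = fit([[row[j] for j in features] for row in X_train], y_train)
        correct = sum(
            predict([X[i][j] for j in features]) == y[i] for i in test_idx
        )
        accuracies.append(100.0 * correct / len(test_idx))
    return accuracies


def select_by_mean_spread(X_train, y_train, k=1):
    """Hypothetical selector: keep the k features whose class means differ most."""
    classes = sorted(set(y_train))

    def spread(j):
        means = [
            sum(r[j] for r, c in zip(X_train, y_train) if c == cls) / y_train.count(cls)
            for cls in classes
        ]
        return max(means) - min(means)

    return sorted(range(len(X_train[0])), key=spread, reverse=True)[:k]


def fit_nearest_centroid(X_train, y_train):
    """Stand-in classifier (not an ANN): assign each case to the closest class mean."""
    centroids = {}
    for cls in set(y_train):
        rows = [r for r, c in zip(X_train, y_train) if c == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    return predict


# Toy data: feature 0 separates the two classes, feature 1 is uninformative.
X = [[i % 3, 1] for i in range(6)] + [[10 + i % 3, 1] for i in range(6)]
y = ["A"] * 6 + ["B"] * 6
round_accuracies = resample_rounds(
    X, y, n_rounds=10, test_fraction=0.25,
    select=select_by_mean_spread, fit=fit_nearest_centroid,
)
print(round_accuracies)
```

The key design point is that `select` runs inside the loop on training cases alone. Selecting probe sets once on all cases and then splitting would let test cases leak into feature selection and inflate the apparent accuracy.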
The performance of other supervised learning algorithms was compared with that of ANN. Using the original training and test sets, the chi-squared statistic was used to select the desired number of probe sets, and the selected probe sets were then used to build models with ANN, SVM, and k-NN. The ANN had one hidden layer of 4 nodes and was trained by backpropagation for 5000 epochs. The SVM used a linear kernel, and k-NN used k = 3. The results are compared in Table S17 below, for the top 20 and 50 probe sets as well as the top 20 and 50 genes. Each entry is the number of errors made in the training or test set for the indicated class and feature set. Overall, ANN and SVM performed comparably, while k-NN gave slightly poorer results.
| | ANN | | SVM | | k-NN | |
| | Training | Test | Training | Test | Training | Test |
| top 20 probes | | | | | | |
| T-ALL | 0¹ | 0 | 0 | 0 | 0 | 0 |
| E2A-PBX1 | 0 | 0 | 0 | 0 | 0 | 0 |
| TEL-AML1 | 0 | 0 | 0 | 0 | 0 | 0 |
| BCR-ABL | 1 | 2 | 1 | 2 | 2 | 1 |
| MLL | 0 | 0 | 0 | 0 | 0 | 1 |
| Hyperdiploid >50 | 0 | 0 | 0 | 0 | 0 | 0 |
| top 50 probes | | | | | | |
| T-ALL | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A-PBX1 | 0 | 0 | 0 | 0 | 0 | 0 |
| TEL-AML1 | 1 | 0 | 0 | 0 | 0 | 0 |
| BCR-ABL | 0 | 1 | 1 | 1 | 2 | 1 |
| MLL | 0 | 0 | 0 | 0 | 0 | 1 |
| Hyperdiploid >50 | 0 | 0 | 0 | 0 | 0 | 0 |
| top 20 genes | | | | | | |
| T-ALL | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A-PBX1 | 0 | 0 | 0 | 0 | 0 | 0 |
| TEL-AML1 | 1 | 0 | 0 | 0 | 0 | 0 |
| BCR-ABL | 1 | 2 | 1 | 2 | 1 | 1 |
| MLL | 0 | 0 | 0 | 0 | 0 | 1 |
| Hyperdiploid >50 | 0 | 0 | 0 | 0 | 0 | 0 |
| top 50 genes | | | | | | |
| T-ALL | 0 | 0 | 0 | 0 | 0 | 0 |
| E2A-PBX1 | 0 | 0 | 0 | 0 | 0 | 0 |
| TEL-AML1 | 1 | 0 | 0 | 0 | 0 | 0 |
| BCR-ABL | 0 | 1 | 1 | 1 | 3 | 1 |
| MLL | 0 | 0 | 0 | 0 | 0 | 0 |
| Hyperdiploid >50 | 0 | 0 | 0 | 1 | 0 | 0 |
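To make the selection and classification steps in this comparison concrete, the following sketch ranks features by a chi-squared statistic and classifies with k-NN (k = 3). It rests on several assumptions that the text above does not specify: expression values are binarized at a hypothetical fixed threshold, the statistic is computed on a 2x2 table of one class against the rest, and k-NN uses squared Euclidean distance. All names and the toy data are illustrative.

```python
from collections import Counter


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_features(X, in_class, threshold, k):
    """Return indices of the k features most associated with class membership."""
    scores = []
    for j in range(len(X[0])):
        high_pos = sum(1 for r, p in zip(X, in_class) if r[j] > threshold and p)
        high_neg = sum(1 for r, p in zip(X, in_class) if r[j] > threshold and not p)
        low_pos = sum(1 for r, p in zip(X, in_class) if r[j] <= threshold and p)
        low_neg = sum(1 for r, p in zip(X, in_class) if r[j] <= threshold and not p)
        scores.append((chi2_2x2(high_pos, high_neg, low_pos, low_neg), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest statistic first
    return [j for _, j in scores[:k]]


def knn_predict(X_train, y_train, row, k=3):
    """Majority vote among the k training cases closest to `row`."""
    nearest = sorted(
        range(len(X_train)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(X_train[i], row)),
    )[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


# Toy data: feature 0 tracks the class, feature 1 does not.
X = [[9, 1], [8, 9], [9, 2], [1, 8], [2, 1], [1, 9]]
y = ["BCR-ABL"] * 3 + ["other"] * 3
in_class = [label == "BCR-ABL" for label in y]

selected = rank_features(X, in_class, threshold=5, k=1)
X_selected = [[row[j] for j in selected] for row in X]
prediction_high = knn_predict(X_selected, y, [7])
prediction_low = knn_predict(X_selected, y, [3])
print(selected, prediction_high, prediction_low)
```

With k = 3 and two classes a vote cannot tie. With more classes a tie-breaking rule would be needed, and the text above does not say which was used.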