The promoter region and the 5'UTR of specific sets of genes were used with Match™ to search the most recent TRANSFAC® Professional 8.1 database (vertebrates) (http://www.biobase.de/) for common regulatory elements. We determined the statistical over-representation of the common motifs relative to "all" (~20,000) human promoter sequences (UCSC hg17; CLOVER) (Matys et al., 2003); additionally, we used the promoter sequences of the "other" gene set as background.
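To make the idea of over-representation against a background concrete, the following is a minimal sketch using a count-based hypergeometric upper-tail test. This is a deliberately simpler stand-in and not the CLOVER procedure used in the analysis; the function name and the example counts (a motif present in 2,000 of 20,000 background promoters) are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for n promoters drawn without replacement from N,
    of which K carry the motif (hypothetical helper, not CLOVER)."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts: the motif occurs in 2,000 of ~20,000 background
# promoters and in all 10 promoters of the gene set of interest.
p_all_ten = hypergeom_upper_tail(k=10, n=10, K=2000, N=20000)
p_at_least_one = hypergeom_upper_tail(k=1, n=10, K=2000, N=20000)
print(f"P(all 10 carry the motif by chance) = {p_all_ten:.3g}")
print(f"P(at least 1 carries the motif by chance) = {p_at_least_one:.3g}")
```

A small tail probability indicates that the motif occurs in the gene set more often than expected from its background frequency. The same function could be applied with the "other" gene set as background by changing `K` and `N`.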
Of the 12 genes that were commonly over-expressed in cross-resistant ALL, two were excluded: SLC4A1, because no reliable upstream region was found, and MUC4, because its expression levels were relatively low. For the remaining 10 genes, we identified 41 common transcription factor binding sites within 1.5 kb 5' of the transcription start site. Eight of these were significantly over-represented in the 10 genes compared both to all human promoter sequences and to the promoter sequences of 34 genes that were under-expressed in cross-resistant ALL (P<0.05). For these eight factors, we found 14 probe sets on the array. Two of the factors, each interacting with one of the eight over-represented binding motifs, are expressed at significantly different levels in cross-resistant versus cross-sensitive ALL cells (ELF2, lower; IRF1, higher). Different isoforms of ELF2 are known to act as either an inhibitor or a transactivator and to interact physically with the AML1 (RUNX1) domain (Cho, J, JBC, 2004). IRF1 encodes interferon regulatory factor 1, a member of the interferon regulatory transcription factor (IRF) family. IRF1 functions as a transcriptional activator of interferons and interferon targets.
In addition, a transcription factor binding site for GATA3, a transcriptional activator that binds to the enhancer of the T-cell receptor alpha and delta genes, was identified in all 10 genes (although it was not over-represented compared to the human genome). Interestingly, GATA3 was among the genes that were over-expressed in cross-resistant ALL (R/S ratio 3.38). The table below lists the probe set ID, transcription factor name (TF), gene symbol (GS), Pearson correlation coefficient (R2) of the expression level with the CR-score, the R/S ratio comparing expression in cross-resistant (R) versus cross-sensitive (S) ALL, and the raw score and P-values from the TRANSFAC analysis.
Probe set ID | TF name | GS | R2 | R/S ratio | P-value R/S t-test | Raw score | P-value* from 'all' | P-value+ from 'other' |
203010_at | signal transducer and activator of transcription 5A | STAT5A | 0.18 | 1.19 | 0.0743 | 2.5 | 0.032 | 0.017 |
212550_at | signal transducer and activator of transcription 5B | STAT5B | 0.11 | 1.03 | 0.4314 | 2.15 | 0.046 | 0.022 |
212549_at | signal transducer and activator of transcription 5B | STAT5B | 0.15 | 1.05 | 0.1746 | 2.15 | 0.046 | 0.022 |
205026_at | signal transducer and activator of transcription 5B | STAT5B | 0.05 | 0.94 | 0.4379 | 2.15 | 0.046 | 0.022 |
203541_s_at | basic transcription element binding protein 1 | BTEB1 | 0.04 | 1.16 | 0.3699 | 20.6 | 0.023 | 0.008 |
203542_s_at | basic transcription element binding protein 1 | BTEB1 | 0.05 | 1.17 | 0.1100 | 20.6 | 0.023 | 0.008 |
203543_s_at | basic transcription element binding protein 1 | BTEB1 | -0.07 | 0.98 | 0.4952 | 20.6 | 0.023 | 0.008 |
209212_s_at | Kruppel-like factor 5 | KLF5 | 0.13 | 1.21 | 0.0983 | 20.6 | 0.023 | 0.008 |
202308_at | sterol regulatory element binding transcr. factor 1 | SREBF1 | 0.04 | 1.19 | 0.3490 | 5.41 | 0.011 | 0.014 |
202308_at | sterol regulatory element binding transcr. factor 1 | SREBF1 | 0.04 | 1.19 | 0.3490 | 8.54 | 0.009 | 0.005 |
203822_s_at | E74-like factor 2 (ets domain transcription factor) | ELF2 | -0.20 | 0.83 | 0.0036 | 7.46 | 0.01 | 0.011 |
210361_s_at | E74-like factor 2 (ets domain transcription factor) | ELF2 | -0.11 | 0.93 | 0.0539 | 7.46 | 0.01 | 0.011 |
203275_at | interferon regulatory factor 2 | IRF2 | -0.05 | 1.17 | 0.3121 | 1.76 | 0.013 | 0.026 |
202531_at | interferon regulatory factor 1 | IRF1 | 0.18 | 1.34 | 0.0150 | 4.98 | 0.015 | 0.009 |
* P-value from over-representation compared to "all" ~20,000 human promoter sequences
+ P-value from over-representation compared to promoter sequences of "other" 34 under-expressed genes
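As an illustration of how the table can be read, the sketch below selects the probe sets whose R/S t-test P-value is below 0.05 and reports the direction of the expression difference. The probe set IDs, gene symbols, R/S ratios and P-values are copied from the table; the variable names and the tuple layout are assumptions made for this example.

```python
# (probe set ID, gene symbol, R/S ratio, P-value of the R/S t-test),
# copied from the table above.
rows = [
    ("203010_at", "STAT5A", 1.19, 0.0743),
    ("212550_at", "STAT5B", 1.03, 0.4314),
    ("212549_at", "STAT5B", 1.05, 0.1746),
    ("205026_at", "STAT5B", 0.94, 0.4379),
    ("203541_s_at", "BTEB1", 1.16, 0.3699),
    ("203542_s_at", "BTEB1", 1.17, 0.1100),
    ("203543_s_at", "BTEB1", 0.98, 0.4952),
    ("209212_s_at", "KLF5", 1.21, 0.0983),
    ("202308_at", "SREBF1", 1.19, 0.3490),
    ("202308_at", "SREBF1", 1.19, 0.3490),
    ("203822_s_at", "ELF2", 0.83, 0.0036),
    ("210361_s_at", "ELF2", 0.93, 0.0539),
    ("203275_at", "IRF2", 1.17, 0.3121),
    ("202531_at", "IRF1", 1.34, 0.0150),
]

ALPHA = 0.05  # significance threshold used in the text

# Keep probe sets with a significant R/S difference; a ratio above 1
# means higher expression in cross-resistant ALL.
significant = [
    (probe, gene, "higher" if ratio > 1 else "lower")
    for probe, gene, ratio, p_value in rows
    if p_value < ALPHA
]

for probe, gene, direction in significant:
    print(f"{gene} ({probe}): {direction} in cross-resistant ALL")
```

With the values above, only one ELF2 probe set (lower) and the IRF1 probe set (higher) pass the threshold, in line with the two factors named in the text.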