EMS: Evolution-based Motif Search

Brief Introduction

EMS scans for conserved cis-regulatory elements in human and mouse based on cross-species comparison. Sequences and alignments of all human and mouse orthologous gene pairs are stored for searching. The list of human/mouse orthologous genes was generated from NCBI Homologene (version: May 2004). From the UCSC goldenpath database (human: July 2003 (hg16), 21231 Refseq genes; mouse: Feb. 2003 (mm4), 17724 Refseq genes), only full-length genes whose transcription start sites lie upstream of their translation start sites are kept. In the end, 11836 pairs of orthologous genes appear in both datasets. We then extracted their RepeatMasked 10-kb regions (upstream or downstream) and aligned them using the standalone AVID tool. (See the Flowchart)

Given a user-defined matrix (or a motif in another format, such as Fasta), which may be derived from a cluster of genes with the same expression pattern in microarray analyses, EMS first scans all 11836 upstream regions of human genes for significant candidates, then checks the corresponding mouse regions, and finally reports the conserved hits (See the paper).

The EMS website at St. Jude uses the Hartwell Center's 280-CPU Linux cluster to perform motif scans in parallel. With this computational power, a full search takes only around 2 minutes. A human chromosome viewer is also available to show the genomic location of hits.
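The scan-then-confirm idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the EMS implementation: the log-odds scoring, the pseudocount, the fixed score threshold, and the function names `log_odds`, `score`, and `conserved_hits` are all hypothetical, and the sketch simply compares the same alignment columns in the human and mouse rows of a pairwise alignment.

```python
import math

BASES = "ACGT"

def log_odds(counts, pseudo=0.5, background=0.25):
    """Turn per-position counts (A, C, G, T) into log-odds scores.
    The pseudocount and uniform background are assumptions."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudo
        pwm.append({b: math.log2(((c + pseudo) / total) / background)
                    for b, c in zip(BASES, row)})
    return pwm

def score(pwm, site):
    """Score one window; return None if it contains a gap or masked base."""
    total = 0.0
    for col, base in zip(pwm, site.upper()):
        if base not in col:
            return None
        total += col[base]
    return total

def conserved_hits(pwm, human_aln, mouse_aln, threshold):
    """Scan the human row of a pairwise alignment and keep only windows
    whose aligned mouse segment also reaches the (hypothetical) threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(human_aln) - width + 1):
        h = score(pwm, human_aln[i:i + width])
        if h is None or h < threshold:
            continue
        m = score(pwm, mouse_aln[i:i + width])
        if m is not None and m >= threshold:
            hits.append((i, round(h, 2), round(m, 2)))
    return hits
```

Scanning human first and checking mouse only for candidates keeps the second pass cheap, which mirrors the order of steps described above; how EMS actually assesses significance is not specified here.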
Query Submission (motif width >= 6)

See an example, or see precomputed motifs from Transfac.

Input: a matrix, an alignment (raw), or an alignment (Fasta)
Region: upstream 10k or downstream 10k
Parameters: show top hits (max: 100)
Format example

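Any of the three formats below can be used as input. In the matrix format, each row gives the position, four scores (for A, C, G and T, as the consensus letters indicate), and a consensus letter. The four scores in every row must add up to the same total: for example, the sum for line 01 (6 + 0 + 1 + 0 = 7) equals the sum for line 02 (1 + 0 + 6 + 0 = 7).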
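The row-sum rule can be checked before submitting a matrix. The snippet below is a minimal sketch, assuming the six-field row layout shown in the matrix example (position, four scores, consensus letter); the helper names `parse_matrix` and `is_valid_matrix` are illustrative and not part of EMS.

```python
def parse_matrix(text):
    """Read matrix rows of the form 'position s1 s2 s3 s4 consensus'
    and return the four integer scores of each row."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        rows.append([int(x) for x in fields[1:5]])
    return rows

def is_valid_matrix(rows, min_width=6):
    """Check the two constraints stated on this page: the motif is at
    least 6 positions wide and every row has the same score total."""
    same_total = len({sum(r) for r in rows}) == 1
    return len(rows) >= min_width and same_total
```

For instance, `parse_matrix("01 6 0 1 0 A\n02 1 0 6 0 G")` yields two rows that both sum to 7, but `is_valid_matrix` still rejects them because the motif is narrower than 6 positions.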


1. Matrix (TransFac)

01 6 0 1 0 A
02 1 0 6 0 G
03 3 0 0 4 W
04 5 0 2 0 A
05 0 5 2 0 C
06 4 1 1 1 A
07 0 1 1 5 T
08 1 1 3 2 N
09 3 1 0 3 W
10 2 0 0 5 T
11 0 0 7 0 G
12 0 0 0 7 T
13 2 0 0 5 T
14 0 7 0 0 C
15 1 0 0 6 T

2. alignment (raw)

ggaacagagtgtact
ggaacattatgttct
gggacacggtgtgct
ttcacatgatgttcc
ggaccgatgagtcct
ggcacatggtgtaca
gggtcagggtgttcc
ggaacgtgatgttct
ggttcacgatgtaat
gggactggacgttat
ggtacggactgtgct
ggatcaggacgttct
gggacaggctgtgct
agcacgacgcgttct
ggttcatagtgatcc

3. alignment (Fasta)

>seq1
ggaacagagtgtact
>seq2
ggaacattatgttct
>seq3
gggacacggtgtgct
>seq4
ttcacatgatgttcc
>seq5
ggaccgatgagtcct
>seq6
ggcacatggtgtaca
>seq12
gggtcagggtgttcc
>seq11
ggaacgtgatgttct
>seq13
ggttcacgatgtaat
>seq14
gggactggacgttat
>seq17
ggtacggactgtgct
>seq21
ggatcaggacgttct
>seq29
gggacaggctgtgct
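
The alignment formats and the matrix format are related: counting the bases in each column of a set of aligned sites gives a count matrix in which every row sums to the number of sites. The sketch below illustrates that conversion. It assumes one sequence per line (as in the examples above) and an A, C, G, T column order; the function names `read_sites` and `count_matrix` are hypothetical, and how EMS itself converts alignments is not described on this page.

```python
def read_sites(text):
    """Accept raw (one site per line) or Fasta input.
    Lines starting with '>' are treated as Fasta headers and skipped."""
    return [ln.strip().upper() for ln in text.splitlines()
            if ln.strip() and not ln.startswith(">")]

def count_matrix(sites):
    """Count A, C, G, T in each alignment column.
    Each row sums to len(sites) when only those four bases occur."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all aligned sites must have the same width")
    return [[sum(s[i] == b for s in sites) for b in "ACGT"]
            for i in range(width)]
```

Because every site contributes exactly one base per column, a matrix built this way satisfies the equal-row-sum requirement automatically.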


Contact Us
Tao.Xie@StJude.org    (Dr. Tao Xie)

Reference

EMS has not been published yet, but has been presented at a conference:

Xie T, Lin B, Hood L, and Naeve C. EMS: Evolution-based Motif Search. Poster, Molecular Medicine Tri-Conference, San Francisco, California, USA, Mar. 23-26, 2004.


Thanks
TFBS for drawing sequence logos!


Additional Links
ConSite   Footprinter   rVISTA


Last updated on: May 26, 2004