Download EST spliced alignments UCSC
6.1 years ago

Hi,

I would like to download the EST alignments shown in the UCSC Genome Browser for a specific region. When I click an EST in the browser, a new window opens with detailed information and a link called "View details of parts of alignment within browser window". Following that link leads to a "side by side alignment", which is the section I am interested in. Is there a programmatic way to get that information without having to click on every single EST?

Thanks

sequence alignment UCSC EST • 1.1k views
6.1 years ago
genecats.ucsc ▴ 580

It is possible to generate those alignments with the pslPretty utility, available from our list of utilities:
http://hgdownload.soe.ucsc.edu/admin/exe

Here is an example where I also illustrate some other useful commands, faSomeRecords and pslSomeRecords, which are also available from the same directory listed above:

# download everything 
$ wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit
$ wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/all_est.txt.gz
$ wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/est.fa.gz

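# format: drop the leading "bin" column of the table dump to get a standard PSL file, then unpack the EST sequences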
$ gzip -cd all_est.txt.gz | cut -f2- > all_ext.psl
$ gzip -d est.fa.gz

# small example psl:
$ echo "BX437773" | pslSomeRecords all_ext.psl stdin onePsl.psl
$ echo "BX437773" | faSomeRecords est.fa stdin out.fa

# now run pslPretty
$ pslPretty onePsl.psl hg38.2bit out.fa pretty.out
$ cat pretty.out
>BX437773:0-883 of 897 chr1:11130551+11145019 of 248956422
gcgat-gggt-gggctgttctcgg.....75......cNNtggtggcgttgttctgttgN
||||| |||| |||||||||||||             |  ||||||||||| |    |  
GCGATGGGGTGGGGCTGTTCTCGG.....75......cagtggtggcgTTGGTGATGTTG

cccNgaaNggcctNccgccNatacttcttctc-NttNgcgggcttgNttctgatNtttNt
 ||     ||| |  |||   | |||||||||   | ||||||||| ||||||| ||| |
GCCCCGCTGGCATGACGCAGTTTCTTCTTCTCA--TCGCGGGCTTGGTTCTGATGTTTGT

NgtgtNgccccgattcgaagttcatcactgcccacgcatgccagNc-----2302-----
 |||| || | | ||||||||||||||||||||||||||||||| |              
AGTGTAGCACAGCTTCGAAGTTCATCACTGCCCACGCATGCCAGGCCTGGTT...GATCA
...
...
...
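
Since the question asks about a specific region rather than a single EST, the same steps can probably be run on all alignments overlapping that region. Below is a minimal sketch, not an official UCSC recipe: `psl_in_region` is a hypothetical helper name, and it assumes the 21-column PSL layout left after the `cut -f2-` step above, where column 10 is the query name, column 14 the target name, and columns 16 and 17 the target start and end (0-based, half-open).

```shell
# Hypothetical helper (not a UCSC utility): keep PSL rows overlapping a region.
# Assumes a 21-column PSL with the bin column already removed:
#   column 14 = tName, column 16 = tStart, column 17 = tEnd
psl_in_region() {
    # usage: psl_in_region in.psl chrom start end
    awk -F'\t' -v c="$2" -v s="$3" -v e="$4" \
        '$14 == c && $16 < e && $17 > s' "$1"
}

# Possible usage with the files built above; the coordinates and output
# file names are placeholders chosen for illustration:
#   psl_in_region all_ext.psl chr1 11130000 11146000 > region.psl
#   cut -f10 region.psl | sort -u > names.txt
#   faSomeRecords est.fa names.txt region.fa
#   pslPretty region.psl hg38.2bit region.fa region.pretty.out
```

Filtering on a single chromosome also keeps the PSL to one target, which sidesteps the slowdown that the pslPretty usage message warns about for unsorted multi-target files.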

All 3 utils can be run with no arguments in order to get a usage message:

$ pslPretty 
pslPretty - Convert PSL to human-readable output
usage:
   pslPretty in.psl target.lst query.lst pretty.out
options:
   -axt             Save in format like Scott Schwartz's axt format.
                    Note gaps in both sequences are still allowed in the
                    output, which not all axt readers will expect.
   -dot=N           Output a dot every N records.
   -long            Don't abbreviate long inserts.
   -check=fileName  Output alignment checks to filename.
It's recommended that the psl file be sorted by target if it contains
multiple targets; otherwise, this will be extremely slow. The target and query
lists can be fasta, 2bit or nib files, or a list of these files, one per line.

If you have further questions about UCSC data or tools, feel free to send them to one of the mailing lists below:

  • General questions: genome@soe.ucsc.edu
  • Questions involving private data: genome-www@soe.ucsc.edu
  • Questions involving mirror sites: genome-mirror@soe.ucsc.edu

ChrisL from the UCSC Genome Browser
