Problem finding a sequence in mouse DNA even though BLAST finds it
4.9 years ago
juanjo75es ▴ 130

I downloaded an alignment of 35 mammals. I selected a random fragment. In the case of mouse it corresponded to this section:

>mus_musculus_2_9117025_9127689_-1__chr_length=182113224_
TTATTTTGGATAAATACATTAAAAATTTAAATTTAGTTATTGTTAGTACTTGGATAAGTAGGAT

When I look for that sequence (just the first line of it) using Ensembl BLAST, it finds it (although not at that position), but when I download the data for that region (whether I use Ensembl or NCBI) the sequence does not match. I then downloaded the full chromosome 2 for the mouse reference genome, then made a search of that sequence and it doesn't appear anywhere. Not even close. What am I missing?

I need this because I want to extract the alignment for that section from other species. I tried to find a local alignment for the full sequence, but the result was terrible. Then I tried with the mouse and the same happened. Then I realized that the sequence used in the 35-mammal alignment apparently does not exist in the mouse genome, even though BLAST finds it... I am lost. Any help would be appreciated.

sequence alignment • 903 views

This sequence does exist in the mouse genome:

Mus musculus strain C57BL/6J chromosome 2, GRCm38.p4 C57BL/6J
Sequence ID: NC_000068.7    Length: 182113224   Number of Matches: 1
Range 1: 9127626 to 9127689
Alignment statistics for match #1 Score Expect  Identities  Gaps    Strand
119 bits(64)    1e-25   64/64(100%)     0/64(0%)    Plus/Minus

Query  1        TTATTTTGGATAAATACATTAAAAATTTAAATTTAGTTATTGTTAGTACTTGGATAAGTA  60
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  9127689  TTATTTTGGATAAATACATTAAAAATTTAAATTTAGTTATTGTTAGTACTTGGATAAGTA  9127630

Query  61       GGAT  64
                ||||
Sbjct  9127629  GGAT  9127626

"then made a search of that sequence and it doesn't appear anywhere. Not even close."

How did you search the sequence?

"What am I missing?"

How To Ask Good Questions On Technical And Scientific Forums


Hi h.mon. As I said, I already found that sequence using BLAST, on the same screen that you posted. The problem is when I try to download that sequence and the surrounding area. The sequence is not what I get if I use these positions as parameters in the Ensembl browser. What I get is this:

>2 dna:chromosome chromosome:GRCm38:2:9127626:9127689:1
ATCCTACTTATCCAAGTACTAACAATAACTAAATTTAAATTTTTAATGTATTTATCCAAAATAA

As I also said, I downloaded the full chromosome 2 sequence from an FTP site (in fact two versions from two FTP sites), searched it with a text editor, and that sequence does not appear. Sorry if I am not accurate with the terminology; I am a newbie in bioinformatics, but I have extensive experience asking technical questions in other technical fields. I don't see any problem with the question, but if there is one please share your impressions.


Just be more detailed. First and foremost, you should have shown the sequence you found.

But you should also have said from the beginning how you downloaded the particular region (using Ensembl BioMart? etc.), how you searched for the sequence, whether you used local BLAST or the NCBI (or Ensembl) BLAST server, and so on. Generally speaking, it is also a good idea to paste the exact commands you used.

When you do this, people have more information and are able to provide more detailed, higher-quality suggestions. For example, although very tempting (I do this myself), searching for a pattern in a FASTA file with a text editor is not advisable, because line wrapping can result in false negatives. I would have advised you to perform a local BLAST search against the downloaded chr2, or to use BBDuk from the BBTools package:

bbduk.sh in=chr2.fasta literal=ATCCTACTTATCCAAGTACTAACAATAACTAAATTTAAATTTTTAATGTATTTATCCAAAATAA

Both programs search both strands, so they find matches on the opposite strand, which a text editor does not search automatically.
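For illustration only, here is a minimal Python sketch of the two pitfalls mentioned above: it joins the wrapped FASTA lines before searching, and it looks for both the pattern and its reverse complement. The function names (`read_fasta`, `find_both_strands`) and the toy record are made up for this example; it is not what BLAST or BBDuk do internally, and it only finds exact matches.

```python
import io

# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(handle):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_both_strands(handle, pattern):
    """Return (header, 1-based start, strand) for every exact match.

    Start positions are always given on the forward strand of the record.
    A pattern equal to its own reverse complement is reported on both strands.
    """
    pattern = pattern.upper()
    targets = {"+": pattern, "-": reverse_complement(pattern)}
    hits = []
    for header, seq in read_fasta(handle):
        seq = seq.upper()  # soft-masked (lower-case) bases should still match
        for strand, target in targets.items():
            pos = seq.find(target)
            while pos != -1:
                hits.append((header, pos + 1, strand))
                pos = seq.find(target, pos + 1)
    return hits


# Toy record wrapped at 10 bases per line; the match spans the line break
# and lies on the minus strand, so a plain text search would miss it.
toy_fasta = ">toy\nGGGATCCTAC\nTTATCCAAAA\n"
print(find_both_strands(io.StringIO(toy_fasta), "TAAGTAGGAT"))
# prints [('toy', 4, '-')]
```

On a real file the handle would come from something like `open("chr2.fasta")`. Note that this sketch holds each whole record in memory, which is usually acceptable for a single mouse chromosome but is an assumption about the machine it runs on.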

4.9 years ago
h.mon 35k

The sequence you are interested in is there; it is just on the opposite strand, so you have to reverse-complement it:

The Sequence Manipulation Suite: Reverse Complement
Results for 64 residue sequence "2 dna:chromosome chromosome:GRCm38:2:9127626:9127689:1" starting "ATCCTACTTA".

TTATTTTGGATAAATACATTAAAAATTTAAATTTAGTTATTGTTAGTACTTGGATAAGTAGGAT
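
The same check can be done locally instead of with a web tool. A small Python sketch, assuming only the two sequences posted in this thread (the helper name `reverse_complement` is just for this example, and it handles only A, C, G and T):

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


# Forward-strand sequence returned by Ensembl for 2:9127626-9127689.
ensembl_forward = "ATCCTACTTATCCAAGTACTAACAATAACTAAATTTAAATTTTTAATGTATTTATCCAAAATAA"

# First line of the mouse fragment from the 35-mammal alignment.
alignment_query = "TTATTTTGGATAAATACATTAAAAATTTAAATTTAGTTATTGTTAGTACTTGGATAAGTAGGAT"

print(reverse_complement(ensembl_forward) == alignment_query)
# prints True
```

This is consistent with the Plus/Minus strand in the BLAST hit, and presumably with the trailing -1 in the alignment header, if that field is the strand.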
