Extracting Multiple Sequences From a Large FASTA File Using Java
11.0 years ago
J.Ashley ▴ 10

Hi, I really need some help with my second problem. I have a large FASTA file that contains over 300 sequences. I need to find every sequence in the file that contains the zinc finger consensus sequence C-x2-C-x15-C-x2-C, in other words: C, then 2 letters of any type, then C, then 15 letters of any type, then C, then 2 letters of any type, then C.

In the output file I need to print the title line, the zinc finger, and then the sequence itself.

Here is what I have so far:

import java.io.*;
import java.util.*;

public class test {
    public static void main(String[] args) throws IOException {
        Scanner input = new Scanner(System.in);

        System.out.print("Enter the name of the sequence file: ");
        String fileName = input.nextLine();
        int count = 0;
        BufferedReader bf = null;
        try {
            bf = new BufferedReader(new FileReader(fileName));
            String line;
            while ((line = bf.readLine()) != null) {
                // A title line starts with '>'; count each one as a record.
                if (line.startsWith(">")) count++;
            }
        } catch (FileNotFoundException e) {
            System.out.println("File: " + fileName + " does not exist!");
        } finally {
            if (bf != null) {
                bf.close();
            }
        }
    }
}

After this I get completely confused. I know how to print out sequences from the file, but I have no idea how to print out only the sequences of the type described above. Any help is greatly appreciated.

java fasta multiple • 6.2k views
11.0 years ago

Here is my Saturday-Night-Fever solution.

public class Biostar68459
    {
    public static void main(String args[]) throws java.io.IOException
        {
        // C-x2-C-x15-C-x2-C, in upper or lower case
        java.util.regex.Pattern pattern = java.util.regex.Pattern.compile("[Cc].{2}[Cc].{15}[Cc].{2}[Cc]");
        StringBuilder name=new StringBuilder();
        StringBuilder sequence=new StringBuilder();

        // read the FASTA from standard input, one character at a time
        for(;;)
            {
            int c=System.in.read();
            switch(c)
                {
                case -1:
                case '>':
                    {
                    // end of file or start of a new record:
                    // print the previous record if it contains the motif
                    if(pattern.matcher(sequence).find())
                        {
                        System.out.print(">"+name);
                        for(int i=0;i< sequence.length();++i)
                            {
                            // start a new line every 60 residues
                            if(i%60==0) System.out.println();
                            System.out.print(sequence.charAt(i));
                            }
                        System.out.println();
                        }
                    if(c==-1) return;
                    // reset the buffers and read the new title line
                    name.setLength(0);
                    sequence.setLength(0);
                    while((c=System.in.read())!=-1 && c!='\n') name.append((char)c);
                    break;
                    }
                // ignore whitespace inside the sequence
                case '\n':
                case ' ':
                case '\r':  break;
                default: sequence.append((char)c);break;
                }
            }
        }
    }


$ curl -s  "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=475808216&rettype=fasta" | java Biostar68459  | head
>gi|475808216|ref|NM_001277403.1| Homo sapiens zinc finger protein 730 (ZNF730), mRNA
AATCAGGCCCGCAGCTGGAGCAGACAGGGCGGCTTCCGGGATTTGGCGCGGCCTTTGTTT
CTCGCTGCCGCCGAAGCTCCAATTTTCGTCTGTCTGCTTTGTGTCCTCTGCACGTAGAAG
CCCAGCCTGTGTGGCCCTGCGACCTGCGGGTATTGGGAGATCCACAGCTAAGACGCCAGG
GCCCCCTGGAAGCCTAGAAATGGGAGCGTTGACATTTAGAGATGTGGCCATAGAATTCTC
TCTGGAGGAGTGGCAATGTCTGGACACCGAACAACAGAATTTATATAGAAATGTAATGTT
AGATAACTACAGAAACCTGGTCTTCCTGGGTATTGCTGTCTCAAAGCCAGACCTGATCAC
CTGTCTGGAGCAAGAAAAAGAGCCTTGGAATTTGAAGACACATGATATGGTAGCCAAACC
CCCAGTTATATGTTCTCATATTGCCCAAGACCTTTGGCCAGAGCAAGGCATAAAAGATTA
TTTCCAAGAAGTCATACTGAGACAATATAAAAAATGTAGACATGAGAATTTACTGTTAAG
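
The program above prints the title and the sequence of each matching record, but not the zinc finger itself, which the question also asks for. A minimal sketch of one way to add that is shown below, assuming the input is an ordinary multi-line FASTA file given as the first argument (or piped on standard input). The class name ZincFingerReport, the helper names findMotifs and report, and the "zinc finger: " label are all made up for this illustration; the pattern is the one used above.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch, not part of the original answer.
public class ZincFingerReport {
    // C-x2-C-x15-C-x2-C in upper or lower case (same pattern as above).
    private static final Pattern ZINC_FINGER =
            Pattern.compile("[Cc].{2}[Cc].{15}[Cc].{2}[Cc]");

    /** Returns the non-overlapping matches of the motif in one sequence. */
    static List<String> findMotifs(CharSequence sequence) {
        List<String> motifs = new ArrayList<>();
        Matcher m = ZINC_FINGER.matcher(sequence);
        while (m.find()) {
            motifs.add(m.group());
        }
        return motifs;
    }

    /** Formats title, motif(s) and sequence; returns "" if there is no motif. */
    static String report(String title, CharSequence sequence) {
        List<String> motifs = findMotifs(sequence);
        if (motifs.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        out.append('>').append(title).append('\n');
        for (String motif : motifs) {
            // The label is an assumption; use whatever output format you need.
            out.append("zinc finger: ").append(motif).append('\n');
        }
        // Wrap the sequence at 60 residues per line.
        for (int i = 0; i < sequence.length(); i += 60) {
            out.append(sequence, i, Math.min(i + 60, sequence.length())).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader source = args.length > 0
                ? new FileReader(args[0])
                : new InputStreamReader(System.in);
        try (BufferedReader in = new BufferedReader(source)) {
            String title = null;
            StringBuilder sequence = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    // A new record starts: report the previous one first.
                    if (title != null) {
                        System.out.print(report(title, sequence));
                    }
                    title = line.substring(1);
                    sequence.setLength(0);
                } else {
                    sequence.append(line.trim());
                }
            }
            // Report the last record in the file.
            if (title != null) {
                System.out.print(report(title, sequence));
            }
        }
    }
}
```

It can be run as "java ZincFingerReport sequences.fasta > hits.txt", where both file names are placeholders. Note that Matcher.find() continues after the end of the previous hit, so overlapping zinc fingers are reported only once.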

You're a generous person! Maybe I'm too pessimistic, but this question really sounds like a homework problem, and the "what I have so far" really seems like skeleton code from a problem statement.

I would've just given vague pointers to consider using regular expressions, since even that tidbit wasn't present in the question.

But maybe I'm wrong...


You're right. But he provided source code, as if he had really tried to solve the problem, and ... I was looking for something fun to do before switching off my laptop :-)


Ahh, I see, so you use the compile method! Thank you so much for your help. I will try this out, test it, and see what happens. Thanks again; sometimes it just takes an example to get you going!
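
For anyone else who is new to the compile method: java.util.regex.Pattern.compile turns the regular expression into a reusable Pattern, and each call to matcher() gives a Matcher that can report where a hit starts and ends. A small sketch follows; the class name CompileDemo and the helper firstHit are invented for this example, and Pattern.CASE_INSENSITIVE is used here in place of the [Cc] character classes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not taken from the thread.
public class CompileDemo {
    // Compile once and reuse; a Pattern is immutable and safe to share.
    static final Pattern ZINC_FINGER =
            Pattern.compile("C.{2}C.{15}C.{2}C", Pattern.CASE_INSENSITIVE);

    /** Returns "start-end:match" (1-based, inclusive) for the first hit, or "none". */
    static String firstHit(String sequence) {
        Matcher m = ZINC_FINGER.matcher(sequence);
        if (!m.find()) {
            return "none";
        }
        // start() is 0-based and end() is exclusive, hence the +1 on start only.
        return (m.start() + 1) + "-" + m.end() + ":" + m.group();
    }

    public static void main(String[] args) {
        for (String sequence : args) {
            System.out.println(firstHit(sequence));
        }
    }
}
```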
