Hi,
I have a file of IDs that I want to compare against another file in order to retrieve the sequence for each ID.
File 1
CCDS2.2
CCDS3.1
CCDS30550.1
CCDS30551.1
File 2
>CCDS2.2|Hs37.3|chr1
MSKGILQVHPPICDCPGCRISSPVNRGRLADKRTVALPAARNLKKERTPSFSASDGDSDG
SGPTCGRRPGLKQEDGPHIRIMKRRVHTHWDVNISFREASCSQDGNLPTLISSVHRSRHL
VMPEHQSRCEFQRGSLEIGLRPAGDLLGKRLGRSPRISSDCFSEKRA
>CCDS3.1|Hs37.3|chr1
MAAAGSRKRRLAELTVDEFLASGFDSESESESENSPQAETREAREAARSPDKPGGSPSAS
RRKGRASEHKDQLSRLKDRDPEFYKFLQENDQSLLNFSDSDSSEEEEGPFHSLPDVLEEA
SEEEDGAEEGEDGDRVPRGLKGKKNSVPVTVAMVERWKQAAKQRLTPKLFHEVVQAFRAA
VATTRGDQESAEANKFQVTDSAAFNALVTFCIRDLIGCLQKLLFGKVA.
>CCDS4.1|Hs37.3|chr1
MGNSHCVPQAPRRLRASFSRKPSLKGNREDSARMSAGLPGPEAARSGDAAANKLFHYIPG
TDILDLENQRENLEQPFLSVFKKGRRRVPVRNLGKVVHYAKVQLRFQHSQDVSDCYLELF
PAHLYFQAHGSEGLTFQGLLPLTELSVCPLEGSREHAFQITGPLPAP
I want to compare these two files by matching each ID in the first file against the ID encoded in the header lines of the second file (>CCDS#). If they match, print the complete sequence.
For example, CCDS2.2 and CCDS3.1 are found in both the first file and the second file, so the output should look something like this:
Expected output
column1 column2
CCDS2.2 >CCDS2.2|Hs37.3|chr1
MSKGILQVHPPICDCPGCRISSPVNRGRLADKRTVALPAARNLKKERTPSFSASDGDSDG
SGPTCGRRPGLKQEDGPHIRIMKRRVHTHWDVNISFREASCSQDGNLPTLISSVHRSRHL
VMPEHQSRCEFQRGSLEIGLRPAGDLLGKRLGRSPRISSDCFSEKRA
CCDS3.1 >CCDS3.1|Hs37.3|chr1
MAAAGSRKRRLAELTVDEFLASGFDSESESESENSPQAETREAREAARSPDKPGGSPSAS
RRKGRASEHKDQLSRLKDRDPEFYKFLQENDQSLLNFSDSDSSEEEEGPFHSLPDVLEEA
SEEEDGAEEGEDGDRVPRGLKGKKNSVPVTVAMVERWKQAAKQRLTPKLFHEVVQAFRAA
VATTRGDQESAEANKFQVTDSAAFNALVTFCIRDLIGCLQKLLFGKVA
CCDS30550.1 NULL
CCDS30551.1 NULL
Can this be done using awk or sed?
Thank you,
Nandini
This question falls into the general category "I want to parse a FASTA file and do something with it." awk/sed are unlikely to cut it here. As a.zielezinski suggests below, you need to learn the libraries used to parse sequence formats. Any of the Bio* projects (BioPerl, Biopython, BioRuby, BioJava...) will do this.
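To make the parse-then-look-up idea concrete, here is a minimal sketch in Python. It uses only the standard library so that it is self-contained; in practice a parser such as Biopython's `SeqIO.parse(handle, "fasta")` would replace the hand-written `read_fasta` helper. The function names (`read_fasta`, `lookup_sequences`) and the shortened demo sequences are illustrative assumptions, not part of the question. It also assumes the ID is the text between `>` and the first `|` in each header.

```python
import io


def read_fasta(handle):
    """Yield (header_line, list_of_sequence_lines) for each FASTA record."""
    header, lines = None, []
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, lines
            header, lines = line, []
        elif line and header is not None:
            lines.append(line)
    if header is not None:
        yield header, lines


def lookup_sequences(id_handle, fasta_handle):
    """Return output lines: 'ID header' plus sequence lines, or 'ID NULL'."""
    # Index every record by its ID (assumed: text between '>' and first '|').
    records = {}
    for header, lines in read_fasta(fasta_handle):
        seq_id = header[1:].split("|", 1)[0]
        records[seq_id] = (header, lines)

    output = []
    for raw in id_handle:
        wanted = raw.strip()
        if not wanted:
            continue  # skip blank lines in the ID file
        if wanted in records:
            header, lines = records[wanted]
            output.append(f"{wanted} {header}")
            output.extend(lines)
        else:
            output.append(f"{wanted} NULL")
    return output


if __name__ == "__main__":
    # Shortened toy data standing in for File 1 and File 2.
    ids = io.StringIO("CCDS2.2\nCCDS30550.1\n")
    fasta = io.StringIO(
        ">CCDS2.2|Hs37.3|chr1\nMSKGILQVHP\nSGPTCGRRPG\n"
        ">CCDS4.1|Hs37.3|chr1\nMGNSHCVPQA\n"
    )
    print("\n".join(lookup_sequences(ids, fasta)))
```

With real files, the two `io.StringIO` objects would be replaced by opened file handles, for example `open("file1.txt")` and `open("file2.fasta")` (hypothetical file names). Building a dictionary first keeps the lookup to a single pass over each file, and the output follows the order of the ID file.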
Also, for the record, I would suggest making the title of the question a bit more specific. You will get good answers if you ask the right questions.