read FASTA file in R, including description information
8.2 years ago
friasoler ▴ 50

Hello everybody

I'm trying to import a *.fna file into R, and I have no idea how to deal with it. I need columns like these: Isogroup, Contigs, Length, Sequence. Since one isogroup contains more than one contig, the isogroup values will be repeated, but that is exactly what I want. I would appreciate your help. An example file is included. Each line ends with a paragraph mark (line break).

Regards
Roberto

>contig00002 gene=isogroup00001 length=656
CTAATCCTCAAACCCGAACTCATCTCAGGCCTCCCTATATGCAAGTATAGTTTCAACCCA
CTCCCCACACCATCAAACATCTCAGCTTGATGAAATTTCGGGTCATTACTGGGCATTTGC
>contig00004 gene=isogroup00001 length=566
CACAGATACAGATGGTTGGGGATGCAACAGTCCTCATCCTACTTCGTAATCGCGGCATTC
>contig00005 gene=isogroup00001 length=1152
CTCCCCACACCATCAAACATCTCAGCTTGATGAAATTTCGGGTCATTACTGGGCATTTGC
>contig00007 gene=isogroup00001 length=547
AACCCAACCAAAGCATTTGCCAGTCCCAGTATAGGCGATAGAAAAGACACCATTGGAGCG
>contig00008 gene=isogroup00001 length=698
AaGGgGGGGGggTGGTTCTCGTAGTTAAATGCTTATAACAGtGGCTTTTCAGGCCGTTGA
>contig00024 gene=isogroup00001 length=2170
CTTGGACGCTCTATTATCCCGTGTGAATCATCCCGTCGTCATTTGTCGGGGCTGGGAGAG
8.2 years ago
Erik Wright ▴ 420

You could use the readDNAStringSet function in the Biostrings package:

library(Biostrings)
dna <- readDNAStringSet("<<PATH TO FASTA FILE>>")
s <- strsplit(names(dna), "[ \t]+") # split names by tab/space
info <- matrix(unlist(s), ncol=3, byrow=TRUE) # one row per record: contig, gene=..., length=...

More text parsing may be added as desired. Hope that helps!


Thanks Erik. I got something, but I cannot get the sequences into the same row as the isogroup and contig information.

8.2 years ago

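Check out the readFASTA() function in the Biostrings Bioconductor package (in current Biostrings releases it has been superseded by readDNAStringSet() and related functions). Some postprocessing will be needed to get a data.frame with the columns as you describe.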
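
One way that postprocessing might look, as a minimal sketch rather than a definitive solution. It assumes every header has exactly the three whitespace-separated fields shown in the question (contig name, gene=..., length=...), in that order. The helper name headers_to_df is made up for illustration and is not part of Biostrings. The two toy records reuse header lines and the first sequence line of each record from the question.

```r
# Hypothetical helper: combine FASTA header lines and sequences into one data.frame.
# Assumes each header looks like "contig00002 gene=isogroup00001 length=656".
headers_to_df <- function(headers, sequences) {
  fields <- strsplit(headers, "[ \t]+")                 # split on spaces/tabs
  contig   <- vapply(fields, `[`, character(1), 1)      # 1st field: contig name
  isogroup <- vapply(fields, `[`, character(1), 2)      # 2nd field: gene=...
  len      <- vapply(fields, `[`, character(1), 3)      # 3rd field: length=...
  data.frame(
    Isogroup = sub("^gene=", "", isogroup),
    Contig   = contig,
    Length   = as.integer(sub("^length=", "", len)),    # length as stated in the header
    Sequence = unname(sequences),
    stringsAsFactors = FALSE
  )
}

# Toy input taken from the question (sequences shortened to their first line)
headers <- c("contig00002 gene=isogroup00001 length=656",
             "contig00004 gene=isogroup00001 length=566")
sequences <- c("CTAATCCTCAAACCCGAACTCATCTCAGGCCTCCCTATATGCAAGTATAGTTTCAACCCA",
               "CACAGATACAGATGGTTGGGGATGCAACAGTCCTCATCCTACTTCGTAATCGCGGCATTC")
df <- headers_to_df(headers, sequences)
print(df[, c("Isogroup", "Contig", "Length")])
```

With a DNAStringSet read by readDNAStringSet(), the same helper could be called as headers_to_df(names(dna), as.character(dna)), which puts each sequence in the same row as its contig and isogroup. Note that Length here is whatever the header states; nchar(df$Sequence) gives the length of the sequence actually read.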
