Question

how to create a fasta file with subset of sequences using seqinr

0

Entering edit mode

5.3 years ago

Camilla ▴ 10

Hi everyone, I am very new in environment R. I am trying to obtain a final fasta file selecting a subset of sequences from a downloaded fasta file, using seqinr. For this purpose I type the following script:

 myfasta<- read.fasta(file = "MRP2.fas", seqtype = "AA",as.string = TRUE, set.attributes = FALSE)
  subsetlist<-read.table("list.txt", header=TRUE)
  myfasta[names(myfasta) %in% subsetlist$id]
  write.fasta(sequences = myfasta, names = names(myfasta), nbchar = 80, file.out = "parsedMRP2.fasta")
  parsedMRP2<- read.fasta("parsedMRP2.fasta", set.attributes = FALSE)"

The MRP2.fas contains 5039 sequences, while the list.txt is a list of 3000 sequences. When I run my script, I obtain final file (parsedMRP2.fasta) always with 5039 sequences instead of the 3000 sequences that I indicated in the list.txt.

Could you please help me? thank you very much in advance

R seqin fasta sequences • 6.1k views

ADD COMMENT • link updated 4.9 years ago by Biostar 20 • written 5.3 years ago by Camilla ▴ 10

1

Entering edit mode

If you need an alternative then using faSomeRecords from Jim Kent's utilities is a great/fast option. Linux version is linked but macOS available as well. After downloading do not forget to add execute permissions (chmod a+x faSomeRecords).

faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.

ADD REPLY • link 5.3 years ago by GenoMax 141k

score 2 · Accepted Answer · 2019-01-14

2

Entering edit mode

5.3 years ago

Chirag Parsania ★ 2.0k

Because you are not saving data in new object. Before you write in the file, you need to save in another object. See the change below.

myfasta<- read.fasta(file = "MRP2.fas", seqtype = "AA",as.string = TRUE, set.attributes = FALSE)
subsetlist<-read.table("list.txt", header=TRUE)

my_fasta_sub <- myfasta[names(myfasta) %in% subsetlist$id]
write.fasta(sequences = my_fasta_sub, names = names(my_fasta_sub), nbchar = 80, file.out = "parsedMRP2.fasta")

ADD COMMENT • link 5.3 years ago by Chirag Parsania ★ 2.0k

0

Entering edit mode

Thank you very much Chirag for the solution ! Now it works!

ADD REPLY • link 5.3 years ago by Camilla ▴ 10

0

Entering edit mode

Hi. I would like to ask a similar question: I have a Fasta alignment, that I have created a object with: myotis <- read.dna("epfu.fas",format= "fasta") I am interested in using bootstrap to estimate confidence intervals, if it is actually possible. I would like to resample with replacement sequences from that alignment, and then using the function tajimaD from the package strataG or the function tajima.test$D from the package pegas.

ADD REPLY • link 4.9 years ago by ja569116 • 0