Question: All results in one file
4 months ago
amandinelecerfdefer • 0

Hello, I want to run the same biomaRt query on several files in one folder (that part works). But I would like the output of each query to be saved in one of two ways: either one output file per input file (1 query = 1 file), or the outputs for all 500 files accumulated in a single file (500 files = 1 output file). Here is the code I am using, but the final file only contains the result of the last query.

library(biomaRt)
files<-list.files(path = "/Users/amandinelecerfdefer/Desktop/poi/data/", pattern = (".txt$"))
files
myList2 <- list()

for (k in 1:length(files)) {
  setwd("/Users/amandinelecerfdefer/Desktop/poi/data/")
  myList2[[k]] <- read.delim(files[k])
  snpmart <-
    useMart(biomart = "ENSEMBL_MART_SNP", dataset="hsapiens_snp")

  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = myList2[[k]]$rsID,
    mart = snpmart,
    uniqueRows = TRUE
  )

  setwd("/Users/amandinelecerfdefer/Desktop/poi/result/")
  # First attempt:
  write.csv(res[[k]], file = "recovery_gene_trans.txt")
  # Second attempt (used instead of the first one):
  # for (k in 1:length(files)) {
  #   setwd("/Users/amandinelecerfdefer/Desktop/poi/result/")
  #   write.csv(res[[k]], file = "recovery_gene_trans.txt")
  # }

}

Either way, I get the same problem.

How can I do this?

4 months ago amandinelecerfdefer • 0 • updated 4 months ago SMK ♦ 1.3k
4 months ago
manuel.belmadani • 830
Canada

The easiest fix is to change the output file name on each iteration, for example:

write.csv(res[[i]], file = paste0("recovery_gene_trans_",k,".txt"))

So for each file k, the output file name will end in _k.txt, and no file gets overwritten.

But what is i here? It shows up in your write.csv call but doesn't seem to be set anywhere beforehand, so if it's always the same, maybe you want to replace it with a constant?
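A minimal sketch of the per-file naming idea, runnable without a BioMart connection. The input names and fake_result() are hypothetical stand-ins for list.files() and the getBM() call, and the files are written to a scratch folder under tempdir() rather than your result folder:

```r
# Hypothetical stand-ins: `files` mimics list.files() output and
# fake_result() mimics one getBM() result per input file.
files <- c("sample_A.txt", "sample_B.txt", "sample_C.txt")
fake_result <- function(k) {
  data.frame(refsnp_id = paste0("rs", k), stringsAsFactors = FALSE)
}

# Write into a scratch folder so the demo doesn't touch real results
out_dir <- file.path(tempdir(), "per_file_demo")
dir.create(out_dir, showWarnings = FALSE)

for (k in seq_along(files)) {
  res <- fake_result(k)
  # The name changes with k, so each iteration writes its own file
  out_file <- file.path(out_dir, paste0("recovery_gene_trans_", k, ".txt"))
  write.csv(res, file = out_file, row.names = FALSE)
}

written <- sort(list.files(out_dir, pattern = "^recovery_gene_trans_[0-9]+\\.txt$"))
print(written)
```

Using paste0("recovery_gene_trans_", files[k]) instead of the index would make each output easier to trace back to its input.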

Another approach that might interest you, if each file has the same columns, is something like:

library(biomaRt)

# Connect to the SNP mart once instead of once per file
snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

combined.files <-
  do.call(rbind,
          lapply(files, function(filename_k) {
            # files holds bare names, so the working directory must be the data folder
            file_k <- read.delim(filename_k)

            res <- getBM(
              attributes = c(
                "refsnp_id",
                "ensembl_gene_stable_id",
                "ensembl_transcript_stable_id"
              ),
              filters = "snp_filter",
              values = file_k$rsID,
              mart = snpmart,
              uniqueRows = TRUE
            )

            # Return the whole result table for this file
            return(res)
          }))

lapply returns a list of data frames, one per file, and do.call(rbind, ...) combines them into a single table. You could even add a column to res to record which file each row comes from.
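A self-contained sketch of that combine step, including the source-file column. fake_getBM() and the IDs it returns are made up so the example runs without a BioMart connection:

```r
# Hypothetical stand-ins: `files` and fake_getBM() replace the real
# file list and the getBM() query; the IDs are invented.
files <- c("sample_A.txt", "sample_B.txt")
fake_getBM <- function(filename) {
  n <- if (filename == "sample_A.txt") 2 else 1
  data.frame(
    refsnp_id = paste0("rs", seq_len(n)),
    ensembl_gene_stable_id = paste0("ENSG_demo_", seq_len(n)),
    stringsAsFactors = FALSE
  )
}

combined <- do.call(rbind, lapply(files, function(filename_k) {
  res <- fake_getBM(filename_k)
  # Tag every row with its file; rep() also copes with a zero-row result
  res$source_file <- rep(filename_k, nrow(res))
  res
}))

print(combined)
```

rbind requires every per-file table to have the same column names, which holds here because every query asks for the same attributes.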

4 months ago manuel.belmadani • 830

Thank you for your answer. Sorry, my mistake: there is no i in my code. It's a bad habit; it should be k instead of i.

4 months ago
amandinelecerfdefer
• 0

Unfortunately, I just tried your suggestions, and they don't work.

Edit: I'm replying here because the site won't let me comment on your answer. I want to retrieve the complete output of each query, not just one element of what BioMart returns.

4 months ago
amandinelecerfdefer
• 0

Which one? Does it give you an error message, or does it just not merge them properly?

The main problem I see is that res comes straight from biomaRt, so res[[k]] doesn't make sense: biomaRt doesn't know that you have k files. getBM() returns a data frame, so res[[k]] picks out its k-th column. That's why I assumed you were using i in res[[i]] to access a specific element of the biomaRt output.

Check whether you want the whole res data frame or a specific element of it; it seems unlikely that you want element k on each iteration.
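A small illustration of what res[[k]] actually does. The data frame below only mimics the shape of the three-attribute getBM() result; the IDs are invented:

```r
# Toy data frame shaped like the getBM() result in the question (made-up IDs)
res <- data.frame(
  refsnp_id = c("rs0001", "rs0002"),
  ensembl_gene_stable_id = c("ENSG_demo_1", "ENSG_demo_2"),
  ensembl_transcript_stable_id = c("ENST_demo_1", "ENST_demo_2"),
  stringsAsFactors = FALSE
)

# [[k]] on a data frame selects column k, not "the result for file k"
second <- res[[2]]
print(second)

# With only three columns, k = 4 is an error (subscript out of bounds)
out_of_range <- tryCatch(res[[4]], error = function(e) "error")
print(out_of_range)

# What should be written out is the whole table
print(dim(res))
```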

4 months ago
manuel.belmadani
• 830
4 months ago
SMK ♦ 1.3k
Ghent, Belgium

Hi amandinelecerfdefer,

To write each input file's results to its own output file, you can do something like:

library(biomaRt)
setwd("/Users/amandinelecerfdefer/Desktop/poi")
files <- list.files(path = "data", pattern = "\\.txt$")

# Connect to the SNP mart once, outside the loop
snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
for (k in seq_along(files)) {
  fname <- files[k]
  cat(paste0("Now parsing data/", fname, "...\n"))
  data <- read.delim(file.path("data", fname))

  res <- getBM(
    attributes = c(
      "refsnp_id",
      "ensembl_gene_stable_id",
      "ensembl_transcript_stable_id"
    ),
    filters = "snp_filter",
    values = data$rsID,  # the rsID column, as in the question
    mart = snpmart,
    uniqueRows = TRUE
  )

  # One output file per input file, named after the input
  write.csv(res, file = paste0("result/recovery_gene_trans_", fname))
  rm(data, res)
  Sys.sleep(5)  # pause 5 s between queries to go easy on the BioMart server
}

To write everything to the same file instead, first delete any existing result/recovery_gene_trans.txt, then replace the write.csv call with:

  write.table(
    res,
    file = "result/recovery_gene_trans.txt",
    append = T,
    row.names = F,
    col.names = !file.exists("result/recovery_gene_trans.txt"),
    sep = ","
  )
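A runnable sketch of the same accumulate-into-one-file pattern, using two made-up data frames in place of the per-file getBM() results and a temporary output file. It is a slight variant of the snippet above: it also sets append = FALSE on the first write, which avoids the warning write.table can emit when it writes column names with append = TRUE.

```r
# Made-up chunks standing in for the per-file getBM() results
out_file <- file.path(tempdir(), "recovery_gene_trans_demo.txt")
if (file.exists(out_file)) file.remove(out_file)  # start from a clean file

chunks <- list(
  data.frame(refsnp_id = c("rs1", "rs2"), ensembl_gene_stable_id = c("g1", "g2")),
  data.frame(refsnp_id = "rs3", ensembl_gene_stable_id = "g3")
)

for (res in chunks) {
  first <- !file.exists(out_file)  # TRUE only before the first write
  write.table(
    res,
    file = out_file,
    append = !first,     # overwrite on the first write, append afterwards
    row.names = FALSE,
    col.names = first,   # header line written exactly once
    sep = ","
  )
}

combined <- read.csv(out_file, stringsAsFactors = FALSE)
print(combined)
```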
4 months ago SMK ♦ 1.3k
