Creating for loops in R for NGS data
7.2 years ago
Ana ▴ 200

I have a question about writing a for loop in R and would be grateful for any ideas. I'm working with NGS data. I have calculated r2 values to estimate linkage disequilibrium, and now I want to calculate LD decay for every SNP in each contig.

These are the first three rows of my data:

scaffold94_798049_802097   999  NA  tscaffold94_798049_802097   999   NA  1
tscaffold94_798049_802097  999  NA  tscaffold94_798049_802097   1029  NA  1
tscaffold94_798049_50222   2011 NA tscaffold94_798049_802097    1029  NA  1

The first and fourth columns are contig names. How can I write a loop that keeps only the rows where the first and fourth columns are identical (meaning that both SNPs are located on the same contig)?

R loops • 1.7k views
7.2 years ago
TriS ★ 4.7k

R solution:

myData <- theResultsYouHaveAlready
myDataFiltered <- myData[which(myData[,1] == myData[,4]),]

awk solution:

awk -v FS='\t' -v OFS='\t' '{if($1 == $4) print}' myFileWithData.txt > myFileWithFilteredData.txt

Actually, someone gave me this solution, and it works perfectly fine:

data$keep_dontKeep <- "dontKeep"

for (i in seq_len(nrow(data))) {
  # If the values in V1 and V4 are equal, categorize the row as 'keep'
  if (as.character(data$V1[i]) == as.character(data$V4[i])) {
    data$keep_dontKeep[i] <- "keep"
  }
}

data <- data[data$keep_dontKeep == "keep", ]


TriS's R solution is simpler and faster, and it is the recommended approach. You do not need a for loop to subset in R, because the comparison myData[,1] == myData[,4] is vectorized and evaluates every row at once.
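
For illustration, here is a minimal sketch of the vectorized filter applied to the three sample rows from the question. The data frame name ld, the helper vector same_contig, and the column names V1 to V7 are assumptions (V1 to V7 are what R assigns by default when a file is read without a header). The rows are copied as posted, including the first contig name without the leading "t". The as.character() calls are a precaution: if the contig columns were read in as factors, comparing factors with different level sets raises an error.

```r
# Hypothetical reconstruction of the three sample rows from the question.
ld <- data.frame(
  V1 = c("scaffold94_798049_802097",
         "tscaffold94_798049_802097",
         "tscaffold94_798049_50222"),
  V2 = c(999, 999, 2011),
  V3 = NA,
  V4 = rep("tscaffold94_798049_802097", 3),
  V5 = c(999, 1029, 1029),
  V6 = NA,
  V7 = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Vectorized comparison: one TRUE/FALSE per row, no explicit loop.
same_contig <- as.character(ld$V1) == as.character(ld$V4)

# which() turns the logical vector into row indices and skips NA results.
ld_filtered <- ld[which(same_contig), ]

print(ld_filtered)
```

With the rows exactly as posted, only the second row has matching contig names, so it is the only one retained.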
