How to delete some contigs in an assembly
5.2 years ago
luzglongoria ▴ 50

Hi there, I'm new to bioinformatics tools and I need help. Here is what I have done so far :)

I have an assembly from Trinity and I wanted to know the CG content of each contig, so I generated a file with this information and imported it into R.

Then, I calculated the CG content of all my contigs in R by doing:

## Load data
data <- read.table("CG_content_contig.txt", header = TRUE)

## Create a new variable called CG (fraction of C and G bases per contig)
data$CG <- (data$C + data$G) / (data$A + data$C + data$G + data$T)

## Keep the contigs with CG content > 23% (compare against a number, not the string '0.23')
CG_content_more_than_23 <- data[which(data$CG > 0.23), ]

## Keep only the names of those contigs
name_contigs <- CG_content_more_than_23$chr

## Write the names to a .txt file: one bare name per line, no header and no quotes
write.table(name_contigs, file = "name_contigs.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t")

As you can see, I have created a file with the names of the contigs whose CG content is higher than 23%. I have uploaded this file (name_contigs.txt) to my server in order to work with it. It looks like this:

TRINITY_DN134693_c0_g1_i1
TRINITY_DN109669_c0_g1_i1
TRINITY_DN109679_c0_g1_i1
TRINITY_DN114999_c0_g1_i1
TRINITY_DN114910_c0_g1_i1

I have a .fq file (the records are in FASTA format) that looks like this:

>TRINITY_DN134617_c0_g1_i1
 AATAAAAATAAATAAAAATCAATAAAAATATTATAATACAATATAATATAAAATAATATAAAAATTCTACAATAAGAATAAAGTATAATTTTTTAGATTATAAGAGGATATGTTAATACATAGTATTCTGTTTGTTATTGTAGAAAAAACATACAGAAACTTTTTGTATATATAGTCTCATTTTATATATATAAATAAAAATGAACATTAATGAAATGAAATTAAGAGTCGTTTTATTAAAAATAGCTATAAAAAATAACAACA

>TRINITY_DN134643_c0_g1_i1
GCATGGTAGTAAAGTATAATGACATAGCAAAAATATTTAAAATAAAAAAAAATTACTATTATAATTTTTTCTGTATAACATAAACGTTTTTAATGATATTATATTAATTACATATAAAAATAGCATAATAAAAATATTTAGTTATAAAATTTATTATTTTATTTTTTTTTTTTTGTTATATACTTTCTCAGAACATTAATTTGTCATCAGTTCTATTATATTGATAAACTATTCAATTGCTTTAATA

What I want to do is to keep only the contigs that are NOT in the .txt file.

Maybe I should build a pipeline with the grep command?

assembly RNA-Seq R • 900 views
5.2 years ago
GenoMax 141k

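You can use the faSomeRecords utility from Jim Kent (UCSC, linux binary linked). After downloading it, add execute permission with chmod u+x faSomeRecords. Since you want to keep the contigs that are NOT in your list, run it with the -exclude option.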

faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.
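
If the binary is not an option on your server, the same filtering can be sketched with plain awk. This is only an illustration, not what faSomeRecords does internally: the function name exclude_fasta_records and the file names Trinity.fasta and filtered.fasta are placeholders I made up. It assumes the list file holds one bare contig name per line (no quotes, no header) and that a record's name is the first word after ">".

```shell
#!/bin/sh
# Sketch: print every FASTA record whose name is NOT in a list file.
# Assumption: with faSomeRecords installed, the usage above suggests the
# equivalent call would be (file names are placeholders):
#   faSomeRecords -exclude Trinity.fasta name_contigs.txt filtered.fasta

exclude_fasta_records() {
    # $1 = text file with one record name per line
    # $2 = FASTA file to filter
    awk '
        # First file: remember each name (a trailing carriage return is stripped).
        FILENAME == ARGV[1] { sub(/\r$/, ""); if ($1 != "") skip[$1] = 1; next }
        # Header line: the name is the first word without the leading ">".
        /^>/ { name = substr($1, 2); keep = !(name in skip) }
        # Print headers and sequence lines of records that are not in the list.
        keep
    ' "$1" "$2"
}
```

It would be called as exclude_fasta_records name_contigs.txt Trinity.fasta > filtered.fasta. Comparing grep -c '^>' on the input and output files is a quick check that the expected number of contigs was dropped.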