prepare file for depth of coverage
4.7 years ago
bioguy24 ▴ 230

I am trying to prepare a ROD file for use with GATK depth of coverage. I downloaded a standard hg19 RefSeq file and need to remove the non-standard contigs, keeping only chr1-22, chrX, chrY, and chrM, and then sort the result in karyotypic order. Is the command below the best way to do so? Thank you :).

cat getRefGene.txt | grep -v chrUn* | grep -v *random | grep -v chrM | grep -v *hap* | sort -k1,1 -V -s > output.txt

grep can accept multiple search patterns in the regex:

grep -vE 'chrUn|random|chrM|hap' getRefGene.txt | sort -k1,1 > (...)

I would not use -V, as most tools expect standard rather than natural sort order. Can you show the content of getRefGene.txt and the expected output?
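
To make the difference between the two sort orders concrete, here is a minimal sketch on made-up chromosome names (the file names are hypothetical, and -V assumes GNU sort):

```shell
# Four chromosome names in arbitrary order (demo data, not taken from getRefGene.txt).
printf '%s\n' chr10 chr2 chr1 chrX > demo_chroms.txt

# Standard byte-wise order (C locale): chr10 sorts before chr2.
LC_ALL=C sort demo_chroms.txt > sorted_default.txt

# Natural/version order (-V, GNU sort): chr2 sorts before chr10.
sort -V demo_chroms.txt > sorted_natural.txt

cat sorted_default.txt   # chr1, chr10, chr2, chrX (one per line)
cat sorted_natural.txt   # chr1, chr2, chr10, chrX (one per line)
```

Which of the two is right depends on what the downstream tool expects, so it is worth checking against the contig order of the reference you are using.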


Input file (getRefGene.txt):

#bin    name    chrom   strand  txStart txEnd   cdsStart    cdsEnd  exonCount   exonStarts  exonEnds    score   name2   cdsStartStat    cdsEndStat  exonFrames
0   NM_001308203.1  chr1    +   66999251    67216822    67000041    67208778    22  66999251,66999928,67091529,67098752,67105459,67108492,67109226,67136677,67137626,67138963,67142686,67145360,67154830,67155872,67160121,67184976,67194946,67199430,67205017,67206340,67206954,67208755,  66999355,67000051,67091593,67098777,67105516,67108547,67109402,67136702,67137678,67139049,67142779,67145435,67154958,67155999,67160187,67185088,67195102,67199563,67205220,67206405,67207119,67216822,  0   SGIP1   cmpl    cmpl    -1,0,1,2,0,0,1,0,1,2,1,1,1,0,1,1,2,2,0,2,1,1,
0   NM_032291.3 chr1    +   66999638    67216822    67000041    67208778    25  66999638,67091529,67098752,67101626,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755,   67000051,67091593,67098777,67101698,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67216822,   0   SGIP1   cmpl    cmpl    0,1,2,0,0,0,1,0,0,0,1,2,1,1,1,1,0,1,1,2,2,0,2,1,1,

I believe the expected output would have only chr1-22, chrX, chrY, and chrM in column 3. Thank you :).
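
Given that layout (chrom in column 3, txStart in column 5), one possible approach is to whitelist the wanted contigs instead of excluding the unwanted ones, and to sort on an explicit rank. The sketch below is only an illustration under stated assumptions: the demo file and its rows are made up, and the order chr1-22, chrX, chrY, chrM is an assumption. The order GATK needs is most likely the one in your reference's sequence dictionary, and some hg19 references list chrM first, so adjust the ranks to match.

```shell
# Tiny tab-separated stand-in for getRefGene.txt (made-up rows, first 5 columns only).
printf '%s\t%s\t%s\t%s\t%s\n' \
  '#bin' name chrom strand txStart \
  1 NM_A chrX + 500 \
  2 NM_B chr10 + 300 \
  3 NM_C chr2 + 900 \
  4 NM_D chr6_apd_hap1 + 100 \
  5 NM_E chrUn_gl000211 + 100 \
  6 NM_F chr1_gl000191_random + 100 \
  7 NM_G chr2 + 200 \
  8 NM_H chrM + 50 \
  9 NM_I chr1 + 700 \
  > demo_refGene.txt

TAB="$(printf '\t')"

{
  # Keep the header line as-is.
  grep '^#' demo_refGene.txt
  # Keep rows whose column 3 is exactly chr1-22, chrX, chrY or chrM, and
  # prepend a numeric rank (assumed order: 1-22, X=23, Y=24, M=25).
  awk -F'\t' -v OFS='\t' '
    !/^#/ && $3 ~ /^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$/ {
      c = substr($3, 4)
      if (c == "X")      rank = 23
      else if (c == "Y") rank = 24
      else if (c == "M") rank = 25
      else               rank = c + 0
      print rank, $0
    }' demo_refGene.txt |
    # After the prepended rank, txStart is column 6: sort by rank, then by txStart.
    sort -t "$TAB" -k1,1n -k6,6n |
    # Drop the helper rank column again.
    cut -f2-
} > demo_sorted.txt

cut -f3 demo_sorted.txt
```

On the demo rows, the haplotype, chrUn and random contigs are dropped and the remaining rows come out in the assumed order, with chr2 before chr10 and chrM last. Because the match is anchored on the whole of column 3, a gene name that happens to contain "hap" or "random" elsewhere on the line does not cause a row to be dropped.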
