Question: Keep only rows with specific annotations in a .txt file

Hi all,

I have a .txt file with three columns. Based on the Annotation column, I want to keep only the rows annotated as synonymous or missense. I would be grateful if you could help me solve this problem.

CHROM_POS       ANN     Annotation
CM009840.1_932          A|intergenic_region|MODIFIER|CHR_START-LOC112587351|CHR_START-gene0|||  nongenic
CM009840.1_1096         T|intergenic_region|MODIFIER|CHR_START-LOC112587351|CHR_START-gene0||   nongenic
CM009840.1_4421500      A|missense_variant|MODERATE|LOC102415844|gene14  |1/1|c.298C>T|p.Ar||   missense
CM009840.1_4421553      A|missense_variant|MODERATE|LOC102415844|gene14|transcript|rna37|p ||    missense
CM009840.1_4421600      G|synonymous_variant|LOW|LOC102415844|gene14|transcript|rna37|protei ||   synonymous
CM009840.1_4421630      C|synonymous_variant|LOW|LOC102415844|gene14|transcript|rna37|pro||   synonymous
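Before filtering, it can help to list which values actually occur in the Annotation column. A minimal sketch, assuming the columns are tab-separated and the file is saved as test.txt (an assumed name). The sample rows are a cut-down copy of the table above with the ANN field shortened, plus one nonsynonymous row of the kind used for testing later in the thread:

```shell
#!/bin/sh
# Cut-down copy of the sample rows above (ANN field shortened).
# Assumption: columns are separated by single tabs.
printf '%s\t%s\t%s\n' \
  CHROM_POS ANN Annotation \
  CM009840.1_932 'A|intergenic_region|MODIFIER' nongenic \
  CM009840.1_4421500 'A|missense_variant|MODERATE' missense \
  CM009840.1_4421553 'A|missense_variant|MODERATE' missense \
  CM009840.1_4421600 'G|synonymous_variant|LOW' synonymous \
  CM009840.1_4421630 'C|synonymous_variant|LOW' nonsynonymous \
  > test.txt

# Skip the header line, keep only column 3 (Annotation),
# then count how often each distinct value occurs.
tail -n +2 test.txt | cut -f3 | sort | uniq -c
```

For this sample it reports two missense rows and one each of nongenic, nonsynonymous and synonymous, which shows up front that a plain substring match on "synonymous" would be too loose.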
Posted by mostafarafiepour, 15 months ago; updated 15 months ago by ahmad mousavi

Hello mostafarafiepour!

We believe that this post does not fit this site. You have repeatedly not shown any effort to solve basic Unix questions yourself. For this reason, we have closed your question.

If you disagree, please tell us why in a reply below; we'll be happy to talk about it.

Cheers!

Posted by WouterDeCoster, 15 months ago

With sed and awk:

$ sed -nr '/CHROM|mis|\tsyn/p' test.txt 

CHROM_POS   ANN Annotation
CM009840.1_4421500  A|missense_variant|MODERATE|LOC102415844|gene14 |1/1|c.298C>T|p.Ar||    missense
CM009840.1_4421553  A|missense_variant|MODERATE|LOC102415844|gene14|transcript|rna37|p  ||  missense
CM009840.1_4421600  G|synonymous_variant|LOW|LOC102415844|gene14|transcript|rna37|protei    ||  synonymous
CM009840.1_4421630  C|synonymous_variant|LOW|LOC102415844|gene14|transcript|rna37|pro|| synonymous

Or with awk:

$ awk 'NR==1 {print}; /\tsynonymous|missense/ {print $0}' test.txt 
CHROM_POS   ANN Annotation
CM009840.1_4421500  A|missense_variant|MODERATE|LOC102415844|gene14 |1/1|c.298C>T|p.Ar||    missense
CM009840.1_4421553  A|missense_variant|MODERATE|LOC102415844|gene14|transcript|rna37|p  ||  missense
CM009840.1_4421600  G|synonymous_variant|LOW|LOC102415844|gene14|transcript|rna37|protei    ||  synonymous
CM009840.1_4421630  C|synonymous_variant|LOW|LOC102415844|gene14|transcript|rna37|pro|| synonymous

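Input (the last two rows, annotated non-synonymous and nonsynonymous, were added to check that they are excluded):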

$ cat test.txt 

CHROM_POS   ANN Annotation
CM009840.1_932  A|intergenic_region|MODIFIER|CHR_START-LOC112587351|CHR_START-gene0|||  nongenic
CM009840.1_1096 T|intergenic_region|MODIFIER|CHR_START-LOC112587351|CHR_START-gene0||   nongenic
CM009840.1_4421500  A|missense_variant|MODERATE|LOC102415844|gene14 |1/1|c.298C>T|p.Ar||    missense
CM009840.1_4421553  A|missense_variant|MODERATE|LOC102415844|gene14|transcript|rna37|p  ||  missense
CM009840.1_4421600  G|synonymous_variant|LOW|LOC102415844|gene14|transcript|rna37|protei    ||  synonymous
CM009840.1_4421630  C|synonymous_variant|LOW|LOC102415844|gene14|transcript|rna37|pro|| synonymous
CM009840.1_4421630      C|synonymous_variant|LOW|LOC102415844|gene14|transcript|rna37|pro||     non-synonymous
CM009840.1_4421630      C|synonymous_variant|LOW|LOC102415844|gene14|transcript|rna37|pro||     nonsynonymous
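The commands above work on this sample because their patterns happen not to hit unwanted rows (for example, `mis` would match any line containing those letters anywhere). A sketch that compares the Annotation column itself is less fragile: assuming tab-separated columns, `-F '\t'` splits on tabs, `NR == 1` keeps the header, and `$3 == ...` requires an exact match, so non-synonymous and nonsynonymous are never picked up. The rows are a cut-down copy of the input above (ANN shortened):

```shell
#!/bin/sh
# Cut-down copy of the input above (ANN field shortened); tab-separated.
printf '%s\t%s\t%s\n' \
  CHROM_POS ANN Annotation \
  CM009840.1_932 'A|intergenic_region|MODIFIER' nongenic \
  CM009840.1_4421500 'A|missense_variant|MODERATE' missense \
  CM009840.1_4421600 'G|synonymous_variant|LOW' synonymous \
  CM009840.1_4421630 'C|synonymous_variant|LOW' non-synonymous \
  CM009840.1_4421630 'C|synonymous_variant|LOW' nonsynonymous \
  > test.txt

# Keep the header (line 1) plus rows whose 3rd tab-separated field is
# exactly "missense" or "synonymous"; a pattern with no action prints the line.
awk -F '\t' 'NR == 1 || $3 == "missense" || $3 == "synonymous"' test.txt > filtered.txt
cat filtered.txt
```

If the file has Windows line endings (\r\n), the exact comparison on the last field will fail until the carriage returns are removed, for example with `tr -d '\r'`.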
Posted by cpad0112, 15 months ago

Hi,

To keep lines containing one pattern:
grep 'synonymous' file.txt > new_file.txt

To keep lines containing either of two patterns:
grep "PATTERN1\|PATTERN2" file.txt > new_file.txt
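Applied to this file, the two patterns can be combined. Note that `\|` inside a basic regular expression is a GNU grep extension; `grep -E` with a plain `|` is the portable form. A plain substring such as 'synonymous' also matches nonsynonymous rows, so this sketch anchors the value to a preceding tab and the end of the line. It assumes tab-separated columns; `tab` is just a helper variable, and the rows are a cut-down copy of the sample (ANN shortened):

```shell
#!/bin/sh
# Cut-down copy of the sample rows (ANN field shortened); tab-separated.
printf '%s\t%s\t%s\n' \
  CHROM_POS ANN Annotation \
  CM009840.1_932 'A|intergenic_region|MODIFIER' nongenic \
  CM009840.1_4421500 'A|missense_variant|MODERATE' missense \
  CM009840.1_4421600 'G|synonymous_variant|LOW' synonymous \
  CM009840.1_4421630 'C|synonymous_variant|LOW' nonsynonymous \
  > file.txt

tab=$(printf '\t')

# Substring match: counts 2 lines here, because 'nonsynonymous' contains 'synonymous'.
grep -c 'synonymous' file.txt

# Anchored match: a tab, then exactly one of the two values, then end of line.
grep -E "${tab}(missense|synonymous)\$" file.txt > new_file.txt
cat new_file.txt
```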
Posted by ahmad mousavi, 15 months ago

Thank you for the helpful answer.

But those who closed the post to keep it from being answered should be sorry.

Posted by mostafarafiepour, 15 months ago

We didn’t close your post to prevent people from offering answers; we closed it because you consistently ask low-effort questions in which you make no attempt to learn how to solve the problem yourself.

You will be better off in the long run if you spend extra time learning these very basic commands now, even if it takes you a little longer to solve the problem.

Posted by Joe, 15 months ago

How can I do this without losing the header?

By the header I mean: CHROM_POS ANN Annotation
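One common way to keep the header with grep is to print line 1 separately and filter only the remaining lines. A minimal sketch, assuming the same tab-separated layout and the file names file.txt and new_file.txt used in the grep answer; the rows are a cut-down copy of the sample (ANN shortened):

```shell
#!/bin/sh
# Cut-down copy of the sample rows (ANN field shortened); tab-separated.
printf '%s\t%s\t%s\n' \
  CHROM_POS ANN Annotation \
  CM009840.1_932 'A|intergenic_region|MODIFIER' nongenic \
  CM009840.1_4421553 'A|missense_variant|MODERATE' missense \
  CM009840.1_4421630 'C|synonymous_variant|LOW' synonymous \
  > file.txt

tab=$(printf '\t')

# Group both commands so their combined output goes to one file:
# head prints the header, tail -n +2 feeds every other line to grep.
{
  head -n 1 file.txt
  tail -n +2 file.txt | grep -E "${tab}(missense|synonymous)\$"
} > new_file.txt
cat new_file.txt
```

The awk answer in this thread gets the same effect with its `NR==1 {print}` rule.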

Posted by mostafarafiepour, 15 months ago

Why don't you read about grep and the other tools you could use for this task, and try to figure it out for yourself? It will not be difficult.

Posted by Joe, 15 months ago

I agree with @jej.healey: you should try some coding yourself. Many people (at least 100K, probably millions) have tried this before you, and questions this simple certainly already have answers.

Posted by ahmad mousavi, 15 months ago
This thread is not open. No new answers may be added