Extracting matrix columns specific to file 1 and file 2, not the overlap?
4.5 years ago

I have two small RNA matrix files with almost 87% overlap. I want to extract the columns that are specific to file 1 and those that are specific to file 2. Here is an example of my data:

File1.Sample 1:

    AAAAAAACAAGGATCAACAAGACT        0.0835      0       0.2743      0.197     0.069      0.44       0.195     0.31
    AAAAAAACACTCGGCAAAGAACCC        0.3343       0.0    1.641      2.170       1.82       0.88      0.758
    AAAAAAACCCTCTGACGCAGCACC        0.167      0       1.455        0.096     0.487       0       0       0       1.55        
    AAAAAAACCGCCACTAGAAATCGT        0.0835      0.0843      0.557       0.888      1.35       0.88    0.66
    AAAAAAACGTACTTCGTGCCGACT        0.0835     0.599       0       0       0       0       0.351       0       0       0    
    AAAAAAACTCGGAACCCTAATCTG        0.083      0.2569       0.364      0.260       0.286       0.10       0.35

File2. Sample2:

    AAAAAACACTCGGCAAAGAAGGCT        0.167       0       0.674       1.0531      0.3878  0.61838       0.08543      0.387
    AAAAAACACTCGGCAAAGGCTTTG        0.51        0.22       1.82        0.888   0.87699       1.6497       0.17659
    AAAAAACAGACTTTGTATCGACT         2.846        0.0300     0.1824    0.39       0.94       0.4692       0.31817
    AAAAAACAGATGCCGAAGATGT          1.8389        0.4282       4.0117        2.562        0.54       1.649477        
    AAAAAACAGTATTCGAAACGGGAC        0.1677       0.08511      1.55052        0.6997       0.58733       1.75284

File3.Overlap:

    AAAAAAACGTACTTCGTGCCGACT        0.0835     0.599       0       0       0       0       0.351       0       0       0    
    AAAAAAACTCGGAACCCTAATCTG        0.083      0.2569       0.364      0.260       0.286       0.10       0.35
    AAAAAACACTCGGCAAAGAAGGCT        0.167       0       0.674       1.0531      0.3878  0.61838       0.08543      0.387
    AAAAAACACTCGGCAAAGGCTTTG        0.51        0.22       1.82        0.888   0.87699       1.6497       0.17659

These are the three files: file 1 is sample 1, file 2 is sample 2, and file 3 is the overlap (the sequences common to files 1 and 2, based on column 1). I want to extract the sequences that are specific to each file, along with their matrix values. I have tried several commands, some of them found by searching Biostars, including:

    cat sorted_b73matrix.txt sorted_mo17matrix.txt|sort |uniq -u |awk '$1==1' > 123.txt
    grep -vxFf sorted_b73matrix.txt sorted_mo17matrix.txt > B73_specific_martix.txt
    grep -vxFf sorted_mo17matrix.txt sorted_b73matrix.txt > M017_specific_matrix.txt
    cat file1.tx file2.txt |sort |uniq -c |awk '$1==1'

But the results are not correct; maybe my parameters are wrong. Please tell me how I can get a matrix file specific to each input file, without the overlap.

sequence matrix columns • 1.1k views

From your example, it seems you want to extract specific rows, not columns, right?

4.5 years ago

If you want a single output file containing the lines from file 1 whose sequences are not present in file 2 AND the lines from file 2 whose sequences are not present in file 1:

    awk 'NR==FNR { a[$1]++ } NR!=FNR && a[$1]==1' <(cat sorted_mo17matrix.txt sorted_b73matrix.txt) <(cat sorted_mo17matrix.txt sorted_b73matrix.txt)
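
If you would rather have one output file per input, here is a minimal sketch of the same idea, keyed on column 1 only. The file names (toy_file1.txt, toy_file2.txt, and the two *_specific.txt outputs) and the toy rows are made up for illustration; they are not your data. It assumes whitespace-separated columns with the sequence in column 1, and that the file read first is not empty.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy inputs: column 1 is the sequence, the rest are values.
printf '%s\n' \
  'AAAC 0.1 0.2' \
  'AAAG 0.3 0.4' \
  'AAAT 0.5 0.6' > toy_file1.txt
printf '%s\n' \
  'AAAG 0.3 0.9' \
  'AAAT 0.5 0.6' \
  'CCCA 1.0 2.0' > toy_file2.txt

# First file on the command line: remember every sequence (column 1).
# Second file: print only the rows whose sequence was not remembered.
awk 'NR==FNR { seen[$1]; next } !($1 in seen)' toy_file2.txt toy_file1.txt > toy_file1_specific.txt
awk 'NR==FNR { seen[$1]; next } !($1 in seen)' toy_file1.txt toy_file2.txt > toy_file2_specific.txt

cat toy_file1_specific.txt   # AAAC 0.1 0.2
cat toy_file2_specific.txt   # CCCA 1.0 2.0
```

Note that AAAG has different values in the two toy files but is still treated as shared, because only column 1 is compared. Whole-line comparisons such as grep -vxFf or sort | uniq would treat those two rows as different, which may be one reason the commands in the question gave unexpected results.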

Thanks for your reply, @Bastien Herve. Yes, I was actually trying to extract specific rows. It works for me.


Glad it helps. You can accept it as the answer to your post (the check mark next to the thumb).
