Hello,
I have a matrix from which I am trying to retrieve specific rows and save them to a text file.
An example matrix is:
Affy ID DDM1 DGKI2 FDGYYY1 GUHIL6
1438_at 0.0635 0.2065 -0.2112 0.0856
1487_at 0.071 -0.1315 0.0263 0.0198
1494_f_at 0.0045 -0.0237 0.0156 -0.1352
1598_g_at -0.0541 0.0006 -0.1369 -0.0589
160020_at -0.0925 0.2182 -0.1967 -0.0074
1729_at -0.0017 -0.2209 -0.086 -0.0709
1773_at -0.0273 -0.0181 0.1042 0.0136
177_at -0.0276 -0.2563 0.3975 -0.0535
179_at -0.0472 0.0979 -0.216 -0.2814
1861_at 0.0121 -0.4038 0.0016 0.0334
200000_s_at -0.1021 -0.0887 -0.0452 0.0035
Let's say this matrix is in a file named M.txt and the IDs of the selected rows are in a file named mSelected.txt (consisting of 1438_at, 1729_at and 200000_s_at). My output should look like the following file:
Affy ID DDM1 DGKI2 FDGYYY1 GUHIL6
1438_at 0.0635 0.2065 -0.2112 0.0856
1729_at -0.0017 -0.2209 -0.086 -0.0709
200000_s_at -0.1021 -0.0887 -0.0452 0.0035
Is there also a way to convert the Affy IDs to gene names?
$ head /Users/Desktop/mSelected.txt | cat -vet
1438_at^M1729_at^M200000_s_at
$ head /Users/Desktop/m.txt | cat -vet
Affy ID ^I DDM1 ^IDGKI2 ^IFDGYYY1 ^I GUHIL6^M1438_at^I0.0635^I0.2065^I-0.2112^I0.0856^M1487_at^I0.071^I-0.1315^I0.0263^I0.0198^M1494_f_at^I0.0045^I-0.0237^I0.0156^I-0.1352^M1598_g_at^I-0.0541^I0.0006^I-0.1369^I-0.0589^M160020_at^I-0.0925^I0.2182^I-0.1967^I-0.0074^M1729_at^I-0.0017^I-0.2209^I-0.086^I-0.0709^M1773_at^I-0.0273^I-0.0181^I0.1042^I0.0136^M177_at^I-0.0276^I-0.2563^I0.3975^I-0.0535^M179_at^I-0.0472^I0.0979^I-0.216^I-0.2814^M1861_at^I0.0121^I-0.4038^I0.0016^I0.0334^M200000_s_at^I-0.1021^I-0.0887^I-0.0452^I0.0035
Thanks. Of course the file is huge. The awk command runs without errors, but I don't see any output! Do you have any idea where the output is saved? Note that I first use cat on both files as follows:
then I run your awk line
You don't want to use cat when the awk command is designed to read from the files directly. And the output is stored in the file that follows the > operator, as with any UNIX command.
I used the following command but the output is empty.
Do you know where the problem could be?
Could you give us the output of:
and
How about
Is that a Mac cat? I knew there was a reason I always used cat -te and not cat -A.
brew install coreutils makes life with Mac OS X so much easier.
Yep. That's the reason I don't remember BSD-specific syntax. I have Homebrew managing GNU coreutils with bash on my Mac.
I still cannot reply to your comment.
I do use TextWrangler, but my real data is HUGE and I am afraid I won't be able to paste or even open it. I am searching for a way to fix it.
I used
then I used
The same as before, I get an empty output.
You can use pastebin or GitHub gist
You'd need to use extended sed with the regular expression. Also, your regex cannot be ^M$ if you wish to match a ^M character at the end of the line, because ^ is a metacharacter for the beginning of the line. Use sed -e 's/\r//' input >output
Also, you should not need to open the entire file just to pick the top 10 lines.
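To check which line endings a large file uses without opening it, looking at its first few bytes is usually enough. The following is only a minimal sketch and was not posted in the thread: demo_cr.txt is a made-up stand-in for the real data, and head -c is assumed to be available (it is on both GNU and BSD/macOS systems).

```shell
# Demo file with CR-only line endings (hypothetical name, stands in for the real data).
printf 'id_a\t1\rid_b\t2\rid_c\t3' > demo_cr.txt

# Dump only the first 64 bytes; od -c shows CR as \r, LF as \n and tab as \t.
head -c 64 demo_cr.txt | od -c

# Count CR and LF bytes separately to see which line-ending style is in use.
cr_count=$(tr -cd '\r' < demo_cr.txt | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < demo_cr.txt | wc -c | tr -d ' ')
echo "CR: $cr_count  LF: $lf_count"
```

A file with CR bytes but no LF bytes is seen as one single line by awk, sed and head, which would be consistent with the cat -vet output shown in the question.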
It did not work, RamRS; I am still getting an empty output.
Just post the whole file somewhere (dropbox, google drive, etc.) so someone can just directly determine what you need to do to clean it. The amount of effort that others have put into doing this remotely is a bit excessive.
I found where the problem was and I solved it! Thank you very much
I wish this had been heeded by OP: how to retrieve specific raws from a data matrix based on Affymetrix ID in Linux
I'd missed that in the deluge of comments :P
I think everyone did. I really think we need to add instructions on Add Comment/Answer/Post to use Gist or Pastebin to add files that are not optimal for the viewer here. Plus, they also have better formatting, syntax highlighting and line numbering, so I'd prefer that any day over pasting long code here.
Did you try to convert the files any other way? You could, e.g., try the "tr method" from my link. You are doing science, show some initiative.
Then when you run the head command on the corrected M.txt, it should look like:
While the corrected mSelected.txt should look like:
And then if the awk command still returns empty output, it means that the IDs from mSelected.txt do not exist in the first column of M.txt.
Your files do not appear to have LF end-of-line markers. So here, e.g., awk sees only one line in your mSelected.txt file that contains all three patterns, when they should be on separate lines. Here are some ways of converting your files, or if you have e.g. TextWrangler installed you can do it there.
And then use the corrected files for the awk command. Should work.
That might be my mistake (during moderation copy-paste). I'll change that now.
Nope, OP's file looks like it has ^M characters plus a mix of tabs and spaces. Will def need a bit of cleaning.
I don't think you can replace ^M with a literal ^M. You might have to use \r
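For the cleaning mentioned above, one possible sketch is to address the carriage return as \r in tr and then trim stray spaces around the tab-separated fields with awk. It assumes CR-only line endings and tab as the real delimiter. The file names demo_messy.txt and demo_clean.txt are invented for this example.

```shell
# Demo input: CR line endings plus stray spaces around the tabs in the header.
printf 'Affy ID \t DDM1 \tDGKI2\r1438_at\t0.0635\t0.2065' > demo_messy.txt

# Convert CR to LF, then strip leading and trailing spaces from every field.
# Spaces inside a field (such as "Affy ID") are left untouched.
tr '\r' '\n' < demo_messy.txt |
  awk 'BEGIN { FS = OFS = "\t" }
       { for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i); print }' > demo_clean.txt

# Show the result with tabs made visible.
od -c demo_clean.txt
```

Using tr avoids depending on whether a given sed understands \r, which differs between GNU and BSD implementations.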