TCGA: Bash, assigning a tumor to its matched normal
8.2 years ago
umn_bist

I am piping my TCGA files through a for loop, and up until variant calling (MuTect2) these RNA-seq samples have been processed individually.

For those unfamiliar: CGHub (GeneTorrent) downloads TCGA files without any way of knowing which tumor goes with which matched normal. The only lead is that each subdirectory the fastq files are downloaded into is named with the sample's unique UUID.
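A minimal Python sketch of indexing the download directory by those UUID subdirectory names, assuming every sample landed in its own subdirectory directly under one download root. The function names and the default `*.fastq*` pattern are hypothetical; the pattern can be switched to `*.bam` once you are working with the aligned files.

```python
from __future__ import annotations

import uuid
from pathlib import Path


def is_uuid(name: str) -> bool:
    """True if a directory name parses as a UUID (how the downloads are named)."""
    try:
        uuid.UUID(name)
    except ValueError:
        return False
    return True


def index_by_uuid(download_root: Path, pattern: str = "*.fastq*") -> dict[str, list[Path]]:
    """Map each UUID-named subdirectory (lower-cased) to its files matching `pattern`.

    Subdirectories whose names are not UUIDs (logs, temp dirs) are ignored.
    """
    index: dict[str, list[Path]] = {}
    for sub in sorted(download_root.iterdir()):
        if sub.is_dir() and is_uuid(sub.name):
            index[sub.name.lower()] = sorted(sub.glob(pattern))
    return index
```

For example, `index_by_uuid(Path("/path/to/downloads"), "*.bam")` (path hypothetical) gives a lookup table from UUID to files, so the spreadsheet only ever has to supply UUIDs.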

Fortunately, I kept an Excel spreadsheet that lists each tumor next to its matched normal.

My question is: as someone unfamiliar with SQL, how can I code this pseudo-algorithm?

  1. Recognize a tumor's UUID, then look up its matched normal's UUID (in the Excel sheet).
  2. Identify the corresponding subdirectories (and their fastq files) and assign the two files to the variables $TUMOR and $NORMAL (I would like to use these for all the files, nested inside a for loop).
  3. Feed them into MuTect2's options Input:tumor $TUMOR and Input:normal $NORMAL.
  4. Assign a filename (unique to this pair of samples) to a variable $SET.
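One way to cover steps 1 to 4 is to do the lookup once, outside the Bash loop, and write a manifest with one TUMOR, NORMAL, SET line per pair. The sketch below assumes the sheet has been exported to CSV with a header row containing two hypothetical columns, `tumor_uuid` and `normal_uuid`, and that `file_of` maps each UUID to the file in its subdirectory. The `<tumor>_vs_<normal>` naming for $SET is also just an illustrative choice.

```python
from __future__ import annotations

import csv
from typing import Iterable


def read_pairs(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Parse (tumor UUID, normal UUID) pairs from a CSV export of the sheet.

    Assumes hypothetical header columns 'tumor_uuid' and 'normal_uuid';
    rename them to match your spreadsheet.
    """
    pairs = []
    for row in csv.DictReader(lines):
        tumor = (row.get("tumor_uuid") or "").strip().lower()
        normal = (row.get("normal_uuid") or "").strip().lower()
        if tumor and normal:  # skips separator rows with empty cells
            pairs.append((tumor, normal))
    return pairs


def build_manifest(pairs, file_of):
    """Return (TUMOR, NORMAL, SET) rows; pairs with a missing file are skipped.

    file_of maps a lower-case UUID to the path of the file in that UUID's subdirectory.
    """
    manifest = []
    for tumor, normal in pairs:
        if tumor in file_of and normal in file_of:
            set_name = f"{tumor}_vs_{normal}"  # hypothetical naming scheme for $SET
            manifest.append((file_of[tumor], file_of[normal], set_name))
    return manifest


def to_tsv(manifest) -> str:
    """One tab-separated TUMOR, NORMAL, SET line per pair."""
    return "".join("\t".join(row) + "\n" for row in manifest)
```

If the TSV is saved as, say, `pairs.tsv`, a Bash loop such as `while IFS=$'\t' read -r TUMOR NORMAL SET; do ...; done < pairs.tsv` rebinds $TUMOR, $NORMAL and $SET on every iteration, so the MuTect2 call inside the loop never needs numbered variables. Since MuTect2 works on aligned reads, `file_of` would in practice point at each UUID's BAM rather than its fastq.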

EDIT: I am trying to solve this myself since it's an interesting problem. I realize it would be better to store the directory name first, then find that string in the Excel spreadsheet (is it okay if I have multiple sheets?), and then look for the string NT (normal) or TP (tumor): if TP, assign it to $TUMOR; otherwise, to $NORMAL.
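The TP/NT rule can be sketched as a small classifier over one set of spreadsheet rows. This assumes the code sits in its own cell (matching whole cells avoids accidental hits inside barcodes) and that the UUID is in a known column, passed here as a hypothetical `uuid_col`. It is deliberately stricter than "if not TP, then normal": a row with neither code, or a set without exactly one of each, raises an error instead of silently becoming a normal.

```python
from __future__ import annotations


def sample_role(cells: list[str]) -> str:
    """Return 'TUMOR' if a cell is exactly TP, 'NORMAL' if a cell is exactly NT."""
    codes = {cell.strip().upper() for cell in cells}
    if "TP" in codes:
        return "TUMOR"
    if "NT" in codes:
        return "NORMAL"
    raise ValueError(f"no TP/NT code in row: {cells}")


def split_set(rows: list[list[str]], uuid_col: int) -> tuple[str, str]:
    """Return (tumor UUID, normal UUID) for one set of spreadsheet rows."""
    found: dict[str, str] = {}
    for cells in rows:
        role = sample_role(cells)
        if role in found:
            raise ValueError(f"more than one {role} row in set: {rows}")
        found[role] = cells[uuid_col].strip().lower()
    if len(found) != 2:
        raise ValueError(f"set needs one TP and one NT row: {rows}")
    return found["TUMOR"], found["NORMAL"]
```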

My problem now is: how do I associate each tumor with its matched normal? Would it be easier to assign everything to $TUMOR_A, $NORMAL_A, $TUMOR_B, $NORMAL_B, and so on before entering the for loop, instead of reusing $TUMOR and $NORMAL in each iteration? (To be honest, I can't see how the latter would be possible.)
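Instead of numbered variables, one option is a single table keyed by patient. TCGA sample barcodes encode this: the first three fields (e.g. `TCGA-02-0001`) identify the participant, and the two-digit sample type that opens the fourth field marks tumor (01 to 09) versus normal (10 to 19). The sketch below assumes you can get a UUID-to-barcode mapping (for example from a hypothetical barcode column in your sheet); it pairs a tumor and a normal only when a participant has exactly one of each and leaves anything ambiguous for manual review.

```python
from __future__ import annotations

from collections import defaultdict


def parse_barcode(barcode: str) -> tuple[str, str]:
    """Split a TCGA sample barcode into (participant, role)."""
    fields = barcode.strip().upper().split("-")
    participant = "-".join(fields[:3])  # e.g. TCGA-02-0001
    sample_type = int(fields[3][:2])  # e.g. '01A' -> 1
    if 1 <= sample_type <= 9:
        return participant, "TUMOR"
    if 10 <= sample_type <= 19:
        return participant, "NORMAL"
    return participant, "OTHER"  # 20-29 are controls; anything else is unexpected


def pair_by_participant(samples: dict[str, str]) -> dict[str, tuple[str, str]]:
    """samples maps UUID -> barcode; returns participant -> (tumor UUID, normal UUID)."""
    groups: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for sample_uuid, barcode in samples.items():
        participant, role = parse_barcode(barcode)
        groups[participant][role].append(sample_uuid)
    pairs = {}
    for participant, roles in groups.items():
        tumors, normals = roles.get("TUMOR", []), roles.get("NORMAL", [])
        if len(tumors) == 1 and len(normals) == 1:
            pairs[participant] = (tumors[0], normals[0])
        # Unmatched or ambiguous participants (e.g. two tumors) are left out.
    return pairs
```

Looping over the returned dictionary then gives one tumor/normal pair per iteration, which is the $TUMOR/$NORMAL reuse the question asks about.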

The only thing I can work with is the separator line between each set in the spreadsheet. If there are other ways to attack this problem, please do let me know.
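If the separator line is the only structure available, it is enough to split the sheet into sets. A minimal sketch, assuming the sheet is exported to CSV without a header row and that a separator is a row whose cells are all empty. The function name and the delimiter default are illustrative.

```python
from __future__ import annotations

import csv
from typing import Iterable


def split_into_sets(lines: Iterable[str], delimiter: str = ",") -> list[list[list[str]]]:
    """Group rows of a CSV export into sets separated by empty rows."""
    sets: list[list[list[str]]] = []
    current: list[list[str]] = []
    for row in csv.reader(lines, delimiter=delimiter):
        if not any(cell.strip() for cell in row):
            # Separator row (empty line or only blank cells) closes the current set.
            if current:
                sets.append(current)
                current = []
        else:
            current.append(row)
    if current:
        sets.append(current)
    return sets
```

Each resulting set can then be checked for the TP/NT code described in the EDIT to decide which row is the tumor and which is the normal.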

RNA-Seq TCGA MuTect2 • 1.6k views