Question

Gene and pseudogenes

0

5.0 years ago

sabaghianamir70 ▴ 70

I mapped 2 types of RNA-seq data and, after running htseq-count and testing for differential expression with limma, I saw that PER3 and PER4 have exactly the same logFC and adjusted p-value. I tried it with a different type of RNA-seq data and got the same result. My question is: is this normal, or am I making a mistake?

RNA-Seq • 1.3k views

ADD COMMENT • link 5.0 years ago by sabaghianamir70 ▴ 70

0

What species? How do you map? What aligner did you use? Which annotation? Did you mask pseudogenes? Why don't you format your text?

ADD REPLY • link 5.0 years ago by JC 13k

0

In human. I used STAR and featureCounts, and I tried with single-end and paired-end data. The annotation was GENCODE v29. I don't know what masking pseudogenes means.

ADD REPLY • link 5.0 years ago by sabaghianamir70 ▴ 70

1

JC means that you can, for example, exclude the pseudogenes from the featureCounts stage.

If you have used the 'comprehensive' GENCODE, then you will have ~199,000 transcripts and isoforms, the majority of which are non-coding. ~50,000 relate to pseudogenes.

You may also want to explore the multi-mapping parameters for both STAR (outFilterMultimapNmax) and featureCounts (-M).
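One way to exclude pseudogenes before the featureCounts stage is to filter the annotation itself. The following is a minimal sketch, assuming a GENCODE-style GTF in which each record carries a `gene_type "..."` attribute (Ensembl-style files call this `gene_biotype` instead); the function names and file paths are hypothetical.

```python
import re

# Matches the gene_type attribute used in GENCODE GTF files (assumption:
# Ensembl-style GTFs name this attribute gene_biotype instead).
GENE_TYPE = re.compile(r'gene_type "([^"]+)"')


def drop_pseudogenes(gtf_lines):
    """Yield GTF lines whose gene_type does not contain 'pseudogene'."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line  # keep header comments untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or malformed records
        match = GENE_TYPE.search(fields[8])
        if match and "pseudogene" in match.group(1):
            continue  # e.g. processed_pseudogene, unprocessed_pseudogene
        yield line


def filter_gtf(in_path, out_path):
    """Write a copy of the GTF at in_path without pseudogene records."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_pseudogenes(src))
```

For example, `filter_gtf("gencode.v29.annotation.gtf", "gencode.v29.no_pseudogenes.gtf")` (both paths are placeholders) would produce an annotation that could then be supplied to the counting step, provided the tool accepts a user-supplied GTF.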

ADD REPLY • link 5.0 years ago by Kevin Blighe 87k

0

I'm working with Galaxy and I don't know whether its tools have this option. So, in general, do you think these data are trustworthy or not?

ADD REPLY • link 5.0 years ago by sabaghianamir70 ▴ 70

score 0 · Answer 1 · 2019-04-16

0

5.0 years ago

Kevin Blighe 87k

Edit 20th April, 2019: zoom to answer, here: C: Gene and pseudogenes

------------------------

Hello, and welcome, Amir. What is not 100% clear is how you performed your analysis.

What are the "2 types" of RNA-seq data to which you refer?
Which program did you use for alignment?
How did you normalise the data?
What do you mean by "per3" and "per4" logFC?

ADD COMMENT • link 5.0 years ago by Kevin Blighe 87k

0

The 2 types both involve circadian rhythm in fibroblasts.

I am using Galaxy because my department doesn't have a supercomputer to work with.

I first get the raw data and then run FastQC and Trimmomatic.

I mean the expression of PER3 (the parent gene) and PER4 (PER3's pseudogene). In the end, what matters are the logFC (log fold change), the adjusted p-value and the FDR (false discovery rate).

ADD REPLY • link 5.0 years ago by sabaghianamir70 ▴ 70

0

If these 2 genes exhibit ~100% sequence similarity, then it may be impossible to faithfully distinguish them via short read NGS technology. Is PER4 a processed (contains only spliced portion of PER3) or unprocessed (contains the genomic sequence of PER3) pseudogene?

ADD REPLY • link 5.0 years ago by Kevin Blighe 87k

0

It does not have any introns, so I think it is a processed pseudogene. After I ran limma, I got ~200,000 gene fold changes, but in nearly 20,000 of them I saw the same numbers, like PER3 and PER4. Abcc6 and Abcc6P1 were also like the PER3/PER4 situation. But for some genes and pseudogenes the numbers were different. And my question is right here: is it possible that limma couldn't identify the reads, or is it OK and these numbers are the same just because of their similar sequence? I must also say that the numbers are different at different time points.

These are some of those transcripts. As you can see, they all have the same numbers:

ENST00000612945.4  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000614137.1  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000614131.1  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000605169.1  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000443027.1  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000447999.1  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000451755.1  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000391120.2  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000634803.1  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000453188.1  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000422697.1  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
ENST00000364904.1  -0.126416532  -3.603015772  -0.575651618  0.580084609  0.612291571  -6.174386577
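A quick way to see how widespread this pattern is would be to group the result rows by their statistics and list the IDs that share an identical set of values. This is only a sketch: the function name is hypothetical, the first two rows use values from the listing here, and the third row is made up to show a non-matching case.

```python
from collections import defaultdict


def group_identical_rows(rows):
    """Map each tuple of statistics to the IDs that share it.

    rows: iterable of (feature_id, stats), where stats is a sequence of numbers.
    Only groups containing more than one ID are returned.
    """
    groups = defaultdict(list)
    for feature_id, stats in rows:
        groups[tuple(stats)].append(feature_id)
    return {stats: ids for stats, ids in groups.items() if len(ids) > 1}


shared = (-0.126416532, -3.603015772, -0.575651618,
          0.580084609, 0.612291571, -6.174386577)
rows = [
    ("ENST00000612945.4", shared),
    ("ENST00000614137.1", shared),
    # Hypothetical row with different values, for contrast only.
    ("TX_HYPOTHETICAL", (1.2, 4.5, 3.1, 0.002, 0.03, -1.0)),
]

for stats, ids in group_identical_rows(rows).items():
    print(len(ids), "features share", stats)
```

A large group with a single shared set of values is a hint to look at the raw counts for those features.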

ADD REPLY • link 5.0 years ago by sabaghianamir70 ▴ 70

0

Thanks for showing this. What were the raw counts for these genes? - all zero?

ADD REPLY • link 5.0 years ago by Kevin Blighe 87k

0

If you mean the count data from featureCounts, yes, all zero.

ADD REPLY • link 5.0 years ago by sabaghianamir70 ▴ 70

1

In that case, the result makes sense. When you include genes/transcripts whose counts are all zero, the normalisation / transformation converts them all to the same constant value (as in your data), so they end up with identical statistics. So, you should consider removing these 'zero' genes before you perform the normalisation step.
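To illustrate why all-zero features collapse to the same value, here is a small sketch. It assumes a voom-style log-CPM transform, log2((count + 0.5) / (library size + 1) * 1e6), which may not be exactly what was run in this analysis; the feature IDs and counts are toy values.

```python
import math


def log_cpm(count, lib_size):
    # voom-style transform (assumption): log2 counts per million with
    # offsets of 0.5 on the count and 1 on the library size.
    return math.log2((count + 0.5) / (lib_size + 1.0) * 1e6)


def drop_all_zero(counts):
    """Keep only features with at least one non-zero count across samples."""
    return {fid: row for fid, row in counts.items() if any(c > 0 for c in row)}


# Toy data: IDs and counts are made up for illustration.
counts = {
    "tx_expressed": [120, 95, 143],
    "tx_zero_a": [0, 0, 0],
    "tx_zero_b": [0, 0, 0],
}
lib_sizes = [sum(col) for col in zip(*counts.values())]

transformed = {
    fid: [log_cpm(c, n) for c, n in zip(row, lib_sizes)]
    for fid, row in counts.items()
}
# The two all-zero features are indistinguishable after the transform.
print(transformed["tx_zero_a"] == transformed["tx_zero_b"])

# Removing them before normalisation avoids carrying these rows forward.
filtered = drop_all_zero(counts)
print(sorted(filtered))
```

In an R/limma workflow the same idea is often applied with a low-count filter such as edgeR's `filterByExpr` before normalisation, which is stricter than removing only the all-zero rows.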

ADD REPLY • link 5.0 years ago by Kevin Blighe 87k