Hi! I am currently stuck at the stage where I have a sorted BAM file, a VCF file with variants, and an annotation GTF file downloaded from Ensembl. Is it possible to use bedtools (or would other programs be better?) to annotate my BAM file using a ready-made genome annotation from Ensembl (GTF format)? The goal is to find out which genes have been sequenced. Later, the task is to find out whether there are substitutions, deletions, premature stop codons, etc. Thanks
Thanks for the answer! These are whole-genome sequencing data (as indicated by the authors in NCBI), but they do not actually cover the entire genome and were originally used by the authors for other purposes. (The data are not from human.)
I know about GATK and VEP, but I have technical difficulties with how to actually start manipulating my files. So far I have only used: fastqc - trimmomatic - bowtie2 - samtools/bcftools - vcftools. As a result, I have a sorted file with aligned and deduplicated reads (BAM), as well as a file with variants (VCF).
Therefore, I wanted to know how to associate my files (BAM and VCF) with the Ensembl file (GTF). Or can GATK do this? Are there example commands? First: find out which genes are covered by reads. Second: find out which of the covered genes have variants.
Otherwise, could you say why and how to continue from here? Sorry if the question seems very general. Thanks.
Then, as the answer says, GATK and VEP can do what you need. GATK can report per-gene coverage, which you can use to determine which genes may lack the coverage needed to identify low-frequency SNPs. VEP can annotate the SNPs from your VCF file to tell you their potential consequences on gene products.
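If it helps to see the logic behind the second step (which genes have variants), here is a minimal Python sketch. It is not how GATK or VEP work internally, only an illustration of matching VCF positions against gene intervals from a GTF. The gene IDs and records are made-up toy data, and the function names are my own; for real files you would pass the lines of your own GTF and VCF, and the chromosome names must match between the two files.

```python
import re
from collections import defaultdict

# Pulls the gene_id value out of the GTF attributes column (column 9).
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def parse_gtf_genes(lines):
    """Return (chrom, start, end, gene_id) for every 'gene' feature.

    GTF coordinates are 1-based and inclusive.
    """
    genes = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            genes.append((fields[0], int(fields[3]), int(fields[4]), match.group(1)))
    return genes


def parse_vcf_sites(lines):
    """Return (chrom, pos) for every variant record; VCF POS is 1-based."""
    sites = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        sites.append((fields[0], int(fields[1])))
    return sites


def variants_per_gene(genes, sites):
    """Count variants whose position falls inside each gene interval."""
    by_chrom = defaultdict(list)
    for chrom, start, end, gene_id in genes:
        by_chrom[chrom].append((start, end, gene_id))
    counts = defaultdict(int)
    for chrom, pos in sites:
        # Simple linear scan; fine for a sketch, slow for very large inputs.
        for start, end, gene_id in by_chrom.get(chrom, []):
            if start <= pos <= end:
                counts[gene_id] += 1
    return dict(counts)


# Toy data (hypothetical gene IDs, not real Ensembl records).
GTF_EXAMPLE = [
    '1\tensembl\tgene\t100\t500\t.\t+\t.\tgene_id "GENE_A"; gene_name "a";',
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "T1";',
    '1\tensembl\tgene\t800\t1200\t.\t-\t.\tgene_id "GENE_B";',
    '2\tensembl\tgene\t50\t300\t.\t+\t.\tgene_id "GENE_C";',
]
VCF_EXAMPLE = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t150\t.\tA\tG\t50\tPASS\t.",
    "1\t450\t.\tC\tT\t60\tPASS\t.",
    "1\t900\t.\tG\tGA\t40\tPASS\t.",
    "2\t400\t.\tT\tC\t30\tPASS\t.",
]

counts = variants_per_gene(parse_gtf_genes(GTF_EXAMPLE), parse_vcf_sites(VCF_EXAMPLE))
for gene_id, n in sorted(counts.items()):
    print(gene_id, n)
```

This only tells you which genes contain variants, not what the variants do. For consequences such as stop codons or frameshifts you still need a tool like VEP, which takes transcripts, exons and reading frames into account.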