How can we extract the alignment coordinates and gap coordinates in BLAT PSL?
2
1
Entering edit mode
9.1 years ago
Rohit ★ 1.5k

Hello,

I am bringing up the thread regarding the coordinates from BLAT psl-format, previous threads covered regarding in PSL format but I want to include the negative strand the the gap-blocks information too.

Blat sequence-alignment • 3.2k views
ADD COMMENT
0
Entering edit mode
9.1 years ago
kepbod ▴ 90

I think if you want to extract the alignment sequences, you could directly add "-out=pslx" when you run blat.

ADD COMMENT
0
Entering edit mode

Sorry I got back late, there is an extended option to include the sequences or alignments completely from pslx gives the output with the sequences, it does not give you blocks of alignments or missing gaps :(

ADD REPLY
0
Entering edit mode
4.0 years ago

I know that's an old question, but anyway: the default .psl output format of BLAT includes columns block count, blockSizes, qStarts and tStarts which contain gaps coordinates.
The following R snippet will help to extract all exact matches from BLAT output into a GRangesList object respecting the gaps:

  library(data.table)
  library(GenomicRanges)
  library(rtracklayer)
  library(stringr)

  # skip first 5 lines of header
  blat_out <- fread("blat_output_file.psl", skip = 5)
  setnames(blat_out, c("V1", "V9", "V10", "V11", "V14", "V16", "V17", "V18", "V19", "V20", "V21"), c("match", "strand", "name", "len", "chr", "start", "end", "blockCount", "blockSizes", "blockStarts", "tStarts"))

  # keep only exact matches
  blat_out <- blat_out[match==len]

  # remove any match on the "alternative" chromosomes
  blat_out <- blat_out[!str_detect(chr, "alt")]
  blat_out <- blat_out[,.(name, strand, chr, start, end, len, blockCount, blockSizes, blockStarts, tStarts)]

  # to make "name" column unique
  blat_out[,name:=paste0(name, "_", 1:nrow(.SD)),by=name]

  blat_out_blocks_granges <- GRangesList(lapply(blat_out$name, function(name_i){
    row <- blat_out[name==name_i]
    block_sizes <- as.integer(unlist(str_split(row$blockSizes, ",")))
    block_sizes <- block_sizes[!is.na(block_sizes)]

    t_starts <- as.integer(unlist(str_split(row$tStarts, ",")))
    t_starts <- t_starts[!is.na(t_starts)]

    result <- data.table(start=t_starts, end=t_starts+block_sizes)
    makeGRangesFromDataFrame(cbind(row[,.(name, strand, chr)], result), keep.extra.columns = T)
  }))

  names(blat_out_blocks_granges) <- blat_out$name

  # and export to a .bed file
  export(blat_out_blocks_granges, con = "output.bed", format = "bed")
ADD COMMENT

Login before adding your answer.

Traffic: 1875 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6