Do fasta files have _any_ newline dependence?
4.9 years ago
Bosberg ▴ 50

I'm constructing an artificial "genome" to do alignments against, and there are several segments of it that I'd like to keep visually distinct, just for my own reference later (e.g. I use "BC" for "barcode" instead of the usual [chr]omosome). My "genome" looks like this:

>BC1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
GGGACCGGT
CGAGGTGGTTGAAGGTCCTATAATGTCGCCCTCTCCTTCAT
CAGACCAGTAGACCGATTAGGATAGAAAGGCTTAAAACTTA
GGAGTGTGGTTTGTAATTAGGATAGAAAGGCTTAAAACTTA
CGAGGTGGTTGAAGGTCCTATAATGTCGCCCTCTCCTTCAT
CAGACCAGTAGACCGATTAGGATAGAAAGGCTTAAAACTTA
GGAGTGTGGTTTGTAATTAGGATAGAAAGGCTTAAAACTTA
GGTATAGTTATC
CCTAG
AAAAAAAAAAAAAAAAAAA
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>BC2
...(similar to above)

Each "chromosome" that I'm defining (e.g. BC1, BC2, etc.) has a few segments corresponding to restriction sequences, a poly-A tail, and so on. There is no actual biological segmentation within each >BC block; I'm separating the segments with newlines only so that I can quickly come back and visually distinguish each part later. Can this create any problems? Are there any indexing or genome-conversion packages that assume fixed line lengths? I'm just wondering if this is bad practice.

*Edit:* I should add that I'm planning on using minimap2 and samtools for indexing and alignment.

4.9 years ago

Can this create any potential problems? Are there any indexing or genome-conversion packages that assume fixed line lengths?

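Yes. samtools faidx assumes that, within each sequence, all lines have the same length (only the last line of a record may be shorter), so a file with ragged lines like the one above will fail to index.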
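If the visual segmentation is still useful for reference, one option is to keep the hand-formatted file as a personal copy and generate a re-wrapped version for indexing. The following is a minimal Python sketch of that idea, not part of samtools or minimap2: the helper names (`parse_fasta`, `has_uniform_lines`, `rewrap`) and the default width of 60 are assumptions chosen for illustration. It checks whether each record has uniform line lengths and rewrites the sequence with a fixed width.

```python
def parse_fasta(text):
    """Yield (header, sequence_lines) for each record in a FASTA string."""
    header, lines = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                yield header, lines
            header, lines = line[1:], []
        elif line.strip():
            # Skip blank lines; keep sequence lines without surrounding whitespace.
            lines.append(line.strip())
    if header is not None:
        yield header, lines


def has_uniform_lines(lines):
    """True if all lines but the last share one length and the last is no longer."""
    if len(lines) <= 1:
        return True
    width = len(lines[0])
    return all(len(l) == width for l in lines[:-1]) and len(lines[-1]) <= width


def rewrap(text, width=60):
    """Return the FASTA text with every record re-wrapped to a fixed line width."""
    out = []
    for header, lines in parse_fasta(text):
        out.append(">" + header)
        seq = "".join(lines)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical miniature of the hand-segmented layout from the question.
    example = ">BC1\nNNNN\nNNNN\nGG\nACGT\n>BC2\nAAAA\nCC\n"
    for name, seq_lines in parse_fasta(example):
        print(name, "uniform" if has_uniform_lines(seq_lines) else "ragged")
    print(rewrap(example, width=4), end="")
```

The re-wrapped output carries the same sequence with the line breaks moved, so alignment coordinates are unaffected; only the visual grouping is lost in the indexed copy.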
