Human Whole Genome Sequence Data With Certain Read Length And Coverage
12.5 years ago
Steffi ▴ 580

Can anybody tell me how to search efficiently for human whole-genome sequence data (FASTA/FASTQ files) with a specific read length and coverage?

I am aware of the 1000 Genomes Project, but I have not yet found a tabular listing of the read lengths used. Furthermore, I find the sample descriptions rather confusing.

Thanks
steffi

whole-genome • 2.3k views
12.5 years ago

I would say the task you are describing is not possible at the moment. Read length is not usually considered important enough to warrant a dedicated searchable field.

Also, I think that a fixed read length is just a limitation or characteristic of a particular sequencing technology rather than a permanent attribute that characterizes all data.


Totally agree! +1

12.2 years ago

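Since the 1000 Genomes FTP site is accessible remotely via wget, you could fetch the second line of each FASTQ file (the sequence of the first read) and count its characters. Note that wc -c also counts the trailing newline, so the read length is the reported number minus one:

# Print each FASTQ path, then the character count of its first read.
# wc -c includes the trailing newline, so read length = count - 1.
wget -qO- ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/sequence.index | awk '{print $1}' | \
grep -v FASTQ_FILE | head | while read -r file; do echo "$file"; wget \
-qO- "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/$file" | gunzip -c | head -n 2 | \
tail -n 1 | wc -c; done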

data/NA19238/sequence_read/ERR000018.filt.fastq.gz
37
data/NA19238/sequence_read/ERR000019.filt.fastq.gz
37
data/NA19240/sequence_read/ERR000020.filt.fastq.gz
37
data/NA19240/sequence_read/ERR000020_1.filt.fastq.gz
37
data/NA19240/sequence_read/ERR000020_2.filt.fastq.gz
37
data/NA19238/sequence_read/ERR000021.filt.fastq.gz
37
data/NA19238/sequence_read/ERR000022.filt.fastq.gz
37
data/NA19238/sequence_read/ERR000023.filt.fastq.gz
41
data/NA19238/sequence_read/ERR000024.filt.fastq.gz
37
[...]
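
The one-liner above looks only at the first read of each file. As a rough sketch of how it could be extended, the two helper functions below tabulate read lengths over the first N reads of a FASTQ stream and turn a total base count into an approximate coverage figure. The function names, the default of 1000 reads, and the genome size of 3.1 Gb are assumptions made for illustration; none of them come from the 1000 Genomes site.

```shell
# Tabulate read lengths for the first N records of a FASTQ stream on stdin.
# Prints "<length> <count>" per line, sorted by length.
# awk's length() excludes the newline, so no off-by-one correction is needed.
fastq_read_lengths() {
    n="${1:-1000}"   # assumed default: sample the first 1000 reads
    awk -v n="$n" '
        NR % 4 == 2 { count[length($0)]++; if (++seen >= n) exit }
        END { for (len in count) print len, count[len] }
    ' | sort -n
}

# Estimate coverage as total sequenced bases divided by genome size.
# The default genome size (3.1 Gb) is an approximation for human.
estimate_coverage() {
    awk -v bases="$1" -v genome="${2:-3100000000}" \
        'BEGIN { printf "%.1f\n", bases / genome }'
}
```

Inside the loop above, the function could replace the head/tail/wc part, e.g. `gunzip -c | fastq_read_lengths 1000`. For coverage, the base counts of all runs belonging to one sample would have to be summed first; the result is only an estimate, since it ignores filtered, duplicate, and unmapped reads.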