HMP database, how to process 16S data?
6.0 years ago
agata88 ▴ 870

Hi all!

Lately I have been processing some public 16S data and I came across the Human Microbiome Project (HMP). I decided to practice on this data, so I downloaded the SRR files (16S raw sequences) for the elbow body site, which you can see here: https://portal.hmpdacc.org/search/c?filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.sample_body_site%22,%22value%22:%5B%22elbow%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.file_format%22,%22value%22:%5B%22Standard%20Flowgram%20File%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.file_type%22,%22value%22:%5B%2216s_raw_seq_set%22%5D%7D%7D%5D%7D&pagination=%7B%22cases%22:%7B%22from%22:101,%22size%22:100,%22sort%22:%22case_id.raw:asc%22%7D%7D&facetTab=files

There are 125 samples in 4 files, about 9 GB of data in total. Unfortunately, I am not very familiar with 454 data (SFF files), so I ran into some problems during the analysis.

I saw that all 4 SFF files also include other body sites besides elbow (leg, knee, scalp, etc.). Since I am only interested in the elbow data, I wanted to split the reads by sample ID, which is written as e2559e04fcd73935a7d7b917907a1f46, e2559e04fcd73935a7d7b917907a5ced, etc.

I converted the SFF files to FASTA and QUAL files with QIIME's process_sff.py. After this step I could not find the sample IDs in the FASTA headers, and now I am confused. How can I split this data by sample ID, or by body site?

Any help will be much appreciated.

Best, Agata

HMP 16S 454 • 1.5k views

I think you need a mapping file, i.e. a table that links each sample ID to its barcode. Look for one on the page where you downloaded the data.
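To illustrate what a mapping file would be used for, here is a minimal Python sketch of barcode-based demultiplexing. It is not the HMP or QIIME procedure, only an illustration under several assumptions: that you can obtain a tab-separated table with `#SampleID` and `BarcodeSequence` columns (the QIIME 1 mapping-file convention), that each read in the FASTA output still starts with its barcode, that all barcodes have the same length, and that exact matching is acceptable. The function names and the sample IDs in the demo are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_barcodes(mapping_handle):
    """Return {barcode: sample_id} from a tab-separated mapping table.

    Assumes a header row containing '#SampleID' and 'BarcodeSequence'.
    """
    reader = csv.DictReader(mapping_handle, delimiter="\t")
    return {row["BarcodeSequence"].upper(): row["#SampleID"] for row in reader}


def read_fasta(fasta_handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def demultiplex(fasta_handle, barcodes):
    """Group reads by sample using an exact barcode match at the read start.

    The barcode is trimmed from assigned reads; reads that match no
    barcode are collected under the key 'unassigned'.
    """
    by_sample = defaultdict(list)
    for header, seq in read_fasta(fasta_handle):
        seq = seq.upper()
        sample = "unassigned"
        for barcode, sample_id in barcodes.items():
            if seq.startswith(barcode):
                sample, seq = sample_id, seq[len(barcode):]
                break
        by_sample[sample].append((header, seq))
    return dict(by_sample)


if __name__ == "__main__":
    # Tiny in-memory demo with made-up sample IDs, barcodes, and reads.
    mapping = io.StringIO(
        "#SampleID\tBarcodeSequence\nelbow_1\tACGTAC\nknee_1\tTTGGCA\n"
    )
    reads = io.StringIO(">read1\nACGTACGGGTTT\n>read2\nTTGGCAAAACCC\n")
    for sample, records in demultiplex(reads, read_barcodes(mapping)).items():
        print(sample, len(records))
```

Once the reads are grouped by sample ID, keeping only the elbow samples is a matter of filtering on the sample IDs listed for that body site. For real data, an established demultiplexing tool that handles barcode errors and quality files is likely a better choice than exact matching.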
