The Gene Set Enrichment Analysis (GSEA) algorithm, described in this paper, http://www.broadinstitute.org/gsea/doc/subramanian_tamayo_gsea_pnas.pdf, often refers to a "random walk" used to traverse the ranked list L of gene-to-phenotype correlations.

However, what they actually do in the paper does not look like a random walk at all. It seems to me that they traverse the ranked list L sequentially, from rank 1 (highest correlation) onwards.

I was wondering if anyone could clear up what they mean by "random walk", and why they use the term when it really looks like they are doing a sequential walk, which is quite the opposite.

Also, as a follow-up question, how is it that they do not bias the top of the ranked list `L` over the bottom? If we assume for the moment that they are doing a sequential walk, which seems to be the case, then the gene sets found at the bottom extreme will have a larger value for `P_miss`, since `P_miss` is proportional to `i`. As a consequence, they will have smaller enrichment scores.

Perhaps this is related to the question above, since a sequential walk does not seem to work here...

I appreciate any help... I suspect I am not understanding something correctly...

Hey, thanks! I think this article made it clear. They are comparing the supremum (ES) with what it would be for a random walk: gene sets concentrated at the top or the bottom will have a large ES in absolute value, and gene sets that are randomly distributed will resemble a random walk. Thanks!
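For anyone who finds this later, here is a minimal Python sketch of the running sum as I understand it from the paper. It is not the Broad implementation. The function name `enrichment_score`, the gene names `g0`, `g1`, ..., the list size of 1000 and the set size of 20 are all made up for illustration, and I assume unit weights unless correlations are passed in. The walk over `L` is sequential; the "random walk" is only what the running sum looks like when the gene set is scattered randomly through the list. Because ES is the maximum deviation from zero with its sign kept, a set at the bottom gets a strongly negative ES rather than a small one.

```python
import random


def enrichment_score(ranked_genes, gene_set, weights=None, p=1.0):
    """Signed maximum deviation from zero of P_hit - P_miss along the list.

    ranked_genes: genes ordered from highest to lowest correlation.
    gene_set:     set of genes to test (hypothetical input for this sketch).
    weights:      optional correlations r_j; unit weights are assumed if None.
    """
    n = len(ranked_genes)
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must contain some, but not all, genes")
    if weights is None:
        weights = [1.0] * n  # assumption: unweighted steps

    # Each hit steps up by |r_j|^p / N_R; each miss steps down by 1 / (N - N_H).
    hit_weights = [abs(w) ** p if h else 0.0 for w, h in zip(weights, hits)]
    n_r = sum(hit_weights)
    if n_r == 0:
        raise ValueError("hit weights sum to zero")
    miss_step = 1.0 / (n - n_hits)

    running = 0.0
    es = 0.0
    for is_hit, w in zip(hits, hit_weights):
        running += w / n_r if is_hit else -miss_step
        if abs(running) > abs(es):
            es = running  # keep the sign of the largest excursion
    return es


# Toy ranked list: 1000 hypothetical genes, already sorted by correlation.
genes = [f"g{i}" for i in range(1000)]

top_set = set(genes[:20])        # all members at the top of the list
bottom_set = set(genes[-20:])    # all members at the bottom of the list
random_set = set(random.Random(0).sample(genes, 20))  # scattered members

es_top = enrichment_score(genes, top_set)
es_bottom = enrichment_score(genes, bottom_set)
es_random = enrichment_score(genes, random_set)

print("top:", es_top, "bottom:", es_bottom, "random:", es_random)
```

The top and bottom sets come out at roughly +1 and -1, while the scattered set stays much closer to zero, which is the random-walk behaviour the paper is comparing against.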