Hello,
Currently I'm trying to figure out how to download the latest patch of the hg38 assembly in AGP format from NCBI.
I have some oddly stringent requirements which make this process difficult:
- I need to make the command as short and as simple as possible for other users to use.
- It should be future-proof and get the latest patch number. However, I'd settle for a permanent location for older patch numbers.
- In the FTP link above, there are AGP files that are not necessary for our users to obtain. I only need the AGP files for chr1-22, X and Y.
I've tried various combinations of wget's recursive and accept-regex options, but they seem to be almost ignored. I suspect this is because the target is an FTP site, so wget never fetches a proper HTML file to parse. You can "glob" on FTP sites using wget: it fetches the ".listing" file and matches the glob against the names in there. However, I cannot find a pattern that matches only the AGP files I'm interested in.
Any insight or best guesses would be greatly appreciated.
Thanks!
Unfortunately, this retrieves the alt, MT, unplaced and unlocalized AGP files as well, which doesn't meet my 3rd requirement. There is always the option of doing this in multiple commands and removing the unnecessary chromosomes afterwards, but again I'm trying to keep this as short and simple as possible.
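For reference, the "remove afterwards" fallback could look roughly like the sketch below. It is only a rough illustration, not a tested recipe against the NCBI site: the helper name keep_primary_agp is made up here, and it assumes the downloaded files are named like chr1.agp.gz, chrX.agp.gz, chrMT.agp.gz (chr, then the chromosome name, then a dot), which may not match the real listing exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wget): reads file names on stdin and
# prints only those belonging to chr1-22, chrX or chrY.
# ASSUMPTION: names look like "chr<name>.<rest>", e.g. chr1.agp.gz.
keep_primary_agp() {
    # [1-9] covers chr1-9, 1[0-9] covers chr10-19, 2[0-2] covers chr20-22.
    # The trailing "\." stops chr1 from also matching chr10 or chr1_alt.
    grep -E '^chr([1-9]|1[0-9]|2[0-2]|X|Y)\.'
}

# Possible use after a download (left commented out because it deletes files):
# for f in chr*.agp.gz; do
#     printf '%s\n' "$f" | keep_primary_agp >/dev/null || rm -- "$f"
# done
```

This does add a second step, so it doesn't really meet the "short and simple" requirement; it is only the fallback if no single glob can be found.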
No it does not. I am not sure what wget you are using but on my system I only get chr* files. If you don't need chrMT then I have amended the command above to reject that file.
My mistake, yes I meant only MT file as well.