How to estimate how much RAM and runtime a job will need
This information will come from the documentation that ships with the software (not always reliable) and from perusing the forums. Start with the minimum recommended amount; throwing the kitchen sink at a job just wastes resources. You can use a sub-sample of your data to get an idea of runtime: a couple hundred thousand reads may be enough for a rough estimate. With programs that support threading/multiple CPUs you will see some speed-up in execution time, but it will likely not be linear. There may be nuances in how RAM allocation should change with multiple cores, but that will be very program-dependent.
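As a rough sketch of the sub-sampling idea: time a pilot run and extrapolate. Everything below is illustrative, not from the original post; `seqtk`, `mytool`, the file names, and the numbers are all placeholders you would swap for your own tool and data.

```shell
# Hypothetical pilot-run extrapolation; all names and numbers are placeholders.
# Subsample ~200k reads first (seqtk is one common option), then time the tool:
#   seqtk sample -s100 reads.fq 200000 > pilot.fq
#   /usr/bin/time -v mytool pilot.fq     # note "Elapsed (wall clock) time"

PILOT_READS=200000
TOTAL_READS=50000000     # size of the full dataset (assumed)
PILOT_SECONDS=120        # wall-clock time measured on the pilot (assumed)

# Linear extrapolation; real scaling is often worse than linear, so treat
# this as a lower bound and pad the time you request from the scheduler.
EST_SECONDS=$(( PILOT_SECONDS * TOTAL_READS / PILOT_READS ))
echo "Estimated full-run time: ${EST_SECONDS} seconds"
```

Since threading speed-up is rarely linear, it is worth running the pilot with the same thread count you plan to use for the full job.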
The relationship between how much RAM you give a job and how much
runtime? Are these 2 parameters independent? Will one affect how long
you are in the queue more?
Not entirely predictable; not completely independent; and yes (in that order).
Rule of thumb is to allocate the minimum recommended amount of RAM (remember, there is no substitute for actual RAM) plus ~10% to account for overhead, differences in your cluster's configuration, etc. Some programs will page data to local storage if enough RAM is not available, which will increase the runtime. On the other hand, if you have a TB of RAM available, you could read the entire
nr BLAST index into memory and speed searches up. Most places have fewer nodes/job slots with access to lots of RAM, so jobs with large memory requirements will likely wait in the queue longer.
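On a SLURM cluster, for example, the "minimum + ~10%" rule might look like this in a batch script. The job name, tool, and numbers are illustrative assumptions; other schedulers (SGE, PBS) use different directives:

```shell
#!/bin/bash
#SBATCH --job-name=align_batch
#SBATCH --cpus-per-task=8     # threads for the tool below
#SBATCH --mem=36G             # e.g. 32 GB recommended minimum + ~10% overhead
#SBATCH --time=04:00:00       # padded estimate from a pilot run

# 'mytool' and its flags are placeholders for your actual program.
mytool --threads "$SLURM_CPUS_PER_TASK" input.bam
```

Asking for 36G instead of, say, 256G keeps you eligible for far more nodes, which is exactly why the queue-time point above matters.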
I have been using an HPC cluster for a few years now and regularly
need to submit jobs that process large numbers (often over 100) of
large files, such as BAM files.
Despite some experience, I feel I am lacking some understanding of the basic concepts.
I find that a bit surprising. Perhaps you are being modest, or perhaps you are truthfully recognizing a deficiency. What you have been doing so far has probably gotten you half-way there. Talking with your local fellow power users/sysadmins would be an excellent way to remediate this. If you have not had a sysadmin get on your case for doing something "out of bounds" on your cluster, then you have not pushed anywhere close to what is possible/acceptable!
Always remember to experiment with a couple of samples first before starting jobs in the hundreds.
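On SLURM those test runs are doubly useful, because the accounting database records what each job actually used. One way to check a finished pilot before sizing the full batch (the job ID below is a placeholder):

```shell
# Peak memory (MaxRSS) and wall time of a finished pilot job;
# 123456 is a placeholder job ID from your own submission.
sacct -j 123456 --format=JobID,JobName,MaxRSS,Elapsed,State
```

If MaxRSS came in well under what you requested with --mem, trim the request for the full batch; smaller asks generally spend less time in the queue.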