How to estimate how much RAM and runtime a job will need
This information is going to come from the documentation that ships with the software (not always a sure thing) and from perusing the forums. Start with the minimum recommended amount; throwing the kitchen sink at a job just wastes resources. You can use a sub-sample of your data to get an idea of run time: a couple hundred thousand reads is often enough for a rough estimate. With programs that support threading/multiple CPUs you will see some speed-up in execution time, but it will likely not be linear. There may also be nuances in how the RAM allocation should change with multiple cores, but that will be very program-dependent.
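Here is a minimal sketch of the sub-sample approach, assuming a placeholder command `my_tool` and a 200k-read subsample (made with something like `seqtk sample`); the linear extrapolation is only a ballpark, since scaling is rarely perfectly linear.

```python
import subprocess
import time

# Hypothetical paths and tool; substitute your own command and inputs.
SUBSAMPLE_FASTQ = "subsample_200k.fastq"   # e.g. made with seqtk sample
SUBSAMPLE_READ_COUNT = 200_000
FULL_READ_COUNT = 50_000_000               # reads in the full dataset

start = time.time()
# Run the tool on the subsample; "my_tool" is a placeholder.
subprocess.run(["my_tool", "--threads", "4", "--input", SUBSAMPLE_FASTQ],
               check=True)
elapsed = time.time() - start

# Naive linear extrapolation; treat it as a ballpark and add a safety margin.
scale = FULL_READ_COUNT / SUBSAMPLE_READ_COUNT
print(f"Subsample took {elapsed/60:.1f} min; "
      f"rough full-run estimate: {elapsed*scale/3600:.1f} h")
```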
What is the relationship between how much RAM you give a job and how
long it runs? Are these two parameters independent? Will one affect how
long you sit in the queue more than the other?
Not predictable, not completely and yes (in that order).
Rule of thumb is to allocate the minimum recommended amount of RAM (remember, there is no substitute for actual RAM) plus ~10% to account for overheads, differences in how your cluster is configured, etc. Some programs will page data to local storage if enough RAM is not available, which will increase the run time. On the other hand, if you have a TB of RAM available then you could read the entire nr BLAST index into memory and speed searches up. Most places have fewer nodes/job slots with access to lots of RAM, so you would likely wait in the queue longer with large-memory jobs.
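As a worked version of that rule of thumb, here is a trivial sketch; the 10% overhead figure is just the heuristic above, not something any scheduler enforces.

```python
import math

def ram_request_gb(min_recommended_gb, overhead_fraction=0.10):
    """Rule-of-thumb request: minimum recommended RAM plus ~10% overhead."""
    return math.ceil(min_recommended_gb * (1 + overhead_fraction))

# A tool documented to need 32 GB -> request about 36 GB from the scheduler.
print(ram_request_gb(32))  # 36
```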
I have been using an HPC cluster for a few years now and regularly
need to submit jobs that process large numbers (often over 100) of
large files, such as BAM files.
Despite some experience, I feel I am lacking some understanding of the basic concepts
I find that a bit surprising. Perhaps you are being modest, or you are truthfully recognizing a deficiency. What you have been doing so far has probably got you half-way there. Talking with your local fellow power users/sysadmins would be an excellent way to remediate this deficiency. If you have not had a sysadmin get on your case for doing something "out of bounds" on your cluster, then you have not come anywhere close to what is possible/acceptable!
Always remember to experiment with a couple of samples first before launching jobs on hundreds of them.
This depends on knowing the tool and quite a bit of trial and error. Start off by figuring out how to parallelize runs as much as possible, then optimize the RAM, wall time and number of cores for each parallel chunk, and also optimize the RAM, runtime and number of cores for the master process.
Start off with 16 GB RAM and 4-8 cores, a wall time of 48-72 hours, and tune from there. There are a whole lot of variables that go into the process.
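As an illustration of that starting point, here is a minimal sketch that writes a batch script with those resources, assuming a SLURM scheduler and a placeholder command `my_tool`; adapt the directives for SGE/PBS and your own tool.

```python
from pathlib import Path
import subprocess

# Starting-point resources from the advice above; tune after test runs.
# Assumes a SLURM scheduler; adjust the directives for SGE/PBS clusters.
job_script = """#!/bin/bash
#SBATCH --job-name=test_chunk
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=48:00:00
#SBATCH --output=test_chunk_%j.log

# Placeholder command; replace with your actual tool and inputs.
my_tool --threads 4 --input chunk_001.bam --output chunk_001.out
"""

Path("test_chunk.sbatch").write_text(job_script)
# Uncomment to submit once the script looks right:
# subprocess.run(["sbatch", "test_chunk.sbatch"], check=True)
```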
I doubt you will find any resource that explains these, because it's something you have to figure out for yourself through trial & error. It depends on the program you are using and the data you are processing. There are basically 2 approaches: 1) be extremely generous for each job and request more memory and time than you could possibly need, or 2) request only the bare minimum memory and time and see if the job completes successfully; if not, bump them up a little and try again.
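A minimal sketch of approach 2, assuming a hypothetical wrapper script `run_job.sh` that runs the job with a given memory request and returns a non-zero exit code on failure; how you actually detect an out-of-memory kill is scheduler-specific.

```python
import subprocess

def run_with_memory(mem_gb):
    """Run the job with `mem_gb` of RAM and block until it finishes.

    run_job.sh is a hypothetical wrapper; returns True only if the job
    succeeded. Detecting an OOM kill depends on your scheduler (exit code,
    accounting tools, log messages).
    """
    result = subprocess.run(["bash", "run_job.sh", str(mem_gb)])
    return result.returncode == 0

# Start near the bare minimum and bump the request up until the job completes.
mem_gb = 4
while not run_with_memory(mem_gb):
    mem_gb = int(mem_gb * 1.5)   # grow the request by ~50% each retry
    if mem_gb > 256:             # give up before the requests get absurd
        raise RuntimeError("Job keeps failing even with 256 GB requested")
print(f"Job completed with a {mem_gb} GB request")
```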
m93, people have invested time to answer your question.
If an answer was helpful, you should upvote it; if an answer resolved your question, you should mark it as accepted.

Some programs are written to run by storing data in memory. Other programs are written to work on sorted or otherwise predictably organized data from file streams. Still others work best doing a mix of both. It depends on your program and input.
Without knowing what you're doing, this is a tough question to answer with specifics. Yet:
Giving more memory to a program that uses a constant amount of memory will not change how fast it runs. This will just waste memory. However, if you can split the work and run lots of instances of said program, each working concurrently on a small piece of the problem, then more memory will help the overall task complete in less time, because your overall memory use will be, at most, M x N for constant memory cost M and N jobs.
Also, job schedulers will have an easier time moving many small-memory jobs from the wait queue into the run queue, than one monolithic large-memory job, which may need to wait until queue conditions allow allotment of a large chunk of memory.
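To illustrate the split-the-work point, here is a minimal sketch using Python's multiprocessing on a single node, with made-up numbers for the per-piece memory cost M and the worker count N; the same reasoning applies when the "workers" are many small scheduler jobs instead of local processes.

```python
from multiprocessing import Pool

# If one worker needs a roughly constant M GB, running N workers concurrently
# needs at most M * N GB overall, but each individual piece stays small and is
# easy for the scheduler to place.
M_GB_PER_WORKER = 2      # assumed constant per-piece memory cost
N_WORKERS = 8            # pieces running concurrently

def process_piece(piece_id):
    # Placeholder for real work on one small slice of the data.
    return f"piece {piece_id} done"

if __name__ == "__main__":
    print(f"Peak memory is at most ~{M_GB_PER_WORKER * N_WORKERS} GB "
          f"({N_WORKERS} workers x {M_GB_PER_WORKER} GB each)")
    with Pool(N_WORKERS) as pool:
        results = pool.map(process_piece, range(64))  # 64 pieces, 8 at a time
```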
Actually, it would help the forum if you could post a few tips/suggestions, since you have years of experience submitting bioinformatics jobs to an HPC cluster. Take dummy data or public-domain data and walk us through to the end. At least point to your blog/GitHub repo for scripts. m93