Yes, the fact of the matter is that RNA-seq count data naturally follows a negative binomial distribution, so, one has to model the data as such if one wishes to derive statistics from it. Original analysis methods modeled it as a Poisson but it was found that this still resulted in false-positive associations after having fitted the model. Microarray gene expression data, on the other hand, follows a normal distribution.
Practically, what does fitting a model actually mean? - What happens is that we literally create a logistic regression model of the data and specify the negative binomial as the family. In pseudocode:
glm(outcome ~ gene1 + CounfoundingFactors, family="NegativeBinomial")
That is fit for all genes. Other parameters are added for the purposes of nomalisation and dispersion.
With a model of the RNA-seq data, we can then make statistical inferences.
Some of your other questions are specific to the normalisation method that one actually uses. Typically, what are known as size factors are calculated, which will cater for the scenario that you mentioned.