I am trying to understand what is going on when an inverse Wishart prior is used for a (Gaussian) covariance matrix, and what the motivation for it is. I have seen it proposed as a remedy for cases where there are too few data samples to estimate the parameters reliably.
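I assume you are aware that a Wishart matrix can be generated as the scatter matrix $X^TX$, where each row of $X$ is an independent draw from a zero-mean multivariate normal distribution, yes? (Equivalently, $X^TX$ is the sum of the outer products of the rows of $X$.) So a Wishart prior might emerge from an assumption of multivariate normality in your data. The inverse Wishart is the distribution of the inverse of such a matrix (assuming $X$ has enough rows for $X^TX$ to be invertible) and is therefore also a distribution over SPD matrices. Choosing the inverse Wishart as the prior on the covariance guarantees a nice form for the posterior because it is conjugate to the Gaussian likelihood: the posterior is again inverse Wishart, with the data scatter matrix added to the prior scale matrix and the number of samples added to the degrees of freedom. That is also why it helps when samples are scarce: the prior scale matrix acts like a set of pseudo-observations, so the posterior estimate stays positive definite even when the sample covariance alone would be singular. Intuitively, the Wishart is also a decent choice of prior, with the added benefit of not inversely penalizing matrices with large determinants, although you give up the closed-form posterior.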
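To make the conjugate update concrete, here is a minimal sketch in NumPy. It assumes the mean is known (zero here) and uses hypothetical names (`inverse_wishart_posterior`, `Psi`, `nu`) that are mine, not anything standard; the prior `Psi = I`, `nu = p + 2` is chosen purely for illustration. It deliberately uses fewer samples than dimensions, so the plain sample covariance is singular while the posterior mean is not.

```python
import numpy as np

def inverse_wishart_posterior(X, mu, Psi, nu):
    """Conjugate update for Sigma ~ IW(Psi, nu), rows of X ~ N(mu, Sigma), mu known.

    Returns the posterior scale matrix, posterior degrees of freedom,
    and the posterior mean of Sigma (hypothetical helper for illustration).
    """
    n, p = X.shape
    centered = X - mu
    S = centered.T @ centered          # scatter matrix: sum of outer products of rows
    Psi_post = Psi + S                 # prior scale plus data scatter
    nu_post = nu + n                   # prior degrees of freedom plus sample count
    # The mean of IW(Psi, nu) is Psi / (nu - p - 1), defined for nu > p + 1.
    mean_post = Psi_post / (nu_post - p - 1)
    return Psi_post, nu_post, mean_post

rng = np.random.default_rng(0)
p, n = 5, 3                            # fewer samples than dimensions
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

sample_cov = X.T @ X / n               # rank at most n, hence singular when n < p
rank = np.linalg.matrix_rank(sample_cov)

Psi, nu = np.eye(p), p + 2             # illustrative prior, an assumption of this sketch
Psi_post, nu_post, Sigma_hat = inverse_wishart_posterior(X, np.zeros(p), Psi, nu)
min_eig = np.linalg.eigvalsh(Sigma_hat).min()

print("rank of sample covariance:", rank)
print("smallest eigenvalue of posterior mean:", min_eig)
```

The posterior mean is the sample scatter shrunk toward the prior scale matrix, and the prior's influence fades as `n` grows relative to `nu`.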