# What is the assumption on the distribution of data in Gaussian mixture models?

by Olórin   Last Updated March 14, 2019 20:19 PM - source

https://www.ics.uci.edu/~smyth/courses/cs274/notes/EMnotes.pdf

However, I am super confused at the very first line.

It says:

We have a dataset of data points $$x_i$$.

Each data point is assumed to be generated i.i.d. from an underlying distribution. We assume that the underlying distribution is a mixture of Gaussian distributions.

I do not understand why we assume that the underlying distribution of the data is a mixture of Gaussian distributions.

This seems to me to be completely false.

The data distribution could be anything. We are only fitting a mixture-of-Gaussians model to whatever that underlying distribution is. We are maximizing the log-likelihood using EM to approximate that distribution with the GMM.
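To make this concrete, here is a minimal sketch of what I mean, using only the Python standard library. The data are drawn from a uniform distribution, which is clearly not a mixture of Gaussians, and a 3-component GMM is fitted with plain EM anyway. The function names `normal_pdf` and `fit_gmm_1d`, the number of components, and the initialization scheme are my own assumptions for illustration; they are not taken from the linked notes.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=50, seed=0):
    """Fit a k-component 1-D GMM with plain EM (hypothetical helper).

    Returns (weights, means, variances, history), where history[i] is the
    log-likelihood of the data under the parameters at the start of
    iteration i.
    """
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n

    # Assumed initialization: means at random data points, shared variance.
    means = rng.sample(data, k)
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    history = []

    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point.
        resp = []
        loglik = 0.0
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([j / total for j in joint])
        history.append(loglik)

        # M-step: weighted maximum-likelihood updates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2
                               for r, x in zip(resp, data)) / nj

    return weights, means, variances, history


# Data that are NOT generated by any Gaussian mixture: uniform on [0, 1).
data_rng = random.Random(42)
data = [data_rng.random() for _ in range(500)]

weights, means, variances, history = fit_gmm_1d(data, k=3)
print("log-likelihood at start:", history[0])
print("log-likelihood at end:  ", history[-1])
```

Nothing in the procedure checks where the data came from: EM just keeps pushing the log-likelihood of the GMM upward on whatever sample it is given.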

Why do people assume that the data themselves are generated through Gaussians?

Is my interpretation correct?
