Imagine I have two datasets. The first has values of a dependent variable, time spent walking, along with many independent variables such as gender, age group, day of week and dog ownership. The second has much less data: just the independent variables and the number of people in each combination of gender, age group, etc. who were recorded as walking.
Suppose I use the first dataset to construct a distribution of walking times for each level of each variable, e.g. one distribution for males, one for females, one for Wednesdays, etc. (provided there are sufficient data points; I've read that 100 is a good rule of thumb).
Could I then combine these distributions to obtain a distribution for each row of the second dataset, and subsequently (via something like inverse transform sampling) generate a collection of data points for that row? For example, could I get a distribution of walking times for females aged 40-50, on a Wednesday, who own a dog, by overlaying the underlying distributions?
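A minimal Python sketch of the building blocks described above, assuming each record in the first dataset is a dict with hypothetical keys `gender`, `day` and `minutes`, and using synthetic exponentially distributed walking times in place of real data. `empirical_by_level` builds one empirical distribution per level of a single factor. `inverse_transform_sample` draws from such a distribution by passing uniform random numbers through its empirical quantile function (which amounts to resampling the observed values). `empirical_for_cell` conditions on several factors at once, which is only feasible where that cell has enough observations. How to combine the single-factor distributions when a cell is too sparse is exactly the open question, and this sketch does not attempt it.

```python
import random
from collections import defaultdict


def empirical_by_level(records, factor):
    """Group walking times by each level of one factor (e.g. 'gender')."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[factor]].append(rec["minutes"])
    # Sort each sample so it can serve as an empirical quantile function.
    return {level: sorted(times) for level, times in groups.items()}


def empirical_for_cell(records, **levels):
    """Sorted walking times for records matching every given factor level."""
    return sorted(
        r["minutes"] for r in records
        if all(r[k] == v for k, v in levels.items())
    )


def inverse_transform_sample(sorted_times, n, rng):
    """Draw n values by inverse transform sampling from the empirical CDF.

    For u ~ Uniform[0, 1), the empirical quantile function returns the
    order statistic at (0-based) index floor(u * len(sample)).
    """
    m = len(sorted_times)
    return [sorted_times[min(int(rng.random() * m), m - 1)] for _ in range(n)]


# Synthetic data for illustration only (not real observations).
rng = random.Random(0)
records = [
    {
        "gender": rng.choice(["female", "male"]),
        "day": rng.choice(["Mon", "Wed", "Sat"]),
        "minutes": round(rng.expovariate(1 / 30), 1),
    }
    for _ in range(500)
]

by_gender = empirical_by_level(records, "gender")
draws = inverse_transform_sample(by_gender["female"], 5, rng)

# Direct conditioning on two factors; check the cell size against the
# rule of thumb before trusting this distribution.
cell = empirical_for_cell(records, gender="female", day="Wed")
print(len(cell), "observations in the female/Wednesday cell")
```

Comparing samples drawn from a well-populated joint cell with whatever a combination of the single-factor distributions produces would be one way to check, on the first dataset, whether such a combination is reasonable.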
Many thanks, J