I have a fairly large data set with 500,000 observations on 100 variables. Observations are randomly assigned to a treatment group and a control group. How do I verify that treatment has indeed been randomized in the data set? I ran t-tests on a couple of explanatory variables. Some gave significant p-values, some insignificant. What would be the most suitable check for randomization here?
Ideally, the person who undertook the randomisation should have written replicable code so that the randomisation can be audited and reproduced. If the randomisation was done in R, this means calling set.seed and keeping the code that generated the randomisation.
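As a rough illustration of what such replicable code might look like, here is a minimal sketch. The seed value, the variable names, and the 50/50 split are assumptions made for the example, not details from the question.

```r
# Hypothetical example: the seed, 'n', and the 50/50 split are illustrative assumptions.
set.seed(20240101)   # fix the RNG state so the assignment can be regenerated exactly
n <- 500000          # number of observations, as in the question

# Build a vector with equal numbers of 0 (control) and 1 (treatment),
# then shuffle it to obtain the random assignment.
treatment <- sample(rep(c(0L, 1L), length.out = n))

table(treatment)     # counts per group
```

Anyone who has this script can rerun it and compare the regenerated vector with the treatment column in the data, which is what makes the randomisation auditable.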
In the case where the randomisation is not replicable, the assignment is effectively just a bunch of numbers that have come from somewhere. You can conduct post-hoc "balance tests" to see if the different groups appear to have been randomised, but that is all you can do. The other thing you should do is get very cranky at the person who did the randomisation.
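One way such a balance check could be sketched in R is shown below. The data are simulated stand-ins, and the names X, treat and balance are hypothetical. The standardised mean difference is included alongside the t-test as one common summary, not as the only possible choice. Bear in mind that with 100 variables tested at the 5% level, a handful of significant p-values would be expected by chance even under a proper randomisation.

```r
# Hypothetical example on simulated data; replace X and treat with the real
# covariates and treatment indicator.
set.seed(1)
n <- 10000
p <- 5
X <- as.data.frame(matrix(rnorm(n * p), n, p))  # covariates, named V1..V5
treat <- rbinom(n, 1, 0.5)                      # simulated random assignment

balance <- data.frame(
  variable = names(X),
  # Standardised mean difference: difference in group means divided by the
  # square root of the average of the two group variances.
  smd = sapply(X, function(x) {
    (mean(x[treat == 1]) - mean(x[treat == 0])) /
      sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  }),
  # Welch two-sample t-test p-value for each covariate
  p_value = sapply(X, function(x) t.test(x[treat == 1], x[treat == 0])$p.value)
)

print(balance)
```

Looking at all covariates together, rather than a couple picked by hand, gives a fairer picture of whether the groups look like the result of a randomisation.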