by stats134711
Last Updated September 11, 2019 09:19 AM - source

I'm interested in testing independence of two groups (e.g. case and control) in a $2\times 2$ table, i.e. $H_0: \theta=1$ against the *two-sided* alternative $H_1:\theta\neq 1$, where $\theta$ is the odds ratio. If the margins of the table are fixed, then under $H_0$ the number of hits among cases is a random variable $X\sim \text{HyperGeom}(n,N_1,N_2)$, where $n$ is the total number of hits, $N_1$ is the total number of cases and $N_2$ is the total number of controls. The pmf is given by
$$
Pr(X=x)=\frac{\binom{N_1}{x}\binom{N_2}{n-x}}{\binom{N_1+N_2}{n}}
$$
for $\max(0,n-N_2)\leq x\leq \min(n,N_1)$.
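
As a quick sanity check of this pmf, the sketch below evaluates the binomial-coefficient form directly and compares it with R's `dhyper`, assuming the argument mapping `dhyper(x, m = N1, n = N2, k = n)`. The margin values are arbitrary illustrative choices, and `pmf_manual` is a hypothetical helper name.

```r
# Illustrative margins (assumed values, chosen only for the check)
N1 <- 7    # total number of cases
N2 <- 48   # total number of controls
n  <- 15   # total number of hits

# Support of X given the fixed margins
support <- max(0, n - N2):min(n, N1)

# Hypothetical helper: pmf written out with binomial coefficients
pmf_manual <- function(x) {
  choose(N1, x) * choose(N2, n - x) / choose(N1 + N2, n)
}

p_manual <- pmf_manual(support)
p_dhyper <- dhyper(support, N1, N2, n)

# The two versions should agree up to floating-point tolerance,
# and the probabilities should sum to one over the support
all.equal(p_manual, p_dhyper)
sum(p_manual)
```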

For testing, I'm using the mid $p$-value, as it is one way to reduce the conservativeness of Fisher's exact test without resorting to randomized tests. Suppose that the observed number of hits among cases is $x_0$. I've seen two formulations of the *two-sided* mid $p$-value in the literature:

**Formulation 1**
Eq 1.10 or Section 2.2
$$
p^{(1)}_{\text{mid}}=\sum_{j:Pr(X=j)<Pr(X=x_0)} Pr(X=j) + \frac{1}{2} \sum_{j:Pr(X=j)=Pr(X=x_0)} Pr(X=j)
$$

**Formulation 2**
$$
\begin{aligned}
p_{lt} &= Pr(X<x_0)+0.5~Pr(X=x_0)\\
p_{gt} &= Pr(X>x_0)+0.5~Pr(X=x_0)\\
p^{(2)}_{\text{mid}} &= 2\min(p_{lt},p_{gt})=2\min(p_{lt},1-p_{lt})
\end{aligned}
$$
where the one-sided versions, $p_{lt}$ or $p_{gt}$, can be found in: Eq 1.7 or Section 5.1, to name a few.
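
The last equality in Formulation 2 holds because the two one-sided mid $p$-values split the total probability mass between them, each taking half of the observed point:
$$
p_{lt}+p_{gt}=Pr(X<x_0)+Pr(X=x_0)+Pr(X>x_0)=1
$$
Hence $p_{gt}=1-p_{lt}$, and $p^{(2)}_{\text{mid}}=2\min(p_{lt},1-p_{lt})$ can never exceed 1.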

In fact, Formulation 2 is the one used in SAS `PROC FREQ` and in certain functions in `R` packages, such as `epitools::ormidp.test`.

From a simple test on the $2\times 2$ tables below in `R`, I noticed that the two functions sometimes don't produce the same $p$-value. In fact, trying several tables suggests that Formulation 1 can be much less conservative than Formulation 2. Additionally, Formulation 2 can be more conservative than the two-sided Fisher's exact test, as shown below.

**Question**
Which formulation is appropriate (and under what situations)?

```
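# Formulation 1: total probability of all tables strictly less likely than the
# observed one, plus half the probability of tables exactly as likely.
midpval_f1 <- function(ct) {
  x  <- ct[1, 1]        # observed count in cell (1, 1)
  n  <- sum(ct[1, ])    # row-1 total (number of draws)
  N1 <- sum(ct[, 1])    # column-1 total
  N2 <- sum(ct[, 2])    # column-2 total
  lo <- max(0L, n - N2) # lower end of the support
  hi <- min(n, N1)      # upper end of the support
  support <- lo:hi
  out <- dhyper(support, N1, N2, n)
  p_obs <- out[x - lo + 1]  # probability of the observed table
  sum(out[out < p_obs]) + sum(out[out == p_obs]) / 2
}

# Formulation 2: twice the smaller of the two one-sided mid p-values.
midpval_f2 <- function(ct) {
  x  <- ct[1, 1]
  n  <- sum(ct[1, ])
  N1 <- sum(ct[, 1])
  N2 <- sum(ct[, 2])
  plt <- phyper(x - 1, N1, N2, n) + 0.5 * dhyper(x, N1, N2, n)
  pgt <- phyper(x, N1, N2, n, lower.tail = FALSE) + 0.5 * dhyper(x, N1, N2, n)
  2 * min(plt, pgt)
}

test_ct <- matrix(c(3, 5, 7, 9), ncol = 2, byrow = TRUE)
midpval_f1(test_ct)             # [1] 0.8366761
midpval_f2(test_ct)             # [1] 0.7956208

test_ct2 <- matrix(c(5, 10, 2, 38), ncol = 2, byrow = TRUE)
midpval_f1(test_ct2)            # [1] 0.006789634
midpval_f2(test_ct2)            # [1] 0.01357927
fisher.test(test_ct2)$p.value   # [1] 0.012561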
```
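
One way to see where the two formulations diverge on the second table is to list the null distribution and mark which outcomes Formulation 1 counts. The sketch below recomputes the pieces inline; names such as `extreme` and `counted_by_f1` are illustrative only. For this table every outcome that is no more likely than the observed one lies in the upper tail, so Formulation 1 reduces to the one-sided mid $p$-value $p_{gt}$, while Formulation 2 doubles it.

```r
# Second table from the question
ct <- matrix(c(5, 10, 2, 38), ncol = 2, byrow = TRUE)
x  <- ct[1, 1]
n  <- sum(ct[1, ])
N1 <- sum(ct[, 1])
N2 <- sum(ct[, 2])

# Null distribution over the full support
support <- max(0, n - N2):min(n, N1)
pmf     <- dhyper(support, N1, N2, n)
p_obs   <- pmf[support == x]   # probability of the observed table

# Outcomes no more likely than the observed one (these drive Formulation 1).
# Exact comparison, as in the question's code; near-ties may need a tolerance.
extreme <- support[pmf <= p_obs]
print(data.frame(x = support, prob = pmf, counted_by_f1 = pmf <= p_obs))

# One-sided mid p-values (these drive Formulation 2)
p_lt <- phyper(x - 1, N1, N2, n) + 0.5 * p_obs
p_gt <- phyper(x, N1, N2, n, lower.tail = FALSE) + 0.5 * p_obs

f1 <- sum(pmf[pmf < p_obs]) + 0.5 * sum(pmf[pmf == p_obs])
f2 <- 2 * min(p_lt, p_gt)

c(f1 = f1, f2 = f2, ratio = f2 / f1)
```

Because the null distribution here is strongly skewed, the lower tail contributes nothing to Formulation 1, whereas Formulation 2 in effect counts the upper tail twice; that doubling is also what pushes it above the two-sided Fisher $p$-value for this table.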
