by Till Hoffmann
Last updated October 17, 2018, 17:19

Let $X$ be a real, univariate random variable with probability density function $p(x)$, and let $x_1, \ldots, x_n$ be a sample of size $n$ drawn from $X$. U-statistics provide an unbiased estimator of the integral \begin{align} I&=\int dx\,dx'\,\phi(x,x')p(x)p(x')\\ &\approx \frac{1}{n(n-1)}\sum_{i\neq j}\phi(x_i,x_j) \end{align} by employing the empirical distribution function and omitting the terms with $i=j$. These terms are omitted because they can have peculiar properties. For example, if $X\sim\mathrm{Normal}(0,1)$ and $\phi(x,x')=|x-x'|$, there would be $n$ terms with $\phi(x_i,x_i)=0$ even though the probability that $x=x'$ is zero.
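As a quick illustration (my own sketch, not part of the original argument), the following compares the naive "V-statistic", which keeps the $i=j$ terms, against the U-statistic for $\phi(x,x')=|x-x'|$; for $X\sim\mathrm{Normal}(0,1)$ the true value is $E|X-X'|=2/\sqrt{\pi}\approx 1.128$.

```python
import numpy as np

# Sketch (assumptions: phi(x, x') = |x - x'| and X ~ Normal(0, 1),
# for which the true value is E|X - X'| = 2 / sqrt(pi) ~= 1.128).
rng = np.random.default_rng(0)
n = 1000
x = rng.normal(0, 1, n)

diffs = np.abs(x[:, None] - x[None, :])  # n-by-n matrix of phi(x_i, x_j)

# Naive "V-statistic": averages over all n^2 pairs, including the
# n diagonal terms phi(x_i, x_i) = 0, which bias it downwards.
v_stat = diffs.sum() / n ** 2

# U-statistic: omits the i = j terms and averages over n(n - 1) pairs.
u_stat = (diffs.sum() - np.trace(diffs)) / (n * (n - 1))

print(v_stat, u_stat, 2 / np.sqrt(np.pi))
```

Here the bias is mild, since each zero diagonal term merely shrinks the average by a factor of $(n-1)/n$; the zero terms become far more damaging when they enter multiplicatively, as in the integral below.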

I would like to evaluate the integral $$ J=\int dx' \prod_{i=1}^n \phi(x_i,x') p(x'). $$ Naively, one might approximate it using the empirical distribution function such that $$ \hat{J}_1= \frac 1n\sum_{j=1}^n \prod_{i=1}^n \phi(x_i,x_j). $$ But each term of the sum contains a factor with $i=j$, which is likely to cause problems. Is there an equivalent to U-statistics for integrals of this form?

The following estimator avoids combinations with $i=j$, but I don't trust it because I don't know how it behaves. $$ \hat{J}_2=\frac{1}{n}\sum_{j=1}^n\left(\prod_{i\neq j}^n \phi(x_i,x_j)\right)^{n/(n-1)} $$
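One way to read the exponent (my interpretation, not stated in the post): the inner product has only $n-1$ factors once the $i=j$ term is dropped, so raising it to the power $n/(n-1)$ restores the "effective" count of $n$ factors appearing in $J$ while preserving the geometric mean,

```latex
\left(\prod_{i\neq j}\phi(x_i,x_j)\right)^{n/(n-1)}
= \left[\left(\prod_{i\neq j}\phi(x_i,x_j)\right)^{1/(n-1)}\right]^{n},
```

i.e. the geometric mean of the $n-1$ available factors, raised to the $n$-th power as if there were $n$ of them.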

Let $X\sim\mathrm{Normal}(0,1)$, $n=50$, and $\phi(x,x')=|x-x'|$. Then $\hat{J}_1=0$ for any sample because each product contains a term $\phi(x_i,x_i)=0$. The alternative estimator performs better (see the `python` code below).

```
import numpy as np

# Define a kernel
kernel = lambda x, y: np.abs(x - y)

# Generate some data
np.random.seed(1)
n = 50
x = np.random.normal(0, 1, n)

# Perform the integral by Monte Carlo integration
# using the known distribution for x
nsamples = 1000
y = np.random.normal(0, 1, nsamples)
samples = np.prod(kernel(x[:, None], y[None]), axis=0) ** (1.0 / n)
integral = np.mean(samples)
integral_std = np.std(samples) / np.sqrt(nsamples)
print("Monte Carlo integration: {:.2f}+-{:.2f}".format(integral, integral_std))

# Use the naive estimator (which fails terribly because every
# product contains a zero diagonal term)
samples = np.prod(kernel(x[:, None], x[None]), axis=0) ** (1.0 / n)
naive = np.mean(samples)
naive_std = np.std(samples) / np.sqrt(n)
print("Naive estimator J_1: {:.2f}+-{:.2f}".format(naive, naive_std))

# Calculate it with the correction: set the diagonal to one so it
# does not contribute to the product, and take the (n-1)-th root
# instead of the n-th root
samples = kernel(x[:, None], x[None])
samples[range(n), range(n)] = 1
samples = np.prod(samples, axis=0) ** (1.0 / (n - 1))
alt = np.mean(samples)
alt_std = np.std(samples) / np.sqrt(n)
print("Alternative estimator J_2: {:.2f}+-{:.2f}".format(alt, alt_std))

# Output for np.random.seed(1)
# Monte Carlo integration: 0.81+-0.01
# Naive estimator J_1: 0.00+-0.00
# Alternative estimator J_2: 0.80+-0.05
```
