# Cumulants in chinese restaurant process?

by Cupitor   Last Updated September 23, 2018 18:19 PM - source

I already wrote a similar question on StackOverflow but was not welcomed there! So I decided to ask it from the folks here. I have wrote a code in Python for CRP problem. I think everybody here is familar with the subject but nevertheless:

Short description of it: Suppose we want to assign people entering to a restaurants to potentially infinite number of tables. If $z_i$ represents the random variable assigned for the $i$'th person entering the restaurant the following should hold:

With probability $p(z_i=a|z_1,...,z_{i-1})=\frac{n_a}{i-1+\alpha}$ for $n_a>0$, $i$'th person will sit in table $a$ and with probability $p(z_i=a|z_1,...,z_{i-1})=\frac{\alpha}{i-1+\alpha}$ $i$'th person will sit around a new table.

I am not quite sure if my code is correct cause I am surprised how small the final number of tables are. I would be happy if somebody could give me cumulants for the distribution associated with this process.

import numpy as np
def CRP(alpha,N):
"""Chinese Restaurant Process with alpha as concentration parameter and N
the number of sample"""
#Array which will save for each i, the number of people people sitting
#until table i
summed=np.ones(1) #first person assigned to the first table
for i in range(1,N):
#A loop that assigns the people to tables

#randind represent the random number from the interval [1,i-1+alpha]
randind=(float(i)+alpha)*np.random.uniform(low=0.0, high=1.0, size=1)
#update is the index for the table that the person should be placed which
#if greater than the total number, will be placed in a new table
update=np.searchsorted(summed,randind,side='left')
if randind>i:
summed=np.append(summed,i+1)
else:
zerovec=np.zeros(update)
onevec=np.ones(summed.size-update)
summed+=np.append(zerovec,onevec)
#This part converts summed array to tables array which indicates the number
#of persons assigned to that table
tables=np.zeros(summed.size)
tables[0]=summed[0]
for i in range(1,summed.size):
tables[i]=summed[i]-summed[i-1]
return tables
a=CRP(5,1000)
print a

Tags :

I don't know how many tables you were expecting, but the mean and variance of the number of total tables is available in closed form. For $\alpha = 5$ and $N = 1000$ $$E[\mbox{Num Tables}] = \sum_{i=1}^N \frac{\alpha}{\alpha + i - 1} \approx 27,$$ and $$\mbox{Var}[\mbox{Num Tables}] = \sum_{i=1}^N \frac{\alpha(i-1)}{(\alpha + i - 1)^2} \approx 21.5.$$
I ran your code in python and it seems consistent with these formulas. Note that the expected number of tables grows logrithmically in $N$.