Nonnegative rank and nondeterministic communication complexity

In this post I would like to discuss the connection between the nonnegative rank of a nonnegative matrix, and the nondeterministic communication complexity of its support matrix --- the connection being simply the former is greater than or equal to two to the power of the latter.

Deterministic Communication Complexity

As a digression, let us first discuss the classical deterministic communication complexity of a boolean function ${f\colon \{0,1\}^d\times\{0,1\}^d \rightarrow \{0,1\}}$. The idea is that Alice and Bob want to compute a specific value ${f(x,y)}$, but Alice only knows ${x}$ whereas Bob only knows ${y}$. Now they'll have to communicate, until one of them, say Bob, knows ${f(x,y)}$. Communication is assumed to be the ``expensive'' thing, so that is what Alice and Bob try to minimize.
The maximum over all inputs ${(x,y)}$ of the minimum number of bits Alice and Bob have to exchange before Bob knows the answer, is called the communication complexity of ${f}$.
It is important to realize that Alice and Bob have unlimited amount of storage and computing power. In particular, both of them may store the complete ${(2^d\times 2^d)}$- matrix ${M_{x,y} := f(x,y)}$ (along with all prime numbers and every book ever written), if they think it might help. (In this sense, it is like the opposite of an exam for students: Alice and Bob may use the complete solution to all exam problems, but they have to find out which problem they are supposed to solve). The question is not, how difficult is to compute ${f}$. The question is, how difficult it is to agree on which region of the matrix the parties ``are'' in.

Nondeterministic CC

Now let us discuss nondeterministic communication complexity. Here, in addition to their respective input ${x}$ and ${y}$, Alice and Bob are given a ``certificate'' for the answer ${f(x,y) = 1}$, which both can verify independently. If in their verifications, they both independently conclude that the certificate proves that ${f(x,y) =1}$, then Bob replies '1'. Otherwise, he says '0'. Bob may err in his reply, but only in saying '0' when the right answer would have been '1'. If the true answer is '0', Bob may never say '0'. (This is just like in nondeterministic computational complexity classes.) Moreover, if the true answer is '1', there must be at least one certificate with which Alice and Bob arrive at the true answer.
Obviously, for Bob to learn the result of Alice's checking of the certificate, Alice has to communicate one bit to Bob; this bit is generally ignored. The ``nondeterministic communication'' is then the coding length of the certificate. Since Alice and Bob may store all possible certificates which they might ever occur, this is the same as the coding length of the number of certificates needed for the function ${f}$.
The only kind of yes/no questions Alice and Bob can answer independently are of the form ${x\in A}$ (for Alice), or ${y\in B}$, respectively (for Bob). Thus, certificates amount to rectangles.
Suppose that we have a number of submatrices ${R_1,\dots,R_q}$ of ${M}$, such that every entry in every ${R_j}$ is one. These matrices are supposed to cover all the 1s in ${M}$ in the sense that for every ${(x,y)}$ if ${M_{x,y} =1}$, then at least one of the ${R_j}$ contains the ${(x,y)}$-entry. The usual word for such submatrices is 1-rectangle (or just rectangle), and ${R_1,\dots,R_q}$ with said properties is called a rectangle covering. when Alice and Bob receive their inputs ${x}$ and ${y}$, respectively, if the right answer would be '1', the certificate could be any of the rectangles containing the entry ${M_{x,y}}$. If, however ${M_{x,y} = 0}$, it does not matter which certificate is given -- Bob will always reply '0'.
Thus, determining the finding the nondeterministic communication complexity of ${M}$ amounts to fining a rectangle covering with minimum number ${q}$ of rectangles. The nondeterministic communication complexity is then the coding length of ${q}$, i.e., ${\approx \log_2 q}$.
Note that ``rectangle covering'' makes sense for all 0/1-matrices --- not just square matrices with the number of rows/columns a power of two.

Rectangle covering as a coloring problem

Let us make this even a bit more combinatorial. Let ${M}$ be an ${(m\times n)}$-matrix ${M}$, and denote by ${\mbox{rc}(M)}$ its rectangle covering number, i.e., the minimum number of 1-rectangles needed to cover all 1-entries of ${M}$.
We define the rectangle graph ${G_M}$ as follows. The vertex set ${V(G_M)}$ is the set of 1-entries of ${M}$; two such entries are adjacent, if the rectangle they span contains a 0. Say, ${M_{k,\ell} = M_{k',\ell'} = 1}$, so we have the two vertices ${(k,\ell)}$ and ${(k',\ell')}$. They are adjacent, if and only if, ${M_{k,\ell'} M_{k',\ell} = 0}$.
Denoting by ${\chi}$ the usual (vertex-) chromatic number, the following is true.

Lemma 1
$\displaystyle \chi(G_M) = \mbox{rc}(M) $
Proof. For a proof, it suffices to realize that a 1-rectangle in $M$ is an independent set in $G_M$, and that, by the definition of the edges of $G_M$, for every independent set $U$ in $G_M$, the smallest rectangle which contains $U$ is a 1-rectangle.

I have been unable to find this fact elsewhere in published literature. The proof, although quite easy, can be found in a forthcoming paper.
So nondeterministic CC amounts to coloring a certain graph defined by the boolean function.

Connection with nonnegative rank (and extension complexity)

Now that the basics are clear, let us come back to the original question. Let ${A}$ be a nonnegative matrix and ${M}$ ist support-matrix, i.e., ${M=}$ is the 0/1-matrix with ${M_{k,\ell} = 1}$ if and only if ${A_{k,\ell} \ne 0}$. The following fact was already known to Yannakakis (1991).

Lemma 2
$\displaystyle \mbox{rc}(M) \le \mbox{rk}_+(A). $
Proof. The nonnegative rank of $A$ is the minimum number of nonnegative rank-1 matrices whose sum equals $A$. In other words, there exist nonnegative vectors $u_k, v_k$ $k=1,\dots,\mbox{rk}_+(A)$ such that
$\displaystyle A = \sum_{k=1}^{\mbox{rk}_+(A)} u_k v_k^T$.
The support of $A$ is thus the union of the supports of the $u_k v_k^T$. But clearly, the support of $u_k v_k^T$ is a rectangle: it is the product of the support of $u_k$ and the support of $v_k$. Thus, we have a rectangle covering using $\mbox{rk}_+(A)$ rectangles, which implies the inequality in the lemma.

In the context of the extension complexity of a polytope ${P}$, the rectangle graph can be defined as follows. The vertices of ${G}$ are the pairs ${(x_k,(a_\ell,\alpha_\ell))}$, where ${x_k}$ is a point in ${P}$ and the inequality ${a_\ell^Ty \le \alpha_\ell}$ is valid and supporting for ${P}$, such that ${x_k}$ does not lie on the hyperplane defined by ${(a_k,\alpha_k)}$. Two such pairs are adjacent in ${G}$, if at least one of the two points does lie in one of the two hyperplanes. For this graph ${G}$ we have

Lemma 3
$\displaystyle \chi(G) \le \mbox{xc}(P). $