DOT's blog
This is the (blog and) homepage of Dirk Oliver Theis, University of Tartu, Estonia. I post on my research, mainly in Discrete Mathematics: probabilistic combinatorics, random structures; combinatorial matrix theory, matrix factorizations, polyhedral combinatorial optimization; hypergraph theory, graph theory; and more.
It's the End of the Blog
Up-to-date content will from now on only be found on my official homepage at the University of Tartu.
Bye-bye :)
Hypergraph problems connected to Boolean rank
There is a hypergraph (or set system, if you will) parameter which is equivalent to the Boolean rank / biclique covering number. Let H be a hypergraph with vertex set V(H) and (hyper-)edge set E(H). Let us say that a generator of H is a set of vertex sets $W_1,\dots,W_q$ such that every edge of H is the union of some of the sets $W_j$. Let us further agree that the generating number of a hypergraph is the smallest number of sets in a generator.
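These definitions are easy to play with computationally. Here is a brute-force sketch in Python (my own toy code, not from any paper; it uses the observation that an edge e is the union of some of the $W_j$ if and only if the union of those $W_j$ contained in e is all of e):

```python
from itertools import chain, combinations

def generating_number(vertices, edges):
    """Smallest q such that some vertex sets W_1,...,W_q generate H,
    i.e., every edge is a union of some of the W_j.  Exponential-time
    brute force, only for tiny hypergraphs."""
    subsets = [frozenset(c) for r in range(1, len(vertices) + 1)
               for c in combinations(sorted(vertices), r)]
    edges = [frozenset(e) for e in edges]
    for q in range(1, len(subsets) + 1):
        for family in combinations(subsets, q):
            # e is a union of members of the family iff the union of all
            # members contained in e equals e
            if all(frozenset(chain.from_iterable(w for w in family if w <= e)) == e
                   for e in edges):
                return q

# complete 2-uniform hypergraph on 3 vertices: gamma = 3 = n
print(generating_number({0, 1, 2}, [{0, 1}, {0, 2}, {1, 2}]))  # -> 3
```

The value 3 = n for the triangle is consistent with Corollary 5 below.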
This was really easy. But let us make a connection to a more familiar hypergraph concept, namely transversals. It is not surprising that the biclique covering number / hypergraph generating number are transversal numbers of suitably defined hypergraphs. Slightly more sophisticated, though, is the following.
Hypergraph transversals and lower bounds for hypergraph generating number
I will use notation and terminology of the hypergraph generating number, which is more natural in this context. As usual, we are hunting for lower bounds. Let us denote the generating number of H by $\gamma(H)$. First, let us make the following trivial observation.
Lemma 0. $\gamma(H) \le |V(H)|$ --- the generating number is at most the number of vertices. If we want to prove that $\gamma(H)$ is large, then singleton sets $W_j$ work for us: they drive the total number of sets which are needed towards the trivial upper bound $|V(H)|$. Now let us adapt the concept of generating number to forbid singleton sets.
Definition 1. Denote by $\gamma_2(H)$ the smallest number q of sets $W_1,\dots,W_q$, each with at least two elements, such that every edge of H is the union of some of these sets.
Obviously, for this number to be finite, we need that H has no singleton edges. Letting, for a vertex set S, $H\setminus S$ denote the hypergraph with vertex set $V(H)\setminus S$ and edge set $\{ e\setminus S \mid e\in E(H)\}$ (this is the hypergraph restricted to the complement of S), we trivially have the following:
Lemma 2. $\gamma(H) = \min_{S\subset V(H)} \; \bigl( |S| + \gamma_2(H\setminus S) \bigr)$. Now, for a vertex v, denote by H.v the hypergraph with vertex set V(H) and edge set
Lemma 3. Let $v_1,\dots,v_n$ be any ordering of the vertices of H. Then $\gamma_2(H) \ge \sum_{r=1}^n \tau( H_r.v_{r+1})$.
(As usual, $\tau$ denotes the transversal number.) Again, I leave the proof as an exercise. (Hint: consider the set of all generating sets $W_j$ which contain the vertex $v_r$.) Let's move on to an application of Lemmas 2 and 3.
The generating number of complete uniform hypergraphs
We would like to determine the generating number $\gamma_{n,k}$ of the complete k-uniform hypergraph on n vertices. Here, k may depend on n. We prove the following.
Theorem 4. The generating number of the complete k-uniform hypergraph on n vertices is at least $\min\bigl(n, \; \; (n-k+2)(n-k+1)/2\bigr)$.
For the proof, we first note that each of the hypergraphs $H\setminus S$ occurring in Lemma 2 has as its edge set all vertex sets of cardinality between $k-|S|$ and $k$. Hence, you may check that the transversal number of the hypergraph $H_r$ occurring in Lemma 3 equals $\max(0, n-r-k+2)$, if $k-|S| \ge 2$, and $\infty$, if $k-|S| \le 1$. In the first case, the sum in that lemma amounts to
Corollary 5. For $k\le n - (\sqrt{8n+1}-3)/2$, the generating number of the complete k-uniform hypergraph on n vertices is equal to n.
For k larger than the number in the corollary, the value of $\gamma$ is unknown. The following upper bound is proved in arXiv:1111.0444.
Theorem 6. $\gamma_{n,k} \le (n-k+1)^2 \log n$.
The proof uses a simple probabilistic argument, and we omit it here (see Lemma 3.3 in said paper). What is clear is that for fast-growing k there is a $\log n$ factor between the bounds in Corollary 5 and Theorem 6. For $k = n - O(1)$, this factor is needed, since (exercise!) $\gamma_{n,n-\ell} = \Omega_\ell (\log n)$. But for $n-k \to \infty$ with $n-k = o(\sqrt n)$, the gap remains a problem.
Problem 7. Determine the value of $\gamma_{n,k}$ when $n-k \to \infty$ and $n-k = o(\sqrt n)$!
Problem 8. Determine the generating number of a random k-uniform hypergraph on n vertices! (With, say, edges chosen independently with probability p = p(n).)
Best combinatorial bound for extension complexity
Proposition. For every 0/1-matrix M, the number
$\min \{ \mbox{rk}_+(A) \mid \mbox{supp}(A) = M \}$
is equal to the Boolean rank of M. (Here $\mbox{supp}(A)$ denotes the support matrix of A, i.e., the 0/1-matrix with 1s exactly where A is nonzero.)

For a polytope P, let xc(P) denote its extension complexity, i.e., the smallest number of facets of a polytope Q which can be mapped onto P by a projective mapping. Moreover, let S(P) denote the support matrix of the vertex-facet slack matrix of P. The proposition above motivates the following question.

Question 1. Let M be the support matrix of the vertex-facet slack matrix of a polytope. Is it true that the number
$\min \{ \mbox{xc}(P) \mid S(P) = M \}$ is equal to the Boolean rank of M?
Question 2. Is there a polytope which is combinatorially equivalent to the matching polytope and whose extension complexity is $O(n^4)$?
Question 3. Is there a polytope which is combinatorially equivalent to the matching polytope and whose extension complexity is polynomial in n?
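For tiny matrices, the Boolean rank in the Proposition above can be brute-forced by enumerating all all-ones rectangles inside the support and looking for a smallest cover. A sketch (my own toy code, not from any of the papers discussed):

```python
from itertools import combinations

def boolean_rank(M):
    """Brute-force Boolean rank of a 0/1-matrix M: the least r such that the
    1-entries of M are covered by r all-ones submatrices (rectangles)."""
    m, n = len(M), len(M[0])
    ones = {(i, j) for i in range(m) for j in range(n) if M[i][j]}
    if not ones:
        return 0
    rects = []
    for rbits in range(1, 1 << m):
        R = [i for i in range(m) if rbits >> i & 1]
        # take the maximal column set compatible with this row set;
        # every rectangle cover is dominated by one using such rectangles
        C = [j for j in range(n) if all(M[i][j] for i in R)]
        if C:
            rects.append({(i, j) for i in R for j in C})
    for r in range(1, len(ones) + 1):
        for cover in combinations(rects, r):
            if set().union(*cover) == ones:
                return r

# Complement of the 3x3 identity: real rank 2, but Boolean rank 3.
print(boolean_rank([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))  # -> 3
```

Even this small example separates the real rank from the Boolean rank.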
Nonnegative rank (and other matrix factorization ranks over cones) at Dagstuhl, II
The complete title of the seminar is "Limitations of convex programming: lower bounds on extended formulations and factorization ranks", the official web page is here. As the title suggests, we (myself, Hartmut Klauck, Troy Lee, Rekha Thomas) have conceived it to be broader than the 2013 workshop, which was mainly on nonnegative rank (and linear extended formulations of combinatorial optimization problems). In the nonnegative rank part (even though new lower bounds on Boolean rank are still very much desired for the relevant matrix families) there will be novelties, too. We plan to invite people from real algebraic geometry, more specifically, algebraic statistics, whose interest is in describing the set of matrices with nonnegative rank at most k as a semialgebraic set.
Just as a teaser, here's my favourite open problem in that area. For positive integers k < n, let f(n,k) be the smallest number q such that the set of all matrices of nonnegative rank at most q has nonempty relative interior in the set of all rank-k $n\times n$ matrices. Clearly, $k \le f(n,k) \le n$, and Yaroslav Shitov (arXiv:1303.1960) has recently proved that $f(n,3) \le 6n/7$.
Conjecture 1. $f(n,k) = \Omega(n)$ for all k.
I also like this stronger version.
Conjecture 2. $f(n,k) \sim n$ asymptotically if both k and n tend to infinity.
PostDocs
You should be interested in at least one of the following areas:
- Matrix Theory,
- Probabilistic Combinatorics,
- Hypergraph Theory,
- Combinatorial Optimization.
The ideal candidate would:
- be fluent in English (deal breaker);
- hold a PhD in Math (or TCS);
- have experience and publications in at least one of the above areas;
- have a very good record of publications (measured relative to the specifics of your area and the stage of your PostDoc career).
- Eurozone since 2011
- 1.3 million inhabitants (= Cologne + Heidelberg)
- Bustling economy
- Language similar to Finnish
- No need to learn it: English works everywhere. (Out of politeness, not necessity, I learned the Estonian words for: 'Hi', 'Please coffee', 'Thanks', 'Good bye'.)
- Houses the region's most venerable university
- Student city (= cafes, bars, restaurants, nightlife, ...)
- Climate:
- 4 months party in the sauna (winter)
- 4 months party in the streets (summer)
- (4 months inconclusive)
PhD positions in Discrete Math
The work environment is multi-national. The language of instruction and communication is English.
You should be interested in at least one of the following areas:
- Matrix theory;
- Probabilistic combinatorics and hypergraph theory, graph coloring, probabilistic methods;
- Computational discrete optimization.
The research group in discrete mathematics and combinatorial optimization was created in Feb 2013 and is expanding (the web site isn't up yet). As of September '14, it will consist of 2 PhD students, one early PostDoc, one advanced PostDoc, myself, and several half-members whose main research area has strong connections to combinatorics (Coding Theory, Cryptology). Expect a small and vibrant environment. Regarding the supervision of research students, I value working very closely with them. As a rule of thumb, I have 5+ hours of one-on-one face time per week with each student, usually condensed into one intensive afternoon research session.
The qualifications I am looking for are roughly the following. The ideal candidate would:
- have an excellent background in mathematics;
- be fluent in English;
- be able to express their mathematical ideas and questions concisely;
- hold a Masters degree (or equivalent), or be in the process of acquiring one, in Mathematics or Computer Science.
Below I have summarized some information about Estonia, Tartu, the University of Tartu, and the institute.
The PhD program
The 4-year PhD program (affiliated with Computer Science) requires ~50 ECTS (European credit points) of coursework. On the graduate level, we have 4 core courses covering basic research-level combinatorics, which rotate every fall: extremal combinatorics, probabilistic combinatorics, enumerative combinatorics, and (hyper-)graph theory. The courses are heavily geared towards problem solving. The salary is competitive (i.e., depends on the candidate's qualifications). The generous version: you get ~1000 Euros; the other version: you get ~590 Euros plus a half TA position worth ~410 Euros. (All these numbers are after tax.)
About half of the PhD students in the program are non-Estonian. Unfortunately, Discrete Math & Combinatorial Optimization is currently 100% expatriate.
Estonia
Estonia is the Euro-zone country located directly south of Finland.
It is an ``E-Society'': Broadband Internet (e.g. LTE) is fast and inexpensive or free (public WiFi is ubiquitous). The country scores 1st in the world in Internet Freedom and 5th in Privacy. It is ranked directly behind Germany in the ``State of the World's Children's Index'' (10th). In the Human-Development index, it scores between the UK and Portugal. See Wikipedia for these and other statistical facts.
Pretty much everybody under 40 speaks English. It is not necessary (though polite) to learn Estonian. The language is similar to Finnish.
Tartu
Tartu is a picturesque city in southern Estonia. Home to 17,000 students (in a population of 105,000), there are lots of inexpensive restaurants, bars, and cafés. Tartu is also the destination of large numbers of Erasmus exchange students from the rest of Europe. Consequently, it has an outsized nightlife, well beyond what you'd expect in a city of that size.
Universitas Tartuensis
Founded in 1632, the University of Tartu is one of the most venerable centers of education and research in this part of the world. In rankings (e.g., Times Higher Education World University Ranking), it is placed in the top 400 universities worldwide, with a rapid ascent over recent years.
The CS institute
A significant number of the Computer Science institute faculty are expatriates. The discrete math & combinatorial optimization group, of which you will be part, is still expanding. Research fields are Graph- and Hypergraph Theory, (Combinatorial) Matrix Theory, Combinatorial Number Theory, Probabilistic Combinatorics. Within CS, there are connections and interactions with the sprawling Cryptology group, and with a small group in Coding Theory.
The faculty comprises the CS, Math, and Statistics institutes. In Math, there are large research groups in Algebra and Functional Analysis, but the Math institute is less international (e.g., all their courses are in Estonian); the same holds for the Statistics institute.
Extension complexity and factorization rank over arbitrary convex cones
In this post, I will write out some details of arguments in the context of these constructions.
The setup
We will be concerned with the situation that there are two convex polyhedral cones. The rigorous definition is the following. For the rest of this post, let $\mathcal C$ be a set of closed convex cones contained in linear spaces $V_C \supset C \in \mathcal C$, and let $h\colon \mathcal C \to \mathbf N$ be a mapping ($h(C)$ will be the "complexity" of C). For simplicity, we require that the $V_C$ are finite dimensional. For convenience, we also assume that C is full-dimensional in $V_C$, i.e., the interior of C is non-empty.

Definition. A $\mathcal C$-extension of a pair of convex polyhedral cones $P_0\subset P_1$ in a linear space $W$ is a $C\in \mathcal C$ together with a linear subspace L of $V_C$ and a linear mapping $\pi\colon V_C \to W$, such that $Q := C\cap L$ satisfies (*). The $\mathcal C$-extension complexity of $P_0\subset P_1$ is the infimum of $h(C)$ over all $\mathcal C$-extensions of $P_0\subset P_1$.

Remark. The extension complexity may be infinite.

Remark. For the definition to make sense in applications in combinatorial optimization, where one is typically interested in polyhedra $P_0\subset P_1$ which are not cones, the family $\mathcal C$ will have to be closed with respect to homogenization. This is the case in the examples mentioned above.
Connection with factorization rank
We now define the factorization rank of a matrix over a system of convex cones. For a closed convex cone C contained in a linear space V, we denote its dual cone by $C^*$.

Theorem. The $\mathcal C$-extension complexity of $P_0\subset P_1$ equals the $\mathcal C$-factorization rank of the slack matrix S.

Proof. For simplicity, denote by $x_1,\dots,x_n$ the columns of X, and by $a_1,\dots,a_m$ the rows of A, so that $S_{k,\ell} = \left(a_k\mid x_\ell\right)$. We first show that the $\mathcal C$-extension complexity is at least the $\mathcal C$-factorization rank, by constructing a factorization of S from an extension of $P_0\subset P_1$. Let $(C,L,\pi)$ be such an extension, and $Q:= C\cap L$. For every $\ell$, there exists $\xi_\ell \in Q$ such that $\pi(\xi_\ell) = x_\ell$. For every k, it follows from $\left(\pi^*(a_k)\mid \xi\right) = \left(a_k \mid \pi(\xi) \right)$ that $\pi^*(a_k) \in Q^*$. Thus, by (**), there exist $\alpha_k \in C^*$ and $\gamma_k\in V^*$ with $\left(\gamma_k\mid \lambda\right) = 0$ for all $\lambda\in L$, such that $\pi^*(a_k) = \alpha_k + \gamma_k$. We conclude
Dagstuhl Workshop "Communication Complexity, Linear Optimization, and lower bounds for the nonnegative rank of matrices"
Here's the abstract from the proposal.
A number of notorious conjectures and open problems in Matrix Theory, Combinatorial Optimization, and Communication Complexity require proving lower bounds on the nonnegative rank of matrices. In Combinatorial Optimization, one of the most infamous questions is whether the Matching Problem admits a polynomially sized Linear Programming formulation (Schrijver's Book). The minimum size of a Linear Programming formulation equals the nonnegative rank of a certain matrix (Yannakakis 1991). In Communication Complexity, one of the biggest open problems, the log-rank conjecture posed by Lovász and Saks (1988), involves the question whether the logarithms of the rank and the nonnegative rank are polynomially related for 01-matrices.
The currently known lower bounds for the nonnegative rank are either trivial (rank lower bound) or known not to work in many important cases (biclique covering lower bound). Recently, in Combinatorial Optimization, there has been a surge of interest in lower bounds of sizes of Linear Programming formulations. A number of new methods have been developed and new results have been obtained. Moreover, a new connection linking nonnegative rank directly to a variant of randomized communication complexity has been discovered.
We want to bring together researchers from the fields involved, in order to discuss and perfect the (known and) new techniques, and use (known and) new connections to transfer tools and methods from each one field to the others. We expect that in the wake of this seminar, considerable advances will be made towards resolving some of the questions regarding nonnegative rank.
Open questions about nonnegative rank and related concepts
Notation:
- $\mbox{frk}_S$: factorization rank over the semiring S
- $\mbox{ndCC}(A)$: nondeterministic communication complexity of the support of A
- $\mbox{fool}(A)$: fooling set bound
Theoretical questions
- Cohen & Rothblum (1993) asked: If ${A}$ is rational, is its factorization rank over ${\mathbf{Q}_+}$ equal to the factorization rank over ${\mathbf{R}_+}$? More generally: If ${A}$ has entries in a sub-semiring ${S}$ of a sub-semiring ${R}$ of the nonnegative reals, how large can the gap be between ${\mbox{frk}_S(A)}$ and ${\mbox{frk}_R(A)}$? (Beasley & Laffey 2009)
- Given any sub-semiring ${S}$ of ${\mathbf{R}_+}$ and ${n\ge 6}$, is there a matrix ${A \in \mathbf{M}_{n\times n}(S)}$ such that ${\mbox{rk} A = 3}$ and ${\mbox{frk}_S(A) = n}$? (Beasley & Laffey 2009) This is open even for ${S=\mathbf{R}_+}$. Cf. Specific-Question #1.
- If ${A}$ is a symmetric ${n\times n}$ nonnegative matrix with real rank ${k}$, and if ${A}$ has exactly one negative eigenvalue, must ${A}$ have a nonnegative factorization ${A = XY}$ where ${X}$ is ${n\times q}$ with ${q}$ bounded as a function of ${k}$? (Beasley & Laffey 2009)
- What is the smallest $\alpha$ such that $\mbox{rk}_+(A) \le \mbox{rk}(A)^\alpha$ for all 01-matrices $A$? This corresponds to the log-rank conjecture of Lovász and Saks (with $\alpha = \infty$ if it is false). Proving lower bounds for $\alpha$ corresponds to proving lower bounds for the nonnegative rank.
- What is the largest separation between the logarithm of the nonnegative rank and the deterministic communication complexity for Boolean functions? The gap cannot be more than quadratic, but the best known example has a smaller gap. A related question asks for the largest separation between the nonnegative rank and the biclique covering number for 01-matrices.
- It is known that ${\mbox{fool}(A) \le (\mbox{rk} A)^2}$, and there are examples where ${\mbox{fool}(A) \ge (\mbox{rk} A)^{\log_3 4}}$. Which of these two bounds can be improved? (Dietzfelbinger, Hromkovič, Schnitger 1996)
Questions about specific families of matrices
- A ${d}$-dimensional Euclidean distance matrix of size ${n}$ is defined by points ${x_1,\dots,x_n}$ in ${d}$-dimensional Euclidean space: its entries are ${\bigl( \lVert x_k - x_\ell \rVert \bigr)_{k,\ell}}$. Do ``generic''/``random'' Euclidean distance matrices have full nonnegative rank? (Gillis & Glineur 2011; Lin & Chu 2010). A proof of this for ${d=1}$ by Lin & Chu 2010 is fatally flawed. More specifically: Do some linear Euclidean distance matrices have the property asked for in Theory-Question #2? (Beasley & Laffey 2009) (Not all of them do (Gillis & Glineur 2011).)
The rank bound and the ndCC bound are both trivial for these matrices. Upper bounds for special cases have been improved. (Gillis & Glineur 2011)
- Does a random (with a suitable distribution) ${(m\times n)}$ rank-${k}$ matrix have full nonnegative rank? A weaker question, asked by J. Garcke, M. Klimm, H.-C. Kreusler, and A. Uschmajew, is: Does the set of matrices with nonnegative rank ${k}$ have positive Lebesgue measure within the ${(m\times n)}$ rank-${k}$ matrices? If ${A}$ is randomly drawn from the ${(m\times n)}$-matrices with nonnegative rank ${k}$, is ${\mbox{rk} A = k}$ with probability one / positive probability? (Dong, Lin, Chu)
For some of these questions, erroneous proofs can be found on the internet. In particular, a proof by Dong, Lin, and Chu is incorrect. The rank bound and the ndCC bound are both trivial for these matrices.
- Is the extension complexity of the Matching Polytope of an ${n}$-vertex graph polynomial in ${n}$?
An infamous, several-decades-old question in Combinatorial Optimization asks whether there exists an extended formulation for the matching problem whose coding size is polynomial in the number of vertices of the graph. A slightly weaker question asks whether there exists an extended formulation whose number of inequalities is bounded by a polynomial in the number of vertices of the graph. The best known lower bound is the trivial rank bound ${\Omega(n^2)}$. The ndCC bound is ${O(n^4)}$. The known upper bounds are ${O(c^n)}$, for values of ${c\in \left]1,2\right]}$ which have been improved several times in the last three decades.
- Is the extension complexity of the Spanning Tree polytope ${\Omega(n^3)}$?
Michel Goemans asked this question at the Cargese Workshop on Combinatorial Optimization 2010, which was dedicated to extended formulations. The best known lower bound is the trivial rank bound. It is unknown whether the ndCC bound improves on the rank bound. The known upper bound is ${O(n^3)}$, with improvements for certain types of graphs.
- What is the extension complexity of the Stable Set polytope of a perfect graph?
The question goes back to Yannakakis (1991), and was recently repeated by Michel Goemans (MIT) at the Cargese Workshop on Combinatorial Optimization 2010, which was dedicated to extended formulations.
- Is the extension complexity of the Completion Time polytope ${O(n\log n)}$? (Goemans 2011 & Cargese)
If true, this would extend work by Goemans (2011). The best known lower bound is the trivial rank bound. The upper bound is ${O(n^2)}$, which has been proved in a couple of ways in the last 20 years.
- Is the extension complexity of an ${n}$-vertex convex polygon with vertices in general position ${\Omega(n)}$? (Fiorini, Rothvoss, Tiwary) The rank bound is 2, the ndCC bound is ${\Omega(\log n)}$. Some regular convex polygons have extension complexity ${\Theta(\log n)}$. The best known lower bound for algebraically independent vertices is ${\Omega(\sqrt n)}$.
Random signed SAT
Algorithms which repair their mistakes
Say you have a smart algorithm which proceeds through a number, say $\Theta(n)$, of iterations. In each iteration, the probability of it succeeding is quite large: let us say that, in each iteration, the failure probability is $\Theta(r(n)/n)$ for some $r(n) = o(n)$. Due to the large number of iterations, though, the expected number of failures is $\Theta(r(n))$ --- too large even for constant $r$. The remedy is, of course, to let the algorithm check, in each iteration, whether it has made a mistake, and if that is the case, repair the damage. There are certain complications with this approach, mainly because the algorithm has already "looked at" some of the random input data, so that, during the repair, we have to assume that the distribution of some of the data is changed by conditioning on the earlier choices of the algorithm. However, there's a trick in this situation: while the overall algorithm (likely) chooses its actions fairly intelligently, for the repair part it is ok to proceed in a fairly stupid way. This is so because all the repair part needs to do is push its own failure probability (conditioned on the fact that a repair was necessary and on the previous choices of the algorithm) to $o(1/r)$; then the expected number of unrepaired failures is $\Theta(n)\cdot\Theta(r/n)\cdot o(1/r) = o(1)$. For example, for constant $r$, it suffices that the repair part fails with probability tending to zero arbitrarily slowly. In our paper, we bound the failure probability of the "smart" algorithm by $O((\mbox{polylog}\,n)/n)$, and the "stupid" repair part fails with conditional probability $O((\mbox{polylog}\,n)/n)$, too.

Branching system
As is often the case with SAT, the analysis of our algorithm entails studying properties of a branching system. Of interest here is the total population size, and we need the expectation and a tail probability estimate. It is actually closer to the problem to speak of a ``discrete time queue'', but at the heart of it is (as is the case with queues) a branching system. I have written two posts (How to analyze a discrete time queue and Tail probability estimate for the busy period) on discrete-time queues in preparation for the completion of our paper. So there's not much to add here, except the one thing that needs to be dealt with almost always in the "real world": random variables are almost never completely independent, and also almost never identically distributed. There's generally a small amount of dependence and deviation from the ``right'' distribution, which kills the textbook- (or Wikipedia-)level theorems one would like to apply, and results in several pages of estimates, inequalities, coupling, and conditioning on the parameter regions in which it all works, leading to more estimates and inequalities. Personally, I find these complications a real pain: they are usually not complicated, but I never know in how much detail to write them down. The same problem also occurs at a point where we would usually use Wald's equation, but the summands are neither (completely) independent nor (completely) identically distributed. Fortunately, though, at this point we can revert to Optional Stopping.

Branching system vs. Wormald's ODE-method
A black-box theorem by Wormald allows one to control parameters of a random process by writing down the conditional expectations of their changes from one iteration to the next in the form of an initial value problem. While I am generally not a big fan of black-box theorems, I quite fancy this one. The basic idea behind the whole method is the simple trick "if you know how the conditional expectations behave, try to find a martingale and use the Azuma-Hoeffding inequality", but the way it is used to prove the ODE-theorem is quite nice.

A little bit of ODE calculus
Even though this is tremendously simple from an analysis point of view, we do some computations with the system of ODEs which we obtain, and transform it into a single initial value problem by a change of parameters. The resulting IVP is still not solvable analytically (I guess), but it is a lot easier to give estimates on how the solution behaves, and also to solve it numerically, which we do in order to determine quantitative properties of the input data for which our algorithm succeeds. More specifically, the initial ratio of the number of clauses to the number of variables must be such that the solution to the IVP does not intersect a certain line before it falls below a certain other function. The line describes the point at which the queue length (and thus the length of the busy period) of the discrete-time queue crosses from finite to infinite.

A tail probability estimate for the busy time in a discrete time queue
$Q(t+1) = \begin{cases} Q(t) - 1 + B(t) &\text{ if } Q(t) > 0,\\ A(t) &\text{ if } Q(t)=0,\end{cases}$
where the $A(t)$ are iid random variables distributed as $A$, and the $B(t)$ are iid random variables distributed as $B$. Moreover, $Q(0) = 0$.

As in the other post, we are interested in the busy time $X$ after the start, in other words, the time between 0 and the first return of $Q(t)$ to the state 0. I already argued that, if $h(z)$ is the probability generating function of this time, and $g_A(z)$ and $g_B(z)$ are the probability generating functions of $A$ and $B$, respectively, then the following is true:
$\begin{aligned} h(z) &= g_A( f(z) )\\ f(z) &= z\, g_B( f(z) )\end{aligned}$
where $f(z)$ is the probability generating function of some other random variable, which is no longer of particular interest here.
In this post, I'd like to make some statements about properties of $X$, based on the system of equations for the probability generating function.
The mean
In the other post, we obtained the formula for the mean of $X$ as a consequence of the fact that $Q(t)$ is a stationary Markov chain, for which we could write down the probability that it is in state 0. Here, I'll use the probability generating function. We have to compute $h'(1)$. By the chain rule, we obtain

$\begin{aligned}f'(z) &= g_B(f(z)) + z g_B'(f(z)) f'(z)\\h'(z) &= g_A'(f(z)) f'(z),\end{aligned}$
and, assuming everything converges, we obtain

$\begin{aligned}f'(1) &= 1 + g_B'(1) f'(1)\\h'(1) &= g_A'(1) \frac{1}{1-g_B'(1)}, \end{aligned}$
where, to obtain the second equation, we solved the first one for $f'(1)$. With $\lambda := \mathbb E A$ and $\nu := \mathbb E B$, this is just what we obtained from the Markov chain. At this point, note that the desired convergence of the power series $f$ and $h$ is equivalent to them having finite means (plus Abel's theorem, possibly).

Tail inequality
Now I'd like to bound the probability that the server is busy for a very long time. For this, I will focus on specific distributions of $A$ and $B$; namely, I'll assume that both are Poisson distributions. Thus, the probability generating functions of $A$ and $B$ are the following:

$\begin{aligned} g_A(z) &= e^{\lambda(z-1)} \\ g_B(z) &= e^{\nu(z-1)}. \end{aligned}$
Hence, we obtain

$\begin{aligned} h(z) &= e^{\lambda(f(z)-1)} \\ f(z) &= z\, e^{\nu(f(z)-1)}. \end{aligned}$
The second equation can be rewritten as

$z = f(z) e^{\nu(1-f(z))}, \ \ \ \ \ (*)$
of course, so we have a nice expression for the inverse $f^{-1}$ of $f$. Given that we already ``have'' the probability generating functions, we will use the exponential moment method (= exponentiate + Markov + optimize) to bound the probability:

$\Pr[ X \ge \alpha ] = \Pr[ t^X \ge t^\alpha ] \le h(t)\,t^{-\alpha},$
which is valid for every $t \ge 1$; $t$ will be optimized below. The expression on the right equals

$h(t) t^{-\alpha} = e^{\lambda(f(t)-1) -\alpha\log t}.$
We now write $s := f(t)$ and use (*). Noting that $t > 0$ is equivalent to $s > 0$, we compute

$\begin{aligned}\log \Pr[ X \ge \alpha ] &\le \lambda(s-1) -\alpha\log(s e^{\nu(1-s)}) = \lambda(s-1) - \alpha\log s - \alpha\nu(1-s) \\&= (s-1)(\lambda + \alpha \nu) - \alpha\log s.&(**)\end{aligned}$
Now we optimize $s$ (later we'll have to make sure that for our choice of $s$, there really exists a $t \ge 1$). Since the expression (**) tends to infinity both for $s\to 0$ and for $s\to\infty$, we just have to take for $s$ the unique point where the derivative of (**) vanishes, which is

$\displaystyle s = \frac{\alpha}{\alpha\nu + \lambda}.$
Summarizing, we obtain the following estimate.

Proposition (Tail estimate for our discrete queue). $\displaystyle \Pr[ X \ge \alpha ] \le \exp\left( \left( 1-\nu - \log \frac{\alpha}{\alpha\nu + \lambda} \right)\alpha - \lambda\right).$

Now, with $\alpha\to\infty$, the term under the logarithm tends to $1/\nu$, and since $\log \nu < \nu-1$ for $0 < \nu < 1$, we have $1 - \nu + \log\nu < 0$, so that there exist $\epsilon > 0$ (depending on $\nu$) and $\alpha_0$ (depending on both $\lambda$ and $\nu$) such that $1 - \nu - \log \frac{\alpha}{\alpha\nu + \lambda} \le -\epsilon$ for all $\alpha \ge \alpha_0$. Consequently, the exponent in the proposition's bound is essentially linear in $\alpha$, i.e., for these $\alpha$ we obtain the following.
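Since it is easy to get signs and constants wrong in such computations, here is a quick Monte-Carlo sanity check of the mean ($\lambda/(1-\nu)$, from the previous section) and of the tail bound just derived, for $\lambda = \nu = 1/2$. This is a throwaway sketch, not part of the paper; the Poisson sampler is the standard inverse-transform loop:

```python
import math
import random

def busy_time(lam, nu, rng):
    """Simulate one busy period of the queue: Q(0)=0, then A ~ Poisson(lam)
    customers arrive; each step serves one customer while B ~ Poisson(nu)
    new ones arrive. Returns the number of steps until Q returns to 0."""
    def poisson(mean):
        # inverse-transform sampler; fine for small means
        limit, k, p = math.exp(-mean), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1
    q, t = poisson(lam), 0
    while q > 0:
        q += poisson(nu) - 1
        t += 1
    return t

lam = nu = 0.5
rng = random.Random(1)
samples = [busy_time(lam, nu, rng) for _ in range(50000)]
mean = sum(samples) / len(samples)   # should be close to lam/(1-nu) = 1
alpha = 20
emp = sum(x >= alpha for x in samples) / len(samples)
bound = math.exp((1 - nu - math.log(alpha / (alpha * nu + lam))) * alpha - lam)
print(mean, emp, bound)  # the empirical tail should lie below the bound
```

The empirical tail at $\alpha = 20$ sits well below the bound, as it should; the bound is not tight, but it is of the right (exponential) order.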
Corollary. $\Pr[ X \ge \alpha ] \le e^{-\Theta(\alpha)}$.

We still have to verify that our choice for $s$ is feasible, i.e., there must exist a $t \ge 1$ with $f(t) = s$. Let $t(s) := s e^{\nu(1-s)}$. A swift computation shows that $t'(s) > 0$ for $s < 1/\nu$; from elementary calculus we know that this implies that for these $s$, there exists a $t$ with $f(t) = s$. But our $s$ was $\frac{\alpha}{\alpha\nu + \lambda} < 1/\nu$, because $\lambda > 0$ (unless the queue were trivial). Moreover, we have

$\displaystyle t = s e^{\nu(1-s)} \ge s( 1 + \nu(1-s) ) = (1+\nu)s - \nu s^2 > 1,$

where the last inequality holds whenever $1 < s < 1/\nu$ (because the two roots of the equation $(1+\nu)s -\nu s^2 = 1$ are $s=1$ and $s=1/\nu$). By increasing $\alpha_0$ if necessary, we can assume that $s > 1$ holds.

Nonnegative rank and nondeterministic communication complexity
Deterministic Communication Complexity
As a digression, let us first discuss the classical deterministic communication complexity of a boolean function ${f\colon \{0,1\}^d\times\{0,1\}^d \rightarrow \{0,1\}}$. The idea is that Alice and Bob want to compute a specific value ${f(x,y)}$, but Alice only knows ${x}$ whereas Bob only knows ${y}$. Now they'll have to communicate, until one of them, say Bob, knows ${f(x,y)}$. Communication is assumed to be the ``expensive'' thing, so that is what Alice and Bob try to minimize.
The maximum over all inputs ${(x,y)}$ of the minimum number of bits Alice and Bob have to exchange before Bob knows the answer, is called the communication complexity of ${f}$.
It is important to realize that Alice and Bob have an unlimited amount of storage and computing power. In particular, both of them may store the complete ${(2^d\times 2^d)}$-matrix ${M_{x,y} := f(x,y)}$ (along with all prime numbers and every book ever written), if they think it might help. (In this sense, it is like the opposite of an exam for students: Alice and Bob may use the complete solutions to all exam problems, but they have to find out which problem they are supposed to solve.) The question is not how difficult it is to compute ${f}$. The question is how difficult it is to agree on which region of the matrix the parties ``are'' in.
Nondeterministic CC
Now let us discuss nondeterministic communication complexity. Here, in addition to their respective inputs ${x}$ and ${y}$, Alice and Bob are given a ``certificate'' for the answer ${f(x,y) = 1}$, which both can verify independently. If, in their verifications, they both independently conclude that the certificate proves that ${f(x,y) =1}$, then Bob replies '1'. Otherwise, he says '0'. Bob may err in his reply, but only in saying '0' when the right answer would have been '1'. If the true answer is '0', Bob may never say '1'. (This is just like in nondeterministic computational complexity classes.) Moreover, if the true answer is '1', there must be at least one certificate with which Alice and Bob arrive at the true answer.
Obviously, for Bob to learn the result of Alice's checking of the certificate, Alice has to communicate one bit to Bob; this bit is generally ignored. The ``nondeterministic communication'' is then the coding length of the certificate. Since Alice and Bob may store all possible certificates which might ever occur, this is the same as the coding length of the number of certificates needed for the function ${f}$.
The only kind of yes/no questions Alice and Bob can answer independently are of the form ${x\in A}$ (for Alice), or ${y\in B}$, respectively (for Bob). Thus, certificates amount to rectangles.
Suppose that we have a number of submatrices ${R_1,\dots,R_q}$ of ${M}$ such that every entry in every ${R_j}$ is one. These submatrices are supposed to cover all the 1s in ${M}$, in the sense that for every ${(x,y)}$, if ${M_{x,y} =1}$, then at least one of the ${R_j}$ contains the ${(x,y)}$-entry. The usual word for such a submatrix is 1-rectangle (or just rectangle), and ${R_1,\dots,R_q}$ with said properties is called a rectangle covering. When Alice and Bob receive their inputs ${x}$ and ${y}$, respectively, and the right answer is '1', the certificate can be any of the rectangles containing the entry ${M_{x,y}}$. If, however, ${M_{x,y} = 0}$, it does not matter which certificate is given -- Bob will always reply '0'.
Thus, determining the nondeterministic communication complexity of ${M}$ amounts to finding a rectangle covering with a minimum number ${q}$ of rectangles. The nondeterministic communication complexity is then the coding length of ${q}$, i.e., ${\approx \log_2 q}$.
Note that ``rectangle covering'' makes sense for all 0/1-matrices --- not just square matrices with the number of rows/columns a power of two.
Rectangle covering as a coloring problem
Let us make this even a bit more combinatorial. Let ${M}$ be an ${(m\times n)}$-matrix, and denote by ${\mbox{rc}(M)}$ its rectangle covering number, i.e., the minimum number of 1-rectangles needed to cover all 1-entries of ${M}$.
We define the rectangle graph ${G_M}$ as follows. The vertex set ${V(G_M)}$ is the set of 1-entries of ${M}$; two such entries are adjacent, if the rectangle they span contains a 0. Say, ${M_{k,\ell} = M_{k',\ell'} = 1}$, so we have the two vertices ${(k,\ell)}$ and ${(k',\ell')}$. They are adjacent, if and only if, ${M_{k,\ell'} M_{k',\ell} = 0}$.
Denoting by ${\chi}$ the usual (vertex-) chromatic number, the following is true.
Lemma 1.

$\displaystyle \chi(G_M) = \mbox{rc}(M) $

Proof. It suffices to realize that a 1-rectangle in $M$ is an independent set in $G_M$, and that, by the definition of the edges of $G_M$, for every independent set $U$ in $G_M$, the smallest rectangle which contains $U$ is a 1-rectangle.
I have been unable to find this fact in the published literature. The proof, although quite easy, can be found in a forthcoming paper.
So nondeterministic CC amounts to coloring a certain graph defined by the boolean function.
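Lemma 1 is easy to verify by machine on small instances. Here is a brute-force Python sketch (the function names and the example matrix are mine, for illustration only): it builds the rectangle graph of a small 0/1-matrix, computes the chromatic number and the rectangle covering number by exhaustive search, and the two numbers agree.

```python
from itertools import combinations, product

def rectangle_graph(M):
    """Vertices: the 1-entries of M; (k, l) ~ (k2, l2) iff M[k][l2] * M[k2][l] == 0."""
    ones = [(k, l) for k, row in enumerate(M) for l, v in enumerate(row) if v]
    adj = {v: set() for v in ones}
    for (k, l), (k2, l2) in combinations(ones, 2):
        if M[k][l2] * M[k2][l] == 0:
            adj[(k, l)].add((k2, l2))
            adj[(k2, l2)].add((k, l))
    return ones, adj

def chromatic_number(vertices, adj):
    """Smallest q for which a proper q-coloring exists (exhaustive search)."""
    for q in range(1, len(vertices) + 1):
        for colors in product(range(q), repeat=len(vertices)):
            col = dict(zip(vertices, colors))
            if all(col[u] != col[w] for u in vertices for w in adj[u]):
                return q
    return 0  # empty graph

def rc(M):
    """Minimum number of 1-rectangles covering all 1-entries of M (exhaustive search)."""
    m, n = len(M), len(M[0])
    ones = {(k, l) for k in range(m) for l in range(n) if M[k][l]}
    rects = []
    for rowset in range(1, 2 ** m):          # nonempty row subsets, as bitmasks
        rows = [k for k in range(m) if rowset >> k & 1]
        for colset in range(1, 2 ** n):      # nonempty column subsets
            cols = [l for l in range(n) if colset >> l & 1]
            if all(M[k][l] for k in rows for l in cols):   # all-1 submatrix
                rects.append({(k, l) for k in rows for l in cols})
    for q in range(1, len(ones) + 1):
        for cover in combinations(rects, q):
            if set().union(*cover) >= ones:
                return q
    return 0

M = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
ones, adj = rectangle_graph(M)
print(chromatic_number(ones, adj), rc(M))  # prints: 3 3
```

For this matrix, the diagonal entries form a triangle in $G_M$, so three colors are necessary; the three maximal 1-rectangles (each of size two) give a matching covering.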
Connection with nonnegative rank (and extension complexity)
Now that the basics are clear, let us come back to the original question. Let ${A}$ be a nonnegative matrix and ${M}$ its support matrix, i.e., ${M}$ is the 0/1-matrix with ${M_{k,\ell} = 1}$ if and only if ${A_{k,\ell} \ne 0}$. The following fact was already known to Yannakakis (1991).
Lemma 2.

$\displaystyle \mbox{rc}(M) \le \mbox{rk}_+(A). $

Proof. The nonnegative rank of $A$ is the minimum number of nonnegative rank-1 matrices whose sum equals $A$. In other words, there exist nonnegative vectors $u_k, v_k$, $k=1,\dots,\mbox{rk}_+(A)$, such that $A = \sum_k u_k v_k^\top$. The support of each rank-1 term $u_k v_k^\top$ is a 1-rectangle of $M$, and together these rectangles cover all 1-entries of $M$.
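The mechanism in Lemma 2 can be watched in action on a toy instance. In the Python sketch below (matrix and factorization made up by hand for illustration), the supports of the rank-1 terms of a nonnegative factorization are 1-rectangles of the support matrix, and together they cover all its 1-entries.

```python
# A nonnegative matrix A and a hand-made nonnegative factorization
# A = sum_k u_k v_k^T with three rank-1 terms (u_k, v_k).
A = [[2, 1, 0],
     [0, 1, 3],
     [4, 0, 3]]
factors = [([2, 0, 4], [1, 0, 0]),
           ([1, 1, 0], [0, 1, 0]),
           ([0, 1, 1], [0, 0, 3])]

m, n = len(A), len(A[0])

# Check that the factorization really sums to A.
assert all(A[i][j] == sum(u[i] * v[j] for u, v in factors)
           for i in range(m) for j in range(n))

# Support matrix M of A.
M = [[int(A[i][j] != 0) for j in range(n)] for i in range(m)]

# The support of each term u_k v_k^T is a 1-rectangle of M: wherever
# u_k[i] > 0 and v_k[j] > 0, the (nonnegative) sum A[i][j] is positive.
rectangles = []
for u, v in factors:
    rows = [i for i in range(m) if u[i] != 0]
    cols = [j for j in range(n) if v[j] != 0]
    assert all(M[i][j] == 1 for i in rows for j in cols)
    rectangles.append({(i, j) for i in rows for j in cols})

# Together the rectangles cover every 1-entry of M,
# so rc(M) <= (number of rank-1 terms).
ones = {(i, j) for i in range(m) for j in range(n) if M[i][j] == 1}
print(set().union(*rectangles) == ones)  # prints: True
```

Applying this to an optimal factorization, with $\mbox{rk}_+(A)$ terms, gives the lemma.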
In the context of the extension complexity of a polytope ${P}$, the rectangle graph can be defined as follows. The vertices of ${G}$ are the pairs ${(x_k,(a_\ell,\alpha_\ell))}$, where ${x_k}$ is a point in ${P}$ and the inequality ${a_\ell^Ty \le \alpha_\ell}$ is valid and supporting for ${P}$, such that ${x_k}$ does not lie on the hyperplane defined by ${(a_\ell,\alpha_\ell)}$. Two such pairs ${(x_k,(a_\ell,\alpha_\ell))}$ and ${(x_{k'},(a_{\ell'},\alpha_{\ell'}))}$ are adjacent in ${G}$, if ${x_k}$ lies on the hyperplane defined by ${(a_{\ell'},\alpha_{\ell'})}$ or ${x_{k'}}$ lies on the hyperplane defined by ${(a_\ell,\alpha_\ell)}$. For this graph ${G}$ we have
Lemma 3
$\displaystyle \chi(G) \le \mbox{xc}(P). $
How to use LaTeX math formulas on blogspot
On Blogspot, I can use MathJax, which produces beautiful math formulas. (That's why I switched from Wordpress, where the formulas are super ugly.)
Let's have a simple formula, say $x=0$, and let's also try something complicated like $\sum_{k=0}^\infty \frac{1}{k^2}$. Ok.
What about displaystyle?
$$ \int_{-\infty}^\infty e^{-x^2} \, dx. $$
What about aligned things:
$$ \begin{aligned}
x&=y & a&=b \\
u&=v & c&=d &\text{and with a text, too.}\\
\end{aligned}
$$
Did that work, too?
For $j=1,\dots,n$ and $t\ge 0$, let $X_j(t)$ be 0 if bin $j$ is empty after the $t$th ball has been thrown, and 1 if it is non-empty. Let $$ \begin{aligned} X(t) &= \sum_{j=1}^n (1-X_j(t)) &&\text{the number of empty bins after the $t$th ball has been thrown, and}\\ Y(t) &= \sum_{j=1}^n X_j(t) &&\text{the number of non-empty bins after the $t$th ball has been thrown.}\\ \end{aligned} $$ Then $$ \begin{aligned} \mathbb{E}X(t) &= n(1-\tfrac{1}{n})^t, \\ \mathbb{E}Y(t) &= n\Bigl( 1 - (1-\tfrac{1}{n})^t \Bigr). \\ \end{aligned} $$
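These two formulas admit a quick Monte Carlo sanity check (a Python sketch; the function name and the parameter values are mine):

```python
import random

def avg_empty_bins(n, t, trials=2000, seed=1):
    """Monte Carlo estimate of E X(t): throw t balls into n bins
    uniformly at random and average the number of empty bins."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        occupied = {rng.randrange(n) for _ in range(t)}
        total += n - len(occupied)
    return total / trials

n, t = 20, 30
formula = n * (1 - 1 / n) ** t     # E X(t) = n (1 - 1/n)^t
estimate = avg_empty_bins(n, t)
print(round(formula, 3), round(estimate, 3))
```

The two printed numbers agree up to sampling noise; the formula for $\mathbb{E}Y(t)$ then follows from $X(t) + Y(t) = n$.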