Ramsey numbers from pseudorandom graphs

One of the foundational results in modern combinatorics is Ramsey’s theorem which states that for every positive integers s, t there exists a constant n_0 such that for all n \geq n_0, every 2-coloring of edges of the complete graph K_n has a monochromatic copy of either K_s or K_t. The smallest n_0 for which the statement holds true is then known as the Ramsey number R(s, t); that is, for all n \geq R(s, t) the statement above is true and there exists a 2-coloring of the edges of the complete graph on R(s, t) - 1 vertices that avoids monochromatic K_s and K_t. We know the exact values of R(s, t) only for some small values of s and t. A famous example is R(3,3) = 6, while despite decades of research we only known that 43 \leq R(5,5) \leq 48.

Determining the Ramsey numbers for larger values has been widely open since 1940’s. It has become one of the central problems of mathematics to determine the asymptotics of R(t, t) with t \rightarrow \infty and R(s, t) with s fixed and t \rightarrow \infty. An easy upper bound follows from a nice inductive prove due to Erdős and Szekeres (1947), and it shows that R(s, t) \leq \binom{s + t - 2}{s - 1}. In particular, this implies R(t,t) = O(4^t/\sqrt{t}) and R(s, t) = O(t^{s - 1}) (with the constant depending on s). The best upper bounds that we know are not that far away from these. David Conlon proved in 2016 that there exists a constant c for which

R(t, t) \leq \frac{4^t}{t^{-c \log t/ \log \log t}} .

Ajtai, Komlós and Szemerédi proved in 1980 that for every fixed s there exists a constant c_s such that

R(s, t) \leq c_s \frac{t^{s - 1}}{(\log t)^{s - 2}}

For lower bounds, Erdős proved using a probabilistic argument that

R(t, t) \geq (1 - o(1)) \frac{t}{\sqrt{2} e} \sqrt{2}^t

and the only improvement (since 1947) in that lower bound has been by a factor of 2.

In case of R(s, t), with s fixed, we know the following lower bound by Bohman and Keevash from 2010 that is again proved probabilistically by carefully analysing a natural random K_s-free process:

R(s, t) \geq c'_s \frac{t^{(s + 1)/2}}{(\log t)^{(s + 1)/2 - 1/(s - 2)}}.

Ignoring the log factors in the bottom, we have

c'_s t^{(s + 1)/2 - o(1)} \leq R(s, t) \leq c_s t^{s - 1 - o(1)} .

Any improvement in the powers of t here would be ground-breaking. What I want to talk about in this post is a recent work of Dhruv Mubayi and Jacques Verstraete that offers us a new way of possibly improving the lower bound. Briefly, they show that if the main conjecture on pseudorandom graphs that I mentioned in an earlier post is true, then R(s, t) \geq c''_s t^{s - 1 - o(1)} for some constant c''_s. Therefore, the existence of the optimal K_s-free pseudorandom graphs would give us the correct polynomial in R(s, t).

Here is how their proof goes. Firstly, observe that by taking one of the colour classes to define the edges of a graph, we need to construct an n vertex graph G that has no K_s as subgraphs and has independence number < t to show R(s, t) \geq n + 1. The main result that they prove is as follows:

Theorem 1. Let If there exists a K_s-free (n, d, \lambda)-graph, and n is large enough, then for t = \lceil 2n \log^2 n/d \rceil, we have

R(s, t) \geq c_s \frac{n}{\lambda} \log^2 n .

for some constant c_s.

The proof of Theorem 1 is not too hard if you are comfortable with the probabilistic method and you know that you should use Theorem 2.1 from Alon-Rödl. It can be summarised as: “pick each vertex uniformly at random with probability \log^2 n/(2e^2 \lambda) and in the induced subgraph remove one vertex from each independent set of size t”.

If the conjecture regarding K_s-free pseudorandom graphs is true, then for every integer s \geq 3 there exists an infinite sequence of (n, d, \lambda)-graphs with d = \Omega(n^{1-1/(2s - 3)}) and \lambda = O(\sqrt{d}). Consider such a graph G for n large enough. From t = \lceil 2n \log^2 n/d \rceil and n = O(d^{(2s - 3)/(2s - 4)}) we get d = \Omega(t^{2s - 4}/(\log t)^{2(2s - 4)}). Since \lambda = O(\sqrt{d}), from Theorem 1 we have R(s, t) \geq c_s \frac{n}{\lambda} \log^2n = c_s t \frac{d}{2 \lambda} = \Omega(t \sqrt{d}), and from the previous expression for d in terms of t we get

R(s, t) = \Omega \left(\frac{t^{s - 1}}{\log^{2s - 4} t} \right).

So, all we need to do is construct these K_s-free pseudorandom graphs. The best construction that we have so far (discussed here) has edge density \Theta(n^{-1/(s - 1))}. Sadly, the lower bound that this construction gives us on R(s, t) is c_s t^{s/2}/(\log t)^{s - 2}, which is worse than the probabilistic bound of Bohman and Keevash. In general, if we can construction K_s-free optimally pseudorandom graphs with density n^{-1/f(s)}, we will get the bound

R(s, t) \geq c_s \frac{t^{(f(s) + 1)/2}}{\log^{f(s) - 1} t}

Therefore, even a construction with density n^{-1/(s + 1)} (or in fact n^{-1/(s+\epsilon)}) would already break the Bohman-Keevash lower bound! That should be doable, right? I leave it as an open problem to the readers.

UPDATE 15th Oct 2019: Xiaoyu He has shown that the existence of these optimal K_s-free graphs for all s would also determine the multicolour Ramsey numbers R(s_1, \dots, s_k, t) with s_i fixed for all i and t \rightarrow \infty, up-to a poly-logarithmic factor.

Posted in Combinatorics, Extremal Combinatorics | Tagged , , , , , , | Leave a comment

Spectral proofs of theorems on the boolean hypercube

In a recent breakthrough Hao Huang proved the sensitivity conjecture, that had remained open for past 30 years despite some serious effort from various computer scientists and mathematicians. The proof can be described in a single tweet (if you have the basic background), and it has appeared on the ELI5 page of reddit. The main ideas behind the proof are from spectral graph theory, and one of the starting points (see for example this comment) is the following bound on the independence number of a graph due to Cvetković:

Lemma 1 (Inertia Bound). Let G be a graph on N vertices and let \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N be its eigenvalues. Let N^+ be the number of non-negative eigenvalues of G, and N^- the number of non-positive eigenvalues (counted with multiplicity). Then \alpha(G) \leq \min \{N^+, N^-\}.

The proof of this Lemma is a nice application of a classical result in mathematics known as Cauchy’s interlacing theorem (I have talked about eigenvalue interlacing earlier in this blog), and I leave it to the reader to fill in the details*. This spectral upper bound on the independence number of a graph is relatively less known than Hoffman/Delsarte bound which says \alpha(G) \leq (-\lambda_N/(\lambda_1 - \lambda_N)) N. Since many combinatorial problems can be rephrased as finding a nice upper bound on the independence number of some graph, such spectral results can be quite useful, when one has enough structure to say something about the eigenvalues. Until recently, I didn’t see any applications of the inertia bound in extremal combinatorics, whereas Hoffman’s bound is widely used; for example, in proving Erdős-Ko-Rado theorems. Interestingly, the classical Erdős-Ko-Rado theorem can also be proved using the Lemma 1 (as I learned from Hao Huang’s talk at our combinatorics seminar).

One interesting observation made by Huang regarding Lemma 1 (which was already known) is that instead of using the eigenvalues of the adjacency matrix of the graph, we could use any pseudo-adjacency matrix and still arrive at the same conclusion. A pseudo-adjacency matrix of a graph G on N vertices is any N \times N real symmetrix matrix A with the property that A_{ij} = 0 whenever the i-th vertex is non-adjacent to the j-th vertex (in an adjacency matrix we also require A_{ij} = 1 for every adjacent pair). If N^+, N^- are the number non-negative/positive eigenvalues of A, then from the proof of Lemma 1 using interlacing we again get \alpha(G) \leq \min \{N^+, N^-\}.

The flexibility of using the pseudo-adjacency matrix is crucial to Huang’s breakthrough result. In fact, before the proof of the sensitivity conjecture, Huang (in collaboration with Oleksiy Klurman and Cosmin Pohoata) used the same method to give a spectral proof of another theorem on the hypercube, known as Kleitman’s theorem, which is what I want to discuss in this blog post. In particular, I want to give a slightly simplified proof due to Aditya Potukuchi and Amey Bhangale, that my friend Aditya shared with me a few days after Huang’s sensitivity paper appeared on arXiv as an application of his methods, without realising that a spectral proof of this theorem had already appeared in an earlier paper of Huang. Here is the theorem that we will prove:

Theorem 2 (Kleitman). Let S be a set of vectors in \{0, 1\}^n such that the hamming distance between any two vectors in S is at most d. Then

\displaystyle |S| \leq \begin{cases} \sum_{i = 0}^t \binom{n}{i}, \text{ if } d = 2t \\ 2(\sum_{i = 0}^t \binom{n - 1}{i}), \text{ if } d = 2t + 1\end{cases}.

(This can be seen as the discrete analog of the isodiametric theorems in real geometry where one wants to find the largest volume of a set of points with a given diameter.)

To use Lemma 1 above, we will naturally consider the graph G on Q_n = \{0, 1\}^n where two vectors are adjacent if the hamming distance between them is at least d + 1, because then any independent set in this graph corresponds to such a set S in the theorem. What are the eigenvalues of the graph G?

The boolean hypercube, Q_n, and the graphs defined on it using hamming distance are well studied objects. One way of computing the eigenvalues of G is to look at it as the Cayley graph Cay(\mathbb{Z}_2^n, \mathbb{Z}_2^n \setminus B_d^n), where B_d^n is the Hamming ball of radius d, that is, all vectors whose Hamming weight is at most d. From the basic theory of Cayley graphs, it then follows that G has n + 1 distinct eigenvalues \lambda_0, \lambda_1, \dots, \lambda_n with

\displaystyle \lambda_i = \sum_{k = d + 1}^n \mathcal{K}_k^n(i)

appearing with multiplicity \binom{n}{i}, where \mathcal{K}_k^n(i) is the Kravchuk polynomial evaluated at i, defined as

\displaystyle \mathcal{K}_k^n(x) = \sum_{j = 0}^k (-1)^j \binom{x}{j} \binom{n - x}{k - j}.

Note that \mathcal{K}_k^n(x) is a polynomial of degree k. For ease of notation, we will skip the superscript from now on and just write \mathcal{K}_k instead. These polynomials have several nice properties, and we will use the following symmetry (or should it be called anti-symmetry?):

\displaystyle \mathcal{K}_k(i) = (-1)^i \mathcal{K}_{n - k}(i)

which lets us see the evaluation of a degree k polynomial at i as the evaluation of a degree n - k polynomial at i, up-to a factor of (-1)^i. This fact will come in handy later.

Now if you apply Lemma 1 to G using these eigenvalues, you will realise (after some calculation) that it does not give the required bound (what bounds do you get?). That’s where pseudo-adjacency matrices come into picture. We will define a weight function w:[n] \rightarrow \mathbb{R}, such that every vector with i non-zero values is given the weight w(i), and w(i) = 0 for all i \leq d. From this weight function we will get the pseudo-adjacency matrix A of G where A_{ij} = w(k) if the Hamming distance between the i-th and the j-th vectors is equal to k. Again, from the basic theory of Cayley graphs (in particular, calculating Fourier coefficients), or more directly (Lemma 2.3 here), we get that A has n + 1 distinct eigenvalues,

\displaystyle \lambda_i = \sum_{k = d + 1}^n w(k) \mathcal{K}_k(i)

with multiplicity \binom{n}{i}, with i = 0, 1, \dots, n. Now, we just have to choose the function w cleverly so that either N^+, or N^-, is at most what we need it to be. Here is when Huang, Klurman and Pohoata make the clever choice of w(k) = \binom{\lfloor (k - 1)/2 \rfloor}{t}, for the case of d = 2t. For the odd case, d = 2t + 1, there is a different choice, but from now on we will only stick to the even case of Theorem 2. From this choice, they get the following three properties,

  • (-1)^i \lambda_i > 0, for i = 0, \dots, t
  • \lambda_{n - i} = \lambda_{i + 1}, for i = 0, \dots, t - 1
  • \lambda_{t + 1} = \lambda_{t + 2} = \dots = \lambda_{n - t} = (-1)^{t + 1},

from which the result then follows by an easy analysis. The way they prove these properties is using generating functionology, and it is not immediately clear how the choice of w(k) came about. What Aditya and Amey do instead, is start with the properties that one wants from the eigenvalues, and then show the existence of the function w, without ever computing it explicitly!

Here is how this is done. Observe that using the symmetric property of Kravchuk polynomials, we can write

\displaystyle \lambda_i = \sum_{k = 2t + 1}^n w(k) \mathcal{K}_k(i) = (-1)^i \sum_{k = 2t + 1}^n w(k) \mathcal{K}_{n - k}(i)

In the second formulation, we can see \lambda_i as (-1)^i multiplied with a linear combination of polynomials of degrees n - 2t - 1, n - 2t - 2, \dots, 1, 0, evaluated at i. These polynomials are in fact linearly independent, and hence any polynomial f(x) of degree at most n - 2t - 1 can be written as a linear combination of them. Therefore, we can choose a nice f(x), for which the eigenvalues will be \lambda_i = (-1)^i f(i), and then get the weights w(k)‘s automatically.

Pick f(x) as the unique monic polynomial whose roots are t + i + 0.5, for i = 1 to n - 2t - 1. WLOG, assume that f(0) > 0 (if not, then we can take -f). Since f is a polynomial, the sign of its values oscillates between the roots. In particular, this shows that \lambda_i = (-1)^i f(i) has the same sign for i = t + 1, \dots, n - t, that is, \lambda_i > 0 if t is odd (since f(0) and f(t+1) have the same sign), and \lambda_i < 0 if t is even, for all of these i‘s. In the former case, we can carefully count the number of negative eigenvalues with multiplicities, and in the latter case we can count the number of positive eigenvalues with multiplicities, giving us Theorem 2.

Note that as far as the signs of \lambda_i‘s are concerned, this trick of picking the polynomial f explicitly, and then getting the weights, gives us the same three properties as we had earlier where Huang, Klurman and Pohoata made the explicit choice of the weight function. For the case of Theorem 2 when d = 2t + 1, we have to do something different, and I leave it to the reader to figure that case out.

For some open questions that can potentially be solved using these methods, check out Section 5 of the Huang-Klurman-Pohoata paper.

(*) Fedor Petrov has given a “low-level” proof that avoids using the interlacing theorem (congrats to Fedya for his new blog!).

For discussions on Huang’s proof of the sensitivity conjecture, see the following blog posts: Terry Tao, Gil Kalai, Scott Aaronson, Ken Reagan.

Posted in Combinatorics, Extremal Combinatorics, Spectral Graph Theory | Tagged , , , , , , , | 5 Comments

Pseudorandom clique-free graphs

Pseudorandom graphs are graphs that in some way behaves like a random graph with the same edge density. One way in which this happens is as follows. In the random graph G(n, p), with p = p(n) \leq 0.99, a direct application of Chernoff bound implies that the probability of the following event approaches 1 as n approaches infinity:

|e(S, T) - p|S||T|| = O(\sqrt{pn |S||T|})

where S,T are arbitrary subsets of vertices and e(S,T) denotes the number of edges with one end vertex in S and the other one in T.  Note that p|S||T| is the expected number of edges between S and T in this model, and \sqrt{p(1 - p)|S||T|} is the standard deviation. Now let G be a d-regular graph on n-vertices and let \lambda be the second largest eigenvalue of G in absolute value (these are referred to as (n, d, \lambda)-graphs). Then the following is true for any two subsets S, T of vertices:

|e(S,T) - (d/n) |S||T|| \leq \lambda \sqrt{|S||T|}.

where \lambda is the second largest eigenvalue of the adjacency matrix of the graph, in absolute value. Therefore, if \lambda is small, and in particular close to being O(\sqrt{d}), then the graph mimics the behaviour of G(n, d/n). In fact, for any (n, d, \lambda)-graph, with d < n/2, one can show by looking at the square of the adjacency matrix that \lambda = \Omega(\sqrt{d}). The graphs where \lambda = \Theta(\sqrt{d}) are known as optimally pseudorandom.

Pseudorandom graphs have found several applications over the last few decades, and there are many interesting questions about them (see the survey of Krivelevich and Sudakov). In a 1994 paper, Noga Alon constructed a family of optimally pseudorandom triangle free graphs on n-vertices, with d/n = \Omega(n^{-1/3}), that he then used to give explicit bounds on some Ramsey numbers, and to show that the maximum possible Euclidean norm of n unit vectors in \mathbb{R}^n with the property that among any three of them two are orthogonal, is equal to \Theta(n^{2/3}). More applications of this construction can be found in the recent survey paper of Noga based on a talk he gave at the conference celebrating the 70th birthday of László Lovász (which is where I learned about these graphs, and the main topic of this post).

Alon’s construction is in fact optimal in the sense that there existence a constant C > 0, such that any optimally pseudorandom graph on n vertices with d/n > Cn^{-1/3} must contain a triangle. This can be generalised to cliques of size k, where we have a constant C > 0 such that any optimally pseudorandom (n, d, \lambda)-graph with d/n > C n^{-1/(2k - 3)} must contain K_k. The proof of this follows from greedily picking common neighbours, using the following lemma (that can be proved using the edge distribution proposition above):

If S is a set of vertices with |S| \geq 2n\lambda/d, then there are at least d|S|^2/4n edges with both end points in S.

The natural question now is to give matching constructions for this bound for all k > 3, or improve the bound. This question has been asked by several people, as it arises naturally in many situations, but it has been open for every k > 3 since the past 20 years or so. David Conlon has called it “one of the outstanding open problems about pseudorandom graph” (also check out this video).

The best known construction so far for larger values of k was the construction of Alon and Krivelevich that gives us optimally pseudorandom graphs with edge density d/n = \Theta(n^{-1/(k - 2)}. Note that this starts giving a better construction than Alon’s triangle free graphs at k = 6. In my recent work with Ferdinand Ihringer and Valentina Pepe, we have been able to provide a better construction, that gives a family of graphs with edge density d/n = \Theta(n^{-1/(k - 1)}). The construction is fairly easy to describe, and for a proof you can refer to the paper:

Let Q(x_1, \dots, x_k) = x_1^2 + \xi x_2^2 + \sum_{i = 3}^k x_k^2 be a quadratic form over \mathbb{F}_q where \xi is a non-square in \mathbb{F}_q (we assume q to be odd). Define a graph with vertices as the 1-dimensional subspaces x of \mathbb{F}_q^k, for which Q(x) is a square, and making two vertices x and y adjacent if they are orthogonal to each other, that is,
\frac{1}{2}(Q(x + y) - Q(x) - Q(y)) = x_1 y_1 + \xi x_2y_2 + \sum_{i =3}^n x_i y_i = 0.

Then this graph is a K_k-free (n, d, \lambda)-graph with n = (1 + o(1))q^{k-1}/2, d = (1 + o(1))q^{k - 2}/2 and \lambda = \Theta(q^{(k - 2)/2}).

 

While we now have slightly better constructions, we are still far away from the conjectured bound. I am hopeful though that finite geometry, and especially the geometries associated with quadratic forms (known as polar spaces), can play an important role in obtaining even better constructions. In fact, there is a construction due to Kopparty  for triangle-free graphs matching the parameters of Alon’s construction that essentially comes from a generalized quadrangle (which is a polar space), if you look at it carefully (see my earlier post for hints on that). Moreover, Conlon was able to give an (almost) optimal probabilistic construction which also uses generalized quadrangles. This a strong indication that polar spaces can be useful in general. As a first step, we should perhaps try to obtain optimally pseudorandom K_4-free graphs that have higher edge density than n^{-1/3}.

Edit 04/09/2019: An exciting new result by Mubyai and Verstraete shows that if the main conjecture of this post is true, that is, there exists K_k-free pseudorandom graphs of density n^{-1/(2k - 3)}, then this would asymptotically determine off-diagonal Ramsey numbers R(k, t), with k fixed and t \rightarrow \infty. In fact, even an improvement of the denominator from k - 1 (which is in our construction) to k + 1 would be a big development as it’ll improve the current lower bounds on Ramsey numbers. See the first half of this talk of Mubayi.

Posted in Combinatorics, Extremal Combinatorics, Spectral Graph Theory | Tagged , , | 2 Comments

Ryser’s conjecture

I am on a research visit in Rome, working with Valentina Pepe, and our joint paper on Ryser’s conjecture is on arXiv now. So this seems like the right time to talk about the conjecture and the problems related to it that I have been obsessing over for past few months.

In graph theory, the classical result of Kőnig says that in any (finite) bipartite graph the minimum number of vertices that cover all edges (the so-called vertex cover number) is equal to the maximum number of pairwise disjoint edges in the graph (matching number). This result is clearly not-true for non-bipartite graphs (think of some obvious counter examples), where the best one can do is have the vertex cover number equal to twice the matching number since the vertices contained in any maximum matching must cover all the edges. Ryser’s conjecture is a proposed generalisation of this result to r-partite hypergraphs, that is, for hypergraphs whose vertex set can be partitioned into r parts (let’s call them sides from now on), such that each edge of the hypergraph contains a unique vertex from each of the sides. In particular, every r-partite hypergraph is r-uniform, that is, each edge has exactly r vertices in it. If we have any r-uniform hypergraph \mathcal{H} (not necessarily r-partite), then the following follows as before, \tau(\mathcal{H}) \leq r \nu(\mathcal{H}), where \tau denotes the vertex cover number and \nu the matching number. Ryser’s conjectured that if \mathcal{H} is r-partite, then we must have \tau(\mathcal{H}) \leq (r - 1) \nu(\mathcal{H}). This conjecture first appeared in the Ph.D. thesis of Ryser’s student, J.R. Henderson, and it has often been misattributed to a 1967 paper of Ryser (see this).

Despite the time span of about 50 years, our current knowledge about this conjecture is abysmal. The only other case of this conjecture which is known to be true in general is r = 3, which was proved by Aharoni using topological methods! If we impose some further restrictions on the hypergraph, then we know a bit more, but not much. The conjecture is true for intersecting hypergraphs (matching number equal to 1) if r \leq 5, as proved by Tuza, and for r \leq 9 if the hypergraph is also linear, as proved in the recent paper by Francetić, Herke, McKay and Wanless.

In view of our inability to prove this conjecture, some natural questions to ask are, (1) is it even true?, and (2) why is it so hard to prove?. While it’s quite possible that the conjecture is false, let’s focus on (2) for now. Sometimes what makes an extremal problem in combinatorics hard to prove is that there are many different kinds of extremal examples, and a “combinatorial” proof must somehow consider all of these examples (I should thank Tibor Szabó for this intuition). So let’s see what we know about hypergraphs meeting the bound in Ryser’s conjecture.

Until recently, the only known family of r-partite hypergraphs with vertex cover number equal to r - 1 times the matching number, called r-Ryser hypergraphs, came from finite projective planes of order r - 1 (which is what sparked my interest in this problem), and hence they were known to exist whenever r - 1 is equal to a prime power (2, 3, 4, 5, 6, 8, 9, 10, \dots). Once you know the definition of projective planes, the construction is easy: remove a point and all lines through it. These hypergraphs are hence known as truncated projective planes. Note that this gives us intersecting r-Ryser hypergraphs, and to get such hypergraphs with matching number \nu one can simply take \nu disjoint copies of the intersecting hypergraphs (more on this later!). Inspired by the lack of examples, several people gave constructions for small values of r where projective planes did not exist (or were not known to exist), and then Abu-Khazneh, Barát, Pokrovskiy and Szabó, came up with a clever construction which gave a new infinite family of intersecting r-Ryser hypergraphs whenever r - 2 is a prime power. In fact, they were able to construct many non-isomorphic examples of such hypergraphs! Note that while it is known that there are plenty of non-isomorphic projective planes of a given order, it is not clear what the rate of growth of this function is, and that’s a fascinating problem on its own. Another interesting family of r-Ryser hypergraphs was obtained by Haxell and Scott, whenever both (r - 1)/2 and (r + 1)/2 are prime powers (whether this gives infinitely many new values of r for which we have a Ryser hypergraph or not is in fact related to a nice open problem in number theory, which a careful reader should be able to deduce :)). Both of these constructions rely on finite projective or affine planes.

Valentina and I have constructed new infinite families of non-intersecting r-Ryser hypergraphs, whenever r - 1 is a prime power bigger than 3, which looks fundamentally different from just taking disjoint copies of intersecting Ryser hypergraphs. The condition on r being at least 4 cannot be relaxed since a result of Haxell, Narins and Szabo, says that every 3-Ryser hypergraph is essentially obtained by taking disjoint copies of intersecting 3-Ryser hypergraphs. For r = 4 it was already known that such a characterisation cannot be true, because of a computer-generated example of Abu-Khazneh. Our constructions show that for any r \geq 4 and \nu > 1, such that r - 1 is a prime power, there exists an r-Ryser hypergraph with matching number equal to \nu which does not contain two vertex disjoint copies of intersecting r-Ryser hypergraphs.

The constructions are again finite geometric, and we were quite happy about the fact that many non-trivial results on blocking sites of finite projective planes came into play when proving that these hypergraphs are Ryser extremal. Here is a description of the first family, with \nu = 2, along with a picture:

Ryser

Let \pi_1 and \pi_2 be two copies of classical projective planes of order q with a common point P, which will be truncated at the the points Q_1 and Q_2. Let \mathcal{C} be a conic in the second plane passing through q, and v an extra vertex. The edges of the hypergraph are (1) all lines in the first plane not through Q_1 or P, and a line \ell through P which does not contain Q_1; (2) all the lines of the second projective plane not through Q_2 that contain a point of the conic \mathcal{C}; (3) two new (weird) edges e_1 and e_2 with e_1 = \ell \setminus \{P\} \cup \{R\} and e_2 = C \setminus \{Q_2\} \cup \{v\}.

The fact that this is a (q + 1)-Ryser hypergraph of matching number 2 with the required properties, whenever q is an odd prime, is proved in the paper. For other values of q, there is a second, more involved, construction (which also comes with a picture!).

Our new constructions show that the non-intersecting Ryser hypergraphs can have a richer structure, and perhaps it’ll be useful to construct more such hypergraphs for either disproving the conjecture or understanding the extremal cases better. Some other interesting questions, that have also been mentioned before by others, are as follows:

Open problem 1: Find the minimum number of edges an r-Ryser hypergraph can have. It is not known, but conjectured, that linearly many edges should suffice.

Open problem 2: What is the largest vertex cover number of an r-partite intersecting hypergraph, if the “trivial” covers containing a side or an edge are not allowed?

This seems to be related to the problem of finding the smallest non-trivial blocking set in finite projective planes.

Open problem 3: Prove, or disprove!, Ryser’s conjecture.

Posted in Combinatorics, Extremal Combinatorics, Finite Geometry, Incidence Geometry | Tagged , , , , , , , , | 1 Comment

Wenger graphs

A central (and foundational) question in extremal graph theory is the forbidden subgraph problem of Turán, which asks for the largest number of edges in an n-vertex graph G that does not contain any copy of a given graph H as its subgraph. This number is denoted by ex(n, H) and it is called the Turán number of the graph H. While the Erdős-Stone theorem has solved this problem asymptotically whenever H is non-bipartite, the case of bipartite graphs is still wide open. For example, when H is isomorphic to the cube graph Q_3, all we know is that c_1 n^{3/2} \leq ex(n, Q_3) \leq c_2 n^{8/5}, for some constants c_1, c_2 and large enough n. The two most important cases in this problem are when H is a complete bipartite graph or an even cycle. In this post we will focus on the latter (see this for a survey), where graphs derived from finite geometries give us the best known extremal constructions for small cycles.

It was proved by Bondy and Simonovits that ex(n, C_{2k}) = O(n^{1 + 1/k}), but this bound is known to be (asymptotically) sharp only for the case of k = 2, 3 or 5; so in particular we do not even know what ex(n, C_{8}) is. The sharpness for these cases follows from the existence of finite generalized n-gons of order q for every prime power q and n = 3, 4, 6. These are point-line geometries introduced by Jacques Tits in 1959, and an easy graph theoretic definition of these objects is as follows:

A generalized k-gon of order q, for k,q \geq 2, is a point-line geometry whose incidence graph is (q+1)-regular, has diameter k and girth 2k.

One can count the total number of points/lines in a generalized k-gon in terms of the parameter q, which tells us that the incidence graph has \Theta(q^{k-1}) vertices and \Theta(q^{k}) edges. Since by definition, such a graph is C_{2k - 2} free, we get examples of C_{2k - 2}-free graphs on n vertices with \Theta(n^{1 + 1/(k-1)}) edges, whenever such a structure exists. Sadly, it is known that such generalized k-gons can only exist for k \in \{3, 4, 6\}, and in these cases they are only known to exist when q is a power of a prime. Using the density of prime numbers these objects can be used to prove that ex(n, 2k) = \Theta(n^{1 + 1/k)}) for k = 2, 3, 5.

The k = 3 case is simply a finite projective plane of order q, and Tits had already shown the existence (and many examples) for the other cases. Some special cases of generalized 4-gons and 6-gons were rediscovered by Benson in 1966 (and in the combinatorics community sometimes he’s the one who is cited for constructing these extremal graphs).

While it’s quite easy to construct generalized 4-gons of order q using zeros of a non-degenerate quadratic form in the 4-dimensional projective space over \mathbb{F}_q (these objects are a special case of polar spaces), the construction of a generalized 6-gon is quite involved. As an attempt to simplify this situation, in 1991 Wenger constructed a family of bipartite graphs H_k(q) for integers k \geq 2 and prime power q, with 2q^k vertices and q^{k + 1} edges, such that the graphs H_2(q), H_3(q) and H_5(q) did not have any C_4, C_6 and C_{10}, respectively. His construction and the proof of the cycle freeness involved some simple algebra over the finite fields. Later on his graphs were studied extensively, from various perspectives, and it was realised (I am not sure exactly when, but it’s at least mentioned here) that H_3(q) is in fact just an induced subgraph of the incidence graph of a well-known generalized quadrangle (the quadric Q(4,q)) and H_5(q) is isomorphic to a homomorphic image of the incidence graph of the only known generalized hexagon of order q, the split Cayley hexaon. (From the definition of H_2(q) it will be pretty clear that it is a q-regular induced subgraph of the incidence graph of the Desarguesian projective plane.)

In this post, we will see how H_3(q) is directly related to a particular family of generalized quadrangles introduced by Tits (which first appeared in Dembowski, 1968), known as the T_2(O) generalized quadrangles. These quadrangles are in fact isomorphic to Q(4,q) when q is odd, or in general when O is an irreducible conic (which will be the case corresponding to Wenger graphs). Let’s start with the definition of Wenger graphs.

Construction 1: Let P and L be two copies of \mathbb{F}_q^k. Then H_{k}(q) is the bipartite graph defined on P and L by making two vertices p = (p_1, \dots, p_k) and \ell = (\ell_1, \dots, \ell_k) adjacent if the following equations are satisfied:

p_2 + \ell_2 = p_1 \ell_1
p_3 + \ell_3 = p_1 \ell_1^2
\dots
p_k + \ell_k = p_1\ell_1^{k-1}

The original equations used by Wenger were a bit different, but it can be (and has been) shown that the graph we get is the same. One of the first thing we notice in these defining relations is that once you have fixed (p_1, p_2, \dots, p_k), every value of \ell_1 uniquely determine the vector (\ell_2, \dots, \ell_k), and vice versa. We can thus conclude that this graph is q-regular and thus it was q^{k + 1} edges, since each side of the bipartite graph has size q^k. Therefore, if this graph was C_{2k} free, then this will be an extremal graph with this property due to the Bondy-Simonovits upper bound. Let’s look at the smallest example k = 2.

We have (p_1, p_2) \sim (\ell_1, \ell_2) if p_2 + \ell_2 = p_1 \ell_1. Therefore, for any fixed \ell = (\ell_1, \ell_2) the set of points adjacent to \ell in H_2(q) are precisely those points which lie on the line y = \ell_1 p_1 - \ell_2, i.e., the line of slope \ell_1 through the point (0, -\ell_2) in the plane \mathbb{F}_q^2. If we identify the elements of L by these non-vertical lines of \mathbb{F}_q^2 (and identify P with the points in \mathbb{F}_q^2), then we get an isomorphic between H_2(q) and the incidence graph between the points and non-vertical lines of \mathbb{F}_q^2. The vertical lines correspond to a point P_\infty on the line \ell_\infty that one can use to obtain the projective completion of the affine space \mathbb{F}_q^2. So geometrically, we can also think of H_2(q) as the following graph:

Given the projective plane \mathrm{PG}(2,q), let \ell be a line and P a point on the line \ell. Then H_2(q) is the incidence graph between the points not lying on the line \ell and the lines which do not contain the point P, i.e., all the lines through the q points in \ell \setminus \{P\}

Alternatively, we can use coordinates and get the following description:

Let P be the set of points with (homogeneous) coordinates (1, p_1, p_2) and let L be the set of  all lines through the points with coordinates (0, 1, \ell_1) with \ell_1 \in \mathbb{F}_q. Then H_2(q) is the incidence graph between P and L

From these geometric descriptions, and the fact that through every two points there is a unique line, it is clear that H_2(q) is C_4-free. In fact, we can give a similar geometric description of all H_k(q)‘s. We first give the description involving coordinates and then take a coordinate-free approach which will in fact give a broader class of graphs.

Let P be the set of points in \mathrm{PG}(k,q) with (homogeneous) coordinates (1, p_1, \dots, p_k) and let L be the set of all lines through the points with coordinates (0, 1, \ell_1, \ell_1^2, \dots, \ell_1^{k-1}) with \ell_1 \in \mathbb{F}_q. Then H_k(q) is the incidence graph between P and L

Note that the set of points we have used to define the line set is in fact a part of the normal rational curve (a.k.a. moment curve) \{(0, \ell_1, \dots, \ell_1^{k-1}) : \ell_1 \in \mathbb{F}_q\} \cup \{(0, 0, \dots, 0, 1)\} in the hyperplane defined by the points whose first coordinate is 0 (we can call it the hyperplane at infinity and the set P as the affine points). The following map gives us the isomorphism between the two descriptions of the Wenger graph:  (p_1, \dots, p_k) \mapsto (1, p_1, \dots, p_k)  and (\ell_1, \dots, \ell_k) \mapsto \{\lambda(1, 0, -\ell_2, -\ell_3, \dots, -\ell_k) + \mu (0, 1, \ell_1, \ell_1^2, \dots, \ell_1^{k-1}) : \lambda, \mu \in \mathbb{F}_q\}.

The main property that the normal rational curve in \mathrm{PG}(k - 1, q) has, and what we will use to show cycle freeness, is that any k points on it are linearly independent. Or equivalently, any s-dimensional (projective) subspace of \mathrm{PG}(k - 1, q) intersects the curve in at most s + 1 points, where 1 \leq s \leq k - 2. The object that abstracts out this property is known as an arc, i.e., a set of points in \mathrm{PG}(k - 1, q) with the property that no k points on it lie on a common hyperplane. With this in our hand we get the following coordinate free description of Wenger graphs, which appears in [1] and [2]:

Construction 2: Let H_\infty \cong \mathrm{PG}(k - 1, q) be a hyperlpane in \mathrm{PG}(k, q) and let S be an arc of size q in H_\infty. Define H_k(q) to be the incidence graph between the point set \mathrm{PG}(k, q) \setminus H_\infty and the set of all lines of \mathrm{PG}(k, q) that contain a point of S

This is in fact a larger class of graphs than the one described by Wenger since an arc does not have to come from a normal rational curve (there exist several such families of arcs). Now that we have this geometric construction, let’s focus on the graph H_3(q).

Generalized Quadrangles and the Wenger Graph

Let’s see how H_3(q) is related to generalized 4-gons in exactly the same as H_2(q) is related to generalized 3-gons (the projective planes). For all k \geq 3, the graph H_k(q) can be shown to be C_6-free as follows: if there was a C_6 then we will have 3 lines \ell_1, \ell_2, \ell_3 in \mathrm{PG}(k, q) which pairwise intersect each other in \mathrm{PG}(k, q) \setminus H_\infty and intersect H_\infty in a point of the set S; but this will be a contradiction to the fact that S is an arc since these lines will span a plane which intersects H_\infty in a line that contains 3 points \ell_1 \cap H_\infty, \ell_2 \cap H_\infty and \ell_3 \cap H_\infty of S. Alternatively, we can do the same thing as we did before and show that H_3(q) is in fact a latex q-regular induced subgraph of the incidence graph of a generalized quadrangle of order q.

The generalized quadrangle that we will use to show this is the following one, called T_2(O), which was originally constructed by Tits in early 1960’s:

Let H \cong \mathrm{PG}(2, q) be a hyperplane in \mathrm{PG}(3,q) and let O be an oval in H, i.e., a set of q + 1 points no three of which are collinear. Define the points as (i) the points of \mathrm{PG}(3,q) \setminus H, (ii) the hyperplanes X of \mathrm{PG}(3,q) for which we have |X \cap O| = 1, and (iii) one new symbol (\infty). Define the lines as (a) lines of \mathrm{PG}(3,q) through points of O which are not contained in H and (b) the points of O. A point of type (i) is only incidence with lines of type (a), and the incidence is the natural one. A point of type (ii) is incidence with all the lines of type (a) contained in it and with the unique line of type (b) corresponding to the unique element of O contained in it. The point (\infty) is incidence with no lines of type (a) and all lines of type (b).

Now note that if from this generalized quadrangle we remove the point (\infty), a line \ell of type (b) (which corresponds to a point P of O), and all lines and points which are at distance at most 2 from (\infty) or \ell in the incidence graph of T_2(O), then what we are left with is the incidence structure defined on all the points of type (i) and those lines of type (ii) which pass through a point of S = O \setminus \{P\}This is precisely the description of the Wenger graph H_3(q)!

Now, one might want to find out H_3(q) is related to the more well known generalized quadrangle Q(4,q) (which is also the one that was constructed by Benson). This is given by the well known isomorphism between T_2(O) and Q(4,q) whenever O is an irreducible conic (see Theorem 3.2.2 in [3]), which by the seminal work of Segre, is always true for q odd.

 

Let’s end this blog post now with a list of questions, challenges and references.

Question 1: The graph H_5(q) is known to be a homomorphic image of a q-regular induced subgraph of the split Cayley hexagon H(q). Is there an easy way to see that?

Question 2: If we think of H_k(q) as a point-line geometry, then construction 2 above gives us what is known as a linear representation of the geometry described in construction 1. So H_3(q) is a linear representation of a particular subgeometry of the generalized quadrangle T_2(O), and H_k(q) is a generalization of that. What are some other interesting subgeometries (perhaps corresponding to regular induced subgraphs) of generalized polygons that can be generalized in this way to give interesting families of bipartite graphs that can be useful in Turán problems?

Question 3: Can H_4(q) be described using some well-studied geometrical structure like the generalized polygons? May be it’s related to some near polygon or a polar space?

Big Challenge: Give a finite geometrical construction of an infinite family of C_8-free graphs which have n vertices and \Theta(n^{5/4}) edges. For example, these graphs could have \Theta(q^4) and \Theta(q^5) edges.

 

References and further reading

[1] P. Cara, S. Rottey and G. Van de Voorde, A construction for infinite families of semisymmetric graphs revealing their full automorphism group.
[2] K. E. Mellinger and D. Mubayi, Constructions of Bipartite Graphs from Finite Geometries. Constructions of Bipartite Graphs from Finite Geometries. 
[3] S. E. Payne and J. A. Thas, Finite Generalized Quadrangles.
[4] F. Lazebnik and S. Sun, Some families of graphs, hypergraphs and digraphs defined by systems of equations: a survey.
[5] S. M. Cioabă, F. Lazebnik and W. Li, On the spectrum of Wenger graphs.

Posted in Combinatorics, Extremal Combinatorics, Finite Geometry, Incidence Geometry | Tagged , , , , , , , | 1 Comment

The footprint bound

Studying the set of common zeros of systems of polynomial equations is a fundamental problem in algebra and geometry. In this post we will look at estimating the cardinality of the set of common zeros, when we already know that it is finite. In fact, with some extra theory one can also determine whether the set of zeros is finite or not.

My main motivation here is to popularise the so-called footprint bound, especially among combinatorialists (combinators?).  As an application, we will see a nice “conceptual” proof of the so-called Alon-Füredi theorem, which is a result that in particular solves the following well-known geometric problem whose 3-dimensional case appeared as Problem 6 in IMO 2007:

Let S be a finite subset of a field F and let a \in S^n. What is the minimum number of hyperplanes in F^n that can cover all the points in S^n, while missing the point a?

(The answer is n(|S| - 1).)

To be able to state the footprint bound we need some definitions. A monomial order is a total order on the set of all monomials in F[x_1, \dots, x_n] which satisfies the property u \leq v \implies uw \leq vw for any monomials u, v, w. For example, we can look at the graded lexicographic order in which we first compare the total degree of the monomials and if that’s equal then we compare them lexicographically, i.e., for u = (u_1, \dots, u_n) and v = (v_1, \dots, v_n) we have \prod x_i^{u_i} > \prod x_i^{v_i} if \sum u_i > \sum v_i or if \sum u_i = \sum v_i and the left-most non-zero entry of the vector u - v is positive (so for example, we will have x_1^2 x_2^3 \geq x_1^2 x_2 x_3^2).

Now given a polynomial f in F[x_1, \dots, x_n] and a monomial ordering \leq, we denote the largest monomial  appearing in f w.r.t. “\leq” by LM(f) (a.k.a. leading monomial). And if J is an ideal of polynomials then we define LM(J) = \{LM(f) : f \in J\}. Given an ideal J, we are going to give an upper bound on the set V(J) = \{x \in F^n : f(x) = 0~\forall f \in J\}. For that, let \Delta(J) denote the set of those monomials which are not contained in the (monomial) ideal \langle LM(J) \rangle (these are also known as standard monomials). Then this set \Delta(J) is known as the footprint of the ideal J.

Theorem 1 (Footprint bound [7], [5,§5.3]) If \Delta(J) is finite, then we have |V(J)| \leq |\Delta(J)|. Moreover, the bound is sharp if the ideal J is radical.
Proof. Exercise! (show that \Delta(J) is a basis of the vector space F[x_1, \dots, x_n]/J.)

Of course, the usefulness of this bound (at least in combinatorial situations) depends on how easily we can calculate, or estimate, |\Delta(J)|. Say J is generated by g_1, \dots, g_r and let \Delta(g_1, \dots, g_r) = \{x^u : x^u \not \in \langle LM(g_i) \rangle ~\forall i\}. Then clearly \Delta(J) \subseteq \Delta(g_1, \dots, g_r), and since we have a finite list of monomials to check, with right choice of g_i‘s it can be easier to compute \Delta(g_1, \dots, g_r). Ideally, we would like \Delta(J) to be equal to \Delta(g_1, \dots, g_r), and in fact this does happen when g_1, \dots, g_r forms a Gröbner basis of J (if you know what a Gröbner basis is; but you don’t really need to for the rest of this post). Let’s consider an example. Let n = 2 and J = \langle x^2 y^3 - y, x^3y - x, x^4 - y^3, y^4 - xy^2 \rangle \subset F[x, y], and consider the graded lexicographical order. Then the multiples of the leading monomials x^2y^3, x^3y, x^4, y^4 can be seen pictorially as follows:

grid_footprint

The monomials which are multiples of these four monomials, i.e. the monomials in the ideal generated by these leading monomials, correspond to the integer points in the shaded region. Therefore, the elements of \Delta(x^2 y^3 - y, x^3y - x, x^4 - y^3, y^4 - xy^2) corresponds to the points in the unshaded region, and hence |\Delta(J)| \leq |\Delta(x^2 y^3 - y, x^3y - x, x^4 - y^3, y^4 - xy^2)| = 12, which implies that these polynomials have at most 12 common solutions (Wolfram Alpha tells me that there are in fact only 2).

Say we are now interested in estimating the number of zeros of a polynomial f \in F[x_1, \dots, x_n] contained in a finite grid S = S_1 \times \cdots \times S_n, where S_i \subseteq F, assuming that f does not vanish on all points of S. This is precisely what the Alon-Füredi theorem is all about. Let s_i = |S_i| and g_i = \prod_{a \in S} (x_i - a). Then g_i is a polynomial of degree s_i, and the points of the S are the common solutions of the polynomials g_1, \dots, g_n. What we are interested in is V(J) for J = \langle f, g_1, \dots, g_n \rangle. For that we will first assume that f is in its reduced form, so that LM(f) (with respect to any monomial ordering) is of the form \prod x_i^{u_i} where 0 \leq u_i \leq s_i - 1 for all i. We can always do this by reducing higher powers of x_i by using the equation g_i(x_i) = 0 (the so-called grid reduction [3, Chapter 8]). Say \deg (f) = d and x^u = \prod x_i^{u_i} is the leading monomial of f, with \sum u_i = d. Then \Delta(x^u, x_1^{s_i}, \dots, x_n^{s_n}) is equal to the set of all monomials of the form \prod x_i^{v_i} where there exists an index j for which v_j < u_j. The number of such monomials is in fact equal to \prod s_i - \prod (s_i - u_i) because \prod s_i is the total number of reduced monomials, and \prod (s_i - u_i) of them are those which are multiples of x^u. If we let c_i = s_i - u_i, then we have 1 \leq c_i \leq s_i and \sum c_i = (\sum s_i) - d, which directly implies the Alon-Füredi theorem:

Theorem 2 (Alon-Füredi [1, Theorem 5]) Let f be a polynomial which does not vanish on the entire grid S = S_1 \times \cdots \times S_n, then f does not vanish on at least \min \{ \prod c_i : 1 \leq c_i \leq s_i ~\forall i,~ \sum c_i = (\sum |S_i|) - \deg f\} points of S.

Compared to the other proof of the Alon-Füredi theorem, I feel like this proof gives us a better understanding of where that bound comes from and “why” this theorem is true. Also, the same proof also gives the generalized version of the Alon-Füredi theorem [3, Theorem 9.1.2] (over fields). Moreover, one can also prove Alon’s combinatorial nullstellensatz using the footprint bound as follows.

Theorem 3 (Combinatorial Nullstellensatz) Let f be such that LM(x) = x^u  with u = (u_1, \dots, u_n) satisfying u_i < |S_i| for all i. Then there exists an a \in S for which f(a) \neq 0.
Proof. The size of the footprint of the ideal I = \langle f, g_1, \dots, g_n \rangle is strictly less than |S| since u, being the exponent of a leading monomial, is not contained in it. Therefore by the footprint bound we have |V(I)| < |S|, which implies that there is a point where f does not vanish.

So, footprint bound is a nice algebraic result which has several interesting applications and extensions [2, 4, 6, 7, 8, 9, 10]. It is particularly useful for studying systems of polynomial equations over a finite field \mathbb{F}_q, where we can take the grid S to be equal to \mathbb{F}_q^n. For further reading on this bound see the papers in the references below.

And now for the extra theory, here is a characterisation of systems of polynomial equations which have only finitely many common zeros (the so-called zero dimensional affine varieties over the algebraic closure of the field).

Theorem 4 ([5, Page 234]) Let k be an algebraically closed field and let V = V(I) be an affine variety in k^n where I is an ideal in k[x_1, \dots, x_n]. Then the following statements are equivalent:

  1. V is a finite set.
  2. For each i, there exists some m_i \geq 0 such that x_i^{m_i} \in \langle LM(I) \rangle.
  3. Let G be a Gröbner basis for I. Then for each i, there exists some m_i \geq 0 such that x_i^{m_i} = LM(g_i) for some g_i \in G.
  4. The set \Delta(I) is finite.
  5. The k-vector space k[x_1, \dots, x_n]/I is finite-dimensional.

 

References

[1] N. Alon and Z. Füredi. Covering the cube by affine hyperplanes. European J. Combin., 14(2):79–83, 1993.
[2] P. Beelen, M. Datta, and S. R. Ghorpade. Vanishing ideals of projective spaces over finite fields and a projective footprint bound. arXiv:1801.09139.
[3] A. Bishnoi, Some contributions to incidence geometry and polynomial method, PhD Thesis (2016), Ghent University.
[4] C. Carvalho. On the second Hamming weight of some Reed–Muller type codes. Finite Fields Appl., 24:88–94, 2013.
[5] D. Cox, J. Little, and D. O’Shea. Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007.
[6] O. Geil. On the second weight of generalized Reed-Muller codes. Designs, Codes and Cryptography, 48(3): 323-330, 2008.
[7] O. Geil and T. Høholdt. Footprints or generalized Bezout’s theorem. IEEE Trans- actions on Information Theory, 46(2):635–641, 2000.
[8] O. Geil and C. Thomsen. Weighted Reed–Muller codes revisited. Designs, Codes and Cryptography, 66(1):195–220, 2013.
[9] O. Geil and U. Martínez-Penãs. Bounding the number of common zeros of multivariate polynomials and their consecutive derivatives, arXiv:1707.01354.
[10] R. Pellikaan and X.-W. Wu. List decoding of q-ary Reed-Muller codes. IEEE Transactions on Information Theory, 50(4):679–682, 2004.

Posted in Coding Theory, Combinatorics, Polynomial Method | Tagged , , , , , , , | Leave a comment

Introduction to polynomial method

(The following is a blog-friendly version of Chapter 7 of my PhD thesis, which is an introduction to the so-called polynomial method.)

The polynomial method is an umbrella term for different techniques involving polynomials which have been used to solve several problems in finite geometry, discrete geometry, extremal combinatorics and additive number theory. One of the general philosophies behind this method is to associate a set of polynomials (possibly a single polynomial), to a combinatorial object that we want to study, and then use some properties of these polynomials to describe the combinatorial object. For a concrete example, let us go through Koornwinder’s proof of the absolute bound on equiangular lines.

A set of lines in the Euclidean space \mathbb{R}^n through the origin (or any other fixed point) is called equiangular if the angle between every pair of distinct lines in the set is the same. For example, joining the opposite vertices of a regular hexagon in the plane \mathbb{R}^2, we get three equiangular lines.

At most how many equiangular lines can there be in \mathbb{R}^n?

This question was addressed by Gerzon (as reported by Lemmens and Seidel) who proved that there are at most n+1 equiangular lines in \mathbb{R}^n. Thus in particular, the regular hexagon example gives us the maximum possible equiangular lines in \mathbb{R}^2. But in general this bound is not sharp. In 1976, Koornwinder gave an “almost trivial proof” of Gerzon’s bound by giving a bijective correspondence between the set of equiangular lines in \mathbb{R}^n and a linearly independent set of polynomials lying in an {n+1 \choose 2} dimensional vector  space. This correspondence is as follows. Let L_1, \dots , L_k be k equiangular lines in \mathbb{R}^n and let u_1, \dots, u_k be unit vectors on these lines, chosen arbitrarily. Then we have \langle u_i,u_j \rangle^2 = \alpha for i \neq j where \alpha is a fixed real number in the interval [0,1). For i \in \{1, \dots, k\} define f_i \in \mathbb{R}[t_1, \dots, t_n] by f_i(t_1, \dots, t_n) = \langle u_i, (t_1, \dots, t_n) \rangle^2 - \alpha^2 (t_1^2 + \dots + t_n^2). Now since f_i(u_j) = 0 for all i \neq j and f_i(u_i) = 1 -\alpha^2 \neq 0, it is easy to see that f_1, \dots, f_k are linearly independent polynomials in the vector space \mathbb{R}[t_1, \dots, t_n]. As these polynomials lie in the {n + 1 \choose 2} dimensional subspace spanned by the monomial set \{t_i t_j : 1 \leq i < j \leq k\}, we get k \leq \binom{n + 1}{2}.

The argument above can also be seen as an example of the linear algebra method in combinatorics, which has been discussed in much detail in the influential (unfinished) manuscript of Babai and Frankl, and more recently in the beautiful book by Matoušek.

Another important way of using polynomials is to capture the combinatorial object via zeros of polynomials (or in general, algebraic varieties). One of the earliest examples here is the determination of the minimum size of an affine blocking set by Jamison in 1977. The problem is to find the minimum number of points required to “block” every hyperplane of the affine space \mathbb{F}_q^n. Clearly n(q - 1) + 1 points suffice (by taking all points that lie on one of the n axes), but can we do better? Jamison proved that we cannot, and his polynomial method proof can be sketched as follows: (a) first consider the dual problem, which is equivalent to finding the minimum number of hyperplanes required to cover all points of \mathbb{F}_q^n except the origin, (b) then identify \mathbb{F}_q^n with the finite field \mathbb{F}_{q^n}, (c) finally associate each hyperplane with the minimal polynomial over \mathbb{F}_q^n that vanishes on the hyperplane to show (using the theory of linearized polynomials) that if the number of hyperplanes is less than n(q - 1) + 1, then the polynomial t^{q^n - 1} - 1 = \prod_{\alpha \in \mathbb{F}_q^\times} (t - \alpha) does not divide the product of these polynomials corresponding to the hyperplanes. This technique of using polynomials over finite fields to solve finite geometrical problems came to be known as the “Jamison method” and it saw several applications (see for example this and this).

Brouwer and Schrijver gave another proof of Jamison’s theorem where they also started by considering the dual problem of hyperplane coverings but then proceeded by a much simpler argument involving multivariate polynomials over finite fields. Their approach was in fact quite similar to Chevalley’s proof of the famous Chevalley-Warning theorem  using reduced polynomials. We will see in Chapters 8 and 9 how both of these results are linked together by the notion of grid reduction, and in particular by the Lemma that a polynomial \mathbb{F}_q[t_1, \dots,t_n] which vanishes on all points of \mathbb{F}_q^n except one must have degree at least n(q-1). The Chevalley-Warning theorem, which is a statement on the zero set of a collection of polynomials over a finite field, has also found several applications in combinatorics. Alon, Friedland and Kalai used it to prove that every 4- regular graph plus an edge contains a 3-regular subgraph. Later, Bailey and Richter used the Chevalley-Warning theorem to give a new proof of the famous Erdős-Ginzburg- Ziv theorem from additive number theory. Recently, Clark, Forrow and Schmitt have shown that the Chevalley-Warning theorem and its combinatorial applications can be derived, and further generalized, using a result of Alon and Füredi from 1993. We will devote Chapter 9 to this Alon-Füredi Theorem, where we generalize the result and give a new simple proof. We also show how this result is linked to several other topics like coding theory, finite geometry and polynomial identity testing.

An important tool in the polynomial method involving zeros of polynomials is a result called Combinatorial Nullstellensatz. This powerful tool and its generalizations have been used extensively to solve several problems in additive number theory (see Chapter 9 of this for a survey) and more recently in some other areas as well. In 2014, Clark revisited Alon’s Combinatorial Nullstellensatz and showed how its proof can be seen as a “restricted variable analogue” of Chevalley’s proof of the Chevalley-Warning Theorem. He further generalized this result to commutative rings (adding certain extra conditions) and made it clear how many of the ideas involved are related to the notion of grid reduction. Ball and Serra introduced a related result which they called Punctured Combinatorial Nullstellensatz. This result was proved using Alon’s Combinatorial Nullstellensatz, and it has several combinatorial applications of its own. We will give another proof of this result in Chapter 8 by directly using the notion of grid reduction, and then use this result to prove a new generalization of the Chevalley-Warning theorem which we call the Punctured Chevalley-Warning Theorem. In fact, this result generalizes Brink’s Restricted Variable Chevalley-Warning theorem.

In recent years, there has been a lot of interest in the polynomial method as a result of Dvir’s two-page proof of the finite field Kakeya problem which involved an easy polynomial argument, and the developments which followed. Many experts worked on the finite field Kakeya problem using different techniques involving algebraic geometry and Fourier analysis, but made only partial progress towards a solution to this problem. And thus it was a great surprise to the mathematical community that such an easy polynomial argument could resolve this famous open problem. Even more recently, the notorious cap set problem has been solved using the polynomial method by Ellenberg and Gijswijt. Ideas originating from Dvir’s work have lead to several important advancements in mathematics, including the big breakthrough in the famous Erdős distinct distances problem by Guth and Katz. It is interesting to note that Dvir’s polynomial technique is a bit different from the techniques we have mentioned so far in this introduction as it involved polynomial interpolation instead of constructing explicit polynomials. For more details on this, we recommend the surveys by Dvir and Tao, and the recent book by Guth. Another example where a combinatorial problem is solved using polynomial interpolation, combined with a geometric argument, is Segre’s classical theorem on ovals in finite projective planes. Interestingly, the so-called “lemma of tangents” of Segre was used in combination with the Jamison/Brouwer-Schrijver bound on affine blocking sets by Blokhuis and Korchmáros to solve the Kakeya problem in 2 dimensions. Segre’s result (and his lemma of tangents) has been generalized further to higher dimensional finite projective spaces by Ball. For more on polynomial method in finite geometry, see the survey by Ball.

Posted in Combinatorics, Incidence Geometry, Polynomial Method | Tagged , , , , | 2 Comments