On a famous pigeonhole problem

After a short break from blogging, which involved moving from Ghent to Berlin, dealing with German bureaucracy, and learning how to make simple websites (the easiest bit), I am now back. I am working as a postdoc at the Free University of Berlin right now and a part of my job is to teach a course (mainly taking care of the exercise classes) called Extremal Combinatorics with Tibor Szabó . While preparing a review sheet of exercises for this course, we discussed about which pigeonhole problem to include. In the actual sheet we ended up giving a simple one, but we did think about including the following famous problem:

Prove that if you pick any n + 1 numbers from 1, 2, \dots, 2n, there will be two distinct numbers a and b such that a divides b.

This used to be one of the favourite problems of Paul Erdős for young (mathematically inclined) kids. Quoting Proofs from THE BOOK, “As Erdős told us, he put this question to young Lajos Pośa during dinner, and when the meal was over, Lajos had the answer. It has remained Erdős’ famous initiation questions to mathematics”. Perhaps you should try it over your next dinner, before reading further.

So here is the standard pigeonhole argument for this. Every positive integer a can be written as a = 2^k(2m - 1), with k \geq 0 and m \geq 1 by considering the largest power of 2 in the factorisation of a. Now write each of the n + 1 numbers you have chosen in this form. Since there are precisely n odd integers in the set 1, \dots, 2n and hence precisely n possibilities for the 2m - 1 factor, by the pigeonhole principle we must have two integers a, b with a = 2^{k_1}(2m - 1) and b = 2^{k_2}(2m - 1), for some k_1 < k_2 and m \in \{1, \dots, n\}. Thus, a divides b.

Looking at the proof carefully, what we have really done is partition the set [2n] = \{1, \dots, 2n\} into n parts S_1, \dots, S_n, with S_i = \{2^k(2i - 1) : k \geq 0\} \cap [2n], such that in each S_i every number divides all the other numbers which are bigger. Then since we have chosen n + 1 elements from [2n] = S_1 \sqcup \dots \sqcup S_n, there must be an index i for which S_i contains two of these n + 1 elements.

Now, as a mathematician, there are two natural questions that one can ask:

(Q1) What is the largest possible size of a subset of [2n] which does not contain two elements with one dividing the other?

(Q2) Can we classify, or say something interesting about, all such largest possible subsets?

(Q1) is straightforward to answer, since we quickly see that the n-element subset \{n + 1, n + 2, \dots, 2n\} does not have any two distinct elements in it with one dividing the other. And since we have shown that any (n+1)-element subset will violate this property, n is the largest possible size.

(Q2) is what we will be really interested in.

This is such a natural question for a combinatorialist, but somehow I have been unable to find any reference for it (which Tibor found quite surprising as well). Now the first natural guess for an answer is that the example \{n + 1, n + 2, \dots, 2n\} is the unique n-element subset of [2n] with this “non-divisibility” property. It is natural because this is what happens in typical extremal combinatorics problems. But if you think about it for a while, you’ll be able to come up with the set \{n, n + 1, n + 2, \dots, 2n - 1\}, which is another such example. And then \{n - 1, n, n + 1, \dots, 2n - 3, 2n - 1\} is another one. We can keep doing this for a certain k = cn number of steps, where c is a constant (figure out what c is!). This shows that there are at least a linear number of extremal examples in this problem. Moreover, you can prove that any such extremal set must contain all the odd numbers from \{n + 1, n + 2, \dots, 2n\}. But are there any more such sets?

After some more thought, I was able to show that there are in fact exponentially many such extremal sets! More precisely, there exists a constant c > 1 such that there are at least c^n n-element subsets of [2n] in which there are no two elements with one dividing the other. I will leave the proof of this as an exercise, since it did not take me too long to come up with it and in fact, one of the students in the exercise classes that I teach was able to come up with the same proof as mine pretty quickly.

Now the questions that I want to ask are the following:

What is the best possible value of the base c? In other words, asymptotically how many such extremal subsets can we find? Is there a “substantially” better upper bound on the number of such sets than \binom{3n/2}{n/2} \sim 2^{1.38 n}?

If you have an answer, or some ideas, then please let me know in the comments.

Edit 06/11/2017: I computed the number of such extremal subsets for small values of n, after which a quick search on The On-Line Encyclopedia of Integer Sequences revealed that people have computed and studied this number.  In particular, Robert Israel mentions the same lower bound that I obtained.

Edit 22/05/2018: My question has been completely answered independently by Liu, Pach and Palincza in “The Number of Maximum Primitive Sets of Integers“, Edit 29/11/2018: and Nathan Mcnew in “Counting primitive subsets and other statistics of the divisor graph of {1, …, n}”

Posted in Combinatorics, Extremal Combinatorics | Tagged , , , , | 5 Comments

What I have learned in finite geometry

On September 2nd, 2014 I wrote a blog post titled learning finite geometry, in which I described how much I have learned in my first year of PhD and more importantly, the topics that I wish to learn while I am in Ghent. Since I’ll be leaving Ghent in a few days, I feel like this is the right time reflect on that old post. I am quite happy to see that beyond just learning those topics, I have also been able to do some research on them. Here are the topics that I described in my previous blog post:

1. Blocking sets
Two of my papers have results on blocking sets. In the first one, I used the Alon-Furedi theorem to prove old and new results on partial covers and blocking sets with respect to hyperplanes in finite Desarguesian affine/projective spaces. I have given a new common framework for treating some of the problems on blocking sets in an easy way using the polynomial method. See section 6 of my paper for the details.

In my second paper, I have proved a generalisation of a classical result in finite geometry from the 1970s due to Aiden A. Bruen and Jeff Thas. Moreover, I have done so with a new method involving the expander mixing lemma which has lead to a unified (and simple) approach to problems related to blocking sets, Nikodym sets and tangency sets. This method is quite interesting in its own right as it has also lead to some advancements in the cage problem.

2. Unitals
These objects appear in my second paper on blocking sets, as they were used by Sam to give a construction of large minimal multiple blocking sets in finite projective planes. In fact, as Sam has described in his blog post, an open problem on unitals was one of my motivations to look at these blocking sets. Sadly, we haven’t been able to solve that open problem yet.

3. Polar Spaces
After the end of my first year, I had seen polar spaces appear in almost every other research paper and talk in finite geometry. Even though I had read and understood the definitions of these objects in my first year, I wasn’t really comfortable with them. I am not sure when exactly that changed, but it was certainly quite gradual. In my paper with John and Gordon, on regular induced subgraphs of generalized polygons, I have used some of the basic theory of polar spaces to give new constructions of small k-regular graphs of girth g with g = 8, 12. Before that I also refereed a paper which helped me revisit some of the basics of the axiomatic theory of polar spaces. Overall, I can say that I have made pretty good progress towards understanding polar spaces. I now wish that more mathematicians would learn the basic theory of these beautiful geometrical objects, especially because people sometimes obtain results on specific classes of finite polar spaces without realising it (see this for example). The new book “Finite Geometry and Combinatorial Applications” by Simeon Ball can be a useful resource for anyone interested in doing so.

Besides these topics, I have also learned about several combinatorial problems in which finite geometry plays an important part. For example, the cage problem, forbidden subgraphs, and ramsey numbers. During my postdoc years, I will work on finding more such combinatorial applications of finite geometry, and using various tools from combinatorics to solve problems in finite geometry.

Posted in Combinatorics, Finite Geometry, Incidence Geometry, Research Diary | Tagged , , , , , , | Leave a comment

The cage problem and generalized polygons (part 1)

This post is a continuation of my previous post on the cage problem. Just to recall the main problem, for any given integers k \geq 2 and g \geq 3, we want to find the least number of vertices in a simple undirected graph which is k-regular and has girth g. The least number is denoted by c(k, g) and the (k, g)-graphs with c(k, g) vertices are called cages. As we saw before, besides the generalized polygons of order q, for some prime power q, there are no know infinite families of cages (for a list of all known cages, see the survey by Excoo and Jajcay. ). And by the result of Feit and Higman on generalized polygons, we see that these infinite families have g \in \{6, 8, 12\}. It turns out that we know a lot more about c(k, g) for g \in \{6, 8, 12\} than the general case, even when there is no known generalized polygon of order k - 1, i.e., when k - 1 is not a prime power.

Let’s start with an example. For every prime power q, we know that c(q + 1, 6) = 2(q^2 + q + 1) by looking at the incidence graph of a finite projective plane (i.e., a generalized 3-gon) \pi of order q. Now pick a point X in \pi and a line \ell through X. Then observe that through each point Y \neq X, there are exactly q lines which do not contain X: all the lines through Y except the line XY. Dually, every line m \neq \ell contains exactly q points which do not lie on \ell: all points on m except the intersection point of \ell and m. What this shows is that if we take the subgraph of the incidence graph of \pi induced on the q^2 points not contained in \ell and the q^2 lines not containing X, then it is a q-regular graph. Moreover, this graph has girth 6, which leave as an exercise to the reader. So, we have just constructed a q-regular graph of girth 6 with 2q^2 vertices, proving that \displaystyle c(q, 6) \leq 2q^2 for all prime power q.

In fact, we can do slightly better if we take X \not\in \ell. We then take all the points distinct from X which are not on \ell, and all the lines distinct from \ell which do not contain X. This gives us a q-regular subgraph on 2(q^2 - 1) vertices, proving c(q, 6) \leq 2(q^2 - 1) (not a big improvement, I know). But is this the best that we can do by pursuing this idea of constructing q-regular induced subgraphs?

Interestingly, there is something much better that can be done, if q is a square. We can take the subgraph induced on the points and lines of \pi which are not contained in a Baer subplane of \pi. In the Desarguesian plane \mathrm{PG}(2,q), we can get a Baer subplane by restricting the co-ordinates of the points and lines to the subfield \mathbb{F}_{\sqrt{q}} (similarly the real projective plane \mathrm{PG}(2, \mathbb{R}) is a Baer subplane of the complex projective plane \mathrm{PG}(2, \mathbb{C})). While every subfield of \mathbb{F}_q gives rise to a subplane of \mathrm{PG}(2,q), the Baer subplane in particular has the following nice properties:

  • through every point outside the subplane there is a unique line of the subplane;
  • every line outside the subplane contains a unique point of the subplane;

In other words, the induced subgraph that we get on the points and lines not contained in the Baer subplane is q-regular. This construction gives us c(q, 6) \leq 2(q^2 + q + 1 - q - \sqrt{q} - 1) = 2(q^2 - \sqrt{q}), which is much better than the previous bound of 2(q^2 - 1). But, this construction only works when q is a square.

So now the natural question arises (as it always should!), is this the best that we can do? Interestingly, now the answer is yes. We have actually reached a theoretical lower bound on the size of a q-regular induced subgraph in a projective plane of order q. This is a consequence of the following result that tells us how small a k-regular induced subgraph of a d-regular bipartite graph be.

Theorem 1. Let G = (L, R, E) be a d-regular bipartite graph and let \lambda be its second largest eigenvalue. Let H be a k-regular induced subgraph of G. Then |V(H)| \geq (k - \lambda)|V(G)|/(d + \lambda).

Theorem 1 can be proved easily using the expander mixing lemma, which I discussed in a previous post. It is well known, and easy to show, that the second largest eigenvalue of the incidence graph of a generalized n-gon of order q is \sqrt{q}, \sqrt{2q} and \sqrt{3q} for n = 3, 4 and 6, respectively. The number of vertices in the incidence graphs of a generalized n-gon is q^2 + q + 1, (q + 1)(q^2 + 1) and (q + 1)(q^4 + q^2 + 1) for n = 3, 4 and 6 respectively. If we now substitute these values in Theorem 1, and note the factorisations q^2 + q + 1 = (q + \sqrt{q} + 1)(q - \sqrt{q} + 1), q^2 + 1 = (q + \sqrt{2q} + 1)(q - \sqrt{2q} + 1) and q^4 + q^2 + 1 = (q^2 + q + 1)(q + \sqrt{3q} + 1)(q - \sqrt{3q} + 1) (thanks Sam!), we get the following result.

Theorem 2. Let G be the incidence graph of a generalized n-gon of order q, and let H be a (q + 1 - t)-regular induced subgraph, for some t \geq 1. Then

\displaystyle|V(G)| - |V(H)| \geq \begin{cases} 2t(q + \sqrt{q} + 1), & \text{if } n = 3;\\ 2t(q + 1)(q + \sqrt{2q} + 1), & \text{if } n = 4; \\ 2t(q+1)(q^2 + q + 1)(q + \sqrt{3q} + 1), & \text{if } n = 6. \end{cases}

In particular, for n = 3 and t = 1, we get 2(q^2 + q + 1) - |V(H)| \geq 2(q + \sqrt{q} + 1), i.e., V(H) \leq 2(q^2 - \sqrt{q}). Therefore, the construction using the Baer subplane is the best possible. In fact, for n = 3 and arbitrary t, we have a construction using t disjoint Baer subplanes (see Construction 3.7 here) which is sharp according to this new bound. So, the problem of looking at regular induced subgraphs is settled for projective planes. For generalized quadrangles and hexagons though, there is a lot more to be said. In collaboration with John Bamberg and Gordon Royle, I have obtained some new finite geometrical constructions which improve the bounds on c(q, 8) and c(q, 12). We will discuss that in the next post.

Posted in Combinatorics, Extremal Combinatorics, Finite Geometry, Incidence Geometry, Spectral Graph Theory | Tagged , , , , | Leave a comment

The coefficient formula and Chevalley-Warning

This post is about a certain result on coefficients of a multivariate polynomial obtained by Schauz, Lason and Karasev-Petrov, which generalises Alon’s combinatorial nullstellensatz (or to be more precise, Theorem 1.2 in Alon’s paper). This sharpening of Alon’s result has seen some very interesting applications, for example a short proof of dyson conjecture and its q-analog. Now there is a new application by Pete Clark who has used it to obtain a simultaneous generalization of Chevalley-Warning and Morlaye’s theorem, building up on the work of Baoulina.  We will see how this result is proved.

If f is a single variable polynomial of degree d over a field K, then the values of f over any arbitrary set S of d + 1 elements from K completely determines f. This is basically Lagrange’s polynomial interpolation. One way of seeing it is to define the polynomials \delta_x(t) = \frac{\varphi(t)}{\varphi'(t)(t - x)} for each x \in S where \varphi(t) = \prod_{s \in S}(t - s) and note that \delta_x(s) = 1 if x = s and 0 otherwise. Then for each s \in S we have f(s) = \sum_{x \in S} f(x) \delta_x(s). And since no two degree d polynomials can agree in more than d points, we deduces that the polynomial f must be equal to the polynomial \sum_{x \in S} f(x) \delta_x = \sum_{x \in S} f(x) \prod_{s \in S \setminus \{x\}}\frac{t - s}{x - s}. The situation becomes quite different when we look at polynomials in n variables, but in fact we can still say something quite interesting using the same ideas.

Theorem 1. (Coefficient Formula) Let f \in K[t_1, \dots, t_n] be a polynomial of degree at most a_1 + \cdots + a_n for some positive integers a_i. Let A_1, \dots, A_n be finite subsets of K with |A_i| = a_i + 1. Then the coefficient of t_1^{a_1}t_2^{a_2} \cdots t_n^{a_n} in f is equal to

 \displaystyle \sum_{(x_1, \dots, x_n) \in A_1 \times \cdots \times A_n} \frac{f(x_1, \dots, x_n)}{\varphi_1'(x_i)\cdots\varphi_n'(x_n)}

where \varphi_i(t_i) = \prod_{x \in A_i}(t_i - x).

Unlike the single variable case, the Coefficient formula does not uniquely determine the full polynomial but just some coefficients of the polynomial. This looks quite weak, but surprisingly it has some really interesting consequences. For a proof of Theorem 1 look at these notes (Theorem 13) or any of the original papers.

Let’s first see how this a generalization of Alon’s combinatorial nullstellensatz.

Theorem 2. (Combinatorial Nullstellensatz) Let f \in K[t_1, \dots, t_n] and suppose that the coefficient of the monomial \prod_{i = 1}^n t_i^{d_i} in f is non-zero where d_1, \dots, d_n are some positive satisfying \deg f = d_1 + \cdots + d_n. Then for any finite subsets A_1, \dots, A_n of K satisfying |A_i| > d_i for all i, there exists (a_1, \dots, a_n) \in A_1 \times \cdots \times A_n such that f(a_1, \dots, a_n) \neq 0.

Proof: We have \deg f \leq \sum (|A_i| - 1). Then by Theorem 1, we get

\displaystyle \sum_{(x_1, \dots, x_n) \in A_1 \times \cdots \times A_n} \frac{f(x_1, \dots, x_n)}{\varphi_1'(x_1) \cdots \varphi_n'(x_n)} \neq 0,

which implies that there exists an (a_1, \dots, a_n) \in A_1 \times \cdots \times A_n for which f(a_1, \dots, a_n) \neq 0. \blacksquare

Therefore, Theorem 2 is a direct consequence of Theorem 1. Let’s now see how Theorem 1 can sometimes give more general results than Theorem 2.

One of the first applications of combinatorial nullstellensatz that Alon gave in his paper was to prove that the classical Chevalley-Warning theorem follows directly from his result. Actually, he only proved the weaker form of Chevalley-Warning (also known as Chevalley’s theorem), which is as follows: let f_1, \dots, f_r \in \mathbb{F}_q[t_1, \dots, t_n] such that \sum \deg f_j < n, then for Z = \{x \in \mathbb{F}_q^n : f_1(x) = \cdots = f_r(x) = 0\} we have |Z| \neq 1. The stronger form says that the characteristic p of the finite field \mathbb{F}_q divides |Z|. This can be proved using Theorem 1 as follows.

Define f := \prod_{j = 1}^r (1 - f_i^{q - 1}). Then \deg f = (q - 1) \sum \deg f_j, and f(x) = 1 for all x \in Z and 0 otherwise. Since \deg f < (q - 1) + (q - 1) + \cdots + (q - 1), the coefficient of t_1^{q-1} \cdots t_n^{q - 1} in f is equal to 0, and we can apply Theorem 1 to get

\displaystyle 0 = \sum_{(x_1, \dots, x_n) \in \mathbb{F}_q^n} \frac{f(x_1, \dots, x_n)}{\varphi'_1(x_1)\cdots\varphi_n'(x_n)} = \sum_{(x_1, \dots, x_n) \in Z} \frac{1}{\varphi'_1(x_1)\cdots\varphi_n'(x_n)},

where \varphi_i(t_i) = \prod_{\lambda \in \mathbb{F}_q} (t_i - \lambda). We will now use Lemma 3 below, which is a nice result on finite fields that will be useful in another context as well. In particular that result will show that \prod_{\mu \in \mathbb{F}_q, \mu \neq \lambda} (\lambda - \mu) = -1 for every \lambda \in \mathbb{F}_q. Therefore, our equation simplifies to 0 = \sum_{x \in Z} (-1)^n = |Z|(-1)^n. This means that the characteristic p of the field \mathbb{F}_q must divide |Z|, which is the stronger Chevalley-Warning theorem. \blacksquare

Lemma 3. Let d be a divisor of q - 1 and let X = \{x^d : x \in \mathbb{F}_q\}. Define \varphi(t) = \prod_{x \in X} (t - x). Then:

  1. \varphi'(0) = -1.
  2. For all x \in X, we have \varphi'(x) = -1/d.

Proof. Note that \varphi'(t) = \sum_{x \in X} \prod_{y \in X \setminus \{x\}} (t - y).

  1. We have \varphi'(0) = \prod_{x \in X \setminus \{0\}} (-x) = (-1)^{(q-1)/d} \prod_{x \in X \setminus \{0\}} x. Let \theta be a primitive element of \mathbb{F}_q, then we have X = \{0\} \cup \{\theta^{di} : 1 \leq i \leq (q - 1)/d\}. Therefore,
    \displaystyle \prod_{x \in X \setminus \{0\}} x = \theta^{\frac{(q - 1)}{2}((q - 1)/d + 1)}= (-1)(-1)^{(q - 1)/d}
    where the las equality follows from the fact that \theta^{(q-1)/2} = -1.
  2. Let x \in X \setminus \{0\}. Then
    \displaystyle \varphi'(x) = \prod_{y \in X \setminus \{x\}} (x - y)  = x \prod_{y \in X \setminus \{0, x\}} (x - y) = x \cdot x^{(q - 1)/d - 1} \prod_{y \in X \setminus \{0, x\}} (1 - y/x) = 1 \cdot \prod_{\lambda \in X \setminus \{0, 1\}} (1 - \lambda) = \varphi'(1).
    Since the elements of X \setminus \{0\} are all the \frac{q - 1}{d}th roots of unity, we have \prod_{\lambda \in X \setminus \{0,1\}} (t - \lambda) = (t^{(q - 1)/d} - 1)/(t - 1) = 1 + t + t^2 + \cdots + t^{(q - 1)/d - 1}. Therefore, \varphi'(1) = (q - 1)/d = -1/d. \blacksquare.

We now have all the ingredients for the new result of Clark, which is a generalization of Chevalley-Warning theorem obtained by restricting the variables to power residues. More precisely, we have the following:

Theorem 3. (Clark 2017) For 1 \leq i \leq n - 1 let m_i be a positive integer and define d_i = \gcd(m_i, q - 1). Let f_1, \dots, f_r \in \mathbb{F}_q[t_1, \dots t_n] be such that

\displaystyle \sum_{j = 1}^r \deg f_j < \sum_{i = 1}^n \frac{1}{d_i}.

Then the characteristic p of \mathbb{F}_q divides the cardinality of the set Z = \{(x_1, \dots, x_n) \in \mathbb{F}_q^n : f_1(x_1^{m_1}, \dots, x_n^{m_n}) = \cdots = f_r(x_1^{m_1}, \dots, x_n^{m_n}) = 0\}.

Proof. Apply all the tools so far carefully, or see Pete’s paper.

When all m_i‘s are equal to 1, Theorem 3 is just the Chevalley-Warning theorem. When r = 1 and \deg f_1 = 1, then this is a result due to Morlaye. Morlaye’s result has been generalized by Wan to prove that if \sum 1/d_i > b \geq 1 = \deg f_1, then q^b divides |Z|.

Question. Is there such a generalization of Theorem 3, or some other special case of it?


Posted in Number Theory, Polynomial Method | Tagged , , , , , , | 2 Comments

The Cage Problem

I recently finished my research visit to UWA where I worked with John Bamberg and Gordon Royle on some finite geometrical problems related to cages. So this seems like the right time for me to write a blog post about these graphs.

A (k,g) graph is a simple undirected graph  which is k-regular and has girth g, or in other words, every vertex in the graph is adjacent to exactly k other vertices and the length of the shortest cycle in the graph is g. For example, the following is a (3, 5) graph, which is the well-known Petersen graph.


Say G is a (k, 5) graph. Pick a vertex v of G. Then there are k vertices of G adjacent to v. None of these k vertices are adjacent among each other since that would give us a triangle in G, which contradicts the fact that the girth of G is five. Each of these k vertices is adjacent to k -1 more vertices besides v, all of which must lie at distance 2 from v. Moreover, these k(k-1) vertices are distinct since otherwise we will get a cycle of length 4 in G. Therefore, we see that G has at least 1 + k + k(k-1) vertices. Note that for k = 3 this number is equal to 10, and hence the Petersen graph above is the smallest (3,5) graph.

The argument above generalises to show that for odd g, the number of vertices in a (k, g) graph is at least 1 + k + k(k-1) + \cdots + k(k-1)^{(g-3)/2}. When g is even we can count the number of vertices at distance at most (g-2)/2 from a fixed edge, which shows that the number of vertices is at least 2(1 + (k-1) + \cdots + (k-1)^{(g-2)/2}). These lower bounds on the number of vertices in a (k,g) graph are collectively known as the Moore bound and the (k, g) graphs which have these many vertices are known as Moore graphsThe Petersen graph above is an example of a Moore graph. Interestingly, there are very few Moore graphs!

Using spectral methods, it has been shown that the Moore graphs can only exist in the following cases: (a) k = 2 and g \geq 3 (cycles), (b) g = 3 and k \geq 2 (complete graphs), (c) g = 4 and k \geq 2 (complete bipartite graphs), (d) g = 5 and k \in \{2, 3, 7, 57\}, (e) g \in \{6, 8, 12\} and there exists a generalized g-gon of order k - 1. The (3, 5) Moore graph is the Petersen graph, the (7,5) Moore graph is the Hoffman-Singleton graph and its a famous open problem in mathematics whether a (57, 5) Moore graph exists or not. Generalized n-gons are certain point line geometries that were defined by Tits in his famous paper on trialities, and they are precisely the rank 2 buildings. Their incidence graph is biregular, has diameter n, and girth 2n. Those generalized n-gons whose incidence graph is regular with degree k can only exist for n \in \{6, 8, 12\} and in each of these cases we only have constructions where k - 1 is a prime power! We are nowhere near proving that k - 1 has to be a prime power. For example, whether a projective plane (which equivalent to a generalized 3-gon) of order 12 exists or not is still a big open problem.

Since there are very special values of k and g for which a (k,g) Moore graph exists, it is natural to consider the following problem:

Find the smallest number of vertices c(k,g) in a (k,g) graph.

The (k,g) graphs which have exactly c(k, g) vertices are known as cages. It was shown by Erdős and Sachs (see the Appendix in the survey on cages *) that cages exist for every value of k \geq 2 and g \geq 3. But besides the Moore graphs, explicit examples of cages, or equivalently the exact value of c(k,g) is only known for a few values of k and g. See the database of Brouwer for a full list and description of the known cages.

Therefore, a natural problem that mathematicians have worked on is to construct small (k, g) graphs and thus improve the upper bounds on (k, g). The bound that Erdős and Sachs proved is c(k, g) \leq 4 \sum_{i = 1}^{g - 2} (k -1)^i which was improved later by Sauer to c(k,g) \leq 2(k-2)^{g-2} for g odd and c(k,g) \leq 4(k-1)^{g-3} for  g even. These upper bounds are roughly square of the Moore bound. This square was improved to 3/2 power by Lazebnik, Ustimenko and Woldar who proved that if q is the smallest odd prime power bigger than k, then c(k, g) \leq 2kq^{3g/4 - a} where a = 4, 11/4, 7/2, 13/4 for g \equiv 0, 1, 2, 3 \pmod{4}. Since we can always find an odd prime between k and 2k, this bound is roughly of the power 3/2 of the lower bound.

For specific values of g, the known upper bounds are in fact much better than these general bounds. I will talk about how these bounds are obtained for g \in \{6, 8, 12\} by looking at certain subgraphs of the known Moore graphs in a later post where I will also discuss our new results in this direction.

Further Reading

[1]  Dynamic Cage Survey by Geoffrey Excoo and Robert Jajcay.

[2] The cage problem by Michael Giudici.

[3] Balaban 10-cage | Visual Insight by John Baez.


* A simpler proof using Cayley graphs was later given by Biggs. In both of these proofs we need the fact that for all k \geq 3 and 3 \leq g_1 < g_2, we have c(k, g_1) < c(k, g_2). The only place where I have found an explicit proof of this is in the paper of Fu, Huang and Rodger.

Posted in Combinatorics, Finite Geometry | Tagged , , , , , | 2 Comments

Expander Mixing Lemma in Finite Geometry

In this post I will discuss some nice applications of the expander mixing lemma in finite incidence geometry, including a new result that I have obtained recently.

In many of the applications of the lemma in finite geometry, the graph is bipartite, and therefore let’s start by recalling the bipartite version of the expander mixing lemma that I described in my last post.

Lemma 1. Let G = (L, R, E) be a semiregular bipartite graph with degrees d_L and d_R. Let S \subseteq L and T \subseteq R such that |S| = \alpha|L| and |T| = \beta |R|. Define e(S, T) = |\{(x, y) \in S \times T \mid \{x, y\} \in E\}|. Then we have

|\frac{e(S,T)}{e(G)} - \alpha \beta| \leq \frac{\lambda_2}{\lambda_1} \sqrt{\alpha \beta(1 - \alpha)(1 - \beta)},

where \lambda_1 = \sqrt{d_Ld_R} is the largest eigenvalue of G, \lambda_2 is the second largest eigenvalue of G in absolute value, and e(G) is the total number of edges in G.

There are interesting bipartite graphs related to finite geometries, block designs and other combinatorial structures, for which we know the second largest eigenvalue of the graph exactly. For example, the incidence graph (a.k.a. Levi graph) of a finite projective plane of order n has eigenvalues n + 1, \sqrt{n}, -\sqrt{n}, -n-1 (the eigenvalues of a bipartite graph are always symmetric around the origin). And more generally, the eigenvalues of the incidence graph of a 2(v, k, \lambda) design are \pm \sqrt{rk}, \pm \sqrt{r - \lambda}, 0. Therefore, applying expander mixing lemma to these graphs can potentially give some nice results in finite geometry. Let’s start with one of the classical results in the area of finite geometry and see how it can be proved using Lemma 1.

A blocking set in a finite projective plane (or in general, any hypergraph) is a set B of points with the property that for every line \ell of the plane we have |\ell \cap B| \geq 1. One of the central problems in the area of finite geometry has been to understand the possible sizes of a minimal blocking set, where a blocking set B is minimal if no proper subset of B is also a blocking set. If we do not assume minimality, then this problem becomes trivial as you can obtain a blocking sets of any size by adding some points to a given blocking set. The simplest, and the smallest!, example of a blocking set is a line of the projective plane since every two lines intersect each other. This gives us a blocking set of size n + 1, which we a trivial blocking set. For a non-trivial minimal blocking set B, Bruen and Thas proved the following result, which is one of the main starting points for a systematic study of blocking sets:

n + \sqrt{n} + 1 \leq |B| \leq n \sqrt{n} + 1.

Here is a new proof of the upper bound which uses Lemma 1. For each point x of B there must exist a line \ell_x such that \ell_x \cap B = \{x\} since otherwise B \setminus \{x\} will also form a blocking set, contradicting the minimality of B. We can choose a single such line for each point of B to get a set \mathcal{L} of lines in the projective plane with the property that (a) |\mathcal{L}| = |B| and (b) the number of edges between the sets B and \mathcal{L} in the incidence graph is exactly |\mathcal{L}|. The number of points/lines in a projective plane of order n is n^2 + n + 1. By applying Lemma 1, taking S = B, T = \mathcal{L} and defining b := |B|, we get the following:

|b - (n + 1)b^2/(n^2 + n + 1)| \leq \sqrt{n} \sqrt{b^2(1 - b/(n^2 + n + 1))^2}.

We must have b \geq n + 1 since B is a blocking set (prove it!) and therefore b(n+1) > n^2 + n + 1, which helps us remove the mod sign in the inequality. We get

(n + 1)b/(n^2 + n + 1) - 1 \leq \sqrt{n}(1 - b/(n^2 + n + 1)),

which simplifies to

b \leq (\sqrt{n} + 1)(n^2 + n + 1)/(n + \sqrt{n} + 1).

Since n^2 + n + 1 = (n + 1 - \sqrt{n})(n + 1 + \sqrt{n}), we get

b \leq (\sqrt{n} + 1)(n - \sqrt{n} + 1) = n \sqrt{n} + 1. \Box

Question 1. Can the lower bound on non-trivial minimal blocking sets be proved using eigenvalue methods as well?

The proof above is quite flexible as it can be easily adapted to similar situations, sometimes giving us new results. One such result is about minimal multiple blocking sets that I  obtained while exploring the power of this method.

A t-fold minimal blocking in a projective plane of order n is a set B of points with the property that (a) for each line \ell, we have |\ell \cap B| \geq t, and (b) through each point x \in B, there exists a line \ell_x with the property that |\ell_x \cap B| = t. We will prove that

|B| \leq \frac{1}{2} n\sqrt{4tn - (3t + 1)(t - 1)} + \frac{1}{2} (t - 1)n + t.

Again, we define a set of lines \mathcal{L} by taking a single line \ell_x through each x \in B. But this time we have t|\mathcal{L}| \geq |B|. By getting rid of a few lines we can assume that |\mathcal{L}| = |B|/t (technically it should be |\mathcal{L}| = \lceil |B|/t \rceil, but it can be checked later that this won’t affect the proof). The number of incidences between B and \mathcal{L} is still |B| since each line is incident with exactly t points. Therefore, by applying Lemma 1 again we get

\left\lvert b - \frac{(n+1)b^2}{t(n^2 + n + 1)} \right\rvert \leq \sqrt{n\frac{b^2}{t}\left(1 - \frac{b}{(n^2 + n+ 1)}\right)\left(1 - \frac{b}{t(n^2 + n + 1)}\right)}.

This can be simplified to the following quadratic inequality:

b^2 - ((t - 1)n + 2t)b - t(n - t)(n^2 + n + 1) \leq 0.

This implies that b must lie between the two roots of the quadratic equation x^2 - ((t - 1)n + 2t)x - t(n - t)(n^2 + n + 1) = 0, which gives us the required bound. \Box

For more applications of this proof method, see my preprint “Minimal multiple blocking sets” on arXiv.

Another interesting result in finite geometry that can be obtained easily via Lemma 1 is the following Theorem of Stefaan De Winter, Jeroen Schillewaert and Jacques Verstraete on incidence free subsets. For an incidence structure \Pi = (P, L, I), let \alpha(\Pi) be the largest value of |X||Y| where X \subseteq P and Y \subseteq L such that there are no edges between X and Y in the incidence graph of \Pi. Then for P equal to the set of points in the finite projective space \mathrm{PG}(n, q) and L equal to the set of k-dimensional spaces in \mathrm{PG}(n, q), we have

\alpha(\Pi) \approx q^{(k + 2)(n - k)}.

This follows directly from Lemma 1 by observing that this incidence structure \Pi formed by taking points and k-spaces of \mathrm{PG}(n,q) is in fact a 2({n + 1 \brack 1}_q , {k + 1 \brack 1}_q, {n - 1 \brack k - 1}_q) design with r = {n \brack k}_q.

Just last week, a new application of the expander mixing lemma to a graph theoretical problem related to finite geometry has appeared on arXiv. Jared Loucks and Craig Timmons have proved that the largest triangle free induced subgraph of the polarity graph of a projective plane of order n, with respect to any polarity \theta, has at most (n^2 + n + 1)/2 + \sqrt{n}(n^2 + n + 1)/(n + 1) vertices. The polarity graph is defined by taking the points as vertices and making two points x and y adjacent if and only if x \in \theta(y). These graphs were introduced by Erdős and Renyi for the special case of orthogonal polarity, and independently by W. G. Brown. They have since been used in many problems in extremal graph theory.

I am sure that there are plenty of interesting applications of the expander mixing lemma in finite geometry and extremal graph theory waiting to be discovered. I will soon write about another new result in this direction which I proved a couple of weeks back.

Posted in Combinatorics, Finite Geometry, Incidence Geometry, Spectral Graph Theory | Tagged , , , , | 3 Comments

Incidence Bounds and Interlacing Eigenvalues

The Szemerédi–Trotter theorem is one of the central results in discrete geometry which gives us a (tight) bound on the number of incidences, i.e., the number of point-line pairs with the point lying on the line, between finite sets of points and lines in the Euclidean plane. More precisely, let P and L be finite sets of points and lines, respectively, in \mathbb{R}^2 and define I(P, L) = \{(p, \ell) \in P \times L \mid p \in \ell\}. Then we have

|I(P, L)| \leq C(|P|^{2/3}|L|^{2/3} + |P| + |L|),

for some constant C. This result can be proved in several ways (including polynomial method), all of which use some topological property of the real plane. The same result does not hold over arbitrary fields, as can be seen for example by taking all q^2 points and q^2 + q lines in \mathbb{F}_q^2, but some analogous results can be proved over  other fields. The purpose of this post is to discuss one such result and some interesting things around it.

In 2011, Vinh proved that given a set P of points and a set L of lines in \mathbb{F}_q^2, we have

|I(P, L)| \leq \frac{|P||L|}{q} + q^{1/2}\sqrt{|P||L|},        (1)

using some methods from spectral graph theory. His proof, which is done over the setting of the projective plane \mathrm{PG}(2, q), does not really require any property of these planes over finite fields besides the fact that the points and lines form a combinatorial design (which of course holds for non-Desarguesian planes as well) . This was explicitly observed by Ben Lund and Shubhangi Saraf who proved the following incidence bound on 2(v, k, \lambda) designs, which generalises the results of Vinh:

Theorem 1. Let (X, B) be a 2(v, k, \lambda) design with replication number r and number of blocks b, let P be a subset of X and let L be a subset of B. Then

||I(P, L)| - \frac{r|P||L|}{b}| \leq (r - \lambda)^{1/2} \sqrt{|P||L|}.

Since r = q + 1, b = q^2 + q and \lambda = 1 for the affine plane \mathbb{F}_q^2, we get (1) from this theorem. Moreover, since points and fixed dimensional affine (or projective) subspaces of an n-dimensional space over \mathbb{F}_q also form 2-designs, we get some incidence bounds in those cases as well. The main tool used by Vinh, Lund and Saraf is the following spectral result for bipartite graphs.

Lemma 2. Let G = (L, R, E) be a semiregular bipartite graph with degrees k_L and k_R. Let S \subseteq L and T \subseteq R such that |S| = \alpha|L| and |T| = \beta |R|. Define e(S, T) = |\{(x, y) \in S \times T \mid \{x, y\} \in E\}|. Then we have

|\frac{e(S,T)}{e(G)} - \alpha \beta| \leq \frac{\lambda_2}{\lambda_1} \sqrt{\alpha \beta(1 - \alpha)(1 - \beta)},

where \lambda_1 = \sqrt{k_Lk_R} is the largest eigenvalue of G, \lambda_2 is the second largest eigenvalue of G in absolute value, and e(G) is the total number of edges in G.

Lemma 2 is often referred to as the expander mixing lemma in math and CS literature, and it roughly says that if the second largest eigenvalue of a graph (in absolute value) is small then the graph behaves like a random graph. Though Lemma 2 is usually attributed to a paper of Alon and Chung from 1988 (see for example Section 2.4 of the survey on expander graphs by Hoory, Linial and Wigderson), it actually appeared 9 years before that in the PhD thesis of Willem Haemers (see Theorem 3.1 in this and Theorem 5.1 in this), who proved it using interlacing technique. In fact, Haemers also implicitly proved Theorem 1 by making the observation that the second largest eigenvalue of the incidence graph of a 2(v, k, \lambda) design is (r - \lambda)^{1/2}. Note that for Theorem 1 we simply need to plug in the relevant values in Lemma 2 and ignore the (1 - \alpha)(1 - \beta) term on the right hand side. I’ll sketch the proof of Haemers (see also Section 4.9 in Spectra of Graphs by Brouwer and Haemers). For direct proof of Lemma 2, see Section 3.2 of this paper.

Lemma 3. Let A be a real symmetric matrix with eigenvalues \lambda_1 \geq \dots \geq \lambda_n, and let S be an n \times m real matrix satisfying S^TS = I. Let \mu_1 \geq \dots \geq \mu_m be the eigenvalues of the (real symmetric) matrix B = S^T A S. Then we have \lambda_i \geq \mu_i \geq \lambda_{n - m + i} for all i \in \{1, \dots, m\}.

Lemma 3 can be proved using basic properties of Rayleigh quotient. Given two sequences of real numbers \lambda_1 \geq \dots \geq \lambda_n and \mu_1 \geq \dots \geq \mu_m, with m < n, we say that the second sequence interlaces the first when \lambda_i \geq \mu_i \geq \lambda_{n - m + i} for all i (sometimes the term interlace is used only in the case of m = n - 1). Therefore, Lemma 1 tells us that the eigenvalues of the matrix B interlace the eigenvalues of the matrix A. By choosing the matrix S appropriately, Lemma 3 has the following corollary for arbitrary graphs.

Corollary 4. Let G be a graph on n vertices and let X_1, \dots, X_m be a partition of its vertices. Define the m \times m quotient matrix B = (b_{ij}) of the adjacency matrix A of G with respect to the given partition, by taking b_{ij} to be average row sum of the block of A determined by X_i and X_j, i.e., the average number of edges from X_i to X_j. Then the eigenvalues of B interlace the eigenvalues of A.

Now to prove Lemma 2, we will use the partition X_1 = S, X_2 = L \setminus S, X_3 = T, X_4 = R \setminus T of the bipartite graph G. The quotient matrix that we get with respect to this partition is

B = \begin{bmatrix} 0 & 0 & \frac{e(S,T)}{|S|} & k_L - \frac{e(S,T)}{|S|} \\ 0 & 0 & \frac{k_R|T| - e(S,T)}{|L| - |S|} & k_L -\frac{k_R|T| - e(S,T)}{|L| - |S|}  \\\frac{e(S,T)}{|T|} &k_R - \frac{e(S,T)}{|T|} & 0 & 0 \\\frac{k_L|S| - e(S,T)}{|R| - |T|} &k_R -\frac{k_L|T| - e(S,T)}{|R| - |T|}   & 0 & 0 \end{bmatrix}

Now let \mu_1 \geq \mu_2 \geq \mu_3 \geq \mu_4 be the eigenvalues of B, and let \lambda_1 \geq \lambda_2 \dots \geq \lambda_{n-1} \geq \lambda_n be the eigenvalues of A. Then it can be easily shown that \lambda_1 = \mu_1 = \sqrt{k_Lk_R} and \lambda_n = \mu_4 = -\sqrt{k_Lk_R}. (Note that the eigenvalues of both A and B are symmetric with respect to the the origin.) From interlacing and the fact that \mu_3, \lambda_{n-1} < 0, we get -\mu_2 \mu_3 \leq -\lambda_2 \lambda_{n-1}. Since \det(B) is equal to the product of the eigenvalues of B, we have

\det(B)/(k_Lk_R) \leq -\mu_2\mu_3 \leq -\lambda_2 \lambda_{n-1} = \lambda_2^2.

The determinant of B is not too difficult to compute due to the particular block structure of B and after some simplifications we can see that it is equal to (k_L k_R)^2 \frac{(e(S,T)/e(G) - \alpha \beta)^2}{\alpha \beta (1 - \alpha)(1 - \beta)}\blacksquare

Corollary 4 and interlacing techniques in general have several other interesting applications too, see the survey by Haemers and the book by Brouwer and Haemers. For another application of Theorem 1 to incidence problems see this paper by Stefaan De Winter, Jeroen Schillewaert and Jacques Verstraete. For a proof of Theorem 1 which does not require spectral graph theory, see this recent paper of Brendan Murphy and Giorgis Petridis. Recently, Ben Lund, Shubhangi Saraf and Charles Wolf have applied Theorem 1 to improve bounds on the finite field Nikodym problem in 3 dimensions, see this. Their work is the first one to show a gap between the lower bound on the size of a Nikodym set and the lower bound on the size of a Kakeya set in \mathbb{F}_q^3, for arbitrary q.

Edit 05/10/2016: Here is a nice introduction to interlacing techniques and spectral graph theory in general by Haemers: http://upcommons.upc.edu/handle/2099.2/2454. This in particular contains a simple proof of Cheeger’s inequality for regular graphs using interlacing, which I couldn’t find in any other place. More such videos can be found on his webpage.

Posted in Combinatorics, Finite Geometry, Incidence Geometry, Spectral Graph Theory | Tagged , , , , , , | 5 Comments

Generalized hexagons containing a subhexagon

I have recently uploaded a joint paper with Bart, “On generalized hexagons of order (3, t) and (4, t) containing a subhexagon”,on arXiv and submitted it for publication. In this work we extend the results of my first paper, which I discussed here, by proving the following:

Let q \in \{2, 3, 4\} and let S be a generalized hexagon isomorphic to the split Cayley hexagon \mathrm{H}(q) or its dual \mathrm{H}(q)^D. Then the following holds for any generalized hexagon S' that contains S as a full subgeometry: (1) S' is finite; (2) if q \in \{2, 4\} and \mathcal{H} \cong \mathrm{H}(q), then S' = S.

This result is what one might expect in general since (1) we do not know of any semi-finite generalized polygons and (2) we do know of any generalized hexagons which properly contain \mathrm{H}(q) as a full subgeometry, for q a power of a prime p \neq 3 (when q is a power of 3, \mathrm{H}(q) is isomorphic to its dual, which is always contained in a generalized hexagon of order (q, q^3)). But since we are nowhere near classifying finite generalized polygons, this problem seems to be pretty hard in general. Also, existence/non-existence of semi-finite generalized polygons is a major open problem in incidence geometry which has only been resolved for specific cases of generalized quadrangles (when every line has s \in \{3, 4, 5\} points on it). So for now we satisfy ourselves with this small contribution.

There is a nice counting based lemma in our paper which I really like and I hope that it can be used to obtain more results for generalized hexagons containing subhexagons. Let’s see what the lemma says.

Let S be a generalized hexagon of order (s, t) contained in a generalized hexagon S' as a full subgeometry (every line of S is also a full line of S'). Then using standard arguments for generalized polygons, we first show that every line of S' must have precisely s + 1 points on it. It directly follows from the axioms of a generalized hexagon that every point of S' is at distance at most 2 from S. So, we have three “types” of points in S', those contained in S, those at distance 1 from S and those at distance 2 from S. These three types of points can also be characterised by the kind of substructures of S they induce when we take all points of S at non-maximal distance from a given point of S'. We call these substructures singular, semi-singular and ovoidal hyperplanes, respectively (there is a good reason why these substructures of S are called “hyperplanes” of S; see Section 3 of the paper for exact definitions).

Now if S is a proper subgeometry of S', then it can be shown that there must be a line L of S' which does not contain any point of S. Say this line contains n_L points of the third type. Then our lemma says that for any two points x, y on L, the substructures of S induced by x and y intersect each other in precisely s + 1 - n_L points. The lemma is proved using a result of Bart (Proposition 4.7 in Polygonal Valuations), which we reprove in the paper without relying too much on the theory of polygonal valuations, followed by simple counting.

How do we use this lemma? Well, if for a given generalized hexagon S one can show that every pair of semi-singular hyperplanes in S intersect each other in more than s + 1 points, then there cannot be any generalized hexagon S' be which contains S as a proper full subgeometry as otherwise we can derive a contradiction by finding an appropriate line of S' which does not contain any points of S. This is what we check  for the case when S is isomorphic to \mathrm{H}(2) or \mathrm{H}(4) and thus obtain part (2) of our main result. Our check mostly consists of some clever computations in a computer model of these geometries as we do not really understand how semi-singular (or ovoidal) hyperplanes of split Cayley hexagons and their duals behave in general.

To prove part (1) of our main result we show that if every point of S' is at distance at most 1 from S, or equivalently, there are no points of the third type in S', then through every point of S' there are only finitely many lines. Therefore, if we show that a given generalized hexagon S does not contain any ovoidal hyperplanes a.k.a. 1-ovoids a.k.a. distance-2 ovoids, then from this result we get that every generalized hexagon containing S as a full subgeometry is finite. These 1-ovoids are simply sets of points which intersect every line in a unique point, thus in graph theoretical terms these are the exact hitting sets of the hypergraph obtained from the generalized hexagon by identifying each line by the set of points it contains. Now when S is isomorphic to \mathrm{H}(2)^D (a geometry that has 63 points and 63 lines) it is easy to show via an exhaustive computer search or via an old result of Frohardt and Johnson which classifies all hyperplanes of this geometry, that it does not contain any 1-ovoids. But in general this problem is quite hard. For \mathrm{H}(4)^D we use the non-existence result proved by me and Ferdinand in a recent paper.

The last remaining case of our main result is when S is isomorphic to H(3). The extra complication here is that H(3) is isomorphic to its dual, and hence it is contained in the dual twisted triality hexagon of order (3, 27). Moreover, it has 1-ovoids. What we do here is again use the lemma on intersection sizes, but this time we show that for every point x, every semi-singular hyperplane  with center x (again, see Section 3 of the paper for the definition) of S intersects every ovoidal hyperplane containing x in more than 3 points. From this we show that there are no type 3 points in any generalized hexagon S' which contains S has a full subgeometry.

Sadly, we have reached our computational limitations at q = 4 and it seems like for larger cases we will need more ideas and a better understanding of these three types of hyperplanes in split Cayley hexagons and their duals. If in general one can prove that every semi-singular hyperplane of \mathrm{H}(q)  (or \mathrm{H}(q)^D) through a fixed point x  intersects every ovoidal hyperplane containing x (whenever such a hyperplane exists) in more than q points, then this would imply that there is no semi-finite generalized hexagon with q + 1 points on each line, containing \mathrm{H}(q) (or \mathrm{H}(q)^D) as a subgeometry. That would be really nice.

There are some natural open problems arising from this work which we list in the last section of our paper. In particular, I would be really happy to see a proof of the following conjecture in the near future.

Every generalized hexagon containing the generalized hexagon of order (2, 1), which arises from the incidence graph of the Fano plane, is finite.

This is the smallest case where none of our techniques work.

Posted in Incidence Geometry | Tagged , , , | Leave a comment

Applications of Alon-Furedi to finite geometry

In a previous post I discussed how the Alon-Furedi theorem serves as a common generalisation of the results of Schwartz, DeMillo, Lipton and Zippel. Here I will show some nice applications of this theorem to finite geometry (reference: Section 6 of my paper with Clark, Potukuchi and Schmitt). First, let’s recall the statement of Alon-Furedi.

Theorem 1 (Alon-Furedi). Let A = A_1 \times \cdots \times A_n be a finite grid in \mathbb{F}^n where |A_i| = a_i. Let f \in \mathbb{F}[t_1, \dots, t_n] be an n-variable polynomial of degree d such that the set \mathcal{U}_A(f) = \{x \in A : f(x) \neq 0\} is nonempty (in other words, f does not vanish on the entire grid). Then |\mathcal{U}_A(f)| \geq \mathfrak{m}(a_1, \dots, a_n; \sum_{i = 1}^n a_i - d). Moreover, the bound is sharp.

Here \mathfrak{m}(a_1, \dots, a_n; k) denotes the minimum value of the product y_1 \cdots y_n where y_i‘s are integers satisfying y_1 + \cdots + y_n = k and 1 \leq  y_i \leq a_i for all i. See my post on balls in bins for some details about this function. Thus, the Alon-Furedi theorem gives us a sharp lower bound on the number of non-zeros a polynomial function of degree at most d in a finite grid, given that the polynomial does not vanish on the entire grid.

If we take \mathbb{F} to be the finite field \mathbb{F}_q, and A_i = \mathbb{F}_q for all i, then this theorem is in fact equivalent to a 1968 result of Kasami, Lin and Peterson, that determines the minimum Hamming distance of generalized Reed-Muller codes. The generalized Reed-Muller code over \mathbb{F}_q is obtained by evaluating each reduced polynomial (degree in each variable at most q - 1, see this) of degree at most d on the points of \mathbb{F}_q^n. Clearly, the minimum Hamming distance of this linear code is the minimum number of non-zeros a reduced polynomial of degree at most d can have. But that’s precisely what the Alon-Furedi theorem is about! After an easy computation one can show that if d = a(q - 1) + b where 0 < b \leq q - 1, then this minimum is equal to (q - b)q^{n - a - 1}, which is what Kasami-Lin-Peterson proved in 1968 (using more technical arguments involving BCH codes). This particular case of Alon-Furedi is what we need for most of our applications to finite geometry, and thus these results can also be seen as applications of the Kasami-Lin-Peterson theorem.

First let’s show that the original problem studied by Alon-Furedi, which also appeared as problem 6 in 2007 International Maths Olympiad, can be solved using Theorem 1. We are given a finite grid A_1 \times \cdots \times A_n  in \mathbb{F}^n and we are asked to show that the number of hyperplanes required to cover all points of the grid except one is equal to \sum (|A_i| - 1). Associate each hyperplane to the linear polynomial that defines it, and take the product of these linear polynomials. Then the set of points covered by the hyperplanes is equal to the set of zeros of this polynomial. If the degree of this polynomial, which is equal to the number of hyperplanes, is less than \sum (|A_i| - 1), then by Theorem 1 there are at least \mathfrak{m}(|A_1|, \dots, |A_n|; n + 1) \geq 2 points of the grid which are not covered, a contradiction.

Now let’s look at this problem of hyperplane covering for the finite projective and affine spaces over \mathbb{F}_q. It would be important to remember that the projective space PG(n, q) can be obtained from the affine space AG(n, q) (which is isomorphic to \mathbb{F}_q^n after fixing a point), by adding a hyperplane at infinity. Let H_1, \dots, H_k be k hyperplanes in PG(n, q) which do not cover all the points. Treating H_k as the hyperplane at infinity, we get hyperplanes H_1', \dots, H_{k-1}' in \mathbb{F}_q^n which do not cover all the points of the grid \mathbb{F}_q^n. Therefore, by Theorem 1, there are at least \mathfrak{m}(q, \dots, q; nq - k + 1) points of PG(n, q) which are missed by these hyperplanes. Such a collection of hyperplanes is known as a partial cover and the points missed are known as holes. By computing this function, we have the following corollary, which is a stronger version of a result of S. Dodunekov, L. Storme and G. Van de Voorde:

If 0 \leq a < q, then a partial cover of PG(n, q) of size q + a has at least q^{n-1} - aq^{n-2} holes.

By projective duality, we can translate our statement to sets of points in PG(n, q) which do not meet all hyperplanes. In fact, we get the following nice result for affine spaces by noticing that a subset of affine points in PG(n, q) does not meet the hyperplane at infinity:

Theorem 2. Let S be a set of k points in AG(n, q). Then there are at least \mathfrak{m}(q, \dots, q; nq - k + 1) - 1 hyperplanes of AG(n, q) which do not meet S.

A direct corollary of Theorem 2 is the famous result of Jamison/Brouwer-Schrijver on affine blocking sets (sets of points which meet all hyperplanes, i.e., contain at least one point of each hyperplane):

The minimum number of points required to block all hyperplanes in AG(n, q) is n(q - 1) + 1.

(Proof: if you take less than that many points, then by Theorem 2, there will be at least one hyperplane which does not meet the set of points)

We can obtain another interesting result on blocking sets, this time in PG(n, q). A point x of a blocking set B in PG(n, q) is called an essential point if removing x from B we have a set which does not meet all hyperplanes. Clearly, through every essential point of B, there must be a hyperplane H such that H \cap B = \{x\}. These are known as tangent hyperplanes. How many tangent hyperplanes can be there through an essential point of a blocking set?

Theorem 3. Let B be a blocking set in PG(n, q) and x an essential point of B. Then there are at least \mathfrak{m}(q, \dots, q; nq - |B| + 2) tangent hyperplanes through x.

(Proof: fix a hyperplane through x and treat it as the hyperplane at infinity; removing this hyperplane reduces the problem to Theorem 2 if we take S = B \setminus \{x\}.)

A nice direct corollary of Theorem 3 which I noticed today is a lower bound on the size of a blocking semioval that is almost the same as the bound obtained by Dover in 2000. A semioval in PG(2, q) is a set S of points with the property that for every point x in S, there is a unique line tangent to S at x (this property is satisfied by ovals but it does not characterise them). The classical examples of semiovals are obtained from polarities of projective planes. A semioval is called a blocking semioval if it is also a blocking set. There is a general lower bound of q + \sqrt{q} + 1 on the size of a blocking set in an arbitrary projective plane of order q given by Bruen, but for blocking semiovals we can obtain a much better lower bound. From Theorem 3, it follows that if |B| < 2q, then there are at least \mathfrak{m}(q, q; 3) \geq 2 tangent lines through an essential point of B. Therefore, if B is a blocking set of size less than 2q, then B cannot be a semioval.

Posted in Combinatorics, Finite Geometry, Polynomial Method | Tagged , , | Leave a comment

The Ellenberg-Gijswijt bound on cap sets

Four days back Jordan Ellenberg posted the following on his blog:

Briefly:  it seems to me that the idea of the Croot-Lev-Pach paper I posted about yesterday can indeed be used to give a new bound on the size of subsets of F_3^n with no three-term arithmetic progression! Such a set has size at most (2.756)^n. (There’s actually a closed form for the constant, I think, but I haven’t written it down yet.)

Here’s the preprint. It’s very short. I’ll post this to the arXiv in a day or two, assuming I (or you) don’t find anything wrong with it, so comment if you have comments!

Then two days later Dion Gijswijt made this comment on Ellenberg’s post:

What a nice result, congratulations!

Last week, I’ve also been writing a proof of an upper bound O(p^{cn}), c<1 for progression-free sets, in Z_p^n based on the same paper of Croot, Lev and Pach, see


So I guess, you beat me to it! Also, your constant for the case p=3 is better than mine (O(2.76^n) vs O(2.84^n)). I think the CLP-paper is a real gem in its simplicity.

This is a true breakthrough, settling an important open problem in combinatorics that survived even after a lot of effort from several mathematicians (see this, this, this and this). Ultimately, just like the finite field Kakeya problem, it succumbed to the polynomial method. I really like the following comment by Ben Green on a Facebook post: “It’s amazing how the polynomial method seems to be so orthogonal to other methods. It’s a bit like Dvir’s proof of the finite field Kakeya – in a couple of pages, it sweeps aside a large body of partial results using complicated combinatorial and analytic methods to get much weaker statements.”

Problem statement: Find the maximum possible size of a subset of \mathbb{F}_q^n which does not contain any three distinct elements in arithmetic progression.

A set with such a property is known as a cap set, a name probably inspired from the fact that the early examples of such sets in three dimension came from elliptic quadrics over finite fields, which look like hats (or caps) in real spaces. The elliptic quadrics are 3-dimensional examples of affine caps; sets of points  which do not contain any three points collinear with the same line (this condition is stronger than the condition above when q > 3 as it says that for any three distinct elements a, b, c we have c - b \neq \lambda (b -a) for all \lambda \in \mathbb{F}_q \setminus \{0, -1\}.). The oldest reference that I could find mentioning the term “cap” for such sets is this 1958 paper by Segre: On Galois Geometries. When q = 3, affine caps are related to the game SET and you can read the following brilliant paper by B. L. Davis and D. Maclagan which explains this connnection: The card game SET.

Ellenberg and Gijswijt  have proved that a cap set in \mathbb{F}_q^n has size at most a constant times the number of monomials x_1^{e_1}x_2^{e_2}\cdots x_n^{e_n} of total degree at most (q-1)n/3 that satisfy 0 \leq e_i \leq q - 1 for all i.

Understanding their beautiful proof requires only some knowledge of basic linear algebra and polynomials. I’ll try to explain the main arguments here. Ultimately, one also needs to estimate the total number of monomials that satisfy the conditions above, for which I will refer to the preprints of Ellenberg and Gijswijt, and to the comments section of the blog post by Ellenberg. One crucial thing to note about the estimates is that this bound is good only because (q-1)n/3 is strictly less than the mean (q-1)n/2.

Let q be an odd prime power and let \mathcal{P}_d(n, q) denote the set of all polynomials f in \mathbb{F}_q[x_1, \dots, x_n] that satisfy \deg f \leq d and \deg_{x_i} f \leq q - 1 for all i, aka, the set of reduced polynomials of degree at most d. The set of all reduced polynomials with no restriction on the total degree is simply denoted by \mathcal{P}(n, q). There is a vector space isomorphism between \mathcal{P}(n, q) and the space of all \mathbb{F}_q-valued functions on \mathbb{F}_q^n given by evaluating the polynomials. Let k_d denote the dimension of \mathcal{P}_d(n, q), which is equal to the total number of integral solutions of e_1 + \dots + e_n \leq d and 0 \leq e_i \leq q - 1. Clearly, k_{n(q - 1) - d} = q^n - k_d via the bijection (e_1, \dots, e_n) \mapsto (q - 1 - e_1, \dots, q - 1 - e_n).

For a 3-term arithmetic progression free subset A of \mathbb{F}_q^n, the sets A + A = \{a + a' : a, a' \in A, a \neq a'\} and 2A = \{a + a : a \in A\} are disjoint as otherwise we will have three distinct elements a, b, c in A satisfying a + c = 2b.
Let U be the subspace of \mathcal{P}(n, q) consisting of all polynomials that vanish on the complement of 2A.
Then \dim U = |2A| = |A| (since q is odd) and thus we try to find upper bounds on \dim U.
For any integer d \in \{0, 1, \dots, n(q - 1)\} let U_d be the intersection of \mathcal{P}_d(n, q) with U.
Then since the span \langle U, \mathcal{P}_d(n, q) \rangle is a subspace of  \mathcal{P}(n, q), by the dimension formula we have \dim U \leq \dim \mathcal{P}(n, q) - \dim \mathcal{P}_d(n, q) + \dim U_d, or equivalently, |A| \leq q^n - k_d + \dim U_d = k_{n(q - 1) - d} + \dim U_d.

Now the crux of the proof is Proposition 1 in Jordan’s preprint, which is essentially equivalent to Lemma 1 of Croot-Lev-Pach’s paper, and says that every (reduced) polynomial of degree at most d which vanishes on A + A has at most 2 k_{d/2} non-zeros in 2A. The proof of this is an interesting dimension argument which I will skip for now (also see this old mathoverflow question by Seva).

Since A + A is a subset of the complement of 2A, the above proposition implies that every element of U_d, when seen as an element of \mathbb{F}_q^{\mathbb{F}_q^n} via the evaluation isomorphism, has at most 2 k_{d/2} non-zero coordinates. I leave it as an exercise to show that a vector subspace V of F^n with the property that every element of V has at most k non-zero coordinates must have its dimension less than or equal to k (hint: row reduction). And thus \dim U_d \leq 2 k_{d/2}.

Therefore, from the earlier inequality on |A|, we get |A| \leq k_{n(q - 1) - d} + 2k_{d/2}.
Finally, observe that n(q - 1) - d = d/2 when d = 2(q - 1)n/3, which gives us the bound |A| \leq 3 k_{(q - 1)n/3} (whenever (q - 1)n/3 is an integer).
The reason why this is a good bound is that k_{d} is bounded above by q^{\lambda n} for some \lambda < 1 whenever d < (q - 1)n/2 (see the preprints).

Some questions:
(1) Can we obtain a better bound on affine caps since they are much more restrictive than cap sets?
(2) Do these techniques say anything about sets which do not contain any full line, i.e., the complements of affine blocking sets with respect to lines? As Ferdinand has pointed out in this comment, we have a general upper bound of the form q^n - 2q^{n - 1} + 1 there.
(3) What about sets which do not contain any k terms in arithmetic progression, for k > 3?

Further Reading:
[1] Mind Boggling: Following the work of Croot, Lev, and Pach, Jordan Ellenberg settled the cap set problem! by Gil Kalai
[2] Polymath 10 Emergency Post 5: The Erdos-Szemeredi Sunflower Conjecture is Now Proven. by Gil Kalai
[3] A symmetric formulation of the Croot-Lev-Pach-Ellenberg-Gijswijt capset bound by Terence Tao
[4] Many Zero Divisors in a Group Ring Imply Bounds on Progression–Free Subsets by Fedor Petrov
[5] Large caps in projective Galois spaces, by Jürgen Bierbrauer and Yves Edel
[6] Zero-sum problems in finite abelian groups and affine caps, by Edel et al.
[7] Sumsets and sumsets of subsets, by Jordan Ellenberg.
[8] Bounds for sum free sets in prime power cyclic groups – three ways, by David speyer.
[9] Talk by Jordan Ellenberg at Discrete Math Seminar in IAS, Princeton.

Posted in Combinatorics, Finite Geometry, Polynomial Method | Tagged , , , , , | 6 Comments