## Polymath Junior

This summer I will be one of the main mentors for a project on Ramsey numbers as a part of the Polymath REU. This is my first time being involved with this program and I am super excited about it.

If you are an undergraduate student, anywhere in the world, interested in doing some mathematical research this summer, then please consider applying! The applications deadline is 1st April, 2022. Here is the link to apply.

## A short video on Ramsey numbers

I was recently involved in making a 1 minute maths video for a contest organised by Veritasium. Here is the main requirement for the video

We are looking for videos that clearly and creatively explain complex or counterintuitive concepts in the fields of Science, Technology, Engineering, and Mathematics.

This project started with Aditya Potukuchi telling me about the contest some weeks back after which we immediately started brainstorming for possible topics. I then suggested that we include my brother Anup Bishnoi and his wife Jyoti Arora in the project as they love creating various forms of learning content. In fact, they have founded a company around this called Not A Bot Studios (go check it out!). I introduced to them what Ramsey numbers were, as Aditya and I decided that could be a suitable topic for a 1 minute video. They seemed quite amazed by this mathematical concept and especially surprised by the fact that we don’t even know what the Ramsey number R(5) is. Thus, that became the punchline of our video. We worked on this video for about a week and had a lot of fun doing it! The involvement of my partner Aparajita Nath later on made it even better. I am quite happy with the final outcome as we have been getting some great comments.

As this is the first time I have been involved with something like this, I though I’ll share my experience and learnings, with the aim of encouraging other mathematicians to do something similar.

Here are some of my key learnings from this project:

• Parsing mathematical statements is really hard.
As mathematics students we get trained in parsing statements like “Either A or B is true” and using quantifiers like “for all”, “there exists”, etc. This is not the case for most people and they can find it extremely difficult to understand such statements, especially in such a short video. We tried to make it easier by using some visual aids. For example, we showed a bunch of possible arrangements of allies/enemies (as a two-edge-colouring of the complete graph) when saying that you can always find 3 mutual allies or 3 mutual enemies. Even after that it didn’t look like it’ll be immediately clear to a first time viewer that we are talking about all possible arrangement and so we decided to make it interactive by asking the viewer to pause the video and try it out on their own.
• Get a lot of feedback!
When we made the first version of this video all of us were pretty happy with the outcome, until we tried it out with a bunch of our friends who had no idea about the topic. Many of them couldn’t follow it at all. Even after multiple watchings! The feedback we got from them made us realise that because we had already spent a lot of time working on this, we all had a good idea of what Ramsey numbers were, and so we just started focussing on making it visually appealing and fun. The explanation part took a back seat. Because of the feedback, we were able to fix this. Even though it required a lot of rewriting of the script and redoing of the animation, I am so glad that we didn’t put out that version of the video.
• Make it fun to watch, without compromising the subject matter.
No matter what audience you are aiming for, it is going to be really helpful to add fun elements that they can all enjoy. Make them laugh and make them care about what you are teaching. Memes are your friends!
One important thing to ensure though is that these fun elements don’t take away anything from the actual concept that you are trying to explain. For example, we based our script around the world of the widely popular tv series Game of Thrones (which also had a widely unpopular last season), but made sure that the specific characters and all the insider jokes are not necessary in any way to understand Ramsey numbers. A friend pointed out how he started recalling the tv show instead of focussing on the maths. We tried to fix this in the final version of the video, but I am not sure if we have still managed to remove all such distractions.

• Keep it clear and simple
Both the script and visuals should be as clear and simple as possible. Complicated mathematical terms or english words can be very confusing, especially in a short video like this. It took as many iterations to get to the final version, but I am glad that we made a conscious effort of keeping it simple.

As I said before, it was great fun to make this video that can explain Ramsey numbers to a general audience in one minute and I wish I can do more such things in future, especially with such great collaborators. Go check out the video and share it around!

## A postdoc position at TU Delft

I would like to advertise a 3 year postdoc position at my research group. Please apply if you are interested in the kind of research that Dion and I do (https://anuragbishnoi.wordpress.com/publications/). The deadline is April 15 2021. Here are some further details:

A three-year postdoctoral position is available at the Optimisation and Discrete Mathematics group (https://www.tudelft.nl/ewi/over-de-faculteit/afdelingen/applied-mathematics/optimization) of the Applied Mathematics department (DIAM) at TU Delft. The preferred starting date is September 2021, with some flexibility.

The successful candidate is expected to hold a PhD in mathematics or closely related field by the start of the appointment, and have research interests aligned with our group, e.g. extremal combinatorics, finite geometry or algebraic methods in combinatorics/optimization. The position also entails some teaching and supervision responsibilities.

Qualified candidates are invited to send the following application materials to Dion Gijswijt (D.C.Gijswijt@tudelft.nl):

– Curriculum Vitae
– Publication List
– Research Statement (1-2 pages)
– Three contacts for letters of recommendation

Informal enquiries related to the position are encouraged and may be directed to the above address.

The deadline for the application is April 15, 2021 or until the position is filled.

Dion Gijswijt and Anurag Bishnoi

## A coding theoretic application of the Alon-Füredi theorem

The Alon-Füredi theorem is something that I have written a lot about in this blog. I spent a considerable amount of time on this theorem during my PhD. In fact, it’s generalisation that I obtained and it’s applications in finite geometry (blocking sets), coding theory (Reed-Muller codes), and computer science (polynomial identity testing), constitute my most cited paper. I have recently learned another nice coding theoretic application of this result, which I would like to share in this post.

Coding theory is an important branch of mathematics with wide ranging applications, like data compression, transmission, storage. One of the main object studied in this area is a linear code. An $[n, k, d]_q$ code is a $k$-dimensional vector subspace of $\mathbb{F}_q^n$ in which the minimum Hamming weight of vectors is equal to $d$. The elements of the vector subspace are known as codewords. The parameter $n$ is known as the length of the code and $d$ as its minimum distance which is related to the number of errors that can be corrected by the code (higher the value of $d$, higher the error-correcting capacity of the code). The main problem in coding theory is to obtain the largest possible value of $k$, given $n, d, q$ and the smallest possible $n$ given $k, d, q$. Some of the fundamental bounds for these problems are the Hamming bound, the Singleton bound and the Griesmer bound. Some famous examples of codes that are optimal w.r.t. these bounds are the Hamming code, Golay code, and Reed-Solomon codes. Along with being useful in practice, these codes are also related to beautiful and deep mathematical theories. Here is a nice introduction to the subject by Venkatesan Guruswami.

The study of codes with certain extra restrictions has also developed because of some specific examples. One such restriction, is that every codeword is minimal, in the sense that if $\sigma(c) = \{i : c_i \neq 0\}$ is the support of a codeword $c$ and $c'$ is any codeword with $\sigma(c') \subseteq \sigma(c)$ then $c' = \lambda c$ for some $\lambda \in \mathbb{F}_q$. Such minimal codewords have been used in secret sharing schemes and decoding algorithms. Here is an early paper discussing some properties of minimal codewords. Naturally, one expects that minimal codes will have stronger bounds on their parameters. That’s precisely where Alon-Füredi shows up. In a recent paper, Alfarano, Borello, Neri and Ravagnani have applied this theorem and related results to prove the following:

Theorem 1. In a minimal $[n, k, d]_q$ code, $d \geq (q-1)(k - 1) + 1$.

Theorem 2. In a minimal $[n, k, d]_q$ code, $n \geq (q + 1)(k - 1)$.

Here is their short proof of Theorem 1 using the Alon-Füredi bound.

Every codeword in a minimal code is also maximal. Let $c$ be a non-zero codeword. Then $c = u G$, where $G$ is the $k \times n$ generator matrix and $u$ is a non-zero vector in $\mathbb{F}_q^k$. Let $f_i \in \mathbb{F}_q[x_1, \dots, x_k]$ be the linear polynomial given by $c_i \cdot x = 0$ where $c_i$ is the $i$-th column of $G$. Consider the polynomial $f = \prod_{i \in \sigma(c)} f_i.$ Then $f(u) \neq 0$, and in fact for any vector $v$, $f(v) \neq 0$ if and only if $\sigma(c) \subseteq \sigma(v G)$, which by the maximality of $c = uG$ is only possible if $v = \lambda u$ for some $\lambda$. Therefore, $f$ is non-zero at exactly $q - 1$ elements of $\mathbb{F}_q^k$. From the Alon-Füredi bound, we know that $f$ is non-zero at at least $(q - b)q^{k - a - 1}$ points where $a, b$ are unique non-negative integers for which $\deg f = a(q - 1) + b$, and $1 \leq b \leq q - 1$. From $q - 1 \geq (q - b)q^{k - a - 1}$, we can conclude that $k - a - 1 = 0$, and thus $\deg f = (q - 1)(k - 1) + b \geq (q - 1)(k - 1) + 1$. But note that $\deg f$ is precisely the Hamming weight of $c$, and hence the minimum distance of the code is at least $(q - 1)(k - 1) + 1$. $\square$

The proof of Theorem 2 is slightly more involved than Theorem 1, and so I refer you to the paper for that. The same proof appears in this recent paper in a geometric language. Note that one can derive lower bounds on $n$ using Singleton bound or the Griesmer bound in combination with Theorem 1, but these are weaker than the bound in Theorem 2.

Interestingly, Theorem 1 can also be proved via the main Alon-Füredi theorem, instead of the Alon-Füredi bound (which is in fact a result of Jamison and Brouwer-Schrijver when applied to $\mathbb{F}_q^n$), i.e., any polynomial $f$ that vanishes on all points except one in $\mathbb{F}_q^n$ has degree at least $n(q-1)$ (see Theorem 22 in this or Theorem 3.7 in this).

## Bilinear forms and diagonal Ramsey numbers

The recent breakthrough of Conlon and Ferber has shown us that algebraic methods can be used in combination with probabilistic methods to improve bounds on multicolour diagonal Ramsey numbers. This was already shown for the off-diagonal Ramsey numbers by Mubayi and Verstraete last year, as discussed here. Sadly, for two colours we still don’t have any improvement over the classic probabilistic bound proved by Erdős in 1947 (except for some constant factors in front of the exponential function). More concretely, we haven’t been able to prove $R(t, t) > c^t$, for any $c > \sqrt{2}$ since 1947, despite considerable effort by various mathematicians. Therefore, it might sound reasonable to think that $R(t, t)$ is equal to $2^{t/2 + o(t)}$, but the exponential improvement over the (purely ) probabilistic lower bounds on multicolour Ramsey numbers should cast some doubt on that.

In an end remark Conlon and Ferber mention that the lower bound of $2^{t/2 + o(t)}$ can be proved in a new way, using the arguments of their paper, where we have a mix of algebra and probability. This sounds like a very promising approach for further improvements as well. In this post I want to explore that, and attempt to explain why we will need some really new ideas to improve the lower bound. In the process, we will see some basic theory of bilinear forms, which is crucial to the arguments. For a more in depth exploration of the topic, see these notes of Keith Conrad and Chapter 3 of the book by Simeon Ball.

Given a vector space $V$ over a field $F$, a bilinear form is a function $b: V \times V \rightarrow F$, such that for any fixed $u, v \in V$, both $b(u, y) : V \rightarrow F$ and $b(x, v) : V \rightarrow F$ are linear functions. This is an abstraction of the dot product, which is so crucial in understanding the geometry of the real space $\mathbb{R}^n$. Inspired by that, we define a vector $u$ to be orthogonal to a vector $v$ if $b(u, v) = 0$. There is a certain lack of symmetry in the definition here, namely, $b(u, v) = 0$ doesn’t necessarily imply $b(v, u)$, or in other words, we could have $u$ orthogonal to $v$ but $v$ non-orthogonal to $u$. To fix that we introduce the notion of reflexivity and call a bilinear form reflexive if $b(u, v) = 0$ implies $b(v, u) = 0$ for all $u, v \in V$. Once have have a reflexive bilinear form we can talk about orthogonality of two vectors. From now on, $b$ will denote a reflexive bilinear form.

While this orthogonality defined in terms of reflexive bilinear forms might seem like a very general notion, it ends up being restricted to only two cases. A fundamental result, attributed to John von Neumann, states that for any vector space $V$, a reflexive bilinear form $b$ on $V$ is of one of the following two types:

1. Symmetric: $b(u, v) = b(v, u)$ for all $u, v \in V$.
2. Alternating: $b(u, u) = 0$ for all $u$.

For example, the dot product on $\mathbb{R}^n$ defined by $(x_1, \dots, x_n) \cdot (y_1, \dots, y_n) = x_1y_1 + x_2y_2 \cdots x_ny_n$ is a symmetric bilinear form, whereas the function $b(x, y) = x_1y_2 - x_2y_1 + x_3y_4 - x_4y_3 + \cdots x_{n-1} y_n - x_n y_{n - 1}$ (assuming $n$ to be even) is alternating. These examples are in fact examples of what we call non-degenerate bilinear forms, i.e., form with respect to which no vector is orthogonal to every vector of the vector space.

One interesting property of alternating forms is that they are skew symmetric. Indeed, we have $b(u + v, u + v) = 0$ for any $u,v$ and thus $b(u,v)=-b(v,u)$. If the characteristic of the underlying field $F$ is even, then this shows that every alternating form is in fact symmetric. The following lemma gives us a relation in the other direction as well.

Lemma 1: If $b$ is a symmetric bilinear form on a vector space $V$ over a field $F$ of characteristic $2$, then $V=S\oplus W$ where $\dim S \leq 1$ and the restriction of $b$ on $W$ is an alternating form.

For example, if we take $b(x, y)=x_1y_1 + \cdots x_n y_n$ over $\mathbb{F}_2^n$, then we can take $W = \{u \in \mathbb{F}_2^n : b(u,u) = 0\}$, which is equal to the hyperplane $\{u\in \mathbb{F}_2^n:\sum u_i = 0\}$, since $a^2=a$ for any $a \in \mathbb{F}_2$. The form restricted to $W$ is alternating by the definition of $W$.

One of the basic notions associated to a bilinear form is that of an isotropic subspace, which shows up in the construction of Conlon and Ferber as well, as discussed in the previous post. An isotropic subspace is a subspace $S$ such that $b(u, v) = 0$ for any two, not necessarily distinct, $u,v$ in $S$ (such a subspace is usually a totally isotropic but we’d ignore that here). For an arbitrary subset $A$ of $V$, if we denote the set $\{u\in V:b(u,v)=0\forall v\in A\}$ by $A^\perp$, then $S$ being isotropic is the same as saying $S \subseteq S^\perp$. Note that $A^\perp$ is always a vector subspace. The following is another fundamental result whose corollary is what we used in the previous post.

Lemma 2: Let $b$ be a non-degenerate reflexive bilinear form over a finite dimensional vector space $V$. Then for any subspace $S$, we have $\dim S + \dim S^\perp = \dim V$.

Corollary 3: An isotropic subspace with respect to a non-degenerate (reflexive) bilinear form over an $n$ dimensional vector space has dimension at most $n/2$.

The max dimension of an isotropic subspace is known as the Witt index of the bilinear form. It can be shown that every non-degenerate alternating form has Witt index exactly equal to $n/2$ (see Theorem 3.8 in Simeon Ball’s book). This also implies that there are no non-degenerate alternating forms over odd dimensional vector spaces.

Let us now look at the graph that plays the main role in the construction of Conlon and Ferber. Let $G$ be the orthogonality graph on the set of isotropic vectors in $\mathbb{F}_2^t$ with respect to the bilinear form $b(x, y) = x_1y_1 + \cdots x_t y_t$, with $t$ even. Then as we saw earlier, the set $S$ of isotropic vertices form a $t-1$ dimensional vector subspace on which $b$ is an alternating form. Moreover, this alternating form is degenerate because $(1, \dots, 1)$ is orthogonal to every other vector in $S$ since $t$ is even. As there is no other isotropic vector which is orthogonal to every other isotropic vector, the set vector space $S$ can be decomposed as $S = \langle (1, \dots, 1) \rangle\oplus \mathbb{F}_2^{t\text{-}2}$ such that the form induced on $\mathbb{F}_2^{t\text{-}2}$ is a non-degenerate alternating form. Therefore, we should be able to understand this graph starting from the non-degenerate alternating form. Let’s do that and start over.

Let $t$ be an even integer, and $b$ a non-degenerate alternating form on $\mathbb{F}_2^{t\text{-}2}$. Let $G$ be the orthogonality graph on these vectors, that is, $u \sim v$ if $b(u, v) = 0$. We can notice the following property.

Lemma 4: There are no independent sets of size $t$ in $G$.
Proof. Let $x_1, \dots, x_t$ be an independent set, and say $\sum a_i x_i = 0$. Then by applying $b$ on this sum and each $x_i$ we get that $(a_1, \dots, a_t)$ is in the null space of $J - I$. But $J - I$ has eigenvalues $t-1$ and $\text{-}1$, which are both non-zero (since $t$ is even), and thus $J - I$ is non-singular. Therefore, $a_i = 0$ for all $i$, and these vectors form an independent set in the vector space. This is a contradiction since the dimension of the vector space is strictly less than $t$.

While there can’t be any large independent sets, $G$ certainly has really large cliques. Any (totally) isotropic subspace of dimension $(t - 2)/2$ gives rise to a clique of size $2^{t/2 - 1}$, and in fact these are the largest possible cliques. So, trivially $G$ does not give any good lower bound on the Ramsey number $R(t, t)$. Still, we can apply the basic idea of Conlon and Ferber (which was also the main idea of Mubayi and Verstraete) and take a random induced subgraph of $G$, hoping that it can destroy all large cliques while giving us a graph of reasonable size. (Note that any induced subgraph will also be free of any independent sets of size $t$.) For that, let’s first count the total number of cliques of size $k$ in $G$.

Every clique in $G$ spans an isotropic subspace of $\mathbb{F}_2^{t-2}$, which has a certain rank $r$. The number $N_{k,r}$ of cliques of size $k$ and rank $r$ can be bounded from above by $2^{t-2} 2^{t - 3} \cdots 2^{t - 2 + r -1} 2^{(k - r)r}$, where the first $r$ terms correspond to choosing $r$ linearly independent members of the clique and then we pick the remaining $k - r$ elements from this $r$-dimensional subspace. By simplifying, we have

$N_{k, r} \leq 2^{(t+ k - 3(r+1)/2)r}$.

This number is increasing in $r$ from $r = 1$ until $r = (t + k)/3 - 1$, and then decreases till $r = 2(t + k)/3 -1$, where it finally becomes $1$ (unless $2(t + k)/3 - 1 > t - 1$, which is a trivial case that we’ll ignore). But, we know that $r \leq t/2$ from Corollary 3, and hence $\max_{r} N_{k, r} = N_{k, t/2 - 1}$ for $(t + k)/3 - 1 \geq t/2 - 1$, i.e., for $k \geq t/2$, and $\max_{r} N_{k, r} = N_{k, (t + k)/3 - 1}$ for $k \leq t/2$. While we are interested in $N_{t} = \sum_{r = 1}^{t/2 - 1} N_{t,r}$, we will soon need the estimates for smaller cliques as well.

Repeating the probabilistic argument of Conlon and Ferber, we see that if we sample each vertex independently with probability $p = n2^{-t + O(1)}$, then there exists a subset of vertices of size $n$ inducing no clique of size $t$ as long as we can ensure that $n^{t} 2^{-t^2 + o(t^2)} N_t < 1/2$. Since $N_t \leq \frac{t}{2} \max_{r} N_{t,r} = \frac{t}{2} N_{t, t/2} \leq 2^{5t^2/8 + o(t^2)}$, we can take $n = 2^{3t/8 + o(t)}$, which then gives us the bound $R(t, t) > 2^{3t/8 + o(t)}$ on the Ramsey number $R(t,t)$. This is worse than the purely probabilistic bound of $2^{t/2}$. Can we do anything better?

Here’s an idea that is extracted from the paper of Conlon and Ferber. We can look at cliques, and independent sets, of size smaller than $t$ in this graph. Perhaps for some $0 < a < 1$, we can get good lower bound on $R(at, at)$ by taking a random induced subgraph of $G$ that does not contain any cliques of independent sets of size $at$.

For that, let’s count the number $I_k$ of independent sets of size $k$ in $G$. For $k$ even, the proof of Lemma 4 tells us that any such set will also be an independent set of vectors. Therefore, $I_k$ is at most $\prod_{i = 0}^{k-1} 2^{t - i - 2}$ or $2 \prod_{i = 0}^{k-2} 2^{t - i - 2}$, depending on whether $k$ is even or odd. In any case, for $k = at$, we have $I_{at} \leq 2^{(2a - a^2)t^2/2 + o(t^2)}$.

Recall that we had $N_{at} \leq 2^{(1+a)^2 t^2/6 + o(t^2)}$ for $a \leq 1/2$ and $N_{at} \leq 2^{(4a+1)t^2/8 + o(t^2)}$ for $a \geq 1/2$. Therefore, we see that asymptotically $N_{at}$ is always bigger than $I_{at}$. Thus, the probabilistic argument from before gives us a subset of $n$ vertices on which the induced graph has no cliques of size $at$ and no independent sets of size $at$ as long as $n^{at} 2^{-at^2 + o(t^2)} (N_{at} + I_{at}) < 1/2$, and since $N_{at} \geq I_{at}$ (for $t$ large enough), it suffices to pick an $n$ for which $n^{at} 2^{-at^2 + o(t^2)} N_{at} < 1/2$. We can achieve this by taking $n = 2^{cat + o(t)}$ where $c = (4a - 1)/(8a^2)$ for $a > 1/2$ and $c = (4a - a^2 - 1)/(6a^2)$ for $a \leq 1/2$.

Sadly, both of these values of $c$ are $\leq 1/2$, with equality only at $a = 1/2$. Therefore, we are always worse than the basic probabilistic bound except for the case of $a = 1/2$ where we can show $R(t/2, t/2) > 2^{t/4 + o(t)}$. This asymptotically matches the probabilistic bound but we arrive at it via a different (and much more complicated) construction. For improvements, using this easy method, it looks like we’ll have to consider some other graphs. This new approach definitely gives me hope that one can find graphs with a better control on the total number of cliques and independent sets, which can then be used to give an exponential improvement in the lower bound on two-colour diagonal Ramsey numbers.

## Covering the binary hypercube

A finite grid is a set $S = S_1 \times \cdots \times S_n \subseteq F^n$, where $F$ is a field and each $S_i$ is a finite subset of $F$. The minimum number of hyperplanes required to cover $S$ can easily be shown to be $\min |S_i|$, with the hyperplanes defined by $x_1 - \lambda = 0$, for $\lambda \in S_j$ providing such a cover (assuming $j$ is the index minimising $|S_i|$). Interestingly, if we forbid the hyperplanes to pass through one specific point of $S$, say $\vec{a} = (a_1, \dots, a_n)$, and then want to cover everything else, then the minimum number of hyperplanes required is equal to $\sum_{i = 1}^n (|S_i| - 1)$. This number can be pretty large compared to the previous one.

This is the famous result of Alon and Füredi from 1993 which was proved in its full generality using the polynomial method. They were answering a question of Komjáth, which was for the special case of $S = \{0, 1\}^n$, but this same result had in fact been proved already for another special case in the late 70’s. Jamison, and Brouwer-Schrijver, proved this bound for $F = \mathbb{F}_q$ and $S = \mathbb{F}_q^n$. They were motivated by a question in finite geometry about the blocking number of an affine space, asked by Jean Doyen, which is equivalent to the dual statement. Their proofs were also algebraic and in fact they are one of the first instances of what we now call the polynomial method (see this for a history). In finite geometry circles, this method was sometimes even referred to as the Jamison method.

In the 90’s, Bruen proved an interesting generalisation of the Jamison/Brouwer-Schrijver result, motivated by problems in finite geometry, where the (multi)set of hyperplanes covered every point except $\vec{a} \in \mathbb{F}_q^n$ at least $k$ times, while not covering $\vec{a}$ at all. He proved the lower bound of $(n + k - 1)(q-1)$ and asked about the sharpness of this bound. The sharpness was addressed in some later papers, for example by Simeon Ball and then Corrado Zanella. In 2009, Ball and Serra generalised this covering with multiplicities result to the case of arbitrary finite grids by proving a general lower bound on the degree of polynomials that vanish everywhere on the grid with a certain multiplicity except at one point. Recently, there has been a lot of interesting activity in this hyperplane covering problem with multiplicities, for the special case of binary hypercubes. This is what I want to focus on in this blog post. I also want to mention some interesting new results over the binary field $\mathbb{F}_2$ that I have proved in collaboration with Simona, Shagnik and Tamás, which nicely complements this recent work.

Let’s first write down the main extremal functions that we are interested in. Given a field $F$ and positive integers $n, k, s$, with $s \leq k - 1$, define a $(k, s)$-cover of $\{0, 1\}^n \subseteq F^n$ as a multiset $\mathcal{H}$ of hyperplanes with the property that there are exactly $s$ element of $\mathcal{H}$ passing through the origin and for every point $\vec{p}$ in $\{0, 1\}^n \setminus \{\vec{0}\}$ there are at least $k$ element in $\mathcal{H}$ passing through $\vec{p}$. We then define:

$h(n, k, s) = \{\min |\mathcal{H}| : \mathcal{H} \text{ is a } (k, s)-\text{cover of } \{0, 1\}^n\}$;

$g(n, k) = \min_{s} h(n, k, s)$;

$f(n, k) = h(n, k, 0)$.

Note that $g(n, k)$ has the following natural reformulation: it is the smallest number of hyperplanes that cover every non-origin point at least $k$ times but only cover the origin at most $k - 1$ times. All of these function are the same for $k = 1$ and equal to $n$ by the result of Alon and Füredi. It is also an easy exercise to show that $f(n, 2) = g(n, 2) = n + 1$ for any field $F$. I have hidden the field $F$ in the notation of these functions, and so I’ll make it clear what field we are talking about before making any statement.

Last year, Clifton and Huang studied the function $f(n, k)$ when $F = \mathbb{R}$ and proved the following results:

$f(n, 3) = n + 3$,

$n + k + 1 \leq f(n, k) \leq n + \binom{k}{2}$, for all $k \geq 4$ and $n \geq 3$, and

$f(n, k) = (1 + 1/2 + \cdots + 1/n + o(1))k$ for fixed $n$ and $k \rightarrow \infty$.

Moreover, they conjectured that we should have $f(n, k) = n + \binom{k}{2}$ for every fixed $k \geq 3$ and $n$ large enough as they couldn’t find any better construction than the one where you take the $n$ hyperplanes defined by $x_i = 1$ and $k - t$ copies of hyperplanes $\sum_{i = 1}^n x_i = t$, for $1 \leq t \leq k - 1$. This seems to be a very challenging problem.

Very recently, Sauermann and Wigderson have managed to improve the Clifton-Huang lower bounds on $f$ by proving a general result about polynomials vanishing with multiplicities on binary hypercubes. In particular, they prove the following results:

1. $h(n, k, k - 1) = n + 2k - 2$ for any field $F$.
2. $g(n, k) = n + 2k - 3$ for all $n \geq 2k - 3$ if we have $\mathrm{char}~F = 0$.

Note that since $f(n, k) \geq g(n, k)$ by the definition of these function, the second statement improves the lower bound of Clifton and Huang for $n \geq 2k - 3$ (as the field $\mathbb{R}$ has characteristic $0$). They also ask about extending their results to fields of positive characteristic and give some examples over finite fields that shows how their proof fails to give the same bound on $g(n, k)$ (or $f(n, k)$. In particular, over $\mathbb{F}_2$ they show that one can’t prove that $f(n, 4) \geq n + 5$ using their method.

Last year, while I was still in Berlin, I suggested studying these problems in our group workshop and we managed to obtain some interesting preliminary results for the case of $\mathbb{F}_2$ with my colleagues. We now have a paper, which is soon going to be posted on the arXiv, where we prove the following results for $F = \mathbb{F}_2$.

Theorem 1. For $n > 2^k$, $g(n, k) = n + 2k - 3$. For $k \geq 2^{n - 2}$ , $g(n, k) = 2k - \lfloor k/2^{n-1} \rfloor$.

Our proofs don’t involve any polynomials. We simply use some combinatorial arguments where one crucial ingredient in our proof is a coding theoretic interpretation of the function $f(n, k)$ that implies the following:

Theorem 2. For all $k \geq 2$ and $n \geq 1$, $f(n, k) \geq n + \lfloor (k -1)/2 \rfloor \log (2n/(k-1))$.

(One can also show the upper bound of $f(n, k) \leq n + (k - 1) \log(n/(k-1))$.)

Another important ingredient is the following statement, which we prove via induction, but it in fact follows from the work of Sauermann and Wigderson as well:

Lemma 3. For $n, k \geq 1$, we have $h(n, k, k - 1) = n + 2k - 2$.

Moreover, because our methods are combinatorial, we can generalise to covers using smaller dimensional subspaces, which is something that’s not at all easy to do in the polynomial method. If we denote by $g(n, k, d)$ the function where the cover involves $(n - d)$-dimensional affine subspace of $\mathbb{F}_2^n$, then we show the following:

Theorem 4. For $n > 2^{k2^d -d - k + 1}$, $g(n, k, d) = k2^d + n - d - 2$. For $k \geq 2^{n - d - 1}$, $g(n, k, d) = k2^d - \lfloor k/2^{n-d} \rfloor$.

This result in fact generalises the binary case of Jamison’s theorem who proved that for $F = \mathbb{F}_q$ we have $g(n, 1, d) = q^d - 1 + (n - d)(q - 1)$. (though there is no condition on $n$ being large enough in Jamison’s theorem)

Sadly, our methods don’t give us anything new for the case of $F = \mathbb{F}_q$ when $q > 2$. I believe that for larger fields one needs to improve upon the existing algebraic methods. Also, over $\mathbb{F}_2$ while we know that the threshold $k \geq 2^{n - 2}$ in Theorem 1 is tight, there is a lot of room for improvement in the bound $n \geq 2^k$. We suspect that except for some small exceptions (that arise due to the Golay code), the correct threshold should be $k + 1$. More specifically, we believe that $g(n, k) = n + 2k - 3$ for $n \geq \max\{k+1,13\}$. We do show that $g(k, k) \leq 3k - 4$ for all $k$, and thus this threshold of $k + 1$, if true, would be tight.

I would love to see a proof (or a counterexample) to the following simpler statement,

$g(n, k) = n + 2k - 3$ for all $n \geq poly(k)$.

It will also be very interesting to determine what happens between these extreme ranges of parameters, for example when $n \sim k^{c}$ for some constant $c < 1$. We leave these as open problems and we are looking forward to more new developments in this topic which can help solve them.

Edit 29/01/2021: The paper is now available on the arXiv.

## Improved lower bounds for multicolour diagonal Ramsey numbers

Big news in combinatorics today: David Conlon and Asaf Ferber have posted a 4-page preprint on arXiv that gives exponential improvements in the lower bounds on multicolour diagonal Ramsey numbers, when the number of colours is at least $3$ (also see the post by Gil Kalai). The best known lower bounds so far were completely probabilistic, combined with an inductive argument of Lefmann that gave better bounds for $\geq 4$ colours. The new construction on the other hand has a deterministic part to it, which in fact uses finite geometry. In this post I’ll describe their construction, and also show how to improve their bounds further using a simple observation.

The multicolour Ramsey number $R(t; \ell)$ is the minimum $n$ such that any colouring of the edges of $K_n$ using $\ell$ colours gives rise to a monochromatic $K_t$, which is known to exist by Ramsey’s theorem. Ignoring the lower order terms (which is what the big recent breakthrough of Ashwin Sah improved for $\ell = 2$), the best upper bound we have for all $\ell \geq 3$ is $R(t; \ell) \leq \ell^{\ell t}$, and improving the base of the exponent from $\ell^\ell$ to anything smaller seems out of reach by our current methods. As far as the lower bounds are concerned, the best lower bounds for $\ell = 3, 4$ are $R(t; 3) \geq (\sqrt{3})^t$ and $R(t;4) \geq 2^t$, as shown by Erdős in 1940’s. For larger $\ell$, one can improve the lower bound of $R(t;\ell) \geq \sqrt{\ell}^t$ by using the following observation of Lefmann:

$R(t; \ell_1 + \ell_2) - 1 \geq (R(t;\ell_1) - 1)(R(t;\ell_2) - 1)$.

Conlon and Ferber improve all of these bounds as a corollary to the following result:

Theorem 1. For any prime $p$, $R(t; p + 1) > 2^{t/2} p^{t/3 + o(t)}$.

This, along with Lefmann’s observation, gives an exponential improvement for the known lower bound on $R(t;\ell)$ for all $\ell \geq 3$.

As the appearance of the prime $p$ in this result might suggest, it uses finite fields. In fact, it uses the basic theory of polar spaces/quadratic forms over finite fields. By giving more precise estimate on a certain count in their proof, I can show the following:

Theorem A. For any prime $p$, $R(t; p + 1) > 2^{t/2} p^{3t/8 + o(t)}$.

Note that this is an exponential improvement. Let’s see how we can prove this.

Let $p$ be a prime, and let $V$ be the set of all isotropic vectors in $\mathbb{F}_p^t$ with respect to the symmetric bilinear form $B(x, y) = x_1 y_1 + \cdots + x_t y_t$. In other words, $V = \{(x_1, \dots, x_t) \in \mathbb{F}_p^t : x_1^2 + \cdots + x_t^2 = 0\}$. It is well known, and in fact not so hard to show, that $|V| \sim p^{t - 1}$. Though all we need for the bounds is that $|V| = p^{t - O(1)}$, which is what Conlon and Ferber show in their paper (more precisely, they show $|V| \geq p^{t-2}$). Consider the following edge colouring of the complete graph on $V$: the edge $\{x, y\}$ is coloured with $i \in \mathbb{Z}$ if $B(x, y) = i \pmod{p}$ with $i \neq 0$, and if $B(x, y) = 0$ then it is coloured from the set $\{p, p + 1\}$ uniformly at random and independently for all such edges. Thus we have a $(p + 1)$-colouring of the complete graph on $|V|$ vertices, where the colours are just the integers $1, \dots, p + 1$.

In each colour $1 \leq i \leq p - 1$ we can show algebraically that the graph on these edges has no cliques of size larger than $t$, assuming $t$ is not divisible $p$, which we will assume (for the exact argument see the paper of Conlon and Ferber). Therefore, we are aiming for a lower bound on $R(t+1; \ell)$, but asymptotically it would be the same for $R(t; \ell)$. Now with the random colouring in the last two colours, Conlon and Ferber show that for $n = 2^{t/2}p^{t/3 + o(t)}$, there exists a subset of $V$ of size $n$ such that the complete graph on these vertices, with the same edge-colouring as before, has no monochromatic clique of size $t$ in colours $p$ or $p + 1$. This is done by choosing a random subset where each vertex is chosen with a certain probability which is a function of $n$. Therefore, we have a random colouring of certain pairs of vectors, and a random subset of all the isotropic vectors.

A crucial step in the estimation used in the proof, that leads to this particular choice of $n$, is an upper bound on the number of cliques of size $t$ in the graph defined on $V$ where two vectors are adjacent if they are orthogonal to each other, that is, $x \sim y$ if $B(x, y) = 0$. Note that in our edge colouring, the edges of this graph were the ones that got colours from $\{p, p + 1\}$. Let’s denote this number by $N_t$. Each such clique is a collection $x_1, \dots, x_t$ of vectors in $\mathbb{F}_p^t$ such that $B(x_i, x_j) = 0$ for all $1 \leq i \leq j \leq t$. Conlon and Ferber estimate the number $N_{t, r}$ of such cliques that have rank $r$, that is, the dimension of the space spanned by $x_1, \dots, x_t$ is $r$, by $N_{t, r} \leq p^{2tr - 3r^2/2 + r/2}$, and then just use the upper bound $N_t = \sum_{r = 1}^{t} N_{t, r} \leq t \max_{r = 1}^t N_{t, r}$.

Until this point it’s all fine, but what we can improve upon is the argument that follows. They say that $N_{t, r}$ is maximized for $r = 2t/3 + 1/6$ (which is not at all hard to see) and thus conclude that $N_t \leq p^{2t^2/3 + o(t^2)}$. But they miss a crucial linear algebraic fact at this point. The rank of a clique can never be larger than $t/2$. This is an easy fact that follows from (i) $\dim S + \dim S^\perp = t$ for any subspace $S$, and (ii) $S \subseteq S^\perp$ for a totally isotropic subspace, which is exactly what the span of the vectors forming the clique is going to be. In terms of the theory of polar spaces (or quadratic forms), it says that the rank of a polar space is always at most half of the dimension of the underlying vector space (something all the finite geometers must be familiar with). From this we see that $N_t \leq t N_{t, t/2} \leq p^{5t^2/8 + o(t^2)}$. After this, if you do the calculation of Conlon-Ferber then you’ll see that we can take $n = p^{3t^2/8 + o(t^2)}$, thus proving Theorem A. $\square$

Remark: If you are familiar with the theory of polar spaces, then you can easily estimate this number $N_t$, as each such clique must be contained in a generator (maximal totally isotropic subspaces) and you can count the total number of generators (see for example Theorem 4.11 in the book by Simeon Ball). Depending on the type of the polar space, you can even get an exact count but here we are just interested in the asymptotic results.

What’s next?

I think it’s really exciting that finite geometry, or in fact anything not completely probabilistic, can be used in the case of diagonal Ramsey numbers to improve the known bounds. This was recently shown to be true for the two-colour off-diagonal case by Mubayi and Verstraete, which was extended to multicolours by He and Wigderson, as I describe here and here. Therefore, finite geometry is paying a crucial role in developing Ramsey theory. I think a deeper understanding of these constructions, and new ideas from finite geometry (perhaps group theoretic ones) can lead to even further progress.

Update 28/09/20: Yuval Wigderson has managed to improve the bound further for $\ell \geq 4$ colours (see this).

## The dual version of Ryser’s conjecture

I talked about our new results related to Ryser’s conjecture in a previous post (also see an even earlier post). The conjecture, and its variants, have some interesting equivalent formulations in terms of edge colourings of graphs. While I was vaguely aware of that, I never looked into it in detail. Thanks to the new paper of DeBiasio, Kamel, McCourt and Sheats, I have now decided to study it properly (and share what I understand). In this post I would like to explain and prove this so-called duality.

Consider the following two statements (and try to prove them without reading forward):

(1) Given $n$ points in $\mathbb{R}^2$, let $k$ be the maximum number of points that you can choose from these such that no two points share an $x$ or a $y$ coordinate. Then these $n$ points can be covered by $k$ axis parallel lines, i.e., horizontal or vertical lines.

(2) Let $G$ be a graph of independence number $\alpha$ whose edges are coloured red or blue. Then the vertices of $G$ can be covered by $\alpha$ monochromatic sub-trees (different trees can have different colours but all the edges of one tree must have the same colour).

While both of these statements do have the common feature of being covering problems, it is not immediately clear that they are in fact both implied by the classical theorem of Kőnig. Ryser’s conjecture is a generalisation of this, which in terms of (1) can be seen as a generalisation to $\mathbb{R}^n$ and in terms of (2) can be seen as a generalisation to $r$ colours. We will focus on (2) here, as the connection to (1) is sort of immediate and I don’t know if the geometric point of view has any clear advantage.

Just as a recap, here is what Ryser conjectured. Given an $r$-partite $r$-uniform hypergraph $H$, we have $\tau(H) \leq (r - 1) \nu(H)$, where $\tau$ is the vertex cover number of a hypergraph and $\nu$ is the matching number.

For a graph $G$, let $tc_r(G)$ denote the smallest number of monochromatic connected subgraphs whose vertices cover $V(G)$, over all $r$-edge colourings of $G$. Here a monochromatic connected subgraph can also be a single vertex. You can in fact take these subgraphs to be the connected components in the graph $G_i$ formed by taking the edges coloured by the $i$-th colour. If the grah $G_i$ is disconnected, then you get single vertex monochromatic subgraphs. Gyárfás observed that Ryser’s conjecture is equivalent to $tc_r(G) \leq (r - 1) \alpha(G)$, for all graphs $G$. We will soon see why this is so.

Firstly, to get some familiarity with $tc_r(G)$, observe that the well known fact that for any graph $G$, either $G$ is a connected graph or its complement $G^c$ is a connected graph, can be stated as $tc_2(K_n) \leq 1$ for all $n$. This is because we can look the edges of $G$ and $G^c$ giving us the $2$-edge colouring of $K_n$, where $n = |V(G)|$. Also note that $tc_r(G) \leq n_i$, where $n_i$ is the number of connected components in the graph $G_i$, given an $r$-edge colouring of $G$. The notation $tc$ stands for tree cover, which comes from the fact that any connected graph has a spanning tree, and thus instead of covering with monochromatic connected subgraphs we can equivalently cover with monochromatic trees.

Let’s prove that these two conjectures are equivalent. So, say Ryser’s conjecture is true, and let $G$ be a graph with each of its edge given a colour from $\{1, \dots, r\}$. Denote the subgraph formed by edges whose colour is $i$ by $G_i$. Construct a hypergraph $H$ as follows. The vertices of $H$ are the connected components of $G_i$, for $i = 1, \dots, r$. These vertices are clearly partitioned into $r$ sets, corresponding to the colour $i$. For each vertex $v$ of $G$, define a hyperedge of $H$ as the set of all monochromatic connected components that contain $v$. Each hyperedge has size $r$ since for each $i$ the vertex $v$ is contained in a unique component of $G_i$. Thus, we have an $r$-partite $r$-uniform hypergrah $H$. See the images below for some concrete example.

Now a matching in $H$ corresponds to a set of vertices in $G$ such that no two of these vertices are together in any monochromatic connected component. In particular, there is no edge between in any two of these vertices. Therefore, $\nu(H) \leq \alpha(G)$. A vertex cover of $H$ corresponds to a collection of monochromatic components that cover all the vertices of $G$ (as they correspond to the hyperedges of $H$). Therefore, $tc_r(G) \leq \tau(H)$. Since we assumed that Ryser’s conjecture is true, we also have $\tau(H) \leq (r - 1) \nu(H)$. Combining all of these inequalities, we get $tc_r(G) \leq (r - 1) \alpha(G)$.

Now assume that the conjecture of Gyárfás is true, and let $H$ be an $r$-partite $r$-uniform hypergraph. Construct a graph $G$, where each hyperedge is a vertex of $G$ and two vertices are adjacent if the corresonding hyperedges intersect non-trivially. We colour an edge $uv$ of $G$ by the colour $i$ if the hyperedges corresponding to $u$ and $v$ meet each other in the $i$-th part of the $r$-partition. If they meet in more than one parts, then pick a colour arbitrarily. Again, for concreteness, we can see that in the first image above starting from the hypergraph on the right hand side we reach the graph on the left hand side. In the second case, however, we get the following:

It follows directly from the definition that independent sets in the graph are in bijective correspondence with matchings in the hypergraph, and thus $\alpha(G) = \nu(H)$. A monochromatic connected component in $G$, which is in fact a clique, corresponds to the vertex of the hypergraph where all of the hyperedges meet. Therefore, a cover of the vertices of the graph using monochromatic connected components gives rise to a vertex cover of $H$ of the same size, which implies that $\tau(H) \leq tc_r(G)$. Since $tc_r(G) \leq (r - 1) \alpha(G) = (r - 1)\nu(H)$, we get $\tau(H) \leq (r - 1) \nu(H)$, thus proving the equivalence of these conjectures.

This duality can be applied to other variants of Ryser as well. For example, in my work with Shagnik, Patrick and Tibor, we studied the min vertex cover size for the case when the hypergraph is $t$-intersecting, i.e., any two hyperedges share at least $t$ vertices. This can be seen as determining min $tc_r(K_n)$ with the added condition that in the edge colouring any two vertices are contained in at least $t$ different monochromatic components. Another variant that we studied was that of $k$-wise $t$-intersecting hypergraphs. This can be seen as determining the tree cover number of the $k$-uniform complete hypergraphs, where every set of $k$ vertices are contained in at least $t$ different monochromatic components. In this formulation Király had already solved the problem for $k \geq 3, t =1$, which we were unaware of, and our work resolves the general case of arbitrary $t \geq 1$. This complete knowledge of the extremal function for $k \geq 3$, and very limited knowledge for $k = 2$ is quite interesting.

One particular advantage of the edge colouring formulation is that we have a richer structure there that can be exploited to study interesting variants that have no natural formulation in terms of hypergraphs. For this, and more, check out the paper of DeBiasio, Kamel, McCourt and Sheats. On the other hand, the hypergraphs formulation somehow feels more natural, and in fact many proofs (for example the $r=3$ case of Ryser) are best stated in terms of hypergraphs. So in conclusion, it’s a good idea to be aware of both.

## Heisenberg groups, irreducible cubics and minimal Ramsey

As I mentioned in a previous post, we recently improved the upper bound on a Ramsey parameter, in collaboration with John Bamberg and Thomas Lesgourgues. My favourite thing about this work is how it ends up using the properties of certain Heisenberg groups associated to generalized quadrangles, and cubic equations, to say something interesting about a problem in graph Ramsey theory. It makes me really happy to see such harmony in mathematics. In this post I will attempt to describe the main construction in our paper without getting into too many details.

As discussed earlier, we want to determine the smallest possible minimum degree of a graph $G$ that has the property that every $r$-colouring of the edges of $G$ contains a monochromatic clique of size $k$, and no proper subgraph of $G$ has this property. This smallest degree is denoted by $s_r(K_k)$, and the best known upper bounds on it in various ranges are described in the earlier post. All of these bounds use the equality between $s_r(K_{k+1})$ and the following packing parameter:

$P_r(k) =$ the minimum $n$ for which there exist $K_{k+1}$-free pairwise edge disjoint graph $G_1, \dots, G_r$ on a common vertex set $V$ of size $n$ such that for any $r$-colouring of $V$ there exists a $G_i$ that contains a $K_k$ all of whose vertices are coloured in the $i$-th colour.

It is known that $s_2(K_{k+1}) = P_2(k) = k^2$, as proved by Burr, Erdős and Lovász in 1976, but for $r > 2$ our knowledge of this function is quite limited. (As an exercise, try to find such edge disjoint $G_1$ and $G_2$ on a common set of $k^2$ vertices.)

Since we are just looking at upper bounds for now, one can show $P_r(k) \leq n$ by finding a packing of $G_i$‘s where each $G_i$ is $K_{k+1}$ free but every collection of $n/r$ vertices in $G_i$ induces a copy of $K_k$. This works because in any $r$-colouring of $V$ there must exist a colour $i$ which occurs at least $n/r$ times, and then in the graph $G_i$ we have a $K_k$ all of whose vertices are coloured in the $i$-th colour. This has been the main approach so far in showing upper bound on $s_r(K_{k+1})$ for $r \geq 3, k \geq 2$, and we will be doing the same.

Before finding such a collection of edge disjoint $G_i$‘s, the first step is to see if we can find even a single graph $G$ on $n$ vertices which is $K_{k+1}$-free and has the property that any set of $n/r$ vertices in $G$ induce a $K_k$. For $k = 3$, we are looking for triangle-free graphs with independence number $< n/r$. This should remind you of the Ramsey number $R(3, t)$. In fact, such a $G$ on $n$ vertices exists if and only if $n < R(3, \lceil n/r \rceil)$. This connection can in fact be used to prove that $s_r(K_3) = \Theta(r^2 \log r)$ (see Guo and Warnke) by finding not just one triangle-free graph but a packing of triangle-free graphs.

More generally the ideas from the determination of the so-called Erdős-Rogers function were used to prove that for any fixed $k$, and $r$ large enough, $P_r(k) \leq C (\ln r)^{8k^2} r^2$ (see Fox, Grinshpun, Liebenau, Person and Szabó). This is pretty satisfactory as far as the realm of $k$ fixed and $r \rightarrow \infty$ is concerned, but in other ranges this is a terrible bound compared to the more general result of FGLPS that shows $P_r(k) \leq 8k^6 r^3$ for all $k \geq 2, r \geq 3$. Proofs of all these results rely on finite geometric ideas, combined with the probabilistic method, and that’s exactly the path we take as well.

Here is a general idea to construct these graphs. Consider a partial linear space (a.k.a. a linear hypergraph) that is “triangle-free” in the sense that no three lines pairwise meet each other in three distinct points. Then because of this triangle-freeness the only cliques (of size at least 3) in the graph defined on the point-set with two points adjacent if they are collinear, are those contained inside a line. Therefore, we can try to destroy all of the cliques of size $\geq k + 1$, while maintaining some structure so that any set of $n/r$ points contains a $K_k$. This idea, along with a probabilistic argument, can be used to prove the following lemma (which is Lemma 3.1 in our paper) about packings of partial linear spaces:

Lemma 1. Say there exist triangle-free partial linear spaces $(P, \mathcal{L}_1), \dots, (P, \mathcal{L}_r)$, each of order $(s, t)$, such that $(P, \cup_{i = 1}^r \mathcal{L}_i)$ is also a partial linear space. If $s \geq 2rk \ln k$ and $t \geq 2k(1 + \ln r)$, then $P_r(k) \leq |P|$.

The order $(s, t)$ denotes that every line has $s + 1$ points on it and every point has $t + 1$ lines through it. The condition that $(P, \cup_{i = 1}^r \mathcal{L}_i)$ is a partial linear space, is really important, and we overlooked it in our previous version of the paper which led to an error. I would like to thank Simona Boyadzhiyska for pointing this out to me.

The best sources of such triangle-free partial linear spaces, and in fact the only sources I know of, are generalized quadrangles (GQ). One can take any “regular” subhypergraph of a GQ to get these spaces. The main difficulty though is to find a packing of such spaces, as required in Lemma 3.1. In Fox et al. they managed to find such a packing with $s = t = q - 1$, an arbitrary prime power $q$, using what is known as the $T_2(O)$ generalized quadrangle (even though they don’t call it that in the paper). Since $|P| = q^3$ in their packing, we get the upper bound of $P_r(k) \leq 8k^6 r^3$ by picking a prime $q$ such that $rk^2 < q \leq 2rk^2$. We have managed to find such a packing with $s = q^2 - 1$ an $t = q - 1$, for an arbitrary odd prime power $q$, that gives us the following improvement:

Theorem 2. $P_r(k) \leq C k^5 r^{5/2}$.

The construction relies on a group theoretic model of generalized quadrangles, introduced by Bill Kantor in 1980. Consider the Heisenberg group $E$ defined on the set $\mathbb{F}_{q^2} \times \mathbb{F}_{q^2} \times \mathbb{F}_{q}$ by the following group operation $(x, z, y) \circ (x', z', y') = (x + x', z + z', y + y' + z^qx' + zx'^q)$. Note that $(a, b) \rightarrow a^q b + ab^q$ is a bilinear map on $\mathbb{F}_{q^2}$ when identified as the vector space $\mathbb{F}_q^2$.

Our points will be the elements of $E$, and thus we have a point set of size $q^5$. It is known that $E$ contains subgroups $A_i$, for $i \in \mathbb{F}_q$, such that (1) $|A_i| = q^2$; (2) $A_i \cap A_j = \{(0, 0, 0\})$ for any $i \neq j$ and (3) for any three distinct $i, j, k$ we have $(A_i A_j) \cap A_k = \{(0, 0, 0)\}$. These subgroups are $A_i = \{(a, ai, a^{q+1}i) : a \in \mathbb{F}_{q^2}\}$. The partial linear space obtained by taking the set of all cosets of $A_i'$ as lines, is of the order $(q^2 - 1, q - 1)$ and the third condition on these subgroups ensures that it is triangle-free. You should try to prove the triangle-freeness. For the finite geometers reading this post, this partial linear space is a subgeometry of the Hermitian generalized quadrangle $H(3, q^2)$.

We obtain a packing of partial linear spaces isomorphic to this one, on the same point set $E$, by taking a group of automorphisms of $E$ which then acts naturally on the set $\{A_i : i \in \mathbb{F}_q\}$. Each such automorphism $\tau$ defines a new set of subgroups $A^\tau_i$, $i \in \mathbb{F}_q$, which have the same three properties. What we want from our automorphisms though, because of the condition in Lemma 1 regarding $(P, \cup \mathcal{L}_i)$ being a partial linear space, is that every subgroup in $\{A_i : i \in \mathbb{F}_q\}$ must intersect ever subgroup in $\{A_i^\tau : i \in \mathbb{F}_q\}$ trivially. This turns out to be quite tricky, and for this we had to be very careful in our choice of automorphisms. In particular, one can show that none of the inner automorphisms of $E$ would work. In the end end we managed to find $q^2$ outer automorphisms of $E$ that give us the required packing of size $q^2$. Very interestingly, to prove that our packing is good, we had to rely on the following two results on irreducible cubics:

Proposition 1: (Kim-Kim-Yie 2009) There is a bijective correspondence between the set of irreducible polynomials of the form $x^3 - cx^2 + c^q x - 1$ in $\mathbb{F}_{q^2}[x]$ and the set of irreducible polynomials of the form $x^3 + ax^2 + bx + c$ in $\mathbb{F}_q[x]$ where $-b$ is a non-square in $\mathbb{F}_q$.

Proposition 2: For every finite field $\mathbb{F}_q$, and every $\lambda \in \mathbb{F}_q$, there exists an irreducible cubic $x^3 + ax^2 + bx + c$ where $b = \lambda$.

This can be shown via the Hansen-Mullen Irreducibility Conjecture (which was proved by Wan for all but finitely many cases, and those small cases were handled by Ham and Mullen). See this recent paper of Tuxanidy and Wang, and the references therein, for more on this conjecture and its generalizations where we look for irreducible polynomials with more than one of its coefficients prescribed.

## Generalized polygons in extremal combinatorics

Jacques Tits invented generalized polygons to give a geometrical interpretation of the exceptional groups of Lie type. The prototype of these incidence geometries already appears in his 1956 paper, while they are axiomatically defined in his influential 1959 paper on triality. The finite versions of these objects have played an important role in the study of finite simple groups and in the general the theory of permutation groups. In the 70’s and 80’s, generalized polygons also started being studied under the theory of distance regular graphs (and more broadly association schemes). The theory of these geometries has been developed for its own sake as well (see this and this), and there are several long standing open problems about them that have attracted the attention of many mathematicians. In this post I want to discuss the deep connection that generalized polygons have had with extremal combinatorics, specifically with Ramsey theory and Turán type theorems.

Finite projective planes, which are in fact a special case of generalized polygons known as generalized triangles, have always been deeply connected to extremal combinatorics. They give us the optimal examples for several extremal problems on both graphs and hypergraphs. For example, they were behind the construction of Esther Klein, mentioned in the 1938 paper of Paul Erdős, which in fact determines the maximum number of edges in a $C_4$-free graph on $n$ vertices asymptotically (it’s fun to look at the original number theoretic problem which motivated this construction and also the fact that there is no mention of the term ‘projective planes’ even though the concept is there). They are also present in the early constructions for some special cases of the classical Zarankiewicz problem. These early results form the foundation of extremal graph theory.

Finite generalized quadrangles (GQ) have played a similar role, but I have noticed that many times they don’t get mentioned explicitly by the extremal combinatorics community. There appears to be a gap in knowledge, which is probably because of the slightly more involved constructions of classical GQ’s. The so-called Benson graph of girth $8$, that give us the asymptotic result for the Turán number of $C_6$,  is simply the incidence graph of a GQ. While Benson doesn’t say that explicitly in his 1966 paper, it’d be very surprising if he actually din’t know it as his PhD thesis was called Generalized Quadrangles and (B,N) Pairs. Similarly, the so-called Wenger graph from 1991, is just an induced subgraph of the incidence graph of a well known GQ, even though there is no mention of generalized quadrangles in Wenger’s paper. See my earlier blog post on Wenger graph for more details on this. Once you see that, it should be clear how Kopparty’s graph is related to the same GQ. Thus,  the best known construction of optimal pseudorandom triangle-free graphs can be obtained via generalized quadrangles! This was also shown by Conlon, who started with the collinearity graph of a generalized quadrangle and gave a random construction from it that resulted in almost optimal triangle-free pseudorandom graphs. The basic idea behind these two constructions is in fact the same, just that Kopparty uses an algebraic method to achieve the end goal whereas Conlon uses a probabilistic one (which is why he ends up with an almost optimal graph).

These optimal triangle-free pseudorandom graphs give us the best known constructive lower bounds on the Ramsey number $R(3, t)$. As these can be obtained via generalized quadrangles, we see an application of GQ’s in Ramsey theory. Interestingly, GQ’s can also be used to give us the best known constructive lower bounds on the Ramsey number $R(4, t)$, as shown by Kostochka, Pudlák and Rödl. They have even been used in some hypergraph Ramsey problems to obtain tight lower bounds that beat the probabilistic ones (see Kostochka-Mubayi-Verstraëte). There are some other problems in Ramsey theory where GQ’s have shown up as well, see for example this and this.

In my recent work with John Bamberg and Thomas Lesgourgues on a minimal Ramsey problem we have also used generalized quadrangles. We noticed that a particular group theoretic model of GQ’s, due to Kantor, was perfectly suitable for this problem. I am certain that this idea will be useful in other problems in extremal combinatorics problems as well. I also believe that new interesting applications of generalized hexagons and octagons, as in the recent work of Mubayi and Verstraete (also described here), are just waiting to be found. One of the necessary steps for this is to learn more about these beautiful mathematical objects, and for that I would like to end with a few references.