## Linear trifferent codes and minimal codes

In the previous post, we saw the problem of determining the asymptotic growth of the function $T_L(n)$, which is the largest size of vector subspace $C \subseteq \mathbb{F}_3^n$, with the property that for any three distinct vectors $u, v, w$ in $C$, there is a coordinate $i$, such that $\{u_i, v_i, w_i\} = \mathbb{F}_3$. A code $C$ with these properties is called a linear trifferent code. Note that, $T_L(n) = 3^{\dim C}$, and thus it suffices to study the largest dimension of such codes.

In this post, I will show that linear trifferent codes are in fact related to another well-studied notion in coding theory, minimal codes.

A (linear) $[n, k, d]_q$ code is a $k$-dimensional subspace of $\mathbb{F}_q^k$ such that every codeword has at least $d$ non-zero coordinates. The classic problem in coding theory is to understand the trade-off between $d$ and $k$, as $d$ corresponds to the error-correcting capabilities of the code, and $k$ corresponds to the amount of information that can be transferred. Asymptotically, for a family of codes with parameters $[n_i, k_i, d_i]_q$, we want to understand the trade-off between the rate $R = \lim k_i/n_i$ and the relative distaudimennlesnce $\delta = \lim d_i/n_i$. Any family with both $R$ and $\delta$ strictly bigger than $0$, is called an asymptotically good family.

A non-zero codeword $u$ in $C$ is called minimal, if there is no non-zero codeword $v$ such that the set of coordinates where $v$ is non-zero is a proper subset of the set of coordinates where $u$ is non-zero. For example, in the $[4, 2, 2]_3$ code $C$ generated by the vectors $(0, 1, 0, 1)$ and $(0, 0, 1, 1)$, the vector $(0, 2, 0, 2)$ is a minimal codeword while $(0, 1, 1, 2)$ is not. Minimal codes are linear codes in which all codewords are minimal. For example, any linear code where all vectors have the same number of non-zero coordinates is automatically minimal, but they can be much more interesting than that example. Over the binary field $\mathbb{F}_2$, coincidentally, a minimal code is equivalent to an intersecting code (where the support of any two vectors intersects non-trivially), which have been studied extensively since 1980s, due to various theoretical and applied reasons. But over larger fields, minimal codes have not been so well-studied. In an earlier post, I talked about the work of Alfarano, Borello, Neri and Ravagnani, who showed that every minimal $[n, k, d]_q$ code satisfies

• $d \geq (q - 1)(k - 1) + 1$
• $n \geq (q + 1)(k - 1$.

The first statement shows that any infinite family of minimal codes where $k$ is a linear function of $n$, must be asymptotically good. Therefore, it is very interesting to construct such families. In my work with Noga Alon, Shagnik Das, and Alessandro Neri (that we will hopefully upload on arXiv by next month), we give the first explicit construction of minimal $[n, k, d]_q$ codes where $k = c(q + 1)n$ for some absolute constant $c$, using expander graphs. I will talk about this construction in a later post.

With all of this discussion around minimal codes, it is not at all clear what they might have to do with linear trifferent codes. Well, it turns out these two are the same objects over $\mathbb{F}_3$. This is not something I expected when I started working on the trifference problem! Here is a short proof from my paper with Jozefien, Dion and Aditya:

Theorem 1. Let $C$ be a vector subspace of $\mathbb{F}_3^n$. Then $C$ is a trifferent code if and only if it is a minimal code.
Proof. First suppose that $C$ is not minimal. Then there exist distinct nonzero codewords $u,v\in C$ with $\mathrm{supp}(u) \subsetneq \mathrm{supp}(v)$.
Let $w=-u$, which must also lie in $C$ as $C$ is linear.
Then there is no index $i \in [n]$ such that $\{u_i,v_i,w_i\}=\mathbb{F}_3$, since $v_i=0$ implies $u_i=w_i=0$ and $u_i=0 \iff w_i=0$. We conclude that $C$ is not a trifferent code.

For the other direction, suppose that $C$ is a minimal linear code. Say $C$ is not a trifferent code.Then there exist distinct codewords $u,v,w\in C$ such that there is no coordinate in which they all have a different value.
Consider the vectors $w - u, w - v$ and $2w - u - v$, which all belong to $C$ because $C$ is linear. The first two vectors are non-zero because $w \neq u$ and $w \neq v$. We first show that $2w - u - v$ is also a non-zero vector. As $u$ and $v$ are distinct, there is an $i \in [n]$ such that $u_i \neq v_i$.
Since $w_i$ must be equal to $u_i$ or $v_i$, it follows that $2w_i - u_i - v_i = (w_i - u_i) + (w_i - v_i)$ is either equal to $u_i - v_i$ or $v_i - u_i$, which are both non-zero.
This also shows that $w - u$ is not a scalar multiple of $w - v$, and thus $\mathrm{supp}(w - u) \neq \mathrm{supp}(w - v)$.

We now show that $\mathrm{supp}(w - u)$ is a subset of $\mathrm{supp}(2w - u - v)$. Let $i \in [n]$ such that $w_i - u_i \neq 0$. We need to show that $2w_i - u_i - v_i \neq 0$. Since $w_i \neq u_i$, we must have the following two cases: (i) $w_i = v_i$ and $u_i \neq v_i$; (ii) $w_i \neq v_i$ and $u_i = v_i$.
In the first case $2w_i - u_i - v_i = v_i - u_i$, which is non-zero. In the second case, $2w_i - u_i - v_i = 2(w_i - u_i)$, which is also non-zero. Similarly, we can show that $\mathrm{supp}(w - v)$ is also a subset of $\mathrm{supp}(2w - u - v)$. Since $\mathrm{supp}(w - u)$ is not equal to $\mathrm{supp}(w - v)$, at least one of them must be a proper subset of $\mathrm{supp}(2w - u - v)$, which contradicts the minimality of $C$. $\square$

From this new equivalence, it follows that determining $T_L(n)$ is equivalent to determining the largest dimension $k$ of a minimal $\mathbb{F}_3$-code of length $n$, which gives us a new way of attacking this problem. In fact, finding the largest dimension $k$ of a minimal code of length $n$ is equivalent to a finite geometric problem on something called a strong blocking set. In the next post, I will show how to improve the current best bounds on these blocking sets over arbitrary finite fields. When restricted to $\mathbb{F}_3$, these improvements will imply the new bounds on $T_L(n)$ that I described in my previous post. In particular, I will prove the following two theorems.

Theorem 2. For every prime power $q$ there is a constant $c_q > 1$ such that every linear minimal code of length $n$ and dimension $k$ satisfies $n \geq (c_q - o(1))(q+1)(k -1)$, for large enough $k$.

Theorem 3. For every prime power $q$, there exists a linear minimal $k$-dimensional code of length $n$ with $n \leq 2(q + 1)k/\log_q(\frac{q^4}{q^3 - q + 1})$.

## The Trifference Problem

What is the largest possible size of a set $C$ of ternary strings of length $n$, with the property that for any three distinct strings in $C$, there is a position where they all differ?

Let $T(n)$ denote this largest size. Trivially, $T(1) = 3^1 = 3$ as you can take all possible ternary strings of length one. After some playing around you can perhaps prove that $T(2) = 4$ (I encourage you to try it so that you understand the problem). With a bit more effort and the help of a computer, you might also be able to show that $T(3) = 6$, and $T(4) = 9$. For example, here is a set of nine ternary strings showing that $T(4) \geq 9$: $0000, 0111, 2012, 2201, 2120, 0111, 1012, 1101, 1210$. You should check that for any three strings from these nine, there is at least one position where the values that you get is a permutation of $\{0, 1, 2\}$.

Things start getting much harder for larger lengths. In fact, despite this problem being more than 50 years old, only last year it was shown that $T(5) = 10$ and $T(6) = 13$ (see this for the details). Computing the value of $T(7)$ is already an open problem!

Perhaps a much more tractable problem than computing $T(n)$ exactly is to find out how this function behaves. For example, does it grow exponentially in $n$? The answer is yes, as it was shown by Körner and Marton in 1988:

$T(n) \geq \left( \frac{9}{5} \right)^{n/4}.$

Surprisingly, that is still the best lower bound that we know! The situation is even more surprising for upper bounds, where the following result of Körner from 1973 is still essentially the best bound:

$T(n) \leq 2 \left(\frac{3}{2}\right)^n.$

The constant $2$ in front of the exponential term can be improved by using the new values of $T(n)$ for $n = 5, 6$, but we have absolutely no idea how to improve the base $3/2$ of the exponent. Here is a quick proof for this upper bound:

Let $C$ be such a collection of strings. Consider the three sub-collections $C_1, C_2, C_3$, where $C_1$ = all strings in $C$ starting from 0 or 1, $C_2$ = all strings starting from 1 or 2, $C_3$ = all strings starting from 2 or 0. Then $2|C| = |C_1| + |C_2| + |C_2|$ since each string in $C$ is counted twice on the right-hand side. Note that the sets $C_i$‘s also have the trifference property, and since we know the first letters in them, they all have size at most $T(n-1)$. This gives us the recurrence $2T(n) \leq 3T(n - 1)$, that implies the bound. $\square$

The Trifference Problem, is the problem of understanding the limit

$\lim_{n \rightarrow \infty} \frac{\log_3(T(n))}{n}.$

The bounds above show that $\liminf \log_3(T(n))/n \geq 0.133$, and $\limsup \log_3(T(n))/n \leq 0.369$.

It is ridiculous how none of the combinatorial methods that have been developed in the past 50 years have led to any improvement in the bounds on $T(n)$. For example, Costa and Dalai have shown that the slice rank method, which was recently used to make breakthroughs on the cap set problem, and many other combinatorial problems, cannot be used in any obvious way to improve the upper bound on $T(n)$.

As we cannot make progress on the original problem, we can try to put some extra restrictions to see if that leads to any better bounds. This is precisely what Pohoata and Zakharov did last year, and studied the variation of the problem where the set $C$ is a vector subspace of the set $\mathbb{F}_3^n$, where $\mathbb{F}_3$ is the finite field of three elements. While it is somewhat natural to put such a restriction, it is also motivated by the fact that the lower bound on $T(n)$ by Körner and Marton relies on a probabilistic lifting of a trifferent code $C$ which is a vector subspace of $\mathbb{F}_3^4$ (it is in fact the set of strings of length $4$ that I wrote above). Moreover, the best known explicit constructions of these trifferent codes also happen to be linear.

Let $T_L(n)$ be the largest size of a linear trifferent code, that is, a vector subspace $C$ of $\mathbb{F}_3^n$ with the property that for any three distinct vectors in $C$, there is a coordinate where they all differ. Pohoata and Zakharov, showed that

$T_L(n) \leq 3^{\left(\frac{1}{4} - \epsilon\right)n},$

which is much better than the bound that we could prove on $T(n)$! The $\epsilon$ in their result is a small positive number that is not determined explicitly.

In my recent work with Jozefien D’haeseleer, Dion Gijswijt, and Aditya Potukuchi, we have managed to improve this bound to

$T_L(n) \leq 3^{\frac{n}{4.55}}.$

More interestingly, we have shown that essentially the same lower bound can be proven on $T_L(n)$ that was known for $T(n)$,

$T_L(n) \geq \frac{1}{3} \left( \frac{9}{5}\right)^{n/4}.$

It is worth noting that our construction is very different from the one given by Körner and Marton, as that is far from being linear. Our lower bound gives us a new motivation for studying $T_L(n)$. As $T(n) \geq T_L(n)$, we can focus on improving the lower bounds on linear trifferent codes to get an improvement on trifferent codes. Even a tiny improvement to our lower bound (which asymptotically matches the best lower bound in the trifference problem), will be a breakthrough.

In the next few posts, I will sketch the arguments that we have used to make these improvements. We will also see exciting new connections to other problems in mathematics, on blocking sets, minimal codes, and expander graphs, that we establish in our paper. In fact, we will see a bunch of problems that are equivalent to determining $T_L(n)$, and thus any one of those could be studied to make progress on the trifference problem.

## Polymath Junior

This summer I will be one of the main mentors for a project on Ramsey numbers as a part of the Polymath REU. This is my first time being involved with this program and I am super excited about it.

If you are an undergraduate student, anywhere in the world, interested in doing some mathematical research this summer, then please consider applying! The applications deadline is 1st April, 2022. Here is the link to apply.

## A short video on Ramsey numbers

I was recently involved in making a 1 minute maths video for a contest organised by Veritasium. Here is the main requirement for the video

We are looking for videos that clearly and creatively explain complex or counterintuitive concepts in the fields of Science, Technology, Engineering, and Mathematics.

This project started with Aditya Potukuchi telling me about the contest some weeks back after which we immediately started brainstorming for possible topics. I then suggested that we include my brother Anup Bishnoi and his wife Jyoti Arora in the project as they love creating various forms of learning content. In fact, they have founded a company around this called Not A Bot Studios (go check it out!). I introduced to them what Ramsey numbers were, as Aditya and I decided that could be a suitable topic for a 1 minute video. They seemed quite amazed by this mathematical concept and especially surprised by the fact that we don’t even know what the Ramsey number R(5) is. Thus, that became the punchline of our video. We worked on this video for about a week and had a lot of fun doing it! The involvement of my partner Aparajita Nath later on made it even better. I am quite happy with the final outcome as we have been getting some great comments.

As this is the first time I have been involved with something like this, I though I’ll share my experience and learnings, with the aim of encouraging other mathematicians to do something similar.

Here are some of my key learnings from this project:

• Parsing mathematical statements is really hard.
As mathematics students we get trained in parsing statements like “Either A or B is true” and using quantifiers like “for all”, “there exists”, etc. This is not the case for most people and they can find it extremely difficult to understand such statements, especially in such a short video. We tried to make it easier by using some visual aids. For example, we showed a bunch of possible arrangements of allies/enemies (as a two-edge-colouring of the complete graph) when saying that you can always find 3 mutual allies or 3 mutual enemies. Even after that it didn’t look like it’ll be immediately clear to a first time viewer that we are talking about all possible arrangement and so we decided to make it interactive by asking the viewer to pause the video and try it out on their own.
• Get a lot of feedback!
When we made the first version of this video all of us were pretty happy with the outcome, until we tried it out with a bunch of our friends who had no idea about the topic. Many of them couldn’t follow it at all. Even after multiple watchings! The feedback we got from them made us realise that because we had already spent a lot of time working on this, we all had a good idea of what Ramsey numbers were, and so we just started focussing on making it visually appealing and fun. The explanation part took a back seat. Because of the feedback, we were able to fix this. Even though it required a lot of rewriting of the script and redoing of the animation, I am so glad that we didn’t put out that version of the video.
• Make it fun to watch, without compromising the subject matter.
No matter what audience you are aiming for, it is going to be really helpful to add fun elements that they can all enjoy. Make them laugh and make them care about what you are teaching. Memes are your friends!
One important thing to ensure though is that these fun elements don’t take away anything from the actual concept that you are trying to explain. For example, we based our script around the world of the widely popular tv series Game of Thrones (which also had a widely unpopular last season), but made sure that the specific characters and all the insider jokes are not necessary in any way to understand Ramsey numbers. A friend pointed out how he started recalling the tv show instead of focussing on the maths. We tried to fix this in the final version of the video, but I am not sure if we have still managed to remove all such distractions.

• Keep it clear and simple
Both the script and visuals should be as clear and simple as possible. Complicated mathematical terms or english words can be very confusing, especially in a short video like this. It took as many iterations to get to the final version, but I am glad that we made a conscious effort of keeping it simple.

As I said before, it was great fun to make this video that can explain Ramsey numbers to a general audience in one minute and I wish I can do more such things in future, especially with such great collaborators. Go check out the video and share it around!

## A postdoc position at TU Delft

I would like to advertise a 3 year postdoc position at my research group. Please apply if you are interested in the kind of research that Dion and I do (https://anuragbishnoi.wordpress.com/publications/). The deadline is April 15 2021. Here are some further details:

A three-year postdoctoral position is available at the Optimisation and Discrete Mathematics group (https://www.tudelft.nl/ewi/over-de-faculteit/afdelingen/applied-mathematics/optimization) of the Applied Mathematics department (DIAM) at TU Delft. The preferred starting date is September 2021, with some flexibility.

The successful candidate is expected to hold a PhD in mathematics or closely related field by the start of the appointment, and have research interests aligned with our group, e.g. extremal combinatorics, finite geometry or algebraic methods in combinatorics/optimization. The position also entails some teaching and supervision responsibilities.

Qualified candidates are invited to send the following application materials to Dion Gijswijt (D.C.Gijswijt@tudelft.nl):

– Curriculum Vitae
– Publication List
– Research Statement (1-2 pages)
– Three contacts for letters of recommendation

Informal enquiries related to the position are encouraged and may be directed to the above address.

The deadline for the application is April 15, 2021 or until the position is filled.

Dion Gijswijt and Anurag Bishnoi

## A coding theoretic application of the Alon-Füredi theorem

The Alon-Füredi theorem is something that I have written a lot about in this blog. I spent a considerable amount of time on this theorem during my PhD. In fact, it’s generalisation that I obtained and it’s applications in finite geometry (blocking sets), coding theory (Reed-Muller codes), and computer science (polynomial identity testing), constitute my most cited paper. I have recently learned another nice coding theoretic application of this result, which I would like to share in this post.

Coding theory is an important branch of mathematics with wide ranging applications, like data compression, transmission, storage. One of the main object studied in this area is a linear code. An $[n, k, d]_q$ code is a $k$-dimensional vector subspace of $\mathbb{F}_q^n$ in which the minimum Hamming weight of vectors is equal to $d$. The elements of the vector subspace are known as codewords. The parameter $n$ is known as the length of the code and $d$ as its minimum distance which is related to the number of errors that can be corrected by the code (higher the value of $d$, higher the error-correcting capacity of the code). The main problem in coding theory is to obtain the largest possible value of $k$, given $n, d, q$ and the smallest possible $n$ given $k, d, q$. Some of the fundamental bounds for these problems are the Hamming bound, the Singleton bound and the Griesmer bound. Some famous examples of codes that are optimal w.r.t. these bounds are the Hamming code, Golay code, and Reed-Solomon codes. Along with being useful in practice, these codes are also related to beautiful and deep mathematical theories. Here is a nice introduction to the subject by Venkatesan Guruswami.

The study of codes with certain extra restrictions has also developed because of some specific examples. One such restriction, is that every codeword is minimal, in the sense that if $\sigma(c) = \{i : c_i \neq 0\}$ is the support of a codeword $c$ and $c'$ is any codeword with $\sigma(c') \subseteq \sigma(c)$ then $c' = \lambda c$ for some $\lambda \in \mathbb{F}_q$. Such minimal codewords have been used in secret sharing schemes and decoding algorithms. Here is an early paper discussing some properties of minimal codewords. Naturally, one expects that minimal codes will have stronger bounds on their parameters. That’s precisely where Alon-Füredi shows up. In a recent paper, Alfarano, Borello, Neri and Ravagnani have applied this theorem and related results to prove the following:

Theorem 1. In a minimal $[n, k, d]_q$ code, $d \geq (q-1)(k - 1) + 1$.

Theorem 2. In a minimal $[n, k, d]_q$ code, $n \geq (q + 1)(k - 1)$.

Here is their short proof of Theorem 1 using the Alon-Füredi bound.

Every codeword in a minimal code is also maximal. Let $c$ be a non-zero codeword. Then $c = u G$, where $G$ is the $k \times n$ generator matrix and $u$ is a non-zero vector in $\mathbb{F}_q^k$. Let $f_i \in \mathbb{F}_q[x_1, \dots, x_k]$ be the linear polynomial given by $c_i \cdot x = 0$ where $c_i$ is the $i$-th column of $G$. Consider the polynomial $f = \prod_{i \in \sigma(c)} f_i.$ Then $f(u) \neq 0$, and in fact for any vector $v$, $f(v) \neq 0$ if and only if $\sigma(c) \subseteq \sigma(v G)$, which by the maximality of $c = uG$ is only possible if $v = \lambda u$ for some $\lambda$. Therefore, $f$ is non-zero at exactly $q - 1$ elements of $\mathbb{F}_q^k$. From the Alon-Füredi bound, we know that $f$ is non-zero at at least $(q - b)q^{k - a - 1}$ points where $a, b$ are unique non-negative integers for which $\deg f = a(q - 1) + b$, and $1 \leq b \leq q - 1$. From $q - 1 \geq (q - b)q^{k - a - 1}$, we can conclude that $k - a - 1 = 0$, and thus $\deg f = (q - 1)(k - 1) + b \geq (q - 1)(k - 1) + 1$. But note that $\deg f$ is precisely the Hamming weight of $c$, and hence the minimum distance of the code is at least $(q - 1)(k - 1) + 1$. $\square$

The proof of Theorem 2 is slightly more involved than Theorem 1, and so I refer you to the paper for that. The same proof appears in this recent paper in a geometric language. Note that one can derive lower bounds on $n$ using Singleton bound or the Griesmer bound in combination with Theorem 1, but these are weaker than the bound in Theorem 2.

Interestingly, Theorem 1 can also be proved via the main Alon-Füredi theorem, instead of the Alon-Füredi bound (which is in fact a result of Jamison and Brouwer-Schrijver when applied to $\mathbb{F}_q^n$), i.e., any polynomial $f$ that vanishes on all points except one in $\mathbb{F}_q^n$ has degree at least $n(q-1)$ (see Theorem 22 in this or Theorem 3.7 in this).

## Bilinear forms and diagonal Ramsey numbers

The recent breakthrough of Conlon and Ferber has shown us that algebraic methods can be used in combination with probabilistic methods to improve bounds on multicolour diagonal Ramsey numbers. This was already shown for the off-diagonal Ramsey numbers by Mubayi and Verstraete last year, as discussed here. Sadly, for two colours we still don’t have any improvement over the classic probabilistic bound proved by Erdős in 1947 (except for some constant factors in front of the exponential function). More concretely, we haven’t been able to prove $R(t, t) > c^t$, for any $c > \sqrt{2}$ since 1947, despite considerable effort by various mathematicians. Therefore, it might sound reasonable to think that $R(t, t)$ is equal to $2^{t/2 + o(t)}$, but the exponential improvement over the (purely ) probabilistic lower bounds on multicolour Ramsey numbers should cast some doubt on that.

In an end remark Conlon and Ferber mention that the lower bound of $2^{t/2 + o(t)}$ can be proved in a new way, using the arguments of their paper, where we have a mix of algebra and probability. This sounds like a very promising approach for further improvements as well. In this post I want to explore that, and attempt to explain why we will need some really new ideas to improve the lower bound. In the process, we will see some basic theory of bilinear forms, which is crucial to the arguments. For a more in depth exploration of the topic, see these notes of Keith Conrad and Chapter 3 of the book by Simeon Ball.

Given a vector space $V$ over a field $F$, a bilinear form is a function $b: V \times V \rightarrow F$, such that for any fixed $u, v \in V$, both $b(u, y) : V \rightarrow F$ and $b(x, v) : V \rightarrow F$ are linear functions. This is an abstraction of the dot product, which is so crucial in understanding the geometry of the real space $\mathbb{R}^n$. Inspired by that, we define a vector $u$ to be orthogonal to a vector $v$ if $b(u, v) = 0$. There is a certain lack of symmetry in the definition here, namely, $b(u, v) = 0$ doesn’t necessarily imply $b(v, u)$, or in other words, we could have $u$ orthogonal to $v$ but $v$ non-orthogonal to $u$. To fix that we introduce the notion of reflexivity and call a bilinear form reflexive if $b(u, v) = 0$ implies $b(v, u) = 0$ for all $u, v \in V$. Once have have a reflexive bilinear form we can talk about orthogonality of two vectors. From now on, $b$ will denote a reflexive bilinear form.

While this orthogonality defined in terms of reflexive bilinear forms might seem like a very general notion, it ends up being restricted to only two cases. A fundamental result, attributed to John von Neumann, states that for any vector space $V$, a reflexive bilinear form $b$ on $V$ is of one of the following two types:

1. Symmetric: $b(u, v) = b(v, u)$ for all $u, v \in V$.
2. Alternating: $b(u, u) = 0$ for all $u$.

For example, the dot product on $\mathbb{R}^n$ defined by $(x_1, \dots, x_n) \cdot (y_1, \dots, y_n) = x_1y_1 + x_2y_2 \cdots x_ny_n$ is a symmetric bilinear form, whereas the function $b(x, y) = x_1y_2 - x_2y_1 + x_3y_4 - x_4y_3 + \cdots x_{n-1} y_n - x_n y_{n - 1}$ (assuming $n$ to be even) is alternating. These examples are in fact examples of what we call non-degenerate bilinear forms, i.e., form with respect to which no vector is orthogonal to every vector of the vector space.

One interesting property of alternating forms is that they are skew symmetric. Indeed, we have $b(u + v, u + v) = 0$ for any $u,v$ and thus $b(u,v)=-b(v,u)$. If the characteristic of the underlying field $F$ is even, then this shows that every alternating form is in fact symmetric. The following lemma gives us a relation in the other direction as well.

Lemma 1: If $b$ is a symmetric bilinear form on a vector space $V$ over a field $F$ of characteristic $2$, then $V=S\oplus W$ where $\dim S \leq 1$ and the restriction of $b$ on $W$ is an alternating form.

For example, if we take $b(x, y)=x_1y_1 + \cdots x_n y_n$ over $\mathbb{F}_2^n$, then we can take $W = \{u \in \mathbb{F}_2^n : b(u,u) = 0\}$, which is equal to the hyperplane $\{u\in \mathbb{F}_2^n:\sum u_i = 0\}$, since $a^2=a$ for any $a \in \mathbb{F}_2$. The form restricted to $W$ is alternating by the definition of $W$.

One of the basic notions associated to a bilinear form is that of an isotropic subspace, which shows up in the construction of Conlon and Ferber as well, as discussed in the previous post. An isotropic subspace is a subspace $S$ such that $b(u, v) = 0$ for any two, not necessarily distinct, $u,v$ in $S$ (such a subspace is usually a totally isotropic but we’d ignore that here). For an arbitrary subset $A$ of $V$, if we denote the set $\{u\in V:b(u,v)=0\forall v\in A\}$ by $A^\perp$, then $S$ being isotropic is the same as saying $S \subseteq S^\perp$. Note that $A^\perp$ is always a vector subspace. The following is another fundamental result whose corollary is what we used in the previous post.

Lemma 2: Let $b$ be a non-degenerate reflexive bilinear form over a finite dimensional vector space $V$. Then for any subspace $S$, we have $\dim S + \dim S^\perp = \dim V$.

Corollary 3: An isotropic subspace with respect to a non-degenerate (reflexive) bilinear form over an $n$ dimensional vector space has dimension at most $n/2$.

The max dimension of an isotropic subspace is known as the Witt index of the bilinear form. It can be shown that every non-degenerate alternating form has Witt index exactly equal to $n/2$ (see Theorem 3.8 in Simeon Ball’s book). This also implies that there are no non-degenerate alternating forms over odd dimensional vector spaces.

Let us now look at the graph that plays the main role in the construction of Conlon and Ferber. Let $G$ be the orthogonality graph on the set of isotropic vectors in $\mathbb{F}_2^t$ with respect to the bilinear form $b(x, y) = x_1y_1 + \cdots x_t y_t$, with $t$ even. Then as we saw earlier, the set $S$ of isotropic vertices form a $t-1$ dimensional vector subspace on which $b$ is an alternating form. Moreover, this alternating form is degenerate because $(1, \dots, 1)$ is orthogonal to every other vector in $S$ since $t$ is even. As there is no other isotropic vector which is orthogonal to every other isotropic vector, the set vector space $S$ can be decomposed as $S = \langle (1, \dots, 1) \rangle\oplus \mathbb{F}_2^{t\text{-}2}$ such that the form induced on $\mathbb{F}_2^{t\text{-}2}$ is a non-degenerate alternating form. Therefore, we should be able to understand this graph starting from the non-degenerate alternating form. Let’s do that and start over.

Let $t$ be an even integer, and $b$ a non-degenerate alternating form on $\mathbb{F}_2^{t\text{-}2}$. Let $G$ be the orthogonality graph on these vectors, that is, $u \sim v$ if $b(u, v) = 0$. We can notice the following property.

Lemma 4: There are no independent sets of size $t$ in $G$.
Proof. Let $x_1, \dots, x_t$ be an independent set, and say $\sum a_i x_i = 0$. Then by applying $b$ on this sum and each $x_i$ we get that $(a_1, \dots, a_t)$ is in the null space of $J - I$. But $J - I$ has eigenvalues $t-1$ and $\text{-}1$, which are both non-zero (since $t$ is even), and thus $J - I$ is non-singular. Therefore, $a_i = 0$ for all $i$, and these vectors form an independent set in the vector space. This is a contradiction since the dimension of the vector space is strictly less than $t$.

While there can’t be any large independent sets, $G$ certainly has really large cliques. Any (totally) isotropic subspace of dimension $(t - 2)/2$ gives rise to a clique of size $2^{t/2 - 1}$, and in fact these are the largest possible cliques. So, trivially $G$ does not give any good lower bound on the Ramsey number $R(t, t)$. Still, we can apply the basic idea of Conlon and Ferber (which was also the main idea of Mubayi and Verstraete) and take a random induced subgraph of $G$, hoping that it can destroy all large cliques while giving us a graph of reasonable size. (Note that any induced subgraph will also be free of any independent sets of size $t$.) For that, let’s first count the total number of cliques of size $k$ in $G$.

Every clique in $G$ spans an isotropic subspace of $\mathbb{F}_2^{t-2}$, which has a certain rank $r$. The number $N_{k,r}$ of cliques of size $k$ and rank $r$ can be bounded from above by $2^{t-2} 2^{t - 3} \cdots 2^{t - 2 + r -1} 2^{(k - r)r}$, where the first $r$ terms correspond to choosing $r$ linearly independent members of the clique and then we pick the remaining $k - r$ elements from this $r$-dimensional subspace. By simplifying, we have

$N_{k, r} \leq 2^{(t+ k - 3(r+1)/2)r}$.

This number is increasing in $r$ from $r = 1$ until $r = (t + k)/3 - 1$, and then decreases till $r = 2(t + k)/3 -1$, where it finally becomes $1$ (unless $2(t + k)/3 - 1 > t - 1$, which is a trivial case that we’ll ignore). But, we know that $r \leq t/2$ from Corollary 3, and hence $\max_{r} N_{k, r} = N_{k, t/2 - 1}$ for $(t + k)/3 - 1 \geq t/2 - 1$, i.e., for $k \geq t/2$, and $\max_{r} N_{k, r} = N_{k, (t + k)/3 - 1}$ for $k \leq t/2$. While we are interested in $N_{t} = \sum_{r = 1}^{t/2 - 1} N_{t,r}$, we will soon need the estimates for smaller cliques as well.

Repeating the probabilistic argument of Conlon and Ferber, we see that if we sample each vertex independently with probability $p = n2^{-t + O(1)}$, then there exists a subset of vertices of size $n$ inducing no clique of size $t$ as long as we can ensure that $n^{t} 2^{-t^2 + o(t^2)} N_t < 1/2$. Since $N_t \leq \frac{t}{2} \max_{r} N_{t,r} = \frac{t}{2} N_{t, t/2} \leq 2^{5t^2/8 + o(t^2)}$, we can take $n = 2^{3t/8 + o(t)}$, which then gives us the bound $R(t, t) > 2^{3t/8 + o(t)}$ on the Ramsey number $R(t,t)$. This is worse than the purely probabilistic bound of $2^{t/2}$. Can we do anything better?

Here’s an idea that is extracted from the paper of Conlon and Ferber. We can look at cliques, and independent sets, of size smaller than $t$ in this graph. Perhaps for some $0 < a < 1$, we can get good lower bound on $R(at, at)$ by taking a random induced subgraph of $G$ that does not contain any cliques of independent sets of size $at$.

For that, let’s count the number $I_k$ of independent sets of size $k$ in $G$. For $k$ even, the proof of Lemma 4 tells us that any such set will also be an independent set of vectors. Therefore, $I_k$ is at most $\prod_{i = 0}^{k-1} 2^{t - i - 2}$ or $2 \prod_{i = 0}^{k-2} 2^{t - i - 2}$, depending on whether $k$ is even or odd. In any case, for $k = at$, we have $I_{at} \leq 2^{(2a - a^2)t^2/2 + o(t^2)}$.

Recall that we had $N_{at} \leq 2^{(1+a)^2 t^2/6 + o(t^2)}$ for $a \leq 1/2$ and $N_{at} \leq 2^{(4a+1)t^2/8 + o(t^2)}$ for $a \geq 1/2$. Therefore, we see that asymptotically $N_{at}$ is always bigger than $I_{at}$. Thus, the probabilistic argument from before gives us a subset of $n$ vertices on which the induced graph has no cliques of size $at$ and no independent sets of size $at$ as long as $n^{at} 2^{-at^2 + o(t^2)} (N_{at} + I_{at}) < 1/2$, and since $N_{at} \geq I_{at}$ (for $t$ large enough), it suffices to pick an $n$ for which $n^{at} 2^{-at^2 + o(t^2)} N_{at} < 1/2$. We can achieve this by taking $n = 2^{cat + o(t)}$ where $c = (4a - 1)/(8a^2)$ for $a > 1/2$ and $c = (4a - a^2 - 1)/(6a^2)$ for $a \leq 1/2$.

Sadly, both of these values of $c$ are $\leq 1/2$, with equality only at $a = 1/2$. Therefore, we are always worse than the basic probabilistic bound except for the case of $a = 1/2$ where we can show $R(t/2, t/2) > 2^{t/4 + o(t)}$. This asymptotically matches the probabilistic bound but we arrive at it via a different (and much more complicated) construction. For improvements, using this easy method, it looks like we’ll have to consider some other graphs. This new approach definitely gives me hope that one can find graphs with a better control on the total number of cliques and independent sets, which can then be used to give an exponential improvement in the lower bound on two-colour diagonal Ramsey numbers.

## Covering the binary hypercube

A finite grid is a set $S = S_1 \times \cdots \times S_n \subseteq F^n$, where $F$ is a field and each $S_i$ is a finite subset of $F$. The minimum number of hyperplanes required to cover $S$ can easily be shown to be $\min |S_i|$, with the hyperplanes defined by $x_1 - \lambda = 0$, for $\lambda \in S_j$ providing such a cover (assuming $j$ is the index minimising $|S_i|$). Interestingly, if we forbid the hyperplanes to pass through one specific point of $S$, say $\vec{a} = (a_1, \dots, a_n)$, and then want to cover everything else, then the minimum number of hyperplanes required is equal to $\sum_{i = 1}^n (|S_i| - 1)$. This number can be pretty large compared to the previous one.

This is the famous result of Alon and Füredi from 1993 which was proved in its full generality using the polynomial method. They were answering a question of Komjáth, which was for the special case of $S = \{0, 1\}^n$, but this same result had in fact been proved already for another special case in the late 70’s. Jamison, and Brouwer-Schrijver, proved this bound for $F = \mathbb{F}_q$ and $S = \mathbb{F}_q^n$. They were motivated by a question in finite geometry about the blocking number of an affine space, asked by Jean Doyen, which is equivalent to the dual statement. Their proofs were also algebraic and in fact they are one of the first instances of what we now call the polynomial method (see this for a history). In finite geometry circles, this method was sometimes even referred to as the Jamison method.

In the 90’s, Bruen proved an interesting generalisation of the Jamison/Brouwer-Schrijver result, motivated by problems in finite geometry, where the (multi)set of hyperplanes covered every point except $\vec{a} \in \mathbb{F}_q^n$ at least $k$ times, while not covering $\vec{a}$ at all. He proved the lower bound of $(n + k - 1)(q-1)$ and asked about the sharpness of this bound. The sharpness was addressed in some later papers, for example by Simeon Ball and then Corrado Zanella. In 2009, Ball and Serra generalised this covering with multiplicities result to the case of arbitrary finite grids by proving a general lower bound on the degree of polynomials that vanish everywhere on the grid with a certain multiplicity except at one point. Recently, there has been a lot of interesting activity in this hyperplane covering problem with multiplicities, for the special case of binary hypercubes. This is what I want to focus on in this blog post. I also want to mention some interesting new results over the binary field $\mathbb{F}_2$ that I have proved in collaboration with Simona, Shagnik and Tamás, which nicely complements this recent work.

Let’s first write down the main extremal functions that we are interested in. Given a field $F$ and positive integers $n, k, s$, with $s \leq k - 1$, define a $(k, s)$-cover of $\{0, 1\}^n \subseteq F^n$ as a multiset $\mathcal{H}$ of hyperplanes with the property that there are exactly $s$ element of $\mathcal{H}$ passing through the origin and for every point $\vec{p}$ in $\{0, 1\}^n \setminus \{\vec{0}\}$ there are at least $k$ element in $\mathcal{H}$ passing through $\vec{p}$. We then define:

$h(n, k, s) = \{\min |\mathcal{H}| : \mathcal{H} \text{ is a } (k, s)-\text{cover of } \{0, 1\}^n\}$;

$g(n, k) = \min_{s} h(n, k, s)$;

$f(n, k) = h(n, k, 0)$.

Note that $g(n, k)$ has the following natural reformulation: it is the smallest number of hyperplanes that cover every non-origin point at least $k$ times but only cover the origin at most $k - 1$ times. All of these function are the same for $k = 1$ and equal to $n$ by the result of Alon and Füredi. It is also an easy exercise to show that $f(n, 2) = g(n, 2) = n + 1$ for any field $F$. I have hidden the field $F$ in the notation of these functions, and so I’ll make it clear what field we are talking about before making any statement.

Last year, Clifton and Huang studied the function $f(n, k)$ when $F = \mathbb{R}$ and proved the following results:

$f(n, 3) = n + 3$,

$n + k + 1 \leq f(n, k) \leq n + \binom{k}{2}$, for all $k \geq 4$ and $n \geq 3$, and

$f(n, k) = (1 + 1/2 + \cdots + 1/n + o(1))k$ for fixed $n$ and $k \rightarrow \infty$.

Moreover, they conjectured that we should have $f(n, k) = n + \binom{k}{2}$ for every fixed $k \geq 3$ and $n$ large enough as they couldn’t find any better construction than the one where you take the $n$ hyperplanes defined by $x_i = 1$ and $k - t$ copies of hyperplanes $\sum_{i = 1}^n x_i = t$, for $1 \leq t \leq k - 1$. This seems to be a very challenging problem.

Very recently, Sauermann and Wigderson have managed to improve the Clifton-Huang lower bounds on $f$ by proving a general result about polynomials vanishing with multiplicities on binary hypercubes. In particular, they prove the following results:

1. $h(n, k, k - 1) = n + 2k - 2$ for any field $F$.
2. $g(n, k) = n + 2k - 3$ for all $n \geq 2k - 3$ if we have $\mathrm{char}~F = 0$.

Note that since $f(n, k) \geq g(n, k)$ by the definition of these function, the second statement improves the lower bound of Clifton and Huang for $n \geq 2k - 3$ (as the field $\mathbb{R}$ has characteristic $0$). They also ask about extending their results to fields of positive characteristic and give some examples over finite fields that shows how their proof fails to give the same bound on $g(n, k)$ (or $f(n, k)$. In particular, over $\mathbb{F}_2$ they show that one can’t prove that $f(n, 4) \geq n + 5$ using their method.

Last year, while I was still in Berlin, I suggested studying these problems in our group workshop and we managed to obtain some interesting preliminary results for the case of $\mathbb{F}_2$ with my colleagues. We now have a paper, which is soon going to be posted on the arXiv, where we prove the following results for $F = \mathbb{F}_2$.

Theorem 1. For $n > 2^k$, $g(n, k) = n + 2k - 3$. For $k \geq 2^{n - 2}$ , $g(n, k) = 2k - \lfloor k/2^{n-1} \rfloor$.

Our proofs don’t involve any polynomials. We simply use some combinatorial arguments where one crucial ingredient in our proof is a coding theoretic interpretation of the function $f(n, k)$ that implies the following:

Theorem 2. For all $k \geq 2$ and $n \geq 1$, $f(n, k) \geq n + \lfloor (k -1)/2 \rfloor \log (2n/(k-1))$.

(One can also show the upper bound of $f(n, k) \leq n + (k - 1) \log(n/(k-1))$.)

Another important ingredient is the following statement, which we prove via induction, but it in fact follows from the work of Sauermann and Wigderson as well:

Lemma 3. For $n, k \geq 1$, we have $h(n, k, k - 1) = n + 2k - 2$.

Moreover, because our methods are combinatorial, we can generalise to covers using smaller dimensional subspaces, which is something that’s not at all easy to do in the polynomial method. If we denote by $g(n, k, d)$ the function where the cover involves $(n - d)$-dimensional affine subspace of $\mathbb{F}_2^n$, then we show the following:

Theorem 4. For $n > 2^{k2^d -d - k + 1}$, $g(n, k, d) = k2^d + n - d - 2$. For $k \geq 2^{n - d - 1}$, $g(n, k, d) = k2^d - \lfloor k/2^{n-d} \rfloor$.

This result in fact generalises the binary case of Jamison’s theorem who proved that for $F = \mathbb{F}_q$ we have $g(n, 1, d) = q^d - 1 + (n - d)(q - 1)$. (though there is no condition on $n$ being large enough in Jamison’s theorem)

Sadly, our methods don’t give us anything new for the case of $F = \mathbb{F}_q$ when $q > 2$. I believe that for larger fields one needs to improve upon the existing algebraic methods. Also, over $\mathbb{F}_2$ while we know that the threshold $k \geq 2^{n - 2}$ in Theorem 1 is tight, there is a lot of room for improvement in the bound $n \geq 2^k$. We suspect that except for some small exceptions (that arise due to the Golay code), the correct threshold should be $k + 1$. More specifically, we believe that $g(n, k) = n + 2k - 3$ for $n \geq \max\{k+1,13\}$. We do show that $g(k, k) \leq 3k - 4$ for all $k$, and thus this threshold of $k + 1$, if true, would be tight.

I would love to see a proof (or a counterexample) to the following simpler statement,

$g(n, k) = n + 2k - 3$ for all $n \geq poly(k)$.

It will also be very interesting to determine what happens between these extreme ranges of parameters, for example when $n \sim k^{c}$ for some constant $c < 1$. We leave these as open problems and we are looking forward to more new developments in this topic which can help solve them.

Edit 29/01/2021: The paper is now available on the arXiv.

## Improved lower bounds for multicolour diagonal Ramsey numbers

Big news in combinatorics today: David Conlon and Asaf Ferber have posted a 4-page preprint on arXiv that gives exponential improvements in the lower bounds on multicolour diagonal Ramsey numbers, when the number of colours is at least $3$ (also see the post by Gil Kalai). The best known lower bounds so far were completely probabilistic, combined with an inductive argument of Lefmann that gave better bounds for $\geq 4$ colours. The new construction on the other hand has a deterministic part to it, which in fact uses finite geometry. In this post I’ll describe their construction, and also show how to improve their bounds further using a simple observation.

The multicolour Ramsey number $R(t; \ell)$ is the minimum $n$ such that any colouring of the edges of $K_n$ using $\ell$ colours gives rise to a monochromatic $K_t$, which is known to exist by Ramsey’s theorem. Ignoring the lower order terms (which is what the big recent breakthrough of Ashwin Sah improved for $\ell = 2$), the best upper bound we have for all $\ell \geq 3$ is $R(t; \ell) \leq \ell^{\ell t}$, and improving the base of the exponent from $\ell^\ell$ to anything smaller seems out of reach by our current methods. As far as the lower bounds are concerned, the best lower bounds for $\ell = 3, 4$ are $R(t; 3) \geq (\sqrt{3})^t$ and $R(t;4) \geq 2^t$, as shown by Erdős in 1940’s. For larger $\ell$, one can improve the lower bound of $R(t;\ell) \geq \sqrt{\ell}^t$ by using the following observation of Lefmann:

$R(t; \ell_1 + \ell_2) - 1 \geq (R(t;\ell_1) - 1)(R(t;\ell_2) - 1)$.

Conlon and Ferber improve all of these bounds as a corollary to the following result:

Theorem 1. For any prime $p$, $R(t; p + 1) > 2^{t/2} p^{t/3 + o(t)}$.

This, along with Lefmann’s observation, gives an exponential improvement for the known lower bound on $R(t;\ell)$ for all $\ell \geq 3$.

As the appearance of the prime $p$ in this result might suggest, it uses finite fields. In fact, it uses the basic theory of polar spaces/quadratic forms over finite fields. By giving more precise estimate on a certain count in their proof, I can show the following:

Theorem A. For any prime $p$, $R(t; p + 1) > 2^{t/2} p^{3t/8 + o(t)}$.

Note that this is an exponential improvement. Let’s see how we can prove this.

Let $p$ be a prime, and let $V$ be the set of all isotropic vectors in $\mathbb{F}_p^t$ with respect to the symmetric bilinear form $B(x, y) = x_1 y_1 + \cdots + x_t y_t$. In other words, $V = \{(x_1, \dots, x_t) \in \mathbb{F}_p^t : x_1^2 + \cdots + x_t^2 = 0\}$. It is well known, and in fact not so hard to show, that $|V| \sim p^{t - 1}$. Though all we need for the bounds is that $|V| = p^{t - O(1)}$, which is what Conlon and Ferber show in their paper (more precisely, they show $|V| \geq p^{t-2}$). Consider the following edge colouring of the complete graph on $V$: the edge $\{x, y\}$ is coloured with $i \in \mathbb{Z}$ if $B(x, y) = i \pmod{p}$ with $i \neq 0$, and if $B(x, y) = 0$ then it is coloured from the set $\{p, p + 1\}$ uniformly at random and independently for all such edges. Thus we have a $(p + 1)$-colouring of the complete graph on $|V|$ vertices, where the colours are just the integers $1, \dots, p + 1$.

In each colour $1 \leq i \leq p - 1$ we can show algebraically that the graph on these edges has no cliques of size larger than $t$, assuming $t$ is not divisible $p$, which we will assume (for the exact argument see the paper of Conlon and Ferber). Therefore, we are aiming for a lower bound on $R(t+1; \ell)$, but asymptotically it would be the same for $R(t; \ell)$. Now with the random colouring in the last two colours, Conlon and Ferber show that for $n = 2^{t/2}p^{t/3 + o(t)}$, there exists a subset of $V$ of size $n$ such that the complete graph on these vertices, with the same edge-colouring as before, has no monochromatic clique of size $t$ in colours $p$ or $p + 1$. This is done by choosing a random subset where each vertex is chosen with a certain probability which is a function of $n$. Therefore, we have a random colouring of certain pairs of vectors, and a random subset of all the isotropic vectors.

A crucial step in the estimation used in the proof, that leads to this particular choice of $n$, is an upper bound on the number of cliques of size $t$ in the graph defined on $V$ where two vectors are adjacent if they are orthogonal to each other, that is, $x \sim y$ if $B(x, y) = 0$. Note that in our edge colouring, the edges of this graph were the ones that got colours from $\{p, p + 1\}$. Let’s denote this number by $N_t$. Each such clique is a collection $x_1, \dots, x_t$ of vectors in $\mathbb{F}_p^t$ such that $B(x_i, x_j) = 0$ for all $1 \leq i \leq j \leq t$. Conlon and Ferber estimate the number $N_{t, r}$ of such cliques that have rank $r$, that is, the dimension of the space spanned by $x_1, \dots, x_t$ is $r$, by $N_{t, r} \leq p^{2tr - 3r^2/2 + r/2}$, and then just use the upper bound $N_t = \sum_{r = 1}^{t} N_{t, r} \leq t \max_{r = 1}^t N_{t, r}$.

Until this point it’s all fine, but what we can improve upon is the argument that follows. They say that $N_{t, r}$ is maximized for $r = 2t/3 + 1/6$ (which is not at all hard to see) and thus conclude that $N_t \leq p^{2t^2/3 + o(t^2)}$. But they miss a crucial linear algebraic fact at this point. The rank of a clique can never be larger than $t/2$. This is an easy fact that follows from (i) $\dim S + \dim S^\perp = t$ for any subspace $S$, and (ii) $S \subseteq S^\perp$ for a totally isotropic subspace, which is exactly what the span of the vectors forming the clique is going to be. In terms of the theory of polar spaces (or quadratic forms), it says that the rank of a polar space is always at most half of the dimension of the underlying vector space (something all the finite geometers must be familiar with). From this we see that $N_t \leq t N_{t, t/2} \leq p^{5t^2/8 + o(t^2)}$. After this, if you do the calculation of Conlon-Ferber then you’ll see that we can take $n = p^{3t^2/8 + o(t^2)}$, thus proving Theorem A. $\square$

Remark: If you are familiar with the theory of polar spaces, then you can easily estimate this number $N_t$, as each such clique must be contained in a generator (maximal totally isotropic subspaces) and you can count the total number of generators (see for example Theorem 4.11 in the book by Simeon Ball). Depending on the type of the polar space, you can even get an exact count but here we are just interested in the asymptotic results.

What’s next?

I think it’s really exciting that finite geometry, or in fact anything not completely probabilistic, can be used in the case of diagonal Ramsey numbers to improve the known bounds. This was recently shown to be true for the two-colour off-diagonal case by Mubayi and Verstraete, which was extended to multicolours by He and Wigderson, as I describe here and here. Therefore, finite geometry is paying a crucial role in developing Ramsey theory. I think a deeper understanding of these constructions, and new ideas from finite geometry (perhaps group theoretic ones) can lead to even further progress.

Update 28/09/20: Yuval Wigderson has managed to improve the bound further for $\ell \geq 4$ colours (see this).

## The dual version of Ryser’s conjecture

I talked about our new results related to Ryser’s conjecture in a previous post (also see an even earlier post). The conjecture, and its variants, have some interesting equivalent formulations in terms of edge colourings of graphs. While I was vaguely aware of that, I never looked into it in detail. Thanks to the new paper of DeBiasio, Kamel, McCourt and Sheats, I have now decided to study it properly (and share what I understand). In this post I would like to explain and prove this so-called duality.

Consider the following two statements (and try to prove them without reading forward):

(1) Given $n$ points in $\mathbb{R}^2$, let $k$ be the maximum number of points that you can choose from these such that no two points share an $x$ or a $y$ coordinate. Then these $n$ points can be covered by $k$ axis parallel lines, i.e., horizontal or vertical lines.

(2) Let $G$ be a graph of independence number $\alpha$ whose edges are coloured red or blue. Then the vertices of $G$ can be covered by $\alpha$ monochromatic sub-trees (different trees can have different colours but all the edges of one tree must have the same colour).

While both of these statements do have the common feature of being covering problems, it is not immediately clear that they are in fact both implied by the classical theorem of Kőnig. Ryser’s conjecture is a generalisation of this, which in terms of (1) can be seen as a generalisation to $\mathbb{R}^n$ and in terms of (2) can be seen as a generalisation to $r$ colours. We will focus on (2) here, as the connection to (1) is sort of immediate and I don’t know if the geometric point of view has any clear advantage.

Just as a recap, here is what Ryser conjectured. Given an $r$-partite $r$-uniform hypergraph $H$, we have $\tau(H) \leq (r - 1) \nu(H)$, where $\tau$ is the vertex cover number of a hypergraph and $\nu$ is the matching number.

For a graph $G$, let $tc_r(G)$ denote the smallest number of monochromatic connected subgraphs whose vertices cover $V(G)$, over all $r$-edge colourings of $G$. Here a monochromatic connected subgraph can also be a single vertex. You can in fact take these subgraphs to be the connected components in the graph $G_i$ formed by taking the edges coloured by the $i$-th colour. If the grah $G_i$ is disconnected, then you get single vertex monochromatic subgraphs. Gyárfás observed that Ryser’s conjecture is equivalent to $tc_r(G) \leq (r - 1) \alpha(G)$, for all graphs $G$. We will soon see why this is so.

Firstly, to get some familiarity with $tc_r(G)$, observe that the well known fact that for any graph $G$, either $G$ is a connected graph or its complement $G^c$ is a connected graph, can be stated as $tc_2(K_n) \leq 1$ for all $n$. This is because we can look the edges of $G$ and $G^c$ giving us the $2$-edge colouring of $K_n$, where $n = |V(G)|$. Also note that $tc_r(G) \leq n_i$, where $n_i$ is the number of connected components in the graph $G_i$, given an $r$-edge colouring of $G$. The notation $tc$ stands for tree cover, which comes from the fact that any connected graph has a spanning tree, and thus instead of covering with monochromatic connected subgraphs we can equivalently cover with monochromatic trees.

Let’s prove that these two conjectures are equivalent. So, say Ryser’s conjecture is true, and let $G$ be a graph with each of its edge given a colour from $\{1, \dots, r\}$. Denote the subgraph formed by edges whose colour is $i$ by $G_i$. Construct a hypergraph $H$ as follows. The vertices of $H$ are the connected components of $G_i$, for $i = 1, \dots, r$. These vertices are clearly partitioned into $r$ sets, corresponding to the colour $i$. For each vertex $v$ of $G$, define a hyperedge of $H$ as the set of all monochromatic connected components that contain $v$. Each hyperedge has size $r$ since for each $i$ the vertex $v$ is contained in a unique component of $G_i$. Thus, we have an $r$-partite $r$-uniform hypergrah $H$. See the images below for some concrete example.

Now a matching in $H$ corresponds to a set of vertices in $G$ such that no two of these vertices are together in any monochromatic connected component. In particular, there is no edge between in any two of these vertices. Therefore, $\nu(H) \leq \alpha(G)$. A vertex cover of $H$ corresponds to a collection of monochromatic components that cover all the vertices of $G$ (as they correspond to the hyperedges of $H$). Therefore, $tc_r(G) \leq \tau(H)$. Since we assumed that Ryser’s conjecture is true, we also have $\tau(H) \leq (r - 1) \nu(H)$. Combining all of these inequalities, we get $tc_r(G) \leq (r - 1) \alpha(G)$.

Now assume that the conjecture of Gyárfás is true, and let $H$ be an $r$-partite $r$-uniform hypergraph. Construct a graph $G$, where each hyperedge is a vertex of $G$ and two vertices are adjacent if the corresonding hyperedges intersect non-trivially. We colour an edge $uv$ of $G$ by the colour $i$ if the hyperedges corresponding to $u$ and $v$ meet each other in the $i$-th part of the $r$-partition. If they meet in more than one parts, then pick a colour arbitrarily. Again, for concreteness, we can see that in the first image above starting from the hypergraph on the right hand side we reach the graph on the left hand side. In the second case, however, we get the following:

It follows directly from the definition that independent sets in the graph are in bijective correspondence with matchings in the hypergraph, and thus $\alpha(G) = \nu(H)$. A monochromatic connected component in $G$, which is in fact a clique, corresponds to the vertex of the hypergraph where all of the hyperedges meet. Therefore, a cover of the vertices of the graph using monochromatic connected components gives rise to a vertex cover of $H$ of the same size, which implies that $\tau(H) \leq tc_r(G)$. Since $tc_r(G) \leq (r - 1) \alpha(G) = (r - 1)\nu(H)$, we get $\tau(H) \leq (r - 1) \nu(H)$, thus proving the equivalence of these conjectures.

This duality can be applied to other variants of Ryser as well. For example, in my work with Shagnik, Patrick and Tibor, we studied the min vertex cover size for the case when the hypergraph is $t$-intersecting, i.e., any two hyperedges share at least $t$ vertices. This can be seen as determining min $tc_r(K_n)$ with the added condition that in the edge colouring any two vertices are contained in at least $t$ different monochromatic components. Another variant that we studied was that of $k$-wise $t$-intersecting hypergraphs. This can be seen as determining the tree cover number of the $k$-uniform complete hypergraphs, where every set of $k$ vertices are contained in at least $t$ different monochromatic components. In this formulation Király had already solved the problem for $k \geq 3, t =1$, which we were unaware of, and our work resolves the general case of arbitrary $t \geq 1$. This complete knowledge of the extremal function for $k \geq 3$, and very limited knowledge for $k = 2$ is quite interesting.

One particular advantage of the edge colouring formulation is that we have a richer structure there that can be exploited to study interesting variants that have no natural formulation in terms of hypergraphs. For this, and more, check out the paper of DeBiasio, Kamel, McCourt and Sheats. On the other hand, the hypergraphs formulation somehow feels more natural, and in fact many proofs (for example the $r=3$ case of Ryser) are best stated in terms of hypergraphs. So in conclusion, it’s a good idea to be aware of both.