## The Trifference Problem

What is the largest possible size of a set $C$ of ternary strings of length $n$, with the property that for any three distinct strings in $C$, there is a position where they all differ?

Let $T(n)$ denote this largest size. Trivially, $T(1) = 3^1 = 3$ as you can take all possible ternary strings of length one. After some playing around you can perhaps prove that $T(2) = 4$ (I encourage you to try it so that you understand the problem). With a bit more effort and the help of a computer, you might also be able to show that $T(3) = 6$, and $T(4) = 9$. For example, here is a set of nine ternary strings showing that $T(4) \geq 9$: $0000, 0111, 2012, 2201, 2120, 0111, 1012, 1101, 1210$. You should check that for any three strings from these nine, there is at least one position where the values that you get is a permutation of $\{0, 1, 2\}$.

Things start getting much harder for larger lengths. In fact, despite this problem being more than 50 years old, only last year it was shown that $T(5) = 10$ and $T(6) = 13$ (see this for the details). Computing the value of $T(7)$ is already an open problem!

Perhaps a much more tractable problem than computing $T(n)$ exactly is to find out how this function behaves. For example, does it grow exponentially in $n$? The answer is yes, as it was shown by Körner and Marton in 1988:

$T(n) \geq \left( \frac{9}{5} \right)^{n/4}.$

Surprisingly, that is still the best lower bound that we know! The situation is even more surprising for upper bounds, where the following result of Körner from 1973 is still essentially the best bound:

$T(n) \leq 2 \left(\frac{3}{2}\right)^n.$

The constant $2$ in front of the exponential term can be improved by using the new values of $T(n)$ for $n = 5, 6$, but we have absolutely no idea how to improve the base $3/2$ of the exponent. Here is a quick proof for this upper bound:

Let $C$ be such a collection of strings. Consider the three sub-collections $C_1, C_2, C_3$, where $C_1$ = all strings in $C$ starting from 0 or 1, $C_2$ = all strings starting from 1 or 2, $C_3$ = all strings starting from 2 or 0. Then $2|C| = |C_1| + |C_2| + |C_2|$ since each string in $C$ is counted twice on the right-hand side. Note that the sets $C_i$‘s also have the trifference property, and since we know the first letters in them, they all have size at most $T(n-1)$. This gives us the recurrence $2T(n) \leq 3T(n - 1)$, that implies the bound. $\square$

The Trifference Problem, is the problem of understanding the limit

$\lim_{n \rightarrow \infty} \frac{\log_3(T(n))}{n}.$

The bounds above show that $\liminf \log_3(T(n))/n \geq 0.133$, and $\limsup \log_3(T(n))/n \leq 0.369$.

It is ridiculous how none of the combinatorial methods that have been developed in the past 50 years have led to any improvement in the bounds on $T(n)$. For example, Costa and Dalai have shown that the slice rank method, which was recently used to make breakthroughs on the cap set problem, and many other combinatorial problems, cannot be used in any obvious way to improve the upper bound on $T(n)$.

As we cannot make progress on the original problem, we can try to put some extra restrictions to see if that leads to any better bounds. This is precisely what Pohoata and Zakharov did last year, and studied the variation of the problem where the set $C$ is a vector subspace of the set $\mathbb{F}_3^n$, where $\mathbb{F}_3$ is the finite field of three elements. While it is somewhat natural to put such a restriction, it is also motivated by the fact that the lower bound on $T(n)$ by Körner and Marton relies on a probabilistic lifting of a trifferent code $C$ which is a vector subspace of $\mathbb{F}_3^4$ (it is in fact the set of strings of length $4$ that I wrote above). Moreover, the best known explicit constructions of these trifferent codes also happen to be linear.

Let $T_L(n)$ be the largest size of a linear trifferent code, that is, a vector subspace $C$ of $\mathbb{F}_3^n$ with the property that for any three distinct vectors in $C$, there is a coordinate where they all differ. Pohoata and Zakharov, showed that

$T_L(n) \leq 3^{\left(\frac{1}{4} - \epsilon\right)n},$

which is much better than the bound that we could prove on $T(n)$! The $\epsilon$ in their result is a small positive number that is not determined explicitly.

In my recent work with Jozefien D’haeseleer, Dion Gijswijt, and Aditya Potukuchi, we have managed to improve this bound to

$T_L(n) \leq 3^{\frac{n}{4.55}}.$

More interestingly, we have shown that essentially the same lower bound can be proven on $T_L(n)$ that was known for $T(n)$,

$T_L(n) \geq \frac{1}{3} \left( \frac{9}{5}\right)^{n/4}.$

It is worth noting that our construction is very different from the one given by Körner and Marton, as that is far from being linear. Our lower bound gives us a new motivation for studying $T_L(n)$. As $T(n) \geq T_L(n)$, we can focus on improving the lower bounds on linear trifferent codes to get an improvement on trifferent codes. Even a tiny improvement to our lower bound (which asymptotically matches the best lower bound in the trifference problem), will be a breakthrough.

In the next few posts, I will sketch the arguments that we have used to make these improvements. We will also see exciting new connections to other problems in mathematics, on blocking sets, minimal codes, and expander graphs, that we establish in our paper. In fact, we will see a bunch of problems that are equivalent to determining $T_L(n)$, and thus any one of those could be studied to make progress on the trifference problem.