Scientific proofs that we can rely on!

Nov 27, 2024

The total number of submissions in arXiv, the open-access repository of academic manuscripts is 2.6 million, as of November 2024. Here is a graph of new monthly submissions to the repository

This shows you a picture of the sheer volume of research being done by researchers all across the globe. Naturally comes a question: are all of these researches completely valid? Are the 2.6 million claims made by these researches are all true? (at least, based on whatever we, as humankind, know so far).

Well, since the “truth” can be subjective for different topics (e.g. economics, where you can have multiple equally valid but contradicting theories), let’s turn our focus on the research for the “Mathematics” category alone. From 2009 onwards, there have been more than 354 thousand such manuscripts:

It is STILL a pretty big number if we want to verify the validity of the proofs of different claims made by each of these manuscripts.

Now think of you as a young researcher venturing into the world of mathematical research. You will possibly go to this arXiv, get a bunch of papers on a related problem that you are working on, and you would start thinking about how you can apply the results in those papers to your specific problem.

But, is the results present in those paper is true?

You have no direct way of knowing for sure.

Many of these papers would probably be peer-reviewed, and you can trust the academic honesty and integrity of the authors and the reviewers.
The other approach is to try to verify the proofs presented in those papers yourself. But then, those authors will cite another result from an earlier paper, and you gotta verify the earlier paper now. And, you can expect, the chain continues that way!

To give you a better analogy: Let’s say you receive twenty thousand rupees of cash from someone after you sell your old laptop to the guy. What would you do after you get the cash? You would possibly verify if the denominations you received are legal tenders or not, you will verify the authenticity. And you have a direct way to verify that, look for some special marks here and there. But imagine instead just asking the person “Are the denominations legal?”, and once the person verifies everything is okay, would you believe him for sure?

Such a practice based on trust and academic integrity is at the foundation of academic research of mathematics, which was supposed to be based on cold-harsh facts based on logical deductions starting with the axioms. So it seems that we have diverged from the principles of mathematics, at least at the current times, where proofs of mathematical results are to be taken with a pinch of salt. While I point it out specifically for mathematics, so is true for all research based on mathematical theories, including statistics, machine learning and emerging studies on artificial intelligence.

So, what is a proof?

The typical way to verify the correctness of a mathematical result is to produce a proof. A proof is a structured chain of arguments that can be used to deduce a result starting from a set of axiomatic truths. Now, each of these arguments presented must be one of valid logical propositions. Some examples are:

If a statement p is true, then “not p” must be false. (Meaning if 2 + 2 = 4, then 2 + 2 ≠ 4 must be false)
If a statement p is true, and “p implies q” is true, then q must be true. (For example, if Messi scoring a goal implies Argentina will win, and Messi scores a goal, then Argentina wins the game).

and so on.

And the axiomatic truths are like some basic principles, that we can no way of proving, but we can intuitively and empirically verify that they are in fact true. However, strictly rigorously speaking, every mathematical result ever should be stated as

(If the set of axioms …. hold true, then) so and so … result holds.

Since this part shown in the brackets applies to every single mathematical result, hence they are often omitted and are added implicitly by a careful reader.

Okay, what’s wrong with the proofs if they follow precise logic?

Well, nothing is wrong with the proofs. The problem is the way proofs are presented in research — through human language.

The basic problem with the proof is that they are presented in plain old English, which allows language tricks to be involved allowing different ways to bypass logical deductions. To illustrate, consider the following sentence:

“This statement is false”

It is impossible for a logically consistent person to write this using a logically consistent language. In particular, if the statement is true, then whatever it is saying is true, that means it must be false. On the other hand, if the statement is false, whatever it is saying is false, hence the statement itself must be true.

But since I can write it, my head is not blown apart yet, tells me that neither I am a logically consistent being nor the language I am writing in is logically consistent. Therefore, the problem with the validity of the proofs is that they are written in human language. Rather, we want to write them in a different language — the so called “formal language”.

And the translation process from human language to this formal language is called “formalization of mathematical proofs”, and so far, the best way to ensure the validity of a proof.

Formal Proofs

To understand formalization, we think to understand how mathematics is built from its core. Formalization is like playing with Lego bricks. The bricks (axioms) have a fixed design, and the rules (logical deductions) govern how they can be connected. While the pieces are simple, the combinations are endless—you can build anything from a tiny house to a spaceship, much like how mathematicians start with simple rules and create complex theorems.

To illustrate better, think of chess. The rules of chess are like axioms, those are very basic fundamental and simple rules.

Pawns move forward but capture diagonally.
Knights move in an L shape.
A queen can move any number of squares horizontally, vertically or diagonally.
The game ends when King is in checkmate.

Based on these rules alone, and the principle of logical deductions, people came up with all sorts of chess strategies (e.g. forks, pins, x-rays). For example,

Since a queen does what a rook or bishop can do and also more, its value must be higher than a rook or a bishop.
A bishop can stay only on same-coloured squares.

And here is an interesting theorem for you to try to prove yourself:

“If only a king and bishop versus a king remain on the board, the game is a draw because checkmate cannot be forced.”

You can prove it using the simple rules of chess alone.

Formalization of mathematical proofs is just demonstrating this connection to the basic axioms (or the theorems you have already proved), using these logical deduction steps.

So, how do we verify formalized proof?

To verify the formalized proof, we can simply verify if every statement present in the proof is a “valid” logical deduction step, and then if it ultimately reaches one or more of the axioms. To illustrate:

Formalization is very similar to the chain of legal documents to prove the ownership of land or real estate property. To verify the ownership, you need to

Verify that each prior transactions and owner transferships of the land is legally valid.
And verify that following the chain leads you back to some “authorized” property owners (like Government or some medieval-times king, minister or farmers, etc.)

Now for usual mathematical proofs, this chain can be VERY VERY LONG. Standard results would be around ten or twenty thousand steps from the axioms and possibly even more. Honestly! I have no idea.

Therefore, again it is impossible for a human to verify each of these steps, and that’s where a computer program such as LEAN comes into the picture.

LEAN was a software project launched by Leonardo de Moura from Microsoft Research in 2013. Over times, it has grown into a full blown programming language (and also a theorem prover), and is currently open-sourced. A thriving community around LEAN is currently maintaining the largest repository called “MathLib” containing the formal proofs of hundreds of mathematical results.

How does LEAN verify the proof? Can we trust it over humans?

LEAN is based on the principles of “dependent type theory”.

If you know of any programming language, imagine what is the first thing you have learnt there. It is possibly the use of variables and the different types of data it supports.

For example:

“Hello world” is a variable of type “string”.
2 is a variable of type “integer”
3.14 is a variable of type “float”
“true” is a variable of type “boolean”

and so on. And what will happen if you declare a statement like:

let x: number = "Hello world";

The program will start to throw an error that “Hello world” is not a numeric type value. Therefore, every programming language (or software) is equipped with the basic idea that there are valid datatypes, and you cannot assign a value of a different type to another type.

There are two fundamental reasons you can trust the “type safety” features of your programming language.

The functions or operations that you are going to perform will not be possible for a type-mismatched case. The same C-like compiler will throw an error. For example, you cannot multiply a “number” with a “string”.
Many of the types have different memory requirements. For example, an integer will require 4 bytes of memory to store, while a single character takes only a single byte. Often, the operating system prevents you from overwriting the memory space of one data type with another data type, just to avoid issues like “buffer overflows” or “reading unallocated memory”.

So now, LEAN takes this idea of “type safety” and embellishes it very much, which allows us to perform the verification of proofs. Let’s dive deeper into that.

The Hierarchy of Types

Let’s start with the variable x = 12. We know, the type of x is “number”. What this means is that “12 is a number”.

Then you might ask, what is the type of number? It is “type”. What this means is that “number is a type”.

Then you might ask, what is the type of type? Usually, most of the programming languages will break at this point: However, LEAN will not. LEAN says, the type of type is something, we will refer to that as “type 1”. What this means is that “type is a type1”.

Now you can ask, what is the type of type1? LEAN defines it as type2. What it means is that — “type1 is a type2”.

It can become very abstract like this, but essentially it shows that LEAN defines a hierarchy of types (like a hierarchy of infinities!), where every type<n> is an instance of type<n+1>. Thus, by doing this, it also allows the possibility to create arbitrary kinds of types, which do not usually exist in standard programming languages.

This also means EVERY SINGLE OBJECT in LEAN has Types.

Types Dependent on Types

Let’s now talk about functions. We will start with the first basic function that we always learn, the print() function. This function takes in any argument, and prints out the data on your screen — Simple right?

If every single object in LEAN has types, when what is the type of this function? Usually, if we consider multiplication for example, it takes two numbers and produces another number right? So, we can write its type like:

* : number x number -> number

But for print function, it is a bit tricky, as it can take any type as input. It can take “string” and shows that string on the screen, but it can also take a number as input. So, when we are calling print, there are actually a bunch of functions,

print.string : string -> void
print.number : number -> void
print.boolean : boolean -> void

And in general, for any “type” t, we have

print.t : t -> void   -- where t : type

This kind of functionality is called “polymorphism”, and it provides a useful way to call the same print function, while internally, the function figures out which version of print it needs to call based on its input.

A similar approach is also present for “+” function. For numbers, it just returns the sum of those numbers. But if you pass two strings, it returns the concatenated string.

"Hello" + "World" => "Hello World"
2 + 3 => 5
true + true => true

Therefore, the “+” function is also polymorphic where

+.t : t x t -> t  -- where t : type

This means, we constructed a new type (the type of the function), which is also dependent on another type “t”, which is why the theory is called “dependent type theory”.

Proving Theorems Using Dependent Type Theory

Now to prove theorems in LEAN, we use the following strategy:

Every hypothesis or theorem or mathematical result that we want to prove, is an instance of a special type called “Proposition” denoted as Prop. Therefore,

2 + 2 = 4 : Prop
(a - b)^2 = a^2 + 2ab + b^2 : Prop
b = 1 and 3a + 4b = 10 implies a = 2 : Prop

Note that, propositions can be true or false.

For every proposition, we have an associated type, namely the type of its proof. All proofs of that proposition are simply instances of that type. Therefore, to verify a proof or verify if a result is true, all we need to do is to produce an instance (or example) of it.
- And then, LEAN’s type checker will ensure that the instance or the example is valid proof.

To explain it further,

Q) How do you know if natural number exists or not?
A) Here is a natural number, 0. Since you can write statments like let x: Nat = 0 , the natural numbers must exist.
Q) How do you know if ghosts exists or not?
A) I don’t know. I cannot find examples to write like let x: Ghost = <something>
Q) How do you know if “b = 1 and 3a + 4b = 10 implies a = 2” is true or not?
A) We know “b = 1 and 3a + 4b = 10 implies a = 2” is true if its proof exists. And here is a proof <p>. As you can write statements like let x: Proof(b = 1 and 3a + 4b = 10 implies a = 2) = <p> , the statement is true.

Therefore, all that LEAN does is that it uses its very powerful type-checking abilities to verify the proof that you have written, is actually matching the type of the statement.

Just to illustrate, this is proof in LEAN that shows, if (a - b) = 4 and a*b = 1, then (a+b)-squared is equal to 20.

example {a b : ℚ} (h1 : a - b = 4) (h2 : a * b = 1) : (a + b) ^ 2 = 20 :=
  calc
    (a + b) ^ 2 = (a - b) ^ 2 + 4 * (a * b) := by ring
    _ = 4 ^ 2 + 4 * 1 := by rw [h1, h2]
    _ = 20 := by ring

If you look carefully, you will find that

We have {a b : ℚ} (h1 : a - b = 4) (h2 : a * b = 1), at first, which simply tells the first part of the proposition.
Then after a colon, we specify the proof type of the proposition i.e., “(a + b) ^ 2 = 20”.
Then by using :=, we simply produce an example instance of the type of proof of “(a + b) ^ 2 = 20”.

And the special statements like “ring” and “rw”, help LEAN’s type checker autocomplete some parts of these definitions, helping you to write the formal proofs.

What does it mean for the future of research?

In August 2024, in a special lecture at the Oxford Mathematics department, Terence Tao delivered a lecture on his take regarding how AI can help us in mathematical research — and a crucial ingredient of that is software like LEAN.

Generative AI, as it stands currently, is very “good” at generating things (like text, images, etc.) and is “not so good” at solving complex math problems due to its limited capabilities of logical reasoning. While we are trying to build this reasoning capability in the large language models itself by fine-tuning or by providing additional contexts via RAG or Tools, there is an indirect way to solve this problem that can bring fruitful results much faster. That is, to let the AI like ChatGPT generate hundreds of proofs written in formal language for a given problem, and then let software like LEAN verify the proof. Essentially, this is similar to a Generative-Adversarial setup, only that the adversary is much more powerful with 100% accuracy.

A similar but more sophisticated approach has already been used in this paper, which has successfully proven 155 theorems previously unproved formally by humans — I think, this is a big step towards integrating AI with mathematical research.

I can think of one more way LEAN can help us. Suppose a young researcher was advised to work on a problem by his supervisor. Now, the first thing he/she does is to do a literature review — which is very ad-hoc, usually followed by a Google Scholar search and some help probably from a community like Stack Exchange. In many cases, I have often come across results that I wish to prove, but when searching for similar results, I don’t find proper results, partially because I am not “able to express the theorem that I want well enough” for Google to search properly. Now imagine, we perform embedding of all the LEAN proofs stored in the MathLib repository, and store the embeddings in a vector store. Then using our query, we can very easily find the theorem that we want — and that too, with a proof that we can completely rely on!

Thank you very much for being a valued reader! 🙏🏽 Subscribe below for more stories about how AI can work with other technologies to make our world a better place! 📢

Until next time.