We always think about the potential of AI changing the future. But what about the potential of AI changing the past?
My name is Youssef Nader. I'm an Egyptian AI researcher and a PhD student at the Free University of Berlin, and last year, I led the Vesuvius Challenge Grand Prize-winning team in exploring this very question.
You see, the story starts almost 2,000 years ago. A Greek philosopher, who we believe was Philodemus of Gadara, sat in one of the many rooms of the Villa dei Papiri. He talked about music, he talked about pleasure, he talked about what makes things enjoyable -- questions that still plague us to this day. One of his scribes wrote down his thoughts on sheets of papyrus. The sheets were rolled and stowed away for later generations.
Fast-forward 150 years, ... Mount Vesuvius erupts, burying Herculaneum, the villa and the words of the philosopher under a sea of hot mud and ashes.
Now fast-forward again, to the 18th century. People are excavating around the area. They find beautiful statues, breathtaking frescoes and some weird-looking pieces of charcoal, like you see in this picture. This is when the first scrolls were discovered, and people were racing to excavate more of them. What knowledge do they hold that is not known to us now? What should we know about these scrolls?
My name is Julian, and I am a digital archaeologist. When the pyroclastic flow hit the scrolls, it had a destructive effect: it tore into them, shredded off pieces and charred them badly. Even the deformation that you can see happened at that point. People 250-something years ago were curious about what lay inside those scrolls, hidden and no longer accessible. For lack of technology, they had to resort to physically unrolling, and thereby destroying, most of the scrolls. To this day, only the most damaged and deformed scrolls remain in their initial, rolled-up configuration.
Fast-forwarding a little bit, the computer age arrives. Youssef and I are born. We go on and get our education --
(Laughter)
and at the same time, Brent Seales, a researcher and professor, had the idea to use CT scan technology to digitize the scrolls, with the hope of one day digitally unrolling them. Behind me, you can see a video of such a CT scan; it moves through the scan's 3D volume, layer by layer. The papyrus is visible as a spiral, and you can see it's tightly wound, sometimes touching itself, sometimes fraying apart. How to unroll this digitally is a difficult question. Nat Friedman, a Silicon Valley investor, also saw this research, and he wanted to help. That was in 2022. He reached out, and together with Brent Seales, they created the Vesuvius Challenge, with the goal of motivating nerds all over the world to solve this problem.
(Laughter)
They created a grand prize, promising eternal glory and monetary incentives to anyone who could do that.
(Laughter)
I myself saw that on the internet while writing my master's thesis in robotics at ETH Zurich, and I was instantly keen to solve it -- or at least try, why not, you know? And I went on and joined the Discord community where all the people who were also contestants and playing with the scroll data were exchanging ideas, and I started working on it. There, on Discord, I also met Youssef and Luke [Farritor], who would become my teammates and with whom I would go on to win the grand prize. Surprisingly, it went on to make global headline news. It even got into the British tabloids.
(Laughter)
So when we started, there were two main problems still remaining. One, you had to unroll the scroll. And two, you then had to make the ink visible. Youssef will tell you more about that part. For me, the most exciting thing was the computer-vision problem of unrolling those scrolls virtually. I decided to iterate on a tool created by the Kentucky researchers: make it faster, less prone to errors, just better. The Vesuvius Challenge team saw that and assembled a team of 10 people to use my tool. They would annotate scroll data, like you see in this video, drawing a red line where the papyrus surface lies. The algorithm would then take that line into 3D space, creating a three-dimensional representation of the surface. Computer algorithms would then flatten it and create a segment. All of this is called “segmentation” in the scrolling and unrolling community.
(Laughter)
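To make that segmentation step concrete, here is a minimal sketch in Python. Everything in it is illustrative -- the annotation format and the function are hypothetical stand-ins, not the actual tool's interface -- but it shows the core idea: each annotated point gets a position along the unrolled sheet via accumulated arc length, and sampling the CT volume at those points renders the flat segment.

```python
import numpy as np

def flatten_surface(annotations):
    """Turn per-slice surface annotations into flattened coordinates.
    `annotations` maps a slice index z to an ordered list of (x, y)
    points traced along the papyrus surface (the "red line").
    Hypothetical data layout, for illustration only."""
    rows = []
    for z in sorted(annotations):
        pts = np.asarray(annotations[z], dtype=float)          # (n, 2)
        # Accumulated arc length along the traced line gives each
        # point its unrolled position u on the flattened sheet.
        steps = np.diff(pts, axis=0)
        u = np.concatenate([[0.0],
                            np.cumsum(np.hypot(steps[:, 0], steps[:, 1]))])
        rows.append(np.column_stack([u, pts, np.full(len(pts), z)]))
    # Each row holds (u, x, y, z): sampling the CT volume at (x, y, z)
    # and writing the value to pixel (u, z) renders the flat segment.
    return rows
```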
So I contributed open-source commits to this tool, implementing new algorithms from my studies, like optical flow, to better track the sheets through the volume, and we ended up with something like what you see behind me. At first, those were really small segments, but I added improvements, made the code faster and got lots of feedback from the community. They were really happy, and I was happy getting lots of feedback. It was a really positive environment.
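Optical flow here is the classical computer-vision technique. A rough sketch of how it can propagate an annotated line from one CT slice to the next, using OpenCV's Farneback implementation -- the helper and the parameter values are illustrative, not the tool's actual code:

```python
import cv2

def propagate_line(prev_slice, next_slice, points):
    """Advect annotated surface points from one CT slice to the next
    using dense optical flow. Illustrative helper; slices are 8-bit
    grayscale images, points are (x, y) pairs."""
    # Dense flow field between the two consecutive slices.
    flow = cv2.calcOpticalFlowFarneback(
        prev_slice, next_slice, None,
        pyr_scale=0.5, levels=4, winsize=21,
        iterations=3, poly_n=7, poly_sigma=1.5, flags=0)
    # Move each annotated point by the flow vector at its location
    # (flow is indexed [row, col] = [y, x]).
    return [(x + flow[int(y), int(x), 0], y + flow[int(y), int(x), 1])
            for x, y in points]
```

Tracking this way, an annotation made on one slice can follow the same sheet of papyrus through the volume instead of being redrawn by hand on every slice.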
So in the end, I could track the performance of the algorithms and how the segmentation team performed, and I could see that my improvements, from start to finish, added up to around a 10,000-fold improvement over the initial version. This algorithm was then used to unroll all the area that you can see in our submission. All the sheets were generated with these methods.
In December, I was looking for teammates. I made a blog post showcasing my newest algorithms, reaching out to anyone who was willing to team up. Youssef and Luke got in contact with me. They were happy to team up, and I was happy as well.
(Laughter)
So after the virtual unwrapping, the words still are not visible. The main problem is that the ink used at the time was carbon-based, and carbon-based ink on carbon-based papyrus isn't visible in a CT scan, at least not to the naked eye. So the same team at the University of Kentucky decided to test whether the ink was present at all in the CT scans. For this, they took some of the pieces that people had broken off the scrolls and fed them through the same X-ray CT scanning pipeline, which gives the same kind of 3D data we were working with. Because the ink sits on an exposed surface and you can see it, you can even enhance it with infrared imaging. And this gives you a ground truth of the letters you're actually trying to find. From there, you can train a machine-learning model to find these letters. The way this works is that the model looks at one very small cube at a time and tries to decide whether there is ink present in that area or not. As you move this cube all around, the model gets to see different data samples and gradually learns what ink actually looks like.
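As a sketch of what such a cube-classifying model might look like -- assuming PyTorch; the architecture and the cube size are illustrative, not the Kentucky team's actual network:

```python
import torch
import torch.nn as nn

class InkDetector(nn.Module):
    """Tiny 3D CNN: takes a small cube of CT data and outputs a logit
    for whether ink is present there. Illustrative only."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, cube):              # cube: (batch, 1, 32, 32, 32)
        return self.net(cube)

# Sliding the cube over the whole segment turns per-cube predictions
# into an ink-probability map:
model = InkDetector()
cubes = torch.randn(8, 1, 32, 32, 32)     # a batch of CT sub-volumes
probs = torch.sigmoid(model(cubes))       # probability of ink per cube
```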
So this is how it looks while the model is training. It's not perfect, but you can see that, especially around the middle, the model is starting to see the letters clearly. So the data is there. The ink is there. It's just very hard to find and see. Looking at the raw CT scan data on the left here, you can see the fibers, you can see the structure of the papyrus, but the letters from the image on the right are very, very faint in the CT scans. In this particular case, they're characterized by a difference in contrast and some speckles and crackle features that are very hard to see. So what happens if we try to take a look at the segment that Julian was just showing?
So this is the data that we were working with. And I'm going to give you 10 seconds to try to find the letters yourself.
(Laughter)
And as a hint, I'll tell you that there are three letters in this image. Believe me. Try to find some pattern, some crackle patterns, some cracks in there. If you were able to identify this pattern of these three letters --
(Laughter)
then congratulations. One year ago, you may have won 40,000 dollars.
(Laughter)
However, if you're like me, and you couldn't make sense of this, there's a different way that you can find this ink -- one that actually scales very, very well.
So this is where my journey with the Vesuvius Challenge begins. There is this neat idea in the computer-vision literature: if you don't actually have labels -- the goal you want your AI model to reach -- you can pick an intermediary goal along the way. Looking at these two pairs of images, our eyes can identify that these are the same images, just flipped. We can do that because we understand the structures present in the images: we can see this little triangle, and it's flipped, so we know it's the same triangle. Our eyes already have this ability, but neural networks don't. When they see these images, they can't tell that they are the same image. So one idea, to teach the network about the structures and familiarize it with the data, is to show it different views of the same image and tell it that they are the same. After that, you take this model and train it like the previous models from the University of Kentucky. And while the approach doesn't fully work, it also doesn't fully not work. This was the first image the model produced, and there was some very faint signal in there. It seemed like the model was catching on to something, but it wasn't clear exactly what. So I decided to take these predictions and create a new ground truth, essentially telling the model, "Hey, I think these might be letters. I think there's something in there. Try to find more of this." My ground truth actually had four correct letters and four other delusions. But that was OK.
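This pretext-task idea is what the literature calls self-supervised, or contrastive, pretraining. A minimal sketch, assuming PyTorch -- the encoder, the augmentation and the loss are simplified stand-ins (SimCLR-style), not my actual training code:

```python
import torch
import torch.nn.functional as F
from torch import nn

# A stand-in encoder; any CNN backbone works for the sketch.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten(),
    nn.Linear(16 * 16, 64),
)

def two_views(x):
    """Create a second 'view' of each image by flipping it -- the model
    is then told both views are the same image."""
    return x, torch.flip(x, dims=[-1])

def contrastive_loss(z1, z2, tau=0.5):
    """Two views of the same patch are positives; every other patch
    in the batch is a negative."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2N, d)
    sim = z @ z.t() / tau                              # pairwise similarities
    sim.fill_diagonal_(float('-inf'))                  # ignore self-pairs
    n = z1.shape[0]
    # Each view's target is the other view of the same patch.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

patches = torch.randn(8, 1, 64, 64)      # unlabeled papyrus patches
v1, v2 = two_views(patches)
loss = contrastive_loss(encoder(v1), encoder(v2))
loss.backward()
```

After pretraining like this, the encoder already "knows" the crackle and fiber structures in the data, and can then be fine-tuned on the small amount of ink ground truth.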
So, training a new model with this data, the model started to find more ink, find more letters, and the lines even looked complete. So I thought, "What are the chances that if I do this again, the models keep improving?" And this was the core of our grand prize-winning solution. Repeating this process over and over, the models kept improving. The main trick was that you needed to prevent the models from memorizing what the previous models had learned -- you're essentially asking the model to learn what another model has learned. So overfitting was a serious problem that required a lot of experiments. But in the end, getting the recipe right, we were able to predict all of these letters without the models ever seeing them. These were the first 10 letters. There are, like, 20 in there, but this was the first coherent word read from an unopened papyrus sheet.
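The loop Youssef describes is essentially self-training with pseudo-labels. A sketch of the idea, reusing the `InkDetector` from the earlier sketch -- `train`, `predict_map` and `keep_confident` are hypothetical helpers standing in for ordinary training and inference code, not the team's actual pipeline:

```python
def self_training(volume, seed_labels, rounds=5):
    """Iterative pseudo-labeling: train on the current labels, predict
    over the whole segment, keep only confident predictions as the
    next round's ground truth -- and start from fresh weights each
    round so the model can't simply memorize its predecessor."""
    labels = seed_labels                       # a few hand-picked letters
    for _ in range(rounds):
        model = InkDetector()                  # fresh model each round
        train(model, volume, labels)           # hypothetical training loop
        ink_map = predict_map(model, volume)   # hypothetical sliding-cube inference
        labels = keep_confident(ink_map)       # drop uncertain pixels
    return model
```

The confidence threshold and the fresh restart are what keep each round from simply overfitting to the previous round's mistakes.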
From there, scaling the process, within weeks we had whole columns of text, even special characters that the papyrologists found very interesting and that the model was able to find. The approach was open-sourced, the data and the code were out there, and the race for the grand prize was on: recovering four paragraphs at 85-percent clarity. The key to our success was perfecting the data and the models through many, many iterations and experiments. In the end, we were able to recover more than 14 columns of text -- 2,000 letters.
(Applause)
2,000 characters safely stored away two millennia ago -- and in just nine months, we discovered them again. AI helped us in large part, writing better code and even being part of our algorithms. It opened a window into the past. What's next? Let's open this window more. AI will help us access information that has so far been safely locked away. In the words of the author, "We do not refrain from questioning nor understanding, and may it be evident to say true things as they appear."
(Applause)