Chand John: What's the fastest way to alphabetize your bookshelf?

You work at the college library. You're in the middle of a quiet afternoon when suddenly a shipment of 1,280 different books arrives. The books have been dropped of in one long straight line, but they're all out of order, and the automatic sorting system is broken. To make matters worse, classes start tomorrow, which means that first thing in the morning, students will show up in droves looking for these books. How can you get them all sorted in time? One way would be to start at one end of the line with the first pair of books. If the first two books are in order, then leave them as they are. Otherwise, swap them. Then, look at the second and third books, repeat the process, and continue until you reach the end of the line. At some point, you'll come across the book that should be last, and keep swapping it with every subsequent book, moving it down the line until it reaches the end where it belongs. Then, start from the beginning and repeat the process to get the second to last book in its proper place, and keep going until all books are sorted. This approach is called Bubble Sort. It's simple but slow. You'd make 1,279 comparisons in the first round, then 1,278, and so on, adding up to 818,560 comparisons. If each took just one second, the process would take over nine days. A second strategy would be to start by sorting just the first two books. Then, take the third book and compare it with the book in the second spot. If it belongs before the second book, swap them, then compare it with the book in the first spot, and swap again if needed. Now you've sorted the first three books. Keep adding one book at a time to the sorted sub-line, comparing and swapping the new book with the one before it until it's correctly placed among the books sorted so far. This is called Insertion Sort. Unlike Bubble Sort, it usually doesn't require comparing every pair of books. On average, we'd expect to only need to compare each book to half of the books that came before it. In that case, the total number of comparisons would be 409,280, taking almost five days. You're still doing way too many comparisons. Here's a better idea. First, pick a random book. Call it the partition and compare it to every other book. Then, divide the line by placing all the books that come before the partition on its left and all the ones that come after it on its right. You've just saved loads of time by not having to compare any of the books on the left to any of the ones on the right ever again. Now, looking only at the books on the left, you can again pick a random partition book and separate those books that come before it from those that come after it. You can keep creating sub-partitions like this until you have a bunch of small sub-lines, each of which you'd sort quickly using another strategy, like Insertion Sort. Each round of partitioning requires about 1,280 comparisons. If your partitions are pretty balanced, dividing the books into 128 sub-lines of ten would take about seven rounds, or 8,960 seconds. Sorting these sub-lines would add about 22 seconds each. All in all, this method known as QuickSort could sort the books in under three and a half hours. But there's a catch. Your partitions could end up lopsided, saving no time at all. Luckily, this rarely happens. That's why QuickSort is one of the most efficient strategies used by programmers today. They use it for things like sorting items in an online store by price, or creating a list of all the gas stations close to a given location sorted by distance. In your case, you're done quick sorting with time to spare. Just another high-stakes day in the library.

你是一名大学图书馆的管理员，正在享受午后的闲暇时光。这时快递突然送来了1280本各式各样的书。这些书被排成长长的一排，毫无规律，自动分类系统又恰好出现了故障。更糟糕的是，明天一早，前来上课的学生们，就会涌入图书馆借书。如何才能在规定的时间内整理好所有书籍？一种方法是从最前面的两本书开始。如果这两本书位置正确则不做任何改动，否则，将它们对换。然后再看第二本书和第三本书，重复这个操作，直到你排列至最后一本书。当你遇到应该位于末位的那本书，那么只需要将它不断地与后一本书交换，直至到达队伍末尾。然后，再从头开始，就可以找出倒数第二本书。如此循环，直到将所有的书归位。这个方法被称为 “冒泡排序”。这个方法非常简单，但效率低下。在第一次循环中，你需要进行1279次对比，第二次循环则需要1278次，总计818560次对比。按每次比较需要1秒，这一过程将长达9天。第二种方法是先将前两本排序。然后，将第三本书和第二本进行排列。如果第三本应位于第二本之前，则交换它们的位置。再它和第一本书进行对比，如果这本书应位于第一本之前，则交换它们的位置。现在，你就完成了前三本的排列。每次将一本书插入之前已经排列好的队伍中，与前面相邻的一本书进行对比并判断是否交换，直到这本书到达队伍中正确的位置。这种方法被称为 “插入排序”。与“冒泡排序”不同的是，这种方法通常不必比对所有相邻的书。一般来说，每本书被对比的次数只有前者的一半。这样，总共的对比次数也就是409280次。需要将近5天的时间。对比的次数还是太多。还有一种更好的办法：首先，随机抽出一本书，称之为“分界点”，然后将每本书都与之进行对比。然后，根据“分界点”，把所有应放在它之前的书放在左边，其他的放在右边。这样你就已经节省了大量的时间，因为你再也不必将左边的书，与右边的任何书进行比较。现在，只看左边的一部分。你可以从中再随机抽出一本书，然后以此为另一个“分界点”。你可以像这样不断地进行分割，产生新的“分界点”，直到得出许多的分支。每一个分支都可以用其他的方法进行重排，比如 “插入排序”。每次“分界点”循环所需要的对比次数大约是1280次。如果你的“分界点”分布十分均匀，那么把整列书分成128个分支，每个10本，大约只需要7次循环，也就是8960秒。将每个分支进行排序，每个约需22秒。这个方法被称为 “快速排序”。它能将所有的书在三个半小时之内排列整齐。但是也有特例。取出的分界点可能全都挤在一端，这样你不会节省任何时间。幸运的是，这种情况几乎不可能发生。这也使得“快速排序” 成为编程中最高效的排序方法之一。例如，它被用于网络购物中的价格排序，或是按照距离依次列举出最近的加油站地址。在给书排序的例子中， “快速排序”为你节省了大量的时间。多么充满乐趣的一天。

Chand John: What's the fastest way to alphabetize your bookshelf?

Chand John: What's the fastest way to alphabetize your bookshelf?

Related talks

David J. Malan: What's an algorithm?

Jeff Dekofsky: Euclid's puzzling parallel postulate

Elizabeth Cox: What really happened to the Library of Alexandria?

Brian Dettmer: Old books reborn as art

Dawn Wacek: A librarian's case against overdue book fines

Ann Morgan: My year reading a book from every country in the world

Related talks

David J. Malan: What's an algorithm?

Jeff Dekofsky: Euclid's puzzling parallel postulate

Elizabeth Cox: What really happened to the Library of Alexandria?

Brian Dettmer: Old books reborn as art

Dawn Wacek: A librarian's case against overdue book fines

Ann Morgan: My year reading a book from every country in the world