How You Unknowingly Helped Transcribe Over 40,000 E-Books by Nicholas Tan

If you’ve ever signed up for a Facebook account or tried to change your password on Twitter, you will have encountered the odd, distorted section of the online form that asks you to type in the words or numbers shown in an image. This little tool for telling a human from a bot is called a CAPTCHA, or Completely Automated Public Turing test to tell Computers and Humans Apart.

CAPTCHA clearly has some internal conflict issues.


Unscrupulous coders often design programs that sign up for fake accounts and spam websites; more than seven trillion spam messages circulated the globe in 2011. As such, the need for a barrier that separates humans posting real content from bots pretending to be users was (and is) a very real concern.

In 2000, the idea for CAPTCHA was created by Luis von Ahn, Manuel Blum, Nicholas J. Hopper and John Langford from Carnegie Mellon University. Distorted images were displayed that could be quickly deciphered by a human, but not by a computer. It was a screening test that was hard for a computer to solve but, crucially, easy for it to generate and grade: the computer that creates the test already knows the answer, so any response can be checked with a high degree of accuracy.
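To make that asymmetry concrete, here is a minimal sketch in Python. It is only an illustration, not how any real CAPTCHA service is built: the function names `generate_challenge` and `check_response` are made up for this example, and the hard part (rendering the answer as a distorted image) is left out entirely.

```python
import secrets
import string

# Characters the hypothetical challenge is drawn from.
ALPHABET = string.ascii_uppercase + string.digits


def generate_challenge(length: int = 6) -> str:
    """Pick a random answer. A real CAPTCHA would then render this
    string as a distorted image; that step is omitted here."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_response(expected: str, typed: str) -> bool:
    """Grade a response by comparing it with the stored answer,
    ignoring case and surrounding whitespace."""
    return typed.strip().upper() == expected.upper()


answer = generate_challenge()                 # trivial for the server to create
print(check_response(answer, answer.lower())) # a correct response passes
print(check_response(answer, answer + "X"))   # a wrong response fails
```

Generating and grading take a couple of lines because the server keeps the answer; the bot's job, reading the answer back out of a warped image, is where all the difficulty lives.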

Hipster hamsters notwithstanding. 

Now, this is all very interesting — but where do you come in, and how does typing words into your computer help digitise a vast online library?



Developed at the main campus of Carnegie Mellon, reCAPTCHA was acquired by Google on 16 September 2009. A statement from the official website explains exactly what reCAPTCHA does:

About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that’s not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into “reading” books.
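As a rough sanity check on the figures quoted above, the arithmetic can be run directly. This is a back-of-the-envelope sketch that takes the 200 million CAPTCHAs and ten seconds at face value; the function name `daily_hours` is just for illustration.

```python
CAPTCHAS_PER_DAY = 200_000_000   # figure from the quoted statement
SECONDS_PER_CAPTCHA = 10         # "roughly ten seconds" per the statement
SECONDS_PER_HOUR = 3600


def daily_hours(captchas: int, seconds_each: float) -> float:
    """Total human time spent per day, in hours."""
    return captchas * seconds_each / SECONDS_PER_HOUR


hours = daily_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # prints "555,556 hours per day"
```

Taken at face value, those inputs come to roughly 555,000 hours a day, comfortably above the "more than 150,000 hours" in the statement.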


reCAPTCHA is currently digitising the archives of The New York Times and books from Google Books. It’s a win-win that fights spam and preserves knowledge at the same time, very cleverly making the best of both worlds.

So the next time you curse the tedium of deciphering a warped bit of text just to post a snarky comment on an online article you’ve just read, know that you are inadvertently helping to digitise a vast library of public domain literary works.

To find out more about this fascinating service, you can visit the official reCAPTCHA website.

2 Responses to “How You Unknowingly Helped Transcribe Over 40,000 E-Books”

  1. Terence Lim

    Haha, my mind was blown when I heard about this at CMU too. Luis von Ahn was my prof at CMU. He’s really insanely smart. He’s currently working on this thing called Duolingo, which helps translate text between languages all over the world using a similar concept.

  2. Nicholas Tan

    He was your professor?! Wow, I sense a story! 😀

    I really liked his idea because it’s such a simple yet elegant use of technology.

