Spell Checking Using the Web, Microsoft Research, PDP Family Tree
Via the Above The Fold mailing list comes a white paper (in PDF form) from Microsoft Research: Spelling correction as an iterative process that exploits the collective knowledge of web users.
Here’s the abstract; it looks pretty interesting:
Logs of user queries to an internet search engine provide a large amount of implicit and explicit information about language. In this paper, we investigate their use in spelling correction of search queries, a task which poses many additional challenges beyond the traditional spelling correction problem. We present an approach that uses an iterative transformation of the input query strings into other strings that correspond to more and more likely queries according to statistics extracted from internet search query logs.

There are more where this came from; check them out at the Microsoft Research Publications Database. You can get an overview of this database by doing a “search” with a date range and no keywords. Try 01/01/2003 to 12/31/2003 to see an entire year.
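To get a feel for the iterative idea in that abstract, here's a toy sketch. It's my own simplification, not the paper's algorithm. The query counts in QUERY_LOG are made up, and the candidate generator (every string one insertion, deletion, substitution or transposition away) is an assumption I picked for brevity. Each pass moves the query to the most frequent nearby query in the log, and the loop stops as soon as no neighbor is more popular than the current string.

```python
from collections import Counter

# Hypothetical query-log counts, invented for illustration only.
QUERY_LOG = Counter({
    "britney spears": 9000,
    "britny spears": 300,
})

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def edits1(s):
    """Return every string one insertion, deletion, substitution or transposition away from s."""
    splits = [(s[:i], s[i:]) for i in range(len(s) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    # Substituting a character with itself yields s, so s is always a candidate.
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(query, log=QUERY_LOG, max_steps=5):
    """Iteratively replace the query with its most frequent neighbor until nothing improves."""
    current = query
    for _ in range(max_steps):
        # Counter returns 0 for strings that never appear in the log.
        best = max(edits1(current), key=lambda c: log[c])
        if log[best] <= log[current]:
            break  # no neighbor is more likely than what we already have
        current = best
    return current


print(correct("brtny spears"))  # brtny -> britny -> britney
```

Starting from "brtny spears", the first pass lands on "britny spears" and the second on "britney spears", where no neighbor has a higher count, so it stops. A single lookup could never make that jump, since the misspelling is two edits away from the popular query; the repeated small steps are what get it there.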
There’s even an RSS feed (more info, including a headline history, can be found on the feed’s Syndic8 home page).
While you are on the site, be sure to check out Gordon Bell’s PDP Family Tree poster (402KB GIF, 1700×1141 pixels). How many of these machines have you used?