Grepping for Gaddafi

How do you programmatically find someone’s name in text when there is no generally agreed-upon spelling? Moamer Kaddafi presents a unique challenge for those of us who like to parse our news before reading it. Several solutions are presented at StackOverflow.

The regex, which basically brute-forces several well-known spellings:

\b(Kh?|Gh?|Qu?)[aeu](d['dt]?|t|zz|dhd)h?aff?[iy]\b
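
As a rough sketch, here is how that pattern could be applied in Python. The function name `find_spellings` and the sample sentence are my own illustration, not part of the StackOverflow answer. The pattern itself is copied unchanged, so it stays case-sensitive.

```python
import re

# Pattern copied verbatim from above. Double quotes are used for the
# raw string because the pattern itself contains a single quote.
GADDAFI_RE = re.compile(r"\b(Kh?|Gh?|Qu?)[aeu](d['dt]?|t|zz|dhd)h?aff?[iy]\b")


def find_spellings(text):
    """Return every substring of `text` that the pattern matches, in order."""
    # finditer + group(0) gives the whole match, not just the first capture group.
    return [m.group(0) for m in GADDAFI_RE.finditer(text)]


# Hypothetical sample text, made up for this example.
sample = "Reports name Gaddafi, Kadafi, Qadhafi and Khadafy; Gandalf is not matched."
print(find_spellings(sample))
```

Because the pattern is an explicit list of alternatives, any spelling it does not anticipate (or a lowercase one) simply will not match.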

My favorite solution, though, is this one. It uses Soundex, which encodes each word by how it sounds, so you can search English text for any word that approximates the basic phonetics of Gaddafi’s name:

G310, K310, Q310
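
To make the idea concrete, here is a minimal sketch of a Soundex search in Python. It assumes the standard American Soundex rules (keep the first letter, map consonants to digits, collapse repeats, pad to four characters). The names `soundex`, `find_by_soundex` and `TARGET_CODES` are mine, and the original answer may well have used a database or library function instead.

```python
import re

# Standard American Soundex consonant groups.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit

# The three codes listed above.
TARGET_CODES = {"G310", "K310", "Q310"}


def soundex(word):
    """Return the 4-character Soundex code of an ASCII word ('' if empty)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (code + "000")[:4]


def find_by_soundex(text, targets=TARGET_CODES):
    """Return the words in `text` whose Soundex code is in `targets`."""
    return [w for w in re.findall(r"[A-Za-z]+", text) if soundex(w) in targets]


print(find_by_soundex("Gaddafi and Khadafy spoke to Robert"))
```

The trade-off is precision: Soundex is deliberately loose, so unrelated words share these codes too. "Goodbye", for instance, also encodes to G310, so the results need a second filter before you trust them.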
