[nycphp-talk] Determine the text language

David Krings ramons at gmx.net
Fri Nov 9 06:44:47 EST 2007


hafez ahmad wrote:
> Hi All,
> 
> How can I use regular expressions to determine the language of a text, i.e. 
> whether the selected text is English, Arabic, Hebrew, etc.?
> 

I wonder if that could even work. Natural language doesn't follow the kind of 
strict patterns that regular expressions test for. I'd see if there is a 
chance to hook into the Mozilla or OpenOffice.org (OOo) spell-check 
dictionaries: run the selected text through every dictionary and assume that 
the one reporting the fewest errors matches the language of the text. That 
process will be slow, and it will fail on text from horrible spellers.
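
To make the dictionary idea concrete, here is a rough sketch in Python rather than PHP. It is not tied to the Mozilla or OOo dictionary formats: the tiny word lists and the names DICTIONARIES, count_misses and guess_language are made up for illustration, and a real version would load full word lists from wherever the dictionaries live.

```python
import re

# Hypothetical miniature word lists, for illustration only. A real setup
# would load complete spell-checker dictionaries instead.
DICTIONARIES = {
    "english": {"the", "is", "a", "of", "and", "house", "small"},
    "german": {"das", "ist", "ein", "und", "haus", "klein", "der"},
}


def count_misses(text, words):
    """Count the tokens in text that are not found in the given word set."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for token in tokens if token not in words)


def guess_language(text):
    """Return the name of the dictionary that reports the fewest unknown words."""
    return min(DICTIONARIES, key=lambda name: count_misses(text, DICTIONARIES[name]))
```

Calling guess_language("das haus ist klein") picks "german" because every word is found in that list. With short or misspelled input the miss counts get close to each other, which is exactly the weakness mentioned above.
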
Or do you want to check which character set (script) the text uses? If you 
could provide some more detail about what you are trying to accomplish, I 
guess we could give you some more hints.
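
If checking the script is enough, regular expressions can do that part. The sketch below is in Python and only assumes that the text is Unicode. It counts characters in the Basic Latin letters, the Arabic block (U+0600 to U+06FF) and the Hebrew block (U+0590 to U+05FF). The names SCRIPT_PATTERNS, count_scripts and dominant_script are made up for this example. In PHP, I believe the preg functions accept script classes such as \p{Arabic} and \p{Hebrew} when the pattern uses the u modifier, which would make the ranges unnecessary.

```python
import re

# One character class per script. The ranges are the main Unicode blocks
# only; extended and presentation-form blocks are left out of this sketch.
SCRIPT_PATTERNS = {
    "latin": re.compile(r"[A-Za-z]"),
    "arabic": re.compile(r"[\u0600-\u06FF]"),
    "hebrew": re.compile(r"[\u0590-\u05FF]"),
}


def count_scripts(text):
    """Return how many characters of each known script appear in text."""
    return {name: len(pattern.findall(text)) for name, pattern in SCRIPT_PATTERNS.items()}


def dominant_script(text):
    """Return the script with the most characters, or None if none matched."""
    counts = count_scripts(text)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

Keep in mind that this identifies the script, not the language: Latin letters cover English, French, German and many others, and the Arabic script is also used for Persian and Urdu.
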

David


