[nycphp-talk] Regular Expressions & Foreign Characters

David Sklar sklar at sklar.com
Wed Sep 17 11:20:38 EDT 2003


On Wednesday, September 17, 2003 10:59 AM,  wrote:

> If I understand correctly, a regular expression like this:
> ^[a-z0-9\',. -]{1,35}$/I
> will not allow foreign characters, e.g., Ë, because it is
> not part of the regular ASCII set of characters but part of the
> extended set. So...what's a kid to do?

Use a POSIX named character class. These respect locale settings:

preg_match('/[[:alnum:]]/','Ë');

This returns 1 (a match) under a locale such as 'en_US' or 'de_DE',
assuming the string is encoded in that locale's character set.

Read all about POSIX named character classes in the egrep(1) manpage.

You should probably call setlocale() in your PHP script before
preg_match()ing against special characters, because the default locale
(often "C") may not include these characters in the "alnum" or "alpha"
classes. E.g.:

setlocale(LC_CTYPE,'en_US');

or

setlocale(LC_ALL,'en_US');
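
Putting the two pieces together, here is a rough sketch rather than a
definitive recipe. The helper name is_valid_field() is made up for
illustration, the locale names are guesses that may not be installed on
your system, and the pattern assumes the wrapped line in the original
expression was a space inside the character class. The test string is
the single ISO-8859-1 byte for 'Ë', which is also an assumption about
your data's encoding.

```php
<?php
// Hypothetical helper (not from the original question): the original
// character class with a-z0-9 swapped for the locale-aware [[:alnum:]].
function is_valid_field($text)
{
    // Allowed: alphanumerics, apostrophe, comma, period, space, hyphen.
    return preg_match('/^[[:alnum:]\',. -]{1,35}$/', $text) === 1;
}

// 'Ë' as one ISO-8859-1 byte (assumed encoding of the input).
$e_umlaut = "\xCB";

// setlocale() accepts an array of candidate names and returns the one it
// applied, or false if none of them is available on this system.
$candidates = array('en_US.ISO-8859-1', 'en_US.ISO8859-1', 'en_US');
$locale = setlocale(LC_CTYPE, $candidates);

if ($locale === false) {
    echo "None of the candidate locales is installed.\n";
} else {
    echo "LC_CTYPE is now $locale\n";
    // Whether this matches depends on the character set of $locale.
    var_dump(is_valid_field($e_umlaut));
}
```

Checking the return value of setlocale() is worth the extra lines: if
the requested locale is missing, the call fails quietly and the pattern
keeps using the old locale's character tables.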


David
