[nycphp-talk] Java provides???

Paul A Houle paul at devonianfarm.com
Wed Aug 12 10:35:29 EDT 2009


Matt Williams wrote:
> On Aug 12, 2009, at 5:57, Leam Hall <leam at reuel.net> wrote:
>
>> What does Java provide that PHP can't do faster and with lighter 
>> resource usage?
>
> Concurrency and threading to name a couple...

    I've got a system that's gotten complicated enough that it's 
"outgrowing" PHP.  One big advantage in PHP is that you can get more 
productivity out of rookie programmers.  It takes a good programmer 1 
1/2 years to be able to produce usable Java,  and some programmers never 
get good at it.  The ideas in this system are complicated enough that I 
think I'd have a hard time hiring another programmer who could handle 
it,  so the simplicity advantage of PHP is gone.  I'm starting to want 
static types so that the compiler is watching my back and so that my IDE 
can do automated refactoring.

    I'm thinking of gradually moving to the JVM but using Scala instead 
of Java.  After 2 years of working in C#,  Java really seems like C#--.  
I mean,  even PHP has closures today.  Type inference,  generics and 
other features in C# make Java seem like it's going backwards.  On the 
other hand,  if I'm doing my own sysadmin or paying somebody to sysadmin 
my systems,  I don't want to be stuck in Windows.  I know a lot of 
people think the type system of Scala is over-complicated,  but after 2 
years of lovers' quarrels with the C# type system,  Scala provided the 
general theory that informs my practice in C#.

   I'm interested in logic programming and other inference systems,  as 
well as specialized databases:  there's a lot of that written in Java.  
Java's never quite going to have the efficiency of C,  but it's better 
for systems work than PHP.  If I feel the need for scripting there's 
always Groovy,  Jython,  etc.

    My big beef with the JVM (and the CLR) is the UTF-16 scandal;  
perhaps I'm a cultural imperialist,  but I process lots of text 
(billions and billions of characters) that is mainly:

(i) us-ascii,
(ii) iso-latin-1,  and
(iii) Unicode that is mainly us-ascii with an occasional smattering of 
iso-latin-1 and other Unicode characters

For me,  UTF-8 encodes text at about (1+epsilon) bytes per character;  
the JVM and CLR encode text at (2+epsilon) bytes per character.  A few 
years ago,  when I was stuck on 32-bit machines,  that was often the 
difference between a program that could run in RAM and a program that 
couldn't.  Since text processing is limited by memory bandwidth,  the 
doubled size often means large text-processing programs run about half 
as fast on the JVM as they do in UTF-8 based environments.
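
To put rough numbers on the (1+epsilon) versus (2+epsilon) claim, here is a minimal Java sketch that compares the encoded size of mostly-ASCII text in UTF-8 and UTF-16.  The class name EncodingSize and the sample sentence are made up for illustration,  and the sketch only measures encoded byte counts,  not what a particular JVM actually keeps on the heap.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class EncodingSize {
    // Bytes needed to store the text as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store the text as UTF-16 (big-endian, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Mostly us-ascii with two iso-latin-1 characters (e-acute, i-diaeresis).
        String sample = "The caf\u00e9 serves na\u00efve programmers.";
        System.out.println("characters:   " + sample.length());
        System.out.println("UTF-8 bytes:  " + utf8Bytes(sample));
        System.out.println("UTF-16 bytes: " + utf16Bytes(sample));
    }
}
```

For text like this the UTF-8 count stays just above one byte per character,  while the UTF-16 count is two bytes per character.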

What makes it a scandal is that UTF-16 pretends to be a fixed-width 
encoding when it really isn't:  characters outside the Basic 
Multilingual Plane take two 16-bit code units instead of one.  Code 
that works correctly with,  say,  English or Japanese will break when 
you're processing the Chinese or mathematical characters that live out 
there.  Code written with the fast random access that Java provides 
doesn't generalize to all languages,  so you need to fall back to the 
same sequential access methods that you use handling UTF-8 in PHP.
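
A small Java sketch of that failure mode,  assuming a string that contains one character outside the Basic Multilingual Plane (U+1D54F,  a mathematical double-struck X).  The class name CodePointDemo and its methods are hypothetical;  the String and Character calls are the standard ones.  It contrasts a reversal that uses random access by char index with one that walks the string sequentially by code point.

```java
// Hypothetical demo class, not part of any library.
class CodePointDemo {
    // UTF-16 code units: this is what String.length() and charAt() count.
    static int codeUnits(String s) {
        return s.length();
    }

    // Unicode code points: a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Naive reversal by char index; it swaps the two halves of a surrogate pair.
    static String reverseByChar(String s) {
        char[] c = s.toCharArray();
        for (int i = 0, j = c.length - 1; i < j; i++, j--) {
            char t = c[i];
            c[i] = c[j];
            c[j] = t;
        }
        return new String(c);
    }

    // Sequential reversal by code point; surrogate pairs stay intact.
    static String reverseByCodePoint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            out.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "x" + new String(Character.toChars(0x1D54F));
        System.out.println("code units:  " + codeUnits(s));
        System.out.println("code points: " + codePoints(s));
        System.out.println("reversals agree: "
                + reverseByChar(s).equals(reverseByCodePoint(s)));
    }
}
```

The char-indexed version leaves two unpaired surrogates behind,  which is the kind of breakage that only shows up once such characters appear in the input.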

A big advantage of PHP for Unicode handling is that it "does no harm";  
I've often seen Java and CLR systems fail seriously because of 
limitations in how they handle Unicode characters,  particularly when 
dealing with junky input data.
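
One way junky input can get damaged on the JVM,  sketched in Java under the assumption that the bytes are decoded as UTF-8 on the way in and re-encoded on the way out.  This is only an illustration,  not necessarily the failure seen in the systems mentioned above,  and the class name LossyDecode is hypothetical.  The String(byte[], Charset) constructor replaces malformed sequences with the replacement character U+FFFD,  so the original bytes cannot be recovered afterwards.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class LossyDecode {
    // Decode bytes as UTF-8; malformed sequences become U+FFFD.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decode and immediately re-encode, as a pass-through filter might.
    static byte[] roundTrip(byte[] raw) {
        return decode(raw).getBytes(StandardCharsets.UTF_8);
    }

    // True only if the bytes come back unchanged.
    static boolean survives(byte[] raw) {
        return Arrays.equals(raw, roundTrip(raw));
    }

    public static void main(String[] args) {
        // "caf" followed by a lone 0xE9: iso-latin-1 e-acute, invalid as UTF-8.
        byte[] junk = {'c', 'a', 'f', (byte) 0xE9};
        byte[] clean = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println("junk survives:  " + survives(junk));
        System.out.println("clean survives: " + survives(clean));
    }
}
```

A PHP string is just a sequence of bytes,  so simply passing the same input through leaves it untouched.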


