NYPHP.org

[nycphp-talk] email system for website

Paul A Houle paul at devonianfarm.com
Mon Jan 4 12:16:11 EST 2010


Matt Juszczak wrote:
> Paul,
>
>>   Sure, but I'd also send "high priority" and "low priority" emails 
>> through separate systems (sendmail/postfix/whatever instances.)
>
> Well, what's to stop me from using the same database table for high 
> priority and low priority, but having the high priority background 
> process continuously loop to check for new items in the queue?
>
> Even if I have the webs send out high priority directly, like you 
> said, that could cause some damage.
>
> Perhaps I could create a centralized relay server that the webs use 
> for high priority, and have the webs send out high priority mail.  But 
> at that point, I unfortunately have to duplicate the high priority 
> mail code =(
>
    Ok,  you're talking about two sorts of queue here.

    (i) there's the queue of an SMTP-compliant mailer (I assume),  and
    (ii) there's a queue that you're maintaining of messages you want to 
send;  this might be a set of database rows,  one per message,  and 
you're doing a "mail merge" process to fill in a template and push 
messages gradually into queue (i)

    Presumably you've got some rate control on (ii),  and the mail merge 
process is watching the length of the queue in (i) (and maybe some other 
variables),  so that you can control the load on the SMTP server.

    One trouble with this is that a certain fraction of mail takes a 
long time to deliver;  an SMTP-compliant mailer will keep retrying a 
message for days (typically five to seven,  depending on the mailer and 
its configuration).  If you're sending enough mail,  you're eventually 
going to get a large "plug" of stuck messages that the mail server keeps 
trying to deliver and re-deliver.  Ultimately this burns up resources on 
the mail server,  which impacts other things running on that machine,  
such as the high-priority mails you want to send.
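
    To put rough numbers on that plug,  here is a back-of-the-envelope 
sketch in Python.  The daily volume and the stuck fraction are made-up 
figures for illustration,  and the function name is hypothetical;  plug 
in whatever your own logs say.

```python
def steady_state_plug(daily_volume, stuck_fraction, lifetime_days):
    """Rough count of undeliverable messages sitting in the mailer's queue.

    Each day adds daily_volume * stuck_fraction stuck messages, and each
    one stays queued for lifetime_days before the mailer gives up on it.
    """
    return int(round(daily_volume * stuck_fraction * lifetime_days))


if __name__ == "__main__":
    # Illustrative figures only: 100,000 messages a day, 2% of them stuck.
    for days in (5, 7):
        print(days, "day lifetime:", steady_state_plug(100_000, 0.02, days))
```

    Every one of those messages gets retried over and over until it 
expires,  so the retry traffic can be a lot larger than the plug itself.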

    Now,  process (ii) can certainly stop putting messages in (i) once 
the "plug" of stuck messages reaches a significant size,  but that's 
going to really slow down the bulk mail.
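
    A minimal sketch of how process (ii) might decide what to hand over 
on each pass,  assuming you can get two numbers out of your mailer:  how 
many messages it is actively working on and how many are stuck in retry.  
The function name,  the parameter names and the default thresholds are 
all hypothetical,  and how you measure the two depths depends on the 
mailer you run.

```python
from collections import deque


def release_batch(pending, active_depth, deferred_depth,
                  max_active=500, max_deferred=5000, batch_size=50):
    """Pop the messages that may be handed to the mailer on this pass.

    pending:        deque of messages waiting in queue (ii)
    active_depth:   messages the mailer is currently working on in queue (i)
    deferred_depth: stuck messages the mailer keeps retrying (the "plug")
    """
    if deferred_depth >= max_deferred:
        return []  # plug is too big: pause bulk mail entirely
    room = max_active - active_depth
    if room <= 0:
        return []  # mailer is already busy enough
    count = min(batch_size, room, len(pending))
    return [pending.popleft() for _ in range(count)]


if __name__ == "__main__":
    queue = deque("message-%d" % i for i in range(100))
    print(len(release_batch(queue, active_depth=480, deferred_depth=0)))
    print(len(release_batch(queue, active_depth=0, deferred_depth=5000)))
```

    The second call releases nothing,  which is exactly the slowdown 
described above:  once the plug crosses the threshold,  bulk mail waits 
until stuck messages expire or get delivered.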

    The main factor in mail server performance is the effect of fsync() 
calls on a mechanical disk:  mail delivery events really ought to be 
transactional,  since you don't want to deliver mail twice or fail to 
deliver it.  fsync() doesn't (honestly) return until a chunk of metal 
moves to a certain place;  the bottleneck isn't so much like "you can do 
so many a second" but more like a systemwide lock,  since there can be 
multiple processes trying to fsync(),  such as syslogd or a database 
server that's committing a transaction.  The traffic jam can get backed 
up,  since other processes can be waiting for the first process to 
complete,  can be holding more locks and so forth...  So you end up with 
a situation where the performance bottleneck is a real pain to 
understand...  It might even take you 5 minutes to get to a shell prompt 
when you ssh in.
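
    If you want to see what fsync() costs on a particular box,  a small 
timing loop is enough to get a feel for it.  This is only a sketch:  the 
function name,  the write size and the number of rounds are arbitrary 
choices,  and it measures one temp file on one filesystem,  not your 
mailer's actual workload.

```python
import os
import tempfile
import time


def time_fsyncs(directory=None, count=20, payload=b"x" * 4096):
    """Return the wall-clock seconds taken by each of `count` fsync() calls."""
    timings = []
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        for _ in range(count):
            handle.write(payload)
            handle.flush()  # push Python's buffer down to the kernel
            start = time.perf_counter()
            os.fsync(handle.fileno())  # block until the kernel reports the data written
            timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    results = sorted(time_fsyncs())
    print("median %.2f ms, worst %.2f ms"
          % (results[len(results) // 2] * 1000, results[-1] * 1000))
```

    Run it once on a quiet machine and again while the mail queue is 
busy,  pointing `directory` at the disk that holds the mail spool.  A big 
jump in the worst case is the traffic jam described above.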

    If you're on a virtual server,  you may (or may not) have somebody 
else doing a lot of fsync() calls,  in which case your performance could 
be hosed for reasons outside your observation and control.  Just as 
likely,  the server will be programmed to return from fsync() before 
the fsync is done,  which means someday you're going to have a big 
database wreck...

    For $200 a month you can rent a server that will do a great job 
delivering email.  Or you can spend 10-100x that trying to figure 
problems out.

   


