[nycphp-talk] [OT] number of files in a directory?

Steve Manes smanes at magpie.com
Mon Jan 2 20:26:51 EST 2006


max goldberg wrote:
> The downside is that you have to make sure your code really keeps track 
> of your file system and you aren't accessing it by hand. Another thing 
> you might worry about using md5 is collisions. If this is a mission 
> critical system, you may want to avoid md5 as it is possible (but 
> somewhat unlikely) you will encounter collisions. I've read anyone with 
> a decent computer can create an md5 collision in about an hour, so 
> that's something to keep in mind.

Yeah, this is probably the best solution.  To avoid collisions, assign 
a unique database ID to every asset, use that ID to create the MD5 hash, 
then store the asset under a filename containing that unique ID.  That 
should eliminate collisions, because the hash only picks the directory 
and the ID keeps the filename unique.  The worst that can happen is that 
you'll have two different files in the same directory but with different 
filenames, which is cool.

A function like this could be used to both plant the file in the MD5 
filesystem and extract its path later on based on that unique ID:

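function get_upload_target($file_id) {
    // Hash the unique ID; the first six hex characters pick the
    // two directory levels (three characters each).
    $hash_id = md5($file_id);
    $subdir = substr($hash_id, 0, 3) .
        '/' .
        substr($hash_id, 3, 3);
    return $subdir;
}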
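
For a rough sense of how far this spreads the files out: each level 
uses three hex characters, so the arithmetic works out as in the sketch 
below.  dirs_per_level() is just a name I made up for illustration, not 
part of the function above.

```php
<?php
// Hypothetical helper: how many distinct directory names a prefix of
// $hex_chars hexadecimal characters can produce.
function dirs_per_level($hex_chars) {
    return pow(16, $hex_chars);
}

$per_level = dirs_per_level(3);        // 16^3 = 4096 directories per level
$leaf_dirs = $per_level * $per_level;  // 4096 * 4096 = 16777216 leaf directories

echo "$per_level directories per level, $leaf_dirs leaf directories\n";
```

So even with millions of assets, any single directory should hold only 
a handful of files.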

Use case: someone uploads the file "mykitty.jpg" and it's inserted into 
the database as id=1234.  get_upload_target(1234) returns:

  81d/c9b

The file is then written as $ASSET_DIR/81d/c9b/1234

Or 1234.jpg, or 1234.mykitty.jpg, whatever.  I like to give the file a 
recognizable file type extension.

To extract that file later, just run the ID through get_upload_target() 
again to build the filesystem path.
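
Here's a minimal sketch of the store and retrieve steps, assuming the 
same hashing scheme as above.  asset_path() and store_asset() are 
hypothetical helpers I'm making up for illustration, and copy() stands 
in for whatever you actually use to move the upload into place 
(move_uploaded_file() for a real HTTP upload).

```php
<?php
// Same hashing scheme as get_upload_target() above.
function get_upload_target($file_id) {
    $hash_id = md5((string) $file_id);
    return substr($hash_id, 0, 3) . '/' . substr($hash_id, 3, 3);
}

// Hypothetical helper: build the full filesystem path for an asset.
// The same call is used when storing and when retrieving.
function asset_path($asset_dir, $file_id, $ext = '') {
    $path = rtrim($asset_dir, '/') . '/' .
        get_upload_target($file_id) . '/' . $file_id;
    return $ext === '' ? $path : $path . '.' . $ext;
}

// Hypothetical helper: create the hashed subdirectories if needed,
// then copy the source file into place.  Returns the target path on
// success or false on failure.
function store_asset($asset_dir, $file_id, $source_file, $ext = '') {
    $target = asset_path($asset_dir, $file_id, $ext);
    $dir = dirname($target);
    // The third argument to mkdir() creates both levels in one call.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        return false;
    }
    return copy($source_file, $target) ? $target : false;
}
```

With $ASSET_DIR set to /var/assets (just an example value), 
asset_path('/var/assets', 1234, 'jpg') gives 
/var/assets/81d/c9b/1234.jpg.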
