NYPHP.org

[nycphp-talk] [OT] number of files in a directory?

Anirudh Zala arzala at gmail.com
Mon Jan 2 00:32:17 EST 2006


Hi,

# First of all, please do not tag a question like this [OT]. Problems of this kind come up in any programming environment, and an [OT] in the subject line may well cut the number of replies you get by up to 50%.

# Anyway, straight to your point:

=> In my opinion you should never keep all your images or other dynamic content in a single location. It makes serving the images slower, and it makes it easier for someone to steal them from your website without your knowledge.

=> From a usage point of view, keeping every image in one place will slow down serving, because your scripts almost certainly call PHP functions such as "is_file" or "file_exists" to check that an image actually exists before showing it to users. With nearly 100,000 images in one directory, each such check has far more entries to search than it would if the same images were spread over about 100 folders, where a check only has to search 1000 to 5000 entries.

=> Hence limit each directory to about 1000 images, or split by something like PRODUCT ID. For example, if you have 100 products, your directory structure will look like this:

\.....
	\images

		\1
			\RECORDID_SOME>10CHARACTERHASH.EXT (e.g. 123456_aswe34567bg.jpg)
		\2
			\RECORDID_SOME>10CHARACTERHASH.EXT (e.g. 7890_aswertgyuionjui.jpg)
		\3
			\.......
		\4
			\.......
		....

=> As the structure above shows, under your main images folder "images" create one folder per "PRODUCT ID" and name each image "RECORDID_SOME>10CHARACTERHASH.EXT". Your table(s) of records will already have a unique "RECORDID" for every record. The hash of more than 10 characters is added as a suffix so that thieves can never guess how many images a folder holds, and an automated script trying to hoover them up has no idea what to look for or where. Make sure each product folder (1, 2, 3, etc.) ends up with roughly 1000 to 5000 images. If you expect more than that, use a slightly different structure based on ranges of 100 record IDs, e.g. 1_100, 101_200, 201_300, etc. In that case each RECORDID must go into the folder whose range contains it: an image named "7890_aswertgyuionjui.jpg" belongs under "7801_7900" only.
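
The range-folder naming can be sketched in a few lines. This is only an illustration in Python (the same arithmetic carries over to PHP); the function names "bucket_for" and "image_path", the "jpg" default and the 16-character random suffix are my assumptions, not part of any existing script. Because the suffix is random, the generated file name has to be stored with the record.

```python
import secrets


def bucket_for(record_id, size=100):
    """Return the range folder for a record, e.g. 7890 -> '7801_7900'."""
    if record_id < 1:
        raise ValueError("record_id must be >= 1")
    # Ranges start at 1, so shift by one before dividing.
    start = ((record_id - 1) // size) * size + 1
    return f"{start}_{start + size - 1}"


def image_path(record_id, ext="jpg", root="images"):
    """Build RECORDID_HASH.EXT under the matching range folder."""
    # 8 random bytes give 16 hex characters, i.e. more than 10.
    suffix = secrets.token_hex(8)
    return f"{root}/{bucket_for(record_id)}/{record_id}_{suffix}.{ext}"
```

For example, bucket_for(7890) gives "7801_7900", matching the folder named above.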

=> Also disable the directory index for the main image folder "images", so that the server does not expose your whole directory structure to anyone who requests that folder directly from a browser. The server will return error 403 (access forbidden) instead.
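
Which web server is in use has not been mentioned in this thread, so the following is only a sketch under the assumption that it is Apache and that per-directory overrides are allowed. Other servers have their own equivalent setting.

```apache
# Assumption: Apache. Place in a .htaccess file inside "images",
# or in the matching <Directory> block of the server config.
Options -Indexes
```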

=> Please note that with this layout you need not worry if the number of folders goes above 1000. What matters is that the number of files in each folder stays small and roughly equal, because going into any one of 1000 folders and serving an image from there is far faster, on a Linux server as you mentioned earlier, than serving the same image from a haystack of 1000 x 1000 images.

=> Another suggestion: if records move through your website quickly, say only 50,000 are active at any time while millions are sold or inactive, and your visitors mostly browse those 50,000, then split the image location into two trees such as "images" and "sold_images". That way the number of active images never exceeds a certain limit, and the website performs consistently even as the number of records or visitors grows.
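
Moving an image from the active tree to the sold tree could look roughly like this. It is a minimal Python sketch; "archive_image" is a hypothetical helper, and it assumes the sold tree mirrors the subfolder layout of the active tree.

```python
import os
import shutil


def archive_image(rel_path, active_root="images", sold_root="sold_images"):
    """Move one image from the active tree to the sold tree.

    rel_path is the path below the root, e.g. "1/123456_aswe34567bg.jpg".
    """
    src = os.path.join(active_root, rel_path)
    dst = os.path.join(sold_root, rel_path)
    # Create the mirrored subfolder on first use.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.move(src, dst)
    return dst
```

Such a helper would be called at the moment a record is marked sold or inactive, so the active tree only ever holds images that visitors are likely to request.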

# Other extra suggestions:

=> If your images live on NAS reached through something like NFS, be careful with functions such as "is_dir" or "file_exists", or any other function that checks for the existence of a file or folder. Checking these things on a local hard drive is much faster than checking them on a remote drive, so try to avoid such calls when many images or folders would have to be checked before they can be displayed.
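
One possible way to cut down on such checks, which is my own illustration rather than the only option, is to remember the answer after the first lookup. The Python sketch below uses an in-process cache; "image_exists" is a hypothetical name, and in a setup where every request starts a fresh process the cache would have to live in some shared store instead.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=100_000)
def image_exists(path):
    """Check for a file once; repeat calls for the same path hit the cache."""
    # Only the first call per path touches the (possibly remote) filesystem.
    return os.path.isfile(path)
```

The trade-off is that a cached answer goes stale when a file is added or deleted later, so image_exists.cache_clear() needs to be called after such changes.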

=> When you have a lot of images to serve, bandwidth becomes an important issue. Image formats such as JPEG are already compressed, so they can hardly be compressed any further on the way to the browser and will use the full bandwidth their size requires. If the images do not need to be printed, keep them at screen resolution, i.e. 72 dpi at the size they are displayed, because that is enough for a monitor and only printing needs more. This also saves a lot of disk space. If you must offer the images for printing as well, and you have enough storage capacity, keep 2 versions of each image: 1: a 72 dpi version for showing in the browser, and 2: a higher-resolution version for printing.

=> Finally, of course, "Keep your city clean". Design a backend script that matches the records in your database against the images stored for them. If a record is no longer in your db, there is no need to keep its images.
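
The matching step of such a cleanup script might be sketched as follows. This is a Python illustration only; "find_orphans" is a hypothetical name, the list of live record IDs is assumed to come from your own database query, and files are assumed to follow the RECORDID_HASH.EXT naming described earlier.

```python
import os


def find_orphans(images_root, live_record_ids):
    """Yield paths of images whose RECORDID prefix has no live record."""
    live = {str(r) for r in live_record_ids}
    for dirpath, _dirs, files in os.walk(images_root):
        for name in files:
            # The RECORDID is everything before the first underscore.
            record_id = name.split("_", 1)[0]
            if record_id not in live:
                yield os.path.join(dirpath, name)
```

It only reports candidates. Deleting them, or moving them elsewhere first, is left as a separate and deliberate step.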

I hope these guidelines help answer most of your questions about image storage.


Thanks,

Anirudh Zala


On Sun, 01 Jan 2006 06:23:38 +0530, Marc Antony Vose <suzerain at suzerain.com> wrote:

> Hey all:
>
> First of all:  Happy New Year!
>
> Secondly: I am rebuilding a site that was coded somewhat sloppily,
> and they have product images all stored in one directory (a script
> that I am not writing auto-uploads them to the web server from
> elsewhere).  Presently, this directory contains about 33,000 files.
> It will be more like 75,000 when the site launches, if things remain
> the same.
>
> The question is:  should I be worried about this, or was this only a
> problem several years ago? (I remember people at one time attempting
> to not put too many files in one place.)
>
> If I should be worried, what could happen?  Will we ever reach a hard
> limit of files per directory?
>
> Is it better if each product instead has its own directory inside
> there (i.e., 75,000 directories), each with as many files as we need
> inside, or is that just the same problem?
>
> Cheers,
>



-- 
-----------------------------------------------------
Anirudh Zala (Production Manager)
ASPL, http://www.aspl.in
Ph: +91 281 245 1894
arzala at gmail.com
-----------------------------------------------------


