[nycphp-talk] XML files

Anirudh Zala xml at aumcomputers.com
Tue May 20 06:09:39 EDT 2003


Larry

Let's look at your problems one by one.

1: Problem of storing and accessing a large volume of files in one container directory.

A: In such cases a split directory structure is the best solution, as "Rick Seeger" has suggested. You can design a directory structure that creates a new directory whenever the total number of files exceeds a limit, say 100 files per directory. The sample function below does that (it creates a new directory as soon as "id" passes the next multiple of 100).

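// Creates a directory for the given id under the parent directory path
// and returns its name, e.g. "101_200" for ids 101 to 200.
function createDir($id, $parent)
{
    $nd_dirlimit = 100; // maximum number of files per directory

    // zero-based index of the bucket this id belongs to
    $mod = (int) ceil($id / $nd_dirlimit) - 1;

    $dirs = $mod * $nd_dirlimit + 1;   // first id in the bucket
    $dire = $dirs + $nd_dirlimit - 1;  // last id in the bucket

    $tempDir = $dirs . "_" . $dire;

    if (!file_exists("$parent/$tempDir")) {
        mkdir("$parent/$tempDir", 0777);
    }

    return $tempDir;
}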

With the above method the directory structure will look like ../1_100/, ../101_200/, ../201_300/, and each directory will hold the files for its range of ids, say "1.xml" to "100.xml" (or any other names). Once this directory structure is in place, no script will crawl the way it does now :D.
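
As a usage sketch, the directory name for any id can also be computed without touching the file system, which is handy when the id comes from the DB and you only need the path of the file. The helper names bucketName() and xmlPathForId() are made up for illustration, the 100-file limit is the same assumption as above, and the code assumes PHP 7 or later (for intdiv() and type declarations).

```php
<?php
// Hypothetical helpers for illustration; they are not part of createDir().

// Returns the bucket directory name for an id, e.g. 250 -> "201_300".
function bucketName(int $id, int $limit = 100): string
{
    if ($id < 1) {
        throw new InvalidArgumentException("id must be >= 1");
    }
    $start = intdiv($id - 1, $limit) * $limit + 1; // first id in the bucket
    $end   = $start + $limit - 1;                  // last id in the bucket
    return $start . "_" . $end;
}

// Builds the full path of the XML file for an id under a parent directory.
function xmlPathForId(int $id, string $parent): string
{
    return rtrim($parent, "/") . "/" . bucketName($id) . "/" . $id . ".xml";
}
```

A script that already knows the id then opens xmlPathForId($id, $parent) directly instead of scanning one huge directory.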

2: Importing Data into DB by parsing XML files

A: Use the PHP file "XPath.class.php". It is a very nice OO class for parsing XML documents quickly. It has a lot of easy-to-use methods such as $xml->evaluate, $xml->setAttributes, $xml->appendChild, $xml->removeAttribute, $xml->exportToFile etc., with which you can work through lots of XML documents like playing with jigsaw puzzles. It is a very handy, class-based PHP file. Check the examples at http://www.soi.city.ac.uk/~sa386/php/XPath_V2/testBench/useCases/

Sometimes it is useful to use Perl utilities and run your Perl/CGI scripts from the console. This is a very fast and reliable method IF such documents are to be generated, read or written in batch mode. Perl's XML::DOM package helps here. Another advantage of this mechanism is that you can run such scripts separately, from other servers that have a connection to your DB server, which reduces the overhead on your main server where the PHP scripts are running.

Another method is XSQL, which is powerful but still new to us and to PHP. There the DB SQL is written directly into the XML file, and executing the XSQL synchronises your XML and DB in both directions (for example, updating your XML document from the DB). (Can't tell much here :( )
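
To give a rough idea of the parsing step before the DB import, here is a minimal sketch. Note that it does NOT use XPath.class.php: it swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) so that it runs on its own. The document layout (article, title and keyword elements with an id attribute) and the function name extractArticles() are assumptions made up for this example.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, not XPath.class.php.
// The <article>/<title>/<keyword> layout is a hypothetical document format.
function extractArticles(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException("could not parse XML");
    }

    $xpath = new DOMXPath($doc);
    $rows  = [];

    // One row per <article>, ready to be handed to an INSERT statement.
    foreach ($xpath->query("//article") as $article) {
        $keywords = [];
        foreach ($xpath->query("keyword", $article) as $kw) {
            $keywords[] = trim($kw->textContent);
        }
        $rows[] = [
            "id"       => (int) $article->getAttribute("id"),
            "title"    => trim($xpath->evaluate("string(title)", $article)),
            "keywords" => $keywords,
        ];
    }

    return $rows;
}
```

Each returned row can then be written to the DB with whatever insert routine you already use.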

3: Is there another way to index and query large volumes of documents?

A: I am not sure I understand your problem exactly. But if you need faster document access from a directory or file system (searching for 1 file among a million files), then the answer to the 1st question can help: you only need to look for the particular document among the 100 files of one document directory, whose name you already get from the DB.

And if the question is about the data, then using indexes in MySQL (indexing commonly used fields in the DB, like primary keys and foreign keys) can make record access faster.

Thanks,

Anirudh Zala (azala at lechuon.com)

------------------------------------------------------------------------------------------------------
Anirudh Zala (Project Manager),           Tel: +91 281 2451894
AUM Computers,                                Gsm: +91 98981 37727
317, Star Plaza,                                 anirudh at aumcomputers.com
Rajkot-360001, Gujarat, INDIA,            http://www.aumcomputers.com
------------------------------------------------------------------------------------------------------

----- Original Message ----- 
From: "Larry Chuon" <LarryC at indexstock.com>
To: "NYPHP Talk" <talk at nyphp.org>
Sent: Wednesday, 07 May, 2003 4:01 AM
Subject: RE: [nycphp-talk] XML files


> Thank you all for your quick responses.  I'll consider generating XML on the
> fly (still new at this).  Here's another question.  Besides importing the
> files to dB, is there another way to index and query large volumes of
> documents.  I'm going out of tangent a bit here.  Let say, I'm archiving
> tons of document for a public library.  They want to scan and digitize
> millions of articles.  What is the best way to index and search for articles
> say by keywords or captions?
> 
> 
> 
> -----Original Message-----
> From: Malcolm, Gary [mailto:gmalcolm at professionalcredit.com]
> Sent: Tuesday, May 06, 2003 5:53 PM
> To: NYPHP Talk
> Subject: RE: [nycphp-talk] XML files
> 
> http://www.phpwebhosting.com/
> http://phphosts.codewalkers.com
> http://www.oinko.net/freephp
> www.free-php-hosting.com
> 
> cheap hosting... cheap db access... i love (hearts in eyes) mysql
> 
> 
> > -----Original Message-----
> > From: soazine at pop.erols.com [mailto:soazine at pop.mail.rcn.net]
> > Sent: Tuesday, 06 May, 2003 2:41 PM
> > To: NYPHP Talk
> > Subject: Re: [nycphp-talk] XML files
> >
> >
> > Importing the XML files into a database is an ideal solution,
> > unfortunately, not always an available one, such as in my
> > case.  I have
> > space on a remote server where database access is very
> > expensive (it's my
> > own site and out of my price range to afford db access), so I have to
> > resort to XML as well.  PHP parses XML extremely fast and
> > efficiently; I
> > highly recommend it.
> >
> > I'd use PHP's available XML parsers along with grouping them into
> > directories sorted by a date or some other delimiter to allow
> > for smaller
> > amount of files per directory.
> >
> > Phil
> >
> > Original Message:
> > -----------------
> > From: Analysis & Solutions danielc at analysisandsolutions.com
> > Date: Tue,  6 May 2003 17:35:16 -0400
> > To: talk at nyphp.org
> > Subject: Re: [nycphp-talk] XML files
> >
> >
> > On Tue, May 06, 2003 at 04:49:21PM -0400, Larry Chuon wrote:
> > > I'm doing everything that you mention below.
> >
> > So, import the files into a database and get rid of the XML files.
> >
> > Here's a quick tutorial on how to parse XML in PHP:
> >    http://www.analysisandsolutions.com/code/phpxml.htm
> >
> > Enjoy,
> >
> > --Dan
> >
> > --
> >      FREE scripts that make web and database programming easier
> >            http://www.analysisandsolutions.com/software/
> >  T H E   A N A L Y S I S   A N D   S O L U T I O N S   C O M P A N Y
> >  4015 7th Ave #4AJ, Brooklyn NY    v: 718-854-0335   f: 718-854-0409