[nycphp-talk] XML files

Larry Chuon LarryC at indexstock.com
Wed May 7 14:17:47 EDT 2003


Hi Anirudh,
 
Long time no hear.  Thank you and everyone at NYPHP for the solution.  I
learned a great deal in the last day and a half.  As Chalu had noted, I need
to rethink the solution.  Pumping out nearly a million XML files is not the
best practice.  Last night, I accepted Gary Malcolm's invitation to look at
Xindice.  Xindice uses XPath and XUpdate, and it addresses some of my
concerns (I'm not certain how many records it can support, but I can
categorize).  Unfortunately, Xindice does not have a PHP client for its API.
I need to figure out something for this.
 
Your sample code solves my large-volume-of-files problem should I ever need
to spit out more XML files.
 
Now I have a whole new set of problems.  Let's say I need to share data
with some companies.  The XML files get changed and must be updated on my
system or their system, depending on who receives the changes.  Since I
have nearly a million records, how do I keep track of changes?  Importing
everything back into Xindice (or whatever repository) X times over will
generate a lot of unnecessary traffic.  On the other hand, updating only the
changes will require a lot of tracking (timestamps) as well.  Finally,
giving appropriate hooks to the partners will require extensive predefined
business rules (I don't know them all, and I don't assume to know my
partners'/users' business workflow either) and some sort of version control
for rollback and an approval process, similar to a CMS.  Thoughts,
anybody?
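
To make the "update only the changes" option concrete, here is a minimal
sketch, assuming each side can build a manifest of content hashes keyed by
record id and exchange it instead of the files themselves.  The function
names buildManifest and diffManifests and the tiny sample records are
hypothetical, and this does not touch Xindice at all.

```php
<?php
// Hypothetical helpers, for illustration only (not part of any library).

// Map each record id to a SHA-1 hash of its XML content.
function buildManifest(array $documents): array
{
    $manifest = [];
    foreach ($documents as $id => $xml) {
        $manifest[$id] = sha1($xml);
    }
    return $manifest;
}

// Compare two manifests and report which ids were added, changed or removed.
function diffManifests(array $old, array $new): array
{
    $added   = array_keys(array_diff_key($new, $old));
    $removed = array_keys(array_diff_key($old, $new));

    $changed = [];
    foreach (array_intersect_key($new, $old) as $id => $hash) {
        if ($old[$id] !== $hash) {
            $changed[] = $id;
        }
    }

    return ['added' => $added, 'changed' => $changed, 'removed' => $removed];
}

// Made-up sample data: record 2 was edited, 3 was dropped, 4 is new.
$mine   = buildManifest([1 => '<r>a</r>', 2 => '<r>b</r>', 3 => '<r>c</r>']);
$theirs = buildManifest([1 => '<r>a</r>', 2 => '<r>B</r>', 4 => '<r>d</r>']);

$diff = diffManifests($mine, $theirs);
print_r($diff);
```

Only the ids listed in the result would need to travel between the two
systems.  Rollback, approval and conflict handling (both sides editing the
same record) are not covered by this sketch.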
 
Thanks for all the suggestions guys.
Larry
 
-----Original Message-----
From: Anirudh Zala [mailto:anirudh at aumcomputers.com]
Sent: Wednesday, May 07, 2003 2:17 AM
To: talk at nyphp.org
Cc: LarryC at indexstock.com
Subject: Re: [nycphp-talk] XML files
 
Larry
 
Let's have a look at your problems one by one.
 
1: Problem of storing and accessing a large volume of files in one container
directory.
 
A: In such cases a split directory structure is the best solution, like what
"Rick Seeger" has suggested. You can design a directory structure that
creates a new directory whenever the total number of files exceeds a limit,
say 100 files per directory. The sample function below does that (it
creates a new directory each time "id" passes a multiple of 100).
 
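// CREATES A DIRECTORY ACCORDING TO GIVEN ID AND PARENT DIR PATH
function createDir($id, $parent)
{
    $nd_dirlimit = 100;   // maximum number of files per directory

    // zero-based index of the bucket that holds this id (ids start at 1)
    $mod = (int) ceil($id / $nd_dirlimit) - 1;

    $dirs = $mod * $nd_dirlimit + 1;    // first id in the bucket
    $dire = $dirs + $nd_dirlimit - 1;   // last id in the bucket

    $tempDir = $dirs . "_" . $dire;

    // create the bucket directory only if it does not exist yet
    if (!is_dir("$parent/$tempDir")) {
        mkdir("$parent/$tempDir", 0777);
    }

    return $tempDir;
}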
With the above method the directory structure will be like ../1_100/,
../101_200/, ../201_300/, and in each directory the files named by id, say
"1.xml" to "100.xml" (or with any name), will reside. Once the above
directory structure is implemented, no script will crawl like now :D.
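
For reading files back, the same arithmetic can locate a document without
touching the disk. The following is only a sketch: the helper name
xmlPathForId, the "/data/xml" parent path and the "<id>.xml" naming are
assumptions for illustration, and it needs PHP 7 or later for intdiv().

```php
<?php
// Hypothetical helper (name and signature are illustrative).
// Computes where the file for a given id lives, without creating anything.
function xmlPathForId(int $id, string $parent, int $limit = 100): string
{
    if ($id < 1) {
        throw new InvalidArgumentException("id must be 1 or greater");
    }

    $bucket = intdiv($id - 1, $limit);   // zero-based bucket index
    $first  = $bucket * $limit + 1;      // first id in the bucket
    $last   = $first + $limit - 1;       // last id in the bucket

    return "$parent/{$first}_{$last}/$id.xml";
}

echo xmlPathForId(1, "/data/xml"), "\n";     // /data/xml/1_100/1.xml
echo xmlPathForId(100, "/data/xml"), "\n";   // /data/xml/1_100/100.xml
echo xmlPathForId(101, "/data/xml"), "\n";   // /data/xml/101_200/101.xml
```

Note that with a limit of 100 and about a million ids, the parent itself
ends up holding roughly 10,000 bucket directories, so a larger limit or a
second level may be worth considering.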
 
2: Importing Data into DB by parsing XML files
 
A: Use the PHP file "XPath.class.php". It is a very nice OO class for
parsing XML documents at good speed. It has a lot of easy-to-use methods and
properties like $xml->evaluate, $xml->setAttributes, $xml->appendChild,
$xml->removeAttribute, $xml->exportToFile etc., with which you can work
through lots of XML documents like playing jigsaw puzzles. It is a very
handy, class-based PHP file. Check the examples at
http://www.soi.city.ac.uk/~sa386/php/XPath_V2/testBench/useCases/
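
As a rough illustration of the XPath-based import idea, here is a sketch
that uses PHP's bundled DOM extension (DOMDocument and DOMXPath) as a
stand-in, since the exact XPath.class.php signatures are not shown in this
thread. The XML layout (records/record/caption/keyword) is made up for the
example.

```php
<?php
// Made-up sample document; the real files will have their own structure.
$xml = <<<XML
<records>
  <record id="1"><caption>Sunset over Brooklyn</caption><keyword>sunset</keyword></record>
  <record id="2"><caption>Harbor at dawn</caption><keyword>harbor</keyword></record>
</records>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Turn every <record> element into a flat row, ready for a DB insert.
$rows = [];
foreach ($xpath->query('/records/record') as $record) {
    $rows[] = [
        'id'      => (int) $record->getAttribute('id'),
        'caption' => $xpath->evaluate('string(caption)', $record),
        'keyword' => $xpath->evaluate('string(keyword)', $record),
    ];
}

print_r($rows);
```

Each row could then be passed to whatever insert routine the DB layer
provides.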
 
Sometimes it is useful to use Perl utilities and run your Perl/CGI scripts
from the console. This is a fast and reliable method IF such documents are
to be generated, read or written in batch mode. Perl's XML::DOM package
helps here. Another advantage of this approach is that you can run such
scripts separately, from other servers that have a connection to your DB
server, which reduces the overhead on your main server where the PHP scripts
are running.
 
Another method is XSQL, which is powerful but still new to us and to PHP.
Your DB SQL is written directly into the XML file, and executing the XSQL
updates your XML document from the DB, so XML and DB can be synchronized
both ways. (Can't tell much here :( )
 
3: Is there another way to index and query large volumes of documents?
 
A: I am not sure I understand your problem exactly. But if you need faster
document access from the directory or file system (finding 1 file among a
million files), then the answer to the 1st question can help: you only need
to find the document among the 100 files of a document directory whose
location is already known from the DB.
 
And if the question is related to data, then using indexes (indexing
commonly used fields in the DB, like primary keys and foreign keys) in MySQL
can make record access faster.
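
A small sketch of the indexing idea follows. It uses an in-memory SQLite
database through PDO only so that it runs without a server (this assumes
the pdo_sqlite extension is enabled); MySQL has a comparable CREATE INDEX
statement. The table and column names (articles, category_id, caption) are
invented for the example.

```php
<?php
// In-memory SQLite database, used here purely as a stand-in for MySQL.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The primary key is indexed automatically; the lookup column gets its own index.
$db->exec('CREATE TABLE articles (id INTEGER PRIMARY KEY, category_id INTEGER, caption TEXT)');
$db->exec('CREATE INDEX idx_articles_category ON articles (category_id)');

$insert = $db->prepare('INSERT INTO articles (id, category_id, caption) VALUES (?, ?, ?)');
foreach ([[1, 10, 'Sunset'], [2, 20, 'Harbor'], [3, 10, 'Bridge']] as $row) {
    $insert->execute($row);
}

// A lookup on the indexed column; with many rows the index avoids a full scan.
$select = $db->prepare('SELECT id FROM articles WHERE category_id = ? ORDER BY id');
$select->execute([10]);
$ids = array_map('intval', $select->fetchAll(PDO::FETCH_COLUMN));

print_r($ids);
```

With three rows the index makes no visible difference; the benefit shows up
once the table holds many records and queries filter on the indexed column.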
 
Thanks,
 
Anirudh Zala ( azala at lechuon.com )
 
------------------------------------------------------------------
Anirudh Zala (Project Manager),    Tel: +91 281 2451894
AUM Computers,                     Gsm: +91 98981 37727
317, Star Plaza,                   anirudh at aumcomputers.com
Rajkot-360001, Gujarat, INDIA,     http://www.aumcomputers.com
------------------------------------------------------------------
 
----- Original Message ----- 
From: "Larry Chuon" <LarryC at indexstock.com>
To: "NYPHP Talk" <talk at nyphp.org>
Sent: Wednesday, 07 May, 2003 4:01 AM
Subject: RE: [nycphp-talk] XML files
 
> Thank you all for your quick responses.  I'll consider generating XML on
> the fly (still new at this).  Here's another question.  Besides importing
> the files to dB, is there another way to index and query large volumes of
> documents?  I'm going off on a tangent a bit here.  Let's say I'm
> archiving tons of documents for a public library.  They want to scan and
> digitize millions of articles.  What is the best way to index and search
> for articles, say by keywords or captions?
> 
> 
> 
> -----Original Message-----
> From: Malcolm, Gary [mailto:gmalcolm at professionalcredit.com]
> Sent: Tuesday, May 06, 2003 5:53 PM
> To: NYPHP Talk
> Subject: RE: [nycphp-talk] XML files
> 
> http://www.phpwebhosting.com/
> http://phphosts.codewalkers.com
> http://www.oinko.net/freephp
> http://www.free-php-hosting.com
> 
> cheap hosting... cheap db access... i love (hearts in eyes) mysql
> 
> 
> > -----Original Message-----
> > From: soazine at pop.erols.com [mailto:soazine at pop.mail.rcn.net]
> > Sent: Tuesday, 06 May, 2003 2:41 PM
> > To: NYPHP Talk
> > Subject: Re: [nycphp-talk] XML files
> >
> >
> > Importing the XML files into a database is an ideal solution,
> > unfortunately, not always an available one, such as in my case.  I have
> > space on a remote server where database access is very expensive (it's
> > my own site and out of my price range to afford db access), so I have to
> > resort to XML as well.  PHP parses XML extremely fast and efficiently; I
> > highly recommend it.
> >
> > I'd use PHP's available XML parsers along with grouping them into
> > directories sorted by a date or some other delimiter to allow for
> > smaller amount of files per directory.
> >
> > Phil
> >
> > Original Message:
> > -----------------
> > From: Analysis & Solutions danielc at analysisandsolutions.com
> > Date: Tue,  6 May 2003 17:35:16 -0400
> > To: talk at nyphp.org
> > Subject: Re: [nycphp-talk] XML files
> >
> >
> > On Tue, May 06, 2003 at 04:49:21PM -0400, Larry Chuon wrote:
> > > I'm doing everything that you mention below.
> >
> > So, import the files into a database and get rid of the XML files.
> >
> > Here's a quick tutorial on how to parse XML in PHP:
> >    http://www.analysisandsolutions.com/code/phpxml.htm
> >
> > Enjoy,
> >
> > --Dan
> >
> > --
> >      FREE scripts that make web and database programming easier
> >             http://www.analysisandsolutions.com/software/
> >  T H E   A N A L Y S I S   A N D   S O L U T I O N S   C O M P A N Y
> >  4015 7th Ave #4AJ, Brooklyn NY    v: 718-854-0335   f: 718-854-0409
> >