NYCPHP Meetup

NYPHP.org

[nycphp-talk] Searching PDF files?

Phil Costa pcosta at macromedia.com
Thu Jan 8 12:35:03 EST 2004


Search engines typically use filters that know how to read the PDF file
format. Once a PDF has been converted to plain text, the engine indexes it
with the same algorithms it uses for any other text.
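To make the "filter" step concrete, here is a deliberately crude sketch in Python. The helper names (`pdf_to_text`, `_maybe_inflate`, `_unescape`) are hypothetical and not from any library. It assumes a simple PDF whose text sits in literal strings drawn with the `Tj`/`TJ` operators, inside content streams that are either uncompressed or FlateDecode (zlib) compressed. Real filters also have to deal with font encodings, other stream filters, object streams, nested parentheses and encryption, so treat this as an illustration of the idea rather than a usable extractor.

```python
import re
import zlib

# Body of each "stream ... endstream" object (page content lives in these).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A PDF literal string "(...)" with backslash escapes; nested parens not handled.
LITERAL = rb"\(((?:\\.|[^\\)])*)\)"
LITERAL_RE = re.compile(LITERAL, re.DOTALL)
# "(text) Tj" draws one string; "[(te) -20 (xt)] TJ" draws an array of them.
SHOW_RE = re.compile(LITERAL + rb"\s*Tj|\[((?:\\.|[^\]\\])*)\]\s*TJ", re.DOTALL)

_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def _unescape(raw: bytes) -> str:
    """Undo simple backslash escapes (octal escapes are not handled)."""
    out = re.sub(rb"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)),
                 raw, flags=re.DOTALL)
    return out.decode("latin-1")  # assumes a simple single-byte encoding


def _maybe_inflate(body: bytes) -> bytes:
    """Return the inflated bytes if body is a complete zlib stream."""
    d = zlib.decompressobj()
    try:
        out = d.decompress(body)
    except zlib.error:
        return body  # not zlib data: assume the stream is uncompressed
    return out if d.eof else body


def pdf_to_text(data: bytes) -> str:
    """Crudely collect the drawn text of every content stream, in file order."""
    chunks = []
    for stream in STREAM_RE.finditer(data):
        body = _maybe_inflate(stream.group(1))
        for show in SHOW_RE.finditer(body):
            if show.group(1) is not None:  # (text) Tj
                chunks.append(_unescape(show.group(1)))
            else:  # [(te) -20 (xt)] TJ; kerning numbers are ignored
                parts = LITERAL_RE.findall(show.group(2))
                chunks.append("".join(_unescape(p) for p in parts))
    return " ".join(chunks)
```

The string this returns is the "regular text" that the indexing step then works on.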

The big commercial search engines (Verity, Autonomy, etc.) have filters
for PDF files as well as Excel, Word, PowerPoint, and so on. Some of the
open source ones do as well, though I'm not really sure how good they are.
One I've heard good things about is Lucene, which is managed by the Apache
Software Foundation. It's written in Java. I'm sure there are open source
C-based ones as well.
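Lucene, like most full-text engines, is built around an inverted index: a map from each term to the documents that contain it. The toy Python sketch below illustrates only that idea. The names `InvertedIndex` and `tokenize` are hypothetical and are not Lucene's API. It assumes each file's text has already been extracted by a filter, and it answers simple AND queries with no ranking, stemming or stop-word handling.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # term -> set of document ids containing that term
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        """Index already-extracted text (e.g. the output of a PDF filter)."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return the ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

For example, after `idx.add("report.pdf", "Quarterly sales report")` and `idx.add("notes.pdf", "Sales meeting notes")`, a search for `"sales report"` matches only `report.pdf`.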

Phil

-----Original Message-----
From: talk-bounces at lists.nyphp.org [mailto:talk-bounces at lists.nyphp.org] On Behalf Of tom at supertom.com
Sent: Thursday, January 08, 2004 12:06 PM
To: NYPHP Talk
Subject: [nycphp-talk] Searching PDF files?


Hey folks, a question about PDF files:

Google's search results include PDF files. Does anyone know how this is
done, and whether it can be done in PHP?

Thanks,

Tom

_______________________________________________
talk mailing list
talk at lists.nyphp.org
http://lists.nyphp.org/mailman/listinfo/talk


