[nycphp-talk] SEARCHING PDF DOCUMENTS WITH UNIX

Daniel Convissor danielc at analysisandsolutions.com
Thu Mar 4 21:12:19 EST 2004


On Thu, Mar 04, 2004 at 08:42:55PM -0500, DeWitt, Michael wrote:
> I am using xpdf to get text out of pdfs.

... IF you've got X windows going.

Let's see what Panix has...

d> apropos pdf | grep text

  latex, elatex, lambda, pdflatex (1) - structured text formatting and
    typesetting
  pdftotext (1) - Portable Document Format (PDF) to text converter 
    (version 2.02)

That second one looks like it'll fit the bill.

d> man pdftotext
       ... snip ...
       Pdftotext  reads the PDF file, PDF-file, and writes a text
       file, text-file.  If text-file is not specified, pdftotext
       converts  file.pdf  to file.txt.  If text-file is '-', the
       text is sent to stdout.
       ... snip ...

d> pdftotext afile.pdf - | grep stringicareabout

Works like a charm.
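
If there's a whole pile of PDFs to check, the same pipeline can be wrapped in a loop. What follows is only a rough sketch: the function name search_pdfs is made up for illustration, and it assumes pdftotext is on the PATH and accepts '-' as the text-file argument, as the man page excerpt above describes.

```shell
# search_pdfs PATTERN FILE...
# Hypothetical helper (not part of xpdf): prints the name of each file
# whose extracted text contains PATTERN.
search_pdfs() {
    pattern=$1
    shift
    for pdf in "$@"; do
        # '-' as the text-file argument sends the text to stdout;
        # grep -q only reports whether a match exists, printing nothing.
        if pdftotext "$pdf" - 2>/dev/null | grep -q -e "$pattern"; then
            printf '%s\n' "$pdf"
        fi
    done
}
```

d> search_pdfs stringicareabout *.pdf

That lists only the names of the files that match, much like grep -l does for plain text files.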

Enjoy,

--Dan

-- 
 T H E   A N A L Y S I S   A N D   S O L U T I O N S   C O M P A N Y
            data intensive web and database programming
                http://www.AnalysisAndSolutions.com/
 4015 7th Ave #4, Brooklyn NY 11232  v: 718-854-0335 f: 718-854-0409


