[nycphp-talk] SEARCHING PDF DOCUMENTS WITH UNIX

DeWitt, Michael mjdewitt at alexcommgrp.com
Thu Mar 4 21:17:35 EST 2004


Dan,

Isn't pdftotext part of the xpdf package?

Mike

> -----Original Message-----
> From:	Daniel Convissor [SMTP:danielc at analysisandsolutions.com]
> Sent:	Thursday, March 04, 2004 9:12 PM
> To:	NYPHP Talk
> Subject:	Re: [nycphp-talk] SEARCHING PDF DOCUMENTS WITH UNIX
> 
> On Thu, Mar 04, 2004 at 08:42:55PM -0500, DeWitt, Michael wrote:
> > I am using xpdf to get text out of pdfs.
> 
> ... IF you've got X windows going.
> 
> Let's see what Panix has...
> 
> d> apropos pdf | grep text
> 
>   latex, elatex, lambda, pdflatex (1) - structured text formatting and
>     typesetting
>   pdftotext (1) - Portable Document Format (PDF) to text converter
>     (version 2.02)
> 
> That second one looks like it'll fit the bill.
> 
> d> man pdftotext
>        ... snip ...
>        Pdftotext  reads the PDF file, PDF-file, and writes a text
>        file, text-file.  If text-file is not specified, pdftotext
>        converts  file.pdf  to file.txt.  If text-file is '-', the
>        text is sent to stdout.
>        ... snip ...
> 
> d> pdftotext afile.pdf - | grep stringicareabout
> 
> Works like a charm.
> 
> Enjoy,
> 
> --Dan
> 
> -- 
>  T H E   A N A L Y S I S   A N D   S O L U T I O N S   C O M P A N Y
>             data intensive web and database programming
>                 http://www.AnalysisAndSolutions.com/
>  4015 7th Ave #4, Brooklyn NY 11232  v: 718-854-0335 f: 718-854-0409
> _______________________________________________
> talk mailing list
> talk at lists.nyphp.org
> http://lists.nyphp.org/mailman/listinfo/talk
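
The pdftotext | grep one-liner quoted above handles one file at a time. Here is a minimal sketch of the same idea looped over several PDFs. It assumes pdftotext is on the PATH; the function name search_pdfs is hypothetical and is not part of xpdf.

```shell
#!/bin/sh
# search_pdfs PATTERN FILE.pdf...
# Prints the name of each PDF whose extracted text matches PATTERN.
# Hypothetical helper; assumes pdftotext (from the xpdf package) is installed.
search_pdfs() {
    pattern=$1
    shift
    for pdf in "$@"; do
        # '-' as the text-file argument sends the extracted text to stdout,
        # as described in the man page excerpt. grep -q prints nothing and
        # only reports through its exit status whether the pattern matched.
        if pdftotext "$pdf" - 2>/dev/null | grep -q -e "$pattern"; then
            printf '%s\n' "$pdf"
        fi
    done
}
```

Called as `search_pdfs stringicareabout *.pdf`, it lists only the files that contain the string. Files that pdftotext cannot read are skipped silently because its error output is discarded.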


