

Previously on "PDF data"


  • PRC1964
    replied
    Just a wild shot, but have you tried an OCR package to read the PDF? I've seen that done with news stories.



  • BoredBloke
    replied
    I'm not paying for one - it's only something which 'might' be an option here.



  • Spacecadet
    replied
    Looks like working with the original source code for one of the free ones might be the best option, then.

    Or have a look at one of the non-free ones?



  • BoredBloke
    replied
    That is what I have been finding with all the free ones. They do things like drop the minus symbol, which is a pain when it appears in some of the server names we have here (but not all). Also, they are not consistent when handling tables. Sometimes they place data from 3 columns in 3 separate rows and other times they just join it all together.
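
The inconsistent table handling described above can at least be detected before the data goes anywhere else. The sketch below is only an illustration, not something from the thread: it assumes the extracted text has already been split into rows of fields, and the function name `find_irregular_rows` is hypothetical. It reports every row whose field count differs from the expected one, which catches both "3 columns spread over 3 rows" and "everything joined together".

```python
from collections import Counter


def find_irregular_rows(rows, expected_columns=None):
    """Return (expected_columns, bad_rows) for a list of parsed rows.

    bad_rows is a list of (line_number, row) pairs, numbered from 1.
    If expected_columns is None, the most common row length is used
    as the expected column count (an assumption, not a guarantee).
    """
    rows = list(rows)
    if not rows:
        return expected_columns, []
    if expected_columns is None:
        expected_columns = Counter(len(r) for r in rows).most_common(1)[0][0]
    bad_rows = [
        (number, row)
        for number, row in enumerate(rows, start=1)
        if len(row) != expected_columns
    ]
    return expected_columns, bad_rows
```

Rows flagged this way still need fixing by hand or by a custom rule, but they are no longer loaded silently.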



  • Spacecadet
    replied
    Originally posted by bored:
    pdftotext is distributed with most *nixes and you can get it on Windows as part of cygwin, too.

    http://en.wikipedia.org/wiki/Pdftotext

    Edit - if you follow the link in the wikipedia article, there appears to be a download compiled with MSVC which does not require cygwin.
    Just tried that... it works OK(ish). Tables don't seem to come out of it too well, though, and it would be difficult to work with the resulting text file programmatically.
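
Working with the text file programmatically may be less painful if the table columns stay separated by runs of spaces, which is roughly what pdftotext's `-layout` option aims for. The following is a minimal sketch under that assumption; the helper names and the "two or more spaces means a column gap" rule are illustrative guesses, and real reports may need a different rule.

```python
import re

# Assumption: two or more whitespace characters mark a gap between columns.
COLUMN_GAP = re.compile(r"\s{2,}")


def split_columns(line):
    """Split one line of layout-preserved text into its column values."""
    stripped = line.strip()
    if not stripped:
        return []
    return COLUMN_GAP.split(stripped)


def parse_table(text):
    """Return a list of rows (lists of strings), skipping blank lines."""
    rows = (split_columns(line) for line in text.splitlines())
    return [row for row in rows if row]
```

Because the split is on whitespace only, hyphens inside values such as server names are left untouched by this step.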



  • bored
    replied
    pdftotext is distributed with most *nixes and you can get it on Windows as part of cygwin, too.

    http://en.wikipedia.org/wiki/Pdftotext

    Edit - if you follow the link in the wikipedia article, there appears to be a download compiled with MSVC which does not require cygwin.
    Last edited by bored; 18 September 2007, 19:48.
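
For anyone wanting to script the pdftotext step, a small wrapper might look like the sketch below. It assumes the `pdftotext` executable is on the PATH; the function names and file names are hypothetical. The `-layout` option asks pdftotext to keep the physical layout of the page, which may help tables come out one row per line.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Return the argument list for a pdftotext call."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout: maintain the original physical layout of the text.
        command.append("-layout")
    command += [str(pdf_path), str(txt_path)]
    return command


def pdf_to_text(pdf_path, txt_path, keep_layout=True):
    """Run pdftotext and return the output path.

    Raises FileNotFoundError if pdftotext is not installed, and
    subprocess.CalledProcessError if the conversion fails.
    """
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH")
    subprocess.run(
        build_pdftotext_command(pdf_path, txt_path, keep_layout),
        check=True,
    )
    return Path(txt_path)
```

Calling `pdf_to_text("report.pdf", "report.txt")` would then write the extracted text next to the PDF, ready for further processing.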



  • BoredBloke
    replied
    I'll let you know when I've tested it - it might not work yet!!



  • Spacecadet
    replied
    Which one was that?



  • BoredBloke
    replied
    Thanks folks. I've found one that will do it.



  • Spacecadet
    replied
    http://www.google.co.uk/search?hl=en...e+Search&meta=
    This looks quite good and not a bad price:
    http://www.docsmartz.net/

    Once it's in a text format you might be able to use Microsoft's Log Parser:
    http://www.microsoft.com/downloads/d...displaylang=en

    and a combination of other custom scripts to strip out what you actually want



  • BoredBloke
    replied
    The report I might be getting will be really big and not really an option for cutting and pasting - I want something which will take a PDF and pump the data to a text file so that I can pull it into a database. Excel can't do more than 65000 rows and this will be a lot bigger.
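
Once the data is in a text file, pulling it into a database sidesteps the spreadsheet row limit mentioned above. The sketch below uses Python's built-in `sqlite3` module purely as an illustration; the table name `report` and the three columns (`server`, `address`, `status`) are hypothetical, since the thread does not say what the report contains. Rows are inserted in batches so a very large report does not have to sit in memory all at once.

```python
import sqlite3


def load_rows(conn, rows, batch_size=10_000):
    """Insert 3-field rows into a 'report' table; return the number inserted.

    The table layout is an assumption made for this example.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS report "
        "(server TEXT, address TEXT, status TEXT)"
    )
    insert_sql = "INSERT INTO report VALUES (?, ?, ?)"
    total = 0
    batch = []
    for row in rows:
        batch.append(tuple(row))
        if len(batch) >= batch_size:
            conn.executemany(insert_sql, batch)
            total += len(batch)
            batch = []
    if batch:
        # Flush whatever is left after the last full batch.
        conn.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()
    return total
```

Passing a generator of rows (for example, one that reads the text file line by line) keeps memory use roughly constant regardless of report size.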



  • MrRobin
    replied
    How do you mean? The data in a table in a PDF? Can you not just highlight, copy and paste into something like Excel? Am I missing something?



  • BoredBloke
    started a topic PDF data

    PDF data

    Does anybody know of any software (free!!!!) which would allow me to get to the data held in a PDF file?