Billy pleroma (AP)
If you have to deal with those pesky PDF files, pdftohtml from the poppler package and w3m are all you need.

pdftohtml -i -s -stdout filename.pdf | w3m -T text/html

Make it into a convenient function by adding it to your .bashrc.
pdf() {
    if [ $# -ne 1 ]; then
        echo "Usage: pdf filename."; return 1
    fi
    pdftohtml -i -s -stdout "$1" | w3m -T text/html  # -i skips images, -s makes one document
}
D. P. Goodman mastodon (AP)
I have found the conversions from these to be ugly at best, and often worse than useless. Converting pdf, even properly tagged pdf, is still a dicey business.
Billy pleroma (AP)
Most actual pdf software is not accessible with Orca, and even the few programs that are can be a pain to use. This method at least gets the text into a usable format. I figured the formatting wouldn't be exactly the same as the original, but I didn't realize it would be terrible. In my case, however, the layout usually isn't that important.
D. P. Goodman mastodon (AP)
Fair enough; as long as the pdf has text in it (many scans don't), you will get something readable. I spend a *lot* of time on document conversion, and find the whole process very frustrating with pdf.
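
To find out up front whether a pdf has any text in it at all, one option is to run pdftotext (also from poppler) and see whether anything other than whitespace comes out. A rough sketch, assuming pdftotext is installed; the function name pdf_has_text is made up for illustration:

```shell
# Hypothetical helper: succeeds if the PDF yields any non-whitespace text.
# Assumes pdftotext from poppler is installed.
pdf_has_text() {
    if [ $# -ne 1 ]; then
        echo "Usage: pdf_has_text filename" >&2
        return 2
    fi
    # "-" sends the extracted text to stdout; grep -q stops at the first match.
    pdftotext "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}
```

Something like `pdf_has_text manual.pdf && echo "has text" || echo "probably a scan"` then tells you whether a plain conversion is worth trying. It only says that some text exists, not that the text is any good.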

The mutools bundle is also worth a look.
Billy pleroma (AP)
Oh, no doubt about it, pdf is the root of all evil. It's like plain text came out and it was good, so companies immediately set about finding the worst thing possible lol. Now they all use it, and there are no plain text instruction manuals to be found anywhere. I will look into mutools, thanks for the suggestion.
I found something called ocrmypdf that seems to do pretty well when you can't get a text conversion. Too many options though, with no simple way to just open up a document and try to extract the text from it in different ways. Checkboxes would be rather nice here, but I just use the sidecar output usually.
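
For the sidecar approach, a small wrapper can hide most of the options. This is only a sketch under a few assumptions: ocrmypdf is installed, the function name pdfocr is invented here, and the OCRed pdf itself is thrown away because only the text is wanted.

```shell
# Hypothetical helper: OCR a scanned PDF and print the recognized text.
# Assumes ocrmypdf is installed; the function name is illustrative.
pdfocr() {
    local tmpdir rc=0
    if [ $# -ne 1 ]; then
        echo "Usage: pdfocr filename" >&2
        return 2
    fi
    tmpdir=$(mktemp -d) || return 1
    # --sidecar writes the recognized text to a plain text file alongside
    # the OCRed PDF; only the text file is kept here.
    ocrmypdf --sidecar "$tmpdir/out.txt" "$1" "$tmpdir/out.pdf" >&2 || rc=$?
    if [ "$rc" -eq 0 ]; then cat "$tmpdir/out.txt"; fi
    rm -rf "$tmpdir"
    return "$rc"
}
```

Usage would be something like `pdfocr scan.pdf | less`. As far as I know, ocrmypdf by default refuses files that already contain text, so for mixed documents you may need to add --skip-text or --force-ocr to the call.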
Billy pleroma (AP)
ocrdesktop can also do it. The new version is quite nice for viewing pdfs. My little trick is good for when you are in a hurry and don't want to switch to an X session just to read some text. Although, now that I think about it, ocrdesktop may be able to do it and just pipe the contents to a file. The other good part about my trick is links in the generated document actually work. You can read with w3m and just hit enter on the section name to jump right to it.
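
If you would rather have the text in a file than in a pager, the same pipeline can probably be reused with w3m's -dump option. A minimal sketch, assuming pdftohtml and w3m are installed; pdf2txt is just a name picked for this example:

```shell
# Hypothetical helper: convert a PDF to plain text without opening a pager.
# Assumes pdftohtml (poppler) and w3m are installed.
pdf2txt() {
    if [ $# -ne 1 ]; then
        echo "Usage: pdf2txt filename" >&2
        return 2
    fi
    # -dump makes w3m render the page to stdout and exit instead of going interactive.
    pdftohtml -i -s -stdout "$1" | w3m -dump -T text/html
}
```

For example, `pdf2txt manual.pdf > manual.txt`. The trade-off is that the dumped text has no working links, so the interactive version is still better for jumping between sections.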
