
Author Topic: problem w/e-texts?  (Read 838 times)

Offline Marlelar

  • Sr. Member
  • ****
  • Posts: 3473
  • Reputation: +1816/-233
  • Gender: Female
problem w/e-texts?
« on: January 04, 2014, 12:27:38 PM »
  • I like to use archive dot org to download some of the old Catholic books but the formatting of most of them leaves them all but unreadable.  Does anyone else have this problem w/public domain books?

    An example from Humility of Heart (link):

    ...but for the man who means to graduate for heaven there S» no C$Capc k° m ' t# Accordingly our divine Master...

    what it actually says is:

    ...but for the man who means to graduate for heaven there is no escape from it.   Accordingly our divine Master...

    So why the gobbledygook?  I'm about to give up on e-books altogether, as this happens with almost every other sentence in some books, which leaves them unreadable.

    I have also tried many of them in Kindle format as well as EPUB and PDF, and they are all just as unreadable.


    Marsha


    Offline Memento

    • Jr. Member
    • **
    • Posts: 269
    • Reputation: +135/-0
    • Gender: Female
    « Reply #1 on: January 04, 2014, 12:53:30 PM »
  • Hi Marsha,
    Using an iPad, I can read the online version and the PDF just fine, but the downloaded ePub version had the "gobbledygook" words.

    I did not try to download the PDF yet but usually do not encounter any problems with that format. Thanks for the link to Humility of Heart.


    Offline shin

    • Full Member
    • ***
    • Posts: 1671
    • Reputation: +854/-4
    • Gender: Male
    « Reply #2 on: January 04, 2014, 02:17:19 PM »
  • The reason why you find problems with the non-PDF formats is that they've been OCRed. OCR stands for Optical Character Recognition, a program that converts pictures of text into actual text. It's also called machine transcription.

    These programs produce text files: small files that contain text and usually not much else (no pictures, for example).

    The OCR program is only a program, not a person, so if it hits smudged text, or if it glitches, you end up with funny unreadable text like that. Sometimes you can work out what it thought the text was. Other times the OCR took scratches and smudges for text when there wasn't any there, so forget it.
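For anyone who wants to check a downloaded text file before reading it, here is a rough sketch of how OCR noise like the sample above could be flagged. It is only a heuristic under stated assumptions, not how any archive actually checks its files: the names `looks_garbled` and `garble_score` are made up for illustration, and the rule simply treats any token that isn't a plain English-looking word or a number as suspect.

```python
import re

# Punctuation that normally hugs a word in ordinary prose (an assumption;
# extend it for texts that use other marks).
_EDGE_PUNCT = ".,;:!?\"'()[]"
# A plain word: letters, optionally joined by an apostrophe or a hyphen.
_WORD = re.compile(r"^[A-Za-z]+(?:['-][A-Za-z]+)*$")
# Single letters that are real English words.
_ONE_LETTER_WORDS = {"a", "i", "o"}


def looks_garbled(token: str) -> bool:
    """Guess whether one whitespace-separated token is OCR noise."""
    core = token.strip(_EDGE_PUNCT)
    if not core:
        return True  # nothing but stray punctuation
    if core.isdigit():
        return False  # plain numbers are fine
    if len(core) == 1:
        return core.lower() not in _ONE_LETTER_WORDS
    return _WORD.match(core) is None


def garble_score(text: str) -> float:
    """Fraction of tokens in the text that look like OCR noise (0.0 to 1.0)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(looks_garbled(t) for t in tokens) / len(tokens)


# The sample from the first post, garbled and as it should read.
garbled = "there S» no C$Capc k° m ' t# Accordingly our divine Master"
clean = "there is no escape from it. Accordingly our divine Master"

print(garble_score(garbled))
print(garble_score(clean))
```

On the sample from the first post, half of the tokens in the garbled line get flagged and none in the corrected line. A check like this can't repair anything, and it will also flag legitimate oddities (a lone dash, accented words), so it is only good for deciding whether a download is worth keeping.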

    PDFs are normally scanned images, pictures of the book's pages, so no conversion to text is necessary to read them.

    This makes them easier to read in many cases. But it also makes them much larger to download, as pictures take up more space than text. The scans themselves may be a bit off, too.

    For some e-books, volunteers correct typos and submit the correct copies to e-text archives like Gutenberg, the Internet Archive, Google Books, etc.

    That's when you find easy-to-read, clean copy without errors.

    Files uploaded as PDFs to the Internet Archive tend to have an automatic OCR pass done to produce the other formats, and the result is often very poor and unreadable. You may get a readable copy, you may not.

    FWIW -- Saints' Books only offers non-PDFs if they've been proofed and typo-corrected, so those books are always readable. I prefer non-PDFs personally, even though the site is full of PDFs, and there's always an opening for more volunteers and helpers to help out transcribing and correcting copy.


    Sincerely,

    Shin

    'Flores apparuerunt in terra nostra. . . Fulcite me floribus.' (The flowers appear on the earth. . . stay me up with flowers. Sg 2:12,5)

    Offline Marlelar

    • Sr. Member
    • ****
    • Posts: 3473
    • Reputation: +1816/-233
    • Gender: Female
    « Reply #3 on: January 04, 2014, 02:20:07 PM »
  • Yes, reading it online is no problem, but almost every time I download a book it has varying amounts of corruption.

    Marsha

    Offline shin

    • Full Member
    • ***
    • Posts: 1671
    • Reputation: +854/-4
    • Gender: Male
    « Reply #4 on: January 04, 2014, 03:00:42 PM »
  • At least the PDFs will normally be readable.
    Sincerely,

    Shin


    Offline Marlelar

    • Sr. Member
    • ****
    • Posts: 3473
    • Reputation: +1816/-233
    • Gender: Female
    « Reply #5 on: January 05, 2014, 02:03:26 AM »
  • Quote from: shin
    The reason why you find problems with the non-PDF formats is that they've been OCRed. OCR stands for Optical Character Recognition, a program that converts pictures of text into actual text. It's also called machine transcription.

    PDFs are normally scanned images, pictures of the book's pages, so no conversion to text is necessary to read them.


    Thanks for the tip on PDFs.  I just tried the PDF of Humility of Heart and it was neat and tidy - yeah!  Not sure why some of the other PDFs were corrupted, but this one came out just fine.

    Marsha