"viev text" - Questions about using EPPI-Reviewer

Forum (Archive)

This forum is kept largely for historic reasons and for our latest changes announcements. (It was focused around the older EPPI Reviewer version 4.)

There are many informative posts and answers to common questions, but you may find our videos and other resources more informative if you are an EPPI Reviewer WEB user.

If you do have questions or require support, please email eppisupport@ucl.ac.uk.


12/05/2011 20:53

Chiara Fusar Poli

Joined: 11/01/2012

Posts: 3

"viev text"

Hello,
I have a problem with some .pdf files: after uploading, when I open the "view text" column and click on the "go" button, I cannot read the text. The page is either blank, or full of symbols, or shows a few words but not the actual text.
How can I solve this problem?
Thank you in advance,

Chiara

13/05/2011 10:35

Jeff Brunton

Joined: 17/10/2011

Posts: 594

Re: "viev text"

Hello Chiara,

If you see strange symbols, the most likely reason is that the PDF is encrypted to stop people from copying and pasting the content into other pieces of software. A good way to check for this is to copy and paste a bit of the PDF text into some word processing software such as Word. If the pasted text still shows strange symbols in Word, then the file is encrypted. You may need to contact the author to see if they will give you an unencrypted version.

If you see no text, then the PDF might actually be a scan of a document (i.e. an image). In that case there isn't any textual content in the file to display. It is just a picture of the text.

If you think your pdf is not one of these cases you can email it to us and we will have a look at it.

Best regards,

Jeff

13/05/2011 10:41

Sergio Graziosi

Joined: 17/10/2011

Posts: 325

Re: "viev text" Modified By Sergio Graziosi on 13/05/2011 10:46:27

Hi Chiara,

Jeff is right, this usually happens when the PDF is either made of bitmap images (view text is empty) or in an encrypted format (view text is full of crazy symbols). I'll add some more details for everyone's benefit.

PDFs allow for mixed content as well, so sometimes portions of the text will be extracted and others will not (either because they are images or because they are encrypted). There is no real solution for such situations:
If the PDF is made out of scanned images rather than text, then to extract the text we would need to automatically extract the images and feed them into a text recognition algorithm (OCR). This is possible in theory, but it is certain to be error-prone and is very complex to put together. We don't have plans to write this feature for the time being.
If the PDF is encrypted, then there is no solution at all: encryption is used precisely to stop people from working with the textual content. People may read it and sometimes print it, but all other features are effectively blocked.

You can verify whether a troublesome PDF falls into one of the two categories in this way:
1) Open the PDF outside of EPPI-Reviewer in Adobe Reader (or equivalent).
2) Select some text, copy it and try to paste it into a program like Word.
3a) If you couldn't select, copy and paste anything at all, then it is likely that the PDF is made out of images.
3b) If the pasted text is made out of symbols, then the file is encrypted.
3c) If the text can be transferred into Word, but EPPI-Reviewer does not import it, then you have hit one of the rare cases where some small error in the PDF itself prevents our system from extracting the text correctly. In this case, you can usually "Save as Text" the PDF from inside Acrobat Reader. You can then upload the simple text file into EPPI-Reviewer as you would upload ordinary PDFs.
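
For anyone who wants to run the same check over many files, the copy-and-paste test above can be roughly approximated in code. This is only a minimal sketch and not how EPPI-Reviewer itself works: the function name `classify_extracted_text` and both thresholds are hypothetical choices made for illustration, and the input is assumed to be text you have already copied or saved out of the PDF.

```python
# Hypothetical helper: guesses which of the three cases (3a, 3b, 3c) a piece
# of text copied out of a PDF most likely belongs to. The thresholds are
# arbitrary assumptions, not values taken from EPPI-Reviewer.

COMMON_PUNCTUATION = ".,;:!?'\"()-"


def classify_extracted_text(text, min_chars=20, readable_ratio=0.7):
    """Return "empty", "symbols" or "readable" for a string of extracted text."""
    # Ignore whitespace so that a page of blank lines counts as empty.
    stripped = "".join(ch for ch in text if not ch.isspace())

    # Case 3a: (almost) nothing could be copied, so the page is probably an image.
    if len(stripped) < min_chars:
        return "empty"

    # Count characters that look like ordinary prose: letters, digits and
    # common punctuation marks.
    readable = sum(
        1 for ch in stripped if ch.isalnum() or ch in COMMON_PUNCTUATION
    )

    # Case 3b: mostly symbols, which suggests the text layer is protected.
    if readable / len(stripped) < readable_ratio:
        return "symbols"

    # Case 3c (or a healthy file): the text itself looks fine.
    return "readable"


if __name__ == "__main__":
    sample = "This is a perfectly ordinary sentence from a paper."
    print(classify_extracted_text(sample))
```

A result of "readable" for a file that EPPI-Reviewer still shows as blank would point to case 3c, where the "Save as Text" workaround applies. The heuristic is crude and will misjudge very short documents or pages that are mostly formulas, so treat its answer as a hint rather than a diagnosis.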

I hope this helps,

Sergio
