Sohodox can extract text from documents automatically as they are added. To turn this on, enable the corresponding option in the Options window.
To automatically extract text (OCR) from documents:
2. Click the Options button. The Options window opens.
3. Select the DB options node in the left pane. The DB options are displayed in the right pane.
4. Check the Automatically extract text from documents on check-in option.
5. Click the OK button to apply the changes.
• You can also switch from the Sohodox OCR engine to the Microsoft Office OCR engine to extract text from documents. For more information, see Extract Text from Document.
• On slower machines, you may want to turn off the automatic extraction and indexing of documents.
• Sohodox uses its built-in text extractor for MS Word (DOC, DOCX), MS Excel (XLS, XLSX), and PDF files (only PDF files that contain text, not those consisting solely of scanned images). For any other file format, Sohodox can extract text only if an IFilter for that format is installed on the user's machine.
IFilters for the following file formats are installed by default on Windows 2000/XP/2003/2008/Vista/7 machines...
► PPT (Microsoft PowerPoint presentation)
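The format rule above can be illustrated with a small sketch. This is not Sohodox code. The function name extraction_route and the installed_ifilters parameter are hypothetical and exist only to show how the choice between the built-in extractor and an IFilter might be reasoned about.

```python
from pathlib import Path

# Formats that, per the note above, are handled by the built-in extractor.
# PDF files qualify only when they contain text; a PDF made up solely of
# scanned images would need OCR, which this sketch does not model.
BUILT_IN_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".pdf"}


def extraction_route(filename: str, installed_ifilters: set) -> str:
    """Return 'built-in', 'ifilter', or 'unavailable' for a file name.

    installed_ifilters is a hypothetical set of lowercase extensions
    (for example {".ppt"}) for which an IFilter is present on the machine.
    """
    ext = Path(filename).suffix.lower()
    if ext in BUILT_IN_EXTENSIONS:
        return "built-in"
    if ext in installed_ifilters:
        return "ifilter"
    return "unavailable"


if __name__ == "__main__":
    # Assumed example: only the PPT IFilter is installed.
    for name in ("report.docx", "slides.ppt", "drawing.dwg"):
        print(name, "->", extraction_route(name, {".ppt"}))
```

A file whose route comes back as 'unavailable' corresponds to the case where an IFilter for that format would have to be installed before its text can be extracted.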
Related Topics
Search for text in a document
Document Full Text Search - FAQ
Page URL:
http://www.sohodox.com/docs/help/index.htm?automatically_extract_text_fro.htm