Automatically Extract text from documents


 
Sohodox can extract text from documents automatically as they are added. To enable this, select the Automatically extract text from documents while adding option in the Options window.

 

To automatically extract text (OCR) from documents:

1. In Sohodox, click the Sohodox button.
2. Click the Options button. The Options window opens.
3. Select the DB options node in the left pane. The DB options are displayed in the right pane.
4. Check the Automatically extract text from documents on check-in option.
5. Click the OK button to apply the changes.

 

You can also switch from the Sohodox OCR engine to the Microsoft Office OCR engine to extract text from documents. For more information, see Extract Text from Document.
On slower machines, you may want to turn off the automatic extraction and indexing of documents.

 

Sohodox uses its built-in text extractor for MS Word (DOC, DOCX), MS Excel (XLS, XLSX) and PDF files (only PDF files that contain text, not those that contain only scanned images). For any other file format, Sohodox can extract text only if an IFilter for that format is installed on the user's machine.

 
IFilters for the following file formats are installed by default on Windows 2000/XP/2003/2008/Vista/7 machines:

PPT (Microsoft PowerPoint presentation)
HTML documents
TXT documents
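
The rule above can be summarized in a short sketch. This is only an illustration of the rule as described in this topic, not Sohodox's actual implementation. The function name extraction_route and the extension sets are hypothetical, and the mapping of "HTML documents" to the .htm and .html extensions is an assumption.

```python
from pathlib import Path

# Formats handled by the built-in text extractor, as listed in this topic.
# Note: PDF files qualify only if they contain text, not just scanned images.
BUILT_IN = {".doc", ".docx", ".xls", ".xlsx", ".pdf"}

# Formats whose IFilters are installed by default on Windows, as listed above.
# Mapping "HTML documents" to .htm/.html is an assumption.
DEFAULT_IFILTER = {".ppt", ".htm", ".html", ".txt"}


def extraction_route(filename: str) -> str:
    """Return how text would be extracted for a file, based on its extension."""
    ext = Path(filename).suffix.lower()
    if ext in BUILT_IN:
        return "built-in"
    if ext in DEFAULT_IFILTER:
        return "default-ifilter"
    # Any other format needs an IFilter installed separately on the machine.
    return "needs-ifilter"
```

For example, extraction_route("drawing.dwg") returns "needs-ifilter", which means text can be extracted from that file only after an IFilter for the format is installed.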

 

 
 


Related Topics
Search for text in a document

Document Full Text Search - FAQ

 


Page URL: http://www.sohodox.com/docs/help/index.htm?automatically_extract_text_fro.htm