Text Formatting Settings
When you print to Miraplacid Text Driver,
the Preview window pops up.
If you click the "Settings" button on the Preview window toolbar,
the Settings dialog opens.
The Settings dialog has several tabs, discussed
here. This document describes the
Text Formatting settings.
- Character set - Newer versions of Windows print text in Unicode.
You can keep it in Unicode or convert it to a traditional 8-bit encoding. Select
your character set from the dropdown list.
- Insert Unicode prefix - Some text editors add the codes 0xFF 0xFE (a byte order mark) to the beginning of a Unicode text file.
We recommend adding the prefix unless you are sure your software does not support it.
- End of line style - You can choose between Windows (CR LF) and Unix (LF) style.
We recommend the Windows style unless you use legacy Unix software.
- Insert page breaks - Adds a page break symbol.
As an alternative, you can add {PAGE} to the output file name to save pages to individual files.
- Formatting Style
Miraplacid Text Driver can format extracted text in different ways. Unfortunately, there is no way
to make it look exactly like the original document. Plain text files do not support
different font types and sizes and cannot condense or expand characters. However,
the Miraplacid Text Driver Text Formatting plug-ins do a good job in most cases.
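The options above decide how the saved file looks at the byte level, which matters if you post-process it in your own software. The following Python sketch is not part of Miraplacid Text Driver. It shows one way a script could check for the 0xFF 0xFE prefix, detect the end of line style, and split the text into pages. The function name inspect_output and the fallback encoding cp1252 are hypothetical, and the sketch assumes that the page break symbol is the form feed character (U+000C), which this document does not state.

```python
import codecs


def inspect_output(data: bytes, fallback_encoding: str = "cp1252"):
    """Return (has_prefix, eol_style, pages) for the bytes of a saved text file.

    Assumptions (not confirmed by the driver documentation):
    - Unicode output with the prefix is UTF-16 little-endian;
    - output without the prefix uses the 8-bit `fallback_encoding`;
    - pages are separated by the form feed character.
    """
    # codecs.BOM_UTF16_LE is exactly the byte pair 0xFF 0xFE.
    has_prefix = data.startswith(codecs.BOM_UTF16_LE)
    if has_prefix:
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    else:
        text = data.decode(fallback_encoding)

    # Windows style ends lines with CR LF, Unix style with LF only.
    eol_style = "windows" if "\r\n" in text else "unix"

    # Assumed page break symbol: form feed.
    pages = text.split("\f")
    return has_prefix, eol_style, pages


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "Page 1\r\n\fPage 2\r\n".encode("utf-16-le")
    print(inspect_output(sample))
```

If your software fails on the first two bytes of the file, that is a sign it does not support the prefix and the Insert Unicode prefix option should be turned off.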
Formatting Styles
If you need your text to look like the original document, select Formatted text.
If you need just the text without formatting, select Plain text.
If you are familiar with XML processing, you can try XML output.
It saves textboxes with their text, size, location and
font information. It also contains page size and DPI settings.
The RSS-Atom formatting style saves information in the web content syndication formats RSS and Atom
for further use in news exchange services.
Text with Layout is similar to Formatted text, but is based on the previous version of
the text formatting algorithm.
We recommend using "Formatted text" unless "Text with Layout" works better for you.
Formatting Style Settings
- Formatted text and Text with Layout - you can turn Print margins on or off.
When turned on, Miraplacid Text Driver adds blank borders to the formatted text. Border sizes are
calculated to match the print margin settings of the document you are extracting text from.
- Plain text - This text formatting style simply merges all pieces of text in each line.
By default it adds a blank character between them, but you can change this by updating the Delimiter
value. This value may include the following escaped special characters: \s (whitespace), \t, \r, \n, \f, \\ (the backslash itself) and \xnnnn (nnnn is the hexadecimal code of a Unicode character).
- XML - the Optimize output option merges individual textboxes into whole words when words have been split into several pieces.
When this option is turned on, the bounding coordinates of the merged textboxes are combined
and whitespace is removed from the output.
- RSS-Atom - the Link, Author and Description settings correspond to the matching fields of an RSS
channel or the attributes of an Atom feed. Additionally, the Add <BR> to EOL option adds a line break tag at the end of each line to make the text look formatted (in Atom, this means the HTML content type).
- Formatted text and XML - you can set Whitespace size.
Sometimes it is impossible to separate textboxes automatically because the spaces between them are very small.
By adjusting this parameter between 10% and 100% of the width of an average space character, you can bring order to a document with many "glued" words and phrases.
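To illustrate the escaped characters accepted in the Plain text Delimiter value, here is a minimal Python sketch of how such a value could be expanded into the actual delimiter string. It is only an illustration, not the driver's own code. The function name decode_delimiter is hypothetical, and the sketch assumes that \s stands for a single space, that \x is followed by exactly four hexadecimal digits, and that any other backslash sequence is left unchanged.

```python
import re

# Single-character escapes listed in the Delimiter description.
# Assumption: \s expands to one space character.
_SIMPLE = {"s": " ", "t": "\t", "r": "\r", "n": "\n", "f": "\f", "\\": "\\"}

# A backslash followed by x and four hex digits, or by one simple escape letter.
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{4}|[strnf\\])")


def decode_delimiter(value: str) -> str:
    """Expand the escapes in a Delimiter value (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        token = match.group(1)
        if token[0] == "x":
            # \xnnnn: nnnn is the hexadecimal code of a Unicode character.
            return chr(int(token[1:], 16))
        return _SIMPLE[token]

    # Matches are processed left to right, so "\\t" becomes a backslash
    # followed by the letter t, not a tab.
    return _ESCAPE.sub(replace, value)


if __name__ == "__main__":
    pieces = ["Name", "Qty", "Price"]
    print(decode_delimiter(r"\t").join(pieces))
    print(decode_delimiter(r"\s|\s").join(pieces))
```

With these assumptions, a Delimiter value of \t joins the pieces of each line with tabs, and \s|\s joins them with a vertical bar surrounded by spaces.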
See also: