Miraplacid Text Driver use-cases
Miraplacid Text Driver meets our customers' needs right out of the box, with very few exceptions. It extracts text from a printed document and saves it to a file. It can also copy the text to the clipboard, email it, upload it to a server, or print it to a real printer. It has rich options for naming output files and folders, text formatting, automatic processing, and so on. Users can create multiple virtual printers with different settings and choose which one to print to.
There are, however, some cases when that is not enough.
1. The toughest problem we have faced so far is text formatting. The original document might be composed with multiple fonts and include tables and all kinds of fancy formatting. The target file is plain text. There is no exact algorithm to convert one to the other precisely. We have to deal with heuristics and trade-offs.
It is hard to decide which strings should go on a single line, and it is much harder to guess the right spacing between the strings, for instance in tables. On top of that, small-print text sometimes just does not fit. Imagine a typewriter: there is no way to switch to a finer font and squeeze 200 characters into a table cell.
To make things worse, when Windows prepares a document for printing, it splits the text into chunks. Sometimes these chunks have to be glued back together.
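The gluing step can be pictured with a small Python sketch. It is only an illustration of the general idea, not the algorithm Miraplacid Text Driver uses: the `Chunk` fields, the coordinate units, and both tolerance values are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical representation of one text fragment from the print stream.
    x: float      # left edge
    y: float      # baseline
    width: float  # rendered width
    text: str


def glue_chunks(chunks, y_tolerance=2.0, gap_tolerance=1.5):
    """Group chunks into lines by baseline, then merge chunks that touch."""
    lines = []
    for chunk in sorted(chunks, key=lambda c: (c.y, c.x)):
        # Chunks whose baselines are close enough belong to the same line.
        if lines and abs(lines[-1][0].y - chunk.y) <= y_tolerance:
            lines[-1].append(chunk)
        else:
            lines.append([chunk])

    result = []
    for line in lines:
        line.sort(key=lambda c: c.x)
        first = line[0]
        merged = [Chunk(first.x, first.y, first.width, first.text)]
        for chunk in line[1:]:
            last = merged[-1]
            gap = chunk.x - (last.x + last.width)
            if gap <= gap_tolerance:
                # No visible gap: the chunk continues the previous string.
                last.text += chunk.text
                last.width = chunk.x + chunk.width - last.x
            else:
                merged.append(Chunk(chunk.x, chunk.y, chunk.width, chunk.text))
        result.append(merged)
    return result
```

With these assumptions, the fragments "Hel" and "lo" that sit side by side on one baseline come back as the single string "Hello", while a fragment that starts well to the right stays separate. Choosing the tolerances is exactly the kind of trade-off described above.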
When the Miraplacid Text Driver project started many years ago, it had just two modes: "plain text" and "text with layout".
The former just ignored tabulation and aligned text to the left. The latter tried to reproduce the original formatting.
We also had an option for dumping all the raw printing output to XML, mostly for debugging purposes. Surprisingly, some of our customers found a way to use this XML output for their needs.
We have a long history of text formatting improvements. At some point we had to introduce a new heuristic algorithm, because "text with layout" did not work very well on a customer's documents. However, we could not change "text with layout" itself, because many of our customers rely on it, and it works just fine for them. This is where the "formatted text" mode came from.
Both modes do the same thing, but the results are somewhat different. One or the other usually gives good results for most documents. Nevertheless, we keep adding new formatting features at our customers' request.
Some documents contain right-aligned tables. With the old algorithm they did not look nice, so we introduced the "right align" option.
Some documents in some applications use space characters for formatting, and the extracted text looks better when these characters are preserved. For the majority of documents and applications, though, space characters should be ignored, because they just add mess, especially when the application is too creative with character widths. This is the "Use document spaces" option.
Some documents have to look consistent across all pages. We introduced the "character size" hint option for the rendering algorithm. It applies to all pages of the document to ensure they are processed uniformly. This is handy for multi-page reports, as it makes them ready for further parsing.
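To see why a single character size helps, here is a minimal Python sketch of placing strings on a fixed character grid. It is an assumption-laden illustration, not the driver's rendering code: `render_line`, its `(x, text)` input, and the units are invented for this example.

```python
def render_line(pieces, char_width):
    """Place (x, text) pieces on a fixed-width character grid.

    char_width plays the role of a character size hint: the same value
    maps the same x position to the same column on every page.
    """
    out = ""
    for x, text in sorted(pieces):
        column = round(x / char_width)
        if out and column <= len(out):
            # The previous piece overflowed its cell; keep at least one space.
            column = len(out) + 1
        out += " " * (column - len(out)) + text
    return out
```

Under these assumptions, a piece at x = 120 with a character width of 12 always lands in column 10, so table columns line up from page to page. The overflow branch shows the typewriter problem from earlier: when the text is wider than its cell, everything after it is pushed to the right.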
Some documents have a very tricky layout and ran into issues with print margins and white space between characters. We added more fine-tuning to the "formatted text" mode to make its output files look nice.
The story behind all these changes is the same: they start with an email from a Miraplacid Text Driver customer who cannot make the extracted text look right. In response, we ask for a problem description, files, and images. Then our engineers prepare a pilot version of the new formatting feature. As soon as the customer gets the desired result from it, we include the new feature in our main Miraplacid Text Driver distribution.
The bottom line: text formatting is tricky. We are constantly improving it, and customer feedback helps a lot. In some tricky cases a perfect solution might just not be possible. However, we do our best to provide the results our customers need.
2. The first version of Miraplacid Text Driver many years ago just saved text to a file. Later on we had customers who needed to integrate it with some third-party software to upload files to a server.
To make their lives easier, we added FTP and HTTP upload support to Miraplacid Text Driver. Our customers were happy with the simplicity of the solution.
Recently one of our customers asked whether it is possible to upload files to a server securely. Lack of security was a deal-breaker for her, due to the sensitive nature of the processed documents.
We implemented HTTPS protocol support, which resolved the issue.
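For readers who want a feel for what a secure upload involves, here is a minimal Python sketch using only the standard library. It does not show how the driver's built-in upload works: the PUT method, the URL layout, and the function names are assumptions for illustration, and `https://example.com/upload` in the usage note is a placeholder.

```python
import urllib.parse
import urllib.request


def build_upload_request(base_url, file_name, text):
    """Prepare an HTTPS upload of extracted text (hypothetical endpoint layout)."""
    if not base_url.lower().startswith("https://"):
        # Sensitive documents should never travel over plain HTTP.
        raise ValueError("refusing to upload over a non-HTTPS URL")
    url = base_url.rstrip("/") + "/" + urllib.parse.quote(file_name)
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def send(request, timeout=30):
    """Send the request; urllib verifies the server certificate by default."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A caller would build the request with something like `build_upload_request("https://example.com/upload", "report.txt", text)` and pass it to `send`. Keeping the two steps apart makes the HTTPS check easy to test without touching the network.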
3. The first two use-cases are rather boring. This is the one we like the most.
One of our customers needed to print some documents to one physical printer, print other documents to another printer, and not print the remaining documents at all, depending on key phrases in the document body.
The problem was that the user was a complicated automated system, which sent all these documents to the same printer.
They asked whether Miraplacid products could help. Sure they can. The same day our engineers built a prototype in C# and sent the source code to the customer. The customer was quite happy.
Here is what the prototype did. It hooked up to the Miraplacid Text Driver core and waited for a document to be printed.
Then it parsed the document text for the required key phrases. Based on the findings, it forwarded the document to the right real printer.
Our customer needed to print the original document with all the graphics, not just the extracted text. No problem: Miraplacid Text Driver has this option on board.
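The routing decision itself is simple enough to sketch. The prototype was written in C# and its source is not reproduced here; the Python below only illustrates the key-phrase lookup, and the phrases, printer names, and `choose_printer` function are made up for the example. Hooking into the driver and forwarding the print job are left out.

```python
# Hypothetical routing table: key phrase -> name of the real printer.
ROUTES = [
    ("INVOICE", "Accounting Printer"),
    ("SHIPPING LABEL", "Warehouse Printer"),
]


def choose_printer(document_text, routes=ROUTES):
    """Return the printer for the first matching key phrase, or None.

    None means the document matches no route and is not printed at all.
    """
    upper = document_text.upper()
    for phrase, printer in routes:
        if phrase in upper:
            return printer
    return None
```

The order of the table matters in this sketch: a document containing both phrases goes to the first match, which is one of the decisions a real deployment would have to make explicitly.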
4. We were surprised to learn that Miraplacid Text Driver is used in place of a tiny retail terminal printer that prints receipts.
Our customer needed one copy saved to a file and another copy printed to a real retail terminal printer. A few customers complained that the redirected output was too wide or too narrow.
Our engineers had to introduce fine-tuning for EMF (Windows Enhanced Metafile) settings and, for the worst-case scenario, a brute-force scale factor.
The "Scale factor" option completely solved the problem with the size of the redirected printed image.
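As a rough guide to picking a value, the arithmetic behind a scale factor can be sketched in Python. The text above does not say in which units the "Scale factor" option is entered, so the plain ratio used here, along with the function name and the sample widths, is an assumption.

```python
def estimate_scale_factor(source_width, target_width):
    """Ratio that shrinks or stretches the source width to the target width."""
    if source_width <= 0 or target_width <= 0:
        raise ValueError("widths must be positive")
    return target_width / source_width
```

For example, output laid out 210 units wide that has to fit an 80-unit receipt roll would need a factor of 80 / 210, roughly 0.38. A factor above 1 covers the opposite complaint, where the output is too narrow.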