Cleaning unnecessary tags with TransTools Document Cleaner
Context:
Some Microsoft Word documents produced by OCR (Optical Character Recognition) or PDF conversion tools may contain excessive tags when imported into memoQ, even though the original document has simple formatting. This article describes how to prepare such Word documents to reduce the number of tags to a minimum.
Description:
Excessive tags in Word documents produced from PDF files are caused by slight differences in the formatting that OCR or PDF conversion tools apply to individual characters or words, or by bookmarks.
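To get a rough idea of how fragmented a document is before importing it, you can count the formatting runs and bookmarks it contains. The following is a minimal sketch, not part of memoQ or TransTools. It assumes that the main document part of the DOCX file is stored at word/document.xml (where Word writes it) and that a high number of runs per paragraph is only an approximate indicator of how many tags memoQ will show. The function names and the sample XML are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by DOCX files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def count_runs_and_bookmarks(document_xml):
    """Return (paragraphs, runs, bookmarks) for the XML of the main document part."""
    root = ET.fromstring(document_xml)
    paragraphs = root.findall(".//w:p", NS)
    runs = root.findall(".//w:r", NS)
    bookmarks = root.findall(".//w:bookmarkStart", NS)
    return len(paragraphs), len(runs), len(bookmarks)


def read_document_xml(docx_path):
    """Read word/document.xml from a DOCX file, which is a ZIP archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")


# Hand-written sample: one short sentence split into three runs, plus a bookmark.
SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body><w:p>
<w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/>
<w:r><w:t>Sim</w:t></w:r>
<w:r><w:rPr><w:spacing w:val="-2"/></w:rPr><w:t>ple te</w:t></w:r>
<w:r><w:t>xt</w:t></w:r>
</w:p></w:body></w:document>"""

print(count_runs_and_bookmarks(SAMPLE))  # (1, 3, 1)
```

For a real file, you would call count_runs_and_bookmarks(read_document_xml("your-file.docx")), where the file name is a placeholder. Many runs in a paragraph that looks uniformly formatted suggest that the document will benefit from cleaning.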
In simple cases, it is sufficient to save the document in DOCX format and import it, making sure that the Ignore minor formatting changes for fewer tags option is checked if you use Import with options. This option is applied automatically if you use the Import command rather than Import with options. It removes tags caused by font character spacing settings (font scale, spacing, position, kerning).
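If you want to estimate whether character spacing is the main cause of fragmentation, you can check how many runs carry such settings. The sketch below is an illustration only. It assumes that font scale, spacing, position and kerning correspond to the WordprocessingML run properties w:w, w:spacing, w:position and w:kern, and it does not reproduce what memoQ actually does during import. The function name and sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by DOCX files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Run properties assumed to match font scale, spacing, position and kerning.
SPACING_PROPS = ("w", "spacing", "position", "kern")


def runs_with_spacing_props(document_xml):
    """Return (flagged, total): runs with character spacing properties, and all runs."""
    root = ET.fromstring(document_xml)
    wanted = {f"{{{W_NS}}}{name}" for name in SPACING_PROPS}
    flagged = 0
    total = 0
    for run in root.iter(f"{{{W_NS}}}r"):
        total += 1
        run_props = run.find(f"{{{W_NS}}}rPr")
        if run_props is not None and any(child.tag in wanted for child in run_props):
            flagged += 1
    return flagged, total


# Hand-written sample: four runs, two of which carry spacing-related properties.
SPACING_SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body><w:p>
<w:r><w:t>Plain </w:t></w:r>
<w:r><w:rPr><w:spacing w:val="-3"/></w:rPr><w:t>tight </w:t></w:r>
<w:r><w:rPr><w:b/></w:rPr><w:t>bold </w:t></w:r>
<w:r><w:rPr><w:kern w:val="16"/></w:rPr><w:t>kerned</w:t></w:r>
</w:p></w:body></w:document>"""

print(runs_with_spacing_props(SPACING_SAMPLE))  # (2, 4)
```

If most runs are flagged, the import option alone may be enough. If few are flagged but the document is still heavily fragmented, the differences probably lie in other formatting, and a cleaning tool is the more likely fix.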
If this does not help, you can use Document Cleaner (http://www.translatortools.net/word-doccleaner.html), which is part of the TransTools for Word add-in, distributed as free software. Document Cleaner is a collection of tools designed to clean badly formatted documents before translation in CAT tools.
How to:
- Download TransTools from http://www.translatortools.net/download.html and install it. This installs the TransTools for Word add-in and additional optional components (which you can deselect during the installation process).
- Open the document in Microsoft Word and click Document Cleaner on the TransTools tab (if you use Word 2007 or later) or TransTools -> Document Cleaner from the menu (if you use Word 2003 or earlier).
- On the Tag Cleaner tab of the Document Cleaner dialogue, choose the options you need from the list. Usually, the default options are sufficient. For details about each option, see http://www.translatortools.net/word-doccleaner.html
- Click Clean Tags.
- Save the document and import or re-import it into memoQ. The document will now have far fewer tags.
We would like to thank Stanislav Okhvat for this contribution.
Please feel free to check out the TransTools website for more information.