Control characters in Arabic and Hebrew texts
Posted by Péter Botta on 28 February 2013 10:03 AM

Title: Control characters in Arabic and Hebrew texts since memoQ 6


To display bidirectional (BiDi) text in the correct order, memoQ uses control characters for languages such as Arabic and Hebrew. These control characters do not appear in formats like DOC/RTF, DOCX, PPTX, or XLSX. MS Office, for instance, instead stores an extra flag for each character that defines how the character is treated (as LTR or RTL). This flag also helps to disambiguate neutral characters such as spaces, commas, periods, or parentheses.

Example of a control character:
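
As a minimal sketch in Python (not memoQ code), the two most common directional marks, U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK, can be inspected with the standard unicodedata module. The helper name describe is an assumption made for this illustration. The output shows that the marks carry a strong direction, while spaces, commas, periods, and parentheses have neutral or weak classes.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, invisible, strong LTR
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, invisible, strong RTL


def describe(ch):
    """Return (code point, Unicode name, bidirectional class) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.bidirectional(ch),
    )


# Two marks, four neutral/weak characters, and three letters for comparison
# (Latin a, Hebrew alef, Arabic alef).
for ch in (LRM, RLM, " ", ",", ".", "(", "a", "\u05d0", "\u0627"):
    print(*describe(ch))
```

Class L means strong left-to-right, R and AL mean strong right-to-left, and classes such as WS, CS, and ON take their direction from the surrounding text.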

Prior to memoQ version 6.0, memoQ also flagged each character with such an extra flag, as MS Office does. The MBD format (memoQ bilingual format), which was discontinued with memoQ version 6.0, saved these flags so that the text still displayed correctly after the document was imported.

With memoQ 6.0, the MBD format was replaced by memoQ XLIFF. In XLIFF, which is pure Unicode, there is no place for such flags. When you save your document as XLIFF and import the XLIFF file into another tool, you get the logical Unicode reading, which is not what the author of the text originally meant. Therefore, an extra step was added at import for content that comes from MS Office: the plain text and the directional flags are analysed, and LTR and RTL markers are inserted where they logically belong. This ensures that the text displays as the author meant it.
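
The idea of turning per-character flags into markers can be sketched in Python as follows. This is an illustration under stated assumptions, not memoQ's actual import algorithm: the function name insert_marks and the flags list (one "LTR" or "RTL" value per character) are hypothetical, and digits are treated like neutral characters for simplicity. The sketch adds a mark next to a neutral character only when the neighbouring character does not already imply the flagged direction.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Strong bidirectional classes and the direction they imply.
STRONG = {"L": "LTR", "R": "RTL", "AL": "RTL"}


def strong_dir(ch):
    """Return "LTR"/"RTL" for strong characters, None for everything else."""
    return STRONG.get(unicodedata.bidirectional(ch))


def insert_marks(text, flags):
    """Insert LRM/RLM around non-strong characters based on hypothetical flags.

    flags holds one "LTR" or "RTL" value per character of text.
    """
    if len(text) != len(flags):
        raise ValueError("text and flags must have the same length")
    out = []
    prev_dir = None  # direction of the last strong character or mark emitted
    for i, (ch, flag) in enumerate(zip(text, flags)):
        direction = strong_dir(ch)
        if direction is not None:
            out.append(ch)
            prev_dir = direction
            continue
        mark = RLM if flag == "RTL" else LRM
        if prev_dir != flag:
            out.append(mark)  # the left side does not imply the flag
            prev_dir = flag
        out.append(ch)
        if i + 1 < len(text):
            # A non-strong neighbour counts with its own flag.
            next_dir = strong_dir(text[i + 1]) or flags[i + 1]
        else:
            next_dir = None  # end of text gives no support on the right
        if next_dir != flag:
            out.append(mark)  # the right side does not imply the flag
    return "".join(out)


# Hebrew word followed by a period that the author flagged as RTL.
marked = insert_marks("\u05e9\u05dc\u05d5\u05dd.", ["RTL"] * 5)
print([f"U+{ord(c):04X}" for c in marked])
```

In this example the trailing period has no strong character after it, so the sketch appends an RLM to keep the period attached to the right-to-left text.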

Because this import step creates Unicode characters that MS Office itself does not produce, it may affect your TM lookup. Entries stored in the TM prior to memoQ 6.0 do not contain the BiDi control characters, but newly imported source text, which you then store in your TM, does. As a result, segments that previously were 100% matches may only fall in the 95-99% match range.
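
To check outside memoQ whether two segments differ only by such marks, a small Python sketch like the following may help. The function names strip_bidi_controls and similarity are assumptions made for this illustration, and the difflib ratio is only a stand-in: memoQ calculates its match rates differently, so the numbers are not comparable.

```python
from difflib import SequenceMatcher

# Unicode bidirectional control characters: ALM, LRM, RLM,
# the embedding/override controls, and the isolate controls.
BIDI_CONTROLS = (
    "\u061c\u200e\u200f"
    "\u202a\u202b\u202c\u202d\u202e"
    "\u2066\u2067\u2068\u2069"
)
_STRIP_TABLE = dict.fromkeys(map(ord, BIDI_CONTROLS))


def strip_bidi_controls(text):
    """Return text with all bidirectional control characters removed."""
    return text.translate(_STRIP_TABLE)


def similarity(a, b):
    """Rough character-level similarity in [0, 1]; not memoQ's match rate."""
    return SequenceMatcher(None, a, b).ratio()


old_entry = "\u05e9\u05dc\u05d5\u05dd."   # stored without marks
new_source = old_entry + "\u200f"          # same text with an RLM added on import

print(old_entry == new_source)                        # the raw strings differ
print(strip_bidi_controls(new_source) == old_entry)   # equal once marks are removed
print(similarity(old_entry, new_source))
```

If the strings are equal after stripping, the lower match rate comes from the control characters alone and not from a change in the visible text.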
