Post-editing Machine Translation - Tips and Best Practice

At CLEAR Global/Translators without Borders we may use machine translation (MT) to pre-translate some of our projects. This means you might need to correct MT outputs while working on a post-editing (PE) task.

We have prepared tips on how to best approach working in MT projects. We also detail below some common issues you’ll come across to help you identify and navigate through them.

Do you think you’re in need of more training on post-editing or machine translation? If so, make sure to check our Learning Center, where you’ll find Introduction to Machine Translation and Post Editing and other courses.

How to post-edit machine translation in a nutshell

Improving the output of state-of-the-art MT is challenging, especially when the target looks natural without any human intervention. It is easy to move on to another segment without realizing there is an issue with the translation!

  1. You should always start by reading the source segment.
  2. Done? Move to comparing it with the MT output to identify if and where you need to change something.
  3. Edit what needs to be changed. Remember to:

a) check the translation against our quality categories, asking yourself

  • Is the MT output communicating the meaning of the source text correctly and precisely?

  • Does the text follow the standard spelling, punctuation, and grammar?

  • Are any keywords and phrases translated accurately and in a consistent manner? Are we correctly using our termbase entries?

  • Does the translation sound natural and is it appropriate for the readers in terms of style?

  • Are the tags (special markers inside the CAT tool - Phrase TMS) correctly added to the segment? Is the formatting correct?

b) consider whether the output has any issues that go beyond linguistic errors, such as issues related to the text’s purpose, domain (subject matter), audience, or tone,

c) be consistent with your lexical or grammatical choices between segments.

  4. To finish, reread the target segment before confirming it, making sure nothing has slipped your attention.

Challenging source content

Some of the source texts are more challenging for the MT engines to translate. Others might result in translation errors that are easy to miss. Keep an extra eye on such sources when doing your PE task:

  • broken sentences or sentences with blank spaces to fill out,
  • hyperlinks or nonwords (codes with sets of letters and numbers, serial numbers, hashtags, docs paths, etc.),
  • tables of contents, especially with numbers,
  • titles and lists of contents,
  • language, region, or country names,
  • dates and names of months (also in written form),
  • names, proper names, or organization names,
  • texts written in all caps,
  • numbers, especially monetary values, and symbols, such as currency symbols,
  • contents with mixed languages.
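
If you like to automate part of this check, the list above can be sketched as a small script that flags source segments deserving a closer look. This is only a rough illustration: the function name `risk_flags` and the patterns are our own assumptions, they cover only some of the categories above, and they will not work equally well for every language or script.

```python
import re

# Hypothetical patterns, loosely based on some of the categories listed above.
# They are illustrative only and will miss cases or raise false alarms.
RISK_PATTERNS = {
    "hyperlink": re.compile(r"https?://\S+|www\.\S+"),
    "hashtag": re.compile(r"#\w+"),
    "number": re.compile(r"\d"),
    "currency symbol": re.compile(r"[$€£¥]"),
    "all caps": re.compile(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})+\b"),
    "blank to fill": re.compile(r"_{3,}|\.{4,}"),
}


def risk_flags(source: str) -> list[str]:
    """Return the names of the risk categories found in a source segment."""
    return [name for name, pattern in RISK_PATTERNS.items() if pattern.search(source)]


print(risk_flags("Pay 1000 € at https://example.org by 5 May"))
print(risk_flags("Name: ______"))
```

A flagged segment is not necessarily mistranslated. The flag is only a reminder to compare the MT output with the source more carefully.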

Error impact

Some issues have more impact than others. In this section, we would like to share how we would classify some of the most common MT issues using our DQF-MQM error typology. We hope this will help you in your work on post-editing tasks!

Critical issues

Issues that have a significant impact and may cause severe implications. Those are rarely produced by the MT but can cause more damage than others and might be difficult to spot:

  • MT hallucinations: the translation does not represent the original at all or is incomprehensible.
    Ex.: “The refugees crossed into Spain” becoming “crossed into Portugal”.

  • MT duplications: the translation contains duplicated source meaning or parts of it. Think about “boat boat” or “I went to the market, to the market I go”.

  • MT additions/omissions: the translation adds a word, symbol, or number that is not in the original, or omits one that is. For example, “1000” becomes “1000 €” or vice versa.

  • Changed numbers, dates, names, or strings of numbers and letters. Oddly, MT sometimes struggles with those and changes them to something else, within or outside of the same category. Compare those items with the original before accepting your segments if you do not want to have “May” instead of “November” in your translation.
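
Changed numbers and duplicated words are hard to spot by eye but easy to pre-screen mechanically. Below is a minimal sketch, assuming you have the source and target of a segment as plain strings. The function names `number_mismatches` and `repeated_words` are hypothetical, and the check compares digits literally, so a legitimate locale change (for example “1,000” rendered as “1.000”) will also be reported.

```python
import re
from collections import Counter

# Digit groups, optionally joined by "." or "," (e.g. 1000, 12.5, 1,000).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
# The same word twice in a row, ignoring case (e.g. "boat boat").
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def number_mismatches(source: str, target: str) -> dict[str, list[str]]:
    """List numbers that appear in only one of the two segments."""
    src = Counter(NUMBER.findall(source))
    tgt = Counter(NUMBER.findall(target))
    return {
        "missing_in_target": sorted((src - tgt).elements()),
        "extra_in_target": sorted((tgt - src).elements()),
    }


def repeated_words(target: str) -> list[str]:
    """Return words that are immediately repeated in the target."""
    return [match.group(1) for match in REPEATED_WORD.finditer(target)]


print(number_mismatches("Distribute 1000 kits on 12 May", "Distribuir 100 kits el 12 de mayo"))
print(repeated_words("The boat boat left at noon"))
```

Treat every hit as a prompt to reread the segment, not as a verdict. Some languages repeat words legitimately, and month names or written-out numbers are not covered by this sketch at all.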

Major issues

Issues that have a considerable impact and may confuse or mislead the reader. In the case of MT output, they are often related to a lack of localization or to consistency issues:

  • Titles or references (bibliography) that have not been localized. Do you need more clarity on how to approach them? Check the project instructions or reach out to your project manager.

  • Consistency issues (lexical or stylistic). MT does not really hold information between the segments and, consequently, does not make unified ‘decisions’. Do you need to translate a word in a particular way, or keep the same grammatical forms across the following sentences? Make sure to edit your MT output to reflect this.
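
A simple way to picture a lexical consistency check is shown below. It is a sketch under stated assumptions, not a feature of any CAT tool: `termbase_violations` is a hypothetical helper, the termbase is a plain dictionary of source term to approved target term, and the comparison is a case-insensitive substring match that does not handle inflected forms.

```python
def termbase_violations(segments, termbase):
    """Flag segments whose source contains a term but whose target lacks the approved rendering.

    segments: list of (source, target) string pairs, in document order.
    termbase: dict mapping a source term to its approved target term.
    Returns a list of (segment number, source term, expected target term).
    """
    issues = []
    for number, (source, target) in enumerate(segments, start=1):
        for source_term, target_term in termbase.items():
            in_source = source_term.lower() in source.lower()
            in_target = target_term.lower() in target.lower()
            if in_source and not in_target:
                issues.append((number, source_term, target_term))
    return issues


segments = [
    ("The shelter is open.", "El refugio está abierto."),
    ("Go to the shelter.", "Vaya al albergue."),
]
print(termbase_violations(segments, {"shelter": "refugio"}))
```

In this made-up example the second segment is flagged because the MT chose a different word for the same term. Whether that is really an error still depends on the project instructions and the termbase.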

Minor issues

Issues that have a slight impact and neither cause loss of meaning nor confuse the reader. Those are more common and, if accumulated, tend to distract readers:

  • Omitted or unnecessary whitespace, especially when the target language’s spacing rules differ from those of the source language.

    • Ex.: French takes a (non-breaking) space before a question mark, while English does not take any.
  • Wrong punctuation in the target language. Make sure you always follow the punctuation rules of the language we are translating into, not those of the original language!

    • Ex.: Spanish uses opening question marks, while English does not.
  • Text formatting (bold, italics, uppercase, other) that is wrong or inconsistent with the source. Make sure to correct this in your MTPE task as well.
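
Some of these minor issues follow simple patterns, so they can be pre-screened with a short script. The sketch below is illustrative only. The function name `spacing_punctuation_issues` and the language codes "fr" and "es" are our assumptions, and the rules cover just the two examples above plus stray spaces. URLs that contain a question mark, for instance, would be flagged as well.

```python
import re

DOUBLE_SPACE = re.compile(r" {2,}")
# A question mark directly preceded by a non-space character.
# \s already covers the non-breaking spaces; they are spelled out for clarity.
FR_QUESTION_NO_SPACE = re.compile(r"[^\s\u00a0\u202f]\?")


def spacing_punctuation_issues(target: str, lang: str) -> list[str]:
    """Return a list of possible spacing or punctuation issues in a target segment."""
    issues = []
    if DOUBLE_SPACE.search(target):
        issues.append("double space")
    if target != target.strip():
        issues.append("leading or trailing whitespace")
    if lang == "fr" and FR_QUESTION_NO_SPACE.search(target):
        issues.append("no space before question mark")
    if lang == "es" and target.count("?") != target.count("¿"):
        issues.append("unbalanced question marks")
    return issues


print(spacing_punctuation_issues("Où est la clinique?", "fr"))
print(spacing_punctuation_issues("¿Dónde está la clínica?", "es"))
```

Leading or trailing spaces are sometimes intentional, for example when they mirror the source segment, so check the source before removing them.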