39 languages automatically: how our AI translation handles technical terminology

A look behind the scenes of our automatic product-data translation - and why technical terminology has to be treated differently from a novel.

Machine translation today is so good that in many cases you can no longer tell it apart from human translation. Translation services work fluently, idiomatically, with a feel for register. Then you translate a DPP data set - and suddenly “rear lock fibre closure” becomes “Hinterschloss-Faserverschluss”.

The problem is called technical terminology. Here we explain why product data cannot be treated like a novel, and which tools Transpareo provides to keep your 39 language versions comprehensible.

The fundamental problem: one word, several meanings

“Seal” in the DPP of an outdoor jacket: a watertight closure. “Seal” in a laboratory: the animal or a gasket, depending on context. “Seal” in a maintenance log: possibly a stamp.

A general-purpose translation model picks a meaning based on statistical context. In flowing text this works: a novel provides plenty of context. In a data field such as primary_closure: seal there is barely any context, so the model guesses.

The result is subtle errors. Not as dramatic as “Hinterschloss-Faserverschluss”, but consequential: a component that is called “Dichtung” in German is suddenly called “sigillo” instead of “guarnizione” in an Italian DPP. A buyer can no longer find the spare part.

What Transpareo delivers today

Our translation system automatically translates all new content into every active language. Four properties characterise it:

  • Markdown and variable preservation: placeholders such as <a href="/en/register">Pro-Mitgliedschaft</a> and Markdown structures are extracted before translation, only the plain text is translated, and afterwards the structures are reinserted unchanged. This keeps links, forms and layout consistent across all languages.
  • Central translation entries: translations are not stored in the data set itself but in a shared layer. Several data sets with the same source text share a translation. This saves translation costs and unifies terms automatically across the data model.
  • Automatic re-translation on change: if the source text changes, the translations are regenerated in every language. Correct the German source, and the other 38 language versions follow automatically.
  • Per-data-set markers: content can be excluded from the automatic run or existing translations can be locked - for international product names or manual corrections, for example.
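The extract, translate and reinsert step from the first bullet can be sketched in Python as follows. This is a minimal illustration, not Transpareo's implementation: the `__PH0__`-style tokens, the regular expression deciding what counts as protected (HTML tags, Markdown link targets, `{variable}` placeholders) and the `translate` callable are all assumptions, and production code would more likely rely on a real Markdown/HTML parser.

```python
from __future__ import annotations

import re
from typing import Callable

# Assumed set of structures to protect: HTML tags, Markdown link targets "](...)"
# and {variable} placeholders. A real system would use proper parsers instead.
PROTECTED = re.compile(r"<[^>]+>|\]\([^)]*\)|\{[a-z_]+\}")
TOKEN = re.compile(r"__PH(\d+)__")


def mask(text: str) -> tuple[str, list[str]]:
    """Swap every protected structure for a numbered token like __PH0__."""
    saved: list[str] = []

    def swap(match: re.Match) -> str:
        saved.append(match.group(0))
        return f"__PH{len(saved) - 1}__"

    return PROTECTED.sub(swap, text), saved


def unmask(text: str, saved: list[str]) -> str:
    """Reinsert the saved structures, unchanged, in place of their tokens."""
    return TOKEN.sub(lambda m: saved[int(m.group(1))], text)


def translate_preserving(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the plain text; links, tags and variables pass through untouched."""
    masked, saved = mask(text)
    return unmask(translate(masked), saved)
```

The sketch relies on the translation engine leaving opaque tokens alone; a stricter variant would check that every token comes back exactly once before reinserting the structures.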

Where the customer fills the gaps

The automatic translation delivers mostly correct results for descriptive texts, marketing texts and care instructions. With critical technical terminology, such as the “seal”/“guarnizione” case, some errors remain that the customer’s admin has to correct.

Here the admin has three levers:

  1. Manual override per language and key: every translation entry can be opened in the application manager and adjusted per language. With the lock marker set, this manual translation is kept during the next automatic run.
  2. Glossary import: existing terminologies from translator tools or PDF glossaries can be imported as CSV and create the corresponding translation entries directly.
  3. Per-language corrections in ongoing operation: an Italian sales team notices an error and corrects it in the application manager - the correction takes effect immediately, the other translations are left untouched.
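The interplay of re-translation on change, the lock marker, manual overrides and glossary import can be modelled roughly like this. It is a hypothetical sketch: the `Entry` record, the `(key, lang)` indexing, the SHA-256 source fingerprint and the CSV columns `key,lang,translation` are assumptions for illustration, not the application manager's actual data model.

```python
from __future__ import annotations

import csv
import hashlib
import io
from dataclasses import dataclass
from typing import Callable

Translate = Callable[[str, str], str]  # hypothetical: (source_text, lang) -> translation


@dataclass
class Entry:
    """One translation of one key into one language (hypothetical model)."""
    text: str
    source_hash: str      # fingerprint of the source text this was translated from
    locked: bool = False  # lock marker: automatic runs leave this entry alone


def fingerprint(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def automatic_run(key: str, source: str, languages: list[str],
                  entries: dict[tuple[str, str], Entry], translate: Translate) -> None:
    """Regenerate every unlocked translation that is missing or based on an older source."""
    current_hash = fingerprint(source)
    for lang in languages:
        entry = entries.get((key, lang))
        if entry is not None and (entry.locked or entry.source_hash == current_hash):
            continue  # manual override, or already translated from this exact source
        entries[(key, lang)] = Entry(translate(source, lang), current_hash)


def override(key: str, lang: str, text: str, entries: dict[tuple[str, str], Entry]) -> None:
    """Manual correction per language and key; locked so the next run keeps it."""
    entries[(key, lang)] = Entry(text, source_hash="", locked=True)


def import_glossary(csv_text: str, entries: dict[tuple[str, str], Entry]) -> None:
    """Import rows with the assumed columns key,lang,translation as locked overrides."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        override(row["key"], row["lang"], row["translation"], entries)
```

In this sketch a locked entry is deliberately not refreshed when the source text changes, so after a substantial source edit an admin would still need to review the locked entries by hand.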

The EU language reality

24 official EU languages sounds like a lot. In practice there are three layers:

  • core markets: DE, EN, FR, IT, ES, NL - here every consumer expects perfection
  • significant markets: PT, PL, SV, DA, FI - a good level, occasionally you notice the machine
  • rare languages: MT, GA, ET, LV, LT - sometimes you have a DPP in Maltese without an end consumer in Malta ever scanning it. Mandatory nonetheless.

There is no way around it: the ESPR requires DPP content in the language of the member state in which the product is sold. Anyone serving all 27 member states therefore has 24 languages in play, since several states share a language.

Why a centralised localisation layer

Most platforms store translations as additional fields on the data set: description_de, description_en, … 39 fields per translatable attribute. Sounds simple, but it has three disadvantages:

  • duplicated text: two products with the same material note produce 39 + 39 translations instead of a single set of 39
  • hard to scale: adding a 40th language means a schema migration across all translatable models
  • corrections hard to apply globally: to correct “guarnizione” everywhere, every data set has to be edited individually

The shared translation layer solves this: one entry, many references. One correction, all data sets benefit.
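The "one entry, many references" idea can be sketched as an in-memory layer keyed by source text. The names `TranslationLayer` and `Product` and the `translate` callable are hypothetical, and a real layer would live in a database, but the behaviour illustrates the three points above: identical source texts are translated once, a correction is made once, and a new language is just another dictionary key rather than a schema migration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Translate = Callable[[str, str], str]  # hypothetical: (source_text, lang) -> translation


@dataclass
class TranslationLayer:
    """Hypothetical shared layer: one set of translations per distinct source text."""
    translations: dict[str, dict[str, str]] = field(default_factory=dict)

    def ensure(self, source: str, languages: list[str], translate: Translate) -> None:
        """Translate only the languages this source text does not have yet."""
        per_lang = self.translations.setdefault(source, {})
        for lang in languages:
            if lang not in per_lang:
                per_lang[lang] = translate(source, lang)

    def correct(self, source: str, lang: str, text: str) -> None:
        """One correction; every data set referencing this source text sees it."""
        self.translations[source][lang] = text

    def lookup(self, source: str, lang: str) -> str:
        return self.translations[source][lang]


@dataclass
class Product:
    name: str
    material_note: str  # only the source text is stored on the data set
```

For example, two products whose `material_note` is "Dichtung" trigger one translation per language, and a single `layer.correct("Dichtung", "it", "guarnizione")` fixes the Italian text for both.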

What we do not yet have

A customer-specific terminology database with automatic suggestion detection is in the development plan but is not shipped today. Anyone who starts today gets far with the existing tools: manual overrides, glossary imports and the lock marker cover the most common use cases.

We believe that machines should do the bulk of the work and humans should intervene only where it is really necessary. Until the automatic terminology detection is available, the manual lever is transparent - and that is more honest than a promise that is not kept.
