CAT Analysis: Repeats Count

When commissioning or bidding for a translation project that involves a CAT tool, it is customary to request or quote a discounted rate for repeats.

But what is a repeat, and what does it involve? Simply put, a repeat is a sentence or "segment" that recurs within the document and that the CAT tool recognizes as identical. Since the segments are identical, so is their translation: once one instance has been translated, the others require no further work from the translator beyond pressing a key combination to accept the 100% match. Outsourcers therefore, and reasonably, do not want to pay the full rate for such repeats. They ask for a (deep) discount on the full rate, and translators usually comply.

Unfortunately, the translator is required to quote before actually analyzing the document himself. 

The problem lies not in asking for a discount, but in determining a fair discount that benefits both parties. This is not as easy as it might seem at first sight, as the following simple analysis will show. More precisely, the problem is to find the actual number of unique segments requiring translation, since CAT tools give us a tally rather than a frequency distribution.

We can consider two extreme scenarios in any repeat analysis: the best and the worst. Let us examine both with a numerical example. Say that we have a document for which a CAT tool reports 10,000 words of repeats, and the translator is required to quote a discount for them. How much should he quote? How can he fix a rate that is fair to both parties?

Best scenario:

The repeats consist of a single segment (a word, a sentence, or a part of one) that is repeated over and over again, for example a 10-word sentence occurring 1,000 times. In this case, the translator has to translate only 10 words to get the whole 10,000 words translated! For the rest, he only presses a key combination to accept the translation, which takes one or two seconds.

Worst scenario:

In the worst scenario, each segment is repeated only once, that is, it occurs just twice in the document. In our example, half of the 10,000 words (5,000) then make up the unique segments that require translation. Assuming, as above, that each segment consists of 10 words, the translator would have to translate 500 segments.
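The two extremes can be written down as a small calculation. The following is only a sketch under the assumptions used in the example (a repeat tally that counts every occurrence, and segments of equal length); the function name is made up for illustration.

```python
def unique_word_bounds(repeat_words, words_per_segment):
    """Return (best_case, worst_case) counts of words that need translating.

    Assumes the repeat tally counts every occurrence of a repeated segment
    and that all segments have the same length.
    """
    best_case = words_per_segment    # one segment accounts for the whole tally
    worst_case = repeat_words // 2   # every segment occurs exactly twice
    return best_case, worst_case


best, worst = unique_word_bounds(repeat_words=10_000, words_per_segment=10)
print(best, worst)    # 10 5000
print(worst // 10)    # 500 segments to translate in the worst case
```

Any real document falls somewhere between these two figures.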

Actual frequency distribution:

In reality, the number of unique segments requiring translation lies somewhere between the two extremes, depending on the type of document analyzed. In some documents, a single segment may be repeated a hundred times; in general, the number of repetitions varies from segment to segment. Returning to our example, let us say that the repeats have the following actual frequency distribution:

2,000 words repeated two times = 4,000 words
1,500 words repeated three times = 4,500 words
125 words repeated four times = 500 words
100 words repeated ten times = 1,000 words

Total number of words in unique segments requiring translation = 3,725 words (out of the 10,000-word tally).
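A translator who has the file in hand can build such a frequency distribution himself instead of relying on the tally. The sketch below is an illustration only: it assumes the document has already been split into segments, treats exact string equality as a "repeat", counts words by splitting on whitespace, and uses invented sample segments and function names. A real CAT tool may segment and count differently.

```python
from collections import Counter


def repeat_distribution(segments):
    """Return (words_in_segment, occurrences) for every segment seen more than once."""
    counts = Counter(segments)
    return [
        (len(segment.split()), occurrences)
        for segment, occurrences in counts.items()
        if occurrences > 1
    ]


def tally_and_unique(rows):
    """Return (repeat tally, unique words) for a list of (words, occurrences) rows."""
    tally = sum(words * occurrences for words, occurrences in rows)
    unique = sum(words for words, _ in rows)
    return tally, unique


# Invented sample document, already split into segments.
segments = [
    "Press the green button.",
    "Close the cover.",
    "Press the green button.",
    "Close the cover.",
    "Read the manual first.",
    "Close the cover.",
]

rows = repeat_distribution(segments)
print(rows)                    # [(4, 2), (3, 3)]
print(tally_and_unique(rows))  # (17, 7)
```

Here the tool would report a tally of 17 repeated words, of which only 7 actually have to be translated.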

Cost Estimates based on Scenarios:

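Now we can estimate costs on the basis of our hypothetical scenarios and determine the gains or losses incurred by the parties, i.e., by the outsourcer and the translator.

In all calculations, let us assume that the full rate is 0.1 currency units per word. In the first table, the translator offers a 70% discount for repeats, i.e., 0.03 per word, or 300 for the 10,000 repeated words. Each party's gain or loss is the difference between this quoted price and the price of the unique words at the full rate.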

Table 1. Actual Number of Words Requiring Translation and Costs

Scenario   Unique words translated   Quoted price   Outsourcer's gain/loss   Translator's gain/loss
Best       10                        300            -299                     +299
Worst      5,000                     300            +200                     -200
Actual     3,725                     300            +72.5                    -72.5

As the table shows, every scenario other than the "best" one leaves the translator with a real loss. The more he discounts, the more he loses, so he should be careful when quoting discounts for repeats. In practice, discounts of more than 60% lead to losses for the translator.

Let us illustrate this with the same example, this time assuming that the translator quotes his discount on the basis of the worst scenario.

Table 2. Discount Quoted on the Basis of the Worst Scenario (i.e., at 50%)

Scenario   Unique words translated   Quoted price   Outsourcer's gain/loss   Translator's gain/loss
Best       10                        500            -499                     +499
Worst      5,000                     500            0                        0
Actual     3,725                     500            -127.5                   +127.5

The last table, for a 60% discount, shows that this is the optimum level for both the outsourcer and the translator.

Table 3. Discount Quoted at 60% (i.e., 0.04 currency units per word)

Scenario   Unique words translated   Quoted price   Outsourcer's gain/loss   Translator's gain/loss
Best       10                        400            -399                     +399
Worst      5,000                     400            +100                     -100
Actual     3,725                     400            -27.5                    +27.5
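
The gain/loss columns in the three tables all follow the same rule: the quoted price for the repeat tally is compared with the price of the unique words at the full rate. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and Decimal is used only to avoid floating-point rounding noise. It reproduces the "Best" and "Worst" rows at each of the three discounts.

```python
from decimal import Decimal


def repeat_quote(unique_words, repeat_words, full_rate, discount):
    """Return (quoted_price, translator_gain) for the repeated words.

    quoted_price    = repeat tally x full rate x (1 - discount)
    translator_gain = quoted price - unique words x full rate
    The outsourcer's gain/loss is the same figure with the opposite sign.
    """
    full_rate = Decimal(str(full_rate))
    discount = Decimal(str(discount))
    quoted_price = repeat_words * full_rate * (1 - discount)
    fair_price = unique_words * full_rate
    return quoted_price, quoted_price - fair_price


for discount in ("0.7", "0.5", "0.6"):
    for label, unique_words in (("Best", 10), ("Worst", 5000)):
        quoted, gain = repeat_quote(unique_words, 10_000, "0.1", discount)
        print(f"{float(discount):.0%} {label:<5} quoted={quoted:.2f} translator={gain:+.2f}")
```

Passing any other unique-word count, such as the one from an actual frequency distribution, gives the corresponding "Actual" row.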

In summary, we can safely conclude that the discount for repeats should be neither less than 50% nor greater than 70%, the optimum rate being 60%.
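
For any given document, the discount at which neither party gains or loses can be worked out directly: setting the quoted price equal to the price of the unique words at the full rate gives discount = 1 - (unique words / repeat tally), whatever the full rate is. The sketch below assumes the same pricing model as the tables; the function name is made up for illustration.

```python
from fractions import Fraction


def break_even_discount(unique_words, repeat_words):
    """Discount on repeats at which the translator's gain is exactly zero."""
    return 1 - Fraction(unique_words, repeat_words)


print(float(break_even_discount(5000, 10_000)))  # 0.5   (worst scenario)
print(float(break_even_discount(10, 10_000)))    # 0.999 (best scenario)
```

Since the translator normally cannot compute the unique word count before quoting, the worst-scenario figure of 50% is the only discount that is guaranteed not to cause him a loss.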

