Description
The problem
I found this problem while translating the page https://en.wikipedia.org/wiki/Data_Encryption_Standard with Linguist.
The permanent link to the page is https://en.wikipedia.org/w/index.php?title=Data_Encryption_Standard&oldid=1282716609
The translator implementation limits batch translation to arrays of at most 80 elements.
The problem is that Scheduler does not respect the value returned by the checkLimitExceeding method and sometimes sends more than 80 elements.
I expect Scheduler to run checkLimitExceeding before every translation, to make sure the data is valid before it is sent over the network.
The current behavior leads to limit errors from the API server.
The second request was sent with 81 elements:
{"q":["Data Encryption Standard – Hungarian","Piawaian Penyulitan Data – Malay","Data Encryption Standard – Dutch","Data Encryption Standard – Japanese","DES – Norwegian Bokmål","DES – Norwegian Nynorsk","good article badge","Data Encryption Standard – Polish","Data Encryption Standard – Portuguese","Data Encryption Standard – Romanian","DES – Russian","Standardi i kriptimit të të dhënave – Albanian","Data Encryption Standard – Simple English","Data Encryption Standard – Slovak","DES – Slovenian","DES – Serbian","Data Encryption Standard – Serbo-Croatian","DES – Finnish","Data Encryption Standard – Swedish","DES – Tajik","DES – Turkish","Data Encryption Standard – Ukrainian","DES (mã hóa) – Vietnamese","数据加密标准 – Wu","數據加密標準 – Cantonese","資料加密標準 – Chinese","Edit interlanguage links","Namespaces","View the content page [Alt+Shift+c]","Discuss improvements to the content page [Alt+Shift+t]","Change language variant","Views","Edit this page [Alt+Shift+e]","Past revisions of this page [Alt+Shift+h]","Page tools","Tools","More options","Edit this page [Alt+Shift+e]","List of all English Wikipedia pages containing links to this page [Alt+Shift+j]","Recent changes in pages linked from this page [Alt+Shift+k]","Upload files [Alt+Shift+u]","Permanent link to this revision of this page","More information about this page","Information on how to cite this page","Expand all collapsible elements on the current page","Edit interlanguage links","Download this page as a PDF file","Page tools","Appearance","IBM","Help:IPA/English","/ˌ/: secondary stress follows","'d' in 'dye'","/iː/: 'ee' in 'fleece'","/ˌ/: secondary stress follows","/iː/: 'ee' in 'fleece'","/ˈ/: primary stress follows","/ɛ/: 'e' in 'dress'","'s' in 'sigh'","'d' in 'dye'","/ɛ/: 'e' in 'dress'","'z' in 'zoom'","Symmetric-key algorithm","Encryption","Cryptography","IBM","Horst Feistel","National Bureau of Standards","National Security Agency","Differential cryptanalysis","Brute-force attack","Federal Information 
Processing Standard","Classified information","Key length","Symmetric-key algorithm","Block cipher","Category:CS1 maint: multiple names: authors list","Category:CS1 maint: multiple names: authors list","History of cryptography","Classical cipher","Keygen"],"source":"en","target":"ru","format":"text","api_key":"SECRET_HERE"}
The third request was sent with 103 elements:
{"q":["Main menu","Main menu","move to sidebar","Search","Donate","Create account","Log in","Personal tools","\n\t\tPages for logged out editors ","Contents","hide","(Top)","History","Toggle History subsection","NSA's involvement in the design","The algorithm as a standard","Chronology","Description","Toggle Description subsection","Overall structure","The Feistel (F) function","Key schedule","Pseudocode","Security and cryptanalysis","Toggle Security and cryptanalysis subsection","Brute-force attack","Data Encryption Standard","50 languages","Article","Talk","Read","Edit","View history","Tools","Tools","move to sidebar","Appearance","hide","Text","Small","Standard","Large","Width","Standard","Wide","Color ","(beta)","Automatic","Light","Dark","From Wikipedia, the free encyclopedia","Data Encryption Standard","The Feistel function (F function) of DES","General","Designers","IBM","The "," (",") is a "," for the "," of digital data. Although its short key length of 56 bits makes it too insecure for modern applications, it has been highly influential in the advancement of ",".\n","Data Encryption Standard","DES","/",",","/","ˌ","d","iː","ˌ","iː","ˈ","ɛ","s","d","ɛ","z","symmetric-key algorithm","encryption","cryptography","Developed in the early 1970s at "," and based on an earlier design by ",", the algorithm was submitted to the "," (NBS) following the agency's invitation to propose a candidate for the protection of sensitive, unclassified electronic government data. 
In 1976, after consultation with the "," (NSA), the NBS selected a slightly modified version (strengthened against ",", but weakened against ","), which was published as an official "," (FIPS) for the United States in 1977.","IBM","Horst Feistel","National Bureau of Standards","National Security Agency","differential cryptanalysis","brute-force attacks","Federal Information Processing Standard","[","2","]","classified","key length","symmetric-key","block cipher"],"source":"en","target":"ru","format":"text","api_key":"SECRET_HERE"}
API error example:
{"error":"Invalid request: request (103) exceeds text limit (80)"}
Translator implementation:
/**
 * Homepage: https://github.com/LibreTranslate/LibreTranslate
 * Demo: https://libretranslate.com/
 * API docs: https://libretranslate.com/docs/
 */
class LibreTranslator {
  // URL of your LibreTranslate instance;
  // for a local instance use "http://localhost/translate"
  apiPath = 'https://libretranslate.com/translate';

  // Insert your API key here, if you have one
  apiKey = 'API_KEY_HERE';

  // Translate a single text by delegating to translateBatch.
  // The promise is returned so that callers receive the translation.
  translate = (text, from, to) =>
    this.translateBatch([text], from, to).then((translations) => translations[0]);

  translateBatch = (texts, from, to) => {
    console.log('Call translateBatch');
    return fetch(this.apiPath, {
      credentials: 'omit',
      headers: {
        Accept: '*/*',
        'Content-Type': 'application/json',
      },
      method: 'POST',
      mode: 'cors',
      body: JSON.stringify({
        q: texts,
        source: from,
        target: to,
        format: 'text',
        api_key: this.apiKey,
      }),
    })
      .then((r) => r.json())
      .then(({ translatedText }) => translatedText);
  };

  // Maximum number of characters per request
  getLengthLimit = () => 2000;

  // Delay between requests; evaluates to about 9000 ms
  getRequestsTimeout = () => (1/(1200/60/60) * 1000) * 3;

  // Returns a positive number when the input exceeds the limits
  checkLimitExceeding = (text) => {
    // Maximum number of array elements per batch request
    const textsListLimit = 80;
    if (Array.isArray(text) && text.length > textsListLimit) {
      // Total length of the elements beyond the 80-element limit
      const overflow = text.slice(textsListLimit).reduce((len, item) => len + item.length, 0);
      console.log('Overflow', overflow);
      return overflow;
    }

    const textLength = !Array.isArray(text)
      ? text.length
      : text.reduce((len, item) => len + item.length, 0);
    return textLength - this.getLengthLimit();
  };

  static isSupportedAutoFrom = () => true;

  // prettier-ignore
  static getSupportedLanguages = () => [
    "en", "ar", "az", "zh", "cs",
    "nl", "eo", "fi", "fr", "de",
    "el", "hi", "hu", "id", "ga",
    "it", "ja", "ko", "fa", "pl",
    "pt", "ru", "sk", "es", "sv",
    "tr", "uk", "vi"
  ];
}
LibreTranslator;
Solution
We need to reproduce the bug with tests and fix the underlying problem. Scheduler must always call checkLimitExceeding and invoke translation methods only if the limits are not exceeded.
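One possible way to catch the bug in a test is to wrap the translator so that an over-limit batch fails before any request is made. The sketch below is only an illustration: `withLimitGuard`, `violations`, and the `stub` translator are hypothetical names that are not part of Linguist or the Scheduler, and the stub mirrors only the 80-element check from the implementation above.

```javascript
// Hypothetical test helper: wraps a translator so that any batch violating
// its own checkLimitExceeding is rejected before translateBatch is called.
function withLimitGuard(translator) {
  const violations = [];
  // Inherit everything from the original translator, override translateBatch.
  const guarded = Object.create(translator);
  guarded.violations = violations;
  guarded.translateBatch = (texts, from, to) => {
    const excess = translator.checkLimitExceeding(texts);
    if (excess > 0) {
      violations.push({ items: texts.length, excess });
      throw new RangeError(
        `Batch of ${texts.length} items exceeds the translator limit by ${excess}`
      );
    }
    return translator.translateBatch(texts, from, to);
  };
  return guarded;
}

// Stub translator (assumption): same 80-element limit, no network access.
const stub = {
  checkLimitExceeding: (texts) =>
    Array.isArray(texts) && texts.length > 80
      ? texts.slice(80).reduce((len, item) => len + item.length, 0)
      : 0,
  translateBatch: (texts) => Promise.resolve(texts.map((t) => t.toUpperCase())),
};

const guarded = withLimitGuard(stub);
const tooMany = Array.from({ length: 81 }, (_, i) => `text ${i}`);
try {
  guarded.translateBatch(tooMany, 'en', 'ru');
} catch (error) {
  console.log(error.message);
}
```

Passing such a guarded translator to Scheduler in a test should make the test fail on the first batch that ignores checkLimitExceeding, and `violations` records the size of each offending batch.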
If the text cannot be split adequately, Scheduler may throw an exception. That is better than ignoring user settings.
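The expected behavior could be sketched roughly as follows. This is not the Scheduler's actual code: `splitByLimit` and the `limits` object are hypothetical, and the sketch assumes only that checkLimitExceeding returns a positive number when a batch is over the limit, as in the translator above.

```javascript
// Hypothetical helper: greedily packs texts into batches so that every batch
// satisfies translator.checkLimitExceeding(batch) <= 0.
function splitByLimit(texts, translator) {
  const batches = [];
  let current = [];
  for (const text of texts) {
    const candidate = [...current, text];
    if (translator.checkLimitExceeding(candidate) <= 0) {
      current = candidate;
      continue;
    }
    // A text that exceeds the limit on its own cannot be placed in any batch.
    if (translator.checkLimitExceeding([text]) > 0) {
      throw new RangeError(
        `Text of ${text.length} characters exceeds the translator limit`
      );
    }
    // Close the current batch and start a new one with this text.
    batches.push(current);
    current = [text];
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Stand-in for the translator limits (assumption): 80 elements, 2000 characters.
const limits = {
  checkLimitExceeding: (texts) => {
    if (texts.length > 80) {
      return texts.slice(80).reduce((len, item) => len + item.length, 0);
    }
    return texts.reduce((len, item) => len + item.length, 0) - 2000;
  },
};

const texts = Array.from({ length: 103 }, (_, i) => `item ${i}`);
console.log(splitByLimit(texts, limits).map((batch) => batch.length)); // [ 80, 23 ]
```

With this approach the 103-element request from the report would be sent as two requests of 80 and 23 elements. Rebuilding the candidate array on every step is quadratic in the batch size, which seems acceptable for batches of at most 80 elements.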