eng-zle

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): bel bel_Latn orv_Cyrl rue rus ukr
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID); see the usage sketch after this list
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
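
Translation input must carry the language token described above. Below is a minimal usage sketch with the Hugging Face MarianMT port; the checkpoint ID Helsinki-NLP/opus-mt-en-zle is an assumption here, so substitute whichever checkpoint you converted from the zip above.

```python
# A minimal sketch, assuming the model has been converted and published as
# Helsinki-NLP/opus-mt-en-zle (hypothetical ID; adapt to your own checkpoint).
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-zle"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Each source sentence starts with a >>id<< token selecting the target language.
sources = [
    ">>rus<< How are you?",
    ">>ukr<< How are you?",
    ">>bel<< How are you?",
]
batch = tokenizer(sources, return_tensors="pt", padding=True)
translated = model.generate(**batch)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))
```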

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newstest2012-engrus.eng.rus | 25.6 | 0.533 |
| newstest2013-engrus.eng.rus | 20.0 | 0.480 |
| newstest2015-enru-engrus.eng.rus | 22.4 | 0.518 |
| newstest2016-enru-engrus.eng.rus | 21.7 | 0.505 |
| newstest2017-enru-engrus.eng.rus | 23.6 | 0.527 |
| newstest2018-enru-engrus.eng.rus | 20.9 | 0.513 |
| newstest2019-enru-engrus.eng.rus | 22.0 | 0.487 |
| Tatoeba-test.eng-bel.eng.bel | 20.5 | 0.468 |
| Tatoeba-test.eng.multi | 35.6 | 0.570 |
| Tatoeba-test.eng-orv.eng.orv | 0.4 | 0.140 |
| Tatoeba-test.eng-rue.eng.rue | 0.9 | 0.158 |
| Tatoeba-test.eng-rus.eng.rus | 38.8 | 0.598 |
| Tatoeba-test.eng-ukr.eng.ukr | 36.9 | 0.582 |

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): bel bel_Latn orv_Cyrl rue rus ukr
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newstest2012-engrus.eng.rus | 25.9 | 0.535 |
| newstest2013-engrus.eng.rus | 20.0 | 0.480 |
| newstest2015-enru-engrus.eng.rus | 22.5 | 0.517 |
| newstest2016-enru-engrus.eng.rus | 21.6 | 0.506 |
| newstest2017-enru-engrus.eng.rus | 23.4 | 0.526 |
| newstest2018-enru-engrus.eng.rus | 20.8 | 0.512 |
| newstest2019-enru-engrus.eng.rus | 21.6 | 0.485 |
| Tatoeba-test.eng-bel.eng.bel | 20.9 | 0.464 |
| Tatoeba-test.eng.multi | 35.3 | 0.564 |
| Tatoeba-test.eng-orv.eng.orv | 0.5 | 0.134 |
| Tatoeba-test.eng-rue.eng.rue | 1.3 | 0.178 |
| Tatoeba-test.eng-rus.eng.rus | 39.0 | 0.596 |
| Tatoeba-test.eng-ukr.eng.ukr | 36.7 | 0.579 |

opus2m-2020-08-02.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): bel bel_Latn orv_Cyrl rue rus ukr
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • download: opus2m-2020-08-02.zip
  • test set translations: opus2m-2020-08-02.test.txt
  • test set scores: opus2m-2020-08-02.eval.txt

Benchmarks

| testset | BLEU | chr-F |
|---------|------|-------|
| newstest2012-engrus.eng.rus | 27.4 | 0.550 |
| newstest2013-engrus.eng.rus | 21.4 | 0.493 |
| newstest2015-enru-engrus.eng.rus | 24.2 | 0.534 |
| newstest2016-enru-engrus.eng.rus | 23.3 | 0.518 |
| newstest2017-enru-engrus.eng.rus | 25.3 | 0.541 |
| newstest2018-enru-engrus.eng.rus | 22.4 | 0.527 |
| newstest2019-enru-engrus.eng.rus | 24.1 | 0.505 |
| Tatoeba-test.eng-bel.eng.bel | 20.8 | 0.471 |
| Tatoeba-test.eng.multi | 37.2 | 0.580 |
| Tatoeba-test.eng-orv.eng.orv | 0.6 | 0.130 |
| Tatoeba-test.eng-rue.eng.rue | 1.4 | 0.168 |
| Tatoeba-test.eng-rus.eng.rus | 41.3 | 0.616 |
| Tatoeba-test.eng-ukr.eng.ukr | 38.7 | 0.596 |

opus1m+bt-2021-04-10.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): bel orv rue rus ukr
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence-initial language token is required, in the form >>id<< (id = a valid target language ID)
  • valid language labels: >>bel<< >>bel_Latn<< >>orv<< >>orv_Cyrl<< >>rue<< >>rus<< >>ukr<<
  • download: opus1m+bt-2021-04-10.zip
  • test set translations: opus1m+bt-2021-04-10.test.txt
  • test set scores: opus1m+bt-2021-04-10.eval.txt

Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|------|-------|-------|--------|----|
| newstest2012.eng-rus | 26.7 | 0.544 | 3003 | 64830 | 0.993 |
| newstest2013.eng-rus | 20.6 | 0.485 | 3000 | 58560 | 0.983 |
| newstest2015-enru.eng-rus | 23.8 | 0.529 | 2818 | 55915 | 1.000 |
| newstest2016-enru.eng-rus | 22.6 | 0.515 | 2998 | 62018 | 1.000 |
| newstest2017-enru.eng-rus | 24.6 | 0.536 | 3001 | 60255 | 1.000 |
| newstest2018-enru.eng-rus | 22.2 | 0.523 | 3000 | 61920 | 1.000 |
| newstest2019-enru.eng-rus | 22.7 | 0.495 | 1997 | 48153 | 0.937 |
| Tatoeba-test.eng-bel | 23.7 | 0.499 | 2500 | 16231 | 0.998 |
| Tatoeba-test.eng-bel_Latn | 2.1 | 0.015 | 3 | 29 | 0.683 |
| Tatoeba-test.eng-multi | 35.7 | 0.574 | 10000 | 63596 | 0.986 |
| Tatoeba-test.eng-orv | 0.7 | 0.151 | 322 | 1708 | 1.000 |
| Tatoeba-test.eng-rue | 1.0 | 0.209 | 120 | 496 | 1.000 |
| Tatoeba-test.eng-rus | 38.4 | 0.594 | 10000 | 66695 | 0.986 |
| Tatoeba-test.eng-ukr | 37.1 | 0.583 | 10000 | 60677 | 0.976 |
| tico19-test.eng-rus | 21.4 | 0.497 | 2100 | 55837 | 0.917 |
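
In the table above, #sent and #words count reference sentences and words, and BP is BLEU's brevity penalty. A minimal sketch of recomputing these figures with sacrebleu; the file names are hypothetical stand-ins for hypothesis and reference sentences extracted from the released .test.txt file, one sentence per line:

```python
# A minimal sketch, assuming plain one-sentence-per-line hypothesis and
# reference files (hypothetical names; adapt to your split of .test.txt).
import sacrebleu

with open("newstest2019-enru.hyp.rus") as f:
    hypotheses = [line.strip() for line in f]
with open("newstest2019-enru.ref.rus") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])

# Assumption: #words counts whitespace-separated reference tokens.
n_sent = len(references)
n_words = sum(len(r.split()) for r in references)

# bleu.bp is the brevity penalty reported in the BP column; chr-F in the
# tables is on a 0-1 scale, while sacrebleu reports 0-100.
print(f"BLEU {bleu.score:.1f}  chr-F {chrf.score / 100:.3f}  "
      f"#sent {n_sent}  #words {n_words}  BP {bleu.bp:.3f}")
```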

opus1m+bt-tuned4eng2bel-2021-04-16.zip

Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|------|-------|-------|--------|----|
| Tatoeba-test.eng-bel | 24.1 | 0.502 | 2500 | 16237 | 0.995 |
| Tatoeba-test.eng-multi | 4.1 | 0.206 | 10000 | 63596 | 1.000 |

opusTCv20210807+bt_transformer-big_2022-03-13.zip

Benchmarks

| testset | BLEU | chr-F | #sent | #words | BP |
|---------|------|-------|-------|--------|----|
| newstest2012.eng-rus | 36.7 | 0.62851 | 3003 | 64830 | 0.971 |
| newstest2013.eng-rus | 26.9 | 0.54707 | 3000 | 58560 | 0.969 |
| newstest2015-enru.eng-rus | 34.9 | 0.62591 | 2818 | 55915 | 0.994 |
| newstest2016-enru.eng-rus | 33.0 | 0.60568 | 2998 | 62018 | 0.980 |
| newstest2017-enru.eng-rus | 37.3 | 0.64230 | 3001 | 60255 | 0.987 |
| newstest2018-enru.eng-rus | 32.8 | 0.61235 | 3000 | 61920 | 0.983 |
| newstest2019-enru.eng-rus | 31.7 | 0.57847 | 1997 | 48153 | 0.908 |
| Tatoeba-test-v2021-08-07.eng-bel | 24.7 | 0.50065 | 2500 | 16231 | 0.946 |
| Tatoeba-test-v2021-08-07.eng-multi | 41.6 | 0.63018 | 10000 | 66008 | 1.000 |
| Tatoeba-test-v2021-08-07.eng-orv | 0.5 | 0.17450 | 322 | 1708 | 1.000 |
| Tatoeba-test-v2021-08-07.eng-rue | 1.1 | 0.21725 | 120 | 496 | 1.000 |
| Tatoeba-test-v2021-08-07.eng-rus | 44.7 | 0.65732 | 19425 | 133881 | 1.000 |
| Tatoeba-test-v2021-08-07.eng-ukr | 37.7 | 0.60066 | 13127 | 80866 | 0.994 |
| tico19-test.eng-rus | 33.8 | 0.59342 | 2100 | 55837 | 0.905 |