eng-pqe

opus-2020-06-28.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): fij gil haw mah mri nau niu rap smo tah ton tvl
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence initial language token is required in the form of >>id<< (id = valid target language ID)
  • download: opus-2020-06-28.zip
  • test set translations: opus-2020-06-28.test.txt
  • test set scores: opus-2020-06-28.eval.txt
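
The language-token requirement above can be illustrated with a minimal sketch. The helper name `add_language_token`, the `VALID_TARGETS` set (copied from the target language list of this release), and the single space between the token and the sentence are assumptions made for illustration; they are not part of the released package.

```python
# Target language IDs listed for the opus-2020-06-28 release.
VALID_TARGETS = {
    "fij", "gil", "haw", "mah", "mri", "nau",
    "niu", "rap", "smo", "tah", "ton", "tvl",
}


def add_language_token(sentence: str, target_id: str) -> str:
    """Prefix a source sentence with the >>id<< target language token.

    Hypothetical helper: it only builds the input string and does not
    perform normalization or SentencePiece segmentation.
    """
    if target_id not in VALID_TARGETS:
        raise ValueError(f"unsupported target language ID: {target_id!r}")
    # Assumption: token and sentence are separated by one space.
    return f">>{target_id}<< {sentence.strip()}"


if __name__ == "__main__":
    print(add_language_token("How are you?", "smo"))
```

Validating the ID before building the string is a design choice of this sketch: it fails early on an unlisted language rather than passing an unknown token to the model.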

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-fij.eng.fij 22.5 0.434
Tatoeba-test.eng-gil.eng.gil 59.3 0.739
Tatoeba-test.eng-haw.eng.haw 1.1 0.159
Tatoeba-test.eng-mah.eng.mah 7.6 0.363
Tatoeba-test.eng-mri.eng.mri 7.2 0.295
Tatoeba-test.eng.multi 11.3 0.311
Tatoeba-test.eng-nau.eng.nau 0.5 0.094
Tatoeba-test.eng-niu.eng.niu 28.1 0.509
Tatoeba-test.eng-rap.eng.rap 3.5 0.163
Tatoeba-test.eng-smo.eng.smo 24.6 0.461
Tatoeba-test.eng-tah.eng.tah 10.4 0.296
Tatoeba-test.eng-ton.eng.ton 21.1 0.463
Tatoeba-test.eng-tvl.eng.tvl 29.3 0.500

opus-2020-07-27.zip

  • dataset: opus
  • model: transformer
  • source language(s): eng
  • target language(s): fij gil haw lkt mah mri nau niu rap smo tah ton tvl
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence initial language token is required in the form of >>id<< (id = valid target language ID)
  • download: opus-2020-07-27.zip
  • test set translations: opus-2020-07-27.test.txt
  • test set scores: opus-2020-07-27.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-fij.eng.fij 20.5 0.406
Tatoeba-test.eng-gil.eng.gil 31.4 0.607
Tatoeba-test.eng-haw.eng.haw 0.5 0.141
Tatoeba-test.eng-lkt.eng.lkt 0.5 0.077
Tatoeba-test.eng-mah.eng.mah 8.4 0.331
Tatoeba-test.eng-mri.eng.mri 7.9 0.300
Tatoeba-test.eng.multi 10.3 0.304
Tatoeba-test.eng-nau.eng.nau 0.5 0.083
Tatoeba-test.eng-niu.eng.niu 34.6 0.531
Tatoeba-test.eng-rap.eng.rap 2.1 0.148
Tatoeba-test.eng-smo.eng.smo 25.4 0.467
Tatoeba-test.eng-tah.eng.tah 8.9 0.263
Tatoeba-test.eng-ton.eng.ton 26.1 0.489
Tatoeba-test.eng-tvl.eng.tvl 28.9 0.520

opus2m-2020-08-01.zip

  • dataset: opus2m
  • model: transformer
  • source language(s): eng
  • target language(s): fij gil haw lkt mah mri nau niu rap smo tah ton tvl
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence initial language token is required in the form of >>id<< (id = valid target language ID)
  • download: opus2m-2020-08-01.zip
  • test set translations: opus2m-2020-08-01.test.txt
  • test set scores: opus2m-2020-08-01.eval.txt

Benchmarks

testset BLEU chr-F
Tatoeba-test.eng-fij.eng.fij 22.1 0.396
Tatoeba-test.eng-gil.eng.gil 41.9 0.673
Tatoeba-test.eng-haw.eng.haw 0.6 0.114
Tatoeba-test.eng-lkt.eng.lkt 0.5 0.075
Tatoeba-test.eng-mah.eng.mah 9.7 0.386
Tatoeba-test.eng-mri.eng.mri 7.7 0.301
Tatoeba-test.eng.multi 11.3 0.306
Tatoeba-test.eng-nau.eng.nau 0.5 0.071
Tatoeba-test.eng-niu.eng.niu 42.5 0.560
Tatoeba-test.eng-rap.eng.rap 3.3 0.122
Tatoeba-test.eng-smo.eng.smo 27.0 0.462
Tatoeba-test.eng-tah.eng.tah 11.3 0.307
Tatoeba-test.eng-ton.eng.ton 27.0 0.528
Tatoeba-test.eng-tvl.eng.tvl 29.3 0.513

opus1m+bt-2021-04-10.zip

  • dataset: opus1m+bt
  • model: transformer-align
  • source language(s): eng
  • target language(s): aoz fij gil haw mah mgm mri nau niu rap smo tah tet ton tvl
  • pre-processing: normalization + SentencePiece (spm32k,spm32k)
  • a sentence initial language token is required in the form of >>id<< (id = valid target language ID)
  • valid language labels: >>aai<< >>aaw<< >>aaz<< >>adr<< >>adz<< >>aek<< >>agf<< >>agw<< >>ahb<< >>aia<< >>aie<< >>aix<< >>aji<< >>akg<< >>akr<< >>akt<< >>alm<< >>alo<< >>alp<< >>alu<< >>amk<< >>amq<< >>amv<< >>and<< >>ane<< >>anx<< >>aok<< >>aol<< >>aor<< >>aoz<< >>apb<< >>apo<< >>app<< >>apr<< >>aps<< >>apx<< >>aqr<< >>asl<< >>asz<< >>aty<< >>aua<< >>aud<< >>aui<< >>aul<< >>auq<< >>aut<< >>avb<< >>axx<< >>baa<< >>baj<< >>bay<< >>bbn<< >>bbv<< >>bcd<< >>bch<< >>bcm<< >>bcu<< >>bdd<< >>bed<< >>bek<< >>bgt<< >>bgy<< >>bhc<< >>bhp<< >>bhw<< >>biq<< >>bjk<< >>bjl<< >>bki<< >>bkx<< >>blp<< >>blq<< >>bmc<< >>bmk<< >>bmn<< >>bnd<< >>bnf<< >>bnk<< >>bnp<< >>bnr<< >>bpa<< >>bpg<< >>bpk<< >>bpz<< >>brj<< >>brr<< >>brz<< >>bsm<< >>btp<< >>btr<< >>bty<< >>buk<< >>bvc<< >>bvd<< >>bvt<< >>bwa<< >>bwb<< >>bwd<< >>bwf<< >>bxa<< >>bxf<< >>bxh<< >>bzh<< >>bzn<< >>bzq<< >>cal<< >>cam<< >>chk<< >>cir<< >>crc<< >>dac<< >>dad<< >>ddi<< >>ddw<< >>dgg<< >>dhv<< >>dij<< >>dix<< >>dmr<< >>dnk<< >>dob<< >>don<< >>dor<< >>drn<< >>drr<< >>dsn<< >>duf<< >>dva<< >>dww<< >>elu<< >>emi<< >>emw<< >>end<< >>erg<< >>erk<< >>erw<< >>etn<< >>faf<< >>far<< >>fij<< >>fnb<< >>frd<< >>frt<< >>fud<< >>fut<< >>fwa<< >>gal<< >>gar<< >>gdd<< >>gei<< >>ges<< >>gfk<< >>gga<< >>ggt<< >>ghn<< >>gil<< >>gip<< >>gli<< >>gmb<< >>goc<< >>goo<< >>gop<< >>gri<< >>grw<< >>grz<< >>gve<< >>gvs<< >>gzn<< >>hah<< >>hao<< >>haw<< >>hbu<< >>heg<< >>hgw<< >>hik<< >>hiw<< >>hla<< >>hoa<< >>hob<< >>hot<< >>hrk<< >>hrw<< >>hti<< >>htu<< >>hud<< >>huk<< >>hul<< >>huw<< >>hvk<< >>hvn<< >>iai<< >>idt<< >>iff<< >>ila<< >>ilu<< >>imr<< >>ire<< >>irh<< >>ism<< >>jae<< >>jaj<< >>jal<< >>jau<< >>jaz<< >>jmd<< >>kbi<< >>kbm<< >>kbt<< >>kbw<< >>kcl<< >>kdf<< >>kdk<< >>kei<< >>kem<< >>kgb<< >>khl<< >>khz<< >>kij<< >>kis<< >>kje<< >>kji<< >>kjr<< >>kkk<< >>klv<< >>klx<< >>koa<< >>kod<< >>kos<< >>kpd<< >>kpg<< >>kqf<< >>kqw<< >>krd<< >>krf<< >>ksd<< >>kse<< >>ksg<< >>ksl<< >>ksx<< >>ktk<< >>ktm<< >>kud<< >>kuk<< >>kuv<< >>kvc<< >>kvh<< 
>>kvo<< >>kvp<< >>kvv<< >>kwd<< >>kwf<< >>kwh<< >>kxa<< >>kxr<< >>kyd<< >>kzb<< >>kzd<< >>kzk<< >>kzl<< >>kzu<< >>kzx<< >>laz<< >>lbb<< >>lbq<< >>lbu<< >>lbv<< >>lcc<< >>lcd<< >>lcl<< >>lcm<< >>lcq<< >>lcs<< >>lek<< >>ler<< >>let<< >>leu<< >>lex<< >>lga<< >>lgb<< >>lgk<< >>lgl<< >>lgr<< >>lgu<< >>lhh<< >>lht<< >>lib<< >>lid<< >>lih<< >>lio<< >>ljl<< >>lka<< >>lkn<< >>lle<< >>llf<< >>llg<< >>llp<< >>llu<< >>llx<< >>lmb<< >>lmf<< >>lmg<< >>lmj<< >>lml<< >>lmq<< >>lmr<< >>lmu<< >>lmv<< >>lmy<< >>lnn<< >>loj<< >>los<< >>lox<< >>lpa<< >>lrn<< >>lrv<< >>lrz<< >>lti<< >>ltu<< >>lva<< >>lvu<< >>lwe<< >>lwt<< >>lww<< >>lzl<< >>mah<< >>mbh<< >>mbk<< >>mbq<< >>mcy<< >>mee<< >>mek<< >>met<< >>meu<< >>mft<< >>mgl<< >>mgm<< >>mgm_Latn<< >>mhs<< >>mhz<< >>mjk<< >>mjm<< >>mkj<< >>mkt<< >>mkv<< >>mky<< >>mla<< >>mll<< >>mln<< >>mlu<< >>mlv<< >>mlx<< >>mme<< >>mmg<< >>mmm<< >>mmo<< >>mmt<< >>mmw<< >>mmx<< >>mna<< >>mnl<< >>mnv<< >>mox<< >>mpl<< >>mpn<< >>mpo<< >>mpr<< >>mpx<< >>mpy<< >>mqa<< >>mqc<< >>mqi<< >>mqm<< >>mqp<< >>mqy<< >>mqz<< >>mrb<< >>mri<< >>mrk<< >>mrl<< >>mrm<< >>mrn<< >>mrp<< >>mrq<< >>mrs<< >>mrv<< >>msn<< >>msq<< >>mss<< >>msu<< >>mte<< >>mth<< >>mtt<< >>mum<< >>mva<< >>mvd<< >>mvn<< >>mvo<< >>mvr<< >>mvt<< >>mvx<< >>mwa<< >>mwc<< >>mwg<< >>mwh<< >>mwi<< >>mwo<< >>mxe<< >>mxm<< >>mxz<< >>myw<< >>mzz<< >>nae<< >>nak<< >>nal<< >>nau<< >>nbn<< >>ncc<< >>ncf<< >>ncn<< >>nee<< >>nek<< >>nem<< >>nen<< >>nfa<< >>nfl<< >>ngr<< >>nho<< >>nil<< >>niu<< >>nke<< >>nkk<< >>nkp<< >>nkr<< >>nlg<< >>nlz<< >>nmb<< >>nmk<< >>nms<< >>nmt<< >>nmw<< >>nnd<< >>nni<< >>npn<< >>nrg<< >>nrz<< >>nsn<< >>nss<< >>nsw<< >>ntu<< >>nua<< >>nul<< >>num<< >>nuq<< >>nur<< >>nuw<< >>nvh<< >>nwi<< >>nxa<< >>nxe<< >>nxg<< >>nxl<< >>ojv<< >>olr<< >>omb<< >>oni<< >>onu<< >>ora<< >>orz<< >>oum<< >>oyy<< >>pat<< >>pdn<< >>pek<< >>pex<< >>pfa<< >>pgk<< >>pif<< >>piv<< >>pix<< >>piz<< >>pkg<< >>pkp<< >>plb<< >>ple<< >>plh<< >>pma<< >>pme<< >>pmo<< >>pmt<< >>pnh<< >>pon<< >>pop<< >>ppm<< >>ppn<< >>pri<< 
>>pss<< >>psw<< >>ptn<< >>ptp<< >>ptr<< >>ptv<< >>puw<< >>pwg<< >>rai<< >>rak<< >>rap<< >>rar<< >>ray<< >>reb<< >>rga<< >>rgu<< >>riu<< >>rjg<< >>rkh<< >>rmm<< >>rnn<< >>roe<< >>ror<< >>row<< >>rri<< >>rro<< >>rtm<< >>rug<< >>sau<< >>sax<< >>sbb<< >>sbc<< >>sbe<< >>sbh<< >>seu<< >>sew<< >>sgu<< >>sgz<< >>sih<< >>sij<< >>sjr<< >>ske<< >>ski<< >>sku<< >>sky<< >>skz<< >>slp<< >>slu<< >>slz<< >>smo<< >>snc<< >>sns<< >>sob<< >>sol<< >>sov<< >>spb<< >>spe<< >>spr<< >>sps<< >>srf<< >>srw<< >>sry<< >>ssg<< >>sso<< >>ssv<< >>ssz<< >>ste<< >>stn<< >>stw<< >>svb<< >>sve<< >>swp<< >>sws<< >>sww<< >>szn<< >>szw<< >>tah<< >>tbc<< >>tbe<< >>tbf<< >>tbj<< >>tbo<< >>tbx<< >>tdt<< >>tet<< >>tev<< >>tgc<< >>tgg<< >>tgi<< >>tgo<< >>tgp<< >>tgs<< >>tio<< >>tkd<< >>tkl<< >>tkp<< >>tkv<< >>tkw<< >>tlm<< >>tlr<< >>tls<< >>tlt<< >>tlu<< >>tlv<< >>tlx<< >>tmb<< >>tmi<< >>tmq<< >>tmt<< >>tmy<< >>tni<< >>tnk<< >>tnl<< >>tnn<< >>tnp<< >>tns<< >>tnx<< >>ton<< >>tox<< >>tpa<< >>tpf<< >>tpv<< >>tpz<< >>tql<< >>tqp<< >>trb<< >>tre<< >>tsr<< >>tte<< >>tti<< >>ttu<< >>ttv<< >>tuc<< >>tva<< >>tve<< >>tvk<< >>tvl<< >>tvm<< >>twp<< >>twu<< >>txn<< >>txq<< >>tzn<< >>ubr<< >>udj<< >>uge<< >>uli<< >>una<< >>unu<< >>upv<< >>urn<< >>urr<< >>urv<< >>utp<< >>uur<< >>uve<< >>uvl<< >>val<< >>vao<< >>vbb<< >>viv<< >>vlp<< >>vme<< >>vmg<< >>vnk<< >>vnm<< >>vnp<< >>vra<< >>vrs<< >>vrt<< >>wab<< >>wad<< >>wag<< >>wah<< >>wat<< >>waz<< >>wbb<< >>wbw<< >>wed<< >>weo<< >>wet<< >>wew<< >>wgb<< >>wgo<< >>wha<< >>wiv<< >>wkd<< >>wlr<< >>wls<< >>wmh<< >>wmn<< >>wnk<< >>woc<< >>woe<< >>woo<< >>wrp<< >>wrx<< >>wsa<< >>wsi<< >>wuv<< >>wuy<< >>wwo<< >>wyy<< >>xbr<< >>xkx<< >>xmt<< >>xmx<< >>xsi<< >>xxk<< >>yap<< >>yki<< >>ykk<< >>ykm<< >>ylu<< >>yly<< >>yml<< >>ymn<< >>ymp<< >>yob<< >>zeg<< >>zgr<< >>zsa<< >>zsu<<
  • download: opus1m+bt-2021-04-10.zip
  • test set translations: opus1m+bt-2021-04-10.test.txt
  • test set scores: opus1m+bt-2021-04-10.eval.txt

Benchmarks

testset BLEU chr-F #sent #words BP
Tatoeba-test.eng-aoz 0.7 0.085 24 73 1.000
Tatoeba-test.eng-fij 26.6 0.441 44 180 1.000
Tatoeba-test.eng-gil 58.8 0.739 14 83 1.000
Tatoeba-test.eng-haw 1.0 0.165 92 447 1.000
Tatoeba-test.eng-mah 12.6 0.408 29 172 1.000
Tatoeba-test.eng-mgm 0.9 0.128 33 282 1.000
Tatoeba-test.eng-mri 9.5 0.313 363 3742 0.962
Tatoeba-test.eng-multi 11.3 0.306 861 6303 1.000
Tatoeba-test.eng-nau 0.8 0.090 20 85 1.000
Tatoeba-test.eng-niu 37.3 0.548 29 150 1.000
Tatoeba-test.eng-rap 2.0 0.147 26 125 1.000
Tatoeba-test.eng-smo 24.7 0.477 78 432 1.000
Tatoeba-test.eng-tah 10.3 0.285 21 100 1.000
Tatoeba-test.eng-tet 1.7 0.155 53 252 1.000
Tatoeba-test.eng-ton 27.0 0.521 20 94 1.000
Tatoeba-test.eng-tvl 33.1 0.529 15 86 1.000
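
The BP column in the table above is presumably the BLEU brevity penalty. As a minimal sketch, assuming the standard BLEU definition (1.0 when the system output is at least as long as the reference, otherwise exp(1 - r/c)), it can be computed as below. The function name and the word counts in the usage example are hypothetical and are not taken from the evaluation files.

```python
import math


def brevity_penalty(candidate_len: int, reference_len: int) -> float:
    """Standard BLEU brevity penalty for total output and reference lengths.

    Returns 1.0 when the output is at least as long as the reference,
    and exp(1 - reference_len / candidate_len) when it is shorter.
    """
    if candidate_len <= 0:
        # An empty output receives the maximum penalty.
        return 0.0
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)


if __name__ == "__main__":
    # Hypothetical lengths: output slightly shorter than the reference.
    print(round(brevity_penalty(96, 100), 3))
```

A BP below 1.000, as in the eng-mri row, would then indicate that the system output was shorter overall than the reference translations.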