#+STARTUP: showall
* WER-loss
This test checks the word error rate (WER) on a set of news articles
and ensures we don't get worse results than the last good run:
#+BEGIN_SRC sh
t/wer-loss
#+END_SRC
Articles and their post-edits (graciously donated by NTB/NPK) are in
=t/ntb=, along with the previous good MT output. If our WER is worse
than the previous good one, we fail.
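In outline, the check amounts to something like the sketch below. The
file names and the =apertium-eval-translator= call are assumptions,
not necessarily what =t/wer-loss= actually does:
#+BEGIN_SRC sh
# Hypothetical sketch, NOT the actual t/wer-loss script.
# Score the current MT output and the stored previous-good output
# against the NTB/NPK post-edits, then compare the two WERs.
wer () {
    apertium-eval-translator -test "$1" -ref "$2" \
        | grep -i 'error rate' | grep -oE '[0-9]+\.[0-9]+' | head -n1
}
new=$(wer current-output.txt      t/ntb/postedits.txt)   # assumed names
old=$(wer t/ntb/previous-good.txt t/ntb/postedits.txt)
# Fail if the new WER is higher than the previous good one:
awk -v n="$new" -v o="$old" 'BEGIN { exit !(n <= o) }' \
    || { echo "WER got worse: $new > $old" >&2; exit 1; }
#+END_SRC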
* Hash count
We should not have more =#='s than last time on the =t/ntb= articles
(in Apertium output, a =#= marks a form the generator failed to
produce).
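A minimal sketch of such a check, with assumed file names (note that
=grep -o= counts every =#=, not just lines containing one):
#+BEGIN_SRC sh
# Hypothetical sketch: compare # counts in current vs. previous-good output.
new=$(grep -o '#' current-output.txt      | wc -l)
old=$(grep -o '#' t/ntb/previous-good.txt | wc -l)
[ "$new" -le "$old" ] \
    || { echo "More #'s than before: $new > $old" >&2; exit 1; }
#+END_SRC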
* Regression/Pending wiki tests
You should have your tests in wiki pages named after your language pair, e.g.
https://wiki.apertium.org/wiki/apertium-sme-smj/Regression_tests
and
https://wiki.apertium.org/wiki/apertium-sme-smj/Pending_tests
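On those pages, each test is one line mapping a source sentence to its
expected translation. The layout below is only illustrative, not
necessarily the exact template the scripts parse; compare with an
existing pair's wiki page before writing your own:
#+BEGIN_EXAMPLE
* {{test|sme|source sentence goes here|expected smj translation goes here}}
#+END_EXAMPLE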
** Running the tests
To run the tests for your language pair, assuming the wiki pages have
been set up as shown above, do
#+BEGIN_SRC sh
t/update-latest
#+END_SRC
This will overwrite the files =t/latest-pending.results= and
=t/latest-regression.results=. You can view the differences with
#+BEGIN_SRC sh
git diff
#+END_SRC
Test results are kept in git: that way we don't have to keep moving
things back and forth between "Pending" and "Regression" in the wiki
whenever we pass a new test (or fail an old one), and we get a nice
log of our progress.
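A typical update cycle might then look something like this (the commit
message is of course just an example):
#+BEGIN_SRC sh
t/update-latest                 # regenerate the two .results files
git diff t/latest-*.results     # inspect what changed
git add t/latest-*.results
git commit -m "Two more pending tests now pass"
#+END_SRC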
To run just the regression or just the pending tests, use
=t/regression-tests= or =t/pending-tests=. Pass the =-f= argument to
those scripts to see only failed regression tests or passed pending
tests, e.g.
#+BEGIN_SRC sh
t/regression-tests -f
#+END_SRC
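Similarly, to see only the pending tests that now pass:
#+BEGIN_SRC sh
t/pending-tests -f
#+END_SRC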