Sanat 2000.16 - 11th December, 2000
Esa Turtiainen


BASIC USAGE

This is the program sanat that helps you to study
foreign languages.

You can start the program from the command line like this:

	sanat dictionary.dict

The simplest way to use the program is the
question - answer mode.  The program writes a word in
one language and you must answer with the corresponding
word in the other language.  The answer must be exact
(but see the option --noexact later).

In the default mode the program wants you to know
each word two times in both directions.

At the end of the word there may be extra information
in [brackets].  This is useful information but it
is not required in the exact answer; actually you must
not give it.
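
The bracket rule can be illustrated with a small sketch.  It is
written in Python rather than in sanat's own Perl and it is not
taken from the program.  It assumes that the extra information
is a single [...] group at the very end of the word and that
surrounding whitespace is ignored; the function names are
hypothetical.

```python
import re

# Assumption: the extra information is one trailing "[...]" group.
_BRACKET_INFO = re.compile(r"\s*\[[^\]]*\]\s*$")


def expected_answer(word):
    """Return the part of a dictionary word that must be typed exactly."""
    return _BRACKET_INFO.sub("", word).strip()


def is_exact(answer, word):
    """Compare the typed answer with the word, ignoring the bracket part."""
    return answer.strip() == expected_answer(word)
```

With this reading, typing the bracket part as well counts as a
wrong answer.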

At the beginning of the question line there are
two numbers: how many words you have known and how
many there are in total.  The total may increase if
you have failed to know a word many times in sequence.
This is explained thoroughly in the ALGORITHMS section.

The query order seems to be random but actually there
is a fairly complex algorithm behind it.  It takes into
account the current level of knowledge of each word.
Please refer to the ALGORITHMS section for more details.

Some brief help is available with the command:

	sanat --help


INTERACTIVE COMMANDS

There are some commands that you can use instead of
an answer.  They are SAVE, LOAD, HELP, EXIT and QUIT.
They are written in uppercase to avoid collisions
with right answers.

SAVE stores the current session to the default save
file. Note that there is just one default save file
and all the previous saved sessions are lost!

LOAD loads the session from the default save file.
The current session is lost!

HELP gives a brief help text.

BUG: The current help text is too complex.

EXIT saves the current session and exits the program.

QUIT exits the program without saving.  It asks
for confirmation.

If the program is invoked from the network, most
of these commands are disabled.
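
The way uppercase commands are kept apart from answers can be
sketched as follows.  This is an illustration in Python, not
sanat's own code; it assumes that a command must be the whole
input line, and the function name is hypothetical.

```python
# The five interactive commands listed above.
COMMANDS = {"SAVE", "LOAD", "HELP", "EXIT", "QUIT"}


def classify(line):
    """Return ("command", name) or ("answer", text) for one input line."""
    text = line.strip()
    if text in COMMANDS:          # case-sensitive on purpose
        return ("command", text)
    return ("answer", text)
```

Because the comparison is case-sensitive, a lowercase "save" is
still treated as an answer.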


GIVING OPTIONS FROM THE COMMAND LINE

You can set options to modify the query process and dictionary
selection.  They are normally given on the command line, where
the option name is preceded by two hyphens:

	sanat --round_max=1 dictionary.dict

Usage of '=' is optional and you can shorten the option name as
long as it stays unambiguous.  Thus, you can write:

	sanat --round_m 1 dictionary.dict

as well.  (There is another option --round_exp.)
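
In sanat the abbreviation handling comes from Getopt::Long.  The
idea can be sketched in Python as below.  The function name and
the shortened option list are hypothetical; only the option names
themselves come from this README.

```python
def resolve_option(prefix, known):
    """Expand an abbreviated option name to the one option it matches."""
    if prefix in known:                      # an exact name always wins
        return prefix
    matches = [name for name in known if name.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("unknown option: " + prefix)
    raise ValueError("ambiguous option %s: %s"
                     % (prefix, ", ".join(sorted(matches))))


# A few option names mentioned in this README.
OPTIONS = ["round_max", "round_exp", "retry_min", "retry_norm", "retry_exp"]
```

Here "round_m" resolves to round_max, while "round" is rejected
because it also matches round_exp.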

Unsetting an option is expressed by adding 'no' to the
beginning of the option name:

	sanat --nobothdir dictionary.dict

vs. setting the option with:

	sanat --bothdir dictionary.dict

Some of the options can be set from the dictionary file as well.
Please refer to the later discussion of the structure of the
dictionary file.


PLAYING WITH SAVE FILES

If the session is saved to the default save file, the program
can be invoked with

	sanat --load

which is the same as starting the program with some dictionary
file and issuing the LOAD command.

The default save file can be changed with the option --save_file:

	sanat --save_file file.sav dictionary.dict

or

	sanat --save_file file.sav --load


QUERY OPTIONS

Query options modify the behaviour of queries.

Normally we require that every word is known two times.
This can be changed with the option --round_max:

	sanat --round_max=5 dictionary.dict

requires every word to be known five times.

Normally we want exact answers.  The option --noexact allows
answers that you evaluate yourself.

BUG: In the current version nonexact mode is disabled
because it is untested.


OPTIONS LIMITING DICTIONARY

By default, words are asked in both directions.  The option
--nobothdir asks words only in one direction.

TODO: Currently it is possible to query in only one of the
two directions.  Sometimes we'd like to query the other way.

In some cases, a dictionary file can contain hundreds of words.
It is handy to split it into smaller pieces.  It is more
efficient to study one complete session at a time.

You can limit words with the command line options --word_first
and --word_last.  For example,

	sanat --word_first 10 --word_last 19

studies words 10..19 (inclusive) in this session.

	sanat --word_first 10 --word_last 0

studies from word 10 to the end.

Word numbering does not count the extra lines in the
dictionary file: comments, user comments, options or empty
lines.  Therefore, the word number is *not* the same as
the line number in the dictionary file.
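
The range selection can be sketched in Python as below.  The
sketch assumes that words are numbered from 1 and that the list
already excludes comments, options and empty lines; the function
name is hypothetical.

```python
def select_words(words, word_first=1, word_last=0):
    """Return words word_first..word_last (inclusive, counted from 1).

    word_last=0 means "to the end", as in the example above.
    Assumption: numbering starts from 1.
    """
    start = max(word_first, 1) - 1
    end = len(words) if word_last == 0 else word_last
    return words[start:end]
```

With a list of 30 words, first=10 and last=19 give ten words and
first=10 and last=0 give the remaining 21.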

TODO: There should be options to select a set of random words
from the dictionary.


HANDLING DUPLICATES IN THE DICTIONARY

It is a bit tricky to handle duplicates in the dictionary.
The user would have to answer either with all of them or
with one of them.  Neither solution really helps us to
learn words.

The current approach is to take only one of the
alternatives from the dictionary file (the last one).

With the option --noallow_duplicates (-ALLOW_DUPLICATES
in the dictionary file) you can declare that it is an
error to have duplicates in the dictionary.  In
that case, you can edit the dictionary file so that
it has the set of alternatives you consider best.
You can comment out conflicting lines with # and
keep -ALLOW_DUPLICATES at the beginning of the file.

TODO: In the current implementation, only the *last*
occurrence of the synonym is taken into the query set.
A more developed algorithm would select a random
one for the query set.
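
The "last one wins" rule can be sketched in Python as below.
The sketch assumes that a duplicate means the same question word
appearing on several lines; it does not model switching the
option on and off for a region of the file.  The function name
is hypothetical.

```python
def build_query_set(pairs, allow_duplicates=True):
    """Map each question word to one answer; the last line wins."""
    chosen = {}
    for question, answer in pairs:
        if question in chosen and not allow_duplicates:
            raise ValueError("duplicate entry for %r" % question)
        chosen[question] = answer      # later lines overwrite earlier ones
    return chosen
```

For the pairs (big, iso) and (big, suuri) only "suuri" stays in
the query set; with allow_duplicates=False the second line is
reported as an error instead.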


DICTIONARY FILE FORMAT

Conventionally, dictionaries are in .dict files.  They
are in a simple text format:

	word1/comment1:word2/comment2

(An entry must start at the beginning of the line.)

Additionally, in the file there can be comments, user
comments and option settings.  Comments are lines
that start with #.  For example:

	#
	# This is a comment
	#

Empty lines are allowed as comments as well.

A user comment is echoed to the user when the program is
started.  The user comment starts with an exclamation mark
"!".  For example:

	!
	! This text (including empty lines) is written
	! for the user in the beginning of the program
	!

Some options can be set in the dictionary file.  Setting an
option is expressed with + at the beginning of the line
and unsetting an option is expressed with -.  For example:

	-BOTHDIR
	-EXACT
	-ALLOW_DUPLICATES
	...
	+ALLOW_DUPLICATES

You can mark a region where you do not allow duplicates
by enclosing it in an unset .. set pair as in the previous
example.  With most options, however, it does not matter
if the option is set many times in the dictionary; only
the last setting is meaningful.

Option names are the same in the dictionary file and on the
command line, but on the command line they are lowercase
and in the dictionary file they are uppercase.

BUG: In the current version it is not possible to
override dictionary file options with command line
options.  Dictionary file options have higher priority.
In some cases, this is most unsatisfactory.
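
The line types described in this section can be told apart with
a few prefix tests.  The Python sketch below is an illustration
of the format, not a port of sanat's parser.  It keeps only the
last setting of each option and makes no attempt to handle words
that themselves begin with +, -, ! or #.  The function names are
hypothetical.

```python
def split_side(side):
    """Split "word/comment" into (word, comment); comment may be empty."""
    word, _, comment = side.partition("/")
    return word.strip(), comment.strip()


def parse_dictionary(text):
    """Return (entries, user_comments, options) for a dictionary text."""
    entries, user_comments, options = [], [], {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                                  # comment or empty line
        if line.startswith("!"):
            user_comments.append(line[1:].strip())    # echoed at startup
        elif line[0] in "+-":
            options[line[1:].strip()] = (line[0] == "+")
        else:
            left, _, right = line.partition(":")
            entries.append((split_side(left), split_side(right)))
    return entries, user_comments, options
```

Only the entries list takes part in word numbering, which is why
word numbers and line numbers differ.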


INSTALLATION

The program sanat is written using Perl 5.005.

The only library package that sanat uses is package
Getopt::Long that should be a standard part of
the Perl system.

The current system is prepared for three kinds of
environment: Unix, Windows and invocation as a network
service through Unix inetd.

There is a separate README file describing installation
of the inetd network server.


WINDOWS INSTALLATION

First, you need the Perl system.  It is a free program
available through links in http://www.perl.com.  Look
for downloads and Win32.  The current distribution is
called ActivePerl and (unfortunately) it is a very complete
system that comes as a compressed file of almost 10MB.
Actually, we are using a tiny part of the system.

When Perl is installed and running, you should copy the
sanat program to (for example)

	C:\Program files\sanat\sanat

Then run the program:

	pl2bat sanat

to translate it to sanat.bat.  This makes startup easier.

The dictionary is (for example)

	C:\Program files\sanat\my.dic

Dictionaries with 8-bit character sets must usually be
converted when they are transported from Unix to Windows.

Then create a shortcut to this file and modify the command
line of this shortcut (from Properties) (in one line):

	C:\Program files\sanat\sanat
	  C:\Program files\sanat\my.dic

If you need to save many sessions, create many shortcuts
with a separate save file for each (again, in one line):

	C:\Program files\sanat\sanat
	  --save_file C:\Program files\sanat\my.sav
	  C:\Program files\sanat\my.dic

If you have saved the session once with SAVE command,
you can start the session with the shortcut:

	C:\Program files\sanat\sanat --load


ALGORITHMS

Selecting words randomly for queries has proven to
be a bad algorithm.  A word that is not known is often
queried either too soon or too late.  In the first case,
you have not learnt the word but you can still answer.
In the latter case, you have forgotten the word.  The
worst thing is that all the most difficult words
pile up at the end.

Selection of words in the sanat program is random but the
randomness is heavily guided by three priorities:
knowledge priority, round priority and retry priority.

Knowledge priority tells how well the word is known.
If the word is known well, it will (probably) be asked
quite late for the second time.  If the word is known
badly, it will be asked again quite soon.

Round priority tries to ensure that no word is left behind.
If a word is left behind, it becomes more likely to be asked.

Retry priority prevents a word from being asked again too
soon.  It keeps the word from being asked for some time and
keeps it somewhat less likely for some time after that.

The one that you most likely want to modify is the
retry priority.  Within --retry_min words (default 3)
we do not query the same word again (unless all words
have been queried within the same limit).  After --retry_norm
words (default 6) the likelihood returns to normal.
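
The two limits can be pictured with the Python sketch below.
Only the end points come from this README: priority 0 inside
retry_min and 1 from retry_norm onwards.  The straight ramp in
between is an assumption made for illustration and the sketch
leaves out --retry_exp; the real function can be printed with
--test_retryprio.  The function name is hypothetical.

```python
def retry_priority(words_since_asked, retry_min=3, retry_norm=6):
    """Return a value 0..1 from the number of words asked since this one."""
    if words_since_asked < retry_min:
        return 0.0                    # too soon, never selected
    if words_since_asked >= retry_norm:
        return 1.0                    # back to normal
    # Assumption: a linear ramp between the two limits.
    return (words_since_asked - retry_min) / float(retry_norm - retry_min)
```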

All the priorities give values 0..1 and they are multiplied
(like probabilities) to give the resulting priority.
Some priorities can be exponentiated: exponent that is >1
makes 1 more likely, exponent that is <1 makes 0 more
likely.  Word selection is weighted with the resulting
priority so that 0.8 is selected four times as often
as 0.2.
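
Multiplying the priorities and drawing a word in proportion to
the result can be sketched in Python as below.  This is an
illustration of weighted selection in general, not sanat's own
code.  The function names are hypothetical, and the fallback to
a uniform draw when every weight is zero is an assumption.

```python
import random


def combined_priority(knowledge, round_prio, retry):
    """Multiply the three priorities, each in the range 0..1."""
    return knowledge * round_prio * retry


def pick_word(priorities, rng=random):
    """Pick one word; priorities maps word -> resulting priority."""
    words = list(priorities)
    weights = [priorities[w] for w in words]
    if sum(weights) <= 0:
        # Assumption: if no word is eligible, fall back to a uniform draw.
        return rng.choice(words)
    return rng.choices(words, weights=weights, k=1)[0]
```

Over many draws, a word with priority 0.8 comes up about four
times as often as a word with priority 0.2.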

The exponent of round priority can be set with the option
--round_exp.  This option must be an odd integer because it is
used with negative numbers.  The exponent of retry priority is
normally 4 but it can be changed with the option --retry_exp.
Knowledge priority uses exponentiation as well, but it is built
into the function.

Knowledge priority is initially 0.5 (--know_init).  If the word
is known at the first try, the priority is set to 0.05 (--know_min).
If the word is not known, we raise the priority to the power of
0.5 (--know_noexp).  If we know the word, we return to the initial
priority 0.5.  If the priority rises to 0.9 (--know_retry) we
cancel one time when the word was already known.
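
The effect of repeated wrong answers can be worked out with a
few lines of Python.  The sketch only applies the rule "raise
the priority to the power of --know_noexp" with the default
values; how sanat combines it with the other rules is not
modelled.  The function name is hypothetical.

```python
def after_wrong_answer(priority, know_noexp=0.5):
    """Raise a priority in 0..1 to the power know_noexp (moves it towards 1)."""
    return priority ** know_noexp


priority = 0.5                        # --know_init
history = []
for _ in range(3):                    # three wrong answers in sequence
    priority = after_wrong_answer(priority)
    history.append(round(priority, 3))
# history is now [0.707, 0.841, 0.917]
```

Starting from 0.5, the third wrong answer in sequence takes the
priority past 0.9 (--know_retry).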


TESTING PRIORITIES

You can print out the round priority and retry priority functions
with the current parameters using the options --test_roundprio and
--test_retryprio.  With --test_round you get output of all
the priorities after every question.

With --test_fixrand=1 you can set the random seed.  With the same
random seed, you always get the same results.

You can test evenness of the selection process with
--test_histogram.  It selects 1000 words for query and the
result should be about the same for every word.
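
The idea of a seeded histogram test can be sketched in Python as
below.  The sketch draws uniformly, so it only shows what an even
result looks like; sanat runs its real selection algorithm.  The
seed of Python's generator stands in for --test_fixrand, and the
function name is hypothetical.

```python
import random
from collections import Counter


def selection_histogram(words, draws=1000, seed=1):
    """Count how often each word is picked in a seeded series of draws."""
    rng = random.Random(seed)         # same seed, same sequence of picks
    return Counter(rng.choice(words) for _ in range(draws))
```

With ten words, each count should be close to 100, and two runs
with the same seed give identical counts.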


OTHER TESTING

The option --verbose gives more information on execution;
especially, it gives information on how the dictionary file
is interpreted.

Finishing execution can be tested with options --test_autoyes
(should finish), --test_autono (should not finish) and
--test_autorand (should finish).  They automatically answer
right, wrong and randomly (respectively) to every question.

The option --test_input echoes the user input in hexadecimal
form.  This is useful in finding errors in 8-bit characters
and with different line endings (CR vs. LF vs. CRLF).
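
What a hexadecimal echo reveals can be shown with a short Python
sketch.  The output layout here is an assumption and may differ
from what sanat prints; the function name is hypothetical.

```python
def hex_echo(raw):
    """Return the bytes of an input line as two-digit hex values."""
    return " ".join("%02x" % byte for byte in bytearray(raw))


# "sää" typed in Latin-1 on a system that ends lines with CRLF
line = u"s\u00e4\u00e4\r\n".encode("latin-1")
echoed = hex_echo(line)               # "73 e4 e4 0d 0a"
```

The e4 bytes show the 8-bit encoding of the letter and the
trailing 0d 0a shows a CRLF line ending.
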
Source: README, updated 2001-01-24