.TH lt-proc 1 2006-03-23 "" ""
.SH NAME
lt-proc \- lexical processor from the lexical processing modules and
tools (\fBlttoolbox\fR)
.PP
This tool is part of the Apertium machine translation
architecture: \fBhttp://www.apertium.org\fR.
.SH SYNOPSIS
.B lt-proc
[
.B \-a \fR|
.B \-b \fR|
.B \-o \fR|
.B \-c \fR|
.B \-d \fR|
.B \-e \fR|
.B \-g \fR|
.B \-n \fR|
.B \-p \fR|
.B \-s \fR|
.B \-t \fR|
.B \-v \fR|
.B \-h
.B \-z
.B \-w
] [
.B \-i \fR icx_file
] fst_file [input_file [output_file]]
.PP
.B lt-proc
[
.B \-\-analysis \fR|
.B \-\-bilingual \fR|
.B \-\-surf\-bilingual \fR|
.B \-\-case\-sensitive \fR|
.B \-\-debugged\-gen \fR|
.B \-\-decompose\-nouns \fR|
.B \-\-generation \fR|
.B \-\-non\-marked\-gen \fR|
.B \-\-tagged\-gen \fR|
.B \-\-post\-generation \fR|
.B \-\-sao \fR|
.B \-\-transliteration \fR|
.B \-\-null\-flush
.B \-\-dictionary\-case
.B \-\-decompose\-compounds \fR|
.B \-\-version \fR|
.B \-\-help
] [
.B \-\-ignored\-chars \fR icx_file
] fst_file [input_file [output_file]]
.SH DESCRIPTION
.B lt-proc
is the application responsible for providing the four lexical
processing functionalities:
.RS
\(bu \fImorphological analyser\fR (option \fB\-a\fR)
.PP
\(bu \fIlexical transfer\fR (option \fB\-b\fR)
.PP
\(bu \fImorphological generator\fR (option \fB\-g\fR)
.PP
\(bu \fIpost-generator\fR (option \fB\-p\fR)
.RE
.PP
It accomplishes these tasks by reading binary files containing a
compact and efficient representation of dictionaries (a class of
finite-state transducers called augmented letter transducers). These
files are generated by \fBlt\-comp(1)\fR.
.PP
Note that some characters
(`\fB[\fR', `\fB]\fR', `\fB$\fR', `\fB^\fR', `\fB/\fR', `\fB+\fR') are
\fIspecial\fR characters used for formatting and encapsulation. They must be
escaped if they are to be used literally. For
instance, text enclosed in `\fB[\fR'...`\fB]\fR' is ignored, and a
\fIlexical unit\fR is delimited as `\fB^\fR'...`\fB$\fR'.
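.PP
As a minimal sketch of the escaping described above, the following Python
function prefixes each of the special characters listed here with an escape
character. The helper name \fBescape\fR is hypothetical, and the use of a
backslash as the escape character is an assumption, since this page does not
name it.
.PP
.nf
```python
# Special characters listed in this manual page.
SPECIAL = "[]$^/+"

# Assumption: the escape character is a backslash. chr(92) is used so that
# no literal backslash has to appear in this listing.
ESC = chr(92)


def escape(text):
    """Return text with every special character (and ESC itself) escaped."""
    out = []
    for ch in text:
        if ch in SPECIAL or ch == ESC:
            out.append(ESC)
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Escape a string before passing it through the pipeline as literal text.
    print(escape("3+4 [note]"))
```
.fi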
.SH OPTIONS
.TP
.B \-a, \-\-analysis
Tokenizes the text in surface forms (lexical units as they appear in
texts) and delivers, for each surface form, one or more lexical forms
consisting of lemma, lexical category and morphological inflection
information. Tokenization is not straightforward due to the existence,
on the one hand, of contractions, and, on the other hand, of
multi-word lexical units. For contractions, the system reads in a
single surface form and delivers the corresponding sequence of lexical
forms. Multi-word surface forms are analysed in a left-to-right,
longest-match fashion. Multi-word surface forms may be invariable
(such as a multi-word preposition or conjunction) or inflected (for
example, in Spanish, \fI"echaban de menos"\fR, \(dqthey missed\(dq, is a
form of the imperfect indicative tense of the verb \fI"echar de
menos"\fR, \(dqto miss\(dq). Limited support for some kinds of
discontinuous multi-word units is also available. Analysis of single-word
surface forms produces output like the following:
\ \fI"cantar"\fR \-> `\fI^cantar/cantar<vblex><inf>$\fR' or
\ \fI"daba"\fR \->
\ `\fI^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$\fR'.
.TP
.B \-b, \-\-bilingual
Performs lexical transfer, attaching any trailing morphological symbols
not specified in the dictionaries. Like analysis mode, it supports
multiple lexical forms in the target language for a given lexical
form in the source language. It typically works on the output of
apertium-pretransfer.
.TP
.B \-o, \-\-surf\-bilingual
Like \fB\-b\fR, but takes input from \fBapertium\-tagger \-p\fR, which
includes surface forms; if the lexical form is not found in the bilingual
dictionary, it outputs the surface form of the word.
.TP
.B \-c, \-\-case\-sensitive
Use the literal case of the incoming characters.
.TP
.B \-d, \-\-debugged\-gen
Morphological generation (like \fB\-g\fR) with additional debugging
information in the output.
.TP
.B \-e, \-\-decompose\-compounds
Try to treat unknown words as compounds, and decompose them.
.TP
.B \-w, \-\-dictionary\-case
Use the case information contained in the lexicon, instead of the surface
case (only applied in analysis mode).
.TP
.B \-g, \-\-generation
Delivers a target-language surface form for each target-language
lexical form, by suitably inflecting it.
.TP
.B \-n, \-\-non\-marked\-gen
Morphological generation (like \fB\-g\fR) but without unknown word
marks (asterisk `*').
.TP
.B \-l, \-\-tagged\-gen
Morphological generation (like \fB\-g\fR) but retaining part-of-speech
tags.
.TP
.B \-p, \-\-post\-generation
Performs orthographical operations such as contractions and
apostrophations. The post-generator is usually \fIdormant\fR (just
copies the input to the output) until a special \fIalarm\fR symbol
contained in some target-language surface forms \fIwakes\fR it up to
perform a particular string transformation if necessary; then it goes
back to sleep.
.TP
.B \-s, \-\-sao
Input processing is in \fIorthoepikon\fR (previously `\fIsao\fR')
annotation system format: \fBhttp://orthoepikon.sf.net\fR.
.TP
.B \-t, \-\-transliteration
Apply a transliteration dictionary.
.TP
.B \-i, \-\-ignored\-chars icx_file
Ignore the characters specified in the file icx_file.
.TP
.B \-z, \-\-null\-flush
Flush output on the null character.
.TP
.B \-v, \-\-version
Display the version number.
.TP
.B \-h, \-\-help
Display this help.
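.SH EXAMPLES
The output of analysis mode (\fB\-a\fR) shown above can be consumed by other
programs. The following Python sketch splits one lexical unit into its
surface form and its lexical forms (lemma plus tags). The function name
\fBparse_unit\fR is hypothetical, and the sketch assumes a single unit with
no escaped special characters inside it.
.PP
.nf
```python
def parse_unit(unit):
    """Split "^surface/reading/..." + "$" into (surface, [(lemma, tags)])."""
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: " + unit)
    # Drop the ^ and $ delimiters, then separate the surface form from
    # the lexical forms, which are delimited by "/".
    surface, *readings = unit[1:-1].split("/")
    parsed = []
    for reading in readings:
        # Everything before the first "<" is the lemma; the rest are tags.
        lemma, sep, rest = reading.partition("<")
        tags = rest.replace(">", "").split("<") if sep else []
        parsed.append((lemma, tags))
    return surface, parsed


if __name__ == "__main__":
    example = "^daba/dar<vblex><pii><p1><sg>/dar<vblex><pii><p3><sg>$"
    surface, readings = parse_unit(example)
    print(surface)
    for lemma, tags in readings:
        print(lemma, tags)
```
.fi
.PP
A production parser would also need to handle escaped characters and the
text between lexical units, which this sketch deliberately ignores.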
.SH FILES
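.TP
.B fst_file
The compiled dictionary, a binary file generated by \fBlt\-comp\fR(1).
.TP
.B input_file
The text to process.
.TP
.B output_file
The file to which the processed text is written.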
.SH SEE ALSO
.I lt-expand\fR(1),
.I lt-comp\fR(1),
.I apertium-tagger\fR(1),
.I apertium\fR(1).
.SH BUGS
Lots of...lurking in the dark and waiting for you!
.SH AUTHOR
(c) 2005,2006 Universitat d'Alacant / Universidad de Alicante.