     Apache UIMA Asynchronous Scaleout (UIMA-AS) Version 2.3.1 README
     ----------------------------------------------------------------
     
0. Building from the Source Distribution
========================================
     
We use Maven 3.0 for building; download this if needed, 
and set the environment variable MAVEN_OPTS to -Xmx800m -XX:MaxPermSize=256m.

Then do the build by going into the .../uima-as directory, and issuing the command

   mvn clean install
   
This builds everything except the ...source-release.zip file. If you want that,
change the command to 

   mvn clean install -Papache-release
   
Look for the results in these two artifacts: 
   ../uima-as/target/uima-as-[version]-source-release.zip (only if run with -Papache-release) and
   ../uima-as/target/uima-as-[version]-bin.zip (or  ...tar.gz) 

For more details, please see http://uima.apache.org/building-uima.html   
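
As a rough sketch of the steps above on a Unix-like shell (the build_cmd helper is
hypothetical and only prints the Maven command line to run; it is not part of the
distribution):

```shell
# Memory settings for Maven, as recommended above.
export MAVEN_OPTS="-Xmx800m -XX:MaxPermSize=256m"

# build_cmd (hypothetical helper) prints the Maven command for a regular
# build, or for a build that also produces the source-release zip.
build_cmd() {
  if [ "$1" = "release" ]; then
    echo "mvn clean install -Papache-release"
  else
    echo "mvn clean install"
  fi
}

build_cmd            # prints: mvn clean install
build_cmd release    # prints: mvn clean install -Papache-release
```

Run the printed command from the .../uima-as directory.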

1. What's New in 2.3.1
====================== 
  - UIMA-AS repackaged - includes base UIMA
  - New ActiveMQ version - includes ActiveMQ 5.4.1 with its dependencies
  - Each thread uses dedicated Serializer/Deserializer instance, removing a bottleneck 
  - Many bug fixes 
  - Multiple improvements to increase performance
  - Improved error handling and stability
  - Fixed CDE slow startup with an aggregate containing a remote delegate
  - Improves and fixes bugs related to XMI delta serialization   
  - Fixes a bug preventing shutdown of C++ services
  - Improved debugging with optional dumping of JVM stack if process() takes too long
  - Added detection of failed client before processing a CAS
  - Fixed bugs in JMX Monitor integrated with UIMA AS service 
  - Improved performance by increasing prefetch size on reply queue
  - Fixed processParentLast logic
  

1.1 Contents of Apache UIMA-AS binary distribution
--------------------------------------------------

The Apache UIMA-AS binary distribution includes
  - Apache UIMA Java SDK 
  - Apache UIMA Asynchronous Scaleout extensions
  - Saxon
  - Apache ActiveMQ
  - Spring Framework

It includes the base UIMA binary distribution which it depends on.

UIMA-AS components, in addition to those that come with base UIMA, include:

  shell scripts:
  --------------

  bin/startBroker.sh/bat: starts the ActiveMQ broker, which must be running before
      UIMA AS services can be deployed.
  bin/deployAsyncService.sh/bat: deploys an AnalysisEngine as a UIMA-AS
      service.  Takes one or more UIMA-AS Deployment Descriptors as arguments.
  bin/runRemoteAsyncAE.sh/bat: Calls a UIMA-AS service. Takes arguments specifying the
      location of the service, and an optional CollectionReader descriptor file used to
      obtain the CASes to be processed by the service.

  documentation:
  --------------
 
  docs/d/uima_async_scaleout.pdf: UIMA-AS documentation, including the specification
      for the deployment descriptor file syntax.

  examples:
  ---------
 
  examples/deploy/as/...  (Sample Deployment Descriptors)
    Deploy_RoomNumberAnnotator.xml: Deploys Room Number Annotator Primitive AE
    Deploy_MeetingDetectorTAE.xml: Deploys Meeting Detector Aggregate AE with all
        delegates in the same JVM.
    Deploy_MeetingDetectorTAE_Whiteboard.xml: Deploys Meeting Detector Aggregate AE
        using the whiteboard Flow Controller.
    Deploy_MeetingDetectorTAE_RemoteRoomNumber.xml: Deploys Meeting Detector Aggregate AE
        that uses remotely deployed RoomNumberAnnotator.
    Deploy_MeetingDetectorTAE_3MeetingAnnotator.xml: Deploys Meeting Detector Aggregate AE
        with three instances of the MeetingAnnotator component.
    Deploy_MeetingDetectorTAE_Sync_3Instances.xml: Deploys 3 instances of the
        Meeting Detector as a Synchronous Aggregate (meaning the delegate AEs do not each
        get their own input queue).
    Deploy_MeetingAnnotator.xml: Deploys C++ Meeting Annotator. Note: requires
        installation of uimacpp SDK into $UIMA_HOME.

    MeetingFinderAggregate.xml: Aggregate descriptor that uses the same components as the
        CPE examples MeetingFinderCPE* in base UIMA.
    Deploy_MeetingFinder.xml: Deploys MeetingFinderAggregate illustrating scalability and
        error handling similar to the CPM examples; see Section 4 on migration below.

  descriptors:
  ------------
 
  descriptors/as/...  (Other Sample Descriptors for use with UIMA AS)
    MeetingDetectorAsyncAE.xml:  Specifier that can be used to call a UIMA AS
        Service from an existing UIMA application; see Section 3.5 below.


2. Installation and Setup
=========================

2.1 Supported Platforms
-----------------------

UIMA AS requires Java 5 or later.  It has been tested with Sun Java 5 on Windows XP and Linux,
and with IBM Java 6 on Linux.  Other platforms and Java (5+) implementations should work, 
but have not been significantly tested.

2.2 Environment Variables
-------------------------

After you have unpacked the UIMA AS distribution, you must perform the following
environment variable settings (the same as for normal Apache UIMA setup):

    * Set JAVA_HOME to the directory of the JRE installation you would like to use for UIMA.
    * Set UIMA_HOME to the apache-uima-as directory of your unpacked Apache UIMA distribution.
    * Append UIMA_HOME/bin to your PATH.

    Note: The Mac OS X operating system has special procedures for setting up global environment
    variables; see http://developer.apple.com/qa/qa2001/qa1067.html for how to do this.
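
A minimal sketch of these settings for a Unix-like shell follows. Both default paths
are hypothetical placeholders (assumptions, not locations used by the distribution);
replace them with your actual install locations.

```shell
# Hypothetical install locations; adjust both to your system.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"
export UIMA_HOME="${UIMA_HOME:-$HOME/apache-uima-as}"

# Append UIMA_HOME/bin to PATH only if it is not already there.
case ":$PATH:" in
  *":$UIMA_HOME/bin:"*) ;;
  *) export PATH="$PATH:$UIMA_HOME/bin" ;;
esac
```

Placing these lines in a shell startup file keeps them in effect for every new terminal.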

2.3 Running the Setup Script
----------------------------

You must run the script UIMA_HOME/bin/adjustExamplePaths.bat (or .sh).  This updates
paths in the examples based on the actual UIMA_HOME directory path.

2.4 Setting up Eclipse
----------------------

Eclipse users should install the UIMA Eclipse Plugins and UIMA Examples Project using the 
procedure described in Chapter 3 of the Apache UIMA Overview and Setup guide,
which you can find online at http://uima.apache.org; click on Documentation ->
HTML Online Version -> Overview and Setup -> 3. Eclipse IDE setup for UIMA. 

Since UIMA AS requires Java 5, you must be sure to set up your uimaj-examples Eclipse
project to use a version 5 (or later) JRE, and you must set your compiler compliance level to 5.0.  To do
this go to Window->Preferences and navigate to the Java->Compiler page.  Remember to 
run the base Eclipse using Java 5 (or later), as well.


3. Getting Started
==================

3.1 Starting the ActiveMQ Broker
--------------------------------

UIMA AS services require an ActiveMQ broker to be available with which to create/register 
the service request queue. If no broker is available, start a new broker on the same machine 
the services will run on or another machine; this is done by first setting an env parameter
ACTIVEMQ_BASE pointing at a writable directory, or simply by cd'ing to a writable directory,
and running:

   startBroker.sh/bat

The first time it is run, this script creates a new directory $ACTIVEMQ_BASE/amq (or ./amq)
and copies default configuration files there. The configuration files can then be 
customized to modify broker behavior for subsequent startups.

Note: only one broker can be started at a time on the same machine with the same
      configuration file, or on different machines from the same writable directory.

When the broker starts it will print a message such as:

  INFO TransportServerThreadSupport - Listening for connections at: tcp://yourHostname:61616

Note this URL since you will need it to run services and clients.

The tcp listening port must be exposed to any clients or services using the broker.
To connect to a broker running behind a firewall using HTTP tunneling, see section 3.6 below.
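
The broker setup described in this section might look like the following on a
Unix-like shell. The directory name is a hypothetical example, and start_broker is
only an illustrative wrapper around the startBroker.sh script; it assumes
UIMA_HOME/bin is on your PATH.

```shell
# Hypothetical writable location for the broker's working files.
export ACTIVEMQ_BASE="${ACTIVEMQ_BASE:-${TMPDIR:-/tmp}/uima-as-broker}"
mkdir -p "$ACTIVEMQ_BASE"

# Illustrative wrapper: the first run creates $ACTIVEMQ_BASE/amq and copies
# the default configuration files there.
start_broker() {
  startBroker.sh
}

# Call start_broker in its own terminal, then note the tcp:// URL it prints.
```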

3.2 Deploying an Analysis Engine as a UIMA AS Asynchronous Service
------------------------------------------------------------------

a. Create a Deployment Descriptor.
  Examples can be found in the examples/deploy/as directory,
  and the syntax is documented in docs/d/uima_async_scaleout.pdf.
  
  Note: One of the things that the deployment descriptor may contain is a broker placeholder with 
  this syntax: ${defaultBrokerURL}. The placeholder is replaced at runtime with an actual broker
  URL. The value for the placeholder comes from System properties. The brokerURL attribute of <inputQueue ...>
  element is optional. If not present, a default of tcp://localhost:61616 will be used.

  The examples assume the broker is listening on tcp://localhost:61616.

b. Run the command:

     deployAsyncService.sh/cmd [testDD.xml] [-brokerURL url]

  The argument to the command is the deployment descriptor you created in step (a). An optional argument
  -brokerURL specifies a URL of the broker that the service will use to create connections to queues. This
  argument takes effect only if your deployment descriptor does not explicitly name the broker URL in the 
  <inputQueue ...> xml element *or* the brokerURL attribute is set to a placeholder ${defaultBrokerURL}. 
  Omitting brokerURL or using a placeholder is a way to keep your deployment descriptors portable. You don't 
  need to edit your deployment descriptors when switching brokers.
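
  The precedence described above can be sketched as a small shell function.
  resolve_broker_url is a hypothetical helper written only to illustrate the rule; it
  is not part of UIMA AS. Falling back to tcp://localhost:61616 when a placeholder is
  used but no URL is supplied is an assumption of this sketch.

```shell
# resolve_broker_url DD_URL CLI_URL   (hypothetical helper)
#   DD_URL  : brokerURL attribute from the deployment descriptor ("" if absent)
#   CLI_URL : value given with -brokerURL ("" if not given)
resolve_broker_url() {
  dd_url="$1"
  cli_url="$2"
  case "$dd_url" in
    ''|'${defaultBrokerURL}')
      # Attribute omitted or set to the placeholder: -brokerURL applies.
      # The localhost fallback here is an assumption of this sketch.
      echo "${cli_url:-tcp://localhost:61616}"
      ;;
    *)
      # An explicit URL in the descriptor takes precedence.
      echo "$dd_url"
      ;;
  esac
}

resolve_broker_url 'tcp://hostA:61616' 'tcp://hostB:61616'      # prints tcp://hostA:61616
resolve_broker_url '${defaultBrokerURL}' 'tcp://hostB:61616'    # prints tcp://hostB:61616
resolve_broker_url '' ''                                        # prints tcp://localhost:61616
```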

  Note: If you use import by name in your deployment descriptor, UIMA AS searches the CLASSPATH
  as well as directories on UIMA_DATAPATH to resolve the import.
 
  Note: deployAsyncService.sh/cmd scripts launch the UimaBootstrap main program, which loads UIMA jars
  dynamically from the UIMA_HOME/lib, UIMA_HOME/apache-activemq-5.4.1, UIMA_HOME/apache-activemq-5.4.1/lib,
  and UIMA_HOME/apache-activemq-5.4.1/lib/optional directories. If you want to use a different
  version of ActiveMQ, set the ACTIVEMQ_HOME environment variable to the location of the
  ActiveMQ you intend to use. When using a version of ActiveMQ older than 5.4, start
  the broker using a startup script located in the ACTIVEMQ_HOME/bin directory instead of the
  UIMA_HOME/bin/startBroker.[cmd|sh] script provided in this release, because the ActiveMQ xml
  schema changed in the 5.4 branch. The activemq_nojournal.xml provided in this release includes two
  optimizations: schedulerSupport="false" to disable the broker message scheduler, and producerFlowControl="false" 
  to turn off producer flow control.
  
  Also, if you want to deploy your own annotator that is 
  installed in a different directory than UIMA_HOME/lib, set the UIMA_CLASSPATH 
  environment variable to point to one or more Classpath entries;  these entries can either be
  directories of Java class files, Jar files, or directories of Jar files (in which case, all the 
  Jars are added in an arbitrary order). Separate multiple entries using File.pathSeparator.    
  
  Note: Both UIMA AS client and UIMA AS service by default add a time-to-live (TTL) to 
  every request message. This enables expiration of messages that are not consumed. 
  Currently, the UIMA AS client multiplies the Process Timeout value by 10 and uses
  this as TTL. The UIMA AS service sets TTL to the Timeout value multiplied by the number 
  of outstanding requests. To disable TTL, add a system property -DNoTTL. A convenient 
  way to set this parameter is by adding -DNoTTL to the env parameter UIMA_JVM_OPTS 
  before running deployAsyncService and/or runRemoteAsyncAE.  
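
  As a worked illustration of the TTL arithmetic above (client_ttl and service_ttl
  are hypothetical helper names; the results are in whatever unit the timeout itself
  is expressed in, and the sample values are made up):

```shell
# Client TTL: Process Timeout multiplied by 10.
client_ttl() {
  echo $(( $1 * 10 ))
}

# Service TTL: Timeout multiplied by the number of outstanding requests.
service_ttl() {
  echo $(( $1 * $2 ))
}

client_ttl 30000      # prints 300000
service_ttl 30000 4   # prints 120000

# To disable TTL, add -DNoTTL to UIMA_JVM_OPTS (keeping any existing options)
# before running deployAsyncService and/or runRemoteAsyncAE.
export UIMA_JVM_OPTS="${UIMA_JVM_OPTS:+$UIMA_JVM_OPTS }-DNoTTL"
```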
  

3.3 Calling a UIMA AS Asynchronous Service
------------------------------------------

To test a remote UIMA service you can use the script:
  runRemoteAsyncAE.sh/cmd brokerUrl endpoint

  This connects to a remote AE at specified brokerUrl and endpoint (which must match
  the inputQueue endpoint in the remote AE service's deployment descriptor).

  A subset of the optional arguments to runRemoteAsyncAE are:
    -c  Specifies a CollectionReader descriptor.  The client will obtain CASes from the
        CollectionReader and send them to the service for processing.  If this option
        is omitted, one empty CAS will be sent to the service (useful for services
        containing a CAS Multiplier acting as a collection reader).

    -d  Specifies a deployment descriptor. The specified service will be deployed before processing
        begins, and the service will be undeployed after processing completes. 
        Multiple -d entries can be given.

    -o  Specifies an Output Directory.  All CASes received by the client's
        CallbackListener will be serialized to XMI in the specified OutputDir.
        If omitted, no XMI files will be output.
        
  The full set of arguments is printed if you run the command with no arguments.

3.4 Quick Test of an async service
----------------------------------

Start two terminal windows, each with the environment set up as described in section 2.2.

  * In the first terminal window start the broker (as described in section 3.1),
    by running the commands:
      cd some-writable-directory     
      startBroker.sh/bat

  * In the second terminal window, deploy a sample service and send it some CASes:
      cd $UIMA_HOME/examples/deploy/as
      runRemoteAsyncAE.sh/cmd tcp://localhost:61616 MeetingDetectorTaeQueue \
         -d Deploy_MeetingDetectorTAE.xml \
         -c $UIMA_HOME/examples/descriptors/collection_reader/FileSystemCollectionReader.xml

    If you get an UnsupportedClassVersionError, Java 5 is probably not being used.
    If the driver fails to find the input data, adjustExamplePaths was probably not run.

3.5 Calling a UIMA AS Asynchronous Service from an Existing UIMA Application
----------------------------------------------------------------------------

You can also call a UIMA AS Service from the DocumentAnalyzer or any other UIMA
application using a new JMS client Service Descriptor
(see section 1.7 in the UIMA Asynchronous Scaleout documentation). 
However, note that this is a synchronous interface, that is, it will process only 
one CAS at a time, so it will not take advantage of the scalability that UIMA AS 
provides.  To process more than one CAS at a time, you can use the Asynchronous 
UIMA AS Client as described in section 3.3, or write your own application using the
UIMA AS Client APIs.

An example JMS client service descriptor is provided in

  examples/descriptors/as/MeetingDetectorAsyncAE.xml

The JMS service makes use of the customResourceSpecifier capability in Apache UIMA.
For more information on the customResourceSpecifier see the "Custom Resource Specifiers"
section in the Apache UIMA Reference manual.

3.6 Firewalls between clients and services
------------------------------------------

A service running behind a firewall can be accessed as long as its input queue
is on a broker that is accessible. For example, the service can register with a
public broker running outside the firewall. Alternatively the broker may be configured 
to tunnel over HTTP. For details see http://activemq.apache.org/configuring-transports.html

Note: The ActiveMQ 5.4.1 jars distributed with this UIMA AS release include bug fixes described in 
      http://issues.apache.org/activemq/browse/AMQ-1308. Previous releases of UIMA AS included 
      custom ActiveMQ 4.1.1 jars (which we have patched) to remedy the problems. When using 
      the ActiveMQ HTTP connector, make sure that both the client and the service use either
      ActiveMQ 4.1.1 or ActiveMQ 5.4.1+ jars. 
       
     
3.7 Monitoring a broker and its queues
--------------------------------------

When the broker starts it will print a message such as:
  INFO  ManagementContext  - JMX consoles can connect to service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi

Connect a JMX console to this service with:
         $JAVA_HOME/bin/jconsole service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi
         
(Note: jconsole is available in Java SDK (not JRE) distributions from Sun.)
If your console is not on the same machine as the broker, replace localhost by
the name of the broker's machine.  The default ports for the broker (61616) and 
for the JMX server (1099) can be overridden in the broker configuration file 
created when the broker is first started. For more details see 
http://activemq.apache.org/jmx.html
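
The JMX service URL follows a fixed pattern, so it can be assembled from a host name
and a port. A small sketch (jmx_url is a hypothetical helper, and "brokerhost" is a
made-up machine name):

```shell
# jmx_url [HOST] [PORT] prints the JMX service URL; the defaults match the
# broker defaults described above (localhost, port 1099).
jmx_url() {
  echo "service:jmx:rmi:///jndi/rmi://${1:-localhost}:${2:-1099}/jmxrmi"
}

jmx_url                    # prints service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi
jmx_url brokerhost 1099    # prints service:jmx:rmi:///jndi/rmi://brokerhost:1099/jmxrmi

# Example use:
#   "$JAVA_HOME/bin/jconsole" "$(jmx_url brokerhost)"
```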

3.8 Monitoring UIMA AS service
------------------------------

UIMA AS service monitoring is available via JMX and jConsole. To enable this, please set the following 
before starting a service:

set UIMA_JVM_OPTS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=<uniquePortNumber, e.g. 8009> 
                  -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

Connect a jConsole to this JMX service as described in 3.7 above using the appropriate port, e.g.

         $JAVA_HOME/bin/jconsole service:jmx:rmi:///jndi/rmi://localhost:8009/jmxrmi

Under the MBeans tab, expand org.apache.uima in the left panel to view UIMA components enabled for JMX monitoring.
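
The "set" line above uses Windows syntax. On a Unix-like shell, an equivalent sketch
would be the following; the JMX_PORT variable is introduced here only for
illustration, and 8009 is the example port used above.

```shell
# Each service on the same machine needs its own JMX port.
JMX_PORT=8009

export UIMA_JVM_OPTS="-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=${JMX_PORT} \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false"
```

Set this in the terminal that starts the service, before running deployAsyncService.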

3.9 Stopping UIMA AS service
----------------------------

A service can be stopped from a command line or remotely using jConsole and JMX. When the service is
launched it displays a prompt on stdout:

    Enter 'q' to quiesce and stop the service or 's' to stop it now: 

It reads stdin expecting either 'q' or 's', ignoring all other characters. When 'q' is typed, the
service will quiesce and stop. As part of this, the service closes its input queue and waits until all CASes 
still "in-play" are finished. When the input CAS is returned the service stops. When 's' is typed the service
closes its input queue and immediately releases all CASes being processed and stops.

To stop a UIMA AS process remotely, or if the process runs in the background, use jConsole and JMX.
Launch jConsole using the approach described in 3.8 above. Once the connection is created, open the
following in the left pane:

org.apache.uima
    ee.jms.service
        <Your Annotator Name> Uima EE Service
              Controller
                   Operations
                           
Here you will find two buttons labeled:
          CompleteProcessingAndStop
          StopNow

CompleteProcessingAndStop will initiate quiesce while StopNow will initiate a hard stop.

4. Migration from CPM to UIMA-AS
================================

Migrating a collection processing engine from the CPM to UIMA-AS is straightforward.

First, migrate the CPE descriptor to a standard UIMA aggregate descriptor:  
create a UIMA aggregate that includes all the components specified in the CPE descriptor. 
Transfer any parameter overrides in the CPE descriptor to the aggregate descriptor.
Note that the aggregate descriptor must set <multipleDeploymentAllowed> to false 
to be consistent with collection reader and CAS consumer delegates.

Second, test this aggregate descriptor by instantiating the aggregate and sending it a single CAS.
The contents of the CAS are not important; its purpose is to start the collection reader delegate
which will then create the actual CASes to be processed by the other aggregate components.
The CAS Visual Debugger, CVD, is a useful tool for doing this test.

Next, create a UIMA-AS deployment descriptor that specifies desired scaleout and error handling.
Vinci services are still supported, although it is recommended to replace them with UIMA-AS
services to enable more efficient load balancing and greater scaleout capability.

An example of this kind of migration is embodied by the sample descriptors:
  Original:  
     $UIMA_HOME/examples/descriptors/collection_processing_engine/MeetingFinderCPE_Integrated.xml
  Migrated:
     $UIMA_HOME/examples/deploy/as/MeetingFinderAggregate.xml
     $UIMA_HOME/examples/deploy/as/Deploy_MeetingFinder.xml


5. Known problems/limitations with Release 2.3.1
================================================

  1. No automatic refresh for broken connections with temp reply queues.
  
  2. When connecting to an AMQ broker behind a firewall, avoid using the maxInactivityDuration=0 
     decoration on the brokerURL (see: http://activemq.apache.org/configuring-wire-formats.html)
     as it turns off AMQ 'keep alive' messaging. Without these, a firewall may assume a connection
     has become stale and close its ports. 

  3. UIMA AS does not (yet) support PEAR specifiers. The deployment descriptor should not point to a PEAR file. 
     Instead, PEARs must be unzipped and the classpath adjusted to point to required resources.

  4. To monitor multiple UIMA AS services on the same machine, each must be assigned a unique 
     JMX port (see section 3.8).

  5. Handling of timeouts on CAS Multiplier delegates is not working.
  
For up-to-date information on UIMA-AS issues, see our issue tracker:
http://issues.apache.org/jira/secure/BrowseProject.jspa?id=12310570

Crypto Notice
-------------

   This distribution includes cryptographic software.  The country in 
   which you currently reside may have restrictions on the import, 
   possession, use, and/or re-export to another country, of 
   encryption software.  BEFORE using any encryption software, please 
   check your country's laws, regulations and policies concerning the
   import, possession, or use, and re-export of encryption software, to 
   see if this is permitted.  See <http://www.wassenaar.org/> for more
   information.

   The U.S. Government Department of Commerce, Bureau of Industry and
   Security (BIS), has classified this software as Export Commodity 
   Control Number (ECCN) 5D002.C.1, which includes information security
   software using or performing cryptographic functions with asymmetric
   algorithms.  The form and manner of this Apache Software Foundation
   distribution makes it eligible for export under the License Exception
   ENC Technology Software Unrestricted (TSU) exception (see the BIS 
   Export Administration Regulations, Section 740.13) for both object 
   code and source code.

   The following provides more details on the included cryptographic
   software:
   
   This distribution includes portions of Apache ActiveMQ, which, in
   turn, is classified as being controlled under ECCN 5D002.