Dear all.
I have got a workflow consisting of three jobs:
1. a Generator job (local script),
2. a BOINC job,
3. a Collector job (local script).
My workflow inputs are relatively big: 20 MB per work unit (WU). After the execution, I only want to retrieve one specific output file of the Collector job. Unfortunately, this takes a long time when there are many WUs.
For example, with 100 BOINC tasks, more than 2 GB has to be copied if I use the getFiletoPortalServer() method of ASMService. Once the workflow zip file has been copied over, I extract the information I need and write it to the output stream of my portlet's resource response. On my virtual machine, this takes about 5 minutes.
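The extraction step described above can be sketched as follows. This is only a minimal illustration using the standard java.util.zip classes: the class name ZipEntryExtractor, the method copyEntry, and the entry names in the demo archive are all hypothetical, and the real layout of the workflow zip may differ. The sketch copies one entry from a zip stream straight to an output stream, so the archive does not have to be unpacked on disk first. It does not reduce the amount of data that has to be transferred from storage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryExtractor {

    /**
     * Copies the first non-directory entry whose name ends with nameSuffix
     * from the zip stream to target. Returns true if such an entry was found.
     */
    public static boolean copyEntry(InputStream zipStream, String nameSuffix,
                                    OutputStream target) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().endsWith(nameSuffix)) {
                    byte[] buffer = new byte[8192];
                    int read;
                    // read() returns -1 at the end of the current entry
                    while ((read = zin.read(buffer)) != -1) {
                        target.write(buffer, 0, read);
                    }
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory archive; the entry names are made up.
        ByteArrayOutputStream archive = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(archive)) {
            zout.putNextEntry(new ZipEntry("Generator/outputs/input.dat"));
            zout.write("big input".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("Collector/outputs/result.txt"));
            zout.write("summary".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean found = copyEntry(new ByteArrayInputStream(archive.toByteArray()),
                "Collector/outputs/result.txt", out);
        System.out.println(found + ": " + out.toString("UTF-8"));
    }
}
```

In a portlet, target would be the output stream of the resource response, and zipStream would be whatever stream delivers the workflow zip.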
I have made some investigations and figured out that getFiletoPortalServer() calls a private method, getFileStreamFromStorage(), in the following way:
is = getFileStreamFromStorage(userID, workflowID, DownloadTypeConstants.InstanceAll);
Then it writes the contents of the retrieved input stream (i.e. is) to the user's workflow_outputs directory ([TOMCAT_DIR]/temp/tmp/users/[USER_ID]/workflow_outputs/[WF_NAME]). So basically getFiletoPortalServer() copies everything (the outputs of all jobs plus the first job's input) and then returns the file path to the caller.
ASMService provides another method, getFileStream(), but unfortunately it also passes InstanceAll to getFileStreamFromStorage(), which results in a long execution time. As of writing, a new function, getSingleOutputFileStream, is also available in the trunk version, but it is quite slow as well, for similar reasons (its implementation is similar to that of the former two functions).
As far as I know, the gUSE portal supports downloading outputs of individual jobs: Concrete/workflow + details/select instance + details/select job + details/Download file output. It is much faster than the ASM approach.
It would be nice if ASM could provide a function that worked in a similar fashion (it should download only what the caller requests). This way we could avoid slow download speed.
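To make the request concrete, here is a rough sketch of the shape such a function could take. Everything in it is hypothetical: the interface JobOutputSource, the method openOutput, and the in-memory implementation are not part of ASM and only illustrate the idea of asking for one file of one job instead of the whole instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class SingleOutputSketch {

    /** Hypothetical interface, NOT part of ASM: one file of one job per call. */
    public interface JobOutputSource {
        InputStream openOutput(String userId, String workflowId,
                               String jobName, String fileName) throws IOException;
    }

    /** In-memory stand-in for the storage, used only to make the sketch runnable. */
    public static class InMemoryJobOutputSource implements JobOutputSource {
        private final Map<String, byte[]> files = new HashMap<>();

        public void put(String userId, String workflowId, String jobName,
                        String fileName, byte[] content) {
            files.put(key(userId, workflowId, jobName, fileName), content.clone());
        }

        @Override
        public InputStream openOutput(String userId, String workflowId,
                                      String jobName, String fileName) throws IOException {
            byte[] content = files.get(key(userId, workflowId, jobName, fileName));
            if (content == null) {
                throw new FileNotFoundException(jobName + "/" + fileName);
            }
            // Only the requested file is handed back, nothing else is touched.
            return new ByteArrayInputStream(content);
        }

        private static String key(String userId, String workflowId,
                                  String jobName, String fileName) {
            return String.join("/", userId, workflowId, jobName, fileName);
        }
    }

    /** Reads a stream fully into a UTF-8 string (Java 8 compatible). */
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InMemoryJobOutputSource source = new InMemoryJobOutputSource();
        source.put("user1", "wf1", "Collector", "result.txt",
                "summary".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = source.openOutput("user1", "wf1", "Collector", "result.txt")) {
            System.out.println(readAll(in));
        }
    }
}
```

Returning an InputStream rather than a file path would let the caller stream the file directly into the portlet's resource response.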
Many thanks in advance!
Best regards,
Attila Sasvari
Dear Attila,
Thanks for describing the problem. The next ASM release (3.4.5) will contain a method that returns an InputStream coming from the servlet interface of the Storage component (so it does not use the slow web-service interface).
Cheers,
akos
Akos Balasko writes:
I will be on holiday until the 20th of August. In urgent cases, please contact my substitute, Dr. Robert Lovas (rlovas@sztaki.hu). In case of SCI-BUS, please contact Zoltan Farkas (zfarkas@sztaki.hu) and/or Eva Feuer (feuer@sztaki.hu).
Regards,
Peter