Revision 3 as of 2011-01-30 15:14:19
Editor: Hajo Krabbenhöft

Notes on the discussion about external tool invocation at the Debian Med meeting in January 2011.

External tool invocation discussion

Ways of calling tools are described by "use case descriptions". The use case descriptions can be held in a registry. For example,

http://usecase.taverna.org.uk/sharedRepository/index.php

The KnowARC project developed a plugin for Taverna that can read the use case descriptions and allows you to include calls to a tool in a workflow.

You need to know how/where to call the tool - the "invocation mechanism". There are currently three options for the invocation mechanism:

  • local
  • ssh to specific machines (uses Taverna's credential manager to keep the login information confidential)
  • on a KnowARC grid

In the current plugin, the setting for how to call the tool is shared by all external tool services in a workflow, e.g. all the tools are run locally.

Planned improvement

Taverna will manage a set of invocation environments, each of which has a name and is identified by a UUID, for example "fred" and "62A81F2F-4C3D-4C0C-ACF1-681327130328". (The UUID may later be replaced by a URL.)

An external tool service in a workflow will state the invocation environment that the tool will be run in.

An invocation environment specifies the settings for the various invocation mechanisms and also which mechanism is currently to be used, e.g. ssh is configured to go to phoebus.cs.man.ac.uk, but the environment is currently set to use local. So you can have some services using environment "fred", some "bob" and some "jim".

The invocation environment manager allows users to edit the settings and also to change which mechanism to use. This makes it easy to change where a set of tools will be run, e.g. to change all services using environment "bob" to run locally.
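The planned model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `InvocationEnvironment`, `ExternalToolService` and `EnvironmentManager`, the tool names, and the shape of the settings are all hypothetical and are not taken from the plugin. The point is that services refer to an environment by its identifier, so one change in the manager redirects every service that uses it.

```python
import uuid
from dataclasses import dataclass, field

# The three invocation mechanisms named in these notes.
MECHANISMS = ("local", "ssh", "knowarc")


@dataclass
class InvocationEnvironment:
    """Hypothetical named environment; not the plugin's actual class."""
    name: str
    settings: dict = field(default_factory=dict)  # per-mechanism settings
    active: str = "local"                         # mechanism currently in use
    env_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class ExternalToolService:
    """Hypothetical service: it names an environment, not a mechanism."""
    tool: str
    environment_id: str


class EnvironmentManager:
    """Hypothetical manager holding all environments by identifier."""

    def __init__(self):
        self._by_id = {}

    def add(self, env):
        self._by_id[env.env_id] = env
        return env

    def set_mechanism(self, env_id, mechanism):
        if mechanism not in MECHANISMS:
            raise ValueError(f"unknown mechanism: {mechanism}")
        self._by_id[env_id].active = mechanism

    def mechanism_for(self, service):
        return self._by_id[service.environment_id].active


manager = EnvironmentManager()
bob = manager.add(InvocationEnvironment(
    name="bob",
    settings={"ssh": {"host": "phoebus.cs.man.ac.uk"}},
    active="ssh",
))
services = [
    ExternalToolService("toolA", bob.env_id),  # tool names are placeholders
    ExternalToolService("toolB", bob.env_id),
]

# One edit in the manager moves every service that uses "bob" to local execution.
manager.set_mechanism(bob.env_id, "local")
print([manager.mechanism_for(s) for s in services])
```

The ssh settings stay stored in the environment after the switch, so changing back to ssh later needs no reconfiguration.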

There is also a proposal to support the idea of test and production invocation environments. So, "fred" can be set to run services locally during test and on a grid during production. The choice of whether to run a workflow in test or production mode will be made when the workflow is run.
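As a rough sketch of the test/production proposal, assuming the run mode is passed in as a plain string when the workflow is started (the class name `ModeAwareEnvironment` and the mode strings are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeAwareEnvironment:
    """Hypothetical environment with one mechanism per run mode."""
    name: str
    test_mechanism: str
    production_mechanism: str

    def mechanism(self, run_mode):
        # The mode is chosen when the workflow is run, not stored in the workflow.
        if run_mode == "test":
            return self.test_mechanism
        if run_mode == "production":
            return self.production_mechanism
        raise ValueError(f"unknown run mode: {run_mode}")


fred = ModeAwareEnvironment("fred", test_mechanism="local",
                            production_mechanism="knowarc")
print(fred.mechanism("test"), fred.mechanism("production"))
```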

It is not clear how the choice of mode will be shown to the user.

A workflow run may vary the data that it uses according to the run mode e.g. to use different data during test and production. There needs to be explicit support for this in the workflow but it is not clear how.

Ongoing issues

  • ssh equivalent for Windows.

External tool description

There is an XML format for specifying the use case descriptions. We have also looked at:

  • acd as used by EMBOSS and SoapLab
  • Galaxy
  • BioPieces
  • GIMIAS
  • BOINC

The current plan is to be able to translate the EMBOSS acd descriptions and put them in a repository. Future work will be done to ensure that the external tool capability is sufficient.

Additional invocation mechanisms

For Taverna 2.3, we will include local and ssh invocation as part of the release. The KnowARC invocation will probably be made available as a plugin.

Need to look at cloud invocation soon.

Invocation environment checking

There are currently some limited ways of specifying what needs to be on the machine where a tool will be run, and also how to check if the tool can be run there. We need to look at more general ways of specifying this.

Sensible handling of data

We want to minimize the transfer of data. So data stays, where possible, with the tools that will use it, and tools are invoked where the data is.

Need to extend Taverna's data handling mechanisms to deal with this. Need to improve some of the invocation mechanisms to better decide where to run the tools.

The plugin currently logs the job IDs of all executed KnowARC grid jobs. This means we can freely re-use references to data, because the user can clean up all their old grid jobs on demand. The user should be warned that this clean-up will invalidate any references and therefore should not be done while Taverna workflows are running.

In general, every invocation, regardless of its method, should be logged. Also, invocations should generally not clean up any of their data, so that we can use references. Once a reference has been downloaded, we can delete the original file, because its data is now stored inside Taverna.

This still leaves the problem of when to clean up past invocations. Should this happen after running the workflow? (This would make it impossible for the user to view unused output ports.) When closing Taverna? Only when the user requests it? After a timeout?
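The logging and clean-up behaviour described above might look roughly like the following. This is a sketch, not the plugin's code: `InvocationLog` and its methods are hypothetical, remote data is modelled as an in-memory dictionary, and clean-up is shown only in its "on user request" form since the other policies are still open questions.

```python
class InvocationLog:
    """Hypothetical log of invocations; remote data is modelled as a dict."""

    def __init__(self):
        self._remote = {}  # job id -> data still held where the job ran
        self._local = {}   # job id -> data already downloaded into Taverna

    def record(self, job_id, data):
        # Every invocation is logged and its data is kept, so the job id
        # can be handed around as a reference.
        self._remote[job_id] = data
        return job_id

    def is_valid(self, reference):
        return reference in self._remote or reference in self._local

    def download(self, reference):
        if reference in self._local:
            return self._local[reference]
        if reference not in self._remote:
            raise KeyError(f"reference {reference!r} was invalidated by a clean-up")
        # After downloading, the original can be deleted: the data is now local.
        self._local[reference] = self._remote.pop(reference)
        return self._local[reference]

    def clean_up(self, workflows_running=False):
        # Clean-up invalidates references, so refuse while workflows run.
        if workflows_running:
            raise RuntimeError("refusing to clean up while workflows are running")
        removed = sorted(self._remote)
        self._remote.clear()
        return removed


log = InvocationLog()
kept = log.record("job-1", b"alignment")
unused = log.record("job-2", b"unused output")
log.download(kept)        # data moves into Taverna, original is dropped
removed = log.clean_up()  # user-requested clean-up of what is left
print(removed, log.is_valid(kept), log.is_valid(unused))
```

Note that the reference to the never-downloaded output becomes invalid after the clean-up, which is exactly the "unused output ports" concern raised above.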

Windows Use cases

All current use cases, and any new use case without an explicit Windows flag, are to be considered Linux/Mac.

Use case invocation is steered in two locations:

Every use case activity has a tab in its configuration for selecting two invocation environments, one for testing and one for production. The meaning of these invocation environments is specified in Taverna's configuration options and is therefore globally shared by all workflows.

Invocation environments are always used for Linux use cases, which means that on a Windows machine it should be impossible to select local invocation in the Taverna-wide configuration dialog.

A Windows-only use case has its tab for selecting invocation environments grayed out, because all invocation environments are for Linux only. So a Windows-only use case will always use local execution. On a Linux machine, this should produce an error when health checking or running the workflow.

A Linux-only use case will have the aforementioned tab enabled, and it will be used to choose among the Taverna-wide invocation environments. This makes sense on both Windows and Linux machines, because when running on Windows the user will not be able to create an invocation environment that uses local execution.
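The rules in this section can be condensed into a single decision function. The sketch below is an assumption-laden illustration: the function name `resolve_mechanism`, its parameters, and the string values for the host operating system are hypothetical, and the errors stand in for whatever the health check or the workflow run would report.

```python
def resolve_mechanism(windows_only, host_os, environment_mechanism=None):
    """Return the mechanism a use case would run with, or raise an error.

    windows_only          -- True if the use case carries the explicit Windows flag
    host_os               -- "windows" or "linux" (Mac is treated like Linux here)
    environment_mechanism -- mechanism of the selected invocation environment
    """
    if windows_only:
        # The environment tab is grayed out: Windows-only use cases always
        # run locally, which can only work on a Windows host.
        if host_os != "windows":
            raise RuntimeError("Windows-only use case cannot run on this host")
        return "local"

    # Linux/Mac use case: always goes through an invocation environment.
    if environment_mechanism is None:
        raise ValueError("a Linux use case needs an invocation environment")
    if host_os == "windows" and environment_mechanism == "local":
        # The configuration dialog should already prevent this combination.
        raise RuntimeError("local invocation is not selectable on Windows")
    return environment_mechanism


print(resolve_mechanism(True, "windows"))
print(resolve_mechanism(False, "windows", "ssh"))
print(resolve_mechanism(False, "linux", "local"))
```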