Parameter Model - Second iteration
A second iteration on the proposed parameter model and mechanism is given here.
This is based on the discussions of September 2004 (see below), adding support
for the capabilities we discussed then, such as output as well as input
parameters and the use of parameter sets to represent general data entities.
DougTody - 2005-06-22 20:32
Parameter Model - A first discussion
The remainder of this page contains a listing of an e-mail
discussion on the parameter model which took place in Sep. 2004.
Minor editorial changes were made to get a better layout.
-- PrebenGrosbol - 21 Jun 2005
DougTody - 2004-09-21 18:44
Hi Folks -
As mentioned in the OPTICON telecon today, one of the things we are working on here is a parameter model, to be used to describe the parameter sets of "components" such as astronomical tasks or VO services (a "task" is roughly equivalent to a command in a CLI). Although it is jumping the gun a bit, this is a simple thing which is timely and which it would be useful to discuss within the context of our OPTICON working group.
The idea here is that astronomical software is packaged as components which can be called either from the host level, e.g., a Unix shell, or from some execution framework, e.g., using some technology such as SOAP or CORBA as the low level communications protocol. A parameter mechanism is needed to describe the parameters for a component. Parameter sets would be defined in technology-neutral XML and would be used for things such as parameter resolution, editing and display of parameter sets in a CLI, or binding components into scripting languages such as Python. To actually invoke a component the full parameter set would not be needed, rather a keyword table would be passed to the component, listing the parameters as simple keyword value pairs. This keyword table would probably also be formatted as XML and would probably be passed through the communications layer as a string blob when the component is executed, although this could depend upon the protocol used. The parameter mechanism is "science domain" (high level) and should be the same regardless of the technology used in the execution framework or CLI, which is part of the reason we don't propose using a technology such as CORBA IDL, WSDL, XML schema etc., directly to express parameters (another reason is that these mechanisms are too low-level for this purpose).
So assume for the moment that we decide we need parameters and parameter sets. What do we want for a parameter model? What are the attributes of a parameter? A first cut at this is given below. Comments are welcome. In particular, would something along these lines work for your current or future applications? The key issue at the moment is the model, rather than implementation issues such as how to express this in XML.
- Doug
Summary:
- A parameter set is a set (ordered sequence) of parameter objects.
- Parameter sets are normally associated with components (e.g., tasks or services), but stand on their own as data entities.
- Parameter sets are (probably) defined in XML.
- Parameter sets are used mainly on the client side (e.g., Python, or a GUI) for parameter handling and parameter resolution and limited verification when a command is executed.
- When a component is invoked, all that is needed is keyword=value, where the keyword is the parameter name (<pset>.<pname>).
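The last point of the summary can be sketched in a few lines of Python. This is only an illustration: the function names `keyword_table` and `format_table`, the nested-dictionary layout, and the task and parameter names are all assumptions, since the discussion does not fix a concrete format for the keyword table.

```python
def keyword_table(psets):
    """Flatten {pset_name: {param_name: value}} into keyword/value pairs.

    Keywords take the form <pset>.<pname>, as described in the summary.
    """
    table = {}
    for pset_name, params in psets.items():
        for pname, value in params.items():
            table[f"{pset_name}.{pname}"] = value
    return table


def format_table(table):
    """Render the table as one keyword=value line per parameter."""
    return "\n".join(f"{key}={value}" for key, value in table.items())


# Hypothetical component with two psets; all names and values are invented.
example = keyword_table({
    "mytask": {"image": "m51.fits", "nclip": 3},
    "display": {"zscale": "yes"},
})
print(format_table(example))
```

More than one pset can be flattened into the same table, which matches the statement later in the thread that the keyword table mechanism allows multiple psets to be passed to a task.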
Parameter Model
The attributes of a parameter are as follows:
- Name: The parameter name (required). This should be a simple identifier, giving the parameter name within some namespace such as the defining pset.
- Type: The parameter data type (required). Initially the choices can be limited to "string", "int", and "float". Parameters are always full precision so there is no need to distinguish between int and long, float and double, and so forth. Other parameter types are possible, but these can be added later.
- Mode: The parameter mode (required). This can be either "required" or "hidden". Required parameters are those which should normally be reviewed by the user (e.g., given on the command line or at least verified) when a component is invoked. A common example is a data operand, e.g., the data object to be operated upon. Hidden parameters are things like control parameters for algorithms which can normally be defaulted and ignored.
- Value: The current parameter value (optional). There should be some means to specify an undefined value (no value, or uninitialized), an indefinite value (numerical value is indefinite), or for a string param, the null or empty string. The null string is not the same as undefined, rather it is a zero length string.
- Default: The default value for the parameter (optional). The legal values are the same as for Value.
- Maxlen: The maximum array length of the parameter (optional). Zero means an array value of any length is permitted, one means only scalar values are permitted, and N means arrays of up to N elements are permitted. If Maxlen is omitted the parameter is assumed to be scalar.
- Min: The minimum value for the parameter, if any (optional). For example, Type=int, Min=0 refers to the nonnegative integers.
- Max: The maximum value for the parameter, if any (optional).
- Enum: An enumerated list of values for the parameter (optional).
- Prompt: Short one-liner prompt string (required). A short prompt string to be issued by the CLI if it is necessary to prompt for the parameter value.
- Help: A paragraph of text describing the parameter, or a URI to be used to look up the help text for the parameter.
The Parameter class may have methods as well as attributes, but these are
defined by an implementation, e.g., a CLI, and not by the entity which
defines the pset itself.
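As a rough illustration of the attribute list above, the model could be written down as a Python dataclass. This is a sketch under stated assumptions, not a proposed implementation: the class layout, the `UNDEFINED` sentinel, the `check` method, and the example parameter are all invented here, and the choice to let "float" accept integer values is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence

# Sentinel for "no value"; distinct from "" (the null string). Assumption.
UNDEFINED = object()

# Python types accepted for each parameter type (assumption: float accepts int).
_PY_TYPES = {"string": str, "int": int, "float": (int, float)}


@dataclass
class Parameter:
    name: str
    type: str                       # "string", "int" or "float"
    mode: str                       # "required" or "hidden"
    prompt: str
    value: Any = UNDEFINED
    default: Any = UNDEFINED
    maxlen: int = 1                 # omitted Maxlen means scalar; 0 means any length
    min: Optional[float] = None
    max: Optional[float] = None
    enum: Optional[Sequence[Any]] = None
    help: str = ""

    def check(self, value):
        """Limited client-side verification of a candidate value."""
        items = list(value) if isinstance(value, (list, tuple)) else [value]
        if self.maxlen != 0 and len(items) > self.maxlen:
            raise ValueError(f"{self.name}: at most {self.maxlen} element(s) allowed")
        for item in items:
            if not isinstance(item, _PY_TYPES[self.type]):
                raise TypeError(f"{self.name}: expected {self.type}")
            if self.min is not None and item < self.min:
                raise ValueError(f"{self.name}: {item} is below Min={self.min}")
            if self.max is not None and item > self.max:
                raise ValueError(f"{self.name}: {item} is above Max={self.max}")
            if self.enum is not None and item not in self.enum:
                raise ValueError(f"{self.name}: {item} is not in Enum")
        return value


# Hypothetical hidden control parameter restricted to nonnegative integers.
nclip = Parameter(name="nclip", type="int", mode="hidden",
                  prompt="Number of clipping iterations", default=3, min=0)
nclip.check(5)
```

The `check` method is the kind of thing the model leaves to an implementation such as a CLI; it is included only to show how Maxlen, Min, Max and Enum might be used for the limited verification mentioned in the summary.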
DanielPonz - 2004-09-22 11:38
Dear colleagues,
The draft parameter model is a very good step on concrete points that we have to handle.
My understanding is that the model defined by Doug is applicable to input parameters, output results and physical constants used in a task. In this context, I miss two attributes: units and errors.
Although it is a tricky matter, here you have a first step to start discussing the issue:
- Units: optional specification of the physical units of the parameter. Proposed default is "unitless". One could be strict and use the normalized SI basic system, or follow a more practical approach using domain specific units.
- Error: optional estimation of the error associated to a parameter, expressed in the units of the parameter value (or relative error?). Default is "undefined" (no value).
The question of the units is an essential point. One could have a simple approach and use units just as labels to document parameter sets or be more ambitious and handle unit conversions, etc. In the second case, we could be strict and use the normalized SI basic system, or follow a more practical approach using domain specific units.
With respect to error handling, we could follow many approaches, but probably errors should be associated to parameter values as an attribute.
Best regards,
Daniel
PrebenGrosbol - 2004-09-22 18:18
Dear All,
Yes, I fully agree. If such a concept is not added we would miss the chance of improving the current situation. Actually, the hierarchy is more like this (not including all attributes):
- Pset contains an ordered sequence of Parameters
- Parameter contains Name, Value, Mode, ..
- Value contains Type, Data, ...
The Type can be 'string', 'int', or 'real'. For 'real' Data, the Value also has Unit, Error, and ErrorModel, where the latter could be something like 'Gaussian' or 'Poisson'. It is unclear whether one should also associate Unit and Error with the 'int' type. If integer values are uncertain, they would be better given as 'real'.
Several attributes in the Parameter are only relevant for input parameters and should possibly be separated out, to have a more basic class which describes any parameter in the system (both input and output).
Best regards,
Preben
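The Pset / Parameter / Value hierarchy outlined in the message above might look as follows in Python. This is a minimal sketch: the field names, the rule that rejects units and errors on non-'real' values (the message itself leaves the 'int' case open), and the example pset with its numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Value:
    type: str                          # 'string', 'int' or 'real'
    data: Any = None
    unit: Optional[str] = None         # only used for 'real' data in this sketch
    error: Optional[float] = None      # assumed to be in the units of data
    error_model: Optional[str] = None  # e.g. 'Gaussian' or 'Poisson'

    def __post_init__(self):
        # Assumption: Unit, Error and ErrorModel attach to 'real' values only.
        has_extras = (self.unit is not None or self.error is not None
                      or self.error_model is not None)
        if self.type != "real" and has_extras:
            raise ValueError("unit and error apply only to 'real' values here")


@dataclass
class Parameter:
    name: str
    value: Value
    mode: str = "required"


@dataclass
class Pset:
    name: str
    parameters: List[Parameter] = field(default_factory=list)  # ordered sequence

    def names(self):
        """Parameter names in their defined order."""
        return [p.name for p in self.parameters]


# Hypothetical output pset from a fit; names and numbers are invented.
fit = Pset("linefit", [
    Parameter("centre", Value("real", 6562.8, unit="Angstrom",
                              error=0.3, error_model="Gaussian")),
    Parameter("niter", Value("int", 12)),
])
print(fit.names())
```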
DougTody - 2004-09-22 20:36
BTW, I am on travel starting tomorrow hence I will be out of the loop for a while should this discussion continue.
There appear to be two issues here:
- Do we want to pass data (such as a fit of a numerical model) via the parameter mechanism, or just control parameters?
- Do we want to use the parameter mechanism for output as well as input?
I was focusing primarily on an input mechanism to control the operation of a task at a high level, but it is clear that both Daniel and Preben are thinking of both numerical data and output as well. Clearly we do need to consider what to do about output via the invocation mechanism (i.e., other than writing to external storage).
Probably any data model can be expressed as a set of keywords, so in principle we could use a common parameter mechanism or keyword table to encode data entities for both input and output. This would provide a simple alternative to passing around structured entities, would provide a simple way to get data from the task back to the client or SL, and would work fine for many simple cases. In this case I agree we would need to model numerical quantities as well as simple control parameters, hence we would probably need to include both units and errors.
As Preben points out usage is different for input and output. Hence we probably need to model these slightly differently. Plus they would need to be handled differently at the level of a binding into a scripting language or CLI. For example, the default input pset for a task is normally persistent, global and shared, whereas any parameter output block from a task would want to be returned only to the caller. If we try to pass data via a parameter mechanism then probably the amount of data passed will need to be allowed to vary, which is another difference from the input parameter set.
So - I agree we need to think about how to handle 1) and 2) above, and it could be useful to provide some support for this in the parameter mechanism. If for the moment we consider only the problem of INPUT parameters, is the proposed parameter model adequate? - Doug
DougTody - 2004-09-22 18:51
Hi Daniel -
These are good points (especially about units), but there may be better ways to handle the more general case, which can get complex. We would like to keep the basic parameter mechanism simple.
The parameter model is based mainly on what is needed to invoke a task from a CLI. One wants to keep it simple enough that a user can easily invoke a task with parameters on the command line (of course tasks or other components can be called in other ways but CLI usage is an important design driver). Hence, parameters are mainly for input, and want to be kept fairly simple.
In a simple CLI what we have to work with is basically keyword=value or just a positional value, hence units would normally be fixed by the definition of the parameter (or a string value could be used with some simple syntax such as 1024k). However, even if units are fixed it would be useful to specify them in the parameter specification so that the user interface could do something with this information. For example they could be displayed for the user, or a sophisticated client interface might even implement unit conversions.
In my experience the need for errors on input parameters, as opposed to data, is rare: do we have any use cases for this? Things such as errors are relevant mainly to numerical models of some sort, e.g., the result of some fit. Simple cases could be handled by using several associated parameters to pass the information. Otherwise it could be difficult to deal with in a CLI.
If a task takes a nontrivial numerical model for input, or generates one as output, things can get complicated and it might be better handled using a different mechanism, e.g., reading and writing a data entity of some sort. The data entity could be stored externally, in a file, table, or database, passing only a reference to the entity as a parameter, or it could be formatted as a string and passed in as a parameter. For example, a numerical model could be serialized in XML, or some simple string format such as CSV, and passed through as a string parameter. Another alternative might be to use a separate pset for the numerical model and pass the information in that way; this would avoid the need for a special mechanism to parse the information (the keyword table mechanism allows multiple psets to be passed in to a task).
Parameter sets can be output or updated by a task but this is problematic as a way to return information from a task. It is probably better to either use messaging or to write to a file or database. This is an important problem but probably a separate one.
- Doug
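One option mentioned in the message above, serializing a numerical model to a simple string format such as CSV and passing it as a string parameter, can be sketched with the Python standard library. The helper names, the two-column layout, and the keyword `mytask.model` are assumptions chosen for illustration only.

```python
import csv
import io


def model_to_string(rows, header):
    """Serialize a small numerical model (list of rows) to a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def string_to_model(text):
    """Parse the CSV string back into a header and rows of floats."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [[float(cell) for cell in row] for row in reader]
    return header, rows


# Hypothetical fit result: one coefficient and its error per row.
header = ["coeff", "error"]
rows = [[1.5, 0.1], [-0.25, 0.02]]
blob = model_to_string(rows, header)

# The blob travels as an ordinary string value in the keyword table.
keywords = {"mytask.model": blob}
recovered = string_to_model(keywords["mytask.model"])
```

The receiving component has to know how to parse the string, which is the "special mechanism" that the separate-pset alternative would avoid.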
BillPence - 2004-09-22 21:05
Hi Doug,
Here are a couple comments on the draft parameter model:
- In addition to the 3 parameter types of "string", "int", and "float", the "boolean" type (True/False) is also widely used and should be supported.
- It may also be useful to have a "filename" type as a particular type of "string", but this could be added later. A GUI could then check if the file exists before starting the application program.
- The IRAF parameter files have the concept of a "learning" mode, i.e., whether or not the currently entered value should be remembered the next time the application is run. Does this concept also apply to this parameter model? If so, are you assuming that "required" parameters are always learned, and "hidden" parameters are not?
Bill
DougTody - 2004-09-22 21:59
Hi Bill -
I didn't know you were on this list, I am glad to see this is the case.
> Here are a couple comments on the draft parameter model:
>
> - In addition to the 3 parameter types of "string", "int", and "float", the
> "boolean" type (True/False) is also widely used and should be supported.
I agree - there is no standard way to handle this without making it a primary type, so we should add it.
> - It may also be useful to have a "filename" type as a particular type of
> "string", but this could be added later. A GUI could then check if the file
> exists before starting the application program.
Maybe; there are issues, and maybe there is a better way to provide this kind of verification. There are a number of interesting but exotic possible parameter types like this (other important cases are the "interactive" parameter types like keystroke events and cursors, and list-structured parameters). Since new parameter types can be added easily later I am inclined to leave out anything exotic in this first version.
> - The IRAF parameter files have the concept of a "learning" mode, i.e.,
> whether or not the currently entered value should be remembered the next
> time the application is run. Does this concept also apply to this parameter
> model? If so, are you assuming that "required" parameters are always
> learned, and "hidden" parameters are not?
Yes, learn mode applies here. The thing about learn mode in IRAF parameter files is that I don't think we ever had a case where it wasn't turned on. Essentially all "required" parameters are learned, with the last value entered being the prompted value the next time around. It is a popular user interface feature. We should think about this more carefully but the real question is whether there are cases where this should be turned off.
- Doug
BillPence - 2004-09-23 22:05
Our large package of ftools software uses IRAF-style parameter files, and there are about 40 tasks that have required parameters that are not learned (they have mode 'q' instead of 'ql'). There are 2 main classes of these parameters:
- a boolean parameter with a prompt similar to "Are you really sure you want to delete this file??". Even if the user replied 'Yes' before, the default value should always be 'No'.
- string parameters whose value is invariably unique each time the task is run, such as the name for the output file, or a processing history comment string to be entered by the user.
Thus I think there is a definite need to be able to specify whether required parameters should be learned or not.
On the other hand, I don't know of any cases where it is desirable to have 'learned' hidden parameters, but it is probably better to allow for this case as well.
Bill
DougTody - 2004-09-25 16:47
Hi Bill -
Good point, I agree. I think we have other code which uses this feature as well.
I don't think that hidden parameters should be learned as this would prevent temporary command line overrides of hidden parameters from being used without changing the hidden parameter value, which is not what is expected.
- Doug
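The outcome of this exchange on learn mode can be sketched as follows. This is only an illustration under assumptions: the `learn` flag, the `run_task` function, and the example parameters are invented here, and the sketch follows the view in the last message that hidden parameters are never learned, so that command line overrides of hidden parameters stay temporary.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Param:
    name: str
    mode: str            # "required" or "hidden"
    default: Any
    learn: bool = True   # hypothetical flag, analogous to IRAF mode 'ql' vs 'q'


def run_task(pset, stored, overrides):
    """Resolve values for one invocation and update the persistent store.

    pset      -- list of Param definitions
    stored    -- dict of values remembered from earlier runs (mutated)
    overrides -- values given on the command line for this run
    """
    resolved = {}
    for p in pset:
        value = overrides.get(p.name, stored.get(p.name, p.default))
        resolved[p.name] = value
        # Only required parameters flagged as learned are remembered;
        # overrides of hidden parameters apply to this run only.
        if p.mode == "required" and p.learn and p.name in overrides:
            stored[p.name] = value
    return resolved


# Hypothetical task: a learned operand, an unlearned confirmation, a hidden control.
pset = [
    Param("infile", "required", ""),
    Param("confirm", "required", "no", learn=False),
    Param("nclip", "hidden", 3),
]
stored = {}
first = run_task(pset, stored, {"infile": "a.fits", "confirm": "yes", "nclip": 5})
second = run_task(pset, stored, {})
```

On the second run only `infile` keeps its earlier value; `confirm` falls back to "no" and `nclip` to its default of 3.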