Distributed Multiwavelength Data Analysis Scenario
An increasing amount of astronomical data is stored in digital archives
which are distributed around the world. Analysis often involves
combination and comparison of newly obtained data with existing data
from such archives.
In this scenario the user analyzes data from multiple wavelength
regimes, with data from different sources often varying considerably in
representation and characteristics. The data to be analyzed may be either
local or remote, and the data collections may be of any size.
The analysis is driven from the user's workstation and may include software
written by the user, such as analysis scripts or algorithms.
In this case the following capabilities are needed:
- Common software infrastructure. This is needed at several levels,
e.g., standard data access services to serve up data from archives,
and a standard data analysis environment to make use of such services
and permit integration of data analysis or user interface components
from multiple sources.
- Location transparency. Ideally the user should not have to care
whether the data to be analyzed is stored locally or in a remote
archive, with the same tools available for analysis in either case.
Because data volumes are growing exponentially, most future data access
will be to remote data. Given limited network bandwidth, this may
require moving some of the computation to where the data is stored.
- Scalability. It is impractical to write new software for different
computational environments. It should be possible to use the same
software transparently on a workstation, on a cluster, or on the Grid.
Scalability is required to deal with large data volumes.
- Data mediation. Data from different sources or wavelength regimes,
produced at different times, is generally complex and heterogeneous
and may differ in both content and representation. Active mediation
is required to make it practical to combine large amounts of data at
analysis time. This includes subsetting and filtering, data model
transformation, and reformatting to a standard intermediate data
representation. In the case of data access via the VO most of this
is handled by the VO infrastructure, but the client still needs to
deal with the data which is returned. Even ignoring VO concerns,
data model mediation can be an issue when mixing software
components from different systems.
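
As a rough illustration of how location transparency and data mediation
might fit together on the client side, the following Python sketch reads
bytes from either a local path or a remote URL through one function, then
converts two differently shaped catalogs into a single intermediate record
type. Everything specific here is an assumption made for the example and
is not part of any VO interface: the Source record, the format names
"radio_csv" and "optical_json", the column names, and the choice of
janskys as the common flux unit are all hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlparse
from urllib.request import urlopen


@dataclass
class Source:
    """One measurement in a hypothetical common intermediate form."""
    name: str
    ra_deg: float
    dec_deg: float
    flux_jy: float
    band: str


def read_bytes(location: str) -> bytes:
    """Return the content of a local path or a remote URL.

    The caller does not need to know which one it was given.
    """
    if urlparse(location).scheme in ("http", "https", "ftp", "file"):
        with urlopen(location) as response:
            return response.read()
    with open(location, "rb") as handle:
        return handle.read()


def from_radio_csv(raw: bytes) -> List[Source]:
    """Hypothetical radio catalog: CSV with flux given in millijanskys."""
    rows = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    return [
        Source(r["id"], float(r["ra"]), float(r["dec"]),
               float(r["flux_mjy"]) / 1000.0, "radio")
        for r in rows
    ]


def from_optical_json(raw: bytes) -> List[Source]:
    """Hypothetical optical catalog: JSON list with flux given in janskys."""
    return [
        Source(o["src"], float(o["pos"][0]), float(o["pos"][1]),
               float(o["flux_jy"]), "optical")
        for o in json.loads(raw)
    ]


# Registry mapping an (assumed) format name to its mediation function.
MEDIATORS: Dict[str, Callable[[bytes], List[Source]]] = {
    "radio_csv": from_radio_csv,
    "optical_json": from_optical_json,
}


def load(location: str, fmt: str) -> List[Source]:
    """Fetch data from anywhere and mediate it to the common form."""
    return MEDIATORS[fmt](read_bytes(location))
```

With this arrangement, analysis code would call load() with a file name
or a URL and receive the same record type in both cases. Supporting a new
archive format would mean adding one entry to the registry rather than
changing the analysis code.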
--
DougTody - 24 Aug 2004