Report on Breakout session Two: Sharing and declaring schema

The participants introduced themselves and their relevant projects that cover a wide spectrum of applications and domains where metadata registry activities and schema developments are taking place (e.g. geology, steel engineering, environmental information systems, telecommunication, speech technology, multilingual dictionary and terminology resources, machine translation, library engineering, information brokerage, multimedia/audiovisual archives, educational information, geographical information, agricultural information, administrative information, etc.).

Given the broad range of projects covered, it is not surprising that we do not mean the same by terms such as 'registry' and 'schema'.

Sharing schemas is often, but not necessarily, linked to aligning schemas. There will be different reasons and motivations to cooperate with other projects and initiatives. Is it a nice idea or a real requirement to share schemas? Usually it would not be altruistic reasons but rather concrete expectations, e.g. within the same domain, a cooperative network, a global company, etc.

Different user groups, however, have different reasons for needing schemas and different requirements for schema coordination (re-usability, eCommerce, Intranet-based knowledge management, dissemination of public information, etc.). The need to publish a schema sometimes comes from people other than the schema designers themselves. For example, searching across resources can be more precise if schema information is available.

For DC applications, the basic recommendation is to publish one's schema, especially its subelements and value sets (e.g. classification systems), not least as input for DC namespace registries.

DC is only loosely specified, so there are many different ways of using it (even cataloguing rules are applied in slightly different ways).

The expectation is that the SCHEMAS project will provide a real overview of metadata projects. Precise descriptions and a good search facility are needed in order to find what we need, e.g. one-stop-shop tools for cross-searches that show which metadata sets are available.

Historically speaking, data documentation for numeric databases was the origin of metadata initiatives (describing the methods by which the data were gathered). Nowadays, with DC being so popular, it is mostly bibliographic metadata embedded in web documents that is discussed. But the numeric database community has had its own discourse on "metadata" for the past 30 to 40 years, the focus of which was documentation to help the reuse of data. This is quite different from the more recent "metadata" movement, with its emphasis on descriptive metadata embedded in Web documents. We need some kind of continuity and cooperation between the numeric data communities on the one hand and library communities on the other hand, as well as integration across domains. There has always been something of a gulf between library and IR people on the one hand, and database people on the other.

In cross-domain applications, can we really use the same schema, or do we have to adapt them, since we might use them in different ways? How can we manage this natural variation, difference, diversity in a context of standardization, harmonization, interoperability, sharing?

Metadata are used for different purposes, and a registry is a purpose of its own. Semantic equivalence can only be established in a concrete context and for a certain purpose. Within a controlled environment (e.g. MARC), it is possible to control the use of defined terms and the correct application of rules. The relationship between elements depends on context – for example, the "Subject" field of email and a newspaper headline might be equivalent (as "titles") for an application of fuzzy searching. Registries, therefore, should state their purpose (e.g. "fuzzy searching").
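The point that equivalence holds only for a stated purpose can be sketched in a few lines of Python. This is a minimal illustration, not a description of any existing registry: the class `PurposeScopedRegistry`, its methods, and the generic labels "title" and "subject-line" are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PurposeScopedRegistry:
    """Hypothetical registry that records mappings valid for one stated purpose."""

    purpose: str
    # Maps (schema name, element name) to a generic concept label.
    mappings: dict = field(default_factory=dict)

    def declare(self, schema: str, element: str, generic: str) -> None:
        """Record that an element plays the given generic role for this purpose."""
        self.mappings[(schema, element)] = generic

    def equivalent(self, a: tuple, b: tuple) -> bool:
        """Two elements are equivalent only if both map to the same generic role."""
        role_a = self.mappings.get(a)
        role_b = self.mappings.get(b)
        return role_a is not None and role_a == role_b


# For fuzzy searching, an email subject and a newspaper headline both act as titles.
fuzzy = PurposeScopedRegistry(purpose="fuzzy searching")
fuzzy.declare("email", "Subject", "title")
fuzzy.declare("newspaper", "Headline", "title")

# For a stricter purpose (assumed here: cataloguing), the two are kept apart.
cataloguing = PurposeScopedRegistry(purpose="cataloguing")
cataloguing.declare("email", "Subject", "subject-line")
cataloguing.declare("newspaper", "Headline", "title")
```

Asking the same question of both registries gives different answers, which is why a mapping without a stated purpose is hard to interpret.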

The advantage of sharing schemas is the opportunity of not re-inventing the wheel. The achievements of STEP (exchange of product model data, since 1984; it created large data repositories and was later meant to be used for knowledge bases) should be used in current metadata developments. In the early days, STEP focused on physical file exchange; it was then seen that merging data entails more advanced considerations. The "application protocol" of STEP makes the same distinction between definition and reuse as DESIRE profiles.

There is a certain anarchy in XML and some authority is needed, but the question is who can and should exercise that control in order to make sure that interoperability can be reached. More standardization is needed, so the ISO/IEC framework (primarily JTC 1 and TC 154, but also TC 46 and TC 37) should be used more intensively (although procedures have to be sped up in some of these committees).

STEEL-ML is a markup language for steel construction and steel trading. OASIS is a start at cataloging such approaches. XML.org takes a very broad approach, covering just about everything.

Most people search at a generic level.

Before aligning schemas, the domains concerned have to be studied, together with their requirements and semantics. Interoperability across different domains and applications can be reached on a rather generic level. For purposes of internal corporate knowledge management, mapping different metadata sets will have to take place on a more specific level.

According to the dictionary analogy, a registry mainly includes terms and definitions (a purely "semantic" registry). Such semantic 'standards' should be separate from syntactic standards (as is done in EDIFACT). For example, the semantics of "date" have nothing to do with ISO 8601.
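The separation of semantics from syntax can be sketched as two independent tables. This is only an illustration under stated assumptions: the names `SEMANTIC_REGISTRY`, `SYNTAX_ENCODINGS` and `read_value`, the wording of the definition, and the encoding label "dd.mm.yyyy" are hypothetical; only the `datetime` calls come from the Python standard library.

```python
from datetime import date, datetime

# Semantic side: terms and their definitions, with no mention of any notation.
# The definition text is illustrative, not taken from a published element set.
SEMANTIC_REGISTRY = {
    "date": "The day on which the described resource was issued.",
}

# Syntactic side: ways of writing a value down, independent of what it means.
SYNTAX_ENCODINGS = {
    "iso8601": date.fromisoformat,  # accepts YYYY-MM-DD
    "dd.mm.yyyy": lambda raw: datetime.strptime(raw, "%d.%m.%Y").date(),
}


def read_value(term: str, encoding: str, raw: str) -> date:
    """Parse a raw string for a registered term using a separately chosen syntax."""
    if term not in SEMANTIC_REGISTRY:
        raise KeyError(f"term not in semantic registry: {term}")
    return SYNTAX_ENCODINGS[encoding](raw)


# The same semantic "date" arrives in two notations and yields the same value.
from_iso = read_value("date", "iso8601", "2000-11-23")
from_local = read_value("date", "dd.mm.yyyy", "23.11.2000")
```

Adding a further notation touches only the syntactic table; the definition of "date" stays as it is.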