Metadata Watch Report #1
Section 2 - Top-Level Synthesis
2.1 Common Themes
This top-level synthesis of the SCHEMAS sectoral Metadata Watch (MD Watch) reports provides an overview of the findings of the first quarterly MD Watch report. It highlights the common threads running through the world-wide metadata scene as well as key differences among the sectors. The sectors covered in the sectoral MD Watch reports are as follows:
The themes common to the metadata-related activities in the above sectors are similar to those encountered throughout the field of information and communication technology, and include:
The list of sectors above leaves significant scope for overlap. For example, if a commercial firm creates an education and training package that makes use of metadata and schemas in a novel way, does that fall into the Industry category or the Education and Training category? If an audio-visual product consists of cultural content, does that fall into Audio-Visual or Cultural Heritage? To solve this problem, the SCHEMAS correspondents allocated in advance the various activities they were to cover. Where it was unclear in which category an activity should be placed, a judgement was made based on a close examination of the activity.
A number of sub-sectors, such as transportation and logistics, did not appear to be sources of activity in the metadata field. That, in itself, is information. Taking the case of transportation and logistics, it could be that (a) there are metadata-related activities taking place, but they are not easily locatable, and may appear in the next quarterly MD Watch report, or (b) players in the transportation and logistics field are participating in metadata-related activities that fall into other categories.
2.4 Content is Content – or is it?
Content is content, and nobody denies that content needs metadata to be truly useable, especially with the current proliferation of digital content. However, different sectors deal with different types and amounts of content, and this is one reason for the divergence – both qualitative and quantitative – in the activities across the various sectors. For example, the Industry sector is populated by activities focusing on business information systems for commerce, especially business-to-business commerce on the Web, i.e. the supply chain. Schemas for purchasing, bills of lading, and invoicing abound in this sector. Also quite common are schemas related to the internal business processes that any firm undertakes – human resources, employee records, and salary administration, for example. The above can all be represented perfectly adequately in text (alphanumeric) format and, as one would expect, activities in the Industry sector focus primarily on the description of textual information.
On the other hand, the Audio-Visual sector is quite different. Here we are dealing with staggeringly large amounts of information, only a portion of which is textual. In addition, the means by which this content is distributed varies greatly, e.g. terrestrial broadcast, Internet, and CD- or DVD-ROM. Furthermore, while the technical infrastructure to search, sort, and tag alphanumeric content has been around for decades, the techniques for searching directly on "multimedia" content, e.g. sound, still images and moving images, are non-existent or nascent at best.
So the two sectors have quite different problems to solve based on the type of content and the media through which that content is disseminated. Other differences exist as well, as the issues of rights and security can illustrate.
The question of who owns the intellectual property rights to an invoice used in business-to-business commerce is not a very important one, but security is extremely important. When considering the online payment of that invoice, security becomes even more important. In the Audio-Visual sector, however, rights are probably the most important issue, while security matters less: the potential damage caused by a single broadcast programme being "lost" or illegally intercepted pales in comparison with the potential damage caused by a single business communication being lost or illegally intercepted.
2.5 Convergence and Cross-Functionality
The emerging problem (or opportunity) for all metadata specifications of all types, especially those in the Publishing, Industry, and Audio-Visual sectors, is that they tend to end up covering more or less the same ground. The definitions of product and market are becoming hazy in the world of "physical" product. Book publishers release audio and video materials. DVDs include audio, text, visual and audiovisual material. Serials, magazines and news now come in all media types. The conventional divisions neatly represented by physical content types and their identifiers do not apply to metadata schemes, which must increasingly embrace all forms of creations. That one sector is biased more towards "text" or "visual" content, another towards "audio" and another towards "audiovisual" is not a useful distinction when designing metadata systems in which all must be well described irrespective of their predominance. Pretending that a recording is a kind of book, or vice versa, as corporate and library systems once did, is no longer adequate.
In addition, metadata is becoming multifunctional. For example, all of the major record companies are currently engaged in establishing their own corporate international databases. It is a reasonable assumption that these systems will be designed to support in due course the requirements of marketing, "label copy" for product packaging, Web metadata, rights and royalty management, and sales, as well as incorporating business rules, and the data in them will be derived in part directly from production workflow systems.
The various sectors differ not only in the level of activity going on within them but also in the level of co-operation shown by those activities. The Geographic Information sector shows relatively low levels of co-operation across activities, though it does show a high level of organisation in the sense of division of labour, i.e. the various activities are cleanly divided along traditional GI lines – geospatial data, hydrographic data, geological data, etc. By comparison, the Research sector shows a high degree of co-operation among activities, possibly due to the well-established tradition of international co-operation that has characterised scientific endeavour this century, with activities often building on the results of other activities and ensuring that various concurrently-developed projects remain compatible.
Co-operation also depends on the sometimes financially-motivated politics that characterise a particular sector. Where commercial gain can be affected by the outcome of standards activities, such as are described in the individual MD Watch Activity Reports, openness and co-operation are often the first victims. In the Industry and Audio-Visual sectors, for example, players must reconcile conflicting motivations – co-operation can benefit all players and can create commercial opportunities where none had previously existed, but it also runs counter to industrial notions of product differentiation and secrecy. In the Academia, Research, Cultural Heritage and Geographic Information sectors, on the other hand, commercial gain is less of a factor, and therefore less of a barrier to co-operation.
Despite talk of globalisation, including the "globalising" effects of the Internet, geography and language still matter. In fact, geography and language are growing in importance as barriers as content originating from increasingly diverse locations comes into contact with content consumers from increasingly diverse locations. Metadata sets, for example, are still only translated into a handful of languages, if at all. Registry sites are often only in English. And the vast majority of metadata-related activities take place in the same two dozen or so countries, i.e. the countries comprising Europe and North America, Australia, Japan, and a handful of others. But co-operation in metadata-related activities, when it exists, is strong among these countries, and the need for multilinguality is widely recognised, if not yet widely acted upon.
The information industry is by now accustomed to US domination in a variety of areas such as Internet services, personal computers, software and operating systems, and servers, to name a few, and American activities in some sectors do tend to focus on American players and their needs. However, this is often in the context of US government or military activities, where such a focus is to be expected.
2.8 "Bright Lights" and the Relative Level of Development of the Sectors
A few metadata-related activities stand out. Resource Description Framework (RDF) and Dublin Core (DC) are two of the most notable. They are widely accepted from a philosophical standpoint, enjoy the support of a number of activities, whether implicit or explicit, and co-operate with each other. The Publishing and Audio-Visual sectors have their own short lists of well-organised initiatives that, taken together, cover large sections of their respective fields, and the Education domain has four dominant metadata initiatives. The other domains have fewer, if any, "bright lights" leading the way, culminating with the Industry sector which is characterised by inward-looking activities focusing on a small sub-sector of Industry and even, sometimes, competing activities that seem to take no notice of each other.
Metadata-related activities can be divided into two classes: "old school", or pre-Web, and Web-oriented. The majority of the working, tested and viable software tools fall into the former category. There, the terms metadata and schema are generally applied to SQL databases, especially groups of SQL databases which conform to different schemas. What is needed here are tools that combine the quality of the old school with the Web-awareness of the "new school" into products that can work transparently with content in relational databases and Web content at the same time.
Although a number of sectors are heavily Web-focused, it would be a mistake to consider SCHEMAS to be a Web-oriented effort. Although Internet and Web protocols can be transmitted using almost any physical medium, e.g. terrestrial broadcast, various types of telephonic transmission, and amateur ("ham") radio, much of the content upon which various metadata-related activities are focused is not Web-based, nor will it be in the near future. The Audio-Visual sector addresses much, if not most, of this type of content.
Because we believe that processes should, where possible, be automated and because of the ever-increasing amount of content with which we will have to deal, we want machines to do things for us wherever they can. In the domain of metadata, that means we want machines to do things like understand schemas, transform data between schemas (i.e. understand data conforming to one (source) schema as data conforming to another (destination) schema), and find and retrieve schemas for us. This is related to what Tim Berners-Lee calls the "semantic Web" or "the Web for machines". This idea of machine-readability for metadata sets and schemas, of automated tools to tag content, and, ultimately, of a machine-readable universe of metadata associated with content of all types, is lacking in the activities described so far in the MD Watch. In some cases, machine-readability is probably assumed; in others, it is probably too early in the process to consider machine-readability. However, other activities run the risk of failing to take machine readability into account and having to do some significant catch-up work later. In the case of one registry in particular, the schemas are even difficult for humans to read. Machine readability, while not impossible for a clever programmer on the user side to implement after the fact, was apparently not taken into account.
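To make the idea of transforming data between schemas concrete, the following is a minimal sketch of a machine-readable "crosswalk", written in Python. It is an illustration under stated assumptions, not a description of any tool covered by the MD Watch reports. The source field names (`book_title`, `author_name`, `pub_year`, `shelf`) and the helper functions are hypothetical. The destination names are Dublin Core element names, chosen only because DC is mentioned above.

```python
from typing import Callable

# A crosswalk maps each source field to a destination field plus a
# conversion function. All source field names here are hypothetical.
Crosswalk = dict[str, tuple[str, Callable[[str], str]]]


def identity(value: str) -> str:
    """Pass a value through unchanged."""
    return value


def family_name_first(value: str) -> str:
    """Rewrite 'Given Family' as 'Family, Given'; leave single names alone."""
    parts = value.strip().rsplit(" ", 1)
    if len(parts) == 2:
        return f"{parts[1]}, {parts[0]}"
    return value.strip()


# Hypothetical source schema -> Dublin Core element names.
CROSSWALK: Crosswalk = {
    "book_title": ("title", identity),
    "author_name": ("creator", family_name_first),
    "pub_year": ("date", identity),
}


def transform(record: dict[str, str], crosswalk: Crosswalk):
    """Return (converted record, list of source fields with no mapping)."""
    converted: dict[str, str] = {}
    unmapped: list[str] = []
    for field, value in record.items():
        if field in crosswalk:
            target, convert = crosswalk[field]
            converted[target] = convert(value)
        else:
            # Report fields the crosswalk cannot place instead of
            # dropping them silently.
            unmapped.append(field)
    return converted, unmapped


# Usage with an invented sample record.
source_record = {
    "book_title": "A Sample Title",
    "author_name": "Jane Example",
    "pub_year": "1999",
    "shelf": "A3",
}
converted, unmapped = transform(source_record, CROSSWALK)
print(converted)
print(unmapped)
```

Because the crosswalk is itself data rather than code, a program could in principle fetch it from a registry and apply it without human intervention. The list of unmapped fields shows where the two schemas do not line up, which is the kind of gap that is hard to detect when schemas are published only in human-readable form.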
Maintained by: UK Office for Library and
Information Networking (UKOLN)