Metadata Watch Report #1

Section 3 - Domain reports

3.2 Publishing sector

Publishing metadata: Babel backwards

For an accurate model of the current state of metadata development, the publishing business can look to one of its earliest examples: the story of the Tower of Babel. In this digital version, though, the plot is reversed: the participants have started out with a multitude of languages, only later coming to the dawning realization that they are building the same tower to a multimedia heaven (or hell; that part is unclear).

Strong initiatives are coming out of most of the traditional content sectors: books (EPICS, ONIX), recordings (RIAA, IFPI), audiovisual (MPEG, SMPTE), copyright (CIS), news (NITF, NEWSML), magazines (PRISM), academic journals (CROSSREF); and from the emerging e-books business (EBX, EBOOKS). (Although audiovisual is dealt with in another Metadata Watch report, it is impossible to ignore it entirely in this review.)

At the same time, metadata activity is being stirred up around the main members of the "ISXX" group of creation identifiers under ISO Technical Committee 46: ISBN (books), ISRC (recordings), ISSN (serials), ISAN (audiovisuals), ISWC (musical works) and the fledgling ISTC (textual works). These developments are normally related to, but not necessarily integrated with, the initiatives already mentioned. The International DOI Foundation (IDF), adopting (like EPICS) the INDECS metadata framework, is approaching metadata from a multimedia perspective from the outset.

Convergence: format wars?

The emerging problem (or opportunity) for all of these specifications is that they all finish up covering more or less the same ground. The definitions of product and market are becoming hazy in the world of "physical" product. Book publishers release audio and video materials. DVDs include audio, text, visual and audiovisual content. Serials, magazines and news now come in all media types.
The conventional divisions neatly represented by physical content types and their identifiers do not apply to metadata schemes, which must increasingly embrace all forms of creations. That one sector is biased towards "text" or "visual" material, another towards "audio" and another towards "audiovisual" is not a useful distinction when designing metadata systems in which every type must be well described, whether or not it predominates. Pretending a recording is a kind of book, or vice versa, as corporate and library systems once did, is no longer adequate.

A key role is currently being played by e-tailers seeking coherence in the way in which metadata is delivered, and they are increasingly multimedia in their product ranges: Amazon.com promotes "books", "music" and "DVD and video" without priority. Amazon played a key role in shaping the international solution for ONIX, the most significant recent development from the book sector, and the discussion has now moved quickly on to the possibility of extending ONIX to music and audiovisual products.

But the underlying driver is of course digital delivery. The explosion in the use of unlicensed MP3 files, and the record industry’s collective response through the Secure Digital Music Initiative (SDMI), has most publicly breached the dam; but all sectors have been wrestling with early methods of securely and profitably providing content on the Web. The functional specifications for these tend towards an obvious commonality: in metadata at least, the sectoral Chinese walls are collapsing. SDMI, though nominally for "music", must recognise data files of any type. MPEG, nominally for "Motion Pictures", does the same. The "book" industry dictionary EPICS (the "parent" of ONIX) is structured to accommodate any media type to any hierarchical level of content.
From different starting points in traditional sectors, all major initiatives are being forced to address the same multimedia range of content.

Multi-national, multi-lingual, multi-functional

To stretch matters further, retailers like Amazon, HMV and Barnes & Noble, and "publishers" like Bertelsmann, Warner and Sony are not just multimedia; they are multinational and multilingual, and so their metadata requirements are inevitably moving that way too. Language and territorial dimensions are becoming the norm in data specifications.

Even more importantly, metadata is becoming multifunctional. For example, all of the major record companies are currently engaged in establishing their own corporate international databases. It is a reasonable assumption that these systems will be designed to support in due course the requirements of marketing, "label copy" for product packaging, web metadata, rights and royalty management, sales and more besides, and that the data in them will be derived in part directly from production workflow systems. It is also a reasonable assumption, without betraying any corporate secrets, that similar changes are going on in the "publishing" business in all sectors. The need for definitive product and rights data, sourced once and ultimately digitally protected, is gradually dawning as a corporate imperative: it is a sign of the rapidly changing times that the International Federation of the Phonographic Industry (IFPI), custodian of the metadata-less ISRC, has recently appointed a full-time "Metadata Executive".

Consequences of multi-functional metadata

As publishers plan and build systems where metadata will be called upon to do more and more, the specifications for industry standards are expanding rapidly in their complexity. The EPICS data dictionary has grown far beyond the scope originally envisaged when it began as the more modest "Title Information Project" in 1998.
The SMPTE (audiovisual) Data Dictionary has expanded similarly. The editors of both will tell you that there are still major areas, especially in the description of rights management, which are at present dealt with in the most cursory manner, and that many of the multimedia issues are no more than doors or hooks left open for future expansion.

The EPICS/ONIX pairing illustrates another characteristic we will see a lot more of as a result of multifunctional metadata: families of specifications. In this case, the EPICS Data Dictionary is the wider resource of which ONIX is one subset, expressed in a specific (XML) markup format.

Is this simply "scope-creep", lack of definition or over-ambitious design? Is it not possible to "keep metadata simple"? It seems not. Just as the MARC cataloguing format has grown into a catalogue in its own right, so now commercial industries are addressing even wider description requirements and finding there are no adequate quick metadata fixes.

The SDMI initiative’s experience illustrates the point. The original draft functional specification included the identification of general metadata requirements, for both descriptive and rights purposes. It rapidly became a monster, requiring SDMI-compliant technology to interpret descriptions of any kind of content subject to any permutation of business rules, in an environment where creators and rights owners do not yet even have established unique identities. Fairly rapidly, all but the most basic metadata requirement was pushed out of the SDMI specification, which now relies on embedded identifiers to provide the links to metadata in other systems.

While the consequences of this complexity for a specific sector are daunting, the impact of the convergence of competing metadata schemes is even more of a concern.
While organisations like Amazon.com are making it known that they are intent on having a single metadata delivery format, the sectors that supply them are preparing a whole set of not necessarily compatible schemas.

It might be argued that some activities in the "publishing" sector fall outside this convergence. News media and academic journals, for example, surely belong in relatively integral domains, and might proceed to develop their own vocabularies and interchange specifications with relative freedom? Analysis of the functions and content of these, and their overlap with other content types, suggests this would be a brave assumption.

The news media, for example, benefits (or suffers) from the fact that it is critically dependent on rapid and accurate data interchange through increasingly complex supply chains, and nowadays in all forms of media. This has led to the early and widespread use of standard mark-up formats throughout the industry, and a tradition of industrial standard-setting. New XML versions of these have appeared (NITF, NEWSML), the latter grappling with the fuller implications of multimedia content.

The impact of rights

In any case the question of rights metadata makes the suggestion of "safe havens" for content metadata irrelevant. It is the functional requirement to end all such, and it is waiting in the wings, likely to make its main entrance and in all probability dominate the commercial metadata stage in the coming few years.

"Digital Rights Management" (DRM) has become a buzz-term, but it is somewhat misleading. DRMs at present are generally concerned with content protection and delivery, and pay little or no attention as yet to the underlying complex rights transactions which attend digital dissemination and manipulation. Early flirtations with generic formats in initiatives such as INDECS and the recently announced XrML from Microsoft/Xerox are only the stage-setters.
MPEG7’s IPMP work has, rightly, pushed the issue out of scope and deferred to an as-yet non-existent framework of rights metadata. The next year will see some serious and heavyweight activity towards directly implementable rights metadata systems.

The consequences for descriptive metadata are potentially serious, because the most important descriptive terms (contributors, links, formats, events of creation and publication) are loaded with legal implications in the right context: they are, in fact, essential parts of rights statements and agreements. In the CIS community, for example, Author and Publisher are implicitly rights owners. Indeed, it is actually irrelevant in rights management whether someone really was the author, provided that the "real" author agrees they were (or is sufficiently unwilling to sue, or is dead). Lennon and McCartney are each credited, by agreement, with "co-writing" many of each other’s songs to which they in reality contributed nothing. This is just one of the more famous examples among (literally) millions where commercial interests have framed bibliographic reality.

The draft MPEG7 IPMP metadata specification makes it clear that no terms used as MPEG descriptors may be deemed to have any legal implications: the first of many such "disclaimers" with which metadata schemas will have to deal. The ability to recognize "bibliographic" relationships alongside parallel "legal" relationships will add a further general level of complexity in due course.

The role of identifiers

Identifiers play a central role in most developing publisher schemes. The more recent identifiers come with their own mandatory metadata, and it is a measure of the importance of creator/publisher-allocated "ISXX" identifiers that the ISO TC46 subcommittee responsible for them (SC9) intends to clear its decks of all issues not related to identifiers.
ISBN (books) has no formal metadata, but Books In Print and similar trade bibliographic services provide a practical metadata context for ISBN which has been the backbone of the book industry supply chain since the 1970s.

In stark contrast with books, the recording industry, though centre-stage in digital delivery, has no tradition at all of collective or standardized metadata. The ISRC has functioned for over a decade with no related metadata or registration database, which means that many are issued but nobody other than the issuer has any idea what they identify (your PC can read the embedded ISRCs from your CD, but it will be none the wiser if it does). As the recording industry’s focus shifts from the bar-coded album to the digital track, ISRC will become a hugely important identifier, but not until it is associated with metadata which enables its discovery and verification.

ISSN (serials) has metadata, though exactly what a serial is requires some clarification in the digital age. ISAN (audiovisuals) and ISWC (musical works), the new kids on the block, have mandatory core metadata, and if ISTC (textual works) makes it to the start line it will follow suit, with metadata drawn almost certainly from the EPICS dictionary.

Possible multimedia solutions

Several developments are worth watching as they may provide some ways to avoid the brewing multimedia confusion.

The MPEG21 initiative is the most ambitious "umbrella" framework proposal for bringing all technical and metadata standards together in an integrated program. It benefits from MPEG’s track record and strong support in technology sectors.

The International Digital Object Identifier (DOI) Foundation (IDF) is now deploying an approach to content description and management which implements several potentially valuable principles: the notion of interoperable and overlapping content "genres", the "declaration" of kernel metadata independent of , and "actionable identifiers".
The DOI structure allows it to act as a "meta-identifier" working in parallel with other more limited Its first implementations include CROSSREF.

The INDECS framework (of which EDItEUR, IDF and IFPI are members among others) provides a high-level generic model and vocabulary which acts as a framework for the development of interoperable schemas. It is especially focussed on the integration of rights and descriptive metadata, recognizing "events" as the underlying common denominator.

Finally, the ONIX International initiative under the EDItEUR umbrella has emerged since April as perhaps the most likely candidate to provide a true multimedia content interchange standard (or set of specifications). It is a measure of how fragile and fast-moving events are throughout this sector that, as recently as March, it looked unlikely that there would even be a single agreed EPICS/ONIX specification for the book industry, let alone the present possibility of its extension to its audio and audiovisual product relatives. The agreed spec represents a healthy marriage of American market-driven urgency and European analytic thoroughness without ultimately seriously compromising either, so hopes are high for its widespread adoption and success within "the industry formerly known as the book industry" at least. A number of intermediary organizations, including MUZE Inc, who are pioneering the use of RDF in this sector, are committed to ONIX compliance.

Schema mapping/registries

As elsewhere, this is in its infancy in the publishing sector. The most public activity is the MPEG7 medmet initiative, which has begun with the relationship between the SMPTE and Dublin Core sets. At this stage it would be best to describe this as experimental.
Maintained by: UK Office for Library and
Information Networking (UKOLN)