Metadata Watch Report #5

APPENDIX C: Domain report: publishing sector

Correspondent: Laurie Causton, Clearbay Limited

Current state of domain

The IPTC's (www.iptc.org) 2001 Spring Meeting was the first full session since the formal adoption of NewsML (version 1.0), and related metadata issues formed a large part of the discussions. While there was no call for changes to NewsML metadata, a number of issues were seen to need further work - the handling of updates; digital rights; transport mechanisms; metadata structuring standards; standardising physical metadata; elimination of indirection; and formulating a policy on extensibility. Moreover, there was a need to ensure continuing metadata compatibility between NewsML, NITF and the IIM, and it was agreed that there should be co-operation with other organisations involved in the development and application of metadata.

January saw a number of proposals for changes and additions to the IPTC's NITF (News Industry Text Format) (www.nitf.org). Overall, the IPTC has been taking a hard look at overlaps and the need for collaboration (more on this in the next section).

CISAC (www.cisac.org) has been making progress in the implementation of CIS (Common Information System), deciding on the creation of a new authority within CISAC, the "CIS Supervisory Board", to serve as the administrative council for the CIS. Full details are being worked out, but the objective is to simplify and streamline the administration of CIS, with CISAC itself becoming more of an agency managing the technical standards.

Santa Fe is dead; long live OAI - the Santa Fe Convention of the Open Archives Initiative has been discontinued.
Attention is now focussed on the Open Archives Initiative Protocol for Metadata Harvesting (www.openarchives.org/OAI/openarchivesprotocol.htm), designed to offer an application-independent interoperability framework for use by communities engaged in publishing content on the Web.

In April the PRISM Working Group (Publishing Requirements for Industry Standard Metadata, www.prismstandard.org) announced the release of version 1.0 of the PRISM metadata specification, providing a metadata vocabulary for print and online publishing. PRISM is a good example of the collaboration and the moves to eliminate redundancies and overlap that can be seen these days - more than 20 diverse organisations were involved in the specification, and it builds on existing standards such as the Dublin Core and RDF. Moreover, the Working Group is developing style sheets to make PRISM metadata interoperate with complementary standards such as NITF and NewsML.

In November 2000 the Association of American Publishers' (www.publishers.org) Open Ebook Publishing Standards Initiative (www.doi.org/ebooks/doi-eb.html) had recommended the adoption of Digital Object Identifiers (DOIs) as the primary identification system for managing the metadata associated with the development of eBooks. The IDF itself has been active in this area, with the first international DOI-EB (DOI for E-books) meeting held early this year, following the formation of the DOI-EB Working Group. Metadata seems to be enjoying increased attention in the IDF, a point underlined perhaps by its recently appointed Business Development Director, Stephen Mooney: "The persistent identification of intellectual property entities combined with interoperable metadata is the key to effective and efficient commercial rights transactions. DOI is focused on exactly this space..."
The issue of intellectual property entities for multimedia is in fact the basis of another recent initiative in the IDF, a study into a Rights Data Dictionary, which will be based on the <indecs> framework (www.indecs.org) and which will be known as <indecs>2. The DOI's namespace for metadata management will be donated to the consortium developing the RDD.

There was a good deal of discussion of metadata Application Profiles in the last Metadata Watch report, and now the IDF has come up with its own (a DOI-AP), described as "the functional specification of an application (or set of applications) of the DOI System to a class of intellectual property entities that share a common set of attributes" (http://www.doi.org/doi-ap.html). A DOI-AP is intended to enable implementation of an application (or set of related applications) in a particular environment, ranging from relatively simple discovery to complex e-commerce and rights management applications. It is partly defined in terms of a metadata schema that is always a superset of the DOI Kernel metadata schema. All DOI metadata schemas will be based on the DOI Namespace (DOI-NS), a project that is currently in hand to support the development of DOI-APs. This means that all DOI metadata will be fully compliant with the <indecs> analysis.

And further to eBook developments, the EBX Working Group (www.ebxwg.org) formally combined with the Open eBook Foundation (www.oebf.org) in March, with a plan to concentrate their efforts toward the efficient development and widespread adoption of electronic publishing standards. Activity now seems to be concentrated in the OeBF, judging by the respective web sites - that of the EBX WG is now static. The OeBF has now prepared its Requirements Portal to gather participants' requirements for all aspects of e-publishing. The idea is that the requirements thus collected can be debated and considered for incorporation into the e-publishing standards to be developed by the OeBF.
Lastly, in February Final Draft Standard 15707 for the International Standard Musical Work Code (ISWC) was distributed to ISO's member bodies for voting and approval, to be completed in May.

Overlaps and gaps identification

Recent months have seen fewer overlaps and gaps arising, and rather more activity in promoting joint use and collaboration to offset or minimise overlaps.

On the NewsML front, a presentation on NewsML has been made to the SMPTE Metadata Committee, and a return presentation on the SMPTE metadata work was made during the IPTC Spring Meeting. The PRISM Working Group, whose specification is now at the Working Draft stage, is looking at the issues involved in joint use of its specification and NewsML before finalising the specification. Although the aims of both sides are generally complementary, there has been some overlap between the standards, and a NewsML report has proposed ways of achieving compatibility.

The nature of the content carried by NewsML will influence the use and development of the standard, and this content can be in many different forms. A survey found more than four hundred standards (generally XML-based) being developed, or already in use, for specialised industries. There is already overlap between many of these initiatives, but co-operation is happening in a number of areas.

In general terms, the IPTC Spring Meeting called for co-operation with other organisations involved in metadata development, and there is various other evidence, noted above, of the move towards greater co-operation or combination of effort: the "merger" of the OeBF and the EBX Working Group (there is already the OeBF's standards co-ordination initiative, as described in the last report); the PRISM Working Group's aim to make PRISM metadata interoperate with complementary standards; and the IDF's collaboration with various bodies on <indecs>2.

Trends

One trend certainly seems to be greater collaboration, discussed in the previous section.
Part of this might be attributable to a growing recognition of the importance (or inevitability?) of certain key initiatives - those which have become, or are becoming, a stable feature of the metadata landscape in publishing - such as, perhaps, Dublin Core and the DOI. This may arise from a de facto acceptance of their applicability and utility, or simply from their level of adoption by the industry. The International DOI Foundation themselves recognise the need for collaboration and awareness, citing activities in, for example, the OeBF, the World Wide Web Consortium, the Internet Engineering Task Force (IETF) and MPEG as a motivation for their funding of a "content industry wide" Rights Data Dictionary. This is a trend which should continue.

Many metadata ventures understandably arise out of the needs of a certain domain, such as publishing, and, at least initially, focus on those particular needs. But that scope can broaden and boundaries start to blur. Publishing needs to deal with audio-visual content as much as textual, but the audio-visual sector has its own metadata initiatives. There is already movement towards broader collaboration, such as <indecs>2, which is aimed at multimedia.

Another example is provided by the moves made by the Tribune group, a US-based multimedia company covering newspapers and other publications, TV, cable news, and radio, as well as Internet news and information services. They want an economical and practical content sharing system for multimedia, and they see a solution in some form of single repository with a searchable index and common metadata. They have analysed how key areas of metadata would map between standards - difficult because, as noted above, standards have been developed to meet different needs, giving different metadata structures - and conclude that converging these standards would be impracticable.
This raises the question of how to use the metadata assets effectively, and as a first step they are proposing that the IPTC and SMPTE (Society of Motion Picture and Television Engineers - www.smpte.org) should get together to develop a common Media Independent Protocol (MIP), with each group defining the mapping of its metadata to the protocol. Thus far, the proposal has been favourably received by both the IPTC and the SMPTE, and work is continuing.

Perhaps another trend may be the development of more advanced approaches to metadata construction. Much of content description can be based on keywords, but these have their limitations - they can be ambiguous or over-specific. The Machine Understanding Group at the MIT Media Laboratory has a long-running "The News in the Future" (NIF) programme (nif.www.media.mit.edu); within that, there is research into ways of describing content using disambiguated concepts. A Structured Controlled Vocabulary (SCV) called BRICO controls the relationships between terms, broadly in the manner of a thesaurus, but not the terms themselves. Past and current sponsors of NIF include a number of organisations from the publishing and broadcasting sectors. As an example of the use of this idea, the existing keyword system in NITF could be complemented with suitably developed concept references.

Main issues

Activity in the area of e-books continues to increase, and the AAP, IDF and OeBF are all working hard in this area. However, while much of this work may be essential to a viable e-book market, that market may take a while to happen. Gartner Group sees e-books as in their early stages, even though vendors "have been trying to market these things for some time." Device sales have been low, with Jupiter Media Metrix estimating fewer than 50,000 e-book hardware devices in use in the United States, and attributing the low number to lack of content and high prices; and they see an e-book audience of only 1.9 million by the end of 2005.
Jupiter's Robert Hertzberg commented "Reading an e-book is just like reading a book ... but it's just less fun, more expensive and heavier. That's not much of a marketing motto." Lack of a single software standard or hardware platform is also seen as a contributing factor, and amongst the standards issues are those of copyright, distribution, and security, which is where metadata has a role.

Forrester Research agrees, predicting slow growth. They estimate digital delivery of custom-printed books, textbooks, and e-books to account for total revenues of $7.8 billion by 2005, around 17.5 percent of publishing industry revenues, but only $251 million will come from e-books and the necessary devices.

These industry observers generally see technical and educational books as the first which may gain acceptance, since their audience is more attuned to digital delivery. The industry appears to concur - in January, netLibrary, Inc and Houghton Mifflin announced plans to launch a digital textbook initiative, and February saw McGraw-Hill and the American Society of Mechanical Engineers International (ASME) forming alliances with technology vendors to improve digital delivery to specialised audiences, believing that education and professional services were the most immediate prospects for e-publishing.

Not all pundits agree with this gloomy forecast. IDC sees demand for e-books building slowly in 2001, then exploding in 2002, with the US market growing from $9 million in 2000 to $414 million in 2004. But note that, as recorded in the last Metadata Watch report, there had been estimates of $12 million sales in downloaded books in 1999. And a recent Seybold survey showed a lukewarm attitude in North America towards reading electronic content and even less of a commitment to spending money for it. In fact, two-thirds of all respondents were "not at all likely" to purchase an e-book or dedicated device in the next year.
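The analyst figures quoted above can be put on a common footing with a little arithmetic. The sketch below uses only the numbers given in this report; the derived values (implied industry total, shares, yearly growth rate) are back-of-envelope implications and were not published by Forrester or IDC, and the function names are illustrative only.

```python
# Figures as quoted in the report (US dollars). Everything derived from them
# below is an implication of the quoted numbers, not an analyst's own figure.
FORRESTER_DIGITAL_REVENUE_2005 = 7.8e9  # custom-printed books, textbooks, e-books
FORRESTER_SHARE_OF_INDUSTRY = 0.175     # "around 17.5 percent" of industry revenues
FORRESTER_EBOOK_REVENUE_2005 = 251e6    # e-books and the necessary devices

IDC_US_MARKET_2000 = 9e6
IDC_US_MARKET_2004 = 414e6


def implied_total(part: float, share: float) -> float:
    """Return the total implied when `part` is the fraction `share` of it."""
    return part / share


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate taking `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


industry_total = implied_total(
    FORRESTER_DIGITAL_REVENUE_2005, FORRESTER_SHARE_OF_INDUSTRY
)
ebook_share_of_digital = FORRESTER_EBOOK_REVENUE_2005 / FORRESTER_DIGITAL_REVENUE_2005
ebook_share_of_industry = FORRESTER_EBOOK_REVENUE_2005 / industry_total
idc_yearly_growth = compound_annual_growth(
    IDC_US_MARKET_2000, IDC_US_MARKET_2004, 2004 - 2000
)

if __name__ == "__main__":
    print(f"Implied industry revenue in 2005: ${industry_total / 1e9:.1f} billion")
    print(f"E-books as share of digital delivery: {ebook_share_of_digital:.1%}")
    print(f"E-books as share of industry revenue: {ebook_share_of_industry:.2%}")
    print(f"IDC implied yearly growth 2000-2004: {idc_yearly_growth:.0%}")
```

On these assumptions, Forrester's figures imply an industry total of roughly $44.6 billion, of which e-books and devices would be well under one percent, while IDC's forecast implies the US market multiplying by about 2.6 each year (growth of roughly 160 percent a year) over four years.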
So, on one hand we have the publishing industry and a good number of technology vendors committing themselves heavily in this market; on the other we have an audience which is showing some evidence of indifference and little evidence of imminent growth. The standards work is necessary, and metadata initiatives will have a key role in enabling e-book production and commerce, but it does rather look as though they will not need to hurry.

Multilingualism

The publishing sector has historically seen more of a focus on trading and rights management, and therefore on developing the enabling mechanisms for business use. Accordingly, multilingual aspects have perhaps taken a back seat.

This is not to say that language is totally ignored. Clearly, in describing content, the source language of that content is a valuable descriptive element, and is commonly found in publishing metadata schemes. Indeed, various initiatives have adopted Dublin Core, and hence employ its language element. As an example, EBX, focussed on trading and distribution of e-books, employs Dublin Core for what it calls its 'concise metadata', while suggesting schemes such as ONIX for 'extended' metadata, but within its specification there is no mention of language or multilingual aspects. And in any event, employing a language descriptor is not the same as providing for multilingualism.

Moreover, with the emphasis on business processing, typically within an XML structure, the formal elements cannot admit linguistic variation; they must be consistently presented to comply with the protocol. For example, PRISM uses Dublin Core, and also describes its own elements Dublin Core-style, by way of two things: an Identifier, a formal XML element type (a protocol element) which must be expressed as-is, such as prism:distributor; and a Name, such as Distributor, which can be expressed in any language.

Controlled vocabularies

Controlled vocabularies occur to a limited extent in publishing metadata.
Certainly, given the common adoption of Dublin Core, publishing metadata can follow any controlled vocabulary aspects of that initiative. Certain others offer vocabularies - NewsML proposes a controlled vocabulary for its Topic Types, for example.

PRISM is notable here - indeed, its aim is the development of a standard XML metadata vocabulary. As the PRISM people say: "But while XML specifies how things can be encoded for exchange, it does not specify what information must be exchanged. Therefore, the publishing industry needs standard vocabularies such as PRISM to realise the potential of e-commerce in online publishing." It defines sets of controlled vocabularies, for example for resource types and categories.

Moreover, although these are early days yet, the International DOI Foundation (IDF) is funding the feasibility study for the development of a Rights Data Dictionary (RDD) - "a common dictionary or vocabulary for intellectual property rights."

Even so, one might expect more development of controlled vocabularies in this sector, since the e-business bias should encourage descriptive precision, but there is not much evidence of this at present; perhaps the priorities for the moment are different.
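The distinction drawn in this appendix between a fixed protocol identifier (such as prism:distributor), a display name that may be translated, and a value drawn from a controlled vocabulary can be illustrated with a minimal sketch. This is not taken from the PRISM specification: the class and function names, the French label, and the vocabulary terms are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ElementDefinition:
    """A metadata element: one fixed identifier, many translatable names."""
    identifier: str         # protocol element, always written as-is
    names: Dict[str, str]   # language code -> human-readable name


# The identifier and the English name come from the report's example;
# the French name is an illustrative translation, not an official label.
DISTRIBUTOR = ElementDefinition(
    identifier="prism:distributor",
    names={"en": "Distributor", "fr": "Distributeur"},
)

# Hypothetical controlled vocabulary for resource types (placeholder terms,
# not the actual PRISM vocabulary).
RESOURCE_TYPES: FrozenSet[str] = frozenset({"article", "photo", "illustration"})


def display_name(element: ElementDefinition, language: str, fallback: str = "en") -> str:
    """Return the element's name in `language`, or in `fallback` if untranslated."""
    return element.names.get(language, element.names[fallback])


def is_allowed(term: str, vocabulary: FrozenSet[str]) -> bool:
    """Return True if `term` is one of the vocabulary's permitted values."""
    return term in vocabulary


if __name__ == "__main__":
    # The identifier never varies; only the name shown to a reader does.
    for lang in ("en", "fr", "de"):
        print(DISTRIBUTOR.identifier, "->", display_name(DISTRIBUTOR, lang))
    print(is_allowed("article", RESOURCE_TYPES), is_allowed("poem", RESOURCE_TYPES))
```

The point of the design is that software exchanging records compares identifiers and vocabulary terms exactly, while linguistic variation is confined to the names presented to people.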
Maintained by: UK Office for Library and
Information Networking (UKOLN)