Metadata Watch Report #5

Section 5 - Controlled vocabularies

In the relatively short history of metadata for Web resources, many metadata creators have typed in free text as metadata content without using commonly known controlled vocabularies. In the worst case, they simply typed in a couple of keywords that they thought of on the spot. In more organised environments (e.g. producing resources for a specific project or in a specific domain) they may have used a list that was agreed upon within their group of collaborators.

It is becoming widely recognised that interoperability can break down completely if people enter metadata content with no underlying agreements. How can searches for a certain subject deliver relevant resources if every metadata creator assigns keywords that are not known to others? We need at least some form of consistency in order to be able to build useful discovery services.

As many people realise, controlled vocabularies can help to address this problem. Some people use words like 'ontologies', 'semantic registers', 'concept categories' or 'thesauri'. The exact definitions of these words, and the objectives that they serve, may vary from domain to domain; the important point for this discussion is that they all define a finite set of terms, usually maintained by some organisation and intended to bring some structure to the subjects covered in a certain domain.

Of course, the idea is not new - in various domains controlled vocabularies have been in use for a long time, sometimes for centuries. We find them in the form of flat lists of terms, but they can also be structured hierarchically or be part of networked structures that allow various types of relations to be expressed between terms.
These lists help creators of metadata by giving them a list of subjects to choose from, maybe in the form of drop-down menus in their user interface. The same list can of course be used to suggest search terms to searchers, thereby enhancing the chances that the user will find relevant results.

Some controlled vocabularies also address the multilingual issue by providing parallel lists in multiple languages. The list for a specific language can then be used by metadata creators and searchers speaking that language as above, while smart tools can either build a single language-independent index or expand searches to search parallel indexes based on the equivalence of the terms in the various languages.

Other controlled vocabularies are based on a numerical approach. These have the advantage of being language-independent and can be presented to users in the right language in the user interface. The disadvantage is that it is not obvious from looking at the raw metadata what the code stands for.

The issue of controlled vocabularies is now becoming an important discussion topic within and between metadata standardisation and implementation activities. When appropriate controlled vocabularies are publicly available and are consistently used in metadata creation, more useful search facilities can be built and long-term interoperability can be ensured.
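As an illustration of the multilingual and code-based approaches described above, the following is a minimal Python sketch, not a reference implementation. The vocabulary, its numeric codes, its labels and all function names are invented for this example and do not come from any real scheme. It shows a per-language pick list, a single language-independent index keyed by code, and a search that finds resources regardless of the language in which they were described.

```python
# Hypothetical controlled vocabulary: the codes and labels below are
# invented for illustration and do not belong to any real scheme.
VOCABULARY = {
    "001": {"en": "painting", "fr": "peinture", "de": "Malerei"},
    "002": {"en": "sculpture", "fr": "sculpture", "de": "Skulptur"},
    "003": {"en": "manuscript", "fr": "manuscrit", "de": "Handschrift"},
}


def labels_for(lang):
    """Return the sorted pick list for one language (e.g. for a drop-down menu)."""
    return sorted(labels[lang] for labels in VOCABULARY.values() if lang in labels)


def code_for(label, lang):
    """Map a label in the given language to its code, or None if uncontrolled."""
    wanted = label.strip().casefold()
    for code, labels in VOCABULARY.items():
        if labels.get(lang, "").casefold() == wanted:
            return code
    return None


def display(code, lang):
    """Turn a raw code back into a readable label for the user interface."""
    return VOCABULARY[code][lang]


def build_index(records):
    """Build one language-independent index: code -> set of resource ids.

    `records` is an iterable of (resource_id, label, lang) tuples.
    Keywords that are not in the vocabulary are rejected.
    """
    index = {}
    for resource_id, label, lang in records:
        code = code_for(label, lang)
        if code is None:
            raise ValueError(f"{label!r} is not a controlled term for {lang!r}")
        index.setdefault(code, set()).add(resource_id)
    return index


def search(index, label, lang):
    """Find resources for a term entered in any supported language."""
    code = code_for(label, lang)
    if code is None:
        return []
    return sorted(index.get(code, set()))


# Resources described by creators working in different languages.
records = [
    ("r1", "peinture", "fr"),
    ("r2", "Painting", "en"),
    ("r3", "Skulptur", "de"),
]
index = build_index(records)
```

With this index, a German-speaking searcher calling `search(index, "Malerei", "de")` retrieves both `r1` and `r2`, although neither was described in German. The stored code `"001"` means nothing on its own, which is the disadvantage of numeric codes noted above; `display("001", "de")` is what makes it readable in the user interface.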
Maintained by: UK Office for Library and
Information Networking (UKOLN)