6. Metadata and Metadata Standards [2] (1)
1 of 37
file:///C:/Documents%20and%20Settings/glushko/My%20Documents/L...
6. Metadata and Metadata
Standards [2]
IS 202 - 14 September 2006
Bob Glushko
9/13/2006 8:50 PM
Plan for IO & IR Lecture #6
The same item using different metadata models and
syntaxes
MARC Record
Dublin Core
Metadata incompatibility and interoperability
"Metadata" {and,or,vs} "Vocabulary"
Cory Doctorow's Rant
The Same Item in Different
Metadata Models
MARC (MAchine-Readable Cataloging) Record
International Standard Bibliographic Description (ISBD)
RFC 1807
Text Encoding Initiative (TEI) Header
Dublin Core
The MARC Record
1968: When the Library of Congress began to use computers in the 1960s, it devised the LC MAchine-Readable Cataloging (MARC) format, a system that uses brief numbers, letters, and symbols within the cataloging record itself to mark different types of information
MARC mandates a rich description with strong datatyping
and vocabulary control for the values of its metadata
elements
In the 1980s (with a further revision in 2002), the Anglo-American Cataloguing Rules (AACR) extended the MARC standard so that it could describe music and various other kinds of "non-book" entities
This "integration" raises some substantial technical and theoretical concerns
MARC is often criticized for being unsuited to the modern
computing environment
The MARC Record [Example]
ID:DCLC9124851-B
RTYP:c
ST:p
FRN:
MS:c EL:
AD:06-20-91
CC:9110 BLT:am
DCF:a
CSC:
MOD:
SNR:
ATC:
UD:04-11-92
CP:cou
L:eng
INT:
GPC:
BIO:
FIC:0
CON:b
PC:s
PD:1992/
REP:
CPI:0
FSI:0
ILC:a
II:1
MMD:
OR:
POL:
DM:
RR:
COL:
EML:
GEN: BSE:
010 9124851
020 0872878112 (cloth)
020 0872879674 (paper)
040 DLC$cDLC$dDLC
050 00 Z693$b.W94 1991
082 00 025.3$220
100 1 Wynar, Bohdan S.
245 10 Introduction to cataloging and classification /$cBohdan S. Wynar.
250 8th ed. /$bArlene G. Taylor.
260 Englewood, Colo. :$bLibraries Unlimited,$c1992.
300 xvii, 633 p. :$bill. ;$c24 cm.
440 0 Library science text series
504 Includes bibliographical references (p. 591-599) and index.
650 0 Cataloging.
650 0 Subject cataloging.
650 0 Classification$xBooks.
630 00 Anglo-American cataloguing rules.
700 10 Taylor, Arlene G.,$d1941-
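Each variable field above follows one pattern: a three-digit tag, optional indicators, then data whose first subfield ($a) is implicit and whose later subfields start with "$" plus a one-character code. In this kind of display, "$" stands in for the actual MARC subfield delimiter character. The sketch below splits one display line into those parts. It is a heuristic written only for this display convention (it treats a 1-2 digit token after the tag as indicators), not a parser for real MARC 21 files; the function and type names are hypothetical.

```python
import re
from typing import NamedTuple


class Field(NamedTuple):
    tag: str
    indicators: str
    subfields: list[tuple[str, str]]


def parse_display_line(line: str) -> Field:
    """Split a display line such as '245 10 Title /$cAuthor.' into its parts.

    Heuristic for this display only: a 1-2 digit token right after the tag
    is read as the indicators; everything else is field data.
    """
    tag, _, rest = line.partition(" ")
    indicators = ""
    first, _, remainder = rest.partition(" ")
    if re.fullmatch(r"\d{1,2}", first) and remainder:
        indicators, rest = first, remainder
    chunks = rest.split("$")
    # The first chunk carries no code in the display: it is subfield $a
    subfields = [("a", chunks[0].strip())]
    for chunk in chunks[1:]:
        if chunk:
            subfields.append((chunk[0], chunk[1:].strip()))
    return Field(tag, indicators, subfields)
```

For example, parse_display_line("040 DLC$cDLC$dDLC") yields tag "040", no indicators, and subfields a, c, and d, each with the value "DLC".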
ISBD Syntax
Title Proper (GMD) = Parallel title : other title info / First statement of responsibility ; others. -- Edition information. -- Material. -- Place of Publication : Publisher Name, Date. -- Material designation and extent ; Dimensions of item. -- (Title of Series / Statement of responsibility). -- Notes. -- Standard numbers : terms of availability (qualifications).
ISBD Instance
Introduction to cataloging and classification / Bohdan S.
Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo. :
Libraries Unlimited, 1992. -- (Library science text series).
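In ISBD, the punctuation itself marks the areas: " / " introduces a statement of responsibility, " : " separates place from publisher, and ". -- " separates areas. The sketch below assembles the instance above from separate field values. It covers only the areas that appear in this instance (title and responsibility, edition, publication, series); the function and parameter names are hypothetical.

```python
def isbd_description(*, title: str, responsibility: str, edition: str = "",
                     edition_responsibility: str = "", place: str = "",
                     publisher: str = "", date: str = "", series: str = "") -> str:
    """Assemble a partial ISBD description (areas 1, 2, 4, and 6 only)."""
    areas = [f"{title} / {responsibility}"]
    if edition:
        areas.append(f"{edition} / {edition_responsibility}"
                     if edition_responsibility else edition)
    if place and publisher and date:
        areas.append(f"{place} : {publisher}, {date}")
    if series:
        areas.append(f"({series})")
    # Every area ends with one full stop (an existing one, as in "ed.",
    # is not doubled), and areas are separated by " -- "
    return " -- ".join(a if a.endswith(".") else a + "." for a in areas)


print(isbd_description(
    title="Introduction to cataloging and classification",
    responsibility="Bohdan S. Wynar", edition="8th ed.",
    edition_responsibility="Arlene G. Taylor", place="Englewood, Colo.",
    publisher="Libraries Unlimited", date="1992",
    series="Library science text series"))
```

Because the separators carry the structure, a program can generate ISBD text easily, but recovering the fields from it requires parsing the punctuation back out.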
RFC 1807
BIB-VERSION:: CS-TR-v2.1
ID:: UCB//123456
ENTRY:: September 9, 1997
TYPE:: BOOK
TITLE:: Introduction to cataloging and classification
AUTHOR:: Wynar, Bohdan S.
AUTHOR:: Taylor, Arlene G.
DATE:: 1992
PAGES:: 633
COPYRIGHT:: Libraries Unlimited, 1992
SERIES:: Library Science Text Series
END:: UCB//123456
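An RFC 1807 record is a flat list of "FIELD:: value" lines, and a field such as AUTHOR may repeat. A minimal reader for the record above collects the values into a dictionary of lists. It assumes that every value fits on one line; continuation lines (used for long fields such as ABSTRACT) are not handled. The function name is hypothetical.

```python
from collections import defaultdict


def parse_rfc1807(text: str) -> dict[str, list[str]]:
    """Collect 'FIELD:: value' lines; repeated fields keep every value in order."""
    record: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        name, sep, value = line.partition("::")
        if sep:  # skip lines without the '::' field separator
            record[name.strip()].append(value.strip())
    return dict(record)


sample = """BIB-VERSION:: CS-TR-v2.1
ID:: UCB//123456
AUTHOR:: Wynar, Bohdan S.
AUTHOR:: Taylor, Arlene G.
DATE:: 1992
END:: UCB//123456"""
print(parse_rfc1807(sample)["AUTHOR"])
```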
TEI Header (Minimal)
<teiHeader>
<fileDesc>
<titleStmt>
<title>Introduction to cataloging and classification</title>
<author>Bohdan S. Wynar</author>
<respStmt><resp>8th edition by</resp>
<name>Arlene G. Taylor</name>
</respStmt>
</titleStmt>
<publicationStmt>
<distributor>Libraries Unlimited</distributor>
</publicationStmt>
<sourceDesc>
<bibl>Introduction to cataloging and classification /
Bohdan S. Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo.
: Libraries Unlimited, 1992.
</bibl>
</sourceDesc>
</fileDesc>
</teiHeader>
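Unlike MARC or RFC 1807, the TEI header is nested XML, so its metadata is reached by path (fileDesc/titleStmt/title) and the markup must be well-formed before any of it can be read. The sketch below uses Python's standard ElementTree on a well-formed copy of a minimal header. It assumes the original author is recorded in an `<author>` element and uses no TEI namespace, matching the unqualified tags on the slide; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed well-formed minimal header (no TEI namespace, as on the slide)
HEADER = """<teiHeader>
<fileDesc>
<titleStmt>
<title>Introduction to cataloging and classification</title>
<author>Bohdan S. Wynar</author>
<respStmt><resp>8th edition by</resp><name>Arlene G. Taylor</name></respStmt>
</titleStmt>
<publicationStmt><distributor>Libraries Unlimited</distributor></publicationStmt>
<sourceDesc><bibl>Introduction to cataloging and classification</bibl></sourceDesc>
</fileDesc>
</teiHeader>"""


def header_summary(xml_text: str) -> dict[str, Optional[str]]:
    """Pull a few fields out of a minimal TEI header by path."""
    header = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    stmt = "fileDesc/titleStmt"
    return {
        "title": header.findtext(f"{stmt}/title"),
        "author": header.findtext(f"{stmt}/author"),
        "reviser": header.findtext(f"{stmt}/respStmt/name"),
        "distributor": header.findtext("fileDesc/publicationStmt/distributor"),
    }
```

A header whose last line is written as `<teiHeader>` instead of `</teiHeader>` is rejected with ET.ParseError before any field can be extracted.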
Dublin Core
Proposed in 1995 as a standard set of metadata elements, simple enough to be supplied by a document's author rather than by a professional metadata-maker
DC is a set of elements, described abstractly; all of them are optional
The semantics of DC were established by an
international, cross-disciplinary group of professionals
from librarianship, computer science, text encoding, the
museum community, and other related fields
There are specifications of how to use it in numerous
syntaxes (especially XML and RDF) and languages
The Dublin Core Elements [1]
TITLE -- the name given to the resource
IDENTIFIER -- an unambiguous reference to the resource
within a given context
SUBJECT -- the topic of the resource's content; key
words or classification phrases
CREATOR -- an entity primarily responsible for making
the content of the resource
CONTRIBUTOR -- an entity responsible for making
contributions to the content of the resource
PUBLISHER -- the entity primarily responsible for making
the resource available
DATE -- a date associated with an event in the life cycle
of the resource
FORMAT -- the physical or digital manifestation of the
resource
The Dublin Core Elements [2]
DESCRIPTION -- an account of the content of the
resource; abstract, TOC, etc.
LANGUAGE -- a language of the intellectual content of
the resource
TYPE -- the nature or genre of the content of the resource
RIGHTS -- information about rights held in and over the
resource
SOURCE -- reference to a resource from which the
present resource is derived
RELATION -- reference to a related resource
COVERAGE -- the extent or scope of the content of the
resource
AUDIENCE -- a class of entity for which the resource is
intended or useful
Dublin Core [Example]
<dc:title>Introduction to cataloging and classification</dc:title>
<dc:creator>Taylor, Arlene G.</dc:creator>
<dc:contributor>Wynar, Bohdan S.</dc:contributor>
<dc:date>1992</dc:date>
<dc:format>book</dc:format>
...
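The dc: prefix in the example is shorthand for the Dublin Core element namespace, http://purl.org/dc/elements/1.1/. The sketch below serializes (element, value) pairs as namespaced XML with ElementTree. Because every DC element is optional and repeatable, it accepts any subset in any order. The `<metadata>` wrapper element and the function name are hypothetical, since the example shows only the DC elements themselves.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize the namespace with the dc: prefix


def dc_record(pairs: list[tuple[str, str]]) -> str:
    """Serialize (element, value) pairs as dc: elements inside a wrapper element."""
    root = ET.Element("metadata")  # hypothetical wrapper
    for name, value in pairs:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dc_record([
    ("title", "Introduction to cataloging and classification"),
    ("creator", "Taylor, Arlene G."),
    ("contributor", "Wynar, Bohdan S."),
    ("date", "1992"),
])
print(xml_text)
```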
Using the Dublin Core
"Some information may appear to belong in more than
one metadata element"
"There is potential semantic overlap between some
elements"
"There will occasionally be some judgment required from
the person assigning the metadata"
Metadata Incompatibility
All of these metadata models and syntaxes coexist, but they are not fully compatible
Some of this incompatibility reflects the different purposes and audiences for which each standard was created
This is reflected in different scopes and granularity of the
metadata elements
There are also no guarantees of semantic equivalence
among the seemingly corresponding metadata elements
Achieving Metadata Interoperability
[1]
"We do not need a bibliographic record format. We need
a bibliographic metadata infrastructure... Our systems
must be able to accommodate a great diversity of record
formats to provide us with the flexibility and power that
only such diversity can provide" (Tennant)
Interoperability doesn't require that two systems be
identical in design or implementation, only that they can
exchange information and use the information they
exchange.
Interoperability requires that the information being exchanged be conceptually equivalent
Achieving Metadata Interoperability
[2]
If conceptual equivalence can be established, converting one implementation to another is a necessary but often trivial step
But it isn't always possible to establish equivalence, and equivalence is often not bidirectional because one model is "smarter" or "richer" than another
And even when equivalence can be established, it may not be possible to automate the transformation
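Why equivalence is often one-way: when a richer model distinguishes several fields that a simpler model collapses into one, the forward mapping is a simple lookup, but the reverse mapping is ambiguous. The sketch below inverts a many-to-one field mapping and reports the target fields that cannot be traced back to a single source. The role names in ROLE_TO_DC are hypothetical, chosen only to illustrate the asymmetry.

```python
# Hypothetical many-to-one mapping from a richer model's roles to DC elements
ROLE_TO_DC = {
    "author": "creator",
    "editor": "contributor",
    "illustrator": "contributor",
    "translator": "contributor",
    "publisher": "publisher",
}


def invert(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Collect every source field that lands on each target field."""
    inverse: dict[str, list[str]] = {}
    for source, target in mapping.items():
        inverse.setdefault(target, []).append(source)
    return inverse


def ambiguous_targets(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Target fields that cannot be mapped back to one source field."""
    return {t: s for t, s in invert(mapping).items() if len(s) > 1}


print(ambiguous_targets(ROLE_TO_DC))
```

Going from the rich model to DC loses the editor/illustrator/translator distinction, so a DC contributor cannot be converted back without extra information, which is also why such a transformation may not be automatable.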
Metadata Encoding and
Transmission Standard (METS)
http://www.loc.gov/standards/mets
Developed by the Digital Library Federation as an implementation strategy for preservation metadata (needed to periodically refresh and migrate the data)
Specifies an XML syntax for packaging metadata that adheres to different standards as parts in a container and associating all of it with the same object
METS doesn't address the problem that the metadata
standards are different; it just defines a standard way to
package a set of them
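The packaging idea can be sketched with ElementTree: each metadata record, whatever its standard, goes into its own dmdSec/mdWrap/xmlData wrapper, and the MDTYPE attribute names the standard. This is a simplified sketch, not a complete METS document. The METS schema also requires a structMap, which is omitted here, so the output would not validate. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def package(sections: list[tuple[str, str, ET.Element]]) -> ET.Element:
    """Wrap (ID, MDTYPE, payload) triples as descriptive metadata sections.

    Simplified: the structMap that METS requires is not generated.
    """
    mets = ET.Element(q("mets"))
    for sec_id, md_type, payload in sections:
        dmd = ET.SubElement(mets, q("dmdSec"), ID=sec_id)
        wrap = ET.SubElement(dmd, q("mdWrap"), MDTYPE=md_type)
        # The payload keeps its own vocabulary; METS only contains it
        ET.SubElement(wrap, q("xmlData")).append(payload)
    return mets


dc_title = ET.Element("{http://purl.org/dc/elements/1.1/}title")
dc_title.text = "Introduction to cataloging and classification"
container = package([("dmd1", "DC", dc_title),
                     ("dmd2", "TEIHDR", ET.Element("teiHeader"))])
print(ET.tostring(container, encoding="unicode"))
```

The payloads are appended unchanged, which mirrors the limitation noted above: the container standardizes the packaging, not the metadata inside it.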
Crosswalks
A transformation that re-encodes, renames, rearranges,
or restructures information from one metadata standard to
another is sometimes called a CROSSWALK
First you need to establish the conceptual equivalence of
information in the source and target models
It is sometimes useful to define equivalences for subsets
or profiles of different metadata models and settle for a
partial crosswalk
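A partial crosswalk can be sketched as a field-name mapping plus a list of what did not map. The sketch below renames and regroups fields from an RFC 1807-style record (as a dictionary of lists) into Dublin Core element names and returns the unmapped fields rather than silently dropping them. The specific pairings in RFC1807_TO_DC (for example SERIES to relation) are hypothetical equivalence judgments, not an official crosswalk.

```python
# Hypothetical equivalences between RFC 1807 fields and DC elements
RFC1807_TO_DC = {
    "TITLE": "title",
    "AUTHOR": "creator",
    "DATE": "date",
    "TYPE": "type",
    "ID": "identifier",
    "COPYRIGHT": "rights",
    "SERIES": "relation",
}


def crosswalk(record: dict[str, list[str]],
              mapping: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Map fields through `mapping`; return the target record and unmapped fields."""
    target: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for field, values in record.items():
        if field in mapping:
            target.setdefault(mapping[field], []).extend(values)
        else:
            unmapped.append(field)  # no established equivalent in the target
    return target, unmapped


record = {
    "TITLE": ["Introduction to cataloging and classification"],
    "AUTHOR": ["Wynar, Bohdan S.", "Taylor, Arlene G."],
    "PAGES": ["633"],
}
dc, leftover = crosswalk(record, RFC1807_TO_DC)
print(dc, leftover)
```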
Interchange Formats
Ideally, any two metadata standards could interoperate by transforming each of them into a common interchange format
This would reduce the roughly N x N requirement for crosswalks from every model to every other (N x (N - 1) one-way crosswalks) to the simpler 2 x N task of transforming each model to and from the interchange format
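The arithmetic behind the interchange-format argument: with N models, direct one-way crosswalks number N x (N - 1), while a hub needs one crosswalk into it and one out of it per model, or 2 x N. The function names below are illustrative.

```python
def direct_crosswalks(n: int) -> int:
    """One-way crosswalks to map each of n models directly to every other model."""
    return n * (n - 1)


def hub_crosswalks(n: int) -> int:
    """One-way crosswalks when each of n models maps to and from one interchange format."""
    return 2 * n


for n in (2, 3, 5, 10):
    print(n, direct_crosswalks(n), hub_crosswalks(n))
```

The two counts are equal at three models, and the hub saves work only beyond that point. With two models a single pair of direct crosswalks is cheaper, but with ten models the hub needs 20 crosswalks instead of 90.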
"Metadata" {and, or, vs}
"Vocabulary"
"Metadata" usually means description information about
some content or entity
Often general-purpose or "horizontal"
"Vocabulary" means the set of terms needed to encode
the semantics in some content domain
Can be horizontal, but often "domain-specific" or "vertical"
A "document type" model is defined by its "vocabulary"
Distinction not always clear or important; both metadata
and vocabularies are MODELS of what they describe
Interoperability, crosswalks, interchange hubs, and related concepts apply to both metadata and vocabularies
"Vocabulary" Interoperability
Example -- The Target Model for an
Order
The XSD Schema for the Expected
Order [1]
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="Order" type="OrderType"/>
<xs:complexType name="OrderType">
<xs:sequence>
<xs:element name="BuyersID" type="xs:string"/>
<xs:element name="BuyerParty" type="PartyType"/>
<xs:element name="OrderLine" type="OrderLineType"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="PartyType">
<xs:sequence>
<xs:element name="ID" type="xs:string"/>
<xs:element name="PartyName" type="PartyNameType"/>
<xs:element name="Address" type="AddressType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="PartyNameType">
<xs:sequence>
<xs:element name="Name" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
The XSD Schema for the Expected
Order [2]
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="Room" type="xs:string"/>
<xs:element name="BuildingNumber" type="xs:string"/>
<xs:element name="StreetName" type="xs:string"/>
<xs:element name="CityName" type="xs:string"/>
<xs:element name="PostalZone" type="xs:string"/>
<xs:element name="CountrySubentity" type="xs:string"/>
<xs:element name="Country" type="xs:string"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="OrderLineType">
<xs:sequence>
<xs:element name="LineItem" type="LineItemType"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="LineItemType">
<xs:sequence>
<xs:element name="BookItem" type="BookItemType"/>
<xs:element name="BasePrice" type="xs:decimal"/>
<xs:element name="Quantity" type="xs:int"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="BookItemType">
<xs:sequence>
<xs:element name="Title" type="xs:string"/>
<xs:element name="Author" type="xs:string"/>
<xs:element name="ISBN" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
The Expected Instance
<Order>
<BuyersID>91604</BuyersID>
<BuyerParty>
<ID>KEEN</ID>
<PartyName>
<Name>Maynard James Keenan</Name>
</PartyName>
<Address>
<Room>505</Room>
<BuildingNumber>11271</BuildingNumber>
<StreetName>Ventura Blvd.</StreetName>
<CityName>Studio City</CityName>
<PostalZone>91604</PostalZone>
<CountrySubentity>California</CountrySubentity>
<Country>USA</Country>
</Address>
</BuyerParty>
<OrderLine>
<LineItem>
<BookItem>
<Title>Foucault's Pendulum</Title>
<Author>Umberto Eco</Author>
<ISBN>0345368754</ISBN>
</BookItem>
<BasePrice>7.99</BasePrice>
<Quantity>1</Quantity>
</LineItem>
</OrderLine>
</Order>
Identical Model with Different Tag
Names [1]
<Customer>
<Number>KEEN</Number>
<Name>
<BusinessName>Maynard James Keenan</BusinessName>
</Name>
<Location>
<Unit>505</Unit>
<StreetNumber>11271</StreetNumber>
<Street>Ventura Blvd.</Street>
<City>Studio City</City>
<ZipCode>91604</ZipCode>
<State>California</State>
<Country>USA</Country>
</Location>
</Customer>
Identical Model with Different Tag
Names [2]
<Acheteur>
<ID>KEEN</ID>
<Nom>
<NomCommercial>Maynard James Keenan</NomCommercial>
</Nom>
<Addresse>
<Appartment>505</Appartment>
<Bâtiment>11271</Bâtiment>
<Rue>Ventura Blvd.</Rue>
<Ville>Studio City</Ville>
<CodePostal>91604</CodePostal>
<Etat>California</Etat>
<Pays>USA</Pays>
</Addresse>
</Acheteur>
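When two models are identical apart from their tag names, as in the Customer and Acheteur examples, the crosswalk reduces to a tag-for-tag lookup table. The sketch below renames every element by its original tag and passes unmapped tags through unchanged. Both tables are hypothetical readings of the examples (for instance, treating State and Etat as CountrySubentity), and the function name is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag-for-tag crosswalks into the target Order vocabulary
CUSTOMER_TAGS = {
    "Customer": "BuyerParty", "Number": "ID",
    "Name": "PartyName", "BusinessName": "Name",
    "Location": "Address", "Unit": "Room",
    "StreetNumber": "BuildingNumber", "Street": "StreetName",
    "City": "CityName", "ZipCode": "PostalZone",
    "State": "CountrySubentity",  # "Country" already matches
}
ACHETEUR_TAGS = {
    "Acheteur": "BuyerParty",  # "ID" already matches
    "Nom": "PartyName", "NomCommercial": "Name",
    "Addresse": "Address", "Appartment": "Room",
    "Bâtiment": "BuildingNumber", "Rue": "StreetName",
    "Ville": "CityName", "CodePostal": "PostalZone",
    "Etat": "CountrySubentity", "Pays": "Country",
}


def rename_tags(xml_text: str, mapping: dict[str, str]) -> ET.Element:
    """Rename each element once, based on its original tag; unmapped tags pass through."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = mapping.get(element.tag, element.tag)
    return root
```

This works only because the structures match one for one. The later cases (attributes, granularity, assembly) need more than renaming.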
Same Model, Attributes Instead of
Elements
<BuyerParty
ID="KEEN"
Name="Maynard James Keenan"
Room="505" BuildingNumber="11271"
StreetName="Ventura Blvd."
City="Studio City"
State="California"
PostalCode="91604"
/>
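Turning the attribute form back into the element-based target model means that each attribute has to be placed on a path. Some attributes are also nested (Name belongs under PartyName, and the address parts belong under Address). The sketch below does this with a hypothetical attribute-to-path table whose order follows the target schema's sequence. The attribute form carries no Country, so the result still lacks an element that the target schema requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from each attribute to a path in the element-based model,
# listed in the order the target schema's sequences expect
ATTRIBUTE_PATHS = {
    "ID": "ID",
    "Name": "PartyName/Name",
    "Room": "Address/Room",
    "BuildingNumber": "Address/BuildingNumber",
    "StreetName": "Address/StreetName",
    "City": "Address/CityName",
    "PostalCode": "Address/PostalZone",
    "State": "Address/CountrySubentity",
}


def attributes_to_elements(party: ET.Element) -> ET.Element:
    """Rebuild an attribute-only element as nested child elements."""
    result = ET.Element(party.tag)
    for attribute, path in ATTRIBUTE_PATHS.items():
        value = party.get(attribute)
        if value is None:
            continue
        *containers, leaf = path.split("/")
        parent = result
        for name in containers:  # create intermediate elements on first use
            child = parent.find(name)
            if child is None:
                child = ET.SubElement(parent, name)
            parent = child
        ET.SubElement(parent, leaf).text = value
    return result
```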
Granularity Conflicts
<Address>
<StreetAddress>11271 Ventura Blvd. #505</StreetAddress>
<City>Studio City 91604</City>
<CountrySubentity>California</CountrySubentity>
<Country>USA</Country>
</Address>
<PartyName>
<FamilyName>Keenan</FamilyName>
<MiddleName>James</MiddleName>
<FirstName>Maynard</FirstName>
</PartyName>
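Granularity conflicts are asymmetric. Going from fine to coarse, as when joining name parts, is easy. Going from coarse to fine means parsing free text under assumptions about its format. The sketch below splits the two coarse address strings in the example. Its regular expressions are assumptions based only on these two strings and will reject other address formats.

```python
import re

# Assumed formats, based only on the strings in the example
STREET = re.compile(r"(?P<number>\d+)\s+(?P<street>.+?)\s+#(?P<room>\w+)")
CITY_POSTAL = re.compile(r"(?P<city>.+?)\s+(?P<postal>\d{5})")


def split_street_address(text: str) -> dict[str, str]:
    """Split '11271 Ventura Blvd. #505' into the finer target elements."""
    match = STREET.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized street address: {text!r}")
    return {"BuildingNumber": match["number"], "StreetName": match["street"],
            "Room": match["room"]}


def split_city(text: str) -> dict[str, str]:
    """Split 'Studio City 91604' into city name and postal zone."""
    match = CITY_POSTAL.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized city line: {text!r}")
    return {"CityName": match["city"], "PostalZone": match["postal"]}


def join_name(first: str, middle: str, family: str) -> str:
    """Fine-to-coarse direction: concatenate the name parts."""
    return " ".join(part for part in (first, middle, family) if part)
```

The reverse of join_name cannot be written reliably: from "Maynard James Keenan" alone, nothing says whether "James" is a middle name or part of a family name.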
Assembly Mismatch - Separate
Customer and Order Documents [1]
<BuyerParty>
<ID>KEEN</ID>
<PartyName>
<Name>Maynard James Keenan</Name>
</PartyName>
<Address>
<Room>505</Room>
<BuildingNumber>11271</BuildingNumber>
<StreetName>Ventura Blvd.</StreetName>
<CityName>Studio City</CityName>
<PostalZone>91604</PostalZone>
<CountrySubentity>California</CountrySubentity>
<Country>USA</Country>
</Address>
</BuyerParty>
Assembly Mismatch - Separate
Customer and Order Documents [2]
<Order>
<BuyersID>91604</BuyersID>
<BuyerParty>
<ID>KEEN</ID>
</BuyerParty>
<OrderLine>
<LineItem>
<BookItem>
<Title>Foucault's Pendulum</Title>
<Author>Umberto Eco</Author>
<ISBN>0345368754</ISBN>
</BookItem>
<BasePrice>7.99</BasePrice>
<Quantity>1</Quantity>
</LineItem>
</OrderLine>
</Order>
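An assembly mismatch can be resolved by joining the separate documents on a shared key, which here is the party ID. The sketch below replaces the order's BuyerParty stub with the full party document whose ID matches and keeps the element in its original position. It assumes each order has exactly one BuyerParty and raises KeyError when no matching party document was supplied. The function name is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def assemble_order(order_xml: str, party_docs: list[str]) -> ET.Element:
    """Replace the order's <BuyerParty> stub with the full party whose <ID> matches."""
    parties = {}
    for text in party_docs:
        party = ET.fromstring(text)
        parties[party.findtext("ID")] = party
    order = ET.fromstring(order_xml)
    stub = order.find("BuyerParty")
    if stub is None:
        raise ValueError("order has no BuyerParty")
    full = parties[stub.findtext("ID")]  # KeyError if the party is missing
    position = list(order).index(stub)   # keep the schema's element order
    order.remove(stub)
    order.insert(position, copy.deepcopy(full))
    return order
```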
Conceptual Incompatibility
<Address>
<Latitude direction="N">37.871</Latitude>
<Longitude direction="W">122.271</Longitude>
</Address>
The "Not So Fast" Cases that Might
Even Validate
The names are the same but the semantics aren't
<BuyerParty>
<ID>555-22-1234</ID>
</BuyerParty>
Validation Does Not Imply
Interoperability
Suppose the document validates against the recipient's schema
The semantics can still differ in important ways (as in the example where a Social Security number appears in the ID element); even the strongest level of validation can fall short of establishing that the "same tags" have exactly the "same meaning" to the sender and recipient
Furthermore, the recipient may not be able to validate all of the business rules that are important
This is a good argument for grounding your conceptual models in industry standards or reference models, or for using XML vocabularies that represent them in authoritative ways
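The gap between validation and meaning can be made concrete. Because the order schema types ID as xs:string, a party ID of "KEEN" and one of "555-22-1234" are equally valid. The sketch below stands in for that schema check and adds a separate business rule that the example schema does not express. The rule and its SSN pattern are hypothetical. Catching this one known misuse still does not establish that sender and recipient mean the same thing by ID.

```python
import re
import xml.etree.ElementTree as ET


def passes_schema_check(party: ET.Element) -> bool:
    """Stand-in for validating PartyType's ID: present, and any string will do."""
    return party.findtext("ID") is not None


# Hypothetical business rule, not in the example schema: an ID must not be an SSN
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def violates_id_rule(party: ET.Element) -> bool:
    return SSN_PATTERN.fullmatch(party.findtext("ID") or "") is not None


keen = ET.fromstring("<BuyerParty><ID>KEEN</ID></BuyerParty>")
ssn = ET.fromstring("<BuyerParty><ID>555-22-1234</ID></BuyerParty>")
for party in (keen, ssn):
    print(party.findtext("ID"), passes_schema_check(party), violates_id_rule(party))
```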
Doctorow on Metadata
People lie
People are lazy
People are stupid
People delude themselves
Metadata metrics distort it
Metadata suffers from "the vocabulary problem"
Graded Assignment 1: Designing a
Vocabulary
Develop a vocabulary for describing some aspects of sports or some aspects of music; choose the domain that interests you the most.
Identify and define at least 10 terms or semantic
components needed in the vocabulary
Test the adequacy of the coverage of your sports or
music vocabulary by using it to describe a real or
hypothetical event in one existing sport or music category
of your choosing
This does not require any XML
Due next Tuesday, 19 September, before class
Readings for IO & IR Lecture #7
Svenonius Chapter 6, Chapter 8 (127-132)
Karl Fast, Fred Liese, and Mike Steckel. What is a
controlled vocabulary?
Karl Fast, Fred Liese, and Mike Steckel. Creating a
controlled vocabulary.
Glushko and McGrath, Document Engineering, Chapter
12 (399-406)
http://www.sims.berkeley.edu/~glushko/DocumentEngineering