2013-05-08

Image metadata on the Web: URL as identifier

Before you start thinking about common metadata for your images (creator, date created, caption, license), first consider what I think is the most important piece of metadata: A unique identifier for your image. And please make it a URL. Why?

First, you want to avoid duplicates in search engine results. You’ll be using the same image on different Web pages, possibly with slight variations: Different sizes, file formats, or cropping. Which means that the URL to the image file is not the same. A unique identifier makes sure others can find out these are just renditions or variations of the same image. (Current image search engines often show lots of duplicates. If they don’t make use of our nice identifiers once we add them, we can always roll our own search engine… ☺ Yes, I’m serious.)
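Here’s a minimal sketch of what a search engine (or your own tool) could do with such an identifier: collapse all renditions that share it into one result. The `example.com` URLs and the function name `group_renditions` are made up for illustration, and the sketch assumes each rendition already arrives paired with its identifier.

```python
from collections import defaultdict


def group_renditions(renditions):
    """Group (identifier, file_url) pairs by identifier, keeping input order."""
    groups = defaultdict(list)
    for identifier, file_url in renditions:
        groups[identifier].append(file_url)
    return dict(groups)


# Hypothetical data: three image files, but only two distinct images.
renditions = [
    ("http://example.com/id/sunset", "http://example.com/files/sunset-800.jpg"),
    ("http://example.com/id/sunset", "http://example.com/files/sunset-thumb.png"),
    ("http://example.com/id/harbour", "http://example.com/files/harbour.jpg"),
]

for identifier, files in group_renditions(renditions).items():
    print(identifier, len(files))
```

Without the identifier, the two sunset files have nothing in common but their pixels; with it, grouping is a simple dictionary lookup.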

Second reason: A well-groomed image will have lots of metadata. Temporal, geographical, creator- and licensor-related, subject descriptions, licensing terms. You don’t want to add all this baggage to each Web page the image is used on, so you need a separate place to publish all the metadata for that image. And once you have it, it makes perfect sense to use that place as the permanent home for your image and use its URL as the image’s unique identifier.

Suppose that you’re using that URL/identifier whenever you publish or distribute the image: You put it into your HTML, embed it into the image files, and make sure it doesn’t get lost if you register the image with a registry like PLUS or distribute it through third parties like Flickr or Getty Images. What have you just gained? Well, now you can remain the authoritative source of your image’s metadata! You can fix mistakes, add renditions or links or legal notes and change licensing terms at will because you’re in control of that URL. (Third parties probably won’t recognize your self-hosted metadata yet, but let’s move in that direction.)

To practice what I preach, I have added an RDFa resource attribute to the HTML div containing the blog post’s photo (you might want to view the HTML source code of the previous post). An example:

<div resource="http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q" typeof="schema:ImageObject">
<img src="/device_strehle/dev1/2013/05-02/72/65/file69wpi6cfox11c7cgw70q.jpg" />
</div>

With this HTML markup, I’m also telling search engines that the referenced URL is about an image, using the schema.org ImageObject type. (I’m a newbie re schema.org and RDFa, suggestions for improvement are welcome!)
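To see how a consumer might pick that identifier back out of the page, here’s a rough sketch using Python’s standard `html.parser`. It’s only an approximation: it matches the literal string `schema:ImageObject` and does not resolve RDFa prefixes or vocabularies the way a real RDFa processor would. The class name `ImageResourceParser` is invented for this example.

```python
from html.parser import HTMLParser


class ImageResourceParser(HTMLParser):
    """Collect `resource` values of elements typed as schema:ImageObject."""

    def __init__(self):
        super().__init__()
        self.identifiers = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # typeof may hold several space-separated types; valueless
        # attributes come through as None, hence the `or ""`.
        types = (attributes.get("typeof") or "").split()
        if "schema:ImageObject" in types and attributes.get("resource"):
            self.identifiers.append(attributes["resource"])


html = '''<div resource="http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q" typeof="schema:ImageObject">
<img src="/device_strehle/dev1/2013/05-02/72/65/file69wpi6cfox11c7cgw70q.jpg" />
</div>'''

parser = ImageResourceParser()
parser.feed(html)
print(parser.identifiers)
```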

What if someone just downloads the image file, ignoring my lovingly-crafted HTML markup? I want them to see my URL as well. So I’m embedding it in the XMP-plus:ImageSupplierImageID metadata field of the JPEG file using ExifTool:

exiftool -XMP-plus:ImageSupplierImageID=http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q IMG_1980.jpg

(This is just a first try; there are probably other metadata fields I should write it to. I’m choosing this field for now because you can see and modify it in Photoshop: File / File Info… / IPTC Extension / Supplier’s Image ID.)
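If you want to script this for many files, one possible approach is to build the ExifTool arguments as a list in Python, so the URL never passes through shell quoting. This is a sketch, not a tested workflow: the helper names `exiftool_write_args` and `exiftool_read_args` are my own, and actually running the commands assumes ExifTool is installed and on the PATH.

```python
import shlex


def exiftool_write_args(identifier, image_path):
    """Build the argument list for writing the identifier with ExifTool."""
    return ["exiftool", f"-XMP-plus:ImageSupplierImageID={identifier}", image_path]


def exiftool_read_args(image_path):
    """Build the argument list for reading the field back."""
    # -s3 asks ExifTool to print only the tag value, without the tag name.
    return ["exiftool", "-s3", "-XMP-plus:ImageSupplierImageID", image_path]


args = exiftool_write_args(
    "http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q",
    "IMG_1980.jpg",
)
# Show the equivalent shell command line.
print(shlex.join(args))

# To actually run it (requires ExifTool on the PATH):
# import subprocess
# subprocess.run(args, check=True)
```

Passing a list instead of one command string matters once identifiers contain characters like `?` or `&`, which a shell would otherwise interpret.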

Note that the URL I’m pointing to doesn’t yet exist: I’ll create that page in the next step. For now, I have just added a unique identifier that looks like a URL (so the correct name is probably URI or IRI, can’t get used to that).

For reference, here are a few other places that I don’t fully understand yet, but that look like they should possibly also contain the URL/identifier if the image gets distributed in a suitable format:

EXIF ImageUniqueID
PLUS LDF Terms and Conditions URL / Licensor Image ID / Copyright Owner Image ID / Image Creator Image ID
ODRL Asset uid
schema.org url property
IPTC NewsML G2 newsItem guid attribute / web (Web address) element
PRISM url element
XMP xmp:Identifier / xmpRights:WebStatement / xmpMM:DocumentID
Dublin Core Metadata Element Set identifier

(I’m sure there’s more. Yes, this makes my head explode as well. Please tell me that it’s much simpler than that.)

What do you think? I’d love to hear your feedback (@tistre on Twitter; for e-mail addresses see my home page).

Update (2018-09-06): Five years later, I still don’t know… There’s also plus:licensorImageID. See also Christian Weiske – Adding the source URL to an image's meta data.

Wed, 08 May 2013 05:44:37 +0000