Tim's Weblog
Tim Strehle’s links and thoughts on Web apps, software development and Digital Asset Management, since 2002.
2012-05-31

Semantic markup for “You can license this image”

Searching the web for images you can actually (legally) use, for commercial or non-commercial purposes, is almost impossible: Google or Bing will show you millions of images, but have no clue under which terms you’re allowed to use them. Lots of “information silos”, from iStockphoto to Getty Images, let professionals search for, and license, rights-cleared images. If you want your photos to be found there, you’ll have to copy them into one (or more) of these sites (see the Flickr / Getty Images cooperation), which means more work for the photographer. And the user or buyer has to search through multiple silos. Since so many of these silos exist, any single search will miss most of the photos out there.

While curated image collections are fine and can offer consistent, high-quality, spam-free content, I think there should also be usable image search engines with much greater coverage. With more and more images being put on the web, it would be great if image search engines could index the most important information directly off the referencing HTML page: title, description, date created, whether the image is free for non-commercial or commercial use, and whether and where I can buy a license.

To the user, it should be a simple list of options in, say, Google image search: “Only images which are free for non-commercial use”, “Only images that are free or can be licensed”. (And if Google doesn’t implement this, others can roll their own image search engines.)
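
To make those two filter options concrete, here is a minimal sketch in Python of what a search engine could do once it has indexed the license information. Everything in it is an assumption for illustration: the ImageRecord fields, the function names and the sample records are hypothetical and do not come from any real search engine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImageRecord:
    """Hypothetical index entry for one image found on a web page."""
    title: str
    free_for_noncommercial: bool = False
    free_for_commercial: bool = False
    license_purchase_url: Optional[str] = None  # where a license can be bought, if anywhere


def only_free_for_noncommercial(records: List[ImageRecord]) -> List[ImageRecord]:
    """Filter: 'Only images which are free for non-commercial use'."""
    return [r for r in records if r.free_for_noncommercial]


def only_free_or_licensable(records: List[ImageRecord]) -> List[ImageRecord]:
    """Filter: 'Only images that are free or can be licensed'."""
    return [
        r for r in records
        if r.free_for_noncommercial
        or r.free_for_commercial
        or r.license_purchase_url is not None
    ]


# Made-up sample index with three images.
SAMPLE_INDEX = [
    ImageRecord("Sunset", free_for_noncommercial=True),
    ImageRecord("Skyline", license_purchase_url="http://example.com/buy/skyline"),
    ImageRecord("Portrait"),  # no usable rights information at all
]
```

With this sample index, the first filter keeps only “Sunset”, the second keeps “Sunset” and “Skyline”, and “Portrait” never shows up because nothing is known about its terms.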

The Semantic Web is trending again and offers great options for marking up metadata within HTML, but unfortunately there’s no “one true way”. What exactly should the HTML markup look like? Would one use WHATWG microdata, schema.org microdata, or schema.org RDFa Lite? (As far as I know, PLUS and RightsML cannot be embedded in HTML.)

I have created a separate page with four examples of different ways to mark up an image license. Warning: Since I’m a Semantic Web newbie, they may be wrong or suboptimal…

Example #1: To refer to a Creative Commons license in HTML, you can use “RDFa and the rel=license microformat”, according to this Stack Overflow page on “Semantic HTML markup for a copyright notice”.
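
As a rough illustration of how a crawler might pick up this kind of markup, the sketch below embeds a small HTML fragment and collects every link whose rel attribute contains “license”. The fragment is hypothetical (file name, alt text and the chosen Creative Commons license are made up), and the parser is only a sketch built on Python’s standard html.parser module, not a full RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the image and the chosen license are made up.
SAMPLE_HTML = """
<div about="photo.jpg">
  <img src="photo.jpg" alt="Sunset">
  <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0</a>
</div>
"""


class RelLicenseParser(HTMLParser):
    """Collects the href of every <a> or <link> whose rel contains 'license'."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel is a space-separated list of tokens, e.g. rel="license nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "license" in rel_tokens and attr_map.get("href"):
            self.licenses.append(attr_map["href"])


def find_rel_licenses(html):
    """Return all license URLs announced via rel=license in the given HTML."""
    parser = RelLicenseParser()
    parser.feed(html)
    parser.close()
    return parser.licenses
```

Calling find_rel_licenses(SAMPLE_HTML) yields the single Creative Commons URL from the fragment. Note that this says which license applies to the page, but ties it to a specific image only through the surrounding about attribute, which this sketch ignores.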

Example #2: The WHATWG HTML microdata proposal contains a section on “Licensing works”, with a nice example of an image available under both a Creative Commons and the MIT license, using the microdata format with itemprop=license.
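
The microdata variant could look roughly like the fragment embedded below, again with a small Python sketch that pulls out the license URLs. The fragment is my own hypothetical example, not the one from the proposal, and the item type URL is written from memory of that vocabulary, so please treat it as an assumption and check it against the spec.

```python
from html.parser import HTMLParser

# Hypothetical fragment modeled on the microdata style; the itemtype URL is
# an assumption and the file name, caption and license choice are made up.
SAMPLE_HTML = """
<figure itemscope itemtype="http://n.whatwg.org/work">
  <img itemprop="work" src="house.jpeg" alt="A white house in a forest">
  <figcaption itemprop="title">The house I found.</figcaption>
  <p>Licensed under
    <a itemprop="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0</a>
    and the
    <a itemprop="license" href="http://www.opensource.org/licenses/mit-license.php">MIT license</a>.
  </p>
</figure>
"""


class ItempropLicenseParser(HTMLParser):
    """Collects the href of every element carrying itemprop='license'."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # itemprop may hold several space-separated property names
        prop_names = (attr_map.get("itemprop") or "").split()
        if "license" in prop_names and attr_map.get("href"):
            self.licenses.append(attr_map["href"])


def find_itemprop_licenses(html):
    """Return all URLs marked up with itemprop=license, in document order."""
    parser = ItempropLicenseParser()
    parser.feed(html)
    parser.close()
    return parser.licenses
```

Because a property can be repeated inside one item, find_itemprop_licenses(SAMPLE_HTML) returns both license URLs, which matches the “available under two licenses” case.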

Example #3: The schema.org CreativeWork type has the properties copyrightHolder and copyrightYear, but no license property. IPTC rNews extends schema.org, adding copyrightNotice and usageTerms. The latter sounds like it could refer to a license URL: “xsd:string | xsd:anyURI | owl:thing. A human or machine-readable statement about the usage terms pertaining to the NewsItem.”
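
Here is a sketch of what such markup might look like and how its properties could be read. The fragment is hypothetical (the photographer, year and license URL are made up), and it assumes that usageTerms may carry a URL, which is only my reading of the rNews definition quoted above. The parser is deliberately naive: it flattens everything into one dictionary and does not handle nested items or nested elements with the same tag name.

```python
from html.parser import HTMLParser

# Hypothetical fragment: schema.org ImageObject plus the rNews usageTerms property.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/ImageObject">
  <img itemprop="contentUrl" src="photo.jpg" alt="Sunset">
  <span itemprop="copyrightHolder">Jane Doe</span>
  <span itemprop="copyrightYear">2012</span>
  <a itemprop="usageTerms" href="http://example.com/license">License this image</a>
</div>
"""

# For these elements the property value lives in an attribute, not in the text.
URL_ATTRS = {"a": "href", "link": "href", "img": "src", "meta": "content"}


class FlatMicrodataParser(HTMLParser):
    """Reads itemprop names and values into a flat dict (no nesting support)."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._text_prop = None  # property whose text content is being collected
        self._text_tag = None   # tag that opened that property

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        name = attr_map.get("itemprop")
        if not name:
            return
        url_attr = URL_ATTRS.get(tag)
        if url_attr:
            self.properties[name] = attr_map.get(url_attr) or ""
        else:
            self._text_prop, self._text_tag = name, tag
            self.properties[name] = ""

    def handle_data(self, data):
        if self._text_prop:
            self.properties[self._text_prop] += data

    def handle_endtag(self, tag):
        if self._text_prop and tag == self._text_tag:
            self.properties[self._text_prop] = self.properties[self._text_prop].strip()
            self._text_prop = self._text_tag = None


def read_properties(html):
    """Return a dict mapping each itemprop name to its value."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.properties
```

For the sample fragment, read_properties returns the image URL, the copyright holder, the year and the license URL under the key usageTerms, which is exactly the piece of information a license-aware search engine would need.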

Example #4: Same as above, but in RDFa Lite format instead of microdata (RDFa Lite may also become usable for schema.org markup in the future).
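
In RDFa Lite, the attributes vocab, typeof and property take over the roles of itemscope, itemtype and itemprop. The sketch below uses a hypothetical fragment (file name and license URL are made up) and shows one thing RDFa adds: each property name is resolved against the vocab URL. It only handles a single vocab and attribute-valued properties, so it is nowhere near a complete RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical RDFa Lite fragment; usageTerms is the rNews extension property.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="ImageObject">
  <img property="contentUrl" src="photo.jpg" alt="Sunset">
  <a property="usageTerms" href="http://example.com/license">License this image</a>
</div>
"""


class RdfaLiteParser(HTMLParser):
    """Collects (full property URL, value) pairs for attribute-valued properties."""

    def __init__(self):
        super().__init__()
        self.vocab = ""
        self.item_type = None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("vocab"):
            self.vocab = attr_map["vocab"]
        if attr_map.get("typeof"):
            self.item_type = self.vocab + attr_map["typeof"]
        prop = attr_map.get("property")
        if prop:
            # Only values stored in href, src or content are picked up here.
            value = attr_map.get("href") or attr_map.get("src") or attr_map.get("content")
            if value:
                self.pairs.append((self.vocab + prop, value))


def read_rdfa_lite(html):
    """Return (item type URL, list of (property URL, value) pairs)."""
    parser = RdfaLiteParser()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.pairs
```

One consequence becomes visible right away: usageTerms is resolved to a URL under schema.org, although it comes from the rNews extension rather than from schema.org itself.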

The Google Rich Snippets Testing Tool recognizes only #2 and #3. It likes example #3 best (schema.org in microdata format), but complains about the rNews extension: “Warning: Page contains property "usageterms" which is not part of the schema.”

Do you know of a better markup alternative? Does a license-/rights-aware image search engine already exist? I’m looking forward to your feedback!