Tim’s Weblog
Tim Strehle’s links and thoughts on Web apps, managing software development and Digital Asset Management, since 2002.

ImageSnippets | A Metadata Authoring System for Images

“ImageSnippets™ is a system for creating structured, transportable metadata for your images. It can be used as a digital asset management tool as well as an image/metadata publishing platform.”

Take a look at the help pages, and read Margaret Warren’s post introducing ImageSnippets to the iptc-photometadata Yahoo! Group – a new system which can help with protecting images from becoming orphans:

“ImageSnippets is a bit of a swiss-army knife prototype at the moment with many new types of terms and features not typically found in current metadata editing environments.

[…] The system creates an HTML+RDFa file containing a link to the image AND all of it's metadata is represented as structured data in the file.”

I like that it combines public, application-level and personal datasets, and that you can reference an image by its URL: you don’t have to upload it and can still add metadata for it. (Reminds me of the DAM Value Chains – Metadata article by Ralph Windsor: “separate a digital file from metadata and other associated asset data so you could more easily delegate the task of managing it.”) And I love that it publishes RDFa!

Tue, 28 May 2013 20:57:07 +0000

Short links (2013-05-28)

Tue, 28 May 2013 13:52:51 +0000

Richard Wallis: Putting Linked Data on the Map

Richard Wallis – Putting Linked Data on the Map:

“Linked Data is just there – without the need for an API the raw data (described in RDF) is ‘just there to consume’. With only standard [http] web protocols, you can get the data for an entity in their dataset by just doing a http GET request on the identifier.

[…] So why is this often missed? Maybe it is because there is nothing to learn, no API documentation required, you can see and use it by just entering a URI into your web browser – too simple to be interesting perhaps.”
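To make Wallis’s point concrete, here is a minimal Python sketch of dereferencing an identifier with nothing but an HTTP GET, using only the standard library. The entity URI is a placeholder, and the Accept header (asking for Turtle via content negotiation) is an assumption on top of what Wallis describes: not every Linked Data server honours it, and some simply return HTML with embedded RDFa.

```python
import urllib.request


def build_linked_data_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build a plain HTTP GET for an entity identifier.

    The Accept header asks the server for an RDF serialization instead of
    HTML (content negotiation); whether it complies is up to the server.
    """
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


def fetch_entity(uri: str, media_type: str = "text/turtle") -> str:
    """Dereference the identifier and return the response body as text."""
    request = build_linked_data_request(uri, media_type)
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Calling `fetch_entity()` with a real identifier returns the raw data: no API key, no SDK, no documentation beyond HTTP itself.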

Mon, 27 May 2013 08:12:50 +0000

Linked Data for better image search on the Web

Today, searching the Web for an image that you’re allowed to use in public (either at no cost or after paying a license fee) is a suboptimal experience. Web search engines like Google or Bing turn up images with unclear rights or of poor quality. Specialized “silos” like Getty Images or iStockphoto work well for professionals but only find those images that were submitted to them on their terms.

(An interesting alternative approach is the German i-picturemaxx (APIS) network that allows distributed searches across a network of servers, but is closed / “pay to play” and based on proprietary technology.)

I think the future lies in publishing better image metadata on the Web, and better image search engines that make use of that metadata. Whether you’re a pro photographer, a hobbyist or a news agency – make sure there’s a simple HTML page on the Web for each of your images. With essential metadata (license or offer, description, your contact information) embedded in the HTML source code as semantic RDFa markup. Then let the search engine crawlers do their job. If they don’t pick up and make good use of that metadata, let’s build a new image search engine that does!
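What might such a simple HTML page look like? The following Python sketch renders one page per image, with the essential metadata marked up as RDFa using the schema.org vocabulary (ImageObject with contentUrl, description, license and author). The function name, its parameters and the example values are hypothetical, and which properties a search engine actually evaluates is up to the search engine.

```python
from html import escape


def image_page(identifier: str, image_src: str, description: str,
               license_url: str, creator_name: str, creator_email: str) -> str:
    """Render a minimal, human-readable HTML page for one image.

    The metadata is marked up as RDFa with the schema.org vocabulary, so the
    same page serves browsers (humans) and crawlers (machines).
    """
    # escape() also quotes " and ', so values are safe inside attributes.
    ident, src, desc = escape(identifier), escape(image_src), escape(description)
    lic, name, email = escape(license_url), escape(creator_name), escape(creator_email)
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{desc}</title></head>
<body vocab="http://schema.org/">
<div resource="{ident}" typeof="ImageObject">
  <img property="contentUrl" src="{src}" alt="{desc}">
  <p property="description">{desc}</p>
  <p>License: <a property="license" href="{lic}">{lic}</a></p>
  <p property="author" typeof="Person">
    Contact: <span property="name">{name}</span>,
    <span property="email">{email}</span>
  </p>
</div>
</body>
</html>
"""
```

The resource attribute carries the image’s identifier, separate from the URL of the image file itself, so several renditions can point to one description.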

Sounds too simple? I’m actually a Semantic Web skeptic. Cory Doctorow’s 2001 criticism is still very much valid and explains why the “SemWeb” hasn’t taken off yet. But I think it could work here: Image licensing is an existing market with some money on the table. There is an incentive for both producers and consumers of digital images; finding the right photo is hard, and copyright and licensing are becoming increasingly important. (Plus it helps that it’s potentially a global market with few barriers: If you find the perfect photo of a rose, it shouldn’t matter that it was taken by an amateur who lives on a different continent and doesn’t speak English.)

What is difficult, and will remain so, is getting content creators to take the time to add meaningful, structured metadata. And to make their metadata play along well with other creators’. People describe things in different words: There’ll never be perfect alignment. But some common usage should evolve once the benefits become obvious (think folksonomy and SEO).

These things are also difficult, but we can do something about them: Reusing and improving common vocabularies and combining them with our own, custom terms. Building and spreading software that makes metadata editing and vocabulary juggling easy, or even fun. Agreeing on the protocols and formats to be used for publishing metadata on the Web, and having software support them. Getting existing or new image search engines to use the metadata. And helping creators and customers make transactions. 

Lots of work to do. But I think publishing and crawling metadata on the open Web are the critical first step.

The protocol and format should be HTTP and HTML with RDFa: HTTP and HTML (and the ecosystem of browsers and search engines) have proven to work well at “Web scale”, with millions of producers and billions of consumers of information. HTML is readable by any human with a Web browser, which is its killer feature. And RDFa seems to win the race against microdata for semantic markup within HTML. (The current discussion on embedded metadata in image files is important as well, but in HTML it’s so much easier to access and modify that I see it as the primary data source.)
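On the consuming side, a crawler needs little more than an HTML parser. Here is a deliberately simplified Python sketch using the standard library’s html.parser: it collects (property, value) pairs, taking the value from content, resource, href or src if present and from the element text otherwise, roughly following RDFa’s rules. It ignores subjects, typeof, vocab and prefixes, so it is not a conforming RDFa 1.1 processor; a real crawler would use a proper RDFa library.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag (and thus no text content).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class SimpleRDFaExtractor(HTMLParser):
    """Collect (property, value) pairs from RDFa "property" attributes.

    Simplified on purpose: subjects, typeof, vocab and prefixes are ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.pairs: list[tuple[str, str]] = []
        # One entry per open element: [tag, property awaiting text or None, text chunks]
        self._stack: list[list] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property")
        # Roughly RDFa's order: @content, then @resource/@href/@src, else element text.
        value = next((a[k] for k in ("content", "resource", "href", "src")
                      if a.get(k) is not None), None)
        if prop and value is not None:
            self.pairs.append((prop, value))
            prop = None  # value came from an attribute, don't collect text
        if tag not in VOID_ELEMENTS:
            self._stack.append([tag, prop, []])

    def handle_data(self, data):
        # Text counts toward every enclosing element that is waiting for a value.
        for _, prop, chunks in self._stack:
            if prop:
                chunks.append(data)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                for _, prop, chunks in reversed(self._stack[i:]):
                    if prop:
                        text = " ".join("".join(chunks).split())
                        self.pairs.append((prop, text))
                del self._stack[i:]
                return


def extract_properties(html: str) -> list[tuple[str, str]]:
    parser = SimpleRDFaExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

Because the input is ordinary HTML, the same page that a human reads in a browser is what the crawler parses; there is no second channel to keep in sync.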

Note that I don’t care whether image distributors offer an API. As a developer, I’m getting tired of APIs (at least for read access). Imagine you have three sources for image metadata: one offering a CMIS API, one implementing OAI-PMH, and the third being the Getty Images API. How many pages of documentation are you going to read, and how much development time are you going to spend, until you can do a simple keyword search and list essential metadata from each? (And once you’re done, how about the other 215 photo Web service APIs?)

What do you think – am I aiming too high, missing something, or am I on the right track? I’d love to hear your thoughts.

Update: Ralph Windsor replies – Applying Linked Data Concepts To Derive A Global Image Search Protocol. My follow-up: Linked Data for public, siloed, and internal images. And an in-depth article by Ralph Windsor: The Building Blocks Of Digital Asset Management Interoperability.

Update (2020-02-21): The IPTC announces Google’s “Licensable Images”, see Image License Metadata in Google Images (BETA).

Sun, 26 May 2013 21:46:18 +0000

Short links (2013-05-22)

Wed, 22 May 2013 12:47:43 +0000

David Diamond: Five Reasons Why DAM is No Photoshop

David Diamond on CMSWire – Five Reasons Why DAM is No Photoshop:

“So what went wrong with the DAM industry? Where is the explosive growth? The IPOs?

[…] DAM vendors lack vision. Just as one could argue that PayPal should have been a product of Western Union, it's easy to argue that DropBox and Google Drive should have come from a DAM vendor.

[…] If a DAM vendor knows anything about DAM, it should be able to speak about it in unique terms, in content authored by its own personnel. Agreeing with Henrik de Gyor, linking to David Riecks articles, or retweeting Real Story Group is not how DAM vendors will move this industry forward.

[…] You can’t just unplug your metadata and assets from one DAM and plug them into another. This is bad news for disgruntled customers, but it’s great news for lazy DAM vendors. Business professionals call it 'high switching costs.'”

Thu, 16 May 2013 20:40:08 +0000

Short links (2013-05-15)

Wed, 15 May 2013 21:24:18 +0000

Cameron Morrissey: Jump Under the Bus

Cameron Morrissey on great leaders – Diary Entry #117 – Jump Under the Bus:

“Any mistake in their area of oversight is their fault – They should have seen it coming, should have prepared better, should have audited work better, or should have set up better processes. They understand that there is always something they could have done to prevent the mistake from occurring, and while the employee or peer may have had culpability as well, ultimately they are the leader.”

Wed, 15 May 2013 08:46:10 +0000

Seth Godin: Lead up

Seth Godin – Lead up:

“We have an astonishing amount of freedom at work. Not just the freedom to call meetings, make phone calls and pitch ideas, but yes, the freedom to quit, to find a new gig, to pick the clients we're going to take on and to decide how we're going to deal with a request from someone who seems to have far more power than we do. "Yes, sir" is one possible answer, but so is leading from below, creating a reputation and an environment where the people around you are transformed into the bosses you deserve.”

Mon, 13 May 2013 07:55:07 +0000

Image metadata on the Web: URL as identifier

Before you start thinking about common metadata for your images (creator, date created, caption, license), first consider what I think is the most important piece of metadata: A unique identifier for your image. And please make it a URL. Why?

First, you want to avoid duplicates in search engine results. You’ll be using the same image on different Web pages, possibly with slight variations: Different sizes, file formats, or cropping. Which means that the URL to the image file is not the same. A unique identifier makes sure others can find out these are just renditions or variations of the same image. (Current image search engines often show lots of duplicates. If they don’t make use of our nice identifiers once we add them, we can always roll our own search engine… ☺ Yes, I’m serious.)
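In a (hypothetical) search engine, the deduplication step could be as simple as grouping hits by the image’s identifier instead of by file URL, so different renditions collapse into one result. A Python sketch; the SearchHit type and its field names are made up for illustration, and hits without an identifier fall back to their file URL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchHit:
    file_url: str                     # URL of one rendition (size, format, crop)
    identifier: Optional[str] = None  # the image's unique identifier, if published


def group_renditions(hits: list[SearchHit]) -> dict[str, list[str]]:
    """Return {identifier: [rendition URLs]} in the order the hits were ranked."""
    groups: dict[str, list[str]] = {}
    for hit in hits:
        # Without an identifier we can't tell a rendition from a distinct image.
        key = hit.identifier or hit.file_url
        groups.setdefault(key, []).append(hit.file_url)
    return groups
```

A large JPEG on one site and a PNG thumbnail on another that both carry the same identifier end up as one result with two renditions.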

Second reason: A well-groomed image will have lots of metadata. Temporal, geographical, creator and licensor related, subject descriptions, licensing terms. You don’t want to add all this baggage to each Web page the image is used on, so you need a separate place to publish all the metadata for that image. And once you have it, it makes perfect sense to use that place as the permanent home for your image and use its URL as the image’s unique identifier.

Suppose that you’re using that URL/identifier whenever you publish or distribute the image: You put it into your HTML, embed it into the image files, and make sure it doesn’t get lost if you register the image with a registry like PLUS or distribute it through third parties like Flickr or Getty Images. What have you just gained? Well, now you can remain the authoritative source of your image’s metadata! You can fix mistakes, add renditions or links or legal notes and change licensing terms at will because you’re in control of that URL. (Third parties probably won’t recognize your self-hosted metadata yet, but let’s move in that direction.)

To practice what I preach, I have added an RDFa resource attribute to the HTML div containing the blog post’s photo (you might want to view the HTML source code of the previous post). An example:

<div resource="http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q" typeof="schema:ImageObject">
<img src="/device_strehle/dev1/2013/05-02/72/65/file69wpi6cfox11c7cgw70q.jpg" />
</div>

With this HTML markup, I’m also telling search engines that the referenced URL is about an image, using the schema.org ImageObject type. (I’m a newbie re schema.org and RDFa, suggestions for improvement are welcome!)
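A consumer of that markup can then map each rendition back to its identifier. A minimal Python sketch, again with html.parser, assuming the div/img structure shown above: an img nested inside an element with typeof="schema:ImageObject" and a resource attribute. It also accepts a plain ImageObject as used together with vocab="http://schema.org/"; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ImageObjectFinder(HTMLParser):
    """Map each <img> src to the identifier of its enclosing schema.org ImageObject."""

    IMAGE_TYPES = {"schema:ImageObject", "ImageObject"}

    def __init__(self) -> None:
        super().__init__()
        self.renditions = {}  # img src -> identifier (the resource URL)
        self._stack = []      # (tag, identifier or None) for each open element

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img":
            # The nearest enclosing ImageObject wins.
            ident = next((i for _, i in reversed(self._stack) if i), None)
            if ident and a.get("src"):
                self.renditions[a["src"]] = ident
            return  # <img> is a void element, so it is never pushed
        types = (a.get("typeof") or "").split()  # typeof may list several types
        is_image = any(t in self.IMAGE_TYPES for t in types)
        self._stack.append((tag, a.get("resource") if is_image else None))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                return


def find_renditions(html: str) -> dict:
    finder = ImageObjectFinder()
    finder.feed(html)
    finder.close()
    return finder.renditions
```

Images outside an ImageObject (a logo, say) are simply not mapped.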

What if someone just downloads the image file, ignoring my lovingly-crafted HTML markup? I want them to see my URL as well. So I’m embedding it in the XMP-plus:ImageSupplierImageID metadata field of the JPEG file using ExifTool:

exiftool -XMP-plus:ImageSupplierImageID=http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q IMG_1980.jpg

(This is just a first try; there are probably other metadata fields I should write it to. I’m choosing this field for now because you can see and modify it in Photoshop: File / File Info… / IPTC Extension / Supplier’s Image ID.)
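To do this for many files from a script, a Python sketch could wrap the same exiftool call. Passing the arguments as a list avoids shell quoting problems with URLs that contain ? or &. The function names are hypothetical; -s3 is a standard exiftool option that prints only the tag value, which makes reading the identifier back easy. Note that exiftool keeps a backup copy (e.g. IMG_1980.jpg_original) unless you add -overwrite_original.

```python
import shutil
import subprocess

SUPPLIER_ID_TAG = "XMP-plus:ImageSupplierImageID"


def write_identifier_args(identifier: str, image_path: str) -> list[str]:
    """Arguments equivalent to the exiftool command line shown above."""
    return ["exiftool", f"-{SUPPLIER_ID_TAG}={identifier}", image_path]


def read_identifier_args(image_path: str) -> list[str]:
    """-s3 makes exiftool print only the value, without the tag name."""
    return ["exiftool", "-s3", f"-{SUPPLIER_ID_TAG}", image_path]


def embed_identifier(identifier: str, image_path: str) -> str:
    """Write the identifier into the file, then read it back to verify."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool not found on PATH")
    subprocess.run(write_identifier_args(identifier, image_path), check=True)
    result = subprocess.run(read_identifier_args(image_path),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()
```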

Note that the URL I’m pointing to doesn’t yet exist: I’ll create that page in the next step. For now, I have just added a unique identifier that looks like a URL (so the correct name is probably URI or IRI, can’t get used to that).

For reference, here are a few other places that I don’t fully understand yet, but look like they should possibly also contain the URL/identifier if the image gets distributed in a suitable format:

EXIF ImageUniqueID. PLUS LDF Terms and Conditions URL / Licensor Image ID / Copyright Owner Image ID / Image Creator Image ID. ODRL Asset uid. schema.org url property. IPTC NewsML G2 newsItem guid attribute / web (Web address) element. PRISM url element. XMP xmp:Identifier / xmpRights:WebStatement / xmpMM:DocumentID. Dublin Core Metadata Element Set identifier. 

(I’m sure there’s more. Yes, this makes my head explode as well. Please tell me that it’s much simpler than that.)

What do you think? I’d love to hear your feedback (@tistre on Twitter; for e-mail addresses see my home page).

Update (2018-09-06): Five years later, I still don’t know… There’s also plus:licensorImageID. See also Christian Weiske – Adding the source URL to an image's meta data.

Wed, 08 May 2013 05:44:37 +0000

Farm animal photos (and semantic markup)

A pig on a technical blog? Sorry for being off-topic (and don’t expect good photos, I’m just a point-and-shoot amateur):

I’ll publish a few images now and then because I want to experiment (in public) with semantic markup for both articles and images. I’d like to find out what search engines make of my HTML markup, and what the challenges are for the markup author.

(I’m deliberately starting from scratch, so if you “view source” today there’ll be no semantic markup for the images yet, and no embedded metadata within the files. Watch this space for updates as I work this out.)

Why? Because I think that in the long term, it’s wrong to publish your digital assets through someone else’s data silo (be it Facebook, Flickr, Picasa, Getty Images or even Imgembed). Let’s evolve conventions for publishing them (with rich metadata) on our own Web sites, under our own domain names. Let links and search engines lead visitors to your own property on the Web, and remain in control of presentation and behaviour.

Thu, 02 May 2013 19:53:22 +0000

Short links (2013-05-02)

Thu, 02 May 2013 08:52:47 +0000