{"id":1657,"date":"2013-05-26T00:00:00","date_gmt":"2013-05-25T22:00:00","guid":{"rendered":"https:\/\/wwwneu.strehle.de\/tim\/weblog\/archives\/2013\/05\/26\/1608\/"},"modified":"2013-05-26T00:00:00","modified_gmt":"2013-05-25T22:00:00","slug":"1608","status":"publish","type":"post","link":"https:\/\/www.strehle.de\/tim\/weblog\/archives\/2013\/05\/26\/1608\/","title":{"rendered":"Linked Data for better image search on the Web"},"content":{"rendered":"<p>Today, searching the Web for an image that you\u2019re allowed to use in public (either at no cost or after paying a license fee) is a suboptimal experience. Web search engines like <a href=\"http:\/\/images.google.com\/\">Google<\/a> or <a href=\"http:\/\/www.bing.com\/?scope=images\">Bing<\/a> turn up images with unclear rights or of poor quality. Specialized \u201csilos\u201d like <a href=\"http:\/\/www.gettyimages.com\/\">Getty Images<\/a> or <a href=\"http:\/\/www.istockphoto.com\/\">iStockphoto<\/a> work well for professionals but only find those images that were submitted to them on their terms.<\/p>\n<p>(An interesting alternative approach is the German <a href=\"http:\/\/picturemaxx.com\/index.php?20049042778227936610.00001502929892194251331426052013080804&amp;LCID=2\">i-picturemaxx (APIS) network<\/a> that allows distributed searches across a network of servers, but is closed \/ \u201cpay to play\u201d and based on proprietary technology.)<\/p>\n<p>I think the future lies in publishing better image metadata on the Web, and better image search engines that make use of that metadata. Whether you\u2019re a pro photographer, a hobbyist or a news agency \u2013 make sure there\u2019s a simple HTML page on the Web for each of your images, with essential metadata (license or offer, description, your contact information) embedded in the HTML source code as semantic <a href=\"http:\/\/www.w3.org\/TR\/2012\/NOTE-rdfa-primer-20120607\/\">RDFa<\/a> markup. Then let the search engine crawlers do their job. 
If they don\u2019t pick up and make good use of that metadata, let\u2019s build a new image search engine that does!<\/p>\n<p>Sounds too simple? I\u2019m actually a Semantic Web skeptic. <a href=\"http:\/\/www.well.com\/~doctorow\/metacrap.htm\">Cory Doctorow\u2019s 2001 criticism<\/a> is still very much valid and explains why the \u201cSemWeb\u201d hasn\u2019t taken off yet. But I think it could work here: Image licensing is an existing market with some money on the table. There is an incentive for both producers and consumers of digital images: finding the right photo is hard, and copyright and licensing are becoming increasingly important. (Plus it helps that it\u2019s potentially a global market with few barriers: If you find the perfect photo of a rose, it shouldn\u2019t matter that it was taken by an amateur who lives on a different continent and doesn\u2019t speak English.)<\/p>\n<p>What is difficult, and will remain so, is getting content creators to take the time to add meaningful, structured metadata, and to make their metadata play well with other creators\u2019. People describe things in different words: There\u2019ll never be perfect alignment. But some common usage should evolve once the benefits become obvious (think folksonomy and SEO).<\/p>\n<p>These things are also difficult, but we can do something about them: Reusing and improving common vocabularies and combining them with our own, custom terms. Building and spreading software that makes metadata editing and vocabulary juggling easy, or even fun. Agreeing on the protocols and formats to be used for publishing metadata on the Web, and having software support them. Getting existing or new image search engines to use the metadata. And helping creators and customers make transactions.<\/p>\n<p>Lots of work to do. 
But I think publishing and crawling metadata on the open Web are the critical first steps.<\/p>\n<p>The protocol and format should be HTTP and HTML with RDFa: HTTP and HTML (and the ecosystem of browsers and search engines) have proven to work well at \u201cWeb scale\u201d, with millions of producers and billions of consumers of information. HTML is readable by any human with a Web browser, which is its killer feature. And RDFa seems to be winning the race against microdata for semantic markup within HTML. (The current discussion on embedded metadata in image files is important as well, but in HTML it\u2019s so much easier to access and modify that I see it as the primary data source.)<\/p>\n<p>Note that I don\u2019t care whether image distributors offer an API. As a developer, I\u2019m getting tired of APIs (at least for read access). Imagine you have three sources for image metadata: one offering a <a href=\"http:\/\/docs.oasis-open.org\/cmis\/CMIS\/v1.1\/cs01\/CMIS-v1.1-cs01.html\">CMIS<\/a> API, one implementing <a href=\"http:\/\/www.openarchives.org\/OAI\/openarchivesprotocol.html\">OAI-PMH<\/a> and the third being the <a href=\"https:\/\/api.gettyimages.com\/\">Getty Images API<\/a>. How many pages of documentation are you going to read, and how much development time are you going to spend, before you can do a simple keyword search and list essential metadata from each? (And once you\u2019re done, how about the <a href=\"http:\/\/www.programmableweb.com\/apis\/directory\/1?apicat=Photos\">other 215 photo Web service APIs<\/a>?)<\/p>\n<p>What do you think \u2013 am I aiming too high, missing something, or am I on the right track? I\u2019d love to <a href=\"\/tim\/\">hear<\/a> your thoughts.<\/p>\n<p><em><span class=\"italic\">Update:<\/span><\/em> Ralph Windsor replies \u2013 <a href=\"http:\/\/digitalassetmanagementnews.org\/semantic-web\/applying-linked-data-image-search\/\">Applying Linked Data Concepts To Derive A Global Image Search Protocol<\/a>. 
My follow-up: <a href=\"\/tim\/weblog\/archives\/2013\/06\/05\/1612\">Linked Data for public, siloed, and internal images<\/a>. And an in-depth article by Ralph Windsor: <a href=\"http:\/\/www.cmswire.com\/cms\/digital-asset-management\/the-building-blocks-of-digital-asset-management-interoperability-021996.php\">The Building Blocks Of Digital Asset Management Interoperability<\/a>.<\/p>\n<p><em>Update (2020-02-21):<\/em> The <a href=\"https:\/\/iptc.org\/news\/announcing-googles-licensable-images-developer-release\/\">IPTC announces Google\u2019s \u201cLicensable Images\u201d<\/a>, see <a href=\"https:\/\/developers.google.com\/search\/docs\/data-types\/image-license-metadata\">Image License Metadata in Google Images (BETA)<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, searching the Web for an image that you\u2019re allowed to use in public (either at no cost or after paying a license fee) is a suboptimal experience. Web search engines Google or Bing turn up images with unclear rights or in bad quality. 
Specialized \u201csilos\u201d like Getty Images or iStock Photos work well for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_share_on_mastodon":"0"},"categories":[1],"tags":[],"class_list":["post-1657","post","type-post","status-publish","format-standard","hentry","category-weblog"],"share_on_mastodon":{"url":"","error":""},"_links":{"self":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/posts\/1657","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/comments?post=1657"}],"version-history":[{"count":0,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/posts\/1657\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/media?parent=1657"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/categories?post=1657"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/tags?post=1657"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}