Tim’s Weblog
Tim Strehle’s links and thoughts on Web apps, managing software development and Digital Asset Management, since 2002.

XHTML+RDFa for structured data exchange

Last week I wrote on Twitter: “Testing my theory that structured data exchange between apps/parties is best done in #XHTML+#RDFa. Human browseable & usable w/ XSLT, XPath.” This is the long version of that tweet:

As a developer in the enterprise DAM software business, I’ve been doing a lot of integration work – with news agency data and newspaper editorial systems, CMS, syndication, image databases and so on. In the early years, it was a joy to see ugly text formats disappear: Almost everyone switched to XML sooner or later. This made parsing and understanding the data so much easier. And we didn’t just consume XML; very early we went all XML for our APIs and data exports. (I alone am guilty of inventing more than ten simple XML formats in the last decade.)

The explosion of custom XML vocabularies (and XML-based standards) has its drawbacks, of course. Back in 2006, Tim Bray wrote in Don’t invent XML languages: “The smartest thing to do would be to find a way to use one of the perfectly good markup languages that have been designed and debugged and have validators and authoring software and parsers and generators and all that other good stuff.”

I don’t think that everyone agreeing on just a handful of vocabularies would, in the short term, mean a lot less work for developers. Developers would still understand and use the vocabulary differently. My main gripe is that someone’s got to write code if a non-developer (i.e., someone not happy reading raw XML) simply wants to access, read and search the data. This hurts communication, quality control and bug hunting in lots of projects (in ours, at least).

Transparent, visible, browseable, searchable: That’s how I increasingly want the data our software receives, and that it emits. So I’ve started playing with XHTML+RDFa. Tim Bray again: “If you use XHTML you can feed it to the browsers that are already there on a few hundred million desktops and humans can read it, and if they want to know how to do what it’s doing, they can “View Source”—these are powerful arguments.” I’d like to add that using HTML opens the data up to Web search engines, and using XHTML specifically allows us to keep working with the fine toolset of XSLT, XPath and xmllint. (See also my post on Linked Data for better image search. And Jon Moore, who started it all: watch his talk on Building Hypermedia APIs with HTML!)

This week, a customer requested that I deliver a data dump (a few days’ worth of newspaper articles and images) to another software vendor. Which format exactly wasn’t important yet. So I took the occasion and had a first shot at modelling the content stored in our DC-X DAM as XHTML with metadata and data structures expressed as RDFa within. This was a bit more complicated than expected: RDFa feels relatively complex (there are various ways to mark up a statement), and I had to think a lot about the metadata schema (I tried to use schema.org types and properties where applicable).
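To make this concrete, here is a hypothetical sketch of what one such article fragment might look like. The type and property names (NewsArticle, headline, datePublished, author) are real schema.org terms, but the structure and all values are invented for illustration:

```html
<!-- Invented example: one story marked up as XHTML+RDFa with schema.org terms -->
<div prefix="schema: http://schema.org/ xsd: http://www.w3.org/2001/XMLSchema#"
     typeof="schema:NewsArticle" resource="#story-1">
  <h1 property="schema:headline">Harbour festival draws record crowds</h1>
  <p>
    Published
    <span property="schema:datePublished" content="2013-06-24"
          datatype="xsd:date">June 24, 2013</span>
    by <span property="schema:author">Jane Doe</span>.
  </p>
  <p property="schema:articleBody">Article text goes here …</p>
</div>
```

The machine-readable value lives in the content attribute while the element text stays human-readable, which is what makes the same file work for browsers and parsers alike.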

I created one HTML file for each newspaper page, and one (hyperlinked) file for each story on that page. I ended up with relatively simple RDFa, using only these HTML attributes so far: content, datatype, prefix, property, resource, typeof. (The RDFa / Play visualization was quite helpful, by the way.) I avoided nesting objects: The simple XSLT I built to prove that the data can be easily converted searches for properties recursively, to remain independent of the HTML markup (example: <xsl:for-each select=".//*[@property='schema:datePublished']">), but it got confused when objects were nested.
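The same markup-independent, recursive property search works outside XSLT too. A minimal sketch in Python, using only the standard library (the sample document is invented, but the XPath expression is the same one the XSLT uses):

```python
import xml.etree.ElementTree as ET

# A tiny, invented XHTML+RDFa sample. The default XHTML namespace doesn't
# get in the way, because we match on the un-namespaced @property attribute.
xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div typeof="schema:NewsArticle">
      <h1 property="schema:headline">Sample headline</h1>
      <span property="schema:datePublished" content="2013-06-24">June 24, 2013</span>
    </div>
  </body>
</html>"""

root = ET.fromstring(xhtml)

# Equivalent of the XSLT's .//*[@property='schema:datePublished']:
# find the property wherever it sits, independent of the HTML structure.
for el in root.iterfind(".//*[@property='schema:datePublished']"):
    # RDFa convention: prefer the @content attribute over the visible text.
    print(el.get("content") or "".join(el.itertext()))
```

The point is that consumers query for properties, not for a fixed element hierarchy, so the HTML layout can change without breaking them.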

It feels great that the customer (and I) can easily view that data in a Web browser. But I’m an RDF newbie so the resulting RDFa source is rather ugly, and lots of things are still missing. If I find a way to publish samples on the Web, I’ll post about it here and would love your feedback! (It feels strange to advocate RDFa, by the way, as I still dislike the RDF data model and prefer Topic Maps instead…)

Tue, 25 Jun 2013 13:04:30 +0000

Laurence Hart: Box Isn’t Disrupting Because of the Cloud

Laurence Hart – Box Isn’t Disrupting Because of the Cloud:

“Box is disrupting because they focus on the people using the application. SaaS is the disruptive delivery mechanism that enables the spread of their solution.

All IT vendors are being disrupted in this fashion, not just Content Management. Ease-of-use is driving adoption in a viral nature that is almost unheard of in the space.”

Tue, 18 Jun 2013 22:04:11 +0000

Short links (2013-06-18)

Tue, 18 Jun 2013 21:39:53 +0000

A trend towards reusable UI components in Web apps

In Web application development, I’m seeing a trend towards reusable components for building the user interface. The idea isn’t new (see Mashups, Portlets, Web Parts or jQuery Plugins): Make it easy to reuse ready-made UI elements built by different developers (e.g. a form field with autocomplete functionality, a date picker, a tree view, a dialog) in your Web application. That should save a lot of developer time.

But in recent years, lots of Web apps (including ours) committed to fat frameworks (Ext JS or YUI 2) which promised rapid development and a huge set of ready-made widgets. The first 60% of the app were indeed developed rapidly, but then you were stuck: Extending the framework yourself was hard, and swapping in widgets from other frameworks and libraries was ugly or impossible. To quote Dr. Axel Rauschmayer in Google’s Polymer and the future of web UI frameworks: “Currently, frameworks are largely incompatible: they usually come with their own tool chain, inheritance API, widget infrastructure, etc.”

I’m glad that this era is ending, and lighter approaches are emerging that focus on simple reusability. Just in time for the new Web app interfaces I’m going to build this year! I’ve written down my thoughts on JavaScript UI components already, so what follows is a few links that illustrate the broader trend.

Most prominently, the official W3C Web Components: “Web Components enable Web application authors to define widgets with a level of visual richness and interactivity not possible with CSS alone, and ease of composition and reuse not possible with script libraries today.” Watch the Web Components: A Tectonic Shift for Web Development video for an in-depth technical introduction.

Pete Hunt from Facebook – Why did we build React?: “React is a library for building composable user interfaces. It encourages the creation of reusable UI components which present data that changes over time.”

Flight by Twitter: “Flight is a lightweight, component-based JavaScript framework that maps behavior to DOM nodes. […] Components do not engage each other directly; instead, they broadcast their actions as events which are subscribed to by other components.”
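The decoupling Flight describes, components broadcasting events instead of calling each other directly, can be sketched in a few lines. This is a language-agnostic illustration in Python, not Flight’s actual API:

```python
# Minimal publish/subscribe bus: components never reference each other,
# they only emit events and subscribe to events.
class EventBus:
    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        """Subscribe a handler to an event name."""
        self._handlers.setdefault(event, []).append(handler)

    def trigger(self, event, **data):
        """Broadcast an event to every subscribed handler."""
        for handler in self._handlers.get(event, []):
            handler(**data)

bus = EventBus()

# Two "components" that know the bus, but not each other.
bus.on("search:submitted", lambda query: print("fetching results for", query))
bus.on("search:submitted", lambda query: print("logging query", query))

bus.trigger("search:submitted", query="hamburg airport")
```

Because neither handler knows the other exists, either one can be replaced or removed without touching the rest, which is exactly the interoperability property the fat frameworks lacked.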

Henri Bergius – Writing reusable, multi-platform JavaScript with Component: “With Component you can easily write and distribute reusable JavaScript modules, including user interface components that may include HTML templates and CSS.”

Making components interoperable (especially event handling, CSS/looks, consistent behaviour) is hard; there will always be elements that don’t go together well. But a simpler, more accessible approach to component building and packaging should make the lives of Web developers easier. I’ll try to share what I learn…

Update: Here’s what I learned so far – A simple JavaScript component architecture (first draft). And why I think UI components are very important: Web app interoperability – the missing link

Fri, 14 Jun 2013 20:39:55 +0000

Short links (2013-06-12)

Wed, 12 Jun 2013 07:27:53 +0000

Short links (2013-06-05)

Wed, 05 Jun 2013 19:50:27 +0000

Linked Data for public, siloed, and internal images

Ralph Windsor discusses my previous blog post on DAM News – Applying Linked Data Concepts To Derive A Global Image Search Protocol. He finds better words than I did, rephrasing my suggestion as “a universal protocol where images get described like web pages (HTML) so you can crawl them using search engine techniques”.

Ralph points out that large commercial image sellers might not want to participate in an open network: “Allowing their media out into the open for some third party to index – who they probably regard with wary suspicion (e.g. Google) is likely to be a step too far.” Maybe. Although they’ll go where the customers are – a Google Images search for “airport hamburg 92980935” turns up Getty Images image #92980935, so I assume that Getty Images wants Google to crawl their database. If an open image network emerges on the public Web, the commercial platforms will want to become a part of it once it reaches critical mass. What’s more, one of them could even embrace the change and start building the best image search engine that crawls the Web! (A bit like the Getty Images Flickr cooperation but without the need to copy the images over into their database.) 

But “out in the open” is an important point: Many images (and other content types) will always be restricted to limited groups of users. Still, this is no reason to invent a complicated API for accessing them: In intranets, lots of non-public documents are available as HTML, allowing users and internal search engines to easily access them. You can do the same for image metadata – restrict access to the local network, require username and password (or API key, authorization token etc.) as you see fit, but serve it to authenticated search engines (and users) as HTML + RDFa anyway.
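Reduced to its core, such gated access is just a credential check in front of the same HTML+RDFa rendering. A hypothetical sketch (the API key, header name and metadata fields are all invented for illustration):

```python
# Hypothetical sketch: serve image metadata as HTML+RDFa, but only to
# callers presenting a known API key. Nothing here is a real API.
VALID_API_KEYS = {"secret-key-123"}

def is_authorized(headers):
    """Accept the request if it carries a known API key."""
    return headers.get("X-Api-Key") in VALID_API_KEYS

def render_rdfa(image):
    """Render one image's metadata as an XHTML+RDFa fragment."""
    return (
        '<div prefix="schema: http://schema.org/" typeof="schema:ImageObject">'
        f'<span property="schema:name">{image["name"]}</span> '
        f'<span property="schema:dateCreated" content="{image["created"]}">'
        f'{image["created"]}</span></div>'
    )

def handle_request(headers, image):
    """Return (status, body): RDFa for authorized callers, 401 otherwise."""
    if not is_authorized(headers):
        return 401, "Unauthorized"
    return 200, render_rdfa(image)
```

In practice this would sit behind the intranet web server’s own authentication; the point is only that the response body stays ordinary HTML, so internal search engines and users consume the same representation.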

A Web of images (to paraphrase Mike Eisenberg) with rich metadata that’s easy to read for machines and humans? I have no idea whether we’ll actually get there in the near future, but that’s what we should aim for!

Wed, 05 Jun 2013 21:21:42 +0000