2013-05-08

Image metadata on the Web: URL as identifier

Before you start thinking about common metadata for your images (creator, date created, caption, license), first consider what I think is the most important piece of metadata: A unique identifier for your image. And please make it a URL. Why?

First, you want to avoid duplicates in search engine results. You’ll be using the same image on different Web pages, possibly with slight variations: Different sizes, file formats, or cropping. Which means that the URL to the image file is not the same. A unique identifier makes sure others can find out these are just renditions or variations of the same image. (Current image search engines often show lots of duplicates. If they don’t make use of our nice identifiers once we add them, we can always roll our own search engine… ☺ Yes, I’m serious.)
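Here is a rough sketch of the deduplication step a search engine could perform once renditions carry a shared identifier. The hit list and all example.com / example.org URLs are made up for illustration; this is not how any existing search engine works.

```python
from collections import defaultdict

# Hypothetical search hits: (rendition file URL, identifier URL of the image).
HITS = [
    ("http://example.com/img/sunset-800.jpg", "http://example.com/data/image/1"),
    ("http://example.com/img/sunset-200.png", "http://example.com/data/image/1"),
    ("http://example.org/photos/harbour.jpg", "http://example.org/data/image/7"),
]


def group_renditions(hits):
    """Map each identifier URL to the list of rendition URLs that carry it."""
    groups = defaultdict(list)
    for file_url, identifier in hits:
        groups[identifier].append(file_url)
    return dict(groups)


if __name__ == "__main__":
    for identifier, files in group_renditions(HITS).items():
        print(identifier, "->", len(files), "rendition(s)")
```

Three files collapse into two results, because the two "sunset" files differ in size and format but point to the same identifier.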

Second reason: A well-groomed image will have lots of metadata. Temporal, geographical, creator and licensor related, subject descriptions, licensing terms. You don’t want to add all this baggage to each Web page the image is used on, so you need a separate place to publish all the metadata for that image. And once you have it, it makes perfect sense to use that place as the permanent home for your image and use its URL as the image’s unique identifier.

Suppose that you’re using that URL/identifier whenever you publish or distribute the image: You put it into your HTML, embed it into the image files, and make sure it doesn’t get lost if you register the image with a registry like PLUS or distribute it through third parties like Flickr or Getty Images. What have you just gained? Well, now you can remain the authoritative source of your image’s metadata! You can fix mistakes, add renditions or links or legal notes and change licensing terms at will because you’re in control of that URL. (Third parties probably won’t recognize your self-hosted metadata yet, but let’s move in that direction.)

To practice what I preach, I have added an RDFa resource attribute to the HTML div containing the blog post’s photo (you might want to view the HTML source code of the previous post). An example:

<div resource="http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q" typeof="schema:ImageObject">
<img src="/device_strehle/dev1/2013/05-02/72/65/file69wpi6cfox11c7cgw70q.jpg" />
</div>

With this HTML markup, I’m also telling search engines that the referenced URL is about an image, using the schema.org ImageObject type. (I’m a newbie re schema.org and RDFa, suggestions for improvement are welcome!)
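To check that the identifier can be read back from the markup above, here is a minimal sketch using only Python's standard library. It is not a real RDFa processor: it matches the literal string schema:ImageObject and ignores prefix declarations, vocab and nesting rules. The class name ImageObjectFinder is my own invention.

```python
from html.parser import HTMLParser

HTML = """
<div resource="http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q" typeof="schema:ImageObject">
<img src="/device_strehle/dev1/2013/05-02/72/65/file69wpi6cfox11c7cgw70q.jpg" />
</div>
"""


class ImageObjectFinder(HTMLParser):
    """Collect (identifier, img src) pairs inside elements typed schema:ImageObject."""

    def __init__(self):
        super().__init__()
        self.current = None  # identifier of the enclosing ImageObject, if any
        self.tag = None      # tag name of that enclosing element
        self.depth = 0       # nesting depth of same-named tags inside it
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.current is None:
            types = (attrs.get("typeof") or "").split()
            if "schema:ImageObject" in types and attrs.get("resource"):
                self.current, self.tag, self.depth = attrs["resource"], tag, 1
        elif tag == self.tag:
            self.depth += 1
        elif tag == "img" and attrs.get("src"):
            self.found.append((self.current, attrs["src"]))

    def handle_endtag(self, tag):
        if self.current is not None and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.current = None


if __name__ == "__main__":
    finder = ImageObjectFinder()
    finder.feed(HTML)
    for identifier, src in finder.found:
        print(src, "is a rendition of", identifier)
```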

What if someone just downloads the image file, ignoring my lovingly-crafted HTML markup? I want them to see my URL as well. So I’m embedding it in the XMP-plus:ImageSupplierImageID metadata field of the JPEG file using ExifTool:

exiftool -XMP-plus:ImageSupplierImageID=http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q IMG_1980.jpg

(This is just a first try; there are probably other metadata fields I should write it to. I’m choosing this field for now because you can see and modify it in Photoshop: File / File Info… / IPTC Extension / Supplier’s Image ID.)
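If you want to script this for many files, a small wrapper could look like the following sketch. It only builds the ExifTool argument lists (the write command from above, plus a read-back using ExifTool's -s3 "values only" output) and runs them if exiftool and the file are actually present. The helper names write_args and read_args are hypothetical. Note that ExifTool keeps a backup copy named IMG_1980.jpg_original by default.

```python
import os
import shutil
import subprocess

TAG = "XMP-plus:ImageSupplierImageID"


def write_args(identifier, path):
    """Argument list that writes the identifier into the file, as in the command above."""
    return ["exiftool", f"-{TAG}={identifier}", path]


def read_args(path):
    """Argument list that prints only the value of the tag (-s3 = values only)."""
    return ["exiftool", "-s3", f"-{TAG}", path]


if __name__ == "__main__":
    image = "IMG_1980.jpg"
    url = "http://www.strehle.de/tim/data/document/doc69wpi6bms01kix6470q"
    if shutil.which("exiftool") and os.path.exists(image):
        subprocess.run(write_args(url, image), check=True)
        result = subprocess.run(read_args(image), check=True,
                                capture_output=True, text=True)
        print(result.stdout.strip())
    else:
        # Nothing to run here; just show the command that would be executed.
        print(" ".join(write_args(url, image)))
```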

Note that the URL I’m pointing to doesn’t yet exist: I’ll create that page in the next step. For now, I have just added a unique identifier that looks like a URL (so the correct name is probably URI or IRI, can’t get used to that).

For reference, here are a few other places that I don’t fully understand yet, but that look like they should possibly also contain the URL/identifier if the image gets distributed in a suitable format:

EXIF ImageUniqueID
PLUS LDF Terms and Conditions URL / Licensor Image ID / Copyright Owner Image ID / Image Creator Image ID
ODRL Asset uid
schema.org url property
IPTC NewsML G2 newsItem guid attribute / web (Web address) element
PRISM url element
XMP xmp:Identifier / xmpRights:WebStatement / xmpMM:DocumentID
Dublin Core Metadata Element Set identifier

(I’m sure there’s more. Yes, this makes my head explode as well. Please tell me that it’s much simpler than that.)

What do you think? I’d love to hear your feedback (@tistre on Twitter; for e-mail addresses see my home page).

Wed, 08 May 2013 07:44:37 +0200
2013-01-09

D3.js – Data-Driven Documents

“D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.”

Wed, 09 Jan 2013 10:23:47 +0100
2012-08-09

Using HTML as the Media Type for your API

Jon Moore – Using HTML as the Media Type for your API:

“There are actually a variety of reasons I prefer using HTML:

rich semantics
hypermedia support
already standardized
tooling support”

Thu, 09 Aug 2012 09:32:50 +0200

Finally — an XML Markup Solution for Design-Based Publishers: Introducing the PRISM Source Vocabulary

Dianne Kennedy – Finally — an XML Markup Solution for Design-Based Publishers: Introducing the PRISM Source Vocabulary:

“Until the tablet-publishing tsunami hit, design-based publications were able to justify their labor-intensive design-based publication process.

[…] We have come to believe the Source is the Solution. We must capture and store platform-agnostic content as early as possible.

[…] Source content must be semantically rich enough to enable the publisher to select content and automate layout and delivery to a wide variety of publishing platforms in platform-native formats.

[…] In order to refine what we mean by the generic term article, the PRISM Content Type Controlled Vocabulary has been developed. […] Some content types that describe the unit-of-storage include an advertisement, article, blog entry, book chapter, cover, masthead, introduction and navigational aid.

[…] The Where Used metadata block allows for usage tracking. […] PSV allows for tracking the platform and even the device where the content was published. PSV also allows for tracking the section or page of the publication where the content appeared. Altogether, PSV offers nearly 40 optional fields to describe where content was used.

[…] The Usage Rights metadata block provides optional metadata fields that can be used by publishers to track usage rights of content in a repository. The 15 optional metadata fields in this block are based on the PRISM Usage Rights Metadata Specification.

[…] Unlike EPUB3, PSV makes no extensions to HTML5 and has no restrictions. PSV recommends that the new HTML5 <article> tag be used as the root element for any content unit.

[…] PSV recommends a number of PRISM semantic classes that you can use to qualify any HTML5 element. Examples include box, caption, dateline, credit, and pull quote.”

(Via Simon St. Laurent at O’Reilly Radar – Applying markup to complexity).

Thu, 09 Aug 2012 21:41:40 +0200
2012-02-16

rNews is here. And this is what it means.

Evan Sandhaus at New York Times Open – rNews is here. And this is what it means.:

“On September 21, the IPTC and Schema.org officially announced their work together.

So by October 2011, we had a supported standard for embedding publishing specific metadata into HTML documents. Now all we had to do was actually implement rNews on nytimes.com.

And that’s what we did.

[…] all you have to do is view source on any nytimes.com article published on or after January, 23 2012. In the HTML you will see new attributes like ‘itemtype’, ‘itemprop’ and ‘itemid’. If you paste an article URL into the Google Rich Snippets tool, you can see a parse of the structured data now embedded into every nytimes.com article.”

Thu, 16 Feb 2012 22:32:55 +0100
2011-11-29

Why We Removed the Wiki Markup Editor in Confluence 4.0

Atlassian Confluence team – Why We Removed the Wiki Markup Editor in Confluence 4.0:

“Wiki markup as a storage format hindered our ability to add new features, like merge table cells, that customers had been demanding. This is because Wiki Markup is a very limited subset of XHTML and because any new editor feature had to be built twice...once in the RTE and once in the Wiki Markup Editor. We also had a lot of bugs when toggling between the two edit modes. We knew for some time that we'd need to unify the dual-RTE and Wiki Markup Editors into one simple-yet-capable editing experience and store Confluence content in a more extensible storage format – i.e. XHTML.”

Tue, 29 Nov 2011 21:05:15 +0100
2011-11-23

W3C Ontology for Media Resources 1.0

W3C Candidate Recommendation Ontology for Media Resources 1.0 (July 2011):

“The intent of this vocabulary is to bridge the different descriptions of media resources, and provide a core set of descriptive properties. This document defines a core set of metadata properties for media resources, along with their mappings to elements from a set of existing metadata formats.”

Mapped metadata standards: CableLabs 1.1, DIG35, Dublin Core, EBUCore, EXIF 2.2, ID3, IPTC NewsML-G2, LOM 2.1, Media RSS, MPEG-7, OGG, QuickTime, DMS-1, TTML, TV-Anytime, TXFeed, XMP, YouTube

Example XML for most standards can be viewed in the testsuite.

(Via Johannes Schmidt.)

Wed, 23 Nov 2011 08:47:39 +0100
2011-06-05

The Good, the Bad, and the Ugly of REST APIs

George Reese at O'Reilly Community – The Good, the Bad, and the Ugly of REST APIs:

"The data in your API calls should not look like highly normalized representations of database tables. They should represent a model of the data in a way that makes sense to API consumers. When you map APIs to your data/object model, you often end up with a chatty API."

Sun, 05 Jun 2011 22:33:44 +0200
2011-01-04

What will become of Twitter?

Dave Winer – What will become of Twitter?:

"Twitter is a wonderful solution to many of the problems we had with RSS, most importantly, how to go from the impulse to subscribe to having actually subscribed. In Twitter it's one click. In RSS, it's an unpredictable number of complex clicks. That in a nutshell is why Twitter blossomed.

[…] It always takes longer than you think it should, but eventually the open formats and protocols replace the systems built on corporate training wheels."

Tue, 04 Jan 2011 21:59:36 +0100

In defense of RSS

Seth Godin – In defense of RSS:

"RSS is quiet and fast and professional and largely hype-free. Perhaps that's why it's not the flavor of the day."

Tue, 04 Jan 2011 21:46:36 +0100
2010-10-17

REST in peace, SOAP

Pingdom – REST in peace, SOAP:

"Looks like the tide of the web API protocol war (if there ever was one) has shifted firmly in REST’s favor while SOAP has been forced back. Web developers have cast their votes, they want RESTful APIs."

(Via Tim Bray.)

Sun, 17 Oct 2010 21:48:24 +0200
2009-12-22

The Best and the Worst Tech of the Decade

James Turner at O'Reilly Radar – The Best and the Worst Tech of the Decade:

"SOAP was a particularly egregious failure, because it was sold so heavily as the final solution to the interoperability problem. The catch, of course, was that no two vendors implemented the stack quite the same way, with the result that getting a .NET SOAP client to talk to a Java server could be a nightmare. […] And the WSDL files that define SOAP endpoints are unreadable and impossible to generate by hand."

Tue, 22 Dec 2009 22:05:22 +0100
2009-10-20

PubHubSunday

Tim Bray – PubHubSunday:

"I see PubSubHubbub, as much as anything, as an attempt to capture Twitter’s pattern of information flow in a reproducible, interoperable way.

[…] Hooking up a publishing system to the PubSubHubbub machinery is damn easy; I know because I just did it. You have to put <link> element(s) in your Atom feed pointing at one or more hubs that will be aggregating you. Then, when you update your site, you need to ping the hub(s) using HTTP POST."
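The two steps from the quote (advertise the hub in the feed, then ping it) can be sketched as follows. The feed, the hub URL and the function names are made up. The form fields hub.mode=publish and hub.url are how I read the PubSubHubbub 0.3 spec's "new content notification"; they are not taken from the quoted post. The request is only built, not sent.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical Atom feed that advertises one hub via <link rel="hub">.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="self" href="http://example.com/feed.atom"/>
  <link rel="hub" href="http://hub.example.net/"/>
</feed>"""


def hub_urls(feed_xml):
    """Return the href of every feed-level <link rel="hub"> element."""
    root = ET.fromstring(feed_xml)
    return [link.get("href")
            for link in root.findall(ATOM + "link")
            if link.get("rel") == "hub"]


def publish_ping(hub_url, topic_url):
    """Build (but do not send) the HTTP POST that tells a hub the feed changed."""
    body = urllib.parse.urlencode(
        {"hub.mode": "publish", "hub.url": topic_url}
    ).encode("ascii")
    return urllib.request.Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    for hub in hub_urls(FEED):
        request = publish_ping(hub, "http://example.com/feed.atom")
        print(request.get_method(), request.full_url, request.data.decode("ascii"))
        # urllib.request.urlopen(request) would actually send the ping.
```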

Tue, 20 Oct 2009 14:33:44 +0200
2009-10-07

View Source Tutorial: Fancy Web Page Using HTML5, CSS, and SVG

Ajaxian – View Source Tutorial: Fancy Web Page Using HTML5, CSS, and SVG:

"I love this example because it shows a few things. First, that SVG is definitely not dead; this works on every browser but IE -- that doesn't sound like a dead technology to me. Second, this example is much better done with SVG than the Canvas tag."

Wed, 07 Oct 2009 20:59:34 +0200
2009-10-06

Announcing Custom Times Feeds

Tom Jackson at the New York Times – Announcing Custom Times Feeds:

"About a year ago, we launched the Times Article Search API and the TimesTags API, two systems that allow developers to search our extensive collection of articles and to identify the canonical terms we use to describe them. Together, these APIs provide developers with the means to accurately find articles that are relevant to nearly any subject.

But there’s a problem with these (and most) APIs: they’re inherently narrow in their reach. The methods for using them are indecipherable for the average (non-developer) Times reader, and the data returned are formatted in a way that’s specific to each API, limiting their use across the Web. Enter the solution: custom RSS feeds."

Sounds to me like they should have based their Search API on OpenSearch: they would have gotten custom feeds for free…
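What I mean by "for free": an OpenSearch description document advertises a URL template, and the response format behind that template can simply be RSS or Atom, so every query is a subscribable feed. Here is a minimal sketch of filling in such a template. The example.com URL is hypothetical, and the sketch only handles unprefixed parameter names like {searchTerms} and optional ones like {startIndex?}.

```python
import re
import urllib.parse

# Hypothetical OpenSearch URL template; a trailing "?" marks an optional parameter.
TEMPLATE = "http://example.com/search?q={searchTerms}&start={startIndex?}&format=atom"

_PARAM = re.compile(r"\{([A-Za-z]+)(\?)?\}")


def fill_template(template, **params):
    """Substitute template parameters; optional ones left out become empty strings."""

    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return urllib.parse.quote(str(params[name]), safe="")
        if optional:
            return ""
        raise KeyError(name)

    return _PARAM.sub(replace, template)


if __name__ == "__main__":
    # The resulting URL can be handed to any feed reader as a "search feed".
    print(fill_template(TEMPLATE, searchTerms="image metadata"))
```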

Tue, 06 Oct 2009 12:36:00 +0200
2008-11-28

REST APIs must be hypertext-driven

Roy Fielding – REST APIs must be hypertext-driven:

"A REST API should be entered with no prior knowledge beyond the initial URI (bookmark) and set of standardized media types that are appropriate for the intended audience (i.e., expected to be understood by any client that might use the API). From that point on, all application state transitions must be driven by client selection of server-provided choices that are present in the received representations or implied by the user’s manipulation of those representations."

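A toy sketch of what that means for a client, under my own assumptions: the representations are plain dictionaries with a hypothetical "links" member, and the only things hard-coded in the client are the bookmark URI and the names of the link relations. Every other URI comes out of a server response.

```python
# Hypothetical server responses, keyed by URI. A real client would fetch them over HTTP.
RESPONSES = {
    "/": {"links": {"orders": "/o"}},
    "/o": {"items": ["order 1", "order 2"], "links": {"next": "/o?page=2"}},
    "/o?page=2": {"items": ["order 3"], "links": {}},
}


def follow(fetch, bookmark, rels):
    """Start at the bookmark URI and follow the named link relations in order."""
    uri = bookmark
    representation = fetch(uri)
    for rel in rels:
        # The server, not the client, decides what URI sits behind each relation.
        uri = representation["links"][rel]
        representation = fetch(uri)
    return uri, representation


if __name__ == "__main__":
    uri, representation = follow(RESPONSES.__getitem__, "/", ["orders", "next"])
    print(uri, representation["items"])
```

If the server later moves the order list to a different URI, this client keeps working as long as the "orders" and "next" relations stay in the representations.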
Fri, 28 Nov 2008 12:33:14 +0100
2008-07-08

Atomic Monday

Tim Bray – Atomic Monday:

"To post an image (or any other bit-blob) with Atompub, you HTTP-POST it; the server stores it and creates a synthetic Atom entry for metadata about it. Then if you want to update the metadata, you have to PUT that. So Joe Gregorio, based on his work at Google, is proposing “atom-multipart”; the idea is to pack up your bit-blob and an Atom entry full of metadata, and push ’em at the server in a MIME multipart package.

Everyone seems to like the idea, the Atom-protocol mailing list is chewing it over, the IETF seems to think it’s appropriate for the standards track, and I’ve volunteered to be the consensus referee."

Tue, 08 Jul 2008 10:57:26 +0200
2008-04-03

And now the appeals and reactions while OOXML sits on hold

Georg Greve at Groklaw - And now the appeals and reactions while OOXML sits on hold:

"Ha! Caught some of you. Because some of you *did* think Microsoft was changing and getting more open and was wanting to build bridges to FOSS, etc. I know you did. I hoped for a while myself. Well, take a look at the evidence splayed out before us on the ISO table. It speaks. And what it says is, "There is no new Microsoft."

And so we need to get smarter. Make the division more clear. People will choose well, given a clear choice. Firefox and Ubuntu and Red Hat and others have demonstrated that. There is no need to compromise. And if you are tempted by the money, think about the rest of us, will you? Look at ISO. Do you want to be like that?

Anyone, then, from this day forward who is naive enough to believe a single word from Microsoft needs to see a doctor right away. That is the single most important positive result from this OOXML process, as far as I'm concerned. Now we know."

See also David DeJean at Computerworld – Microsoft wins this OOXML battle, but loses the war

Thu, 03 Apr 2008 23:14:16 +0200

OOXML's (Out of) Control Characters

Rob Weir on the Microsoft Office XML format - OOXML's (Out of) Control Characters:

"Let's now look at how OOXML defines the semantics of its ST_Xstring type:

“ST_Xstring (Escaped String) - String of characters with support for escaped invalid-XML characters. For all characters which cannot be represented in XML as defined by the XML 1.0 specification, the characters are escaped using the Unicode numerical character representation escape character format _xHHHH_, where H represents a hexadecimal character in the character's value. […]”

In other words, although ST_Xstring is declared to be a restriction of xsd:string it is, via a proprietary escape notation, in fact expanding the semantics of xsd:string to create a value space that includes additional characters, including characters that are invalid in XML.

[…] The reader might think that I exaggerate the importance of this, that surely ST_Xstring is only used in OOXML in edge cases, in rare, compatibility modes. We wish that this were true. However, a look at the DIS 29500 shows that ST_Xstring is pervasive, and in fact is the predominant data type in SpreadsheetML, used to express the vast majority of spreadsheet content, including cell contents, headers, footers, displays strings, error strings, tooltip help, range names, etc. Any application that operates on an OOXML spreadsheet will need to deal with this mess."

A commenter says: "I thought OOXML stood for "optionally open XML" but it looks to me it actually is a recursive acronym: OOXML Obviously ain't XML"
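To make the _xHHHH_ notation from the quote concrete, here is a minimal decoding sketch. It assumes exactly four hex digits per escape, as the quoted definition says. It also assumes that a literal underscore can be protected by writing it as _x005F_, which is how I understand the format escapes itself. The function name is hypothetical.

```python
import re

# One escape: "_x", four hexadecimal digits, "_".
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_xstring(value):
    """Replace every _xHHHH_ escape with the character it stands for."""
    return _ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), value)


if __name__ == "__main__":
    decoded = decode_xstring("line1_x000D_line2")
    print(repr(decoded))
```

The decoded value can contain characters such as U+0007 that XML 1.0 does not allow at all, so a generic XML toolchain never sees the "real" string. That is the problem the quote describes.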

Thu, 03 Apr 2008 09:29:35 +0200
2008-02-19

Saved by xmpp4moz!

Bolinfest - Saved by xmpp4moz!:

"I downloaded the SamePlace Suite Firefox extension, which is a small suite of applications built on top of xmpp4moz. I fired it up and discovered that it contained a nice Jabber client written in XUL with explicit support for Google Talk. The only remaining question was: how did it work?"

Tue, 19 Feb 2008 16:54:36 +0100
2008-02-08

The future of XML

Elliotte Rusty Harold at IBM developerWorks - The future of XML:

"Of course, the most important conversion isn't from OpenDoc to OOXML or vice versa: it's a down conversion from either OpenDoc or OOXML to XHTML. The HTML exporters in OpenOffice and Microsoft Office are uniformly atrocious. Look for third-party developers to pick up the slack. Most important, look for individual corporate developers and webmasters to begin publishing custom templates for their sites. This will enable regular folks to write in Microsoft Word as they're accustomed to doing and then upload their musings straight into the local content-management system. Editing and reviewing tools can be built right in.

[...] Traditionally, you see two hard problems in training non-techies to write for the Web: teaching them semantic markup and showing them how to use FTP. (Remember, many nontechnical users can't even use the standard File Open dialog box. They store everything in the My Documents folder or on the desktop. They're lost if they accidentally put a file somewhere else. Programmers understand hierarchies, but many users don't think that abstractly.)

XML-enabled word processors like OpenOffice and Microsoft Word solve the first problem. The Atom Publishing Protocol solves the second. APP will do for Web authoring what HTTP did for Web browsing: provide a standard protocol that a variety of independent clients and servers can use to communicate without prior agreement or a shared conceptual model.

[...] Query is finally ready for production, and APP is ready to break out. If I was looking to invest money or time in XML, these are the technologies I'd focus on. The world might not need yet another content-management system, blog engine, or bulletin board; but it absolutely could use each of these if they stored and searched their content with a native XML database and published to it with APP."

Fri, 08 Feb 2008 16:24:19 +0100
2008-01-16

Atom Is The New JCR

Adrian Sutton - Atom Is The New JCR:

"When the Java Content Repository (JCR) standard first came out it was supposed to bring in a new era of compatibility between content repositories and put an end to the content silo. There was, and still is, a lot of talk about it and just about everyone added JCR compliance to their marketing materials. [...] There are a few CMSs around that do have good JCR support - Alfresco for example - but they're few and far between and even with that, there isn't a lot of people taking advantage of that support and the standardization of the repository interface.

Then along came Atom which is all about remote access and manipulation of data and missing probably 90% of the functionality that JCR offers. It really isn't a competitor to JCR at all and yet it's doing more to break down content silos than JCR ever has.

[...] Having Atom support in your product, serving and consuming as necessary is becoming an extremely powerful feature."

(Via Sam Ruby.) 

Wed, 16 Jan 2008 21:42:02 +0100
2008-01-02

XML Schemas: guaranteed non-interoperability as a design methodology?

Rick Jelliffe at the O'Reilly XML Blog - XML Schemas: guaranteed non-interoperability as a design methodology?:

"Why not? Because, as far as I can make out, the idea that we will all be better off if we pretend that XML Schemas is a unified and whole specification, one size that can fit all, then somehow it will magically happen. But fantasy is a really poor substitute for reality. Time and time again I have seen clients happy about XML Schemas and its promises, only to have their hopes dashed as they realize that as soon as they need to start deploying they have to use subsets and there is no support from “standards” to help interoperability."

Wed, 02 Jan 2008 10:07:14 +0100
2007-11-22

WS-dämmerung

Via Tim Bray - WS-dämmerung

James M. Snell - Notes:

"Those who are familiar with my history with IBM should know that I was once a *major* proponent of the WS-* approach. [...] I was involved in most of the internal efforts to design and prototype nearly all of the WS-* specifications. However, over the last two years I haven’t written a single line of code that has anything to do with WS-*. The reason for this change is simple: when I was working on WS-*, I never once worked on an application that solved a real business need. Everything I wrote back then were demos. Now that I’m working for IBM’s WebAhead group, building and supporting applications that are being used by tens of thousands of my fellow IBMers, I haven’t come across a single use case where WS-* would be a suitable fit. In contrast, during that same period of time, I’ve implemented no fewer than 10 Atom Publishing Protocol implementations, have helped a number of IBM products implement Atom and Atompub support, published thousands of Atom feeds within the firewall, etc."

Dare Obasanjo - WS-* is to REST as Theory is to Practice:

"My movement towards embracing building RESTful Web services from being a WS-* advocate is based on my experiences as someone who worked on the fundamental building blocks of these technologies and then as someone who became a user of these technologies when I moved to MSN Windows Live. The seeds were probably sown when I found myself writing code to convert Microsoft’s GetTopDownloads Web service to an RSS feed because the SOAP Web service was more complicated to deal with and less useful than an RSS feed. Later on I realized that RSS was the quintessential RESTful Web service and just asking people “How many RSS feeds does Microsoft produce?” versus how many SOAP endpoints does Microsoft expose is illuminating in itself." 

Thu, 22 Nov 2007 10:15:11 +0100
2007-09-11

A conversation with Rohit Khare about syndication-oriented architecture

Jon Udell - A conversation with Rohit Khare about syndication-oriented architecture:

"Start by “RSSifying” everything in sight. Then flow all the feeds through a “syndication bus”."

Tue, 11 Sep 2007 21:13:44 +0200
2007-09-10

Update to libxml2 in PHP - progress hath been acquired

Greg Beaver - Update to libxml2 in PHP - progress hath been acquired:

"I am abandoning the creation of a relax NG schema in favor of the battle-tested xsd. The error messages for xsd validation are far clearer than the rng ones."

W3C Schema/Relax NG/DTD seem to be totally useless in PHP, help?:

"Now that I am working on the PHP 5+ implementation of Pyrus, the first thing I thought I might do is create a Relax NG schema that the PHP libxml can handle. After an entire day of fighting with the thing, I've managed to discover more than 10 simple and valid Relax NG schema that simply don't work with the version of libxml distributed with PHP 5.2.3. In addition, with helpful error messages like "Expecting name, got nothing here," even with the use of libxml_use_internal_errors() I find the error reporting to be excruciatingly useless."

Mon, 10 Sep 2007 09:58:37 +0200
2007-08-28

JavaScript Jabber Client Library

"JSJaC is a jabber client library written in JavaScript to ease implementation of web based jabber clients."

There's a separate page about the 1.0 branch

Tue, 28 Aug 2007 10:42:34 +0200
2007-08-27

The fall of the Desktop and the File and the rise of Topical Interfaces and Topical Documents

Rick Jelliffe at XML.com - The fall of the Desktop and the File and the rise of Topical Interfaces and Topical Documents:

"The rise of Topics represents a great challenge to operating system and desktop suite vendors. When we look at Windows, or Mac or Linux window managers, we see that they really interact with the user at the wrong level. They say that the topic the user is interested in is applications and files. But how many people nowadays start their computer interaction with a web browser pointed to Google? There are still people whose organizing topic of interest in their computer interaction is the file or application, of course, but they have been swamped by people who are interested in the topic."

Mon, 27 Aug 2007 22:15:41 +0200
2007-08-13

InDesign CS3 and XML Authoring: Could be Good

Eliot Kimber - InDesign CS3 and XML Authoring: Could be Good:

"The main gotcha here is that InDesign is sensitive to newlines in the XML data, because newlines trigger the application of paragraph styles. What I've found so far is that you have to manage the text content very carefully so that you only emit newlines at true paragraph boundaries.

[...] 6. Switch to InDesign and bring up the Links pallet. In that you'll find your XML document listed. Select it and click the "update link" button. Magically, your XML changes are re-imported into InDesign and the styles applied.

Hey presto! Immediate, easy, convenient pagination of XML using InDesign. Something that was not immediate, easy, or convenient with CS2."

Mon, 13 Aug 2007 16:18:45 +0200
2007-08-08

Introducing OpenSearch

Uche Ogbuji at XML.com - Introducing OpenSearch:

"Search and web feeds go together pretty naturally, as anyone who has set up some kind of vanity search feed knows. [...] Rather than having to poll the search engine yourself and having to remember which results you have seen, your reader will simply alert you when there are new results. This simple but very useful concept is the core idea behind the OpenSearch specification."

Wed, 08 Aug 2007 16:17:25 +0200
2007-06-27

mod_atom

Tim Bray - mod_atom:

"This is a stripped-down implementation of the server side of the Atom Publishing Protocol as an Apache module, implemented in C."

Wed, 27 Jun 2007 11:59:23 +0200
2007-02-14

Introducing RDFa

Bob DuCharme - Introducing RDFa:

"For a long time now, RDF has shown great promise as a flexible format for storing, aggregating, and using metadata. Maybe for too long—its most well-known syntax, RDF/XML, is messy enough to have scared many people away from RDF. The W3C is developing a new, simpler syntax called RDFa (originally called "RDF/a") that is easy enough to create and to use in applications that it may win back a lot of the people who were first scared off by the verbosity, striping, container complications, and other complexity issues that made RDF/XML look so ugly."

Wed, 14 Feb 2007 23:58:49 +0100
2006-12-14

Validation considered harmful

Mark Baker - Validation considered harmful:

"A good rule of thumb in document design is to avoid making assumptions about what won’t be there in the future, and a rule of thumb for software is to defer checking extension fields or values until you can’t any longer. On the Web, you need to be able to process messages from the future."

Thu, 14 Dec 2006 07:41:55 +0100
2006-12-11

Adobe MARS: Looks Interesting

Eliot Kimber - Adobe MARS: Looks Interesting:

"MARS is an XML-based format that is intended as a functional replacement for PDF. It's not really accurate to call it an XML version of PDF because it's not a simple transliteration of PDF into tags (which could be done easily enough) but a ground-up exercise in designing an XML-based scheme for doing what PDF does.

[...] MARS tries to use standards as much as it can and it seems to do so to a remarkable level of completeness. It uses SVG for representing each page, supports the usual standards for media objects (bitmaps, videos, etc.). Uses Zip for packaging, and so on." 

Mon, 11 Dec 2006 12:18:02 +0100
2006-12-06

Microformats Icons

Wolfgang Bartelme - Microformats Icons:

"As Microformats have gained much popularity over the last year we thought it was time to standardize the way they are represented on a website. So we created the Microformats Icon Set. The starter set contains icons for hCal, hResume, hCard, XFN and a generic TAG icon."

Wed, 06 Dec 2006 23:52:04 +0100
2006-12-01

They can’t hear you

Pete Lacey - They can’t hear you:

"Maybe you don’t work for or with a Global 2000 company, so I’ll let you in on a little secret: They Can’t Hear You! That’s right, the CIOs, and Enterprise Architects, and, yes, even the journeyman programmer employed by these firms have no idea that there’s even a discussion going on. [...] And the typical corporate technologist (broad strokes here, of course I don’t mean you) hasn’t considered REST and decided against it, they haven’t even heard the term. Ditto RelaxNG, Django, Atom, and everything else that makes the Web work and makes working with the Web easy.

[...] Business-oriented technologists refuse to believe that simple solutions apply to their problem set. It’s always been complex before, and gosh darnit, it’s gonna stay that way. They want transactions, and reliability, and asynchronous messaging, and orchestration, and everything else. If it doesn’t look like Rendezvous or Tuxedo or BizTalk, then it can’t be a business grade solution, therefore it must be some toy."

Fri, 01 Dec 2006 10:58:08 +0100
2006-11-28

Choose RELAX Now

Tim Bray - Choose RELAX Now:

"Everybody who actually touches the technology has known the truth for years, and it’s time to stop sweeping it under the rug. W3C XML Schemas (XSD) suck. They are hard to read, hard to write, hard to understand, have interoperability problems, and are unable to describe lots of things you want to do all the time in XML. Schemas based on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are backed by a rigorous formalism for interoperability, and can describe immensely more different XML constructs."

Tue, 28 Nov 2006 10:14:29 +0100
2006-11-24

Microsoft XML Notepad 2007

Chris Lovett - XML Notepad 2007 Design:

"I finally got around to fulfilling a promise I made to a friend at MSDN. Back in 1998, we shipped an XML Notepad, written by Murray Low in C++. Later on it fell behind in support for XML standards and, because we didn't have time to fix it, we pulled it off MSDN. But Murray apparently did such a nice job that MSDN was inundated with requests to put the notepad back up, so MSDN asked me for a replacement.

I've been working on System.Xml in C# since 1999, so I figured I could crank out a replacement using the .NET Framework pretty quickly. Well, the problem was it was one of those side projects on my "one-day" list — you know how that goes!"

Fri, 24 Nov 2006 17:11:30 +0100
2006-11-17

The S stands for Simple

Pete Lacey - The S stands for Simple:

"SOAP Guy: On the bright side, nobody uses the Doc/Lit style anymore. In order to get transport independence back we’re all using wrapped-doc/lit now. Doesn’t that sound cool: wrapped-doc/lit?

Developer: What’s that?

SG: Well, it’s just like Doc/Lit, but you take the whole message and wrap it in an element that has the same name as the operation. Now the operation name is back in the message where it belongs.

Dev: Okay, where’s the spec on this?

SG: Oh, there is no spec. This is just what Microsoft seems to be doing. Looked like a good idea, so now all the cool kids are doing it."

Fri, 17 Nov 2006 08:29:59 +0100
2006-11-08

The Next Web?

Simon St. Laurent at XML.com - The Next Web?:

"Developers who craft smart APIs on their servers for use by AJAX-based web pages can then expose those APIs to other developers, getting the benefits of better interfaces for users who use web browsers to consume the data and for users who have their own custom programs consuming the data. Depending on how carefully the developer models AJAX transactions on traditional web HTTP transactions, these services even look a lot like the REST approach proposed earlier for web services."

Wed, 08 Nov 2006 16:32:00 +0100

Web apps, just give me the data

Jon Udell at InfoWorld - Web apps, just give me the data:

"Scraping data off Web pages can be effective, but it’s far from ideal. Although we think of the Web as a rich trove of data, the pickings are depressingly slim if you want to transform or recombine that data. And there’s no good reason why that should be so. It’s easy to make data available for reuse by human analysts or automatic services."

Wed, 08 Nov 2006 16:12:14 +0100
2006-11-07

Trade-offs

Sam Ruby - Trade-offs:

"Contrary to what some will lead you to believe, I submit that it is possible for a query to be simultaneously SOAP, REST, RPC, POX, and transmitted using HTTP POST. And all this could be done with or without a WSDL document or an XML schema."

Tue, 07 Nov 2006 20:54:56 +0100
2006-10-29

Windows Live Barcode

Imran Ali - Windows Live Barcode:

"I’ve long thought the potential of 2D codes, like QR, Semacode and others, was enormous - but very few handsets are equipped, by default, with the capability to scan codes. Consequently, we see very few codes embedded in online services or the physical world. Microsoft’s move could help to kick start code usage… I have a niggling feeling that there’s a compelling intersection of microformats and 2D codes."

Sun, 29 Oct 2006 23:37:21 +0100
2006-09-17

I Think, Therefore I eXist ...

Kurt Cagle - I Think, Therefore I eXist ...:

"eXist exposes a WebDAV interface that any WebDAV enabled software can use to both retrieve and write content from/to the database. For instance, from Oxygen (my all time favorite XML editor) I can select Open URL, pass the appropriate URL (http://localhost:8080/exist/webdav/db/ on the default) and you can then open XML “files” from the database, work on them, and save them, all without realizing that these are not files at all. From both Internet Explorer and Konqueror, you can also create Folders and drag XML content into them, creating collections and populating them very simply."

Sun, 17 Sep 2006 23:37:51 +0200
2006-09-14

“Publish” Everywhere

Tim Bray - “Publish” Everywhere:

"Here’s the Atom dream: A “Publish” button on everything. On every word processor and email reader and web browser and cellphone and PDA and spreadsheet and photo-editor and digicam and outliner and sales-force tracker. Really, everywhere. If it doesn’t have a “Publish” button, it’s broken."

Thu, 14 Sep 2006 23:05:09 +0200

atomic

"atomic is an Atom protocol client implemented as a Firefox extension. It can communicate with any number of different Atom protocol servers that support introspection. [...] To view the plugin, go to View->Sidebar->Atomic. [...] This plugin uses the tinyMCE editor for authoring XHTML."

Thu, 14 Sep 2006 13:37:45 +0200
2006-09-06

Office Open XML: Good or Evil?

Eliot Kimber - Office Open XML: Good or Evil?:

"Oh, and I hate MS Word with the fiery passion of a thousand burning suns. I'd sooner chew off my own arm than spend any time actually authoring words in Word. I've spent so many years authoring XML that having to deal with $*%&# like doing a backspace at the end of a paragraph destroys its formatting with no good way to get it back or the complete inability to do autonumbering and any other number of just stupid things that people tolerate day after day for reasons that I can't understand and the egregious waste of productivity that I've observed in my own XML-steeped colleagues who are literally sitting next to me just makes me want to SCREAM. But that's just me."

Wed, 06 Sep 2006 22:14:50 +0200
2006-09-01

The internets are made of tubes

Avi Bryant - The internets are made of tubes:

"With Dabble, anyone can now import data from a feed, combine it with data from elsewhere, restructure and filter it as needed, and push it out as another feed so the process can repeat."

Fri, 01 Sep 2006 21:23:36 +0200

Using Service Data Objects to construct XML

Graham Charters at IBM developerWorks - Using Service Data Objects to construct XML:

"The code extract below shows how SDO can be used to load the XML schema and a quotes document and then add a new quote entry to that document."

Fri, 01 Sep 2006 10:28:46 +0200

XStandard

"XStandard is the leading standards-compliant plug-in WYSIWYG editor for desktop applications and browser-based content management systems (IE/Mozilla/Firefox/Opera/Safari/Netscape).

The editor generates clean XHTML Strict or 1.1, uses CSS for formatting, and ensures the clean separation of content from presentation. Markup generated by XStandard meets the most demanding accessibility requirements. The editor's cool features include drag & drop file upload, spell checking and an image library that integrates tightly with your CMS.

XStandard Lite is free for commercial use."

Or maybe one can add XStandard-like custom tags and attributes to TinyMCE?

Fri, 01 Sep 2006 00:08:14 +0200
2006-08-29

NOT Getting Started with PHP 5 SOAP

Not having played with PHP 5's native SOAP extension yet, I expected it to work smoothly with the simplest application I could think of: querying Google via its SOAP Search API. Well...

I first compiled the latest PHP 5.1.6 with --enable-soap and downloaded the Google SOAP Search API developer's kit which contains their WSDL file, GoogleSearch.wsdl.

Running this example PHP code...

    <?php
    $client = new SoapClient('GoogleSearch.wsdl');
    try {
        $result = $client->doGoogleSearch(
            '[Secret Google key]',
            'Tim Strehle',
            0,
            3
        );
        foreach ($result->resultElements as $resultElement) {
            print $resultElement->URL;
        }
    } catch (SOAPFault $f) {
        echo $f->faultstring . "\n";
    }
    ?>

... produced a lovely error message:

    tim@vm:/tmp>php test.php
    No Deserializer found to deserialize a ':filter' using encoding style 'http://schemas.xmlsoap.org/soap/encoding/'.

Here's the actual SOAP request PHP was sending:

    <?xml version="1.0" encoding="UTF-8"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="urn:GoogleSearch" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <SOAP-ENV:Body>
        <ns1:doGoogleSearch>
          <key xsi:type="xsd:string">[Secret Google key]</key>
          <q xsi:type="xsd:string">Tim Strehle</q>
          <start xsi:type="xsd:int">0</start>
          <maxResults xsi:type="xsd:int">3</maxResults>
          <filter xsi:nil="true"/>
          <restrict xsi:nil="true"/>
          <safeSearch xsi:nil="true"/>
          <lr xsi:nil="true"/>
          <ie xsi:nil="true"/>
          <oe xsi:nil="true"/>
        </ns1:doGoogleSearch>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>

The error message did come from Google:

    <?xml version="1.0" encoding="UTF-8"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:xsd="http://www.w3.org/1999/XMLSchema">
      <SOAP-ENV:Body>
        <SOAP-ENV:Fault>
          <faultcode>SOAP-ENV:Client</faultcode>
          <faultstring>No Deserializer found to deserialize a ':filter' using encoding style 'http://schemas.xmlsoap.org/soap/encoding/'.</faultstring>
          <faultactor>/search/beta2</faultactor>
        </SOAP-ENV:Fault>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>

Obviously Google doesn't like the xsi:nil stuff created by PHP. Modifying those empty tags manually in an XML file...

    <?xml version="1.0" encoding="UTF-8"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="urn:GoogleSearch" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <SOAP-ENV:Body>
        <ns1:doGoogleSearch>
          <key xsi:type="xsd:string">[Secret Google key]</key>
          <q xsi:type="xsd:string">Tim Strehle</q>
          <start xsi:type="xsd:int">0</start>
          <maxResults xsi:type="xsd:int">3</maxResults>
          <filter xsi:type="xsd:boolean">false</filter>
          <restrict xsi:type="xsd:string"></restrict>
          <safeSearch xsi:type="xsd:boolean">false</safeSearch>
          <lr xsi:type="xsd:string"></lr>
          <ie xsi:type="xsd:string"></ie>
          <oe xsi:type="xsd:string"></oe>
        </ns1:doGoogleSearch>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>

... and sending the SOAP request using curl finally produced correct results:

    tim@vm:/tmp>cat test.curl
    header = "SOAPAction: urn:GoogleSearchAction"
    header = "Content-Type: text/xml"
    data = "@/tmp/test.xml"
    url = "http://api.google.com/search/beta2"
    tim@vm:/tmp>curl -K test.curl | xmllint --format -
    <?xml version="1.0" encoding="UTF-8"?>
    <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:xsd="http://www.w3.org/1999/XMLSchema">
      <SOAP-ENV:Body>
        <ns1:doGoogleSearchResponse xmlns:ns1="urn:GoogleSearch" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
          <return xsi:type="ns1:GoogleSearchResult">
            <directoryCategories xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/" xsi:type="ns2:Array" ns2:arrayType="ns1:DirectoryCategory[0]">
            </directoryCategories>
            <documentFiltering xsi:type="xsd:boolean">false</documentFiltering>
            <endIndex xsi:type="xsd:int">3</endIndex>
            <estimateIsExact xsi:type="xsd:boolean">false</estimateIsExact>
            <estimatedTotalResultsCount xsi:type="xsd:int">184000</estimatedTotalResultsCount>
            <resultElements xmlns:ns3="http://schemas.xmlsoap.org/soap/encoding/" xsi:type="ns3:Array" ns3:arrayType="ns1:ResultElement[3]">
              <item xsi:type="ns1:ResultElement">
                <URL xsi:type="xsd:string">http://tim.digicol.de/</URL>
                <cachedSize xsi:type="xsd:string">4k</cachedSize>
                <directoryCategory xsi:type="ns1:DirectoryCategory">
                  <fullViewableName xsi:type="xsd:string"/>
                  <specialEncoding xsi:type="xsd:string"/>
                </directoryCategory>
                <directoryTitle xsi:type="xsd:string"/>
                <hostName xsi:type="xsd:string"/>
                <relatedInformationPresent xsi:type="xsd:boolean">true</relatedInformationPresent>
                <snippet xsi:type="xsd:string">&lt;b&gt;Tim&lt;/b&gt; at his desk &lt;b&gt;Tim Strehle&lt;/b&gt; @ Digital Collections &lt;b&gt;...&lt;/b&gt; &lt;b&gt;Tim&amp;#39;s&lt;/b&gt; Weblog &amp;middot; Ceterum censeo...&lt;br&gt; www.liederdatenbank.de, my personal project for building a database &lt;b&gt;...&lt;/b&gt;</snippet>
                <summary xsi:type="xsd:string"/>
                <title xsi:type="xsd:string">&lt;b&gt;Tim Strehle&lt;/b&gt; @ Digital Collections</title>
              </item>
              <item xsi:type="ns1:ResultElement">
                <URL xsi:type="xsd:string">http://tim.digicol.de/weblog/</URL>
                <cachedSize xsi:type="xsd:string">33k</cachedSize>
                <directoryCategory xsi:type="ns1:DirectoryCategory">
                  <fullViewableName xsi:type="xsd:string"/>
                  <specialEncoding xsi:type="xsd:string"/>
                </directoryCategory>
                <directoryTitle xsi:type="xsd:string"/>
                <hostName xsi:type="xsd:string"/>
                <relatedInformationPresent xsi:type="xsd:boolean">true</relatedInformationPresent>
                <snippet xsi:type="xsd:string">My linkblog: What I (&lt;b&gt;Tim Strehle&lt;/b&gt;) read on the web, on PHP. XML. Information Science&lt;br&gt; and Information Architecture... 2006-08-23 &lt;b&gt;...&lt;/b&gt;</snippet>
                <summary xsi:type="xsd:string"/>
                <title xsi:type="xsd:string">&lt;b&gt;Tim&amp;#39;s&lt;/b&gt; Weblog » Latest posts</title>
              </item>
              <item xsi:type="ns1:ResultElement">
                <URL xsi:type="xsd:string">http://freshmeat.net/~tistre/</URL>
                <cachedSize xsi:type="xsd:string">12k</cachedSize>
                <directoryCategory xsi:type="ns1:DirectoryCategory">
                  <fullViewableName xsi:type="xsd:string"/>
                  <specialEncoding xsi:type="xsd:string"/>
                </directoryCategory>
                <directoryTitle xsi:type="xsd:string"/>
                <hostName xsi:type="xsd:string"/>
                <relatedInformationPresent xsi:type="xsd:boolean">true</relatedInformationPresent>
                <snippet xsi:type="xsd:string">User info page for &lt;b&gt;Tim Strehle&lt;/b&gt;. Name: &lt;b&gt;Tim Strehle&lt;/b&gt;. User ID: #96744. Email: &lt;b&gt;tim&lt;/b&gt;&lt;br&gt; __at__ &lt;b&gt;strehle&lt;/b&gt; __dot__ &lt;b&gt;...&lt;/b&gt; &lt;b&gt;Tim Strehle&lt;/b&gt; didn&amp;#39;t post any article comments yet. &lt;b&gt;...&lt;/b&gt;</snippet>
                <summary xsi:type="xsd:string"/>
                <title xsi:type="xsd:string">freshmeat.net: User information</title>
              </item>
            </resultElements>
            <searchComments xsi:type="xsd:string"/>
            <searchQuery xsi:type="xsd:string">Tim Strehle</searchQuery>
            <searchTime xsi:type="xsd:double">0.020699</searchTime>
            <searchTips xsi:type="xsd:string"/>
            <startIndex xsi:type="xsd:int">1</startIndex>
          </return>
        </ns1:doGoogleSearchResponse>
      </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>

I don't know enough about SOAP to say whether this is Google's or PHP's fault, but it's definitely not the "just works" experience I'd expect from either of them. If SOAP's complexity itself is to blame, I may have been right to lean towards REST (or POX over HTTP) without ever having used much SOAP...
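
The manual fix above, replacing each xsi:nil element with a typed, empty default, could also be scripted. Here is a minimal sketch in Python (not PHP, and not part of the original experiment). The names build_search_request, PARAMS and DEFAULTS are hypothetical; the parameter list is copied from the hand-edited request shown above. The "gs" prefix stands in for ns1 only because ElementTree reserves prefixes of the form ns1, ns2, ... for itself.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
XSD = "http://www.w3.org/2001/XMLSchema"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
GOOGLE = "urn:GoogleSearch"

# Parameter names, order and types as they appear in the request above.
PARAMS = [
    ("key", "string"), ("q", "string"), ("start", "int"),
    ("maxResults", "int"), ("filter", "boolean"), ("restrict", "string"),
    ("safeSearch", "boolean"), ("lr", "string"), ("ie", "string"),
    ("oe", "string"),
]

# Typed defaults sent instead of xsi:nil when a parameter is omitted.
DEFAULTS = {"string": "", "int": "0", "boolean": "false"}

ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("xsi", XSI)
ET.register_namespace("gs", GOOGLE)


def build_search_request(**values):
    """Hypothetical helper: return the SOAP envelope as a string."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    # The xsd prefix only occurs inside attribute values ("xsd:string"),
    # so ElementTree would not declare it on its own.
    envelope.set("xmlns:xsd", XSD)
    envelope.set(f"{{{SOAP_ENV}}}encodingStyle", SOAP_ENC)
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{GOOGLE}}}doGoogleSearch")
    for name, xsd_type in PARAMS:
        value = values.get(name, DEFAULTS[xsd_type])
        if isinstance(value, bool):
            value = "true" if value else "false"
        param = ET.SubElement(call, name)
        param.set(f"{{{XSI}}}type", f"xsd:{xsd_type}")
        param.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


print(build_search_request(key="[Secret Google key]", q="Tim Strehle", maxResults=3))
```

Every optional parameter ends up as a typed element, so no xsi:nil attribute is ever written. Whether a given SOAP server accepts this is, as the experiment shows, something to verify rather than assume.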

Update: "Optional fields cause problems", says Dare Obasanjo - ETech 2005 Trip Report: Building a New Web Service at Google... It turns out that filling in all optional parameters in the SOAP call makes the PHP script work:

    <?php
    $client = new SoapClient('GoogleSearch.wsdl');
    try {
        $result = $client->doGoogleSearch(
            '[Secret Google key]',
            'Tim Strehle',
            0,
            3,
            false,
            '',
            false,
            '',
            '',
            ''
        );
        ...
    ?>

Tue, 29 Aug 2006 00:24:28 +0200
2006-08-22

WP-APP: Atom Publisher Protocol for WordPress

Elias Torres - WP-APP: Atom Publisher Protocol for WordPress:

"Simply drop app.php into your root wordpress install directory and have some Atom/APP fun.

If you don’t have wordpress installed, I’ve setup a demo server for you to try it out. First, register to get your own account, since you won’t be able to post/put/delete entries without a valid username and password. The rest is all curl geekiness for those who care."

Tue, 22 Aug 2006 23:35:46 +0200
2006-08-11

Enhydra JaWE

"Enhydra JaWE (Java Workflow Editor) is the first open source graphical Java workflow process editor fully according to WfMC specifications supporting XPDL as its native file format."

Fri, 11 Aug 2006 16:44:55 +0200
2006-08-10

Solr: Indexing XML with Lucene and REST

Bertrand Delacretaz at XML.com - Solr: Indexing XML with Lucene and REST:

"Solr (pronounced "solar") builds on the well-known Lucene search engine library to create an enterprise search server with a simple HTTP/XML interface. Using Solr, large collections of documents can be indexed based on strongly typed field definitions, thereby taking advantage of Lucene's powerful full-text search features. This article describes Solr's indexing interface and its main features, and shows how field-type definitions are used for precise content analysis."

Thu, 10 Aug 2006 11:22:14 +0200
2006-08-08

WOA vs ROA

Sam Ruby - WOA vs ROA:

"There is a term that you won’t see in the body of Dion’s post. Or in the body of Alex’s. Or in the Wikipedia article on SOA.

That term is “hypertext link”. Or even the term “link”.

[...] The link is the glue that holds the web together. It is what differentiates the web from protocols like ftp that merely serve as access methods for documents.

The very notion of a link has become practically inexpressible and virtually unthinkable in the vernacular of SOA."

Tue, 08 Aug 2006 17:43:06 +0200
2006-08-02

Feed Access Control Standard for RSS and ATOM

Bloglines - Feed Access Control Standard for RSS and ATOM:

"We are proposing (and have implemented) an RSS and ATOM extension that allows publishers to indicate the distribution restrictions of a feed. Setting the access restriction to 'deny' will indicate the feed should not be re-distributed. In Bloglines, we'll use this to prevent the display of the feed information or posts in search results or any other public venue. If other readers and aggregators use the information in the same way, and publishers of feeds, including services that let users create feeds, implement this standard, we could make significant progress toward making feeds truly safe for non-public information. We think that's a pretty cool idea.

For technical details on the RSS and ATOM extension, refer to this document:

http://www.bloglines.com/about/specs/fac-1.0"

Wed, 02 Aug 2006 14:38:02 +0200
2006-07-20

XML Content Management the Dr. Macro Way: Simple Is Good

W. Eliot Kimber - XML Content Management the Dr. Macro Way: Simple Is Good:

"The key lessons I took away from this experience and that drive all my thinking about content management are:

1. Manage the XML source as versioned storage objects

2. Do all semantic processing, including link managing, metadata indexing, etc. as separate activities on top of or separate from the core storage

3. All of the complexity in XML content management is concentrated at the boundary between the repository and the outside world and that is where the system's complexity should likewise be concentrated.

[...] Can I implement all the functionality required using Subversion (nee CVS) and XSLT (possibly with a few extension functions to handle specialized business logic, such as connecting to another, pre-existing information system)?

That is, can I prove my understanding of the requirements and business processes through the implementation of a system using a brute force mechanism?

If the answer is yes, then the next question is, why don't you?"

Thu, 20 Jul 2006 23:43:32 +0200
2006-07-18

Why I Hate Microformats

Robert Cooper - Why I Hate Microformats:

"Yay, you have an iCal microformat in your page. You can use Trails, now to stick it right into your Google calendar. Neat.

The problem is, this is a serious abuse of HTML. The way you SHOULD have done this is:

    <html:div xmlns="http://www.w3.org/2002/12/cal/">
       <vevent>
       <dtstart>20060501</dtstart><html:abbr>May 1</html:abbr>
    ...

Then present your iCal entry with CSS. Yes, we have waited years and years and years for Microsoft to get off their rears and implement CSS with namespaces, which everyone else has had for years. However, IE7 is around the proverbial corner, and we should finally get the option to embed actual real data into our HTML pages and style it. There is no reason to use semantically incorrect HTML and beat up on the class attribute."

Tue, 18 Jul 2006 12:40:38 +0200
2006-07-17

A Week in the Valley: GData

Nathan Torkington - A Week in the Valley: GData:

"There's a huge move within Google away from SOAP and even REST-style ad hoc APIs and towards GData instead. The big point for me was that GData is just Atom/RSS for reading, Atom Publishing for writing, and A9 stored queries for searching."

Mon, 17 Jul 2006 21:45:14 +0200
2006-06-29

Freedom To Leave

Simon Phipps - Freedom To Leave:

"If "interoperability" meant "import only", I'd never feel safe trying new things so market growth and innovation would be inhibited. People who implement open standards like this are smart, because although they allow customers to leave for greener pastures they also allow them to return - I am still using Bloglines despite the appeal of BlogBridge - and the confidence I feel over "owning" my data makes me a much more interesting customer.

That feeling is caused by more than interoperability - it takes full substitutability for me to have the confidence to stay as well as the freedom to leave. "

Thu, 29 Jun 2006 11:08:53 +0200
2006-05-03

Google Data APIs

"The Google Data APIs ("GData" for short) provide a simple standard protocol for reading and writing data on the web. GData combines common XML-based syndication formats (Atom and RSS) with a feed-publishing system based on the Atom publishing protocol, plus some extensions for handling queries."

Update: Byrne Reese on Google Calendar and OpenSearch.

Wed, 03 May 2006 17:48:01 +0200
2006-04-21

Automatic feed enclosure download for backups?

"FeedStation allows you to download enclosures that appear in your NewsGator Online or FeedDemon feeds. " That's nice. Sounds like I can automate downloads for any kind of file...

Thinking beyond podcasting: I've got a couple of applications on the web (like this weblog) for which I can do automated (local) backups (via a shell script run from a cron job), and currently I manually copy them over to my Windows laptop - from time to time, when I remember to do so.

How about setting up an ultra-simple PHP script producing an Atom (or RSS) feed with enclosures pointing to the files in that backup directory?
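
To make the idea concrete, here is a rough sketch of such a feed generator, written in Python rather than PHP. The function names build_backup_feed and rfc3339, the feed title, the author name and the example URL are all made up for illustration; only the Atom namespace and the rel="enclosure" link convention come from the Atom format itself.

```python
import mimetypes
import os
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import quote

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def rfc3339(timestamp):
    """Format a Unix timestamp as an Atom (RFC 3339) date in UTC."""
    moment = datetime.fromtimestamp(timestamp, timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_backup_feed(directory, base_url, title="Backups"):
    """Return an Atom feed with one enclosure link per file in directory."""
    base_url = base_url.rstrip("/") + "/"
    files = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            files.append((name, os.stat(path)))

    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM}}}id").text = base_url
    newest = max((info.st_mtime for _, info in files), default=0)
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = rfc3339(newest)
    author = ET.SubElement(feed, f"{{{ATOM}}}author")
    ET.SubElement(author, f"{{{ATOM}}}name").text = "backup script"

    for name, info in files:
        url = base_url + quote(name)
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = name
        ET.SubElement(entry, f"{{{ATOM}}}id").text = url
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = rfc3339(info.st_mtime)
        # The enclosure link is what a download-capable aggregator follows.
        ET.SubElement(entry, f"{{{ATOM}}}link", {
            "rel": "enclosure",
            "href": url,
            "length": str(info.st_size),
            "type": mimetypes.guess_type(name)[0] or "application/octet-stream",
        })
    return ET.tostring(feed, encoding="unicode")
```

Calling build_backup_feed("/path/to/backups", "http://example.org/backups") would return the feed as a string. Serving it, and protecting the backup files from strangers, is left out of this sketch.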

When I think about this, shouldn't automatically getting the latest copy of my favourite software be as simple as subscribing to its "download"-feed (with enclosures)? (And a naive question, since I haven't really thought about this - what's the difference between "photocasting" and automatic enclosure downloads from image feeds into my picture directory?)

Fri, 21 Apr 2006 16:43:40 +0200
2006-03-27

Styles: Beyond WS and REST

Tim Bray - Styles: Beyond WS and REST:

"I think “Web Style” would be a better name than “REST”. [...] I think we should take the “Web Services” label into the jailyard, strap on a blindfold, give it a last cigarette, and shoot it. It doesn’t mean much any more, and to the extent that it does, it’s misleading: WS-* doesn’t have much of the Web about it."

Mon, 27 Mar 2006 10:37:59 +0200
2006-03-24

Hi-Rest and Lo-Rest, two broken halves of the tower of Babylon

jonnay - Hi-Rest and Lo-Rest, two broken halves of the tower of Babylon:

"All the HTTP conformance in the world won't mean a thing if your application stores client state on the server. You still won't be RESTful."

Fri, 24 Mar 2006 13:30:03 +0100
2006-03-19

The REST Elevator Pitch

Koranteng Ofosu-Amaah - The REST Elevator Pitch:

"I've recently been thinking about defining the hardest problems I've encountered in software engineering, my cursory top 10 list:

  • State
  • Caching
  • Latency
  • Concurrency
  • Search
  • Metadata
  • Persistence
  • The Holy Grail Of Extensibility
  • Structured data
  • Character encoding

Now I'm not a database person so I handwaved away all of those data peoples' worries in one word: persistence."

Sun, 19 Mar 2006 21:36:13 +0100
2006-03-14

Tonic: A RESTful Web App Development Framework

"Tonic is an open source less is more, RESTful Web application development and Web site management PHP script designed to do things "the right way", where resources are king and the framework gets out of the way and leaves the developer to get on with it."

Tue, 14 Mar 2006 00:20:59 +0100
2006-02-23

WS-Angst

Tim Bray on SOAP/WSDL vs. REST - WS-Angst:

"Me, I think the WS-stench of something WS-rotting from the WS-head down is becoming increasingly difficult to ignore."

Thu, 23 Feb 2006 10:56:47 +0100
2006-02-20

Versioning REST

Adam Kalsey - Versioning REST:

"One thing I dislike about URL schemes for versioning a web services resource is it feels decidedly un-RESTful. The URL of a resource shouldn’t change simply because the format of that resource’s representation does.

[...] In Tagyu, I’m going to accomplish this by placing a <versions/> element in the XML file. This element will provide access to the previous, next, and latest resource representations as well as a reference for which one is being used currently. Instead of referring to versions with version numbers, I’m using the date that the version was released, simply because this date has more meaning than an arbitrarily-chosen version number."

Mon, 20 Feb 2006 12:18:56 +0100
2006-01-19

Don’t Invent XML Languages

Tim Bray - Don’t Invent XML Languages:

"Designing XML Languages is hard. It’s boring, political, time-consuming, unglamorous, irritating work. It always takes longer than you think it will, and when you’re finished, there’s always this feeling that you could have done more or should have done less or got some detail essentially wrong."

Thu, 19 Jan 2006 13:43:26 +0100
2006-01-18

PHP-OpenDocument Library

PHP-OpenDocument Library: "I wrote a small PHP library for manipulating OpenDocument files and have released it under Apache 2.0 License.

Its current features are:
* Create plain-text Text (.odt) documents.
* Create simple Spreadsheet (.ods) document."

Wed, 18 Jan 2006 09:25:45 +0100
2005-11-29

Prince

“Prince is a computer program that converts XML into PDF documents. Prince can read many XML formats, including XHTML and SVG. Prince formats documents according to style sheets written in CSS.”

Tue, 29 Nov 2005 11:46:00 +0100
2005-11-01

Learning from THE WEB

Adam Bosworth at ACM Queue - Learning from THE WEB:

“Successful systems on the Web are bottom-up. They don’t mandate much in a top-down way. Instead, they control themselves through tipping points. For example, Flickr doesn’t tell its users what tags to use for photos. Far from it. Any user can tag any photo with anything (well, I don’t think you can use spaces). But, and this is a key but, Flickr does provide feedback about the most popular tags, and people seeking attention for their photos, or photos that they like, quickly learn to use that lexicon if it makes sense. It turns out to be amazingly stable.

[…] It is time that the database vendors stepped up to the plate and started to support a native RSS 2.0/Atom protocol and wire format; a simple way to ask very general queries; a way to model data that encompasses trees and arbitrary graphs in ways that humans think about them; far more fluid schemas that don’t require complex joins to model variations on a theme about anything from products to people to places; and built-in linear scaling so that the database salespeople can tell their customers, in good conscience, for this class of queries you can scale arbitrarily with regard to throughput and extremely well even with regard to latency, as long as you limit yourself to the following types of queries. Then we will know that the database vendors have joined the 21st century.”

Tue, 01 Nov 2005 22:10:00 +0100
2005-10-04

SPARQL: Web 2.0 Meet the Semantic Web

Kendall Clark - SPARQL: Web 2.0 Meet the Semantic Web:

"RDF is pretty foundational to the Semantic Web, and it's got a data model, a formal semantics, and a concrete serialization (in XML). What it didn't have till lately was a standard query language. Imagine relational algebra and RDBMSes without SQL. Pretty hard to imagine. So the SemWeb needed a SQL. It stood up the Data Access Working Group, which has been working for about 20 months and has come up with SPARQL - an RDF query language and protocol."

Tue, 04 Oct 2005 00:13:00 +0200
2005-09-28

HOWTO Avoid Being Called a Bozo When Producing XML

Henri Sivonen - HOWTO Avoid Being Called a Bozo When Producing XML:

“There seem to be developers who think that well-formedness is awfully hard - if not impossible - to get right when producing XML programmatically and developers who can get it right and wonder why the others are so incompetent. I assume no one wants to appear incompetent or to be called names. Therefore, I hope the following list of dos and don’ts helps developers to move from the first group to the latter.”
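
One common way to stay in the second group is to let an XML library do the serializing instead of concatenating strings. The following is only an illustrative sketch in Python, not taken from the linked article; make_entry and the element names are invented for the example.

```python
import xml.etree.ElementTree as ET


def make_entry(title, url):
    """Build a small XML fragment; the library escapes &, < and quotes."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "link", {"href": url})
    return ET.tostring(entry, encoding="unicode")


fragment = make_entry("Fish & Chips <cheap>", "http://example.org/?a=1&b=2")
print(fragment)

# Parsing the result back is a cheap well-formedness check.
parsed = ET.fromstring(fragment)
assert parsed.findtext("title") == "Fish & Chips <cheap>"
```

The awkward characters in the title and the URL never have to be escaped by hand, which is where hand-rolled output usually goes wrong.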

Wed, 28 Sep 2005 01:54:00 +0200
2005-09-22

Dreaming of an Atom Store: A Database for the Web

Joe Gregorio at XML.com - Dreaming of an Atom Store: A Database for the Web:

“Imagine that you just have a huge glob of storage that you can store Atom Entries in, and which you can edit using the APP, and then search over using OpenSearch. That idea, that big blob of Atom Entries, all editable and searchable, is an Atom Store.”

Thu, 22 Sep 2005 12:11:00 +0200
2005-09-14

Docvert

“This web service software takes multiple word processor files (typically .doc) and converts them to Oasis OpenDocument v1.0 format, and then optionally runs them through an XML pipeline. The result is returned in a .zip file.

Docvert builds upon OpenOffice.org because it has the best chance of dealing with the vagaries of the MS Word format.”

Wed, 14 Sep 2005 14:04:00 +0200
2005-09-08

WinFS and social information management

Jon Udell at InfoWorld - WinFS and social information management:

“I saw my first demo of Microsoft’s Cairo OFS (Object File System) back in 1993. It was briefly unveiled at the Professional Developers Conference that year, and then shelved. This week I installed the beta version of its successor, WinFS.”

Thu, 08 Sep 2005 16:27:00 +0200
2005-08-01

Respecting Lotus Notes

Jon Udell - Respecting Lotus Notes:

"Notes' blurring of the boundaries between document-oriented and record-oriented data was extraordinarily useful. One way to read the history of XML is as an effort (still in progress) to formalize a hybrid data model that embraces both perspectives."

Mon, 01 Aug 2005 23:03:08 +0200
2005-07-15

Atom 1.0

Tim Bray - Atom 1.0:

"It's cooked and ready to serve. There are a couple of IETF process things to do, but this draft is essentially Atom 1.0. Now would be a good time for implementors to roll up their sleeves and go to work."

Fri, 15 Jul 2005 10:02:30 +0200
2005-06-28

How do you design a remixable Web application?

Jon Udell - How do you design a remixable Web application?:

"A website that wants to be remixable will deliver content as XML and behavior as script. These aspects can be, and will be, combined server-side for Web 1.0 clients, but Web 2.0 clients will increasingly be able to do this processing for themselves. So there will be two ways to remix: by intercepting the server-side combination of XML content and scripted behavior, or by recombining on the client."

Tue, 28 Jun 2005 12:06:41 +0200
2005-06-24

Calling SOAP Servers from JS in Mozilla

Zachary Kessin at ONLamp.com: "This article shows how to set up a simple SOAP server in PHP and call it from JavaScript."

Fri, 24 Jun 2005 11:12:17 +0200
2005-05-31

Introduction to XFML

Peter Van Dijck at XML.com - Introduction to XFML:

"XFML is a simple XML format for exchanging metadata in the form of faceted hierarchies, sometimes called taxonomies. Its basic building blocks are topics, also called categories. XFML won't solve all your metadata needs. It's focused on interchanging faceted classification and indexing data."

Tue, 31 May 2005 14:24:13 +0200
2005-05-28

AJAX: redesign your PHP applications?

Björn Schotte - AJAX: redesign your PHP applications?

"First of all, XMLHttpRequest has a problem: in InternetExplorer, it doesn't work without ActiveX. This makes it pretty useless when being used in companies like HypoVereinsbank or Siemens (both are customers we're working for) where ActiveX has been disabled for security reasons.

Second, the fixed costs for a HTTP requests are changing: while AJAX technology is designed to load just the delta of the data, your application design has to be changed in order to keep the performance: don't include whole framework stuff while you receive a XMLHttpRequest that is only trying to receive three rows out of the database."

Sat, 28 May 2005 21:27:45 +0200
2005-05-27

Problem-first design

Dave Megginson - Problem-first design:

"How many developers do you know who complain about working nights and weekends manually entering connection information for thousands of publicly available web services? Given that there are, at most, a few dozen sites offering web services over the public web (and that's web services in the most general sense, including REST as well as SOAP), I'll guess that the answer is 'zero'.

So here's my suggestion: let's hold off on designing new specifications until there's a real problem to solve."

Fri, 27 May 2005 08:57:53 +0200
2005-04-27

Constructing or Traversing URIs?

In Joe Gregorio's XML.com "The Restful Web" column - Constructing or Traversing URIs?:

"We have all these resources in our system, yet how do we enable the URIs of those resources to be discovered? Part of our specification, and of our running system, is being able to navigate around those resources. There are two types of solutions available to us; URI Construction and Hypertext Navigation. Let's look at both of them carefully to learn about their advantages and disadvantages."
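
To make the two options concrete, here is a minimal sketch of my own (not taken from the column). The service root, the `employees/` path template, and the entry document are all hypothetical. With URI construction the client has the template baked in; with hypertext navigation it only knows the entry document and follows the links it finds there.

```python
from typing import Optional
from urllib.parse import quote, urljoin
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical service root

def construct_uri(employee_id: str) -> str:
    """URI construction: the client knows the template and fills it in."""
    return urljoin(BASE, "employees/" + quote(employee_id, safe=""))

# Hypothetical entry document the server would hand out.
ENTRY_DOCUMENT = """<employees>
  <employee id="42" href="employees/42"/>
  <employee id="j smith" href="employees/j%20smith"/>
</employees>"""

def navigate_uri(document: str, employee_id: str) -> Optional[str]:
    """Hypertext navigation: the client follows a link found in a document."""
    root = ET.fromstring(document)
    for element in root.iter("employee"):
        if element.get("id") == employee_id:
            # Resolve the (possibly relative) link against the service root.
            return urljoin(BASE, element.get("href"))
    return None  # the server did not advertise this resource
```

In this toy setup both approaches arrive at the same URI, but only the second one keeps working if the server later changes its URI layout and updates the links.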

Wed, 27 Apr 2005 08:59:59 +0200
2005-04-23

Bosworth's Web of Data

At ONLamp.com, Daniel H. Steinberg summarizes Adam Bosworth's keynote at the MySQL Users Conference 2005:

"Adam Bosworth suggested that we "do for information what HTTP did for user interface." [...] As a result of a simple, sloppy, standards-based, scalable platform, we have information at our fingertips from Google, Amazon, eBay, and Salesforce. Bosworth's own company, Google, gets hundreds of millions of hard queries a day. He said they see it as putting Ph.Ds in tanks to drive through walls rather than around them.

In addition to the advantages in software, there have been great gains in hardware. Bosworth said that one million dollars buys you five hundred machines with 2TB of in-memory data, a PetaByte of on-disk data, and a reasonable throughput of fifty thousand requests per second. This amounts to one billion requests per day. Having this sort of power changes the way you think."
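
A quick back-of-the-envelope check of those figures (my own arithmetic, not from the talk; the function names are just for illustration): fifty thousand requests per second, sustained around the clock, would be well over four billion requests per day, so one billion per day corresponds to an average load of roughly a quarter of that throughput.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def requests_per_day(requests_per_second: float) -> float:
    """Daily total if the given rate were sustained for 24 hours."""
    return requests_per_second * SECONDS_PER_DAY

def average_rate_for(daily_requests: float) -> float:
    """Average requests per second needed to reach a daily total."""
    return daily_requests / SECONDS_PER_DAY

sustained_total = requests_per_day(50_000)          # 4.32 billion per day
average_rate = average_rate_for(1_000_000_000)      # about 11,574 per second
utilisation = average_rate / 50_000                 # about 0.23

# Memory per machine, assuming the 2 TB are spread evenly over 500 machines
# and using decimal units (1 TB = 1000 GB).
memory_per_machine_gb = 2 * 1000 / 500              # 4 GB
```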

Sat, 23 Apr 2005 21:45:31 +0200
2005-04-15

MicroFormats

"microformats are:

* a way of thinking about data
* design principles for formats
* adapted to current behaviors and usage patterns
* highly correlated with semantic xhtml, AKA the real world semantics, AKA lowercase semantic web, AKA lossless XHTML"

Take a look at the hCalendar example.

Fri, 15 Apr 2005 10:55:18 +0200
2005-04-12

Radical Simplification

Sam Ruby points to IBM's confession (and recommends PHP and other technologies):

"Application development using IBM programming models and tools is untenably complex. The Research Division's new Services and Software strategy includes a strong focus on radical simplification. [...] Over 70 people in IBM worldwide are currently participating in an effort to define the problem, and the scope of the solution, more precisely."

Tue, 12 Apr 2005 23:21:30 +0200
2005-04-06

Styles of Web application intermediation

Jon Udell at InfoWorld - Styles of Web application intermediation:

"Consider a purchase order represented as an XML document and governed by a policy that requires schema validity. We can enforce that policy either on the wire or on the desktop. One way buys you application-independent consistency. The other way lets you tailor your users' interactive experiences. These are complementary strategies.

Here's a less familiar but equally compelling scenario: You're leasing a Web-based application, it lacks a feature you need, and the developer won't cough it up in a timely manner. Because the application's user interface is delivered through the Web as XML packets alongside the protocol and data packets, you can tweak it globally or locally. Same benefits, same synergy."

Wed, 06 Apr 2005 22:41:26 +0200
2005-03-18

MS Ignoring developer demand for REST tools?

Microsoft's Mike Champion - MS Ignoring developer demand for REST tools?:

"Plain ol' XML over HTTP works just fine, and WS-* is overkill, in situations where:

* information is public and encryption/authentication are unnecessary;
* all communication visible to the service consumer is done over one protocol, HTTP;
* nothing terribly bad happens if a message is lost or duplicated;
* and there are few demands for multi-part transaction management beyond what can be implemented with HTTP sessions or cookies."

Fri, 18 Mar 2005 00:21:39 +0100
2005-03-16

OpenSearch

"Many sites today return search results as a tightly integrated part of the website itself. Unfortunately, those search results can't be easily reused or made available elsewhere, as they are usually wrapped in HTML and don't follow any one convention. OpenSearch offers an alternative: an open format that will enable those search results to be displayed anywhere, anytime. Rather than introduce yet another proprietary or closed protocol, OpenSearch is a straightforward and backward-compatible extension of RSS 2.0, the widely adopted XML-based format for content syndication.

Any site that has content - and a search box - can choose to return results in OpenSearch RSS. This includes travel sites, classifieds, encyclopedias... If you can provide search results for something, it probably can fit into the OpenSearch model. Returning OpenSearch results is easy - the format is the standard set of XML elements, plus three additional elements designed to support navigation between pages."

Update: The new OpenSearch homepage
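
A rough sketch of what such a feed could look like when generated from Python. The three navigation elements (`totalResults`, `startIndex`, `itemsPerPage`) and the namespace URI are written from my memory of the OpenSearch RSS 1.0 draft, so treat them as assumptions and check them against the spec. The function name and the example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace as I recall it from the OpenSearch RSS 1.0 draft (assumption).
OPENSEARCH_NS = "http://a9.com/-/spec/opensearchrss/1.0/"
ET.register_namespace("openSearch", OPENSEARCH_NS)

def build_results_feed(query, results, total, start_index=1, per_page=10):
    """Return an RSS 2.0 document with the three paging elements added."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Search results for {query}"
    ET.SubElement(channel, "link").text = "http://example.org/search"
    ET.SubElement(channel, "description").text = "Hypothetical search feed"

    # The additional elements that support navigation between result pages.
    paging = (("totalResults", total),
              ("startIndex", start_index),
              ("itemsPerPage", per_page))
    for name, value in paging:
        ET.SubElement(channel, f"{{{OPENSEARCH_NS}}}{name}").text = str(value)

    # Ordinary RSS items, one per search hit on this page.
    for title, link in results[:per_page]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link

    return ET.tostring(rss, encoding="unicode")

feed = build_results_feed("atom", [("Atom 1.0", "http://example.org/atom")], total=1)
```

Everything outside the three namespaced elements is plain RSS 2.0, which is the point of the format: an existing feed reader can still display the results.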

Wed, 16 Mar 2005 22:08:48 +0100

Don't Panic

Sam Ruby - Don't Panic:

"'Just' use HTTP. This is an updated version of my Attractive Nuisance at Chris Sells' Applied XML Developers Conference 5."

Wed, 16 Mar 2005 11:43:51 +0100
2005-03-05

REST design questions

David Megginson - REST design questions:

"RESTafarians point out that REST is the basis of the Web's success, but that's really only the GET part (and its cousin, POST). Despite WebDAV, we have very little experience using PUT and DELETE even for regular web pages, much less to maintain a data repository. Even the much-touted RESTful web services from Amazon and eBay are GET-only (and POST, in eBay's case); in fact, many, if not most, firewalls come preconfigured to block PUT and DELETE, since web admins see them mainly as security holes.

My gut feeling is that REST is, in fact, more manageable than XML-RPC or WS-* for XML on the Web, but that we have a lot of issues we'll need to work out first. Data management is never really simple, and while WS-* makes it harder than it has to be, even the simplest REST model cannot make it trivial."
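
For readers who have only ever used GET and POST, here is a toy sketch of my own showing what the other two verbs are meant to do to a data repository. The `ResourceStore` class is hypothetical and purely in-memory; there is no network involved, the status codes simply mirror common HTTP usage, and POST is deliberately left out.

```python
from typing import Dict, Optional, Tuple

class ResourceStore:
    """In-memory stand-in for a server that maps HTTP verbs onto documents."""

    def __init__(self) -> None:
        self._documents: Dict[str, str] = {}

    def handle(self, method: str, path: str,
               body: Optional[str] = None) -> Tuple[int, str]:
        if method == "GET":
            # Safe read: never changes the repository.
            if path in self._documents:
                return 200, self._documents[path]
            return 404, ""
        if method == "PUT":
            # Create or replace the document at exactly this path.
            created = path not in self._documents
            self._documents[path] = body or ""
            return (201 if created else 204), ""
        if method == "DELETE":
            # Remove the document; report 404 if it was not there.
            if self._documents.pop(path, None) is None:
                return 404, ""
            return 204, ""
        return 405, ""  # any other verb is not handled in this sketch

store = ResourceStore()
store.handle("PUT", "/notes/1", "<note>hello</note>")   # (201, "")
store.handle("GET", "/notes/1")                          # (200, "<note>hello</note>")
```

Even this trivial version hints at the open questions: nothing here deals with concurrent updates, partial changes, or access control.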

Sat, 05 Mar 2005 00:01:46 +0100
2005-03-03

Buzzing the Yahoo! Search Web Services

Rasmus Lerdorf - Buzzing the Yahoo! Search Web Services:

"I still much prefer the REST services out there. SOAP always reminds me of being stuck behind the guy in a hat driving a Lincoln Towncar. You eventually get to where you want to go, but the journey is painful. With REST you can just toss your query into your browser and have a look at the returned XML. SOAP starts to make more sense when the queries you are sending get more complex than just tossing a couple of keywords to a search service and setting a couple of flags. But don't even try to read the SOAP spec. If you managed to fight your way through that spec already, try the new WSDL 2.0 Draft Spec. This is the sort of stuff that makes my brain hurt."
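
The "toss your query into your browser" part is easy to illustrate. In this sketch of mine the endpoint and the parameter names are hypothetical, not the actual Yahoo! API; the point is only that a REST-style query is an ordinary URL you can build, paste, and inspect.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def search_url(base: str, query: str, **flags) -> str:
    """Build a GET URL from a couple of keywords and a couple of flags."""
    params = {"query": query, **flags}
    return base + "?" + urlencode(params)

# Hypothetical endpoint and flags, for illustration only.
url = search_url("http://api.example.org/search", "atom syndication",
                 results=5, format="xml")

# The same URL can be taken apart again just as easily.
decoded = parse_qs(urlsplit(url).query)
```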

Thu, 03 Mar 2005 22:10:36 +0100
2005-02-21

How to Create a REST Protocol

Joe Gregorio at XML.com - How to Create a REST Protocol:

"If you follow web services, then you may have heard of REST. REST is an architectural style that can be used to guide the construction of web services. Recently, there have been attempts to create such services that have met with mixed success. This article outlines a series of steps you can follow in creating your protocol--guidance that will help you get all the benefits that REST has to offer, while avoiding common pitfalls."

Mon, 21 Feb 2005 15:49:52 +0100
2005-01-26

Syntext Serna

"With Serna, general users and professional authors alike can create and maintain complex XML documents. With minimal training, most people will find it as easy to use as a conventional word processor.

[...] Serna is unique because it allows users to edit documents in print appearance, thereby greatly simplifying and shortening the document maintenance cycle. Having print appearance means that users have a real view of layout, section numbering, TOC, data merged from other documents, etc.

Serna uses the most prominent open publishing standards of the day: XSLT and XSL-FO during the whole authoring process. Most other editors which claim to be WYSIWYG use simple (often proprietary) stylesheets which provide only a limited view of the document, usually very different from the final presentation."

Wed, 26 Jan 2005 17:53:06 +0100
2004-12-14

The present and future value of Python

Jon Udell's Vancouver Python Workshop talk on The present and future value of Python:

"The endgame here is a hybrid data engine with object, relational, and XML surfaces. Could you build such a thing in Python? I don't see why not. If you can build a scalable high-performance object database like ZODB in Python, I'll bet you can build the kind of hybrid I'm talking about. Of course, there's not an infinite supply of Jim Fultons. And a lot of companies are chasing the universal database holy grail. Oracle and IBM have gotten pretty far down that road already. At the other end of the commercial spectrum, OpenLink Software's Virtuoso has been delivering the goods for a couple of years now. In the open source world, I'm not sure where things stand. Postgres and ZODB and MySQL and Berkeley DB XML are all pieces of the puzzle, but I don't see any plan for fitting them together."

See also The Register - IBM moves the database goalposts...

Tue, 14 Dec 2004 13:21:35 +0100
2004-11-25

RelaxNG

Tim Bray - More Relax:

"[...] that RelaxNG is the world's best schema language, and that anyone who's using XML but not RelaxNG should be nervous."

Thu, 25 Nov 2004 13:14:12 +0100
2004-11-22

Adam Bosworth's ISCOC04 Talk

Adam Bosworth - ISCOC04 Talk:

"That software which is flexible, simple, sloppy, tolerant, and altogether forgiving of human foibles and weaknesses turns out to be actually the most steel cored, able to survive and grow while that software which is demanding, abstract, rich but systematized, turns out to collapse in on itself in a slow and grim implosion.

[...] What is more, in one of the unintended ironies of software history, HTML was intended to be used as a way to provide a truly malleable plastic layout language which never would be bound by 2 dimensional limitations, ironic because hordes of CSS fanatics have been trying to bind it with straight jackets ever since, bad mouthing tables and generations of tools have been layering pixel precise 2 dimensional layout on top of it. And yet, ask any gifted web author, like Jon Udell, and they will tell you that they often use it in the lazy sloppy intuitive human way that it was designed to work. They just pour in content. In 1996 I was at some of the initial XML meetings. The participants' anger at HTML for "corrupting" content with layout was intense. Some of the initial backers of XML were frustrated SGML folks who wanted a better cleaner world in which data was pristinely separated from presentation. In short, they disliked one of the great success stories of software history, one that succeeded because of its limitations, not despite them. I very much doubt that an HTML that had initially shipped as a clean layered set of content (XML, Layout rules - XSLT, and Formatting- CSS) would have had anything like the explosive uptake.

Now as it turns out I backed XML back in 1996, but as it turns out, I backed it for exactly the opposite reason. I wanted a flexible relaxed sloppy human way to share data between programs and compared to the RPC's and DCOM's and IIOP's of that day, XML was an incredibly flexible plastic easy going medium. It still is. And because it is, not despite it, it has rapidly become the most widely used way to exchange data between programs in the world. And slowly, but surely, we have seen the other older systems, collapse, crumple, and descend towards irrelevance.

Consider programming itself. There is an unacknowledged war that goes on every day in the world of programming. It is a war between the humans and the computer scientists. It is a war between those who want simple, sloppy, flexible, human ways to write code and those who want clean, crisp, clear, correct ways to write code. It is the war between PHP and C++/Java. It used to be the war between C and dBase. Programmers at the level of those who attend Columbia University, programmers at the level of those who have made it through the gauntlet that is Google recruiting, programmers at the level of this audience are all people who love precise tools, abstraction, serried ranks of orderly propositions, and deduction. But most people writing code are more like my son. Code is just a hammer they use to do the job. PHP is an ideal language for them. It is easy. It is productive. It is flexible. Associative arrays are the backbone of this language and, like XML, is therefore flexible and self describing. They can easily write code which dynamically adapts to the information passed in and easily produces XML or HTML.

[...] I remember listening many years ago to someone saying contemptuously that HTML would never succeed because it was so primitive. It succeeded, of course, precisely because it was so primitive. Today, I listen to the same people at the same companies say that XML over HTTP can never succeed because it is so primitive. Only with SOAP and SCHEMA and so on can it succeed. But the real magic in XML is that it is self-describing. The RDF guys never got this because they were looking for something that has never been delivered, namely universal truth."
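
The remark about associative arrays easily producing XML is worth a small illustration. This is my own sketch in Python rather than PHP, with dicts standing in for associative arrays. The `to_xml` helper and the sample order are made up, and it assumes every dictionary key is a valid XML element name.

```python
import xml.etree.ElementTree as ET

def to_xml(tag, value):
    """Recursively turn nested dicts and lists into an XML element tree."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        # Each key becomes a child element named after the key.
        for key, child in value.items():
            element.append(to_xml(key, child))
    elif isinstance(value, (list, tuple)):
        # List entries have no name of their own, so use a generic <item>.
        for child in value:
            element.append(to_xml("item", child))
    else:
        element.text = str(value)
    return element

# Hypothetical data; the code adapts to whatever structure is passed in.
order = {"customer": "Jon", "lines": [{"sku": "A1", "qty": 2}]}
xml_text = ET.tostring(to_xml("order", order), encoding="unicode")
```

The function never needs to know the shape of the data in advance, which is the "self describing" quality the talk is getting at.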

Mon, 22 Nov 2004 14:56:47 +0100
2004-10-06

Don't Be Afraid to Drop the SOAP

Sam Tregar - Don't Be Afraid to Drop the SOAP:

"The best candidates for SOAP applications are lightweight network applications without significant performance requirements. If your application doesn't absolutely require network interaction, or if it will deal with large amounts of data then you should avoid SOAP."

Wed, 06 Oct 2004 13:47:14 +0200