{"id":1850,"date":"2017-03-10T00:00:00","date_gmt":"2017-03-09T23:00:00","guid":{"rendered":"https:\/\/wwwneu.strehle.de\/tim\/weblog\/archives\/2017\/03\/10\/1610-2\/"},"modified":"2025-07-31T21:52:55","modified_gmt":"2025-07-31T19:52:55","slug":"1610-2","status":"publish","type":"post","link":"https:\/\/www.strehle.de\/tim\/weblog\/archives\/2017\/03\/10\/1610-2\/","title":{"rendered":"Metadata values have metadata, too"},"content":{"rendered":"\n<p><strong>Field values<\/strong> \u2013 inside an SQL database column, an XML tag, or an image\u2019s embedded metadata \u2013 <strong>are the \u201catoms\u201d of a data model<\/strong>: the smallest unit. For example, in this record:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&lt;country&gt;\n  &lt;name&gt;Germany&lt;\/name&gt;\n  &lt;population&gt;82175700&lt;\/population&gt;\n&lt;\/country&gt;\n<\/code><\/pre>\n\n\n\n<p>\u2026 the number \u201c82,175,700\u201d is the field value in the \u201cGermany\u201d country record\u2019s \u201cpopulation\u201d field.<\/p>\n\n\n\n<p>Just as with atoms, there\u2019s a bit more to them once you dig a little deeper. Imagine your boss complaining \u201cthis number is wrong \u2013 how did it end up in our database?\u201d You might explain to him that you entered this number into the database because the Wikipedia page you visited last week said this was the approximate population of Germany, according to a 2015 estimate.<\/p>\n\n\n\n<p>Suddenly, <strong>that \u201catomic\u201d, inconspicuous number has a bunch of metadata attached to it<\/strong>: a validity date (in 2015), a precision qualifier (\u201capproximately\u201d), provenance (Wikipedia), and user information (you entered it).<\/p>\n\n\n\n<p>Obviously, the real world is way more complicated than our database structures. That\u2019s one of the points of data modeling; real-world complexities which aren\u2019t of much use to our business should be left out of the data model to keep it simple. 
(My post <a href=\"\/tim\/weblog\/archives\/2013\/02\/08\/1555\">Why I prefer Topic Maps to RDF<\/a> has a few remarks on data modeling.)<\/p>\n\n\n\n<p>But what if this kind of additional metadata matters to your business? Let\u2019s look at the <strong>kinds of metadata<\/strong> which can be applied to data points:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Data type and format<\/strong>: You\u2019ll usually have that already, i.e. you know whether the value in your \u201cDate created\u201d field is a simple string (\u201cyesterday, I think\u201d), a number (seconds since Jan 1st, 1970) or a proper date\/time format (\u201c2017-03-10T15:00:00+01:00\u201d). Programming languages and data exchange formats each have their own data types; see <a href=\"https:\/\/www.w3.org\/TR\/xmlschema-2\/\">XML Schema Datatypes (XSD)<\/a> for a good example.<\/li>\n\n\n\n<li><strong>Unit<\/strong>: An amount of money requires the currency (\u201c\u20ac35.11\u201d), and length, weight etc. have a unit of measurement (\u201c110.5 cm\u201d).<\/li>\n\n\n\n<li><strong>Language<\/strong>: Important for names and text fields (\u201cen\u201d).<\/li>\n\n\n\n<li><strong>Provenance<\/strong>: the source of the value (\u201chttps:\/\/en.wikipedia.org\/wiki\/Germany\u201d).<\/li>\n\n\n\n<li><strong>User<\/strong>: who added or edited the value (common in auditing).<\/li>\n\n\n\n<li><strong>Last modified<\/strong>: when the value was added or edited.<\/li>\n\n\n\n<li><strong>Application<\/strong>: the software which wrote the value. 
For example, the <a href=\"https:\/\/iptc.org\/standards\/iim\/\">IPTC IIM<\/a> has an \u201cOriginating Program\u201d field.<\/li>\n\n\n\n<li><strong>Transaction<\/strong>: the larger transaction during which the value was written; as described by Ralph Windsor in <a href=\"http:\/\/digitalassetmanagementnews.org\/features\/the-digital-asset-transaction-management-system-a-time-machine-for-digital-assets\/\">The Digital Asset Transaction Management System \u2013 A Time Machine For Digital Assets<\/a>: \u201ca unique identifier for each batch operation\u201d.<\/li>\n\n\n\n<li><strong>Confidence<\/strong>: how sure you are the value is correct. Automatic image recognition and classification software usually provides a numeric confidence score in percent. In <a href=\"https:\/\/www.linkedin.com\/pulse\/its-time-kurt-cagle\">It\u2019s About Time<\/a>, Kurt Cagle suggests \u201cApproximate\u201d, \u201cInferred\u201d, \u201cReported\u201d, \u201cConfirmed\u201d.<\/li>\n\n\n\n<li><strong>Validity<\/strong>: the date\/time range \u2013 often an open range (when the start date is unknown, or when there\u2019s no end date yet) \u2013 during which the value was valid. Useful to mark a former e-mail address, or a maiden name. (Could also be used to \u201csurf\u201d past versions of your data if you manage to implement something like the <a href=\"https:\/\/mementoweb.org\/guide\/rfc\/\">Memento framework<\/a>.)<\/li>\n\n\n\n<li><strong>Accuracy \/ Precision<\/strong>: how accurate the value is. In historical archives and museums, you\u2019re likely to deal with data you know to be only guesswork. (You might have a photo that\u2019s definitely from Christmas Eve, but you don\u2019t have an exact year.) 
The <a href=\"https:\/\/www.loc.gov\/standards\/datetime\/pre-submission.html\">Extended Date\/Time Format (EDTF)<\/a> draft offers an extensive syntax for inexact dates.<\/li>\n<\/ul>\n\n\n\n<p>Be prepared for a lot of work if you want to add metadata to your field values in a relational (SQL) database or a simple NoSQL database. In XML, you can often use attributes. For full flexibility, Topic Maps, which support \u201cscope\u201d and reification, would be great <a href=\"\/tim\/weblog\/archives\/2015\/06\/14\/1763\">if they were more widely available<\/a>. RDF also supports <a href=\"https:\/\/en.wikipedia.org\/wiki\/Reification_(computer_science)\">reification<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Field values \u2013 inside an SQL database column, an XML tag, or an image\u2019s embedded metadata \u2013 are the \u201catoms\u201d of a data model: the smallest unit. For example, in this record: \u2026 the number \u201c82,175,700\u201d is the field value in the \u201cGermany\u201d country record\u2019s \u201cpopulation\u201d field. 
Just as with atoms, there\u2019s a bit more [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_share_on_mastodon":"0"},"categories":[1],"tags":[],"class_list":["post-1850","post","type-post","status-publish","format-standard","hentry","category-weblog"],"share_on_mastodon":{"url":"","error":""},"_links":{"self":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/posts\/1850","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/comments?post=1850"}],"version-history":[{"count":1,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/posts\/1850\/revisions"}],"predecessor-version":[{"id":1905,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/posts\/1850\/revisions\/1905"}],"wp:attachment":[{"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/media?parent=1850"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/categories?post=1850"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.strehle.de\/tim\/wp-json\/wp\/v2\/tags?post=1850"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}