Tim's Weblog
Tim Strehle’s links and thoughts on Web apps, software development and Digital Asset Management, since 2002.
2019-02-17

Clean Data is more important than Clean Code

In my experience, many software developers don’t care much about data modeling. They seem to prioritize clean code, a good technology stack and a good user interface over getting the data model right.

All of these are important, but the data model is the foundation:

You can refactor source code anytime, going from unclean to clean and back. You can even throw your UI and all of your code away, and replace your entire technology stack.

The only thing that will live on is the data you migrate into the new system – and most faults in your data model are impossible or impractical to fix. (Believe me: I have made many wrong choices as a developer, and my data modeling mistakes have had much more long-term impact than my bad code.)

Data is forever:

Data you did not capture because there was no place for it in your data model will be lost forever.

Data with different semantics (like a photo’s “date taken” vs “date imported”) that you had to squeeze into the same column because your data model did not let you differentiate will be indistinguishable forever.

Data of different types (is it plain text or HTML?) or encodings (UTF-8 or not?) that went into your database unlabeled, because your data model had no attribute for the type or encoding and there was no normalization step, will be messed up (and potentially insecure) forever.
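
To make the three points above concrete, here is a minimal Python sketch of what "a place for it in the data model" could look like for a photo record. All names (PhotoRecord, Caption, TextFormat, decode_caption) are hypothetical and made up for illustration; a real schema would live in your database and look different. The idea is only that "date taken" and "date imported" get separate fields, that text carries its format with it, and that bytes are decoded exactly once at the boundary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class TextFormat(Enum):
    """Hypothetical marker for how a piece of text must be interpreted."""
    PLAIN = "text/plain"
    HTML = "text/html"


@dataclass(frozen=True)
class Caption:
    text: str            # already decoded to Unicode, never raw bytes
    format: TextFormat   # plain text or HTML, recorded explicitly


@dataclass(frozen=True)
class PhotoRecord:
    photo_id: str
    date_imported: datetime          # when the system ingested the file
    date_taken: Optional[datetime]   # when the photo was shot; None if unknown
    caption: Optional[Caption] = None


def decode_caption(raw: bytes, declared_encoding: str, format: TextFormat) -> Caption:
    """Normalization step: decode once on the way in and fail loudly on bad bytes."""
    text = raw.decode(declared_encoding, errors="strict")
    return Caption(text=text, format=format)


# Usage: an unknown "date taken" stays unknown instead of borrowing the import date.
caption = decode_caption("Café at dusk".encode("utf-8"), "utf-8", TextFormat.PLAIN)
record = PhotoRecord(
    photo_id="p1",
    date_imported=datetime(2019, 2, 17, tzinfo=timezone.utc),
    date_taken=None,
    caption=caption,
)
```

Whether an unknown date should be None or a separate "unknown" marker is a judgment call; what matters is that the model can tell the two dates apart, and that whoever renders the caption later knows whether it needs HTML escaping.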

First, get your data model right.