DECOR-dataset

A dataset is a list of (hierarchical) concepts. A DECOR project contains one or more datasets. When there are multiple datasets, they are different versions of the same dataset. It is not recommended to create datasets for different healthcare projects in the same DECOR project.

A dataset consists of (healthcare) concepts. You can see a DECOR dataset as a hierarchical concept repository. Concepts should be well defined, as in the following examples:

Examples
  • Female
  • Body length
  • Diabetes
  • Monochorionic

The following datatypes are available (items printed in bold are part of the current DECOR implementation).

Datatypes, with their Dutch (Nederlands) and German (Deutsch) names and a description:
count (Nederlands: aantal; Deutsch: Zähler): Countable (non-monetary) quantities, for example:
  • pregnancies,
  • steps (taken by a physiotherapy patient),
  • number of cigarettes smoked per day.
code (Nederlands: code; Deutsch: Code): A system of valid symbols or codes that stand for specified concepts, e.g. alphabetic, numeric or symbolic codes, or combinations of these. The system is usually defined by a formal reference to a terminology or ontology, but may also be defined by providing text. Typically a code is expressed with a value for the code, an identifier of the terminology or ontology it belongs to, and at least one textual representation (display name).
ordinal (Nederlands: ordinaal; Deutsch: Ordinal): Models rankings and scores, e.g. pain scores and Apgar values, where a) there is an implied ordering, b) there is no implication that the distance between values is constant, and c) the total number of values is finite. Note that although the term 'ordinal' in mathematics means natural numbers only, here any integer is allowed, since medical professionals often use negative and zero values around a neutral point.

Scores are commonly encountered in clinical assessment scales. A value should generally be assigned to a concept in a formal code system that defines the value, or in an applicable value set for the concept. However, some concepts have no formal definition (or are not even formally represented as a concept), especially in questionnaires, and scores may even be assigned arbitrarily during use (hence the score is recorded on the Coding itself). The value may be constrained to an integer in some contexts of use. Examples of sets of ordinal values:

  • -3, -2, -1, 0, 1, 2, 3 (reflex response values)
  • 0, 1, 2 (Apgar score values)
  • 1, 2, 3, 4, ... (ASA classification)
  • I, II, III, IV, ... (Tanner scale)
identifier (Nederlands: identificatie; Deutsch: Identifikation): Type for representing identifiers of real-world entities. Typical identifiers include driver's licence numbers, social security numbers, prescription ids, order ids, and so on.
string (Nederlands: string; Deutsch: Zeichenkette): Any text item, without visual formatting.
text (Nederlands: string met opmaak; Deutsch: Zeichenkette mit Formatierung): A text item that may contain any number of legal characters, arranged as words, sentences, etc. Visual formatting and hyperlinks may be included.
date (Nederlands: datum; Deutsch: Datum): Represents an absolute point in time, as measured on the Gregorian calendar, specified only to the day. Semantics are defined by ISO 8601. Used for recording dates in real-world time. The partial form is used for approximate birth dates, dates of death, etc.
datetime (Nederlands: datum+tijd; Deutsch: Datum+Zeit): Represents an absolute point in time, specified to the second. Semantics are defined by ISO 8601. Used for recording a precise point in real-world time, e.g. the exact date and time of a baby's birth, and for approximate timestamps, e.g. the origin of a history observation that is only partially known.
time (Nederlands: tijd; Deutsch: Zeit): Represents a time of day, specified to the second (hh:mm:ss). Semantics are defined by ISO 8601. Used for recording a real-world time, e.g. the time of medication administration or of starting/stopping a procedure, and for approximate times, e.g. the origin of a history observation that is only partially known.
Per April 2020
complex (Nederlands: samengestelde gegevens; Deutsch: Sammlung von Daten): Non-atomic datatypes that are not explicitly defined further in the dataset itself, e.g. 'address' or 'person name'. Complex types are usually assumed to be well known enough not to warrant further decomposition in the dataset.
decimal (Nederlands: decimaal getal; Deutsch: Dezimalzahl): Decimal number (rarely used; in most cases a decimal number is actually a quantity).
quantity (Nederlands: hoeveelheid; Deutsch: Menge): Quantified type representing "scientific" quantities, i.e. quantities expressed as a magnitude and a unit. If not further specified with fractionDigits, the magnitude is a decimal number with an optional decimal point (e.g. '3.14159265359').

There are two "special" quantities used in healthcare, each with its own datatype described below:

  • for time durations, duration shall be used;
  • for monetary amounts, currency shall be used.
duration (Nederlands: tijdsduur; Deutsch: Dauer): A quantity representing a period of time relative to a notional point in time that is not specified. A sign may be used to indicate that the duration runs "backwards" in time rather than forwards.
boolean (Nederlands: boolean; Deutsch: Boolean): Items that are truly boolean data, such as true/false or yes/no answers.
blob (Nederlands: binair; Deutsch: Binär): Things that are typically stored as binary objects and need to be rendered appropriately, e.g.:
  • images: such as X-rays and computed tomography (CT) images;
  • graphics: diagrams, graphs, mathematical curves or the like, usually a vector image;
  • icons: a sign or representation that stands for its object by virtue of a resemblance or analogy to it;
  • pictures: a visual representation of a person, object or scene, usually a raster image.
currency (Nederlands: valuta; Deutsch: Währung): Monetary quantities.
ratio (Nederlands: ratio; Deutsch: Ratio): A ratio of two quantity values, a numerator and a denominator.
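Several of the datatypes above bundle more than one part: a code carries the code itself, a code system identifier and a display name; an ordinal pairs an integer rank with the symbol users see; a ratio combines a numerator and a denominator quantity. The following Python sketch shows one possible in-memory representation. The class and field names are hypothetical and not taken from DECOR or ART-DECOR, and mapping the Tanner stages I to IV onto the integers 1 to 4 is an assumption for illustration. The coded example uses the HL7 AdministrativeGender code system (OID 2.16.840.1.113883.5.1), in which F stands for Female.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class Code:
    """Hypothetical representation of a value of datatype 'code'."""
    code: str          # the symbol itself, e.g. "F"
    code_system: str   # identifier (here an OID) of the terminology or ontology
    display_name: str  # textual representation; may differ per language

    def same_concept(self, other: "Code") -> bool:
        # Code plus code system identify the concept; the display name is only a rendering.
        return (self.code, self.code_system) == (other.code, other.code_system)


@dataclass(frozen=True, order=True)
class Ordinal:
    """Hypothetical 'ordinal' value, compared by its integer rank only."""
    value: int                          # any integer, zero and negatives included
    symbol: str = field(compare=False)  # label shown to users, e.g. "III"
    # Deliberately no arithmetic: distances between ranks are not assumed equal.


@dataclass(frozen=True)
class Quantity:
    value: Decimal  # magnitude
    unit: str       # unit, e.g. "mg"


@dataclass(frozen=True)
class Ratio:
    """Hypothetical 'ratio' value: a numerator and a denominator quantity."""
    numerator: Quantity
    denominator: Quantity


female = Code("F", "2.16.840.1.113883.5.1", "Female")
vrouw = Code("F", "2.16.840.1.113883.5.1", "Vrouw")  # Dutch display name
print(female.same_concept(vrouw))  # True

tanner = [Ordinal(3, "III"), Ordinal(1, "I"), Ordinal(4, "IV"), Ordinal(2, "II")]
print([o.symbol for o in sorted(tanner)])  # ['I', 'II', 'III', 'IV']

concentration = Ratio(Quantity(Decimal("5"), "mg"), Quantity(Decimal("1"), "mL"))
print(concentration.numerator.unit, "/", concentration.denominator.unit)  # mg / mL
```

Comparing coded values by code and code system rather than by display name keeps "Female" and "Vrouw" equivalent, which matters in the multilingual setting discussed under versioning.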

The datatypes can be further restricted using the following datatype facets:

  • unit: unit for quantities, e.g. kg, mmol. Applies to: quantity.
  • minInclude: inclusive lower bound of the range, e.g. 1 or 100. Applies to: count, ordinal, quantity, currency.
  • maxInclude: inclusive upper bound of the range, e.g. 1 or 100. Applies to: count, ordinal, quantity, currency.
  • fractionDigits: number of fraction digits for quantities: "1" for at least 1, "1!" for exactly 1. Applies to: quantity.
  • timeStampPrecision: precision of timestamps (see the values below). Applies to: date, datetime.
  • default: default value. Applies to: all datatypes.
  • fixed: fixed value. Applies to: all datatypes.
  • minLength: minimum length for strings. Applies to: string.
  • maxLength: maximum length for strings. Applies to: string.
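To make the facets concrete, here is a minimal Python sketch that checks a value against some of them: minInclude and maxInclude as inclusive bounds, fractionDigits with its "at least" ("1") and "exactly" ("1!") forms, and minLength/maxLength for strings. The function names, the dictionary of facets and the choice to count the fraction digits as written in the value string are assumptions made for illustration, not the DECOR schema or ART-DECOR behaviour.

```python
from decimal import Decimal


def fraction_digits_ok(value: str, spec: str) -> bool:
    """fractionDigits: "n" means at least n fraction digits, "n!" exactly n."""
    exact = spec.endswith("!")
    required = int(spec.rstrip("!"))
    # Count the digits written after the decimal point ("3.10" has 2).
    digits = len(value.split(".", 1)[1]) if "." in value else 0
    return digits == required if exact else digits >= required


def check_number(value: str, facets: dict) -> list:
    """Names of violated facets for a count/ordinal/quantity/currency value."""
    errors = []
    number = Decimal(value)
    if "minInclude" in facets and number < Decimal(facets["minInclude"]):
        errors.append("minInclude")
    if "maxInclude" in facets and number > Decimal(facets["maxInclude"]):
        errors.append("maxInclude")
    if "fractionDigits" in facets and not fraction_digits_ok(value, facets["fractionDigits"]):
        errors.append("fractionDigits")
    # The 'unit' facet is carried along but not checked in this sketch.
    return errors


def check_string(value: str, facets: dict) -> list:
    """Names of violated facets for a string value."""
    errors = []
    if "minLength" in facets and len(value) < int(facets["minLength"]):
        errors.append("minLength")
    if "maxLength" in facets and len(value) > int(facets["maxLength"]):
        errors.append("maxLength")
    return errors


# Hypothetical facets for "Body length": 1 to 300 cm, exactly one fraction digit.
length_facets = {"unit": "cm", "minInclude": "1", "maxInclude": "300", "fractionDigits": "1!"}
print(check_number("172.5", length_facets))        # []
print(check_number("172.50", length_facets))       # ['fractionDigits']
print(check_number("0.5", length_facets))          # ['minInclude']
print(check_string("Jansen", {"maxLength": "5"}))  # ['maxLength']
```

Using Decimal rather than float keeps bound comparisons exact for decimal input such as "172.5".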

Facet timeStampPrecision takes the following values:

  • Y: at least year (YYYY)
  • Y!: only year (YYYY)
  • YM: at least month (MM) and year (YYYY)
  • YM!: only month (MM) and year (YYYY)
  • YMD: at least day (DD), month (MM) and year (YYYY)
  • YMD!: only day (DD), month (MM) and year (YYYY)
  • YMDHM: at least day (DD), month (MM), year (YYYY), hour (hh) and minute (mm)
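The following Python sketch checks whether a (possibly partial) ISO 8601 date or date/time string satisfies a timeStampPrecision value: without "!" the value must be at least as precise as the code, with "!" exactly that precise. It assumes values are written as YYYY, YYYY-MM, YYYY-MM-DD or YYYY-MM-DDThh:mm with optional seconds; the precision levels, the regular expression and the function names are illustrative assumptions, not a normative DECOR rule.

```python
import re

# Hypothetical precision levels: each code names the components that must be present.
LEVELS = {"Y": 1, "YM": 2, "YMD": 3, "YMDHM": 4}

# YYYY[-MM[-DD[Thh:mm[:ss]]]]; calendar validity (e.g. month 13) is not checked.
PARTIAL_ISO = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T(\d{2}):(\d{2})(?::(\d{2}))?)?)?)?$"
)


def precision_of(value: str) -> int:
    """Components present: 1 = year, 2 = month, 3 = day, 4 = minute, 5 = second."""
    match = PARTIAL_ISO.match(value)
    if match is None:
        raise ValueError(f"unsupported date/time form: {value!r}")
    _, month, day, hour, _, second = match.groups()
    if second is not None:
        return 5
    if hour is not None:
        return 4
    if day is not None:
        return 3
    if month is not None:
        return 2
    return 1


def satisfies(value: str, precision: str) -> bool:
    """Check a value against a code such as "YM" (at least) or "YMD!" (exactly)."""
    exact = precision.endswith("!")
    required = LEVELS[precision.rstrip("!")]
    actual = precision_of(value)
    return actual == required if exact else actual >= required


print(satisfies("1987", "Y!"))                 # True
print(satisfies("1987-06", "Y!"))              # False: more than "only year"
print(satisfies("1987-06", "YM"))              # True
print(satisfies("1987", "YMD"))                # False: month and day missing
print(satisfies("2020-04-01T13:45", "YMDHM"))  # True
```

A partial birth date such as "1987-06" therefore passes YM but fails YMD, matching the "partial form" use described for the date datatype.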

Dataset versioning

  • New dataset versions get a new @id and a new @effectiveDate
  • The concepts in the new dataset version keep their @id, but get a new @effectiveDate and inherit from the original concept. If a concept needs changes, it may be disconnected from its source concept (deinherit) so that it can be edited.
  • The name and other properties of a concept constitute its definition. This definition should be governed, which means that every property in every language is under governance. When a project wants to reuse (inherit) these concepts, it inherits this definition as-is; comments are the only permitted additions. These comments cannot in any way, shape or form alter the semantics of the original concept.
  • Multilingual setting: when a project wants to reuse concepts from a building block repository (BBR) that does not have defining properties in the project's language, the project can do one of the following:
    • Accept the BBR concept as-is, potentially adding a comment
    • Talk to the BBR governance group and work out an agreement whereby translations may be submitted.
      The recommended procedure for a BBR governance group receiving submitted translations is to create a new version of the dataset that adds the translations. The governance group could instead decide to add the translations to the original dataset, but this is not recommended, and ART will not support it: BBR datasets should be final, and final objects cannot be edited.
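The versioning rules above can be sketched in Python as follows: a new dataset version gets a new id and effective date; each concept keeps its id, gets the new effective date and records which concept version it inherits from; comments may be added to an inherited concept, but its definition (here, its names) can only be edited after deinheriting. The class names, the inherits_from field and the ids under the 2.999 example OID arc are hypothetical; in DECOR itself these properties are expressed in XML (@id, @effectiveDate).

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ConceptRef:
    id: str
    effective_date: str


@dataclass(frozen=True)
class Concept:
    id: str
    effective_date: str
    name: dict                                  # language -> name: part of the governed definition
    inherits_from: Optional[ConceptRef] = None  # source concept, or None when not inheriting
    comments: Tuple[str, ...] = ()              # the only additions allowed on an inherited concept


@dataclass(frozen=True)
class Dataset:
    id: str
    effective_date: str
    concepts: Tuple[Concept, ...]


def new_version(ds: Dataset, new_id: str, new_date: str) -> Dataset:
    """New dataset version: new id and date; every concept keeps its id and inherits."""
    concepts = tuple(
        Concept(
            id=c.id,                  # the concept keeps its @id
            effective_date=new_date,  # but gets a new @effectiveDate
            name=c.name,
            inherits_from=ConceptRef(c.id, c.effective_date),
        )
        for c in ds.concepts
    )
    return Dataset(new_id, new_date, concepts)


def add_comment(concept: Concept, text: str) -> Concept:
    """Comments are allowed on inherited concepts; they must not alter the semantics."""
    return replace(concept, comments=concept.comments + (text,))


def deinherit(concept: Concept) -> Concept:
    """Disconnect a concept from its source concept so that it can be edited."""
    return replace(concept, inherits_from=None)


def set_name(concept: Concept, lang: str, text: str) -> Concept:
    """Edit the definition (here: a name in one language); refused while inheriting."""
    if concept.inherits_from is not None:
        raise ValueError("inherited definition cannot be edited; deinherit first")
    return replace(concept, name={**concept.name, lang: text})


# Hypothetical ids under the 2.999 example OID arc.
v1 = Dataset("2.999.1", "2019-01-01",
             (Concept("2.999.10", "2019-01-01", {"en-US": "Body length"}),))
v2 = new_version(v1, "2.999.2", "2020-04-01")
length = v2.concepts[0]
print(length.id, length.effective_date)  # 2.999.10 2020-04-01

try:
    set_name(length, "nl-NL", "Lichaamslengte")
except ValueError as err:
    print(err)  # inherited definition cannot be edited; deinherit first

edited = set_name(deinherit(length), "nl-NL", "Lichaamslengte")
print(edited.name)  # {'en-US': 'Body length', 'nl-NL': 'Lichaamslengte'}
```

Because a translation changes the governed definition, the sketch refuses it on an inherited concept; this mirrors why translations for a BBR concept belong in a new dataset version rather than in the reusing project.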