DECOR-dataset
A dataset is a list of (hierarchical) concepts. A DECOR project contains one or more datasets. When there are multiple datasets, they are different versions of the same dataset. It is not recommended to create datasets for different healthcare projects in the same DECOR project.
A dataset consists of (healthcare) concepts. You can see a DECOR dataset as a hierarchical concept repository. Concepts should be well-defined and can be demonstrated with the following examples:
Examples |
|
The following datatypes are available (items printed as bold are part of the current DECOR implementation).
Datatype | Nederlands | Deutsch | Description |
---|---|---|---|
count | aantal | Zähler | Countable (non-monetary) quantities. Used for countable types such as:
|
code | code | Code | A system of valid symbols/codes, that substitute for specified concepts e.g. alpha, numeric, symbols and/or combinations, usually defined by a formal reference to a terminology or ontology, but may also be defined by the provision of text. Typically a symbol/code is expressed with a value for code, an identifier for the terminology or ontology it belongs to and at least one textual representation (display name). |
ordinal | ordinaal | Ordinal | Models rankings and scores, e.g. pain, Apgar values, etc., where there is a) implied ordering, b) no implication that the distance between each value is constant, and c) the total number of values is finite. Note that although the term ‘ordinal’ in mathematics means natural numbers only, here any integer is allowed, since negative and zero values are often used by medical professionals for values around a neutral point. Scores are commonly encountered in various clinical assessment scales. Assigning a value to a concept should generally be done in a formal code system that defines the value, or in an applicable value set for the concept, but some concepts do not have a formal definition (or are not even represented as a concept formally, especially in questionnaires). Scores may even be assigned arbitrarily during use (hence, on Coding). The value may be constrained to an integer in some contexts of use. Examples of sets of ordinal values:
|
identifier | identificatie | Identifikation | Type for representing identifiers of real-world entities. Typical identifiers include driver's licence number, social security number, prescription id, order id, and so on. |
string | string | Zeichenkette | Any text item, without visual formatting. |
text | string met opmaak | Zeichenkette mit Formatierung | A text item, which may contain any amount of legal characters arranged as e.g. words, sentences etc. Visual formatting and hyperlinks may be included. |
date | datum | Datum | Represents an absolute point in time, as measured on the Gregorian calendar, and specified only to the day. Semantics defined by ISO 8601. Used for recording dates in real world time. The partial form is used for approximate birth dates, dates of death, etc. |
datetime | datum+tijd | Datum+Zeit | Represents an absolute point in time, specified to the second. Semantics defined by ISO 8601. Used for recording a precise point in real world time, e.g. the exact date and time of the birth of a baby, and for approximate time stamps, e.g. the origin of a history observation which is only partially known. |
time | tijd | Zeit | Represents a time, specified to the second (hh:mm:ss). Semantics defined by ISO 8601. Used for recording a real world time, e.g. time of medication administration and starting/stopping a procedure, and for approximate times, e.g. the origin of a history observation which is only partially known. Per April 2020 |
complex | samengestelde gegevens | Sammlung von Daten | Non-atomic datatypes which are not explicitly further defined in the dataset itself. Example: 'address' or 'person name'. Usually complex types are assumed to be well-known enough not to warrant further decomposition in the dataset itself. |
decimal | decimaal getal | Dezimalzahl | Decimal number (rarely used, in most cases a decimal number is actually a quantity). |
quantity | hoeveelheid | Menge | Quantified type representing "scientific" quantities, i.e. quantities expressed as a magnitude and units. If not further specified with fractionDigits, a decimal number with optional decimal point (e.g. '3.14159265359').
There are some "special" quantities (used in healthcare), explained later:
|
duration | tijdsduur | Dauer | A quantity that represents a period of time with respect to a notional point in time, which is not specified. A sign may be used to indicate the duration is “backwards” in time rather than forwards. |
boolean | boolean | Boolean | Items which are truly boolean data, such as true/false or yes/no answers. |
blob | binair | Binär | Things that are typically stored as binary objects in the computer world and need to be rendered appropriately, e.g.
|
currency | valuta | Währung | Monetary quantities |
ratio | ratio | Ratio | A ratio of two Quantity values - a numerator and a denominator |
The datatypes can be further restricted using the following datatype facets:
Facet | Description | Example | Applies to |
---|---|---|---|
unit | Unit for quantities | kg, mmol | quantity |
minInclude | Minimum allowed value (inclusive) | 1, 100 | count, ordinal, quantity, currency |
maxInclude | Maximum allowed value (inclusive) | 1, 100 | count, ordinal, quantity, currency |
fractionDigits | Number of fraction digits for quantities | "1" for at least 1, "1!" for exactly 1 | quantity |
timeStampPrecision | Precision for timing specifications | | date, datetime |
default | Default value | | all datatypes |
fixed | Fixed value | | all datatypes |
minLength | Minimum length for strings | | string |
maxLength | Maximum length for strings | | string |
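As a rough illustration of how the `minInclude`, `maxInclude` and `fractionDigits` facets could be applied to a quantity, here is a minimal Python sketch. The function names and the representation of facets as a plain dictionary are assumptions made for this example only; they are not part of DECOR or ART. The sketch also assumes that fraction digits are counted on the literal as written.

```python
from decimal import Decimal


def check_fraction_digits(value: str, spec: str) -> bool:
    """Check a decimal literal against a fractionDigits spec such as "1" or "1!".

    "1" is read as "at least 1 fraction digit", "1!" as "exactly 1".
    """
    exact = spec.endswith("!")
    required = int(spec.rstrip("!"))
    # Count the digits after the decimal point in the literal as written.
    _, _, fraction = value.partition(".")
    digits = len(fraction)
    return digits == required if exact else digits >= required


def check_quantity(value: str, facets: dict) -> list:
    """Return a list of facet violations; an empty list means the value passes.

    `facets` is a hypothetical dict keyed by facet name, e.g.
    {"minInclude": "1", "maxInclude": "100", "fractionDigits": "1!"}.
    """
    problems = []
    number = Decimal(value)
    if "minInclude" in facets and number < Decimal(facets["minInclude"]):
        problems.append(f"{value} is below minInclude {facets['minInclude']}")
    if "maxInclude" in facets and number > Decimal(facets["maxInclude"]):
        problems.append(f"{value} is above maxInclude {facets['maxInclude']}")
    if "fractionDigits" in facets and not check_fraction_digits(
        value, facets["fractionDigits"]
    ):
        problems.append(
            f"{value} does not match fractionDigits {facets['fractionDigits']}"
        )
    return problems
```

Both range facets are treated as inclusive bounds, so a value equal to `minInclude` or `maxInclude` passes.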
The facet timeStampPrecision takes the following values:
timeStampPrecision | value |
---|---|
Y | at least year (YYYY) |
Y! | only year (YYYY) |
YM | at least month (MM) and year (YYYY) |
YM! | only month (MM) and year (YYYY) |
YMD | at least day (DD), month (MM) and year (YYYY) |
YMD! | only day (DD), month (MM) and year (YYYY) |
YMDHM | at least day (DD), month (MM) and year (YYYY), hour (hh) and minute (mm) |
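The "at least" versus "only" distinction in the table can be sketched as a comparison of precision levels. The Python below is a hedged illustration, not the ART implementation: it assumes timestamps are written in ISO 8601 extended format (e.g. `2020-04-15T10:30`), handles only the codes listed in the table, and uses hypothetical function names.

```python
import re

# Patterns ordered from coarsest to finest precision (level 1 = year only).
# Assumption: ISO 8601 extended format; DECOR tooling may use other formats.
_PATTERNS = [
    re.compile(r"^\d{4}$"),                                # year
    re.compile(r"^\d{4}-\d{2}$"),                          # year, month
    re.compile(r"^\d{4}-\d{2}-\d{2}$"),                    # year, month, day
    re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}$"),        # ... hour, minute
    # Finer than any code in the table; used here only so that values with
    # seconds still count as "at least" YMDHM.
    re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$"),
]

# Only the codes from the table above are accepted by this sketch.
_CODE_LEVEL = {"Y": 1, "YM": 2, "YMD": 3, "YMDHM": 4}
_KNOWN_CODES = {"Y", "Y!", "YM", "YM!", "YMD", "YMD!", "YMDHM"}


def precision_level(value: str) -> int:
    """Return the precision level (1..5) of a timestamp literal."""
    for level, pattern in enumerate(_PATTERNS, start=1):
        if pattern.match(value):
            return level
    raise ValueError(f"unrecognised timestamp: {value!r}")


def satisfies_precision(value: str, code: str) -> bool:
    """Check a timestamp against a timeStampPrecision code.

    A trailing "!" means exactly that precision; without it, the value must
    be at least that precise.
    """
    if code not in _KNOWN_CODES:
        raise ValueError(f"unsupported timeStampPrecision code: {code!r}")
    exact = code.endswith("!")
    required = _CODE_LEVEL[code.rstrip("!")]
    actual = precision_level(value)
    return actual == required if exact else actual >= required
```

For instance, `satisfies_precision("2020-04", "Y")` holds because a year-and-month value is at least as precise as a year, while `satisfies_precision("2020-04", "Y!")` does not.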
Dataset versioning
- New dataset versions get a new @id and a new @effectiveDate
- The concepts in the new dataset version keep their @id, but get a new @effectiveDate and inherit from the original concept. If a concept needs changes, it may be disconnected (deinherit) from its source concept so that editing is possible
- The name and other properties of a concept constitute its definition. This definition should be governed, which means that any property in any language is under governance. When a project wants to reuse (inherit) these concepts, it inherits this definition as-is, and only comments are allowed as additions. These comments cannot in any way, shape or form alter the semantics of the original concept
- Multilingual setting: when a project wants to reuse concepts from a building block repository (BBR) that does not have defining properties in the same language as the project, then the project can do one of the following things:
  - Accept the BBR concept as-is, potentially adding a comment
  - Talk to the BBR governance group and work out an agreement whereby translations may be submitted
The recommended procedure for a BBR governance group for submitted translations is to create a new version of the dataset that adds the translations. The governance group could, but this is not recommended, decide to add the translation to the original dataset. ART will not support this as BBR datasets should be final, and final objects cannot be edited.
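The versioning rules above (new @id and @effectiveDate for the dataset, unchanged @id with a new @effectiveDate and an inherit reference for each concept, and optional deinheriting) can be sketched in Python as follows. This is an illustrative model only: the class names, field names, function names and the example identifiers are hypothetical and do not reflect the DECOR XML structure or any ART API.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Concept:
    id: str
    effective_date: str
    name: Optional[str] = None                 # None while the concept inherits
    inherit: Optional[Tuple[str, str]] = None  # (source id, source effectiveDate)


@dataclass(frozen=True)
class Dataset:
    id: str
    effective_date: str
    concepts: Tuple[Concept, ...] = ()


def new_version(dataset: Dataset, new_id: str, new_effective_date: str) -> Dataset:
    """Create a new dataset version with a new id and effectiveDate.

    Each concept keeps its id, gets the new effectiveDate, and inherits from
    the concept it was copied from.
    """
    concepts = tuple(
        Concept(
            id=c.id,
            effective_date=new_effective_date,
            inherit=(c.id, c.effective_date),
        )
        for c in dataset.concepts
    )
    return Dataset(id=new_id, effective_date=new_effective_date, concepts=concepts)


def deinherit(concept: Concept, source: Concept) -> Concept:
    """Disconnect a concept from its source so that it can be edited.

    This sketch copies only the name; a real concept has more properties.
    """
    if concept.inherit != (source.id, source.effective_date):
        raise ValueError("concept does not inherit from the given source")
    return replace(concept, name=source.name, inherit=None)


# Usage with made-up identifiers and dates.
v1 = Dataset("ds-1", "2020-01-01T00:00:00",
             (Concept("c-1", "2020-01-01T00:00:00", name="Body weight"),))
v2 = new_version(v1, "ds-2", "2021-01-01T00:00:00")
editable = deinherit(v2.concepts[0], v1.concepts[0])
```

The objects are frozen, so every change yields a new object, which loosely mirrors the rule that final objects cannot be edited in place.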