Extract data from SpiGes XML files using R programming.
# install from the r-universe
install.packages("SpiGesXML", repos = "https://swissstatsr.r-universe.dev")
# install from GitHub
remotes::install_github("SwissStatsR/SpiGesXML")
By default, the functions use the “latest” format validation version, which is currently SpiGes version 1.5. The section Change format version shows how to change the version manually.
Using spiges_get_df(), you can get the data from any SpiGes XML file
using the x argument. The x argument can be a URL or the path of a
given XML file.
library(SpiGesXML)
# SpiGes XML file example
# https://www.bfs.admin.ch/bfs/de/home/statistiken/gesundheit/gesundheitswesen/projekt-spiges.assetdetail.27905035.html
#xml_example <- "https://dam-api.bfs.admin.ch/hub/api/dam/assets/32129227/master"
xml_example <- "https://dam-api.bfs.admin.ch/hub/api/dam/assets/36147530/master"
spiges_get_df(x = xml_example, node = "Administratives")
## ✔ Format correctly validated using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/36182162/master.
## # A tibble: 2 × 35
## ent_id burnr fall_id burnr…¹ abc_f…² gesch…³ alter alter…⁴ wohno…⁵ wohnk…⁶
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 845724581 52704… 5443546 7584215 A 1 0 1 AG01 AG
## 2 845724581 52704… 5443547 7584215 A 2 37 <NA> <NA> <NA>
## # … with 25 more variables: wohnland <chr>, nationalitaet <chr>,
## # eintrittsdatum <chr>, eintritt_aufenthalt <chr>, eintrittsart <chr>,
## # einw_instanz <chr>, liegeklasse <chr>, versicherungsklasse <chr>,
## # admin_urlaub <chr>, chlz <chr>, aufenthalt_ips <chr>, beatmung <chr>,
## # schwere_score <chr>, art_score <chr>, nems <chr>, aufenthalt_imc <chr>,
## # aufwand_imc <chr>, hauptleistungsstelle <chr>, grundversicherung <chr>,
## # tarif <chr>, austrittsdatum <chr>, austrittsentscheid <chr>, …
If there is no data for a given node, spiges_get_df() returns an error
that suggests existing nodes in the SpiGes file.
spiges_get_df(x = xml_example, node = "Admin")
Error in `spiges_get_df()`:
! `node` must be one of "Administratives", "Diagnose", "KostentraegerFall", "Behandlung", or "Rechnung", not "Admin".
ℹ Did you mean "Administratives"?
The spiges_get_df() function validates the XML file internally. If the
file is valid, the function returns the data with a success message in
the console. If validation fails, only an error message is returned. To
return the data even when validation fails, set the force argument to
TRUE.
Here is an example of an incorrect format with an “INCORRECT VALUE” for “burnr”:
xml_file_incorrect <- system.file(
"example_incorrect_format.xml",
package = "SpiGesXML"
)
library(xml2) # install.packages("xml2")
# for example burnr="INCORRECT VALUE"
xml2::read_xml(xml_file_incorrect)
## {xml_document}
## <Unternehmen ent_id="100000012" version="1.4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.bfs.admin.ch/xmlns/gvs/spiges-data/1.4">
## [1] <Standort burnr="INCORRECT VALUE">\n <Fall fall_id="1">\n <Administra ...
If you use this incorrect XML file, spiges_get_df() will return an
error message (but the function can return the data anyway using
force = TRUE):
spiges_get_df(
x = xml_file_incorrect,
node = "Administratives",
force = FALSE # TO RETURN DATA, USE "TRUE"
)
## ✖ Incorrect format using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/36182162/master.
##
## Element '{http://www.bfs.admin.ch/xmlns/gvs/spiges-data/1.4}Unternehmen': No matching global declaration available for the validation root.
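For readers unfamiliar with XSD validation, the idea can be reproduced in miniature with xml2::xml_validate(). This is only a sketch under stated assumptions: the schema below is a hypothetical toy, not the SpiGes XSD, and SpiGesXML's internal validation may work differently.

```r
library(xml2)

# Toy schema (hypothetical): <Standort> requires an integer burnr attribute
schema <- read_xml('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Standort">
    <xs:complexType>
      <xs:attribute name="burnr" type="xs:integer" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>')

valid_doc   <- read_xml('<Standort burnr="52704341"/>')
invalid_doc <- read_xml('<Standort burnr="INCORRECT VALUE"/>')

# xml_validate() returns a logical; on failure, the "errors" attribute
# holds the validator's messages
ok_valid   <- xml_validate(valid_doc, schema)
ok_invalid <- xml_validate(invalid_doc, schema)

ok_valid
ok_invalid
attr(ok_invalid, "errors")
```

Inspecting the "errors" attribute is a convenient way to see why a document was rejected before deciding whether to read it anyway.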
Variables can be individually selected using the variables argument:
spiges_get_df(x = xml_example, node = "Administratives", variables = c("abc_fall", "geschlecht"))
## ✔ Format correctly validated using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/36182162/master.
## # A tibble: 2 × 5
## ent_id burnr fall_id abc_fall geschlecht
## <chr> <chr> <chr> <chr> <chr>
## 1 845724581 52704341 5443546 A 1
## 2 845724581 52704341 5443547 A 2
If a variable name given in variables doesn’t exist in the file, the
function returns an error message. When multiple variable names are
provided, only the correct variable names are returned (no error is
raised as long as at least one variable name is correct).
spiges_get_df(x = xml_example, node = "Administratives", variables = "Geschlecht")
Error in `spiges_get_df()`:
! `variables` must be one of "burnr_gesv", "abc_fall", "geschlecht", "alter", "alter_U1", "wohnort_medstat", "wohnkanton",
"wohnland", "nationalitaet", "eintrittsdatum", "eintritt_aufenthalt", "eintrittsart", "einw_instanz", "liegeklasse",
"versicherungsklasse", "admin_urlaub", "chlz", "aufenthalt_ips", "beatmung", "schwere_score", "art_score", "nems", "aufenthalt_imc",
"aufwand_imc", "hauptleistungsstelle", "grundversicherung", "tarif", "austrittsdatum", "austrittsentscheid", "austritt_aufenthalt", or
"austritt_behandlung", not "Geschlecht".
ℹ Did you mean "geschlecht"?
You can also access “Personenidentifikatoren” data using
spiges_get_df(). Under the hood, the function uses the spiges-ids
validation format for this node (all other available nodes use the
spiges-data validation format).
id_example <- "https://dam-api.bfs.admin.ch/hub/api/dam/assets/36147531/master"
spiges_get_df(x = id_example, node = "Personenidentifikatoren")
## ✔ Format correctly validated using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/36147528/master.
## # A tibble: 2 × 5
## ent_id burnr fall_id ahv geburtsdatum
## <chr> <chr> <chr> <chr> <chr>
## 1 845724581 71548624 5443546 7561234567897 20220412
## 2 845724581 71548624 5443547 7561111111119 19850117
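All columns are returned as character, so dates and numbers usually need converting before analysis. The following sketch rebuilds the two rows printed above by hand (rather than calling the package) and parses geburtsdatum with base R; the YYYYMMDD format is an assumption read off the printed values.

```r
# Hand-copied from the printed tibble above, so the example runs offline
ids <- data.frame(
  ent_id       = c("845724581", "845724581"),
  burnr        = c("71548624", "71548624"),
  fall_id      = c("5443546", "5443547"),
  ahv          = c("7561234567897", "7561111111119"),
  geburtsdatum = c("20220412", "19850117"),
  stringsAsFactors = FALSE
)

# Assumed format: eight digits, year-month-day
ids$geburtsdatum <- as.Date(ids$geburtsdatum, format = "%Y%m%d")

str(ids)
```

Identifier columns such as ahv are probably best left as character, since they are labels rather than quantities.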
You can change the version of the XSD schema used for validation with
the schema_xsd argument (“latest” by default). Note that you might
encounter issues with custom schema validation, as SpiGesXML aims to
work with the latest schema version available.
Here is an example for schema version “1.3”:
# SpiGes format version 1.3
spiges_get_df(
x = "https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905035/master",
node = "Diagnose",
schema_xsd = "https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905037/master"
)
## ✔ Format correctly validated using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905037/master.
## # A tibble: 2 × 8
## ent_id burnr fall_id diagnose_id diagnose_kode diagnos…¹ diagn…² diagn…³
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 845724581 71548624 5443546 1 Z380 <NA> <NA> <NA>
## 2 845724581 71548624 5443547 1 F29 <NA> 1 <NA>
## # … with abbreviated variable names ¹diagnose_seitigkeit, ²diagnose_poa,
## # ³diagnose_zusatz
Here is another example with version 1.3, for the “Personenidentifikatoren” node:
spiges_get_df(
x = "https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905038/master",
node = "Personenidentifikatoren",
schema_xsd = "https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905036/master"
)
## ✔ Format correctly validated using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905036/master.
## # A tibble: 2 × 5
## ent_id burnr fall_id ahv geburtsdatum
## <chr> <chr> <chr> <chr> <chr>
## 1 845724581 71548624 5443546 7561234567897 20220412
## 2 845724581 71548624 5443547 7561111111119 19850117
Show the node names available for data extraction in a SpiGes XML file:
spiges_get_name_nodes(x = xml_example)
## [1] "KostentraegerUnternehmen" "KostentraegerStandort"
## [3] "Administratives" "Neugeborene"
## [5] "KostentraegerFall" "Diagnose"
## [7] "Behandlung" "Rechnung"
## [9] "Psychiatrie" "Medikament"
## [11] "Patientenbewegung"
Get the list of variable names available for each SpiGes node from any SpiGes XSD schema file:
spiges_get_name_variables(
schema_xsd = "https://dam-api.bfs.admin.ch/hub/api/dam/assets/32129184/master"
) |>
str()
## List of 4
## $ Unternehmen : chr [1:2] "ent_id" "version"
## $ Standort : chr "burnr"
## $ Fall : chr "fall_id"
## $ Personenidentifikatoren: chr [1:2] "ahv" "geburtsdatum"
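Because the result is a named list, it can be used to check variable names before requesting them. The sketch below copies the list printed above by hand so that it runs offline; in practice `vars` would be the return value of spiges_get_name_variables(). The `wanted` vector is a hypothetical user request.

```r
# Hand-copied from the str() output above
vars <- list(
  Unternehmen             = c("ent_id", "version"),
  Standort                = "burnr",
  Fall                    = "fall_id",
  Personenidentifikatoren = c("ahv", "geburtsdatum")
)

# Hypothetical request containing one misspelled name
wanted <- c("ahv", "Geburtsdatum")

# Names that the schema does not define for this node
unknown <- setdiff(wanted, vars[["Personenidentifikatoren"]])
unknown
```

Variable names are case-sensitive, as the “Geschlecht” error earlier in this document shows, so a check of this kind catches capitalisation slips early.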
The following node names are not available for data extraction: “Unternehmen”, “Standort”, “Fall” and “Kantonsdaten”.
- Read in the “Kantonsdaten” (at the “Fall”, “Standort” and
“Unternehmen” levels).
- There are no specified attributes, as each canton can set its own variables, which could make it problematic to include in the function.
