SwissStatsR/SpiGesXML

SpiGesXML

Extract data from SpiGes XML files with R.

Installation

# install from the r-universe
install.packages("SpiGesXML", repos = "https://swissstatsr.r-universe.dev")
# install from GitHub
remotes::install_github("SwissStatsR/SpiGesXML")

Latest validation format

By default, the functions validate against the “latest” format version, which is currently SpiGes version 1.5. The section Change format validation version shows how to change the version manually.

Examples

With spiges_get_df(), you can extract data from any SpiGes XML file. The x argument can be a URL or the path to a local XML file, and the node argument selects which part of the file to extract.

library(SpiGesXML)

# SpiGes XML file example 
# https://www.bfs.admin.ch/bfs/de/home/statistiken/gesundheit/gesundheitswesen/projekt-spiges.assetdetail.27905035.html
#xml_example <- "https://dam-api.bfs.admin.ch/hub/api/dam/assets/32129227/master"
xml_example <- "https://dam-api.bfs.admin.ch/hub/api/dam/assets/36147530/master"

spiges_get_df(x = xml_example, node = "Administratives")
## ✔ Format correctly validated using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/36182162/master.

## # A tibble: 2 × 35
##   ent_id    burnr  fall_id burnr…¹ abc_f…² gesch…³ alter alter…⁴ wohno…⁵ wohnk…⁶
##   <chr>     <chr>  <chr>   <chr>   <chr>   <chr>   <chr> <chr>   <chr>   <chr>  
## 1 845724581 52704… 5443546 7584215 A       1       0     1       AG01    AG     
## 2 845724581 52704… 5443547 7584215 A       2       37    <NA>    <NA>    <NA>   
## # … with 25 more variables: wohnland <chr>, nationalitaet <chr>,
## #   eintrittsdatum <chr>, eintritt_aufenthalt <chr>, eintrittsart <chr>,
## #   einw_instanz <chr>, liegeklasse <chr>, versicherungsklasse <chr>,
## #   admin_urlaub <chr>, chlz <chr>, aufenthalt_ips <chr>, beatmung <chr>,
## #   schwere_score <chr>, art_score <chr>, nems <chr>, aufenthalt_imc <chr>,
## #   aufwand_imc <chr>, hauptleistungsstelle <chr>, grundversicherung <chr>,
## #   tarif <chr>, austrittsdatum <chr>, austrittsentscheid <chr>, …
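Because x also accepts a local path, a file can be downloaded once and then read from disk. The following is a minimal sketch, assuming the example URL is reachable and the temporary directory is writable; local_xml and admin_local are illustrative names, and the XSD validation may still need network access to fetch the schema.

```r
library(SpiGesXML)

# same example file as above
xml_example <- "https://dam-api.bfs.admin.ch/hub/api/dam/assets/36147530/master"

# download the file once into a temporary location
# (binary mode keeps the file bytes unchanged on all platforms)
local_xml <- tempfile(fileext = ".xml")
download.file(xml_example, destfile = local_xml, mode = "wb", quiet = TRUE)

# pass the local path to `x` instead of the URL
admin_local <- spiges_get_df(x = local_xml, node = "Administratives")
admin_local
```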

If there is no data for a given node, spiges_get_df() returns an error that lists the nodes present in the SpiGes file and suggests the closest match.

spiges_get_df(x = xml_example, node = "Admin")
Error in `spiges_get_df()`:
! `node` must be one of "Administratives", "Diagnose", "KostentraegerFall", "Behandlung", or "Rechnung", not "Admin".
ℹ Did you mean "Administratives"?

spiges_get_df() validates the XML file internally against the XSD schema. If validation succeeds, it returns the data and prints a success message in the console. If validation fails, only an error message is returned. To return the data despite a failed validation, set the force argument to TRUE.

Here is an example of an incorrectly formatted file, where “burnr” is set to “INCORRECT VALUE”:

xml_file_incorrect <- system.file(
  "example_incorrect_format.xml", 
  package = "SpiGesXML"
)

library(xml2) # install.packages("xml2")

# for example burnr="INCORRECT VALUE"
xml2::read_xml(xml_file_incorrect)
## {xml_document}
## <Unternehmen ent_id="100000012" version="1.4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.bfs.admin.ch/xmlns/gvs/spiges-data/1.4">
## [1] <Standort burnr="INCORRECT VALUE">\n  <Fall fall_id="1">\n    <Administra ...

With this incorrect XML file, spiges_get_df() returns an error message (the data can still be returned with force = TRUE):

spiges_get_df(
  x = xml_file_incorrect,
  node = "Administratives",
  force = FALSE # set to TRUE to return the data anyway
)
## ✖ Incorrect format using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/36182162/master.
## 
## Element '{http://www.bfs.admin.ch/xmlns/gvs/spiges-data/1.4}Unternehmen': No matching global declaration available for the validation root.
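As described above, the same file can still be read by setting force = TRUE. A minimal sketch; admin_forced is an illustrative name, and the console output is not reproduced here.

```r
library(SpiGesXML)

# example file with an invalid "burnr" value, shipped with the package
xml_file_incorrect <- system.file(
  "example_incorrect_format.xml",
  package = "SpiGesXML"
)

# force = TRUE returns the data even though the XSD validation fails
admin_forced <- spiges_get_df(
  x = xml_file_incorrect,
  node = "Administratives",
  force = TRUE
)

# inspect the value that made the validation fail
unique(admin_forced$burnr)
```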

Individual variables can be selected with the variables argument (as the output shows, the identifier columns ent_id, burnr and fall_id are included as well):

spiges_get_df(x = xml_example, node = "Administratives", variables = c("abc_fall", "geschlecht"))
## ✔ Format correctly validated using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/36182162/master.

## # A tibble: 2 × 5
##   ent_id    burnr    fall_id abc_fall geschlecht
##   <chr>     <chr>    <chr>   <chr>    <chr>     
## 1 845724581 52704341 5443546 A        1         
## 2 845724581 52704341 5443547 A        2

If a variable name passed to variables doesn’t exist in the file, spiges_get_df() returns an error message suggesting the closest match. When multiple variable names are provided, only the valid ones are returned, and no error is raised as long as at least one name is valid.

spiges_get_df(x = xml_example, node = "Administratives", variables = "Geschlecht")
Error in `spiges_get_df()`:
! `variables` must be one of "burnr_gesv", "abc_fall", "geschlecht", "alter", "alter_U1", "wohnort_medstat", "wohnkanton",
  "wohnland", "nationalitaet", "eintrittsdatum", "eintritt_aufenthalt", "eintrittsart", "einw_instanz", "liegeklasse",
  "versicherungsklasse", "admin_urlaub", "chlz", "aufenthalt_ips", "beatmung", "schwere_score", "art_score", "nems", "aufenthalt_imc",
  "aufwand_imc", "hauptleistungsstelle", "grundversicherung", "tarif", "austrittsdatum", "austrittsentscheid", "austritt_aufenthalt", or
  "austritt_behandlung", not "Geschlecht".
ℹ Did you mean "geschlecht"?
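To illustrate the behaviour with several names, the sketch below mixes a valid name with an invalid one. Based on the description above, only geschlecht (plus the identifier columns) is expected in the result; sex_df is an illustrative name.

```r
library(SpiGesXML)

xml_example <- "https://dam-api.bfs.admin.ch/hub/api/dam/assets/36147530/master"

# "geschlecht" exists in the file, "Geschlecht" (capitalised) does not;
# only the valid name is expected to be kept
sex_df <- spiges_get_df(
  x = xml_example,
  node = "Administratives",
  variables = c("geschlecht", "Geschlecht")
)

names(sex_df)
```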

Get IDs data

You can also extract “Personenidentifikatoren” data with spiges_get_df(). For this node, the function validates against the spiges-ids schema, whereas all other nodes are validated against the spiges-data schema.

id_example <- "https://dam-api.bfs.admin.ch/hub/api/dam/assets/36147531/master"

spiges_get_df(x = id_example, node = "Personenidentifikatoren")
## ✔ Format correctly validated using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/36147528/master.

## # A tibble: 2 × 5
##   ent_id    burnr    fall_id ahv           geburtsdatum
##   <chr>     <chr>    <chr>   <chr>         <chr>       
## 1 845724581 71548624 5443546 7561234567897 20220412    
## 2 845724581 71548624 5443547 7561111111119 19850117

Change format validation version

You can change the XSD schema used for validation with the schema_xsd argument (default: “latest”). Note that you may encounter issues with custom schemas, as SpiGesXML is designed to work with the latest available schema version.

Here is an example with schema version “1.3”:

# SpiGes format version 1.3 
spiges_get_df(
  x = "https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905035/master", 
  node = "Diagnose",
  schema_xsd = "https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905037/master"
)
## ✔ Format correctly validated using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905037/master.

## # A tibble: 2 × 8
##   ent_id    burnr    fall_id diagnose_id diagnose_kode diagnos…¹ diagn…² diagn…³
##   <chr>     <chr>    <chr>   <chr>       <chr>         <chr>     <chr>   <chr>  
## 1 845724581 71548624 5443546 1           Z380          <NA>      <NA>    <NA>   
## 2 845724581 71548624 5443547 1           F29           <NA>      1       <NA>   
## # … with abbreviated variable names ¹​diagnose_seitigkeit, ²​diagnose_poa,
## #   ³​diagnose_zusatz

Here is another version 1.3 example, for the “Personenidentifikatoren” node:

spiges_get_df(
  x = "https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905038/master",
  node = "Personenidentifikatoren",
  schema_xsd = "https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905036/master"
)
## ✔ Format correctly validated using XSD schema https://dam-api.bfs.admin.ch/hub/api/dam/assets/27905036/master.

## # A tibble: 2 × 5
##   ent_id    burnr    fall_id ahv           geburtsdatum
##   <chr>     <chr>    <chr>   <chr>         <chr>       
## 1 845724581 71548624 5443546 7561234567897 20220412    
## 2 845724581 71548624 5443547 7561111111119 19850117

Get available node names

List the node names available for data extraction in a SpiGes XML file:

spiges_get_name_nodes(x = xml_example)
##  [1] "KostentraegerUnternehmen" "KostentraegerStandort"   
##  [3] "Administratives"          "Neugeborene"             
##  [5] "KostentraegerFall"        "Diagnose"                
##  [7] "Behandlung"               "Rechnung"                
##  [9] "Psychiatrie"              "Medikament"              
## [11] "Patientenbewegung"
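spiges_get_name_nodes() can be combined with spiges_get_df() to extract every node into a named list. This is a minimal sketch, not a package feature: nodes without data in the file raise an error (as in the "Admin" example above), so tryCatch() turns them into NULL before they are dropped. Each call validates the file again, which can be slow for large files; all_data is an illustrative name.

```r
library(SpiGesXML)

xml_example <- "https://dam-api.bfs.admin.ch/hub/api/dam/assets/36147530/master"

nodes <- spiges_get_name_nodes(x = xml_example)

# try every node; an error (e.g. no data for that node) becomes NULL
all_data <- lapply(nodes, function(node) {
  tryCatch(
    spiges_get_df(x = xml_example, node = node),
    error = function(e) NULL
  )
})
names(all_data) <- nodes

# keep only the nodes that returned data
all_data <- Filter(Negate(is.null), all_data)
names(all_data)
```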

Get available variable names

Get the variable names available for each SpiGes node from any SpiGes XSD schema file:

spiges_get_name_variables(
  schema_xsd = "https://dam-api.bfs.admin.ch/hub/api/dam/assets/32129184/master"
) |>
  str()
## List of 4
##  $ Unternehmen            : chr [1:2] "ent_id" "version"
##  $ Standort               : chr "burnr"
##  $ Fall                   : chr "fall_id"
##  $ Personenidentifikatoren: chr [1:2] "ahv" "geburtsdatum"
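The list returned by spiges_get_name_variables() can be used to check variable names before passing them to the variables argument of spiges_get_df(). A minimal sketch, assuming the node of interest is Personenidentifikatoren; wanted and unknown are illustrative names.

```r
library(SpiGesXML)

schema_ids <- "https://dam-api.bfs.admin.ch/hub/api/dam/assets/32129184/master"

# named list: one character vector of variable names per node
schema_vars <- spiges_get_name_variables(schema_xsd = schema_ids)

# hypothetical selection, including a wrongly capitalised name
wanted <- c("ahv", "AHV", "geburtsdatum")

# names that are not declared in the schema for this node
unknown <- setdiff(wanted, schema_vars[["Personenidentifikatoren"]])
unknown
```

With the schema output shown above, unknown contains only "AHV".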

Notes

The following node names are not available for data extraction: “Unternehmen”, “Standort”, “Fall” and “Kantonsdaten”.

TODO

  • Read in the “Kantonsdaten” (at the “Fall”, “Standort” and “Unternehmen” levels).
    • There are no predefined attributes, since each canton can define its own variables, which could make this difficult to support in the function.

About

R package to extract data from SpiGes XML files
