16 changes: 12 additions & 4 deletions .github/workflows/R-CMD-check.yaml
Original file line number Diff line number Diff line change
@@ -44,7 +44,15 @@ jobs:
           extra-packages: any::rcmdcheck
           needs: check
 
-      - uses: r-lib/actions/check-r-package@v2
-        with:
-          upload-snapshots: true
-          build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'
+      - name: Install dependencies
+        run: |
+          sudo apt-get update && sudo apt-get install -y clang
+      - name: Run R CMD check with ASAN
+        run: |
+          export CC=clang
+          export CFLAGS="-fsanitize=address,undefined -fno-omit-frame-pointer"
+          export CXX=clang++
+          export CXXFLAGS="-fsanitize=address,undefined -fno-omit-frame-pointer"
+          R CMD build .
+          R CMD check --as-cran read.dbc_*.tar.gz
+        shell: bash
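One caveat on the ASAN step above, offered as a sketch rather than a tested fix: `R CMD INSTALL` and `R CMD check` take their compiler settings from R's own `Makeconf` and from a user-level `Makevars` file, so `CC`/`CFLAGS` exported in the shell may not reach the compilation of `src/*.c`. Assuming that is the case on this runner, the same flags could be supplied through a Makevars file selected with the `R_MAKEVARS_USER` environment variable. The temporary file location and the `MAKEVARS_FILE` variable name are illustrative and not part of this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical per-job Makevars file (name and location are assumptions).
MAKEVARS_FILE="$(mktemp "${TMPDIR:-/tmp}/Makevars.XXXXXX")"

# Same compiler and sanitizer flags as the workflow step, written as make variables.
cat > "$MAKEVARS_FILE" <<'EOF'
CC = clang
CXX = clang++
CFLAGS = -fsanitize=address,undefined -fno-omit-frame-pointer
CXXFLAGS = -fsanitize=address,undefined -fno-omit-frame-pointer
EOF

# When set, R reads this file instead of ~/.R/Makevars.
export R_MAKEVARS_USER="$MAKEVARS_FILE"

echo "Sanitizer Makevars written to $R_MAKEVARS_USER"
```

The `R CMD build .` and `R CMD check --as-cran read.dbc_*.tar.gz` commands from the workflow would then run unchanged in the same shell. Whether the ASAN runtime also has to be preloaded, or R itself built with sanitizers, depends on the runner's R build and is not addressed here.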
2 changes: 1 addition & 1 deletion Makefile
@@ -4,7 +4,7 @@ SRC=./src

.PHONY: lib
lib: clean # build the shared library version of dbc2dbf
-	R CMD SHLIB -o src/db2dbf.so src/*.c -fsanitize=undefined
+	R CMD SHLIB -o src/db2dbf.so src/*.c -fsanitize=address,undefined

.PHONY: clean
clean: # clean generated files
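
A note on the `lib` target, hedged because it was not verified against this repository: the usage line of `R CMD SHLIB` is `R CMD SHLIB [options] files | linker options`, so a `-fsanitize=...` argument placed after the source files is likely passed to the link step only. If the goal is to instrument the compiled objects as well, one option is to set `PKG_CFLAGS` and `PKG_LIBS`. The sketch below assumes that make picks these up from the environment; the `SAN_FLAGS` variable name is illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sanitizers requested by the Makefile change.
SAN_FLAGS="-fsanitize=address,undefined"

# Assumption: R's make rules add PKG_CFLAGS when compiling C sources and
# PKG_LIBS when linking, and make imports both from the environment.
export PKG_CFLAGS="$SAN_FLAGS"
export PKG_LIBS="$SAN_FLAGS"

# Only attempt the build when R and the sources are actually present.
if command -v R >/dev/null 2>&1 && ls src/*.c >/dev/null 2>&1; then
  R CMD SHLIB -o src/db2dbf.so src/*.c
else
  echo "R or src/*.c not found; skipping build"
fi
```

The output name `src/db2dbf.so` is copied from the Makefile as-is.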
23 changes: 23 additions & 0 deletions read.dbc.Rcheck/00_pkg_src/read.dbc/DESCRIPTION
@@ -0,0 +1,23 @@
Package: read.dbc
Title: Read Data Stored in DBC (Compressed DBF) Files
Description: Functions for reading and decompressing the DBC (compressed DBF) files. Please note that this is the file format used by the Brazilian Ministry of Health (DATASUS) to publish healthcare datasets. It is not related to the FoxPro or CANdb DBC file formats.
Version: 1.0.7
Depends: R (>= 3.3.0)
Imports: foreign
Authors@R: c(
    person("Daniela", "Petruzalek", email = "daniela.petruzalek@gmail.com", role = c("aut", "cre", "cph")),
    person("Mark", "Adler", email = "madler@alumni.caltech.edu", role = c("cph", "ctb")),
    person("Pablo", "Marcondes Fonseca", email = "pablo.mmarcondes@gmail.com", role = c("cph", "ctb"))
    )
Maintainer: Daniela Petruzalek <daniela.petruzalek@gmail.com>
URL: https://github.com/danicat/read.dbc
BugReports: https://github.com/danicat/read.dbc/issues
Copyright: 2016 Daniela Petruzalek
License: AGPL-3
Encoding: UTF-8
RoxygenNote: 7.3.1
NeedsCompilation: yes
Packaged: 2025-07-14 23:14:25 UTC; jules
Author: Daniela Petruzalek [aut, cre, cph],
    Mark Adler [cph, ctb],
    Pablo Marcondes Fonseca [cph, ctb]
5 changes: 5 additions & 0 deletions read.dbc.Rcheck/00_pkg_src/read.dbc/NAMESPACE
@@ -0,0 +1,5 @@
# Generated by roxygen2: do not edit by hand

export(dbc2dbf)
export(read.dbc)
useDynLib(read.dbc)
7 changes: 7 additions & 0 deletions read.dbc.Rcheck/00_pkg_src/read.dbc/NEWS.md
@@ -0,0 +1,7 @@
# read.dbc 1.0.7

* Removed broken links
* Improved error handling in blast.c to prevent runtime errors (fixes gcc-UBSAN)
* Update DESCRIPTION with collaborators
* Documentation edits for conciseness
* Overall doc improvements
61 changes: 61 additions & 0 deletions read.dbc.Rcheck/00_pkg_src/read.dbc/R/dbc2dbf.R
@@ -0,0 +1,61 @@
# dbc2dbf.R
# Copyright (C) 2016 Daniela Petruzalek
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published
# by the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

#' Decompress a DBC file
#'
#' This function allows you to decompress a DBC file. When decompressed, it becomes a regular DBF file.
#'
#' @param input.file The name of the DBC file (including extension)
#' @param output.file The output file name (including extension)
#' @return Returns TRUE on success, FALSE otherwise.
#' @details
#' DBC is the extension for compressed DBF files (from the 'XBASE' family of databases).
#' This is a proprietary file format used by the Brazilian government to publish public healthcare data.
#' When decompressed, it becomes a regular DBF file.
#'
#' Please note that this file format is not related to the FoxPro or CANdb DBC file formats.
#' @source
#' The internal C code for \code{dbc2dbf} is based on \code{blast} decompressor and \code{blast-dbf} (see \emph{References}).
#' @keywords dbc dbf
#' @export
#' @useDynLib read.dbc
#' @author Daniela Petruzalek, \email{daniela.petruzalek@gmail.com}
#' @seealso \code{\link{read.dbc}}
#' @examples
#' # Input file name
#' input <- system.file("files/sids.dbc", package = "read.dbc")
#'
#' # Output file name
#' output <- tempfile(fileext = ".dbf")
#'
#' # The call returns TRUE on success
#' if( dbc2dbf(input.file = input, output.file = output) ) {
#' print("File decompressed!")
#' # do things with the file
#' }
#'
#' file.remove(output) # clean up example, don't do in real life :)
#'
#' @references
#' \code{blast} source code in C: \url{https://github.com/madler/zlib/tree/master/contrib/blast}
#' \code{blast-dbf}, DBC to DBF command-line decompression tool: \url{https://github.com/eaglebh/blast-dbf}
#'
dbc2dbf <- function(input.file, output.file) {
    if( !file.exists(input.file) )
        stop("Input file does not exist.")
    out <- .C("dbc2dbf", input = as.character(path.expand(input.file)), output = as.character(path.expand(output.file)))
    file.exists(output.file)
}
68 changes: 68 additions & 0 deletions read.dbc.Rcheck/00_pkg_src/read.dbc/R/read.dbc.R
@@ -0,0 +1,68 @@
# read.dbc.R
# Copyright (C) 2016 Daniela Petruzalek
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published
# by the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

#' Read Data Stored in DBC (Compressed DBF) Files
#'
#' This function allows you to read a DBC (compressed DBF) file into a data frame.
#' @details
#' DBC is the extension for compressed DBF files (from the 'XBASE' family of databases).
#' This is a proprietary file format used by the Brazilian government to publish public healthcare data, and it is not related to the FoxPro or CANdb DBC file formats.
#'
#' The \code{read.dbc} function will decompress the input DBC file into a temporary DBF file and call \code{\link{read.dbf}} from the \code{foreign} package to read it into a data frame.
#'
#' @note
#' DATASUS is the name of the Department of Informatics of the Brazilian Health System (Sistema Único de Saúde - SUS) and is responsible for publishing public healthcare data in Brazil.
#' Besides the DATASUS, the Brazilian National Agency for Supplementary Health (ANS) also uses this file format for its public data.
#'
#' This function was tested using files from both DATASUS and ANS to ensure compliance with the format, and hence ensure its usability by researchers.
#'
#' Neither this project, nor its author, has any association with the Brazilian government.
#' @param file The name of the DBC file (including extension)
#' @param ... Further arguments to be passed to \code{\link{read.dbf}}
#' @return A data.frame of the data from the DBC file.
#' @keywords dbc datasus
#' @export
#' @author Daniela Petruzalek, \email{daniela.petruzalek@gmail.com}
#' @seealso \code{\link{dbc2dbf}}
#' @examples
#' # The 'sids.dbc' file is the compressed version of 'sids.dbf' from the "foreign" package.
#' file <- system.file("files/sids.dbc", package="read.dbc")
#' sids <- read.dbc(file)
#' str(sids)
#' summary(sids)
#'
#' # This is a small subset of U.S. NOAA storm database.
#' file <- system.file("files/storm.dbc", package="read.dbc")
#' storm <- read.dbc(file)
#' head(storm)
#' str(storm)
#'
read.dbc <- function(file, ...) {
    # Output file name
    out <- tempfile(fileext = ".dbf")

    # Decompress the dbc file using the blast library wrapper.
    if( dbc2dbf(file, out) ) {
        # Use read.dbf from the foreign package to read the uncompressed file
        df <- foreign::read.dbf(out, ...)

        # Delete temp file
        file.remove(out)

        # Return data frame
        return(df)
    }
}
100 changes: 100 additions & 0 deletions read.dbc.Rcheck/00_pkg_src/read.dbc/README.md
@@ -0,0 +1,100 @@
<!-- badges: start -->
[![R-CMD-check](https://github.com/danicat/read.dbc/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/danicat/read.dbc/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
# read.dbc

Author: Daniela Petruzalek
e-mail: daniela.petruzalek@gmail.com
License: AGPLv3

## Introduction

`read.dbc` is an R package for importing data from `DBC` (compressed `DBF`) files into data frames. Please note that this is the file format used by DATASUS and it is not related to the Microsoft FoxPro or CANdb DBC file formats.

DATASUS is the name of the Department of Informatics of Brazil's Healthcare System (Sistema Único de Saúde - SUS). They are responsible for publishing Brazilian public healthcare data. Besides DATASUS, the Brazilian National Agency for Supplementary Health (ANS) also uses this file format for its public data.

This code was tested using files from both DATASUS and ANS to ensure compliance with the format, and hence ensure its usability by researchers.

This project is based on the work of [Mark Adler](https://github.com/madler/zlib/tree/master/contrib/blast) (blast) and [Pablo Fonseca](https://github.com/eaglebh/blast-dbf) (blast-dbf).

Neither this project, nor its author, is related in any way to the Brazilian government.

## Changelog

For a complete description of the changes, please check [CHANGELOG.md](/inst/CHANGELOG.md).

## Repository Contents

- `README.md`: this file.
- `CHANGELOG.md`: change history.
- `src/blast.c`: decompression tools for PKWare Data Compression Library (DCL).
- `src/blast.h`: `blast.c` header and usage notes.
- `src/dbc2dbf.c`: the main program to decompress the dbc files to dbf.
- `R/read.dbc.R`: the code for reading `.dbc` files within R.
- `R/dbc2dbf.R`: a helper function to decompress the `.dbc` files, it works as a wrapper to the "blast" code.
- `man/*`: package manuals
- `inst/*`: test and misc files

## Installation

As of June 7, 2016, this package is officially part of [CRAN](https://cran.r-project.org/package=read.dbc) (The Comprehensive R Archive Network). Its current stable version can therefore be installed by running `install.packages`:

    install.packages("read.dbc")

If you want to install the development version of this package, you can still do so using the `devtools` package:

    devtools::install_github("danicat/read.dbc")

## Usage

Reading a DBC file to a data frame:

    # The 'sids.dbc' file is the compressed version of 'sids.dbf' from the "foreign" package.
    sids <- read.dbc(system.file("files/sids.dbc", package="read.dbc"))
    str(sids)
    summary(sids)

    # The following code will download data from the "Declarations of Death" database for
    # the Brazilian state of Parana, year 2013. Source: DATASUS / Brazilian Ministry of Health
    url <- "ftp://ftp.datasus.gov.br/dissemin/publicos/SIM/CID10/DORES/DOPR2013.dbc"
    download.file(url, destfile = "DOPR2013.dbc", mode = "wb")
    dopr <- read.dbc("DOPR2013.dbc")
    head(dopr)
    str(dopr)

Decompressing a DBC file to a DBF:

    # Input file name
    in.f <- system.file("files/sids.dbc", package = "read.dbc")

    # Output file name (the decompressed file is a regular DBF)
    out.f <- tempfile(fileext = ".dbf")

    # The call returns TRUE on success
    if( dbc2dbf(input.file = in.f, output.file = out.f) ) {
        print("File decompressed!")
        file.remove(out.f)
    }

## Contact Info

If you have any questions, please contact me at [daniela.petruzalek@gmail.com](mailto:daniela.petruzalek@gmail.com).

## Developer Information

### Mac OS X

Setup:
- Install Xcode
- Install R: https://cran.r-project.org/bin/macosx/
- Install Rstudio: https://posit.co/download/rstudio-desktop/
- Run `make setup` to install R dependencies
- Run `make check` to verify the package

You can also run `make help` to see a list of available commands.

## Submitting to CRAN

First make sure all the checks are passing by running `make cran`.

Once ready, use `devtools::submit_cran()`. This needs to run from RStudio or the R interpreter itself as the tool doesn't allow non-interactive runs.
48 changes: 48 additions & 0 deletions read.dbc.Rcheck/00_pkg_src/read.dbc/inst/CHANGELOG.md
@@ -0,0 +1,48 @@
## CHANGELOG.md

### Version 1.0.7

* Removed broken links
* Improved error handling in blast.c to prevent runtime errors (fixes gcc-UBSAN)
* Update DESCRIPTION with collaborators
* Documentation edits for conciseness
* Overall doc improvements

### Version 1.0.5
- Fixed BUG that left files open on error (Issue #4)

### Version 1.0.4
- Fixed BUG on the Solaris port
- Small code cleanups

### Version 1.0.3
- Cleanup of the manual - disambiguation of the file format
- This DBC file is not compatible with FoxPro or CANdb
- Added path expansion to handle '~' in file names

### Version 1.0.2
- Preparations for CRAN
- Improved error handling in C code
- Improved examples in documentation
- Removed keep.dbf parameter from read.dbc. (useless?)
- Fixed read.dbc to use tempfiles.

### Version 1.0.1
- Documentation cleanup
- Added test files sids.dbc and storm.dbc
- Separation of code from the command-line decompressor blast-dbf to avoid conditional compilation
- Removed unused files

### Version 1.0.0: Packaged release
- Project was converted into an R package
- Now it can be installed with devtools::install_github("danicat/read.dbc")
- Added documentation
- Minor fixes and code reorganization

### Version 0.1: (Initial Release)

- Fork of the code available on https://github.com/eaglebh/blast-dbf.
- Fixed the code to work with standard input/output redirection.
- Split the core blast code (blast.c) from the dbc2dbf code (dbc2dbf.c).
- Added conditional compilation to shared library (.so) or command line.
- Note: the original test.pk decompression test is broken in this version because it has no header (as opposed to a .dbc file).
Binary file not shown.