The transformation rules specify how the records in REDCap are transformed into records in your database.
This is a simple example with a single database table. For this example, the project is non-longitudinal and has one form called "Registration".
REDCap Data
| record_id | first_name | last_name | dob | registration_complete |
|---|---|---|---|---|
| 1001 | Anahi | Gislason | 08-27-1973 | 2 |
| 1002 | Marianne | Crona | 06-18-1958 | 2 |
| 1003 | Ryann | Tillman | 08-28-1967 | 2 |
Transformation Rules
TABLE,registration,registration_id,ROOT
FIELD,first_name,string
FIELD,last_name,string
FIELD,dob,date,birthdate
Database Table
registration
| registration_id | record_id | first_name | last_name | birthdate |
|---|---|---|---|---|
| 1 | 1001 | Anahi | Gislason | 1973-08-27 |
| 2 | 1002 | Marianne | Crona | 1958-06-18 |
| 3 | 1003 | Ryann | Tillman | 1967-08-28 |
In this example:
- The database table name is specified as registration
- The database table is specified with a "rows type" of ROOT, because the rows of data have a one-to-one mapping to record IDs, i.e., each study participant has one first name, one last name, and one birthdate.
- The database field registration_id (specified in the TABLE command) is created automatically as an auto-incremented synthetic key
- The database field record_id represents the REDCap record ID, and is created automatically in the database for all tables by REDCap-ETL
- The database fields record_id, first_name and last_name match the REDCap fields.
- The REDCap field dob with type date, was renamed to birthdate in the database
- The birthdate database field has Y-M-D format, which is what REDCap returns (even though the field was defined as having M-D-Y format in REDCap)
- No transformation rule was defined for the REDCap registration_complete field, so it does not show up in the database. You are not required to specify a rule for every field, so you can specify rules for only those fields that you are interested in.
Transformation rules consists of one or more TABLE statements, where each TABLE statement is followed by zero or more FIELD statements that specify what REDCap fields are stored in the table. Each statement needs to be on its own line.
TABLE, <table_name>, <parent_table|primary_key_name>, <rows_type>
FIELD, <field_name>, <field_type>[, <database_field_name>]
...
FIELD, <field_name>, <field_type>[, <database_field_name>]
TABLE, <table_name>, <parent_table|primary_key_name>, <rows_type>
FIELD, <field_name>, <field_type>[, <database_field_name>]
...
A table statement with rows type ROOT that has no fields following it will generate a table that contains only a synthetic primary key field and a record ID field.
The transformation rules language is line-oriented, and each line has a comma-separated value (CSV) syntax. This allows the transformation rules to be edited as a spreadsheet, as long as it is saved in CSV format. Editing this way eliminates the need to enter field separators (commas), and automatically aligns field mappings horizontally, which makes them easier to read.
The following lines are ignored by ETL processing:
- Comment lines - lines where the first (non-space) character is a #
- Blank lines
- Lines with all blank fields (i.e., only commas, or commas and spaces)
Table statements specify the tables that should be generated in your database.
TABLE, <table_name>, <parent_table|primary_key_name>, <rows_type>
Note: if the table is a root table, it has no parent table, and the field after the table name is the name to use for the table's (synthetic) primary key:
TABLE, <table_name>, <primary_key_name>, ROOT
For non-root tables, the field after the table name is the name of its parent table.
TABLE, <table_name>, <parent_table>, <rows_type>
The <rows_type> indicates the following things about the table:
- If the table is a root table or a child table of another table.
- What data rows from REDCap will be stored in the table.
- What identifier/key fields the rows of the database table will contain.
The possible <rows_type> values are shown in the table below:
| Rows Type | Description |
|---|---|
| ROOT | This indicates that the table is a root table (it has no parent table). ROOT table are intended for tables that store REDCap fields that have a one-to-one relationship with the REDCap record ID, for example: first name, last name, birthdate. If REDCap fields that have multiple values per record ID are specified in a root table, a warning message will be generated by REDCap-ETL. |
| EVENTS | If this rows type is specified, only REDCap values that are in rows that are "standard" events (i.e., from non-repeating forms that are in non-repeating events) will be stored in the database table. Since the same form can be in multiple (non-repeating) events, in general, the rows in the database table will have a many-to-one relationship with the record ID. For example, you might have a field "total_cholesterol" in events "Initial Visit" and "Final Visit", so there would be 2 "total_cholesterol" values per record ID. |
| REPEATING_EVENTS | This rows type indicates that only REDCap values from rows that are in repeating events will be stored in the database table. |
| REPEATING_INSTRUMENTS | This rows type indicates that only REDCap values that are in repeating instruments will be stored in the database table. |
| <suffixes> | This is typically used for a table that stores related REDCap fields that have the same prefix, but different suffixes. For example, you might specify suffixes of "1;2;3" for fields "heart_rate1", "heart_rate2" and "heart_rate3" that represent different heart rate measurements of a participant, and would represent a many-to-one relationship of heart rate to the the record ID field. |
|
EVENTS:<suffixes> REPEATING_EVENTS:<suffixes> REPEATING_INSTRUMENTS:<suffixes> |
These are typically used for tables that store related REDCap fields that have the same prefix, but different suffixes, that occur in multiple events in a longitudinal study. For example, you might specify suffixes of "1;2;3" for fields "heart_rate1", "heart_rate2" and "heart_rate3" that represent different heart rate measurements of a participant, and are in events "Initial Visit" and "Final Visit". The events would represent a many-to-one relationship with the record ID, and heart rate field would represent a many-to-one relationship with the event that contained them. |
& operator for rows type
For longitudinal studies, the three rows type EVENTS, REPEATING_INSTRUMENTS and REPEATING_EVENTS can be combined together using the & operator, for example:
TABLE, visits, enrollment, EVENTS & REPEATING_EVENTS
Suffixes
<suffixes> is in the format
<suffix1>; <suffix2>; ...
for example:
1;2;3;4
a;b;c
first; second; third
Identifier/key fields
REDCap-ETL will automatically create various identifier and key fields in each database table:
<primary_key>- a numeric synthetic key (created for all tables)<foreign_key>- a numeric foreign key that references the primary key of the table's parent table. This field is created for all tables with a rows type other thanROOT.<record_id>- the record ID from REDCap (created for all tables)redcap_event_name- the REDCap unique event name for the data record in REDCap. This is only created if the REDCap study is longitudinal, and the table's rows type is one of the following:EVENTS,REPEATING_EVENTS,REPEATING_INSTRUMENTS,EVENTS:<suffixes>redcap_repeat_instrument- the REDCap instrument name for the data record in REDCap. This is only created for tables with the rows typeREPEATING_INSTRUMENTSredcap_repeat_instance- the REDCap instance value for the data record in REDCap. This field is only created for tables with rows typesREPEATING_EVENTSorREPEATING_INSTRUMENTS.redcap_suffix- this field contains the suffix value for the record. This field is only created for tables that have a rows type of<suffixes>orEVENTS:<suffixes>.
Field statements specify the fields that are generated in the tables in your database.
FIELD, <field_name>, <field_type>[, <database_field_name>]
<field_name> is the name of the field in REDCap.
- If
<database_field_name>is not set,<field_name>will also be the name of the field in the database where the extracted data are loaded - If
<database_field_name>is set, then it will be used as the name of the field in the database where the extracted data are loaded
<field_type> can be one of the REDCap-ETL types in the table below that shows
the database types used to store the different REDCap-ETL types.
| REDCap-ETL Type | MySQL Type | PostgreSQL Type | SQL Server Type | SQLite Type | CSV (Spreadsheet) Type |
|---|---|---|---|---|---|
| int | int | int | int | int | number |
| float | float | float | float | float | number |
| char(size) | char(size) | char(size) | char(size) | char(size) | text |
| varchar(size) | varchar(size) | varchar(size) | varchar(size) | varchar(size) | text |
| string | text | text | text | text | text |
| dropdown | int/varchar | int/varchar | int/varchar | int/varchar | number/text |
| radio | int/varchar | int/varchar | int/varchar | int/varchar | number/text |
| date | date | date | date | date | datetime |
| datetime | datetime | timestamptz | datetime | datetime | datetime |
| checkbox | int | int | int | int | number |
NOTES:
- Field types 'dropdown' and 'radio' will have database type int/number if all values specified in REDCap for them are integer values, otherwise they will have database type varchar/text. The size of the varchar will be the minimum size necessary to store the longest dropdown/radio value.
- Field types 'checkbox', 'dropdown', and 'radio' can only be used for their corresponding REDCap field types.
TABLE,FIELD,<rows_type>, and<field_type>are all case sensitive. So, for example,TABLE,FIELD,ROOT, andEVENTSmust be uppercase, andint,float,string,date, andcheckboxmust be lowercase.
In this example, the REDCap project is a longitudinal project with a registration form and a visit form. The visit form is used by 3 events: Visit1, Visit2 and Visit3.
REDCap Data
| record_id | redcap_event_name | first_name | last_name | dob | registration_complete | weight | height | visit_complete |
|---|---|---|---|---|---|---|---|---|
| 1001 | registration_arm_1 | Anahi | Gislason | 8/27/1973 | 2 | |||
| 1001 | visit1_arm_1 | 90 | 1.7 | 2 | ||||
| 1001 | visit2_arm_1 | 91 | 1.7 | 2 | ||||
| 1001 | visit3_arm_1 | 92 | 1.7 | 2 | ||||
| 1002 | registration_arm_1 | Marianne | Crona | 6/18/1958 | 2 | |||
| 1002 | visit1_arm_1 | 88 | 1.8 | 2 | ||||
| 1002 | visit2_arm_1 | 88 | 1.8 | 2 | ||||
| 1002 | visit3_arm_1 | 87 | 1.8 | 2 | ||||
| 1003 | registration_arm_1 | Ryann | Tillman | 8/28/1967 | 2 | |||
| 1003 | visit1_arm_1 | 100 | 1.9 | 2 | ||||
| 1003 | visit2_arm_1 | 102 | 1.9 | 2 | ||||
| 1003 | visit3_arm_1 | 105 | 1.9 | 2 |
Transformation Rules
TABLE,registration,registration_id,ROOT
FIELD,record_id,string
FIELD,first_name,string
FIELD,last_name,string
FIELD,dob,date
TABLE,visit,registration,EVENTS
FIELD,weight,string
FIELD,height,string
Database Tables
registration
| registration_id | record_id | first_name | last_name | birthdate |
|---|---|---|---|---|
| 1 | 1001 | Anahi | Gislason | 1973-08-27 |
| 2 | 1002 | Marianne | Crona | 1958-06-18 |
| 3 | 1003 | Ryann | Tillman | 1967-08-28 |
visits
| visit_id | registration_id | record_id | redcap_event_name | weight | height |
|---|---|---|---|---|---|
| 1 | 1 | 1001 | visit1_arm_1 | 90 | 1.7 |
| 2 | 1 | 1001 | visit2_arm_1 | 91 | 1.7 |
| 3 | 1 | 1001 | visit3_arm_1 | 92 | 1.7 |
| 4 | 2 | 1002 | visit1_arm_1 | 88 | 1.8 |
| 5 | 2 | 1002 | visit2_arm_1 | 88 | 1.8 |
| 6 | 2 | 1002 | visit3_arm_1 | 87 | 1.8 |
| 7 | 3 | 1003 | visit1_arm_1 | 100 | 1.9 |
| 8 | 3 | 1003 | visit2_arm_1 | 102 | 1.9 |
| 9 | 3 | 1003 | visit3_arm_1 | 105 | 1.9 |
For the visits table:
- visit_id is the synthetic primary key automatically generated by REDCap-ETL
- registration_id is the foreign key automatically generated by REDCap-ETL that points to the parent table registration
REDCap Data
| Event | Variable | Record1 | Record2 | Record3 |
|---|---|---|---|---|
| Initial | record | 1 | 2 | 3 |
| Initial | var1 | Joe | Jane | Rob |
| Initial | var2 | Smith | Doe | Smith |
| evA | var3 | 10 | 11 | 12 |
| evA | var4 | 20 | 21 | 22 |
| evA | var5a | 1001 | 1021 | 1031 |
| evA | var6a | 2001 | 2021 | 2031 |
| evA | var5b | 1002 | 1022 | 1032 |
| evA | var6b | 2002 | 2022 | 2032 |
| evA | var7 | 10,000 | 10,001 | 10,002 |
| evA | var8a | red1 | red2 | red3 |
| evA | var8b | green1 | green2 | green3 |
| evB | var3 | 101 | 102 | 103 |
| evB | var4 | 201 | 202 | 203 |
| evB | var5a | 3001 | 3021 | 3031 |
| evB | var6a | 4001 | 4021 | 4031 |
| evB | var5b | 3002 | 3022 | 3032 |
| evB | var6b | 4002 | 4022 | 4032 |
| evB | var7 | 20,000 | 20,001 | 20,002 |
| evB | var8a | blue1 | blue2 | blue3 |
| evB | var8b | yellow1 | yellow2 | yellow3 |
Transformation Rules
TABLE, Main, Main_id, ROOT
FIELD, record, int
FIELD, var1, string
FIELD, var2, string
TABLE, Second, Main, EVENTS
FIELD, var3, int
FIELD, var4, int
TABLE, Third, Main, EVENTS
FIELD, var7, int
TABLE, Fourth, Third, a;b
FIELD, var5, int
FIELD, var6, int
TABLE, Fifth, Main, EVENTS:a;b
FIELD, var8, string
Database Tables
NOTE: This only shows data transformed from REDCap record 1
Main
| record_id | var1 | var2 |
|---|---|---|
| 1 | Joe | Smith |
Second
| second_id | record_id | redcap_event_name | var3 | var4 |
|---|---|---|---|---|
| 1 | 1 | evA | 10 | 20 |
| 2 | 1 | evB | 101 | 201 |
Third
| third_id | record_id | redcap_event_name | var7 |
|---|---|---|---|
| 1 | 1 | evA | 10,000 |
| 2 | 1 | evB | 20,000 |
Fourth
| fourth_id | third_id | redcap_suffix | var5 | var6 |
|---|---|---|---|---|
| 1 | 1 | a | 1001 | 2001 |
| 2 | 1 | b | 1002 | 2002 |
| 3 | 2 | a | 3001 | 4001 |
| 4 | 2 | b | 3002 | 4002 |
Fifth
| fifth_id | record_id | redcap_event_name | redcap_suffix | var8 |
|---|---|---|---|---|
| 1 | 1 | evA | a | red1 |
| 2 | 1 | evA | b | green1 |
| 3 | 1 | evB | a | blue1 |
| 4 | 1 | evB | b | yellow1 |
NOTES:
- In this example, var3/var4 need to be put in one table while var7 needs to be put in a different table, despite the fact that all three variables have the same 1-many relationship with var1/var2.
- This syntax assumes that Events will always be used to define 1-many relationships to the root parent table. Although we might envision a more complex situation in which events are named such that some events are considered children of other events, in practice that has not been done.
- This example does not include a situation in which a child table that uses suffixes has a parent table that also uses suffixes, but the transforming code can handle that situation.