Open
Description
To load icd9.txt into a Microsoft SQL Server database, quotation marks need to be used consistently across columns and rows. In addition, each row ends with an extra tab, which causes issues because the file is tab-delimited. Lastly, the byte 0xe4 appears in icd9.txt and also causes issues when loading the file into Microsoft SQL Server.
The following line of bash addresses each of these issues by removing the quotation marks, the tab at the end of each row, and the special characters that cause problems.
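To confirm which of these issues a given copy of the file has, a quick count of affected lines can help. The following is a minimal sketch, not part of the original report: the function name `report_icd9_issues` is hypothetical, and the last check flags any byte outside printable ASCII and whitespace, which is broader than 0xe4 alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name chosen for this sketch): count the lines in a
# file that show each of the three issues described above.
report_icd9_issues() {
  local file=$1
  local tab
  tab=$(printf '\t')
  # grep -c prints the number of matching lines (0 when nothing matches).
  # LC_ALL=C makes grep treat the input as raw bytes.
  printf 'quoted=%s trailing_tab=%s non_ascii=%s\n' \
    "$(LC_ALL=C grep -c '"' "$file")" \
    "$(LC_ALL=C grep -c "${tab}\$" "$file")" \
    "$(LC_ALL=C grep -c '[^[:print:][:space:]]' "$file")"
}

# Usage: report_icd9_issues icd9.txt
```

Running it before and after cleaning the file gives a rough indication of whether the cleanup removed everything it was meant to.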
tr -d '"' < icd9.txt | sed 's/\t$//' | LC_ALL=C sed 's/[\d128-\d255]//g' > icd9.csv
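The command above relies on GNU sed extensions (`\t` and the `\dNNN` decimal escapes), so it may not behave the same way with BSD or macOS sed. As a sketch of a more portable variant, assuming only POSIX `tr` and `sed`, the same three cleanups can be written as below. The function name `clean_icd9` is hypothetical and not part of the original report.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name chosen for this sketch): read the raw file on
# stdin and write the cleaned, still tab-delimited, text to stdout.
clean_icd9() {
  # tr deletes double quotes and every byte in 0x80-0xFF (octal 200-377).
  # sed then strips a single trailing tab from each line; printf supplies
  # a literal tab so the pattern does not depend on sed understanding \t.
  LC_ALL=C tr -d '"\200-\377' | LC_ALL=C sed "s/$(printf '\t')\$//"
}

# Usage (file names taken from the command above):
# clean_icd9 < icd9.txt > icd9.csv
```

Setting `LC_ALL=C` rather than `LANG=C` forces byte-wise processing even when `LC_ALL` or `LC_CTYPE` is already set in the environment.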