| title | tags | sidebar | permalink | summary | keywords |
|---|---|---|---|---|---|
| Notes on TLCL Ch 20 | | home_sidebar | TLCL_3.5.html | Text processing is one of the most frequent tasks on the command line. The tools introduced in this chapter are great tools and spending some time on these will help you in the long run. | cat, sort, uniq, cut, paste, join, diff, tr, sed |
- MS Stream (UF account needed)
- Dropbox (No account, offers picture-in-picture, but no captions or search)
Don't forget you can use the PDF to copy/paste larger chunks of data. In this case, though, it takes some regex playing to make that work, so I have put a copy of the distros.txt file in /ufrc/bsc4452/share/Class_Files/TLCL_files/.
- p. 273: Applications of Text: Mostly this is pointing out the diversity of text data that we may encounter. Again, text processing is pervasive and important to so much of what we do, so these tools are important!
- p. 274: `cat`: The `cat` program is maybe the most used program on the command line. It displays the contents of a text file on the screen. In the RegEx chapter we mentioned that there is a newline character at the end of each line. `cat` can show you this, as well as tab characters, with the `-A` option.
- p. 276: MS-DOS Text vs. Unix Text: Especially for Windows users this is an important box to read. DOS (and Windows after it) and Linux do not use the same characters to signify the end of a line. Many text editors on Windows default to DOS line breaks. If you then transfer a file with DOS line breaks to Linux, Linux tools see an extra carriage-return character at the end of every line, and this usually breaks things! VSCode can be set to always use Linux (LF) line breaks; its status bar shows which style a file uses. The `cat -A` command featured here is a handy way to check.
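To see what `cat -A` reveals, here is a minimal sketch that builds a tiny file with DOS line breaks, inspects it, and strips the carriage returns with `tr` (one of this chapter's tools). It assumes GNU `cat` (for `-A`); the file names `dos_demo.txt` and `unix_demo.txt` are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Build a two-line file with DOS (CRLF) line endings.
printf 'first line\r\nsecond line\r\n' > "$workdir/dos_demo.txt"

# cat -A marks each carriage return as ^M and each newline as $,
# so every line ends in ^M$.
cat -A "$workdir/dos_demo.txt"

# Delete every carriage return to get Unix (LF) line endings.
tr -d '\r' < "$workdir/dos_demo.txt" > "$workdir/unix_demo.txt"

# Now every line ends in a plain $.
cat -A "$workdir/unix_demo.txt"
```

Note that `tr -d '\r'` removes every carriage return in the file, not only those at the end of a line, which is fine for ordinary text files.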
{% include callout.html content="To convert a file with DOS line breaks to Linux, you can use the dos2unix command, e.g. dos2unix file.txt. There is a file in the Class_Files folder with DOS line breaks:
[magitz@login3 ~]$ cat -A /blue/bsc4452/share/Class_Files/data/DOS_formatted_file.txt
This is an example DOS text file.^M$
^M$
Notice the extra new line characters when viewed with cat -A.^M$
^M$
Creating a file on Windows using DOS line breaks and uploading the file to^M$
Linux is a constant source of pain. When you look at the file, everything^M$
seems normal. But Linux commands will often read the file as one long line.^M$
This breaks many things and causes trouble...^M$
^M$
To convert a DOS file to Linux, use the dos2unix command:^M$
dos2unix file.text^M$
" type="warning"%}
{% include tip.html content="You may notice that the text starts with a basic command like cut -f 3 distros.txt and then adds another transformation, or changes an option, to get a slightly different output. This iterative process, starting simple and refining until you reach the desired result, is an excellent strategy when approaching a problem. A command line like the one on p. 290 (sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt > distros-by-date.txt) is not written fully formed and fully functional from the start; it is the product of testing, modifying, and refining. Do not expect to write such commands from scratch, and do not believe that the author did either!" %}
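The iterative approach can be sketched as a pipeline that grows one stage at a time. This is only an illustration: `sample.txt` is a three-row stand-in typed in here with the same tab-separated layout as distros.txt (name, version, release date), not the class file itself.

```shell
workdir=$(mktemp -d)
sample="$workdir/sample.txt"

# Three tab-separated rows standing in for distros.txt (illustrative values).
printf 'SUSE\t10.2\t12/07/2006\nFedora\t10\t11/25/2008\nUbuntu\t8.04\t04/24/2008\n' > "$sample"

# Step 1: pull out just the date column and check that it looks right.
cut -f 3 "$sample"

# Step 2: keep only the year (characters 7-10 of each date).
cut -f 3 "$sample" | cut -c 7-10

# Step 3: count how many releases fall in each year.
cut -f 3 "$sample" | cut -c 7-10 | sort | uniq -c

# A longer command is reached the same way: sort newest first by
# year, then month, then day, using character offsets within field 3.
sort -k 3.7nbr -k 3.1nbr -k 3.4nbr "$sample"
```

Running each step on its own before adding the next makes it obvious which stage is responsible when the output is not what you expected.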
- p. 290 & 291: `paste` and `join`: `paste` can be useful, but keep in mind that the order of records in the files must be the same! `join`, however, can be used to combine files that have a shared column of data. We will look at joins more when we get to databases. Have a look at this section, but don't worry too much about the details.
- p. 293 - 298: Comparing Text: You can skip the rest of the chapter. The `diff` command can be useful, but there's too much to learn here...
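The difference between `paste` and `join` is easiest to see with two small files whose rows are in different orders. A minimal sketch follows; `names.txt` and `colors.txt` and their contents are invented for illustration.

```shell
workdir=$(mktemp -d)

# Two made-up files sharing an ID in column 1, but in different row order.
printf '1\tapple\n2\tbanana\n3\tcherry\n' > "$workdir/names.txt"
printf '3\tred\n1\tgreen\n2\tyellow\n' > "$workdir/colors.txt"

# paste glues line N of one file to line N of the other, so the
# mismatched order pairs the wrong records (apple ends up next to red).
paste "$workdir/names.txt" "$workdir/colors.txt"

# join matches rows on the shared first column instead; both inputs
# must be sorted on that column first.
sort "$workdir/colors.txt" > "$workdir/colors_sorted.txt"
join "$workdir/names.txt" "$workdir/colors_sorted.txt"
```

Here `names.txt` is already sorted on its first column, so only `colors.txt` needs sorting before the `join`.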
- p. 301: `sed` is super powerful; this is a quick introduction. `sed` lets you do regular-expression find-and-replace operations from the command line. I would suggest looking at this section, but focus on the substitution options, e.g. `s/first/second/g`. The text has a ton of great examples, and maybe one of them speaks to you, but I think it may be overwhelming!
- p. 309: `aspell`: skip this section.
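The `s/first/second/g` substitution mentioned for `sed` above can be tried directly on a line of text piped in from `printf`. This is a small sketch with made-up input; the last command uses capture groups to rearrange a date, in the spirit of the book's examples.

```shell
# Without the g flag, only the first match on each line is replaced.
printf 'first place, first prize\n' | sed 's/first/second/'
# prints: second place, first prize

# With the g flag, every match on the line is replaced.
printf 'first place, first prize\n' | sed 's/first/second/g'
# prints: second place, second prize

# Capture groups \(...\) remember parts of the match; \1, \2, \3 reuse them.
# Here an MM/DD/YYYY date is rewritten as YYYY-MM-DD.
printf '12/07/2006\n' | sed 's/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)/\3-\1-\2/'
# prints: 2006-12-07
```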