
1.2 Standard Computer Tools

David Kahler edited this page Aug 3, 2023 · 2 revisions

This wiki page discusses getting your computer ready for data analysis and modeling.


Sharing and Project Coordination

We use GitHub to share code and track project progress. Register for GitHub and set up two-factor authentication (you can use Duo, which Duquesne already uses for two-factor authentication, so you will not need another app).
https://github.com/

Hardware

I have received some questions about why I make certain decisions about my computer hardware. Hopefully this information helps. Any specific brands mentioned are provided only as examples, not as endorsements of any manufacturer.

Power Supply

All of the desktops and computer peripherals are connected to uninterruptible power supplies (UPS), or batteries. This is for the longevity of the computers, especially the longevity and crash reduction of hard drives, and to avoid the irritation of lost work in the event of a power outage. This does not mean that you can keep working on computers if there is a power outage; instead, you should use that time to prepare your computer for shutdown so that the battery does not run out completely (lead-acid batteries do not like to be completely discharged) and to prepare for additional power abnormalities. Laptops are not connected to UPSs because they have batteries. Needless to say, everything should be connected to a surge protector.

Data Storage

Local Data Storage

The lab computers use some local data storage. This is a combination of external solid-state drives (SSDs) and redundant array of independent disks (RAID) units, which are generally built from hard disk drives (HDDs). Solid-state drives are generally more resilient to day-to-day use, but they are expensive and have inferior data longevity (not generally relevant, but included for completeness). Hard disk drives are vulnerable to more malfunctions because of their moving parts, but they are less expensive, and magnetic storage is far superior for longevity. The RAID system is for backup. A RAID unit contains at least two hard drives of comparable size that continuously back up to one another; therefore, the usable capacity is only half of the total capacity. This is useful for big files that would take a while to download and for large derived files that are extremely valuable and would take an unreasonable amount of time to recreate from raw data. Thunderbolt connections are used whenever possible to increase the speed of file transfer. This is especially important when data on the RAID/HDD system is being actively manipulated.
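The capacity point above can be checked with a little arithmetic. This is a minimal sketch that assumes a simple two-drive mirror, as the description implies; the helper name mirror_usable_tb and the 4 TB drive sizes are made up for illustration.

```shell
# Usable capacity of a two-drive mirror, in whole terabytes.
# A mirror can hold only as much as its smaller drive
# (assumption: a plain two-drive mirrored layout).
mirror_usable_tb() {
  if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Hypothetical example: two 4 TB drives.
raw=$((4 + 4))
usable=$(mirror_usable_tb 4 4)
echo "raw: ${raw} TB, usable: ${usable} TB"
```

With two equal drives, the usable figure is half of the raw total, which is the trade-off made for having a continuous second copy.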

Remote Data Storage

This is also called cloud storage. Duquesne has a subscription to Box, which offers redundant data backup and can be mapped as a drive on any operating system. Access will be slower than with local storage. Box is especially good for large data files that are not being actively used and for any small data files. It is also a good sharing tool.

Software

Getting Your Computer Ready

Getting set up is not as straightforward as simply downloading programs. Programs must be installed in a logical manner so that you can easily reference them. This is particularly important for Git and RStudio and for the libraries used by QGIS. Some care should also be taken with the installation and folder/directory placement of the USGS programs.

Xcode (full install, macOS only)

Apple's Xcode includes many tools that will be used for a variety of tasks. Specifically, the Command Line Tools contain a collection of programs that will assist you in your work; sometimes these are simply called Developer Tools because they are part of that suite of features. First, go to the App Store and update Xcode. This may take two attempts and could take a long time. After you're sure that Xcode is up to date, install the Command Line Tools. This can be done through Xcode or in the terminal by typing xcode-select --install. If you have not done so before, you may need to create a developer account linked to your Apple ID.
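Before running the installer, it can help to check whether the Command Line Tools are already present. The following is a sketch, not an official procedure: clt_status is a hypothetical helper name, and it relies on xcode-select -p, which prints the active developer directory and exits with an error when none is set.

```shell
# Report whether the Command Line Tools appear to be installed.
# clt_status is a hypothetical helper name used for illustration.
clt_status() {
  if [ "$(uname -s)" != "Darwin" ]; then
    # xcode-select exists only on macOS.
    echo "not-macos"
  elif xcode-select -p >/dev/null 2>&1; then
    echo "installed"
  else
    echo "missing"
  fi
}

clt_status
```

If the result is missing, xcode-select --install is the next step.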

Homebrew

Homebrew is a package manager that lets you easily install programs on macOS or Linux. It is especially useful when you want to install or update a small software package that you need in order to do something else. For many of these programs you may wonder why they did not come installed with the operating system. The answer, of course, is that most people don't use computers for programming and data analytics. You can also get GCC (see below) via Homebrew. Get Homebrew installation instructions at: https://brew.sh/.
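Once Homebrew is installed, a quick check like the one below confirms that the brew command is reachable from the terminal. This is only a sketch; brew_status is a hypothetical helper name, and nothing is installed by running it.

```shell
# Report whether Homebrew is available, without installing anything.
# brew_status is a hypothetical helper name used for illustration.
brew_status() {
  if command -v brew >/dev/null 2>&1; then
    # brew --prefix prints the directory Homebrew installs into.
    echo "found: $(brew --prefix)"
  else
    echo "missing"
  fi
}

brew_status

# With Homebrew present, GCC (which includes gfortran) could be installed with:
#   brew install gcc
```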

Git

On macOS, Git comes with the Command Line Tools; on Windows, you can install it directly. This software is needed to take advantage of all that GitHub has to offer.
Download and install Git from https://git-scm.com/
A great tutorial has already been written for Git; visit: https://happygitwithr.com/install-git.html.
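After installing, a short test run can confirm that Git works. The sketch below uses a throwaway repository so that nothing real is touched; the name and email are placeholders, and demo_dir is a hypothetical variable name. For everyday use you would normally set your identity once with git config --global instead.

```shell
# Confirm Git is on the PATH; print a hint if it is not.
if ! command -v git >/dev/null 2>&1; then
  echo "git not found; install it from https://git-scm.com/"
else
  git --version

  # Create a scratch repository in a temporary directory.
  demo_dir="$(mktemp -d)"
  git init --quiet "$demo_dir"

  # Placeholder identity, stored only in this scratch repository.
  git -C "$demo_dir" config user.name "Your Name"
  git -C "$demo_dir" config user.email "you@example.com"

  # Read the setting back to confirm it was saved.
  git -C "$demo_dir" config user.name
fi
```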

Programming and Data Analytics

R/RStudio

R is a programming language that was originally developed around statistical analysis. It has become one of the most widely used tools for scientific data analysis. R is a language and, for this class, it is unrealistic to operate purely from the command line. RStudio is a GUI, or graphical user interface, that makes R far easier to work with. If you have used MATLAB in the past, you will be very comfortable with the RStudio interface.

Fortran/C++ (GCC)

The GNU Compiler Collection (GCC) contains compilers for common programming languages such as Fortran, C, and C++. Older versions of Xcode bundled GCC, but the current Command Line Tools ship Apple's Clang instead (the gcc command there runs Clang) and do not include a Fortran compiler. To get a current GCC, visit the GNU project: https://gcc.gnu.org/.
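To see which compilers are on your PATH and what version each one reports, something like the following could be run. It is a sketch: compiler_report is a hypothetical helper name, and the list of commands checked (gcc, g++, gfortran) is an assumption about which compilers matter for your work.

```shell
# Print one line per compiler: its version banner, or "not found".
# compiler_report is a hypothetical helper name used for illustration.
compiler_report() {
  for tool in gcc g++ gfortran; do
    if command -v "$tool" >/dev/null 2>&1; then
      # Keep only the first line of the version banner.
      echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
    else
      echo "$tool: not found"
    fi
  done
}

compiler_report
```

On macOS, the gcc line typically reports Apple clang rather than GCC, which is a quick way to tell which compiler you actually have.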

Mac/Linux Commands

ps -ax to get a list of all running processes with their process IDs
kill -9 PID, where PID is the process ID of the process to terminate
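The two commands above can be practiced safely on a process you start yourself. In this sketch a background sleep stands in for a stuck program; the 300-second duration is arbitrary. It tries a plain kill first, since kill -9 gives the process no chance to clean up.

```shell
# Start a harmless long-running process in the background to practice on.
sleep 300 &
pid=$!
echo "started sleep with PID $pid"

# Look the process up by PID (ps -ax would list every process instead).
if ps -p "$pid" >/dev/null 2>&1; then
  echo "PID $pid is running"
fi

# Plain kill sends SIGTERM, which asks the process to exit.
kill "$pid"

# Reap the background job; wait returns non-zero for a killed job.
wait "$pid" 2>/dev/null || true

# kill -0 sends no signal; it only tests whether the PID still exists.
if kill -0 "$pid" 2>/dev/null; then
  echo "PID $pid is still alive; kill -9 $pid would force it"
else
  echo "PID $pid has exited"
fi
```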

Shell Scripts Not Running?

List the script's extended attributes to check for a quarantine flag:

xattr -l ./script.sh

Remove quarantine by:

xattr -d com.apple.quarantine ./script.sh
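The check and the removal can be combined so that the flag is deleted only when it is actually present. This is a sketch under assumptions: clear_quarantine is a hypothetical helper name, and it relies on xattr -p exiting with an error when the named attribute is absent.

```shell
# Remove the macOS quarantine flag from a file, if the flag is present.
# clear_quarantine is a hypothetical helper; pass it the path to a script.
clear_quarantine() {
  target=$1
  if [ "$(uname -s)" != "Darwin" ]; then
    echo "skipped: quarantine flags only exist on macOS"
    return 0
  fi
  # xattr -p exits non-zero when the named attribute is absent.
  if xattr -p com.apple.quarantine "$target" >/dev/null 2>&1; then
    xattr -d com.apple.quarantine "$target"
    echo "removed quarantine flag from $target"
  else
    echo "no quarantine flag on $target"
  fi
}

# Demonstrate on a temporary file so no real script is modified.
demo_file="$(mktemp)"
clear_quarantine "$demo_file"
rm -f "$demo_file"
```

For a real script, the call would be clear_quarantine ./script.sh.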

GIS

QGIS

The easiest way to get started with GIS is to use QGIS, which is open source and freely available: https://www.qgis.org/en/site/.

Hydrology Models

HEC-HMS (U.S. Army Corps of Engineers): https://www.hec.usace.army.mil/software/hec-hms/

MODFLOW and related programs (USGS): https://www.usgs.gov/mission-areas/water-resources/science/modflow-and-related-programs?qt-science_center_objects=0#qt-science_center_objects

Chemical Equilibrium Model: Minteq

Minteq is chemical equilibrium software developed by the USEPA; however, it was developed without a graphical user interface (GUI). Luckily, someone has developed a GUI for Minteq and made it freely available. It requires Windows.
https://vminteq.lwr.kth.se/
If you would like to learn more about Minteq, visit the EPA website: https://www.epa.gov/ceam/minteqa2-equilibrium-speciation-model.
