140 changes: 136 additions & 4 deletions install.md
@@ -47,20 +47,152 @@ docker run --rm -v $PWD:/home/work tschijnmo/drudge:gristmill python3 script.py
can also execute the script directly.


# Downloads and installation (Native)
# Local Installation (Development)

For development, the drudge stack can also be downloaded, compiled, and
installed from source. For most non-developmental users, execution by Docker
is recommended.

## Conda

Conda is _strongly_ recommended for building and running this program locally. Conda is a utility for managing environments, which are like containers that hold a specific set of software installations. Just as a Docker container can maintain exact versions and configurations of dependencies, so can a Conda environment. The install script relies on Conda and will not run without it. To get Conda, go [here](https://www.anaconda.com/docs/getting-started/miniconda/install) and follow the instructions for your operating system. Once it is installed, you may proceed.

## Install Script (Linux-x86_64)

**These instructions work only on Linux x86_64 machines**. If you are using a Mac or a different architecture, skip straight to the manual installation instructions.

An install script is provided at `drudge/install/install.sh`. Run it from the root drudge directory by executing:
```
source install/install.sh
```
You must use `source` because it executes the commands in your current shell (as you) rather than in a subshell, where conda usually breaks. Running this one line should be enough to get you set up. At this point, if you'd like to use VS Code, you can simply run

```
code .
```

to open a VS Code window at the drudge location.

Should anything go wrong, work your way through the manual installation instructions below.
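
The difference between sourcing and executing a script can be seen with a small stand-in. This is only a sketch: the temporary script and the `DEMO_VAR` name are hypothetical and have nothing to do with the real `install.sh`; `.` is the portable spelling of `source`.

```shell
# Hypothetical stand-in for install.sh: it only sets one variable.
demo_script="$(mktemp)"
echo 'DEMO_VAR=set_by_script' > "$demo_script"

# Executed as a child process: the variable lives and dies in the child.
unset DEMO_VAR
sh "$demo_script"
after_exec="${DEMO_VAR:-unset}"

# Sourced into the current shell: the variable is still there afterwards.
. "$demo_script"
after_source="${DEMO_VAR:-unset}"

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
rm -f "$demo_script"
```

The same mechanism is why an environment activated inside an executed script would not carry over to your terminal, while one activated by a sourced script does.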

## Manual Installation

These instructions follow the existing install script directly, but provide a more interactive experience if needed. Before proceeding, enter the root drudge directory, which is the base directory of the GitHub repository.

### Create your Conda Environment
```
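# Create the environment, then install the dependencies listed in the environment file
conda create --name $ENV_NAME python=3.9 -y
conda install --name $ENV_NAME --file install/$ENV_TYPE.txt -y

# Set up the shell for conda, then activate the new environment
conda init
conda activate $ENV_NAME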
```

#### Parameters:
- `$ENV_NAME`: The name of your environment. Replace it with an environment name that does not already exist; check the existing environment names with `conda env list`.
- `$ENV_TYPE.txt`: The desired environment dependencies file. There are two in the `install/` folder, `env_x86.txt` and `env_arm.txt`. To find out which one you need, run `uname -m`, which will return either `x86_64` or `arm64`:
  - `arm64 -> env_arm.txt`
  - `x86_64 -> env_x86.txt`
  - If you get something other than `arm64` or `x86_64` (ARM Linux machines report `aarch64`, for example), feel free to try installing either file anyway and let us know if it doesn't work.
- (UPDATE) Conda has been having issues lately, so if you encounter problems, ensure you're on the latest conda version with `conda update -n base -c defaults conda` and conda install the packages one at a time. Attempting to install multiple packages in a single transaction sometimes breaks conda.
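
The architecture lookup above can be sketched as a small shell helper. The function name `env_file_for_arch` is hypothetical and not part of the repository; it covers only the two `uname -m` values named above and reports failure for anything else.

```shell
# Hypothetical helper: map a `uname -m` value to the matching file in install/.
env_file_for_arch() {
    case "$1" in
        x86_64) echo "env_x86.txt" ;;
        arm64)  echo "env_arm.txt" ;;
        *)      return 1 ;;   # unknown architecture: pick a file by hand
    esac
}

if env_file="$(env_file_for_arch "$(uname -m)")"; then
    echo "Use install/$env_file"
else
    echo "Unrecognized architecture '$(uname -m)'; try either file" >&2
fi
```

The printed file name, minus the `.txt` suffix, is what `$ENV_TYPE` stands for in the commands above.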

### Clone Submodules
```
git submodule update --init --recursive
```
This fetches the GitHub repositories of the necessary dependencies.

### Set Environment Variables and Build
```
python3 setup.py build
python3 setup.py install

export PYTHONPATH=/PATH/TO/drudge/build
export DUMMY_SPARK=1
```

The first two lines compile the C++ sources into CPython extension modules, which can be imported and executed by our Python program. Python cannot import or use C++ files directly, so this step is necessary for a fully functioning drudge.

The next two lines set the necessary environment variables, so that Python knows where to find our imports and uses the local dummy_spark instead of Apache Spark (which isn't quite working on Python 3.9 yet). `PYTHONPATH` should be set to the build directory inside the drudge repository; this directory is created by the `setup.py` `build` and `install` commands in the previous lines.

<!-- ### Copy the built cpython files
```
cp build/lib.linux-x86_64-cpython-39/drudge/wickcore.cpython-39-x86_64-linux-gnu.so drudge/
cp build/lib.linux-x86_64-cpython-39/drudge/canonpy.cpython-39-x86_64-linux-gnu.so drudge/
```

Review comment (Collaborator, on lines +120 to +121): I believe the copying of compiled CPython libraries is not required if `python setup.py install` is successfully executed. @Wholinator can you double check?

These are _**EXAMPLE**_ lines. Yours might look different. What you're looking for is:
1) In the Build Folder inside the root folder
2) In the folder that starts with `lib.`
3) In its `drudge` folder
4) The two files that end with `.so`

Copy both these files into the `drudge/drudge` folder that contains the `canonpy.cpp` and `wickcore.cpp` files.

This step is the reason that the script is impossible to generalize across operating systems. I can see little rhyme or reason for the way these files/folders get named so predicting them in code appears impossible. -->

### Get Dummy Spark
Ensure you're in the base drudge directory, then run:
```
git clone https://github.com/DrudgeCAS/DummyRDD ../dummyRDD/
cp -r ../dummyRDD/dummy_spark .
rm -rf ../dummyRDD/
```

These lines do the following:
1) Clone the repository that contains dummy_spark just outside of the drudge directory
2) Copy the relevant `dummy_spark/` folder into the drudge base directory
3) Delete the now-unnecessary `dummyRDD` repository

### Running It
Drudge should now be completely installed. You can open your desired location in your desired IDE, but you must ensure three things are true before running the code.

1) The Python interpreter used by the IDE must be set to the Python executable inside your conda environment, likely at something like `~/miniconda3/envs/drudge/bin/python`. This tells the IDE which Python executable to use to run all your code.
2) The conda environment must be activated by whatever is running the code. In VS Code there's a terminal which shows the debug or run commands when you click debug or run. This terminal is where the conda environment must be activated. This tells the IDE where to find all the dependencies/packages required for drudge to run.
3) The `PYTHONPATH` environment variable must be set to the `build` folder in your drudge repository. This build folder is created when you run the `setup.py` steps. This tells the IDE where to find all the drudge files that you'll want to import.
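
These conditions can be partly checked from the terminal that will run the code. The following is a minimal sketch, not part of the repository: `check_drudge_shell` is a hypothetical name, and it relies on `CONDA_PREFIX`, the variable conda sets to the active environment's directory. It inspects only the shell it runs in, so it cannot confirm the interpreter selected in the IDE's own settings.

```shell
# Hypothetical pre-flight check for the terminal that will run drudge.
check_drudge_shell() {
    status=0
    if [ -z "${CONDA_PREFIX:-}" ]; then
        # No environment has been activated in this shell.
        echo "no conda environment is active" >&2
        status=1
    else
        # python3 on PATH should resolve inside the active environment.
        case "$(command -v python3 2>/dev/null)" in
            "$CONDA_PREFIX"/*) ;;
            *) echo "python3 does not resolve inside $CONDA_PREFIX" >&2
               status=1 ;;
        esac
    fi
    if [ -z "${PYTHONPATH:-}" ]; then
        echo "PYTHONPATH is not set" >&2
        status=1
    fi
    return "$status"
}

check_drudge_shell || echo "fix the items listed above before running drudge"
```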

## VSCode Specific Instructions
Some steps are required for drudge to function in its current state in VS Code. Treat this as a checklist every time you open the project.

### 1) Correct Interpreter
In the bottom right of VS Code, just to the left of the bell, you should see something like `3.9.20 ('drudge': conda)`. This denotes the currently selected Python interpreter (interpreter = Python executable file). If it does not show up, ensure you have a Python file open. Look through the interpreter list and choose the one that corresponds to the conda environment you made during installation.

### 2) Conda Environment Activated
In the terminal in the bottom portion of the window (you can open it with `ctrl/cmd + ~` if it's closed), with the `TERMINAL` tab selected, you should see your command prompt. At the very left there might be some names in parentheses, like `(drudge) (base)`. You want to ensure that your drudge environment name is among them. If it's not, simply run

```
conda activate drudge
```
with `drudge` replaced by whatever your environment name is.

Review comment (Collaborator, on lines +159 to +160): Same as above.

NOTE: If you're having import or version issues and have multiple environments listed in parentheses, it can help to deactivate all environments by running
```
conda deactivate
```
repeatedly until all environment names are cleared, and then activating your drudge environment again.

### 3) Set Environment Variables
```
export PYTHONPATH=$(pwd)
export DUMMY_SPARK=1
```
You get a new terminal every time:
- You have just opened VS Code
- You click debug for the first time in a session
- You hit the plus sign to the right of the TERMINAL tab

In each new terminal, you'll need to ensure both that the drudge conda environment is activated and that the environment variables are set. To set the `PYTHONPATH` variable correctly, you need to be in the `drudge/` base directory (the one that GitHub clones) when you run the `export` lines.
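
To avoid retyping the exports in every new terminal, they can be wrapped in a small function. This is a sketch under assumptions: the function name `drudge_vars` is hypothetical, and it sets the same two variables as the `export` lines in this section, taking the drudge base directory as an optional argument.

```shell
# Hypothetical helper: set the per-terminal variables for drudge.
drudge_vars() {
    # $1: path to the drudge base directory (defaults to the current directory)
    repo_dir="${1:-$(pwd)}"
    export PYTHONPATH="$repo_dir"
    export DUMMY_SPARK=1
}

# Example: run from the drudge base directory in a fresh terminal.
drudge_vars
echo "PYTHONPATH=$PYTHONPATH DUMMY_SPARK=$DUMMY_SPARK"
```

In a new terminal you would still run `conda activate` with your environment name first, then call the function.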

### You're set.
This should be everything you need to do to get drudge running. If there are problems or you encounter other errors, I recommend you take notes on what the problem was and what you did to fix it.

<!--
## Dependencies

In order to fully take advantage of the latest technology, the drudge/gristmill
stack requires Python 3.6 or later and Apache Spark 2.2 or later. To
compile the binary components, a C++ compiler with good C++14 support is
required. Clang++ later than 3.9 and g++ later than 6.3 are known to work.


-->
## Downloads

All components of the drudge/gristmill stack are hosted on Github. The
@@ -84,7 +216,7 @@ submodules of
wrapping core C++ native modules for Python with ease.

As a result, to clone the repositories, `--recurse-submodules` is recommended.

<!--
## Compilation and installation

By `setuptools`, inside the root directory of the source tree of drudge or
@@ -94,4 +226,4 @@ gristmill, the compilation and installation can simply be
python3 setup.py build
python3 setup.py install
```

-->