From a3640a98a376c59834dc863cc7d6f2a9705b0818 Mon Sep 17 00:00:00 2001 From: Matthew Wholey <15wholeym@gmail.com> Date: Wed, 14 May 2025 15:06:56 -0500 Subject: [PATCH 1/3] Installation and usage instructions updated --- install.md | 136 +++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 132 insertions(+), 4 deletions(-) diff --git a/install.md b/install.md index b04bc57..76bdc9c 100644 --- a/install.md +++ b/install.md @@ -47,12 +47,140 @@ docker run --rm -v $PWD:/home/work tschijnmo/drudge:gristmill python3 script.py can also execute the script directly. -# Downloads and installation (Native) +# Local Installation (Development) For development, the drudge stack can also be downloaded, compiled, and installed from source. For most non-developmental users, execution by Docker is recommended. +## Conda + +Conda is _strongly_ recommended for building and running this program locally. Conda is a utility for managing environments, which are like containers that hold a specific set of software installations. Just like a docker container can maintain exact versions and configurations of dependencies, so can Conda. The install script utilizes conda and will not run without it. To get conda simply go [here](https://www.anaconda.com/docs/getting-started/miniconda/install) and follow the instructions for your operating system. Once it is installed you may proceed + +## Install Script (Linux-x86_64) + +**These instructions function only for Linux x86_64 machines**. If you are using a mac or a different architecture, skip straight to the manual installation instructions. + +There exists an install script at `drudge/install/install.sh`. You must run this script by executing +``` +source install/install.sh +``` +You must use `source` as it executes the commands in the current process (as you) rather than opening a subshell in which conda usually breaks. Simply running this line should be enough to get you into shape. 
At this point if you'd like to utilize vscode you can simply run + +``` +code . +``` + +to open a vscode window to the drudge location. + +Should anything go wrong, work your way through the manual installation instructions below. + +## Manual Installation + +These instructions directly follow the existing install script but provide a more interactive experience if needed. Enter the root drudge directory before proceeding; this is the base directory of the GitHub repository. + +### Create your Conda Environment +``` +conda create --name $ENV_NAME python=3.9 -y +conda install --name $ENV_NAME --file install/$ENV_TYPE.txt -y + +conda init +conda activate $ENV_NAME +``` + +#### Parameters: + - `$ENV_NAME`: This is the name of your environment; replace it with an environment name that does not already exist. Check the existing environment names with `conda env list`. + - `$ENV_TYPE.txt`: This is the desired environment dependencies file. There are two in the `install/` folder, `env_x86.txt` and `env_arm.txt`. To know which one you need, simply run `uname -m`, which will return either `x86_64` or `arm64`: + - `arm64 -> env_arm.txt` + - `x86_64 -> env_x86.txt` + - If you get something other than `arm64` or `x86_64` feel free to try to install either file anyway and let us know if it doesn't work + +### Clone Submodules +``` +git submodule update --init --recursive +``` +This installs necessary dependencies` github repositories + +### Set Environment Variables and Build +``` +export PYTHONPATH=$(pwd) +export DUMMY_SPARK=1 + +python3 setup.py build +python3 setup.py install +``` +The first two lines set necessary environment variables, so python knows where to find our python imports, and to utilize local dummy_spark instead of apache spark (which isn't quite working on python3.9 yet). + +The next two lines build the c++ files into cpython files which can be imported and executed by our python program. 
By default python cannot import or utilize c++ files, this step is necessary for a fully functioning drudge + +### Copy the built cpython files +``` +cp build/lib.linux-x86_64-cpython-39/drudge/wickcore.cpython-39-x86_64-linux-gnu.so drudge/ +cp build/lib.linux-x86_64-cpython-39/drudge/canonpy.cpython-39-x86_64-linux-gnu.so drudge/ +``` +These are _**EXAMPLE**_ lines. Yours might look different. What you're looking for is: + 1) In the Build Folder inside the root folder + 2) In the folder that starts with `lib.` + 3) In its `drudge` folder + 4) The two files that end with `.so` + +Copy both these files into the `drudge/drudge` folder that contains the `canonpy.cpp` and `wickcore.cpp` files. + +This step is the reason that the script is impossible to generalize across operating systems. I can see little rhyme or reason for the way these files/folders get named so predicting them in code appears impossible. + +### Get Dummy Spark +Ensure you're in the base drudge directory and +``` +git clone https://github.com/DrudgeCAS/DummyRDD ../dummyRDD/ +cp -r ../dummyRDD/dummy_spark . +rm -rf ../dummyRDD/ +``` + +These lines: + 1) Clone the repository that contains dummy_spark just outside of the drudge directory + 2) Copies the relevant `dummy_spark/` folder into the drudge base directory + 3) Deletes the now unnecessary `dummyRDD` repository + +### That's it! + It should be completely installed now. If you now run `code .` from the drudge base directory you'll open a vscode window with the conda environment activated, the environment variables set, and the correct interpreter selected. + + If you should close this vscode window and want to open it again you'll need to do a couple steps + +## Utilizing VSCode +There are some steps required for drudge to function in its current state in vscode. You should treat this as a checklist every time you open the project. 
+ +### 1) Correct Interpreter +In the bottom right of vscode, just to the left of the bell you should see something like `3.9.20 ('drudge': conda)`. This denotes the currently enabled python interpreter (interpreter=python executable file). If it does not show up, ensure you have a python file open. You want to look through this interpreter list and choose the one that corresponds to the conda environment you made during installation. + +### 2) Conda Environment Activated +In the terminal in the bottom portion of the window (you can open with `ctrl/cmd + ~` if it's closed) with the `TERMINAL` tab selected you should see your command prompt. At the very left there might be some things in parentheses like `(drudge) (base)`. You want to ensure that your drudge environment name is here. If it's not, simply run +``` +conda activate drudge +``` +with drudge replaced with whatever your environment name is. + +NOTE: If you're having import or version issues and have multiple environments listed in parentheses, it can be helpful to deactivate all environments by running +``` +conda deactivate +``` +repeatedly until all environment names are cleared, and then activating your drudge environment again. + +### 3) Set Environment Variables +``` +export PYTHONPATH=$(pwd) +export DUMMY_SPARK=1 +``` +Every time you open a new terminal: + - You just opened vscode + - You clicked debug for the first time this session + - You hit the plus sign to the right of the TERMINAL tab + +you'll need to ensure both that the drudge conda environment is activated and that the environment variables are set. To set the `PYTHONPATH` variable correctly you need to ensure you are in the `drudge/` base directory (the one that github clones). + +### You're set. +This should be everything you need to do to get drudge running. If there are problems or you encounter other errors, I recommend you take notes on what the problem was and what you did to fix it. 
+ + ## Downloads All components of the drudge/gristmill stack are hosted on Github. The @@ -84,7 +212,7 @@ submodules of wrapping core C++ native modules for Python with ease. As a result, to clone the repositories, `--recurse-submodules` is recommended. - + From 4152c91761f24c3f3dc58fc65007b6c27a94537f Mon Sep 17 00:00:00 2001 From: Matthew Wholey <15wholeym@gmail.com> Date: Thu, 5 Jun 2025 19:28:17 -0500 Subject: [PATCH 2/3] Updated documentation to reflect new build process, replaced requirement to copy compiled shared object files with updated pythonpath requirement, removed language assuming vscode usage --- install.md | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/install.md b/install.md index 76bdc9c..ccb8a1f 100644 --- a/install.md +++ b/install.md @@ -94,6 +94,7 @@ conda activate $ENV_NAME - `arm64 -> env_arm.txt` - `x86_64 -> env_x86.txt` - If you get something other than `arm64` or `x86_64` feel free to try to install either file anyway and let us know if it doesn't work + - (UPDATE) Conda has been having issues lately so if you encounter problems ensure you're on the latest conda version with `conda update -n base -c defaults conda` and conda install the packages one at a time. Sometimes attempting to install multiple packages in a single transaction breaks conda. ### Clone Submodules ``` @@ -103,17 +104,18 @@ This installs necessary dependencies` github repositories ### Set Environment Variables and Build ``` -export PYTHONPATH=$(pwd) -export DUMMY_SPARK=1 - python3 setup.py build python3 setup.py install + +export PYTHONPATH=/PATH/TO/drudge/build +export DUMMY_SPARK=1 ``` -The first two lines set necessary environment variables, so python knows where to find our python imports, and to utilize local dummy_spark instead of apache spark (which isn't quite working on python3.9 yet). -The next two lines build the c++ files into cpython files which can be imported and executed by our python program. 
By default python cannot import or utilize c++ files, this step is necessary for a fully functioning drudge +The first two lines compile the C++ sources into CPython extension modules, which can be imported and executed by our Python program. By default Python cannot import or use C++ files, so this step is necessary for a fully functioning drudge. + +The next two lines set the necessary environment variables so that Python knows where to find our imports and uses the local dummy_spark instead of Apache Spark (which isn't quite working on Python 3.9 yet). The `PYTHONPATH` should be set to the build directory inside the drudge repository; this build directory is created by the `setup.py` `build` and `install` commands in the previous lines. -### Copy the built cpython files + ### Get Dummy Spark Ensure you're in the base drudge directory and @@ -136,17 +138,19 @@ cp -r ../dummyRDD/dummy_spark . rm -rf ../dummyRDD/ ``` -These lines: +These code lines do the following: 1) Clone the repository that contains dummy_spark just outside of the drudge directory 2) Copies the relevant `dummy_spark/` folder into the drudge base directory 3) Deletes the now unnecessary `dummyRDD` repository ### That's it! - It should be completely installed now. If you now run `code .` from the drudge base directory you'll open a vscode window with the conda environment activated, the environment variables set, and the correct interpreter selected. + It should be completely installed now. You can open your desired location in your desired IDE but you must ensure 3 things are true before running the code. - If you should close this vscode window and want to open it again you'll need to do a couple steps + 1) The python interpreter used by the IDE is set to the python executable inside your conda environment. Likely at something like `~/miniconda3/envs/drudge/bin/python`. This tells the IDE which python executable to use to run all your code. 
+ 2) The conda environment must be activated by whatever is running the code. In VSCode there's a terminal which shows the debug or run commands when you click debug or run. This terminal is where the conda environment must be activated. This tells the IDE where to find all the dependencies/packages required for drudge to run + 3) PYTHONPATH environment variable must be set to the `build` folder in your drudge repository. This build folder is created when you run the `setup.py` steps. This tells the IDE where to find all the drudge files that you'll want to import. -## Utilizing VSCode +## VSCode Specific Instructions There are some steps required for drudge to function in its current state in vscode. You should treat this as a checklist every time you open the project. ### 1) Correct Interpreter From c59d0082a4748f133d15be21d3013593cf86d608 Mon Sep 17 00:00:00 2001 From: Matthew Wholey <15wholeym@gmail.com> Date: Thu, 5 Jun 2025 19:29:00 -0500 Subject: [PATCH 3/3] Slight language change --- install.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/install.md b/install.md index ccb8a1f..c3c1b7b 100644 --- a/install.md +++ b/install.md @@ -143,10 +143,10 @@ These code lines do the following: 2) Copies the relevant `dummy_spark/` folder into the drudge base directory 3) Deletes the now unnecessary `dummyRDD` repository -### That's it! +### Running It It should be completely installed now. You can open your desired location in your desired IDE but you must ensure 3 things are true before running the code. - 1) The python interpreter used by the IDE is set to the python executable inside your conda environment. Likely at something like `~/miniconda3/envs/drudge/bin/python`. This tells the IDE which python executable to use to run all your code. + 1) The python interpreter used by the IDE must be set to the python executable inside your conda environment, likely at something like `~/miniconda3/envs/drudge/bin/python`. 
This tells the IDE which python executable to use to run all your code. 2) The conda environment must be activated by whatever is running the code. In VSCode there's a terminal which shows the debug or run commands when you click debug or run. This terminal is where the conda environment must be activated. This tells the IDE where to find all the dependencies/packages required for drudge to run 3) PYTHONPATH environment variable must be set to the `build` folder in your drudge repository. This build folder is created when you run the `setup.py` steps. This tells the IDE where to find all the drudge files that you'll want to import.
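
Reviewer note on the "Running It" checklist: the three conditions could be checked from inside the interpreter before any drudge import is attempted. The following is only a minimal sketch and is not part of this patch or of the drudge repository. The function name `preflight`, the environment name `drudge`, and the `build` location are illustrative assumptions. It relies on the `CONDA_PREFIX` and `CONDA_DEFAULT_ENV` variables that `conda activate` sets.

```python
import os
import sys
from pathlib import Path


def preflight(env_name, build_dir, environ=None, executable=None):
    """Return a list of problems; an empty list means all three checks passed.

    `preflight` is a hypothetical helper, not something drudge ships.
    """
    environ = os.environ if environ is None else environ
    executable = sys.executable if executable is None else executable
    problems = []

    # 1) The interpreter should live inside the active conda environment.
    prefix = environ.get("CONDA_PREFIX", "")
    if not prefix or Path(prefix) not in Path(executable).parents:
        problems.append(f"interpreter {executable} is not inside a conda env")

    # 2) The expected conda environment should be activated.
    if environ.get("CONDA_DEFAULT_ENV") != env_name:
        problems.append(f"conda environment '{env_name}' is not activated")

    # 3) PYTHONPATH should contain the build directory.
    entries = [p for p in environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    if not any(Path(p) == Path(build_dir) for p in entries):
        problems.append(f"PYTHONPATH does not include {build_dir}")

    return problems


if __name__ == "__main__":
    # Assumes the script is run from the drudge base directory.
    for problem in preflight("drudge", Path.cwd() / "build"):
        print("WARNING:", problem)
```

Running it from the terminal that the IDE uses for run or debug would show which of the three conditions is not met in that particular terminal session.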