We outline the idea behind the communication-avoiding tall-skinny QR (TSQR) factorisation. The steps are as follows:
- Matrix Partitioning: We are given a tall, narrow matrix $W\in\mathbb{R}^{m\times n}$ with $m\gg n$. This matrix is divided into four blocks of rows, to be distributed across four processors, so that $W=[W_0^\intercal\ W_1^\intercal\ W_2^\intercal\ W_3^\intercal]^\intercal$, where each block $W_i$ has $\frac{m}{4}$ rows.
- Local QR Factorisation: Each processor performs a local QR decomposition on its block, $W_i=Q_iR_i$, where $Q_i\in\mathbb{R}^{\frac{m}{4}\times n}$ has orthonormal columns and $R_i\in\mathbb{R}^{n\times n}$ is upper triangular. This step requires no inter-processor communication, so all computation stays local.
- Reduction Step: The upper triangular factors $R_i$ from the local QR decompositions are collected and stacked into a new small matrix $R=[R_0^\intercal\ R_1^\intercal\ R_2^\intercal\ R_3^\intercal]^\intercal\in\mathbb{R}^{4n\times n}$. Another QR factorisation is performed on this reduced matrix, giving $R=Q'R_{\text{final}}$. Here, $Q'\in\mathbb{R}^{4n\times n}$ again has orthonormal columns, and $R_{\text{final}}\in\mathbb{R}^{n\times n}$ is the final upper triangular factor.
- Constructing the Final $Q$: Since $W=\operatorname{diag}(Q_0,Q_1,Q_2,Q_3)\,R$ and $R=Q'R_{\text{final}}$, the final orthogonal factor is $Q=\operatorname{diag}(Q_0,Q_1,Q_2,Q_3)\cdot Q'\in\mathbb{R}^{m\times n}$. In other words, each local $Q_i$ is updated by multiplying it with the corresponding $n\times n$ block of $Q'$. The final decomposition is thus $W=QR_{\text{final}}$.
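Written out in full, the four steps combine into a single chain of equalities. The sketch below places the local factors on a block diagonal, written $\operatorname{diag}(Q_0,Q_1,Q_2,Q_3)\in\mathbb{R}^{m\times 4n}$ (notation introduced here for illustration), so that the dimensions of every product agree:

$$
W=\begin{bmatrix}W_0\\ W_1\\ W_2\\ W_3\end{bmatrix}
=\begin{bmatrix}Q_0R_0\\ Q_1R_1\\ Q_2R_2\\ Q_3R_3\end{bmatrix}
=\operatorname{diag}(Q_0,Q_1,Q_2,Q_3)\begin{bmatrix}R_0\\ R_1\\ R_2\\ R_3\end{bmatrix}
=\underbrace{\operatorname{diag}(Q_0,Q_1,Q_2,Q_3)\,Q'}_{Q}\,R_{\text{final}}.
$$

The columns of $Q$ remain orthonormal because each $Q_i^\intercal Q_i=I_n$ and $Q'^\intercal Q'=I_n$:

$$
Q^\intercal Q=Q'^\intercal\operatorname{diag}(Q_0^\intercal Q_0,\dots,Q_3^\intercal Q_3)\,Q'=Q'^\intercal Q'=I_n.
$$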
The repository is organised to match the requirements of the assignment, with separate directories for the Python implementation and the C implementation.
The C implementation is located in c. This folder contains the C code required for the assignment, addressing the second and third questions. It is organised as follows:
- tsqr.c: Implements the communication-avoiding TSQR factorisation using LAPACK routines along with MPI for parallel processing. This file defines a method which performs the local QR factorisations and the subsequent reduction step as described in the lectures.
- timing.c: Serves as a test driver that runs the communication-avoiding TSQR implementation for various matrix dimensions. It times the execution of the factorisation for different values of $m$ (number of rows) and $n$ (number of columns), and outputs the results to a file named timing.txt.
- plot.py: A Python script that reads the timing data from timing.txt and generates a plot. The plot visualises the scaling behaviour of the TSQR algorithm.
- Makefile: Automates the compilation process for the C files. See the compilation and execution steps below for details on how to compile and run.
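To illustrate the kind of measurement that timing.c is described as performing, here is a minimal Python sketch. It is not the repository's C driver: it runs sequentially with NumPy rather than MPI and LAPACK, and the helper names `tsqr_r` and `time_tsqr`, the matrix sizes, and the printed column layout are all assumptions made for illustration.

```python
import time
import numpy as np

def tsqr_r(W, num_blocks=4):
    """Return only the final R factor: local QRs followed by one reduction QR."""
    # mode="r" skips forming Q, which is enough for a timing experiment.
    local_rs = [np.linalg.qr(B, mode="r") for B in np.array_split(W, num_blocks)]
    return np.linalg.qr(np.vstack(local_rs), mode="r")

def time_tsqr(m, n, repeats=3):
    """Best-of-`repeats` wall-clock time in seconds for one m x n factorisation."""
    W = np.random.default_rng(0).standard_normal((m, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        tsqr_r(W)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical sizes, chosen only for illustration.
results = [(m, n, time_tsqr(m, n)) for n in (4, 8) for m in (1_000, 10_000, 100_000)]
for m, n, seconds in results:
    print(f"{m:>7d} {n:>3d} {seconds:.6f}")
```

Taking the best of a few repeats reduces noise from other processes; the absolute numbers will differ from those of the MPI implementation.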
The Python implementation is located in python. Its structure is as follows:
- tsqr.ipynb: A Jupyter Notebook that implements the communication-avoiding TSQR factorisation. In this version, the input matrix is divided into four blocks (using Python's slicing capabilities) and the QR decomposition of each block is computed sequentially, without any parallel programming.
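The sequential approach described above can be sketched as follows. This is a minimal illustration assuming NumPy, not the notebook's actual code: the function name `tsqr`, the `num_blocks` parameter, and the test matrix are hypothetical, and the sketch assumes the number of rows is divisible by the number of blocks.

```python
import numpy as np

def tsqr(W, num_blocks=4):
    """Sequential sketch of TSQR: local QRs, one reduction QR, then assemble Q."""
    m, n = W.shape
    rows = m // num_blocks
    # Assumption: equal-sized blocks, each with at least n rows so R_i is n x n.
    assert m % num_blocks == 0 and rows >= n, "need equal blocks with >= n rows"

    # Partition the rows with plain slicing: block i holds rows [i*rows, (i+1)*rows).
    blocks = [W[i * rows:(i + 1) * rows, :] for i in range(num_blocks)]

    # Local reduced QR on each block: W_i = Q_i R_i.
    local = [np.linalg.qr(B, mode="reduced") for B in blocks]

    # Reduction: stack the R_i into a (num_blocks * n) x n matrix and factor it.
    R_stack = np.vstack([R_i for _, R_i in local])
    Q_prime, R_final = np.linalg.qr(R_stack, mode="reduced")

    # Q = diag(Q_0, ..., Q_3) Q', applied block by block without forming diag(...).
    Q_blocks = [Q_i @ Q_prime[i * n:(i + 1) * n, :] for i, (Q_i, _) in enumerate(local)]
    return np.vstack(Q_blocks), R_final

# Usage on a hypothetical 400 x 5 random matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((400, 5))
Q, R = tsqr(W)
print("reconstruction error:", np.linalg.norm(Q @ R - W))
print("orthogonality error: ", np.linalg.norm(Q.T @ Q - np.eye(5)))
```

Both printed errors should be close to machine precision. Multiplying each $Q_i$ by its own $n\times n$ slice of $Q'$ avoids building the large block-diagonal matrix explicitly.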
The required Python packages are listed in requirements.txt. Make sure these are installed (e.g., via pip install -r requirements.txt) before running any Python scripts. These are also essential for running plot.py.
- Prior to a Full Compilation and Execution: Navigate to the c folder and run:
      make clean
  This ensures timing.png is deleted, allowing a fresh run to commence.
- Compilation: Once more, ensure you are in the c folder and run:
      make
  This will generate timing and tsqr, two executables for their respective source files.
- Execution and Timing: We first focus on tsqr. To run it, use the following command:
      mpirun -np 4 ./tsqr
  This will print the input along with the output of the tsqr method called within main. Moving on to timing, run it using the following:
      mpirun -np 4 ./timing
  This will generate a file called timing.txt containing the execution times for different matrix sizes. Note that we do not recommend changing the values of $m$ and $n$ here, as they were chosen to produce a clear plot showing the scaling properties of the algorithm.
- Plot Generation: With the timing data available, run the following command:
      python3 plot.py
  This will produce a plot called timing.png showing how the performance varies with different values of $m$ and $n$.
- Cleaning: Once finished, simply run the following:
      make clean
  This will remove all executables along with any generated residual files (e.g., timing.png and timing.txt).
For the Python implementation, ensure you are in the python folder and open the Jupyter Notebook tsqr.ipynb in your own environment (e.g., Jupyter Notebook or JupyterLab).