<!DOCTYPE html>
<html lang="en" data-content_root="./">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>3. QR factorisation — Computational linear algebra course 2023.0 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="_static/fenics.css?v=16c5e00f" />
<link rel="stylesheet" type="text/css" href="_static/proof.css" />
<script src="_static/documentation_options.js?v=f1ab3ab9"></script>
<script src="_static/doctools.js?v=9a2dae69"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/proof.js"></script>
<script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@2/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="4. Analysing algorithms" href="L3_analysing_algorithms.html" />
<link rel="prev" title="2. Linear algebra preliminaries" href="L1_preliminaries.html" />
<!--[if lte IE 6]>
<link rel="stylesheet" href="_static/ie6.css" type="text/css" media="screen" charset="utf-8" />
<![endif]-->
<link rel="stylesheet" href="_static/featured.css">
<link rel="shortcut icon" href="_static/icon.ico" />
</head><body>
<div class="wrapper">
<a href="index.html"><img src="_static/banner.png" width="900px" alt="Project Banner" /></a>
<div id="access">
<div class="menu">
<ul>
<li class="page_item"><a href="https://github.com/Computational-Linear-Algebra-Course/computational-linear-algebra-course" title="GitHub">GitHub</a></li>
</ul>
</div><!-- .menu -->
</div><!-- #access -->
</div><!-- #wrapper -->
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<section id="qr-factorisation">
<h1><span class="section-number">3. </span>QR factorisation<a class="headerlink" href="#qr-factorisation" title="Link to this heading">¶</a></h1>
<p>A common theme in computational linear algebra is transformations
of matrices and algorithms to implement them. A transformation is
only useful if it can be computed efficiently and sufficiently
free of pollution from truncation errors (either due to finishing
an iterative algorithm early, or due to round-off errors). A particularly
powerful and insightful transformation is the QR factorisation.
In this section we will introduce the QR factorisation and some
good and bad algorithms to compute it.</p>
<section id="what-is-the-qr-factorisation">
<h2><span class="section-number">3.1. </span>What is the QR factorisation?<a class="headerlink" href="#what-is-the-qr-factorisation" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450191857" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>We start with another definition.</p>
<div class="proof proof-type-definition" id="id1">
<div class="proof-title">
<span class="proof-type">Definition 3.1</span>
<span class="proof-title-name">(Upper triangular matrix)</span>
</div><div class="proof-content">
<p>An <span class="math notranslate nohighlight">\(m\times n\)</span> upper triangular matrix <span class="math notranslate nohighlight">\(R\)</span> has coefficients satisfying
<span class="math notranslate nohighlight">\(r_{ij}=0\)</span> when <span class="math notranslate nohighlight">\(i > j\)</span>.</p>
<p>It is called upper triangular because the nonzero entries form a triangle
on and above the main diagonal of <span class="math notranslate nohighlight">\(R\)</span>.</p>
</div></div><p>Now we can describe the QR factorisation.</p>
<div class="proof proof-type-definition" id="id2">
<div class="proof-title">
<span class="proof-type">Definition 3.2</span>
<span class="proof-title-name">(QR factorisation)</span>
</div><div class="proof-content">
<p>A QR factorisation of an <span class="math notranslate nohighlight">\(m\times n\)</span> matrix <span class="math notranslate nohighlight">\(A\)</span> consists of an <span class="math notranslate nohighlight">\(m\times m\)</span>
unitary matrix <span class="math notranslate nohighlight">\(Q\)</span> and an <span class="math notranslate nohighlight">\(m\times n\)</span> upper triangular matrix <span class="math notranslate nohighlight">\(R\)</span> such
that <span class="math notranslate nohighlight">\(A=QR\)</span>.</p>
</div></div><p>The QR factorisation is a key tool in analysis of datasets, and
polynomial fitting. It is also at the core of one of the most widely
used algorithms for finding eigenvalues of matrices. We shall discuss
all of this later during this course.</p>
<p>When <span class="math notranslate nohighlight">\(m > n\)</span>, <span class="math notranslate nohighlight">\(R\)</span> must have all zero rows after the <span class="math notranslate nohighlight">\(n\)</span> th row. Hence,
it makes sense to only work with the top <span class="math notranslate nohighlight">\(n\times n\)</span> block of <span class="math notranslate nohighlight">\(R\)</span>
consisting of the first <span class="math notranslate nohighlight">\(n\)</span> rows, which we call <span class="math notranslate nohighlight">\(\hat{R}\)</span>. Similarly,
in the matrix-matrix product <span class="math notranslate nohighlight">\(QR\)</span>, all columns of <span class="math notranslate nohighlight">\(Q\)</span> beyond the <span class="math notranslate nohighlight">\(n\)</span> th
column get multiplied by those zero rows in <span class="math notranslate nohighlight">\(R\)</span>, so it makes sense to
only work with the first <span class="math notranslate nohighlight">\(n\)</span> columns of <span class="math notranslate nohighlight">\(Q\)</span>, which we call <span class="math notranslate nohighlight">\(\hat{Q}\)</span>.
We then have the reduced QR factorisation, <span class="math notranslate nohighlight">\(\hat{Q}\hat{R}\)</span>.</p>
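<p>The full and reduced factorisations can be compared directly with the built-in
<code class="docutils literal notranslate"><span class="pre">numpy.linalg.qr()</span></code> routine, whose <code class="docutils literal notranslate"><span class="pre">mode</span></code> argument selects between them. A small sketch for a tall matrix:</p>

```python
import numpy as np

# A tall matrix: m > n, so the reduced factorisation is smaller.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# mode='reduced' (the default) returns Qhat (5x3) and Rhat (3x3).
Qhat, Rhat = np.linalg.qr(A, mode='reduced')
print(Qhat.shape, Rhat.shape)  # (5, 3) (3, 3)

# mode='complete' returns the full 5x5 Q and the 5x3 R,
# whose last two rows are zero.
Q, R = np.linalg.qr(A, mode='complete')
print(Q.shape, R.shape)        # (5, 5) (5, 3)

# Both reproduce A.
assert np.allclose(Qhat @ Rhat, A)
assert np.allclose(Q @ R, A)
```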
<div class="proof proof-type-exercise" id="id3">
<div class="proof-title">
<span class="proof-type">Exercise 3.3</span>
</div><div class="proof-content">
<p>The <a class="reference internal" href="cla_utils.html#cla_utils.exercises2.orthog_space" title="cla_utils.exercises2.orthog_space"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises2.orthog_space()</span></code></a> function has been
left unimplemented. Given a set of vectors <span class="math notranslate nohighlight">\(v_1,v_2,\ldots,v_n\)</span>
that span the subspace <span class="math notranslate nohighlight">\(U \subset \mathbb{C}^m\)</span>, the function
should find an orthonormal basis for the orthogonal complement
<span class="math notranslate nohighlight">\(U^{\perp}\)</span> given by</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[U^{\perp} = \{x \in \mathbb{C}^m: x^*v = 0, \, \forall v \in U\}.\]</div>
</div></blockquote>
<p>It is expected that it will only compute this up to a tolerance.
You should make use of the built in QR factorisation routine
<a class="reference external" href="https://numpy.org/doc/stable/reference/generated/numpy.linalg.qr.html#numpy.linalg.qr" title="(in NumPy v2.2)"><code class="xref py py-func docutils literal notranslate"><span class="pre">numpy.linalg.qr()</span></code></a>. The test script <code class="docutils literal notranslate"><span class="pre">test_exercises2.py</span></code> in
the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test this function.</p>
</div></div><p>In the rest of this section we will examine some algorithms for computing
the QR factorisation, before discussing the application to least squares
problems. We will start with a bad algorithm, before moving on to some
better ones.</p>
</section>
<section id="qr-factorisation-by-classical-gram-schmidt-algorithm">
<h2><span class="section-number">3.2. </span>QR factorisation by classical Gram-Schmidt algorithm<a class="headerlink" href="#qr-factorisation-by-classical-gram-schmidt-algorithm" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450192200" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>The classical Gram-Schmidt algorithm for QR factorisation is motivated
by the column space interpretation of the matrix-matrix multiplication
<span class="math notranslate nohighlight">\(A = QR\)</span>, namely that the jth column <span class="math notranslate nohighlight">\(a_j\)</span> of <span class="math notranslate nohighlight">\(A\)</span> is a linear
combination of the orthonormal columns of <span class="math notranslate nohighlight">\(Q\)</span>, with the coefficients
given by the jth column <span class="math notranslate nohighlight">\(r_j\)</span> of <span class="math notranslate nohighlight">\(R\)</span>.</p>
<p>The first column of <span class="math notranslate nohighlight">\(R\)</span> only has a non-zero entry in the first row, so
the first column of <span class="math notranslate nohighlight">\(Q\)</span> must be proportional to the first column of <span class="math notranslate nohighlight">\(A\)</span>, but normalised
(i.e. rescaled to have length 1). The scaling factor is the entry in the first row
of the first column of <span class="math notranslate nohighlight">\(R\)</span>. The second column of <span class="math notranslate nohighlight">\(R\)</span> only has non-zero
entries in the first two rows, so the second column of <span class="math notranslate nohighlight">\(A\)</span> must be
writeable as a linear combination of the first two columns of
<span class="math notranslate nohighlight">\(Q\)</span>. Hence, the second column of <span class="math notranslate nohighlight">\(Q\)</span> must be the second column of <span class="math notranslate nohighlight">\(A\)</span>
with the first column of <span class="math notranslate nohighlight">\(Q\)</span> projected out, and then normalised. The entry in the
first row of the second column of <span class="math notranslate nohighlight">\(R\)</span> is then the coefficient for this
projection, and the entry in the second row is the normalisation scaling
factor. The third column of <span class="math notranslate nohighlight">\(Q\)</span> is then the third column of <span class="math notranslate nohighlight">\(A\)</span> with the
first two columns of <span class="math notranslate nohighlight">\(Q\)</span> projected out, and so on.</p>
<p>Hence, finding a QR factorisation is equivalent to finding an
orthonormal spanning set for the columns of <span class="math notranslate nohighlight">\(A\)</span>, where the span of the
first <span class="math notranslate nohighlight">\(j\)</span> elements of the spanning set and of the first <span class="math notranslate nohighlight">\(j\)</span> columns of
<span class="math notranslate nohighlight">\(A\)</span> is the same, for <span class="math notranslate nohighlight">\(j=1,\ldots, n\)</span>.</p>
<p>Hence we have to find <span class="math notranslate nohighlight">\(R\)</span> coefficients such that</p>
<div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}q_1 = \frac{a_1}{r_{11}},\\q_2 = \frac{a_2-r_{12}q_1}{r_{22}}\\\vdots\\q_n = \frac{a_n - \sum_{i=1}^{n-1}r_{in}q_i}{r_{nn}},\end{aligned}\end{align} \]</div>
<p>with <span class="math notranslate nohighlight">\((q_1,q_2,\ldots,q_n)\)</span> an orthonormal set. The non-diagonal
entries of <span class="math notranslate nohighlight">\(R\)</span> are found by inner products, i.e.,</p>
<div class="math notranslate nohighlight">
\[r_{ij} = q_i^*a_j, \, i < j,\]</div>
<p>and the diagonal entries are chosen so that <span class="math notranslate nohighlight">\(\|q_i\|=1\)</span>, for
<span class="math notranslate nohighlight">\(i=1,2,\ldots,n\)</span>, i.e.</p>
<div class="math notranslate nohighlight">
\[|r_{jj}| = \left\| a_j - \sum_{i=1}^{j-1} r_{ij} q_i \right\|.\]</div>
<p>Note that this absolute value does leave a degree of nonuniqueness
in the definition of <span class="math notranslate nohighlight">\(R\)</span>. It is standard to choose the diagonal entries
to be real and non-negative.</p>
<p>We now present the classical Gram-Schmidt algorithm as pseudo-code.</p>
<ul class="simple">
<li><p>FOR <span class="math notranslate nohighlight">\(j = 1\)</span> TO <span class="math notranslate nohighlight">\(n\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(v_j \gets a_j\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(i = 1\)</span> TO <span class="math notranslate nohighlight">\(j-1\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(r_{ij} \gets q_i^*a_j\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(i = 1\)</span> TO <span class="math notranslate nohighlight">\(j-1\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(v_j \gets v_j - r_{ij}q_i\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
<li><p><span class="math notranslate nohighlight">\(r_{jj} \gets \|v_j\|_2\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(q_j \gets v_j/r_{jj}\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
<p>(Remember that Python doesn’t have END FOR statements, but instead
uses indentation to terminate code blocks. We’ll write END statements
for code blocks in pseudo-code in these notes.)</p>
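<p>The pseudo-code above can be sketched in NumPy as follows. This is a standalone
illustration, not the in-place <code class="docutils literal notranslate"><span class="pre">cla_utils</span></code> interface: it returns new arrays, and it collapses the two inner FOR loops into slice operations so that only the outer loop over <span class="math notranslate nohighlight">\(j\)</span> remains.</p>

```python
import numpy as np

def gs_classical(A):
    """Classical Gram-Schmidt QR of an m x n matrix A (a sketch,
    not the cla_utils interface). Returns Qhat (m x n), R (n x n)."""
    A = np.asarray(A, dtype=complex)
    m, n = A.shape
    Q = np.zeros((m, n), dtype=complex)
    R = np.zeros((n, n), dtype=complex)
    for j in range(n):
        v = A[:, j].copy()
        # Classical GS: all inner products use the original column a_j.
        # One slice operation replaces the first inner FOR loop.
        R[:j, j] = Q[:, :j].conj().T @ A[:, j]
        # One slice operation replaces the second inner FOR loop.
        v -= Q[:, :j] @ R[:j, j]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R
```

For a random matrix, <code class="docutils literal notranslate"><span class="pre">gs_classical</span></code> should satisfy <span class="math notranslate nohighlight">\(A = QR\)</span> with <span class="math notranslate nohighlight">\(Q^*Q \approx I\)</span> in exact arithmetic; its loss of orthogonality for ill-conditioned matrices is the instability discussed below.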
<div class="proof proof-type-exercise" id="id4">
<div class="proof-title">
<span class="proof-type">Exercise 3.4</span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The <a class="reference internal" href="cla_utils.html#cla_utils.exercises2.GS_classical" title="cla_utils.exercises2.GS_classical"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises2.GS_classical()</span></code></a> function
has been left unimplemented. It should implement the classical
Gram-Schmidt algorithm above, using Numpy slice notation so that
only one Python for loop is used. The function should work “in
place” by changing the values in <span class="math notranslate nohighlight">\(A\)</span>, without introducing
additional intermediate arrays (you will need to create a new array
to store <span class="math notranslate nohighlight">\(R\)</span>). The test script <code class="docutils literal notranslate"><span class="pre">test_exercises2.py</span></code> in the
<code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test this function.</p>
</div></div><div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>The <span class="math notranslate nohighlight">\((\ddagger)\)</span> symbol in an exercise indicates that the code for
that exercise is in scope for use in the coursework. When the code
is used in the coursework, we will grade the code quality, for
appropriate use of Numpy slice operations, efficient use of array
memory, loop minimisation, and avoiding computation inside loops
that could be done beforehand.</p>
</div>
</section>
<section id="projector-interpretation-of-gram-schmidt">
<h2><span class="section-number">3.3. </span>Projector interpretation of Gram-Schmidt<a class="headerlink" href="#projector-interpretation-of-gram-schmidt" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450192723" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>At each step of the Gram-Schmidt algorithm, a projector is applied to
a column of <span class="math notranslate nohighlight">\(A\)</span>. We have</p>
<div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}q_1 = \frac{P_1a_1}{\|P_1a_1\|},\\q_2 = \frac{P_2a_2}{\|P_2a_2\|},\\\vdots\\q_n = \frac{P_na_n}{\|P_na_n\|},\end{aligned}\end{align} \]</div>
<p>where <span class="math notranslate nohighlight">\(P_j\)</span> are orthogonal projectors that project out the first <span class="math notranslate nohighlight">\(j-1\)</span>
columns <span class="math notranslate nohighlight">\((q_1,\ldots,q_{j-1})\)</span> (<span class="math notranslate nohighlight">\(P_1\)</span> is the identity as this set is
empty when <span class="math notranslate nohighlight">\(j=1\)</span>). The orthogonal projector onto the first <span class="math notranslate nohighlight">\(j-1\)</span> columns
is <span class="math notranslate nohighlight">\(\hat{Q}_{j-1}\hat{Q}_{j-1}^*\)</span>, where</p>
<div class="math notranslate nohighlight">
\[\hat{Q}_{j-1} =
\begin{pmatrix} q_1 & q_2 & \ldots & q_{j-1} \end{pmatrix}.\]</div>
<p>Hence, <span class="math notranslate nohighlight">\(P_j\)</span> is the complementary projector, <span class="math notranslate nohighlight">\(P_j=I -
\hat{Q}_{j-1}\hat{Q}_{j-1}^*\)</span>.</p>
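<p>The complementary projector can be checked numerically. The sketch below builds
two orthonormal columns (using NumPy's QR routine purely for convenience), forms
<span class="math notranslate nohighlight">\(P = I - \hat{Q}\hat{Q}^*\)</span>, and verifies the two defining properties: <span class="math notranslate nohighlight">\(Px\)</span> is orthogonal to the columns of <span class="math notranslate nohighlight">\(\hat{Q}\)</span>, and <span class="math notranslate nohighlight">\(P^2 = P\)</span>.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6
# Two orthonormal columns q_1, q_2, built here with NumPy's QR.
Qhat, _ = np.linalg.qr(rng.standard_normal((m, 2)))

# Complementary projector P_3 = I - Qhat Qhat^*.
P = np.eye(m) - Qhat @ Qhat.conj().T

x = rng.standard_normal(m)
# P x is orthogonal to both q_1 and q_2 ...
assert np.allclose(Qhat.conj().T @ (P @ x), 0)
# ... and P is a projector: P^2 = P.
assert np.allclose(P @ P, P)
```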
</section>
<section id="modified-gram-schmidt">
<h2><span class="section-number">3.4. </span>Modified Gram-Schmidt<a class="headerlink" href="#modified-gram-schmidt" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450193303" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>There is a big problem with the classical Gram-Schmidt algorithm. It
is unstable, which means that when it is implemented in inexact
arithmetic on a computer, round-off error unacceptably pollutes the
entries of <span class="math notranslate nohighlight">\(Q\)</span> and <span class="math notranslate nohighlight">\(R\)</span>, and the algorithm is not useable in
practice. What happens is that the columns of <span class="math notranslate nohighlight">\(Q\)</span> are not quite
orthogonal, and this loss of orthogonality spoils everything. We will
discuss stability later in the course, but right now we will just
discuss the fix for the classical Gram-Schmidt algorithm, which is
based upon the projector interpretation which we just discussed.</p>
<p>To reorganise Gram-Schmidt to avoid instability, we decompose <span class="math notranslate nohighlight">\(P_j\)</span>
into a sequence of <span class="math notranslate nohighlight">\(j-1\)</span> projectors of rank <span class="math notranslate nohighlight">\(m-1\)</span>, that each project
out one column of <span class="math notranslate nohighlight">\(Q\)</span>, i.e.</p>
<div class="math notranslate nohighlight">
\[P_j = P_{\perp q_{j-1}}\ldots P_{\perp q_2} P_{\perp q_1},\]</div>
<p>where</p>
<div class="math notranslate nohighlight">
\[P_{\perp q_j} = I - q_jq_j^*.\]</div>
<p>Then,</p>
<div class="math notranslate nohighlight">
\[v_j = P_ja_j = P_{\perp q_{j-1}}\ldots P_{\perp q_2}P_{\perp q_1}a_j.\]</div>
<p>Here we notice that we must apply <span class="math notranslate nohighlight">\(P_{\perp q_1}\)</span> to all but the first
column of <span class="math notranslate nohighlight">\(A\)</span>, <span class="math notranslate nohighlight">\(P_{\perp q_2}\)</span> to all but the first two columns,
<span class="math notranslate nohighlight">\(P_{\perp q_3}\)</span> to all but the first three columns, and so on.</p>
<p>By doing this, we gradually transform <span class="math notranslate nohighlight">\(A\)</span> to a unitary matrix, as follows.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}\begin{split}A =
\begin{pmatrix}
a_1 & a_2 & a_3 & \ldots & a_n \\
\end{pmatrix}\end{split}\\\begin{split}\to
\begin{pmatrix}
q_1 & v_2^1 & v_3^1 & \ldots & v_n^1 \\
\end{pmatrix}\end{split}\\\begin{split}\to
\begin{pmatrix}
q_1 & q_2 & v_3^2 & \ldots & v_n^2 \\
\end{pmatrix}\end{split}\\\begin{split}\ldots
\to
\begin{pmatrix}
q_1 & q_2 & q_3 & \ldots & q_n \\
\end{pmatrix}.\end{split}\end{aligned}\end{align} \]</div>
</div></blockquote>
<p>Then it is just a matter of keeping a record of the coefficients
of the projections and normalisation scaling factors and storing
them in <span class="math notranslate nohighlight">\(R\)</span>.</p>
<p>This process is mathematically equivalent to the classical Gram-Schmidt
algorithm, but the arithmetic operations happen in a different order,
in a way that turns out to reduce accumulation of round-off errors.</p>
<p>We now present this modified Gram-Schmidt algorithm as pseudo-code.</p>
<ul class="simple">
<li><p>FOR <span class="math notranslate nohighlight">\(i = 1\)</span> TO <span class="math notranslate nohighlight">\(n\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(v_i \gets a_i\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(i = 1\)</span> TO <span class="math notranslate nohighlight">\(n\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(r_{ii} \gets \|v_i\|_2\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(q_i \gets v_i/r_{ii}\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(j = i+1\)</span> TO <span class="math notranslate nohighlight">\(n\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(r_{ij} \gets q_i^*v_j\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(v_j \gets v_j - r_{ij}q_i\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
<p>This algorithm can be applied “in place”, overwriting the entries
in <span class="math notranslate nohighlight">\(A\)</span> with the <span class="math notranslate nohighlight">\(v\)</span> s and eventually the <span class="math notranslate nohighlight">\(q\)</span> s.</p>
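<p>One possible arrangement of this pseudo-code in NumPy is sketched below. It is
deliberately not the <code class="docutils literal notranslate"><span class="pre">cla_utils</span></code> interface: it overwrites the columns of <span class="math notranslate nohighlight">\(A\)</span> with the <span class="math notranslate nohighlight">\(q\)</span> vectors and returns <span class="math notranslate nohighlight">\(R\)</span>, assuming <span class="math notranslate nohighlight">\(A\)</span> already has a floating-point or complex dtype. The inner FOR loop over <span class="math notranslate nohighlight">\(j\)</span> is replaced by slice operations.</p>

```python
import numpy as np

def gs_modified(A):
    """Modified Gram-Schmidt (a sketch). Overwrites the columns of A
    with the q vectors, in place, and returns R."""
    m, n = A.shape
    R = np.zeros((n, n), dtype=A.dtype)
    for i in range(n):
        R[i, i] = np.linalg.norm(A[:, i])
        A[:, i] /= R[i, i]
        # Project q_i out of all remaining columns in one slice operation:
        # r_{ij} = q_i^* v_j, then v_j <- v_j - r_{ij} q_i for j > i.
        R[i, i+1:] = A[:, i].conj() @ A[:, i+1:]
        A[:, i+1:] -= np.outer(A[:, i], R[i, i+1:])
    return R
```

After the call, <code class="docutils literal notranslate"><span class="pre">A</span></code> holds <span class="math notranslate nohighlight">\(\hat{Q}\)</span>, so the product of the overwritten array with the returned <span class="math notranslate nohighlight">\(R\)</span> recovers the original matrix.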
<div class="proof proof-type-exercise" id="id5">
<div class="proof-title">
<span class="proof-type">Exercise 3.5</span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The <a class="reference internal" href="cla_utils.html#cla_utils.exercises2.GS_modified" title="cla_utils.exercises2.GS_modified"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises2.GS_modified()</span></code></a> function has been
left unimplemented. It should implement the modified Gram-Schmidt
algorithm above, using Numpy slice notation where possible.
What is the minimal number of Python
for loops possible?</p>
<p>The function should work “in place” by changing the values in <span class="math notranslate nohighlight">\(A\)</span>,
without introducing additional intermediate arrays (you will need
to create a new array to store <span class="math notranslate nohighlight">\(R\)</span>). The test script
<code class="docutils literal notranslate"><span class="pre">test_exercises2.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test this
function.</p>
</div></div><div class="proof proof-type-exercise" id="id6">
<div class="proof-title">
<span class="proof-type">Exercise 3.6</span>
</div><div class="proof-content">
<p>Investigate the mutual orthogonality of the <span class="math notranslate nohighlight">\(Q\)</span> matrices that are
produced by your classical and modified Gram-Schmidt
implementations. Is there a way to test mutual orthogonality
without writing a loop? Round-off typically causes problems for
matrices with large condition numbers and large off-diagonal
values. You could also try the opposite of what was done in
<code class="docutils literal notranslate"><span class="pre">test_GS_classical</span></code>: instead of ensuring that all of the entries
in the diagonal matrix <span class="math notranslate nohighlight">\(D\)</span> are <span class="math notranslate nohighlight">\(\mathcal{O}(1)\)</span>, try making some of
the values small and some large. See if you can find a matrix that
illustrates the differences in orthogonality between the two
algorithms.</p>
</div></div></section>
<section id="modified-gram-schmidt-as-triangular-orthogonalisation">
<h2><span class="section-number">3.5. </span>Modified Gram-Schmidt as triangular orthogonalisation<a class="headerlink" href="#modified-gram-schmidt-as-triangular-orthogonalisation" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450193575" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>This iterative transformation process can be written as
right-multiplication by an upper triangular matrix. For
example, at the first iteration,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}\underbrace{
\begin{pmatrix}
v_1^0 & v_2^0 & \ldots & v_n^0
\end{pmatrix}}_{A}
\underbrace{
\begin{pmatrix}
\frac{1}{r_{11}} & -\frac{r_{12}}{r_{11}} & \ldots &
\ldots & -\frac{r_{1n}}{r_{11}} \\
0 & 1 & 0 & \ldots & 0 \\
0 & 0 & 1 & \ldots & 0 \\
\vdots & \ddots & \ddots & \ldots & \vdots \\
0 & 0 & 0 & \ldots & 1 \\
\end{pmatrix}}_{R_1}
=
\underbrace{
\begin{pmatrix}
q_1 & v_2^1 & \ldots & v_n^1
\end{pmatrix}}_{A_1}.\end{split}\]</div>
</div></blockquote>
<p>To understand this equation, we can use the column space
interpretation of matrix-matrix multiplication. The columns of <span class="math notranslate nohighlight">\(A_1\)</span>
are linear combinations of the columns of <span class="math notranslate nohighlight">\(A\)</span> with coefficients
given by the columns of <span class="math notranslate nohighlight">\(R_1\)</span>. Hence, <span class="math notranslate nohighlight">\(q_1\)</span> only depends on <span class="math notranslate nohighlight">\(v_1^0\)</span>,
scaled to have length 1, and <span class="math notranslate nohighlight">\(v_i^1\)</span> is a linear combination of
<span class="math notranslate nohighlight">\((v_1^0,v_i^0)\)</span> such that <span class="math notranslate nohighlight">\(v_i^1\)</span> is orthogonal to <span class="math notranslate nohighlight">\(q_1\)</span>, for <span class="math notranslate nohighlight">\(1<i\leq
n\)</span>.</p>
<p>Similarly, the second iteration may be written as</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}\underbrace{
\begin{pmatrix}
v_1^1 & v_2^1 & \ldots & v_n^1
\end{pmatrix}}_{A_1}
\underbrace{
\begin{pmatrix}
1 & 0 & 0 &
\ldots & 0 \\
0 & \frac{1}{r_{22}} & -\frac{r_{23}}{r_{22}} & \ldots & -\frac{r_{2n}}{r_{22}} \\ 0 & 0 & 1 & \ldots & 0 \\
\vdots & \ddots & \ddots & \ldots & \vdots \\
0 & 0 & 0 & \ldots & 1 \\
\end{pmatrix}}_{R_2}
=
\underbrace{
\begin{pmatrix}
q_1 &amp; q_2 &amp; v_3^2 &amp; \ldots &amp; v_n^2
\end{pmatrix}}_{A_2}.\end{split}\]</div>
</div></blockquote>
<p>It should become clear that each transformation from <span class="math notranslate nohighlight">\(A_i\)</span> to <span class="math notranslate nohighlight">\(A_{i+1}\)</span>
takes place by right multiplication by an upper triangular matrix <span class="math notranslate nohighlight">\(R_{i+1}\)</span>,
which is an identity matrix with extra entries in row <span class="math notranslate nohighlight">\(i+1\)</span>. By combining these
transformations together, we obtain</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[A\underbrace{R_1R_2\ldots R_n}_{\hat{R}^{-1}} = \hat{Q}.\]</div>
</div></blockquote>
<p>Since upper triangular matrices form a group, the product of the <span class="math notranslate nohighlight">\(R_i\)</span>
matrices is upper triangular. Further, all the <span class="math notranslate nohighlight">\(R_i\)</span> matrices have
non-zero determinant, so the product is invertible, and we can write
this as <span class="math notranslate nohighlight">\(\hat{R}^{-1}\)</span>. Right multiplication by <span class="math notranslate nohighlight">\(\hat{R}\)</span> produces the
usual reduced QR factorisation. We say that modified Gram-Schmidt
implements triangular orthogonalisation: the transformation of <span class="math notranslate nohighlight">\(A\)</span> to
an orthogonal matrix by right multiplication of upper triangular
matrices.</p>
<p>This is a powerful way to view the modified Gram-Schmidt process from
the point of view of understanding and analysis, but of course we do not
form the matrices <span class="math notranslate nohighlight">\(R_i\)</span> explicitly (we just follow the pseudo-code given
above).</p>
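<p>To make this concrete, here is a short NumPy sketch (purely illustrative; as noted above, one would never form the <span class="math notranslate nohighlight">\(R_k\)</span> matrices in practice) that accumulates the product <span class="math notranslate nohighlight">\(R_1R_2\ldots R_n\)</span> for a small random matrix and checks that right multiplication by it orthogonalises <span class="math notranslate nohighlight">\(A\)</span>:</p>

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 5, 3
A = rng.standard_normal((m, n))

V = A.copy()
Rinv = np.eye(n)                     # will hold R_1 R_2 ... R_n = Rhat^{-1}
for k in range(n):
    Rk = np.eye(n)
    rkk = np.linalg.norm(V[:, k])
    Rk[k, k] = 1.0 / rkk             # scales column k to unit norm
    qk = V[:, k] / rkk
    for j in range(k + 1, n):
        # subtracts the q_k component from each later column
        Rk[k, j] = -np.dot(qk, V[:, j]) / rkk
    V = V @ Rk                       # one triangular orthogonalisation step
    Rinv = Rinv @ Rk

Qhat = V
# A @ Rinv equals Qhat, whose columns are orthonormal
```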
<div class="proof proof-type-exercise" id="id7">
<div class="proof-title">
<span class="proof-type">Exercise 3.7</span>
</div><div class="proof-content">
<p>In a break from the format so far, the
<a class="reference internal" href="cla_utils.html#cla_utils.exercises2.GS_modified_R" title="cla_utils.exercises2.GS_modified_R"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises2.GS_modified_R()</span></code></a> function has been
implemented. It implements the modified Gram-Schmidt algorithm in
the form described above using upper triangular matrices. This is
not a good way to implement the algorithm, because of the inversion
of <span class="math notranslate nohighlight">\(R\)</span> at the end, and because multiplying by the <span class="math notranslate nohighlight">\(R_k\)</span> matrices involves
many wasteful multiplications by zero entries. However, it is important as a conceptual tool for
understanding the modified Gram-Schmidt algorithm as a triangular
orthogonalisation process, and so it is good to see this in a code
implementation. Study this function to check that you understand
what is happening.</p>
<p>However, the <a class="reference internal" href="cla_utils.html#cla_utils.exercises2.GS_modified_get_R" title="cla_utils.exercises2.GS_modified_get_R"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises2.GS_modified_get_R()</span></code></a>
function has not been implemented. This function computes the <span class="math notranslate nohighlight">\(R_k\)</span>
matrices at each step of the process. Complete this code. The test
script <code class="docutils literal notranslate"><span class="pre">test_exercises2.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will also
test this function.</p>
</div></div></section>
<section id="householder-triangulation">
<h2><span class="section-number">3.6. </span>Householder triangulation<a class="headerlink" href="#householder-triangulation" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450199222" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>This view of the modified Gram-Schmidt process as triangular
orthogonalisation gives an idea to build an alternative algorithm.
Instead of right multiplying by upper triangular matrices to transform
<span class="math notranslate nohighlight">\(A\)</span> to <span class="math notranslate nohighlight">\(\hat{Q}\)</span>, we can consider left multiplying by unitary
matrices to transform <span class="math notranslate nohighlight">\(A\)</span> to <span class="math notranslate nohighlight">\(R\)</span>,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\underbrace{Q_n\ldots Q_2Q_1}_{=Q^*}A = R.\]</div>
</div></blockquote>
<p>Multiplying unitary matrices produces unitary matrices, so we obtain
<span class="math notranslate nohighlight">\(A=QR\)</span> as a full factorisation of <span class="math notranslate nohighlight">\(A\)</span>.</p>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450199366" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>To do this, we need to work on the columns of <span class="math notranslate nohighlight">\(A\)</span>, from left to right,
transforming them so that each column has zeros below the
diagonal. These unitary transformations need to be designed so that they
don’t spoil the structure created in previous columns. The easiest
way to ensure this is to construct a unitary matrix <span class="math notranslate nohighlight">\(Q_k\)</span> with an identity
matrix as the <span class="math notranslate nohighlight">\((k-1)\times (k-1)\)</span> submatrix,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}Q_k =
\begin{pmatrix}
I_{k-1} & 0 \\
0 & F \\
\end{pmatrix}.\end{split}\]</div>
</div></blockquote>
<p>This means that multiplication by <span class="math notranslate nohighlight">\(Q_k\)</span> won’t change the first <span class="math notranslate nohighlight">\(k-1\)</span>
rows, leaving undisturbed the zeros already introduced below the
diagonal in earlier columns. For <span class="math notranslate nohighlight">\(Q_k\)</span> to be unitary and to transform all below
diagonal entries in column <span class="math notranslate nohighlight">\(k\)</span> to zero, we need the
<span class="math notranslate nohighlight">\((m-k+1)\times(m-k+1)\)</span> submatrix <span class="math notranslate nohighlight">\(F\)</span> to also be unitary, since
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}Q_k^* =
\begin{pmatrix}
I_{k-1} & 0 \\
0 & F^* \\
\end{pmatrix}, \,
Q_k^{-1} =
\begin{pmatrix}
I_{k-1} & 0 \\
0 & F^{-1} \\
\end{pmatrix}.\end{split}\]</div>
</div></blockquote>
<p>We write the <span class="math notranslate nohighlight">\(k\)</span> th column <span class="math notranslate nohighlight">\(v_k^k\)</span> of <span class="math notranslate nohighlight">\(A_k\)</span> as</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}v_k^k =
\begin{pmatrix}
\hat{v}_k^k \\
x
\end{pmatrix},\end{split}\]</div>
</div></blockquote>
<p>where <span class="math notranslate nohighlight">\(\hat{v}_k^k\)</span> contains the first <span class="math notranslate nohighlight">\(k-1\)</span> entries of <span class="math notranslate nohighlight">\(v_k^k\)</span>. The column
gets transformed according to</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}Q_kv_k^k = \begin{pmatrix}
\hat{v}_k^k \\
Fx
\end{pmatrix}.\end{split}\]</div>
</div></blockquote>
<p>Our goal is that <span class="math notranslate nohighlight">\(Fx\)</span> is zero, except for the first entry (which
becomes the diagonal entry of <span class="math notranslate nohighlight">\(Q_kv_k^k\)</span>). Since <span class="math notranslate nohighlight">\(F\)</span> is unitary, we must
have <span class="math notranslate nohighlight">\(\|Fx\|=\|x\|\)</span>. For now we shall specialise to
real matrices, so we choose to have</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[Fx = \pm\|x\|e_1,\]</div>
</div></blockquote>
<p>where we shall consider the sign later. Complex matrices have a more
general formula for Householder transformations which we shall not
discuss here.</p>
<p>We can achieve this by using a Householder reflector for <span class="math notranslate nohighlight">\(F\)</span>, which is
a unitary transformation that does precisely what we
need. Geometrically, the idea is that we consider a line joining <span class="math notranslate nohighlight">\(x\)</span>
and <span class="math notranslate nohighlight">\(Fx=\pm\|x\|e_1\)</span>, which points in the direction <span class="math notranslate nohighlight">\(v=\pm\|x\|e_1-x\)</span>. We can
transform <span class="math notranslate nohighlight">\(x\)</span> to <span class="math notranslate nohighlight">\(Fx\)</span> by a reflection in the hyperplane <span class="math notranslate nohighlight">\(H\)</span> that is
orthogonal to <span class="math notranslate nohighlight">\(v\)</span>. Since reflections are norm preserving, <span class="math notranslate nohighlight">\(F\)</span> must be
unitary. Applying the projector <span class="math notranslate nohighlight">\(P\)</span> given by</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[Px = \left(I - \frac{vv^*}{v^*v}\right)x,\]</div>
</div></blockquote>
<p>does half the job, producing a vector in <span class="math notranslate nohighlight">\(H\)</span>. To do a reflection we
need to go twice as far,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[Fx = \left(I - 2\frac{vv^*}{v^*v}\right)x.\]</div>
</div></blockquote>
<p>We can check that this does what we want,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}Fx = \left(I - 2\frac{vv^*}{v^*v}\right)x,\\ = x - 2\frac{(\pm\|x\|e_1 - x)}{\|\pm\|x\|e_1 - x\|^2}
(\pm\|x\|e_1 - x)^*x,\\ = x - 2\frac{(\pm\|x\|e_1 - x)}{\|\pm\|x\|e_1 - x\|^2}
\|x\|(\pm x_1 - \|x\|),\\ = x + (\pm\|x\|e_1 - x) = \pm\|x\|e_1,\end{aligned}\end{align} \]</div>
</div></blockquote>
<p>as required, having checked that (assuming <span class="math notranslate nohighlight">\(x\)</span> is real)</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\|\pm \|x\|e_1 - x\|^2 = \|x\|^2 \mp 2\|x\|x_1 + \|x\|^2
= -2\|x\|(\pm x_1 - \|x\|).\]</div>
</div></blockquote>
<p>We can also check that <span class="math notranslate nohighlight">\(F\)</span> is unitary. First we check that <span class="math notranslate nohighlight">\(F\)</span>
is Hermitian,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}\left(I - 2\frac{vv^*}{v^*v}\right)^*
= I - 2\frac{(vv^*)^*}{v^*v},\\= I - 2\frac{(v^*)^*v^*}{v^*v},\\= I - 2\frac{vv^*}{v^*v} = F.\end{aligned}\end{align} \]</div>
</div></blockquote>
<p>Now we use this to show that <span class="math notranslate nohighlight">\(F\)</span> is unitary,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}F^*F = \left(I - 2\frac{vv^*}{v^*v}\right)
\left(I - 2\frac{vv^*}{v^*v}\right)\\= I - 4\frac{vv^*}{v^*v} +
4 \frac{vv^*}{v^*v}\frac{vv^*}{v^*v} = I,\end{aligned}\end{align} \]</div>
</div></blockquote>
<p>so <span class="math notranslate nohighlight">\(F^*=F^{-1}\)</span>. In summary, we have constructed a unitary
matrix <span class="math notranslate nohighlight">\(Q_k\)</span> that transforms the entries below the diagonal
of the kth column of <span class="math notranslate nohighlight">\(A_k\)</span> to zero, and leaves the previous
<span class="math notranslate nohighlight">\(k-1\)</span> columns alone.</p>
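<p>A quick numerical check of the reflector (a sketch assuming NumPy; here the sign in <span class="math notranslate nohighlight">\(v=\pm\|x\|e_1-x\)</span> is fixed to <span class="math notranslate nohighlight">\(+\)</span> for simplicity, the choice of sign being discussed below):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

e1 = np.zeros(4)
e1[0] = 1.0
v = np.linalg.norm(x) * e1 - x        # direction joining x and +||x|| e_1
F = np.eye(4) - 2.0 * np.outer(v, v) / np.dot(v, v)

# F reflects x onto the first coordinate axis (F @ x is ||x|| e_1),
# and F is Hermitian and unitary, as shown above.
```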
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450200163" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>Earlier, we mentioned that there is a choice of sign in <span class="math notranslate nohighlight">\(v\)</span>. This
choice gives us the opportunity to improve the numerical stability of
the algorithm. In the case of real matrices, to avoid unnecessary
numerical round off, we choose the sign that makes <span class="math notranslate nohighlight">\(v\)</span> furthest from
<span class="math notranslate nohighlight">\(x\)</span>, i.e.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[v = \mbox{sign}(x_1)\|x\|e_1 + x.\]</div>
</div></blockquote>
<p>(Exercise: show that this choice of sign achieves this.) It is critical
that we use a definition of <span class="math notranslate nohighlight">\(\mbox{sign}\)</span> that always returns a number
that has magnitude 1, so we conventionally choose <span class="math notranslate nohighlight">\(\mbox{sign}(0)=1\)</span>.</p>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>Note that the <code class="docutils literal notranslate"><span class="pre">numpy.sign</span></code> function has <span class="math notranslate nohighlight">\(\mbox{sign}(0)=0\)</span>, so
you need to take care of this case separately in your Python
implementation.</p>
</div>
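<p>A minimal way to implement this convention in Python (an illustrative helper, not part of <code class="docutils literal notranslate"><span class="pre">cla_utils</span></code>):</p>

```python
import numpy as np

def householder_sign(x1):
    """Sign with the convention sign(0) = 1, as needed for the
    Householder vector; note numpy.sign(0.0) returns 0.0 instead."""
    return 1.0 if x1 >= 0 else -1.0
```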
<p>For complex-valued matrices, the Householder reflection uses <span class="math notranslate nohighlight">\(x_1/|x_1|\)</span>
in place of <span class="math notranslate nohighlight">\(\mbox{sign}(x_1)\)</span> (except for <span class="math notranslate nohighlight">\(x_1=0\)</span>, where we use 1 as above).</p>
<p>We are now in a position to describe the algorithm in
pseudo-code. Here it is described as an “in-place” algorithm, where the
successive transformations to the columns of <span class="math notranslate nohighlight">\(A\)</span> are implemented as
replacements of the values in <span class="math notranslate nohighlight">\(A\)</span>. This means that we can allocate
memory on the computer for <span class="math notranslate nohighlight">\(A\)</span> which is eventually replaced with the
values for <span class="math notranslate nohighlight">\(R\)</span>. To present the algorithm, we will use the “slice”
notation to describe submatrices of <span class="math notranslate nohighlight">\(A\)</span>, with <span class="math notranslate nohighlight">\(A_{k:l,r:s}\)</span> being
the submatrix of <span class="math notranslate nohighlight">\(A\)</span> consisting of the rows from <span class="math notranslate nohighlight">\(k\)</span> to <span class="math notranslate nohighlight">\(l\)</span> and
columns from <span class="math notranslate nohighlight">\(r\)</span> to <span class="math notranslate nohighlight">\(s\)</span>.</p>
<ul class="simple">
<li><p>FOR <span class="math notranslate nohighlight">\(k = 1\)</span> TO <span class="math notranslate nohighlight">\(n\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(x = A_{k:m,k}\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(v_k \gets \mbox{sign}(x_1)\|x\|_2e_1 + x\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(v_k \gets v_k/\|v_k\|\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(A_{k:m,k:n} \gets A_{k:m,k:n} - 2v_k(v_k^*A_{k:m,k:n})\)</span>.</p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
<div class="proof proof-type-exercise" id="id8">
<div class="proof-title">
<span class="proof-type">Exercise 3.8</span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The <a class="reference internal" href="cla_utils.html#cla_utils.exercises3.householder" title="cla_utils.exercises3.householder"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises3.householder()</span></code></a> function
has been left unimplemented. It should implement the algorithm
above, using only one loop over <span class="math notranslate nohighlight">\(k\)</span>. It should return the resulting
<span class="math notranslate nohighlight">\(R\)</span> matrix. The test script <code class="docutils literal notranslate"><span class="pre">test_exercises3.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code>
directory will test this function.</p>
</div></div><div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>The “slice” notation using colons “:” becomes more and more important
when we are doing operations on submatrices.</p>
</div>
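<p>For example, here is a sketch (assuming NumPy) of how the 1-based, inclusive slice notation of the notes maps to NumPy indexing:</p>

```python
import numpy as np

A = np.arange(20.0).reshape(4, 5)

# The submatrix A_{2:4,3:5} (1-based, inclusive, as in the notes) is
# written A[1:4, 2:5] in NumPy: 0-based, with an exclusive end index.
sub = A[1:4, 2:5].copy()      # .copy() because plain slices are views

# Slices can also appear on the left of an assignment, updating A in place:
A[1:4, 2:5] -= 1.0
```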
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>Don’t forget that Python numbers from zero, which will be important
when implementing the submatrices using Numpy slice notation.</p>
</div>
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>The Python functions “inner”, “outer”, and “dot” are very useful
for succinctly expressing many linear algebra
operations. “inner(a,b)” multiplies each component of a
multidimensional array <span class="math notranslate nohighlight">\(a\)</span> with the corresponding component of another
array <span class="math notranslate nohighlight">\(b\)</span> (with the same “shape”) and sums over all indices.
“dot(a, b)” only sums over the last index of <span class="math notranslate nohighlight">\(a\)</span> and the first
index of <span class="math notranslate nohighlight">\(b\)</span>. For some algorithms it may be necessary to swap
the order of <span class="math notranslate nohighlight">\(a\)</span> and <span class="math notranslate nohighlight">\(b\)</span> and make use of transposes to sum
over the correct index.</p>
</div>
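<p>For example, the rank-one update appearing in the Householder pseudo-code above can be written with these functions (a sketch assuming NumPy):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal(4)          # a Householder vector
B = rng.standard_normal((4, 3))     # a block of columns of A

# dot(v, B) computes v* B (summing over the shared index), and outer
# builds the rank-one update 2 v (v* B) without ever forming I - 2 v v*.
update = 2.0 * np.outer(v, np.dot(v, B))
B_new = B - update
```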
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450201578" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>Note that we have not explicitly formed the matrix <span class="math notranslate nohighlight">\(Q\)</span> or the product
matrices <span class="math notranslate nohighlight">\(Q_i\)</span>. In some applications, such as solving least squares
problems, we don’t explicitly need <span class="math notranslate nohighlight">\(Q\)</span>, just the matrix-vector product
<span class="math notranslate nohighlight">\(Q^*b\)</span> with some vector <span class="math notranslate nohighlight">\(b\)</span>. To compute this product, we can just
apply the same operations to <span class="math notranslate nohighlight">\(b\)</span> that are applied to the columns of
<span class="math notranslate nohighlight">\(A\)</span>. This can be expressed in the following pseudo-code, working
“in place” in the storage of <span class="math notranslate nohighlight">\(b\)</span>.</p>
<ul class="simple">
<li><p>FOR <span class="math notranslate nohighlight">\(k = 1\)</span> TO <span class="math notranslate nohighlight">\(n\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(b_{k:m} \gets b_{k:m} - 2v_k(v_k^*b_{k:m})\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
<p>We call this procedure “implicit multiplication”.</p>
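<p>As an illustration, assuming the normalised Householder vectors have been stored in a hypothetical list <code class="docutils literal notranslate"><span class="pre">vs</span></code> (with <code class="docutils literal notranslate"><span class="pre">vs[k]</span></code> of length <span class="math notranslate nohighlight">\(m-k\)</span>, using 0-based indexing), this loop can be sketched as:</p>

```python
import numpy as np

def apply_Qstar(vs, b):
    """Apply Q* = Q_n ... Q_2 Q_1 to b, given the normalised
    Householder vectors vs[k] acting on entries k:m of b."""
    b = b.copy()
    for k, v in enumerate(vs):
        b[k:] -= 2.0 * v * np.dot(v, b[k:])
    return b
```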
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450202242" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>In this section we will frequently encounter systems of the form</p>
<div class="math notranslate nohighlight">
\[\hat{R}x = y.\]</div>
<p>This is an upper triangular system that can be solved efficiently
using back-substitution.</p>
<p>Written in components, this equation is</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}R_{11}x_1 + R_{12}x_2 + \ldots + R_{1(m-1)}x_{m-1} + R_{1m}x_m = y_1,\\0x_1 + R_{22}x_2 + \ldots + R_{2(m-1)}x_{m-1} + R_{2m}x_m = y_2,\\\vdots\\0x_1 + 0x_2 + \ldots + R_{(m-1)(m-1)}x_{m-1} + R_{(m-1)m}x_m = y_{m-1},\\ 0x_1 + 0x_2 + \ldots + 0x_{m-1} + R_{mm}x_m = y_{m}.\end{aligned}\end{align} \]</div>
</div></blockquote>
<p>The last equation yields <span class="math notranslate nohighlight">\(x_m\)</span> directly by dividing by <span class="math notranslate nohighlight">\(R_{mm}\)</span>, then
we can use this value to directly compute <span class="math notranslate nohighlight">\(x_{m-1}\)</span>. This is repeated
for all of the entries of <span class="math notranslate nohighlight">\(x\)</span> from <span class="math notranslate nohighlight">\(m\)</span> down to 1. This procedure is
called back substitution, which we summarise in the following
pseudo-code.</p>
<ul class="simple">
<li><p><span class="math notranslate nohighlight">\(x_m \gets y_m/R_{mm}\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(i= m-1\)</span> TO 1 (BACKWARDS)</p>
<ul>
<li><p><span class="math notranslate nohighlight">\(x_i \gets (y_i - \sum_{k=i+1}^mR_{ik}x_k)/R_{ii}\)</span></p></li>
</ul>
</li>
</ul>
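<p>A minimal single right-hand-side sketch of this pseudo-code (assuming NumPy; note that Exercise 3.9 asks for a vectorised implementation handling several right-hand sides at once):</p>

```python
import numpy as np

def back_sub(R, y):
    """Solve the upper triangular system R x = y by back substitution."""
    m = R.shape[0]
    x = np.zeros(m)
    x[m - 1] = y[m - 1] / R[m - 1, m - 1]
    for i in range(m - 2, -1, -1):
        # subtract the already-known entries, then divide by the diagonal
        x[i] = (y[i] - np.dot(R[i, i + 1:], x[i + 1:])) / R[i, i]
    return x
```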
<div class="proof proof-type-exercise" id="id9">
<div class="proof-title">
<span class="proof-type">Exercise 3.9</span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The function <a class="reference internal" href="cla_utils.html#cla_utils.exercises3.solve_U" title="cla_utils.exercises3.solve_U"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises3.solve_U()</span></code></a> has
been left unimplemented. It should use backward substitution to
solve upper triangular systems. The interfaces are set so that
multiple right hand sides can be provided and solved at the same
time. The functions should only use one loop over the rows of <span class="math notranslate nohighlight">\(U\)</span>,
to efficiently solve the multiple problems. The test script
<code class="docutils literal notranslate"><span class="pre">test_exercises3.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test these
functions.</p>
</div></div><div class="proof proof-type-exercise" id="id10">
<div class="proof-title">
<span class="proof-type">Exercise 3.10</span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> Show that the implicit multiplication procedure is
equivalent to computing an extended array</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\hat{A} = \begin{pmatrix}
a_1 & a_2 & \ldots & a_n & b
\end{pmatrix}\]</div>
</div></blockquote>
<p>and performing Householder on the first <span class="math notranslate nohighlight">\(n\)</span> columns. Transform the
equation <span class="math notranslate nohighlight">\(Ax=b\)</span> into <span class="math notranslate nohighlight">\(Rx=\hat{b}\)</span> where <span class="math notranslate nohighlight">\(QR=A\)</span>, and find the form
of <span class="math notranslate nohighlight">\(\hat{b}\)</span>, explaining how to get <span class="math notranslate nohighlight">\(\hat{b}\)</span> from Householder
applied to <span class="math notranslate nohighlight">\(\hat{A}\)</span> above. Solving systems with upper triangular
matrices is much cheaper than solving general matrix systems as
we shall discuss later.</p>
<p>Now, say that we want to solve multiple equations</p>
<div class="math notranslate nohighlight">
\[Ax_i =b_i, i=1,2,\ldots,k,\]</div>
<p>which have the same matrix <span class="math notranslate nohighlight">\(A\)</span> but different right hand sides
<span class="math notranslate nohighlight">\(b=b_i\)</span>, <span class="math notranslate nohighlight">\(i=1,2,\ldots,k\)</span>. Extend this idea above to the case
<span class="math notranslate nohighlight">\(k>1\)</span>, by describing an extended <span class="math notranslate nohighlight">\(\hat{A}\)</span> containing all the <span class="math notranslate nohighlight">\(b_i\)</span>
vectors.</p>
<p>The <a class="reference internal" href="cla_utils.html#cla_utils.exercises3.householder_solve" title="cla_utils.exercises3.householder_solve"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises3.householder_solve()</span></code></a> function has
been left unimplemented. It takes in a set of right hand side
vectors <span class="math notranslate nohighlight">\(b_1,b_2,\ldots,b_k\)</span> and returns a set of solutions
<span class="math notranslate nohighlight">\(x_1,x_2,\ldots,x_k\)</span>. In the course of solving, it should
construct an extended array <span class="math notranslate nohighlight">\(\hat{A}\)</span>, and then pass it to
<a class="reference internal" href="cla_utils.html#cla_utils.exercises3.householder" title="cla_utils.exercises3.householder"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises3.householder()</span></code></a>. You will also need
<a class="reference internal" href="cla_utils.html#cla_utils.exercises3.solve_U" title="cla_utils.exercises3.solve_U"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises3.solve_U()</span></code></a>.</p>
</div></div><p>If we really need <span class="math notranslate nohighlight">\(Q\)</span>, we can get it by matrix-vector products with
each element of the canonical basis <span class="math notranslate nohighlight">\((e_1,e_2,\ldots,e_n)\)</span>. This
means that first we need to compute a matrix-vector product <span class="math notranslate nohighlight">\(Qx\)</span> with
a vector <span class="math notranslate nohighlight">\(x\)</span>. One way to do this is to apply the Householder
reflections in reverse, since</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[Q = (Q_n\ldots Q_2Q_1)^* = Q_1Q_2\ldots Q_n,\]</div>
</div></blockquote>
<p>having made use of the fact that the Householder reflections are
Hermitian. This can be expressed in the following pseudo-code.</p>
<ul class="simple">
<li><p>FOR <span class="math notranslate nohighlight">\(k = n\)</span> TO <span class="math notranslate nohighlight">\(1\)</span> (DOWNWARDS)</p>
<ul>
<li><p><span class="math notranslate nohighlight">\(x_{k:m} \gets x_{k:m} - 2v_k(v_k^*x_{k:m})\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
<p>Note that this requires us to record the entire history of the <span class="math notranslate nohighlight">\(v\)</span> vectors,
whilst the <span class="math notranslate nohighlight">\(Q^*\)</span> application algorithm above can be interlaced with the
steps of the Householder algorithm, using the <span class="math notranslate nohighlight">\(v\)</span> values as they are
needed and then discarding them. Then we can compute <span class="math notranslate nohighlight">\(Q\)</span> via
<blockquote>
<div><div class="math notranslate nohighlight">
\[Q = \begin{pmatrix}
Qe_1 & Qe_2 & \ldots & Qe_n
\end{pmatrix},\]</div>
</div></blockquote>
<p>with each column using the <span class="math notranslate nohighlight">\(Q\)</span> application algorithm described above.</p>
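<p>As an illustration, assuming the normalised Householder vectors are stored in a hypothetical list <code class="docutils literal notranslate"><span class="pre">vs</span></code> (with <code class="docutils literal notranslate"><span class="pre">vs[k]</span></code> of length <span class="math notranslate nohighlight">\(m-k\)</span>, 0-based), applying <span class="math notranslate nohighlight">\(Q\)</span> is the same update run downwards:</p>

```python
import numpy as np

def apply_Q(vs, x):
    """Apply Q = Q_1 Q_2 ... Q_n to x, given the normalised
    Householder vectors vs[k] acting on entries k:m of x."""
    x = x.copy()
    for k in range(len(vs) - 1, -1, -1):
        v = vs[k]
        x[k:] -= 2.0 * v * np.dot(v, x[k:])
    return x

# Q itself can then be assembled column by column from products with
# the canonical basis vectors: column i of Q is apply_Q(vs, e_i).
```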
<div class="proof proof-type-exercise" id="id11">
<div class="proof-title">
<span class="proof-type">Exercise 3.11</span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> Show that the implicit multiplication procedure
applied to the columns of <span class="math notranslate nohighlight">\(I\)</span> produces <span class="math notranslate nohighlight">\(Q^*\)</span>, from which we can
easily obtain <span class="math notranslate nohighlight">\(Q\)</span>, explaining how. Show how to implement this by
applying Householder to an augmented matrix <span class="math notranslate nohighlight">\(\hat{A}\)</span> of some
appropriate form.</p>
<p>The <a class="reference internal" href="cla_utils.html#cla_utils.exercises3.householder_qr" title="cla_utils.exercises3.householder_qr"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises3.householder_qr()</span></code></a> function has been
left unimplemented. It takes in the <span class="math notranslate nohighlight">\(m\times n\)</span> array <span class="math notranslate nohighlight">\(A\)</span> and
returns <span class="math notranslate nohighlight">\(Q\)</span> and <span class="math notranslate nohighlight">\(R\)</span>. It should use the method of this exercise to
compute them by forming an appropriate <span class="math notranslate nohighlight">\(\hat{A}\)</span>, calling
<a class="reference internal" href="cla_utils.html#cla_utils.exercises3.householder" title="cla_utils.exercises3.householder"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises3.householder()</span></code></a> and then extracting
appropriate subarrays using slice notation. The test script
<code class="docutils literal notranslate"><span class="pre">test_exercises3.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will also test
this function.</p>
</div></div></section>
<section id="application-least-squares-problems">
<h2><span class="section-number">3.7. </span>Application: Least squares problems<a class="headerlink" href="#application-least-squares-problems" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/450202726" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>Least squares problems arise in data fitting,
optimisation and control, and are also a crucial ingredient of modern
massively parallel linear system solver algorithms such as GMRES,
which we shall encounter later in the course. They are a way of
solving “long thin” matrix vector problems <span class="math notranslate nohighlight">\(Ax=b\)</span> where we want to
obtain <span class="math notranslate nohighlight">\(x\in \mathbb{C}^n\)</span> from <span class="math notranslate nohighlight">\(b\in\mathbb{C}^m\)</span> with <span class="math notranslate nohighlight">\(A\)</span> an
<span class="math notranslate nohighlight">\(m\times n\)</span> matrix. Often the problem does not have a solution as it
is overdetermined for <span class="math notranslate nohighlight">\(m>n\)</span>. Instead we just seek <span class="math notranslate nohighlight">\(x\)</span> that minimises
the 2-norm of the residual <span class="math notranslate nohighlight">\(r=b-Ax\)</span>, i.e. <span class="math notranslate nohighlight">\(x\)</span> is the minimiser of</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\min_x \|Ax - b\|^2.\]</div>
</div></blockquote>
<p>This residual will not be zero in general, when <span class="math notranslate nohighlight">\(b\)</span> is not in the
range of <span class="math notranslate nohighlight">\(A\)</span>. The nearest point in the range of <span class="math notranslate nohighlight">\(A\)</span> to <span class="math notranslate nohighlight">\(b\)</span> is <span class="math notranslate nohighlight">\(Pb\)</span>,
where <span class="math notranslate nohighlight">\(P\)</span> is the orthogonal projector onto the range of <span class="math notranslate nohighlight">\(A\)</span>. From
<a class="reference internal" href="L1_preliminaries.html#orthogonalprojector"><span class="std std-numref">theorem 2.28</span></a>, we know that
<span class="math notranslate nohighlight">\(P=\hat{Q}\hat{Q}^*\)</span>, where <span class="math notranslate nohighlight">\(\hat{Q}\)</span> from the reduced <span class="math notranslate nohighlight">\(QR\)</span>
factorisation has the same column space as <span class="math notranslate nohighlight">\(A\)</span> (but with orthogonal
columns).</p>
<p>Then, we just have to solve</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[Ax = Pb,\]</div>
</div></blockquote>
<p>which is now solvable since <span class="math notranslate nohighlight">\(Pb\)</span> is in the column space of <span class="math notranslate nohighlight">\(A\)</span> (and
hence can be written as a linear combination of the columns of <span class="math notranslate nohighlight">\(A\)</span> i.e.
as a matrix-vector product <span class="math notranslate nohighlight">\(Ax\)</span> for some unknown <span class="math notranslate nohighlight">\(x\)</span>).</p>
<p>Now we have the reduced <span class="math notranslate nohighlight">\(QR\)</span> factorisation of <span class="math notranslate nohighlight">\(A\)</span>, and we can write</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\hat{Q}\hat{R}x = \hat{Q}\hat{Q}^*b.\]</div>
</div></blockquote>
<p>Left multiplication by <span class="math notranslate nohighlight">\(\hat{Q}^*\)</span> then gives</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\hat{R}x = \hat{Q}^*b.\]</div>
</div></blockquote>
<p>This is an upper triangular system that can be solved efficiently
using back-substitution.</p>
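<p>The whole least squares procedure can be checked against library routines (a sketch using <code class="docutils literal notranslate"><span class="pre">numpy.linalg.qr</span></code> and <code class="docutils literal notranslate"><span class="pre">numpy.linalg.lstsq</span></code> in place of the <code class="docutils literal notranslate"><span class="pre">cla_utils</span></code> functions, which are left as exercises):</p>

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Qhat, Rhat = np.linalg.qr(A, mode='reduced')   # Qhat: m x n, Rhat: n x n
x = np.linalg.solve(Rhat, Qhat.T @ b)          # solves Rhat x = Qhat* b
# (np.linalg.solve is a general solver; back substitution as described
# above would exploit the triangular structure of Rhat.)
```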
<div class="proof proof-type-exercise" id="id12">
<div class="proof-title">
<span class="proof-type">Exercise 3.12</span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The <a class="reference internal" href="cla_utils.html#cla_utils.exercises3.householder_ls" title="cla_utils.exercises3.householder_ls"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises3.householder_ls()</span></code></a>
function has been left unimplemented. It takes in the <span class="math notranslate nohighlight">\(m\times n\)</span>
array <span class="math notranslate nohighlight">\(A\)</span> and a right-hand side vector <span class="math notranslate nohighlight">\(b\)</span> and solves the least
squares problem minimising <span class="math notranslate nohighlight">\(\|Ax-b\|\)</span> over <span class="math notranslate nohighlight">\(x\)</span>. It should do this
by forming an appropriate augmented matrix <span class="math notranslate nohighlight">\(\hat{A}\)</span>, calling
<a class="reference internal" href="cla_utils.html#cla_utils.exercises3.householder" title="cla_utils.exercises3.householder"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises3.householder()</span></code></a> and extracting appropriate
subarrays using slice notation, before using
<a class="reference internal" href="cla_utils.html#cla_utils.exercises3.solve_U" title="cla_utils.exercises3.solve_U"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises3.solve_U()</span></code></a> to solve the resulting upper
triangular system, before returning the solution <span class="math notranslate nohighlight">\(x\)</span>. The test
script <code class="docutils literal notranslate"><span class="pre">test_exercises3.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will also
test this function.</p>
</div></div><div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>You will need to extract the appropriate submatrix to obtain the
square (and invertible) reduced matrix <span class="math notranslate nohighlight">\(\hat{R}\)</span>.</p>
</div>
</section>
</section>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
© Copyright 2020-2023, Colin J. Cotter.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 7.4.0.
</div>
</body>
</html>