<!DOCTYPE html>
<html lang="en" data-content_root="./">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>LU decomposition — Computational linear algebra course 2023.0 documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css?v=03e43079" />
<link rel="stylesheet" type="text/css" href="_static/fenics.css?v=16c5e00f" />
<link rel="stylesheet" type="text/css" href="_static/proof.css" />
<script src="_static/documentation_options.js?v=f1ab3ab9"></script>
<script src="_static/doctools.js?v=9a2dae69"></script>
<script src="_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="_static/proof.js"></script>
<script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@2/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<!--[if lte IE 6]>
<link rel="stylesheet" href="_static/ie6.css" type="text/css" media="screen" charset="utf-8" />
<![endif]-->
<link rel="stylesheet" href="_static/featured.css">
<link rel="shortcut icon" href="_static/icon.ico" />
</head><body>
<div class="wrapper">
<a href="index.html"><img src="_static/banner.png" width="900px" alt="Project Banner" /></a>
<div id="access">
<div class="menu">
<ul>
<li class="page_item"><a href="https://github.com/Computational-Linear-Algebra-Course/computational-linear-algebra-course" title="GitHub">GitHub</a></li>
</ul>
</div><!-- .menu -->
</div><!-- #access -->
</div><!-- #wrapper -->
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<section id="lu-decomposition">
<h1>LU decomposition<a class="headerlink" href="#lu-decomposition" title="Link to this heading">¶</a></h1>
<p>In this section we look at some other algorithms for solving the
equation <span class="math notranslate nohighlight">\(Ax=b\)</span> when <span class="math notranslate nohighlight">\(A\)</span> is invertible. On the one hand, the <span class="math notranslate nohighlight">\(QR\)</span>
factorisation has great stability properties. On the other, it can be
beaten for speed by other methods when there is particular structure
to exploit (such as lots of zeros in the matrix). In this section, we
explore the family of methods that go right back to the technique
of Gaussian elimination, which you will have been familiar with since
secondary school.</p>
<section id="an-algorithm-for-lu-decomposition">
<h2>An algorithm for LU decomposition<a class="headerlink" href="#an-algorithm-for-lu-decomposition" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454095315" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>The computational way to view Gaussian elimination is through the LU
decomposition of an invertible matrix, <span class="math notranslate nohighlight">\(A=LU\)</span>, where <span class="math notranslate nohighlight">\(L\)</span> is lower
triangular (<span class="math notranslate nohighlight">\(l_{ij}=0\)</span> for <span class="math notranslate nohighlight">\(j>i\)</span>) and <span class="math notranslate nohighlight">\(U\)</span> is upper triangular
(<span class="math notranslate nohighlight">\(u_{ij}=0\)</span> for <span class="math notranslate nohighlight">\(j<i\)</span>). Here we use the symbol <span class="math notranslate nohighlight">\(U\)</span> instead of <span class="math notranslate nohighlight">\(R\)</span> to
emphasise that we are looking at square matrices. The process of
obtaining the <span class="math notranslate nohighlight">\(LU\)</span> decomposition is very similar to the Householder
algorithm, in that we repeatedly left multiply <span class="math notranslate nohighlight">\(A\)</span> by matrices to
transform below-diagonal entries in each column to zero, working from
the first to the last column. The difference is that whilst the
Householder algorithm left multiplies with unitary matrices, here,
we left multiply with lower triangular matrices.</p>
<p>The first step puts zeros below the first entry in the first column.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}\begin{split}A_1 = L_1A = \begin{pmatrix}
u_1 & v_2^1 & v_3^1 & \ldots & v_m^1 \\
\end{pmatrix},\end{split}\\\begin{split}\,
u_1 = \begin{pmatrix} u_{11} \\ 0 \\ \ldots \\ 0\end{pmatrix}.\end{split}\end{aligned}\end{align} \]</div>
</div></blockquote>
<p>Then, the next step puts zeros below the second entry in the second
column.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}\begin{split}A_2 = L_2L_1A = \begin{pmatrix}
u_1 & u_2 & v_3^2 & \ldots & v_m^2 \\
\end{pmatrix},\end{split}\\\begin{split}\,
u_2 = \begin{pmatrix} u_{12} \\ u_{22} \\ 0 \\ \ldots \\ 0 \\
\end{pmatrix}.\end{split}\end{aligned}\end{align} \]</div>
</div></blockquote>
<p>After repeated left multiplications we have</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[A_{m-1} = L_{m-1}\ldots L_2L_1A = U.\]
</div></blockquote>
<p>This process of transforming <span class="math notranslate nohighlight">\(A\)</span> to <span class="math notranslate nohighlight">\(U\)</span> is called Gaussian elimination.</p>
<p>If we assume (we will show this later) that all these lower triangular
matrices are invertible, we can define</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}L = (L_{m-1}\ldots L_2L_1)^{-1} = L_1^{-1}L_2^{-1}\ldots L_{m-1}^{-1},\\\mbox{ so that }\\L^{-1} = L_{m-1}\ldots L_2L_1.\end{aligned}\end{align} \]
</div></blockquote>
<p>Then we have <span class="math notranslate nohighlight">\(L^{-1}A = U\)</span>, i.e. <span class="math notranslate nohighlight">\(A=LU\)</span>.</p>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454096015" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>So, what’s the advantage of writing <span class="math notranslate nohighlight">\(A=LU\)</span>? Well, we can define
<span class="math notranslate nohighlight">\(y=Ux\)</span>. Then, we can solve <span class="math notranslate nohighlight">\(Ax=b\)</span> in two steps, first solving <span class="math notranslate nohighlight">\(Ly=b\)</span>
for <span class="math notranslate nohighlight">\(y\)</span>, and then solving <span class="math notranslate nohighlight">\(Ux=y\)</span> for <span class="math notranslate nohighlight">\(x\)</span>. The latter equation is an
upper triangular system that can be solved by the back
substitution algorithm we introduced for QR factorisation. The former
equation can be solved by forward substitution, derived in an analogous
way, written in pseudo-code as follows.</p>
<ul class="simple">
<li><p><span class="math notranslate nohighlight">\(x_1 \gets b_1/L_{11}\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(i= 2\)</span> TO <span class="math notranslate nohighlight">\(m\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(x_i \gets (b_i - \sum_{k=1}^{i-1}L_{ik}x_k)/L_{ii}\)</span></p></li>
</ul>
</li>
</ul>
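<p>The forward substitution pseudo-code above translates directly into NumPy. The following is a minimal sketch (it is not the interface used by the <code>cla_utils</code> exercises):</p>

```python
import numpy as np

def forward_sub(L, b):
    """Solve Lx = b for lower triangular L by forward substitution."""
    m = L.shape[0]
    x = np.zeros(m)
    x[0] = b[0] / L[0, 0]
    for i in range(1, m):
        # x_i = (b_i - sum_{k<i} L_{ik} x_k) / L_{ii}
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x
```

Each row <span class="math notranslate nohighlight">\(i\)</span> costs a length-<span class="math notranslate nohighlight">\(i\)</span> dot product, which is where the quadratic operation count comes from.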
<p>Forward substitution has an operation count that is identical to back
substitution, by symmetry, i.e. <span class="math notranslate nohighlight">\(\mathcal{O}(m^2)\)</span>. In contrast, we
shall see shortly that the Gaussian elimination process has an
operation count <span class="math notranslate nohighlight">\(\mathcal{O}(m^3)\)</span>. Hence, it is much cheaper to solve
a linear system with a given <span class="math notranslate nohighlight">\(LU\)</span> factorisation than it is to form <span class="math notranslate nohighlight">\(L\)</span>
and <span class="math notranslate nohighlight">\(U\)</span> in the first place. We can take advantage of this in the
situation where we have to solve a whole sequence of linear systems
<span class="math notranslate nohighlight">\(Ax=b_i\)</span>, <span class="math notranslate nohighlight">\(i=1,2,\ldots,K\)</span>, with the same matrix <span class="math notranslate nohighlight">\(A\)</span> but different
right hand side vectors. In this case we can pay the cost of forming
<span class="math notranslate nohighlight">\(LU\)</span> once, and then use forward and back substitution to cheaply solve
each system. This is particularly useful when we need to repeatedly
solve systems as part of larger iterative algorithms, such as time
integration methods or Monte Carlo methods.</p>
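<p>If SciPy is available, this “factor once, solve many” pattern is exactly what <code>scipy.linalg.lu_factor</code> and <code>scipy.linalg.lu_solve</code> provide. A short illustration (the exercises in this course build the same pieces by hand):</p>

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 3))      # three right-hand sides, stored as columns

lu, piv = lu_factor(A)               # O(m^3) factorisation, paid once
X = lu_solve((lu, piv), B)           # O(m^2) forward + back substitution per column
print(np.allclose(A @ X, B))         # True
```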
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454096580" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>So, we need to find lower triangular matrices <span class="math notranslate nohighlight">\(L_k\)</span> that do not change
the first <span class="math notranslate nohighlight">\(k\)</span> rows, and transform the <span class="math notranslate nohighlight">\(k\)</span>-th column <span class="math notranslate nohighlight">\(x_k\)</span> of <span class="math notranslate nohighlight">\(A_k\)</span>
as follows.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}Lx_k = L\begin{pmatrix}
x_{1k}\\
\vdots\\
x_{kk}\\
x_{k+1,k}\\
\vdots\\
x_{m,k}\\
\end{pmatrix}
= \begin{pmatrix}
x_{1k}\\
\vdots\\
x_{kk}\\
0 \\
\vdots\\
0 \\
\end{pmatrix}.\end{split}\]</div>
</div></blockquote>
<p>As before with the Householder method, we see that we need the top-left
<span class="math notranslate nohighlight">\(k\times k\)</span> submatrix of <span class="math notranslate nohighlight">\(L\)</span> to be the identity (so that it doesn’t change
the first <span class="math notranslate nohighlight">\(k\)</span> rows). We claim that the following matrix transforms
<span class="math notranslate nohighlight">\(x_k\)</span> to the required form.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}\begin{split}L_k = \begin{pmatrix}
1 & 0 & 0 & \ldots & 0 & \ldots & \ldots & \ldots & 0 \\
0 & 1 & 0 & \ldots & 0 & \ldots & \ldots& \vdots & 0 \\
0 & 0 & 1 & \ldots & 0 & \ldots & \ldots & \vdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots & \vdots & \vdots & \vdots & 0 \\
\vdots & \ddots & \ddots & \ddots & 1 & 0 & \ldots & \vdots & 0 \\
\vdots & \ddots & \ddots & \ddots & -l_{k+1,k} & 1 & \ldots & \vdots & 0 \\
\vdots & \ddots & \ddots & \ddots & -l_{k+2,k} & 0 & \ddots & \vdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots & 0 & \ldots & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & -l_{m,k} & 0 & \ldots & \ldots &1 \\
\end{pmatrix},\end{split}\\\quad\\\begin{split}l_k = \begin{pmatrix}
0 \\
0 \\
0 \\
\vdots \\
0 \\
l_{k+1,k}=x_{k+1,k}/x_{kk} \\
l_{k+2,k}= x_{k+2,k}/x_{kk} \\
\vdots\\
l_{m,k} = x_{m,k}/x_{kk} \\
\end{pmatrix}.\end{split}\end{aligned}\end{align} \]</div>
</div></blockquote>
<p>This has the identity block as required, and we can verify that <span class="math notranslate nohighlight">\(L_k\)</span>
puts zeros in the entries of <span class="math notranslate nohighlight">\(x_k\)</span> below the diagonal by first writing
<span class="math notranslate nohighlight">\(L_k = I - l_ke_k^*\)</span>. Then,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[L_kx_k = (I - l_ke_k^*)x_k = x_k - l_k\underbrace{(e_k^*x_k)}_{=x_{kk}},\]
</div></blockquote>
<p>which subtracts off the below diagonal entries as required. Indeed,
multiplication by <span class="math notranslate nohighlight">\(L_k\)</span> implements the row operations that are performed
to transform below diagonal elements of <span class="math notranslate nohighlight">\(A_k\)</span> to zero during Gaussian
elimination.</p>
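<p>The formula <span class="math notranslate nohighlight">\(L_k = I - l_ke_k^*\)</span> is easy to check numerically; the following sketch (using made-up numbers) builds one such matrix and applies it to a column:</p>

```python
import numpy as np

m, k = 5, 2                          # zero the entries below position k (1-based) of x_k
x = np.array([3.0, 1.0, 2.0, 4.0, 6.0])
e = np.zeros(m); e[k - 1] = 1.0      # standard basis vector e_k
l = np.zeros(m)
l[k:] = x[k:] / x[k - 1]             # multipliers l_{jk} = x_{jk} / x_{kk}
Lk = np.eye(m) - np.outer(l, e)      # L_k = I - l_k e_k^*
print(Lk @ x)                        # [3. 1. 0. 0. 0.]
# the inverse is I + l_k e_k^* (verified in the text below):
assert np.allclose((np.eye(m) + np.outer(l, e)) @ Lk, np.eye(m))
```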
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454097320" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>The determinant of a triangular matrix is equal to the
product of its diagonal entries, so <span class="math notranslate nohighlight">\(\det(L_k)=1\)</span>, and consequently
<span class="math notranslate nohighlight">\(L_k\)</span> is invertible, enabling us to define <span class="math notranslate nohighlight">\(L^{-1}\)</span> as above.
To form <span class="math notranslate nohighlight">\(L\)</span> we need to multiply the inverses of all the <span class="math notranslate nohighlight">\(L_k\)</span> matrices
together, also as above. To do this, we first note that <span class="math notranslate nohighlight">\(e_k^*l_k=0\)</span>
(because <span class="math notranslate nohighlight">\(l_k\)</span> is zero in the only entry where <span class="math notranslate nohighlight">\(e_k\)</span> is nonzero). Then
we claim that <span class="math notranslate nohighlight">\(L_k^{-1}=I + l_ke_k^*\)</span>, which we verify as follows.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}(I + l_ke_k^*)L_k = (I + l_ke_k^*)(I - l_ke_k^*)
= I + l_ke_k^* - l_ke_k^* - (l_ke_k^*)(l_ke_k^*)\\= I - \underbrace{l_k(e_k^*l_k)e_k^*}_{=0} = I,\end{aligned}\end{align} \]
</div></blockquote>
<p>as required. Similarly if we multiply the inverse lower triangular
matrices from two consecutive iterations, we get</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}L_k^{-1}L_{k+1}^{-1} = (I + l_ke_k^*)(I + l_{k+1}e_{k+1}^*)
= I + l_ke_k^* + l_{k+1}e_{k+1}^* + l_k\underbrace{(e_k^*l_{k+1})}_{=0}e_{k+1}^*\\= I + l_ke_k^* + l_{k+1}e_{k+1}^*,\end{aligned}\end{align} \]</div>
</div></blockquote>
<p>since <span class="math notranslate nohighlight">\(e_k^*l_{k+1}=0\)</span> too, as <span class="math notranslate nohighlight">\(l_{k+1}\)</span> is zero in the only place
where <span class="math notranslate nohighlight">\(e_k\)</span> is nonzero. If we iterate this argument, we get</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[L = I + \sum_{i=1}^{m-1}l_ie_i^*.\]</div>
</div></blockquote>
<p>Hence, the <span class="math notranslate nohighlight">\(k\)</span>-th column of <span class="math notranslate nohighlight">\(L\)</span> is the same as the <span class="math notranslate nohighlight">\(k\)</span>-th column of <span class="math notranslate nohighlight">\(L_k^{-1}\)</span>,
i.e.,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}L = \begin{pmatrix}
1 & 0 & 0 & \ldots & 0 & \ldots & \ldots & \ldots & 0 \\
l_{21} & 1 & 0 & \ldots & 0 & \ldots & \ldots& \vdots & 0 \\
l_{31} & l_{32} & 1 & \ldots & 0 & \ldots & \ldots & \vdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots & \vdots & \vdots & \vdots & 0 \\
\vdots & \ddots & \ddots & \ddots & 1 & 0 & \ldots & \vdots & 0 \\
\vdots & \ddots & \ddots & \ddots & l_{k+1,k} & 1 & \ldots & \vdots & 0 \\
\vdots & \ddots & \ddots & \ddots & l_{k+2,k} & l_{k+2,k+1} & \ddots & \vdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots & l_{m-1,k+1} & \ldots & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & l_{m,k} & l_{m,k+1} & \ldots & \ldots &1 \\
\end{pmatrix}.\end{split}\]</div>
</div></blockquote>
<p>In summary, we can compute entries of <span class="math notranslate nohighlight">\(L\)</span> during the Gaussian elimination
process of transforming <span class="math notranslate nohighlight">\(A\)</span> to <span class="math notranslate nohighlight">\(U\)</span>. Note that the matrices <span class="math notranslate nohighlight">\(L_1,L_2,\ldots\)</span>
should not be explicitly formed during the elimination process; they are just
a mathematical concept to translate from the row operations into the final
<span class="math notranslate nohighlight">\(L\)</span> matrix.</p>
<div class="proof proof-type-exercise" id="id1">
<div class="proof-title">
<span class="proof-type">Exercise </span>
</div><div class="proof-content">
<p>Having said that, let’s take a moment to compute some examples
using the <span class="math notranslate nohighlight">\(L_1,L_2,\ldots\)</span> matrices (to help with understanding).
The <code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises6.get_Lk()</span></code> function has been left
unimplemented. It should return one of these matrices given the
<span class="math notranslate nohighlight">\(l_k\)</span> entries. The test script <code class="docutils literal notranslate"><span class="pre">test_exercises6.py</span></code> in the
<code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test this function.</p>
<p>Once it passes the tests, experiment with the inverse and
multiplication properties above, to verify that they work.</p>
</div></div><div class="admonition warning">
<p class="admonition-title">Warning</p>
<p>There is a sign error in the test for the above exercise. But this
is not currently part of the course so we are not fixing it currently.</p>
</div>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454098164" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>The Gaussian elimination algorithm is written in pseudo-code as
follows. We start by copying <span class="math notranslate nohighlight">\(A\)</span> into <span class="math notranslate nohighlight">\(U\)</span>, and setting <span class="math notranslate nohighlight">\(L\)</span> to
an identity matrix, and then work “in-place” i.e. replacing values
of <span class="math notranslate nohighlight">\(U\)</span> and <span class="math notranslate nohighlight">\(L\)</span> until they are completed. In a computer implementation,
this memory should be preallocated and then written to instead of
making copies (which carries overheads).</p>
<ul class="simple">
<li><p><span class="math notranslate nohighlight">\(U \gets A\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(L \gets I\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(k=1\)</span> TO <span class="math notranslate nohighlight">\(m-1\)</span></p>
<ul>
<li><p>FOR <span class="math notranslate nohighlight">\(j=k+1\)</span> TO <span class="math notranslate nohighlight">\(m\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(l_{jk} \gets u_{jk}/u_{kk}\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(u_{j,k:m} \gets u_{j,k:m} - l_{jk}u_{k,k:m}\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
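<p>This pseudo-code transcribes directly into NumPy. A sketch of the two-matrix version (the course exercises ask for the memory-efficient in-place variant instead):</p>

```python
import numpy as np

def lu_decompose(A):
    """Return (L, U) with A = LU, no pivoting. Assumes all pivots are nonzero."""
    m = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(m)
    for k in range(m - 1):
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]              # l_{jk} = u_{jk} / u_{kk}
        U[k + 1:, k:] -= np.outer(L[k + 1:, k], U[k, k:])  # row operations on U
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
L, U = lu_decompose(A)
print(np.allclose(L @ U, A))   # True
```

The inner <span class="math notranslate nohighlight">\(j\)</span> loop has been replaced by whole-slice operations, but the arithmetic performed is identical.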
<p>To do an operation count for this algorithm, we note that the
dominating operation is the update of <span class="math notranslate nohighlight">\(U\)</span> inside the <span class="math notranslate nohighlight">\(j\)</span> loop. This
requires <span class="math notranslate nohighlight">\(m-k+1\)</span> multiplications and subtractions, and is iterated
<span class="math notranslate nohighlight">\(m-k\)</span> times in the <span class="math notranslate nohighlight">\(j\)</span> loop, and this whole thing is iterated from
<span class="math notranslate nohighlight">\(k=1\)</span> to <span class="math notranslate nohighlight">\(m-1\)</span> in the <span class="math notranslate nohighlight">\(k\)</span> loop. Hence the asymptotic operation count is</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}N_{\mbox{FLOPs}} = \sum_{k=1}^{m-1}\sum_{j=k+1}^m 2(m-k+1),\\= \sum_{k=1}^{m-1}2(m-k+1)\underbrace{\sum_{j={k+1}}^m 1}_{=m-k}\\\sim \sum_{k=1}^{m-1}2m^2 - 4mk + 2k^2\\\sim 2m^3 - 2m^3 + \frac{2m^3}{3} = \frac{2m^3}{3}.\end{aligned}\end{align} \]
</div></blockquote>
<div class="proof proof-type-exercise" id="id2">
<div class="proof-title">
<span class="proof-type">Exercise </span>
</div><div class="proof-content">
<p>Since the diagonal entries of <span class="math notranslate nohighlight">\(L\)</span> are all ones, the total amount of
combined memory required to store <span class="math notranslate nohighlight">\(L\)</span> and <span class="math notranslate nohighlight">\(U\)</span> is the same as the
amount of memory required to store <span class="math notranslate nohighlight">\(A\)</span>. Further, each iteration of
the LU factorisation algorithm computes one column of <span class="math notranslate nohighlight">\(L\)</span> and one
row of <span class="math notranslate nohighlight">\(U\)</span>, and the corresponding column and row of <span class="math notranslate nohighlight">\(A\)</span> are not
needed for the rest of the algorithm. This creates the opportunity
for a memory-efficient ‘in-place’ algorithm in which the matrix <span class="math notranslate nohighlight">\(A\)</span>
is modified until it contains the values for <span class="math notranslate nohighlight">\(L\)</span> and <span class="math notranslate nohighlight">\(U\)</span>.</p>
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The <code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises6.LU_inplace()</span></code> function
has been left unimplemented. It should implement this in-place
low-storage procedure, applying the changes to the provided matrix
<span class="math notranslate nohighlight">\(A\)</span>. The test script <code class="docutils literal notranslate"><span class="pre">test_exercises6.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code>
directory will test this function.</p>
</div></div><div class="proof proof-type-exercise" id="id3">
<div class="proof-title">
<span class="proof-type">Exercise </span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The LU factorisation requires 3 loops (this is why it
has a cubic FLOP count). In the algorithm above, there are two
explicit loops and one implicit one (in the slice notation). It is
possible to rewrite this in a single loop, using an outer
product. Identify this outer product, and update
<code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises6.LU_inplace()</span></code> to make use of this
reformulation (using <a class="reference external" href="https://numpy.org/doc/stable/reference/generated/numpy.outer.html#numpy.outer" title="(in NumPy v2.2)"><code class="xref py py-func docutils literal notranslate"><span class="pre">numpy.outer()</span></code></a>). Do you notice any
improvement in speed?</p>
</div></div><div class="proof proof-type-exercise" id="id4">
<div class="proof-title">
<span class="proof-type">Exercise </span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The function <code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises6.solve_L()</span></code> has
been left unimplemented. It should use forward substitution to
solve lower triangular systems. The interfaces are set so that
multiple right hand sides can be provided and solved at the same
time. The functions should only use one loop over the rows of <span class="math notranslate nohighlight">\(L\)</span>,
to efficiently solve the multiple problems. The test script
<code class="docutils literal notranslate"><span class="pre">test_exercises6.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test these
functions.</p>
</div></div><div class="proof proof-type-exercise" id="id5">
<div class="proof-title">
<span class="proof-type">Exercise </span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> Propose an algorithm to use the LU factorisation to
compute the inverse of a matrix. The functions
<code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises6.inverse_LU()</span></code> has been left
unimplemented. Complete it using your algorithm, using functions
developed in the previous exercises where possible. The test script
<code class="docutils literal notranslate"><span class="pre">test_exercises6.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test these
functions.</p>
</div></div></section>
<section id="pivoting">
<h2>Pivoting<a class="headerlink" href="#pivoting" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454098919" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454108809" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>Gaussian elimination will fail if a zero appears on the diagonal,
i.e. we get <span class="math notranslate nohighlight">\(x_{kk}=0\)</span> (since then we can’t divide by it). Similarly,
Gaussian elimination will amplify rounding errors if <span class="math notranslate nohighlight">\(x_{kk}\)</span> is very
small, because a small error becomes large after dividing by <span class="math notranslate nohighlight">\(x_{kk}\)</span>.
The solution is to reorder the rows in <span class="math notranslate nohighlight">\(A_k\)</span> so that <span class="math notranslate nohighlight">\(x_{kk}\)</span> has
maximum magnitude. This would seem to mess up the <span class="math notranslate nohighlight">\(LU\)</span> factorisation
procedure. However, it is not as bad as it looks, as we will now
see.</p>
<p>The main tool is the permutation matrix.</p>
<div class="proof proof-type-definition" id="id6">
<div class="proof-title">
<span class="proof-type">Definition </span>
<span class="proof-title-name">(Permutation matrix)</span>
</div><div class="proof-content">
<p>An <span class="math notranslate nohighlight">\(m\times m\)</span> permutation matrix has precisely one entry equal to
1 in every row and column, and zero elsewhere.</p>
</div></div><p>A compact way to store a permutation matrix <span class="math notranslate nohighlight">\(P\)</span> is as a size-<span class="math notranslate nohighlight">\(m\)</span> vector
<span class="math notranslate nohighlight">\(p\)</span>, where <span class="math notranslate nohighlight">\(p_i\)</span> is equal to the number of the column containing the 1
entry in row <span class="math notranslate nohighlight">\(i\)</span> of <span class="math notranslate nohighlight">\(P\)</span>. Multiplying a vector <span class="math notranslate nohighlight">\(x\)</span> by a permutation
matrix <span class="math notranslate nohighlight">\(P\)</span> simply rearranges the entries in <span class="math notranslate nohighlight">\(x\)</span>, with <span class="math notranslate nohighlight">\((Px)_i =
x_{p_i}\)</span>.</p>
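<p>In NumPy (which is 0-based, unlike the 1-based convention above) the vector representation is just fancy indexing; a small sketch with made-up values:</p>

```python
import numpy as np

p = np.array([2, 0, 1])           # p[i] = column of the 1 in row i (0-based)
P = np.eye(3)[p]                  # dense permutation matrix built from p
x = np.array([10.0, 20.0, 30.0])
print(P @ x)                      # [30. 10. 20.]
# (Px)_i = x_{p_i}: applying P is just indexing, no matrix product needed
assert np.allclose(P @ x, x[p])
```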
<p>During Gaussian elimination, say that we are at stage <span class="math notranslate nohighlight">\(k\)</span>, and
<span class="math notranslate nohighlight">\((A_k)_{kk}\)</span> is not the largest magnitude entry in the <span class="math notranslate nohighlight">\(k\)</span>-th column of
<span class="math notranslate nohighlight">\(A_k\)</span>. We reorder the rows to fix this, and this is what we call
<em>pivoting</em>. Mathematically this reordering is equivalent to
multiplication by a permutation matrix <span class="math notranslate nohighlight">\(P_k\)</span>. Then we continue the
Gaussian elimination procedure by left multiplying by <span class="math notranslate nohighlight">\(L_k\)</span>, placing
zeros below the diagonal in column <span class="math notranslate nohighlight">\(k\)</span> of <span class="math notranslate nohighlight">\(P_kA_k\)</span>.</p>
<p>In fact, <span class="math notranslate nohighlight">\(P_k\)</span> is a very specific type of permutation matrix, that only
swaps two rows. Therefore, <span class="math notranslate nohighlight">\(P_k^{-1}=P_k\)</span>, even though this is not
true for general permutation matrices.</p>
<p>We can pivot at every stage of the procedure, producing a permutation
matrix <span class="math notranslate nohighlight">\(P_k\)</span>, <span class="math notranslate nohighlight">\(k=1,\ldots, {m-1}\)</span> (if no pivoting is necessary at a given
stage, then we just take the identity matrix as the pivoting matrix
for that stage). Then, we end up with the result of Gaussian elimination
with pivoting,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[L_{m-1}P_{m-1}\ldots L_2P_2L_1P_1A = U.\]</div>
</div></blockquote>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454109227" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>This looks like it has totally messed up the LU factorisation, because
<span class="math notranslate nohighlight">\(LP\)</span> is not lower triangular for general lower triangular matrix <span class="math notranslate nohighlight">\(L\)</span>
and permutation matrix <span class="math notranslate nohighlight">\(P\)</span>. However, we can save the situation by
trying to swap all the permutation matrices to the right of all of the
<span class="math notranslate nohighlight">\(L\)</span> matrices. This does change the <span class="math notranslate nohighlight">\(L\)</span> matrices, because matrix-matrix
multiplication is not commutative. However, we shall see that it does
preserve the lower triangular matrix structure.</p>
<p>To see how this is done, we focus on how things look after two stages
of Gaussian elimination. We have</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[A_2 = L_2P_2L_1P_1A = L_2\underbrace{P_2L_1P_2}_{=L_1^{(2)}}P_2P_1A
= L_2L_1^{(2)}P_2P_1A,\]</div>
</div></blockquote>
<p>having used <span class="math notranslate nohighlight">\(P_2^{-1}=P_2\)</span>. Left multiplication with <span class="math notranslate nohighlight">\(P_2\)</span> exchanges
row 2 with some other row <span class="math notranslate nohighlight">\(j\)</span> with <span class="math notranslate nohighlight">\(j>2\)</span>. Hence, right multiplication
with <span class="math notranslate nohighlight">\(P_2\)</span> does the same thing but with columns instead of rows.
Therefore, <span class="math notranslate nohighlight">\(L_1P_2\)</span> is the same as <span class="math notranslate nohighlight">\(L_1\)</span> but with column 2 exchanged
with column <span class="math notranslate nohighlight">\(j\)</span>. Column 2 is just <span class="math notranslate nohighlight">\(e_2\)</span> and column <span class="math notranslate nohighlight">\(j\)</span> is just <span class="math notranslate nohighlight">\(e_j\)</span>,
so now column 2 has the 1 in row <span class="math notranslate nohighlight">\(j\)</span> and column <span class="math notranslate nohighlight">\(j\)</span> has the 1 in
row 2. Then, <span class="math notranslate nohighlight">\(P_2L_1P_2\)</span> exchanges row 2 of <span class="math notranslate nohighlight">\(L_1P_2\)</span> with row <span class="math notranslate nohighlight">\(j\)</span> of
<span class="math notranslate nohighlight">\(L_1P_2\)</span>. This just exchanges <span class="math notranslate nohighlight">\(l_{12}\)</span> with <span class="math notranslate nohighlight">\(l_{1j}\)</span>, and swaps the
1s in columns 2 and <span class="math notranslate nohighlight">\(j\)</span> back to the diagonal. In summary, <span class="math notranslate nohighlight">\(P_2L_1P_2\)</span>
is the same as <span class="math notranslate nohighlight">\(L_1\)</span> but with <span class="math notranslate nohighlight">\(l_{12}\)</span> exchanged with <span class="math notranslate nohighlight">\(l_{1j}\)</span>.</p>
<p>Moving on to the next stage, we have</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[A_3 = L_3P_3L_2L_1P_2P_1A = L_3\underbrace{P_3L_2P_3}_{=L_2^{(3)}}
\underbrace{P_3L_1P_3}_{=L_1^{(3)}}P_3P_2P_1A.\]</div>
</div></blockquote>
<p>By similar arguments we see that <span class="math notranslate nohighlight">\(L_2^{(3)}\)</span> is the same as <span class="math notranslate nohighlight">\(L_2\)</span> but
with <span class="math notranslate nohighlight">\(l_{23}\)</span> exchanged with <span class="math notranslate nohighlight">\(l_{2j}\)</span> for some (different) <span class="math notranslate nohighlight">\(j\)</span>, and
<span class="math notranslate nohighlight">\(L_1^{(3)}\)</span> is the same as <span class="math notranslate nohighlight">\(L_1^{(2)}\)</span> but with <span class="math notranslate nohighlight">\(l_{13}\)</span> exchanged with
<span class="math notranslate nohighlight">\(l_{1j}\)</span>. Iterating this argument, we obtain</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\underbrace{L_{m-1}^{(m-1)}\ldots L_2^{(m-1)}L_1^{(m-1)}}_{L^{-1}}
\underbrace{P_{m-1}\ldots P_2P_1}_PA = U,\]</div>
</div></blockquote>
<p>where we just need to keep track of the permutations in the <span class="math notranslate nohighlight">\(L\)</span>
matrices as we go through the Gaussian elimination stages. These <span class="math notranslate nohighlight">\(L\)</span>
matrices have the same structure as the basic LU factorisation, and hence
we obtain</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[L^{-1}PA = U \implies PA = LU.\]</div>
</div></blockquote>
<p>This is equivalent to permuting the rows of <span class="math notranslate nohighlight">\(A\)</span> using <span class="math notranslate nohighlight">\(P\)</span> and then
finding the LU factorisation using the basic algorithm (except we
can’t implement it like that because we only decide how to build <span class="math notranslate nohighlight">\(P\)</span>
during the Gaussian elimination process).</p>
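<p>As a quick sanity check of the relation PA = LU, one can use SciPy (note that <code>scipy.linalg.lu</code> returns matrices with A = PLU, so its permutation is the transpose of the P used here):</p>

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

# scipy.linalg.lu returns P, L, U with A = P @ L @ U, i.e. its P is the
# transpose (= inverse) of the permutation in the convention PA = LU.
P, L, U = lu(A)

assert np.allclose(P.T @ A, L @ U)   # PA = LU in the notes' convention
assert np.allclose(np.tril(L), L)    # L is unit lower triangular
assert np.allclose(np.triu(U), U)    # U is upper triangular
```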
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454109660" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>The LU factorisation with pivoting can be expressed in the following
pseudo-code.</p>
<ul class="simple">
<li><p><span class="math notranslate nohighlight">\(U\gets A\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(L\gets I\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(P\gets I\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(k=1\)</span> TO <span class="math notranslate nohighlight">\(m-1\)</span></p>
<ul>
<li><p>Choose <span class="math notranslate nohighlight">\(i\geq k\)</span> to maximise <span class="math notranslate nohighlight">\(|u_{ik}|\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(u_{k,k:m} \leftrightarrow u_{i,k:m}\)</span> (row swaps)</p></li>
<li><p><span class="math notranslate nohighlight">\(l_{k,1:k-1} \leftrightarrow l_{i,1:k-1}\)</span> (row swaps)</p></li>
<li><p><span class="math notranslate nohighlight">\(p_{k,1:m} \leftrightarrow p_{i,1:m}\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(j=k+1\)</span> TO <span class="math notranslate nohighlight">\(m\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(l_{jk} \gets u_{jk}/u_{kk}\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(u_{j,k:m} \gets u_{j,k:m} - l_{jk}u_{k,k:m}\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
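<p>A direct (unoptimised) NumPy translation of this pseudo-code might look as follows; the function name <code>lup</code> is our own, and for clarity the permutation is stored as a dense matrix rather than an index vector:</p>

```python
import numpy as np

def lup(A):
    """LU factorisation with partial pivoting: returns P, L, U with PA = LU.

    A direct translation of the pseudo-code above (a sketch, not an
    optimised implementation).
    """
    m = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(m)
    P = np.eye(m)
    for k in range(m - 1):
        i = k + np.argmax(np.abs(U[k:, k]))  # choose i >= k maximising |u_ik|
        U[[k, i], k:] = U[[i, k], k:]        # row swaps
        L[[k, i], :k] = L[[i, k], :k]
        P[[k, i], :] = P[[i, k], :]
        for j in range(k + 1, m):
            L[j, k] = U[j, k] / U[k, k]
            U[j, k:] -= L[j, k] * U[k, k:]
    return P, L, U
```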
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454110324" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>To solve a system <span class="math notranslate nohighlight">\(Ax=b\)</span> given a pivoted LU factorisation <span class="math notranslate nohighlight">\(PA=LU\)</span>,
we left multiply the equation by <span class="math notranslate nohighlight">\(P\)</span> and use the factorisation to get
<span class="math notranslate nohighlight">\(LUx=Pb\)</span>. The procedure is then as before, but <span class="math notranslate nohighlight">\(b\)</span> must be permuted to
<span class="math notranslate nohighlight">\(Pb\)</span> before doing the forward and back substitutions.</p>
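<p>In SciPy this workflow is packaged as <code>lu_factor</code> (which returns L and U packed into a single array, plus the pivot indices) and <code>lu_solve</code> (which permutes b and performs the forward and back substitutions):</p>

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

# lu_factor stores L and U in one array plus the pivot indices;
# lu_solve then permutes b and does the forward/back substitutions.
lu_piv = lu_factor(A)
x = lu_solve(lu_piv, b)

assert np.allclose(A @ x, b)
```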
<p>We call this strategy <em>partial pivoting</em>. In contrast, <em>complete
pivoting</em> additionally employs permutations <span class="math notranslate nohighlight">\(Q_k\)</span> on the right that
swap columns of <span class="math notranslate nohighlight">\(A_k\)</span> as well as the rows swapped by the permutations
<span class="math notranslate nohighlight">\(P_k\)</span>. By similar arguments, one can obtain the LU factorisation with
complete pivoting, <span class="math notranslate nohighlight">\(PAQ=LU\)</span>.</p>
<div class="proof proof-type-exercise" id="id7">
<div class="proof-title">
<span class="proof-type">Exercise </span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The function <code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises7.perm()</span></code> has
been left unimplemented. It should take an <span class="math notranslate nohighlight">\(m\times m\)</span> permutation
matrix <span class="math notranslate nohighlight">\(P\)</span>, stored as an (integer-valued) array of indices
<span class="math notranslate nohighlight">\(p\in\mathbb{N}^m\)</span> so that <span class="math notranslate nohighlight">\((Px)_i = x_{p_i}\)</span>, <span class="math notranslate nohighlight">\(i=1,2,\ldots, m\)</span>,
and replace it with the matrix <span class="math notranslate nohighlight">\(P_{i,j}P\)</span> (also stored as an array
of indices) where <span class="math notranslate nohighlight">\(P_{i,j}\)</span> is the permutation matrix that
exchanges the entries <span class="math notranslate nohighlight">\(i\)</span> and <span class="math notranslate nohighlight">\(j\)</span>. The test script
<code class="docutils literal notranslate"><span class="pre">test_exercises7.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test this
function.</p>
</div></div><div class="proof proof-type-exercise" id="id8">
<div class="proof-title">
<span class="proof-type">Exercise </span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The function <code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises7.LUP_inplace()</span></code>
has been left unimplemented. It should extend the in-place
algorithm for LU factorisation (with the outer-product formulation,
if you managed it) to the LUP factorisation. As well as computing L
and U “in place” in the array where the input A is stored, it will
compute a permutation matrix, which can and should be constructed
using <code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises7.perm()</span></code>. The test script
<code class="docutils literal notranslate"><span class="pre">test_exercises7.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test this
function.</p>
</div></div><div class="proof proof-type-exercise" id="id9">
<div class="proof-title">
<span class="proof-type">Exercise </span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The function <code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises7.solve_LUP()</span></code>
has been left unimplemented. It should use the LUP code that you
have written to solve the equation <span class="math notranslate nohighlight">\(Ax=b\)</span> for <span class="math notranslate nohighlight">\(x\)</span> given inputs <span class="math notranslate nohighlight">\(A\)</span>
and <span class="math notranslate nohighlight">\(b\)</span>. The test script <code class="docutils literal notranslate"><span class="pre">test_exercises7.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code>
directory will test this function.</p>
</div></div><div class="proof proof-type-exercise" id="id10">
<div class="proof-title">
<span class="proof-type">Exercise </span>
</div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> Show how to compute the determinant of <span class="math notranslate nohighlight">\(A\)</span> from the
LUP factorisation in <span class="math notranslate nohighlight">\(\mathcal{O}(m)\)</span> time (having already
constructed the LUP factorisation which costs
<span class="math notranslate nohighlight">\(\mathcal{O}(m^3)\)</span>). Complete the function
<code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises7.det_LUP()</span></code> to implement this
computation. The test script <code class="docutils literal notranslate"><span class="pre">test_exercises7.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code>
directory will test this function.</p>
</div></div></section>
<section id="stability-of-lu-factorisation">
<h2>Stability of LU factorisation<a class="headerlink" href="#stability-of-lu-factorisation" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454110810" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>To characterise the stability of LU factorisation, we quote the following
result.</p>
<div class="proof proof-type-theorem" id="id11">
<div class="proof-title">
<span class="proof-type">Theorem </span>
</div><div class="proof-content">
<p>Let <span class="math notranslate nohighlight">\(\tilde{L}\)</span> and <span class="math notranslate nohighlight">\(\tilde{U}\)</span> be the result of the Gaussian
elimination algorithm implemented in a floating point number system
satisfying axioms I and II. If no zero pivots are encountered, then</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\tilde{L}\tilde{U} = A + \delta A\]</div>
</div></blockquote>
<p>where</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\frac{\|\delta A\|}{\|L\|\|U\|} = \mathcal{O}(\varepsilon),\]</div>
</div></blockquote>
<p>for some perturbation <span class="math notranslate nohighlight">\(\delta A\)</span>.</p>
</div></div><p>The algorithm is backward stable if <span class="math notranslate nohighlight">\(\|L\|\|U\|=\mathcal{O}(\|A\|)\)</span>,
but there will be problems if <span class="math notranslate nohighlight">\(\|L\|\|U\|\gg \|A\|\)</span>. For a proof of this
result, see the textbook by Golub and van Loan.</p>
<p>A similar result exists for pivoted LU. The main extra issue is that
small changes could potentially lead to a different pivoting matrix
<span class="math notranslate nohighlight">\(\tilde{P}\)</span> which is then <span class="math notranslate nohighlight">\(O(1)\)</span> different from <span class="math notranslate nohighlight">\(P\)</span>. This is characterised
in the following result (which we also do not prove).</p>
<div class="proof proof-type-theorem" id="id12">
<div class="proof-title">
<span class="proof-type">Theorem </span>
</div><div class="proof-content">
<p>Let <span class="math notranslate nohighlight">\(\tilde{P}\)</span>, <span class="math notranslate nohighlight">\(\tilde{L}\)</span> and <span class="math notranslate nohighlight">\(\tilde{U}\)</span> be the result of the
partial pivoted Gaussian elimination algorithm implemented in a
floating point number system satisfying axioms I and II. If no zero
pivots are encountered, then</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\tilde{L}\tilde{U} = \tilde{P}A + \delta A\]</div>
</div></blockquote>
<p>where</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\frac{\|\delta A\|}{\|A\|} = \mathcal{O}(\rho\varepsilon),\]</div>
</div></blockquote>
<p>for some perturbation <span class="math notranslate nohighlight">\(\delta A\)</span>, and where <span class="math notranslate nohighlight">\(\rho\)</span> is the growth
factor,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\rho = \frac{\max_{ij}|u_{ij}|}{\max_{ij}|a_{ij}|}.\]</div>
</div></blockquote>
</div></div><p>Thus, partial pivoting can keep the entries in <span class="math notranslate nohighlight">\(L\)</span> under control
(and complete pivoting turns out not to help much more), but there can
still be pathological cases where the entries in <span class="math notranslate nohighlight">\(U\)</span> get large,
leading to large <span class="math notranslate nohighlight">\(\rho\)</span> and unstable computations.</p>
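<p>A classic pathological example (due to Wilkinson) has ones on the diagonal and in the last column, and -1 below the diagonal: partial pivoting performs no row swaps, the last column of U doubles at every stage, and the growth factor is 2 to the power m-1. A sketch:</p>

```python
import numpy as np
from scipy.linalg import lu

# Worst case for partial pivoting: the last column of U doubles at each
# stage of elimination, so rho = 2^(m-1) even though max|a_ij| = 1.
m = 20
A = np.tril(-np.ones((m, m)), -1) + np.eye(m)
A[:, -1] = 1.0

P, L, U = lu(A)
rho = np.abs(U).max() / np.abs(A).max()
assert np.isclose(rho, 2.0 ** (m - 1))  # exponential growth factor
```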
</section>
<section id="taking-advantage-of-matrix-structure">
<h2>Taking advantage of matrix structure<a class="headerlink" href="#taking-advantage-of-matrix-structure" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454111577" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>The cost of the standard Gaussian elimination algorithm to form <span class="math notranslate nohighlight">\(L\)</span>
and <span class="math notranslate nohighlight">\(U\)</span> is <span class="math notranslate nohighlight">\(\mathcal{O}(m^3)\)</span>, which grows rather quickly as <span class="math notranslate nohighlight">\(m\)</span>
increases. If there is structure in the matrix, then we can often
exploit this to reduce the cost. Understanding when and how to exploit
structure is a central theme in computational linear algebra.
Here we will discuss some examples of structure to be exploited.</p>
<p>When <span class="math notranslate nohighlight">\(A\)</span> is a lower or upper triangular matrix then we can use
forwards or back substitution, with <span class="math notranslate nohighlight">\(\mathcal{O}(m^2)\)</span> operation count
as previously discussed.</p>
<p>When <span class="math notranslate nohighlight">\(A\)</span> is a diagonal matrix, i.e. <span class="math notranslate nohighlight">\(A_{ij}=0\)</span> for <span class="math notranslate nohighlight">\(i\ne j\)</span>, it only
has <span class="math notranslate nohighlight">\(m\)</span> nonzero entries, that can be stored as a vector,
<span class="math notranslate nohighlight">\((A_{11},A_{22},\ldots,A_{mm})\)</span>. In this case, <span class="math notranslate nohighlight">\(Ax=b\)</span> can be solved in
<span class="math notranslate nohighlight">\(m\)</span> operations, just by setting <span class="math notranslate nohighlight">\(x_i=b_i/A_{ii}\)</span>, for
<span class="math notranslate nohighlight">\(i=1,2,\ldots,m\)</span>.</p>
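<p>In NumPy terms, with the diagonal stored as a vector, this is a single elementwise division:</p>

```python
import numpy as np

d = np.array([2.0, 4.0, 5.0])   # the diagonal of A, stored as a vector
b = np.array([4.0, 8.0, 15.0])

x = b / d                        # solves Ax = b in m operations

assert np.allclose(np.diag(d) @ x, b)
```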
<p>Similarly, if <span class="math notranslate nohighlight">\(A \in \mathcal{C}^{dm\times dm}\)</span> is block diagonal,
i.e.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}A = \begin{pmatrix}
B_{1} & 0 & \ldots & 0 \\
0 & B_{2} & \ldots & 0 \\
\vdots & \vdots & \ddots & 0 \\
0 & 0 & \ldots & B_{m}
\end{pmatrix},\end{split}\]</div>
</div></blockquote>
<p>where <span class="math notranslate nohighlight">\(B_{i}\in\mathcal{C}^{d\times d}\)</span> for <span class="math notranslate nohighlight">\(i=1,2,\ldots,m\)</span>, then the inverse
of <span class="math notranslate nohighlight">\(A\)</span> is</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}A^{-1} = \begin{pmatrix}
B_{1}^{-1} & 0 & \ldots & 0 \\
0 & B_{2}^{-1} & \ldots & 0 \\
\vdots & \vdots & \ddots & 0 \\
0 & 0 & \ldots & B_{m}^{-1}
\end{pmatrix}.\end{split}\]</div>
</div></blockquote>
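<p>Consequently a block-diagonal system can be solved block by block, at a cost of O(md&#179;) rather than O((md)&#179;). A sketch (the block size and right-hand side here are arbitrary; the shift by d&#183;I just makes the random blocks safely invertible):</p>

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(2)
d, nblocks = 3, 4
blocks = [rng.standard_normal((d, d)) + d * np.eye(d) for _ in range(nblocks)]
b = rng.standard_normal(d * nblocks)

# Solve each d x d block system independently: O(nblocks * d^3) work.
x = np.concatenate([np.linalg.solve(B, b[i * d:(i + 1) * d])
                    for i, B in enumerate(blocks)])

A = block_diag(*blocks)   # the full dm x dm matrix, for checking only
assert np.allclose(A @ x, b)
```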
<p>A generalisation of a diagonal matrix is a banded matrix, where
<span class="math notranslate nohighlight">\(A_{ij}=0\)</span> for <span class="math notranslate nohighlight">\(i>j+p\)</span> and for <span class="math notranslate nohighlight">\(i<j-q\)</span>. We call <span class="math notranslate nohighlight">\(p\)</span> the lower
bandwidth of <span class="math notranslate nohighlight">\(A\)</span> and <span class="math notranslate nohighlight">\(q\)</span> the upper bandwidth. When the matrix is
banded, there are already zeros below the diagonal of <span class="math notranslate nohighlight">\(A\)</span>, so we know
that the corresponding entries in the <span class="math notranslate nohighlight">\(L_k\)</span> matrices will be zero.
Further, because there are zeros above the diagonal of <span class="math notranslate nohighlight">\(A\)</span> beyond the
upper bandwidth, the corresponding entries do not need to be updated
when applying the row operations.</p>
<div class="proof proof-type-exercise" id="id13">
<div class="proof-title">
<span class="proof-type">Exercise </span>
</div><div class="proof-content">
<p>Construct the <span class="math notranslate nohighlight">\(100\times 100\)</span> matrix <span class="math notranslate nohighlight">\(A\)</span> as follows: take <span class="math notranslate nohighlight">\(A=3I\)</span>,
then set <span class="math notranslate nohighlight">\(A_{1,i}=1\)</span>, for <span class="math notranslate nohighlight">\(i=1,\ldots,100\)</span>. Then set <span class="math notranslate nohighlight">\(A_{i,1}=i\)</span> for
<span class="math notranslate nohighlight">\(i=1,\ldots,100\)</span>. Using your own LU factorisation code, compute the LU
factorisation of <span class="math notranslate nohighlight">\(A\)</span>. What
do you observe about the number of non-zero entries in <span class="math notranslate nohighlight">\(L\)</span> and <span class="math notranslate nohighlight">\(U\)</span>?
Explain this using what you have just learned about banded
matrices. Can the situation be improved by pivoting? (Just think about
it; you don’t need to implement it.)</p>
</div></div><p>The Gaussian elimination algorithm (without pivoting) for a banded
matrix is given as pseudo-code below.</p>
<ul>
<li><p><span class="math notranslate nohighlight">\(U \gets A\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(L \gets I\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(k=1\)</span> TO <span class="math notranslate nohighlight">\(m-1\)</span></p>
<ul>
<li><p>FOR <span class="math notranslate nohighlight">\(j=k+1\)</span> TO <span class="math notranslate nohighlight">\(\min(k+p,m)\)</span></p>
<blockquote>
<div><ul class="simple">
<li><p><span class="math notranslate nohighlight">\(l_{jk} \gets u_{jk}/u_{kk}\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(n \gets \min(k+q, m)\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(u_{j,k:n} \gets u_{j,k:n}- l_{jk}u_{k,k:n}\)</span></p></li>
</ul>
</div></blockquote>
</li>
<li><p>END FOR</p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
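<p>A direct NumPy translation of this banded pseudo-code (the function name <code>banded_lu</code> is our own; for simplicity it still stores the matrices densely, so only the flop count, not the memory, is reduced):</p>

```python
import numpy as np

def banded_lu(A, p, q):
    """LU factorisation (no pivoting) of A with lower bandwidth p and
    upper bandwidth q; a direct translation of the pseudo-code above."""
    m = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(m)
    for k in range(m - 1):
        n = min(k + q + 1, m)                  # only columns k..k+q change
        for j in range(k + 1, min(k + p + 1, m)):
            L[j, k] = U[j, k] / U[k, k]
            U[j, k:n] -= L[j, k] * U[k, k:n]
    return L, U
```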
<p>The operation count for this banded matrix algorithm is
<span class="math notranslate nohighlight">\(\mathcal{O}(mpq)\)</span>, which is linear in <span class="math notranslate nohighlight">\(m\)</span> instead of cubic!
Further, the resulting matrix <span class="math notranslate nohighlight">\(L\)</span> has lower bandwidth <span class="math notranslate nohighlight">\(p\)</span>
and <span class="math notranslate nohighlight">\(U\)</span> has upper bandwidth <span class="math notranslate nohighlight">\(q\)</span>. This means that we can also
exploit this structure in the forward and back substitution
algorithms as well. For example, the forward substitution algorithm
is given as pseudo-code below.</p>
<ul class="simple">
<li><p><span class="math notranslate nohighlight">\(x_1 \gets b_1/L_{11}\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(k=2\)</span> TO <span class="math notranslate nohighlight">\(m\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(j \gets \max(1, k-p)\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(x_k \gets \frac{b_k -L_{k,j:k-1}x_{j:k-1}}{L_{kk}}\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
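<p>And a translation of the banded forward substitution (again, <code>banded_forward_sub</code> is our own name; each step touches at most p previously computed entries of x):</p>

```python
import numpy as np

def banded_forward_sub(L, b, p):
    """Solve Lx = b for lower-triangular L with lower bandwidth p,
    in O(mp) operations; a direct translation of the pseudo-code above."""
    m = L.shape[0]
    x = np.zeros(m)
    x[0] = b[0] / L[0, 0]
    for k in range(1, m):
        j = max(0, k - p)                       # only p entries contribute
        x[k] = (b[k] - L[k, j:k] @ x[j:k]) / L[k, k]
    return x
```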
<p>This has an operation count <span class="math notranslate nohighlight">\(\mathcal{O}(mp)\)</span>. The story is
very similar for the back substitution.</p>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454112153" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>Another example that we have already encountered is unitary matrices
<span class="math notranslate nohighlight">\(Q\)</span>. Since <span class="math notranslate nohighlight">\(Q^{-1}=Q^*\)</span>, solving the system <span class="math notranslate nohighlight">\(Qx=b\)</span> is just the cost of
applying <span class="math notranslate nohighlight">\(Q^*\)</span>, with operation count <span class="math notranslate nohighlight">\(\mathcal{O}(m^2)\)</span>.</p>
<p>An important matrix that we shall encounter later is the upper
Hessenberg matrix, which has a lower bandwidth of 1 but no particular
zero structure above the diagonal. In this case, the <span class="math notranslate nohighlight">\(L\)</span> matrix is
still banded (with lower bandwidth 1) but the <span class="math notranslate nohighlight">\(U\)</span> matrix is not. This
means that there are still savings due to the zeros in <span class="math notranslate nohighlight">\(L\)</span>, but work
has to be done on the entire column of <span class="math notranslate nohighlight">\(U\)</span> above the diagonal, and so
solving an upper Hessenberg system has operation count
<span class="math notranslate nohighlight">\(\mathcal{O}(m^2)\)</span>.</p>
</section>
<section id="cholesky-factorisation">
<h2>Cholesky factorisation<a class="headerlink" href="#cholesky-factorisation" title="Link to this heading">¶</a></h2>
<p>An example of extra structure which we shall discuss in a bit more
detail is the case of Hermitian positive definite matrices. Recall
that a Hermitian matrix satisfies <span class="math notranslate nohighlight">\(A^*=A\)</span>, whilst positive definite
means that</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[x^*Ax > 0, \, \forall x \neq 0.\]</div>
</div></blockquote>
<p>When <span class="math notranslate nohighlight">\(A\)</span> is Hermitian positive definite, it is possible to find an
upper triangular matrix <span class="math notranslate nohighlight">\(R\)</span> such that <span class="math notranslate nohighlight">\(A=R^*R\)</span>, which is called the
Cholesky factorisation. To show that it is possible to compute
the Cholesky factorisation, we start by assuming that <span class="math notranslate nohighlight">\(A\)</span> has
a 1 in the top-left hand corner, so that</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}A = \begin{pmatrix}
1 & w^* \\
w & K \\
\end{pmatrix}\end{split}\]</div>
</div></blockquote>
<p>where <span class="math notranslate nohighlight">\(w\)</span> is an <span class="math notranslate nohighlight">\((m-1)\)</span>-vector containing the rest of the first column
of <span class="math notranslate nohighlight">\(A\)</span>, and <span class="math notranslate nohighlight">\(K\)</span> is an <span class="math notranslate nohighlight">\((m-1)\times(m-1)\)</span> Hermitian positive
definite matrix. (Exercise: show that <span class="math notranslate nohighlight">\(K\)</span> is Hermitian positive
definite.)</p>
<p>After one stage of Gaussian elimination, we have</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}\underbrace{\begin{pmatrix}
1 & 0 \\
-w & I \\
\end{pmatrix}}_{L_1^{-1}}
\underbrace{
\begin{pmatrix}
1 & w^* \\
w & K \\
\end{pmatrix}}_{A}
=
\begin{pmatrix}
1 & w^* \\
0 & K - ww^* \\
\end{pmatrix}.\end{split}\]</div>
</div></blockquote>
<p>Further,</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}\begin{pmatrix}
1 & w^* \\
0 & K - ww^* \\
\end{pmatrix}=
\underbrace{
\begin{pmatrix}
1 & 0 \\
0 & K - ww^* \\
\end{pmatrix}}_{A_1}
\underbrace{
\begin{pmatrix}
1 & w^* \\
0 & I \\
\end{pmatrix}}_{L_1^*},\end{split}\]</div>
</div></blockquote>
<p>so that <span class="math notranslate nohighlight">\(A = L_1A_1L_1^{*}\)</span>. If <span class="math notranslate nohighlight">\(a_{11} \neq 1\)</span>, we at least
know that <span class="math notranslate nohighlight">\(a_{11}= e_1^*Ae_1>0\)</span>, and the factorisation becomes</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}A =
\underbrace{\begin{pmatrix} \alpha & 0 \\
w/\alpha & I \\
\end{pmatrix}}_{R_1^*}
\underbrace{
\begin{pmatrix}
1 & 0 \\
0 & K - \frac{ww^*}{a_{11}} \\
\end{pmatrix}}_{A_1}
\underbrace{
\begin{pmatrix}
\alpha & w^*/\alpha \\
0 & I \\
\end{pmatrix}}_{R_1},\end{split}\]</div>
</div></blockquote>
<p>where <span class="math notranslate nohighlight">\(\alpha=\sqrt{a_{11}}\)</span>. We can check that <span class="math notranslate nohighlight">\(A_1\)</span> is positive
definite, since</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[x^*A_1x = x^*R_1^{-*}AR_1^{-1}x = (R_1^{-1}x)^*A(R_1^{-1}x) = y^*Ay > 0, \mbox{ where }
y = R_1^{-1}x.\]</div>
</div></blockquote>
<p>Hence, <span class="math notranslate nohighlight">\(K-{ww^*}/{a_{11}}\)</span> is positive definite, since</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}r^*\left(K-\frac{ww^*}{a_{11}}\right)r = \begin{pmatrix} 0 \\ r \\ \end{pmatrix}^*
A_1 \begin{pmatrix} 0 \\ r \\ \end{pmatrix} > 0,\end{split}\]</div>
</div></blockquote>
<p>and hence we can now apply the same procedure again to <span class="math notranslate nohighlight">\(K -
{ww^*}/a_{11}\)</span>. By induction we can continue until we have the
required Cholesky factorisation, which is unique (since there were no
choices to be made at any step).</p>
<p>We can then present the Cholesky factorisation as pseudo-code.</p>
<ul class="simple">
<li><p><span class="math notranslate nohighlight">\(R\gets A\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(k=1\)</span> TO <span class="math notranslate nohighlight">\(m\)</span></p>
<ul>
<li><p>FOR <span class="math notranslate nohighlight">\(j=k+1\)</span> to <span class="math notranslate nohighlight">\(m\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(R_{j,j:m} \gets R_{j,j:m} - R_{k,j:m}\bar{R}_{kj}/R_{kk}\)</span></p></li>
</ul>
</li>
<li><p><span class="math notranslate nohighlight">\(R_{k,k:m} \gets R_{k,k:m}/\sqrt{R_{kk}}\)</span></p></li>
</ul>
</li>
</ul>
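<p>A direct translation of this pseudo-code is sketched below (the function name <code>cholesky</code> is our own; note that <code>numpy.linalg.cholesky</code> instead returns the lower-triangular factor L with A = LL*, so that R = L*):</p>

```python
import numpy as np

def cholesky(A):
    """Cholesky factorisation A = R*R of a Hermitian positive definite A,
    returning upper-triangular R; a direct translation of the pseudo-code."""
    m = A.shape[0]
    R = np.triu(A).astype(complex)   # only the upper triangle of A is used
    for k in range(m):
        for j in range(k + 1, m):
            R[j, j:] -= R[k, j:] * R[k, j].conjugate() / R[k, k]
        R[k, k:] /= np.sqrt(R[k, k])
    return R
```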
<p>The operation count of the Cholesky factorisation is dominated
by the operation inside the <span class="math notranslate nohighlight">\(j\)</span> loop, which has one division,
<span class="math notranslate nohighlight">\(m-j+1\)</span> multiplications, and <span class="math notranslate nohighlight">\(m-j+1\)</span> subtractions, giving
<span class="math notranslate nohighlight">\(\sim 2(m-j)\)</span> FLOPs. The total operation count is then</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[N_{\mbox{FLOPs}} = \sum_{k=1}^m\sum_{j=k+1}^m 2(m-j)
\sim \frac{1}{3}m^3.\]</div>
</div></blockquote>
</section>
</section>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
© Copyright 2020-2023, Colin J. Cotter.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 7.4.0.
</div>
</body>
</html>