comp-lin-alg.github.io/L6_krylov.html at master · comp-lin-alg/comp-lin-alg.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
<!DOCTYPE html>

<html lang="en" data-content_root="./">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />

    <title>6. Iterative Krylov methods for \(Ax=b\) &#8212; Computational linear algebra course 2023.0 documentation</title>
    <link rel="stylesheet" type="text/css" href="_static/pygments.css?v=03e43079" />
    <link rel="stylesheet" type="text/css" href="_static/fenics.css?v=16c5e00f" />
    <link rel="stylesheet" type="text/css" href="_static/proof.css" />
    <script src="_static/documentation_options.js?v=f1ab3ab9"></script>
    <script src="_static/doctools.js?v=9a2dae69"></script>
    <script src="_static/sphinx_highlight.js?v=dc90522c"></script>
    <script src="_static/proof.js"></script>
    <script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@2/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="7. Preconditioning Krylov methods" href="L7_preconditioning.html" />
    <link rel="prev" title="5. Finding eigenvalues of matrices" href="L5_eigenvalues.html" />
<!--[if lte IE 6]>
<link rel="stylesheet" href="_static/ie6.css" type="text/css" media="screen" charset="utf-8" />
<![endif]-->

<link rel="stylesheet" href="_static/featured.css">


<link rel="shortcut icon" href="_static/icon.ico" />


  </head><body>
<div class="wrapper">
  <a href="index.html"><img src="_static/banner.png" width="900px" alt="Project Banner" /></a>
  <div id="access">
    <div class="menu">
      <ul>
          <li class="page_item"><a href="https://github.com/Computational-Linear-Algebra-Course/computational-linear-algebra-course" title="GitHub">GitHub</a></li>
      </ul>
    </div><!-- .menu -->
  </div><!-- #access -->
</div><!-- #wrapper -->


    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">

  <section id="iterative-krylov-methods-for-ax-b">
<h1><span class="section-number">6. </span>Iterative Krylov methods for <span class="math notranslate nohighlight">\(Ax=b\)</span><a class="headerlink" href="#iterative-krylov-methods-for-ax-b" title="Link to this heading">¶</a></h1>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454126320" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>In the previous section we saw how iterative methods are necessary
(but can also be fast) for eigenvalue problems <span class="math notranslate nohighlight">\(Ax=\lambda x\)</span>.
Iterative methods can also be useful for solving linear systems
<span class="math notranslate nohighlight">\(Ax=b\)</span>, generating a sequence of vectors <span class="math notranslate nohighlight">\(x^k\)</span> that converge to the
solution. We shall examine Krylov subspace methods, where each
iteration mainly involves a matrix-vector multiplication. For dense
matrices, matrix-vector multiplication costs <span class="math notranslate nohighlight">\(\mathcal{O}(m^2)\)</span>, but
often (e.g. numerical solution of PDEs, graph problems, etc.)  <span class="math notranslate nohighlight">\(A\)</span> is
sparse (i.e. has zeros almost everywhere) and the matrix-vector
multiplication costs <span class="math notranslate nohighlight">\(\mathcal{O}(m)\)</span>. The goal is then to find a
method where only a few iterations are necessary before the error is
very small, so that the solver has total cost <span class="math notranslate nohighlight">\(\mathcal{O}(mN)\)</span> where
<span class="math notranslate nohighlight">\(N\)</span> is the total number of iterations, hopefully small.</p>
<p>Since we only need the result of a matrix-vector multiplication, it is
even possible to solve a linear system without storing <span class="math notranslate nohighlight">\(A\)</span>
explicitly. Instead one can just provide a subroutine that implements
matrix-vector multiplication in some way; this is called a
“matrix-free” implementation.</p>
<section id="krylov-subspace-methods">
<h2><span class="section-number">6.1. </span>Krylov subspace methods<a class="headerlink" href="#krylov-subspace-methods" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454126582" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>In this section we will introduce Krylov subspace methods for solving
<span class="math notranslate nohighlight">\(Ax=b\)</span> (we will not specialise to real or symmetric matrices
here). The idea is to approximate the solution using the basis</p>
<div class="math notranslate nohighlight">
\[(b, Ab, A^2b, A^3b, \ldots, A^kb)\]</div>
<p>whose span is called a Krylov subspace. After each iteration the
Krylov subspace grows by one dimension. As we have already seen from
studying power iteration, the later elements in this sequence will get
very parallel (they will all be approximating the eigenvector with
largest magnitude of eigenvalue). Hence, we once again need to resort
to orthogonalising the basis. We could simply take the QR
factorisation of this basis, but the Arnoldi iteration coming up
next also provides a neat way to solve the equation when projected
onto the Krylov subspace.</p>
</section>
<section id="arnoldi-iteration">
<h2><span class="section-number">6.2. </span>Arnoldi iteration<a class="headerlink" href="#arnoldi-iteration" title="Link to this heading">¶</a></h2>
<p>The key to Krylov subspace methods turns out to be the transformation
of <span class="math notranslate nohighlight">\(A\)</span> to an upper Hessenberg matrix by orthogonal similarity
transforms, so that <span class="math notranslate nohighlight">\(A=QHQ^*\)</span>. We have already looked at using
Householder transformations to do this in the previous section.
The Householder technique is not so suitable for large dimensional
problems, so instead we look at a way of proceeding column by
column, just like the Gram-Schmidt method does for finding
<span class="math notranslate nohighlight">\(QR\)</span> factorisations.</p>
<p>We do this by rewriting <span class="math notranslate nohighlight">\(AQ=QH\)</span>. The idea is that at iteration <span class="math notranslate nohighlight">\(n\)</span> we
only look at the first <span class="math notranslate nohighlight">\(n\)</span> columns of <span class="math notranslate nohighlight">\(Q\)</span>, which we
call <span class="math notranslate nohighlight">\(\hat{Q}_n\)</span>. When <span class="math notranslate nohighlight">\(m\)</span> is large, this is a significant saving:
<span class="math notranslate nohighlight">\(mn \ll m^2\)</span>.
To execute the iteration, it turns out that
we should look at the <span class="math notranslate nohighlight">\((n+1)\times n\)</span> upper left-hand section of <span class="math notranslate nohighlight">\(H\)</span>,
i.e.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}\tilde{H}_n = \begin{pmatrix}
h_{11} &amp; \ldots &amp; h_{1n} \\
h_{21} &amp; \ddots &amp; \vdots \\
0 &amp; \ddots &amp; h_{nn} \\
0 &amp; 0 &amp; h_{n+1,n} \\
\end{pmatrix}\end{split}\]</div>
</div></blockquote>
<p>Then, <span class="math notranslate nohighlight">\(A\hat{Q}_n = \hat{Q}_{n+1}\tilde{H}_n\)</span>.</p>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454127181" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>Using the column space interpretation of matrix-matrix multiplication,
we see that the <span class="math notranslate nohighlight">\(n\)</span>-th column is</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[Aq_n = h_{1n}q_1 + h_{2n}q_2 + \ldots + h_{n,n}q_n + h_{n+1,n}q_{n+1}.\]</div>
</div></blockquote>
<p>This formula shows us how to construct the non-zero entries of the
nth column of <span class="math notranslate nohighlight">\(H\)</span>; this defines the Arnoldi algorithm which we
provide as pseudo-code below.</p>
<ul class="simple">
<li><p><span class="math notranslate nohighlight">\(q_1\gets b/\|b\|\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(n=1,2,\ldots\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(v\gets Aq_n\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(j=1\)</span> TO <span class="math notranslate nohighlight">\(n\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(h_{jn}\gets q_j^*v\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(v \gets v - h_{jn}q_j\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
<li><p><span class="math notranslate nohighlight">\(h_{n+1,n} \gets \|v\|\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(q_{n+1} \gets v/\|v\|\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
<div class="proof proof-type-exercise" id="id1">

    <div class="proof-title">
        <span class="proof-type">Exercise 6.1</span>

    </div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The <a class="reference internal" href="cla_utils.html#cla_utils.exercises10.arnoldi" title="cla_utils.exercises10.arnoldi"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises10.arnoldi()</span></code></a> function has
been left unimplemented. It should implement the Arnoldi algorithm
using Numpy array operations where possible, and return the <span class="math notranslate nohighlight">\(Q\)</span> and
<span class="math notranslate nohighlight">\(H\)</span> matrices after the requested number of iterations is complete.
What is the minimal number of Python for loops possible?</p>
<p>The test
script <code class="docutils literal notranslate"><span class="pre">test_exercises10.py</span></code> in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test
this function.</p>
</div></div><p>If we were to form the (reduced) QR factorisation of the <span class="math notranslate nohighlight">\(m\times n\)</span> Krylov
matrix</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\begin{split}K_n = \begin{pmatrix}
b &amp; Ab &amp; \ldots &amp; A^{n_1}b \\
\end{pmatrix}\end{split}\]</div>
</div></blockquote>
<p>then we would get <span class="math notranslate nohighlight">\(Q=\hat{Q}_n\)</span>. Importantly, in the Arnoldi iteration, we
never form <span class="math notranslate nohighlight">\(K_n\)</span> or <span class="math notranslate nohighlight">\(R_n\)</span> explicitly, since these are very
ill-conditioned and not useful numerically.</p>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454136990" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>But what is the use of the <span class="math notranslate nohighlight">\(\tilde{H}_n\)</span> matrix? Applying
<span class="math notranslate nohighlight">\(\hat{Q}_n^*\)</span> to <span class="math notranslate nohighlight">\(A\hat{Q}_n = \hat{Q}_{n+1}\tilde{H}_n\)</span> gives</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[ \begin{align}\begin{aligned}\hat{Q}_n^*A\hat{Q}_n = \hat{Q}_n^*\hat{Q}_{n+1}\tilde{H}_n,\\\begin{split}= \begin{pmatrix}
1 &amp; 0 &amp; \ldots &amp; 0 &amp; 0 \\
0 &amp; \ddots &amp; \ddots &amp; \vdots &amp; \vdots \\
\vdots &amp; \ddots &amp; \ddots &amp; \vdots &amp; \vdots \\
0 &amp; \ldots &amp; \ldots &amp; 1 &amp; 0 \\
\end{pmatrix}
\tilde{H}_n
= H_n,\end{split}\end{aligned}\end{align} \]</div>
</div></blockquote>
<p>where <span class="math notranslate nohighlight">\(H_n\)</span> is the <span class="math notranslate nohighlight">\(n\times n\)</span> top left-hand corner of <span class="math notranslate nohighlight">\(H\)</span>.</p>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454171516" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>The intrepretation of this is that <span class="math notranslate nohighlight">\(H_n\)</span> is the orthogonal projection
of <span class="math notranslate nohighlight">\(A\)</span> onto the Krylov subspace <span class="math notranslate nohighlight">\(\mathrm{span}(K_n)\)</span>. To see this, take any vector <span class="math notranslate nohighlight">\(v\)</span>,
and project <span class="math notranslate nohighlight">\(Av\)</span> onto the the Krylov subspace <span class="math notranslate nohighlight">\(\mathrm{span}(K_n)\)</span>.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[PAv = \hat{Q}_n\hat{Q}_n^*v.\]</div>
</div></blockquote>
<p>Then, changing basis to the orthogonal basis gives</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\hat{Q}_n^*(\hat{Q}_n\hat{Q}_n^*A)\hat{Q}_n = \hat{Q}_n^*A\hat{Q}_n
= H_n.\]</div>
</div></blockquote>
</section>
<section id="gmres">
<h2><span class="section-number">6.3. </span>GMRES<a class="headerlink" href="#gmres" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454171559" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>The Generalised Minimum Residual method (GMRES), due to Saad (1986),
exploits these properties of the Arnoldi iteration. The idea is
that we build up the orthogonal basis for the Krylov subspaces
one by one, and at each iteration we solve the projection of
<span class="math notranslate nohighlight">\(Ax=b\)</span> onto the Krylov basis as a least squares problem, until
the residual <span class="math notranslate nohighlight">\(\|Ax-b\|\)</span> is below some desired tolerance.</p>
<p>To avoid the numerical instabilities that would come from using the
basis <span class="math notranslate nohighlight">\((b,Ab,A^2b,\ldots)\)</span>, we use the Arnoldi iteration to build an
orthonormal basis, and seek approximate solutions of the form <span class="math notranslate nohighlight">\(x_n =
\hat{Q}_ny\)</span> for <span class="math notranslate nohighlight">\(y\in\mathbb{C}^n\)</span>. We then seek the value of <span class="math notranslate nohighlight">\(y\)</span>
that minimises the residual</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\mathcal{R}_n = \|A\hat{Q}_ny - b\|.\]</div>
</div></blockquote>
<p>This explains the Minimum Residual part of the name. We also see from
this definition that the residual cannot increase with iterations,
because it only increases the subspace where we seek a solution.</p>
<p>This problem can be simplified further by using <span class="math notranslate nohighlight">\(A\hat{Q}_n = \hat{Q}_{n+1}\tilde{H}_n\)</span>,
so</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\mathcal{R}_n = \|\hat{Q}_{n+1}\tilde{H}_n y - b\|.\]</div>
</div></blockquote>
<p>Remembering that <span class="math notranslate nohighlight">\(b=\|b\|q_1\)</span>, we see that the entire residual is in
the column space of <span class="math notranslate nohighlight">\(\hat{Q}_{n+1}\)</span>, and hence we can invert
on the column space using <span class="math notranslate nohighlight">\(\hat{Q}_{n+1}^*\)</span> which does not change
the norm of the residual due to the orthonormality.</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\mathcal{R}_n = \|\tilde{H}_n y - \hat{Q}_{n+1}^*b\|
= \|\tilde{H}_n y - e_1\|b\|\|.\]</div>
</div></blockquote>
<p>Finding <span class="math notranslate nohighlight">\(y\)</span> to minimise <span class="math notranslate nohighlight">\(\mathcal{R}_n\)</span> requires the solution of a
least squares problem, which can be computed via QR factorisation
as we saw much earlier in the course.</p>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454171921" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>We are now in position to present the GMRES algorithm as pseudo-code.</p>
<ul class="simple">
<li><p><span class="math notranslate nohighlight">\(q_1 \gets b/\|b\|\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(n=1,2,\dots\)</span></p>
<ul>
<li><p>APPLY STEP <span class="math notranslate nohighlight">\(n\)</span> OF ARNOLDI</p></li>
<li><p>FIND <span class="math notranslate nohighlight">\(y\)</span> TO MINIMIZE <span class="math notranslate nohighlight">\(\|\tilde{H}_ny - \|b\|e_1\|\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(x_n \gets \hat{Q}_ny\)</span></p></li>
<li><p>CHECK IF <span class="math notranslate nohighlight">\(\mathcal{R}_n &lt;\)</span> TOL</p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
<div class="proof proof-type-exercise" id="id2">

    <div class="proof-title">
        <span class="proof-type">Exercise 6.2</span>

    </div><div class="proof-content">
<p><span class="math notranslate nohighlight">\((\ddagger)\)</span> The <a class="reference internal" href="cla_utils.html#cla_utils.exercises10.GMRES" title="cla_utils.exercises10.GMRES"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises10.GMRES()</span></code></a> function has
been left unimplemented. It should implement the basic GMRES
algorithm above, using one loop over the iteration count. You can
paste code from your <a class="reference internal" href="cla_utils.html#cla_utils.exercises10.arnoldi" title="cla_utils.exercises10.arnoldi"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises10.arnoldi()</span></code></a>
implementation, and you should use your least squares code to solve
the least squares problem.  The test script <code class="docutils literal notranslate"><span class="pre">test_exercises10.py</span></code>
in the <code class="docutils literal notranslate"><span class="pre">test</span></code> directory will test this function.</p>
</div></div><div class="proof proof-type-exercise" id="id3">

    <div class="proof-title">
        <span class="proof-type">Exercise 6.3</span>

    </div><div class="proof-content">
<p>The least squares problem in GMRES requires the QR
factorisation of <span class="math notranslate nohighlight">\(H_k\)</span>. It is wasteful to rebuild this from scratch
given that we just computed the QR factorisation of
<span class="math notranslate nohighlight">\(H_{k-1}\)</span>. Modify your code so that it recycles the QR
factorisation, applying just one extra Householder rotation per
GMRES iteration. Don’t forget to check that it still passes the
test.</p>
</div></div><div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>Don’t get confused by the two Q matrices involved in GMRES! There
is the Q matrix for the Arnoldi iteration, and the Q matrix for
the least squares problem. These are not the same.</p>
</div>
</section>
<section id="convergence-of-gmres">
<h2><span class="section-number">6.4. </span>Convergence of GMRES<a class="headerlink" href="#convergence-of-gmres" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454198706" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>The algorithm presented as pseudocode is the way to implement GMRES
efficiently. However, we can make an alternative formulation
of GMRES using matrix polynomials.</p>
<p>We know that <span class="math notranslate nohighlight">\(x_n\in \mathrm{span}(K_n)\)</span>, so we can write</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[x_n = c_0b + c_1Ab + c_2A^2b + \ldots + c_{n-1}A^{n-1}b
= p'(A)b,\]</div>
</div></blockquote>
<p>where</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[p'(z) = c_0 + c_1z + c_2z^2 + \ldots + \ldots c_{n-1}z^{n-1} \implies
p'(A) = c_0I + c_1A + c_2A^2 + \ldots + c_{n-1}A^{n-1}.\]</div>
</div></blockquote>
<p>Here we have introduced the idea of a matrix polynomial, where the
kth power of <span class="math notranslate nohighlight">\(z\)</span> is replaced by the kth power of <span class="math notranslate nohighlight">\(A\)</span>.</p>
<p>The residual becomes</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[r_n = b - Ax_n = b - Ap'(A)b = (I - Ap'(A))b
= p(A)b,\]</div>
</div></blockquote>
<p>where <span class="math notranslate nohighlight">\(p(z) = 1 - zp'(z)\)</span>. Thus, the residual is a matrix polynomial
<span class="math notranslate nohighlight">\(p\)</span> of <span class="math notranslate nohighlight">\(A\)</span> applied to <span class="math notranslate nohighlight">\(b\)</span>, where <span class="math notranslate nohighlight">\(p\in \mathcal{P}_n\)</span>, and</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\mathcal{P}_n = \{\mbox{degree }\leq n\mbox{ polynomials with }
p(0)=1\}.\]</div>
</div></blockquote>
<p>Hence, we can recast iteration <span class="math notranslate nohighlight">\(n\)</span> of GMRES as a polynomial
optimisation problem: find <span class="math notranslate nohighlight">\(p_n\in \mathcal{P}_n\)</span> such that
<span class="math notranslate nohighlight">\(\|p_n(A)b\|\)</span> is minimised.  We have</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\|r_n\|  = \|p_n(A)b\| \leq \|p_n(A)\|\|b\|
\implies \frac{\|r_n\|}{\|b\|} \leq \|p_n(A)\|.\]</div>
</div></blockquote>
<p>Assuming that <span class="math notranslate nohighlight">\(A\)</span> is diagonalisable, <span class="math notranslate nohighlight">\(A=V\Lambda V^{-1}\)</span>, then
<span class="math notranslate nohighlight">\(A^s=V\Lambda^sV^{-1}\)</span>, and</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\|p_n(A)\| = \|Vp_n(\Lambda^s)V^{-1}\|
\leq \underbrace{\|V\|\|V^{-1}\|}_{=\kappa(V)}
\|p_n\|_{\Lambda(A)},\]</div>
</div></blockquote>
<p>where <span class="math notranslate nohighlight">\(\Lambda(A)\)</span> is the set of eigenvalues of <span class="math notranslate nohighlight">\(A\)</span>, and</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\|p\|_{\Lambda(A)} = \sup_{x\in \Lambda(A)}\|p(x)\|.\]</div>
</div></blockquote>
<p>Hence we see that GMRES will converge quickly if <span class="math notranslate nohighlight">\(V\)</span> is
well-conditioned, and <span class="math notranslate nohighlight">\(p(x)\)</span> is small for all <span class="math notranslate nohighlight">\(x\in \Lambda(A)\)</span>.  This
latter condition is not trivial due to the <span class="math notranslate nohighlight">\(p(0)=1\)</span> requirement. One
way it can happen is if <span class="math notranslate nohighlight">\(A\)</span> has all eigenvalues clustered in a small
number of groups, away from $0$. Then we can find a low degree
polynomial that passes through 1 at <span class="math notranslate nohighlight">\(x=0\)</span>, and 0 near each of the
clusters. Then GMRES will essentially converge in a small number of
iterations (equal to the degree of the polynomial). There are problems
if the eigenvalues are scattered over a wide region of the complex
plane: we need a very high degree polynomial to make <span class="math notranslate nohighlight">\(p(x)\)</span> small at
all the eigenvalues and hence we need a very large number of
iterations. Similarly there are problems if eigenvalues are very close
to zero.</p>
<div class="proof proof-type-exercise" id="id4">

    <div class="proof-title">
        <span class="proof-type">Exercise 6.4</span>

    </div><div class="proof-content">
<p>Investigate the convergence of the matrices provided by the
functions <a class="reference internal" href="cla_utils.html#cla_utils.exercises10.get_AA100" title="cla_utils.exercises10.get_AA100"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises10.get_AA100()</span></code></a>,
<a class="reference internal" href="cla_utils.html#cla_utils.exercises10.get_BB100" title="cla_utils.exercises10.get_BB100"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises10.get_BB100()</span></code></a>, and
<a class="reference internal" href="cla_utils.html#cla_utils.exercises10.get_CC100" title="cla_utils.exercises10.get_CC100"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises10.get_CC100()</span></code></a>, by looking at the residual
after each iteration (which should be provided by
<a class="reference internal" href="cla_utils.html#cla_utils.exercises10.GMRES" title="cla_utils.exercises10.GMRES"><code class="xref py py-func docutils literal notranslate"><span class="pre">cla_utils.exercises10.GMRES()</span></code></a> as specified in the docstring).
What do you observe? What is it about the three matrices that
causes this different behaviour?</p>
</div></div></section>
<section id="preconditioned-gmres">
<h2><span class="section-number">6.5. </span>Preconditioned GMRES<a class="headerlink" href="#preconditioned-gmres" title="Link to this heading">¶</a></h2>
<details>
<summary>
Supplementary video</summary><div class="video_wrapper" style="">
<iframe allowfullscreen="true" src="https://player.vimeo.com/video/454218547" style="border: 0; height: 345px; width: 560px">
</iframe></div></details><p>This final topic has been a strong focus of computational linear algebra
over the last 30 years. Typically, the matrices that we want to solve
do not have eigenvalues clustered in a small number of groups, and so
GMRES is slow. The solution (and the challenge) is to find a matrix
<span class="math notranslate nohighlight">\(\hat{A}\)</span> such that <span class="math notranslate nohighlight">\(\hat{A}x = y\)</span> is cheap to solve (diagonal, or triangular, or
something else) and such that <span class="math notranslate nohighlight">\(\hat{A}^{-1}A\)</span> <em>does</em> have eigenvalues clustered
in a small number of groups (e.g. <span class="math notranslate nohighlight">\(\hat{A}\)</span> is a good approximation of <span class="math notranslate nohighlight">\(A\)</span>, so
that <span class="math notranslate nohighlight">\(\hat{A}^{-1}A\approx I\)</span> which has eigenvalues all equal to 1). We call
<span class="math notranslate nohighlight">\(\hat{A}\)</span> the preconditioning matrix, and the idea is to apply GMRES to
the (left) preconditioned system</p>
<blockquote>
<div><div class="math notranslate nohighlight">
\[\hat{A}^{-1}Ax = \hat{A}^{-1}b.\]</div>
</div></blockquote>
<p>GMRES on this preconditioned system is equivalent to the following algorithm,
called preconditioned GMRES.</p>
<ul class="simple">
<li><p>SOLVE <span class="math notranslate nohighlight">\(\hat{A}\tilde{b}=b\)</span>.</p></li>
<li><p><span class="math notranslate nohighlight">\(q_1 \gets \tilde{b}/\|\tilde{b}\|\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(n=1,2,\dots\)</span></p>
<ul>
<li><p>SOLVE <span class="math notranslate nohighlight">\(\hat{A}v = Aq_n\)</span></p></li>
<li><p>FOR <span class="math notranslate nohighlight">\(j=1\)</span> TO <span class="math notranslate nohighlight">\(n\)</span></p>
<ul>
<li><p><span class="math notranslate nohighlight">\(h_{jn}=q_j^*v\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(v = v - h_{jn}q_j\)</span></p></li>
</ul>
</li>
<li><p>END FOR</p></li>
<li><p><span class="math notranslate nohighlight">\(h_{n+1,n} \gets \|v\|\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(q_{n+1}\)</span>gets v/|v|`</p></li>
<li><p>FIND <span class="math notranslate nohighlight">\(y\)</span> TO MINIMIZE <span class="math notranslate nohighlight">\(\|\tilde{H}_ny - \|\tilde{b}\|e_1\|\)</span></p></li>
<li><p><span class="math notranslate nohighlight">\(x_n \gets \hat{Q}_ny\)</span></p></li>
<li><p>CHECK IF <span class="math notranslate nohighlight">\(\mathcal{R}_n &lt;\)</span> TOL</p></li>
</ul>
</li>
<li><p>END FOR</p></li>
</ul>
<div class="proof proof-type-exercise" id="id5">

    <div class="proof-title">
        <span class="proof-type">Exercise 6.5</span>

    </div><div class="proof-content">
<p>Show that this algorithm is equivalent to GMRES applied to the
preconditioned system.</p>
</div></div><p>The art and science of finding preconditioning matrices <span class="math notranslate nohighlight">\(\hat{A}\)</span> (or
matrix-free procedures for solving <span class="math notranslate nohighlight">\(\hat{A}x=y\)</span>) for specific problems
arising in data science, engineering, physics, biology etc. can
involve ideas from linear algebra, functional analysis, asymptotics,
physics, etc., and represents a major activity in scientific computing
today.</p>
</section>
<section id="knowing-when-to-stop">
<h2><span class="section-number">6.6. </span>Knowing when to stop<a class="headerlink" href="#knowing-when-to-stop" title="Link to this heading">¶</a></h2>
<p>We should stop an iterative method when the error is sufficiently small.
But, we don’t have access the the exact solution, so we can’t compute the
error. Two things we can look at are:</p>
<ul class="simple">
<li><p>The residual <span class="math notranslate nohighlight">\({r}^k=A{x}^k-{b}\)</span>, or</p></li>
<li><p>The pseudo-residual <span class="math notranslate nohighlight">\({s}^k = {x}^{k+1}-{x}^k\)</span>,
which both tend to zero as <span class="math notranslate nohighlight">\({x}^k\to{x}^*\)</span> provided that <span class="math notranslate nohighlight">\(A\)</span> is
invertible.</p></li>
</ul>
<p>How do their sizes relate to the size of <span class="math notranslate nohighlight">\({e}^k={x}^k-{x}^*\)</span>?</p>
<div class="math notranslate nohighlight">
\[\begin{split}{e}^k  = &amp; {x}^* - {x}^k \\
 = &amp; A^{-1}(A{x}^*-A{x}^k) \\
 = &amp; A^{-1}({b}-A{x}^k) \\
 = &amp; A^{-1}{r}^k,\end{split}\]</div>
<p>so <span class="math notranslate nohighlight">\(\|e^k\| \leq \|A^{-1}\|\|r^k\|\)</span>.</p>
<p>The relative error <span class="math notranslate nohighlight">\(\|e^k\|/\|x^*\|\)</span> satisfies</p>
<div class="math notranslate nohighlight">
\[\frac{\|e^k\|}{\|x^*\|}
= \frac{\|A^{-1}r^k\|}{\|x\|}
\leq \|A^{-1}\|\frac{\|r^k\|}{\|x\|}
\leq \|A^{-1}\|\|A\|\frac{\|r^k\|}{\|b\|},\]</div>
<p>so the relative error is bounded from above by
the condition number of <span class="math notranslate nohighlight">\(\|A\|\)</span> multiplied by
the relative residual <span class="math notranslate nohighlight">\(\|r^k\|/\|b\|\)</span>. If the condition
number is large, it is possible to have a small residual
but still have a large condition number.</p>
<p>Similar results hold for the pseudoresidual, but are
more complicated to show for the case of GMRES.</p>
</section>
</section>


            <div class="clearer"></div>
          </div>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer" role="contentinfo">
    &#169; Copyright 2020-2023, Colin J. Cotter.
      Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 7.4.0.
    </div>
  </body>
</html>