Intro_to_programming_for_data_sci/recursion.qmd at main · NUstat/Intro_to_programming_for_data_sci · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
---
title: "R: Recursion"
format:
    html:
      toc: true
      self-contained: true
editor_options:
  chunk_output_type: console
---

## Recursion

Recursion is a method of solving a problem by dividing it into smaller instances of the same problem. Recursion solves such problems by using functions that call themselves from within their own code. This forms a loop, where every time the function is called, it calls itself again and again. However, every time the function calls itself, it checks certain condition(s) which are the stopping condition(s). When such condition(s) are true the function will stop calling itself. These conditions are called the base case of the recursive function.

Every recursive function must have at least two cases:

**1. Base case:** This is the simplest case that can be answered directly, and the function does not call itself.

**2. Recursive case:** This is a relatively more complex case that cannot be answered directly, but can be described as a smaller instance of the same problem. In this case, the function calls itself to answer the smaller problem.

Below is an example, where we defined a function that computes the factorial of an integer by recursion.


```{r}
factorial <- function(n) {

  # Base case
  if (n == 1) return(1)

  # Recursive case
  return(n * factorial(n - 1))
}
factorial(5)
```

In the above example, the case $n=1$ is the base case, where the function does not need to call itself, and returns 1. All other cases, where $n>1$, and $n \in  \mathbb{Z}$ are recursive cases, where the function calls itself with a smaller instance of the same problem.

A recursive function must satisfy the following conditions:

1. There must be a case for all valid inputs.

2. There must be a base case that makes no recursive calls.

3. When the function makes a recursive call, it should be to a simpler instance and make forward progress towards the base case.

**Example:** Write a recursive function that returns the $n^{th}$ term of the Fibonacci sequence, where $n$ is an integer, and $n>0$. In a Fibonacci sequence, each number is the sum of the preceding two numbers, and the sequence starts from $0,1$. The sequence is as follows:

$0, 1, 1, 2, 3, 5, 8, 13, 21, ...$

```{r}
fibonacci <- function(n) {
  # Base case
  if (n == 0 | n == 1) return(n)

  #Recursive case
  return(fibonacci(n - 1) + fibonacci(n - 2))
}
#The function `fibonacci` prints the n+1th term of the fibonacci sequence when `n` is passed as an argument. Thus, we need to reduce `n` by 1 to print the nth term of the sequence. The function `nth_term` reduces `n` by 1 before passing `n` to the function `fibonacci()`.
nth_term <- function(N) {
  fibonacci(N - 1)
}
nth_term(7)
```


### Practice exercise 1

Write a recursive function that computes the sum of squares of the first $N$ natural numbers, where $N$ is a parameter to the function.

```{r}
#| eval: false
squares <- function(N)
{

  # Base case
  if(N == 1)  return(1)

  # Recursive case
  return(N ** 2 + squares(N - 1))
}
squares(10)
```

### Practice exercise 2

Write a function that counts the occurrence of digit $k$ in a given integer $n$ using recursion. The function has $n$ and $k$ as parameters.

```{r}
#| eval: false
freq_digits <- function(n, d) {
  if (n == 0) return(0)
  digit <- n %% 10
  n_int <- as.integer(n / 10)
  if (digit == d) return(1 + freq_digits(n_int, d))
  return(freq_digits(n_int, d))
}
freq_digits(8670800,0)
```

### Practice exercise 3

Use recursion to write a function that accepts a word as an argument, and returns `TRUE` if the word is a palindrome, otherwise returns `FALSE`.


```{r}
#| eval: false
word<-'racecar'
palindrome <- function(word) {
  if(nchar(word) <= 1) return(TRUE)
  if(substr(word, 1, 1) == substr(word, nchar(word), nchar(word))) {
    palindrome(substr(word, 2, nchar(word) - 1))
  } else return(FALSE)
}
palindrome(word)
```

## Space occupied by recursive calls

Stack memory is a memory usage mechanism that allows the system memory to be used as temporary data storage that behaves as a first-in-last-out buffer.

When a recursive function is called in a programming language, the stack memory is used to keep track of each invocation of the function. Let's break down how stack memory is occupied during the execution of a recursive function:

### Function Call

When a recursive function is called, a new stack frame is created on the call stack. The parameters, local variables, and return address are stored in this stack frame.

### Nested Calls

If the recursive function makes another call to itself, a new stack frame is created for the new invocation. This process continues as long as the base case is not reached.

### Stack Frames in Memory

Each stack frame is pushed onto the top of the call stack, forming a chain of frames. The stack grows deeper with each recursive call.

### Local Variables and Parameters

Each invocation has its own set of local variables and parameters stored in its respective stack frame. These values are separate and independent for each level of recursion.

### Return Addresses

The return address of each invocation is stored in its stack frame. When a function call completes, the program knows where to return by using this address.

### Base Case

The recursion continues until the base case is reached. The base case is a condition that, when met, stops the recursive calls and starts unwinding the call stack.

### Unwinding the Stack

* As the base case is reached, the recursive calls start to complete, and the stack frames are popped off the call stack.

* The return values are used to compute the final result as the stack unwinds.

### Memory Deallocation

* As each stack frame is popped off, the memory occupied by that frame is deallocated.

* This process continues until the initial function call is reached.

The function `CStack_info()` can be used to retrieve the stack memory occupied by the recursive calls. If the function is recursively called so many times such that the stack usage limit is reached, the program stops indicating a stack overflow error.

## Recursion vs iteration

Recursion is typically used when the problem is naturally recursive (for e.g., generating a Fibonacci sequence), or the data is naturally recursive ( for e.g., filesystem). Recursive solutions can be easy to read and understand as compared to the corresponding iterative solution.

One downside of recursion is that it may take more space than an iterative solution. Building up a stack of recursive calls consumes memory temporarily, and the stack is limited in size, which may become a limit on the size of the problem that the recursive implementation can solve.

In the factorial examples below, we compare the stack memory occupied in case of recursion and iteration. Note that the space occupied continues to increase in case of recursion while remains a constant in case of iteration. The units of the `size` and `current` attributes are bytes.

```{r}
#| echo: false
rm(list = ls())
```


```{r}
# Finding factorial of an integer with recursion

factorial_recursion <- function(n) {

  if (n == 1) return(1)
  print(Cstack_info())
  return(n * factorial(n - 1))
}
factorial_recursion(5)
```

```{r}
#| echo: false
rm(list = ls())
```


```{r}
# Finding factorial of an integer with iteration

factorial_iteration <- function(n) {
  fac <- 1
  for (i in 1:n) {
    fac <- fac*i
    print(Cstack_info())
  }
  return(fac)
}
factorial_iteration(5)
```


### Time Complexity

* There are $O(N)$ recursive calls in our recursive approach, and each call uses $O(1)$ operations. Thus, the time complexity of factorial using recursion is $O(N)$.

* There are $O(N)$ iterations of the loop in our iterative approach, so its time complexity is also $O(N)$.

Though both the programs’ theoretical time complexity is the same, a recursive program will take more time to execute due to the overhead of function calls, which is much higher than that of iteration.

### Space Complexity

* In the recursive program, due to each recursive call, some memory gets allocated in the stack to store parameters and local variables. As there are $O(N)$ recursive calls, the space complexity using recursion is $O(N)$.

* No extra memory gets allocated in the iterative program, so its space complexity is $O(1)$.


### Strengths and Weaknesses of Recursion and Iteration

#### Iteration

**Strengths:**

* Iteration can be used to repeatedly execute a set of statements without the overhead of function calls and without using stack memory.

* Iteration is faster and more efficient than recursion.

* It's easier to optimize iterative codes.

**Weaknesses:**

* In loops, we can go only in one direction, i.e., we can’t go or transfer data from the current state to the previous state that has already been executed.

* It’s difficult to traverse trees/graphs using loops.

#### Recursion

**Strengths:**

* It’s easier to code the solution using recursion when the solution of the current problem is dependent on the solution of smaller similar problems.

* Recursive codes are smaller and easier to understand.

* We can pass information to the next state in the form of parameters and return information to the previous state in the form of the return value.

* It’s a lot easier to perform operations on trees and graphs using recursion.

**Weaknesses:**

* The simplicity of recursion comes at the cost of time and space efficiency.

* It is much slower than iteration due to the overhead of function calls and control shift from one function to another.

* It requires extra memory on the stack for each recursive call. This memory gets de-allocated when function execution is over.

* It is difficult to optimize a recursive code, and they generally have higher time complexity than iterative codes due to overlapping sub-problems.