-
Notifications
You must be signed in to change notification settings - Fork 2
Expand file tree
/
Copy path06-lists.Rmd
More file actions
executable file
·375 lines (264 loc) · 17 KB
/
06-lists.Rmd
File metadata and controls
executable file
·375 lines (264 loc) · 17 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
# Lists and Sequences
In this module, we will cover using **lists** in Python, which is a data type representing a _sequence_ of data values (similar to how a `string` is a squence of letters, and a `range` is a sequence of numbers). A list is a fundamental data type in Python, and key to writing almost all practical programs. Lists are used to store and organize large sets of data (and computer programs usually deal with _lots_ of data). This module cover how to create, access, and utilize lists to automate the processing of larger amounts of data.
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Contents**
- [Resources](#resources)
- [Lists](#lists)
- [List Indices](#list-indices)
- [List Operations and Methods](#list-operations-and-methods)
- [Lists and Loops](#lists-and-loops)
- [Nested Lists](#nested-lists)
- [Tuples](#tuples)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## Resources
- [Lists (Severance)](http://openbookproject.net/thinkcs/python/english3e/lists.html)
- [Lists (Sweigart)](https://automatetheboringstuff.com/chapter4/)
- [Tuples (Downey)](https://books.trinket.io/pfe/10-tuples.html)
- [Sequence Types (Python Docs)](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range)
## Lists
A list is a **mutable**, **ordered** sequence of values that are all stored in a single variable. For example, you can make a vector `names` that contains the strings "Sarah", "Amit", and "Zhang", or a vector `one_to_hundred` that stores the numbers from 1 to 100. Each value in a list is refered to as an **element** of that list; thus our `names` list would have 3 elements: `"Sarah"`, `"Amit"`, and `"Zhang"`.
Lists are written as literals inside _square brackets_ (**`[]`**), with each element in the list separated by a _comma_ (**`,`**):
```python
# a list of names
names = ["Sarah", "Amit", "Zhang"]
# a list of numbers (can contain "duplicate" values)
numbers = [1, 2, 2, 3, 5, 8]
# lists can contain different types (including other lists!)
things = ["raindrops", 2, True, [5, 9, 8]]
# lists can be empty (with no elements)
empty = []
```
- List variables should be named using **plurals** (name_s_, number_s_, etc.), because lists hold multiple values!
Other sequences (such as strings or ranges) can be converted into lists by using the built-in `list()` function:
```python
list("hello") # ['h', 'e', 'l', 'l', 'o']
list(range(1,5)) # [1, 2, 3, 4]
```
### List Indices
We can refer to individual elements in a list by their **index**, which is the number of their position in the list. Lists are _zero-indexed_, which means that positions are counted starting at `0`. For example, in the list:
```python
vowels = ['a', 'e', 'i', 'o', 'u']
```
The `'a'` (the first element) is at _index_ 0, `'e'` (the second element) is at index 2, and so on.
- This also means that the last element can be found at the index `length_of_list - 1`. This pattern is why values like `range(5)` _include_ `0` but _exclude_ the last `5`.
You can retrieve an element from a list using **bracket notation**: you refer to the element at a particular index of a list by writing the name of the list, followed by square brackets (**`[]`**) that contain the index of interest:
```r
names = ["Sarah", "Amit", "Zhang"]
# access the element at index 0
name_first = names[0]
print(name_first) # Sarah
# access the element at index 2
name_third = names[2]
print(name_third) # Zhang
# accessing an index not in the list will give an error
name_fourth = names[3] # IndexError!
# negative indices count backwards from the end
name_last = names[-1] # Zhang
name_second_to_last = names[-2] #Amit
```
The value inside the square brackets can any expression that resolves to an integer, including variables:
```python
names[1+1] # "Amit"
last_index = len(names) - 1 # last index is length of list - 1
names[last_index] # "Zhang"
# Don't forget to subtract one from the length!
names[len(names)] # IndexError!
# Using an index of `-1` is a better solution
```
It is possible to select multiple, _consecutive_ elements from a list by specifying a **slice**. A slice is written as the starting and ending indices separated by a colon (**`:`**); the starting index is included and the ending index is _excluded_. For example:
```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']
# indices 1 through 3 (non-inclusive)
letters[1:3] # ['b', 'c']
# indices 3 to the end (inclusive)
letters[3:] # ['d', 'e', 'f']
# indices up to 3 (non-inclusive)
letters[:3] # ['a', 'b', 'c']
# indices 2 to (2 from end) (non-inclusive)
letters[2:-2] # ['c', 'd'], the `e` is excluded
# all the indices. This produces a new list with the same contents!
letters[:]
```
An indexed reference to a list element (e.g., `names[0]`) is effectively a _variable in its own right_: you can think of `names[0]`, `names[1]`, and `names[2]` as being equivalent to having variables `names_0`, `names_1`, `names_2` (each of which has its own value). Lists effectively provide a "shortcut" for having lots and lots of variables that are all related; instead we can "collect" those variables into a list!
Because list references are variables, they can be used anywhere that a "normal" variable can be. In particular, this means that variables can be assigned to them, allowing the list to be **mutated** (changed):
```python
# a list of school supplies
school_supplies = ["Backpack", "Laptop", "Pen"]
# replace "Pen" with "Pencil"
school_supplies[2] = "Pencil"
print(school_supplies) # ['Backpack', 'Laptop', 'Pencil']
# You can only assign values to "variables" that exist in the list!
school_supplies[3] = "Paper" # IndexError!
```
Just as with variables: if the list index (e.g., `my_list[index]`) is on the _left_ side of an assignment, it means the **variable** (which "slot" in the list). If it is on the _right_ side of an assignment, it means the **value** (which element is in that slot).
Finally, note that **strings** are also _indexed sequences_ of characters. Thus you can use _bracket notation_ to refer to individual letters, and many string methods utilize the index in their arguments:
```python
message_str = "Hello world"
message_str[1:5] # "ello"
# find the index of the 'w'
message_str.find('w') # 6
```
## List Operations and Methods
Lists support a number of different [operations](https://docs.python.org/3/library/stdtypes.html#common-sequence-operations) and [methods](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) (functions):
```python
# Addition (+) to combine lists
['a','b'] + [1,2] # ['a', 'b', 1, 2]
# Multiplication (*) performs multiple additions
[1,2] * 3 # [1, 2, 1, 2, 1, 2]
# A sample list
s = ['a', 'b', 'c', 'd']
# Add value to end of list
s.append('x') # add 'x' at end
# Add value in middle of list
s.insert(2, 'y') # put 'y' at index 2 (everyone else shifts over)
# Remove from end of list ("pop off")
s.pop() # removes and returns the last item (`x`)
# Remove from middle
s.pop(2) # removes and returns item at index 2 (`y`)
# Remove specific value
s.remove('c') # remove the first 'c' in the list; nothing returned
# Remove all elements
s.clear()
```
- Note that list methods such as `append()` or `clear()` usually **mutate** the existing list value and then return `None`. In comparison, string methods (such as `lower()` or `replace()`) will return a _different_ string value. This is because lists can be changed, but strings cannot (they are _immutable_).
When comparing lists using a _relational operator_ (e.g., `==` or `>`), the operation is applied to the lists **member-wise**: each element in the first list operand is compared to the element _at the same index_ in the second list operand. If the comparison is `True` for _each_ pair of elements, then the expression is `True`. In practice, this means that (a) you can use `==` to compare the contents of lists, and (b) a list is "smaller" than another if it's first item is smaller than the other's.
But be careful: just because two lists have the same contents (are `==`) does not mean that they are _the same list_! In particular, two lists can be _different objects_ (values) but still have the same contents. In Python, you can test whether two values are actually _the same value_ (as opposed to having the same content) using the **`is`** operator.
```python
# With strings, literals are shared (because they cannot be mutated)
str_a = "banana" # `a` labels string literal "banana"
str_b = "banana" # `b` labels string literal "banana"
# Both variables label the same (literal) value
str_a is str_b # True
# With lists, each list created is a different object!
list_a = [1,2,3] # `a` labels a new list [1,2,3]
list_b = [1,2,3] # `b` labels a new list [1,2,3]
a == b # True, have same values as contents
a is b # False, are two different objects
list_c = list_a # `c` labels the value that `a` labels
a is c # True, both are the same object
# Modify the list!
list_a[0] = 10
print(list_b[0]) # 1 (a different list)
print(list_c[0]) # 10 (the same list)
```
Keeping track of whether a _new_ list has been created is particularly important when using lists as **arguments to functions**. Function arguments are local variables that are _assigned_ the passed value—if this value is a list, then the assigned variable will refer to the same list, and any modifications to the argument will affect the value outside of the function:
```python
# A version of a function that modifies the list
def delete_first(a_list):
a_list.pop(0) # modifies the given value
letters = ['a','b','c']
delete_first(letters) # call function
print(letters) # ['b', 'c'], variable is changed
# A version of a function that does not modify the list
def delete_first(a_list):
a_list = a_list[1:] # create new local variable (replacing old local var)
letters = ['a','b','c']
delete_first(letters) # call function
print(letters) #=> ['a', 'b', 'c'], variable is not changed
```
### Lists and Loops
As with other iterable sequences like strings and files, it is possible to iterate through the contents of a list by using a _for loop_:
```python
numbers = [3.98, 8, 10.8, 3.27, 5.21]
for element in numbers: # `number` is a better local variable name
print(element)
# This will not let you modify the list, since the "number" variable is local
for number in numbers:
number = round(number, 1)
print(numbers) # [3.98, 8, 10.8, 3.27, 5.21], not rounded
```
In order to _modify_ the list while iterating through it, you need a way to refer to the element you want to change. Since we refer to elements by their _index_, a better solution is to instead iteate through the _range of indices_ rather than through the elements themselves:
```python
numbers = [3.98, 8, 10.8, 3.27, 5.21]
for i in range(len(numbers)): # `i` for "index"
print(numbers[i]) # refer to elements by index
# Can now modify the list
for i in range(len(numbers)):
numbers[i] = round(numbers[i], 1) # change value at index to rounded version
print(numbers) # [4.0, 8, 10.8, 3.3, 5.2], rounded!
```
- Note that this process of applying some change to each element in a list is known as a **mapping** (each value "maps" or goes to some transformed version of itself). We will discuss mapping more in a later module.
## Nested Lists
As noted at the start of the module, lists elements can be of **any** data type (and any _combination_ of data types)—including other lists! These "lists of lists" are known as **nested lists** or **2-dimensional lists** (or _3d-list_ for a "list of lists of lists", etc). Nested lists are most commonly used to represent information such as _tables_ or _matrices_.
Nested lists work exactly like normal lists; the elements just happen to themselves be indexable (like strings!):
```python
# a list of different dinners available at a fancy party
# this list has 4 elements, each of which is a list of 3 elements
# the indentation is just for human readability
dinner_options = [
["chicken", "mashed potatoes", "mixed veggies"],
["steak", "seasoned potatoes", "asparagus"],
["fish", "seasoned rice", "green beans"],
["portobello steak", "seasoned rice", "green beans"]
]
len(dinner_options) # 4
fish_option = dinner_options[2] # ["fish", "seasoned rice", "green beans"]
# because fish_option is a list, we can reference its elements by index
print(fish_option[0]) # "fish"
```
In this example `fish_option` is a variable that refers to a list, and thus its elements can be accessed by index using bracket notation. But as with any operator or function, it is also possible to use bracket notation on an _anonymous value_ (e.g., a literal value that has not been assigned to a variable). That is, because `dinner_options[2]` is a list, we can use bracket notation refer to an element of that list without assigning it to a variable first:
```python
# Access the 2th element's 0th element
dinner_options[2][0] # "fish"
```
This "pair of brackets" notation allows you to easily access elements within nested lists. This is particularly useful for 2d-lists that represent _tables_ as a list of "rows" (often data records), each of which is a list of "column cells" (often data features):
```python
# a simple table of values
table = [ ['aa','ab','ac','ad'],
['ba','bb','bc','bd'],
['ca','cb','cc','cd'] ]
row = 1 # cells starting with 'b'
col = 3 # cells ending with 'd'
table[row][col] # "bd", the cell at row/col
```
Note that we often use _nested for loops_ to iterate through a _nested list_:
```python
for i in range(len(table)): # go through each row (with index)
for j in range(len(table[i])) # go through each col of that row (with index)
print(table[i][j]) # access ith row, jth column
```
- We use a `j` for the index of the nested loop, because the `i` for "index" was already taken! `row` and `col` are also excellent local variable names.
## Tuples
While lists are _mutable_ (changeable) sequences of data, **tuples** represent ___immutable___ sequences of data. These are useful if you want to enforce that a data value won't be changed, such as for a function argument (or a dictionary key; see [module 10](../../module10-dictionaries)). Indeed, many built-in Python functions utilize tuples.
**Tuples** are written as _comma-separated sequences_ as values. They are often placed inside parentheses for clarity (to help indicate the start and end of the tuple values):
```python
letters_tuple = ('a', 'b', 'c')
print(letters_tuple) # ('a', 'b', 'c')
# also a tuple (without parentheses)
numbers_tuple = 1, 2, 3
print(numbers_tuple) # (1, 2, 3)
# A tuple representing a person's name, age, and whether they are hungry
# Tuple values have _implied_ meanings, which should be explained in comments
hungry_person = ('Ada', 28, True)
# In English, tuples may be named based on the number of elements
triple = (1,2,3)
double = (4,5)
single = (6,) # extra comma indicates is a tuple, not just int `6`
empty = () # an empty expression is a tuple!
type(()) # <class 'tuple'> (type of empty expression)
```
- It's important to note that while we often write tuples in parentheses (and they are printed in parentheses), it is the **commas** that makes a sequence of literals into a tuple. The parentheses act just like like they do in mathematical expressions—they are only necessary to clarify ambiguity in the order-of-operations. You will find that some "idiomatic" expressions using tuples forgo the parentheses, making the syntax look more magical than it is!
Elements in tuples can be accessed by **index** using **bracket notation**, just like the elements in lists. However, tuples _cannot be modified_, so you cannot assign a new value to an index in a tuple:
```python
letter_triple = ('a','b','c')
print(letter_triple[0]) # 'a'
print(letter_triple[1:3]) # ('b','c'), a tuple
letter_triple[0] = 'z' # TypeError!
```
Tuples can be compared using _relational operators_ just like lists, and have the "member-wise" comparison behavior described above. This makes it easy to order the immutable tuples just like you would order numbers or strings.
Finally, tuples provide one additional useful feature. The Python interpreter uses tuples to perform **multiple assignments**, where you assign multiple values to multiple variables in a single statement. We have already been able to assign multiple, comma-separated values to a single _tuple_ variable (a process called **packing**). But Python also supports having a single sequential value (e.g., a _tuple_) be assigned to multiple, comma-separate variables (a process called **unpacking**)!
```python
triple = 1, 2, 3 # assign multiple values to single variable (packing)
print(triple) # (1, 2, 3)
x, y, z = (1,2,3) # assign single tuple value to multiple variables (unpacking)
print(x) # 1
print(y) # 2
print(z) # 3
# the VALUE in this statement is evaluated as a tuple, and then is assigned to
# multiple variables!
a, b, c = x, y, z # a=x; b=y; c=z
# the same process can be used to swap values!
a, b = b, a
```
This is mostly a useful shortcut (_syntactical sugar_), but is also used by some idiomatic Python constructions.