accessing the array is easy: the object supports all the usual
mathematical operations, and all of them are applied pointwise, so you
can just

```python
print(data, data + 2*data/data**0.5 - data**2)
```

this is also reasonably efficient!
we can easily access slices (subarrays) as well:
- array[a:b] gives you elements array[a] to array[b-1]
- multidimensional arrays have dimensions separated by commas: array[a:b, c:d, e:f]
- any missing endpoint is treated as the extreme point: a missing
  starting point means the first element, and a missing endpoint means
  up to and including the last element
- negative indices count from the end of the array
- note that there is no way to write a slice with a negative
  endpoint index which includes the last element of the array (leave
  the endpoint out instead)
- slices can stride: array[a:b:c] gives every cth point from a
  (inclusive) to b (exclusive)
- or can slice backwards: array[a:b:-c] gives every cth point
  backwards from a (inclusive) to b (exclusive)
- if any axis of your slice contains no points at all, you get an
  empty array
- an integer index removes a dimension; if you want to keep the
  dimension, use a trivial slice of one element
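A quick demonstration of these rules on throwaway example arrays (`a` and `b` are mine, not from the course code):

```python
import numpy

a = numpy.arange(10)   # [0, 1, ..., 9]
print(a[2:5])          # elements 2 to 4 -> [2 3 4]
print(a[:3])           # missing start: from first element -> [0 1 2]
print(a[7:])           # missing end: up to and including last -> [7 8 9]
print(a[-3:])          # negative indices count from the end -> [7 8 9]
print(a[1:8:2])        # every 2nd point from 1 (incl.) to 8 (excl.) -> [1 3 5 7]
print(a[8:2:-2])       # backwards -> [8 6 4]
print(a[5:-1])         # a negative endpoint can never include the last element -> [5 6 7 8]

b = numpy.arange(24).reshape(2, 3, 4)
print(b[0].shape)      # integer index removes a dimension -> (3, 4)
print(b[0:1].shape)    # one-element slice keeps it -> (1, 3, 4)
```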
```python
print(mat, mat**2, rowvec, colvec)
print(colvec*mat1, mat2*colvec.T, mat2*rowvec)
print(mat1*colvec)  # raises ValueError: (3,4) matrix cannot be multiplied from the left by (1,3) matrix
```
never think of, or refer to, an array as a matrix or vice versa: they obey different arithmetic
but to a certain extent, arrays are vectors: they can be broadcast
to 1-column and 1-row matrices, but they do not have the usual transposes
(they DO have a transpose, though), their arithmetic is array arithmetic, etc.
```python
arrayvec = numpy.random.random((3,))
print(arrayvec, arrayvec.T)
print(arrayvec*mat1)
print(mat2*arrayvec)  # gives an error
```
Writing efficient numpy code
Let’s also take a sneak peek at the next topic, profiling, while we look at
how to do numerics, using the Laplacian as an example
the Laplacian is obviously related to PDEs, but the arithmetic is very
similar to e.g. a discrete low-pass filter like
$y_i = y_{i-1} + a\,(x_i - y_{i-1})$, i.e.
output[i] = output[i-1] + a * (input[i] - output[i-1])
with output[0] = input[0]
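As a concrete illustration of that recurrence, here is a sketch of such a filter in plain Python (the name `lowpass` is mine). Note that, unlike the Laplacian stencil, this loop cannot be replaced by simple whole-array slice arithmetic, because each output value depends on the previous output value:

```python
import numpy

def lowpass(x, a):
    """Discrete first-order low-pass filter:
    y[0] = x[0]; y[i] = y[i-1] + a*(x[i] - y[i-1])."""
    y = numpy.empty_like(x)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = y[i-1] + a * (x[i] - y[i-1])
    return y

signal = numpy.random.random(100)
smoothed = lowpass(signal, 0.3)  # small a -> heavier smoothing
```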
in python we could do

```python
import numpy
import cProfile
import time as timemod

def init_data(sizes):
    return numpy.random.random(sizes)

def Laplacian(data, lapl, d):
    for ii in range(1, data.shape[0]-1):
        for jj in range(1, data.shape[1]-1):
            for kk in range(1, data.shape[2]-1):
                lapl[ii, jj, kk] = (
                    (data[ii-1, jj, kk] - 2*data[ii, jj, kk] + data[ii+1, jj, kk])/d[0]*d[1]*d[2] +
                    (data[ii, jj-1, kk] - 2*data[ii, jj, kk] + data[ii, jj+1, kk])/d[1]*d[0]*d[2] +
                    (data[ii, jj, kk-1] - 2*data[ii, jj, kk] + data[ii, jj, kk+1])/d[2]*d[0]*d[1])
    return

def runone(func):
    d = numpy.array([0.1, 0.1, 0.1])
    data = init_data((100, 100, 100))
    lapl = numpy.zeros_like(data)
    cp = cProfile.Profile()
    start = timemod.perf_counter()  # time.clock() was removed in Python 3.8
    cp.runcall(func, data, lapl, d)
    end = timemod.perf_counter()
    print("cProfile gave total time of {time} s and the following profile.".format(time=end-start))
    cp.print_stats()

L = runone(Laplacian)
```
that took a while (9.4 s on my laptop)! Any ideas why?
let’s try numpy-style without explicit loops
numpy converts operations between sliced or whole numpy arrays into vectorised loops
note that this can deceive you: how much memory does array_A = array_B + array_C*array_D consume? How many memory accesses does it contain?
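A sketch of the loop-free version (the name `Laplacian_numpy` is mine): each shifted slice expression replaces one of the explicit ii/jj/kk loops over the interior points, and numpy turns the whole-array arithmetic into vectorised loops in C:

```python
import numpy

def Laplacian_numpy(data, lapl, d):
    # the three pairs of shifted slices are the stencil neighbours
    # along each axis; [1:-1, 1:-1, 1:-1] is the interior of the grid
    lapl[1:-1, 1:-1, 1:-1] = (
        (data[:-2, 1:-1, 1:-1] - 2*data[1:-1, 1:-1, 1:-1] + data[2:, 1:-1, 1:-1])/d[0]*d[1]*d[2] +
        (data[1:-1, :-2, 1:-1] - 2*data[1:-1, 1:-1, 1:-1] + data[1:-1, 2:, 1:-1])/d[1]*d[0]*d[2] +
        (data[1:-1, 1:-1, :-2] - 2*data[1:-1, 1:-1, 1:-1] + data[1:-1, 1:-1, 2:])/d[2]*d[0]*d[1])

data = numpy.random.random((50, 50, 50))
lapl = numpy.zeros_like(data)
Laplacian_numpy(data, lapl, numpy.array([0.1, 0.1, 0.1]))
```

Note the hidden cost the bullet above warns about: each slice expression allocates a temporary array the size of the interior, so this is much faster than the loops but not free in memory traffic.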
let’s see how cython works and improves performance
everything from %%cython to the next empty line will be saved to
a temporary file, turned into C code by cython, and then
compiled into a python module which is then imported
when cython runs, it does not see our current namespace (it is a
separate process), so we need to import whatever we use
there is also a special cimport command, which imports “into the C code”
the @cython lines are decorators which affect how cython
treats the following function: we want no bounds checking on our
arrays, and we want $1/0$ to produce $\infty$ instead of python’s
ZeroDivisionError
this is more or less the standard cython preamble
notice also the type definitions in the function definition:
always type everything in cython, because if you do not, cython
treats the variables as python objects, with all the performance penalty
that implies
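Assembled from the description above, such a cell might look roughly like this (a sketch, not the course’s exact code; `boundscheck`/`cdivision` are the standard decorator pair for disabling bounds checks and python division checks, and `double[:, :, :]` is a typed memoryview):

```cython
%%cython
cimport cython

@cython.boundscheck(False)   # no bounds checking on array accesses
@cython.cdivision(True)      # C division semantics: no ZeroDivisionError check
def Laplacian_cython(double[:, :, :] data, double[:, :, :] lapl, double[:] d):
    cdef int ii, jj, kk      # untyped loop variables would stay python objects
    for ii in range(1, data.shape[0]-1):
        for jj in range(1, data.shape[1]-1):
            for kk in range(1, data.shape[2]-1):
                lapl[ii, jj, kk] = (
                    (data[ii-1, jj, kk] - 2*data[ii, jj, kk] + data[ii+1, jj, kk])/d[0]*d[1]*d[2] +
                    (data[ii, jj-1, kk] - 2*data[ii, jj, kk] + data[ii, jj+1, kk])/d[1]*d[0]*d[2] +
                    (data[ii, jj, kk-1] - 2*data[ii, jj, kk] + data[ii, jj, kk+1])/d[2]*d[0]*d[1])
```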
we can do still better: the gcc compiler used does not realise that
the lattice constants do not change from lattice site to lattice
site, so the /d[0]*d[1]*d[2] etc. could be computed just once and then
multiplied (never divide if you can avoid it!) into the stencil:
even compared to the vectorised pure python, this is about 10x faster
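The same hoisting trick shown in pure numpy (the name `Laplacian_hoisted` is mine): the three divisions now happen once per call instead of once per lattice site, and only multiplications remain in the stencil:

```python
import numpy

def Laplacian_hoisted(data, lapl, d):
    # hoist the lattice-constant factors out of the stencil:
    # divide once here, multiply everywhere else
    c0 = d[1]*d[2]/d[0]
    c1 = d[0]*d[2]/d[1]
    c2 = d[0]*d[1]/d[2]
    centre = data[1:-1, 1:-1, 1:-1]
    lapl[1:-1, 1:-1, 1:-1] = (
        (data[:-2, 1:-1, 1:-1] - 2*centre + data[2:, 1:-1, 1:-1]) * c0 +
        (data[1:-1, :-2, 1:-1] - 2*centre + data[1:-1, 2:, 1:-1]) * c1 +
        (data[1:-1, 1:-1, :-2] - 2*centre + data[1:-1, 1:-1, 2:]) * c2)

data = numpy.random.random((50, 50, 50))
lapl = numpy.zeros_like(data)
Laplacian_hoisted(data, lapl, numpy.array([0.1, 0.1, 0.1]))
```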
Compiling with cython outside of python
- save the code into a file (complicated to arrange in a jupyter
  notebook, so get profiling.pyx and setup.py from the repo and
  place them in the right directory)
- run python setup.py build_ext --inplace to get a module called
  profiling which you can import
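For reference, the typical minimal form of such a setup.py looks like this (the repo’s actual file may differ):

```python
# minimal setup.py sketch for building profiling.pyx into an importable module;
# the repo's actual setup.py may contain more options
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("profiling.pyx"))
```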
Profiling
we already know cProfile, but let’s see what it gives in a more complicated example
cython’s profiling capabilities are also of interest: in the earlier
examples, we saw just something like
_cython_magic_c63ab7889ce7cc65e5cd8f75df5d29ae.Laplacian_cython2
and that is all we would have seen even if the cython code had
deeper call hierarchies: cProfile cannot see into cython without
cython giving it a hand
This is a general problem with profiling: the overhead of setting up
profiling of an oft-called function will give a wrong impression
of how much time the caller takes.
two more useful tricks: inlining, and defining a pure-C function (not directly callable from python)
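Roughly, the two tricks combine like this in cython (a sketch with illustrative names, not the course’s exact code): `cdef` makes the helper a pure-C function invisible to python, and `inline` asks the C compiler to paste its body directly into the call site:

```cython
%%cython
cimport cython

@cython.boundscheck(False)
@cython.cdivision(True)
cdef inline double stencil_1d(double left, double centre, double right,
                              double dd):
    # pure-C helper: not callable from python, inlined at the call site
    return (left - 2*centre + right)/dd

@cython.boundscheck(False)
@cython.cdivision(True)
def Laplacian_inline(double[:, :, :] data, double[:, :, :] lapl, double[:] d):
    cdef int ii, jj, kk
    for ii in range(1, data.shape[0]-1):
        for jj in range(1, data.shape[1]-1):
            for kk in range(1, data.shape[2]-1):
                lapl[ii, jj, kk] = (
                    stencil_1d(data[ii-1, jj, kk], data[ii, jj, kk], data[ii+1, jj, kk], d[0])*d[1]*d[2] +
                    stencil_1d(data[ii, jj-1, kk], data[ii, jj, kk], data[ii, jj+1, kk], d[1])*d[0]*d[2] +
                    stencil_1d(data[ii, jj, kk-1], data[ii, jj, kk], data[ii, jj, kk+1], d[2])*d[0]*d[1])
```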