Contents:
1. Jupyter notebooks
2. Paths to Python components and packages
3. PYZO editor for .py
4. Basics of numpy arrays
5. Numpy.random
6. Array operations
7. Boolean indexing
8. Fancy indexing
9. Transposition
1. Jupyter notebooks
Jupyter is part of the Anaconda distribution, but I prefer simple commands:
$ python3 -m pip install jupyter
Jupyter notebooks run a web server that can be accessed either directly from a browser or via the terminal. To find which localhost:port a running server uses:
$ jupyter notebook list
To stop a server:
$ jupyter notebook stop 8888
To open a notebook file (with the .ipynb extension), change to the notebook directory in a terminal and run:
$ jupyter notebook
This starts a server, then opens the notebook's home directory in a browser. Click a notebook to open it. Keep the terminal open, since the server runs as an attached process. To stop the server, press Ctrl-C or just close the terminal.
Press Shift-Return to run the selected cell and advance to the next one.
2. Paths to Python components and packages
>>> import sys
>>> print('\n'.join(sys.path))
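sys.path lists the directories Python searches for modules. To see which file a particular package would be loaded from, the standard importlib machinery can be queried without importing the package. A minimal sketch, where the json module is only an illustrative choice:

```python
import importlib.util
import sys

# Look up a module without importing it; 'json' is just an example name.
spec = importlib.util.find_spec("json")
print(spec.origin)  # file the module would be loaded from

# Number of directories Python will search, in order.
print(len(sys.path), "entries on sys.path")
```

If find_spec returns None, the package is not reachable from any entry on sys.path.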
3. PYZO editor for .py
PYZO works for me. It is lightweight, has debugging options, and is configurable.
It supports cells: blocks of code that can be executed with a simple shortcut (Cmd-Return). Define cells by delimiting them with #%% lines, optionally followed by a comment.
Some shortcuts (original and my own custom):
---------- EDITOR:
Opt-Tab - select previous file
F1 - focus to shell panel
F2 - focus to file editor
Cmd-/ - comment selection
Cmd-Opt-/ - uncomment
---------- RUN:
Cmd-R - run file as a script (in console restarts interpreter)
Cmd-Return - run cell with cursor (cells are delimited by #%%)
Opt-Return - run selection
---------- DEBUGGER:
Cmd-B - toggle breakpoint
F6 - step over
F7 - step in
F8 - step out
Ctrl-Cmd-Y - continue
Ctrl-Cmd-. - stop debugging
4. Basics of numpy arrays
There are many ways to initialize arrays. The two main options are shape and dtype. Note that np.ndarray allocates memory without initializing it, so the values printed below are arbitrary:
>>> import numpy as np
>>> x = np.ndarray(shape=(2, 2), dtype=np.int8, order='C')
>>> print(x)
[[1 0]
 [1 1]]
The internal buffer is linear. The shape can be changed without reallocation as long as the total element count stays the same:
>>> x.shape = (1, 4)
>>> print(x)
[[1 0 1 1]]
Other array constructors:
>>> print(np.array((1, 2, 3)))
[1 2 3]
>>> print(np.zeros((2, 3)))
[[0. 0. 0.]
[0. 0. 0.]]
>>> print(np.empty((2,)))
[7.74860419e-304 7.74860419e-304]
Array construction using list comprehensions. Note that a dimension size given as -1 is inferred from the total element count and the other dimensions:
>>> x = np.array([(x, y) for x in [1,2,3] for y in [3,1,4] if x != y])
>>> print(x)
[[1 3]
[1 4]
[2 3]
[2 1]
[2 4]
[3 1]
[3 4]]
>>> x.shape = (2, -1)
>>> print(x)
[[1 3 1 4 2 3 2]
[1 2 4 3 1 3 4]]
>>> x.shape = (-1)
>>> print(x)
[1 3 1 4 2 3 2 1 2 4 3 1 3 4]
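The same inference works with reshape, which returns a new array object instead of changing the shape in place. A minimal sketch with illustrative variable names; a shape that does not divide the element count evenly raises ValueError:

```python
import numpy as np

a = np.arange(14)

# -1 lets NumPy infer the missing dimension: 14 elements / 2 rows = 7 columns.
b = a.reshape(2, -1)
print(b.shape)  # (2, 7)

# 14 elements cannot be split into rows of 4, so this raises ValueError.
try:
    a.reshape(-1, 4)
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```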
Numpy arrays in list comprehension expressions:
>>> y = np.array([e for e in x if not e % 2])
>>> print(y)
[4 2 2 2 4 4]
Range into 2D array:
>>> x = np.arange(15).reshape(5, -1).T
>>> print(x)
[[ 0 3 6 9 12]
[ 1 4 7 10 13]
[ 2 5 8 11 14]]
Numpy array operations run as compiled C loops over a contiguous memory buffer, so they are much faster than Python's list comprehensions:
#%% Timing init
import time
import numpy as np
x1 = np.arange(1000000)
x2 = np.arange(1000000)
#%% Timing native
# Repeated in-place squaring overflows the integer dtype and wraps around;
# only the elapsed time matters here, not the values.
t0 = time.time()
for _ in range(10):
    x1 **= 2
t1 = time.time()
print("x1 time = ", t1 - t0)
#%% Timing list comprehension
t0 = time.time()
for _ in range(10):
    x2 = np.array([x ** 2 for x in x1])
t1 = time.time()
print("x2 time = ", t1 - t0)
#%%
----------------------------------------
x1 time = 0.015141725540161133
x2 time = 4.744795083999634
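For more repeatable measurements, the standard timeit module can be used instead of manual time.time() calls. A rough sketch, assuming a smaller array so it runs quickly; the function names are illustrative and the absolute numbers will differ per machine:

```python
import timeit
import numpy as np

n = 10_000  # small, so the sketch runs quickly
a = np.arange(n)

def vectorized():
    return a ** 2  # one compiled loop over the whole buffer

def comprehension():
    return np.array([v ** 2 for v in a])  # one Python-level step per element

t_vec = timeit.timeit(vectorized, number=20)
t_list = timeit.timeit(comprehension, number=20)
print(f"vectorized: {t_vec:.5f} s, comprehension: {t_list:.5f} s")
```

Both functions return a new array rather than squaring in place, so the values stay small and the two results can be compared directly.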
5. Numpy.random
To generate standard numpy arrays filled with random numbers:
np.random.rand(d0, ..., dn) - each value is drawn uniformly from the range [0, 1).
np.random.randn(d0, ..., dn) - each value is drawn from a Gaussian distribution with mean = 0 and variance = 1 (sigma squared).
sigma * np.random.randn(d0, ..., dn) + mean - a general normal distribution.
There are lots of other useful functions, e.g. np.random.choice(...).
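A short sketch of these calls together. The seed, shapes, mean and sigma values are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(0)  # fixed seed so repeated runs give the same numbers

u = np.random.rand(2, 3)                    # uniform in [0, 1), shape (2, 3)
g = 2.0 * np.random.randn(1000) + 5.0       # normal with mean 5, sigma 2
c = np.random.choice([10, 20, 30], size=4)  # 4 draws with replacement

print(u.shape, g.mean(), g.std(), c)
```

With 1000 samples, the printed mean and standard deviation should land close to 5 and 2.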
6. Array operations
A slice of an array returns a view: a data structure that defines a subrange and points at the original array's data. Note that the second index (e.g. 10 in [3:10]) is exclusive, pointing one past the last element included:
x1 = np.arange(12)
print(x1)
x1_slice = x1[3:10]
print('x1_slice = ', x1_slice)
print('x1_slice[0:3] = ', x1_slice[0:3])
x1_slice[0:3] = 100
print(x1)
----------------------------------------
[ 0 1 2 3 4 5 6 7 8 9 10 11]
x1_slice = [3 4 5 6 7 8 9]
x1_slice[0:3] = [3 4 5]
[ 0 1 2 100 100 100 6 7 8 9 10 11]
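When an independent array is needed, the slice has to be copied explicitly. A minimal sketch contrasting a view with a copy, using np.shares_memory to check whether two arrays use the same buffer (variable names are illustrative):

```python
import numpy as np

x = np.arange(12)

view = x[3:10]         # shares memory with x
copy = x[3:10].copy()  # independent buffer

view[0] = -1  # visible through x
copy[1] = -2  # not visible through x

print(np.shares_memory(x, view), np.shares_memory(x, copy))
print(x)
```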
Multi-dim array slices:
x1 = np.arange(12).reshape(4,-1)
print(x1)
x1_slice = x1[2:4]
print(x1_slice)
print(x1_slice[0][1:3])
----------------------------------------
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
[[ 6 7 8]
[ 9 10 11]]
[7 8]
Slicing across multiple dimensions:
print(x1[:2, 1:3])
x1[:, 1:2] = 100
print(x1)
----------------------------------------
[[1 2]
[4 5]]
[[ 0 100 2]
[ 3 100 5]
[ 6 100 8]
[ 9 100 11]]
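Chained indexing and a single multi-dimensional index reach the same elements; the single index is usually shorter to write. The sketch below also shows that an integer index drops an axis while a slice keeps it (variable names are illustrative):

```python
import numpy as np

x = np.arange(12).reshape(4, -1)

chained = x[2:4][0][1:3]  # slice rows, take the first, slice columns
direct = x[2, 1:3]        # same elements in one indexing step
print(chained, direct)

col = x[:, 1]      # integer index drops the axis: shape (4,)
col2d = x[:, 1:2]  # slice keeps the axis: shape (4, 1)
print(col.shape, col2d.shape)
```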
7. Boolean indexing
Taken and modified from the pydata-book.
A boolean operation on an array returns an array of element-wise boolean results. When a boolean array is used as an index, it picks the array elements where the boolean component is True.
The boolean array's length must equal the data array's size along the indexed axis:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Joe'])
data = np.random.randint(1, 10, (5, 4))
print(names)
print(data)
mask = (names == 'Bob') | (names == 'Will')
print(mask)
print(data[mask])
data[mask, 1:3] = 100.
print(data)
data[data > 5] = 0
print(data)
----------------------------------------
['Bob' 'Joe' 'Will' 'Bob' 'Joe']
[[1 4 3 6]
[6 1 3 3]
[7 1 6 1]
[3 3 9 4]
[4 7 8 9]]
[ True False True True False]
[[1 4 3 6]
[7 1 6 1]
[3 3 9 4]]
[[ 1 100 100 6]
[ 6 1 3 3]
[ 7 100 100 1]
[ 3 100 100 4]
[ 4 7 8 9]]
[[1 0 0 0]
[0 1 3 3]
[0 0 0 1]
[3 0 0 4]
[4 0 0 0]]
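Two details worth seeing in isolation: a mask can be inverted with ~, and selecting with a mask returns a copy, whereas assigning through a mask modifies the original. A minimal sketch with deterministic data in place of the random values above:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Joe'])
data = np.arange(20).reshape(5, 4)

mask = names == 'Bob'
others = data[~mask]  # ~ inverts the mask: rows for everyone except Bob

# Boolean selection returns a copy, so this does not touch data.
picked = data[mask]
picked[:] = -1

# Assigning through the mask does modify data in place.
data[mask, 0] = 99
print(others)
print(data)
```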
8. Fancy indexing
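Passing lists of integers as an index selects the elements at those positions. With one list per axis, the lists are paired element-wise, so the example below picks elements (1, 0), (4, 3), (2, 1) and (2, 2). This allows assembling a new array from any combination of elements of another array: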
x1 = np.arange(20).reshape((5, 4))
print(x1)
x2 = x1[[1, 4, 2, 2], [0, 3, 1, 2]]
print(x2)
x1[[1, 4, 2, 2], [0, 3, 1, 2]] = 100
print(x1)
----------------------------------------
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
[ 4 19 9 10]
[[ 0 1 2 3]
[100 5 6 7]
[ 8 100 100 11]
[ 12 13 14 15]
[ 16 17 18 100]]
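A single index list selects whole rows, and np.ix_ builds an index that takes every combination of the given rows and columns instead of pairing them. A short sketch of the difference (variable names are illustrative):

```python
import numpy as np

x = np.arange(20).reshape(5, 4)

rows = x[[4, 0, 4]]  # rows 4, 0, 4 in that order; repeats are allowed
print(rows)

pairs = x[[1, 4], [0, 3]]          # elements (1, 0) and (4, 3)
block = x[np.ix_([1, 4], [0, 3])]  # all combinations: rows 1, 4 x columns 0, 3
print(pairs)
print(block)
```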
9. Transposition
A transposed array is a view that points at the same data (no copying takes place), so writing through it changes the original:
x1 = np.arange(15).reshape((3, 5))
print(x1)
x2 = x1.T
print(x2)
x2[2] = 100
print(x1)
----------------------------------------
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]]
[[ 0 5 10]
[ 1 6 11]
[ 2 7 12]
[ 3 8 13]
[ 4 9 14]]
[[ 0 1 100 3 4]
[ 5 6 100 8 9]
[ 10 11 100 13 14]]
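The reason no copy is needed is that a transpose only swaps the strides, the byte steps used to move along each axis. A minimal sketch, with the dtype fixed to int64 so the stride values are predictable; np.ascontiguousarray is used where a separate row-major copy is wanted:

```python
import numpy as np

x = np.arange(15, dtype=np.int64).reshape(3, 5)
t = x.T

print(x.strides, t.strides)    # byte steps per axis are swapped
print(np.shares_memory(x, t))  # True: same buffer
print(x.flags['C_CONTIGUOUS'], t.flags['C_CONTIGUOUS'])

t_copy = np.ascontiguousarray(t)  # explicit copy in row-major order
```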
References:
1. https://github.com/wesm/pydata-book
2. https://leemendelowitz.github.io/blog/how-does-python-find-packages.html
3. https://docs.python.org/3/installing/index.html
4. http://nbviewer.jupyter.org/github/pydata/pydata-book/blob/2nd-edition/ch02.ipynb#
5. https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences