1. Jupyter notebooks

Jupyter is part of Anaconda pack of packages, but I prefer simple commands:

$ python3 -m pip install jupyter

Jupyter notebooks run a web server that can be queried either directly from web console, or via terminal. To find what localhost:port is used by a running server:

$ jupyter notebook list

To stop a server:

$ jupyter notebook stop 8888

To open a notebook file (with .ipynb extension), in terminal point to a notebook directory and run:

$ jupyter notebook

This starts a server, then opens a notebook's home directory in a browser. Click a notebook to run it. Keep terminal shell open and running a server as an attached process. To stop a server use Ctrl-C, or just close a terminal.

Shift-Return to run selected cell and advance to the next.

2. Paths to Python components and packages

>>> import sys

>>> print('\n'.join(sys.path))

3. PYZO editor for .py

PYZO works for me. It is lightweight, has debugging options, and is configurable.
Supports cells, blocks of code that can be executed with a simple shortcut (Cmd-Return). Define cells by delimiting with #$$ optional commentary lines.

Some shortcuts (original and my own custom):

---------- EDITOR:

Opt-Tab - select previous file

F1 - focus to shell panel

F2 - focus to file editor

Cmd-/ - comment selection

Cmd-Opt-/ - uncomment

---------- RUN:

Cmd-R - run file as a script (in console restarts interpreter)

Cmd-Return - run cell with cursor (cells are delimited by #%%)

Opt-Return - run selection

---------- DEBUGGER:

Cmd-B - toggle breakpoint

F6 - step over

F7 - step in

F8 - step out

Ctrl-Cmd-Y - continue

Ctrl-Cmd-. - stop debugging

4. Basics of numpy arrays

Many ways to init arrays. There are two main options - shape and dtype.

>>> x = np.ndarray(shape=(2, 2), dtype=np.int8, order='C')

>>> print(x)

[[1 0] [1 1]]

Internal buffer is linear. Shape can be changed easily without reallocation if total element count remains the same:

>>> x.shape = (1,4)

>>> print(x)

[[1 0 1 1]]

Other array constructors:

>>> print(np.array((1, 2, 3)))

[1 2 3]

>>> print(np.zeros((2, 3)))

[[0. 0. 0.]

[0. 0. 0.]]

>>> print(np.empty((2,)))

[7.74860419e-304 7.74860419e-304]

Array construction using list comprehensions. Note, unspecified dimension size of -1 infers the required element count in that dimension:

>>> x = np.array([(x, y) for x in [1,2,3] for y in [3,1,4] if x != y])

>>> print(x)

[[1 3]

[1 4]

[2 3]

[2 1]

[2 4]

[3 1]

[3 4]]

>>> x.shape = (2, -1)

>>> print(x)

[[1 3 1 4 2 3 2]

[1 2 4 3 1 3 4]]

>>> x.shape = (-1)

>>> print(x)

[1 3 1 4 2 3 2 1 2 4 3 1 3 4]

Numpy arrays in list comprehension expressions:

>>> y = np.array([e for e in x if not e % 2])

>>> print(y)

[4 2 2 2 4 4]

Range into 2D array:

>>> x = np.arange(15).reshape(5, -1).T

>>> print(x)

[[ 0 3 6 9 12]

[ 1 4 7 10 13]

[ 2 5 8 11 14]]

Numpy array operations perform like native C memory access ops (or parallelized vectors in GPU). Therefore much faster than Python's list comprehensions:

#%% Timing init

import time

x1 = np.arange(1000000)

x2 = np.arange(1000000)

#%% Timing native

t0 = time.time()

for _ in range(10):

x1 **= 2

t1 = time.time()

print("x1 time = ", t1 - t0)

#%% Timing list comprehension

t0 = time.time()

for _ in range(10):

x2 = np.array([x ** 2 for x in x1])

t1 = time.time()

print("x2 time = ", t1 - t0)

#%%

----------------------------------------

x1 time = 0.015141725540161133

x2 time = 4.744795083999634

5. Numpy.random

To generate standard numpy arrays filled with random numbers:

np.random.rand(d0,..dn) - with each value uniformly in range [0, 1).

np.random.randn(d0,..dn) - with each value in Gaussian distribution, where mean = 0, variance = 1 (sigma squared).

sigma * np.random.randn(d0,..dn) + mean - a full normal distribution.

There are lots of other useful functions, i.e. np.random.choice(...).

6. Array operations

Slice of an array returns a data structure that defines subrange and points at the original array. Note that the second index (i.e. in [3:10]) points beyond the last element:

x1 = np.arange(12)

print(x1)

x1_slice = x1[3:10]

print('x1_slice = ', x1_slice)

print('x1_slice[0:3] = ', x1_slice[0:3])

x1_slice[0:3] = 100

print(x1)

----------------------------------------

[ 0 1 2 3 4 5 6 7 8 9 10 11]

x1_slice = [3 4 5 6 7 8 9]

x1_slice[0:3] = [3 4 5]

[ 0 1 2 100 100 100 6 7 8 9 10 11]

Multi-dim array slices:

x1 = np.arange(12).reshape(4,-1)

print(x1)

x1_slice = x1[2:4]

print(x1_slice)

print(x1_slice[0][1:3])

----------------------------------------

[[ 0 1 2]

[ 3 4 5]

[ 6 7 8]

[ 9 10 11]]

[[ 6 7 8]

[ 9 10 11]]

[7 8]

Slicing across multiple dimensions:

print(x1[:2, 1:3])

x1[:, 1:2] = 100

print(x1)

----------------------------------------

[[1 2]

[4 5]]

[[ 0 100 2]

[ 3 100 5]

[ 6 100 8]

[ 9 100 11]]

7. Boolean indexing

Taken and modified from the pydata-book.

A boolean operation with array will return an array of boolean element-wise results. Boolean array when used as index will pick array elements where boolean component is True.

Boolean array size must be equal to data array size at the index:

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Joe'])

data = np.random.randint(1, 10, (5, 4))

print(names)

print(data)

mask = (names == 'Bob') | (names == 'Will')

print(mask)

print(data[mask])

data[mask, 1:3] = 100.

print(data)

data[data > 5] = 0

print(data)

----------------------------------------

['Bob' 'Joe' 'Will' 'Bob' 'Joe']

[[1 4 3 6]

[6 1 3 3]

[7 1 6 1]

[3 3 9 4]

[4 7 8 9]]

[ True False True True False]

[[1 4 3 6]

[7 1 6 1]

[3 3 9 4]]

[[ 1 100 100 6]

[ 6 1 3 3]

[ 7 100 100 1]

[ 3 100 100 4]

[ 4 7 8 9]]

[[1 0 0 0]

[0 1 3 3]

[0 0 0 1]

[3 0 0 4]

[4 0 0 0]]

8. Fancy indexing

Passing list of numbers into an indexer will treat it as an index picker. Allows assembling a new array from a combination of elements of another array:

x1 = np.arange(20).reshape((5, 4))

print(x1)

x2 = x1[[1, 4, 2, 2], [0, 3, 1, 2]]

print(x2)

x1[[1, 4, 2, 2], [0, 3, 1, 2]] = 100

print(x1)

----------------------------------------

[[ 0 1 2 3]

[ 4 5 6 7]

[ 8 9 10 11]

[12 13 14 15]

[16 17 18 19]]

[ 4 19 9 10]

[[ 0 1 2 3]

[100 5 6 7]

[ 8 100 100 11]

[ 12 13 14 15]

[ 16 17 18 100]]

9. Transposition

Transposed matrix is pointing at the same data (no copying takes place):

x1 = np.arange(15).reshape((3, 5))

print(x1)

x2 = x1.T

print(x2)

x2[2] = 100

print(x1)

----------------------------------------

[[ 0 1 2 3 4]

[ 5 6 7 8 9]

[10 11 12 13 14]]

[[ 0 5 10]

[ 1 6 11]

[ 2 7 12]

[ 3 8 13]

[ 4 9 14]]

[[ 0 1 100 3 4]

[ 5 6 100 8 9]

[ 10 11 100 13 14]]

References:

1. https://github.com/wesm/pydata-book
2. https://leemendelowitz.github.io/blog/how-does-python-find-packages.html

3. https://docs.python.org/3/installing/index.html

4. http://nbviewer.jupyter.org/github/pydata/pydata-book/blob/2nd-edition/ch02.ipynb#
5. https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences

A month ago I prepared a quick team and a project for consideration at Microsoft's Imagine Cup. It was an intriguing experience to document details, to present in a video, and to polish a previously developed prototype - a wireless motion/gesture capturing app and a set of sensors.
I understood there was a slim chance to actually win the Cup, so I instead focused on setting up this project for possible future applications elsewhere.

The team is called Heuristic Clinic, the project is Applied Neural Diagnosis. Brief description can be found here: Heuristic Clinic, and the resulting certificate can be viewed here.
They were kind to offer $500 worth of Azure cloud, but I had to decline because the biggest expense is funding client-side development. Appreciated.

2018 Big Idea Challenge

Top 50 - 2018 Big Idea Challenge

This Certificate is Presented To:

alexey l

Team Heuristic Clinic

Congratulations and thank you for your hard work and dedication as a competitor in Imagine Cup 2018. You are now part of an elite international community of students who have shown remarkable creativity and innovation to push technology forward.

Pages

Sunday, March 25, 2018

ML with Python (numpy, pandas), my tools + code snippets (Part 1)

Contents:

1. Jupyter notebooks

2. Paths to Python components and packages

3. PYZO editor for .py

4. Basics of numpy arrays

5. Numpy.random

6. Array operations

7. Boolean indexing

8. Fancy indexing

9. Transposition

Wednesday, March 21, 2018

Microsoft's Imagine Cup Challenge - Team Heuristic Clinic

2018 Big Idea Challenge

Top 50 - 2018 Big Idea Challenge

alexey l

Team Heuristic Clinic

Pablo Veramendi
Imagine Cup Competition Manager
Microsoft Corporation

Pages

Sunday, March 25, 2018

ML with Python (numpy, pandas), my tools + code snippets (Part 1)

Contents:

1. Jupyter notebooks

2. Paths to Python components and packages

3. PYZO editor for .py

4. Basics of numpy arrays

5. Numpy.random

6. Array operations

7. Boolean indexing

8. Fancy indexing

9. Transposition

Wednesday, March 21, 2018

Microsoft's Imagine Cup Challenge - Team Heuristic Clinic

2018 Big Idea Challenge

Top 50 - 2018 Big Idea Challenge

alexey l

Team Heuristic Clinic

Pablo VeramendiImagine Cup Competition ManagerMicrosoft Corporation

Pablo Veramendi
Imagine Cup Competition Manager
Microsoft Corporation