Getting started with Python¶

Raghav K. Chhetri
Keller Lab, HHMI Janelia
03/26/2020

This tutorial is based on
Python: v3.7
OS: 64-bit Windows 10
Make minor modifications wherever necessary if using other platforms

Install Python 3.7 using Anaconda¶

Download Anaconda installer and proceed with installation
Once installation is complete, Python is ready to be launched. Two options:
1. Graphical route > Anaconda Navigator
2. Command line > Anaconda Prompt -- we'll use this option in this tutorial

Get familiar with Anaconda Prompt¶

From the Windows Start menu, search for and open “Anaconda Prompt”:

Check your installation
(base) C:\Users\Raghav>where python

Check Python version
(base) C:\Users\Raghav>python –-version

List all installed packages
(base) C:\Users\Raghav>conda list

Update to latest version
(base) C:\Users\Raghav>conda update conda

Start Python¶

Several options to do so:

Command line execution using Python interpreter
(base) C:\Users\Raghav>python

Interactive command line execution using IPython
(base) C:\Users\Raghav>ipython

Spyder (matlab-like interface)
(base) C:\Users\Raghav>spyder

Jupyter notebook
(base) C:\Users\Raghav>jupyter notebook

Conda enviroments¶

Create a new environment (firstenv here)
(base) C:\Users\Raghav>conda create –-name firstenv python=3.7 anaconda

Activate an environment
(base) C:\Users\Raghav>conda activate firstenv (firstenv)C:\Users\Raghav>

Deactivate an active environment
(firstenv) C:\Users\Raghav>conda deactivate (base) C:\Users\Raghav>

Clone from an existing environment
(base) C:\Users\Raghav>create --name secondenv –-clone firstenv

Delete an existing environment
conda env remove --name firstenv

List all environments
(base) C:\Users\Raghav>conda info --envs

Conda Cheat Sheet

Basics¶

Press Shift + Enter to run the code cell

x = [1, 2, 3, 4, 5]
y = x
print(x,y)

[1, 2, 3, 4, 5] [1, 2, 3, 4, 5]

x.append(6)
print(x,y)

[1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6]

The value of y is also updated since both variables point to the same object in memory

x=['a','b','c']
print(x, y)

['a', 'b', 'c'] [1, 2, 3, 4, 5, 6]

In the above cell, a new value was assigned to 'x', so it now points to another location in memory, whereas 'y' still points to the previous object in memory

Takeaway: variables in Python are pointers to objects in memory

Import packages

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

'Shift + Tab' will show you the Docstring for the object you just typed in a code cell. With 'Shift' pressed, press 'Tab' again to cycle through more documentation

x = np.linspace(-np.pi, np.pi, 201)
print(x.dtype,type(x),len(x))

float64 <class 'numpy.ndarray'> 201

plt.plot(x, np.sin(x))
plt.xlabel('Angle [rad]'); plt.ylabel('sin(x)'); plt.axis('tight')
plt.show()

Working with images¶

inPath = 'Y:/Exchange/Raghav/tutorial_python/data/'

from os.path import join
from glob import glob
from skimage.io import imread, imsave

#Grab filenames for all cameras
fnames_cm00 = glob(join(inPath, '*CM00*.tif')); fnames_cm00.sort()
fnames_cm01 = glob(join(inPath, '*CM01*.tif')); fnames_cm01.sort()
fnames_cm02 = glob(join(inPath, '*CM02*.tif')); fnames_cm02.sort()
fnames_cm03 = glob(join(inPath, '*CM03*.tif')); fnames_cm03.sort()

print(fnames_cm00[0])
len(fnames_cm00)

Y:/Exchange/Raghav/tutorial_python/data\SPM00_TM000000_CM00_CHN00_roi.RESLVL2.tif

201

Display opposing-view-images:¶

TM=0
cm00stack = imread(fnames_cm00[TM]) 
cm01stack = imread(fnames_cm01[TM]) 
cm02stack = imread(fnames_cm02[TM]) 
cm03stack = imread(fnames_cm03[TM])

print(type(cm00stack), cm00stack.shape, cm00stack.dtype) #Dimensions of (Z,Y,X)
print(type(cm03stack), cm03stack.shape, cm03stack.dtype)

<class 'numpy.ndarray'> (67, 486, 166) uint16
<class 'numpy.ndarray'> (73, 476, 159) uint16

Define a function to show several images from two imagestacks:

def plot_image_pairs(stackA, stackB, cam_list, index_type, nrows=2, ncols=8, stride=8, vmin=90, vmax=300):
    '''
    This function shows several images from two imagestacks
    Example: plot_image_pairs(cm00stack, cm01stack, ('CM00', 'CM01'), 'z')
    '''
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols,figsize=(21,12))
    for j in range(0,ncols):
        ax[0,j].imshow(stackA[stride*(j+1),:,:], cmap='gray', vmin=vmin, vmax=vmax)
        ax[0,j].axis('off')
        ax[0,j].set_title( cam_list[0] +', ' +index_type +'=%d' % (stride*(j+1)))
        
        ax[1,j].imshow(stackB[stride*(j+1),:,:], cmap='gray',vmin=vmin, vmax=vmax)
        ax[1,j].axis('off')
        ax[1,j].set_title(cam_list[1] + ', ' +index_type +'=%d' % (stride*(j+1)))

plot_image_pairs(cm00stack, cm01stack, ('CM00', 'CM01'), 'z')

plot_image_pairs(cm02stack, cm03stack, ('CM02', 'CM03'), 'z')

pyklb¶

pyklb is a Python wrapper to read/write klb files. On 64-bit Windows running Python 3.7, install as follows:

Download, unzip, and copy pyklb-master to your local drive
For eg., C:\Users\

Activate your Python environment (firstenv here)

Navigate to the \dist\ folder in your local pyklb-master
cd pyklb-master\dist

pip install the wheel present in this folder
pip install pyklb-0.0.3.dev0-cp37-cp37m-win_amd64.whl

Download, unzip, and copy klb.dll to your environment’s \lib\site-packages\ folder
For eg., C:\Anaconda3\envs\firstenv\lib\site-packages\

If you are on a different OS or using a different Python version, build and install everything from scratch from here

Open a .klb file in Python¶

Let’s now open a .klb file in Python to confirm that things are working as expected. Below, I’m using ipython for this:

(base) C:\Users\Raghav>conda activate firstenv (firstenv) C:\Users\Raghav>ipython In [1]: import pyklb as klb In [2]: img = klb.readfull('y:/.../...klb’) In [3]: type(img) In [4]: img.shape In [5]: help(klb)

If this works, then you are all set with loading .klb files in your Python environment!

Read and write klbs¶

import pyklb as klb
img = klb.readfull('Y:/Documents/Slides/Raghav/tutorial_python/test.klb')
print(type(img), img.shape, img.dtype)

<class 'numpy.ndarray'> (78, 256, 94) uint16

You'll notice that the file is loaded in Python as a numpy N-dimensional array. From its shape, we can tell that the numpy array represents a 3D image-stack. Recall that klb can support up to 5D arrays, and the dimension convetion is as follows: [tm,ch,z,y,x]

For this example 3D image-stack, the dimenion convention is thus [z,y,x]

Memory locations

tmp = img
print(hex(id(img)), hex(id(tmp)))

0x1c554895d00 0x1c554895d00

Note that both 'img' and 'tmp' are pointing to the same memory location.

However, if we were to invoke the 'copy' method, then a separate copy is created and mapped to a different memory location.

foo = np.copy(img)
print(hex(id(img)), hex(id(foo)))

0x1c554895d00 0x1c5548168f0

img2 = ((img/np.max(img))*255).astype(np.uint8)
print(type(img2), img2.shape, img2.dtype)
print(hex(id(img)), hex(id(img2)))

<class 'numpy.ndarray'> (78, 256, 94) uint8
0x1c554895d00 0x1c554ac0490

Above, we took the original 'img' array and changed it to 8-bit and saved it as 'img2'

So, a new array was created in the process, and thus the memory locations of 'img' and 'img2' are different.

Save the 8-bit image as a new .klb file:

klb.writefull(img2, filepath='Y:/Exchange/Raghav/tutorial_python/img-8bit.klb')

Note: you can seek more info by calling the 'help' function as: help(klb)

Specify the sampling of your data in microns:

klb.writefull(img2, filepath='Y:/Exchange/Raghav/tutorial_python/img-8bit-um.klb', 
              pixelspacing_tczyx = (4.06125,0.8125,0.8125))

img3 = klb.readfull('Y:/Exchange/Raghav/tutorial_python/img-8bit-um.klb')

Display an image¶

To display an image, we'll use matplotlib, a library for creating visualizations in Python.

import matplotlib.pyplot as plt
#Show the graphics in the notebook
%matplotlib inline

Let's grab a single plane from the 3D image-stack and display it in the notebook as follows

zslice = img3[24,:,:] # This is an example of slicing

Note that the index in Python starts at 0, so if you want to grab the first z-plane, here is how you would do it first_zslice = img3[0,:,:]

And, the last z-plane can be grabbed using an index of '-1' last_zslice = img3[-1,:,:]

plt.figure(figsize=(8,8))
plt.imshow(zslice,cmap='gray',interpolation='none')
plt.colorbar()
plt.show()

Select a ROI of the image by slicing the numpy array

roi = zslice[180:230,0:50]
print(type(roi),roi.mean(), roi.min(), roi.max())

plt.imshow(roi,cmap='gray',interpolation='none')
plt.show()

<class 'numpy.ndarray'> 83.4728 61 161

Display multiple images¶

Instead of looking at a single image, let's display several images from our 3D image-stack on a subplot. For this, we'll write a function and we'll also use a built-in function called 'range' 'which generates a sequence of numbers as shown here

a = range(0,5)
print(type(a))
list(a)

<class 'range'>

[0, 1, 2, 3, 4]

Define a function to show several images from an imagestack:

def plot_images(stackA, axial_label, stride=8, vmin=60, vmax=255, cmap='hot'):
    '''
    This function shows several images from an imagestack
    Example: plot_images(cm00stack, 'z')
    '''
    ncols = int(stackA.shape[0]/stride)
    fig, ax = plt.subplots(1,ncols,figsize=(21,12))
    for j in range(0,ncols):
        ax[j].imshow(stackA[stride*(j),:,:], cmap=cmap, interpolation ='none', vmin=vmin, vmax=vmax)
        ax[j].axis('off')
        ax[j].set_title(axial_label +'=%d' % (stride*j))

plot_images(img3,'Zplane',stride=16)

Orthogonal view¶

Before we look at orthogonal views of our 3D image-stack, let's first generate a random 3x3 matrix and get familiar with numpy's 'transpose' method

a = np.random.randint(0,10,size=(3,3))
print(a)

[[0 7 8]
 [3 8 3]
 [6 2 2]]

a.transpose(), a.T

(array([[0, 3, 6],
        [7, 8, 2],
        [8, 3, 2]]),
 array([[0, 3, 6],
        [7, 8, 2],
        [8, 3, 2]]))

hex(id(a)), hex(id(a.transpose())), hex(id(a.T))

('0x1c55493fdf0', '0x1c55490d030', '0x1c55490d030')

#Transpose an image-stack
imgY = img3.T
print('Original shape',img3.shape)
print('After transpose',imgY.shape)

Original shape (78, 256, 94)
After transpose (94, 256, 78)

The transposed 3D stack has the X and Z dimensions swapped. This is a 90 degree rotation about the Y-axis.

plot_images(imgY, axial_label='Y', stride =16)

Now swap the first two dimensions (Z and Y) to get a 90 degree rotation about X-axis

imgX=img3.transpose((1,0,2))
plot_images(imgX, axial_label='X', stride=64)
print('Original shape',img3.shape)
print('After transpose',imgX.shape)

Original shape (78, 256, 94)
After transpose (256, 78, 94)

Memory layout of the orignal 3D array and transposed 3D arrays that we generated:

print(img3.flags)

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

print(imgX.flags)

  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

The arrays need to be 'C_CONTIGUOUS', otherwise you'll get an error when you try to save the array as klb. So, let's make them so before saving the rotated image-stacks

imgX = np.ascontiguousarray(imgX)
imgY = np.ascontiguousarray(imgY)

print(imgX.flags['C_CONTIGUOUS'])
print(imgY.flags['C_CONTIGUOUS'])

True
True

klb.writefull(imgX, filepath='Y:/Exchange/Raghav/tutorial_python/imgX-8bit-um.klb')
klb.writefull(imgY, filepath='Y:/Exchange/Raghav/tutorial_python/imgY-8bit-um.klb')

Interactive visualization: napari¶

You'll soon miss the interactivity that we are accustomed to with ImageJ/Fiji and 'matplotlib' doesn't offer those features. An image-viewer for Python called 'napari' fills that void.

You can learn more on how to install 'napari' here: https://napari.org/

They also have several tutorials to get you started. It'd be benficial to familiarize yourself with another library in Python called Dask (dask array in particular). Dask is already installed in your Python environment, so you can start using it right away!

Dask arrays are like Numpy arrays, with the key difference being that they can be arbirtrily chunks within a grid. Go through '01_dask_array.ipynb' for more on this topic

A quick tour of Dask Arrays¶

Dask arrays coordinate many Numpy arrays, arranged into chunks within a grid. They support a large subset of the Numpy API.

from dask.distributed import Client, progress
client = Client(processes=False, threads_per_worker=8,
                n_workers=8, memory_limit='12GB')
client

import dask.array as da
import time
x = da.random.random((10000, 10000), chunks=(1000, 1000))
x

y = x + x.T
z = y[::2, 5000:].mean(axis=1)
z

Call .compute() when you want your result as a NumPy array.

z.compute()

array([0.99212105, 0.99908829, 1.00056553, ..., 0.99885682, 1.00162508,
       0.98664376])

If you have available RAM for your dataset then you can persist data in memory. This allows future computations to be much faster.

y = y.persist()

%%time
y[0, 0].compute()

Wall time: 359 ms

0.1915126491067638

%%time 
y.sum().compute()

Wall time: 279 ms

99996196.87110524

Simple test functions¶

These functions do simple operations like add two numbers together, but they sleep for a random amount of time to simulate real work.

import random
def inc(x):
    time.sleep(random.random())
    return x + 1
def dec(x):
    time.sleep(random.random())
    return x - 1   
def add(x, y):
    time.sleep(random.random())
    return x + y

We can run them like normal Python functions below

%%time
x = inc(1); y = dec(2); z = add(x, y)
z

Wall time: 773 ms

3

These ran one after the other, in sequence. Note though that the first two lines inc(1) and dec(2) don't depend on each other, we could have called them in parallel had we been clever.

Annotate functions with Dask Delayed to make them lazy¶

We can call dask.delayed on our funtions to make them lazy. Rather than compute their results immediately, they record what we want to compute as a task into a graph that we'll run later on parallel hardware.

import dask
inc = dask.delayed(inc)
dec = dask.delayed(dec)
add = dask.delayed(add)

Calling these lazy functions is now almost free. We're just constructing a graph

xx = inc(1); yy = dec(2); zz = add(xx, yy)
zz

Delayed('add-49e3046e-ae0e-47de-a146-4efe14731d42')

%%time
zz.compute()

Wall time: 1.66 s

3

Where to from here?¶

Python Bio-image Analysis, EMBL Heidelburg¶

https://github.com/WhoIsJack/python-bioimage-analysis-tutorial

BioPython¶

Biopython is a library of computational molecular biology tools. conda install -c conda-forge biopython

# Example Script:
# Import the sequence and alphabet toolboxes from the Biopython library
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

# Define a sequence as a DNA Seq object
seq = Seq("CCTCAGCGAGGACAGCAAGGGACTAGCC", IUPAC.unambiguous_dna)

# We've assigned a DNA alphabet to our sequence
alphabet = seq.alphabet

complement = seq.complement()
reverse_complement = seq.reverse_complement()

# Use the transcribe() method to switch alphabets to RNA 
mRNA = seq.transcribe()

# Use the translate() method to get the corresponding protein sequence
protein =  mRNA.translate()