Getting started with Python

Raghav K. Chhetri
Keller Lab, HHMI Janelia
03/26/2020

This tutorial is based on
Python: v3.7
OS: 64-bit Windows 10
Make minor modifications wherever necessary if using other platforms

Install Python 3.7 using Anaconda

  1. Download Anaconda installer and proceed with installation

  2. Once installation is complete, Python is ready to be launched. Two options:

    1. Graphical route > Anaconda Navigator
    2. Command line > Anaconda Prompt -- we'll use this option in this tutorial

Get familiar with Anaconda Prompt

From the Windows Start menu, search for and open “Anaconda Prompt”:

  • Check your installation
    (base) C:\Users\Raghav>where python
  • Check Python version
    (base) C:\Users\Raghav>python –-version
  • List all installed packages
    (base) C:\Users\Raghav>conda list
  • Update to latest version
    (base) C:\Users\Raghav>conda update conda

Start Python

Several options to do so:

  • Command line execution using Python interpreter
    (base) C:\Users\Raghav>python
  • Interactive command line execution using IPython
    (base) C:\Users\Raghav>ipython
  • Spyder (matlab-like interface)
    (base) C:\Users\Raghav>spyder
  • Jupyter notebook
    (base) C:\Users\Raghav>jupyter notebook

Conda enviroments

  • Create a new environment (firstenv here)
    (base) C:\Users\Raghav>conda create –-name firstenv python=3.7 anaconda
  • Activate an environment
    (base) C:\Users\Raghav>conda activate firstenv (firstenv)C:\Users\Raghav>
  • Deactivate an active environment
    (firstenv) C:\Users\Raghav>conda deactivate (base) C:\Users\Raghav>
  • Clone from an existing environment
    (base) C:\Users\Raghav>create --name secondenv –-clone firstenv
  • Delete an existing environment
    conda env remove --name firstenv
  • List all environments
    (base) C:\Users\Raghav>conda info --envs

Basics

Press Shift + Enter to run the code cell

In [1]:
x = [1, 2, 3, 4, 5]
y = x
print(x,y)
[1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
In [2]:
x.append(6)
print(x,y)
[1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6]

The value of y is also updated since both variables point to the same object in memory

In [3]:
x=['a','b','c']
print(x, y)
['a', 'b', 'c'] [1, 2, 3, 4, 5, 6]

In the above cell, a new value was assigned to 'x', so it now points to another location in memory, whereas 'y' still points to the previous object in memory

Takeaway: variables in Python are pointers to objects in memory

Import packages

In [4]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

'Shift + Tab' will show you the Docstring for the object you just typed in a code cell. With 'Shift' pressed, press 'Tab' again to cycle through more documentation

In [5]:
x = np.linspace(-np.pi, np.pi, 201)
print(x.dtype,type(x),len(x))
float64 <class 'numpy.ndarray'> 201
In [6]:
plt.plot(x, np.sin(x))
plt.xlabel('Angle [rad]'); plt.ylabel('sin(x)'); plt.axis('tight')
plt.show()

Working with images

In [7]:
inPath = 'Y:/Exchange/Raghav/tutorial_python/data/'  
In [8]:
from os.path import join
from glob import glob
from skimage.io import imread, imsave
In [9]:
#Grab filenames for all cameras
fnames_cm00 = glob(join(inPath, '*CM00*.tif')); fnames_cm00.sort()
fnames_cm01 = glob(join(inPath, '*CM01*.tif')); fnames_cm01.sort()
fnames_cm02 = glob(join(inPath, '*CM02*.tif')); fnames_cm02.sort()
fnames_cm03 = glob(join(inPath, '*CM03*.tif')); fnames_cm03.sort()
In [10]:
print(fnames_cm00[0])
len(fnames_cm00)
Y:/Exchange/Raghav/tutorial_python/data\SPM00_TM000000_CM00_CHN00_roi.RESLVL2.tif
Out[10]:
201

Display opposing-view-images:

In [11]:
TM=0
cm00stack = imread(fnames_cm00[TM]) 
cm01stack = imread(fnames_cm01[TM]) 
cm02stack = imread(fnames_cm02[TM]) 
cm03stack = imread(fnames_cm03[TM]) 
In [12]:
print(type(cm00stack), cm00stack.shape, cm00stack.dtype) #Dimensions of (Z,Y,X)
print(type(cm03stack), cm03stack.shape, cm03stack.dtype)
<class 'numpy.ndarray'> (67, 486, 166) uint16
<class 'numpy.ndarray'> (73, 476, 159) uint16

Define a function to show several images from two imagestacks:

In [13]:
def plot_image_pairs(stackA, stackB, cam_list, index_type, nrows=2, ncols=8, stride=8, vmin=90, vmax=300):
    '''
    This function shows several images from two imagestacks
    Example: plot_image_pairs(cm00stack, cm01stack, ('CM00', 'CM01'), 'z')
    '''
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols,figsize=(21,12))
    for j in range(0,ncols):
        ax[0,j].imshow(stackA[stride*(j+1),:,:], cmap='gray', vmin=vmin, vmax=vmax)
        ax[0,j].axis('off')
        ax[0,j].set_title( cam_list[0] +', ' +index_type +'=%d' % (stride*(j+1)))
        
        ax[1,j].imshow(stackB[stride*(j+1),:,:], cmap='gray',vmin=vmin, vmax=vmax)
        ax[1,j].axis('off')
        ax[1,j].set_title(cam_list[1] + ', ' +index_type +'=%d' % (stride*(j+1)))
In [14]:
plot_image_pairs(cm00stack, cm01stack, ('CM00', 'CM01'), 'z')
In [15]:
plot_image_pairs(cm02stack, cm03stack, ('CM02', 'CM03'), 'z')

pyklb

pyklb is a Python wrapper to read/write klb files. On 64-bit Windows running Python 3.7, install as follows:

  • Download, unzip, and copy pyklb-master to your local drive
    For eg., C:\Users\
  • Activate your Python environment (firstenv here)
  • Navigate to the \dist\ folder in your local pyklb-master
    cd pyklb-master\dist
  • pip install the wheel present in this folder
    pip install pyklb-0.0.3.dev0-cp37-cp37m-win_amd64.whl
  • Download, unzip, and copy klb.dll to your environment’s \lib\site-packages\ folder
    For eg., C:\Anaconda3\envs\firstenv\lib\site-packages\

If you are on a different OS or using a different Python version, build and install everything from scratch from here

Open a .klb file in Python

  • Let’s now open a .klb file in Python to confirm that things are working as expected. Below, I’m using ipython for this:

    (base) C:\Users\Raghav>conda activate firstenv (firstenv) C:\Users\Raghav>ipython In [1]: import pyklb as klb In [2]: img = klb.readfull('y:/.../...klb’) In [3]: type(img) In [4]: img.shape In [5]: help(klb)

  • If this works, then you are all set with loading .klb files in your Python environment!

Read and write klbs

In [16]:
import pyklb as klb
img = klb.readfull('Y:/Documents/Slides/Raghav/tutorial_python/test.klb')
print(type(img), img.shape, img.dtype)
<class 'numpy.ndarray'> (78, 256, 94) uint16

You'll notice that the file is loaded in Python as a numpy N-dimensional array. From its shape, we can tell that the numpy array represents a 3D image-stack. Recall that klb can support up to 5D arrays, and the dimension convetion is as follows: [tm,ch,z,y,x]

For this example 3D image-stack, the dimenion convention is thus [z,y,x]

Memory locations

In [17]:
tmp = img
print(hex(id(img)), hex(id(tmp)))
0x1c554895d00 0x1c554895d00

Note that both 'img' and 'tmp' are pointing to the same memory location.

However, if we were to invoke the 'copy' method, then a separate copy is created and mapped to a different memory location.

In [18]:
foo = np.copy(img)
print(hex(id(img)), hex(id(foo)))
0x1c554895d00 0x1c5548168f0
In [19]:
img2 = ((img/np.max(img))*255).astype(np.uint8)
print(type(img2), img2.shape, img2.dtype)
print(hex(id(img)), hex(id(img2)))
<class 'numpy.ndarray'> (78, 256, 94) uint8
0x1c554895d00 0x1c554ac0490

Above, we took the original 'img' array and changed it to 8-bit and saved it as 'img2'

So, a new array was created in the process, and thus the memory locations of 'img' and 'img2' are different.

Save the 8-bit image as a new .klb file:

In [20]:
klb.writefull(img2, filepath='Y:/Exchange/Raghav/tutorial_python/img-8bit.klb')

Note: you can seek more info by calling the 'help' function as: help(klb)

Specify the sampling of your data in microns:

In [21]:
klb.writefull(img2, filepath='Y:/Exchange/Raghav/tutorial_python/img-8bit-um.klb', 
              pixelspacing_tczyx = (4.06125,0.8125,0.8125))
In [22]:
img3 = klb.readfull('Y:/Exchange/Raghav/tutorial_python/img-8bit-um.klb')

Display an image

To display an image, we'll use matplotlib, a library for creating visualizations in Python.

In [23]:
import matplotlib.pyplot as plt
#Show the graphics in the notebook
%matplotlib inline 

Let's grab a single plane from the 3D image-stack and display it in the notebook as follows

In [24]:
zslice = img3[24,:,:] # This is an example of slicing

Note that the index in Python starts at 0, so if you want to grab the first z-plane, here is how you would do it first_zslice = img3[0,:,:]

And, the last z-plane can be grabbed using an index of '-1' last_zslice = img3[-1,:,:]

In [25]:
plt.figure(figsize=(8,8))
plt.imshow(zslice,cmap='gray',interpolation='none')
plt.colorbar()
plt.show()

Select a ROI of the image by slicing the numpy array

In [26]:
roi = zslice[180:230,0:50]
print(type(roi),roi.mean(), roi.min(), roi.max())

plt.imshow(roi,cmap='gray',interpolation='none')
plt.show()
<class 'numpy.ndarray'> 83.4728 61 161

Display multiple images

Instead of looking at a single image, let's display several images from our 3D image-stack on a subplot. For this, we'll write a function and we'll also use a built-in function called 'range' 'which generates a sequence of numbers as shown here

In [27]:
a = range(0,5)
print(type(a))
list(a)
<class 'range'>
Out[27]:
[0, 1, 2, 3, 4]

Define a function to show several images from an imagestack:

In [28]:
def plot_images(stackA, axial_label, stride=8, vmin=60, vmax=255, cmap='hot'):
    '''
    This function shows several images from an imagestack
    Example: plot_images(cm00stack, 'z')
    '''
    ncols = int(stackA.shape[0]/stride)
    fig, ax = plt.subplots(1,ncols,figsize=(21,12))
    for j in range(0,ncols):
        ax[j].imshow(stackA[stride*(j),:,:], cmap=cmap, interpolation ='none', vmin=vmin, vmax=vmax)
        ax[j].axis('off')
        ax[j].set_title(axial_label +'=%d' % (stride*j))
In [29]:
plot_images(img3,'Zplane',stride=16)

Orthogonal view

Before we look at orthogonal views of our 3D image-stack, let's first generate a random 3x3 matrix and get familiar with numpy's 'transpose' method

In [30]:
a = np.random.randint(0,10,size=(3,3))
print(a)
[[0 7 8]
 [3 8 3]
 [6 2 2]]
In [31]:
a.transpose(), a.T
Out[31]:
(array([[0, 3, 6],
        [7, 8, 2],
        [8, 3, 2]]),
 array([[0, 3, 6],
        [7, 8, 2],
        [8, 3, 2]]))
In [32]:
hex(id(a)), hex(id(a.transpose())), hex(id(a.T))
Out[32]:
('0x1c55493fdf0', '0x1c55490d030', '0x1c55490d030')
In [33]:
#Transpose an image-stack
imgY = img3.T
print('Original shape',img3.shape)
print('After transpose',imgY.shape)
Original shape (78, 256, 94)
After transpose (94, 256, 78)

The transposed 3D stack has the X and Z dimensions swapped. This is a 90 degree rotation about the Y-axis.

In [34]:
plot_images(imgY, axial_label='Y', stride =16)

Now swap the first two dimensions (Z and Y) to get a 90 degree rotation about X-axis

In [35]:
imgX=img3.transpose((1,0,2))
plot_images(imgX, axial_label='X', stride=64)
print('Original shape',img3.shape)
print('After transpose',imgX.shape)
Original shape (78, 256, 94)
After transpose (256, 78, 94)

Memory layout of the orignal 3D array and transposed 3D arrays that we generated:

In [36]:
print(img3.flags)
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [37]:
print(imgX.flags)
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

The arrays need to be 'C_CONTIGUOUS', otherwise you'll get an error when you try to save the array as klb. So, let's make them so before saving the rotated image-stacks

In [38]:
imgX = np.ascontiguousarray(imgX)
imgY = np.ascontiguousarray(imgY)

print(imgX.flags['C_CONTIGUOUS'])
print(imgY.flags['C_CONTIGUOUS'])
True
True
In [39]:
klb.writefull(imgX, filepath='Y:/Exchange/Raghav/tutorial_python/imgX-8bit-um.klb')
klb.writefull(imgY, filepath='Y:/Exchange/Raghav/tutorial_python/imgY-8bit-um.klb')

Interactive visualization: napari

You'll soon miss the interactivity that we are accustomed to with ImageJ/Fiji and 'matplotlib' doesn't offer those features. An image-viewer for Python called 'napari' fills that void.

You can learn more on how to install 'napari' here: https://napari.org/

They also have several tutorials to get you started. It'd be benficial to familiarize yourself with another library in Python called Dask (dask array in particular). Dask is already installed in your Python environment, so you can start using it right away!

Dask arrays are like Numpy arrays, with the key difference being that they can be arbirtrily chunks within a grid. Go through '01_dask_array.ipynb' for more on this topic

A quick tour of Dask Arrays

Dask arrays coordinate many Numpy arrays, arranged into chunks within a grid. They support a large subset of the Numpy API.

In [40]:
from dask.distributed import Client, progress
client = Client(processes=False, threads_per_worker=8,
                n_workers=8, memory_limit='12GB')
client
Out[40]:

Client

Cluster

  • Workers: 8
  • Cores: 64
  • Memory: 96.00 GB
In [41]:
import dask.array as da
import time
x = da.random.random((10000, 10000), chunks=(1000, 1000))
x
Out[41]:
Array Chunk
Bytes 800.00 MB 8.00 MB
Shape (10000, 10000) (1000, 1000)
Count 100 Tasks 100 Chunks
Type float64 numpy.ndarray
10000 10000
In [42]:
y = x + x.T
z = y[::2, 5000:].mean(axis=1)
z
Out[42]:
Array Chunk
Bytes 40.00 kB 4.00 kB
Shape (5000,) (500,)
Count 430 Tasks 10 Chunks
Type float64 numpy.ndarray
5000 1

Call .compute() when you want your result as a NumPy array.

In [43]:
z.compute()
Out[43]:
array([0.99212105, 0.99908829, 1.00056553, ..., 0.99885682, 1.00162508,
       0.98664376])

If you have available RAM for your dataset then you can persist data in memory. This allows future computations to be much faster.

In [44]:
y = y.persist()
In [45]:
%%time
y[0, 0].compute()
Wall time: 359 ms
Out[45]:
0.1915126491067638
In [46]:
%%time 
y.sum().compute()
Wall time: 279 ms
Out[46]:
99996196.87110524

Simple test functions

These functions do simple operations like add two numbers together, but they sleep for a random amount of time to simulate real work.

In [47]:
import random
def inc(x):
    time.sleep(random.random())
    return x + 1
def dec(x):
    time.sleep(random.random())
    return x - 1   
def add(x, y):
    time.sleep(random.random())
    return x + y 

We can run them like normal Python functions below

In [48]:
%%time
x = inc(1); y = dec(2); z = add(x, y)
z
Wall time: 773 ms
Out[48]:
3

These ran one after the other, in sequence. Note though that the first two lines inc(1) and dec(2) don't depend on each other, we could have called them in parallel had we been clever.

Annotate functions with Dask Delayed to make them lazy

We can call dask.delayed on our funtions to make them lazy. Rather than compute their results immediately, they record what we want to compute as a task into a graph that we'll run later on parallel hardware.

In [49]:
import dask
inc = dask.delayed(inc)
dec = dask.delayed(dec)
add = dask.delayed(add)

Calling these lazy functions is now almost free. We're just constructing a graph

In [50]:
xx = inc(1); yy = dec(2); zz = add(xx, yy)
zz
Out[50]:
Delayed('add-49e3046e-ae0e-47de-a146-4efe14731d42')
In [51]:
%%time
zz.compute()
Wall time: 1.66 s
Out[51]:
3

Where to from here?

Python Bio-image Analysis, EMBL Heidelburg

https://github.com/WhoIsJack/python-bioimage-analysis-tutorial

BioPython

Biopython is a library of computational molecular biology tools. conda install -c conda-forge biopython

# Example Script:
# Import the sequence and alphabet toolboxes from the Biopython library
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

# Define a sequence as a DNA Seq object
seq = Seq("CCTCAGCGAGGACAGCAAGGGACTAGCC", IUPAC.unambiguous_dna)

# We've assigned a DNA alphabet to our sequence
alphabet = seq.alphabet

complement = seq.complement()
reverse_complement = seq.reverse_complement()

# Use the transcribe() method to switch alphabets to RNA 
mRNA = seq.transcribe()

# Use the translate() method to get the corresponding protein sequence
protein =  mRNA.translate()

Many more... Happy exploring!