Scientific Libraries with Python#

Open In Colab

Although coding with Python is very versatile and allows many advanced features that are useful when manipulating massive data (a common task in science), Python is still a multipurpose language, what implies that scientific routines and functions cannot (should not) be supported within its basic core. Nevertheless, there are many different scientific libraries that can extend the capabilities of Python to scientific implementations in a natural way. One of the most used libraries is NumPy. This introduce the array object as a generalization of Python nested lists. This new object has many linear algebra operations implemented as methods or attributes. This operations are implemented at the low level through fast highly optimized algorithms.

Scientific Libraries with Python

Python+NumPy#

In this way, Python+NumPy can be seen as a framework to implement numerical code as linear algebra abstractions which replaces the slow Python loops. The contrary is also true. In this framework each algorithm must be designed to try to avoid the use of Python loops like for or while.

An ideal program implemented in Python+NumPy does not have explicit Python loops, but only linear algebra abstractions.

Extended Python+Numpy framework#

All the other high level packages for scientific computation are designed to work around the linear algebra abstractions of the Python+NumPy framework

  • Pandas add labels to the Numpy arrays.

  • Mathplotlib plotting library. Ti visualize arrays in 1, 2 and 3 dimensions

  • SciPy, intended for manipulating NumPy arrays more efficiently and for extending and including numerical methods, respectively.

Another less used libraries like SymPy are intended for manipulating analytical expressions, i.e. a CAS (Computer Algebraic System).

Installation of these libraries is often an easy task. In most of the Linux distros you should find them in the official repositories.

Avoid loops. Use abstractions

Official Pages#

See the official pages of the libraries for new versions, news and manuals.

NumPy:

http://www.numpy.org/

SciPy

http://www.scipy.org/

SymPy

http://www.sympy.org/

Anaconda

https://www.anaconda.com/

Anaconda is a self-cointained Python distribution that integrates many standard scientific libraries with Python, along with some generic libraries like MongoDB

There are many different scientific libraries for Python with many different uses, even for very specific tasks. However, as we are interested in general numerical methods, we will focus only on NumPy and Scipy

La forma recomendada de importar los diferentes módulos y el uso de sus métodos y atributos suele resumirse en Cheat Sheets. Para Python científico recomendamos las elaboradas por Data Camp, que pueden consultarse aquí


NumPy#

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

  • a powerful N-dimensional array object

  • sophisticated (broadcasting) functions

  • tools for integrating C/C++ and Fortran code

  • useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

NumPy Cheat Sheet [PDF]

Basic Use#

Importing and basic math#

NumPy can be imported in several different ways. Importing NumPy with the alias of np is the recommended way

import numpy as np

The stantard name space for other Python modules are

import numpy as np
import pandas as pd
import scipy as sp
import matplotlib.pyplot as plt
import numpy.linalg as la
import math as m # real 
import cmath as cm # complex

If imported through the name space np, NumPy methods and attributes are accessed by using the assigned name space.

The Basic methods of NumPy include the usual mathematical functions

print( np.exp(10.5), np.log(52.3), np.log10(63.9), np.sqrt(10.0) )
36315.502674246636 3.9569963710708773 1.8055008581584002 3.1622776601683795
#Trigonometric functions
print (np.sin(5.0), np.cos(9.6), np.arcsin(0.5), np.arctan(5))
-0.9589242746631385 -0.984687855794127 0.5235987755982989 1.373400766945016

The basic attributes include some important constants

"The value of PI is {:.2f}".format( np.pi )
'The value of PI is 3.14'
print  ( "The value of PI is {:.6f}".format( np.pi ) )
The value of PI is 3.141593
np.pi
3.141592653589793
print  (f"The value of e is {np.e:.6f}" )
The value of e is 2.718282
print  ( "∞={}".format( np.inf ) )
∞=inf

Which is greater than any other number:

200000000000000000000000000000000000000000000000<np.inf
True
print  ( "0/0={}".format( np.nan ) )
0/0=nan

Conflicts without name spaces:

from numpy import *
sin([2,3])
array([0.90929743, 0.14112001])
from math import *
sin([2,3])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-0612cee6990d> in <module>
----> 1 sin([2,3])

TypeError: must be real number, not list

Lists vs NumPy arrays#

Supported methods for lists

x.append   x.count    x.extend   x.index    x.insert   x.pop      x.remove   x.reverse  x.sort

Supported methods for NumPy arrays

x.T             x.clip          x.dot           x.item          x.prod          x.setfield      x.take
x.all           x.compress      x.dtype         x.itemset       x.ptp           x.setflags      x.tofile
x.any           x.conj          x.dump          x.itemsize      x.put           x.shape         x.tolist
x.argmax        x.conjugate     x.dumps         x.max           x.ravel         x.size          x.tostring
x.argmin        x.copy          x.fill          x.mean          x.real          x.sort          x.trace
x.argsort       x.ctypes        x.flags         x.min           x.repeat        x.squeeze       x.transpose
x.astype        x.cumprod       x.flat          x.nbytes        x.reshape       x.std           x.var
x.base          x.cumsum        x.flatten       x.ndim          x.resize        x.strides       x.view
x.byteswap      x.data          x.getfield      x.newbyteorder  x.round         x.sum           
x.choose        x.diagonal      x.imag          x.nonzero       x.searchsorted  x.swapaxes
np.array([1,2,3,4,5,6,7,8,9,10]).sum()
55

In common

Lists and numpy arrays can both store any type of data

x1 = [1.2, 3.5, 1.9]
x2 = np.array([1.6, -2.6, 6.9])
print( x1, x2 )
[1.2, 3.5, 1.9] [ 1.6 -2.6  6.9]
type(x2)
numpy.ndarray

For lists, new elements can be added using append method

x1 = [1.2, 3.5, 1.9]
x2=x1.copy() # To compare later with numpy
x1.append(5.9)
x1
[1.2, 3.5, 1.9, 5.9]

For arrays, new elements can be added using append function of NumPy

np.append(x2,5.9)
array([1.2, 3.5, 1.9, 5.9])

A list can be converted into a numpy array, but the internal data type is homogenized. In the following example to float

x = [1,3.4,1.0]
x = np.array(x)
x
array([1. , 3.4, 1. ])

And a numpy array can be converted back to a (homogenized) list, as well

list(x)
[1.0, 3.4, 1.0]

Differences Operator + for lists is overloaded for concatenating

x1 = [1,2,3]
x2 = [3,2,1]
print (f'+: {x1+x2}, or np.concatenate: {np.concatenate((x1,x2))}' )
+: [1, 2, 3, 3, 2, 1], or np.concatenate: [1 2 3 3 2 1]

Operator + for numpy arrays is overloaded for adding

x1 = np.array([1,2,3])
x2 = np.array([3,2,1])
print (x1+x2)
[4 4 4]

All the main operators are overloaded only for numpy arrays as element-wise operations

  • Multiplication

x1=np.array(x1)
x2=np.array(x2)
print(x1*x2)
[ 3.12 13.44  7.59]
  • Division

print(x1/x2)
[0.46153846 1.71428571 6.27272727]
  • Substraction

print(x1-x2)
[-1.4  2.   5.8]
  • Power

print(x1**x2)
[ 1.6064649  80.81192733  8.37016462]

NumPy arrays Summary of element-wise operations: Numpy arrays support any mathematical operation (element by element)

x1 = np.array([1.2,4.8,6.9])
x2 = np.array([2.6,2.8,1.1])

print ("Adding", x1+x2)
print ("Multiplication", x1*x2)
print ("Division", x1/x2)
print ("Subtraction", x1-x2)
print ("Power", x1**x2)
Adding [3.8 7.6 8. ]
Multiplication [ 3.12 13.44  7.59]
Division [0.46153846 1.71428571 6.27272727]
Subtraction [-1.4  2.   5.8]
Power [ 1.6064649  80.81192733  8.37016462]

Note: Matrices can be represented as NumPy arrays where each element is a row vector. Nevertheless, be careful when multiply arrays, the operator * is overloaded in such a way that single elements are multiplied one by one, quite different from multiplication of matrices.

A detailed explanation will be given in the Chapter about linear algebra

A = np.array([[1,2],[3,4]])
B = np.array([[4,3],[2,1]])
print (f'A=\n{A}') 
print (f'B=\n{B}')
print (f'A*B=\n{A*B}') # No matrix multiplication
A=
[[1 2]
 [3 4]]
B=
[[4 3]
 [2 1]]
A*B=
[[4 6]
 [6 4]]

A matrix element can be accessed directly

A[0,1]
2
A[0][1]
2

Scientific Python abstractions#

Abstractions in scientific Python refer to the use of high-level constructs or interfaces that simplify complex operations and allow users to focus on the concepts and results rather than the underlying implementation details. Examples of abstractions in scientific Python include NumPy arrays for efficient numerical operations, Pandas dataframes for tabular data manipulation, and Matplotlib for data visualization. Abstractions help to make scientific computing more accessible and efficient for users.

All of them have methods to implement internal loops in faster programming languages that the ones in Python.

Examples of abstractions in NumPy that can be used to avoid for loops are:

  1. Vectorized operations: NumPy supports many element-wise mathematical operations that can be performed directly on entire arrays, without the need for explicit for loops. For example, to multiply every element of an array by a constant, you can simply use the * operator, like this: my_array * 5.

  2. Broadcasting: Broadcasting is a powerful feature in NumPy that allows arrays with different shapes to be used in arithmetic operations. Broadcasting can often be used as a more efficient alternative to for loops. For example, to add a scalar value to every row of a 2D array, you can simply add the scalar to the entire array, like this: my_array + 5.

  3. Masking and boolean indexing: NumPy provides powerful indexing capabilities that can be used to select subsets of an array based on some condition. For example, to select all elements of an array that are greater than 5, you can use a boolean mask, like this: my_array[my_array > 5]. This is often more efficient than using an explicit for loop to iterate over the array.

  4. Aggregation functions: NumPy provides many functions that can be used to compute summary statistics over an entire array, such as np.sum(), np.mean(), and np.std(). These functions abstract away the need for explicit for loops over the array.

Below is given a more detailed explanation for the last two.

Masks#

A mask is a boolean array that is used to index into the original array with the same shape in order to select only the elements that satisfy the condition represented by the mask. The mask is applied as a superspostion of a logical array unpon the original one. The output filters only the True values

x = np.array([1,2,3,4])
y = np.array([False, False, True, True])
x[y]
array([3, 4])
#It is possible to access elements of a numpy array using booleans
x = np.array([1,2,3,4])
y = np.array([False, False, True, True])
x[y]
array([3, 4])

Automatic creation of masks

x>2
array([False, False,  True,  True])
x[x>2] # Automatic mask implementation
array([3, 4])

which is much better and faster than:

xfin=[]
for i in x:
    if i>2:
        xfin.append(i)
        
xfin=np.array(xfin)
xfin
array([3, 4])
#Operators >, <, >=, <= and ==, != are also overloaded for numpy arrays
x = np.array([0,5,8,0])
y = np.array([0,6,5,1])
print(x>y) 
print(x<y) 
print(x==y) 
print(x!=y) 
print(x>4) 
[False False  True False]
[False  True False  True]
[ True False False False]
[False  True  True  True]
[False  True  True False]
#Combining these features, we can perform searches and comparisons far more efficient
x = np.array([1,4,2,6,8,4,3,0,9,1,3,6,7])
#A new list with numbers greater than 4
x[x>=4]
array([4, 6, 8, 4, 9, 6, 7])

Numpy has logical operators: np.logical_...

For array([1,4,2,6,8,4,3,0,9,1,3,6,7]):

x[ (x>6) | (x<2) ]
array([1, 8, 0, 9, 1, 7])
x[ np.logical_and(x>2, x<6) ]
array([4, 4, 3, 3])

or

x[ (x>2) & (x<6) ] #or: |
array([4, 4, 3, 3])

the full mask can be negated!

For array([1,4,2,6,8,4,3,0,9,1,3,6,7]):

x[~((x>2) & (x<6)) ]
array([1, 2, 6, 8, 0, 9, 1, 6, 7])

Aggregation functions#

spreadsheeet like operations

#Native methods of numpy arrays allow to calculate basic quantities
x = np.array([1,4,2,6,8,4,3,0,9,1,3,6,7])
#Maximum element
print( "Maximum element", x.max() )
#Minimum element
print( "Minimum element", x.min())
#Mean value
"Mean value", x.mean()
Maximum element 9
Minimum element 0
('Mean value', 4.153846153846154)

Upon the indices#

print(x)
[1 4 2 6 8 4 3 0 9 1 3 6 7]
#Sorted arguments of the array
print ("Sorted arguments", x.argsort() )
#Sorted array
print ( "Sorted array", x[x.argsort()])
Sorted arguments [ 0  1  2  3  4  5  6  7  8  9 10 11 12]
Sorted array [0 1 1 2 3 3 4 4 6 6 7 8 9]

Direct sorted

x = np.array([1,4,2,6,8,4,3,0,9,1,3,6,7])
x.sort()
x
array([0, 1, 1, 2, 3, 3, 4, 4, 6, 6, 7, 8, 9])
x = np.array([1,4,2,6,8,4,3,0,9,1,3,6,7])
np.sort( x )
array([0, 1, 1, 2, 3, 3, 4, 4, 6, 6, 7, 8, 9])

Miscellaneous methods#

NumPy includes many general purpose functions that complement the capabilities of python. We are interested here specially in functions for creating ordered arrays, storing and loading data as well as histograms, tasks that will be continuously required for the activities of the course.

#Create an array of 1's with a given size (even 2D sizes)
x = np.ones(5)
x
array([1., 1., 1., 1., 1.])
#Create an array of zeros with a given size (even 2D sizes)
x = np.zeros( (2,5) )
x
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
import numpy as np
x=np.array([4,7,4,6,9])
x.sum()
30

Miscellaneous functions#

#Create an array with a given range and a number of intervals
x = np.linspace( -np.pi, np.pi, 10 ) 
print (x)
[-3.14159265 -2.44346095 -1.74532925 -1.04719755 -0.34906585  0.34906585
  1.04719755  1.74532925  2.44346095  3.14159265]

For arrays that expands more than one order of magnitud, use

np.logspace( np.log10(2), np.log10(50000),10        )
array([2.00000000e+00, 6.16155028e+00, 1.89823509e+01, 5.84803548e+01,
       1.80164823e+02, 5.55047308e+02, 1.70997595e+03, 5.26805138e+03,
       1.62296817e+04, 5.00000000e+04])
np.linspace( 2, 50000,10        )
array([2.00000000e+00, 5.55733333e+03, 1.11126667e+04, 1.66680000e+04,
       2.22233333e+04, 2.77786667e+04, 3.33340000e+04, 3.88893333e+04,
       4.44446667e+04, 5.00000000e+04])
#Create an array with a given range and a given step
x = np.arange( 1, 5, 0.2 )
print( x )
[1.  1.2 1.4 1.6 1.8 2.  2.2 2.4 2.6 2.8 3.  3.2 3.4 3.6 3.8 4.  4.2 4.4
 4.6 4.8]
#Using the function savetxt, it is possible to store data from a numpy array
data = np.array([[3.2, 2.1],[3.1, 4.1]])
np.savetxt( "file.dat", data, fmt="%1.5e  %1.5e" )
cat file.dat
#In the same way, using the function loadtxt it is possible to load external data files
data = np.loadtxt("file.dat")
#Data is then a multidimensional array with the loaded data
print( data )

Other useful functions will be covered when needed during the course.

List Slices#

Slices work on list-like objects like numpy arrays, and can also be used to change sub-parts of the list.

x=np.ones(10)
x[1:3]=2
x[-1]=8
x[-3:-1]=6
x
array([1., 2., 2., 1., 1., 1., 1., 6., 6., 8.])

Reverse order (arrays do not have the reverse() method)

x[::-1]
array([8., 6., 6., 1., 1., 1., 1., 2., 2., 1.])

Pandas#

See the Pandas Chapter for details

numbers={"even": [0,2,4,6,8],   #  First  key-list
         "odd" : [1,3,5,7,9] }  #  Second key-list

import pandas as pd
pd.set_option('display.max_colwidth',200)
df=pd.DataFrame(numbers)
df
even odd
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9

In the previous DataFrame, all the column values are converted to Numpy arrays, which is the basic object in Numpy corresponding to generalized nested lists;

df.even
0    0
1    2
2    4
3    6
4    8
Name: even, dtype: int64
df.even.values
array([0, 2, 4, 6, 8])

SciPy#

SciPy is a collection of mathematical algorithms and convenience functions built on the Numpy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data. With SciPy an interactive Python session becomes a data-processing and system-prototyping environment rivaling sytems such as MATLAB, IDL, Octave, R-Lab, and SciLab.

Some of the packages included with SciPy are:

  • Special functions (scipy.special)

  • Integration (scipy.integrate)

  • Optimization (scipy.optimize)

  • Interpolation (scipy.interpolate)

  • Fourier Transforms (scipy.fftpack)

  • Signal Processing (scipy.signal)

  • Linear Algebra (scipy.linalg)

Each of these packages must be imported separately. Almost each of the numerical methods that will be covered during the course can be found in SciPy. For example, To import the integrate package, use

import scipy.integrate as integ

The integrate package then includes the next functions:

integ.Tester        integ.fixed_quad    integ.odepack       integ.quadrature    integ.test          
integ.complex_ode   integ.newton_cotes  integ.quad          integ.romb          integ.tplquad       
integ.cumtrapz      integ.ode           integ.quad_explain  integ.romberg       integ.trapz         
integ.dblquad       integ.odeint        integ.quadpack      integ.simps         integ.vode  

In next classes we will explore the offered options by SciPy according to the specific methods covered.