Scientific Libraries with Python
Contents
Scientific Libraries with Python#
Although coding with Python is very versatile and allows many advanced features that are useful when manipulating massive data (a common task in science), Python is still a multipurpose language, what implies that scientific routines and functions cannot (should not) be supported within its basic core. Nevertheless, there are many different scientific libraries that can extend the capabilities of Python to scientific implementations in a natural way. One of the most used libraries is NumPy. This introduce the array object as a generalization of Python nested lists. This new object has many linear algebra operations implemented as methods or attributes. This operations are implemented at the low level through fast highly optimized algorithms.
Scientific Libraries with Python
Python+NumPy#
In this way, Python+NumPy can be seen as a framework to implement numerical code as linear algebra abstractions which replaces the slow Python loops. The contrary is also true. In this framework each algorithm must be designed to try to avoid the use of Python loops like for
or while
.
An ideal program implemented in Python+NumPy does not have explicit Python loops, but only linear algebra abstractions.
Extended Python+Numpy framework#
All the other high level packages for scientific computation are designed to work around the linear algebra abstractions of the Python+NumPy framework
Pandas add labels to the Numpy arrays.
Mathplotlib plotting library. Ti visualize arrays in 1, 2 and 3 dimensions
SciPy, intended for manipulating NumPy arrays more efficiently and for extending and including numerical methods, respectively.
Another less used libraries like SymPy are intended for manipulating analytical expressions, i.e. a CAS (Computer Algebraic System).
Installation of these libraries is often an easy task. In most of the Linux distros you should find them in the official repositories.
Avoid loops. Use abstractions
Official Pages#
See the official pages of the libraries for new versions, news and manuals.
NumPy:
SciPy
SymPy
Anaconda
Anaconda is a self-cointained Python distribution that integrates many standard scientific libraries with Python, along with some generic libraries like MongoDB
There are many different scientific libraries for Python with many different uses, even for very specific tasks. However, as we are interested in general numerical methods, we will focus only on NumPy and Scipy
La forma recomendada de importar los diferentes módulos y el uso de sus métodos y atributos suele resumirse en Cheat Sheets. Para Python científico recomendamos las elaboradas por Data Camp, que pueden consultarse aquí
Contents
NumPy#
NumPy is the fundamental package for scientific computing with Python. It contains among other things:
a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
NumPy Cheat Sheet [PDF]
Basic Use#
Importing and basic math#
NumPy can be imported in several different ways. Importing NumPy with the alias of np is the recommended way
import numpy as np
The stantard name space for other Python modules are
import numpy as np
import pandas as pd
import scipy as sp
import matplotlib.pyplot as plt
import numpy.linalg as la
import math as m # real
import cmath as cm # complex
If imported through the name space np
, NumPy methods and attributes are accessed by using the assigned name space.
The Basic methods of NumPy include the usual mathematical functions
print( np.exp(10.5), np.log(52.3), np.log10(63.9), np.sqrt(10.0) )
36315.502674246636 3.9569963710708773 1.8055008581584002 3.1622776601683795
#Trigonometric functions
print (np.sin(5.0), np.cos(9.6), np.arcsin(0.5), np.arctan(5))
-0.9589242746631385 -0.984687855794127 0.5235987755982989 1.373400766945016
The basic attributes include some important constants
"The value of PI is {:.2f}".format( np.pi )
'The value of PI is 3.14'
print ( "The value of PI is {:.6f}".format( np.pi ) )
The value of PI is 3.141593
np.pi
3.141592653589793
print (f"The value of e is {np.e:.6f}" )
The value of e is 2.718282
print ( "∞={}".format( np.inf ) )
∞=inf
Which is greater than any other number:
200000000000000000000000000000000000000000000000<np.inf
True
print ( "0/0={}".format( np.nan ) )
0/0=nan
Conflicts without name spaces:
from numpy import *
sin([2,3])
array([0.90929743, 0.14112001])
from math import *
sin([2,3])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-17-0612cee6990d> in <module>
----> 1 sin([2,3])
TypeError: must be real number, not list
Lists vs NumPy arrays#
Supported methods for lists
x.append x.count x.extend x.index x.insert x.pop x.remove x.reverse x.sort
Supported methods for NumPy arrays
x.T x.clip x.dot x.item x.prod x.setfield x.take
x.all x.compress x.dtype x.itemset x.ptp x.setflags x.tofile
x.any x.conj x.dump x.itemsize x.put x.shape x.tolist
x.argmax x.conjugate x.dumps x.max x.ravel x.size x.tostring
x.argmin x.copy x.fill x.mean x.real x.sort x.trace
x.argsort x.ctypes x.flags x.min x.repeat x.squeeze x.transpose
x.astype x.cumprod x.flat x.nbytes x.reshape x.std x.var
x.base x.cumsum x.flatten x.ndim x.resize x.strides x.view
x.byteswap x.data x.getfield x.newbyteorder x.round x.sum
x.choose x.diagonal x.imag x.nonzero x.searchsorted x.swapaxes
np.array([1,2,3,4,5,6,7,8,9,10]).sum()
55
In common
Lists and numpy arrays can both store any type of data
x1 = [1.2, 3.5, 1.9]
x2 = np.array([1.6, -2.6, 6.9])
print( x1, x2 )
[1.2, 3.5, 1.9] [ 1.6 -2.6 6.9]
type(x2)
numpy.ndarray
For lists, new elements can be added using append method
x1 = [1.2, 3.5, 1.9]
x2=x1.copy() # To compare later with numpy
x1.append(5.9)
x1
[1.2, 3.5, 1.9, 5.9]
For arrays, new elements can be added using append function of NumPy
np.append(x2,5.9)
array([1.2, 3.5, 1.9, 5.9])
A list can be converted into a numpy array, but the internal data type is homogenized. In the following example to float
x = [1,3.4,1.0]
x = np.array(x)
x
array([1. , 3.4, 1. ])
And a numpy array can be converted back to a (homogenized) list, as well
list(x)
[1.0, 3.4, 1.0]
Differences Operator + for lists is overloaded for concatenating
x1 = [1,2,3]
x2 = [3,2,1]
print (f'+: {x1+x2}, or np.concatenate: {np.concatenate((x1,x2))}' )
+: [1, 2, 3, 3, 2, 1], or np.concatenate: [1 2 3 3 2 1]
Operator + for numpy arrays is overloaded for adding
x1 = np.array([1,2,3])
x2 = np.array([3,2,1])
print (x1+x2)
[4 4 4]
All the main operators are overloaded only for numpy arrays as element-wise operations
Multiplication
x1=np.array(x1)
x2=np.array(x2)
print(x1*x2)
[ 3.12 13.44 7.59]
Division
print(x1/x2)
[0.46153846 1.71428571 6.27272727]
Substraction
print(x1-x2)
[-1.4 2. 5.8]
Power
print(x1**x2)
[ 1.6064649 80.81192733 8.37016462]
NumPy arrays Summary of element-wise operations: Numpy arrays support any mathematical operation (element by element)
x1 = np.array([1.2,4.8,6.9])
x2 = np.array([2.6,2.8,1.1])
print ("Adding", x1+x2)
print ("Multiplication", x1*x2)
print ("Division", x1/x2)
print ("Subtraction", x1-x2)
print ("Power", x1**x2)
Adding [3.8 7.6 8. ]
Multiplication [ 3.12 13.44 7.59]
Division [0.46153846 1.71428571 6.27272727]
Subtraction [-1.4 2. 5.8]
Power [ 1.6064649 80.81192733 8.37016462]
Note: Matrices can be represented as NumPy arrays where each element is a row vector. Nevertheless, be careful when multiply arrays, the operator * is overloaded in such a way that single elements are multiplied one by one, quite different from multiplication of matrices.
A detailed explanation will be given in the Chapter about linear algebra
A = np.array([[1,2],[3,4]])
B = np.array([[4,3],[2,1]])
print (f'A=\n{A}')
print (f'B=\n{B}')
print (f'A*B=\n{A*B}') # No matrix multiplication
A=
[[1 2]
[3 4]]
B=
[[4 3]
[2 1]]
A*B=
[[4 6]
[6 4]]
A matrix element can be accessed directly
A[0,1]
2
A[0][1]
2
Scientific Python abstractions#
Abstractions in scientific Python refer to the use of high-level constructs or interfaces that simplify complex operations and allow users to focus on the concepts and results rather than the underlying implementation details. Examples of abstractions in scientific Python include NumPy arrays for efficient numerical operations, Pandas dataframes for tabular data manipulation, and Matplotlib for data visualization. Abstractions help to make scientific computing more accessible and efficient for users.
All of them have methods to implement internal loops in faster programming languages that the ones in Python.
Examples of abstractions in NumPy that can be used to avoid for loops are:
Vectorized operations: NumPy supports many element-wise mathematical operations that can be performed directly on entire arrays, without the need for explicit for loops. For example, to multiply every element of an array by a constant, you can simply use the
*
operator, like this:my_array * 5
.Broadcasting: Broadcasting is a powerful feature in NumPy that allows arrays with different shapes to be used in arithmetic operations. Broadcasting can often be used as a more efficient alternative to for loops. For example, to add a scalar value to every row of a 2D array, you can simply add the scalar to the entire array, like this:
my_array + 5
.Masking and boolean indexing: NumPy provides powerful indexing capabilities that can be used to select subsets of an array based on some condition. For example, to select all elements of an array that are greater than 5, you can use a boolean mask, like this:
my_array[my_array > 5]
. This is often more efficient than using an explicit for loop to iterate over the array.Aggregation functions: NumPy provides many functions that can be used to compute summary statistics over an entire array, such as
np.sum()
,np.mean()
, andnp.std()
. These functions abstract away the need for explicit for loops over the array.
Below is given a more detailed explanation for the last two.
Masks#
A mask is a boolean array that is used to index into the original array with the same shape in order to select only the elements that satisfy the condition represented by the mask.
The mask is applied as a superspostion of a logical array unpon the original one. The output filters only the True
values
x = np.array([1,2,3,4])
y = np.array([False, False, True, True])
x[y]
array([3, 4])
#It is possible to access elements of a numpy array using booleans
x = np.array([1,2,3,4])
y = np.array([False, False, True, True])
x[y]
array([3, 4])
Automatic creation of masks
x>2
array([False, False, True, True])
x[x>2] # Automatic mask implementation
array([3, 4])
which is much better and faster than:
xfin=[]
for i in x:
if i>2:
xfin.append(i)
xfin=np.array(xfin)
xfin
array([3, 4])
#Operators >, <, >=, <= and ==, != are also overloaded for numpy arrays
x = np.array([0,5,8,0])
y = np.array([0,6,5,1])
print(x>y)
print(x<y)
print(x==y)
print(x!=y)
print(x>4)
[False False True False]
[False True False True]
[ True False False False]
[False True True True]
[False True True False]
#Combining these features, we can perform searches and comparisons far more efficient
x = np.array([1,4,2,6,8,4,3,0,9,1,3,6,7])
#A new list with numbers greater than 4
x[x>=4]
array([4, 6, 8, 4, 9, 6, 7])
Numpy has logical operators: np.logical_...
For array([1,4,2,6,8,4,3,0,9,1,3,6,7])
:
x[ (x>6) | (x<2) ]
array([1, 8, 0, 9, 1, 7])
x[ np.logical_and(x>2, x<6) ]
array([4, 4, 3, 3])
or
x[ (x>2) & (x<6) ] #or: |
array([4, 4, 3, 3])
the full mask can be negated!
For array([1,4,2,6,8,4,3,0,9,1,3,6,7])
:
x[~((x>2) & (x<6)) ]
array([1, 2, 6, 8, 0, 9, 1, 6, 7])
Aggregation functions#
spreadsheeet like operations
#Native methods of numpy arrays allow to calculate basic quantities
x = np.array([1,4,2,6,8,4,3,0,9,1,3,6,7])
#Maximum element
print( "Maximum element", x.max() )
#Minimum element
print( "Minimum element", x.min())
#Mean value
"Mean value", x.mean()
Maximum element 9
Minimum element 0
('Mean value', 4.153846153846154)
Upon the indices#
print(x)
[1 4 2 6 8 4 3 0 9 1 3 6 7]
#Sorted arguments of the array
print ("Sorted arguments", x.argsort() )
#Sorted array
print ( "Sorted array", x[x.argsort()])
Sorted arguments [ 0 1 2 3 4 5 6 7 8 9 10 11 12]
Sorted array [0 1 1 2 3 3 4 4 6 6 7 8 9]
Direct sorted
x = np.array([1,4,2,6,8,4,3,0,9,1,3,6,7])
x.sort()
x
array([0, 1, 1, 2, 3, 3, 4, 4, 6, 6, 7, 8, 9])
x = np.array([1,4,2,6,8,4,3,0,9,1,3,6,7])
np.sort( x )
array([0, 1, 1, 2, 3, 3, 4, 4, 6, 6, 7, 8, 9])
Miscellaneous methods#
NumPy includes many general purpose functions that complement the capabilities of python. We are interested here specially in functions for creating ordered arrays, storing and loading data as well as histograms, tasks that will be continuously required for the activities of the course.
#Create an array of 1's with a given size (even 2D sizes)
x = np.ones(5)
x
array([1., 1., 1., 1., 1.])
#Create an array of zeros with a given size (even 2D sizes)
x = np.zeros( (2,5) )
x
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
import numpy as np
x=np.array([4,7,4,6,9])
x.sum()
30
Miscellaneous functions#
#Create an array with a given range and a number of intervals
x = np.linspace( -np.pi, np.pi, 10 )
print (x)
[-3.14159265 -2.44346095 -1.74532925 -1.04719755 -0.34906585 0.34906585
1.04719755 1.74532925 2.44346095 3.14159265]
For arrays that expands more than one order of magnitud, use
np.logspace( np.log10(2), np.log10(50000),10 )
array([2.00000000e+00, 6.16155028e+00, 1.89823509e+01, 5.84803548e+01,
1.80164823e+02, 5.55047308e+02, 1.70997595e+03, 5.26805138e+03,
1.62296817e+04, 5.00000000e+04])
np.linspace( 2, 50000,10 )
array([2.00000000e+00, 5.55733333e+03, 1.11126667e+04, 1.66680000e+04,
2.22233333e+04, 2.77786667e+04, 3.33340000e+04, 3.88893333e+04,
4.44446667e+04, 5.00000000e+04])
#Create an array with a given range and a given step
x = np.arange( 1, 5, 0.2 )
print( x )
[1. 1.2 1.4 1.6 1.8 2. 2.2 2.4 2.6 2.8 3. 3.2 3.4 3.6 3.8 4. 4.2 4.4
4.6 4.8]
#Using the function savetxt, it is possible to store data from a numpy array
data = np.array([[3.2, 2.1],[3.1, 4.1]])
np.savetxt( "file.dat", data, fmt="%1.5e %1.5e" )
cat file.dat
#In the same way, using the function loadtxt it is possible to load external data files
data = np.loadtxt("file.dat")
#Data is then a multidimensional array with the loaded data
print( data )
Other useful functions will be covered when needed during the course.
List Slices#
Slices work on list-like objects like numpy arrays, and can also be used to change sub-parts of the list.
x=np.ones(10)
x[1:3]=2
x[-1]=8
x[-3:-1]=6
x
array([1., 2., 2., 1., 1., 1., 1., 6., 6., 8.])
Reverse order (arrays do not have the reverse()
method)
x[::-1]
array([8., 6., 6., 1., 1., 1., 1., 2., 2., 1.])
Pandas#
See the Pandas Chapter for details
numbers={"even": [0,2,4,6,8], # First key-list
"odd" : [1,3,5,7,9] } # Second key-list
import pandas as pd
pd.set_option('display.max_colwidth',200)
df=pd.DataFrame(numbers)
df
even | odd | |
---|---|---|
0 | 0 | 1 |
1 | 2 | 3 |
2 | 4 | 5 |
3 | 6 | 7 |
4 | 8 | 9 |
In the previous DataFrame, all the column values are converted to Numpy arrays, which is the basic object in Numpy corresponding to generalized nested lists;
df.even
0 0
1 2
2 4
3 6
4 8
Name: even, dtype: int64
df.even.values
array([0, 2, 4, 6, 8])
SciPy#
SciPy is a collection of mathematical algorithms and convenience functions built on the Numpy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data. With SciPy an interactive Python session becomes a data-processing and system-prototyping environment rivaling sytems such as MATLAB, IDL, Octave, R-Lab, and SciLab.
Some of the packages included with SciPy are:
Special functions (scipy.special)
Integration (scipy.integrate)
Optimization (scipy.optimize)
Interpolation (scipy.interpolate)
Fourier Transforms (scipy.fftpack)
Signal Processing (scipy.signal)
Linear Algebra (scipy.linalg)
Each of these packages must be imported separately.
Almost each of the numerical methods that will be covered during the course can be found in SciPy. For example,
To import the integrate
package, use
import scipy.integrate as integ
The integrate package then includes the next functions:
integ.Tester integ.fixed_quad integ.odepack integ.quadrature integ.test
integ.complex_ode integ.newton_cotes integ.quad integ.romb integ.tplquad
integ.cumtrapz integ.ode integ.quad_explain integ.romberg integ.trapz
integ.dblquad integ.odeint integ.quadpack integ.simps integ.vode
In next classes we will explore the offered options by SciPy according to the specific methods covered.