This package is built for people who are new not just to Python but to coding, fitting, and programmatically interacting with data. If you have experience with Excel but need to fit data using least squares fitting, this is the tool for you.
ezfit has four prerequisite packages: pandas, numpy, matplotlib, and scipy. The package can be installed through the terminal with the following command.
pip install ezfit numpy pandas matplotlib scipy
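If you are not sure whether the installation worked, a quick check like the following can help. This is a minimal sketch using only the Python standard library; the helper name `check_installed` is made up for this example and is not part of ezfit.

```python
from importlib.util import find_spec

# Packages this guide relies on
REQUIRED = ["ezfit", "pandas", "numpy", "matplotlib", "scipy"]


def check_installed(names):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: find_spec(name) is not None for name in names}


for name, found in check_installed(REQUIRED).items():
    print(f"{name}: {'installed' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed by running the pip command again with that name.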
Import the ezfit library. This gives you a simple interface for fitting a model
to the data in a pandas DataFrame.
import ezfit
To start, load your data into a pandas DataFrame. Try to always save your data as a .csv file with one line of headers, like the example below. Read the documentation on this.
x, y, yerr
0, 1, .5
1, .5, .2
...
You can load this data with the following command
# start by importing the pandas module
import pandas as pd
# Everything in Python uses dot notation to access attributes and functions.
# We need the read_csv() function from pandas, so we call it as pd.read_csv()
df = pd.read_csv("path_to_file") # note that you might need a full path
# let's check that the first 2 rows look correct by getting the `head` of the df
print(df.head(2)) # the print() function is how you print something in Python
The output should look something like this
x y yerr
0 0 1.0 0.5
1 1 0.5 0.2
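One thing to watch for: the example file above has a space after each comma. With the default settings pandas keeps that space as part of the column name, so the second column ends up named " y" rather than "y". The sketch below shows one way around this using the `skipinitialspace` option of `pd.read_csv`. The data is read from a string with `io.StringIO` only so that the example runs without a file on disk.

```python
import io

import pandas as pd

# The same three lines as the example file, held in a string
csv_text = """x, y, yerr
0, 1, .5
1, .5, .2
"""

# Default settings: the space after each comma stays in the column names
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# skipinitialspace=True drops whitespace that follows a delimiter
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(list(df.columns))
print(df.head(2))
```

With your own data, replace `io.StringIO(csv_text)` with the path to your file.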
We can also plot the data quickly to make sure it looks right, and determine if there is any cleaning that needs to be done.
# Let's start by getting the standard Python plotting library
import matplotlib.pyplot as plt
# the df.plot() function will plot the data in the dataframe in one easy go
df.plot(x="x", y="y", yerr="yerr") # you can pass other parameters in too
# this will plot the column labeled "y" vs "x", with error bars of size "yerr"
# you can pretty this plot up if you like, but it is fine for just checking the data
plt.show() # This will render the currently active plot
You might want to place this plot on a log scale, and this can be done in many ways. For a complete list of the parameters available to you, please read up on the pandas plot method.
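As one example of the log scale mentioned above, `df.plot()` accepts `logx` and `logy` keyword arguments. A minimal sketch, using a small made-up DataFrame in place of your own data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data spanning several orders of magnitude in y
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [1.0, 10.0, 100.0, 1000.0],
    "yerr": [0.1, 1.0, 10.0, 100.0],
})

# logy=True puts the y axis on a log scale; logx=True would do the same for x
ax = df.plot(x="x", y="y", yerr="yerr", logy=True)
ax.set_ylabel("y")
```

Call `plt.show()` afterwards to render the plot, as before.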
Now if the dataset is not comma separated, as is the case for data from CXRO, you will need to tell pandas what separates the columns. Let's look at some CXRO index of refraction data
Si3N4 Density=3.44
Energy(eV), Delta, Beta
30. 0.274695814 0.210541397
31.7943592 0.252507478 0.17769818
33.6960373 0.229885429 0.150933087
The first row is density information about the material, followed by the header row for
Energy(eV), Delta, and Beta. So first we need to skip the first row of the file.
Using the pd.read_csv() function, we can pass in the parameter skiprows=n, where
n is the number of rows to skip.
Next, we need to pass in a parameter telling pandas what to look for
between columns. Using the parameter sep=r"\s+" we tell the function that the
columns are separated by one or more whitespace characters. Putting this together
we have
df = pd.read_csv("path_to_file", sep=r"\s+", skiprows=1)
# printing the head gives us
print(df.head(2))
Energy(eV), Delta, Beta
0 30.000000 0.274696 2.105414e-01
1 31.794359 0.252507 1.776982e-01
Now there is one issue: the column names have trailing commas. You can solve that in several ways.
df.columns = ["Energy(eV)", "Delta", "Beta"]
# or
df.columns = [col.replace(",", "") for col in df.columns]
# or (rename returns a new DataFrame, so assign the result back)
df = df.rename(columns={"Energy(eV),": "Energy(eV)", "Delta,": "Delta"})
# ... you get the idea
Using the same methods as above, you can plot the data and do any cleaning needed to remove bad data points.
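Putting the CXRO steps together, here is a small sketch that loads the sample rows shown earlier, tidies the column names, and drops some rows with a boolean mask. The text is read from a string so the example runs on its own, and the 31 eV cut is an arbitrary choice made only to illustrate the masking step.

```python
import io

import pandas as pd

# The sample CXRO rows from above, held in a string instead of a file
cxro_text = """Si3N4 Density=3.44
Energy(eV), Delta, Beta
30. 0.274695814 0.210541397
31.7943592 0.252507478 0.17769818
33.6960373 0.229885429 0.150933087
"""

# Skip the density line and split columns on runs of whitespace
df = pd.read_csv(io.StringIO(cxro_text), sep=r"\s+", skiprows=1)

# Remove the trailing commas from the column names
df.columns = [col.replace(",", "") for col in df.columns]

# Keep only the rows above 31 eV (an arbitrary cut, for illustration)
df = df[df["Energy(eV)"] > 31]
print(df)
```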
Now you will need to express your mathematical model as a Python function. This is the
hardest part of fitting. The syntax is rather simple, and you never need to declare types
because Python is dynamically typed. Say we have a line
$$
f(x) = mx+b,
$$
which can be written in Python as
# Functions in Python are created by typing 'def' before the name of the function
def f(x, m, b): # For the code to work, x (or your domain) must be first
    """
    Triple quotes can be used to create a `doc string`, a fancy type of comment
    that gets attached to the top of the function. It is always a good idea
    to comment your functions to say what they do, why they do it, and how
    to use them. For example,
    x: Domain input
    m: slope
    b: y-intercept
    returns
    y = mx + b
    """
    y = m * x + b # Use * for multiplication and ** for exponentiation
    return y # return is the keyword that says what the function returns
Once you have defined a model and loaded your dataset, you need to fit your data. This can be done very easily, so I will run you through the whole process
import pandas as pd
import matplotlib.pyplot as plt
import ezfit
# ══════════/ Load the Data/ ═════════════════
df = pd.read_csv("path_to_csv")
print(df.head(10))
df.plot(x="x", y="y", yerr="yerr")
# ══════════/ Clean the Data/ ═════════════════
# oh no, the data for x < 1 is bad, so keep only x > 1
mask = (df["x"] > 1)
df = df[mask]
# ══════════/ Define a Model/ ═════════════════
def line(x, m, b):
    """Line function."""
    return m * x + b
# ══════════/ Fit the Data/ ═══════════════════
model, ax = df.fit(line, "x", "y", "yerr")
# this function will generate a quick plot of the fit results
plt.show()
# The model has parameters, errors, and goodness of fit
print(model)
line:
𝜒2: 88.71565403843992
reduced 𝜒2: 0.9052617759024482
m : (value=1.0858435676047251 ± 0.0497, bounds=(-inf, inf))
b : (value=-0.4650788531268627 ± 0.0903, bounds=(-inf, inf))
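To make the printed numbers less mysterious, here is a rough sketch of a weighted least squares line fit done directly with `scipy.optimize.curve_fit`. This is an assumption about the kind of calculation involved, not ezfit's actual implementation, and the data is synthetic. 𝜒2 is the sum of the squared residuals divided by the squared error bars, and the reduced 𝜒2 divides that by the number of data points minus the number of fitted parameters.

```python
import numpy as np
from scipy.optimize import curve_fit


def line(x, m, b):
    """Line function."""
    return m * x + b


# Synthetic data: a line with slope 2 and intercept 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
yerr = np.full_like(x, 0.5)
y = line(x, 2.0, 1.0) + rng.normal(0.0, 0.5, size=x.size)

# absolute_sigma=True treats yerr as real error bars, not relative weights
popt, pcov = curve_fit(line, x, y, sigma=yerr, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainty on each parameter

# Goodness of fit
chi2 = np.sum(((y - line(x, *popt)) / yerr) ** 2)
reduced_chi2 = chi2 / (x.size - len(popt))

print(f"m = {popt[0]:.3f} ± {perr[0]:.3f}")
print(f"b = {popt[1]:.3f} ± {perr[1]:.3f}")
print(f"chi2 = {chi2:.2f}, reduced chi2 = {reduced_chi2:.2f}")
```

A reduced 𝜒2 close to 1 suggests that the model and the error bars are consistent with the data.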
Now say you wanted to redo the fit, adding bounds and a starting value for the slope of the line
model, ax = df.fit(line, "x", "y", "yerr", m={"value": 1, "min": 0})
# you can pass in a dictionary for each parameter in your model
print(model)
Now we get slightly different results
line:
𝜒2: 98.71565403843992
reduced 𝜒2: 1.0052617759024482
m : (value=1.158435676047251 ± 0.0497, bounds=(-inf, inf))
b : (value=-0.4650788531268627 ± 0.0903, bounds=(-inf, inf))
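For comparison, a starting value and a lower bound can also be expressed with `scipy.optimize.curve_fit`, which takes the starting guess through `p0` and the limits through `bounds`. Again this is a sketch on synthetic data, under the assumption that a bounded least squares fit is what is wanted. It does not show how ezfit handles the `value` and `min` keys internally.

```python
import numpy as np
from scipy.optimize import curve_fit


def line(x, m, b):
    """Line function."""
    return m * x + b


# Synthetic data: slope 1.1, intercept -0.5, Gaussian noise of width 0.3
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 40)
yerr = np.full_like(x, 0.3)
y = line(x, 1.1, -0.5) + rng.normal(0.0, 0.3, size=x.size)

popt, pcov = curve_fit(
    line, x, y,
    sigma=yerr,
    absolute_sigma=True,
    p0=[1.0, 0.0],                              # starting values for m and b
    bounds=([0.0, -np.inf], [np.inf, np.inf]),  # m >= 0, b unbounded
)
perr = np.sqrt(np.diag(pcov))

print(f"m = {popt[0]:.3f} ± {perr[0]:.3f}")
print(f"b = {popt[1]:.3f} ± {perr[1]:.3f}")
```

Bounds are given as a pair of lists, lower limits first, in the same order as the parameters after `x` in the model function.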