Chromatogram Visualization Tool

The chromatogram visualization tool is built with Python to help users view chromatogram data in an easy-to-understand, interactive way.

As a Process Scientist in biotechnology, I have learned how to generate batch reports for process chromatogram data. Chromatography systems can be simple or complex, with some having an interface to monitor relevant parameters during a process run, while others generate a CSV/Excel file at the end of the run. The report typically displays two parameters on the chromatography plot, which are placed on each vertical axis of a timeline graph.

To get a complete picture of the run, process scientists often have to create multiple graphs and correlate time points, CV points or volume(mL) points. This can be time-consuming and counter-productive, as some scientists even have to manually insert required details in a PowerPoint image.

I have found that Python has useful tools for plotting all necessary parameters on a single graph, with each parameter on its own axis so that none overrides or distorts another. In this article, I will show you step by step how to generate the same process chromatogram plot in three different ways. With one comprehensive plot, you no longer need to create and correlate multiple graphs, which saves time and effort.

Here are the steps to generate the chromatogram data plot in Python:

Get a cleaned Excel/CSV file of process chromatograph data.
Set up a Jupyter Notebook.
Write relevant Python code.

Get a cleaned Excel/CSV file of process chromatograph data.

If a CSV file is generated, I recommend converting it to an Excel file with the data sheet labeled "data" and the process method sheet labeled "method." In the "method" sheet, add a table starting at cell A1 that lists each stage of the chromatography run and the CV at which it starts. This keeps the data organized and makes it easier to work with. I have provided a sample Excel file and a CSV file to illustrate this layout.

Before reading an Excel file, it is important to make sure that the data in the file is organized properly and calculated correctly.

In process chromatography, different measurements are taken at different stages. These measurements include things like time, flow rate, volume, UV and conductivity readings, and pH levels. There may be other measurements depending on the specific process.

Usually, the measurements are automatically calculated and recorded in a CSV or Excel file. But sometimes, we may need to manually calculate certain measurements.

For example, to express the processed volume in column volumes (CV), we can divide the value in the Volume (mL) column by the volume (in mL) of the chromatography column being used. We can also add a separate column for time (min) if it's not already there. When only time is recorded, CV can be calculated by multiplying the time value (t; min) by the flow rate (Q; mL/min) and dividing by the chromatography column volume (V; mL), which can be calculated beforehand or provided as a batch parameter: CV = t × Q / V.
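
As a minimal sketch of these two calculations, assuming a pandas DataFrame with the hypothetical column names 'Time (min)', 'Flow Rate (mL/min)' and 'Volume (mL)', and an illustrative column volume of 20 mL (none of these values come from the sample files):

```python
import pandas as pd

# Illustrative values only; the column names and the 20 mL column volume are assumptions.
COLUMN_VOLUME_ML = 20.0

df = pd.DataFrame({
    "Time (min)": [0.0, 1.0, 2.0, 4.0],
    "Flow Rate (mL/min)": [10.0, 10.0, 10.0, 10.0],
    "Volume (mL)": [0.0, 10.0, 20.0, 40.0],
})

# CV from cumulative volume: CV = volume / column volume
df["CV"] = df["Volume (mL)"] / COLUMN_VOLUME_ML

# CV from time and flow rate: CV = t * Q / V
df["CV_from_time"] = df["Time (min)"] * df["Flow Rate (mL/min)"] / COLUMN_VOLUME_ML

print(df[["CV", "CV_from_time"]])
```

The time-based formula only holds while the flow rate stays constant over the run. If the flow rate changes between stages, the volume has to be accumulated step by step instead.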

To analyze data using Python in a Jupyter Notebook, you need to ensure that your Excel table is well-structured with each row representing a data point and each parameter having its own corresponding column. It's important to label and organize the data correctly to avoid errors and make it easier to work with.

Set Up a Jupyter Notebook

To set up a Jupyter Notebook:

Install the latest Python version on your PC.
Install the open-source version of the Anaconda Data Science Platform on your PC.
Open Jupyter Notebook (Anaconda3) and create a new .ipynb file.
Rename the file. (Learn how to use Jupyter Notebook)
Open Anaconda Prompt (Anaconda3) and install the required Python libraries by running the following command:

pip install <package-name>

Replace <package-name> with:

pandas
numpy
matplotlib
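openpyxl

The math module ships with Python's standard library, so it does not need to be installed. openpyxl is the package pandas uses to read .xlsx files.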

Once the Excel/CSV table is set, copy it to the folder containing the current .ipynb file.

Writing Relevant Python Code

Now that the Jupyter Notebook is up and running, import the relevant Python libraries:

### import necessary libraries ###

import pandas as pd # for creating pandas DataFrames
import numpy as np # for creating numpy sequences
import math # for mathematical logic
from matplotlib import pyplot as plt # for plotting

To work with data from a CSV or Excel file in Python, you can use the pandas library to create DataFrames.

Here are the steps:

Use pandas.read_csv() or pandas.read_excel() to read in the data from the file, and specify the file path and any necessary parameters.
Create the first DataFrame from the CSV file using pandas.read_csv().
Create the second DataFrame from the Excel file using pandas.read_excel(), and specify the sheet name or index where the data is located.
Create the third DataFrame from the second worksheet in the reference.xlsx file by again using pandas.read_excel() and specifying the sheet name or index.

After creating the three DataFrames, you can use them to perform different data manipulations and analysis in Python.

### read CSV / Excel files and create DataFrames ###

# read main data from CSV file
df_process = pd.read_csv("./reference.csv")

# read main data from Excel file
df_excel = pd.read_excel("./reference.xlsx", sheet_name='data')

# read method data table from Excel file
df_method_labels = pd.read_excel("./reference.xlsx", sheet_name='method')

The information in the df_process and df_excel DataFrames is the same, but the df_method_labels DataFrame is used to describe the chromatography method being used.

df_process.head() # or df_excel.head()

df_method_labels.head()

We have created a table that contains all the important parameters of the chromatography process in separate columns. Additionally, we have another table that displays the starting column volume (CV) values for each stage of the process. It's crucial to ensure that the names of each parameter column in every DataFrame are identical. For example, the column that shows the volume of the solution passing through the column since batch start is denoted as 'ml' in the primary DataFrame, while it is named 'Volume (mL)' in the method DataFrame. To resolve this, we need to rename one of the columns so that both names match.

df_process.rename(columns={'ml':'Volume (mL)'}, inplace=True)

To update the 'Volume (mL)' values in the method table, we will match the 'CV' values in the method table with those of the primary data table and use the corresponding volume values from the primary data table.

# Add volume (mL) to method table taken from main data for volume (mL) corresponding to CV values in the method table

for i, CV_method in enumerate(df_method_labels['CV']):
    for mL, CV in zip(df_process['Volume (mL)'], df_process['CV']):
        if round(CV)==round(CV_method):
            df_method_labels['Volume (mL)'].values[i] = mL

Once the volumes in the method table are updated with the calculated values, the Jupyter Notebook output should show the method DataFrame with starting CV values for each chromatography stage, and their corresponding volumes in mL. This table will be useful as a reference for the analysis and visualization of chromatography data.

We now know how much liquid has passed through the column during the batch, and this amount corresponds to the column volume (CV) at the start of each process stage.

If time (min) since batch start is available in the data table, one can follow a similar approach. However, in this case, one also needs to have the flow rate data and apply the formula mentioned earlier for each row of data to calculate the corresponding CV values.

Usually, the information about process stage is not included in the main DataFrame. However, we can add this information using the method table. To do this, we compare the CV values in both tables. If a CV value in the main DataFrame is equal to or greater than the CV value in the method DataFrame, we can add the corresponding 'Stage' value from the method DataFrame to a new 'Stage' column in the main DataFrame. It's important to create the 'Stage' column beforehand and make sure the data type is compatible between the two DataFrames.

# Add a 'Stage' column to the main DataFrame by first filling it with placeholder values (row numbers) converted to string datatype

df_process['Stage'] = list(np.arange(len(df_process)))
df_process['Stage'] = df_process['Stage'].astype(str)

# If CV values in main DataFrame are greater than or equal to the CV values defined in method table then populate stage value from method table

for i, CV in enumerate(df_process['CV']):
    for CV_method, Stage in zip(df_method_labels['CV'], df_method_labels['Stage']):
        if CV >= CV_method:
            df_process['Stage'].values[i]= str(Stage)
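
The nested loop above compares every data row with every method row. As an alternative sketch, not the code used in this article, the same "latest stage whose starting CV has been reached" rule can be expressed with NumPy's searchsorted. The toy tables, stage names and the "Unassigned" label below are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Toy data; the stage names and CV values are illustrative assumptions.
df_method = pd.DataFrame({
    "CV": [0.0, 5.0, 55.0],
    "Stage": ["Equilibration", "Load", "Wash"],
})
df_main = pd.DataFrame({"CV": [0.0, 2.5, 5.0, 30.0, 55.0, 60.0]})

# searchsorted needs the stage start values in ascending order.
df_method = df_method.sort_values("CV").reset_index(drop=True)

# For each data point, find the last stage whose starting CV is <= the point's CV.
idx = np.searchsorted(
    df_method["CV"].to_numpy(), df_main["CV"].to_numpy(), side="right"
) - 1

# Points before the first stage get index -1; label them explicitly.
stages = df_method["Stage"].to_numpy()
df_main["Stage"] = np.where(idx >= 0, stages[np.clip(idx, 0, None)], "Unassigned")

print(df_main)
```

One difference from the loop version: rows that fall before the first stage are labeled "Unassigned" here, whereas the loop leaves their placeholder value in place.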

Now the main DataFrame looks like this, with the additional Stage column:

Next, we assign each column of the main DataFrame to its own pandas Series variable:

# Assign each column to a pandas Series variable
            
mL_col = df_process['Volume (mL)']
UV_col = df_process['UV 1_280 (mAU)']
Cond_col = df_process['Cond (mS/cm)']
Conc_col = df_process['Conc B (%)']
PreC_col = df_process['PreC pressure (Mpa)']
DeltaC_col = df_process['DeltaC pressure (Mpa)']
pH_col = df_process['pH']
CV_col = df_process['CV']
Stage_col = df_process['Stage']

To prepare for plotting the chromatography data, we need to check that the parameters have compatible data types and are rounded to a convenient factor. This will help make the plotted data easier to read and understand.

# Function that rounds up the maximum Series value to the nearest factor for plotting

def round_up_max_value(series, factor):
    return math.ceil(max(series.fillna(0)) / factor) * factor
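
To illustrate what this helper returns, here is a small usage sketch. The function is repeated so the example runs on its own, and the readings are made-up values:

```python
import math
import pandas as pd

def round_up_max_value(series, factor):
    # Same helper as above: NaN is treated as 0, then the maximum
    # is rounded up to the next multiple of factor.
    return math.ceil(max(series.fillna(0)) / factor) * factor

uv = pd.Series([12.3, 47.8, float("nan"), 731.0])  # made-up UV readings (mAU)
ph = pd.Series([6.8, 7.2, 7.45])                   # made-up pH readings

uv_upper = round_up_max_value(uv, 50)  # 731.0 is rounded up to 750
ph_upper = round_up_max_value(ph, 1)   # 7.45 is rounded up to 8

print(uv_upper, ph_upper)
```

With fractional factors such as 0.05, the result can carry a small floating-point error, which is harmless for axis limits.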

Next, we will create functions that compare a value from a column in a pandas DataFrame and retrieve the corresponding value from another column in the same row. These functions will be used to obtain parameter values for a specific CV value, which will help us determine the limits of the plot.

# Function that converts CV values in method table to corresponding volume for the first occurrence of rounded equivalent CV values

def CV_to_volume(CV_method):
    for mL, CV in zip(df_process['Volume (mL)'], df_process['CV']):
        if round(CV)==round(CV_method):
            return round(mL) 

# Generic form of the above function where the series_from is the series whose value will be compared and series_to is the series from which the value is fetched

def CV_to_value(series_from, series_from_value, series_to):
    for x_from, x_to in zip(df_process[series_from.name], df_process[series_to.name]):
        if round(x_from) == round(float(series_from_value)):
            return x_to
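
Both functions above read from the global df_process. As a hedged alternative sketch, the same first-match lookup can take the DataFrame as an argument and use a boolean mask instead of a Python loop. The name first_match and the toy data are hypothetical and not part of the article's code:

```python
import pandas as pd

def first_match(df, from_col, from_value, to_col):
    """Return df[to_col] from the first row where the rounded df[from_col]
    equals the rounded from_value, or None when no row matches."""
    mask = df[from_col].round() == round(float(from_value))
    if not mask.any():
        return None
    # idxmax on a boolean Series gives the index label of the first True
    return df.loc[mask.idxmax(), to_col]

# Toy data, for illustration only
df = pd.DataFrame({
    "CV": [0.0, 0.4, 0.6, 1.2, 2.1],
    "Volume (mL)": [0.0, 8.0, 12.0, 24.0, 42.0],
})

vol_at_cv1 = first_match(df, "CV", 1, "Volume (mL)")    # first row with rounded CV == 1
vol_at_cv2 = first_match(df, "CV", 2.0, "Volume (mL)")
vol_missing = first_match(df, "CV", 10, "Volume (mL)")  # no such CV in the data

print(vol_at_cv1, vol_at_cv2, vol_missing)
```

Because both sides are rounded to whole numbers, the lookup resolves CV only to about half a column volume, in this sketch and in the loop versions alike.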

Now that we have the necessary tools to create appropriate plots, I will demonstrate three different approaches to solve the same problem: procedural, functional, and advanced. To help you better understand them, let's compare each one with the process of making rotis (Indian flatbreads).

Procedural Approach

The procedural approach can be compared to making rotis from scratch, starting with sowing seeds, harvesting wheat, grinding it to make flour, making dough, and finally cooking the rotis. In the context of creating plots, this approach involves hard-coding plot parameters and starting from the beginning, but it provides the most flexibility and control in creating the desired plot.

# Get matplotlib figure and axes objects
fig, ax1 = plt.subplots(figsize=(15,7.5))

# assigning maximum and minimum x-axis values, this can be manipulated to get the section from the full process chromatogram relevant to our analysis
xlim_CV_lower = min(CV_col)
xlim_CV_upper = max(CV_col)

# parameters for first (baseline) x-y axes

# creating the plot
ax1.plot(CV_col, UV_col, color='blue')

# setting primary x-axis parameters
ax1.set_xlabel(CV_col.name)
ax1.set_xlim([xlim_CV_lower,xlim_CV_upper])
ax1.set_xticks(tuple(df_method_labels['CV'][(df_method_labels.CV >=xlim_CV_lower) & (df_method_labels.CV <=xlim_CV_upper)]))
ax1.set_xticklabels(ax1.get_xticks(),rotation=90)

# setting primary y-axis parameters
ax1.set_ylabel(UV_col.name, color='blue')
ax1.set_ylim([-1,math.ceil(max(UV_col) / 100) * 100])
ax1.set_yticks(np.arange(0,800,50))
ax1.yaxis.label.set_color('blue')
ax1.tick_params(axis='y', colors='blue')

# setting grid
plt.grid()

# creating the secondary x-axis plot for Stage
axB = ax1.twiny()
axB.plot(CV_col, UV_col, color='blue')

# setting secondary x-axis parameters
axB.set_xlabel('Sample Biosimilar Process Stage')
axB.set_xlim([xlim_CV_lower,xlim_CV_upper])
axB.set_xticks(tuple(df_method_labels['CV'][(df_method_labels.CV >=xlim_CV_lower) & (df_method_labels.CV <=xlim_CV_upper)]))
axB.set_xticklabels(tuple(df_method_labels['Stage'][(df_method_labels.CV >=xlim_CV_lower) & (df_method_labels.CV <=xlim_CV_upper)]),rotation=90)

# creating the secondary x-axis plot for Volume
axC = ax1.twiny()

# setting secondary x-axis parameters, using CV_to_volume function to convert CV limits to Volume limits
xaxis_volume_limits = [CV_to_volume(xlim_CV_lower),CV_to_volume(xlim_CV_upper)]
axC.set_xlim(xaxis_volume_limits)
axC.spines['top'].set_position(('axes', -0.2))
axC.set_xlabel(mL_col.name, labelpad=-50)
axC.set_xticks(np.arange(xaxis_volume_limits[0],xaxis_volume_limits[1], 20))
axC.set_xticklabels(np.arange(xaxis_volume_limits[0],xaxis_volume_limits[1], 20),rotation=90)

# creating a secondary y-axis plot for Conductivity
ax2 = ax1.twinx()
ax2.plot(CV_col, Cond_col, color='orange')

# setting secondary y-axis parameters for Conductivity
ax2.set_ylabel('Conductivity (mS/cm)', labelpad=10, color='orange')
ax2.set_ylim([-1,250])
ax2.set_yticks(np.arange(0,260,10))
ax2.tick_params(axis='y', colors='orange')

# creating a secondary y-axis plot for pH
ax3 = ax1.twinx()
ax3.plot(CV_col, pH_col, color='red')

# setting secondary y-axis parameters for pH
ax3.spines['right'].set_position(('outward', 80))
ax3.set_ylabel('pH', labelpad=0, color='red')
ax3.set_ylim([0,15])
ax3.set_yticks(np.arange(0,15,0.5))
ax3.tick_params(axis='y', colors='red')

# creating a secondary y-axis plot for Concentration B
ax4 = ax1.twinx()
ax4.plot(CV_col, Conc_col, color='green')

# setting secondary y-axis parameters for Concentration B
ax4.spines['right'].set_position(('axes', -0.1))
ax4.set_ylabel('Conc B (%)', labelpad=-45, color='green')
ax4.set_ylim([-1,101])
ax4.set_yticks(np.arange(0,101,4))
ax4.tick_params(axis='y', colors='green')

# creating a secondary y-axis plot for Pre-Column Pressure
ax5 = ax1.twinx()
ax5.plot(CV_col, PreC_col, color='black')

# setting secondary y-axis parameters for Pre-Column Pressure
ax5.spines['right'].set_position(('axes', -0.18))
ax5.set_ylabel('PreC pressure (Mpa)', labelpad=-45, color='black')
ax5.set_ylim([0,1])
ax5.set_yticks(np.arange(0,1,0.05))
ax5.tick_params(axis='y', colors='black')

# creating a secondary y-axis plot for Delta-Column Pressure
ax6 = ax1.twinx()
ax6.plot(CV_col, DeltaC_col, color='violet')

# setting secondary y-axis parameters for Delta-Column Pressure
ax6.spines['right'].set_position(('outward', 160))
ax6.set_ylabel('DeltaC pressure (Mpa)', labelpad=10, color='violet')
ax6.set_ylim([0,1])
ax6.set_yticks(np.arange(0,1,0.05))
ax6.tick_params(axis='y', colors='violet')

# setting legends for all y-axes
ax1.legend(['UV (mAU)'],ncol=1 , bbox_to_anchor=(0.15, 1.40), facecolor="white" )
ax2.legend(['Conductivity (mS/cm)'],ncol=1, bbox_to_anchor=(0.35, 1.40), facecolor="white")
ax3.legend(['pH'],ncol=1 , bbox_to_anchor=(0.45, 1.40), facecolor="white")
ax4.legend(['Conc B (%)'],ncol=1 , bbox_to_anchor=(0.6, 1.40), facecolor="white")
ax5.legend(['PreC pressure (Mpa)'],ncol=1 , bbox_to_anchor=(0.8, 1.40), facecolor="white")
ax6.legend(['DeltaC pressure (Mpa)'],ncol=1 , bbox_to_anchor=(1, 1.40), facecolor="white")

plt.title('Python-Based Process Chromatogram')

# saving plot to a PNG file
plt.savefig("./output-hard-coded.png", dpi=1600, bbox_inches='tight')

We get the following output:

Hard-coded Process Chromatography Plot (created by hard-coding matplotlib calls in Python)

The procedural approach gives a lot of control over the plot, but if we only want to see a particular part of the plot, like everything after the end of the Load stage (CV ~ 55), we need to change the code. It is also not the best approach if we want to create plots for many chromatography batches.

# assigning maximum and minimum x-axis values, this can be manipulated to get the section from the full process chromatogram relevant to our analysis
xlim_CV_lower = 55
xlim_CV_upper = max(CV_col)

This produces the plot shown below.

Portion of Full Process Chromatogram (hard-coded output that shows a portion of the full process chromatogram)

Functional Approach

The functional approach can be compared to having a machine that can grind flour, make dough, and cook rotis all in one place. The user can decide what ingredients to put in and when. This approach involves defining functions that perform specific tasks and can be reused for different parameters. The user only needs to provide the function with the necessary information and call it in the desired order.

In this approach, the focus is on creating functions that can generate a basic plot, a secondary x-axis plot, and a secondary y-axis plot. It strikes a balance between flexibility and control, making it easier for the user to create customized plots without the need for extensive coding.

# Function that creates a baseline plot while considering various plot parameters

## PARAMETERS ##
# x-axis Series
# y-axis Series
# x-axis ticks separation length
# y-axis ticks separation length
# color of the plot
# lower x-axis value, can be changed if user wants start x-axis value to be other than 0
# lower y-axis value, can be changed if user wants start y-axis value to be other than 0
# specifies whether grid should be displayed
# anchor_x determines x position of the legend entry

def plot_basic(x_axis,\
               y_axis, \
               x_axis_tick_factor=1, \
               y_axis_tick_factor=5, \
               color='blue',\
               x_lower_val=0,\
               y_lower_val=0,
               gridon = True,\
               anchor_x=0.15):
    
    global ax # ax is made global, as other plots will be overlayed on it 
    global x_LOWER, x_UPPER  # x-axis values are made global for baseline overlay

    fig, ax = plt.subplots(figsize=(15,7.5)) # Defining matplotlib figure and axes objects
        
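    x_LOWER = x_lower_val # Lower x-axis value as supplied by the caller
    x_UPPER = max(x_axis) # Upper x-axis value is the Series maximum (not rounded)

    y_lower = y_lower_val # Lower y-axis value as supplied by the caller
    y_upper = max(y_axis) # Upper y-axis value is the Series maximum; one tick step is added when setting the limits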
    
    ax.plot(x_axis, y_axis, color=color) # Plotting figure
    
    # Setting x-axis parameters
    
    ax.set_xlabel(x_axis.name)  # label
    ax.set_xlim([x_LOWER, x_UPPER]) # axis limits
    ax.set_xticks(np.arange(x_LOWER,x_UPPER,x_axis_tick_factor)) # ticks
    
    # Setting y-axis parameters
    
    ax.set_ylabel(y_axis.name, color=color)  # label
    ax.set_ylim([y_lower, y_upper+y_axis_tick_factor])  # axis limits
    ax.set_yticks(np.arange(y_lower,y_upper+y_axis_tick_factor,y_axis_tick_factor))  # ticks (their color is set by tick_params below)
    ax.tick_params(axis='y', colors=color) # tick labels color
    
    # Set grid if gridon parameter is set to True
    
    if gridon:
        plt.grid(axis='y', color=color, linewidth=0.7,alpha=0.5)
        
    # Set legend

    ax.legend([y_axis.name],ncol=1 , bbox_to_anchor=(anchor_x, 1.40), facecolor="white" )

Once the baseline plot function is defined, the next step is to define the function for the secondary x-axis.

# Function that creates a secondary x-axis plot while considering various plot parameters

## PARAMETERS ##
# x-axis Series
# y-axis Series
# x-axis tick factor is used in case the data is not taken from method table
# color of the plot
# spine value that specifies whether the axis exists on 'top' or 'bottom'
# spine_position specifies location as 'axes' or 'outward'
# spine_position_attr specifies by how much it is away from 'axes' (0.0 to 1.0) or 'outward' (points)
# distance in points between axis label and axis ticks
# gridon specifies whether grid is required to be displayed
# from_method_table specifies if data is taken from method table to be plotted on secondary x axis

def plot_secondary_x(x_axis, \
                     y_axis, \
                     x_axis_tick_factor=5,\
                     color='blue',\
                     spine='top',\
                     spine_position = 'axes',\
                     spine_position_attr = 1.0,\
                     labelpad = 0,\
                     gridon = True,
                     from_method_table=False):
        
    ax2 = ax.twiny() # Set axis as secondary to the global axis 'ax'
    
    # Setting x-axis parameters
    
    ax2.set_xlabel(x_axis.name, color=color, labelpad=labelpad)  # label
    x_lower = CV_to_value(CV_col, min(CV_col), x_axis)
    x_upper = CV_to_value(CV_col, max(CV_col), x_axis)
    ax2.set_xlim([x_lower, x_upper])  # axis limits
   
    # Adjusting the axis to position with respect to the plot
 
    ax2.spines[spine].set_position((spine_position, spine_position_attr)) # 'spine' selects the top or bottom spine of the plot
    # 'spine_position' defines whether the position is relative to the axes ('axes') or absolute ('outward')
    # for 'axes', 'spine_position_attr' is a fraction where 0 is the bottom and 1 is the top of the plot
    # for 'outward', 'spine_position_attr' is an offset in points
    
    # if the tick labels come from the method table, use those values; otherwise use uniformly spaced ticks from the main DF
    if from_method_table:
        ax2.set_xticks(tuple(df_method_labels['CV']))  # ticks
        ax2.set_xticklabels(tuple(df_method_labels[x_axis.name]),rotation=90)
    else:
        ax2.set_xticks(np.arange(x_lower,x_upper, x_axis_tick_factor))
        ax2.set_xticklabels(np.arange(x_lower, x_upper, x_axis_tick_factor),rotation=90)
    
    # Set grid if gridon parameter is set to True
    
    if gridon:
        plt.grid(axis='x', color=color, linewidth=0.7,alpha=0.5)

Once the secondary x-axis plot is functionalized, we proceed to functionalize the secondary y-axis plots.

# Function that creates a secondary y-axis plot while considering various plot parameters

## PARAMETERS ##
# x-axis Series
# y-axis Series
# x-axis ticks separation length
# y-axis ticks separation length
# color of the plot
# spine value that specifies whether the axis exists on 'left' or 'right'
# spine_position specifies location as 'axes' or 'outward'
# spine_position_attr specifies by how much it is away from 'axes' (0.0 to 1.0) or 'outward' (points)
# distance in points between axis label and axis ticks
# gridon specifies whether grid is required
# anchor_x determines x position of the legend entry

def plot_secondary_y(x_axis, \
                     y_axis, \
                     x_axis_tick_factor=1, \
                       y_axis_tick_factor=5,\
                       color='blue',\
                       spine='right',\
                       spine_position = 'axes',\
                       spine_position_attr = 60,\
                       labelpad = 0,\
                       gridon = False,\
                        anchor_x=0.15):
    
    x_upper = round_up_max_value(x_axis, x_axis_tick_factor) # Rounding up upper x-axis value
    y_upper = round_up_max_value(y_axis, y_axis_tick_factor) # Rounding up upper y-axis value
    y_lower = np.nanmin(y_axis) # Lower y-axis value is equal to minimum value of Series excluding NaN values
    
    ax2 = ax.twinx() # Set axis as secondary to the global axis 'ax'
    ax2.plot(x_axis, y_axis, color=color)    # Plotting figure
    
    # Setting x-axis parameters
    
#     ax2.set_xlabel(x_axis.name)  # label
#     ax2.set_xlim([x_LOWER, x_upper])  # axis limits
#     ax2.set_xticks(np.arange(x_lower,x_upper,x_axis_tick_factor))  # ticks
    
    # Setting y-axis parameters
    
    ax2.set_ylim([y_lower, y_upper+y_axis_tick_factor])  # axis limits
    ax2.set_yticks(np.arange(y_lower,y_upper+y_axis_tick_factor,y_axis_tick_factor))  # ticks
    ax2.tick_params(axis='y', colors=color)  # tick labels color
    
    # Adjusting the axis to position with respect to the plot
    
    ax2.spines[spine].set_position((spine_position, spine_position_attr))# 'spine' selects the left or right spine of the plot
    # 'spine_position' defines whether the position is relative to the axes ('axes') or absolute ('outward')
    # for 'axes', 'spine_position_attr' is a fraction where 0 is extreme left and 1 is extreme right
    # for 'outward', 'spine_position_attr' is an offset in points
    
    ax2.set_ylabel(y_axis.name, labelpad=labelpad, color=color) # label, labelpad defines how far label is from axis ticks
    
    # Set grid if gridon parameter is set to True
    
    if gridon:
        plt.grid(axis='y', color=color, linewidth=0.7,alpha=0.5)
        
    # Set legend
    ax2.legend([y_axis.name],ncol=1 , bbox_to_anchor=(anchor_x, 1.40), facecolor="white" )

With these functions defined, we can create the consolidated plot by passing the relevant arguments to each function.

# Plot basic x-y plot - x-axis : CV_col, y-axis : UV_col

plot_basic(CV_col, UV_col, \
               x_axis_tick_factor=5, \
               y_axis_tick_factor=50, 
               color='blue',\
               x_lower_val=0,\
               y_lower_val=0,
               gridon=True,\
                anchor_x=0.15)

plot_secondary_x(Stage_col, \
                     UV_col, \
                     x_axis_tick_factor=20,\
                       color='black',\
                       spine='top',\
                       spine_position = 'axes',\
                       spine_position_attr = 1.0,\
                       labelpad = 0,\
                       gridon = True,\
                        from_method_table=True)

plot_secondary_x(mL_col, \
                     UV_col, \
                        x_axis_tick_factor=40,\
                       color='black',\
                       spine='top',\
                       spine_position = 'axes',\
                       spine_position_attr = -0.17,\
                       labelpad = -70,\
                       gridon = False,\
                        from_method_table=False)

# Plot secondary y plot - x-axis : CV_col, y-axis : Cond_col

plot_secondary_y(CV_col, Cond_col, \
               x_axis_tick_factor=5, \
               y_axis_tick_factor=10,\
               color='orange',\
               spine='right',\
               spine_position = 'outward',\
               spine_position_attr = 0,\
               labelpad = 10,\
               gridon=False,\
               anchor_x=0.35)

# Plot secondary y plot - x-axis : CV_col, y-axis : pH_col

plot_secondary_y(CV_col, pH_col, \
               x_axis_tick_factor=5, \
               y_axis_tick_factor=1,\
               color='red',\
               spine='right',\
               spine_position = 'outward',\
               spine_position_attr = 60,\
               labelpad = 0,\
               gridon=False,\
               anchor_x=0.45)

# Plot secondary y plot - x-axis : CV_col, y-axis : Conc_col

plot_secondary_y(CV_col, Conc_col, \
               x_axis_tick_factor=5, \
               y_axis_tick_factor=4,\
               color='green',\
               spine='right',\
               spine_position = 'axes',\
               spine_position_attr = -0.09,\
               labelpad = -45,\
               gridon=False,\
               anchor_x=0.6)

# Plot secondary y plot - x-axis : CV_col, y-axis : PreC_col

plot_secondary_y(CV_col, PreC_col, \
               x_axis_tick_factor=5, \
               y_axis_tick_factor=0.05,\
               color='black',\
               spine='right',\
               spine_position = 'axes',\
               spine_position_attr = -0.18,\
               labelpad = -65,\
               gridon=False,\
               anchor_x=0.8)

# Plot secondary y plot - x-axis : CV_col, y-axis : DeltaC_col

plot_secondary_y(CV_col, DeltaC_col, \
               x_axis_tick_factor=5, \
               y_axis_tick_factor=0.01,\
               color='violet',\
               spine='right',\
               spine_position = 'outward',\
               spine_position_attr = 100,\
               labelpad = 0,\
               gridon=False,\
               anchor_x=1)

# set plot title

plt.title('Python-Based Process Chromatogram')

# Save plot as PNG file in the same folder as the code

plt.savefig("./output-functional.png", dpi=1600, bbox_inches='tight')

The output of this code is shown below.

Functionally Created Chromatogram (chromatography plot created by applying functional programming in Python)

The functional approach, although it may seem longer and more complex than hard-coding values, is actually more flexible. With this approach, we can easily add multiple secondary x and y axes and adjust their positioning as needed. Additionally, we can quickly remove any secondary axes that are not relevant to our specific plot. This level of control is achieved without the need for repeated coding, making the functional approach an effective and efficient method for creating customizable plots.
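
The functions above share the base axes through a global variable. As an alternative sketch, not the code used in this article, the base Axes can be passed in explicitly so that the helper works with any figure. The helper name add_secondary_y and the synthetic data are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a Jupyter Notebook
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

def add_secondary_y(base_ax, x, y, color, offset_pts=0):
    """Overlay y on a new right-hand axis that shares base_ax's x-axis."""
    ax_new = base_ax.twinx()
    ax_new.plot(x, y, color=color)
    # Move the right spine outward by offset_pts points so axes do not overlap
    ax_new.spines["right"].set_position(("outward", offset_pts))
    ax_new.set_ylabel(y.name, color=color)
    ax_new.tick_params(axis="y", colors=color)
    return ax_new

# Synthetic data, for illustration only
cv = pd.Series(np.linspace(0, 10, 50), name="CV")
uv = pd.Series(100 * np.exp(-(cv - 5) ** 2), name="UV (mAU)")
cond = pd.Series(np.linspace(5, 50, 50), name="Cond (mS/cm)")
ph = pd.Series(np.full(50, 7.0), name="pH")

# Baseline plot
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(cv, uv, color="blue")
ax.set_xlabel(cv.name)
ax.set_ylabel(uv.name, color="blue")

# Two extra y-axes stacked on the right-hand side
ax_cond = add_secondary_y(ax, cv, cond, "orange", offset_pts=0)
ax_ph = add_secondary_y(ax, cv, ph, "red", offset_pts=60)
```

Returning the new Axes from the helper lets the caller adjust limits, ticks or legends afterwards without relying on global state.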

Advanced Approach (Plotly)

An analogy to understand this approach is a large machine that takes in wheat and produces different types of rotis. To operate this machine, the user needs to read the manual carefully to understand the process.

Similarly, Python's Plotly package provides a high level of flexibility in plotting. It has pre-built modules that can analyze data and generate relevant plots, which may be initially challenging to use. However, with a better understanding of the package, the user can customize the output to achieve similar results as in the previous approaches.

This approach uses the previously imported packages along with the Plotly package and its modules. If Plotly is not yet installed, install it with pip install plotly.

import plotly.graph_objects as go    # to define Plotly graph object
import plotly.io as io      # to get one of the plot templates

Plotly offers several plot templates. One of them is the 'simple_white' template, which is useful because it makes the axis lines more visible. We set it as the default:

io.templates.default = "simple_white"

First, we initialize the Plotly figure object, and assign variables to pandas Series that will be used as primary x-axis.

# initializing a Plotly figure object
fig = go.Figure()

# assigning primary x_axis to pandas Series
x_axis = CV_col

Next, we define functions that add plot traces and set the axis attributes for those traces.

# Defining a function to trace y-axis plots
# x-axis pandas Series
# y-axis pandas Series
# plot color
# y_axis_attr used as identity of the trace
# customdata that takes values from data and displays it on hovering
# hovertemplate which is an HTML-like text displayed on hovering

def y_axis_traces(x_axis, y_axis, color, y_axis_attr, \
                  customdata=list(zip(CV_col, mL_col, Stage_col, UV_col, Cond_col, Conc_col, pH_col, PreC_col, DeltaC_col)),\
 hovertemplate="<br><b>\t\t%{y:3.2f}</b>"):
    return fig.add_trace(go.Scatter(
                                    x=x_axis,
                                    y=y_axis,
                                    name=y_axis.name,
                                    line=dict(color=color),
                                    yaxis=y_axis_attr,\
                                    hovertemplate=hovertemplate,\
                                    customdata=customdata
                                    )
                        )
# Defining a function to add y-axis attributes to existing plots
# axis title
# anchor to define where the axis will be anchored
# position to determine how far from plot area the axes is located
# overlaying parameter to confirm that all parameters are overlayed on first plot
# side to determine which side of the plot the axis is placed
# color of axis labels, ticks, and ticklabels

def y_axis_attributes(title=UV_col.name, anchor="free", position=0.0,
                      overlaying='y', side="left", color='blue'):
    return dict(title=go.layout.yaxis.Title(text=title, standoff=0),
                anchor=anchor,
                position=position,
                tickcolor=color,
                tickfont=dict(color=color),
                linecolor=color,
                title_font_color=color,
                overlaying=overlaying,
                side=side)
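
The hovertemplate argument is a single HTML-like string, which becomes hard to read once several fields are included. As a minimal sketch, the string can be assembled from (label, placeholder) pairs. The helper name build_hovertemplate is hypothetical and not part of Plotly; the placeholders follow Plotly's %{...} template syntax, and the customdata indices assume the column order used in y_axis_traces above.

```python
def build_hovertemplate(fields, indent="\t\t"):
    """Join (label, placeholder) pairs into a Plotly hovertemplate string.

    `fields` is a list of (label, placeholder) tuples. Each placeholder is a
    Plotly template token such as "%{y:3.2f}" or "%{customdata[0]:3.2f}".
    This helper is hypothetical; it only concatenates strings.
    """
    parts = [f"{label}:<br><b>{indent}{placeholder}</b>"
             for label, placeholder in fields]
    # <extra></extra> hides the secondary hover box that repeats the trace name
    return "<br>".join(parts) + "<extra></extra>"


# Index 0 = CV, 1 = volume (mL), 2 = stage, as in the customdata tuple order
uv_hover = build_hovertemplate([
    ("Stage", "%{customdata[2]}"),
    ("UV", "%{y:3.2f}"),
    ("CV", "%{customdata[0]:3.2f}"),
    ("Volume(mL)", "%{customdata[1]:3.2f}"),
])
print(uv_hover)
```

The resulting string can be passed as the hovertemplate argument of y_axis_traces in place of a hand-written template.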

Next, we call these functions to create the plots.

# Tracing y-axis plots
y_axis_traces(x_axis, UV_col, 'blue', 'y1', hovertemplate='Stage:<br><b>\t\t%{customdata[2]}</b><br>UV: <br><b>\t\t%{y:3.2f}</b><br>CV:<br><b>\t\t%{customdata[0]:3.2f}</b><br>Volume(mL):<br><b>\t\t%{customdata[1]:3.2f}</b><extra></extra>')
y_axis_traces(x_axis, Conc_col, 'green', 'y2')
y_axis_traces(x_axis, PreC_col, 'black', 'y3')
y_axis_traces(x_axis, Cond_col, 'orange', 'y4')
y_axis_traces(x_axis, pH_col, 'red', 'y5')
y_axis_traces(x_axis, DeltaC_col, 'violet', 'y6')

# Create axis objects and assign attributes to each axis
fig.update_layout(
    xaxis=dict(title=x_axis.name, domain=[0.14, 0.86]),
    yaxis=y_axis_attributes(title=UV_col.name, anchor="free", position=0.14, side="left", color='blue', overlaying='free'),
    yaxis2=y_axis_attributes(title=Conc_col.name, anchor="free", position=0.07, side="left", color='green'),
    yaxis3=y_axis_attributes(title=PreC_col.name, anchor="free", position=0.0, side="left", color='black'),
    yaxis4=y_axis_attributes(title=Cond_col.name, anchor="free", position=0.86, side="right", color='orange'),
    yaxis5=y_axis_attributes(title=pH_col.name, anchor="free", position=0.93, side="right", color='red'),
    yaxis6=y_axis_attributes(title=DeltaC_col.name, anchor="free", position=1.0, side="right", color='violet')
)
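
The position values above are fractions of the figure width: each extra y-axis takes 0.07 of the width, and the x-axis domain is narrowed so that the innermost axis on each side sits on its edge. A minimal sketch of that arithmetic follows; the helper name axis_layout and the default spacing are assumptions chosen to reproduce the numbers used in this article, not Plotly functions.

```python
def axis_layout(n_left, n_right, spacing=0.07):
    """Return (x_domain, left_positions, right_positions) as width fractions.

    Each side reserves `spacing` of the figure width per additional axis.
    The innermost axis on each side sits on the edge of the x-axis domain.
    """
    left_edge = round(spacing * (n_left - 1), 2)
    right_edge = round(1 - spacing * (n_right - 1), 2)
    # Positions are listed from the innermost axis to the outermost one
    left = [round(left_edge - spacing * i, 2) for i in range(n_left)]
    right = [round(right_edge + spacing * i, 2) for i in range(n_right)]
    return [left_edge, right_edge], left, right


# Three axes on each side, as in the chromatogram above
x_domain, left, right = axis_layout(3, 3)
print(x_domain, left, right)
```

With three axes per side, this gives the x-axis domain and the six position values passed to y_axis_attributes above. Adding a fourth axis on one side only requires changing the corresponding count.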

Next, we add a title and size to the plot.

# Update layout properties and plot title
fig.update_layout(title_text="Python-Based Process Chromatogram", title_x=0.5, width=950, height=700)

Next, we update the legend.

# Update position and orientation of legends
fig.update_layout(legend=dict(yanchor="top",y=1.1, xanchor="left", x=0.01, orientation='h'), margin={'pad': 0})

The RangeSlider is a handy feature of Plotly that allows you to zoom in on a specific range of the plot based on the CV values. This is a useful tool that was not available in earlier methods, where you had to manually change the CV range in the code to zoom in on a particular area.

# Add range slider to plot
fig.update_layout(xaxis=dict(rangeslider=dict(visible=True),type="linear"))

The 'x unified' hovermode attribute is a useful plot component that creates a vertical marker to display all parameters at a single CV on the plot.

# Change hovermode to display a vertical marker with all parameters for a single x-axis value
fig.update_layout(hovermode="x unified")

We can also adjust the y-axis attributes.

# update y-axes attributes
# title_font replaces the deprecated titlefont attribute
fig.update_yaxes(tickfont=dict(size=12), title_font=dict(size=12))

Additionally, we can shade each process stage with a colored rectangle.

colors = ['red', 'green', 'blue', 'purple','orange', 'yellow', 'gray']

# each stage runs from its own start CV to the start CV of the next stage;
# the last stage has no following start value, so it ends at the final CV in the data
cv_starts = df_method_labels['CV'].values
for i, cv_start in enumerate(cv_starts):
    cv_end = cv_starts[i + 1] if i < len(cv_starts) - 1 else max(CV_col)
    fig.add_vrect(x0=cv_start,
                  x1=cv_end,
                  fillcolor=colors[i % len(colors)], opacity=0.2)
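
The interval logic behind the shaded stages can be checked on its own, without Plotly. A minimal sketch, assuming hypothetical stage start values: each stage runs from its start CV to the next stage's start, and the last stage ends at the final CV of the run. The function name stage_intervals is illustrative only.

```python
def stage_intervals(stage_starts, last_cv):
    """Pair each stage start with the next start; the final stage ends at last_cv."""
    ends = list(stage_starts[1:]) + [last_cv]
    return list(zip(stage_starts, ends))


# Hypothetical stage start values (in CV) and final CV of the run
starts = [0.0, 2.0, 5.0, 9.0]
intervals = stage_intervals(starts, last_cv=12.0)
print(intervals)  # [(0.0, 2.0), (2.0, 5.0), (5.0, 9.0), (9.0, 12.0)]
```

Each (x0, x1) pair corresponds to one add_vrect call, so the number of rectangles equals the number of stages.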

Finally, we save the plot as an HTML file to keep the interactive functionality.

fig.write_html("file.html")

This produces an HTML file; a snapshot of it is shown below.

Plotly-based process chromatogram

The HTML output is shared here.

While the Plotly approach might seem difficult to grasp at first, I find it the most convenient of the three for visualizing chromatography parameters. Its built-in modules and interactive output make creating and customizing plots straightforward, enabling efficient data analysis on the go.