

Peak Purity In an ideal chromatographic separation, all sample components would be isolated from each other and detected as fully resolved peaks. However, it is not uncommon to encounter situations where the separation is less than ideal, resulting in some degree of overlap. In extreme cases, what appears to be a single peak in fact contains two or more coeluting components: 

Figure 1  
Checking the purity of such peaks is critical in order to correctly interpret the results of the analysis. Hyphenated techniques, such as LC-DAD (diode array detector) and LC-MS (mass spectrometry detector), add a “spectral dimension” to the chromatographic separation, which can facilitate the interpretation of peak purity. Most Chromatography Data Systems (CDS) implement their own algorithms to calculate “peak purity indexes”, but the calculations involved and the interpretation of the results are often obscure. In this column, we attempt to shed some light on how a purity index is arrived at and how to make sense of it. 

Comparing Spectra The chromatogram obtained with a “traditional” HPLC or GC detector is the visual representation of detector response (for example, absorbance at a given wavelength) as a function of time: 

Figure 2  
This chromatogram is electronically stored and manipulated by the CDS in a two-column table of absorbance versus time: 



A photodiode array detector can measure absorbance at multiple wavelengths (sometimes called channels) simultaneously, which generates a response surface rather than a flat curve. The resulting “spectrochromatogram” shows sample absorbance (z axis) at each time (x axis) and wavelength (y axis): 

Figure 3  
Slicing the spectrochromatogram along a specific wavelength (e.g. 215nm) displays the chromatogram as it would be seen at that wavelength (sometimes called an “extracted” chromatogram, Figure 2). Slicing at a specific time (e.g. at the retention time of the second peak) produces the absorbance spectrum recorded at that point in time (Figure 4): 

Figure 4  
Each peak in the spectrochromatogram is therefore made up of a succession of spectra collected at constant time intervals:  
Figure 5  
Comparing spectra by successively “double-clicking” on different parts of the peak is probably the most common method to discover potential impurities. This method, however, is time-consuming and prone to error, as it relies on the subjective inspection of each spectrum.  
Data Tables It is often said that the large amount of data produced by a spectrochromatogram is stored in a “three-dimensional” matrix. This is misleading, and probably contributes to the confusion surrounding peak purity algorithms. Absorbance readings from a spectrochromatogram are in reality stored in a larger version of the single-wavelength table: one row per time point, and one absorbance column per channel: 



Mathematicians would call such a table a Matrix, and mathematical calculations involving matrices belong in the realm of Matrix Algebra. 

What is the Matrix? A matrix is a collection of numbers ordered by rows and columns. The elements of a matrix are enclosed in parentheses, brackets, or braces. Matrices are often represented by capital (sometimes bold) letters, and the elements in a matrix are identified by subscripts (row, column):
$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}$$
A vector is a special type of matrix that has only one row (called a row vector) or one column (called a column vector). Vectors are represented by lower-case letters with an arrow, or in bold:
$$\vec{a} = \mathbf{a} = (a_{1}, a_{2}, \dots, a_{n})$$
Matrices provide a convenient tool to manipulate large amounts of data. If the whole of the spectrochromatogram is represented by a matrix, then the extracted chromatograms are represented by its column vectors and the spectra are represented by its row vectors. Most of the discussion that follows relates to these row vectors and their mathematical manipulation. 
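The row and column view described above can be sketched in a few lines of Python. This is only an illustration: the absorbance values, the variable names and the helper functions `spectrum_at` and `chromatogram_at` are invented for the example and do not come from any CDS.

```python
# Hypothetical spectrochromatogram: 4 time points x 3 wavelengths (mAU).
# All numbers are invented for illustration.
wavelengths_nm = [200, 240, 280]
times_min = [4.8, 4.9, 5.0, 5.1]
absorbance = [
    [10.0, 5.0, 1.0],   # spectrum recorded at 4.8 min
    [40.0, 20.0, 4.0],  # spectrum recorded at 4.9 min
    [80.0, 40.0, 8.0],  # spectrum recorded at 5.0 min (apex)
    [40.0, 20.0, 4.0],  # spectrum recorded at 5.1 min
]


def spectrum_at(matrix, row_index):
    """Row vector: absorbance at every wavelength for one time point."""
    return list(matrix[row_index])


def chromatogram_at(matrix, column_index):
    """Column vector: absorbance at one wavelength for every time point."""
    return [row[column_index] for row in matrix]


apex_spectrum = spectrum_at(absorbance, times_min.index(5.0))
trace_240 = chromatogram_at(absorbance, wavelengths_nm.index(240))

print(apex_spectrum)  # [80.0, 40.0, 8.0]
print(trace_240)      # [5.0, 20.0, 40.0, 20.0]
```

Each row is one spectrum and each column is one extracted chromatogram, which is all the “matrix” amounts to in practice.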

Figure 6  
The row vectors in our spectrochromatogram simply extend this concept to higher dimensions, one for each wavelength being measured. Each element in the vector is the absorbance measured at a specific wavelength (Figure 7), and in that way each vector represents a unique spectrum: 

Figure 7  
Two spectra can differ in shape (some wavelengths will have zero absorbance, others will have a positive absorbance), as well as in intensity (spectra at the leading and trailing edges of a peak tend to have low intensities, whereas spectra close to the apex have high intensities). When translated into vectors, differences in shape result in vectors with different “directions” and differences in intensity result in “long” or “short” vectors. All peak purity algorithms are based on the same principle: comparing several spectral vectors across a chromatographic peak, and determining how similar they are to each other. Matrix algebra is used to perform these comparisons. The spectrum at the apex of the peak is often assumed to be that of the pure compound, and thus is taken as the reference against which all other spectra are compared. 

Comparing Spectra the Smart Way Since any vector describes a direction, one can define an angle (called θ in Figure 6) between any pair of vectors. If the two vectors point in the same direction, the angle between them will be zero and the cosine of the angle, cos θ, will be one. The greater the difference between the vectors, the greater the angle, and the lower the cosine. If two vectors are perpendicular, the angle defined between them is 90 degrees and the cosine is 0. Thanks to this simple principle, the cosine of the angle between row vectors provides the simplest way to quantify the “similarity” between any two spectra. This is the technique used by Shimadzu Class VP software in its peak purity algorithms. The cosine between two vectors is easy to calculate in matrix algebra:
$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}$$
The symbol $\lVert\mathbf{a}\rVert$ is called “norm”; it is a scalar (a number) equal to the length of the vector, and is calculated as:
$$\lVert\mathbf{a}\rVert = \sqrt{a_{1}^{2} + a_{2}^{2} + \dots + a_{n}^{2}} = \sqrt{\sum_{i=1}^{n} a_{i}^{2}}$$
The symbol $\cdot$ is called “dot product”, or “scalar product” of two vectors. The result is a scalar, calculated as:
$$\mathbf{a}\cdot\mathbf{b} = a_{1}b_{1} + a_{2}b_{2} + \dots + a_{n}b_{n} = \sum_{i=1}^{n} a_{i}b_{i}$$
The expanded expression of cos θ is therefore: 
$$\cos\theta = \frac{\sum_{i=1}^{n} a_{i}b_{i}}{\sqrt{\sum_{i=1}^{n} a_{i}^{2}}\,\sqrt{\sum_{i=1}^{n} b_{i}^{2}}}$$
The cosine of the angle between two vectors is therefore the scalar product of the two vectors divided by the product of their lengths. Dividing by the lengths is important as it normalises the vectors, making their length equal to 1. Whether the vectors are “long” or “short” has no effect on the final result: the only important factor is the difference in “direction”. In spectral terms, differences in intensity cancel out and only differences in “shape” are considered. As an example, the following diagram shows two spectra taken from the same peak, at 4.6 minutes (leading front) and 5.0 minutes (apex). Although the two spectra are vastly different in intensity, their similarity index (SI) demonstrates that they are identical in shape: 
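As a minimal sketch of this calculation, the similarity index can be written directly from the definition of the cosine. The three-wavelength spectra below are invented for illustration, and the function names are our own rather than those of any vendor's software.

```python
import math


def dot(a, b):
    """Scalar (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Length of a vector."""
    return math.sqrt(dot(a, a))


def similarity_index(a, b):
    """Cosine of the angle between two spectra."""
    length_product = norm(a) * norm(b)
    if length_product == 0.0:
        raise ValueError("similarity is undefined for an all-zero spectrum")
    return dot(a, b) / length_product


apex = [80.0, 40.0, 8.0]    # intense spectrum at the apex
front = [8.0, 4.0, 0.8]     # same shape, ten times weaker
other = [5.0, 30.0, 60.0]   # a differently shaped spectrum

si_same_shape = similarity_index(apex, front)
si_other_shape = similarity_index(apex, other)
print(round(si_same_shape, 4))
print(round(si_other_shape, 4))
```

The first pair differs only in intensity and so gives a similarity of 1 to within rounding, while the second pair gives a much lower value.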
Figure 8 
The two spectra below were taken from the apex of two different peaks, at 5.0 minutes and 7.0 minutes. The difference in shape is obvious, and this is reflected in the similarity index: 
Figure 9 
Waters Empower goes one step further, and reports the angle (in degrees) between the two vectors rather than the cosine. This provides a very intuitive measure of their degree of similarity. When calculating peak purity, this angle is called “purity angle”. The same technique is used to calculate spectral library matching, in which context the resulting angle is called “spectral contrast angle”. 
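Converting the cosine into an angle is a one-line step. The sketch below shows only that conversion for a single pair of vectors; it is not a description of how Empower derives the purity angle it reports, and the function name `contrast_angle_degrees` is our own.

```python
import math


def contrast_angle_degrees(a, b):
    """Angle in degrees between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = dot / (norm_a * norm_b)
    # Guard against rounding pushing the cosine just outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


print(contrast_angle_degrees([3.0, 4.0], [6.0, 8.0]))  # same direction
print(contrast_angle_degrees([1.0, 0.0], [1.0, 1.0]))  # about 45 degrees
print(contrast_angle_degrees([1.0, 0.0], [0.0, 1.0]))  # perpendicular
```

An angle of 0 degrees corresponds to identical shapes and 90 degrees to spectra with nothing in common, which is why the angle is often easier to read than a cosine very close to 1.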
Peak Purity Plots A peak purity plot represents graphically how the similarity index varies across the peak. For an ideal, pure peak, every spectral comparison would yield a perfect similarity index, resulting in a horizontal line at SI=1.0000 across the whole peak: 
Figure 10 
If the peak is impure, however, the similarity index will drop below 1 in the region of the peak that contains the impurity. For example, the following figure shows the purity plot for a main peak (t_{R}=5.0min) with two impurities at 4.7min and 5.3min. Although the impurities are only at 1% of the main peak, the purity plot clearly shows their presence: 
Figure 11 
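A purity plot of this kind can be simulated with very little code. The sketch below assumes Gaussian peak profiles, three wavelengths, and invented spectra for the main compound and a single impurity at 1% of the main peak height; none of these values come from the figures in this column.

```python
import math


def gaussian(t, centre, width):
    """Unit-height Gaussian peak profile."""
    return math.exp(-0.5 * ((t - centre) / width) ** 2)


def similarity_index(a, b):
    """Cosine of the angle between two spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


MAIN_SPECTRUM = [1.0, 0.6, 0.1]       # invented spectrum, main compound
IMPURITY_SPECTRUM = [0.2, 0.3, 1.0]   # invented spectrum, impurity


def spectrum_at(t, impurity_fraction):
    """Simulated spectrum at time t: main peak at 5.0 min, impurity at 5.3 min."""
    main = gaussian(t, 5.0, 0.1)
    impurity = impurity_fraction * gaussian(t, 5.3, 0.1)
    return [main * m + impurity * i
            for m, i in zip(MAIN_SPECTRUM, IMPURITY_SPECTRUM)]


def purity_trace(times, impurity_fraction):
    """Similarity of every spectrum to the apex spectrum (the reference)."""
    reference = spectrum_at(5.0, impurity_fraction)
    return [similarity_index(spectrum_at(t, impurity_fraction), reference)
            for t in times]


times = [round(4.7 + 0.05 * k, 2) for k in range(13)]  # 4.70 to 5.30 min
pure_trace = purity_trace(times, 0.0)
impure_trace = purity_trace(times, 0.01)

for t, si in zip(times, impure_trace):
    print(f"{t:.2f} min  SI = {si:.4f}")
```

With no impurity the trace stays at 1 across the peak; with the 1% impurity it stays close to 1 near the apex and falls away on the tail, where the impurity makes up a growing share of the signal.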
By convention, most commercial algorithms plot a dissimilarity value (such as sin θ or 1 − cos θ) versus time, so that lower values (close to zero) indicate high purity, and increasing values reflect the presence of impurities. The chromatographic peak profile is often overlaid for greater clarity. Real purity plots, even for chemically pure peaks, never have a constant value across the whole peak. The reason is the random noise that accompanies every empirical measurement. This noise may be caused by unstable lamp output, absorption from mobile phase components, thermal fluctuations, electronic noise, etc. At the apex of the peak, where absorbance is intense, the effect of noise is insignificant and SI reflects the true purity of the peak. At the edges of the peak, however, absorbance is much weaker and the contribution of noise is no longer negligible. Noise introduces significant differences between edge spectra and the reference spectrum and, as a result, the apparent impurity level increases towards the front and tail of the peak. For this reason, it is necessary to assess whether a departure from perfect similarity is due to noise or to genuine spectral differences. Many algorithms show a noise threshold curve to help interpret SI: if the purity curve is below the noise threshold, the peak is assumed pure, whereas an impurity is suspected in those sections where the purity curve is above the noise threshold. Figure 12 shows evidence of an impurity at the tail of the peak, where the purity curve (blue trace) rises above the noise threshold (red trace, noise level of 0.1mAU RMS). The chromatographic peak is shown in grey: 
Figure 12 
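The effect of noise on weak spectra can be illustrated without reference to any vendor's threshold calculation, which is not described here. The sketch below adds one fixed, invented noise pattern to a pure spectrum at decreasing intensities and reports the dissimilarity 1 − cos θ against the noise-free shape; real noise is random, so this is a deliberately simplified, repeatable stand-in.

```python
import math


def similarity_index(a, b):
    """Cosine of the angle between two spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


PURE_SHAPE = [1.0, 0.6, 0.1]    # invented spectrum of a pure compound
NOISE = [0.05, -0.05, 0.05]     # one fixed, invented noise pattern (mAU)


def dissimilarity(intensity):
    """1 - cos(theta) between a noisy spectrum and the noise-free shape."""
    measured = [intensity * s + n for s, n in zip(PURE_SHAPE, NOISE)]
    return 1.0 - similarity_index(measured, PURE_SHAPE)


for intensity in (100.0, 10.0, 1.0, 0.1):
    print(f"intensity {intensity:6.1f} mAU  dissimilarity {dissimilarity(intensity):.6f}")
```

The same noise that is invisible at 100 mAU produces a large apparent dissimilarity at 0.1 mAU, even though the compound is pure throughout. This is the behaviour a noise threshold curve is meant to account for.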
The noise threshold curve is calculated from real noise measurements (noise reference), taken from a defined section of the chromatogram. For isocratic methods, this is typically the first few seconds of the chromatographic baseline. For gradient methods, where the spectral makeup of the baseline changes over time, noise references should be taken as close to the peak of interest as possible. CDS often offer the possibility to measure noise references at the baseline before the peak, after the peak, or both. The option to manually select noise reference spectra is normally available as well. 
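Regression Analysis Some software packages, like Agilent’s ChemStation, use least squares regression to calculate the correlation between pairs of vectors. The similarity factor is defined in ChemStation as:
$$SF = 1000 \times r^{2}$$
where r^{2} is the coefficient of determination, calculated from:
$$r = \frac{\sum_{i=1}^{n} (a_{i} - \bar{a})(b_{i} - \bar{b})}{\sqrt{\sum_{i=1}^{n} (a_{i} - \bar{a})^{2}\,\sum_{i=1}^{n} (b_{i} - \bar{b})^{2}}}$$
Here a_{i} and b_{i} are the absorbances of the two spectra at wavelength i, and \bar{a} and \bar{b} are their mean values. This quantity is different from cos θ, but its interpretation is similar: highly correlated vectors will yield values of r^{2} close to one, whereas very different vectors will result in a lower coefficient of determination. 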
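A regression-based similarity factor can be sketched as follows. The code assumes that the factor is 1000 × r², with r the ordinary (mean-centred) correlation coefficient between the two spectra, which is consistent with the 0 to 1000 scale used for SF in this column. The function name and the five-point spectra are invented for illustration.

```python
def similarity_factor(a, b):
    """1000 * r^2, where r is the correlation coefficient of two spectra.

    Assumption: r is the usual mean-centred (Pearson) correlation.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    variance_a = sum((x - mean_a) ** 2 for x in a)
    variance_b = sum((y - mean_b) ** 2 for y in b)
    if variance_a == 0.0 or variance_b == 0.0:
        raise ValueError("correlation is undefined for a flat spectrum")
    r_squared = covariance ** 2 / (variance_a * variance_b)
    return 1000.0 * r_squared


reference = [1.0, 0.8, 0.5, 0.2, 0.1]
scaled_copy = [2.0, 1.6, 1.0, 0.4, 0.2]   # same shape, twice the intensity
different = [0.3, 0.9, 0.4, 0.8, 0.2]     # unrelated shape

sf_same = similarity_factor(reference, scaled_copy)
sf_different = similarity_factor(reference, different)
print(round(sf_same, 1))
print(round(sf_different, 1))
```

Because the spectra are mean-centred before they are compared, a constant offset added to one spectrum does not change r, which is one practical difference from the cosine approach. Note also that squaring r discards its sign, so strongly anti-correlated spectra would score high as well.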
According to Agilent, SF values greater than 995 indicate very similar spectra, whereas values between 900 and 990 indicate significant differences that should be studied in more detail. The software generates purity plots (called similarity curves) very similar to Figure 12. ChemStation, however, offers a number of useful additional ways to plot similarity factors:
The purity ratio plot is divided into two horizontal bands: a green band (below 1) indicating high purity, and a red band (above 1) suggesting the presence of impurities (the two bands are simulated in Figure 16 by green and red-coloured symbols). Notice that this mode produces output similar to that of the similarity-to-threshold ratio mode, but it enhances fine details at high purity. 
Comparing Wavelengths A very simple, but less powerful, tool to assess peak purity is provided by many CDS. By selecting two or more different wavelengths, it is possible to overlay the respective chromatographic signals and therefore inspect how well they correlate. Figure 17 shows extracted chromatograms at 200nm, 240nm, and 280nm. All three traces overlap exactly for the peak at 2.0min, indicating that this peak is pure. However, the profiles of the three wavelengths do not overlap exactly for the peak at 5.0min, evidence that this is not a pure peak: 
Figure 17 
The effectiveness of this technique, however, depends on a careful choice of wavelengths. Figure 18 shows the same pair of peaks, this time at 200nm, 220nm and 300nm. This choice of wavelengths, unfortunately, does not help elucidate the purity of the peak at 5.0 minutes: the trace at 220nm shows a perfect alignment of the peak profiles, evidence that only the main compound absorbs at this wavelength whilst the impurity remains invisible. At 300nm, neither the main compound nor the impurity exhibits enough absorbance to provide conclusive results. 
Figure 18 
Overlaying two or more wavelengths, therefore, has a fundamental limitation: it can prove that an impurity exists if the peak profiles appear misaligned, as in Figure 17, but it cannot prove conclusively the opposite (that is, that the peak is pure) as demonstrated in Figure 18, unless all the wavelengths in the acquisition range were explored. 
Ratiograms Agilent’s ChemStation and Dionex’s Chromeleon provide yet another simple technique to evaluate peak purity: the so-called ratiogram or peak ratio. The method consists in extracting two wavelengths from the spectrochromatogram and plotting their absorbance ratio against time. For a pure peak, the absorbance ratio will be constant and the ratiogram will result in a horizontal line across the peak. For peaks with impurities, the absorbance ratio will change across the width of the peak, resulting in a curve (Figure 19): 
Figure 19 
The same careful choice of wavelengths as above is required for ratiograms, as some wavelength pairs may not reveal the presence of impurities. A preliminary inspection of peak spectra to visually identify the best candidate wavelengths is the most common approach. 
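Both the ratiogram and its dependence on wavelength choice can be illustrated with a small simulation. The sketch below assumes Gaussian peak profiles and invented spectra in which the impurity does not absorb at 220nm or 240nm; the names and numbers are hypothetical and are not taken from ChemStation or Chromeleon.

```python
import math


def gaussian(t, centre, width):
    """Unit-height Gaussian peak profile."""
    return math.exp(-0.5 * ((t - centre) / width) ** 2)


# Invented relative absorbances at four wavelengths (nm).
MAIN = {200: 1.0, 220: 0.6, 240: 0.3, 280: 0.1}
IMPURITY = {200: 0.5, 220: 0.0, 240: 0.0, 280: 1.0}


def absorbance(t, wavelength_nm):
    """Simulated signal: main peak at 5.0 min, 5% impurity at 5.2 min."""
    main = gaussian(t, 5.0, 0.1)
    impurity = 0.05 * gaussian(t, 5.2, 0.1)
    return main * MAIN[wavelength_nm] + impurity * IMPURITY[wavelength_nm]


def ratiogram(times, numerator_nm, denominator_nm):
    """Absorbance ratio of two wavelengths at each time point."""
    return [absorbance(t, numerator_nm) / absorbance(t, denominator_nm)
            for t in times]


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


times = [round(4.8 + 0.05 * k, 2) for k in range(9)]  # 4.80 to 5.20 min
revealing = ratiogram(times, 280, 220)  # impurity absorbs at 280 nm only
blind = ratiogram(times, 220, 240)      # impurity absorbs at neither

print(f"280/220 ratio varies by {spread(revealing):.3f}")
print(f"220/240 ratio varies by {spread(blind):.3f}")
```

The 280/220 ratio changes markedly across the peak, whereas the 220/240 ratio stays flat even though the impurity is present, which is the wavelength-choice pitfall described above. The example restricts the time range to the peak itself; on a real baseline the denominator approaches zero and the ratio becomes meaningless.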
Conclusion CDS developers offer a variety of techniques to evaluate peak purity, by making use of the spectral information obtained from a diode array detector. Although some of them are very sophisticated, they are all based on simple mathematical principles, and a certain amount of computational power. When several techniques are offered by the same software, choosing one algorithm over another can be a daunting task. As ever, it is good practice to compare the results obtained from different methods and choose the one that is easiest to interpret; but a good understanding of the underlying mathematics is essential. We hope that this column makes that understanding easier. Finally, the human eye is superbly adept at identifying patterns and differences. Investing a little time in visually exploring the data will often result in better decisions. 