Phys291 - Data handling in physics

Project introduction

Last semester I wrote a paper in the subject faststofffysikk (PHYS208) about the use of neutron diffraction in archeaology[1]. So when the opportunity to make a program about something physics related I thought it would be interesting to make it about automating the analysis of diffraction data. Specifically the finding of peaks in the data and further processing of that information including plotting using ROOT.



Crystallography is the experimental studie within solid state physics of crystall structure. Where a crystall structure is a repetativ and ordered configuration of atoms. This is in contrast to a non crystall structur (amorphous) solid that is not ordered.

One of the tools used in crystallography is radiation diffraction. There are primarly three types of radiation used: x-rays, electrons and neutrons. Similar to the the known double-slit experiment radiation sent at the crystall structure will either contructiv or destructiv interfer with it. We can see at what angles the radiation interfers constructively and use this to interpret the structure of the materiall studied.

The constructiv interferens will give peaks in the data that make it possible to determin the d spacing in the crystall structur from Braggs law. Bragg diffraction happens when the wavelength of the radiation is close to the value of the d spacing.

Diffrent angels will give diffrent d values. This is because the orientation of the crystall compared to the incoming radiation has changed. Depending on the crystal the n number might only be 1 or 1 and higher numbers.

The data from crystal diffraction can be used to more than just finding the d spacing of the crystal. The information we get from diffraction is highly useful in both physics and chemistry as its and over lapping field of study. It includes information on the bonds in the crystall, identifying composition of the sample and more[4]. The program described under have implemented d spacing calculations from peaks of the data and a comparison of peaks that can be used in identification of crystal elements in samples.

The Code

Here is my code r_andp.C and the files from the webpage[2] used in the example under CORUNDUM.PRN and CPD-1A.PRN.

The code is made such that given there is data files from a diffraction experiment, in the same folder as the code, the user can get the peaks from it. It can also plot the data in a graph and compare the peaks from another file. 

The program has the function r_and_p() as its main function (naming is old for read and print, when I started to code). It connects all the other functions by first reading a file of two columns of data into arrays that can be passed on. It counts the number of lines in the file as it is needed using the function count_lines(). The user has then options of finding the peaks for the file, graphing the data, comparing two "peak files" or making the program read a diffrent file that then can be processed.

The function graph simply uses the information taken from the files and graphs it using ROOT.

The d spacing calculations (in writer()) have been hard coded for the files given from the webpage of the International Union of Crystallography[2]. Changing it to take wavelength from user or other means would not be a problem.

The algorithm to find peaks (peaks()) works by looping through the intensity data and checking if its above a set limit. The limit is set by the user and is given by: mean+standarddiv*f, where f is a floating point value set by user. If the cell the loop is looking at is above the limit it will set value and index to that cells value and index. If the next cell is of larger value then the value and index will be updated to that value and index. If the next value is lower the value and index variable will not be updated and the next iteration of the loop will start. If the value from the array is under the limit then the index is stored in an array and the index and value variables are reset to 0. Then the algorithm waits for another jump above the limit in data and repeats the steps. The algorithm has some limitations that will be discussed later. The function writer() takes the data and calulates the d values and writes it all in a file. This function is made so there is less to read and hopefully makes the code more readable.

The peakfinding was inspired by a webpage containing diffrent kinds of peak detecting algorithms[3]. It was then modified for the c language and use of array instead of heap. The choice of limit to be mean+x*std is so that the limit scales with the data given. It could easily be altared to take any number as limit but is very convenient when working with many files with diffrent data. The choice to add one standard diviation will give most of the peaks and no peaks from noice.

The graph now also has an extended version available, where the peaks are indecate and the graph is saved.I want to thank Ladislav with the assistance on the extension.

An example

Lets have an example run of the program.

The sample CPD-1A.PRN contains the minerals Corundum, Fluorite and Zincite so lets compare it to Corundum. Start of the program we give the file name Corundum,to the program and press p at the next prompt to find the peaks of the file. We set the limit to 1(*std+mean)and write the name of the file we want to create and we get the file shown under, see Fig3. So to compare them we must do the same for the CORUNDUM.PRN file. This can be done with either changing file by pressing 'n' by the main prompt or restarting the program. Lets assume this is done and we get the file in Fig4.

Further we press 'c' at the main prompt to compare the files we have generated. It will first ask for the cut of distance from the peak that is wanted. From some testing it seems that the peaks can move with a couple or so angels. Lets choose 5 to get most of them. Then we write the names we chose for the peak files and we get the result printed out under in Fig5.

Then with this data we might like to se the plot to see if this is reasonable and if we missed any peaks. We are returned to the main prompt again so we chose 'g' and it will work for a second and return us again. We write 'e' to end the program and it displays the graph under to us,see Fig6 for the sample and Fig7 for Corundum.

And we may choose to use the extended feature that indecates peaks and saves the graph. See Fig8.


The peakfinding algorithm used has two flaws as i can see.

  • Peaks lost because they are under limit.
  • Peaks lost beacuse they are over limit but not under again before another peak.

The first points effect has been limited because the user, that knows the data, can set the limit. If they suspect there is points lost the limit can be lowered or raised. A look at the graph can also be used by the user to make decisions on the limit.

The second point is more tricky. The algorithm "triggers" a new peak if its above the limit and goes below it again. If the data does not go below the limit again before another peak, the highest peak will only be recorded. See fig9 and fig10.

These two problems are not that big in diffraction data as the peaks are more often then not seperated and large in value.

Small comment on data type choice. All floating-point types are double that is because I though d values would need the extra decimals. It became a habit and almost all double could be set to float. This would use less storage space.