PHYS291-project: Cluster size along the track

Ingrid Marie Stuen, xij007

June 2019

Project description

The objective of this project is to study how the cluster size changes along the path of carbon ions hitting the ALPIDE-chip from the side. To do this, the data will be visualized in hit maps. Based on the hit maps, a method for separating the tracks will be developed. Once the tracks are separated, their width and length can be visualized in a 2D histogram.


In conjunction with the planned proton center at Haukeland University Hospital (HUH), there is an ongoing project at the university, in collaboration with HUH and Western Norway University of Applied Sciences, to develop a proton CT. This proton CT will serve as a final check in the treatment room to verify that the stopping power of the tissue is as expected. For the proton CT to work, it needs to detect the direction of the particles as they enter the body, and both the direction and the energy of the particles as they exit the body. A detector that might be used for this is the ALPIDE-chip.

The ALPIDE-chip is a CMOS Monolithic Active Pixel Sensor developed at CERN to detect particles in the ALICE experiment. The chip has 512 x 1024 pixels and a total area of 15 mm x 30 mm. To track the particles, one could use several layers of ALPIDE-chips oriented so that the particle beam is perpendicular to the surface of the chips. With absorbers placed between the layers, the energy of a particle could then be determined from the layer in which it stops. Another setup could have the particles enter the chip from the side. One would still be able to track the particles, and it might give a better energy resolution.

The data sets used in this project were collected in Heidelberg. A carbon beam was used, hitting a single chip from the side. The data sets vary in size. The data consist of .dat files with four columns: x-value, y-value, event ID and number of hits in that event ID.


I started out by reviewing some of the data files. I found that some lines did not seem to contain any actual hits (number of hits = 0), and that for some event IDs there were many more lines than the stated number of hits in that event ID. In these cases the number of hits was mostly below 10, so I decided to ignore all lines with a number of hits below 10 in my scripts.
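The filtering step can be sketched in plain C++ without ROOT. This is a minimal sketch and not the code from the scripts: it assumes that the four columns are whitespace-separated integers in the order x, y, event ID, number of hits, and the names Hit, readHits and minHits are made up for the illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One row of the .dat file (assumed column order: x, y, event ID, number of hits).
struct Hit {
    int x;
    int y;
    long eventId;
    int nHits;
};

// Reads rows from a stream and keeps those with nHits >= minHits.
// Rows that cannot be parsed as four integers are skipped.
std::vector<Hit> readHits(std::istream& in, int minHits = 10) {
    assert(minHits >= 0);
    std::vector<Hit> hits;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        Hit h{};
        if (!(row >> h.x >> h.y >> h.eventId >> h.nHits)) continue;  // malformed row
        if (h.nHits < minHits) continue;                             // suspicious row
        hits.push_back(h);
    }
    return hits;
}
```

Passing an std::ifstream opened on one of the .dat files to readHits would give the list of hits that the later steps work on.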

The first script (Hitmap.C) is simple: it reads a data file and fills all of the x- and y-values (except those on lines with a number of hits below 10) into a 2D histogram, a hitmap. I also made modified versions of this script that only look at the hits in a single event ID. The objective was to see how many particle tracks there were in a single event, and to get an idea of how I could separate the tracks.

The next script (Clustering.C) separates the tracks in a single event ID. To visualize the data, it makes a hitmap of the hits in the given event. To separate the tracks, all of the y-values in the event are filled into a 1D histogram, and the content of each y-bin is then copied into a vector. The code runs through this vector and compares each value with the next one. If the value drops, the y-position of the value is stored in a new vector. A single track can produce several such positions, so the code then goes through the stored positions and checks how far apart they are. While consecutive positions are close, they are added together and a counter keeps track of how many have been added. When there is a big jump in y-position, the sum is divided by the counter and the result is stored in a separate vector, after which the sum and the counter are reset to 0. The result is a vector that hopefully contains only values close to the center y-value of each track. To check whether these values match the data, the script prints them as output.
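The steps above can be sketched as follows. This is only an illustration under several assumptions, not the content of Clustering.C: a std::vector stands in for the ROOT histogram, the function names are made up, the threshold for a "big jump" (maxGap, here 10 pixels) is a guess, and the last group of positions is averaged when the end of the vector is reached.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the hits in each y-row; y-values outside [0, nRows) are ignored.
std::vector<int> yProfile(const std::vector<int>& ys, int nRows = 1024) {
    assert(nRows > 0);
    std::vector<int> counts(nRows, 0);
    for (int y : ys) {
        if (y >= 0 && y < nRows) ++counts[y];
    }
    return counts;
}

// Returns every y-position where the count drops from y to y + 1.
std::vector<int> dropPositions(const std::vector<int>& counts) {
    std::vector<int> drops;
    for (std::size_t y = 0; y + 1 < counts.size(); ++y) {
        if (counts[y] > counts[y + 1]) drops.push_back(static_cast<int>(y));
    }
    return drops;
}

// Averages runs of drop positions whose neighbours are at most maxGap apart.
// maxGap is an assumed value; the source only speaks of a "big jump".
std::vector<int> trackCenters(const std::vector<int>& drops, int maxGap = 10) {
    std::vector<int> centers;
    long sum = 0;
    int count = 0;
    for (std::size_t i = 0; i < drops.size(); ++i) {
        sum += drops[i];
        ++count;
        const bool last = (i + 1 == drops.size());
        if (last || drops[i + 1] - drops[i] > maxGap) {
            centers.push_back(static_cast<int>(sum / count));  // integer mean of the run
            sum = 0;
            count = 0;
        }
    }
    return centers;
}
```

For example, an event with hits at y = 100, 101, 102 and at y = 300, 301 gives drops at 101, 102, 300 and 301 (when the counts fall off after the peak of each track), which are merged into the two centers 101 and 300.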

The final script (TrackShape.C) applies the method from Clustering.C to every event ID in the data file. After finding the y-positions of the track centers, it runs through all of the x- and y-values in the event ID for each of the track centers. If a y-value is within a radius of 20 pixels from the center, the y-value and the corresponding x-value are stored temporarily. These values are then filled into the "Track Shape" histogram. Filling starts at x=0, and x is increased by 1 each time there is a new x-value in the x-vector. The number of y-values with the same x-value is drawn in the y-direction, centered at 0. This gives an idea of how the tracks change in width (cluster size) along the length of the track.
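The filling of the "Track Shape" histogram can be illustrated with the sketch below. It is not taken from TrackShape.C and rests on assumptions: the names Pixel, widthProfile and shapeEntries are made up, "within a radius of 20 pixels" is read as |y - center| <= 20, the x-values are processed in ascending order, and a width w is centered by using the offsets -w/2 up to w - 1 - w/2 (integer division).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

struct Pixel {
    int x;
    int y;
};

// Number of hits near centerY for each distinct x-value, in ascending x order.
// Element i is the width of the track at step i along the track.
std::vector<int> widthProfile(const std::vector<Pixel>& hits, int centerY, int radius = 20) {
    assert(radius >= 0);
    std::map<int, int> hitsPerColumn;  // x -> number of hits close to the center
    for (const Pixel& p : hits) {
        if (std::abs(p.y - centerY) <= radius) ++hitsPerColumn[p.x];
    }
    std::vector<int> widths;
    for (const auto& column : hitsPerColumn) widths.push_back(column.second);
    return widths;
}

// Expands the widths into (step, offset) pairs with the offsets centered at 0.
std::vector<std::pair<int, int>> shapeEntries(const std::vector<int>& widths) {
    std::vector<std::pair<int, int>> entries;
    for (std::size_t step = 0; step < widths.size(); ++step) {
        const int w = widths[step];
        for (int k = 0; k < w; ++k) {
            entries.emplace_back(static_cast<int>(step), k - w / 2);
        }
    }
    return entries;
}
```

In a ROOT macro, each (step, offset) pair would correspond to one fill of the 2D histogram. Note that gaps in x are collapsed in this sketch, since the step only increases when a new x-value is present.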


You can find the .tar file with all of the scripts and data files here: Project_xij007.tar.gz. The individual scripts are linked below:

An example of the output from Hitmap.C is shown in Fig.1.

Fig.1: Output from Hitmap.C when run for the file "singEvt_141218_23483_EVENTS".

An example of the output from Clustering.C is shown in Fig.2. The cluster centers given as output are 258, 348, 424, 534, 629, 781, 798 and 982. These values seem to fit the data quite well.

Fig.2: Output from Clustering.C when run for event ID 18382 in the file "singEvt_141218_23483_EVENTS". The cluster centers given as output are 258, 348, 424, 534, 629, 781, 798 and 982.

An example of the output from TrackShape.C is shown in Fig.3. Comparing the number of entries in the two histograms, it seems that a lot of data is "lost" in the clustering process. The "Track Shape" histogram shows that the majority of the tracks are shorter than 150 pixels. It also looks like most of the tracks stay narrower than 10 pixels and get narrower towards the end of the track.

Fig.3: Output of TrackShape.C run for "singEvt_151218_0222_EVENTS".


The final script does approximately what I intended. The biggest weakness is probably the clustering, that is, the separation of the tracks. Tracks might overlap, and sometimes it is hard to tell a track from random noise. My method for finding track centers seems to work reasonably well, but it is far from perfect. Another issue is that the particles appear to have entered the chip at x=512 and moved towards x=0, whereas my script reads from x=0 and up.