Project for PHYS291

By Sivert Olsen Loevaas


I wanted to make a project familiarizing myself with the methods needed for, and useful in, analyzing CERN Open Data. For this purpose, CMS Open Data has been used. All the coding has been done inside a virtual machine provided by CMS Open Data, as this makes accessing the data easier. Data from CMS Run A of 2011 has been used, and "global muons" have been analyzed.

Setting up the Virtual Machine

As instructed on the "CMS 2011 Virtual Machines: How to install" page, the program VirtualBox was installed on my local PC, and the CMS-specific virtual machine (hereafter VM) was downloaded and set up. To set up the working environment, the command
cmsrel CMSSW_5_3_32
creates the directory CMSSW_5_3_32 with several subdirectories, some of which will be used by the analyzers as they compile. One of these, CMSSW_5_3_32/src, is where we will work from here on.
To prepare for any analysis (that is, every time we open a new CMS shell terminal), the command 'cmsenv' should be run; it sets up the runtime variables. The very first analysis to run in the VM is a demo analyzer, which also creates a so-called "skeleton" for the analyzers:
mkdir Demo
cd Demo
mkedanlzr DemoAnalyzer
The DemoAnalyzer directory now contains a simple analysis program that does not yet have a specified data source. The directory will look like this ('Tester' here being replaced by Demo):
Here we see BuildFile.xml, a configuration document for the 'scram b' command (more on scram later). Then there are several files that I have not touched, and that probably should remain untouched, including the whole python directory. The src directory, however, contains the main analyzer program in the C++ file (shown as TestAnalyzer in the picture above). This is the file where the methods for analyzing will be written. Lastly, there is the Python file, which is used as a configuration file. In this file we choose our data source and validation file, as well as possible settings such as the MessageLogger for the process.
I want to briefly note that the Tester/TestAnalyzer test worked better than expected, as the automatically generated analyzer was named according to the directory name used for mkedanlzr. This matters because the files made during 'scram b' could get tangled up, so to speak, if several analyzers with the same name were created under the same src directory. SCRAM is a configuration management tool [1]. The 'scram b' command builds the specific environment, making so-called packages and compiling, so that the analysis can run. If done successfully, the output should look something like this:

Now, for the first test analysis (the one provided as a validation run), one simply writes, being in src and having done cmsenv and scram b, cmsRun Demo/DemoAnalyzer/ and sees what happens. Notice that for the actual analysis we run the configuration file, not the analyzer file itself, though most of the relevant analysis setup is done in the src/ The result should look like this:
This does not produce any actual root file; it is just a check that the VM and the CMS shell terminal work.
Also important to notice: there are two different program icons in the CMS VM for entering a terminal. One of them should simply not be used when working with Open Data: cmsRun will not work, and you will waste time and get frustrated (from experience). Don't click the one at the bottom of the screen; click the one in the upper left corner!
This makes the terminal a CMS shell, not an "outer shell":

(This is barely noted in the CMS VM guide, and easily missed.)
Methods for Analysis
For clarity's sake, let's first take a look at the simple configuration file provided for the validation:
Most python config files will start with the line
import FWCore.ParameterSet.Config as cms
which imports our CMS-specific Python classes and functions [4].
The top-level item of a configuration program is a Process object. Each configuration program must have a Process object assigned to a variable named process [5]. Long story short, this is what puts together the configuration information for the cmsRun executable. The argument to the Process object is a name that gets carried along with the output data.

For the simple demo analyzer to work, the line containing file:myfile.root must be replaced with an actual root data file. For this project, this example has been used, with data from the first index list for CMS Run A of 2011:
CMS_Run2011A_DoubleMu_AOD_12Oct2013-v1_10000_file_index.txt [A]
It contains several root files, like this:
With the corresponding validation file, from CMS validated runs:
Cert_160404-180252_7TeV_ReRecoNov08_Collisions11_JSON.txt [B]
When using an index of data files instead of a single root data file, the 'fileNames = cms.untracked.vstring(...)' in process.source should be given an argument *referring to a FileUtils.loadListFromFile([PLACEMENT OF INDEX FILE]); for this example:
(first, we need:) import FWCore.Utilities.FileUtils as FileUtils
files2011data = FileUtils.loadListFromFile ('/home/cms-opendata/CMSSW_5_3_32/src/Demo/DemoAnalyzer/datasets/CMS_Run2011A_DoubleMu_AOD_12Oct2013-v1_10000_file_index.txt')
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(*files2011data)
)
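Outside the VM, loading such an index file is just reading one root-file path per line. A minimal Python sketch of what FileUtils.loadListFromFile does (my own reimplementation, not the CMSSW code):

```python
# Read a file-index list (one root-file path or URL per line) into a
# Python list, skipping blank lines.
def load_list_from_file(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# usage, with the index file placed next to the script:
# files2011data = load_list_from_file("CMS_Run2011A_DoubleMu_AOD_12Oct2013-v1_10000_file_index.txt")
```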
Then we need a certification file for this dataset (without it, the job will run for ages, and through lots of unverified data).
The certification file provides the good luminosity sections for analysis [2], and consists of several entries like:
{"160431": [[19, 218]], "160577": [[254, 306]],"160578": [[6, 53], [274, 400]]},
where the general format is:
{"Run Number": [[Lumi range], [Lumi range], ...],
"Run Number": [[Lumi range], [Lumi range], ...], ...}
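To see what this format means in practice, here is a standalone Python sketch (the function name is mine) that checks whether a given (run, lumisection) pair lies inside a certified range:

```python
import json

# A small excerpt of the certification JSON, in the format shown above:
cert = json.loads('{"160431": [[19, 218]], "160577": [[254, 306]], '
                  '"160578": [[6, 53], [274, 400]]}')

def is_good_lumi(cert, run, lumi):
    # A lumisection is "good" if it falls inside any certified
    # [first, last] range listed for its run number.
    for lo, hi in cert.get(str(run), []):
        if lo <= lumi <= hi:
            return True
    return False
```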
This is added, written here in summarized form, as follows:
import FWCore.PythonUtilities.LumiList as LumiList
import FWCore.ParameterSet.Types as CfgTypes
goodJSON = 'Cert_160404-180252_7TeV_ReRecoNov08_Collisions11_JSON.txt'
myLumis = LumiList.LumiList(filename = goodJSON).getCMSSWString().split(',')
process.source.lumisToProcess = CfgTypes.untracked(CfgTypes.VLuminosityBlockRange())
process.source.lumisToProcess.extend(myLumis)

The imports are generally done at the beginning of the code, the definitions of goodJSON and myLumis anywhere before use, and the lumisToProcess must be placed after the process.source input file definition!
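The myLumis list holds range strings of the form 'run:first-run:last'. A standalone sketch of that conversion (my own reimplementation; the exact string format produced by getCMSSWString is my assumption):

```python
# Turn a certification dict into CMSSW-style "run:first-run:last" strings,
# roughly the form that LumiList(...).getCMSSWString().split(',') yields.
def to_cmssw_ranges(cert):
    ranges = []
    for run in sorted(cert, key=int):       # runs in numeric order
        for lo, hi in cert[run]:            # each certified lumi range
            ranges.append("%s:%d-%s:%d" % (run, lo, run, hi))
    return ranges
```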
With all this added to the configuration file, the next step is to construct the actual analyzer. One last thing about the configuration file, as it feeds into the making of the analyzer file directly:
If we need to find out which branch of an event we can use, and what its "label" is, we can use an EventContentAnalyzer [3]. To do this, one adds the line
process.dump = cms.EDAnalyzer('EventContentAnalyzer')
as well as adding '*process.dump' to the Path in the last line.

When doing this, one should set the process.maxEvents number to 1, as the information dump will be unnecessarily long otherwise.
(↑This is all in my file, #commented out)
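For reference, limiting the run to a single event uses the standard maxEvents parameter in the configuration file:

```python
# Standard CMSSW configuration idiom: process only one event,
# so the EventContentAnalyzer dump stays short.
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(1))
```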

Now for the actual analyzer file:
This file contains our methods, which will be called and run for each event. First, more on the analyzer file in general; then we'll go through the steps to make some possible histograms. The central member of the DemoAnalyzer class is the analyze method, declared as:
void DemoAnalyzer::analyze(const edm::Event& iEvent, const edm::EventSetup& iSetup) {
The method analyze receives a reference iEvent to an edm::Event object, which contains all the event data. [6]
Containers are provided for each type of event data and can be obtained by using the object edm::Handle. iEvent.getByLabel (with a handle to the type of event data) will retrieve the data from the event and store them in a container in memory. In this code, it looks like this:
Handle<reco::TrackCollection> gmuons;
iEvent.getByLabel("globalMuons", gmuons);
For this project, only some of the methods for analyzing globalMuons will be looked at; more methods are also covered in the file. Some calculations will be left out here, as they are described with "WHAT"s and "WHY"s in the code's comments.
Before the analyzer methods, the histograms have been "prepared for filling" (summarized):
In the definition of the EDAnalyzer module class:
TH1D *h1;
TH1D *h2;

TH1D *h5;
TH1D *h6;

TH1D *h10;

In the definition of the main constructor, DemoAnalyzer:

edm::Service<TFileService> fs;

// global muon multiplicity
h10 = fs->make<TH1D>("GMmultiplicty", "GMmultiplicity", 8, 0, 8);
h10->GetXaxis()->SetTitle("Number of Global Muons");
h10->GetYaxis()->SetTitle("Number of Events");

// global muon momentum
h1 = fs->make<TH1D>("GMmomentum", "GM_Momentum", 240, 0., 120.);
h1->GetXaxis()->SetTitle("Global Muon Momentum (in GeV/c)");
h1->GetYaxis()->SetTitle("Number of Events");

// global muon Transverse_momentum
h2 = fs->make<TH1D>("GM_Transverse_momentum", "TransverseMomentum", 240, 0., 120.);
h2->GetXaxis()->SetTitle("Transverse Momentum of global muons (in GeV/c)");
h2->GetYaxis()->SetTitle("Number of Events");

// dimuon mass spectrum up to 4 GeV (low mass range, rho/omega, phi, psi)
h5 = fs->make<TH1D>("GMmass", "GMmass", 400, 0., 4.);
h5->GetXaxis()->SetTitle("Invariant Mass for Nmuon>=2 (in GeV/c^2)");
h5->GetYaxis()->SetTitle("Number of Events");

// dimuon mass spectrum up to 120 GeV (high mass range: upsilon, Z)
h6 = fs->make<TH1D>("GMmass_extended", "GMmass", 240, 0., 120.);
h6->GetXaxis()->SetTitle("Invariant Mass for Nmuon>=2 (in GeV/c^2)");
h6->GetYaxis()->SetTitle("Number of Events");
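To make the booking arguments concrete: TH1D(name, title, nbins, xlow, xup) splits [xlow, xup) into nbins equal-width bins, so h1's 240 bins over 0 to 120 GeV/c are 0.5 GeV/c wide. A tiny pure-Python imitation of the binning (not ROOT, just for illustration; the class name is mine):

```python
class MiniHist:
    # Uniform-bin 1D histogram imitating TH1D's booking arguments:
    # nbins equal-width bins spanning [xlow, xup).
    # Out-of-range fills are simply ignored in this sketch.
    def __init__(self, nbins, xlow, xup):
        self.nbins = nbins
        self.xlow = xlow
        self.xup = xup
        self.width = (xup - xlow) / float(nbins)
        self.counts = [0] * nbins

    def fill(self, x):
        if self.xlow <= x < self.xup:
            self.counts[int((x - self.xlow) / self.width)] += 1

h = MiniHist(240, 0.0, 120.0)  # same binning as h1 (GMmomentum)
```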
Now we're back in the analyzer member function. For the first histogram, the size of the gmuons track collection is filled in, so simply: h10->Fill(gmuons->size());
Then, for the other histograms, we need to iterate through the globalMuons of the current iEvent. For the momenta and transverse momenta of the globalMuons, we simply fill in the values of p() and pt(). The iterator 'it' is first defined as the begin() value of gmuons, then advanced until it reaches the end() value:
for (reco::TrackCollection::const_iterator it = gmuons->begin();
     it != gmuons->end(); it++) {
  h1->Fill(it->p());
  h2->Fill(it->pt());
For the calculation of the dimuon mass, quality cuts are made. With more descriptions commented in the cc-file, the summerized quality cut procedure is as follows:
int ValidHits = 0, PixelHits = 0;
const reco::HitPattern& p = it->hitPattern();
for (int i = 0; i < p.numberOfHits(); i++) {
  uint32_t hit = p.getHitPattern(i);
  if (p.validHitFilter(hit) && p.pixelHitFilter(hit)) PixelHits++;
  if (p.validHitFilter(hit)) ValidHits++;
}

if (gmuons->size() >= 2
&& ValidHits >= 12
&& PixelHits >= 2
&& it->normalizedChi2() < 4.0) {
^This latest if is the actual quality cut. After finding our first good candidate for the calculation, we need a second candidate. To be able to compare all the globalMuon tracks after 'it', we make a new iterator i = it and advance it by one:
reco::TrackCollection::const_iterator i = it;
i++;

for (; i != gmuons->end(); i++) {

int ValidHits1 = 0, PixelHits1 = 0;
const reco::HitPattern& p1 = i->hitPattern();

for (int n = 0; n < p1.numberOfHits(); n++) {
  uint32_t hit = p1.getHitPattern(n);
  if (p1.validHitFilter(hit) && p1.pixelHitFilter(hit)) PixelHits1++;
  if (p1.validHitFilter(hit)) ValidHits1++;
}
Then, having prepared two iterators, where 'it' is a good candidate, we check that the charges are opposite, as well as quality-checking 'i':
if (it->charge() == -(i->charge()) // unlike charges
&& ValidHits1 >= 12
&& PixelHits1 >= 2
&& i->normalizedChi2() < 4.0) {
And finally, if we have two good globalMuon candidates, we calculate the invariant mass and fill it into the histograms as follows:
s1 = sqrt(((it->p())*(it->p()) + sqm1) * ((i->p())*(i->p()) + sqm1));
s2 = it->px()*i->px() + it->py()*i->py() + it->pz()*i->pz();
s = sqrt(2.0 * (sqm1 + (s1 - s2)));
h5->Fill(s);
h6->Fill(s);
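As a cross-check outside the VM, the same cut thresholds and the mass formula above can be reproduced in plain Python (helper names are mine; sqm1 is the squared muon mass, (0.105658 GeV/c^2)^2, so the formula computes M^2 = 2(m^2 + E1 E2 - p1 . p2) for two muons):

```python
import math

SQM1 = 0.105658 ** 2  # squared muon mass, (GeV/c^2)^2

def passes_quality_cuts(valid_hits, pixel_hits, normalized_chi2):
    # same thresholds as the quality cut in the analyzer above
    return valid_hits >= 12 and pixel_hits >= 2 and normalized_chi2 < 4.0

def dimuon_mass(p1, p2):
    # p1, p2: momentum 3-vectors (px, py, pz) in GeV/c
    mom1 = math.sqrt(sum(c * c for c in p1))
    mom2 = math.sqrt(sum(c * c for c in p2))
    s1 = math.sqrt((mom1 * mom1 + SQM1) * (mom2 * mom2 + SQM1))  # E1 * E2
    s2 = sum(a * b for a, b in zip(p1, p2))                      # p1 . p2
    return math.sqrt(2.0 * (SQM1 + (s1 - s2)))
```

For two back-to-back muons the result reduces to E1 + E2, which makes the formula easy to sanity-check by hand.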


The goal of all this is to create histograms from AOD (Analysis Object Data) files, and I used the methods from the examples. This made it easy to display the histograms for myself in TBrowser; however, it was somewhat more challenging to make pure PNGs (without using the snipping tool, which I consider cheating in this circumstance) or display them in a TCanvas. The problem was that, in contrast to the methods in exercise 8 of PHYS291 (if they may be referenced), the root file contained a 'TDirectory', which in turn contained the histograms. After some trial and error, the following proved to work:
This, however, only works in the VM, probably because the IFT client runs a different (and older) version of ROOT.
Now, for some histograms:
Simply the gmuons->size() multiplicity check:

The momentum-distribution of all globalMuons in all events:

The transverse momentum distribution of all globalMuons, that is, the component of the momentum perpendicular to the beam direction:

Calculated invariant mass for two muons (with unlike charge) in the range 0 to 4 GeV:

We see here a peak around 3.1 GeV. This may well correspond to a decay mode of the J/psi particle, which has a published invariant mass of 3.0969 GeV. [7]

Calculated invariant mass for two muons (with unlike charge) in the range 0 to 120 GeV:

Here we see a peak at about 90 GeV. One of the decay modes of the Z boson is into two muons, so this might be the Z boson, which has an invariant mass of about 91 GeV. [8]

List of all files:
For scram and cmsRun:
BuildFile.xml (actually a txt file with copied content, since xml wouldn't display)
With data and certification:
Making the root-file, with histograms: