Fitting scheme [message #1981] 
Tue, 11 May 2010 08:33 
rosemann Messages: 41 Registered: March 2009 Location: hamburg.de 



Hi,
after giving it some thought, I would like to start the discussion about the current and future scheme of fitting in MarlinTPC.
Probably Martin (are you there?) will comment on this, but this represents my understanding of the current scheme:
 The base class is "TrackFitterBase", which basically just defines an interface.
CR: I'm not too happy about the naming scheme for the /residuals/ access, but in principle I think it is OK to have something like this. It forces one to implement some functions, but I can't see a disadvantage in this.
 The actual fitter classes are derived (inherited) from the above base class and are not Marlin processors.
CR: I can agree to both. We would like to call the fit iteratively. With processors this is believed to be possible (I talked to Steve about "processors calling processors"), but I don't think we should make life more complicated.
CR: As a different thought, I sometimes think that a much simpler "fit object" might be more natural. I would pass the LCIO::TrackerHits to the fit object and deal with the result.
 The calling of the fitter class is done by a processor.
CR: This seems obvious, since we want to do it within MarlinTPC. But I guess we might want to do the "bookkeeping" part in the processor, rather than in the fitter class.
 The TrackFitterFactory can be used to handle different fitted tracks by their type, using the slightly peculiar "types" defined in the TrackFitterBase class.
CR: I have to admit that this is a possible solution to the problem: "how do I determine the residuals of a particular already fitted track". But it is not the most straightforward (at least to me).
 BREAK 
I would approach this whole problem slightly differently. I would start with a description of what we want to do, then try to write it, and then think about how we might want to use it differently/figure out what is missing.
What do we want to do?
 fit tracks on already found track hypotheses with associated hits
 calculate residuals (and/or distance) to obtain resolutions
 reject outliers
 ...
I can say that we (in Hamburg) have now started to develop/implement a full-fledged (with input and output errors) chi-square fitter for a straight line and a helix. Hopefully more ideas will come along the way.
Cheers,
Christoph
When you have eliminated the impossible, whatever remains, however improbable, must be the truth. (Sir A.C. Doyle in Sign of Four)



Re: Fitting scheme [message #1984 is a reply to message #1981] 
Tue, 11 May 2010 11:44 
jabernathy Messages: 78 Registered: March 2006 Location: University of Victoria 



Hello all,
Here are my thoughts on a fitting scheme:
I like the idea of a TrackFitterBase. From experience, trying to put all of the functionality in one class can be unwieldy.
However, after that it is more complicated.
As was pointed out before, there are different methods of calculating (and meanings of) the residuals. One has to be careful that the correct routine is called for a fitted track.
One idea that I had previously was to let the fitting processor be responsible for calculating the residuals. For example, a reconstruction steering file could contain this processor:
<processor name="MyTrackFitterLikelihood" type="TrackFitterLikelihoodProcessor">
<parameter name="Sigma0" type="double">0.81013581</parameter>
<parameter name="FitSigma0?" type="bool">true</parameter>
<parameter name="Noise" type="double">0.001</parameter>
<parameter name="SigmaZ" type="double">0.5</parameter>
<parameter name="ProduceTrackTuples?" type="bool">true</parameter>
<parameter name="PathIntegralStepSize" type="double">25</parameter>
<parameter name="OmegaTau" type="double">10</parameter>
<parameter name="DriftVelocity" type="double">35</parameter>
<parameter name="ReadoutElectronicsInnerZ" type="double">2245</parameter>
<parameter name="TPCOuterRadius" type="double">1800</parameter>
<parameter name="FitNonHomogenousFields" type="bool">true</parameter>
</processor>
Then the analysis code could run the same processor except with a flag telling the fitter to only calculate the residuals:
<processor name="MyTrackFitterLikelihood" type="TrackFitterLikelihoodProcessor">
<parameter name="CalculateResiduals">true</parameter>
...
</processor>
One advantage of using this method is that the fitter could check the track to see if it was fit using this method and quit gracefully if it wasn't. It also allows most of the code to be kept in one place. Note, I am not suggesting that the processor _is_ the fitter; they could still use a subclass to manage the fitting and residual calculations.
One disadvantage of this method is that I don't think analysis logic should be placed in the fitter. If the residual calculation is done within the fitting processor, then it is probably going to be an "all-or-nothing" choice of which tracks to calculate residuals for. This may not be a bad thing: a _real_ analysis processor could just ignore any unneeded information.
Another method that I thought of was keeping the track fitter processors in pairs: one for fitting and one for residuals. So there would be a TrackFitterLikelihoodProcessor in the reconstruction chain and a TrackResidualLikelihoodProcessor in the analysis chain. This may not be a bad idea. It allows logic to be located where it needs to be. Code could be shared by using the common object inherited from TrackFitterBase.
I hesitate to suggest using a factory method for calculating the residuals because it seems to move in a direction orthogonal to the intent of Marlin: keeping the chain "modularized". Also, if a factory method/processor is used, then it could be difficult to manage the processor parameters, because each residual method is parameterized differently.
However, I must admit, I don't know a great deal about the analysis side of things.
Is it common for objects to be reconstructed with one method and then fit with another?
When analysis is done, is it usually done to all of the tracks or just the ones which are interesting?



Re: Fitting scheme [message #1987 is a reply to message #1984] 
Wed, 12 May 2010 04:52 
rosemann Messages: 41 Registered: March 2009 Location: hamburg.de 



Hi Jason,
jabernathy  [...]
I like the idea of a TrackFitterBase. From experience, trying to put all of the functionality in one class can be unwieldy.
However, after that it is more complicated.

Yes, I think one base class is what we actually want and need, to simplify the usage down the road.
jabernathy 
As it was pointed out before, there are different methods (and meanings) of the residuals. One has to be careful that the correct routine is called for a fitted track.

Yes, I agree. I'm heavily influenced by the terminology of the DESY group, so I suggest this naming scheme:
residual - the geometrical distance in x/phi at constant y/r between an associated hit (from track finding) and its fitted track, when the hit is excluded from the fit
distance - the geometrical distance in x/phi at constant y/r between an associated hit (from track finding) and its fitted track, when the hit is included in the fit
The resolution is then the geometric mean of both values. (For reference check e.g. Ralf Diener's thesis or physics/0402054)
So my proposal would be to have these two functions in the TrackFitterBase, plus a third one (which is optional, but might be more interesting down the line):
 calculateDistances(...)
 calculateResiduals(...)
 calculateResolution(...)
We would still need some kind of definition or interface to express the alternatives, e.g. perpendicular distances to the track (although this can get pretty ugly).
By doing this in the actual fitter class we would hide the iterative fitting calls (which are needed for the residuals calculation) within this code. I'm not really sure whether there are any limitations on the actual fit method. To be more specific, I don't know if the Likelihood method can work like this.
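To make the geometric-mean definition above concrete, here is a minimal numerical sketch in plain C++ (no MarlinTPC/LCIO types; the helper names `width` and `resolution` are mine, not existing code):

```cpp
#include <cmath>
#include <vector>

// Width (RMS about the mean) of a set of distances or residuals.
double width(const std::vector<double>& values) {
    double mean = 0.0;
    for (double v : values) mean += v;
    mean /= values.size();
    double var = 0.0;
    for (double v : values) var += (v - mean) * (v - mean);
    return std::sqrt(var / values.size());
}

// Single-point resolution as the geometric mean of the widths of the
// 'distance' (hit included in the fit) and 'residual' (hit excluded)
// distributions, following physics/0402054.
double resolution(double sigmaDistance, double sigmaResidual) {
    return std::sqrt(sigmaDistance * sigmaResidual);
}
```

The idea behind the geometric mean is that the included-hit distance underestimates the true resolution while the excluded-hit residual overestimates it, so their geometric mean cancels the bias.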
jabernathy 
One idea that I had previously was to let the fitting processor be responsible for calculating the residuals.
[...]
One advantage to using this method is that the fitter could check the track to see if it was fit using this method, and quit gracefully, if it wasn't. It also allows most of the code to be kept in one place. Note, I am not suggesting that the processor _is_ the fitter, they could still use a subclass to manage the fitting and residual calculations.
One disadvantage of this method is that I don't think analysis logic should be placed in the fitter. If the residual calculation is done with the fitting processor then it is probably going to be an "allornothing" choice of which tracks to calculate residuals for. This may not be a bad thing  a _real_ analysis processor could just ignore any unneeded information.

I think it is not a good idea to let the processor handle the logic, for several reasons. The extra overhead needed for the determination of the fit method is not large, but somewhat confusing. As you mention, several things would get mixed by doing it this way.
The underlying problem is of course something else: the LCIO objects lack some, possibly fundamental, functionality. For example, if it were possible to attach a hit to a track but mark it "unused", then we would have the solution. This would encode all information from the fitting inside the data object. It would also go beyond the scope of fitting.
Probably I will have to take this issue to the LCIO maintainers. In principle this would be the solution to a lot of the problems we have there right now.
The next is more like a comment:
jabernathy  [...]
Is it common for a objects to be reconstructed with one method and then fit with another?
When analysis is done is it usually done to all of the tracks or just ones which are interesting?

a) what do you mean by "reconstruction method" ?
b) yes and no. You usually have a lot of requirements for tracks (and their components), and you usually take only a fraction of all tracks that passed these requirements into account. (Something like: chi^2 value, number of hits, position, hit quality, track angle, environmental factors, ...)
Cheers,
Christoph
When you have eliminated the impossible, whatever remains, however improbable, must be the truth. (Sir A.C. Doyle in Sign of Four)



Re: Fitting scheme [message #1988 is a reply to message #1987] 
Wed, 12 May 2010 10:03 
jabernathy Messages: 78 Registered: March 2006 Location: University of Victoria 



rosemann wrote on Wed, 12 May 2010 04:52  Hi Jason,
So my proposal would be to call the two functions in the TrackFitterBase, including a third one (which is optional, but might be more interesting down the line)
 calculateDistances(...)
 calculateResiduals(...)
 calculateResolution(...)

In fact, it is even more complicated than that. The "residuals" from the likelihood method are in units of charge or electrons.
rosemann  I think it is not a good idea to let the processor handle the logic, for several reasons. The extra overhead needed for the determination of the fit method is not large, but somehow confusing. As you mention, several things would get mixed by doing it this way.

I agree, this is what led me to the idea of having pairs of fitting processors: one for fitting and one for doing the analysis.
rosemann  The underlying problem is of course something else: the LCIO objects lack some, possibly fundamental, functionality. For example, if it were possible to attach a hit to a track but mark it "unused", then we would have the solution. This would encode all information from the fitting inside the data object. It would also go beyond the scope of fitting.

There is an LCRelation from LCIO. I believe these are used to abstractly associate two LCIO objects with a single weighting parameter.
Do you mean marking a track as unused for the purpose of calculating the "distance" measure that was previously defined?
If that is the case, then this would be a good motivator for using the dual-class approach. The algorithm for calculating these distances could be contained in a "TrackResidualChiSquaredProcessor".
"rosemann"  a) what do you mean by "reconstruction method" ?

The type of fitter used. So, is it likely that multiple types of track fitters will be used to reconstruct/analyse the data during one execution of a Marlin processor chain?



Re: Fitting scheme [message #1990 is a reply to message #1981] 
Fri, 14 May 2010 08:21 
killenberg Messages: 125 Registered: July 2005 Location: CERN 



Hello Christoph,
hello Jason,
please excuse my delayed reply.
rosemann 
residual - the geometrical distance in x/phi at constant y/r between an associated hit (from track finding) and its fitted track, when the hit is excluded from the fit
distance - the geometrical distance in x/phi at constant y/r between an associated hit (from track finding) and its fitted track, when the hit is included in the fit
The resolution is then the geometric mean of both values. (For reference check e.g. Ralf Diener's thesis or physics/0402054)

I was not aware that residual and resolution are used with this definition at DESY. Now I understand why you don't like the way it is used in the current scheme. Here is how I used it:
residual - the geometric distance (calculated by a method defined by the actual fitter implementation) of a 3D space point (hit) to a track hypothesis (= track with a certain set of parameters).
This is much more general than the definition you use. Especially with the usage of the term "resolution" I feel a little uncomfortable. For me resolution is the width of the residual distribution, for example the RMS of a histogram, not a value of a single measured hit.
But restricting the measured distance to x/phi at constant y/r is also not general enough:
 Hit coordinates are stored in global coordinates. The pads are in local coordinates. For the LP, for instance, you hit one module in each module row. So you have hits from three different local polar coordinate systems, which are stored in global Cartesian coordinates.
 For the Timepix there is no bias from a pad row; the hits are just measurements of x and y. One wants to measure the distance perpendicular to the helix.
The geometric mean is used because it is an unbiased estimator for the resolution. But we always used the mean of the widths of the two distributions (with and without the specific hit used in the fit). I don't know which is correct; probably both methods are equivalent if you calculate it through.
However, there are other methods to calculate an unbiased estimator. For instance one can use the distance calculated with the hit included in the track fit and multiply it by sqrt(n) / sqrt(nDoF), where n is the number of hits in the track and nDoF is the number of degrees of freedom in the track fit.
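That correction factor can be sketched in a few lines (the function name is mine; I assume nDoF = nHits - nParams, e.g. 3 helix parameters in the xy plane):

```cpp
#include <cmath>

// Unbiased width estimate from the biased one (hit included in the fit):
// scale by sqrt(n / nDoF), with nDoF = nHits - nParams.
double unbiasedWidth(double biasedWidth, int nHits, int nParams) {
    int nDoF = nHits - nParams;
    return biasedWidth * std::sqrt(static_cast<double>(nHits) / nDoF);
}
```

For a 12-hit track fitted with 3 parameters the biased width would be scaled up by sqrt(12/9), about 15%.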
Another method is to use an unbiased track estimate, for instance from a separate measurement with a hodoscope. Or using MC truth in simulation.
This is why the interface has two functions:
EVENT::DoubleVec TrackFitterBase::calculateResiduals(EVENT::TrackerHit *testHit, EVENT::Track *trackWithoutTestHit);
IMPL::TrackImpl * fitTrack (EVENT::Track const *seedTrack);
The first function calculates the residuals to the track that has been provided. The naming of the variable should definitely be changed; it is somewhere between misleading and wrong. How about "trackHypothesis"? There is also a calculateResiduals function which takes the track and the hit number on the track. It was intended as a convenience function which calls the other one. But I see that it probably causes only confusion.
The fitTrack function does what Christoph wants to do with the simpler "fit object". You put in a seed track (probably also not a good name, since it is not a seed for pattern recognition) which serves as a container for the hits and provides start parameters for a numerical minimisation. And you get out the fitted track.
To avoid code duplication a chi^2 fitter will call calculateResiduals() during the minimisation. And the histogramming processor calls it to get the residual values.
In addition, once the chi^2 algorithm is implemented one can inherit from it and just change calculateResiduals (for instance from 'perpendicular to the helix' to 'along the pad row').
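A rough illustration of that inheritance idea, with stand-in structs instead of the LCIO EVENT::TrackerHit/EVENT::Track types and a straight line instead of a helix, purely for brevity (all names here are hypothetical, not actual MarlinTPC classes):

```cpp
#include <cmath>

// Stand-ins for the LCIO hit and track (hypothetical, 2D straight line).
struct Hit   { double x, y; };
struct Track { double slope, offset; };

class TrackFitterBase {
public:
    virtual ~TrackFitterBase() = default;
    // Residual of one hit with respect to a given track hypothesis;
    // a chi^2 fitter would call this inside its minimisation loop.
    virtual double calculateResidual(const Hit& h, const Track& t) const = 0;
};

// Residual measured 'along the pad row': distance in y at fixed x.
class RowResidualFitter : public TrackFitterBase {
public:
    double calculateResidual(const Hit& h, const Track& t) const override {
        return h.y - (t.slope * h.x + t.offset);
    }
};

// Inherit and swap only the distance definition: perpendicular to the track.
class PerpendicularResidualFitter : public RowResidualFitter {
public:
    double calculateResidual(const Hit& h, const Track& t) const override {
        return RowResidualFitter::calculateResidual(h, t)
               / std::sqrt(1.0 + t.slope * t.slope);
    }
};
```

The fit logic in the base/chi^2 class stays untouched; only the virtual residual calculation changes in the subclass.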
A Likelihood fitter does not use the geometrical residuals in the fit process. Here it behaves like Jason's pair proposal: One function for fitting, one for residual calculation.
I always considered the decision which definition of resolution (in the sense of the width of the residual distribution) to use as part of the analysis. This is why there are several histogramming processors for different definitions. The processor for the geometric mean resolution, for instance, loops over all hits, removes the hit from the track, refits it, and then retrieves the residual with and without the hit included in the fit. It has to be done 'manually' and is not in the fitter interface.
Regarding the factory:
I was looking for a way to have only one histogramming processor for each "resolution" definition which works independently of the residual implementation (i.e. track fitter implementation) that was used. The Japanese group proposed that one might use different fitters depending on the track properties (I can imagine using a different fit for particles coming from the vertex). So I would not store the fitter type as a collection parameter but as a number in the track itself (bits 16-23 in the track type word). So every fitter implementation needs a unique ID, which is the peculiar list of types in the base class.
As the geometric mean processor does refitting, and some implementations need additional parameters for this, the factory has an interface which takes LCParameters. The fitter processor stores these parameters in the tracks collection, so you pass its collectionParameters to the factory, together with the ID you retrieved from the track.
A combinatorial Kalman filter might not only want to reject hits but also to add hits that were not associated to the track. To have access to the hits collection (or whatever collection might be needed) a pointer to the event can be passed.
The factory reads out the implementation specific parameters and calls the constructor of the fitter. For every new fitter implementation the factory has to be extended. This is the price you have to pay to have the histogramming processor independent from the fitter implementation.
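The bits 16-23 encoding could look roughly like this (the helper names and mask are my guess at the scheme described above, not actual MarlinTPC code):

```cpp
// Fitter implementation ID stored in bits 16-23 of the track type word.
const int kFitterIdShift = 16;
const int kFitterIdMask  = 0xFF << kFitterIdShift;

// Clear bits 16-23 and write the (8-bit) fitter ID into them.
int setFitterId(int typeWord, int fitterId) {
    return (typeWord & ~kFitterIdMask) | ((fitterId & 0xFF) << kFitterIdShift);
}

// Extract the fitter ID back out of the type word.
int getFitterId(int typeWord) {
    return (typeWord & kFitterIdMask) >> kFitterIdShift;
}
```

A histogramming processor would then read the ID from the track, pass it to the factory together with the collection's LCParameters, and get back a fitter instance of the matching implementation.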
I admit the scheme is complicated and a bit tangled, but I could not come up with a simpler solution for the functionality I wanted.
Cheers
Martin
Martin Killenberg
CERN
martin.killenberg@cern.ch



Re: Fitting scheme (part 1/2 for Jason) [message #2006 is a reply to message #1988] 
Wed, 02 June 2010 02:26 
rosemann Messages: 41 Registered: March 2009 Location: hamburg.de 



Hi Martin and Jason,
I probably need to add several "oops" in my previous mails. I really have to get some things straight; it seems that I misunderstood a couple of things. (And of course: sorry for replying so late, but I really had to think a bit).
First I reply to Jason (I'd rather answer Martin in a second post, so as not to confuse things further).
jabernathy wrote on Wed, 12 May 2010 10:03  [...]
In fact, it is even more complicated than that. The "residuals" from the likelihood method are in units of charge or electrons.

This makes life a lot more complicated. I suggest that we split the fitting schemes. I don't see so many different methods coming up in the future. So then we have two fundamentally separate ways of fitting, still with hopefully only one base class/interface.
[...]
rosemann  The underlying problem is of course something else: the LCIO objects lack some, possibly fundamental, functionality.[...]

jabernathy 
There is an LCRelation from LCIO. I believe these are used to abstractly associate two LCIO objects with a single weighting parameter.

Since there is further evidence that this functionality is needed for tracking in general, I'd rather try to extend the LCIO track class. For the time being it might be better to create some intermediate solution; maybe this is something for the next MarlinTPC meeting.
Ever so slowly I see some things converging on the horizon...
Cheers,
Christoph
When you have eliminated the impossible, whatever remains, however improbable, must be the truth. (Sir A.C. Doyle in Sign of Four)



Re: Fitting scheme (part 2/2 for Martin) [message #2007 is a reply to message #1990] 
Wed, 02 June 2010 02:47 
rosemann Messages: 41 Registered: March 2009 Location: hamburg.de 



Hi Martin and Jason,
now come the "oops" parts.
killenberg  [...]rosemann 
residual - the geometrical distance in x/phi at constant y/r between an associated hit (from track finding) and its fitted track, when the hit is excluded from the fit
distance - the geometrical distance in x/phi at constant y/r between an associated hit (from track finding) and its fitted track, when the hit is included in the fit
The resolution is then the geometric mean of both values. (For reference check e.g. Ralf Diener's thesis or physics/0402054)


Wrong on my part: the (generic) distance is not taken at a fixed value, but really is the perpendicular distance. While I personally think that this is a small nightmare when reading the formulas (e.g. think about polar coordinates for the hits and a helix), it needs to be written only once.
It seems that I had completely misunderstood something; I believed that the definition of resolution was as I stated in the previous post (taken at a fixed row position), but it turns out that there was no such agreement, and the actual use (also at DESY) is the perpendicular definition.
I'm still checking whether there ever was a common agreement about this definition, maybe even written down. Otherwise we should somehow specify this (but where?).
killenberg  [...]The geometric mean is used because it is an unbiased estimator for the resolution. But we always used the mean of the widths of the two distributions (with and without the specific hit used in the fit). I don't know which is correct, probably both methods are equivalent if you calculate it through.

I'm not sure if I can follow you here. Afaik the geometric mean method is the one described in the paper I mentioned before (physics/0402054).
killenberg  However, there are other methods to calculate an unbiased estimator. For instance one can use the distance calculated with the hit included in the track fit and multiply it by sqrt(n) / sqrt(nDoF), where n is the number of hits in the track and nDoF is the number of degrees of freedom in the track fit.

I've never seen this way of defining the resolution. This would be a nice way to determine it without needing to refit the track n-1 times for n hits on the track. Do you have a reference where this is described or shown?
killenberg  Another method is to use an unbiased track estimate, for instance from a separate measurement with a hodoscope. Or using MC truth in simulation.

Agreed, but both are academic for the time being. But maybe still this year we will have something like a hodoscope for the LP...(?)
killenberg  [skip the longer part of the description]
I admit the scheme is complicated and a bit tangled, but I could not come up with a simpler solution for the functionality I wanted.[...]

I have thought a lot about the scheme, and it seems to me that either way there will be some kind of conflict. What I'm doing right now is sort of "starting all over" and seeing where I'm going. The base class is a good idea and I will build on it.
I definitely propose to discuss this in the next MarlinTPC meeting. Maybe I will come to the same conclusion as Martin. Maybe there are other solutions.
I will also report my thoughts and findings here, to allow other thoughts into the process.
Cheers,
Christoph
When you have eliminated the impossible, whatever remains, however improbable, must be the truth. (Sir A.C. Doyle in Sign of Four)



Re: Fitting scheme (part 2/2 for Martin) [message #2008 is a reply to message #2007] 
Wed, 02 June 2010 05:56 
killenberg Messages: 125 Registered: July 2005 Location: CERN 



rosem 
killenb  ... For instance one can use the distance calculated with the hit included in the track fit and multiply it by sqrt(n) / sqrt(nDoF), where n is the number of hits in the track and nDoF is the number of degrees of freedom in the track fit.
 ...Do you have a reference where this is described or shown?

I use Wikipedia because it is easy to link.
For the 1D case of a Gaussian distribution there is only one free parameter, the mean of the distribution. In this case the factor is n-1, known as Bessel's correction. The variance calculated using n-1 is the sample variance.
For the 2D case of a linear regression (which is an ordinary least squares estimator) I use the German text book
Lothar Sachs, "Statistische Methoden", Springer Verlag 1979, ISBN 3540092269, chapter 8.6
In
Frederick James, "Statistical Methods in Experimental Physics, 2nd Edition", World Scientific 2006, ISBN 9812705279, chapter 8.4.1
I find that the method is derived for a linear function of the free parameters. (See also the German Wikipedia page about linear regression; on the English page I could not find it.)
I don't know how well the linear approximation holds for our helix fit.
However, the nDoF correction can be turned on in the BiasedResidualsProcessor and we can compare it to the geometric mean method. It uses DoF = 3 for a helix in the xy/rphi plane (2 for a straight line), and 2 in the sz plane.
Cheers
Martin
Martin Killenberg
CERN
martin.killenberg@cern.ch



Re: Fitting scheme (part 2/2 for Martin) [message #2009 is a reply to message #2008] 
Wed, 02 June 2010 06:23 
killenberg Messages: 125 Registered: July 2005 Location: CERN 



Hello Christoph,
Regarding the residuals: I found on Wikipedia that in general a residual means the "distance of a measured point to the function", in our case the track. The name residual does not determine how the function is defined.
I propose to explicitly state which residuals we refer to:
 r_n : residual with all hits in the track fit
 r_(n-1) : residual with the test hit excluded from the track fit
 r_geo : geometric mean of r_n and r_(n-1)
 r_MC : residual to Monte Carlo truth as track
 r_hod : residual to track from hodoscope
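One way to keep these labels explicit in code would be a small enumeration (illustrative only, not existing MarlinTPC code):

```cpp
// Which residual definition a value refers to, following the naming above.
enum class ResidualType {
    kRn,         // r_n    : all hits in the track fit
    kRnMinusOne, // r_(n-1): test hit excluded from the track fit
    kRGeo,       // r_geo  : geometric mean of r_n and r_(n-1)
    kRMC,        // r_MC   : residual to Monte Carlo truth track
    kRHod        // r_hod  : residual to a hodoscope track
};
```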
Cheers
Martin
Martin Killenberg
CERN
martin.killenberg@cern.ch



Re: Fitting scheme (part 2/2 for Martin) [message #2010 is a reply to message #2009] 
Wed, 02 June 2010 09:22 
ralf Messages: 43 Registered: August 2007 Location: DESY Hamburg 



Hi all
killenberg wrote on Wed, 02 June 2010 06:23 
I propose to explicitly state which residuals we refer to:
 r_n : residual with all hits in the track fit
 r_(n-1) : residual with the test hit excluded from the track fit
 r_geo : geometric mean of r_n and r_(n-1)
 r_MC : residual to Monte Carlo truth as track
 r_hod : residual to track from hodoscope

For the residual including and excluding the hit from the fit, I don't have much of an opinion. It is just a matter of definition what to call it.
I only used the names "distance" and "residual" in my thesis since they are easy to write/read and to enunciate, at least a bit easier than some kind of indices. And this definition spread in the DESY group.
But as I said, it was just a definition...
I would not call the geometric mean "r_geo", since it is the geometric mean of the widths of the two corresponding distributions. So IMHO "sigma" would be more appropriate here, to avoid confusion.
Or does anyone need the geometric mean of two residuals? At the moment, I don't see for what this could be used.
The rest of the proposed naming is good.
CU, Ralf.



