TMVA
TMVA version 4.0.4 is included in this ROOT release.
Methods
- A new Category method allowing the user to
separate the training data (and accordingly the application
data) into disjoint sub-populations exhibiting significantly
different properties. The separation into phase space regions is
done by applying requirements on the input and/or spectator
variables. In each of these disjoint regions (each event must
belong to one and only one region), an independent training is
performed using the most appropriate MVA method, training
options and set of training variables in that region. The division
into categories in the presence of distinct sub-populations reduces
the correlations between the training variables, improves the
modelling, and hence increases the classification and regression
performance. Presently, the Category method works for
classification only, but regression will follow soon. Please
contact us if regression support is urgently needed.
Example scripts and data files illustrate how the new
Category method is configured and used. Please check the macros
test/TMVAClassificationCategory.C and
test/TMVAClassificationCategoryApplication.C or the
corresponding executables.
- Regression functionality for gradient boosted trees using a Huber loss function.
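
For reference, the Huber loss mentioned in the last item is quadratic for small residuals and linear for large ones, which makes a regression fit less sensitive to outlying targets. The following is a minimal standalone sketch of the textbook formula, not TMVA's implementation; the function names and the threshold parameter `delta` are illustrative assumptions.

```cpp
#include <cmath>

// Huber loss for a residual r = target - prediction.
// Quadratic for |r| <= delta, linear beyond it.
// `delta` is an illustrative threshold, not a TMVA option name.
double huberLoss(double residual, double delta) {
    const double a = std::fabs(residual);
    if (a <= delta) return 0.5 * residual * residual;
    return delta * (a - 0.5 * delta);
}

// Negative gradient of the loss with respect to the prediction.
// In gradient boosting, each new tree is fitted to these values.
double huberNegativeGradient(double residual, double delta) {
    if (std::fabs(residual) <= delta) return residual;
    return residual > 0 ? delta : -delta;
}
```

Beyond the threshold the gradient is capped at `delta` in magnitude, so a single badly mismeasured target cannot dominate the next boosting step.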
Comments
On Input Data:
New TMVA event vector building. The code for splitting the input
data into training and test samples for all classes and the
mixing of those samples into one training and one test sample has
been rewritten completely. The new code is more efficient and
has a clearer structure. It also fixes several bugs
reported by TMVA users.
On Minimization:
Variables, targets and spectators are now checked for being
constant. (The execution of TMVA is stopped for constant variables
and targets; a warning is given for constant spectators.)
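
The policy described above can be sketched in standalone C++ as follows. This is an illustration only, not TMVA code: the type and function names are hypothetical, and treating an empty column as constant is an assumption of the sketch.

```cpp
#include <algorithm>
#include <vector>

enum class Role { Variable, Target, Spectator };
enum class Action { Continue, Warn, Stop };

// True if all values in the column are identical.
// An empty column is treated as constant (assumption of this sketch).
bool isConstant(const std::vector<double>& values) {
    if (values.empty()) return true;
    const auto mm = std::minmax_element(values.begin(), values.end());
    return *mm.first == *mm.second;
}

// Constant variables and targets stop the run;
// constant spectators only trigger a warning.
Action checkColumn(Role role, const std::vector<double>& values) {
    if (!isConstant(values)) return Action::Continue;
    return role == Role::Spectator ? Action::Warn : Action::Stop;
}
```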
On Regression:
The analysis type is no longer defined by calling a dedicated
TestAllMethods-member-function of the Factory, but with the
option "AnalysisType" in the Factory. The default value is
"Auto", in which case TMVA tries to determine the most suitable
analysis type from the targets and classes the user has defined.
Other values are "regression", "classification" and "multiclass"
(the latter for the forthcoming multiclass classification).
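
As an illustration of how the option might be composed, the sketch below appends an "AnalysisType" setting to a colon-separated TMVA option string, which would then be passed as the Factory's option string. The enum and helper names are hypothetical and not part of TMVA; the value spellings are copied from the text above, and whether the option parser is case-sensitive is not stated here.

```cpp
#include <string>

// Hypothetical enum mirroring the values listed above.
enum class AnalysisType { Auto, Regression, Classification, Multiclass };

// Returns the "AnalysisType=<value>" fragment for the given type.
std::string analysisTypeOption(AnalysisType type) {
    switch (type) {
        case AnalysisType::Regression:     return "AnalysisType=regression";
        case AnalysisType::Classification: return "AnalysisType=classification";
        case AnalysisType::Multiclass:     return "AnalysisType=multiclass";
        case AnalysisType::Auto:           break;
    }
    return "AnalysisType=Auto";
}

// Appends the fragment to an existing colon-separated option string.
std::string withAnalysisType(const std::string& baseOptions, AnalysisType type) {
    const std::string opt = analysisTypeOption(type);
    return baseOptions.empty() ? opt : baseOptions + ":" + opt;
}
```

For example, `withAnalysisType("!V:!Silent", AnalysisType::Regression)` yields `!V:!Silent:AnalysisType=regression`.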
Missing regression evaluation plots for the training sample were
added.
On Cuts method:
Removed the obsolete option "FVerySmart" from the Cuts method.
On MLP method:
Display of convergence information in the progress bar for MLP during training.
Creation of animated GIFs for MLP convergence monitoring (please
contact the authors if you want to do this).
On Datasets:
Checks are now performed to detect events that are unintentionally
cut by using a non-filled array entry (e.g. "arr[4]" is used when
the array does not always have at least 5 entries). A warning is
given in that case.
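
The condition being checked can be sketched as follows: an event is affected when the requested index lies beyond the end of its array. This is a standalone illustration with a hypothetical function name, not the TMVA implementation.

```cpp
#include <cstddef>
#include <vector>

// Counts events whose array is too short to provide arr[index].
// Each inner vector holds the array contents of one event.
std::size_t countEventsMissingEntry(
    const std::vector<std::vector<double>>& arrays, std::size_t index) {
    std::size_t missing = 0;
    for (const auto& arr : arrays) {
        if (index >= arr.size()) ++missing;
    }
    return missing;
}
```

A non-zero count for index 4 corresponds to the situation in which the warning about "arr[4]" would be issued.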
Bug fixes
- Spectators and Targets could not be used with by-hand assignment of events.
- Corrected types (training/testing) for assigning single events.
- Changed message from FATAL to WARNING when the user requests more events for
training or testing than available.
- Fixed bug which caused TMVA to crash if the number of input variables exceeded
the allowed maximum for generating scatter plots.
- Prevent TMVA from crashing when running with an empty TTree or TChain.
- A variable expression like "Alt$(arr[3],0)" can now be used
to give a default value for a variable if for some events the
array does not contain enough elements (e.g. in two-jet events,
sometimes only one jet is found, and thus the array jetPt[] has
only one entry in that case).
- Plot ranges for scatter-plots showing the transformed events are now correct.
- User defined training/testing-trees are now handled correctly.
- Fix bug in correlation computation for regression.
- Consistent use of variable labels (for the log output) and variable titles (in histograms).
- Drawing of variable labels in network architecture display for regression mode has been added.
- Bug fixes to Cuts which improve performance on datasets with many variables.
- Bug fix in GaussTransformation which improves handling of Gaussian tails.
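
To make the "Alt$(arr[3],0)" item above concrete, the value such an expression yields for a single event can be emulated in plain C++ as shown below. The helper name is hypothetical and the code does not use ROOT; it only mirrors the "element if present, otherwise default" behaviour described in that item.

```cpp
#include <cstddef>
#include <vector>

// Emulates the value of "Alt$(arr[index], fallback)" for one event:
// the array element if it exists, otherwise the fallback value.
double altValue(const std::vector<double>& arr, std::size_t index,
                double fallback) {
    return index < arr.size() ? arr[index] : fallback;
}
```

With a one-entry jetPt[] array, `altValue(jetPt, 3, 0.0)` returns the fallback 0.0, so the event has a defined value for the variable.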