I/O Libraries
LZMA Compression and Compression Level Setting
ROOT I/O now supports the LZMA compression algorithm to compress data in
addition to the ZLIB compression algorithm.
LZMA compression typically results in smaller files, but takes more
CPU time to compress data. To use the new feature, the external XZ
package must be installed when ROOT is configured and built:
Download version 5.0.3 from tukaani.org
and make sure to configure it with -fPIC:
./configure CFLAGS='-fPIC'
Then the client C++ code must call routines to explicitly request LZMA
compression.
ZLIB compression is still the default.
Setting the Compression Level and Algorithm
There are three equivalent ways to set the compression level and
algorithm. For example, to select the LZMA algorithm with
compression level 5:
TFile f(filename, option, title);
f.SetCompressionSettings(ROOT::CompressionSettings(ROOT::kLZMA, 5));
-
TFile f(filename, option, title, ROOT::CompressionSettings(ROOT::kLZMA, 5));
-
TFile f(filename, option, title);
f.SetCompressionAlgorithm(ROOT::kLZMA);
f.SetCompressionLevel(5);
These methods work for TFile, TBranch, TMessage, TSocket, and TBufferXML.
The compression algorithm and level settings only affect compression of
data after they have been set. TFile passes its settings to a TTree's branches
only at the time the branches are created. This can be overridden by
explicitly setting the level and algorithm for the branch. These classes
also have the following methods to access the algorithm and level for
compression.
Int_t GetCompressionAlgorithm() const;
Int_t GetCompressionLevel() const;
Int_t GetCompressionSettings() const;
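The rule that a file hands its settings to branches only when they are created can be pictured with a small standalone model. This is a sketch, not ROOT code: ToyFile, ToyBranch and the two settings values are hypothetical, and the settings are treated as opaque integers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: ToyFile and ToyBranch are hypothetical stand-ins, not ROOT classes.
const int kOldSettings = 1;     // arbitrary opaque settings value
const int kNewSettings = 205;   // another arbitrary opaque settings value

struct ToyBranch {
   int fCompress;   // settings captured when the branch is created
   explicit ToyBranch(int settings) : fCompress(settings) {}
   void SetCompressionSettings(int settings) { fCompress = settings; }
   int GetCompressionSettings() const { return fCompress; }
};

struct ToyFile {
   int fCompress = kOldSettings;
   std::vector<ToyBranch> fBranches;

   void SetCompressionSettings(int settings) { fCompress = settings; }

   // A new branch copies the file's settings once, at creation time.
   std::size_t CreateBranch()
   {
      fBranches.emplace_back(fCompress);
      return fBranches.size() - 1;
   }
};
```

In this model a branch created before ToyFile::SetCompressionSettings keeps the old value, a branch created afterwards picks up the new one, and either can still be changed individually through ToyBranch::SetCompressionSettings.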
If the compression level is set to 0, then no compression will be
done. All of the currently supported algorithms allow the level to be
set to any value from 1 to 9. The higher the level, the larger the
compression factors will be (smaller compressed data size). The
tradeoff is that for higher levels more CPU time is used for
compression and possibly more memory. The ZLIB algorithm takes less
CPU time during compression than the LZMA algorithm, but the LZMA
algorithm usually delivers higher compression factors.
The header file core/zip/inc/Compression.h declares the function
"CompressionSettings" and the enumeration for the algorithms.
Currently the following selections can be made for the algorithm:
kZLIB (1), kLZMA (2), kOldCompressionAlgo (3), and kUseGlobalSetting
(0). The last option refers to an older interface used to control the
algorithm that is maintained for backward compatibility. The following
function is defined in core/zip/inc/Bits.h and it sets the global
variable.
R__SetZipMode(int algorithm);
If the algorithm is set to kUseGlobalSetting (0), the global variable
controls the algorithm for compression operations. This is the
default and the default value for the global variable is kZLIB.
gDirectory
gDirectory is now thread local!
The values of gDirectory and gFile are now accessed via a static function of their respective classes. The access is made transparent via a preprocessor macro.
Note: Whenever a thread has an associated TThread object, the value of gDirectory is now thread local, i.e. any modification of gDirectory, direct or indirect, will not be seen by the other threads. In particular this means that several I/O operations (including TDirectory::Write) are thread safe (as long as all the required TClass and TStreamerInfo objects have been set up beforehand).
Note: This model does not support sharing a TFile amongst threads (i.e. a TFile must be accessed from exactly one thread). This means that whenever control of a TFile is passed from one thread to another, the code must explicitly reset gDirectory to another value; otherwise gDirectory may be left pointing to a deleted object if the other thread deletes the TFile. Deleting a TFile only affects the values of gDirectory and gFile in the thread that performs the deletion.
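The combination of a static accessor function, a macro and per-thread storage can be illustrated without ROOT. The following is a minimal sketch that uses the C++11 thread_local keyword unconditionally, whereas the text ties the behaviour to threads with a TThread object. ToyDirectory, CurrentDirectory, gToyDirectory and SetInWorker are hypothetical names.

```cpp
#include <cassert>
#include <thread>

// Toy model, not ROOT code.
struct ToyDirectory { const char *fName; };

// One pointer per thread, reached through a function call.
ToyDirectory *&CurrentDirectory()
{
   thread_local ToyDirectory *current = nullptr;
   return current;
}

// The macro keeps the variable-like spelling working for existing code.
#define gToyDirectory (CurrentDirectory())

// Runs a worker thread that changes its own current directory and
// returns the name that the worker saw afterwards.
const char *SetInWorker(ToyDirectory *dir)
{
   assert(dir != nullptr);
   const char *seen = nullptr;
   std::thread worker([&] {
      gToyDirectory = dir;           // only the worker's copy changes
      seen = gToyDirectory->fName;
   });
   worker.join();
   return seen;
}
```

After SetInWorker returns, the calling thread still sees whatever it assigned to gToyDirectory before the call. This is also why a pointer left behind in one thread is not cleared when another thread deletes the object it points to.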
TMemFile
Introduce TMemFile and update TFileMerger to support incremental merges.
Add new tutorials (net/treeClient.C + net/fastMergeServer.C)
demonstrating how a TMemFile can be used to do parallel merge
from many clients. (TMemFile still needs to be better integrated
with TMessage and TSocket).
The new TMemFile class supports the TFile interface but only stores
the information in memory. This version is limited to 32MB.
TMessage *mess;
...
mess->ReadFastArray(scratch,length);
transient = new TMemFile("hsimple.memroot",scratch,length);
This copies the content of 'scratch' into the in-memory buffer
created by/for the TMemFile.
TMemFile *file = new TMemFile("hsimple.memroot","RECREATE");
This creates an empty in-memory file of (currently fixed) size 32MB.
file->ResetAfterMerge(0);
This resets the objects in the TDirectory list of objects
so that they are ready for more data accumulation (i.e. it
returns the data to 0 but keeps the customizations).
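The accumulate, merge, reset cycle can be pictured with a toy object that has data (bin contents) and a customization (a title). This is a standalone sketch rather than TMemFile code; ToyHist and UploadAndReset are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: a hypothetical stand-in for a mergeable object.
struct ToyHist {
   std::string fTitle;        // customization, kept across resets
   std::vector<long> fBins;   // accumulated data, cleared on reset

   ToyHist(const std::string &title, std::size_t nbins) : fTitle(title), fBins(nbins, 0) {}

   void Fill(std::size_t bin) { ++fBins.at(bin); }

   // Adds the content of another histogram with the same binning.
   void Merge(const ToyHist &other)
   {
      assert(other.fBins.size() == fBins.size());
      for (std::size_t i = 0; i < fBins.size(); ++i) fBins[i] += other.fBins[i];
   }

   // Returns the data to 0 but leaves the title untouched.
   void ResetAfterMerge() { fBins.assign(fBins.size(), 0); }
};

// One upload step: merge the partial result into the total, then reset the
// partial object so that the next upload does not count the same data twice.
void UploadAndReset(ToyHist &total, ToyHist &partial)
{
   total.Merge(partial);
   partial.ResetAfterMerge();
}
```

Without the reset, every upload would add the full history again, so the merged total would overcount.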
TFile::MakeProject
- New option 'par' to pack the generated code in a PAR file.
The first argument defines the directory and the name of the package.
For example, the following generates a PAR package equivalent to
tutorials/proof/event.par:
root [] TFile *f = TFile::Open("http://root.cern.ch/files/data/event_1.root")
root [] f->MakeProject("packages/myevent.par", "*", "par");
Note that, because a PAR file is a tarball, for the time being, on Windows
only the package directory and the files are generated and a warning message
is printed.
- Properly handle the case of a class whose version is zero and properly initialize arrays of objects (streamerElement type kStreamLoop).
- Fix support for calls to MakeProject like:
gFile->MakeProject("./classCode/","*","RECREATE++")
- Better error handling if the source file fails to be created
or if the project directory cannot be created.
TParallelMergingFile
Introduce the class TParallelMergingFile, part of the net package. This class connects to a parallel merge server
and uploads its content every time Write is called on the file object. After the upload, the objects of classes
with a ResetAfterMerge function are reset.
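Choosing between a class's ResetAfterMerge hook and a plain Reset can be sketched at compile time. ROOT itself resolves this through wrapper functions registered with TClass, so the following is only an illustration of the selection rule, assuming the fallback to Reset(Option_t *) described in the notes on the reset interface. The forward declaration, the typedef and all function and class names below are stand-ins so that the sketch compiles without ROOT headers, and the empty option string passed to Reset is an assumption.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles without ROOT headers.
struct TFileMergeInfo;
typedef const char Option_t;

// Viable only if T has ResetAfterMerge(TFileMergeInfo *).
// The int parameter makes this overload the preferred one.
template <typename T>
auto ResetObject(T &obj, TFileMergeInfo *info, int)
   -> decltype(obj.ResetAfterMerge(info), void())
{
   obj.ResetAfterMerge(info);
}

// Fallback for classes that only offer Reset(Option_t *).
template <typename T>
void ResetObject(T &obj, TFileMergeInfo *, long)
{
   obj.Reset("");   // assumption: an empty option string
}

template <typename T>
void ResetForNextMerge(T &obj, TFileMergeInfo *info = nullptr)
{
   ResetObject(obj, info, 0);   // 0 is an int, so the first overload wins when viable
}

// Two toy classes that exercise both paths.
struct WithHook {
   int fCalls = 0;
   void ResetAfterMerge(TFileMergeInfo *) { ++fCalls; }
   void Reset(Option_t *) { fCalls = -1; }   // must not be chosen
};

struct WithoutHook {
   int fCalls = 0;
   void Reset(Option_t *) { ++fCalls; }
};
```

Calling ResetForNextMerge on a WithHook object uses the dedicated hook even though Reset is also available, while a WithoutHook object falls back to Reset.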
A TParallelMergingFile is created whenever a ?pmerge option is passed to TFile::Open as part of the file name.
For example:
TFile::Open("mergedClient.root?pmerge","RECREATE"); // For now contact localhost:1095
TFile::Open("mergedClient.root?pmerge=localhost:1095","RECREATE");
TFile::Open("rootd://root.cern.ch/files/output.root?pmerge=pcanal:password@localhost:1095","NEW");
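The three forms of the option can be told apart with simple string handling. The helper below is a hypothetical sketch, not the parser used by TFile::Open. It assumes that a bare ?pmerge means localhost:1095 (as the comment on the first example says) and it simply discards any user:password@ prefix.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not ROOT code.
struct MergerAddress {
   bool fRequested;     // true if the name carries a pmerge option
   std::string fHost;
   int fPort;
};

MergerAddress ParsePMerge(const std::string &name)
{
   MergerAddress result = {false, "localhost", 1095};   // assumed defaults
   const std::string key = "?pmerge";
   const std::string::size_type pos = name.find(key);
   if (pos == std::string::npos) return result;
   result.fRequested = true;

   const std::string::size_type start = pos + key.size();
   if (start >= name.size() || name[start] != '=') return result;   // bare "?pmerge"
   std::string value = name.substr(start + 1);

   // Drop optional "user:password@" credentials; this sketch ignores them.
   const std::string::size_type at = value.rfind('@');
   if (at != std::string::npos) value = value.substr(at + 1);

   const std::string::size_type colon = value.rfind(':');
   if (colon == std::string::npos) {
      if (!value.empty()) result.fHost = value;
   } else {
      if (colon > 0) result.fHost = value.substr(0, colon);
      const std::string port = value.substr(colon + 1);
      if (!port.empty()) result.fPort = std::stoi(port);   // throws on non-numeric input
   }
   return result;
}
```

A real implementation would also have to cope with further URL options after the merger address; the sketch treats everything after the equals sign as the address.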
tutorials/net/treeClient.C and fastMergeServer.C: updated to follow the change in interfaces.
Introduce the tutorial parallelMergerClient.C and the temporary tutorial parallelMergerServer.C
to demonstrate parallel merging (with parallelMergerServer.C being the prototype of the upcoming
parallel merger server executable).
Other
- Introduce the new function TFileMerger::PartialMerge(Int_t) which
will merge the list of files _with_ the content of the output
file (if any). This allows making several successive merges
into the same TFile object.
The argument defines the type of merge as defined by the bit values in EPartialMergeType:
- kRegular : normal merge, overwriting the output file.
- kIncremental : merge the input files with the content of the output file (if it already exists) (default).
- kAll : merge all types of objects (default).
- kResetable : merge only the objects with a ResetAfterMerge member function.
- kNonResetable : merge only the objects without a ResetAfterMerge member function.
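Because the merge type is a set of bits, the options combine with bitwise OR. The sketch below shows how such flags could be tested. The numeric values, the assumption that kAll is the union of the two object classes, and the helper names ShouldMerge and KeepsExistingOutput are illustrative only; the authoritative values are those of EPartialMergeType.

```cpp
#include <cassert>

// Illustrative bit values only; see EPartialMergeType for the real ones.
enum ToyPartialMergeType {
   kRegular      = 0,        // assumption: absence of the incremental bit
   kIncremental  = 1 << 1,
   kResetable    = 1 << 2,
   kNonResetable = 1 << 3,
   kAll          = kResetable | kNonResetable
};

// Decides whether an object takes part in a merge of the given type,
// depending on whether its class has a ResetAfterMerge member function.
bool ShouldMerge(int type, bool hasResetAfterMerge)
{
   return (type & (hasResetAfterMerge ? kResetable : kNonResetable)) != 0;
}

// True if the existing content of the output file is merged rather than overwritten.
bool KeepsExistingOutput(int type)
{
   return (type & kIncremental) != 0;
}
```

With these values the documented default corresponds to kAll | kIncremental, while kResetable | kIncremental would add only the resettable objects to what the output file already holds.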
- Removed TFileMerger::RecursiveMerge from the interface.
- Prevent TFileMerger (and hadd) from trying to open too many files.
Add a new member function TFileMerger::SetMaxOpenedFiles and
new command line option to hadd ( -n requested_max ) to allow
the user to reduce the number of files opened even further.
- Update hadd and TFileMerger so that they prefix all their information messages
with their names (when running hadd, the TFileMerger messages are prefixed by hadd):
$ hadd -v 0 -f output.root input1.root input2.root
$ hadd -v 1 -f output.root input1.root input2.root
hadd merged 2 input files in output.root.
$ hadd -v 2 -f output.root input1.root input2.root
hadd target file: output.root
hadd Source file 1: input1.root
hadd Source file 2: input2.root
hadd Target path: output.root:/
- Introduce a non-static version of TFile::Cp, which allows copying
an existing TFile object.
-
Introduce a new explicit interface for providing resetting
capability after a merge. If a class has a method with
the name and signature:
void ResetAfterMerge(TFileMergeInfo*);
it will be used by a TMemFile to reset its objects after
a merge operation has been done.
If this method does not exist, the TClass will use
a method with the name and signature:
void Reset(Option_t *);
-
TClass now provides quick access to these merging
functions via TClass::GetResetAfterMerge. The wrapper function
is automatically created by rootcint and can be installed
via TClass::SetResetAfterMerge. The wrapper function should have
the signature/type ROOT::ResetAfterMergeFunc_t:
void (*)(void *thisobj, TFileMergeInfo*);
ResetAfterMerge functions were added to the following classes:
TDirectoryFile, TMemFile, TTree, TChain, TBranch, TBranchElement,
TBranchClones, TBranchObject and TBranchRef.
- Avoid leaking the inner object in a container like vector<vector<MyClass*> >
and vector<vector<MyClass*> *> .
- Put in place the infrastructure to optimize the I/O writes in the same way we optimized the I/O reads.
- Add the function TBuffer::AutoExpand to centralize the automatic
buffer extension policy. This makes it possible to tweak it later
(for example, instead of always doubling the size, increasing it by
at most 2MB or taking hints from the number of entries already
in a TBasket).
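The difference between the doubling policy and a capped alternative can be made concrete with two small functions. This is a sketch of the idea only, not the TBuffer implementation: ExpandDoubling and ExpandCapped are hypothetical names, and the capped variant is just one possible reading of "increasing by at most 2MB".

```cpp
#include <algorithm>
#include <cassert>

// Doubling policy: grow to twice the current size, or to the requested
// size if that is larger. Returns the new capacity in bytes.
int ExpandDoubling(int current, int needed)
{
   assert(current >= 0 && needed >= 0);
   if (needed <= current) return current;   // already large enough
   return std::max(needed, 2 * current);
}

// Hypothetical capped policy: double small buffers, but add at most
// maxStep bytes to large ones unless the request itself needs more.
int ExpandCapped(int current, int needed, int maxStep = 2 * 1024 * 1024)
{
   assert(current >= 0 && needed >= 0 && maxStep > 0);
   if (needed <= current) return current;
   const int grown = current + std::min(current, maxStep);
   return std::max(needed, grown);
}
```

For an 8MB buffer that needs one more byte, doubling allocates 16MB whereas the capped variant allocates 10MB; for small buffers the two behave identically. Keeping the decision in one function is what allows such a change to be made in a single place.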