I/O
File Format
Run time performance
We introduced an optimized infrastructure for reading objects using a StreamerInfo. Rather than driving the streaming using a switch statement inside TStreamerInfo::ReadBuffer,
the streaming is now driven using a simple loop over a sequence of configured StreamerInfo actions. This improves run-time performance by allowing a dramatic reduction in function calls and code
branches at the expense of some code duplication. There are 3 versions of this loop implemented in TBufferFile and overloaded in TBufferXML and TBufferSQL:
- virtual Int_t ReadSequence(const TStreamerInfoActions::TActionSequence &sequence, void *object);
- virtual Int_t ReadSequence(const TStreamerInfoActions::TActionSequence &sequence, void *start_collection, void *end_collection);
- virtual Int_t ReadSequence(const TStreamerInfoActions::TActionSequence &sequence, void *start_collection, void *end_collection);
The 1st version is optimized to read a single object. The 2nd version is optimized to read the content of TClonesArrays and vectors of pointers to objects. The 3rd version is used to stream any other collection.
TBufferXML and TBufferSQL overload the loops to introduce extra code that helps the buffer keep track of which streamer element is being streamed (TBufferFile does not use this functionality).
A TStreamerInfoActions::TActionSequence is an ordered sequence of configured actions.
A configured action has both an action which is a free standing function and a configuration object deriving
from TStreamerInfoActions::TConfiguration. The configuration contains information that is specific to the action
but varies from use to use, including the offset from the beginning of the object that needs to be updated.
Other examples of configuration include the number of bits requested for storing a Double32_t or its factor and minimum.
When the sequence is intended for a collection, the sequence also has a configuration object deriving
from TStreamerInfoActions::TLoopConfiguration which contains, for example, the size of the elements of
a vector or the pointers to the iterator functions (see below).
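The idea of a sequence of configured actions can be illustrated with a minimal, ROOT-independent sketch. All names below (Configuration, Buffer, ConfiguredAction, RunSequence, Point) are hypothetical stand-ins chosen for illustration and are not the actual TStreamerInfoActions classes; the sketch only assumes that a configuration carries the member offset, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a configuration: per-use data for an action.
struct Configuration {
   std::size_t fOffset;   // offset of the data member from the start of the object
};

// Minimal input buffer: a cursor over raw bytes (assumption of this sketch).
struct Buffer {
   const char *fCursor;
};

// An action is a free-standing function; the configuration varies from use to use.
using Action_t = int (*)(Buffer &buf, void *object, const Configuration &conf);

template <typename T>
int ReadBasicType(Buffer &buf, void *object, const Configuration &conf)
{
   // Copy sizeof(T) bytes from the buffer into the member located at conf.fOffset.
   std::memcpy(static_cast<char *>(object) + conf.fOffset, buf.fCursor, sizeof(T));
   buf.fCursor += sizeof(T);
   return 0;
}

struct ConfiguredAction {
   Action_t      fAction;
   Configuration fConfiguration;
};

using ActionSequence = std::vector<ConfiguredAction>;

// The driver is a plain loop over the sequence, with no switch on the member type.
int RunSequence(Buffer &buf, const ActionSequence &sequence, void *object)
{
   assert(object != nullptr);
   for (const ConfiguredAction &step : sequence) {
      step.fAction(buf, object, step.fConfiguration);
   }
   return 0;
}

struct Point {
   int    fX;
   double fY;
};

// Build the sequence for Point once, then read one object from raw bytes.
Point ReadPoint(const char *bytes)
{
   const ActionSequence sequence = {
      {&ReadBasicType<int>,    {offsetof(Point, fX)}},
      {&ReadBasicType<double>, {offsetof(Point, fY)}},
   };
   Buffer buf{bytes};
   Point p{};
   RunSequence(buf, sequence, &p);
   return p;
}
```

In this sketch the per-type decision is taken once, when the sequence is built, rather than for every member of every object read.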
Each TStreamerInfo has 2 reading sequences, one for object-wise reading (GetReadObjectWiseActions)
and one for member-wise reading (GetReadMemberWiseActions) which is used when streaming a TClonesArray
or a vector of pointers to the type of objects described by the TClass.
Each collection proxy has at least one reading sequence, one for reading each version of the
contained class layout.
Each case of the TStreamerInfo::ReadBuffer switch statement is replaced by 4 new action functions,
one for object-wise reading, one for member-wise reading for TClonesArray and vectors of pointers,
one for member-wise reading for a vector of objects and one for all other collections.
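To make the difference between object-wise and member-wise reading concrete, here is a small sketch of a member-wise loop over a vector of pointers. It assumes a buffer in which all values of one data member are stored contiguously before the values of the next member; the names Hit, ReadMemberVecPtr and ReadHitsMemberWise are hypothetical and do not correspond to ROOT functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Hit {
   int   fId;
   float fEnergy;
};

// Hypothetical member-wise action: fills ONE data member for every object
// referenced in [start, end), where start/end delimit an array of pointers.
template <typename T>
void ReadMemberVecPtr(const char *&cursor, void **start, void **end, std::size_t offset)
{
   for (void **iter = start; iter != end; ++iter) {
      assert(*iter != nullptr);
      std::memcpy(static_cast<char *>(*iter) + offset, cursor, sizeof(T));
      cursor += sizeof(T);
   }
}

// Read a member-wise record: all fId values first, then all fEnergy values.
void ReadHitsMemberWise(const char *bytes, const std::vector<Hit *> &hits)
{
   // Copy the addresses into an array of void* so the action stays type-agnostic.
   std::vector<void *> addresses(hits.begin(), hits.end());
   void **start = addresses.data();
   void **end   = start + addresses.size();
   const char *cursor = bytes;
   ReadMemberVecPtr<int>(cursor, start, end, offsetof(Hit, fId));
   ReadMemberVecPtr<float>(cursor, start, end, offsetof(Hit, fEnergy));
}
```

The outer loop runs over the members and the inner loop over the objects, which is why a separate action function per collection kind is useful: the inner loop can be specialized for the way the collection exposes its elements.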
Each collection (proxy) needs to provide 5 new free standing functions:
// Set of functions to iterate easily through the collection
static const Int_t fgIteratorArenaSize = 16; // greater than sizeof(void*) + sizeof(UInt_t)
typedef void (*CreateIterators_t)(void *collection, void **begin_arena, void **end_arena);
virtual CreateIterators_t GetFunctionCreateIterators(Bool_t read = kTRUE) = 0;
// begin_arena and end_arena should contain the location of a memory arena of size fgIteratorArenaSize.
// If the collection iterators are of that size or less, the iterators will be constructed in place in those locations
// (placement new). Otherwise the iterators will be allocated via a regular new and their addresses returned by
// modifying the values of begin_arena and end_arena.
typedef void* (*CopyIterator_t)(void *dest, const void *source);
virtual CopyIterator_t GetFunctionCopyIterator(Bool_t read = kTRUE) = 0;
// Copy the iterator source into dest. dest should contain the location of a memory arena of size fgIteratorArenaSize.
// If the collection iterator is of that size or less, the iterator will be constructed in place in this location
// (placement new). Otherwise the iterator will be allocated via a regular new and its address returned by
// modifying the value of dest.
typedef void* (*Next_t)(void *iter, const void *end);
virtual Next_t GetFunctionNext(Bool_t read = kTRUE) = 0;
// iter and end should be pointers to, respectively, an iterator to be incremented and the result of collection.end().
// 'Next' returns 0 if the iterator has already reached the end of the collection.
// Otherwise, 'Next' increments the iterator 'iter' and returns the address of the content the iterator pointed to
// before the increment; if the collection contains pointers, 'Next' returns the value of the pointer.
typedef void (*DeleteIterator_t)(void *iter);
typedef void (*DeleteTwoIterators_t)(void *begin, void *end);
virtual DeleteIterator_t GetFunctionDeleteIterator(Bool_t read = kTRUE) = 0;
virtual DeleteTwoIterators_t GetFunctionDeleteTwoIterators(Bool_t read = kTRUE) = 0;
// If the size of the iterator is greater than fgIteratorArenaSize, call delete on the addresses;
// otherwise just call the iterator's destructor.
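As an illustration of the contract described in the comments above, the following sketch implements three of the functions for a std::list<int>, whose iterator is assumed to fit in the arena (this is checked at compile time). Only the in-place (placement new) case is shown; the fallback to a regular new for larger iterators is omitted. The names kIteratorArenaSize and SumCollection are hypothetical, and the functions are free-standing examples, not the ones shipped with ROOT's collection proxies.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>

static const int kIteratorArenaSize = 16;   // mirrors fgIteratorArenaSize

using Collection = std::list<int>;
using Iterator   = Collection::iterator;

static_assert(sizeof(Iterator) <= kIteratorArenaSize, "iterator must fit in the arena");

// Matches CreateIterators_t: construct begin and end in place in the arenas.
void CreateIterators(void *collection, void **begin_arena, void **end_arena)
{
   assert(collection && *begin_arena && *end_arena);
   Collection *c = static_cast<Collection *>(collection);
   new (*begin_arena) Iterator(c->begin());
   new (*end_arena) Iterator(c->end());
}

// Matches Next_t: return 0 at the end, otherwise the address of the current
// element, and advance the iterator.
void *Next(void *iter, const void *end)
{
   Iterator *i = static_cast<Iterator *>(iter);
   const Iterator *e = static_cast<const Iterator *>(end);
   if (*i == *e) return nullptr;
   void *result = &(**i);
   ++(*i);
   return result;
}

// Matches DeleteTwoIterators_t: the iterators live in the arenas, so only
// their destructors are called; nothing is deleted.
void DeleteTwoIterators(void *begin, void *end)
{
   static_cast<Iterator *>(begin)->~Iterator();
   static_cast<Iterator *>(end)->~Iterator();
}

// Example client: walk the collection through the type-erased interface.
int SumCollection(Collection &c)
{
   alignas(std::max_align_t) char begin_buf[kIteratorArenaSize];
   alignas(std::max_align_t) char end_buf[kIteratorArenaSize];
   void *begin = begin_buf;
   void *end   = end_buf;
   CreateIterators(&c, &begin, &end);
   int sum = 0;
   while (void *item = Next(begin, end)) sum += *static_cast<int *>(item);
   DeleteTwoIterators(begin, end);
   return sum;
}
```

The client never sees the iterator type, which is what lets a single streaming loop serve any collection that provides these functions.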
TFile::MakeProject
- Extend TFile::MakeProject to support genreflex, cases of user's data model where
two distinct pointers point to a single object, and more cases where we are
missing the StreamerInfo and need to guess whether the symbol represents an enum,
a class or a namespace.
To use genreflex, call MakeProject with the "genreflex" option, for example:
file->MakeProject(libdir,"*","NEW+genreflex");
- To make sure the library created by MakeProject does not double delete an object,
tell the StreamerElement representing one of the pointers pointing to the object
to never delete the object. For example:
TClass::AddRule("HepMC::GenVertex m_event attributes=NotOwner");
- MakeProject now implements a move constructor for each class. For the implementation, we 'use' the 'copy constructor' until C++ compilers properly support the official move constructor notation. Implementing a move constructor avoids having to delete and reconstruct resources during a std::vector resize and avoids the double delete induced by using the default copy constructor.
- MakeProject now adds dictionaries for auto_ptr.
- MakeProject no longer requests the dictionary for std::pair instances that have already been loaded.
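The benefit of the move constructor mentioned in the list above can be sketched as follows. This example uses the C++11 move constructor notation rather than the copy-constructor-based workaround that MakeProject generates, and the class Track, its deletion counter and the function CountDeletesAfterResize are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical class owning a heap-allocated member through a raw pointer.
struct Track {
   int *fPayload;
   static int fgDeletes;   // counts how many payloads were actually deleted

   explicit Track(int value) : fPayload(new int(value)) {}

   // Move constructor: take over the pointer and leave the source empty,
   // so the source's destructor has nothing left to delete.
   Track(Track &&other) noexcept : fPayload(other.fPayload) { other.fPayload = nullptr; }

   // A default (member-wise) copy would duplicate the pointer and lead to a
   // double delete, so copying is disabled in this sketch.
   Track(const Track &) = delete;
   Track &operator=(const Track &) = delete;

   ~Track()
   {
      if (fPayload) {
         delete fPayload;
         ++fgDeletes;
      }
   }
};
int Track::fgDeletes = 0;

// Fill a vector (forcing several reallocations) and report the number of
// payload deletions once the vector has been destroyed.
int CountDeletesAfterResize(int n)
{
   assert(n >= 0);
   Track::fgDeletes = 0;
   {
      std::vector<Track> tracks;
      for (int i = 0; i < n; ++i) tracks.emplace_back(i);
   }
   return Track::fgDeletes;
}
```

Each payload is deleted exactly once, however many times the vector reallocates, because reallocation moves the elements instead of copying and destroying their resources.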
Misc.
- TFile::Open now does variable expansion so that you can include the protocol in the variable, for example:
export H1="http://root.cern.ch/files/h1"
...
TFile::Open("$H1/dstarmb.root");
- Added a warning if the file does not contain any StreamerInfo objects and was written with a different version of ROOT.
- Implemented polymorphism for emulated objects (still not supporting polymorphism of emulated objects inheriting from a compiled class). See the Core/Meta section for details.
- Add support for streaming auto_ptr when generating their dictionary via rootcint.
- Enable the use of the I/O customization rules on data members that are either a variable size array or a fixed size array. For example:
#pragma read sourceClass = "ACache" targetClass = "ACache" version = "[8]" \
source = "Int_t *fArray; Int_t fN;" \
target = "fArray" \
code = "{ fArray = new Char_t[onfile.fN]; Char_t* gtc=fArray; Int_t* gti=onfile.fArray; \
for(Int_t i=0; i<onfile.fN; i++) *(gtc+i) = *(gti+i)+10; }"
#pragma read sourceClass = "ACache" targetClass = "ACache" version = "[8]" \
source = "float fValues[3]" \
target = "fValues" \
code = "{ for(Int_t i=0; i<3; i++) fValues[i] = 1+onfile.fValues[i]; }"
- Allow the seamless schema evolution from map<a,b> to vector<pair<a,b> >.
- Avoid dropping information when a long written on a 64-bit platform
is read into a long long on a 32-bit platform (previously the higher
bits were lost due to passing through a 32-bit temporary long).
- Migrate the functionality of TStreamerInfo::TagFile to a new interface TBuffer::TagStreamerInfo
so that TMessage can customize the behavior. TMessage now relies on this new interface
instead of TBuffer::IncrementLevel.
- New option to hadd, -O, requesting the (re)optimization of the basket size (by avoiding the fast merge technique). The equivalent in TFileMerger is to call
merger->SetFastMethod(kFALSE)
- To make sure that the class emulation layer of ROOT does not double delete an object,
tell the StreamerElement representing one of the pointers pointing to the object
to never delete the object. For example:
TClass::AddRule("HepMC::GenVertex m_event attributes=NotOwner");
- The handling of memory by the collection proxy has been improved in the case of a
collection of pointers, which can now become the owner of its content.
By default, for backward compatibility reasons and to avoid double deletes (at the expense
of memory leaks), containers of pointers still do not own their content
unless they are free-standing containers (i.e. not themselves contained in another
object).
To make a container of pointers become the owner of its content, do something like:
TClass::AddRule("ObjectVector<LHCb::MCRichDigitSummary> m_vector options=Owner");
- Added TKey::Reset and TKey::WriteFileKeepBuffer to allow derived classes (TBasket) to be re-used as keys rather than always recreated.
- TH1::Streamer and TGraph2D::Streamer no longer reset the kCanDelete bit directly so that the user can give
ownership of the object to the canvas they are stored with. However, if they are saved on their own, the mechanism
that associates them to the current directory (DirectoryAutoAdd) will now reset the bit to avoid any possible
ownership confusion.
- Added TFile::SetOffset and TFile::ReadBuffer(char *buf, Long64_t pos, Int_t len); to drastically reduce
the number of fseek calls made on the physical file when using the TTreeCache.
- To support future changes in the API of the CollectionProxy, we added the new #defines:
ROOT_COLLECTIONPROXY_VERSION and REFLEX_COLLECTIONPROXY_VERSION
- Reduce possible confusion and conflicts by always using in TClass and TStreamerInfo the version of template instance names with ULong64_t and Long64_t rather than [unsigned] long long.
- New Hadoop TFile plugin.
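The loss of the higher bits described in the item about reading a 64-bit long into a long long (in the list above) can be reproduced with a few lines of arithmetic. This is only a sketch of the effect, not ROOT's reading code: the function names are hypothetical, and an unsigned 32-bit temporary is used so that the truncation is well defined in standard C++.

```cpp
#include <cassert>
#include <cstdint>

// Former behaviour (simulated): the 64-bit on-file value passes through a
// 32-bit temporary, so only the lower 32 bits survive.
std::int64_t ReadViaTemporary32(std::int64_t onfile)
{
   assert(onfile >= 0);   // keep the sketch to non-negative values
   std::uint32_t temp = static_cast<std::uint32_t>(onfile);   // modulo 2^32
   return static_cast<std::int64_t>(temp);
}

// Fixed behaviour (simulated): the value is transferred at full width.
std::int64_t ReadDirect(std::int64_t onfile)
{
   return onfile;
}
```

For a value such as 2^40 + 5, the first function returns 5 while the second returns the original value, which is the difference between the previous and the current behaviour.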