.\" Automatically generated by Pod::Man 2.22 (Pod::Simple 3.13)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "PMP::PMP 3"
.TH PMP::PMP 3 "2009-02-28" "perl v5.10.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
PMP \- Poor Man's Pipeline; programmatic pipeline control
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
The best synopsis is found in the tutorial:
http://wiki.bic.mni.mcgill.ca/index.php/PoorMansPipeline
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
\&\s-1PMP\s0 stands for \*(L"Poor Man's Pipeline\*(R" and is a Perl library that provides
control over arbitrarily complex commands linked through
dependencies. The main goals of \s-1PMP\s0 are:
.IP "\(bu" 4
Execution of a set of commands describing a pipeline
.IP "\(bu" 4
Tracking of dependencies between the different commands
.IP "\(bu" 4
Parallel execution mode by using one of two batch queueing systems
.IP "\(bu" 4
Drop-in interchange of parallel and sequential modes.
.IP "\(bu" 4
Generation of dependency graphs for easier debugging.
.IP "\(bu" 4
Full programmatic control over the pipeline. \s-1PMP\s0 is designed as a
series of Perl classes rather than as a separate language. The key
advantage of this approach is that generic pipelines can be written,
since argument parsing and all of Perl's control structures are
available to the user.
.IP "\(bu" 4
Easily customizable through the use of inheritance: switching a pipeline
between using a batch queueing system and not using one takes a change
to a single line of code.
.PP
The main feature currently not present, which might be added in the
near future, is:
.IP "\(bu" 4
Use of a database to track dependencies and pipeline status. Using a
database rather than the filesystem is a blessing in that it can
allow for faster execution times since there is much less file
access, and a curse in that it makes an application much less portable.
.SH "COMPONENTS"
.IX Header "COMPONENTS"
\&\s-1PMP\s0 currently consists of five different classes:
.IP "\(bu" 4
\&\s-1PMP::PMP\s0
.Sp
The main class which is used to configure a pipeline. A pipeline is,
for the purposes of \s-1PMP\s0, defined as the set of commands and their
dependencies for a single subject.
.IP "\(bu" 4
PMP::spawn
.Sp
A subclass of \s-1PMP\s0 in which the command execution uses the MNI::Spawn
batch system. Otherwise should be entirely exchangeable with \s-1PMP::PMP\s0.
.IP "\(bu" 4
PMP::pbs
.Sp
A subclass of \s-1PMP\s0 in which the command execution uses the \s-1PBS\s0 batch
queueing system rather than the MNI::Spawn interface. Otherwise
should be entirely exchangeable with \s-1PMP::PMP\s0.
.IP "\(bu" 4
PMP::sge
.Sp
A subclass of \s-1PMP\s0 in which the command execution uses the \s-1SGE\s0 (Sun
Grid Engine) batch queueing system. Otherwise should be entirely
exchangeable with \s-1PMP::PMP\s0.
.IP "\(bu" 4
PMP::Array
.Sp
Designed to deal with a set of pipelines. Most pipeline runs will
consist of multiple subjects executing the same set of commands;
PMP::Array is designed to make that easy.
.SH "OVERVIEW"
.IX Header "OVERVIEW"
The usual way of setting up a \s-1PMP\s0 pipeline is the following:
.PP
Import the necessary components through the use statement, e.g.:
.PP
.Vb 5
\&    use PMP::PMP;
\&    use PMP::spawn;
\&    use PMP::pbs;
\&    use PMP::sge;
\&    use PMP::Array;
.Ve
.PP
The pipe array (a PMP::Array object) is also declared at this early point:
.PP
.Vb 1
\&    my $pipes = PMP::Array\->new();
.Ve
.PP
Then comes any argument processing that your application might have
to deal with as well as setting up some global variables that will
remain unchanged for each pipeline. This is followed by the
definitions of each individual pipeline, usually placed inside a
foreach loop which processes each subject. Inside this loop the
pipeline is initialised like so:
.PP
.Vb 5
\&    # pick exactly one of the following constructors:
\&    my $pipeline = PMP::PMP\->new();     # sequential version (default spawn)
\&    # my $pipeline = PMP::pbs\->new();   # parallel version using PBS
\&    # my $pipeline = PMP::sge\->new();   # parallel version using SGE
\&    # my $pipeline = PMP::spawn\->new(); # sequential version using MNI::Spawn
.Ve
.PP
Then certain globals for that pipeline are set, such as
.PP
.Vb 2
\&    $pipeline\->name("some\-name");
\&    $pipeline\->statusDir("/some/directory");
.Ve
.PP
This is also a good place to define variables that change for
each subject, such as input and output filenames.
.PP
This is followed by defining all the stages through the addStage
method, an example of which is:
.PP
.Vb 6
\&    $pipeline\->addStage(
\&        { name => "total",
\&          label => "this does something interesting",
\&          inputs => [$filename],
\&          outputs => [$talTransform],
\&          args => ["mritotal", $filename, $talTransform] });
.Ve
.PP
This same stage can also be written more concisely:
.PP
.Vb 3
\&    $pipeline\->addStage(
\&        { name => "total",
\&          args => ["mritotal", "in:$filename", "out:$talTransform"] });
.Ve
.PP
After all the stages have been defined some further initialisation
commands can be run:
.PP
.Vb 6
\&    # compute the dependencies based on the filenames:
\&    $pipeline\->computeDependenciesFromInputs();
\&    # update the status of all stages based on previous pipeline runs
\&    $pipeline\->updateStatus();
\&    # restart all stages that failed in a previous run
\&    $pipeline\->resetFailures();
.Ve
.PP
Then the pipeline can be added to the pipe array:
.PP
.Vb 1
\&    $pipes\->addPipe($pipeline);
.Ve
.PP
The foreach loop can then be closed and the pipeline itself run:
.PP
.Vb 2
\&    # loop until all pipes are done
\&    $pipes\->run();
.Ve
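.PP
Put together, the steps above might look like the following sketch. The
subject names, file paths and status directory are hypothetical
placeholders, and whether the status directory has to exist beforehand is
not covered here. Only the method calls are taken from this page.
.PP
.Vb 30
\&    use strict;
\&    use warnings;
\&    use PMP::PMP;
\&    use PMP::Array;
\&
\&    my $pipes = PMP::Array\->new();
\&
\&    # hypothetical subject identifiers
\&    foreach my $subject ("subj01", "subj02") {
\&        # hypothetical per\-subject filenames
\&        my $filename     = "/data/$subject/t1.mnc";
\&        my $talTransform = "/data/$subject/t1_tal.xfm";
\&
\&        my $pipeline = PMP::PMP\->new();   # sequential version
\&        $pipeline\->name($subject);
\&        $pipeline\->statusDir("/data/$subject/status");
\&
\&        $pipeline\->addStage(
\&            { name => "total",
\&              args => ["mritotal", "in:$filename", "out:$talTransform"] });
\&
\&        $pipeline\->computeDependenciesFromInputs();
\&        $pipeline\->updateStatus();
\&        $pipeline\->resetFailures();
\&
\&        $pipes\->addPipe($pipeline);
\&    }
\&
\&    # loop until all pipes are done
\&    $pipes\->run();
.Ve
.PP
Replacing PMP::PMP with PMP::pbs or PMP::sge in the constructor line is
the only change the sketch should need to run in parallel, since the
subclasses are described as exchangeable with \s-1PMP::PMP\s0.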
.SH "PUBLIC METHODS"
.IX Header "PUBLIC METHODS"
.SS "new"
.IX Subsection "new"
Initialises a pipeline. Has to be the first method called. Takes no arguments.
.SS "addStage"
.IX Subsection "addStage"
Adds a stage definition to the pipeline. Takes a hash reference as its
argument. The hash has the following keys:
.IP "\(bu" 4
name
.Sp
The name of that particular stage. The name is what will be used to
address this stage for later usage (such as dependency tracking).
.IP "\(bu" 4
label
.Sp
A description of this stage. Entirely optional, and is only used when
generating dependency graphs. Some formatting codes are allowed,
especially for newlines: use \e\en.
.IP "\(bu" 4
inputs
.Sp
An array of the input filenames. Input files can be specified
explicitly in this array or within the args statement (see
below). Inputs and outputs can be used to define relationships between
stages.
.IP "\(bu" 4
outputs
.Sp
An array of output filenames. Output files can be specified explicitly
in this array or within the args statement (see below).
.IP "\(bu" 4
sge_opts
.Sp
A string which is directly passed to qsub when using the \s-1SGE\s0 execution
mode (and is ignored otherwise). The following string \*(L"\-l vf=2G\*(R"
would, for example, reserve 2 gigabytes of memory.
.IP "\(bu" 4
args
.Sp
An array containing the actual command that will be run when this
stage is executed. The first element is the program name; the
remaining elements are the options and filenames, in the order in which
that program expects them. If an element is prefixed with either in: or
out: (e.g. \*(L"in:$filename\*(R") it is considered to be an input to or
an output from this stage.
.IP "\(bu" 4
prereqs
.Sp
An optional array of stage names upon which this current stage
depends. Dependencies can also be computed based on relationships
between the inputs and outputs of different pipeline stages. In that
case only stages which would not be included through that mechanism
should be added manually to the prereqs array.
.IP "\(bu" 4
shellquote
.Sp
An optional boolean (0 or 1) which specifies whether shell-quoting
should be used in this stage. At the moment it only makes a difference
for PMP::pbs and PMP::sge. By default shell-quoting is turned off; this
flag has to be set for each stage which should use it.
.PP
An example of adding a stage would be:
.PP
.Vb 5
\&    $pipeline\->addStage(
\&        { name => "cls",
\&          label => "does something else that is interesting",
\&          args => ["classify_clean", "\-clobber", "\-clean_tags", 
\&                   "in:$final", "out:$cls"] });
.Ve
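.PP
The in: and out: prefixes can be pictured with the following standalone
sketch, which does not use \s-1PMP\s0 itself. The helper name split_args is
hypothetical, and stripping the prefix before the command is run is an
assumption inferred from the two equivalent mritotal examples in the
\&\s-1OVERVIEW\s0. The actual parsing inside \s-1PMP\s0 may differ.
.PP
.Vb 31
\&    use strict;
\&    use warnings;
\&
\&    # Hypothetical helper (not part of PMP): split an args array into the
\&    # command to run plus the files marked with in: and out: prefixes.
\&    sub split_args {
\&        my @args = @_;
\&        my (@command, @inputs, @outputs);
\&        foreach my $arg (@args) {
\&            if (index($arg, "in:") == 0) {
\&                my $file = substr($arg, 3);
\&                push @inputs, $file;
\&                push @command, $file;   # assumption: prefix is stripped
\&            }
\&            elsif (index($arg, "out:") == 0) {
\&                my $file = substr($arg, 4);
\&                push @outputs, $file;
\&                push @command, $file;   # assumption: prefix is stripped
\&            }
\&            else {
\&                push @command, $arg;
\&            }
\&        }
\&        return ([@command], [@inputs], [@outputs]);
\&    }
\&
\&    my ($command, $inputs, $outputs) =
\&        split_args("mritotal", "in:a.mnc", "out:a.xfm");
\&    print "command: @$command\en";
\&    print "inputs:  @$inputs\en";
\&    print "outputs: @$outputs\en";
.Ve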
.SS "statusDir"
.IX Subsection "statusDir"
Gets or sets the directory in which status files are placed. Status
files are used to keep track of each stage's completion status as
well as whatever messages the running of that stage produced. The
following files can thus be created for each stage during the
processing of a pipeline:
.IP "\(bu" 4
statusDir/pipelineName.stageName.running
.Sp
An empty file that is created while the stage is running or has been
submitted to the batch system. This file is removed once the stage
completes or crashes.
.IP "\(bu" 4
statusDir/pipelineName.stageName.finished
.Sp
An empty file that is created when a stage has completed successfully.
.IP "\(bu" 4
statusDir/pipelineName.stageName.failed
.Sp
An empty file that is created when a stage has exited with any value
other than zero.
.IP "\(bu" 4
statusDir/pipelineName.stageName.log
.Sp
A file that is created once a stage has finished and which holds the
messages printed to stdout and stderr during the execution of a job.
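.PP
The naming pattern above can be reproduced outside \s-1PMP\s0, for example to
inspect a status directory from another script. The following is a
minimal sketch, assuming the names are formed exactly as listed. The
helper status_files is hypothetical and is not a \s-1PMP\s0 method.
.PP
.Vb 14
\&    use strict;
\&    use warnings;
\&    use File::Spec;
\&
\&    # Hypothetical helper (not part of PMP): build the four documented
\&    # status file names for one stage of one pipeline.
\&    sub status_files {
\&        my ($statusDir, $pipelineName, $stageName) = @_;
\&        my $base = File::Spec\->catfile($statusDir, "$pipelineName.$stageName");
\&        return map { ($_ => "$base.$_") } qw(running finished failed log);
\&    }
\&
\&    my %files = status_files("/some/directory", "some\-name", "total");
\&    print "$_: $files{$_}\en" foreach sort keys %files;
.Ve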
.SS "name"
.IX Subsection "name"
Gets or sets the name of the pipeline (if an argument is supplied
then it sets the name to that argument).
.SS "debug"
.IX Subsection "debug"
Gets or sets whether debug messages will be printed. A value of 0
turns debugging off, anything else turns it on.
.SS "printUnfinished"
.IX Subsection "printUnfinished"
Prints the unfinished stages. If no arguments are supplied it prints
them tersely; if an argument is supplied it gives more detail about
each stage that is still unfinished.
.SS "computeDependenciesFromInputs"
.IX Subsection "computeDependenciesFromInputs"
Uses the input and output files of all stages to compute the
dependencies between stages. Should be called after all stages have
been added and before the pipeline is executed.
.SS "statusFromFiles"
.IX Subsection "statusFromFiles"
Sets the status of each stage based on its inputs and outputs (as
specified in addStage). A stage will be considered to have finished
if both the outputs and inputs exist and if the outputs are newer
than the inputs.
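.PP
The rule described above can be sketched in plain Perl as follows. This
is not the code \s-1PMP\s0 uses. The helper name outputs_up_to_date is
hypothetical, and comparing file modification times with a strict
\*(L"newer than\*(R" test is an assumption about how the rule is applied.
.PP
.Vb 26
\&    use strict;
\&    use warnings;
\&
\&    # Hypothetical helper (not part of PMP): true if every input and output
\&    # exists and every output is strictly newer than the newest input.
\&    sub outputs_up_to_date {
\&        my ($inputs, $outputs) = @_;
\&        foreach my $file (@$inputs, @$outputs) {
\&            return 0 unless \-e $file;
\&        }
\&        my $newest_input = 0;
\&        foreach my $file (@$inputs) {
\&            my $mtime = (stat($file))[9];   # modification time
\&            $newest_input = $mtime if $mtime > $newest_input;
\&        }
\&        foreach my $file (@$outputs) {
\&            return 0 unless (stat($file))[9] > $newest_input;
\&        }
\&        return 1;
\&    }
\&
\&    # hypothetical filenames
\&    my $done = outputs_up_to_date(["in.mnc"], ["out.xfm"]);
\&    my $state = $done ? "finished" : "not finished";
\&    print "$state\en";
.Ve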
.SS "updateStatus"
.IX Subsection "updateStatus"
Updates the status of each stage based on the status files. Should be
called after all the stages have been added and before the pipeline
is executed.
.SS "registerPrograms"
.IX Subsection "registerPrograms"
Registers all the programs used in the pipeline. The assumption is
that the first element of the args array that is passed to addStage
contains the program name. A benefit of registering the programs is
that \s-1PMP\s0 will die if any of the programs cannot be found in the
environment.
.SS "run"
.IX Subsection "run"
Runs one iteration of the pipeline. Returns a value of 0 when the
pipeline has no more stages that can be executed.
.SS "resetStage"
.IX Subsection "resetStage"
Takes a stage name as an argument and resets that stage's status so
that it becomes runnable again.
.SS "resetFailures"
.IX Subsection "resetFailures"
Resets all stages that have failed so that they can be run again.
.SS "resetFromStage"
.IX Subsection "resetFromStage"
Takes a stage name as an argument and resets all stages from that
stage onwards (including that stage itself).
.SS "resetAfterStage"
.IX Subsection "resetAfterStage"
Takes a stage name as an argument and resets all stages after that
stage (excluding that stage itself).
.SS "resetAll"
.IX Subsection "resetAll"
Resets all stages in the pipeline.
.SS "resetRunning"
.IX Subsection "resetRunning"
Resets all stages thought to be running.
.SS "createDotGraph"
.IX Subsection "createDotGraph"
Takes a filename as an argument; a graph description will be written to
that file. One can use dot (a tool that is part of graphviz) to
generate a graphical representation of the dependencies, for example:
dot \-Tps filename \-o output.ps.
.SS "createFilenameDotGraph"
.IX Subsection "createFilenameDotGraph"
Takes a filename as an argument, as well as an optional further argument
giving a substring to be removed from the filenames. It then
creates a dot file for generating a graph of the filename
dependencies.
.SS "printStatusReportHeader"
.IX Subsection "printStatusReportHeader"
Takes a filehandle reference as an argument, and prints a header in \s-1CSV\s0
format containing all the stage names to that filehandle.
.SS "printStatusReport"
.IX Subsection "printStatusReport"
Takes a filehandle reference as an argument, and prints the status for
each stage in \s-1CSV\s0 format to that filehandle.
.SS "printStage"
.IX Subsection "printStage"
Takes a stage name as an argument and prints information about that
stage.
.SS "printStages"
.IX Subsection "printStages"
Prints all stages in the pipeline.
.SS "getPipelineStatus"
.IX Subsection "getPipelineStatus"
Gets the pipeline's status, returning one of four possible strings:
.IP "\(bu" 4
\&\*(L"not started\*(R": the pipeline has not yet been started.
.IP "\(bu" 4
\&\*(L"running\*(R": the pipeline is running; a list of the stages that are
currently running is also returned.
.IP "\(bu" 4
\&\*(L"failed\*(R": the pipeline has failed; a list of the stages that have
failed is also returned.
.IP "\(bu" 4
\&\*(L"finished\*(R": the pipeline has finished.
.SS "subsetToStage"
.IX Subsection "subsetToStage"
Takes a stage name as an argument, and creates a subset of stages
running from the beginning of the pipeline up to that stage.
.SH "SEMI-PRIVATE METHODS"
.IX Header "SEMI-PRIVATE METHODS"
In the good old Perl tradition \s-1PMP\s0 has no private methods. The
methods listed here, however, are not really meant for the
calling program. Most should not do any harm, but there is no
guarantee. In other words, use at your own risk.
.SS "stageStatusFromFiles"
.IX Subsection "stageStatusFromFiles"
Takes a stage as an argument and sets the status of that stage to
finished if it has all inputs and outputs and the outputs are newer
than the inputs.
.SS "printDependencyTree"
.IX Subsection "printDependencyTree"
Prints the dependency tree. Sort of. The issue is that dependencies run
both downwards and rightwards. In other words, a stage that appears in
this tree is guaranteed not to depend on any stage to its right or
below it. The output is a bit hard to read, which is why this is still
considered a semi-private method.
.SS "sortStages"
.IX Subsection "sortStages"
Sorts the stages based on their dependencies. Gets called
automatically when needed, so has no real place in user space. The
order only guarantees that a stage does not depend on any of the
following stages.
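.PP
An ordering with this guarantee is what a topological sort produces. The
following standalone sketch shows one way to obtain such an order with a
depth-first traversal. It is not the algorithm \s-1PMP\s0 uses, and the helper
name sort_by_dependencies and the stage name resample are made up for
illustration.
.PP
.Vb 25
\&    use strict;
\&    use warnings;
\&
\&    # Hypothetical helper (not part of PMP): order stage names so that no
\&    # stage depends on a stage that comes after it. $prereqs maps each
\&    # stage name to an array reference of the stages it depends on.
\&    # Cyclic dependencies are not detected in this sketch.
\&    sub sort_by_dependencies {
\&        my ($prereqs) = @_;
\&        my (%seen, @order);
\&        my $visit;
\&        $visit = sub {
\&            my ($stage) = @_;
\&            return if $seen{$stage}++;
\&            $visit\->($_) foreach @{ $prereqs\->{$stage} || [] };
\&            push @order, $stage;
\&        };
\&        $visit\->($_) foreach sort keys %$prereqs;
\&        return @order;
\&    }
\&
\&    # "resample" is a made\-up stage name
\&    my @order = sort_by_dependencies(
\&        { total => [], resample => ["total"], cls => ["resample"] });
\&    print "@order\en";   # prints: total resample cls
.Ve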
.SS "isStageFinished"
.IX Subsection "isStageFinished"
Takes a stage name as an argument and returns true if the stage has
finished. In \s-1PMP\s0 it first checks whether the status flag has been set
to finished and, if not, whether the finished file exists for that
stage in the statusDir. Would have to be overridden in a subclass
that uses a database to track the pipeline's status.
.SS "isStageRunning"
.IX Subsection "isStageRunning"
Same as above but checks whether the stage is running.
.SS "isStageFailed"
.IX Subsection "isStageFailed"
Same as above but checks whether the stage has failed.
.SS "updateStageStatus"
.IX Subsection "updateStageStatus"
Takes a stage name as the argument and updates its status. Called
automatically when needed and therefore has no place in userland.
.SS "execStage"
.IX Subsection "execStage"
Takes a stage name as the argument and executes that stage.
.SS "execAllStages"
.IX Subsection "execAllStages"
Executes all stages in one lumped job.
.SS "getStatusBase"
.IX Subsection "getStatusBase"
Takes a stage name as an argument and returns the base for its status files.
.SS "getRunningFile"
.IX Subsection "getRunningFile"
Takes a stage name as an argument and returns the running filename for
that stage.
.SS "getFailedFile"
.IX Subsection "getFailedFile"
Takes a stage name as an argument and returns the failed filename for
that stage.
.SS "getFinishedFile"
.IX Subsection "getFinishedFile"
Takes a stage name as an argument and returns the finished filename
for that stage.
.SS "getLogFile"
.IX Subsection "getLogFile"
Takes a stage name as an argument and returns the log filename for
that stage.
.SS "declareStageRunning"
.IX Subsection "declareStageRunning"
Takes a stage name as an argument and declares that stage to be
running. Touches the appropriate filename.
.SS "declareStageFailed"
.IX Subsection "declareStageFailed"
Same as above but for failure.
.SS "declareStageFinished"
.IX Subsection "declareStageFinished"
Same as above but for successful completion.