.\" Automatically generated by Pod::Man 2.22 (Pod::Simple 3.13)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. nr % 0
. rr F
.\}
.el \{\
. de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
.if n \{\
. ds #H 0
. ds #V .8m
. ds #F .3m
. ds #[ \f1
. ds #] \fP
.\}
.if t \{\
. ds #H ((1u-(\\\\n(.fu%2u))*.13m)
. ds #V .6m
. ds #F 0
. ds #[ \&
. ds #] \&
.\}
. \" simple accents for nroff and troff
.if n \{\
. ds ' \&
. ds ` \&
. ds ^ \&
. ds , \&
. ds ~ ~
. ds /
.\}
.if t \{\
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "MNI::PathUtilities 3"
.TH MNI::PathUtilities 3 "1997-10-03" "perl v5.10.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
MNI::PathUtilities \- recognize, parse, and tweak POSIX file and path names
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& use MNI::PathUtilities qw(:all);
\&
\& normalize_dirs ($dir1, $dir2, ...);
\&
\& ($dir, $base, $ext) = split_path ($path);
\& ($dir, $base, $ext) = split_path ($path, \*(Aqfirst\*(Aq); # the default
\& ($dir, $base, $ext) = split_path ($path, \*(Aqlast\*(Aq);
\& ($dir, $base, $ext) = split_path ($path, \*(Aqlast\*(Aq, \e@skip_ext);
\& ($dir, $base) = split_path ($path, \*(Aqnone\*(Aq);
\&
\& @files = replace_dir ($newdir, @files);
\& $file = replace_dir ($newdir, $file);
\&
\& @files = replace_ext ($newext, @files);
\& $file = replace_ext ($newext, $file);
\&
\& @dirs = merge_paths (@dirs);
\&
\& $path = expand_path ($path) || exit 1;
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
\&\fIMNI::PathUtilities\fR provides a collection of subroutines for doing
common string transformations and matches on Unix/POSIX filenames. I use
\*(L"filenames\*(R" here in the generic sense of either a directory name, a bare
filename, or a complete path to a file. It should be clear from context
what meaning you (or the code) should attach to a given string; if it's
not, that's a documentation bug, so please holler at me.
.PP
Throughout this module, directories are usually treated as something to be
directly concatenated onto a bare filename, i.e. they either end with a
slash or are empty. (The exception is \f(CW\*(C`merge_paths\*(C'\fR, which returns a
list of directories ready to be \f(CW\*(C`join\*(C'\fR'd and stuffed into something like
\f(CW$ENV{\*(AqPATH\*(Aq}\fR\-\-\-for this, you want '.' for the current directory, and
no trailing slashes.) You generally don't have to worry about doing this
for the benefit of the \fIMNI::PathUtilities\fR subroutines; they use
\f(CW\*(C`normalize_dirs\*(C'\fR to take care of it for you.
However, you might want to use \f(CW\*(C`normalize_dirs\*(C'\fR in your own code to
spare yourself the trouble of converting empty strings to '.' and sticking
in slashes.
.PP
Error handling is not a worry in this module; the criterion for a
subroutine going in \fIMNI::PathUtilities\fR (as opposed to
\&\fIMNI::FileUtilities\fR) is that it not explicitly interact with the
filesystem, so there aren't many opportunities for errors to occur. (But
see \f(CW\*(C`expand_path\*(C'\fR for one routine that does have to worry about error
handling.)
.SH "EXPORTS"
.IX Header "EXPORTS"
By default, \fIMNI::PathUtilities\fR exports no symbols. You can import in
the usual one-name-at-a-time way like this:
.PP
.Vb 1
\& use MNI::PathUtilities qw(normalize_dirs split_path);
.Ve
.PP
or you can import everything using the \f(CW\*(C`all\*(C'\fR export tag:
.PP
.Vb 1
\& use MNI::PathUtilities qw(:all);
.Ve
.SH "SUBROUTINES"
.IX Header "SUBROUTINES"
.IP "normalize_dirs (\s-1DIR\s0, ...)" 4
.IX Item "normalize_dirs (DIR, ...)"
Each \s-1DIR\s0 (a simple list of strings\-\-\-no references here) is modified
in-place so that it can be concatenated directly to a filename to form a
complete path. This just means that we append a slash to each string,
unless it already has a trailing slash or is empty.
.Sp
For example, the following table shows how \f(CW\*(C`normalize_dirs\*(C'\fR will modify
the contents of a passed-in variable:
.Sp
.Vb 5
\& if input value is...    it will become...
\& \*(Aq.\*(Aq                     \*(Aq./\*(Aq
\& \*(Aq\*(Aq                      \*(Aq\*(Aq
\& \*(Aq/foo/bar\*(Aq              \*(Aq/foo/bar/\*(Aq
\& \*(Aq/foo/bar/\*(Aq             \*(Aq/foo/bar/\*(Aq
.Ve
.Sp
If you try to pass a constant string to \f(CW\*(C`normalize_dirs\*(C'\fR, Perl will die
with a \*(L"Modification of a read-only value attempted\*(R" error message. So
don't do that.
.IP "split_path (\s-1PATH\s0 [, \s-1EXT_OPT\s0 [, \s-1SKIP_EXT\s0]])" 4
.IX Item "split_path (PATH [, EXT_OPT [, SKIP_EXT]])"
Splits a Unix/POSIX path into directory, base filename, and extension.
(The extension always starts with some dot after the last slash; which dot
is chosen depends on \s-1EXT_OPT\s0 and \s-1SKIP_EXT\s0. By default, it splits on the
first dot in the filename.)
.Sp
\&\f(CW\*(C`split_path\*(C'\fR is normally called like this:
.Sp
.Vb 1
\& ($dir,$base,$ext) = split_path ($path);
.Ve
.Sp
If there is no directory (i.e. \f(CW$path\fR refers implicitly to a file in the
current directory), then \f(CW$dir\fR will be the empty string. Otherwise,
\&\f(CW$dir\fR will be the head of \f(CW$path\fR up to and including the last slash.
Usually, you can count on \f(CW\*(C`split_path\*(C'\fR to do the right thing; you should
only have to read the next couple of paragraphs if you're curious about
the exact rules it uses, or if you need to customize how it picks the
extension.
.Sp
If \s-1EXT_OPT\s0 is supplied, it must be one of \f(CW\*(Aqfirst\*(Aq\fR, \f(CW\*(Aqlast\*(Aq\fR, or
\&\f(CW\*(Aqnone\*(Aq\fR. It defaults to \f(CW\*(Aqfirst\*(Aq\fR, meaning that \f(CW$ext\fR will start at
the first period after the last slash in \s-1PATH\s0, and go to the end of the
string. If \s-1EXT_OPT\s0 is \f(CW\*(Aqlast\*(Aq\fR, then \f(CW$ext\fR will start at the \fIlast\fR
period after the last slash, unless \s-1SKIP_EXT\s0 is supplied (see below). If
\&\s-1EXT_OPT\s0 is \f(CW\*(Aqnone\*(Aq\fR, then \f(CW$ext\fR will be undefined and any extensions in
\&\f(CW$path\fR will be rolled into \f(CW$base\fR. Finally, if there are no extensions
at all in \s-1PATH\s0, then \f(CW$ext\fR will be undefined whatever the value of
\&\s-1EXT_OPT\s0.
.Sp
\&\s-1SKIP_EXT\s0, if supplied, must be a reference to a list of extensions to
ignore when deciding which extension is the last one. Thus, it only
affects things if \s-1EXT_OPT\s0 is \f(CW\*(Aqlast\*(Aq\fR. For example, splitting
\&\f(CW\*(Aqfoo_bar.mnc.gz\*(Aq\fR with the \*(L"last extension\*(R" option would return
\&\f(CW\*(Aqfoo_bar.mnc\*(Aq\fR as the basename, and \f(CW\*(Aq.gz\*(Aq\fR as the extension.
Most likely, you want \f(CW\*(C`split_path\*(C'\fR to skip over \f(CW\*(Aq.gz\*(Aq\fR while finding
the extension, and treat the dot before \f(CW\*(Aqmnc.gz\*(Aq\fR as the \*(L"last\*(R" dot.
This can be done by including \f(CW\*(Aqgz\*(Aq\fR in the \s-1SKIP_EXT\s0 list:
.Sp
.Vb 1
\& ($dir,$base,$ext) = split_path ($path, \*(Aqlast\*(Aq, [qw(gz z Z)]);
.Ve
.Sp
This works by repeatedly attempting to strip off a trailing
\&\f(CW\*(C`/\e.(gz|z|Z)/\*(C'\fR from \s-1PATH\s0 before searching for the \*(L"last dot\*(R"
to find the extension. After the remaining extension is extracted, the
\*(L"skipped\*(R" extensions are appended to it in order to preserve the entire
original pathname. This method can be used to parse \f(CW\*(Aqfoo.bar.pgp.gz\*(Aq\fR
or \f(CW\*(Aqfoo.bar.gz.pgp\*(Aq\fR, assuming that both \f(CW\*(Aqpgp\*(Aq\fR and \f(CW\*(Aqgz\*(Aq\fR
are in the \s-1SKIP_EXT\s0 list (in any order).
.Sp
(Note that even though the return value \f(CW$ext\fR includes a leading dot,
you should not put leading dots on the extensions in \s-1SKIP_EXT\s0. The idea
is to maximize your convenience on both ends: it is easiest to type a list
of extensions without dots, and including a dot on the output side means
you can reconstruct the original path by just concatenating the three
return values.)
.Sp
Finally, \f(CW$base\fR is just the portion of \f(CW$path\fR left after pulling off
\&\f(CW$dir\fR and \f(CW$ext\fR\-\-\-i.e., from the last slash to the first period (if
\&\f(CW\*(C`EXT_OPT\*(C'\fR is \f(CW\*(Aqfirst\*(Aq\fR), or from the last slash to the last period
excluding skipped extensions (if \f(CW\*(C`EXT_OPT\*(C'\fR is \f(CW\*(Aqlast\*(Aq\fR).
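.Sp
As an illustration only, the rules above can be condensed into a short Perl
sketch. It is not the module's own source: the subroutine name
\f(CW\*(C`split_path_sketch\*(C'\fR is hypothetical, and its treatment of a filename
whose only extensions are all in \s-1SKIP_EXT\s0 (such as \f(CW\*(Aqfoo.gz\*(Aq\fR with
\&\f(CW\*(C`[qw(gz)]\*(C'\fR) is an assumption, since the rules above do not cover that
case.
.Sp
.Vb 41
\& use strict;
\& use warnings;
\& use feature qw(say);
\&
\& # Hypothetical sketch of the rules described above; it is not
\& # taken from MNI::PathUtilities itself.
\& sub split_path_sketch {
\&     my ($path, $ext_opt, $skip) = @_;
\&     $ext_opt = "first" unless defined $ext_opt;
\&
\&     # Directory: everything up to and including the last slash
\&     # (the empty string if there is no slash at all).
\&     my ($dir, $file) = $path =~ m{^(.*/)?([^/]*)$}s;
\&     $dir = "" unless defined $dir;
\&
\&     # "none": any extensions stay in the basename.
\&     return ($dir, $file, undef) if $ext_opt eq "none";
\&
\&     # "last" with a skip list: peel skipped extensions (e.g. ".gz")
\&     # off the end, remembering them so they can be put back.
\&     my $skipped = "";
\&     if ($ext_opt eq "last" && $skip && @$skip) {
\&         my $alt = join "|", map { quotemeta($_) } @$skip;
\&         while ($file =~ s{([.](?:$alt))$}{}) {
\&             $skipped = $1 . $skipped;
\&         }
\&     }
\&
\&     # The extension starts at the first or last remaining dot.
\&     my $pos = $ext_opt eq "first" ? index($file, ".") : rindex($file, ".");
\&
\&     # No dot left: the extension is whatever was skipped, if anything
\&     # (an assumption; the documentation does not cover this case).
\&     return ($dir, $file, length $skipped ? $skipped : undef) if $pos < 0;
\&
\&     return ($dir, substr($file, 0, $pos), substr($file, $pos) . $skipped);
\& }
\&
\& my @parts = split_path_sketch("/foo/bar/zap.mnc.gz", "last", [qw(gz z Z)]);
\& say join " | ", map { defined $_ ? $_ : "undef" } @parts;
\& # prints: /foo/bar/ | zap | .mnc.gz
.Ve
.Sp
The final statement shows the skip list at work: \f(CW\*(Aq.gz\*(Aq\fR is set aside,
the last remaining dot (before \f(CW\*(Aqmnc\*(Aq\fR) starts the extension, and the
skipped part is appended again, so concatenating the three values gives
back the original path.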
.Sp
For example,
.Sp
.Vb 1
\& split_path ($path)
.Ve
.Sp
will split the \f(CW$path\fRs in the left-hand column into the lists shown on
the right:
.Sp
.Vb 5
\& \*(Aqfoo.c\*(Aq                  (\*(Aq\*(Aq, \*(Aqfoo\*(Aq, \*(Aq.c\*(Aq)
\& \*(Aq/unix\*(Aq                  (\*(Aq/\*(Aq, \*(Aqunix\*(Aq, undef)
\& \*(Aq/bin/ls\*(Aq                (\*(Aq/bin/\*(Aq, \*(Aqls\*(Aq, undef)
\& \*(Aq/foo/bar/zap.mnc\*(Aq       (\*(Aq/foo/bar/\*(Aq, \*(Aqzap\*(Aq, \*(Aq.mnc\*(Aq)
\& \*(Aq/foo/bar/zap.mnc.gz\*(Aq    (\*(Aq/foo/bar/\*(Aq, \*(Aqzap\*(Aq, \*(Aq.mnc.gz\*(Aq)
.Ve
.Sp
However, if you called it with an \s-1EXT_OPT\s0 of \f(CW\*(Aqlast\*(Aq\fR:
.Sp
.Vb 1
\& split_path ($path, \*(Aqlast\*(Aq)
.Ve
.Sp
then the last example would be split differently, like this:
.Sp
.Vb 1
\& \*(Aq/foo/bar/zap.mnc.gz\*(Aq    (\*(Aq/foo/bar/\*(Aq, \*(Aqzap.mnc\*(Aq, \*(Aq.gz\*(Aq)
.Ve
.Sp
But if you add a \s-1SKIP_EXT\s0 list to that example:
.Sp
.Vb 1
\& split_path ($path, \*(Aqlast\*(Aq, [qw(gz z Z)])
.Ve
.Sp
then we return to the original split:
.Sp
.Vb 1
\& \*(Aq/foo/bar/zap.mnc.gz\*(Aq    (\*(Aq/foo/bar/\*(Aq, \*(Aqzap\*(Aq, \*(Aq.mnc.gz\*(Aq)
.Ve
.Sp
If the filename, however, had been something like
\&\f(CW\*(Aqding.dong.mnc.gz\*(Aq\fR, where you want to treat \f(CW\*(Aqding.dong\*(Aq\fR as the
basename, then you would have to use an \s-1EXT_OPT\s0 of \f(CW\*(Aqlast\*(Aq\fR with a
\&\s-1SKIP_EXT\s0 list. (Despite this convention being at odds with most of the
Unix world, it appears to have some currency.)
.Sp
Finally, with an \s-1EXT_OPT\s0 of \f(CW\*(Aqnone\*(Aq\fR, filenames with extensions would
be split like this:
.Sp
.Vb 3
\& \*(Aqfoo.c\*(Aq                  (\*(Aq\*(Aq, \*(Aqfoo.c\*(Aq, undef)
\& \*(Aq/foo/bar/zap.mnc\*(Aq       (\*(Aq/foo/bar/\*(Aq, \*(Aqzap.mnc\*(Aq, undef)
\& \*(Aq/foo/bar/zap.mnc.gz\*(Aq    (\*(Aq/foo/bar/\*(Aq, \*(Aqzap.mnc.gz\*(Aq, undef)
.Ve
.Sp
Note that a \*(L"missing directory\*(R" becomes the empty string, whereas a
\&\*(L"missing extension\*(R" becomes \f(CW\*(C`undef\*(C'\fR.
This is not a bug; my rationale is that every path has a directory
component that may be empty, but a missing extension means there really
is no extension.
.Sp
See File::Basename for an alternate solution to this problem.
\&\f(CW\*(C`File::Basename\*(C'\fR is not specific to Unix paths, usually results in
nicer looking code (you don't have to do things like
\&\f(CW\*(C`(split_path($path))[1]\*(C'\fR to get the basename), and is part of the
standard Perl library; however, it doesn't deal with file extensions in
quite so flexible and generic a way as \f(CW\*(C`split_path\*(C'\fR.
.IP "replace_dir (\s-1NEWDIR\s0, \s-1FILE\s0, ...)" 4
.IX Item "replace_dir (NEWDIR, FILE, ...)"
Replaces the directory component of each \s-1FILE\s0 with \s-1NEWDIR\s0. You can
supply as many \s-1FILE\s0 arguments as you like; they are \fInot\fR modified in
place.
\&\s-1NEWDIR\s0 is first \*(L"normalized\*(R" so that it ends in a trailing slash
(unless it is empty), so you don't have to worry about doing this
yourself. (\f(CW\*(C`replace_dir\*(C'\fR does not modify its \s-1NEWDIR\s0 parameter,
though, so you might want to normalize it yourself if you're going to use
it for other purposes.)
.Sp
Returns the list of modified filenames; or, in a scalar context, returns
the first element of that list. (That way you can say either
\&\f(CW\*(C`@f = replace_dir ($dir, @f)\*(C'\fR or \f(CW\*(C`$f = replace_dir ($dir, $f)\*(C'\fR
without worrying too much about context.)
.Sp
For example,
.Sp
.Vb 1
\& @f = replace_dir (\*(Aq/tmp\*(Aq, \*(Aq/foo/bar/baz\*(Aq, \*(Aqblam\*(Aq, \*(Aq../bong\*(Aq)
.Ve
.Sp
sets \f(CW@f\fR to \f(CW\*(C`(\*(Aq/tmp/baz\*(Aq, \*(Aq/tmp/blam\*(Aq, \*(Aq/tmp/bong\*(Aq)\*(C'\fR, and
.Sp
.Vb 1
\& $f = replace_dir (\*(Aq/tmp\*(Aq, \*(Aq/foo/bar/baz\*(Aq)
.Ve
.Sp
sets \f(CW$f\fR to \f(CW\*(Aq/tmp/baz\*(Aq\fR.
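.Sp
A minimal Perl sketch of this behavior follows, for illustration only; it is
not the module's own code. The name \f(CW\*(C`replace_dir_sketch\*(C'\fR is
hypothetical, and it assumes the \*(L"directory component\*(R" of a file is
everything up to and including its last slash:
.Sp
.Vb 27
\& use strict;
\& use warnings;
\& use feature qw(say);
\&
\& # Hypothetical sketch; not taken from MNI::PathUtilities itself.
\& sub replace_dir_sketch {
\&     # Copying @_ leaves the NEWDIR argument of the caller untouched.
\&     my ($newdir, @files) = @_;
\&
\&     # Normalize the copy: append a slash unless empty or already present.
\&     $newdir .= "/" if $newdir ne "" && $newdir !~ m{/$};
\&
\&     my @out;
\&     for my $file (@files) {
\&         # Strip everything up to and including the last slash.
\&         (my $base = $file) =~ s{^.*/}{}s;
\&         push @out, $newdir . $base;
\&     }
\&
\&     # List context: all results; scalar context: just the first one.
\&     return wantarray ? @out : $out[0];
\& }
\&
\& my @f = replace_dir_sketch("/tmp", "/foo/bar/baz", "blam", "../bong");
\& say "@f";    # /tmp/baz /tmp/blam /tmp/bong
\& my $f = replace_dir_sketch("/tmp", "/foo/bar/baz");
\& say $f;      # /tmp/baz
.Ve
.Sp
Returning \f(CW\*(C`$out[0]\*(C'\fR when \f(CW\*(C`wantarray\*(C'\fR is false is what lets the same
call sit on the right of either a list or a scalar assignment.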
.IP "replace_ext (\s-1NEWEXT\s0, \s-1FILE\s0, ...)" 4
.IX Item "replace_ext (NEWEXT, FILE, ...)"
Replaces the final extension (whatever follows the last dot) of each \s-1FILE\s0
with \s-1NEWEXT\s0. You can supply as many \s-1FILE\s0 arguments as you like; they are
\&\fInot\fR modified in place.
.Sp
Returns the list of modified filenames; or, in a scalar context, returns
the first element of that list. (That way you can say either
\&\f(CW\*(C`@f = replace_ext ($ext, @f)\*(C'\fR or \f(CW\*(C`$f = replace_ext ($ext, $f)\*(C'\fR
without worrying too much about context.)
.Sp
For example,
.Sp
.Vb 1
\& replace_ext (\*(Aqxfm\*(Aq, \*(Aqblow_joe_mri.mnc\*(Aq)
.Ve
.Sp
in a scalar context returns \f(CW\*(Aqblow_joe_mri.xfm\*(Aq\fR; in a list context, it
would just return the one-element list \f(CW\*(C`(\*(Aqblow_joe_mri.xfm\*(Aq)\*(C'\fR.
.IP "merge_paths (\s-1DIRS\s0)" 4
.IX Item "merge_paths (DIRS)"
Goes through a list of directories, culling duplicates and converting them
to a form more amenable to stuffing in \s-1PATH\s0 variables and the like.
Basically, this means undoing the work of \f(CW\*(C`normalize_dirs\*(C'\fR: trailing
slashes are stripped, and empty strings are replaced by '.'.
.Sp
Returns the input list with duplicates removed (after those minor string
transformations).
.IP "expand_path (\s-1PATH\s0)" 4
.IX Item "expand_path (PATH)"
Expands user home directories (using the ~ notation) and environment
variables (using the $ notation) in a path.
.Sp
Home directories are expanded as follows: if \s-1PATH\s0 starts with a tilde
(~), the text from the tilde to the first slash or end of string (if no
slashes) is taken to be a username. If this username is empty (i.e.
\&\s-1PATH\s0 is just \f(CW\*(Aq~\*(Aq\fR or starts with \f(CW\*(Aq~/\*(Aq\fR), then the tilde is replaced by
the current user's home directory (from \f(CW$ENV{\*(AqHOME\*(Aq}\fR).
Otherwise, the username is looked up in the password file to find that
user's home directory, which then replaces the leading \f(CW\*(Aq~username\*(Aq\fR
in \s-1PATH\s0. If the username is unknown, \f(CW\*(C`expand_path\*(C'\fR prints a warning
and returns false.
.Sp
Environment variables are expanded as follows: any $ seen in \s-1PATH\s0
followed by a string of one or more letters, digits, and underscores is
replaced by the environment variable named by that string. If no such
variable is found, \f(CW\*(C`expand_path\*(C'\fR prints a warning and returns false.
.Sp
Note that the first call to \f(CW\*(C`expand_path\*(C'\fR that expands a home
directory other than that of the current user will involve a slight delay
as the entire password file is read in. This information is cached for
future invocations, though.
.SH "AUTHOR"
.IX Header "AUTHOR"
Greg Ward, .
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright (c) 1997 by Gregory P. Ward, McConnell Brain Imaging Centre,
Montreal Neurological Institute, McGill University.
.PP
This file is part of the \s-1MNI\s0 Perl Library. It is free software, and may
be distributed under the same terms as Perl itself.