smartie.py: CCP4 logfile parser tools

Usage documentation for version 0.0.15

Introduction

smartie is a set of Python classes and methods intended to provide tools for parsing the content of CCP4 logfiles. The name "smartie" reflects its origins as the driver for a "smart logfile browser", although this aim has not yet been realised.

The logfile class lies at the heart of smartie. Once populated from a file, a logfile object gives a high-level view of that file in terms of "components" (CCP4i comments, individual program logs, tables, warnings, summaries and so on). The logfile class was loosely inspired by the JavaScript DOM (Document Object Model) for describing hypertext documents, although it is far more limited at present.

To see an example of smartie parsing a logfile, feed it one of the example logfiles included in the distribution, e.g.:

% python smartie.py 7_refmac5.log

or all the example log files:

% python smartie.py *.log | more

or try it on a file of your own.

Module documentation

The documentation for the classes and functions in the smartie module (generated using pydoc) is here:

There are also overviews of the different classes below.

Usage examples

1. Interrogating logfiles

To create a new logfile object describing (say) a CCP4i logfile from a scala job, we use smartie's parselog function:

>>> import smartie
>>> logfile = smartie.parselog("22_scala.log")

We can then find out for example how many "fragments" smartie thought it had found in this file:

>>> logfile.nfragments()
8

A fragment is any chunk of the logfile that smartie recognises. We can interrogate the type of a fragment, for example:

>>> logfile.fragment(0).isccp4i_info()
True

If we're only interested in the fragments that look like individual program output, we can find out how many programs smartie found by querying its list of program logs:

>>> logfile.nprograms()
7

We can ask it some questions about individual programs, for example:

>>> logfile.program(1).isccp4()
True
>>> logfile.program(1).name
'Scala'
>>> logfile.program(1).version
'5.99'
>>> logfile.program(1).termination_message
'** Normal termination **'

It is also possible to get a list of the keyword input lines for each program logfile, for example:

>>> logfile.program(0).name
'SORTMTZ'
>>> logfile.program(0).keywords()
['ASCEND', 'H K L M/ISYM BATCH']

For each program any "logical name/filename" pairs found in the logfile are also stored and can be retrieved. To get a list of the logical names associated with files that were opened:

>>> logfile.program(1).logicalnames()
['HKLIN', 'HKLOUT', 'SYMINFO']

Then, to find out what the associated file is for a particular logical name:

>>> logfile.program(1).logicalnamefile("HKLIN")
'/home/pjx/PROJECTS/myProject/aucn_sorted.mtz'
>>> logfile.program(1).logicalnamefile("HKLOUT")
'/tmp/pjx/PROJECT_22_2_mtz.tmp'
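To make the idea of "logical name/filename pairs" concrete, here is a minimal standalone sketch (Python 3, not smartie's own code) that collects such pairs from raw logfile text. It assumes lines of the form "Logical name: X, Filename: Y", modelled on the summary example later in this document; real CCP4 programs may print these pairs in other layouts, and the names `LOGICAL_RE` and `logical_names` are hypothetical.

```python
import re

# Assumed line format (hypothetical), modelled on the summary example in this document:
#   Logical name: HKLIN, Filename: /path/to/file.mtz
LOGICAL_RE = re.compile(r"Logical name:\s*(\S+?),?\s+Filename:\s*(\S+)", re.IGNORECASE)


def logical_names(text):
    """Map each logical name found in text to its filename, in order of appearance."""
    pairs = {}
    for line in text.splitlines():
        match = LOGICAL_RE.search(line)
        if match:
            # A repeated logical name keeps its first position but takes the later filename
            pairs[match.group(1)] = match.group(2)
    return pairs


log = "\n".join([
    "Logical name: HKLIN, Filename: /home/pjx/PROJECTS/myProject/aucn_sorted.mtz",
    "Logical name: HKLOUT, Filename: /tmp/pjx/PROJECT_22_2_mtz.tmp",
])
names = logical_names(log)
print(list(names))
print(names["HKLOUT"])
```

A plain dict is used because it preserves insertion order (Python 3.7+), which mirrors the ordered list returned by `logicalnames()` above.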

The logfile class offers similar methods for fragments which are not CCP4 program output but are (for example) messages from CCP4i. Smartie also has a summarise function which will print a report of the logfile contents:

>>> logfile = smartie.parselog("22_scala.log")
>>> smartie.summarise(logfile)
Summary for 22_scala.log

This is a CCP4i logfile

8 logfile fragments

Fragments:
        CCP4i info
        Program: SORTMTZ
        Program: Scala
        Program: MTZDUMP
        Program: UNIQUE
        Program: FREERFLAG
        Program: CAD
        Program: FREERFLAG

7 program logfiles

Programs:
        SORTMTZ v5.99   (CCP4 5.99)
        Scala   v5.99   (CCP4 5.99)

                Tables:
                Table: ">>> Scales v rotation range, red_aucn"
                Table: "Analysis against Batch, red_aucn"
                Table: "Analysis against resolution , red_aucn"

...

The fragments and fragment-derived objects (programs and ccp4i_info) also allow the text of the fragment to be retrieved from the logfile. For example:

>>> logfile = smartie.parselog("22_scala.log")
>>> prog = logfile.program(0)
>>> print prog.retrieve()
 ###############################################################
 ###############################################################
 ###############################################################
 ### CCP4 5.99: SORTMTZ            version 5.99      : 06/09/05##
 ###############################################################
 User: pjx  Run date: 31/ 1/2006 Run time: 16:02:55

...
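The banner at the top of that output is what identifies the program name and version. As an illustration only, the following sketch (Python 3; `BANNER_RE` and `parse_banner` are hypothetical names, and smartie's own recognition may differ) pulls those values out of a banner line, assuming the layout of the SORTMTZ example above.

```python
import re

# Banner layout assumed from the SORTMTZ example:
#  ### CCP4 5.99: SORTMTZ            version 5.99      : 06/09/05##
BANNER_RE = re.compile(
    r"###\s*CCP4\s+(?P<ccp4>\S+):\s*(?P<name>\S+)\s+version\s+(?P<version>\S+)")


def parse_banner(line):
    """Return (program name, program version, CCP4 version), or None if no banner."""
    match = BANNER_RE.search(line)
    if match is None:
        return None
    return match.group("name"), match.group("version"), match.group("ccp4")


print(parse_banner(" ### CCP4 5.99: SORTMTZ            version 5.99      : 06/09/05##"))
```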

2. Extracting summaries from marked up logfiles

As of version 0.0.8, logfile objects store information about blocks of "summary" text that are found in the source logfile. A summary block is a section of logfile output that is enclosed within <!--SUMMARY_BEGIN--> and <!--SUMMARY_END--> tags, for example:

<B><FONT COLOR="#FF0000"><!--SUMMARY_BEGIN-->

================================================================================

Summary data for Project: DMSO Crystal: DMSO Dataset: red_aucn

                                           Overall  OuterShell

  Low resolution limit                       35.27      3.16
  High resolution limit                       3.00      3.00
...
<!--SUMMARY_END--></FONT></B>
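Locating summary blocks only needs the two marker tags. The sketch below is a standalone illustration (Python 3), not smartie's implementation: `find_summaries` is a hypothetical name, and it assumes line numbers count from 1 and that a block includes both the line with the BEGIN tag and the line with the END tag.

```python
SUMMARY_BEGIN = "<!--SUMMARY_BEGIN-->"
SUMMARY_END = "<!--SUMMARY_END-->"


def find_summaries(lines):
    """Return (start, end) line-number pairs, counting from 1, for each summary block.

    Both the BEGIN-tag line and the END-tag line belong to the block.
    A BEGIN tag that is never closed is ignored.
    """
    blocks = []
    start = None
    for number, line in enumerate(lines, start=1):
        if start is None and SUMMARY_BEGIN in line:
            start = number
        # Checked separately so a block can open and close on the same line
        if start is not None and SUMMARY_END in line:
            blocks.append((start, number))
            start = None
    return blocks


log_lines = [
    "Some ordinary program output",
    '<B><FONT COLOR="#FF0000"><!--SUMMARY_BEGIN-->',
    "Summary data for Project: DMSO Crystal: DMSO Dataset: red_aucn",
    "<!--SUMMARY_END--></FONT></B>",
    "More output",
]
for start, end in find_summaries(log_lines):
    # Convert the 1-based numbers back to list indices to recover the block text
    print("\n".join(log_lines[start - 1:end]))
```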

A number of methods are available to interrogate the summary block information. For example, to find out how many summary blocks a logfile contains:

>>> logfile = smartie.parselog("22_scala.log")
>>> logfile.nsummaries()
30

i.e. the logfile holds 30 summary blocks. For each summary block, the start and end lines in the source logfile can be retrieved, as can the text itself. For example, to get information on the 13th summary block in a logfile (summary blocks are indexed from zero):

>>> logfile.summary(12).start()
853
>>> logfile.summary(12).end()
855
>>> print logfile.summary(12).retrieve()
<B><FONT COLOR="#FF0000"><!--SUMMARY_BEGIN-->
Logical name: CORRELPLOT, Filename: /home/pjx/PROJECTS/myProject/PROJECT_22_correlplot.xmgr
<!--SUMMARY_END--></FONT></B>

It's not clear that this functionality is particularly useful. One application is to write out all the summaries in one go e.g.:

>>> for i in range(0,logfile.nsummaries()):
...    print logfile.summary(i).retrieve()
...

(In this last example, smartie's strip_logfile_html() function could also be used to remove any HTML tags from the output, and to escape any HTML special characters, in order to make the summary output easier to read.)

This functionality is provided in the show_summary.py example script.
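For readers curious about what removing the markup involves, here is a deliberately naive standalone sketch (Python 3) that deletes anything from a "<" to the next ">". It is not smartie's strip_logfile_html(): `strip_tags` is a hypothetical name, it does no escaping, and it would also delete legitimate text lying between a literal "<" and a later ">", which a more careful function has to handle.

```python
import re

# Matches anything from "<" to the next ">", which covers ordinary tags and
# comment-style markers such as <!--SUMMARY_BEGIN-->
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags(text):
    """Remove HTML-like tags from text, leaving everything else untouched."""
    return TAG_RE.sub("", text)


block = "\n".join([
    '<B><FONT COLOR="#FF0000"><!--SUMMARY_BEGIN-->',
    "Logical name: CORRELPLOT, Filename: /home/pjx/PROJECTS/myProject/PROJECT_22_correlplot.xmgr",
    "<!--SUMMARY_END--></FONT></B>",
])
print(strip_tags(block).strip())
```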

3. Working with tables and graphs

Once a logfile object has been constructed from a file, smartie offers various ways to find out about the tables associated with the file overall, and with individual programs and fragments.

We can ask it about tables that it found for an individual fragment or program:

>>> logfile.fragment(2).ntables()
7
>>> logfile.program(1).tables()[3].title()
'Analysis against intensity, red_aucn'
>>> logfile.program(1).tables()[4].ngraphs()
4
>>> logfile.program(1).tables()[4].table_graph(0).title()
'Completeness v Resolution '
>>> logfile.program(1).tables()[4].nrows()
10

We can also ask it similar questions about tables in the logfile as a whole, for example:

>>> logfile.ntables()
7

We can also fetch a table in a logfile or a program by specifying a regular expression pattern that matches the table title, for example:

>>> logfile = smartie.parselog("7_refmac5.log")
>>> logfile.tables("Rfactor analysis, stats vs cycle")[0].title()
'Rfactor analysis, stats vs cycle'
>>> logfile.program(0).tables("Cycle   11")[0].title()
'Cycle   11. Rfactor analysis, F distribution v resln'
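The "Cycle   11" example shows that the pattern only has to match part of a title. The sketch below imitates that behaviour on a plain list of titles, assuming `re.search`-style matching (consistent with the example, but not confirmed from smartie's source); `titles_matching` is a hypothetical helper. Because the argument is a regular expression, titles containing characters such as "." or "(" are safest passed through `re.escape` first.

```python
import re


def titles_matching(titles, pattern):
    """Return the titles in which the regular expression pattern occurs anywhere."""
    regex = re.compile(pattern)
    return [title for title in titles if regex.search(title)]


titles = [
    "Rfactor analysis, stats vs cycle",
    "Cycle   11. Rfactor analysis, F distribution v resln",
]
print(titles_matching(titles, "Cycle   11"))
# Escape a literal title fragment so "." is not treated as "any character"
print(titles_matching(titles, re.escape("Cycle   11.")))
```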

For a particular table we can get the values for a particular column:

>>> tbl = logfile.tables()[6]
>>> tbl.col("Rfree")
['0.178', '0.196', '0.204', '0.210', '0.215', '0.221', '0.222', '0.225',
'0.227', '0.228', '0.229']
>>> tbl.col("Rfree")[-1]
'0.229'
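Note that col() returns the values as strings, so numerical work needs an explicit conversion. To show what reading a $TABLE block involves, here is a standalone sketch (Python 3) that splits one block into title, graph definitions, column names and rows. It is not smartie's parser: `parse_loggraph`, `column` and `GRAPH_RE` are hypothetical names, and it assumes the simple layout of the loggraph example later in this document (a $GRAPHS section, then $$-separated column names, comment text and data).

```python
import re

# One graph definition, e.g. ":An arbitrary graph:A:1,2:" (title, scaling, column numbers)
GRAPH_RE = re.compile(r":([^:]+):([^:]+):([0-9,\s]+):")


def parse_loggraph(text):
    """Split a single $TABLE block into title, graphs, column names and rows.

    Assumed layout:
    $TABLE: title: $GRAPHS <graphs> $$ <column names> $$ <comment> $$ <data> $$
    """
    parts = text.split("$$")
    header, names, data = parts[0], parts[1], parts[3]
    # The title sits between the first two colons after "$TABLE"
    first_line = header.strip().splitlines()[0]
    title = first_line[len("$TABLE"):].lstrip().lstrip(":").partition(":")[0].strip()
    graphs = []
    _, _, graph_text = header.partition("$GRAPHS")  # empty if there is no $GRAPHS
    for graph_title, _scaling, numbers in GRAPH_RE.findall(graph_text):
        # Column numbers in graph definitions are 1-based
        graphs.append((graph_title.strip(), [int(n) for n in numbers.split(",")]))
    rows = [line.split() for line in data.splitlines() if line.strip()]
    return {"title": title, "graphs": graphs, "columns": names.split(), "rows": rows}


def column(table, name):
    """Return one column's values as strings, in row order."""
    index = table["columns"].index(name)
    return [row[index] for row in table["rows"]]


sample = "\n".join([
    "$TABLE: A table with random data:",
    "$GRAPHS",
    " :An arbitrary graph:A:1,2:",
    "$$",
    "  col_0  col_1 $$ $$",
    "      0      0",
    "      1      2",
    "      2      4",
    "$$",
])
table = parse_loggraph(sample)
values = [float(v) for v in column(table, "col_1")]  # convert strings for arithmetic
print(table["title"], table["graphs"], values)
```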

4. Using the table class to create tables and graphs

smartie's table, table_graph and table_columns classes are intended to be useful not only for reading tables from logfiles but also for constructing and writing them.

In outline the steps involved are:

  1. Create a new table e.g. tbl = smartie.table(title)
  2. Define a set of columns e.g. tbl.addcolumn(column_name)
  3. Add data to the table "row-wise" e.g. tbl.add_data(dictionary)
  4. (Optionally) add graph definitions e.g. tbl.definegraph(title,column_list)

Graph definitions are required in order to generate loggraph tables marked up with $TABLE, using the table.loggraph() and related methods.

An example of creating and populating a new table can be found in the smartie.table_example() function. Here is another example:

>>> tbl = smartie.table("A table with random data")
>>> for i in range(0,3):
...     col = tbl.addcolumn("col_"+str(i))
...
>>> for j in range(0,6):
...     tbl.add_data({"col_0":j,"col_1":j*2,"col_2":j*3})
...
>>> tbl.definegraph("An arbitrary graph",("col_0","col_1"))
>>> tbl.definegraph("Another arbitrary graph",("col_0","col_2"))
>>> print tbl.loggraph()
$TABLE: A table with random data:
$GRAPHS
 :An arbitrary graph:A:1,2:
 :Another arbitrary graph:A:1,3:
$$
  col_0  col_1  col_2 $$ $$
      0      0      0
      1      2      3
      2      4      6
      3      6      9
      4      8     12
      5     10     15
$$
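The output above follows a simple layout, so it can help to see it built by hand. The standalone sketch below (Python 3) reproduces that output; it is not smartie's loggraph() method. `build_loggraph` is a hypothetical name, and the column widths (longest entry plus two spaces, right-justified) are inferred from the example output rather than taken from smartie's source. In the loggraph format, "A" in a graph definition requests automatic axis scaling, and the empty section between the second and third $$ markers is where optional comment text would go.

```python
def build_loggraph(title, columns, rows, graphs):
    """Return a CCP4 loggraph $TABLE block as a string.

    title   -- table title
    columns -- list of column names, in order
    rows    -- list of dicts mapping column name -> value
    graphs  -- list of (graph title, list of column names) pairs
    """
    # Each column is right-justified to its longest entry plus two spaces
    widths = [max([len(name)] + [len(str(row[name])) for row in rows]) + 2
              for name in columns]
    lines = ["$TABLE: %s:" % title, "$GRAPHS"]
    for graph_title, graph_columns in graphs:
        # Graphs refer to columns by 1-based position; "A" asks for automatic scaling
        positions = ",".join(str(columns.index(c) + 1) for c in graph_columns)
        lines.append(" :%s:A:%s:" % (graph_title, positions))
    lines.append("$$")
    # Column names, then an empty comment section between the next two $$ markers
    lines.append("".join(n.rjust(w) for n, w in zip(columns, widths)) + " $$ $$")
    for row in rows:
        lines.append("".join(str(row[n]).rjust(w) for n, w in zip(columns, widths)))
    lines.append("$$")
    return "\n".join(lines)


# Rebuild the example table from the interactive session above
rows = [{"col_0": j, "col_1": j * 2, "col_2": j * 3} for j in range(6)]
print(build_loggraph("A table with random data",
                     ["col_0", "col_1", "col_2"], rows,
                     [("An arbitrary graph", ["col_0", "col_1"]),
                      ("Another arbitrary graph", ["col_0", "col_2"])]))
```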

If you want the table to be written with the JLogGraph applet markup also included, you can use the jloggraph() method instead of loggraph():

>>> print tbl.jloggraph()
<applet width="400" height="300" code="JLogGraph.class"
codebase=""><param name="table" value="
$TABLE: A table with random data:
$GRAPHS
 :An arbitrary graph:A:1,2:
 :Another arbitrary graph:A:1,3:
$$
  col_0  col_1  col_2 $$ $$
      0      0      0
      1      2      3
      2      4      6
      3      6      9
      4      8     12
      5     10     15
$$"><b>For inline graphs use a Java browser</b></applet>

Alternatively, the show method will just return the table body (column titles plus data) as a block of text without any additional markup, and the html method will return a similar table formatted with the appropriate HTML tags (and with any special characters converted to their HTML equivalents for correct display).
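As a rough illustration of what the HTML rendering involves, here is a standalone sketch (Python 3) that builds a plain HTML table and escapes special characters with the standard library's `html.escape`. It is not smartie's html() method; `html_table` is a hypothetical name and the exact tags smartie emits may differ.

```python
from html import escape


def html_table(columns, rows):
    """Render column titles and rows as a plain HTML table.

    Each value is passed through html.escape so characters such as <, > and &
    display correctly in a browser.
    """
    lines = ["<table>"]
    lines.append("<tr>" + "".join("<th>%s</th>" % escape(str(c)) for c in columns) + "</tr>")
    for row in rows:
        lines.append("<tr>" + "".join("<td>%s</td>" % escape(str(v)) for v in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(html_table(["col_0", "col_1"], [[0, 0], [1, 2]]))
```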

Overview of Smartie classes

Smartie offers the following principal classes:

A number of additional classes support these:

Finally there are also some classes which are primarily intended for use internally to smartie:

A logfile object is populated and returned by the parselog() function. This takes a file name as its single compulsory argument; the optional "progress" argument sets the interval, in lines, at which progress is reported while the file is parsed. It can recognise the following features in a logfile:

parselog reads a logfile and returns a logfile object based on the file contents. A logfile object holds lists of fragments, programs, tables, keytext messages, CCP4i information messages and summary blocks.
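The "progress" option amounts to reporting every N lines while reading. The sketch below (Python 3) shows that pattern in isolation, assuming "progress" is a reporting interval; it is not parselog() itself, and `scan` and its `report` callback are hypothetical names. A real parser would examine each line inside the loop.

```python
def scan(lines, progress=None, report=print):
    """Walk through lines, calling report every `progress` lines; return the line count.

    This sketch only counts lines; a real parser would examine each one here.
    """
    count = 0
    for count, _line in enumerate(lines, start=1):
        if progress and count % progress == 0:
            report("Processed %d lines" % count)
    return count


messages = []
total = scan(["some logfile line"] * 25, progress=10, report=messages.append)
print(total, messages)
```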

Applications using Smartie

Smartie is currently used in three applications:

Known issues/to-do list

Change log

Changes in 0.0.15

Changes in 0.0.14

Changes in 0.0.13

Changes in 0.0.12

Changes in 0.0.11

Changes in 0.0.10

Changes in 0.0.9

Changes in 0.0.8

Changes in 0.0.7

Changes in 0.0.6

Changes in 0.0.5

Changes in 0.0.4

Changes in 0.0.3

Changes in 0.0.2

See also

The CCP4 documentation contains details of the format and syntax for CCP4 tables, graphs and keytext messages:

Acknowledgements

Thanks to Ronan Keegan, Wendy Yang and Martyn Winn for providing useful input to the development of smartie.

Kevin Cowtan has also provided code changes and useful feedback.

Author

Peter Briggs, 2006-8