DB Handler Documentation

 

                                                           Wanjuan Yang

                                                             August 2007

 

 

Part 1:  Components of DB handler:

 

This section describes the components of DB handler and what each module does.

 

DB handler includes modules dbccp4i.py, manager.py, DBcommand.py, ccp4i.py, dbapi_sqlite.py.

 

dbccp4i.py: This is the main program. It is a server program. When it is running, it listens and waits for clients to connect. The initial port number that dbccp4i uses is 4090. But when it is not available, db handler tries to increase the port number by 1 each time until it finds an available port number to use. Then the port number will be written into lock file dbccp4i.LOCK in .CCP4 directory.

 

When a client connects to the handler, dbccp4i starts a thread and deals with the requests from clients.

 

manager.py: contains class Manager, Project, Client, User.

 

Manager is a class that manages projects in the handler. It stores the projects that opened in the handler, the clients that connecting to the handler, the users that connecting to the handler. It provides the methods of OpenProject, NewProject, SaveProject, CloseProject etc.  When it is initialised, it takes an argument of directories.def file. The default directories.def is the one in .CCP4/unix or .CCP4/window directory. But you could give an alternative file for that. Currently, the handler used in MrBUMP takes an alternative directories.def. How to modify the manager.py to suit MrBUMP will be described in part 4.

 

Project is a class that stores a database instance and the clients that open this database.

 

Client is a class that represents a client connection to the handler.  It stores the address of the client, the projects that the client open, the broadcast flag that the client set, the user name of the client etc.

 

User is a class that represents a user that log in the computer. It stores the user name, user agent.

 

manager.py also contains a class messager which outputs the message to a file or on the screen; a function of response_wrapper and result_wrapper.

 

DBcommand.py: This file contains most of dbhandler commands. There are sections for the commands.

 

Section 1: Commands for central SQL db

This section contains functions that access central SQL db.

Central SQL db is designed for one handler opens one SQLite database. The current central SQLite consists of tables Project, Job.  However, currently the central SQLite db is not in use.

 

Commands in this section include:

 

Functions:

1. do_sql_newrecord_command:

NewRecord: Generic command for creating a new record in a pre-defined table.

NewProject: Create a new record in Project table.

NewJob: create a new record in Job table.

 

2. do_sql_setdata_command:

SetJobData: update a job record in Job table.

SetData: update record in a given table.

 

3. do_sql_command:

GetJobData: Get a job attribute value in the Job table.

GetProjectList: Get a list of projects.

GetJobList: Get a list of jobs.

GetTableSchema: Get attributes for a table.

GetRowofTable: Get all of the records in a given table.

GetData: Generic command to get data from a given table.

GetProjectId: Get a project Id for a given project name.

CloseDB: close a central SQLite db.

 

Section 2:  Commands for dealing with database.def

Functions:

do_def_readdir_command:

GetDataDir: Return the directory corresponding to a def dir name

GetProjectDir: Return the directory corresponding to a project name

ListDataDirs: Return a list of the user’s default directories

GetNProjects: Return the number of projects

ListProjects: Return a list of the user’s projects

 

Section 3: Commands for writing data in database.def

Functions:

process_writedb_command:

            DbNewJob: create a new job in the project

            DbDeleteJob: Remove an existing job record from the project.

            DbSetData: Set the value of a data item for the specified job.

            Updatetime: Update the time of the job to be the current time.

            AddInputFile: Add a file to the list of input files for a job.

            AddOutputFile: Add a file to the list of output files for a job.

            AddSubJob: Add a subjob to a job in the project database

do_def_readdb_command:

            HasSQLdb: Check if the project has a SQLite db backend

            GetNJOB: Return the value of the NJOBS data item.

            DbGetData: Retrieve the value of a data item stored for a job.

            GetFiles: Return a list of files for a job.

            ListInputFiles: Return a list of input files for a job.

            ListOutputFiles: Return a list of output files for a job.

            DbItemExists: Check for the existence of a data item for a particular job.

            DbSelectJobs: Retrieve a list of jobs based on some selection criterion.

            ListJobs: Return the list of all jobs in the project.

            GetDbItems: Return a list of the core data items stored for all jobs.

            DbReturnJobs: Return a list of formatted strings populated with job data.

            DbGetListofRecords: Retrieve a list of job records.

            GetNextJobList: Return a list of job ids corresponding to the “children” jobs

                                        of the specified id.

            GetAllFileLinks: Return all the links between jobs for a project

            GetFileLinks: Return the links between the given job ids.

            GetAllChildren: Return all the jobs that are descendents of a particular job.

            GetAllParents: Return all the jobs that are antecedents of a particular job.

            GetChildren: Return the immediate children of a given job.

            GetParents: Return the immediate parents of a given job.

            GetAllParentsChildren: Return all the jobs that has direct or indirect link

                                                      with the given job.

GetNotebook: Return the note book name with the path.

ProjectWriteable: Test if it is possible to modify and save the data.

ProjectReadable: Test if it is possible to get data from the database object.

ReacquireProject: force to grab the project lock.

SelectSubJobs: Retrieve a list of subjobs based on some selection criterion.

HasSubJobs: Check if a job currently has subjobs defined.

ListJobswithsubjobs: Return a list of the jobs which also have subjobs.

 

Section 4:  Commands for SQLite db for a particular  project

One project can have both database.def and SQLite db backend. Currently, There is a ‘Jobs’ table to hold the additional job data. It has attributes ‘JobNumber’, ‘JobQuality’. This section contains commands that dealing with SQLite db.

 

Fuctions:

do_parallel_sqlitedb_command:

SetJobQuality: Set the job quality in ‘Jobs’ table.

SetSQLdbData: Generic command to set data in ‘Jobs’ table.

GetSQLdbData: Generic command to get data from ‘Jobs’ table.

GetAllSQLdbData: Get all the data from ‘Jobs’ table,

NewTableRecord: Generic command for inserting a record in a given table

DeleteTableRecord: Generic command for removing a record in a given table

DeleteTableRecords: Delete records in a table bases on certain condition.

SetTableData: Set value for a given table and its attribute

GetTableData: Retrieve the value of a given table and attribute

GetTablePrimaryKey: Retrieve primary key(s) for a given table based on certain condition.

GetAllTableRecords: Retrieve all the records in a given table.

GetTableRecords: Retrieve records in a given table based on certain condition.

 

ccp4i.py:  This is a class library for emulating the CCP4i def file based database. It also provides classes for dealing with the CCP4i “directories.def” file and generic classes for .def files and lockfiles.

 

dbapi_sqlite.py:  It contains the class for interacting SQLite database.

It takes a schema file to create a new db. The schema file must follow the format:

  1. Each create statement end with ”;”.
  2. The primary key for each table must be $tablename_Id . e.g. for table Dataset, its primary key is: ‘Dataset_Id’. For table Jobs, its primary key is ’Jobs_Id’.

e.g. current SQLite db schema is as follows:

 

CREATE TABLE Jobs (Jobs_Id INTEGER primary key,

                            JobNumber INTEGER,

                                                    JobQuality VARCHAR(40),

                                                    UserAgent VARCHAR(40));

 

 

CREATE TABLE Dataset (Dataset_Id INTEGER primary key,

                                                         DatasetName VARCHAR(64) unique, not null,

                                                         MTZfileProject VARCHAR(64) not null,

                                                         MTZfileName VARCHAR(200) not null,

                                                         Fmean VARCHAR(30) not null,

                                                         SigFmean VARCHAR(30) not null,

                                                         Dano VARCHAR(30),

                                                         SigDano VARCHAR(30),

                                                         MTZCrystalName VARCHAR(64),

                                                         MTZDatasetName VARCHAR(64),

                                                         CurrentHA INTEGER);

 

CREATE TABLE HA (HA_Id INTEGER primary key,

                                                   HAfileProject VARCHAR(64) not null,

                                                   HAfileName VARCHAR(200) not null,

                                                   JobNumber VARCHAR(10),

                                                   DatasetId INTEGER);

 

 

Part 2: How to make changes in dbhandler

 

1. How to add commands to the handler

In class ServeClient, it defines valid commands. The valid commands have 9 groups:

self._admin_db_command: commands for dbhandler itself including ShutDown, DbDisconnect, DbRegister, DbRequestStatus, DbSupported, DbOpen, DbClose.

self._def_readdb_command: commands for reading data from database.def file.

self._def_writedb_command: commands for writing data to database.def file.

self._def_readdir_command: commands for reading data from directories.def file.

self._def_writedir_command: commands for writing data to directories.def file.

self._parallel_sqldb_command: commands for dealing with ‘paralle sql db’, i.e. for each project, it has a sqlite db to store knowledge base data and additional job data.

self._sql_newrecord_command: commands for create a new record in tables in central sqlite db. Some of the commands are for particular tables which not in use currently.

self._sql_setdata_command: commands for set data value in central sqlite db. ( Currently not in use ) 

self._sql_command: commands for retrieve data from central sqlite db. ( Currently not in use)

 

 

For each command, what it does is to get the arguments and then call the functions either from ccp4i.database object, ccp4i.directories object or dbapi_sqlite.DB object. According to the result return, assign the status for the response. If needed, construct the broadcast message and return the response.

 

Steps for adding commands:

  1. In dbccp4i.py, there are 9 groups commands. According to the function of the new command, assign one of these groups for the new command. Add the command in the group in class ServeClient, __init__().
  2. Find the function that  processes the group of command. For function of  process_admin_db_command and process_def_writedir_command are in dbccp4i.py. The others are in DBcommand.py.

3.  Add the command in the function.

 

 

2.  How to add tables in db handler

SQLite db schema is in the file schema.sql in the directory of $DBCCP4_TOP/dbccp4i/.

Any changes of the table schema, e.g. adding a new table or removing a table etc, just add/replace the new schema in the schema.sql file.

 

The APIs are mostly generic, it can be called from ClientAPI. So to update the APIs, it can be done in ClientAPI.

 

3.  How to modify dbhandler to suit MrBUMP.

Currently, MrBUMP uses a modified version of dbhandler. It uses its own directories.def file, so the projects that created by MrBUMP couldn’t be seen from CCP4i.

 

The changes are in manager.py. At the very bottom, there is a line:

ProjectManager = Manager(). This is by default to use the directories.def in .CCP4/unix or .CCP4/window directory which CCP4i uses.  For MrBUMP use, comment out this line.

Instead, uncomment the following 20 lines. What that does is to make a copy of directories.def in .CCP4/windows or .CCP4/unix directory and put it .CCP4 directory with the name of ‘directories_mr.def’. Then all the projects created from MrBUMP are recorded in directories_mr.def. These projects can only be viewed from dbviewer, but not in CCP4i.

 

4. How to change the timeout for the handler

In the function of sockfactory indbccp4i.py, change the arguments of socket.setdefaulttimeout(). Note that there are two places of socket.setdefaulttimeout(). Both need to be changed.

 

5. Where is the debugging output written

The debugging message is written in the file dbhandler.log in .CCP4 directory.

If use ‘-debug’ option, then it will output to the screen.

 

6.  How/when is the data saved?

Data will be saved regularly every the timeout period. The save of database.def is implemented in class Manager.SaveProject(). The save of directories.def is implemented in class Manager.SaveDirectory().

 

7.  How to construct the broadcast messages

The construct of broadcast message is manually written in pre-define XML format. It is constructed in each command. Different commands have different broadcast messages to send or maybe no broadcast message at all.

 

Part 3: dbviewer

 

How does dbviewer store project data internally?

 

Procedure ‘GetDbData project { jobid } ’ gets the database.def data from handler when dbviewer initially displays a project or subjob, or when the dbviewer refresh.

If the jobid is supplied, then the data is subjob data. All the items of data are stored in global array ‘database’.

 

Procedure ‘getdata id item’ gets the data from database array.

 

Procedure ‘GetDb_extensionData project itemlist’ get the SQLite table ‘Jobs’

data from handler. The procedure is called when dbviewer initially displays a project  or subjob, or when the dbviewer refresh. Currently table ‘Jobs’ has items ‘JobNumber’, ‘JobQuality’. Dbviewer stores these two items value in the global array 

database_extension’.

 

Procedure ‘get_extension_data item id’ gets the data from array  database_extension’.