DB Handler
Documentation
Wanjuan Yang
August 2007
Part 1: Components
of DB handler:
This section describes the components of DB handler and what each module does.
DB handler includes modules dbccp4i.py, manager.py, DBcommand.py, ccp4i.py, dbapi_sqlite.py.
dbccp4i.py: This is the main program. It is a server program. When it is running, it listens and waits for clients to connect. The initial port number that dbccp4i uses is 4090. But when it is not available, db handler tries to increase the port number by 1 each time until it finds an available port number to use. Then the port number will be written into lock file dbccp4i.LOCK in .CCP4 directory.
When a client connects to the handler, dbccp4i starts a thread and deals with the requests from clients.
manager.py: contains class Manager, Project, Client, User.
Manager is a class that manages projects in the handler. It stores the projects that opened in the handler, the clients that connecting to the handler, the users that connecting to the handler. It provides the methods of OpenProject, NewProject, SaveProject, CloseProject etc. When it is initialised, it takes an argument of directories.def file. The default directories.def is the one in .CCP4/unix or .CCP4/window directory. But you could give an alternative file for that. Currently, the handler used in MrBUMP takes an alternative directories.def. How to modify the manager.py to suit MrBUMP will be described in part 4.
Project is a class that stores a database instance and the clients that open this database.
Client is a class that represents a client connection to the handler. It stores the address of the client, the projects that the client open, the broadcast flag that the client set, the user name of the client etc.
User is a class that represents a user that log in the computer. It stores the user name, user agent.
manager.py also contains a class messager which outputs the message to a file or on the screen; a function of response_wrapper and result_wrapper.
DBcommand.py: This file contains most of dbhandler commands. There are sections for the commands.
Section 1: Commands
for central SQL db
This section contains functions that access central SQL db.
Commands in this section include:
Functions:
1. do_sql_newrecord_command:
NewRecord: Generic command for creating a new record in a pre-defined table.
NewProject: Create a new record in Project table.
NewJob: create a new record in Job table.
2. do_sql_setdata_command:
SetJobData: update a job record in Job table.
SetData: update record in a given table.
3. do_sql_command:
GetJobData: Get a job attribute value in the Job table.
GetProjectList: Get a list of projects.
GetJobList: Get a list of jobs.
GetTableSchema: Get attributes for a table.
GetRowofTable: Get all of the records in a given table.
GetData: Generic command to get data from a given table.
GetProjectId: Get a project Id for a given project name.
CloseDB: close a central SQLite db.
Section 2: Commands for dealing with database.def
Functions:
do_def_readdir_command:
GetDataDir: Return the directory corresponding to a def dir name
GetProjectDir: Return the directory corresponding to a project name
ListDataDirs: Return a list of the user’s default directories
GetNProjects: Return the number of projects
ListProjects: Return a list of the user’s projects
Section 3: Commands
for writing data in database.def
Functions:
process_writedb_command:
DbNewJob: create a new job in the project
DbDeleteJob: Remove an existing job record from the project.
DbSetData: Set the value of a data item for the specified job.
Updatetime: Update the time of the job to be the current time.
AddInputFile: Add a file to the list of input files for a job.
AddOutputFile: Add a file to the list of output files for a job.
AddSubJob: Add a subjob to a job in the project database
do_def_readdb_command:
HasSQLdb: Check if the project has a SQLite db backend
GetNJOB: Return the value of the NJOBS data item.
DbGetData: Retrieve the value of a data item stored for a job.
GetFiles: Return a list of files for a job.
ListInputFiles: Return a list of input files for a job.
ListOutputFiles: Return a list of output files for a job.
DbItemExists: Check for the existence of a data item for a particular job.
DbSelectJobs: Retrieve a list of jobs based on some selection criterion.
ListJobs: Return the list of all jobs in the project.
GetDbItems: Return a list of the core data items stored for all jobs.
DbReturnJobs: Return a list of formatted strings populated with job data.
DbGetListofRecords: Retrieve a list of job records.
GetNextJobList: Return a list of job ids corresponding to the “children” jobs
of the specified id.
GetAllFileLinks: Return all the links between jobs for a project
GetFileLinks: Return the links between the given job ids.
GetAllChildren: Return all the jobs that are descendents of a particular job.
GetAllParents: Return all the jobs that are antecedents of a particular job.
GetChildren: Return the immediate children of a given job.
GetParents: Return the immediate parents of a given job.
GetAllParentsChildren: Return all the jobs that has direct or indirect link
with the given job.
GetNotebook: Return the note book name with the path.
ProjectWriteable: Test if it is possible to modify and save the data.
ProjectReadable: Test if it is possible to get data from the database object.
ReacquireProject: force to grab the project lock.
SelectSubJobs: Retrieve a list of subjobs based on some selection criterion.
HasSubJobs: Check if a job currently has subjobs defined.
ListJobswithsubjobs: Return a list of the jobs which also have subjobs.
Section 4: Commands for SQLite db
for a particular
project
One project can have both database.def and SQLite db backend. Currently, There is a ‘Jobs’ table to hold the additional job data. It has attributes ‘JobNumber’, ‘JobQuality’. This section contains commands that dealing with SQLite db.
Fuctions:
do_parallel_sqlitedb_command:
SetJobQuality: Set the job quality in ‘Jobs’ table.
SetSQLdbData: Generic command to set data in ‘Jobs’ table.
GetSQLdbData: Generic command to get data from ‘Jobs’ table.
GetAllSQLdbData: Get all the data from ‘Jobs’ table,
NewTableRecord: Generic command for inserting a record in a given table
DeleteTableRecord: Generic command for removing a record in a given table
DeleteTableRecords: Delete records in a table bases on certain condition.
SetTableData: Set value for a given table and its attribute
GetTableData: Retrieve the value of a given table and attribute
GetTablePrimaryKey: Retrieve primary key(s) for a given table based on certain condition.
GetAllTableRecords: Retrieve all the records in a given table.
GetTableRecords: Retrieve records in a given table based on certain condition.
ccp4i.py: This is a class library for emulating the CCP4i def file based database. It also provides classes for dealing with the CCP4i “directories.def” file and generic classes for .def files and lockfiles.
dbapi_sqlite.py: It contains the class for interacting SQLite database.
It takes a schema file to create a new db. The schema file must follow the format:
e.g. current SQLite db schema is as follows:
CREATE TABLE Jobs (Jobs_Id INTEGER primary key,
JobNumber INTEGER,
JobQuality VARCHAR(40),
UserAgent VARCHAR(40));
CREATE TABLE Dataset (Dataset_Id INTEGER primary key,
DatasetName VARCHAR(64) unique, not null,
MTZfileProject VARCHAR(64) not null,
MTZfileName VARCHAR(200) not null,
Fmean VARCHAR(30) not null,
SigFmean VARCHAR(30) not null,
Dano VARCHAR(30),
SigDano VARCHAR(30),
MTZCrystalName VARCHAR(64),
MTZDatasetName VARCHAR(64),
CurrentHA INTEGER);
CREATE TABLE HA (HA_Id INTEGER primary key,
HAfileProject VARCHAR(64) not null,
HAfileName VARCHAR(200) not null,
JobNumber VARCHAR(10),
DatasetId INTEGER);
Part 2: How to make changes in dbhandler
1. How to add
commands to the handler
In class ServeClient, it defines valid commands. The valid commands have 9 groups:
self._admin_db_command: commands for dbhandler itself including ShutDown, DbDisconnect, DbRegister, DbRequestStatus, DbSupported, DbOpen, DbClose.
self._def_readdb_command: commands for reading data from database.def file.
self._def_writedb_command: commands for writing data to database.def file.
self._def_readdir_command: commands for reading data from directories.def file.
self._def_writedir_command: commands for writing data to directories.def file.
self._parallel_sqldb_command: commands for dealing with ‘paralle sql db’, i.e. for each project, it has a sqlite db to store knowledge base data and additional job data.
self._sql_newrecord_command: commands for create a new record in tables in central sqlite db. Some of the commands are for particular tables which not in use currently.
self._sql_setdata_command: commands for set data value in central sqlite db. ( Currently not in use )
self._sql_command: commands for retrieve data from central sqlite db. ( Currently not in use)
For each command, what it does is to get the arguments and then call the functions either from ccp4i.database object, ccp4i.directories object or dbapi_sqlite.DB object. According to the result return, assign the status for the response. If needed, construct the broadcast message and return the response.
Steps for adding commands:
3. Add the command in the function.
2. How to add tables in db handler
SQLite db schema is in the file schema.sql in the directory of $DBCCP4_TOP/dbccp4i/.
Any changes of the table schema, e.g. adding a new table or removing a table etc, just add/replace the new schema in the schema.sql file.
The APIs are mostly generic, it can be called from ClientAPI. So to update the APIs, it can be done in ClientAPI.
3. How to modify dbhandler
to suit MrBUMP.
Currently, MrBUMP uses a modified version of dbhandler. It uses its own directories.def file, so the projects that created by MrBUMP couldn’t be seen from CCP4i.
The changes are in manager.py. At the very bottom, there is a line:
ProjectManager = Manager(). This is by default to use the directories.def in .CCP4/unix or .CCP4/window directory which CCP4i uses. For MrBUMP use, comment out this line.
Instead, uncomment the following 20 lines. What that does is to make a copy of directories.def in .CCP4/windows or .CCP4/unix directory and put it .CCP4 directory with the name of ‘directories_mr.def’. Then all the projects created from MrBUMP are recorded in directories_mr.def. These projects can only be viewed from dbviewer, but not in CCP4i.
4. How to change the timeout for the
handler
In the function of sockfactory indbccp4i.py, change the arguments of socket.setdefaulttimeout(). Note that there are two places of socket.setdefaulttimeout(). Both need to be changed.
5. Where is the debugging output written
The debugging message is written in the file dbhandler.log in .CCP4 directory.
If use ‘-debug’ option, then it will output to the screen.
6. How/when
is the data saved?
Data will be saved regularly every the timeout period. The save of database.def is implemented in class Manager.SaveProject(). The save of directories.def is implemented in class Manager.SaveDirectory().
7. How to construct the broadcast messages
The construct of broadcast message is manually written in pre-define XML format. It is constructed in each command. Different commands have different broadcast messages to send or maybe no broadcast message at all.
Part 3: dbviewer
How does dbviewer store project data internally?
Procedure ‘GetDbData project { jobid } ’ gets the database.def data from handler when dbviewer initially displays a project or subjob, or when the dbviewer refresh.
If the jobid is supplied, then the data is subjob data. All the items of data are stored in global array ‘database’.
Procedure ‘getdata id item’ gets the data from database array.
Procedure ‘GetDb_extensionData project itemlist’ get the SQLite table ‘Jobs’
data from handler. The procedure is called when dbviewer initially displays a project or subjob, or when the dbviewer refresh. Currently table ‘Jobs’ has items ‘JobNumber’, ‘JobQuality’. Dbviewer stores these two items value in the global array
‘database_extension’.
Procedure ‘get_extension_data item id’ gets the data from array ‘database_extension’.