p_con.p_con = class p_con

Class to create models that classify molecules as active or inactive, using a threshold on the activity value in the training data

Methods defined here:

__init__(self, acc_id=None, proxy={})
Constructor to initialize the object; uses the given proxy if necessary

__str__(self)
String representation of the object

load_models(self, model_files)
load a model or a list of models into self.model

load_mols(self, sd_file)
load an SD file (.sdf, .sdf.gz or .sd.gz)
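
A minimal sketch of reading such a file with only the standard library, assuming that compression is detected from the ".gz" suffix and that each record ends with a "$$$$" line, as in the SD format. The helper names open_sd, split_sd_records and parse_properties are hypothetical; p_con most likely delegates this to a cheminformatics toolkit instead. Data items are read as a "> <name>" header followed by value lines up to a blank line, which is how properties such as "value" are stored.

```python
import gzip


def open_sd(path):
    """Open an SD file for reading as text, decompressing it if it ends in ".gz"."""
    # Assumption: compression is inferred from the suffix only
    # (".sdf", ".sdf.gz", ".sd.gz"), not from the file contents.
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def split_sd_records(lines):
    """Group lines into records; each SD record is terminated by a "$$$$" line."""
    records, current = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def parse_properties(record):
    """Collect data items written as "> <name>" followed by value lines."""
    props, name = {}, None
    for line in record:
        if name is None:
            # Outside a data item: only a "> <name>" header starts a new one.
            if line.startswith(">"):
                start = line.find("<")
                end = line.find(">", start + 1)
                if start != -1 and end != -1:
                    name = line[start + 1:end]
                    props[name] = []
        elif line == "":
            name = None  # a blank line closes the current data item
        else:
            props[name].append(line)
    return {key: "\n".join(values) for key, values in props.items()}
```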

predict(self, model_number)
try to predict the activity of compounds using the model with the given model_number

save_model(self, outfile, model_number=0)
save a model to a file using cPickle.dump
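
cPickle exists only in Python 2; in Python 3 the same C-accelerated serializer is used through the pickle module. The sketch below shows saving one model and loading either a single file or a list of files, mirroring what save_model and load_models describe; the standalone function signatures are assumptions, not the class's actual code. Note that pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import pickle


def save_model(model, outfile):
    """Serialize one model to outfile (pickle is the Python 3 successor of cPickle)."""
    with open(outfile, "wb") as handle:
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_models(model_files):
    """Load a single model file or a list of model files; always return a list."""
    # Accept one path as well as a list of paths.
    if isinstance(model_files, str):
        model_files = [model_files]
    models = []
    for path in model_files:
        with open(path, "rb") as handle:
            models.append(pickle.load(handle))
    return models
```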

save_model_info(self, outfile, mode='html')
create an HTML or CSV file describing the models, according to mode (default: "html")
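
One way to implement the mode switch, sketched as a standalone function. The rows argument is hypothetical: it assumes each model has already been summarized as a dict with the same keys, since the fields p_con actually writes are not documented here. CSV output uses csv.DictWriter; HTML output is a minimal table with escaped cell values.

```python
import csv
import html


def save_model_info(rows, outfile, mode="html"):
    """Write one row per model, either as CSV or as a minimal HTML table."""
    # Assumption: rows is a list of dicts sharing the same keys (one per model).
    if mode not in ("html", "csv"):
        raise ValueError("mode must be 'html' or 'csv'")
    fields = list(rows[0].keys()) if rows else []
    with open(outfile, "w", newline="", encoding="utf-8") as handle:
        if mode == "csv":
            writer = csv.DictWriter(handle, fieldnames=fields)
            writer.writeheader()
            writer.writerows(rows)
        else:
            header = "".join(f"<th>{html.escape(str(f))}</th>" for f in fields)
            handle.write(f"<table>\n<tr>{header}</tr>\n")
            for row in rows:
                cells = "".join(f"<td>{html.escape(str(row[f]))}</td>" for f in fields)
                handle.write(f"<tr>{cells}</tr>\n")
            handle.write("</table>\n")
```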

save_mols(self, outfile, gzip=True)
create an SD file of the current molecules in self.sd_entries
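
A sketch of writing records with optional gzip compression, assuming each record is a list of text lines without its terminating "$$$$" (a hypothetical layout). The flag is called use_gzip here because a parameter named gzip, as in the documented signature, would shadow the gzip module inside the function body. Appending ".gz" to the file name is also an assumption, made so that a suffix-based reader can detect the compression.

```python
import gzip


def save_mols(records, outfile, use_gzip=True):
    """Write SD records (lists of lines) to outfile, gzip-compressed by default."""
    if use_gzip and not outfile.endswith(".gz"):
        outfile += ".gz"  # assumption: keep the suffix consistent with the content
    opener = gzip.open if use_gzip else open
    with opener(outfile, "wt", encoding="utf-8") as handle:
        for record in records:
            # Every SD record is terminated by a "$$$$" line.
            handle.write("\n".join(record) + "\n$$$$\n")
    return outfile
```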

step_0_get_chembl_data(self)
Download compound data for self.acc_id; afterwards the data are available in self.sd_entries

step_1_keeplargestfrag(self)
remove all smaller fragments of each compound and keep only the largest
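
"Keeping the largest fragment" amounts to finding the connected components of the molecular graph and keeping the biggest one, which strips counterions and solvent molecules from salts. The sketch below does this with union-find on a hypothetical input of an atom count plus a list of bonded atom-index pairs, measuring size by atom count. A real implementation would more likely call a toolkit function such as RDKit's Chem.GetMolFrags(mol, asMols=True); this is not necessarily what p_con does.

```python
def largest_fragment(num_atoms, bonds):
    """Return the atom indices of the largest connected fragment.

    Hypothetical input: atoms are numbered 0..num_atoms-1 and bonds is a list
    of (i, j) index pairs, e.g. as read from a molfile bond block.
    """
    parent = list(range(num_atoms))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in bonds:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j

    fragments = {}
    for atom in range(num_atoms):
        fragments.setdefault(find(atom), []).append(atom)
    # Size is measured in atoms; on a tie, the first fragment found is kept.
    return max(fragments.values(), key=len) if fragments else []
```

For example, a five-atom chain bonded to nothing else, plus a two-atom counterion and an isolated atom, reduces to the chain's atoms [0, 1, 2, 3, 4].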

step_2_remove_dupl(self)
remove duplicates from self.sd_entries
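
Which entries count as duplicates is not specified here. The sketch assumes the caller supplies a key function, for example one returning a precomputed canonical SMILES string or another structure identifier, and keeps the first entry for each key while preserving input order. The function and the key field in the example are hypothetical.

```python
def remove_duplicates(entries, key):
    """Keep the first entry for each key, preserving the original order."""
    seen = set()
    unique = []
    for entry in entries:
        k = key(entry)
        if k not in seen:
            seen.add(k)
            unique.append(entry)
    return unique
```

Usage: remove_duplicates(entries, key=lambda e: e["key"]) keeps one entry per distinct "key" property.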

step_3_merge_IC50(self)
merge the IC50 values of duplicates into one compound, using the mean of all values, if: min(IC50) >= IC50_avg - 3*IC50_stddev && max(IC50) <= IC50_avg + 3*IC50_stddev && IC50_stddev <= IC50_avg
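
The merge criterion written out for one group of replicate IC50 values. Two details are assumptions, not taken from the original: IC50_stddev is computed as the population standard deviation (statistics.pstdev), and a group that fails the test is reported as None, leaving the decision about what to do with it to the caller.

```python
import statistics


def merge_ic50(values):
    """Return the mean IC50 if the replicate values are consistent, else None."""
    avg = statistics.fmean(values)
    # Assumption: population standard deviation; the estimator is not documented.
    std = statistics.pstdev(values)
    consistent = (
        min(values) >= avg - 3 * std
        and max(values) <= avg + 3 * std
        and std <= avg
    )
    # Assumption: inconsistent groups are signalled with None.
    return avg if consistent else None
```

For instance, [10, 12, 11] merges to 11.0, while [1, 1, 100] is rejected because its standard deviation (about 46.7) exceeds its mean (34).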

step_4_set_TL(self, threshold, ic50_tag='value')
set property "TL" (traffic light) for each compound: if the value of ic50_tag (default: "value") > threshold, TL = 0, else TL = 1
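
The labeling rule as a short sketch, assuming each entry is a dict of SD properties whose values are strings, so the activity value is converted with float(). Since a high IC50 means weak activity, TL = 0 presumably marks inactive compounds and TL = 1 active ones. A value exactly equal to the threshold gets TL = 1, because the comparison is strictly greater-than.

```python
def set_traffic_light(entries, threshold, ic50_tag="value"):
    """Add a "TL" label: 0 if the activity value exceeds threshold, else 1."""
    for entry in entries:
        # Hypothetical layout: SD property values are stored as strings.
        entry["TL"] = 0 if float(entry[ic50_tag]) > threshold else 1
    return entries
```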

step_5_remove_descriptors(self)
remove a hardcoded list of properties from each compound that would otherwise corrupt the creation of prediction models

step_6_calc_descriptors(self)
calculate descriptors for each compound, according to Descriptors._descList

step_7_train_models(self)
train models on the traffic-light labels using sklearn.ensemble.RandomForestClassifier; afterwards self.model contains up to 10 models. Use save_model_info(outfile, mode) to create a CSV or HTML file containing data for each model