|
Methods defined here:
- __init__(self, acc_id=None, proxy={})
- Constructor to initialize the object; uses a proxy if necessary
- __str__(self)
- String representation of the object
- load_models(self, model_files)
- load model or list of models into self.model
- load_mols(self, sd_file)
- load SD-File from .sdf, .sdf.gz or .sd.gz
- predict(self, model_number)
- try to predict activity of compounds using the given model number
- save_model(self, outfile, model_number=0)
- save Model to file using cPickle.dump
- save_model_info(self, outfile, mode='html')
- create html- or csv-File for models according to mode (default: "html")
- save_mols(self, outfile, gzip=True)
- create SD-File of current molecules in self.sd_entries
- step_0_get_chembl_data(self)
- download compound data for self.acc_id; the data is available in self.sd_entries afterwards
- step_1_keeplargestfrag(self)
- remove all smaller Fragments per compound, just keep the largest
- step_2_remove_dupl(self)
- remove duplicates from self.sd_entries
- step_3_merge_IC50(self)
- merge IC50 of duplicates into one compound using the mean of all values if:
min(IC50) >= IC50_avg-3*IC50_stddev && max(IC50) <= IC50_avg+3*IC50_stddev && IC50_stddev <= IC50_avg
- step_4_set_TL(self, threshold, ic50_tag='value')
- set property "TL" (TrafficLight) for each compound:
if the value of ic50_tag (default: "value") > threshold: TL = 0, else TL = 1
- step_5_remove_descriptors(self)
- remove a hardcoded list of properties from each compound
that would otherwise corrupt the creation of prediction models
- step_6_calc_descriptors(self)
- calculate descriptors for each compound, according to Descriptors._descList
- step_7_train_models(self)
- train models according to the traffic light (TL) using sklearn.ensemble.RandomForestClassifier
self.model contains up to 10 models afterwards; use save_model_info(outfile, mode) to create a csv or html file
containing data for each model
|
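The merge rule in step_3_merge_IC50 and the labelling in step_4_set_TL can be sketched in a few lines of plain Python. This is only an illustration of the rules as documented above, not the class's actual code: the helper names merge_ic50 and traffic_light are hypothetical, and the use of the population standard deviation is an assumption, since the listing does not say which estimator is used.

```python
from statistics import mean, pstdev


def merge_ic50(values):
    """Return the mean IC50 of duplicate measurements, or None if rejected.

    Hypothetical helper mirroring the documented step_3 condition:
    every value lies within avg +/- 3*stddev and stddev <= avg.
    """
    avg = mean(values)
    # Assumption: population standard deviation; the source does not specify.
    sd = pstdev(values)
    within_three_sigma = (min(values) >= avg - 3 * sd
                          and max(values) <= avg + 3 * sd)
    if within_three_sigma and sd <= avg:
        return avg
    return None


def traffic_light(ic50, threshold):
    """Hypothetical helper mirroring step_4: TL = 0 if ic50 > threshold, else 1."""
    return 0 if ic50 > threshold else 1


if __name__ == "__main__":
    merged = merge_ic50([10.0, 12.0, 14.0])
    print(merged, traffic_light(merged, threshold=100.0))
    # Widely scattered measurements fail the stddev <= avg check.
    print(merge_ic50([1.0, 1.0, 1.0, 1.0, 100.0]))
```

With these rules, a compound whose IC50 equals the threshold still gets TL = 1, because only values strictly greater than the threshold are labelled 0.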