BOINC Project Setup for Virtual Drug Screening

This page introduces the use of BOINC to orchestrate the docking of small chemical compounds to a protein. Commonly, a flexible ligand is fitted to a rigid protein structure, or to a set of structures that capture the protein in different conformations. Several docking projects have run on BOINC before, most prominently the World Community Grid's FightAIDS@Home and others in the WCG realm, but there is also Docking@Home, and the Rosetta team could set up a docking experiment at any time.

Our ambition is to ship all components either directly in a regular Debian package or as one of its dependencies. The authors of this page run their own in-house BOINC-based AutoDock project with all components available in Debian, but development is still ongoing, and this is particularly true of this documentation. If you would like to join in, please contact us.

1. Conceptual Overview

The project is centered around Debian as the sole source of all software tools required for the project and for the automated retrieval of data. The BOINC client has shipped with Debian proper for a long time. We decided to keep the BOINC server in the experimental section of Debian, so that we can update this publicly exposed software at any time.

Unlike the SETI and Milkyway client applications, which are available as Debian packages, this docking project has no dedicated client package. The binary, AutoDock Vina in this first development stage, is wrapped, and both the wrapper and the Vina application riding piggyback on it are sent once to every participating client. Ideas for optimisations should be sent to the (very responsive) upstream authors at Scripps.

Figure: Overview of the AutoDock BOINC server setup (https://wiki.debian.org/BOINC/Server/Projects/AutoDock?action=AttachFile&do=get&target=BOINC_Server_AutoDock_Overview.png)

In the following, we describe the yellow arrow in the figure above, i.e. how to get from the boinc-server-autodock Debian package, with the help of what ships with boinc-server-maker, to a web site that invites users to contribute and to a repository of ligand evaluations that can be interpreted by Raccoon. Unless a direct reference is given, all scripts described or referenced below are available from the git repository.

2. Preparation of the BOINC side

The AutoDock BOINC project is, first of all, a regular BOINC project. All the familiar tools for setting up BOINC projects work exactly the same way. We prepared a script that sets up the AutoDock project at a predefined location without human intervention. Convenient as that is, please take the extra time to follow the BOINC/ServerGuide mentally. It conveys a little extra understanding of how BOINC works internally, which we believe will help you help us improve the workflow. The current implementation works solely with AutoDock Vina. Support for the classical AutoDock 4.x was once described on BOINC/ServerGuide/AutoDockApp and has yet to be updated and incorporated into our scripts.

The executable scripts all reside in /usr/share/boinc-server-autodock/bin. All BOINC-specific preparation is done by the script "install.sh" in that directory. Caveat: it first cleans the database, so do not run it inadvertently. It then invokes separate scripts to perform steps 2.1 and 2.2, as described below.
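Because running install.sh by accident wipes the database, a small guard may be worth having. The following is a minimal sketch that is not part of boinc-server-autodock; the function name confirm_and_install and the INSTALL_SH variable are hypothetical, and only the install.sh path is taken from this page.

```shell
#!/bin/sh
# Hypothetical safety wrapper, not shipped with boinc-server-autodock:
# install.sh first cleans the database, so require an explicit "YES".
# INSTALL_SH defaults to the location given on this page.
INSTALL_SH=${INSTALL_SH:-/usr/share/boinc-server-autodock/bin/install.sh}

confirm_and_install() {
  printf 'install.sh will first clean the BOINC database. Type YES to continue: ' >&2
  # End of input counts like any answer other than YES.
  read -r answer || answer=""
  if [ "$answer" = "YES" ]; then
    "$INSTALL_SH" "$@"
  else
    echo "Aborted; the database was not touched." >&2
    return 1
  fi
}
```

Source the file and call confirm_and_install; any answer other than YES leaves the database untouched.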

2.1. Install BOINC web server

The tool autodockvina_install_project.sh performs

At this point, users can subscribe to the project and wait for work units.

2.2. Prepare AutoDock Vina binaries and inform BOINC about them

The script autodockvina_install_apps.sh continues with

The project is functional now, except for the missing workunits.

2.x. Comments

In a perfect world there would be no need for wrappers. Instead, we would patch the AutoDock application to use the BOINC file descriptors itself. We should also report progress in a file for BOINC to display to the user. For single ligands, though, AutoDock Vina is so fast that this does not seem to matter much for the user experience. When multiple ligands are evaluated within the same work unit, progress can be reported with the granularity of the fraction of ligands already evaluated.
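A minimal sketch of the multi-ligand case, assuming a plain POSIX shell driver rather than a patched AutoDock: ligands are docked one after another, and after each one the fraction evaluated so far is written to a file. The names dock_all, VINA and PROGRESS_FILE are hypothetical; receptor.pdbqt and config.txt are the logical file names used by the workunit script in section 4.2. The BOINC wrapper documentation describes a fraction_done_filename setting through which the wrapper can read such a file; please check it against the BOINC version in use.

```shell
#!/bin/sh
# Sketch of a multi-ligand driver that reports progress as the fraction of
# ligands already evaluated. Assumptions (not part of the project's wrapper):
#   VINA          - docking binary to call (default: vina)
#   PROGRESS_FILE - file holding the fraction done, 0 to 1 (default: fraction_done.txt)
VINA=${VINA:-vina}
PROGRESS_FILE=${PROGRESS_FILE:-fraction_done.txt}

dock_all() {
  total=$#
  count=0
  echo 0 > "$PROGRESS_FILE"
  for lig in "$@"; do
    # Stop at the first failing docking run, leaving the last progress value.
    "$VINA" --cpu 1 --receptor receptor.pdbqt --config config.txt \
            --ligand "$lig" --out "$(basename "$lig" .pdbqt)_out.pdbqt" || return 1
    count=$((count + 1))
    # e.g. 0.250 after the first of four ligands
    awk -v d="$count" -v t="$total" 'BEGIN { printf "%.3f\n", d / t }' > "$PROGRESS_FILE"
  done
}
```

Called as dock_all lig1.pdbqt lig2.pdbqt ..., the progress file moves from 0 to 1.000 in steps of one ligand.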

3. Preparation of the docking side

It may be convenient to keep the AutoDock Vina configuration files and the receptor and ligand models on the same machine that runs the BOINC server, to avoid transferring large amounts of data over the network. PDBQT files for the receptors and ligands of interest may be prepared with the utilities from AutoDockTools. A number of prepared ligand sets can be downloaded from http://zinc.docking.org/pdbqt/. To automate their retrieval, install the getData Debian package. The one-time preparation of the receptor for docking is performed as for any docking project with AutoDock and is supported by MGLTools. Debian also ships the CADD tool "Raccoon", which offers an interface for this preparation.

3.1. Make a database of receptor models for screening

For the bash commands below we will suppose that receptor files, in PDBQT format, are kept in /home/boincadm/my_autodock_vina_library/receptors.
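A sketch of this layout, assuming the directory names used in sections 3.1 to 3.3: it creates the three directories and runs a rough sanity check on the receptors. The check (at least one ATOM or HETATM record per file) is only a heuristic, not a PDBQT validator; the function names make_library, looks_like_pdbqt and check_receptors are hypothetical.

```shell
#!/bin/sh
# Sketch: create the library layout assumed in sections 3.1 to 3.3 and check
# that every receptor file at least contains ATOM/HETATM records.
LIBRARY=${LIBRARY:-/home/boincadm/my_autodock_vina_library}

make_library() {
  mkdir -p "$LIBRARY/receptors" "$LIBRARY/ligands" "$LIBRARY/configs"
}

# Succeeds if the file contains at least one ATOM or HETATM record.
looks_like_pdbqt() {
  grep -q -E '^(ATOM|HETATM)' "$1"
}

check_receptors() {
  status=0
  for rec in "$LIBRARY"/receptors/*.pdbqt; do
    # An unmatched glob stays literal, i.e. there are no receptor files.
    [ -e "$rec" ] || { echo "W: no receptor files found" >&2; return 1; }
    if ! looks_like_pdbqt "$rec"; then
      echo "E: $rec has no ATOM/HETATM records" >&2
      status=1
    fi
  done
  return $status
}
```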

3.2. Make a database of ligand models for screening

For the bash commands below we will suppose that ligand files, also in PDBQT format, are kept in /home/boincadm/my_autodock_vina_library/ligands.
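Vina docks one ligand per run, so each ligand should end up in its own file. If a ligand set arrives as a single PDBQT file with several MODEL/ENDMDL blocks (whether a particular download is packed this way needs to be checked), a sketch like the following splits it. split_models is a hypothetical helper; it drops the MODEL/ENDMDL lines themselves and ignores anything outside such blocks.

```shell
#!/bin/sh
# Sketch: split a PDBQT file holding several ligands as MODEL/ENDMDL blocks
# into one file per ligand, named <input base>_0001.pdbqt, _0002, ...
# Usage: split_models INPUT.pdbqt OUTPUT_DIRECTORY
split_models() {
  in=$1
  outdir=$2
  mkdir -p "$outdir" || return 1
  base=$(basename "$in" .pdbqt)
  awk -v dir="$outdir" -v base="$base" '
    /^MODEL/  { n++; out = sprintf("%s/%s_%04d.pdbqt", dir, base, n); next }
    /^ENDMDL/ { if (out != "") close(out); out = ""; next }
    out != "" { print > out }
  ' "$in"
}
```

For example, split_models zinc_set.pdbqt /home/boincadm/my_autodock_vina_library/ligands/test_lib would fill the directory that the workunit script in section 4.2 reads (zinc_set.pdbqt being a placeholder name).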

3.3. Set configuration parameters for docking

For the bash commands below we will suppose that configuration files are kept in /home/boincadm/my_autodock_vina_library/configs.
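As an illustration, the sketch below writes a configuration that contains only the search box and search settings, since the workunit script in section 4.2 passes receptor, ligand, output file and CPU count on the command line. The box centre and size are placeholders to be replaced with values for the actual binding site; exhaustiveness = 8 and num_modes = 9 are Vina 1.1.2's defaults. The function name write_vina_config is hypothetical; test_config.txt is the name the workunit script expects.

```shell
#!/bin/sh
# Sketch: write a minimal AutoDock Vina configuration file with the search
# space and search settings only. Box values below are PLACEHOLDERS.
# Usage: write_vina_config DIRECTORY [NAME]
write_vina_config() {
  dir=$1
  name=${2:-test_config}
  mkdir -p "$dir" || return 1
  cat > "$dir/$name.txt" <<'EOF'
center_x = 0.0
center_y = 0.0
center_z = 0.0
size_x = 20.0
size_y = 20.0
size_z = 20.0
exhaustiveness = 8
num_modes = 9
EOF
}
```

For example, write_vina_config /home/boincadm/my_autodock_vina_library/configs creates test_config.txt in the directory assumed above.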

4. Management of the running project

4.1. Implement an assimilator program for collecting docking results

For a quick start, we suggest simply using the sample assimilator provided in the BOINC source. With the result template we created, it collects the output files into the sample_results folder under the main project directory.

4.2. Create a bash script to generate workunits

To automate the generation of workunits based on the ligand and receptor libraries, we suggest the following script as a starting point.

    #!/bin/bash
    # Generation of BOINC workunits for AutoDock Vina application
    set -e
    # Set configuration parameters
    BOINC_DBUSER="boincadm"
    BOINC_HOMEDIR="/home/boincadm"
    BOINC_DBPASS="boincpass"
    BOINC_PROJECTNAME="autodockvina"
    BOINC_INSTALLROOT="/var/lib/boinc-server-autodockvina"
    LIBRARY="${BOINC_HOMEDIR}/my_autodock_vina_library"
    # Next batch number = highest batch in the workunit table + 1 (an empty table gives 0).
    # "|| true" keeps "set -e" from aborting before the error message below is shown.
    BATCH=$(mysql -u "${BOINC_DBUSER}" -p"${BOINC_DBPASS}" -N -s \
              -e "use ${BOINC_PROJECTNAME}; SELECT COALESCE(MAX(batch), 0) FROM workunit;" 2> /dev/null || true)
    if [ -z "$BATCH" ]; then
      echo "E: Error selecting batch number from the database! Please check MySQL connection parameters." >&2
      exit 1
    fi
    BATCH=$((BATCH + 1))
    cd "${BOINC_INSTALLROOT}/${BOINC_PROJECTNAME}"
    for lig_file in "${LIBRARY}"/ligands/test_lib/*.pdbqt
    do
      # Skip the unexpanded pattern if the ligand directory is empty
      [ -e "$lig_file" ] || continue
      lig_name=$(basename "$lig_file" .pdbqt)
      for conf_file in "${LIBRARY}"/configs/test_config.txt
      do
        conf_name=$(basename "$conf_file" .txt)
        stamp=$(date '+%s')
        # Local copies with unique names; the receptor is shared by all workunits
        ligand_input=ligand_input_tmp_${lig_name}_${stamp}
        receptor_input=test_receptor.pdbqt
        config_input=config_tmp_${conf_name}
        cp "$lig_file" "$ligand_input"
        cp "${LIBRARY}/receptors/test_receptor.pdbqt" "$receptor_input"
        cp "$conf_file" "$config_input"
        # Stage input files into the download hierarchy
        ./bin/stage_file --copy "$receptor_input"
        ./bin/stage_file --copy "$ligand_input"
        ./bin/stage_file --copy "$config_input"
        # Generate workunit; input files are given in the order expected by the wu template
        wuname=test_${lig_name}_${BATCH}_${conf_name}_${stamp}
        ./bin/create_work --appname autodock-vina --batch "$BATCH" --wu_name "$wuname" \
                          --wu_template templates/autodockvina_wu_template.xml \
                          --result_template templates/autodockvina_result_template.xml \
                          --command_line "--cpu 1 --receptor receptor.pdbqt --ligand ligand.pdbqt --config config.txt --out vina_result.pdbqt" \
                          "$ligand_input" "$receptor_input" "$config_input"
        echo "$wuname is prepared successfully."
      done
    done
    echo "I: Workunits were successfully created for batch #${BATCH}."

5. Result Collection

5.1. Create a bash script to filter out docking results

The names of the top-ranked compounds can be extracted from the output files with standard Linux utilities, as in the following example.

The script extract_energy.sh prints the ligand names with their predicted binding energies:

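    #!/bin/sh
    # Print "<file name> <best predicted affinity>" for every Vina log given.
    for n in "$@"
    do
      # The line after the "-----+-----" separator is the top-ranked mode;
      # squeeze runs of spaces and keep the affinity column.
      energy=$(sed -n '/-+-/{n;p;}' "$n" | sed 's/  */ /g' | cut -f3 -d' ')
      # Skip logs without a result table (e.g. failed runs)
      if [ -n "$energy" ]; then
        echo "$(basename "$n") $energy"
      fi
    done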

The script get_top_energies.sh sorts the whole set of results and selects the best ones:

    #!/bin/sh
    # Call: ./get_top_energies.sh [DIRECTORY WITH RESULT FILES] [NUMBER OF HITS]
    TOP=10
    OUTPUTSDIR='.'
    if [ $# -ge 1 ] ; then
      if [ -d "$1" ] ; then
        OUTPUTSDIR=$1
      else
        echo "Please specify the directory with result files." >&2
        exit 1
      fi
      if [ $# -ge 2 ] ; then
        # POSIX replacement for the bash-only [[ =~ ]] test: digits only
        case $2 in
          ''|*[!0-9]*)
            echo "Please specify the number of hits properly." >&2
            exit 1 ;;
          *) TOP=$2 ;;
        esac
      fi
    fi
    # Result files end in "_0"; sort by affinity, most negative (best) first.
    # extract_energy.sh is expected next to this script.
    find "${OUTPUTSDIR}" -name "*_0" | xargs -r "$(dirname "$0")/extract_energy.sh" | sort -n -k 2,2 | head -n "$TOP"

This last script, executed in the directory with the output files or given that directory as a parameter, prints the ligand names with their binding affinity values. This data is likely of first interest to the project owner, and the script can easily be extended to extract more detailed information about the docking results from the output files and the database. From a social point of view, it may be interesting to list the top ligands together with the names of the users whose computers found them, and to display them on the project website.
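As a starting point for such a list, here is a sketch that turns the output of get_top_energies.sh into a single MySQL query against the project database. It assumes the standard BOINC tables (workunit, result, user), that result files are named after their workunit plus the "_0" suffix searched for by get_top_energies.sh, and that a validator has set each workunit's canonical result; top_hits_sql is a hypothetical name.

```shell
#!/bin/sh
# Sketch: turn "<result file> <affinity>" lines into one MySQL query listing
# each hit with the name of the BOINC user whose host returned the canonical
# result. Assumption: result files are named "<workunit name>_0".
top_hits_sql() {
  first=1
  while read -r file energy; do
    # Skip lines whose affinity is not a plain decimal number.
    case $energy in ''|*[!0-9.-]*) continue ;; esac
    wu=$(basename "$file")
    wu=${wu%_0}
    wu=$(printf '%s' "$wu" | sed "s/'/''/g")   # escape quotes for SQL
    if [ "$first" -eq 1 ]; then first=0; else echo "UNION ALL"; fi
    echo "SELECT '$wu' AS wu_name, $energy AS affinity, u.name AS user_name"
    echo "  FROM workunit w JOIN result r ON r.id = w.canonical_resultid"
    echo "  JOIN user u ON u.id = r.userid WHERE w.name = '$wu'"
  done
  if [ "$first" -eq 0 ]; then echo "ORDER BY affinity;"; fi
}
```

For example, after sourcing the file in the project directory: ./get_top_energies.sh sample_results 10 | top_hits_sql | mysql -u boincadm -p autodockvina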