Revision 4 as of 2013-01-01 22:36:46
  • ValidationSummary (http://boinc.berkeley.edu/trac/wiki/ValidationSummary) in the BOINC Wiki - read late, if at all
  • ValidationSimple (http://boinc.berkeley.edu/trac/wiki/ValidationSimple) in the BOINC Wiki - for developing your own validator
  • ValidationLowLevel (http://boinc.berkeley.edu/trac/wiki/ValidationLowLevel) in the BOINC Wiki - for developing your own validator with more fine-grained access to internals
  • ProjectConfigFile (http://boinc.berkeley.edu/trac/wiki/ProjectConfigFile) in the BOINC Wiki

This page explains how to work with files submitted back from the volunteer-run BOINC clients.

1. Configure clients

The project should now be ready to run a couple of workunits. Please use any Linux BOINC client; the Debian boinc-client package will do. It needs to be Linux because our application has so far only been prepared for this OS. The project URL is $hosturl/$fileprojectname.

At this stage, from the user's perspective, this BOINC uppercase project should already function just like any other BOINC project. Get yourself an account and process a result.
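From the command line, the client can be attached to the project with boinccmd and its --project_attach operation, which takes the project URL and the account's authenticator (the account key, stored in the user table shown in the next section). The Python sketch below only assembles and runs that call. It assumes boinccmd from the boinc-client package can reach the local client (it may have to run in the client's data directory, or be given --passwd); the URL and key in the usage line are hypothetical placeholders.

```python
import shutil
import subprocess


def attach_command(project_url: str, authenticator: str) -> list[str]:
    """Build the boinccmd call that attaches the local client to a project."""
    return ["boinccmd", "--project_attach", project_url, authenticator]


def attach(project_url: str, authenticator: str) -> None:
    """Run the attach command if boinccmd is installed, otherwise just print it."""
    cmd = attach_command(project_url, authenticator)
    if shutil.which("boinccmd") is None:
        print("boinccmd not found, would run:", " ".join(cmd))
        return
    # Assumption: boinccmd is allowed to talk to the running client.
    subprocess.run(cmd, check=True)
```

Usage, with hypothetical values standing in for $hosturl/$fileprojectname and your own account key: attach("http://localhost/uppercase/", "YOUR_ACCOUNT_KEY").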

2. Treatment of results

2.1. Inspection of internal representation

Once the user has been created, the corresponding entry in the database looks like

mysql> select id,create_time,email_addr,name,authenticator,country,postal_code,url,donated from user;
 id | create_time | email_addr    | name | authenticator | country | postal_code | url  | donated
  1 |  1310218977 | your@addr.com |  you | ce5e2e50akle6 | Country | nnn         | NULL |    0

and once the first result has been computed, it shows up as

mysql> select id,create_time,workunitid,outcome,hostid,userid,name,cpu_time,app_version_num,appid
         from result;
id | create_time | workunitid |outcome|hostid|userid| name   | cpu_time |app_version_num|appid
 1 |  1310217514 |          1 |      1|    1 |    1 | test_0 | 19.94525 |           612 |   11

The workunitid points to the workunit, i.e. the compute challenge or job. When the job is executed multiple times, there will be multiple results for the same workunit:

mysql> select id,create_time,appid,name,need_validate from workunit;
 id | create_time | appid | name | need_validate
  1 |  1310213354 |    11 | test |             1

A need_validate value of 1 means that the workunit still has results waiting for validation, i.e. they are not yet trusted to be correct.
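The relations between these three tables can be explored without a MySQL server. The sketch below uses Python's built-in sqlite3 as a stand-in, holding only the columns and example rows shown above (the real BOINC schema has many more columns, and on the server you would query MySQL directly). It joins results to their workunit and user, lists workunits that still have need_validate set together with their number of results, and converts the create_time values, which are Unix timestamps, into readable UTC dates.

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in schema: only the columns shown above, not the real BOINC tables.
SCHEMA = """
CREATE TABLE user     (id INTEGER PRIMARY KEY, create_time INTEGER, name TEXT);
CREATE TABLE workunit (id INTEGER PRIMARY KEY, create_time INTEGER, appid INTEGER,
                       name TEXT, need_validate INTEGER);
CREATE TABLE result   (id INTEGER PRIMARY KEY, create_time INTEGER, workunitid INTEGER,
                       outcome INTEGER, hostid INTEGER, userid INTEGER, name TEXT,
                       cpu_time REAL);
"""


def example_db() -> sqlite3.Connection:
    """In-memory database filled with the example rows from the tables above."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO user VALUES (1, 1310218977, 'you')")
    db.execute("INSERT INTO workunit VALUES (1, 1310213354, 11, 'test', 1)")
    db.execute("INSERT INTO result VALUES (1, 1310217514, 1, 1, 1, 1, 'test_0', 19.94525)")
    return db


def results_with_users(db: sqlite3.Connection) -> list[tuple]:
    """(result name, workunit name, user name) for every result."""
    return db.execute(
        """SELECT r.name, w.name, u.name
             FROM result r
             JOIN workunit w ON r.workunitid = w.id
             JOIN user u ON r.userid = u.id
            ORDER BY r.id"""
    ).fetchall()


def pending_validation(db: sqlite3.Connection) -> list[tuple]:
    """Workunits still flagged need_validate, with their number of results."""
    return db.execute(
        """SELECT w.name, COUNT(r.id)
             FROM workunit w LEFT JOIN result r ON r.workunitid = w.id
            WHERE w.need_validate = 1
            GROUP BY w.id, w.name"""
    ).fetchall()


def utc(ts: int) -> str:
    """Render a create_time value (seconds since 1970, UTC) as a date string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```

With the example rows, the user's create_time 1310218977 corresponds to 2011-07-09 13:42:57 UTC.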

3. Validation

Validation is nicely described on the page ValidationIntro of the BOINC Wiki. It refers to a tool that decides whether a result can be trusted. Typically this is done by replication: two independent runs by different volunteers should give the same results. The difficulty is not a correct result that gets rejected; you just recompute it. What is of concern is the opposite case, a wrong result that is accepted as a good one.
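Replication can be pictured as a vote over the returned output files. The Python below is a simplified illustration, not BOINC's validator code. It hashes each result's output, and if at least min_quorum results agree (min_quorum being the name of the corresponding BOINC workunit parameter), it returns one of them as the canonical result; otherwise it returns None, meaning more results are needed. The result names and the bitwise notion of "agree" are assumptions for the example.

```python
import hashlib
from collections import defaultdict
from typing import Optional


def canonical_result(outputs: dict[str, bytes], min_quorum: int = 2) -> Optional[str]:
    """Return a result name whose output is shared by at least min_quorum results.

    outputs maps result names (e.g. "test_0") to the bytes of their output file.
    Simplification: outputs count as equal only if they are bitwise identical.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in sorted(outputs.items()):
        groups[hashlib.sha256(data).hexdigest()].append(name)
    for names in groups.values():
        if len(names) >= min_quorum:
            return names[0]  # alphabetically first of the agreeing results
    return None
```

A single disagreeing result then simply delays the decision until a further replica arrives, which is why recomputing is the cheap case.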

Here we aim not at a public internet project but at projects for local groups, so we do without validation. Whether this is acceptable depends on your project. You take the following security measures instead:

  • a single wrong result that slips through is not too dramatic, because the analysis has some redundancy, say by identifying clusters of results
  • you invite colleagues, students and their families to contribute, so the scientific value becomes evident to everyone and one feels connected to the research
  • cheating should not be too difficult, hence not technically interesting
  • you don't give credit for the results, which reduces the evil doers to those with technical problems
  • you compare individual results with your own calculations
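The last measure can be automated for this example project. Assuming, as the project's name suggests, that the upper_case application turns its input text into upper case, a returned output file can be checked against a local recomputation. The function names and the exact transformation are assumptions for illustration; adapt reference_output to what your application really computes.

```python
from pathlib import Path


def reference_output(input_text: str) -> str:
    # Assumption: the upper_case application simply upper-cases its input.
    return input_text.upper()


def spot_check(input_file: Path, output_file: Path) -> bool:
    """Compare a returned result file with our own computation of it."""
    expected = reference_output(input_file.read_text())
    return output_file.read_text() == expected
```

Running this on a random sample of returned results is a cheap substitute for full replication in a small, trusted volunteer group.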

The initial make_project put two validators into your project's bin folder: bin/sample_trivial_validator and bin/sample_bitwise_validator. We have not seen it yet, but make_project also generated a config.xml file for your project:

$ cat $installroot/$fileprojectname/config.xml
<?xml version="1.0" ?>
      <cmd>feeder -d 3 </cmd>
      <cmd>transitioner -d 3 </cmd>
      <cmd>file_deleter -d 3 </cmd>

For a validation of our results, we now need to add our favorite validator to the list of daemons. For our example application we expect bit-identity between result files computed from the same input. Applications with a stochastic component, like molecular dynamics, may need something more problem-sensitive.
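The decisions of the two sample validators can be mimicked in a few lines to make the difference concrete. This is an illustrative Python sketch, not BOINC's C++ code: the trivial check accepts everything and the bitwise check requires byte-identical files. The tolerance check is a hypothetical example of a looser, problem-specific comparison, assuming output files that contain whitespace-separated numbers.

```python
import filecmp
import math
from pathlib import Path


def trivial_check(a: Path, b: Path) -> bool:
    # Like sample_trivial_validator: every result is accepted.
    return True


def bitwise_check(a: Path, b: Path) -> bool:
    # Like sample_bitwise_validator: outputs must be byte-for-byte identical.
    return filecmp.cmp(a, b, shallow=False)


def tolerant_check(a: Path, b: Path, rel_tol: float = 1e-6) -> bool:
    # Hypothetical looser check: same count of numbers, each pair close enough.
    xs = [float(t) for t in a.read_text().split()]
    ys = [float(t) for t in b.read_text().split()]
    return len(xs) == len(ys) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(xs, ys)
    )
```

Which comparison is right depends entirely on the science: a tolerance that is too loose accepts wrong results, one that is too strict rejects good ones.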

For a start, we suggest the sample_trivial_validator, which accepts every result as valid. This reduces complexity for now, and your very first project is unlikely to profit much from an investment in this corner.

To integrate the validator with the project, you do not call it directly. Well, you could, actually; just make sure you are running as the right user, otherwise you cannot read the files:

/usr/lib/boinc-server-maker/bin/sample_trivial_validator -app upper_case --one_pass -d 4

The workunit that needed validation is then assigned its credit, and so are the users. Since one does not want to run this manually too often, one can either create a cron job or use the daemon facility that BOINC provides in the config.xml file. The daemon entry may look like

         sample_bitwise_validator -app upper_case --sleep_interval 600 -d 3

Every 10 minutes should be sufficient.
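Instead of editing config.xml by hand, the daemon entry can also be added programmatically. The sketch assumes the usual layout of a BOINC project config.xml, where a <daemons> element holds one <daemon> element with a <cmd> child per daemon (the <cmd> lines shown earlier sit inside such elements). It works on the XML text, so you decide when to write it back; do so while the project is stopped and keep a backup, since ElementTree does not preserve comments.

```python
import xml.etree.ElementTree as ET


def add_daemon(config_xml: str, cmd: str) -> str:
    """Return config_xml with a <daemon><cmd>cmd</cmd></daemon> entry added.

    Assumes the BOINC layout <boinc> ... <daemons><daemon><cmd>...</cmd></daemon>
    ... </daemons> ... </boinc>. An existing identical command is not duplicated.
    """
    root = ET.fromstring(config_xml)
    daemons = root.find("daemons")
    if daemons is None:
        daemons = ET.SubElement(root, "daemons")
    existing = [(c.text or "").strip() for c in daemons.iterfind("daemon/cmd")]
    if cmd.strip() not in existing:
        daemon = ET.SubElement(daemons, "daemon")
        ET.SubElement(daemon, "cmd").text = cmd
    return ET.tostring(root, encoding="unicode")
```

Because the function is idempotent, it can be run from a setup script without piling up duplicate daemon entries.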

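The -d argument sets the debug level of a daemon's log output, from 1 (critical messages only) to 4 (most detailed). With -d 4, as in the manual runs, even the SQL queries are logged, as the assimilator output in the next section shows.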

4. Access to result files

The output template described earlier specifies which files are expected to be created. Those files are shipped back and can then be prepared for subsequent scrutiny, which is explained neatly on the page AssimilateIntro in the BOINC Wiki. For projects whose results are all successfully validated and that have only one file per workunit, there is basically nothing to do: one moves the right file (out of many, for complicated projects) into some result area that is then archived. For our purpose, the assimilator could just be "mv", the UNIX file mover. However, the assimilator also has to tell the server that the workunit has been assimilated, so one takes the sample_assimilator. The files it picks up are found in their original location

$ find $installroot/$fileprojectname/upload/

They look as expected:

$ cat upload/ba/test_0_0

The sample_assimilator copies them into a new directory, since in the upload directory they would otherwise be removed by the file_deleter daemon. With some deeper insight, an assimilator can do additional things, such as ringing a bell for exceptionally nice results, or discarding a result instead of copying it when it is too bad to be true. The sample_assimilator names the files after their workunit, which spares the programmer from having to think about a unique namespace. Again, one specifies a daemon section, and, in complete analogy to the validator, one can also run it once by hand:

$ /usr/lib/boinc-server-maker/bin/sample_assimilator --app upper_case --one_pass -d 4
2011-08-10 02:43:07.3138  Starting
query: select * from app where name='upper_case'
query: select * from workunit where appid=11 and assimilate_state=1  limit 1000
2011-08-10 02:43:07.3234 [debug] [test] assimilating WU 1; state=1
query: select * from result where workunitid=1
query: update workunit set assimilate_state=2, transition_time=1312936987 where id=1
query: COMMIT
2011-08-10 02:43:07.3502  Assimilated 1 workunits.
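The log shows the database side of one pass: select the application's workunits with assimilate_state=1, handle them, then set assimilate_state=2 and a new transition_time. Below is a rough Python model of that loop, again with sqlite3 standing in for MySQL and only the columns it needs. The state values are taken from the log; the handle callback is a hypothetical placeholder for whatever your assimilator does with the files.

```python
import sqlite3
import time
from typing import Callable


def assimilate_pass(db: sqlite3.Connection, appid: int,
                    handle: Callable[[int, str], None]) -> int:
    """One pass over workunits ready for assimilation; returns how many were done."""
    ready = db.execute(
        "SELECT id, name FROM workunit WHERE appid = ? AND assimilate_state = 1 LIMIT 1000",
        (appid,),
    ).fetchall()
    for wu_id, wu_name in ready:
        handle(wu_id, wu_name)  # hypothetical hook, e.g. copy the output file away
        db.execute(
            "UPDATE workunit SET assimilate_state = 2, transition_time = ? WHERE id = ?",
            (int(time.time()), wu_id),
        )
    db.commit()
    return len(ready)
```

A second pass finds nothing to do, which is why the daemon can safely wake up every few minutes.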

The entry for config.xml is

         sample_assimilator --app upper_case --sleep_interval 610 -d 3

and the completed workunits are then found in

$ find sample_results -type f

and indeed

$ cat sample_results/test

where "test" is the name of the workunit, as can be seen in the workunit table above. Please choose better workunit names than shown here; they should carry some meaning. The system will not touch the files in this directory, since their workunits are completed. Anyway, make backups.
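The file side of the sample_assimilator can be modelled as well. This sketch is an approximation, not BOINC's code. It assumes, as in the example above, that output files are named <result name>_<n> and result names <workunit name>_<n> (test_0_0 for workunit test). It searches the subdirectories of upload/ for them and copies the first match to sample_results/<workunit name>; taking the first match stands in for the canonical result, which the real assimilator looks up in the database.

```python
import re
import shutil
from pathlib import Path
from typing import Optional


def find_output_files(upload_dir: Path, wu_name: str) -> list[Path]:
    """Uploaded files belonging to workunit wu_name (naming scheme assumed)."""
    # <wu name>_<result index>_<file index>; fullmatch keeps "test" from matching "test_1_0_0".
    pattern = re.compile(re.escape(wu_name) + r"_\d+_\d+")
    return sorted(p for p in upload_dir.rglob("*") if p.is_file() and pattern.fullmatch(p.name))


def assimilate_files(upload_dir: Path, results_dir: Path, wu_name: str) -> Optional[Path]:
    """Copy one output file of the workunit to results_dir/wu_name."""
    files = find_output_files(upload_dir, wu_name)
    if not files:
        return None
    results_dir.mkdir(parents=True, exist_ok=True)
    target = results_dir / wu_name
    shutil.copy2(files[0], target)  # copy, not move: file_deleter cleans up upload/
    return target
```

Copying rather than moving leaves the cleanup of the upload directory to the file_deleter, as in the sample_assimilator.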

5. References
