If a domU crashes or freezes while uttering the famous last words 'clocksource/0: Time went backwards', the domU is likely using the xen clocksource instead of its own clock ticks. In practice, this seems to be the cause of infrequent lockups under load (and/or problems with suspending).
A workaround is to decouple the clock in the domU from the dom0:
In /etc/sysctl.conf on both the dom0 and the domU, add the line: xen.independent_wallclock=1. On the dom0, edit the configuration file of the domU (e.g. /etc/xen/foobar.cfg) and add (or extend) the extra line: extra="clocksource=jiffies".
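To make the sysctl.conf edit repeatable without adding the line twice, something along these lines could be used. This is only a sketch: ensure_line is a hypothetical helper name, not part of Xen or Debian, and it assumes the file already ends with a newline.

```shell
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
# (ensure_line is a hypothetical helper, not part of any Xen tooling.)
ensure_line() {
    file=$1
    line=$2
    # -x: match the whole line, -F: fixed string, -q: no output
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage on the dom0 and on each domU (as root):
#   ensure_line /etc/sysctl.conf 'xen.independent_wallclock=1'
#
# The matching line in the domU configuration on the dom0,
# e.g. /etc/xen/foobar.cfg, would read:
#   extra = "clocksource=jiffies"
```

The function only defines the helper; nothing is changed until it is called as shown in the usage comment.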
These settings can be activated without rebooting the domU. After editing the configuration files, run sysctl -p and echo "jiffies" > /sys/devices/system/clocksource/clocksource0/current_clocksource at the domU prompt.
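Before writing to current_clocksource it can help to check that the kernel actually offers the requested clocksource; the sysfs file available_clocksource in the same directory lists the candidates. A minimal sketch, where switch_clocksource is a hypothetical helper name and the optional second argument exists only so the function can be tried against a scratch directory:

```shell
#!/bin/sh
# switch_clocksource NAME [DIR]: select NAME if the kernel lists it as available.
# Hypothetical helper; DIR defaults to the standard sysfs location.
switch_clocksource() {
    name=$1
    dir=${2:-/sys/devices/system/clocksource/clocksource0}
    # available_clocksource holds a space-separated list of names
    for candidate in $(cat "$dir/available_clocksource"); do
        if [ "$candidate" = "$name" ]; then
            printf '%s\n' "$name" > "$dir/current_clocksource"
            return 0
        fi
    done
    echo "clocksource '$name' not available" >&2
    return 1
}

# Usage on the domU (as root), after editing /etc/sysctl.conf:
#   sysctl -p
#   switch_clocksource jiffies
```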
Because the clock won't be relying on the dom0 clock anymore, you probably need to use ntp on the domU to synchronize it properly to the world.
Another possibility is to use the behaviour of the previous xen-kernel settings: clocksource=jiffies and independent_wallclock=0.
Setting clocksource=jiffies as a kernel parameter for the dom0 and each domU has eliminated the "Time went backwards" messages for me (14 dom0s and 27 domUs running stable for two weeks). You can check the values with
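One way to inspect both settings is sketched below. It is an assumption, not necessarily the command the author had in mind: it reads the current clocksource from sysfs and the wallclock setting from /proc/sys/xen/independent_wallclock, which is where the xen.independent_wallclock sysctl is exposed. show_clock_settings is a hypothetical helper name.

```shell
#!/bin/sh
# show_clock_settings [CLOCK_DIR] [WALLCLOCK_FILE]: print both settings.
# Hypothetical helper; the defaults are the usual sysfs and procfs paths.
show_clock_settings() {
    clock_dir=${1:-/sys/devices/system/clocksource/clocksource0}
    wallclock=${2:-/proc/sys/xen/independent_wallclock}
    printf 'clocksource=%s\n' "$(cat "$clock_dir/current_clocksource")"
    printf 'independent_wallclock=%s\n' "$(cat "$wallclock")"
}

# Usage in the dom0 or in a domU:
#   show_clock_settings
```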
With these settings, ntp is only needed in the dom0. If you change the time in a domU while ntp is running on the corresponding dom0, the time in the domU will be corrected within a few minutes. Note: I didn't manage to influence the time of the domU by setting the time in the dom0 with date or hwclock; nevertheless, ntp seems to do this (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534978#29).
There are cases where setting the clocksource to jiffies just makes the clock more unstable and leads to continuous resets. A working solution appears to be the following:
- set independent_wallclock to 0 (all domains; VMs will follow dom0's clock)
- set clocksource to xen (it's the default in lenny)
- configure ntpd in dom0 only; set "disable kernel" in ntp.conf
This succeeded in stabilizing a Xen server's clock where all other workarounds failed.
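To verify that a host matches the three settings listed above, a small check like the following could be run. It is a sketch under assumptions: check_recipe is a hypothetical helper name, the default paths are the usual sysfs, procfs and ntp.conf locations, and on a domU (where ntpd is not configured) an empty string is passed as the third argument to skip the ntp.conf test.

```shell
#!/bin/sh
# check_recipe [CLOCK_DIR] [WALLCLOCK_FILE] [NTP_CONF]
# Print every setting that deviates from the recipe; return 1 if any does.
# Hypothetical helper; pass "" as NTP_CONF to skip the ntp.conf check.
check_recipe() {
    clock_dir=${1:-/sys/devices/system/clocksource/clocksource0}
    wallclock=${2:-/proc/sys/xen/independent_wallclock}
    ntp_conf=${3-/etc/ntp.conf}
    status=0

    if [ "$(cat "$wallclock")" != "0" ]; then
        echo "independent_wallclock is not 0"
        status=1
    fi
    if [ "$(cat "$clock_dir/current_clocksource")" != "xen" ]; then
        echo "clocksource is not xen"
        status=1
    fi
    # Accept "disable kernel" as well as e.g. "disable monitor kernel"
    if [ -n "$ntp_conf" ] && ! grep -Eq \
        '^[[:space:]]*disable[[:space:]]+([a-z]+[[:space:]]+)*kernel([[:space:]]|$)' \
        "$ntp_conf"; then
        echo "ntp.conf lacks 'disable kernel'"
        status=1
    fi
    return "$status"
}

# Usage: check_recipe            (dom0)
#        check_recipe "" "" ""   (domU: default paths, no ntp.conf check)
```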
More information can be found at http://my.opera.com/marcomarongiu/blog/2010/08/18/debugging-ntp-again-part-4-and-last. The whole debugging process is documented at http://my.opera.com/marcomarongiu/blog/index.dml/tag/Sysadmin