View Issue Details

IDProjectCategoryView StatusLast Update
0000743LDMud 3.5Runtimepublic2018-01-30 03:59
Reporter_xtian_ Assigned Tozesstra  
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionfixed 
Target Version3.5.0Fixed in Version3.5.0 
Summary0000743: 3.5.0-current skipping heart_beats
DescriptionA few days ago we switched from a several months old 3.5.0 to a current (march 2010) 3.5.0 and immediately players started complaining that timers (that were actually fired by their heart_beats) were noticeably delayed.

Measuring several objects heart_beat showed that about every 10% HBs of an object were delayed. Instead of a 2s delay, they had a 4s delay. This occured all the time - like I said easily for every 10% of HBs.

Switching back to an older build of 3.5.0 solved the problem.

If you can tell me where I can look-up the build number at runtime, I'll tell you which builds were involved.
TagsNo tags attached.

Activities

zesstra

2010-04-21 15:34

administrator   ~0001820

Concerning the 'look up revision at run-time': AFAIK it is not possible. The information should be in the debug file, but obviously does not help right now for older reversions.

Concerning the problem: interesting. ;-) I guess I will have to start with some testcase first (or do you have a portable one ready?). Do you have any further info about the circumstances or do you just need 100 objects with no code but a heart_heat() in them?

zesstra

2010-04-21 16:03

administrator   ~0001821

First try: I just cloned 300 objects with heartbeats and monitored if the HBs are skipped/delayed and the number of HB objects and processed HB objects in each cycle (via debug_info, DID_ST_HBEAT_OBJS, DID_ST_HBEAT_PROCESSED, DID_ST_HBEAT_AVG_PROC). So far I did not notice any problems, no skipped/delayed HBs on my machine at home.

_xtian_

2010-04-21 16:05

reporter   ~0001822

I just have an object that notifies me on every HB.
Hmm, what else can I tell you. I can reproduce it in our develmud (on the server) but not in my homemud. The Develmud has <10 object with HBs, but a lot more call-outs running (if that is relevant) than the homemud.
Or the server load somehow factors into this.

zesstra

2010-04-21 16:22

administrator   ~0001823

Mhmm, the server load can influence it, because the drivers signal to execute the heartbeats is alarm/ITIMER_REAL and that can be delayed (or even missed on heavy server loads). But usually that delays should not be in the range of seconds. (What average loads and peak loads do you have?) It may be interesting to find out if a the HBs in a limited number of objects are missed or the complete HB cycle is delayed.

_xtian_

2010-04-21 16:39

reporter   ~0001825

Last edited: 2010-04-21 16:48

The CPU never goes above 20%.
All my testobjects delay at the same time.

EDIT: Yep, approx 38 from 40 objects with HBs enabled delay at the same time. Looks like the whole queue.

_xtian_

2010-04-24 03:32

reporter   ~0001828

after bisecting: the regression appears in rev 2874

zesstra

2010-04-25 15:43

administrator   ~0001829

r2874? Are you sure? I read it again and still can't think of a reason, why this should introduce skipped heartbeats. It just synchronizes the alarms of the driver with full realtime seconds, making the first timer interval after bootup a little bit shorter.
(Out of curiosity: do you have any "Last alarm was x seconds ago - restarting it." messages in your debug log?)

zesstra

2010-04-26 08:53

administrator   ~0001830

Ah, another thing: could you please measure the time between your heartbeats with utime() instead of time()?

zesstra

2010-04-26 13:35

administrator   ~0001831

Ok, problem found, I guess. r2874 is not faulty in itself (the problem could occur without it as well), but increases the chance for a very unlucky timing:
r2874 tries to synchronize the time of alarms with whole seconds. The driver works with a granularity of seconds. Now consider: Alarm1 at 2.000001s. Alarm2 at 3.999999s. First thing: time() returns 2 and 3 at the time of the two alarms. The difference is < __HEART_BEAT_INTERVAL__ and the heartbeat is not executed, although _nearly_ __HEART_BEAT_INTERVAL__ has passed. Result: a heartbeat delayed by __ALARM_TIME__.
Second possibility: Alarm1 at 1.999999s, Alarm2 at 4.000001s. The heartbeat is executed properly, about 2s passed since the last one, but the LPC internal time() will return 1 and 4, which looks like a delayed heartbeat (by one second).

zesstra

2010-04-26 14:15

administrator   ~0001832

Ok, I changed the way current_time is determined in r2902. Could you please test in your mud if it solves your problems?

zesstra

2010-04-26 14:41

administrator   ~0001834

Xtian reported success, so I am closing. In case of problems, please tell me to re-open.

zesstra

2010-11-16 09:42

administrator   ~0001910

Fix committed to master branch.

zesstra

2018-01-29 18:59

administrator   ~0002340

Fix committed in revision 06ea2820f6c36c54f5766f51fde1c6f9bd0c53bf to master branch (see changeset 1640 for details). Thank you for reporting!

zesstra

2018-01-29 21:57

administrator   ~0002391

Fix committed in revision 06ea2820f6c36c54f5766f51fde1c6f9bd0c53bf to master branch (see changeset 2972 for details). Thank you for reporting!

zesstra

2018-01-30 03:59

administrator   ~0002442

Fix committed in revision 06ea2820f6c36c54f5766f51fde1c6f9bd0c53bf to master branch (see changeset 4053 for details). Thank you for reporting!

Issue History

Date Modified Username Field Change
2010-04-04 02:39 _xtian_ New Issue
2010-04-21 15:34 zesstra Note Added: 0001820
2010-04-21 16:00 zesstra Project LDMud => LDMud 3.5
2010-04-21 16:03 zesstra Note Added: 0001821
2010-04-21 16:05 _xtian_ Note Added: 0001822
2010-04-21 16:22 zesstra Note Added: 0001823
2010-04-21 16:39 _xtian_ Note Added: 0001825
2010-04-21 16:48 _xtian_ Note Edited: 0001825
2010-04-24 03:32 _xtian_ Note Added: 0001828
2010-04-25 15:43 zesstra Note Added: 0001829
2010-04-26 08:53 zesstra Note Added: 0001830
2010-04-26 13:35 zesstra Note Added: 0001831
2010-04-26 14:10 zesstra Status new => assigned
2010-04-26 14:10 zesstra Assigned To => zesstra
2010-04-26 14:15 zesstra Note Added: 0001832
2010-04-26 14:15 zesstra Status assigned => feedback
2010-04-26 14:41 zesstra Note Added: 0001834
2010-04-26 14:41 zesstra Status feedback => resolved
2010-04-26 14:41 zesstra Fixed in Version => 3.5.0
2010-04-26 14:41 zesstra Resolution open => fixed
2010-11-16 09:42 zesstra Source_changeset_attached => ldmud.git master 06ea2820
2010-11-16 09:42 zesstra Note Added: 0001910
2011-02-13 21:38 zesstra Target Version => 3.5.0
2018-01-29 18:59 zesstra Source_changeset_attached => ldmud.git master 06ea2820
2018-01-29 18:59 zesstra Note Added: 0002340
2018-01-29 21:57 zesstra Source_changeset_attached => ldmud.git master 06ea2820
2018-01-29 21:57 zesstra Note Added: 0002391
2018-01-30 03:59 zesstra Source_changeset_attached => ldmud.git master 06ea2820
2018-01-30 03:59 zesstra Note Added: 0002442