Problem: Computations restarting from zero each time computer is restarted.


Atanu Maulik
Joined: Dec 2, 2007
Posts: 2
ID: 11263
Credit: 206,112
RAC: 42
Message 2264 - Posted 13 Sep 2008 15:22:42 UTC

I find that each time I restart the computer, the computation of unfinished tasks restarts from zero. Even if 90% of a particular task is completed when I shut down the computer, on restart the same workunit starts over as if it were new (showing 0% in the progress column). That way valuable computing time is lost. Please help.

m.somers
Forum moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: Nov 14, 2005
Posts: 662
ID: 1
Credit: 1,417,572
RAC: 2
Message 2283 - Posted 30 Sep 2008 7:17:14 UTC

Checkpointing some of the trajtou apps is a non-trivial task that still has to be done... The tasks are also rather short compared to the classical WUs, which makes checkpointing less pressing. When I have some spare time, I'll investigate this...

m.

____________
M.F. Somers

Grutte Pier [Wa Oars]~GP500
Joined: Oct 30, 2010
Posts: 3
ID: 27291
Credit: 849,606
RAC: 292
Message 2760 - Posted 14 Aug 2011 9:21:53 UTC

Still irrelevant, it seems, because it is still happening.

After 98% and 1:20 h it restarts at 0%.

That's a real downer to crunch for a project.

FIX IT !!!!!!

m.somers
Forum moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: Nov 14, 2005
Posts: 662
ID: 1
Credit: 1,417,572
RAC: 2
Message 2761 - Posted 15 Aug 2011 5:44:02 UTC

Checkpointing has been implemented in those apps where it was possible. In some trajtou apps it was not possible to implement it (due to the nature of the old f77 code being f2c-ed). We made sure that the loss for those WUs stays within a 'reasonable' time: only a matter of an hour or so, not days. These old f77 codes were never designed to be used in BOINC or in any other way than in the 'I need my results now and fast because I need papers to be published and who cares about the next guy' spirit often encountered nowadays in science.


m.


____________
M.F. Somers
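
For readers wondering what "checkpointing" means in practice, here is a minimal sketch in Python. It is not the Leiden Classical or trajtou code: the function names, the JSON state file, and the toy "trajectory step" are all made up for illustration. A real BOINC app would normally use the BOINC API for this (for example boinc_time_to_checkpoint() and boinc_checkpoint_completed()), which this sketch does not call. The idea is simply to save the loop state every so often, and on start-up to resume from the last saved state instead of from zero.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no usable checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"step": 0, "total": 0.0}


def save_checkpoint(path, state):
    """Write to a temp file first, then rename it over the old checkpoint.

    The rename is atomic, so a shutdown in the middle of a save leaves
    the previous checkpoint intact instead of a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(path, n_steps, checkpoint_every=100, stop_after=None):
    """Hypothetical work loop: resumes from the checkpoint at `path`.

    `stop_after` simulates the computer being shut down after that many
    steps in this session; in that case the function returns None.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated shutdown: work since last save is lost
        # Stand-in for one step of the real computation.
        state["total"] += state["step"] * 0.5
        state["step"] += 1
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

With this pattern, a shutdown only costs the work done since the last save. The hard part in an old f77 code translated with f2c is usually the first step, collecting all the state that has to be saved, which is presumably why it could not be retrofitted everywhere.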

Grutte Pier [Wa Oars]~GP500
Joined: Oct 30, 2010
Posts: 3
ID: 27291
Credit: 849,606
RAC: 292
Message 2762 - Posted 16 Aug 2011 7:53:52 UTC

Too bad, thx for the reply.

What share do these ugly WUs take up in the trajtou WU bunch?
____________



Copyright © 2017 Leiden University - Leiden Institute of Chemistry - Theoretical Chemistry Department