Work unit claimed 22.12 credits, granted 0.00 "Too many total results"


SandJ
Joined: Jan 4, 2007
Posts: 11
ID: 2851
Credit: 214,196
RAC: 134
Message 2912 - Posted 21 Jan 2014 19:10:41 UTC
Last modified: 21 Jan 2014 19:11:00 UTC

Work Unit 20269812 has been granted no credits because it says "Too many total results".

Of the 15 failed results, 13 were all by the same computer over a period of 2½ months. They are the only results submitted by that computer.

Isn't that very odd?
____________

SandJ
Message 2913 - Posted 21 Jan 2014 19:13:27 UTC

Replying to my own query, two of the other failed results for that work unit are also from one computer but 20 days apart.
____________

SandJ
Message 2914 - Posted 23 Jan 2014 22:28:44 UTC

I just knew I should have copy 'n' pasted the screen dumps of the above. :-( Now my evidence has expired.
____________

m.somers
Forum moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: Nov 14, 2005
Posts: 662
ID: 1
Credit: 1,417,572
RAC: 2
Message 2915 - Posted 28 Jan 2014 8:19:25 UTC

Hi,

No, this is not so 'odd'; it should not happen often, but it is possible. Let me explain: because we do classical trajectories, which are very sensitive to the precise starting conditions and floating-point capabilities of CPUs, we use a thing called homogeneous redundancy. This means that work is sent out, and the copy of the workunit used to check it is afterwards sent to any computer with the same CPU. This could be the same computer, though, if there is only one computer with a very special OS (e.g. BSD) running a less commonly used CPU. If that computer fails, all these workunits will fail on that computer. This project has a limit of 16 attempts before it stops. That's exactly what you have seen.

m.



____________
M.F. Somers
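
Editor's sketch of the explanation above, as a toy model rather than the project's actual scheduler code. It assumes a quorum of two matching results and hosts that either always succeed or always fail; the names `Host`, `hr_class` and `run_work_unit` are hypothetical. Only the limit of 16 attempts comes from the post.

```python
from dataclasses import dataclass

MAX_TOTAL_RESULTS = 16  # attempt limit quoted in the post above
QUORUM = 2              # assumption: two successful results validate a work unit


@dataclass(frozen=True)
class Host:
    name: str
    hr_class: str   # hypothetical label for a CPU/OS combination
    reliable: bool  # toy assumption: a host always succeeds or always fails


def run_work_unit(hr_class, requests):
    """Hand out copies of one work unit to hosts as they ask for work.

    hr_class is the class the work unit was locked to by its first copy.
    requests is the sequence of hosts asking for work, in order.
    Returns (status, number_of_copies_sent).
    """
    attempts = 0
    successes = 0
    for host in requests:
        if host.hr_class != hr_class:
            continue  # homogeneous redundancy: other classes never get a copy
        attempts += 1
        if host.reliable:
            successes += 1
        if successes >= QUORUM:
            return "validated", attempts
        if attempts >= MAX_TOTAL_RESULTS:
            return "too many total results", attempts
    return "waiting", attempts


if __name__ == "__main__":
    good = Host("good", "rare-bsd", True)
    bad = Host("bad", "rare-bsd", False)
    other = Host("other", "common-linux", True)
    # One success followed by fifteen failures from the only other eligible host.
    print(run_work_unit("rare-bsd", [good] + [bad] * 15 + [other]))
```

In this model, one good result followed by fifteen failures from the only other host in the same class uses up all 16 attempts, so the work unit ends as "too many total results" even though a host in a different class was available and reliable.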

SandJ
Message 2916 - Posted 28 Jan 2014 19:47:52 UTC - in response to Message ID 2915.

m.somers wrote:
"No, this is not so 'odd'; it should not happen often, but it is possible. Let me explain: because we do classical trajectories, which are very sensitive to the precise starting conditions and floating-point capabilities of CPUs, we use a thing called homogeneous redundancy. This means that work is sent out, and the copy of the workunit used to check it is afterwards sent to any computer with the same CPU. This could be the same computer, though, if there is only one computer with a very special OS (e.g. BSD) running a less commonly used CPU. If that computer fails, all these workunits will fail on that computer. This project has a limit of 16 attempts before it stops. That's exactly what you have seen."

Even though one computer successfully completed the packet? You threw away a completed result. And gave no credits for it.

Why didn't my computer ever receive that packet again, why just the one that failed?

Presumably I should stop this particular computer of mine from processing Leiden Classical packets since its wingmen will fail them?
____________

SandJ
Message 2917 - Posted 28 Jan 2014 20:55:39 UTC - in response to Message ID 2916.
Last modified: 28 Jan 2014 20:56:11 UTC

I see I have another work unit: http://boinc.gorlaeus.net/workunit.php?wuid=20965326 which may go the same way. One PC has processed it twice and failed, and that PC is another serial failer.

Can you not prevent a given PC receiving the same packet twice? I thought that happened automatically in these BOINC-based applications?
____________

m.somers
Message 2918 - Posted 31 Jan 2014 9:03:57 UTC

Yes, it is possible to allow only a single work unit of a set on a single host. However, we ran into some problems with that in the past, which made this project shift to the 'only a single workunit of the set at a time on a host' method. What happened was that, due to the detailed homogeneous redundancy we use, it was possible (and it actually happened too) that only a single host of a specific type of CPU and OS was available (BSD hosts are rare). With the default option you suggest, the database filled up with results that were waiting to be matched but obviously couldn't be.

m.
____________
M.F. Somers
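
Editor's sketch contrasting the two policies described above, again as a toy model and not the project's server code. The policy names `once_ever` and `one_at_a_time`, the function `simulate`, and the quorum of two are assumptions made for illustration; every host in this model returns a successful result before it asks for work again.

```python
def simulate(policy, hosts, quorum=2, rounds=5):
    """Offer one work unit to the eligible hosts for a few rounds.

    policy "once_ever": a host may receive at most one copy of the work unit.
    policy "one_at_a_time": a host may hold at most one copy at any moment,
    but may receive another after returning the previous one.
    hosts are the names of the hosts in the work unit's CPU/OS class.
    """
    ever_sent = set()
    in_progress = set()
    returned = 0
    for _ in range(rounds):
        for host in hosts:
            if policy == "once_ever":
                allowed = host not in ever_sent
            elif policy == "one_at_a_time":
                allowed = host not in in_progress
            else:
                raise ValueError(f"unknown policy: {policy}")
            if not allowed:
                continue
            ever_sent.add(host)
            in_progress.add(host)
            # Toy assumption: the host finishes and reports before its next request.
            in_progress.discard(host)
            returned += 1
            if returned >= quorum:
                return "validated"
    return "stuck waiting"


if __name__ == "__main__":
    lone = ["lone-bsd-host"]
    print(simulate("once_ever", lone))      # the single eligible host can never be matched
    print(simulate("one_at_a_time", lone))  # the same host supplies both results
```

With a single eligible host, the first policy leaves the result waiting for a match that cannot arrive, which is the database build-up described above. The second policy avoids that, at the cost that a failing host can receive the same work unit repeatedly.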


Copyright © 2017 Leiden University - Leiden Institute of Chemistry - Theoretical Chemistry Department