Bug #44

Requesting the computer list from the master is too slow

Added by Redmine Admin over 10 years ago. Updated almost 8 years ago.

Status: In progress
Start date:
Priority: Normal
Due date:
Assignee: Andreas Schröder
% Done: 0%
Category: all
Target version: 0.64.5

Description

I have tested this with the drqman interface, the compinfo program, and the python bindings, with similar results. Requesting the computer list is very slow, and the delay depends heavily on the number of computers. With three computers it can take less than a second, but with 50 computers you have to wait 15 seconds for the response.

Repro steps:
  • Configure a queue of N computers, with N > 10
  • time compinfo -l

History

#1 Updated by Andreas Schröder almost 10 years ago

Sorry, but we don't have such a big render farm available here.

#2 Updated by Andreas Schröder almost 10 years ago

Please give us more debugging information. Run compinfo with an execution time analysis tool.

#3 Updated by Alistair Leslie-Hughes about 8 years ago

The way the data is sent across the network would be the cause of the issue. It appears to send multiple packets for each slave connected to the master. A better approach is to send all the computer information in one go.

Possible Solutions
1. For each slave, send all of its data in one packet.
2. Send the data for all slaves in one packet.
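
The second option could look roughly like the sketch below. It is only an illustration in Python, not DrQueue's actual wire format: the record layout (hostname, CPU count, load) and the helper names pack_computers and unpack_computers are made up for the example. The idea is that every computer record goes into one length-prefixed buffer, so the client does a single read per request instead of several per slave.

```python
import struct


# Hypothetical record: (hostname, cpu count, load average * 100).
# This layout is an assumption for illustration, not DrQueue's protocol.
def pack_computers(computers):
    """Serialize all computer records into a single payload."""
    parts = [struct.pack("!I", len(computers))]  # number of records
    for name, ncpus, load in computers:
        raw = name.encode("utf-8")
        parts.append(struct.pack("!H", len(raw)))  # hostname length
        parts.append(raw)
        parts.append(struct.pack("!HH", ncpus, load))
    body = b"".join(parts)
    # Prefix with the body length so the receiver knows how much to read.
    return struct.pack("!I", len(body)) + body


def unpack_computers(payload):
    """Inverse of pack_computers: decode one payload into a list of records."""
    (body_len,) = struct.unpack_from("!I", payload, 0)
    offset = 4
    end = offset + body_len
    (count,) = struct.unpack_from("!I", payload, offset)
    offset += 4
    computers = []
    for _ in range(count):
        (name_len,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        name = payload[offset:offset + name_len].decode("utf-8")
        offset += name_len
        ncpus, load = struct.unpack_from("!HH", payload, offset)
        offset += 4
        computers.append((name, ncpus, load))
    if offset != end:
        raise ValueError("payload length does not match its header")
    return computers
```

With this shape the master writes the result of pack_computers once per request, and the number of network round trips no longer grows with the number of slaves.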

#4 Updated by Alistair Leslie-Hughes about 8 years ago

gethostbyname is the cause of the issue. DNS lookups are slow, and even slower when the record doesn't exist.
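
One way to check this on a given farm is to time each lookup separately. The sketch below uses Python's socket.gethostbyname as a stand-in for the C call; the helper name time_lookups and the injectable resolve parameter are assumptions for the example and are not part of DrQueue.

```python
import socket
import time


def time_lookups(hostnames, resolve=socket.gethostbyname):
    """Return {hostname: (address or None, seconds spent resolving)}."""
    results = {}
    for host in hostnames:
        start = time.perf_counter()
        try:
            address = resolve(host)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError;
            # a failed lookup is recorded as None together with its duration.
            address = None
        elapsed = time.perf_counter() - start
        results[host] = (address, elapsed)
    return results
```

Calling time_lookups with the hostnames of the slaves should show whether the total wait is dominated by a few names that fail to resolve.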

#5 Updated by Andreas Schröder almost 8 years ago

  • Assignee deleted (Redmine Admin)

The IP addresses of the master and the clients could be looked up once (slave: on startup; master: when a slave joins or leaves the renderfarm) and then stored, for example in global variables. Each connection would then use only the cached IP addresses, which avoids a DNS lookup every time.
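
A minimal sketch of that caching idea, written in Python for brevity. The class name AddressCache and its methods are hypothetical, and it keeps the addresses in an object rather than in global variables; the actual patch may be structured differently.

```python
import socket


class AddressCache:
    """Resolve each hostname once and reuse the stored address afterwards."""

    def __init__(self, resolve=socket.gethostbyname):
        self._resolve = resolve  # injectable so the cache can be tested offline
        self._addresses = {}

    def register(self, hostname):
        """Called when a slave joins: the only place a DNS lookup happens."""
        address = self._resolve(hostname)
        self._addresses[hostname] = address
        return address

    def unregister(self, hostname):
        """Called when a slave leaves the renderfarm."""
        self._addresses.pop(hostname, None)

    def lookup(self, hostname):
        """Return the cached address, or None if the host is not registered."""
        return self._addresses.get(hostname)
```

Under these assumptions the master would call register when a slave connects and lookup for every later request, so building the computer list triggers no further DNS queries.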

#6 Updated by Andreas Schröder almost 8 years ago

  • Status changed from New to Feedback
  • Assignee set to Andreas Schröder
  • Priority changed from Low to Normal

#7 Updated by Alistair Leslie-Hughes almost 8 years ago

The patch looks OK to me.

#9 Updated by Andreas Schröder almost 8 years ago

  • Status changed from Feedback to Fixed

#10 Updated by Ruben Lopez almost 8 years ago

I can still reproduce this issue:

$ time ./compinfo.Linux.x86_64 -l | wc -l
54

real    0m10.768s
user    0m0.008s
sys    0m0.034s

With drqman and the python bindings the results are similar. The job list, even with a much larger number of jobs, is returned much faster.

I don't know how to reopen this task myself. Can anyone reopen it?

Thanks.

#11 Updated by Andreas Schröder almost 8 years ago

  • Status changed from Fixed to In progress
