Requesting the computer list from the master is too slow
|Status:||In progress||Start date:|
|Assignee:||Andreas Schröder||% Done:|
I have tested this with the drqman interface, the compinfo program, and the Python bindings, with similar results. Requesting the computer list is very slow, and the delay depends heavily on the number of computers. With three computers it can take less than a second, but with 50 computers you have to wait 15 seconds for the response.
Repro steps:
- Configure a queue of N computers, with N > 10
- time compinfo -l
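To compare timings across different values of N, the measurement above can be wrapped in a small script. This is only a sketch: `time_command` is a hypothetical helper name, and it assumes the `compinfo` binary is on the PATH and prints its listing to stdout.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once and return (elapsed_seconds, number_of_stdout_lines)."""
    start = time.perf_counter()
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, len(result.stdout.splitlines())


# Example (assumes compinfo is installed and a master is reachable):
#   elapsed, lines = time_command(["compinfo", "-l"])
#   print(f"{lines} lines in {elapsed:.3f}s")
```

Running it against queues of different sizes should show whether the wall-clock time grows roughly linearly with the number of computers.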
#3 Updated by Alistair Leslie-Hughes almost 8 years ago
The way the data is sent across the network is probably the cause of the issue. The master appears to send multiple packets for each slave connected to it. A better approach would be to send the computer information in fewer packets, using one of two options:
1. For each Slave, send all its data in one packet.
2. Send all Slaves' data in one packet.
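The second option can be sketched as follows. This is an illustration only: the record layout (id, CPU count, name) and the function names are assumptions made for the example, not DrQueue's actual wire format or code.

```python
import struct

# Hypothetical record layout, NOT DrQueue's real protocol:
# 4-byte computer id, 2-byte CPU count, 2-byte name length, then the name.
_HEADER = struct.Struct("!IHH")
_COUNT = struct.Struct("!I")


def pack_computer_list(computers):
    """Serialize every (id, ncpus, name) tuple into a single buffer."""
    parts = [_COUNT.pack(len(computers))]
    for comp_id, ncpus, name in computers:
        raw = name.encode("utf-8")
        parts.append(_HEADER.pack(comp_id, ncpus, len(raw)))
        parts.append(raw)
    return b"".join(parts)


def unpack_computer_list(buf):
    """Inverse of pack_computer_list: rebuild the list of tuples."""
    (count,) = _COUNT.unpack_from(buf, 0)
    offset = _COUNT.size
    computers = []
    for _ in range(count):
        comp_id, ncpus, name_len = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        name = buf[offset:offset + name_len].decode("utf-8")
        offset += name_len
        computers.append((comp_id, ncpus, name))
    return computers
```

With this shape the master would build the buffer once and hand it to a single `sock.sendall(payload)` call, so the number of writes no longer grows with the number of slaves.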
#5 Updated by Andreas Schröder over 7 years ago
- Assignee deleted (
The IP addresses of the master and clients could be looked up once (slave: on startup; master: when a slave joins or leaves the renderfarm) and then stored, for example, in global variables. Each connection would then use only cached IP addresses, which avoids a DNS lookup every time.
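A minimal sketch of that caching idea is shown below. `AddressCache` is a hypothetical name for illustration and is not part of DrQueue; the resolver is injectable so the cache can be exercised without touching the network.

```python
import socket


class AddressCache:
    """Resolve each hostname once and reuse the result afterwards."""

    def __init__(self, resolver=socket.gethostbyname):
        self._resolver = resolver
        self._addresses = {}

    def lookup(self, hostname):
        # Only the first request for a hostname triggers a DNS lookup.
        if hostname not in self._addresses:
            self._addresses[hostname] = self._resolver(hostname)
        return self._addresses[hostname]

    def forget(self, hostname):
        # Drop the entry when a slave leaves, so a later rejoin is re-resolved.
        self._addresses.pop(hostname, None)
```

The master would call `lookup` when a slave registers and `forget` when it leaves, which matches the once-per-lifetime lookup described above.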
#6 Updated by Andreas Schröder over 7 years ago
- Status changed from New to Feedback
- Assignee set to Andreas Schröder
- Priority changed from Low to Normal
I've modified some things to make this possible:
Works for me so far.
#8 Updated by Andreas Schröder over 7 years ago
I fixed a bug in my code. With this commit it should be fine:
#10 Updated by Ruben Lopez over 7 years ago
I can still reproduce this issue:
$ time ./compinfo.Linux.x86_64 -l | wc -l
54
real 0m10.768s
user 0m0.008s
sys 0m0.034s
With drqman and the Python bindings the results are similar. The job list, even with a much larger number of jobs, is returned much faster.
I don't know how to reopen this task myself. Can anyone reopen it?