DrQueueIPython

Motivation (work in progress)

The present

  • DrQueue has a long history and many features, but it is showing its age. Because DrQueue doesn't depend on any other software component, it has to do everything on its own: daemon startup, network communication, task and job management, and so on.
  • The core is written in C and some of the tools in C++, which makes it harder to support different operating systems.
  • To connect other programming languages, SWIG is used to generate glue code for Python and Ruby. However, this approach tends to leak memory when data type conversions are not configured correctly.

The future?

  • Because a large part of the 3D programming and rendering scene uses Python, the script generators were ported from tcsh to Python some time ago. This was meant to make them easier for users to customize. If DrQueue were all Python code, it might also be much easier to reach other Python developers and for them to contribute to DrQueue.
  • To achieve what people want from DrQueue, there should be no need to reinvent the wheel. There are great libraries from other developers that take modern and efficient approaches. ZMQ (a library for lightweight message passing) and IPython (a framework for interactive and parallel Python programming) are two of them.
  • If you take out network communication (done by IPython/ZMQ) and task management (done by IPython), what is left is the general render cluster concept, the DrQueue Python module, and the DrQueue clients (command line, GUI).

Design changes

  • Master runs IPController
  • Slave runs IPEngine
  • Clients can use the IPython.parallel.Client class to talk to IPController
  • No compiling needed anymore. Just Python code.
  • ZMQ, PyZMQ, MongoDB, PyMongo and IPython become dependencies.
  • MongoDB is used to store information about tasks, jobs, pools, and so on.
  • There is no direct access to frame information anymore. Jobs are divided into tasks. Depending on the blocksize, one task can consist of one or more frames.
  • A high water mark (HWM) can be set for IPEngines to control how many tasks are queued to each engine, so that some tasks are always kept back for late-joining engines.
  • DrQueueIPython provides a Python module for easy access to the underlying technology. This makes integration into other software that uses Python possible.
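
As a rough illustration of the blocksize idea, the following sketch splits a frame range into tasks. The function name split_into_tasks and its arguments are hypothetical and not part of the DrQueue module; the sketch only shows how one task can end up covering one or more frames.

```python
def split_into_tasks(first_frame, last_frame, blocksize):
    """Return a list of (start, end) frame ranges, one per task.

    Both ends of each range are inclusive. The last task may be
    shorter than blocksize if the frame count is not a multiple of it.
    """
    if blocksize < 1:
        raise ValueError("blocksize must be at least 1")
    if last_frame < first_frame:
        raise ValueError("last_frame must not be smaller than first_frame")

    tasks = []
    start = first_frame
    while start <= last_frame:
        # a task never runs past the last frame of the job
        end = min(start + blocksize - 1, last_frame)
        tasks.append((start, end))
        start = end + 1
    return tasks


# frames 1..10 with a blocksize of 4 give three tasks
print(split_into_tasks(1, 10, 4))  # [(1, 4), (5, 8), (9, 10)]
```

With a blocksize of 1 every frame becomes its own task. Larger blocksizes mean fewer tasks, and so less scheduling overhead, at the cost of coarser distribution across engines.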

Requirements

Master node

  • Python >= 2.7
  • ZMQ >= 2.1.4
  • pyzmq >= 2.1.4
  • IPython >= 0.13.1
  • MongoDB >= 2.0
  • PyMongo >= 1.10
  • Samba, NFS, FUSE sshfs or other network filesystem server

Slave node

  • Python >= 2.7
  • ZMQ >= 2.1.4
  • pyzmq >= 2.1.4
  • IPython >= 0.13.1
  • Samba, NFS or other network filesystem client
  • render program (Blender, Maya, ...)

Features

  • Platform support (provided by IPython): Linux, Mac OS X, AIX, Solaris, xBSD, Windows (Cygwin, XP, Vista, etc.)
  • Scalable backend database: MongoDB

Integration into other software

  • Ruby clients (for example DrQueueOnRails) can use rubypython to work with the DrQueue Python module.
  • A Blender plugin is planned.

Workarounds for missing features / specialties in IPython

  • IPython has no concept of jobs (groups of tasks). So far we use the session name to store the job id. All tasks with the same session name belong to the same job.
  • The job owner is stored using the session username.
  • To send email notifications about finished jobs, a 'send_email' task is created as a dependent task and runs when the real job is finished.
  • Jobs can have specific dependencies (OS, minimum amount of RAM / CPU cores, pool membership).
  • Render slaves are started through a wrapper script which sets up logging and calls the ipengine program along with a startup script. The startup script collects information about the render slave and stores it in the database. That information is needed right at the start for checking task dependencies, because IPython starts giving tasks to slaves as soon as they are connected to the cluster.
  • Stored information about render slaves is updated through IPython's DirectView task interface after a specific cache time is exceeded. That information collection task can time out because a slave might be busy with a very long running task. In that case the old stored information is returned.
  • Pool membership is initialized through the DRQUEUE_POOL environment variable, so it can be set during the boot process of the render slave before the IPython engine process starts. Pool membership information is also stored in the database, so it can be accessed without having to use the DirectView task interface (which might block or time out).
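
The dependency and pool checks described above could look roughly like the following sketch. Only the DRQUEUE_POOL variable name comes from this page. The function names, the dictionary keys, and the assumption that DRQUEUE_POOL holds a comma-separated list of pool names are hypothetical and not taken from the DrQueue module.

```python
import os


def pools_from_environment(environ=None):
    """Return the pool names found in DRQUEUE_POOL.

    Assumption: several pools are separated by commas.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("DRQUEUE_POOL", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def slave_matches_job(slave, limits):
    """Check stored slave information against a job's dependencies.

    Both arguments are plain dicts with hypothetical keys. A key that
    is missing from `limits` means "no restriction".
    """
    if "os" in limits and slave["os"] != limits["os"]:
        return False
    if slave["memory_gb"] < limits.get("min_memory_gb", 0):
        return False
    if slave["cores"] < limits.get("min_cores", 0):
        return False
    if "pool" in limits and limits["pool"] not in slave["pools"]:
        return False
    return True


# information a startup script might collect about one render slave
slave = {
    "os": "Linux",
    "memory_gb": 16,
    "cores": 8,
    "pools": pools_from_environment({"DRQUEUE_POOL": "blender, maya"}),
}

print(slave_matches_job(slave, {"os": "Linux", "min_cores": 4, "pool": "maya"}))  # True
print(slave_matches_job(slave, {"min_memory_gb": 32}))  # False
```

Because the check works on plain stored data, it can run against the information kept in the database without contacting a slave that may be busy.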

Open questions

  • None at the moment. Add here.

Development

see Development

Setup and configuration

see SetupAndConfiguration

Deployment via Puppet

Startup

see Startup

User interfaces

Command line

drQT GUI

DrQueueOnRails webinterface

Use DrQueue Python module from Ruby

  • RubyPython makes Python libraries usable from Ruby.
  • Install RubyPython:
    sudo gem install rubypython
    
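  • Try to load the module in irb:
    require "rubypython"
    # start the embedded interpreter with the Python 2.7 executable
    RubyPython.start(:python_exe => "python2.7")
    # set sys.argv, which some Python modules expect to exist
    sys = RubyPython.import "sys"
    sys.argv = [""]
    # import the DrQueue Python module
    DrQueue = RubyPython.import("DrQueue")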