PHP jobs with Gearman and Supervisor

The problem

Often in a PHP project there could be operations that need to be executed asynchronously. Some example are: processing mail queues, indexing data, computation that requires long elaboration time.

A common behavior is handle those operations by using cron to execute processes in background. However, using cron requires expedients to avoid cross executions and forces us to implement some specific procedures and mechanism to store data needed to elaborate.

Solution: Gearman + Supervisor

The solution that involves Gearman and Supervisor, instead, don’t require any kind of data storage mechanism and supply a very simple way to develop processes in PHP.

From

Gearman homepage:

Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.

Moreover, to develop clients and workers for Gearman in PHP is very simple as you can see in this Gearman PHP documentation example.

On the other hand, as written on Supervisor homepage:

Supervisor, is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.

Actually, combination of Gearman and Supervisor creates a stable, flexible and scalable infrastructure to handle all our jobs. As shown in the following diagram, it provides us a client/server structure and cares about communication and (if we want) distribution. We only need to take care of code to complete our tasks.

Gearman stack

Gearman stack

Installation

I’m going to show how to install and setup Gearman and Supervisor in debian wheezy environment.

Normally, following packages should be present in a php ready system, but we want to be sure:

$ apt-get install apache2 php5 libapache2-mod-php5 php5-dev

Now we are ready to start Gearman installation. Required command is the following:

$ apt-get install gearman-job-server libgearman-dev

At this point, we need to install Gearman Pecl extension in order to use required PHP Classes. In a perfect World, we should simply install latest version, but (at the moment) latest stable Gearman Pecl requires a libgearman version newer then that included in debian repositories and, if we try to install it, we should recieve an error like this:

configure: error: libgearman version 1.1.0 or later required
ERROR: `/tmp/pear/temp/gearman/configure' failed

To resolve this issue, we could compile and install libgearman by ourself, or, as I prefer, install an older pecl extension version by running following commands:

$ apt-get install php-pear
$ pecl install gearman-1.0.3

Note: If you prefer compile libgearman yourself, you will easly find a way on Google.

To complete installation, following commands activate Gearman extension for Apache2:

$ echo "extension=gearman.so" > /etc/php5/conf.d/gearman.ini
$ service apache2 restart

At this point, we are ready to create our Gearman workers and clients as described in Gearman section of PHP documentation.

But we are not satisfied yet: we also want to handle our php processes in a smart way. We want to be able to control them, monitor them and specify things like: how many instances, priority, etc. In a few words, we want to run them through Supervisor.

So, let’s start to install it:

$ apt-get install supervisor

In debian, supervisor is a deamon and can be started/stopped using usual commands:

$ service supervisor start
$ service supervisor stop

Last step is configure our Gearman workers in Supervisor. To do that, we will create files in /etc/supervisor/conf.d/ that will contain something like this:

/etc/supervisor/conf.d/myprocess.conf:

[program:myprocess]
command=php /path/to/project/myprocess.php
numprocs=12
directory=/path/to/project/spool/myprocess/
autostart=true
autorestart=true
stdout_logfile=/path/to/project/log/myprocess.log
stdout_logfile_maxbytes=1MB
stderr_logfile=/path/to/project/log/myprocess.log
stderr_logfile_maxbytes=1MB

For a complete list of options, please see Supervisor documentation.

References:

PHP 

See also