CouchDB Replication Scheduler – tweaking and tuning

A few tweaking hints for the Replication Scheduler in CouchDB 2.x

Lately we’ve been experimenting a lot with CouchDB and its replication features.

It’s a very cool paradigm that lets you hide many layers of complexity related to data synchronisation between different systems behind an automated and almost-instant replication process.

Basically, CouchDB implements two kinds of replication: “one-shot” and “continuous”. In the first case a process starts, replicates an entire DB and then moves to a “Completed” state. In the second case an always-on process uses internal sequence numbers (something conceptually close to the journal log of a filesystem) to keep the target database continuously in sync with the source one.

When dealing with many databases and replication processes, it’s pretty easy to end up with many replication processes running on a single server. That can lead to slowness and, in general, a high load of (effectively idle) activity on the machine.

To avoid such circumstances CouchDB, since version 2.1, implements a replication scheduler that cycles through all the replication jobs in a roughly circular fashion, pausing and restarting them to avoid resource exhaustion.

The Replication Scheduler is controlled by a few tunable parameters (see http://docs.couchdb.org/en/stable/config/replicator.html#replicator for more details). Three of them matter most, as they control the basic behaviour of the scheduler:

  • max_jobs – which controls the maximum number of simultaneously running jobs;
  • interval – which controls how often the scheduler is invoked to rearrange replication jobs, pausing some jobs and starting some others;
  • max_churn – which controls the maximum number of jobs to be “replaced” (read: one job is paused, another one is started) in any single execution of the scheduler.

[Diagram: basic outline of the Replication Scheduler process]

So, basically, with “max_jobs” you control how much you want to stress your server, with “interval” you control how often you want to shuffle things up, and with “max_churn” you control how violently the scheduler will act.

  • If max_jobs is too high your server load will increase (a lot!).
  • If max_jobs is too low your replication will be less “realtime”, as there is a higher chance that a replication job gets paused.
  • If interval is too high a paused replication job could stay paused for way too long.
  • If interval is too low a running replication job could be paused too early, before it has actually caught up with its queued activities.
  • If max_churn is too high you may pay a high cost in setup and kick-off time (when a replication process starts up it has to connect to the server, authenticate, check that everything is aligned and so on…)
  • If max_churn is too low a process could stay paused for a pretty long time.
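
To get a feel for how the three values interact, here is a back-of-the-envelope estimate in PHP. It assumes a strict rotation in which every scheduler run starts exactly max_churn waiting jobs, which is a simplification and not CouchDB’s actual algorithm. The function name and the figure of 100 total jobs are made up for illustration.

```php
<?php
// Rough estimate only: assumes each scheduler run starts exactly $maxChurn
// waiting jobs, in strict rotation. Hypothetical helper, not a CouchDB API.
// $maxChurn must be greater than zero.
function worstCaseWaitSeconds(int $totalJobs, int $maxJobs, int $maxChurn, int $intervalMs): float
{
    // Jobs that cannot run right now because max_jobs is already reached
    $waiting = max(0, $totalJobs - $maxJobs);

    // Scheduler runs needed before the last waiting job gets started
    $runsNeeded = (int) ceil($waiting / $maxChurn);

    return $runsNeeded * $intervalMs / 1000;
}

// 100 replication jobs (made-up figure), max_jobs = 20, max_churn = 10,
// interval = 60000 ms
echo worstCaseWaitSeconds(100, 20, 10, 60000), " seconds\n"; // 480 seconds
```

Under these assumptions the unluckiest job waits about eight minutes before it gets a turn. Halving interval or doubling max_churn would halve that, at the price of more frequent job restarts.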

As usual, your working environment (database size, hardware performance, document sizes, whatever) has a huge impact on how you tweak those parameters.

My only personal consideration is that the default value of max_jobs (500) seems pretty high to me for a common server. After some tweaking, on the “small” Virtual Machine we use for development we settled on max_jobs set to 20, interval set to 60000 (60 seconds) and max_churn set to 10. On the production server, with better hardware (real hardware instead of a VM, SSD drives, more CPU cores and so on), we expect a higher value for max_jobs, but only in the 2x/3x range, so something like 40 to 60. I strongly doubt we could ever reach a max_jobs value of 500.
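
For reference, this is roughly what the development values above look like as a configuration fragment. The [replicator] section and the three parameter names are the ones from the documentation linked earlier. Putting them in local.ini is an assumption about your installation, so adapt it to wherever your CouchDB keeps its local overrides.

```ini
; local.ini (assumed location) - values from our small development VM,
; not a general recommendation
[replicator]
max_jobs = 20
interval = 60000
max_churn = 10
```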

Have fun.

PHP Proc_Open and STDIN – STDOUT – STDERR

In gCloud Storage, our Storage-as-a-Service system, some years ago we developed a “chain” technology that lets us dynamically extend the features of the Storage subsystem by transforming incoming or outgoing files.

A while ago we developed a chain that lets our users store a file securely: it is encrypted when it enters the system and decrypted when it’s fetched, without us ever saving the password.

After some thinking we decided to rely on existing technology for this, namely openssl.

So we had to write some code able to interact with a spawned openssl process. We did some trial and error and, of course, our research on Google. After various attempts we found this code, which proved to be pretty reliable:

http://omegadelta.net/2012/02/08/stdin-stdout-stderr-with-proc_open-in-php/

We tried it first on our Mac OS machines, then on our FreeBSD server, and it worked flawlessly for a couple of years. Recently one of our customers asked for an on-premises installation of a stripped-down clone of gCloud Storage that had to run on Linux (CentOS, if that’s relevant). We were pretty confident that everything would go smoothly, but that wasn’t the case. When the system went live we found out that, when decrypting files, it would lose the last few blocks.

Long story short, we found that on Linux a child process can exit while leaving data still in the stdout buffer while, apparently, it can’t on FreeBSD.

The code we adopted had a specific check to make sure that it wasn’t trying to interact with a dead process. Specifically:

if (!is_resource($process)) break;

was the guilty portion of the code. What was happening was that openssl was exiting, and the code was detecting that and bailing out before fetching the whole stdout/stderr.

So in the end we came up with this:

public function procOpenHandler($command = '', $stdin = '', $maxExecutionTime = 30) {

    $timeLimit = (time() + $maxExecutionTime);

    $descriptorSpec = array(
        0 => array("pipe", "r"),
        1 => array('pipe', 'w'),
        2 => array('pipe', 'w')
    );

    $pipes = array();

    $response = new stdClass();
    $response->status = TRUE;
    $response->stdOut = '';
    $response->stdErr = '';
    $response->exitCode = '';

    $process = proc_open($command, $descriptorSpec, $pipes);
    if (!$process) {
        // could not exec command
        $response->status = FALSE;
        return $response;
    }

    $txOff = 0;
    $txLen = strlen($stdin);
    $stdoutDone = FALSE;
    $stderrDone = FALSE;

    // Make stdin/stdout/stderr non-blocking
    stream_set_blocking($pipes[0], 0);
    stream_set_blocking($pipes[1], 0);
    stream_set_blocking($pipes[2], 0);

    if ($txLen == 0) {
        fclose($pipes[0]);
    }

    while (TRUE) {

        if (time() > $timeLimit) {
            // Max execution time reached: signal the child to stop.
            // proc_close() would block until the child exits on its own,
            // and it is called once after the loop anyway.
            proc_terminate($process);
            $response->status = FALSE;
            break;
        }

        $rx = array(); // The program's stdout/stderr

        if (!$stdoutDone) {
            $rx[] = $pipes[1];
        }

        if (!$stderrDone) {
            $rx[] = $pipes[2];
        }

        $tx = array(); // The program's stdin

        if ($txOff < $txLen) {
            $tx[] = $pipes[0];
        }

        $ex = NULL;
        // Wait up to one second for a pipe to become readable/writable, so
        // the time limit above is re-checked even if the child goes silent
        stream_select($rx, $tx, $ex, 1);

        if (!empty($tx)) {
            $txRet = fwrite($pipes[0], substr($stdin, $txOff, 8192));
            if ($txRet !== FALSE) {
                $txOff += $txRet;
            }
            if ($txOff >= $txLen) {
                fclose($pipes[0]);
            }
        }

        foreach ($rx as $r) {

            if ($r == $pipes[1]) {

                $response->stdOut .= fread($pipes[1], 8192);

                if (feof($pipes[1])) {

                    fclose($pipes[1]);
                    $stdoutDone = TRUE;
                }
            } else if ($r == $pipes[2]) {

                $response->stdErr .= fread($pipes[2], 8192);

                if (feof($pipes[2])) {

                    fclose($pipes[2]);
                    $stderrDone = TRUE;
                }
            }
        }
        if (!is_resource($process)) {
            $txOff = $txLen;
        }

        $processStatus = proc_get_status($process);
        if (array_key_exists('running', $processStatus) && !$processStatus['running']) {
            $txOff = $txLen;
        }

        if ($txOff >= $txLen && $stdoutDone && $stderrDone) {
            break;
        }
    }

    // Ok - close process (if still running)
    $response->exitCode = @proc_close($process);

    return $response;
}
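
The situation described above (the child is already gone, but its output is still waiting in the pipe) can be reproduced in isolation. This is a minimal standalone sketch, not part of our handler: it assumes a POSIX shell with printf, and the command and variable names are purely illustrative.

```php
<?php
// Standalone demo: only the child's stdout is captured.
$descriptorSpec = [1 => ['pipe', 'w']];
$process = proc_open('printf "last block"', $descriptorSpec, $pipes);
if (!is_resource($process)) {
    fwrite(STDERR, "could not start child process\n");
    exit(1);
}

// Wait until the child has exited, deliberately without reading anything yet.
do {
    usleep(10000); // 10 ms
    $status = proc_get_status($process);
} while ($status['running']);

// The child is no longer running, but its output is still buffered in the
// pipe: bailing out at this point would lose it.
$leftover = stream_get_contents($pipes[1]);
fclose($pipes[1]);
proc_close($process);

var_dump($status['running']); // bool(false)
var_dump($leftover);          // string(10) "last block"
```

This is why the loop above only stops once both stdout and stderr have reached EOF, instead of stopping as soon as the process is reported dead.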

Have Fun! 😉

FreeBSD 10.0 bhyve – VMWare ESXi 5.5 comparison – part 2

A few days ago I posted a comparison between FreeBSD’s bhyve and VMWare ESXi 5.5. I received a lot of feedback on the results of our test, so we decided to investigate further with a new round of tests and a more scientific approach.

As in the previous test, we used a standard “empty” FreeBSD 10 machine with the latest portsnap as our main “template”. The VM was using “ahci-hd” as the storage backend and the tests were run over SSH, not on the local console. We always started from this template for every test and ran the same test in different scenarios. The hardware was the same as in the previous tests.

Note: I didn’t mention it in the previous post, but our first round of tests was run on a ZFS filesystem with both compression and deduplication enabled.


FreeBSD 10.0 BHyVe – VMWare ESXi 5.5 comparison

Hey, I wrote a “part 2” to this article, you may want to check it out!

Hello,

recently FreeBSD 10 came out, and one of its most interesting new features is bhyve, a “type 2 hypervisor” that allows you to easily create a Virtual Machine inside a FreeBSD host.

As with every new technology, it is still very rough, but the first “driving” experience was very good. We had a new project starting and some new hardware still unused, and in general I’m not very fond of VMWare, so we decided to compare VMWare and bhyve to understand what the real performance penalty of using a new technology would be.


New home, new life

OK, we’ve moved. In the end laziness won and I decided to move straight to WordPress.com, so that I don’t have to bother keeping WordPress up to date on my own server 😉

The next things to focus on are:

  • Releasing MySQLfs 0.4.2
  • Organising the Antistamina tour in Germany
  • Finishing the recording of the new Antistamina songs! 🙂
  • (Finally) picking the Cupido project back up

Simple little things, as always. Meanwhile, in a few days I’m leaving for a week in Groningen!

Bye!