Thursday, July 17, 2014

Django admin - register all your models the quick and dirty way

Would you like to just import all your models and make them all available to your Django admin interface? I would, and here's how I'm doing it now:
# app/admin.py
from django.contrib import admin

from . import models


for _, inst in models.__dict__.items():
    if isinstance(inst, type):
        try:
            admin.site.register(inst)
        except:
            pass
I needed the catch-all try/except block to avoid blowing up on an AbstractUser model, but for now this works great.
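
If you're on Django 1.7 or newer, a slightly less dirty variant (just a sketch; 'app' below stands in for your actual app label) is to ask the app registry for the concrete models and only swallow the already-registered error:
# app/admin.py - a sketch assuming Django >= 1.7 (the apps registry)
from django.apps import apps
from django.contrib import admin

# get_models() only yields concrete, installed models, so abstract bases
# like AbstractUser never come up here:
for model in apps.get_app_config('app').get_models():
    try:
        admin.site.register(model)
    except admin.sites.AlreadyRegistered:
        pass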

Tuesday, August 20, 2013

Divide and Conquer.. with bash and friends!

Another day, another script in need of a huge performance boost. The scenario is a common one for me: datasets in the form of files are transferred using bash scripts (glorified rsync wrappers with some additional error checking), and after the transfer the same bash process spawns a PHP script to do the actual processing of the records (in this case, some transformations followed by DB inserts).

The problem was that the (single-threaded) PHP step was unable to keep up with the rate at which massive files (>3 GB) were being sent its way.

Instead of trying to optimize the PHP processor, I decided to wrap it with some job-control logic and divide the huge files into smaller chunks. Finally, a use case for `split` (man 1 split).

The idea was to cut the big file into lots of smaller pieces and then spawn X PHP processes to consume them in parallel. Since each line contains a full record, I split the file by line count and spawned one PHP worker per chunk. It worked like a charm, dividing the work into smaller, easier-to-digest pieces fed into a pool of PHP workers:

#
# divide_and_conquer( file_name, max_number_of_workers )
#
function divide_and_conquer {
 local _big_file="${1}"; shift;
 local _parallel_count="${1}"; shift;

 # where to place the file chunks:
 local _prefix="/var/tmp/$(date +%s)_";

 split --lines=10000 "${_big_file}" "${_prefix}";

 local _file_chunks="$(ls -1 ${_prefix}*)";

 for f in ${_file_chunks}; do
  # spawn off a php worker for this file chunk and, if the php script exits cleanly,
  # delete the processed chunk:
  ( php /var/script.php "${f}" && rm "${f}" ) &

  # limit the total number of worker processes:
  while [[ $(jobs -p | wc -l) -ge ${_parallel_count} ]]; do
   sleep 0.1;
  done
 done

 # wait for the last of the children:
 while [[ $(jobs -p | wc -l) -ne 0 ]]; do
  sleep 0.1;
 done
}

# and let's use it:
divide_and_conquer "/var/lib/huge_file.csv" "8"


Cheers -

Tuesday, July 9, 2013

php syntax checking with vim

Showing off your cowboy skills by modifying PHP code on a production server with vim? Here's a neat trick to at least check the syntax of the modified buffer before saving it (the buffer contents are piped to php -l rather than written to disk):
:w ! php -l
Or even save the above as a binding (ctrl-b) in your vimrc:
map <C-B> :w ! php -l<CR>
And a big thanks to http://vim.wikia.com/wiki/Runtime_syntax_check_for_php for making this so clear.

Thursday, June 6, 2013

ZeroMQ, HWM, and INPROC

I have been banging my head pretty hard for the past two days using ZeroMQ with a combination of inproc transports and HWM. In my scenario I have an inproc ZMQ_PUSH socket pushing and an inproc ZMQ_PULL socket reading from the pipe. The client (pusher) blocked somewhere between 1k and 2k messages, and no matter what I set its ZMQ_HWM to, it just kept blocking. Since the project I'm working on requires something like a dynamic HWM, I handle the watermarks myself in the application and just wanted to disable the built-in version. Before filing a bug report I decided to take a glance at the GitHub repo.. and that's where it all made sense. Here's a snippet from the source:
        // The total HWM for an inproc connection should be the sum of
        // the binder's HWM and the connector's HWM.
        int sndhwm = 0;
        if (options.sndhwm != 0 && peer.options.rcvhwm != 0)
            sndhwm = options.sndhwm + peer.options.rcvhwm;
        int rcvhwm = 0;
        if (options.rcvhwm != 0 && peer.options.sndhwm != 0)
            rcvhwm = options.rcvhwm + peer.options.sndhwm;
And it's even clearly stated in the comment just above: I need to set the HWM to 0 on both the sender AND the receiver :) The moral of the story: if you need to set HWM limits on inproc sockets, you have to set them on both sides, because the effective limit is the sum of the two.
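
For what it's worth, here's a minimal pyzmq sketch (not the project's actual code, and the endpoint name is made up) of the takeaway, with the watermark disabled on both ends of the inproc pair:
import zmq

ctx = zmq.Context.instance()

# bind the PULL side first; older libzmq requires the inproc bind
# to exist before anyone connects to it:
pull = ctx.socket(zmq.PULL)
pull.setsockopt(zmq.RCVHWM, 0)   # 0 = no limit on the receiving side
pull.bind("inproc://records")

push = ctx.socket(zmq.PUSH)
push.setsockopt(zmq.SNDHWM, 0)   # 0 = no limit on the sending side
push.connect("inproc://records")

# with both ends at 0 the pusher sails past the default 1000-message
# watermark instead of blocking:
for i in range(5000):
    push.send_string("record %d" % i)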

Wednesday, April 24, 2013

Locking processes with flock

Got a cronjob that might overlap with itself when it runs slower than usual, and need to avoid multiple instances running at once?
Here's how to lock them using flock in bash:
function delicate_process() {
  # the 'locked down' code
  return 0;
}

function main() {
  (
    if ! flock -x --nonblock 200; then
      return 1;
    fi

    delicate_process;

  ) 200>/var/lock/.my.lock
}

main "${@}";
This essentially opens the file /var/lock/.my.lock and assigns it file descriptor 200; inside the subshell, flock then attempts a non-blocking exclusive lock on FD 200, and the function bails out with `1` if the lock is already held by another instance.