tips around Parallel::ForkManager

Parallel::ForkManager is my choice for forking. It's simple to use, fits my needs, and is well maintained.

I use it frequently, for example in scraping jobs with Tor, or when there are a lot of database rows to process. Forking is what speeds up the whole job when we have enough resources.

Here are some tips I use with Parallel::ForkManager.

1. Scope::OnExit
We know we always need to call $pm->finish; in the child, otherwise we get something like 'Cannot start another process while you are in the child process'.
That's not an issue if you have simple code without many 'next' statements in the loop.
But it becomes very troublesome if you have lots of 'next' in the loop: you have to call ->finish before every 'next', which is tedious and easy to forget. Scope::OnExit saves you from that.

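use Parallel::ForkManager;
use Scope::OnExit;

my $pm = Parallel::ForkManager->new($FORK_CHILDREN); # max number of children

foreach my $part (@parts) {
    $pm->start and next; # do the fork; the parent moves on to the next part

    # runs whenever this iteration's scope exits, including via 'next'
    on_scope_exit {
        $pm->finish; # Terminates the child process
    };

    # do whatever you like, and call 'next' whenever you want.
}
$pm->wait_all_children;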
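To see the effect end to end, here is a minimal sketch of a complete script in which the child really does leave the loop body early with 'next'. The limit of 2 children, the 1 .. 4 work list and the %exited bookkeeping are made up for the demo; run_on_finish is Parallel::ForkManager's callback that the parent runs when it reaps a child.

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Scope::OnExit;

my $pm = Parallel::ForkManager->new(2);    # assumed limit: 2 children at a time

my %exited;                                # pid => exit code, filled in the parent
$pm->run_on_finish(sub {
    my ($pid, $exit_code) = @_;
    $exited{$pid} = $exit_code;
});

foreach my $n (1 .. 4) {
    $pm->start and next;                   # parent continues with the next item

    on_scope_exit { $pm->finish };         # child: finish however we leave the scope

    next if $n % 2;                        # early exit; finish still runs
    # ... real work for even $n would go here ...
}
$pm->wait_all_children;

printf "%d children were reaped\n", scalar keys %exited;
```

If a child ever fell through to the next iteration without calling finish, it would die with the 'Cannot start another process' error instead of being reaped cleanly.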
2. List::MoreUtils
Say you have a big list to process. You could simply fork on every element, but then you need to init or clone every object in each forked child, and that is sometimes expensive. So how about this: divide the big list into $FORK_CHILDREN parts, then fork once per part. In the end you init/clone only $FORK_CHILDREN times instead of scalar(@big_list) times. 'part' from List::MoreUtils can do the job here:

use List::MoreUtils qw(part);

my @big_list = (1 .. 10000); # from file, database or whatever
my $i = 0;
my @parts = part { $i++ % $FORK_CHILDREN } @big_list; # round-robin into $FORK_CHILDREN array refs
foreach my $part (@parts) {
    $pm->start and next; # do the fork
   
    # runs whenever this iteration's scope exits, including via 'next'
    on_scope_exit {
        $pm->finish; # Terminates the child process
    };
   
    # init dbh/ua etc.

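    # foreach rather than while/shift, so a false element such as 0 does not end the loop early
    foreach my $ele (@$part) {
        # process $ele
    }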
}
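To make the shape of @parts concrete, here is a small sketch of the same split on a tiny list. The value 3 for $FORK_CHILDREN and the 1 .. 10 list are stand-ins chosen for the demo.

```perl
use strict;
use warnings;
use List::MoreUtils qw(part);

my $FORK_CHILDREN = 3;          # assumed number of children for this demo
my @big_list      = (1 .. 10);  # small stand-in for the real list

my $i = 0;
my @parts = part { $i++ % $FORK_CHILDREN } @big_list;

# each element of @parts is an array reference holding one child's share
print join(',', @$_), "\n" for @parts;
# prints:
# 1,4,7,10
# 2,5,8
# 3,6,9
```

The split is round-robin, so the parts differ in size by at most one element.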
3. DBI and LWP::UserAgent clone
I don't know whether it's wise to call DBI->connect or LWP::UserAgent->new in the child code, but usually we can clone the parent's objects instead:

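my $dbh = $odbh->clone(); # $odbh is the handle opened in the parent
my $ua2 = $ua->clone(); # copies settings and default headers
# note: LWP::UserAgent's clone leaves cookie_jar undef, so set it again if you need cookies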
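One caveat worth checking against your installed LWP version: as far as I know, LWP::UserAgent's clone keeps settings such as the agent string and timeout, but resets cookie_jar to undef in the copy. A minimal sketch of cloning and then handing the jar over explicitly, where the agent string and timeout are made-up values:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# hypothetical parent-side agent; the agent string and timeout are made up
my $ua = LWP::UserAgent->new(agent => 'my-scraper/0.1', timeout => 10);
$ua->cookie_jar(HTTP::Cookies->new);

my $ua2 = $ua->clone;

# settings survive the clone
print "agent:   ", $ua2->agent,   "\n";
print "timeout: ", $ua2->timeout, "\n";

# give the clone a jar explicitly if it came back without one
$ua2->cookie_jar($ua->cookie_jar) unless $ua2->cookie_jar;
```

Inside one process both agents then share the same jar object. After a fork the child works on its own copy, so cookies set in the child do not flow back to the parent.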
4. share variables between parent and children
Well, I don't like threads::shared, and I don't like IPC.
Usually a cache solution can do the trick. The options range from a simple text file (with a lock), to Parallel::Scoreboard, to my choice, Cache::FastMmap.
Sample code below:

use Cache::FastMmap;

my $cache = Cache::FastMmap->new; # create it in the parent, before forking
my @array = (1 .. 10); # in parent
$cache->set($cache_key, \@array);

### then in forked child after ->start
$cache->get_and_set( $cache_key, sub {
    my $v = $_[1]; # current value stored under $cache_key
    push @$v, $value_in_child;
    return $v; # stored back as the new value
} );

### after $pm->wait_all_children;
my $array_ref = $cache->get($cache_key);

get_and_set does the trick here. Anyway, that's just my solution; it won't fit every situation.

That's all for Parallel::ForkManager. I hope it helps when you want to use it.
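Put together as one script, the pattern might look like the sketch below. The key name 'results', the limit of 4 children and the squares being collected are all invented for the demo, and it assumes the cache's default settings (anonymous share file, no expiry) are good enough for a short run.

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Cache::FastMmap;

my $cache     = Cache::FastMmap->new;   # created before forking, so children share it
my $cache_key = 'results';              # hypothetical key name
$cache->set($cache_key, []);            # start with an empty list

my $pm = Parallel::ForkManager->new(4); # assumed limit: 4 children at a time
foreach my $n (1 .. 8) {
    $pm->start and next;

    # read-modify-write of the shared array while the cache holds its lock
    $cache->get_and_set($cache_key, sub {
        my $v = $_[1] || [];            # current value, or a fresh list if missing
        push @$v, $n * $n;
        return $v;
    });

    $pm->finish;
}
$pm->wait_all_children;

my $squares = $cache->get($cache_key);
print join(',', sort { $a <=> $b } @$squares), "\n";
```

The order in which children push is not deterministic, which is why the result is sorted before printing. A separate get followed by set in the child could lose updates when two children interleave, and get_and_set avoids that.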

Thanks.


About this Entry

This page contains a single entry by Fayland Lam published on January 8, 2011 9:58 PM.
