Parallel::ForkManager is my choice for forking. It's simple to use, fits my needs, and is well maintained.
I use it frequently, for example in scraping jobs over Tor, or when there are a lot of database rows to process. Forking is needed to speed up the whole job when we have enough resources.
Here are some tips I use with Parallel::ForkManager.
1. Scope::OnExit
We know we always need to call $pm->finish; in the child, otherwise we get an error like 'Cannot start another process while you are in the child process'.
That isn't an issue in simple code without many 'next' statements in the loop.
But it gets very troublesome when the loop has lots of 'next': you have to call ->finish before every one of them. That's stupid, and Scope::OnExit can save you:
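    use Parallel::ForkManager;
    use Scope::OnExit;

    my $pm = Parallel::ForkManager->new($FORK_CHILDREN); # max number of concurrent children

    foreach my $part (@parts) {
        $pm->start and next; # do the fork

        # runs when this iteration's scope exits, including via 'next'
        on_scope_exit {
            $pm->finish; # Terminates the child process
        };

        # do whatever you like, call 'next' whenever you want.
    }

2. List::MoreUtils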
Say you have a big list to process. The simple approach is to fork on every element, but then you need to init or clone every object in each forked child, and that is sometimes expensive. So how about this: divide the big list into $FORK_CHILDREN parts, then fork on each part. In the end you init/clone only $FORK_CHILDREN times instead of scalar(@big_list) times. part from List::MoreUtils can do the job here:
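To make the split concrete, here is a small standalone sketch of what part returns. The ten-item list and the count of three children are made-up values for illustration only; the round-robin index is the same expression used in the fork loop.

```perl
use strict;
use warnings;
use List::MoreUtils qw(part);

# Hypothetical values, small enough to inspect by eye.
my $FORK_CHILDREN = 3;
my @big_list      = (1 .. 10);

# part() calls the block once per element; the block's return value is
# the index of the partition that element is appended to.
my $i = 0;
my @parts = part { $i++ % $FORK_CHILDREN } @big_list;

# Each entry of @parts is an array ref holding one child's share.
for my $n (0 .. $#parts) {
    print "part $n: @{ $parts[$n] }\n";
}
# part 0: 1 4 7 10
# part 1: 2 5 8
# part 2: 3 6 9
```

With the list split this way, the fork loop only has to run once per part: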
    use List::MoreUtils qw(part);

    my @big_list = (1 .. 10000); # from file, database or whatever
    my $i = 0;
    my @parts = part { $i++ % $FORK_CHILDREN } @big_list;
    foreach my $part (@parts) {
        $pm->start and next; # do the fork

        # runs when this iteration's scope exits, including via 'next'
        on_scope_exit {
            $pm->finish; # Terminates the child process
        };

        # init dbh/ua etc.
        # 'defined' keeps the loop going when an element is 0 or ''
        while (defined(my $ele = shift @$part)) {
            # process $ele
        }
    }

3. DBI and LWP::UserAgent clone
I don't know whether it's wise to call DBI->connect or LWP::UserAgent->new in the child code, but usually we can do:
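    my $dbh = $odbh->clone(); # new connection using the same parameters as $odbh
    # copies the agent settings and default headers (e.g. a Referer set there);
    # note that recent LWP::UserAgent docs say the cookie_jar is not cloned
    my $ua2 = $ua->clone();

4. Share variables between parent and children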
Well, I don't like threads::shared, and I don't like IPC.
Usually a cache solution can do the trick: anything from a simple text file (with locking), or maybe Parallel::Scoreboard, to my choice, Cache::FastMmap.
Sample code below:
    use Cache::FastMmap;

    my $cache = Cache::FastMmap->new;
    my @array = (1 .. 10); # in parent
    $cache->set($cache_key, \@array);

    ### then in forked child after ->start
    $cache->get_and_set( $cache_key, sub {
        my $v = $_[1]; # current cached value (the array ref)
        push @$v, $value_in_child;
        return $v; # stored back as the new value
    } );

    ### after $pm->wait_all_children;
    my $array_ref = $cache->get($cache_key);

get_and_set does the trick here. Anyway, that's just my solution; it won't fit every situation.
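For a version you can run end to end, here is a minimal sketch that puts the fork loop and the cache together. The key name, the child count, and the squares each child pushes are all hypothetical, and it assumes Cache::FastMmap's default settings (a temporary share file, with references serialized for you).

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Cache::FastMmap;

# Hypothetical names and values, chosen only for this sketch.
my $FORK_CHILDREN = 3;
my $cache_key     = 'results';

my $pm    = Parallel::ForkManager->new($FORK_CHILDREN);
my $cache = Cache::FastMmap->new;   # created before forking so children inherit it

$cache->set($cache_key, []);        # start with an empty array ref

for my $n (1 .. 5) {
    $pm->start and next;            # parent continues the loop, child falls through

    # Atomic read-modify-write: the callback receives the key and the
    # current value, and its return value is stored back.
    $cache->get_and_set($cache_key, sub {
        my $v = $_[1] || [];
        push @$v, $n * $n;
        return $v;
    });

    $pm->finish;                    # child exits here
}
$pm->wait_all_children;

# Children finish in any order, so sort before using the results.
my $array_ref = $cache->get($cache_key);
my @squares   = sort { $a <=> $b } @$array_ref;
print "@squares\n";
```

The cache is created in the parent before the first fork on purpose: every child then works against the same share file, and the parent reads the combined result only after wait_all_children returns.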
That's all for Parallel::ForkManager. I hope it helps when you want to use it.
Thanks.
