Tag Archives: Parallelize

Genetic Algorithm with Multiple ROS Instances

Note: Source code discussed in this post is available here.  Specifically, we’ll be working with the adding_service ROS package.

Genetic algorithms are “pleasantly parallel” in that the evaluation phase can be distributed across multiple cores of a single machine or a cluster to better utilize hardware and lower running time of the algorithm.  When working with ROS, however, there are some platform specific considerations that need to be made in order to fully exploit parallel evaluations.

In this blog post, we will explore a genetic algorithm that evolves a genome of floats to maximize their sum.  ROS will perform the addition while a Python implementation of a GA will be run outside the scope of ROS.  The GA will communicate with ROS instances through the use of ZeroMQ messages.  This code is intended to provide a minimal working example of how to pass a genome into ROS and retrieve fitness values out.  It can be extended by a user as needed to implement more complex evaluations or a more advanced GA.

Communicating between External GA and ROS Instance

During an evaluation, information is passed between the external GA and the ROS instance through a combination of ZeroMQ messages (external) and ROS parameters and topics (internal).  The flow of information can be seen in the following diagram for a single evaluation.  Adder_Transporter handles the message passing and Adder_Worker handles the actual evaluation.

This is the information flow between a GA and ROS instance for a single evaluation. The GA sends the genome over a ZeroMQ message. A transport node handles the message, loads the genome into a ROS parameter and then alerts the worker that there is an evaluation to be performed. Once the evaluation is complete, the fitness value is sent back to the transporter, is converted to a ZeroMQ message and delivered back to the GA.

This is the information flow between a GA and ROS instance for a single evaluation. The GA sends the genome over a ZeroMQ message. A transport node handles the message, loads the genome into a ROS parameter and then alerts the worker that there is an evaluation to be performed. Once the evaluation is complete, the fitness value is sent back to the transporter, is converted to a ZeroMQ message and delivered back to the GA.

Parallelizing ROS

Alone, an individual ROS instance typically handles all facets of a controller, physics simulation, and any other nodes that are needed to evaluate a problem.  Typically, an instance is launched from within a launch file that specifies the individual ROS nodes along with any additional configuration needed.  However, an individual ROS instance cannot perform multiple evaluations simultaneously.  Thankfully, there is a mechanism within ROS to allow for many parallel ROS instances to be run using the group tag in the launch file along with a namespace argument specified with ns.

For the example GA discussed in this post, a single ROS instance contains two nodes.  The first, adder_transporter, handles communication with the GA and passes information onto an adding node.  adder_worker handles the actual summation of the genome and sends the value back to adder_transporter.  Fitness and the genome ID are then returned to the GA.

<node name="adder_transporter" pkg="adding_service" type="adder_transporter.py" output="screen"></node>
<node name="adder_worker" pkg="adding_service" type="adder.py" output="screen"></node>

Alone, these two nodes are sufficient to allow the GA to evaluate the complete population, but it does not harness all cores available on a machine.  Instead, multiple instances are needed to parallelize the evaluation step.  Wrapping the node setup code with the group tag allows us to create multiple ROS instances.  Thus, the full launch script for this example is available in adding_service.launch

<launch>

<group ns="adder0">
<node name="adder_transporter" pkg="adding_service" type="adder_transporter.py" output="screen"></node>
<node name="adder_worker" pkg="adding_service" type="adder.py" output="screen"></node>
</group>

<group ns="adder1">
<node name="adder_transporter" pkg="adding_service" type="adder_transporter.py" output="screen"></node>
<node name="adder_worker" pkg="adding_service" type="adder.py" output="screen"></node>
</group>

<group ns="adder2">
<node name="adder_transporter" pkg="adding_service" type="adder_transporter.py" output="screen"></node>
<node name="adder_worker" pkg="adding_service" type="adder.py" output="screen"></node>
</group>

</launch>

More instances could be added by copy/pasting the code within the group tags and changing the ns attribute accordingly.

Running the Example

The genetic algorithm can be seen in /src/adding_service/test/ga_server.py while a single addition can be seen in /src/adding_service/test/server.py.  To run the example, open two terminal tabs and change to the ros_gazebo_python directory.  Run the command:

source devel/setup.sh

in each terminal.  Then, in one terminal, start the ROS instances with the command:

roslaunch adding_service adding_service.launch

The GA can then be launched in the other terminal with:

python src/adding_service/test/ga_server.py

Outstanding Issues:

There is a possibility that if ROS isn’t up and running before the GA is launched the GA will not work.  I am currently trying to determine what causes this and what a potential resolution is.  I plan to edit this post once I figure that out.