SimBOINC: A Simulator for Desktop Grids and Volunteer Computing Systems

Introduction

The SimBOINC project has been put on hold until further notice due to recent major changes and developments in both BOINC and SimGrid. Thank you for your understanding.

SimBOINC is a simulator for heterogeneous and volatile desktop grids and volunteer computing systems. The goal of this project is to provide a simulator with which to test new scheduling strategies in BOINC and in other desktop grid and volunteer computing systems in general. SimBOINC is based on the SimGrid toolkit for simulating distributed and parallel systems, and uses SimGrid to simulate BOINC (in particular, the client CPU scheduler, and eventually the work fetch policy) by implementing a number of required functionalities.

Simulator Overview

SimBOINC simulates a client-server platform where multiple clients request work from a central server. In particular, we have implemented a client class that is based on the BOINC client and uses (almost exactly) the client's CPU scheduler source code. The characteristics of the client (for example, speed, project resource shares, and availability), of the workload (for example, the projects, the size of each task, and checkpoint frequency), and of the network connecting the client and server (for example, bandwidth and latency) can all be specified as simulation inputs. Given those inputs, the simulator executes and produces an output file that gives the values of a number of scheduler performance metrics, such as effective resource shares and task deadline misses.

Download and Compilation

Request the SimBOINC source code by emailing dkondo a_t lri d_o_t fr. In the source distribution, you will find the SimGrid header files in the include directory and the static library libsimgrid.a, compiled on Linux 2.6.12.5 i686, in the lib directory. You must change the install path in the Makefile (which is currently configured to use the static library) to point to the location of your simboinc directory. Afterwards, typing 'make' should compile everything and produce the executable simboinc.

The source has been successfully compiled and run with gcc 4.0.1 on Mac OS X 10.4.7, and gcc 4.0.3 on Debian Linux 4.0.3-1. If you need to compile it on Windows, send me an email at dkondo a_t lri d_o_t fr, and I'll look into it.

Execution

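SimBOINC expects four inputs in the form of XML files: a platform file, a workload file, a client states file, and a simulation file. Each is described in its own section below.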

As an example, see the corresponding files mini_platform.xml, mini_workload.xml, mini_client_states.xml, and mini_sim.xml in the main directory. To run SimBOINC using these files, run:

./simboinc ./mini_platform.xml ./mini_workload.xml ./mini_client_states.xml ./mini_sim.xml

Platform File

The platform file is where one constructs the computing and network resources on which the BOINC client and server run. In particular, SimBOINC expects a set of cpu resources and a set of network links that connect those resources. Moreover, SimBOINC expects the server to be named "Server" (any other host specified in the file will run the BOINC client). For each resource, one can specify a set of attributes. For example, for cpu resources, one can specify the power and the corresponding availability trace files. For network resources, one can specify their bandwidth and latency.

Here is a small example from mini_platform.xml:

<?xml version='1.0'?>
<!DOCTYPE platform_description SYSTEM "surfxml.dtd">
<platform_description>

  <cpu name="Server" power="100"/>
  <cpu name="Host_1" power="100" availability_file="mini_avail.txt" state_file="mini_fail.txt"/>
  <cpu name="Host_1-SB-CPU1" power="100" availability_file="mini_avail.txt" state_file="mini_fail.txt"/>
  <network_link name="0" bandwidth="100" latency=".001"/>
  <network_link name="1" bandwidth="100" latency=".001"/>
  <network_link name="loopback" bandwidth="100.00" latency="0.001"/>
  <route src="Server" dst="Server"><route_element name="loopback"/></route>
  <route src="Host_1" dst="Host_1"><route_element name="loopback"/></route>
  <route src="Server" dst="Host_1">
    <route_element name="0"/>
  </route>
  <route src="Host_1" dst="Server">
    <route_element name="1"/>
  </route>
</platform_description>

We have a server named Server and a client named Host_1. Server and Host_1 are connected by network links with a bandwidth of 100 and a latency of 0.001. Also, the availability of both cpus of Host_1 is specified by the trace files mini_avail.txt and mini_fail.txt; the Server entry in this example specifies no trace files.

To create a multiple-cpu host in the platform file, create a cpu with the base name, and then for each additional cpu, create another with the suffix "-SB-CPUN", where "N" is the cpu number and the first cpu is number 0 (the primary cpu, which carries the bare base name). In the above example, Host_1 is a dual-cpu host with cpus "Host_1" and "Host_1-SB-CPU1". Data transfers are handled only through the "primary" cpu, which in this case is "Host_1", and so only the network information for the primary cpu is relevant and used in the simulation. While the platform file has an entry for each cpu, only individual hosts are represented in the client states file. For example, for the above platform, the client states file will only have a record for "Host_1", not "Host_1-SB-CPU1".
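
The naming convention above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not code from SimBOINC; the function names cpu_names and primary_host are hypothetical.

```python
def cpu_names(host, num_cpus):
    """Return the platform-file cpu names for a host with num_cpus cpus.

    The primary cpu (cpu 0) carries the bare host name; each additional
    cpu N >= 1 is named "<host>-SB-CPU<N>".
    """
    if num_cpus < 1:
        raise ValueError("a host needs at least one cpu")
    return [host] + [f"{host}-SB-CPU{n}" for n in range(1, num_cpus)]


def primary_host(cpu_name):
    """Map any cpu name back to its host, i.e. the name of the primary cpu."""
    base, sep, suffix = cpu_name.rpartition("-SB-CPU")
    # Only strip the suffix when it really is "-SB-CPU" followed by digits.
    return base if sep and suffix.isdigit() else cpu_name


if __name__ == "__main__":
    print(cpu_names("Host_1", 2))
    print(primary_host("Host_1-SB-CPU1"))
```

A helper like primary_host is useful when matching cpu entries in the platform file against host records in the client states file, since only the primary name appears there.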

WE ASSUME THAT IF A CPU FAILS, THEN ALL CPUS IN THE SAME HOST FAIL AT THE SAME TIME. SO THE TRACE FILES FOR THE CPUS MUST BE IDENTICAL.
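
Because this assumption is easy to violate by accident, a small pre-flight check on the platform file may help. The sketch below is not part of SimBOINC: the function name mismatched_hosts is hypothetical, and it only compares the availability_file and state_file attribute values, assuming that identical file names imply identical traces. A stricter check would compare the file contents.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SUFFIX = "-SB-CPU"  # suffix used for the additional cpus of a host


def mismatched_hosts(platform_xml):
    """Return sorted names of hosts whose cpus reference different trace files."""
    root = ET.fromstring(platform_xml)
    traces = defaultdict(set)
    for cpu in root.iter("cpu"):
        name = cpu.get("name")
        base, sep, num = name.rpartition(SUFFIX)
        host = base if sep and num.isdigit() else name
        # Collect the (availability, failure) trace pair seen for this host.
        traces[host].add((cpu.get("availability_file"), cpu.get("state_file")))
    return sorted(h for h, seen in traces.items() if len(seen) > 1)


if __name__ == "__main__":
    with open("mini_platform.xml") as f:
        print(mismatched_hosts(f.read()))
```

An empty result means that every multi-cpu host uses one trace pair for all of its cpus.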

For more details on constructing a platform file, see here.

Workload File

The workload file specifies the projects to be executed over the BOINC platform. In particular, it specifies for each project the name, the total number of tasks to execute, the task size in terms of computation, the task size in terms of communication, the checkpoint frequency for each task, and the delay_bound and rsc_fpops_est BOINC task attributes.

Here is a small example from mini_workload.xml:

<jobs>
<job>
    <name>predictor</name>
    <num_tasks>1000</num_tasks>
    <task_comp_size>1000</task_comp_size>
    <task_comm_size>2</task_comm_size>
    <chkpt_freq>10</chkpt_freq>
    <delay_bound>200000000</delay_bound>
    <rsc_fpops_est>30000000000.00</rsc_fpops_est>
</job>
<job>
    <name>seti</name>
    <num_tasks>200</num_tasks>
    <task_comp_size>500</task_comp_size>
    <task_comm_size>.0000000001</task_comm_size>
    <chkpt_freq>5</chkpt_freq>
    <delay_bound>2000000000</delay_bound>
    <rsc_fpops_est>30000000000.00</rsc_fpops_est>
</job>
</jobs>
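
To inspect or sanity-check a workload file before a run, it can be read with Python's standard xml.etree.ElementTree module. The sketch below is an illustration only, not SimBOINC code: the helper names load_jobs and total_computation are hypothetical, and the element names are taken from the example above.

```python
import xml.etree.ElementTree as ET


def load_jobs(workload_xml):
    """Parse a workload document into a list of per-project dicts."""
    jobs = []
    for job in ET.fromstring(workload_xml).iter("job"):
        jobs.append({
            "name": job.findtext("name"),
            "num_tasks": int(job.findtext("num_tasks")),
            "task_comp_size": float(job.findtext("task_comp_size")),
            "task_comm_size": float(job.findtext("task_comm_size")),
            "chkpt_freq": float(job.findtext("chkpt_freq")),
            "delay_bound": float(job.findtext("delay_bound")),
            "rsc_fpops_est": float(job.findtext("rsc_fpops_est")),
        })
    return jobs


def total_computation(job):
    """Total computation of a project: number of tasks times per-task size."""
    return job["num_tasks"] * job["task_comp_size"]


if __name__ == "__main__":
    with open("mini_workload.xml") as f:
        for job in load_jobs(f.read()):
            print(job["name"], total_computation(job))
```

The total is expressed in whatever unit task_comp_size uses; that unit is not stated here, so the sketch does not convert it.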

Client States File

The client states input file is based on the client state format that the BOINC client exports to store persistent state. For a BOINC developer, the meaning of the fields should be obvious. The idea is that client state files could be collected and assembled to produce a client_states input file for SimBOINC, which would allow the simulation of BOINC clients using realistic settings. WE ASSUME THE HOST_CPID IN THE CLIENT_STATES FILE IS THE HOST NAME FOR THE PRIMARY CPU IN THE PLATFORM FILE.

See mini_client_states.xml as an example.

Simulation File

The simulation input file specifies the type of simulation to be conducted (e.g. BOINC), the maximum simulation time after which the simulation will be terminated, and the output file name.

Here is mini_sim.xml as an example:

<sim>
    <id>1</id>
    <type>boinc</type>
    <max_time>3000</max_time>
    <output_filename>output.xml</output_filename>
</sim>
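
When running many simulations (for example, a sweep over max_time), it can be convenient to generate this file from a script. The following is a minimal sketch using Python's standard library; the function name build_sim_xml is hypothetical, and the element names simply mirror the example above.

```python
import xml.etree.ElementTree as ET


def build_sim_xml(sim_id, max_time, output_filename, sim_type="boinc"):
    """Build the text of a simulation input file with the four fields shown above."""
    root = ET.Element("sim")
    fields = (
        ("id", sim_id),
        ("type", sim_type),
        ("max_time", max_time),
        ("output_filename", output_filename),
    )
    for tag, value in fields:
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Write one simulation file per maximum simulation time.
    for i, max_time in enumerate((1000, 3000, 9000), start=1):
        with open(f"sim_{i}.xml", "w") as f:
            f.write(build_sim_xml(i, max_time, f"output_{i}.xml"))
```

The generated text is a single line without indentation, which is still well-formed XML.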

Using Availability Traces

In SimGrid, the availability of network and cpu resources can be specified through traces. For cpu resources, one specifies a cpu availability file that denotes the availability of the cpu as a percentage over time. Also, for the cpu, one specifies a failure file that indicates when the cpu fails. In SimGrid, a cpu failure causes all processes running on that cpu to terminate.

In BOINC, at least three things can cause an executing task to fail. First, the task could be preempted by the BOINC client because of the client scheduling policy. Second, the task could be preempted by the BOINC client because of user activity, according to the user's preferences. Third, the host could fail (for example, due to a machine crash or shutdown). In SimBOINC, the host failures specified in the cpu trace files represent failures resulting from the latter two causes. That is, when a cpu fails as specified in the traces, all processes on the cpu will terminate. However, their state is maintained and persists through the failure, so that when the host becomes available again, the processes will be restarted in the same state. That is, the tasks that had been executing before the failure are restarted from the last checkpoint after the failure, and the client state data structure is the same as before the failure.
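
The restart-from-checkpoint behavior can be illustrated with a little arithmetic. The sketch below is not SimBOINC's implementation. It assumes that checkpoints are taken at fixed intervals of task progress and that the interval is measured in the same unit as the progress, neither of which is stated here. The function names are hypothetical.

```python
import math


def progress_after_restart(progress, chkpt_interval):
    """Progress retained after a failure: the most recent completed checkpoint."""
    if chkpt_interval <= 0:
        raise ValueError("checkpoint interval must be positive")
    return math.floor(progress / chkpt_interval) * chkpt_interval


def lost_work(progress, chkpt_interval):
    """Progress that must be redone once the host becomes available again."""
    return progress - progress_after_restart(progress, chkpt_interval)


if __name__ == "__main__":
    # A task at progress 37 with a checkpoint every 10 units restarts at 30.
    print(progress_after_restart(37, 10), lost_work(37, 10))
```

Under these assumptions, a shorter checkpoint interval reduces the work lost per failure, which is one reason the checkpoint frequency is a workload input.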

Logging

SimBOINC uses the logging facility called XBT provided by SimGrid, which is similar in spirit to log4j (and, in turn, log4cxx, etc.). It allows runtime configuration of which messages are output and at what level of detail. However, it does not yet support appenders.

We chose to use XBT instead of BOINC's message logger because XBT is integrated with SimGrid and, as such, can show more informative messages by default (such as the name of the process and the simulation time).

To control the level of output for each logger, use xbt_log_control_set in the main function of simboinc.C.

Simulator Output and Performance Metrics

The simulator output file must be specified in the simulation input file. The simulator then outputs the following metrics to that file in xml:

Also, for each cpu specified in the platform.xml file, the simulator will output a corresponding .trace file, which records information about the execution of tasks on that cpu. In particular, the columns of the trace file show, in order, the simulation time, the task name, the event (START, COMPLETED, CANCELLED, or FAILED), the cpu name, and the completion time when applicable.
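
A trace file in this column layout can be read with a short script. The sketch below assumes whitespace-separated columns and an optional fifth column for the completion time; the actual delimiter is not stated here, and the sample line in the example is hypothetical. The names TraceRecord and parse_trace_line are likewise illustrative.

```python
from typing import NamedTuple, Optional

EVENTS = {"START", "COMPLETED", "CANCELLED", "FAILED"}


class TraceRecord(NamedTuple):
    time: float
    task: str
    event: str
    cpu: str
    completion_time: Optional[float]


def parse_trace_line(line):
    """Parse one trace line, assuming whitespace-separated columns."""
    fields = line.split()
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 columns, got {len(fields)}")
    time, task, event, cpu = fields[:4]
    if event not in EVENTS:
        raise ValueError(f"unknown event: {event}")
    # The completion time is present only when applicable.
    completion = float(fields[4]) if len(fields) == 5 else None
    return TraceRecord(float(time), task, event, cpu, completion)


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    print(parse_trace_line("120.5 predictor_3 COMPLETED Host_1 120.5"))
```

Counting the records whose event is FAILED or CANCELLED per cpu gives a quick view of how often the traces interrupted execution.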

Implementation

If you are already familiar with the BOINC client, then you should be able to jump right into the simulation source code without much trouble. The corresponding BOINC client source file in SimBOINC has the suffix "_sim" appended to the original file name.

If you want to understand the additional simulation classes that SimBOINC provides, you might take a look at the following classes:

Why did we choose SimGrid?

We chose to implement the BOINC simulator using SimGrid for a number of reasons. First, SimGrid provides a number of abstractions and tools that simplify the simulation of complex parallel and distributed systems. For example, SimGrid provides abstractions for processes, computing elements, and network links. These abstractions and tools greatly simplified the implementation of the BOINC simulator. Second, we can leverage the proven accuracy of SimGrid's resource models. For example, SimGrid models the allocation of network bandwidth among competing data transfers using a flow-based TCP model that has been shown to be reasonably accurate. Using SimBOINC based on SimGrid, one could easily construct a network and simulate large peer-to-peer file transfers as a novel usage scenario for BOINC. Third, SimGrid is implemented in C, and using it with BOINC's C++ source code is straightforward.

If you want to learn more about SimGrid, take a look here.

BOINC code in SimBOINC

When transplanting BOINC code into SimBOINC, we tried to make as few modifications as possible to the original BOINC cpu scheduler source code. Nevertheless, some changes had to be made because we are, of course, running a simulation. In most cases where BOINC code was left out, it was simply commented out so that BOINC developers can see what has been removed. Here is a list that summarizes the changes:

FAQ

TO DO list

Here is the TO DO list, with items listed from highest to lowest priority:

Authors

Derrick Kondo is the developer of SimBOINC, which is based on the BOINC project and SimGrid.
David Anderson is the leader and developer of the BOINC project.
Arnaud Legrand and Martin Quinson are currently the primary developers of SimGrid.
Generated on Mon Mar 12 16:21:01 2007 for SimBOINC by  doxygen 1.4.6