From HPCRL Wiki
Jump to: navigation, search

The Unbalanced Tree Search (UTS) benchmark is a parallel benchmarking code that reports the performance achieved when performing an exhaustive search on an unbalanced tree. The tree is generated on the fly using a splittable random number generator (RNG) that allows the random stream to be split and processed in parallel while still producing a deterministic tree. The splittable RNG has been constructed using the SHA1 secure hash algorithm. Thus, generating a node's children requires multiple applications of the SHA1 hash algorithm to generate splittable hashes for each child.


Project Members

The Ohio State University

  • Professor P. Sadayappan
  • James Dinan
  • Gerald Sabin

University of Maryland

  • Professor Chau-Wen Tseng

University of North Carolina

  • Professor Jan Prins
  • Stephen Olivier

Load Balancing

The UTS benchmark presents significant load imbalance problems to any parallelization scheme. The size and shape of the tree are random and inferring these parameters from any node in the tree would require a shortcut to solving the SHA-1 secure hash algorithm. Thus, any static work distribution scheme is highly unlikely to yield good performance for UTS.

We have investigated a number of load balancing schemes in the context of UTS. Most schemes fall into two categories: work stealing and work sharing.

Work Stealing
Work stealing is a distributed load balancing scheme where idle processors poll their peers in the computation for work. Depending on the programming model stealing may be done in a one-sided manner (e.g. in UPC, OpenMP, Shmem, etc) or in a two sided manner (e.g. in MPI). When a thief finds a victim with work, it steals a portion of the victim's work and is able to continue processing.
Work Sharing
Work sharing uses a centralized mechanism for distributing work. Work sharing can be accomplished through the use of work servers or shared work queues. In these schemes, workers send new tasks (in the case of UTS these are tree nodes) to the server or enqueue them in the queue. When a worker exhausts their work they can get more from the server or the shared queue.

Implementations of UTS

Our distribution of UTS include implementations for:

  • MPI
  • UPC
  • OpenMP
  • Shmem
  • GPShmem
  • Pthreads
  • Cray MTA/XMT

Implementations in additional parallel models include:

  • Chapel (Note: This is a reference implementation and has not been tuned for performance/scalability.)
  • X10
  • Scioto


Scalable Dynamic Load Balancing Using UPC PDF
Stephen Olivier, Jan Prins.
Proc. of 37th International Conference on Parallel Processing (ICPP-08). Portland, OR, September 2008. 
A Message Passing Benchmark for Unbalanced Applications DOI
James Dinan, Stephen Olivier, Gerald Sabin, Jan Prins, P. Sadayappan, Chau-Wen Tseng
J. Simulation Modelling Practice and Theory (SIMPAT). Volume 16, Issue 9, Pages 1177-1189. October, 2008.
Dynamic Load Balancing of Unbalanced Computations Using Message Passing PDF Slides
James Dinan, Stephen Olivier, Jan Prins, Gerald Sabin, P Sadayappan and Chau-Wen Tseng.
Proc. of 6th Intl. Workshop on Performance Modeling, Evaluation, and Optimization of
Parallel and Distributed Systems (PMEO-PDS 2007). Long Beach, CA, March 26-30, 2007.
UTS: An Unbalanced Tree Search Benchmark PDF
Stephen Olivier, Jun Huan, Jinze Liu, Jan Prins, James Dinan, P Sadayappan and Chau-Wen Tseng.
Proc. 19th Intl. Workshop on Languages and Compilers for Parallel Computing (LCPC). 2006.
UPC Implementation of an Unbalanced Tree Search Benchmark PDF
Jan Prins, Jun Huan, Bill Pugh, Chau-Wen Tseng, P. Sadayappan.
UNC Dept. of Computer Science Technical Report 03-034, Oct 2003.


UTS is currently maintained by James Dinan. Please contact James with any questions via email at dinan-at-mcs.anl.gov.

Personal tools