TCE-IP

From HPCRL Wiki
Latest revision as of 23:34, 20 February 2009

Introduction

The IP-TCE library is a set of C functions that provide high-performance array index permutation (index sorting), an important group of kernels used in many scientific applications and compilation techniques. The library code is generated by combining analytical and empirical approaches. Details of how the efficient 2-D IP (matrix transposition) code is generated can be found in [1]. High-dimensional arrays pose additional challenges such as high indexing costs and short dimensions; for these we restrict the number of code versions, use one-level tiling, and generate the indexing code, which achieves high performance without code size explosion.

The source code we provide performs index permutation on 2-D, 4-D and 6-D arrays of 64-bit floating-point numbers on IA-32 machines. Except for the 2-D functions, which have more complicated optimizations than the others, the library uses SSE2 instructions when the fastest-varying dimensions of both the source and destination arrays are multiples of the vector size in elements; otherwise a scalar version is chosen. Users should be able to modify our code to obtain IP code for other numbers of dimensions.
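The selection rule between the SSE2 and scalar versions can be sketched as follows. This is only an illustration: it assumes that a 128-bit SSE2 register holds two 64-bit doubles, and the name `ip_can_use_simd` is hypothetical and not part of the library.

```c
#include <assert.h>

/* Assumption: one 128-bit SSE2 register holds two 64-bit doubles. */
enum { IP_DOUBLES_PER_VECTOR = 2 };

/* Hypothetical helper (not a library function): returns 1 when the
 * fastest-varying extents of both the source and the destination are
 * multiples of the vector size in elements, 0 otherwise. */
int ip_can_use_simd(int src_fastest_extent, int dst_fastest_extent)
{
    assert(src_fastest_extent > 0 && dst_fastest_extent > 0);
    return src_fastest_extent % IP_DOUBLES_PER_VECTOR == 0 &&
           dst_fastest_extent % IP_DOUBLES_PER_VECTOR == 0;
}
```

With this check, extents of 4 and 6 would select the SIMD path, while extents of 4 and 3 would fall back to the scalar version.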

The IP-TCE library implements two variants of index permutation. One variant implements B = factor * Permute(A, permutation); the other is the accumulative version, B = B + factor * Permute(A, permutation). Supporting both makes our code compatible with the index permutation routines in nwchem/tce. We have plugged our code into nwchem/tce and obtained overall performance improvements ranging from 74% to 253%, depending on the method and input.
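To make the two variants concrete, here is a minimal, unoptimized 4-D reference sketch. It is not the library's code: the function name `ip4_reference`, the 0-based permutation encoding, and the layout (last dimension fastest varying) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Reference 4-D index permutation (illustrative sketch, not library code).
 * dims[0..3]: extents of the source array A; dims[3] is assumed to vary
 *             fastest in memory.
 * perm[0..3]: 0-based; dimension p of B is dimension perm[p] of A.
 * accumulate: 0 gives B = factor * Permute(A, perm),
 *             nonzero gives B = B + factor * Permute(A, perm). */
void ip4_reference(const double *a, double *b, const int dims[4],
                   const int perm[4], double factor, int accumulate)
{
    int id[4]; /* current source index, one entry per dimension */
    int p;

    for (p = 0; p < 4; p++)
        assert(perm[p] >= 0 && perm[p] < 4);

    for (id[0] = 0; id[0] < dims[0]; id[0]++)
    for (id[1] = 0; id[1] < dims[1]; id[1]++)
    for (id[2] = 0; id[2] < dims[2]; id[2]++)
    for (id[3] = 0; id[3] < dims[3]; id[3]++) {
        /* Linear offset of the element in the source array. */
        size_t ia = (((size_t)id[0] * dims[1] + id[1]) * dims[2]
                     + id[2]) * dims[3] + id[3];
        /* Linear offset of the same element in the permuted array. */
        size_t ib = (((size_t)id[perm[0]] * dims[perm[1]] + id[perm[1]])
                     * dims[perm[2]] + id[perm[2]]) * dims[perm[3]]
                    + id[perm[3]];
        if (accumulate)
            b[ib] += factor * a[ia];
        else
            b[ib] = factor * a[ia];
    }
}
```

For example, with dims = {2, 3, 1, 1} and perm = {1, 0, 2, 3} the routine transposes a 2 x 3 matrix into a 3 x 2 matrix, scaling every element by factor.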

Compilation

The library code can be compiled with either the Intel C compiler or the GNU C compiler. We used icc 10.1 and gcc 4.1.2 when testing the code. If another compiler such as pgcc is used, the user is responsible for finding the alignment directives that compiler supports and changing the library code accordingly.
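One common way to isolate compiler-specific alignment directives is a single macro, as in the sketch below. This page does not show which directives the library actually uses, so everything here is an assumption: the macro name `IP_ALIGN16`, the helper names, and the choice of 16-byte alignment (the natural alignment for SSE2 loads and stores). A port to another compiler would add its own branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: 16-byte alignment for buffers used with SSE2. */
#if defined(__GNUC__) || defined(__INTEL_COMPILER)
#  define IP_ALIGN16 __attribute__((aligned(16)))
#elif defined(_MSC_VER)
#  define IP_ALIGN16 __declspec(align(16))
#else
#  error "Define IP_ALIGN16 using this compiler's alignment directive"
#endif

/* Returns 1 if p is aligned to a 16-byte boundary, 0 otherwise. */
int ip_is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Example of an aligned scratch buffer declared through the macro. */
double *ip_scratch(void)
{
    static IP_ALIGN16 double buffer[16];
    assert(ip_is_aligned16(buffer));
    return buffer;
}
```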

Library Usage

The interface of the IP-TCE library is compatible with the index permutation routines in nwchem/tce and is intended mainly for use from Fortran. For example, the prototype of the 4-D non-accumulative permutation routine is:

                  tce_sort_4_(double* unsorted,double* sorted,
                              int* a_in, int* b_in, int* c_in, int* d_in,
                              int* i_in, int* j_in, int* k_in, int* l_in,
                              double* factor_in)

All arguments are pointers, and function names are lowercase and end with "_". If needed, the user can write a wrapper or directly modify the function (which is itself a wrapper around the SIMD and scalar IP functions) to obtain the desired interface.
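A by-value C wrapper around the Fortran-style entry point might look like the sketch below. Several things are assumptions rather than facts from this page: the `void` return type, the meaning of the arguments (a..d as extents with the last one fastest varying, i..l as a 1-based permutation, following the Fortran tce_sort_4 routine in nwchem/tce), and the wrapper name `ip_sort_4`. The scalar `tce_sort_4_` body is only a stand-in so that the example links on its own; remove it when linking against the real library.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with the prototype shown above (return type assumed void).
 * Assumed semantics: sorted = factor * Permute(unsorted). */
void tce_sort_4_(double *unsorted, double *sorted,
                 int *a_in, int *b_in, int *c_in, int *d_in,
                 int *i_in, int *j_in, int *k_in, int *l_in,
                 double *factor_in)
{
    const int ext[4] = { *a_in, *b_in, *c_in, *d_in };
    /* Convert the assumed 1-based permutation to 0-based. */
    const int p[4] = { *i_in - 1, *j_in - 1, *k_in - 1, *l_in - 1 };
    int id[4];

    for (id[0] = 0; id[0] < ext[0]; id[0]++)
    for (id[1] = 0; id[1] < ext[1]; id[1]++)
    for (id[2] = 0; id[2] < ext[2]; id[2]++)
    for (id[3] = 0; id[3] < ext[3]; id[3]++) {
        size_t ia = (((size_t)id[0] * ext[1] + id[1]) * ext[2]
                     + id[2]) * ext[3] + id[3];
        size_t ib = (((size_t)id[p[0]] * ext[p[1]] + id[p[1]])
                     * ext[p[2]] + id[p[2]]) * ext[p[3]] + id[p[3]];
        sorted[ib] = *factor_in * unsorted[ia];
    }
}

/* Hypothetical C-friendly wrapper: scalars are passed by value and
 * forwarded by address, as a Fortran caller would pass them. */
void ip_sort_4(const double *unsorted, double *sorted,
               int a, int b, int c, int d,
               int i, int j, int k, int l, double factor)
{
    assert(unsorted != sorted); /* this sketch is out-of-place only */
    tce_sort_4_((double *)unsorted, sorted,
                &a, &b, &c, &d, &i, &j, &k, &l, &factor);
}
```

Under these assumptions, `ip_sort_4(src, dst, 2, 3, 1, 1, 2, 1, 3, 4, 1.0)` would transpose a 2 x 3 matrix stored in `src` into `dst`.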

Contact Info

Please contact Qingda Lu (luq@cse.ohio-state.edu) with questions.

Reference

[1] Qingda Lu, Sriram Krishnamoorthy, P. Sadayappan: Combining analytical and empirical approaches in tuning matrix transposition. 15th International Conference on Parallel Architecture and Compilation Techniques (PACT 2006): 233-242