=======================================================
Stencil Domain Specific Language Compiler (sdslc) 0.2.2
=======================================================

The Stencil Domain Specific Language Compiler (sdslc) is a source-to-source
translator for C/C++ files with embedded sections of the Stencil Domain
Specific Language (SDSL).

-----------------
REQUIRED SOFTWARE
-----------------

The following components are required to build the SDSL compiler:

  * Apache Ant
  * Bison
  * CMake 2.8 or higher
  * gcc/g++ 4.4 or higher
  * Java JDK 1.6 or higher
    - Must be JDK, not JRE
  * LLVM 3.0 or higher
    - Must be built with CMake
  * Nvidia CUDA SDK 5.0 or higher
  * Python 2.7

The SDSL compiler has been successfully built and tested on Fedora 16,
Ubuntu 12.04, and RHEL 6.3.

--------
BUILDING
--------

To set up the build environment, please set the following environment
variables:

  * JAVA_HOME: Set to the installation path of the Java JDK
    e.g. 'export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk'
  * PATH: Make sure 'ant' is available on your PATH

The entire build process is controlled by a CMake script.  You can generate
the makefiles for the project by creating a build directory and invoking
CMake from it.  The following CMake options are recognized:

  * OT_LLVM_BINARY_DIR
    - The installation path of LLVM
    - Required
  * CUDA_INSTALL_DIR
    - The installation path of the CUDA SDK
    - Defaults to /usr/local/cuda
  * SDSLC_INSTALL_DIR
    - The path to install sdslc to
    - Defaults to /usr/local

The following series of commands will build sdslc starting from a tar.gz
distribution.  Here $LLVM_HOME and $CUDA_HOME stand for your LLVM and CUDA
installation paths:

  $ tar xzvf sdslc-0.2.2.tar.gz
  $ cd sdslc-0.2.2
  $ mkdir build
  $ cd build
  $ cmake -DOT_LLVM_BINARY_DIR=$LLVM_HOME -DCUDA_INSTALL_DIR=$CUDA_HOME \
          -DSDSLC_INSTALL_DIR=/usr/local/sdslc-0.2.2 ..
  $ make
  $ make install

Root user or sudo access may be required for the 'make install' command,
depending on the value of SDSLC_INSTALL_DIR.

The main executable produced is the $SDSLC_INSTALL_DIR/bin/sdslc script,
which wraps the sdslc Java program.
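
Before invoking CMake, it can help to confirm that the environment described
above is in place.  The following is a minimal sketch of such a check; it is
not part of the sdslc distribution.  The function names (have_tool, check_env)
are hypothetical, and the executable names it looks for (ant, bison, cmake,
gcc, g++, python) are assumptions about how the required software is
installed on your system.

```shell
#!/bin/sh
# Pre-flight check for the sdslc build environment.
# Illustrative sketch only: this script is not shipped with sdslc, and the
# tool names below are assumptions about a typical installation.

# have_tool NAME: succeed if NAME resolves to a command on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# check_env: print one line per problem found and return the problem count
# (0 means nothing was found to be missing).
check_env() {
    problems=0
    for tool in ant bison cmake gcc g++ python; do
        if ! have_tool "$tool"; then
            echo "missing on PATH: $tool"
            problems=$((problems + 1))
        fi
    done
    # JAVA_HOME must point at a JDK, so javac (not just java) should exist.
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        problems=$((problems + 1))
    elif [ ! -x "$JAVA_HOME/bin/javac" ]; then
        echo "JAVA_HOME does not look like a JDK: $JAVA_HOME"
        problems=$((problems + 1))
    fi
    return "$problems"
}
```

After sourcing the file, 'check_env && echo "environment looks ready"' prints
the confirmation only when no problem was reported.  The check looks for
javac because the build requires a JDK rather than a JRE.  It does not verify
tool versions or the LLVM and CUDA installation paths.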

-----
USAGE
-----

The basic usage of the SDSL compiler involves writing a C/C++ source file
with embedded SDSL syntax.  The embedded SDSL code must be placed between
'#pragma sdsl begin' and '#pragma sdsl end' statements, e.g.

  #pragma sdsl begin
  int dim0;
  int dim1;
  iterate {
    ...
  }
  #pragma sdsl end

Any grid data or parameters defined in SDSL must have corresponding arrays
or variables with the same name and type defined in the local C/C++ scope.

--------
EXAMPLES
--------

Full examples are included in the
$SDSLC_INSTALL_DIR/share/sdslc/examples/general (and subdirectories) and
$SDSLC_INSTALL_DIR/share/sdslc/examples/cdsc directories.

The affine versions of the general benchmarks can be built with the
following commands:

  $ cd $SDSLC_INSTALL_DIR/share/sdslc/examples/general
  $ make affine

This will call the sdslc compiler to produce intermediate C code and gcc to
produce executables in each benchmark's subdirectory.  These codes are built
to run on the CPU and contain affine C sections demarcated by
'#pragma scop begin' and '#pragma scop end'.  They are intended to be
further optimized with polyhedral compilation tools such as the following:

  * PoCC - http://www.cs.ucla.edu/~pouchet/software/pocc
  * PolyOpt/C - http://www.cs.ucla.edu/~pouchet/software/polyopt

The overlap tiled (overtile) versions of the general benchmarks can be built
with the following commands:

  $ cd $SDSLC_INSTALL_DIR/share/sdslc/examples/general
  $ make overtile

This will call the sdslc compiler to produce intermediate CUDA code and nvcc
to produce executables in each benchmark's subdirectory.  These codes are
built to run on CUDA-capable Nvidia GPUs of the Fermi and Kepler generations.
For GT2xx series chips (GeForce GTX2xx, TESLA C10xx) it is necessary to add
the '--legacy-gpu' option to the SDSLC_FLAGS variable in the
examples/common.mk file.

Both affine and overtile versions of the CDSC pipeline can be built with the
following commands:

  $ cd $SDSLC_INSTALL_DIR/share/sdslc/examples/cdsc
  $ make denoise
  $ make register
  $ make segment
  $ make pipeline

The pipeline app can also be built as a standalone object file with
'make pipeline-obj'.

----------
AUTOTUNING
----------

The overtile GPU versions of all benchmarks can be autotuned to achieve
maximum performance on the current GPU.  For the general benchmarks, single
and double precision versions can be autotuned by executing the following
commands:

  $ cd $SDSLC_INSTALL_DIR/share/sdslc/examples/general/
  $ make autotune-sp
  $ make autotune-dp

The CDSC pipeline can be autotuned with the following commands:

  $ cd $SDSLC_INSTALL_DIR/share/sdslc/examples/cdsc
  $ make autotune-denoise
  $ make autotune-register
  $ make autotune-segment
  $ make autotune-pipeline

The autotuner works by repeatedly executing a benchmark with different
thread block sizes, space tile sizes, and time tile sizes.  This process can
take a very long time (multiple hours), and some of the size combinations it
tries will not be compatible with the current GPU.  When a size combination
cannot execute on the current GPU, an error message is printed and the next
size combination is tried.  It is perfectly normal to see long stretches
where most execution attempts fail.

The files examples/general/autotune*.conf and examples/cdsc/autotune*.conf
are used to configure the autotuner.  Thread block and tile size ranges,
along with flags for sdslc (such as '--legacy-gpu') and nvcc, can be
specified in these files.

At any time during an autotuning run, the fastest code found so far is
available in both SDSL and CUDA source as .sdsl.autotuned. and .autotuned.cu.

---------------
FURTHER READING
---------------

More details about SDSL and the SDSL compiler can be found in the user
guide, sdsl-guide-0.2.2.pdf, in this directory.