PolyMage

PolyMage is a domain-specific language and optimizing code generator for automatic optimization of image processing pipelines, being developed at the Multicore Computing Lab, Indian Institute of Science. PolyMage takes an image processing pipeline expressed by the user in a high-level language (embedded in Python) and generates an optimized parallelized C++ implementation of the pipeline.

...

PolyMage's optimizer primarily relies on the polyhedral framework to automatically perform a sequence of transformations for parallelization and locality optimization on multicore processors. The transformations include ways to enable fusion and tiling across image processing stages, using an asymmetric overlapped tiling technique.

...

Download

PolyMage is available as free and open source software under the permissive Apache License version 2.0 through its git repository on bitbucket.org here.

PolyMage optimized and base line code for the benchmarks used in its ASPLOS'15 paper can be found in a separate repository here. These generated codes are in the public domain; however, if you plan to reuse these, please consider including a note stating that these were generated by PolyMage, and cite the polymage-benchmarks github repo or this web page. Do keep in mind that the optimization capabilities of PolyMage's compiler/code generator have changed significantly since its release right after ASPLOS 2015.
PolyMage generated code typically performs best when compiled with Intel's C/C++ compilers.

Examples

Harris Corner Detection

R, C = Parameter(Int, "R"), Parameter(Int, "C")
I = Image(Float, "I", [R+2, C+2])

x, y = Variable("x"), Variable("y")
row, col = Interval(0, R+1, 1), Interval(0, C+1, 1)

c = Condition(x, '>=', 1) & Condition(x, '<=', R) & \
    Condition(y, '>=', 1) & Condition(y, '<=', C)

cb = Condition(x, '>=', 2) & Condition(x, '<=', R-1) & \
     Condition(y, '>=', 2) & Condition(y, '<=', C-1)

Iy = Function(varDom = ([x, y], [row, col]), Float, "Iy")
Iy.defn = [ Case(c, Stencil(I(x, y), 1.0/12, \
                                   [[-1,-2,-1], \
                                    [ 0, 0, 0], \
                                    [ 1, 2, 1]])) ]

Ix = Function(varDom = ([x, y], [row, col]), Float, "Ix")
Ix.defn = [ Case(c, Stencil(I(x, y), 1.0/12, \
                                   [[-1, 0, 1], \
                                    [-2, 0, 2], \
                                    [-1, 0, 1]])) ]

Ixx = Function(varDom = ([x, y], [row, col]), Float, "Ixx")
Ixx.defn = [ Case(c, Ix(x, y) * Ix(x, y)) ]

Iyy = Function(varDom = ([x, y], [row, col]), Float, "Iyy")
Iyy.defn = [ Case(c, Iy(x, y) * Iy(x, y)) ]

Ixy = Function(varDom = ([x, y], [row, col]), Float, "Ixy")
Ixy.defn = [ Case(c, Ix(x, y) * Iy(x, y)) ]

Sxx = Function(varDom = ([x, y], [row, col]), Float, "Sxx")
Syy = Function(varDom = ([x, y], [row, col]), Float, "Syy")
Sxy = Function(varDom = ([x, y], [row, col]), Float, "Sxy")
for pair in [(Sxx, Ixx), (Syy, Iyy), (Sxy, Ixy)]:
    pair[0].defn = [ Case(cb, Stencil(pair[1], 1, \
                                       [[1, 1, 1], \
                                        [1, 1, 1], \
                                        [1, 1, 1]])) ]

Det = Function(varDom = ([x, y], [row, col]), Float, "det")
d = Sxx(x, y) * Syy(x, y) - Sxy(x, y) * Sxy(x, y)
Det.defn = [ Case(cb, d) ]

Trace = Function(varDom = ([x, y], [row, col]), Float, "trace")
Trace.defn = [ Case(cb, Sxx(x, y) + Syy(x, y)) ]

Harris = Function(varDom = ([x, y], [row, col]), Float, "harris")
coarsity = Det(x, y) - 0.04 * Trace(x, y) * Trace(x, y)
Harris.defn = [ Case(cb, coarsity) ]
        
...

back to top

Pyramid Blending

...

Image courtesy: Kyros Kutulakos, 'Introduction to Visual Computing' course (University of Toronto)

back to top

Performance

Benchmark Stages Image size PolyMage (1 core) PolyMage (16 cores) OpenCV
Harris Corner Dectetion 11 6400 * 6400 233.79 ms 18.69 ms 810.24 ms
Pyramid Blending 44 3 * 2048 * 2048 196.99 ms 21.91 ms 197.28 ms
Unsharp Mask 4 3 * 2048 * 2048 165.40 ms 14.85 ms 349.57 ms
Local Laplacian 99 3 * 2560 * 1536 274.40 ms 32.35 ms *
Camera Pipeline 32 2528 * 1920 67.87 ms 5.86 ms *
Bilateral Grid 7 2560 * 1536 89.76 ms 8.47 ms *
Multiscale Interpolation 49 3 * 2560 * 1536 101.70 ms 18.18 ms *

Machine configuration: Dual socket 8-core Xeon E5 2680 (SandyBridge) 2.70 GHz processors with 64 GB of non-ECC RAM running Linux 3.8.0-38 (64-bit)

Compiler: Intel C/C++ compiler 14.0.1 with flags “-O3 -xhost”

PolyMage (unoptimized) is code generated with only scalar optimizations enabled.

PolyMage (optimized) is auto-tuned code generated with grouping, tiling, storage optimization and vectorization enabled.

Halide (tuned) schedules for Local Laplacian Filters, Camera Pipeline, Bilateral Grid, and Multiscale Interpolation are optimized schedules available with the Halide distribution, and were further tuned for our target machine.

back to top

Harris Corner Detection

...

Bilateral Grid

...

Local Laplacian Filters

...

Multiscale Interpolation

...

Pyramid Blending

...

Camera Pipeline

...
back to top


Publications

  1. PolyMage: Automatic Optimization for Image Processing Pipelines (PDF, slides)
    Ravi Teja Mullapudi, Vinay Vasista, Uday Bondhugula
    20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2015), Mar 2015.

  2. An Effective Fusion and Tile Size Model for Optimizing Image Processing Pipelines
    Abhinav Jangda, Uday Bondhugula
    ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2018 (to appear).
    Artifact evaluated (reusable and available).