======================
Using Polly with Clang
======================
This documentation discusses how Polly can be used in Clang to automatically
optimize C/C++ code during compilation.
.. warning::
Warning: clang/LLVM/Polly need to be in sync (compiled from the same
revision).
Make Polly available from Clang
===============================
Polly is available through clang, opt, and bugpoint, if Polly was checked out
into tools/polly before compilation. No further configuration is needed.
Optimizing with Polly
=====================
Optimizing with Polly is as easy as adding -O3 -mllvm -polly to your compiler
flags (Polly is not available unless optimizations are enabled, such as
-O1,-O2,-O3; Optimizing for size with -Os or -Oz is not recommended).
.. code-block:: console
clang -O3 -mllvm -polly file.c
Automatic OpenMP code generation
================================
To automatically detect parallel loops and generate OpenMP code for them you
also need to add -mllvm -polly-parallel -lgomp to your CFLAGS.
.. code-block:: console
clang -O3 -mllvm -polly -mllvm -polly-parallel -lgomp file.c
Switching the OpenMP backend
----------------------------
The following CL switch allows to choose Polly's OpenMP-backend:
-polly-omp-backend[=BACKEND]
choose the OpenMP backend; BACKEND can be 'GNU' (the default) or 'LLVM';
The OpenMP backends can be further influenced using the following CL switches:
-polly-num-threads[=NUM]
set the number of threads to use; NUM may be any positive integer (default: 0, which equals automatic/OMP runtime);
-polly-scheduling[=SCHED]
set the OpenMP scheduling type; SCHED can be 'static', 'dynamic', 'guided' or 'runtime' (the default);
-polly-scheduling-chunksize[=CHUNK]
set the chunksize (for the selected scheduling type); CHUNK may be any strictly positive integer (otherwise it will default to 1);
Note that at the time of writing, the GNU backend may only use the
`polly-num-threads` and `polly-scheduling` switches, where the latter also has
to be set to "runtime".
Example: Use alternative backend with dynamic scheduling, four threads and
chunksize of one (additional switches).
.. code-block:: console
-mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=4
-mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1
Automatic Vector code generation
================================
Automatic vector code generation can be enabled by adding -mllvm
-polly-vectorizer=stripmine to your CFLAGS.
.. code-block:: console
clang -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine file.c
Isolate the Polly passes
========================
Polly's analysis and transformation passes are run with many other
passes of the pass manager's pipeline. Some of passes that run before
Polly are essential for its working, for instance the canonicalization
of loop. Therefore Polly is unable to optimize code straight out of
clang's -O0 output.
To get the LLVM-IR that Polly sees in the optimization pipeline, use the
command:
.. code-block:: console
clang file.c -c -O3 -mllvm -polly -mllvm -polly-dump-before-file=before-polly.ll
This writes a file 'before-polly.ll' containing the LLVM-IR as passed to
polly, after SSA transformation, loop canonicalization, inlining and
other passes.
Thereafter, any Polly pass can be run over 'before-polly.ll' using the
'opt' tool. To found out which Polly passes are active in the standard
pipeline, see the output of
.. code-block:: console
clang file.c -c -O3 -mllvm -polly -mllvm -debug-pass=Arguments
The Polly's passes are those between '-polly-detect' and
'-polly-codegen'. Analysis passes can be omitted. At the time of this
writing, the default Polly pass pipeline is:
.. code-block:: console
opt before-polly.ll -polly-simplify -polly-optree -polly-delicm -polly-simplify -polly-prune-unprofitable -polly-opt-isl -polly-codegen
Note that this uses LLVM's old/legacy pass manager.
For completeness, here are some other methods that generates IR
suitable for processing with Polly from C/C++/Objective C source code.
The previous method is the recommended one.
The following generates unoptimized LLVM-IR ('-O0', which is the
default) and runs the canonicalizing passes on it
('-polly-canonicalize'). This does /not/ include all the passes that run
before Polly in the default pass pipeline. The '-disable-O0-optnone'
option is required because otherwise clang adds an 'optnone' attribute
to all functions such that it is skipped by most optimization passes.
This is meant to stop LTO builds to optimize these functions in the
linking phase anyway.
.. code-block:: console
clang file.c -c -O0 -Xclang -disable-O0-optnone -emit-llvm -S -o - | opt -polly-canonicalize -S
The option '-disable-llvm-passes' disables all LLVM passes, even those
that run at -O0. Passing -O1 (or any optimization level other than -O0)
avoids that the 'optnone' attribute is added.
.. code-block:: console
clang file.c -c -O1 -Xclang -disable-llvm-passes -emit-llvm -S -o - | opt -polly-canonicalize -S
As another alternative, Polly can be pushed in front of the pass
pipeline, and then its output dumped. This implicitly runs the
'-polly-canonicalize' passes.
.. code-block:: console
clang file.c -c -O3 -mllvm -polly -mllvm -polly-position=early -mllvm -polly-dump-before-file=before-polly.ll
Further options
===============
Polly supports further options that are mainly useful for the development or the
analysis of Polly. The relevant options can be added to clang by appending
-mllvm -option-name to the CFLAGS or the clang command line.
Limit Polly to a single function
--------------------------------
To limit the execution of Polly to a single function, use the option
-polly-only-func=functionname.
Disable LLVM-IR generation
--------------------------
Polly normally regenerates LLVM-IR from the Polyhedral representation. To only
see the effects of the preparing transformation, but to disable Polly code
generation add the option polly-no-codegen.
Graphical view of the SCoPs
---------------------------
Polly can use graphviz to show the SCoPs it detects in a program. The relevant
options are -polly-show, -polly-show-only, -polly-dot and -polly-dot-only. The
'show' options automatically run dotty or another graphviz viewer to show the
scops graphically. The 'dot' options store for each function a dot file that
highlights the detected SCoPs. If 'only' is appended at the end of the option,
the basic blocks are shown without the statements the contain.
Change/Disable the Optimizer
----------------------------
Polly uses by default the isl scheduling optimizer. The isl optimizer optimizes
for data-locality and parallelism using the Pluto algorithm.
To disable the optimizer entirely use the option -polly-optimizer=none.
Disable tiling in the optimizer
-------------------------------
By default both optimizers perform tiling, if possible. In case this is not
wanted the option -polly-tiling=false can be used to disable it. (This option
disables tiling for both optimizers).
Import / Export
---------------
The flags -polly-import and -polly-export allow the export and reimport of the
polyhedral representation. By exporting, modifying and reimporting the
polyhedral representation externally calculated transformations can be
applied. This enables external optimizers or the manual optimization of
specific SCoPs.
Viewing Polly Diagnostics with opt-viewer
-----------------------------------------
The flag -fsave-optimization-record will generate .opt.yaml files when compiling
your program. These yaml files contain information about each emitted remark.
Ensure that you have Python 2.7 with PyYaml and Pygments Python Packages.
To run opt-viewer:
.. code-block:: console
llvm/tools/opt-viewer/opt-viewer.py -source-dir /path/to/program/src/ \
/path/to/program/src/foo.opt.yaml \
/path/to/program/src/bar.opt.yaml \
-o ./output
Include all yaml files (use \*.opt.yaml when specifying which yaml files to view)
to view all diagnostics from your program in opt-viewer. Compile with `PGO
<https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation>`_ to view
Hotness information in opt-viewer. Resulting html files can be viewed in an internet browser.