CoSy Technology

Background

Because of the specific nature of embedded processors and the strong requirements placed on software tools, constructing an optimizing compiler can be a complicated task that is best left to the experienced. CoSy's open infrastructure and easy targeting make the life of compiler engineers considerably easier.

Obtaining results with CoSy goes beyond time and cost effectiveness. The quality and robustness of the CoSy system and of the compilers it generates are equally important. What is a quick-turn compiler worth if it doesn't pass validation? With each release and update of CoSy, compiler experts at ACE go to great lengths to rigorously test and validate all CoSy components in order to ensure that the generator system produces correct compilers. For this we use our in-house developed SuperTest environment, as well as a range of industry-standard compiler test and validation suites.

CoSy System

In CoSy, optimal code selectors and optimization strategies are generated from descriptions that reflect the features, parallelism and timing of the architecture. Because the focus in producing a compiler shifts to creating the processor description, optimizing compilers can be available as soon as the architecture specifications are stable.

Compilers built with the CoSy compiler development system are inherently of high quality and performance. To reach this level, front-ends, analysis algorithms, optimization algorithms, code selectors and register allocators are all engineered and generated as self-contained engines. These engines perform their specific function on the IR in co-operation with all the other engines configured into the compiler. Typically there will be over fifty of these independent engines in a compiler. A generated compiler supervisor controls the order in which the engines are invoked, and their interactions.

Particularly with respect to the latest powerful processor architectures, the dynamics of compiler optimization can be quite challenging. During the compilation process many decisions are made in various phases of the compiler. Such decisions may seem appropriate in the context of a particular phase, but may be suboptimal, or even counterproductive, from the perspective of subsequent phases. With CoSy's independent engines, interacting on the IR and managed by the supervisor, the phase ordering challenge becomes easier to manage, and a basis is provided for powerful performance and space optimization strategies.
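The engine-and-supervisor arrangement described above can be illustrated with a minimal sketch. This is not CoSy's actual IR, engine interface or supervisor; every name here (`struct ir`, `engine_fn`, `fold_constants`, `remove_dead`, `run_supervisor`) is hypothetical, and the "run all engines until nothing changes" policy is only one simple assumption about how engine ordering could be handled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INSTRS 64

/* Hypothetical toy IR: instruction i defines value i. */
enum opcode { OP_NOP, OP_CONST, OP_ADD };

struct instr {
    enum opcode op;
    int a;   /* OP_CONST: the literal; OP_ADD: index of the left operand */
    int b;   /* OP_ADD: index of the right operand */
};

struct ir {
    struct instr *code;
    size_t len;          /* the last instruction is the program's result */
};

/* An engine works on the shared IR and reports whether it changed it. */
typedef int (*engine_fn)(struct ir *);

/* Engine 1: an ADD of two constants becomes a constant. */
int fold_constants(struct ir *ir) {
    int changed = 0;
    for (size_t i = 0; i < ir->len; i++) {
        struct instr *in = &ir->code[i];
        if (in->op != OP_ADD)
            continue;
        const struct instr *l = &ir->code[in->a];
        const struct instr *r = &ir->code[in->b];
        if (l->op == OP_CONST && r->op == OP_CONST) {
            int sum = l->a + r->a;
            in->op = OP_CONST;
            in->a = sum;
            in->b = 0;
            changed = 1;
        }
    }
    return changed;
}

/* Engine 2: values that no ADD reads (except the result) become NOPs. */
int remove_dead(struct ir *ir) {
    int used[MAX_INSTRS] = {0};
    int changed = 0;
    assert(ir->len <= MAX_INSTRS);
    for (size_t i = 0; i < ir->len; i++) {
        if (ir->code[i].op == OP_ADD) {
            used[ir->code[i].a] = 1;
            used[ir->code[i].b] = 1;
        }
    }
    for (size_t i = 0; i + 1 < ir->len; i++) {
        if (!used[i] && ir->code[i].op != OP_NOP) {
            ir->code[i].op = OP_NOP;
            changed = 1;
        }
    }
    return changed;
}

/* Toy supervisor: invoke the engines in order until a full round
   leaves the IR unchanged. Returns the number of rounds executed. */
int run_supervisor(struct ir *ir, const engine_fn *engines, size_t n) {
    int rounds = 0;
    int changed;
    do {
        changed = 0;
        for (size_t i = 0; i < n; i++)
            changed |= engines[i](ir);
        rounds++;
    } while (changed);
    return rounds;
}
```

In this sketch the two engines know nothing about each other, yet folding creates dead values that the second engine can then remove. That is a small-scale version of the phase interaction discussed above: the order and repetition of engines is decided in one place rather than inside each engine.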

Target architectures

CoSy's generic design makes it an ideal environment for developing compilers for any class of processor architecture. As a natural consequence, CoSy has been used to generate compilers for dozens of different processor architectures, ranging from 8/16/32/64-bit CISC microcontrollers to RISC, DSP, NoC and VLIW processors.

Specific analysis engines in CoSy make sure that the instruction-level parallelism inherent to DSP and VLIW architectures is fully taken into account. Based on this information, conditional execution and advanced scheduling techniques, including software pipelining, ensure optimal use of the architecture's dedicated registers, ALUs, load/store units, pipelines, etc.

Additionally, ACE's original DSP-C and the recently accepted ISO/IEC Embedded C language extensions provide the necessary high-level language support to efficiently program embedded signal processing algorithms in C. Support for multiple memory spaces, fixed-point (fractional) data types and circular arrays and pointers has been built into the very foundations of CoSy, thus enabling CoSy DSP compilers to produce code that is on average 5 times faster and 36% smaller than with standard C.

Lately we have seen the advent of reconfigurable processor architectures, which allow adaptation of the original design far beyond the standard peripheral functions, such as internal RAM, ROM and cache size. The more advanced configurable architectures allow significant flexibility in the design, including configuration of the number and depth of pipelines, the number of ALUs, and subsets or supersets of the instruction set. This is an ideal playground for CoSy, because these kinds of 'minor' changes to the processor description can swiftly be turned into a newly generated, dedicated compiler. By providing prompt feedback on the effectiveness of the latest design step of the reconfigurable architecture, CoSy enables true integrated hardware/software design.
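To make the point about fixed-point types and circular buffers more concrete, the following sketch shows what a programmer has to spell out by hand in standard C: Q15 fractional arithmetic with explicit rounding and saturation, and a delay line with manual wrap-around. Language-level fractional types (such as `_Fract` in ISO/IEC TR 18037) and circular addressing let a compiler express the same intent directly and map it onto DSP hardware. The names `q15_t`, `q15_sat`, `q15_mul`, `struct fir` and `fir_step` are illustrative assumptions, not part of CoSy, DSP-C or Embedded C.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 fractional value: real value = raw / 32768, range [-1, 1). */
typedef int16_t q15_t;

#define TAPS 4

/* Clamp a wide intermediate result to the Q15 range (saturation). */
static q15_t q15_sat(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (q15_t)x;
}

/* Fractional multiply: Q15 * Q15 gives Q30, rounded back to Q15.
   Assumes >> on a negative value is an arithmetic shift, which is
   implementation-defined in standard C. */
q15_t q15_mul(q15_t a, q15_t b) {
    int64_t p = (int64_t)a * (int64_t)b;
    return q15_sat((p + (1 << 14)) >> 15);
}

/* FIR filter state with a hand-managed circular delay line. */
struct fir {
    q15_t delay[TAPS];
    unsigned head;        /* slot that receives the next sample */
};

/* Push one sample and compute one output sample. */
q15_t fir_step(struct fir *f, const q15_t coeff[TAPS], q15_t sample) {
    int64_t acc = 0;      /* Q30 accumulator, wide enough not to overflow */
    unsigned idx;

    assert(f->head < TAPS);
    f->delay[f->head] = sample;
    idx = f->head;
    for (unsigned k = 0; k < TAPS; k++) {
        acc += (int64_t)coeff[k] * f->delay[idx];
        idx = (idx == 0) ? TAPS - 1 : idx - 1;   /* manual wrap-around */
    }
    f->head = (f->head + 1) % TAPS;
    return q15_sat((acc + (1 << 14)) >> 15);
}
```

Every shift, rounding constant, saturation check and index wrap in this sketch is bookkeeping that the compiler must first recognize before it can use a fractional multiply-accumulate instruction or a modulo addressing mode. Expressing these concepts in the language removes that guesswork.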

A wealth of Analysis and Optimization techniques

Analyses:
  • Advanced loop analysis
  • Alias analysis
  • Available expressions analysis
  • Code use-estimate analysis & prediction
  • Data flow analysis
  • Def-use analysis
  • Dominator tree analysis
  • Leaf procedure analysis
  • Optimized code debug support
  • Profiling:
    • Path (dynamic) profiling
    • Static profiling
  • SSA
  • Value range analysis

Optimizations:
  • Algebraic optimization
  • Base binding
  • Basic block reordering
  • Branch optimization
  • Code simplification
  • Common subexpression elimination
  • Common tail merging
  • Constant sharing:
    • Floating Point constant sharing
    • String sharing
  • Constant folding
  • Constant optimization
  • Constant propagation
  • Control flow optimizations:
    • Chain flow optimization
    • If simplification
    • Straightening
  • Copy propagation
  • Data size optimization
  • Dead code removal:
    • Unreachable code
    • Unused code
  • Dead object removal:
    • Dead function elimination
    • Dead variable elimination
  • Delay-slot filling
  • Expression canonization
  • Forward substitution
  • Function inlining
  • Hardware loop generation
  • Instruction combining
  • Instruction scheduling:
    • Hyperblock
    • Forward & backward list
    • Look-ahead & exhaustive
    • Out of order / Delay slot filling
  • Leaf procedure optimization
  • Live range splitting
  • Loop optimizations:
    • Loop canonization
    • Loop fusion
    • Loop hoisting
    • Loop induction variable rewriting & elimination
    • Loop invariant code motion
    • Loop inversion
    • Loop scalar replacement
    • Loop removal
    • Loop reverse conversion
    • Loop unrolling
    • Software pipelining
  • Normalization
  • Operator overloading
  • Post-increment optimizations
  • Predicated execution
  • Register coalescing
  • Register promotion
  • Scalar replacement of aggregates and array references
  • SIMD instruction generation
  • Strength reduction
  • Switch optimization
  • Tail recursion elimination
  • User-defined intrinsic functions
  • Versatile peephole optimization framework
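
As an illustration of what two of the listed optimizations do in general, the sketch below applies loop invariant code motion and strength reduction by hand to a small loop. It shows the textbook transformations only. It is not output from a CoSy-generated compiler, and the function names `fill_before` and `fill_after` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Before: scale * offset is recomputed on every iteration, and each
   element index is derived with a multiplication (i * stride). */
void fill_before(int *out, size_t n, size_t stride, int scale, int offset) {
    assert(stride > 0);   /* distinct elements are assumed */
    for (size_t i = 0; i < n; i++)
        out[i * stride] = scale * offset + (int)i;
}

/* After: the same behavior with both transformations applied by hand. */
void fill_after(int *out, size_t n, size_t stride, int scale, int offset) {
    /* Loop invariant code motion: the product does not depend on i,
       so it is computed once, outside the loop. */
    const int base = scale * offset;
    /* Strength reduction: the multiplication i * stride is replaced
       by an index that is advanced with an addition per iteration. */
    size_t idx = 0;

    assert(stride > 0);
    for (size_t i = 0; i < n; i++) {
        out[idx] = base + (int)i;
        idx += stride;
    }
}
```

Both functions store the same values at the same positions. The second form simply does less work per iteration, which is the kind of rewrite an optimizing compiler performs on the programmer's behalf.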