[personal profile] dmaze
My on-and-off work project for the past several months has been designing a new intermediate representation (IR) for our compiler; that is, coming up with a new way for our program to look at the program you typed in, store it internally, mangle it, and ultimately spit out something the computer understands. This week I've been attacking CiteSeer to try to find out what the state of the art in IRs is, particularly since I have something in mind that I'm sure somebody else has thought of before.

Conclusions: intermediate representations that store code as glorified lists of assembly instructions have been around forever, and the most recent interesting advance was static single assignment (SSA) form, around 1990ish. If you want to abandon that, there are various graph forms that are less obvious but more flexible. The most common of these is program dependence graph form (1987); Microsoft Research came up with something called value dependence graphs in 1994, but they never really escaped into the broader world.
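
For anyone who hasn't run into it, here's a minimal sketch (my own illustration, not from the survey above) of what SSA form means in practice: every variable is assigned exactly once, and points where control flow merges get phi functions that choose among the versions. The function and variable names below are made up purely for illustration.

    #include <stdio.h>

    /* Original source: 'x' is assigned on both branches, then used. */
    int pick(int flag)
    {
        int x;
        if (flag)
            x = 1;
        else
            x = 2;
        return x + 10;
    }

    /*
     * The same function in SSA form (pseudo-IR, not legal C): each
     * definition of 'x' becomes its own version, and the join point
     * selects between them with a phi function.
     *
     *   pick(flag):
     *     if flag goto THEN else goto ELSE
     *   THEN:  x1 = 1             goto JOIN
     *   ELSE:  x2 = 2             goto JOIN
     *   JOIN:  x3 = phi(x1, x2)
     *          return x3 + 10
     */

    int main(void)
    {
        printf("%d %d\n", pick(1), pick(0));   /* prints "11 12" */
        return 0;
    }

The single-assignment property is what makes the classic dataflow optimizations (constant propagation, dead code elimination, and so on) cheap: every use points at exactly one definition.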

And that's it. People keep implementing things, but as far as new and exciting ideas go, the newest commonly used one is 15 years old. Whatever I do for this, it's unlikely to be new and innovative; everything has pretty much been done. Thinking about compilers as a whole, there are classical optimizations, which aren't exciting, and processor-specific work. For really weird architectures (and even for some pretty conventional architectures these days) the compiler is the only thing that makes the chip useful, but in a sense it's grunge work supporting other people's advances.

Date: 2003-04-29 08:23 pm (UTC)
From: (Anonymous)
Some of this is presumably because the chip makers are doing interesting things, and the bulk of the compilers out there are C/C++ compilers scrambling to keep up? The problem there, of course, is that C++ doesn't have much that doesn't map directly to C (some, but not much), and the C is glorified PDP-11 assembler (heck, there are VAX instructions that are higher level than anything you can express in C :-)

Doesn't it make more sense to look at more expressive higher levels than just "more ways to turn C into something that doesn't match it anymore"? I was both impressed and a little dismayed at a paper Ken showed me (5+ years ago?) about essentially "mining" a C compiler for information about a chipset, in such a way that you can take the results and generate code directly [the claim was that it was both more powerful and generated better code than just using C as a back-end "machine"]. Most of the dismay was that non-free compilers made this kind of work necessary, let alone worthwhile.

Or perhaps there's some value in looking down a layer - I've always found that knowing what's going on "one abstraction layer deeper" gave me a major edge on just about any project; Laura's career is at least partly based on "knocking some sense into routing protocol designers by poking oscilloscopes at network cards". Likewise, looking *up* a layer gives you more insight into why you're doing anything at the layer you've chosen. Something to consider when thinking about your career path, as well as any given project...

_Mark_

Date: 2003-04-30 01:09 pm (UTC)
From: (Anonymous)
If I asked, he'd probably give me one, but I've been resisting (at least partly because it's *so* badly engineered). Also, using anonymous posts makes the point that they're not *really* authenticated anyway :-)

_Mark_
