Drinking from the Firehose – Run-ahead Transfer Prediction
Date: Tuesday, November 12, 2013
Speaker: Ivan Godard, CTO, Out-of-the-Box Computing
Time: 6:30 PM (PT) Networking/Refreshments, 7:00 PM Presentation
Location: Cadence / Bldg 10, 2655 Seely Ave, San Jose, CA
Programs frequently execute only a handful of operations between transfers of control: branches, calls, and returns. Yet modern wide-issue VLIW and superscalar CPUs can issue similar handfuls of operations every cycle, so the hardware must be able to change to a new point of execution each cycle if performance is not to suffer from stalls. To change the point of execution requires determining the new execution address, fetching instructions at that address from the memory hierarchy, decoding the instructions, and issuing them, steps that can take tens to hundreds of cycles on modern out-of-order machines. Without special provision, a machine could take 20 cycles to transfer in order to do one cycle of actual work.
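The arithmetic behind that last sentence can be sketched in a few lines. This is only a back-of-the-envelope model, assuming every short run of useful work is followed by a transfer whose full latency is exposed; the function name `useful_fraction` is made up for illustration.

```python
def useful_fraction(work_cycles: int, transfer_cycles: int) -> float:
    """Fraction of all cycles spent on real work when each run of
    `work_cycles` is followed by an unhidden transfer costing
    `transfer_cycles` (a simplifying assumption)."""
    return work_cycles / (work_cycles + transfer_cycles)


# Figures from the paragraph above: 1 cycle of work per 20-cycle transfer.
print(f"{useful_fraction(1, 20):.1%}")  # 4.8%
```

Under these assumptions the machine does useful work in fewer than one cycle out of twenty, which is why some form of prediction is unavoidable.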
The special provision in conventional out-of-order processors is the branch predictor, which attempts to predict the taken vs. untaken state of conditional branches based on the historical behavior of the same branch in earlier executions. Modern predictors achieve 95% accuracy, and large instruction windows can hide top-level cache latency; together these are sufficient for programs, such as benchmarks, that are regular and small. On real-world problems, however, such CPUs can spend a third or more of their cycles stalled waiting for code.
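To make "predict from the historical behavior of the same branch" concrete, here is a minimal sketch of one textbook scheme, the 2-bit saturating counter. It is a generic illustration, not the predictor of any particular CPU; the class name, the starting counter value, and the trace are all assumptions chosen for the example.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per branch address.
    Counter values 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}  # branch address -> counter in 0..3

    def predict(self, branch_addr):
        # Unseen branches start at 1 (weakly not taken), an arbitrary choice.
        return self.counters.get(branch_addr, 1) >= 2

    def update(self, branch_addr, taken):
        c = self.counters.get(branch_addr, 1)
        self.counters[branch_addr] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(predictor, trace):
    """Replay (address, taken) pairs, predicting before each update."""
    hits = 0
    for addr, taken in trace:
        hits += predictor.predict(addr) == taken
        predictor.update(addr, taken)
    return hits / len(trace)


# Hypothetical loop branch: taken 9 times, then not taken, repeated 10 times.
loop_trace = [(0x400, i % 10 != 9) for i in range(100)]
print(accuracy(TwoBitPredictor(), loop_trace))  # 0.89
```

Even this simple scheme does well on regular code: it misses once while warming up and once per loop exit. Note that it can only learn a branch after that branch has executed, so it has nothing to say about code that has never run.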
The Mill uses a novel prediction mechanism to avoid these problems; it predicts transfers rather than only branches. It can do so for all code, including cold code that has never been executed, running well ahead of execution so as to mask all cache and most memory latency. It needs no area- and power-hungry instruction window, using instead a very short decode pipeline and direct in-order issue and execution. It can use all present and future prediction algorithms, with the same accuracy as any other processor. On those occasions in which prediction is in error, the mispredict penalty is four cycles, a quarter that of superscalar designs. As a result, code stall is a rarity on a Mill, even on large programs with irregular control flow.
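The idea of running ahead of execution can be illustrated with a toy model. The abstract does not spell out the hardware, so everything below is an assumption made for illustration: a table that maps the entry address of a block of code to the predicted address of the next block, and a function (`run_ahead`, a made-up name) that chains those predictions to find what to fetch before execution gets there.

```python
def run_ahead(exit_table, start, depth):
    """Follow predicted transfers from `start` for up to `depth` steps and
    return the addresses that could be prefetched ahead of execution."""
    chain = []
    addr = start
    for _ in range(depth):
        nxt = exit_table.get(addr)
        if nxt is None:  # no prediction recorded: stop running ahead
            break
        chain.append(nxt)
        addr = nxt
    return chain


# Hypothetical table: block entry address -> predicted next block entry.
exit_table = {0x100: 0x240, 0x240: 0x3A0, 0x3A0: 0x100}
print([hex(a) for a in run_ahead(exit_table, 0x100, 4)])
# ['0x240', '0x3a0', '0x100', '0x240']
```

Because each prediction names a destination rather than a taken/untaken bit, the chain can be followed without decoding any of the intervening code, which is what lets fetch get far enough ahead to hide cache latency in this simplified picture.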
The talk describes the prediction mechanism of the Mill and compares it with the conventional approach. The talk is one of a series on the Mill architecture; others can be found at: http://ootbcomp.com/docs
Ivan Godard has designed, implemented or led the teams for 11 compilers for a variety of languages and targets, an operating system, an object-oriented database, and four instruction set architectures. He participated in the revision of Algol68 and is mentioned in its Report, was on the Green team that won the Ada language competition, designed the Mary family of system implementation languages, and was founding editor of the Machine Oriented Languages Bulletin. He is a Member Emeritus of IFIP Working Group 2.4 (Implementation languages) and was a member of the committee that produced the IEEE and ISO floating-point standard 754-2011.
Ivan is currently CTO at Out-of-the-Box Computing, a startup now emerging from stealth mode. OOTBC has developed the Mill, a clean-sheet rethink of general-purpose CPU architectures. The Mill is the subject of this talk.
Stream Presentation [Requires MS Silverlight]