devel / comp.arch / Re: Tonight's tradeoff

Branch miss logic versus clock frequency.

The branch miss logic for the current OoO version of Thor is quite
involved. It needs to back out the register source indexes to the last
valid source before the branch instruction. To do this in a single
cycle, the logic is about 25+ logic levels deep. I find this somewhat
unacceptable.

I can remove a lot of logic, improving the clock frequency substantially,
by removing the branch miss logic that resets the register's source id to
the last valid source. Instead of stomping on the instructions on a miss
and flushing them in a single cycle, I think the predicate
for the instructions can be cleared, which will effectively turn them
into NOPs. The value of the target register will be propagated in the
reorder buffer, meaning the register's source id need not be reset. The
reorder buffer is only eight entries, so on average four entries would
be turned into NOPs. The NOPs would still propagate through the reorder
buffer, so it may take several clock cycles for them to be flushed from
the buffer, meaning the branch latency for mispredicted branches would
be quite high. However, if the clock frequency can be improved by 20%
for all instructions, much of the lost performance on the branches may
be made up.
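
A rough way to gauge this tradeoff is to compare time per instruction under the two schemes. The sketch below is a back-of-the-envelope model, not a measurement of Thor: the base CPI, branch fraction, mispredict rate and both penalty figures are assumed values chosen for illustration, and only the 20% clock gain comes from the post above.

```python
def time_per_instruction(base_cpi, branch_frac, miss_rate, miss_penalty, freq_scale):
    """Relative time per instruction: average CPI divided by relative clock rate."""
    cpi = base_cpi + branch_frac * miss_rate * miss_penalty
    return cpi / freq_scale


def break_even_penalty(base_cpi, branch_frac, miss_rate, fast_penalty, freq_gain):
    """Largest slow-path miss penalty (in cycles) at which the faster clock still ties."""
    misses_per_instr = branch_frac * miss_rate
    fast_cpi = base_cpi + misses_per_instr * fast_penalty
    # Solve (base_cpi + m * p) / (1 + gain) == fast_cpi for p.
    return (fast_cpi * (1.0 + freq_gain) - base_cpi) / misses_per_instr


# Every number below except CLOCK_GAIN is an assumption for illustration.
BASE_CPI = 1.0       # CPI with perfect branch prediction
BRANCH_FRAC = 0.2    # fraction of instructions that are branches
MISS_RATE = 0.1      # fraction of branches mispredicted
FAST_PENALTY = 3     # cycles lost per miss with the single-cycle flush
SLOW_PENALTY = 8     # cycles lost per miss when NOPs drain through the ROB
CLOCK_GAIN = 0.20    # clock improvement quoted in the post

single_cycle = time_per_instruction(BASE_CPI, BRANCH_FRAC, MISS_RATE, FAST_PENALTY, 1.0)
nop_drain = time_per_instruction(BASE_CPI, BRANCH_FRAC, MISS_RATE, SLOW_PENALTY,
                                 1.0 + CLOCK_GAIN)
limit = break_even_penalty(BASE_CPI, BRANCH_FRAC, MISS_RATE, FAST_PENALTY, CLOCK_GAIN)

print(f"single-cycle flush: {single_cycle:.3f}")
print(f"NOP drain:          {nop_drain:.3f}")
print(f"break-even penalty: {limit:.1f} cycles")
```

With these assumed inputs the slower recovery still comes out ahead, and the break-even miss penalty is about 13.6 cycles. A workload with more branches or a worse predictor would move that figure considerably.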

Robert Finch wrote:
> Branch miss logic versus clock frequency.
>
> The branch miss logic for the current OoO version of Thor is quite
> involved. It needs to back out the register source indexes to the last
> valid source before the branch instruction. To do this in a single
> cycle, the logic is about 25+ logic levels deep. I find this somewhat
> unacceptable.
>
> I can remove a lot of logic, improving the clock frequency substantially,
> by removing the branch miss logic that resets the register's source id to
> the last valid source. Instead of stomping on the instructions on a miss
> and flushing them in a single cycle, I think the predicate
> for the instructions can be cleared, which will effectively turn them
> into NOPs. The value of the target register will be propagated in the
> reorder buffer, meaning the register's source id need not be reset. The
> reorder buffer is only eight entries, so on average four entries would
> be turned into NOPs. The NOPs would still propagate through the reorder
> buffer, so it may take several clock cycles for them to be flushed from
> the buffer, meaning the branch latency for mispredicted branches would
> be quite high. However, if the clock frequency can be improved by 20%
> for all instructions, much of the lost performance on the branches may
> be made up.

Basically it sounds like you want to eliminate the checkpoint and rollback,
and instead let resources be recovered at Retire. That could work.

However you are not restoring the Renamer's future Register Alias Table (RAT)
to its state at the point of the mispredicted branch instruction, which is
what the rollback would have done, so its state will be whatever it was at
the end of the mispredicted sequence. That needs to be re-sync'ed with the
program state as of the branch.

That can be accomplished by stalling the front end, waiting until the
mispredicted branch reaches Retire, then copying the committed RAT,
maintained by Retire, to the future RAT at Rename, and restarting the front end.
The list of free physical registers is then all those that are not
marked as architectural registers.
This is partly how I handle exceptions.
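
The recovery described above can be sketched behaviourally as follows. This is a minimal model under stated assumptions, not anyone's actual design: the class and method names are hypothetical, registers are plain integers, and recovery is assumed to happen only once the front end is stalled and the mispredicted branch is the oldest instruction at Retire.

```python
class Renamer:
    """Toy model of a future RAT (at Rename) and a committed RAT (at Retire)."""

    def __init__(self, num_arch, num_phys):
        self.num_phys = num_phys
        # Initially architectural register i lives in physical register i.
        self.committed_rat = list(range(num_arch))
        self.future_rat = list(range(num_arch))
        self.free_list = list(range(num_arch, num_phys))

    def rename_dest(self, arch_reg):
        """Rename stage: allocate a fresh physical register for a destination."""
        phys = self.free_list.pop(0)
        self.future_rat[arch_reg] = phys
        return phys

    def retire(self, arch_reg, phys):
        """Retire stage: the mapping becomes architectural, the old one is freed."""
        old = self.committed_rat[arch_reg]
        self.committed_rat[arch_reg] = phys
        self.free_list.append(old)

    def recover_at_retire(self):
        """Mispredicted branch reached Retire: resync the future RAT."""
        self.future_rat = list(self.committed_rat)
        architectural = set(self.committed_rat)
        # Everything not holding architectural state is free again.
        self.free_list = [p for p in range(self.num_phys) if p not in architectural]


r = Renamer(num_arch=4, num_phys=8)
p = r.rename_dest(1)      # correct-path write to r1
r.retire(1, p)            # ...which retires before the branch does
r.rename_dest(2)          # wrong-path write to r2
r.rename_dest(1)          # wrong-path write to r1
r.recover_at_retire()     # branch reaches Retire and is found mispredicted
print(r.future_rat, r.free_list)
```

The free list is rebuilt wholesale rather than by walking the discarded instructions, which is what makes the scheme simple but ties recovery latency to how long the branch takes to reach Retire.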

Also you still need a mechanism to cancel start of execution of the
subset of pending uOps for the purged set. You don't want to launch
a LD or DIV from the mispredicted set if it has not already started.
If you are using a reservation station design then you need some way
to distribute the cancel request to the various FU's and RS's,
and wait for them to clean themselves up.

Note that some things might not be able to cancel immediately,
like an in-flight MUL in a pipeline or an outstanding LD to the cache.
So some of this will be asynchronous (send cancel request, wait for ACK).

There are some other things that might need cleanup.
A Return Stack Predictor might be manipulated by the mispredicted path.
Not sure how to handle that without a checkpoint.
Maybe have two copies like RAT, a future one maintained by Decode and
a committed one maintained by Retire, and copy the committed to future.

Robert Finch wrote:

> Branch miss logic versus clock frequency.

> The branch miss logic for the current OoO version of Thor is quite
> involved. It needs to back out the register source indexes to the last
> valid source before the branch instruction. To do this in a single
> cycle, the logic is about 25+ logic levels deep. I find this somewhat
> unacceptable.
<
When you launch a predicted branch into execution (the prelude to signaling
that recovery is required), and while the branch is determining whether to
back up (or not), have the branch recovery logic set up the register indexes
such that:
a) if the branch succeeds, you keep the current map;
b) if the branch fails, you are 1 multiplexer delay from having the state
you want.
<
That is, move the setup for the repair into the previous clock.
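
One way to read this suggestion is sketched below. It assumes, purely for illustration, that each in-flight instruction younger than the branch records its destination register and the mapping it replaced; the function names and the dictionary representation are hypothetical. The deep unwinding work is done when the branch launches, and the resolve step is reduced to a 2:1 select.

```python
def build_recovery_map(current_map, younger_entries):
    """Done while the branch executes: undo renames younger than the branch.

    younger_entries is an oldest-first list of (arch_dest, previous_mapping)
    pairs; this bookkeeping format is an assumption of the sketch.
    """
    recovery = dict(current_map)
    for arch_dest, previous_mapping in reversed(younger_entries):
        recovery[arch_dest] = previous_mapping
    return recovery


def resolve(current_map, recovery_map, mispredicted):
    """Resolve cycle: nothing left to compute, just pick one of two maps."""
    return recovery_map if mispredicted else current_map


# Map as it stood at the branch: r1 -> 1, r2 -> 2.
# Younger instructions then renamed r1 -> 5, r1 -> 6 and r2 -> 7.
current = {1: 6, 2: 7}
younger = [(1, 1), (1, 5), (2, 2)]

recovery = build_recovery_map(current, younger)   # prepared ahead of time
after_miss = resolve(current, recovery, mispredicted=True)
after_hit = resolve(current, recovery, mispredicted=False)
print(after_miss, after_hit)
```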
<
> I can remove a lot of logic, improving the clock frequency substantially,
> by removing the branch miss logic that resets the register's source id to
> the last valid source. Instead of stomping on the instructions on a miss
> and flushing them in a single cycle, I think the predicate
> for the instructions can be cleared, which will effectively turn them
> into NOPs. The value of the target register will be propagated in the
> reorder buffer, meaning the register's source id need not be reset. The
> reorder buffer is only eight entries, so on average four entries would
> be turned into NOPs. The NOPs would still propagate through the reorder
> buffer, so it may take several clock cycles for them to be flushed from
> the buffer, meaning the branch latency for mispredicted branches would
> be quite high. However, if the clock frequency can be improved by 20%
> for all instructions, much of the lost performance on the branches may
> be made up.

EricP wrote:

> Robert Finch wrote:
>> Branch miss logic versus clock frequency.
>>
>> The branch miss logic for the current OoO version of Thor is quite
>> involved. It needs to back out the register source indexes to the last
>> valid source before the branch instruction. To do this in a single
>> cycle, the logic is about 25+ logic levels deep. I find this somewhat
>> unacceptable.
>>
>> I can remove a lot of logic, improving the clock frequency substantially,
>> by removing the branch miss logic that resets the register's source id to
>> the last valid source. Instead of stomping on the instructions on a miss
>> and flushing them in a single cycle, I think the predicate
>> for the instructions can be cleared, which will effectively turn them
>> into NOPs. The value of the target register will be propagated in the
>> reorder buffer, meaning the register's source id need not be reset. The
>> reorder buffer is only eight entries, so on average four entries would
>> be turned into NOPs. The NOPs would still propagate through the reorder
>> buffer, so it may take several clock cycles for them to be flushed from
>> the buffer, meaning the branch latency for mispredicted branches would
>> be quite high. However, if the clock frequency can be improved by 20%
>> for all instructions, much of the lost performance on the branches may
>> be made up.

> Basically it sounds like you want to eliminate the checkpoint and rollback,
> and instead let resources be recovered at Retire. That could work.

> However you are not restoring the Renamer's future Register Alias Table (RAT)
> to its state at the point of the mispredicted branch instruction, which is
> what the rollback would have done, so its state will be whatever it was at
> the end of the mispredicted sequence. That needs to be re-sync'ed with the
> program state as of the branch.
<
I, personally, don't use a RAT; I use a CAM-based architectural decoder
for operand reads and a standard physical equality decoder for writes.
<
Every cycle the CAM.valid bits are block-loaded into a history table,
and if you need to return the CAMs to the checkpointed mappings, you
take the valid bits from the history table and write the CAM.valid
bits back into the physical register file. Presto, the map is how it
used to be.
<
This can even be made to take 0 cycles. {Yes: 0, not 1 cycle.}
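
A behavioural sketch of the valid-bit idea follows. It is a reading of the description above, not the actual hardware: the class and method names are hypothetical, the caller picks the new physical register (no free list is modelled), and snapshots are keyed by a checkpoint id rather than taken every cycle. The point it illustrates is that stale tags stay in place, so restoring only the valid bits brings the old mapping back.

```python
class CamRenamer:
    """Each physical register carries an architectural tag and a valid bit."""

    def __init__(self, num_arch, num_phys):
        self.tag = [None] * num_phys
        self.valid = [False] * num_phys
        for reg in range(num_arch):
            self.tag[reg] = reg
            self.valid[reg] = True
        self.history = {}  # checkpoint id -> snapshot of the valid bits

    def lookup(self, arch_reg):
        """Operand read: CAM match on the architectural register number."""
        for phys, (tag, valid) in enumerate(zip(self.tag, self.valid)):
            if valid and tag == arch_reg:
                return phys
        raise KeyError(arch_reg)

    def rename_dest(self, arch_reg, new_phys):
        """New writer: invalidate the old holder but leave its tag in place."""
        self.valid[self.lookup(arch_reg)] = False
        self.tag[new_phys] = arch_reg
        self.valid[new_phys] = True

    def checkpoint(self, ckpt_id):
        self.history[ckpt_id] = list(self.valid)

    def restore(self, ckpt_id):
        """Write the saved valid bits back; the tags never moved."""
        self.valid = list(self.history[ckpt_id])


cam = CamRenamer(num_arch=4, num_phys=8)
cam.rename_dest(1, 4)      # correct path: r1 now in p4
cam.checkpoint(0)          # taken at the predicted branch
cam.rename_dest(1, 5)      # wrong path
cam.rename_dest(2, 6)      # wrong path
wrong_path_r1 = cam.lookup(1)
cam.restore(0)
print(wrong_path_r1, cam.lookup(1), cam.lookup(2))
```

This only works while the physical registers invalidated after the checkpoint have not been handed out again, since reallocation would overwrite the tag that the restore relies on.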
<
> That can be accomplished by stalling the front end, waiting until the
> mispredicted branch reaches Retire, then copying the committed RAT,
> maintained by Retire, to the future RAT at Rename, and restarting the front end.
> The list of free physical registers is then all those that are not
> marked as architectural registers.
<
Sounds slow.
<
> This is partly how I handle exceptions.

> Also you still need a mechanism to cancel start of execution of the
> subset of pending uOps for the purged set. You don't want to launch
> a LD or DIV from the mispredicted set if it has not already started.
> If you are using a reservation station design then you need some way
> to distribute the cancel request to the various FU's and RS's,
> and wait for them to clean themselves up.
<
I use the concept of an execution window to do this, both at the reservation
stations and at the function units. There is an insert pointer and a
consistent pointer; an RS is only allowed to launch when the instruction is
between them. FUs are only allowed to calculate so long as the instruction
remains between these 2 pointers. The 2 pointers (4 bits each) are broadcast
around the machine every cycle. Each station and unit decides for itself.
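
The window test each station or unit would apply locally can be sketched as a modular comparison. The boundary convention here is an assumption: the window is taken as the half-open range from the consistent pointer up to, but not including, the insert pointer, and equal pointers are treated as an empty window. The function name is hypothetical.

```python
WINDOW_BITS = 4
MASK = (1 << WINDOW_BITS) - 1


def in_window(tag, consistent, insert):
    """True if tag lies in [consistent, insert) on the 4-bit circular index."""
    return ((tag - consistent) & MASK) < ((insert - consistent) & MASK)


# Window wraps around: instructions 14, 15, 0, 1, 2 are in flight.
consistent, insert = 14, 3
before = [t for t in range(16) if in_window(t, consistent, insert)]

# The branch at tag 0 mispredicts, so the insert pointer backs up to 1.
# Tags 1 and 2 fall outside the window with no explicit cancel message.
insert = 1
after = [t for t in range(16) if in_window(t, consistent, insert)]
print(before, after)
```

Because the decision is a pure function of the two broadcast pointers and the instruction's own tag, no per-instruction cancel signal has to be routed to the stations.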

> Note that some things might not be able to cancel immediately,
> like an in-flight MUL in a pipeline or an outstanding LD to the cache.
> So some of this will be asynchronous (send cancel request, wait for ACK).
<
If an instruction whose result should not have been delivered delivers it
anyway, the result goes to the physical register it was assigned at issue time.
But since the value had not been delivered, that register is not in the
pool of assignable registers, so no dependency has been created.
<
> There are some other things that might need cleanup.
> A Return Stack Predictor might be manipulated by the mispredicted path.
<
Do these with a linked list and you can back up a mispredicted return
to a mispredicted call.
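
A minimal sketch of a linked-list return stack follows, assuming unbounded node storage so that no node is ever overwritten; real hardware would have a finite pool and would need to reclaim nodes. The names are hypothetical. Because a pop only moves the top pointer along a link, saving and restoring that one pointer undoes both wrong-path calls and wrong-path returns.

```python
class LinkedReturnStack:
    """Return-address predictor stored as nodes linked to the previous top."""

    def __init__(self):
        self.nodes = []   # (return_address, link to previous top)
        self.top = -1     # index of the current top node, -1 when empty

    def call(self, return_address):
        self.nodes.append((return_address, self.top))
        self.top = len(self.nodes) - 1

    def ret(self):
        """Predict a return target; the node itself is left untouched."""
        if self.top < 0:
            return None
        address, link = self.nodes[self.top]
        self.top = link
        return address

    def checkpoint(self):
        return self.top

    def restore(self, saved_top):
        self.top = saved_top


rsp = LinkedReturnStack()
rsp.call(0x100)
rsp.call(0x200)
saved = rsp.checkpoint()   # taken at the predicted branch

# Wrong path: a return, a call, and two more returns.
rsp.ret()
rsp.call(0x300)
rsp.ret()
rsp.ret()

rsp.restore(saved)
recovered = [rsp.ret(), rsp.ret(), rsp.ret()]
print(recovered)
```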
<
> Not sure how to handle that without a checkpoint.
<
Every (non-exceptional) flow-altering instruction needs a checkpoint.
Predicated strings of instructions use a lightweight checkpoint;
predicted branches use a heavyweight version.
<
> Maybe have two copies like RAT, a future one maintained by Decode and
> a committed one maintained by Retire, and copy the committed to future.

<uj1o0t$1kves$1@dont-email.me>
