Answer:
a)
Clock: 2.5 GHz
L1 I-cache: 32KB, 8-way, 64B line size, 4-cycle access latency
L1 D-cache: write-back, write-allocate; MSHR with 0 (lockup cache), 1, 2, and 64 (unconstrained non-blocking cache) entries; write-back buffer with 16 entries
L2 cache: 256KB, 8-way, 64B line size, 10-cycle access latency
L3 cache: 2MB per core, 64B line size, 36-cycle access latency
Memory: DDR3-1600, 90-cycle access latency
Issue width: 4
Instruction window size: 36
ROB size: 128
Load buffer size: 48
Store buffer size: 32
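The cache geometry implied by these parameters can be checked with a short sketch (the helper function below is illustrative, not part of the original answer; it assumes power-of-two sizes):

```python
# Sketch: derive set counts and address-bit splits for the caches
# listed above. Assumes power-of-two capacity, associativity, and
# line size, which holds for every cache in the table.

def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits

# L1 I-cache: 32KB, 8-way, 64B lines
print(cache_geometry(32 * 1024, 8, 64))    # -> (64, 6, 6)
# L2 cache: 256KB, 8-way, 64B lines
print(cache_geometry(256 * 1024, 8, 64))   # -> (512, 6, 9)
```

With a 64-set L1 and 4KB pages, the index and offset bits fit entirely within the page offset, which is what allows a virtually indexed, physically tagged lookup in designs like Nehalem's.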
b)
Instruction- and thread-level parallelism take this a step further by providing more parallelism and hence more latency-hiding opportunities. It is likely that the use of instruction- and thread-level parallelism will be the primary tool to combat whatever memory delays are encountered in modern multilevel cache systems.
Performance improves relative to the lockup cache setup (hit-under-0-miss). For the integer programs, the average performance (measured as CPI) improvement over the lockup cache is 7.08% for hit-under-1-miss, 8.36% for hit-under-2-misses, and 9.02% for hit-under-64-misses (essentially the unconstrained non-blocking cache). For the floating-point programs, the three numbers are 12.69%, 16.22%, and 17.76%, respectively.
c)
Non-blocking caches are an effective technique for tolerating cache-miss latency. They can reduce
miss-induced processor stalls by buffering the misses and continuing to serve other independent access
requests. Previous research on the complexity and performance of non-blocking caches supporting
non-blocking loads showed they could achieve significant performance gains in comparison to blocking
caches. However, those experiments were performed with benchmarks that are now over a decade old.
Furthermore, the processor that was simulated was a single-issue processor with unlimited run-ahead capability, a perfect branch predictor, a fixed 16-cycle memory latency, single-cycle latency for floating-point operations, and write-through, write-no-allocate caches. These assumptions are very different
from today's high performance out-of-order processors such as the Intel Nehalem. Thus, it is time to
re-evaluate the performance impact of non-blocking caches on practical out-of-order processors using
up-to-date benchmarks. In this study, we evaluate the impacts of non-blocking data caches using the latest SPEC CPU2006 benchmark suite on practical high-performance out-of-order (OOO) processors. Simulations show that a data cache that supports hit-under-2-misses can provide a 17.76% performance gain for a typical high-performance OOO processor running the SPEC CPU2006 benchmarks in comparison to a similar machine with a blocking cache.
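The "hit-under-n-misses" behavior comes down to how many MSHR entries are available to buffer outstanding misses. A minimal model of that bookkeeping (illustrative only; names like `MSHRFile` and the single-address-per-entry merging are simplifying assumptions, and a real MSHR also tracks the requesting instructions per entry) might look like:

```python
# Sketch: MSHR entries bound the number of outstanding primary misses.
# With n entries the cache can serve hits "under" n misses; when all
# entries are busy, a new primary miss causes a structural stall,
# which is the lockup (blocking) cache behavior.

class MSHRFile:
    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.outstanding = set()          # block addresses being fetched

    def on_miss(self, block_addr):
        """Return True if the miss can proceed, False if the core must stall."""
        if block_addr in self.outstanding:
            return True                   # secondary miss: merge into existing entry
        if len(self.outstanding) < self.num_entries:
            self.outstanding.add(block_addr)  # primary miss: allocate an entry
            return True
        return False                      # all MSHRs busy -> stall

    def on_fill(self, block_addr):
        self.outstanding.discard(block_addr)  # data returned; free the entry

mshr = MSHRFile(num_entries=2)            # hit-under-2-misses configuration
print(mshr.on_miss(0x40))   # True: first outstanding miss
print(mshr.on_miss(0x80))   # True: second outstanding miss
print(mshr.on_miss(0xC0))   # False: both entries busy, stall
```

Setting `num_entries` to 1, 2, or 64 mirrors the configurations simulated above; the diminishing returns beyond 2 entries show why real designs keep the MSHR file small.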