Here, too, Intel engineers had to work hard to support the new 256-bit instructions without affecting the core size or power consumption.
Back in Nehalem, there were execution ports and three 128-bit stacks of execution units:
In Sandy Bridge, the SIMD FP blocks responsible for floating-point operations were extended by two 128-bit stacks:
This solution lets one 256-bit instruction “cover” two 128-bit stacks, which keeps the die cost minimal while theoretically doubling floating-point throughput. The bandwidth of the execution cluster grows to two 256-bit AVX operations per clock plus one 256-bit AVX load operation. Support for 128-bit XMM instructions has not been affected. However, mixing SSE and AVX instructions in program code leads to a noticeable performance drop: to execute an SSE instruction, the processor needs to send the upper 128 bits of the AVX (YMM) registers to a special cache.
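The "two times" figure can be checked with some quick arithmetic. The sketch below is only an illustration: the helper names are hypothetical, it counts single-precision (32-bit) elements, and it assumes the same two vector operations per clock for 128-bit SSE as the text quotes for 256-bit AVX.

```python
# Illustrative arithmetic only; names and per-clock op counts are assumptions.
FLOAT_BITS = 32  # width of one single-precision element


def lanes(vector_bits: int, element_bits: int = FLOAT_BITS) -> int:
    """Number of elements packed into one SIMD register."""
    return vector_bits // element_bits


def peak_flops_per_clock(vector_bits: int, ops_per_clock: int = 2) -> int:
    """Theoretical peak single-precision FLOPs per clock for one core.

    ops_per_clock=2 mirrors the "two 256-bit AVX operations per clock"
    quoted in the text; using the same count for 128-bit SSE is an
    assumption made for this comparison.
    """
    return ops_per_clock * lanes(vector_bits)


sse_peak = peak_flops_per_clock(128)  # 128-bit XMM registers
avx_peak = peak_flops_per_clock(256)  # 256-bit YMM registers
print(f"SSE: {sse_peak} FLOPs/clock, AVX: {avx_peak} FLOPs/clock, "
      f"ratio: {avx_peak / sse_peak:.1f}x")
```

These are peak numbers only. Real code rarely sustains them, and the SSE/AVX mixing penalty described above would reduce them further.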
The Nehalem memory controller is connected to the L1 cache via a 32-byte bus (16 bytes load and 16 bytes store per cycle) and has one load unit and one read/write address generation block:
The Sandy Bridge memory interface has two symmetric load and store address ports and a 48-byte-wide L1 cache bus (32 bytes read + 16 bytes write per cycle):
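The per-cycle figures quoted for the two L1 cache buses can be tallied in a few lines. This is a rough sketch with hypothetical names (`L1Bus`, `cycles_to_load`). It models peak bus width only and ignores alignment, bank conflicts, and everything else that affects real load throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class L1Bus:
    """Peak L1 cache bus width per cycle, as quoted in the text."""
    load_bytes: int
    store_bytes: int

    @property
    def total_bytes(self) -> int:
        return self.load_bytes + self.store_bytes


nehalem = L1Bus(load_bytes=16, store_bytes=16)
sandy_bridge = L1Bus(load_bytes=32, store_bytes=16)


def cycles_to_load(bus: L1Bus, nbytes: int) -> int:
    """Idealized cycles needed to load nbytes (ceiling division)."""
    return -(-nbytes // bus.load_bytes)


AVX_REGISTER_BYTES = 256 // 8  # one full 256-bit register

for name, bus in (("Nehalem", nehalem), ("Sandy Bridge", sandy_bridge)):
    print(f"{name}: {bus.total_bytes} bytes/cycle total, "
          f"{cycles_to_load(bus, AVX_REGISTER_BYTES)} cycle(s) "
          f"per 256-bit load")
```

Under these assumptions, the wider read path is what lets a full 256-bit AVX operand arrive in a single cycle, where a 16-byte read path would need two.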