Intel Sandy Bridge. Microarchitecture
Intel Sandy Bridge microarchitecture overview
In previous-generation processor systems, data transfer between functional blocks was organized over specialized buses, QPI and DMI. In Sandy Bridge, the graphics and system logic are integrated into the processor, so an internal ring bus (Ring Interconnect) is used instead of the QPI and DMI buses.
The Ring connects all processor cores, the graphics core, the Last Level Cache (LLC), and the System Agent, which includes the control logic.
In fact, the Ring is an enhanced version of the QPI protocol with some additional features.
In Sandy Bridge, "the Ring" is a collective name; data transfer is actually carried over four 32-byte functional buses:
- Data Ring
- Request Ring
- Acknowledge Ring
- Snoop Ring
Each of them is used in a different phase of a transaction, so gaining full access to the Ring takes four clocks. The Ring also lets any of the components it connects communicate with each other directly. As a result, both the graphics system and the processor cores can use the cache at the same time.
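As a rough illustration of components sitting on a shared ring, the following toy model counts the hops between two ring stops. It is only a sketch: the stop names, their order, and the assumption that traffic takes the shorter of the two directions around the ring are illustrative and are not taken from the description above.

```python
# Toy model of stops on a ring interconnect.
# ASSUMPTIONS (not from the article): the stop names, their order on the
# ring, and shortest-direction routing are purely illustrative.
RING_STOPS = ["core0", "core1", "core2", "core3", "graphics", "system_agent"]


def hops(src: str, dst: str, stops=RING_STOPS) -> int:
    """Return the minimum number of hops between two stops on the ring."""
    n = len(stops)
    i, j = stops.index(src), stops.index(dst)
    clockwise = (j - i) % n          # hops going one way around
    return min(clockwise, n - clockwise)  # or the other way, if shorter


if __name__ == "__main__":
    for dst in RING_STOPS:
        print(f"core0 -> {dst}: {hops('core0', dst)} hop(s)")
```

In this model every stop can reach every other stop without passing through a central hub, which is the property the text describes; the actual physical layout of the stops may differ.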
Compared with the ring bus in Westmere-EX, where all processor cores shared a common access point, Sandy Bridge has individual access points (Cache Boxes), which provides scalability for future designs. In a quad-core processor, core-to-LLC bandwidth increased from 96 GB/s (Westmere-EX) to 384 GB/s (Sandy Bridge). Cache latency dropped from 36 clocks (Westmere) to 26-31 clocks (Sandy Bridge).
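The two bandwidth figures are consistent with a simple back-of-the-envelope estimate, sketched below. The 32-byte width comes from the text; the 3 GHz clock and the idea of one 32-byte transfer per clock per access point are assumptions made here for illustration, not figures stated in the article.

```python
# Back-of-the-envelope estimate of core-to-LLC bandwidth.
RING_WIDTH_BYTES = 32     # bus width, from the text
ASSUMED_CLOCK_GHZ = 3.0   # ASSUMPTION: clock is not stated in the text


def bandwidth_gb_per_s(width_bytes: int, clock_ghz: float,
                       access_points: int = 1) -> float:
    """Peak bandwidth in GB/s, assuming one transfer per clock per access point."""
    return width_bytes * clock_ghz * access_points


# One access point shared by all cores (the Westmere-EX case in the text).
shared = bandwidth_gb_per_s(RING_WIDTH_BYTES, ASSUMED_CLOCK_GHZ, access_points=1)

# One access point per core in a quad-core part (the Sandy Bridge case).
per_core = bandwidth_gb_per_s(RING_WIDTH_BYTES, ASSUMED_CLOCK_GHZ, access_points=4)

if __name__ == "__main__":
    print(f"shared access point: {shared:.0f} GB/s")
    print(f"four access points:  {per_core:.0f} GB/s")
```

Under these assumptions the estimate gives 96 GB/s for a single shared access point and 384 GB/s for four, and it scales linearly with the clock passed in.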
The Ring runs at the core clock and voltage. When load is applied to the processor, the core clock rises, and the Ring and LLC clocks rise with it. This allows per-core performance and power efficiency to scale fully. However, it is not clear to me personally how this will affect graphics performance when only the graphics system is loaded while the Ring, LLC, and cores continue to run at lower clocks.