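Nvidia has lifted the veil on its custom 64-bit ARM processor core. The project, code-named "Denver," was first disclosed in January 2011; the core relies on microcode and features a new kind of execution optimizer.
The dual-core processor, which Nvidia plans to ship this year, is an upgrade to the Tegra K1 and targets tablets. The current 32-bit Tegra K1 targets Android products and is already used in an Acer Chromebook, Google's Project Tango tablet, Xiaomi's MiPad, and Nvidia's own Shield tablet.
Nvidia claims the 64-bit Tegra K1 will bring PC-class performance to mobile devices for gaming, business applications, and content creation. According to benchmarks the company showed, Denver is nearly on par with an Intel Haswell processor and outperforms Apple's A7 series processor by 10 to 25%.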
The figures Nvidia showed compare performance only against x86 processors and 32-bit ARM processors.
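Nvidia did not, however, provide a performance comparison between Denver and ARM's standard 64-bit A57 core. Targeting servers and networking gear, AMD recently began sampling processors built on the A57 core, and Applied Micro has begun sampling chips based on its own custom 64-bit ARM core.
Without benchmarks against standard and custom 64-bit ARM processors, it is not yet clear whether Denver will help Nvidia improve its standing in mobile devices, a market in which it still trails leader Qualcomm by a wide margin.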
Denver processor core architecture
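Denver can execute as many as seven instructions per clock at a peak clock rate of 2.5 GHz. It includes a 128+64 KB L1 cache and a 2 MB, 16-way set-associative L2 cache. The most novel part of the processor is an optimized execution feature that takes the place of a full out-of-order design; it handles optimizations such as renaming registers, unrolling loops, breaking false code dependencies, and removing unused computations.
The optimizer chains related routines and uses 128 MB of main memory that is securely partitioned before the operating system boots. "We see a 2x speed-up or better with optimized routines," said Nvidia chief architect Darrell Boggs, speaking at the recent Hot Chips conference in the United States.
Denver marks the end of Nvidia's use of a companion core, an approach that distinguished the company's early 32-bit ARM processors; ARM, meanwhile, continues to pursue designs that mix 32-bit and 64-bit cores. Other Denver features include the ability to reuse memory pipelines for integer traffic and a pre-fetch function that compensates for cache misses.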
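This article was translated under license from EE Times. All rights reserved; reproduction is prohibited.
Translated by: Judith Cheng
English original: Nvidia Flexes Custom 64-Bit ARM, by Rick Merritt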
Nvidia Flexes Custom 64-Bit ARM
Rick Merritt
CUPERTINO, Calif. — Nvidia has opened the hood on its custom 64-bit ARM core first announced in January 2011. "Denver" is an ARM processor that uses microcode to enable a novel execution optimizer.
Two cores will ship this year in an SoC that is an upgrade to Nvidia's Tegra K1, targeting tablets. The existing 32-bit chip targets Android and is used in an Acer Chromebook, Google's Project Tango tablet, Xiaomi's MiPad, and Nvidia's own Shield tablet.
Nvidia claims the 64-bit Tegra K1 will sport PC-class performance in mobile systems for gaming, business apps, and content creation. In benchmarks Nvidia showed, Denver was nearly on par with an Intel Haswell processor and outperformed an Apple A7 series SoC by 10 to 25%.
Nvidia only showed benchmarks against the x86 and 32-bit ARM SoCs.
The company did not give any comparisons with a standard A57 64-bit core from ARM. Targeting servers and networking gear, AMD just started to sample SoCs using the A57, and Applied Micro has started sampling its custom 64-bit ARM.
Until benchmarks against standard and custom 64-bit ARM SoCs emerge, it's not clear whether Denver will help Nvidia improve its position in mobile systems, where it significantly trails leader Qualcomm.
Denver can execute as many as seven instructions per clock, running up to a 2.5 GHz rate. It packs a 128+64 kbyte L1 cache and 2 Mbyte 16-way set associative L2 cache.
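The figures in that paragraph imply a theoretical peak issue rate and, given a line size, a cache geometry. The sketch below works through the arithmetic. The 64-byte line size and the simple modulo indexing are assumptions made for illustration; Nvidia's talk, as reported here, states neither, and all function names are hypothetical.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the article.
# LINE_BYTES and modulo indexing are ASSUMPTIONS; the article gives neither.

ISSUE_WIDTH = 7              # instructions per clock (from the article)
CLOCK_HZ = 2.5e9             # 2.5 GHz peak clock (from the article)
L2_BYTES = 2 * 1024 * 1024   # 2 Mbyte L2 (from the article)
L2_WAYS = 16                 # 16-way set associative (from the article)
LINE_BYTES = 64              # ASSUMED cache line size, for illustration only


def peak_instructions_per_second(width: int, clock_hz: float) -> float:
    """Theoretical upper bound per core: issue width times clock rate."""
    return width * clock_hz


def cache_sets(total_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    return total_bytes // (ways * line_bytes)


def set_index(address: int, line_bytes: int, num_sets: int) -> int:
    """Set that an address maps to under simple modulo indexing (assumed)."""
    return (address // line_bytes) % num_sets


if __name__ == "__main__":
    sets = cache_sets(L2_BYTES, L2_WAYS, LINE_BYTES)
    print("peak instructions/s per core:",
          peak_instructions_per_second(ISSUE_WIDTH, CLOCK_HZ))
    print("L2 sets (assuming 64-byte lines):", sets)
    print("set for address 0x12345678:",
          set_index(0x12345678, LINE_BYTES, sets))
```

The peak rate is an upper bound only: sustained throughput depends on how often seven independent instructions are actually available each cycle, which is what the optimizer described next is meant to improve.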
The most novel aspect of Denver is an optimized execution feature used as an alternative to a full out-of-order design. It handles a variety of optimizations such as renaming registers, unrolling loops, breaking false code dependencies, and removing unused computations.
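To make those four optimizations concrete, here is a source-level analogy written by hand in Python. It is not Nvidia's implementation: how Denver's optimizer actually performs these transformations is not described in the article, and the functions `before` and `after` are hypothetical names chosen for this sketch.

```python
def before(a, b):
    """Straightforward loop, with a reused temporary and a dead computation."""
    total = 0
    for i in range(len(a)):
        t = a[i] * 2          # t holds a doubled element
        unused = a[i] - b[i]  # computed but never read
        total += t
        t = b[i] * 3          # same name reused: a false dependency on t
        total += t
    return total


def after(a, b):
    """Same result with the four transformations applied by hand."""
    total = 0
    n = len(a)
    i = 0
    # Loop unrolled by two: half as many loop-control steps.
    while i + 1 < n:
        # Renamed temporaries: each product gets its own name, so the
        # false dependency on a single reused 't' is gone.
        t0 = a[i] * 2
        t1 = b[i] * 3
        t2 = a[i + 1] * 2
        t3 = b[i + 1] * 3
        # The unused subtraction has been removed entirely.
        total += t0 + t1 + t2 + t3
        i += 2
    # Leftover iteration when the length is odd.
    if i < n:
        total += a[i] * 2 + b[i] * 3
    return total


if __name__ == "__main__":
    xs, ys = [1, 2, 3], [4, 5, 6]
    print(before(xs, ys), after(xs, ys))
```

Both functions return the same value for any pair of equal-length lists. In hardware, the benefit of the second form is that the four independent products can be issued side by side on a wide in-order pipeline without out-of-order scheduling logic.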
The optimizer chains related routines and uses 128 Mbytes of main memory, securely partitioned before an operating system boots. "We see a 2x speed-up or better with optimized routines," said Darrell Boggs, chief architect on the project, speaking in a talk at the annual Hot Chips conference here.
The new core marks the end of Nvidia's use of a companion core, something it pioneered with its early 32-bit ARM SoCs. ARM continues to pursue the approach with mixed 32- and 64-bit cores.
Among other techniques, Denver can reuse memory pipelines for integer traffic, and it has a pre-fetch to compensate for cache misses.
Denver is a microcoded seven-wide superscalar 64-bit ARM.
Editor: Quentin