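This page collects ideas and information about the new Debian ARM port using the hard-float ABI, which will hopefully be released for the first time with Wheezy. For more about the other Debian ports to the ARM architecture, see ArmPorts.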

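Current status

ArmHardFloatTodo contains the current status: what works, what does not, bugs, translations, and so on.

ArmHardFloatChroot covers how to quickly set up a chroot of the armhf port.

This page covers the background of the port: how it works, how it was made, and so on.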

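Rationale

Many ARM boards and devices now ship with a floating-point unit (FPU), but the current Debian armel port makes little use of it.

A new ARM port that requires the CPU to have an FPU would help get the most performance out of hardware that has one.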

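Supported devices

The Debian armhf port currently requires a CPU that supports at least the ARMv7 instruction set, the Thumb-2 instruction set and the VFPv3-D16 floating-point instruction set.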
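As a rough illustration of the baseline above, the following sketch checks the text of /proc/cpuinfo for an ARMv7 CPU with VFPv3. It assumes the field names ("CPU architecture", "Features") and flag names ("vfpv3", "vfpv3d16", "thumb") that Linux kernels commonly report on ARM; the function name is hypothetical. The kernel does not list Thumb-2 as a separate flag, so "thumb" on an ARMv7 CPU is used as a proxy here, and some kernels are known to misreport the architecture level, so treat the result as a hint rather than a guarantee.

```python
import re


def meets_armhf_baseline(cpuinfo_text):
    """Guess whether a CPU meets the armhf baseline (ARMv7 + Thumb + VFPv3).

    cpuinfo_text is the content of /proc/cpuinfo. The field and flag
    names used below are assumptions about what the kernel reports.
    """
    arch = None
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line
        key = key.strip().lower()
        value = value.strip()
        if key == "cpu architecture":
            # Values look like "7" or "5TE"; keep the leading number only.
            match = re.match(r"\d+", value)
            if match:
                arch = int(match.group())
        elif key == "features":
            features.update(value.split())
    # Any VFPv3 variant (16 or 32 double registers) satisfies VFPv3-D16.
    has_vfpv3 = bool(features & {"vfpv3", "vfpv3d16"})
    return arch is not None and arch >= 7 and has_vfpv3 and "thumb" in features


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        print(meets_armhf_baseline(handle.read()))
```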

The ARM core and application list on Wikipedia may be a useful reference. It also points to several similar, more comprehensive lists.

Background information

This section provides some background on FPUs (floating-point units), the ARM EABI, the GCC floating-point ABIs, hwcaps, and so on.

VFP

Vector Floating Point (VFP) is an optional floating-point instruction set introduced with ARMv5. It is now the de facto standard on modern ARM cores; although it is possible to configure Cortex-A8, Cortex-A9 and Cortex-A5 with no VFP, almost all implementations do provide it. Prior to this there was no real set of floating-point instructions, with some vendors supplying their own coprocessors, most supplying none, and the original ARM FPU 'standard' implementation being almost entirely unused in the real world.

VFP was extended over time, with VFPv2 (some ARM9 / ARM11), VFPv3-D16 (e.g. Marvell Dove) and VFPv3+NEON (most Cortex-A8) present in current production silicon.

In spite of the name, the base VFP architecture is not well suited for vector operations. For practical purposes it is a normal scalar floating-point unit. The NEON extension defines vector instructions similar to SSE or AltiVec and shares a register file with the VFP unit.

NEON and VFP/VFP2/VFP3 remain an optional part of the architecture.

ARM EABI

The ARM EABI specification covers calling conventions across libraries and binaries. It defines two incompatible ABIs: one uses (VFP) floating point registers for passing function arguments, the other does not.

Unlike many other architectures, ARM supports use of FPU instructions while still conforming to the base ABI. This allows code to take advantage of the FPU without breaking compatibility with older libraries or applications. This does incur some overhead relative to a full hard-float system, and obviously requires a VFP capable CPU.

GCC floating-point options

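For historical reasons and to match the ARM RVCT kit, the GCC FPU and ABI selection options are not entirely orthogonal. The -mfloat-abi= option controls both the ABI and whether floating-point instructions may be used. The available values are soft (full software floating point), softfp (use the FPU, but keep the soft-float calling convention so the code remains compatible with soft-float code) and hard (full hardware floating point, with floating-point arguments passed in FPU registers).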

In addition, the -mfpu= option can be used to select a VFP/NEON (or FPA or Maverick) variant. This has no effect when -mfloat-abi=soft is specified.

The combination of -mfpu=vfp and -mfloat-abi=hard is not available in FSF GCC 4.4; see TODO section below for options.
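The way -mfloat-abi= ties the calling convention to the permission to use FPU instructions can be written down as a small table. This is only a sketch of the three values GCC accepts (soft, softfp, hard); the dictionary keys and function names are made up for illustration and are not part of GCC.

```python
# Each -mfloat-abi value fixes two things at once: the calling convention
# (are floating-point arguments passed in VFP registers?) and whether the
# compiler may emit FPU instructions at all.
FLOAT_ABI = {
    "soft":   {"fp_args_in_vfp_regs": False, "may_use_fpu": False},
    "softfp": {"fp_args_in_vfp_regs": False, "may_use_fpu": True},
    "hard":   {"fp_args_in_vfp_regs": True,  "may_use_fpu": True},
}


def link_compatible(a, b):
    """Objects can be mixed only if they agree on the calling convention."""
    return FLOAT_ABI[a]["fp_args_in_vfp_regs"] == FLOAT_ABI[b]["fp_args_in_vfp_regs"]


def may_need_vfp_cpu(value):
    """Code built with this value may contain FPU instructions."""
    return FLOAT_ABI[value]["may_use_fpu"]


if __name__ == "__main__":
    for a in FLOAT_ABI:
        for b in FLOAT_ABI:
            print(a, b, link_compatible(a, b))
```

Read this way, softfp is the middle ground: it shares a calling convention with soft, so the two can be linked together, but like hard it may need a VFP-capable CPU at run time.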

See the VfpComparison subpage for an in-depth discussion and some performance research.

ld.so hwcaps

The GCC -mfloat-abi=softfp flag allows use of VFP while remaining compatible with soft-float code. This allows selection of appropriate routines at runtime based on the availability of VFP hardware.

The runtime linker, ld.so, supports a mechanism for selecting runtime libraries based on features reported by the kernel. For instance, it's possible to provide two versions of libm, one in /lib and another one in /lib/vfp, and ld.so will select the /lib/vfp one on systems with VFP.

This mechanism is dubbed "hwcaps".

This only works when the binaries are compatible (use the same ABI); you can't select between hard-float and soft-float libs with hwcaps. See the VfpComparison subpage for more details.
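The lookup order described above can be sketched as follows. This is a deliberately simplified model, not glibc's actual algorithm: the function name, the `available` set standing in for the filesystem, and the default directory list are all assumptions made for illustration.

```python
import posixpath


def resolve_library(name, hwcaps, available, base_dirs=("/lib", "/usr/lib")):
    """Return the path a hwcaps-style lookup would pick, or None.

    name      -- library file name, e.g. "libm.so.6"
    hwcaps    -- capability names reported for this machine, e.g. ["vfp"]
    available -- set of paths that exist (a stand-in for the filesystem)
    """
    for base in base_dirs:
        # Capability-specific subdirectories are tried before the plain one.
        for cap in hwcaps:
            candidate = posixpath.join(base, cap, name)
            if candidate in available:
                return candidate
        candidate = posixpath.join(base, name)
        if candidate in available:
            return candidate
    return None


if __name__ == "__main__":
    files = {"/lib/libm.so.6", "/lib/vfp/libm.so.6"}
    print(resolve_library("libm.so.6", ["vfp"], files))
    print(resolve_library("libm.so.6", [], files))
```

Both copies of the library must use the same calling convention for this to be safe, which is why the mechanism cannot pick between soft-float and hard-float builds.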

http://wiki.debian.org/Multiarch/Spec is a great in-depth explanation of how ld.so and gcc and multilib paths interact if you want to understand this stuff.

Endianness, architecture level, CPU, VFP level

A new port would be little-endian as that is almost invariably used in recent ARM designs.

Since the new port would require VFP, it would limit which SoCs are supported by the new port.

The toolchain needs to be configured with a specific base CPU and base VFP version in mind.

It might make sense for such a new port -- which would essentially target newer hardware -- to target newer CPUs. For instance, it could target ARMv6 or ARMv7 SoCs, and VFPv2, VFPv3-D16 or NEON.

If targeting ARMv7, another option is to build for Thumb-2, which improves code size and (generally) speed.

Name of the port

The table below recaps which port names Debian/dpkg saw so far.

|| name || endianness || status ||
|| arm || little-endian || Original Debian arm port using original ABI ('OABI'), last release in Debian lenny; being retired in favor of armel ||
|| armel || little-endian || introduced in Debian lenny; EABI, actively maintained; targets armv4t; doesn't require an FPU ||
|| armeb || big-endian || unofficial OABI port; inactive and dead ||

The name of the new ARM port using the hard-float ABI is 'armhf' (for 'hard-float').

See Port naming debate notes below for info on why and how this name was chosen.

In practice armel will be used for older CPUs (armv4t, armv5, armv6), and armhf for newer CPUs (armv7+VFP).

Other ('flavoured') builds of the base ports for different CPU optimisations and hardware capabilities are possible, e.g. armel flavours for v6 and v7 CPUs without VFP, or an armhf flavour for VFP+NEON. The question of which (if any) other flavours will be maintained is currently under discussion, and is likely to be decided at the Debian ARM/Embedded Sprint in Feb 2011.

Triplet

GCC when built to target the GNU arm-linux-gnueabi triplet will support both the hard-float and soft-float calling conventions.

dpkg relies on the triplet to identify the port (gcc -dumpmachine output). Some other projects such as multiarch rely on having distinct triplets across all Debian architectures.

One option would be to use the vendor field in the triplet to have distinct triplets. For instance, the triplet could be arm-hardfloat-linux-gnueabi.

Using arm-none-linux-gnueabi, as the CodeSourcery compilers do, would be another option, but telling arm-linux-gnueabi apart from arm-none-linux-gnueabi is confusing; arm-hardfloat-linux-gnueabi is clearer and also distinguishes the new port from the CodeSourcery toolchains.

The final decision was to use a triplet of arm-linux-gnueabihf.
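The requirement that every port maps to its own triplet can be illustrated with a small lookup. The armhf triplet is the one chosen above; the armel entry (arm-linux-gnueabi) reflects my understanding of the existing port. The helper names are hypothetical and this is not how dpkg itself stores the mapping.

```python
# Debian port name -> GNU triplet (illustrative subset only).
PORT_TRIPLETS = {
    "armel": "arm-linux-gnueabi",
    "armhf": "arm-linux-gnueabihf",
}


def triplets_are_distinct(mapping):
    """dpkg and multiarch both need one unique triplet per port."""
    return len(set(mapping.values())) == len(mapping)


def port_for_triplet(triplet, mapping=PORT_TRIPLETS):
    """Reverse lookup, e.g. from `gcc -dumpmachine` output to a port name."""
    for port, known in mapping.items():
        if known == triplet:
            return port
    return None


def parse_triplet(triplet):
    """Split a triplet into fields; the vendor field is optional."""
    parts = triplet.split("-")
    if len(parts) == 3:
        cpu, kernel, abi = parts
        vendor = None
    elif len(parts) == 4:
        cpu, vendor, kernel, abi = parts
    else:
        raise ValueError("unexpected triplet: %r" % triplet)
    return {"cpu": cpu, "vendor": vendor, "kernel": kernel, "abi": abi}


if __name__ == "__main__":
    print(triplets_are_distinct(PORT_TRIPLETS))
    print(port_for_triplet("arm-linux-gnueabihf"))
    print(parse_triplet("arm-hardfloat-linux-gnueabi"))
```

The vendor-field proposal and the final choice differ only in which field carries the distinction: the former changes the vendor, the latter changes the ABI suffix.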

Performance improvements and benchmarks

Genesi USA, Inc. did a proof-of-concept rebuild of Ubuntu karmic (9.10)'s armel port with the hard-float ABI. They noticed important wins (on the order of a 40% performance improvement) in floating-point heavy applications and libraries such as mesa, on a Cortex-A8 CPU.

It's likely that the performance benefits are much larger on Cortex-A8 CPUs than on Cortex-A9 CPUs which have a faster VFP design and more conventional pipeline.

NEON

NEON is an extension of the VFP which allows for very efficient manipulation of matrices, and vector data in general. This is notably useful for processing audio and video data, or for fast memcpy().

Programs usually take advantage of NEON thanks to hand-crafted assembly routines. GCC can automatically vectorize code and generate NEON instructions, however this tends to have limited success. It would seem sensible NOT to require NEON in a new port since some modern ARMv7 SoCs such as Marvell Dove and NVidia Tegra2 don't implement it.

It is also possible to use NEON instructions for regular scalar floating point code, and this can give significant (2-3x) speedup on Cortex-A8 hardware. However GCC does not currently implement this, and it is not always applicable as NEON instructions are not fully IEEE compliant.

See the VfpComparison subpage for an in-depth discussion and some performance research.

Hardware

Genesi-USA would be happy to continue sharing the 9 EfikaMX (Freescale i.MX51) buildds used for their proof-of-concept to help get a new port started.

Genesi-USA is also giving hardware (10 EfikaMX T03) to main Debian sub-project leads for Education, Embedded, Live systems, Ubuntu developers, and Linaro developers.

Genesi-USA is also giving away old hardware (EfikaMX T02) which could be used to help out buildds, set up porterboxes, or be passed on to interested developers who would work on the new port. Stock is limited; if you are interested, register on the PowerDeveloper site and then contact Hector Oron <zumbi@debian.org>.

Task List

Partial reference of SoC and supported ISAs

|| Manufacturer || SoC || architecture || VFP || SIMD || Notes ||
|| Freescale || iMX5x || armv7 || VFPv3 || NEON || Cortex-A8; NEON only reliable in Tape-Out 3 or above ||
|| Nvidia || Tegra2 || armv7 || VFPv3 D16 || none || ||
|| Marvell || Dove || armv7 || VFPv3 D16 || iwMMXt || ||
|| Texas Instruments || OMAP3xxx || armv7 || VFPv3 || NEON || Cortex-A8 ||
|| Texas Instruments || OMAP4xxx || armv7 || VFPv3 || NEON || Cortex-A9 ||
|| Texas Instruments || OMAP5xxx || armv7 || VFPv4 || NEON || Cortex-A15 (ARMv7-A) + Cortex-M4 (ARMv7-ME) ||
|| Qualcomm || Snapdragon || armv7 || VFPv3 || NEON[1] || Qualcomm "Scorpion" core ||
|| Samsung || S5PC100 || armv7 || VFPv3 || NEON || Cortex-A8 ||

A larger list is available at http://en.wikipedia.org/wiki/ARM_architecture

TI OMAP specific information is at http://en.wikipedia.org/wiki/Texas_Instruments_OMAP
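The SoC reference above can also be treated as data, which makes it easy to see why NEON is not part of the baseline. The sketch below copies the table rows by hand (notes omitted); the record type and function names are hypothetical.

```python
from collections import namedtuple

SoC = namedtuple("SoC", "manufacturer name arch vfp simd")

# Rows copied from the partial SoC reference table on this page.
SOCS = [
    SoC("Freescale", "iMX5x", "armv7", "VFPv3", "NEON"),
    SoC("Nvidia", "Tegra2", "armv7", "VFPv3 D16", "none"),
    SoC("Marvell", "Dove", "armv7", "VFPv3 D16", "iwMMXt"),
    SoC("Texas Instruments", "OMAP3xxx", "armv7", "VFPv3", "NEON"),
    SoC("Texas Instruments", "OMAP4xxx", "armv7", "VFPv3", "NEON"),
    SoC("Texas Instruments", "OMAP5xxx", "armv7", "VFPv4", "NEON"),
    SoC("Qualcomm", "Snapdragon", "armv7", "VFPv3", "NEON"),
    SoC("Samsung", "S5PC100", "armv7", "VFPv3", "NEON"),
]


def socs_without_neon(socs):
    """Names of SoCs that a NEON-only port would exclude."""
    return [soc.name for soc in socs if soc.simd != "NEON"]


def meets_baseline(soc):
    """armv7 with VFPv3-D16 or better (any VFPv3 or VFPv4 variant)."""
    return soc.arch == "armv7" and soc.vfp.startswith(("VFPv3", "VFPv4"))


if __name__ == "__main__":
    print(socs_without_neon(SOCS))
    print(all(meets_baseline(soc) for soc in SOCS))
```

Every listed SoC passes the armv7 + VFPv3-D16 check, while requiring NEON would drop the two entries returned by socs_without_neon.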

Port naming debate notes

Port naming is a classic bikeshed issue. It is also important as decisions have long-term effects, and the issues are quite complex. The ARM architecture has more variation than most and thus presents particular naming challenges. So we had quite a long discussion about the port name (and the need for a port at all).

You can read the full naming debate threads here:

http://lists.debian.org/debian-arm/2010/07/msg00019.html "armelfp: new architecture name for an armel variant"

http://lists.debian.org/debian-arm/2010/07/msg00100.html "cortex / arm-hardfloat-linux-gnueabi (was Re: armelfp: new architecture name for an armel variant)"

(that'll keep you entertained for a couple of hours at least :-)

Here is a very brief summary of some major points raised.

'cortex' was suggested by Genesi's Matt Sealey as it more-or-less defines the armv7+vfp3-d16+Thumb-2 set of options the port is targeting, but the idea of using a marketing name for the arch was not popular.

Other suggestions generally were about encoding more things in the port name: base arm architecture flavour, endianness, arm 'profile' (A/M/R as in Cortex-A8, Cortex-M3). So we had "armelfp" or "armelhp" and "armelvfp" and "arm7hf" and "armel7hf". This was followed by suggestions to shorten names along these lines: losing the "e" for EABI and the "l" for little endian doesn't seem like that big a loss, as by far the most usual operation of these processor cores is under a little-endian EABI environment, and OABI is not even considered supported anymore for these chips.

The eventual conclusion was that port names in Debian should encode incompatible ABIs, not compatible variations within an ABI (such as CPU optimisations, referred to as 'flavours'). The default flavour for a port can change over time as older CPUs become obsolete. (e.g. the i386 architecture has been built for 386, 486 and 586 flavours over time). Rebuilds of a port for a new flavour within the ABI are possible to gain performance improvements, but Debian itself normally provides builds to the lowest common denominator still in widespread use, maximising generality. Thus attempts to encode all the possible flavour options in the port name were unnecessary and produced long and awkward names. A better solution to the problem of recording the flavour to which a port or package is built is suitable package metadata.

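Compilers

The CodeSourcery and Linaro GCCs are almost identical. Genesi currently recommends CodeSourcery 2010q1: although it is GCC 4.4.1, it is functionally the same as GCC 4.5 for this port. There is some unease about not using mainline FSF GCC; I (Matt Sealey) understand the policy of not using custom patches until they have reached mainline and can be included in Debian. It could be argued that CodeSourcery contributed most of the ARM and PPC floating-point work in GCC 4.5 (mainly from the SourceryG++ tree). It is still under the GPL.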

Minimum CPU & FPU

Summary of Benefits

Using armv7 as a base

[1] http://www.insidedsp.com/Articles/tabid/64/articleType/ArticleView/articleId/238/Qualcomm-Reveals-Details-on-Scorpion-Core.aspx