This page gathers thoughts and ideas about an ARM hard-float port for Debian.

== Current status ==

ArmHardFloatTodo has all the current status information: what works and what does not, bugs, translations, and so on.

ArmHardFloatChroot has quick instructions for setting up an armhf chroot.

This page covers the background of the port: how and why it came about.

== Rationale ==

Many modern ARM boards and devices ship with a floating-point unit (FPU), but the current Debian armel port makes little use of it. A new ARM port that assumes an FPU is present would help get the most performance out of such hardware.

== Supported devices ==

The Debian armhf port currently requires at least an ARMv7 CPU with Thumb-2 and VFPv3-D16.

[[http://en.wikipedia.org/wiki/ARM_architecture#ARM_cores|Wikipedia's list of ARM cores and applications]] can be a useful reference. That page also has other, more complete references.

== Background information ==

This section provides some background information on FPUs, the ARM EABI, the GCC floating-point ABI, hwcaps and so on.

=== VFP ===

With ARMv5 an optional floating point instruction set known as Vector Floating Point ([[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]]) was introduced. This is now effectively the norm for modern ARM implementations; although it is possible to configure [[http://www.arm.com/products/processors/cortex-a/cortex-a8.php|Cortex-A8]], [[http://www.arm.com/products/processors/cortex-a/cortex-a9.php|Cortex-A9]] and [[http://www.arm.com/products/processors/cortex-a/cortex-a5.php|Cortex-A5]] with no [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]], almost all implementations do provide it.
Prior to this there was no real standardized set of floating point instructions, with some vendors supplying their own coprocessors, most supplying none, and the original ARM FPU 'standard' implementation being almost entirely unused in the real world.

[[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]] was extended over time, with VFPv2 (some [[http://www.arm.com/products/processors/classic/arm9/index.php|ARM9]] / [[http://www.arm.com/products/processors/classic/arm11/index.php|ARM11]]), VFPv3-D16 (e.g. Marvell Dove) and VFPv3+NEON (most [[http://www.arm.com/products/processors/cortex-a/cortex-a8.php|Cortex-A8]]) present in current production silicon.

In spite of the name, the base [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]] architecture is not well suited for vector operations. For practical purposes it is a normal scalar floating point unit. The [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] extension defines vector instructions similar to SSE or AltiVec and shares a register file with the [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]] unit.

[[http://www.arm.com/products/processors/technologies/neon.php|NEON]] and VFP/VFP2/VFP3 remain an optional part of the architecture.

=== ARM EABI ===

The ARM EABI specification covers calling conventions across libraries and binaries. It defines two incompatible ABIs: one uses ([[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]]) floating point registers for passing function arguments, [[ArmEabiPort|the other]] does not.

Unlike many other architectures, ARM supports use of FPU instructions while still conforming to the base ABI. This allows code to take advantage of the FPU without breaking compatibility with older libraries or applications.
This does incur some overhead relative to a full hard-float system, and obviously requires a [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]]-capable CPU.

=== GCC floating-point options ===

For historical reasons and to match the ARM RVCT kit, the GCC FPU and ABI selection options are not entirely orthogonal. The `-mfloat-abi=` option controls both the ABI and whether floating point instructions may be used. The available options are:

 * `soft`: Full software floating point.
 * `softfp`: Use the FPU, but remain compatible with soft-float code.
 * `hard`: Full hardware floating point.

In addition, the `-mfpu=` option can be used to select a VFP/NEON (or FPA or Maverick) variant. This has no effect when `-mfloat-abi=soft` is specified.

The combination of `-mfpu=vfp` and `-mfloat-abi=hard` is not available in FSF GCC 4.4; see the Task List section below for options.

See /VfpComparison for an in-depth discussion and some performance research.

=== ld.so hwcaps ===

The GCC `-mfloat-abi=softfp` flag allows use of [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]] while remaining compatible with soft-float code. This allows selection of appropriate routines at runtime based on the availability of [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]] hardware.

The runtime linker, `ld.so`, supports a mechanism for selecting runtime libraries based on features reported by the kernel. For instance, it's possible to provide two versions of libm, one in `/lib` and another one in `/lib/vfp`, and `ld.so` will select the `/lib/vfp` one on systems with [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]]. This mechanism is dubbed "hwcaps".

This only works when the binaries are compatible (use the same ABI); you can't select between hard-float and soft-float libs with hwcaps.

See /VfpComparison for more details.
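
To get a feel for the features the kernel reports (the same kind of information the hwcaps mechanism above relies on), the following is a minimal sketch that reads the `Features` line of `/proc/cpuinfo`-style text. It is not how `ld.so` works internally. The function names are made up for illustration, and the flag names (`vfpv3`, `vfpv3d16`, `vfpv4`, `neon`) are assumed to match what ARM Linux kernels usually print; check the output of your own kernel. The sketch does not check for Thumb-2.

```python
def cpu_features(cpuinfo_text):
    """Collect the flags from every 'Features' line of cpuinfo-style text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Lines without a colon (blank lines, separators) are skipped.
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def has_armhf_fpu(features):
    """True if the flags advertise VFPv3 or later (assumed flag names)."""
    return bool(features & {"vfpv3", "vfpv3d16", "vfpv4"})


def has_neon(features):
    """True if the flags advertise NEON (assumed flag name)."""
    return "neon" in features


if __name__ == "__main__":
    # Sample text shaped like a Cortex-A8 /proc/cpuinfo; illustrative only.
    sample = (
        "Processor\t: ARMv7 Processor rev 2 (v7l)\n"
        "Features\t: swp half thumb fastmult vfp edsp neon vfpv3\n"
        "CPU architecture: 7\n"
    )
    flags = cpu_features(sample)
    print("VFPv3 present:", has_armhf_fpu(flags))
    print("NEON present:", has_neon(flags))
```

On a real board, pass the contents of `/proc/cpuinfo` instead of the sample string. A CPU that reports only `vfp` (VFPv2, as on ARM11) would fail the `has_armhf_fpu` check and stay on armel.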
http://wiki.debian.org/Multiarch/Spec is a great in-depth explanation of how ld.so, gcc and multilib paths interact, if you want to understand this stuff.

== Endianness, architecture level, CPU, VFP level ==

A new port would be little-endian as that is almost invariably used in recent ARM designs.

Since the new port would require [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]], it would limit which SoCs are supported by the new port.

The toolchain needs to be configured with a specific base CPU and base [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]] version in mind.

It might make sense for such a new port, which would essentially target newer hardware, to target newer CPUs. For instance, it could target ARMv6 or ARMv7 SoCs, and VFPv2, VFPv3-D16 or [[http://www.arm.com/products/processors/technologies/neon.php|NEON]].

If targeting ARMv7, another option is to build for Thumb-2, which provides both code-size and (generally) speed improvements.

== Name of the port ==

The table below recaps which port names Debian/dpkg has seen so far.

|| '''name''' || '''endianness''' || '''status''' ||
|| `arm` || little-endian || Original Debian arm port using original ABI ('OABI'), last release in Debian lenny; being retired in favor of `armel` ||
|| `armel` || little-endian || introduced in Debian lenny; EABI, actively maintained; targets `armv4t`; doesn't require an FPU ||
|| `armeb` || big-endian || unofficial OABI port; inactive and dead ||

The name of the new ARM port using the hard-float ABI is 'armhf' (for 'hard-float'). See Port naming debate notes below for info on why and how this name was chosen.

In practice armel will be used for older CPUs (armv4t, armv5, armv6), and armhf for newer CPUs (armv7+VFP).

Other ('flavoured') builds of the base ports for different CPU optimisations and hardware capabilities are possible.
The question of which (if any) other flavours will be maintained is currently under discussion, and is likely to be decided at the [[Sprints/2011/EmdebianSprint|Debian ARM/Embedded Sprint]] in Feb 2011. Examples would be armel flavours for v6 and v7 CPUs without [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]], or an armhf flavour for VFP+NEON.

== Triplet ==

GCC, when built to target the GNU `arm-linux-gnueabi` triplet, will support both the hard-float and soft-float calling conventions.

dpkg relies on the triplet to identify the port (`gcc -dumpmachine` output). Some other projects such as multiarch rely on having distinct triplets across all Debian architectures.

For the new Debian port, it is possible to use the vendor field in the triplet to have distinct triplets. For instance, the triplet could be `arm-hardfloat-linux-gnueabi`. `arm-none-linux-gnueabi`, as used by CodeSourcery compilers, would be an option, but `arm-linux-gnueabi` and `arm-none-linux-gnueabi` are easy to confuse; `arm-hardfloat-linux-gnueabi` is clearer and also distinguishes the new port from CodeSourcery builds.

== Performance improvements and benchmarks ==

Genesi USA, Inc. did a proof-of-concept rebuild of Ubuntu karmic (9.10)'s armel port with the hard-float ABI. They noticed [[http://www.powerdeveloper.org/forums/viewtopic.php?p=13609|important wins]] (on the order of 40% performance improvement) in floating-point heavy applications/libraries such as mesa, with a [[http://www.arm.com/products/processors/cortex-a/cortex-a8.php|Cortex-A8]] CPU.

It's likely that the performance benefits are much larger on [[http://www.arm.com/products/processors/cortex-a/cortex-a8.php|Cortex-A8]] CPUs than on [[http://www.arm.com/products/processors/cortex-a/cortex-a9.php|Cortex-A9]] CPUs, which have a faster [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]] design and a more conventional pipeline.
== NEON ==

[[http://www.arm.com/products/processors/technologies/neon.php|NEON]] is an extension of the [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFP]] which allows for very efficient manipulation of matrices, and vector data in general. This is notably useful for processing audio and video data, or for fast `memcpy()`.

Programs usually take advantage of [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] thanks to hand-crafted assembly routines. GCC can automatically vectorize code and generate [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] instructions; however, this tends to have limited success.

It would seem sensible NOT to require [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] in a new port since some modern ARMv7 SoCs such as Marvell Dove and NVidia Tegra2 don't implement it.

It is also possible to use [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] instructions for regular scalar floating point code, and this can give significant (2-3x) speedup on [[http://www.arm.com/products/processors/cortex-a/cortex-a8.php|Cortex-A8]] hardware. However, GCC does not currently implement this, and it is not always applicable as [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] instructions are not fully IEEE compliant.

See /VfpComparison for an in-depth discussion and some performance research.

== Hardware ==

Genesi-USA would be happy to continue sharing the 9 EfikaMX (Freescale i.MX51) buildds used for their proof-of-concept to help get a new port started.

Genesi-USA is also giving hardware (10 EfikaMX T03) to main Debian sub-project leads for Education, Embedded, Live systems, Ubuntu developers, and Linaro developers.

Genesi-USA is also giving old hardware (EfikaMX T02) which could be used to help out buildds, set up porterboxes or give away to interested developers who would work on the new port.
While stock is limited, if you are interested, register on the [[http://powerdeveloper.org|PowerDeveloper]] site and then contact Hector Oron.

== Task List ==

 * Choose a port name (DONE)
 * Decide on a triplet (DONE)
 * Get these into dpkg (PENDING)
 * Get a compiler in shape for this port (DONE)
  * GCC 4.5 supports the hard-float ABI, but 4.4 does not
   * the new port could either have backported support in `gcc-4.4`, or use `gcc-4.5` from the start
   * or use a different code base such as CodeSourcery SourceryG++ (~4.4.1) or Linaro GCC (~4.4.1 -> 4.5)
    * These two compilers are essentially FSF GCC with the fancier features from future GCC releases backported
 * Start bootstrapping (DONE)
 * Fix / port packages (IN PROGRESS)
  * libffi needs porting

== Partial reference of SoC and supported ISAs ==

|| '''Manufacturer''' || '''SoC''' || '''architecture''' || '''VFP''' || '''SIMD''' || '''Notes''' ||
|| Freescale || iMX5x || `armv7` || [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFPv3]] || [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] || [[http://www.arm.com/products/processors/cortex-a/cortex-a8.php|Cortex-A8]]; [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] only reliable in Tape-Out 3 or above ||
|| Nvidia || Tegra2 || `armv7` || [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFPv3 D16]] || none || ||
|| Marvell || Dove || `armv7` || [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFPv3 D16]] || iwMMXt || ||
|| [[http://focus.ti.com/general/docs/wtbu/wtbuproductcontent.tsp?templateId=6123&navigationId=11989&contentId=4682|Texas Instruments]] || OMAP3xxx || `armv7` || [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFPv3]] || [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] || [[http://www.arm.com/products/processors/cortex-a/cortex-a8.php|Cortex-A8]] ||
|| [[http://focus.ti.com/general/docs/wtbu/wtbuproductcontent.tsp?templateId=6123&navigationId=12842&contentId=53247|Texas Instruments]] || OMAP4xxx || `armv7` || [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFPv3]] || [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] || [[http://www.arm.com/products/processors/cortex-a/cortex-a9.php|Cortex-A9]] ||
|| [[http://www.ti.com/ww/en/omap/omap5/omap5-platform.html|Texas Instruments]] || OMAP5xxx || `armv7` || VFPv4 || [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] || [[http://www.arm.com/products/processors/cortex-a/cortex-a15.php|Cortex-A15]] (ARMv7-A) + [[http://www.arm.com/products/processors/cortex-m/cortex-m4-processor.php|Cortex-M4]] (ARMv7-ME) ||
|| Qualcomm || Snapdragon || `armv7` || [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFPv3]] || [[http://www.arm.com/products/processors/technologies/neon.php|NEON]][1] || Qualcomm "Scorpion" core ||
|| [[http://www.samsung.com/global/business/semiconductor/productInfo.do?fmly_id=834&partnum=S5PC100&xFmly_id=229|Samsung]] || S5PC100 || `armv7` || [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|VFPv3]] || [[http://www.arm.com/products/processors/technologies/neon.php|NEON]] || [[http://www.arm.com/products/processors/cortex-a/cortex-a8.php|Cortex-A8]] ||

Larger list at http://en.wikipedia.org/wiki/ARM_architecture

TI OMAP specific info at http://en.wikipedia.org/wiki/Texas_Instruments_OMAP

== Port naming debate notes ==

Port naming is a classic bikeshed issue. It is also important, as decisions have long-term effects and the issues are quite complex. The ARM architecture has more variation than most and thus presents particular naming challenges. So we had quite a long discussion about the port name (and the need for a port at all).
You can read the full naming debate threads here:

 * http://lists.debian.org/debian-arm/2010/07/msg00019.html "armelfp: new architecture name for an armel variant"
 * http://lists.debian.org/debian-arm/2010/07/msg00100.html "cortex / arm-hardfloat-linux-gnueabi (was Re: armelfp: new architecture name for an armel variant)"

(that'll keep you entertained for a couple of hours at least :-)

Here is a very brief summary of some major points raised.

'cortex' was suggested by Genesi's Matt Sealey as it more-or-less defines the armv7+vfpv3-d16+Thumb-2 set of options the port is targeting, but the idea of using a marketing name for the arch was not popular.

Other suggestions generally were about encoding more things in the port name: base arm architecture flavour, endianness, arm 'profile' (A/M/R as in [[http://www.arm.com/products/processors/cortex-a/cortex-a8.php|Cortex-A8]], Cortex-M3). So we had "armelfp" or "armelhp" and "armelvfp" and "arm7hf" and "armel7hf".

This was followed by suggestions to shorten names along the lines of:

 Losing the "e" for EABI and "l" for little endian doesn't seem like that big a loss, as by far the most usual operation of these processor cores is under a little-endian EABI environment, and OABI is not even considered supported anymore for these chips.

The eventual conclusion was that port names in Debian should encode incompatible ABIs, not compatible variations within an ABI (such as CPU optimisations, referred to as 'flavours'). The default flavour for a port can change over time as older CPUs become obsolete (e.g. the i386 architecture has been built for 386, 486 and 586 flavours over time). Rebuilds of a port for a new flavour within the ABI are possible to gain performance improvements, but Debian itself normally provides builds to the lowest common denominator still in widespread use, maximising generality.

Thus attempts to encode all the possible flavour options in the port name were unnecessary and produced long and awkward names.
A better solution to the problem of recording the flavour to which a port or package is built is suitable package metadata.

=== Compiler ===

CodeSourcery and Linaro gcc are almost the same thing. Genesi recommend CodeSourcery 2010q1 for now: it is gcc 4.4.1, but it is functionally equivalent to gcc 4.5 for the purpose of a port.

There are qualms about not using mainline FSF GCC: I (Matt Sealey) understand that Debian policy allows no custom patches unless they are going to be mainlined in the future. It can be argued that CodeSourcery contributes most of the ARM and PPC support in GCC and is the architect of hardfp support in gcc 4.5 (published and maintained in the SourceryG++ tree). It's still GPL.

=== Minimum CPU & FPU ===

 * The lowest worthwhile CPU implementation is ARMv7-A (therefore the recommended build option is `-march=armv7-a`)
 * The FPU should be set at VFPv3-D16, as it represents the minimum specification of the processors to support here (therefore the recommended build option is `-mfpu=vfpv3-d16`)
  * Building for VFPv3-D16 instead of VFPv3[-D32] only loses the use of 16 FP registers, which is not a great loss
 * Thumb-2/ThumbEE: Thumb-2 provides code size improvements and, unlike Thumb-1, has no interworking overhead except in a few corner cases. Thumb-2 is also sufficiently complete that there is no need to fall back to the ARM ISA for some operations (unlike Thumb-1). Thus defaulting to Thumb-2 on v7 or later makes sense.
 * There is some concern about fast (600MHz+) ARMv6 + VFPv2 processors such as the i.MX37, which will not be supported by the default armhf flavour, but we will have to live with that

=== Summary of Benefits ===

Using armv7 as a base

[1] http://www.insidedsp.com/Articles/tabid/64/articleType/ArticleView/articleId/238/Qualcomm-Reveals-Details-on-Scorpion-Core.aspx

----
CategoryPorts