Please contact the NGA team (firstname.lastname@example.org) regarding any updates to your source code.
This paper proposes an open-source hardware Posit Arithmetic Core Generator (PACoGen) for the recently developed posit format of the universal number (unum) system, along with a set of pipelined architectures. The posit number system has a run-time varying exponent component, defined by the combination of a variable-length "regime" field and an "exponent" field of at most ES bits (the exponent size). This in turn makes the fraction part vary in size and position at run time. These run-time variations pose an interesting hardware design challenge for posit arithmetic architectures. The posit number system, being at an infant stage of its development, has very limited hardware support for its arithmetic. In this view, this paper targets algorithmic development and generic HDL generators (PACoGen) for basic posit arithmetic. The proposed open-source PACoGen currently includes adder/subtractor, multiplier, and division arithmetic. PACoGen can generate the Verilog HDL of the respective posit arithmetic for any given posit word width (N) and exponent size (ES), as defined under the posit number system. Further, pipelined architectures for 32-bit posits with a 6-bit exponent size are proposed and discussed for addition/subtraction, multiplication, and division arithmetic. The proposed posit arithmetic architectures are demonstrated on a Virtex-7 (xc7vx330t-3ffg1157) FPGA device as well as the Nangate 15 nm ASIC platform.
The PACoGen would open a gateway for further posit arithmetic hardware exploration and evaluation.
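For readers unfamiliar with the encoding PACoGen targets, the sketch below decodes a posit bit pattern into its sign, regime, exponent, and fraction fields. It is only an illustration of the standard posit layout in Python, with hypothetical names; it is not PACoGen's Verilog output.

```python
# Minimal, illustrative posit decoder (not PACoGen's hardware), assuming the
# standard layout: sign, variable-length regime, up-to-ES exponent bits, and
# an implicit-1 fraction.
from fractions import Fraction

def decode_posit(bits: int, n: int, es: int) -> Fraction:
    """Decode an n-bit posit (given as an unsigned integer) with exponent size es."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)                       # all zeros encode 0
    if bits == 1 << (n - 1):
        raise ValueError("exception value")      # 100...0 is the exception pattern
    sign = -1 if bits >> (n - 1) else 1
    if sign < 0:
        bits = (-bits) & mask                    # two's complement for negatives
    body = bits & ((1 << (n - 1)) - 1)           # drop the sign bit; n-1 bits remain
    # Regime: a run of identical bits terminated by the opposite bit.
    pos = n - 2                                  # index of the first regime bit
    first = (body >> pos) & 1
    run = 0
    while pos >= 0 and ((body >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first == 1 else -run
    pos -= 1                                     # skip the terminating regime bit
    # Exponent: up to es bits (fewer if the regime consumed the word).
    exp = 0
    e_bits = min(es, pos + 1)
    if e_bits > 0:
        exp = (body >> (pos + 1 - e_bits)) & ((1 << e_bits) - 1)
        exp <<= es - e_bits                      # missing low exponent bits are zeros
        pos -= e_bits
    # Fraction: whatever bits remain, with an implicit leading 1.
    f_bits = pos + 1
    frac = Fraction(body & ((1 << f_bits) - 1), 1 << f_bits) if f_bits > 0 else Fraction(0)
    scale = Fraction(2) ** (k * (1 << es) + exp)
    return sign * (1 + frac) * scale

# Example: the 8-bit, es=1 posit 0b01101100 (regime 110, exponent 1, fraction 0.5)
# decodes to 1.5 * useed^1 * 2^1 = 12 with useed = 2^(2^1) = 4.
print(decode_posit(0b01101100, 8, 1))
```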
Strassen's recursive algorithm for matrix-matrix multiplication has seen slow adoption in practical applications despite being asymptotically faster than the traditional algorithm.
A primary cause is the comparatively weaker numerical stability of its results. Techniques that aim to reduce Strassen's errors risk losing any potential performance gain.
Moreover, current methods of evaluating such techniques for safety are overly pessimistic or error prone and generally do not allow for quick and accurate comparisons.
In this paper we present an efficient technique to obtain rigorous error bounds for floating point computations based on an implementation of unum arithmetic.
Using it, we evaluate three techniques - exact dot product, fused multiply-add, and matrix quadrant rotation - that can potentially improve the numerical stability of Strassen's algorithm for practical use. We also propose a novel error-based heuristic rotation scheme for matrix quadrant rotation.
Finally we apply techniques that improve numerical safety with low overhead to a LINPACK linear solver to demonstrate the usefulness of the Strassen algorithm in practice.
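As a reference point for the discussion above, the following is a minimal sketch of Strassen's seven-multiplication recursion in plain Python (power-of-two sizes only). It illustrates the base algorithm, not the paper's implementation, and includes none of the stability techniques the paper evaluates.

```python
# Strassen's recursion: 7 half-size multiplications instead of 8.
def add(A, B): return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
def sub(A, B): return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split both operands into quadrants.
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    # The seven recursive products.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Recombine the result quadrants.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    return [C11[i] + C12[i] for i in range(h)] + [C21[i] + C22[i] for i in range(h)]

# Sanity check: equals the conventional product [[19, 22], [43, 50]].
print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```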
Names of Working Group Members can be found in the "about us" section
It is still a work-in-progress document; it currently lacks language support, debugger support, and human-readable formatting standards.
A new version will be posted as soon as it is available.
Published in the 2018 IEEE 36th International Conference on Computer Design.
This is a proposed RISC-V extension for 32-bit posits suggested by John Gustafson [not official].
Presentation slides and video recording from the inaugural Conference for Next Generation Arithmetic (CoNGA) held on 28 March 2018 at Resorts World Sentosa, Singapore.
Links are found under each speaker's time slot
To overcome the limitations of conventional floating-point number formats, an interval arithmetic and variable-width storage format called the universal number (unum) has recently been introduced. This paper presents the first (to the best of our knowledge) silicon implementation measurements of an application-specific integrated circuit (ASIC) for unum floating-point arithmetic. The designed chip includes a 128-bit wide unum arithmetic unit to execute additions and subtractions, along with lossless (for intermediate results) and lossy (for external data movements) compression units to exploit the memory-usage reduction potential of the unum format. Our chip, fabricated in a 65 nm CMOS process, achieves a maximum clock frequency of 413 MHz at 1.2 V with an average measured power of 210 µW/MHz.
An exclusive release of John Gustafson's active mathematical notebook on his latest Type III Unums (posits).
In the land of computer arithmetic, a tyrant has ruled since its very beginning: the floating point number. Under its rule we have all endured countless hardships and cruelties. To this very day the floating point number still denies that 0.1 + 0.2 == 0.3 and returns insidious infinities to software developers everywhere. But a new hero has entered the fray: the universal number (unum). Can it topple the float number system and its long reign? This talk will introduce unums, explain their benefits over floating point numbers, and examine multiple real-world examples comparing the two. For those not familiar with floating point numbers and their pitfalls, this talk also includes a primer on the topic. Code examples are in Rust, though strong knowledge of the language is not needed.
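The rounding behaviour the talk opens with is easy to reproduce with any IEEE 754 doubles; the talk's own examples are in Rust, but a quick Python session shows the same thing:

```python
# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly, so the
# comparison fails, and overflow silently produces infinity.
print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2)          # 0.30000000000000004
print(1e308 * 10)         # inf
```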
A discussion on posit computing with regards to John Gustafson's keynote at the HPC Advisory Council Australia Conference.
Supercomputing Asia magazine Issue 02, July 2017, featuring John Gustafson on "How posits are revolutionizing computing".
Dr. Gustafson was invited to Perth in July 2017 to give a keynote on unums. In particular, he presents posits, the latest unum type, and how they beat floats.
This work discusses the implementation of UNUM arithmetic and reports hardware implementation results of some of the UNUM operators.
This page contains a video in which Yonemoto uses a "quire" Julia package; the quire is a data structure intended to be implemented in binary hardware. In the video he demonstrates how to compute an exact solution to a system of linear equations.
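As a rough software analogue of what a quire provides, the hypothetical sketch below accumulates a dot product exactly and rounds only once at the end. It uses Python's exact rationals purely for illustration; it is not Yonemoto's Julia package.

```python
# Quire-style idea: keep every product and partial sum exact, round once.
from fractions import Fraction

def exact_dot(xs, ys):
    acc = Fraction(0)                        # plays the role of the wide quire register
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)     # exact product and exact accumulation
    return float(acc)                        # a single rounding at the very end

xs = [1e16, 1.0, -1e16]
ys = [1.0,  1.0,  1.0]
print(sum(x * y for x, y in zip(xs, ys)))    # 0.0: the 1.0 is lost to double rounding
print(exact_dot(xs, ys))                     # 1.0: the exact answer
```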
A new data type called a "posit" is designed for direct drop-in replacement for IEEE Standard 754 floats. Unlike unum arithmetic, posits do not require interval-type mathematics or variable size operands, and they round if an answer is inexact, much the way floats do. However, they provide compelling advantages over floats, including simpler hardware implementation that scales from as few as two-bit operands to thousands of bits. For any bit width, they have a larger dynamic range, higher accuracy, better closure under arithmetic operations, and simpler exception-handling. For example, posits never overflow to infinity or underflow to zero, and there is no "Not-a-Number" (NaN) value. Posits should take up less space to implement in silicon than an IEEE float of the same size. With fewer gate delays per operation as well as lower silicon footprint, the posit operations per second (POPS) supported by a chip can be significantly higher than the FLOPs using similar hardware resources. GPU accelerators, in particular, could do more arithmetic per watt and per dollar yet deliver superior answer quality.
A series of comprehensive benchmarks compares how many decimals of accuracy can be produced for a set number of bits-per-value, using various number formats. Low-precision posits provide a better solution than "approximate computing" methods that try to tolerate decreases in answer quality. High-precision posits provide better answers (more correct decimals) than floats of the same size, suggesting that in some cases, a 32-bit posit may do a better job than a 64-bit float. In other words, posits beat floats at their own game.
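To put rough numbers on the dynamic-range claim, the short sketch below evaluates the usual posit quantities useed = 2^(2^ES), maxpos = useed^(N-2), and minpos = 1/maxpos. The (N, ES) pairs chosen here are illustrative assumptions, not taken from the abstract.

```python
# Back-of-the-envelope posit ranges, assuming the standard definitions
# useed = 2^(2^ES), maxpos = useed^(N-2), minpos = 1/maxpos.
def posit_range(n, es):
    useed = 2 ** (2 ** es)
    maxpos = useed ** (n - 2)
    return useed, float(maxpos), 1.0 / maxpos

for n, es in [(8, 0), (16, 1), (32, 2)]:        # example configurations only
    useed, maxpos, minpos = posit_range(n, es)
    print(f"posit<{n},{es}>: useed={useed}, maxpos={maxpos:.3g}, minpos={minpos:.3g}")
```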
A new data type called a posit is designed as a direct drop-in replacement for IEEE Standard 754 floating-point numbers (floats). Unlike earlier forms of universal number (unum) arithmetic, posits do not require interval arithmetic or variable size operands; like floats, they round if an answer is inexact. However, they provide compelling advantages over floats, including larger dynamic range, higher accuracy, better closure, bitwise identical results across systems, simpler hardware, and simpler exception handling. Posits never overflow to infinity or underflow to zero, and “Not-a-Number” (NaN) indicates an action instead of a bit pattern. A posit processing unit takes less circuitry than an IEEE float FPU. With lower power use and smaller silicon footprint, the posit operations per second (POPS) supported by a chip can be significantly higher than the FLOPS using similar hardware resources. GPU accelerators and Deep Learning processors, in particular, can do more per watt and per dollar with posits, yet deliver superior answer quality.
A comprehensive series of benchmarks compares floats and posits for decimals of accuracy produced for a set precision. Low precision posits provide a better solution than “approximate computing” methods that try to tolerate decreased answer quality. High precision posits provide more correct decimals than floats of the same size; in some cases, a 32-bit posit may safely replace a 64-bit float. In other words, posits beat floats at their own game.
Gustafson slides: http://arith23.gforge.inria.fr/slides/Gustafson.pdf
Kahan slides: http://arith23.gforge.inria.fr/slides/Kahan.pdf
Full transcription of debate by John Gustafson: http://www.johngustafson.net/pdfs/DebateTranscription.pdf