Technical Programme


13 - 14 March 2019
Level 3, Room 329
Suntec Singapore Convention & Exhibition Centre
Co-located with Supercomputing Asia 2019


13 March 2019

TIME TITLE SPEAKER (Affiliation)
9:00 AM - 9:10 AM Welcome John Gustafson
A*STAR Singapore, NUS
[Slides]
9:10 AM - 10:00 AM Keynote:
POSITs and Computing at 200 PB/sec for the SKA Telescope
Peter Braam
Oxford University
[Abstract] | [Slides] | [Video]
10:00 AM - 10:30 AM Paper:
Universal Coding of the Reals using Bisection
Peter Lindstrom
Lawrence Livermore National Laboratory
[Abstract] | [Slides] | [Video]
Tea Break (Posit implementation demos)
Session Chair: Peter Lindstrom, Lawrence Livermore National Laboratory
11:00 AM - 11:30 AM Paper:
Posits as an alternative to floats for weather and climate models
Milan Klöwer
University of Oxford
[Abstract] | [Slides] | [Video]
11:30 AM - 12:00 PM Paper:
SMURF: Scalar Multiple-precision Unum Risc-V Floating-point
Accelerator for Scientific Computing
Andrea Bocco
CEA-LETI
[Abstract] | [Slides] | [Video]
12:00 PM - 12:30 PM Paper:
Sinking-Point: Dynamic precision tracking for floating-point
Bill Zorn
University of Washington
[Abstract] | [Slides] | [Video]
Lunch (Posit implementation demos)
1:30 PM - 2:30 PM Keynote:
Towards Automated Floating-point Tools for End Users
Zach Tatlock
University of Washington
[Abstract] | [Slides] | [Video]
2:30 PM - 3:30 PM Tutorial #1: Beginner John Gustafson,
Leong Siew Hoon (Cerlane)
A*STAR Singapore
Tea Break (Posit implementation demos)
4:00 PM - 6:00 PM Tutorial #1: Beginner John Gustafson,
Leong Siew Hoon (Cerlane)
A*STAR Singapore
[Slides] | [Tutorial]


14 March 2019

TIME TITLE SPEAKER (Affiliation)
9:00 AM - 9:50 AM Keynote:
Approximate floating point for AI and beyond
Jeff Johnson
Facebook AI Research
[Abstract] | [Slides] | [Video]
9:50 AM - 10:00 AM NGA Working Group Chair: John Gustafson
A*STAR Singapore, NUS
Tea Break (Posit implementation demos)
Session Chair: Gerd Bohlender, Karlsruhe Institute of Technology (retired)
10:30 AM - 11:00 AM Paper:
Performance-Efficiency Trade-off of Low-Precision
Numerical Formats in Deep Neural Networks
Dhireesha Kudithipudi
Rochester Institute of Technology
[Abstract] | [Slides] | [Video]
11:00 AM - 11:30 AM Paper:
An Accelerator for Posit Arithmetic
Targeting Posit Level 1 BLAS Routines and Pair-HMM
Johan Peltenburg
Delft University of Technology
[Abstract] | [Slides] | [Video]
11:30 AM - 12:00 PM Paper:
Posits: the good, the bad and the ugly
Florent de Dinechin
University of Lyon
[Abstract] | [Slides] | [Video]
Lunch (Posit implementation demos)
1:00 PM - 2:00 PM Panel Discussion:
Switching from Floats to Posits: Is it Worth the Pain?
Chair: John Gustafson
Panelists: Peter Lindstrom, Richard Murphy,
Florent de Dinechin, Leong Siew Hoon (Cerlane)
2:00 PM - 3:30 PM Tutorial #2: Intermediate/Advanced John Gustafson,
Leong Siew Hoon (Cerlane)
A*STAR Singapore
Tea Break (Posit implementation demos)
4:00 PM - 6:00 PM Tutorial #2: Intermediate/Advanced John Gustafson,
Leong Siew Hoon (Cerlane)
A*STAR Singapore
[Slides] | [Tutorial]


Abstract


Keynote: POSITs and Computing at 200 PB/sec for the SKA Telescope

Peter Braam - Oxford University
The SKA radio telescope will be a massive, world-class scientific instrument, currently under design by a worldwide consortium, and expected to progress to full operation in South Africa and Australia in the mid-2020s. The capabilities of the telescope are expected to enable major scientific breakthroughs. At the center of its data processing sits the Science Data Processor (SDP), a large HPC system with specialized software. The work of the SDP design consortium elucidated many performance requirements combining scientific requirements, knowledge of hardware, and algorithms. The emerging computational challenge is very significant, requiring 200 PB/sec. It is not impossible that a more modern approach with posits may have a significant impact on this magnificent computing challenge.


Universal Coding of the Reals using Bisection

Peter Lindstrom - Lawrence Livermore National Laboratory
We propose a simple yet expressive framework for encoding any real number as a binary string based on bisecting intervals, starting with (–∞, +∞). Each bit of such a string represents the outcome of a binary comparison with a value contained in the interval. Our framework draws upon ideas from unbounded and binary search, and requires only the specification of two functions: a generator for producing a monotonic sequence that brackets the number being encoded, and a refinement operator that computes an "average" of two finite numbers. Our framework is flexible enough to support many well-known representations, including posits, Elias codes, logarithmic number systems, and a slightly modified version of IEEE floating point. We show that the associated generators are simple expressions given by hyperoperators. Moreover, the generality of our approach allows for exponent-less number systems based on Fibonacci and other sequences. We further analyze the probability densities associated with known and new encoding schemes and show how a compatible refinement operator and density can be derived from the bracketing sequence. This gives an almost everywhere smooth mapping between bit strings and real numbers and shows, for instance, that posits follow a Pareto distribution. Contributions of our work include new insights into existing number sequences, suggestions for how to build new, perhaps exponent-less number systems, a method for designing number systems that match a desired density, and a much simpler and verifiably correct implementation of existing representations using as few as two lines of code.
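
For readers who want to experiment with the bisection framework, the minimal Python sketch below (not the authors' code) shows the two ingredients the abstract describes: a bracketing generator and a refinement operator. The choices here, powers of two, the arithmetic midpoint, and a restriction to positive reals, are simplifying assumptions; the paper's framework is far more general.

    # Bisection-based encoding of a positive real as a bit string (sketch).
    def encode(x, nbits, gen=lambda k: 2.0 ** k, refine=lambda a, b: (a + b) / 2):
        bits = []
        # Phase 1: unbounded search -- emit 1s while the generator is still below x.
        k = 0
        while len(bits) < nbits and gen(k) <= x:
            bits.append(1)
            k += 1
        if len(bits) < nbits:
            bits.append(0)                    # first generator value that exceeds x
        lo = gen(k - 1) if k > 0 else 0.0     # bracketing interval [lo, hi)
        hi = gen(k)
        # Phase 2: binary search -- each bit is a comparison against the refinement.
        while len(bits) < nbits:
            mid = refine(lo, hi)
            if x >= mid:
                bits.append(1); lo = mid
            else:
                bits.append(0); hi = mid
        return bits, (lo, hi)

    bits, interval = encode(3.14159, 12)
    print(bits, interval)

With 12 bits this narrows 3.14159 down to an interval of width about 0.004; swapping in a different generator or refinement operator yields other number systems.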


Posits as an alternative to floats for weather and climate models

Milan Klöwer, Tim Palmer - University of Oxford,
Peter Düben - European Centre for Medium-Range Weather Forecasts

Posit numbers, a recently proposed alternative to floating-point numbers, claim to have smaller arithmetic rounding errors in many applications. By studying weather and climate models of low and medium complexity (the Lorenz system and a shallow water model), we present the benefits of posits compared to floats at 16 bits. As a standardised posit processor does not exist yet, we emulate posit arithmetic on a conventional CPU. Using a shallow water model, forecasts based on 16-bit posits with 1 or 2 exponent bits are clearly more accurate than those based on half-precision floats. We therefore propose 16-bit posits with 2 exponent bits as a standard posit format, as its wide dynamic range of 32 orders of magnitude provides great potential for many weather and climate models. Although the focus is on geophysical fluid simulations, the results are also meaningful and promising for reduced-precision posit arithmetic in the wider field of computational fluid dynamics.
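
To make the proposed format concrete, the following minimal Python sketch (not the authors' code) decodes a 16-bit posit bit pattern with 2 exponent bits into an ordinary float, following the published posit definition: sign, a run-length-encoded regime, up to es exponent bits, then a fraction.

    def decode_posit(pattern, n=16, es=2):
        """Decode an n-bit posit bit pattern (given as an int) into a float."""
        mask = (1 << n) - 1
        pattern &= mask
        if pattern == 0:
            return 0.0
        if pattern == 1 << (n - 1):
            return float('nan')                        # Not-a-Real (NaR)
        sign = (pattern >> (n - 1)) & 1
        if sign:
            pattern = (-pattern) & mask                # two's complement of the word
        # Regime: run of identical bits starting just below the sign bit.
        first = (pattern >> (n - 2)) & 1
        run = 0
        while run < n - 1 and ((pattern >> (n - 2 - run)) & 1) == first:
            run += 1
        k = run - 1 if first else -run
        rem = max(n - 1 - run - 1, 0)                  # bits left for exponent + fraction
        tail = pattern & ((1 << rem) - 1)
        e_bits = min(es, rem)
        e = (tail >> (rem - e_bits)) << (es - e_bits)  # truncated exponent bits read as 0
        nf = rem - e_bits
        f = tail & ((1 << nf) - 1)
        value = (1 + f / (1 << nf)) * 2.0 ** ((1 << es) * k + e)
        return -value if sign else value

    print(decode_posit(0x4000), decode_posit(0x5000), decode_posit(0x0001))

For example, decode_posit(0x4000) returns 1.0 and decode_posit(0x5000) returns 4.0; the representable magnitudes run from 2^-56 up to 2^56, giving the wide dynamic range cited above.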


SMURF: Scalar Multiple-precision Unum Risc-V Floating-point Accelerator for Scientific Computing

Andrea Bocco, Yves Durand - CEA-LETI,
Florent de Dinechin - University of Lyon

This paper proposes an innovative Floating-Point (FP) architecture for Variable Precision (VP) computation suitable for high-precision FP computing, based on a refined version of the UNUM type I format. This architecture supports VP FP intervals in which each interval endpoint can have up to 512 bits of mantissa. The proposed hardware architecture is pipelined with an internal word size of 64 bits; computations on longer mantissas are performed iteratively on the existing hardware. The prototype is integrated in a RISC-V environment and is exposed to the user through an instruction-set extension; the paper provides an example of software usage. The system has been prototyped on an FPGA (Field-Programmable Gate Array) platform and also synthesized for ASIC in 28 nm FDSOI technology. The working frequencies of the FPGA and ASIC implementations are 50 MHz and 600 MHz, respectively. The estimated chip area is 1.5 mm² and the estimated power consumption is 95 mW. The FLOPS performance of this architecture is comparable to that of a regular fixed-precision IEEE FPU while enabling arbitrary-precision computation at reasonable cost.


Sinking-Point: Dynamic precision tracking for floating-point

Bill Zorn, Dan Grossman, Zachary Tatlock - University of Washington
We present sinking-point, a floating-point-like number system that tracks precision dynamically through computations. With existing floating-point number systems, such as the venerable IEEE 754 standard, numerical results do not inherently contain any information about their precision or accuracy; to determine if a result is numerically accurate, a separate analysis must be performed. By contrast, sinking-point records the precision of each intermediate value and result computed, so highly imprecise results can be identified immediately. Compared to IEEE 754 floating-point, sinking-point's representation requires only a few additional bits of storage, and computations require only a few additional bitwise operations. Sinking-point is fully generalizable, and can be extended to provide dynamic error tracking for nearly any digital number system, including posits.
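
As a flavour of the idea (not the actual sinking-point rules, which the paper defines precisely), the toy Python sketch below carries a count of trusted significand bits alongside each value and shrinks it when catastrophic cancellation occurs.

    import math
    from dataclasses import dataclass

    @dataclass
    class Sinking:
        val: float
        p: int              # trusted significand bits

        def __mul__(self, other):
            # Multiplication roughly preserves relative error: keep the weaker precision.
            return Sinking(self.val * other.val, min(self.p, other.p))

        def __add__(self, other):
            s = self.val + other.val
            if s == 0.0:
                return Sinking(0.0, 0)
            # Cancellation: trusted bits shrink by the number of leading bits lost.
            lost = max(math.frexp(max(abs(self.val), abs(other.val)))[1]
                       - math.frexp(abs(s))[1], 0)
            return Sinking(s, max(min(self.p, other.p) - lost, 0))

    a = Sinking(1.0000001, 53)
    b = Sinking(-1.0, 53)
    print(a + b)            # tiny result, tagged with far fewer trusted bits

The printed result is a value near 1e-7 tagged with roughly 29 trusted bits instead of the full 53, flagging the cancellation immediately.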


Keynote: Towards Automated Floating-point Tools for End Users

Zach Tatlock - University of Washington
Engineering and scientific computer programs follow mathematical models described by real numbers, but use floating point arithmetic internally. The two number systems can give rather different results, and when that happens programmers are often ill-equipped to improve the accuracy of their code. Experts in numerical methods fix these problems by rearranging computations, but acquiring that expertise takes years. Our team at the University of Washington and UC San Diego has been working to bridge this gap with two tools: (1) Herbgrind, a tool that dynamically analyzes program binaries to identify rounding error that significantly affects program outputs, and (2) Herbie, a tool that automatically improves the accuracy of floating point expressions using heuristic search, series expansions, and regime inference.

We've used these tools on expressions found everywhere from textbooks to large-scale surveys of open-source software, and consistently find good results, leading to users at NASA and NIST labs integrating our tools into their workflow. I'll describe the design and implementation of Herbgrind and Herbie and highlight some recent extensions to these tools that get us closer to the ultimate goal of lowering the barrier to entry for non-experts to effectively write accurate numerical codes.
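
A hand-worked example of the kind of rewrite Herbie discovers (illustrative only, not actual tool output): for large x, sqrt(x+1) - sqrt(x) suffers catastrophic cancellation, while the algebraically equivalent form 1/(sqrt(x+1) + sqrt(x)) does not.

    import math

    def naive(x):
        return math.sqrt(x + 1) - math.sqrt(x)        # cancels for large x

    def rewritten(x):
        return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))  # same value, no cancellation

    x = 1e15
    print(naive(x))       # only a few correct digits remain
    print(rewritten(x))   # accurate to roughly full double precision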



Keynote: Approximate floating point for AI and beyond

Jeff Johnson - Facebook AI Research
Leveraging neural network tolerance to many sources of error and quantization noise has been the subject of much recent research. However, this spirit of experimentation with approximate computation has not extended far beyond NN inference. There is potential for many new approximate arithmetics with applications beyond machine learning, by combining old ideas from non-integer, logarithmic and multiple-base number systems, stochastic computing and approximate digital filters, together with new quantization techniques like Gustafson's posit.

We present an approximate log-linear arithmetic that combines aspects of logarithmic number systems (LNS) and traditional log-linear floating point, deriving benefits from both. It is widely applicable, has configurable accuracy and can be more accurate than traditional floating point with substantial energy efficiency savings: up to 3x over bfloat16 fused multiply-add at 28 nm, with near-equivalent precision and dynamic range. Trading multiplications for additions, as in Winograd's inner product (1968) and Horner's method for polynomials, is not useful in this arithmetic, and some computation restructuring is required. The design is applicable to higher-precision domains for computer graphics and ML training, and hardware structures to enable this will be considered. When combined with posit-type codes and mixing representation domain as well as precision, a very wide range of accuracy and efficiency trade-offs can be realized.
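
The LNS half of the idea can be sketched in a few lines of Python (illustrative only; the talk's log-linear format and hardware mapping are more involved): storing log2 of each value turns multiplication into a plain addition, while addition needs a correction term.

    import math

    def to_lns(x):
        return math.log2(x)            # positive values only in this sketch

    def from_lns(l):
        return 2.0 ** l

    def lns_mul(a, b):
        return a + b                   # log(x*y) = log x + log y: just an adder

    def lns_add(a, b):
        hi, lo = max(a, b), min(a, b)
        return hi + math.log2(1.0 + 2.0 ** (lo - hi))   # "Gaussian log" correction

    x, y = to_lns(3.0), to_lns(5.0)
    print(from_lns(lns_mul(x, y)))     # ~15.0
    print(from_lns(lns_add(x, y)))     # ~8.0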


Performance-Efficiency Trade-off of Low-Precision Numerical Formats in Deep Neural Networks

Zachariah Carmichael, Hamed Langroudi, Char Khazanov, Jeffrey Lillie, Dhireesha Kudithipudi - Rochester Institute of Technology,
John Gustafson - National University of Singapore

Deep neural networks (DNNs) have been demonstrated as effective prognostic models across various domains, e.g. natural language processing, computer vision, and genomics. However, modern-day DNNs demand high compute and memory storage for executing reasonably complex tasks. To optimize the inference time and alleviate the power consumption of these networks, DNN accelerators with low-precision representations of data and DNN parameters are being actively studied. An interesting research question is how low-precision networks can be ported to edge devices with performance similar to that of high-precision networks. In this work, we employ the fixed-point, floating-point, and posit numerical formats at ≤ 8-bit precision within a DNN accelerator, Deep Positron, with exact multiply-and-accumulate (EMAC) units for inference. A unified analysis quantifies the trade-offs between overall network efficiency and performance across five classification tasks. Our results indicate that posits are a natural fit for DNN inference, outperforming other formats at ≤ 8-bit precision, and can be realised with competitive resource requirements relative to those of floating point.


An Accelerator for Posit Arithmetic Targeting Posit Level 1 BLAS Routines and Pair-HMM

Laurens van Dam, Johan Peltenburg, Zaid Al-Ars - Delft University of Technology,
H. Peter Hofstee - IBM

The newly proposed posit number format uses a significantly different approach to representing floating-point numbers. This paper introduces a framework for posit arithmetic in reconfigurable logic that maintains full precision in intermediate results. We present the design and implementation of a Level 1 BLAS arithmetic accelerator operating on posit vectors, leveraging Apache Arrow. For a vector dot product with an input vector length of 10^6 elements, a hardware speedup of approximately 10^4 is achieved compared to posit software emulation. For 32-bit numbers, the decimal accuracy of the posit dot-product results improves by one decimal on average compared to a software implementation, and by two extra decimals compared to the IEEE 754 format. We also present a posit-based implementation of pair-HMM. In this case, the speedup vs. a posit-based software implementation ranges from 10^5 to 10^6. With appropriate initial scaling constants, accuracy improves on an implementation based on IEEE 754.
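
The "full precision in intermediate results" property is the same idea as the posit quire: accumulate a dot product exactly and round only once at the end. The Python sketch below (illustrative, not the authors' design) uses Fraction as a stand-in for the wide exact accumulator.

    from fractions import Fraction

    def dot_naive(xs, ys):
        acc = 0.0
        for x, y in zip(xs, ys):
            acc += x * y                      # rounds after every product and sum
        return acc

    def dot_exact(xs, ys):
        acc = Fraction(0)
        for x, y in zip(xs, ys):
            acc += Fraction(x) * Fraction(y)  # exact: no intermediate rounding
        return float(acc)                     # single rounding at the end

    xs = [1e16, 1.0, -1e16]
    ys = [1.0, 1.0, 1.0]
    print(dot_naive(xs, ys))   # 0.0 -- the 1.0 is absorbed and lost
    print(dot_exact(xs, ys))   # 1.0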


Posits: the good, the bad and the ugly

Florent de Dinechin, Luc Forget, Jean-Michel Muller - University of Lyon,
Yohann Uguen - ENS

Many properties of the IEEE-754 floating-point number system are taken for granted in modern computers and are deeply embedded in compilers and in low-level software routines such as elementary functions or BLAS. This article reviews such properties for the posit number system. Some are still true. Some are no longer true, but sensible workarounds are possible and even represent exciting challenges for the community. Some represent a danger if posits are to replace floating point completely. This study helps frame where posits are better than floating point, where they are worse, what the cost of posit hardware is, and what tools are missing in the posit landscape. For general-purpose computing, using posits as a storage format only could be a way to reap their benefits without losing those of classical floating point.


Registration Details

ONLINE REGISTRATION IS CLOSED.
If you are interested in attending, you may do so as a walk-in attendee on the event day itself.


CoNGA'19 is co-located with Supercomputing Asia (SCA) 2019.

Contact Info

Facebook: posithub.org
Twitter: posithub_org