
4.2. FLOATING POINT ARITHMETIC

In this section, we shall study the basic principles of doing arithmetic on floating point numbers, by analyzing the internal mechanisms underlying such calculations. Perhaps many readers will have little interest in this subject, since their computers either have built-in floating point instructions or their computer manufacturer has supplied suitable subroutines. But, in fact, the material of this section should not merely be the concern of computer-design engineers or of a small clique of people who write library subroutines for new machines; every well-rounded programmer ought to have a knowledge of what goes on during the elementary steps of floating point arithmetic. This subject is not at all as trivial as most people think; it involves a surprising amount of interesting information.

4.2.1. Single-Precision Calculations

A. Floating point notation. We have discussed fixed point notation for numbers in Section 4.1; in such a case the programmer knows where the radix point is assumed to lie in the numbers he manipulates. For many purposes it is considerably more convenient to let the position of the radix point be dynamically variable or "floating" as a program is running, and to carry with each number an indication of its current radix point position. This idea has been used for many years in scientific calculations, especially for expressing very large numbers like Avogadro's number $N = 6.02252 \times 10^{23}$, or very small numbers like Planck's constant $h = 1.0545 \times 10^{-27}$ erg sec.

In this section we shall work with base $b$, excess $q$, floating point numbers with $p$ digits. Such numbers will be represented by pairs of values $(e, f)$, denoting

$$(e, f) = f \times b^{e-q}. \tag{1}$$

Here $e$ is an integer having a specified range, and $f$ is a signed fraction. We will adopt the convention that $|f| < 1$; in other words, the radix point appears at the left of the positional representation of $f$.
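As a rough illustration of the definition of the pair (e, f) above, the following Python sketch evaluates f times b raised to the power e - q exactly, using rational arithmetic. The function name `fp_value` and the default parameters b = 10 and q = 50 are assumptions chosen here for illustration; they are not part of the text.

```python
from fractions import Fraction


def fp_value(e, f, b=10, q=50):
    """Return the exact value f * b**(e - q) of the pair (e, f).

    b (base) and q (excess) default to illustrative values only.
    """
    f = Fraction(f)  # accepts ints, Fractions, or decimal strings such as "0.5"
    if not abs(f) < 1:
        # Convention from the text: the radix point lies at the left of f.
        raise ValueError("the fraction part must satisfy |f| < 1")
    return f * Fraction(b) ** (e - q)


# With b = 10 and q = 50, an exponent of 51 scales the fraction by 10**1.
print(fp_value(51, "0.5"))    # 0.5 * 10**1
print(fp_value(49, "-0.25"))  # -0.25 * 10**(-1)
```

Working with `Fraction` rather than the machine's own floating point type keeps the illustration exact, so the result reflects only the definition and not any rounding done by the host language.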
More precisely, the stipulation that we have $p$-digit numbers means that $b^p f$ is an integer, and that

$$-b^p < b^p f < b^p. \tag{2}$$

The term "floating binary" implies that $b = 2$, "floating decimal" implies $b = 10$, etc. Using excess-50 floating decimal numbers with 8 digits, we can write, for example,

$$\begin{aligned} \text{Avogadro's number } N &= (74, +.60225200); \\ \text{Planck's constant } h &= (24, +.10545000). \end{aligned} \tag{3}$$

The two components $e$ and $f$ of a floating point number are called the exponent and the fraction parts, respectively. (Other names are occasionally
