Archive for the 'Coldfusion' Category

4.2.2 ACCURACY OF FLOATING POINT ARITHMETIC 217 As

Saturday, May 31st, 2008

As an example of typical error-estimation procedures, let us consider the associative law for multiplication. Exercise 3 shows that $(u \otimes v) \otimes w$ is not in general equal to $u \otimes (v \otimes w)$; but the situation in this case is much better than it was with respect to the associative law of addition (1) and the distributive law (12). In fact, we have

$$(u \otimes v) \otimes w = \bigl(uv(1+\delta_1)\bigr) \otimes w = uvw(1+\delta_1)(1+\delta_2),$$
$$u \otimes (v \otimes w) = u \otimes \bigl(vw(1+\delta_3)\bigr) = uvw(1+\delta_3)(1+\delta_4),$$

for some $\delta_1, \delta_2, \delta_3, \delta_4$, provided that no exponent underflow or overflow occurs, where $|\delta_j| < \tfrac12 b^{1-p}$ for each $j$. Hence

$$\frac{(u \otimes v) \otimes w}{u \otimes (v \otimes w)} = \frac{(1+\delta_1)(1+\delta_2)}{(1+\delta_3)(1+\delta_4)} = 1 + \delta, \qquad \text{where } |\delta| < \frac{2b^{1-p}}{\bigl(1 - \tfrac12 b^{1-p}\bigr)^2}. \qquad (19)$$

The number $b^{1-p}$ occurs so often in such analyses that it has been given a special name, one ulp, meaning one "unit in the last place" of the fraction part. Floating point operations are correct to within half an ulp, and the calculation of $uvw$ by two floating point multiplications will be correct within about one ulp (ignoring second-order terms). Hence the associative law for multiplication holds to within about two ulps of relative error.

We have shown that $(u \otimes v) \otimes w$ is approximately equal to $u \otimes (v \otimes w)$, except when exponent overflow or underflow is a problem. It is worthwhile to study this intuitive idea of being "approximately equal" in more detail; can we make such a statement more precise in a reasonable way?

A programmer using floating point arithmetic almost never wants to test whether two computed values are exactly equal to each other (or at least he hardly ever should try to do so), because this is an extremely improbable occurrence. For example, if a recurrence relation $x_{n+1} = f(x_n)$ is being used, where the theory in some textbook says that $x_n$ approaches a limit as $n \to \infty$, it is usually a mistake to wait until $x_{n+1} = x_n$ for some $n$, since the sequence $x_n$ might be periodic with a longer period due to the rounding of intermediate results. The proper procedure is to wait until $|x_{n+1} - x_n| < \delta$ for some suitably chosen number $\delta$; but since we don't necessarily know the order of magnitude of $x_n$ in advance, it is even better to wait until

$$|x_{n+1} - x_n| \le \epsilon\,|x_n|; \qquad (20)$$

now $\epsilon$ is a number that is much easier to select. This relation (20) is another way of saying that $x_{n+1}$ and $x_n$ are approximately equal; and our discussion
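Here is a small Python sketch of two ideas above: bound (19) and stopping rule (20). It assumes Python floats are IEEE 754 binary64 values with round-to-nearest, so $b = 2$ and $p = 53$. That radix-2 setting and every function name below are illustrative assumptions, not part of the text.

- `assoc_mul_error` uses `fractions.Fraction` to measure exactly how far $(u \otimes v) \otimes w$ lies from $u \otimes (v \otimes w)$, so bound (19) can be spot-checked on random operands.
- `iterate_until_close` applies the relative stopping rule (20).

```python
import random
from fractions import Fraction
from typing import Callable

# Assumption: IEEE 754 binary64, i.e. radix b = 2 and precision p = 53.
B, P = 2, 53
ULP = Fraction(B) ** (1 - P)              # b^(1-p), one "unit in the last place"
BOUND = 2 * ULP / (1 - ULP / 2) ** 2      # right-hand side of (19)


def assoc_mul_error(u: float, v: float, w: float) -> Fraction:
    """Exact |delta| in ((u*v)*w) / (u*(v*w)) = 1 + delta, for nonzero operands."""
    left = Fraction((u * v) * w)
    right = Fraction(u * (v * w))
    return abs(left / right - 1)


def worst_case(trials: int = 10_000, seed: int = 1) -> Fraction:
    """Largest discrepancy observed over random operands drawn from [0.5, 2]."""
    rng = random.Random(seed)
    worst = Fraction(0)
    for _ in range(trials):
        # Operands near 1 rule out exponent overflow and underflow.
        u, v, w = (rng.uniform(0.5, 2.0) for _ in range(3))
        worst = max(worst, assoc_mul_error(u, v, w))
    return worst


def iterate_until_close(f: Callable[[float], float], x0: float,
                        eps: float = 1e-12, max_iter: int = 1000) -> float:
    """Iterate x_{n+1} = f(x_n) until |x_{n+1} - x_n| <= eps * |x_n|, as in (20)."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) <= eps * abs(x):
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")
```

For example, `iterate_until_close(lambda x: (x + 2.0 / x) / 2.0, 1.0)` runs Newton's iteration for $\sqrt{2}$. It stops once successive iterates agree to about twelve significant digits, and it does not need to know the magnitude of the limit in advance.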

216 ARITHMETIC 4.2.2 On the other hand we

Friday, May 30th, 2008

On the other hand we do have

$$b \otimes (v \oplus w) = (b \otimes v) \oplus (b \otimes w), \qquad (13)$$

when $b$ is the floating point radix, since $\operatorname{round}(bx) = b\,\operatorname{round}(x)$. (Strictly speaking, the identities and inequalities we are considering in this section implicitly assume that exponent underflow and overflow do not occur. The function $\operatorname{round}(x)$ is undefined when $|x|$ is too small or too large, and equations such as (13) hold only when both sides are defined.)

The failure of Cauchy's fundamental inequality

$$(x_1^2 \oplus \cdots \oplus x_n^2) \otimes (y_1^2 \oplus \cdots \oplus y_n^2) \ge (x_1 \otimes y_1 \oplus \cdots \oplus x_n \otimes y_n)^2$$

is another important example of the breakdown of traditional algebra in the presence of floating point arithmetic. Exercise 7 shows that Cauchy's inequality can fail even in the simple case $n = 2$, $x_1 = x_2 = 1$. Novice programmers who calculate the standard deviation of some observations by using the textbook formula

$$\sigma = \sqrt{\frac{n \sum_{1 \le k \le n} x_k^2 - \bigl(\sum_{1 \le k \le n} x_k\bigr)^2}{n(n-1)}} \qquad (14)$$

often find themselves taking the square root of a negative number! A much better way to calculate means and standard deviations with floating point arithmetic is to use the recurrence formulas

$$M_1 = x_1, \qquad M_k = M_{k-1} \oplus (x_k \ominus M_{k-1}) \oslash k, \qquad (15)$$
$$S_1 = 0, \qquad S_k = S_{k-1} \oplus (x_k \ominus M_{k-1}) \otimes (x_k \ominus M_k), \qquad (16)$$

for $2 \le k \le n$, where $\sigma = \sqrt{S_n/(n-1)}$. [Cf. B. P. Welford, Technometrics 4 (1962), 419-420.] With this method $S_n$ can never be negative, and we avoid other serious problems encountered by the naive method of accumulating sums, as shown in exercise 16. (See exercise 19 for a summation technique that provides an even better guarantee on the accuracy.)

Although algebraic laws do not always hold exactly, we can often show that they aren't too far off base. When $b^{e-1} \le x < b^e$ we have $\operatorname{round}(x) = x + \rho(x)$, where $|\rho(x)| \le \tfrac12 b^{e-p}$; hence

$$\operatorname{round}(x) = x\bigl(1 + \delta(x)\bigr), \qquad (17)$$

where the relative error is bounded independently of $x$:

$$|\delta(x)| \le \frac{\tfrac12}{b^{p-1} + \tfrac12} < \tfrac12 b^{1-p}. \qquad (18)$$

We can use this inequality to estimate the relative error of normalized floating point calculations in a simple way, since $u \oplus v = (u + v)\bigl(1 + \delta(u + v)\bigr)$, etc.
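The recurrences (15) and (16) translate directly into code. The Python sketch below substitutes ordinary IEEE double arithmetic for the text's abstract $\oplus$, $\ominus$, $\otimes$, $\oslash$, which is an assumption. The names `welford` and `naive_stdev` are illustrative. The naive version follows the textbook formula (14) for contrast.

```python
import math
from typing import Iterable, List, Tuple


def welford(xs: Iterable[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) using recurrences (15) and (16)."""
    it = iter(xs)
    try:
        m = float(next(it))          # M_1 = x_1
    except StopIteration:
        raise ValueError("need at least two observations") from None
    s = 0.0                          # S_1 = 0
    n = 1
    for x in it:
        n += 1
        d = x - m                    # x_k - M_{k-1}
        m += d / n                   # (15): M_k = M_{k-1} + (x_k - M_{k-1}) / k
        s += d * (x - m)             # (16): S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k)
    if n < 2:
        raise ValueError("need at least two observations")
    return m, math.sqrt(s / (n - 1))  # sigma = sqrt(S_n / (n - 1))


def naive_stdev(xs: List[float]) -> float:
    """Textbook formula (14); rounding can make the radicand negative."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    # math.sqrt raises ValueError if cancellation drives the radicand below zero.
    return math.sqrt((n * s2 - s1 * s1) / (n * (n - 1)))
```

Data with a large common offset, such as values near $10^9$, is a convenient way to compare the two functions. The recurrences work with deviations from the running mean, so the large offset itself is never squared.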

4.2.2 ACCURACY OF FLOATING POINT ARITHMETIC 215 Identities

Thursday, May 29th, 2008

Identities (2) to (6) are easily deduced from the algorithms in Section 4.2.1. The following rule is slightly less obvious: if $u \le v$ and $w > 0$, then $u \otimes w \le v \otimes w$ and $u \oslash w \le v \oslash w$ and $w \oslash u \ge w \oslash v$. If $u \oplus v = u + v$, then $(u \oplus v) \ominus v = u$; and if $u \otimes v = u \times v \ne 0$, then $(u \otimes v) \oslash v = u$. We see that a good deal of regularity is present in spite of the inexactness of the floating point operations, when things have been defined properly.

Several familiar rules of algebra are still, of course, conspicuously absent from the collection of identities above. The associative law for floating point multiplication is not strictly true, as shown in exercise 3, and the distributive law between $\otimes$ and $\oplus$ can fail rather badly. Let $u = 20000.000$, $v = -6.0000000$, and $w = 6.0000003$; then

$$(u \otimes v) \oplus (u \otimes w) = -120000.00 \oplus 120000.01 = .010000000,$$
$$u \otimes (v \oplus w) = 20000.000 \otimes .00000030000000 = .0060000000,$$

so

$$u \otimes (v \oplus w) \ne (u \otimes v) \oplus (u \otimes w). \qquad (12)$$
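Python's `decimal` module can emulate the eight-digit floating decimal arithmetic used in this section, which makes the distributive-law example easy to reproduce. This is a sketch under one assumption the text does not state: ties are broken by round-half-even. No ties arise for these particular operands, so the choice does not affect the result. The function name is hypothetical.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Eight significant decimal digits, rounding to nearest (tie rule assumed).
CTX = Context(prec=8, rounding=ROUND_HALF_EVEN)


def distributive_sides(u: str, v: str, w: str):
    """Return ((u (x) v) (+) (u (x) w), u (x) (v (+) w)) in 8-digit arithmetic."""
    U, V, W = Decimal(u), Decimal(v), Decimal(w)   # string inputs convert exactly
    left = CTX.add(CTX.multiply(U, V), CTX.multiply(U, W))
    right = CTX.multiply(U, CTX.add(V, W))
    return left, right


left, right = distributive_sides("20000.000", "-6.0000000", "6.0000003")
```

Here `left` equals 0.01 and `right` equals 0.006, matching the two lines of the example. The rounding of $u \times w = 120000.006$ to $120000.01$ accounts for the whole discrepancy.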

214 ARITHMETIC 4.2.2 One of the consequences of

Thursday, May 29th, 2008

One of the consequences of the possible unreliability of floating point addition is that the associative law breaks down:

$$(u \oplus v) \oplus w \ne u \oplus (v \oplus w), \quad \text{for many } u, v, w. \qquad (1)$$

For example,

$$(11111113. \oplus -11111111.) \oplus 7.5111111 = 2.0000000 \oplus 7.5111111 = 9.5111111;$$
$$11111113. \oplus (-11111111. \oplus 7.5111111) = 11111113. \oplus -11111103. = 10.000000.$$

(All examples in this section are given in eight-digit floating decimal arithmetic, with exponents indicated by an explicit decimal point. Recall that, as in Section 4.2.1, the symbols $\oplus$, $\ominus$, $\otimes$, $\oslash$ are used to stand for floating point operations corresponding to the exact operations $+$, $-$, $\times$, $/$.)

In view of the failure of the associative law, the comment of Mrs. La Touche that appears at the beginning of this chapter [taken from Math. Gazette 12 (1924), 95] makes a good deal of sense with respect to floating point arithmetic. Mathematical notations like $a_1 + a_2 + a_3$ or $\sum_{1 \le k \le n} a_k$ are inherently based upon the assumption of associativity, so a programmer must be especially careful that he does not implicitly assume the validity of the associative law.

A. An axiomatic approach. Although the associative law is not valid, the commutative law

$$u \oplus v = v \oplus u \qquad (2)$$

does hold, and this law can be a valuable conceptual asset in programming and in the analysis of programs. This example suggests that we should look for important laws that are satisfied by $\oplus$, $\ominus$, $\otimes$, and $\oslash$; it is not unreasonable to say that floating point routines should be designed to preserve as many of the ordinary mathematical laws as possible. If more axioms are valid, it becomes easier to write good programs, and programs also become more portable from machine to machine.

Let us therefore consider some of the other basic laws that are valid for normalized floating point operations as described in the previous section. First we have

$$u \ominus v = u \oplus -v; \qquad (3)$$
$$-(u \oplus v) = -u \oplus -v; \qquad (4)$$
$$u \oplus v = 0 \quad \text{if and only if} \quad v = -u; \qquad (5)$$
$$u \oplus 0 = u. \qquad (6)$$

From these laws we can derive further identities; for example (exercise 1),

$$u \ominus v = -(v \ominus u). \qquad (7)$$
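The same eight-digit emulation reproduces example (1). It uses Python's `decimal` module with an 8-digit context as an illustrative stand-in for the text's arithmetic, and the helper names are hypothetical. It shows that commutativity (2) survives while associativity does not.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Eight significant decimal digits; the tie-breaking rule is an assumption.
CTX = Context(prec=8, rounding=ROUND_HALF_EVEN)


def fadd(a: Decimal, b: Decimal) -> Decimal:
    """u (+) v: the exact sum rounded to eight significant digits."""
    return CTX.add(a, b)


def assoc_sides(u: str, v: str, w: str):
    """Return ((u (+) v) (+) w, u (+) (v (+) w)) for decimal-string operands."""
    U, V, W = Decimal(u), Decimal(v), Decimal(w)
    return fadd(fadd(U, V), W), fadd(U, fadd(V, W))


left, right = assoc_sides("11111113", "-11111111", "7.5111111")
print(left, right)  # prints: 9.5111111 10
```

The inner sum $-11111111 + 7.5111111 = -11111103.4888889$ must be rounded to eight digits. That rounding discards the fractional part before the large operands cancel.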

4.2.2 ACCURACY OF FLOATING POINT ARITHMETIC 213 Round

Wednesday, May 28th, 2008

19. [24] What is the running time for the FADD subroutine in Program A, in terms of relevant characteristics of the data? What is the maximum running time, over all inputs that do not cause overflow or underflow?

4.2.2. Accuracy of Floating Point Arithmetic

Round numbers are always false.
-SAMUEL JOHNSON (1750)

I shall speak in round numbers, not absolutely accurate, yet not so wide from truth as to vary the result materially.
-THOMAS JEFFERSON (1824)

Floating point computation is by nature inexact, and it is not difficult to misuse it so that the computed answers consist almost entirely of noise. One of the principal problems of numerical analysis is to determine how accurate the results of certain numerical methods will be. A credibility-gap problem is involved here: we don't know how much of the computer's answers to believe. Novice computer users solve this problem by implicitly trusting in the computer as an infallible authority; they tend to believe that all digits of a printed answer are significant. Disillusioned computer users have just the opposite approach; they are constantly afraid that their answers are almost meaningless. Many a serious mathematician has attempted to give rigorous analyses of a sequence of floating point operations, but has found the task to be so formidable that he has tried to content himself with plausibility arguments instead.

A thorough examination of error analysis techniques is, of course, beyond the scope of this book, but in this section we shall study some of the characteristics of floating point arithmetic errors. Our goal is to discover how to perform floating point arithmetic in such a way that reasonable analyses of error propagation are facilitated as much as possible.

A rough (but reasonably useful) way to express the behavior of floating point arithmetic can be based on the concept of significant figures, or relative error. If we are representing an exact real number $x$ inside a computer by using the approximation $\hat{x} = x(1 + \epsilon)$, the quantity $\epsilon = (\hat{x} - x)/x$ is called the relative error of approximation. Roughly speaking, the operations of floating point multiplication and division do not magnify the relative error by very much; but floating point subtraction of nearly equal quantities (and floating point addition, $u \oplus v$, where $u$ is nearly equal to $-v$) can very greatly increase the relative error. So we have a general rule of thumb, that a substantial loss of accuracy is expected from such additions and subtractions, but not from multiplications and divisions. On the other hand, the situation is somewhat paradoxical and needs to be understood properly, since "bad" additions and subtractions are performed with perfect accuracy! (See exercise 25.)
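The paradox in the last paragraph can be observed with exact rational arithmetic. The Python sketch below assumes IEEE binary64 floats, and its variable names are illustrative.

- Two nearby reals $x = 1/3$ and $y = 1/3 + 10^{-12}$ are rounded to floats, each with relative error below $2^{-53}$.
- Their product keeps a relative error of a few ulps.
- Their difference shows a relative error many orders of magnitude larger.
- Yet the floating subtraction itself introduces no error. A standard fact, often called Sterbenz's lemma, says $y - x$ is exact whenever $y/2 \le x \le 2y$. The damage comes entirely from the errors already present in the inputs.

```python
from fractions import Fraction


def rel_err(approx: float, exact: Fraction) -> Fraction:
    """Relative error |(approx - exact) / exact|, computed exactly."""
    return abs((Fraction(approx) - exact) / exact)


x = Fraction(1, 3)
y = Fraction(1, 3) + Fraction(1, 10**12)
xf, yf = float(x), float(y)            # each has relative error below 2**-53

prod_err = rel_err(xf * yf, x * y)     # multiplication: stays within a few ulps
diff = yf - xf
diff_err = rel_err(diff, y - x)        # subtraction of nearly equal values: magnified

# ...even though the floating subtraction itself was performed exactly.
exact_sub = Fraction(diff) == Fraction(yf) - Fraction(xf)
```

The input errors here are at most about $2^{-55}$ in absolute terms. Dividing by the tiny true difference $10^{-12}$ is what inflates them, so `diff_err` can reach roughly $10^{-4}$, while `prod_err` stays below $2^{-51}$.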

212 ARITHMETIC 4.2.1 ▶ 11. [M20] Give an

Tuesday, May 27th, 2008

▶ 11. [M20] Give an example of normalized, excess 50, eight-digit floating decimal numbers $u$ and $v$ for which rounding overflow occurs in multiplication.

12. [M25] Prove that rounding overflow cannot occur during the normalization phase of floating point division.

13. [?] When doing interval arithmetic we don't want to round the results of a floating point computation; we want rather to implement operations such as $\bigtriangledown$ and $\bigtriangleup$, which give the tightest possible representable bounds on the true sum:

$$u \bigtriangledown v \le u + v \le u \bigtriangleup v.$$

How should the algorithms of this section be modified for such a purpose?

14. [25] Write a MIX subroutine that begins with an arbitrary floating point number in register A, not necessarily normalized, and converts it to the nearest fixed point integer (or determines that the number is too large in absolute value to make such a conversion possible).

▶ 15. [?] Write a MIX subroutine, to be used in connection with the other subroutines of this section, that calculates $\operatorname{round}(u \bmod 1)$, that is, $u - \lfloor u \rfloor$ rounded to the nearest floating point number, given a floating point number $u$. Note that when $u$ is a very small negative number, this result will be rounded so that it is unity (even though $u \bmod 1$ has been defined to be always less than unity, as a real number).

16. [?] (Robert L. Smith.) Design an algorithm to compute the real and imaginary parts of the complex number $(a + bi)/(c + di)$, given real floating point values $a$, $b$, $c$, and $d$. Avoid the computation of $c^2 + d^2$, since it would cause floating point overflow even when $|c|$ or $|d|$ is approximately the square root of the maximum allowable floating point value.

17. [40] (John Cocke.) Explore the idea of extending the range of floating point numbers by defining a single-word representation in which the precision of the fraction decreases as the magnitude of the exponent increases.

18. [25] Consider a binary computer with 36-bit words, on which positive floating binary numbers are represented as $(0e_1e_2\ldots e_8 f_1f_2\ldots f_{27})_2$; here $(e_1e_2\ldots e_8)_2$ is an excess $(10000000)_2$ exponent and $(f_1f_2\ldots f_{27})_2$ is a 27-bit fraction. Negative floating point numbers are represented by the two's complement of the corresponding positive representation (see Section 4.1). Thus, 1.5 is 201|600000000 in octal notation, while -1.5 is 576|200000000; the octal representations of 1.0 and -1.0 are 201|400000000 and 576|400000000, respectively. (A vertical line is used here to show the boundary between exponent and fraction.) Note that bit $f_1$ of a normalized positive number is always 1, while it is almost always zero for negative numbers; the exceptional cases are representations of $-2^k$.

Suppose that the exact result of a floating point operation has the octal code 572|740000000|01; this (negative) 33-bit fraction must be normalized and rounded to 27 bits. If we shift left until the leading fraction bit is zero, we get 576|000000000|20, but this rounds to the illegal value 576|000000000; we have over-normalized, since the correct answer is 575|400000000. On the other hand, if we start (in some other problem) with the value 572|740000000|05 and stop before over-normalizing it, we get 575|400000000|50, which rounds to the unnormalized number 575|400000001; subsequent normalization yields 576|000000002, while the correct answer is 576|000000001. Give a simple, correct rounding rule that resolves this dilemma on such a machine (without abandoning two's complement notation).
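One standard way to meet the requirement of exercise 16 is the scaling method commonly known as Smith's algorithm. It divides numerator and denominator by whichever of $c$, $d$ is larger in magnitude, so $c^2 + d^2$ is never formed. The Python sketch below assumes IEEE doubles and uses a hypothetical function name. It is one possible answer, not necessarily the one the book intends.

```python
from typing import Tuple


def smith_divide(a: float, b: float, c: float, d: float) -> Tuple[float, float]:
    """Real and imaginary parts of (a + bi) / (c + di), never forming c*c + d*d."""
    if c == 0.0 and d == 0.0:
        raise ZeroDivisionError("complex division by zero")
    if abs(c) >= abs(d):
        r = d / c                      # |r| <= 1, so d * r cannot exceed |d|
        den = c + d * r                # mathematically (c*c + d*d) / c
        return (a + b * r) / den, (b - a * r) / den
    r = c / d                          # |r| < 1
    den = c * r + d                    # mathematically (c*c + d*d) / d
    return (a * r + b) / den, (b * r - a) / den
```

For instance, with $a = b = c = d = 10^{200}$, the naive formula overflows in $c^2 + d^2$, while this version returns $(1, 0)$.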

4.2.1 SINGLE-PRECISION CALCULATIONS 211 the state of the

Tuesday, May 27th, 2008

the state of the floating point art as of 1980; these carefully considered procedures will probably be published some day. Additional references, which deal primarily with the accuracy of floating point methods, are given in Section 4.2.2.

EXERCISES

1. [10] How would Avogadro's number and Planck's constant be represented in base 100, excess 50, four-digit floating point notation? (This would be the representation used by MIX, as in (4), if the byte size is 100.)

2. [12] Assume that the exponent $e$ is constrained to lie in the range $0 \le e \le E$; what are the largest and smallest positive values that can be written as base $b$, excess $q$, $p$-digit floating point numbers? What are the largest and smallest positive values that can be written as normalized floating point numbers with these specifications?

3. [11] (K. Zuse, 1936.) Show that if we are using normalized floating binary arithmetic, there is a way to increase the precision slightly without loss of memory space: A $p$-bit fraction part can be represented using only $p - 1$ bit positions of a computer word, if the range of exponent values is decreased very slightly.

▶ 4. [16] Assume that $b = 10$, $p = 8$. What result does Algorithm A give for $(50, +.98765432) \oplus (49, +.33333333)$? For $(53, -.99987654) \oplus (54, +.10000000)$? For $(45, -.50000001) \oplus (54, +.10000000)$?

▶ 5. [24] Let us say that $x \sim y$ (with respect to a given radix $b$) if $x$ and $y$ are real numbers satisfying the following conditions:

$$\lfloor x/b \rfloor = \lfloor y/b \rfloor;$$
$$x \bmod b = 0 \iff y \bmod b = 0;$$
$$0 < x \bmod b < \tfrac12 b \iff 0 < y \bmod b < \tfrac12 b;$$
$$x \bmod b = \tfrac12 b \iff y \bmod b = \tfrac12 b;$$
$$\tfrac12 b < x \bmod b < b \iff \tfrac12 b < y \bmod b < b.$$

Prove that if $f_v$ is replaced by $b^{-p-2}F_v$ between steps A5 and A6 of Algorithm A, where $F_v \sim b^{p+2}f_v$, the result of that algorithm will be unchanged. (If $F_v$ is an integer and $b$ is even, this operation essentially truncates $f_v$ to $p + 2$ places while remembering whether any nonzero digits have been dropped, thereby minimizing the length of register that is needed for the addition in step A6.)

6. [20] If the result of a FADD instruction is zero, what will be the sign of rA, according to the definitions of MIX's floating point attachment given in this section?

7. [27] Discuss floating point arithmetic using balanced ternary notation.

8. [20] Give examples of normalized eight-digit floating decimal numbers $u$ and $v$ for which addition yields (a) exponent underflow, (b) exponent overflow, assuming that exponents must satisfy $0 \le e < 100$.

9. [M24] (W. M. Kahan.) Assume that the occurrence of exponent underflow causes the result to be replaced by zero, with no error indication given. Using excess zero, eight-digit floating decimal numbers with $e$ in the range $-50 \le e < 50$, find positive values of $a$, $b$, $c$, $d$, and $y$ such that (11) holds.

10. [12] Give an example of normalized eight-digit floating decimal numbers $u$ and $v$ for which rounding overflow occurs in addition.
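Zuse's idea in exercise 3 survives in today's binary formats as the "hidden bit." As an illustration from outside the text, assume IEEE 754 binary64: an 11-bit exponent with excess 1023 and a 53-bit significand. Only 52 fraction bits are stored, and the all-zeros exponent field is reserved for zero and subnormal numbers. The Python helper names below are hypothetical.

```python
import struct
from typing import Tuple


def decode_double(x: float) -> Tuple[int, int, int]:
    """Split a binary64 value into (sign, biased exponent, stored fraction bits)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    biased_exp = (bits >> 52) & 0x7FF      # 11-bit exponent field, excess 1023
    stored = bits & ((1 << 52) - 1)        # only 52 fraction bits occupy the word
    return sign, biased_exp, stored


def significand(x: float) -> int:
    """Rebuild the 53-bit significand of a normal double by restoring the hidden 1."""
    _, e, stored = decode_double(x)
    if e == 0 or e == 0x7FF:
        raise ValueError("zero, subnormal, infinity, or NaN has no hidden bit")
    return (1 << 52) | stored
```

For 1.5 (binary 1.1), the stored fraction is just the single bit after the binary point. The leading 1 is implied by normalization, which gives 53 bits of precision from 52 stored positions.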

210 ARITHMETIC 4.2.1 The use of floating binary

Monday, May 26th, 2008

The use of floating binary arithmetic was seriously considered in 1944-1946 by researchers at the Moore School in their plans for the first electronic digital computers, but it turned out to be much harder to implement floating point circuitry with tubes than with relays. The group realized that scaling was a problem in programming; but at the time it was only a very small part of a total programming job, and it seemed to be worth the time and trouble it took, since it tended to keep a programmer aware of the numerical accuracy he was getting. Furthermore, they argued that floating point representation would take up valuable memory space, since the exponents must be stored, and that it would be difficult to adapt floating point arithmetic to multiple-precision calculations. [See von Neumann's Collected Works 5 (New York: Macmillan, 1963), 43, 73-74.] At this time, of course, they were designing the first stored-program computer and the second electronic computer, and their choice had to be either fixed point or floating point arithmetic, not both. They anticipated the coding of floating binary routines, and in fact "shift left" and "shift right" instructions were put into their machine primarily to make such routines more efficient. The first machine to have both kinds of arithmetic in its hardware was apparently a computer developed at General Electric Company [see Proc. 2nd Symp. Large-Scale Digital Calculating Machinery (Cambridge: Harvard University Press, 1951), 65-69].

Floating point subroutines and interpretive systems for early machines were coded by D. J. Wheeler and others, and the first publication of such routines was in The Preparation of Programs for an Electronic Digital Computer by Wilkes, Wheeler, and Gill (Reading, Mass.: Addison-Wesley, 1951), subroutines A1-A11, pp. 35-37, 105-117. It is interesting to note that floating decimal subroutines are described there, although a binary computer was being used; in other words, the numbers were represented as $10^e f$, not $2^e f$, and therefore the scaling operations required multiplication or division by 10. On this particular machine such decimal scaling was about as easy as shifting, and the decimal approach greatly simplified input/output conversions.

Most published references to the details of floating point arithmetic routines are scattered in technical memorandums distributed by various computer manufacturers, but there have been occasional appearances of these routines in the open literature. Besides the reference above, the following are of historical interest: R. H. Stark and D. B. MacMillan, Math. Comp. 5 (1951), 86-92, where a plugboard-wired program is described; D. McCracken, Digital Computer Programming (New York: Wiley, 1957), 121-131; J. W. Carr III, CACM 2,5 (May 1959), 10-15; W. G. Wadey, JACM 7 (1960), 129-139; D. E. Knuth, JACM 8 (1961), 119-128; O. Kesner, CACM 5 (1962), 269-271; F. P. Brooks and K. E. Iverson, Automatic Data Processing (New York: Wiley, 1963), 184-199. For a discussion of floating point arithmetic from a computer designer's standpoint, see "Floating point operation" by S. G. Campbell, in Planning a Computer System, ed. by W. Buchholz (New York: McGraw-Hill, 1962), 92-121. A set of algorithms by J. Coonen, W. M. Kahan, and H. S. Stone, submitted to the IEEE Microprocessor Floating-Point Standards Committee during 1978-1980, represented

4.2.1 SINGLE-PRECISION CALCULATIONS 209 Similarly, suppose MIX had

Sunday, May 25th, 2008

Similarly, suppose MIX had a FADD operation but not FIX. If we wanted to round a number $u$ from floating point form to the nearest fixed point integer, and if we knew that the number was nonnegative and would fit in at most three bytes, we could write

    FADD  FUDGE

where location FUDGE contains the constant

    + | Q+4 | 1 | 0 | 0 | 0 |

and the result in rA would be

    + | Q+4 | 1 | round(u) |        (13)

D. History and bibliography. The origins of floating point notation can be traced back to Babylonian mathematicians (1800 B.C. or earlier), who made extensive use of radix-60 floating point arithmetic but did not have a notation for the exponents. The appropriate exponent was always somehow understood by the man doing the calculations. At least one case has been found in which the wrong answer was given because addition was performed with improper alignment of the operands, but such examples are very rare; see O. Neugebauer, The Exact Sciences in Antiquity (Princeton, N. J.: Princeton University Press, 1952), 26-27. Another early contribution to floating point notation is due to the Greek mathematician Apollonius (3rd century B.C.), who apparently was the first to explain how to simplify multiplication by collecting powers of 10 separately from their coefficients, at least in simple cases. [For a discussion of Apollonius's method, see Pappus, Mathematical Collections (4th century A.D.).] After the Babylonian civilization died out, the first significant uses of floating point notation for products and quotients did not emerge until much later, about the time logarithms were invented (1600) and shortly afterwards when Oughtred invented the slide rule (1630). The modern notation $x^n$ for exponents was being introduced at about the same time; separate symbols for x squared, x cubed, etc., had been in use before this.

Floating point arithmetic was incorporated into the design of some of the earliest computers. It was independently proposed by Leonardo Torres y Quevedo in Madrid, 1914; by Konrad Zuse in Berlin, 1936; and by George Stibitz in New Jersey, 1939. Zuse's machines used a floating binary representation that he called "semi-logarithmic notation"; he also incorporated conventions for dealing with special quantities like $\infty$ and "undefined." The first American computers to operate with floating point arithmetic hardware were the Bell Laboratories Model V and the Harvard Mark II, both of which were relay calculators designed in 1944. [See B. Randell, The Origins of Digital Computers (Berlin: Springer, 1973), 100, 155, 163-164, 259-260; Proc. Symp. Large-Scale Digital Calculating Machinery (Harvard, 1947), 41-68, 69-79; Datamation 13 (April 1967), 35-44; (May 1967), 45-49; Zeit. für angew. Math. und Physik 1 (1950), 345-346.]
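The FUDGE trick (13) has a close analogue in IEEE binary64 arithmetic, sketched here in Python under the assumption of round-to-nearest-even doubles with no extended-precision intermediates. Adding $2^{52}$ to a nonnegative $u < 2^{52}$ lands in the binade where one ulp is exactly 1, so the addition itself rounds $u$ to an integer. Subtracting $2^{52}$ again is exact. The constant plays the role of FUDGE, and `fudge_round` is a hypothetical name. Ties go to the even integer, which may differ from the rounding rule of MIX's FADD.

```python
MAGIC = 2.0 ** 52   # plays the role of FUDGE; in [2**52, 2**53) one ulp is exactly 1.0


def fudge_round(u: float) -> float:
    """Round 0 <= u < 2**52 to the nearest integer (ties to even) with one add and one subtract."""
    if not 0.0 <= u < MAGIC:
        raise ValueError("u must satisfy 0 <= u < 2**52")
    return (u + MAGIC) - MAGIC   # the addition rounds; the subtraction is exact
```

Python's built-in `round` also breaks ties toward even, so `fudge_round(x) == float(round(x))` for every `x` in range.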

208 ARITHMETIC 4.2.1 The MIX computer, which is

Sunday, May 25th, 2008

The MIX computer, which is being used as an example of a typical machine in this series of books, has an optional "floating point attachment" (available at extra cost) that includes the following seven operations:

• FADD, FSUB, FMUL, FDIV, FLOT, FCMP (C = 1, 2, 3, 4, 5, 56, respectively; F = 6). The contents of rA after the operation "FADD V" are precisely the same as the contents of rA after the operations

    STA  ACC
    LDA  V
    JMP  FADD

where FADD is the subroutine that appears earlier in this section, except that both operands are automatically normalized before entry to the subroutine if they are not already in normalized form. (If exponent underflow occurs during this prenormalization, but not during the normalization of the answer, no underflow is signalled.) Similar remarks apply to FSUB, FMUL, and FDIV. The contents of rA after the operation "FLOT" are the contents after "JMP FLOT" in the subroutine (10) above.

The contents of rA are unchanged by the operation "FCMP V"; this instruction sets the comparison indicator to LESS, EQUAL, or GREATER, depending on whether the contents of rA are "definitely less than," "approximately equal to," or "definitely greater than" V. This subject is discussed in the next section, and the precise action is defined by the subroutine FCMP of exercise 4.2.2-17 with EPSILON in location 0.

No register other than rA is affected by any of the floating point operations. If exponent overflow or underflow occurs, the overflow toggle is turned on and the exponent of the answer is given modulo the byte size. Division by zero leaves undefined garbage in rA. Execution times: 4u, 4u, 9u, 11u, 3u, 4u, respectively.

• FIX (C = 5; F = 7). The contents of rA are replaced by the integer "round(rA)", rounding to the nearest integer as in step N5 of Algorithm N. However, if this answer is too large to fit in the register, the overflow toggle is set on and the result is undefined. Execution time: 3u.

Sometimes it is helpful to use floating point operators in a nonstandard way. For example, if the operation FLOT had not been included as part of MIX's floating point attachment, we could easily achieve its effect on 4-byte numbers by writing

    FLOT  STJ   9F
          SLA   1
          ENTX  Q+4
          SRC   1
          FADD  =0=
    9H    JMP   *          (12)

This routine is not strictly equivalent to the FLOT operator, since it assumes that the 1:1 byte of rA is zero, and it destroys rX. The handling of more general situations is a little tricky, because rounding overflow can occur even during a FLOT operation.
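Routine (12) builds an unnormalized floating point word by hand, inserting the exponent Q+4 in front of the integer, and then lets FADD =0= normalize it. A loosely analogous trick works for IEEE binary64, shown here as a Python sketch under that assumption with a hypothetical function name. Splicing a nonnegative integer $n < 2^{52}$ into the fraction field of a double whose exponent field encodes $2^{52}$ yields exactly $2^{52} + n$. Subtracting $2^{52}$ then lets the floating subtraction perform the normalization.

```python
import struct

TWO52 = 2.0 ** 52


def int_to_float(n: int) -> float:
    """Convert 0 <= n < 2**52 to a float by building its bit pattern, then normalizing."""
    if not 0 <= n < 1 << 52:
        raise ValueError("n must satisfy 0 <= n < 2**52")
    # Exponent field 0x433 = 1023 + 52 encodes 2**52; the fraction field holds n.
    bits = 0x4330000000000000 | n
    (x,) = struct.unpack(">d", struct.pack(">Q", bits))   # x == 2**52 + n exactly
    return x - TWO52   # exact, since 2**52 <= x < 2**53; the result is normalized n
```

As with (12), the routine covers only a restricted range of inputs. Larger integers or negative values would need extra handling.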