Posted on: July 26, 2006 at 12:00 AM

# Floating-point

Floating-point numbers are like real numbers in mathematics, for example, 3.14159, -0.000001. Java has two kinds of floating-point numbers: `float` and `double`, both stored in IEEE-754 format. The default type when you write a floating-point literal is `double`.

## Java floating-point types

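| Type | Size (bytes) | Size (bits) | Range (approximate) | Precision (decimal digits) |
|------|--------------|-------------|---------------------|----------------------------|
| `float` | 4 | 32 | +/- 3.4 * 10^38 | 6-7 |
| `double` | 8 | 64 | +/- 1.8 * 10^308 | 15 |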
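The figures in the table can be checked against constants in the wrapper classes. The sketch below is illustrative only: the class name `FloatTypeLimits` and the helper `approxDecimalDigits` are made up for this example, and the digit estimate is a common rule of thumb (significand bits times log10 of 2), not an exact guarantee.

```java
class FloatTypeLimits {
    // Rule-of-thumb estimate (an assumption of this sketch): each significand
    // bit carries about log10(2) = 0.301 decimal digits.
    static int approxDecimalDigits(int significandBits) {
        return (int) Math.floor(significandBits * Math.log10(2.0));
    }

    public static void main(String[] args) {
        // Sizes and largest finite values, taken from the wrapper classes.
        System.out.println("float:  " + Float.BYTES + " bytes, " + Float.SIZE + " bits");
        System.out.println("double: " + Double.BYTES + " bytes, " + Double.SIZE + " bits");
        System.out.println(Float.MAX_VALUE);   // 3.4028235E38
        System.out.println(Double.MAX_VALUE);  // 1.7976931348623157E308

        // float has a 24-bit significand, double a 53-bit one (implicit bit included).
        System.out.println(approxDecimalDigits(24)); // 7
        System.out.println(approxDecimalDigits(53)); // 15
    }
}
```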

## Limited precision

Because each floating-point type has only a limited number of bits, some numbers cannot be stored exactly, just as the decimal system cannot represent some numbers exactly, for example 1/3. The most troublesome case is that 1/10 cannot be represented exactly in binary, so a value such as 0.1 is stored as the nearest representable approximation.
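A small sketch of what this inexactness looks like in practice. The class name `PrecisionDemo`, the helper names, and the tolerance `1e-9` are arbitrary choices for illustration; a suitable tolerance depends on the application.

```java
import java.math.BigDecimal;

class PrecisionDemo {
    // Adds 0.1 ten times; mathematically the result would be exactly 1.0.
    static double sumOfTenths() {
        double sum = 0.0;
        for (int i = 0; i < 10; i++) {
            sum += 0.1;
        }
        return sum;
    }

    // Compares two doubles within a caller-chosen absolute tolerance.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double sum = sumOfTenths();
        System.out.println(sum);                          // 0.9999999999999999
        System.out.println(sum == 1.0);                   // false
        System.out.println(nearlyEqual(sum, 1.0, 1e-9));  // true

        // BigDecimal(double) exposes the exact binary value stored for 0.1.
        System.out.println(new BigDecimal(0.1));
    }
}
```

Comparing with a tolerance instead of `==` is one common way to cope with accumulated rounding error. Where exact decimal results matter, `BigDecimal` constructed from strings is an alternative.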

## Floating-point literals

There are two types of notation for floating-point numbers. Any of these numbers can be followed by "F" (or "f") to make it a `float` instead of the default `double`.

• Standard (American) notation, which is a series of digits for the integer part, followed by a decimal point, followed by a series of digits for the fraction part. For example, 3.14159 is a `double`. A sign (+ or -) may precede the number.
• Scientific notation, which is a standard floating-point literal followed by the letter "E" (or "e") and an optionally signed exponent. The exponent is a power of 10 used as a multiplier (that is, it says how far to shift the decimal point). Scientific notation is generally used only for very large or very small numbers.

| Scientific | Standard |
|------------|----------|
| 1.2345e5 | 123450.0 |
| 1.2345e+5 | 123450.0 |
| 1.2345e-5 | 0.000012345 |
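The following sketch shows both notations and the `F` suffix in Java source. The class and field names are hypothetical, chosen only for this example.

```java
class LiteralDemo {
    // Scientific-notation literals; each is a double because there is no suffix.
    static final double LARGE = 1.2345e5;
    static final double LARGE_SIGNED = 1.2345e+5;
    static final double SMALL = 1.2345e-5;

    // The F suffix makes the literal a float instead of the default double.
    static final float PI_FLOAT = 3.14159F;
    static final double PI_DOUBLE = 3.14159;

    public static void main(String[] args) {
        // float bad = 3.14159;  // would not compile: a double literal needs a cast or an F suffix

        // Each scientific literal denotes the same value as its standard form.
        System.out.println(LARGE == 123450.0);         // true
        System.out.println(LARGE_SIGNED == 123450.0);  // true
        System.out.println(SMALL == 0.000012345);      // true

        // The float holds fewer digits, so it differs from the double once widened.
        System.out.println(PI_FLOAT == PI_DOUBLE);     // false
    }
}
```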

## Infinity and NaN

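Floating-point operations do not throw exceptions. Instead of interrupting execution, an operation may produce positive infinity, negative infinity, or NaN (not a number). Overflow, or dividing a nonzero value by zero, produces infinity. Dividing 0.0 by 0.0 produces NaN, as does subtracting an infinity from an infinity of the same sign. Use the methods in the wrapper classes (`Float` or `Double`), such as `isInfinite` and `isNaN`, to test for these values. NaN cannot be detected with `==` because it is not equal to anything, including itself.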
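A minimal sketch of these special values and the wrapper-class tests. The class name `SpecialValuesDemo` and the helper `divide` are made up for this example.

```java
class SpecialValuesDemo {
    // Plain double division; it never throws, even when b is zero.
    static double divide(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        double posInf = divide(1.0, 0.0);         // nonzero / zero
        double negInf = divide(-1.0, 0.0);
        double overflow = Double.MAX_VALUE * 2.0; // too large to represent
        double nan = posInf - posInf;             // infinity minus infinity

        System.out.println(posInf);                       // Infinity
        System.out.println(negInf);                       // -Infinity
        System.out.println(Double.isInfinite(overflow));  // true
        System.out.println(Double.isNaN(nan));            // true
        System.out.println(Double.isNaN(divide(0.0, 0.0))); // true

        // NaN is not equal to itself, so == cannot be used to detect it.
        System.out.println(nan == nan);                   // false
    }
}
```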

Copyleft 2005 Fred Swartz MIT License