Signed and Unsigned Arithmetic Operations using java

The XDR standard defines signed integers as integer. A signed integer is a 32-bit datum that encodes an integer in the range [-2147483648 to 2147483647]. An unsigned integer is a 32-bit datum that encodes a nonnegative integer in the range [0 to 4294967295].

The signed integer is represented in twos complement notation. The most significant byte is 0 and the least significant is 3.

The unsigned integer is represented by an unsigned binary number whose most significant byte is 0; the least significant is 3. See the Signed Integer and Unsigned Integer.

This diagram shows the most significant byte on the left, which is byte 0. To the right of byte 0, is byte 1, followed by byte 2, and then byte 3 (the least significant byte). The length of the 4 bytes is 32 bits.

 The XDR standard also defines 64-bit (8-byte) numbers called signed and unsigned hyperinteger.

Their representations are extensions of signed integers and unsigned integers. Hyperintegers are represented in twos-complement notation. The most significant byte is 0 and the least significant is 7. See the Signed Hyperinteger and Unsigned Hyperinteger figure.


This diagram shows the most significant byte on the left which is byte 0. To the right of byte 0, is byte 1, followed by byte 2, and byte 3 continued up to byte 7 (the least significant byte). The length of the 8 bytes is 64 bits.

The XDR standard provides enumerations for describing subsets of integers.

XDR defines enumerations as an enum. Enumerations have the same representation as signed integers and are declared as follows: Encoding any integers as enum, besides those assigned in the enum declaration, causes an error condition.

Unsigned arithmetic in Java

In our introduction to unsigned arithmetic in Java, we began by comparing to a language such as C/C++. We saw that in C/C++ when you declare a variable as unsigned, the compiler will know to perform unsigned arithmetic on that variable. Arithmetic operations are operations such as addition and division, where we're not simply treating the number as a "raw string of bits", but where the bits have some "numerical" meaning. Perhaps less obviously, arithmetic operations include comparison. In our overview, we mentioned that to perform unsigned bitwise operations in Java— such as XOR, AND etc, where we do just treat the number as a series of bits— these are essentially the same whether or not we later treat the resulting integer is signed or unsigned. See the previous page's introduction to unsigned operations in C/C++ and Java for more details. On this page, we consider cases where implementing unsigned arithmetic correctly in Java is not so straightforward.

The reason that unsigned arithmetic is inherently tricky in Java is that in most cases, Java only "natively" caters to signed arithmetic: that is, it expects to interpret the top bit of an integer type (int, but also byteshort and long) as the sign bit. This is a departure from other languages, even including modern examples such as Swift, where the programmer generally has the option of specifying signed or unsigned values and arithmetic, even if signed is often the default.

So, what are the trickier cases that we must consider, and how do we deal with them in Java

32-bit unsigned arithmetic (and below)

If you are dealing purely with numbers that are 32 bits or smaller— i.e. the size of a Java int— then you can perform unsigned arithmetic reliably by using a long to perform the arithmetic. Only the top bit of a long is interpreted as the sign bit. So provided you keep to the bottom bits, you will effectively be performing unsigned arithmetic. We just need to be careful of two things:

  • To cast a signed int to an unsigned value in the bottom 32 bits of a long, we need to AND with 0xffffffffL (232-1— or the number with all 32 bottom bits set).
  • After an operation on longs that we're treating as unsigned ints, we should either cast back to an int or and with 0xffffffffL if we want to keep the (unsigned) value in a long.

Wherever the result of an operation could be larger than the bottom 32 bits, we "chop off" the top 32 bits of the long. We can chop them off by casting back to an int, or ANDing with 0xffffffffL if we want to keep the result as a long, e.g. for further arithmetic. For example, here is one way to perform an unsigned 32-bit division in Java:

64-bit unsigned arithmetic

Performing 64-bit unsigned arithmetic in Java is a bit more tricky because there's no "next size up" primitive that you can use. We need to watch out for "special cases" and "manual" to interpret the sign bit as magnitude. In general:

  • unsigned additionsubtraction, and multiplication are identical: we can just use the ordinary +- and * operators;
  • unsigned comparison and unsigned division need a special code.

The idea of unsigned addition/subtraction being identical to their signed counterparts may seem counterintuitive, but numbers are deliberately stored in a form (called "two's complement"), that allows them to be identical at the bit level.

Efficient emulation

The Java language specification requires its signed integers to be represented in two’s complement format. Because of this, many basic operations are exactly the same whether the integer type is signed or unsigned. For example, the sum of two 8-bit integers giving an 8-bit result will have the same bit pattern whether all three of these values are treated as signed or unsigned. The same goes for subtraction, multiplication, left shift, equality, and of course bitwise AND/OR/XOR/NOT operations. The only ones that are signedness-sensitive are division, remainder, right shift, inequalities, casting, and string conversion. To summarize:

  • These operations are the same: +-*~&|^<<==!=, down-casting.

  • These operations are different: /%>><<=>>=, up-casting, parseInt()toString().

Java’s advantage of having mostly unified signed/unsigned arithmetic operations (from a bit-level standpoint) is not enjoyed by C/C++ users. This is because signed integers can be implemented on the machine in two’s complement, one’s complement, or sign-magnitude, and the C/C++ standards allow the low-level detail of the bit format to be exposed to the programmer. Furthermore, while the signed overflow is silent and well-defined in Java, it is undefined behavior in C/C++ and leads to all sorts of nasty problems. (In fact, if you stretch the argument you could say that if you wanted safe and reliable two’s complement signed overflow in C/C++, you should emulate the signed arithmetic using unsigned arithmetic – the reverse situation of what’s being argued here for Java.



Comments

Post a Comment