Bitten by Bit Shifts
I recently found myself decoding some binary data and getting results I wasn’t expecting. This caused me to review one of the most basic operations in computer science: bit shifts. My “research” led me to a great article by Lenovo on the subject. After a few experiments, I learned another valuable lesson about strongly-typed languages and why choosing data types with intent is so important.
Bit Shift Basics
There are two ways to shift bits: logically and arithmetically. Logical bit shifting is simple. It will always fill in the vacant bit with a zero. When dealing with serialized binary data this is nearly always what we want.
1000
0001
Logical right shift
Arithmetic bit shifting behaves a little differently: it preserves the sign bit when shifting bits to the right. This difference is based in the fact that arithmetic bit shifting is equivalent to multiplying or dividing by 2. Shifting to the left multiplies a value by 2. Shifting to the right divides by 2. This means that the representation of signed values in binary requires arithmetic right shifts to fill in the vacant bit with ones and preserve the signedness.
1000
1111
Arithmetic right shift
This is what bit me. I was decoding binary data using arithmetic right shifts when I didn’t mean to. Why? Because I wasn’t using the correct data type to hold the binary data.
Bit Shifting in C++
C++ uses the type of the value being shifted to determine whether to use a logical or arithmetic shift.
- Unsigned values use logical left and right shifts.
- Signed values use a logical left shift and an arithmetic right shift.
This means we must choose the types of our values with intent. It’s amazing to me how even something as fundemental as bit shifting requires us to design software well and choose appropriate types to represent the values in our program!
Let’s look at a few examples. In each example, I will include the binary
representation of the final value. You can print this value in C++ using the
std::bitset
type.
// LEFT SHIFTS
char a = 0x01 << 7; // prints '10000000'
unsigned char b = 0x01 << 7; // prints '10000000'
// RIGHT SHIFTS
char a = 0x80 >> 7; // prints '11111111'
unsigned char b = 0x80 >> 7; // prints '00000001'
Note that char
is a signed data type, so the bitshift operator >>
will
perform an arithmetic right shift to preserve the signedness. However, when we
use unsigned char
, the operator >>
will perform a logical right shift and
fill the vacancies with zeros.
Processing Binary Data
One area where we can see a lot of bitshifting is when decoding binary data. In
this case we almost always want to use unsigned data types to represent the
data. Since C++17, the best way to do this is to use std::byte
.
std::ifstream input{"data.bin", std::ios::binary};
std::vector<std::byte> buf(1024); // 1
input.read(buf.data(), buf.size()); // 2
std::byte d = buf[0] >> 7; // 3
At 1, we pre-allocate a buffer of bytes to store the data int. Then at 2
we read the data into our buffer. Since std::byte
is an unsigned type, at
3 we are performing a logical right shift on the first byte.
We could use unsigned char
, uint8_t
, or some other type to represent the
binary data. However, using std::byte
cleary communicates to the reader that
we are dealing with plain binary data that has to be decoded before it can be
used. This prevents confusion and also forces us to be more explicit with our
conversions in code, and in my experience the more explicit you are the fewer
bugs you encounter.
Conclusions
Remember your types when you need to bit-shift. Choosing an appropriate type for the data you are processing will prevent headaches and reduce logic errors later in your algorithms.