The first element I want to cover is Item #9 in the article above which deals with member alignment. I assumed that the compiler would find the most efficient layout for POD data in a class/struct. I was wrong. Try the following program yourself:
#include <iostream>
struct A
{
bool a;
int b;
bool c;
};
struct B
{
bool a;
bool c;
int b;
};
int main(int argc, char** argv)
{
std::cout << "sizeof(A) = " << sizeof(A) << std::endl;
std::cout << "sizeof(B) = " << sizeof(B) << std::endl;
}
The result you'll see is:
sizeof(A) = 12
sizeof(B) = 8
Wow. I wasn't expecting that! I guess the rule of thumb here is that each member needs to align on a four byte boundary. After each bool is inserts 3 padding bytes. Therefore, in A you'll see 3 padding bytes added to 1 byte for member a, then 4 bytes for b, then another byte for c followed by 3 more padding bytes.
struct A
{
bool a; //< 1 Byte for value, 3 Bytes Padding
int b; //< 4 Bytes
bool c; //< 1 Byte for value, 3 bytes for padding.
}; // SUM = 12 bytes.
struct B
{
bool a; //< 1 Byte for value.
bool c; //< 1 Byte for value, 2 more for Byte Padding
int b; //< 4 Bytes.
}; // SUM = 8 bytes
{
bool a; //< 1 Byte for value.
bool c; //< 1 Byte for value, 2 more for Byte Padding
int b; //< 4 Bytes.
}; // SUM = 8 bytes
I've never tried writing anything that used bit-packing. Bit-packing is a technique where developers can specify how many bits to use in a data type. The example that follows was inspired by [3] but as the example progressed I asked a question, typed it into the compiler, then collected the answer. The first example I have is:
#include <iostream>
struct D
{
unsigned char a : 1; // sum = 1
unsigned char b : 2; // sum = 3
unsigned char c : 2; // sum = 5
unsigned char d : 3; // sum = 8
};
int main(int argc, char** argv)
{
std::cout << "sizeof(D) = " << sizeof(D) << std::endl;
}
The result ends up being:
sizeof(D) = 1
So why 1 and not 4? Try adding another member to the structure:
struct D
{
unsigned char a : 1; // sum = 1
unsigned char b : 2; // sum = 3
unsigned char c : 2; // sum = 5
unsigned char d : 3; // sum = 8
unsigned char e : 4; // sum = 12
};
sizeof(D) = 2
So why 2 now instead of 4? Member alignment is confusing. I understand why this is 2: we want to align on a byte boundary. Even though we only use 4 bits of the second byte in D, we need to align on a byte boundary.
Then I tried the following:
struct D
{
unsigned char a : 1; // sum = 1 bit
unsigned char b : 2; // sum = 3 bits
unsigned char c : 2; // sum = 5 bits
unsigned char d : 3; // sum = 8 bits
int e; // sum = (32 bits + 8 bits) == 40 bits == 5 bytes
};
sizeof(D) = 8
That's what I would have expected. The first four characters are allocated into an unsigned char which is 1 Byte. Add 3 Bytes of padding. Then e will take up another 4 bytes. Ultimately the total is 5 bytes and if we are allocating based on a 4 byte boundary the total becomes 8 bytes.
That was a lot of work! It still doesn't answer the question as to why structures with bit packed unsigned char values only take up 1 byte. I tried the following:
struct E
{
unsigned char a;
};
sizeof(E) = 1
Then I tried:
struct F
{
unsigned short a;
};
sizeof(F) = 2;
Finally I tried:
struct G
{
unsigned short a;
unsigned short b;
unsigned short c;
};
sizeof(G) = 6;
I guess the theory I'm rolling with is that we always align based on the largest type in a structure. To test the theory, we should be able to make the largest size a short with two other char's:
struct H
{
unsigned char a;
unsigned short b;
unsigned char c;
};
Drum-Roll.... Size is 6! So then I tried something a bit more unusual (and probably warrants its own post):
#include <iostream>
struct I
{
char a;
long long b;
char c;
};
int main(int argc, char** argv)
{
std::cout << "sizeof(long long) = " << sizeof(long long) << std::endl;
std::cout << "sizeof(I) = " << sizeof(I) << std::endl;
}
This breaks the pattern:
sizeof(long long) = 8
sizeof(I) = 16
My guess that I will try to back up in another post is that a long long is actually just aligned using two long's (which would only make sense). This post has already gone too far- it just flies when you are having fun.
References
[1] - http://www.devx.com/cplus/Article/16328
[2] - http://www.cplusplus.com/doc/tutorial/variables/
[3] - http://cplus.about.com/od/learningc/ss/lowlevel_10.htm
No comments:
Post a Comment