In the C++ STL, why does a vector grow by a factor of 2 each time it runs out of memory, rather than 3 or some other value?
folly/FBVector.md at master · facebook/folly · GitHub
It is well known that std::vector grows exponentially (at a constant factor) in order to avoid quadratic growth performance. The trick is choosing a good factor (any factor greater than 1 ensures O(1) amortized append complexity towards infinity). A factor that's too small causes frequent vector reallocation; one that's too large forces the vector to consume much more memory than needed. The initial HP implementation by Stepanov used a growth factor of 2, i.e. whenever you'd push_back into a vector without there being room, it would double the current capacity.
With time, other compilers reduced the growth factor to 1.5, but gcc has staunchly used a growth factor of 2. In fact it can be mathematically proven that a growth factor of 2 is rigorously the worst possible because it never allows the vector to reuse any of its previously-allocated memory. That makes the vector cache-unfriendly and memory manager unfriendly.
The problem with a growth factor of k = 2 is that each new allocation is necessarily just larger than the sum of all previous allocations:
c \sum_{i=0}^n 2^i = c(2^{n+1} - 1) < c2^{n + 1}
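In other words, the memory released by earlier allocations can never be reused, which is unfriendly to the cache. It is better to pick a growth factor with 1 < k < 2. Folly, for example, uses 1.5, and RapidJSON follows suit with 1.5: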
    GenericValue& PushBack(GenericValue& value, Allocator& allocator) {
        RAPIDJSON_ASSERT(IsArray());
        if (data_.a.size >= data_.a.capacity)
            Reserve(data_.a.capacity == 0 ? kDefaultArrayCapacity : (data_.a.capacity + (data_.a.capacity + 1) / 2), allocator);
        data_.a.elements[data_.a.size++].RawAssign(value);
        return *this;
    }
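To make the growth expression in PushBack easier to follow, here is a minimal sketch that isolates just that arithmetic. The names next_capacity, capacity_sequence and kAssumedDefaultCapacity are hypothetical and not part of RapidJSON, and the value 16 is only an assumed stand-in for kDefaultArrayCapacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed value; stands in for RapidJSON's kDefaultArrayCapacity.
constexpr std::size_t kAssumedDefaultCapacity = 16;

// Same arithmetic as the argument passed to Reserve() in PushBack above.
constexpr std::size_t next_capacity(std::size_t capacity) {
    return capacity == 0 ? kAssumedDefaultCapacity
                         : capacity + (capacity + 1) / 2;
}

// Hypothetical helper: the first `steps` capacities, starting from `start`.
std::vector<std::size_t> capacity_sequence(std::size_t start, int steps) {
    assert(steps >= 0);
    std::vector<std::size_t> out;
    std::size_t cap = start;
    for (int i = 0; i < steps; ++i) {
        out.push_back(cap);
        cap = next_capacity(cap);
    }
    return out;
}
```

Starting from 4, capacity_sequence(4, 5) yields 4, 6, 9, 14, 21. The (capacity + 1) / 2 term rounds the half up, so a capacity of 1 still grows to 2, whereas capacity + capacity / 2 would stay stuck at 1.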
Compare the memory allocation patterns:
k = 2, c = 4
0123
    01234567
            0123456789ABCDEF
                            0123456789ABCDEF0123456789ABCDEF
                                                            012345...
k = 1.5, c = 4
0123
    012345
          012345678
                   0123456789ABCD
                                 0123456789ABCDEF0123
                                                     0123456789ABCDEF0123456789ABCD
0123456789ABCDEF0123456789ABCDEF...
As you can see, with k = 1.5 the vector can reuse previously released memory after a few expansions.
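The comparison above can be checked with a small simulation. This is only a sketch under a simplified model: blocks are laid out back to back, released blocks merge into one free region at the front, and the old block is released only after its contents are copied to the new one. The function name first_reuse_step and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Returns the 1-based growth step at which the new block first fits into the
// space released by earlier blocks, or -1 if that never happens within
// max_steps. The factor is passed as the fraction num/den so that the
// arithmetic stays integral (the division truncates).
int first_reuse_step(std::uint64_t initial, std::uint64_t num,
                     std::uint64_t den, int max_steps = 40) {
    assert(initial > 0 && den > 0 && num > den);
    std::uint64_t freed = 0;          // total size of blocks already released
    std::uint64_t current = initial;  // block that is live right now
    for (int step = 1; step <= max_steps; ++step) {
        const std::uint64_t next = current * num / den;
        if (next <= freed) return step;  // new block fits in the free region
        freed += current;  // old block is released only after the copy
        current = next;
    }
    return -1;
}
```

In this model first_reuse_step(4, 2, 1) returns -1, matching the inequality for k = 2, while first_reuse_step(4, 3, 2) finds a fit after a few steps. The exact step depends on how sizes are rounded, so it need not match the diagram line for line, and a real allocator may place blocks quite differently.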
In fact, the C++ standard does not specify which growth factor vector::push_back() must use. That choice is left to the standard library implementers.
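Since the factor is implementation-defined, one way to find out what a given standard library does is to watch capacity() while appending. The sketch below is one possible way to do that; the helper name observed_capacities is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Records every distinct capacity the vector takes on while n elements
// are appended one at a time with push_back.
std::vector<std::size_t> observed_capacities(std::size_t n) {
    std::vector<int> v;
    std::vector<std::size_t> caps;
    std::size_t last = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last) {  // a reallocation just happened
            last = v.capacity();
            caps.push_back(last);
        }
    }
    assert(v.size() == n);
    return caps;
}
```

Dividing each recorded capacity by the previous one gives an estimate of the growth factor used by the library you compiled against. The result can differ between compilers and library versions, so treat it as an observation rather than a guarantee.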