SL: The Standard Library

Using only the bare language, every task is tedious (in any language). Using a suitable library, any task can be reasonably simple.

The standard library has steadily grown over the years. Its description in the standard is now larger than that of the language features. So, it is likely that this library section of the guidelines will eventually grow in size to equal or exceed all the rest.

C++ Standard Library component summary:

Standard-library rule summary:

SL.1: Use libraries wherever possible

Reason

Save time. Don't re-invent the wheel. Don't replicate the work of others. Benefit from other people's work when they make improvements. Help other people when you make improvements.

SL.2: Prefer the standard library to other libraries

Reason

More people know the standard library. It is more likely to be stable, well-maintained, and widely available than your own code or most other libraries.

SL.3: Do not add non-standard entities to namespace std

Reason

Adding to std might change the meaning of otherwise standards conforming code. Additions to std might clash with future versions of the standard.

Example
???
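One way to stay within this rule is to keep customization functions next to your own type and let argument-dependent lookup find them, rather than adding overloads to std. The following is a minimal sketch under that assumption; the namespace `geometry`, the type `Point`, and the helper `exchange_values` are hypothetical names used only for illustration.

```cpp
#include <utility>

namespace geometry {   // hypothetical user namespace

struct Point {
    int x = 0;
    int y = 0;
};

// Declared beside Point, not inside namespace std.
void swap(Point& a, Point& b) noexcept
{
    std::swap(a.x, b.x);
    std::swap(a.y, b.y);
}

} // namespace geometry

// Generic code enables std::swap as a fallback and lets
// argument-dependent lookup pick geometry::swap for Point.
template <class T>
void exchange_values(T& a, T& b)
{
    using std::swap;
    swap(a, b);
}
```

Generic code that follows the `using std::swap; swap(a, b);` pattern picks up `geometry::swap` without anything being added to std.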
Enforcement

Possible, but messy and likely to cause problems with platforms.

SL.4: Use the standard library in a type-safe manner

Reason

Because, obviously, breaking this rule can lead to undefined behavior, memory corruption, and all kinds of other bad errors.

Note

This is a semi-philosophical meta-rule, which needs many supporting concrete rules. We need it as an umbrella for the more specific rules.

Summary of more specific rules:

SL.con: Containers

???

Container rule summary:

SL.con.1: Prefer using STL array or vector instead of a C array

Reason

C arrays are less safe, and have no advantages over array and vector. For a fixed-length array, use std::array, which does not degenerate to a pointer when passed to a function and does know its size. Also, like a built-in array, a stack-allocated std::array keeps its elements on the stack. For a variable-length array, use std::vector, which additionally can change its size and handles memory allocation.

Example
int v[SIZE]; // BAD
std::array<int, SIZE> w; // ok
Example
int* v = new int[initial_size]; // BAD, owning raw pointer
delete[] v; // BAD, manual delete
std::vector<int> w(initial_size); // ok
Note

Use gsl::span for non-owning references into a container.

Note

Comparing the performance of a fixed-sized array allocated on the stack against a vector with its elements on the free store is bogus. You could just as well compare a std::array on the stack against the result of a malloc() accessed through a pointer. For most code, even the difference between stack allocation and free-store allocation doesn't matter, but the convenience and safety of vector does. People working with code for which that difference matters are quite capable of choosing between array and vector.

Enforcement
  • Flag declaration of a C array inside a function or class that also declares an STL container (to avoid excessive noisy warnings on legacy non-STL code). To fix: At least change the C array to a std::array.

SL.con.2: Prefer using STL vector by default unless you have a reason to use a different container

Reason

vector and array are the only standard containers that offer the following advantages:

  • the fastest general-purpose access (random access, including being vectorization-friendly);
  • the fastest default access pattern (begin-to-end or end-to-begin is prefetcher-friendly);
  • the lowest space overhead (contiguous layout has zero per-element overhead, which is cache-friendly).

Usually you need to add and remove elements from the container, so use vector by default; if you don't need to modify the container's size, use array.

Even when other containers seem more suited, such as map for O(log N) lookup performance or a list for efficient insertion in the middle, a vector will usually still perform better for containers up to a few KB in size.
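As a sketch of using a vector where a map might first come to mind, the function below assumes the key/value pairs are kept sorted by key and looks them up with std::lower_bound. The names `Entry` and `find_value` are hypothetical, and whether this beats a map for a given workload still has to be measured.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;   // hypothetical key/value pair

// Returns a pointer to the value stored under key, or nullptr if absent.
// Precondition: entries is sorted by key in ascending order.
const std::string* find_value(const std::vector<Entry>& entries, int key)
{
    auto it = std::lower_bound(entries.begin(), entries.end(), key,
        [](const Entry& e, int k) { return e.first < k; });
    if (it != entries.end() && it->first == key)
        return &it->second;
    return nullptr;
}
```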

Note

string should not be used as a container of individual characters. A string is a textual string; if you want a container of characters, use vector</*char_type*/> or array</*char_type*/> instead.

Exceptions

If you have a good reason to use another container, use that instead. For example:

  • If vector suits your needs but you don't need the container to be variable size, use array instead.

  • If you want a dictionary-style lookup container that guarantees O(K) or O(log N) lookups, the container will be larger (more than a few KB) and you perform frequent inserts so that the overhead of maintaining a sorted vector is infeasible, go ahead and use an unordered_map or map instead.

Note

To initialize a vector with a number of elements, use ()-initialization. To initialize a vector with a list of elements, use {}-initialization.

vector<int> v1(20); // v1 has 20 elements with the value 0 (vector<int>{})
vector<int> v2 {20}; // v2 has 1 element with the value 20

(Prefer the {}-initializer syntax).

Enforcement
  • Flag a vector whose size never changes after construction (such as because it's const or because no non-const functions are called on it). To fix: Use an array instead.

SL.con.3: Avoid bounds errors

Reason

Reading or writing beyond an allocated range of elements typically leads to bad errors, wrong results, crashes, and security violations.

Note

The standard-library functions that apply to ranges of elements all have (or could have) bounds-safe overloads that take span. Standard types such as vector can be modified to perform bounds-checks under the bounds profile (in a compatible way, such as by adding contracts), or used with at().

Ideally, the in-bounds guarantee should be statically enforced. For example:

  • a range-for cannot loop beyond the range of the container to which it is applied
  • a v.begin(),v.end() is easily determined to be bounds safe

Such loops are as fast as any unchecked/unsafe equivalent.

Often a simple pre-check can eliminate the need for checking of individual indices. For example

  • for v.begin(),v.begin()+i the i can easily be checked against v.size()

Such loops can be much faster than individually checked element accesses.
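The pre-check idea can be sketched as follows: one comparison against v.size() validates the whole range, after which the loop itself needs no per-element checks. The function name `sum_first` is hypothetical, and throwing std::out_of_range is just one possible reaction to a violated pre-check.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Sums the first n elements of v.
// The single check on n covers every element access that follows.
int sum_first(const std::vector<int>& v, std::size_t n)
{
    if (n > v.size())
        throw std::out_of_range("sum_first: n exceeds v.size()");
    return std::accumulate(v.begin(),
                           v.begin() + static_cast<std::ptrdiff_t>(n), 0);
}
```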

Example, bad
void f()
{
array<int, 10> a, b;
memset(a.data(), 0, 10); // BAD, and contains a length error (length = 10 * sizeof(int))
memcmp(a.data(), b.data(), 10); // BAD, and contains a length error (length = 10 * sizeof(int))
}

Also, std::array<>::fill() or std::fill() or even an empty initializer are better candidates than memset().

Example, good
void f()
{
array<int, 10> a, b, c{}; // c is initialized to zero
a.fill(0);
fill(b.begin(), b.end(), 0); // std::fill()
fill(b, 0); // std::ranges::fill()
if ( a == b ) {
// ...
}
}
Example

If code is using an unmodified standard library, then there are still workarounds that enable use of std::array and std::vector in a bounds-safe manner. Code can call the .at() member function on each class, which will result in a std::out_of_range exception being thrown on a bounds violation. Alternatively, code can call the at() free function, which will result in fail-fast (or a customized action) on a bounds violation.

void f(std::vector<int>& v, std::array<int, 12> a, int i)
{
v[0] = a[0]; // BAD
v.at(0) = a[0]; // OK (alternative 1)
at(v, 0) = a[0]; // OK (alternative 2)
v.at(0) = a[i]; // BAD
v.at(0) = a.at(i); // OK (alternative 1)
v.at(0) = at(a, i); // OK (alternative 2)
}
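To make the behavior of the .at() member concrete, here is a small sketch that catches the std::out_of_range exception and substitutes a default. The helper `value_or` is a hypothetical name; it exists only to show the exception, and in real code an explicit size check is usually the simpler choice.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns v[i] if i is in range, otherwise fallback.
int value_or(const std::vector<int>& v, std::size_t i, int fallback)
{
    try {
        return v.at(i);             // throws std::out_of_range if i >= v.size()
    }
    catch (const std::out_of_range&) {
        return fallback;
    }
}
```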
Enforcement
  • Issue a diagnostic for any call to a standard-library function that is not bounds-checked. ??? insert link to a list of banned functions

This rule is part of the bounds profile.

SL.con.4: don't use memset or memcpy for arguments that are not trivially-copyable

Reason

Doing so messes up the semantics of the objects (e.g., by overwriting a vptr).

Note

Similarly for (w)memset, (w)memcpy, (w)memmove, and (w)memcmp.

Example
struct base {
virtual void update() = 0;
};
struct derived : public base {
void update() override {}
};
void f(derived& a, derived& b) // goodbye v-tables
{
memset(&a, 0, sizeof(derived));
memcpy(&a, &b, sizeof(derived));
memcmp(&a, &b, sizeof(derived));
}

Instead, define proper default initialization, copy, and comparison functions:

void g(derived& a, derived& b)
{
a = {}; // default initialize
b = a; // copy
if (a == b) do_something(a, b);
}
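For the member-wise alternative to compile, the type needs suitable operations. The sketch below uses hypothetical types `Shape` and `Rect` (not the `base` and `derived` above) with default member initializers and a hand-written operator==, and uses std::is_trivially_copyable_v (C++17) to confirm at compile time that raw memory functions are off limits for it.

```cpp
#include <type_traits>

struct Shape {                       // hypothetical polymorphic base
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Rect : Shape {                // hypothetical derived type
    double w = 0;
    double h = 0;

    Rect() = default;
    Rect(double width, double height) : w{width}, h{height} {}

    double area() const override { return w * h; }

    friend bool operator==(const Rect& a, const Rect& b)
    {
        return a.w == b.w && a.h == b.h;
    }
};

// A type with virtual functions is not trivially copyable,
// so memset/memcpy/memcmp must not be applied to it.
static_assert(!std::is_trivially_copyable_v<Rect>);

void reset_and_copy(Rect& a, Rect& b)
{
    a = {};   // assign from a value-initialized temporary
    b = a;    // copy assignment
}
```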
Enforcement
  • Flag the use of those functions for types that are not trivially copyable

TODO Notes:

  • Impact on the standard library will require close coordination with WG21, if only to ensure compatibility even if never standardized.
  • We are considering specifying bounds-safe overloads for stdlib (especially C stdlib) functions like memcmp and shipping them in the GSL.
  • For existing stdlib functions and types like vector that are not fully bounds-checked, the goal is for these features to be bounds-checked when called from code with the bounds profile on, and unchecked when called from legacy code, possibly using contracts (concurrently being proposed by several WG21 members).

SL.str: String

Text manipulation is a huge topic. std::string doesn't cover all of it. This section primarily tries to clarify std::string's relation to char*, zstring, string_view, and gsl::span<char>. The important issue of non-ASCII character sets and encodings (e.g., wchar_t, Unicode, and UTF-8) will be covered elsewhere.

See also: regular expressions

Here, we use "sequence of characters" or "string" to refer to a sequence of characters meant to be read as text (somehow, eventually). We don't consider ???

String summary:

See also:

SL.str.1: Use std::string to own character sequences

Reason

string correctly handles allocation, ownership, copying, gradual expansion, and offers a variety of useful operations.

Example
vector<string> read_until(const string& terminator)
{
vector<string> res;
for (string s; cin >> s && s != terminator; ) // read a word
res.push_back(s);
return res;
}

Note how >> and != are provided for string (as examples of useful operations) and there are no explicit allocations, deallocations, or range checks (string takes care of those).

In C++17, we might use string_view as the argument, rather than const string& to allow more flexibility to callers:

vector<string> read_until(string_view terminator) // C++17
{
vector<string> res;
for (string s; cin >> s && s != terminator; ) // read a word
res.push_back(s);
return res;
}
Example, bad

Don't use C-style strings for operations that require non-trivial memory management

char* cat(const char* s1, const char* s2) // beware!
// return s1 + '.' + s2
{
int l1 = strlen(s1);
int l2 = strlen(s2);
char* p = (char*) malloc(l1 + l2 + 2);
strncpy(p, s1, l1);
p[l1] = '.';
strncpy(p + l1 + 1, s2, l2);
p[l1 + l2 + 1] = 0;
return p;
}

Did we get that right? Will the caller remember to free() the returned pointer? Will this code pass a security review?
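For comparison, here is a sketch of the same operation written with std::string. It assumes C++17 so that the parameters can be std::string_view; the function keeps the name `cat` only to mirror the C-style version.

```cpp
#include <string>
#include <string_view>

// Returns s1 + '.' + s2; the result owns its memory.
std::string cat(std::string_view s1, std::string_view s2)
{
    std::string res;
    res.reserve(s1.size() + s2.size() + 1);   // optional: one allocation up front
    res.append(s1);
    res += '.';
    res.append(s2);
    return res;
}
```

There is no length arithmetic to get wrong and nothing for the caller to free.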

Note

Do not assume that string is slower than lower-level techniques without measurement and remember that not all code is performance critical. (Don't optimize prematurely)

Enforcement

???

SL.str.2: Use std::string_view or gsl::span<char> to refer to character sequences

Reason

std::string_view or gsl::span<char> provides simple and (potentially) safe access to character sequences independently of how those sequences are allocated and stored.

Example
vector<string> read_until(string_view terminator);
void user(zstring p, const string& s, string_view ss)
{
auto v1 = read_until(p);
auto v2 = read_until(s);
auto v3 = read_until(ss);
// ...
}
Note

std::string_view (C++17) is read-only.

Enforcement

???

SL.str.3: Use zstring or czstring to refer to a C-style, zero-terminated, sequence of characters

Reason

Readability. Statement of intent. A plain char* can be a pointer to a single character, a pointer to an array of characters, a pointer to a C-style (zero-terminated) string, or even to a small integer. Distinguishing these alternatives prevents misunderstandings and bugs.

Example
void f1(const char* s); // s is probably a string

All we know is that it is supposed to be the nullptr or point to at least one character

void f1(zstring s); // s is a C-style string or the nullptr
void f1(czstring s); // s is a C-style string constant or the nullptr
void f1(std::byte* s); // s is a pointer to a byte (C++17)
Note

Don't convert a C-style string to string unless there is a reason to.

Note

Like any other "plain pointer", a zstring should not represent ownership.

Note

There are billions of lines of C++ "out there"; most use char* and const char* without documenting intent. They are used in a wide variety of ways, including to represent ownership and as generic pointers to memory (instead of void*). It is hard to separate these uses, so this guideline is hard to follow. This is one of the major sources of bugs in C and C++ programs, so it is worthwhile to follow this guideline wherever feasible.

Enforcement
  • Flag uses of [] on a char*
  • Flag uses of delete on a char*
  • Flag uses of free() on a char*

SL.str.4: Use char* to refer to a single character

Reason

The variety of uses of char* in current code is a major source of errors.

Example, bad
char arr[] = {'a', 'b', 'c'};
void print(const char* p)
{
cout << p << '\n';
}
void use()
{
print(arr); // run-time error; potentially very bad
}

The array arr is not a C-style string because it is not zero-terminated.

Alternative

See zstring, string, and string_view.
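As a sketch of the string_view alternative (C++17), a view built from a pointer and an explicit length never searches for a terminating zero, so an array like arr can be handled safely. The helper names `view_of_arr` and `copy_text` are hypothetical.

```cpp
#include <string>
#include <string_view>

const char arr[] = {'a', 'b', 'c'};   // not zero-terminated

// Builds a view from a pointer and an explicit length.
std::string_view view_of_arr()
{
    return std::string_view(arr, sizeof arr);
}

// Copies exactly the characters the view refers to.
std::string copy_text(std::string_view text)
{
    return std::string(text);
}
```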

Enforcement
  • Flag uses of [] on a char*

SL.str.5: Use std::byte to refer to byte values that do not necessarily represent characters

Reason

Use of char* to represent a pointer to something that is not necessarily a character causes confusion and disables valuable optimizations.

Example
???
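A minimal sketch of what this can look like, assuming C++17: the buffer is declared as a sequence of std::byte, so the type itself says that the data is raw memory rather than text. The function name `xor_all` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// XORs all bytes together; the data is treated as raw memory, not as text.
std::byte xor_all(const std::vector<std::byte>& data)
{
    std::byte result{0};
    for (std::byte b : data)
        result ^= b;
    return result;
}
```

std::byte supports only bitwise operations; converting to an arithmetic type requires an explicit std::to_integer call.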
Note

C++17

Enforcement

???

SL.str.10: Use std::string when you need to perform locale-sensitive string operations

Reason

std::string supports standard-library locale facilities.

Example
???
Note

???

Enforcement

???

SL.str.11: Use gsl::span<char> rather than std::string_view when you need to mutate a string

Reason

std::string_view is read-only.

Example

???

Note

???

Enforcement

The compiler will flag attempts to write to a string_view.

SL.str.12: Use the s suffix for string literals meant to be standard-library strings

Reason

Direct expression of an idea minimizes mistakes.

Example
auto pp1 = make_pair("Tokyo", 9.00); // {C-style string,double} intended?
pair<string, double> pp2 = {"Tokyo", 9.00}; // a bit verbose
auto pp3 = make_pair("Tokyo"s, 9.00); // {std::string,double} // C++14
pair pp4 = {"Tokyo"s, 9.00}; // {std::string,double} // C++17
Enforcement

???

SL.io: Iostream

iostreams is a type safe, extensible, formatted and unformatted I/O library for streaming I/O. It supports multiple (and user extensible) buffering strategies and multiple locales. It can be used for conventional I/O, reading and writing to memory (string streams), and user-defined extensions, such as streaming across networks (asio: not yet standardized).

Iostream rule summary:

SL.io.1: Use character-level input only when you have to

Reason

Unless you genuinely just deal with individual characters, using character-level input leads to the user code performing potentially error-prone and potentially inefficient composition of tokens out of characters.

Example
char c;
char buf[128];
int i = 0;
while (cin.get(c) && !isspace(c) && i < 128)
buf[i++] = c;
if (i == 128) {
// ... handle too long string ....
}

Better (much simpler and probably faster):

string s;
s.reserve(128);
cin >> s;

and the reserve(128) is probably not worthwhile.

Enforcement

???

SL.io.2: When reading, always consider ill-formed input

Reason

Errors are typically best handled as soon as possible. If input isn't validated, every function must be written to cope with bad data (and that is not practical).

Example
???
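One possible shape for such validation, sketched under the assumption that a whole piece of text must be exactly one integer: check the stream state after the read and reject trailing characters. The function name `parse_int` and the use of std::optional (C++17) to report failure are illustrative choices, not something the guidelines prescribe.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Returns the integer in text, or std::nullopt if text is not exactly
// one integer (optionally surrounded by whitespace).
std::optional<int> parse_int(const std::string& text)
{
    std::istringstream in{text};
    int value = 0;
    if (!(in >> value))
        return std::nullopt;    // not a number, or out of range for int
    char extra;
    if (in >> extra)
        return std::nullopt;    // trailing non-whitespace characters
    return value;
}
```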
Enforcement

???

SL.io.3: Prefer iostreams for I/O

Reason

iostreams are safe, flexible, and extensible.

Example
// write a complex number:
complex<double> z{ 3, 4 };
cout << z << '\n';

complex is a user-defined type and its I/O is defined without modifying the iostream library.
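The same extension mechanism is available for your own types. The sketch below assumes a hypothetical `Point` type and a hypothetical helper `to_text`; it defines operator<< as a free function, again without touching the iostream library.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct Point {   // hypothetical user-defined type
    int x = 0;
    int y = 0;
};

// Output operator for Point; works with any std::ostream.
std::ostream& operator<<(std::ostream& os, const Point& p)
{
    return os << '(' << p.x << ',' << p.y << ')';
}

// Formats a Point through a string stream.
std::string to_text(const Point& p)
{
    std::ostringstream out;
    out << p;
    return out.str();
}
```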

Example
// read a file of complex numbers:
for (complex<double> z; cin >> z; )
v.push_back(z);
Exception

??? performance ???

Discussion: iostreams vs. the printf() family

It is often (and often correctly) pointed out that the printf() family has two advantages compared to iostreams: flexibility of formatting and performance. This has to be weighed against iostreams' advantages of extensibility to handle user-defined types, resilience against security violations, implicit memory management, and locale handling.

If you need I/O performance, you can almost always do better than printf().

gets(), scanf() using %s, and printf() using %s are security hazards (vulnerable to buffer overflow and generally error-prone). C11 defines some "optional extensions" that do extra checking of their arguments. If present in your C library, gets_s(), scanf_s(), and printf_s() might be safer alternatives, but they are still not type safe.

Enforcement

Optionally flag <cstdio> and <stdio.h>.

SL.io.10: Unless you use printf-family functions, call ios_base::sync_with_stdio(false)

Reason

Synchronizing iostreams with printf-style I/O can be costly. cin and cout are by default synchronized with printf.

Example
int main()
{
ios_base::sync_with_stdio(false);
// ... use iostreams ...
}
Enforcement

???

SL.io.50: Avoid endl

Reason

The endl manipulator is mostly equivalent to '\n' and "\n"; as most commonly used it simply slows down output by doing redundant flush()s. This slowdown can be significant compared to printf-style output.

Example
cout << "Hello, World!" << endl; // two output operations and a flush
cout << "Hello, World!\n"; // one output operation and no flush
Note

For cin/cout (and equivalent) interaction, there is no reason to flush; that's done automatically. For writing to a file, there is rarely a need to flush.

Note

For string streams (specifically ostringstream), the insertion of an endl is entirely equivalent to the insertion of a '\n' character, but also in this case, endl might be significantly slower.

endl does not take care of producing a platform specific end-of-line sequence (like "\r\n" on Windows). So for a string stream, s << endl just inserts a single character, '\n'.
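This is easy to check with a string stream. The two hypothetical helpers below write the same text, one ending with endl and one with '\n', and return the resulting strings.

```cpp
#include <ostream>
#include <sstream>
#include <string>

std::string line_with_endl()
{
    std::ostringstream s;
    s << "abc" << std::endl;   // inserts '\n', then flushes
    return s.str();
}

std::string line_with_newline()
{
    std::ostringstream s;
    s << "abc" << '\n';        // inserts '\n' only
    return s.str();
}
```

Both functions return the same four characters; the only difference is the extra flush.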

Note

Apart from the (occasionally important) issue of performance, the choice between '\n' and endl is almost completely aesthetic.

SL.regex: Regex

<regex> is the standard C++ regular expression library. It supports a variety of regular expression pattern conventions.
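As a small sketch of the library in use, the function below matches a whole string against a pattern written in the default (ECMAScript) grammar. The function name `is_us_zip_code` and the chosen pattern are illustrative assumptions.

```cpp
#include <regex>
#include <string>

// True if text is five digits, optionally followed by '-' and four digits.
bool is_us_zip_code(const std::string& text)
{
    static const std::regex pattern{R"(\d{5}(-\d{4})?)"};
    return std::regex_match(text, pattern);   // the whole string must match
}
```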

SL.chrono: Time

<chrono> (defined in namespace std::chrono) provides the notions of time_point and duration together with functions for outputting time in various units. It provides clocks for registering time_points.
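A brief sketch of durations, time_points, and a clock working together; the function names `to_milliseconds` and `time_call` are hypothetical, and steady_clock is chosen here because it is monotonic.

```cpp
#include <chrono>

// Converts a duration in seconds to a count of milliseconds.
long long to_milliseconds(std::chrono::seconds s)
{
    return std::chrono::duration_cast<std::chrono::milliseconds>(s).count();
}

// Measures how long a callable takes, using two time_points from steady_clock.
template <class F>
std::chrono::nanoseconds time_call(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
}
```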

SL.C: The C Standard Library

???

C Standard Library rule summary:

SL.C.1: Don't use setjmp/longjmp

Reason

A longjmp ignores destructors, thus invalidating all resource-management strategies relying on RAII.

Enforcement

Flag all occurrences of longjmp and setjmp.