SL: The Standard Library
Using only the bare language, every task is tedious (in any language). Using a suitable library any task can be reasonably simple.
The standard library has steadily grown over the years. Its description in the standard is now larger than that of the language features. So, it is likely that this library section of the guidelines will eventually grow in size to equal or exceed all the rest.
C++ Standard Library component summary:
- (SL.con: Containers)
- (SL.str: String)
- (SL.io: Iostream)
- (SL.regex: Regex)
- (SL.chrono: Time)
- (SL.C: The C Standard Library)
Standard-library rule summary:
- (SL.1: Use libraries wherever possible)
- (SL.2: Prefer the standard library to other libraries)
- (SL.3: Do not add non-standard entities to namespace std)
- (SL.4: Use the standard library in a type-safe manner)
- ???
SL.1: Use libraries wherever possible
Reason
Save time. Don't re-invent the wheel. Don't replicate the work of others. Benefit from other people's work when they make improvements. Help other people when you make improvements.
SL.2: Prefer the standard library to other libraries
Reason
More people know the standard library. It is more likely to be stable, well-maintained, and widely available than your own code or most other libraries.
SL.3: Do not add non-standard entities to namespace std
Reason
Adding to std might change the meaning of otherwise standards-conforming code.
Additions to std might clash with future versions of the standard.
Example
???
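As an illustration, here is a minimal sketch of one way to customize behavior for a user-defined type without touching namespace std: the customization lives in the type's own namespace and is found by argument-dependent lookup. The names geo, Point, and exchange_values are hypothetical and chosen only for this sketch.

```cpp
#include <utility>

namespace geo {   // hypothetical user namespace

struct Point {
    int x = 0;
    int y = 0;
};

// The customization lives next to the type, not in namespace std.
// It is found by argument-dependent lookup (ADL).
void swap(Point& a, Point& b) noexcept
{
    std::swap(a.x, b.x);
    std::swap(a.y, b.y);
}

} // namespace geo

// BAD (do not do this): adding a new overload to namespace std is undefined behavior.
// namespace std { void swap(geo::Point&, geo::Point&); }

// Generic code enables ADL with a using-declaration.
template<class T>
void exchange_values(T& a, T& b)
{
    using std::swap;   // fall back to std::swap if no better match exists
    swap(a, b);        // picks geo::swap for geo::Point
}
```

Calling exchange_values on two geo::Point objects swaps them through geo::swap, while the same template still works for built-in types through std::swap.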
Enforcement
Possible, but messy and likely to cause problems with platforms.
SL.4: Use the standard library in a type-safe manner
Reason
Because, obviously, breaking this rule can lead to undefined behavior, memory corruption, and all kinds of other bad errors.
Note
This is a semi-philosophical meta-rule, which needs many supporting concrete rules. We need it as an umbrella for the more specific rules.
Summary of more specific rules:
SL.con: Containers
???
Container rule summary:
- (SL.con.1: Prefer using STL array or vector instead of a C array)
- (SL.con.2: Prefer using STL vector by default unless you have a reason to use a different container)
- (SL.con.3: Avoid bounds errors)
- (SL.con.4: don't use memset or memcpy for arguments that are not trivially-copyable)
SL.con.1: Prefer using STL array or vector instead of a C array
Reason
C arrays are less safe, and have no advantages over array and vector.
For a fixed-length array, use std::array, which does not degenerate to a pointer when passed to a function and does know its size.
Also, like a built-in array, a stack-allocated std::array keeps its elements on the stack.
For a variable-length array, use std::vector, which additionally can change its size and handles memory allocation.
Example
    int v[SIZE];                        // BAD

    std::array<int, SIZE> w;            // ok
Example
    int* v = new int[initial_size];     // BAD, owning raw pointer
    delete[] v;                         // BAD, manual delete

    std::vector<int> w(initial_size);   // ok
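To make the "knows its size" point concrete, the following sketch contrasts the three parameter styles. The function names sum_c, sum_array, and sum_vector are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

// A C array argument decays to a pointer, so the size must travel separately
// and nothing ties n to the actual array.
int sum_c(const int* p, std::size_t n)
{
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// std::array carries its size in its type.
template<std::size_t N>
int sum_array(const std::array<int, N>& a)
{
    return std::accumulate(a.begin(), a.end(), 0);
}

// std::vector carries its size at run time and owns its storage.
int sum_vector(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), 0);
}
```

Only sum_c can be called with a wrong length; the other two get the element count from the container itself.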
Note
Use gsl::span for non-owning references into a container.
Note
Comparing the performance of a fixed-sized array allocated on the stack against a vector with its elements on the free store is bogus.
You could just as well compare a std::array on the stack against the result of a malloc() accessed through a pointer.
For most code, even the difference between stack allocation and free-store allocation doesn't matter, but the convenience and safety of vector does.
People working with code for which that difference matters are quite capable of choosing between array and vector.
Enforcement
- Flag declaration of a C array inside a function or class that also declares an STL container (to avoid excessive noisy warnings on legacy non-STL code). To fix: At least change the C array to a std::array.
SL.con.2: Prefer using STL vector by default unless you have a reason to use a different container
Reason
vector and array are the only standard containers that offer the following advantages:
- the fastest general-purpose access (random access, including being vectorization-friendly);
- the fastest default access pattern (begin-to-end or end-to-begin is prefetcher-friendly);
- the lowest space overhead (contiguous layout has zero per-element overhead, which is cache-friendly).
Usually you need to add and remove elements from the container, so use vector by default; if you don't need to modify the container's size, use array.
Even when other containers seem more suited, such as map for O(log N) lookup performance or a list for efficient insertion in the middle, a vector will usually still perform better for containers up to a few KB in size.
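As a sketch of what using a vector in place of a map can look like, the code below keeps a vector sorted by key and looks entries up with binary search. The Entry alias and the function names are hypothetical, and whether this beats a map for a given workload should be measured rather than assumed.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;   // hypothetical key/value record

// Keep the vector sorted by key so lookups can use binary search.
void insert_sorted(std::vector<Entry>& table, Entry e)
{
    auto pos = std::lower_bound(table.begin(), table.end(), e,
        [](const Entry& a, const Entry& b) { return a.first < b.first; });
    table.insert(pos, std::move(e));
}

// O(log N) lookup over contiguous storage; returns nullptr if the key is absent.
const int* find_value(const std::vector<Entry>& table, const std::string& key)
{
    auto pos = std::lower_bound(table.begin(), table.end(), key,
        [](const Entry& a, const std::string& k) { return a.first < k; });
    if (pos != table.end() && pos->first == key) return &pos->second;
    return nullptr;
}
```

Insertion is O(N) because later elements must shift, which is the cost the Exceptions paragraph refers to when inserts are frequent and the container is large.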
Note
string should not be used as a container of individual characters. A string is a textual string; if you want a container of characters, use vector</*char_type*/> or array</*char_type*/> instead.
Exceptions
If you have a good reason to use another container, use that instead. For example:
- If vector suits your needs but you don't need the container to be variable size, use array instead.
- If you want a dictionary-style lookup container that guarantees O(K) or O(log N) lookups, the container will be larger (more than a few KB) and you perform frequent inserts so that the overhead of maintaining a sorted vector is infeasible, go ahead and use an unordered_map or map instead.
Note
To initialize a vector with a number of elements, use ()-initialization.
To initialize a vector with a list of elements, use {}-initialization.
    vector<int> v1(20);  // v1 has 20 elements with the value 0 (vector<int>{})
    vector<int> v2 {20}; // v2 has 1 element with the value 20
(Prefer the {}-initializer syntax).
Enforcement
- Flag a vector whose size never changes after construction (such as because it's const or because no non-const functions are called on it). To fix: Use an array instead.
SL.con.3: Avoid bounds errors
Reason
Reading or writing beyond an allocated range of elements typically leads to bad errors, wrong results, crashes, and security violations.
Note
The standard-library functions that apply to ranges of elements all have (or could have) bounds-safe overloads that take span.
Standard types such as vector can be modified to perform bounds-checks under the bounds profile (in a compatible way, such as by adding contracts), or used with at().
Ideally, the in-bounds guarantee should be statically enforced. For example:
- a range-for cannot loop beyond the range of the container to which it is applied
- a v.begin(),v.end() is easily determined to be bounds safe
Such loops are as fast as any unchecked/unsafe equivalent.
Often a simple pre-check can eliminate the need for checking of individual indices. For example:
- for v.begin(),v.begin()+i the i can easily be checked against v.size()
Such loops can be much faster than individually checked element accesses.
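A minimal sketch of such a pre-check follows: the count is validated once against v.size(), after which the range can be traversed without per-element checks. The function name sum_first and the choice to throw std::out_of_range are assumptions made for illustration.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Sum the first n elements. One check up front replaces a check per access.
int sum_first(const std::vector<int>& v, std::size_t n)
{
    if (n > v.size())
        throw std::out_of_range("sum_first: n exceeds v.size()");
    // [v.begin(), v.begin() + n) is now known to be in bounds.
    return std::accumulate(v.begin(),
                           v.begin() + static_cast<std::ptrdiff_t>(n),
                           0);
}
```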
Example, bad
    void f()
    {
        array<int, 10> a, b;
        memset(a.data(), 0, 10);         // BAD, and contains a length error (length = 10 * sizeof(int))
        memcmp(a.data(), b.data(), 10);  // BAD, and contains a length error (length = 10 * sizeof(int))
    }
Also, std::array<>::fill() or std::fill() or even an empty initializer are better candidates than memset().
Example, good
    void f()
    {
        array<int, 10> a, b, c{};       // c is initialized to zero
        a.fill(0);
        fill(b.begin(), b.end(), 0);    // std::fill()
        fill(b, 0);                     // std::ranges::fill()

        if ( a == b ) {
            // ...
        }
    }
Example
If code is using an unmodified standard library, then there are still workarounds that enable use of std::array and std::vector in a bounds-safe manner. Code can call the .at() member function on each class, which throws a std::out_of_range exception on a bounds violation. Alternatively, code can call the at() free function, which will result in fail-fast (or a customized action) on a bounds violation.
    void f(std::vector<int>& v, std::array<int, 12> a, int i)
    {
        v[0] = a[0];        // BAD
        v.at(0) = a[0];     // OK (alternative 1)
        at(v, 0) = a[0];    // OK (alternative 2)

        v.at(0) = a[i];     // BAD
        v.at(0) = a.at(i);  // OK (alternative 1)
        v.at(0) = at(a, i); // OK (alternative 2)
    }
Enforcement
- Issue a diagnostic for any call to a standard-library function that is not bounds-checked. ??? insert link to a list of banned functions
This rule is part of the (bounds profile).
SL.con.4: don't use memset or memcpy for arguments that are not trivially-copyable
Reason
Doing so messes up the semantics of the objects (e.g., by overwriting a vptr).
Note
Similarly for (w)memset, (w)memcpy, (w)memmove, and (w)memcmp
Example
    struct base {
        virtual void update() = 0;
    };

    struct derived : public base {
        void update() override {}
    };

    void f(derived& a, derived& b) // goodbye v-tables
    {
        memset(&a, 0, sizeof(derived));
        memcpy(&a, &b, sizeof(derived));
        memcmp(&a, &b, sizeof(derived));
    }
Instead, define proper default initialization, copy, and comparison functions
    void g(derived& a, derived& b)
    {
        a = {};    // default initialize
        b = a;     // copy
        if (a == b) do_something(a, b);
    }
Enforcement
- Flag the use of those functions for types that are not trivially copyable
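Until a tool enforces this, user code can approximate the check itself. The sketch below wraps memcpy in a hypothetical helper, checked_copy, that refuses to compile for types that are not trivially copyable; Pod and Shape are illustrative types, not part of any library.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical wrapper: compiles only for trivially copyable types.
template<class T>
void checked_copy(T& dst, const T& src)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "checked_copy requires a trivially copyable type");
    std::memcpy(&dst, &src, sizeof(T));
}

struct Pod {           // trivially copyable: a bytewise copy is fine
    int id;
    double weight;
};

struct Shape {         // has a virtual function, so it is not trivially copyable
    virtual ~Shape() = default;
    int sides = 0;
};

static_assert(std::is_trivially_copyable<Pod>::value,
              "Pod can be copied bytewise");
static_assert(!std::is_trivially_copyable<Shape>::value,
              "Shape must be copied with its copy operations");
```

A call such as checked_copy on two Shape objects would be rejected at compile time instead of silently overwriting the vptr.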
TODO Notes:
- Impact on the standard library will require close coordination with WG21, if only to ensure compatibility even if never standardized.
- We are considering specifying bounds-safe overloads for stdlib (especially C stdlib) functions like memcmp and shipping them in the GSL.
- For existing stdlib functions and types like vector that are not fully bounds-checked, the goal is for these features to be bounds-checked when called from code with the bounds profile on, and unchecked when called from legacy code, possibly using contracts (concurrently being proposed by several WG21 members).
SL.str: String
Text manipulation is a huge topic.
std::string doesn't cover all of it.
This section primarily tries to clarify std::string's relation to char*, zstring, string_view, and gsl::span<char>.
The important issue of non-ASCII character sets and encodings (e.g., wchar_t, Unicode, and UTF-8) will be covered elsewhere.
See also: (regular expressions)
Here, we use "sequence of characters" or "string" to refer to a sequence of characters meant to be read as text (somehow, eventually). We don't consider ???
String summary:
- (SL.str.1: Use std::string to own character sequences)
- (SL.str.2: Use std::string_view or gsl::span<char> to refer to character sequences)
- (SL.str.3: Use zstring or czstring to refer to a C-style, zero-terminated, sequence of characters)
- (SL.str.4: Use char* to refer to a single character)
- (SL.str.5: Use std::byte to refer to byte values that do not necessarily represent characters)
- (SL.str.10: Use std::string when you need to perform locale-sensitive string operations)
- (SL.str.11: Use gsl::span<char> rather than std::string_view when you need to mutate a string)
- (SL.str.12: Use the s suffix for string literals meant to be standard-library strings)
See also:
SL.str.1: Use std::string to own character sequences
Reason
string correctly handles allocation, ownership, copying, gradual expansion, and offers a variety of useful operations.
Example
    vector<string> read_until(const string& terminator)
    {
        vector<string> res;
        for (string s; cin >> s && s != terminator; ) // read a word
            res.push_back(s);
        return res;
    }
Note how >> and != are provided for string (as examples of useful operations) and there are no explicit allocations, deallocations, or range checks (string takes care of those).
In C++17, we might use string_view as the argument, rather than const string& to allow more flexibility to callers:
    vector<string> read_until(string_view terminator)   // C++17
    {
        vector<string> res;
        for (string s; cin >> s && s != terminator; ) // read a word
            res.push_back(s);
        return res;
    }
Example, bad
Don't use C-style strings for operations that require non-trivial memory management
    char* cat(const char* s1, const char* s2)   // beware!
        // return s1 + '.' + s2
    {
        int l1 = strlen(s1);
        int l2 = strlen(s2);
        char* p = (char*) malloc(l1 + l2 + 2);
        strncpy(p, s1, l1);
        p[l1] = '.';
        strncpy(p + l1 + 1, s2, l2);
        p[l1 + l2 + 1] = 0;
        return p;
    }
Did we get that right?
Will the caller remember to free() the returned pointer?
Will this code pass a security review?
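For comparison, a sketch of the same operation written with std::string, following the "return s1 + '.' + s2" comment in the C-style version. The function reuses the name cat only for the comparison.

```cpp
#include <string>

// Same result as the C-style version, with no manual allocation,
// no length arithmetic, and nothing for the caller to free.
std::string cat(const std::string& s1, const std::string& s2)
{
    return s1 + '.' + s2;
}
```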
Note
Do not assume that string is slower than lower-level techniques without measurement and remember that not all code is performance critical.
(Don't optimize prematurely)
Enforcement
???
SL.str.2: Use std::string_view or gsl::span<char> to refer to character sequences
Reason
std::string_view or gsl::span<char> provides simple and (potentially) safe access to character sequences independently of how those sequences are allocated and stored.
Example
    vector<string> read_until(string_view terminator);

    void user(zstring p, const string& s, string_view ss)
    {
        auto v1 = read_until(p);
        auto v2 = read_until(s);
        auto v3 = read_until(ss);
        // ...
    }
Note
std::string_view (C++17) is read-only.
Enforcement
???
SL.str.3: Use zstring or czstring to refer to a C-style, zero-terminated, sequence of characters
Reason
Readability.
Statement of intent.
A plain char* can be a pointer to a single character, a pointer to an array of characters, a pointer to a C-style (zero-terminated) string, or even to a small integer.
Distinguishing these alternatives prevents misunderstandings and bugs.
Example
    void f1(const char* s); // s is probably a string
All we know is that it is supposed to be the nullptr or point to at least one character
    void f1(zstring s);     // s is a C-style string or the nullptr
    void f1(czstring s);    // s is a C-style string constant or the nullptr
    void f1(std::byte* s);  // s is a pointer to a byte (C++17)
Note
Don't convert a C-style string to string unless there is a reason to.
Note
Like any other "plain pointer", a zstring should not represent ownership.
Note
There are billions of lines of C++ "out there"; most use char* and const char* without documenting intent.
They are used in a wide variety of ways, including to represent ownership and as generic pointers to memory (instead of void*).
It is hard to separate these uses, so this guideline is hard to follow.
This is one of the major sources of bugs in C and C++ programs, so it is worthwhile to follow this guideline wherever feasible.
Enforcement
- Flag uses of [] on a char*
- Flag uses of delete on a char*
- Flag uses of free() on a char*
SL.str.4: Use char* to refer to a single character
Reason
The variety of uses of char* in current code is a major source of errors.
Example, bad
    char arr[] = {'a', 'b', 'c'};

    void print(const char* p)
    {
        cout << p << '\n';
    }

    void use()
    {
        print(arr);   // run-time error; potentially very bad
    }
The array arr is not a C-style string because it is not zero-terminated.
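One possible repair, sketched below under the assumption that C++17 is available, is to pass a string_view built from the pointer and an explicit length, so no terminator is required. The helper names render and render_arr are hypothetical, and output goes to a string stream here only so the result can be inspected.

```cpp
#include <sstream>
#include <string>
#include <string_view>

const char arr[] = {'a', 'b', 'c'};   // not zero-terminated

// A string_view carries its length, so no terminator is needed.
std::string render(std::string_view sv)
{
    std::ostringstream out;
    out << sv << '\n';
    return out.str();
}

// Build the view from a pointer plus an explicit size.
std::string render_arr()
{
    return render(std::string_view(arr, sizeof arr));
}
```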
Alternative
See zstring, string, and string_view.
Enforcement
- Flag uses of [] on a char*
SL.str.5: Use std::byte to refer to byte values that do not necessarily represent characters
Reason
Use of char* to represent a pointer to something that is not necessarily a character causes confusion and disables valuable optimizations.
Example
???
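A possible illustration, assuming C++17: std::byte supports bitwise operations but not arithmetic or character semantics, so code that handles raw bytes states its intent in the type. The helper xor_checksum is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// XOR checksum over raw bytes (hypothetical helper for illustration).
std::uint8_t xor_checksum(const std::vector<std::byte>& data)
{
    std::byte acc{0};
    for (std::byte b : data)
        acc ^= b;                                 // bitwise operators are defined for std::byte
    return std::to_integer<std::uint8_t>(acc);    // numeric use requires an explicit conversion
}
```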
Note
C++17
Enforcement
???
SL.str.10: Use std::string when you need to perform locale-sensitive string operations
Reason
std::string supports standard-library locale facilities.
Example
???
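A possible illustration: std::locale can be used directly as a comparison object for std::string, and the two-argument std::toupper from <locale> converts characters according to a given locale. The function names sort_words and to_upper are hypothetical, and the sketch only handles single-byte characters.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Sort words using the collation rules of the given locale.
// std::locale::operator() compares two std::strings via the locale's collate facet.
void sort_words(std::vector<std::string>& words, const std::locale& loc)
{
    std::sort(words.begin(), words.end(), loc);
}

// Upper-case a string using the locale's ctype facet.
std::string to_upper(std::string s, const std::locale& loc)
{
    for (char& c : s)
        c = std::toupper(c, loc);   // the two-argument overload from <locale>
    return s;
}
```

Passing std::locale::classic() gives the "C" locale behavior; a named locale, where installed on the platform, changes ordering and case mapping without changes to the calling code.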
Note
???
Enforcement
???
SL.str.11: Use gsl::span<char> rather than std::string_view when you need to mutate a string
Reason
std::string_view is read-only.
Example
???
Note
???
Enforcement
The compiler will flag attempts to write to a string_view.
SL.str.12: Use the s suffix for string literals meant to be standard-library strings
Reason
Direct expression of an idea minimizes mistakes.
Example
    auto pp1 = make_pair("Tokyo", 9.00);         // {C-style string,double} intended?
    pair<string, double> pp2 = {"Tokyo", 9.00};  // a bit verbose
    auto pp3 = make_pair("Tokyo"s, 9.00);        // {std::string,double}    // C++14
    pair pp4 = {"Tokyo"s, 9.00};                 // {std::string,double}    // C++17
Enforcement
???
SL.io: Iostream
iostreams is a type-safe, extensible, formatted and unformatted I/O library for streaming I/O.
It supports multiple (and user extensible) buffering strategies and multiple locales.
It can be used for conventional I/O, reading and writing to memory (string streams),
and user-defined extensions, such as streaming across networks (asio: not yet standardized).
Iostream rule summary:
- (SL.io.1: Use character-level input only when you have to)
- (SL.io.2: When reading, always consider ill-formed input)
- (SL.io.3: Prefer iostreams for I/O)
- (SL.io.10: Unless you use printf-family functions call ios_base::sync_with_stdio(false))
- (SL.io.50: Avoid endl)
- ???
SL.io.1: Use character-level input only when you have to
Reason
Unless you genuinely just deal with individual characters, using character-level input leads to the user code performing potentially error-prone and potentially inefficient composition of tokens out of characters.
Example
    char c;
    char buf[128];
    int i = 0;
    while (cin.get(c) && !isspace(c) && i < 128)
        buf[i++] = c;
    if (i == 128) {
        // ... handle too long string ....
    }
Better (much simpler and probably faster):
    string s;
    s.reserve(128);
    cin >> s;
and the reserve(128) is probably not worthwhile.
Enforcement
???
SL.io.2: When reading, always consider ill-formed input
Reason
Errors are typically best handled as soon as possible. If input isn't validated, every function must be written to cope with bad data (and that is not practical).
Example
???
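A possible illustration: the sketch below reads whitespace-separated tokens and accepts a token as an integer only if the whole token parses, so input such as 12abc or an out-of-range number is reported rather than silently truncated. ParseResult, parse_int, and read_ints are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct ParseResult {               // hypothetical result type
    std::vector<int> values;       // tokens that parsed cleanly
    std::vector<std::string> bad;  // tokens rejected as ill-formed
};

// Parse one token as an int; succeed only if the whole token is consumed.
bool parse_int(const std::string& tok, int& out)
{
    std::istringstream ss(tok);
    ss >> out;
    return !ss.fail() && ss.peek() == std::char_traits<char>::eof();
}

// Validate at the point of input so later code can rely on well-formed values.
ParseResult read_ints(std::istream& in)
{
    ParseResult r;
    for (std::string tok; in >> tok; ) {
        int x = 0;
        if (parse_int(tok, x)) r.values.push_back(x);
        else r.bad.push_back(tok);
    }
    return r;
}
```

The caller decides what to do with the rejected tokens (report, abort, or ignore); the point is that the decision is made once, where the data enters the program.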
Enforcement
???
SL.io.3: Prefer iostreams for I/O
Reason
iostreams are safe, flexible, and extensible.
Example
    // write a complex number:
    complex<double> z{ 3, 4 };
    cout << z << '\n';
complex is a user-defined type and its I/O is defined without modifying the iostream library.
Example
    // read a file of complex numbers:
    for (complex<double> z; cin >> z; )
        v.push_back(z);
Exception
??? performance ???
Discussion: iostreams vs. the printf() family
It is often (and often correctly) pointed out that the printf() family has two advantages compared to iostreams: flexibility of formatting and performance.
This has to be weighed against iostreams' advantages of extensibility to handle user-defined types, resilience against security violations, implicit memory management, and locale handling.
If you need I/O performance, you can almost always do better than printf().
gets(), scanf() using %s, and printf() using %s are security hazards (vulnerable to buffer overflow and generally error-prone).
C11 defines some "optional extensions" that do extra checking of their arguments.
If present in your C library, gets_s(), scanf_s(), and printf_s() might be safer alternatives, but they are still not type safe.
Enforcement
Optionally flag <cstdio> and <stdio.h>.
SL.io.10: Unless you use printf-family functions call ios_base::sync_with_stdio(false)
Reason
Synchronizing iostreams with printf-style I/O can be costly.
cin and cout are by default synchronized with printf.
Example
    int main()
    {
        ios_base::sync_with_stdio(false);
        // ... use iostreams ...
    }
Enforcement
???
SL.io.50: Avoid endl
Reason
The endl manipulator is mostly equivalent to '\n' and "\n"; as most commonly used it simply slows down output by doing redundant flush()s.
This slowdown can be significant compared to printf-style output.
Example
    cout << "Hello, World!" << endl;    // two output operations and a flush
    cout << "Hello, World!\n";          // one output operation and no flush
Note
For cin/cout (and equivalent) interaction, there is no reason to flush; that's done automatically.
For writing to a file, there is rarely a need to flush.
Note
For string streams (specifically ostringstream), the insertion of an endl is entirely equivalent to the insertion of a '\n' character, but also in this case, endl might be significantly slower.
endl does not take care of producing a platform specific end-of-line sequence (like "\r\n" on Windows). So for a string stream, s << endl just inserts a single character, '\n'.
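The equivalence for string streams can be checked directly. In this small sketch the function names with_endl and with_newline are hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>

std::string with_endl()
{
    std::ostringstream s;
    s << "line" << std::endl;   // inserts '\n', then calls flush()
    return s.str();
}

std::string with_newline()
{
    std::ostringstream s;
    s << "line" << '\n';        // inserts '\n' only
    return s.str();
}
```

Both functions produce the same five characters; the only difference is the extra flush() call made by endl.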
Note
Apart from the (occasionally important) issue of performance,
the choice between '\n' and endl is almost completely aesthetic.
SL.regex: Regex
<regex> is the standard C++ regular expression library.
It supports a variety of regular expression pattern conventions.
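As a brief sketch of typical use, the functions below match a whole string against a pattern and extract a capture group from a larger text. The helper names and the date pattern are hypothetical; the default ECMAScript grammar is used.

```cpp
#include <regex>
#include <string>

// Returns true if s has the shape of an ISO-style date such as 2024-01-31.
// Only the shape is checked, not whether the date exists.
bool looks_like_date(const std::string& s)
{
    static const std::regex pattern{R"(\d{4}-\d{2}-\d{2})"};  // ECMAScript grammar is the default
    return std::regex_match(s, pattern);                      // the whole string must match
}

// Extracts the year from the first date-shaped substring, or returns an empty string.
std::string first_year(const std::string& s)
{
    static const std::regex pattern{R"((\d{4})-\d{2}-\d{2})"};
    std::smatch m;
    if (std::regex_search(s, m, pattern)) return m[1].str();
    return {};
}
```

Other grammars, such as std::regex::extended or std::regex::awk, can be selected with a flag passed to the std::regex constructor.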
SL.chrono: Time
<chrono> (defined in namespace std::chrono) provides the notions of time_point and duration together with functions for outputting time in various units.
It provides clocks for registering time_points.
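A brief sketch of these notions in use: a clock yields time_points, the difference of two time_points is a duration, and duration_cast converts between units. The helper names to_ms and time_call are hypothetical.

```cpp
#include <chrono>

// Converts a duration to whole milliseconds. The conversion can lose precision,
// so duration_cast makes the truncation explicit.
long long to_ms(std::chrono::nanoseconds d)
{
    return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
}

// Measures how long a callable takes, using a monotonic clock.
template<class F>
std::chrono::steady_clock::duration time_call(F&& f)
{
    const auto start = std::chrono::steady_clock::now();  // a time_point
    f();
    return std::chrono::steady_clock::now() - start;      // a duration
}
```

Coarser units convert implicitly to finer ones (seconds to nanoseconds), which is why to_ms can be called with a std::chrono::seconds value.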
SL.C: The C Standard Library
???
C Standard Library rule summary:
SL.C.1: Don't use setjmp/longjmp
Reason
A longjmp ignores destructors, thus invalidating all resource-management strategies relying on RAII.
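For contrast, the sketch below shows the non-local exit that does cooperate with RAII: when an exception propagates, stack unwinding runs the destructors of local objects. Guard, work, try_work, and the live_guards counter are hypothetical names used only to make the effect observable.

```cpp
#include <stdexcept>

int live_guards = 0;       // counts Guard objects currently alive (for illustration)

struct Guard {             // hypothetical RAII type
    Guard()  { ++live_guards; }
    ~Guard() { --live_guards; }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
};

void work(bool fail)
{
    Guard g;                                   // resource acquired here
    if (fail) throw std::runtime_error("work failed");
}                                              // released on normal exit and during unwinding

// Returns true if work() reported failure.
bool try_work(bool fail)
{
    try {
        work(fail);
        return false;
    } catch (const std::runtime_error&) {
        return true;                           // g's destructor has already run
    }
}
```

After either path through try_work the counter is back to zero. A longjmp out of work would not give that guarantee.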
Enforcement
Flag all occurrences of longjmp and setjmp