STL From Scratch - Introduction

update on Sept. 29th: I've modified the schedule of the first two posts to split-out integer sequences and the rest of the <utility> header from our implementation of pairs and tuples.

If you're interested in discussing this, consider checking out the reddit post.

The Standard Template Library (STL) is the modern C++ programmer's best friend. It contains a wealth of data structures, algorithms, and utility libraries from which we can build almost anything -- but only if we know how to use it correctly and efficiently. However, many people, from both inside and outside the C++ community, lament the large surface area of the STL and the (admittedly) large language specification, claiming that to really understand C++ one needs to be a language-lawyer, if not to have the entire standard memorized. I think this view is mistaken.

I claim that, with a modest and correct understanding of the C++ language semantics, you too can understand the ins and outs of the STL. How can you do this? By writing the STL yourself, of course.

In this series of blog posts I will lead you through an implementation of the core utilities, data structures, algorithms provided in the STL. We will be writing it in C++11, with some added tweaks from the C++14 STL (mostly constexpr qualifiers, and other such things, where appropriate). We will also be avoiding implementation of features deprecated in C++17, as soon enough we should be avoiding their use anyways. Our guide to the specifications will be cppreference.com. You can follow along by cloning the GitHub repository, which I'll keep updated with each new blog post as we progress. All the code in this project will be released under the MIT License (link to the file in the main repository).

I will be assuming a knowledge of core language semantics. That means you should understand, at a minimum: value types, pointers, references, const references, and forwarding references, as well as the differences between them; copy versus move constructors and assignment operators; lambda expressions; exceptions and the noexcept qualifier/operator; and (last but certainly not least) RAII. You will also need a basic understanding of what classes and templates are and why we use them. If you are not familiar with any of the above I would recommend picking out an introductory book from the list on this Stack Overflow FAQ. A useful reference for concepts is the Library Concepts page from cppreference; we will be making heavy use of the terms found therein.

This is a big project, and will take a while. If you stick with it, however, you should come out the other side with a stronger understanding of the C++ STL, its capabilities and drawbacks, and when and how to use it.

Here is the timeline for what we'll implement and in what order. Each of these posts is going to be fast and dense, with a lot of material covered in each with not very much room for extensive discussion, so it may be worthwhile picking up a textbook from the Stack Overflow book list to fill in any gaps as we go along. As each blog post is published I will change these to the appropriate links.

  • Chapter 0 - utilities and iterators
  1. utilities and integer sequences
  2. pairs and tuples (part 1, part 2)
  3. reference wrappers and function objects
  4. iterators, iterator operations, and range/container access
  • Chapter 1 - algorithms
  1. minimums, maximums, and permutations
  2. non-modifying sequence operations
  3. modifying sequence operations
  4. partitioning and sorting algorithms
  5. searching algorithms
  6. set algorithms
  7. heap algorithms
  8. numeric algorithms
  • Chapter 2 - data structures
  1. array
  2. time travel to Chapter 5: allocators
  3. string
  4. vector
  5. deque
  6. forward_list and list
  7. abstract data types part 1:  stack
  8. abstract data types part 2: queue, and priority queue
  9. time travel to Chapter 5: hashing
  10. set and unordered_set
  11. multiset and unordered_multiset
  12. map and unordered_map
  13. multimap and unordered_multimap
  • Chapter 3 - input and output
  1. motivation and demotivation - why and why not iostreams?
  2. buffers of various sorts - streambuf, filebuf, and stringbuf
  3. base classes - ios_base and basic_ios
  4. i/o manipulators
  5. input streams
  6. output streams
  7. file streams
  8. string streams
  • Chapter 4 - future STL components
  1. motivation for more expressive types
  2. memory and uninitialized storage utilities
  3. optional
  4. variant
  5. any
  • Chapter 5 - Miscellany
  1. allocators
  2. hashing
  3. regex
  4. atomics
  5. chrono
  6. locks and condition variables
  7. threads
  8. async, tasks, and futures

You will have noticed that algorithms come before data structures, which may seem odd and perhaps a little unmotivated. Why should we bother to write algorithms if we have nothing to use them on? Fear not, for this is intentional. One part of the genius of the STL is that algorithms are decoupled from data structures. The only thing we need in order to implement the algorithms in Chapter 1 is the utilities that we created in Chapter 0. Once we get on to implementing data structures, all the material from Chapter 2 will just work, right out of the box.

From Chapter 2 and onwards I will also be writing small toy-programs as motivating example usages of the data structures and algorithms we implement. I haven't yet decided what these will be, but I plan on having them build upon one-another to incorporate as much of our STL as possible.

Before we get started I have a few caveats. First, I will not (at least initially) be touching on libraries that require platform specific implementations, such as the chrono, atomic operations, async, and thread support libraries, although some of this material will be covered in Chapter 5 - Miscellany where I will allow myself to use platform specific material (we'll be using *nix and posix libraries). Neither will I be discussing the common mathematical functions, special mathematical functions, or localizations libraries, as these will take us too far afield from our main goal (data structures and algorithms). We will be taking a few STL features for granted as necessary. In particular, we will not be implementing the stdexcept, type_traits, or initializer_list headers, as well as certain parts of the utility header (std::swap, in particular). Lastly, my goal is not to write  a perfectly optimized STL implementation. We will be avoiding needlessly complex optimizations in favor of clarity and correctness wherever possible, but I will mention common optimizations where it is relevant to do so.

Good luck (to myself as well). I hope you enjoy the series and I'll see you at Chapter 0, Episode 1.