The data structures challenge
Written by: Richard Smith Last updated: 2013-08-18 12:00:00 +0000
To kick start this new blog, I’m setting myself a challenge: implement 42 data structures in 42 days.
Update: After about 2 weeks of this, my PhD exploded and I found myself totally overwhelmed. I’ve postponed (or failed!) the datastructures challenge having completed 10 structures and will resume work on the project when my PhD workload calms down again.
I’m nearing the end of the first year of my PhD. I started 10 months ago as a biologist with a bit of programming experience. A fair description of my current role in the lab is somewhere between a computational biologist and bioinformatician. I’m writing software every day to handle huge amounts of data in short turnaround times, and I have no formal CS training.
The real killer insights of CS are in data structures and algorithms. I’ve done project-specific implementations of various simple structures, and I know a bit in abstract about more complex structures like bloom filters from reading source code (e.g. of khmer). But I know my work would benefit greatly from a thorough grounding in data structures. I’m fed up with not knowing enough, so I’m fixing it. Abstract learning is not sufficient - I want to cement the knowledge by doing.
From now (Sunday 18th August 2013) until I complete the list below, I’m going to implement at least one data structure per day (on average). I’ll start with basics and abstract structures, and move up to specific variants and finally tackle the complex stuff. I’m not aiming for algorithmic perfection: the implementations won’t be optimal first time round. Once I’ve finished the list, I’ll go back and try to improve the implementations.
Because I often don’t have spare coding time on a given day, I might stack up my structures on one day (for example, I might do 6 structures on a Sunday to cover the next week). There are 42 structures in the list below, so assuming I don’t add more as I go, I’ll be finished on or before September 29th 2013.
My language of choice is Ruby. There are various reasons, but it boils down to the fact that Ruby is a beautiful, intuitive and expressive language and I can think faster in it than any of the others I know. A decent set of Ruby data structures would also be useful to me on a daily basis, and I haven’t found an existing one. Maybe it can help others too - I’m releasing it as a gem. You can follow developent on Github.
Enough talk: here’s the unordered list (cobbled together from Stack Overflow posts) with wiki links.
- Queue - done! Code on Github
- Priority queue - done! Code on Github
- Deque - done! Code on Github
- Double-ended priority queue - done! Code on Github
- [Stack](http://en.wikipedia.org/wiki/Stack_(data_structure%29) - done! Code on Github
- Linked list - done! Code on Github
- Work-stealing queue (note: PDF link, no wiki article)
- Suffix array
- Disjoint set
- Sparse array
- Sparse matrix
- [Graph](http://en.wikipedia.org/wiki/Graph_(abstract_data_type%29) - done (adjacency list form)! Code on Github
- Directed graph
- Directed acyclic graph
- [Tree](http://en.wikipedia.org/wiki/Tree_(data_structure%29) - done! Code on Github
- Trie (radix tree)
- Binary tree
- Binary search tree
- Weight balanced tree
- Self-balancing binary search tree
- Decision tree
- Huffman tree
- Merkle tree
- Min-max heap
- Fenwick tree
- Suffix tree
- Suffix trie (note: PDF link, no wiki article)
- Splay tree
- Ternary tree
Probabilistic and streaming structures
Whew! A pretty long list. But many of these build on one another, so that many will inherit from their parent structures. The challenge is less daunting than it might appear at first.
Of course, some of these structures might be prohibitively slow in pure Ruby. I’m thinking in particular of bloom-filters and count-min sketches. But I’m going to go ahread an implement them in Ruby so that I understand them. Then, if performance is weak, I can rewrite in C and use Ruby bindings.
I’m also interested in implementing lock-free versions of as many of the above as possible. I couldn’t find a nice list of structures with lock-free implementations, so I’ll expand this point as I learn more..
This is a living document - I will update with links to explanatory posts and code as I complete them. I will also structure and annotate the list as I learn more about the structures.
Have I missed an important structure? Is one of these redundant? Suggestions/corrections to this list are welcome.
See you on the other side!