Of course some businesses have their specific needs for computer science. During the last round of interviews I had to face questions on some strange constructs. Nothing you cannot find in a good book on data structures, obviously, but still something you are probably not used very much to see in a general context.

When we speak about data structures in interviews, we are probably speaking about containers or collections. So, you most likely want to know how to store your billion of objects for retrieving the ones with a particular information that not what does a particular object store. The must-have of knowledge on data structures that every developer should know deeply is composed by:

  • arrays: you have to know that random access is constant (O(1)), but size is limited
  • lists: access to front (and maybe tail) is constant, size can increase, random access is linear (O(n)). You have also to know that you can have single linked list to next (like the default immutable list in Scala) or double linked list to next and previous (like LinkedList in Java). Please remember that Nil is the empty List in Scala.
  • arraylistit mixes the previous two by storing a list on an array. Random access is constant, but you have to store in memory an array and, when it is full, arraylist creates another array of twice the size and copies all the previous objects into the new one
  • vector: here we have differences among Java and Scala. In Java, vectors are synchronized arraylist, basically. In Scala, vectors are 32 bit little endian trie (see after), meaning that they are like lists, but with a better performing random access (but a less performing prepending). Basically, if you need to work with a linear algorithm you probably want lists, if you need random access and insert/delete, you probably need a vector.
  • hashset/hashmap: it is complicated to explain everything shortly. Hashset is like an array in which you can find an element in constant time. This magic is done by a function that, for each object, computes an unique value called hashcode. But the hashcode can be any kind of value, and the array has a limited number of slots (in this case called buckets), this means that the value is then restricted to the number of slots available, causing collisions in the array itself. Related to the hashset there are the following concepts: hashcode (and equals method, for Java), buckets, load factor, collision resolution. Hashmap are hashset in which, for each element of the set, we associate an element creating a key-value pair. In Scala, hashmap is the default implementation of the Map trait.
  • treeset/treemap: it is basically a tree structure, but usually (in Java and Scala it is like that) the class is implemented as a binary search tree, meaning that add/remove/contains method are logarithmic in time (O(logN)) and keep the structure with its order, making this THE data structure for algorithms that need an ordered collection (if you just need to  know if a collection has a specific element, hashmaps are still the preferred choice)
  • queue: generally it is a First In First Out list, meaning that insert method is inserting at the tail and pool/peek methods extract from the head. Please notice that Queue in Java is an interface, extended by LinkedList.
  • stack: generally it is a Last In First Out list, meaning that insert and pop are done on the same side of the list (head). Please notice that in Java stacks are implemented over the Vector class and in Scala they are deprecated (since version 2.11) in favour of List, that they were wrapping.
  • heap/priority queue: heaps are trees in which every father has a “bigger” (respect a concept of ordering) value than its children. To be more precise, those are max-heap, or priority queue, but it exists also a min-heap in which fathers are “smaller” than children.

Other data structures that I had to face during interviews (but they unlikely have ever appeared during my career) are:

  • sortedmap: hey, this is only a trait/interface. If they ask any strange question, in Scala its only default implementation is the treemap and in Java it is the treemap plus a less known skip list.
  • trie: it is a prefix-tree, it is not available in core Java, although Apache Commons Collections has an implementation called PATRICIA Trie, on the other hand Scala has a TrieMap defined as a concurrent map. Being a prefix-tree means that you can usually search by prefix, or you can take/drop in scala for a specific prefix, with optimised performances (usually O(K) being K the largest number of bits in the collection)
  • quadtree: they are spatial data structures, meaning that their nodes are not just data, but are data based on position. Quadtree is a tree with 4 children, the root representing all the space we want to represent, and each child representing one out of four sectors of the same size. The rule is that each sector can store a maximum number of elements, if this number is not enough, we add a level of nesting by adding a child with its 4 sectors. For example, if we have as max number 3, we have the root node empty and we call the sector N, S, E and O, let’s say that we put 4 elements in N we will have N adding a layer by creating its 4 sectors and splitting its data on it. This is really useful in simulations or games, for the collision detection

Ok, the idea was not to give a complete list of collections, but to give an idea. It is the reason of the Part I in the title.

Stay tuned!