10  Collections Framework

The collections framework consists in a set of related classes contained in the package java.util that provide:

Originally the were designed to use Object as a universal reference type. Since Java 5 redefined as generic types, in a backward compatible way.

The main interfaces of the collections framework are reported in the top part of Figure 10.1.

The ADT interfaces can be divided into two main groups:

The corresponding implementation classes are summarized in the bottom part of Figure 10.1.

classDiagram
    Collection <|-- Set
    Set <|-- SortedSet
    Collection <|-- Queue
    Collection <|-- List
    Map <|-- SortedMap

    Set <|.. HashSet
    HashSet <|-- LinkedHashSet
    SortedSet <|.. TreeSet
    List <|.. ArrayList
    List <|.. LinkedList
    Queue <|.. LinkedList
    Queue <|.. PriorityQueue

    Map <|.. HashMap
    SortedMap <|.. TreeMap
    HashMap <|-- LinkedHashMap

    <<interface>> Collection
    <<interface>> Set
    <<interface>> SortedSet
    <<interface>> Queue
    <<interface>> List
    <<interface>> Map
    <<interface>> SortedMap
    
    style Collection fill:bisque,stroke:darkorange
    style Set fill:bisque,stroke:darkorange
    style SortedSet fill:bisque,stroke:darkorange
    style Queue fill:bisque,stroke:darkorange
    style List fill:bisque,stroke:darkorange
    style Map fill:bisque,stroke:darkorange
    style SortedMap fill:bisque,stroke:darkorange
Figure 10.1: Linked list structure example

Often for a single interface, multiple implementations are provided that leverage a different underlying data structure. The correspondance between interface, data structure, and implementation is summarize in Table 10.1.

Table 10.1: Correspondence between data structure and collection implementation
\(\downarrow\) structure / interface \(\rightarrow\) Set List Map
Hash Table HashSet HashMap
Balanced Tree TreeSet TreeMap
Resizable Array ArrayList
Linked List LinkedList
Hash Table + Linked List LinkedHashSet LinkedHashMap

10.1 Group containers

10.1.1 Collection

The container represented by interface Collection represents a group of elements (references to objects) without any specific constraint. In particular it is not specified whether they are orderd or not, and whether the accept duplicates.

The interface Collection extends Iterable and thus can be used in the for-each construct.

All classes implementing Collection shall provide two constructors

  • C(): creates an empty container
  • C(Collection c): creates a container filled in with the same elements present in c

The main methods are:

  • int size(): number of elements present in the collection
  • boolean isEmpty(): tests if the collection is empty, i.e. size()==0
  • boolean add(E element): add an element to the collection
  • boolean addAll(Collection<? extends E> c): add all elements in c to the collection
  • boolean contains(E element): tests if the given element in present in the collection
  • boolean containsAll(Collection<?> c): tests if all elements in c are present in the collection
  • boolean remove(E element): removes the given element from the collection, true if succeeds
  • boolean removeAll(Collection<?> c): removes all elements from c from the collection, true if succeeds
  • void clear(): removes all elements from the collection
  • Object[] toArray(): creates an array containing the elements present in the collection

The methdod Object[] toArray() returns an Object even if the interface uses generics because everywhere else. This is due to generic types limitations deriving from type erasure, the method cannot instantiate an array of the correct type.

As an alternative, the overload <T> T[] toArray(T[]) accepts an array of the target type. If not large enough a new array of the same type and right length is instantiated and returned; it uses the Arrays.copyOf() method that create an array of the right type. In practice, an empty array of the correct type can be passed and let the method do the allocation.

The interface Collection has no direct implementations. Although it can be used as a more general reference type to specific container types, for instance:

1Collection<Person> persons = new LinkedList<Person>();
2persons.add( new Person(“Alice”) );
3System.out.println( persons.size() );
4Collection<Person> copy = new TreeSet<Person>();
5copy.addAll(persons);
6Person[] array= copy.toArray(new Person[0]);
7System.out.println( array[0] );
1
instantiates a new collection, implemented by LinkedList class
2
adds a new element to the collection
3
prints the size, the is 1 at this point
4
creates a new instance of a collection, implemented by TreeSet class
5
adds all elements in persons into copy, the two operations could be conflated into new TreeSet<Person>(persons)
6
creates an array containing the elements of the collection
7
prints the first (and only) element of the array

10.1.2 List

The container represented by interface List is a specialization of Collection – which it implements –. It can contain duplicate elements, in addition the insertion order is preserved and the user can define the insertion point, and most importantly the elements can be accessed by position. Extends Collection interface

The main methods of the List interface, in addition to those inherited from Collection, are:

  • E get(int index): retrieves the element at the given position
  • E set(int index, E element): replaces the element at the given position
  • void add(int index, E element): adds an element in a given position, can cause a shift of elements already present
  • E remove(int index): removes the element at the given position and returns it
  • int indexOf(E o): returns the index of the first occurrence of the given element
  • int lastIndexOf(E o): returns the index of the latest occurrence of the given element
  • List<E> subList(int from, int to): returns a list that contains a portion of the starting one

An example of usage of class List is:

List<Integer> l = new ArrayList<>();

1l.add(42);
2l.add(0, 13);
3l.set(0, 20);
4int a = l.get(1);
5l.add(9, 30);
6l.add(2, 30);
1
42 in position 0
2
42 moved to position 1
3
13 replaced by 20
4
returns 42
5
Error: out of bounds, IndexOutOfBoundsException: Index: 9, Size: 2
6
Ok, immediately after the last element, same as add()

The main List implementations are:

  • ArrayList<E> it is based on a resizable array,
  • LinkedList<E> it is based on linked elements.

The two classes impelements the method defined in the interface and in addition provide additional methods that are convenient for the specific data structure used:

10.1.2.1 LinkedList

The LinkedList implementation uses a series of linked elements as shown in Figure 10.2. The list maintains a reference to the first element first and one to the last one last.

G linkedlist :ArrayList first last size = 2 el1 :Element prev item next linkedlist:e->el1:n el2 :Element prev item next linkedlist:e->el2:s el1:e->el2:n first "First" el1:item->first el2:w->el1:s second "Second" el2:item->second
Figure 10.2: Linked list structure example

The add() operations adds a new element at the tail of the list as shown in Figure 10.3.

G linkedlist :ArrayList first last size = 2 el1 :Element prev item next linkedlist:e->el1:n el2 :Element prev item next linkedlist:e->el2:s el3 :Element prev item next linkedlist:e->el3 el1:e->el2:n first "First" el1:item->first el2:w->el1:s el2:e->el3:n second "Second" el2:item->second el3:w->el2 third "Third" el3:item->third
Figure 10.3: Linked list structure example

Essentially:

  • a new element is created, its next element being the last one,
  • the next element of the last is the newly created element,
  • the new element becomes the last.

The class LinkedList<E> provides the following additional constructors and methods that leverage the internal structure:

  • void addFirst(E o): adds an element in first position, i.e. l.add(0, o)
  • void addLast(E o): adds an element in the latest position, i.e. l.add(o)
  • E getFirst(): returns the first element, i.e. get(0)
  • E getLast(): returns the last element, i.e. l.get( l.size()-1 )
  • E removeFirst(): removes the first element
  • E removeLast(): removes the last element

10.1.2.2 Array list

The Arraylist implementation uses an array that is resized when required. An example is shown in Figure 10.4.

G arraylist :ArrayList elements size = 2 elements null ... null arraylist:e->elements:n first "First" elements:e0->first second "Second" elements:e1->second
Figure 10.4: Array list structure example

The add() operations adds a reference to the element at the next available place in the array, as shown in Figure 10.5.

G arraylist :ArrayList elements size = 16 elements null ... arraylist:e->elements:n first "First" elements:e0->first second "Second" elements:e1->second _16th "Sixteenth" elements:e15->_16th
Figure 10.5: Array list structure example

When the array is full a resize operation is required as shown in Figure 10.6.

G arraylist :ArrayList elements size = 16 elements ... arraylist:e->elements:n elementsNew null ... null ... null arraylist:e->elementsNew:n first "First" elements:e0->first:w second "Second" elements:e1->second:w _16th "Sixteenth" elements:e15->_16th:w first:e->elementsNew:e0 second:e->elementsNew:e1 _16th->elementsNew:e15 _17th "Seventeenth" _17th:e->elementsNew:e16
Figure 10.6: Array list structure example

In summary:

  • a new (larger) array is created and the existing elements are copied into it (Arrays.copyOf())
  • then the first new position of the array will store the reference to the element

The operation of instantiating the new array and copying the elements is very expensive, for this reason to improve performance it can be advisable to define an initial size suitable for the usage.

The class ArrayList<E> provides the following additional constructors and methods:

  • ArrayList(int initialCapacity): initializes the underlying array to the given initial size,
  • void ensureCapacity(int minCapacity): resizes the underlying array so that is has at least the given size.

10.1.3 Queue

The container represented by interface Queue is a specialization of the Collection. The elements are inserted in a specific order. In particular, it defines a head position where is the first element that can be accessed.

The typical orders are:

  • insertion order, i.e. FIFO (Fist-In-First-Out), or
  • elements’ natural order (Priority queue).

Interface Queue, in addition to those inherited from Collection, defines the following main methods:

  • peek(): just retrieves the element at the head of the queue
  • poll(): retrieves and removes the element at the head of the queue

The Queue interface has two predefined implementations:

  • LinkedList, in this case the head is the tail of the list, it is a FIFO;
  • PriorityQueue, the head is the smallest element.

An example usage of Queue that compares the two predefined implementations:

1Queue<Integer> fifo = new LinkedList<Integer>();
Queue<Integer> pq = new PriorityQueue<Integer>();
2fifo.add(3); pq.add(3);
fifo.add(1); pq.add(1);
fifo.add(2); pq.add(2);
3System.out.println(fifo.peek());
4System.out.println(pq.peek());
1
create two queues using a FIFO and priority strategies
2
adds the same elements in the same order to both queues
3
the head element of the fifo is the first added
4
the head element of the pq is the smallest

10.1.4 Set and SortedSet

The interface Set contains no additional methods w.r.t. those inherited from Collection. It serves the purpose of a flagging interface. It is intended to represents containers that have no duplicate elements. The duplication detection is demanded to the equals() method.

The classes implementing Set must comply with the following restriction on the add() method:

\[\forall e_1, e_2 \in \Sigma : e_1 \neq e_2 \implies e_1\mathtt{.equals}(e_2) == \mathtt{false}\]

The elements are traversed in no particular order

A further specialization of the Set interface is SortedSet that mandates that the elements are traversed according to the element ordering. Like its parent interface, it does not allow duplicate elements.

The interface SortedSet provides the following additional methods:

  • E first(): retrieved the first element in order
  • E last(): retrieved the last element in order
  • SortedSet<E> headSet(E toElement): retrieved the initial portion of elements in order until the given one
  • SortedSet<E> tailSet(E fromElement): retrieved the final portion of elements in order starting from the given one
  • SortedSet<E> subSet(E from, E to): retrieves a subset consisting of the elements in orderd comprised within the two provided

Depending on the constructor used, it is possible to use two alternative implementations of the custom ordering:

  • TreeSet() default constructor implies natural ordering (elements must be implementations of Comparable),
  • TreeSet(Comparator c) comparator constructor implies the ordering according to the comparator criteria.

There are three standard implementations:

  • HashSet implements Set, usus hash tables as internal data structure (faster)
  • LinkedHashSet extends HashSet, in addition the elements are traversed by iterator according to the insertion order
  • TreeSet implements SortedSet, uses R-B trees as internal data structure (computationally expensive)

10.2 Associative containers

10.2.1 Map

The interface Map represents containers that associates keys to values (e.g., SSN -> Person). Keys and values must be objects, and keys must be unique. For a given key, only one value can be stored.

The following constructors are common to all map implementers:

  • M(): creates an empyt map
  • M(Map m): creates a map containing the same entries as the given map

The Map interface provides the following main methods:

  • V put(K key, V value): add an entry key-value in the map
  • V get(K key): retrieves the value associated with the given key
  • V remove(K key): removes the association for the given key
  • boolean containsKey(K key): tests if the map contains the given key
  • public Set<K> keySet(): returns the set of all keys present in the map
  • public Collection<V> values(): returns the collection of all values present in the map
  • int size(): returns the number of entries present in the map
  • boolean isEmpty(): tests if the map is empty, i.e. size()==0
  • void clear(): removes all entries from the map

Example of usage of the Map interface:

1Map<String,Person> people =new HashMap<>();
2people.put( "ALCSMT",  new Person("Alice Smith") );
people.put( "RBTGRN",  new Person("Robert Green") );

3if( ! people.containsKey(©`))
    System.out.println( "Not found" );

4Person bob = people.get("RBTGRN");

5int populationSize = people.size();

6for(Person p : people.values()){
    System.out.println(p);
}

7for(String ssn : people.keySet()){
    System.out.println(ssn);
}
1
creates a new map implemented by a HashMap class
2
adds two entries in the map
3
checks if the key "RBTGRN" is present in the map
4
retrieves the person associated to key "RBTGRN"
5
retrieves the number of entries in the map, 2
6
retrieves the values in the map, i.e. the persons
7
retrieves the set fo keys in the map

A furhter specialization of Map is represented by SortedMap that mandates that its elements be traversed according to the keys’ ordering. The ordering can be based on the natural ordering (Comparable) or provided as a Comparator object passed to the constructor.

It provides the followint specialized methods:

  • SortedMap headMap(K toKey): creates a new sorted map the contains only initial keys, up to the given one
  • SortedMap tailMap(K fromKey): creates a new sorted map the contains only final keys, starting from the given one
  • SortedMap subMap(K fromKey, K toKey): creates a new sorted map the contains only keys within the two provided
  • K firstKey(): retrieves the first key
  • K lastKey(): retrieves the last key

The standard implementations of Map interface are

  • HashMap implements Map, keys have no order, but must provide an hash code
  • LinkedHashMap extends HashMap, key have no order, but values are traversed in insertion order
  • TreeMap implements SortedMap, keys are traversed in ascending key order and values are traversed in ascending order of the relative key

10.2.1.1 HashMap

Class HashTable is based on an hash table data structure. The table is automatically resized when a specific load factor reached. By default the load factor is 0.75 and the initial size is 16. These values can be changed through a constructor. The class shares part of the implementation with the HashSet class, thus most consideration here apply also to that class.

The basic structure of the HashMap is exemplified in Figure 10.7.

G treemap :TreeMap buckets size = 3 buckets null ... treemap:e->buckets:n entry1 :Entry key value dog "Dog" entry1:k1->dog dogdef "omnivore domesticated mammal" entry1:v1->dogdef entry2 :Entry key value cat "Cat" entry2:k1->cat catdef "small domesticated carnivorous animal" entry2:v1->catdef entry3 :Entry key value pigdef "omnivorous domesticatedhoofed animal" entry3:v1->pigdef pig "Pig" entry3:k1->pig buckets:b1->entry1 buckets:b14->entry2 buckets:b15->entry3
Figure 10.7: HashMap internal structure example

Hash table based containers HashMap and HashSet work better if entries define a suitable hashCode() method. Values must be as spread as possible, otherwise, collisions occur. A collision occurs when two entries fall in the same bucket. In such a case elements are put in a chained list, chaining reduces time efficiency. Get and put operatiosn take constant time in case of no collisions.

The chaining procedure is shown in Figure 10.8.

G treemap :TreeMap buckets size = 3 buckets null null treemap:e->buckets:n entry1 :Entry key value next dog "Dog" entry1:k1->dog dogdef "omnivore domesticated mammal" entry1:v1->dogdef entry2 :Entry key value next entry3 :Entry key value next entry2:n->entry3 cat "Cat" entry2:k1->cat catdef "small domesticated carnivorous animal" entry2:v1->catdef pigdef "omnivorous domesticatedhoofed animal" entry3:v1->pigdef pig "Pig" entry3:k1->pig buckets:b1->entry1 buckets:b2->entry2
Figure 10.8: HashMap internal structure example

10.2.1.2 TreeMap

The class TreeMap is based on a Red-Black tree. The get and put operations takes \(log(n)\) time. Due to the structure of the tree, the keys are maintained and traversed in order. Therefore it is required that the key class (K) must be Comparable or a Comparator must be provided to the constructor.

An exemplification of the TreeMap structure is shown in Figure 10.9.

G treemap :TreeMap root size = 3 entry1 :Entry key value left right treemap:e->entry1 entry2 :Entry key value left right entry1:s->entry2:n entry3 :Entry key value left right entry1:s->entry3:n dogdef "omnivore domesticated mammal" entry1:e->dogdef dog "Dog" entry1:e->dog cat "Cat" entry2:w->cat catdef "small domesticated carnivorous animal" entry2:w->catdef pigdef "omnivorous domesticatedhoofed animal" entry3:e->pigdef pig "Pig" entry3:e->pig
Figure 10.9: TreeMap internal structure example

10.3 Optional

The typical convention in Java APIs is to let a method return a null reference to represent the absence of a result. This approach, require that the caller must check the return value of the method to detect that case. If the caller omits – forget? – such a check, a NullPointerException (NPE) may occur. This is called the nullability problem.

The collections framework contain class Optional that is intended to mitigate the nullability problem. Class Optional is used to represent a potential value, i.e. a value that might be null.

Methods returning Optional<T> make explicit that the return value may be missing. Such a return type forces the clients to deal with potentially empty optional.

Access to the value embedded within the Optional can be done through:

  • boolean isPresent(): checks if Optional contains a value
  • T get(): returns the value if present; otherwise it throws a NoSuchElementException.
  • ifPresent(Consumer<T> block): executes the given code if a value is present.
  • T orElse(T default): returns the value if present; otherwise it returns the provided default value.
  • T orElse(Supplier<T> s): returns the value if present; otherwise it returns the value supplied by s

The Optional objects can be created through a set of static factory methods:

  • of(T v): throws exception if v is null
  • ofNullable(T v): returns an empty Optional when v is null
  • empty(): returns an empty Optional, same as ofNullable(null)

Such methods force the programmers to think about what they are about to return.

10.4 Using Collections

Note

The reccomendation is to always use the most general interface suitable for the usage in the program. E.g. List<> is better than LinkedList<>.

The motivations for such recommendation are:

  • more general interfaces are more flexible for future changes, i.e. a change in the specific implementing class does not affect the usage, which always goes through the general interface.
  • improve the programmer’s approach, makes them think first about the type of container, then about the implementation

The general rule for selecting the container type is:

  • if access by key is needed use a Map
    • if values sorted by key use a SortedMap
  • otherwise use a Collection (or derived interfaces)
    • if indexed access, use a List
    • if access in order, use a Queue
    • if no duplicates, use a Set
    • if elements sorted, use a SortedSet

10.4.1 Using lists

10.4.1.1 Iteration

Since all group containers – through Collection – implements the Iterable interface, they can be used with the try-catch construct:

Iterable<Person> persons = new LinkedList<Person>();

for(Person p: persons) {
    System.out.println(p);
}

In addition Iterable defines the default method:

  • forEach(Consumer<? super T> action) perfoms an action on each element.

The method can be used to perform operations of elements with a functional interface:

Iterable<Person> persons = new LinkedList<Person>();

persons.forEach( p -> {
    System.out.println(p);
});

It is important to note that it is unsafe to iterate over a collection while modifying – i.e., add or remove elements – at the same time:

List<Integer> lst=new LinkedList<>();
lst.add( 10 );   
lst.add( 11 );   
lst.add( 13 ); 
lst.add( 20 );

int count = 0;
for (Iterator<?> itr = lst.iterator(); itr.hasNext(); ) {
   itr.next(); 
   if (count==1) 
1      lst.remove(count);
   count++;
}
1
a ConcurrentModificationException is raised at run-time

The correct way of modifying while iterating is by using the iterator’s own methods:

  • Iterator.remove(): removes the current element
  • ListIterator.add(): adds an element at the current position
List<Integer> lst = new LinkedList<>();
lst.add( 10 );   
lst.add( 11 );   
lst.add( 13 ); 
lst.add( 20 );

int count = 0;
for(Iterator<?> itr = lst.iterator(); itr.hasNext(); ) {
   itr.next();
   if (count==1) 
      itr.remove(); // ok
   count++;
}

10.4.1.2 Time Performance

Based on the implementation structured of the two list classes, the general time performance for the two main implementations of List are sumamrized in the following Table 10.2.

Table 10.2: Summary of time performance behavior of main List implementations.
Operation ArrayList LinkedList
get(n) Constant Linear
add(0,e) Linear Constant
add() Constant Constant

More in detail we can build the following analytical models for two insertion methods:

add(0,e): add in first position in list of size \(n\):

  • LinkedList: \(t(n) = C_L\)
  • ArrayList: \(t(n) = n \dot C_A\)

for(int i=0; i<n; ++i) l.add(i);: add \(n\) elements in an initially empty list

  • LinkedList: \(t(n) = n \cdot C_L\)
  • ArrayList: \(t(n) = \sum_{i=1}^n C_A \cdot i = \frac{C_A}{2}n\cdot(n-1)\)

On a MacBookPro M2, \(C_L=16.0 ns\) and \(C_A = 0.2 ns\)

10.4.2 Using maps

10.4.2.1 Getting an item

The retrieval of an element from a map should consider the case of a missing element (i.e. key not present). A first option is to check the return value of get() that returns null if the key is not found:

String val = map.get(key);
if( val == null ) val = "Undefined";

The above solution has a possible vulnerability, if the value associated with a key is exactly null a valid return value cannot be distinguished from a missing key.

Alternatively it is possible to first check the presence of the key and then retrive the value:

if( ! map.containsKey(key))
     val = "Undefined"; 
else
     val = map.get(key);

Since this kind of operation is very common, a specific default method getOrDefault() has been added to the interface, which can be used for the same purpose:

String val = map.getOrDefault(key, "Undefined");

10.4.2.2 Updating entries

Another common operation is to update the entries, i.e. changing the value associated with a key. A typical exambe is when a map is used to count the occurrences of (repeated) items in a collection. A possible example is:

Map<String,Integer> wc=new HashMap<>();
for(String w : words) {
    Integer i= wc.get(w);
    wc.put(w, i==null?1:i+1);
}

Since this operation is very commong a suitable default function compute() is provided. The method compute() accepts two argument:

  • the key,
  • a BiFunction that takes the key and value present in the map (value is null if not present) and computes the new value to be stored.

A simple alternative to the above code for counting occurrence that uses compute() can be:

Map<String,Integer> wc=new HashMap<>();
for(String w : words) {
    wc.compute(w,(k,v)->v==null?1:v+1);
}

The above code although clean and compact is not particularly efficient since it uses the autoboxing feature.

Every increment of an Integer is equivalent to Integer.valueOf(v.intValue()+1), that requires additional time for the instantiation and and additional memory fee of 16 bytes per object creation.

A much better solution would be to use a mutable object. For instance:

class Counter {
    private int i=0;
    public String toString(){
        return ":"+i; 
    }
    public void inc(){ ++i; }
}

In this case another default method can be usefule computeIfAbsent() that allows creating a Counter objects only on the first occurrence of a word and just increment the counter on every occurrence:

Map<String,Counter> wc=new HashMap<>();
for(String w : words) {
    wc.computeIfAbsent(w, k->new Counter()).inc();
}

This solution is ~40% faster than the one using Integer and requires 16 bytes less per each increment.

10.4.2.3 Keeping items sorted

Using sorted maps

SortedMap<String,Integer> wc=new TreeMap<>();

produces:

`“A”=1, “All”=3, “And"=2, “Barefoot”=1,…`

Vs. using non sorted maps:

Map<> wc=new HashMap<>();

produces:

“reason”=1, “been”=1, “spoke”=1, “let”=1

10.4.2.4 Time performance

Example: 100k searches in a container with two different sizes, require

size HashMap TreeMap ArrayList LinkedList
100k 3ms 60ms 40s >1h
200k 3ms 65ms 110s

10.5 Algorithms

A number of common algorithms are provided as static methods of java.util.Collections. Please note the trailing s!

Most of the algorithms work on List since it has the concept of position:

  • sort() - merge sort of List
  • binarySearch() – requires ordered sequence
  • shuffle() – unsort
  • reverse() - requires ordered sequence
  • rotate() – of given a distance
  • min(), max() – min and max in a Collection

10.5.1 Sort

The method uses the merge sort algorithm that has a \(n~\mathrm{log}(n)\) time complexity. Requires a List<E> since it needs access by index.

As the sort() method in Arrays, it has two overloads:

  • <T extends Comparable<? super T>> void sort(List<T> list)
  • <T> void sort(List<T> list, Comparator<? super T> cmp)

10.6 Wrap-up

  • The collections framework includes interfaces and classes for containers
  • There are two main families
    • Group containers
    • Associative containers
  • All the components of the framework are defined as generic types