10 Collections Framework

The collections framework consists in a set of related classes contained in the package java.util that provide:

Interfaces for several ADTs (Abstract Data Types)
Classes implementating ADTs
Algorithms (sort)

Originally the were designed to use Object as a universal reference type. Since Java 5 redefined as generic types, in a backward compatible way.

The main interfaces of the collections framework are reported in the top part of Figure 10.1.

The ADT interfaces can be divided into two main groups:

group containers, that collect simple objects using different strategies,
associative containers, that store an association between two objects, called key and value.

The corresponding implementation classes are summarized in the bottom part of Figure 10.1.

classDiagram
    Collection <|-- Set
    Set <|-- SortedSet
    Collection <|-- Queue
    Collection <|-- List
    Map <|-- SortedMap

    Set <|.. HashSet
    HashSet <|-- LinkedHashSet
    SortedSet <|.. TreeSet
    List <|.. ArrayList
    List <|.. LinkedList
    Queue <|.. LinkedList
    Queue <|.. PriorityQueue

    Map <|.. HashMap
    SortedMap <|.. TreeMap
    HashMap <|-- LinkedHashMap

    <<interface>> Collection
    <<interface>> Set
    <<interface>> SortedSet
    <<interface>> Queue
    <<interface>> List
    <<interface>> Map
    <<interface>> SortedMap
    
    style Collection fill:bisque,stroke:darkorange
    style Set fill:bisque,stroke:darkorange
    style SortedSet fill:bisque,stroke:darkorange
    style Queue fill:bisque,stroke:darkorange
    style List fill:bisque,stroke:darkorange
    style Map fill:bisque,stroke:darkorange
    style SortedMap fill:bisque,stroke:darkorange

Figure 10.1: Linked list structure example

Often for a single interface, multiple implementations are provided that leverage a different underlying data structure. The correspondance between interface, data structure, and implementation is summarize in Table 10.1.

Table 10.1: Correspondence between data structure and collection implementation

\(\downarrow\) structure / interface \(\rightarrow\)	Set	List	Map
Hash Table	`HashSet`		`HashMap`
Balanced Tree	`TreeSet`		`TreeMap`
Resizable Array		`ArrayList`
Linked List		`LinkedList`
Hash Table + Linked List	`LinkedHashSet`		`LinkedHashMap`

10.1 Group containers

10.1.1 Collection

The container represented by interface Collection represents a group of elements (references to objects) without any specific constraint. In particular it is not specified whether they are orderd or not, and whether the accept duplicates.

The interface Collection extends Iterable and thus can be used in the for-each construct.

All classes implementing Collection shall provide two constructors

C(): creates an empty container
C(Collection c): creates a container filled in with the same elements present in c

The main methods are:

int size(): number of elements present in the collection
boolean isEmpty(): tests if the collection is empty, i.e. size()==0
boolean add(E element): add an element to the collection
boolean addAll(Collection<? extends E> c): add all elements in c to the collection
boolean contains(E element): tests if the given element in present in the collection
boolean containsAll(Collection<?> c): tests if all elements in c are present in the collection
boolean remove(E element): removes the given element from the collection, true if succeeds
boolean removeAll(Collection<?> c): removes all elements from c from the collection, true if succeeds
void clear(): removes all elements from the collection
Object[] toArray(): creates an array containing the elements present in the collection

The methdod Object[] toArray() returns an Object even if the interface uses generics because everywhere else. This is due to generic types limitations deriving from type erasure, the method cannot instantiate an array of the correct type.

As an alternative, the overload <T> T[] toArray(T[]) accepts an array of the target type. If not large enough a new array of the same type and right length is instantiated and returned; it uses the Arrays.copyOf() method that create an array of the right type. In practice, an empty array of the correct type can be passed and let the method do the allocation.

The interface Collection has no direct implementations. Although it can be used as a more general reference type to specific container types, for instance:

1Collection<Person> persons = new LinkedList<Person>();
2persons.add( new Person(“Alice”) );
3System.out.println( persons.size() );
4Collection<Person> copy = new TreeSet<Person>();
5copy.addAll(persons);
6Person[] array= copy.toArray(new Person[0]);
7System.out.println( array[0] );

1: instantiates a new collection, implemented by LinkedList class
2: adds a new element to the collection
3: prints the size, the is 1 at this point
4: creates a new instance of a collection, implemented by TreeSet class
5: adds all elements in persons into copy, the two operations could be conflated into new TreeSet<Person>(persons)
6: creates an array containing the elements of the collection
7: prints the first (and only) element of the array

10.1.2 List

The container represented by interface List is a specialization of Collection – which it implements –. It can contain duplicate elements, in addition the insertion order is preserved and the user can define the insertion point, and most importantly the elements can be accessed by position. Extends Collection interface

The main methods of the List interface, in addition to those inherited from Collection, are:

E get(int index): retrieves the element at the given position
E set(int index, E element): replaces the element at the given position
void add(int index, E element): adds an element in a given position, can cause a shift of elements already present
E remove(int index): removes the element at the given position and returns it
int indexOf(E o): returns the index of the first occurrence of the given element
int lastIndexOf(E o): returns the index of the latest occurrence of the given element
List<E> subList(int from, int to): returns a list that contains a portion of the starting one

An example of usage of class List is:

List<Integer> l = new ArrayList<>();

1l.add(42);
2l.add(0, 13);
3l.set(0, 20);
4int a = l.get(1);
5l.add(9, 30);
6l.add(2, 30);

1: 42 in position 0
2: 42 moved to position 1
3: 13 replaced by 20
4: returns 42
5: Error: out of bounds, IndexOutOfBoundsException: Index: 9, Size: 2
6: Ok, immediately after the last element, same as add()

The main List implementations are:

ArrayList<E> it is based on a resizable array,
LinkedList<E> it is based on linked elements.

The two classes impelements the method defined in the interface and in addition provide additional methods that are convenient for the specific data structure used:

10.1.2.1 LinkedList

The LinkedList implementation uses a series of linked elements as shown in Figure 10.2. The list maintains a reference to the first element first and one to the last one last.

Figure 10.2: Linked list structure example

The add() operations adds a new element at the tail of the list as shown in Figure 10.3.

Figure 10.3: Linked list structure example

Essentially:

a new element is created, its next element being the last one,
the next element of the last is the newly created element,
the new element becomes the last.

The class LinkedList<E> provides the following additional constructors and methods that leverage the internal structure:

void addFirst(E o): adds an element in first position, i.e. l.add(0, o)
void addLast(E o): adds an element in the latest position, i.e. l.add(o)
E getFirst(): returns the first element, i.e. get(0)
E getLast(): returns the last element, i.e. l.get( l.size()-1 )
E removeFirst(): removes the first element
E removeLast(): removes the last element

10.1.2.2 Array list

The Arraylist implementation uses an array that is resized when required. An example is shown in Figure 10.4.

Figure 10.4: Array list structure example

The add() operations adds a reference to the element at the next available place in the array, as shown in Figure 10.5.

Figure 10.5: Array list structure example

When the array is full a resize operation is required as shown in Figure 10.6.

Figure 10.6: Array list structure example

In summary:

a new (larger) array is created and the existing elements are copied into it (Arrays.copyOf())
then the first new position of the array will store the reference to the element

The operation of instantiating the new array and copying the elements is very expensive, for this reason to improve performance it can be advisable to define an initial size suitable for the usage.

The class ArrayList<E> provides the following additional constructors and methods:

ArrayList(int initialCapacity): initializes the underlying array to the given initial size,
void ensureCapacity(int minCapacity): resizes the underlying array so that is has at least the given size.

10.1.3 Queue

The container represented by interface Queue is a specialization of the Collection. The elements are inserted in a specific order. In particular, it defines a head position where is the first element that can be accessed.

The typical orders are:

insertion order, i.e. FIFO (Fist-In-First-Out), or
elements’ natural order (Priority queue).

Interface Queue, in addition to those inherited from Collection, defines the following main methods:

peek(): just retrieves the element at the head of the queue
poll(): retrieves and removes the element at the head of the queue

The Queue interface has two predefined implementations:

LinkedList, in this case the head is the tail of the list, it is a FIFO;
PriorityQueue, the head is the smallest element.

An example usage of Queue that compares the two predefined implementations:

1Queue<Integer> fifo = new LinkedList<Integer>();
Queue<Integer> pq = new PriorityQueue<Integer>();
2fifo.add(3); pq.add(3);
fifo.add(1); pq.add(1);
fifo.add(2); pq.add(2);
3System.out.println(fifo.peek());
4System.out.println(pq.peek());

1: create two queues using a FIFO and priority strategies
2: adds the same elements in the same order to both queues
3: the head element of the fifo is the first added
4: the head element of the pq is the smallest

10.1.4 Set and SortedSet

The interface Set contains no additional methods w.r.t. those inherited from Collection. It serves the purpose of a flagging interface. It is intended to represents containers that have no duplicate elements. The duplication detection is demanded to the equals() method.

The classes implementing Set must comply with the following restriction on the add() method:

\[\forall e_1, e_2 \in \Sigma : e_1 \neq e_2 \implies e_1\mathtt{.equals}(e_2) == \mathtt{false}\]

The elements are traversed in no particular order

A further specialization of the Set interface is SortedSet that mandates that the elements are traversed according to the element ordering. Like its parent interface, it does not allow duplicate elements.

The interface SortedSet provides the following additional methods:

E first(): retrieved the first element in order
E last(): retrieved the last element in order
SortedSet<E> headSet(E toElement): retrieved the initial portion of elements in order until the given one
SortedSet<E> tailSet(E fromElement): retrieved the final portion of elements in order starting from the given one
SortedSet<E> subSet(E from, E to): retrieves a subset consisting of the elements in orderd comprised within the two provided

Depending on the constructor used, it is possible to use two alternative implementations of the custom ordering:

TreeSet() default constructor implies natural ordering (elements must be implementations of Comparable),
TreeSet(Comparator c) comparator constructor implies the ordering according to the comparator criteria.

There are three standard implementations:

HashSet implements Set, usus hash tables as internal data structure (faster)
LinkedHashSet extends HashSet, in addition the elements are traversed by iterator according to the insertion order
TreeSet implements SortedSet, uses R-B trees as internal data structure (computationally expensive)

10.2 Associative containers

10.2.1 Map

The interface Map represents containers that associates keys to values (e.g., SSN -> Person). Keys and values must be objects, and keys must be unique. For a given key, only one value can be stored.

The following constructors are common to all map implementers:

M(): creates an empyt map
M(Map m): creates a map containing the same entries as the given map

The Map interface provides the following main methods:

V put(K key, V value): add an entry key-value in the map
V get(K key): retrieves the value associated with the given key
V remove(K key): removes the association for the given key
boolean containsKey(K key): tests if the map contains the given key
public Set<K> keySet(): returns the set of all keys present in the map
public Collection<V> values(): returns the collection of all values present in the map
int size(): returns the number of entries present in the map
boolean isEmpty(): tests if the map is empty, i.e. size()==0
void clear(): removes all entries from the map

Example of usage of the Map interface:

1Map<String,Person> people =new HashMap<>();
2people.put( "ALCSMT",  new Person("Alice Smith") );
people.put( "RBTGRN",  new Person("Robert Green") );

3if( ! people.containsKey(©`))
    System.out.println( "Not found" );

4Person bob = people.get("RBTGRN");

5int populationSize = people.size();

6for(Person p : people.values()){
    System.out.println(p);
}

7for(String ssn : people.keySet()){
    System.out.println(ssn);
}

1: creates a new map implemented by a HashMap class
2: adds two entries in the map
3: checks if the key "RBTGRN" is present in the map
4: retrieves the person associated to key "RBTGRN"
5: retrieves the number of entries in the map, 2
6: retrieves the values in the map, i.e. the persons
7: retrieves the set fo keys in the map

A furhter specialization of Map is represented by SortedMap that mandates that its elements be traversed according to the keys’ ordering. The ordering can be based on the natural ordering (Comparable) or provided as a Comparator object passed to the constructor.

It provides the followint specialized methods:

SortedMap headMap(K toKey): creates a new sorted map the contains only initial keys, up to the given one
SortedMap tailMap(K fromKey): creates a new sorted map the contains only final keys, starting from the given one
SortedMap subMap(K fromKey, K toKey): creates a new sorted map the contains only keys within the two provided
K firstKey(): retrieves the first key
K lastKey(): retrieves the last key

The standard implementations of Map interface are

HashMap implements Map, keys have no order, but must provide an hash code
LinkedHashMap extends HashMap, key have no order, but values are traversed in insertion order
TreeMap implements SortedMap, keys are traversed in ascending key order and values are traversed in ascending order of the relative key

10.2.1.1 HashMap

Class HashTable is based on an hash table data structure. The table is automatically resized when a specific load factor reached. By default the load factor is 0.75 and the initial size is 16. These values can be changed through a constructor. The class shares part of the implementation with the HashSet class, thus most consideration here apply also to that class.

The basic structure of the HashMap is exemplified in Figure 10.7.

Figure 10.7: HashMap internal structure example

Hash table based containers HashMap and HashSet work better if entries define a suitable hashCode() method. Values must be as spread as possible, otherwise, collisions occur. A collision occurs when two entries fall in the same bucket. In such a case elements are put in a chained list, chaining reduces time efficiency. Get and put operatiosn take constant time in case of no collisions.

The chaining procedure is shown in Figure 10.8.

Figure 10.8: HashMap internal structure example

10.2.1.2 TreeMap

The class TreeMap is based on a Red-Black tree. The get and put operations takes \(log(n)\) time. Due to the structure of the tree, the keys are maintained and traversed in order. Therefore it is required that the key class (K) must be Comparable or a Comparator must be provided to the constructor.

An exemplification of the TreeMap structure is shown in Figure 10.9.

Figure 10.9: TreeMap internal structure example

10.3 Optional

The typical convention in Java APIs is to let a method return a null reference to represent the absence of a result. This approach, require that the caller must check the return value of the method to detect that case. If the caller omits – forget? – such a check, a NullPointerException (NPE) may occur. This is called the nullability problem.

The collections framework contain class Optional that is intended to mitigate the nullability problem. Class Optional is used to represent a potential value, i.e. a value that might be null.

Methods returning Optional<T> make explicit that the return value may be missing. Such a return type forces the clients to deal with potentially empty optional.

Access to the value embedded within the Optional can be done through:

boolean isPresent(): checks if Optional contains a value
T get(): returns the value if present; otherwise it throws a NoSuchElementException.
ifPresent(Consumer<T> block): executes the given code if a value is present.
T orElse(T default): returns the value if present; otherwise it returns the provided default value.
T orElse(Supplier<T> s): returns the value if present; otherwise it returns the value supplied by s

The Optional objects can be created through a set of static factory methods:

of(T v): throws exception if v is null
ofNullable(T v): returns an empty Optional when v is null
empty(): returns an empty Optional, same as ofNullable(null)

Such methods force the programmers to think about what they are about to return.

10.4 Using Collections

Note

The reccomendation is to always use the most general interface suitable for the usage in the program. E.g. List<> is better than LinkedList<>.

The motivations for such recommendation are:

more general interfaces are more flexible for future changes, i.e. a change in the specific implementing class does not affect the usage, which always goes through the general interface.
improve the programmer’s approach, makes them think first about the type of container, then about the implementation

The general rule for selecting the container type is:

if access by key is needed use a Map
- if values sorted by key use a SortedMap
otherwise use a Collection (or derived interfaces)
- if indexed access, use a List
- if access in order, use a Queue
- if no duplicates, use a Set
- if elements sorted, use a SortedSet

10.4.1 Using lists

10.4.1.1 Iteration

Since all group containers – through Collection – implements the Iterable interface, they can be used with the try-catch construct:

Iterable<Person> persons = new LinkedList<Person>();

for(Person p: persons) {
    System.out.println(p);
}

In addition Iterable defines the default method:

forEach(Consumer<? super T> action) perfoms an action on each element.

The method can be used to perform operations of elements with a functional interface:

Iterable<Person> persons = new LinkedList<Person>();

persons.forEach( p -> {
    System.out.println(p);
});

It is important to note that it is unsafe to iterate over a collection while modifying – i.e., add or remove elements – at the same time:

List<Integer> lst=new LinkedList<>();
lst.add( 10 );   
lst.add( 11 );   
lst.add( 13 ); 
lst.add( 20 );

int count = 0;
for (Iterator<?> itr = lst.iterator(); itr.hasNext(); ) {
   itr.next(); 
   if (count==1) 
1      lst.remove(count);
   count++;
}

1: a ConcurrentModificationException is raised at run-time

The correct way of modifying while iterating is by using the iterator’s own methods:

Iterator.remove(): removes the current element
ListIterator.add(): adds an element at the current position

List<Integer> lst = new LinkedList<>();
lst.add( 10 );   
lst.add( 11 );   
lst.add( 13 ); 
lst.add( 20 );

int count = 0;
for(Iterator<?> itr = lst.iterator(); itr.hasNext(); ) {
   itr.next();
   if (count==1) 
      itr.remove(); // ok
   count++;
}

10.4.1.2 Time Performance

Based on the implementation structured of the two list classes, the general time performance for the two main implementations of List are sumamrized in the following Table 10.2.

Table 10.2: Summary of time performance behavior of main List implementations.

Operation	`ArrayList`	`LinkedList`
`get(n)`	Constant	Linear
`add(0,e)`	Linear	Constant
`add()`	Constant	Constant

More in detail we can build the following analytical models for two insertion methods:

add(0,e): add in first position in list of size \(n\):

LinkedList: \(t(n) = C_L\)
ArrayList: \(t(n) = n \dot C_A\)

for(int i=0; i<n; ++i) l.add(i);: add \(n\) elements in an initially empty list

LinkedList: \(t(n) = n \cdot C_L\)
ArrayList: \(t(n) = \sum_{i=1}^n C_A \cdot i = \frac{C_A}{2}n\cdot(n-1)\)

On a MacBookPro M2, \(C_L=16.0 ns\) and \(C_A = 0.2 ns\)

10.4.2 Using maps

10.4.2.1 Getting an item

The retrieval of an element from a map should consider the case of a missing element (i.e. key not present). A first option is to check the return value of get() that returns null if the key is not found:

String val = map.get(key);
if( val == null ) val = "Undefined";

The above solution has a possible vulnerability, if the value associated with a key is exactly null a valid return value cannot be distinguished from a missing key.

Alternatively it is possible to first check the presence of the key and then retrive the value:

if( ! map.containsKey(key))
     val = "Undefined"; 
else
     val = map.get(key);

Since this kind of operation is very common, a specific default method getOrDefault() has been added to the interface, which can be used for the same purpose:

String val = map.getOrDefault(key, "Undefined");

10.4.2.2 Updating entries

Another common operation is to update the entries, i.e. changing the value associated with a key. A typical exambe is when a map is used to count the occurrences of (repeated) items in a collection. A possible example is:

Map<String,Integer> wc=new HashMap<>();
for(String w : words) {
    Integer i= wc.get(w);
    wc.put(w, i==null?1:i+1);
}

Since this operation is very commong a suitable default function compute() is provided. The method compute() accepts two argument:

the key,
a BiFunction that takes the key and value present in the map (value is null if not present) and computes the new value to be stored.

A simple alternative to the above code for counting occurrence that uses compute() can be:

Map<String,Integer> wc=new HashMap<>();
for(String w : words) {
    wc.compute(w,(k,v)->v==null?1:v+1);
}

The above code although clean and compact is not particularly efficient since it uses the autoboxing feature.

Every increment of an Integer is equivalent to Integer.valueOf(v.intValue()+1), that requires additional time for the instantiation and and additional memory fee of 16 bytes per object creation.

A much better solution would be to use a mutable object. For instance:

class Counter {
    private int i=0;
    public String toString(){
        return ":"+i; 
    }
    public void inc(){ ++i; }
}

In this case another default method can be usefule computeIfAbsent() that allows creating a Counter objects only on the first occurrence of a word and just increment the counter on every occurrence:

Map<String,Counter> wc=new HashMap<>();
for(String w : words) {
    wc.computeIfAbsent(w, k->new Counter()).inc();
}

This solution is ~40% faster than the one using Integer and requires 16 bytes less per each increment.

10.4.2.3 Keeping items sorted

Using sorted maps

SortedMap<String,Integer> wc=new TreeMap<>();

produces:

`“A”=1, “All”=3, “And"=2, “Barefoot”=1,…`

Vs. using non sorted maps:

Map<…> wc=new HashMap<>();

produces:

“reason”=1, “been”=1, “spoke”=1, “let”=1

10.4.2.4 Time performance

Example: 100k searches in a container with two different sizes, require

size	HashMap	TreeMap	ArrayList	LinkedList
100k	3ms	60ms	40s	>1h
200k	3ms	65ms	110s

10.5 Algorithms

A number of common algorithms are provided as static methods of java.util.Collections. Please note the trailing s!

Most of the algorithms work on List since it has the concept of position:

sort() - merge sort of List
binarySearch() – requires ordered sequence
shuffle() – unsort
reverse() - requires ordered sequence
rotate() – of given a distance
min(), max() – min and max in a Collection

10.5.1 Sort

The method uses the merge sort algorithm that has a \(n~\mathrm{log}(n)\) time complexity. Requires a List<E> since it needs access by index.

As the sort() method in Arrays, it has two overloads:

<T extends Comparable<? super T>> void sort(List<T> list)
<T> void sort(List<T> list, Comparator<? super T> cmp)

10.5.2 Search

<T> int binarySearch(List<? extends Comparable<? super T>> l, T key)
<T> int binarySearch(List<? extends T> l, T key, Comparator<? super T> c)

Searches the specified object, the list must be sorted into ascending order according to natural ordering or comparator’s order.

10.6 Wrap-up

The collections framework includes interfaces and classes for containers
There are two main families
- Group containers
- Associative containers
All the components of the framework are defined as generic types