classDiagram
Collection <|-- Set
Set <|-- SortedSet
Collection <|-- Queue
Collection <|-- List
Map <|-- SortedMap
Set <|.. HashSet
HashSet <|-- LinkedHashSet
SortedSet <|.. TreeSet
List <|.. ArrayList
List <|.. LinkedList
Queue <|.. LinkedList
Queue <|.. PriorityQueue
Map <|.. HashMap
SortedMap <|.. TreeMap
HashMap <|-- LinkedHashMap
<<interface>> Collection
<<interface>> Set
<<interface>> SortedSet
<<interface>> Queue
<<interface>> List
<<interface>> Map
<<interface>> SortedMap
style Collection fill:bisque,stroke:darkorange
style Set fill:bisque,stroke:darkorange
style SortedSet fill:bisque,stroke:darkorange
style Queue fill:bisque,stroke:darkorange
style List fill:bisque,stroke:darkorange
style Map fill:bisque,stroke:darkorange
style SortedMap fill:bisque,stroke:darkorange
10 Collections Framework
The collections framework consists in a set of related classes contained in the package java.util that provide:
- Interfaces for several ADTs (Abstract Data Types)
- Classes implementating ADTs
- Algorithms (sort)
Originally the were designed to use Object as a universal reference type. Since Java 5 redefined as generic types, in a backward compatible way.
The main interfaces of the collections framework are reported in the top part of Figure 10.1.
The ADT interfaces can be divided into two main groups:
- group containers, that collect simple objects using different strategies,
- associative containers, that store an association between two objects, called key and value.
The corresponding implementation classes are summarized in the bottom part of Figure 10.1.
Often for a single interface, multiple implementations are provided that leverage a different underlying data structure. The correspondance between interface, data structure, and implementation is summarize in Table 10.1.
| \(\downarrow\) structure / interface \(\rightarrow\) | Set | List | Map |
|---|---|---|---|
| Hash Table | HashSet |
HashMap |
|
| Balanced Tree | TreeSet |
TreeMap |
|
| Resizable Array | ArrayList |
||
| Linked List | LinkedList |
||
| Hash Table + Linked List | LinkedHashSet |
LinkedHashMap |
10.1 Group containers
10.1.1 Collection
The container represented by interface Collection represents a group of elements (references to objects) without any specific constraint. In particular it is not specified whether they are orderd or not, and whether the accept duplicates.
The interface Collection extends Iterable and thus can be used in the for-each construct.
All classes implementing Collection shall provide two constructors
C(): creates an empty containerC(Collection c): creates a container filled in with the same elements present inc
The main methods are:
int size(): number of elements present in the collectionboolean isEmpty(): tests if the collection is empty, i.e.size()==0boolean add(E element): add an element to the collectionboolean addAll(Collection<? extends E> c): add all elements incto the collectionboolean contains(E element): tests if the given element in present in the collectionboolean containsAll(Collection<?> c): tests if all elements incare present in the collectionboolean remove(E element): removes the given element from the collection,trueif succeedsboolean removeAll(Collection<?> c): removes all elements fromcfrom the collection,trueif succeedsvoid clear(): removes all elements from the collectionObject[] toArray(): creates an array containing the elements present in the collection
The methdod Object[] toArray() returns an Object even if the interface uses generics because everywhere else. This is due to generic types limitations deriving from type erasure, the method cannot instantiate an array of the correct type.
As an alternative, the overload <T> T[] toArray(T[]) accepts an array of the target type. If not large enough a new array of the same type and right length is instantiated and returned; it uses the Arrays.copyOf() method that create an array of the right type. In practice, an empty array of the correct type can be passed and let the method do the allocation.
The interface Collection has no direct implementations. Although it can be used as a more general reference type to specific container types, for instance:
- 1
-
instantiates a new collection, implemented by
LinkedListclass - 2
- adds a new element to the collection
- 3
-
prints the size, the is
1at this point - 4
-
creates a new instance of a collection, implemented by
TreeSetclass - 5
-
adds all elements in
personsintocopy, the two operations could be conflated intonew TreeSet<Person>(persons) - 6
- creates an array containing the elements of the collection
- 7
- prints the first (and only) element of the array
10.1.2 List
The container represented by interface List is a specialization of Collection – which it implements –. It can contain duplicate elements, in addition the insertion order is preserved and the user can define the insertion point, and most importantly the elements can be accessed by position. Extends Collection interface
The main methods of the List interface, in addition to those inherited from Collection, are:
E get(int index): retrieves the element at the given positionE set(int index, E element): replaces the element at the given positionvoid add(int index, E element): adds an element in a given position, can cause a shift of elements already presentE remove(int index): removes the element at the given position and returns itint indexOf(E o): returns the index of the first occurrence of the given elementint lastIndexOf(E o): returns the index of the latest occurrence of the given elementList<E> subList(int from, int to): returns a list that contains a portion of the starting one
An example of usage of class List is:
- 1
- 42 in position 0
- 2
- 42 moved to position 1
- 3
- 13 replaced by 20
- 4
- returns 42
- 5
-
Error: out of bounds,
IndexOutOfBoundsException: Index: 9, Size: 2 - 6
-
Ok, immediately after the last element, same as
add()
The main List implementations are:
ArrayList<E>it is based on a resizable array,LinkedList<E>it is based on linked elements.
The two classes impelements the method defined in the interface and in addition provide additional methods that are convenient for the specific data structure used:
10.1.2.1 LinkedList
The LinkedList implementation uses a series of linked elements as shown in Figure 10.2. The list maintains a reference to the first element first and one to the last one last.
The add() operations adds a new element at the tail of the list as shown in Figure 10.3.
Essentially:
- a new element is created, its next element being the last one,
- the next element of the last is the newly created element,
- the new element becomes the last.
The class LinkedList<E> provides the following additional constructors and methods that leverage the internal structure:
void addFirst(E o): adds an element in first position, i.e.l.add(0, o)void addLast(E o): adds an element in the latest position, i.e.l.add(o)E getFirst(): returns the first element, i.e.get(0)E getLast(): returns the last element, i.e.l.get( l.size()-1 )E removeFirst(): removes the first elementE removeLast(): removes the last element
10.1.2.2 Array list
The Arraylist implementation uses an array that is resized when required. An example is shown in Figure 10.4.
The add() operations adds a reference to the element at the next available place in the array, as shown in Figure 10.5.
When the array is full a resize operation is required as shown in Figure 10.6.
In summary:
- a new (larger) array is created and the existing elements are copied into it (
Arrays.copyOf()) - then the first new position of the array will store the reference to the element
The operation of instantiating the new array and copying the elements is very expensive, for this reason to improve performance it can be advisable to define an initial size suitable for the usage.
The class ArrayList<E> provides the following additional constructors and methods:
ArrayList(int initialCapacity): initializes the underlying array to the given initial size,void ensureCapacity(int minCapacity): resizes the underlying array so that is has at least the given size.
10.1.3 Queue
The container represented by interface Queue is a specialization of the Collection. The elements are inserted in a specific order. In particular, it defines a head position where is the first element that can be accessed.
The typical orders are:
- insertion order, i.e. FIFO (Fist-In-First-Out), or
- elements’ natural order (Priority queue).
Interface Queue, in addition to those inherited from Collection, defines the following main methods:
peek(): just retrieves the element at the head of the queuepoll(): retrieves and removes the element at the head of the queue
The Queue interface has two predefined implementations:
LinkedList, in this case the head is the tail of the list, it is a FIFO;PriorityQueue, the head is the smallest element.
An example usage of Queue that compares the two predefined implementations:
- 1
- create two queues using a FIFO and priority strategies
- 2
- adds the same elements in the same order to both queues
- 3
-
the head element of the
fifois the first added - 4
-
the head element of the
pqis the smallest
10.1.4 Set and SortedSet
The interface Set contains no additional methods w.r.t. those inherited from Collection. It serves the purpose of a flagging interface. It is intended to represents containers that have no duplicate elements. The duplication detection is demanded to the equals() method.
The classes implementing Set must comply with the following restriction on the add() method:
\[\forall e_1, e_2 \in \Sigma : e_1 \neq e_2 \implies e_1\mathtt{.equals}(e_2) == \mathtt{false}\]
The elements are traversed in no particular order
A further specialization of the Set interface is SortedSet that mandates that the elements are traversed according to the element ordering. Like its parent interface, it does not allow duplicate elements.
The interface SortedSet provides the following additional methods:
E first(): retrieved the first element in orderE last(): retrieved the last element in orderSortedSet<E> headSet(E toElement): retrieved the initial portion of elements in order until the given oneSortedSet<E> tailSet(E fromElement): retrieved the final portion of elements in order starting from the given oneSortedSet<E> subSet(E from, E to): retrieves a subset consisting of the elements in orderd comprised within the two provided
Depending on the constructor used, it is possible to use two alternative implementations of the custom ordering:
TreeSet()default constructor implies natural ordering (elements must be implementations ofComparable),TreeSet(Comparator c)comparator constructor implies the ordering according to the comparator criteria.
There are three standard implementations:
HashSetimplementsSet, usus hash tables as internal data structure (faster)LinkedHashSetextendsHashSet, in addition the elements are traversed by iterator according to the insertion orderTreeSetimplementsSortedSet, uses R-B trees as internal data structure (computationally expensive)
10.2 Associative containers
10.2.1 Map
The interface Map represents containers that associates keys to values (e.g., SSN -> Person). Keys and values must be objects, and keys must be unique. For a given key, only one value can be stored.
The following constructors are common to all map implementers:
M(): creates an empyt mapM(Map m): creates a map containing the same entries as the given map
The Map interface provides the following main methods:
V put(K key, V value): add an entrykey-valuein the mapV get(K key): retrieves the value associated with the givenkeyV remove(K key): removes the association for the givenkeyboolean containsKey(K key): tests if the map contains the givenkeypublic Set<K> keySet(): returns the set of all keys present in the mappublic Collection<V> values(): returns the collection of all values present in the mapint size(): returns the number of entries present in the mapboolean isEmpty(): tests if the map is empty, i.e.size()==0void clear(): removes all entries from the map
Example of usage of the Map interface:
1Map<String,Person> people =new HashMap<>();
2people.put( "ALCSMT", new Person("Alice Smith") );
people.put( "RBTGRN", new Person("Robert Green") );
3if( ! people.containsKey(©`))
System.out.println( "Not found" );
4Person bob = people.get("RBTGRN");
5int populationSize = people.size();
6for(Person p : people.values()){
System.out.println(p);
}
7for(String ssn : people.keySet()){
System.out.println(ssn);
}- 1
-
creates a new map implemented by a
HashMapclass - 2
- adds two entries in the map
- 3
-
checks if the key
"RBTGRN"is present in the map - 4
-
retrieves the person associated to key
"RBTGRN" - 5
- retrieves the number of entries in the map, 2
- 6
- retrieves the values in the map, i.e. the persons
- 7
- retrieves the set fo keys in the map
A furhter specialization of Map is represented by SortedMap that mandates that its elements be traversed according to the keys’ ordering. The ordering can be based on the natural ordering (Comparable) or provided as a Comparator object passed to the constructor.
It provides the followint specialized methods:
SortedMap headMap(K toKey): creates a new sorted map the contains only initial keys, up to the given oneSortedMap tailMap(K fromKey): creates a new sorted map the contains only final keys, starting from the given oneSortedMap subMap(K fromKey, K toKey): creates a new sorted map the contains only keys within the two providedK firstKey(): retrieves the first keyK lastKey(): retrieves the last key
The standard implementations of Map interface are
HashMapimplementsMap, keys have no order, but must provide an hash codeLinkedHashMapextendsHashMap, key have no order, but values are traversed in insertion orderTreeMapimplementsSortedMap, keys are traversed in ascending key order and values are traversed in ascending order of the relative key
10.2.1.1 HashMap
Class HashTable is based on an hash table data structure. The table is automatically resized when a specific load factor reached. By default the load factor is 0.75 and the initial size is 16. These values can be changed through a constructor. The class shares part of the implementation with the HashSet class, thus most consideration here apply also to that class.
The basic structure of the HashMap is exemplified in Figure 10.7.
Hash table based containers HashMap and HashSet work better if entries define a suitable hashCode() method. Values must be as spread as possible, otherwise, collisions occur. A collision occurs when two entries fall in the same bucket. In such a case elements are put in a chained list, chaining reduces time efficiency. Get and put operatiosn take constant time in case of no collisions.
The chaining procedure is shown in Figure 10.8.
10.2.1.2 TreeMap
The class TreeMap is based on a Red-Black tree. The get and put operations takes \(log(n)\) time. Due to the structure of the tree, the keys are maintained and traversed in order. Therefore it is required that the key class (K) must be Comparable or a Comparator must be provided to the constructor.
An exemplification of the TreeMap structure is shown in Figure 10.9.
10.3 Optional
The typical convention in Java APIs is to let a method return a null reference to represent the absence of a result. This approach, require that the caller must check the return value of the method to detect that case. If the caller omits – forget? – such a check, a NullPointerException (NPE) may occur. This is called the nullability problem.
The collections framework contain class Optional that is intended to mitigate the nullability problem. Class Optional is used to represent a potential value, i.e. a value that might be null.
Methods returning Optional<T> make explicit that the return value may be missing. Such a return type forces the clients to deal with potentially empty optional.
Access to the value embedded within the Optional can be done through:
boolean isPresent(): checks ifOptionalcontains a valueT get(): returns the value if present; otherwise it throws aNoSuchElementException.ifPresent(Consumer<T> block): executes the given code if a value is present.T orElse(T default): returns the value if present; otherwise it returns the provideddefaultvalue.T orElse(Supplier<T> s): returns the value if present; otherwise it returns the value supplied bys
The Optional objects can be created through a set of static factory methods:
of(T v): throws exception ifvisnullofNullable(T v): returns an emptyOptionalwhenvisnullempty(): returns an emptyOptional, same asofNullable(null)
Such methods force the programmers to think about what they are about to return.
10.4 Using Collections
The reccomendation is to always use the most general interface suitable for the usage in the program. E.g. List<> is better than LinkedList<>.
The motivations for such recommendation are:
- more general interfaces are more flexible for future changes, i.e. a change in the specific implementing class does not affect the usage, which always goes through the general interface.
- improve the programmer’s approach, makes them think first about the type of container, then about the implementation
The general rule for selecting the container type is:
- if access by key is needed use a
Map- if values sorted by key use a
SortedMap
- if values sorted by key use a
- otherwise use a
Collection(or derived interfaces)- if indexed access, use a
List - if access in order, use a
Queue - if no duplicates, use a
Set - if elements sorted, use a
SortedSet
- if indexed access, use a
10.4.1 Using lists
10.4.1.1 Iteration
Since all group containers – through Collection – implements the Iterable interface, they can be used with the try-catch construct:
Iterable<Person> persons = new LinkedList<Person>();
for(Person p: persons) {
System.out.println(p);
}In addition Iterable defines the default method:
forEach(Consumer<? super T> action)perfoms an action on each element.
The method can be used to perform operations of elements with a functional interface:
Iterable<Person> persons = new LinkedList<Person>();
persons.forEach( p -> {
System.out.println(p);
});It is important to note that it is unsafe to iterate over a collection while modifying – i.e., add or remove elements – at the same time:
List<Integer> lst=new LinkedList<>();
lst.add( 10 );
lst.add( 11 );
lst.add( 13 );
lst.add( 20 );
int count = 0;
for (Iterator<?> itr = lst.iterator(); itr.hasNext(); ) {
itr.next();
if (count==1)
1 lst.remove(count);
count++;
}- 1
-
a
ConcurrentModificationExceptionis raised at run-time
The correct way of modifying while iterating is by using the iterator’s own methods:
Iterator.remove(): removes the current elementListIterator.add(): adds an element at the current position
List<Integer> lst = new LinkedList<>();
lst.add( 10 );
lst.add( 11 );
lst.add( 13 );
lst.add( 20 );
int count = 0;
for(Iterator<?> itr = lst.iterator(); itr.hasNext(); ) {
itr.next();
if (count==1)
itr.remove(); // ok
count++;
}10.4.1.2 Time Performance
Based on the implementation structured of the two list classes, the general time performance for the two main implementations of List are sumamrized in the following Table 10.2.
List implementations.
| Operation | ArrayList |
LinkedList |
|---|---|---|
get(n) |
Constant | Linear |
add(0,e) |
Linear | Constant |
add() |
Constant | Constant |
More in detail we can build the following analytical models for two insertion methods:
add(0,e): add in first position in list of size \(n\):
LinkedList: \(t(n) = C_L\)ArrayList: \(t(n) = n \dot C_A\)
for(int i=0; i<n; ++i) l.add(i);: add \(n\) elements in an initially empty list
LinkedList: \(t(n) = n \cdot C_L\)ArrayList: \(t(n) = \sum_{i=1}^n C_A \cdot i = \frac{C_A}{2}n\cdot(n-1)\)
On a MacBookPro M2, \(C_L=16.0 ns\) and \(C_A = 0.2 ns\)
10.4.2 Using maps
10.4.2.1 Getting an item
The retrieval of an element from a map should consider the case of a missing element (i.e. key not present). A first option is to check the return value of get() that returns null if the key is not found:
String val = map.get(key);
if( val == null ) val = "Undefined";The above solution has a possible vulnerability, if the value associated with a key is exactly null a valid return value cannot be distinguished from a missing key.
Alternatively it is possible to first check the presence of the key and then retrive the value:
if( ! map.containsKey(key))
val = "Undefined";
else
val = map.get(key);Since this kind of operation is very common, a specific default method getOrDefault() has been added to the interface, which can be used for the same purpose:
String val = map.getOrDefault(key, "Undefined");10.4.2.2 Updating entries
Another common operation is to update the entries, i.e. changing the value associated with a key. A typical exambe is when a map is used to count the occurrences of (repeated) items in a collection. A possible example is:
Map<String,Integer> wc=new HashMap<>();
for(String w : words) {
Integer i= wc.get(w);
wc.put(w, i==null?1:i+1);
}Since this operation is very commong a suitable default function compute() is provided. The method compute() accepts two argument:
- the key,
- a
BiFunctionthat takes the key and value present in the map (value isnullif not present) and computes the new value to be stored.
A simple alternative to the above code for counting occurrence that uses compute() can be:
Map<String,Integer> wc=new HashMap<>();
for(String w : words) {
wc.compute(w,(k,v)->v==null?1:v+1);
}The above code although clean and compact is not particularly efficient since it uses the autoboxing feature.
Every increment of an Integer is equivalent to Integer.valueOf(v.intValue()+1), that requires additional time for the instantiation and and additional memory fee of 16 bytes per object creation.
A much better solution would be to use a mutable object. For instance:
class Counter {
private int i=0;
public String toString(){
return ":"+i;
}
public void inc(){ ++i; }
}In this case another default method can be usefule computeIfAbsent() that allows creating a Counter objects only on the first occurrence of a word and just increment the counter on every occurrence:
Map<String,Counter> wc=new HashMap<>();
for(String w : words) {
wc.computeIfAbsent(w, k->new Counter()).inc();
}This solution is ~40% faster than the one using Integer and requires 16 bytes less per each increment.
10.4.2.3 Keeping items sorted
Using sorted maps
SortedMap<String,Integer> wc=new TreeMap<>();produces:
`“A”=1, “All”=3, “And"=2, “Barefoot”=1,…`
Vs. using non sorted maps:
Map<…> wc=new HashMap<>();produces:
“reason”=1, “been”=1, “spoke”=1, “let”=1
10.4.2.4 Time performance
Example: 100k searches in a container with two different sizes, require
| size | HashMap | TreeMap | ArrayList | LinkedList |
|---|---|---|---|---|
| 100k | 3ms | 60ms | 40s | >1h |
| 200k | 3ms | 65ms | 110s |
10.5 Algorithms
A number of common algorithms are provided as static methods of java.util.Collections. Please note the trailing s!
Most of the algorithms work on List since it has the concept of position:
sort()- merge sort ofListbinarySearch()– requires ordered sequenceshuffle()– unsortreverse()- requires ordered sequencerotate()– of given a distancemin(),max()– min and max in aCollection
10.5.1 Sort
The method uses the merge sort algorithm that has a \(n~\mathrm{log}(n)\) time complexity. Requires a List<E> since it needs access by index.
As the sort() method in Arrays, it has two overloads:
<T extends Comparable<? super T>> void sort(List<T> list)<T> void sort(List<T> list, Comparator<? super T> cmp)
10.5.2 Search
<T> int binarySearch(List<? extends Comparable<? super T>> l, T key)<T> int binarySearch(List<? extends T> l, T key, Comparator<? super T> c)
Searches the specified object, the list must be sorted into ascending order according to natural ordering or comparator’s order.
10.6 Wrap-up
- The collections framework includes interfaces and classes for containers
- There are two main families
- Group containers
- Associative containers
- All the components of the framework are defined as generic types