Fundamental Java Classes with Generics

Vlad Patryshev, 02/07/2005 (with an 2007 update for version 6)

Introduction

Java already has a rich library that deals with abstract types like Set, List (aka tuple), Map; and it seems like there is not much need to add more. Nevertheless, for years I've been adding classes and methods to my private library. Then Java generics went out of the closet - and I have discovered that with generics one can make miracles... I mean, develop and start using really generic and mathematically sound classes. So I refactored my library to generics, and here it is. Below I will use Java features and classes from version 6.

Collections Revisited

Users, both beginners and those that are too busy to upgrade, often mix interfaces and abstract classes from Collections framework with their concrete implementations that combine abstract ideas with specific strategies: for instance, List has two implementation strategies, LinkedList and ArrayList. The fact that it is linked or array-ed, has nothing to do with its core functionality; implementations can vary, and what's important from the client point of view is that they are all Lists - and another thing is important: performance; but we definitely need to separate performance concerns from presentation.

So, having this in mind, here's the current hierarchy of fundamental classes in Java:

Iterable

It is an abstraction of an enumerable entity; all it has is iterator() that returns an Iterator to scan through components. It does not mean that elements are in a certain order - there is a big conceptual difference between having a certain order and an existence of an order in which elements could be listed, with none of such orders being an attribute of the entity.
We cannot know the number of elements in an Iterable; in principle, we could define boolean isEmpty() as
{ return iterator().hasNext(); }, but in many cases this definition can be costly. For instance, the Iterable can be a resource on the internet, each next() returning a byte - checking its emptiness would involve opening a connection. It would be interesting, on the other hand, to to have Iterable File and URL readers.
It would be interesting to find out whether there is an analog of Iterable in mathematics.

Collection

It is an Iterable that has knows its size, can have methods for adding and deleting elements, check whether an element is a member of the collection, and which can be converted to an array - using the fact that its size is known in advance. No exact match in mathematics; a bag, or a multiset, can be considered as a model if we could take Collections out of the hierarchy and treat them as a separate entity.

List

It is a Collection where elements have a linear order. List is known as Sequence in IDL, and as tuple in mathematics. It has the first element, the last element, and can have methods for inserting and deleting elements. Two lists are declared equal if they have the same size and all its consecutive elements are equal.

Set

It is a Collection where elements do not repeat. Sets are modeled after sets in Set Theory: two sets are equal if they contain the same elements (Extensionality Axiom). Sets do not have any additional operations; the only additional method equals() makes them distinct from other possible types of collections.

SortedSet

It is a special kind of Set that is linearly ordered. SortedSet in Java extends Set - which basically means that order is used only for storing data. While two SortedSets that have the same elements but a different comparator are equal(), it is impossible to addAll() from one SortedSet to another unless their comparators are equal().

Map

Map represents a function defined on a set and taking values of a certain type. Technically speaking, a Map is fully determined by a Set of its key-value pairs. A Map can return a value for a specified key, the whole set of its keys, keySet() and a collection of its values; this collection is neither a set (can have repeating values) nor a List (order does not matter). A Map can have methods for adding and removing key-value pairs.

Foundation Classes

The classes reviewed above were enough for Java Before Tiger. Now, with generics, more can be offered to effectively and efficiently write Java code using collections and avoiding 'for' loops and plain arrays, two legacy constructs that we inherited from Fortran and C. Below I'll talk about two foundation classes and three utility classes with a bunch of static methods for dealing with collections. The most fundamental notion in mathematics is a...

Function

In Java, we can define interface Function<X,Y>: it has just one method, Y apply(X x). An objectivised function can be handled now as an entity: we can compose and invert functions, we can apply them to collections, we can instantiate functions from other objects.
There is a big difference between a Function and a Map: a Function is defined on a type, and a Map is defined on a Set. If we have a Map m, we can define a Function function(m): it will return m.get(x) for any X x, but to get a Map from a Function, we will need a Set of keys. Also, equals() is defined for Maps, but for Functions it is very impractical to define equals(): we would need to scan through the whole range of type values to check whether two Functions are equal.

Actually, in the package com.myjavatools.lib.foundation, Function.java is not an interface, but an abstract class with one abstract method, apply(). Other methods provide the functionality I wrote above: function factory static Function<X,Y> function(Map<X,Y> map); Map<X,Y> toMap(Set<X> keys) - a Map view of the function, with Set<X> keys as keys; Map<X,Y> toMap(Collection<X> keys).

List<Y> apply(List<X>) - this one returns a List view of function values on the provided list. Notice, it is a view, and the user will want to use a copy constructor to avoid recalculating function values every time an element is retrieved; on the other hand, the user is free to deal with this List view, and avoid creating new objects. Saving memory does not seem to be a big issue these days, but memory management, creation and disposal of objects may take too many resources.
Similar methods Iterable<Y> apply(Iterable<X>) and Iterator<Y> apply(Iterator<X>) apply a Function to an Iterable and an Iterator.

Function has two methods for composition, one is static, to apply this function after the parameter function: f.compose(g) is a function that returns f.apply(g.apply(arg)); another one is a static method, which looks closer to classical mathematics:
<X,Y,Z> Function <X,Z> compose(Function<X,Y> f, Function<Y,Z> g)

Filter

A Filter, a.k.a. predicate, is a Function<X,Boolean>, that is, a function that returns logical values. One could argue regarding overusing "boolean", but Java does not give much of a choice. Filter has an alias for apply(): accept(). Two methods in this class do the actual data filtration, the same way grep function does: Iterator<T> filter(Iterator<T>) and Iterable<T> filter(Iterable<T>). The latter is, of course, a view Iterable, not a new array or something; the user is free to create a "hard copy" of the filtered Iterable, when necessary.

Pair

This is an implementation of Map.Entry<X,Y> interface, only key and value are named left and right components; with the addition of swap() method. Theoretically, Pair<X,Y> could serve as an alternative to List<X> implementation of the notion of tuple from mathematics.

Utility Foundation Classes

Iterators

This class contains several static cat() methods that concatenate collections. Namely, cat(Arrays.asList("abc", "def"), Arrays.asList("ghi", "jkl")) will return a virtual Collection that contains the four strings. This method concatenates Collections; similarly one can concatenate Iterables or Iterators.

Maps

This class contains two groups of methods in this class: creational and functional.
Creational methods build maps from materials that the user supplies: Map<X,Y> toMap(X key1, Y value1, X key2, Y key2) and the like will create a map that contains just these two keys; you can pass an array with keys and values (odd elements are keys, even elements are values), and toMap() will create a Map based on this array. Another toMap() takes a vararg array of Map.Entry instances and creates a virtual Map based on this array. Neither of these methods cares to copy arrays; they are all just views.
There is a caveat regarding using these creational methods: it is your responsibility to guarantee that all keys are distinct; it is a contract of Map interface that keys in the map are all distinct, and can form a Set. None of the creational methods checks whether keys are distinct. Generally speaking, no existing implementation of Set or Map interface, except probably UnmodifiableMap and UnmodifiableCollection, can guarantee this distinction. When you add elements to a Set, they are all distinct - but what if you change the elements later? There are no observers to react to such changes.
The purpose of functional methods is to implement standard or almost standard operations on Maps. Map<X,Z> compose(Map<X,Y>, Map<Y,Z>) returns a virtual Map which is the two Maps' composition. Four map() methods apply a Map to a List, a Collection, an Iterable and an Iterator respectively. Map<X,Y> restrict(Map<X,Y>, Collection<X>) restricts a Map to a Collection.

The following functional methods do create Set or Map instances. So far there is no factory method that would return the right Map or Set instance; I just use LinkedHashMap and HashSet as an ultimate container choice. This area definitely need some development.
Set<X> resolve(Map<X,Y> map, Y y) returns a Set of X such that map.get(x).equals(y).
Map<Y,X> inverse(Map<X,Y>) returns a new map that is inverse to the original map; if none exists, an exception is thrown. Another method, revert(Map<X,Y>) revert a Map even if it is not an one-to-one. It returns, for a Map<X,Y> f it returns a Map<Y,Set<X>> g such that for each Y y in g.keySet() the value of g.get(y) is a Set<Y> of all those X x that f.get(x).equals(y).

Example of Usage

In the following sample program the files in the specified directory are grouped by their types, as defined in the provided list.


package com.myjavatools.lib.foundation;

import java.io.*;
import java.util.*;

public class Sample1 {

  public static void main(String[] args) {
    String folderName = args.length < 1 ? "." : args[0];

    Map<String,String> fileToType =
        Maps.toMap("gif",  "image",
                   "jpg",  "image",
                   "jpeg", "image",
                   "png",  "image",
                   "java", "source code",
                   "cpp",  "source code",
                   "hpp",  "source code",
                   "class","binary",
                   "obj",  "binary",
                   "exe",  "binary",
                   "dll",  "library",
                   "lib",  "library",
                   "so",   "library",
                   "sl",   "library");

    Function<File,String> extension = new Function<File,String>() {
      public String apply(File file) {
        String name = file.getName();
        return name.substring(name.lastIndexOf('.') + 1);
      }
    };

    // the function returns file type for a file

    Function<File,String> fileType = Function.function(fileToType).compose(extension);

    // the list of files in the folder
    List<File> contents = Arrays.asList(new File(folderName).listFiles());

    // the same files grouped by their file types

    // only during this operation a new container is created.
    Map<String, Set<File>> filesGroupedByType =
        Maps.revert(fileType.toMap(contents));

    for (String type : filesGroupedByType.keySet()) {
      System.out.println(type + ":");
      for (File file : filesGroupedByType.get(type)) {
        System.out.println("  " + file);
      }
    }
  }
}

Conclusion

Although these simple classes are very versatile, and very non-intrusive if used correctly, that is, in a natural and simple way, I have a feeling that this is only the beginning of a long way to switching from XIX century programming style - "matrices with indexes" to the more advanced "operator" style.
Here are the links:
documentation,
source code jar,
binary jar,
the full archive: source, docs, tests, project file for JBuilder.

Everything is free, as in free thought. I am thankful to the great Borland's JBuilder team for creating the beautiful JBuilder 2005 that I've been using while developing this. Yes, I was a Borland employee for 7 years - but this seems to be the first time I thank my colleagues.