Your favorite Java collection is slow!

A little history

Java collection framework is an object oriented framework for dealing with data vectors. It started with JDK 1.2 and is today is very popular. There are well-known Collection classes like

Todays status

Vector, once upon a time the only JDK-built-in dynamically growing in-heap object storage, is (almost) not used anymore. Most people are using ArrayLists for Lists and HashSets for Sets.

I’ve asked myself whether this is the optimum choice and did some performance benchmarking.

The benchmarks

In the following I’ll show you the results of three different benchmarks executed on 10, 100 and 1000 elements on the different collection types:

  • Add N elements
  • Remove N elements
  • Iterate over N elements

Add N elements

List<Long> source = ...;
List<Long> target = ...;
for (Long val : source) {
    target.add(val);
}

Please note that Collection.addAll() would do the job much better, but you’d miss all the internal buffer expansions and re-hashing that come with „normal“ use.

Remove N elements

Collection<Long> l = ...;
Iterator<Long> iter = l.iterator();
while (iter.hasNext()) {
    iter.next();
    iter.remove();
}

Please note that Collection.clear() would also do the job much better, but you’d miss all the internal buffer reorganisations.

Iterate over N elements

Collection<Long> l = ...;
Iterator<Long> iter = l.iterator();
while (iter.hasNext()) {
    iter.next();
}

Test specs

Before executing the benchmarks I ensured that the JIT had enough time to translate the Java byte code to native code.

The JDK used was 1.8.0_51 on a Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz.

The benchmark results

I’ve listed three benchmark results as charts with 10, 100 and 1000 elements. The bars depict the per-element operation time in nanoseconds. Besides the bars for add, remove and iterate there’s an „all“ bar that adds the times for the add, remove and iterate operations.

Why is the different number of elements interesting? The more elements there are, the more internal reorganization operations need to be performed. For ArrayList there’s a bigger backing array allocated every bunch of add() operations.

Benchmark with 10 elements
Benchmark with 10 elements
Benchmark with 100 elements
Benchmark with 100 elements
Benchmark with 1000 elements
Benchmark with 1000 elements

Please note that the execution times are specific to the testing machine of course and only the relative performance of the algorithms is interesting.

Discussion

As a discussion of the benchmark results, I’d like to take a look at several findings of the graphs.

  • For small number of elements (10) the performance is not dramatically different. But for 1000 objects there’s performance difference by a a factor of two possible for certain operations.
  • Vector and ArrayList perform quite comparable because their implementations are comparable. ArrayList performs better because of the lack of object synchronisation operations that is built-in in Vector.
  • Vector and ArrayList don’t do well for removing elements using the Iterator. That’s because the Iterator removes in iteration order, at the beginning. The implementations need to do to N-1 arraycopy() calls. Remove operations at the beginning are expensive, remove operations at the end are cheap.
  • LinkedList has a high per-element overhead for adding which is not surprising because of the per-element node overhead it has.
  • All Set derivates have a high overhead for adding which is probably because of the internal layout, hashing and checking for duplicates and so on.
  • ArrayDeque outperforms all other collections. This is quite interesting because ArrayDeque is not very familiar to developers. ArrayDeque is my new favorite collection if not requiring the List interface (especially List.get(int)).

Conclusion

I’ve investigated a limited aspect of the performance of several collection types. It’s difficult to say what the typical collection usage pattern in your code is. This is why every recommendation of the ‚best collection‘ will only be true for certain use-cases.

One interesting insight is that ArrayList and Iterator seem to be ‚arch-enemies‘ when it comes to the Iterator.remove() call. The same will probably be true for ArrayDeque when you start deletion with the second element, not the first.

One of the top-surprises for me was the ArrayDeque class which was a total stranger for me before. My recommendation is that you take a look at this collection implementation.