Category archive: Java

Java command line parsing libraries compared

This is a small comparison of command line parsing libraries in Java. It is not meant to be complete, but it should help readers find the best library for their needs.

What types are there?

There are basically two types of libraries. Neither type is better than the other; which one you like is a matter of personal preference.

The two types are explained in the next two subsections.

Builder style

The following example shows the builder style approach.

import org.apache.commons.cli.Options;

// create Options object
Options options = new Options();

// add t option
options.addOption("t", false, "display current time");

The possible command line parameters are defined by building an object tree. In the example, a generic Options object is created and the option -t is added with a method call. The second argument (hasArg) is false, so -t is a flag that takes no value.

After parsing, the command line options are queried from a generic command line holder object:

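import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;

// parse() throws the checked org.apache.commons.cli.ParseException on invalid input
CommandLineParser parser = new DefaultParser();
CommandLine cmd = parser.parse(options, args);

if (cmd.hasOption("t")) {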
    // print the date and time
}

The example shows the very popular library Apache Commons CLI.
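To make the builder idea concrete without any dependency, here is a minimal, hypothetical sketch of what a builder-style library does at its core: options are registered in an object model first, then the arguments are parsed into a generic result holder that is queried by name. The classes MiniOptions and MiniCommandLine and their methods are invented for illustration and are not the Commons CLI API. Only -name flags and -name value pairs are supported; every other token is kept as a positional argument.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical builder-style parser: options are registered in code, then parsed.
final class MiniOptions {
    private final Map<String, Boolean> takesValue = new HashMap<>();

    MiniOptions addFlag(String name) {          // e.g. -t
        takesValue.put(name, false);
        return this;                            // returning this allows chaining
    }

    MiniOptions addValueOption(String name) {   // e.g. -groups unit
        takesValue.put(name, true);
        return this;
    }

    MiniCommandLine parse(String[] args) {
        Set<String> flags = new HashSet<>();
        Map<String, String> values = new HashMap<>();
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String name = arg.startsWith("-") ? arg.substring(1) : null;
            if (name == null || !takesValue.containsKey(name)) {
                rest.add(arg);                  // not a registered option: keep as positional
            } else if (!takesValue.get(name)) {
                flags.add(name);
            } else if (i + 1 < args.length) {
                values.put(name, args[++i]);    // consume the next token as the value
            } else {
                throw new IllegalArgumentException("Missing value for -" + name);
            }
        }
        return new MiniCommandLine(flags, values, rest);
    }
}

// Generic result holder, queried by option name after parsing.
final class MiniCommandLine {
    private final Set<String> flags;
    private final Map<String, String> values;
    private final List<String> rest;

    MiniCommandLine(Set<String> flags, Map<String, String> values, List<String> rest) {
        this.flags = flags;
        this.values = values;
        this.rest = rest;
    }

    boolean hasOption(String name) { return flags.contains(name) || values.containsKey(name); }
    String getValue(String name) { return values.get(name); }
    List<String> getArgs() { return rest; }
}

final class MiniCliDemo {
    public static void main(String[] args) {
        MiniCommandLine cmd = new MiniOptions()
                .addFlag("t")
                .addValueOption("groups")
                .parse(new String[] { "-t", "-groups", "unit", "file.txt" });
        System.out.println(cmd.hasOption("t"));     // prints "true"
        System.out.println(cmd.getValue("groups")); // prints "unit"
        System.out.println(cmd.getArgs());          // prints "[file.txt]"
    }
}
```

The key trait of the builder style is visible in MiniCliDemo: nothing about the options exists until code builds it, and the results come back as strings in a generic holder rather than as typed fields.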

Annotation style

The following example shows the annotation style approach:

import com.beust.jcommander.Parameter;
import java.util.ArrayList;
import java.util.List;

public class Args {
  @Parameter
  private List<String> parameters = new ArrayList<>();

  @Parameter(names = { "-log", "-verbose" }, description = "Level of verbosity")
  private Integer verbose = 1;

  @Parameter(names = "-groups", description = "Comma-separated list of group names to be run")
  private String groups;

  @Parameter(names = "-debug", description = "Debug mode")
  private boolean debug = false;
}

The possible command line parameters are Java fields that are annotated with additional information for the command line parsing library.

The library writes the parsing results into an instance of the annotated class itself (in this case Args):

Args args = new Args();
String[] argv = { "-log", "2", "-groups", "unit" };
JCommander.newBuilder()
  .addObject(args)
  .build()
  .parse(argv);

Assert.assertEquals(args.verbose.intValue(), 2);

The example shows the quite popular JCommander library.
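How does a library get from annotations to filled-in fields? The following hypothetical, JDK-only sketch shows the basic mechanism with reflection. A custom @Opt annotation (an invented stand-in for JCommander's @Parameter, not its real implementation) is kept at runtime. For each token, the parser looks up the field whose names match and writes the converted value into it. Only String, int/Integer and boolean fields are handled, and unknown tokens are rejected instead of being collected as main parameters.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical annotation; RUNTIME retention makes it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Opt {
    String[] names();
    String description() default "";
}

final class MiniAnnotationParser {
    /** Writes parsed values directly into the @Opt-annotated fields of target. */
    static <T> T parse(T target, String[] args) {
        for (int i = 0; i < args.length; i++) {
            Field field = findField(target.getClass(), args[i]);
            if (field == null) {
                throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
            field.setAccessible(true);                 // also allows writing private fields
            Class<?> type = field.getType();
            if (type == boolean.class || type == Boolean.class) {
                set(field, target, Boolean.TRUE);      // flags take no value
            } else if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            } else if (type == int.class || type == Integer.class) {
                set(field, target, Integer.valueOf(args[++i]));
            } else if (type == String.class) {
                set(field, target, args[++i]);
            } else {
                throw new IllegalArgumentException("Unsupported field type: " + type);
            }
        }
        return target;
    }

    private static Field findField(Class<?> cls, String token) {
        for (Field field : cls.getDeclaredFields()) {
            Opt opt = field.getAnnotation(Opt.class);
            if (opt == null) {
                continue;
            }
            for (String name : opt.names()) {
                if (name.equals(token)) {
                    return field;
                }
            }
        }
        return null;
    }

    private static void set(Field field, Object target, Object value) {
        try {
            field.set(target, value);                  // unboxes for primitive fields
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot write " + field.getName(), e);
        }
    }
}

// Same shape as the Args class above, using the hypothetical @Opt annotation.
// Fields are package-private only so the demo can read them directly.
final class MiniArgs {
    @Opt(names = { "-log", "-verbose" }, description = "Level of verbosity")
    Integer verbose = 1;

    @Opt(names = "-groups", description = "Comma-separated list of group names to be run")
    String groups;

    @Opt(names = "-debug", description = "Debug mode")
    boolean debug = false;
}

final class MiniAnnotationDemo {
    public static void main(String[] argv) {
        MiniArgs args = MiniAnnotationParser.parse(new MiniArgs(), new String[] { "-log", "2", "-groups", "unit" });
        System.out.println(args.verbose + " " + args.groups + " " + args.debug); // prints "2 unit false"
    }
}
```

Default values such as verbose = 1 survive when an option is absent because the parser only touches fields whose names appear in the arguments.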

Comparison table

The following table shows a selection of libraries that can be downloaded and are more or less actively maintained.

I have not included libraries that have obviously not been maintained for several years.

Library             License     Type        Liveness             Min JDK  Artifact
Args4J              MIT         Annotation  Medium (10 months)   6        artifact
JCommander          Apache 2.0  Annotation  High (20 days)       7        artifact
Apache Commons CLI  Apache 2.0  Builder     Medium (8 months)    5        artifact
JArgs               BSD-3       Builder     Very low (5 years)   5        ?
JOptSimple          MIT         Builder     Medium (3 months)    8        artifact
JewelCLI            Apache 2.0  Annotation  Very low (3 years)   6        artifact
PicoCLI             Apache 2.0  Annotation  High (5 days)        5        artifact

About the columns

The following is the description of the columns used in the table.

  • Library: The name of the library, linking to its development site.
  • License: The software license the library is developed under.
  • Type: The API approach this library takes. Annotation means the library uses Java annotations to declare the command line options. Builder means the user needs to actively build a command line object model in code.
  • Liveness: How alive is this library? When was the last release or accepted pull request? The value in parentheses is the time since then. A library that has not changed for a long time is unlikely to receive support or bug fixes.
  • Min JDK: The minimum JDK version needed to use this library. If you are forced to use an old JDK, this can be very important.
  • Artifact: A link to an artifact usable by build tools like Maven or Gradle. Including a plain jar file in a Maven-based project can be awkward.

What else could be interesting?

  • Test coverage of the code: The more tests there are, the more stable and robust the library is likely to be.
  • Number of dependencies: If disk footprint matters to you, stick to a library with few or no dependencies.

Recommendation

I won't recommend a specific library to you. First decide which libraries you can use at all, based on the hard facts (min JDK, license, liveness). Choosing between the two types is a matter of taste.

Annotation based

I personally prefer annotation-based libraries and have contributed to the args4j project in the past. Unfortunately, it looks like its author is busy with other things at the moment. This is why I chose JCommander, a very lively project, for the annotation example.

PicoCLI looks like a promising newcomer with very good documentation. With features such as colored command line output, it doesn't look very "pico" to me; "mini" would fit better.

Builder based

Apache Commons CLI is the indestructible evergreen among the builder-style libraries and has been used by thousands of projects.

Further reading

I’ve started an open source project called args2all that reads annotated classes for multiple libraries and generates Markdown or manpage documentation for them.

Your favorite Java collection is slow!

A little history

The Java Collections Framework is an object-oriented framework for working with groups of objects. It was introduced in JDK 1.2 and is very popular today. There are well-known Collection classes like

Today's status

Vector, once the only dynamically growing in-heap object container built into the JDK, is (almost) not used anymore. Most people use ArrayList for lists and HashSet for sets.

I asked myself whether this is the optimal choice and ran some performance benchmarks.

The benchmarks

In the following I'll show you the results of three benchmarks, each run with 10, 100 and 1000 elements on different collection types:

  • Add N elements
  • Remove N elements
  • Iterate over N elements

Add N elements

List<Long> source = ...;
List<Long> target = ...;
for (Long val : source) {
    target.add(val);
}

Note that Collection.addAll() would do the job more efficiently, but you would miss all the internal buffer expansions and rehashing that come with "normal" use.

Remove N elements

Collection<Long> l = ...;
Iterator<Long> iter = l.iterator();
while (iter.hasNext()) {
    iter.next();
    iter.remove();
}

Note that Collection.clear() would also do the job more efficiently, but you would miss all the internal buffer reorganizations.

Iterate over N elements

Collection<Long> l = ...;
Iterator<Long> iter = l.iterator();
while (iter.hasNext()) {
    iter.next();
}

Test specs

Before executing the benchmarks I ensured that the JIT had enough time to translate the Java byte code to native code.

The JDK used was 1.8.0_51 on an Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz.
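Here is a minimal sketch of how the three operations above could be timed together, with a warm-up phase that gives the JIT time to compile the loops. This is not the author's benchmark code. The class CollectionBench, the round counts and the set of collection types are assumptions for illustration. Hand-rolled System.nanoTime() loops are also prone to JIT effects such as dead-code elimination, and a single round of 10 elements is close to the timer's resolution, so a dedicated harness like JMH is the more reliable tool for real measurements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Vector;
import java.util.function.Supplier;

final class CollectionBench {
    /** One round of add, iterate, then iterator-remove; returns ns per element {add, remove, iterate}. */
    static double[] round(Supplier<Collection<Long>> factory, List<Long> source) {
        Collection<Long> target = factory.get();
        long t0 = System.nanoTime();
        for (Long val : source) {
            target.add(val);
        }
        long t1 = System.nanoTime();
        Iterator<Long> iter = target.iterator();
        while (iter.hasNext()) {
            iter.next();
        }
        long t2 = System.nanoTime();
        iter = target.iterator();
        while (iter.hasNext()) {
            iter.next();
            iter.remove();
        }
        long t3 = System.nanoTime();
        if (!target.isEmpty()) {
            throw new IllegalStateException("remove phase left elements behind");
        }
        double n = source.size();
        return new double[] { (t1 - t0) / n, (t3 - t2) / n, (t2 - t1) / n };
    }

    /** Runs discarded warm-up rounds (JIT compilation), then averages the measured rounds. */
    static double[] measure(Supplier<Collection<Long>> factory, int n, int warmup, int rounds) {
        List<Long> source = new ArrayList<>();
        for (long i = 0; i < n; i++) {
            source.add(i);
        }
        for (int r = 0; r < warmup; r++) {
            round(factory, source);
        }
        double[] avg = new double[3];
        for (int r = 0; r < rounds; r++) {
            double[] t = round(factory, source);
            for (int k = 0; k < 3; k++) {
                avg[k] += t[k] / rounds;
            }
        }
        return avg;
    }

    public static void main(String[] args) {
        Map<String, Supplier<Collection<Long>>> types = new LinkedHashMap<>();
        types.put("Vector", Vector::new);
        types.put("ArrayList", ArrayList::new);
        types.put("LinkedList", LinkedList::new);
        types.put("HashSet", HashSet::new);
        types.put("ArrayDeque", ArrayDeque::new);
        for (int n : new int[] { 10, 100, 1000 }) {
            for (Map.Entry<String, Supplier<Collection<Long>>> e : types.entrySet()) {
                double[] t = measure(e.getValue(), n, 2_000, 2_000); // round counts are arbitrary
                System.out.printf("%-10s n=%4d add=%7.1f remove=%7.1f iterate=%7.1f ns/element%n",
                        e.getKey(), n, t[0], t[1], t[2]);
            }
        }
    }
}
```

Iteration runs before removal in each round because the remove phase empties the collection. Summing the three per-element values gives the equivalent of the "all" bar.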

The benchmark results

I've listed three benchmark results as charts with 10, 100 and 1000 elements. The bars depict the per-element operation time in nanoseconds. Besides the bars for add, remove and iterate there's an "all" bar that adds up the times for the add, remove and iterate operations.

Why does the number of elements matter? The more elements there are, the more internal reorganizations need to be performed. ArrayList, for example, allocates a larger backing array and copies the old contents over every time the current array is full.
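The number of these reallocations can be worked out. The sketch below models the growth rule of OpenJDK 8's ArrayList, which is an assumption about the JDK used. With new ArrayList<>(), the first add() allocates 10 slots, and after that each full array is replaced by one about 50% larger (new = old + old / 2). Other JDK versions or list types may behave differently. ArrayListGrowthModel is a hypothetical helper that only counts; it does not use ArrayList itself.

```java
final class ArrayListGrowthModel {
    static final int DEFAULT_CAPACITY = 10;

    /** Backing-array allocations needed to add n elements one by one to new ArrayList<>(). */
    static int allocations(int n) {
        int capacity = 0;          // new ArrayList<>() starts with a shared empty array
        int allocations = 0;
        for (int size = 0; size < n; size++) {
            int minCapacity = size + 1;
            if (minCapacity > capacity) {
                int newCapacity = capacity + (capacity >> 1);              // grow by about 50%
                if (newCapacity < minCapacity) {
                    newCapacity = Math.max(DEFAULT_CAPACITY, minCapacity); // first add: 10 slots
                }
                capacity = newCapacity;
                allocations++;     // each growth allocates a new array and copies the old one
            }
        }
        return allocations;
    }

    public static void main(String[] args) {
        for (int n : new int[] { 10, 100, 1000 }) {
            System.out.println(n + " elements -> " + allocations(n) + " allocations");
        }
    }
}
```

For 10, 100 and 1000 elements this model gives 1, 7 and 13 allocations. The capacities run 10, 15, 22, 33, 49, 73, 109, 163, 244, 366, 549, 823, 1234. So the reorganization cost is spread over more growth steps as the element count rises.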

[Chart: Benchmark with 10 elements]
[Chart: Benchmark with 100 elements]
[Chart: Benchmark with 1000 elements]

Note that the absolute execution times are of course specific to the test machine; only the relative performance of the collections is of interest.

Discussion

In the following, I'd like to discuss several findings from the charts.

  • For a small number of elements (10), performance does not differ dramatically. For 1000 elements, however, certain operations differ by up to a factor of two.
  • Vector and ArrayList perform comparably because their implementations are similar. ArrayList is faster because it lacks the synchronization built into Vector's methods.
  • Vector and ArrayList do poorly when removing elements via the Iterator. That's because the Iterator removes in iteration order, i.e. always at the beginning of the backing array, so every removal shifts all remaining elements one slot to the left. Removing N elements this way takes N-1 arraycopy() calls that move N(N-1)/2 elements in total. Remove operations at the beginning are expensive, remove operations at the end are cheap.
  • LinkedList has a high per-element cost for adding, which is not surprising because it allocates a separate node object for every element.
  • All Set implementations have a high overhead for adding, probably because of their internal layout, hashing, duplicate checks and so on.
  • ArrayDeque outperforms all other collections. This is quite interesting because few developers are familiar with ArrayDeque. It is my new favorite collection whenever I don't need the List interface (especially List.get(int)).
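The removal finding can be checked with a little arithmetic. In an array-backed list, removing index 0 from a list of size s shifts the remaining s - 1 elements. Emptying the list front to back therefore shifts (N-1) + (N-2) + ... + 0 = N(N-1)/2 elements, which is 499,500 for N = 1000, while removing from the end shifts none. The sketch below assumes ArrayList's usual array-copy behavior on remove(int). RemovalOrder is a hypothetical helper showing both orders, where the back-to-front variant uses a ListIterator that starts at list.size().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

final class RemovalOrder {
    /** Front to back, as in the benchmark: each remove() shifts the whole tail left. */
    static void removeFromFront(List<Long> list) {
        Iterator<Long> iter = list.iterator();
        while (iter.hasNext()) {
            iter.next();
            iter.remove();
        }
    }

    /** Back to front: each remove() hits the last index, so nothing is shifted. */
    static void removeFromBack(List<Long> list) {
        ListIterator<Long> iter = list.listIterator(list.size());
        while (iter.hasPrevious()) {
            iter.previous();
            iter.remove();
        }
    }

    /** Element moves an array-backed list needs when n elements are removed front to back. */
    static long elementsShiftedFromFront(int n) {
        long moved = 0;
        for (int size = n; size > 0; size--) {
            moved += size - 1;     // removing index 0 moves the remaining size - 1 elements
        }
        return moved;
    }

    public static void main(String[] args) {
        List<Long> list = new ArrayList<>();
        for (long i = 0; i < 1000; i++) {
            list.add(i);
        }
        removeFromBack(list);
        System.out.println(list.isEmpty());                  // prints "true"
        System.out.println(elementsShiftedFromFront(1000));  // prints "499500"
    }
}
```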

Conclusion

I've investigated a limited aspect of the performance of several collection types. It's difficult to say what the typical collection usage pattern in your code is. This is why every recommendation of the 'best collection' will only be true for certain use cases.

One interesting insight is that ArrayList and Iterator seem to be 'arch-enemies' when it comes to the Iterator.remove() call. The same will probably be true for ArrayDeque when you start deletion with the second element, not the first.

One of the biggest surprises for me was the ArrayDeque class, which I did not know at all before. My recommendation is that you take a look at this collection implementation.
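If you want to try ArrayDeque, a typical replacement is a list that is only appended to, iterated and consumed from the front. The hypothetical helper below is a sketch of that pattern through the Deque interface: addLast() plays the role of List.add(), and pollFirst() removes from the head by advancing an index instead of shifting elements. Two documented differences to keep in mind are that ArrayDeque has no index access like get(int) and rejects null elements. That is why pollFirst() returning null can safely signal an empty deque.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class ArrayDequeSketch {
    /** Adds all values at the tail, then consumes them from the head (FIFO order). */
    static List<Long> fifo(List<Long> values) {
        Deque<Long> deque = new ArrayDeque<>();    // resizable circular array
        for (Long val : values) {
            deque.addLast(val);                     // append at the tail, like List.add()
        }
        List<Long> consumed = new ArrayList<>();
        Long head;
        // pollFirst() returns null once the deque is empty; this is unambiguous
        // because ArrayDeque does not accept null elements.
        while ((head = deque.pollFirst()) != null) {
            consumed.add(head);                     // removing at the head shifts no elements
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(fifo(Arrays.asList(1L, 2L, 3L))); // prints "[1, 2, 3]"
    }
}
```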