_
Prep
Not so long ago, Java lovers were swept up by a mammoth addition (yes, we all know it, right?): Lambda Expressions in Java 8 (source level 1.8)! Eclipsed by lambda, and minor enough in its avatar to escape the oblivious, was another message: “should not be used as an identifier, since it is a reserved keyword from source level 1.8 on”. Only the eyes of the assiduous and the pedantic caught it. Wait, did I miss anything? Hmm... yes, just as some of us missed that this article began with _, an underscore. The underscore was quietly being removed as a legal identifier, and it was being promoted (ssshhh... the underscore does not know yet) to a more complex role. It was just another sign of the times to come. The time has come to Match Patterns in Java!
ABCs of Pattern Matching
So what is pattern matching? All the Unix fans out there will immediately remember the good old awk; Wikipedia says “AWK .. used as a data extraction...tool;… A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed.” In short: a given pattern is matched, data is extracted, and some action is taken, with or without the data. If this sounds too involved, you may be surprised to learn that we have been using it all along: the Find and Replace option we are all so familiar with.
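To make the awk description concrete, here is a tiny Java sketch of the same idea: a list of rules, each pairing a regular expression with an action, applied to every line. The class MiniAwk, its run method and the sample log lines are hypothetical names made up for illustration. This is not how awk itself is implemented; it only assumes that each rule's action receives the java.util.regex.Matcher, so it can extract the captured groups.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniAwk {
    // Hypothetical helper: for each line, tries every rule in insertion order and,
    // for each pattern that matches, runs the associated action (awk's "pattern { action }").
    static List<String> run(List<String> lines, Map<Pattern, Function<Matcher, String>> rules) {
        List<String> output = new ArrayList<>();
        for (String line : lines) {
            for (Map.Entry<Pattern, Function<Matcher, String>> rule : rules.entrySet()) {
                Matcher matcher = rule.getKey().matcher(line);
                if (matcher.find()) {                            // Match
                    output.add(rule.getValue().apply(matcher));  // Extract (via groups) and Action
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        Map<Pattern, Function<Matcher, String>> rules = new LinkedHashMap<>();
        rules.put(Pattern.compile("^ERROR (.*)"), m -> "error: " + m.group(1));
        rules.put(Pattern.compile("(\\d+) users"), m -> "users: " + m.group(1));

        List<String> lines = List.of("ERROR disk full", "INFO 3 users", "DEBUG ok");
        run(lines, rules).forEach(System.out::println);
        // prints:
        // error: disk full
        // users: 3
    }
}
```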
Let us take the example of the text “A journey of a thousand miles”, in which we want to find “miles” and replace it with “kilometres”. The pattern we are looking for is “miles”, and the data we extract is “miles” itself (in such cases the pattern and the data are the same); we then take an action on the data, i.e. replace “miles” with “kilometres”. The same first principles apply to pattern matching in Java, albeit with the added bells and whistles of supporting syntax and semantics.
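The same find-and-replace can be spelled out with java.util.regex so that the three steps are visible. This is a minimal sketch; the class FindReplace and the method milesToKilometres are illustrative names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindReplace {
    // Finds every "miles" in the text, reports it, and replaces it with "kilometres".
    static String milesToKilometres(String text) {
        Pattern pattern = Pattern.compile("miles");         // the pattern we look for
        Matcher matcher = pattern.matcher(text);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {                             // Match
            String data = matcher.group();                   // Extract: here the data equals the pattern
            System.out.println("found: " + data);
            matcher.appendReplacement(result, "kilometres"); // Action
        }
        matcher.appendTail(result);                          // copy the rest of the text unchanged
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(milesToKilometres("A journey of a thousand miles"));
        // prints:
        // found: miles
        // A journey of a thousand kilometres
    }
}
```

In everyday code, text.replace("miles", "kilometres") does the whole job in one call; the explicit loop only exists to show where matching, extraction and action happen.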
The First Step
All of us have used the instanceof check, and in most cases what we do next is cast the object to the type we just checked for, i.e.:
public void foo(Object obj) {
    if (obj instanceof Integer) {          // Match
        Integer myInteger = (Integer) obj; // Extract
        System.out.println(myInteger);     // Action
    }
}
If we look closely at the code above, we can clearly see a pattern emerging: a Match-Extract-Action sequence. Not surprisingly, pattern matching made its entry into Java through a seemingly simple improvement, the pattern instanceof, shown below:
if (obj instanceof Integer myInteger) { // Match and Extract
    System.out.println(myInteger);      // Action
}
The pattern instanceof combines the match and extract operations into one, giving the programmer more compact code. It became a standard feature in Java 16.
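For those who want to try both styles side by side, here is a small runnable sketch (Java 16 or later). The class and method names are illustrative. The last method shows one more convenience worth knowing: the binding variable introduced by the pattern is already in scope on the right-hand side of &&.

```java
public class InstanceofDemo {
    // Classic style: Match, then an explicit cast to Extract.
    static String oldStyle(Object obj) {
        if (obj instanceof Integer) {            // Match
            Integer myInteger = (Integer) obj;   // Extract
            return "Integer " + myInteger;       // Action
        }
        return "not an Integer";
    }

    // Pattern instanceof: Match and Extract in a single step.
    static String newStyle(Object obj) {
        if (obj instanceof Integer myInteger) {  // Match and Extract
            return "Integer " + myInteger;       // Action
        }
        return "not an Integer";
    }

    // The binding s is usable after && because it is only reached when the match succeeded.
    static boolean isNonEmptyString(Object obj) {
        return obj instanceof String s && !s.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(oldStyle(42));          // Integer 42
        System.out.println(newStyle(42));          // Integer 42
        System.out.println(newStyle("42"));        // not an Integer
        System.out.println(isNonEmptyString(""));  // false
    }
}
```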
Less Is More
Armed with the initial success of the pattern instanceof in the if statement, the natural next step was to figure out how to get it naturalised in a switch statement. The word “naturalised” is intentional: simply carrying the same instanceof construct over into switch would look ugly, to put it bluntly. Hence Java 17 ships a preview feature called the “Pattern Switch”. As of Java 17 it is in its first preview, so let's see how it looks:
switch (obj) {
    case Integer myInteger -> System.out.println(myInteger);
    case String myString -> System.out.println(myString);
    default -> System.out.println("Object");
}
The above should be self-explanatory, right? Instead of writing a pattern instanceof in a separate if statement for every type, we put the patterns directly into the case labels of the switch, and we call it a “Pattern Switch”. Though the syntax and semantics look simple to the programmer, having implemented this in the Eclipse Compiler for Java (ecj) here at Eclipse (and of course the javac folks did the same for javac), I can confidently say that the heavy lifting was not smooth sailing: retrofitting this into the existing switch construct without regressions weathered many a storm! Anyway, that's all done, and now we have this nice-looking pattern switch with us.
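A sketch of what the pattern switch saves us from, assuming Java 21 or later, where pattern matching for switch became a standard feature (JEP 441); on Java 17 the same code needs --enable-preview. Both method names are illustrative, and both methods compute the same result for non-null input.

```java
public class PatternSwitchDemo {
    // Before: one pattern instanceof per branch of an if/else chain.
    static String withIfChain(Object obj) {
        if (obj instanceof Integer myInteger) {
            return "Integer " + myInteger;
        } else if (obj instanceof String myString) {
            return "String " + myString;
        } else {
            return "Object";
        }
    }

    // After: the type patterns move into the case labels of a switch.
    static String withPatternSwitch(Object obj) {
        return switch (obj) {
            case Integer myInteger -> "Integer " + myInteger;
            case String myString -> "String " + myString;
            default -> "Object";
        };
    }

    public static void main(String[] args) {
        for (Object obj : new Object[] {7, "hi", 3.5}) {
            System.out.println(withIfChain(obj) + " | " + withPatternSwitch(obj));
        }
        // prints:
        // Integer 7 | Integer 7
        // String hi | String hi
        // Object | Object
    }
}
```

One behavioural difference worth keeping in mind: withIfChain(null) falls through to "Object", whereas withPatternSwitch(null) throws a NullPointerException, because a switch without a case null label rejects null.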
Traditionally, switch always had constant case labels; with pattern switches, the case labels contain types rather than constants. Can we bring the meaning of “constant-ness” to types in some way? Hmm... good thought... Looks like we already have something brewing for this. Enter Records and Sealed Types! These are involved topics in their own right, crying out for their own blog posts, which I plan to write soon; nevertheless, let us briefly glide over their surfaces for now.
Records — A case of constant Class
Records were a preview feature in Java 14 and 15 and became standard in Java 16. Suffice it to say for now that records are, practically, constant classes declared with a specific syntax, akin to a constructor, as shown below:
record R(String name, Integer age) {}
For now, just keep in mind that this compiles into a “constant” class: a final class R with two private final fields, “name” and “age”, whose values must both be supplied when an instance is created. The compiler also generates two public accessors, name() and age(), which return the values of name and age respectively (along with equals(), hashCode() and toString()). Thus, with the advent of records, we achieve constant-ness in “data”.
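To make “constant class” a little more tangible, the sketch below puts the record next to a hand-written class that behaves roughly like it (Java 16 or later). RClass is a hypothetical name for illustration only; the code the compiler generates for a record differs in details, and the generated hashCode and toString are not guaranteed to match the ones written here.

```java
import java.util.Objects;

public class RecordDemo {
    // The record from the text: two components, name and age.
    record R(String name, Integer age) {}

    // A rough hand-written counterpart, for comparison only.
    static final class RClass {
        private final String name;  // private final fields, set once in the constructor
        private final Integer age;

        RClass(String name, Integer age) {
            this.name = name;
            this.age = age;
        }

        public String name() { return name; }  // accessor named after the field
        public Integer age() { return age; }

        @Override
        public boolean equals(Object o) {       // equality by value, not identity
            return o instanceof RClass other
                    && Objects.equals(name, other.name)
                    && Objects.equals(age, other.age);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }

        @Override
        public String toString() {
            return "RClass[name=" + name + ", age=" + age + "]";
        }
    }

    public static void main(String[] args) {
        R r = new R("Ada", 36);
        System.out.println(r.name() + " " + r.age());         // Ada 36
        System.out.println(r.equals(new R("Ada", 36)));       // true
        RClass c = new RClass("Ada", 36);
        System.out.println(c.equals(new RClass("Ada", 36)));  // true
    }
}
```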
Seal The Types
There is one more dimension to constant-ness in types. Normally, anyone can keep deriving new subtypes from a type. From a compiler's point of view, especially a pattern switch's, this means there must always be a “default” case arm to catch whatever slips past the other patterns. If we really want constant-ness here, that is, all subtypes known at compile time so that we can enumerate them in code, we need to somehow “seal” the hierarchy and “permit” only the subtypes we choose. From Java 17 onwards, this can be achieved with the “sealed” and “permits” combination shown below:
sealed interface I permits Y, Z, R {}
final class Y implements I {}
final class Z implements I {}
record R(String name, Integer age) implements I {}
“sealed” and “permits” are contextual keywords: they have a special meaning only in class or interface declarations. Here we seal the hierarchy of the interface I at compile time and permit only the classes Y and Z and the record R to implement it, which gives the compiler extra power for exhaustiveness checks. Every permitted subclass must itself be declared final, sealed or non-sealed; notice that “final” is missing from R, since a record is implicitly final, so the final modifier is optional there.
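Since Java 17, the sealed hierarchy is also visible at runtime through reflection, via Class.isSealed() and Class.getPermittedSubclasses(). The sketch below nests the declarations from the text inside a hypothetical SealedDemo class so that the file is self-contained; the commented-out line shows a declaration the compiler would reject.

```java
import java.util.Arrays;

public class SealedDemo {
    // The hierarchy from the text, nested here so the file compiles on its own.
    sealed interface I permits Y, Z, R {}
    static final class Y implements I {}
    static final class Z implements I {}
    record R(String name, Integer age) implements I {}  // implicitly final, no modifier needed

    // static final class W implements I {}  // would not compile: W is not in the permits clause

    public static void main(String[] args) {
        System.out.println(I.class.isSealed());           // true
        Arrays.stream(I.class.getPermittedSubclasses())   // the classes named in permits
              .map(Class::getSimpleName)
              .forEach(System.out::println);
    }
}
```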
Putting It All Together
Now, let us use this newfound knowledge in a pattern switch and see how it looks when we switch on the interface I:
public void foo(I myInterface) {
    switch (myInterface) {
        case Y y -> System.out.println(y);
        case Z z -> System.out.println(z);
        case R r -> System.out.println(r.name());
    }
}
A careful reader will observe that the default arm is missing, since we enumerate all the cases; this is similar to covering all the values of an enum, and the essence is the same.
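Here is a runnable version of foo, written as a switch expression so that it returns a value; it assumes Java 21 or later (or Java 17 with --enable-preview). The names ExhaustiveSwitchDemo and describe are illustrative. If we added a fourth class to the permits clause without a matching case, the switch would no longer be exhaustive and the compiler would reject it, which is exactly the safety net a default arm would have hidden.

```java
public class ExhaustiveSwitchDemo {
    sealed interface I permits Y, Z, R {}
    static final class Y implements I {}
    static final class Z implements I {}
    record R(String name, Integer age) implements I {}

    // No default arm: Y, Z and R are the only possible implementations of I,
    // so the compiler can verify that the switch covers every case.
    static String describe(I myInterface) {
        return switch (myInterface) {
            case Y y -> "Y";
            case Z z -> "Z";
            case R r -> "R named " + r.name();
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Z()));           // Z
        System.out.println(describe(new R("Ada", 36)));  // R named Ada
    }
}
```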
Future — Not so Distant.
Are we satisfied? Nice try! We want more. Nay, we want less: less code for extraction. So what can we do? Look at the code above: in the case of the record, we are not using R as it is; we only extract its name. Can we do better? We can, if we are able to put “name” and “age” in the case label itself, as shown below:
case R(String name, Integer age) -> System.out.println(name);
Now the extraction is simpler: we just use “name”. But hold on, this is still only at the discussion stage, as part of the Record and Array Patterns JEP (JDK Enhancement Proposal). I mention it here so that we get an idea of what the future holds for pattern matching.
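For the record (pun intended), record patterns did eventually become a standard feature in Java 21 (JEP 440), with each component in the pattern carrying a type, written explicitly or inferred with var. A minimal sketch assuming Java 21 or later; RecordPatternDemo and its methods are illustrative names.

```java
public class RecordPatternDemo {
    sealed interface I permits Y, R {}
    static final class Y implements I {}
    record R(String name, Integer age) implements I {}

    // The record pattern R(String name, Integer age) matches an R and
    // extracts both components in the case label itself.
    static String describe(I myInterface) {
        return switch (myInterface) {
            case Y y -> "Y";
            case R(String name, Integer age) -> name + " is " + age;
        };
    }

    // Record patterns work with instanceof too; var lets the compiler infer component types.
    static String nameOf(Object obj) {
        if (obj instanceof R(var name, var age)) {
            return name;
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(describe(new R("Ada", 36)));  // Ada is 36
        System.out.println(nameOf(new R("Bob", 7)));     // Bob
        System.out.println(nameOf("not a record"));      // unknown
    }
}
```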
Oh UnderScore! Wherefore art Thou?
We have come to the end of this article, but as mentioned earlier, each of these features deserves separate article(s) to do justice to its nuances, and I promise to enhance your boredom by writing them soon. Before you fall into a deep slumber, let us check on our forgotten hero, the underscore (_). Where does this little fellow fit into the whole story?
Anyone who has tried Python will know that an underscore can be used to signal that we are not interested in a variable. If you don't know Python, that's fine: this is just an unabashed attempt by Yours Truly to show that he knows more than he does, given that he has been learning Java itself for almost a decade, at a pace that gives the laziest snail some competition! So, cutting the blah-blah, what is the possible use of _ here? In the code in the previous section, we use only “name”; “age” was added just to make the syntax complete.
case R(String name, _) -> System.out.println(name);
What if we could just put an underscore, as in the code above, in place of the variables we don't use or care about? Again, this is expected in the future for similar situations, so take it with a pinch of salt: until the feature becomes standard, we cannot make tall claims about its actual avatar. Suffice it to say that when you see an underscore, please remember that there is this whole story behind it. Never underestimate the power of the “_”!
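As it turned out, the underscore did get its new role: unnamed patterns and variables were previewed in Java 21 (JEP 443) and became standard in Java 22 (JEP 456). The sketch below assumes Java 22 or later; UnnamedDemo and its methods are illustrative names.

```java
import java.util.List;

public class UnnamedDemo {
    record R(String name, Integer age) {}

    // The unnamed pattern _ matches the age component without binding it.
    static String nameOnly(Object obj) {
        return switch (obj) {
            case R(String name, _) -> name;
            default -> "not an R";
        };
    }

    // An unnamed variable: we only count the elements, we never read them.
    static int countItems(Iterable<?> items) {
        int count = 0;
        for (var _ : items) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(nameOnly(new R("Ada", 36)));          // Ada
        System.out.println(countItems(List.of("a", "b", "c")));  // 3
    }
}
```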
Thanks for being with me so far!