17.03.2020

Comparing some Java Decompilers

 

image001.jpg

 

In this post, we will review and compare the following four decompilers: Fernflower, CFR, Procyon, and jadx.

Disclaimer: this is neither a formal nor a scientific comparison, but rather an overview of the Java bytecode decompilers that were relevant as of autumn 2019.

Author: Anna Yaveyn, Solar appScreener developer

Background 

Our tool, Solar appScreener, is designed to search code for vulnerabilities and can analyze Java bytecode as well. However, analysis alone is not enough: the user should get results that can be integrated into the development process. You cannot just say "look at bytecode instruction No. 147 in a certain method"; you need to map the findings to source code in some way to make them actionable.

And, right away, a problem arises: what to do if no source code is available? The solution is to decompile the bytecode, find the lines corresponding to the detected vulnerabilities, and show the user error messages linked to the respective lines in the decompiled code.

Thus, we must be able to do the following two things:

·         Decompile bytecode

·         Match bytecode instructions to source code lines 

Spoiler alert: currently, no known decompilers (UPD: except Fernflower) offer the latter functionality. That’s why we have to match errors to lines in the decompiled code separately, after decompilation is completed.

This post covers the first point: the decompilation itself.
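To make the second task more concrete: it boils down to a lookup from a bytecode offset to a line number, similar in spirit to the LineNumberTable attribute of a class file. Below is a minimal sketch, assuming such a mapping has already been obtained somehow. The class LineLookup, its methods, and the sample offsets are hypothetical and say nothing about how Solar appScreener or any of the decompilers actually do it.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps the starting bytecode offset of each
// line to its line number within a single method.
class LineLookup {
    private final TreeMap<Integer, Integer> startOffsetToLine = new TreeMap<>();

    // Records that the instructions starting at `offset` belong to `line`.
    void add(int offset, int line) {
        startOffsetToLine.put(offset, line);
    }

    // Returns the line of the instruction at `offset`, or -1 if the
    // offset lies before the first recorded entry.
    int lineFor(int offset) {
        Map.Entry<Integer, Integer> entry = startOffsetToLine.floorEntry(offset);
        return entry == null ? -1 : entry.getValue();
    }

    public static void main(String[] args) {
        LineLookup lookup = new LineLookup();
        lookup.add(0, 10);   // made-up sample data
        lookup.add(7, 11);
        lookup.add(147, 14);
        System.out.println(lookup.lineFor(147));
    }
}
```

The sorted map is used because an instruction belongs to the closest entry that starts at or before its offset, which is exactly what floorEntry returns.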

What we need from a decompiler 

Different decompilers are tailored to different tasks. For example, Fernflower is positioned as an analytical decompiler. While there is no clear explanation of what this means, in theory, this decompiler should focus on deeper code analysis and deobfuscation. For us, this functionality is not very important (at least when it comes to displaying results). In general, our top priority is ensuring the resulting code is clear and readable. 

Therefore, we need our tools to:

·         Provide readable and (to the extent possible) correct code

·         Support syntactic sugar (for-each, try-with-resources, etc.)

The comparison is based on these considerations and may be irrelevant when other decompiler requirements are in place.

Tools 

For comparison, we chose four open source software tools (latest versions as of autumn 2019, when this post was written). Others were rejected at the pre-qualification stage and were not analyzed in detail. 

·         Fernflower is an open source decompiler which is currently developed and supported by JetBrains (code available on GitHub).

·         CFR (0.146) appears to be developed by a single person, who states that he does it “for fun”. The repository is on GitHub, and there is a website with project info.

·         Procyon (0.5.36) is a toolkit for Java code generation and analysis that includes a decompiler and is hosted on Bitbucket.

·         jadx (1.0.0) is a decompiler designed for Dalvik bytecode rather than JVM bytecode (source code available on GitHub).

For a full list of decompilers that were reviewed but not included in this post, follow this link.

·         JD-Core (aka JD Project) has neither a published library nor source code available (only plugins for development environments and GUI). UPD: the source code has since been made available on GitHub.

·         Krakatau is written in Python.

·         JAD is completely outdated and does not even support Java 5. 

Decompiler technical details in brief: 

 

|  | Fernflower | CFR | Procyon | jadx |
|---|---|---|---|---|
| License | Apache 2.0 | MIT | Apache 2.0 | Apache 2.0 |
| Library | non-official mirror on GitHub | Maven: org.benf.cfr | Maven: org.bitbucket.mstrobel | Bintray |
| Supported Java versions | Not specified | 8, 9 (some features) | 8 (most features) | 8 (some features) |
| Written in | Java 8 | Java 6 | Java 7 | Java 8 |
| Documentation | Unavailable | Available! | A bit | README on GitHub |

 

Keep in mind that jadx is primarily designed for Android projects. Therefore, to analyze code written for the JVM, the decompiler first converts it with the dx tool. Since this conversion is sometimes incorrect, it is impossible to reliably compare jadx with the other tools. For this reason, jadx functionality is reviewed separately most of the time.

Moreover, jadx only supports DEX up to version 37, which leads, for example, to problems with processing lambdas.

Comparison 

The comparison covered Fernflower (GitHub version of 16.09.19), CFR (0.146), Procyon (0.5.36), and jadx (1.0.0). Not all criteria were used when comparing jadx. 

The project used as the basis for comparison is Fernflower itself, since it has a rather large codebase written entirely in Java 8 with extensive use of Java features. We could not use more recent versions of Java: Procyon does not support Java 9 at all, CFR only supports some features, and there is no official information about Fernflower on this matter.

For run scripts, follow this link.

0.PNG

To view the results right away without diving deep into the details, skip to the Results section.

Metrics 

·         Project support and activity

·         Error rate when building a decompilation result

·         Speed

·         Handling of certain language features

Project support and activity

Fernflower 

On the one hand, this decompiler is used in IntelliJ IDEA, which guarantees that the project will be kept active and supported.

On the other hand, Fernflower only exists as a part of the IntelliJ IDEA project. The decompiler itself does not even have a separate repository on GitHub (only the above-mentioned non-official mirror, which is the only way to connect Fernflower to your project as a dependency).

Judging by the state of the GitHub repository, new features are rarely added to this tool: the latest commit to master took place three months ago (as of autumn 2019). It is difficult to understand what is going on with this project since the codebase is part of the IntelliJ IDEA repository.

CFR 

While the code is written by a single person, releases occur regularly (several times a year). The author responded to all reports I sent via email in a few days and fixed the bugs in a week or two. At the time of writing, a new release (0.147) had been issued and one of the reported bugs was fixed. 

In addition, the tool is evolving rather quickly and support of new features is being added without undue delays.

Procyon 

The project is supported, and a new release was issued in summer 2019. However, this release contained nothing new except for bug fixes. In general, while the project does not seem to have been abandoned, further development is very unlikely.

jadx 

This decompiler is continuously evolving and its GitHub repository and issue tracker are active. On June 20, 2019, release 1.0.0 was issued and new features and support for new DVM versions are being added regularly.

Error rate when building a decompilation result 

The comparison included running all decompilers on the same project and building decompilation results. This allowed us to understand what errors each decompiler may make and whether (and to what extent) the code after decompilation is adequate. 

We did not include jadx here since it throws 39 exceptions when decompiling fernflower.jar and therefore is unable to decompile a large part of the code at all. 

Let us note first that there are three error classes: syntax (none detected, although older CFR versions had some), semantic errors associated with types (incorrect generic type inference, methods not found, incorrect type conversions), and all other semantic errors. 

There are three reasons why type-related errors are included in a separate category:

·         Decompilers are fundamentally unable to restore the entire type information correctly.

·         These errors are more frequent than all other errors taken together.

·         Their impact on code readability is relatively low. 

In addition, in terms of type-related error rate, these decompilers are close to each other (although CFR is slightly worse than its competitors).

Therefore, non-type-related errors are much more interesting. 

 

|  | Syntax | All semantic | Type-related | Other |
|---|---|---|---|---|
| Fernflower | 0 | 101 | 65 | 36 |
| CFR | 0 | 82 | 80 | 2 |
| Procyon | 0 | 79 | 61 | 16 |

 

The code generated by Fernflower contains the highest number of such errors, with 34 out of 36 being “variable <var> is already defined”. The two CFR errors are also related to variable redefinition. In the case of Procyon, most (10 out of 16) errors occur because boolean variables are used as an array index. This is caused by incorrect processing of the ternary operator, which is discussed in more detail below.

Remarkably, CFR is the only tool that has improved within the last four months (it used to return 10 non-type-related and 72 type-related errors). This suggests that the larger number of type-related errors in CFR is a side effect of having fewer other errors: more code is now decompiled into an otherwise valid form, which leaves more room for incorrect type inference.

Speed 

Disclaimer: once again, this review does not claim to be scientific in any way. 

Speed was compared roughly: tools were run several times on medium-size .jar files, and the fastest was identified. 

Below are the results for 100 runs on a 5.2 MB .jar file (which, of course, comprises .class files only).

 

|  | Time, sec |
|---|---|
| Fernflower | 74 |
| CFR | 43 |
| Procyon | 74 |

The next table shows results for 15 runs on a 14 MB .jar file.

 

|  | Time, sec |
|---|---|
| Fernflower | 939 |
| CFR | 128 |
| Procyon | 573 |

From these results, we can assume that these decompilers use algorithms with different asymptotic behaviors. CFR performs consistently faster than competitors, while Fernflower slows down dramatically when handling large files. However, 14 MB is too much for an archive of .class files and, in reality, such large projects are rather rare. 
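To back up the remark about asymptotic behavior with some arithmetic, here is a small sketch that derives per-run times from the two tables above and compares them. It assumes that the reported times are totals over all runs (100 and 15 respectively), which is my reading of the tables rather than something measured separately; the class and method names are made up for the illustration.

```java
// Rough scaling estimate based on the two timing tables.
class ScalingEstimate {
    // Average seconds per run, assuming totalSeconds covers all runs.
    static double perRun(double totalSeconds, int runs) {
        return totalSeconds / runs;
    }

    // How many times slower one run on the large jar is than one run on the small jar.
    static double slowdown(double smallTotal, int smallRuns, double largeTotal, int largeRuns) {
        return perRun(largeTotal, largeRuns) / perRun(smallTotal, smallRuns);
    }

    public static void main(String[] args) {
        System.out.println("input size ratio: " + 14.0 / 5.2);
        System.out.println("Fernflower: " + slowdown(74, 100, 939, 15));
        System.out.println("CFR:        " + slowdown(43, 100, 128, 15));
        System.out.println("Procyon:    " + slowdown(74, 100, 573, 15));
    }
}
```

Under that assumption, the input grows about 2.7 times, while a single run becomes roughly 85 times slower for Fernflower, 52 times for Procyon, and 20 times for CFR. With only two data points and two different projects, this is an indication rather than a measurement of complexity.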

Handling of certain language features 

Here, I’ve taken some important language features and compared how well they are handled by different decompilers. 

A summary of this section is shown in the table below. Keep in mind that the results demonstrated by jadx are not completely relevant. In the next section, jadx is applied to an Android project and reviewed separately.

image003.png

Let’s begin with discussing language features which only one tool has failed to handle. 

for-each

FullInstructionSequence.java 

1.jpeg

Fernflower always expands for-each loops into iterator-based loops, albeit not entirely accurately.

For example, here handler clutters the enclosing namespace, which may cause variable redefinition. In addition, no type parameter is specified for the iterator var3, which leads to an unchecked cast in the fourth line:

2.jpeg
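For readers who cannot see the screenshots, here is an illustrative sketch of the two shapes being compared. It is not Fernflower's actual output: the class, the methods, and the data are hypothetical, and only the name var3 is borrowed from the example above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ForEachExpansion {
    // Source form: the loop variable is scoped to the loop.
    static int sumForEach(List<Integer> values) {
        int sum = 0;
        for (Integer value : values) {
            sum += value;
        }
        return sum;
    }

    // Iterator-based expansion in the style described above.
    @SuppressWarnings("rawtypes")
    static int sumExpanded(List<Integer> values) {
        int sum = 0;
        Iterator var3 = values.iterator(); // raw type: the element type is lost
        Integer value;                     // declared outside the loop, so the name leaks into the method scope
        while (var3.hasNext()) {
            value = (Integer) var3.next(); // cast needed because of the raw iterator
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(sumForEach(values) == sumExpanded(values));
    }
}
```

Both methods behave the same. The second one is simply harder to read, and a second loop in the same method that reuses the name value would no longer compile.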

Ternary operator in array indexing 

SSAConstructorSparseEx.java 

3.jpeg

This is a common and very unpleasant Procyon error. Understanding the original intent with no source code available is a non-trivial task, especially in more complicated cases:

4.jpeg
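A minimal sketch of the pattern, with hypothetical names (it is not taken from SSAConstructorSparseEx.java). My guess at the cause is that boolean values are represented as int at the bytecode level, so the ternary is presumably folded away during decompilation.

```java
class TernaryIndex {
    // Source form: the boolean selects index 1 or 0.
    static int pick(int[] pair, boolean second) {
        return pair[second ? 1 : 0];
    }

    // The erroneous form described above drops the ternary and would read
    //     return pair[second];
    // which javac rejects, because a boolean cannot be used as an array index.

    public static void main(String[] args) {
        int[] pair = {10, 20};
        System.out.println(pick(pair, true));
    }
}
```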

Static field in an interface 

IFernflowerPreferences.java

5.jpeg

This mysterious issue is only reproduced by Procyon. The error occurs because the default modifier is emitted instead of static in the getDefaults() definition.

6.jpeg
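Here is a sketch of why swapping the modifier matters. The interface name, the DEFAULTS field, and the option key are assumptions made for the illustration; only getDefaults() is named in the text above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

interface PreferencesSketch {
    // Interface fields are implicitly static, so the initializer runs in a static context.
    Map<String, Object> DEFAULTS = getDefaults();

    // Static interface method (allowed since Java 8).
    static Map<String, Object> getDefaults() {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("example.option", "1"); // hypothetical key and value
        return Collections.unmodifiableMap(defaults);
    }

    // If the method were emitted as
    //     default Map<String, Object> getDefaults() { ... }
    // the initializer of DEFAULTS would no longer compile: a default method is an
    // instance method and cannot be called from a static context.
}
```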

Other errors 

Below, we address a few more cases that turned out to be too complicated for several tools. 

For peace of mind, they are hidden under a spoiler alert.

Explicit unboxing 

VarVersionPair.java 

VarVersionsProcessor.java 

7.png

Fernflower 

The constructor call is ambiguous.

8.png

Procyon 

9.png

CFR 

10.png

dx + jadx 

Works correctly, but with excessive type conversions.

11.png

Try-with-resources 

ConsoleDecompiler.java

12.png

Fernflower

Obviously, try-with-resources is not supported at all. However, you can see how difficult it is to rewrite it via an ordinary try-catch. The result is ambiguous (:

13.png
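To give an idea of what a decompiler has to recognize here, below is a sketch of a try-with-resources statement next to a hand-written equivalent that follows the translation described in the Java Language Specification. The Resource class and both methods are hypothetical; this is not the output of any of the decompilers.

```java
class TryWithResourcesSketch {
    // A stand-in resource that remembers whether it was used and closed.
    static class Resource implements AutoCloseable {
        int uses = 0;
        boolean closed = false;

        void use() { uses++; }

        @Override
        public void close() { closed = true; }
    }

    // Source form: close() is called automatically when the block exits.
    static Resource withSugar() {
        Resource handle = new Resource();
        try (Resource res = handle) {
            res.use();
        }
        return handle;
    }

    // Hand-written equivalent of the statement above.
    static Resource expanded() {
        final Resource res = new Resource();
        Throwable primary = null;
        try {
            res.use();
        } catch (Throwable t) {
            primary = t; // remember the failure so close() errors can be attached to it
            throw t;
        } finally {
            if (res != null) {
                if (primary != null) {
                    try {
                        res.close();
                    } catch (Throwable suppressed) {
                        primary.addSuppressed(suppressed);
                    }
                } else {
                    res.close();
                }
            }
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(withSugar().closed && expanded().closed);
    }
}
```

Three lines of source turn into about twenty, which is why a decompiler that does not fold this pattern back produces such noisy output.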

CFR 

We see a regression in this decompiler: version 0.142 handled try-with-resources properly, while version 0.146 outputs an excessive try (UPD: this was fixed in later versions).

14.png

Procyon 

15.png

dx + jadx (with --show-bad-code option) 

jadx fails and proudly gives us the relevant details. 

16.png

Lambdas 

ClassReference14Processor.java

17.png

Fernflower

18.png

Lambda is handled correctly. Problems with for-each are not related to lambda and are reproduced without it (see for-each section).

Procyon

The for-each within lambda is handled incorrectly (even though for-each loop itself is usually handled by Procyon quite properly). In this case, declarations of ent and iterator2 variables were put outside the lambda, which led to a build error, since ent is not an effectively final variable.

19.png

CFR 

20.png

dx + jadx

jadx does not support some new instructions yet; a corresponding issue has been filed on GitHub.

21.png

Two initializations in one for 

SwitchInstruction.java

22.png

Fernflower

Variable i is moved to the enclosing scope, which causes a redefinition.

23.png
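A sketch of the problem with made-up code (not taken from SwitchInstruction.java): once loop variables are hoisted out of the for header, any later loop that reuses the same name collides with them.

```java
class TwoInitializers {
    // Source form: i and j are scoped to the first loop, so i can be reused.
    static int sourceForm(int[] values) {
        int sum = 0;
        for (int i = 0, j = values.length - 1; i < j; i++, j--) {
            sum += values[i] * values[j];
        }
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Hoisted form: the variables now live in the method scope.
    static int hoistedForm(int[] values) {
        int sum = 0;
        int i = 0;
        int j = values.length - 1;
        while (i < j) {
            sum += values[i] * values[j];
            i++;
            j--;
        }
        // A second "for (int i = 0; ...)" here would fail with
        // "variable i is already defined", so a fresh name is used instead.
        for (int k = 0; k < values.length; k++) {
            sum += values[k];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};
        System.out.println(sourceForm(values) == hoistedForm(values));
    }
}
```

Hoisting by itself is harmless, which matches the CFR result below; the build only breaks when the decompiler hoists a variable and then declares the same name again.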

Procyon 

24.png

CFR

Here, two variables are declared in the external scope, but no redefinition occurs. 

25.png

dx + jadx 

Similar to CFR. 

26.png

Generics

Here are a couple of simple examples from a great number of type inference errors.

ConcatenationHelper.java

27.png

Fernflower

28.png

Procyon

29.png

CFR

30.png

dx + jadx

31.png

VarTypeProcessor.java

32.png

Fernflower

Unchecked assignment in the first line.

33.png

Procyon

While this works, it comes at the cost of excessive and meaningless type conversions. 

34.png

CFR

It is not possible to compile the last line because Statement cannot be added to a list of <RootStatement> objects. 

35.png
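The CFR error above is an instance of a basic generics rule, sketched here with empty stand-in classes. Only the names Statement and RootStatement come from the example; the subclass relationship and everything else are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsSketch {
    // Empty stand-ins for the classes of the same names.
    static class Statement {}
    static class RootStatement extends Statement {}

    static int countAfterAdds() {
        List<Statement> statements = new ArrayList<>();
        statements.add(new Statement());
        statements.add(new RootStatement()); // a subtype fits into List<Statement>

        List<RootStatement> roots = new ArrayList<>();
        roots.add(new RootStatement());
        // roots.add(new Statement());
        // The line above is rejected by javac: a Statement is not a RootStatement.

        return statements.size() + roots.size();
    }

    public static void main(String[] args) {
        System.out.println(countAfterAdds());
    }
}
```

So if a decompiler infers the element type of a list too narrowly, every add of the wider type turns into a build error.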

dx + jadx

36.png

Statement.java

37.png

Fernflower 

38.png

Procyon

Compilation error in the last line.

39.png

CFR 

40.png

dx + jadx 

41.png

How jadx works on a dex file 

I have also looked at how jadx works on a real-life Android project – AntennaPod (a podcast listening app). 

The most unpleasant and unusual errors are presented here.

Static fields 

jadx regularly has problems accessing static fields. For example, it transforms bytecode compiled from the following source:

42.png

into something like this:

43.png

Lambdas and anonymous classes 

For each lambda or anonymous class, jadx generates a separate named class. For example, a simple lambda like this: 

44.png

is transformed into this: 

45.png
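Roughly, the transformation has the following shape. This is an illustrative sketch with hypothetical names, not the code from AntennaPod and not actual jadx output.

```java
import java.util.function.IntUnaryOperator;

class LambdaAsNamedClass {
    // Source form: a one-line lambda.
    static IntUnaryOperator lambdaForm() {
        return x -> x + 1;
    }

    // The same behavior written as a separate named class.
    static final class AddOne implements IntUnaryOperator {
        @Override
        public int applyAsInt(int x) {
            return x + 1;
        }
    }

    static IntUnaryOperator namedClassForm() {
        return new AddOne();
    }

    public static void main(String[] args) {
        System.out.println(lambdaForm().applyAsInt(1) == namedClassForm().applyAsInt(1));
    }
}
```

The result is correct, but every small lambda adds a class that the reader has to look up elsewhere.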

King-size horror: adding and deleting variables 

Sometimes, the number of variables sharply increases after decompilation. Before: 

46.png

After:

47.png

And sometimes, on the contrary, jadx throws out a couple of variables it doesn’t like. Before, the variables were there:

48.png

Now there are none: 

49.png


A weird situation 

How can it be that jadx turns this: 

50.png

into this: 

51.png

And once again

52.png

turns into... 

53.png

In general, we can conclude that code decompiled by jadx is not always easy to read but is quite good in terms of accuracy and the variety of constructs handled. At the same time, the rare but disastrous occasions when jadx adds 15 unnecessary variables, or expands the simplest switch-case into a three-level if-else, dramatically spoil the impression of the resulting code.

Results 

Following the comparison results, we can conclude:

CFR 

This tool prevails in terms of both code readability (it handles syntactic sugar such as for-each and try-with-resources better and produces fewer semantic errors) and speed (especially for large files). In addition, it is continuously evolving and is supported by its developer.

Weaknesses: the project is relatively young, was developed by only one person and is, presumably, rather raw. For example, while one release with a slight degradation was quickly fixed, just six months ago the resulting code contained syntax errors.

Procyon

While being more reliable and stable, the tool is barely evolving. Consequently, it lags behind CFR in terms of support of Java 9 (and newer) features. In addition, Procyon still suffers from marginal bugs (handling some ternary operators and static fields in interfaces). 

Fernflower 

Overall, the tool is not very well suited to our needs and loses to its competitors in terms of speed and result quality (at least on non-obfuscated data). On the other hand, the fact that Fernflower is used in IntelliJ IDEA means the project is likely to continue to evolve, at least in the short term.

jadx 

While it is the only good (if not the only) decompiler for Android, it is not stable: sometimes it decompiles bytecode accurately, but the result is totally unreadable. It also does not support some language features (such as try-with-resources) or some instructions from DEX versions higher than 37 (as of autumn 2019), and it cannot decompile .jar files directly: they first have to be converted with dx.

P.S. After writing this post, I found a very detailed decompiler comparison. Although it is formal and scientific in nature, the report mainly evaluates decompilers by the accuracy of the resulting code, without giving attention to readability and speed.
