In this post, we will review and compare the following four decompilers: Fernflower, CFR, Procyon, and jadx.
Disclaimer: this is neither a formal nor a scientific comparison, but rather an overview of the Java bytecode decompilers that were relevant as of autumn 2019.
Author: Anna Yaveyn, Solar appScreener developer
Our tool, Solar appScreener, is designed to search code for vulnerabilities and can analyze Java bytecode as well. However, analysis itself is not enough: the user should get results that can be integrated into the development process. So, you cannot just say "look at bytecode instruction No. 147 in a certain method"; you need to map the findings to source code in some way to make them actionable.
And, right away, a problem arises: what to do if no source code is available? The solution is to decompile the bytecode, find the lines corresponding to the detected vulnerabilities, and show the user error messages linked to the respective lines in the decompiled code.
Thus, we must be able to do the following two things:
· Decompile a bytecode
· Match bytecode instructions to source code lines
Spoiler alert: currently, no known decompilers (UPD: except Fernflower) offer the latter functionality. That’s why we have to match errors to lines in a decompiled code separately, after decompilation is completed.
In this post, I will cover the first point: decompilation itself.
Different decompilers are tailored to different tasks. For example, Fernflower is positioned as an analytical decompiler. While there is no clear explanation of what this means, in theory, this decompiler should focus on deeper code analysis and deobfuscation. For us, this functionality is not very important (at least when it comes to displaying results). In general, our top priority is ensuring the resulting code is clear and readable.
Therefore, we need our tools to:
· Provide readable and (to the extent possible) correct code
· Support syntactic sugar (for-each, try-with-resources, etc.)
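To make the second requirement concrete, here is a small hand-written sketch of the two constructs named above. The class and method names are hypothetical and are not taken from any of the reviewed projects. A decompiler that supports this sugar should ideally give back code close to this form rather than explicit iterators and nested try blocks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical example class, used only to illustrate the two constructs.
class SugarExample {
    // for-each: in bytecode this becomes an explicit Iterator loop.
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    // try-with-resources: in bytecode this becomes try/catch blocks
    // that close the reader on every exit path.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```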
The comparison is based on these considerations and may be irrelevant when other decompiler requirements are in place.
For comparison, we chose four open source software tools (latest versions as of autumn 2019, when this post was written). Others were rejected at the pre-qualification stage and were not analyzed in detail.
· Fernflower is an open source decompiler which is currently developed and supported by JetBrains (code available on GitHub).
· CFR (0.146) decompiler seems to have been developed by a single person who states that he has done it “for fun”. Repository on GitHub. Website with project info.
· Procyon (0.5.36) is a toolkit for Java code generation and analysis that includes a decompiler and is hosted on Bitbucket.
· jadx (1.0.0) is a decompiler designed for Dalvik bytecode rather than JVM bytecode (source code available on GitHub).
For a full list of decompilers that were reviewed but not included in this post, follow this link.
· JD-Core (aka JD Project) has neither a published library nor source code available (only plugins for development environments and a GUI). UPD: the source code was made available on GitHub.
· Krakatau is written in Python.
· JAD is completely outdated and does not even support Java 5.
Decompiler technical details in brief:
| | Fernflower | CFR | Procyon | jadx |
|---|---|---|---|---|
| License | Apache 2.0 | MIT | Apache 2.0 | Apache 2.0 |
| Library | | Maven: org.benf.cfr | Maven: org.bitbucket.mstrobel | |
| Supported Java versions | Not specified | 8, 9 (some features) | 8 (most features) | 8 (some features) |
| Written in | Java 8 | Java 6 | Java 7 | Java 8 |
Documentation |
Unavailable |
Keep in mind that jadx is primarily designed for Android projects. Therefore, to analyze code written for the JVM, the decompiler first converts it with the dx tool. Since this conversion is sometimes incorrect, it is impossible to reliably compare jadx with the other tools. For this reason, jadx functionality is reviewed separately most of the time.
Moreover, jadx only supports DEX up to version 37, which causes problems with, for example, lambda processing.
The comparison covered Fernflower (GitHub version of 16.09.19), CFR (0.146), Procyon (0.5.36), and jadx (1.0.0). Not all criteria were used when comparing jadx.
The project used as the basis for comparison is Fernflower itself, since it has a rather large codebase written entirely in Java 8 with extensive use of Java features. We could not use more recent versions of Java: Procyon does not support Java 9 at all, CFR only supports some features, and there is no official information about Fernflower on this matter.
For run scripts, follow this link.
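To view the results right away without diving deep into the details, skip to the conclusions at the end of this post. The decompilers were compared on the following criteria: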
· Project support and activity
· Error rate when building a decompilation result
· Speed
· Handling of certain language features
On the one hand, Fernflower is used in IntelliJ IDEA, which guarantees that the project will be kept active and supported.
On the other hand, Fernflower only operates as a part of the IntelliJ IDEA project. The decompiler itself does not even have a separate repository on GitHub (only the above-mentioned unofficial mirror, the link to which is the only way to connect Fernflower to your project as a dependency).
Judging by the state of the GitHub repository, new features are rarely added to this tool, with the latest commit to master taking place three months ago (as of autumn 2019). It is difficult to understand what is going on with this project since the codebase is part of the IntelliJ IDEA repository.
While CFR is written by a single person, releases occur regularly (several times a year). The author responded to all the reports I sent via email within a few days and fixed the bugs in a week or two. At the time of writing, a new release (0.147) had been issued and one of the reported bugs was fixed.
In addition, the tool is evolving rather quickly and support of new features is being added without undue delays.
Procyon is supported, and a new release was issued in summer 2019. However, this release contained nothing new except for bug fixes. In general, while the project does not seem to have been abandoned, further development is very unlikely.
jadx is continuously evolving, and its GitHub repository and issue tracker are active. On June 20, 2019, release 1.0.0 was issued, and new features and support for new DVM versions are being added regularly.
The comparison included running all decompilers on the same project and building decompilation results. This allowed us to understand what errors each decompiler may make and whether (and to what extent) the code after decompilation is adequate.
We did not include jadx here since it throws 39 exceptions when decompiling fernflower.jar and is therefore unable to decompile a large part of the code at all.
Let us note first that there are three error classes: syntax (none detected, although older CFR versions had some), semantic errors associated with types (incorrect generic type inference, methods not found, incorrect type conversions), and all other semantic errors.
There are three reasons why type-related errors are included in a separate category:
· Decompilers are fundamentally unable to restore all type information correctly.
· These errors are more frequent than all other errors taken together.
· Their impact on code readability is relatively low.
In addition, in terms of type-related error rate, these decompilers are close to each other (although CFR is slightly worse than its competitors).
Therefore, non-type-related errors are much more interesting.
| | Syntax | All semantic | Type-related | Other |
|---|---|---|---|---|
| Fernflower | 0 | 101 | 65 | 36 |
| CFR | 0 | 82 | 80 | 2 |
| Procyon | 0 | 79 | 61 | 16 |
The code generated by Fernflower contains the highest number of non-type-related errors, with 34 out of 36 being “variable <var> is already defined”. Both CFR errors are also related to variable redefinition. In the case of Procyon, most (10 out of 16) errors occur because boolean variables are used as an array index. This is caused by incorrect ternary operator processing, which is discussed in more detail below.
Remarkably, CFR is the only tool that has improved within the last four months (it used to return 10 non-type-related and 72 type-related errors). This suggests that the large number of type-related CFR errors is partly a side effect of that improvement: with fewer other errors, more of the code is otherwise valid and is therefore exposed to incorrect type inference.
Disclaimer: once again, this review does not claim to be scientific in any way.
Speed was compared roughly: tools were run several times on medium-size .jar files, and the fastest was identified.
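A rough measurement of this kind can be sketched as follows. This is a hypothetical harness, not the run scripts used for the post: the class name, the method, and the placeholder workload are all assumptions, and a real comparison would invoke a decompiler on a .jar file inside the task.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timing harness; the actual run scripts are not reproduced here.
class RoughTimer {
    // Runs the task `runs` times and returns the total wall-clock time in milliseconds.
    static long totalMillis(Runnable task, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for a decompiler invocation.
        Runnable placeholder = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
        };
        System.out.println("100 runs: " + totalMillis(placeholder, 100) + " ms");
    }
}
```

Measuring total wall-clock time over many runs keeps the harness simple, at the cost of including JVM warm-up in the figure, which is acceptable for a rough comparison like this one.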
Below are the results for 100 runs on a 5.2 MB .jar file (which, of course, comprises .class files only).
| | Time, sec |
|---|---|
| Fernflower | 74 |
| CFR | 43 |
| Procyon | 74 |
The next table shows results for 15 runs on a 14 MB .jar file.
| | Time, sec |
|---|---|
| Fernflower | 939 |
| CFR | 128 |
| Procyon | 573 |
From these results, we can assume that these decompilers use algorithms with different asymptotic behaviors. CFR performs consistently faster than its competitors, while Fernflower slows down dramatically when handling large files. However, 14 MB is a lot for an archive of .class files and, in reality, such large projects are rather rare.
Here, I’ve taken some important language features and compared how well they are handled by different decompilers.
A summary of this paragraph is shown in the table below. Keep in mind that results demonstrated by jadx are not completely relevant. In the next section, jadx is applied to an Android project and reviewed separately.
Let’s begin with discussing language features which only one tool has failed to handle.
Fernflower always expands the for-each loop through iterators, albeit not entirely accurately. For example, here handler clutters the enclosing scope, which may cause variable redefinition. In addition, no type parameter is specified for the iterator var3, which leads to an unchecked cast in the fourth line:
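As a hand-written illustration of this pattern (not actual Fernflower output), the sketch below puts a for-each loop next to an iterator-based expansion of the shape described. Only the names handler and var3 come from the text; the class, the methods, and the map type are assumptions.

```java
import java.util.Iterator;
import java.util.Map;

// Hand-written illustration; not actual Fernflower output.
class ForEachExpansion {
    // What the source plausibly looked like.
    static int original(Map<String, Integer> counts) {
        int total = 0;
        for (Map.Entry<String, Integer> handler : counts.entrySet()) {
            total += handler.getValue();
        }
        return total;
    }

    // The shape described in the text. Without the annotation, javac
    // reports raw-type and unchecked-cast warnings for this method.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int expanded(Map<String, Integer> counts) {
        int total = 0;
        Iterator var3 = counts.entrySet().iterator();           // raw type: no type parameter
        Map.Entry<String, Integer> handler;                     // declared in the enclosing scope
        while (var3.hasNext()) {
            handler = (Map.Entry<String, Integer>) var3.next(); // unchecked cast
            total += handler.getValue();
        }
        return total;
    }
}
```

Both methods return the same result. The second one compiles only with warnings, and a second loop in the same method that declared another handler would trigger the "variable is already defined" error.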
Incorrect handling of the ternary operator is a common and very unpleasant Procyon error. Understanding the original intent with no source code available is a non-trivial task, especially in more complicated cases:
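A minimal hand-written sketch of how a mishandled ternary operator can produce the "boolean used as an array index" error mentioned earlier is given below. This is one plausible shape, not actual Procyon output, and all names are hypothetical.

```java
// Hand-written illustration; not actual Procyon output.
class TernaryIndex {
    // Valid source: the ternary converts the boolean into an int index.
    static String pick(String[] options, boolean second) {
        return options[second ? 1 : 0];
    }

    // A decompiler that drops the ternary would emit something like
    //     return options[second];
    // which javac rejects, because a boolean cannot be used as an array index.
}
```

In a one-line method the intent is still easy to guess. Once the condition is nested inside a larger expression, recovering it without the source becomes much harder.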
This mysterious issue is only reproduced by Procyon. The error is caused by the default modifier being specified instead of static in the getDefaults() definition.
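The difference between the two modifiers can be shown with a small hand-written sketch. Only the method name getDefaults() comes from the text; the interface name, the return type, and the method body are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical interface; only the method name getDefaults() comes from the text.
interface Options {
    // Correct form: a static interface method, callable as Options.getDefaults().
    static List<String> getDefaults() {
        return Arrays.asList("a", "b");
    }

    // If `default` were emitted here instead of `static`, the call in
    // OptionsDemo below would no longer compile, because a default method
    // can only be invoked on an instance.
}

class OptionsDemo {
    static int defaultCount() {
        return Options.getDefaults().size();
    }
}
```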
Below, we address a few more cases that turned out to be too complicated for several tools.
For peace of mind, they are hidden under a spoiler alert.
Fernflower
Uncertainty in case of a constructor call.
Procyon
Works correctly, but there are excessive type conversions.
Fernflower
Obviously, try-with-resources is not supported at all. However, you can see how difficult it is to rewrite it via an ordinary try-catch. The result is ambiguous:
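To show why the manual rewrite is awkward, here is a hand-written sketch that puts a try-with-resources block next to a roughly equivalent expansion in the style of the Java 8 compiler. It is not actual Fernflower output, and the Resource class is a hypothetical stand-in that only records whether it was closed.

```java
// Hand-written sketch; not actual decompiler output.
class TryWithResourcesExpansion {
    // Minimal hypothetical resource that records whether close() was called.
    static class Resource implements AutoCloseable {
        boolean closed;

        int read() {
            return 42;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Concise form.
    static int sugared(Resource r) {
        try (Resource res = r) {
            return res.read();
        }
    }

    // Roughly what the compiler generates, written out by hand.
    static int expanded(Resource r) {
        Resource res = r;
        Throwable primary = null;
        try {
            return res.read();
        } catch (Throwable t) {
            primary = t;   // remember the primary exception
            throw t;       // precise rethrow: t is effectively final
        } finally {
            if (res != null) {
                if (primary != null) {
                    try {
                        res.close();
                    } catch (Throwable suppressed) {
                        primary.addSuppressed(suppressed);
                    }
                } else {
                    res.close();
                }
            }
        }
    }
}
```

The expanded version needs an extra variable and three nested blocks to preserve the suppressed-exception semantics, which is why decompilers that do not recognize the pattern produce such noisy output.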
We see a regression in CFR: version 0.142 handled try-with-resources properly, while version 0.146 outputs an excessive try (UPD: this was fixed in later versions).
Procyon
Even with the --show-bad-code option, jadx fails and proudly gives us the relevant details.
ClassReference14Processor.java
Lambda is handled correctly. Problems with for-each are not related to lambda and are reproduced without it (see the for-each section).
Procyon
The for-each within a lambda is handled incorrectly (even though the for-each loop itself is usually handled by Procyon quite properly). In this case, the declarations of the ent and iterator2 variables were put outside the lambda, which led to a build error, since ent is not an effectively final variable.
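The scoping rule behind this build error can be illustrated with a short hand-written sketch. The variable names ent and iterator2 follow the text; the class, the method, and the element type are assumptions, and the broken form is shown only in a comment since it does not compile.

```java
import java.util.List;
import java.util.function.Supplier;

// Hand-written illustration; not actual Procyon output.
class LambdaScope {
    // Valid: the loop variable lives entirely inside the lambda body.
    static Supplier<Integer> totalLength(List<String> entries) {
        return () -> {
            int total = 0;
            for (String ent : entries) {
                total += ent.length();
            }
            return total;
        };
    }

    // The broken shape described above hoists the declarations out of the lambda:
    //     String ent;
    //     Iterator<String> iterator2;
    //     return () -> { ... ent = iterator2.next(); ... };
    // javac rejects this, because a local variable of the enclosing method
    // that is used in a lambda must be effectively final and therefore
    // cannot be assigned inside the lambda body.
}
```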
jadx does not support some new instructions yet; the respective issue has been posted on GitHub.
Two initializations in one for
Variable i is moved to the external scope and causes redefinition.
Here, two variables are declared in the external scope, but no redefinition occurs.
dx + jadx
Similar to CFR.
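For reference, the construct in question and its hoisted form can be sketched as follows. This is a hand-written example with hypothetical names, not output from any of the decompilers.

```java
// Hand-written illustration; not actual decompiler output.
class TwoInitializers {
    // Source form: both loop variables are scoped to the for statement.
    static int meetInMiddle(int n) {
        int steps = 0;
        for (int i = 0, j = n; i < j; i++, j--) {
            steps++;
        }
        return steps;
    }

    // Hoisted form: i and j now belong to the method scope. This is valid
    // on its own, but a later loop that declares its own `int i` in the
    // same method would fail with "variable i is already defined".
    static int hoisted(int n) {
        int steps = 0;
        int i = 0;
        int j = n;
        while (i < j) {
            steps++;
            i++;
            j--;
        }
        return steps;
    }
}
```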
Generics
Here are a couple of simple examples from a great number of type inference errors.
VarTypeProcessor.java
Fernflower
Unchecked assignment in the first line.
While this works, it comes at the cost of excessive and meaningless type conversions.
CFR
It is not possible to compile the last line because a Statement cannot be added to a list of RootStatement objects.
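The underlying typing rule can be shown with a minimal hand-written sketch. The nested classes below are hypothetical stand-ins for the Fernflower classes of the same names, and the subclass relationship between them is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration; not actual CFR output.
class GenericsInference {
    // Hypothetical stand-ins for the classes named in the text.
    static class Statement {}
    static class RootStatement extends Statement {}

    static int count() {
        // Correct inference: the list is declared with the general type.
        List<Statement> statements = new ArrayList<>();
        statements.add(new RootStatement());
        statements.add(new Statement());

        // If the list were inferred as List<RootStatement> instead, the second
        // add would fail to compile: a Statement is not a RootStatement.
        return statements.size();
    }
}
```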
dx + jadx
Fernflower
Compilation error in the last line.
CFR
I have also looked at how jadx works on a real-life Android project – AntennaPod (a podcast listening app).
The most unpleasant and unusual errors are presented here.
jadx regularly faces problems accessing static fields. For example, it transforms a bytecode taken from the following sources:
into something like this:
For each lambda or anonymous class, jadx generates a separate named class. For example, a simple lambda like this:
is transformed into this:
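As a hand-written illustration of this transformation (not code from AntennaPod and not actual jadx output), a one-line lambda and an equivalent named class look like this. The class and method names are hypothetical.

```java
import java.util.function.IntUnaryOperator;

// Hand-written illustration; not actual jadx output.
class LambdaAsClass {
    // Source form: a one-line lambda.
    static IntUnaryOperator viaLambda() {
        return x -> x + 1;
    }

    // Equivalent named class of the kind described above (the name is hypothetical).
    static final class IncrementOperator implements IntUnaryOperator {
        @Override
        public int applyAsInt(int x) {
            return x + 1;
        }
    }

    static IntUnaryOperator viaNamedClass() {
        return new IncrementOperator();
    }
}
```

The two forms behave identically, but the second one moves the logic away from the place where it is used, which hurts readability when a project contains many lambdas.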
Sometimes, the number of variables sharply increases after decompilation. Before:
Now:
And sometimes, on the contrary, jadx throws out a couple of variables it doesn’t like. There used to be the variables:
Now there are none:
How can it be that jadx turns this:
into this:
And once again
turns into...
In general, we can conclude that code decompiled by jadx is not always easily readable, but it is quite good in terms of accuracy and the variety of constructs handled. At the same time, the rare but disastrous occasions when jadx adds 15 unnecessary variables, or expands the simplest switch-case through a three-level if-else, dramatically spoil the impression of the resulting code.
Following the comparison results, we can conclude:
CFR prevails in terms of both code readability (it better handles syntactic sugar like for-each, try-with-resources, etc., with fewer semantic errors) and speed (especially for large files). In addition, it is continuously developing and is supported by the developer.
Weaknesses: the project is relatively young, was developed by only one person and is, presumably, rather raw. For example, while one release with a slight degradation was quickly fixed, just six months ago the resulting code contained syntax errors.
While Procyon is more reliable and stable, it is barely evolving. Consequently, it lags behind CFR in terms of support of Java 9 (and newer) features. In addition, Procyon still suffers from marginal bugs (handling some ternary operators and static fields in interfaces).
Overall, Fernflower is not very well suited to our needs and loses to its competitors in terms of speed and result quality (at least on non-obfuscated data). On the other hand, the fact that Fernflower is used in IntelliJ IDEA means the project is likely to continue to evolve, at least in the short term.
While jadx is the only good (if not the only) decompiler for Android, it is not stable (it sometimes decompiles bytecode accurately, but the result is totally unreadable). It also does not support some language features (such as try-with-resources) or some DVM instructions from DEX versions higher than 37 (as of autumn 2019), and it is incapable of decompiling .jar files without first converting them with dx.
P.S. After writing this post, I found a very detailed decompiler comparison. Although it is official and scientific in nature, the report mainly evaluates decompilers by the resulting code accuracy, without giving attention to readability and speed.