While Solared Appscreener can detect vulnerabilities in Java bytecode, highlighting a vulnerable bytecode instruction is not enough. How do you show a vulnerability in source code that you do not actually have access to?
In practice, you should take one of the following actions for each detected vulnerability:
Eliminate
Accept risks
Prove it was a false positive
Each of these three actions requires some vulnerability assessment and at least an understanding of elimination procedures and the inherent risks involved.
How can we make such an assessment if we detect vulnerabilities in Java bytecode but lack the corresponding source code?
Solared Appscreener mainly uses static analysis, i.e. it searches for vulnerabilities without executing the code. Static analysis involves the following steps:
Build a code model (internal representation)
Enrich the model with information generated by static analysis algorithms such as control flow analysis and dataflow analysis (e.g., taint analysis)
Apply vulnerability search rules, expressed in terms of the model and the information it is enriched with, to locate vulnerabilities in the code model
When vulnerabilities are discovered in the internal representation, you should display them in terms of the source code, for further analysis.
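The three steps above can be illustrated with a toy example. The sketch below is not how Solared Appscreener works internally. It assumes a hypothetical model made of straight-line instructions (the ToyTaint and Instr names, and the idea of naming sources and sinks by method name, are all invented for illustration), tracks tainted variables as the enrichment step, and reports the positions of sink calls that receive tainted data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy taint analysis over straight-line code; every name here is hypothetical. */
final class ToyTaint {

    /** One model instruction: dest = callee(args...) at a given position. */
    static final class Instr {
        final int pc;            // position of the instruction in the method
        final String dest;       // variable written, or null if none
        final String callee;     // name of the called method
        final List<String> args; // variables passed as arguments

        Instr(int pc, String dest, String callee, String... args) {
            this.pc = pc;
            this.dest = dest;
            this.callee = callee;
            this.args = List.of(args);
        }
    }

    /** Returns the positions of sink calls that receive tainted data. */
    static List<Integer> findTaintedSinks(List<Instr> code, Set<String> sources, Set<String> sinks) {
        Set<String> tainted = new HashSet<>();
        List<Integer> findings = new ArrayList<>();
        for (Instr in : code) {
            boolean argTainted = false;
            for (String arg : in.args) {
                if (tainted.contains(arg)) {
                    argTainted = true;
                }
            }
            // Rule: a sink called with tainted data is a finding at this position.
            if (sinks.contains(in.callee) && argTainted) {
                findings.add(in.pc);
            }
            // Enrichment: propagate taint to the written variable (no sanitizers modeled).
            if (in.dest != null) {
                if (sources.contains(in.callee) || argTainted) {
                    tainted.add(in.dest);
                } else {
                    tainted.remove(in.dest); // overwritten with clean data
                }
            }
        }
        return findings;
    }
}
```

Note that the result is a list of instruction positions, not source lines, which is exactly why the findings still have to be translated into source code terms.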
Initially, Solared Appscreener was used to look for vulnerabilities in Android and Java apps. Binary analysis is now in demand for several reasons:
The contract terms may not allow for the transfer of source code to the customer
Even if the source code is transferred to the customer, the executable built from it may differ from the one deployed to production or uploaded to Google Play
Developers use third-party components that ship without source code, and such code should also be kept under control
This is why, for static analysis of Android and Java apps, we have chosen Java bytecode as the internal representation. Once a mobile app's source code is compiled into Java bytecode, the Dalvik compiler merges the class files and recompiles the code into Dalvik bytecode, producing an executable .dex file. The executable file, resources and a configuration file are then packed into an .apk package and distributed through Google Play. There are tools to process and convert .apk packages, including unpacking, decoding resources and configuration files, and converting Dalvik code into Java bytecode (apktool, dex2jar).
Bytecode can also be obtained by compiling the respective source code (just as we do when analyzing Java and Scala source code). As a result, Java bytecode is ideal as an internal representation for analyzing both the source and executable code of Java and Android apps (in fact, you can analyze any language that compiles into Java bytecode).
Java bytecode can be decompiled into relatively high-quality code, and many different Java decompilers are available (JD, Fernflower, Procyon). We decided against using reconstructed Java code as the internal representation because all decompilers are prone to errors that would affect vulnerability searches.
So, we have detected vulnerabilities in Java bytecode (to learn more about how we managed this, please see further posts). What do we do with the findings?
We should describe such findings in terms of source code, or rather reconstructed high-level code. In this context, a vulnerability is a set of instruction positions within the bytecode that together define it (an insecure method call, or the set of instructions through which insecure data flows). Therefore, we have to map each bytecode instruction to a line number in the reconstructed code. A class file (i.e. a bytecode file corresponding to a class in the source code) normally carries a LineNumberTable attribute, stored with each method's code, that maps bytecode positions to source code line numbers. As a result, the LineNumberTable attribute is required to display vulnerabilities in Java terms.
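To make the mapping concrete, here is a minimal sketch of how a lookup over LineNumberTable-style entries can work. Each entry pairs a start position (start_pc) with a line number, and an instruction belongs to the entry with the greatest start position that does not exceed its own. The LineNumberLookup class is a hypothetical helper written for this post, not part of any library, and it assumes at most one entry per start position.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: resolves a bytecode offset to a line number. */
final class LineNumberLookup {

    // start_pc -> line_number, kept sorted by start_pc
    private final TreeMap<Integer, Integer> entries = new TreeMap<>();

    /** Registers one (start_pc, line_number) entry of the table. */
    void add(int startPc, int lineNumber) {
        entries.put(startPc, lineNumber);
    }

    /** Returns the line of the entry with the greatest start_pc <= pc, or -1 if there is none. */
    int lineFor(int pc) {
        Map.Entry<Integer, Integer> entry = entries.floorEntry(pc);
        return entry == null ? -1 : entry.getValue();
    }

    public static void main(String[] args) {
        LineNumberLookup table = new LineNumberLookup();
        table.add(0, 10);
        table.add(5, 11);
        table.add(12, 13);
        // The instruction at offset 7 falls into the entry starting at offset 5.
        System.out.println(table.lineFor(7)); // prints 11
    }
}
```

A finding made of several instruction positions is displayed by running each position through such a lookup and highlighting the resulting lines.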
When analyzing bytecode (including bytecode derived from an .apk file), the LineNumberTable attribute may be absent, since it may have been stripped during compilation or during conversion from the .apk. It is worth noting that the stripped LineNumberTable attribute corresponded to the source code written by the developer, not to the reconstructed almost-source code, so it would not have matched the decompiled output anyway. Therefore, you need to rebuild the LineNumberTable attribute in the analyzed bytecode so that it refers to the reconstructed code.
The basic algorithm is to build an abstract syntax tree (AST) of the decompiled code from the Java bytecode and to output the reconstructed almost-source code while traversing the AST. During this depth-first traversal, we also record the mapping between line numbers in the reconstructed code and the positions of instructions in the method bytecode.
At each tree node, we know the instruction position within the bytecode relative to the start of the current method and the line number in the reconstructed code file. Because positions are relative to the method start, we also record method boundaries in the reconstructed code during the traversal, so that the 'position in method bytecode' / 'file line number' pairs can later be filtered by method.
Anonymous classes are processed separately because they produce methods nested inside other methods. For this reason, after the traversal we analyze how methods are nested in the source code.
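The traversal can be sketched as follows. This is a simplified illustration, not the actual implementation: the Node and Entry classes are hypothetical, every node is assumed to print as exactly one line, a node whose text ends with an opening brace gets a closing brace after its children, and anonymous classes are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: prints an AST and records (method, pc, line) triples on the way. */
final class LineTableRebuilder {

    /** Simplified AST node; each node prints as exactly one line. */
    static final class Node {
        final String text;       // reconstructed code for this node
        final int pc;            // instruction offset from the method start, -1 if none
        final String methodName; // non-null only for method declarations
        final List<Node> children = new ArrayList<>();

        Node(String text, int pc, String methodName) {
            this.text = text;
            this.pc = pc;
            this.methodName = methodName;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** One reconstructed line number entry, tagged with the method it belongs to. */
    static final class Entry {
        final String method;
        final int pc;
        final int line;

        Entry(String method, int pc, int line) {
            this.method = method;
            this.pc = pc;
            this.line = line;
        }
    }

    final List<String> lines = new ArrayList<>();
    final List<Entry> entries = new ArrayList<>();

    /** Depth-first traversal: emit the node, record its position, then visit its children. */
    void emit(Node node, String enclosingMethod) {
        String method = node.methodName != null ? node.methodName : enclosingMethod;
        lines.add(node.text);
        int line = lines.size(); // 1-based number of the line just written
        if (node.pc >= 0 && method != null) {
            entries.add(new Entry(method, node.pc, line));
        }
        for (Node child : node.children) {
            emit(child, method);
        }
        if (node.text.endsWith("{")) {
            lines.add("}"); // close the block opened by this node
        }
    }

    public static void main(String[] args) {
        Node cls = new Node("class Demo {", -1, null)
            .add(new Node("void run() {", -1, "run")
                .add(new Node("String q = input();", 0, null))
                .add(new Node("exec(q);", 4, null)));
        LineTableRebuilder rebuilder = new LineTableRebuilder();
        rebuilder.emit(cls, null);
        for (Entry e : rebuilder.entries) {
            System.out.println(e.method + " pc=" + e.pc + " -> line " + e.line);
        }
    }
}
```

Tagging each pair with its method is the simplest way to keep method boundaries here. Since instruction positions restart at zero in every method, the pairs are only meaningful once grouped by method.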
In practice, we are often given projects containing both source code (which we can compile to obtain bytecode with a line number table) and bytecode (various third-party components, libraries, etc.). To analyze such projects, Solared Appscreener includes a pre-processing stage for Java projects, which comprises the following steps:
Analysis of the project and detection of all class files, Java and Scala source code files, and jar/war files with bytecode.
Depending on user-defined project scanning settings, the class file list is augmented by class files derived from jar/war files (typically, this means that the project is analyzed jointly with libraries).
We obtain full class names from class files and source code files and use the class names to match the source code and bytecode files.
Bytecode files that lack matching source code files are decompiled, and the line number information is reconstructed as described above.
This pre-processing also handles anonymous and nested classes, where one source code file can correspond to several bytecode files.
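The matching step can be sketched roughly as below. The ClassMatcher class and its methods are hypothetical, and the sketch makes two simplifying assumptions: class names are already extracted as strings, and nested or anonymous classes are recognized by cutting the binary name at the first '$'. That is only a heuristic, since '$' is also legal in ordinary Java identifiers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of matching class files to source files by class name. */
final class ClassMatcher {

    /** Maps "a.B$1" and "a.B$Inner" to "a.B" (heuristic based on the first '$'). */
    static String outerClassName(String binaryName) {
        int dollar = binaryName.indexOf('$');
        return dollar < 0 ? binaryName : binaryName.substring(0, dollar);
    }

    /** Groups class files whose outer class has source code, keyed by that outer class. */
    static Map<String, List<String>> withSource(List<String> classFiles, Set<String> sourceClasses) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String classFile : classFiles) {
            String outer = outerClassName(classFile);
            if (sourceClasses.contains(outer)) {
                result.computeIfAbsent(outer, k -> new ArrayList<>()).add(classFile);
            }
        }
        return result;
    }

    /** Lists class files with no matching source; these would be decompiled. */
    static List<String> toDecompile(List<String> classFiles, Set<String> sourceClasses) {
        List<String> result = new ArrayList<>();
        for (String classFile : classFiles) {
            if (!sourceClasses.contains(outerClassName(classFile))) {
                result.add(classFile);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> classFiles =
            List.of("com.acme.Foo", "com.acme.Foo$1", "com.acme.Foo$Inner", "org.lib.Bar");
        Set<String> sources = Set.of("com.acme.Foo");
        System.out.println(withSource(classFiles, sources));
        System.out.println(toDecompile(classFiles, sources)); // prints [org.lib.Bar]
    }
}
```

In this example one source class, com.acme.Foo, ends up associated with three class files, while org.lib.Bar has no source and goes to the decompiler.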
As a result, for each bytecode file we have a file with Java code (either reconstructed or original source code) and a line number table connecting the bytecode to that file.
Using the procedures and algorithms described in this post, Solared Appscreener maps any vulnerability detected in a Java or Android app to source code, whether or not the source was submitted for analysis.
A similar approach is applied to analyze binaries of iOS apps, although this case is much more complicated, with the decompilation of ARM architecture binary code still requiring further study and research. Check out our next posts to learn more about this topic!