Java source code (.java files) is typically compiled to bytecode (.class
files). Bytecode is more compact than Java source code, but it may still
contain a lot of unused code, especially if it includes program libraries.
Shrinking programs such as ProGuard can analyze bytecode and remove
unused classes, fields, and methods. The program remains functionally
equivalent, including the information given in exception stack traces.
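As a minimal sketch of what shrinking means in practice, consider the small
program below. The class and method names are hypothetical and chosen only for
illustration. Assuming the configuration names main as the entry point to
keep, a shrinker can see that unusedHelper is never reached and drop it,
while area must stay.

```java
// Hypothetical example: only code reachable from the entry point has to
// survive the shrinking step.
public class ShrinkDemo {
    // Reachable from main(), so it must be kept.
    static int area(int width, int height) {
        return width * height;
    }

    // Never called from any reachable code; a shrinker would typically
    // remove this method from the output.
    static int unusedHelper(int value) {
        return value + 42;
    }

    public static void main(String[] args) {
        System.out.println(area(2, 3));
    }
}
```

The behavior of the program is the same with or without unusedHelper, which
is why removing it is safe.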
By default, compiled bytecode still contains a lot of debugging information:
source file names, line numbers, field names, method names, argument names,
variable names, etc. This information makes it straightforward to decompile
the bytecode and reverse-engineer entire programs. Sometimes, this is not
desirable. Obfuscators such as ProGuard can remove the debugging
information and replace all names by meaningless character sequences, making
it much harder to reverse-engineer the code. As a bonus, this further compacts
the code. The program remains functionally equivalent, except for the class
names, method names, and line numbers given in exception stack traces.
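The following sketch shows where those names surface at run time. It is only
an illustration with hypothetical class and method names: it throws an
exception and reports the class and method of the top stack frame. Unprocessed,
the reported method is fail. After obfuscation, the same program would report
whatever short names the obfuscator assigned instead.

```java
// Hypothetical example: the names reported in a stack trace are read from
// the class file, so renaming classes and methods changes what is reported.
public class StackTraceDemo {
    // Returns "ClassName.methodName" for the frame that threw the exception.
    static String topFrame() {
        try {
            fail();
            return "no exception";
        } catch (IllegalStateException e) {
            StackTraceElement frame = e.getStackTrace()[0];
            return frame.getClassName() + "." + frame.getMethodName();
        }
    }

    static void fail() {
        throw new IllegalStateException("demo");
    }

    public static void main(String[] args) {
        System.out.println(topFrame());
    }
}
```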
When loading class files, the class loader performs some sophisticated
verification of the bytecode. This analysis makes sure the code can't
accidentally or intentionally break out of the sandbox of the virtual machine.
Java Micro Edition and Java 6 introduced split verification. This means that
the JME preverifier and the Java 6 compiler add preverification information to
the class files (StackMap and StackMapTable attributes, respectively), in order
to simplify the actual verification step for the class loader. Class files can
then be loaded faster and in a more memory-efficient way. ProGuard can
perform the preverification step too, which makes it possible, for instance,
to retarget older class files at Java 6.
Apart from removing unused classes, fields, and methods in the shrinking step,
ProGuard can also perform optimizations at the bytecode level, inside
and across methods. Thanks to techniques like control flow analysis, data flow
analysis, partial evaluation, static single assignment, global value numbering,
and liveness analysis, ProGuard can:
Evaluate constant expressions.
Remove unnecessary field accesses and method calls.
Remove unnecessary branches.
Remove unnecessary comparisons and instanceof tests.
Remove unused code blocks.
Merge identical code blocks.
Reduce variable allocation.
Remove write-only fields and unused method parameters.
Inline constant fields, method parameters, and return values.
Inline methods that are short or only called once.
Simplify tail recursion calls.
Merge classes and interfaces.
Make methods private, static, and final when possible.
Make classes static and final when possible.
Replace interfaces that have single implementations.
Perform over 200 peephole optimizations, like replacing ...*2 by ...<<1.
Optionally remove logging code.
The positive effects of these optimizations will depend on your code and on
the virtual machine on which the code is executed. Simple virtual machines may
benefit more than advanced virtual machines with sophisticated JIT compilers.
At the very least, your bytecode may become a bit smaller.
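To make the peephole example from the list above concrete, here is a small
sketch with hypothetical names. It writes the multiplication and the shift as
two separate methods and checks that they agree, including for values that
overflow. This only demonstrates why such a replacement preserves behavior for
int operands; it does not show how ProGuard implements it.

```java
// Hypothetical example: x * 2 and x << 1 give the same int result, so an
// optimizer may replace one form with the other.
public class PeepholeDemo {
    // Original form: multiplication by two.
    static int doubled(int x) {
        return x * 2;
    }

    // Equivalent form: left shift by one bit.
    static int shifted(int x) {
        return x << 1;
    }

    // Returns true if both forms agree on every sample value.
    static boolean agreeOn(int[] samples) {
        for (int x : samples) {
            if (doubled(x) != shifted(x)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        System.out.println(agreeOn(samples)); // prints true
    }
}
```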
Some notable optimizations that aren't supported yet:
Yes, you can. ProGuard itself is distributed under the GPL, but this
doesn't affect the programs that you process. Your code remains yours, and
its license can remain the same.
Yes, ProGuard supports all JDKs from 1.1 up to and including 7.0. Java 2
introduced some small differences in the class file format. Java 5 added
attributes for generics and for annotations. Java 6 introduced optional
preverification attributes. Java 7 made preverification obligatory and
introduced support for dynamic languages. ProGuard handles all versions
correctly.
Yes. ProGuard itself runs in Java Standard Edition, but you can freely
specify the run-time environment at which your programs are targeted,
including Java Micro Edition. ProGuard then also performs the required
preverification, producing more compact results than the traditional external
preverifier.
ProGuard also comes with an obfuscator plug-in for the JME Wireless
Toolkit.
Yes. Google's dx compiler converts ordinary jar files into files
that run on Android devices. By preprocessing the original jar files,
ProGuard can significantly reduce the file sizes and boost the run-time
performance of the code.
It should. RIM's proprietary rapc compiler converts ordinary JME
jar files into cod files that run on BlackBerry devices. The compiler performs
quite a few optimizations, but preprocessing the jar files with
ProGuard can generally still reduce the final code size by a few
percent. However, the rapc compiler also seems to contain some
bugs. It sometimes fails on obfuscated code that is valid and accepted by other
JME tools and VMs. Your mileage may therefore vary.
Yes. ProGuard provides an Ant task, so that it integrates seamlessly
into your Ant build processes. You can still use configurations in
ProGuard's own readable format. Alternatively, if you prefer XML, you
can specify the equivalent XML configuration.
Yes. First of all, ProGuard is perfectly usable as a command-line tool
that can easily be integrated into any automatic build process. For casual
users, there's also a graphical user interface that simplifies creating,
loading, editing, executing, and saving ProGuard configurations.
Yes. ProGuard automatically handles constructs like
Class.forName("SomeClass") and SomeClass.class. The
referenced classes are preserved in the shrinking phase, and the string
arguments are properly replaced in the obfuscation phase.
With variable string arguments, it's generally not possible to determine their
possible values. They might be read from a configuration file, for instance.
However, ProGuard will note a number of constructs like
"(SomeClass)Class.forName(variable).newInstance()". These may indicate that
the class or interface SomeClass and/or its implementations need to be
preserved. You can then adapt your configuration accordingly.
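The difference between the two cases can be sketched as follows. The class
and method names are hypothetical, and java.lang.StringBuilder merely stands
in for an application class. In loadConstant, the class name is a string
constant that a tool can read from the bytecode. In loadVariable, the name is
only known at run time, so the classes that may be passed in would have to be
preserved explicitly, for instance with a -keep option in the configuration.

```java
// Hypothetical example contrasting a constant and a variable class name.
public class ReflectionDemo {
    // Constant argument: the name is visible as a string constant in the
    // bytecode, so it can be detected and updated automatically.
    static Class<?> loadConstant() {
        try {
            return Class.forName("java.lang.StringBuilder");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Variable argument: the name might come from a configuration file, so
    // its possible values cannot be determined from the bytecode alone.
    // Class.newInstance() is deprecated in recent Java versions; it is used
    // here only to mirror the construct quoted in the text.
    @SuppressWarnings("deprecation")
    static CharSequence loadVariable(String className) {
        try {
            return (CharSequence) Class.forName(className).newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadConstant().getName());
        System.out.println(loadVariable("java.lang.StringBuilder").length());
    }
}
```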
No. Storing encrypted string constants in program code is fairly futile, since
the encryption has to be perfectly reversible by definition. Moreover, the
decryption costs additional memory and computation at run-time. If this feature
is ever incorporated, I'll provide a tool to decrypt the strings as well.
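The argument can be illustrated with a deliberately simple sketch. The XOR
scheme, the key, and all names below are assumptions made for this example
and are not taken from any real tool. The point is that the program has to
ship both the key and the reveal method in order to use its own strings, so
anyone inspecting the code can call the same method.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example: a hidden string constant is only useful if the
// program itself can reverse the hiding at run time.
public class StringHidingDemo {
    // Assumed key; it has to be embedded in the program.
    private static final byte KEY = 0x5A;

    // Would run at build time to produce the hidden constant.
    static byte[] hide(String plain) {
        byte[] bytes = plain.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return bytes;
    }

    // Has to be present in the shipped program, costing extra memory and
    // computation every time a string is needed.
    static String reveal(byte[] hidden) {
        byte[] bytes = hidden.clone();
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] ^= KEY;
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(reveal(hide("secret")));
    }
}
```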
Not explicitly. Control flow obfuscation injects additional branches into the
bytecode, in an attempt to fool decompilers. ProGuard does not do this,
in order to avoid any negative effects on performance and size. However, the
optimization step often already restructures the code to the point where most
decompilers get confused.
Yes. This feature allows you to specify a previous obfuscation mapping file in
a new obfuscation step, in order to produce add-ons or patches for obfuscated
code.
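The idea behind reusing a mapping can be sketched as below. This is not
ProGuard's algorithm or file format; the names and the naming scheme are
hypothetical. Names that already appear in the previous mapping keep their
obfuscated names, so a patch still links against the released code, and only
new names receive fresh ones.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: extend a previous obfuscation mapping without
// changing any of the names it already assigns.
public class MappingDemo {
    static Map<String, String> extendMapping(Map<String, String> previous,
                                             List<String> names) {
        Map<String, String> result = new LinkedHashMap<>(previous);
        int counter = 0;
        for (String name : names) {
            if (result.containsKey(name)) {
                continue; // already mapped: reuse the old obfuscated name
            }
            // Pick a fresh obfuscated name that is not in use yet.
            String candidate;
            do {
                candidate = "n" + counter++;
            } while (result.containsValue(candidate));
            result.put(name, candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> previous = Map.of("com.example.Util", "a");
        List<String> names = List.of("com.example.Util", "com.example.Patch");
        System.out.println(extendMapping(previous, names));
    }
}
```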
Yes. You can specify your own obfuscation dictionary, such as a list of
reserved keywords, identifiers with foreign characters, random source files,
or a text by Shakespeare. Note that this hardly improves the obfuscation.
Decent decompilers can automatically replace reserved keywords, and the effect
can be undone fairly easily, by obfuscating again with simpler names.