Traces interpretation

Hello everyone!
I need help interpreting traces. Could you please explain to me what is going on in the example below?

This is the class I am analysing.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class UtilClass {

    public String taintedMethod() {
        return "tainted string";
    }

    public String getName(int i) {
        if (i > 5) {
            return taintedMethod();
        } else return "Less";
    }

    public void createFile(String pathName) throws IOException {
        Path path = Path.of(pathName);
        Files.createFile(path);
    }
}

I set taintedMethod as a source and Files.createFile as a sink.
And I got the following after running val traces = taintMemoryLocationCpaRun.extractLinearTraces():

I would expect the trace to show the path from source to sink, but apparently it runs the other way round. Also, for some reason I don’t see taintedMethod itself anywhere in the trace (I was only able to find it in one of the parents of some trace element).
And am I correct that the numbers at the end of each element are the line numbers of bytecode instructions?

Thanks in advance!

Hi!

Trace reconstruction does indeed work backwards (from the sink to the source), and the trace simply preserves that order.

The trace shows the location of the tainted value along the control path from the source call to the sink call. The call to taintedMethod() occurs in getName(int i), hence the first occurrence of the tainted value is the top of the operand stack after the call. If there was a way to specify the "tainted string" literal as the taint source, then the trace would start in taintedMethod() after the ldc instruction loading the literal to the top of the operand stack.
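
Since the extracted trace is ordered from sink to source, reversing it gives the source-to-sink reading. A minimal sketch, assuming the trace elements have already been rendered to strings; the class and method names are illustrative and not part of the tool's API, and the sample is an abbreviated trace built from elements posted in this thread:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TraceOrder {

    // Returns a copy of the trace in source-to-sink order; the input is left untouched.
    static List<String> sourceToSink(List<String> sinkToSource) {
        List<String> reversed = new ArrayList<>(sinkToSource);
        Collections.reverse(reversed);
        return reversed;
    }

    public static void main(String[] args) {
        // Abbreviated sample in the order the tool reports it: sink first, source last.
        List<String> extracted = List.of(
                "JvmStackLocation(0)@LMain;util()V:27",
                "JvmLocalVariableLocation(1)@LMain;util()V:26",
                "JvmLocalVariableLocation(1)@LMain;util()V:15",
                "JvmStackLocation(0)@LMain;util()V:14");
        sourceToSink(extracted).forEach(System.out::println);
    }
}
```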

You are right, the number after the colon is the bytecode offset of the instruction. Thus, each trace element consists of the memory location and the program location of the tainted value.
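
To make the two parts of a trace element concrete, here is a small sketch that splits the printed form into its memory location and program location. It assumes the string shape seen in the traces in this thread, which may not be a stable format; TraceElement is a hypothetical helper, not a class from the tool:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceElement {
    // Assumed shape: <location kind>(<index>)@<class><method name><descriptor>:<offset>
    private static final Pattern FORMAT = Pattern.compile(
            "(\\w+)\\((\\d+)\\)@(L[^;]+;)([^(]+)(\\([^)]*\\)[^:]+):(\\d+)");

    final String kind;      // e.g. JvmStackLocation or JvmLocalVariableLocation
    final int index;        // operand stack position (0 appears to be the top) or local variable slot
    final String className; // class in descriptor form, e.g. Ls1/Main;
    final String method;    // method name followed by its descriptor, e.g. util()V
    final int offset;       // bytecode offset within the method

    private TraceElement(String kind, int index, String className, String method, int offset) {
        this.kind = kind;
        this.index = index;
        this.className = className;
        this.method = method;
        this.offset = offset;
    }

    static TraceElement parse(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised trace element: " + text);
        }
        return new TraceElement(m.group(1), Integer.parseInt(m.group(2)), m.group(3),
                m.group(4) + m.group(5), Integer.parseInt(m.group(6)));
    }

    public static void main(String[] args) {
        TraceElement e = parse("JvmStackLocation(1)@Ls1/Main;util()V:54");
        System.out.println(e.kind + " index=" + e.index + " in " + e.className + e.method
                + " at offset " + e.offset);
    }
}
```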

Thanks a lot!
@Dmitry_Ivanov I have another related question.
For example I have this piece of code:

public static void doSmth(int i) {
    var r = new Random();
    boolean b1 = r.nextBoolean();
    boolean b2 = r.nextBoolean();
    var tainted = getTaintedData();
    if (b1) {
        ...
    } else if (b2) {
        ...
    }
    *sink*
}

Here I get tainted data and it arrives at the sink. In between, there are a few variants of what can happen: execution can enter the if branch, the else-if branch, or neither, so there should be three possible traces. But the analysis shows me only one trace. Could you please explain why? Is there a way to get all the possible traces?

(If I manipulate the tainted data somehow in both branches, there still will be only one trace)

Thank you for spotting this! My expectation would be that traces for all branches are present. I consider this to be a bug but I do not expect it to be fixed soon. If the matter is urgent for you, you can debug TraceExtractor#extractLinearTraces to check whether parents from missing branches are present. If they are, the fix should be trivial. Otherwise, you would need to look into JvmMemoryLocationTransferRelation#getAbstractSuccessors and ensure that parents for the tainted JvmLocalVariableLocation are found and preserved in the ReachedSet.

For some reason, in an example like the one below, the tool does not find any traces at all.

var x = true;
var tainted = getTaintedData();
if (x) {
    *sink*
} else {
    *sink*
}

It seems that in the method JvmTaintMemoryLocationBamCpaRun#getEndPoints neither sink location is seen, and neither is included in the reached set. I expected them to be found when the algorithm was looking for reached taint sinks in all cached reached sets (in getEndPoints), but they were not. It can only see a sink located after the if-else block.

I saw the ConfigurableProgramAnalysis interface, can it be configured manually and is it possible that its configuration may help me find those skipped sinks? Maybe there is a way out?

Hi @olesya,

thanks a lot for bringing this to our attention. We are investigating this now, and will keep you in the loop!

All the best,
Dennis

We tried out the examples you provided and could not reproduce your problem for Java 8. Can you provide us with the bytecode of the code exhibiting the unexpected behavior? Use javap -c to pretty-print the bytecode. Note that we analyze the bytecode rather than the source code, i.e., the input program goes through compiler optimizations, which may remove parts that influence the expected analysis result (calls to taint sources/sinks, conditional branches).

Actually, your comment on bytecode optimization gave me some food for thought (the example was a bit more complex than the one I mentioned above) and I have to admit I was a bit inattentive there. So I was able to fix it, thanks a lot and sorry for bothering you so much on this topic))

While working on it, I noticed another curious moment though…
I analyzed the code below with source set to taintedString() and sink set to Files.createFile()

Code that was analysed
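import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class Main {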

    public static void main(String[] args) throws IOException {
        util();
    }

    public static String taintedString() {
        return "tainted string";
    }

    public static void util() throws IOException {
        boolean b = new Random().nextBoolean();
        String s = taintedString();
        String v = "";
        if (b) {
            v = taintedString();
        }
        Files.createFile(Path.of(s));
        Files.createFile(Path.of(v));
    }
}

Correct me if I’m wrong, but I expected to get two traces, one from v to the second createFile() invocation and another from s to the first createFile() invocation. But I got three, you can see them below.

Traces

First:

JvmStackLocation(1)@Ls1/Main;util()V:54
JvmStackLocation(1)@Ls1/Main;util()V:51
JvmStackLocation(0)@Ls1/Main;util()V:50
JvmStackLocation(1)@Ls1/Main;util()V:47
JvmStackLocation(1)@Ls1/Main;util()V:44
JvmStackLocation(0)@Ls1/Main;util()V:43
JvmLocalVariableLocation(2)@Ls1/Main;util()V:42
JvmLocalVariableLocation(2)@Ls1/Main;util()V:41
JvmLocalVariableLocation(2)@Ls1/Main;util()V:38
JvmLocalVariableLocation(2)@Ls1/Main;util()V:35
JvmLocalVariableLocation(2)@Ls1/Main;util()V:34
JvmLocalVariableLocation(2)@Ls1/Main;util()V:31
JvmLocalVariableLocation(2)@Ls1/Main;util()V:28
JvmLocalVariableLocation(2)@Ls1/Main;util()V:27
JvmLocalVariableLocation(2)@Ls1/Main;util()V:26
JvmStackLocation(0)@Ls1/Main;util()V:25

Second:

JvmStackLocation(1)@Ls1/Main;util()V:38
JvmStackLocation(1)@Ls1/Main;util()V:35
JvmStackLocation(0)@Ls1/Main;util()V:34
JvmStackLocation(1)@Ls1/Main;util()V:31
JvmStackLocation(1)@Ls1/Main;util()V:28
JvmStackLocation(0)@Ls1/Main;util()V:27
JvmLocalVariableLocation(1)@Ls1/Main;util()V:26
JvmStackLocation(0)@Ls1/Main;util()V:25

Third:

JvmStackLocation(1)@Ls1/Main;util()V:38
JvmStackLocation(1)@Ls1/Main;util()V:35
JvmStackLocation(0)@Ls1/Main;util()V:34
JvmStackLocation(1)@Ls1/Main;util()V:31
JvmStackLocation(1)@Ls1/Main;util()V:28
JvmStackLocation(0)@Ls1/Main;util()V:27
JvmLocalVariableLocation(1)@Ls1/Main;util()V:26
JvmLocalVariableLocation(1)@Ls1/Main;util()V:25
JvmLocalVariableLocation(1)@Ls1/Main;util()V:22
JvmLocalVariableLocation(1)@Ls1/Main;util()V:19
JvmLocalVariableLocation(1)@Ls1/Main;util()V:18
JvmLocalVariableLocation(1)@Ls1/Main;util()V:17
JvmLocalVariableLocation(1)@Ls1/Main;util()V:15
JvmStackLocation(0)@Ls1/Main;util()V:14

The trace I did not expect to get (second one) leads from v to the first createFile() invocation.
For better understanding you can see bytecode of util method below.
v is stored into local variable 2 at offset 25 (astore_2), but at offset 26 it is local variable 1 that is loaded (aload_1). So I don’t get why there is a trace from v to offset 38.

util() bytecode
 0 new #14 <java/util/Random>
 3 dup
 4 invokespecial #16 <java/util/Random.<init> : ()V>
 7 invokevirtual #17 <java/util/Random.nextBoolean : ()Z>
10 istore_0
11 invokestatic #21 <s1/Main.taintedString : ()Ljava/lang/String;>
14 astore_1
15 ldc #25
17 astore_2
18 iload_0
19 ifeq 26 (+7)
22 invokestatic #21 <s1/Main.taintedString : ()Ljava/lang/String;>
25 astore_2
26 aload_1
27 iconst_0
28 anewarray #27 <java/lang/String>
31 invokestatic #29 <java/nio/file/Path.of : (Ljava/lang/String;[Ljava/lang/String;)Ljava/nio/file/Path;>
34 iconst_0
35 anewarray #35 <java/nio/file/attribute/FileAttribute>
38 invokestatic #37 <java/nio/file/Files.createFile : (Ljava/nio/file/Path;[Ljava/nio/file/attribute/FileAttribute;)Ljava/nio/file/Path;>
41 pop
42 aload_2
43 iconst_0
44 anewarray #27 <java/lang/String>
47 invokestatic #29 <java/nio/file/Path.of : (Ljava/lang/String;[Ljava/lang/String;)Ljava/nio/file/Path;>
50 iconst_0
51 anewarray #35 <java/nio/file/attribute/FileAttribute>
54 invokestatic #37 <java/nio/file/Files.createFile : (Ljava/nio/file/Path;[Ljava/nio/file/attribute/FileAttribute;)Ljava/nio/file/Path;>
57 pop
58 return

I am glad that case got resolved.

Your expectation here is incorrect because the linear trace reconstructor takes all intraprocedural branches exhaustively, i.e., if there are two branches a data flow can take, two traces will be created. In your example, the data flow from v has only one path to take, but the flow from s does not depend on the if statement, so it passes through both branches and generates two traces, which gives us three in total.
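
The count of three can be reproduced with a small model of the backward walk. This is a minimal sketch and not the actual TraceExtractor implementation: the parent map is hand-written from the bytecode offsets posted in this thread, its labels are abbreviated, intermediate locations are omitted, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LinearTraces {

    // Hand-written parent relation (child -> parents), assumed to be acyclic.
    // Local(1) holds s, Local(2) holds v; the number after @ is the bytecode offset.
    static final Map<String, List<String>> PARENTS = Map.of(
            "Stack(1)@38", List.of("Local(1)@26"),                 // first createFile
            "Local(1)@26", List.of("Local(1)@25", "Local(1)@19"),  // fall-through and ifeq jump
            "Local(1)@25", List.of("Local(1)@22"),
            "Local(1)@22", List.of("Local(1)@19"),
            "Local(1)@19", List.of("Stack(0)@14"),                 // source call result for s
            "Stack(1)@54", List.of("Local(2)@26"),                 // second createFile
            "Local(2)@26", List.of("Stack(0)@25"));                // source call result for v

    // Walks backwards from the given node and emits one trace per distinct parent chain.
    static List<List<String>> extract(String node, Map<String, List<String>> parents) {
        List<List<String>> traces = new ArrayList<>();
        List<String> nodeParents = parents.getOrDefault(node, List.of());
        if (nodeParents.isEmpty()) {
            traces.add(new ArrayList<>(List.of(node))); // no parents: this is the source
            return traces;
        }
        for (String parent : nodeParents) {
            for (List<String> tail : extract(parent, parents)) {
                List<String> trace = new ArrayList<>();
                trace.add(node);
                trace.addAll(tail);
                traces.add(trace);
            }
        }
        return traces;
    }

    public static void main(String[] args) {
        List<List<String>> all = new ArrayList<>();
        all.addAll(extract("Stack(1)@38", PARENTS)); // flow from s
        all.addAll(extract("Stack(1)@54", PARENTS)); // flow from v
        all.forEach(System.out::println);
    }
}
```

In this model the location of s at offset 26 has two parents, one per branch of the if statement, so the walk from the first sink splits into two traces, while the tainted value of v can only arrive at offset 26 through the fall-through from offset 25.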

However, the traces you showed are indeed concerning because the astore_2 instruction looks wrongly interpreted in the second trace. I have tried your example for Java 11 but got three traces having consistent memory location structure:

JvmStackLocation(1)@LMain;util()V:54
JvmStackLocation(1)@LMain;util()V:51
JvmStackLocation(0)@LMain;util()V:50
JvmStackLocation(1)@LMain;util()V:47
JvmStackLocation(1)@LMain;util()V:44
JvmStackLocation(0)@LMain;util()V:43
JvmLocalVariableLocation(2)@LMain;util()V:42
JvmLocalVariableLocation(2)@LMain;util()V:41
JvmLocalVariableLocation(2)@LMain;util()V:38
JvmLocalVariableLocation(2)@LMain;util()V:35
JvmLocalVariableLocation(2)@LMain;util()V:34
JvmLocalVariableLocation(2)@LMain;util()V:31
JvmLocalVariableLocation(2)@LMain;util()V:28
JvmLocalVariableLocation(2)@LMain;util()V:27
JvmLocalVariableLocation(2)@LMain;util()V:26
JvmStackLocation(0)@LMain;util()V:25

JvmStackLocation(1)@LMain;util()V:38
JvmStackLocation(1)@LMain;util()V:35
JvmStackLocation(0)@LMain;util()V:34
JvmStackLocation(1)@LMain;util()V:31
JvmStackLocation(1)@LMain;util()V:28
JvmStackLocation(0)@LMain;util()V:27
JvmLocalVariableLocation(1)@LMain;util()V:26
JvmLocalVariableLocation(1)@LMain;util()V:25
JvmLocalVariableLocation(1)@LMain;util()V:22
JvmLocalVariableLocation(1)@LMain;util()V:19
JvmLocalVariableLocation(1)@LMain;util()V:18
JvmLocalVariableLocation(1)@LMain;util()V:17
JvmLocalVariableLocation(1)@LMain;util()V:15
JvmStackLocation(0)@LMain;util()V:14

JvmStackLocation(1)@LMain;util()V:38
JvmStackLocation(1)@LMain;util()V:35
JvmStackLocation(0)@LMain;util()V:34
JvmStackLocation(1)@LMain;util()V:31
JvmStackLocation(1)@LMain;util()V:28
JvmStackLocation(0)@LMain;util()V:27
JvmLocalVariableLocation(1)@LMain;util()V:26
JvmLocalVariableLocation(1)@LMain;util()V:19
JvmLocalVariableLocation(1)@LMain;util()V:18
JvmLocalVariableLocation(1)@LMain;util()V:17
JvmLocalVariableLocation(1)@LMain;util()V:15
JvmStackLocation(0)@LMain;util()V:14

Can you send us the contents of JvmTaintMemoryLocationBamCpaRun#getOutputReachedSet after the trace reconstruction is done?

PS. Starting a new topic for each question would help other community members make use of it. We can continue our discussion here until this case is solved.

Got it, thanks.

Yet again, the mistake was mine: I was not using the latest version of your tool. When I first wanted to try out the tool, I set the version to the latest one that still allowed me to run the code from the tutorial (9.0.3), so that I would not have to dig too much into the configuration details. I have now run it on the latest version and got the desired results. Thanks a lot and again sorry for bothering you so much with my misunderstandings :smiling_face_with_tear:

That’s what I got after printing out each memoryLocation of the outputReachedSet elements in case you still need it.

JvmLocalVariableLocation(1)@Ls1/Main;util()V:18
JvmLocalVariableLocation(2)@Ls1/Main;util()V:26
JvmLocalVariableLocation(1)@Ls1/Main;util()V:26
JvmLocalVariableLocation(1)@Ls1/Main;util()V:22
JvmLocalVariableLocation(1)@Ls1/Main;util()V:19
JvmStackLocation(1)@Ls1/Main;util()V:54
JvmStackLocation(1)@Ls1/Main;util()V:51
JvmLocalVariableLocation(2)@Ls1/Main;util()V:27
JvmLocalVariableLocation(2)@Ls1/Main;util()V:42
JvmStackLocation(0)@Ls1/Main;util()V:34
JvmLocalVariableLocation(1)@Ls1/Main;util()V:17
JvmStackLocation(0)@Ls1/Main;util()V:27
JvmLocalVariableLocation(1)@Ls1/Main;util()V:15
JvmStackLocation(1)@Ls1/Main;util()V:38
JvmStackLocation(1)@Ls1/Main;util()V:35
JvmLocalVariableLocation(2)@Ls1/Main;util()V:35
JvmStackLocation(0)@Ls1/Main;util()V:50
JvmStackLocation(0)@Ls1/Main;util()V:14
JvmLocalVariableLocation(2)@Ls1/Main;util()V:38
JvmLocalVariableLocation(2)@Ls1/Main;util()V:34
JvmStackLocation(1)@Ls1/Main;util()V:44
JvmLocalVariableLocation(2)@Ls1/Main;util()V:41
JvmStackLocation(0)@Ls1/Main;util()V:43
JvmStackLocation(1)@Ls1/Main;util()V:47
JvmLocalVariableLocation(2)@Ls1/Main;util()V:28
JvmLocalVariableLocation(1)@Ls1/Main;util()V:25
JvmStackLocation(0)@Ls1/Main;util()V:25
JvmStackLocation(1)@Ls1/Main;util()V:28
JvmStackLocation(1)@Ls1/Main;util()V:31
JvmLocalVariableLocation(2)@Ls1/Main;util()V:31