Taint analysis collections access

Hello!
I am using proguard-core’s taint analysis tool and I have a question regarding the way it works with collections.

EDIT: I have just been helped with using newer version and at least the simple bug has been fixed. I’ll check other examples that I have and maybe this question will get resolved by itself.

The question is, if I have code like this I expect a trace to be found, but there are none.

    public void sink(String s) {
    }

    public List<String> source() {
        return List.of("s1", "s2");
    }

    public void launch() {
        sink(source().get(1));
    }

Source returns a malicious list and the data from the list is considered dangerous as well.
Same thing happens when I put tainted data into a list and then access the list: the data from the list is potentially tainted. But no traces are found.

I was wondering if I am setting some parameters wrong or is impossible to produce such traces for now?

For a more real-life example you might want to consider Command injection vulnerability example from the OWASP benchmark. We get tainted data in line 43 and feed it to the process in line 68 (sink). Ideally, I wanted this trace to be found.

Hi!

If I get your edit correctly it works now, right?

But I will provide a little bit more context on what I would expect to happen here (but I haven’t tested the code, I can try to do it in the next days if you still need help). This should work but maybe not for the reason you expect. We are not modelling collections behavior in any way for now, but we are over-approximating taint propagation on method calls, so calling get on a tainted list should return a taint regardless of the state of the content. If in your example the list wasn’t tainted and element 1 was, our code at its current state wouldn’t have been able to detect the tainted element.

Ok, got it, thanks) I’ll test it myself today on more examples and ask futher questions if needed.

A small update: the OWASP example I gave you works if I replace an array with a list like this:

prev:

        if (!map.isEmpty()) {
            String[] values = map.get("BenchmarkTest00500");
            if (values != null) param = values[0];
        }

modified:

        if (!map.isEmpty()) {
            List<String> values = Arrays.stream(map.get("BenchmarkTest00500")).toList();
            if (values != null) param = values.get(0);
        }

So the question is, how does the instrument work with arrays?

The short answer is that getting an element from an array won’t propagate the taint.

A more precise answer: loading elements from arrays is handled at line 344 of JvmTransferRelation, the code pushing the array element value is the following:

abstractState.push(abstractState.getArrayElementOrDefault(abstractState.pop(), index, getAbstractDefault()));

If you check how JvmAbstractState works this results in asking the HeapAbstractState to provide the value. The behavior then depends on the heap model chosen for the analysis, which is a parameter you can set when creating a CpaRun. For example if you are using the forgetful heap model you need to check the code in JvmForgetfulHeapAbstractState:

    public <T> StateT getArrayElementOrDefault(T array, StateT index, StateT defaultValue)
    {
        return this.defaultValue;
    }

That as you can see doesn’t really return a taint even if the array itself is tainted. None of the heap states available at the moment really handles the arrays in any smarter way. If you want to over-approximate the taint propagation you can try to design your own heap state that taints the entire array if a tainted value is set and returns a taint when getting an element if the array is tainted.

In case you would like to have a more precise analysis for the singles array elements, this would require pairing the taint analysis with a value analysis to be able to know the value of the indices, which is a feature that is currently not available.

1 Like

Thanks a lot for your answer!
I’ve tested the tool on the whole set of Command Injection vulnerability examples (~78 files, they have cmdi-00 in the annotation, if you want to take a look) and I have one more question, it’s a similar one.
The two main reasons for failure were the one discussed above and the following: if I add a tainted value to an array/a collection the collection does not become tainted and for example get() method does not taint its return. You mentioned this issue above as well.
The question is, is this planned to be fixed/supported any time soon?

As I said before for collections I would expect it to work as long as you keep the collection in the local context (i.e., if you set it as another object field it won’t work), while for arrays I would expect creating a new HeapAbstractState that over-approximates setting and getting arrays values to taint the array and get the tainted status of the array would work, as I explained in the previous comment (you should also extend the CpaRun implementation you are using to use your new state).

Unfortunately at the moment we don’t have any roadmap to work on this internally.

1 Like

Ok, understood, thanks a lot for your help!

1 Like