Python: Add taint-flow modeling for `re` module #14725

RasmusWL · 2023-11-08T16:25:24Z

I ended up writing the tests first, and then did the modeling with flow-summaries instead of manually propagating flow to the object as we have been forced to do before.

I think overall it was nice to be able to write flow-summaries, although the lack of auto-completion when writing the input/output specs is a little annoying.

As a side effect of the way I worked, we do have some slightly funky test results, such as the one below. Since we can now say that only the first element of the tuple returned by `re.subn` is tainted, we no longer taint the whole returned tuple like we used to (I added that "feature" in a few places, but with a TODO notice).

☝️ is more a note to reviewers why the tests look like they do, and that some of the MISSING annotations are actually fine... I wonder if we should remove those lines of the test though, to not confuse our future selves?

    re.subn(pat, repl="safe", string=ts), # $ MISSING: tainted
    re.subn(pat, repl="safe", string=ts)[0], # $ tainted // the string
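
For context on why only index `[0]` matters here: `re.subn` returns a `(new_string, count)` tuple, so only the first element can carry data derived from the input string. A minimal sketch, where `pat` and `ts` are placeholder values chosen for illustration (they are not the values used in the actual test file):

```python
import re

pat = r"\d+"            # placeholder pattern, assumed for illustration
ts = "user-42-input-7"  # stands in for a tainted string

result = re.subn(pat, repl="safe", string=ts)
new_string, count = result

# new_string is derived from ts, so it is the element we would expect to be tainted.
# count is just an int: the number of substitutions that were made.
print(result)
```

The tuple as a whole is a container with one interesting element, which is what the element-level flow summary expresses.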

(I recommend reviewing commit-by-commit)

tausbn · 2023-11-08T16:30:32Z

> ☝️ is more a note to reviewers why the tests look like they do, and that some of the MISSING annotations are actually fine... I wonder if we should remove those lines of the test though, to not confuse our future selves?
>
>     re.subn(pat, repl="safe", string=ts), # $ MISSING: tainted
>     re.subn(pat, repl="safe", string=ts)[0], # $ tainted // the string

In this case, wouldn't the right thing be to just remove the MISSING annotation entirely (and perhaps add a comment that only the first element of re.subn(...) is tainted)?

In my mind, MISSING comes with an implicit meaning of "but we really ought to have this", which -- as I understand it -- is not the case here.

yoff · 2023-11-09T13:53:26Z

I would go even further and add the lines where we do not expect taint under an ensure_not_tainted. This "feature" is similar to what we want to achieve with https://github.com/github/codeql-python-team/issues/728.
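
A rough sketch of what that split could look like for the `re.subn` lines. The two helper functions below are hypothetical no-op stand-ins defined only so the snippet runs on its own (the real test suite is assumed to provide its own `ensure_tainted` / `ensure_not_tainted`), and `pat` and `ts` are placeholder values:

```python
import re

# Hypothetical no-op stand-ins so this sketch is self-contained.
# In the real tests these names are assumed to come from the shared test helpers.
def ensure_tainted(*args):
    pass

def ensure_not_tainted(*args):
    pass

pat = r"\d+"        # placeholder pattern
ts = "tainted-123"  # placeholder for a tainted string

ensure_tainted(
    re.subn(pat, repl="safe", string=ts)[0], # $ tainted // the string
)

ensure_not_tainted(
    re.subn(pat, repl="safe", string=ts),     # the tuple itself
    re.subn(pat, repl="safe", string=ts)[1],  # the substitution count
)
```

With this layout the lines without expected taint carry no `MISSING` annotation at all, so nothing suggests that flow "ought" to be there.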

Ruby uses 10 as their number. I considered doing the same, but didn't really care _too_ much about it 🤷 https://github.com/github/codeql/blob/14cfb82a8c16e15fadc006ae46331302f0341f63/ruby/ql/lib/codeql/ruby/dataflow/internal/DataFlowPrivate.qll#L636

Mostly to highlight that with flow-summary modeling, we don't expect taint for a lot of these. I also opted to make `finditer()` tainted for consistency.

RasmusWL · 2023-11-13T10:28:19Z

Thanks for the good comments @tausbn and @yoff 🙏

RasmusWL · 2023-11-14T14:53:22Z

Performance looks fine 👍

RasmusWL added 4 commits November 8, 2023 16:05

Python: Add tests of taint-flow for re module

ea4761d

Python: Add taint modeling of re.Match objects

851c30e

Python: Model taint from re.<func> calls

4943fc5

Python: Add change-note

3023d3b

RasmusWL requested a review from a team as a code owner November 8, 2023 16:25

github-actions bot added documentation Python labels Nov 8, 2023

RasmusWL added 5 commits November 10, 2023 16:32

Merge branch 'main' into re-modeling

c85d99d

Python: Highlight problem with flow summaries and TAttributeContent

943b2a2

Python: Fix problems with missing TAttributeContent

c3fa3f2

Python: Reorganize taint tests of re

e1c47f5

Mostly to highlight that with flow-summary modeling, we don't expect taint for a lot of these. I also opted to make `finditer()` tainted for consistency.
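
As a reminder of the runtime behavior behind the `finditer()` choice: it yields `re.Match` objects one at a time, and each match exposes pieces of the input string. A small sketch with a placeholder pattern and input (both assumed for illustration, not taken from the tests):

```python
import re

ts = "a=1;b=2"  # stands in for a tainted string

# finditer() returns an iterator of re.Match objects
matches = list(re.finditer(r"(\w)=(\d)", ts))

# Each match gives access to substrings of ts, and to ts itself via .string
whole = [m.group(0) for m in matches]
keys = [m.group(1) for m in matches]
sources = [m.string for m in matches]

print(whole, keys)
```

Since everything reachable from a match here originates in `ts`, treating the result of `finditer()` on a tainted string as tainted lines up with how the other `re` functions returning matches are handled.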
