Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Add taint-flow modeling for re module #14725

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

RasmusWL
Copy link
Member

@RasmusWL RasmusWL commented Nov 8, 2023

I ended up writing the tests first, and then did the modeling with flow-summaries instead of manually propagating flow to the object as we have been forced to do before.

I think overall it was nice to be able to write flow-summaries, although the lack of auto-completion when writing the input/output specs is a little annoying.

As a side effect of the way I worked, we do have some slightly funky test results, such as the one below -- Since we can say that only the first element of the tuple returned by re.subn is tainted, we don't actually taint the whole returned tuple like we used to (I added that "feature" in a few places, but with a TODO notice).

☝️ is more a note to reviewers why the tests look like they do, and that some of the MISSING annotations are actually fine... I wonder if we should remove those lines of the test though, to not confuese our future selves?

    re.subn(pat, repl="safe", string=ts), # $ MISSING: tainted
    re.subn(pat, repl="safe", string=ts)[0], # $ tainted // the string

(I recommend reviewing commit-by-commit)

@tausbn
Copy link
Contributor

tausbn commented Nov 8, 2023

☝️ is more a note to reviewers why the tests look like they do, and that some of the MISSING annotations are actually fine... I wonder if we should remove those lines of the test though, to not confuese our future selves?

    re.subn(pat, repl="safe", string=ts), # $ MISSING: tainted
    re.subn(pat, repl="safe", string=ts)[0], # $ tainted // the string

In this case, wouldn't the right thing be to just remove the MISSING annotation entirely (and perhaps add a comment that only the first element of re.subn(...) is tainted)?

In my mind, MISSING comes with an implicit meaning of "but we really ought to have this", which -- as I understand it -- is not the case here.

@yoff
Copy link
Contributor

yoff commented Nov 9, 2023

I would go even further and add the lines where we do not expect taint under an ensure_not_tainted. This "feature" is similar to what we want to achieve with https://github.com/github/codeql-python-team/issues/728.

@RasmusWL
Copy link
Member Author

Thanks for the good comments @tausbn and @yoff 🙏

@RasmusWL
Copy link
Member Author

Performance looks fine 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants