Glossary
This document defines common words used to describe our secrets detection engine.
Detector
A set of rules applied to a document to find one type of secret (e.g. AWS keys, database URIs, Google keys).
Generic detector
A detector is generic if the secret's provider cannot be inferred directly from what it finds. For example, a detector looking for a pattern such as secret={high_entropy_string} is a generic detector.
Specific detector
A detector designed to find a well-identified type of secret, such as AWS keys, MySQL URIs, or Slack tokens. Specific detectors are often contrasted with generic detectors.
Assignment and assigned variable
We refer to an assignment as any statement of the form {assigned_variable} {assignment_token} {value}. For example, in the statement my_variable = "HelloWorld", the assigned variable is my_variable.
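As a minimal sketch of the form above, the following splits a statement into its three parts with a regular expression. The pattern, the set of assignment tokens (= and :), and the function name parse_assignment are assumptions made for illustration; they are not the engine's actual rules.

```python
import re

# Hypothetical pattern: an identifier, an assignment token (= or :), then an
# optionally quoted value. Real assignment syntax varies by language.
ASSIGNMENT_RE = re.compile(
    r"""(?P<assigned_variable>[A-Za-z_][A-Za-z0-9_]*)   # variable name
        \s*(?P<assignment_token>=|:)\s*                  # assignment token
        (?P<quote>["']?)(?P<value>[^"'\s]+)(?P=quote)    # optionally quoted value
    """,
    re.VERBOSE,
)


def parse_assignment(statement: str):
    """Return (assigned_variable, assignment_token, value), or None if no match."""
    match = ASSIGNMENT_RE.search(statement)
    if match is None:
        return None
    return (
        match.group("assigned_variable"),
        match.group("assignment_token"),
        match.group("value"),
    )


print(parse_assignment('my_variable = "HelloWorld"'))
```

For the statement from the definition, the first element of the returned tuple is the assigned variable, my_variable.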
Document
Any text, optionally accompanied by a filename.
Entropy
Measure of randomness of a string. An API key should have a high entropy since it is a randomly generated sequence of characters. When mentioning entropy in this documentation, we mean Shannon entropy.
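For reference, Shannon entropy over the characters of a string can be computed as in the sketch below. The function name shannon_entropy and the choice of bits per character as the unit are assumptions for illustration; the engine's own computation and thresholds are not described here.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    length = len(text)
    # Sum of p * log2(1 / p) over the frequency p of each distinct character.
    return sum((n / length) * math.log2(length / n) for n in counts.values())


print(shannon_entropy("aaaaaaaa"))  # one repeated character: no randomness
print(shannon_entropy("abcdefgh"))  # eight distinct, equally frequent characters
```

A string made of one repeated character scores 0, while a string whose characters are all distinct scores the maximum possible for its length, which is why randomly generated keys tend to score high.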
Filepath / Filename / Extension
We adopted the following conventions for naming paths. For example, for config/secrets.yaml:
yaml is the extension.
secrets.yaml is the filename.
config/secrets.yaml is the filepath.
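The same three parts can be extracted with Python's standard pathlib module, as a small sketch of the convention above. Note that pathlib's suffix attribute keeps the leading dot, so it is stripped here to match the glossary's definition of the extension.

```python
from pathlib import PurePosixPath

path = PurePosixPath("config/secrets.yaml")

filepath = str(path)                 # the full path
filename = path.name                 # the last path component
extension = path.suffix.lstrip(".")  # suffix without its leading dot

print(filepath, filename, extension)
```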
Insight
Additional information on a document or a secret.
Match
A string that is part of a secret. A secret can be composed of one or several matches.
Matcher
A detection rule that is applied to a document and outputs matches.
PostValidator
A validation rule applied to a secret candidate (e.g.: validate that all the matches have sufficient entropy).
Precision
The fraction of detected secrets that are indeed true secrets. We can track this metric using feedback from our customers.
PreValidator
A validation rule applied to a document (e.g.: look for "datadog" in the document).
Priority
A rule that prioritizes one secret over another when they overlap. A secret detected by a specific detector always has a higher priority than one detected by a generic detector.
Recall
The fraction of secrets we were able to detect and classify as such among all secrets that exist. This metric is almost impossible to measure without human labelling.
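Precision and recall, as defined in this glossary, follow the standard formulas based on counts of true positives, false positives, and false negatives. The sketch below uses made-up counts purely for illustration; they are not measurements of the engine.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detected secrets that are true secrets."""
    detected = true_positives + false_positives
    return true_positives / detected if detected else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of all existing secrets that were detected."""
    existing = true_positives + false_negatives
    return true_positives / existing if existing else 0.0


# Hypothetical counts: 90 true secrets detected, 10 false alerts,
# and 30 true secrets that were missed.
print(precision(90, 10))
print(recall(90, 30))
```

The false-negative count is the hard part in practice: missed secrets are only known once someone has labelled the data, which is why recall is difficult to measure.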
Scanner
A collection of detectors. In terms of code, this is the entry point to scan a document, and the only way of scanning one.
Secret
A combination of strings found by a detector in a document. This combination should grant access to a private service.
Secrets overlapping
Two secrets overlap if any match of one is partially or completely included in any match of the other.
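One way to picture this definition, together with the priority rule defined earlier in this glossary, is to model each match as a pair of character offsets. The Secret class, its fields, and the resolve function below are hypothetical and exist only for this sketch; the engine's internal representation may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Secret:
    detector_name: str
    is_specific: bool               # True if found by a specific detector
    matches: List[Tuple[int, int]]  # (start, end) offsets, end exclusive


def spans_overlap(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if the two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def secrets_overlap(first: Secret, second: Secret) -> bool:
    """True if any match of one secret overlaps any match of the other."""
    return any(spans_overlap(a, b) for a in first.matches for b in second.matches)


def resolve(first: Secret, second: Secret) -> List[Secret]:
    """Keep the specific secret when a specific and a generic one overlap."""
    if not secrets_overlap(first, second):
        return [first, second]
    if first.is_specific != second.is_specific:
        return [first if first.is_specific else second]
    # The glossary does not say how ties between same-kind detectors are broken.
    return [first, second]


generic = Secret("generic", False, [(10, 40)])
specific = Secret("aws", True, [(5, 12), (20, 30)])
print([s.detector_name for s in resolve(generic, specific)])
```

Here the generic secret's single match overlaps both matches of the specific one, so only the specific secret is kept.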
Validity Check
A non-intrusive call to the relevant service that determines whether a key is valid or invalid. Validity checks can be used to improve our precision by ensuring that we only raise alerts for valid secrets.