Skip to content

Commit 3aa6606

Browse files
authored
Merge pull request #21806 from geoffw0/extsensitive
Shared: Improvements to SensitiveDataHeuristics.qll
2 parents c1e26f9 + a4b2c0f commit 3aa6606

14 files changed

Lines changed: 137 additions & 76 deletions

File tree

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
---
2+
category: minorAnalysis
3+
---
4+
* The sensitive data heuristics used to identify code that handles passwords and private data have been improved. Most of the changes permit more variations of established patterns, thereby finding more sensitive data. Queries that use the sensitive data library (for example `js/clear-text-logging`) may find more correct results and fewer false positive results after these changes.
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
---
2+
category: minorAnalysis
3+
---
4+
* The sensitive data heuristics used to identify code that handles passwords and private data have been improved. Most of the changes permit more variations of established patterns, thereby finding more sensitive data. Queries that use the sensitive data library (for example `py/clear-text-logging-sensitive-data`) may find more correct results and less fewer positive results after these changes.

python/ql/test/query-tests/Security/CWE-312-CleartextLogging/CleartextLogging.expected

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,8 @@ edges
1010
| test.py:48:14:48:35 | ControlFlowNode for social_security_number | test.py:49:15:49:36 | ControlFlowNode for social_security_number | provenance | |
1111
| test.py:48:38:48:40 | ControlFlowNode for ssn | test.py:50:15:50:17 | ControlFlowNode for ssn | provenance | |
1212
| test.py:48:54:48:63 | ControlFlowNode for passportNo | test.py:52:15:52:24 | ControlFlowNode for passportNo | provenance | |
13+
| test.py:54:14:54:22 | ControlFlowNode for post_code | test.py:55:15:55:23 | ControlFlowNode for post_code | provenance | |
14+
| test.py:54:25:54:31 | ControlFlowNode for zipCode | test.py:56:15:56:21 | ControlFlowNode for zipCode | provenance | |
1315
| test.py:54:34:54:45 | ControlFlowNode for home_address | test.py:57:15:57:26 | ControlFlowNode for home_address | provenance | |
1416
| test.py:59:14:59:26 | ControlFlowNode for user_latitude | test.py:60:15:60:27 | ControlFlowNode for user_latitude | provenance | |
1517
| test.py:59:29:59:42 | ControlFlowNode for user_longitude | test.py:61:15:61:28 | ControlFlowNode for user_longitude | provenance | |
@@ -42,7 +44,11 @@ nodes
4244
| test.py:49:15:49:36 | ControlFlowNode for social_security_number | semmle.label | ControlFlowNode for social_security_number |
4345
| test.py:50:15:50:17 | ControlFlowNode for ssn | semmle.label | ControlFlowNode for ssn |
4446
| test.py:52:15:52:24 | ControlFlowNode for passportNo | semmle.label | ControlFlowNode for passportNo |
47+
| test.py:54:14:54:22 | ControlFlowNode for post_code | semmle.label | ControlFlowNode for post_code |
48+
| test.py:54:25:54:31 | ControlFlowNode for zipCode | semmle.label | ControlFlowNode for zipCode |
4549
| test.py:54:34:54:45 | ControlFlowNode for home_address | semmle.label | ControlFlowNode for home_address |
50+
| test.py:55:15:55:23 | ControlFlowNode for post_code | semmle.label | ControlFlowNode for post_code |
51+
| test.py:56:15:56:21 | ControlFlowNode for zipCode | semmle.label | ControlFlowNode for zipCode |
4652
| test.py:57:15:57:26 | ControlFlowNode for home_address | semmle.label | ControlFlowNode for home_address |
4753
| test.py:59:14:59:26 | ControlFlowNode for user_latitude | semmle.label | ControlFlowNode for user_latitude |
4854
| test.py:59:29:59:42 | ControlFlowNode for user_longitude | semmle.label | ControlFlowNode for user_longitude |
@@ -79,6 +85,8 @@ subpaths
7985
| test.py:49:15:49:36 | ControlFlowNode for social_security_number | test.py:48:14:48:35 | ControlFlowNode for social_security_number | test.py:49:15:49:36 | ControlFlowNode for social_security_number | This expression logs $@ as clear text. | test.py:48:14:48:35 | ControlFlowNode for social_security_number | sensitive data (private) |
8086
| test.py:50:15:50:17 | ControlFlowNode for ssn | test.py:48:38:48:40 | ControlFlowNode for ssn | test.py:50:15:50:17 | ControlFlowNode for ssn | This expression logs $@ as clear text. | test.py:48:38:48:40 | ControlFlowNode for ssn | sensitive data (private) |
8187
| test.py:52:15:52:24 | ControlFlowNode for passportNo | test.py:48:54:48:63 | ControlFlowNode for passportNo | test.py:52:15:52:24 | ControlFlowNode for passportNo | This expression logs $@ as clear text. | test.py:48:54:48:63 | ControlFlowNode for passportNo | sensitive data (private) |
88+
| test.py:55:15:55:23 | ControlFlowNode for post_code | test.py:54:14:54:22 | ControlFlowNode for post_code | test.py:55:15:55:23 | ControlFlowNode for post_code | This expression logs $@ as clear text. | test.py:54:14:54:22 | ControlFlowNode for post_code | sensitive data (private) |
89+
| test.py:56:15:56:21 | ControlFlowNode for zipCode | test.py:54:25:54:31 | ControlFlowNode for zipCode | test.py:56:15:56:21 | ControlFlowNode for zipCode | This expression logs $@ as clear text. | test.py:54:25:54:31 | ControlFlowNode for zipCode | sensitive data (private) |
8290
| test.py:57:15:57:26 | ControlFlowNode for home_address | test.py:54:34:54:45 | ControlFlowNode for home_address | test.py:57:15:57:26 | ControlFlowNode for home_address | This expression logs $@ as clear text. | test.py:54:34:54:45 | ControlFlowNode for home_address | sensitive data (private) |
8391
| test.py:60:15:60:27 | ControlFlowNode for user_latitude | test.py:59:14:59:26 | ControlFlowNode for user_latitude | test.py:60:15:60:27 | ControlFlowNode for user_latitude | This expression logs $@ as clear text. | test.py:59:14:59:26 | ControlFlowNode for user_latitude | sensitive data (private) |
8492
| test.py:61:15:61:28 | ControlFlowNode for user_longitude | test.py:59:29:59:42 | ControlFlowNode for user_longitude | test.py:61:15:61:28 | ControlFlowNode for user_longitude | This expression logs $@ as clear text. | test.py:59:29:59:42 | ControlFlowNode for user_longitude | sensitive data (private) |

python/ql/test/query-tests/Security/CWE-312-CleartextLogging/test.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -52,8 +52,8 @@ def log1(social_security_number, ssn, className, passportNo):
5252
print(passportNo) # NOT OK
5353

5454
def log2(post_code, zipCode, home_address):
55-
print(post_code) # NOT OK, but NOT FOUND - "code" is treated as encrypted and thus not sensitive
56-
print(zipCode) # NOT OK, but NOT FOUND - "code" is treated as encrypted and thus not sensitive
55+
print(post_code) # NOT OK
56+
print(zipCode) # NOT OK
5757
print(home_address) # NOT OK
5858

5959
def log3(user_latitude, user_longitude):
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
---
2+
category: minorAnalysis
3+
---
4+
* The sensitive data heuristics used to identify code that handles passwords and private data have been improved. Most of the changes permit more variations of established patterns, thereby finding more sensitive data. Queries that use the sensitive data library (for example `rust/cleartext-logging`) may find more correct results and fewer false positive results after these changes.

rust/ql/test/library-tests/sensitivedata/test.rs

Lines changed: 22 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -23,11 +23,14 @@ impl MyStruct {
2323
fn get_password() -> String { get_string() }
2424

2525
fn test_passwords(
26-
password: &str, pass_word: &str, passwd: &str, my_password: &str, password_str: &str, password_confirmation: &str,
26+
password: &str, pass_word: &str, passwd: &str, my_password: &str, password_str: &str, password_confirmation: &str, profile_password: &str,
2727
pass_phrase: &str, passphrase: &str, passPhrase: &str, backup_code: &str,
2828
auth_key: &str, authkey: &str, authKey: &str, authentication_key: &str, authenticationkey: &str, authenticationKey: &str, oauth: &str,
29-
one_time_code: &str,
30-
harmless: &str, encrypted_password: &str, password_hash: &str, passwordFile: &str,
29+
one_time_code: &str, api_token: &str, api_tok: &str,
30+
harmless: &str,
31+
encrypted_password: &str, unencrypted_password: &str, encoded_password: &str, unencoded_password: &str,
32+
password_hash: &str, passwordFile: &str, coauthor: &str,
33+
3134
ms: &MyStruct
3235
) {
3336
// passwords
@@ -38,6 +41,9 @@ fn test_passwords(
3841
sink(my_password); // $ sensitive=password
3942
sink(password_str); // $ sensitive=password
4043
sink(password_confirmation); // $ sensitive=password
44+
sink(profile_password); // $ sensitive=password
45+
sink(unencrypted_password); // $ sensitive=password
46+
sink(unencoded_password); // $ sensitive=password
4147
sink(pass_phrase); // $ sensitive=password
4248
sink(passphrase); // $ sensitive=password
4349
sink(passPhrase); // $ sensitive=password
@@ -51,6 +57,8 @@ fn test_passwords(
5157
sink(authenticationKey); // $ sensitive=password
5258
sink(oauth); // $ sensitive=password
5359
sink(one_time_code); // $ MISSING: sensitive=password
60+
sink(api_token); // $ sensitive=password
61+
sink(api_tok); // $ sensitive=password
5462

5563
sink(ms); // $ MISSING: sensitive=password
5664
sink(ms.password.as_str()); // $ sensitive=password
@@ -67,8 +75,10 @@ fn test_passwords(
6775

6876
sink(harmless);
6977
sink(encrypted_password);
78+
sink(encoded_password);
7079
sink(password_hash);
7180
sink(passwordFile);
81+
sink(coauthor);
7282

7383
sink(ms.harmless.as_str());
7484
sink(ms.password_file_path.as_str());
@@ -187,6 +197,10 @@ struct Financials {
187197
harmless: String,
188198
my_bank_account_number: String,
189199
credit_card_no: String,
200+
card_no: String,
201+
cardNumber: String,
202+
card_security_code: String,
203+
190204
credit_rating: i32,
191205
user_ccn: String,
192206
cvv: String,
@@ -201,6 +215,7 @@ struct Financials {
201215
accounting: i32,
202216
unaccounted: bool,
203217
multiband: bool,
218+
wildcard_not_matched: bool,
204219
}
205220

206221
enum Gender {
@@ -298,6 +313,9 @@ fn test_private_info(
298313

299314
sink(info.financials.my_bank_account_number.as_str()); // $ sensitive=private SPURIOUS: sensitive=id
300315
sink(info.financials.credit_card_no.as_str()); // $ sensitive=private
316+
sink(info.financials.card_no.as_str()); // $ sensitive=private
317+
sink(info.financials.cardNumber.as_str()); // $ sensitive=private
318+
sink(info.financials.card_security_code.as_str()); // $ sensitive=private
301319
sink(info.financials.credit_rating); // $ sensitive=private
302320
sink(info.financials.user_ccn.as_str()); // $ sensitive=private
303321
sink(info.financials.cvv.as_str()); // $ sensitive=private
@@ -350,6 +368,7 @@ fn test_private_info(
350368
sink(info.financials.accounting);
351369
sink(info.financials.unaccounted);
352370
sink(info.financials.multiband);
371+
sink(info.financials.wildcard_not_matched);
353372

354373
sink(ContactDetails::FavouriteColor("blue".to_string()));
355374
}

shared/concepts/codeql/concepts/internal/SensitiveDataHeuristics.qll

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -76,7 +76,7 @@ module HeuristicNames {
7676
string maybePassword() {
7777
result =
7878
"(?is).*(pass(wd|word|code|.?phrase)(?!.*question)|(auth(entication|ori[sz]ation)?).?key|oauth|"
79-
+ "api.?(key|token)|([_-]|\\b)mfa([_-]|\\b)).*"
79+
+ "api.?(key|tok)|([_-]|\\b)mfa([_-]|\\b)).*"
8080
}
8181

8282
/**
@@ -104,8 +104,9 @@ module HeuristicNames {
104104
// Geographic location - where the user is (or was)
105105
"latitude|longitude|nationality|" +
106106
// Financial data - such as credit card numbers, salary, bank accounts, and debts
107-
"(credit|debit|bank|visa).?(card|num|no|acc(ou)?nt)|acc(ou)?nt.?(no|num|credit)|routing.?num|"
107+
"(credit|debit|bank|visa).?(card|num|no|acc(ou)?nt)|(card|acc(ou)?nt).?(no|num|credit)|routing.?num|"
108108
+ "salary|billing|beneficiary|credit.?(rating|score)|([_-]|\\b)(ccn|cvv|iban)([_-]|\\b)|" +
109+
"security.?code|" +
109110
// Communications - e-mail addresses, private e-mail messages, SMS text messages, chat logs, etc.
110111
// "e(mail|_mail)|" + // this seems too noisy
111112
// Health - medical conditions, insurance status, prescription records
@@ -145,13 +146,13 @@ module HeuristicNames {
145146
* suggesting nouns within the string do not represent the meaning of the whole string (e.g. a URL or a SQL query).
146147
*
147148
* We also filter out common words like `certain` and `concert`, since otherwise these could
148-
* be matched by the certificate regular expressions. Same for `accountable` (account), or
149-
* `secretarial` (secret).
149+
* be matched by the certificate regular expressions. Same for `accountable` (account),
150+
* `secretarial` (secret), `wildcard` (card), `coauthor` (oauth).
150151
*/
151152
string notSensitiveRegexp() {
152153
result =
153-
"(?is).*([^\\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|((?<!un)(en))?(crypt|(?<!pass)code)|"
154-
+ "certain|concert|secretar|account(ant|ab|ing|ed)|file|path|([_-]|\\b)url).*"
154+
"(?is).*([^\\w$.-]|redact|censor|obfuscate|hash|md5|sha|random|(?<!unen)crypt|(?<!un)encode|" +
155+
"certain|concert|secretar|wildcard|coauthor|account(ant|ab|ing|ed)|(?<!pro)file|path|([_-]|\\b)url).*"
155156
}
156157

157158
/**
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
---
2+
category: minorAnalysis
3+
---
4+
* The sensitive data heuristics used to identify code that handles passwords and private data have been improved. Most of the changes permit more variations of established patterns, thereby finding more sensitive data. Queries that use the sensitive data library (for example `swift/cleartext-logging`) may find more correct results and fewer false positive results after these changes.

swift/ql/test/query-tests/Security/CWE-311/CleartextTransmission.expected

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,7 @@ nodes
6464
| testSend.swift:78:27:78:30 | .CarePlanID | semmle.label | .CarePlanID |
6565
| testSend.swift:79:27:79:30 | .BankCardNo | semmle.label | .BankCardNo |
6666
| testSend.swift:80:27:80:30 | .MyCreditRating | semmle.label | .MyCreditRating |
67+
| testSend.swift:81:27:81:30 | .OneTimeCode | semmle.label | .OneTimeCode |
6768
| testSend.swift:86:7:86:7 | self | semmle.label | self |
6869
| testSend.swift:94:27:94:30 | .password | semmle.label | .password |
6970
| testSend.swift:94:27:94:39 | .value | semmle.label | .value |
@@ -118,6 +119,7 @@ subpaths
118119
| testSend.swift:78:27:78:30 | .CarePlanID | testSend.swift:78:27:78:30 | .CarePlanID | testSend.swift:78:27:78:30 | .CarePlanID | This operation transmits '.CarePlanID', which may contain unencrypted sensitive data from $@. | testSend.swift:78:27:78:30 | .CarePlanID | .CarePlanID |
119120
| testSend.swift:79:27:79:30 | .BankCardNo | testSend.swift:79:27:79:30 | .BankCardNo | testSend.swift:79:27:79:30 | .BankCardNo | This operation transmits '.BankCardNo', which may contain unencrypted sensitive data from $@. | testSend.swift:79:27:79:30 | .BankCardNo | .BankCardNo |
120121
| testSend.swift:80:27:80:30 | .MyCreditRating | testSend.swift:80:27:80:30 | .MyCreditRating | testSend.swift:80:27:80:30 | .MyCreditRating | This operation transmits '.MyCreditRating', which may contain unencrypted sensitive data from $@. | testSend.swift:80:27:80:30 | .MyCreditRating | .MyCreditRating |
122+
| testSend.swift:81:27:81:30 | .OneTimeCode | testSend.swift:81:27:81:30 | .OneTimeCode | testSend.swift:81:27:81:30 | .OneTimeCode | This operation transmits '.OneTimeCode', which may contain unencrypted sensitive data from $@. | testSend.swift:81:27:81:30 | .OneTimeCode | .OneTimeCode |
121123
| testSend.swift:94:27:94:39 | .value | testSend.swift:94:27:94:30 | .password | testSend.swift:94:27:94:39 | .value | This operation transmits '.value', which may contain unencrypted sensitive data from $@. | testSend.swift:94:27:94:30 | .password | .password |
122124
| testURL.swift:39:18:39:50 | ... .+(_:_:) ... | testURL.swift:39:50:39:50 | passwd | testURL.swift:39:18:39:50 | ... .+(_:_:) ... | This operation transmits '... .+(_:_:) ...', which may contain unencrypted sensitive data from $@. | testURL.swift:39:50:39:50 | passwd | passwd |
123125
| testURL.swift:41:18:41:51 | ... .+(_:_:) ... | testURL.swift:41:51:41:51 | account_no | testURL.swift:41:18:41:51 | ... .+(_:_:) ... | This operation transmits '... .+(_:_:) ...', which may contain unencrypted sensitive data from $@. | testURL.swift:41:51:41:51 | account_no | account_no |

swift/ql/test/query-tests/Security/CWE-311/SensitiveExprs.expected

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -170,6 +170,7 @@
170170
| testSend.swift:78:27:78:30 | .CarePlanID | label:CarePlanID, type:private information |
171171
| testSend.swift:79:27:79:30 | .BankCardNo | label:BankCardNo, type:private information |
172172
| testSend.swift:80:27:80:30 | .MyCreditRating | label:MyCreditRating, type:private information |
173+
| testSend.swift:81:27:81:30 | .OneTimeCode | label:OneTimeCode, type:credential |
173174
| testSend.swift:94:27:94:30 | .password | label:password, type:password |
174175
| testURL.swift:39:50:39:50 | passwd | label:passwd, type:password |
175176
| testURL.swift:41:51:41:51 | account_no | label:account_no, type:private information |

0 commit comments

Comments
 (0)