workshop-2021/README.md
Lines changed: 78 additions & 51 deletions
@@ -58,13 +58,26 @@ If you get stuck, try searching our documentation and blog posts for help and id
58
58
- CodeQL on [GitHub Learning Lab](https://lab.github.com/search?q=codeql)
59
59
- For more advanced CodeQL development in future, you may wish to set up the [CodeQL starter workspace](https://codeql.github.com/docs/codeql-for-visual-studio-code/setting-up-codeql-in-visual-studio-code/#using-the-starter-workspace) for all languages.
60
60
61
+
### Useful commands
62
+
- Run a query using the following commands from the Command Palette (`Cmd/Ctrl + Shift + P`) or right-click menu:
63
+
- `CodeQL: Run Query` (run the entire query)
64
+
- `CodeQL: Quick Evaluation` (run only the selected predicate or snippet)
65
+
- Click the links in the query results to navigate to the source code.
66
+
- Explore the CodeQL libraries in your IDE using:
67
+
- autocomplete suggestions (`Cmd/Ctrl + Space`)
68
+
- jump-to-definition (`F12`, or `Cmd/Ctrl + F12` in a Codespace)
69
+
- documentation hovers (place your cursor over an element)
70
+
- the AST viewer on an open source file (`View AST` from the CodeQL sidebar or Command Palette)
71
+
61
72
## Workshop <a id="workshop"></a>
62
73
63
-
The workshop is split into several steps. You can write one query per step, or work with a single query that you refine at each step. Each step has a **hint** that describes useful classes and predicates in the CodeQL standard libraries for Java. You can explore these in your IDE using the autocomplete suggestions (`Cmd/Ctrl + Space`) and the jump-to-definition command (`F12`).
74
+
The workshop is split into several steps. You can write one query per step, or work with a single query that you refine at each step. Each step has a **hint** that describes useful classes and predicates in the CodeQL standard libraries for Java.
Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/dev/impls/serialize/) to wrap multiple deserialization formats. Most of the supported serialization libraries might lead to arbitrary code execution upon deserialization of untrusted data. The SPI interface used for deserialization is called [ObjectInput](https://javadoc.io/doc/org.apache.dubbo/dubbo/latest/com/alibaba/dubbo/common/serialize/ObjectInput.html). It provides multiple `readXXX` methods for deserializing data to a Java object. By default, the input is not validated in any way, and is vulnerable to remote code execution exploits. In this section, we will identify calls to `ObjectInput.readXXX` methods in the codebase.
78
+
Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/dev/impls/serialize/) to wrap multiple deserialization formats. Most of the supported serialization libraries might lead to arbitrary code execution upon deserialization of untrusted data. The SPI interface used for deserialization is called [ObjectInput](https://javadoc.io/doc/org.apache.dubbo/dubbo/latest/com/alibaba/dubbo/common/serialize/ObjectInput.html). It provides multiple `readXXX` methods for deserializing data to a Java object. By default, the input is not validated in any way, and is vulnerable to remote code execution exploits.
79
+
80
+
In this section, we will identify calls to `ObjectInput.readXXX` methods in the codebase. The qualifiers of these calls are the values being deserialized, and hence are **sinks** for deserialization vulnerabilities.
68
81
69
82
1. Find all method calls in the program.
70
83
<details>
@@ -89,8 +102,9 @@ Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/d
89
102
<summary>Hints</summary>
90
103
91
104
- Add a CodeQL variable called `method` with type `Method`.
92
-
- `MethodAccess` has a predicate called `getMethod()` for returning the method.
93
105
- Add a `where` clause.
106
+
- `MethodAccess` has a predicate called `getMethod()` for returning the method.
107
+
- Use the equality operator `=` to assert that two CodeQL expressions are the same.
94
108
95
109
</details>
96
110
<details>
@@ -112,6 +126,7 @@ Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/d
112
126
113
127
- `Method.getName()` returns a string representing the name of the method.
114
128
- `string.matches("foo%")` can be used to check if a string starts with `foo`.
129
+
- Use the `and` keyword to add multiple conditions to the `where` clause.
115
130
116
131
</details>
117
132
<details>
@@ -128,15 +143,22 @@ Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/d
128
143
```
129
144
</details>
130
145
131
-
1. Refine your query to only match calls to `read` methods on classes implementing the `ObjectInput` interface.<a id="question1"></a>
146
+
1. Refine your query to only match calls to `read` methods on classes implementing the `org.apache.dubbo.common.serialize.ObjectInput` interface.<a id="question1"></a>
132
147
133
148
<details>
134
149
<summary>Hint</summary>
135
150
136
151
- `Method.getDeclaringType()` returns the `RefType` this method is declared on. A `Class` is one kind of `RefType`.
137
-
- `RefType.getASourceSupertype()` returns the immediate parent/supertypes for a given type.
152
+
- `RefType.getASourceSupertype()` returns the immediate parent/supertypes for a given type, as defined in the Java source. (Hover to see the documentation.)
138
153
- Use the "reflexive transitive closure" operator `*` on a call to a predicate that relates two values (here, the receiver and the result), e.g. `getASourceSupertype*()`, to apply the predicate 0 or more times in succession.
139
-
- `RefType.hasQualifiedName("package", "class")` holds if the given `RefType` has the qualified name `package.class`.
154
+
- `RefType.hasQualifiedName("package", "class")` holds if the given `RefType` has the fully-qualified name `package.class`.
155
+
For example, the query
156
+
```ql
157
+
from RefType r
158
+
where r.hasQualifiedName("java.lang", "String")
159
+
select r
160
+
```
161
+
will find the type `java.lang.String`.
140
162
141
163
</details>
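
As a rough illustration of the `*` operator described in the hints above, the sketch below finds every type that is, or transitively extends or implements, `java.util.Collection`. The choice of `java.util.Collection` is an assumption made purely for the example and is unrelated to the Dubbo workshop answer.

```ql
import java

// Illustrative only: `java.util.Collection` is an arbitrary example type.
from RefType t
where
  // Zero or more applications of getASourceSupertype(), so the
  // Collection interface itself is matched as well as its subtypes.
  t.getASourceSupertype*().hasQualifiedName("java.util", "Collection")
select t
```

Replacing `*` with `+` would require at least one supertype step, which would exclude `java.util.Collection` itself from the results.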
142
164
<details>
@@ -220,11 +242,14 @@ Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/d
220
242
221
243
### Section 2: Find the implementations of the decodeBody method from DubboCodec<a id="section2"></a>
222
244
223
-
Like predicates, _classes_ in CodeQL can be used to encapsulate reusable portions of logic. Classes represent single sets of values, and they can also include operations (known as _member predicates_) specific to that set of values. You have already seen numerous instances of CodeQL classes (`MethodAccess`, `Method` etc.) and associated member predicates (`MethodAccess.getMethod()`, `Method.getName()`, etc.).
245
+
Classes that implement the interface `org.apache.dubbo.remoting.Codec2` process user input in their `decodeBody` methods. In this section we will find these methods and their parameters, which are **sources** of untrusted user input.
246
+
247
+
Like predicates, _classes_ in CodeQL can be used to encapsulate reusable portions of logic. Classes represent sets of values, and they can also include operations (known as _member predicates_) specific to that set of values. You have already seen numerous instances of CodeQL classes (`MethodAccess`, `Method` etc.) and associated member predicates (`MethodAccess.getMethod()`, `Method.getName()`, etc.).
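
The following minimal sketch shows the general shape of a CodeQL class with a characteristic predicate and a member predicate. The class name `BooleanStyleGetter` and the member predicate `getOwner` are hypothetical names invented for this example; they are not part of the workshop or of the standard library.

```ql
import java

/** A parameterless method whose name starts with `is` (hypothetical example class). */
class BooleanStyleGetter extends Method {
  // Characteristic predicate: restricts which `Method` values belong to this class.
  BooleanStyleGetter() {
    this.getName().matches("is%") and
    this.getNumberOfParameters() = 0
  }

  // Member predicate: gets the type that declares this method.
  RefType getOwner() { result = this.getDeclaringType() }
}

from BooleanStyleGetter g
select g, g.getOwner()
```

Once defined, the class can be used as a type in any `from` clause, just like `Method` or `MethodAccess`.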
224
248
225
249
1. Create a CodeQL class called `DubboCodec` to find the interface `org.apache.dubbo.remoting.Codec2`. You can use this template:
226
250
```ql
227
251
class DubboCodec extends RefType {
252
+
// Characteristic predicate
228
253
DubboCodec() {
229
254
// TODO Fill me in
230
255
}
@@ -234,13 +259,8 @@ Like predicates, _classes_ in CodeQL can be used to encapsulate reusable portion
234
259
<details>
235
260
<summary>Hint</summary>
236
261
237
-
- Use `RefType.hasQualifiedName(string packageName, string className)` to identify classes with the given package name and class name. For example:
238
-
```ql
239
-
from RefType r
240
-
where r.hasQualifiedName("java.lang", "String")
241
-
select r
242
-
```
243
-
- Within the characteristic predicate you can use the magic variable `this` to refer to the RefType
262
+
- Use `RefType.hasQualifiedName("package", "class")` to identify classes with the given package name and class name.
263
+
- Within the characteristic predicate, use the special variable `this` to refer to the `RefType` we are describing.
244
264
245
265
</details>
246
266
<details>
@@ -284,20 +304,29 @@ Like predicates, _classes_ in CodeQL can be used to encapsulate reusable portion
284
304
```
285
305
</details>
286
306
287
-
3. `decodeBody` methods should consider the second and third parameters as untrusted user input. Write a query to find these parameters (i.e. index 1 and index 2) parameter for `decodeBody` methods.
307
+
3. `decodeBody` methods should consider the second and third parameters as untrusted user input. Add a member predicate to your `DubboCodecDecodeBody` class that finds these parameters of `decodeBody` methods.
288
308
<details>
289
309
<summary>Hint</summary>
290
310
291
-
- Use `Method.getParameter(int index)` to get the i-th index parameter.
292
-
- Create a query with a single CodeQL variable of type `DubboCodecDecodeBody`.
311
+
- Create a predicate `Parameter getAnUntrustedParameter() { ... }` within the class. Its result type is `Parameter`.
312
+
- Within the predicate, use the special variable `result` to refer to the values to be "returned" or identified by the predicate.
313
+
- Within the predicate, use the special variable `this` to refer to the `DubboCodecDecodeBody` method.
314
+
- Use `Method.getParameter(int index)` to get the parameter at the given index. Indices are 0-based, so we want index 1 and index 2 here.
315
+
- Use Quick Evaluation to run your predicate.
293
316
294
317
</details>
295
318
<details>
296
319
<summary>Solution</summary>
297
320
298
321
```ql
299
-
from DubboCodecDecodeBody decodeBodyMethod
300
-
select decodeBodyMethod.getParameter([1, 2])
322
+
class DubboCodecDecodeBody extends Method {
323
+
DubboCodecDecodeBody() {
324
+
this.getDeclaringType().getASupertype*() instanceof DubboCodec and
325
+
this.hasName("decodeBody")
326
+
}
327
+
328
+
Parameter getAnUntrustedParameter() { result = this.getParameter([1, 2]) }
329
+
}
301
330
```
302
331
</details>
303
332
@@ -329,9 +358,9 @@ The data flow graph for this method will look something like this:
329
358
330
359
This graph represents the flow of data from the tainted parameter. The nodes of the graph represent program elements that have a value, such as function parameters and expressions. The edges of the graph represent the flow of data between these nodes.
331
360
332
-
CodeQL for Java provides data flow analysis as part of the standard library. You can import it using `semmle.code.java.dataflow.DataFlow`. The library models nodes using the `DataFlow::Node` CodeQL class. These nodes are separate and distinct from the AST (Abstract Syntax Tree, which represents the basic structure of the program) nodes, to allow for flexibility in how data flow is modeled.
361
+
CodeQL for Java provides data flow analysis as part of the standard library. You can import it using `semmle.code.java.dataflow.DataFlow` or `semmle.code.java.dataflow.TaintTracking`. The library models nodes using the `DataFlow::Node` CodeQL class. These nodes are separate and distinct from the AST (Abstract Syntax Tree, which represents the basic structure of the program) nodes, to allow for flexibility in how data flow is modeled.
333
362
334
-
There are a small number of data flow node types – expression nodes and parameter nodes are most common.
363
+
There are a small number of data flow node types – expression nodes and parameter nodes are most common. We can use the `asExpr()` and `asParameter()` member predicates to convert a `DataFlow::Node` into the corresponding AST node.
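
As a small sketch of the node conversion just described, the query below lists every data flow node that corresponds to a method or constructor parameter, alongside the AST `Parameter` it wraps. It is meant only for exploring the library and is not a step of the workshop.

```ql
import java
import semmle.code.java.dataflow.DataFlow

from DataFlow::Node node, Parameter p
where
  // asParameter() only has a result for parameter nodes, so this
  // condition also filters out expression nodes.
  p = node.asParameter()
select node, p
```

Using `node.asExpr()` with a variable of type `Expr` in the same way would select the expression nodes instead.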
335
364
336
365
In this section we will create a data flow query by populating this template:
337
366
@@ -355,11 +384,10 @@ class DubboUnsafeDeserializationConfig extends TaintTracking::Configuration {
1. Complete the `isSink` predicate by using the final query you wrote for [Section 1](#section1). Remember to use the `isDeserialized` predicate!
421
+
1. Complete the `isSink` predicate, using the logic you wrote for [Section 1](#section1).
397
422
<details>
398
423
<summary>Hint</summary>
399
424
400
425
- Complete the same process as above.
426
+
- Remember the `isDeserialized` predicate you defined earlier.
427
+
- Use `asExpr()` to convert a `DataFlow::Node` into an `Expr`.
401
428
402
429
</details>
403
430
<details>
404
431
<summary>Solution</summary>
405
432
406
433
```ql
407
-
override predicate isSink(Node sink) {
408
-
exists(Expr qualifier |
409
-
isDeserialized(qualifier) and
410
-
sink.asExpr() = qualifier
411
-
)
434
+
override predicate isSink(DataFlow::Node sink) {
435
+
isDeserialized(sink.asExpr())
412
436
}
413
437
```
414
438
</details>
415
439
416
-
1. Complete the `isAdditionalTaintStep` predicate by modelling the `Serialization.deserialize()` method so that it connects its first argument with the return value.
440
+
1. Teach CodeQL about extra data flow steps that it should follow. Complete the `isAdditionalTaintStep` predicate by modelling the `Serialization.deserialize()` method, which connects its _first argument_ with the _return value_.
417
441
<details>
418
442
<summary>Hint</summary>
419
443
420
-
- Complete the same process as above.
444
+
- As before, use `exists` to declare new variables, `asExpr()` to convert from `DataFlow::Node` to `Expr`,
445
+
and `=` to assert equality.
446
+
- `isAdditionalTaintStep` has two arguments: the node where data starts, and the node where data ends.
421
447
422
448
</details>
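
The hints above can be sketched on a made-up case. The predicate below assumes a hypothetical method named `wrap` whose first argument flows to its return value; both the method name and the predicate name `wrapStep` are assumptions for illustration, not part of Dubbo or the CodeQL libraries.

```ql
import java
import semmle.code.java.dataflow.DataFlow

/**
 * Holds if data flows from `pred` to `succ` through a call to a
 * (hypothetical) method named `wrap`.
 */
predicate wrapStep(DataFlow::Node pred, DataFlow::Node succ) {
  exists(MethodAccess ma |
    ma.getMethod().hasName("wrap") and
    // Data starts at the first argument of the call...
    pred.asExpr() = ma.getArgument(0) and
    // ...and ends at the value of the call expression itself.
    succ.asExpr() = ma
  )
}
```

A predicate like this can be inspected with Quick Evaluation, and its body has the same shape as one that could go inside `isAdditionalTaintStep`.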
423
449
<details>
@@ -446,9 +472,9 @@ The answer to this is to convert the query to a _path problem_ query. There are
446
472
- Add a new import `DataFlow::PathGraph`, which will report the path data alongside the query results.
447
473
- Change `source` and `sink` variables from `DataFlow::Node` to `DataFlow::PathNode`, to ensure that the nodes retain path information.
448
474
- Use `hasFlowPath` instead of `hasFlow`.
449
-
- Change the select to report the `source` and `sink` as the second and third columns. The toolchain combines this data with the path information from `PathGraph` to build the paths.
475
+
- Change the `select` clause to report the `source` and `sink` as the second and third columns. The toolchain combines this data with the path information from `PathGraph` to build the paths.
450
476
451
-
3. Convert your previous query to a path-problem query.
477
+
3. Convert your previous query to a path-problem query. Run the query to see the paths in the results view.
452
478
<details>
453
479
<summary>Solution</summary>
454
480
@@ -484,20 +510,21 @@ The answer to this is to convert the query to a _path problem_ query. There are
484
510
this.getDeclaringType().getASupertype*() instanceof DubboCodec and
485
511
this.hasName("decodeBody")
486
512
}
513
+
514
+
Parameter getAnUntrustedParameter() {
515
+
result = this.getParameter([1, 2])
516
+
}
487
517
}
488
518
489
-
class DubboUnsafeDeserializationConfig extends DataFlow::Configuration {
519
+
class DubboUnsafeDeserializationConfig extends TaintTracking::Configuration {
490
520
DubboUnsafeDeserializationConfig() { this = "DubboUnsafeDeserializationConfig" }
@@ -516,7 +543,7 @@ The answer to this is to convert the query to a _path problem_ query. There are
516
543
```
517
544
</details>
518
545
519
-
For more information on how the vulnerability was identified, you can read the [blog disclosing the original problem](https://securitylab.github.com/research/apache-dubbo/).
546
+
For more information on how the vulnerability was identified, read the [blog post on the original problem](https://securitylab.github.com/research/apache-dubbo/).