Skip to content

Commit 9ab90e7

Browse files
authored
Update hints, descriptions, and query structure
1 parent ec2607e commit 9ab90e7

1 file changed

Lines changed: 78 additions & 51 deletions

File tree

workshop-2021/README.md

Lines changed: 78 additions & 51 deletions
Original file line numberDiff line numberDiff line change
@@ -58,13 +58,26 @@ If you get stuck, try searching our documentation and blog posts for help and id
5858
- CodeQL on [GitHub Learning Lab](https://lab.github.com/search?q=codeql)
5959
- For more advanced CodeQL development in future, you may wish to set up the [CodeQL starter workspace](https://codeql.github.com/docs/codeql-for-visual-studio-code/setting-up-codeql-in-visual-studio-code/#using-the-starter-workspace) for all languages.
6060
61+
### Useful commands
62+
- Run a query using the following commands from the Command Palette (`Cmd/Ctrl + Shift + P`) or right-click menu:
63+
- `CodeQL: Run Query` (run the entire query)
64+
- `CodeQL: Quick Evaluation` (run only the selected predicate or snippet)
65+
- Click the links in the query results to navigate to the source code.
66+
- Explore the CodeQL libraries in your IDE using:
67+
- autocomplete suggestions (`Cmd/Ctrl + Space`)
68+
- jump-to-definition (`F12`, or `Cmd/Ctrl + F12` in a Codespace)
69+
- documentation hovers (place your cursor over an element)
70+
- the AST viewer on an open source file (`View AST` from the CodeQL sidebar or Command Palette)
71+
6172
## Workshop <a id="workshop"></a>
6273
63-
The workshop is split into several steps. You can write one query per step, or work with a single query that you refine at each step. Each step has a **hint** that describes useful classes and predicates in the CodeQL standard libraries for Java. You can explore these in your IDE using the autocomplete suggestions (`Cmd/Ctrl + Space`) and the jump-to-definition command (`F12`).
74+
The workshop is split into several steps. You can write one query per step, or work with a single query that you refine at each step. Each step has a **hint** that describes useful classes and predicates in the CodeQL standard libraries for Java.
6475
6576
### Section 1: Finding ObjectInput deserialization <a id="section1"></a>
6677
67-
Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/dev/impls/serialize/) to wrap multiple deserialization formats. Most of the supported serialization libraries might lead to arbitrary code execution upon deserialization of untrusted data. The SPI interface used for deserialization is called [ObjectInput](https://javadoc.io/doc/org.apache.dubbo/dubbo/latest/com/alibaba/dubbo/common/serialize/ObjectInput.html). It provides multiple `readXXX` methods for deserializing data to a Java object. By default, the input is not validated in any way, and is vulnerable to remote code execution exploits. In this section, we will identify calls to `ObjectInput.readXXX` methods in the codebase.
78+
Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/dev/impls/serialize/) to wrap multiple deserialization formats. Most of the supported serialization libraries might lead to arbitrary code execution upon deserialization of untrusted data. The SPI interface used for deserialization is called [ObjectInput](https://javadoc.io/doc/org.apache.dubbo/dubbo/latest/com/alibaba/dubbo/common/serialize/ObjectInput.html). It provides multiple `readXXX` methods for deserializing data to a Java object. By default, the input is not validated in any way, and is vulnerable to remote code execution exploits.
79+
80+
In this section, we will identify calls to `ObjectInput.readXXX` methods in the codebase. The qualifiers of these calls are the values being deserialized, and hence are **sinks** for deserialization vulnerabilities.
6881
6982
1. Find all method calls in the program.
7083
<details>
@@ -89,8 +102,9 @@ Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/d
89102
<summary>Hints</summary>
90103
91104
- Add a CodeQL variable called `method` with type `Method`.
92-
- `MethodAccess` has a predicate called `getMethod()` for returning the method.
93105
- Add a `where` clause.
106+
- `MethodAccess` has a predicate called `getMethod()` for returning the method.
107+
- Use the equality operator `=` to assert that two CodeQL expressions are the same.
94108
95109
</details>
96110
<details>
@@ -112,6 +126,7 @@ Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/d
112126
113127
- `Method.getName()` returns a string representing the name of the method.
114128
- `string.matches("foo%")` can be used to check if a string starts with `foo`.
129+
- Use the `and` keyword to add multiple conditions to the `where` clause.
115130
116131
</details>
117132
<details>
@@ -128,15 +143,22 @@ Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/d
128143
```
129144
</details>
130145
131-
1. Refine your query to only match calls to `read` methods on classes implementing the `ObjectInput` interface.<a id="question1"></a>
146+
1. Refine your query to only match calls to `read` methods on classes implementing the `org.apache.dubbo.common.serialize.ObjectInput` interface.<a id="question1"></a>
132147
133148
<details>
134149
<summary>Hint</summary>
135150
136151
- `Method.getDeclaringType()` returns the `RefType` this method is declared on. A `Class` is one kind of `RefType`.
137-
- `RefType.getASourceSupertype()` returns the immediate parent/supertypes for a given type.
152+
- `RefType.getASourceSupertype()` returns the immediate parent/supertypes for a given type, as defined in the Java source. (Hover to see the documentation.)
138153
- Use the "reflexive transitive closure" operator `*` on a call to a predicate with 2 arguments, e.g. `getASourceSupertype*()`, to apply the predicate 0 or more times in succession.
139-
- `RefType.hasQualifiedName("package", "class")` holds if the given `RefType` has the qualified name `package.class`.
154+
- `RefType.hasQualifiedName("package", "class")` holds if the given `RefType` has the fully-qualified name `package.class`.
155+
For example, the query
156+
```ql
157+
from RefType r
158+
where r.hasQualifiedName("java.lang", "String")
159+
select r
160+
```
161+
will find the type `java.lang.String`.
140162
141163
</details>
142164
<details>
@@ -220,11 +242,14 @@ Apache Dubbo uses an [abstraction layer](https://dubbo.apache.org/en/docs/v2.7/d
220242
221243
### Section 2: Find the implementations of the decodeBody method from DubboCodec<a id="section2"></a>
222244
223-
Like predicates, _classes_ in CodeQL can be used to encapsulate reusable portions of logic. Classes represent single sets of values, and they can also include operations (known as _member predicates_) specific to that set of values. You have already seen numerous instances of CodeQL classes (`MethodAccess`, `Method` etc.) and associated member predicates (`MethodAccess.getMethod()`, `Method.getName()`, etc.).
245+
Classes that implement the interface `org.apache.dubbo.remoting.Codec2` process user input in their `decodeBody` methods. In this section we will find these methods and their parameters, which are **sources** of untrusted user input.
246+
247+
Like predicates, _classes_ in CodeQL can be used to encapsulate reusable portions of logic. Classes represent sets of values, and they can also include operations (known as _member predicates_) specific to that set of values. You have already seen numerous instances of CodeQL classes (`MethodAccess`, `Method` etc.) and associated member predicates (`MethodAccess.getMethod()`, `Method.getName()`, etc.).
224248
225249
1. Create a CodeQL class called `DubboCodec` to find the interface `org.apache.dubbo.remoting.Codec2`. You can use this template:
226250
```ql
227251
class DubboCodec extends RefType {
252+
// Characteristic predicate
228253
DubboCodec() {
229254
// TODO Fill me in
230255
}
@@ -234,13 +259,8 @@ Like predicates, _classes_ in CodeQL can be used to encapsulate reusable portion
234259
<details>
235260
<summary>Hint</summary>
236261
237-
- Use `RefType.hasQualifiedName(string packageName, string className)` to identify classes with the given package name and class name. For example:
238-
```ql
239-
from RefType r
240-
where r.hasQualifiedName("java.lang", "String")
241-
select r
242-
```
243-
- Within the characteristic predicate you can use the magic variable `this` to refer to the RefType
262+
- Use `RefType.hasQualifiedName("package", "class")` to identify classes with the given package name and class name.
263+
- Within the characteristic predicate, use the special variable `this` to refer to the `RefType` we are describing.
244264
245265
</details>
246266
<details>
@@ -284,20 +304,29 @@ Like predicates, _classes_ in CodeQL can be used to encapsulate reusable portion
284304
```
285305
</details>
286306
287-
3. `decodeBody` methods should consider the second and third parameters as untrusted user input. Write a query to find these parameters (i.e. index 1 and index 2) parameter for `decodeBody` methods.
307+
3. `decodeBody` methods should consider the second and third parameters as untrusted user input. Add a member predicate to your `DubboCodecDecodeBody` class that finds these parameters of `decodeBody` methods.
288308
<details>
289309
<summary>Hint</summary>
290310
291-
- Use `Method.getParameter(int index)` to get the i-th index parameter.
292-
- Create a query with a single CodeQL variable of type `DubboCodecDecodeBody`.
311+
- Create a predicate `Parameter getAnUntrustedParameter() { ... } ` within the class. This has result type `Parameter`.
312+
- Within the predicate, use the special variable `result` to refer to the values to be "returned" or identified by the predicate.
313+
- Within the predicate, use the special variable `this` to refer to the `DubboCodecDecodeBody` method.
314+
- Use `Method.getParameter(int index)` to get the `i`-th index parameter. Indices are 0-based, so we want index 1 and index 2 here.
315+
- Use Quick Evaluation to run your predicate.
293316
294317
</details>
295318
<details>
296319
<summary>Solution</summary>
297320
298321
```ql
299-
from DubboCodecDecodeBody decodeBodyMethod
300-
select decodeBodyMethod.getParameter([1, 2])
322+
class DubboCodecDecodeBody extends Method {
323+
DubboCodecDecodeBody() {
324+
this.getDeclaringType().getASupertype*() instanceof DubboCodec and
325+
this.hasName("decodeBody")
326+
}
327+
328+
Parameter getAnUntrustedParameter() { result = this.getParameter([1, 2]) }
329+
}
301330
```
302331
</details>
303332
@@ -329,9 +358,9 @@ The data flow graph for this method will look something like this:
329358

330359
This graph represents the flow of data from the tainted parameter. The nodes of graph represent program elements that have a value, such as function parameters and expressions. The edges of this graph represent flow through these nodes.
331360

332-
CodeQL for Java provides data flow analysis as part of the standard library. You can import it using `semmle.code.java.dataflow.DataFlow`. The library models nodes using the `DataFlow::Node` CodeQL class. These nodes are separate and distinct from the AST (Abstract Syntax Tree, which represents the basic structure of the program) nodes, to allow for flexibility in how data flow is modeled.
361+
CodeQL for Java provides data flow analysis as part of the standard library. You can import it using `semmle.code.java.dataflow.DataFlow` or `semmle.code.java.dataflow.TaintTracking`. The library models nodes using the `DataFlow::Node` CodeQL class. These nodes are separate and distinct from the AST (Abstract Syntax Tree, which represents the basic structure of the program) nodes, to allow for flexibility in how data flow is modeled.
333362

334-
There are a small number of data flow node types – expression nodes and parameter nodes are most common.
363+
There are a small number of data flow node types – expression nodes and parameter nodes are most common. We can use the `asExpr()` and `asParameter()` methods to convert a `DataFlow::Node` into the corresponding AST node.
335364

336365
In this section we will create a data flow query by populating this template:
337366

@@ -355,11 +384,10 @@ class DubboUnsafeDeserializationConfig extends TaintTracking::Configuration {
355384
}
356385
override predicate isSink(DataFlow::Node sink) {
357386
exists(/** TODO fill me in **/ |
358-
/** TODO fill me in **/
359387
sink.asExpr() = /** TODO fill me in **/
360388
)
361389
}
362-
override predicate isAdditionalTaintStep(DataFlow::Node from, DataFlow::Node to) {
390+
override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) {
363391
/** TODO fill me in **/
364392
}
365393
}
@@ -369,55 +397,53 @@ where config.hasFlow(source, sink)
369397
select sink, "Unsafe deserialization"
370398
```
371399

372-
1. Complete the `isSource` predicate using the query you wrote for [Section 2](#section2).
400+
1. Complete the `isSource` predicate, using the logic you wrote for [Section 2](#section2).
373401

374402
<details>
375403
<summary>Hint</summary>
376404

377-
- You can translate from a query clause to a predicate by:
378-
- Converting the variable declarations in the `from` part to the variable declarations of an `exists`
379-
- Placing the `where` clause conditions (if any) in the body of the exists
380-
- Adding a condition which equates the `select` to one of the parameters of the predicate.
381-
- Remember to include the `DubboCodecDecodeBody` class you defined earlier.
405+
- Remember the `DubboCodecDecodeBody` class and `getAnUntrustedParameter` predicate you defined earlier.
406+
- Use `asParameter()` to convert a `DataFlow::Node` into a `Parameter`.
407+
- Use `exists` to declare new variables, and `=` to assert that two values are the same.
382408

383409
</details>
384410
<details>
385411
<summary>Solution</summary>
386412

387413
```ql
388-
override predicate isSource(Node source) {
389-
exists(DubboCodecDecodeBody decodeBodyMethod|
390-
source.asParameter() = decodeBodyMethod.getParameter([1, 2])
391-
)
414+
override predicate isSource(DataFlow::Node source) {
415+
exists(DubboCodecDecodeBody decodeBodyMethod |
416+
source.asParameter() = decodeBodyMethod.getAnUntrustedParameter()
392417
}
393418
```
394419
</details>
395420
396-
1. Complete the `isSink` predicate by using the final query you wrote for [Section 1](#section1). Remember to use the `isDeserialized` predicate!
421+
1. Complete the `isSink` predicate, using the logic you wrote for [Section 1](#section1).
397422
<details>
398423
<summary>Hint</summary>
399424
400425
- Complete the same process as above.
426+
- Remember the `isDeserialized` predicate you defined earlier.
427+
- Use `asExpr()` to convert a `DataFlow::Node` into an `Expr`.
401428
402429
</details>
403430
<details>
404431
<summary>Solution</summary>
405432
406433
```ql
407-
override predicate isSink(Node sink) {
408-
exists(Expr qualifier |
409-
isDeserialized(qualifier) and
410-
sink.asExpr() = qualifier
411-
)
434+
override predicate isSink(DataFlow::Node sink) {
435+
isDeserialized(sink.asExpr())
412436
}
413437
```
414438
</details>
415439
416-
1. Complete the `isAdditionalTaintStep` predicate by modelling the `Serialization.deserialize()` method so that it connects its first argument with the return value.
440+
1. Teach CodeQL about extra data flow steps that it should follow. Complete the `isAdditionalTaintStep` predicate by modelling the `Serialization.deserialize()` method, which connects its _first argument_ with the _return value_.
417441
<details>
418442
<summary>Hint</summary>
419443
420-
- Complete the same process as above.
444+
- As before, use `exists` to declare new variables, `asExpr()` to convert from `DataFlow::Node` to `Expr`,
445+
and `=` to assert equality.
446+
- `isAdditionalTaintStep` has two arguments: the node where data starts, and the node where data ends.
421447
422448
</details>
423449
<details>
@@ -446,9 +472,9 @@ The answer to this is to convert the query to a _path problem_ query. There are
446472
- Add a new import `DataFlow::PathGraph`, which will report the path data alongside the query results.
447473
- Change `source` and `sink` variables from `DataFlow::Node` to `DataFlow::PathNode`, to ensure that the nodes retain path information.
448474
- Use `hasFlowPath` instead of `hasFlow`.
449-
- Change the select to report the `source` and `sink` as the second and third columns. The toolchain combines this data with the path information from `PathGraph` to build the paths.
475+
- Change the `select` clause to report the `source` and `sink` as the second and third columns. The toolchain combines this data with the path information from `PathGraph` to build the paths.
450476
451-
3. Convert your previous query to a path-problem query.
477+
3. Convert your previous query to a path-problem query. Run the query to see the paths in the results view.
452478
<details>
453479
<summary>Solution</summary>
454480
@@ -484,20 +510,21 @@ The answer to this is to convert the query to a _path problem_ query. There are
484510
this.getDeclaringType().getASupertype*() instanceof DubboCodec and
485511
this.hasName("decodeBody")
486512
}
513+
514+
Parameter getAnUntrustedParameter() {
515+
result = this.getParameter([1, 2])
516+
}
487517
}
488518
489-
class DubboUnsafeDeserializationConfig extends DataFlow::Configuration {
519+
class DubboUnsafeDeserializationConfig extends TaintTracking::Configuration {
490520
DubboUnsafeDeserializationConfig() { this = "DubboUnsafeDeserializationConfig" }
491521
override predicate isSource(DataFlow::Node source) {
492-
exists(DubboCodecDecodeBody decodeBodyMethod|
493-
source.asParameter() = decodeBodyMethod.getParameter([1, 2])
522+
exists(DubboCodecDecodeBody decodeBodyMethod |
523+
source.asParameter() = decodeBodyMethod.getAnUntrustedParameter()
494524
)
495525
}
496526
override predicate isSink(DataFlow::Node sink) {
497-
exists(Expr qualifier |
498-
isDeserialized(qualifier) and
499-
sink.asExpr() = qualifier
500-
)
527+
isDeserialized(sink.asExpr())
501528
}
502529
override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) {
503530
exists(MethodAccess ma |
@@ -516,7 +543,7 @@ The answer to this is to convert the query to a _path problem_ query. There are
516543
```
517544
</details>
518545
519-
For more information on how the vulnerability was identified, you can read the [blog disclosing the original problem](https://securitylab.github.com/research/apache-dubbo/).
546+
For more information on how the vulnerability was identified, read the [blog post on the original problem](https://securitylab.github.com/research/apache-dubbo/).
520547
521548
## What's next?
522549
- [CodeQL overview](https://codeql.github.com/docs/codeql-overview/)

0 commit comments

Comments
 (0)