CodeQL workshop for C/C++: Empty if statements, Predicates and Classes

Problem statement
Writing queries on your local machine
Documentation links
Introduction: Empty if statements
Existential quantifiers (local variables in queries)
Predicates
Classes

Problem statement

In this workshop, we will be writing our first CodeQL query, which we will use to analyze the source code of Exiv2, an open source C++ library to manage image metadata.

Redundant code is often an indication that the programmer has made a logical mistake. Consider this simple example:

```
int write(int buf[], int size, int loc, int val) {
    if (loc >= size) {
        // return -1;
    }

    buf[loc] = val;

    return 0;
}
```

Here we have a C function which writes a value to a buffer. It includes a check that is intended to prevent the write from overflowing the buffer, but the return statement inside the check has been commented out, leaving a redundant if statement and no bounds checking.
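For contrast, here is a minimal sketch of what the function presumably should look like with the return statement restored. The name write_checked is our own (hypothetical, chosen to avoid clashing with POSIX write()); the check itself is the one from the original, so, like the original, it does not guard against a negative loc.

```c
/* Hypothetical fixed version of the seed example: the return statement
   inside the bounds check is restored, so out-of-range writes are rejected. */
int write_checked(int buf[], int size, int loc, int val) {
    if (loc >= size) {
        return -1; /* reject the write instead of falling through */
    }

    /* Like the original, this does not check for a negative loc. */
    buf[loc] = val;

    return 0;
}
```

Calling write_checked(buf, 4, 4, 9) on a four-element buffer now returns -1 and leaves the buffer untouched, whereas the original would write one element past the end.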

In this workshop we will explore the features of CodeQL by writing a query to identify redundant conditionals like this. We will use the example above as a seed vulnerability for writing that query to help us find other instances ("variants") of the same class of vulnerability.

This workshop will teach you:

Basic query structure
QL types and variables
Logical conditions

Writing queries on your local machine

To run CodeQL queries on Exiv2, follow these steps:

Install the Visual Studio Code IDE.
Download and install the CodeQL extension for Visual Studio Code. Full setup instructions are here.
Set up the starter workspace.
- Important: Don't forget to git clone --recursive or git submodule update --init --remote, so that you obtain the standard query libraries.
Open the starter workspace: File > Open Workspace > Browse to vscode-codeql-starter/vscode-codeql-starter.code-workspace.
Download the Exiv2 database.
Unzip the database.
Import the unzipped database into Visual Studio Code:
- Click the CodeQL icon in the left sidebar.
- Place your mouse over Databases, and click the + sign that appears on the right.
- Choose the unzipped database directory on your filesystem.
Create a new file, name it EmptyIf.ql, save it under codeql-custom-queries-cpp.

Documentation links

If you get stuck, try searching our documentation and blog posts for help and ideas. Below are a few links to help you get started:

Introduction: Empty if statements

The workshop is split into several steps. You can write one query per step, or work with a single query that you refine at each step. Each step has a hint that describes useful classes and predicates in the CodeQL standard libraries for C/C++. You can explore these in your IDE using the autocomplete suggestions and jump-to-definition command.

The basic syntax of QL will look familiar to anyone who has used SQL. A query is defined by a select clause, which specifies what the result of the query should be. For example:

```
import cpp

select "hello world"
```

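This query simply returns the string "hello world". The import cpp line makes the CodeQL standard library for C/C++ available; the queries in this workshop need it, even where a snippet omits it for brevity.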

More complicated queries look like this:

```
from /* ... variable declarations ... */
where /* ... logical formulas ... */
select /* ... expressions ... */
```

The from clause specifies some variables that will be used in the query. The where clause specifies some conditions on those variables in the form of logical formulas. The select clause specifies what the results should be, and can refer to variables defined in the from clause.

The from clause is defined as a series of variable declarations, where each declaration has a type and a name. For example:

```
from IfStmt ifStmt
select ifStmt
```

We are declaring a variable with the name ifStmt and the type IfStmt (from the CodeQL standard library for analyzing C/C++). Variables represent a set of values, initially constrained by the type of the variable. Here, the variable ifStmt represents the set of all if statements in the C/C++ program, as we can see if we run the query.
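To get a feel for what this query returns, here is a small hypothetical C test file (the function classify is ours, not from Exiv2). Note that an else if chain is not a single construct: the second if is a separate if statement nested in the else part of the first. We would therefore expect the query to report three if statements in this function.

```c
/* Sample input: each if below should appear in the results of
   from IfStmt ifStmt select ifStmt. */
int classify(int x) {
    if (x < 0) {           /* if statement 1 */
        return -1;
    } else if (x == 0) {   /* if statement 2: nested in the else part of 1 */
        return 0;
    }
    if (x > 100)           /* if statement 3: its then part is not a block */
        return 2;
    return 1;
}
```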

To find empty if statements we will first need to define what it means for the if statement to be empty. If we consider the code sample again:

```
if (loc >= size) {
    // return -1;
}
```

What we can see is that there is a block (i.e. an area of code enclosed by { and }) which includes no executable code.

Write a new query to identify all blocks in the program.

Hint

A block statement is represented by the standard library type Block.
Solution
```
from Block block
select block
```
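For reference, here is a hypothetical C function annotated with the blocks we would expect this query to report. Every region enclosed by { and } counts, not just the then parts of if statements: the function body, a loop body, and even a bare nested block are all blocks.

```c
/* Sample input annotated with the blocks the Block query should find. */
int sum_positive(const int *xs, int n) {   /* block 1: the function body */
    int total = 0;
    for (int i = 0; i < n; i++) {          /* block 2: the loop body */
        if (xs[i] > 0) {                   /* block 3: the then part */
            total += xs[i];
        }
    }
    {                                      /* block 4: a bare nested block */
        int limit = 1000;
        if (total > limit)                 /* single-statement then part: no block */
            total = limit;
    }
    return total;
}
```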

QL is an object-oriented language. One consequence of this is that "objects" can have "operations" associated with them. QL uses a "." notation to allow access to operations on a variable. For example:

```
from IfStmt ifStmt
select ifStmt, ifStmt.getThen()
```

This reports the "then" part of each if statement (i.e. the part that is executed if the condition holds). If we run the query, we find that it now reports two columns: the first column contains the if statements in the program, and the second column contains the then part of each of those if statements, which is usually, but not always, a block.

Update your query to report the number of statements within each block.

Hint

Block has an operation called getNumStmt().
Solution
```
from Block block
select block, block.getNumStmt()
```
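getNumStmt() counts the statements directly inside a block, and comments are not statements. The hypothetical function below is annotated with the counts we would expect for each then block. Note in particular that a lone semicolon is an (empty) statement, so a block containing only ; does not have 0 statements.

```c
/* Sample input annotated with the expected getNumStmt() of each then block. */
int count_example(int x) {
    int hits = 0;
    if (x) {
        /* only a comment: 0 statements */
    }
    if (x) {
        ;               /* a lone ';' is an empty statement: 1 statement */
    }
    if (x > 1) {
        int y = x * 2;  /* a declaration statement ... */
        hits += y;      /* ... and an expression statement: 2 statements */
    }
    return hits;
}
```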

We can apply further constraints to the variables by adding a where clause, which specifies a logical formula describing additional conditions on the variables. For example:

```
from IfStmt ifStmt, Block block
where ifStmt.getThen() = block
select ifStmt, block
```

Here we have added another variable called block, and then added a condition (a "formula" in QL terms) to the where clause stating that block is equal to the result of the ifStmt.getThen() operation. Note: the equals sign (=) here is not an assignment; it declares a condition on the variables that must be satisfied for a result to be reported. Additional conditions can be provided by combining them with and.

When we run this query, we again get two columns, containing if statements and their then parts. However, we now get fewer results, because not all if statements have blocks as their then parts. Consider:

```
if (x)
  return 0;
```

This reveals a feature of QL: proscriptive typing. By specifying that the variable block has type Block, we are actually asserting a logical condition on that variable, namely that it represents a block. In doing so, we have also limited the set of if statements to only those whose then part is a block.
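To make this concrete, here is a hypothetical C function with two if statements that behave the same way. Only the second has a block as its then part, so with block declared as Block we would expect the query above to report only the second if statement.

```c
/* Sample input: same behaviour, different shape of then part. */
int first_nonzero(int a, int b) {
    if (a)
        return a;   /* then part is a return statement, not a Block: not reported */
    if (b) {
        return b;   /* then part is a Block with one statement: reported */
    }
    return 0;
}
```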

Update your query to report only blocks with 0 statements.

Hint

Add a where clause which states that the number of statements in the block is equal to 0.
Solution
```
from Block block
where block.getNumStmt() = 0
select block
```
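This query reports every empty block, not only those belonging to if statements. For example, we would expect it to report both of the empty blocks in the hypothetical code below, even though neither is the then part of an if statement, which is why the next step restricts the results to if statements.

```c
/* Sample input: two empty blocks (0 statements) that are not then parts. */
void do_nothing(void) {
    /* empty function body */
}

int leading_zeros(const int *xs, int n) {
    int i;
    for (i = 0; i < n && xs[i] == 0; i++) {
        /* empty loop body: all the work is in the loop header */
    }
    return i;
}
```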
Combine your query identifying "empty" blocks with the query identifying blocks associated with if statements, to report all empty if statements in the program.

Hint

Add a new condition to the where clause using the logical connective and.
Solution
```
from IfStmt ifStmt, Block block
where
  ifStmt.getThen() = block and
  block.getNumStmt() = 0
select ifStmt, block
```
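As a quick sanity check, here is a hypothetical C test file for this query. We would expect the query to report the first if statement, whose then block contains only a comment, exactly like the seed example. We would not expect it to report the second: its then part contains a statement, and because the query only examines getThen(), the empty else block is ignored.

```c
/* Sample input for the empty-if query. */
int normalize_index(int loc, int size) {
    /* Expected to be reported: the then block holds only a comment. */
    if (loc >= size) {
        // return -1;
    }
    /* Not expected to be reported: only the else block is empty. */
    if (loc < 0) {
        loc = 0;
    } else {
        /* nothing to do */
    }
    return loc;
}
```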

Congratulations, you have now written your first CodeQL query that finds genuine bugs!

In order to use this with the rest of the CodeQL toolchain, we will need to make some final adjustments to the query to add metadata, to help the toolchain interpret and process the results.


```
/**
 * @name Empty if statement
 * @kind problem
 * @id cpp/empty-if-statement
 */
import cpp

from IfStmt ifStmt, Block block
where
  ifStmt.getThen() = block and
  block.getNumStmt() = 0
select ifStmt, "Empty if statement"
```

Existential quantifiers (local variables in queries)

In this next part of the workshop, we will experiment with refactoring this query using different features of CodeQL. There are no exercises in this section.

The first feature we will explore is existential quantifiers. Although the terminology may sound scary if you are not familiar with logic and logic programming, these are simply ways to introduce temporary variables with some associated conditions. The syntax for them is:

```
exists(<variable declarations> | <formula>)
```

They have a similar structure to the from and where clauses: the first part declares one or more variables, and the second part is a formula (the "conditions") over those variables.

For example, we can use this to refactor our query to use a temporary variable for the empty block:

```
from IfStmt ifStmt
where
  exists(Block block |
    ifStmt.getThen() = block and
    block.getNumStmt() = 0
  )
select ifStmt, "Empty if statement"
```

Predicates

The next feature we will explore is predicates. These provide a way to encapsulate portions of query logic so that they can be reused. Like existential quantifiers, you can think of them as a mini from-where-select query. Like a select clause, they also produce a set of "tuples", or rows in a result table.

We can introduce a new predicate in our query that identifies the set of empty blocks in the program (for example, to reuse this feature in another query):

```
predicate isEmptyBlock(Block block) {
  block.getNumStmt() = 0
}

from IfStmt ifStmt
where isEmptyBlock(ifStmt.getThen())
select ifStmt, "Empty if statement"
```

You can define a predicate with result by replacing the keyword predicate with the type of the result. This introduces the special variable result, which can be used like a regular parameter.

```
Block getAnEmptyBlock() {
  result.getNumStmt() = 0
}
```

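From an implementation point of view, this is effectively equivalent to the earlier isEmptyBlock predicate, with result acting as an extra parameter. Conceptually, it behaves like the following (the parameter cannot literally be named result, because result is a reserved word in QL):

```
predicate isAnEmptyBlock(Block block) {
  // block plays the role that result plays in getAnEmptyBlock()
  block.getNumStmt() = 0
}
```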

The main difference is how we use it:

```
from IfStmt ifStmt
where ifStmt.getThen() = getAnEmptyBlock()
select ifStmt, "Empty if statement"
```

A call to a predicate with a result is an expression, so it can be used in equality comparisons such as ifStmt.getThen() = getAnEmptyBlock(). Under the hood, however, both forms of the predicate calculate the same set of values.

Classes

In this final part of the workshop we will talk about CodeQL classes. Classes are a way in which you can define new types within CodeQL, as well as providing an easy way to reuse and structure code.

Like all types in CodeQL, classes represent a set of values. For example, the Block type is, in fact, a class, and it represents the set of all blocks in the program. You can also think of a class as defining a set of logical conditions that specifies the set of values for that class.

For example, we can define a new CodeQL class to represent empty blocks:

```
class EmptyBlock extends Block {
  EmptyBlock() {
    this.getNumStmt() = 0
  }
}
```

We use the keyword class, provide a name for our class, then provide a "super-type". All classes in QL must have at least one super-type, and the super-types define the initial set of values in our class. In this case, our EmptyBlock starts with all the values in the Block class. However, a class that can only represent the same set of values as another class is not very interesting. We can therefore provide a characteristic predicate that defines some additional conditions that can restrict the set of values further. In this case, we can specify the same condition as before to indicate that our empty blocks are blocks whose getNumStmt() = 0. We can use the special variable this to refer to the instance of Block we are constraining.

Note that a value can belong to more than one of these sets, which means that it can have more than one type. For example, empty blocks are both Blocks and EmptyBlocks.

So far, this class is actually equivalent to the predicate solutions we saw above - we are in fact specifying the same conditions, and this will calculate the same set of values. The difference, again, is how we use it:

```
from IfStmt ifStmt, EmptyBlock block
where ifStmt.getThen() = block
select ifStmt, "Empty if statement"
```

This is another instance of the proscriptive typing of QL - by changing the type of the variable to EmptyBlock, we change the meaning of the program.

As discussed previously, classes can also provide operations. These operations are called member predicates, as they are predicates which are members of the class. For example:

```
class MyBlock extends Block {
  predicate isEmptyBlock() {
    this.getNumStmt() = 0
  }
}
```

In this case we do not provide a characteristic predicate, so this class represents the same set of values as Block. Instead, we provide a member predicate to specify whether this is an empty block. Like characteristic predicates, member predicates can use the special variable this, which refers to the instance.

We can then use this in the same way as operations provided on standard library classes:

```
from IfStmt ifStmt, MyBlock block
where
  ifStmt.getThen() = block and
  block.isEmptyBlock()
select ifStmt, "Empty if statement"
```

In fact, this is how the standard library classes are implemented: if we jump to the definition of getThen(), we can see that it is also defined as a member predicate, on the IfStmt class.

hohn/codeql-cpp-empty-if-predicates-classes.md