Entry 3: Unmodified var warnings

27 June 2018

I’m a sucker for old bugs. I find it especially satisfying to fix something that’s been broken or missing for some time.

Looking through some older type-checker bugs, I found SR-1658:

func test() {
  // This ought to give a warning to change the var to a let
  var x: Int
  if Int("abc") == nil {
    x = 1
  } else {
    x = 2
  }
  print(x)
}

I was really surprised that this didn’t already give a warning. I guess I’ve just assumed that Swift would warn, so I never paid it too much attention. But I played with it a bit more and found that this didn’t give a warning either:

func test() {
  var x: Int
  x = 1
  print(x)
}

Now I was confused and intrigued.

Trying to understand a bit more

I did my usual dance to find the source of the warning I expected:

Change my test case to get the actual warning
Search the codebase for the text to find the identifier of the diagnostic
Search for the identifier to find where the diagnostic is evoked

That led me to VarDeclUsageChecker. That certainly seemed like the right place.

But as I continued to understand what was going on, I decided to also look at uninitialized errors. I knew that it would trigger in the original example. And since both have to track reads and writes to a var or let, I thought they’d be related.

I was wrong. Uninitialized warnings come from LifetimeChecker in swiftSILOptimizer. This is a completely separate library and phase of compilation.

It still seems like these things ought to be done together.

`VarDeclUsageChecker`

I thought I’d try to press on and fix my issue anyway. Fixing a minor issue would yield an immediate improvement and help me learn what I needed to make a larger change.

VarDeclUsageChecker is another AST walker. It visits the different nodes of the AST, building up information. Then, in its destructor, it emits various diagnostics.

AST walkers have a few hooks. Besides the per-node visit functions, there’s also “visit declaration before”, “visit declaration after”, “visit expression before”, “visit expression after”, “visit statement before”, and “visit statement after”. These are called, no matter the type of declaration/expression/statement, before/after the specific visit function is called.

Most of the var usage checking happens in these methods. Each declaration node adds the declaration of each variable to a SmallMapVector (a sorted dictionary) of all the variables that are being tracked.

Each declaration is associated with a bit field.

  // Keep track of some information about a variable.
  enum {
    RK_Read        = 1,      ///< Whether it was ever read.
    RK_Written     = 2,      ///< Whether it was ever written or passed inout.

    RK_CaptureList = 4       ///< Var is an entry in a capture list.
  };

Then the expressions are walked. If a variable is referenced, it’s marked as having been read. If it’s assigned to, it’s marked as being written. If it’s used as an inout reference or a closure capture, then it’s marked as read and written (since you don’t know).

Afterwards, you can emit diagnostics:

If a var was never read or written, the declaration can be deleted
If a var was only written once, it can be a let

A few other caveats also apply. e.g. if you do var (x, y, z) = … and only y is mutated, no warning is given to use let for x and z.

Trying to fix the issue

Having understood the basics of how this worked, I thought I’d be able to improve the diagnostic by adding another flag: RK_Initialized.

func test() {
  var x: Int
  x = 1
  print(x)
}

The x = 1 here is initializing the variable, but it’s counted as a write. By separating writes and initialization, the compiler could detect the above case and still suggest a let.

I added RK_Initialized and looked for the correct place to start using it. I figured my change would have 2 parts:

Start the variable as initialized if it’s declared with a value
Change the first written to initialized if it wasn’t already initialized

I started by trying to set the initialized flag on declaration using this code:

func f() {
  let y: Int = 1
  print(y)
}

A look at the AST showed what I should be looking for:

(pattern_binding_decl
  (pattern_typed type='Int'
    (pattern_named type='Int' 'y')
    (type_ident
      (component id='Int' bind=Swift.(file).Int)))
  (call_expr implicit type='Int' location=1658.swift:6:15 range=[1658.swift:6:15 - line:6:15] nothrow arg_labels=_builtinIntegerLiteral:
    (constructor_ref_call_expr implicit type='(_MaxBuiltinIntegerType) -> Int' location=1658.swift:6:15 range=[1658.swift:6:15 - line:6:15] nothrow
      (declref_expr implicit type='(Int.Type) -> (_MaxBuiltinIntegerType) -> Int' location=1658.swift:6:15 range=[1658.swift:6:15 - line:6:15] decl=Swift.(file).Int.init(_builtinIntegerLiteral:) function_ref=single)
      (type_expr implicit type='Int.Type' location=1658.swift:6:15 range=[1658.swift:6:15 - line:6:15] typerepr='Int'))
    (tuple_expr implicit type='(_builtinIntegerLiteral: Int2048)' location=1658.swift:6:15 range=[1658.swift:6:15 - line:6:15] names=_builtinIntegerLiteral
      (integer_literal_expr type='Int2048' location=1658.swift:6:15 range=[1658.swift:6:15 - line:6:15] value=1))))

(var_decl "y" type='Int' interface type='Int' access=private storage_kind=stored)

Variable declarations happen within a pattern binding, which contains 1 or more assignments.

The code in walkToDeclPre actually noted that these were explicitly ignored:

    // Note that we ignore the initialization behavior of PatternBindingDecls,
    // but we do want to walk into them, because we want to see any uses or
    // other things going on in the initializer expressions.

I deleted the comment and found some other code the looked at PatternBindingDecls. I used what I found there to help me mark initialized variables as initialized:

    // Look for the pattern binding declarations
    if (auto *pbd = dyn_cast<PatternBindingDecl>(D)) {
      // Look at each item in the binding. `let (x, y) = (1, 2), z: Int` has
      // 2 items: `(x, y)` and `z`.
      for (PatternBindingEntry pbe : pbd->getPatternList()) {
        // We only care about bindings that have an init since we're
        // specifically trying to mark them as initialized
        if (!pbe.getInit())
          continue;

        // Mark each variable in the entry (`x` and `y`) as initialized
        pbe.getPattern()->forEachVariable([&](VarDecl *VD) {
          VarDecls[VD] |= RK_Initialized;
        });
      }
    }

Next, I needed to track the first write as an initialization. This was easy to hack together. All the reads and writes funnel through a central method. I modified it to mark the first write as an initialization instead:

  void addMark(Decl *D, unsigned Flag) {
    auto *vd = dyn_cast<VarDecl>(D);
    if (!vd) return;

    auto vdi = VarDecls.find(vd);
    if (vdi != VarDecls.end()) {
      if ((Flag & RK_Written) && !(vdi->second & RK_Initialized))
        vdi->second |= (Flag & ~RK_Written) | RK_Initialized;
      else
        vdi->second |= Flag;
    }
  }

This wasn’t very clean, but I wanted to focus on a proof-of-concept first. I’d clean it up later, if it worked.

I quickly discovered that I’d also need to check !(Flag & RK_Read). When a variable is passed as inout to a function, it’s marked as both read and write—since the compiler doesn’t know what the function will do. In those cases, I needed to preserve the RK_Written mark.

Lastly, I needed to update a few other variable declarations to include the initialized flag. E.g., if you have a top-level variable declaration in a file, that declaration was handled here:

    if (auto *TLCD = dyn_cast<TopLevelCodeDecl>(D)) {
      // If this is a TopLevelCodeDecl, scan for global variables
      auto *body = TLCD->getBody();
      for (auto node : body->getElements()) {
        if (node.is<Decl *>()) {
          // Flag all variables in a PatternBindingDecl
          Decl *D = node.get<Decl *>();
          auto *PBD = dyn_cast<PatternBindingDecl>(D);
          if (!PBD) continue;
          for (PatternBindingEntry PBE : PBD->getPatternList()) {
            PBE.getPattern()->forEachVariable([&](VarDecl *VD) {
              VarDecls[VD] = RK_Initialized|RK_Read|RK_Written;
            });
          }
        } else if (node.is<Stmt *>()) {
          // Flag all variables in guard statements
          Stmt *S = node.get<Stmt *>();
          auto *GS = dyn_cast<GuardStmt>(S);
          if (!GS) continue;
          for (StmtConditionElement SCE : GS->getCond()) {
            if (auto pattern = SCE.getPatternOrNull()) {
              pattern->forEachVariable([&](VarDecl *VD) {
                VarDecls[VD] = RK_Read|RK_Initialized;
              });
            }
          }
        }
      }

After verifying that my test case emitted the warning I wanted, I started working through the test suite. Running it turned up quite a few failures, but many of them were legitimate! That was good news: the warning would be useful.

Unfortunately, I discovered some test cases that I couldn’t handle:

var s : String
for _ in [1, 2] {
    s = " "
}
return s

In this case, the write serves as both a write and as initialization—because it’s inside a loop.

To correctly detect this, I’d need to know if any loops existed between the declaration of the variable and the write. I haven’t figured out a good way to do this yet.

Regrouping

To recap, I thought it’d be easier to warn in this case:

var x: Int
x = 1
print(x)

But it turned out to be almost as hard as the case described in the bug:

var x: Int
if Int("abc") == nil {
  x = 1
} else {
  x = 2
}
print(x)

So I think it’s worth considering how to solve the general case before proceeding. I mentioned above that uninitialized variable errors are handled elsewhere:

var x: Int
if Int("abc") == nil {
} else {
  x = 2
}
print(x) // This correctly errors because x wasn't set

It seems to me like those two bits of code should be combined. Both are tracking reads and writes to variables and emitting diagnostics. But since I’m not certain of this, I’ve asked on the forums. Hopefully that will provide some helpful direction for the future.