Indent handling and unterminated multiline string literals

This is a design question, affecting how we might handle multiline string literals in compilation.

When the compiler detects code that starts a multiline string literal (`"""`), it parses the remainder of the file for the terminator. If it can't find a valid terminator, it will consider the remainder of the file to be part of the multiline string literal for the purposes of error generation.

The underlying reason for this is a desire to support multi-line string literals that have text in the first column. For example, in code limited to 80 columns, this allows text blocks that use the full 80-column width. The indent is determined by the _terminator_, so the terminator must also be in the first column, as in the following code:

```
  fn Foo() {
    var x: auto = """
My long text
Goes on and on
""";
  }
```
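To make the terminator rule concrete, here is a minimal Python sketch of how the terminator's indent could be removed from the string's content. The helper name `strip_terminator_indent`, the treatment of blank lines, and the error raised for under-indented lines are assumptions made for illustration; this is not the toolchain's actual code.

```python
TERMINATOR = '"""'


def strip_terminator_indent(body_lines, terminator_line):
    # The indent is whatever precedes the closing `"""` on its line
    # (assumed here to be only spaces or tabs).
    indent = terminator_line[: terminator_line.index(TERMINATOR)]
    result = []
    for line in body_lines:
        if not line.strip():
            # Assumption: whitespace-only lines need not carry the indent.
            result.append("")
        elif line.startswith(indent):
            result.append(line[len(indent):])
        else:
            raise ValueError(f"indented less than the terminator: {line!r}")
    return result


# Terminator in the first column: the text keeps the full line width.
print(strip_terminator_indent(["My long text", "Goes on and on"], '""";'))
# Indented terminator: its indent is removed from every content line.
print(strip_terminator_indent(["    My long text"], '    """;'))
```

With a first-column terminator the indent is empty, so nothing is stripped, which is why the text in the example above can start in the first column.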

The compiler issue is what happens if this terminator is missing, as in the following code:

```
  fn Foo() {
    var x: auto = """
My long text
Goes on and on
"";
  }

  fn Bar() {
    ...
  }
```

In the current implementation, `Bar()` is part of the multiline string literal because no terminator was found. This may yield undesirable compiler errors; diagnostics would be better if `Bar` were found. For example, if an `api` file is _mostly_ correct, it would be nice to be able to generate good errors for the `impl`, whereas the current approach might result in "name not found" errors for names that are consumed by the multiline string literal.
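The behavior described above can be modeled with a small Python sketch. It assumes, for illustration only, that a terminator is a line whose first non-whitespace text is `"""`; the function name `literal_extent` is hypothetical and the real lexer is more involved.

```python
INTRODUCER = '"""'


def literal_extent(lines, intro_index):
    # Returns (index of the last line consumed, whether a terminator was found).
    for i in range(intro_index + 1, len(lines)):
        if lines[i].lstrip().startswith(INTRODUCER):
            return i, True
    # No terminator: the rest of the file is treated as part of the literal.
    return len(lines) - 1, False


source = [
    "  fn Foo() {",
    '    var x: auto = """',
    "My long text",
    "Goes on and on",
    '"";',
    "  }",
    "",
    "  fn Bar() {",
    "    ...",
    "  }",
]
end, terminated = literal_extent(source, 1)
# Function declarations that end up inside the literal.
swallowed = [line for line in source[2 : end + 1] if "fn " in line]
print(end, terminated, swallowed)
```

In this model the declaration of `Bar` lands inside the literal, so later references to it cannot be resolved.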

Noting that first-column strings are expected to be desirable but perhaps less common than other multi-line string literal styles, there seem to be a few options for improving the results. If behavior is changed per the first two options offered below, a multi-line string missing a terminator may still consume substantial code; however, idiomatic code will typically have the string at an indent, and so code after that point could still be compiled and generate its own compiler errors. In the preceding example, that would result in `Bar` being seen, improving later compiler errors.

Options are:

1. Enforce that multi-line string indentation must be at least that of the line of the multi-line string _introducer_. For example:

    ```
    fn Foo() {
      var x: auto = """
      Minimum string indent is at `var`.
      """;
    }

    fn Bar() {
      var x: auto = DoSomething(
        """
        Minimum string indent is at `"""`.
        """);
    }

    fn Baz() {
      var x: auto = DoSomething(
    """
    80-column achieved by putting the introducer in the first column.
    """);
    }
    ```

2. Enforce that indentation may only increase in code blocks, with an exception for the `"""` introducer specifically at the first column.

    ```
    fn Foo() {
      var x: auto = """
      Minimum string indent is at `var`.
      """;
    }

    fn Bar() {
      var x: auto = DoSomething(
      """
      Minimum string indent is still at `var`.
      """);
    }

    fn Baz() {
      var x: auto = DoSomething(
    """
    80-column achieved by putting the introducer in the first column.
    """);
    }
    ```

3. Change nothing.
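As a rough illustration of option (1), the following Python sketch flags lines of a literal that are indented less than the line containing the introducer. The function names and the choice to count only leading spaces are assumptions for the sketch, not a proposed implementation.

```python
def leading_spaces(line):
    return len(line) - len(line.lstrip(" "))


def lines_below_introducer_indent(lines, intro_index, term_index):
    # Option (1): content and terminator must be indented at least as far
    # as the line that holds the introducer. Blank lines are ignored.
    min_indent = leading_spaces(lines[intro_index])
    return [
        i
        for i in range(intro_index + 1, term_index + 1)
        if lines[i].strip() and leading_spaces(lines[i]) < min_indent
    ]


first_column_text = [
    "  fn Foo() {",
    '    var x: auto = """',
    "My long text",
    "Goes on and on",
    '""";',
    "  }",
]
first_column_introducer = [
    "fn Baz() {",
    "  var x: auto = DoSomething(",
    '"""',
    "80-column achieved by putting the introducer in the first column.",
    '""");',
    "}",
]
print(lines_below_introducer_indent(first_column_text, 1, 4))
print(lines_below_introducer_indent(first_column_introducer, 2, 4))
```

The first input has its introducer on an indented line but its text in the first column, so every line of the literal is flagged. The second moves the introducer itself to the first column, so nothing is flagged.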

@zygoloid notes that option (2) in particular may help yield better compiler errors in general, for example by catching missing braces based on indentation. (I will note that I don't think we'd want to eliminate braces; instead, braces would work in combination with indentation enforcement to detect this kind of error better.)

Note that if there is no desire to change string indent handling, we may still have ways to produce better errors. @zygoloid brought up trying to lex the file from the end as part of recovery, which may be expensive to build. We could also consider backing up the string literal error range to the last `"""`, then guessing where to end it based on indent (which might look similar to option (1) in implementation, and might be fairly straightforward). One thing we are careful about, though, is error consumption, because under-consuming text for a string literal can lead to O(n^2) behavior, with parsing and re-parsing the same text on string-literal look-alikes. But as long as an error is produced, the precise form of the errors seems open to tuning without affecting valid code.
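The indent-based guess could look something like the Python sketch below. It assumes the error range ends just before the first non-blank line that is indented less than the introducer's line. The name `guess_literal_end` is hypothetical, and the sketch does not address the O(n^2) re-parsing concern.

```python
def leading_spaces(line):
    return len(line) - len(line.lstrip(" "))


def guess_literal_end(lines, intro_index):
    # Recovery heuristic: extend the error range while lines stay at or
    # beyond the indent of the introducer's line; stop at the first dedent.
    min_indent = leading_spaces(lines[intro_index])
    end = intro_index
    for i in range(intro_index + 1, len(lines)):
        if lines[i].strip() and leading_spaces(lines[i]) < min_indent:
            break
        end = i
    return end


indented = [
    "fn Foo() {",
    '  var x: auto = """',
    "    My long text",
    "    Goes on and on",
    '    "";',
    "}",
    "",
    "fn Bar() {",
    "}",
]
first_column = [
    "  fn Foo() {",
    '    var x: auto = """',
    "My long text",
    "Goes on and on",
    '"";',
    "  }",
]
print(guess_literal_end(indented, 1))
print(guess_literal_end(first_column, 1))
```

For the indented string, the guess stops before the closing brace of `Foo`, so `Bar` stays visible to later phases. For first-column text, the guess ends at the introducer line itself, so the text would be lexed as code; that under-consumes, which is the kind of case where the re-parsing cost mentioned above would need care.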

Indent handling and unterminated multiline string literals #1121
