internal/encoding: binary files are incorrectly treated as UTF-8

### What version of CUE are you using (`cue version`)?

<pre>
$ cue version
cue version (devel)

go version go1.23.5
      -buildmode exe
       -compiler gc
  DefaultGODEBUG asynctimerchan=1,gotypesalias=0,httpservecontentkeepheaders=1,tls3des=1,tlskyber=0,x509keypairleaf=0,x509negativeserial=1
     CGO_ENABLED 0
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
cue.lang.version v0.12.0
</pre>

### Does this issue reproduce with the latest stable release?
Yes, that is where it was found.

### What did you do?
I was working with some ELF files and tried to use CUE's embed feature to process them. To embed them I used `type=binary`.

As I do not want to include large binaries, here is a minimal reproducer (which unfortunately cannot be packed into txtar format):
```
$ printf "\xf0" > invalid.bin
$ hexdump invalid.bin
0000000 00f0                                   
0000001
$ cat repro.cue
@extern(embed)

package repro
import (
	"list"
	"strings"
)

want: '\xf0'
got: '\xef\xbf\xbd'

invalid: bytes @embed(file=invalid.bin, type=binary)
length_check: len(invalid) & 1
content_check: invalid & want

invalid_length_check: len(invalid) & 3
invalid_content_check: invalid & got

bytelist: [for i in list.Range(0, len(invalid), 1) {strings.ByteAt(invalid, i)}]
$ cue eval
content_check: conflicting values '\xf0' and '�':
    
    ./repro.cue:9:7
    ./repro.cue:14:16
    ./repro.cue:14:26
length_check: conflicting values 1 and 3:
    ./repro.cue:13:15
    ./repro.cue:13:30

```

### What did you expect to see?
I was expecting CUE to give me the file's contents verbatim.

### What did you see instead?
As the example above shows, CUE's `@embed()` returns three bytes instead of just one.

Those three bytes are the Unicode replacement character (U+FFFD) encoded in UTF-8.

So CUE appears to pass the binary file through a UTF-8 decoder before handing the value to the evaluator.

I was going through the relevant function and found [an old comment](https://github.com/cue-lang/cue/blob/master/internal/encoding/encoding.go#L191) foreseeing this problem:
> For now we assume that all encodings require UTF-8. This will not be the case for some binary protocols. We need to exempt those explicitly here once we introduce them.

My attempt at fixing this issue does exactly that: instead of reading the bytes out of the UTF-8 reader in l.265, I pass the raw file reader `srcr` to `ReadAll()`. This resolves the issue, at least in my case, so I created a small PR: #3740

I also noticed that the `Decoder` is pretty much untested. I can offer to write some Go tests if they are wanted.
