
First draft of proposal for inline classes #1


Draft
wants to merge 5 commits into main

Conversation


@odersky odersky commented Jun 13, 2023

This is the first draft of the proposal for inline classes and a different representation of generics that avoids boxing.

[Rendered]

@odersky odersky marked this pull request as draft June 13, 2023 10:56
@odersky
Author

odersky commented Jun 13, 2023

I propose to keep this PR open for comments. If the comments are positive enough we give it a SNIP number and try to proceed to a prototype implementation phase. Otherwise we'll close.

This is the first draft of the proposal for inline classes and a different representation of generics that avoids boxing.
@odersky odersky force-pushed the inline-classes-rfc branch from 5be6984 to 8f4c2e8 on June 13, 2023 12:28
@natsukagami

This comment was marked as resolved.

Co-authored-by: Lorenzo Gabriele <[email protected]>
- Some additional complications might arise for GC.
- It would be difficult to implement the techniques if the usable address space needs to be increased significantly beyond 48 bits.


(Moved here to allow inline replies)

I was not sure what would happen when you mix normal classes and inline classes. Here is what I understood; is this correct?

inline class Inner(x: Int)
inline class Outer(y1: Inner, y2: Inner)

Instances of Outer are C-style structs with structure more or less [y1 -> int, y2 -> int], where y1 and y2 are not actually stored but are offsets known at compile time

case class Inner(x: Int)
inline class Outer(y1: Inner, y2: Inner)

Outer: [y1 -> pointer to Inner, y2 -> pointer to Inner]

inline class Inner(x: Int)
case class Outer(y1: Inner, y2: Inner)

Outer: case class Outer(y1: Int, y2: Int)


Actually, I couldn't find a calling convention, so would the first example be [y1.x -> int, y2.x -> int] instead?
(Where again y1.x and y2.x are replaced by an offset at compile time.)

Author


The layout of classes is all as you described. When calling a constructor or function, statically known inline classes of size up to 8 are passed by value, whereas larger inline classes are passed by reference. But that does not affect the field layout.
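As a rough illustration (a sketch only, assuming the size threshold above is in bytes; the lowered signatures in the comments are a guess at a possible ABI, not the proposal's actual one):

inline class Small(x: Int, y: Int)              // 8 bytes: passed by value
inline class Large(a: Long, b: Long, c: Long)   // 24 bytes: passed by reference

def useSmall(s: Small): Int  = s.x + s.y        // lowered roughly to useSmall(x: Int, y: Int)
def useLarge(l: Large): Long = l.a + l.b + l.c  // lowered roughly to useLarge(l: pointer to Large)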


So would you call outer.y1.x, or simply outer.y1?

Author


outer.y1.x.

@j-mie6

j-mie6 commented Jun 13, 2023

@odersky I like the proposal and it's something I've wanted for a while. However, I'm wondering if this really needs to be a SNIP... could it not be a SIP? It's true that there are some low-level memory layout advantages that can realistically only be achieved by Scala Native; however, I think there is scope for this to also benefit Scala as a whole.

One reason I'd like to understand how this would fit in the wider Scala context is that it would be a shame if Scala Native started to diverge syntactically from Scala itself -- part of the benefit of Scala Native for me is how easily I can cross-compile to the JVM if I wish; obviously, some people would rather just always use SN or SJVM instead.

I think a lot of this proposal shares similarities with a few features from GHC. The first is the {-# UNPACK #-} pragma, which you can use on the argument to a constructor to ask GHC to inline the corresponding non-sum type into that position: GHC will automatically "unbox" any data as it goes into that constructor and "box" it again when it leaves, if necessary. The second is {-# LANGUAGE UnboxedTuples #-}, which gives stack/register-resident data that is entirely strict and unboxed -- when passed as an argument to a function, an n-ary unboxed tuple will be unrolled into n additional arguments, or n additional bindings within a function, and returned in n registers/stack locations if required (it must then be unpacked on the caller side). To some extent there is also {-# LANGUAGE UnboxedSums #-}, where sum types can be unboxed as well (I know less about this one; I haven't personally used it myself). The way I see the situation currently, these behaviours could also be implemented in plain ol' Scala as you describe them, with the caveat that the memory layout would not benefit from the same low-level treatment. (Are there things to be learnt from how GHC has made its somewhat disconnected attempts at addressing this issue?)

That said, just because Scala Native can do it better doesn't mean that Scala JVM and Scala JS cannot do it in a "less performant" fashion. I still think it would retain many of the benefits you rightly describe in the proposal. It might require more boxing or unboxing perhaps, but I still think it provides room for more compact, cache-coherent data access, etc.

Glad to hear your thoughts and discuss further 🙂

@odersky
Author

odersky commented Jun 13, 2023

I'd be surprised if one could get the benefits of this proposal on the current JVM. Yes, we can try to extend the value class approach to classes with more than one field (which seems to be more or less what {-# UNPACK #-} does). But value classes are already problematic for possibly losing performance due to unexpected boxing. So it's not clear going further down that road will win anything on average. One can try for sure, but it will be a lot of work, and I don't see anyone having the appetite to do it.

So I think the only way forward is: prototype this on SN. If it's a big win, lobby for the syntax changes to be backported to Scala JVM. As far as I can see, the only syntax change necessary would be to allow inline as a modifier for classes. The semantic changes that come with it are some adjustments to the current restrictions on value classes (i.e. multiple fields are now OK, but type parameters are forbidden). Seen just from the standpoint of Scala/JVM this is of course pointless, since only single-element inline classes can be optimized as value classes. But maybe one can make an argument that we should do this to keep the language the same across platforms. If that argument fails, one could still make inline an annotation which is simply ignored on JVM. I don't like that route, since it is really just a way to smuggle in syntax changes without proper control.
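To make the surface change concrete, here is a sketch of what the adjusted rules would accept and reject (illustrative only):

inline class Complex(re: Double, im: Double)   // OK: more than one (immutable) field
inline class Cell(var count: Int)              // rejected: mutable fields are not allowed for now
inline class Box[T](value: T)                  // rejected: type parameters are forbidden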

On the other hand, if the project is a success on SN, maybe it can influence and speed up Valhalla and we will get equivalent functionality on the JVM at some point?

odersky added 2 commits June 13, 2023 23:13
Otherwise our computations don't work out since inline classes with byte alignment
need 8 consecutive entries in the vtable array, so one of them will hit 000 as an index.
@j-mie6

j-mie6 commented Jun 13, 2023

I'd be surprised if one could get the benefits of this proposal on the current JVM. Yes, we can try to extend the value class approach to classes with more than one field (which seems to be more or less what {-# UNPACK #-} does). But value classes are already problematic for possibly losing performance due to unexpected boxing. So it's not clear going further down that road will win anything on average. One can try for sure, but it will be a lot of work, and I don't see anyone having the appetite to do it.

It would be interesting to see what benefits you can get -- I think a system that is a little more robust than the current "best effort" AnyVal would be good. But you make fair points...

So I think the only way forward is: prototype this on SN. If it's a big win, lobby for the syntax changes to be backported to Scala JVM.

... i.e. this makes sense.

As far as I can see, the only syntax change necessary would be to allow inline as a modifier for classes.

Ideally, though, it would still be valid syntax on the JVM, even if it "did nothing", for portability. Though as @keynmol noted earlier (elsewhere), it seems that the current parser already accepts inline before class, and it's rejected in a later phase.

The semantic changes that come with it are some adjustments to the current restrictions on value classes (i.e. multiple fields are now OK, but type parameters are forbidden). Seen just from the standpoint of Scala/JVM this is of course pointless, since only single-element inline classes can be optimized as value classes.

Well, you could still unpack a multi-arg value class directly into multiple arguments/variables on the JVM, no?
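For instance (a purely hypothetical JVM encoding to illustrate the idea; the mangled names are made up):

inline class Vec2(x: Double, y: Double)

def length(v: Vec2): Double = math.sqrt(v.x * v.x + v.y * v.y)
// could conceivably be emitted as
// def length(v$x: Double, v$y: Double): Double = math.sqrt(v$x * v$x + v$y * v$y)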

But maybe one can make an argument that we should do this to keep the language the same across platforms. If that argument fails, one could still make inline an annotation which is simply ignored on JVM. I don't like that route, since it is really just a way to smuggle in syntax changes without proper control.

Yeah, the annotation route seems a little hacky to me; I'd rather have syntax that "does nothing" than that, I think: inline actually changes the program deterministically, and @inline does not, so it would be good to be consistent on that front.

On the other hand, if the project is a success on SN, maybe it can influence and speed up Valhalla and we will get equivalent functionality on the JVM at some point?

Yeah, this is a great point! I see real value in that, and if a SN prototype is the way to get that, then cool!

@kyouko-taiga

Inline classes determine the alignment of their instances depending on their size. A typical scheme would align at 8-byte boundaries for sizes >= 8, and round up to the next power of two for sizes lower than 8.

You can align the class's storage by the alignment of its first field and add padding as you go. That way you can align a class made of n Bytes at 1. The padding can be seen as a drawback, but in practice it is relatively easy to eliminate, either because the layout algorithm is predictable or by allowing the compiler to rearrange the fields as it sees fit.
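A minimal sketch of such a layout computation, assuming fields are laid out in declaration order (reordering left out); the (size, align) encoding and roundUp are just for illustration:

def roundUp(n: Int, align: Int): Int = (n + align - 1) / align * align

// fields are (size, align) pairs; returns the field offsets and the total size
def layout(fields: List[(Int, Int)]): (List[Int], Int) =
  fields.foldLeft((List.empty[Int], 0)) {
    case ((offsets, cur), (size, align)) =>
      val off = roundUp(cur, align)
      (offsets :+ off, off + size)
  }

// layout(List((1, 1), (1, 1), (1, 1)))  ==> offsets 0, 1, 2; size 3 (alignable at 1)
// layout(List((1, 1), (4, 4)))          ==> offsets 0, 4;    size 8 (3 bytes of padding)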

If there is a hierarchy of inline classes (which must all be defined in the same compilation unit), all inline classes are given the same size, which is the size of the root inline class.

So IIUC, basically all instances in a class hierarchy have the same layout; it's just that for parent classes the unused members are considered padding. If that's the case, I suspect there's a way to avoid bloating the size of a parent class instance by laying out the fields of child classes at the end of the inline storage.

Inline classes are implicitly sealed. They cannot have inner classes. If they have child classes, these must be inline classes as well. For now, we also require that they only have immutable fields and that they don't have type parameters or type members.

I do not understand the reason why inline classes could not be parameterized. One can compute the layout of the inline storage during monomorphization, so there should be no problem writing such a class:

inline class Pair[A, B](first: A, second: B)
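Under monomorphization, each instantiation actually used in the program would get its own concrete layout, e.g. (the offsets below are an assumed illustration, not part of the proposal):

val p1 = Pair(1, 2)          // Pair[Int, Int]:   first at 0, second at 4; size 8
val p2 = Pair(1: Byte, 2L)   // Pair[Byte, Long]: first at 0, second at 8; size 16 (7 bytes of padding)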

The main difficulty is to generate the ClassInstance info for such a type. IIUC, this information roughly matches what Swift (and Hylo) calls "witness tables". Perhaps I can shed some light on this concept so that we can figure out if they can be adapted to this proposal.

Boxing in Swift consists of creating an existential type. Just like in CS literature, an existential type "wraps" a witness with an interface. At runtime, existential types are represented as so-called "existential containers", which have a layout of the form:

┌─────────┬─────┬─────┐
│ payload │ vwt │ pwt │
└─────────┴─────┴─────┘

payload is some raw storage that can contain up to 3 words and is used to store the witness (i.e., the boxed instance). vwt is a pointer to a value witness table and pwt is a pointer to a protocol witness table.

[Note: The reason for a 3-word size payload in Swift is historical. In practice I find it to be too small for many witnesses and therefore we usually end up wasting 2 words because we're only storing a pointer to out-of-line storage.]

A value witness table tells the runtime about the "value behavior" of the witness. Specifically, it contains the size of the witness and pointers to methods for copying it, moving it, and destroying it.

The protocol witness table tells the runtime how to apply dynamic dispatch. Protocols in Swift are best understood as traits, and are the primary tool for writing generic code. The protocols to which a witness conforms describe the interface of the existential type in which it's been wrapped. At runtime, the protocol witness table is used to look up the implementation of each trait requirement. You may want to have a look at this paper for some formal description of a protocol witness table.

Note that fields can be represented using getters and setters that hardcode the offset, so access to a property can be represented like a method call. Better performance can be achieved using subscripts, but that is perhaps an orthogonal feature that I'm happy to describe in another post.

Say you have this generic program:

protocol P {
  func foo(_ n: Int) -> Bool
}

struct A: P {
  let x: Int
  func foo(_ n: Int) -> Bool { n == x }
}

func bar(_ s: any P) -> Bool { s.foo(1) }

func ham<T: P>(_ s: T) -> Bool { s.foo(1) }

func main() {
  _ = bar(A(x: 1))
  _ = ham(A(x: 1))
}

The call to bar involves boxing an instance of A into an existential container. The size of A's inline storage is 1 word (the size of a single Int), so it fits in payload. The value and protocol witness tables are created at compile time, so we can store pointers to static data structures in vwt and pwt, respectively. Nothing should be surprising about the value witness table, so let's focus on the other one.

A conforms to a single protocol, so the protocol witness table must contain a single entry, keyed by some symbol identifying A's conformance to P. The value for that key is another table mapping the single requirement of P to a thin adapter that can extract the value of the witness from the container and forward it to the method implementing P.foo in A. When s.foo(1) is called in bar, the runtime simply reads the protocol witness table to dispatch the call.

If the call to ham is monomorphized, everything is dispatched statically (and most likely inlined away). If the call is existentialized, we do something very similar to what we did to box bar's argument, but instead of wrapping the payload along with its witness tables, we pass them separately (i.e., the tables are additional arguments to the existentialized form of ham).

Should A be generic, in principle we can still create witness tables on demand statically because the compiler can see when it is about to get boxed (or passed to an existentialized function). The only complication is that we may have to read from type substitution tables to properly compute the size of the witness.

Of course, witness tables can be reused for every boxed instance of the same type so in practice they are not very expensive.
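To tie this back to the proposal, here is a rough Scala rendering of the container shape described above (all names and signatures are assumptions for illustration, not Swift's actual runtime types or the proposal's ClassInstance format):

// plain functions stand in for function pointers here
final case class ValueWitnessTable(size: Int, copy: Long => Long, destroy: Long => Unit)
final case class ProtocolWitnessTable(requirements: Map[String, (Long, Long) => Long])

final case class ExistentialContainer(
  payload: Array[Byte],       // up to 3 words of inline storage for the witness, or a pointer to out-of-line storage
  vwt: ValueWitnessTable,     // "value behavior": size plus copy/move/destroy operations
  pwt: ProtocolWitnessTable   // maps each protocol requirement to a thin adapter used for dynamic dispatch
)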

@kyouko-taiga

From here:

Given an inline class reference, we need to be able to access not just its fields but also the class info and vtable of the inline class. Since inline classes don't come with a header the only way to store this info is in the reference itself.

Why not store a header with all the information we need along with the boxed value? Then we would need only a single bit in the reference to tell whether it's a regular reference or one to a boxed struct.

This approach would also work on a 32-bit machine, since we'd need to tag pointers with only a single bit. References on a 32-bit machine are typically aligned at 4 bytes, allowing us to use the two least significant bits.
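A minimal sketch of the single-bit tagging (the encoding below is an assumption, just to make the idea concrete):

val BoxTag = 1L                                     // references are at least 4-byte aligned, so bit 0 is free

def tagBoxed(addr: Long): Long  = addr | BoxTag     // mark a reference to a boxed value (header + payload)
def isBoxed(ref: Long): Boolean = (ref & BoxTag) != 0L
def untag(ref: Long): Long      = ref & ~BoxTag     // recover the real address before dereferencing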

From here:

One new situation with inline classes is that we can now have inline class references that point to the middle of objects. These references can only live on the stack, not in the heap.

I'd like to propose going with (mutable) value semantics to address this issue. If inline classes (or structs) behave with value semantics, then we no longer need to care about references to the middle of a potentially stack-allocated object. That's because any "use" of a field would result in a copy.
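For example (a sketch of the proposed behavior, reusing the Inner/Outer classes from earlier in the thread):

inline class Inner(x: Int)
inline class Outer(y1: Inner, y2: Inner)

def first(o: Outer): Inner =
  o.y1   // under value semantics this yields a copy of the Inner value,
         // not a reference into the middle of o's (possibly stack-allocated) storage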

This approach would make inline classes behave differently from regular Scala classes (which would be one more reason to call them structs) but I also suspect it would make it simpler to fit them into the language and lift the immutability restriction.

There is precedent for making value semantics and reference semantics co-exist. C# and Swift are examples. I have a lot of experience with the latter.

@ekrich
Member

ekrich commented Mar 22, 2024

This approach would also work on a 32-bit machine, since we'd need to tag pointers with only a single bit. References on a 32-bit machine are typically aligned at 4 bytes, allowing us to use the two least significant bits.

Yes, it would be really nice to support 32-bit, as Scala Native is quite portable and useful for 32-bit platforms.
