Skip to content

Possible race condition in proxy object creation #9862

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
filipnavara opened this issue Feb 28, 2025 · 11 comments · May be fixed by #9921
Open

Possible race condition in proxy object creation #9862

filipnavara opened this issue Feb 28, 2025 · 11 comments · May be fixed by #9921
Assignees
Labels
Area: App Runtime Issues in `libmonodroid.so`.
Milestone

Comments

@filipnavara
Copy link
Member

filipnavara commented Feb 28, 2025

Android framework version

net8.0-android

Affected platform version

VS 2022 17.12.4

Description

We spent the last few weeks trying to track down a mysterious bug where two instances of Android.App.Application .NET proxy objects are created. So far we were not able to reproduce the issue locally but it happens quite consistently for thousands of our customers.

Finally we were able to get stack traces from the constructor where the second instance is created.

The usual first instance gets created with this stack trace:

AndroidApp::.ctor
   at System.Environment.get_StackTrace()
   at MailClient.Mobile.Droid.AndroidApp..ctor(IntPtr handle, JniHandleOwnership ownership)
   at System.Reflection.RuntimeConstructorInfo.InternalInvoke(RuntimeConstructorInfo , Object , IntPtr* , Exception& )
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Constructor(Object obj, IntPtr* args)
   at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object , Span`1 , BindingFlags )
   at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object , BindingFlags , Binder , Object[] , CultureInfo )
   at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags , Binder , Object[] , CultureInfo )
   at System.Reflection.ConstructorInfo.Invoke(Object[] parameters)
   at Java.Interop.TypeManager.CreateProxy(Type , IntPtr , JniHandleOwnership )
   at Java.Interop.TypeManager.CreateInstance(IntPtr , JniHandleOwnership , Type )
   at Java.Lang.Object.GetObject(IntPtr , JniHandleOwnership , Type )
   at Java.Lang.Object._GetObject[Application](IntPtr , JniHandleOwnership )
   at Java.Lang.Object.GetObject[Application](IntPtr handle, JniHandleOwnership transfer)
   at Java.Lang.Object.GetObject[Application](IntPtr jnienv, IntPtr handle, JniHandleOwnership transfer)
   at Android.App.Application.n_OnCreate(IntPtr jnienv, IntPtr native__this)
   at Android.Runtime.JNINativeWrapper.Wrap_JniMarshal_PP_V(_JniMarshal_PP_V callback, IntPtr jnienv, IntPtr klazz)
-- Java --
	at crc647fae2f69c19dcd0d.AndroidApp.n_onCreate(Native Method)
	at crc647fae2f69c19dcd0d.AndroidApp.onCreate(AndroidApp.java:25)
	at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:1316)
	at android.app.ActivityThread.handleBindApplication(ActivityThread.java:7848)
	at android.app.ActivityThread.-$$Nest$mhandleBindApplication(Unknown Source:0)
	at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2486)
	at android.os.Handler.dispatchMessage(Handler.java:106)
	at android.os.Looper.loopOnce(Looper.java:230)
	at android.os.Looper.loop(Looper.java:319)
	at android.app.ActivityThread.main(ActivityThread.java:9063)
	at java.lang.reflect.Method.invoke(Native Method)
	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:588)
	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1103)

The second instance comes from a worker (subclass of AndroidX.Work.Worker):

   at System.Environment.get_StackTrace()
   at MailClient.Mobile.Droid.AndroidApp..ctor(IntPtr handle, JniHandleOwnership ownership)
   at System.Reflection.RuntimeConstructorInfo.InternalInvoke(RuntimeConstructorInfo , Object , IntPtr* , Exception& )
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Constructor(Object obj, IntPtr* args)
   at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object , Span`1 , BindingFlags )
   at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object , BindingFlags , Binder , Object[] , CultureInfo )
   at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags , Binder , Object[] , CultureInfo )
   at System.Reflection.ConstructorInfo.Invoke(Object[] parameters)
   at Java.Interop.TypeManager.CreateProxy(Type , IntPtr , JniHandleOwnership )
   at Java.Interop.TypeManager.CreateInstance(IntPtr , JniHandleOwnership , Type )
   at Java.Lang.Object.GetObject(IntPtr , JniHandleOwnership , Type )
   at Android.Runtime.JNIEnv.<>c.<CreateNativeArrayElementToManaged>b__70_9(Type type, IntPtr source, Int32 index)
   at Android.Runtime.JNIEnv.GetObjectArray(IntPtr , Type[] )
   at Java.Interop.TypeManager.n_Activate(IntPtr jnienv, IntPtr jclass, IntPtr typename_ptr, IntPtr signature_ptr, IntPtr jobject, IntPtr parameters_ptr)
   at Android.Runtime.JNINativeWrapper.Wrap_JniMarshal_PPLLLL_V(_JniMarshal_PPLLLL_V callback, IntPtr jnienv, IntPtr klazz, IntPtr p0, IntPtr p1, IntPtr p2, IntPtr p3)
-- Java --
	at mono.android.TypeManager.n_activate(Native Method)
	at mono.android.TypeManager.Activate(TypeManager.java:7)
	at crc647fae2f69c19dcd0d.SyncWorker.<init>(SyncWorker.java:23)
	at java.lang.reflect.Constructor.newInstance0(Native Method)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:343)
	at androidx.work.WorkerFactory.createWorkerWithDefaultFallback(WorkerFactory.java:95)
	at androidx.work.impl.WorkerWrapper.runWorker(WorkerWrapper.java:243)
	at androidx.work.impl.WorkerWrapper.run(WorkerWrapper.java:144)
	at androidx.work.impl.utils.SerialExecutorImpl$Task.run(SerialExecutorImpl.java:96)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
	at java.lang.Thread.run(Thread.java:1012)

Notably, the first instance exists and it's not collected on the .NET side (pinned through a static variable). This suggests that there's a race condition that could lead to creation of two proxy objects.

Steps to Reproduce

#9862 (comment)

Did you find any workaround?

No response

Relevant log output

@filipnavara filipnavara added the needs-triage Issues that need to be assigned. label Feb 28, 2025
@filipnavara filipnavara changed the title Possible race condition in peer object creation Possible race condition in proxy object creation Feb 28, 2025
@filipnavara
Copy link
Member Author

filipnavara commented Feb 28, 2025

If it's expected that a second proxy object can be instantiated (*), then feel free to delegate the issue to dotnet/maui because they depend on the behaviour that this doesn't happen by saving the instance to a static variable:
https://github.com/dotnet/maui/blob/196a1d651c2d0abd60054d6a95866e9b2a14f66f/src/Core/src/Platform/Android/MauiApplication.cs#L19-L22

This in turn is accessed all over that place. With the example stack traces above, once the Application.OnCreate virtual method is called the instance saved to the static variable in constructor has been overwritten.

(*) I very much doubt it's intentional; if there were data associated with the object it would create a lot of issues.

@jpobst jpobst added Area: App Runtime Issues in `libmonodroid.so`. and removed needs-triage Issues that need to be assigned. labels Feb 28, 2025
@jpobst jpobst assigned jonpryor and jonathanpeppers and unassigned jpobst Feb 28, 2025
@jonathanpeppers
Copy link
Member

Do you have a small example of your AndroidX.Work.Worker and where do you start it?

Typical operation would be:

  1. Application is created
  2. ContentProvider starts the runtime
  3. Worker starts

But I wonder if no. 3 can start happening in parallel with 2.

@jonathanpeppers jonathanpeppers added this to the .NET 10 milestone Feb 28, 2025
@filipnavara
Copy link
Member Author

Do you have a small example of your AndroidX.Work.Worker and where do you start it?

The source code for the worker itself is likely not interesting - https://gist.github.com/filipnavara/c8e196bb1bd4471a00174708366ef6ff.

What is likely more interesting is how the worker is scheduled:

var syncWorkRequest = (PeriodicWorkRequest.Builder.From<SyncWorker>(TimeSpan.FromMinutes(15))
  .SetConstraints(new Constraints.Builder().SetRequiredNetworkType(NetworkType.Connected).Build())
  .SetInitialDelay(15, TimeUnit.Minutes) as PeriodicWorkRequest.Builder).Build();
WorkManager.GetInstance(context).EnqueueUniquePeriodicWork("sync", ExistingPeriodicWorkPolicy.Update, syncWorkRequest);

The traces we get from customers suggest that it's often happening after clicking on a notification. It's likely that the "required network type" constraint may play a role because some devices tend to do a "soft sleep" mode for network connections when locked. In this scenario it's more likely that the worker will get triggered alongside creation of the activity to handle the notification.

That said, it's just a theory. We don't have repro on any of our own devices and we have tried for weeks. Weirdly it happens to a large percent of customers in the wild regularly enough.

I don't see any locking in Java.Lang.Object.GetObject that would prevent two calls getting past the Peek* phase and creating separate instances. Launching the worker and the activity itself definitely happens from separate threads.

@filipnavara
Copy link
Member Author

filipnavara commented Mar 3, 2025

We managed to reproduce it on our devices for the first time 🎊

The trick is to mark the application as "Restricted" in the Battery settings. This prevents it from running in the background. We then run it and put it to background. Then we send a notification which wakes up the app and sets up the worker to run every 15 minutes. However, the app is forbidden to run in background, so it doesn't immediately trigger. After 15 minutes we start the app from notification and the issue reproduces because the system starts the worker alongside the app initialization.

This scenario can likely be simplified. The notifications are not essential part, the battery saving setting is.

Turns out that the crash logged from my device is 2 days old, so still no reliable repro 😔

@filipnavara
Copy link
Member Author

filipnavara commented Mar 3, 2025

I have a somewhat reliable repro:
AndroidWorker.zip

  • Run the app from Visual Studio with attached debugger
  • Press the "Run worker" button
  • Kill the app (eg. stop in VS)
  • Wait a minute
  • Place a breakpoint in TestContentProvider.OnCreate (on the return true line) and Worker..ctor
  • Run the app

Both breakpoints should be hit, first TestContentProvider.OnCreate, then Worker..ctor. Once you are in Worker..ctor observe in the watch window that MainApplication.constructorCallCount == 2 (ie. the constructor was called twice). Sometimes it happens even on the first breakpoint.

It's not the most reliable repro ever but I managed to hit it on two separate devices and Windows Subsystem for Android. Restarting the app when on the Worker..ctor breakpoint usually keeps the worker scheduled and the next run will exhibit the bug too.

@jonpryor
Copy link
Member

jonpryor commented Mar 4, 2025

I'm not sure that this is "reasonably" fixable.

Among the many guidelines around writing thread safe code are two of interest here:

  1. The lock should be held for as short as time as possible, and
  2. Code within the lock should not call "arbitrary" code.

See e.g. The COM static store, part 3: Avoiding creation of an expensive temporary when setting a singleton:

be extremely mindful about what you do while holding a lock. Don’t call out to foreign code while holding a lock.

Attempting to throw a lock around TypeManager.CreateInstance() immediately violates both of those guidelines:

  • It's a lot of code to protect by a lock, and
  • It invokes arbitrary code! TypeManager.CreateInstance() calls TypeManager.CreateProxy() which looks up and invokes the T(IntPtr, JniHandleOwnership) constructor, which is user-written code and can thus do anything, including calls to Object.GetObject<T>(). (Hello deadlock?)

I do not believe it is possible to prevent the creation of two AndroidApp instances, not without introducing the possibility of deadlock. Even in the "simpler" thread-safe The COM static store, part 2: Race conditions in setting a singleton, part of the approach involves the possibility of creating two of a thing, knowing that you'd throw one away. ("This algorithm assumes that creating a Thing and throwing it away is relatively harmless.")

Given that this scenario is difficult to address, what underlying issue are you trying to fix/workaround? What happens when two Application instances are created that made you care about this issue in the first place? Maybe we can improve or address that?

@filipnavara
Copy link
Member Author

Given that this scenario is difficult to address, what underlying issue are you trying to fix/workaround?

MAUI uses the constructor of MauiApplication to save the instance in a static variable:
https://github.com/dotnet/maui/blob/3d43ae808015338c8da7d9a5fe4cc36b74a3ac8f/src/Core/src/Platform/Android/MauiApplication.cs#L18-L22

The problem we hit is that this stored instance is a different from the one that the Application.OnCreate method is called on. This eventually results in a large chain of failures in the code where no windows are created for the app here:
https://github.com/dotnet/maui/blob/3d43ae808015338c8da7d9a5fe4cc36b74a3ac8f/src/Core/src/Platform/Android/MauiAppCompatActivity.cs#L26-L29 (IPlatformApplication.Current points to a non-null instance with the member Application == null, it's not the instance where OnCreate was called).

We currently workaround it by updating MAUI's IPlatformApplication.Current in OurApplication.OnCreate to point to this instance. However, it took us weeks to diagnose how this happens and it was crashing the app for a high percentage of our customers at one point or another (around 25% - 50% based on a limited sample of few thousand people).

@filipnavara
Copy link
Member Author

filipnavara commented Mar 4, 2025

Additionally, is there a guarantee that only one of those instances is used for resolving the proxy calls even if more than one is created? If that was not guaranteed then any local state in the C# object would be unusable without the guarantee.

How will the GC bridge work with two instances representing the same Java object? (unless one gets thrown away in some deterministic manner)

@jonpryor
Copy link
Member

jonpryor commented Mar 6, 2025

@filipnavara wrote:

The problem we hit is that this stored instance is a different from the one that the Application.OnCreate method is called on.

I wonder if a suitable workaround would be to set MauiApplication.Current from MauiApplication.OnCreate() instead of the constructor:

partial class MauiApplication {
    public override void OnCreate()
    {
        Current = this;
        // …
    }
}

This assumes that only one thread will be calling activity.onCreate().

@filipnavara
Copy link
Member Author

I wonder if a suitable workaround would be to set MauiApplication.Current from MauiApplication.OnCreate() instead of the constructor:

That's our workaround for the time being. If it's expected scenario to happen then it should definitely be fixed in upstream dotnet/maui code.

@jonpryor
Copy link
Member

jonpryor commented Mar 6, 2025

@filipnavara asked:

How will the GC bridge work with two instances representing the same Java object? (unless one gets thrown away in some deterministic manner)

One gets thrown away in a plausibly deterministic manner:

  • result.SetJniManagedPeerState (JniManagedPeerStates.Replaceable | JniManagedPeerStates.Activatable);
  • bool ShouldReplaceMapping (WeakReference<IJavaPeerable> current, JniObjectReference reference, out IJavaPeerable? target)
    {
    target = null;
    if (current == null)
    return true;
    // Target has been GC'd; see also FIXME, above, in finalizer
    if (!current.TryGetTarget (out target) || target == null)
    return true;
    // It's possible that the instance was GC'd, but the finalizer
    // hasn't executed yet, so the `instances` entry is stale.
    if (!target.PeerReference.IsValid)
    return true;
    if (!JniEnvironment.Types.IsSameObject (target.PeerReference, reference))
    return false;
    // JNIEnv.NewObject/JNIEnv.CreateInstance() compatibility.
    // When two MCW's are created for one Java instance [0],
    // we want the 2nd MCW to replace the 1st, as the 2nd is
    // the one the dev created; the 1st is an implicit intermediary.
    //
    // [0]: If Java ctor invokes overridden virtual method, we'll
    // transition into managed code w/o a registered instance, and
    // thus will create an "intermediary" via
    // (IntPtr, JniHandleOwnership) .ctor.
    if ((target.JniManagedPeerState & JniManagedPeerStates.Replaceable) == JniManagedPeerStates.Replaceable)
    return true;
    return false;
    }
  • if (ShouldReplaceMapping (wref!, reference, out target)) {
    found = true;
    targets [i] = IdentityHashTargets.CreateWeakReference (value);
    break;
    }
    if (JniEnvironment.Types.IsSameObject (value.PeerReference, target!.PeerReference)) {
    found = true;
    if (Logger.LogGlobalRef) {
    Logger.Log (LogLevel.Info, "monodroid-gref", FormattableString.Invariant (
    $"warning: not replacing previous registered handle {target.PeerReference} with handle {reference} for key_handle 0x{hash:x}"));
    }
    }

Generally, the first one "wins", unless the first one is "Replaceable", in which case the second one wins. This is generally encountered with Java-side activation and Java constructors calling virtual methods which are overridden in managed code.

The case here is that the "new" value is also "Replaceable"; we're replacing a "Replaceable" proxy with another "Replaceable" proxy!

We could update AndroidValueManager.AddPeer()/etc. so that we only replace a "Replaceable" value with a non- "Replaceable" value. This would preserve "first one wins" semantics.

jonpryor added a commit to dotnet/java-interop that referenced this issue Mar 14, 2025
Context: 3043d89
Context: dotnet/android#9862
Context: dotnet/android#9862 (comment)

In dotnet/android#9862, there is an observed "race condition" around
`Android.App.Application` subclass creation.  *Two* instances of
`AndroidApp` were created, one from the "normal" app startup:

	at crc647fae2f69c19dcd0d.AndroidApp.n_onCreate(Native Method)
	at crc647fae2f69c19dcd0d.AndroidApp.onCreate(AndroidApp.java:25)
	at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:1316)

and another from an `androidx.work.WorkerFactory`:

	at mono.android.TypeManager.n_activate(Native Method)
	at mono.android.TypeManager.Activate(TypeManager.java:7)
	at crc647fae2f69c19dcd0d.SyncWorker.<init>(SyncWorker.java:23)
	at java.lang.reflect.Constructor.newInstance0(Native Method)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:343)
	at androidx.work.WorkerFactory.createWorkerWithDefaultFallback(WorkerFactory.java:95)

However, what was odd about this "race condition" was that the
*second* instance created would reliably win!

Further investigation suggested that this was less of a
"race condition" and more a bug in `AndroidRuntime`, wherein when
"Replaceable" instances were created, an existing instance would
*always* be replaced.

Aside: JniManagedPeerStates.Replaceable is from 3043d89:

> `JniManagedPeerStates.Replaceable` … means
> that the Peer instance was created through the activation constructor.
> It additionally means that if two managed instances are created around
> the same Java instance, the non-Replaceable instance will be the one
> returned by JniRuntime.JniValueManager.PeekObject().

What we're observing in dotnet/android#9862 is that while the
Replaceable instance is replaced, it's being replaced by *another*
Replaceable instance!  This feels bananas; yes, Replaceable should
be replacable, but only by *non*-Replaceable instances.

Update `JniRuntimeJniValueManagerContract` to add a new
`CreatePeer_ReplaceableDoesNotReplace()` test to codify the desired
semantic that Replaceable instances do not replace Replaceable
instances.

Surprisingly, this does not fail on java-interop!  Apparently
`ManagedValueManager.AddPeer()` bails early when `PeekPeer()` finds
a value, while `AndroidRuntime.AddPeer()` does not bail early.
jonpryor added a commit that referenced this issue Mar 14, 2025
Context: dotnet/java-interop#1323
Context: #9862 (comment)

Does It Build™?

(The expectation is that it *does* build -- only unit tests are changed
in dotnet/java-interop#1323 -- but that the new
`JniRuntimeJniValueManagerContract.cs.CreatePeer_ReplaceableDoesNotReplace()`
test will fail.)`
@jonpryor jonpryor linked a pull request Mar 14, 2025 that will close this issue
jonpryor added a commit that referenced this issue Apr 4, 2025
Context: dotnet/java-interop#1323
Context: #9862 (comment)

Does It Build™?

(The expectation is that it *does* build -- only unit tests are changed
in dotnet/java-interop#1323 -- but that the new
`JniRuntimeJniValueManagerContract.cs.CreatePeer_ReplaceableDoesNotReplace()`
test will fail.)`
jonpryor added a commit to dotnet/java-interop that referenced this issue Apr 9, 2025
Context: dotnet/android#10004

It looks like dotnet/android#10004 is closely tied to dotnet/android#9862.

Might as well update tests to hit this behavior!
jonpryor added a commit to dotnet/java-interop that referenced this issue Apr 15, 2025
Context: 3043d89
Context: dec35f5
Context: dotnet/android#9862
Context: dotnet/android#9862 (comment)
Context: dotnet/android#10004
Context: xamarin/monodroid@326509e
Context: xamarin/monodroid@940136e

Ever get the feeling that everything is inextricably related?

JNI has two pattens for create an instance of a Java type:

 1. [`JNIEnv::NewObject(jclass clazz, jmethodID methodID, const jvalue* args)`][0]

 2. [`JNIEnv::AllocObject(jclass clazz)`][1] + 
    [`JNIEnv::CallNonvirtualVoidMethod(jobject obj, jclass clazz, jmethodID methodID, const jvalue* args)`][2]

In both patterns:

  * `clazz` is the `java.lang.Class` of the type to create.
  * `methodID` is the constructor to execute
  * `args` are the constructor arguments.

In .NET terms:

  * `JNIEnv::NewObject()` is equivalent to using
    `System.Reflection.ConstructorInfo.Invoke(object?[]?)`, while

  * `JNIEnv::AllocObject()` + `JNIEnv::CallNonvirtualVoidMethod()` is
    equivalent to using
    `System.Runtime.CompilerServices.RuntimeHelpers.GetUninitializedObject(Type)` +
    `System.Reflection.MethodBase.Invoke(object?, object?[]?)`.

Why prefer one over the other?

When hand-writing your JNI code, `JNIEnv::NewObject()` is easier.
This is less of a concern when a code generator is used.

The *real* reason to *avoid* `JNIEnv::NewObject()` whenever possible
is the [Java Activation][3] scenario, summarized as the "are you sure
you want to do this?" [^1] scenario of invoking a virtual method from the
constructor:

	class Base {
	  public Base() {
	    VirtualMethod();
	  }
	  public virtual void VirtualMethod() {}
	}

	class Derived : Base {
	  public override void VirtualMethod() {}
	}


Java and C# are identical here: when a constructor invokes a virtual
method, the most derived method implementation is used, which will
occur *before* the constructor of the derived type has *started*
execution.  (With lots of quibbling about field initializers…)

Thus, assume you have a Java `CallVirtualFromConstructorBase` type,
which has its constructor Do The Wrong Thing™ and invoke a virtual
method from the constructor, and that method is overridden in C#?

	// Java
	public class CallVirtualFromConstructorBase {
	  public CallVirtualFromConstructorBase(int value) {
	    calledFromConstructor(value);
	  }
	  public void calledFromConstructor(int value) {
	  }
	}

	// C#
	public class CallVirtualFromConstructorDerived : CallVirtualFromConstructorBase {
	  public CallVirtualFromConstructorDerived(int value)
	    : base (value)
	  {
	  }

	  public override void CalledFromConstructor(int value)
	  {
	  }
	}

What happens with:

	var p = new CallVirtualFromConstructorDerived(42);

The answer depends on whether or not `JNIEnv::NewObject()` is used.

If `JNIEnv::NewObject()` is *not* used (the default!)

 1. `CallVirtualFromConstructorDerived(int)` constructor begins
    execution, immediately calls `base(value)`.

 2. `CallVirtualFromConstructorBase(int)` constructor runs, uses
    `JNIEnv::AllocObject()` to *create* (but not construct!) Java
    `CallVirtualFromConstructorDerived` instance.

 3. `JavaObject.Construct(ref JniObjectReference, JniObjectReferenceOptions)`
    invoked, creating a mapping between the C# instance created in
    (1) and the Java instance created in (2).

 4. `CallVirtualFromConstructorBase(int)` C# constructor calls
    `JniPeerMembers.InstanceMethods.FinishGenericCreateInstance()`,
    which eventually invokes `JNIEnv::CallNonvirtualVoidMethod()`
    with the Java `CallVirtualFromConstructorDerived(int)` ctor.

 5. Java `CallVirtualFromConstructorDerived(int)` constructor invokes
    Java `CallVirtualFromConstructorBase(int)` constructor, which
    invokes `CallVirtualFromConstructorDerived.calledFromConstructor()`.

 6. Marshal method (356485e) for
    `CallVirtualFromConstructorBase.CalledFromConstructor()` invoked,
    *immediately* calls `JniRuntime.JniValueManager.GetPeer()`
    (e288589) to obtain an instance upon which to invoke
    `.CalledFromConstructor()`, finds the instance mapping from (3),
    invokes 
    `CallVirtualFromConstructorDerived.CalledFromConstructor()`
    override.

 7. Marshal Method for `CalledFromConstructor()` returns, Java
    `CallVirtualFromConstructorBase(int)` constructor finishes,
    Java `CallVirtualFromConstructorDerived(int)` constructor
    finishes, `JNIEnv::CallNonvirtualVoidMethod()` finishes.

 8. `CallVirtualFromConstructorDerived` instance finishes construction.

If `JNIEnv::NewObject()` is used:

 1. `CallVirtualFromConstructorDerived(int)` constructor begins
    execution, immediately calls `base(value)`.

    Note that this is the first created `CallVirtualFromConstructorDerived`
    instance, but it hasn't been registered yet.

 2. `CallVirtualFromConstructorBase(int)` constructor runs, uses
    `JNIEnv::NewObject()` to construct Java
    `CallVirtualFromConstructorDerived` instance.

 3. `JNIEnv::NewObject()` invokes Java
    `CallVirtualFromConstructorDerived(int)` constructor, which invokes
    `CallVirtualFromConstructorBase(int)` constructor, which invokes
    `CallVirtualFromConstructorDerived.calledFromConstructor()`.

 4. Marshal method (356485e) for
    `CallVirtualFromConstructorBase.CalledFromConstructor()` invoked,
    *immediately* calls `JniRuntime.JniValueManager.GetPeer()`
    (e288589) to obtain an instance upon which to invoke
    `.CalledFromConstructor()`.

    Here is where things go "off the rails" compared to the
    `JNIEnv::AllocObject()` code path:

    There is no such instance -- we're still in the middle of
    constructing it! -- so we look for an "activation constructor".

 5. `CallVirtualFromConstructorDerived(ref JniObjectReference, JniObjectReferenceOptions)`
    activation constructor executed.

    This is the *second* `CallVirtualFromConstructorDerived` instance
    created, and registers a mapping from the Java instance that
    we started constructing in (3) to what we'll call the
    "activation intermediary".

    The activation intermediary instance is marked as "Replaceable".

 6. `CallVirtualFromConstructorDerived.CalledFromConstructor()` method
    override invoked on the activation intermediary.

 7. Marshal Method for `CalledFromConstructor()` returns, Java
    `CallVirtualFromConstructorBase(int)` constructor finishes,
    Java `CallVirtualFromConstructorDerived(int)` constructor
    finishes, `JNIEnv::NewObject()` returns instance.

 8. C# `CallVirtualFromConstructorBase(int)` constructor calls
    `JavaObject.Construct(ref JniObjectReference, JniObjectReferenceOptions)`,
    to create a mapping between (3) and (1).

    In .NET for Android, this causes the C# instance created in (1)
    to *replace* the C# instance created in (5), which allows
    "Replaceable" instance to be replaced.

    In dotnet/java-interop, this replacement *didn't* happen, which
    meant that `ValueManager.PeekPeer(p.PeerReference)` would return
    the activation intermediary, *not* `p`, which confuses everyone.

 9. `CallVirtualFromConstructorDerived` instance finishes construction.

For awhile, dotnet/java-interop did not fully support this scenario
around `JNIEnv::NewObject()`.  Additionally, support for using
`JNIEnv::NewObject()` as part of
`JniPeerMembers.JniInstanceMethods.StartCreateInstance()` was
*removed* in dec35f5.

Which brings us to dotnet/android#9862: where there is an observed
"race condition" around `Android.App.Application` subclass creation.
*Two* instances of `AndroidApp` were created, one from the "normal"
app startup:

	at crc647fae2f69c19dcd0d.AndroidApp.n_onCreate(Native Method)
	at crc647fae2f69c19dcd0d.AndroidApp.onCreate(AndroidApp.java:25)
	at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:1316)

and another from an `androidx.work.WorkerFactory`:

	at mono.android.TypeManager.n_activate(Native Method)
	at mono.android.TypeManager.Activate(TypeManager.java:7)
	at crc647fae2f69c19dcd0d.SyncWorker.<init>(SyncWorker.java:23)
	at java.lang.reflect.Constructor.newInstance0(Native Method)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:343)
	at androidx.work.WorkerFactory.createWorkerWithDefaultFallback(WorkerFactory.java:95)

However, what was odd about this "race condition" was that the
*second* instance created would reliably win!

Further investigation suggested that this was less of a
"race condition" and more a bug in `AndroidValueManager`, wherein when
"Replaceable" instances were created, an existing instance would
*always* be replaced, even if the new instance was also Replaceable!
This feels bananas; yes, Replaceable should be replaceable, but the
expectation was that it would be replaced by *non*-Replaceable
instances, not just any instance that came along later.

Update `JniRuntimeJniValueManagerContract` to add a new
`CreatePeer_ReplaceableDoesNotReplace()` test to codify the desired
semantic that Replaceable instances do not replace Replaceable
instances.

Surprisingly, this new test did not fail on java-interop, as
`ManagedValueManager.AddPeer()` bails early when `PeekPeer()` finds
a value, while `AndroidValueManager.AddPeer()` does not bail early.

An obvious fix for `CreatePeer_ReplaceableDoesNotReplace()` within
dotnet/android would be to adopt the "`AddPeer()` calls `PeekPeer()`"
logic from java-interop.  The problem is that doing so breaks
[`ObjectTest.JnienvCreateInstance_RegistersMultipleInstances()`][5],
as seen in dotnet/android#10004!

`JnienvCreateInstance_RegistersMultipleInstances()` in turn fails
when `PeekPeer()` is used because follows the `JNIEnv::NewObject()`
[construction codepath][6]!

	public CreateInstance_OverrideAbsListView_Adapter (Context context)
	  : base (
	      JNIEnv.CreateInstance (
	        JcwType,
	        "(Landroid/content/Context;)V",
	        new JValue (context)),
	      JniHandleOwnership.TransferLocalRef)
	{
	  AdapterValue = new ArrayAdapter (context, 0);
	}

as `JNIEnv.CreateInstance()` uses `JNIEnv.NewObject()`.

We thus have a conundrum: how do we fix *both*
`CreatePeer_ReplaceableDoesNotReplace()` *and*
`JnienvCreateInstance_RegistersMultipleInstances()`?

The answer is to add proper support for the `JNIEnv::NewObject()`
construction scenario to dotnet/java-interop, which in turn requires
"lowering" the setting of `.Replaceable`.  Previously, we would set
`.Replaceable` *after* the activation constructor was invoked:

	// dotnet/android TypeManager.CreateInstance(), paraphrasing
	var result = CreateProxy (type, handle, transfer);
	result.SetJniManagedPeerState (JniManagedPeerStates.Replaceable | JniManagedPeerStates.Activatable);
	return result;

This is *too late*, as during execution of the activation constructor,
the instance thinks it *isn't* replaceable, and thus creation of a new
instance via the activation constructor will replace an already
existing replaceable instance; it's not until *after* the constructor
finished executing that we'd set `.Replaceable`.

To fix this, update `JniRuntime.JniValueManager.TryCreatePeerInstance()`
to first create an *uninitialized* instance, set `.Replaceable`, and
*then* invoke the activation constructor.  This allows
`JniRuntime.JniValueManager.AddPeer()` to check to see if the new
value is also replaceable, and ignore the replacement if appropriate.

This in turn requires replacing:

	partial class /* JniRuntime. */ JniValueManager {
	  protected virtual IJavaPeerable? TryCreatePeer ()
	      ref JniObjectReference reference,
	      JniObjectReferenceOptions options,
	      Type type);
	}

with:

	partial class /* JniRuntime. */ JniValueManager {
	  protected virtual bool TryConstructPeer ()
	      IJavaPeerable self,
	      ref JniObjectReference reference,
	      JniObjectReferenceOptions options,
	      Type type);
	}

This is fine because we haven't shipped `TryCreatePeer()` in a stable
release yet.

[^1]:   See also [Framework Design Guidelines > Constructor Design][4]:

    > ❌ AVOID calling virtual members on an object inside its constructor.
    > Calling a virtual member will cause the most derived override to be
    > called, even if the constructor of the most derived type has not
    > been fully run yet.

[0]: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#NewObject
[1]: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#AllocObject
[2]: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#CallNonvirtual_type_Method_routines
[3]: https://learn.microsoft.com/en-us/previous-versions/xamarin/android/internals/architecture#java-activation
[4]: https://learn.microsoft.com/dotnet/standard/design-guidelines/constructor
[5]: https://github.com/dotnet/android/blob/9ad492a42b384519a8b1f1987adae82335536d9c/tests/Mono.Android-Tests/Mono.Android-Tests/Java.Lang/ObjectTest.cs#L68-L79
[6]: https://github.com/dotnet/android/blob/9ad492a42b384519a8b1f1987adae82335536d9c/tests/Mono.Android-Tests/Mono.Android-Tests/Java.Lang/ObjectTest.cs#L151-L160
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: App Runtime Issues in `libmonodroid.so`.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants