-
Notifications
You must be signed in to change notification settings - Fork 541
Possible race condition in proxy object creation #9862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
If it's expected that a second proxy object can be instantiated (*), then feel free to delegate the issue to dotnet/maui because they depend on the behaviour that this doesn't happen by saving the instance to a static variable: This in turn is accessed all over that place. With the example stack traces above, once the (*) I very much doubt it's intentional; if there were data associated with the object it would create a lot of issues. |
Do you have a small example of your Typical operation would be:
But I wonder if no. 3 can start happening in parallel with 2. |
The source code for the worker itself is likely not interesting - https://gist.github.com/filipnavara/c8e196bb1bd4471a00174708366ef6ff. What is likely more interesting is how the worker is scheduled: var syncWorkRequest = (PeriodicWorkRequest.Builder.From<SyncWorker>(TimeSpan.FromMinutes(15))
.SetConstraints(new Constraints.Builder().SetRequiredNetworkType(NetworkType.Connected).Build())
.SetInitialDelay(15, TimeUnit.Minutes) as PeriodicWorkRequest.Builder).Build();
WorkManager.GetInstance(context).EnqueueUniquePeriodicWork("sync", ExistingPeriodicWorkPolicy.Update, syncWorkRequest); The traces we get from customers suggest that it's often happening after clicking on a notification. It's likely that the "required network type" constraint may play a role because some devices tend to do a "soft sleep" mode for network connections when locked. In this scenario it's more likely that the worker will get triggered alongside creation of the activity to handle the notification. That said, it's just a theory. We don't have repro on any of our own devices and we have tried for weeks. Weirdly it happens to a large percent of customers in the wild regularly enough. I don't see any locking in |
We managed to reproduce it on our devices for the first time 🎊
Turns out that the crash logged from my device is 2 days old, so still no reliable repro 😔 |
I have a somewhat reliable repro:
Both breakpoints should be hit, first It's not the most reliable repro ever but I managed to hit it on two separate devices and Windows Subsystem for Android. Restarting the app when on the |
I'm not sure that this is "reasonably" fixable. Among the many guidelines around writing thread safe code are two of interest here:
See e.g. The COM static store, part 3: Avoiding creation of an expensive temporary when setting a singleton:
Attempting to throw a lock around
I do not believe it is possible to prevent the creation of two Given that this scenario is difficult to address, what underlying issue are you trying to fix/workaround? What happens when two |
MAUI uses the constructor of The problem we hit is that this stored instance is a different from the one that the We currently workaround it by updating MAUI's |
Additionally, is there a guarantee that only one of those instances is used for resolving the proxy calls even if more than one is created? If that was not guaranteed then any local state in the C# object would be unusable without the guarantee. How will the GC bridge work with two instances representing the same Java object? (unless one gets thrown away in some deterministic manner) |
I wonder if a suitable workaround would be to set partial class MauiApplication {
public override void OnCreate()
{
Current = this;
// …
}
} This assumes that only one thread will be calling |
That's our workaround for the time being. If it's expected scenario to happen then it should definitely be fixed in upstream dotnet/maui code. |
One gets thrown away in a plausibly deterministic manner:
Generally, the first one "wins", unless the first one is "Replaceable", in which case the second one wins. This is generally encountered with Java-side activation and Java constructors calling virtual methods which are overridden in managed code. The case here is that the "new" value is also "Replaceable"; we're replacing a "Replaceable" proxy with another "Replaceable" proxy! We could update |
Context: 3043d89 Context: dotnet/android#9862 Context: dotnet/android#9862 (comment) In dotnet/android#9862, there is an observed "race condition" around `Android.App.Application` subclass creation. *Two* instances of `AndroidApp` were created, one from the "normal" app startup: at crc647fae2f69c19dcd0d.AndroidApp.n_onCreate(Native Method) at crc647fae2f69c19dcd0d.AndroidApp.onCreate(AndroidApp.java:25) at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:1316) and another from an `androidx.work.WorkerFactory`: at mono.android.TypeManager.n_activate(Native Method) at mono.android.TypeManager.Activate(TypeManager.java:7) at crc647fae2f69c19dcd0d.SyncWorker.<init>(SyncWorker.java:23) at java.lang.reflect.Constructor.newInstance0(Native Method) at java.lang.reflect.Constructor.newInstance(Constructor.java:343) at androidx.work.WorkerFactory.createWorkerWithDefaultFallback(WorkerFactory.java:95) However, what was odd about this "race condition" was that the *second* instance created would reliably win! Further investigation suggested that this was less of a "race condition" and more a bug in `AndroidRuntime`, wherein when "Replaceable" instances were created, an existing instance would *always* be replaced. Aside: JniManagedPeerStates.Replaceable is from 3043d89: > `JniManagedPeerStates.Replaceable` … means > that the Peer instance was created through the activation constructor. > It additionally means that if two managed instances are created around > the same Java instance, the non-Replaceable instance will be the one > returned by JniRuntime.JniValueManager.PeekObject(). What we're observing in dotnet/android#9862 is that while the Replaceable instance is replaced, it's being replaced by *another* Replaceable instance! This feels bananas; yes, Replaceable should be replacable, but only by *non*-Replaceable instances. Update `JniRuntimeJniValueManagerContract` to add a new `CreatePeer_ReplaceableDoesNotReplace()` test to codify the desired semantic that Replaceable instances do not replace Replaceable instances. Surprisingly, this does not fail on java-interop! Apparently `ManagedValueManager.AddPeer()` bails early when `PeekPeer()` finds a value, while `AndroidRuntime.AddPeer()` does not bail early.
Context: dotnet/java-interop#1323 Context: #9862 (comment) Does It Build™? (The expectation is that it *does* build -- only unit tests are changed in dotnet/java-interop#1323 -- but that the new `JniRuntimeJniValueManagerContract.cs.CreatePeer_ReplaceableDoesNotReplace()` test will fail.)`
Context: dotnet/java-interop#1323 Context: #9862 (comment) Does It Build™? (The expectation is that it *does* build -- only unit tests are changed in dotnet/java-interop#1323 -- but that the new `JniRuntimeJniValueManagerContract.cs.CreatePeer_ReplaceableDoesNotReplace()` test will fail.)`
Context: dotnet/android#10004 It looks like dotnet/android#10004 is closely tied to dotnet/android#9862. Might as well update tests to hit this behavior!
Context: 3043d89 Context: dec35f5 Context: dotnet/android#9862 Context: dotnet/android#9862 (comment) Context: dotnet/android#10004 Context: xamarin/monodroid@326509e Context: xamarin/monodroid@940136e Ever get the feeling that everything is inextricably related? JNI has two pattens for create an instance of a Java type: 1. [`JNIEnv::NewObject(jclass clazz, jmethodID methodID, const jvalue* args)`][0] 2. [`JNIEnv::AllocObject(jclass clazz)`][1] + [`JNIEnv::CallNonvirtualVoidMethod(jobject obj, jclass clazz, jmethodID methodID, const jvalue* args)`][2] In both patterns: * `clazz` is the `java.lang.Class` of the type to create. * `methodID` is the constructor to execute * `args` are the constructor arguments. In .NET terms: * `JNIEnv::NewObject()` is equivalent to using `System.Reflection.ConstructorInfo.Invoke(object?[]?)`, while * `JNIEnv::AllocObject()` + `JNIEnv::CallNonvirtualVoidMethod()` is equivalent to using `System.Runtime.CompilerServices.RuntimeHelpers.GetUninitializedObject(Type)` + `System.Reflection.MethodBase.Invoke(object?, object?[]?)`. Why prefer one over the other? When hand-writing your JNI code, `JNIEnv::NewObject()` is easier. This is less of a concern when a code generator is used. The *real* reason to *avoid* `JNIEnv::NewObject()` whenever possible is the [Java Activation][3] scenario, summarized as the "are you sure you want to do this?" [^1] scenario of invoking a virtual method from the constructor: class Base { public Base() { VirtualMethod(); } public virtual void VirtualMethod() {} } class Derived : Base { public override void VirtualMethod() {} } Java and C# are identical here: when a constructor invokes a virtual method, the most derived method implementation is used, which will occur *before* the constructor of the derived type has *started* execution. (With lots of quibbling about field initializers…) Thus, assume you have a Java `CallVirtualFromConstructorBase` type, which has its constructor Do The Wrong Thing™ and invoke a virtual method from the constructor, and that method is overridden in C#? // Java public class CallVirtualFromConstructorBase { public CallVirtualFromConstructorBase(int value) { calledFromConstructor(value); } public void calledFromConstructor(int value) { } } // C# public class CallVirtualFromConstructorDerived : CallVirtualFromConstructorBase { public CallVirtualFromConstructorDerived(int value) : base (value) { } public override void CalledFromConstructor(int value) { } } What happens with: var p = new CallVirtualFromConstructorDerived(42); The answer depends on whether or not `JNIEnv::NewObject()` is used. If `JNIEnv::NewObject()` is *not* used (the default!) 1. `CallVirtualFromConstructorDerived(int)` constructor begins execution, immediately calls `base(value)`. 2. `CallVirtualFromConstructorBase(int)` constructor runs, uses `JNIEnv::AllocObject()` to *create* (but not construct!) Java `CallVirtualFromConstructorDerived` instance. 3. `JavaObject.Construct(ref JniObjectReference, JniObjectReferenceOptions)` invoked, creating a mapping between the C# instance created in (1) and the Java instance created in (2). 4. `CallVirtualFromConstructorBase(int)` C# constructor calls `JniPeerMembers.InstanceMethods.FinishGenericCreateInstance()`, which eventually invokes `JNIEnv::CallNonvirtualVoidMethod()` with the Java `CallVirtualFromConstructorDerived(int)` ctor. 5. Java `CallVirtualFromConstructorDerived(int)` constructor invokes Java `CallVirtualFromConstructorBase(int)` constructor, which invokes `CallVirtualFromConstructorDerived.calledFromConstructor()`. 6. Marshal method (356485e) for `CallVirtualFromConstructorBase.CalledFromConstructor()` invoked, *immediately* calls `JniRuntime.JniValueManager.GetPeer()` (e288589) to obtain an instance upon which to invoke `.CalledFromConstructor()`, finds the instance mapping from (3), invokes `CallVirtualFromConstructorDerived.CalledFromConstructor()` override. 7. Marshal Method for `CalledFromConstructor()` returns, Java `CallVirtualFromConstructorBase(int)` constructor finishes, Java `CallVirtualFromConstructorDerived(int)` constructor finishes, `JNIEnv::CallNonvirtualVoidMethod()` finishes. 8. `CallVirtualFromConstructorDerived` instance finishes construction. If `JNIEnv::NewObject()` is used: 1. `CallVirtualFromConstructorDerived(int)` constructor begins execution, immediately calls `base(value)`. Note that this is the first created `CallVirtualFromConstructorDerived` instance, but it hasn't been registered yet. 2. `CallVirtualFromConstructorBase(int)` constructor runs, uses `JNIEnv::NewObject()` to construct Java `CallVirtualFromConstructorDerived` instance. 3. `JNIEnv::NewObject()` invokes Java `CallVirtualFromConstructorDerived(int)` constructor, which invokes `CallVirtualFromConstructorBase(int)` constructor, which invokes `CallVirtualFromConstructorDerived.calledFromConstructor()`. 4. Marshal method (356485e) for `CallVirtualFromConstructorBase.CalledFromConstructor()` invoked, *immediately* calls `JniRuntime.JniValueManager.GetPeer()` (e288589) to obtain an instance upon which to invoke `.CalledFromConstructor()`. Here is where things go "off the rails" compared to the `JNIEnv::AllocObject()` code path: There is no such instance -- we're still in the middle of constructing it! -- so we look for an "activation constructor". 5. `CallVirtualFromConstructorDerived(ref JniObjectReference, JniObjectReferenceOptions)` activation constructor executed. This is the *second* `CallVirtualFromConstructorDerived` instance created, and registers a mapping from the Java instance that we started constructing in (3) to what we'll call the "activation intermediary". The activation intermediary instance is marked as "Replaceable". 6. `CallVirtualFromConstructorDerived.CalledFromConstructor()` method override invoked on the activation intermediary. 7. Marshal Method for `CalledFromConstructor()` returns, Java `CallVirtualFromConstructorBase(int)` constructor finishes, Java `CallVirtualFromConstructorDerived(int)` constructor finishes, `JNIEnv::NewObject()` returns instance. 8. C# `CallVirtualFromConstructorBase(int)` constructor calls `JavaObject.Construct(ref JniObjectReference, JniObjectReferenceOptions)`, to create a mapping between (3) and (1). In .NET for Android, this causes the C# instance created in (1) to *replace* the C# instance created in (5), which allows "Replaceable" instance to be replaced. In dotnet/java-interop, this replacement *didn't* happen, which meant that `ValueManager.PeekPeer(p.PeerReference)` would return the activation intermediary, *not* `p`, which confuses everyone. 9. `CallVirtualFromConstructorDerived` instance finishes construction. For awhile, dotnet/java-interop did not fully support this scenario around `JNIEnv::NewObject()`. Additionally, support for using `JNIEnv::NewObject()` as part of `JniPeerMembers.JniInstanceMethods.StartCreateInstance()` was *removed* in dec35f5. Which brings us to dotnet/android#9862: where there is an observed "race condition" around `Android.App.Application` subclass creation. *Two* instances of `AndroidApp` were created, one from the "normal" app startup: at crc647fae2f69c19dcd0d.AndroidApp.n_onCreate(Native Method) at crc647fae2f69c19dcd0d.AndroidApp.onCreate(AndroidApp.java:25) at android.app.Instrumentation.callApplicationOnCreate(Instrumentation.java:1316) and another from an `androidx.work.WorkerFactory`: at mono.android.TypeManager.n_activate(Native Method) at mono.android.TypeManager.Activate(TypeManager.java:7) at crc647fae2f69c19dcd0d.SyncWorker.<init>(SyncWorker.java:23) at java.lang.reflect.Constructor.newInstance0(Native Method) at java.lang.reflect.Constructor.newInstance(Constructor.java:343) at androidx.work.WorkerFactory.createWorkerWithDefaultFallback(WorkerFactory.java:95) However, what was odd about this "race condition" was that the *second* instance created would reliably win! Further investigation suggested that this was less of a "race condition" and more a bug in `AndroidValueManager`, wherein when "Replaceable" instances were created, an existing instance would *always* be replaced, even if the new instance was also Replaceable! This feels bananas; yes, Replaceable should be replaceable, but the expectation was that it would be replaced by *non*-Replaceable instances, not just any instance that came along later. Update `JniRuntimeJniValueManagerContract` to add a new `CreatePeer_ReplaceableDoesNotReplace()` test to codify the desired semantic that Replaceable instances do not replace Replaceable instances. Surprisingly, this new test did not fail on java-interop, as `ManagedValueManager.AddPeer()` bails early when `PeekPeer()` finds a value, while `AndroidValueManager.AddPeer()` does not bail early. An obvious fix for `CreatePeer_ReplaceableDoesNotReplace()` within dotnet/android would be to adopt the "`AddPeer()` calls `PeekPeer()`" logic from java-interop. The problem is that doing so breaks [`ObjectTest.JnienvCreateInstance_RegistersMultipleInstances()`][5], as seen in dotnet/android#10004! `JnienvCreateInstance_RegistersMultipleInstances()` in turn fails when `PeekPeer()` is used because follows the `JNIEnv::NewObject()` [construction codepath][6]! public CreateInstance_OverrideAbsListView_Adapter (Context context) : base ( JNIEnv.CreateInstance ( JcwType, "(Landroid/content/Context;)V", new JValue (context)), JniHandleOwnership.TransferLocalRef) { AdapterValue = new ArrayAdapter (context, 0); } as `JNIEnv.CreateInstance()` uses `JNIEnv.NewObject()`. We thus have a conundrum: how do we fix *both* `CreatePeer_ReplaceableDoesNotReplace()` *and* `JnienvCreateInstance_RegistersMultipleInstances()`? The answer is to add proper support for the `JNIEnv::NewObject()` construction scenario to dotnet/java-interop, which in turn requires "lowering" the setting of `.Replaceable`. Previously, we would set `.Replaceable` *after* the activation constructor was invoked: // dotnet/android TypeManager.CreateInstance(), paraphrasing var result = CreateProxy (type, handle, transfer); result.SetJniManagedPeerState (JniManagedPeerStates.Replaceable | JniManagedPeerStates.Activatable); return result; This is *too late*, as during execution of the activation constructor, the instance thinks it *isn't* replaceable, and thus creation of a new instance via the activation constructor will replace an already existing replaceable instance; it's not until *after* the constructor finished executing that we'd set `.Replaceable`. To fix this, update `JniRuntime.JniValueManager.TryCreatePeerInstance()` to first create an *uninitialized* instance, set `.Replaceable`, and *then* invoke the activation constructor. This allows `JniRuntime.JniValueManager.AddPeer()` to check to see if the new value is also replaceable, and ignore the replacement if appropriate. This in turn requires replacing: partial class /* JniRuntime. */ JniValueManager { protected virtual IJavaPeerable? TryCreatePeer () ref JniObjectReference reference, JniObjectReferenceOptions options, Type type); } with: partial class /* JniRuntime. */ JniValueManager { protected virtual bool TryConstructPeer () IJavaPeerable self, ref JniObjectReference reference, JniObjectReferenceOptions options, Type type); } This is fine because we haven't shipped `TryCreatePeer()` in a stable release yet. [^1]: See also [Framework Design Guidelines > Constructor Design][4]: > ❌ AVOID calling virtual members on an object inside its constructor. > Calling a virtual member will cause the most derived override to be > called, even if the constructor of the most derived type has not > been fully run yet. [0]: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#NewObject [1]: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#AllocObject [2]: https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#CallNonvirtual_type_Method_routines [3]: https://learn.microsoft.com/en-us/previous-versions/xamarin/android/internals/architecture#java-activation [4]: https://learn.microsoft.com/dotnet/standard/design-guidelines/constructor [5]: https://github.com/dotnet/android/blob/9ad492a42b384519a8b1f1987adae82335536d9c/tests/Mono.Android-Tests/Mono.Android-Tests/Java.Lang/ObjectTest.cs#L68-L79 [6]: https://github.com/dotnet/android/blob/9ad492a42b384519a8b1f1987adae82335536d9c/tests/Mono.Android-Tests/Mono.Android-Tests/Java.Lang/ObjectTest.cs#L151-L160
Android framework version
net8.0-android
Affected platform version
VS 2022 17.12.4
Description
We spent the last few weeks trying to track down a mysterious bug where two instances of
Android.App.Application
.NET proxy objects are created. So far we were not able to reproduce the issue locally but it happens quite consistently for thousands of our customers.Finally we were able to get stack traces from the constructor where the second instance is created.
The usual first instance gets created with this stack trace:
The second instance comes from a worker (subclass of
AndroidX.Work.Worker
):Notably, the first instance exists and it's not collected on the .NET side (pinned through a static variable). This suggests that there's a race condition that could lead to creation of two proxy objects.
Steps to Reproduce
#9862 (comment)
Did you find any workaround?
No response
Relevant log output
The text was updated successfully, but these errors were encountered: