ScrollObject::Update called after Owner is deleted

edited August 2015 in Help request
Hi everyone,

In orx 1.6 release, I've been struggling with a nasty bug that occasionally crashes my game. I've chased the bug down to the following sequence of events.

I have a ScoreDisplay (scroll)object and it has a IncreaseScore method, which creates ScorePiece (scroll)objects that fly towards the ScoreDisplay. The IncreaseScore creates the ScorePiece objects as an owned child of the ScoreDisplay.

The flying towards the display is handled in ScorePiece::Update, where I query the position of the ScoreDisplay by "GetPos"ing its "Owner".

The crash happens when I delete the ScoreDisplay while there are "ScorePiece"s flying towards it. I'm printing out a lot of stuff for debug purposes and I can see that while the ScoreDisplay is being deleted, the "ScorePiece"s are still among its Owned children and their Refcount is 1. All the other children of the ScoreDisplay get deleted right after the ScoreDisplay but not the "ScorePiece"s. I can also see that in the last ScorePiece::Update that causes the crash, its GetOwner returns null.

I could dodge this bug by simply checking in ScorePiece::Update whether the owner is null, but the trouble is there are many places in the project where I assume no ScrollObject::Update would be called after the owner of an object is deleted. Is that a wrong assumption? If not, why could this be happening and how can I debug it?

Note that I've considered the possibility of memory corruption, but I've run the game under Valgrind and it reports no errors. I can also consistently reproduce the bug under different build configurations (release/debug) and I really doubt a memory corruption could manifest itself so consistently.

Thanks!

Comments

  • edited August 2015
    enobayram, is this an object delete call or a setlifetime call? Not sure if it's the same in scroll, but can you set the lifetime instead of delete?
  • edited August 2015
    Thanks for the quick reply sausage,

    As far as I know, I never make delete calls, I always use SetLifeTime(0). I've just run a quick search on my source code and I never use orxObject_Delete
  • edited August 2015
    Sounds like a bug then maybe in the scroll layer. SetLifeTime is thread safe so you should have been ok.
  • edited August 2015
    That's what I suspected as well. My assumption that the owner should be alive during an Update is reasonable, isn't it?
  • edited August 2015
    That is definitely a valid assumption!

    Now I remember doing some bug fixing in the object hierarchy code with respect to deletion a while ago. So that issue might have been already fixed since the 1.6 release and I'd recommend using the hg latest instead (for both orx and scroll).

    That being said, you will probably have to change a couple of function calls by doing so, but it shouldn't be more than a 5-10 min job (can't remember the detail but it should be in the hg logs if you're curious).
  • edited August 2015
    Thanks iarwain,

    I've just confirmed that the issue persists in the nightly build of orx and the latest commit of Scroll. I've also confirmed that the bug existed in my game for a long time, under very different conditions, so I guess this further reduces the chances of a memory corruption.

    Maybe I should note at this point that I had to fork Scroll to fit it in my build setup. Scroll carrying around its own of orx headers did not suit me, so I've deleted them and changed the scroll headers to use the external headers. That's all I changed.
  • edited August 2015
    That being said, I re-read the whole post - sorry, brain's not completely awake yet :).

    It is possible for a child object to be deleted a frame later than its owner. It depends where in the list it's stored as we keep the list ordered in a way that reflects their actual location in memory to minimize cache misses as much as possible.
    So it is indeed possible to have an object that is "pending deletion" while its ScrollObtject::Update() function is called. In which case its lifetime will be set to 0.

    As you pointed out, your work around is simply to check the owner before using it, or, and that's usually what I do most of the time in my projects, instead of storing direct pointers, I store their GUID.
    I then retrieve objects through their GUID when I need them (either through orx's orxStructure_Get() or directly with Scroll's GetObject()), which will either return a valid object or orxNULL if that object doesn't exist anymore.
  • edited August 2015
    Ooops, that really violates a lot of assumptions I've made all around the codebase, I think I need to inspect each line of code with this in mind. I do use the orxStructure_Get feature when there's no owner-child relationship, I've even written a weak pointer around it, but the owner disappearing before the child is likely to violate some of my logical assumptions too.

    As a side note, doesn't it cause single frame visual glitches this way?

    Thanks for clearing the issue out.
  • edited August 2015
    Well, it all depends what you call a glitch. =)

    None of the structures linked to the object will get updated by orx as soon as its LifeTime has reached 0, however it might be rendered for the current frame.

    The issue you're experiencing is mostly due to how Scroll handles object updates. This isn't a feature built in orx, so it loops over all the objects itself to then call the Update method on them, but this process is disconnected . A simple test in Scroll to prevent calling Update on objects that have a LifeTime set to 0 should do the trick for you and comply with your original assumption.
  • edited August 2015
    Great! Would that also work in a nested ownership scenario as well? When an object gets deleted, does the lifetime of the entire ownership tree become zero in the same frame, or does the deletion (and lifetime setting) propagate frame-by-frame across the tree layers? If that works, a simple lifetime check in Scroll would probably solve most of my problems, though there's still the issue that the fact that an orxOBJECT is alive doesn't mean its scroll object is still logically "alive". After all, it could be referring to a parent who's dead. I guess I could solve this by modifying my orxStructure_Get based weak pointer to treat an object as "dead" when its lifetime is zero.

    Thanks iarwain :)
  • edited August 2015
    Well, there are two scenarii to consider in that case: either if the whole hierarchy is created at once, using the ChildList property, or if the objects have been created separately in time and then reassembled as a big hierarchy.

    In the first case, the whole hierarchy will disappear in the same frame, no delay whatsoever as all the objects will be correctly ordered for the deletion. In the second case, as children can possibly be earlier in the object list than the parent, deletion will be delayed 1 frame everytime this happens.
    So, worst case scenario, it can go up to N frames to delete a whole hierarchy whose deepest depth is N.

    To prevent this from happening, you can modify ScrollObject::SetLifeTime() to recursively set the LifeTime of the whole hierarchy instead, and that should fix any deletion delay as all the objects would then disappear in the same frame.

    I'm not sure which issues you would have by accessing "pending deletion" objects through your smart pointers. Those object will disappear during the next object update inside orx, before any of their structures are updated, so the side effects should be minimal, shouldn't they?
  • edited August 2015
    Last precision, unless you're handling an event, you can use Scroll::DeleteObject(). Either you're not inside a Scroll update, and then the deletion will happen right away, or Scroll wil recognize that the object cannot be deleted right now and will call ScrollObject::SetLifeTime(0) on it instead. That won't change anything related to the whole hierarchy potential deletion delay, though.

    As a FYI, children object deletion used to be done instantly when a parent was deleted but that brought up another issue that I can't remember at the moment. I'm sure one could find it by perusing through the changelog or the hg history, with enough motivation. ;)
  • edited August 2015
    iarwain wrote:
    I'm not sure which issues you would have by accessing "pending deletion" objects through your smart pointers. Those object will disappear during the next object update inside orx, before any of their structures are updated, so the side effects should be minimal, shouldn't they?
    Well, in theory, the object is in an inconsistent state, since it assumes "If I'm alive, my parent must also be alive". Just like in a regular C++ object, in which a class member can always assume that the parent instance would be alive as long as the member itself is alive (unless we're in a destructor). So the child object might try to access its parent to handle a method call (which is safe under the given assumption).

    A simple example of this would be a dialog object which contains an "OK" button object. When clicked, the button object might invoke a click handler that does something with the dialog object (like changing its text). Under normal circumstances, you could assume in the button click handler code that if the button is alive (since it received a click), the dialog must also be alive.
  • edited August 2015
    Multi-frame deletion is concerning :/

    I know these are all edge-cases, and I'm probably manufacturing concerns that will never happen. It's just that I'm trying to make some simple assumptions that would help with reasoning about the whole program so that bugs can't hide in corners.

    How about I run a loop at the start of each frame (Scroll::run) and handle orxObject_Delete of all the descendants of objects pending deletion. I don't have many objects so I don't mind the performance loss, would that make sure that all objects die at the same frame with their parents?
  • edited August 2015
    enobayram wrote:
    Well, in theory, the object is in an inconsistent state, since it assumes "If I'm alive, my parent must also be alive". Just like in a regular C++ object, in which a class member can always assume that the parent instance would be alive as long as the member itself is alive (unless we're in a destructor). So the child object might try to access its parent to handle a method call (which is safe under the given assumption).

    A simple example of this would be a dialog object which contains an "OK" button object. When clicked, the button object might invoke a click handler that does something with the dialog object (like changing its text). Under normal circumstances, you could assume in the button click handler code that if the button is alive (since it received a click), the dialog must also be alive.

    I guess I got confused by your notion of "alive". :) Also I'm not sure one should make a parallel between object ownership/hierarchy and C++ classes. In the first case, it's about two separate entities whereas in the second case it's the same instance (akin to orxSTRUCTURE and its derivatives).

    I see two cases with objects:

    - the owner has been deleted -> in which case your smart pointer will return null and you know that there's no processing that needs to happen

    - the owner has a life time of 0 (pending deletion but not dead) -> that's the one I was referring to, you can manipulate the owner which shouldn't have any serious side effect as it'll get deleted before its next internal (orx) update anyway. Worst case scenario would be a 1-frame display of the wrong text. I could easily prevent orx from displaying any 0-LifeTime object to fix this if need be.

    I'm not saying there isn't any case where undesired behaviour would come out of this, it's just that I can't think of any at the moment.

    In the specific case of UI, I handle all the menu communication/interaction specifics through orxINPUTs (virtual, not bound to any physical keys/buttons) and timelines. That allows me to keep the whole UI code under 20 lines of C/C++ (which is the case in Little Cells, for example) and everything else is data driven.
  • edited August 2015
    enobayram wrote:
    Multi-frame deletion is concerning :/

    I know these are all edge-cases, and I'm probably manufacturing concerns that will never happen. It's just that I'm trying to make some simple assumptions that would help with reasoning about the whole program so that bugs can't hide in corners.

    How about I run a loop at the start of each frame (Scroll::run) and handle orxObject_Delete of all the descendants of objects pending deletion. I don't have many objects so I don't mind the performance loss, would that make sure that all objects die at the same frame with their parents?

    Well it depends, how often do you build hierarchies programmatically? It'll only happen in these cases.
    Which would be fixed either by the way you propose or with a recursive SetLifeTime(0), which I personally prefer, but it's definitely up to you. :)
  • edited August 2015
    iarwain wrote:
    Well it depends, how often do you build hierarchies programmatically? It'll only happen in these cases.
    Which would be fixed either by the way you propose or with a recursive SetLifeTime(0), which I personally prefer, but it's definitely up to you. :)

    I build them quite often, especially since I've been using the behavior trees. For instance, the event of a group of birds coming together, doing something and leaving together is modeled as a behavior of the level. All the objects created by this event are the children of the level.
    I see two cases with objects:

    - the owner has been deleted -> in which case your smart pointer will return null and you know that there's no processing that needs to happen

    - the owner has a life time of 0 (pending deletion but not dead) -> that's the one I was referring to, you can manipulate the owner which shouldn't have any serious side effect as it'll get deleted before its next internal (orx) update anyway. Worst case scenario would be a 1-frame display of the wrong text. I could easily prevent orx from displaying any 0-LifeTime object to fix this if need be.

    How about a deeply nested hierarchy in which neither the object nor its parent has any trace of deletion yet? Say an ancestor 4 layers up is being deleted at this frame, so the ancestor at the 3rd layer has lifetime 0, but the owner and the object doesn't know about that yet. In order to handle this, my smart pointer needs to check parents recursively every time it's accessed, which might be a big performance hit.
    In the specific case of UI, I handle all the menu communication/interaction specifics through orxINPUTs (virtual, not bound to any physical keys/buttons) and timelines. That allows me to keep the whole UI code under 20 lines of C/C++ (which is the case in Little Cells, for example) and everything else is data driven.
    That's a nice idea :) I'll sure look for ways to use this from now on.
  • edited August 2015
    enobayram wrote:
    I build them quite often, especially since I've been using the behavior trees. For instance, the event of a group of birds coming together, doing something and leaving together is modeled as a behavior of the level. All the objects created by this event are the children of the level.

    In that case yep, better to be prepared. :)
    I often do change ownership this way (like adding particles to a container for quick cleanup).
    How about a deeply nested hierarchy in which neither the object nor its parent has any trace of deletion yet? Say an ancestor 4 layers up is being deleted at this frame, so the ancestor at the 3rd layer has lifetime 0, but the owner and the object doesn't know about that yet. In order to handle this, my smart pointer needs to check parents recursively every time it's accessed, which might be a big performance hit.

    I guess I just never had the case. All my deep hierarchies are config-made, so their deletion happens within a single frame. When I compose hierarchies in code, it never goes deeper than a single level. Also I don't think I've ever had children making any assumptions about owners, the few times I have this kind os assumptions, it's always from owner/parent to the way down. When I need communication to go up, I've always used timelines and/or inputs just to get the flexibility for the designer to be able to change hierarchies at will, as usually they don't code.

    In any case, having those two simples changes should cover all your issues, I believe:

    - Add the test in ScrollBase::Update() to not update objects that have a 0-LifeTime. I'll probably add it to Scroll in the end.

    - Add recursivity to ScrollObject::SetLifeTime(), assuming that's what you're using, otherwise your option of having a loop check each frame works too.

    Or did I miss some cases?
  • edited August 2015
    iarwain wrote:
    Also I don't think I've ever had children making any assumptions about owners, the few times I have this kind os assumptions, it's always from owner/parent to the way down. When I need communication to go up, I've always used timelines and/or inputs just to get the flexibility for the designer to be able to change hierarchies at will, as usually they don't code.

    You're right, I agree that trying to minimize direct child->owner reliance is a nice goal on its own. Though some objects exist in a context and they'd be meaningless without it. Like the birds in my game, which try to find food pieces, so they need to access the physical world that the level owns, or the crows that try to pester the sparrows in the same level. It's logically safe to assume they would disappear with the context they're in, so I'd rather avoid checking if the context still exists all around the code.
    In any case, having those two simples changes should cover all your issues, I believe:
    ...
    Or did I miss some cases?

    Thanks a lot iarwain, I'll definitely implement these two. I guess adding the test to ScrollBase::Update (and the smart pointer) is a necessity even if I guaranteed that the hierarchy gets deleted in the same frame, since an object from the hierarchy could remain undeleted with lifetime=0 due to someone holding a reference to it (so its refcount>1).
  • edited August 2015
    enobayram wrote:
    You're right, I agree that trying to minimize direct child->owner reliance is a nice goal on its own. Though some objects exist in a context and they'd be meaningless without it. Like the birds in my game, which try to find food pieces, so they need to access the physical world that the level owns, or the crows that try to pester the sparrows in the same level. It's logically safe to assume they would disappear with the context they're in, so I'd rather avoid checking if the context still exists all around the code.

    Totally. That being said, when I have unique objects (like a scene or, in your case the flock (owner) of sparrows or the owner of the food pieces), I store their GUID in a runtime-only config section. Then I can retrieve them from anywhere (including from timelines). The pattern I'm using is this one:
    [T-StoreID]
    0 = > Object.GetName ^ #
        > Config.GetValue < ID #
          Config.SetValue RunTime < ^
    
    [Scene]
    ; ...
    TrackList = T-StoreID
    ID = Scene
    
    [FlockOfSparrows]
    ; ...
    TrackList = T-StoreID
    ID = Flock
    

    And now I can retrieve them either in code:
    // Pushes runtime section
    orxConfig_PushSection("RunTime");
    
    // Gets flock
    ScrollObject *poFlock = GetObject(orxConfig_GetU64("Flock"));
    
    // Pops config section
    orxConfig_PopSection();
    

    or in config:
    0 = > Config.GetValue RunTime Flock # Object.SetLifeTime < 0
    

    I use the same system to handle menus (using CurrentMenu/PreviousMenu IDs, but this time I store the previous one directly inside the config section of the current menu, akin to a bread crumb), for example.
    Thanks a lot iarwain, I'll definitely implement these two. I guess adding the test to ScrollBase::Update (and the smart pointer) is a necessity even if I guaranteed that the hierarchy gets deleted in the same frame, since an object from the hierarchy could remain undeleted with lifetime=0 due to someone holding a reference to it (so its refcount>1).

    Lemme know if that wasn't enough to make it work on your side. I'll add the check inside ScrollBase in the coming days on my side as well.
  • edited August 2015
    iarwain wrote:
    Totally. That being said, when I have unique objects (like a scene or, in your case the flock (owner) of sparrows or the owner of the food pieces), I store their GUID in a runtime-only config section. Then I can retrieve them from anywhere (including from timelines). The pattern I'm using is this one:
    ...
    Wow, a very smart use of timelines. this makes them behave like constructors! Speaking of which, does the step 0 of the timeline run right after object creation? or sometime in the same frame? I'm curious about the many smart ways you must be utilizing orx's various features.
    Lemme know if that wasn't enough to make it work on your side. I'll add the check inside ScrollBase in the coming days on my side as well.
    Adding the check solved my current bug as it manifested itself in a single step hierarchy, but I've also added the code to make sure the entire hierarchy receives SetLifeTime 0 to ensure correctness in the future. I didn't achieve this through ScrollObject::SetLifeTime since I can't be sure that's the only way objects get deleted. Some could have a finite lifetime at creation and some others could be deleted by timelines etc. I'm currently looping through all the objects at the end of every frame, but I think I'll switch to an object deletion handler that sets the lifetime of the ancestors of the deleted object.
  • edited August 2015
    enobayram wrote:
    Wow, a very smart use of timelines. this makes them behave like constructors! Speaking of which, does the step 0 of the timeline run right after object creation? or sometime in the same frame? I'm curious about the many smart ways you must be utilizing orx's various features.

    Just came back from vacation, sorry for the late reply!

    The step 0 can either run during the next object update (default mode) or right after the object has been created if the config property Immediate = true has been set for the track.
    I'd love to hear of others' somewhat-hijacked uses of config/timelines.
Sign In or Register to comment.