orx on iPhone - fps at ~45

edited January 2012 in Help request
Hello community!

I noticed that orx performance on iPhone 4 is really bad in my game. I have 4 spawners and at most 20 active objects on the screen. These active objects have textures attached, but the textures are all in one big sprite sheet (so I assume OpenGL shouldn't go crazy with context switching).
All active objects are scrolling (just by setting Speed to (100, 0, 0) at spawn time).

Why is the FPS so bad? It's around 45 most of the time and goes up a bit if I zoom out my scene (which renders even more objects on screen). Might this be a fill-rate issue?
EDIT: Actually, at a certain zoom level, when all objects become relatively small on screen, the FPS stays at a solid 60. So it only happens when the screen is filled with textures.

From your experience... have you ever had a solid 60 FPS in your iPhone games with orx?

Thanks in advance!
Alex

Comments

  • edited January 2012
    Hi!

    That sounds pretty low, indeed. With 20 objects you should obtain a solid 60 FPS, at least that's what I get on my iPod 3G.

    I assume you're speaking of the release build and not the debug one.
    The best way to see what's going on, CPU-wise, is to turn on the profiler (using the profiler build of orx + Render.ShowProfiler set to true).
    If it's a fill-rate issue, you'll see all the time spent on the GPU, meaning the longest bar will be the orxRender_Swap() one. Ignore the orxRender_ShowProfiler() one, of course.
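
    If it's easier, you can also flip it from code instead of the .ini; a minimal sketch, assuming the standard orxConfig calls (the config equivalent is simply ShowProfiler = true in the [Render] section):
    /* Needs the profile build of orx to actually display anything */
    orxConfig_PushSection("Render");
    orxConfig_SetBool("ShowProfiler", orxTRUE);
    orxConfig_PopSection();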

    Let me know if you find any obvious issue.
  • edited January 2012
    Hi Iarwain,

    So I rebuilt and started with the profiler. I couldn't see anything obvious, so I'm attaching some screenshots. I don't see the orxDisplay_Swap bar change much in size even if I disable/enable certain parts of the game. I know that I had a solid 60 FPS with just the background, and with the background plus two animated characters. In the profile build, of course, the maximum FPS is around 35.

    Tell me if you see anything interesting in these screenshots:

    p1.png
    p2.png
    p3.png
    p4.png

    TIA,
    Alex
  • edited January 2012
    What we've learned so far is that you're definitely not CPU-bound: the whole simulation takes at worst 0.75ms to update all your objects/spawners (screenshot #2), and the CPU-side rendering takes at worst 1.95ms (excluding the profiler rendering, which is more costly but still within reason at a worst case of 3.63ms).

    That also means that the cost doesn't come from context switching or CPU-GPU buffer transfers, as those are part of the 1.95ms of orxRender_RenderViewport.

    That leaves us with something really bad happening on the GPU. I'm afraid you'll have to use the new XCode4 OpenGL profiling tools to see exactly what's going on.

    Really quick checks: are you disabling blending (BlendMode = none) for your background and for any other non-transparent object/graphic?
    Do you have much overdraw with alpha blending? How big are your textures?
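
    For reference, a sketch of what I mean (the section name "BackgroundGraphic" is just a placeholder for whatever your background's Graphic section is called; in plain config it's a single BlendMode = none line in that section, set before the object gets created):
    /* Hypothetical section name: disables alpha blending for the background graphic */
    orxConfig_PushSection("BackgroundGraphic");
    orxConfig_SetString("BlendMode", "none");
    orxConfig_PopSection();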
  • edited January 2012
    I am not setting BlendMode to none. I will try that.
    The textures are 512x512 sprite sheets. I have one sprite sheet for each character (for animation) and one 512x512 for all other objects on screen. The background, however, is a separate 320x480 texture.
    And no, I have almost zero overdraw: I'm cutting at the edges of the actual sprite, so there's always minimal overdraw.

    Am I correct that you are actually loading the big texture (sprite sheet) as one GLTexture first and then using that along with texture coords to get each sprite out of the sheet? I mean, you are not creating a GLTexture for each sprite in the sheet, right? Just a guess, though.

    I will try the Xcode profiler thingy and let you know.

    Thanks for your time!
    Alex
  • edited January 2012
    Mmh, small textures then, so I have nothing (besides the BlendMode, but for the background alone don't expect to recover 15 FPS at once). Are you using any custom shader, by any chance?

    And yes, you're correct: every bitmap becomes a single texture stored on the GPU. We use UV coordinates to define sub-objects. Objects are batch rendered (indexed triangle strip) and we sort them by depth, texture, blend mode and smoothing mode first so as to minimize the number of context switches. Vertex buffers are interleaved (vertex coord, UV, color) for better locality.
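
    Roughly, each vertex pushed to the GPU looks like this (just a sketch to illustrate the layout, not orx's actual source):
    /* Interleaved vertex layout: position, UV into the sprite sheet, color */
    typedef struct InterleavedVertex
    {
      float         fX, fY;             /* vertex coordinates */
      float         fU, fV;             /* texture coordinates (sub-rectangle of the sheet) */
      unsigned char ucR, ucG, ucB, ucA; /* per-vertex color */
    } InterleavedVertex;
    /* Objects are sorted by (depth, texture, blend mode, smoothing) before batching,
       so GL state only changes when one of those keys differs between consecutive objects */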

    If an architecture doesn't support non-power-of-two textures, we create a power-of-two one that contains the original texture and update the UV coords accordingly.
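
    The idea, as a sketch (not the actual implementation):
    /* Pad the bitmap up to the next power of two and clamp the max UVs so that
       only the original area gets sampled */
    static unsigned int NextPowerOfTwo(unsigned int uValue)
    {
      unsigned int uResult = 1;
      while(uResult < uValue)
      {
        uResult <<= 1;
      }
      return uResult;
    }
    /* e.g. a 320x480 bitmap ends up in a 512x512 texture with
       UMax = 320/512 = 0.625 and VMax = 480/512 = 0.9375 */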

    Let me know if you find anything with the GL profiler that comes with XCode4, as it's not available with XCode3. I can also do some regression testing over the last few months to see if I'm missing something obvious that found its way into the codebase, probably next weekend.

    And no worries for the time, I'm glad if I can help!
  • edited January 2012
    Iarwain, the profiler didn't tell me anything useful. It did tell me, however, that orx has a lot of redundant calls ;)
    I was also able to answer my own question regarding GLTexture and binding: I can see in the trace that I only have ~7 GLTextures, which seems correct.

    Any ideas?

    EDIT: BlendMode didn't help either.

    Cheers,
    Alex
  • edited January 2012
    Oh, which calls? I'd be happy to do some cleaning there! :)

    Yep, 7 textures isn't much at all, there are some internal ones too (screen backup for viewport/screen shader, internal font, pixel).

    Mmh, there must be a way to see what the GPU is doing with some timing estimates. I thought there was a tool added to XCode4 for that; I'll check tonight.

    Cheers!
  • edited January 2012
    I found the solution. Not sure why it's a problem, though (TapMania and other games work fine on iPhone 4 with 320x480 output), but changing Display.ScreenWidth/Height to the native retina size changes everything! Then, even if I hardcode a 2x zoom for my camera (in config :P ) so the picture goes back to normal, it keeps a solid 59-60 FPS at all times.

    EDIT: I actually had to change the Frustum sizes so they match the display size too. If I only change the display size but not the frustum, it has no effect and the FPS is still bad.

    Now the question is WHY???!

    HTH,
    Alex
  • edited January 2012
    O_o

    So, wait, let's recap. :)

    Changing the display size on iOS shouldn't do anything, as orx will use the device's native resolution and replace the config values of Display.ScreenWidth/Height/Depth with the actual ones.
    However, your camera frustum does matter, but I still don't understand why any scaling would kill the FPS this way!

    What were your previous values for the camera frustum and what are the ones that work? Is changing the frustum values enough to make the FPS behave nicely? If so, I'll investigate what's going on because, really, I don't understand: we simply change UV coord values to simulate scaling, that's about it. And it's done on the CPU side.

    Also, let me know about the redundant calls; I'm getting curious and it's going to be an itch that I'll need to scratch before being able to get some sleep. ;)
  • edited January 2012
    Here is my old config:
    [Display]
    ScreenWidth   = 480
    ScreenHeight  = 320
    
    [Clock]
    MainClockFrequency = 60
    
    [MainViewport]
    Camera              = Camera
    
    [UICamera@Camera]
    ParentCamera    = Camera
    FrustumWidth    = 320
    FrustumHeight   = 480
    Rotation             = -90
    
    [Camera]
    FrustumWidth    = 480
    FrustumHeight   = 320
    Rotation             = 90
    FrustumNear      = 0.0
    FrustumFar        = 2.0
    

    And this is the one which seems to give much better fps:
    [Display]
    ScreenWidth   = 960
    ScreenHeight  = 640
    
    [Clock]
    MainClockFrequency = 60
    
    [MainViewport]
    Camera                = Camera
    BackgroundColor = (110, 120, 120)
    
    [UICamera@Camera]
    ParentCamera    = Camera
    FrustumWidth    = 640
    FrustumHeight   = 960
    Rotation             = -90
    
    [Camera]
    FrustumWidth    = 960
    FrustumHeight   = 640
    Rotation            = 90
    FrustumNear     = 0.0
    FrustumFar        = 2.0
    Zoom                 = 2.0
    

    Note the Zoom on the main camera. This zoom just scales my 320x480 textures to match the 2x retina size, hence making everything look as it did before changing the frustum and display width/height. Sadly, I can't use a static Zoom for the camera because I'm calculating the zoom in code (to zoom the scene, as you remember from my other thread on this forum :P ).
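
    For reference, the in-code zoom boils down to something like this (a simplified sketch, assuming the usual orxCamera_Get/orxCamera_SetZoom calls):
    /* Apply the zoom computed elsewhere to the main camera */
    orxFLOAT   fComputedZoom = orx2F(2.0f); /* placeholder: in the real code this is computed from the game state */
    orxCAMERA *pstCamera     = orxCamera_Get("Camera");
    if(pstCamera != orxNULL)
    {
      orxCamera_SetZoom(pstCamera, fComputedZoom);
    }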

    I'll gather some info from the profiler and post them in 10 minutes or so.

    Cheers!
    Alex
  • edited January 2012
    Actually, after looking at the profiler again, I think there will be no way to get rid of these redundant calls other than saving the current state and testing against it before making the GL call... but that seems pointless unless you're avoiding context switches and other really expensive things :D
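
    Something along these lines is what I mean (just a sketch of the caching idea, not orx code):
    /* Cache the last bound texture and skip the GL call when it hasn't changed */
    static GLuint suLastBoundTexture = 0;
    static void BindTextureCached(GLuint uTexture)
    {
      if(uTexture != suLastBoundTexture)
      {
        glBindTexture(GL_TEXTURE_2D, uTexture);
        suLastBoundTexture = uTexture;
      }
    }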

    Here are two screenshots which will give you an idea:

    111.png
    222.png

    HTH,
    Alex
  • edited January 2012
    Thanks for all the details. For the config part, only touching the camera frustum should be enough, as changing the Display section shouldn't have any effect. I'm still very curious about what's going on behind the scenes...

    Ah yeah, those calls. Well, actually they were added on purpose and weren't there at first. It's simply that orx provides rendering hooks so people can roll their own rendering if need be (and that includes 3D rendering), and orx has to make sure the settings are set correctly when it does its own rendering.

    I'm not exactly sure how to avoid that, as it'd be platform-specific and wouldn't fit well with orx's more generic philosophy.
    It's not very clean, but at least the performance impact should be minimal, if any. However, I'll still see if I can find a more elegant way of dealing with this.

    As for using VBOs, that sounds like a generic recommendation given that we mainly push quads. ;)

    I'm more curious about the mipmapping usage line though. What does it mean in this context? Is there more info when you click it?
  • edited January 2012
    There is no more info regarding mipmapping, but I'm pretty sure the analyzer just noticed a bunch of sprites on the screen, noticed that I actually scale them, and was smart enough to point out that we should consider using mipmapping to get better visual results ;)

    I noticed that changing the Display settings won't affect anything. You are right. So it's the frustum that makes a big difference on my iPhone 4, for some reason.

    I took the time (just in the past 45 min) to build my game for Android 2.1+ using the non-native version. I ran it on an HTC device (I don't know exactly which one, but it's really powerful). What do you think? NO problems. 59-60 FPS at both 320x480 and 640x960... it just works. It's pretty ugly, yes, because it scales a lot to match the native resolution. But it works and gives a decent FPS.

    Hope to hear any ideas/suggestions soon :)

    Cheers!
    Alex
  • edited January 2012
    Hi Alex,

    godexsoft wrote:
    There is no more info regarding mipmapping, but I'm pretty sure the analyzer just noticed a bunch of sprites on the screen, noticed that I actually scale them, and was smart enough to point out that we should consider using mipmapping to get better visual results ;)

    I sure hope it's that and not the analyzer detecting some weird mipmaps being computed somewhere for god knows what reason. :)

    godexsoft wrote:
    I noticed that changing the Display settings won't affect anything. You are right. So it's the frustum that makes a big difference on my iPhone 4, for some reason.

    That still baffles me. There is virtually no reason why scaling up would kill the framerate that way: the device resolution is the same anyway, so it shouldn't matter whether textures have been scaled up or not. Stranger still, by adding a 2x zoom you're basically doing the same thing the render plugin does when computing the scale factor, so in the end the vertex coords and UV coords should be exactly the same. Actually, that's a good starting point for an investigation; I'll look at it either tonight or this weekend...
    I don't have access to a retina display, unfortunately, but I'll check to see if, by any chance, the simulator shows a similar issue.

    godexsoft wrote:
    I took the time (just in the past 45 min) to build my game for Android 2.1+ using the non-native version. I ran it on an HTC device (I don't know exactly which one, but it's really powerful). What do you think? NO problems. 59-60 FPS at both 320x480 and 640x960... it just works. It's pretty ugly, yes, because it scales a lot to match the native resolution. But it works and gives a decent FPS.

    That's good news. It means it's not a more general OpenGL ES issue with the way orx has been written (the Android display plugin is based on the iOS one; I'll check for any obvious differences).

    godexsoft wrote:
    Hope to hear any ideas/suggestions soon :)

    Same here! ;) If I find anything tonight, I'll let you know.

    Cheers!
  • edited January 2012
    Unfortunately I couldn't repro this issue with the simulator in retina mode.

    Also, I reread what you posted; here's what I think:
    [Clock]
    MainClockFrequency = 60
    

    First of all, you won't need this one, and it's probably best not to have it, as it will tamper with your natural VSync-aligned frames, which already give you 60 FPS.
    [Camera]
    FrustumWidth    = 960
    FrustumHeight   = 640
    Rotation        = 90
    FrustumNear     = 0.0
    FrustumFar      = 2.0
    Zoom            = 2.0
    

    Ok, I have to admit that this one puzzles me. Orx uses the device's natural orientation, which is Portrait. That means that the real display size should be:
    ScreenWidth = 640
    ScreenHeight = 960

    and not the contrary. When you query the screen size with orxDisplay_GetScreenSize(), what do you get: these numbers (640x960) or the contrary (960x640)?
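
    Something as simple as this should tell you (a quick sketch):
    /* Log the actual screen size as seen by orx */
    orxFLOAT fScreenWidth, fScreenHeight;
    orxDisplay_GetScreenSize(&fScreenWidth, &fScreenHeight);
    orxLOG("Screen size: %g x %g", fScreenWidth, fScreenHeight);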

    That also means that, with your camera settings, rendering shouldn't fill the viewport, as it's going to try to fit the camera's 960-pixel width into the actual 640 pixels of screen width, which should result in huge letterboxing on screen.
    The height of the rendered image on screen should then be 640 * (640 / 960) = ~426 pixels out of the physical display's 960 (hence adding (960 - 426) / 2 = ~267-pixel black bars at the top and the bottom of the screen).
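
    Spelling that math out:
    /* Fit the camera's 960 width into the 640 screen width, letterboxing the rest */
    float fScale          = 640.0f / 960.0f;                   /* ~0.667 */
    float fRenderedHeight = 640.0f * fScale;                   /* ~426.7 px of the 960 px tall screen */
    float fBarHeight      = (960.0f - fRenderedHeight) / 2.0f; /* ~266.7 px black bar top and bottom */
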
    I tried it here with the simulator, and that's exactly what I'm getting, but somehow that's not what you're seeing, is it?

    My question is: why/how did you get orx to consider the native device to be Landscape instead of Portrait? Adding something like UIInterfaceOrientation = UIInterfaceOrientationLandscapeLeft/Right shouldn't affect the way orx perceives the physical device.

    I'm curious. :)
  • edited January 2012
    Sorry for the confusion, I think I just sent you the wrong config as I was playing with values to see what makes the difference. The latest version that works is:
    [UICamera@Camera]
    ParentCamera    = Camera
    FrustumWidth    = 960
    FrustumHeight   = 640
    Rotation        = -90
    
    [Camera]
    FrustumWidth    = 640
    FrustumHeight   = 960
    Rotation        = 90
    FrustumNear     = 0.0
    FrustumFar      = 2.0
    

    Thanks,
    Alex
  • edited January 2012
    Ah ok, that makes more sense. :)

    That still doesn't explain the FPS cut you experienced with a 320x480 camera though. Without access to a physical retina device on my side, I think it's going to be hard for me to find the actual problem.

    Also, I just checked in proper NPOT texture support detection for iOS; it was broken, as we were asking for the GL extensions on the thread before creating that thread's context, and the returned string was null. It shouldn't matter in your case, as you're using POT textures, but still. :)

    I tried TapMania on my iPad. It was really nice, too bad there isn't native iPad display support!
  • edited January 2012
    I don't know why the 320x480 frustum makes such a difference. It's interesting. As I understand it, orx uses the native display size and then scales everything according to the frustum/camera settings using UV coordinates. In TapMania, for example, I used a 320x480 glOrtho, so on retina it gets scaled by the OS, not by the engine. Might that be the difference?

    I'm glad you liked TapMania. There is no iPad support because I never had an iPad back in the days of TapMania development :) Also, I'm too lazy.

    TapMania is open source too (terrible code, though): sources

    Cheers!
    Alex
  • edited January 2012
    That's what's killing me: using glOrtho, setting up a smaller frustum or using a zoom should give the exact same UV coordinates in the end. There's no reason at all for the GPU to behave differently based on that only, especially in the last 2 cases. There must be a side effect I have yet to find happening somewhere. :)