My personal orx lazy performance test

jimjim
edited April 2013 in General discussions
Hello everyone,
Finally, I am done with my exams and everything :) so I started the orx tutorials again. I modified the 01_object tutorial and lazily tried to draw as many objects as possible. I am using demoiselle.png from the tutorial.

I drew the objects with the following code. I don't know if it is the best or most efficient way to do it; I just did it quickly:
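int i;
for (i = 0; i < N_OBJECT; i++)
{
    /* Random x/y in [-400, 399]; the omitted z component is zero-initialized */
    orxVECTOR myVec = {-400 + rand() % 800, -400 + rand() % 800};
    /* "Object" is the config section describing the sprite */
    orxOBJECT *j = orxObject_CreateFromConfig("Object");
    orxObject_SetPosition(j, &myVec);
}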

orx test:
1. Result : 1000 objects
MSVC 2010 : 60 FPS, CPU on average 15-16%
Codelite 5.1 : 60 FPS, CPU on average 7-8%

2. Result : 5000 objects
MSVC 2010 : 14 FPS , CPU on average 17-18%
Codelite 5.1 : 14 FPS , CPU on average 8-10%

3. Result : 10000 objects
MSVC 2010 : 7 FPS , CPU on average 18-19%
Codelite 5.1 : 7 FPS , CPU on average 8-9%


I did a similar test with another game engine, cocos2d-x, using the following code. Again, I don't know if it is the best or most efficient way to do it:
for (int i = 0; i < N_OBJECT; i++)
{
    CCSprite *pSprite = CCSprite::create("demoiselle.png");
    pSprite->setPosition(ccp(rand() % 800, rand() % 800));
    this->addChild(pSprite, 0);
}
cocos2d-x test:
1. Result : 1000 objects
MSVC 2010 : 60 FPS, CPU on average 25-27%
(No Official mingw support)

2. Result : 5000 objects
MSVC 2010 : 14 FPS on average, CPU on average 9-10%
(No Official mingw support)

3. Result : 10000 objects
MSVC 2010 : 6 FPS on average, CPU on average 9-10%
(No Official mingw support )


What I observed from these tests:
1. MinGW builds are less CPU-intensive on my PC than MSVC 2010 builds
2. Both engines perform the same in terms of FPS
3. orx's CPU usage is consistent with both MSVC and MinGW, but the MinGW build uses less CPU than the MSVC one
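
One rough way to read observation 2 (a sketch added for illustration, not part of the original measurements): multiplying the object count by the FPS gives the number of objects drawn per second. The helper name `objects_per_second` is made up here.

```c
/* Hypothetical helper: objects drawn per second
   = objects per frame * frames per second. */
long objects_per_second(long objects, long fps)
{
    return objects * fps;
}
```

With the figures above, orx gives 70,000 at both 5000 and 10000 objects, and cocos2d-x gives 70,000 and 60,000. That is consistent with both engines hitting a similar per-object cost on this machine, though the test is far too rough to say more than that.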

I just did the test for fun, in the laziest way possible. What do you guys say?

Note: My system is a Core i5 running Windows 7 64-bit. I am using orx 1.4 from the download section.

Comments

  • edited April 2013
    Hey Jim, thanks for the test!

    First, a small detail regarding the orx test: it won't be more efficient overall, but it is a bit more flexible/data-oriented to put the random position in config:
    [Object]
    Position = (-400, -400, 0) ~ (400, 400, 0)
    
    And in code you can simply keep the orxObject_CreateFromConfig() call.
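
For readers who have not seen the `~` syntax: the suggestion above relies on config picking a random value between the two vectors each time the object is created. A minimal standalone sketch of that idea in plain C follows. It is not orx's implementation; `Vec3`, `random_between` and `random_position` are names made up for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a 3D vector; not the orxVECTOR type. */
typedef struct { float x, y, z; } Vec3;

/* Returns a value in [lo, hi], using rand() scaled to [0, 1]. */
float random_between(float lo, float hi)
{
    float t = (float)rand() / (float)RAND_MAX;
    return lo + t * (hi - lo);
}

/* Picks each component independently between the two bounds. */
Vec3 random_position(Vec3 lo, Vec3 hi)
{
    Vec3 p;
    p.x = random_between(lo.x, hi.x);
    p.y = random_between(lo.y, hi.y);
    p.z = random_between(lo.z, hi.z);
    return p;
}
```

Unlike `-400 + rand() % 800`, which only yields whole numbers from -400 to 399, this version produces fractional positions across the whole range.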

    I have a couple questions for you, if you don't mind.
    How do you limit the FPS? Is it with VSync and/or by setting the Clock.MainClockFrequency property?

    Would you mind doing the same test with the latest from Mercurial, using a profile build, and posting a screenshot of the profiler screen for all 3 tests so that we can see the hotspots? :)

    Thanks!
  • edited April 2013
    Also, if you want to learn how to use the latest version of the profiler (including the history graph), it's all explained in the orx-dev google group (feel free to join the group as well, I'll keep posting news related to new dev there :)).
  • jimjim
    edited April 2013
    I have not touched the config except to show the FPS, but I noticed that even with one object my FPS stays at 60.
    Hmm, now I have disabled VSync in the config. For 1000 and 5000 objects the FPS remains the same, but for one object it's over 1400.

    OK, I will try to grab the latest from HG and post the results with the profiler here, hopefully tomorrow.
  • jimjim
    edited April 2013
    No new update, but one thing I noticed yesterday with orx is that, while drawing 5000 objects, my keyboard and mouse input lags a lot even outside the orx window (with and without VSync enabled), which is a bit strange. For example, if I run the program and try to type something in the browser, I can't do it properly because of the lag. It's even worse with 10000 objects, but I have not noticed this kind of lag with the other engine, even with 15,000 objects.

    Was this kind of issue reported after the 1.4 release? I am going to try the latest hg after tomorrow. If it was not reported, or nobody has noticed it yet, we will have to dig deeper to find the cause.
  • edited April 2013
    Ah, interesting, you're the first one to report something like this.
    As your CPU use is pretty low, I'm curious to know what happens.
    orx's using GLFW to communicate with the OS, so maybe we can try to see if the problem comes from there or not.
    Also what video card do you have? Are your OpenGL drivers up-to-date?
  • jimjim
    edited April 2013
    Yeah, my CPU usage is low but input is not very responsive.
    Oh, I am using Intel HD Graphics 4000 with no dedicated graphics card.

    My drivers date from 2012, so I am going to update them to the latest version and report back.
  • edited April 2013
    What worries me a bit is that the "lagginess" doesn't correlate with the use of CPU but still comes with the number of objects.

    I'll do a similar test on my own machine tonight, but I already know that I can easily display 12k objects @ 60Hz, and when removing VSync, I run at ~6400 FPS when displaying 4 objects (compiled with VS2008).
  • jimjim
    edited April 2013
    Even after updating my driver, it's still lagging. I think GLFW is doing something silly here. Is there any way to disable keyboard input for orx?
  • edited April 2013
    Not that easily, no.
    However you can replace the keyboard plugin by a dummy one that does nothing at compile time.
    In the file /code/src/plugins/orxPlugin_EmbeddedList.cpp, replace the line:

    #include "../plugins/Keyboard/GLFW/orxKeyboard.c"

    by

    #include "../plugins/Keyboard/Dummy/orxKeyboard.c"

    And recompile orx.

    Btw, you're now using the latest from Mercurial, aren't you?
  • edited April 2013
    Ok, I tried a similar test on my machine.

    Here are the results I get:

    VS2008 / 1024x768

    1 + VSync: 60FPS / CPU 0.8%
    1 - VSync: 5240FPS / CPU 16.5-17.5%

    1000 + VSync: 60FPS / CPU 2.2%
    1000 - VSync: 195FPS / CPU 16.5-17.5%

    5000 + VSync: 42FPS / CPU 16.5-17.5%
    5000 - VSync: 42FPS / CPU 16.5-17.5%

    10000 + VSync: 21FPS / CPU 16.5-17.5%
    10000 - VSync: 21FPS / CPU 16.5-17.5%

    As my CPU has 6 cores, 16.6% means a full core is used at 100% for orx, which is the limit as orx isn't multi-threaded (yet).
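
The arithmetic behind that figure, as a tiny sketch (the function name `single_core_share` is made up for illustration, and it assumes the CPU percentage shown is averaged over all cores):

```c
/* Share of total CPU, in percent, that one fully busy core represents
   when usage is averaged over all cores. */
double single_core_share(int core_count)
{
    return 100.0 / (double)core_count;
}
```

For 6 cores this gives about 16.67%, which matches the 16.5-17.5% plateau in the results above.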

    I don't experience any input delay; however, with the 5000x & 10000x tests, my GPU is at 100% and makes Aero lag (moving the windows around is not as smooth as it used to be).

    Would you mind sharing your test (including the cocos2d-x binaries), so that I can profile them and look at what the differences could be?

    At the moment, orx runs with 6 threads by default, one being added by OpenGL for communication with ATI's driver, others are created by OpenAL, etc...
    If I use Dummy plugins for Keyboard, Mouse, Joystick and Sound, there are only 2 threads left: the main one created by orx, and the ATI one.

    With the 5000x/10000x test, the CPU goes 100% because of the GPU.
    Orx spins on the CPU while waiting for the OpenGL swap to happen, whereas the actual CPU processing takes ~10-12ms total per frame as you can see on the screenshot below.
    resource_0001.png

    If you know how to not have a blocking OpenGL swap call, let me know, that would lower the overall CPU usage.
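
To put rough numbers on the spinning described above, here is a small sketch. The helper names are made up for illustration, and pairing the ~10-12ms CPU figure with the 42FPS result of the 5000x test is an assumption, since the post does not say which test the screenshot shows.

```c
/* Duration of one frame in milliseconds at a given frame rate. */
double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

/* Time per frame not spent on CPU work, i.e. spent waiting for the swap.
   Clamped at zero when the CPU work alone exceeds the frame time. */
double idle_wait_ms(double fps, double cpu_ms)
{
    double wait = frame_time_ms(fps) - cpu_ms;
    return wait > 0.0 ? wait : 0.0;
}
```

At 42FPS a frame lasts about 23.8ms, so with 12ms of CPU work roughly 11.8ms per frame would be spent spinning; at 21FPS the wait grows to about 35.6ms.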
  • jimjim
    edited April 2013
    Sorry for the delay. I did those tests again with the latest version from Mercurial; the FPS is the same, and I think the situation is the same too. I don't know how to turn on the history graph, but I did the test with the profiler enabled. Here are 4 screenshots for 1000, 5000, 10000 and 15000 objects respectively:

    HYcZPCG.png?1
    Ebs34lv.png?1
    0KExYWO.png?1
    JnqNH8d.png?1

    You mean the binary files from the cocos2d-x project?
    I found something on the net; I don't know if it is related or not:
    https://github.com/LaurentGomila/SFML/issues/320

    Edit1 : I have attached the binary files from the cocos2d-x project.
    There is a text file to specify the number of objects to draw.
    Edit2 : I have also found a way to show the profiler history graph; if you need that, I can redo my tests with the graph enabled.
    Edit3 : I have also disabled the keyboard plugin and compiled orx again. Key presses do not work now, but the lag is still there.
    https://forum.orx-project.org/uploads/legacy/fbfiles/files/Debug.zip
  • edited April 2013
    Thanks for the screens.
    Your version behaves very closely to mine CPU-wise, however my GPU looks much more capable than yours (it's an ATI HD 5750).

    Yep, if you could send me all your source + binaries for all the tests, that'd be great. I had a look at the cocos2d-x source tonight and didn't find anything unusual in it.
    Their renderer is very similar to orx's, except they're using a VBO (much better for 3D, not really convinced for 2D, but I'll try) and don't do GL_TRIANGLE_LIST + indexed rendering, choosing GL_TRIANGLES arrays instead (the first one is supposed to be slightly faster, but I don't know if there is any real-world difference).

    As for your last link, that concerns input lag within the game, not outside. It's simply that OpenGL stacks frames before actually rendering them, leading to N frames of latency, which makes the rendering look laggy. A call to glFinish() makes sure the CPU and GPU are in sync at the end of every frame.
    I'll offer the option too I guess, when VSync's on, but that won't solve the problem you've seen outside of the game itself.
    I wonder if that's because I bind orx's main thread to a single CPU instead of letting the OS dispatch it at will.
    I tried with and without it, but I don't have any input lag to begin with, so that didn't tell me much. ;)
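
As a back-of-the-envelope illustration of the "N frames latency" mentioned above (a sketch only; the function name is made up, and the number of frames a driver actually queues is not stated anywhere in this thread):

```c
/* Extra delay, in milliseconds, between submitting a frame and seeing it
   when the driver has already queued `queued_frames` frames ahead of it. */
double queued_latency_ms(int queued_frames, double fps)
{
    return (double)queued_frames * (1000.0 / fps);
}
```

With a hypothetical queue of 3 frames, that is about 50ms at 60 FPS but over 200ms at 14 FPS, which is why the effect becomes visible at low frame rates.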

    If you want to try it on your side, open orxSystem.c and comment out both of these lines:
    /* Sets thread CPU affinity to remain on the same core */
    SetThreadAffinityMask(GetCurrentThread(), 1);

    /* Asks for small time slices */
    timeBeginPeriod(1);

    as well as this one:
    /* Resets time slices */
    timeEndPeriod(1);

    Those are just to make sure we get as much CPU as possible, leaving as little as possible to other applications, but it might be a bit too aggressive when using 100% CPU. :)
  • jimjim
    edited April 2013
    I also commented out those lines in orxSystem.c, but didn't notice much difference in input lag. Even with 15,000 objects and an FPS of 3-4, I don't get any noticeable lag with cocos2d-x, which actually concerns me.

    Also, I just took the 01_object tutorial and added those lines; with cocos2d-x I created a template project and added those lines. Otherwise it's all the same.
  • edited April 2013
    I would still appreciate the cocos2d-x binaries. :)
    I wouldn't be too concerned by that problem in the end anyway: it's not a real-world situation. You can't render 10000+ objects on screen; there isn't a single game out there that can do it. Battlefield 3 has 15000 objects max in the world, not all rendered at the same time, not to mention that it's in 3D with little alpha, a depth buffer and a partitioner. ;)

    One last thing to keep in mind is that when you do this test in orx, you actually have all the systems up and running, including OpenAL and Box2D, is that also the case with cocos2d-x?

    Lastly, do you experience the same issue when having 15000 objects spread in a larger world, not all rendered at the same time?
  • jimjim
    edited April 2013
    I actually attached the cocos2d-x binaries to the post where I attached the 4 images; you might not have noticed. One thing I remember now is that cocos2d-x has no keyboard input and no real-time mouse tracking (only when the mouse is clicked in the window), hence no key input polling or callback registration, so I guess this could be the cause :)
    "One last thing to keep in mind is that when you do this test in orx, you actually have all the systems up and running, including OpenAL and Box2D, is that also the case with cocos2d-x?"

    This could be another cause: with the default cocos2d-x project, neither Box2D nor OpenAL is linked in.

    And I think your PC is powerful enough that you don't notice any lag.
  • edited April 2013
    jim wrote:
    I have actually attached cocos2d-x binary with the post I have attached 4 image, you might have not noticed.

    Ah woops, I completely missed it, indeed! Thanks! :)
    jim wrote:
    One thing I remember now that cocos2d-x has no keyboard input and no real-time mouse tracking (only when the mouse is clicked in the window) hence no key input polling or registering callback, so I guess this can be the cause :)

    Ah yep, that sounds like a good lead indeed.
    jim wrote:
    And your pc is powerful enough not to detect any lag i think

    Well, apparently we were at about the same CPU use you and me. You were actually doing slightly better than me on the 5000 test if I compare both profiler screenshots. :)
    The difference might come from my peripherals themselves: I have high polling rate (1000Hz) gaming keyboard/mouse instead of the usual 100-125Hz ones. :)

    Btw, I did some tests regarding smoothness with and without VSync, windowed or fullscreen, and I'll post some recommendations on the orx-dev group soon.
    I've also found a lot of people complaining about VSync and stuttering all over the internet, and that's something that concerned me a bit, to be honest. I think I have now found a decent scheme to keep the visual experience as smooth as possible while maintaining as little input lag as possible.
  • jimjim
    edited April 2013
    iarwain wrote:
    Ah woops, I completely missed it, indeed! Thanks!
    You're welcome :) From now on I can follow orx development more closely and try to learn the orx source bit by bit. I am still a learner, so it will take me some time to grasp it all :)

    I have noticed in the orx source that there are 3 libraries for I/O in orx: SDL, SFML and GLFW. Why are there 3 libraries? And which one is orx currently using?
  • edited April 2013
    No worries, ask as many questions as needed in the process! :)

    As for the different plugins:
    • SFML is the oldest one, based on SFML 1.5, but it doesn't support all the features orx needs, does weird stuff with shaders instead of using regular GLSL syntax and is really slow compared to a bare bone implementation as made in the other plugins. It's still the only version that can work in non-embedded mode while compiled as static. It's deprecated.
    • SDL was the replacement, however it's based on SDL 1.2 and is kind of limited, especially on Mac due to the way it was architectured. It's deprecated as well.
    • GLFW is the most up-to-date one, the one that supports the most of orx's features (except the windows decoration, unfortunately) and is the fastest so far. It can only run in non-embedded mode if it's compiled dynamically, but by default it's compiled statically for performance reasons.

    I'm looking at the newest SDL 2.0 at the moment and might give it a try if GLFW doesn't become satisfactory at some point. :)