Hello everyone,
Finally, I am done with my exams and everything, so I started the orx tutorials again. I modified the 01_object tutorial and lazily tried to draw as many objects as possible. I am using demoiselle.png from the tutorial.
I drew the objects using the following code. I don't know if it is the best or most efficient way to do it, but I just threw it together quickly:
int i;
for(i = 0; i < N_OBJECT; i++)
{
  /* Random position in [-400, 400[ on both axes, Z left at 0 */
  orxVECTOR myVec = {orx2F(-400 + rand() % 800), orx2F(-400 + rand() % 800), orx2F(0)};
  orxOBJECT *pstObject = orxObject_CreateFromConfig("Object");
  orxObject_SetPosition(pstObject, &myVec);
}
ORX test:
1. Result: 1000 objects
MSVC 2010: 60 FPS, CPU on average 15-16%
Codelite 5.1: 60 FPS, CPU on average 7-8%
2. Result: 5000 objects
MSVC 2010: 14 FPS, CPU on average 17-18%
Codelite 5.1: 14 FPS, CPU on average 8-10%
3. Result: 10000 objects
MSVC 2010: 7 FPS, CPU on average 18-19%
Codelite 5.1: 7 FPS, CPU on average 8-9%
I did a similar test with another game engine, cocos2d-x, using the following code; again, I don't know if it's the best or most efficient way to do it:
for(int i = 0; i < N_OBJECT; i++)
{
  // Create a sprite and place it at a random position
  CCSprite *pSprite = CCSprite::create("demoiselle.png");
  pSprite->setPosition(ccp(rand() % 800, rand() % 800));
  this->addChild(pSprite, 0);
}
cocos2d-x test:
1. Result: 1000 objects
MSVC 2010: 60 FPS, CPU on average 25-27%
(no official MinGW support)
2. Result: 5000 objects
MSVC 2010: 14 FPS on average, CPU on average 9-10%
(no official MinGW support)
3. Result: 10000 objects
MSVC 2010: 6 FPS on average, CPU on average 9-10%
(no official MinGW support)
What I observed from these tests:
1. MinGW is less CPU-intensive on my PC than MSVC 2010.
2. Both engines perform about the same in terms of FPS.
3. orx's CPU usage is consistent across both MSVC and MinGW, but MinGW performs better than MSVC.
I just did the test for fun, in the laziest way possible. What do you guys say?
Note: My system is a Core i5 on Windows 7 64-bit. I am using orx 1.4 from the download section.
Comments
First, a small detail regarding the orx test: it won't be more efficient overall, but it is a bit more flexible/data-oriented to put the random position in config (see the sketch below); in code you can then simply keep the orxObject_CreateFromConfig() call.
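A minimal sketch of what that could look like, assuming the tutorial's object section is named [Object] as in the code above (the Graphic name is illustrative; the ~ makes orx pick a random value in the given range at creation time):
[Object]
Graphic  = ObjectGraphic
Position = (-400, -400, 0) ~ (400, 400, 0) ; random position, same range as the code above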
I have a couple of questions for you, if you don't mind.
How do you limit the FPS? Is it with VSync and/or by setting the Clock.MainClockFrequency property?
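Both of these live in config; a rough sketch with illustrative values:
[Display]
VSync = true ; sync to the monitor's refresh rate (typically caps at 60 FPS)

[Clock]
MainClockFrequency = 60 ; run the main clock at a fixed 60Hz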
Would you mind doing the same test with the latest from Mercurial, with a profile build, and posting a screenshot of the profiler screen for all 3 tests so that we can see the hotspots?
Thanks!
Hmm, I have now disabled VSync from config, but for 1000 and 5000 objects the FPS stays the same; for a single object it's over 1400.
Ok, I will try to grab the latest from HG and post the results with the profiler here, hopefully tomorrow.
Was this kind of issue reported after the 1.4 release? I am going to try the latest hg the day after tomorrow. If it was not reported, or nobody has noticed it yet, we will have to dig deeper to find the cause.
As your CPU use is pretty low, I'm curious to know what happens.
orx uses GLFW to communicate with the OS, so maybe we can see whether the problem comes from there or not.
Also what video card do you have? Are your OpenGL drivers up-to-date?
Oh, I am using Intel HD Graphics 4000 with no dedicated graphics card.
My drivers date from 2012, so I am going to update to the latest version and report back.
I'll do a similar test on my own machine tonight, but I already know that I can easily display 12k objects @ 60Hz, and with VSync removed I run at ~6400 FPS when displaying 4 objects (compiled with VS2008).
However, you can replace the keyboard plugin with a dummy one that does nothing, at compile time.
In the file /code/src/plugins/orxPlugin_EmbeddedList.cpp, replace the line:
#include "../plugins/Keyboard/GLFW/orxKeyboard.c"
with
#include "../plugins/Keyboard/Dummy/orxKeyboard.c"
And recompile orx.
Btw, you're now using the latest from Mercurial, right?
Here are the results I get:
VS2008 / 1024x768
1 object + VSync: 60 FPS / CPU 0.8%
1 object - VSync: 5240 FPS / CPU 16.5-17.5%
1000 objects + VSync: 60 FPS / CPU 2.2%
1000 objects - VSync: 195 FPS / CPU 16.5-17.5%
5000 objects + VSync: 42 FPS / CPU 16.5-17.5%
5000 objects - VSync: 42 FPS / CPU 16.5-17.5%
10000 objects + VSync: 21 FPS / CPU 16.5-17.5%
10000 objects - VSync: 21 FPS / CPU 16.5-17.5%
As my CPU has 6 cores, 16.6% means a full core is used at 100% for orx, which is the limit as orx isn't multi-threaded (yet).
I don't experience any input delay; however, with the 5000 and 10000 tests, my GPU is at 100% and Aero lags (moving windows around is not as smooth as it usually is).
Would you mind sharing your test (including the cocos2d-x binaries), so that I can profile them and look at what the differences could be?
At the moment, orx runs with 6 threads by default: one is added by OpenGL for communication with ATI's driver, others are created by OpenAL, etc.
If I use Dummy plugins for Keyboard, Mouse, Joystick and Sound, there are only 2 threads left: the main one created by orx, and the ATI one.
With the 5000x/10000x test, the CPU goes 100% because of the GPU.
orx spins on the CPU while waiting for the OpenGL swap to happen, whereas the actual CPU processing takes ~10-12ms total per frame, as you can see on the screenshot below.
If you know how to avoid a blocking OpenGL swap call, let me know; that would lower the overall CPU usage.
You mean the binary files from the cocos2d-x project?
I have found something on the net; I don't know if it is related or not:
https://github.com/LaurentGomila/SFML/issues/320
Edit 1: I have attached the binary files from the cocos2d-x project.
There is a text file to specify the number of objects to draw.
Edit 2: I have also found a way to show the profiler history graph; if you need it, I can provide my tests with the graph enabled.
Edit 3: I have also disabled the keyboard plugin and compiled orx again; key presses don't work now, but the lag is still there.
https://forum.orx-project.org/uploads/legacy/fbfiles/files/Debug.zip
Your version behaves very closely to mine CPU-wise; however, my GPU looks much more capable than yours (it's an ATI HD 5750).
Yep, if you could send me all your sources + binaries for all the tests, that would be great. I had a look at the cocos2d-x source tonight and didn't find anything unusual in it.
Their renderer is very similar to orx's, except that they're using a VBO (much better for 3D, not really convinced for 2D, but I'll try) and don't do indexed triangle rendering, choosing plain GL_TRIANGLES arrays instead (the first one is supposed to be slightly faster, but I don't know if there is any real-world difference).
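To make the comparison concrete, here is a rough sketch of the two submission styles being contrasted; this is not actual orx or cocos2d-x code, and it assumes a current GL context with already-filled vertex (and index) buffers:
#include <windows.h> /* needed before GL/gl.h on Windows */
#include <GL/gl.h>

/* Indexed path: 4 unique vertices per quad, 6 indices (two triangles sharing an edge). */
static void DrawQuadsIndexed(GLsizei quadCount, const GLushort *indices)
{
  glDrawElements(GL_TRIANGLES, 6 * quadCount, GL_UNSIGNED_SHORT, indices);
}

/* Array path: 6 vertices per quad, with the shared edge's vertices duplicated. */
static void DrawQuadsArrays(GLsizei quadCount)
{
  glDrawArrays(GL_TRIANGLES, 0, 6 * quadCount);
}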
As for your last link, that concerns input lag within the game, not outside of it. It's simply that OpenGL queues frames before actually rendering them, leading to N frames of latency and making the rendering look laggy. A call to glFinish() makes sure the CPU and GPU are in sync at the end of every frame.
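In GLFW 2.x terms (the version orx's plugin was based on at the time), the idea looks roughly like this; a sketch only, not the actual plugin code:
#include <GL/glfw.h> /* GLFW 2.x header, pulls in the GL declarations */

/* Present the frame, then force the CPU to wait for the GPU so frames don't queue up. */
static void EndFrameWithSync(void)
{
  glfwSwapBuffers(); /* swap front/back buffers */
  glFinish();        /* block until the GPU has finished all submitted work */
}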
I'll offer that option too, I guess, when VSync is on, but that won't solve the problem you've seen outside of the game itself.
I wonder if that's because I bind orx's main thread to a single CPU instead of letting the OS dispatch it at will.
I tried with and without it, but I don't have any input lag to begin with, so that didn't tell me much.
If you want to try it on your side, open orxSystem.c and comment out both of those lines:
as well as this one:
Those are just to make sure we get as much CPU as possible, leaving as little as possible to other applications, but it might be a bit too aggressive when using 100% CPU.
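For context, this is roughly what that kind of code looks like on Windows; a sketch only, not the exact lines from orxSystem.c:
#include <windows.h>

/* Pin the calling thread to core #0 and raise its priority so it gets as much CPU time as possible. */
static void PinAndBoostCurrentThread(void)
{
  SetThreadAffinityMask(GetCurrentThread(), 1);                   /* bit 0 => first core */
  SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST); /* favor this thread */
}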
Also, I just took the 01_object tutorial and added those lines, and with cocos2d-x I created a template project and added those lines; otherwise it's all the same.
I wouldn't be too concerned by that problem in the end anyway; it's not a realistic situation. You can't render 10000+ objects on screen, and there isn't a single game out there that does it. Battlefield 3 has at most 15000 objects in the world, not all rendered at the same time, not to mention it's in 3D with little alpha, a depth buffer and a partitioner.
One last thing to keep in mind: when you do this test in orx, you actually have all the systems up and running, including OpenAL and Box2D; is that also the case with cocos2d-x?
Lastly, do you experience the same issue with 15000 objects spread across a larger world, not all rendered at the same time?
That could be another cause: with the default cocos2d-x project, neither Box2D nor OpenAL is linked.
And your PC is powerful enough not to show any lag, I think.
Ah woops, I completely missed it, indeed! Thanks!
Ah yep, that sounds like a good lead indeed.
Well, apparently you and I were at about the same CPU usage. You were actually doing slightly better than me on the 5000 test, if I compare both profiler screenshots.
The difference might come from my peripherals themselves: I have a high-polling-rate (1000Hz) gaming keyboard/mouse instead of the usual 100-125Hz ones.
Btw, I did some tests regarding smoothness with and without VSync, windowed and fullscreen, and I'll post some recommendations on the orx-dev group soon.
I've also found a lot of people complaining about VSync and stuttering all over the internet, and that's something that concerned me a bit, to be honest. I think I have now found a decent scheme that keeps the visual experience as smooth as possible while maintaining as little input lag as possible.
I have noticed in the orx source that there are 3 libraries for I/O: SDL, SFML and GLFW. Why are there 3 libraries? And which one is orx currently using?
As for the different plugins:
- SFML is the oldest one, based on SFML 1.5, but it doesn't support all the features orx needs, does weird stuff with shaders instead of using regular GLSL syntax, and is really slow compared to a bare-bones implementation like the other plugins. It's still the only version that can work in non-embedded mode while compiled as static. It's deprecated.
- SDL was the replacement; however, it's based on SDL 1.2 and is kind of limited, especially on Mac due to the way it was architected. It's deprecated as well.
- GLFW is the most up-to-date one, the one that supports most of orx's features (except window decorations, unfortunately), and is the fastest so far. It can only run in non-embedded mode if it's compiled dynamically, but by default it's compiled statically for performance reasons.
I'm looking at the new SDL 2.0 at the moment and might give it a try if GLFW stops being satisfactory at some point.