Happy new year, everyone
I recently moved to a new place, so it took quite a while to settle down. Now, with a new year and a new home, I can get back to what I started. I have been following Bitbucket regularly and am very excited to see the thread branch. It will be really cool once it is done.
Now, I have stumbled upon this document, and they propose a new kind of rendering for their engine. They claim it's really efficient. I have read the doc and it seems legit to me. What I want to know is how much more efficient their technique is compared to orx's current rendering method, and whether any of these techniques could be added to orx if worthwhile.
I understand a few things, but am particularly interested in:
1. Decouple the scene graph from the renderer
2. Rendering on a thread
I know culling is on the Bitbucket issue tracker, and that automatic batching and customizable rendering are already available in orx.
You can try the thread branch: it works pretty well on the desktop versions, but there's still work to do for iOS/Android.
Interesting plan of action for Cocos2D. It's mostly good things.
Lemme address all the points separately:
* Decouple the scene graph from the renderer
That's a good thing. I'm actually doing something similar in spirit but different in implementation. Having a command buffer is a very traditional way of doing rendering. Its main advantage shows when the render engine has to deal with a lot of different rendering commands.
It's cleaner but it also comes at a price: handling the command buffer isn't free. It has a memory cost as well as a CPU cost (sorting, processing, re/allocating, etc...).
It also makes customizing the rendering much trickier, as users have to go through that intermediate level to do what they want and can't directly access the low-level API (OpenGL, DirectX, ...). It can become pretty messy too, as I've seen a few times in different companies over the last decade.
Orx aims at 2D rendering and there's only a handful of different commands. I don't think the overhead of having a command buffer is worth it in our case.
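To make the trade-off concrete, here's a minimal sketch of what a command buffer boils down to, written in Python for brevity rather than in C. All the names (`RenderCommand`, `CommandBuffer`, `submit`, `flush`) are made up for illustration; this is neither orx's nor Cocos2D's actual API, just the general idea of queuing commands, sorting them, then executing them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RenderCommand:
    # Hypothetical sort key (e.g. a layer or depth value); lower is drawn first.
    sort_key: int
    # Whatever would eventually call the low-level API (OpenGL, DirectX, ...).
    execute: Callable[[], None]


class CommandBuffer:
    def __init__(self) -> None:
        self._commands: List[RenderCommand] = []

    def submit(self, sort_key: int, execute: Callable[[], None]) -> None:
        # Memory cost: every draw request is stored until the flush.
        self._commands.append(RenderCommand(sort_key, execute))

    def flush(self) -> int:
        # CPU cost: the whole buffer is sorted, then walked.
        # sorted() is stable, so equal keys keep their submission order.
        for command in sorted(self._commands, key=lambda c: c.sort_key):
            command.execute()
        count = len(self._commands)
        self._commands.clear()
        return count
```

The scene graph only ever calls `submit`, and the renderer only ever calls `flush`, which is where the decoupling comes from. It's also where the extra cost and the indirection for user customization come from.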
* Viewing frustum Geometry culling
I'm surprised they didn't have it before. It's a major performance boost for large scenes (large in terms of space, not so much in number of entities). Orx already has this feature. What I want to add is a partitioner to make it much more efficient: that's the important part for supporting large *and* rich scenes. I didn't find any mention of partitioning in the document you sent, but I only skimmed it.
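As an illustration of culling plus partitioning, here's a rough sketch using a uniform grid, in Python for brevity. It's only one possible kind of partitioner and not what orx does or will do; the `Grid` class, its methods and the `(x0, y0, x1, y1)` box layout are all assumptions made for the example.

```python
from collections import defaultdict


def overlaps(a, b):
    """Axis-aligned box overlap test; boxes are (x0, y0, x1, y1)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Grid:
    """Hypothetical uniform-grid partitioner for 2D bounding boxes."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (cx, cy) -> ids of objects touching that cell
        self.boxes = {}                # id -> bounding box

    def _cells_for(self, box):
        x0, y0, x1, y1 = box
        size = self.cell_size
        for cx in range(int(x0 // size), int(x1 // size) + 1):
            for cy in range(int(y0 // size), int(y1 // size) + 1):
                yield (cx, cy)

    def insert(self, obj_id, box):
        self.boxes[obj_id] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(obj_id)

    def query(self, view):
        # Only look at the cells the view touches, then do the exact test.
        candidates = set()
        for cell in self._cells_for(view):
            candidates |= self.cells.get(cell, set())
        return {i for i in candidates if overlaps(self.boxes[i], view)}
```

Without the grid, every object in the scene has to be tested against the view each frame. With it, only the objects registered in the visible cells are tested, which is what makes large and rich scenes manageable.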
* Rendering on a thread
That's a neat feature and it's also on my todo list (it will come shortly after I merge threading back to the main branch). I don't think it'll provide a big perf boost though, as perf bottlenecks in 2D games aren't so much about batch gathering processing time (CPU) as about bandwidth/fillrate limitations (GPU). Still nice to have, as it doesn't come with any hard extra cost.
* Automatic batching
Been there, done that. Good idea, I'd be curious to see what their key-packing will end up being, performance-wise.
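For those wondering what key-packing looks like in practice, here's a small sketch in Python. The bit layout (16 bits of texture, 8 of shader, 4 of blend mode) and the function names are purely hypothetical; they're not taken from the Cocos2D document nor from orx. Only consecutive sprites are merged, so the draw order is preserved.

```python
from itertools import groupby


def pack_key(texture_id, shader_id, blend_mode):
    """Pack render state into one integer (hypothetical layout: 16 | 8 | 4 bits)."""
    if not 0 <= texture_id < (1 << 16):
        raise ValueError("texture_id out of range")
    if not 0 <= shader_id < (1 << 8):
        raise ValueError("shader_id out of range")
    if not 0 <= blend_mode < (1 << 4):
        raise ValueError("blend_mode out of range")
    return (texture_id << 12) | (shader_id << 4) | blend_mode


def build_batches(sprites):
    """Group consecutive sprites sharing the same key.

    sprites: list of (name, texture_id, shader_id, blend_mode), already in draw order.
    Returns a list of (key, [names]), one entry per draw call.
    """
    batches = []
    for key, group in groupby(sprites, key=lambda s: pack_key(s[1], s[2], s[3])):
        batches.append((key, [s[0] for s in group]))
    return batches
```

Comparing a single integer per sprite is cheap, which is the whole point of packing. How well it performs then depends on how often the key changes between neighbours in the draw order.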
* (Node based) Customizable rendering
Orx has this and a bit more as well: think of viewport-based image compositing, MRT (Multiple Render Target) support, etc... It's a must-have, I think, and it's what separates flexible 2D engines from traditional ones (as a metaphor, think of a shader pipeline vs a fixed pipeline).
* Optimized for 2D, but suitable for 3D as well
That point is the result of the previous one, I think.
One thing they're missing for efficient 2D rendering (and orx too, for that matter, though I added an issue to the tracker a few weeks ago about it): early-Z rejection (more precisely Z pre-pass).
In most 2D games, the highest rendering cost is likely to be the overdraw and fillrate limitation. That's what the early-Z rejection is all about.
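To show why it matters, here's a toy software model of a Z pre-pass, in Python. It only counts how many pixels get shaded and only handles fully opaque rectangles (translucent sprites are outside this sketch). The function names and the `(x0, y0, x1, y1, depth)` layout, with a smaller depth meaning closer to the camera, are assumptions for the example; a real implementation would rely on the GPU's depth test instead.

```python
def covered_pixels(rect, width, height):
    """Yield the (x, y) pixels covered by rect, clipped to the framebuffer."""
    x0, y0, x1, y1, _depth = rect
    for y in range(max(y0, 0), min(y1, height)):
        for x in range(max(x0, 0), min(x1, width)):
            yield x, y


def shaded_without_depth(rects, width, height):
    """Plain back-to-front drawing: every covered pixel is shaded."""
    return sum(1 for r in rects for _ in covered_pixels(r, width, height))


def shaded_with_z_prepass(rects, width, height):
    """Depth-only pass first, then shade only the front-most surface per pixel."""
    depth = [[float("inf")] * width for _ in range(height)]
    # Pass 1: write depth only, no shading.
    for r in rects:
        for x, y in covered_pixels(r, width, height):
            depth[y][x] = min(depth[y][x], r[4])
    # Pass 2: shade a pixel only if this rectangle is the visible one there.
    shaded = 0
    for r in rects:
        for x, y in covered_pixels(r, width, height):
            if r[4] == depth[y][x]:
                shaded += 1
    return shaded
```

With three stacked opaque layers, the first function shades every layer at every pixel it covers, while the second shades each pixel once. The depth-only pass isn't free either, so the gain depends on how expensive shading is compared to that pass.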
I did a quick test at work on the UbiArt Framework (the engine powering the recent Rayman games and some upcoming games such as Child of Light): turning early-Z on for a few scenes of Child of Light yielded an average 40% speed boost. I previously thought it would only be effective for 3D rendering, but I was wrong: that test convinced me of its importance for 2D rendering as well.
Here you can find a couple of links about it:
I'm going to tackle this when I'm done with the threading integration and the CPU-side render optimizations. It'll be optional, as I don't know how well it'll work in all possible cases, including on mobile devices.
Thanks for sharing that document, it's interesting to see that we're not the only ones heading toward the same destination, albeit via slightly different roads.
Yep, they did not have any geometry culling until recently. They have added some of the features mentioned in the document to their engine, and I am waiting for an official performance benchmark from them.
I think they have not thought about Z pre-pass because they are targeting only mobile. Maybe that's why cocos2d and orx do things differently: because of their target platforms. So orx is more cross-platform than cocos2d.
I have seen some video demonstrations of the UbiArt Framework, and to me it is one of the most advanced and powerful 2D engines to date.
Edit: I think both Z pre-pass and meshes (to cut down the transparent portion of sprites) could be used to reduce overdraw/fill rate. It is also a matter of optimization: finding a sweet spot between the number of draw calls (along with the increased mesh vertex count) and fill rate, one that delivers optimal CPU and GPU performance.