BEPUphysics and XNA

As some of you may have noticed, there's been a recent uptick in the discussion of XNA's slow fading. As the main branch of BEPUphysics is still based on XNA, I think it would be appropriate to discuss the path of BEPUphysics.

For those of you who are not aware, BEPUphysics has long had multiple forks based on different libraries for easy math interoperation. The official forks include SlimDX, SharpDX, and the dependency free fork (and of course the main development fork, which is currently XNA). The dependency free fork has its own math systems and does not depend on anything beyond .NET or Mono's libraries.

As mentioned/hidden away in the version roadmap, one of the next two major packaged releases of BEPUphysics will move over to the dependency free fork. Already, you can see progress in this direction; the dependency free math utilities have been reorganized and expanded. Internally, we are already using the dependency free fork utilities for our projects.

Expect the swap to occur in the next six months as my procrastination is overridden by 1) internal development and 2) the fact that I haven't released a proper packaged version since May 2012. (The latter of which has lead to some confusion about whether development has halted- development continues and will continue!)

The most likely target frameworks for the rewritten demos will be either SharpDX or, for wider use, MonoGame.

There have also been some discussions about WinRT versions of BEPUphysics. WinRT is not a targeted platform for our internal projects, so it's a lower priority. However, there are a variety of little annoyances and API changes (particularly with threading) which make it nontrivial for people to pull it into WinRT projects; it would be nice to solve this in one spot so there isn't a bunch of redundant work being done. I will likely get around to it eventually- assuming someone else doesn't maintain a fork for it :) Anyone? :) :)   :)   :) :) :)      :) :) :)    <:)

 

Don't forget to follow me on twitter so you can see a notification on twitter about this blog post!

BEPUik for Blender now available!

An early version of BEPUik, the full body inverse kinematics add on for blender, is now available.

Check out the announcement thread over on blender artists for more.

BEPUik started out as an experimental addition to BEPUphysics (currently available in the development fork):

This system was then ported and integrated with Blender. The result:

Finally and least importantly, I have a twitter account which has more than zero tweets on it now. Squashwell has one too.

BEPUphysics and hyperthreading

BEPUphysics v1.2.0 includes some improvements to multithreaded scaling, particularly in the DynamicHierarchy broad phase with certain core counts. Given that I recently/finally upgraded my CPU, it seemed appropriate to investigate scaling in the presence of hyperthreading.

3770K

[Before continuing, note that all multithreading comparisons in this article are slightly biased towards the single threaded case. Single threaded cases can bypass the overhead of dispatching work entirely, so the jump from one to two threads isn't entirely apples to apples...

...And I may have forgotten to turn off turbo boost in certain tests using the 3770K; oops. Fortunately, the base clock is 4.5ghz and the boosted frequency is 4.6ghz, so the difference should be no more than 2% or so on the 1 core and sometimes 2 core cases.]

Given the improvements, it seems natural to start with the broad phase. This test uniformly distributes thousands of boxes in a volume and then runs broad phase updates many thousands of times for different thread counts. This is running on the quad core hyperthreaded 3770K.

That's almost five times faster on a processor with four physical cores. A decent improvement over the old results in v0.16.0 on my Q6600!

After four threads, the gains slow down as expected. However, using all 8 available threads still manages to be 41% faster than 4 threads.

How about the running the same test on the 3770K with hyperthreading disabled? 

It's roughly equivalent with the first four hyperthreaded results, within error imposed by differing process environments. That's good; the thread scheduling is handled effectively by Windows 7 when hyperthreading is enabled. The final non-hyperthreading speedup is around 3.5 times. Not bad for 4 threads, but not as good as hyperthreading.

Now for some full simulations! The following test measures a few hundred/thousand time steps of three different configurations. The full time for each simulation in milliseconds is recorded.

The first simulation is 5000 boxes falling out of the sky onto a pile of objects. This starts out primarily stressing the broad phase before the pile-up begins.  Then, as the pile grows and thousands of collision pairs are created, it stresses some bookkeeping code's potential sequential bottlenecks, the narrow phase, and the solver. It runs for 700 time steps.

The second simulation is a big wall built of 4000 boxes. This mostly stresses the solver, with narrow phase second and broad phase third. It runs for 800 time steps.

The final simulation is 15625 boxes orbiting a planet (like the PlanetDemo) for 3000 time steps.

All of the simulations can run in real time, but the tests simulate the time steps as fast as they can calculate without any frame delays. 

First, with hyperthreading: 


Somewhat surprisingly, they all scale similarly. I was expecting the solver-heavy simulations to suffer a bit due to the added contention. Something is stopping the scaling from doing quite as well as the broad phase alone, though; the scaling ranges from 3.5 (Planet) to 3.8 (Pile).  The Planet scaling being lower is quite interesting since a large portion of that simulation should be the high-scaling broad phase.

Without hyperthreading:

The scalings range from 2.91 to 3.08 times faster at 4 threads. That's about the same as the 4 threads in hyperthreading. Once again, the Pile has the best scaling and the Planet the worst scaling for currently unknown reasons.

The consistency in scaling between different simulations is promising; it is evidence that there aren't any huge potholes in simulation type to watch out for when going for maximum performance.

Xbox360

We'll start with DynamicHierarchy tests again for the Xbox360. If I remember correctly, these results were on a smaller set of objects than was run on the 3770K (since a 3770K core is far faster than an Xbox360 core).

Not quite as impressive as the 3770K's scaling, but it still gets to around twice as fast. The important thing to note here is the speed boost offered by the usage of that final hardware thread. (The Xbox360 only has 3 physical cores with 6 hardware threads distributed between them. Two hardware threads, one on each of the first two physical cores, are reserved and cannot be used from within XNA projects.)

Despite theoretically stressing the load balancer more, loading up that last physical core with both hardware threads appears to be a win.

How about the full simulations? (Note that these were reduced in size a bit because the Xbox360 had a habit of getting too hot and trying to take a nap after thousands of reruns.)

Once again not quite as great as the 3770K's scaling. However, the Wall manages a respectable 2.3 times faster, providing more evidence that the parallel solver scales better than expected.

The Pile reverses its performance on the 3770K and shows a case where throwing as many threads as possible at a problem isn't the best option in all cases. It's within error (taking a little less than 4% longer to complete), but it's obviously not a clear victory. This suggests it's worth testing both 3 and 4 threads to see what behaves better for your simulation.

3930K

To the next platform: a 3930K running at 4.3ghz! Once again, we'll start with the DynamicHierarchy test. This is the same test that ran on the 3770K.

While we don't see scaling higher than the physical core count this time, it still gets to a decent 5.75 times faster. Interestingly, the single threaded time falls right in line with expectations relative to the 3770K; it takes 7% longer due to the 300 mhz speed difference, and another 5% or so due to the architecture improvements in Ivy Bridge over Sandy Bridge E.

Part of the difficulty here in getting the same kind of usage as 8 cores did on the 3770K may be the binary form of the dynamic hierarchy.  It naturally runs better on systems with core counts that are a power of two. v1.2.0 tries to compensate for this, but it can only help so much before single threaded bottlenecking eliminates any threading gains. Does anyone out there have an 8 core hyperthreaded Xeon or dual processor setup to test this theory? :)

Now for the most interesting test:

What is going on here? The solver-heavy Wall once again is the unexpected bastion of stable and robust scaling, reaching 4.2 times faster. It doesn't benefit from the final two threads, though. The other two simulations seem to have a seizure after 8 threads.

When I first saw these numbers, I assumed there had to be some environmental cause, but I couldn't locate any. Unfortunately, since the 3930K is not mine, I was unable to spend a long time searching. Hopefully I'll get some more testing time soon.

Wrap Up

Try using all the available threads. It seems to help most of the time. If the platform a has a huge number of cores, tread carefully, test, and tell me how it looks!

I don't have access to any recent AMD processors, so if any of you Bulldozers/soon-to-be-Piledrivers want to give these tests a shot, it could be informative. The tests used to generate the data in this post are in the BEPUphysicsDemos (MultithreadedScalingTestDemo and BroadPhaseMultithreadingTestDemo).

I still suspect some type of shenanigans on the 3930K simulations test, but the evidence has shown there are some aspects of threading behavior in BEPUphysics that I did not model properly. The solver seems to be a consistently good scaler. The question then becomes, if the DynamicHierarchy and Solver both scale well, what else in the largely embarrassingly-parallel loopfest of the engine is keeping things from scaling higher?

The answer to that will have to wait for more investigation and another post, to be completed hopefully before hexacores, octocores, and beyond are all over the place!

Questions or comments? Head to the forums

Shuffling some webstuff around

We'll be shuffling some data and domains to a new host over the coming weeks.

If the forum is inaccessible, try accessing it directly at http://bepu.nfshost.com/forum.

If this blog is inaccessible, try accessing it directly at /.

The Codeplex page at bepuphysics.codeplex.com will be free of any major changes (apart from engine updates, of course!).

BEPUphysics v1.0.0: XNA, SlimDX, and everything else!

As always, grab the newest version on codeplex.

In addition to the regular upgrades, this version comes with updates to the previously available SlimDX fork.

Don't want to use XNA or SlimDX? No problem! There's now a dependency free version as well! The BEPUphysicsDemos project used to test the dependency-free library still relies on XNA, but the library itself is completely independent.

Here's some jazzy boxswarm-related entertainment, filmed in v1.0.0:

SlimDX Prototype

Want to use BEPUphysics, but with SlimDX instead of XNA?

Good news!

A quickly-made experimental port is now available on codeplex.  Anyone who wants to check it out, test it, fix it, or maintain it for future feature parity with the XNA version is invited to go grab it!

I'd like to get the community involved in maintaining projects like this; if you'd like to discuss it, head on over here

v0.16.2 Character Controller Details

With the release of v0.16.2 comes a variety of fixes, changes, and most importantly, a new character controller. It can be found in the BEPUphysicsDemos project in the main source download.  If you just want to play around with it, the BEPUphysicsDemos comes precompiled on the codeplex downloads page.

As their name implies, character controllers control characters. The way they move and stand on things is defined by a set of rules. Usually, these rules include things like 'don't walk up a steep slope,' and 'step up/down when walking on stairs.' Their most familiar usage is in FPS games.

There are many possible approaches to developing a character controller depending on the needs of the game. The following shows a spectrum of character designs with increasingly complex behaviors.

Basic Physical Interaction

Some games may work perfectly with a dynamic sphere which is pushed around by forces and rolls everywhere. Calling such a thing a character controller is a stretch, but it is a reasonable starting point on the spectrum. These characters do not support anything like stepping or dynamics beyond baseline physical response. They rely on their round shape to smoothly slide or roll over obstacles.

Stripped Down Character

Sometimes, you don't want your character to be a sphere. Cylinders, capsules or boxes are typically a better fit for a standing bipedal form. In this case, you probably also don't want them falling over. This can be addressed by setting the local inertia tensor inverse of an entity to all zeroes, which is equivalent to having infinite inertia.

Simple Character Controller

Usually, the character also needs to be kept on the ground. Launching off of every single ramp can be pretty inconvenient. One simple way to handle this is to ray cast down to find the ground and conditionally remove separating velocity. That ray can also be used to support a basic form of up stepping and down stepping. With the addition of horizontal motion control to allow for configurable acceleration and speeds, the result is the SimpleCharacterController.

However, the SimpleCharacterController has a couple of problems. First, as mentioned, it does not try to protect itself against step ups. It has no way to recover from a bad step. Second, since it relies on a single ray cast, it can fall into tiny gaps. The CharacterControllerConvexCast addresses the second problem. Instead of a single, thin ray cast, it casts a disc down. However, it still has no way to abort a bad step-up.

Spherical Character Controller

Added in v1.1.0, the SphereCharacterController in the BEPUphysicsDemos strikes a balance between features, robustness, and speed.  It does not perform discontinuous stepping like the SimpleCharacterController and CharacterController.  Instead, it relies on its round shape to slide over obstacles.

However, you may find that the control systems in the SphereCharacterController (borrowed mostly from the full CharacterController) are more robust than the SimpleCharacterController.

The rotational symmetry of a sphere can be helpful sometimes, too.  If you are thinking about having a character capable of moving in 6 degrees of freedom, it's easier to handle with a rotationally invariant shape like a Sphere than a Cylinder.

Full Character Controller

To avoid getting into invalid states completely, an approach different from the SimpleCharacterController was needed. The new CharacterController abandons the usage of a single supporting cast. Instead, the contacts created between the body and the environment are analyzed. Combined with ray casts and testing candidate locations with shape queries, the character supports crouching and stepping without any danger of getting into an invalid state.

The new CharacterController is used by default in the BEPUphysicsDemos. It, along with a character playground demo, are both available in the BEPUphysicsDemos project in the main source download.

Picking the Right Character

If your game doesn't need traditional walking, jumping, or stepping, then using the simplest possible option of a pushable dynamic object is a good starting point.

If the game needs something closer to normal character control, but can guarantee that the game's level design and content doesn't require tall steps, the SphereCharacterController would be a good choice. It's a little simpler than the full CharacterController and, by virtue of being a sphere and having fewer features, it's a little faster.

If the game needs a full-fledged FPS-style character, then the CharacterController should be used. It is somewhat more expensive due to the extra verification it does to avoid getting into invalid states.

Advanced Usage and Modification

All characters are based on dynamic objects. This has the benefit of providing interaction with other dynamic objects and the environment without additional effort. The full CharacterController uses specialized physical constraints to manage all the primary movement behaviors for extra robustness in the presence of complex physical simulations. Due to being dynamic, these characters are influenced by gravity and can be constrained in other ways. When constraining dynamic characters capable of stepping, watch out; the step process involves a discontinuous teleportation. The constraint will correct the error pretty quickly, but not instantly.

The SimpleCharacterController is conceptually easy to modify thanks to its "capsule on a stick" nature. However, it is possible to add behaviors to the full CharacterController. Double jumping, for example, could be implemented by adding a jump counter and changing the conditions under which jumping is permitted. Ladders, climbing, and hanging require a bit more work, but the CharacterController provides a QueryManager to assist in testing nearby objects with collision shapes and rays.

The CharacterController's existing behaviors can also be modified. Rules for support can be changed in the SupportFinder and the shape could be modified if necessary. If you'd like to use most, but not all, of the CharacterController, individual behaviors can be disabled without much problem for extra speed.

Edited on July 26, 2012 to reflect new stuff and to discourage the use of the SimpleCharacterController in favor of the SphereCharacterController and CharacterController.

v0.16.0 Broad Phase Benchmarks

Welcome to the first BEPUphysics development blog post! Today's post is about broad phases. The BroadPhase (accessible in the Space.BroadPhase property) prunes object collisions based on their axis-aligned bounding boxes. Some also accelerate queries, like ray casts and bounding box tests. In v0.16.0, the main broad phase has been rewritten and a few new options are available. The data in this post was collected on the latest development version, which can always be downloaded on the development fork over on codeplex: http://bepuphysics.codeplex.com/SourceControl/network/Forks/RossNordby/Development
Read More

Welcome!

Welcome to the first post of the new BEPU website!  BEPUphysics has transmogrified into a free and open source physics library, hosted on Codeplex.

If your mind is exploding with questions, you may want to head on o'er to the FAQ section.  If your question isn't answered there, ask it on the forums . . . we're always listening . . .