
New release of Haskell compiler improves parallel support

14/12/2009

A new version of the Glasgow Haskell Compiler (GHC 6.12.1) was released today and is available for download here. It supports Linux, Windows 2000/XP/Vista (and possibly Windows 7 as well) and Mac OS X.

I’ve blogged previously about why Haskell is great for multicore programming. This latest version of GHC incorporates changes that improve the performance of parallel programs. The release also makes some language changes (and some features are now marked as deprecated, with support to be withdrawn in a future release), but these do not appear to affect parallel programming adversely.

Here are some edited highlights from the full release notes:

  • Considerably improved support for parallel execution. GHC 6.10 would execute parallel Haskell programs, but performance was often not very good. Simon Marlow has done lots of performance tuning in 6.12, removing many of the accidental (and largely invisible) gotchas that made parallel programs run slowly.
  • As part of this parallel-performance tuning, Satnam Singh and Simon Marlow have developed ThreadScope, a GUI that lets you see what is going on inside your parallel program. It’s a huge step forward from “It takes 4 seconds with 1 processor, and 3 seconds with 8 processors; now what?”. ThreadScope will be released separately from GHC, but at more or less the same time as GHC 6.12.
  • The flag +RTS -N now automatically determines how many threads to use, based on the number of CPUs in your machine.
  • The parallel GC now uses the same threads as the mutator, with the consequence that you can no longer select a different number of threads to use for GC. The -g<n> RTS option has been removed, except that -g1 is still accepted for backwards compatibility.
  • The new flag +RTS -qg<gen> sets the minimum generation for which parallel garbage collection is used. Defaults to 1. The flag -qg on its own disables parallel GC.
  • The new flag +RTS -qb<gen> controls load balancing in the parallel GC. (There are more details on all these garbage collection options here.)
  • There is a new statistic in the +RTS -s output that tells you how many sparks (requests for parallel execution, caused by calls to par) were created, how many were actually evaluated in parallel (converted), and how many were found to be already evaluated and were thus discarded (pruned). Any unaccounted-for sparks are simply discarded at the end of evaluation.
  • There is a new flag -feager-blackholing which typically gives better performing code when running with multiple threads. See “Compile-time options for SMP parallelism” for more information.
  • The threadsafe foreign import safety level is now deprecated; use safe instead.
  • The new flag +RTS -qa uses the OS to set thread affinity (experimental).

Multicore and programming artificial intelligence (AI) for games

14/12/2009

At their best, games are the most demanding popular applications on PCs today. They push a machine’s graphics, sound and processing capacity to its limits in an attempt to create a realistic, immersive environment.

While console games can rely on a standard architecture (all PS3 or Xbox 360 games run on the same hardware), PC games run on a wide range of different hardware configurations. One of the challenges in game design and programming will be adapting to the number of cores and threads available for processing.

An extreme solution is to limit the game to running only on particular hardware, but that involves a compromise which is unlikely to work. If games are designed for the lowest common denominator, players with higher-spec machines are unlikely to be satisfied with the experience offered. If a higher threshold for compatibility is set, the market shrinks (at least in the short term), possibly to the point where development is no longer viable.

One way that games have scaled is to offer richer and more detailed graphics where the processor resources are available to support that. At IDF, I saw Intel’s Smoke demonstration of galloping horses (video here). The configuration running at the show fixed the frame rate and then scaled the number of horses (or in-game characters) in line with the available resources. It had the peculiar result of making horses appear and vanish (which would need concealing or justifying in the game world if used in a game). It did show, though, how the detail can be scaled up not just in the background but also in the number and arrangement of non-player characters.
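
As a rough sketch of that "fix the frame rate, scale the characters" idea, here is a small feedback controller. It is a hypothetical illustration rather than code from the Smoke demo: the function names, the 80% headroom threshold and the made-up per-character cost are all assumptions. The gap between the "add" and "remove" thresholds is there to stop characters popping in and out on every frame.

```python
def adjust_entity_count(count, frame_ms, budget_ms, step=1, min_count=1, max_count=1000):
    """Return the number of characters to draw next frame.

    Removes characters when the last frame overran its time budget, adds them
    when it finished with clear headroom, and otherwise leaves the count alone.
    """
    if frame_ms > budget_ms:
        count -= step            # overran: drop characters to protect the frame rate
    elif frame_ms < 0.8 * budget_ms:
        count += step            # clear headroom (assumed 80% threshold): add characters
    return max(min_count, min(max_count, count))


def simulate(cost_per_entity_ms, budget_ms, frames=200, start=1):
    """Run the controller on an imaginary machine where each character costs a fixed time."""
    count = start
    for _ in range(frames):
        frame_ms = count * cost_per_entity_ms   # stand-in for a measured frame time
        count = adjust_entity_count(count, frame_ms, budget_ms)
    return count


if __name__ == "__main__":
    budget = 1000 / 30   # time per frame at 30 frames per second, in milliseconds
    print("slower machine:", simulate(cost_per_entity_ms=2.0, budget_ms=budget))
    print("faster machine:", simulate(cost_per_entity_ms=0.5, budget_ms=budget))
```

On the faster imaginary machine the controller settles on more characters at the same frame rate. A real game would measure frame times rather than compute them.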

A new white paper has been published which hints further at how artificial intelligence can be threaded. It’s part of a series about designing artificial intelligence for games using the Intel Media Software Development Kit. Part one (PDF) provides an architectural overview of the new API for the Intel Media SDK and how it can be used to incorporate AI into game design. Part two (PDF) looks at how AI algorithms can be designed for non-player characters, including how characters can perceive and navigate their environments. It mentions two algorithms which can be multithreaded to keep the game running smoothly:

  • Crash and turn navigation: where the AI agent moves towards its target, turns when it faces a fork, and when it hits an obstacle backtracks to the last turn and tries another direction. This lightweight method enables a lot of characters to be supported without slowing the game down. It lends itself well to multithreading, but does result in each character creating its own map of the environment, unless the map is shared and a dedicated module provides access to it to avoid data conflicts.
  • Path finding using the A* algorithm: this works by testing nodes in turn (preferring those closest to the goal), and then testing each node’s neighbours, building up a tree of connected nodes that eventually maps the route to the goal. Because the process only takes place when a character decides to move, a character can send a request to a path thread and wait until the path is returned without affecting the AI performance. If the path thread is overloaded, the character can start to make its way towards the goal (using crash and turn perhaps) until the path is returned.
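
To make the path-finding idea more concrete, here is a minimal sketch of A* on a small grid, with the search handed to a worker thread while the character keeps moving. It is an illustration under assumptions, not the white paper's code: the grid, the function names and the Manhattan-distance heuristic are made up for the example. The `greedy_step` fallback is a deliberately simplified stand-in for crash and turn (it has no backtracking).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical map: '#' is a wall, '.' is open floor.
GRID = [
    "....#",
    ".##.#",
    ".#...",
    ".#.#.",
    "...#.",
]


def neighbours(cell):
    """Yield the open cells directly above, below, left and right of `cell`."""
    row, col = cell
    for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
        if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#":
            yield (r, c)


def manhattan(a, b):
    """Heuristic: grid distance to the goal, ignoring walls."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(start, goal):
    """Return a shortest list of cells from start to goal, or None if unreachable."""
    frontier = [(manhattan(start, goal), 0, start)]   # (estimated total, cost so far, cell)
    came_from = {start: None}
    best_cost = {start: 0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:       # walk the parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue                      # stale queue entry: a cheaper route was found
        for nxt in neighbours(cell):
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(frontier, (new_cost + manhattan(nxt, goal), new_cost, nxt))
    return None


def greedy_step(cell, goal):
    """Stopgap move: step to the open neighbour nearest the goal (stay put if none)."""
    return min(neighbours(cell), key=lambda n: manhattan(n, goal), default=cell)


if __name__ == "__main__":
    start, goal = (0, 0), (4, 4)
    with ThreadPoolExecutor(max_workers=1) as path_thread:
        request = path_thread.submit(a_star, start, goal)   # ask the path thread
        position = start
        while not request.done():                           # keep moving in the meantime
            position = greedy_step(position, goal)
        # A fuller version would re-plan from `position`; this sketch just reports the route.
        print(request.result())
```

The priority queue is what gives the "preferring those closest to the goal" behaviour. The single-worker pool plays the part of the path thread, and `request.done()` lets the character check for a result without blocking.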

Once the AI processing is threaded, it should become possible to scale the number of intelligent agents in the game in line with the cores and threads available. One of the challenges will be to ensure that the game remains fair: if those using machines with more cores face more enemies, will the pay-off of higher potential scores be enough to justify greater difficulty? Will it be necessary to provide user-controlled difficulty levels, with the more demanding levels only available to those whose hardware can support them? There are a number of questions like these still to be resolved, but it will be interesting to see how game design evolves to take account of the resources available.
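
One possible answer to the fairness question can be sketched as follows: the number of enemies is tied to the difficulty level the player chooses, and the hardware only decides which levels are offered. Everything here is hypothetical, including the level names, the agents-per-core figure and the core reserved for rendering and other work.

```python
import os

# Hypothetical difficulty levels and the number of AI agents each one uses.
DIFFICULTY_AGENTS = {"easy": 8, "normal": 16, "hard": 32}


def agent_capacity(cores, agents_per_core=8, reserved_cores=1):
    """Estimate how many agents a machine can run, keeping some cores for other work."""
    usable = max(1, cores - reserved_cores)
    return usable * agents_per_core


def available_difficulties(cores):
    """Return the difficulty levels whose agent count fits this machine's capacity."""
    capacity = agent_capacity(cores)
    return [name for name, needed in DIFFICULTY_AGENTS.items() if needed <= capacity]


def agents_for(difficulty, cores):
    """The agent count follows the chosen difficulty, not the hardware directly."""
    if difficulty not in available_difficulties(cores):
        raise ValueError(f"{difficulty!r} needs more cores than this machine has")
    return DIFFICULTY_AGENTS[difficulty]


if __name__ == "__main__":
    cores = os.cpu_count() or 1   # cpu_count() can return None
    print(cores, "logical CPUs:", available_difficulties(cores))
```

With this arrangement, two players on the same difficulty level face the same number of enemies whatever their hardware, which keeps scores comparable.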


Netbook World Summit: Scott Apeland talks up Intel’s SDK release

10/12/2009

I managed to get a couple of minutes with Director of Intel’s Developer Division, Scott Apeland, at Netbook World Summit earlier this week.

In this video, Apeland gives an overview of the Intel Atom Developer Program, the latest beta SDK release, and what Intel has in store for developers next year…
