Mobile World Congress: the highlights for developers

09/02/2010

I’ll be blogging from Mobile World Congress in Barcelona next week, and I thought I’d share some highlights from the programme for programmers.

Of particular interest this year is the new App Planet area dedicated to mobile applications. As well as developer days dedicated to the BlackBerry and Android platforms (among others), Intel will be running a comprehensive programme of events at its booth (7A49 in Hall 7).

A series of Meet Mobility discussions will be held. On Tuesday at 11am, the topic will be hardware and how apps work within the netbook segment (8-12″ screens). Later that same day, at 2pm, there will be a discussion of the different OS options available, including Moblin, Windows XP and Windows 7. This session will also present the Intel AppUp Center and the Intel Atom Developer Program, which enables programmers to distribute their netbook apps to end customers.

On Wednesday, Scott Apeland will be on the stand at 11am to reveal the latest details of the Intel Atom Developer Program. You’re invited to attend and put your questions to him. At 2pm on Wednesday, the final session will be a podcast recording. You can find details of all these sessions on the Meet Mobility site.

If you’re not familiar with the Moblin operating system, which is a Linux variant that is optimised for netbooks, the Intel booth will provide many opportunities to see what it’s capable of, first-hand. ISVs presenting their Moblin applications include Eyesight, JayCut, Fluendo, Popcatcher, Fring, Igalia, Scalado, Digital Chocolate and Abby. The ISVs are taking it in turns to be on the stand over the course of the congress, so it’s worth dropping by repeatedly to see the full range of Moblin capabilities.

The congress programme itself is bursting with ideas as ever. If you’re a programmer, I think the most interesting sessions will be:

  • Monday morning’s session on mobile applications (innovation vs fragmentation). I expect this will provide some interesting pointers on how to choose the right platform(s) for development.
  • The Mobile Innovation Grand Prix, taking place on Monday afternoon. There should be lots of good ideas for what makes a compelling app here.
  • Completing the 2.0 Reality with Mobile: two sessions on social networking and mobile applications, taking place on Tuesday afternoon.
  • Live keynote with Google CEO Eric Schmidt, outlining his vision for mobile. This takes place on Tuesday, starting at 5.45.
  • Wednesday’s keynote ‘A world of applications’, which will take a mile-high view of the apps economy.
  • Wednesday’s second keynote on mobile entertainment and lifestyle, which will discuss the unique contribution that mobile can make to interactivity, creativity and ubiquity of technology.

To find out more, view the official agenda on the Mobile World Congress website. I’ll be tweeting from Mobile World Congress as well next week, so don’t forget to follow me there too. If you’ve spotted something else in the programme you’d like to recommend, feel free to leave a comment below.

DevMob 2010: Revealing the single biggest programming inefficiency

09/02/2010

What’s the biggest inefficiency in software? Opinions were divided at DevMob 2010, last week’s unconference at the Science Museum. Richard Broadhurst from games house The Creative Assembly said that their software spends a lot of time waiting for the video card to respond. File I/O was cited as a bottleneck by someone else, and one attendee was cruel enough to suggest that users were a major cause of delays. If only they could type at the same speed the computer can read their keypresses, the computer wouldn’t have to waste any time waiting for them!

Stephen Blair-Chappell, leading the session, said that the single biggest inefficiency he sees in all the code he reviews for Intel customers is… the cache miss. Most programs have this problem, he said, which arises when a program can’t find the data it needs in the level 1 or level 2 cache. These caches are small, fast memories that sit close to the CPU and are designed to deliver data to it as quickly as possible. Data moves from main memory into the level 2 cache, and then into the level 1 cache, to minimise the delay to the processor. Reading data from main memory is at least 100 times slower, so if it happens frequently, it can be a major cause of slowdown.
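To make the idea concrete, here is a minimal sketch in C (not code from the session) of two functions that add up the same grid of numbers. The function names and the flat, row-by-row layout are assumptions chosen for illustration. The first walks memory in the order it is stored, so neighbouring values arrive together in each cache line. The second jumps a whole row ahead on every read, which on a large grid tends to cause far more cache misses.

```c
#include <assert.h>
#include <stddef.h>

/* Sums a rows x cols grid stored row by row in one flat array.
   Reads follow the storage order, so consecutive reads usually hit
   data that is already in the cache. */
long sum_row_major(const int *grid, size_t rows, size_t cols)
{
    assert(grid != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += grid[r * cols + c];
    return total;
}

/* Same result, but the inner loop steps down a column, so each read
   lands cols * sizeof(int) bytes after the previous one. */
long sum_column_major(const int *grid, size_t rows, size_t cols)
{
    assert(grid != NULL);
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += grid[r * cols + c];
    return total;
}
```

Both functions return the same total. Any difference in run time on a large grid comes from the order in which they touch memory, and it is worth measuring on your own hardware rather than assuming a particular figure.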

Blair-Chappell recommends using tools like Intel VTune Performance Analyzer to identify cache misses in the first instance.

There are various solutions to cache misses. The processor uses prefetching all the time, trying to guess what data the program will need next and to ensure that it is available in the cache. Programmers can also put prefetch instructions into their code, so that if they know certain data will be required in 200 loop iterations, they can start to prepare the cache now. The compiler can help to lay out code so that data items live in the cache longer too.
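As a rough sketch of what a programmer-inserted prefetch can look like, the C function below sums values that are visited in an unpredictable order, which is the kind of access pattern the processor struggles to guess. It assumes a GCC or Clang compiler, whose `__builtin_prefetch` builtin is used here; other compilers spell this differently. The function name and `PREFETCH_DISTANCE` are hypothetical, and the right distance can only be found by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical look-ahead, in loop iterations. Tune by measurement. */
#define PREFETCH_DISTANCE 16

/* Sums values[order[0]], values[order[1]], ... while asking the cache
   to start loading the value that will be needed PREFETCH_DISTANCE
   iterations from now. */
long sum_indexed(const int *values, const size_t *order, size_t count)
{
    assert(values != NULL && order != NULL);
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        if (i + PREFETCH_DISTANCE < count) {
            /* GCC/Clang builtin: 0 = prefetch for reading,
               1 = low expected reuse. It is only a hint. */
            __builtin_prefetch(&values[order[i + PREFETCH_DISTANCE]], 0, 1);
        }
        total += values[order[i]];
    }
    return total;
}
```

The prefetch never changes the result, only when the data arrives. A badly chosen distance can make things slower, so it is worth profiling before and after.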

DevMob 2010: How can the compiler optimise parallel code?

08/02/2010

There was so much good stuff to come out of last week’s DevMob 2010 unconference that I wanted to share a couple of ideas this week too. I’ve already mentioned Stephen Blair-Chappell’s session on code optimisation, but there were a few more tips for parallel programmers that I’d like to share.

Blair-Chappell discussed how the compiler can help to optimise parallel code, with reference to two features in the Intel Compiler. The first was the auto-parallelisation option. “I’m not sure whether this is not a very mature technology, or whether it’s just something that’s very hard to do,” he said. “In 90% of cases, this feature doesn’t make any difference at all. It’s always a good experiment to try to run the parallelisation option to see whether you get a benefit, but you need to start thinking about other strategies too because auto-parallelisation doesn’t usually bring any benefit.”

One of those alternative strategies is automatic vectorisation. Vectorisation enables you to issue a single instruction that performs the same calculation on several pieces of data at the same time. On Intel processors this is provided by SSE (Streaming SIMD Extensions), which offers 128-bit registers that can each hold several smaller data items, such as four 32-bit floats, and process them in parallel. There’s more information and a nice example of vectorisation at Wikipedia.

The autovectorisation feature in the compiler can spot where you have a loop that can be processed more efficiently using SSE instructions and fewer loop iterations. It automatically modifies the loop counter as well as updating the instructions in the loop. “This can provide a significant benefit,” said Blair-Chappell. “I have seen programs run ten times faster just by switching this on. Another company in the oil exploration industry turned this option on and their code ran twice as fast without them doing anything else.”
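To give a feel for the transformation, here is a hand-written sketch in C of an ordinary loop next to an SSE version of the same loop. It is an illustration under assumptions, not the compiler’s actual output: the function names are made up for this example, it uses the standard SSE intrinsics from `<xmmintrin.h>`, and it only builds for x86 processors. Note how the vectorised loop counter advances four elements at a time, with a short scalar loop to mop up any leftovers.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics (x86 only) */

/* The loop as you would normally write it: one addition per iteration. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* A hand-vectorised equivalent: each iteration loads four floats from
   each input into a 128-bit register and adds them with one instruction. */
void add_sse(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
    }
    /* Handle the elements left over when n is not a multiple of four. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

With autovectorisation switched on, the idea is that you keep writing the first version and let the compiler produce something along the lines of the second.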

Depending on the accuracy you need for your maths and whether you are performing floating point calculations, your own programs might not exhibit such rapid acceleration, Blair-Chappell warned. But the feature is quick to use, so why not give it a go? And if you have any success, feel free to write about it in the comments below.