After playing for a while with Qt 4.4 on my N800 (the same applies to the N810), I am not surprised by the impression of "slow" (as in Adam’s latest post); I predicted it would come up sooner or later. My very brief investigation so far suggests that the cause is both simple and sad: Maemo’s X server runs with a 16-bit visual (the N8xx display is Highcolor only, not Truecolor). Thus, if you do something fancy with 32-bit data, such as running a QPainter on a Format_ARGB32 or Format_ARGB32_Premultiplied QImage, you basically force a 32-to-16 conversion every time you blit the image.

Take a look at PictureFlow, the infamous clone of Cover Flow. You can just check it out, run qmake and make as usual, and start having some fun (build everything inside Scratchbox, then transfer the result to the N8xx).

Surprisingly, it runs horribly slowly, probably at less than 5 fps. Considering that PictureFlow does its magic by straight pixel manipulation and then just blits the result, this looks really weird. However, it turns out that Qt needs to convert the 32-bit image to a 16-bit pixmap for every single frame, hence the terrible slowdown. In short, more CPU time is wasted on blitting than on rendering. Check your callgrind output to confirm this.
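
To make the cost concrete, here is a minimal sketch in plain C++ of the per-pixel work that a 32-to-16 conversion implies. It is not Qt's actual conversion code; the names `toRgb565` and `convertFrame` are hypothetical, and the sketch assumes opaque pixels (alpha is ignored) and simple truncation without dithering.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack one 0xAARRGGBB pixel into RGB565 by dropping the low bits of each
// channel. Alpha is ignored, i.e. the pixel is assumed to be opaque.
inline std::uint16_t toRgb565(std::uint32_t argb)
{
    const std::uint32_t r = (argb >> 16) & 0xFF;
    const std::uint32_t g = (argb >> 8) & 0xFF;
    const std::uint32_t b = argb & 0xFF;
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Convert a whole frame. If this runs for every blit, it touches every pixel
// of the image on every frame, on top of the rendering itself.
std::vector<std::uint16_t> convertFrame(const std::vector<std::uint32_t>& src)
{
    std::vector<std::uint16_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = toRgb565(src[i]);
    return dst;
}
```

Rendering straight into a 16-bit buffer, or converting an image once and reusing the result, would avoid repeating this loop per frame; whether that is practical depends on the application.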

Good news: this can be fixed (both at the application level and probably inside Qt). Personally, this is my favorite goal for Qt 4.5. Once we start to see Qt apps ported to Maemo, we can’t afford a potential performance penalty like this; it would just give the wrong impression of Qt. Don’t you agree?

Side note: 16-bit can be useful because, on a non-accelerated graphics system, you can perform full-screen updates faster (fewer bytes transferred). 32-bit, however, is easier to handle because each color component lives in a separate byte.
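
Both halves of that trade-off can be illustrated with a small sketch. The helper names (`frameBytes`, `red32`, `green565`) are hypothetical, and the 800×480 resolution used in the usage note is the N800's screen size.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes moved by one full-screen update at a given pixel size.
constexpr std::size_t frameBytes(std::size_t width, std::size_t height,
                                 std::size_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// 32-bit 0xAARRGGBB: each channel is its own byte, so a shift and a mask
// (or a plain byte load) is enough.
constexpr std::uint8_t red32(std::uint32_t argb)
{
    return static_cast<std::uint8_t>((argb >> 16) & 0xFF);
}

// 16-bit RGB565: green straddles the two bytes of the pixel, so it has to be
// shifted and masked out, and the result is only 6 bits wide.
constexpr std::uint8_t green565(std::uint16_t rgb565)
{
    return static_cast<std::uint8_t>((rgb565 >> 5) & 0x3F);
}
```

For an 800×480 screen, `frameBytes(800, 480, 2)` is 768,000 bytes while `frameBytes(800, 480, 4)` is 1,536,000 bytes, so a 16-bit full-screen update moves half as much data.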

  • http://www.blogger.com/profile/05943813284575224375 notriddle

    @ariya: Good news: this is something that can be fixed (both in the application level and probably inside Qt). Personally this one is my favorite aim for Qt 4.5. Once we start to see ported Qt apps for Maemo, we can’t afford to allow a potential performance penalty like this, it would just give a wrong impression of Qt. Don’t you agree?

    Just curious, how? If Qt internally uses 32-bit color, it can be set to internally use 16-bit color when the screen is 16-bit, but is there any other way?

  • http://negerihijau.blogspot.com ali

    Let’s exchange links, bro; I have already added this blog to my blog.

  • http://www.blogger.com/profile/03121582140059106015 Ariya Hidayat

    @notriddle: I’ll elaborate on the tricks once I manage to make PictureFlow fly on the N800.

  • http://www.blogger.com/profile/08817893714831737510 Gustavo Sverzut Barbieri

    Hey,

    Please check out my software_16 engine for Evas (Canola runs on it). I did lots of optimizations for 16bpp that you could use, including the best loop unrolling for each operation (based on tests/benchmarks).

    Also, don’t forget that if you take a minor visual hit and downscale alpha from 8 to 5 bits, you can do RGB565 alpha math using just one arithmetic operation: unpack the 16-bit pixel into a 32-bit register (move green to the higher half-word), leaving at least 5 bits of padding, do the math once (i.e. multiply, add), then mask and go back to 16 bits. ARM has shifted reads and GCC will produce those, so the shifts you need have little impact.

    But about Cover Flow, you had better remember that you must enable VFPU to make floating-point operations suck less.

    Glad to see Nokia working on such a mandatory thing! I have been hoping for that for years!
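
The packed RGB565 blend described in the comment above can be sketched roughly as follows. This is an illustration of the general technique, not code taken from Evas or Qt; the function names are hypothetical. It assumes a 0..32 alpha range (so that 32 means fully opaque) and uses two multiplies for clarity instead of the single-multiply form.

```cpp
#include <cassert>
#include <cstdint>

// Spread an RGB565 pixel across a 32-bit word: green moves to the upper
// half-word, leaving at least 5 zero bits above every channel.
inline std::uint32_t spread565(std::uint16_t p)
{
    return (p | (static_cast<std::uint32_t>(p) << 16)) & 0x07E0F81Fu;
}

// Drop the fractional bits left in the padding and fold back to 16 bits.
inline std::uint16_t pack565(std::uint32_t s)
{
    s &= 0x07E0F81Fu;
    return static_cast<std::uint16_t>(s | (s >> 16));
}

// Blend fg over bg; alpha is 0 (only bg) to 32 (only fg). All three channels
// are scaled by the same multiply because the padding absorbs the carries.
inline std::uint16_t blend565(std::uint16_t fg, std::uint16_t bg, std::uint32_t alpha)
{
    assert(alpha <= 32);
    const std::uint32_t f = spread565(fg);
    const std::uint32_t b = spread565(bg);
    return pack565((f * alpha + b * (32 - alpha)) >> 5);
}
```

An 8-bit alpha can be reduced to this range with something like `(a8 + 4) >> 3`, which is where the minor visual hit comes from.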

  • Anonymous

    Don’t forget that the maximum performance of Nokia’s framebuffer is 4.2M pixels per second. Writing every screen pixel for each frame is not optimal: that gives only 11 fps. If you want 24 fps animation, you must change only 176k pixels per frame, for example a surface of 800×230. In gtk/cairo I use lots of “clip” and “invalidate_rect”.
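
The arithmetic behind that comment can be written down as a small sketch. The 4.2M pixels per second figure is the commenter's and is taken as given here; the function names are hypothetical.

```cpp
// Frame rate achievable when every update rewrites `pixelsPerFrame` pixels
// and the framebuffer accepts `pixelsPerSecond` pixels per second.
constexpr double maxFps(double pixelsPerSecond, double pixelsPerFrame)
{
    return pixelsPerSecond / pixelsPerFrame;
}

// Largest number of pixels that may change per frame at a target frame rate.
constexpr double pixelBudget(double pixelsPerSecond, double targetFps)
{
    return pixelsPerSecond / targetFps;
}
```

With these, `maxFps(4.2e6, 800.0 * 480.0)` is about 10.9, and `pixelBudget(4.2e6, 24.0)` is 175,000 pixels. An 800×230 region is 184,000 pixels, so under the same assumption it would land slightly below 24 fps.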

  • http://www.blogger.com/profile/03121582140059106015 Ariya Hidayat

    @Gustavo: I think Qt’s own 16-bit raster engine is just fine. You can see that Qt for Embedded Linux performs quite well (running directly on the framebuffer) on the N8xx. It is just missing a few tweaks in the X11 paint engine.

    Also, VFPU is irrelevant here, as my PictureFlow does not need any FPU :-) It uses purely integer-based operations.

  • Dragan

    What’s the status of this in Qt 4.5?

  • Anonymous

    I read a post about v0.7 at http://forum.xda-developers.com/showthread.php?t=358049 and saw that the performance is better.

    Does anyone know how to download the source code for the PictureFlow-v0.7-Qt-WinCE project?

    Is there any ongoing effort on this project?

    Good job.
