LaptopVideo2Go Forums

CUDA vs Quad Core Performance Test


Guest ptrein

Changes

  1. Now supports the Intel OpenCL SDK [1], [2]
  2. Multi-GPU parallel compute... again [3]

[1] The Intel OpenCL SDK supports Intel Core2 and better processors. It can be found here. Since this is "alpha" software, install it only if you agree to all the Intel disclaimers.

[2] In the screenshot below, Platform 1 Device 0 is the CPU running SSE4.1-optimized code on all eight logical processors using Intel's OpenCL SDK. The speed-up is astonishing... 25.204976x over sequential, 6.509673x over parallel, and 2.742596x over the SSE3 implementation using ATI Stream SDK 2.3!

[3] Multi-GPU parallel compute is highly dependent on the OpenCL driver's implementation of asynchronous calls. I've been seeing slightly more consistent results since upgrading my drivers to 266.35, but your mileage will vary. Notice the processing time for Platform 0 Device * :)

matrixmultint_1.3.7_i7-950_a2_s1536.png

For those interested, the latest version of the program can be found here (link expires 1/14). Links to old versions have been removed.

mobilenvidia

I'll give 'er a go as soon as I'm back home.

Sandy Bridge looks highly optimised; soon we won't need GPUs :)

ptrein

Actually, if Project Denver goes according to plan and gets realized in 2013 with Maxwell, it might mean an inversion of the whole CPU-GPU relationship as we know it today. Imagine a system with a large GPU with 4-8 embedded CPU cores and a unified caching architecture (no more blocking issues, no more transfer overhead)... that would be sweet indeed! :)

Bill

While it remains to be seen how fast the 4-8 core Cortex A15 will be, my Cortex A9 dev board doesn't yet compete with even a Core 2 CPU. It will be a while before we can replace our 4-5 GHz Sandy Bridge PCs with ARM CPUs. These new ARM platforms are gaining good ground, though.

mobilenvidia
Algorithm: Strassen's, Matrix Size: 1536x1536

OpenCL Platforms: 2, OpenCL Devices: 2

Platform 0 Device 0: GeForce GTX 470, Compute Units: 14

Platform 1 Device 0: Intel® Xeon® CPU X3470 @ 2.93GHz, Compute Units: 8

CPU ID: Intel64 Family 6 Model 30 Stepping 5, Cores: 4, Logical Processors: 8

Processing time for Core 0 (sequential): 8991.8233 ms

Processing time for Core 1 (sequential): 8726.1618 ms

Processing time for Core 2 (sequential): 8535.7822 ms

Processing time for Core 3 (sequential): 8535.2124 ms

Processing time for Core * (parallel HT): 2371.2577 ms

Processing time for Core * (parallel !HT): 2503.7507 ms

Processing time for Platform 0 Device 0 (OpenCL): 35.2478 ms

Processing time for Platform 1 Device 0 (OpenCL): 1115.0161 ms

Parallel speed-up: 3.599445, Efficiency: 89.986133% ( HT)

Parallel speed-up: 3.408971, Efficiency: 85.224264% (!HT)

OpenCL CPU speed-up: 7.654788 (sequential), 2.126658 (parallel)

OpenCL GPU speed-up: 242.148798 (sequential), 67.273921 (parallel)

My results with stock X3470
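
For anyone wanting to check the derived figures, here is a minimal Python sketch that recomputes them from the raw timings above. It assumes the program uses the fastest sequential core time as the baseline, the HT run as the parallel baseline for the OpenCL comparisons, and the number of physical cores (4) for efficiency. Those assumptions are inferred from the posted numbers, not taken from the program's source, and all variable names are made up for illustration.

```python
# Timings in milliseconds, copied from the results posted above.
sequential_ms = [8991.8233, 8726.1618, 8535.7822, 8535.2124]
parallel_ht_ms = 2371.2577      # all cores, Hyper-Threading on
parallel_no_ht_ms = 2503.7507   # all cores, Hyper-Threading off
opencl_gpu_ms = 35.2478         # Platform 0 Device 0 (GeForce GTX 470)
opencl_cpu_ms = 1115.0161       # Platform 1 Device 0 (Xeon X3470)
physical_cores = 4


def speedup(baseline_ms, measured_ms):
    """Speed-up is how many times faster the measured run is than the baseline."""
    return baseline_ms / measured_ms


# Assumption: the baseline is the fastest single-core run, which matches
# the posted figures to six decimal places.
best_sequential_ms = min(sequential_ms)

results = {
    "parallel speed-up (HT)": speedup(best_sequential_ms, parallel_ht_ms),
    "parallel speed-up (!HT)": speedup(best_sequential_ms, parallel_no_ht_ms),
    "OpenCL CPU vs sequential": speedup(best_sequential_ms, opencl_cpu_ms),
    "OpenCL CPU vs parallel": speedup(parallel_ht_ms, opencl_cpu_ms),
    "OpenCL GPU vs sequential": speedup(best_sequential_ms, opencl_gpu_ms),
    "OpenCL GPU vs parallel": speedup(parallel_ht_ms, opencl_gpu_ms),
}

# Efficiency is the speed-up divided by the number of physical cores,
# expressed as a percentage.
efficiency_ht = 100.0 * results["parallel speed-up (HT)"] / physical_cores
efficiency_no_ht = 100.0 * results["parallel speed-up (!HT)"] / physical_cores

for name, value in results.items():
    print(f"{name}: {value:.6f}")
print(f"efficiency (HT): {efficiency_ht:.6f}%")
print(f"efficiency (!HT): {efficiency_no_ht:.6f}%")
```

Swapping in your own timings should make it easy to compare runs across machines on the same footing.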
