LaptopVideo2Go Forums

CUDA vs Quad Core Performance Test


Bill

Guest ptrein

Changes

  1. Now supports the Intel OpenCL SDK [1][2]
  2. Multi-GPU parallel compute... again [3]

[1] The Intel OpenCL SDK supports Intel Core 2 and later processors. It can be found here. Since this is "alpha" software, install it only if you agree to all the Intel disclaimers.

[2] In the screenshot below, Platform 1 Device 0 is the CPU running SSE4.1-optimized code on all eight logical processors using Intel's OpenCL SDK. The speed-up is astonishing: 25.204976 over sequential, 6.509673 over parallel, and 2.742596 over the SSE3 implementation using ATI Stream SDK 2.3!

[3] Multi-GPU parallel compute is highly dependent on how the OpenCL driver implements asynchronous calls. I've been seeing slightly more consistent results since upgrading my drivers to 266.35, but your mileage will vary. Notice the processing time for Platform 0 Device * :)

[Screenshot: matrixmultint_1.3.7_i7-950_a2_s1536.png]

For those interested, the latest version of the program can be found here (link expires 1/14). Links to old versions have been removed.


  • 3 weeks later...

I'll give 'er a go as soon as I'm back home.

Sandy Bridge looks highly optimised; soon we won't need GPUs :)


Actually, if Project Denver goes according to plan and gets realized in 2013 with Maxwell, it might mean an inversion of the whole CPU-GPU relationship as we know it today. Imagine a system with a large GPU with 4-8 embedded CPU cores and a unified caching architecture (no more blocking issues, no more transfer overhead)... that would be sweet indeed! :)

Edited by ptrein

While it remains to be seen how fast the 4-8 core Cortex A15 will be, my Cortex A9 dev board doesn't yet compete with even a Core 2 CPU. It will be a while before we can replace our 4-5 GHz Sandy Bridge PCs with ARM CPUs. These new ARM platforms are gaining ground, though.


Algorithm: Strassen's, Matrix Size: 1536x1536

OpenCL Platforms: 2, OpenCL Devices: 2

Platform 0 Device 0: GeForce GTX 470, Compute Units: 14

Platform 1 Device 0: Intel® Xeon® CPU X3470 @ 2.93GHz, Compute Units: 8

CPU ID: Intel64 Family 6 Model 30 Stepping 5, Cores: 4, Logical Processors: 8

Processing time for Core 0 (sequential): 8991.8233 ms

Processing time for Core 1 (sequential): 8726.1618 ms

Processing time for Core 2 (sequential): 8535.7822 ms

Processing time for Core 3 (sequential): 8535.2124 ms

Processing time for Core * (parallel HT): 2371.2577 ms

Processing time for Core * (parallel !HT): 2503.7507 ms

Processing time for Platform 0 Device 0 (OpenCL): 35.2478 ms

Processing time for Platform 1 Device 0 (OpenCL): 1115.0161 ms

Parallel speed-up: 3.599445, Efficiency: 89.986133% ( HT)

Parallel speed-up: 3.408971, Efficiency: 85.224264% (!HT)

OpenCL CPU speed-up: 7.654788 (sequential), 2.126658 (parallel)

OpenCL GPU speed-up: 242.148798 (sequential), 67.273921 (parallel)

My results with stock X3470
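
For anyone wondering how the speed-up and efficiency lines relate to the timings, here is a minimal Python sketch that recomputes them from the numbers posted above. It assumes the baseline is the fastest single-core sequential run and that efficiency is speed-up divided by the four physical cores. Those assumptions reproduce the printed figures, but they are inferred from the output, not taken from the program's source. All variable and function names are made up for illustration.

```python
# Timings in milliseconds, copied from the X3470 output above.
sequential_ms = [8991.8233, 8726.1618, 8535.7822, 8535.2124]
parallel_ht_ms = 2371.2577      # Core * (parallel HT)
parallel_no_ht_ms = 2503.7507   # Core * (parallel !HT)
opencl_gpu_ms = 35.2478         # Platform 0 Device 0 (GeForce GTX 470)
opencl_cpu_ms = 1115.0161       # Platform 1 Device 0 (Xeon X3470)
physical_cores = 4

# Assumption: the sequential baseline is the fastest single-core run.
baseline_ms = min(sequential_ms)


def speed_up(reference_ms, measured_ms):
    """How many times faster the measured run is than the reference run."""
    return reference_ms / measured_ms


def efficiency_percent(speedup, cores):
    """Speed-up as a percentage of the ideal linear speed-up on `cores` cores."""
    return 100.0 * speedup / cores


ht = speed_up(baseline_ms, parallel_ht_ms)
no_ht = speed_up(baseline_ms, parallel_no_ht_ms)

print(f"Parallel speed-up: {ht:.6f}, Efficiency: {efficiency_percent(ht, physical_cores):.6f}% ( HT)")
print(f"Parallel speed-up: {no_ht:.6f}, Efficiency: {efficiency_percent(no_ht, physical_cores):.6f}% (!HT)")
print(f"OpenCL CPU speed-up: {speed_up(baseline_ms, opencl_cpu_ms):.6f} (sequential), "
      f"{speed_up(parallel_ht_ms, opencl_cpu_ms):.6f} (parallel)")
print(f"OpenCL GPU speed-up: {speed_up(baseline_ms, opencl_gpu_ms):.6f} (sequential), "
      f"{speed_up(parallel_ht_ms, opencl_gpu_ms):.6f} (parallel)")
```

The "parallel" OpenCL figures appear to be measured against the HT parallel time rather than the !HT one, since only that choice matches the 2.126658 and 67.273921 values in the output.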


