CUDA vs Quad Core Performance Test

January 8, 2011

BTW, I'm going to be testing out Intel's OpenCL driver on my system tonight.

Theirs is SSE 4.1 only (versus SSE 3 for the AMD driver), so basically Core 2 and better processors.

January 8, 2011

Changes

Now supports the Intel OpenCL SDK [1, 2]
Multi-GPU parallel compute... again [3]

[1] The Intel OpenCL SDK supports Intel Core 2 and better processors. It can be found here. Since this is "alpha" software, install it only if you agree to all the Intel disclaimers.

[2] In the screenshot below, Platform 1 Device 0 is the CPU running SSE4.1-optimized code on all eight logical processors using Intel's OpenCL SDK. The speed-up is astonishing... 25.204976x versus sequential, 6.509673x versus parallel, and 2.742596x over the SSE3 implementation using ATI Stream SDK 2.3!

[3] Multi-GPU parallel compute is highly dependent on the OpenCL driver's implementation of asynchronous calls. I've been seeing slightly more consistent results since upgrading my drivers to 266.35, but your mileage will vary. Notice the processing time for Platform 0 Device * :)
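
To illustrate why footnote [3] hinges on asynchronous calls, here is a rough sketch of the dispatch pattern in plain Python. It is not the utility's code and does not touch OpenCL at all: threads stand in for devices, and the names `multiply_rows` and `multiply_on_devices` are made up for this example. The point is only that every device gets its share of the work before the host waits on any of them; if the "enqueue" step blocked, the devices would run one after another.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B (integer math)."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def multiply_on_devices(a, b, num_devices=2):
    """Split A into row blocks, one per hypothetical device, and run them concurrently."""
    block = max(1, (len(a) + num_devices - 1) // num_devices)
    chunks = [a[i:i + block] for i in range(0, len(a), block)]
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        # submit() returns immediately, standing in for a non-blocking enqueue.
        futures = [pool.submit(multiply_rows, chunk, b) for chunk in chunks]
        # result() blocks, standing in for waiting on each device to finish.
        blocks = [f.result() for f in futures]
    # Stitch the row blocks back together in their original order.
    return [row for blk in blocks for row in blk]


print(multiply_on_devices([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In a real OpenCL program the equivalent would be one command queue per device with non-blocking enqueues, and how well that overlaps is up to the driver, which matches the inconsistent timings described above.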

For those interested, the latest version of the program can be found here (link expires 1/14). Links to old versions have been removed.

January 29, 2011

Two words... Sandy Bridge. :)

Core i7-2600K (stock)

Core i7-950 (stock)

Latest version (1.3.9) of utility attached.

matrixmultint_v1.3.9.7z

Edited January 29, 2011 by ptrein

January 29, 2011

I'll give 'er a go as soon as I'm back home.

Sandy Bridge looks highly optimised; soon we won't need GPUs :)

January 30, 2011

Actually, if Project Denver goes according to plan and gets realized in 2013 with Maxwell, it might mean an inversion of the whole CPU-GPU relationship as we know it today. Imagine a system with a large GPU with 4-8 embedded CPU cores and a unified caching architecture (no more blocking issues, no more transfer overhead)... that would be sweet indeed! :)

Edited January 30, 2011 by ptrein

January 30, 2011

While it remains to be seen how fast the 4-8 core Cortex A15 will be, my Cortex A9 dev board doesn't yet compete with even a Core 2 CPU. It will be a while before we can replace our 4-5 GHz Sandy Bridge PCs with ARM CPUs. These new ARM platforms are gaining good ground, though.

February 2, 2011

Algorithm: Strassen's, Matrix Size: 1536x1536
OpenCL Platforms: 2, OpenCL Devices: 2

Platform 0 Device 0: GeForce GTX 470, Compute Units: 14

Platform 1 Device 0: Intel® Xeon® CPU X3470 @ 2.93GHz, Compute Units: 8

CPU ID: Intel64 Family 6 Model 30 Stepping 5, Cores: 4, Logical Processors: 8

Processing time for Core 0 (sequential): 8991.8233 ms

Processing time for Core 1 (sequential): 8726.1618 ms

Processing time for Core 2 (sequential): 8535.7822 ms

Processing time for Core 3 (sequential): 8535.2124 ms

Processing time for Core * (parallel HT): 2371.2577 ms

Processing time for Core * (parallel !HT): 2503.7507 ms

Processing time for Platform 0 Device 0 (OpenCL): 35.2478 ms

Processing time for Platform 1 Device 0 (OpenCL): 1115.0161 ms

Parallel speed-up: 3.599445, Efficiency: 89.986133% ( HT)

Parallel speed-up: 3.408971, Efficiency: 85.224264% (!HT)

OpenCL CPU speed-up: 7.654788 (sequential), 2.126658 (parallel)

OpenCL GPU speed-up: 242.148798 (sequential), 67.273921 (parallel)

My results with stock X3470
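
For anyone wondering how the speed-up and efficiency lines relate to the raw timings, the numbers above appear consistent with dividing the fastest sequential time by each parallel or OpenCL time, and dividing the speed-up by the four physical cores for efficiency. That is an inference from the printed figures, not something taken from the utility's source, and the variable and function names below are just for illustration:

```python
# Timings (ms) copied from the X3470 output above.
sequential_ms = [8991.8233, 8726.1618, 8535.7822, 8535.2124]
parallel_ht_ms = 2371.2577      # Core * (parallel HT)
parallel_no_ht_ms = 2503.7507   # Core * (parallel !HT)
opencl_gpu_ms = 35.2478         # Platform 0 Device 0
opencl_cpu_ms = 1115.0161       # Platform 1 Device 0
physical_cores = 4

# Assumption: the baselines are the fastest sequential and fastest parallel runs.
best_sequential = min(sequential_ms)
best_parallel = min(parallel_ht_ms, parallel_no_ht_ms)


def speedup(baseline_ms, measured_ms):
    """How many times faster the measured run is than the baseline."""
    return baseline_ms / measured_ms


def efficiency_percent(speedup_value, cores):
    """Speed-up as a percentage of the ideal (one full core's worth per core)."""
    return 100.0 * speedup_value / cores


ht = speedup(best_sequential, parallel_ht_ms)
no_ht = speedup(best_sequential, parallel_no_ht_ms)

print(f"Parallel speed-up: {ht:.6f}, Efficiency: {efficiency_percent(ht, physical_cores):.6f}% (HT)")
print(f"Parallel speed-up: {no_ht:.6f}, Efficiency: {efficiency_percent(no_ht, physical_cores):.6f}% (!HT)")
print(f"OpenCL CPU speed-up: {speedup(best_sequential, opencl_cpu_ms):.6f} (sequential), "
      f"{speedup(best_parallel, opencl_cpu_ms):.6f} (parallel)")
print(f"OpenCL GPU speed-up: {speedup(best_sequential, opencl_gpu_ms):.6f} (sequential), "
      f"{speedup(best_parallel, opencl_gpu_ms):.6f} (parallel)")
```

Note that efficiency is measured against four cores, not eight logical processors, which is why Hyper-Threading shows up as a small gain in efficiency rather than a halving of it.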

Posted by (in thread order): Guest ptrein, Guest ptrein, ptrein, mobilenvidia, ptrein, Bill, mobilenvidia
