Guest ptrein Posted January 8, 2011
BTW, I'm going to be testing out Intel's OpenCL driver on my system tonight. Theirs is SSE 4.1 only (versus SSE 3 for the AMD driver), so basically Core2 and better processors.
Guest ptrein Posted January 8, 2011
Changes:
Now supports the Intel OpenCL SDK [1], [2]
Multi-GPU parallel compute... again [3]

[1] The Intel OpenCL SDK supports Intel Core2 and better processors. It can be found here. Since this is "alpha" software, install only if you agree to all the Intel disclaimers.

[2] From the screenshot below, Platform 1 Device 0 is the CPU running SSE4.1 optimized code on all eight logical processors using Intel's OpenCL SDK. The speed-up is astonishing... 25.204976 over sequential, 6.509673 over parallel, and 2.742596 over the SSE3 implementation using ATI Stream SDK 2.3!

[3] Multi-GPU parallel compute is highly dependent on the OpenCL driver's implementation of asynchronous calls. I've been seeing slightly more consistent results since upgrading my drivers to 266.35, but your mileage will vary. Notice the processing time for Platform 0 Device * :)

For those interested, the latest version of the program can be found here (link expires 1/14). Links to old versions have been removed.
ptrein Posted January 29, 2011 (edited)
Two words... Sandy Bridge. :)
Core i7-2600K (stock)
Core i7-950 (stock)
Latest version (1.3.9) of utility attached. matrixmultint_v1.3.9.7z
Edited January 29, 2011 by ptrein
mobilenvidia Posted January 29, 2011
I'll give 'er a go as soon as I'm back home. Sandy Bridge looks highly optimised; soon we won't need GPUs :)
ptrein Posted January 30, 2011 (edited)
Actually, if Project Denver goes according to plan and gets realized in 2013 with Maxwell, it might mean an inversion of the whole CPU-GPU relationship as we know it today. Imagine a system with a large GPU with 4-8 embedded CPU cores and a unified caching architecture (no more blocking issues, no more transfer overhead)... that would be sweet indeed! :)
Edited January 30, 2011 by ptrein
Bill (Author) Posted January 30, 2011
While it remains to be seen how fast the 4-8 core Cortex A15 will be, my Cortex A9 dev board doesn't yet compete with even a Core 2 CPU. It will be a while before we can replace our 4-5 GHz Sandy Bridge PCs with ARM CPUs. These new ARM platforms are gaining ground, though.
mobilenvidia Posted February 2, 2011
Algorithm: Strassen's, Matrix Size: 1536x1536
OpenCL Platforms: 2, OpenCL Devices: 2
Platform 0 Device 0: GeForce GTX 470, Compute Units: 14
Platform 1 Device 0: Intel® Xeon® CPU X3470 @ 2.93GHz, Compute Units: 8
CPU ID: Intel64 Family 6 Model 30 Stepping 5, Cores: 4, Logical Processors: 8
Processing time for Core 0 (sequential): 8991.8233 ms
Processing time for Core 1 (sequential): 8726.1618 ms
Processing time for Core 2 (sequential): 8535.7822 ms
Processing time for Core 3 (sequential): 8535.2124 ms
Processing time for Core * (parallel HT): 2371.2577 ms
Processing time for Core * (parallel !HT): 2503.7507 ms
Processing time for Platform 0 Device 0 (OpenCL): 35.2478 ms
Processing time for Platform 1 Device 0 (OpenCL): 1115.0161 ms
Parallel speed-up: 3.599445, Efficiency: 89.986133% ( HT)
Parallel speed-up: 3.408971, Efficiency: 85.224264% (!HT)
OpenCL CPU speed-up: 7.654788 (sequential), 2.126658 (parallel)
OpenCL GPU speed-up: 242.148798 (sequential), 67.273921 (parallel)

My results with stock X3470.
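For anyone wondering how the speed-up and efficiency lines relate to the raw timings in the X3470 results: the numbers are consistent with dividing the fastest sequential core time by the faster run's time, and with efficiency being the parallel speed-up divided by the 4 physical cores. That is inferred from the posted figures only, not taken from the utility's source, so treat the formulas and the helper names (`speed_up`, `efficiency_percent`) in this Python sketch as assumptions.

```python
# Timings in milliseconds, copied from the X3470 results above.
sequential_ms = [8991.8233, 8726.1618, 8535.7822, 8535.2124]
parallel_ht_ms = 2371.2577      # all cores, Hyper-Threading on
parallel_no_ht_ms = 2503.7507   # all cores, Hyper-Threading off
opencl_gpu_ms = 35.2478         # Platform 0 Device 0 (GeForce GTX 470)
opencl_cpu_ms = 1115.0161       # Platform 1 Device 0 (Intel OpenCL on the CPU)
physical_cores = 4


def speed_up(baseline_ms, improved_ms):
    """How many times faster the improved run is than the baseline."""
    return baseline_ms / improved_ms


def efficiency_percent(parallel_speed_up, workers):
    """Share of the ideal linear speed-up actually achieved, in percent."""
    return 100.0 * parallel_speed_up / workers


# Assumption: the baseline is the fastest single-core run (Core 3 here).
best_sequential_ms = min(sequential_ms)

ht_speed_up = speed_up(best_sequential_ms, parallel_ht_ms)
no_ht_speed_up = speed_up(best_sequential_ms, parallel_no_ht_ms)
ht_efficiency = efficiency_percent(ht_speed_up, physical_cores)
no_ht_efficiency = efficiency_percent(no_ht_speed_up, physical_cores)

# Assumption: the "parallel" OpenCL figures compare against the HT parallel run.
cpu_vs_sequential = speed_up(best_sequential_ms, opencl_cpu_ms)
cpu_vs_parallel = speed_up(parallel_ht_ms, opencl_cpu_ms)
gpu_vs_sequential = speed_up(best_sequential_ms, opencl_gpu_ms)
gpu_vs_parallel = speed_up(parallel_ht_ms, opencl_gpu_ms)

print(f"Parallel speed-up: {ht_speed_up:.6f}, Efficiency: {ht_efficiency:.6f}% ( HT)")
print(f"Parallel speed-up: {no_ht_speed_up:.6f}, Efficiency: {no_ht_efficiency:.6f}% (!HT)")
print(f"OpenCL CPU speed-up: {cpu_vs_sequential:.6f} (sequential), {cpu_vs_parallel:.6f} (parallel)")
print(f"OpenCL GPU speed-up: {gpu_vs_sequential:.6f} (sequential), {gpu_vs_parallel:.6f} (parallel)")
```

Plugging in your own timings from the utility's output should let you sanity-check the reported ratios, assuming the same baselines apply on your machine.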