https://en.wikipedia.org/wiki/OpenCL
https://en.wikipedia.org/wiki/Task_parallelism
https://en.wikipedia.org/wiki/Data_parallelism
https://computing.llnl.gov/tutorials/parallel_comp/
http://cs.nyu.edu/courses/spring12/CSCI-GA.3033-012/lecture1.pdf
https://www.pgroup.com/lit/articles/insider/v2n1a5.htm
https://cinwell.wordpress.com/2013/09/06/overview-of-gpu-architecture-fermi-based/
https://blogs.msdn.microsoft.com/nativeconcurrency/2012/03/26/warp-or-wavefront-of-gpu-threads/
http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/
http://www.nvidia.com/content/GTC-2010/pdfs/2008_GTC2010.pdf
http://www.cs.bris.ac.uk/home/simonm/workshops/OpenCL_lecture4.pdf
http://stackoverflow.com/questions/10460742/how-do-cuda-blocks-warps-threads-map-onto-cuda-cores
http://slideplayer.com/slide/6003993/
https://developer.nvidia.com/opencl
http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/06-intro_to_opencl.pdf
http://www.nvidia.com/content/gtc/documents/1409_gtc09.pdf
https://www.youtube.com/watch?v=hUiX8rBcNzw
https://www.youtube.com/watch?v=M6vpq6s1h_A
https://anteru.net/blog/2012/11/03/2009/
https://www.youtube.com/watch?v=oc1-y1V1TPQ&list=PLWVFhSrgolFRatvUa1viEJJy0SV2sfqFr
https://streamcomputing.eu/blog/2015-03-16/how-to-install-opencl-on-windows/
https://www.youtube.com/watch?v=8D6yhpiQVVI
https://www.youtube.com/playlist?list=PLTfYiv7-a3l7mYEdjk35wfY-KQj5yVXO2
http://cims.nyu.edu/~schlacht/OpenCLModel.pdf
http://www.slideshare.net/vladimirstarostenkov/hands-on-opencl
http://embedded-computing.com/articles/understand-the-mobile-graphics-processing-unit/#
http://www.slideshare.net/TomaszBednarz1/introduction-to-opencl-2010
https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/first-opencl-program/
https://streamcomputing.eu/blog/2013-06-03/the-application-areas-opencl-can-be-used/
https://www.khronos.org/files/opencl-1-1-quick-reference-card.pdf
https://www.khronos.org/files/opencl20-quick-reference-card.pdf
http://www.iwocl.org/wp-content/uploads/iwocl-2016-opencl-caffe.pdf
https://github.com/BVLC/caffe/tree/opencl/src/caffe/greentea
https://github.com/amd/OpenCL-caffe/blob/stable/src/caffe/ocl/
http://developer.amd.com/resources/articles-whitepapers/opencl-optimization-case-study-support-vector-machine-training/
http://www.haifux.org/lectures/267/OpenCL_Dos_and_Donts.pdf
http://www.nvidia.com/content/GTC/documents/1068_GTC09.pdf
https://github.com/clMathLibraries/clBLAS
https://www.youtube.com/playlist?list=PLzy5q1NUJKCJocUKsRxZ0IPz29p38xeM-
http://www.seehuhn.de/pages/linear
http://stackoverflow.com/questions/6890302/barriers-in-opencl
https://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Examples.html
http://stackoverflow.com/questions/3606636/cuda-model-what-is-warp-size
https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf
https://www.cs.utexas.edu/~pingali/CS378/2015sp/lectures/BasicGPUPerformance.pdf
https://www.cvg.ethz.ch/teaching/2011spring/gpgpu/GPU-Optimization.pdf
https://nvlabs.github.io/moderngpu/performance.html
http://courses.cms.caltech.edu/cs179/2015_lectures/cs179_2015_lec05.pdf
http://www.ertl.jp/~shinpei/papers/icpads13.pdf
http://blogs.mathworks.com/loren/2012/12/14/measuring-gpu-performance/
http://www.stuffedcow.net/research/cudabmk
http://stackoverflow.com/questions/4097635/how-many-memory-latency-cycles-per-memory-access-type-in-opencl-cuda
https://en.wikipedia.org/wiki/Amdahl%27s_law
https://www.javacodegeeks.com/2013/02/amdahls-law-illustrated.html
https://streamcomputing.eu/blog/2015-08-14/opencl-basics-multiple-opencl-devices-with-the-icd/
http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201
http://dhruba.name/2012/08/16/opencl-cookbook-building-a-program-and-debugging-failures/
https://www.olcf.ornl.gov/tutorials/opencl-vector-addition/
http://will-landau.com/gpu/lectures/cudac-atomics/cudac-atomics.pdf
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0538f/BABHCEGA.html
http://sett.com/gpgpu
http://stackoverflow.com/questions/17704216/opencl-image2d-and-image3d-memory-layout
https://www.microway.com/hpc-tech-tips/cuda-host-to-device-transfers-and-data-movement/
https://en.wikipedia.org/wiki/CUDA_Pinned_memory
http://courses.cms.caltech.edu/cs179/2015_lectures/cs179_2015_lec13.pdf
http://baptiste-wicht.com/posts/2011/09/profile-c-application-with-callgrind-kcachegrind.html
https://web.stanford.edu/class/cs107/guide_callgrind.html
https://www.youtube.com/watch?v=fvTsFjDuag8&list=PLGvfHSgImk4ZZq5KWX0mGT0kgwy9-I-Qe
https://devblogs.nvidia.com/parallelforall/how-implement-performance-metrics-cuda-cc/
https://web.stanford.edu/class/cs107/guide_gdb.html
http://web.stanford.edu/~adyuen/107
https://www.tutorialspoint.com/cprogramming/c_multi_dimensional_arrays.htm
http://gernotklingler.com/blog/gprof-valgrind-gperftools-evaluation-tools-application-level-cpu-profiling-linux/
http://eli.thegreenplace.net/2015/memory-layout-of-multi-dimensional-arrays/
https://kusemanohar.wordpress.com/2012/08/13/c-performance-analysis-profiling-tools/
http://www.brendangregg.com/perf.html
https://github.com/CppCon/CppCon2014
https://www.youtube.com/watch?v=W0gtG67GUIw
https://www.youtube.com/watch?v=FJW8nGV4jxY
http://sandsoftwaresound.net/perf/perf-tutorial-hot-spots/
https://devblogs.nvidia.com/parallelforall/cudacasts-episode-19-cuda-6-guided-performance-analysis-visual-profiler/
http://on-demand.gputechconf.com/gtc/2013/webinar/gtc-express-guided-analysis-nvidia-visual-profiler.pdf
http://gpgpu-computing4.blogspot.co.uk/2009/10/matrix-multiplication-3-opencl.html
http://www.cs.bris.ac.uk/home/simonm/workshops/OpenCL_lecture3.pdf
http://parallelis.com/how-to-measure-opencl-kernel-execution-time/
http://www.cmsoft.com.br/opencl-tutorial/case-study-matrix-multiplication/
http://www.cmsoft.com.br/opencl-tutorial/case-study-high-performance-convolution-using-opencl-__local-memory/
http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-resources/programming-in-opencl/image-convolution-using-opencl/image-convolution-using-opencl-a-step-by-step-tutorial/
https://devblogs.nvidia.com/parallelforall/cuda-7-5-pinpoint-performance-problems-instruction-level-profiling/
http://www.cedricnugteren.nl/tutorial.php
https://www.sharcnet.ca/help/index.php/Porting_CUDA_to_OpenCL
http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-resources/programming-in-opencl/porting-cuda-applications-to-opencl/
http://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/
https://github.com/opencv/opencv_contrib/blob/master/modules/dnn/src/opencl/im2col.cl
https://github.com/hughperkins/DeepCL
https://classes.soe.ucsc.edu/ee264/Fall11/cmex.pdf
http://www.shawnlankton.com/2008/03/getting-started-with-mex-a-short-tutorial/
http://uk.mathworks.com/matlabcentral/newsreader/view_thread/241754
http://stackoverflow.com/questions/11220250/how-do-i-profile-a-mex-function-in-matlab
http://uk.mathworks.com/help/matlab/matlab_external/debugging-on-linux-platforms.html
https://www.youtube.com/watch?v=4vAuwk3bj4s
https://www.youtube.com/watch?v=pvuCg2yT5wY
http://mdkey.org/?p=174