# Bibliography

* <https://en.wikipedia.org/wiki/OpenCL>
* <https://en.wikipedia.org/wiki/Task_parallelism>
* <https://en.wikipedia.org/wiki/Data_parallelism>
* <https://computing.llnl.gov/tutorials/parallel_comp/>
* <http://cs.nyu.edu/courses/spring12/CSCI-GA.3033-012/lecture1.pdf>
* <https://www.pgroup.com/lit/articles/insider/v2n1a5.htm>
* <https://cinwell.wordpress.com/2013/09/06/overview-of-gpu-architecture-fermi-based/>
* <https://blogs.msdn.microsoft.com/nativeconcurrency/2012/03/26/warp-or-wavefront-of-gpu-threads/>
* <http://blog.accelereyes.com/blog/2012/02/17/opencl_vs_cuda_webinar_recap/>
* <http://www.nvidia.com/content/GTC-2010/pdfs/2008_GTC2010.pdf>
* <http://www.cs.bris.ac.uk/home/simonm/workshops/OpenCL_lecture4.pdf>
* <http://stackoverflow.com/questions/10460742/how-do-cuda-blocks-warps-threads-map-onto-cuda-cores>
* <http://slideplayer.com/slide/6003993/>
* <https://developer.nvidia.com/opencl>
* <http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/06-intro_to_opencl.pdf>
* <http://www.nvidia.com/content/gtc/documents/1409_gtc09.pdf>
* <https://www.youtube.com/watch?v=hUiX8rBcNzw>
* <https://www.youtube.com/watch?v=M6vpq6s1h_A>
* <https://anteru.net/blog/2012/11/03/2009/>
* <https://www.youtube.com/watch?v=oc1-y1V1TPQ&list=PLWVFhSrgolFRatvUa1viEJJy0SV2sfqFr>
* <https://streamcomputing.eu/blog/2015-03-16/how-to-install-opencl-on-windows/>
* <https://www.youtube.com/watch?v=8D6yhpiQVVI>
* <https://www.youtube.com/playlist?list=PLTfYiv7-a3l7mYEdjk35wfY-KQj5yVXO2>
* <http://cims.nyu.edu/~schlacht/OpenCLModel.pdf>
* <http://www.slideshare.net/vladimirstarostenkov/hands-on-opencl>
* <http://embedded-computing.com/articles/understand-the-mobile-graphics-processing-unit/#>
* <http://www.slideshare.net/TomaszBednarz1/introduction-to-opencl-2010>
* <https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/first-opencl-program/>
* <https://streamcomputing.eu/blog/2013-06-03/the-application-areas-opencl-can-be-used/>
* <https://www.khronos.org/files/opencl-1-1-quick-reference-card.pdf>
* <https://www.khronos.org/files/opencl20-quick-reference-card.pdf>
* <http://www.iwocl.org/wp-content/uploads/iwocl-2016-opencl-caffe.pdf>
* <https://github.com/BVLC/caffe/tree/opencl/src/caffe/greentea>
* <https://github.com/amd/OpenCL-caffe/blob/stable/src/caffe/ocl/>
* <http://developer.amd.com/resources/articles-whitepapers/opencl-optimization-case-study-support-vector-machine-training/>
* <http://www.haifux.org/lectures/267/OpenCL_Dos_and_Donts.pdf>
* <http://www.nvidia.com/content/GTC/documents/1068_GTC09.pdf>
* <https://github.com/clMathLibraries/clBLAS>
* <https://www.youtube.com/playlist?list=PLzy5q1NUJKCJocUKsRxZ0IPz29p38xeM->
* <http://www.seehuhn.de/pages/linear>
* <http://stackoverflow.com/questions/6890302/barriers-in-opencl>
* <https://www.gnu.org/software/gsl/manual/html_node/GSL-CBLAS-Examples.html>
* <http://stackoverflow.com/questions/3606636/cuda-model-what-is-warp-size>
* <https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf>
* <https://www.cs.utexas.edu/~pingali/CS378/2015sp/lectures/BasicGPUPerformance.pdf>
* <https://www.cvg.ethz.ch/teaching/2011spring/gpgpu/GPU-Optimization.pdf>
* <https://nvlabs.github.io/moderngpu/performance.html>
* <http://courses.cms.caltech.edu/cs179/2015_lectures/cs179_2015_lec05.pdf>
* <http://www.ertl.jp/~shinpei/papers/icpads13.pdf>
* <http://blogs.mathworks.com/loren/2012/12/14/measuring-gpu-performance/>
* <http://www.stuffedcow.net/research/cudabmk>
* <http://stackoverflow.com/questions/4097635/how-many-memory-latency-cycles-per-memory-access-type-in-opencl-cuda>
* <https://en.wikipedia.org/wiki/Amdahl%27s_law>
* <https://www.javacodegeeks.com/2013/02/amdahls-law-illustrated.html>
* <https://streamcomputing.eu/blog/2015-08-14/opencl-basics-multiple-opencl-devices-with-the-icd/>
* [http://opencl.codeplex.com/wikipage?title=OpenCL Tutorials - 1](http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201)
* <http://dhruba.name/2012/08/16/opencl-cookbook-building-a-program-and-debugging-failures/>
* <https://www.olcf.ornl.gov/tutorials/opencl-vector-addition/>
* <http://will-landau.com/gpu/lectures/cudac-atomics/cudac-atomics.pdf>
* <http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0538f/BABHCEGA.html>
* <http://sett.com/gpgpu>
* <http://stackoverflow.com/questions/17704216/opencl-image2d-and-image3d-memory-layout>
* <https://www.microway.com/hpc-tech-tips/cuda-host-to-device-transfers-and-data-movement/>
* <https://en.wikipedia.org/wiki/CUDA_Pinned_memory>
* <http://courses.cms.caltech.edu/cs179/2015_lectures/cs179_2015_lec13.pdf>
* <http://baptiste-wicht.com/posts/2011/09/profile-c-application-with-callgrind-kcachegrind.html>
* <https://web.stanford.edu/class/cs107/guide_callgrind.html>
* <https://www.youtube.com/watch?v=fvTsFjDuag8&list=PLGvfHSgImk4ZZq5KWX0mGT0kgwy9-I-Qe>
* <https://devblogs.nvidia.com/parallelforall/how-implement-performance-metrics-cuda-cc/>
* <https://web.stanford.edu/class/cs107/guide_gdb.html>
* <http://web.stanford.edu/~adyuen/107>
* <https://www.tutorialspoint.com/cprogramming/c_multi_dimensional_arrays.htm>
* <http://gernotklingler.com/blog/gprof-valgrind-gperftools-evaluation-tools-application-level-cpu-profiling-linux/>
* <http://eli.thegreenplace.net/2015/memory-layout-of-multi-dimensional-arrays/>
* <https://kusemanohar.wordpress.com/2012/08/13/c-performance-analysis-profiling-tools/>
* <http://www.brendangregg.com/perf.html>
* <https://github.com/CppCon/CppCon2014>
* <https://www.youtube.com/watch?v=W0gtG67GUIw>
* <https://www.youtube.com/watch?v=FJW8nGV4jxY>
* <http://sandsoftwaresound.net/perf/perf-tutorial-hot-spots/>
* <https://devblogs.nvidia.com/parallelforall/cudacasts-episode-19-cuda-6-guided-performance-analysis-visual-profiler/>
* <http://on-demand.gputechconf.com/gtc/2013/webinar/gtc-express-guided-analysis-nvidia-visual-profiler.pdf>
* <http://gpgpu-computing4.blogspot.co.uk/2009/10/matrix-multiplication-3-opencl.html>
* <http://www.cs.bris.ac.uk/home/simonm/workshops/OpenCL_lecture3.pdf>
* <http://parallelis.com/how-to-measure-opencl-kernel-execution-time/>
* <http://www.cmsoft.com.br/opencl-tutorial/case-study-matrix-multiplication/>
* <http://www.cmsoft.com.br/opencl-tutorial/case-study-high-performance-convolution-using-opencl-__local-memory/>
* <http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-resources/programming-in-opencl/image-convolution-using-opencl/image-convolution-using-opencl-a-step-by-step-tutorial/>
* <https://devblogs.nvidia.com/parallelforall/cuda-7-5-pinpoint-performance-problems-instruction-level-profiling/>
* <http://www.cedricnugteren.nl/tutorial.php>
* <https://www.sharcnet.ca/help/index.php/Porting_CUDA_to_OpenCL>
* <http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-resources/programming-in-opencl/porting-cuda-applications-to-opencl/>
* <http://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/>
* <https://github.com/opencv/opencv_contrib/blob/master/modules/dnn/src/opencl/im2col.cl>
* <https://github.com/hughperkins/DeepCL>
* <https://classes.soe.ucsc.edu/ee264/Fall11/cmex.pdf>
* <http://www.shawnlankton.com/2008/03/getting-started-with-mex-a-short-tutorial/>
* <http://uk.mathworks.com/matlabcentral/newsreader/view_thread/241754>
* <http://stackoverflow.com/questions/11220250/how-do-i-profile-a-mex-function-in-matlab>
* <http://uk.mathworks.com/help/matlab/matlab_external/debugging-on-linux-platforms.html>
* <https://www.youtube.com/watch?v=4vAuwk3bj4s>
* <https://www.youtube.com/watch?v=pvuCg2yT5wY>
* <http://mdkey.org/?p=174>

