<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>VIDWAV.com &#187; Development</title>
	<atom:link href="http://www.vidwav.com/category/develop/feed" rel="self" type="application/rss+xml" />
	<link>http://www.vidwav.com</link>
	<description>News, research and development, and products in the field of video technology</description>
	<lastBuildDate>Thu, 20 May 2010 13:51:01 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Setting Up NVIDIA CUDA in Xcode on Mac OS X Leopard</title>
		<link>http://www.vidwav.com/2009/06/mac-os-x-leopard-xcode-nvidia-cuda.htm</link>
		<comments>http://www.vidwav.com/2009/06/mac-os-x-leopard-xcode-nvidia-cuda.htm#comments</comments>
		<pubDate>Sun, 14 Jun 2009 03:30:22 +0000</pubDate>
		<dc:creator>Yu Liu</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[IDE]]></category>
		<category><![CDATA[Xcode]]></category>

		<guid isPermaLink="false">http://www.vidwav.com/2009/06/mac-os-x-leopard-xcode%e7%8e%af%e5%a2%83%e4%b8%8b%e7%9a%84nvidia-cuda%e7%bc%96%e7%a8%8b.htm</guid>
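		<description><![CDATA[An earlier post covered <a href="http://www.vidwav.com/2009/06/integrate-cuda-into-ide.htm">CUDA setup in the Xcode IDE</a>. As mentioned there, I could not get the CUDA Plugin for Xcode to work; after some trial and error, the Xcode CUDA configuration is finally working. Here is how to set up the CUDA Plugin for the Xcode IDE.
<ol>
<li>
First, install the latest NVIDIA graphics driver, the CUDA Toolkit, and the CUDA SDK. They can be downloaded <a href="http://www.nvidia.com/object/cuda_get.html">here</a>.
</li>
<li>
After installing CUDA, you can test whether it is configured correctly on your machine. Go to the CUDA installation directory (I installed CUDA under /Developer). There is a Makefile in /Developer/CUDA; in that directory[......]</li></ol><p class='read-more'><a href='http://www.vidwav.com/2009/06/mac-os-x-leopard-xcode-nvidia-cuda.htm'>Continue reading</a></p>]]></description>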
			<content:encoded><![CDATA[<p align="justify">An earlier post covered <a href="http://www.vidwav.com/2009/06/integrate-cuda-into-ide.htm">CUDA setup in the Xcode IDE</a>. As mentioned there, I could not get the CUDA Plugin for Xcode to work; after some trial and error, the Xcode CUDA configuration is finally working. Here is how to set up the CUDA Plugin for the Xcode IDE.</p>
<ol>
<li>
<div>First, install the latest NVIDIA graphics driver, the CUDA Toolkit, and the CUDA SDK. They can be downloaded <a href="http://www.nvidia.com/object/cuda_get.html">here</a>.</div>
</li>
<li>
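<div>After installing CUDA, you can test whether it is configured correctly on your machine. Go to the CUDA installation directory (I installed CUDA under /Developer). There is a Makefile in /Developer/CUDA; running make in that directory builds all the demo programs under the projects directory, including one called deviceQuery. When the build finishes, run deviceQuery from /Developer/CUDA/bin/darwin/release/. It should print the information shown below; if it does not, either your machine has no CUDA-capable GPU or the GPU driver is not installed correctly.</div>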
<div><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="devicequery" src="http://ftp.vidwav.com/image/MacOSXLeopardXcodeNVIDIACUDA_99CA/devicequery_thumb.png" border="0" alt="devicequery" width="511" height="394" /></div>
</li>
<li>Download the NVCuda Plugin for Xcode from <a href="http://ftp.vidwav.com/cuda/nvcuda_plugin.zip">here</a>.</li>
<li>
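<div>Unzip nvcuda_plugin.zip and copy the NVCuda.pbplugin file inside it to the<br />
"/Library/Application\ Support/Developer/Shared/Xcode/Plug-ins/" directory. If that directory does not exist, create the corresponding directory structure along this path. After you restart Xcode, the Build tab of your project's Target should show a section named <strong>NVIDA Cuda – Code Generation</strong>.</div>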
</li>
<li>
<div>Open Xcode and create a C++ Tool project under Command Line Utility. Copy your source code into the project directory and add it to the project.</div>
</li>
<li>
<div>Under the menu Project -&gt; Edit Active Target, make the following settings on the Build tab:<br />
In the <strong>Linking</strong> section, add -lcuda and -lcudart to Other Linker Flags, and check Prebinding.<br />
<img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="linking" src="http://ftp.vidwav.com/image/MacOSXLeopardXcodeNVIDIACUDA_99CA/linking_thumb.png" border="0" alt="linking" width="604" height="113" />In the <strong>Search Paths<span style="font-weight: normal;"> section:</span><br />
<span style="font-weight: normal;">Add the CUDA system directory /usr/local/cuda/include/** to Header Search Paths; if you use functions from the CUDA SDK, also add /Developer/CUDA/common/inc.<br />
Add the CUDA system directory /usr/local/cuda/lib to Library Search Paths; if you use functions from the CUDA SDK, also add /Developer/CUDA/lib and /Developer/CUDA/common/lib.<br />
<img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="searchpaths" src="http://ftp.vidwav.com/image/MacOSXLeopardXcodeNVIDIACUDA_99CA/searchpaths_thumb.png" border="0" alt="searchpaths" width="604" height="137" />In the </span>NVIDA Cuda – Code Generation</strong> section, set Host Compilation to c++.</div>
<p style="text-align: center;"><img class="aligncenter" style="display: inline; border-width: 0px;" title="cudaplugin" src="http://ftp.vidwav.com/image/MacOSXLeopardXcodeNVIDIACUDA_99CA/cudaplugin_thumb.png" border="0" alt="cudaplugin" width="604" height="193" /></p>
</li>
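<li>Click the Build and Go button; the build should no longer fail with the error "no rule to process file test.cu … for architecture i386".</li>
<li>If you use functions from the CUDA SDK, the linker may fail to find them. In that case, add the CUDA SDK libraries to your project, for example the libcutil.a library file.</li>
<li>Have fun!</li>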
</ol>
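<p>Once the project builds, a small self-contained program can confirm that the whole toolchain works from inside Xcode. The sketch below is not from the original setup: the file name test.cu, the kernel name doubleElements, and the array size of 32 are illustrative assumptions. It relies only on standard CUDA runtime calls (cudaGetDeviceCount, cudaGetDeviceProperties, cudaMalloc, cudaMemcpy, cudaGetLastError, cudaFree) and does not need the CUDA SDK's cutil library.</p>

```cuda
// test.cu: minimal smoke test for the Xcode + CUDA setup (illustrative sketch).
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread doubles one element of the array.
__global__ void doubleElements(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2.0f;
    }
}

int main()
{
    // Step 1: make sure the runtime can see at least one device.
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    if (deviceCount == 0) {
        printf("No CUDA-capable device found\n");
        return 1;
    }

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("Device 0: %s (compute capability %d.%d)\n",
           prop.name, prop.major, prop.minor);

    // Step 2: round-trip a small array through the GPU.
    const int n = 32;                       // assumed size, one block of 32 threads
    const size_t bytes = n * sizeof(float);
    float h_data[n];
    for (int i = 0; i < n; ++i) {
        h_data[i] = (float)i;
    }

    float* d_data = NULL;
    err = cudaMalloc((void**)&d_data, bytes);
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    doubleElements<<<1, n>>>(d_data, n);
    err = cudaGetLastError();               // reports launch failures
    if (err != cudaSuccess) {
        printf("Kernel launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    // Step 3: check the result on the host.
    int ok = 1;
    for (int i = 0; i < n; ++i) {
        if (h_data[i] != 2.0f * (float)i) {
            ok = 0;
        }
    }
    printf("%s\n", ok ? "PASSED" : "FAILED");
    return ok ? 0 : 1;
}
```

<p>Add the file to the project and click Build and Go. If the plugin and search paths are set correctly, the .cu file should compile without the "no rule to process file" error, and the program should print the device name followed by PASSED.</p>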
]]></content:encoded>
			<wfw:commentRss>http://www.vidwav.com/2009/06/mac-os-x-leopard-xcode-nvidia-cuda.htm/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A First Look at a CUDA Program</title>
		<link>http://www.vidwav.com/2009/06/first-cuda-program.htm</link>
		<comments>http://www.vidwav.com/2009/06/first-cuda-program.htm#comments</comments>
		<pubDate>Wed, 10 Jun 2009 05:50:41 +0000</pubDate>
		<dc:creator>Yu Liu</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[CUDA]]></category>

		<guid isPermaLink="false">http://www.vidwav.com/?p=163</guid>
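		<description><![CDATA[I came across a fairly good, detailed walkthrough of the CUDA project template program online; it is a useful way to learn the basic structure of a CUDA program. Reposted from <a href="http://blog.csdn.net/darkstorm2111203/archive/2008/08/22/2813480.aspx">http://blog.csdn.net/darkstorm2111203/archive/2008/08/22/2813480.aspx</a>.

 /* Template_Host.c sample program showing how to build a CUDA project */
/* Host side, i.e. CPU code */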

// includes, system
#include &#60;stdlib.h&#62;
#include &#60;stdio.h&#62;
#include &#60;string.h&#62;
#include &#038;l[......]<p class='read-more'><a href='http://www.vidwav.com/2009/06/first-cuda-program.htm'>Continue reading</a></p>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">I came across a fairly good, detailed walkthrough of the CUDA project template program online; it is a useful way to learn the basic structure of a CUDA program. Reposted from <a href="http://blog.csdn.net/darkstorm2111203/archive/2008/08/22/2813480.aspx">http://blog.csdn.net/darkstorm2111203/archive/2008/08/22/2813480.aspx</a>.</p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt">
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span lang="EN-US"><span style="font-size: small; color: #000000;"> </span></span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">/* Template_Host.c </span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes">sample program showing how to build a <span lang="EN-US">CUDA</span> project */</span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">/* </span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes">Host side, i.e. <span lang="EN-US">CPU code </span></span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">*/</span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">// includes, system<br />
</span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#include</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span><span style="COLOR: maroon">&lt;stdlib.h&gt;<br />
</span></span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#include</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span><span style="COLOR: maroon">&lt;stdio.h&gt;<br />
</span></span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#include</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span><span style="COLOR: maroon">&lt;string.h&gt;<br />
</span></span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#include</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span><span style="COLOR: maroon">&lt;math.h&gt;</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">// includes, project<br />
</span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#include</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span><span style="COLOR: maroon">&lt;cutil.h&gt;<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; COLOR: maroon; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> <span lang="EN-US">CUDA</span> SDK utility header, needed for the CUT_* helpers used below</span><br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">// includes, kernels<br />
</span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#include</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> &quot;</span><span style="COLOR: maroon">template_kernel.cu&quot;<br />
</span></span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">////////////////////////////////////////////////////////////////////////////////<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">// declaration, forward<br />
</span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">void</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> runTest( </span><span style="COLOR: blue">int</span><span style="color: #000000;"> argc, </span><span style="COLOR: blue">char</span><span style="color: #000000;">** argv);</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">extern</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span><span style="COLOR: maroon">&quot;C</span></span><span style="FONT-SIZE: 9pt; COLOR: maroon; mso-ascii-font-family: 新宋体; mso-fareast-font-family: 新宋体; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">&quot;</span><span style="FONT-SIZE: 9pt; COLOR: maroon; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> give the following function <span lang="EN-US">C</span> linkage (no C++ name mangling)<br />
</span></span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">void</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> computeGold( </span><span style="COLOR: blue">float</span><span style="color: #000000;">* reference, </span><span style="COLOR: blue">float</span><span style="color: #000000;">* idata, </span><span style="COLOR: blue">const</span><span style="color: #000000;"> </span><span style="COLOR: blue">unsigned</span><span style="color: #000000;"> </span><span style="COLOR: blue">int</span><span style="color: #000000;"> len);<br />
</span></span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">////////////////////////////////////////////////////////////////////////////////<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">// Program main<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">// host</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"> code runs on the <span lang="EN-US">CPU</span>; it is ordinary <span lang="EN-US">C</span> code plus some <span lang="EN-US">CUDA</span> extensions<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">////////////////////////////////////////////////////////////////////////////////<br />
</span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">int </span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">main( </span><span style="COLOR: blue">int</span><span style="color: #000000;"> argc, </span><span style="COLOR: blue">char</span><span style="color: #000000;">** argv)<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">{<br />
</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>runTest( argc, argv);<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> wrapper around the GPU computation<br />
</span></span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>CUT_EXIT(argc, argv);<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> exit the program now that the <span lang="EN-US">GPU</span> computation is finished<br />
</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">}<br />
</span></span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">////////////////////////////////////////////////////////////////////////////////<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">//! Run a simple test for CUDA<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">////////////////////////////////////////////////////////////////////////////////</span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">void </span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">runTest( </span><span style="COLOR: blue">int</span><span style="color: #000000;"> argc, </span><span style="COLOR: blue">char</span><span style="color: #000000;">** argv)<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">{<br />
</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"> CUT_DEVICE_INIT(argc, argv);<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> initialize the device; with multiple GPUs, pass a device number or call <span lang="EN-US">cudaSetDevice()</span><br />
</span></span></span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><br />
unsigned</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span><span style="COLOR: blue">int</span><span style="color: #000000;"> timer = 0;<span style="mso-tab-count: 2"><br />
</span></span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"> CUT_SAFE_CALL( cutCreateTimer( &amp;timer)); <span style="color: #008000;">//</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> create a timer<br />
</span></span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>CUT_SAFE_CALL( cutStartTimer( timer));<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> start timing</span></span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">unsigned</span><span style="color: #000000;"> </span><span style="COLOR: blue">int</span><span style="color: #000000;"> num_threads = 32;<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> number of threads per <span lang="EN-US">block</span><br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">unsigned</span><span style="color: #000000;"> </span><span style="COLOR: blue">int</span><span style="color: #000000;"> mem_size = </span><span style="COLOR: blue">sizeof</span><span style="color: #000000;">( </span><span style="COLOR: blue">float</span><span style="color: #000000;">) * num_threads;<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span></span><span style="color: #008000;"><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"> amount of memory to allocate; each thread handles a single float here</span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;" lang="EN-US"> </span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"><br />
</span></span><span style="COLOR: green">// allocate host memory<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">float</span><span style="color: #000000;">* h_idata = (</span><span style="COLOR: blue">float</span><span style="color: #000000;">*) malloc( mem_size);<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> allocate host memory; the prefix <span lang="EN-US">h_</span> stands for <span lang="EN-US">host<br />
</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"><br />
</span></span><span style="COLOR: green">// initialize the memory<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">for</span><span style="color: #000000;">( </span><span style="COLOR: blue">unsigned</span><span style="color: #000000;"> </span><span style="COLOR: blue">int</span><span style="color: #000000;"> i = 0; i &lt; num_threads; ++i)</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">{<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>h_idata[i] = (</span><span style="COLOR: blue">float</span><span style="color: #000000;">) i;  <span style="color: #008000;">//<span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"> initialize the values in host memory</span><br />
</span></span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>}<span style="mso-tab-count: 10"> </span></span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// allocate device memory<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">float</span><span style="color: #000000;">* d_idata;<span style="mso-tab-count: 1"> <span style="color: #008000;"> </span></span><span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #000000;"><span style="color: #008000;"> pointer to GPU device memory; the prefix <span lang="EN-US">d_</span> stands for <span lang="EN-US">device</span>, and <span lang="EN-US">i</span> for </span><span lang="EN-US"><span style="color: #008000;">input<br />
</span></span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>CUDA_SAFE_CALL( cudaMalloc( (</span><span style="COLOR: blue">void</span><span style="color: #000000;">**) &amp;d_idata, mem_size)); <span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #000000;"><span style="color: #008000;"> allocate memory on the GPU<br />
</span><br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// copy host memory to device<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>CUDA_SAFE_CALL( cudaMemcpy( d_idata, h_idata, mem_size, </span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">cudaMemcpyHostToDevice) );<br />
<span style="color: #008000;"> //</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> copy the values in <span lang="EN-US">h_idata</span> (host memory) to <span lang="EN-US">d_idata</span> (device memory); this completes the host-to-device write<br />
</span></span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// allocate device memory for result<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">float</span><span style="color: #000000;">* d_odata;<span style="mso-tab-count: 1"> <span style="color: #008000;"> </span></span><span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #000000;"><span style="color: #008000;"> pointer to GPU device memory; the prefix <span lang="EN-US">d_</span> stands for <span lang="EN-US">device</span>, and <span lang="EN-US">o</span> for </span><span lang="EN-US"><span style="color: #008000;">output<br />
</span></span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>CUDA_SAFE_CALL( cudaMalloc( (</span><span style="COLOR: blue">void</span><span style="color: #000000;">**) &amp;d_odata, mem_size)); <span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> allocate memory on the GPU</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> </span></span></p>
<div><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> </span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// setup execution parameters<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">dim3</span><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>grid( 1, 1, 1);<span style="mso-tab-count: 2"><br />
</span><span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #000000;"><span style="color: #008000;"> grid dimensions: on compute capability 1.x hardware the first and second dimensions may each be at most 65535 and the third must be 1; each point in the grid represents one </span><span lang="EN-US"><span style="color: #008000;">block<br />
</span></span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">dim3</span><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>threads( num_threads, 1, 1);<span style="mso-tab-count: 1"><br />
</span><span style="color: #008000;">//</span></span></span><span style="color: #008000;"><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"> block dimensions: on compute capability 1.x hardware the product of the three dimensions may be at most 512 threads per block (768 on 1.0/1.1 hardware and 1024 on 1.2/1.3 hardware are the limits on resident threads per multiprocessor, not per block); the first and second dimensions may each be at most 512 and the third at most 64</span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;" lang="EN-US"> </span></span></div>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"><br />
</span></span><span style="COLOR: green">// execute the kernel<br />
</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>testKernel&lt;&lt;&lt; grid, threads, mem_size &gt;&gt;&gt;( d_idata, d_odata);<br />
</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"> <span style="color: #008000;">/*</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> Launch the kernel. The eye-catching &lt;&lt;&lt; &gt;&gt;&gt; brackets hold the execution configuration of the launch, and the ( ) parentheses hold the kernel's argument list. Inside &lt;&lt;&lt; &gt;&gt;&gt;, the first argument is the grid shape as a dim3, the second is the block shape, and the third is the number of bytes of shared memory allocated to the kernel for each block. Because grid and threads are effectively one-dimensional here, calling testKernel() as &lt;&lt;&lt; 1, num_threads, mem_size &gt;&gt;&gt; has the same effect. Pointer arguments inside ( ) must point to device memory, while scalar arguments such as float or int are passed by value and need no explicit copy to the graphics card. */</span></span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// check if kernel execution generated an error<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>CUT_CHECK_ERROR(</span><span style="COLOR: maroon">&quot;Kernel execution failed&quot;</span><span style="color: #000000;">);<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> checks for errors</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// allocate mem for the result on host side<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">float</span><span style="color: #000000;">* h_odata = (</span><span style="COLOR: blue">float</span><span style="color: #000000;">*) malloc( mem_size);<span style="mso-tab-count: 1"> <span style="color: #008000;"> </span></span><span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> allocate host memory to receive the results computed on the graphics card<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// copy result from device to host<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>CUDA_SAFE_CALL( cudaMemcpy( h_odata, d_odata, </span><span style="COLOR: blue">sizeof</span><span style="color: #000000;">( </span><span style="COLOR: blue">float</span><span style="color: #000000;">) * num_threads,</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>cudaMemcpyDeviceToHost) );<br />
<span style="color: #008000;"> </span></span></span><span style="color: #008000;"><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;" lang="EN-US">//</span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"> copy the data in device memory back to host memory</span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;" lang="EN-US"> </span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>CUT_SAFE_CALL( cutStopTimer( timer));<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> stop the timer<br />
</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>printf( </span><span style="COLOR: maroon">&quot;Processing time: %f (ms)\n&quot;</span><span style="color: #000000;">, cutGetTimerValue( timer));<span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #000000;"><span style="color: #008000;"> print the elapsed time</span><br />
</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>CUT_SAFE_CALL( cutDeleteTimer( timer));<span style="color: #008000;">//</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> delete the timer</span></span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left">
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// compute reference solution<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">float</span><span style="color: #000000;">* reference = (</span><span style="COLOR: blue">float</span><span style="color: #000000;">*) malloc( mem_size);<span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #000000;"><span style="color: #008000;"> allocate host memory for the CPU reference result</span><br />
</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>computeGold( reference, h_idata, num_threads);<span style="color: #008000;">// computeGold</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> is defined in computeGold.c; it is a CPU reference routine that performs the same computation as the GPU</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// check result<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">if</span><span style="color: #000000;">( cutCheckCmdLineFlag( argc, (</span><span style="COLOR: blue">const</span><span style="color: #000000;"> </span><span style="COLOR: blue">char</span><span style="color: #000000;">**) argv, </span><span style="COLOR: maroon">&quot;regression&quot;</span><span style="color: #000000;">))<br />
<span style="color: #008000;"> //</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #000000;"><span style="color: #008000;"> if the regression flag was passed on the command line, record the GPU result in a file; otherwise compare the CPU and GPU results</span><br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>{<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// write file for regression test<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>CUT_SAFE_CALL( cutWriteFilef( </span><span style="COLOR: maroon">&quot;./data/regression.dat&quot;</span><span style="color: #000000;">, </span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">h_odata, num_threads, 0.0));<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>}<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">else</span><span style="color: #000000;"><br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>{<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// custom output handling when no regression test running<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// in this case check if the result is equivalent to the expected solution<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>CUTBoolean res = cutComparef( reference, h_odata, num_threads);<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>printf( </span><span style="COLOR: maroon">&quot;Test %s\n&quot;</span><span style="color: #000000;">, (1 == res) ? </span><span style="COLOR: maroon">&quot;PASSED&quot;</span><span style="color: #000000;"> : </span><span style="COLOR: maroon">&quot;FAILED&quot;</span><span style="color: #000000;">);<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>}</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// cleanup memory<br />
</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>free( h_idata);<span style="color: #008000;">//</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> free the host memory used by the program; otherwise it leaks</span><br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>free( h_odata);<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>free( reference);<br />
</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>CUDA_SAFE_CALL(cudaFree(d_idata));<span style="color: #008000;">//</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> free the device memory that was allocated; otherwise device memory leaks, and after repeated leaks the graphics card may stop working</span><br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span><span style="mso-spacerun: yes"> </span>CUDA_SAFE_CALL(cudaFree(d_odata));<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">}</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span></span></p>
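<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt">The comments on grid and threads in the host code above describe size limits on the launch configuration. Those limits can be checked on the host before the kernel is launched. The following is only a minimal sketch under stated assumptions: Dim3, makeDim3 and launchConfigIsValid are hypothetical names that are not part of the CUDA SDK, and the constants are the limits assumed for compute capability 1.x hardware (at most 512 threads per block, block dimensions up to 512 x 512 x 64, grid dimensions up to 65535 x 65535 x 1). On a real device these values would normally be read from cudaGetDeviceProperties instead of being hard-coded.</p>

```cpp
// Hypothetical stand-in for CUDA's dim3, so the sketch compiles without the CUDA toolkit.
struct Dim3 {
    unsigned int x, y, z;
};

// Hypothetical helper that builds a Dim3 value.
Dim3 makeDim3(unsigned int x, unsigned int y, unsigned int z) {
    Dim3 d = { x, y, z };
    return d;
}

// Assumed limits for compute capability 1.x devices.
const unsigned int kMaxGridX = 65535;
const unsigned int kMaxGridY = 65535;
const unsigned int kMaxGridZ = 1;
const unsigned int kMaxBlockX = 512;
const unsigned int kMaxBlockY = 512;
const unsigned int kMaxBlockZ = 64;
const unsigned int kMaxThreadsPerBlock = 512;

// Returns true if the grid and block shapes stay within the assumed limits.
bool launchConfigIsValid(Dim3 grid, Dim3 block) {
    // Every dimension must be at least 1.
    if (grid.x == 0 || grid.y == 0 || grid.z == 0) return false;
    if (block.x == 0 || block.y == 0 || block.z == 0) return false;

    // Per-dimension limits on the grid and on the block.
    if (grid.x > kMaxGridX || grid.y > kMaxGridY || grid.z > kMaxGridZ) return false;
    if (block.x > kMaxBlockX || block.y > kMaxBlockY || block.z > kMaxBlockZ) return false;

    // The total number of threads in one block is limited as well.
    unsigned long long threadsPerBlock =
        (unsigned long long)block.x * block.y * block.z;
    return threadsPerBlock <= kMaxThreadsPerBlock;
}
```

<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt">With the configuration used in this listing, launchConfigIsValid(makeDim3(1, 1, 1), makeDim3(32, 1, 1)) returns true, while a block of 32 x 32 x 1 threads is rejected because 1024 exceeds the assumed per-block limit of 512.</p>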
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">/*  Template_Kernel.c </span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes">Sample program that demonstrates how to set up a CUDA project<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">* </span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes">Device side, that is, the GPU code </span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">*/</span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left">
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#ifndef</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> _TEMPLATE_KERNEL_H_<br />
</span></span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#define</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> _TEMPLATE_KERNEL_H_</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left">
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#include</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span><span style="COLOR: maroon">&lt;stdio.h&gt;<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> stdio can be used in emulation (emu) mode to print intermediate results for inspection; its functions cannot be used when the code runs on the GPU<br />
</span></span><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#define</span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"> SDATA( index)<span style="mso-spacerun: yes"> </span>CUT_BANK_CHECKER(sdata, index)</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="color: #008000;"><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;" lang="EN-US">//</span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"> This macro mainly checks shared memory bank accesses in emu mode. If the program will ultimately run on the GPU, the macro is not required, and each SDATA() in the program can be written as sdata[] instead</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left">
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">////////////////////////////////////////////////////////////////////////////////<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">//! Simple test kernel for device functionality<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">//! @param g_idata<span style="mso-spacerun: yes"> </span>input data in global memory<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">//! @param g_odata<span style="mso-spacerun: yes"> </span>output data in global memory<br />
</span><span style="FONT-SIZE: 9pt; COLOR: green; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">////////////////////////////////////////////////////////////////////////////////</span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">__global__</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span><span style="COLOR: blue">void </span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">testKernel( </span><span style="COLOR: blue">float</span><span style="color: #000000;">* g_idata, </span><span style="COLOR: blue">float</span><span style="color: #000000;">* g_odata)<br />
</span></span><span style="color: #008000;"><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;" lang="EN-US">/*</span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"> Corresponds to testKernel&lt;&lt;&lt; grid, threads, mem_size &gt;&gt;&gt;( d_idata, d_odata) in the host code. Note that the same pointers carry the prefix d in the host code but the prefix g in the kernel function. The difference comes from the point of view. In the host code the programmer thinks from the perspective of the CPU, so the graphics card is just a coprocessor device. When writing a kernel, we should get into the habit of thinking from the perspective of the GPU, so the device memory on the card is seen as <span lang="EN-US">global memory */<br />
</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">{<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// shared memory<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// the size is determined by the host application<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">extern</span><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">__shared__</span><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">float</span><span style="color: #000000;"> sdata[];<br />
</span></span><span style="color: #008000;"><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;" lang="EN-US"> /*</span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"> Declares the shared memory array used by each block. Here extern means that the size of the shared memory is defined externally, namely by the third argument, mem_size, inside the &lt;&lt;&lt;&gt;&gt;&gt; of the kernel launch in the host code *</span></span><span style="color: #008000;"><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span lang="EN-US">/</span></span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="color: #008000;"><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span lang="EN-US"><br />
</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// access thread id<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">const</span><span style="color: #000000;"> </span><span style="COLOR: blue">unsigned</span><span style="color: #000000;"> </span><span style="COLOR: blue">int</span><span style="color: #000000;"> tid = </span><span style="COLOR: blue">threadIdx</span><span style="color: #000000;">.x;<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> <span style="color: #008000;">//</span></span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> define the constant tid, which is held in a register on the GPU; threadIdx.x is the index of each thread along the x axis of its block, i.e. 0, 1, 2, ..., blockDim.x-1<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// access number of threads in this block</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">const</span><span style="color: #000000;"> </span><span style="COLOR: blue">unsigned</span><span style="color: #000000;"> </span><span style="COLOR: blue">int</span><span style="color: #000000;"> num_threads = </span><span style="COLOR: blue">blockDim</span><span style="color: #000000;">.x;<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> <span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> define a constant; blockDim.x is the first argument of dim3 threads( num_threads, 1, 1), i.e. 32</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// read in input data from global memory<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// use the bank checker macro to check for bank conflicts during host<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// emulation<br />
</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>SDATA(tid) = g_idata[tid];<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> copy from <span lang="EN-US">global memory</span> (device memory) to <span lang="EN-US">shared memory<br />
</span></span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">__syncthreads</span><span style="color: #000000;">();<span style="mso-tab-count: 2"> </span><span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> synchronize once after loading the input, so that all data is in place before the computation starts</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// perform some computations<br />
</span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"><span style="mso-spacerun: yes"> </span>SDATA(tid) = (</span><span style="COLOR: blue">float</span><span style="color: #000000;">) num_threads * SDATA( tid);<span style="mso-tab-count: 1"> <span style="color: #008000;"> </span></span><span style="color: #008000;">//</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes"><span style="color: #008000;"> in effect this just computes <span lang="EN-US">32*i<br />
</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: blue">__syncthreads</span><span style="color: #000000;">();<span style="mso-tab-count: 1"> </span><span style="color: #008000;">//</span></span></span><span style="color: #008000;"><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"> synchronize once after the computation, so that all results are ready before they are written out</span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;" lang="EN-US"> </span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-layout-grid-align: none" align="left"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"><span style="color: #000000;"> </span></span><span style="COLOR: green">// write data to global memory<br />
</span></span><span style="color: #000000;"><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="mso-spacerun: yes"> </span>g_odata[tid] = SDATA(tid);<span style="color: #008000;">//</span></span><span style="font-size: 9pt; font-family: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes;"><span style="color: #008000;"> write the results computed in <span lang="EN-US">shared</span> memory out to <span lang="EN-US">g_odata</span><br />
</span></span></span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;">}</span></span></p>
<p class="MsoNormal" style="MARGIN: 0cm 0cm 0pt"><span style="FONT-SIZE: 9pt; COLOR: blue; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US">#endif</span><span style="FONT-SIZE: 9pt; FONT-FAMILY: 新宋体; mso-hansi-font-family: 'Times New Roman'; mso-font-kerning: 0pt; mso-no-proof: yes" lang="EN-US"><span style="color: #000000;"> </span><span style="COLOR: green">// #ifndef _TEMPLATE_KERNEL_H_</span></span><span id="Post.ascx_ViewPost_PreviousAndNextEntriesDown"> </span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.vidwav.com/2009/06/first-cuda-program.htm/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CUDA Configuration for Common Integrated Development Environments (IDEs)</title>
		<link>http://www.vidwav.com/2009/06/integrate-cuda-into-ide.htm</link>
		<comments>http://www.vidwav.com/2009/06/integrate-cuda-into-ide.htm#comments</comments>
		<pubDate>Sat, 06 Jun 2009 07:16:52 +0000</pubDate>
		<dc:creator>Yu Liu</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[IDE]]></category>
		<category><![CDATA[Visual Studio]]></category>
		<category><![CDATA[Xcode]]></category>

		<guid isPermaLink="false">http://www.vidwav.com/?p=131</guid>
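		<description><![CDATA[An earlier post briefly introduced the CUDA general-purpose parallel architecture. How well CUDA is configured in a common integrated development environment (IDE) often determines how easy or hard CUDA development turns out to be.
The IDEs covered here are the Visual Studio (VS) family on Windows and Xcode on Mac OS X. As for Linux, my impression is that Linux folks use make, gcc, and gdb for project management, development, and debugging, and I suspect very few of them use an IDE. Besides, the CUDA installer already ships the relevant environment settings and makefiles, so by following those examples it is easy to write a makefile for your own project. I won't go into that here.
Back to the topic: how do you set up CU[......]<p class='read-more'><a href='http://www.vidwav.com/2009/06/integrate-cuda-into-ide.htm'>Continue reading</a></p>]]></description>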
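			<content:encoded><![CDATA[<p style="text-align: justify;">An earlier post briefly introduced the CUDA general-purpose parallel architecture. How well CUDA is configured in a common integrated development environment (IDE) often determines how easy or hard CUDA development turns out to be.</p>
<p style="text-align: justify;">The IDEs covered here are the Visual Studio (VS) family on Windows and Xcode on Mac OS X. As for Linux, my impression is that Linux folks use make, gcc, and gdb for project management, development, and debugging, and I suspect very few of them use an IDE. Besides, the CUDA installer already ships the relevant environment settings and makefiles, so by following those examples it is easy to write a makefile for your own project. I won't go into that here.</p>
<p style="text-align: justify;">Back to the topic: how do you set up CUDA in the VS and Xcode IDEs? The quickest way is to use a CUDA plug-in for the IDE. Two good plug-ins deserve a mention: CUDA VS Wizard, a VS plug-in for Windows, available <a href="http://sourceforge.net/projects/cudavswizard">here</a>; and NVCuda Plug-in, an Xcode plug-in for the Mac, available <a href="http://ftp.vidwav.com/cuda/nvcuda_plugin.zip">here</a>.</p>
<p style="text-align: justify;">I am not very familiar with the Xcode development environment, so I could never get the NVCuda plug-in installed properly, or it simply did not work. With the plug-in in place, Xcode still reported an error like &#8220;no rule to process file test.cu &#8230; for architecture i386&#8221;. A Google search suggests that some people have managed to build in Xcode with this plug-in, but nobody gave any details :(. If you know how to solve this, please let me know. <span style="color: #000080;"> <span style="color: #666699;">(Postscript: the problem has been solved; see </span></span><a title="Permanent link to NVIDIA CUDA Setup in Xcode on Mac OS X Leopard" rel="bookmark" href="http://www.vidwav.com/2009/06/mac-os-x-leopard-xcode-nvidia-cuda.htm">NVIDIA CUDA Setup in Xcode on Mac OS X Leopard</a><span style="color: #666699;">.)</span></p>
<p style="text-align: justify;">Compared with Xcode, things are much easier on Windows. The CUDA VS Wizard plug-in is simple to install, and once installed it is as easy to use as any other VS project template. For a detailed walkthrough, see the video made by the author of CUDA VS Wizard, <a href="http://ftp.vidwav.com/cuda/CUDA_easy_start_up.wmv" target="_blank">CUDA_easy_start_up.wmv</a>.</p>
<p style="text-align: justify;">One problem is that CUDA VS Wizard does not set up syntax highlighting for CUDA source files (.cu). The workaround is to use the usertype.dat file that ships with the CUDA SDK to configure highlighting in VS, as follows:</p>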
<ol>
<li style="text-align: justify;">First install the latest NVIDIA display driver, the CUDA Toolkit, and the CUDA SDK. They can be downloaded <a href="http://www.nvidia.com/object/cuda_get.html">here</a>.</li>
<li style="text-align: justify;">Then copy the usertype.dat file from %NVIDIA CUDA SDK%\doc\syntax_highlighting\visual_studio_8 to %Program Files%\Microsoft Visual Studio 8\Common7\IDE.</li>
<li style="text-align: justify;">In Visual Studio, open Tools -&gt; Options.</li>
<li style="text-align: justify;">Under Text Editor -&gt; File Extension, add the new extension “.cu” and associate it with Microsoft Visual C++.</li>
<li style="text-align: justify;">Restart Visual Studio.</li>
<li>Enjoy it!</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.vidwav.com/2009/06/integrate-cuda-into-ide.htm/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://ftp.vidwav.com/cuda/CUDA_easy_start_up.wmv" length="10827444" type="video/x-ms-wmv" />
		</item>
		<item>
		<title>The NVIDIA CUDA General-Purpose Parallel Computing Architecture</title>
		<link>http://www.vidwav.com/2009/06/nvidia-cuda-programming.htm</link>
		<comments>http://www.vidwav.com/2009/06/nvidia-cuda-programming.htm#comments</comments>
		<pubDate>Sat, 06 Jun 2009 05:45:08 +0000</pubDate>
		<dc:creator>Yu Liu</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[CUDA]]></category>
		<category><![CDATA[Parallel computing]]></category>
		<category><![CDATA[Video]]></category>

		<guid isPermaLink="false">http://www.vidwav.com/?p=121</guid>
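		<description><![CDATA[I have recently been learning NVIDIA's CUDA general-purpose parallel computing architecture, which uses NVIDIA GPUs for compute-intensive, highly parallel workloads. Interested readers can visit the NVIDIA CUDA <a href="http://www.nvidia.cn/object/cuda_home_cn.html">Chinese site</a> or <a href="http://www.nvidia.com/object/cuda_home.html">English site</a>.
So far I have found the programming model of this parallel architecture fairly easy to understand. The main flow is:
<ol style="text-align: justify;">
<li>The host (i.e. the CPU) initializes the program and the data in host memory</li>
<li>The host copies the data from its memory to the memory of the device (i.e. the GPU)</li>
<li>The code on the device (i.e. the kernel function) runs the parallel computation</li>
<li>When the computation on the device finishes, the data is copied from device memory back to host memory</li>
<li>The host carries on with the rest of its code</li>
</ol>
Therefore, Hos[......]<p class='read-more'><a href='http://www.vidwav.com/2009/06/nvidia-cuda-programming.htm'>Continue reading</a></p>]]></description>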
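			<content:encoded><![CDATA[<p style="text-align: justify;">I have recently been learning NVIDIA's CUDA general-purpose parallel computing architecture, which uses NVIDIA GPUs for compute-intensive, highly parallel workloads. Interested readers can visit the NVIDIA CUDA <a href="http://www.nvidia.cn/object/cuda_home_cn.html">Chinese site</a> or <a href="http://www.nvidia.com/object/cuda_home.html">English site</a>.</p>
<p style="text-align: justify;">So far I have found the programming model of this parallel architecture fairly easy to understand. The main flow is:</p>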
<ol style="text-align: justify;">
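<li>The host (i.e. the CPU) initializes the program and the data in host memory</li>
<li>The host copies the data from its memory to the memory of the device (i.e. the GPU)</li>
<li>The code on the device (i.e. the kernel function) runs the parallel computation</li>
<li>When the computation on the device finishes, the data is copied from device memory back to host memory</li>
<li>The host carries on with the rest of its code</li>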
</ol>
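<p style="text-align: justify;">The five steps above can be sketched as a small CUDA program. This is only a minimal illustration under stated assumptions, not code from any particular project: the kernel name <code>doubleElements</code>, the array size, and the doubling operation are made up for the example, and error checking is omitted for brevity.</p>
<pre><code>#include &lt;cstdio&gt;
#include &lt;cstdlib&gt;
#include &lt;cuda_runtime.h&gt;

// Device code (the kernel): each thread doubles one element.
// The kernel name and the operation are illustrative assumptions.
__global__ void doubleElements(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i &lt; n) {
        data[i] = 2.0f * data[i];
    }
}

int main()
{
    const int n = 1 &lt;&lt; 20;
    const size_t bytes = n * sizeof(float);

    // 1. The host initializes the program and its own memory.
    float* h_data = (float*) malloc(bytes);
    for (int i = 0; i &lt; n; ++i) {
        h_data[i] = (float) i;
    }

    // 2. The host copies its data to device memory.
    float* d_data = NULL;
    cudaMalloc((void**) &amp;d_data, bytes);
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    // 3. The kernel runs in parallel on the device.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    doubleElements&lt;&lt;&lt;blocks, threadsPerBlock&gt;&gt;&gt;(d_data, n);

    // 4. The result is copied back to host memory.
    //    This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    // 5. The host carries on with the result.
    printf("h_data[10] = %.1f\n", h_data[10]);

    cudaFree(d_data);
    free(h_data);
    return 0;
}
</code></pre>
<p style="text-align: justify;">Assuming the file is saved as flow.cu (a hypothetical name), it can be built with <code>nvcc flow.cu -o flow</code>. Steps 2 and 4 are the two host/device transfers whose cost matters for overall performance.</p>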
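<p style="text-align: justify;">Data transfer between the host and the device is therefore a key point: it is a relatively time-consuming operation and directly affects the performance of the whole program. If the time spent on data transfer plus parallel execution on the device exceeds the time the host would need to do the same work serially, the parallel architecture brings no benefit at all.</p>
<p style="text-align: justify;">For video encoders and decoders, however, data transfer is not much of a problem.</p>
<p style="text-align: justify;">For an encoder, encoding is relatively complex and most of the time is spent in the encoding process itself, so data transfer accounts for only a small share of the total cost. In addition, CUDA's page-locked memory can be used to make the transfers asynchronous, overlapping them with other work, or to carry them out by DMA.</p>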
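<p style="text-align: justify;">A minimal sketch of the page-locked, asynchronous transfer mentioned for the encoder case might look as follows. The buffer names, the single stream, and the trivial stand-in kernel <code>addOne</code> are assumptions for illustration; a real encoder would queue its own kernels and would probably use several streams. Error checking is again omitted.</p>
<pre><code>#include &lt;cstdio&gt;
#include &lt;cuda_runtime.h&gt;

// Stand-in for real encoding work (hypothetical kernel).
__global__ void addOne(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i &lt; n) {
        data[i] += 1.0f;
    }
}

int main()
{
    const int n = 1 &lt;&lt; 20;
    const size_t bytes = n * sizeof(float);

    // Page-locked (pinned) host memory, needed for copies that are
    // truly asynchronous with respect to the host.
    float* h_frame = NULL;
    cudaMallocHost((void**) &amp;h_frame, bytes);
    for (int i = 0; i &lt; n; ++i) {
        h_frame[i] = 0.0f;
    }

    float* d_frame = NULL;
    cudaMalloc((void**) &amp;d_frame, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&amp;stream);

    // The copy in, the kernel, and the copy out are queued on the stream;
    // each call returns to the host without waiting for completion.
    cudaMemcpyAsync(d_frame, h_frame, bytes, cudaMemcpyHostToDevice, stream);
    addOne&lt;&lt;&lt;(n + 255) / 256, 256, 0, stream&gt;&gt;&gt;(d_frame, n);
    cudaMemcpyAsync(h_frame, d_frame, bytes, cudaMemcpyDeviceToHost, stream);

    // The host is free to do other work here, e.g. prepare the next frame.

    // Wait until everything queued on the stream has finished.
    cudaStreamSynchronize(stream);
    printf("h_frame[0] = %.1f\n", h_frame[0]);

    cudaStreamDestroy(stream);
    cudaFree(d_frame);
    cudaFreeHost(h_frame);
    return 0;
}
</code></pre>
<p style="text-align: justify;">The design point is that <code>cudaMemcpyAsync</code> only overlaps with host work when the host buffer comes from <code>cudaMallocHost</code> (or is otherwise page-locked); with ordinary pageable memory the copy may fall back to behaving synchronously.</p>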
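<p style="text-align: justify;">For a decoder, which is relatively simple, the share of time spent on data transfer may become comparatively large. Fortunately, most decoders output directly to the screen, which means the decoder has to copy the decoded image data into the memory of the graphics card (the device, or GPU) anyway. When programming with CUDA, the step of copying from device memory back to host memory can therefore be skipped, so the transfer cost does not really increase.</p>
<p style="text-align: justify;">In summary, the CUDA general-purpose parallel computing architecture is well suited to implementing parallel algorithms for video encoding and decoding.</p>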
]]></content:encoded>
			<wfw:commentRss>http://www.vidwav.com/2009/06/nvidia-cuda-programming.htm/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to build ffmpeg with x264 on windows platform</title>
		<link>http://www.vidwav.com/2009/06/how-to-build-ffmpeg-with-x264-on-windows-platform.htm</link>
		<comments>http://www.vidwav.com/2009/06/how-to-build-ffmpeg-with-x264-on-windows-platform.htm#comments</comments>
		<pubDate>Tue, 02 Jun 2009 02:32:23 +0000</pubDate>
		<dc:creator>Yu Liu</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[ffmpeg]]></category>
		<category><![CDATA[x264]]></category>

		<guid isPermaLink="false">http://www.vidwav.com/?p=110</guid>
		<description><![CDATA[First, download some tools of the trade and the source for ffmpeg and x264
<ol style="text-align: justify;" type="1">
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;">Download MinGW from <a href="http://downloads.sourceforge.net/mingw/MinGW-5.1.3.exe">here</a>. </li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;">Download MSYS from <a href="http://downloads.sourceforge.net/mingw/MSYS-1.0.10.exe">here</a>. </li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;">Download updated bash for MSYS from <a href="http://downloads.sourceforge.net/mingw/bash-2.05b-MSYS.tar.bz2">here</a>. </li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;">Download updated w32api-3.13-mingw32-dev.tar for MinGW from <a href="http://sourceforge.net/project/showfiles.php?group_id=2435">here</a>.(here is a <a href="http://sourceforge.net/project/showfiles.php?group_id=2435&#38;package_id=11550&#38;release_id=645278">direct link</a>).</li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;">Get the latest snapshot of ffmpeg from[......]</li></ol><p class='read-more'><a href='http://www.vidwav.com/2009/06/how-to-build-ffmpeg-with-x264-on-windows-platform.htm'>Continue reading</a></p>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">First, download some tools of the trade and the sources for ffmpeg and x264:</p>
<ol style="text-align: justify;" type="1">
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Download MinGW from </span><a href="http://downloads.sourceforge.net/mingw/MinGW-5.1.3.exe"><span style="font-size: small;">here</span></a><span style="font-size: small;">. </span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Download MSYS from </span><a href="http://downloads.sourceforge.net/mingw/MSYS-1.0.10.exe"><span style="font-size: small;">here</span></a><span style="font-size: small;">. </span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Download updated bash for MSYS from </span><a href="http://downloads.sourceforge.net/mingw/bash-2.05b-MSYS.tar.bz2"><span style="font-size: small;">here</span></a><span style="font-size: small;">. </span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Download the updated w32api-3.13-mingw32-dev.tar for MinGW from </span><a href="http://sourceforge.net/project/showfiles.php?group_id=2435"><span style="font-size: small; color: #800080;">here</span></a><span style="font-size: small;"> (here is a </span><a href="http://sourceforge.net/project/showfiles.php?group_id=2435&amp;package_id=11550&amp;release_id=645278"><span style="font-size: small;">direct link</span></a><span style="font-size: small;">).</span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Get the latest snapshot of ffmpeg from </span><a href="http://ffmpeg.mplayerhq.hu/download.html"><span style="font-size: small;">here</span></a><span style="font-size: small;"> (here’s a </span><a href="http://ffmpeg.mplayerhq.hu/ffmpeg-export-snapshot.tar.bz2"><span style="font-size: small;">direct link</span></a><span style="font-size: small;">). </span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Get the latest snapshot of x264 from </span><a href="ftp://ftp.videolan.org/pub/videolan/x264/snapshots/"><span style="font-size: small;">here</span></a><span style="font-size: small;">.</span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Download updated yasm for compiling x264 from </span><a href="http://www.tortall.net/projects/yasm/releases/yasm-0.7.2-win32.exe"><span style="font-size: small;">here</span></a><span style="font-size: small;">.</span></span></li>
</ol>
<p style="text-align: justify;"><span style="font-family: Calibri;"><span style="font-size: small;">Let’s install the tools. </span></span></p>
<ol style="text-align: justify;" type="1">
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l1 level1 lfo2; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Install MinGW (choose “MinGW base tools” and “MinGW make”) into c:\mingw </span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l1 level1 lfo2; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Install MSYS into c:\msys\1.0 </span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l1 level1 lfo2; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">After installation, MSYS runs a post-install script. It asks a few questions, which you should answer as follows:<br />
Do you wish to continue with the post install? [yn ] -&gt; y<br />
Do you have MinGW installed? [yn ] -&gt; y<br />
Where is your MinGW installation? -&gt; C:/mingw </span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l1 level1 lfo2; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Copy bash-2.05b-MSYS.tar.bz2 to c:\msys\1.0 and extract it (bash.exe should go into C:\MSYS\1.0\bin).<br />
This is needed because ffmpeg’s <em><span style="font-family: Calibri;">configure</span></em> script doesn’t work with the bash 2.04 that comes with MSYS. </span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l1 level1 lfo2; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Extract w32api-3.13-mingw32-dev.tar to c:\mingw.</span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l1 level1 lfo2; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Rename yasm-0.7.2-win32.exe to yasm.exe, and copy yasm.exe to c:\msys\1.0\bin so that it can be used to compile x264.</span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l1 level1 lfo2; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Extract the ffmpeg and x264 sources. I’ll assume you’ve extracted them to c:\work\ffmpeg and c:\work\x264. </span></span></li>
</ol>
<p style="text-align: justify;"><span style="font-family: Calibri;"><span style="font-size: small;">Now it’s time to build the library.</span></span></p>
<ol style="text-align: justify;" type="1">
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l2 level1 lfo3; tab-stops: list .5in;">
<p class="MsoNormal" style="margin: 0in 0in 0pt; color: blue; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l2 level1 lfo3; tab-stops: list .5in;"><span style="color: windowtext; font-family: Calibri;"><span style="font-size: small;">Launch MSYS, then, in c:\work\x264, run the following command:<br />
</span></span><em style="mso-bidi-font-style: normal;"><span style="font-size: 10.5pt; color: maroon; font-family: Calibri;">./configure --prefix=/static --enable-shared</span></em></p></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l2 level1 lfo3; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">Run the following commands: make, <span style="font-size: 10.5pt; font-family: Calibri;">make install.<br />
Afterwards, x264.h should be in /static/include and libx264.a in /static/lib.</span></span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l2 level1 lfo3; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">In c:\work\ffmpeg, run the following command: </span></span><span style="font-family: Calibri; mso-bidi-font-family: 'Courier New';"><br />
</span><em style="mso-bidi-font-style: normal;"><span style="font-size: 10.5pt; color: maroon; font-family: Calibri;">./configure --enable-memalign-hack --enable-shared --disable-static --enable-gpl --enable-libx264 --enable-avisynth --extra-cflags=-I/static/include --extra-ldflags=-L/static/lib</span></em></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l2 level1 lfo3; tab-stops: list .5in;"><span style="font-size: small;"><span style="font-family: Calibri; mso-bidi-font-family: 'Courier New';">make</span></span></li>
<li class="MsoNormal" style="margin: 0in 0in 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l2 level1 lfo3; tab-stops: list .5in;"><span style="font-family: Calibri;"><span style="font-size: small;">In order to run ffmpeg.exe, collect the binary files and put them together in one folder:<br />
</span></span><span style="font-family: Calibri; mso-bidi-font-family: 'Courier New';"><span style="font-size: small;">ffmpeg\ffmpeg.exe<br />
ffmpeg\libavcodec\avcodec-51.dll<br />
ffmpeg\libavformat\avformat-51.dll<br />
ffmpeg\libavutil\avutil-49.dll<br />
ffmpeg\libavdevice\avdevice-52.dll</span></span></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.vidwav.com/2009/06/how-to-build-ffmpeg-with-x264-on-windows-platform.htm/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
