SSE
Uncacheable (UC), also called non-cacheable: accesses are not cached
Write-Combining (WC): writes are merged in a buffer before going to memory
http://www.pcinlife.com/article/cpumb/2007-09-18/1190104122d436_7.html
void _mm_stream_pi(__m64 * p , __m64 a );
http://msdn.microsoft.com/zh-tw/library/03a2xda2%28v=vs.90%29.aspx
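MOVNTDQ and MOVNTPD both store 128 bits from an XMM register to the given memory address with a non-temporal hint: the data bypasses the cache and is gathered in a 64-byte write-combining buffer before being written out. The two instructions differ in the source operand: for MOVNTDQ it is packed integers (bytes, words, doublewords, or quadwords), while for MOVNTPD it is two packed double-precision floats.
MOVDQA loads 128 bits from main memory at a time through the write-combining buffer, which fills only 1/4 of a 64-byte cache line.
SSE4.1's MOVNTDQA can, when certain conditions are met, load 64 contiguous bytes from non-cacheable (write-combining) memory at a time, filling a whole cache line. This mode is called Streaming Load.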
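MPSADBW (sometimes shortened to MPSAD) computes sums of absolute differences (SAD) in a single instruction: eight sums, each over a 4-byte group, for 32 byte differences in total.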
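To make the MPSADBW description concrete, here is a scalar sketch of what the instruction computes (the intrinsic is `_mm_mpsadbw_epu8`). It is a reference model for reading, not an optimized routine; the function name `mpsadbw_model` is made up for this note.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference model of MPSADBW (hypothetical helper name).
   a, b : the 16 bytes of the first and second XMM operand
   imm8 : only bits 2:0 are used
   out  : the eight 16-bit sums written to the destination */
void mpsadbw_model(const uint8_t a[16], const uint8_t b[16],
                   unsigned imm8, uint16_t out[8])
{
    unsigned a_off = ((imm8 >> 2) & 1u) * 4u; /* bit 2: window starts at byte 0 or 4 of a */
    unsigned b_off = (imm8 & 3u) * 4u;        /* bits 1:0: which 4-byte block of b */

    for (unsigned i = 0; i < 8; ++i) {
        unsigned sum = 0;
        /* Slide a 4-byte window over a, compare against the fixed block of b. */
        for (unsigned j = 0; j < 4; ++j)
            sum += (unsigned)abs((int)a[a_off + i + j] - (int)b[b_off + j]);
        out[i] = (uint16_t)sum;
    }
}
```

In a motion-estimation search, `b` would hold 4 pixels of the reference block and `a` the candidate row, so one call scores eight neighbouring positions.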
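PHMINPOSUW finds the minimum of the eight unsigned 16-bit words (UWORD) in the source operand and stores it in the low word of the destination, with the position index of that minimum placed right above it (bits 18:16). This operation is used often in integer operations and sub-pixel motion estimation.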
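A scalar sketch of the PHMINPOSUW result may help here (the intrinsic is `_mm_minpos_epu16`). The `MinPos` struct and the function name are invented for illustration; the real instruction packs both fields into one XMM register.

```c
#include <stdint.h>

/* Hypothetical result type: the instruction itself returns both fields
   packed into the low 32 bits of the destination register. */
typedef struct {
    uint16_t value; /* smallest of the eight words */
    unsigned index; /* position 0..7 of that word  */
} MinPos;

/* Scalar reference model of PHMINPOSUW. On ties the lowest index wins,
   because only a strictly smaller value replaces the current minimum. */
MinPos phminposuw_model(const uint16_t v[8])
{
    MinPos r = { v[0], 0 };
    for (unsigned i = 1; i < 8; ++i) {
        if (v[i] < r.value) {
            r.value = v[i];
            r.index = i;
        }
    }
    return r;
}
```

Fed with the eight SAD values from MPSADBW, this picks the best-matching position in one step.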
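Twelve integer format-conversion instructions convert bytes (BYTE), words (WORD), and doublewords (DWORD) to words (WORD), doublewords (DWORD), and quadwords (QWORD) in a single instruction. They are PMOVSXxx (sign extension) and PMOVZXxx (zero extension), where xx is one of six format pairs (BW, BD, BQ, WD, WQ, DQ). For example, BW takes 8 packed 8-bit integers and widens them to 8 16-bit words.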
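The difference between the two families only shows up on values with the top bit set. A small scalar sketch of the BW forms (intrinsics `_mm_cvtepi8_epi16` and `_mm_cvtepu8_epi16`); the function names are made up for this note:

```c
#include <stdint.h>

/* Scalar model of PMOVSXBW: sign-extend 8 bytes to 8 words. */
void pmovsxbw_model(const int8_t in[8], int16_t out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = (int16_t)in[i]; /* 0xFF becomes 0xFFFF (-1) */
}

/* Scalar model of PMOVZXBW: zero-extend 8 bytes to 8 words. */
void pmovzxbw_model(const uint8_t in[8], uint16_t out[8])
{
    for (int i = 0; i < 8; ++i)
        out[i] = (uint16_t)in[i]; /* 0xFF becomes 0x00FF (255) */
}
```

Zero extension is the usual choice for pixel data, sign extension for signed samples such as audio or residuals.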
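The set includes dot product, DWORD multiply, inserting data into or extracting data from XMM registers, conditional blending, rounding, and more.
DPPS (single precision, one 4D vector per instruction) and DPPD (double precision, one 2D vector per instruction)
Description of the DP4 instruction in D3D: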
dest.x = dest.y = dest.z = dest.w =
(src0.x * src1.x) + (src0.y * src1.y) +
(src0.z * src1.z) + (src0.w * src1.w);
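The DP4 description above translates directly into C. This is a plain scalar sketch with a made-up `Vec4` type. As far as I understand it, DPPS with an immediate of 0xFF (`_mm_dp_ps(a, b, 0xFF)`) does the same thing: all four products are summed and the result is written to all four lanes.

```c
/* Hypothetical 4-component vector, mirroring the x/y/z/w fields of DP4. */
typedef struct {
    float x, y, z, w;
} Vec4;

/* The scalar value DP4 computes. */
float dp4(Vec4 a, Vec4 b)
{
    return (a.x * b.x) + (a.y * b.y) + (a.z * b.z) + (a.w * b.w);
}

/* DP4 writes that value to every component of the destination. */
Vec4 dp4_broadcast(Vec4 a, Vec4 b)
{
    float d = dp4(a, b);
    Vec4 dest = { d, d, d, d };
    return dest;
}
```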
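A Radix-16 divider and a Super Shuffle engine: the former significantly speeds up division, square root, and similar operations; the latter significantly speeds up rearranging (shuffling) 128-bit data blocks.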
Guidelines for When to Use EMMS
http://msdn.microsoft.com/zh-cn/library/ays9ef83%28v=vs.90%29.aspx
http://msdn.microsoft.com/zh-cn/library/8z56xtyh%28v=vs.90%29.aspx
http://www.csie.ntu.edu.tw/~r89004/hive/sse/page_7.html
prefetch and movntps
_mm_load_ps _mm_prefetch _mm_mul_ps _mm_stream_ps
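The four intrinsics above are typically combined in a loop that reads an array once, processes it, and writes the result without polluting the cache. The following is a minimal sketch under my own assumptions: the function name `scale_stream`, the 16-byte alignment requirement, and the prefetch distance are illustrative and untuned. A plain C fallback is included so the file still builds on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

#define PREFETCH_AHEAD 64 /* in floats (256 bytes); an untuned guess */

/* dst[i] = src[i] * k for n floats.
   Assumes n is a multiple of 4 and both pointers are 16-byte aligned,
   as required by _mm_load_ps and _mm_stream_ps. */
void scale_stream(float *dst, const float *src, size_t n, float k)
{
    assert(n % 4 == 0);
    assert(((uintptr_t)dst & 15u) == 0 && ((uintptr_t)src & 15u) == 0);
#if HAVE_SSE
    const __m128 vk = _mm_set1_ps(k);
    for (size_t i = 0; i < n; i += 4) {
        /* Once per 64-byte line, hint that data further ahead is needed soon
           but will not be reused (non-temporal). */
        if (i % 16 == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)(src + i + PREFETCH_AHEAD), _MM_HINT_NTA);

        __m128 v = _mm_load_ps(src + i);           /* aligned 128-bit load    */
        _mm_stream_ps(dst + i, _mm_mul_ps(v, vk)); /* MOVNTPS: bypass cache   */
    }
    _mm_sfence(); /* make the streamed stores visible before returning */
#else
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
#endif
}
```

Streaming stores tend to pay off only when the output is large and not read again soon; for small buffers an ordinary `_mm_store_ps` is usually the better choice.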
Graphics and image processing: high-quality, fast image scaling (2) [repost]
http://hi.baidu.com/ustc_/blog/item/97d9226ef253c6dd81cb4acf.html
http://bbs.chinaunix.net/forum-226-1.html
You should not access the __m64 fields directly. You can, however, see these types in the debugger. A variable of type __m64 maps to the MM[0-7] registers.
Variables of type __m64 are automatically aligned on 8-byte boundaries.
The __m64 data type is not supported on x64 processors. Applications that use __m64 as part of MMX intrinsics must be rewritten to use equivalent SSE and SSE2 intrinsics.
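As a rough illustration of the rewrite the MSDN note asks for, here is how an MMX-style packed 16-bit add (`_mm_add_pi16` on `__m64`) might look with SSE2 instead. The function name `add_words` is hypothetical, and a scalar fallback is included so the sketch builds on targets without SSE2.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define USE_SSE2 1
#else
#define USE_SSE2 0
#endif

/* out[i] = a[i] + b[i] for eight 16-bit lanes, with wraparound.
   The MMX version would handle four lanes per __m64 and need _mm_empty()
   (EMMS) before any x87 floating-point code runs. */
void add_words(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
#if USE_SSE2
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* unaligned loads, */
    __m128i vb = _mm_loadu_si128((const __m128i *)b); /* no alignment assumed */
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
    /* No EMMS here: XMM registers do not alias the x87 register stack. */
#else
    for (int i = 0; i < 8; ++i)
        out[i] = (int16_t)((uint16_t)a[i] + (uint16_t)b[i]);
#endif
}
```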
Saturation Arithmetic and Wraparound Mode
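The two modes differ only in what happens on overflow. A scalar sketch for 8-bit lanes, with function names of my own choosing; in MMX terms these correspond to PADDB (wraparound), PADDUSB (unsigned saturation), and PADDSB (signed saturation):

```c
#include <stdint.h>

/* Wraparound: the carry out of bit 7 is discarded, so 200 + 100 gives 44. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned saturation: results above 255 are clamped to 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + (unsigned)b;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* Signed saturation: results are clamped to the range [-128, 127]. */
int8_t add_sat_s8(int8_t a, int8_t b)
{
    int s = (int)a + (int)b;
    if (s > 127)  s = 127;
    if (s < -128) s = -128;
    return (int8_t)s;
}
```

Saturation is what image code usually wants: brightening a nearly white pixel should leave it white rather than wrap it around to dark.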
http://www.codeproject.com/KB/recipes/mmxintro.aspx
Intel Software manuals.
Volume 1: Basic Architecture. Chapter 8. Programming with the Intel MMX™ Technology
Volume 2: Instruction Set Reference http://developer.intel.com/design/archives/processors/mmx/index.htm
MSDN, MMX Technology. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclang/html/vcrefsupportformmxtechnology.asp
Microsoft Visual C++ CPUID sample. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcsample/html/vcsamcpuiddeterminecpucapabilities.asp
Microsoft Visual C++ MMXSwarm sample. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vcsample/html/vcsamMMXSwarmSampleDemonstratesCImageVisualCsMMXSupport.asp
Matt Pietrek. Under The Hood. February 1998 issue of Microsoft Systems Journal. http://www.microsoft.com/msj/0298/hood0298.aspx