Inline Assembler

D, being a systems programming language, provides an inline assembler. The inline assembler is standardized for D implementations across the same CPU family; for example, the Intel Pentium inline assembler for a Win32 D compiler will be syntax-compatible with the inline assembler for Linux running on an Intel Pentium.

Implementations of D on different architectures, however, are free to innovate upon the memory model, function call/return conventions, argument passing conventions, etc.

This document describes the x86 implementation of the inline assembler.

AsmInstruction:
    Identifier : AsmInstruction
    align IntegerExpression
    even
    naked
    db Operands
    ds Operands
    di Operands
    dl Operands
    df Operands
    dd Operands
    de Operands
    Opcode
    Opcode Operands

Operands:
    Operand
    Operand , Operands

Labels

Assembler instructions can be labeled just like other statements. They can be the target of goto statements. For example:

void *pc;
asm
{
  call L1          ;
 L1:               ;
  pop  EBX         ;
  mov  pc[EBP],EBX ; // pc now points to code at L1
}

align IntegerExpression

IntegerExpression:
    IntegerLiteral
    Identifier

Causes the assembler to emit NOP instructions to align the next assembler instruction on an IntegerExpression boundary. IntegerExpression must evaluate at compile time to an integer that is a power of 2.

Aligning the start of a loop body can sometimes have a dramatic effect on the execution speed.
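The sketch below shows align in context. It assumes a 32-bit x86 target with DMD-style inline assembly (the `D_InlineAsm_X86` version identifier) and uses `extern(C)` so both arguments are passed on the stack. The function name `sumInts` is hypothetical. The `else` branch is a plain-D fallback so the code also compiles on other targets. Whether the alignment actually speeds up the loop depends on the CPU.

```d
// Hypothetical example: sum n ints with a loop whose head is 16-byte aligned.
// Assumes 32-bit x86 and DMD-style inline assembly.
extern(C) int sumInts(const(int)* p, size_t n)
{
    version (D_InlineAsm_X86)
    {
        int r;
        asm
        {
            mov EDX, p;         // EDX walks the array
            mov ECX, n;         // ECX counts remaining elements
            xor EAX, EAX;       // running sum
            jecxz Ldone;        // n == 0: skip the loop entirely
            align 16;           // pad with NOPs so Lloop starts on a 16-byte boundary
        Lloop:
            add EAX, [EDX];
            add EDX, 4;
            dec ECX;
            jnz Lloop;
        Ldone:
            mov r, EAX;
        }
        return r;
    }
    else
    {
        // Plain-D fallback for targets without the x86 inline assembler.
        int s = 0;
        foreach (i; 0 .. n)
            s += p[i];
        return s;
    }
}
```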

even

Causes the assembler to emit NOP instructions to align the next assembler instruction on an even boundary.

naked

Causes the compiler not to generate the function prolog and epilog sequences. Writing them becomes the responsibility of the inline assembly programmer, so naked is normally used when the entire function is to be written in assembler.
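Here is a minimal sketch of a naked function. It assumes 32-bit x86, DMD-style inline assembly and the C calling convention. Under that convention, [ESP] holds the return address on entry and the arguments follow it. No prolog or epilog is generated, so the code reads its arguments relative to ESP and must issue its own ret. `addNaked` is a hypothetical name.

```d
// Hypothetical example: add two ints with no compiler-generated prolog/epilog.
// Assumes 32-bit x86, cdecl: [ESP] = return address, [ESP+4] = a, [ESP+8] = b.
extern(C) int addNaked(int a, int b)
{
    version (D_InlineAsm_X86)
    {
        asm
        {
            naked;
            mov EAX, [ESP+4];   // a
            add EAX, [ESP+8];   // EAX = a + b, the return value
            ret;                // no epilog is generated, so return explicitly
        }
    }
    else
    {
        // Plain-D fallback for targets without the x86 inline assembler.
        return a + b;
    }
}
```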

db, ds, di, dl, df, dd, de

These pseudo ops are for inserting raw data directly into the code. db is for bytes, ds is for 16-bit words, di is for 32-bit words, dl is for 64-bit words, df is for 32-bit floats, dd is for 64-bit doubles, and de is for 80-bit extended reals. Each can have multiple operands. A string literal operand is treated as if it were length separate operands, one per character, where length is the number of characters in the string. For example:

asm
{
  db 5,6,0x83;   // insert bytes 0x05, 0x06, and 0x83 into code
  ds 0x1234;     // insert bytes 0x34, 0x12
  di 0x1234;     // insert bytes 0x34, 0x12, 0x00, 0x00
  dl 0x1234;     // insert bytes 0x34, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
  df 1.234;      // insert float 1.234
  dd 1.234;      // insert double 1.234
  de 1.234;      // insert real 1.234
  db "abc";      // insert bytes 0x61, 0x62, and 0x63
  ds "abc";      // insert bytes 0x61, 0x00, 0x62, 0x00, 0x63, 0x00
}
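If control flows into data inserted with these pseudo ops, the CPU executes it. The data must therefore either form valid instructions or be jumped over. The sketch below uses that behavior deliberately. It assumes 32-bit x86 and DMD-style inline assembly, and `incViaDb` is a hypothetical name. The single byte 0x40 is the encoding of inc EAX in 32-bit mode.

```d
// Hypothetical example: emit an instruction as a raw byte with db.
// Assumes 32-bit x86, where 0x40 encodes "inc EAX".
extern(C) int incViaDb(int x)
{
    version (D_InlineAsm_X86)
    {
        int r;
        asm
        {
            mov EAX, x;
            db 0x40;        // executed as inc EAX
            mov r, EAX;
        }
        return r;
    }
    else
    {
        // Plain-D fallback for targets without the x86 inline assembler.
        return x + 1;
    }
}
```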

Opcodes

A list of supported opcodes is at the end.

The following registers are supported. Register names are always in upper case.

Register:
    AL AH AX EAX
    BL BH BX EBX
    CL CH CX ECX
    DL DH DX EDX
    BP EBP
    SP ESP
    DI EDI
    SI ESI
    ES CS SS DS GS FS
    CR0 CR2 CR3 CR4
    DR0 DR1 DR2 DR3 DR6 DR7
    TR3 TR4 TR5 TR6 TR7
    ST
    ST(0) ST(1) ST(2) ST(3) ST(4) ST(5) ST(6) ST(7)
    MM0 MM1 MM2 MM3 MM4 MM5 MM6 MM7
    XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7

Special Cases

lock, rep, repe, repne, repnz, repz

These prefix instructions do not appear in the same statement as the instructions they prefix; they appear in their own statement. For example:

asm {
  rep   ;
  movsb ;
}
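The same rule applies to lock. The sketch below assumes 32-bit x86 and DMD-style inline assembly. The hypothetical `fetchAdd` atomically adds to an int and returns the previous value. The plain-D fallback in the `else` branch is not atomic; it only keeps the example compiling on other targets.

```d
// Hypothetical example: atomic fetch-and-add using a lock prefix statement.
// Assumes 32-bit x86 and DMD-style inline assembly.
extern(C) int fetchAdd(int* p, int v)
{
    version (D_InlineAsm_X86)
    {
        int old;
        asm
        {
            mov EDX, p;
            mov EAX, v;
            lock;               // prefix in its own statement
            xadd [EDX], EAX;    // [EDX] += EAX; EAX receives the previous [EDX]
            mov old, EAX;
        }
        return old;
    }
    else
    {
        // Plain-D fallback: NOT atomic.
        int prev = *p;
        *p += v;
        return prev;
    }
}
```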

pause

This opcode is not supported by the assembler; instead, use

asm {
  rep  ;
  nop  ;
}

which produces the same result: pause is encoded as the bytes F3 90, which are the same bytes as rep followed by nop.

floating point ops

Use the two-operand form of the instruction format:

fdiv ST(1);     // wrong
fmul ST;        // wrong
fdiv ST,ST(1);  // right
fmul ST,ST(0);  // right
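The sketch below multiplies two doubles on the x87 stack using the two-operand form. It assumes 32-bit x86 and DMD-style inline assembly, and `mulX87` is a hypothetical name. The final two stores leave the FPU stack empty, as it was on entry.

```d
// Hypothetical example: multiply two doubles with x87 instructions.
// Assumes 32-bit x86 and DMD-style inline assembly.
extern(C) double mulX87(double a, double b)
{
    version (D_InlineAsm_X86)
    {
        double r;
        asm
        {
            fld a;              // ST = a
            fld b;              // ST = b, ST(1) = a
            fmul ST, ST(1);     // two-operand form: ST = b * a
            fstp ST(1);         // copy the product into ST(1) and pop: ST = product
            fstp r;             // store the product to r and pop: stack is empty again
        }
        return r;
    }
    else
    {
        // Plain-D fallback for targets without the x86 inline assembler.
        return a * b;
    }
}
```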

Operands

Operand:
    AsmExp

AsmExp:
    AsmLogOrExp
    AsmLogOrExp ? AsmExp : AsmExp

AsmLogOrExp:
    AsmLogAndExp
    AsmLogAndExp || AsmLogAndExp

AsmLogAndExp:
    AsmOrExp
    AsmOrExp && AsmOrExp

AsmOrExp:
    AsmXorExp
    AsmXorExp | AsmXorExp

AsmXorExp:
    AsmAndExp
    AsmAndExp ^ AsmAndExp

AsmAndExp:
    AsmEqualExp
    AsmEqualExp & AsmEqualExp

AsmEqualExp:
    AsmRelExp
    AsmRelExp == AsmRelExp
    AsmRelExp != AsmRelExp

AsmRelExp:
    AsmShiftExp
    AsmShiftExp < AsmShiftExp
    AsmShiftExp <= AsmShiftExp
    AsmShiftExp > AsmShiftExp
    AsmShiftExp >= AsmShiftExp

AsmShiftExp:
    AsmAddExp
    AsmAddExp << AsmAddExp
    AsmAddExp >> AsmAddExp
    AsmAddExp >>> AsmAddExp

AsmAddExp:
    AsmMulExp
    AsmMulExp + AsmMulExp
    AsmMulExp - AsmMulExp

AsmMulExp:
    AsmBrExp
    AsmBrExp * AsmBrExp
    AsmBrExp / AsmBrExp
    AsmBrExp % AsmBrExp

AsmBrExp:
    AsmUnaExp
    AsmBrExp [ AsmExp ]

AsmUnaExp:
    AsmTypePrefix AsmExp
    offsetof AsmExp
    seg AsmExp
    + AsmUnaExp
    - AsmUnaExp
    ! AsmUnaExp
    ~ AsmUnaExp
    AsmPrimaryExp

AsmPrimaryExp:
    IntegerLiteral
    FloatLiteral
    __LOCAL_SIZE
    $
    Register
    DotIdentifier

DotIdentifier:
    Identifier
    Identifier . DotIdentifier

The operand syntax more or less follows the Intel CPU documentation conventions. In particular, for two-operand instructions the source is the right operand and the destination is the left operand. The syntax differs from Intel's in order to be compatible with the D language tokenizer and to simplify parsing.
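The sketch below illustrates both points. It assumes 32-bit x86 and DMD-style inline assembly, and `operandDemo` is a hypothetical name. The assembler folds a constant expression into an immediate, and each two-operand instruction writes to its left operand.

```d
// Hypothetical example: constant-expression operands and operand order.
// Assumes 32-bit x86 and DMD-style inline assembly.
extern(C) int operandDemo()
{
    version (D_InlineAsm_X86)
    {
        int r;
        asm
        {
            mov EAX, 3*5+4;     // constant expression folded by the assembler: 19
            sub EAX, 2;         // destination EAX (left), source 2 (right): EAX = 17
            mov r, EAX;         // destination r, source EAX
        }
        return r;
    }
    else
    {
        // Plain-D fallback for targets without the x86 inline assembler.
        return 3 * 5 + 4 - 2;
    }
}
```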

seg loads the number of the segment that the symbol is in. This is not relevant for flat model code; instead, move from the relevant segment register.

Operand Types

AsmTypePrefix:
    near ptr
    far ptr
    byte ptr
    short ptr
    int ptr
    word ptr
    dword ptr
    qword ptr
    float ptr
    double ptr
    real ptr

In cases where the operand size is ambiguous, as in:

add  [EAX],3          ;

it can be disambiguated by using an AsmTypePrefix:

add  byte ptr [EAX],3 ;
add  int ptr [EAX],7  ;

far ptr is not relevant for flat model code.
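The sketch below shows why the size matters by applying the same add to the same memory with two different prefixes. It assumes 32-bit x86 (little-endian) and DMD-style inline assembly. `incLowByte` and `incInt` are hypothetical names. With byte ptr, 0xFF + 1 wraps within the lowest byte. With int ptr, the carry reaches the next byte.

```d
// Hypothetical example: byte ptr vs int ptr on the same memory operand.
// Assumes 32-bit little-endian x86 and DMD-style inline assembly.
extern(C) void incLowByte(uint* p)
{
    version (D_InlineAsm_X86)
    {
        asm
        {
            mov EAX, p;
            add byte ptr [EAX], 1;  // 8-bit add: only the lowest byte changes
        }
    }
    else
    {
        // Plain-D fallback: increment only the low byte, discarding the carry.
        *p = (*p & ~0xFFu) | ((*p + 1) & 0xFFu);
    }
}

extern(C) void incInt(uint* p)
{
    version (D_InlineAsm_X86)
    {
        asm
        {
            mov EAX, p;
            add int ptr [EAX], 1;   // 32-bit add: the carry propagates upward
        }
    }
    else
    {
        // Plain-D fallback for targets without the x86 inline assembler.
        *p += 1;
    }
}
```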

Struct/Union/Class Member Offsets

To access a member of an aggregate when a pointer to the aggregate is in a register, use the member's qualified name:

struct Foo { int a,b,c; }
int bar(Foo *f) {
  asm {
    mov EBX,f          ; // load the pointer f into EBX
    mov EAX,Foo.b[EBX] ; // EAX = f.b; Foo.b supplies the offset of member b
  }
}
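Here is a self-contained variant of the example above. It assumes 32-bit x86 and DMD-style inline assembly. It uses EDX instead of EBX, so the sketch does not have to preserve a callee-saved register. It also stores the result in a local instead of relying on EAX as the return value. `getB` is a hypothetical name.

```d
// Hypothetical example: read a struct member through a pointer held in a register.
// Assumes 32-bit x86 and DMD-style inline assembly.
struct Foo { int a, b, c; }

static assert(Foo.b.offsetof == 4);  // the offset that Foo.b contributes

extern(C) int getB(Foo* f)
{
    version (D_InlineAsm_X86)
    {
        int r;
        asm
        {
            mov EDX, f;             // pointer to the aggregate
            mov EAX, Foo.b[EDX];    // load the member at offset Foo.b
            mov r, EAX;
        }
        return r;
    }
    else
    {
        // Plain-D fallback for targets without the x86 inline assembler.
        return f.b;
    }
}
```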

Stack Variables

Stack variables (variables local to a function and allocated on the stack) are accessed via the name of the variable indexed by EBP:

int foo(int x) {
  asm {
    mov EAX,x[EBP] ; // loads value of parameter x into EAX
    mov EAX,x      ; // does the same thing
  }
}

If the [EBP] is omitted, it is assumed for local variables. If naked is used, this no longer holds.
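A local variable can also be used directly as a memory operand, as in the sketch below. It assumes 32-bit x86 and DMD-style inline assembly, and `addFive` is a hypothetical name. The operand size comes from the variable's type, here int.

```d
// Hypothetical example: a local variable as a memory operand, [EBP] implied.
// Assumes 32-bit x86 and DMD-style inline assembly.
extern(C) int addFive(int x)
{
    version (D_InlineAsm_X86)
    {
        int r = x;          // local variable in the stack frame
        asm
        {
            add r, 5;       // same as add r[EBP],5; size taken from int
        }
        return r;
    }
    else
    {
        // Plain-D fallback for targets without the x86 inline assembler.
        return x + 5;
    }
}
```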

Special Symbols

$

Represents the program counter of the start of the next instruction. So,

jmp  $  ;

branches to the instruction following the jmp instruction. The $ can only appear as the target of a jmp or call instruction.

__LOCAL_SIZE

This is replaced by the number of bytes occupied by local variables in the stack frame. It is most useful when naked is used and a custom stack frame is programmed.
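The sketch below builds a stack frame by hand in a naked function. It assumes 32-bit x86, DMD-style inline assembly and the C calling convention, so after push EBP and mov EBP,ESP the first argument is at [EBP+8]. `twiceFramed` is a hypothetical name. It declares no locals, so __LOCAL_SIZE is 0 here. The same prolog would reserve the locals' space if the function had any.

```d
// Hypothetical example: hand-written prolog/epilog using __LOCAL_SIZE.
// Assumes 32-bit x86, cdecl, and DMD-style inline assembly.
extern(C) int twiceFramed(int x)
{
    version (D_InlineAsm_X86)
    {
        asm
        {
            naked;
            push EBP;               // hand-written prolog
            mov EBP, ESP;
            sub ESP, __LOCAL_SIZE;  // reserve the frame's local bytes (0 here)
            mov EAX, [EBP+8];       // first argument x
            add EAX, EAX;           // EAX = 2 * x, the return value
            mov ESP, EBP;           // hand-written epilog
            pop EBP;
            ret;
        }
    }
    else
    {
        // Plain-D fallback for targets without the x86 inline assembler.
        return 2 * x;
    }
}
```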

Opcodes Supported

aaa	aad	aam	aas	adc
add	addpd	addps	addsd	addss
and	andnpd	andnps	andpd	andps
arpl	bound	bsf	bsr	bswap
bt	btc	btr	bts	call
cbw	cdq	clc	cld	clflush
cli	clts	cmc	cmova	cmovae
cmovb	cmovbe	cmovc	cmove	cmovg
cmovge	cmovl	cmovle	cmovna	cmovnae
cmovnb	cmovnbe	cmovnc	cmovne	cmovng
cmovnge	cmovnl	cmovnle	cmovno	cmovnp
cmovns	cmovnz	cmovo	cmovp	cmovpe
cmovpo	cmovs	cmovz	cmp	cmppd
cmpps	cmps	cmpsb	cmpsd	cmpss
cmpsw	cmpxch8b	cmpxchg	comisd	comiss
cpuid	cvtdq2pd	cvtdq2ps	cvtpd2dq	cvtpd2pi
cvtpd2ps	cvtpi2pd	cvtpi2ps	cvtps2dq	cvtps2pd
cvtps2pi	cvtsd2si	cvtsd2ss	cvtsi2sd	cvtsi2ss
cvtss2sd	cvtss2si	cvttpd2dq	cvttpd2pi	cvttps2dq
cvttps2pi	cvttsd2si	cvttss2si	cwd	cwde
da	daa	das	db	dd
de	dec	df	di	div
divpd	divps	divsd	divss	dl
dq	ds	dt	dw	emms
enter	f2xm1	fabs	fadd	faddp
fbld	fbstp	fchs	fclex	fcmovb
fcmovbe	fcmove	fcmovnb	fcmovnbe	fcmovne
fcmovnu	fcmovu	fcom	fcomi	fcomip
fcomp	fcompp	fcos	fdecstp	fdisi
fdiv	fdivp	fdivr	fdivrp	feni
ffree	fiadd	ficom	ficomp	fidiv
fidivr	fild	fimul	fincstp	finit
fist	fistp	fisub	fisubr	fld
fld1	fldcw	fldenv	fldl2e	fldl2t
fldlg2	fldln2	fldpi	fldz	fmul
fmulp	fnclex	fndisi	fneni	fninit
fnop	fnsave	fnstcw	fnstenv	fnstsw
fpatan	fprem	fprem1	fptan	frndint
frstor	fsave	fscale	fsetpm	fsin
fsincos	fsqrt	fst	fstcw	fstenv
fstp	fstsw	fsub	fsubp	fsubr
fsubrp	ftst	fucom	fucomi	fucomip
fucomp	fucompp	fwait	fxam	fxch
fxrstor	fxsave	fxtract	fyl2x	fyl2xp1
hlt	idiv	imul	in	inc
ins	insb	insd	insw	int
into	invd	invlpg	iret	iretd
ja	jae	jb	jbe	jc
jcxz	je	jecxz	jg	jge
jl	jle	jmp	jna	jnae
jnb	jnbe	jnc	jne	jng
jnge	jnl	jnle	jno	jnp
jns	jnz	jo	jp	jpe
jpo	js	jz	lahf	lar
ldmxcsr	lds	lea	leave	les
lfence	lfs	lgdt	lgs	lidt
lldt	lmsw	lock	lods	lodsb
lodsd	lodsw	loop	loope	loopne
loopnz	loopz	lsl	lss	ltr
maskmovdqu	maskmovq	maxpd	maxps	maxsd
maxss	mfence	minpd	minps	minsd
minss	mov	movapd	movaps	movd
movdq2q	movdqa	movdqu	movhlps	movhpd
movhps	movlhps	movlpd	movlps	movmskpd
movmskps	movntdq	movnti	movntpd	movntps
movntq	movq	movq2dq	movs	movsb
movsd	movss	movsw	movsx	movupd
movups	movzx	mul	mulpd	mulps
mulsd	mulss	neg	nop	not
or	orpd	orps	out	outs
outsb	outsd	outsw	packssdw	packsswb
packuswb	paddb	paddd	paddq	paddsb
paddsw	paddusb	paddusw	paddw	pand
pandn	pavgb	pavgw	pcmpeqb	pcmpeqd
pcmpeqw	pcmpgtb	pcmpgtd	pcmpgtw	pextrw
pinsrw	pmaddwd	pmaxsw	pmaxub	pminsw
pminub	pmovmskb	pmulhuw	pmulhw	pmullw
pmuludq	pop	popa	popad	popf
popfd	por	prefetchnta	prefetcht0	prefetcht1
prefetcht2	psadbw	pshufd	pshufhw	pshuflw
pshufw	pslld	pslldq	psllq	psllw
psrad	psraw	psrld	psrldq	psrlq
psrlw	psubb	psubd	psubq	psubsb
psubsw	psubusb	psubusw	psubw	punpckhbw
punpckhdq	punpckhqdq	punpckhwd	punpcklbw	punpckldq
punpcklqdq	punpcklwd	push	pusha	pushad
pushf	pushfd	pxor	rcl	rcpps
rcpss	rcr	rdmsr	rdpmc	rdtsc
rep	repe	repne	repnz	repz
ret	retf	rol	ror	rsm
rsqrtps	rsqrtss	sahf	sal	sar
sbb	scas	scasb	scasd	scasw
seta	setae	setb	setbe	setc
sete	setg	setge	setl	setle
setna	setnae	setnb	setnbe	setnc
setne	setng	setnge	setnl	setnle
setno	setnp	setns	setnz	seto
setp	setpe	setpo	sets	setz
sfence	sgdt	shl	shld	shr
shrd	shufpd	shufps	sidt	sldt
smsw	sqrtpd	sqrtps	sqrtsd	sqrtss
stc	std	sti	stmxcsr	stos
stosb	stosd	stosw	str	sub
subpd	subps	subsd	subss	sysenter
sysexit	test	ucomisd	ucomiss	ud2
unpckhpd	unpckhps	unpcklpd	unpcklps	verr
verw	wait	wbinvd	wrmsr	xadd
xchg	xlat	xlatb	xor	xorpd
xorps

Pentium 4 (Prescott) Opcodes Supported

addsubpd	addsubps	fisttp	haddpd	haddps
hsubpd	hsubps	lddqu	monitor	movddup
movshdup	movsldup	mwait

AMD Opcodes Supported

pavgusb	pf2id	pfacc	pfadd	pfcmpeq
pfcmpge	pfcmpgt	pfmax	pfmin	pfmul
pfnacc	pfpnacc	pfrcp	pfrcpit1	pfrcpit2
pfrsqit1	pfrsqrt	pfsub	pfsubr	pi2fd
pmulhrw	pswapd
