超軟之家: 使用 array-label-goto 來模擬 while-switch-case 語法

2012年3月31日星期六

使用 array-label-goto 來模擬 while-switch-case 語法

最近看到在 Jserv 他的簡報中的某些Kernel原始碼，

使用 array+label + goto 來模擬while-switch-case的指令。

目的是讓整個程式片段能完整載入到CPU的Ｌ１快取中，

以提高程式效能。

以下是筆者所做的小實驗，程式碼可由此下載。

testla.c 用 array+label+goto 來實作

#include "testla.h"

#define dispatch(s, h) { int n = (*++s) ? (s - h) % 3 : 3; \

goto *tdispatch_list[n]; }

int run(const char *s)
{
const char *head = s;
void *tdispatch_list[] = {&&la, &&lb, &&lc, &&ld};
int a = 1;

la:
dmsg("a:%c\n", *s);
a++;
dispatch(s, head);
lb:
dmsg("b:%c\n", *s);
a++;
dispatch(s, head);
lc:
a-=2;
dmsg("c:%c\n", *s);
dispatch(s, head);

ld:
return a;
}

testsw.c 用while-switch-case 來實作

#include "testla.h"

int run(const char *s)
{
const char *h = s;
int n = 0;
int a = 1;
do {

switch(n) {
case 0:
dmsg("a:%c\n", *s);
a++;
break;
case 1:
dmsg("b:%c\n", *s);
a++;
break;
case 2:
a-=2;
dmsg("c:%c\n", *s);
};

n = (*++s) ? (s - h) % 3 : 3;

} while(n != 3);
return a;
}

共用的主程式部分

#include "testshared.h"

int main(int argc, char **argv) {
const char *s = "hello";
long i;
for (i = 0 ; i < 250000000; i++) {
run(s);
}
return 0;
}

編譯及測試用Ｍakefile

SHELL=bash

OS=$(shell uname -s)

ifeq ($(CROSS),1)
ifeq ($(OS),Darwin)
CROSS_COMPILE=/usr/local/gcc-4.5.2-for-linux64/bin/x86_64-pc-linux-
CC=$(CROSS_COMPILE)gcc
AS=$(CROSS_COMPILE)as
LD=$(CROSS_COMPILE)ld
ifeq ($(DUMP_AD),1)
CFLAGS = -Wa,-a,-ad
endif
endif
endif

ifeq ($(NO_O2),1)
CFLAGS += --save-temps
else
CFLAGS += -O2 --save-temps
endif

all: testla testsw

.c.o:
$(CC) -c $< -I. $(CFLAGS)

testla: main.o testla.o
$(CC) -o $@ $^

testsw: main.o testsw.o
$(CC) -o $@ $^

run: testla testsw
time ./testla
@echo
time ./testsw

執行結果

(以執行環境：windows xp, cygwin 為例)
無最佳化 -O2參數，testla 檔案大小 20136 Bytes; testla 檔案大小 19624 Bytes

testlabelsw$ make run

time ./testla

real 0m5.596s
user 0m5.546s
sys 0m0.092s

time ./testsw

real 0m6.034s
user 0m5.968s
sys 0m0.077s

有最佳化 -O2 參數, 檔案大小 20136Bytes; testla 檔案大小 19624Bytes

testlabelsw$ make run
time ./testla

real 0m3.172s
user 0m3.077s
sys 0m0.061s

time ./testsw

real 0m3.219s
user 0m3.140s
sys 0m0.093s

結論

目前測試只在User Space中，執行函式2.5億次

無最佳化下，大約有 7.25 % 的效能提升。

有最佳化下，大約只有 1.46 % 的效能提升。

以目前的結果來看，若去除誤差值約0.2%，使用最佳化的情況下，

在user-space下，可能不太有價值去做這樣的更動；

但就系統Kernel來說，這個方法提供一種優化演算法的小技巧，值得玩玩。

更新04-19

根據Jserv的回覆：筆者添加到5個switch-case，在有最佳化的情況下，效能提升至 9~10% 之間（測試數據），所以在此更正結論。原始碼已更新，請由此下載。

3 則留言:

jserv 提到...: switch-case 數量太小，不足以反映實際狀況 :-)

其實只要把 switch-case 的數量從這個例子的 3 個提升到 5 個，就能帶來 10% 的時間縮減; 2012年4月19日凌晨1:22
超軟之家提到...: 你是正確的，已更正及修改範例程式。; 2012年4月19日下午3:01
jserv 提到...: 深入的效能探討可見: http://eli.thegreenplace.net/2012/07/12/computed-goto-for-efficient-dispatch-tables/; 2012年7月23日晚上11:51

張貼留言

2012年3月31日 星期六

使用 array-label-goto 來模擬 while-switch-case 語法

3 則留言:

2012年3月31日星期六