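"Deciding What Not to Do" Matters More
from 國二學生認真打雜 ("An eighth-grader earnestly doing odd jobs") by ericsk
Recently I had a chance to hear someone who builds websites talk about some of his ideas. The central idea of his whole talk was:
Deciding what not to do is hard, but very important.
It struck me as very sensible, because the idea holds no matter where you apply it!
What the speaker meant is that when you build a website, the most important thing is to be clear about your positioning, so that "when users think of X, they want to come to your site." Many sites that want to get into e-commerce have plenty of resources and try to cover everything from day one, launching every feature they can at once. Users who are lured to the site then face a dazzling array of options, can't see the forest for the trees, and are soon lost. A site like that has a hard time making users "stick."
This immediately brings to mind many other things in life. A second-year master's student who is just starting a thesis always feels invincible, sure he can cover some topic exhaustively. But once the research actually begins, he discovers the goal is too big, keeps narrowing it, and in the end barely squeezes out a thesis. Before summer vacation, besides planning to have fun, students often set a pile of goals such as "I'll learn X" and "I'll do Y," and the usual result is that nothing gets finished, or everything is only half done.
Whatever you do, it really is better to focus, do each thing well step by step, and only then think about the next one, rather than taking on many things at once, burning the candle at both ends, and doing every one of them sloppily. "Jack of all trades, master of none" is not just something people say.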
Saturday, July 31, 2010
Assembly Language Is Very Ill-Suited to Being Learned as a Standalone Programming Language
Albert.B
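I remember saying this before, but I'll stress it once more...
"Assembly language is very ill-suited to being learned as a standalone programming language!!!"
"Assembly language" is a set of mnemonics that map one-to-one onto the binary instruction codes of a particular CPU. When you say "assembly language," you actually mean the mnemonics for one particular CPU's instruction set, and, more narrowly still, the format that one particular assembler accepts.
Why shouldn't it be taught that way? Because to understand assembly language, a beginner must first understand the CPU, and to understand a CPU he needs at least a general idea of how CPUs work. So he must first study computer architecture, or at the very least "instruction sets," and only then be taught a particular CPU's assembly language, using that CPU's instruction set as the example.
Once you've learned C, the C you use anywhere is much the same. But if you learn x86 assembly without understanding computer architecture, you may have no idea how to write MIPS or Alpha assembly. Never mind other CPUs: after reading Shih Wei-Ming's (施威銘) assembly book, I don't believe you'd know how to print a character under UNIX on the very same x86, or how to access hardware in assembly when you're not allowed to rely on INT 21h (DOS) or INT 19h (BIOS). Yet those are exactly what assembly language is really for.
No book in the world can teach a beginner assembly language well without going through computer architecture. If you want a foundation and clear concepts, first find a clear computer architecture textbook. Without that foundation, Shih Wei-Ming's book (and the vast majority of "introductory" assembly books on the market) will only show you how to call code that others have already written for you inside DOS and the BIOS; as for how that code is written, you probably won't learn.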
--
A small temple stirs up big demon winds;
a shallow pond teems with turtles.
--
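> : I strongly agree with this.
> : x86 books on the market all seem to explain
> : how to use assembly language under DOS,
> : while the parts I'd really like
> : to understand are never mentioned..
> I've also seen people post that if writing assembly just means calling programs others wrote (int),
> then it doesn't really count as writing assembly. That's a bit harsh, but assembly books on the market
> really do mostly assume "you want to learn x86 assembly under DOS." I'd also really like to know how,
> without the DOS and BIOS interrupts, to write a program that does what those interrupts do or that
> accesses hardware. Is there any book on that topic? I imagine it's pretty hard.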
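That's exactly what I was talking about....
Is there a book on this topic? Sorry, no.
Why not? Because it would never end. "Accessing hardware" means you have to know that hardware's interface (that is, the rules for communicating with it). How many interfaces are there in the world? How many sets of rules? You can invent a new hardware interface yourself at any time; how could any book tell you all the rules?
Take an example: you know IDE hard disks, right? Do you know how to access one? That's a question for your IDE disk controller. If you're lucky and the controller uses the traditional addressing, you'll know its IRQ is 14 and its I/O ports are 1F0h-1F7h (if I remember correctly). Then you dig up the old ST506/412 interface specification, which tells you that 1F0h is the data register, 1F7h is the command (and status) register, and so on, and you use in/out to read and write the appropriate values to those registers.
If not, you have to read that controller's interface manual and work out for yourself how to access it.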
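To make the in/out description above concrete, here is a minimal sketch in C++, assuming the later ATA "LBA28 PIO" convention on the legacy primary channel (ports 0x1F0-0x1F7) rather than the original ST506 cylinder/head/sector addressing. Real port I/O needs kernel (ring 0) privilege, so the sketch replaces the CPU's OUT instruction with a hypothetical `MockPorts::outb` that only records each write. The register layout and the READ SECTORS command code (0x20) follow the ATA convention; every type and function name here is invented for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the x86 OUT instruction. Real port I/O needs
// kernel privilege, so this only records (port, value) pairs in order.
struct PortWrite {
    std::uint16_t port;
    std::uint8_t value;
};

struct MockPorts {
    std::vector<PortWrite> writes;
    void outb(std::uint16_t port, std::uint8_t value) { writes.push_back({port, value}); }
};

// Legacy primary ATA channel registers (data register is 0x1F0).
constexpr std::uint16_t kSectorCount   = 0x1F2;
constexpr std::uint16_t kLbaLow        = 0x1F3; // LBA bits 0-7
constexpr std::uint16_t kLbaMid        = 0x1F4; // LBA bits 8-15
constexpr std::uint16_t kLbaHigh       = 0x1F5; // LBA bits 16-23
constexpr std::uint16_t kDriveHead     = 0x1F6; // LBA bits 24-27, drive select, LBA flag
constexpr std::uint16_t kCommandStatus = 0x1F7; // write: command, read: status
constexpr std::uint8_t kCmdReadSectors = 0x20;  // READ SECTORS (PIO)

// Issue an LBA28 READ SECTORS command to the master drive. Afterwards a real
// driver would poll the status register (0x1F7) until BSY (0x80) clears and
// DRQ (0x08) is set, then read 256 16-bit words per sector from port 0x1F0.
void ata_issue_read_lba28(MockPorts& io, std::uint32_t lba, std::uint8_t count) {
    // 0xE0 = LBA mode (bit 6) plus the two legacy always-set bits; bit 4 = 0 selects the master.
    io.outb(kDriveHead, static_cast<std::uint8_t>(0xE0 | ((lba >> 24) & 0x0F)));
    io.outb(kSectorCount, count);
    io.outb(kLbaLow, static_cast<std::uint8_t>(lba & 0xFF));
    io.outb(kLbaMid, static_cast<std::uint8_t>((lba >> 8) & 0xFF));
    io.outb(kLbaHigh, static_cast<std::uint8_t>((lba >> 16) & 0xFF));
    io.outb(kCommandStatus, kCmdReadSectors);
}
```

For example, `ata_issue_read_lba28(io, 0x01234567, 1)` records six writes, the last being 0x20 to port 0x1F7. The point is the one the post makes: "accessing hardware" is nothing more than writing the values a device's interface rules require to the right addresses, in the right order.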
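A lot of trouble, isn't it? But underneath they're all the same: through the CPU's accesses to the outside world (memory or I/O ports), you issue commands that a hardware device accepts, just as you learn to issue commands to the CPU in the form x86 accepts. The set of commands a device accepts is its instruction set.
So you can see why I push so hard for anyone learning assembly to study computer architecture first: you need these basic concepts before you can know, while learning assembly, what it is you're learning, and what kind of material to look for when you run into a problem.
And only by understanding computer architecture can you understand why we usually write xor ax,ax rather than mov ax,0, even though both leave AX set to zero.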
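As a small illustration of the encoding side of that remark, the C++ sketch below records the 16-bit (real-mode) machine code of both instructions, hand-assembled here rather than taken from any particular assembler's listing: `xor ax,ax` is two bytes (31 C0) and `mov ax,0` is three (B8 00 00). The two are not fully interchangeable: XOR updates the flags (ZF set, CF and OF cleared), while MOV leaves them untouched, so MOV is the one to use when the flags must survive.

```cpp
#include <array>
#include <cstdint>

// 16-bit encodings, hand-assembled for illustration:
//   xor ax,ax -> 31 C0     (opcode 31 /r; ModRM C0 = register form, AX with AX)
//   mov ax,0  -> B8 00 00  (opcode B8+rw followed by a 16-bit immediate)
constexpr std::array<std::uint8_t, 2> kXorAxAx{0x31, 0xC0};
constexpr std::array<std::uint8_t, 3> kMovAx0{0xB8, 0x00, 0x00};

// Any value XORed with itself is zero, so "xor ax,ax" always clears AX.
constexpr std::uint16_t xor_self(std::uint16_t ax) {
    return static_cast<std::uint16_t>(ax ^ ax);
}

static_assert(kXorAxAx.size() < kMovAx0.size(), "the xor form is one byte shorter");
static_assert(xor_self(0xBEEF) == 0, "x ^ x == 0 for every x");
```

The shorter encoding is the classic reason for the idiom; you only see why it matters once you know how instructions are encoded and fetched.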
Reference:
Please recommend an assembly language book suitable for beginners
TIO Young Professional Association
http://www.taiwaneseinone.com/main/about-tio
About TIO
Our Missions
To foster and strengthen networks for our members;
To facilitate professional development for our members;
To promote balanced lifestyles;
To get involved in local community functions.
TIO Young Professional Association is an organization for young professionals and working individuals who reside and work in Canada. While the majority of the members are Taiwanese-Canadians, TIO's member base consists of individuals from diverse backgrounds.
Our Target Groups
Most of our members were born in Taiwan or other parts of Asia, educated in North America, and are currently working in the Greater Vancouver area. Although the word “professional” usually refers only to those who have a professional designation, such as lawyers, accountants, doctors, engineers, etc., TIO considers everyone who has entered the workforce or a graduate/professional school program on a full-time basis as part of our “professional” group.
Completion of Post-Secondary Education
Working full-time
Enrolled in graduate school or professional school.
Professional Subgroups
We have established 9 professional subgroups to give members the opportunity to interact more with other members of their profession:
ACE (Art, Culture & Education)
AFA (Accounting, Finance & Administration)
BRED (Building & Real Estate Development)
CompIT (Computer & IT)
EngSci (Engineering & Science)
GPL (Government, Politics & Legal)
HealthPro (Health Care Professionals)
HEaT (Hospitality, Entertainment & Tourism)
MaRS (Marketing, Retail & Sales)
Friday, July 30, 2010
MySQL: difference between the DECIMAL and FLOAT types
Thursday, July 29, 2010
Online Course Lectures at Harvard
http://computerscience1.tv/
http://academicearth.org/subjects/computer-science
Free Science Online
Free Video Lectures
LectureShare
SICP Lectures
Video Lectures.net
LectureFox
Free Streaming Audio and Video Lectures
Ted
There's a host of videos from TED conferences and every one of them is worth watching. I can strongly recommend the talks by Richard Dawkins, Craig Venter, Hans Rosling and many others.
Reference:
http://stackoverflow.com/questions/24319/where-can-i-find-good-technical-video-podcasts-or-videos-for-download
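Hou Jie's Viewpoint (侯捷觀點) - Random Talk on Programmers and Programming
Hou Jie's Viewpoint
Random Talk on Programmers and Programming
Beijing: Programmer (《程式師》), 2001.05
Taipei: Run!PC, 2001.06
About the author: Hou Jie (侯捷), a Taiwanese technical writer on computing who writes, translates, and reviews. He often writes essays for his own enjoyment and as a statement of his aims.
Personal website: www.jjhou.com
Beijing mirror: www.csdn.net/expert/jjhou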
--------------------------------------------------------------------------------
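The "Hou Jie's Viewpoint" column has now run for four issues. Through it I began to engage with the mainland Chinese computer magazine Programmer and the technical website CSDN, and I have accumulated a good many observations and impressions. The earlier issues were all about technology: either in-depth book reviews or advanced techniques. This time let's relax a little and talk about programmers and programming. Many of the topics were prompted by readers' letters, and I have already answered many of them on my website, so you may have read some of this text there. Some of my views may be harsh to read and grating to hear. But if you don't treat me as an outsider, you should be able to consider them calmly. Taiwan has many of the same problems, and I often write to criticize them there too.
There is a saying: if you want to make people angry, lie; if you want to make them furious, tell the truth. Here comes someone telling the truth; I hope you stay calm.
Impatience for Quick Gains Is the Cardinal Sin
A reader wrote to tell me he was very anxious. He earned 300 RMB a month, and his family was struggling. He hoped to learn VC/MFC quickly and get into the IT industry to make money. The letter was long, and as I read it I couldn't help feeling anxious for him too.
Many readers, though not in such urgent circumstances, make their sense of urgency just as plain. All they want is to learn this technology or that one as fast as possible.
But what thing, what technology, can be learned quickly? If a technology can be learned quickly, people who know it are easy to find, and by the law of supply and demand it can't pay very well. So be prepared: a high barrier to entry means a high cost of learning and a high reward; a low barrier means a low cost and a low reward.
It's a platitude, of course. The most dangerous mindset here is the rush for quick gains. From readers' letters, and from the many posts on CSDN, I get the sense that a great many people study IT and enter the IT industry because they believe it can lift them out of hardship and away from poverty.
Yes, the IT industry has those prospects, but you need the ability to match. Before biting into a hard walnut, check your teeth.
"Loving profit" is basic human nature. Acer's president, Mr. Stan Shih (施振榮), strongly promotes the view that "loving ease and hating toil" is fundamental to human nature and the driving force of progress. Who could disagree? Loving profit is fine; chasing short-term profit is not. Short-term thinking means shortsightedness, and everything you do ends up going around in small circles.
Chinese opera performers have a saying: to shine in front of people, you must suffer behind their backs; one minute on stage takes ten years of practice off stage. Our ancestors left us plenty of such teachings, and as Chinese we should all know them by heart.
To friends in a hurry I have only one thing to say: don't build a tall tower on shifting sand. You understand this perfectly well, so why do you lose sight of it when it comes to yourself? Hurrying is useless, and impatience only makes things worse. Be patient and build your foundation. Everything takes investment, and building a foundation is your investment in your own future. If you want to know how to build a foundation step by step, there is an article on my website, "97/06 選義按部 考辭就班" (roughly, "arrange your ideas in order, place your words step by step"); please take a look.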
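What Good Are Flame Wars?
The most worthless, yet perpetually noisy, flame wars on programming forums are topics like "VB vs. Delphi: which is better," "BCB vs. VC: which is superior," "Linux vs. Windows: which is cooler," and "Java vs. C++: which is stronger." Every time one appears, it sprawls out, heats up, and quickly shoots to the top as the hottest topic around. Everyone champions his favorite, spit flying, yet nobody ever convinces anyone from the other camp; they only talk to their own side, for their own side's satisfaction.
What is the point of such debates? Many people think they are thinking when they are merely rearranging their prejudices. In the end it is just a fight over who gets the last word. And a quarrel that starts with a scratch almost always ends with a stab.
Evaluating tools and technologies is a high-level performance, and I have great respect for anyone truly capable of it. But do these people, championing their favorites with spit flying, really understand both sides deeply? Much of the time what we see is simply ignorance, and ignorance is the kind of thing that, once you have it, gives you tremendous courage.
Many people like a tool only because it was their first experience. They play with it, pick up a few insights, can name some of its strengths, and start "comparing." You have seen only the peony's brilliance; how would you know the plum blossom's cool fragrance, or the orchid's ethereal grace?
Most people use a tool not because it is the best, nor because they searched far and wide for it, but purely by chance. Choosing different tools for different application environments is the smartest approach, but I really doubt how many people, given how complex today's tools (and the technologies behind them) have become, can master more than one tool of the same kind at the same time. Chase two rabbits and you catch neither; I still believe you gain more by mastering one tool and pushing it to its full potential. The tools people compare are all market leaders; how bad can they be? If two rivals can compete head to head, each must have its own territory, both technically and non-technically (availability of resources, reliability of the brand). How meaningful is your comparison? How comprehensive?
Most people can't master two tools of the same kind at once, and beginners who listen to the grand pronouncements of anonymous online "masters" can't make an informed choice anyway (if they do choose, they are probably picking blindly). Debates that offer no data, where the commenters show no credentials whatsoever, are meaningless and exist purely for self-gratification. A waste of network resources!
Bjarne Stroustrup, the creator of C++, answered the following questions in the FAQ on his own web page (and on many other occasions). Although he is talking about languages, his answers apply more broadly too, and they are worth chewing on (note: the full text was translated by Meng Yan (孟岩) and can be read on my website):
Q: Would you compare C++ with other languages?
A: Sorry, I won't. You can find the reason in the introductory text of The Design and Evolution of C++. Many people have invited me to compare C++ with other languages, and I have decided not to. Here I want to restate a point I have been making for a long time: comparisons between languages are rarely meaningful, and even less often fair. A reasonable comparison of mainstream languages takes a great deal of effort, more than most people are willing to spend. It also requires broad experience across many application areas, a detached and impartial stance, and a commitment to fairness. ...
People keep trying to compare languages, and there are things I have noticed again and again that, frankly, worry me. The authors try hard to appear impartial, but they end up hopelessly biased toward a particular kind of application, a particular programming style, or a particular programmer culture. Worse, when one language is clearly better known than another, a subtle sleight of hand begins: the flaws of the better-known language are deliberately downplayed and glossed over in roundabout ways, while the same flaws in the lesser-known language are described as fatal. Likewise, the technical information about the better-known language is frequently updated, while that about the lesser-known one is often years out of date. What fairness or meaning can such a comparison have?
Q: Other people compare their languages with C++ all the time. Does that bother you?
A: When the comparisons are incomplete or commercially motivated, yes, it annoys me. The most widely circulated comparisons are mostly published by proponents of some language, say Z, in order to prove that Z is better than other languages. Because C++ is so widely used, it is usually the first name on the hit list. Such articles are often bundled with Z vendors' products as a marketing tool. Shockingly, a good many of these reviews cite articles written by employees of the companies that develop Z, and these flimsy articles aim only to prove that Z is the best. Especially when the reviews do contain scattered facts..., carefully selected facts, though seemingly correct, can sometimes be completely misleading.
In future, when you read a language comparison, pay attention to who wrote it, whether its claims rest on facts and are held to a standard of fairness, and especially whether its criteria are fair and reasonable for every language it covers. That is not easy to achieve.
As I said, truly incisive technical comparisons are very valuable to researchers of a certain level, but I rarely see anything of that quality on forums. What quality could a forum have? 99% of it is idle chatter without substance. There we keep seeing prejudice, ego, and the stab that inevitably follows the scratch. It is really sad. If these people spent that time learning, how much better it would be. My advice: spend less time chattering and more time learning; read the real classics; don't rush to ask questions on forums at every turn, and don't hang around forums reading other people's chatter either.
It isn't only comparison topics where everyone loves to jump in; other topics draw plenty of emotional reactions too. For now, China's path to strength seems to be staked entirely on the IT industry (especially the software industry). Programmers have been loaded with excessive expectations, and programmers have inflated their own egos considerably. Mixed with nationalism or personal likes and dislikes, whenever people see someone or something they dislike, they rally everyone to "hack" it. What kind of mentality is that? A contest of fists? Honestly, even if it is about whose fist is bigger, what size of fist is hacking a website? The network is one big dark room, and a gentleman does no wrong even in a dark room.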
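Where Should the Magazine Position Itself?
A while ago CSDN invited everyone to give opinions on how Programmer magazine should position itself. It was lively. I don't know whether the magazine's leadership gained anything from all that advice. I suspect not, or, if so, not much.
Just as with books, a reader's most intuitive reaction is to want to see what he needs. A hundred people have a hundred needs; you can't draw a conclusion from a survey like that. Silent readers, readers who aren't online, readers who don't vote, readers who don't post: how would you know what they think?
In my view, you only need to hold to one principle: always stay one level above the general readership, act as a guide, and lead readers toward cutting-edge thinking and the big picture. That's it. Readers grow. Wherever you position the magazine on the technical spectrum, someone will be dissatisfied; this year's readers grow, and may not be your readers next year. Only by keeping to cutting-edge thinking and a broad view, and regularly bringing in new forms of technology, new ways of thinking, expert insight, and the views of opinion leaders, can a magazine attract readers over the long term and make a lasting contribution to many people and to the whole technical development environment.
The great American physicist Feynman once criticized the way physics was taught. He said teachers kept teaching techniques for solving physics problems instead of inspiring students through the spirit of physics. Might that offer a little inspiration to magazine publishers and readers alike?
Seen this way, within my own areas of expertise, an interview with the father of STL, an interview with the algorithms master Donald Knuth, a C++/OOP series, a GP/STL series, "Learning Standard C++ as a New Language"... and some overarching, big-picture articles are what I consider the best topics. Some of these are my own work; well, I have never been modest.
Of course, overly metaphysical material won't do; highly abstract content is hard for people to accept. The higher the level of abstraction, the more freedom one has, but abstract thinking is the privilege of people at a high level; for a general audience to accept it, it needs some concrete detail for support.
How can a magazine keep a long-term supply of articles with cutting-edge thinking and a broad view? Partnering with foreign magazines is a fast and good approach. Every issue of Programmer opens with several pages of cutting-edge summaries from the important foreign journals of the period, which shows that the editors have kept up their reading of foreign professional journals. When it comes to choosing partners, they surely know what they are doing.
Of course, partnering with foreign publications costs money. Outsiders (especially readers) find it hard to understand, or to put themselves in the shoes of those facing, the funding difficulties involved. It's like the people who indignantly complain that CSDN is as slow as a snail without ever asking where the site's resources come from. Would you accept being charged? Many famous Taiwanese websites have already folded; I'm waiting to see how long free services can hold out.
For a magazine to be broad and rewarding to read, its readers must mature too. Only a group of good readers can hold up a good magazine.
Here is a reader's letter:
Technology is moving too fast. While other countries (even India) are achieving "software industrialization," mainland China (at least where I am) is still at the level of small handcraft workshops. I believe the future will no longer belong to the "individual digital hero"; software engineering seems more urgent than any one or two technologies. With your big-picture view and rich experience, do you see this differently? Would you be willing to write an article giving your views from a technical (or other) angle?
Software engineering is crucial to raising the level of the whole software industry. But it takes a programmer three to five years (or more) of seasoning before he becomes interested in "software engineering" at all. What do I mean by that? I mean that books, tools, websites, and magazines of this kind have little room to survive in a noisy, impatient environment. That is why, when Mr. Jiang Tao (蔣濤) wanted to steer Programmer toward software engineering, I felt both great respect and great concern for him.
Incidentally, the writing in Programmer has always given me "the pleasure of reading." I have rarely had occasion to say that about Taiwanese computer magazines or books. Compared with Taiwanese computer publications, the writing here has much more depth.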
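Flighty, Impatient, and Lacking Confidence
Just visit any forum where programmers hang out and you'll see a sea of impatience and anxiety. What it reflects is a lack of confidence.
"C# is out, Java will die," "Java is evolving, C++ is doomed," ".NET is out, VB programmers are finished," "Kylix is out, everyone hurry up and learn it," "Delphi keeps getting new versions, don't panic, guys," "I just started learning VC, how come it's already on its way out," "Is MFC really becoming obsolete?"... I don't know whether to file questions like these as rumors or as children's chatter.
It is strange, and lamentable, that people find such questions so interesting. It betrays shallowness: without a deep understanding of the essence of technology, people chase new tools and new things in a hurried, anxious, bewildered way, and assume that whatever is new must be good. They have no confidence in themselves, and no confidence in the environment as a whole.
Programmers with depth never worry about such things. Of course, burning three sticks of incense morning and night doesn't keep you safe from everything, and telling yourself not to care doesn't mean you really won't care. It has to come from within: the steadiness of someone who has the whole landscape in his head, backed by solid fundamental knowledge and skills.
Recently on Taiwan's BBS (networked boards) there were also many heated posts discussing Java, C#, C++, and .NET. Below I quote the one I admired most. Its closing words apply to any field.
From: algent@kkcity.com.tw (流雲), Board: programming
Title: Some thoughts Re: I don't get it, the industry keeps shouting about Java, what are they shouting about..."
Site: KKCITY (Sun Feb 18 12:55:49 2001)
Judging by the current state of Taiwan's industry, C/C++ should be a basic skill for anyone who wants to become a software engineer. As for Java, if you know C++ well, learning Java shouldn't take more than a month.
In my view, Java is more object-oriented than C++. Moreover, in this age of the booming Internet, the performance bottleneck is the bandwidth of the network itself rather than execution speed on a single machine. Java's Collection framework is a very powerful programming tool, it has built-in support for multithreaded programs, and its rich class library lets you build network, database, and similar software without worry.
C++ is probably one of the most important programming languages of the past decade or more, and it is clearly more efficient than Java. But for programs that must run on thousands of different brands of machines across the Internet, it may not be the best solution compared with Java.
"For now," there is no need to develop desktop applications in Java, because "at present" a program written in Java uses more memory than one written in C++ and performs poorly.
We can't expect a rabbit to swim faster than a fish, nor a fish to fly higher than an eagle.
Engineering needs make different languages suitable for different situations; there's no need to bother criticizing A, praising B, and suppressing C. Fundamental theory matters far more than any of this.
VB will die? Java will perish? C++ will be replaced by Java...? Does it matter? I use Java and I use C++. Even if next year they're all replaced by Java++, C++++, Lisp++, and Forth++, what is that to me? FFT is still FFT, Dijkstra's algorithm is still Dijkstra's algorithm... Better not to worry too much about such things...
Apart from occasionally talking to myself on BBS, I rarely reply to or take part in discussions. After reading the post above, I couldn't resist replying: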
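Author: jjhou (jjhou) Board: programming
Title: Some thoughts Re: I don't get it, the industry keeps shouting about Java, what are they shouting about..."
Time: Fri Feb 23 21:12:14 2001
I agree with you. Very well written.
Only when people reach a certain level do they start asking what the essence of things is, instead of being tied down by surface-level tools.
Mastering tools is necessary, but how tools evolve and get replaced is not settled by people shouting behind closed doors in here for their own satisfaction.
Donald Knuth said: "Languages keep evolving, and that is necessary. Whatever language is in fashion now, you can be sure that in ten or twenty years it will no longer be. In my books I always write things that are not fashionable, but that are worth remembering for future generations." (Note: parts of the above follow the translation in Programmer, 2000/12.)
DDJ 1996/04 p18:
"Languages keep evolving, and that is necessary. ...Whatever computer language is in fashion, you can guarantee that within a decade or two it will be completely out of fashion. In my book, I try to write things that are not trendy, but are things that are going to be worth remembering for other generations."
Pursuing new knowledge is certainly the right attitude for a computing professional, but you should strike a balance between chasing new tools and deepening the knowledge you already have. Too much is as bad as too little!
Besides, every road you walk leaves footprints. Any effort you make now, as long as it is solid, will never be wasted. Technology is cumulative; knowing one thing well sheds light on others. You say there is nothing cumulative between MFC and OWL; I say there is. Isn't the principle of the message map the same? Isn't the way a framework works the same?
Personally I am not a zealot for any language, tool, or technology; I'm a pragmatist. Toward people who claim mastery of many languages (of different kinds), I feel awe and keep a professional distance. Mastering a language well enough to get the most out of it is not easy; it takes a great deal of energy. 99.99% of people are ordinary, and for us ordinary people, spending our time mastering one (or two) "languages" suited to our work is far wiser, and far more rewarding, than a passing acquaintance with the "syntax" of many.
Really, better not to worry so much about who will rise and who will fall.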
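The "message map" mentioned in the reply above comes down to a table that maps message IDs to handler functions, which the framework searches when a message arrives. The C++ sketch below shows only that principle; it is not MFC's or OWL's actual implementation (MFC builds its tables with macros such as `BEGIN_MESSAGE_MAP`/`END_MESSAGE_MAP`), and the message IDs, class, and handler names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical message IDs; a real Windows program would use WM_PAINT, WM_CLOSE, etc.
enum MessageId { kMsgPaint = 1, kMsgClose = 2, kMsgUnknown = 99 };

class MyWindow {
public:
    int OnPaint() { return 10; }
    int OnClose() { return 20; }
    int Dispatch(MessageId id);
};

// A pointer to a MyWindow member function taking no arguments and returning int.
using Handler = int (MyWindow::*)();

struct MessageEntry {
    MessageId id;
    Handler handler;
};

// The "message map": data, not code. Framework macros generate tables like this.
const MessageEntry kMyWindowMessageMap[] = {
    {kMsgPaint, &MyWindow::OnPaint},
    {kMsgClose, &MyWindow::OnClose},
};

// Table-driven dispatch instead of a hand-written switch or one virtual per message.
int MyWindow::Dispatch(MessageId id) {
    for (const MessageEntry& entry : kMyWindowMessageMap) {
        if (entry.id == id) {
            return (this->*entry.handler)();
        }
    }
    return -1;  // unhandled; a real framework would fall back to a default handler
}
```

With this table, `Dispatch(kMsgPaint)` ends up calling `OnPaint`, and an ID that is not in the table falls through to the default. Once you have understood that mechanism in one framework, you can recognize it in another, which is the cumulative knowledge the reply argues for.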
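Fertile Soil for Genius
Education is always the issue I care about most. Education is no less important than industry. Without good education, where would good industry talent come from?
I won't talk about school education; that is not where I can make a difference. Although I also teach at a university, I teach only a few dozen students a year. How much impact can that have? A book's readers number in the tens of thousands and a magazine's in the hundreds of thousands; that is where the real influence lies.
Self-education follows you like a shadow from the day you leave school, for the rest of your life, and it matters far more than school education. Self-study is inseparable from reading material: books and magazines of every kind. In our profession, what state are the books and magazines in?
Here is a reader's letter:
I remember you saying that if you visit the bookstores in a region, you can roughly gauge its IT level. That really resonated with me. I have studied software for five years and spent more than 12,000 RMB on books, and looking back, the vast majority were garbage. I used to worry: if I moved somewhere else for work, how would I carry all these books? Now I feel a painful kind of relief, because the books worth taking fill just one bag. When starting out in IT, who didn't want to make a name in the industry? But looking back after many years, I suspect everyone grieves over the educational environment they were in.
I have received far too many reader complaints about how excessive and shoddy computer books are. The above is just the tip of the iceberg; anyone interested can visit my website and read their fill. Some publishers are even known for putting out bad books. Look at this letter:
You have surely read Mr. Jiang's article in Programmer about the so-called "big four" IT publishers. Perhaps out of courtesy, Mr. Jiang didn't spell everything out. For example, publisher XXX among them has now become a model in translated works: a model of shoddy workmanship.
Now look at this letter:
I saw your comments on publisher xxx on your website and was deeply moved. In fact, that publisher was the pioneer in bringing foreign-language books into mainland China's IT field. Programmers of my generation (before 1992) grew up reading its translations (I still have at least two big cartons of old xxx books). Back then, nearly every computer book was imported and translated by them. Looking back, the translation quality was intolerably poor (much worse even than Inside VC++ 5/e), but at the time we were quite satisfied; after all, something beats nothing. Today's criticism of xxx is, I think, the result of competition: people have seen better translations and can compare. All in all, xxx's hallmark back then was translating in volume and publishing hastily, so that technical people could read excellent works as soon as possible. That style is clearly outdated now; or you could say it has fulfilled its historical mission. Of course, I no longer buy xxx's books like crazy, because there are more choices.
This letter sent me back into memories. Taiwan also once had two publishers with equally shoddy practices. These two notorious publishers were 瑩圃 and 松格, and both have closed. Their approach was to translate foreign books quickly and in volume. Because they were fast, and because their selections included quite a few good books, they once had a real market. Why did they close? Because readers can be fooled once or twice, but they won't be fools forever. A publishing mindset like that plainly has no long-term plan, only the wish to make a quick buck and run; no wonder they closed.
Because we may have scavenged a few things from the garbage heap that, once patched up, were just about usable, we may develop a strange affection for the people who deliberately made the garbage. The stuff was clearly bad, yet we got a little nourishment out of it. Should we thank them or hate them?
We should spurn them!
These merchants imported foreign books quickly and in volume because there was profit in it. Profit is a good thing, but they didn't do properly what they were supposed to do. They abandoned quality without fear, because they knew how easily money could be made in that time and place. Friends in mainland publishing told me that so-and-so (with real names) easily amassed a fortune of several million RMB in just a few years this way. Several million RMB, good heavens. I suppose this counts as the IT industry too: truly booming, with everyone around riding up on it.
Getting rich through hard work deserves our praise and good wishes. But publishers like these don't want to work harder to earn more, and more lasting, money, because easy money takes no effort. One percent of readers might get some nourishment out of this garbage, while one hundred percent suffer reading it. And who knows what percentage were misled by it? We spent plenty of money on books, yet got so little real value out of them, and so much pain.
This reader says, "All in all, xxx's hallmark back then was translating in volume and publishing hastily, so that technical people could read excellent works as soon as possible," and also, "they imported and translated them; looking back, the translation quality was intolerably poor." Well, is an excellent original still an excellent work after being put through intolerable translation? Being generous to others is a virtue, but those who deliberately make gutter oil that gives people food poisoning don't deserve a defender. You say "it has fulfilled its historical mission." No: they never had a historical mission, or any mission at all.
Readers this "forgiving" and this tolerant are rare. When most programmers talk about computer books, they pour out bitter accusations. Didn't Programmer 2001/03, p119, run an article titled "The Traps of Computer Book Publishers"?
A reader wrote:
Lu Xun said that before there can be genius, we must first prepare the soil for genius. ... I understand your feelings deeply (which is probably the greatest contribution of the few hundred garbage books piled in my corner).
"The soil for genius": well said, Lu Xun. Shouldn't that be exactly the mission of publishers? Yet to whom can we complain? Really, all we want is some good books to turn out some reasonably capable programmers. Not long ago there was a big fuss comparing Indian and Chinese programmers. Who is hoping for genius? All we want is to cultivate some solid talent.
You may wonder why, if the books are bad, I don't aim at the authors instead of cursing the publishers. Well, I long ago adopted the humble attitude of "if I get one, I'm lucky; if not, that's fate," and I don't dare hope for good original books in Chinese. What I have been talking about, and what readers complain about most bitterly, is the poor quality of translated books. How could a country with as much talent as China fail to find qualified translators? If not for the publishers' grab-the-money, short-term mentality, would this mass of inferior products exist? How can I not blame the publishers?
In the end, you have to rely on yourself. The article "The Traps of Computer Book Publishers" ends like this: "Remember, you are spending money you worked hard for, so never waste it on useless things. Reward the publishers who put out excellent books: buy their books, write to them, let them know what you think and what you need."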
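A Virtuous Cycle
Building a body of knowledge requires solid construction from the bottom up. Whether it is C++, Java, .NET, OO, UML, Windows programming, or Linux programming, forming a complete body of knowledge on any of these subjects takes a whole set of books. Take C++/OOP: you need books on syntax and semantics, on the object model, on expert experience, on design patterns, introductory books, advanced books, reference works.... Take GP/STL: you need a general treatment of GP, an analysis of the STL source, a comprehensive guide to STL usage, a complete STL specification reference, a book on STL component design, other generic techniques.... Take Java: you need the language core, object orientation, multithreaded programming, graphical interfaces, network applications.... For novices, learning high-level abstractions without first understanding the layers underneath builds castles in the air and stays at the level of form. For experienced people, lacking abstract thinking means stagnating at one level.
Writing, translating, and publishing a complete system of good books is truly a calling that only people with long-term vision, firm will, and a touch of idealism can take on.
If such people, and such publishers, don't get everyone's support in principle and in practice, who would take on such a foolish task?
I have always been a firm believer in high quality at a high price. High quality at a high price is the strongest incentive a producer has. Having worked hard to achieve high quality, they deserve the high profits a high price brings; that is only natural. Otherwise, who would bother to produce high quality? Philanthropists? Fools?
For consumers, a high price is naturally unwelcome. But you should ask whether the item is worth its price, or even worth more. Take English books: The C++ Standard Library at USD 49.95, or Generic Programming and the STL at USD 49.95, are more than worth the money. Once I understand the value of such books, I would buy them even if they cost twice as much. Some people risk death to eat pufferfish; I risk my life to buy good books. To put it practically: with everyone shouting about the "knowledge economy," isn't the knowledge in a good book a tool for making money? Being stingy with your money-making tools only works against yourself.
Here is a reader's letter:
Compared with the conditions in Japan for manga artists, literary writers, pop idols, and film stars, is it really this hard to be a professional writer in Taiwan? Some people will camp out overnight in tents to buy concert tickets, while others haggle over page counts against book prices. Perhaps they don't know that an introductory book on the C programming language costs 100 DM (about NT$2000) in Germany. That is why my German colleagues always ask for opinions or consult reviews before buying a book. Books there are not cheap, yet the reading culture is no weaker than Japan's.
This letter points to a key issue: when books are expensive, people choose good books carefully and value reviews. Here is another reader's letter:
I am a reader in mainland China and a computer beginner. Online I saw that people highly recommend your books and translations. When I learned that the second edition of your 深入淺出MFC (Dissecting MFC) would be published on the mainland, I decided to buy it and contacted Huazhong University of Science and Technology Press. They told me you will publish several more books on the mainland this year. I was very happy, but after learning your views on pricing, I was a little disappointed.
The mainland and Taiwan are at different economic levels, and ordinary salaried people have limited purchasing power. Here, computer books are the most expensive category of all, and the first digit of a book's page count roughly matches the first digit of its price: a 700-page book costs 70 to 80 yuan, and a book of more than 1,000 pages costs over 100 yuan. That is roughly where book prices stand on the mainland today. If it is as you said, 80 yuan for 350 pages, that would be a very high price here; a book at that price can only be looked at, not bought.
"The silkworm spins silk until it dies; the candle weeps until it burns to ash." We regard teaching as a sacred profession: burning oneself to light the way for others. I think your purpose in writing books is also to let more people hungry for knowledge benefit and avoid detours. As readers, we too hope to see more and better books. But in any given period, purchasing power and price should be in balance. 80 yuan for 350 pages really is too high; if it could come down below 60 yuan, I believe most readers could accept it.
Everyone agrees that your books are of very high quality, and their price should set them apart from other books, but it shouldn't be too high. Designer clothing can raise its status with high prices and look very upscale, but too high a price cuts it off from its main customers: most people can only talk about it, and very few will actually wear it. Books are different from designer clothing; only after many readers have read a book over a long period is its value proven. If many people know Mr. Hou's books are excellent, yet few have read them (because of the price), wouldn't that be a pity?
What I least like to see is the notion that "a book of xxx pages sells for xxx yuan." A book's value lies in its content, not in its page count. If we really calculated that way, we would have to check every book's font size, line and character spacing, number of screenshots, and amount of white space, because all of these affect the page count. If everyone accepted a fixed ratio between page count and price, there would surely be a flood of padded books (which is exactly what we have now).
No need to work that hard. If a book is worth its price, buy it; if it isn't, don't. Simple logic.
Could we measure with a ruler how many yards of fabric an Armani suit uses in order to set its price? Or measure the size of a Van Gogh canvas to set its price? Could I say that because my hydrangeas are twice as big as Van Gogh's irises, I should sell mine for twice his price?
When you buy something, you can't look only at the tangible; the intangible often matters more. Buying a book is not buying paper. The right sense of value has to be established.
Of course, you may well think that people who buy designer clothes or famous paintings are crazy, and that all you want is cloth and a frame. That just means those items aren't worth that price to you. Fine: you have your valuation, and you have your choice.
I don't intend to dwell on the metaphors (designer clothing, famous paintings). A metaphor always fits one side and misses another, and that is how flame wars start. You all know what I mean to emphasize.
A 350-page book shouldn't necessarily sell for 80 yuan, nor necessarily not. It depends on how much gold those 350 pages contain. Besides, I never said I have a 350-page book to sell at 80 yuan. But every possibility exists: 350 pages might be worth 180 yuan, or 80 yuan, and 530 pages might not be worth even 18 yuan. Please stop using page count as the basis for a book's price.
Teaching is sacred, but "burning oneself to light the way for others" is too heavy a burden. "Burning oneself," ha, so easy to say, so painful to do. When someone's work benefits many people, he is happy. But ask him to burn himself to light the way for others, and no one but a saint will do it. I am glad to light the way for others, but I don't want to burn myself up. Burning myself, I could light the way for five years; taking care of myself, I can do it for a lifetime. Invoking lofty ideals like that only scares off people capable of writing good books.
Please accept this idea: a book's value lies in its content, not in its thickness or its page count. Value affects price; value drives price. Accepting this is a way of showing respect, and real support, to everyone who contributes to a good book. If everyone chose books carefully, the money spent on 10 garbage books would more than cover two or three expensive (actually, fairly priced) good ones. Who on the programming path doesn't own 10 or 20 garbage books! When everyone chooses good books carefully and supports them (even at higher prices), it fosters a culture of reviewing and a culture of good writing and translation. That benefits everyone. Nobody has to burn himself up, and everyone gets light.
Of course, thin books at high prices may encourage piracy and photocopying. But whatever happens, I will stand firm and not back down. If the overall environment really can't improve, good books will leave, good people will quit, and who loses in the end?
Believe it or not, I am trying to use my personal influence (if I have any) to build a good environment for technical writing, in Taiwan and on the mainland alike. "Ask how the pond stays so clear: because fresh water flows in from its source." To give everyone more good books to read, fresh water has to flow in from the source; for that, there must be more incentives to draw more talented people into technical writing and translation. What are those incentives? Letting them know that good work can stand out and be distinguished (to put it bluntly, command a good price) instead of the fine steed being fed from the same trough as the ox. That is an incentive. No, it isn't even an incentive; it's basic common sense.
Quality books benefit readers; the high returns from quality books benefit authors and publishers and draw more outstanding people into the field. That forms a virtuous cycle from which everyone benefits.
I would also suggest that mainland publishers make good use of their unique "reprint editions" (影印版). Translated computer books in Taiwan are also of uneven quality, with more bad than good, and readers once jokingly suggested that publishers, after buying the rights, shouldn't translate at all but simply publish the original English. Readers would be happy, and prices could fall sharply. I suppose that is what the mainland's reprint editions are (they are not allowed in Taiwan). Since translated books have become the target of universal condemnation, why not import more reprint editions? Aren't publishers in a hurry to be first? This is the fastest way of all, and readers get one more choice.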
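What Went Wrong with Translation
A big problem with translated computer books is that translators lack the time (or the right attitude, or a sufficient command of Chinese) to read and revise their drafts over and over. Chinese has a weakness: nouns don't mark plurality, and verbs don't mark tense. Most of the time this may not matter much and can be ignored. But sometimes these features are crucial, and then a precise English sentence costs the translator a great deal of extra effort to render precisely in Chinese; the translator needs both enough time and enough command of Chinese. And only a translator with sufficient technical expertise can spot the subtle places that are critical to a correct understanding.
When English clauses are long and convoluted, they take effort to decipher, not only for Chinese readers but for native speakers too. Look at this one (C++ Primer 3/e, p730):
[code..] where the conditional test if (this != &rhs) prevents assigning a class object to itself. This is particularly inappropriate in a copy assignment operator that first frees a resource currently associated with the class in order to assign the resource associated with the class being copied.
My translation:
[code..] 其中的條件測試 if ( this != &rhs ) 避免將 class object 指派給自己,因為「自己指派給自己」這個動作,對於那種「先將目前系結於自己身上的資源釋放掉,以便稍後將該份資源再系結於即將被拷貝的那個 object 身上」的 copy assignment 運算子而言,尤其不合適。
Just adding a few quotation marks (「」) to set off the clauses makes it far more readable. "The same moon by the window as ever, but add a plum blossom and it is different." Try translating it without the help of quotation marks and see what you get. Don't tell me "according to the Ministry of Education's rules, quotation marks are only for emphasizing proper nouns or special tone..."; rules are rigid, people are flexible. Any method that expresses the meaning flexibly and correctly is a good method. Didn't Comrade Deng Xiaoping say that it doesn't matter whether a cat is black or white, as long as it catches mice? When the Apollo 13 moon mission failed, the spare air-scrubber canisters on board didn't fit, and mission control told the astronauts they had to find a way to fit a square canister into a round receptacle, or everyone would die in space from too much carbon dioxide. Who is thinking about rules at a moment like that? Be flexible.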
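To see why the C++ Primer passage insists on the `if (this != &rhs)` test, consider a class that owns a heap buffer and whose copy assignment frees its own buffer before copying. The sketch below is a hypothetical `Buffer` class written for this illustration, not code from the book:

```cpp
#include <cstddef>
#include <cstring>

// A minimal class that owns a heap buffer (hypothetical, for illustration only).
class Buffer {
public:
    explicit Buffer(const char* s) : size_(std::strlen(s)), data_(new char[size_ + 1]) {
        std::memcpy(data_, s, size_ + 1);
    }
    Buffer(const Buffer& other) : size_(other.size_), data_(new char[size_ + 1]) {
        std::memcpy(data_, other.data_, size_ + 1);
    }
    ~Buffer() { delete[] data_; }

    // Copy assignment in the style the passage describes: first release our
    // own resource, then copy the resource of the object being copied.
    Buffer& operator=(const Buffer& rhs) {
        if (this != &rhs) {  // without this test, a = a would free rhs.data_
            delete[] data_;  // (our own buffer) and then copy from freed memory
            size_ = rhs.size_;
            data_ = new char[size_ + 1];
            std::memcpy(data_, rhs.data_, size_ + 1);
        }
        return *this;
    }

    const char* c_str() const { return data_; }

private:
    std::size_t size_;  // declared before data_, so it is initialized first
    char* data_;
};
```

Without the test, `a = a` would delete `a`'s buffer and then `memcpy` from that same freed memory. The sketch also shows a known weakness of the free-then-copy style: if `new` throws after `delete[]`, the object is left holding a dangling pointer, which is one reason many C++ programmers prefer the copy-and-swap idiom instead.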
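Another major weakness of Chinese is that it doesn't distinguish verbs from nouns. 操作 can be a noun (operation) or a verb (operate); 實現 can be a noun (implementation) or a verb (implement); 參考 can be a noun (reference) or a verb (refer); 請求 can be a noun (request) or a verb (request); 委託 can be a noun (delegation) or a verb (delegate). When verbs and nouns are mixed together, reading becomes confusing. So you get sentences like the following (taken from 《設計模式》 (Design Patterns), p14, translated by Li Yingjun (李英軍) et al., China Machine Press). First read the original translation and see whether you can work out its rough meaning from the Chinese sentence structure alone:
(1) Original translation: 只有當委託使設計比較簡單而不是更複雜時,它才是好的選擇。
(1) Hou's translation: 只有當「委託方式」簡化了設計,它才是一個好的選擇。
(1) English original: Delegation is a good design choice only when it simplifies more than it complicates.
(2) Original translation: 委託方式為了得到同樣的效果,接受請求的物件將自己傳給被委託者(代理人),使被委託的操作可以引用接受請求的物件。
(2) Hou's translation: 為了以「委託方式」獲得相同效果,「請托(request)受理者」將自己傳給被委託人,使自己得以讓「被委託之操作行為」取用。
(2) English original: To achieve the same effect with delegation, the receiver passes itself to the delegate to let the delegated operation refer to the receiver.
I have no intention of showing anyone up, and my version isn't necessarily the best. Besides, translating a whole book involves all sorts of consistency considerations from start to finish, far more complicated than translating a sentence or two. But since I raised the problem, I should offer my own solution for everyone's reference and evaluation. I have real respect for China Machine Press for publishing such a classic, and for Mr. Li Yingjun and his colleagues for being willing to translate such an advanced and thankless book.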
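The second Design Patterns sentence quoted above describes a specific mechanism: the receiver forwards a request to its delegate and passes itself along, so that the delegated operation can refer back to the receiver, much as an inherited method would use `this`. A minimal C++ sketch, with hypothetical `Window` and `Rectangle` classes and an invented `scale` factor:

```cpp
class Window;  // forward declaration so the delegate can refer back to the receiver

// The delegate: it does the real work and receives the receiver as a parameter.
class Rectangle {
public:
    Rectangle(int width, int height) : width_(width), height_(height) {}
    int Area(const Window& receiver) const;  // the delegated operation
private:
    int width_;
    int height_;
};

// The receiver: it owns a delegate and forwards requests to it.
class Window {
public:
    Window(Rectangle rect, int scale) : rect_(rect), scale_(scale) {}
    // Forward the request and pass ourselves (*this) along to the delegate.
    int Area() const { return rect_.Area(*this); }
    int Scale() const { return scale_; }
private:
    Rectangle rect_;
    int scale_;
};

// Defined after Window is complete, so the delegate can call back into the receiver.
inline int Rectangle::Area(const Window& receiver) const {
    return width_ * height_ * receiver.Scale();
}
```

Here `Window(Rectangle(3, 4), 2).Area()` returns 24: `Window` has no area logic of its own, and `Rectangle::Area` reaches back into the receiver through the reference it was handed.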
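Another translation problem is that people tend to force general-purpose dictionary terms onto computer books, and no one dares break out. Remember that a general dictionary takes no account of computer technology, let alone context. Too many people hold to "do less, err less; do nothing, never err," cling to the dictionary, and dare not change anything, which is why so many current translated terms are unsatisfactory yet impossible to budge. The words that struck me most are these:
instance: Many people in both Taiwan and the mainland translate it as 「實例」. The character 「例」 ("example") is simply wrong here. Some in Taiwan even translate it as 「案例」 ("case"), which is worse still. Why translate it that way? Because that is the ready-made term in the dictionary. An instance is an actual entity produced from something (perhaps a physical thing, perhaps some kind of description). I think 「實體」 ("actual entity") is quite appropriate. What a class produces is an object instance; what a class template produces is a class instance; what a function template produces is a function instance. And what an executable file produces is a process instance.
paradigm: Often translated in Taiwan as 「典範」 ("model, exemplar"). Why? Oh, the ready-made dictionary term. Sometimes that makes some sense; for example, "paradigm shift" is called 「典範移轉」. The problem is, what is a 「典範移轉」? Hard to guess from the words, isn't it? Translating "generic paradigm" as 「泛型典範」 is even more baffling. Everyday Chinese also has the word 「典範」: we say someone is a model for the nation and society, which is not at all what the computing term paradigm means. Based on what paradigm actually means in computing, I translate it as 「思維模式」 ("mode of thinking"): a typical, fundamental mode of thinking.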
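A reader sent this letter:
I'd like to ask your advice on a question of translation style. As you said, the hardest part of English technical books is the long sentences, because English sentence structure is far richer than Chinese; they are hard enough to understand, and even harder to turn into smooth Chinese. When I hit sentences like these, I split and recombine them, weigh them over and over, and still find the Chinese unreadable. In such cases, (1) may I give up "faithfulness" for "fluency," that is, drop some parts of a sentence to keep the translation readable? Or (2) must I convey the original meaning at all costs, even if the Chinese comes out awkward? Further, (3) some sentences are irrelevant to the main point yet extremely hard to translate; may I simply skip them? What is your view?
(Did you notice that this reader writes very good Chinese? The phrase 「切分組合,反覆掂量」, "split and recombine, weigh over and over," is concise and vivid.) My view is that a translator has the right, and the duty, to adapt flexibly, but he must also have the ability to do so. So to your first question, I say yes; to your second, no. To your third, yes, but do it with care, and don't sacrifice anything because of the limits of your own skill.
Technical translation should aim to translate meaning, not words. Faithfulness and fluency should be judged over a whole sentence, or even a whole paragraph, not word by word. Technical writing is very different from literature; the translator's most important task is to convey knowledge correctly and to make it as easy as possible for the reader to absorb.
So where is the limit of that flexibility? In my view, the more technically confident a translator is, the more flexibility he has. As long as his technical level is high enough, and he is sure he has correctly understood and conveyed what the author meant, he need not cling to every word.
Chinese is not useless for technical expression. One of its strengths is high information density: very often one concise, elegant line of Chinese can express three long-winded lines of English with "clauses nested in clauses nested in clauses." Chinese also has beautiful diction and an inexhaustible store of allusions, idioms, and proverbs; used well, they instantly turn cold technical prose into a pleasure to read. A bad translation leaves readers stumbling over tortured prose in agony; a good one is like a spring breeze.
Let me say this: the right attitude, enough time, a rich command of Chinese, and above-average technical expertise are the basic ingredients of a good translation. What is the situation today? And, come to think of it, how well are good translators paid? Are you willing to spend a little more to show that you value their effort?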
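A Healthy Attitude Toward Choosing Books
The following discusses attitudes toward choosing books and toward learning. Because the discussion grows out of readers' letters, it can't avoid mentioning my book 深入淺出MFC (Dissecting MFC). The issues I want to raise are not limited to any one book or technology; like the many examples earlier in this article, they can all be viewed more broadly.
A reader wrote the following:
Two weeks ago I finally finished your great book, and my understanding of MFC improved a lot. My one regret is that I couldn't learn from it how to write a concrete program; I only came to understand MFC's "wrapping techniques." At first I thought I had been cheated again, because my whole purpose in buying the book was to learn to write programs with MFC... Then by chance I got Jeff Prosise's Programming Windows with MFC, and only then did I realize how important your book is, teacher. Without your 深入淺出MFC, how could I ever have gone on to program with MFC?... Your book rescued me from dire straits and led me through MFC's lines of blockade.
Although this reader ultimately ended with thanks and praise for me and my book, it gave me pause.
Readers often judge a book's value in the most intuitive way and express their likes and dislikes in the most direct way. But you can't treat everything you don't need as padding, or everything that doesn't match your needs as a fraud. That is not a healthy attitude toward choosing books. Even if you had never discovered that 深入淺出MFC "is so important, rescued me from dire straits, and led me through MFC's lines of blockade," when did the book ever deceive you, in its title or its content, so that you "thought you had been cheated again"? Moreover, "my whole purpose in buying the book was to learn to write programs with MFC," but if you don't even understand where MFC first meets the application, and just copy patterns mechanically, how good a program can you write?
I'm not blaming this reader; it's just that the letter represents a phenomenon that makes me sigh. Here is a different kind of extreme:
I think your book spends far too much time on useless theory! What you wrote doesn't really teach people how to use VC; it teaches how VC works inside! Actually, that's not what many readers really care about! What they care about is how to use VC to design truly outstanding programs!
Readers always want to see only what they want to see; that is natural. But what you don't want to see isn't therefore "useless"; it may be "very useful" to others. Besides, if you don't even understand where MFC first meets the application, and just copy patterns mechanically, I don't know what "outstanding programs" you can write. The moment something goes wrong, you won't even be able to debug it. Driving is easy; handling all kinds of sudden situations on the road and clearing the obstacles is the hard part, and that is where skill shows.
Below are two opinions from Taiwanese readers, somewhat dated. Of course, I should first point out that readers with this attitude make up about one in ten thousand.
Reader opinion 1
This book is overpackaged. There's too much stuff that shouldn't be in it. The "Irresponsible Book Reviews" in Appendix A also seem unnecessary to me, because those reviews already appeared in RUN!PC and were later collected in Irresponsible Book Reviews, Volume 3, so there's really no need for them. I suppose Mr. Hou wanted to make the book thicker.
Reader opinion 2
The book reviews shouldn't be in this book! Making the book thicker with this stuff is a bit... The padding adds up to:
(a) Reader letters, pp. 1-16: 16 pages
(b) An extra-long preface; well, that's probably fine
(c) Irresponsible book reviews, pp. 843-872: 30 pages (actually there's some thought-provoking material in there; not bad)
(d) Scribble source code, pp. 873-914: 42 pages (the worst of all; almost entirely unnecessary)
(e) List of VC sample programs, pp. 915-920: 6 pages (a pity; it would be very useful if it were developed further, but Hou Sir only lists the titles, and even the descriptions are in English, no different from reading the Help file)
Total: 94 pages
I'm not nitpicking for the fun of it, but have you ever seen a book with nearly a hundred pages of flab? Not to mention that the book keeps listing four or five pages of source code at a time. All of that is on the CD; why waste paper? Still, even without the flab, the book has its value... As for what the book lacks, I think it depends on how you position the book.
You can't expect one book to cover everything about programming! As an exploration of MFC's internals, there is nothing to criticize in this book. All in all, should you buy it? I'd still say yes. But if the book were thinner and cheaper, that would be even better.
After all that, it turns out to be about "if the book were thinner and cheaper, that would be even better." These are the pitiful victims of the idea that page count determines price; they distort the value of books, and badly distort their own sense of value too. If I told these readers that without those 100 pages of so-called "flab" the price would still be NTD 860, they would probably embrace that "flab" warmly and give it a kiss. And that really is how it was: the book's price was set first, and only at the end did I add the "flab," to give readers more information and more convenience.
Readers like this treat publishers and authors as adversaries, imagine that everyone is after their wallets, and believe that every page that doesn't help them directly (perhaps only because they have already read it) was deliberately padded in to cheat them out of their money. Under that paranoid pressure, they forget that no book is tailor-made for one person, and that others may have different needs; they are entirely self-centered.
Immature readers like these are really victims of today's flood of inferior products. Frankly, I don't particularly want them as readers. But readers can choose their authors, while authors have no chance to choose their readers.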
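The Right Attitude Toward Scholarship
The two letters above reveal a doubt: is 深入淺出MFC a book that helps with VC programming? I don't mean to slip in a plug for the book here (believe me, I don't need to); I want to use the relationship between MFC and VC to talk about the attitude toward scholarship. If "scholarship" sounds too lofty, let's talk about the attitude toward learning.
Here is a reader's letter:
I have a question and would like your help. Today we learn C/C++, tomorrow MFC or OWL (if they are in fashion), the day after C# or Java... If Windows is displaced by X Window, won't we have to start all over again? Is it really necessary to master everything so thoroughly? For the same goal, why not use more convenient, simpler RAD tools? Why insist on taking pleasure in digging into complexity, and using it to show off one's level? Isn't this the wrong direction and the wrong goal? I think this is exactly a mistaken direction in software development on the mainland (I don't know about Taiwan).
All technologies of the same kind are cumulative and have a lot in common. The three groups the letter mentions, MFC and OWL, Windows and X Window, C++, Java, and C#, all share similarities and common ground. Technology accumulates: with some experience, learning a new technology goes much faster, and the more experience, the faster the learning. That's why I like to say that knowing one thing well lets you understand many. If every technology had to be learned from scratch, everyone would be reset to zero every three to five years, and humanity would not have progressed so explosively in the 20th century.
"Is it really necessary to master everything so thoroughly?" My answer: it depends on your needs and your role. Mastering the fundamentals is a process and a means toward application, not an end in itself. If you can do what you need to do very well without going through that process, then of course you can skip it. What I do know is that many, many people must go through that process first to reach their goals well. I had to go through it too (otherwise I couldn't have written such a book). That is not what you call "digging into complexity" or "showing off one's level."
Since the letter mentions RAD, let me give my view. I once wrote an article likening RAD to the proverb "the commoner is innocent; his crime is possessing the jade" (see my website, 1999/01/26, 懷璧其罪 RAD), which I recommend reading. I am very much in favor of using RAD. I write MFC books and explore MFC technology, but I have never claimed it is the best, or that it is bad; I just want to help the many people who use MFC. Like Bjarne, I have no interest whatsoever in tool-comparison exercises of this kind. I'm happy to be a spectator, but I never compare (and, I should say, I'm not capable of it).
RAD can be compared to cars. Who today needs to know about gearboxes, drive shafts, clutches, how ignition works, or spark plugs in order to drive? But of all the drivers on the street, who can pull off a 360-degree spin? To reach the level of a "racing driver," you must understand how a car works in considerable depth. We both drive, but Senna (the Formula 1 champion driver) and I get worlds apart out of a car. Every RAD expert I know has deep low-level skill. Start with RAD and end with RAD, and you will never make much technical progress. Your career will be a blank musical staff: no high notes, no low notes, forever flat...
RAD should be used; not using a good tool only works against yourself. But while you use RAD, understanding more about the layers underneath helps you get out of trouble, or help yourself, when a certain situation arises. The same goes for using STL. Being able to use STL is one level. Understanding how STL works is another. Having traced through the STL source is yet another. People at the third level wield STL with a flair that those at the first level can't hope to match.
How deep should you dig into a tool, and the technology behind it? Well, the answer depends on what role you want to play. Between "Formula 1 driver" and "a driver who only dares to go out on Taipei's main roads at three in the morning" there are many, many levels; you decide where you stand.
Some people swear by RAD; others are rethinking it. Here is another letter:
I used to use RAD tools all day long... but after three versions I felt cheated, because working in this environment is like stepping into a trap someone else has set! This stuff gradually reduces to zero your desire to "understand the inner workings of the OS and the fundamentals of all the specifications and protocols." Today, to get around a component's limitations, you write your own component; tomorrow the new version of the RAD tool ships with a component better than yours. In the end you don't want to write it yourself; you just wait for someone else to write it for you. If no one else writes it, you have completely lost an ability... (heaven knows how long you'd have to wait), or else the official components are missing this and that. And that's not all! What I can't stand most is discovering that writing programs this way is even simpler than using Office, with almost zero deep thinking...
In my article "懷璧其罪 RAD" I answered as follows:
1. RAD is not evil; it is an advantage. How you use it is up to you. I have two friends who are Delphi experts; they can do anything with Delphi, without any of the limitations you imagine RAD "must" have.
2. If you really could "write a program more simply than using Office, with almost zero deep thinking," that would not be a bad thing. Letting everyone write a small program to solve a few problems at hand is the great contribution of component software and RAD. But your description is an exaggeration: today's RAD is not that wonderful yet, and you still need some basic training in program logic and programming languages. If that day really comes, I'd consider it a good thing, not a bad one. It's just that programs built that way all depend on ready-made components. To break out of the ready-made box, you need deeper skill. Either way, RAD will not be your stumbling block.
Topics like this are hard to sum up in a sentence. In short, an excellent technologist needs a period of sinking down into the depths. Having gone through that and built a solid foundation, he can rise back up and start working at a high level with abstract thinking, abstract languages, and rapid development tools. Using RAD tools at that point is like giving wings to a tiger.
As the saying goes, steel is made through a hundred refinings: the iron is hammered, tempered, and quenched again and again. For a programmer, the level of his skill is closely tied to how many times he has been tempered and refined.
Let's return to fundamentals. Many computer science students are resentful and dismissive of the courses their schools offer, convinced that they do nothing for their programming ability. First, let me say that programming and software development are not the only path for CS students. The field of information technology is vast, and programming is only a small branch of it (though a large one as far as the job market goes). Second, foundational theory courses are not useless for your programming; they are not useless, just not yet needed. Some subjects have very direct and far-reaching effects, such as the two courses most important to programming: data structures and algorithms. These two are of crucial value in training logical thinking and the ability to implement. Without them as a base, however strong you are in C/C++/Java, you can't write a good program. Other foundational courses have their uses too; whether they will help you in your future programming career depends on what kind of programming you do. What you do at work and what you learned in school are not necessarily related, nor necessarily unrelated.
Young students with strong programming skills easily fall into an arrogant habit: finding fault with everything; professors are all fossils, classmates are all laughable. Ask why, and it's because "their programming isn't as good as mine." The laughable one is you.
Building programming strength is actually easy. By easy I mean that it needs no outside help; you can do almost all of it through self-study. Nothing could be more fortunate. Of course, your studies must proceed step by step (within my areas of expertise, I have written many articles on my website to guide you along the way). Any deep theory becomes suddenly clear once you have actually put it into practice, and what is so hard about hands-on programming? A few good books, a computer, some necessary tools, and you're set; all that's missing is the spirit of "tying your hair to the beam and jabbing your thigh with an awl" to keep yourself studying. Once your skill reaches a certain stage, I strongly encourage you to trace through the source code of masters (with a guide, even better). Sima Xiangru said: "Read a thousand rhapsodies and you will write rhapsodies well; study a thousand swords and you will know swords well." Hou Jie says: "Read a thousand rhapsodies and you too can write one; study a thousand swords and you too can wield one."
Finally, I would say that school (especially university) was never meant to be vocational training. But as for shaping character, inspiring thought, and broadening horizons, talking about that today is probably aiming too high, and nobody wants to hear it.
Schools certainly have their shortcomings. One is that the courses are too theoretical and high-flown. At a college student's level, overly abstract material is beyond them. But turning the abstract into the concrete and the complex into the simple takes very deep skill. Another is that teaching materials, tools, and teachers are too outdated to keep up with the times. What struck me most is that students on Taiwan's BBS often ask questions about Turbo C 3.0. Good grief, the C++ Standard has been out for two years, and schools are still using TC 3.0. It's not that you must chase the newest, flashiest tools or products, but TC 3.0 is as far from the C++ Standard as the moon is from the earth. With this compiler, you can imagine what the teachers are teaching, and how far they themselves keep up with the world outside. If new tools and products were all expensive, we could understand, given school budgets. But Borland C++ 5.5, GNU C++ 2.96, and TAI C++ can all be downloaded for free or used on a trial basis. Their implementations of the C++ Standard are far, far better than TC 3.0's.
This brings us to the most important key in school education: the teachers. To be honest, many university teachers are excellent at book learning but have no hands-on experience, let alone industry experience. Once complacency sets in and the same decades-old teaching materials come out again, students can only fend for themselves.
There is, of course, a way to save yourself: you must be more diligent. Read diligently, ask diligently. Diligently seek out good mentors and good reading. Perhaps heaven will reward your diligence, and you will meet a mentor who passes on knowledge and clears up your doubts, hear of a must-read classic, and actually find it.
Speaking of asking diligently, let me close this article with one last lament. Taiwanese college students' "ability to express themselves" is generally poor and immature. Those who can state their ideas clearly and in an orderly way are very rare. What this reflects is timidity: no conviction, no confidence. In private they are loud as bells; ask them to stand up and give an opinion in public, and it's like the whine of a mosquito. Whether spoken or written, their wording is generally "vulgar." The situation on the mainland, from my impressions and the reader letters I receive, seems much better. People in Taiwan used to say that because political struggle on the mainland was so fierce, everyone needed a sharp tongue to protect themselves. But the Cultural Revolution ended decades ago, and I still find people's ability to express themselves generally very good. Was this especially emphasized in their schooling?
The ability to ask questions has a huge effect on learning. "A good questioner makes others continue his voice; a good teacher makes others carry on his aspirations." I often consider myself a good teacher, but if a class also has a good questioner, there is climax after climax, and the whole class benefits.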
漫談 程式師與編程
random talk on programmer and programming
北京《程式師》2001.05
臺北《 Run!PC》2001.06
作者簡介:侯捷,臺灣電腦技術作家,著譯評兼擅。常著文章自娛,頗示己志。
個人網站:www.jjhou.com
北京鏡站:www.csdn.net/expert/jjhou
--------------------------------------------------------------------------------
「侯捷觀點」進行了4期。通過這個專欄的作用,我開始接觸大陸的電腦技術刊物《程式師》和電腦技術網站 CSDN,並累積了相當量的觀察和感想。這個專欄前數期談的都是技術,不是深度書評就是高階技法。這一期讓我們輕鬆一下,談談程式師(p rogrammer)與編程(programming)。其中不少議題起因於讀者來信的觸發,許多觀點我也已經回應於侯捷網站上。所以若干文字可能你曾經在侯捷網站上閱讀過。有些看法也許讀來刺眼,聽來刺耳。但如果大家不把我視為外人,當能平心靜氣地思考。臺灣存在許多相同的問題,我也時常為文針砭。
有一句話這麽說:如果你想使人發怒,就說謊。如果你想使人大怒,就說實話。說實話的人來了,但願你心平氣和。
急功近利是大忌
一位讀者寫信給我,說他非常著急。他一個月掙300元人民幣,家裏情況又不好。他希望趕快把 VC/MFC 學會,進入 IT 產業掙錢。信寫得很長,看著看著,我也不禁為他著急起來。
有許多讀者,雖然情況沒有那麽急迫,燃眉之情卻也溢於言表。不外乎都是希望能夠儘快把某技術某技術學習起來。
但是哪一樣東西哪一樣技術是可以快速學成的呢?能夠快速學成的技術,人才也就必然易取易得,根據市場供需法則,也就不可能有很好的報酬。所以諸君當有心理準備,門檻高的,學習代價高,報酬高;門檻低的,學習代價低,報酬低。
說起來是老生常談了。這其中最可怕的心理在急功近利。從讀者的來信,以及從 CSDN 上的眾多帖文,我感覺,許許多多人學習 IT 技術,進入 IT 產業,是認為 IT 產業可以助你脫困,遠離貧窮。
是的,IT 產業有這個「錢」景,但你得有那份實力。要吃硬核桃,也得先估量自己的牙口。
「好利」是基本人性,Acer 總裁施振榮先生大力提倡「好逸惡勞」之說,視為人性之本,進步的原動力。誰能說不是呢?好利可以,近利就不妙了。近利代表目光淺短,一切作為都因此只在小格局中打轉。
梨園有句話:要在人前顯貴,就要在人後受罪。臺上一分鐘,台下十年功。老祖宗這方面的教誨太多了,身為中國人的我們,應該都耳熟能詳。
對於心急的朋友,我只有一句話:勿在浮沙築高臺。你明明很清楚這個道理,為什麽臨到自己身上,就糊塗了?急是沒有用的,浮躁更會壞事。耐住性子紮根基吧。做任何事都要投資,紮根基就是你對自己的未來的投資。如果想知道如何按部就班紮根基,侯捷網站上有一篇文章:「9 7/06 選義按部 考辭就班」,請你看看。
口舌之戰有何益
最常在程式技術相關論壇上看到毫無價值而又總是人聲鼎沸的口舌之戰,就是諸如「VB 和 Delphi 誰好」、「BCB 和 VC 誰優」、「Linus 和 Windows 誰棒」、「Java 和 C++ 誰強」這種題目。每次出場都一片洋洋灑灑,紅紅火火急速竄升為超酷話題。眾人各擁所好,口沫飛揚,但是從來說服不了任何異陣營的人,話都只說給自己人聽,給自己人爽。
這樣的論戰有何意義?許多人在重組自己的偏見時,還以為自己在思考呢。戰到最後,就只是爭誰說最後一句話而已。而且,擦傷引起的爭吵幾乎總是以刺傷結束。
工具與技術的評比,是一場高水準的演出。真有能力做評比,侯捷是很尊敬的。但是這些各擁所好,口沫飛揚的人,真的對評比兩造都有深刻的瞭解嗎?很多時候我們看到的只是無知,而無知是這麽一種東西 : 當你擁有了它,你就擁有巨大的膽量。
很多人喜歡某種工具,只不過因為那是他的初體驗。他玩它玩出了一點心得,可以說出它的某些好,就開始做「評比」了。你只看到牡丹的豔麗,又怎知寒梅的清香,幽蘭的空靈?
絕大多數人使用某種工具,不是因為它最好,不是因為眾裏尋它千百度,僅僅只是因緣際會。雖然說不同的應用環境選擇不同的工具,是最伶俐的作為,但我真的懷疑,在現今工具(以及工具背後反映的技術)如此繁複的時空下,有多少人能夠同時精通一個以上的同質工具?追二兔不得一兔,我還是認為你精專一樣工具,把它發揮到最高效能,獲得的利益多些。被大家拿來評比的,都是市場上的佼佼者,還能差到哪里去?能夠兩雄相爭,必然是在技術面、非技術面(資源的普及、品牌的可靠度)各有一片天,你的評比意義大嗎?全面嗎?
大多數人沒有能力同時精通兩種同質工具,初學者聽了網路上不知名大俠的高論,也不可能有所選擇(如果有,怕也只是蒙著頭瞎選)。這種沒有提供資料,評論者也沒有顯示任何信譽(c redit)的論戰,沒有任何意義,純粹只為自己爽。浪費網路資源!
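Bjarne Stroustrup, the father of C++, answered the following questions in the FAQ on his web page (and on many other occasions). They concern languages, but they apply more broadly and are well worth chewing on (note: the full text was translated by Mr. Meng Yan (孟岩) and can be read on my website):
Q: Would you compare C++ to other languages?
A: Sorry, I won't. You can find the reason in the introductory note of The Design and Evolution of C++. Many people have invited me to compare C++ with other languages, and I have decided not to do it. Here I want to restate a point I have stressed for a long time: comparisons between languages are rarely meaningful and even less often fair. A reasonable comparison of mainstream languages takes great effort, more than most people are willing to spend. It also requires broad experience across application areas, a detached and impartial stance, and a sense of fairness. ...
I have noticed certain things in comparisons of languages again and again, and frankly they worry me. The authors try hard to appear impartial, but in the end they are hopelessly biased toward a particular application, a particular style of programming, or a particular programmer culture. Worse, when one language is clearly better known than another, a subtle sleight of hand begins. Flaws in the better-known language are deliberately played down and glossed over, while the same flaws in the lesser-known language are described as fatal. Likewise, technical information about the better-known language is updated often, while information about the lesser-known language is often years out of date. What fairness or meaning can such a comparison have?
Q: Others compare their languages with C++ all the time. Does that make you uncomfortable?
A: When the comparisons are incomplete or serve commercial ends, yes, it bothers me. The most widely circulated comparisons are usually published by proponents of some language, say Z, to prove that Z is better than the others. Because C++ is so widely used, it is usually the first name on the list. Often such articles are bundled with the products of Z vendors as part of their marketing. Shockingly, many of these reviews cite articles by employees of the companies that sell Z, and those articles, which cannot stand scrutiny, merely set out to prove that Z is best. Especially when a comparison contains a few scattered facts..., carefully selected facts may look correct yet be completely misleading.
In future, when you see a language comparison, notice who wrote it. Ask whether its claims rest on facts and are fair, and especially whether its criteria are fair and reasonable for every language it covers. That is not easy to do.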
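As I said, truly incisive technical comparisons are valuable to researchers at a certain level, but I rarely see quality work on forums. What quality can a forum have? 99% of it is idle chatter with no nutritional value. In it we keep seeing prejudice and stubborn attachment to one's own views, followed by the stab wounds that inevitably come after the scratches. It is genuinely sad. How much better if these people spent that time studying. I urge you to spend less time chatting and more time studying, and to read some real classics. Don't run to a forum with every question, and don't hang around forums reading other people's chatter.
People love to push themselves forward not only on comparison topics; on other topics, too, reactions are often emotional. The road to China's strength currently seems to be staked entirely on the IT industry, especially the software industry. Programmers carry too many expectations, and programmers themselves have become rather inflated. Mixing in nationalism or personal likes and dislikes, whenever they see someone or something they dislike, they rally everyone to "hack" (黑) it. What kind of mentality is that? A contest of fists? Frankly, even if it were about whose fist is bigger, what size of fist is hacking a website? The Internet is a large dark room, and a gentleman does no wrong even in a dark room.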
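Where should a magazine position itself?
Some time ago CSDN asked everyone for opinions on how Programmer should position itself. It was lively. I don't know whether the magazine's leaders gained anything from so many suggestions. My guess is no, or not much if they did.
As with books, a reader's most direct feeling is that he wants to read what he needs. A hundred people have a hundred needs, and a survey like this cannot reach a conclusion. How would you know what the silent readers think, or those who are not online, who do not vote, who do not post?
In my view one principle is enough. Always stay one level above the general readership, act as a guide, and lead readers toward cutting-edge ideas and a broad perspective. That's all. Readers grow. Whatever level of practical technology you aim the magazine at, someone will be dissatisfied, and this year's readers, having grown, may not be your readers next year. Keep up cutting-edge ideas and a broad perspective, and regularly bring in new forms of technology, new ways of thinking, expert insight, and the views of opinion leaders. Only then can a magazine attract readers over the long term and contribute lastingly to many people and to the whole development environment.
The great American physicist Feynman once criticized the teaching of physics. He said teachers keep teaching techniques for solving physics problems instead of inspiring students with the spirit of physics. Couldn't this offer a little inspiration to magazine publishers and readers alike?
Seen this way, within my own field, I consider these the best topics: an interview with the father of STL, an interview with the algorithms master Donald Knuth, a C++/OOP series, a GP/STL series, "Learning Standard C++ as a New Language"..., plus overview articles with a big-picture view. Some of these are my own work; well, I have never been modest.
Of course, material that is too metaphysical won't work; overly abstract things are hard to accept. The higher the level of abstraction, the more freedom one has, but abstract thinking is the privilege of people at a high level. For the general public to accept it, it needs some concrete detail for support.
How can a magazine keep a long-term supply of articles with cutting-edge ideas and a broad perspective? Working with foreign magazines is fast and effective. The first few pages of every issue of Programmer carry cutting-edge summaries of that month's important foreign journals, which shows that its editors have always kept up with foreign professional journals. They must have a clear idea of whom to choose as partners.
Of course, working with foreign partners costs money. Outsiders, especially readers, find it hard to appreciate funding difficulties or to put themselves in the publisher's shoes. Some people angrily and self-righteously complain that CSDN is as slow as a snail, but have they ever asked where the site's resources come from? If it charged you, would you accept? Many well-known websites in Taiwan have already collapsed; I am waiting to see how long the free services can hold out.
For a magazine to be broad-minded and worth reading, its readers must be a little more mature too. Only a group of very good readers can hold up a very good magazine.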
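Here is a reader's letter:
Technology is moving too fast. While other countries (even India) are achieving the "industrialization of software", the mainland (at least around me) is still at the level of small handcraft workshops. I believe the future no longer belongs to "individual digital heroes"; software engineering seems more urgent than one or two technologies. With your broad perspective and rich experience, do you see this issue differently? Would you be willing to write an article from a technical (or other) angle to share your view?
Software engineering is vital to raising the whole software industry. But it takes a programmer three to five years (or even more) of training before he becomes interested in "software engineering". What do I mean by that? I mean that books, tools, websites, and magazines of this kind have little room to survive in a noisy, short-sighted environment. That is why, when Mr. Jiang Tao (蔣濤) wanted to steer Programmer toward software engineering, I felt both great respect and great concern.
Incidentally, the writing in Programmer has always given me "the pleasure of reading". I have rarely been able to say that about Taiwan's computer magazines or computer books. Compared with Taiwanese computer publications, the writing here has much more depth.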
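Frivolous, impatient, and without confidence
Just visit the forums where programmers hang out and you will see impatience and anxiety everywhere. What it reflects is a lack of confidence.
"C# is out, Java will die", "Java is evolving, C++ will perish", ".NET is out, VB programmers are doomed", "Kylix is out, everybody hurry and learn it", "Delphi keeps shipping new versions, guys, don't be afraid", "I just started learning VC, how come it's already on its way out?", "Is MFC really becoming obsolete?"... Should questions like these be filed under rumor or under childish talk?
It is strange and sad that so many people are so interested in such questions. It reveals a shallowness. Without a deep understanding of what technology really is, people chase new tools and new things anxiously and frantically, and assume that whatever is new must be good. They have no confidence in themselves, nor in the environment as a whole.
Programmers with depth never care about such things. Of course, burning three sticks of incense morning and evening does not guarantee that all will be well, and telling yourself not to care does not mean you really won't. It has to come from within: a steady assurance, with a clear picture in your own mind, backed by solid fundamentals.
Not long ago Taiwan's networked BBS boards also had many heated threads about Java, C#, C++, and .NET. Below I quote the post I liked best. Its closing remark applies to any field.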
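From: algent@kkcity.com.tw (流雲), Board: programming
Subject: Some thoughts Re: I don't get it, the industry keeps shouting "Java", what is it shouting about...
Posted at: KKCITY (Sun Feb 18 12:55:49 2001)
Judging by the current state of Taiwan's industry, C/C++ should be a basic skill for anyone who wants to be a software engineer. As for Java, if you know C++ well, learning Java should take less than a month.
In my personal view, Java is more object-oriented than C++. And in this Internet age, the performance bottleneck is network bandwidth rather than execution speed on a single machine. Java's Collection framework is a very powerful programming tool, and Java has built-in support for multithreaded programs. Its rich class library also makes networking, database, and similar software worry-free to design.
C++ may be one of the most important programming languages of the past ten-plus years, and it clearly performs better than Java. But for programs that must run on thousands of different brands of machines across the Internet, it may not be the best solution compared with Java.
For "now" there is no need to write desktop applications in Java, because "at present" programs written in Java use more memory than C++ programs and perform poorly.
We cannot expect a rabbit to swim faster than a fish, nor a fish to fly higher than an eagle.
Engineering requirements make different programming languages suitable for different situations, so there is no need to bother criticizing A, praising B, and suppressing C. Basic theory matters far more than these things.
VB will die? Java will perish? C++ will be replaced by Java...? Does it matter? I use Java and I use C++. Even if next year they were all replaced by Java++, C++++, Lisp++, and Forth++, what would that be to me? An FFT is still an FFT, and Dijkstra's algorithm is still Dijkstra's algorithm... Better not to worry too much about these things...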
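Apart from occasionally talking to myself on BBS, I rarely reply or take part in discussions. After reading the post above, I couldn't help replying:
From: jjhou (jjhou) Board: programming
Subject: Some thoughts Re: I don't get it, the industry keeps shouting "Java", what is it shouting about...
Time: Fri Feb 23 21:12:14 2001
I agree with you. Very well written.
Only when a person reaches a certain level does he begin to ask what the essence of things is, instead of being tied down by superficial tools.
Being proficient with tools is necessary, but tools evolve and get replaced no matter how loudly we cheer behind closed doors here.
Donald Knuth said: "Languages keep evolving, and that is necessary. Whatever language is popular now, you can be sure that in ten or twenty years it will no longer be in fashion. In my books I always write about things that are not trendy, but they are worth remembering for future generations." (Note: parts of the above follow the translation in Programmer, 2000/12.)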
DDJ 1996/04 p18:
"Languages keep evolving, and that is necessary. ...Whatever computer language is in fashion, you can guarantee that within a decade or two it will be completely out of fashion. In my book, I try to write things that are not trendy, but are things that are going to be worth remembering for other generations."
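Keeping up with new knowledge is certainly the right attitude for a computing professional, but you should strike a balance between chasing new tools and deepening what you already know. Too much is as bad as too little!
Besides, every road you walk leaves footprints. Any effort you make now, as long as it is solid, will never be wasted. Technology accumulates, and knowledge in one area carries over to others. You say MFC and OWL have nothing in common; I say they do. Isn't the principle of the message map the same? Doesn't a framework work the same way?
I am not a zealot for any language, tool, or technology; I am a pragmatist. Toward people who claim to be fluent in many languages of different kinds, I feel awe and keep a professional distance. Mastering a language well enough to get the most out of it is not easy; it takes a lot of effort. 99.99% of people are ordinary. For ordinary people like us, mastering one (or two) "languages" suited to our work is far wiser, and far more rewarding, than a superficial acquaintance with the "syntax" of many.
Really, stop worrying about who will rise and who will fall.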
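Fertile soil for genius
Education has always been the issue I care about most. It matters no less than industry. Without good education, where will good industry talent come from?
I won't discuss school education; that is not where I can make a difference. I do teach at a university, but only a few dozen students a year, so how much influence can that have? Books reach tens of thousands of readers and magazines hundreds of thousands; that is where real influence lies.
Self-education follows you like a shadow from the day you leave school for the rest of your life, and it matters far more than school. Self-study depends on reading material: books and magazines of every kind. In our profession, what state are the books and magazines in?
Here is a reader's letter:
I remember you saying that a walk through a region's bookstores tells you roughly what its level of IT technology is. That really struck a chord with me. I have studied software for five years and spent more than 12,000 yuan (RMB) on books. Looking back, the vast majority are garbage. I used to worry about how I would carry so many books if I had to work elsewhere. Now I feel a painful lightness, because the books worth taking fill only one bag. When we start studying IT, who doesn't want to achieve something in the industry? But looking back years later, I fear we will all grieve over the educational environment we were in.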
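I have received far too many reader complaints about the flood of shoddy computer books. The above is only the tip of the iceberg; interested readers can go to my website and read their fill. Some publishers are even notorious for bad books. Look at this letter:
You must have read Mr. Jiang's article in Programmer about the so-called "big four" IT publishers. Perhaps out of courtesy, Mr. Jiang did not say everything. For example, the XXX publishing house among them has now become a model for translated books: a model of shoddy work.
Now look at this letter:
I read the assessment of xxx publishing house on your website and was deeply moved. In fact, that publisher pioneered bringing foreign-language books into the mainland IT field. Our generation of programmers (those from before 1992) grew up on its translations; I still have at least two large cartons of old xxx books. Back then almost all computer books were imported and translated by them. Looking back now, the translation quality was unbearable (much worse than Inside VC++ 5/e), but we were quite satisfied at the time, since something was better than nothing. I think today's criticism of xxx is the result of competition: people have seen better translations and have something to compare with. In short, xxx's hallmark back then was translating in volume and publishing in haste, so that technical people could read excellent works as soon as possible. That style is clearly outdated now; you could say it has completed its historical mission. Of course, I no longer buy xxx books like crazy as I once did, because there are more choices now.
This letter sent me back into memory. Taiwan also once had two publishers with equally shoddy practices. These two notorious publishers were called 瑩圃 (Yingpu) and 松格 (Songge), and both have closed. Their method was to translate foreign books quickly and in bulk. Because they were fast, and because their selection included some good books, they once had a certain market. Why did they close? Because readers can be cheated once or twice, but they won't stay fools forever. A publishing mindset like that plainly has no long-term plan, only a wish to make a quick killing and leave, so of course they closed.
Because we picked some usable things out of the garbage heap, patched them up and made do, we may develop a strange affection for the people who deliberately produced that garbage. The stuff was clearly bad, yet we absorbed a little nourishment from it. Should we thank them or hate them?
We should scorn them!
These businessmen imported foreign books quickly and in bulk because there was profit in it. Profit is a good thing, but they did not do their job well. They abandoned quality without fear, because they knew how easily money could be made in that time and place. Friends in mainland publishing told me that so-and-so (all with names) easily piled up several million yuan this way within a few years. Several million RMB, my goodness. I suppose this counts as part of the IT industry too. Sure enough, the industry is booming, and even the hangers-on are rising to heaven.
Getting rich through hard work deserves our praise and blessing. But publishers like these don't want to work harder to earn more money over a longer period, because easy money takes no effort. One percent of people may get some nourishment from this garbage; one hundred percent feel the pain of reading it. Who knows what percentage were misled by it? We spent plenty on books, yet the value we got back was so small and the pain index so high.
This reader said, "In short, xxx's hallmark back then was translating in volume and publishing in haste, so that technical people could read excellent works as soon as possible", and also that the books "were imported and translated by them; looking back now, the translation quality was unbearable". Well, can an excellent original still be an excellent work after a baptism of unbearable translation? Treating people generously is a virtue, but those who deliberately make gutter oil that gives people upset stomachs do not deserve defending. You say "it has completed its historical mission". No. They never had a historical mission, or any mission at all.
Readers this "generous" and this tolerant are quite rare. When it comes to computer books, most programmers have tales of blood and tears to tell. Programmer 2001/03, p119, even carries an article titled "The Traps of Computer Book Publishers".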
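A reader wrote:
Lu Xun said that before there are geniuses, we must first prepare the soil for genius. ... I truly understand how you feel (that is probably the greatest contribution of the few hundred garbage books piled in the corner).
"Soil for genius": well said, Lu Xun. Shouldn't that be the calling of publishers? But who can we complain to? All we hope for is some good books that will produce some reasonably capable programmers. Not long ago there was a great fuss comparing Indian and Chinese programmers. We don't even dare hope for geniuses; we just want to cultivate some solid talent.
You may wonder why, when books are bad, I aim not at the authors but at the publishers. Well, long ago I adopted the humble attitude of "if I get one, I'm lucky; if not, it's fate", and I dare not expect good original books in Chinese. What I am talking about above, and what pains readers most, is the poor quality of translated books. How could talent-rich China fail to find qualified translators? Without publishers' grab-the-money, cut-corners mentality, would such a mass of inferior products exist? How can I not blame the publishers?
In the end you have to rely on yourself. "The Traps of Computer Book Publishers" ends like this: "Remember, you are spending money you worked hard for, so never waste it on useless things. Reward the publishers who publish excellent books: buy their books, write to them, and let them know what you think and what you need."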
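A virtuous cycle
Building a body of knowledge takes solid construction from the bottom layer to the top. C++, Java, .NET, OO, UML, Windows programming, Linux programming: each topic needs a large set of books to form a complete system. C++/OOP needs books on syntax and semantics, the object model, expert experience, design patterns, plus introductory, advanced, and reference works... GP/STL needs books on generic programming in general, STL source-code analysis, STL application compendia, STL specification references, STL component design, and other generic techniques... Java needs books on the language core, object orientation, multithreaded programming, graphical interfaces, network applications... A novice who learns high-level abstractions before understanding the lower layers builds castles in the air, all form and no substance. For experienced people, a lack of abstract thinking means stagnation.
Writing, translating, and publishing a whole series of good books is a calling that only people with long-term vision, firm will, and a touch of idealism can take on.
If such people and such publishers get no support from us, in principle and in practice, who would take on such a fool's errand?
I have always firmly believed in high quality at a high price. High quality at a high price is the strongest incentive for producers. Having worked hard to achieve high quality, they deserve the high profit a high price brings; that is only natural. Otherwise, who would bother with high quality? Philanthropists? Fools?
For consumers, a high price is of course unwelcome. But you should ask whether the thing is worth its price, or even worth more. Take English books as an example. The C++ Standard Library at USD 49.95 and Generic Programming and the STL at USD 49.95 are both worth far more than they cost. Once I understand what such books are worth, I would buy them even if they were twice as expensive. Some people risk their lives to eat pufferfish; I risk mine to buy good books. Realistically, with everyone shouting about the "knowledge economy", isn't the knowledge in good books a money-making tool? Being stingy with money-making tools is working against yourself, isn't it?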
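Here is a reader's letter:
Compare the situation of Japanese manga artists, novelists, pop idols, and film stars: why is it so hard to be a professional writer in Taiwan? Some people camp overnight in tents to buy concert tickets, while others haggle over page counts and book prices. Perhaps they don't know that an introductory C programming book costs 100 DM (about NT$2000) in Germany. So my German colleagues always ask for opinions or consult reviews before buying a book. Books are not cheap there, yet the reading culture is no weaker than Japan's.
This makes a key point: because books are expensive, people choose good books carefully and value reviews. Here is another reader's letter:
I am a reader from the mainland and a computer beginner. Online I see people highly praising your books and translations. When I learned that the second edition of your 《深入淺出MFC》 (Dissecting MFC) was about to be published on the mainland, I decided to buy it and contacted Huazhong University of Science and Technology Press. There I learned that you will publish several more books on the mainland this year, which made me very happy. But after learning your views on pricing, I was somewhat disappointed.
Economic levels on the mainland and in Taiwan differ; an ordinary salaried worker's purchasing power is limited. Here, computer books are the most expensive of all books. A book's price roughly tracks its page count, with the same leading digit: a 700-page book costs 70 to 80 yuan, and a book over 1000 pages costs over 100 yuan. That is roughly where mainland book prices stand now. If a 350-page book costs 80 yuan, as you said, that counts as very expensive here; books at that price can only be looked at, not bought.
"The silkworm spins silk until it dies; the candle weeps until it burns to ash." We regard teaching as a sacred profession: burning oneself to light others. I think the purpose of your books is also to let more people who thirst for knowledge benefit and avoid detours. As readers we hope to see more and better books. But in any given period, purchasing power and price should be in balance. 80 yuan for 350 pages really is too high; if it came down to under 60 yuan, I believe most readers could accept it.
Everyone agrees your books are of very high quality, and they should be priced above other books, but not too high. Designer clothing takes the high-price route, which certainly raises its status and makes it look high-end. But an excessive price cuts it off from its main consumers: most people can only talk about it, and very few actually wear it. Books are different from designer clothing. Their value is proven only after many readers have read them over a long time. If many people know Mr. Hou's books are very good but few have read them (because of the price), wouldn't that be a tragedy?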
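The idea I least like to see is "an xxx-page book sells for xxx yuan". A book's value lies in its content, not its page count. If you really want to calculate that way, check every book's font size, line and character spacing, number of screenshots, and amount of white space, since all of these change the page count. If everyone accepts a fixed ratio between pages and price, a flood of padded books will certainly follow (isn't that what we have now?).
Don't wear yourself out. If a book is worth its price, buy it; if not, don't. Simple logic.
Can we measure how many yards of cloth an Armani suit uses to decide its price? Or measure the canvas size of a Van Gogh to decide its price? Can I say that because my painting of hydrangeas is twice as big as Van Gogh's Irises, I should sell it for twice his price?
When buying things, you cannot look only at the tangible; the intangible often matters more. Buying a book is not buying paper. We need to build the right sense of value.
Of course, you may well think that people who buy designer clothes or famous paintings are crazy, and that all you want is the cloth and the frame. That means those items aren't worth that price to you. Fine. You have your valuation and your choice.
I don't intend to dwell on metaphors such as designer clothes or famous paintings. A metaphor that covers one point can miss another, and that is how flame wars start. You all know what I want to emphasize.
A 350-page book should not necessarily sell for 80 yuan, nor necessarily not. It depends on how much gold those 350 pages contain. Besides, I never said I had a 350-page book to sell for 80 yuan. But every possibility exists: 350 pages can be worth 180 yuan or 80 yuan, and 530 pages may not be worth even 18 yuan. Please stop using page count as the basis for a book's price.
Teaching is sacred, but "burning oneself to light others" is too heavy a burden. "Burning oneself", heh: so easy to say and so painful to do. A person whose work benefits many people is glad of it. But ask him to burn himself to light others, and unless he is a saint, he won't. I am happy to light others, but I don't want to burn myself. If I burn myself, I can light others for five years; if I take good care of myself, I can light them for a lifetime. Loading people with lofty slogans like this scares away those capable of writing good books.
Please accept this idea: a book's value lies in its content, not in its thickness or page count. Value shapes price; value drives price. Accepting this idea is how you show respect and real support for everyone who contributes to good books. If everyone chooses carefully, the money spent on ten garbage books is more than enough for two or three high-priced (actually fairly priced) good books. Who walking the programming road doesn't own ten or twenty garbage books? When people choose carefully and support good books even at higher prices, they encourage a culture of reviews and of good writing and translation. That is good for everyone. No one needs to burn himself, and everyone is lit.
Of course, expensive thin books may encourage piracy and photocopying. Whatever happens, I will stand by my view and not back down. If the environment really cannot improve, good books will leave and good people will quit. Who loses in the end?
Believe it or not, I am trying to use my personal influence (if I have any) to build a good environment for technical writing, in Taiwan and on the mainland alike. "How is the canal so clear? Because fresh water flows in from the source." For everyone to have more good books to read, fresh water must flow in from the source. For that, there must be more incentives to draw talented people into technical writing and translation. What is the incentive? Letting them know that good work can stand out and be set apart (frankly, that it can command a good price), rather than an ox and a thoroughbred sharing one trough (牛驥同一皂). That is an incentive. No, it isn't even an incentive; it is a basic principle.
Quality books benefit readers. The high returns of quality books benefit authors and publishers and draw more excellent people into the field. This forms a virtuous cycle in which everyone benefits.
I also want to suggest that mainland publishers make good use of their unique "photocopy editions" (影印版, licensed reprints of the original-language edition). Translated computer books in Taiwan are also uneven, with more bad than good. Some readers have joked that once publishers acquire the rights, they shouldn't translate at all but simply publish the original: readers would be happy and prices could drop sharply. I imagine that is what the mainland's photocopy editions are (they are not permitted in Taiwan). Since translated books are now condemned on all sides, why not import more photocopy editions? Don't publishers want to be first and fast? This is the fastest way, and readers get one more choice.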
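What went wrong with translation
A big problem with translated computer books is that translators lack the time (or the right attitude, or enough skill in Chinese) to reread and revise their drafts again and again. Chinese has a weakness here: its nouns do not mark plurality, and its verbs do not mark tense. Most of the time this hardly matters and can be ignored. But sometimes it is crucial, and then rendering a precise English sentence precisely in Chinese takes a great deal of extra effort. The translator needs enough time and enough skill in Chinese. Only a translator with enough technical grounding can see that some subtle point is critical to understanding the text correctly.
When an English clause is long and tangled, Chinese readers struggle with it, and so do native English speakers. Look at this one (C++ Primer 3/e, p730):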
[code..] where the conditional test if (this != &rhs) prevents assigning a class object to itself. This is particularly inappropriate in a copy assignment operator that first frees a resource currently associated with the class in order to assign the resource associated with the class being copied.
My translation:
[code..] 其中的條件測試 if ( this != &rhs ) 避免將 class object 指派給自己,因為「自己指派給自己」這個動作,對於那種「先將目前系結於自己身上的資源釋放掉,以便稍後將該份資源再系結於即將被拷貝的那個 object 身上」的 copy assignment 運算子而言,尤其不合適。
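Adding just a few quotation marks to mark off the clauses makes it far more readable. "The moon by the window is the same as ever; add plum blossoms and it is different." Try translating it without the quotation marks and see what you get. Don't tell me "according to Ministry of Education guidelines, quotation marks are only for emphasizing proper nouns or special tones...". Rules are rigid; people can adapt. Whatever expresses the meaning flexibly and correctly is a good method. Didn't Comrade Deng Xiaoping say it doesn't matter whether a cat is black or white, as long as it catches mice? When the Apollo 13 lunar mission failed, the spare carbon-dioxide scrubber canisters on board did not fit. Mission control had the astronauts find a way to fit the square canisters into the round openings, or everyone would have died in space from carbon dioxide. At a moment like that, who thinks about rules? Be flexible.
Another big weakness of Chinese is that it does not distinguish verbs from nouns. 操作 can be a noun (operation) or a verb (operate); 實現 can be a noun (implementation) or a verb (implement); 參考 can be a noun (reference) or a verb (refer); 請求 can be a noun (request) or a verb (request); 委託 can be a noun (delegation) or a verb (delegate). When such verbs and nouns are mixed together, reading becomes confusing. So you get sentences like the following, from 《設計模式》 (Design Patterns), p14, translated by Li Yingjun (李英軍) et al., China Machine Press. First read the original translation and see whether you can work out its rough meaning from the Chinese sentence structure alone:
(1) Original translation: 只有當委託使設計比較簡單而不是更複雜時,它才是好的選擇。
(1) Hou's translation: 只有當「委託方式」簡化了設計,它才是一個好的選擇。
(1) Original: Delegation is a good design choice only when it simplifies more than it complicates.
(2) Original translation: 委託方式為了得到同樣的效果,接受請求的物件將自己傳給被委託者(代理人),使被委託的操作可以引用接受請求的物件。
(2) Hou's translation: 為了以「委託方式」獲得相同效果,「請托(request)受理者」將自己傳給被委託人,使自己得以讓「被委託之操作行為」取用。
(2) Original: To achieve the same effect with delegation, the receiver passes itself to the delegate to let the delegated operation refer to the receiver.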
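For readers weighing the two translations, here is a small Python sketch of what sentence (2) describes. It is loosely modelled on the Window/Rectangle illustration in Design Patterns, but the class and method names are illustrative and not the book's code. The Window (the receiver) keeps a Rectangle (the delegate). For describe(), it passes itself along so the delegated operation can refer back to it:

```python
class Rectangle:
    """The delegate: it owns the geometry and does the actual work."""

    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height

    def describe(self, receiver: object) -> str:
        # The delegated operation refers back to the receiver it was given.
        return f"{type(receiver).__name__} of area {self.area()}"


class Window:
    """The receiver: it forwards requests to its Rectangle delegate."""

    def __init__(self, width: float, height: float) -> None:
        self._rect = Rectangle(width, height)

    def area(self) -> float:
        return self._rect.area()            # plain forwarding

    def describe(self) -> str:
        return self._rect.describe(self)    # the receiver passes itself


if __name__ == "__main__":
    w = Window(3, 4)
    print(w.area(), w.describe())  # 12 Window of area 12
```

area() only forwards the request. describe() shows the "passes itself" part of the sentence, where the delegate needs to know who is asking.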
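I am not trying to show anyone up, and my renderings are not necessarily the best. Besides, translating a whole book means keeping everything consistent from start to finish, which is far harder than translating one or two sentences. But having raised the problem, I should offer my own solution for everyone to consult and judge. China Machine Press was willing to publish such a classic, and Mr. Li Yingjun and his colleagues were willing to translate such an advanced and thankless book. I respect them for it.
Another problem is that translators often force general-dictionary vocabulary onto computer books, and nobody dares to break away from it. General dictionaries take no account of computer technology, let alone context. Too many people follow the dictionary slavishly and dare not change anything, on the principle that doing less means fewer mistakes and doing nothing means none. That is why many current translated terms are unsatisfactory yet stuck in place. These are the words that struck me most:
instance: Many people in both Taiwan and the mainland translate it as 實例 ("real example"). The character 例 ("example") is simply wrong here. Some in Taiwan even use 案例 ("case"), which is worse. Why translate it that way? Because that is the ready-made dictionary term. An instance is an actual object produced from something, which may be a real thing or some kind of description. I think 實體 ("actual entity") fits very well. What you produce from a class is an object instance; from a class template, a class instance; from a function template, a function instance. What you produce from an executable file is a process instance.
paradigm: Taiwan often translates it as 典範 ("model, exemplar"). Why? Oh, the ready-made dictionary term again. Sometimes it makes sense, as in paradigm shift, 典範移轉. The trouble is, what is a 典範移轉? Hard to guess from the words, isn't it? Rendering generic paradigm as 泛型典範 is even more baffling. Everyday Chinese also uses 典範: we call someone a 典範 (role model) for the nation and society. That has nothing to do with paradigm as a computing term. Based on what paradigm actually means in computing, I translate it as 思維模式 ("mode of thinking"): a typical, fundamental way of thinking.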
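A reader sent this letter:
I would like to ask you about translation style. As you said, the hardest part of English technical books is the long sentences. English sentence construction is far richer than Chinese, so understanding them is already hard and rendering them into fluent Chinese is harder still. Sometimes I split and recombine such a sentence and weigh it over and over, and the Chinese is still unreadable. In that case, (1) may I give up "faithfulness" (信) for "fluency" (達), that is, drop some parts of the sentence to keep the translation smooth? Or (2) must I convey the original meaning at all costs, even if the Chinese is not smooth? And (3) some sentences are irrelevant to the main point yet exceptionally hard to translate; may I simply skip them? What is your view?
(Have you noticed how good this reader's Chinese is? "Split and recombine, weigh it over and over" (切分組合,反覆掂量) is wonderfully concise and vivid.) My view is that a translator has both the right and the duty to adapt flexibly, but must also have the ability to do so. So to your first question, I say yes; to your second, no. To your third, yes, but carefully, so that nothing is lost because of the translator's own limitations.
Technical translation should aim to render the sense, not the words. Faithfulness and fluency should be judged over a whole sentence, or even a whole paragraph, not word by word. Technical writing differs greatly from literature; the translator's main task is to convey knowledge correctly and make it as easy as possible for readers to absorb.
Where does this flexibility stop? I believe the more confident the translator is technically, the more room he has. As long as his technical level is high enough and he is sure he has correctly understood and conveyed what the author meant, he need not cling to every word.
Chinese is far from useless for technical writing. One of its strengths is high information density. A concise, elegant line of Chinese can often express three long lines of English with "clauses nested within clauses within clauses". Chinese also has beautiful diction and an endless store of allusions, idioms, and proverbs; used well, they can suddenly make cold technical prose a pleasure to read. A bad translation makes reading tortuous and excruciating; a good one feels like a spring breeze.
Let me say this: the right attitude, enough time, ample skill in written Chinese, and above-average professional knowledge are the basic ingredients of a good translation. How do things stand now? And to turn it around: how well are good translators paid? Are you willing to spend a little more to recognize their effort?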
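A healthy attitude toward choosing books
Below I discuss attitudes toward choosing books and toward scholarship. The discussion is built around readers' letters, so I cannot avoid mentioning my book 《深入淺出MFC》 (Dissecting MFC). The issues themselves are not limited to any one book or technology; like the earlier examples in this article, they apply more broadly.
A reader wrote the following:
Two weeks ago I finally finished your great work, and it taught me a lot about MFC. One regret is that your book did not teach me how to write a concrete program; I only came to understand MFC's "wrapping techniques". I thought I had been cheated again, because I bought the book to learn to write programs with MFC... By chance I came across Jeff Prosise's Programming Windows with MFC, and only then did I realize how important your book is, teacher. Without your 《深入淺出MFC》, how could I possibly do programming with MFC? ... Your book rescued me from dire straits and led me through MFC's lines of blockade.
This reader finally ended with thanks and praise for me and my book, yet the letter still gave me pause.
Readers tend to judge a book's worth in the most intuitive way and to express their likes and dislikes in the most direct way. But you cannot treat everything you don't need as padding, or everything that doesn't meet your needs as a swindle. That is not a healthy attitude toward choosing books. Even if you had never discovered that 《深入淺出MFC》 "is so important, rescued me from dire straits and led me through MFC's lines of blockade", when did this book ever deceive you in its title or content, so that you "thought I had been cheated again"? Moreover, you say "I bought the book to learn to write programs with MFC". But if you don't even understand where MFC first meets your application, and simply copy things by rote, how good a program can you write?
I am not blaming this reader; this letter simply represents a phenomenon that troubles me. Here is a more extreme case:
I think your book explains too many useless principles! What you write doesn't really teach people how to use VC, but how VC works inside! That isn't what many readers really care about! They care about how to design truly outstanding programs with VC!
Readers always want to read only what they want to read; that is natural. But what you don't want to see isn't necessarily "useless"; it may be "very useful" to others. Besides, if you don't even understand where MFC first meets the application, and just copy by rote, I don't know what "outstanding programs" you can write. Make one small mistake and you won't even be able to debug it. Driving a car is easy. Handling every kind of emergency on the road and getting past obstacles is the hard part, and that is where skill shows.
Below are the opinions of two Taiwanese readers, from some years back. I must point out first that readers with this attitude make up about 0.01 percent of the total.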
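Reader opinion 1
This book is padded too much. It contains too much stuff that shouldn't be there. Even the "Irresponsible Book Reviews" (無責任書評) in Appendix A seem superfluous to me. Those reviews appeared in RUN!PC long ago, and a third collection of Irresponsible Book Reviews was later published, so there was really no need. I suppose Mr. Hou wanted to make the book thicker, hence this.
Reader opinion 2
The book reviews shouldn't be in this book! Making the book this thick with such things is a bit much... The padding adds up as follows:
(a) Reader letters, pp. 1-16: 16 pages
(b) An extra-long preface; well, that is probably fine
(c) Irresponsible Book Reviews, pp. 843-872: 30 pages (actually some of it is thought-provoking, so it's OK)
(d) Scribble source code, pp. 873-914: 42 pages (this is the worst; it is almost entirely unnecessary)
(e) List of VC sample programs, pp. 915-920: 6 pages (a pity; with more elaboration it would be very useful,
but Hou Sir just lists the titles, and even the descriptions are in English, no different from reading the Help file)
Total: 94 pages
I'm not nitpicking out of boredom. Have you ever seen a book with nearly a hundred pages of flab? Not to mention the four or five pages of source code listed at every turn. All of it is on the CD, so why waste paper? Still, even without the flab, the book has its value... As for what the book lacks, I think it depends on how you position the book.
You can't expect one book to cover everything about programming! As an exploration of MFC internals, it leaves little to criticize. All in all, should you buy this book? I think yes. But it would be even better if it were slimmer and cheaper.
So in the end it all comes down to "it would be even better if it were slimmer and cheaper". These readers are pitiful victims of the idea that ties page count to price. It distorts their view of what a book is worth, and badly distorts their own sense of values. Suppose I told them that without those 100 pages of so-called "flab" the price would still be NTD 860. They would probably hug the "flab" and give it a big kiss. That is exactly how it was: the book's price was fixed first, and I added the "flab" last, to give readers more information and more convenience.
Readers like this see publishers and authors as adversaries and imagine that everyone is after their wallets. To them, every page that doesn't help them directly, perhaps only because they have already read it, is deliberate padding meant to cheat them of their money. Under this paranoid strain they forget that no book is tailor-made for one person and that others may have different needs. They become entirely self-centered.
Immature readers like this are really victims of today's flood of inferior products. Frankly, I would rather they were not my readers. But readers can choose their authors, while authors have no way to choose their readers.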
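The right attitude toward scholarship
The two letters above raise a doubt: does 《深入淺出MFC》 actually help with VC programming? I am not trying to slip in a plug for the book here (believe me, I don't need to). I want to use the relationship between MFC and VC to talk about the attitude one should bring to scholarship. If "scholarship" sounds too lofty, let's talk about the attitude toward "learning".
Here is a reader's letter:
I have a question and would like your help. Today we learn C/C++, tomorrow MFC and OWL (if they are popular),
the day after C# and Java... If Windows is displaced by X Window, won't we have to learn everything from scratch? Is it necessary to master everything so thoroughly? For the same goal, why not use more convenient and simpler RAD tools? Why take digging into complexity as a pleasure, or as proof of one's level? Haven't we got the direction and the goal wrong? I think this is exactly the wrong direction software development is taking on the mainland today (I don't know about Taiwan).
Technologies of the same kind accumulate and have much in common. The letter names three groups: MFC and OWL, Windows and X Window, and C++, Java, and C#. Within each group the members are alike and share common ground. Technology accumulates: with some experience, you learn a new technology much faster, and the more experience you have, the faster you learn. That is why I often like to say that learning one thing helps you understand related things by analogy (觸類旁通). If every technology had to be learned from scratch, everyone would be reset to zero every three to five years, and the world would not have progressed as explosively as it did in the 20th century.
"Is it necessary to master everything so thoroughly?" My answer: it depends on your needs and your position. Mastering the fundamentals is a process and a means toward application, not the goal. If you can do what you want to do very well without this process, then of course you can skip it. What I know is that many, many people must go through this process before they can reach their goals well. I needed to go through it myself (otherwise I could not have written such a book). This is not what you call "digging into complexity" or "proving one's level".
Since the letter mentions RAD, let me give my view. I once wrote an article likening RAD to "the commoner is innocent; his crime is owning the jade" (匹夫無罪,懷璧其罪); see my website, 1999/01/26, "懷璧其罪 RAD". I suggest you read it. I am very much in favor of using RAD. I write books about MFC and study MFC technology, but I have never claimed it is the best, or that it is bad; I just want to help the many people who use MFC. Like Bjarne, I have no interest at all in comparing tools like these. I'm happy to watch from the audience, but I never compare (I should add that I'm not even capable of it).
RAD can be compared to cars. Who today needs to know about the gearbox, drive shaft, clutch, engine ignition, or spark plugs in order to drive? But of all the drivers on the street, who can do a 360-degree spin? To reach the level of a "racing driver", you must understand quite a lot about how the car works. Ayrton Senna (an F1 world champion) and I both drive, but the difference in what we can get out of a car is as great as heaven and earth. Every RAD expert I know, without exception, has deep low-level skills. If you start with RAD and end with RAD, you cannot make much technical progress. Your career will be a blank musical staff: no high notes, no low notes, forever flat...
RAD should be used; refusing good tools is working against yourself. But while you use RAD, understanding more of the lower layers helps you get out of trouble or help yourself when things go wrong. The same goes for STL. Being able to use STL is one level. Understanding how STL works is another level. Having traced through the STL source code is yet another. People at the third level use STL with a vigor that people at the first level cannot hope to match.
How deep should you dig into a tool and the technology behind it? Well, the answer depends on the role you want to play. Between an "F1 driver" and "a motorist who dares to drive on Taipei's main roads only at three in the morning" lie many, many levels. You decide where you stand.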
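Some people are devoted RAD supporters; others are rethinking RAD. Here is another letter:
I used to use RAD tools all day long... But after three versions I feel cheated, as if this environment were a trap someone else had set for me! This kind of tool gradually reduces to zero the desire to understand the internals of the OS and the basics of the various specifications and protocols. Today I write my own component to get around some component's limits; tomorrow the new RAD version ships a component better than mine. In the end you don't want to write anything yourself; you just wait for others to write it for you.
If they don't, you have lost an ability completely... (heaven knows how long you would have to wait), or else the official components are missing this and that. And that's not all! What I could stand least was discovering that writing programs this way is actually easier than using Office, with almost zero deep thinking...
In my article "懷璧其罪 RAD" I replied as follows:
1. RAD is not a sin but a strength. How you use it is up to you. I have two friends who are Delphi experts. They can do anything with Delphi, without any of the limits you imagine RAD "must" have.
2. If it really were possible to "write a program more easily than using Office, with almost zero deep thinking", that would not be a bad thing. Letting anyone write a small program to solve a few small problems at hand is a great contribution of component software and RAD. But your description is exaggerated. Today's RAD isn't that wonderful yet; it still takes some basic training in program logic and programming languages. If the day you describe really came, I would call it a good thing, not a bad one. It's just that programs built that way all depend on ready-made components. To break out of the ready-made box, you need deeper skills. Either way, RAD will not be your stumbling block.
This kind of topic is hard to sum up in a sentence. In short, an excellent technologist must first go through a period of settling down into the lower layers. After that, with a solid foundation, he can rise again and do high-level development with abstract thinking, abstract languages, and rapid development tools. At that point RAD tools are like wings for a tiger.
As the saying goes, steel is forged through a hundred refinings: iron becomes steel by being hammered over and over, tempered and quenched again and again. For a programmer, the level of skill is closely tied to how many times he has been tempered.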
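Let's come back to foundations. Many computer science students look down on the courses their schools offer and believe these do nothing for their programming ability. First, programming and software development are not the only career paths for CS students. Information technology is a vast field, and programming is only a small branch of it (though a large one as far as the job market is concerned). Second, theory courses are not useless for your programming; they are just not needed yet. Some subjects have very direct and far-reaching effects. The two most important courses for programming are data structures and algorithms, which are invaluable for training logical thinking and the ability to implement. Without these two as a foundation, however strong you are in C/C++/Java, you won't write a good program. The other foundation courses each have their uses too; whether they help your future programming career depends on what kind of programming you do. What you learned in school may or may not turn out to matter in your job.
Young students who program well easily fall into arrogance. Nothing pleases them; the professors are all fossils, and their classmates are all laughable. Ask them why, and oh, it's because "their programming is not as good as mine". The laughable one is you.
Building programming skill is actually easy. By easy I mean that it needs no outside help and can almost all be done by self-study. Nothing could be more fortunate. Of course, you must study step by step; within my areas of expertise I have written many articles to guide you, all on my website. Any advanced theory suddenly becomes clear once you have actually worked with it, so what is hard about practicing programming? A few good books, a computer, some necessary tools, and you're all set. All that's missing is the spirit of hard study: "tie your hair to the beam and jab your thigh with an awl" to stay awake. Once you reach a certain stage, I strongly encourage you to trace through the source code of masters (even better with a mentor). Sima Xiangru said: "Read a thousand rhapsodies and you will write rhapsodies well; study a thousand swords and you will know swords." I say: having read a thousand rhapsodies, you too can write one; having studied a thousand swords, you too can wield one.
Finally, schools, especially universities, were never meant to be vocational training centers. But to talk today about building character, inspiring thought, and broadening horizons is probably aiming too high, and nobody wants to hear it.
Schools certainly have their faults. First, courses are too theoretical and abstract. Overly abstract material is beyond what university students can absorb, and making the abstract concrete and the complex simple takes very deep expertise. Second, teaching materials, tools, and teachers are too outdated to keep up with the times. What struck me most is that on Taiwan's BBS, students often ask questions about Turbo C 3.0. Good grief: the C++ Standard has been out for two years, and schools are still using TC3.0. It's not that schools must chase the newest and flashiest tools or products, but TC3.0 is as far from the C++ Standard as the moon is from the earth. With that compiler, you can imagine what the teachers teach, and how far they themselves keep up with the outside world. If new tools and products were all expensive, we could understand, given school budgets. But Borland C++ 5.5, GNU C++ 2.96, and TAI C++ can all be downloaded free or used on trial. They implement the C++ Standard far, far better than TC3.0.
This touches the most critical element of school education: the teachers. Frankly, many university teachers studied very well but have no hands-on experience, let alone industry experience. Once complacency sets in and the same old teaching materials come out year after year, students can only fend for themselves.
Of course there is a way to save yourself: be more diligent. Read diligently and ask diligently. Search diligently for good mentors and good reading. Perhaps heaven will reward your diligence with a mentor who passes on knowledge and clears up your doubts, or with news of a must-read classic that you then manage to find.
Speaking of asking diligently, let me close with one last lament. Taiwanese university students are generally weak, even childish, at expressing themselves. Those who can state their ideas clearly and in good order are very rare. What this reflects is timidity: even when they are in the right, they lack confidence. In private their voices boom like bells; ask them to stand up and give an opinion in public, and they buzz like mosquitoes. Whether spoken or written, their wording is generally crude. On the mainland, from my impressions and the readers' letters I receive, things seem much better. The old explanation in Taiwan was that political struggle on the mainland was so fierce that everyone needed a sharp tongue to protect himself. But the Cultural Revolution ended decades ago, and I still find people's powers of expression generally good. Could it be that this was especially emphasized during their schooling?
The ability to ask questions greatly affects learning. A good questioner makes others carry on his voice; a good teacher makes others carry on his aims. I often think of myself as a good teacher, but when a class also has a good questioner, the lesson reaches one high point after another and the whole class benefits.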
Wednesday, July 28, 2010
FTP ipfw firewall FreeBSD
I had a minor question/concern. I was wondering why the firewall rulesets have permissions for everything and help for running almost anything, including how to open ports and which ports to open, yet there is no example ruleset or any help for using FTP while using a firewall such as IPFW. There is no help in the Handbook, period, on how to use FTP while using IPFW.
The default IPFW ruleset lets you make outgoing TCP connections on any port, including the FTP control connection to port 21. Active-mode FTP also needs inbound connections from port 20, which the server uses to set up the data channel:
${fwcmd} add pass tcp from any to any 20,21 out
${fwcmd} add pass tcp from any 20 to any 1024-65535 setup
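In active-mode FTP the client announces its data address with a PORT command. The server then opens the data connection from its port 20 to that address, which is why the second rule admits connections from port 20 to ports 1024-65535. Below is a minimal Python sketch of how the PORT argument (h1,h2,h3,h4,p1,p2, as defined in RFC 959) maps to that data port. The function names are hypothetical and the example address is made up:

```python
from typing import Tuple


def parse_port_argument(arg: str) -> Tuple[str, int]:
    """Decode 'h1,h2,h3,h4,p1,p2' from an FTP PORT command (RFC 959)."""
    parts = [int(p) for p in arg.split(",")]
    if len(parts) != 6 or not all(0 <= p <= 255 for p in parts):
        raise ValueError(f"malformed PORT argument: {arg!r}")
    host = ".".join(str(p) for p in parts[:4])
    port = parts[4] * 256 + parts[5]   # p1 is the high byte, p2 the low byte
    return host, port


def allowed_by_data_rule(port: int) -> bool:
    """Mirror of the rule 'from any 20 to any 1024-65535 setup'."""
    return 1024 <= port <= 65535


if __name__ == "__main__":
    host, port = parse_port_argument("192,168,0,10,195,80")
    print(host, port, allowed_by_data_rule(port))  # 192.168.0.10 50000 True
```

Passive-mode FTP reverses the direction: the client opens the data connection itself, so the outbound rules already cover it.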
If you are running an FTP server that you want to be able to access from the outside, you'll also need:
${fwcmd} add pass log tcp from any to any 21 in via ${oif} setup
Reference: http://unix.derkeiler.com/Mailing-Lists/FreeBSD/questions/2006-01/msg00131.html
PHP parse XML
<?php
// Load note.XML and list the direct child elements of the root element.
$xmlDoc = new DOMDocument('1.0', 'UTF-8');
$xmlDoc->load('note.XML');
$x = $xmlDoc->documentElement;

foreach ($x->childNodes as $item) {
    // childNodes also contains whitespace text nodes; only look at elements.
    if ($item->nodeType !== XML_ELEMENT_NODE) {
        continue;
    }
    echo $item->nodeName . ' = ' . $item->nodeValue . PHP_EOL;
    // Attributes such as newsid or newstitle can be read with getAttribute():
    // echo $item->getAttribute('newsid') . PHP_EOL;
}
unset($x, $xmlDoc);
?>
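<?php
// Print the Headline and BODY text found inside each child element of the root.
$xmlDoc = new DOMDocument('1.0', 'UTF-8');
$xmlDoc->load('note.XML');
$x = $xmlDoc->documentElement;

foreach ($x->childNodes as $item) {
    if ($item->nodeType !== XML_ELEMENT_NODE) {
        continue;  // skip whitespace text nodes
    }
    // Look up the Headline and BODY elements inside the current child.
    $Headline = $item->getElementsByTagName('Headline')->item(0);
    $BODY = $item->getElementsByTagName('BODY')->item(0);
    if ($Headline === null || $BODY === null) {
        continue;
    }
    echo $Headline->nodeValue . ' :: ' . $BODY->nodeValue . PHP_EOL;
}
unset($x, $xmlDoc);
?>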
Monday, July 26, 2010
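Speeding up computation with bitwise operations
5/11/2007
(AS3) Speeding up computation with bitwise operations
MSN Space, Google Doc, Google Blog, 啪啦資訊科技
Chui-Wen Chiu (Arick)
Created 2007.05.11
Bitwise operations are common in C. They are very efficient, but they are harder to read and can only be used under certain restrictions. So if a program has a performance bottleneck, bitwise operations can speed it up. [1] gives AS3 examples of bitwise operations along with the measured speed-ups; its content is summarized below: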
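Bitwise speed-up techniques
1. Multiplying by a power of two: use a left shift (about 300% faster)
x = x * 2;
x = x * 64;
// becomes:
x = x << 1; // 2 = 2^1
x = x << 6; // 64 = 2^6
2. Dividing by a power of two: use a right shift (about 350% faster). For negative odd values the results differ: the shift rounds down (-3 >> 1 is -2), while storing x / 2 in an int truncates toward zero (-1).
x = x / 2;
x = x / 64;
// becomes:
x = x >> 1; // 2 = 2^1
x = x >> 6; // 64 = 2^6
3. Converting a number to an integer (about 10% faster); like int(), the shift truncates toward zero
x = int(1.232);
// becomes:
x = 1.232 >> 0;
4. Swapping two values with XOR (about 20% faster)
var t:int = a;
a = b;
b = t;
// equals:
a ^= b;
b ^= a;
a ^= b;
5. Negating a value (about 300% faster)
i = -i;
// becomes:
i = ~i + 1; // NOT version
// or:
i = (i ^ -1) + 1; // XOR version
6. Remainder when the divisor is a power of two: use AND (about 600% faster); valid only for non-negative x
x = 131 % 4;
// equals:
x = 131 & (4 - 1);
7. Checking whether an integer is even (a multiple of 2) with AND (about 600% faster)
isEven = (i % 2) == 0;
// equals:
isEven = (i & 1) == 0;
8. Faster Math.abs: version 1 is about 600% faster, and version 2 is another 20% faster than version 1
// version 1
i = x < 0 ? -x : x;
// version 2
i = (x ^ (x >> 31)) - (x >> 31);
9. Checking whether two values have the same sign, i.e. whether their product is positive (about 35% faster)
eqSign = a * b > 0;
// equals (parenthesize, because > binds more tightly than ^; unlike a * b > 0, zero counts as positive here):
eqSign = (a ^ b) >= 0;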
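The speed-ups above come from [1] and were measured in AS3; they are not re-measured here. What can be checked cheaply is that each rewrite gives the same answer as the arithmetic it replaces. Below is a minimal Python sketch that assumes 32-bit signed int inputs, as in AS3. trunc_div, trunc_rem, fast_abs, same_sign and check are hypothetical helper names. Item 3 is left out because Python's >> does not accept floats. The sketch also shows the limits: the shift and AND forms match AS3 division and remainder only for non-negative values.

```python
import random

INT_MIN, INT_MAX = -2**31, 2**31 - 1   # range of a 32-bit AS3 int


def trunc_div(x: int, d: int) -> int:
    """Integer division rounding toward zero, like int(x / d) in AS3."""
    q = abs(x) // abs(d)
    return q if (x >= 0) == (d > 0) else -q


def trunc_rem(x: int, d: int) -> int:
    """Remainder carrying the sign of x, like x % d in AS3 or C."""
    return x - d * trunc_div(x, d)


def fast_abs(x: int) -> int:
    mask = x >> 31                  # 0 when x >= 0, -1 (all one bits) when x < 0
    return (x ^ mask) - mask


def same_sign(a: int, b: int) -> bool:
    return (a ^ b) >= 0             # sign bits equal; zero counts as positive


def check(x: int, y: int) -> None:
    """Assert that each trick agrees with the arithmetic it replaces."""
    assert x << 6 == x * 64                          # 1. multiply by 2**6
    if x >= 0:                                       # 2. and 6. need x >= 0
        assert x >> 6 == trunc_div(x, 64)
        assert x & (4 - 1) == trunc_rem(x, 4)
    assert ~x + 1 == -x and (x ^ -1) + 1 == -x       # 5. negation
    assert ((x & 1) == 0) == (trunc_rem(x, 2) == 0)  # 7. even test
    if x != INT_MIN:                                 # a 32-bit int cannot hold abs(INT_MIN)
        assert fast_abs(x) == abs(x)                 # 8. abs
    if x != 0 and y != 0:
        assert same_sign(x, y) == (x * y > 0)        # 9. sign comparison
    a, b = x, y
    a ^= b
    b ^= a
    a ^= b                                           # 4. XOR swap
    assert (a, b) == (y, x)


if __name__ == "__main__":
    rng = random.Random(2007)
    for _ in range(10_000):
        check(rng.randint(INT_MIN, INT_MAX), rng.randint(INT_MIN, INT_MAX))
    print(-3 >> 1, trunc_div(-3, 2))   # -2 -1: the shift rounds down, not toward zero
    print(trunc_rem(-7, 4), -7 & 3)    # -3 1: AND differs from AS3's % for negative x
```

The random loop only samples the 32-bit range. Exhaustive proof is not the point; the two printed lines show why items 2 and 6 carry the non-negative restriction.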
Other bitwise techniques
1. Splitting an RGB color
var color24:uint = 0xff00cc; // a 24-bit 0xRRGGBB color
var r:uint = color24 >> 16;
var g:uint = color24 >> 8 & 0xFF;
var b:uint = color24 & 0xFF;
2. Merging RGB components
var r:uint = 0xff;
var g:uint = 0x00;
var b:uint = 0xcc;
var color24:uint = r << 16 | g << 8 | b;
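The two RGB snippets are inverses of each other. Below is a minimal Python sketch of the same shifts and masks; split_rgb and merge_rgb are hypothetical names. It also masks the red channel, so that a 32-bit 0xAARRGGBB value would not leak its alpha byte into r:

```python
from typing import Tuple


def split_rgb(color: int) -> Tuple[int, int, int]:
    """Split 0xRRGGBB into its red, green and blue bytes."""
    r = color >> 16 & 0xFF   # shifts bind tighter than &, as in AS3
    g = color >> 8 & 0xFF
    b = color & 0xFF
    return r, g, b


def merge_rgb(r: int, g: int, b: int) -> int:
    """Pack three 0-255 channel values back into 0xRRGGBB."""
    return r << 16 | g << 8 | b


if __name__ == "__main__":
    r, g, b = split_rgb(0xFF00CC)
    print(hex(r), hex(g), hex(b))    # 0xff 0x0 0xcc
    print(hex(merge_rgb(r, g, b)))   # 0xff00cc
```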
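These numbers are tempting, but it is still best to use these forms only where performance is critical; elsewhere they make the code harder to maintain.
References: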
[1] Bitwise gems - fast integer math
[2] Bitwise Operations in C
無的放矢 (shooting an arrow with no target)
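Usage notes
1.
Meaning: a figure of speech for words or actions that have no aim.
Category: used to describe empty, aimless talk.
Examples
01 總經理開會向來言必有據,從不無的放矢。 (At meetings the general manager always has grounds for what he says; he never talks aimlessly.)
02 他們談了半天都只是無的放矢,什麼問題也沒說明。 (They talked for ages without any aim and clarified nothing.)
03 請各位針對問題發表意見,不要無的放矢,浪費大家的時間。 (Please keep your comments to the issue; don't talk aimlessly and waste everyone's time.)
2.
Meaning: a figure of speech for wildly accusing or attacking others without any factual basis.
Category: used to describe groundless accusations.
Examples
01 這些指控完全是無的放矢,毫無根據。 (These accusations are completely off target and utterly groundless.)
02 他決定訴諸法律來抗議這種無的放矢的指責。 (He decided to go to court to protest these groundless accusations.)
03 這些話都是無的放矢的謠言,你又何必在意呢? (These are all baseless rumors, so why should you care?)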
Sunday, July 25, 2010
Installing SSH server in Ubuntu
By default, your system has no SSH service enabled, so you cannot connect to it remotely using the SSH protocol (TCP port 22). That makes installing an SSH server one of the first post-install steps on your system.
The most common SSH implementation is OpenSSH server, and that's exactly what you want to install.
Log in with your standard username and password and install the openssh-server package. Use the account you created when installing Ubuntu, as it will be the only account with sudo privileges to run commands as root. Either run the install command through sudo:
ubuntu$ sudo apt-get install openssh-server
or first become root with sudo su and then run apt-get install openssh-server without the sudo prefix:
ubuntu$ sudo su
Verifying your SSH server is installed:
ubuntu$ dpkg --get-selections | grep -i ssh
openssh-server install
Verifying your SSH server works
While you're still on your local desktop session, you can use the ps command to confirm that SSH daemon (sshd) is running:
ubuntu$ ps -aef | grep sshd
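Besides ps, you can check that something actually accepts connections on TCP port 22. Below is a minimal Python sketch using only the standard socket module. port_is_open is a hypothetical helper name, and a successful connection only proves that some service is listening on the port, not that it is sshd:

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 22, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:          # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    print("something is listening on port 22:", port_is_open())
```

Running it from another machine with that machine's view of the server's address also tests any firewall in between, which ps cannot do.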
forgot root password - Changing your root password from bootable media
Step 1: Boot into another Linux system
This can be any other Linux system on your computer. It can be another installed distribution, or a live CD such as RIP (Recovery is Possible) or Knoppix.
Step 2: Open a terminal
If you have booted into a system such as Knoppix or the Ubuntu Live CD, you will need to open a terminal first. Alternatively, you can switch to a virtual terminal by pressing CTRL+ALT+F2. You need root access on this rescue system to get at your installed system.
Step 3: Mount your root filesystem to be rescued
This will be the filesystem that contains your /bin, /etc and /sbin directories, typically /dev/sda1 or /dev/hda1.
The following command lists the partitions with their sizes, which may give you a clue as to which one is your root partition:
#cat /proc/partitions
To mount your root partition (replace /dev/hda1 with your own root partition), type:
#mount /dev/hda1 /mnt
To switch into your installed system, type:
#chroot /mnt
You will now have full access to your old system. To change your root password, type:
# passwd
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
To leave the chroot, type exit. You can now reboot (type reboot and press Enter) and log in as root with the new password.
Friday, July 23, 2010
Monitoring MSN chat content on a Linux NAT gateway
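Every packet must pass through the gateway on its way to the Internet, so eavesdropping there shows most completely what users do online.
Many chat sniffers already exist, such as MSN Sniffer, ICQ Sniffer, and AIM Sniffer, and a search engine will turn up plenty if you need one. Below are my notes on using msniff on Linux to monitor MSN messages:
Installing msniff
Required package: libpcap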
wget http://shh.thathost.com/pub-unix/files/msniff-0.1.3.tar.gz
tar -zxf msniff-0.1.3.tar.gz
cd msniff-0.1.3
make
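Note: if libpcap is definitely installed but make still fails with "pcap.h: No such file or directory", edit INCDIR in the Makefile so that it points to the directory containing pcap.h.
Preparation
msniff cannot yet parse MSN traffic carried over port 80 (it uses a different format), so use iptables to stop users from sending messages over TCP port 80: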
iptables -A FORWARD -p tcp -d 207.46.0.0/16 --dport 80 -j DROP
iptables -A FORWARD -p tcp -s 207.46.0.0/16 --sport 80 -j DROP
If the Linux NAT box also acts as a transparent proxy, change the REDIRECT rule as follows:
iptables -t nat -A PREROUTING -p tcp -s 192.168.0.0/24 -d ! 207.46.0.0/16 --dport 80 -j REDIRECT --to-port 3128
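Note: 207.46.0.0/16 covers the baym[n]-gw[1-n].msgr.hotmail.com servers. Newer iptables releases expect the negation before the option: ! -d 207.46.0.0/16 rather than -d ! 207.46.0.0/16.
To block MSN HTTP traffic completely with Squid:
Edit squid.conf: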
acl msn_domain dstdomain .msgr.hotmail.com .messenger.hotmail.com webmessenger.msn.com
acl msn_mime req_mime_type -i ^application/x-msn-messenger$
http_access deny msn_domain
http_access deny msn_mime
Source: this article on a security forum (資安論壇)
-- Addendum, 2005/12/27:
Using iptables + L7-filter to force MSN Messenger onto port 1863
iptables -t mangle -A PREROUTING -p tcp --sport 1863 -m layer7 --l7proto msnmessenger -j ACCEPT
iptables -t mangle -A PREROUTING -p tcp --dport 1863 -m layer7 --l7proto msnmessenger -j ACCEPT
iptables -t mangle -A PREROUTING -m layer7 --l7proto msnmessenger -j DROP
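Monitoring MSN messages
./msniff eth0 > msn.log &   (runs in the background; by default it listens on MSN's standard port, TCP 1863)
tail -f msn.log   (continuously shows all chat content; Chinese text is in UTF-8)
To convert the log to Big5, use iconv: iconv -f utf-8 -t big5 msn.log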
Gateway 是上網時網路封包必經的關卡, 從中截聽封包內容最能掌握使用者上網的所有動作.
網路上已有許多如: MSN Sniffer、ICQ Sniffer、AIM Sniffer 等網路聊天監聽軟體, 需要的話可以在搜尋引擎找到一堆. 以下是在 Linux 平台上使用 msniff 監看 MSN 傳訊內容的操作備忘:
安裝 msniff
需求套件: libpcap
wget http://shh.thathost.com/pub-unix/files/msniff-0.1.3.tar.gz
tar -zxf msniff-0.1.3.tar.gz
cd msniff-0.1.3
make
ps. 若確定已安裝 libpcap, 但 make 時仍發生 "pcap.h: No such file or directory" 的錯誤訊息, 修正 Makefile 裡的 INCDIR, 將它指向 pcap.h 的正確路徑即可解決
使用前的前置動作
由於 msnnif 尚無法解析 msn 透過 port 80 傳輸的內容 (因格式不同), 因此在 iptables 中禁止使用者透過 tcp port 80 傳訊:
iptables -A FORWARD -p tcp -d 207.46.0.0/16 --dport 80 -j DROP
iptables -A FORWARD -p tcp -s 207.46.0.0/16 --sport 80 -j DROP
If the Linux NAT box also acts as a transparent proxy, change the REDIRECT rule as follows (older iptables releases wrote the negation as -d ! 207.46.0.0/16):
iptables -t nat -A PREROUTING -p tcp -s 192.168.0.0/24 ! -d 207.46.0.0/16 --dport 80 -j REDIRECT --to-port 3128
ps. 207.46.0.0/16 covers baym[n]-gw[1-n].msgr.hotmail.com
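To block MSN's HTTP traffic completely with Squid:
Edit squid.conf (Squid applies the first http_access rule that matches, so put these deny lines before any http_access allow rule that would match the same requests):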
acl msn_domain dstdomain .msgr.hotmail.com .messenger.hotmail.com webmessenger.msn.com
acl msn_mime req_mime_type -i ^application/x-msn-messenger$
http_access deny msn_domain
http_access deny msn_mime
Source: this article on the Information Security Forum (資安論壇)
-- Update 2005/12/27:
Using iptables + L7-filter to force MSN Messenger onto port 1863
iptables -t mangle -A PREROUTING -p tcp --sport 1863 -m layer7 --l7proto msnmessenger -j ACCEPT
iptables -t mangle -A PREROUTING -p tcp --dport 1863 -m layer7 --l7proto msnmessenger -j ACCEPT
iptables -t mangle -A PREROUTING -m layer7 --l7proto msnmessenger -j DROP
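Monitoring MSN messages
./msniff eth0 > msn.log & (runs in the background; by default it listens on the standard MSN port, TCP 1863)
tail -f msn.log (continuously shows all chat content; Chinese text is UTF-8 encoded)
To convert it to Big5, use iconv: iconv -f utf-8 -t big5 msn.log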
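Network sniffing vs. anti-sniffing
- ettercap - a cross-switch password sniffer, a thousand times easier to use than dsniff
- Using PromiScan to find machines on the LAN whose network cards are in promiscuous mode
- Simp Server for Linux - messages are encrypted as long as one side of the conversation connects through Simp Server
Sniffing
Ethereal
dsniff network monitoring toolkit
Monitoring MSN chat content on a Linux NAT gateway
- ettercap - a cross-switch password sniffer, a thousand times easier to use than dsniff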
- sudo /usr/local/bin/ettercap -G -i eth0 -n 255.255.255.0
- "Unified sniffing..." → Network interface: eth0 → OK
- "Hosts" → "Scan for hosts"
- "Mitm" → "Arp poisoning..." → check "Sniff remote connections." → OK
- "Start" → "Start sniffing"
ps. If you started "Arp poisoning" in ettercap, remember to choose "Mitm" → "Stop mitm attack(s)" before quitting;
closing ettercap directly can knock out the whole LAN for quite a while :)
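Anti-sniffing
Check the ARP table with the arp command:
Linux: arp
Windows: arp -a
If two or more entries in the list share the same MAC address, you are usually being ARP-spoofed and your packets are being intercepted.
Countermeasure: add a static MAC address entry for the gateway to the ARP table.
Command: arp -s real.gateway.IP real:gateway:MAC:ADDRESS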
Using PromiScan to find machines on the LAN whose network cards are in promiscuous mode
Platforms: Windows 2000 / XP
Required package: WinPcap 3.0 or later
Thursday, July 22, 2010
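The Seven Injuries Fist (七傷拳)
"The qi of the Five Elements tunes yin and yang; it harms the heart, wounds the lungs, and wrecks the liver and bowels; the organs scatter, essence is lost, the mind drifts in a daze; the Triple Burner runs backward, and the soul flies away!"
This is the master formula of the Seven Injuries Fist, the signature art of the Kongtong Sect in The Heaven Sword and Dragon Saber (倚天屠龍記).
It is also the technique with which Xie Xun, the famed Golden-Haired Lion King, killed the Shaolin abbot.
Xie Xun described its power this way:
"Each of my punches carries seven different forces: hard, soft, hard within soft, soft within hard, sweeping sideways, driving straight in, or drawing inward. If the enemy withstands the first force, he cannot withstand the second; and if he withstands the second, how will he deal with the third? Heh heh, that is where the name 'Seven Injuries Fist' comes from."
As for what it does to its victims, the Golden-Haired Lion King once struck a great tree on Ice-Fire Island. The outside of the tree showed no damage at all, yet the grain and veins inside were shattered throughout. You could not see this without cutting the tree down, and the tree would slowly turn yellow, wither and die.
For all its power, the Seven Injuries Fist is not without its flaws.
Xie Xun said: "Everyone has the two qi of yin and yang inside, and the Five Elements of metal, wood, water, fire and earth. The heart belongs to fire, the lungs to metal, the kidneys to water, the spleen to earth, the liver to wood. Train the Seven Injuries and all seven are injured. Each time you practice this fist, your own organs take damage; the 'seven injuries' really mean injuring yourself first and the enemy second. Had I not injured my heart meridians while training it, I would not fall into fits of madness I cannot control."
That is the Seven Injuries Fist: it injures yourself first, then the enemy; for seven parts of harm to the enemy, you take three parts yourself.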
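The novel describes it as a martial art, but the Seven Injuries Fist exists in real life too.
As the saying goes, lips are spears and tongues are swords: words are the sword qi and palm wind we send out.
Sarcastic words, words that cut to the bone, cold and heartless words, even silence, are all killing moves of great power.
And if the one on the receiving end is someone close to you, someone important, someone you love,
then when that palm lands, the other person is far from the only one who hurts.
I suppose I must have been training in the Seven Injuries Fist for a long time…
Not only that, I am one of its top practitioners.
When a landmine goes off and the seven injuries have run wild, everything around lies in ruins, too terrible to look at.
And I too am badly hurt, collapsed on the ground, coughing up blood without end.
Even long after it is all over,
lying awake late at night,
I still shed tears from my wounded heart meridians.
Funny as it sounds, I only use my Seven Injuries Fist on those closest and dearest to me.
The moment I strike, I regret it; even if I drive the enemy back, my own heart aches.
Yet again and again I throw that punch, again and again hurting others and myself.
Facing my own foolishness, I can't help wondering whether my heart meridians are already severed and my mind scattered.
If you find yourself in a similar state,
I sincerely urge you to stop training in it.
This art is too powerful and too damaging for ordinary people to practice.
Beware, and take care.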
Reference: http://bmy335.pixnet.net/blog/post/9375990
Wednesday, July 21, 2010
x86 Disassembly/Functions and Stack Frames
From Wikibooks, the open-content textbooks collection
Contents
1 Functions and Stack Frames
2 Standard Entry Sequence
3 Standard Exit Sequence
4 Non-Standard Stack Frames
4.1 Using Uninitialized Registers
4.2 "static" Functions
4.3 Hot Patch Prologue
5 Local Static Variables
Functions and Stack Frames
To allow for many unknowns in the execution environment, functions are frequently set up with a "stack frame" to allow access to both function parameters, and automatic function variables. The idea behind a stack frame is that each subroutine can act independently of its location on the stack, and each subroutine can act as if it is the top of the stack.
When a function is called, a new stack frame is created at the current esp location. A stack frame acts like a partition on the stack. All items from previous functions are higher up on the stack, and should not be modified. Each current function has access to the remainder of the stack, from the stack frame until the end of the stack page. The current function always has access to the "top" of the stack, and so functions do not need to take account of the memory usage of other functions or programs.
Standard Entry Sequence
This code example uses MASM syntax.
For many compilers, the standard function entry sequence is the following piece of code (X is the total size, in bytes, of all automatic variables used in the function):
push ebp
mov ebp, esp
sub esp, X
For example, here is a C function code fragment and the resulting assembly instructions:
void MyFunction()
{
int a, b, c;
...
push ebp ; save the value of ebp
mov ebp, esp ; ebp now points to the top of the stack
sub esp, 12 ; space allocated on the stack for the local variables
This means local variables can be accessed by referencing ebp. Consider the following C code fragment and corresponding assembly code:
a = 10;
b = 5;
c = 2;
mov dword ptr [ebp - 4], 10 ; location of variable a
mov dword ptr [ebp - 8], 5 ; location of b
mov dword ptr [ebp - 12], 2 ; location of c
This all seems well and good, but what is the purpose of ebp in this setup? Why save the old value of ebp and then point ebp to the top of the stack, only to change the value of esp with the next instruction? The answer is function parameters.
Consider the following C function declaration:
void MyFunction2(int x, int y, int z)
{
...
}
It produces the following assembly code:
push ebp
mov ebp, esp
sub esp, 0 ; no local variables, most compilers will omit this line
This is exactly what one would expect. So what exactly does ebp do, and where are the function parameters stored? The answer is found when we call the function.
Consider the following C function call:
MyFunction2(10, 5, 2);
This will create the following assembly code (using a Right-to-Left calling convention called CDECL, explained later):
push 2
push 5
push 10
call _MyFunction2
Note: Remember that the x86 call instruction is basically equivalent to
push return_address ; address of the instruction right after the call
jmp _MyFunction2
It turns out that the function arguments are all passed on the stack! Therefore, when we move the current value of the stack pointer (esp) into ebp, ebp ends up at a fixed point right next to the function arguments. As the function body pushes and pops values, esp moves but ebp stays where it is. Remember that pushing basically does this:
sub esp, 4 ; "allocate" space for the new stack item
mov dword ptr [esp], X ; put new stack item value X in
This means that first the return address and then the old value of ebp are put on the stack. Therefore [ebp] points to the location of the old value of ebp, [ebp + 4] points to the return address, and [ebp + 8] points to the first function argument. Here is a (crude) representation of the stack at this point:
: :
| 5 | [ebp + 12] (2nd function argument)
| 10 | [ebp + 8] (1st function argument)
| RA | [ebp + 4] (return address)
| FP | [ebp] (old ebp value)
| | [ebp - 4] (1st local variable)
: :
The stack pointer value may change during the execution of the current function. In particular this happens when:
parameters are passed to another function;
the pseudo-function "alloca()" is used.
This means that the offset of a given local variable from esp is not constant over the life of the function: it changes whenever something is pushed or popped (for example while the arguments for another call are being pushed, even though esp is restored once that call returns). A compiler can track these changes and still address locals through esp; this is the frame pointer omission (FPO) optimization discussed at http://blogs.msdn.com/larryosterman/archive/2007/03/12/fpo.aspx. Many compilers, however, access local variables using negative offsets from the ebp register, which does not move while the function runs, so the same offset is always used to access the same variable (or parameter). For this reason, the ebp register is called the frame pointer, or FP.
Standard Exit Sequence
The Standard Exit Sequence must undo the things that the Standard Entry Sequence does. To this effect, the Standard Exit Sequence must perform the following tasks, in the following order:
Remove space for local variables, by reverting esp to its old value.
Restore ebp to its old value, which is now on top of the stack.
Return to the calling function with a ret instruction.
As an example, the following C code:
void MyFunction3(int x, int y, int z)
{
int a, b, c;
...
return;
}
produces the following assembly code:
push ebp
mov ebp, esp
sub esp, 12 ; sizeof(a) + sizeof(b) + sizeof(c)
;x = [ebp + 8], y = [ebp + 12], z = [ebp + 16]
;a = [ebp - 4] = [esp + 8], b = [ebp - 8] = [esp + 4], c = [ebp - 12] = [esp]
mov esp, ebp
pop ebp
ret 12 ; pop the return address, then discard sizeof(x) + sizeof(y) + sizeof(z) bytes of arguments (callee cleanup; a CDECL function ends with a plain ret and the caller adds 12 to esp)
Non-Standard Stack Frames
Frequently, reversers will come across a subroutine that doesn't set up a standard stack frame. Here are some things to consider when looking at a subroutine that does not start with a standard sequence:
Using Uninitialized Registers
When a subroutine starts using data in a register it has not initialized, that means it expects its caller to have put data into that register before the call. Some calling conventions pass arguments in registers, and sometimes a compiler will not use a standard calling convention at all.
"static" Functions
In C, functions may optionally be declared with the static keyword, as such:
static void MyFunction4();
The static keyword gives a function internal linkage, meaning it cannot be referenced by name from other source files (it is strictly internal to the given code file). When an optimizing compiler sees a static function that is only ever called directly (its address is never taken, so no function pointer can reach it), it "knows" that no outside code can call it: the compiler controls every call site. It is therefore free to skip the standard stack frame and calling convention for that function.
Hot Patch Prologue
Some Windows functions set up a regular stack frame as explained above, but start with the apparently nonsensical instruction
mov edi, edi
This instruction is assembled into 2 bytes which serve as a placeholder for future function patches. Taken as a whole such a function might look like this:
nop ; each nop is 1 byte long
nop
nop
nop
nop
FUNCTION: ; <-- This is the function entry point as used by call instructions
mov edi, edi ; mov edi,edi is 2 bytes long
push ebp ; regular stack frame setup
mov ebp, esp
If such a function needs to be replaced without reloading the application (or restarting the machine, in the case of kernel patches), this can be achieved by inserting a jump to the replacement function. A short jump instruction (which can jump -128 to +127 bytes) takes 2 bytes, exactly the space the "mov edi, edi" placeholder provides. A near jump to an arbitrary address, in this case our replacement function, takes 5 bytes; these are provided by the five 1-byte nops just before the function. When a function patched this way is called, it first jumps back 5 bytes to the start of the padding and then takes the long jump to the replacement function. After the patch, the memory might look like this:
LABEL:
jmp REPLACEMENT_FUNCTION ; <-- 5 NOPs replaced by jmp
FUNCTION:
jmp short LABEL ; <-- mov edi, edi has been replaced by a short jump backwards
push ebp
mov ebp, esp ; <-- regular stack frame setup as before
The reason for using a 2-byte mov instruction at the entry point, instead of putting nops there directly, is to prevent corruption during patching: if some thread's instruction pointer happened to be sitting on one of several individual nops while they were overwritten, that thread would resume in the middle of the new jump instruction. Using a single instruction as the placeholder means it can be swapped for the short jump in one 2-byte write, so the patch is applied as an atomic operation.
Local Static Variables
Local static variables cannot be created on the stack, since the value of the variable is preserved across function calls. We'll discuss local static variables and other types of variables in a later chapter.
Retrieved from "http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames"
We Read Over The Code And Learned How It Worked
"Look at anyone who is extremely nimble with the kernel, and ask them what they worked on to get going with development. Did Andrew Morton fixup whitespace errors when he was starting to become familiar with the tree? Did I? No, none of us did this stuff. We read over the code and learned how it worked, did a port, optimized a lookup algorithm somewhere. Consistently we see people turding with whitespace, and not breaking out of that cycle. That is a problem."
http://kerneltrap.org/Quote/We_Read_Over_The_Code_And_Learned_How_It_Worked
understand stack - Intel x86 Function-call Conventions - Assembly View
The __stdcall convention is mainly used by the Windows API, and
it's a bit more compact than __cdecl. The main difference is that
any given function has a hard-coded set of parameters, and this cannot
vary from call to call like it can in C (no "variadic functions").
Because the size of the parameter block is fixed, the burden of cleaning
these parameters off the stack can be shifted to the called function,
instead of being done by the calling function as in __cdecl. There
are several effects of this:
The x86 architecture provides a number of built-in mechanisms for
assisting with frame management, but they don't seem to be commonly used
by C compilers. Of particular interest is the ENTER instruction,
which handles most of the function-prolog code (ENTER X, 0 does the
work of push ebp / mov ebp, esp / sub esp, X).
We're pretty sure these are functionally equivalent, but our 80386
processor reference suggests that the ENTER version is more
compact (6 bytes -vs- 9) but slower (15 clocks -vs- 6). The newer
processors are probably harder to pin down, but somebody has probably
figured out that ENTER is slower. Sigh.
Reference:
http://unixwiz.net/techtips/win32-callconv-asm.html
http://unixwiz.net/techtips/win32-callconv.html
One of the "big picture" issues in looking at compiled C code is the
function-calling conventions. These are the methods that a calling
function and a called function agree on how parameters and return values
should be passed between them, and how the stack is used by the function
itself. The layout of the stack constitutes the "stack frame", and
knowing how this works can go a long way to decoding how something works.
In C and modern CPU design conventions, the stack frame is a chunk of
memory, allocated from the stack, at run-time, each time a function is
called, to store its automatic variables. Hence nested or recursive calls
to the same function each obtain their own separate frames.
Physically, a function's stack frame is the area between the addresses
contained in esp, the stack pointer, and ebp, the frame pointer (base
pointer in Intel terminology). Thus, if a function pushes more values
onto the stack, it is effectively growing its frame.
This is a very low-level view: the picture as seen from the C/C++
programmer is illustrated elsewhere:
• Unixwiz.net Tech Tip:
Intel x86 Function-call Conventions - C Programmer's View
For the sake of discussion, we're using the terms that the Microsoft Visual C
compiler uses to describe these conventions, even though other platforms
may use other terms.
__cdecl (pronounced see-DECK-'ll, rhymes with "heckle") - This convention is the most common because it supports semantics required by the C language. The C language supports variadic functions (variable argument lists, à la printf), and this means that the caller must clean up the stack after the function call: the called function has no way to know how to do this. It's not terribly optimal, but the C language semantics demand it.
__stdcall - This requires that each function take a fixed number of parameters, which means that the called function can do argument cleanup in one place rather than having it scattered throughout the program at every place that calls it. (It resembles the older __pascal convention, where the callee also cleans up, although __pascal pushes arguments left to right.) The Win32 API primarily uses __stdcall.
It's important to note that these are merely conventions, and any
collection of cooperating code can agree on nearly anything. There are
other conventions (passing parameters in registers, for instance)
that behave differently, and of course the optimizer can make mincemeat
of any clear picture as well.
Our focus here is to provide an overview, and not an authoritative
definition for these conventions.
Register use in the stack frame
In both __cdecl and __stdcall conventions,
the same set of three registers is involved in the function-call
frame:
%ESP - Stack Pointer - This 32-bit register is implicitly manipulated by several CPU instructions (PUSH, POP, CALL, and RET among others). It always points to the last element used on the stack (not the first free element), which means that the PUSH and POP operations would be specified in pseudo-C as:
*--ESP = value;   // push
value = *ESP++;   // pop
The "top of the stack" is an occupied location, not a free one, and it is at the lowest memory address of the stack in use.
%EBP - Base Pointer - This 32-bit register is used to reference all the function parameters and local variables in the current stack frame. Unlike the %esp register, the base pointer is manipulated only explicitly. This is sometimes called the "Frame Pointer".
%EIP - Instruction Pointer - This holds the address of the next CPU instruction to be executed, and it's saved onto the stack as part of the CALL instruction. As well, any of the "jump" instructions modify the %EIP directly.
Assembler notation
Virtually everybody in the Intel assembler world uses the Intel
notation, but the GNU C compiler uses what they call the "AT&T syntax"
for backwards compatibility. This seems to us to be a really dumb idea,
but it's a fact of life.
There are minor notational differences between the two notations, but
by far the most annoying is that the AT&T syntax reverses the
source and destination operands. To move the immediate value 4 into
the EAX register:
mov $4, %eax    # AT&T notation
mov eax, 4      ; Intel notation
More recent GNU tools can handle the Intel syntax as well (gcc -masm=intel
emits it, and the GNU assembler accepts it after a .intel_syntax directive).
In any case, we'll use the Intel notation exclusively.
There are other minor differences that are not of much concern to
the reverse engineer.
Calling a __cdecl function
The best way to understand the stack organization is to see each
step in calling a function with the __cdecl conventions. These
steps are taken automatically by the compiler, and not all
of them are used in every case (sometimes there are no parameters, sometimes
no local variables, sometimes no saved registers), but this shows
the overall mechanism employed.
Push parameters onto the stack, from right to left- Parameters are pushed onto the stack, one at a time, from right to left. Whether the parameters are evaluated from right to left is a different matter, and in any case this is unspecified by the language and code should never rely on this. The calling code must keep track of how many bytes of parameters have been pushed onto the stack so it can clean it up later.
Call the function- Here, the processor pushes the contents of %EIP (the instruction pointer) onto the stack; at that point it holds the address of the first byte after the CALL instruction. After this finishes, the caller has lost control, and the callee is in charge. This step does not change the %ebp register.
Save and update the %ebp- Now that we're in the new function, we need a new local stack frame pointed to by %ebp, so this is done by saving the current %ebp (which belongs to the previous function's frame) and making it point to the top of the stack.
push ebp
mov ebp, esp // ebp ← esp
- Once %ebp has been changed, it can now refer directly to the function's arguments as 8(%ebp), 12(%ebp). Note that 0(%ebp) is the old base pointer and 4(%ebp) is the old instruction pointer.
Save CPU registers used for temporaries- If this function will use any CPU registers, it has to save the old values first lest it walk on data used by the calling functions. Each register to be used is pushed onto the stack one at a time, and the compiler must remember what it did so it can unwind it later.
Allocate local variables- The function may choose to use local stack-based variables, and they are allocated here simply by decrementing the stack pointer by the amount of space required. This is always done in four-byte chunks.
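- Now, the local variables are located on the stack between the %ebp and %esp registers, and though it would be possible to refer to them as offsets from either one, by convention the %ebp register is used. This means that -4(%ebp) refers to the first local variable, assuming no registers were saved in the previous step; any saved registers would sit just below the old %ebp and push the locals further down.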
Perform the function's purpose- At this point, the stack frame is set up correctly, and this is represented by the diagram to the right. All the parameters and locals are offsets from the %ebp register:
16(%ebp) - third function parameter
12(%ebp) - second function parameter
8(%ebp) - first function parameter
4(%ebp) - old %EIP (the function's "return address")
0(%ebp) - old %EBP (previous function's base pointer)
-4(%ebp) - first local variable
-8(%ebp) - second local variable
-12(%ebp) - third local variable
- The function is free to use any of the registers that had been saved onto the stack upon entry, but it must leave the stack pointer where it found it (every push balanced by a pop) or all Hell will break loose upon function return.
Release local storage- When the function allocates local, temporary space, it does so by decrementing the stack pointer by the amount of space needed, and this process must be reversed to reclaim that space. It's usually done by adding to the stack pointer the same amount which was subtracted previously, though a series of POP instructions could achieve the same thing.
Restore saved registers- For each register saved onto the stack upon entry, it must be restored from the stack in reverse order. If the "save" and "restore" phases don't match exactly, catastrophic stack corruption will occur.
Restore the old base pointer- The first thing this function did upon entry was save the caller's %ebp base pointer, and by restoring it now (popping the top item from the stack), we effectively discard the entire local stack frame and put the caller's frame back in play.
Return from the function- This is the last step of the called function, and the RET instruction pops the old %EIP from the stack and jumps to that location. This gives control back to the calling function. Only the stack pointer and instruction pointer are modified by a subroutine return.
Clean up pushed parameters- In the __cdecl convention, the caller must clean up the parameters pushed onto the stack, and this is done either by popping the stack into don't-care registers (for a few parameters) or by adding the parameter-block size to the stack pointer directly.
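To see the whole sequence in one place, here is a minimal C simulation of a __cdecl call and return. It is a sketch under stated assumptions: memory is a small array addressed in bytes, the callee is a hypothetical int f(int a, int b, int c) with two 4-byte locals, 0x1234 and 0xDEAD are made-up return-address and caller-%ebp values, the register-saving step is omitted, and the RET pop is modelled in the caller for brevity. The offsets it reads follow the 8(%ebp), 4(%ebp), 0(%ebp) and -4(%ebp) layout listed above.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 256
static uint32_t mem[MEM_WORDS];          /* simulated stack memory           */
static uint32_t esp = MEM_WORDS * 4;     /* byte address; empty stack        */
static uint32_t ebp = 0xDEAD;            /* hypothetical caller's frame ptr  */

static uint32_t load(uint32_t addr)              { return mem[addr / 4]; }
static void     store(uint32_t addr, uint32_t v) { mem[addr / 4] = v; }
static void     push(uint32_t v)  { esp -= 4; store(esp, v); }
static uint32_t pop(void)         { uint32_t v = load(esp); esp += 4; return v; }

/* What the callee sees once its frame is set up. */
static struct { uint32_t old_ebp, ret_addr, arg1, arg3; } seen;

/* Callee: hypothetical int f(int a, int b, int c) with two locals. */
static uint32_t callee(void) {
    push(ebp);                 /* push ebp      ; save caller's frame ptr  */
    ebp = esp;                 /* mov  ebp, esp                            */
    esp -= 8;                  /* sub  esp, 8   ; two 4-byte locals         */

    seen.old_ebp  = load(ebp);       /*  0(%ebp): caller's ebp             */
    seen.ret_addr = load(ebp + 4);   /*  4(%ebp): return address           */
    seen.arg1     = load(ebp + 8);   /*  8(%ebp): first parameter          */
    seen.arg3     = load(ebp + 16);  /* 16(%ebp): third parameter          */

    store(ebp - 4, load(ebp + 8) + load(ebp + 12)); /* first local  = a + b */
    store(ebp - 8, load(ebp + 16));                 /* second local = c     */
    uint32_t eax = load(ebp - 4) + load(ebp - 8);   /* result goes in eax   */

    esp = ebp;                 /* mov  esp, ebp ; release locals            */
    ebp = pop();               /* pop  ebp      ; restore caller's frame    */
    return eax;                /* the RET pop is modelled in call_f         */
}

/* Caller: pushes arguments right to left, "calls", then cleans up. */
static uint32_t call_f(uint32_t a, uint32_t b, uint32_t c) {
    push(c); push(b); push(a);     /* parameters, right to left            */
    push(0x1234);                  /* CALL: push hypothetical return addr  */
    uint32_t eax = callee();
    uint32_t eip = pop();          /* RET: pop return address into eip     */
    (void)eip;
    esp += 12;                     /* add esp, 12 ; __cdecl caller cleanup */
    return eax;
}
```

After call_f(1, 2, 3) returns 6, esp and ebp are back to their starting values, which is exactly the balance the save, release, and restore steps are meant to guarantee.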
__cdecl -vs- __stdcall
The __stdcall convention is mainly used by the Windows API, and
it's a bit more compact than __cdecl. The main difference is that
any given function has a hard-coded set of parameters, and this cannot
vary from call to call like it can in C (no "variadic functions").
Because the size of the parameter block is fixed, the burden of cleaning
these parameters off the stack can be shifted to the called function,
instead of being done by the calling function as in __cdecl. There
are several effects of this:
- the code is a tiny bit smaller, because the parameter-cleanup code is
found once, in the called function itself, rather than in every
place the function is called. These may be only a few bytes per call,
but for commonly-used functions it can add up. This presumably means
that the code may be a tiny bit faster as well.
- calling the function with the wrong number of parameters is
catastrophic - the stack will be badly misaligned, and general
havoc will surely ensue.
As an offshoot of the second point, Microsoft Visual C takes special care of
functions that are __stdcall. Since the number of parameters is
known at compile time, the compiler encodes the parameter byte count
in the symbol name itself, and this means that calling the function
incorrectly leads to a link error.
For instance, the function int foo(int a, int b) would generate,
at the assembler level, the symbol "_foo@8",
where "8" is the number
of bytes expected. This means that not only will a call with 1 or 3
parameters not resolve (due to the size mismatch), but neither
will a call expecting the __cdecl parameters (which looks for _foo).
It's a clever mechanism that avoids a lot of problems.
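The decoration rule can be sketched in a few lines of C. This is an illustration only: it assumes the 32-bit MSVC scheme described above (a leading underscore, then "@" and the byte count for __stdcall, and just the underscore for __cdecl) and assumes each argument is rounded up to a multiple of 4 bytes on the stack. The helper names are hypothetical, and real compilers have further rules (C++ name mangling, __fastcall, 64-bit targets) that this ignores.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Total stack bytes for a parameter list. param_sizes[] holds sizeof()
 * of each parameter; on the 32-bit stack every argument occupies a
 * multiple of 4 bytes, so each size is rounded up before summing. */
static size_t stack_bytes(const size_t *param_sizes, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (param_sizes[i] + 3) & ~(size_t)3;   /* round up to 4 */
    return total;
}

/* __stdcall: leading underscore, then "@" and the parameter byte count. */
static void stdcall_name(char *out, size_t cap, const char *name,
                         const size_t *param_sizes, size_t n) {
    snprintf(out, cap, "_%s@%zu", name, stack_bytes(param_sizes, n));
}

/* __cdecl: just a leading underscore; the byte count is not encoded. */
static void cdecl_name(char *out, size_t cap, const char *name) {
    snprintf(out, cap, "_%s", name);
}
```

For int foo(int a, int b) this yields "_foo@8" under __stdcall and "_foo" under __cdecl, so a caller built against the wrong declaration asks the linker for a symbol that does not exist.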
Variations and Notes
The x86 architecture provides a number of built-in mechanisms for
assisting with frame management, but they don't seem to be commonly used
by C compilers. Of particular interest is the ENTER instruction,
which handles most of the function-prolog code.
ENTER 10,0
-vs-
PUSH ebp
MOV ebp, esp
SUB esp, 10
We're pretty sure these are functionally equivalent, but our 80386
processor reference suggests that the ENTER version is more
compact (6 bytes -vs- 9) but slower (15 clocks -vs- 6). The newer
processors are probably harder to pin down, but somebody has probably
figured out that ENTER is slower. Sigh.
Reference:
http://unixwiz.net/techtips/win32-callconv-asm.html
http://unixwiz.net/techtips/win32-callconv.html
Assembly Language Tutorial
Programming is an art.
There are plenty of compilers around to program with: the famous C, the beautiful Java, the useful HTML, the elusive Oracle and the simple .NET…
So why assembly language?
Those of you who have ever had a go at cracking and the like will have a ready-made answer: no cracking without a crack at assembly language.
If you are not in the cracking business, why do you need it?
No programmer is complete without mastery of assembly language. Your program runs slowly, or there is some deep glitch; you work on it day and night and still the glitch remains a glitch. Here comes assembly language: you analyse the processes and arrive at a diagnosis…
Assembly language makes you powerful; it is the base on which everything is built.
Without assembly language you will always remain a novice, whatever you build or achieve.
So let's start the learning process.
You must have heard that:
• Assembly language is hard to learn and understand
• It's difficult to debug
• It's a messy outfit
• Why do you want to save a little space using assembly language when you have so much space?
Believe me, learning assembly language is easier than most high-level languages.
Once you learn assembly language, everything else comes naturally.
Assembly language has several benefits:
• Speed. Assembly language programs are generally the fastest programs around.
• Space. Assembly language programs are often the smallest.
• Capability. You can do things in assembly which are difficult or impossible in HLLs (High Level Language).
• Knowledge. Your knowledge of assembly language will help you write better programs, even when using HLLs (High Level Language).
===========================================================
LESSON 1 - THE REGISTERS AND SEGMENTS
Unlike other languages, assembly language has no predefined commands like "writeln" or "printf";
it does not provide those tools for you.
So how does it work?
First, there are predefined registers:
/* all of these are the data holders
AX - Accumulator register
BX - Base register
CX - Count register
DX - Data register
*/
/* all of these are the pointing and index storage registers
SP - Stack pointer
BP - Base pointer
SI - Source index
DI - Destination index
IP - Instruction pointer
*/
/* all of these are segments holder
CS - Code segment
DS - Data segment
SS - Stack segment
ES - Extra segment
*/
FLAGS - Holds the status bits (carry, zero, sign, overflow, ...) set by instructions, plus a few control bits
Now, to be more specific:
Data registers:
They are the basic registers for all calculations and for holding positions (addresses).
Each of them is 16 bits wide and can also be used as two 8-bit registers,
high and low:
BX - bh (high), bl (lo)
CX - ch (high), cl (lo)
DX - dh (high), dl (lo)
high is MSB - most significant byte
lo is LSB - least significant byte
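Because AH and AL are simply the two halves of AX, the relationship is plain bit arithmetic. A small C sketch (the function names are made up for illustration) showing that writing one half leaves the other untouched:

```c
#include <assert.h>
#include <stdint.h>

/* AX is one 16-bit register; AH and AL are names for its high and low
 * bytes. Writing AL leaves AH untouched, and vice versa. */
static uint8_t ah_of(uint16_t ax) { return (uint8_t)(ax >> 8); }
static uint8_t al_of(uint16_t ax) { return (uint8_t)(ax & 0xFF); }

/* Replace only the low byte (like "mov al, x"). */
static uint16_t set_al(uint16_t ax, uint8_t al) {
    return (uint16_t)((ax & 0xFF00) | al);
}

/* Replace only the high byte (like "mov ah, x"). */
static uint16_t set_ah(uint16_t ax, uint8_t ah) {
    return (uint16_t)((ax & 0x00FF) | ((uint16_t)ah << 8));
}
```

With AX = 1234h, AH is 12h and AL is 34h; setting AL to FFh gives AX = 12FFh.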
Pointing registers:
Each of these registers has a unique job:
SP - the offset of the top of the stack (-n-)
BP - a pointer into the stack, usually used to reach parameters and locals (-n-)
SI - the source index, used as an offset in memory transfers
DI - the destination index, used as an offset in memory transfers
IP - the offset of the next instruction to execute (-n-)
(-n-) means don't change it unless you know what you're doing
Segment registers:
CS - is the segment of the code (-n-)
DS - is the segment (usually) of the data
SS - is the segment for the stack (-n-)
ES - an extra segment, used for memory transfers
Flags will be discussed later.
In 16-bit (real mode) programs, assembly language works with segments. Each segment can be at most 64K in size.
Now we create the segments.
Each segment needs a definition, and we also need the "assume" directive, which tells the assembler
which segment each segment register refers to. Note that assume does not load the registers themselves; DS, for example, still has to be loaded by instructions at run time.
Here is a typical layout:
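Sseg segment stack ; a semicolon (;) starts a remark, which is not assembled
db 256 dup (?)     ; reserve 256 uninitialized bytes for the stack
Sseg ends          ; each segment starts with "Name segment", and when we
                   ; have finished defining stuff in it, we close it
                   ; with "Name ends" (end segment)
Dseg segment
Dseg ends
Cseg segment
assume cs:Cseg, ds:Dseg, ss:Sseg
start:
mov ax, Dseg       ; assume does not load DS, so load it here
mov ds, ax
mov ax, 4C00h      ; DOS function 4Ch: exit with return code 0
int 21h
Cseg ends
end start          ; end of source; "start" is the entry point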
Now, as we saw, a segment is built as follows:
Name segment
.
.
.
Name ends
Now, in the dseg all the data will be stored, in the sseg the stack,
and in the cseg the code.
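The 64K limit comes from real-mode addressing on the 8086: the segment register's value is shifted left by 4 bits and a 16-bit offset is added, giving a 20-bit physical address. The C sketch below assumes the original 8086 behaviour of wrapping around at 1 MB (later CPUs with the A20 line enabled do not wrap); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit real mode: physical = segment * 16 + offset, kept to 20 bits.
 * The offset is only 16 bits wide, so one segment spans at most 64K. */
static uint32_t physical(uint16_t segment, uint16_t offset) {
    return (((uint32_t)segment << 4) + offset) & 0xFFFFF;  /* 8086 wraps at 1 MB */
}
```

Note that different segment:offset pairs can name the same byte: 1234h:0010h and 1235h:0000h both map to 12350h.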
Reference:
http://assembly.co.nr/
No programmer is complete without mastery over assembly language
No programmer is complete without mastery over assembly language. Your program runs slowly or there is some deep glitch; you work on it day and night and still the glitch remains a glitch. Here comes assembly language: you analyse the processes and arrive at a diagnosis…
Assembly language makes you powerful. It is the base on which everything is built.
Without assembly language you will always remain a novice, whatever you build or achieve.
Reference: http://assembly.co.nr/
What is exactly the base pointer and stack pointer? To what do they point?
From what I see, I'd say the stack pointer always points to the top of the stack, and the base pointer to the beginning of the current function? Or what?
One important thing to note is that the stack grows "downwards" in memory. This means that to move the stack pointer upward you decrease its value. – Ben Strasser Sep 8 '09 at 19:07
One hint to differentiate what EBP/ESP and EIP are doing: EBP & ESP deal with data, while EIP deals with code. – rstevens Sep 8 '09 at 19:19
You mean that if I called a new function named, for example, DrawPixel(), appearing on the top of the current stack, ESP would decrease, is that it? And after the function returned it would increase again (so the picture would look just like it is right now)? – devoured elysium Sep 8 '09 at 19:20
In your graph, ebp is (usually) the "frame pointer" and esp the "stack pointer". This allows accessing locals via [ebp-x] and stack parameters via [ebp+x] consistently, independent of the stack pointer (which frequently changes within a function). Addressing could be done through ESP, freeing up EBP for other operations, but that way debuggers can't tell the call stack or the values of locals. – peterchen Sep 8 '09 at 19:31
@Ben. Not necessarily. Some compilers put stack frames into the heap. The concept of the stack growing down is just that, a concept that makes it easy to understand. The implementation of the stack can be anything (using random chunks of the heap makes hacks that overwrite parts of the stack a lot harder as they are not as deterministic). – Martin York Sep 8 '09 at 20:06
esp is, as you say, the top of the stack.
ebp is usually set to esp at the start of the function. Local variables are accessed by subtracting a constant offset from ebp. All x86 calling conventions define ebp as being preserved across function calls. ebp itself actually points to the previous frame's base pointer, which is what lets a debugger walk the stack and view other frames' local variables.
Most function prologs look something like:
push ebp ; Preserve current frame pointer
mov ebp, esp ; Create new frame pointer pointing to current stack top
sub esp, 20 ; allocate 20 bytes worth of locals on stack.
Then later in the function you may have code like (presuming both local variables are 4 bytes)
mov [ebp-4], eax ; Store eax in first local
mov ebx, [ebp - 8] ; Load ebx from second local
FPO or frame pointer omission optimization which you can enable will actually eliminate this and use ebp as another register and access locals directly off of esp, but this makes debugging a bit more difficult since the debugger can no longer directly access the stack frames of earlier function calls.
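Because each frame's [ebp] slot holds the caller's ebp and [ebp+4] holds the return address, standard frames form a linked list that a debugger can follow. The C sketch below simulates that walk on a toy stack. It assumes standard frames with no frame pointer omission, uses ebp == 0 as a hypothetical end-of-chain marker, and the return addresses are invented values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 256
static uint32_t mem[MEM_WORDS];                 /* simulated stack memory     */
static uint32_t esp = MEM_WORDS * 4, ebp = 0;   /* ebp 0 marks outermost frame */

static void push(uint32_t v) { esp -= 4; mem[esp / 4] = v; }

/* Simulates "call" plus the standard prolog of a function with `locals`
 * bytes of local storage; ret is a hypothetical return address. */
static void enter_frame(uint32_t ret, uint32_t locals) {
    push(ret);        /* CALL pushes the return address */
    push(ebp);        /* push ebp                       */
    ebp = esp;        /* mov  ebp, esp                  */
    esp -= locals;    /* sub  esp, locals               */
}

/* Walks the saved-ebp chain: [frame] is the caller's ebp and
 * [frame+4] is this frame's return address. */
static size_t walk_stack(uint32_t frame, uint32_t *out, size_t max) {
    size_t n = 0;
    while (frame != 0 && n < max) {
        out[n++] = mem[(frame + 4) / 4];   /* return address of this frame  */
        frame    = mem[frame / 4];         /* follow link to caller's frame */
    }
    return n;
}
```

With FPO enabled, ebp no longer holds this chain, which is why the debugger then needs extra information (such as the PDB files mentioned below) to find each frame.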
+1 and would do another one if possible. Nice and comprehensive explanation! – rstevens Sep 8 '09 at 19:14
Thanks for the explanation! But I am now kinda confused. Let's assume I call a function and I am in the first line of its prolog, still without having executed a single line from it. At that point, what is ebp's value? Does the stack have anything at that point besides the pushed arguments? Thanks! – devoured elysium Sep 9 '09 at 12:33
EBP is not magically changed, so until you've established a new EBP for your function you'll still have the caller's value. And besides arguments, the stack will also hold the old EIP (return address). – MSalters Sep 9 '09 at 13:34
You have it right. The stack pointer points to the top item on the stack and the base pointer points to the "previous" top of the stack before the function was called.
When you call a function, any local variables will be stored on the stack and the stack pointer will be moved past them (on x86 it is decremented, since the stack grows downward). When you return from the function, all the local variables on the stack go out of scope. You do this by setting the stack pointer back to the base pointer (which was the "previous" top before the function call).
Doing memory allocation this way is very, very fast and efficient.
answered Sep 8 '09 at 18:48
Robert Cartaino
@Robert: When you say "previous" top of the stack before the function was called, you are ignoring both the parameters, which are pushed onto the stack just before calling the function, and the caller EIP. This might confuse readers. Let's just say that in a standard stack frame, EBP points to the same place where ESP pointed just after entering the function. – wigy Sep 9 '09 at 13:24
EDIT: For a better description, see x86 Disassembly/Functions and Stack Frames in a WikiBook about x86 assembly. I'll try to add some info you might be interested in if you use Visual Studio.
Storing the caller EBP as the first local variable is called a standard stack frame, and this may be used for nearly all calling conventions on Windows. Differences exist whether the caller or callee deallocates the passed parameters, and which parameters are passed in registers, but these are orthogonal to the standard stack frame problem.
Speaking about Windows programs, you probably use Visual Studio to compile your C++ code. Be aware that Microsoft uses an optimization called Frame Pointer Omission, which makes it nearly impossible to walk the stack without using the dbghelp library and the PDB file for the executable.
This Frame Pointer Omission means that the compiler does not store the old EBP in a standard place and uses the EBP register for something else, so you have a hard time finding the caller's EIP without knowing how much space the local variables need for a given function. Of course Microsoft provides an API that allows you to do stack walks even in this case, but looking up the symbol table database in PDB files takes too long for some use cases.
To avoid FPO in your compilation units, you need to avoid using /O2 or explicitly add /Oy- to the C++ compilation flags in your projects. You probably link against the C or C++ runtime, which uses FPO in the Release configuration, so you will have a hard time doing stack walks without dbghelp.dll.
edited Sep 9 '09 at 9:09
answered Sep 8 '09 at 19:20
wigy
http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
I don't get how EIP is stored on the stack. Shouldn't it be a register? How can a register be on the stack? Thanks! – devoured elysium Sep 8 '09 at 22:36
The caller's EIP is pushed onto the stack by the CALL instruction itself. The RET instruction just fetches the top of the stack and puts it into EIP. If you have buffer overruns, this fact might be used to jump into user code from a privileged thread. – wigy Sep 9 '09 at 9:05
First of all, the stack pointer points to the lowest-addressed end of the stack, since x86 stacks grow from high address values to lower address values. The stack pointer points at the most recently pushed value; the next push (or call) first decrements it and then stores the new value there. Its operation is equivalent to the C/C++ statements:
// push eax
*--esp = eax;
// pop eax
eax = *esp++;
; a function call; in this case (__cdecl) the caller must clean up the function parameters
mov eax, some_value
push eax
call some_address ; pushes the address of the next instruction onto the
                  ; stack and sets the instruction pointer to some_address
add esp, 4        ; remove the pushed eax from the stack
; a function
push ebp          ; save the old frame pointer
mov ebp, esp      ; start the new frame
...               ; do stuff
mov esp, ebp      ; release any locals the function allocated
pop ebp           ; restore the old frame pointer
ret               ; pop the return address into eip
The base pointer marks the current frame. After the standard prolog above, [ebp] holds the caller's saved ebp (so you can restore the prior frame pointer), [ebp+4] holds your return address, [ebp+8] is the first parameter of your function (or the this value of a class method, when it is passed on the stack), and [ebp-4] is the first local variable of your function.
answered Sep 8 '09 at 18:59
jmucchiello
That was indeed very helpful for me. – CDR Oct 4 '09 at 5:21
+1 High-to-low addressing, and ebp +- issues are very helpful to mention. – kolistivra Jun 22 at 23:09
ESP is the current stack pointer, which will change any time a word or address is pushed onto or popped off the stack. EBP is a more convenient way for the compiler to keep track of a function's parameters and local variables than using the ESP directly.
Generally (and this may vary from compiler to compiler), all of the arguments to a function being called are pushed onto the stack (usually in the reverse order that they're declared in the function prototype, but this varies). Then the function is called, which pushes the return address (EIP) onto the stack.
Upon entry to the function, the old EBP value is pushed onto the stack and EBP is set to the value of ESP. Then the ESP is decremented (because the stack grows downward in memory) to allocate space for the function's local variables and temporaries. From that point on, during the execution of the function, the arguments to the function are located on the stack at positive offsets from EBP (because they were pushed prior to the function call), and the local variables are located at negative offsets from EBP (because they were allocated on the stack after the function entry). That's why the EBP is called the frame pointer, because it points to the center of the function call frame.
Upon exit, all the function has to do is set ESP to the value of EBP, and then the old EBP value is popped, then the function returns (popping the return address into EIP).
answered Sep 8 '09 at 19:44
Loadmaster
Long time since I've done Assembly programming, but this link might be useful...
The processor has a collection of registers which are used to store data. Some of these hold direct values while others point to an area within RAM. Registers do tend to be used for certain specific actions, and many instructions require their data to be in specific registers.
The stack pointer is mostly used when you're calling other procedures. With modern compilers, a bunch of data will be dumped first on the stack, followed by the return address so the system will know where to return once it's told to return. On x86 the stack pointer points at the most recently pushed item; new data is pushed just below it, and it stays there until it's popped back again.
Base registers or segment registers just point to the address space of a large amount of data. Combined with a second register, the base pointer will divide the memory into huge blocks while the second register will point at an item within this block. Base pointers therefore point to the base of blocks of data.
Do keep in mind that Assembly is very CPU specific. The page I've linked to provides information about different types of CPU's.
answered Sep 8 '09
http://www.osdata.com/topic/language/asm/register.htm
Segment registers are separate on x86 - they're gs, cs, ss, and unless you are writing memory management software you never touch them. – Michael Sep 8 '09 at 18:51
ds is also a segment register, and in the days of MS-DOS and 16-bit code you definitely needed to change these segment registers occasionally, since each segment could never span more than 64 KB of RAM. Yet DOS could access memory up to 1 MB because it used 20-bit addresses. Later we got 32-bit systems, some with 36-bit physical addresses, and now 64-bit registers. So nowadays you won't really need to change these segment registers anymore. – Workshop Alex Sep 8 '09 at 19:17
No modern OS uses 386 segments – Paul Betts Sep 8 '09 at 19:32
@Paul: WRONG! WRONG! WRONG! The 16-bit segments are replaced by 32-bit segments. In protected mode, this allows the virtualization of memory, basically allowing the processor to map physical addresses to logical ones. However, within your application, things still seem to be flat, since the OS has virtualized the memory for you. The kernel operates in protected mode, allowing applications to run in a flat memory model. See also en.wikipedia.org/wiki/Protected_mode – Workshop Alex Sep 9 '09 at 8:30
@Workshop Alex: That's a technicality. All modern OSes set all segments to [0, FFFFFFFF]. That doesn't really count. And if you read the linked page, you'll see that all the fancy stuff is done with pages, which are much more fine-grained than segments. – MSalters Sep 9 '09 at 13:39
@MSalters, that's not completely true. They do this for the processes that they execute themselves, providing virtual memory for those processes so these segments aren't needed. The operating system just hides the segmentation of memory, but it still uses segments internally. Watcom C/C++ for 32-bit systems actually supports the use of segments when doing far calls! More at users.pjwstk.edu.pl/~jms/qnx/help/watcom/… Watcom C/C++ is now OpenWatcom: openwatcom.org – Workshop Alex Sep 10 '09 at 8:12
Btw, I just debugged a Delphi application. The segment registers are 16 bits and CS contains the value 001Bh; DS, ES and SS are all 0023h; FS is 003Bh and only GS is NULL. They are different values and therefore must each have a special function. (Possibly related to exception handling.) – Workshop Alex Sep 10 '09 at 8:19
Edit: Yeah, this is mostly wrong. It describes something entirely different in case anyone is interested :)
Yes, the stack pointer points to the top of the stack (whether that's the first empty stack location or the last full one I'm unsure of). The base pointer points to the memory location of the instruction that's being executed. This is on the level of opcodes - the most basic instruction you can get on a computer. Each opcode and its parameters is stored in a memory location. One C or C++ or C# line could be translated to one opcode, or a sequence of two or more depending on how complex it is. These are written into program memory sequentially and executed. Under normal circumstances the base pointer is incremented one instruction. For program control (GOTO, IF, etc) it can be incremented multiple times or just replaced with the next memory address.
In this context, the functions are stored in program memory at a certain address. When the function is called, certain information is pushed on the stack that lets the program find its way back to where the function was called from, as well as the parameters to the function; then the address of the function in program memory is pushed into the base pointer. On the next clock cycle the computer starts executing instructions from that memory address. Then at some point it will RETURN to the memory location AFTER the instruction that called the function and continue from there.
edited Sep 8 '09 at 19:11
answered Sep 8 '09 at 18:46
Stephen Friederichs
EBP does not point to the current instruction; that's EIP. – Michael Sep 8 '09 at 18:50
I'm having a bit of trouble understanding what ebp is. If we have 10 lines of MASM code, does that mean that as we go down running those lines, ebp will always be increasing? – devoured elysium Sep 8 '09 at 18:58
@Devoured - No. That is not true. eip will be increasing. – Michael Sep 8 '09 at 19:00
You mean that what I said is right, but for EIP rather than EBP, is that it? – devoured elysium Sep 8 '09 at 19:03
Yes. EIP is the instruction pointer and is implicitly modified after each instruction is executed. – Michael Sep 8 '09 at
Oooh my bad. I'm thinking of a different pointer. I think I'll go wash my brain out. – Stephen Friederichs Sep 8 '09 at 19:10
Reference:
http://stackoverflow.com/questions/1395591/what-is-exactly-the-base-pointer-and-stack-pointer-to-what-do-they-point
http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
http://en.wikipedia.org/wiki/X86_assembly_language
From what I see, I'd say the stack pointer points always to the top of the stack, and the base pointer to the beggining of the the current function? Or what?
One important thing to note is that the stack grows "downwards" in memory. This means that to move the stack pointer upward you decrease its value. – Ben Strasser Sep 8 '09 at 19:07
One hint to differentiate what EBP/ESP and EIP are doing: EBP & ESP deal with data, while EIP deals with code. – rstevens Sep 8 '09 at 19:19
You mean that if I called a new function named, for example, DrawPixel(), appearing on the top of the current stack, ESP would decrease, is that it? And after the function returned it would increase again (so the picture would look just like it does right now)? – devoured elysium Sep 8 '09 at 19:20
In your graph, ebp is (usually) the "frame pointer" and esp the "stack pointer". This allows accessing locals via [ebp-x] and stack parameters via [ebp+x] consistently, independent of the stack pointer (which frequently changes within a function). Addressing could be done through ESP, freeing up EBP for other operations, but that way debuggers can't tell the call stack or the values of locals. – peterchen Sep 8 '09 at 19:31
@Ben. Not necessarily. Some compilers put stack frames into the heap. The concept of the stack growing down is just that, a concept that makes it easy to understand. The implementation of the stack can be anything (using random chunks of the heap makes hacks that overwrite parts of the stack a lot harder, as they are not as deterministic). – Martin York Sep 8 '09 at 20:06
esp is as you say it is: the top of the stack.
ebp is usually set to esp at the start of the function. Local variables are accessed by subtracting a constant offset from ebp. All x86 calling conventions define ebp as being preserved across function calls. ebp itself actually points to the previous frame's base pointer, which is what makes stack walking in a debugger, and viewing other frames' local variables, work.
Most function prologs look something like:
push ebp ; Preserve current frame pointer
mov ebp, esp ; Create new frame pointer pointing to current stack top
sub esp, 20 ; allocate 20 bytes worth of locals on stack.
Then later in the function you may have code like this (presuming both local variables are 4 bytes):
mov [ebp-4], eax ; Store eax in first local
mov ebx, [ebp - 8] ; Load ebx from second local
FPO, or frame pointer omission, is an optimization you can enable that eliminates this: ebp is used as just another register and locals are accessed directly off esp. This makes debugging a bit more difficult, since the debugger can no longer directly access the stack frames of earlier function calls.
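To see why ebp-relative addressing is convenient (and why omitting the frame pointer makes life harder), here is a minimal C sketch that simulates a 32-bit, downward-growing stack in an array. The addresses (STACK_BASE), the return address value and the helper names at, push and local_offsets are hypothetical, chosen only for illustration. It runs the prologue above with two 4-byte locals and then pushes one more value: the local's offset from esp shifts, while its offset from ebp stays at -4.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BASE 0x1000u                 /* hypothetical address just above the stack */
#define STACK_WORDS 64u                    /* simulated stack size in 4-byte slots */
#define STACK_LOW (STACK_BASE - 4u * STACK_WORDS)

static uint32_t mem[STACK_WORDS];          /* simulated stack memory */
static uint32_t esp, ebp;                  /* simulated registers */

/* Translate a simulated byte address into its 4-byte slot. */
static uint32_t *at(uint32_t addr) {
    assert(addr >= STACK_LOW && addr < STACK_BASE && addr % 4u == 0u);
    return &mem[(addr - STACK_LOW) / 4u];
}

/* push: decrement esp first, then store (the stack grows downward). */
static void push(uint32_t value) {
    esp -= 4u;
    *at(esp) = value;
}

typedef struct {
    int32_t ebp_rel;          /* offset of the local from ebp */
    int32_t esp_rel_before;   /* offset of the local from esp before an extra push */
    int32_t esp_rel_after;    /* offset of the local from esp after the push */
    uint32_t value_via_ebp;   /* the local read back through [ebp-4] */
} Offsets;

Offsets local_offsets(void) {
    esp = STACK_BASE;
    ebp = 0u;
    push(0x401000u);          /* hypothetical return address pushed by CALL */
    push(ebp);                /* push ebp */
    ebp = esp;                /* mov ebp, esp */
    esp -= 8u;                /* sub esp, 8 : two 4-byte locals */

    uint32_t local1 = ebp - 4u;
    *at(local1) = 42u;        /* mov dword ptr [ebp-4], 42 */

    Offsets o;
    o.ebp_rel = (int32_t)local1 - (int32_t)ebp;
    o.esp_rel_before = (int32_t)local1 - (int32_t)esp;
    push(7u);                 /* e.g. an argument pushed for a nested call */
    o.esp_rel_after = (int32_t)local1 - (int32_t)esp;
    o.value_via_ebp = *at(ebp - 4u);
    return o;
}
```

Calling local_offsets() should report an ebp offset of -4 and esp offsets of 4 and then 8. With FPO the compiler has to track that shifting esp offset at every instruction, and a debugger has no fixed anchor to do the same.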
+1 and would do another one if possible. Nice and comprehensive explanation! – rstevens Sep 8 '09 at 19:14
Thanks for the explanation! But I am now kinda confused. Let's assume I call a function and I am at the first line of its prolog, without having executed a single line of it yet. At that point, what is ebp's value? Does the stack have anything at that point besides the pushed arguments? Thanks! – devoured elysium Sep 9 '09 at 12:33
EBP is not magically changed, so until you've established a new EBP for your function you'll still have the caller's value. And besides the arguments, the stack will also hold the old EIP (return address). – MSalters Sep 9 '09 at 13:34
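The saved-ebp chain described in the answer above is what lets a debugger walk the stack when standard frames are used. The sketch below is again a C simulation under made-up assumptions: hypothetical addresses, a hypothetical enter_frame helper standing in for CALL plus the standard prologue, and ebp == 0 marking the outermost frame. It builds three nested frames and then follows [ebp] from frame to frame, reading each return address from [ebp+4].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_BASE 0x1000u                 /* hypothetical address just above the stack */
#define STACK_WORDS 64u
#define STACK_LOW (STACK_BASE - 4u * STACK_WORDS)

static uint32_t mem[STACK_WORDS];          /* simulated stack memory */
static uint32_t esp, ebp;                  /* simulated registers */

static uint32_t *at(uint32_t addr) {
    assert(addr >= STACK_LOW && addr < STACK_BASE && addr % 4u == 0u);
    return &mem[(addr - STACK_LOW) / 4u];
}

static void push(uint32_t value) {
    esp -= 4u;
    *at(esp) = value;
}

/* CALL followed by the standard prologue, reserving `locals` bytes. */
static void enter_frame(uint32_t return_address, uint32_t locals) {
    push(return_address);   /* pushed by CALL */
    push(ebp);              /* push ebp */
    ebp = esp;              /* mov ebp, esp */
    esp -= locals;          /* sub esp, locals */
}

/* Follow the chain: [ebp] is the caller's ebp, [ebp+4] the return address. */
static size_t walk_stack(uint32_t *out, size_t max) {
    size_t n = 0;
    for (uint32_t fp = ebp; fp != 0u && n < max; fp = *at(fp)) {
        out[n++] = *at(fp + 4u);
    }
    return n;
}

/* Build three nested frames (outermost first), then walk them innermost first. */
size_t demo_walk(uint32_t *out, size_t max) {
    esp = STACK_BASE;
    ebp = 0u;                       /* ebp == 0 marks the end of the chain */
    enter_frame(0x401010u, 8u);     /* hypothetical: return into startup code */
    enter_frame(0x401234u, 16u);    /* hypothetical: return into main */
    enter_frame(0x401567u, 4u);     /* hypothetical: return into f */
    return walk_stack(out, max);
}
```

demo_walk lists the return addresses innermost first (into f, then main, then the startup code). Under frame pointer omission the [ebp] slot no longer holds this chain, which is why such a walk then needs extra information such as debug symbols.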
You have it right. The stack pointer points to the top item on the stack and the base pointer points to the "previous" top of the stack before the function was called.
When you call a function, any local variables will be stored on the stack and the stack pointer will be adjusted to make room for them (on x86 it is decremented, because the stack grows downward). When you return from the function, all the local variables on the stack go out of scope. You do this by setting the stack pointer back to the base pointer (which was the "previous" top before the function call).
Doing memory allocation this way is very, very fast and efficient.
answered Sep 8 '09 at 18:48
Robert Cartaino
@Robert: When you say "previous" top of the stack before the function was called, you are ignoring both the parameters, which are pushed onto the stack just before calling the function, and the caller EIP. This might confuse readers. Let's just say that in a standard stack frame, EBP points to the same place where ESP pointed just after entering the function. – wigy Sep 9 '09 at 13:24
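The speed Robert mentions comes from the fact that allocating and freeing on a stack is just pointer arithmetic. As an analogy rather than real x86 code, here is a minimal C sketch of a downward-growing arena where a frame's locals are all freed at once by restoring a saved pointer. The names arena, sp, frame_enter, stack_alloc and frame_leave are hypothetical, and rounding every allocation up to a 4-byte slot is an assumption made to mimic 32-bit x86.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 1024u

static _Alignas(16) unsigned char arena[ARENA_SIZE];  /* stands in for the stack region */
static size_t sp = ARENA_SIZE;   /* index of the current top; grows downward like ESP */

/* Like "push ebp; mov ebp, esp": remember where the stack pointer was. */
static size_t frame_enter(void) {
    return sp;
}

/* Like "sub esp, n": allocate n bytes (rounded up to 4) by moving sp down. */
static void *stack_alloc(size_t n) {
    n = (n + 3u) & ~(size_t)3u;
    assert(n <= sp);             /* crude stack-overflow check */
    sp -= n;
    return &arena[sp];
}

/* Like "mov esp, ebp": every local of the frame is released in one step. */
static void frame_leave(size_t base) {
    sp = base;
}
```

Neither allocation nor release walks any free list, which is the whole trick: two locals cost two subtractions, and leaving the frame costs one assignment no matter how many locals it had.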
EDIT: For a better description, see x86 Disassembly/Functions and Stack Frames in a WikiBook about x86 assembly. Here I try to add some info you might be interested in if you use Visual Studio.
Storing the caller EBP as the first local variable is called a standard stack frame, and this may be used for nearly all calling conventions on Windows. Differences exist whether the caller or callee deallocates the passed parameters, and which parameters are passed in registers, but these are orthogonal to the standard stack frame problem.
Speaking about Windows programs, you probably use Visual Studio to compile your C++ code. Be aware that Microsoft uses an optimization called Frame Pointer Omission, which makes it nearly impossible to walk the stack without using the dbghelp library and the PDB file for the executable.
This Frame Pointer Omission means that the compiler does not store the old EBP in a standard place and uses the EBP register for something else, so you have a hard time finding the caller EIP without knowing how much space the local variables need for a given function. Of course, Microsoft provides an API that allows you to do stack walks even in this case, but looking up the symbol table database in PDB files takes too long for some use cases.
To avoid FPO in your compilation units, you need to avoid using /O2 or explicitly add /Oy- to the C++ compilation flags in your projects. You probably link against the C or C++ runtime, which uses FPO in the Release configuration, so you will have a hard time doing stack walks without dbghelp.dll.
edited Sep 9 '09 at 9:09
answered Sep 8 '09 at 19:20
wigy
http://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
I don't get how EIP is stored on the stack. Shouldn't it be a register? How can a register be on the stack? Thanks! – devoured elysium Sep 8 '09 at 22:36
The caller EIP is pushed onto the stack by the CALL instruction itself. The RET instruction just fetches the top of the stack and puts it into EIP. If you have buffer overruns, this fact might be used to jump into user code from a privileged thread. – wigy Sep 9 '09 at 9:05
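To make wigy's point concrete, here is a small C simulation of what CALL and RET do to esp and eip. It assumes a 5-byte call rel32 encoding (so the return address is eip + 5) and uses hypothetical addresses and helper names (at, call, ret); it models only the register and stack effects, it is not an emulator.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BASE 0x1000u                 /* hypothetical address just above the stack */
#define STACK_WORDS 16u
#define STACK_LOW (STACK_BASE - 4u * STACK_WORDS)

static uint32_t mem[STACK_WORDS];          /* simulated stack memory */
static uint32_t esp = STACK_BASE, eip;     /* simulated registers */

static uint32_t *at(uint32_t addr) {
    assert(addr >= STACK_LOW && addr < STACK_BASE && addr % 4u == 0u);
    return &mem[(addr - STACK_LOW) / 4u];
}

/* CALL: push the address of the next instruction, then jump to the target. */
static void call(uint32_t target) {
    esp -= 4u;
    *at(esp) = eip + 5u;    /* assumes a 5-byte CALL instruction */
    eip = target;
}

/* RET: pop whatever is on top of the stack into eip. */
static void ret(void) {
    eip = *at(esp);
    esp += 4u;
}
```

Because ret() trusts whatever value sits at esp, overwriting that slot before returning sends eip wherever the new value points, which is the mechanism behind the buffer-overrun attacks mentioned above.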
First of all, the stack pointer points to the bottom of the stack, since x86 stacks build from high address values to lower address values. The stack pointer points at the most recently pushed value; the next push (or call) first decrements it and then stores the new value there. Its operation is equivalent to these C/C++ statements:
// push eax  (esp treated as a pointer to 32-bit stack slots)
*--esp = eax;
// pop eax
eax = *esp++;

; a function call; in this case the caller must clean up the function parameters
mov eax, some_value
push eax
call some_function   ; pushes the address of the next instruction onto the
                     ; stack and sets the instruction pointer to some_function
add esp, 4           ; remove the pushed argument from the stack

; a function
push ebp             ; save the old frame pointer
mov ebp, esp         ; start the new frame
...                  ; do stuff
mov esp, ebp         ; discard any locals
pop ebp              ; restore the old frame pointer
ret
The base pointer is the top of the current frame. With the standard prologue above, [ebp] holds the caller's saved ebp, so you can restore the prior frame pointer; [ebp+4] holds your return address; [ebp+8] is the first parameter of your function (or the this value of a class method, when that is passed on the stack); and [ebp-4] is the first local variable of your function.
answered Sep 8 '09 at 18:59
jmucchiello
That was indeed very helpful for me. – CDR Oct 4 '09 at 5:21
+1 High-to-low addressing and the ebp +/- issues are very helpful to mention. – kolistivra Jun 22 at 23:09
ESP is the current stack pointer, which will change any time a word or address is pushed onto or popped off the stack. EBP is a more convenient way for the compiler to keep track of a function's parameters and local variables than using ESP directly.
Generally (and this may vary from compiler to compiler), all of the arguments to a function being called are pushed onto the stack (usually in the reverse order that they're declared in the function prototype, but this varies). Then the function is called, which pushes the return address (EIP) onto the stack.
Upon entry to the function, the old EBP value is pushed onto the stack and EBP is set to the value of ESP. Then the ESP is decremented (because the stack grows downward in memory) to allocate space for the function's local variables and temporaries. From that point on, during the execution of the function, the arguments to the function are located on the stack at positive offsets from EBP (because they were pushed prior to the function call), and the local variables are located at negative offsets from EBP (because they were allocated on the stack after the function entry). That's why the EBP is called the frame pointer, because it points to the center of the function call frame.
Upon exit, all the function has to do is set ESP to the value of EBP, and then the old EBP value is popped, then the function returns (popping the return address into EIP).
answered Sep 8 '09 at 19:44
Loadmaster
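Putting Loadmaster's description together, here is a C simulation of one complete call under a cdecl-like convention (arguments pushed right to left, caller removes them). The callee f(a, b) computing a + 2 * b, the 5-byte CALL length, and the helper names are hypothetical choices for illustration. The point is where things land: arguments at [ebp+8] and [ebp+12], the return address at [ebp+4], the saved ebp at [ebp], and the local at [ebp-4].

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BASE 0x1000u                 /* hypothetical address just above the stack */
#define STACK_WORDS 64u
#define STACK_LOW (STACK_BASE - 4u * STACK_WORDS)

static uint32_t mem[STACK_WORDS];          /* simulated stack memory */
static uint32_t esp = STACK_BASE, ebp, eax, eip;

static uint32_t *at(uint32_t addr) {
    assert(addr >= STACK_LOW && addr < STACK_BASE && addr % 4u == 0u);
    return &mem[(addr - STACK_LOW) / 4u];
}
static void push(uint32_t value) { esp -= 4u; *at(esp) = value; }
static uint32_t pop(void) { uint32_t value = *at(esp); esp += 4u; return value; }

/* Callee for a hypothetical int f(int a, int b) { int t = 2 * b; return a + t; } */
static void f_body(void) {
    push(ebp);                     /* push ebp */
    ebp = esp;                     /* mov ebp, esp */
    esp -= 4u;                     /* sub esp, 4 : room for local t */
    uint32_t a = *at(ebp + 8u);    /* first argument at [ebp+8] */
    uint32_t b = *at(ebp + 12u);   /* second argument at [ebp+12] */
    *at(ebp - 4u) = 2u * b;        /* t = 2 * b, stored at [ebp-4] */
    eax = a + *at(ebp - 4u);       /* return value goes in eax */
    esp = ebp;                     /* mov esp, ebp : drop locals */
    ebp = pop();                   /* pop ebp : restore caller's frame */
    eip = pop();                   /* ret : return address back into eip */
}

/* Caller side: push arguments right to left, CALL, then clean up. */
uint32_t call_f(uint32_t a, uint32_t b) {
    push(b);                       /* last argument first */
    push(a);
    push(eip + 5u);                /* CALL pushes the return address; the jump is f_body() */
    f_body();
    esp += 8u;                     /* add esp, 8 : caller removes the arguments */
    return eax;
}
```

After call_f returns, esp and ebp are back where the caller left them and eip points just past the simulated CALL, which is exactly the "set ESP to EBP, pop EBP, return" sequence described above.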
Long time since I've done Assembly programming, but this link might be useful...
The processor has a collection of registers which are used to store data. Some of these are direct values while others are pointing to an area within RAM. Registers do tend to be used for certain specific actions and every operand in assembly will require a certain amount of data in specific registers.
The stack pointer is mostly used when you're calling other procedures. With modern compilers, a bunch of data will be dumped first on the stack, followed by the return address so the system will know where to return once it's told to return. The stack pointer will point at the next location where new data can be pushed to the stack, where it will stay until it's popped back again.
Base registers or segment registers just point to the address space of a large amount of data. Combined with a second register, the base pointer divides the memory into huge blocks while the second register points at an item within a block. Base pointers therefore point to the base of blocks of data.
Do keep in mind that Assembly is very CPU-specific. The page I've linked to provides information about different types of CPUs.
answered Sep 8 '09
http://www.osdata.com/topic/language/asm/register.htm
Segment registers are separate on x86: they're gs, cs, ss, and unless you are writing memory management software you never touch them. – Michael Sep 8 '09 at 18:51
ds is also a segment register, and in the days of MS-DOS and 16-bit code you definitely needed to change these segment registers occasionally, since they could never point to more than 64 KB of RAM. Yet DOS could access memory up to 1 MB because it used 20-bit addresses. Later we got 32-bit systems, some with 36-bit addressing, and now 64-bit registers. So nowadays you won't really need to change these segment registers anymore. – Workshop Alex Sep 8 '09 at 19:17
No modern OS uses 386 segments. – Paul Betts Sep 8 '09 at 19:32
@Paul: WRONG! WRONG! WRONG! The 16-bit segments are replaced by 32-bit segments. In protected mode, this allows the virtualization of memory, basically allowing the processor to map physical addresses to logical ones. However, within your application, things still seem to be flat, since the OS has virtualized the memory for you. The kernel operates in protected mode, allowing applications to run in a flat memory model. See also en.wikipedia.org/wiki/Protected_mode – Workshop Alex Sep 9 '09 at 8:30
@Workshop Alex: That's a technicality. All modern OSes set all segments to [0, FFFFFFFF]. That doesn't really count. And if you read the linked page, you'll see that all the fancy stuff is done with pages, which are much more fine-grained than segments. – MSalters Sep 9 '09 at 13:39
@MSalters, that's not completely true. They do this for the processes that they execute themselves, providing virtual memory for those processes so these segments aren't needed. The operating system just hides the segmentation of memory, but it still uses segments internally. Watcom C/C++ for 32-bit systems actually supports the use of segments when doing far calls! More at users.pjwstk.edu.pl/~jms/qnx/help/watcom/… Watcom C/C++ is now OpenWatcom: openwatcom.org – Workshop Alex Sep 10 '09 at 8:12
Btw, I just debugged a Delphi application. The segment registers are 16 bits: CS contains the value 001Bh, DS, ES and SS are all 0023h, FS is 003Bh and only GS is NULL. They are different values and therefore must each have a special function. (Possibly related to exception handling.) – Workshop Alex Sep 10 '09 at 8:19
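The 64 KB and 1 MB limits discussed in these comments follow from how 8086 real mode forms an address: the 16-bit segment value is shifted left by 4 and the 16-bit offset is added, giving a 20-bit address. Here is a minimal C sketch of that computation; the function name real_mode_linear is hypothetical, and the 20-bit wrap models the original 8086, not later CPUs with the A20 line enabled.

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real mode: linear address = segment * 16 + offset.
   Each segment therefore spans at most 64 KB (offset 0x0000..0xFFFF),
   and the 8086's 20 address lines wrap the result at 1 MB. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset) {
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;
}
```

For example, real_mode_linear(0xB800, 0x0000) gives 0xB8000, and the two different pairs 1000:0010 and 1001:0000 both name the same byte, 0x10010, because segments overlap every 16 bytes.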
Edit: Yeah, this is mostly wrong. It describes something entirely different, in case anyone is interested :)
Yes, the stack pointer points to the top of the stack (whether that's the first empty stack location or the last full one I'm unsure of). The base pointer points to the memory location of the instruction that's being executed. This is on the level of opcodes, the most basic instructions you can get on a computer. Each opcode and its parameters are stored in a memory location. One C or C++ or C# line could be translated to one opcode, or to a sequence of two or more, depending on how complex it is. These are written into program memory sequentially and executed. Under normal circumstances the base pointer is incremented by one instruction. For program control (GOTO, IF, etc.) it can be incremented multiple times or just replaced with the next memory address.