http://homepage.mac.com/liwei999/Publications_PDF/YF3.txt
3. Dependency Patterns for Word Categories: ten elementary tree structures
Dependency patterns are represented by ten elementary tree structures, one for
each governing word category, derived from the list in Section 2.
These trees display the different kinds of dependants that a word of a given
word category can govern. There can be two or more dependants of the same
type. Adjuncts can simply be repeated: a sentence may, for example, contain two
or more adverbial adjuncts, although the elementary tree shows only one branch
for such a dependant, which stands for all possible instances of that branch
type. Most complements, by contrast, cannot be repeated in this way; their
number is fixed at one per complement type.
In the following figures, the node represents a word of the given word category,
while the branches are labelled with the names of the possible dependants. Note
that these trees do not contain valency information. The statement that a verb
can govern nine complements describes the maximal governing capacity of verbs
as a category. A particular verb may allow fewer complements, and some of
these may be optional. These two facts form part of the valency information of
the word.
Note: two word categories, adverbs and interjections, cannot govern anything.
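The difference between repeatable adjuncts and non-repeatable complements can be sketched as a small check. The Python sketch below is illustrative only. The label sets are copied from the verb, noun and classifier trees in 3.01, 3.03 and 3.08; the names ELEMENTARY_TREES and check_dependants are hypothetical, and the rule that every complement occurs at most once is a simplifying assumption (the text says only "most").

```python
from collections import Counter

# Label sets copied from the elementary trees for verbs, nouns and classifiers.
# Only three of the ten trees are shown to keep the sketch short.
ELEMENTARY_TREES = {
    "Verb": {
        "complements": {"SUB", "OBJ", "OBJ2", "SUBOB", "SOC", "PC", "LCV", "VqC", "BaC"},
        "adjuncts": {"AdvA", "VA", "PMOD", "TOP", "VCoA", "AspA", "CirA", "ZyqA", "BeiA"},
    },
    "Noun": {
        "complements": {"SUB", "MnC", "NC", "LCN"},
        "adjuncts": {"AtrA", "DetA", "NCoA", "ZyqA"},
    },
    "Classifier": {"complements": set(), "adjuncts": {"LA"}},
}


def check_dependants(category, labels):
    """Return a list of problems for the dependants attached to one governor.

    Assumption (simplified from the text): every complement type may occur
    at most once, whereas adjunct types may be repeated freely.
    """
    tree = ELEMENTARY_TREES[category]
    problems = []
    for label, count in Counter(labels).items():
        if label in tree["complements"]:
            if count > 1:
                problems.append(f"complement {label} occurs {count} times")
        elif label not in tree["adjuncts"]:
            problems.append(f"{label} is not a dependant of {category}")
    return problems
```

For instance, the dependants of the verb XIANG in sample tree 4.03 (SUB, three AdvA, VqC, OBJ) pass the check, because only the adjunct AdvA is repeated.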
3.01 Elementary dependency tree for verbs
1)
Verb
/ / / / / / / / /
/ / / / / / / / /
/ / / / / / / / /
SUB OBJ OBJ2 SUBOB SOC PC LCV VqC BaC
complements
Verb
\ \ \ \ \ \ \ \ \
\ \ \ \ \ \ \ \ \
\ \ \ \ \ \ \ \ \
AdvA VA PMOD TOP VCoA AspA CirA ZyqA BeiA
adjuncts
Basic Constituent Order: CirA<->TOP/(OBJ)-SUB-(PC)/BaC<->AdvA-BeiA--V--
VCoA-AspA-(PC)-(OBJ)/SUBOB-SOC-VqC-PMOD/LCV-VA-ZyqA
Note: A<->B indicates that A precedes B more often than B precedes A. A/B
indicates that A and B cannot co-occur syntactically, so the question of their
relative order does not arise. (A)--governor--(A) signifies that A can be put
either before or after the governor: for OBJ, the unmarked order is V--OBJ,
with OBJ--V as its transformation; for PC, the order is decided by the valency
of the governing verb.
It is widely believed that the basic order of Chinese is S-V-O, which is only
right to some extent. In fact, the position of the object is much freer than
commonly expected, although the subject is almost always put before the
predicate verb. The permutations of S, V and O are: 1. S-V-O; 2. S-O-V;
3. V-S-O; 4. V-O-S; 5. O-S-V; 6. O-V-S. Patterns 3, 4 and 6 do not occur in
standard written Chinese. (In spoken Chinese, however, one can easily hear
such a sentence as "地DI (floor) 扫SAO (sweep) 了LE (pst.) 吗ME (chu), 你NI (you)?"
(Have you swept the floor?).) The possible variations, with sample sentences,
are listed below:
Basic pattern: SVO
Variation  Sample sentence                                Remarks
SOV        我 南京 去 过, 上海 没 去。                      SOV is often present in
           (I Nanjing go ? Shanghai not go)               parallel structures, i.e.
           (I have been to Nanjing, never to Shanghai)    compound sentences.
OSV        南京 我 去 过。                                  OSV is far more often
           (I have been to Nanjing)                       used than SOV.
Then in the surface sequence N1+N2+V, how do we know whether it is in the form
SOV or OSV? The decisive factor seems to come from semantic analysis rather
than syntactic analysis. (also see 5.4.3)
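The enumeration of S, V, O orders above can be reproduced mechanically. The following Python sketch is only an illustration: ORDERS, ABSENT_IN_WRITTEN and attested are hypothetical names, and the set of excluded patterns simply restates the claim made in the text for standard written Chinese.

```python
from itertools import permutations

# The six logically possible orders of S, V and O. itertools.permutations
# yields them in the same sequence as the numbering 1-6 used in the text.
ORDERS = ["-".join(p) for p in permutations("SVO")]

# Patterns 3, 4 and 6 are the ones the text excludes from written Chinese.
ABSENT_IN_WRITTEN = {3, 4, 6}

# Orders that remain available: the basic pattern plus its two variations.
attested = [o for i, o in enumerate(ORDERS, start=1) if i not in ABSENT_IN_WRITTEN]
```

The remaining list contains the basic pattern S-V-O and the two variations S-O-V and O-S-V discussed in the table.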
3.02 Elementary dependency tree for adjectives
Adjective
/ / \ \ \ \ \ \
/ / \ \ \ \ \ \
/ / \ \ \ \ \ \
/ / \ \ \ \ \ \
SUB PC AdvA PMOD TOP ACoA CirA ZyqA
complements adjuncts
Basic Constituent Order: CirA<->TOP-SUB-(PC)-AdvA--A--ACoA-(PC)-PMOD-ZyqA
3.03 Elementary dependency tree for nouns
Noun
/ / / / \ \ \ \
/ / / / \ \ \ \
/ / / / \ \ \ \
SUB MnC NC LCN AtrA DetA NCoA ZyqA
complements adjuncts
Basic Constituent Order: SUB-DetA/LCN-AtrA--N--NC-MnC-NCoA-ZyqA
3.04 Elementary dependency tree for Pronouns
Pronoun
/ \
/ \
MnC AppA
complement adjunct
Basic Constituent Order: AppA--D--MnC
3.05 Elementary dependency tree for Prepositions
Preposition
/
/
CP
complement
Basic Constituent Order: P--CP
3.06 Elementary dependency tree for Postpositions
Postposition
/
/
CW
complement
Basic Constituent Order: CW--W
3.07 Elementary dependency tree for Numerals
Numeral
/ \
/ \
DiC SA
complement adjunct
Basic Constituent Order: DiC--S--SA
3.08 Elementary dependency tree for Classifiers
Classifier
\
\
LA
adjunct
Basic Constituent Order: LA--L
3.09 Elementary dependency tree for Particles
Particle
/ / / /
/ / / /
SUB CDe CDe2 CDe3
complements
Basic Constituent Order: SUB-CDe/CDe2--Z--CDe3
3.10 Elementary dependency tree for Conjunctions
Conjunction
/ / \ \
/ / \ \
CC X-C X-C Y
complement
Basic Constituent Order: (Y)-X-C--C--X-C-CC-(Y)
The two dependants marked X-C represent any dependants that can be
coordinated. The coordinating conjunction takes the syntactic label of the
branch it depends on and copies it to the two coordinated dependants. A
dependant that depends on the two coordinated items as a whole can be added
as Y. A conjunction can thus govern either the two dependants marked X-C,
plus any number of Y's as defined by the X-C's, or a Complement of
Conjunction (CC). Section 4, "Sample Trees", illustrates these patterns; for
details about the role of the conjunction, see 2.3.6 in "Syntactic Tree
Structures in DLT" (Schubert, 1986).
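The label-copying behaviour of the coordinating conjunction can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names conjunct_label and coordinate are hypothetical, and the "-C" suffix is read off the sample trees (SUB-C, OBJ-C, CDe-C) rather than from an explicit rule in the text.

```python
def conjunct_label(branch_label):
    """Label given to each coordinated dependant below a conjunction.

    The conjunction hangs on a branch labelled `branch_label` and copies that
    label to both conjuncts, written with a "-C" suffix in the sample trees.
    A conjunction at the root has no incoming label, which yields the bare
    "-C" seen in sample trees 4.03 and 4.04.
    """
    return f"{branch_label}-C"


def coordinate(branch_label, left, right):
    """Return (word, label) pairs for the two conjuncts (hypothetical helper)."""
    label = conjunct_label(branch_label)
    return [(left, label), (right, label)]
```

In sample tree 4.01, for example, the comma that depends on GUAN as OBJ passes OBJ-C to both DONGXI and SHIQING.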
4. Sample Trees
4.01 每D 样L 东西N , 每D 件L 事情N, 由P 谁D 管V, 怎么F 管V, 都F 落实V
MEI YANG DONGXI, MEI JIAN SHIQING, YOU SHUI GUAN, ZENME GUAN, DOU LUOSHI
到P 每D 个L 人N 头N 上W 。
DAO MEI GE REN TOU SHANG.
落实LUOSHI
SUB / AdvA / \ PC
, 都DOU 到DAO
OBJ / SUB-C / \ SUB-C \ CP
, 管GUAN 管GUAN 上SHANG
OBJ-C / \ OBJ-C \ BeiA \ AdvA \ CW
东西DONGXI 事情SHIQING 由YOU 怎么ZENME 头TOU
DetA / \ LCN \ CP / AtrA
样YANG 件JIAN 谁SHUI 人REN
LA / \ LA DetA \
每MEI 每MEI 个GE
LA \
每MEI
4.02 她D 看V 了Z 看V 表N, 计算V 着Z 乘V 哪D 一S 路L 汽车N 快A, 什么D
TA KAN LE KAN BIAO, JISUAN ZHE CHENG NA YI LU QICHE KUAI, SHENME
时候Nt 可以V 赶V 到P 幼儿N 园N, 什么D 时候Nt 可以V 抱V 着Z 女儿N 赶V 到P 家N。
SHIHOU KEYI GAN DAO YOUER YUAN, SHENME SHIHOU KEYI BAO ZHE NUER GAN DAO JIA.
看V
SUB / AdvA / PMOD / \ OBJ \ PMOD
她D 了Z 看V' 表N 计算V
OBJ / \ AspA
,C 着Z
OBJ-C / \ OBJ-C
快A ,C
SUB / OBJ-C / \ OBJ-C
乘V 可以Vz 可以Vz
OBJ / OBJ / \ AdvA AdvA / \ OBJ
汽车N 赶V 时候Nt 时候Nt 赶V
LCN / PC / DetA / DetA / AdvA / \ PC
路L 到P 什么D 什么D 抱V 到P
LA / LA / CP / AspA / \ OBJ \ CP
哪D 一S 园N 着Z 女儿N 家N
AtrA /
幼儿N
4.03 我D 这D 时Nt 又F 忽然F 想V 起Z, 小A 林Nz 要V 我D 给P 他D 买V 一S 本L
WO ZHE SHI YOU HURAN XIANG QI, XIAO LIN YAO WO GEI TA MAI YI BEN
书N, 刚才F 在P 书N 店N 里W 忘V 了Z 问V 了Z。
SHU, GANGCAI ZAI SHU DIAN LI WANG LE WEN LE.
,C
-C / \ -C
想V 忘V
SUB / AdvA / AdvA / \ AdvA \ VqC \ OBJ AdvA/ AdvA/\AspA \OBJ \ZyqA
我D 时Nt 又F 忽然F 起Z 要V 刚才F 在P 了Z 问V 了Z
DetA / SUB / SUBOB/ \SOC \ CP
这D 林N 我D 买V 里W
AtrA / AdvA / \ OBJ \ CW
小A 给P 书N 店N
CP / \ LCN \ AtrA
他D 本L 书N
\ LA
一S
4.04 但C 那D 时Nt 我D 在P 上海Nz 也F 有V 一S 个L 惟一A 的Z 不但C 敢V 于P
DAN NA SHI WO ZAI SHANGHAI YE YOU YI GE WEIYI DE BUDAN GAN YU
随便A 谈V 笑V, 而且C 还F 敢V 于P 托V 他D 办V 点D 私A 事N 的Z 人N, 那D 就F
SUIBIAN TAN XIAO, ERQIE HAI GAN YU TUO TA BAN DIAN SI SHI DE REN, NA JIU
是V 送V 书N 去C 给V 白莽Nz 的Z 柔石Nz。
SHI SONG SHU QU GEI BAIMANG DE ROUSHI.
,C
-C / \ -C
有V 是V
CirA / AdvA / SUB / AdvA / \ AdvA \ OBJ SUB / \ AdvA \ OBJ
但C 时Nt 我D 在P 也F 人N 那D 就F 柔石N
DetA / CP / DetA /AtrA / \ AtrA AtrA /
那D 上海N 个L 的Z 的Z 的Z
LA / CDe / \ CDe \ CDe
一S 惟一A ,C 送V
CDe-C / \ CDe-C OBJ / \ VA
不但Cdp 而且Cdp 书N 去C
CC / \ CC CC /
敢V 敢V 给
PC / AdvA / \ PC \ OBJ
于P 还 于P 白莽
CP / \ CP
谈V 托V
AdvA / \ VCoA SUBOB / \ SOC
随便 笑 他D 办V
OBJ /
事N
DetA / \ AtrA
点D 私A
4.05 胶N 合V 板N 是Vs 把P 原木N 旋切V 或C 刨切V 成P 单A 片N 薄A 板N,
JIAO HE BAN SHI BA YUANMU XUANQIE HUO PAOQIE CHENG DAN PIAN BO BAN,
经过V 干燥A 、涂V 胶N, 并C 按P 木材N 纹理N 方向N 纵A 横A 交错V
JINGGUO GANZAO、TU JIAO, BING AN MUCAI WENLI FANGXIANG ZONG HENG JIAOCUO
相F 叠V, 在P 加V 热A 或C 不F 加V 热A 的Z 条件N 下W 压制V 而C 成V 的Z
XIANG DIE, ZAI JIA RE HUO BU JIA RE DE TIAOJIAN XIA YAZHI ER CHENG DE
一S 种L 板材N。
YI ZHONG BANCAI.
是Vs
SUB / \ OBJ
板N 板材N
AtrA / AtrA / \ DetA
合V 的Z 种L
SUB / CDe / \ LA
胶N 并C 一S
CDe-C / \ CDe-C
,C ,C
CDe-C / \ CDe-C CDe-C / \ CDe-C
或C 经过V 交错V 而C
BaC / CDe-C / \ CDe-C \ PMOD OBJ / AdvA/AdvA / \ VA CDe-C / \ CDe-C
把P 旋切V 刨切V 成P 、C 按P 横A 叠V 压制V 成V
CP / CP / OBJ-C / \ OBJ-C \ CP \ ACoA \ AdvA \ AdvA
原木N 板N 干燥A 涂V 方向N 纵A 相F 在P
AtrA / \ AtrA OBJ / \ AtrA CP /
片N 薄A 胶N 纹理N 下W
AtrA / AtrA / CW /
单A 木材N 条件N
AtrA /
的Z
CDe /
或C
CDe-C / \ CDe-C
加V 加V
PMOD / AdvA / \ PMOD
热A 不F 热A
-----------------------------------------------------------------------------
Note: these sample sentences are taken from "800 Words in Contemporary
Chinese" by Lu Shuxiang (1981).
5. Some Issues on Establishing a Chinese Formal Syntax
5.1 Syntactic model and semantic model
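A language model comprises at least two major parts, a syntactic model and a
semantic model. Form and content are two inseparable aspects of the same
thing; some researchers therefore advocate carrying out syntactic and semantic
analysis simultaneously and building a model that unifies syntax and
semantics. Whether the two analyses are kept separate or merged, each choice
appears to have its advantages and drawbacks. Separation is clean and clearly
modular, and it favours a purer and more abstract model, but when implemented
on a computer it may lead to combinatorial explosion. Merged processing is
compact, has low overhead and high efficiency, and avoids some repeated
look-ups, but it places higher demands on the software, and the model itself
becomes bloated. The present model is a formal syntactic model. It serves
first of all the Dutch DLT multilingual machine translation system, which
keeps syntax and semantics separate, but it also leaves an opening for
extension so that it can be applied in our JFY-IV system, which analyses
syntax and semantics simultaneously.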
5.2 Explicit forms and implicit forms
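The starting point for building a formal grammar is, of course, linguistic
form. But what exactly is form? In written language, a sentence is a
rule-governed string of characters, so its form can only consist of the
characters (the shapes of characters, words and idioms) and the order among
them (the order of characters, words and phrases). Looking at the former, we
find that the words of every human language fall into two broad classes. One
is the class of closed words, usually called function words, which occur with
high frequency and are limited in number; the other is the class of open
words, which are continually being added and dropped and are difficult to
enumerate. The closed class is easy to handle: its literals (character shapes,
word shapes) are the clearest markers of syntactic form. The literals of the
open class are of course also forms, and can be exploited where necessary (for
example in processing idioms), but they are too numerous for an abstract model
to be built by enumeration. Languages with morphology can derive some formal
markers from their various easily recognized and enumerable morphological
devices, chiefly endings. A language like Chinese, which lacks morphology, has
no such convenience. Yet it is almost impossible to build an abstract formal
syntactic model by relying only on explicit forms, namely closed-class
literals, word order and morphology, and this holds even for the human
language with the richest morphology known so far. Morphology is merely an
outward manifestation of the inherent combinatorial properties of words; these
properties are manifold, and even the richest morphology can express only part
of them. A formal grammar must therefore also resort to so-called implicit
form, that is, the formal classification of words, especially open words.
Formal classification means classification according to the syntactic
combinatorial capacity of words, such as the division into major classes like
verbs and nouns, the division into subclasses like monotransitive and
ditransitive verbs, and so on.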
5.3 Formal analysis and semantic analysis
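It should be pointed out that relying on form alone, whether explicit or
implicit, still cannot achieve an analysis entirely free of structural
ambiguity. Syntactically ambiguous structures are a pervasive linguistic
phenomenon, all the more so in languages lacking morphology. The parser must
therefore be allowed to produce several syntactic trees, to be filtered and
selected by subsequent analyses, chiefly semantic analysis. On the basis of
this syntax, one can build a dictionary that records the various formal
classifications of words, together with a base of syntactic rules; by means of
augmented transition network (ATN) software, Chinese sentences can then be
analysed automatically, yielding one or more well-formed syntactic trees
labelled with dependency relations. Such trees are the input to the subsequent
semantic analysis. For example: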
总之F, 我D 们Z 的Z 工作N/V 成绩N 很F 大A。
ZONGZHI, WO MEN DE GONGZUO CHENGJI HEN DA.
In a word, our working achievements are great.
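(1) 大DA
sentence adverbial / topic / subject / \ adverbial
总之ZONGZHI 工作GONGZUO 成绩CHENGJI 很HEN
attribute /
的DE
complement of DE /
我WO
\ plural
们MEN
(2) 大DA
sentence adverbial / subject / \ adverbial
总之ZONGZHI 成绩CHENGJI 很HEN
attribute / \ attribute
的DE 工作GONGZUO
complement of DE /
我WO
\ plural
们MEN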
5.4 Chinese word order
6. BIBLIOGRAPHY
1. Lu Shuxiang (1981): "800 Words in Contemporary Chinese", Beijing, Shangwu
2. Liu, Zhuo; Fu, Aiping & Li, Wei (1989) JFY-IV Machine Translation
System, In Proceedings of MT SUMMIT II, pp.88-93, Munich.
3. Lucien Tesniere (1959): "Elements de Syntaxe Structurale", Paris:
Klincksieck.
4. Klaus Schubert (1986): "Syntactic Tree Structures in DLT", published
by BSO/Research, Utrecht.
5. Bieke van der Korst (1986): "A Dependency Syntax for English", BSO/DLT Research Report, Utrecht.
6. Engel, Ulrich (1982): "Syntax der deutschen Gegenwartssprache", Berlin:
Schmidt.
-- END --
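APPENDIX I: ABSTRACT (translated from Chinese)
This paper is an attempt at a systematic study of the grammar of Contemporary
Chinese on the basis of Tesniere's theory of dependency. Machine processing of
natural language generally goes through four steps: linguistic theory -->
language model --> algorithm design --> program implementation. This paper
belongs to the second stage. A language model comprises at least a syntactic
model and a semantic model, or it may be a single model unifying syntax and
semantics. This paper provides a formal syntactic model describing the
structure (hierarchy and relations) of Chinese.
The model divides Chinese words into 12 major categories and a number of
subcategories and, using these classifications together with closed words and
word order, formally defines 36 dependency relations of written Contemporary
Chinese, of which 20 are complements and 16 are adjuncts. On the basis of this
syntax, one can build a dictionary that records the various formal
classifications of words, together with a base of syntactic rules; by means of
an augmented transition network (ATN), Chinese sentences can then be analysed
automatically, yielding one or more well-formed syntactic trees labelled with
dependency relations. Such trees are the input to the subsequent semantic
analysis. In addition, this syntax can equally serve as the basis of a Chinese
generation system. Of course, much concrete work remains to be done before a
reasonably complete Chinese generation system can actually be realized.
Form and content are two inseparable aspects of the same thing; some
researchers therefore advocate carrying out syntactic and semantic analysis
simultaneously. Whether the two analyses are kept separate or merged, each
choice appears to have its advantages and drawbacks. Separation is clean and
clearly modular, and it favours a purer and more abstract model, but when
implemented on a computer it may lead to combinatorial explosion. Merged
processing is compact, has low overhead and high efficiency, and avoids some
repeated look-ups, but it places higher demands on the software, and the model
itself becomes bloated. The present model serves first of all the Dutch DLT
multilingual machine translation system, which keeps syntax and semantics
separate, but it also leaves an opening for extension so that it can be
applied in our JFY-IV system, which analyses syntax and semantics
simultaneously.
The value of this paper for Chinese grammarians lies chiefly not in the
accuracy and depth of its description of grammatical phenomena (in this
respect the author, as a newcomer to Chinese studies, leaves much to be
desired) but in the fact that it offers a specimen of a syntactic model suited
to machine processing. This may be of some heuristic value to Chinese
grammarians who are unfamiliar with computers yet interested in the methods
and thinking of computational linguistics. So far no ready-made, reasonably
authoritative syntactic model of Chinese is available as a basis for machine
processing, while the practice of machine processing of language calls for
such a model ever more urgently. The present model is still far from
satisfactory, but it is at least usable. In this sense, it is hoped that it
will prompt others to come forward with better work.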
APPENDIX II:
Linguistic Problems Concerning Chinese in Constructing DLT Parsers
-- in answer to Dr. Dan Maxwell
LI Wei
1. Writing system
1) The Chinese writing system consists of a set of characters. PINYIN (the
Chinese phonetic alphabet), which is based on the Latin alphabet, is often used
to represent the pronunciation of the characters. Standard PINYIN includes four
diacritics placed above the vowels to denote the four tones of Chinese: 1. high
level tone; 2. rising tone; 3. falling-rising tone; 4. falling tone; e.g. MĀ,
MÁ, MǍ, MÀ. Many characters often share the same pronunciation, e.g. ZHI: . For
practical use in DLT, we suggest a coding principle under which all characters
with the same pronunciation, regardless of tone, are queued and numbered
according to their order in authoritative dictionaries, so that they can be
distinguished by the number at the end of the syllable, e.g. ZHI1: , ZHI2: ;
... ZHI55: . In this way it will be very easy to convert between Chinese
characters and such codes at a Chinese computer terminal.
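The proposed coding principle can be sketched as follows. This Python fragment is only an illustration of the numbering idea: build_codes and SAMPLE are hypothetical names, the sample characters and their order are invented for the example and are not taken from any dictionary, and characters with more than one reading are ignored.

```python
def build_codes(entries):
    """Assign codes such as ZHI1, ZHI2 to characters.

    `entries` is a list of (toneless_syllable, character) pairs, assumed to be
    already sorted in the order of the reference dictionary.
    """
    counters = {}
    char_to_code = {}
    for syllable, char in entries:
        # Number the homophones of each syllable in order of appearance.
        counters[syllable] = counters.get(syllable, 0) + 1
        char_to_code[char] = f"{syllable}{counters[syllable]}"
    # The reverse table converts a code back into its character.
    code_to_char = {code: char for char, code in char_to_code.items()}
    return char_to_code, code_to_char


# Illustrative input only: the order below is NOT taken from any dictionary.
SAMPLE = [("MA", "妈"), ("MA", "马"), ("ZHI", "之"), ("ZHI", "知"), ("ZHI", "只")]
char_to_code, code_to_char = build_codes(SAMPLE)
```

Because the two tables are inverses of each other, conversion in either direction is a single look-up.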
2) It is widely accepted that in Contemporary Chinese a word consists of one or
more characters (word >= character). How to recognize words automatically in a
character string has become a research topic in its own right in the field of
Chinese information processing (see 1.0 in "A Dependency Syntax of Contemporary
Chinese", pp.4-5; also see "Word-recognition and Syntactic Analysis in Chinese
Information Processing" by Prof. LIU Zhuo and "General Situation of the
Research of Computational Linguistics in China" by FU Aiping).
3) Generally speaking, the punctuation system follows that of English in usage.
The only noteworthy difference is that Chinese has a special coordinating mark
called DUNHAO (、), which is used to set off the items of a series, i.e. between
closely parallel coordinated words or word groups (see 1.2.10 in "A Dependency
Syntax of Contemporary Chinese", p.14). In effect, the Chinese comma and DUNHAO
together do the work of the English comma.
2. Word order
In a language as poor in inflection as Chinese, word order has naturally and
necessarily become one of the two most important syntactic means (the other
being function words), and accordingly deserves special attention.
2.1 Freedom of word order
Generally, the degree of freedom of word order in Chinese is low. Of the 36
dependants in our syntax, only 2 complements, OBJ and PC, often appear either
before or after their governor. The unmarked pattern for OBJ is V-OBJ, and its
transformation OBJ-V may be induced by such factors as emphasis and style.
Each particular PC, however, always has a specific position with respect to
its governor, and this position should be recorded in the valency information
of the governor so that the parser knows where to look and which preposition
to look for. What about the freedom of order among sister constituents under
the same governor? Multiple adverbials, for example, are syntactically very
free in order, especially DE2-adverbials, and the same holds for multiple
attributes, especially DE-attributes. Between different types of sister
dependants there is no general rule: some pairs are free in order, while more
are not. For the pairs that are free, one might regard the more frequent
pattern as unmarked. For example, the unmarked order between LCN and
AtrA under the governor noun is LCN-AtrA--N as in the phrase WO MEN DE SAN GE
REN (our three men), but now and then we also come across SAN GE WO MEN DE REN
(three men of ours) in the pattern AtrA-LCN--N. Such rules concerning
constituent order under each main word category are discussed in detail
in 2.2.
The factors affecting Chinese word order fall under the following 5 headings:
1. syntactic roles, i.e. types of dependants; 2. semantic roles; 3. pragmatic
effects such as emphasis; 4. phonetic requirements, esp. the number of
syllables (characters) in a constituent; 5. rhetorical effects such as
differences of style. The surface order of a sentence often results from the
combined effect of these 5 factors, and it seems difficult to say which one is
decisive: for each factor we can cite sample sentences whose word order is
determined mainly by that factor (see also 4.5). In short, as a language with
no inflections, Chinese is bound to be quite limited in word order freedom;
yet as a very expressive language with a long history of development, it is
also flexible in many ways, allowing as much freedom of word order as the
overall frame of Chinese permits. Undoubtedly, word order is a very
complicated and important problem in Chinese that needs further study, and
years of hard work will be required before the inherent mechanism of Chinese
word order is better and more clearly understood. Unfortunately, there have so
far been no authoritative, large-scale statistical studies of Chinese word
order.
2.2 Word order and Dependency
2.2.1 Basic order of the ten elementary trees
1) Verb
/ / / / / / / / / \ \ \ \ \ \ \ \
/ / / / / / / / / \ \ \ \ \ \ \ \
/ / / / / / / / / \ \ \ \ \ \ \ \
SUB OBJ OBJ2 SUBOB SOC PC LCV VqC BaC AdvA VA PMOD TOP VCoA AspA CirA ZyqA
Basic Constituent Order: CirA<->TOP/(OBJ)<->SUB-(PC)/BaC<->AdvA--V--VCoA-
AspA-(PC)-(OBJ)/SUBOB-SOC-VqC-PMOD/LCV-VA-ZyqA
Note: A<->B indicates that A precedes B more often than B does A. A/B shows
the syntactic impossibility of the concurrence of A and B, hence eliminating
the problem of their relative order. (A)--governor--(A) signifies that A can
be put either before or after the governor: for (OBJ), the unmarked order is
V--OBJ with OBJ--V as its transformation; for (PC), the order is decided by
the valency of the verb governor.
It is widely believed that the basic order of Chinese is S-V-O, which is only
right to some extent. In fact, the position of the object is much freer than
commonly expected, although the subject is almost always put before the
predicate verb. The permutations of S, V and O are: 1. S-V-O; 2. S-O-V;
3. V-S-O; 4. V-O-S; 5. O-S-V; 6. O-V-S. Patterns 3, 4 and 6 do not occur in
standard written Chinese. The possible transforms, with sample sentences, are
listed below:
Basic pattern: SVO
Transform  Sample sentence                                Remarks
SOV        WO NANJING QU GUO, SHANGHAI MEI QU.            SOV is often present in
           (I Nanjing go ? Shanghai not go)               parallel structures, i.e.
           (I have been to Nanjing, never to Shanghai)    compound sentences.
OSV        NANJING WO QU GUO.                             OSV is far more often
           (I have been to Nanjing)                       used than SOV.
Then in the surface sequence N1+N2+V, how do we know whether it is in the form
SOV or OSV? The decisive factor seems to come from semantic analysis rather
than syntactic analysis. (also see 4.5)
2) Adjective
/ / \ \ \ \ \ \
/ / \ \ \ \ \ \
/ / \ \ \ \ \ \
/ / \ \ \ \ \ \
SUB PC AdvA PMOD TOP ACoA CirA ZyqA
Basic Constituent Order: CirA<->TOP-SUB-(PC)-AdvA--A--ACoA-(PC)-PMOD-ZyqA
3) Noun
/ / / / \ \ \ \
/ / / / \ \ \ \
/ / / / \ \ \ \
SUB MnC NC LCN AtrA DetA NCoA ZyqA
Basic Constituent Order: SUB-DetA/LCN-AtrA--N--NC-MnC-NCoA-ZyqA
4) Pronoun
/ \
/ \
MnC AppA
Basic Constituent Order: AppA--D--MnC
5) Preposition
/
/
CP
Basic Constituent Order: P--CP
6) Postposition
/
/
CW
Basic Constituent Order: CW--W
7) Numeral
/ \
/ \
DiC SA
Basic Constituent Order: DiC--S--SA
8) Classifier
\
\
LA
Basic Constituent Order: LA--L
9) Particle
/ / / /
/ / / /
SUB CDe CDe2 CDe3
Basic Constituent Order: SUB-CDe/CDe2--Z--CDe3
10) Conjunction
/ / \ \
/ / \ \
CC X-C X-C Y
Basic Constituent Order: (Y)-X-C--C--X-C-CC-(Y)
2.2.2 Discontinuous dependencies
As in the Indo-European languages, discontinuity exists in object-preceding
patterns such as OBJ-(SUB)-auxiliary verb-V, e.g. 1. ZHE(this) REN(man)
GAI(should) SHA(kill) (This man should be killed); 2. JI(chicken) WO(I)
DASUAN(plan) RANG(let) BINGREN(patient) CHI(eat) (I'm planning to let the
patients eat the chickens).
1. GAI
\ OBJ
SHA
\ OBJ
REN
DetA /
ZHE
2A. DASUAN
SUB / \ OBJ
WO RANG
\ SUBOB \ SOC
BINGREN CHI
\ OBJ
JI
A possibly simpler analysis for Chinese, which would eliminate the
discontinuity between the preceding object and its verb governor, is to take
the object as topic and to treat the transitive verb as intransitive, i.e. as
used with its object omitted:
2B. DASUAN
TOP / SUB / \ OBJ
JI WO RANG
\ SUBOB \ SOC
BINGREN CHI
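The difference between analyses 2A and 2B can be checked mechanically. The Python sketch below is an illustration, not part of the DLT parser: it encodes each analysis as a list of governor positions (a representation assumed here for convenience) and reports every dependency whose span contains a word not dominated by the governor. The function names are hypothetical.

```python
def is_dominated_by(node, ancestor, heads):
    """True if `ancestor` is `node` itself or lies on its path to the root."""
    while node != 0:
        if node == ancestor:
            return True
        node = heads[node]
    return False


def discontinuous_arcs(heads):
    """Return (governor, dependant) pairs that are discontinuous.

    heads[i] is the position of the governor of word i (positions are
    1-based, heads[0] is unused, and 0 marks the root of the tree).
    """
    bad = []
    for dep in range(1, len(heads)):
        gov = heads[dep]
        if gov == 0:
            continue
        lo, hi = sorted((gov, dep))
        # An arc is discontinuous if some word between its two ends is not
        # dominated by the governor of the arc.
        if any(not is_dominated_by(k, gov, heads) for k in range(lo + 1, hi)):
            bad.append((gov, dep))
    return bad


# Word positions: 1 JI, 2 WO, 3 DASUAN, 4 RANG, 5 BINGREN, 6 CHI
HEADS_2A = [None, 6, 3, 0, 3, 4, 4]  # JI is OBJ of CHI
HEADS_2B = [None, 3, 3, 0, 3, 4, 4]  # JI is TOP of DASUAN
```

Under these encodings, analysis 2A contains one discontinuous dependency, the one between CHI and JI, while analysis 2B contains none.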
3. Word grammar
3.1 There are no inflections in Chinese.
3.2 Morpheme Order
Nearly every Chinese character can serve as a kind of morpheme, which can be
combined with other character(s) to form a word. Usually the last morpheme is
taken to be the head of the word, just as in Esperanto.
3.3 Derivational Morphology
In Contemporary Chinese, a few morphemes (characters) have come to function
very much like certain suffixes in English. For example, XING works just like
"-ness", changing a noun or an adjective into an abstract noun, N/A/X +
XING --> N: LISHI (N:history) + XING --> LISHIXING (N:historicity); SHIYONG
(A:practical) + XING --> SHIYONGXING (N:practicalness, practicality); YANSU
(A:serious) + XING --> YANSUXING (N:seriousness); KE (can) DU (read) + XING -->
KEDUXING (N:readability). Quasi-suffixes such as XING, DU (similar to XING) and
HUA (=ization) are highly productive and therefore deserve close attention,
although they are very few.
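The derivation rule N/A/X + XING --> N can be written down as a one-entry table. The Python sketch below is a minimal illustration: derive and QUASI_SUFFIX_RESULT are hypothetical names, and only XING is included because the text specifies a result category for that suffix alone.

```python
# Result category of a derived word, keyed by quasi-suffix. The text states
# that XING turns a noun, adjective or other stem into an abstract noun (N).
QUASI_SUFFIX_RESULT = {"XING": "N"}


def derive(stem, suffix):
    """Attach a quasi-suffix to a stem and return (word, category)."""
    return stem + suffix, QUASI_SUFFIX_RESULT[suffix]
```

Calling derive("LISHI", "XING") reproduces the first example above, and a stem of two morphemes such as KE + DU is passed in as the joined string "KEDU".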
4. Types of Syntactic Ambiguity
Precisely because it lacks inflections, Chinese seems more prone to syntactic
ambiguity, which results in two or more trees for most Chinese sentences.
Chinese is essentially a semantics-bound language; one can therefore hardly
expect to achieve much by syntactic analysis alone, which can only be based on
forms, whether explicit (function words and word order) or implicit (word
categories, subclasses and valencies). It is not difficult to cite sentences
that lead to as many as a dozen trees, hence (I'm afraid) the problem of
combinatorial explosion in parsing.
4.1 Word category
The problem of category ambiguity for Chinese words is so serious and striking
that there used to be a prevailing view among Chinese grammarians that "there
are no grammatical categories for Chinese words, and categories can only be
defined in context".
We find that there are two kinds of category ambiguity. The first might be
called potential ambiguity, which arises simply from the fact that some words
satisfy the syntagmatic definitions of two or more categories. The second is
dynamic ambiguity, which occurs when the language user makes elastic or
temporary use of some words. Accordingly, there are two ways of handling them.
For words of potential category ambiguity, we should list all their potential
categories as their static codes in the dictionary; sentences containing such
words call a subroutine, a set of category-disambiguating rules, which solves
most of the problem by trying to determine the one correct category in the
given context and erasing the others. Words of dynamic category ambiguity
cannot be predicted; they are therefore given only one category in the
dictionary, which is dynamically changed into another category during the
execution of certain special rules (often closely related to particular
function words). For example, the rule X + LE --> V + LE dynamically changes
any category before LE into a verb, because the function word LE can only be
used after its governor verb as its aspect adjunct (perfect aspect). As a
further example, the rule S + N1 + N2 --> S + Ln + N2 changes a noun into a
classifier.
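The two dynamic re-categorization rules can be sketched as rewrites over a tagged word sequence. The Python fragment below is an illustration under stated assumptions: apply_dynamic_rules is a hypothetical name, a sentence is assumed to be a list of (word, category) pairs, and the two test phrases (HUA HONG LE and YI WAN FAN) are examples supplied here, not taken from the text.

```python
def apply_dynamic_rules(tokens):
    """Re-tag a sentence given as a list of (word, category) pairs."""
    out = list(tokens)
    for i in range(len(out) - 1):
        word, cat = out[i]
        # Rule X + LE --> V + LE: whatever precedes LE becomes a verb.
        if out[i + 1][0] == "LE" and cat != "V":
            out[i] = (word, "V")
    for i in range(len(out) - 2):
        # Rule S + N1 + N2 --> S + Ln + N2: a noun standing between a
        # numeral and another noun is re-tagged as a classifier.
        if [c for _, c in out[i:i + 3]] == ["S", "N", "N"]:
            out[i + 1] = (out[i + 1][0], "Ln")
    return out
```

The static dictionary category is left untouched; only the copy returned for the current sentence carries the changed category, which matches the idea that the change is temporary.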
4.2 Word category and morphology
There are no inflections in Chinese, let alone inflectional ambiguity.
4.3 Adpositional phrases
There are no such problems because Chinese prepositions and postpositions
cannot be used directly as attributes. Besides, in Chinese all attributes
precede their noun governor and all adverbials precede their verb/adjective
governor. However, ambiguities do exist in such patterns as
P+N1+DE+N2+DE+N3. One reading is ((((P+N1)+DE)+N2)+DE)+N3; another is
((P+N1)+DE)+(N2+DE)+N3; a third is P+((((N1+DE)+N2)+DE)+N3). (see 4.6)
4.4 Coordination
Chinese has problems similar to those in English, such as the ambiguity in
"happy students and workers": 1. (happy students) and workers; 2. happy
(students and workers). In our machine translation research from Indo-European
languages into Chinese we sometimes employ a so-called ambiguity-untouched
strategy, e.g. A and B of C --> C DE B HE A (C's B and A). For want of other
reliable means, this strategy in most cases leads to unexpectedly satisfactory
results.
The English sentence "They washed and polished the table" may be translated
into Chinese in several ways:
1) TA MEN XI LE QIE CA LE ZHUO ZI
(he -s wash pst and polish pst table noun-suffix)
2) TA MEN XI QIE CA LE ZHUO ZI (with the first particle LE omitted)
3) TA MEN XI LE、CA LE ZHUO ZI (using DUNHAO instead of QIE)
4) TA MEN XI、CA LE ZHUO ZI (in the way of both 2) and 3))
5) TA MEN XI CA LE ZHUO ZI (with first LE and conjunction QIE omitted)
6) TA MEN XI LE, CA LE ZHUO ZI (using comma instead of QIE)
Only sentence 1) is ambiguous, in the same way as the original sentence.
Sentences 2), 3), 4) and 5) mean that they washed the table and polished it
too. Sentence 6) corresponds to the second reading, that they washed
(themselves or something other than the table) and polished the table. To
keep both readings, the comma in 6) should be replaced by the conjunction QIE
or ERQIE as in 1); to get the first reading only, the comma should be replaced
by a DUNHAO as in 3). Here lies the slight difference in usage between these
coordinating devices. Sentence 2) is disambiguated because the omission of the
first LE makes the remaining LE necessarily modify both coordinated verbs, thus
eliminating the possibility of the second reading.
4.5 Subjects and objects
Although the Chinese subject cannot follow its predicate verb, the object can
often be placed before its governor (also see 2.2.1). Since most transitive
verbs can also be used intransitively, the pattern N+V can be analysed as S-V
or O-V, e.g. JI (chicken) CHI (eat) LE (pst).
To further look into the matter of subject, object and their governor verb,
we'll cite some sample sentences on the permutation of a noun JI, a pronoun
WO, and a verb CHI.
1) WO CHI LE JI. (I ate the chicken)
2) JI WO CHI LE. (The chicken I ate)
In this case, if CHI is not a transitive verb, JI shall be its TOP or CirA,
otherwise JI is its object. WO, as a pronoun in the above position, can only
function as SUB.
3) JI CHI LE WO. (The chicken ate me)
Although contradictory to common sense, 3) can only be interpreted in this way
in written Chinese. In spoken Chinese, JI might be taken as the object and WO
as the subject of CHI; in that case the sentence should be written with a comma
before WO, as JI CHI LE, WO.
We assume that in any given situation a certain realization of an idea, i.e.
a certain sentence, is generated under the influence of several factors, one
of which must be decisive. The problem is that it is quite difficult to
determine which factor plays the decisive role, and under what conditions, in
the generation or analysis of a sentence. In the pattern N1 (or pronoun) +
transitive V + N2 (or pronoun), it is syntax, not semantics, world knowledge
or anything else, that defines N1 as subject and N2 as object of V, as in 1)
and 3). The pattern N + pronoun + transitive V in 2) is also syntax-bound,
with O-S-V as its only possible structure. But the decisive factors that help
to select the right reading from the four possible interpretations of 4)
below are, respectively, common sense, background knowledge, syntax and
statistics. (see following, and also 2.1)
4) WO JI CHI LE. (I ate the chicken; the chicken ate me;
my chicken ate; my chicken was eaten)
CHI CHI CHI CHI
SUB/ OBJ/ \ AspA OBJ/ SUB/ \ AspA SUB / \ AspA OBJ / \ AspA
WO JI LE WO JI LE JI LE JI LE
AtrA / AtrA /
WO WO
The second reading can be eliminated by common sense. The probability of the
first reading is not high unless the sentence appears in a parallel structure
such as WO JI CHI LE, DAN (but) YA (duck) MEI (not) CHI (I ate the chicken,
but didn't eat duck). In that case there are formal clues, so syntax works
effectively. Outside parallel structures, linguistic statistics must be taken
into consideration. As for the third and fourth readings, background knowledge
or context will help.
5) CHI LE WO JI. ((Someone) ate my chicken)
In the pattern transitive V + personal pronoun + N, syntax allows only one
structure, V-(attribute)-O.
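The three syntax-bound patterns of this section can be collected in a small look-up function. The Python sketch below is an illustration only: syntax_bound_roles is a hypothetical name, the category codes 'N' (noun), 'D' (pronoun) and 'Vt' (transitive verb) are assumptions made for the example, and the aspect particle LE is left out. The function returns None when syntax alone does not decide, as for WO JI CHI LE in 4).

```python
def syntax_bound_roles(cats):
    """Map a three-word category pattern to roles, or None if ambiguous.

    cats is a tuple such as ('D', 'Vt', 'N'), where 'N' is a noun, 'D' a
    pronoun and 'Vt' a transitive verb (codes assumed for this sketch).
    """
    nominal = {"N", "D"}
    if len(cats) == 3:
        a, b, c = cats
        if a in nominal and b == "Vt" and c in nominal:
            return ("SUB", "V", "OBJ")   # as in 1) WO CHI LE JI and 3)
        if a == "N" and b == "D" and c == "Vt":
            return ("OBJ", "SUB", "V")   # as in 2) JI WO CHI LE
        if a == "Vt" and b == "D" and c == "N":
            return ("V", "AtrA", "OBJ")  # as in 5) CHI LE WO JI
    return None
```

Cases for which the function returns None are the ones where, according to the text, common sense, background knowledge or statistics must take over.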
4.6 Modifier chains
The ambiguity in modifier chains seems particularly serious because Chinese
attributes are all put before the noun governor. The more constituents there
are in the chain, the more readings result, and the number grows
exponentially, especially for attributes in the form of a DE-phrase
(incidentally, according to computer statistics, the particle DE is the most
frequent word in Chinese, its frequency being much higher than that of the
second most frequent word). In the pattern N1 DE N2 DE N3 DE ... Nn, N1+DE can
syntactically modify any of the following nouns, although the probability of
the readings varies greatly.
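The growth in the number of readings for the chain N1 DE N2 DE ... Nn can be made concrete by brute-force enumeration. The Python sketch below rests on two assumptions that are not stated in the text: every noun except the last attaches to some later noun, and dependency branches may not cross. The function name readings is hypothetical.

```python
from itertools import product


def readings(n):
    """Enumerate the readings of a chain N1 DE N2 DE ... Nn.

    Each reading is a tuple giving, for the nouns 1..n-1, the position of
    the noun they modify. Assumptions of this sketch: a noun modifies a
    later noun, and dependency branches do not cross.
    """
    results = []
    choices = [range(i + 1, n + 1) for i in range(1, n)]
    for heads in product(*choices):
        arcs = list(zip(range(1, n), heads))
        # Two branches (a, b) and (c, d) cross when a < c < b < d.
        if not any(a < c < b < d for a, b in arcs for c, d in arcs):
            results.append(heads)
    return results
```

Under these assumptions a chain of three nouns has 2 readings, a chain of four has 5 and a chain of five has 14, which illustrates how quickly the number of trees rises with the length of the chain.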
However, there are some special characteristics of Chinese grammar which can
eliminate some of the readings concerning modifier chain ambiguity. Compare
the following examples:
1) NIANQING DAIFU DE PENGYOU (the young doctor's friend)
(young doctor 's friend )
2) NIANQING DE DAIFU DE PENGYOU (1. the young doctor's friend;
2. the young friend of the doctor)
3) YONGGAN DE MEILI DE NU BING (brave pretty woman soldiers)
(brave pretty female soldier)
4) JUYOU WEIDA YIYI DE GEMING DE CHENGGONG
(have great significance revolution 's success)
(1. the revolution's success with great significance;
2. the success of (the revolution with great significance))
5) FULAN MUTOU DE QIAO (the bridge of rotten wood)
(rotten wood bridge)
6) FULAN DE MUTOU DE QIAO (1. the bridge of rotten wood
2. the rotten wooden bridge)
7) PIAOLIANG MU QIAO (the beautiful wooden bridge)
(beautiful wood bridge)
Examples 1) and 2), and likewise 5) and 6), differ in that the first attribute
in 1) and 5) is a bare adjective, whereas in 2) and 6) it is a DE-phrase
(adjective + DE). Chinese adjectives can modify the following noun both
directly and indirectly (with the help of DE), but DE-phrase attributes seem
much freer and less restricted. 3) is not ambiguous because an adjective is
generally not allowed to modify another adjective. 4) is in the pattern VP +
DE + N1 + DE + N2, which can always be syntactically interpreted either as
(VP + DE + N1) + DE + N2 or as VP + DE + (N1 + DE + N2). In 7) the adjective
PIAOLIANG can by no means modify the immediately following noun MU, because
the number of syllables in an immediate constituent affects structure. The
tendency for a 2-syllable constituent to be directly related to another
2-syllable word, and never to a single-syllable word, first combines MU and
QIAO into a 2-syllable word-like unit, which is then modified by the
2-syllable adjective PIAOLIANG (see 0.1.3 in "A Dependency Syntax of
Contemporary Chinese", p.3).
4.7. Word syntax vs. sentence syntax
One of the important features of Chinese grammar is that word syntax is
essentially identical to sentence syntax. In fact, there is no sharp boundary
between a word and a morpheme. What is worth mentioning here is that word
syntax embodies more elements of ancient Chinese. (for details and examples,
see 1.0 in "A Dependency Syntax of Contemporary Chinese", pp.4-5).
4.8. Antecedents of relative pronouns
There are no relative pronouns in Chinese.
4.9. Ellipsis
For the so-called elliptical sentence JI CHI LE, see 4.5.
Chinese has few elliptical sentences of the kind found in Indo-European
languages, such as "Hans liebt Anna, und Peter auch", "Peter claims that Paul
likes beer and Sandy does too" or "Sam sent Pam to Mary and Paul to Sara". It
therefore becomes a tough problem for the machine to recover properly what has
been omitted in the original sentences when our system translates them into
Chinese. (see the doctoral dissertation by Xiuming HUANG, U.S.)
As for the sentence "The professor would claim that the students will go on
strike, if necessary", only some so-called Europeanized Chinese sentences,
i.e. the sentences greatly affected in their structures by Indo-European
languages, will have the same problem.
4.10. Long distance dependencies
Problems of this sort are not found in Chinese because the word order of a
Chinese interrogative sentence is the same as that of other sentence types.
5. Style
1) Compared with written Chinese, spoken Chinese seems somewhat freer in
syntax. For example, although we state in 2.2.1 that the pattern O-V-S does
not exist in Chinese, this is true only of written Chinese; in spoken Chinese
one can easily hear such a sentence as "DI (floor) SAO (sweep) LE (pst.) ME
(chu) NI (you) ?" (Have you swept the floor?). Besides, the three very
important particles DE, DE2 and DE3 are pronounced exactly the same in spoken
Chinese. But we shall deal only with written Chinese in the form of
"informative texts".
2) There are 8 major dialects and hundreds, or even thousands, of subdialects
in Chinese. Each dialect has, to a greater or lesser degree, its own
characteristics in phonetics, vocabulary and syntax. But we shall deal only
with PUTONGHUA (standard Chinese).