A Dependency Syntax of Contemporary Chinese (3/3)

http://homepage.mac.com/liwei999/Publications_PDF/YF3.txt

3. Dependency Patterns for Wordcategories: ten elementary tree structures 

Dependency patterns are represented by the ten elementary tree structures 
respectively, which are derived from the above list in Section 2.

These trees display the kinds of dependants that a word of a given 
wordcategory can govern. There can be two or more dependants of the same 
type. Adjuncts can simply be repeated: there can, for example, be two or more 
adverbial adjuncts in a sentence, although the elementary tree shows just one 
branch for such a dependant, which stands for all possible instances of the 
same branch type. Most complements, by contrast, cannot be repeated in this 
way; their number is fixed at one per complement type.

In the following figures, the node represents a word from the given 
wordcategory, while the branches are labelled with the names of the possible 
dependants. Note that these trees do not contain valency information. If a 
verb is said to be able to govern nine complements, this is a statement about 
the maximal governing capacity of verbs. A particular verb may allow fewer 
complements, and, moreover, some of these complements can be optional. These 
two facts form part of the valency information of the word.

Note: there are two categories of words, adverbs and interjections, that 
cannot govern anything.
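
The ten elementary trees can also be written down as a plain lookup table. 
The following is a minimal sketch in Python; the dependant labels are copied 
from the figures in 3.01 to 3.10, while the names ELEMENTARY_TREES and 
can_govern are illustrative assumptions and not part of the syntax itself. 
The X-C and Y branches of the conjunction are left out because they are not 
fixed labels.

```python
# Government table transcribed from the elementary trees of Section 3.
# Adverbs and interjections govern nothing, so both of their lists are empty.
ELEMENTARY_TREES = {
    "Verb": {
        "complements": ["SUB", "OBJ", "OBJ2", "SUBOB", "SOC", "PC", "LCV", "VqC", "BaC"],
        "adjuncts": ["AdvA", "VA", "PMOD", "TOP", "VCoA", "AspA", "CirA", "ZyqA", "BeiA"],
    },
    "Adjective": {
        "complements": ["SUB", "PC"],
        "adjuncts": ["AdvA", "PMOD", "TOP", "ACoA", "CirA", "ZyqA"],
    },
    "Noun": {
        "complements": ["SUB", "MnC", "NC", "LCN"],
        "adjuncts": ["AtrA", "DetA", "NCoA", "ZyqA"],
    },
    "Pronoun": {"complements": ["MnC"], "adjuncts": ["AppA"]},
    "Preposition": {"complements": ["CP"], "adjuncts": []},
    "Postposition": {"complements": ["CW"], "adjuncts": []},
    "Numeral": {"complements": ["DiC"], "adjuncts": ["SA"]},
    "Classifier": {"complements": [], "adjuncts": ["LA"]},
    "Particle": {"complements": ["SUB", "CDe", "CDe2", "CDe3"], "adjuncts": []},
    # X-C and Y are omitted: they take their labels from the context.
    "Conjunction": {"complements": ["CC"], "adjuncts": []},
    "Adverb": {"complements": [], "adjuncts": []},
    "Interjection": {"complements": [], "adjuncts": []},
}


def can_govern(category, label):
    """Return True if a word of `category` may govern a dependant `label`."""
    tree = ELEMENTARY_TREES[category]
    return label in tree["complements"] or label in tree["adjuncts"]


print(can_govern("Verb", "BaC"))    # a verb may govern a BaC complement
print(can_govern("Adverb", "SUB"))  # an adverb governs nothing
```

Such a table states only the maximal governing capacity of a category; the 
valency of an individual word would have to be stored separately.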

3.01  Elementary dependency tree for verbs

1)

                                                Verb 
            /   /    /    /    /   /  /   /  /  
          /   /    /    /    /   /  /   /   /   
        /   /    /    /    /   /  /   /   /    
      SUB OBJ OBJ2 SUBOB SOC PC LCV VqC BaC    
                 complements

                           Verb 
                               \  \  \   \   \    \    \    \    \
                                \  \   \   \   \    \    \    \    \
                                 \  \    \   \   \    \    \    \    \
                                AdvA VA PMOD TOP VCoA AspA CirA ZyqA BeiA
                                        adjuncts

Basic Constituent Order: CirA<->TOP/(OBJ)-SUB-(PC)/BaC<->AdvA-BeiA--V--VCoA-
                         AspA-(PC)-(OBJ)/SUBOB-SOC-VqC-PMOD/LCV-VA-ZyqA

Note: A<->B indicates that A precedes B more often than B precedes A. A/B 
shows that A and B cannot co-occur syntactically, so the question of their 
relative order does not arise. (A)--governor--(A) signifies that A can be put 
either before or after the governor: for OBJ, the unmarked order is V--OBJ 
with OBJ--V as its transformation; for PC, the order is decided by the 
valency of the verb governor.
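
The order notation is regular enough to be read mechanically. The sketch 
below, in Python, splits an order formula at the governor and collects the 
labels on each side; the function name parse_order and the returned field 
names are assumptions made for illustration. It does not interpret the 
operators <-> and /, and it would mis-split the hyphenated label X-C used in 
3.10.

```python
import re

VERB_ORDER = (
    "CirA<->TOP/(OBJ)-SUB-(PC)/BaC<->AdvA-BeiA--V--"
    "VCoA-AspA-(PC)-(OBJ)/SUBOB-SOC-VqC-PMOD/LCV-VA-ZyqA"
)


def parse_order(spec):
    """Split a Basic Constituent Order formula into its two sides.

    Returns the labels before and after the governor and the set of
    labels written as (A) on both sides, i.e. those that may stand
    either before or after the governor.
    """
    spec = "".join(spec.split())             # drop spaces and line breaks
    pre, governor, post = spec.split("--")   # "--" surrounds the governor

    def labels(side):
        return re.findall(r"[A-Za-z0-9]+", side)

    def bracketed(side):
        return set(re.findall(r"\(([A-Za-z0-9]+)\)", side))

    return {
        "pre": labels(pre),
        "governor": governor,
        "post": labels(post),
        "movable": bracketed(pre) & bracketed(post),
    }


parsed = parse_order(VERB_ORDER)
print(parsed["governor"], sorted(parsed["movable"]))
```

For the verb formula the movable set contains exactly OBJ and PC, the two 
dependants that the note above describes as able to stand on either side.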

It is widely believed that the basic word order of Chinese is S-V-O, which is 
only partly right. In fact, the position of the object is much freer than 
commonly assumed, although the subject almost always precedes the predicate 
verb. The permutations of S, V and O are: 1. S-V-O; 2. S-O-V; 3. V-S-O; 
4. V-O-S; 5. O-S-V; 6. O-V-S. Patterns 3, 4 and 6 do not occur in standard 
written Chinese. (Yet in spoken Chinese one can easily hear a sentence such 
as "地DI (floor) 扫SAO (sweep) 了LE (pst.) 吗ME (chu), 你NI (you) ?" (Have 
you swept the floor?), which follows pattern 6.) The possible variations of 
the basic pattern are listed below with sample sentences: 

                              Basic pattern: SVO
Variation  Sample sentence                       Remarks
SOV        我 南京    去 过,  上海     没  去。  SOV is often present in 
          (I  Nanjing go ?    Shanghai not go)   parallel structures, i.e.
     (I have been to Nanjing, never to Shanghai) compound sentences.

OSV        南京 我 去 过。                       OSV is far more often 
          (I have been to Nanjing)               used than SOV. 
                                                 
Then in the surface sequence N1+N2+V, how do we know whether it is in the form 
SOV  or OSV?  The decisive factor seems to come from semantic analysis  rather 
than syntactic analysis. (also see 5.4.3)
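
To make the point concrete, here is a toy sketch in Python of how a semantic 
feature could separate SOV from OSV in the sequence N1+N2+V. The use of 
animacy as the deciding feature and the small ANIMATE lexicon are assumptions 
made purely for illustration; the text above does not say which semantic 
information the analysis should use.

```python
# Hypothetical semantic lexicon: True means the noun can act as an agent.
ANIMATE = {"我": True, "南京": False, "上海": False}


def classify_n1_n2_v(n1, n2):
    """Guess whether N1+N2+V is SOV or OSV from animacy alone."""
    first, second = ANIMATE[n1], ANIMATE[n2]
    if first and not second:
        return "SOV"        # the animate noun comes first: subject, object
    if second and not first:
        return "OSV"        # the animate noun comes second: object, subject
    return "ambiguous"      # animacy alone cannot decide


print(classify_n1_n2_v("我", "南京"))   # 我 南京 去 过
print(classify_n1_n2_v("南京", "我"))   # 南京 我 去 过
```

When both nouns have the same value, the sketch reports the sequence as 
ambiguous, which is the situation in which further semantic analysis would be 
needed.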

3.02  Elementary dependency tree for adjectives

                Adjective 
            /      / \      \     \      \      \      \
          /      /     \      \     \      \      \      \  
        /      /         \      \     \      \      \      \
      /      /             \      \     \      \      \      \ 
   SUB      PC               AdvA   PMOD  TOP    ACoA   CirA   ZyqA 
   complements                             adjuncts
  
Basic Constituent Order: CirA<->TOP-SUB-(PC)-AdvA--A--ACoA-(PC)-PMOD-ZyqA

3.03  Elementary dependency tree for nouns

                                  Noun    
            /      /      /      /   \      \      \      \
         /      /      /      /        \      \      \      \
      /      /      /      /             \      \      \      \
  SUB     MnC     NC     LCN             AtrA   DetA   NCoA   ZyqA 

         complements                             adjuncts
 
Basic Constituent Order:  SUB-DetA/LCN-AtrA--N--NC-MnC-NCoA-ZyqA

3.04  Elementary dependency tree for Pronouns

                                  Pronoun
                                  /     \ 
                                /         \
                             MnC           AppA             
                          complement      adjunct

Basic Constituent Order:  AppA--D--MnC

3.05 Elementary dependency tree for Prepositions

                                Preposition       
                                  /
                                /   
                              CP         
                          complement
   
Basic Constituent Order:  P--CP
 
3.06 Elementary dependency tree for Postpositions

                                 Postposition 
                                    /
                                  /
                               CW                       
                           complement
  
Basic Constituent Order: CW--W

3.07 Elementary dependency tree for Numerals

                                   Numeral
                                   /    \
                                /          \
                              DiC           SA
                           complement     adjunct
 
Basic Constituent Order:  DiC--S--SA  

3.08 Elementary dependency tree for Classifiers

                                   Classifier
                                         \
                                           \
                                             LA
                                           adjunct

Basic Constituent Order:  LA--L

3.09 Elementary dependency tree for Particles

                                  Particle
                      /   /   /    /
                    /   /   /    /
                 SUB CDe CDe2 CDe3                                            
                    complements
    
Basic Constituent Order:  SUB-CDe/CDe2--Z--CDe3

3.10 Elementary dependency tree for Conjunctions

                              Conjunction
                         /       /    \      \
                       /       /        \       \
                    CC      X-C          X-C      Y
                complement

Basic Constituent Order:  (Y)-X-C--C--X-C-CC-(Y)

The two dependants marked X-C represent any dependants that can be 
coordinated. In this case the coordinating conjunction takes the syntactic 
label of the branch it depends on and copies it, with the suffix -C, to the 
two coordinated dependants. A dependant that depends on the two coordinated 
items as a whole can be added as Y. A conjunction can govern either the two 
dependants marked X-C plus any number of Y's, as defined by the X-C's, or a 
Complement of Conjunction (CC). Please refer to Section 4, "Sample Trees", 
for illustrations of the above; for details about the role of conjunctions, 
see 2.3.6 in "Syntactic Tree Structures in DLT" (Schubert, 1986).
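
The label-copying behaviour of the coordinating conjunction can be sketched 
as a small Python function. The node layout (a dict with "word", "label" and 
"children") and the function name coordinate are assumptions made for this 
illustration only; the example call reproduces the OBJ coordination of sample 
4.01, where the comma acts as the conjunction.

```python
def node(word, label, children=()):
    """Build a tree node; the dict layout is an assumption of this sketch."""
    return {"word": word, "label": label, "children": list(children)}


def coordinate(conjunction, label, left, right, shared=()):
    """Attach two coordinated words under a conjunction.

    The conjunction takes `label` from the branch it depends on, and both
    coordinated dependants receive that label with "-C" appended. `shared`
    holds optional (word, label) pairs for Y dependants that belong to the
    coordination as a whole.
    """
    children = [node(left, label + "-C"), node(right, label + "-C")]
    children += [node(word, y_label) for word, y_label in shared]
    return node(conjunction, label, children)


tree = coordinate(",", "OBJ", "东西", "事情")
print([(child["word"], child["label"]) for child in tree["children"]])
```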

4. Sample Trees         

4.01 每D 样L  东西N , 每D 件L  事情N,   由P 谁D  管V,  怎么F 管V,  都F 落实V 
     MEI YANG DONGXI, MEI JIAN SHIQING, YOU SHUI GUAN, ZENME GUAN, DOU LUOSHI 
到P 每D 个L 人N 头N 上W 。
DAO MEI GE  REN TOU SHANG.
                               
                                        落实LUOSHI
                        SUB  /   AdvA /     \ PC
                           ,        都DOU     到DAO
         OBJ /     SUB-C /  \ SUB-C             \ CP
           ,          管GUAN  管GUAN              上SHANG
    OBJ-C / \ OBJ-C     \ BeiA  \ AdvA             \  CW
  东西DONGXI 事情SHIQING 由YOU   怎么ZENME           头TOU
  DetA /       \ LCN        \ CP                    / AtrA
     样YANG     件JIAN       谁SHUI              人REN
 LA /             \ LA                         DetA \
  每MEI            每MEI                              个GE
                                                     LA \
                                                         每MEI
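
As a cross-check, the PC branch of the tree above can be tested against the 
elementary trees of Section 3. The following Python sketch is only an 
illustration: the tuple layout and the names LICENSED, BRANCH and unlicensed 
are assumptions, and only the five governor categories that occur on this 
branch are listed.

```python
# Labels each governor category may assign (copied from Section 3).
LICENSED = {
    "V": {"SUB", "OBJ", "OBJ2", "SUBOB", "SOC", "PC", "LCV", "VqC", "BaC",
          "AdvA", "VA", "PMOD", "TOP", "VCoA", "AspA", "CirA", "ZyqA", "BeiA"},
    "P": {"CP"},
    "W": {"CW"},
    "N": {"SUB", "MnC", "NC", "LCN", "AtrA", "DetA", "NCoA", "ZyqA"},
    "L": {"LA"},
}

# (governor, governor category, label, dependant), read off tree 4.01.
BRANCH = [
    ("落实", "V", "PC", "到"),
    ("到", "P", "CP", "上"),
    ("上", "W", "CW", "头"),
    ("头", "N", "AtrA", "人"),
    ("人", "N", "DetA", "个"),
    ("个", "L", "LA", "每"),
]


def unlicensed(arcs):
    """Return the arcs whose label the governor's category cannot assign."""
    return [arc for arc in arcs if arc[2] not in LICENSED.get(arc[1], set())]


print(unlicensed(BRANCH))  # an empty list means every arc is licensed
```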

4.02  她D 看V 了Z 看V 表N,  计算V  着Z 乘V   哪D 一S 路L 汽车N 快A,  什么D 
      TA  KAN LE  KAN BIAO, JISUAN ZHE CHENG NA  YI  LU  QICHE KUAI, SHIME 
时候Nt 可以V 赶V 到P 幼儿N 园N,  什么D 时候Nt 可以V 抱V 着Z 女儿N 赶V 到P 家N。
SHIHOU KEYI  GAN DAO YOUER YUAN, SHIME SHIHOU KEYI  BAO ZHE NUER  GAN DAO JIA.

                            看V
         SUB / AdvA / PMOD / \ OBJ \ PMOD
           她D   了Z   看V'  表N    计算V
                                OBJ /    \ AspA
                                   ,C     着Z
                           OBJ-C /      \ OBJ-C
                             快A          ,C
                      SUB /       OBJ-C /       \ OBJ-C
                       乘V           可以Vz     可以Vz
                   OBJ /        OBJ /  \ AdvA AdvA /   \ OBJ
                    汽车N        赶V   时候Nt 时候Nt     赶V
                LCN /         PC / DetA /  DetA /    AdvA /  \ PC
                 路L           到P  什么D   什么D     抱V     到P
          LA / LA /        CP /                    AspA / \ OBJ  \ CP
           哪D 一S        园N                       着Z   女儿N    家N
                      AtrA /   
                      幼儿N

4.03 我D 这D 时Nt 又F 忽然F 想V   起Z, 小A  林Nz 要V 我D 给P 他D 买V 一S 本L 
     WO  ZHE SHI  YOU HURAN XIANG QI,  XIAO LIN  YAO WO  GEI TA  MAI YI  BEN 
书N, 刚才F   在P 书N 店N  里W 忘V  了Z 问V 了Z。 
SHU, GANGCAI ZAI SHU DIAN LI  WANG LE  WEN LE.

                                         ,C
                    -C /                                     \ -C
                   想V                                       忘V
SUB / AdvA / AdvA /  \ AdvA  \ VqC      \ OBJ      AdvA/ AdvA/\AspA \OBJ \ZyqA 
 我D    时Nt    又F    忽然F   起Z          要V    刚才F 在P   了Z   问V  了Z
   DetA /                        SUB / SUBOB/ \SOC         \ CP  
     这D                        林N      我D   买V           里W 
                           AtrA /         AdvA / \ OBJ         \ CW
                             小A           给P    书N            店N
                                      CP /          \ LCN          \ AtrA
                                      他D             本L            书N  
                                                        \ LA
                                                          一S      

4.04  但C 那D 时Nt 我D 在P 上海Nz   也F 有V 一S 个L 惟一A 的Z 不但C 敢V 于P 
      DAN NA  SHI  WO  ZAI SHANGHAI YE  YOU YI  GE  WEIYI DE  BUDAN GAN YU  
随便A   谈V 笑V,  而且C 还F 敢V 于P 托V 他D 办V 点D  私A 事N 的Z 人N, 那D 就F 
SUIBIAN TAN XIAO, ERQIE HAI GAN YU  TUO TA  BAN DIAN SI  SHI DE  REN, NA  JIU 
是V 送V  书N 去C 给V 白莽Nz  的Z 柔石Nz。
SHI SONG SHU QU  GEI BAIMANG DE  ROUSHI.

                                           ,C
                                 -C /              \ -C
                                有V                  是V
 CirA / AdvA /  SUB / AdvA /      \ AdvA \ OBJ   SUB / \ AdvA \ OBJ
 但C    时Nt   我D     在P         也F      人N    那D   就F     柔石N
   DetA /          CP /          DetA /AtrA / \ AtrA         AtrA /
    那D           上海N            个L   的Z   的Z             的Z
                                LA /  CDe /      \ CDe           \ CDe
                                一S  惟一A        ,C               送V
                                         CDe-C /     \ CDe-C  OBJ /   \ VA 
                                         不但Cdp     而且Cdp  书N      去C  
                                         CC /           \ CC       CC /
                                         敢V             敢V       给
                                      PC /         AdvA /  \ PC      \ OBJ
                                       于P           还     于P       白莽
                                    CP /                      \ CP
                                    谈V                       托V
                               AdvA /  \ VCoA          SUBOB /    \ SOC
                               随便     笑              他D       办V
                                                             OBJ /
                                                              事N
                                                        Det /  \ AtrA
                                                         点D   私A

4.05  胶N  合V 板N 是Vs 把P 原木N  旋切V   或C 刨切V  成P   单A 片N  薄A 板N, 
      JIAO HE  BAN SHI  BA  YUANMU XUANQIE HUO PAOQIE CHENG DAN PIAN BO  BAN,
经过V   干燥A 、涂V 胶N,  并C  按P 木材N 纹理N 方向N     纵A  横A  交错V   
JINGGUO GANZAO、TU  JIAO, BING AN  MUCAI WENLI FANGXIANG ZONG HENG JIAOCUO 
相F   叠V, 在P 加V 热A 或C 不F 加V 热A 的Z 条件N    下W 压制V 而C 成V   的Z 
XIANG DIE, ZAI JIA RE  HUO BU  JIA RE  DE  TIAOJIAN XIA YAZHI ER  CHENG DE 
一S 种L   板材N。
YI  ZHONG BANCAI.
                                          是Vs
                              SUB /                  \ OBJ
                               板N                     板材N
                         AtrA /                   AtrA /  \ DetA
                           合V                      的Z    种L
                      SUB /                   CDe /            \ LA
                       胶N                   并C                 一S
                               CDe-C /                \ CDe-C
                                ,C                         ,C
                    CDe-C /            \ CDe-C     CDe-C /   \  CDe-C
                  或C                  经过V         交错V         而C
   BaC  / CDe-C /  \ CDe-C \ PMOD  OBJ /   AdvA/AdvA /  \ VA CDe-C / \ CDe-C
    把P    旋切V  刨切V    成P      、C   按P    横A    叠V    压制V  成V
 CP /                 CP /   OBJ-C / \ OBJ-C \ CP  \ ACoA \ AdvA \ AdvA
原木N                板N       干燥A  涂V    方向N  纵A    相F    在P
                 AtrA / \ AtrA    OBJ /        \ AtrA         CP /
                  片N   薄A       胶N          纹理N         下W
             AtrA /                       AtrA /           CW /
              单A                         木材N           条件N
                                                     AtrA /
                                                       的Z
                                                  CDe /
                                                   或C
                                           CDe-C /   \ CDe-C
                                              加V     加V
                                        PMOD /  AdvA /  \ PMOD
                                          热A     不F    热A

-----------------------------------------------------------------------------
Note: these sample sentences are taken from "800 Words in Contemporary 
Chinese" by Lu Shuxiang (1981).

5. Some Issues on Establishing a Chinese Formal Syntax

5.1  Syntactic model and semantic model

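    A language model comprises at least two major parts, a syntactic model 
and a semantic model. Form and content are two inseparable aspects of the 
same thing, so some researchers advocate carrying out syntactic and semantic 
analysis simultaneously and building a model that unifies syntax and 
semantics. Whether the two analyses should be kept separate or merged, each 
choice seems to have its advantages and drawbacks. Separation is clean and 
modular and favours a pure and abstract model, but when implemented on a 
computer it may lead to combinatorial explosion. Merged processing is 
compact, with low overhead and high efficiency, and it avoids some repeated 
look-ups, but it makes higher demands on the software, and the model itself 
becomes bloated. The present model is a formal syntactic model. It is 
intended first of all to serve the Dutch DLT multilingual machine translation 
system, which adopts the strategy of separating syntax and semantics, but it 
also leaves room for extension so that it can be applied in our JFY-IV 
system, which analyses syntax and semantics simultaneously.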

5.2  Explicit forms and implicit forms

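    The starting point for building a formal grammar is, of course, 
linguistic form. But what exactly is form? For written language, a sentence 
is a rule-governed string of characters, so its form can only consist of the 
characters (the shapes of characters, words and idioms) and the order among 
them (character order, word order, phrase order). Examining the former, we 
find that the words of every human language fall into two large classes. One 
is the closed class, usually called function words, which occur with high 
frequency and are limited in number; the other is the open class, to which 
words are continually added and from which words are dropped, and which is 
hard to enumerate. The closed class is easy to handle: its literal forms 
(character shapes, word shapes) are the clearest markers of syntactic form. 
The literal forms of the open class are of course forms too, and can be used 
when necessary (for example in processing idioms), but they are too numerous 
for an abstract model to be built by enumeration. Inflecting languages can 
find some formal markers in their various inflections, mainly endings, which 
are easy to recognize and can be enumerated. A language that lacks 
inflection, such as Chinese, has no such convenience. However, it is almost 
impossible to build an abstract formal syntactic model by relying solely on 
explicit forms, namely closed-class literal forms, word order and inflection, 
and this is true even of the most richly inflected human languages known so 
far. Inflection is only an outward manifestation of the inherent 
combinatorial properties of words, and since these properties are many and 
varied, even the richest inflection can express only part of them. A formal 
grammar therefore has to resort to what may be called implicit form, that is, 
the formal classification of words, especially open-class words. Formal 
classification means classification according to the syntactic combinatorial 
capacity of words, such as the division into major classes like verbs and 
nouns, or into subclasses like mono-transitive and ditransitive verbs, and so 
on.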

5.3  Formal analysis and semantic analysis

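    It should be pointed out that relying on form alone, whether explicit or 
implicit, is still not enough to achieve an analysis entirely free of 
structural ambiguity. Syntactically ambiguous structures are a pervasive 
phenomenon in language, all the more so in languages lacking inflection. The 
parser must therefore be allowed to output several syntactic trees, which are 
left to be filtered by later analyses, mainly semantic analysis. On the basis 
of this syntax, one can build a dictionary containing the various formal 
classifications of words, together with a base of syntactic rules; then, 
using Augmented Transition Network (ATN) software, Chinese sentences can be 
analysed automatically, yielding one or more legitimate syntactic trees 
labelled with dependency relations. Such trees are the input to the 
subsequent semantic analysis. For example: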

     总之F,   我D 们Z 的Z 工作N/V  成绩N   很F 大A。 
     ZONGZHI, WO  MEN DE  GONGZUO  CHENGJI HEN DA.
     In a word, our working achievements are great.
 
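    (1)                                                    大DA
             sentence adverbial /      topic /      subject /      \ adverbial
                    总之ZONGZHI    工作GONGZUO    成绩CHENGJI        很HEN
                               attribute /
                                    的DE
                       DE-complement /
                                我WO
                                   \ plural
                                    们MEN

    (2)                                      大DA
             sentence adverbial /      subject /      \ adverbial
                    总之ZONGZHI    成绩CHENGJI        很HEN
                           attribute /      \ attribute
                                的DE        工作GONGZUO
                   DE-complement /
                            我WO
                               \ plural
                                们MEN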
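
The difference between the two analyses can be made explicit by writing each 
tree as a set of dependency arcs and comparing the sets. The Python sketch 
below does this under two assumptions that are mine rather than the paper's: 
the relation names are rendered as plain English glosses, and in tree (1) the 
particle 的 is read as attached to 工作.

```python
# Each arc is (dependant, relation, governor).
TREE_1 = {
    ("总之", "sentence adverbial", "大"),
    ("工作", "topic", "大"),
    ("成绩", "subject", "大"),
    ("很", "adverbial", "大"),
    ("的", "attribute", "工作"),
    ("我", "DE-complement", "的"),
    ("们", "plural", "我"),
}

TREE_2 = {
    ("总之", "sentence adverbial", "大"),
    ("成绩", "subject", "大"),
    ("很", "adverbial", "大"),
    ("的", "attribute", "成绩"),
    ("工作", "attribute", "成绩"),
    ("我", "DE-complement", "的"),
    ("们", "plural", "我"),
}

only_in_1 = TREE_1 - TREE_2   # arcs specific to reading (1)
only_in_2 = TREE_2 - TREE_1   # arcs specific to reading (2)

print(sorted(only_in_1))
print(sorted(only_in_2))
```

Only the attachments of 工作 and 的 differ between the two readings; all other 
arcs are shared, so a later semantic filter would have to choose between 
exactly these two pairs of arcs.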

5.4 Chinese word order

6. BIBLIOGRAPHY

1. Lu Shuxiang (1981): "800 Words in Contemporary Chinese", Beijing: Shangwu.

2. Liu, Zhuo; Fu, Aiping & Li, Wei (1989) JFY-IV Machine Translation
System, In Proceedings of MT SUMMIT II, pp.88-93, Munich.

3. Lucien Tesniere (1959): "Elements de Syntaxe Structurale", Paris: 
Klincksieck.

4. Klaus Schubert (1986): "Syntactic Tree Structures in DLT", published 
by BSO/Research, Utrecht.

5. Bieke van der Korst (1986): "A Dependency Syntax for English", BSO/DLT Research Report, Utrecht. 

6. Engel, Ulrich (1982): "Syntax der deutschen Gegenwartssprache", Berlin: 
Schmidt.

                        -- END --

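APPENDIX I: ABSTRACT (translated from Chinese) 

    This paper is an attempt at a systematic study of the grammar of 
Contemporary Chinese on the basis of Tesniere's dependency theory. Machine 
processing of natural language generally goes through four steps: linguistic 
theory --> language model --> algorithm design --> program implementation. 
This paper belongs to the second stage. A language model comprises at least a 
syntactic model and a semantic model, or it may be a unified model of syntax 
and semantics. This paper provides a formal syntactic model for describing 
the structure (hierarchy and relations) of Chinese.

    The model divides Chinese words into 12 major classes and a number of 
subclasses and, using these classes together with closed-class words and word 
order, formally defines 36 dependency relations of written Contemporary 
Chinese, of which 20 are complements and 16 are adjuncts. On the basis of 
this syntax, one can build a dictionary containing the various formal 
classifications of words, together with a base of syntactic rules; then, 
using an Augmented Transition Network (ATN), Chinese sentences can be 
analysed automatically, yielding one or more legitimate syntactic trees 
labelled with dependency relations. Such trees are the input to the 
subsequent semantic analysis. In addition, this syntax can equally serve as 
the basis of a Chinese generation system. Of course, a great deal of concrete 
work remains to be done before a reasonably complete Chinese generation 
system can actually be realized.

    Form and content are two inseparable aspects of the same thing, so some 
researchers advocate carrying out syntactic and semantic analysis 
simultaneously. Whether the two analyses should be kept separate or merged, 
each choice seems to have its advantages and drawbacks. Separation is clean 
and modular and favours a pure and abstract model, but when implemented on a 
computer it may lead to combinatorial explosion. Merged processing is 
compact, with low overhead and high efficiency, and it avoids some repeated 
look-ups, but it makes higher demands on the software, and the model itself 
becomes bloated. The present model is intended first of all to serve the 
Dutch DLT multilingual machine translation system, which adopts the strategy 
of separating syntax and semantics, but it also leaves room for extension so 
that it can be applied in our JFY-IV system, which analyses syntax and 
semantics simultaneously.

    The value of this paper for Chinese grammarians lies mainly not in the 
accuracy or depth with which it describes grammatical phenomena (in this 
respect the author, as a newcomer to Chinese studies, leaves much to be 
desired) but in the fact that it offers a specimen of a syntactic model 
suited to machine processing. This may be of some inspiration to Chinese 
grammarians who are unfamiliar with computers yet interested in the methods 
and thinking of computational linguistics. So far, no ready-made and 
reasonably authoritative model of Chinese syntax is available as a basis for 
machine processing, while the practice of machine processing of language 
demands such a model ever more urgently. The present model is still far from 
satisfactory, but it is at least usable. In this sense, it is hoped that it 
will serve as a modest starting point that draws out better work from others.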

APPENDIX II: 

      Linguistic Problems Concerning Chinese in Constructing DLT Parsers

                       -- in answer to Dr. Dan Maxwell

                                    LI Wei


1. Writing system

1)  The Chinese writing system consists of a set of characters. The PINYIN 
system (the Chinese alphabet), which is based on the Latin alphabet, is often 
used to represent the pronunciation of the characters. Standard PINYIN 
includes four special signs placed above the vowels to denote the four tones 
of Chinese: 1. high level tone; 2. rising tone; 3. falling-rising tone; 
4. falling tone; e.g. MĀ, MÁ, MǍ, MÀ. Many characters often share the same 
pronunciation, e.g. ZHI:                                           . For 
practical use in DLT, we suggest a coding principle under which all the 
characters with the same pronunciation, regardless of tone, are queued and 
numbered according to their order in authoritative dictionaries, so that they 
can be differentiated by the number at the end of the syllable, e.g. 
ZHI1:   , ZHI2:  ; ... ZHI55:  . In this way it will be very easy to convert 
between Chinese characters and such codes at a Chinese computer terminal.
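
The proposed coding principle amounts to a pair of lookup tables built from 
an ordered character list. The following Python sketch illustrates this; the 
function name build_codes is an assumption, and the five sample characters 
and their order are chosen only for illustration and do not reproduce the 
order of any particular dictionary.

```python
from collections import defaultdict


def build_codes(entries):
    """Number characters that share a toneless syllable.

    `entries` is a list of (character, syllable) pairs in dictionary order.
    Returns two dicts: character -> code and code -> character.
    """
    counters = defaultdict(int)
    char_to_code, code_to_char = {}, {}
    for char, syllable in entries:
        counters[syllable] += 1                      # next number for this syllable
        code = f"{syllable}{counters[syllable]}"
        char_to_code[char] = code
        code_to_char[code] = char
    return char_to_code, code_to_char


# Illustrative sample only; the real order would come from a dictionary.
SAMPLE = [("之", "ZHI"), ("知", "ZHI"), ("只", "ZHI"), ("妈", "MA"), ("马", "MA")]

encode, decode = build_codes(SAMPLE)
print(encode["知"], decode["MA2"])
```

Because each code maps to exactly one character, conversion in both 
directions is a single dictionary look-up.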

2)  It is widely accepted that in Contemporary Chinese a word >= a character, 
i.e. a word consists of one or more characters. How to recognize words 
automatically in a character string has now become a special research topic 
in the field of Chinese information processing (see 1.0 in "A Dependency 
Syntax of Contemporary Chinese", pp.4-5; also see "Word-recognition and 
Syntactic Analysis in Chinese Information Processing" by Prof. LIU Zhuo and 
"General Situation of the Research of Computational Linguistics in China" by 
FU Aiping). 

3) Generally speaking, the punctuation system follows that of English in 
usage. The only noteworthy difference is that Chinese has a special 
coordinating mark called DUNHAO: 、, which is used to set off the items of a 
series, i.e. between closely parallel coordinated words or word groups (see 
1.2.10 in "A Dependency Syntax of Contemporary Chinese", p.14). In effect, the 
Chinese comma and DUNHAO together do the work of the English comma. 


2. Word order

In a language as poor in inflection as Chinese, word order has naturally and 
necessarily become one of the two most important syntactic means (the other 
being function words), and it accordingly deserves special attention.

2.1 Freedom of word order

Generally, the degree of freedom of word order in Chinese is low. Of the 36 
dependants in our syntax, only two complements, OBJ and PC, often appear 
either before or after their governor. The unmarked pattern for OBJ is V-OBJ, 
and its transformation OBJ-V may be induced by such factors as emphasis and 
style. Each particular PC, however, always has a specific position with 
respect to its governor, and this position should be recorded in the valency 
information of the governor so that the parser knows where to look and which 
preposition to look for. What about the freedom of order among sister 
constituents under the same governor? Multiple adverbials, for example, are 
syntactically very free in order, especially DE2-adverbials, and the same 
holds for multiple attributes, especially DE-attributes. Between different 
types of sister dependants there is no general rule: some pairs are free in 
order, but more are not. For the order-free pairs of sister constituents, one 
might regard the more frequent pattern as unmarked. For example, the unmarked 
order between LCN and AtrA under the governor noun is LCN-AtrA--N as in the 
phrase WO MEN DE SAN GE REN (our three men), but now and then we also come 
across SAN GE WO MEN DE REN (three men of ours) in the pattern AtrA-LCN--N. 
Such rules concerning constituent order under each main word category will be 
discussed in detail in 2.2.
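
The requirement that the valency information tell the parser "where and which 
preposition to look for" can be sketched as follows in Python. The two 
entries are read off sample trees 4.01 and 4.04 of the main paper (LUOSHI 
with DAO, GAN with YU, both after the verb); the dictionary layout and the 
names VALENCY and find_pc are assumptions, since the actual dictionary format 
is not given here.

```python
# Hypothetical valency entries: which preposition heads the PC of a verb,
# and on which side of the verb it is expected.
VALENCY = {
    "LUOSHI": {"PC": {"preposition": "DAO", "position": "after"}},
    "GAN": {"PC": {"preposition": "YU", "position": "after"}},
}


def find_pc(tokens, verb_index):
    """Return the index of the preposition heading the verb's PC, or None."""
    entry = VALENCY.get(tokens[verb_index], {}).get("PC")
    if entry is None:
        return None                                    # the verb takes no PC
    if entry["position"] == "after":
        candidates = range(verb_index + 1, len(tokens))
    else:
        candidates = range(verb_index - 1, -1, -1)     # search leftwards
    for i in candidates:
        if tokens[i] == entry["preposition"]:
            return i
    return None


sentence = ["DOU", "LUOSHI", "DAO", "MEI", "GE", "REN", "TOU", "SHANG"]
print(find_pc(sentence, 1))
```

The search direction comes entirely from the verb's entry, so the parser 
never has to try both sides of the governor for a PC.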

The factors affecting Chinese word order can be grouped under the following 
five headings: 1. syntactic roles, i.e. types of dependants; 2. semantic 
roles; 3. pragmatic effects such as emphasis; 4. phonetic requirements, esp. 
the number of syllables (characters) in a constituent; 5. rhetorical effects 
such as differences of style. The surface order of a sentence is often the 
combined result of these five factors, but it seems difficult to say which 
one is decisive: for each factor we can cite sample sentences whose word 
order is determined mainly by that factor (see also 4.5). In short, as a 
language with no inflections, Chinese is bound to be quite limited in word 
order freedom; yet as a very expressive language with a long history of 
development, it also shows flexibility in many ways, including the greatest 
freedom of word order compatible with the overall frame of Chinese. 
Undoubtedly, word order is a complicated and important problem in Chinese 
that needs further study, and years of hard work will be required before we 
gain a clearer understanding of its inherent mechanism. Unfortunately, there 
have so far been no authoritative, large-scale statistical studies of Chinese 
word order.

2.2 Word order and Dependency

2.2.1 Basic order of the ten elementary trees

1)                                     Verb 
      /   /    /    /    /   /  /   /  / \  \  \   \   \    \    \    \    
    /   /    /    /    /   /  /   /   /   \  \   \   \   \    \    \    \   
  /   /    /    /    /   /  /   /   /      \  \    \   \   \    \    \    \    
SUB OBJ OBJ2 SUBOB SOC PC LCV VqC BaC     AdvA VA PMOD TOP VCoA AspA CirA ZyqA 

Basic Constituent Order: CirA<->TOP/(OBJ)<->SUB-(PC)/BaC<->AdvA--V--VCoA-
                         AspA-(PC)-(OBJ)/SUBOB-SOC-VqC-PMOD/LCV-VA-ZyqA

Note:  A<->B  indicates that A precedes B more often than B does A.  A/B shows 
the  syntactic impossibility of the concurrence of A and B,  hence eliminating 
the problem of their relative order.  (A)--governor--(A) signifies that A  can 
be put either before or after the governor:  for (OBJ),  the unmarked order is 
V--OBJ  with OBJ--V as its transformation;  for (PC),  the order is decided by 
the valency of the verb governor.

It is widely believed that the basic word order of Chinese is S-V-O, which is 
only partly right. In fact, the position of the object is much freer than 
commonly assumed, although the subject almost always precedes the predicate 
verb. The permutations of S, V and O are: 1. S-V-O; 2. S-O-V; 3. V-S-O; 
4. V-O-S; 5. O-S-V; 6. O-V-S. Patterns 3, 4 and 6 do not occur in standard 
written Chinese. The possible transforms of the basic pattern are listed 
below with sample sentences: 

                              Basic pattern: SVO
Transform  Sample sentence                       Remarks
SOV        WO NANJING QU GUO, SHANGHAI MEI QU.   SOV is often present in 
          (I  Nanjing go ?    Shanghai not go)   parallel structures, i.e.
     (I have been to Nanjing, never to Shanghai) compound sentences.

OSV        NANJING WO QU GUO.                    OSV is far more often 
          (I have been to Nanjing)               used than SOV. 
                                                 
Then in the surface sequence N1+N2+V, how do we know whether it is in the form 
SOV  or OSV?  The decisive factor seems to come from semantic analysis  rather 
than syntactic analysis. (also see 4.5)

2)              Adjective 
            /      / \      \     \      \      \      \
          /      /     \      \     \      \      \      \  
        /      /         \      \     \      \      \      \
      /      /             \      \     \      \      \      \ 
   SUB      PC               AdvA   PMOD  TOP    ACoA   CirA   ZyqA 
  
Basic Constituent Order: CirA<->TOP-SUB-(PC)-AdvA--A--ACoA-(PC)-PMOD-ZyqA

3)                                Noun    
            /      /      /      /   \      \      \      \
         /      /      /      /        \      \      \      \
      /      /      /      /             \      \      \      \
  SUB     MnC     NC     LCN             AtrA   DetA   NCoA   ZyqA 
 
Basic Constituent Order:  SUB-DetA/LCN-AtrA--N--NC-MnC-NCoA-ZyqA

4)                                Pronoun
                                  /     \ 
                                /         \
                             MnC           AppA             

Basic Constituent Order:  AppA--D--MnC

5)                              Preposition       
                                  /
                                /   
                              CP         
   
Basic Constituent Order:  P--CP
 
6)                               Postposition 
                                    /
                                  /
                               CW                       
  
Basic Constituent Order: CW--W

7)                                 Numeral
                                   /    \
                                /          \
                              DiC           SA
 
Basic Constituent Order:  DiC--S--SA  

8)                                 Classifier
                                         \
                                           \
                                             LA

Basic Constituent Order:  LA--L

9)                                Particle
                      /   /   /    /
                    /   /   /    /
                 SUB CDe CDe2 CDe3                                            
    
Basic Constituent Order:  SUB-CDe/CDe2--Z--CDe3

10)                           Conjunction
                           /     /    \      \
                         /     /        \       \
                      CC    X-C          X-C      Y

Basic Constituent Order:  (Y)-X-C--C--X-C-CC-(Y)

2.2.2 Discontinuous dependencies

As in the Indo-European languages, discontinuity exists in object-preceding 
patterns such as OBJ-(SUB)-auxiliary verb-V, e.g. 1. ZHE(this) REN(man) 
GAI(should) SHA(kill) (This man should be killed); 2. JI(chicken) WO(I) 
DASUAN(plan) RANG(let) BINGREN(patient) CHI(eat) (I'm planning to let the 
patients eat the chickens). 

    1.                    GAI
                             \ OBJ
                               SHA
                                 \ OBJ
                                   REN
                             DetA /
                              ZHE





    2A.                       DASUAN
                         SUB /      \ OBJ
                          WO          RANG
                                       \ SUBOB   \ SOC
                                         BINGREN   CHI
                                                     \ OBJ
                                                       JI

A possible simpler analysis for Chinese, which might eliminate the 
discontinuity between the preceding object and its verb governor, is to take 
the object as topic and to treat the transitive verb as intransitive, as if 
its object were omitted:

    2B.                       DASUAN
                   TOP / SUB /      \ OBJ
                    JI    WO          RANG
                                       \ SUBOB   \ SOC
                                         BINGREN   CHI
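
Whether an analysis is discontinuous can be checked mechanically. The Python 
sketch below encodes trees 2A and 2B as lists of governor positions and tests 
each arc: an arc is continuous if every word lying between the dependant and 
its governor is dominated by that governor. The encoding and the function 
name is_continuous are assumptions made for illustration.

```python
def is_continuous(heads):
    """Check a dependency tree for discontinuity.

    heads[i] is the position of the governor of word i, or None for the
    root. The tree is continuous if, for every arc, all words between the
    dependant and its governor are dominated by that governor.
    """
    def dominated_by(word, ancestor):
        while word is not None:
            if word == ancestor:
                return True
            word = heads[word]          # climb one step towards the root
        return False

    for dependant, governor in enumerate(heads):
        if governor is None:
            continue
        low, high = sorted((dependant, governor))
        for between in range(low + 1, high):
            if not dominated_by(between, governor):
                return False
    return True


# Word positions: JI=0 WO=1 DASUAN=2 RANG=3 BINGREN=4 CHI=5
TREE_2A = [5, 2, None, 2, 3, 3]   # JI depends on CHI as OBJ
TREE_2B = [2, 2, None, 2, 3, 3]   # JI depends on DASUAN as TOP

print(is_continuous(TREE_2A), is_continuous(TREE_2B))
```

In 2A the arc from CHI to JI spans WO and DASUAN, neither of which is 
dominated by CHI, so the tree is discontinuous; in 2B every arc passes the 
test.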
                                          
3. Word grammar

3.1 There are no inflections in Chinese.

3.2 Morpheme Order

Nearly every Chinese character can serve as a kind of morpheme that can be 
combined with other character(s) to form a word. Usually the last morpheme is 
regarded as the head of the word, just as in Esperanto.

3.3 Derivational Morphology

In Contemporary Chinese, a few morphemes (characters) have come to function 
very much like certain suffixes in English. For example, XING works just like 
"-ness", changing a noun or an adjective into an abstract noun, N/A/X + XING 
--> N: LISHI (N:history) + XING --> LISHIXING (N:historicity); SHIYONG 
(A:practical) + XING --> SHIYONGXING (N:practicalness, practicality); YANSU 
(A:serious) + XING --> YANSUXING (N:seriousness); KE (can) DU (read) + XING 
--> KEDUXING (N:readability). Such quasi-suffixes as XING, DU (similar to 
XING) and HUA (=ization) are very productive and therefore deserve close 
attention, although they are few in number.
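
The rule N/A/X + XING --> N can be sketched in Python in both directions, 
derivation and analysis. The function names and the returned dict are 
assumptions for illustration, and the analysis is deliberately naive: it 
treats any word ending in XING as derived, which a real lexicon would have to 
verify.

```python
SUFFIX = "XING"


def derive_xing(stem):
    """Apply N/A/X + XING --> N and return (word, category)."""
    return stem + SUFFIX, "N"


def analyze_xing(word):
    """Split a word ending in XING into stem and suffix, or return None."""
    if word.endswith(SUFFIX) and len(word) > len(SUFFIX):
        return {"stem": word[:-len(SUFFIX)], "suffix": SUFFIX, "category": "N"}
    return None


print(derive_xing("LISHI"))
print(analyze_xing("KEDUXING"))
```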

4. Types of Syntactic Ambiguity

Precisely because it lacks inflections, Chinese seems more likely to be 
syntactically ambiguous, so that most Chinese sentences yield two or more 
trees. Chinese is essentially a semantics-bound language; one can therefore 
hardly expect to achieve much by syntactic analysis, which can only be based 
on forms, whether explicit forms (function words and word order) or implicit 
forms (word categories, subclasses and valencies). It is not difficult to 
cite sentences which lead to as many as a dozen trees, hence (I'm afraid) the 
problem of combinatorial explosion in parsing.

4.1 Word category

The problem of category ambiguity for Chinese words is so serious and striking
that there used to be a prevailing view in Chinese grammar circles that "there
are no grammatical categories for Chinese words, and categories can only be
defined in context".

We find that there are two kinds of category ambiguity. The first might be
called potential ambiguity, which arises simply from the fact that some words
meet the syntagmatic definitions of two or more categories. The second is
dynamic ambiguity, which occurs when the language user makes elastic or
temporary use of some words. Accordingly, there are two ways of handling them.
For words of potential category ambiguity, we should list all their potential
categories as their static codes in the dictionary; sentences containing such
words will call a subroutine, a set of category-disambiguating rules, which
solves most of the problem by trying to determine the only correct category in
the given context and erasing the other, improper categories. Words of dynamic
category ambiguity cannot be predicted; they are therefore assigned only one
category in the dictionary, which is dynamically changed into another category
during the execution of certain special rules (often closely related to
particular function words). For example, the rule X + LE --> V + LE
dynamically changes any category before LE into a verb, because the function
word LE can only be used after its governor verb, as its aspect adjunct
(perfect aspect). As one more example, the rule S + N1 + N2 --> S + Ln + N2
changes a noun into a classifier.
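
The dynamic rule X + LE --> V + LE can be sketched as follows. This is a
minimal illustration, not the rule formalism of our system: the function name
apply_le_rule, the (word, category) pair representation and the tag "Part" for
the particle are assumptions, and "W" stands for an arbitrary word.

```python
# Hypothetical sketch of a dynamic category-changing rule: X + LE --> V + LE.
def apply_le_rule(tagged):
    """tagged: list of (word, category) pairs.

    Returns a new list in which any word immediately followed by the
    particle LE is recategorized as a verb (V).
    """
    result = list(tagged)
    for i in range(len(result) - 1):
        word, cat = result[i]
        if result[i + 1][0] == "LE" and cat != "V":
            result[i] = (word, "V")  # the dictionary category is overridden
    return result


# "W" is a placeholder for a word listed as a noun in the dictionary.
print(apply_le_rule([("W", "N"), ("LE", "Part")]))
```

The dictionary entry itself is left unchanged; only the category used in the
current sentence is replaced.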

4.2 Word category and morphology

There are no inflections in Chinese, let alone inflectional ambiguity.

4.3 Adpositional phrases

There are no such problems, because Chinese prepositions and postpositions
cannot be used directly as attributes. Besides, in Chinese all attributes
precede their noun governor and all adverbials precede their verb/adjective
governor. However, ambiguities do exist in such patterns as
P+N1+DE+N2+DE+N3. One reading is ((((P+N1)+DE)+N2)+DE)+N3; another is
((P+N1)+DE)+(N2+DE)+N3; a third is P+((((N1+DE)+N2)+DE)+N3). (See 4.6.)

4.4 Coordination

Chinese has problems similar to those in English, such as the ambiguity in
"happy students and workers": 1. (happy students) and workers; 2. happy
(students and workers). We sometimes employ a so-called ambiguity-untouched
strategy in our machine translation research from Indo-European languages
into Chinese, e.g. A and B of C --> C DE B HE A (C's B and A). In the absence
of other reliable means, this strategy leads in most cases to unexpectedly
satisfactory results.
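
The ambiguity-untouched transfer A and B of C --> C DE B HE A can be sketched
as a simple pattern rewrite. The function name transfer and the regular
expression are hypothetical, and the sketch assumes that A, B and C are single
tokens which have already been replaced by their Chinese equivalents.

```python
import re

# Hypothetical pattern for the English frame "A and B of C".
FRAME = re.compile(r"^(\w+) and (\w+) of (\w+)$")


def transfer(phrase):
    """Rewrite 'A and B of C' as 'C DE B HE A' without deciding whether
    'of C' modifies only B or both A and B; return None otherwise."""
    match = FRAME.match(phrase)
    if match is None:
        return None
    a, b, c = match.groups()
    return f"{c} DE {b} HE {a}"


print(transfer("A and B of C"))
```

The output keeps both bracketings open, (C DE B) HE A and C DE (B HE A), so
the ambiguity is carried over rather than resolved.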

The  English sentence "They washed and polished the table" may  be  translated 
into Chinese in several ways:

1) TA MEN XI  LE  QIE CA     LE   ZHUO  ZI  
  (he -s wash pst and polish pst table noun-suffix)

2) TA MEN XI QIE CA LE ZHUO ZI (with the first particle LE omitted)

3) TA MEN XI LE、CA LE ZHUO ZI (using DUNHAO "、" instead of QIE)

4) TA MEN XI、CA LE ZHUO ZI    (combining 2) and 3): first LE omitted, DUNHAO instead of QIE)

5) TA MEN XI CA LE ZHUO ZI     (with first LE and conjunction QIE omitted)

6) TA MEN XI LE, CA LE ZHUO ZI (using comma instead of QIE)

Only sentence 1) is ambiguous, in the same way as the original English
sentence. Sentences 2), 3), 4) and 5) mean that they washed the table and
polished it too. Sentence 6) corresponds to the second reading, that they
washed (themselves or something other than the table) and polished the table.
If we want both readings, the comma in 6) should be replaced by the
conjunction QIE or ERQIE, as in 1); if we want the first reading only, a
DUNHAO should be used instead of the comma, as in 3). Here lies the slight
difference in usage between these coordinating conjunctions. Sentence 2) is
disambiguated because the omission of the first LE makes the remaining LE
necessarily modify both coordinated verbs, thus eliminating the second
reading.

4.5 Subjects and objects

Although a Chinese subject cannot follow its predicate verb, an object can
often be placed before its governor (also see 2.2.1). Since most transitive
verbs can also be used intransitively, the pattern N+V can be analysed as S-V
or O-V, e.g. JI (chicken) CHI (eat) LE (pst).

To look further into the matter of subject, object and their governor verb,
we cite some sample sentences built from permutations of the noun JI, the
pronoun WO, and the verb CHI.

1) WO CHI LE JI. (I ate the chicken)

2) JI WO CHI LE. (The chicken I ate)

In  this case,  if CHI is not a transitive verb,  JI shall be its TOP or CirA, 
otherwise JI is its object.  WO,  as a pronoun in the above position, can only 
function as SUB.  
                                
3) JI CHI LE WO. (The chicken ate me)

Although it contradicts common sense, 3) can only be interpreted in this way.
Only in spoken Chinese might JI be considered the object and WO the subject of
CHI (in that case, it should be written with a comma before WO, as JI CHI LE,
WO).

Our view is that, in a given situation, a certain realization of an idea, i.e.
a certain sentence, is generated by several factors, one of which must be
decisive. The problem is that it is quite difficult to find out which factor
plays the decisive role under which condition in the generation or analysis
of a sentence. In the pattern N1 (or pronoun) + transitive V + N2 (or
pronoun), it is syntax, not semantics, world knowledge or anything else, that
defines N1 as subject and N2 as object of V, as in 1) and 3). The pattern N +
pronoun + transitive V in 2) is also syntax-bound, with O-S-V as its only
possible structure. But the decisive factors that help to select the right
reading out of the four possible interpretations of 4) below are,
respectively, common sense, background knowledge, syntax and statistics (see
below, and also 2.1).

4) WO JI CHI LE.  (I ate the chicken; the chicken ate me;  
                   my chicken ate; my chicken was eaten)

         CHI              CHI                 CHI                   CHI
 SUB/ OBJ/ \ AspA  OBJ/ SUB/ \ AspA      SUB /   \ AspA        OBJ /   \ AspA
  WO   JI    LE     WO   JI    LE         JI       LE           JI       LE
                                     AtrA /                AtrA /
                                       WO                    WO

The second reading can be eliminated by common sense. The probability of the
first reading is not high unless the sentence appears in a parallel structure
such as WO JI CHI LE, DAN (but) YA (duck) MEI (not) CHI (I ate the chicken,
but didn't eat the duck). In this case one can find some formal clues, so
syntax works effectively here. Outside a parallel structure, the factor of
linguistic statistics must be taken into consideration. As for the third and
fourth readings, background knowledge or context will help.

5) CHI LE WO JI. ((Someone) ate my chicken) 

In the pattern transitive V + personal pronoun + N, syntax yields only one
possible structure, V-(attribute)-O.
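
The syntax-bound patterns of this section can be collected in a small lookup
table. The sketch below is only a restatement of sentences 1) to 5), not a
parser: the names READINGS and readings, the label HEAD for the governing verb
and the tag "Part" for LE are assumptions, and AtrA marks an attribute of the
following noun, as in the trees above.

```python
# Hypothetical table of readings, keyed by the category pattern of the
# sentence once the aspect particle LE is removed.
# N = noun, Pron = personal pronoun, Vt = transitive verb.
READINGS = {
    ("N", "Pron", "Vt"): [("OBJ", "SUB", "HEAD")],    # 2) JI WO CHI LE
    ("Pron", "N", "Vt"): [                            # 4) WO JI CHI LE
        ("SUB", "OBJ", "HEAD"),
        ("OBJ", "SUB", "HEAD"),
        ("AtrA", "SUB", "HEAD"),
        ("AtrA", "OBJ", "HEAD"),
    ],
    ("Vt", "Pron", "N"): [("HEAD", "AtrA", "OBJ")],   # 5) CHI LE WO JI
}
# N1 (or pronoun) + transitive V + N2 (or pronoun) is always S-V-O: 1) and 3).
for first in ("N", "Pron"):
    for last in ("N", "Pron"):
        READINGS[(first, "Vt", last)] = [("SUB", "HEAD", "OBJ")]


def readings(tagged):
    """tagged: list of (word, category) pairs; LE is skipped.

    Returns a list of readings, each a list of (word, label) pairs.
    Unknown patterns yield an empty list.
    """
    core = [(word, cat) for word, cat in tagged if word != "LE"]
    pattern = tuple(cat for _, cat in core)
    return [
        [(word, label) for (word, _), label in zip(core, labels)]
        for labels in READINGS.get(pattern, [])
    ]


sentence = [("WO", "Pron"), ("JI", "N"), ("CHI", "Vt"), ("LE", "Part")]
for reading in readings(sentence):
    print(reading)
```

For 4) the table can only list the four candidates; choosing among them is
left to common sense, background knowledge or statistics, as discussed above.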

4.6 Modifier chains

The ambiguity in modifier chains seems particularly serious because Chinese
attributes are all placed before their noun governor. The number of readings
grows exponentially with the number of constituents in the chain, especially
for attributes in the form of a DE-phrase (incidentally, according to
computer-based statistics, the particle DE is the most frequent word in
Chinese, its frequency being much higher than that of the second most
frequent word). In the pattern N1 DE N2 DE N3 DE ... Nn, N1+DE can
syntactically modify any of the following nouns, although the probabilities
of the individual readings vary greatly.
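
How quickly the readings multiply can be checked by brute force. The sketch
below enumerates governor assignments for the chain N1 DE N2 DE ... Nn; the
function name chain_readings is hypothetical, and the exclusion of crossing
dependency arcs is an added assumption, not something stated above.

```python
from itertools import product


def chain_readings(n):
    """Enumerate governor assignments for the chain N1 DE N2 DE ... Nn.

    Each Ni (i < n) chooses one later noun as its governor; Nn is the root.
    Assignments with crossing arcs are discarded (an added assumption).
    Returns a list of tuples (head of N1, head of N2, ..., head of Nn-1).
    """
    choices = [range(i + 1, n + 1) for i in range(1, n)]
    result = []
    for heads in product(*choices):
        arcs = list(enumerate(heads, start=1))  # (dependant, governor)
        crossing = any(
            i < j < hi < hj
            for i, hi in arcs
            for j, hj in arcs
        )
        if not crossing:
            result.append(heads)
    return result


for n in range(2, 6):
    print(n, len(chain_readings(n)))
```

Under these assumptions, chains of 2, 3, 4 and 5 nouns yield 1, 2, 5 and 14
readings respectively.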

However,  there  are some special characteristics of Chinese grammar which can 
eliminate  some of the readings concerning modifier chain  ambiguity.  Compare 
the following examples:

1) NIANQING  DAIFU  DE PENGYOU   (the young doctor's friend)
  (young     doctor 's friend )
    
2) NIANQING DE DAIFU DE PENGYOU  (1. the young doctor's friend; 
                                  2. the young friend of the doctor)

3) YONGGAN DE MEILI DE NU     BING  (brave pretty woman soldiers) 
  (brave      pretty   female soldier)

4) JUYOU WEIDA YIYI         DE GEMING     DE CHENGGONG
  (have  great significance    revolution 's success)
  (1. the revolution's success with great significance; 
   2. the success of (the revolution with great significance))

5) FULAN MUTOU DE QIAO (the bridge of rotten wood)
  (rotten wood    bridge) 

6) FULAN DE MUTOU DE QIAO (1. the bridge of rotten wood;
                           2. the rotten wooden bridge)

7) PIAOLIANG MU   QIAO (the beautiful wooden bridge)
  (beautiful wood bridge)

Examples 1) and 2), like 5) and 6), differ in that the first attribute in 1)
and 5) is a bare adjective, while in 2) and 6) it is a DE-phrase (adjective +
DE). Chinese adjectives can modify the following noun both directly and
indirectly (with the help of DE), but DE-phrase attributes seem much freer
and less restricted. 3) is not ambiguous because an adjective is generally
not allowed to modify another adjective. 4) is in the pattern VP + DE + N1 +
DE + N2, which can always be syntactically interpreted either as (VP + DE +
N1) + DE + N2 or as VP + DE + (N1 + DE + N2). In 7) the adjective PIAOLIANG
can by no means modify the immediately following noun MU, because the number
of syllables in an immediate constituent affects structure. A 2-syllable
constituent tends to combine directly with another 2-syllable unit and never
with a single-syllable word; MU and QIAO are therefore first combined into a
2-syllable word-like unit, which is then modified by the 2-syllable adjective
PIAOLIANG (see 0.1.3 in "A Dependency Syntax of Contemporary Chinese", p.3).

4.7 Word syntax vs. sentence syntax

One of the important features of Chinese grammar is that word syntax is
essentially identical to sentence syntax. In fact, there is no clear-cut
boundary between a word and a morpheme. It is worth mentioning here that word
syntax embodies more elements of ancient Chinese (for details and examples,
see 1.0 in "A Dependency Syntax of Contemporary Chinese", pp.4-5).

4.8 Antecedents of relative pronouns
 
There are no relative pronouns in Chinese.

4.9 Ellipsis

For the so-called elliptical sentence JI CHI LE, see 4.5.

In Chinese, there are few elliptical sentences like "Hans liebt Anna, und
Peter auch", "Peter claims that Paul likes beer and Sandy does too" or "Sam
sent Pam to Mary and Paul to Sara", as found in Indo-European languages. It
therefore becomes a tough problem for the machine to recover properly what
has been omitted in the original sentences when our system translates them
into Chinese (see the doctoral dissertation by Xiuming HUANG, U.S.).

As  for  the sentence "The professor would claim that the students will go  on 
strike,  if  necessary",  only some so-called Europeanized Chinese  sentences, 
i.e.  the  sentences  greatly  affected in their structures  by  Indo-European 
languages, will have the same problem. 

4.10 Long distance dependencies

Problems of this sort are not found in Chinese, because the word order of a
Chinese interrogative sentence is the same as that of other sentence types.

5. Style

1) Compared with written Chinese, spoken Chinese seems somewhat freer in
syntax. For example, although we state in 2.2.1 that the pattern O-V-S does
not exist in Chinese, this is only true of written Chinese; in spoken Chinese
one can easily hear such a sentence as "DI (floor) SAO (sweep) LE (pst.) ME
(chu) NI (you) ?" (Have you swept the floor?). Besides, the three very
important particles DE, DE2, DE3 are pronounced exactly the same in spoken
Chinese. But we shall only deal with written Chinese in the form of
"informative texts".

2) There are 8 major dialects and hundreds, or even thousands, of
subdialects in Chinese. Each dialect has, to a greater or lesser extent, its
own characteristics in phonetics, vocabulary and syntax. But we shall only
deal with PUTONGHUA (standard Chinese).
    

Very interesting, thanks for sharing

A question is, where are we now after two decades?
It's such an important problem with the rise of "semantic web"; immensely more important now.
