The Risks of Machine Translation

Source: 英语学习 (English Learning)
  The ideal translator is a person “on whom nothing is lost,” said Henry James. Or maybe it’s a machine. But a machine won’t stop you from swearing at nuns...
  Years ago, on a flight from Amsterdam to Boston, two American nuns seated to my right listened to a voluble[1] young Dutchman who was out to discover the United States. He asked the nuns where they were from. Alas, Framingham, Massachusetts, was not on his itinerary, but, he noted, he had “shitloads of time and would be visiting shitloads of other places”.[2]
  The jovial young Dutchman had apparently gathered that “shitloads” was a colourful synonym for the bland “lots”.[3] He had mastered the syntax of English and a rather extensive vocabulary but lacked experience of the appropriateness of words to social contexts.[4]
  This memory sprang to mind with the recent news that the Google Translate engine would move from a phrase-based system to a neural network. Both methods rely on training the machine with a “corpus”[5] consisting of sentence pairs: an original and a translation. The computer then generates rules for inferring, based on the sequence[6] of words in the original text, the most likely sequence of words from the target language.
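The phrase-based idea can be pictured with a toy example. This is a minimal sketch under invented assumptions, not Google's engine. It assumes the sentence pairs have already been broken into aligned English-French phrase pairs; the names `ALIGNED_PAIRS`, `build_phrase_table` and `translate` are hypothetical. The code counts how often each English phrase was paired with each French phrase, then translates greedily by always choosing the most frequent pairing.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: English phrases already aligned with French phrases.
# (A real system would first learn these alignments from whole sentence pairs.)
ALIGNED_PAIRS = [
    ("i have", "j'ai"),
    ("i have", "j'ai"),
    ("lots of", "beaucoup de"),
    ("lots of", "beaucoup de"),
    ("lots of", "plein de"),
    ("time", "temps"),
]


def build_phrase_table(pairs):
    """Map each English phrase to a Counter of the French phrases seen with it."""
    table = defaultdict(Counter)
    for src, tgt in pairs:
        table[src][tgt] += 1
    return table


def translate(sentence, table, max_len=3):
    """Greedy left to right: take the longest known phrase, emit its most frequent match."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                best, _count = table[phrase].most_common(1)[0]
                out.append(best)
                i += n
                break
        else:
            # No known phrase starts here: copy the word through untranslated.
            out.append(words[i])
            i += 1
    return " ".join(out)


table = build_phrase_table(ALIGNED_PAIRS)
print(translate("I have lots of time", table))  # j'ai beaucoup de temps
```

A real phrase-based system also learns the alignments itself and scores word order with a language model. A neural system replaces the counted table with learned numerical parameters. What both share with this toy is that the output is whatever the training pairs make most likely.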
  The procedure is an exercise in pattern matching. Similar pattern-matching algorithms are used to interpret the syllables you utter when you ask your smartphone to “navigate to Brookline” or when a photo app tags your friend’s face.[7] The machine doesn’t “understand” faces or destinations; it reduces them to vectors[8] of numbers, and processes them.
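To make "reduces them to vectors of numbers" concrete, here is a deliberately crude sketch. It assumes a hypothetical list of known voice commands, `KNOWN_COMMANDS`, and represents each phrase as word counts over a fixed vocabulary. Real speech and face recognition learn far richer features than this. Matching is then just picking the command whose vector points in the most similar direction (cosine similarity). Nothing in the code knows what Brookline is.

```python
import math
from collections import Counter

# Hypothetical set of commands a voice assistant "knows".
KNOWN_COMMANDS = [
    "navigate to brookline",
    "call my sister",
    "play some music",
]


def to_vector(text, vocab):
    """Turn text into a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(utterance, commands):
    """Return the known command whose vector is most similar to the utterance's."""
    vocab = sorted({w for c in commands for w in c.split()})
    u = to_vector(utterance, vocab)
    return max(commands, key=lambda c: cosine(u, to_vector(c, vocab)))


print(best_match("please navigate me to Brookline", KNOWN_COMMANDS))
# navigate to brookline
```

Words outside the vocabulary, such as "please" and "me" here, have no dimension to land in and are simply ignored.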
  I am a professional translator, having translated some 125 books from the French. One might therefore expect me to bristle[9] at Google’s claim that its new translation engine is almost as good as a human translator, scoring 5.0 on a scale of 0 to 6, whereas humans average 5.1. But I’m also a PhD in mathematics who has developed software that “reads” European newspapers in four languages and categorises the results by topic. So, rather than be defensive about the possibility of being replaced by a machine translator, I am aware of the remarkable feats of which machines are capable, and full of admiration for the technical complexity and virtuosity of Google’s work.[10]
  My admiration does not blind me to the shortcomings of machine translation, however. Think of the young Dutch traveler who knew “shitloads” of English. The young man’s fluency demonstrated that his “wetware” (a living neural network, if you will) had been trained well enough to intuit the subtle rules (and exceptions) that make language natural.[11] Computer languages, on the other hand, have context-free grammars. The young Dutchman, however, lacked the social experience with English to grasp the subtler rules that shape the native speaker’s diction, tone and structure. The native speaker might also choose to break those rules to achieve certain effects. If I were to say “shitloads of places” rather than “lots of places” to a pair of nuns, I would mean something by it. The Dutchman blundered into inadvertent comedy.[12]
  Google’s translation engine is “trained” on corpora ranging from news sources to Wikipedia. The bare description of each corpus is the only indication of the context from which it arises. From such scanty[13] information it would be difficult to infer the appropriateness or inappropriateness of a word such as “shitloads”. If translating into French, the machine might predict a good match to beaucoup or plusieurs. This would render the meaning of the utterance but not the comedy,[14] which depends on the socially marked “shitloads” in contrast to the neutral plusieurs. No matter how sophisticated the algorithm, it must rely on the information provided, and clues as to context, in particular social context, are devilishly[15] hard to convey in code.
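The loss of register can be shown with an equally crude sketch. It assumes two hypothetical lexicons, `SOURCE` and `TARGET`; real neural translation has no explicit register labels like these. The English side tags each phrase with a meaning and a register. The French side, learned only from neutral text, has a word for the meaning but none for the vulgar register. Any lookup that falls back on meaning alone then renders "shitloads of" exactly as it renders "lots of".

```python
# Hypothetical English lexicon, as a human reader carries it: each phrase has
# a meaning AND a social register.
SOURCE = {
    "lots of": ("MANY", "neutral"),
    "shitloads of": ("MANY", "vulgar"),
}

# Hypothetical French lexicon learned only from neutral text (news, encyclopedia):
# it has a word for MANY but has never seen a vulgar one.
TARGET = {
    ("MANY", "neutral"): "beaucoup de",
}


def render(phrase):
    """Return (French phrase, register actually conveyed) for an English phrase."""
    meaning, register = SOURCE[phrase]
    if (meaning, register) in TARGET:
        return TARGET[(meaning, register)], register
    # Fallback: keep the meaning and accept whatever register the target word carries.
    for (m, r), word in TARGET.items():
        if m == meaning:
            return word, r
    raise KeyError(f"no translation for {phrase!r}")


for p in ("lots of", "shitloads of"):
    print(p, "->", render(p))
# lots of -> ('beaucoup de', 'neutral')
# shitloads of -> ('beaucoup de', 'neutral')
```

Both calls return the same French phrase, so a reader of the output cannot tell which one was said. The meaning survives, but the comedy does not.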
  The problem, as with all previous attempts to create artificial intelligence (AI)[16] going back to my student days at MIT, is that intelligence is incredibly complex. To be intelligent is not merely to be capable of inferring logically from rules or statistically from regularities. Before that, one has to know which rules are applicable, an art requiring awareness of and sensitivity to the situation. Programmers are very clever, but they are not yet clever enough to anticipate the vast variety of contexts from which meaning emerges. Hence even the best algorithms will miss things, and, as Henry James put it, the ideal translator must be a person “on whom nothing is lost”.
  This is not to say that mechanical translation is not useful. Much translation work is routine. At times, machines can do an adequate job. Don’t expect miracles, however, or felicitous literary translations, or aptly rendered political zingers.[17] Overconfident claims have dogged[18] AI research from its earliest days. I don’t say this out of fear for my job: I’ve retired from translating and am devoting part of my time nowadays to…writing code.
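  1. voluble: talkative, speaking readily and at length.
  2. itinerary: a travel plan, a planned route; shitload: (vulgar slang) a great many, a large amount.
  3. jovial: cheerful and friendly; synonym: a word with the same or a similar meaning; bland: mild, unremarkable.
  4. syntax: grammar, sentence structure; appropriateness: suitability, propriety.
  5. corpus: a body of texts used as language data.
  6. sequence: order, succession.
  7. algorithm: a step-by-step computational procedure; syllable: a unit of spoken sound; navigate: to plan and follow a route.
  8. vector: an ordered list of numbers.
  9. bristle: to show anger or indignation.
  10. feat: an achievement, an accomplishment; virtuosity: great technical skill.
  11. wetware: a computing term for what is neither software nor hardware, i.e. the human brain and nervous system; intuit: to know by intuition.
  12. blunder: to stumble; to make a careless mistake; inadvertent: unintentional.
  13. scanty: insufficient, barely enough.
  14. render: to express (in another language), to translate; utterance: something said, an expression.
  15. devilishly: very, extremely.
  16. artificial intelligence (AI): intelligence exhibited by machines.
  17. felicitous: apt, well chosen; aptly: appropriately; zinger: a witty, pointed remark.
  18. dog: used as a verb, meaning to follow closely.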