[Objective] Automatically recognizing legal-terminology entities (法言法语, the specialized terms and expressions of legal language) is an important foundation for text mining of court judgment documents. [Methods] Data were collected with a web crawler and the corpus was annotated manually. The corpus was segmented using NLPIR loaded with a legal-domain dictionary, and a feature template for a conditional random field (CRF) model was built from the internal and external features of legal terms in order to recognize them automatically in the corpus. [Results] The CRF model incorporating the internal and external features of legal terms performed well in the experiments, with an F-measure (the harmonic mean of precision and recall) above 90%. [Limitations] The recognition model is limited in how well it extends to other domains. [Conclusions] Automatic extraction of legal-terminology entities based on conditional random fields is feasible.
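The abstract does not spell out the feature template, so the following is only a minimal sketch of how internal features (properties of a segmented token itself) and external features (its neighbouring tokens) might be assembled per token before being passed to a CRF toolkit. The function names, feature keys, window size, and example sentence are all illustrative assumptions, not the authors' actual template. The `f1_score` helper shows the harmonic mean of precision and recall that the reported result refers to.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Build a feature dict for the token at position i (hypothetical template)."""
    tok = tokens[i]
    feats = {
        "bias": "1",
        # Internal features: properties of the candidate token itself.
        "tok": tok,
        "len": str(len(tok)),
        "first_char": tok[:1],
        "last_char": tok[-1:],
    }
    # External features: neighbouring tokens inside the context window.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        key = f"tok[{off:+d}]"
        feats[key] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


def sentence_features(tokens: List[str], window: int = 2) -> List[Dict[str, str]]:
    """Feature dicts for every token of one segmented sentence."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Assumed example: a sentence already segmented into words.
    sentence = ["被告人", "犯", "盗窃罪"]
    for feats in sentence_features(sentence, window=1):
        print(feats)
    print(f1_score(0.92, 0.90))
```

Each feature dict would be paired with a label such as a B/I/O tag from the manually annotated corpus. That tagging scheme is also an assumption here, since the abstract does not name one.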