Special issue: Artificial Intelligence and Information Science

Study of Automatic Classification of Literature Based on Convolution Neural Network

  • Guo Limin
  • 1. Shanghai Library
Guo Limin, male, is an engineer in the Systems and Network Department of Shanghai Library.

Received date: 2017-12-21

  Online published: 2018-02-07

Study of Automatic Classification of Literature Based on Convolution Neural Network

  • Guo Limin

Received date: 2017-12-21

  Online published: 2018-02-07

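Summary

The rapid development of artificial intelligence is driving automatic literature classification away from rule-based methods and toward machine learning. Following an overview of deep learning, the article introduces convolutional neural networks into automatic literature classification and builds a multi-level convolutional neural network model based on titles and keywords, so that a Chinese Library Classification (CLC) number can be assigned automatically from a document's title and keywords. The deep learning model was implemented on the TensorFlow platform and trained on about 1.7 million records from the National Newspaper Index (《全国报刊索引》); it was then used to predict CLC numbers for more than 7,000 documents awaiting processing. Under production conditions, first-level classification accuracy is 75.39% and fourth-level accuracy is 57.61%. At a confidence of 0.9, the first-level correct rate is 43.98% and the error rate is 1.96%, while the fourth-level correct rate is 25.66% and the fourth-level error rate is 5.11%. These results show that the model has a low error rate and can help semi-automate the classification workflow of the National Newspaper Index, easing existing problems such as the shortage of cataloguers and declining processing quality and efficiency.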
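The first-level and fourth-level accuracies reported above can be illustrated with a small sketch. It assumes, purely for illustration, that a level-k label is the first k characters of a CLC number; the summary does not state how the article defines its levels, and the function name `level_accuracy` and the toy codes below are hypothetical.

```python
def level_accuracy(true_codes, predicted_codes, level):
    """Share of predictions whose first `level` characters match the true code.

    Assumption: a level-k class is the first k characters of the CLC number.
    """
    if len(true_codes) != len(predicted_codes):
        raise ValueError("true_codes and predicted_codes must have the same length")
    if not true_codes:
        raise ValueError("at least one document is required")
    hits = sum(
        true[:level] == predicted[:level]
        for true, predicted in zip(true_codes, predicted_codes)
    )
    return hits / len(true_codes)


# Toy data (made up for illustration, not taken from the article).
true_codes = ["G250", "TP39", "F270", "G254"]
predicted_codes = ["G250", "TP18", "D920", "G252"]

acc_level1 = level_accuracy(true_codes, predicted_codes, 1)
acc_level4 = level_accuracy(true_codes, predicted_codes, 4)
print(f"level 1: {acc_level1:.2%}, level 4: {acc_level4:.2%}")
```

Under this prefix assumption, fourth-level accuracy can never exceed first-level accuracy, which is consistent with the ordering of the two reported figures.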

Cite this article

Guo Limin. Study of Automatic Classification of Literature Based on Convolution Neural Network [J]. Tushu yu Qingbao (图书与情报), 2017, 37(06): 96-103. DOI: 10.11968/tsyqb.1003-6938.2017119

Abstract

With the rapid development of artificial intelligence, automatic literature classification is shifting from rule-based methods to machine learning. After an outline of deep learning, the paper introduces convolutional neural networks into automatic classification and constructs a multi-level model based on titles and keywords, so that a CLC number is assigned automatically. The deep learning model was built in TensorFlow and trained on about 1,700,000 records from the National Newspaper Index. More than 7,000 documents were then classified with the model. Under production conditions, the accuracy of first-level classification is 75.39% and the accuracy of fourth-level classification is 57.61%. When the confidence is 0.9, the first-level correct rate is 43.98% and the error rate is 1.96%; the fourth-level correct rate is 25.66% and the error rate is 5.11%. This shows that the model can help semi-automate classification for the National Newspaper Index and address related problems.
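The confidence-threshold figures can be read as follows; this is a minimal sketch, not the paper's evaluation code. It assumes the correct rate and error rate are both fractions of all test documents, with predictions below the threshold left to human cataloguers. The function name `thresholded_rates` and the toy predictions are hypothetical.

```python
def thresholded_rates(predictions, threshold=0.9):
    """Return (correct_rate, error_rate, deferred_rate) over all documents.

    predictions: list of (confidence, is_correct) pairs.
    A prediction counts only if its confidence reaches the threshold;
    everything else is deferred to manual classification.
    """
    total = len(predictions)
    if total == 0:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for conf, ok in predictions if conf >= threshold and ok)
    wrong = sum(1 for conf, ok in predictions if conf >= threshold and not ok)
    deferred = total - correct - wrong
    return correct / total, wrong / total, deferred / total


# Toy predictions (made up for illustration).
toy = [(0.95, True), (0.92, True), (0.91, False), (0.60, True), (0.40, False)]
correct_rate, error_rate, deferred_rate = thresholded_rates(toy, threshold=0.9)

# Applying the same reading to the reported first-level and fourth-level figures.
accepted_l1 = 0.4398 + 0.0196            # share of documents labeled automatically
precision_l1 = 0.4398 / accepted_l1      # share of those labels that are correct
accepted_l4 = 0.2566 + 0.0511
precision_l4 = 0.2566 / accepted_l4
print(f"level 1: accepted {accepted_l1:.2%}, precision {precision_l1:.1%}")
print(f"level 4: accepted {accepted_l4:.2%}, precision {precision_l4:.1%}")
```

If that reading is right, roughly 46% of documents would receive an automatic first-level label at a confidence of 0.9, and about 96% of those labels would be correct, which is what makes a semi-automated workflow plausible.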