Special Topic: Research Hotspot Identification and Frontier Prediction Based on Natural Language Processing

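The Criterion-Related Validity of Research Hotspots Identified by Co-word Analysis: A Study Based on Natural Language Processing*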

  • Yang Li, Zhang Tongtong, Zhou Wenjie
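  • 1. School of Business, Northwest Normal University
Yang Li (b. 1978), female, lecturer at the School of Business, Northwest Normal University; Zhang Tongtong (b. 1994), female, master's student at the School of Business, Northwest Normal University; Zhou Wenjie (b. 1973), male, professor at the School of Business, Northwest Normal University.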

Received: 2018-02-20

  Published online: 2018-03-14

Funding

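*This paper is one of the outcomes of the National Natural Science Foundation of China project "Research on the Reliability and Validity of Scientometrics Based on Co-word Analysis" (Grant No. 71563042).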



Abstract

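Using natural language processing, this study segmented the titles, abstracts, and full texts of the sample literature into words and, together with the keywords, extracted the high-frequency terms from each of these four units of analysis. Research hotspots were then identified for each unit with two software tools, Pajek and Sci2, using eight commonly used indicators (algorithms). Next, taking the full text as the criterion, correlation analysis and paired-samples t-tests were used to examine the concurrent validity of the hotspots identified from titles, abstracts, and keywords. The findings are: (1) hotspots identified from abstracts have the highest concurrent validity, whereas those identified from keywords have relatively low concurrent validity and therefore carry some validity risk; (2) for hotspot identification, running text yields higher concurrent validity than individual words, and the length of the text has some influence on concurrent validity.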
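The co-word step described above (high-frequency terms, then a co-occurrence network scored by an indicator) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each unit (title, abstract, full text, or keyword list) has already been segmented into a token list, counts term frequency once per document, and uses weighted degree as one example indicator. The abstract does not name the eight indicators used in Pajek and Sci2, so the function names and the toy documents here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def high_frequency_terms(docs, k):
    """Return the k most frequent terms, counting each term once per document."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def co_occurrence(docs, vocab):
    """Count, for each pair of vocabulary terms, the documents containing both."""
    vocab_set = set(vocab)
    pairs = Counter()
    for tokens in docs:
        present = sorted(vocab_set.intersection(tokens))
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


def weighted_degree(pairs, vocab):
    """Sum of co-occurrence weights per term: one simple centrality indicator."""
    score = {term: 0 for term in vocab}
    for (a, b), weight in pairs.items():
        score[a] += weight
        score[b] += weight
    return score


if __name__ == "__main__":
    # Hypothetical, already-segmented documents (e.g. keyword lists).
    docs = [
        ["nlp", "co-word", "hotspot"],
        ["co-word", "hotspot", "validity"],
        ["nlp", "hotspot"],
    ]
    vocab = high_frequency_terms(docs, k=3)
    pairs = co_occurrence(docs, vocab)
    print(vocab, weighted_degree(pairs, vocab))
```

Running the same functions once per unit of analysis gives one indicator score per term and unit, which is the kind of output the validity comparison then works on. In practice a tool such as Pajek or Sci2 would compute the network indicators.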

Cite this article

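Yang Li, Zhang Tongtong, Zhou Wenjie. The Criterion-Related Validity of Research Hotspots Identified by Co-word Analysis: A Study Based on Natural Language Processing[J]. 图书与情报 (Library and Information), 2018, 38(01): 15-19. DOI: 10.11968/tsyqb.1003-6938.2018003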

Abstract

The present study applied Natural Language Processing (NLP) to the titles, abstracts, and full texts of the sample literature and, together with the keywords, extracted the high-frequency words from each of these four units in order to identify research hotspots with Pajek and Sci2. It then assessed criterion-related validity through correlation analysis and paired-samples t-tests: taking the hotspots identified from the full text as the criterion, it compared the correlation coefficients and t-values between the full text and the titles, abstracts, and keywords to determine the concurrent validity of co-word analysis in identifying research hotspots. The findings are: a) hotspots identified from abstracts have higher concurrent validity than those identified from keywords; b) for identifying research hotspots, text outperforms individual words in terms of concurrent validity, although validity is affected by the length of the sampled text.
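The validity check above compares each unit against the full-text criterion with a correlation coefficient and a paired-samples t statistic. The following sketch assumes the compared values are per-term indicator scores aligned term by term (the abstract does not state exactly which values were paired). It computes Pearson's r and the paired t statistic with its degrees of freedom using only the standard library. The function names and the sample scores are hypothetical. For p-values, SciPy's `scipy.stats.pearsonr` and `scipy.stats.ttest_rel` compute the same statistics.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two aligned score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x versus y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    # t = mean difference divided by its standard error.
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Hypothetical indicator scores for the same four terms.
    full_text = [3, 5, 4, 6]   # criterion
    abstract = [1, 4, 4, 3]    # unit under test
    print(pearson_r(abstract, full_text), paired_t(full_text, abstract))
```

In this reading, a high r together with a non-significant t would suggest that a unit reproduces the full-text hotspot ranking without systematic differences in scale.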