Against the backdrop of China's strong national push for artificial intelligence (AI) application scenarios, this study uses BERTopic to build and examine a topic model of those scenarios. News data were collected from The Paper (Pengpai News); after cleaning, 3,524 news articles remained. For topic modeling, the pre-trained Conan-embedding-v1 large model produced the text embeddings; UMAP, HDBSCAN, and c-TF-IDF then handled dimensionality reduction, clustering, and topic representation, respectively; and a KeyBERT-based fine-tuning step refined the topic keywords. The analysis examines the keywords of 11 topics: technological R&D, cultural digitization, regional economic collaboration, economic development, financial innovation, capital markets, healthcare and elderly care, policy coordination, low-altitude economy, urban development, and news dissemination. Topic similarity analysis indicates strong associations among topics: urban development is highly similar to economic development, financial innovation, and policy coordination, and economic development is highly similar to financial innovation and policy coordination. Hierarchical clustering and document distribution analysis indicate that technological R&D, cultural digitization, and policy coordination overlap with the other topics to varying degrees. To some extent, the results reveal the current state of the scenarios, the latent demands, and the related factors that give rise to disruptive AI applications.
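The topic representation and similarity steps named above can be illustrated with a minimal, dependency-free sketch. It assumes the class-based TF-IDF weighting documented for BERTopic, in which a term's L1-normalized frequency within a topic is multiplied by log(1 + A / f_t), where A is the average number of tokens per topic and f_t is the term's frequency across all topics. The toy token lists, the topic labels, and the function names `c_tf_idf` and `cosine` are hypothetical and are not taken from the study. The abstract also does not state which topic representation the similarity analysis used, so the cosine similarity over c-TF-IDF vectors shown here is only one plausible choice.

```python
import math
from collections import Counter


def c_tf_idf(class_tokens):
    """Compute class-based TF-IDF weights.

    class_tokens maps a topic label to the list of tokens obtained by
    concatenating all documents assigned to that topic.
    Returns a dict: topic -> {term: weight}.
    """
    # Term counts per topic.
    counts = {topic: Counter(tokens) for topic, tokens in class_tokens.items()}

    # f_t: frequency of each term summed over all topics.
    total = Counter()
    for topic_counts in counts.values():
        total.update(topic_counts)

    # A: average number of tokens per topic.
    avg_len = sum(len(tokens) for tokens in class_tokens.values()) / len(class_tokens)

    weights = {}
    for topic, topic_counts in counts.items():
        n_tokens = sum(topic_counts.values())
        weights[topic] = {
            term: (count / n_tokens) * math.log(1 + avg_len / total[term])
            for term, count in topic_counts.items()
        }
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: three topics, tokens already segmented.
toy_topics = {
    "urban_development": ["city", "smart", "traffic", "city", "governance", "policy"],
    "policy_coordination": ["policy", "plan", "governance", "policy", "support", "city"],
    "healthcare": ["hospital", "diagnosis", "elderly", "care", "hospital", "imaging"],
}

weights = c_tf_idf(toy_topics)

# Highest-weighted term per topic, used as a crude topic label.
top_terms = {topic: max(w, key=w.get) for topic, w in weights.items()}

# Pairwise similarity between topics that share vocabulary vs. topics that do not.
sim_urban_policy = cosine(weights["urban_development"], weights["policy_coordination"])
sim_urban_health = cosine(weights["urban_development"], weights["healthcare"])
```

In this toy setup the two topics that share vocabulary (urban development and policy coordination) have a positive similarity, while the healthcare topic, which shares no terms with urban development, has a similarity of zero. A real analysis would run on segmented Chinese text and on far larger vocabularies.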