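Supervised by: Ministry of Education
Sponsored by: Renmin University of China
ISSN 1002-8587  CN 11-2765/K
Journal funded by the National Social Science Fund of China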

journal6 ›› 2016, Vol. 0 ›› Issue (4): 26-35.

• Scholarly Articles •

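Digitization, Datafication, and Text Mining of Local Historical Documents: The Case of the Chinese Local Historical Documents Database (《中国地方历史文献数据库》)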

  

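  1. School of Humanities, Shanghai Jiao Tong University
  • Publication date: 2016-11-15 Release date: 2016-11-15
  • About the author: ZHAO Siyuan (b. 1985), male, Lecturer, Department of History, School of Humanities, Shanghai Jiao Tong University, Shanghai 200240; titaner@sjtu.edu.cn
  • Funding:

    This article is an interim result of the Shanghai Chenguang Program project "Land Markets and Social Networks in Nineteenth-Century Rural Huizhou" (Project No. 14CGA013).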

The Digitization of Local Historical Archives, Creation of Metadata, and Text Mining: The Example of The Chinese Local History Archive

  1. History Department, Shanghai Jiao Tong University
  • Online: 2016-11-15 Published: 2016-11-15
  • About author: ZHAO Siyuan (History Department, Shanghai Jiao Tong University; titaner@sjtu.edu.cn)

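Abstract (translated from Chinese): Historical document databases fall into three distinct forms: digitization, datafication, and text mining. To date, most Chinese-language historical document databases have achieved digitization, some have partly achieved datafication, and very few support text mining. Digitization converts documents from physical to electronic form. Datafication converts documents into data that can be analyzed quantitatively, chiefly by compiling metadata. Text mining builds text-analysis tools on that foundation. The Chinese Local Historical Documents Database (《中国地方历史文献数据库》) is grounded in documentary scholarship, establishes a purpose-built metadata structure, and provides functions such as cross-navigation and data statistics. These functions not only help researchers find the documents they need but may also help them discover new research topics. In historical research, the database should be regarded as a new form of document, and a documentary methodology tailored to it should be established.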

Abstract: This article examines three distinct concepts that distinguish historical databases: digitization, creation of metadata, and text mining. Digitization transforms archives into electronic texts that encompass data and metadata, which can then be used for text mining, databases, and other analytic tools. The database of Local Chinese Historical Archives was designed around these principles of metadata creation and text mining, and it uses a modified metadata structure based on the Dublin Core. Cross-searching and statistics are also available in the database. The database not only facilitates researchers' efforts to find data for their research but also helps them discover new topics. The article also suggests that a corresponding new methodology of archival criticism should be applied to the database.