Theoretical Foundations of Data-Driven Linguistics
07-09-2025

Social Sciences in China (Chinese Edition)

No. 4, 2025

 


(Abstract)

 

Liu Haitao

 

Linguistic research in the era of data-based intelligence should be grounded in authentic language data, rooted in the fundamental nature of human language, and enriched by the accumulated insights of prior linguistic scholarship. The necessity and feasibility of extracting linguistic patterns from language data are evident, with linearity and systematicity serving as the two core principles of data-driven linguistics. This approach conceptualizes language as a human-driven probabilistic system, focusing on its probabilistic nature to explore both linear laws and systematic structural patterns of language. It examines the relationship between linear word chains and two-dimensional network structures. At the same time, it advances a dual perspective attentive to both human and machine needs in the context of data-based intelligence. Through the synergistic combination of data-driven and data-based methodologies, it employs systems science approaches to uncover the operational principles of human language systems and examine their applicability in artificial intelligence (AI) and related fields. This endeavor ultimately aims to establish a scientific discipline of “speech dynamics,” contributing to the development of explainable AI and deepening our understanding of the operational mechanisms of human language systems.