Statistical laws in linguistics (2024)

[Submitted on 11 Feb 2015]

Abstract: Zipf's law is just one of many universal laws proposed to describe statistical regularities in language. Here we review and critically discuss how these laws can be statistically interpreted, fitted, and tested (falsified). The modern availability of large databases of written text allows for tests with unprecedented statistical accuracy and also for a characterization of the fluctuations around the typical behavior. We find that fluctuations are usually much larger than expected based on simplifying statistical assumptions (e.g., independence and lack of correlations between observations). These simplifications also appear in the usual statistical tests, so the large fluctuations can be erroneously interpreted as a falsification of the law. Instead, we argue here that linguistic laws are only meaningful (falsifiable) if accompanied by a model for which the fluctuations can be computed (e.g., a generative model of the text). The large fluctuations we report show that the constraints imposed by linguistic laws on the creative process of text generation are not as tight as one might expect.
Comments: Proceedings of the Flow Machines Workshop: Creativity and Universality in Language, Paris, June 18 to 20, 2014
Subjects: Physics and Society (physics.soc-ph); Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Cite as: arXiv:1502.03296 [physics.soc-ph]
(or arXiv:1502.03296v1 [physics.soc-ph] for this version)
https://doi.org/10.48550/arXiv.1502.03296

Related DOI: https://doi.org/10.1007/978-3-319-24403-7_2

Submission history

From: Eduardo G. Altmann
[v1] Wed, 11 Feb 2015 13:10:58 UTC (2,138 KB)

FAQs

What is the meaning of statistical law? ›

An empirical statistical law or (in popular terminology) a law of statistics represents a type of behaviour that has been found across a number of datasets and, indeed, across a range of types of data sets.

What are the linguistic laws? ›

Linguistic laws refer to statistical patterns shared across human languages. Investigation of these patterns has been extended to a range of biological systems, from molecules to organisms to ecosystems, with the number of studies increasing in recent years.
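
Zipf's law, the example named in the abstract above, says that the frequency of a word falls off roughly as a power of its frequency rank. The sketch below builds a rank-frequency list and estimates the exponent with a least-squares line in log-log space. The function names and the toy counts are illustrative assumptions, and the least-squares fit is only a crude first look: the abstract itself warns that fits resting on simplifying assumptions such as independence can mislead.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Toy counts that follow f(r) = 1200 / r exactly (illustrative, not real data).
toy_freqs = [1200 // r for r in (1, 2, 3, 4, 5, 6)]
slope = loglog_slope(toy_freqs)  # close to -1 for an exact 1/r law

# Counting words in a tiny hypothetical token list.
counts = rank_frequency("a b a c a b".split())
```

On real text the estimated slope depends strongly on which ranks are included, which is one reason a bare slope should not be read as a test of the law.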

What is the role of statistics in linguistics? ›

Linguists rely on statistical methods for a whole host of reasons. For one, linguistic differences are often subtle. It is hardly the case that some speaker group, some social class, some genre, etc. uses an expression that others never use.

What are the fundamental laws of statistics? ›

The two fundamental laws that govern statistics are known as the law of statistical regularity and the law of inertia of large numbers. The law of statistical regularity is a law that helps us draw a sample with the assumption that any inferences drawn on the sample are applicable to the entire population.

Why are statistical laws not exact? ›

Statistical laws are not exact. In fact, the results are true only on averages. Also, they are valid only under a certain set of assumptions. Therefore, the science of statistics is less exact than natural sciences like physics, chemistry, etc.

What is the law of statistical significance? ›

Statistical significance is used to provide evidence concerning the plausibility of the null hypothesis, which hypothesizes that there is nothing more than random chance at work in the data. Statistical hypothesis testing is used to determine whether the result of a data set is statistically significant.
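
As one possible illustration of a test against a "random chance" null hypothesis, here is a minimal sketch of a two-sample permutation test on the difference of means. The function name, the default number of permutations, and the data are assumptions made for the example. The test treats observations as exchangeable (independent), which is exactly the kind of simplification the abstract above cautions about for text data.

```python
import random

def permutation_p_value(a, b, n_perm=10000, seed=0):
    """Estimate how often a random relabelling of the pooled data gives
    a mean difference at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)

# Hypothetical measurements from two clearly separated groups.
p_separated = permutation_p_value([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16])

# Identical groups: every relabelling is at least as extreme as the observed 0.
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
```

A small p-value says the data are unlikely under the shuffling null, not that any particular alternative explanation is correct.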

What are some examples of linguistic rules? ›

Linguistic Rules
  • A long vowel shortens before: ...
  • vowel-s-vowel > vowel-r-vowel. ...
  • c + s = x. ...
  • d or t (dental) + s = long vowel/syllable -s or -ss (with loss of d/t) ...
  • b (voiced) + s (unvoiced) = -ps- (both unvoiced) ...
  • g (voiced) + t (unvoiced) = -ct- (both unvoiced) ...
  • b (voiced) + t (unvoiced) = -pt- (both unvoiced)

What is law and corpus linguistics? ›

Law and corpus linguistics (LCL) is an academic sub-discipline that uses corpora, large databases of examples of language usage equipped with tools designed by linguists, to better determine the meaning of words and phrases in legal texts (statutes, constitutions, contracts, etc.).

What are the 3 main linguistic areas? ›

Important subfields of linguistics include:

  • Morphology - the study of word structure.
  • Syntax - the study of sentence structure.
  • Semantics - the study of linguistic meaning.

What are statistical methods in linguistics? ›

A large part of what is meant by “statistical methods” in computational linguistics is the study of stochastic grammars of this form: grammars obtained by adding probabilities in a fairly transparent way to “algebraic” (i.e., non-probabilistic) grammars.
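
To make "adding probabilities to a grammar" concrete, the sketch below attaches a probability to each rewrite rule of a tiny context-free grammar and scores a derivation as the product of the probabilities of the rules it uses. The grammar, its probabilities, and the function names are all hypothetical choices for illustration.

```python
from math import prod

# Hypothetical toy grammar: each left-hand side maps to its possible
# right-hand sides, and the probabilities for one left-hand side sum to 1.
PCFG = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("she",), 0.6), (("fish",), 0.4)],
    "VP": [(("V", "NP"), 0.7), (("V",), 0.3)],
    "V": [(("eats",), 1.0)],
}

def rule_probability(lhs, rhs):
    """Look up the probability of the rule lhs -> rhs (0.0 if absent)."""
    for candidate, p in PCFG[lhs]:
        if candidate == tuple(rhs):
            return p
    return 0.0

def derivation_probability(rules):
    """Probability of a derivation: product of its rule probabilities."""
    return prod(rule_probability(lhs, rhs) for lhs, rhs in rules)

# Derivation of "she eats fish".
derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("she",)),
    ("VP", ("V", "NP")),
    ("V", ("eats",)),
    ("NP", ("fish",)),
]
p = derivation_probability(derivation)  # 1.0 * 0.6 * 0.7 * 1.0 * 0.4
```

Dropping the probabilities from `PCFG` leaves an ordinary non-probabilistic grammar, which is the sense in which the probabilities are added "transparently".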

What is statistical learning in linguistics? ›

Statistical learning is the ability for humans and other animals to extract statistical regularities from the world around them to learn about the environment. Although statistical learning is now thought to be a generalized learning mechanism, the phenomenon was first identified in human infant language acquisition.
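
One regularity often discussed in this context is the transitional probability between adjacent syllables, the count of the pair "A then B" divided by the count of A. The sketch below computes it for a made-up syllable stream; the nonsense words (bidaku, padoti, golabu) and the function name are assumptions for illustration, not data from any study.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Map each adjacent pair (a, b) to count(a then b) / count(a)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The last syllable has no successor, so it is left out of the base counts.
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical stream built from three nonsense words in varying order.
stream = ("bi da ku pa do ti bi da ku go la bu "
          "pa do ti go la bu bi da ku").split()
tp = transitional_probabilities(stream)

within_word = tp[("bi", "da")]   # "da" always follows "bi" in this stream
across_words = tp[("ku", "pa")]  # "ku" is followed by "pa" only half the time
```

In this toy stream, transitions inside a word are fully predictable while transitions across word boundaries are not, which is the kind of cue a learner could use to find word boundaries.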

How is statistics used in language analysis? ›

Statistics. The second most commonly used language analysis technique is statistics. It's easier to identify and comment on because it presents itself every time the author includes numbers (mostly percentages). Statistics is often used to lend credibility or evidence to the author's argument.

What are the two laws of statistics? ›

There are two fundamental laws of statistics namely, Law of Statistical Regularity and Law of Inertia of Large Numbers. These laws are very important when an investigator adopts sampling method in an enquiry.

What is the basic law of statics? ›

Statics assumes that the bodies with which it deals are perfectly rigid. It also holds that the sum of all the forces acting on a body at rest has to be zero (i.e., the forces involved balance one another) and that there must be no tendency for the forces to turn the body about any axis.
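
The two conditions in this answer (zero net force and no tendency to turn) can be checked numerically. This is a minimal two-dimensional sketch; the function name, the tolerance, and the beam example are assumptions made for illustration.

```python
def is_in_equilibrium(forces, tol=1e-9):
    """forces: list of ((x, y), (fx, fy)) pairs, i.e. point of application
    and force vector. Checks zero net force and zero net torque about
    the origin."""
    net_fx = sum(f[0] for _, f in forces)
    net_fy = sum(f[1] for _, f in forces)
    # 2D torque about the origin: x * fy - y * fx for each force.
    net_torque = sum(x * fy - y * fx for (x, y), (fx, fy) in forces)
    return all(abs(v) <= tol for v in (net_fx, net_fy, net_torque))

# Hypothetical beam: supports at x = 0 and x = 2 push up with 5 N each,
# and a 10 N weight acts downward at the midpoint x = 1.
beam = [((0, 0), (0, 5)), ((1, 0), (0, -10)), ((2, 0), (0, 5))]
balanced = is_in_equilibrium(beam)

# A force couple: the forces cancel, but they still turn the body.
couple = [((1, 0), (0, 1)), ((-1, 0), (0, -1))]
turning = is_in_equilibrium(couple)
```

The couple shows why both conditions are needed: its forces sum to zero, yet the net torque is nonzero.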

What is the golden rule of statistics? ›

The statistical golden rule (SGR) is the average of the two golden ratio expressions, in which the quantities a and b are, say, science units (e.g., measured in talent, time, mental strength, etc.) and art units (corresponding to the science units) employed during a statistical undertaking.

What is the meaning of statistical purposes in law? ›

Statistical purposes mean any operation of collection and the processing of personal data necessary for statistical surveys or for the production of statistical results.

What do you mean by statistical mean? ›

The mean (average) of a data set is found by adding all numbers in the data set and then dividing by the number of values in the set. The median is the middle value when a data set is ordered from least to greatest. The mode is the number that occurs most often in a data set.
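
The three definitions can be checked on a small data set with Python's standard `statistics` module. The numbers below are arbitrary and chosen only for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # arbitrary example values, already ordered

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(data)  # even count: average of 3 and 5 = 4.0
mode = statistics.mode(data)      # 3 occurs twice, more than any other value
```

With an even number of values there is no single middle value, so the median is the average of the two central ones.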

What is an example of statistical? ›

For example, if we consider one math class to be a sample of the population of all math classes, then the average number of points earned by students in that one math class at the end of the term is an example of a statistic. The statistic is an estimate of a population parameter.
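
The distinction between a statistic and a population parameter can be sketched in a few lines. The population of point totals, the sample size, and the seed are all hypothetical values chosen for the example.

```python
import random

# Hypothetical population: point totals 1..100, so the true mean is known.
population = list(range(1, 101))
parameter = sum(population) / len(population)  # population mean, 50.5

# Draw one sample of 25 values without replacement.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 25)

# The sample mean is a statistic: it estimates the parameter
# but generally differs from it from sample to sample.
statistic = sum(sample) / len(sample)
estimation_error = statistic - parameter
```

Rerunning with a different seed gives a different statistic while the parameter stays fixed.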

Article information

Author: Nathanial Hackett