Article

Entropy Evaluation Based on Value Validity

by
Tarald O. Kvålseth
Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
Entropy 2014, 16(9), 4855-4873; https://0-doi-org.brum.beds.ac.uk/10.3390/e16094855
Submission received: 4 July 2014 / Revised: 6 August 2014 / Accepted: 18 August 2014 / Published: 5 September 2014
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Besides its importance in statistical physics and information theory, the Boltzmann-Shannon entropy S has become one of the most widely used and misused summary measures of various attributes (characteristics) in diverse fields of study. It has also been the subject of extensive and perhaps excessive generalizations. This paper introduces the concept and criteria for value validity as a means of determining if an entropy takes on values that reasonably reflect the attribute being measured and that permit different types of comparisons to be made for different probability distributions. While neither S nor its relative entropy equivalent S* meet the value-validity conditions, certain power functions of S and S* do to a considerable extent. No parametric generalization offers any advantage over S in this regard. A measure based on Euclidean distances between probability distributions is introduced as a potential entropy that does comply fully with the value-validity requirements and its statistical inference procedure is discussed.

1. Introduction

Consider that $p_1, \ldots, p_n$, with $\sum_{i=1}^{n} p_i = 1$, are the probabilities of a set of n quantum states accessible to a system or of a set of n mutually exclusive and exhaustive events of some statistical experiment. Thus, $p_i$ is the probability of the system being in state i or of event i occurring ($i = 1, \ldots, n$). The entropy (of the system or set of events) is then defined as:
$$S = -k\sum_{i=1}^{n} p_i \log p_i \tag{1}$$
where k is some positive constant and where the logarithm is the natural one. In statistical mechanics, k may be Boltzmann's constant, while, in information theory, $k = 1/\log 2$ so that $S = -\sum_{i=1}^{n} p_i \log_2 p_i$ and the unit of measurement becomes bits as introduced by Shannon [1]. When deriving Equation (1) axiomatically from some basic required properties (axioms), k becomes an arbitrary constant (e.g., [2,3]). For convenience, we shall set k = 1 throughout this paper.
The entropy S, which provides a link between statistical mechanics and information theory, is interpreted somewhat differently in the two fields. In statistical mechanics, entropy is often considered to be a measure of the disorder of a system, although it may be argued that a more appropriate measure of disorder is the following dimensionless relative entropy [3] (pp. 366–357):
$$S^* = \frac{S}{\log n} = -\sum_{i=1}^{n} p_i \log_n p_i \in [0, 1] \tag{2}$$
In information theory, S is typically interpreted as a measure of the uncertainty, information content, or randomness of a set of events, while $S^*$ in Equation (2) is considered a measure of the efficiency of a noise-free communication channel and $1 - S^*$ a measure of its redundancy [4] (pp. 109–110).
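As an illustrative aside (not part of the original paper), the two measures in Equations (1) and (2) are straightforward to compute; the following minimal Python sketch, with function names of my own choosing, evaluates S (with k = 1) and S* for an arbitrary finite distribution.

```python
import numpy as np

def shannon_entropy(p):
    """Boltzmann-Shannon entropy S of Equation (1) with k = 1 (natural logarithm).
    Zero probabilities are skipped, consistent with zero-indifference (Property P3)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def relative_entropy(p):
    """Relative entropy S* = S / log n of Equation (2), lying in [0, 1] for n > 1."""
    return shannon_entropy(p) / np.log(len(p))

# Distributions that reappear in Section 2.2
print(round(shannon_entropy([0.90, 0.05, 0.05]), 2))  # ~0.39
print(round(shannon_entropy([0.70, 0.30]), 2))        # ~0.61
print(round(relative_entropy([0.70, 0.30]), 2))       # ~0.88
```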
Boltzmann [5] had used the function S in Equation (1) (or its continuous analog), but what Shannon [1] "did was to give a universal meaning to the function $-\sum p_i \log p_i$ and thereby make it possible to find other applications [6] (p. 476)". This function has indeed proved to be remarkably versatile and has been used as a measure of a variety of attributes in various fields of study, ranging from ecology (e.g., [7]) to psychology (e.g., [8]). It has also resulted in literally infinitely many alternative entropy formulations and generalizations, such as the parameterized families of entropies given in Table 1, of each of which the S in Equation (1) is a particular member. The real utility or contribution of those generalization efforts may be questioned, with some calling them "mindless curve-fitting" and stating that "The ratio of papers to ideas has gone to infinity" [9].
This paper is concerned with the use and misuse of S and S* in Equations (1) and (2) and of other proposed entropies. Whatever an entropy measure is being used for, it is not uncommon for comparisons to be made between differences in entropy values and for statements or implications to occur about the absolute and relative values of the attributes (characteristics) being measured by means of the entropy. This can lead to incorrect and misleading results and conclusions unless certain conditions are met, as discussed in this paper. If, using a simplified notation, $e_1, e_2, \ldots$ denote the values of a generic entropy E for the probability distributions $P_n = (p_1, \ldots, p_n)$, $Q_m = (q_1, \ldots, q_m)$, ..., the various types of potential comparisons may be defined as follows:
$$\text{Size (order) comparison:}\quad e_1 > e_2 \tag{3a}$$
$$\text{Difference comparison:}\quad e_1 - e_2 > e_3 - e_4 \tag{3b}$$
$$\text{Proportional difference comparison:}\quad e_1 - e_2 > c\,(e_3 - e_4) \tag{3c}$$
where c is a constant.
In particular, we shall address the following fundamental questions: Which conditions on an entropy are required for the comparisons in Equation (3) to be valid or permissible? Does S or S* in Equations (1) and (2) meet such valid comparison conditions, and if not, are there functions of S or S* that do? Do any of the entropy families in Table 1 have members that are superior to S in this regard? If none of those entropies meet such conditions, is there an alternative entropy formulation that does?

2. Entropy Properties

2.1. Properties of S

Although the properties of $S(P_n)$, or simply S, in Equation (1) are discussed in various textbooks (e.g., [10,24]), they will be briefly outlined here so that we can conveniently refer to them throughout this paper. Some of the most important ones are as follows:
(P1)
S is a continuous function of all its arguments $p_1, \ldots, p_n$ (so that small changes in some of the $p_i$ result in only a small change in the value of S).
(P2)
S is (permutation) symmetric in the pi (i = 1,..., n).
(P3)
S is zero-indifferent (expansible), i.e., the addition of some state(s) or event(s) with zero probability does not change the value of S, or formally:
$$S(p_1, \ldots, p_n, 0, \ldots, 0) = S(p_1, \ldots, p_n)$$
(P4)
S attains its extremal values for the two probability distributions:
$$P_n^0 = (1, 0, \ldots, 0), \qquad P_n^1 = \left(\tfrac{1}{n}, \ldots, \tfrac{1}{n}\right) \tag{4}$$
so that, for any distribution $P_n = (p_1, \ldots, p_n)$:
$$S(P_n^0) \le S(P_n) \le S(P_n^1)$$
(P5)
$S(P_n^1)$ is strictly increasing in n for $P_n^1$ in Equation (4).
(P6)
S is strictly Schur-concave and hence, if $P_n$ is majorized by $Q_n$ (majorization denoted by $\prec$):
$$P_n \prec Q_n \Rightarrow S(P_n) \ge S(Q_n)$$
with strict inequality unless $Q_n$ is simply a permutation of $P_n$.
(P7)
S is additive in the following sense. If $\{p_{ij}\}$ is the joint probability distribution for the quantum states of two parts of a system or for the events of two statistical experiments, with marginal probability distributions $\{p_{i+}\}$ and $\{p_{+j}\}$, where $p_{i+} = \sum_{j=1}^{m} p_{ij}$ and $p_{+j} = \sum_{i=1}^{n} p_{ij}$ for $i = 1, \ldots, n$ and $j = 1, \ldots, m$, then, under independence:
$$S(\{p_{ij}\}) = S(\{p_{i+} p_{+j}\}) = S(\{p_{i+}\}) + S(\{p_{+j}\})$$
Most of these properties would seem to be necessary and desirable for any entropy. One could argue about the absolute necessity of Property P7 (e.g., [25]); among the families of entropies in Table 1, only S1 and S4 have this property. The essential Property P6 is a precise way of stating that the value of S increases as the components of a probability distribution become "more nearly equal", i.e., $S(P_n) > S(Q_n)$ if the components of $P_n$ are "more nearly equal" or "less spread out" than those of $Q_n$. In terms of majorization, and by definition [26], if the components of $P_n$ are ordered such that:
$$p_1 \ge p_2 \ge \cdots \ge p_n \tag{6}$$
and similarly for $Q_n$, then:
$$P_n \prec Q_n \quad \text{if } \sum_{i=1}^{j} p_i \le \sum_{i=1}^{j} q_i, \quad j = 1, \ldots, n-1 \tag{7}$$
with $\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} q_i = 1$. Of course, not all $P_n$ and $Q_n$ are comparable with respect to majorization. A small computational check of this relation is sketched below.
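The sketch that follows is my own illustration (not from the paper) of the majorization test in Equations (6) and (7) and of the Schur-concavity Property P6; it assumes both distributions sum to one.

```python
import numpy as np

def majorizes(q, p):
    """Return True if p is majorized by q (p < q in the majorization order), per
    Equations (6)-(7): with both sequences sorted in decreasing order, every partial
    sum of p is <= the corresponding partial sum of q (equal totals assumed)."""
    p = np.sort(np.asarray(p, float))[::-1]
    q = np.sort(np.asarray(q, float))[::-1]
    return bool(np.all(np.cumsum(p)[:-1] <= np.cumsum(q)[:-1] + 1e-12))

def S(p):
    """Entropy of Equation (1) with k = 1."""
    p = np.asarray(p, float); p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

p, q = [0.4, 0.3, 0.3], [0.6, 0.3, 0.1]
print(majorizes(q, p), S(p) > S(q))   # p is majorized by q, so Property P6 gives S(p) > S(q)
```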

2.2. Valid Comparison Conditions

If an entropy has the above Properties P1–P6, there would seem to be no particular reason to doubt that size (order) comparisons are reasonable or permissible. Thus, for S in Equation (1) with k = 1 and for, say, $P_3^{(1)} = (0.90, 0.05, 0.05)$ and $P_2^{(2)} = (0.70, 0.30)$, so that $S(P_3^{(1)}) = 0.39$ and $S(P_2^{(2)}) = 0.61$, it would be reasonable to conclude that the disorder or uncertainty is greater in the second case than in the first. However, for the additional probability distributions $P_2^{(3)} = (0.8, 0.2)$ and $P_4^{(4)} = (0.70, 0.15, 0.10, 0.05)$, the result $S(P_2^{(2)}) - S(P_3^{(1)}) = 0.22$ and $S(P_4^{(4)}) - S(P_2^{(3)}) = 0.41$ simply states that the difference in S-values of 0.22 is less than that of 0.41. There is, however, no basis for assuming or suggesting that this result necessarily reflects the true differences in the disorder of the four systems or the uncertainty of the four sets of events. For such comparisons to be valid, additional conditions need to be imposed. We shall determine such validity conditions in a couple of different ways.
In measurement theory, “Validity describes how well the measured variable represents the attribute being measured, or how well it captures the concept which is the target of measurement” [27] (p. 129). While there are different forms of validity, we shall use value validity and define it as follows:
Definition: A measure has value validity if all its potential values provide numerical representations of the size (extent) of the attribute being measured that are true or realistic with respect to some acceptable criterion.
To determine the conditions for an entropy to have value validity, we shall use the recently introduced lambda distribution defined as:
$$P_n^\lambda = \left(1 - \lambda + \frac{\lambda}{n}, \frac{\lambda}{n}, \ldots, \frac{\lambda}{n}\right), \quad \lambda \in [0, 1] \tag{8}$$
where λ is a parameter that reflects the uniformity or evenness of the distribution [28]. The $P_n^0$ and $P_n^1$ in Equation (4) are particular (extreme) cases of this distribution. In fact, $P_n^\lambda$ is a weighted mean of $P_n^0$ and $P_n^1$, i.e.:
$$P_n^\lambda = \lambda P_n^1 + (1 - \lambda) P_n^0 \tag{9}$$
For a generic entropy E that is (strictly) Schur-concave (Property P6), and from the majorization $P_n^1 \prec P_n \prec P_n^0$ for any given $P_n$, as is easily verified from Equations (6) and (7), it follows that:
$$E(P_n) = E(P_n^\lambda) \quad \text{for a unique } \lambda \tag{10}$$
Consequently, validity conditions on $E(P_n)$ can equivalently be formulated in terms of $E(P_n^\lambda)$.
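To make Equations (8)–(10) concrete, the following sketch (my own, not from the paper) constructs $P_n^\lambda$ and solves Equation (10) for the unique λ by bisection, exploiting the fact that $S(P_n^\lambda)$ increases continuously and strictly from 0 to log n as λ goes from 0 to 1.

```python
import numpy as np

def lambda_distribution(n, lam):
    """Lambda distribution of Equation (8): (1 - lam + lam/n, lam/n, ..., lam/n)."""
    p = np.full(n, lam / n)
    p[0] = 1.0 - lam + lam / n
    return p

def entropy(p):
    """S of Equation (1) with k = 1; zero probabilities are ignored."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def lambda_of(p, tol=1e-10):
    """Solve Equation (10): the unique lam with S(P_n^lam) = S(p), found by bisection."""
    n, target = len(p), entropy(p)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if entropy(lambda_distribution(n, mid)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example used in Section 3: S(0.6, 0.2, 0.14, 0.06) is matched by P_4^lambda with lambda near 0.5
print(round(lambda_of([0.6, 0.2, 0.14, 0.06]), 3))
```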
By considering $P_n^\lambda$, $P_n^0$, and $P_n^1$ as points (vectors) in n-dimensional space, Euclidean distances are then the logical choice as the basis of a criterion for the value validity of entropy E. Then, the following ratio equality presents itself as the natural and obvious requirement:
$$\frac{E(P_n^1) - E(P_n^\lambda)}{E(P_n^1) - E(P_n^0)} := \frac{d(P_n^\lambda, P_n^1)}{d(P_n^0, P_n^1)} = 1 - \lambda \tag{11}$$
Besides the standard Euclidean distance function d used in Equation (11), the same result 1 − λ would be obtained for all members of the Minkowski class of distance metrics. With $E(P_n^0) = 0$, since there is no disorder or uncertainty when one $p_i = 1$ (and the other $p_i$ equal 0) or when n = 1, Equation (11) can be expressed as:
$$E(P_n^\lambda) = \lambda E(P_n^1) \tag{12}$$
and, in terms of the relative entropy:
$$E^*(P_n^\lambda) = \frac{E(P_n^\lambda)}{E(P_n^1)} = \lambda \tag{13}$$
for all n and λ. This formulation is also an immediate consequence of (9), i.e.:
$$E(P_n^\lambda) = E\left[\lambda P_n^1 + (1-\lambda)P_n^0\right] = \lambda E(P_n^1) + (1-\lambda)E(P_n^0) = \lambda E(P_n^1) \quad \text{for } E(P_n^0) = 0 \tag{14}$$
If we accept $E(P_n^1) = \log n$ as a reasonable maximum entropy for any given n, which is that of S in Equation (1) (with k = 1), then Equation (12) would become:
$$E(P_n^\lambda) = \lambda \log n \tag{15}$$
However, a reasonable and justifiable alternative would clearly be $E(P_n^1) = n - 1$, so that Equation (12) becomes:
$$E(P_n^\lambda) = (n - 1)\lambda \tag{16}$$
Of course, both expressions in Equations (15) and (16) give $E(P_n^\lambda) = 0$ for n = 1, as is only reasonable.
The $E(P_n^1) = n - 1$ and Equation (12) also follow from simple functional equations. With $E(P_n^1) = f(n)$, it seems reasonable and most intuitive to suggest that increasing n by an integer value m (m < n) should result in the same absolute change in the value of the function f as when n is reduced by the same amount m, i.e.:
$$f(n + m) - f(n) = f(n) - f(n - m) \tag{17}$$
The general solution to this functional equation is:
$$f(n) = a + bn \tag{18}$$
where a and b are arbitrary real constants [29] (p. 82). Also, Equation (18) is the solution of Jensen's functional equation for integers [29] (p. 43), i.e.:
$$f\left(\frac{n + m}{2}\right) = \frac{f(n) + f(m)}{2} \tag{19}$$
Since f(1) = 0, Equation (18) becomes f(n) = b(n − 1) and hence $E(P_n^1) = n - 1$ for b = 1.
If, instead of Equation (17), one proposes:
$$f(nm) = f(n) + f(m) \tag{20}$$
then the most general solution would be f(n) = a log n with arbitrary constant a [29] (p. 39). By setting a = 1, and hence $E(P_n^1) = \log n$, Equation (12) becomes Equation (15) instead of Equation (16).
Similarly, for any given (fixed) n, $E(P_n^\lambda)$ becomes a function g of λ only, for which it is proposed that:
$$g(\lambda + \mu) - g(\lambda) = g(\lambda) - g(\lambda - \mu) \tag{21}$$
where μ is such that 0 ≤ λ + μ ≤ 1 and 0 ≤ λ − μ ≤ 1, with the general solution of Equation (21) being:
$$g(\lambda) = c + d\lambda \tag{22}$$
with arbitrary constants c and d [29] (p. 82). Since $E(P_n^0) = g(0) = 0$ and $E(P_n^1) = g(1) = d$, Equation (22) results in Equation (12).
Consequently, different lines of reasoning lead to Equations (12) and (15) or Equation (16) as conditions for an entropy E to have value validity, thereby making the difference comparisons in Equations (3b) and (3c) permissible. The basis for those conditions is the distance criterion in Equation (11), the mean-value relationship in Equation (14), and the difference relationships represented by the functional equations in Equations (17), (19)–(21). Those functional equations also directly support the validity of the comparisons in Equations (3b) and (3c).

3. Value-Valid Functions of S and S*

It is immediately apparent that neither S in Equation (1) nor S* in Equation (2) meets these validity conditions. In fact, S and S* consistently overstate the true extent of the attribute being measured, i.e., the attribute of system disorder or event uncertainty. Consider, for example, the lambda distribution in Equation (8) with λ = 0.5 and n = 4, i.e., $P_4^{0.5} = (0.625, 0.125, 0.125, 0.125)$, for which S = 1.07 and S* = 0.77; these are substantially greater than the values (0.5)log 4 = 0.69 and 0.5 required by Equations (15) and (13). Each element of the distribution $P_4^{0.5}$ has the same distance from the corresponding element of $P_4^1 = (0.25, 0.25, 0.25, 0.25)$ as it does from the corresponding element of $P_4^0 = (1, 0, 0, 0)$, i.e., $P_4^{0.5}$ is the midpoint between $P_4^0$ and $P_4^1$. Clearly, the midrange (0 + log 4)/2 = 0.69 would be the only reasonable entropy value and the midrange (0 + 1)/2 = 0.5 the only reasonable relative entropy value, which are consistent with Equations (15) and (13). Also, one distribution $P_4$ for which $S(P_4) = S(P_4^{0.5})$ as in Equation (10) is found by trial and error to be $P_4 = (0.6, 0.2, 0.14, 0.06)$.
As another simple example, consider $P_3 = (0.8, 0.15, 0.05)$, for which S = 0.61 and S* = 0.56. Since this $P_3$-distribution is much closer to $P_3^0 = (1, 0, 0)$ than to $P_3^1 = (1/3, 1/3, 1/3)$, and since S ∈ [0, 1.10] for n = 3 and S* ∈ [0, 1], the values S = 0.61 and S* = 0.56 are unreasonably large. By comparison, for $S(0.8, 0.15, 0.05) = S(P_3^\lambda)$ in Equation (10), it is found that λ = 0.282, so that, from Equations (15) and (13), 0.282 log 3 = 0.31 and 0.28, respectively, would have been appropriate values, rather than 0.61 and 0.56, had the entropy (with upper bound log n) had value validity. When comparing the results from these two examples with the respective S-values of 1.07 and 0.61, it would not be a valid inference that the disorder (uncertainty) in the first case was about 75% greater than in the second case (i.e., as a particular case of Equation (3c)). This result would only apply to the S-values themselves and not to the attribute that S is supposed to measure (i.e., the disorder or uncertainty). The appropriate and valid comparison is between the above entropy values of 0.69 and 0.31, showing a 123% increase in disorder (uncertainty). Even though S and S* do not meet the conditions for valid difference comparisons, perhaps some functions of S and S* do. We shall address this next.
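The λ-value quoted in this example is easy to check numerically; a brief sketch (my own, using SciPy's brentq root finder) is:

```python
import numpy as np
from scipy.optimize import brentq

def S(p):
    p = np.asarray(p, float); p = p[p > 0]
    return -np.sum(p * np.log(p))

def lam_dist(n, lam):
    """Lambda distribution of Equation (8)."""
    return np.array([1 - lam + lam / n] + [lam / n] * (n - 1))

target = S([0.8, 0.15, 0.05])                                   # ~0.61
lam = brentq(lambda t: S(lam_dist(3, t)) - target, 1e-12, 1.0)  # solves Equation (10)
print(round(lam, 2), round(lam * np.log(3), 2))                 # ~0.28 (the paper reports 0.282) and ~0.31
```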

3.1. The Case of S

In order to satisfy the validity requirement in Equation (15), we shall explore if there exists a function (or transformation) f such that:
$$S(P_n^\lambda) = f(\lambda \log n) \tag{23}$$
from which a transformed entropy $S_T$ could be obtained as:
$$S_T(P_n^\lambda) = \lambda \log n = f^{-1}\left[S(P_n^\lambda)\right] = g\left[S(P_n^\lambda)\right] \tag{24}$$
where $P_n^\lambda$ is again the distribution defined in Equation (8). From the graphs of $S(P_n^\lambda)$ versus $\lambda \log n$ for some different values of n as shown in Figure 1, it is clear that no such function f exists for all λ and n. It is also evident from Figure 1 that S overstates the degree of disorder (uncertainty) throughout the range from 0 to log n and for different n. The absolute extent of such overstatement, or lack of value validity, appears to be greatest when S roughly equals (3/4) log n.
Nevertheless, it would appear from Figure 1 that at least a reasonable degree of approximation could be achieved from Equations (23) and (24) if we restrict those functions to cases when, say, S ≤ 0.8 log n, or S* ≤ 0.8, for all n. When the function (model) $S = \alpha(\lambda \log n)^\beta$ is fitted to the different values of n and λ in Table 2 for S* ≤ 0.8, regression analysis yields the parameter estimates $\hat\alpha = 1.52$ and $\hat\beta = 0.78$. When these estimates are replaced with the nearest convenient fractions 3/2 and 4/5 and this fitted function is inverted as in Equation (24), we obtain the transformed entropy:
$$S_T(P_n^\lambda) = \left[\tfrac{2}{3} S(P_n^\lambda)\right]^{5/4} \quad \text{for } S^*(P_n^\lambda) \le 0.8 \tag{25}$$
so that, for any probability distribution $P_n = (p_1, \ldots, p_n)$ and from Equation (10):
$$S_T(P_n) = \left[\tfrac{2}{3} S(P_n)\right]^{5/4} \quad \text{for } S^*(P_n) \le 0.8 \tag{26}$$
The values of $S_T(P_n^\lambda)$ in Equation (25) for various λ and n, as given in Table 2, are quite comparable with the corresponding values of $\lambda \log n$. In fact, the coefficient of determination, when properly computed [30], is found to be $R^2 = 1 - \sum\left[\lambda\log n - S_T(P_n^\lambda)\right]^2 / \sum\left(\lambda\log n - \overline{\lambda\log n}\right)^2 = 0.998$, showing that about 99% of the variation of λ log n is explained (accounted for) by the model in Equation (25).
The entropy $S_T$ has all of the same Properties P1–P6 as S, but it does not have the additivity Property P7. Of course, $S_T$ has the limitation that it is defined only for the restricted range from 0 to $[(2/3)(0.8\log n)]^{5/4}$. However, $S_T$ in Equation (26) does approximately meet the requirement in Equation (15) over its limited range, so that difference comparisons as in Equations (3b) and (3c) are reasonably valid.
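The transformation in Equations (25) and (26) is easy to check against the target λ log n of Equation (15); the sketch below (an illustration of mine, not the author's code) does so for a few lambda distributions with S* ≤ 0.8.

```python
import numpy as np

def S(p):
    p = np.asarray(p, float); p = p[p > 0]
    return -np.sum(p * np.log(p))

def lam_dist(n, lam):
    return np.array([1 - lam + lam / n] + [lam / n] * (n - 1))

def S_T(p):
    """Transformed entropy of Equation (26); intended for S* = S/log n <= 0.8."""
    return ((2.0 / 3.0) * S(p)) ** 1.25

for n, lam in [(5, 0.3), (10, 0.5), (20, 0.3), (100, 0.1)]:
    p = lam_dist(n, lam)
    # S_T should be close to the value-validity target lam * log(n)
    print(n, lam, round(S_T(p), 2), round(lam * np.log(n), 2))
```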

3.2. The Case of S*

For the relative entropy S* ∈ [0, 1] in Equation (2), and in order to meet the validity condition in Equation (12) with $E(P_n^1) = 1$, a function f is needed such that $S^*(P_n^\lambda) = f(\lambda, n)$, from which a transformed relative entropy $S_T^* \in [0, 1]$ follows as:
$$S_T^*(P_n^\lambda) = \lambda = g\left[S^*(P_n^\lambda), n\right] \tag{27}$$
It is apparent from Figure 1 that the functions f and g must have the integer n as a variable. By exploring alternative functions or models for different n and λ, using regression analysis, and expressing parameter estimates as convenient fractions, the following result is obtained:
$$S_T^* = 1 - \left[1 - (S^*)^{4/3}\right]^{\alpha}, \qquad \alpha = \tfrac{1}{2}(n-1)^{1/9} \tag{28}$$
where S* stands for either $S^*(P_n^\lambda)$ or $S^*(P_n)$, and the corresponding $S_T^*$ stands for either $S_T^*(P_n^\lambda)$ or $S_T^*(P_n)$.
This function (model) in Equation (28) does indeed provide an excellent fit to the different data points (n, λ), as seen from the results in Table 2: the values of $S_T^*(P_n^\lambda)$ are nearly equal to the values of λ for different n. The small residuals $\lambda - S_T^*$ in Table 2 show no clear pattern that would indicate any particular inadequacy of Equation (28). The coefficient of determination, when properly computed [30], is found from Table 2 to be $R^2 = 1 - \sum(\lambda - S_T^*)^2 / \sum(\lambda - \bar\lambda)^2 = 0.997$, indicating that nearly all of the variation in the chosen λ-values is explained by Equation (28).
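A corresponding check of Equation (28) (again my own sketch, not part of the paper) compares $S_T^*$ with λ for several lambda distributions:

```python
import numpy as np

def S(p):
    p = np.asarray(p, float); p = p[p > 0]
    return -np.sum(p * np.log(p))

def S_T_star(p):
    """Transformed relative entropy of Equation (28), using n+ (number of positive p_i)."""
    p = np.asarray(p, float)
    n_pos = int(np.sum(p > 0))
    s_star = S(p) / np.log(n_pos)
    alpha = 0.5 * (n_pos - 1) ** (1.0 / 9.0)
    return 1.0 - (1.0 - s_star ** (4.0 / 3.0)) ** alpha

def lam_dist(n, lam):
    return np.array([1 - lam + lam / n] + [lam / n] * (n - 1))

for n, lam in [(2, 0.3), (5, 0.5), (20, 0.7), (100, 0.9)]:
    print(n, lam, round(S_T_star(lam_dist(n, lam)), 2))  # should be close to lam
```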
While the $S_T$ in Equations (25) and (26) is only defined for S* ≤ 0.8, the $S_T^*$ in Equation (28) is appropriate for all values of S*. Being a strictly increasing function of S* = S/log n for any given n, $S_T^*$ has some of the same properties as S given in Section 2.1, with some obvious exceptions. However, $S_T^*$ has the important advantage over S* of satisfying, to a high degree of approximation, the condition in Equation (12) with $S_T^*(P_n^1) = 1$, making difference comparisons as in Equations (3b) and (3c) reasonably valid for $S_T^*$. Of course, neither $S_T^*$ nor S* is zero-indifferent (Property P3) unless n is replaced by $n_+$, the number of positive elements of $P_n = (p_1, \ldots, p_n)$, or formally stated:
$$n_+ = \#\{1 \le i \le n : p_i > 0\} \tag{29}$$
It may also be noted that $\log_2 n_+$ is frequently referred to as Hartley's measure or entropy ([24], Chapter 2) after Hartley [31].
For the interesting binary case, Equation (28) simplifies to:
$$S_T^* = 1 - \sqrt{1 - (S^*)^{4/3}} \quad \text{for } n = 2 \tag{30}$$
and noting that:
$$S^*(P_2) = -\sum_{i=1}^{2} p_i \log p_i / \log 2 = -\sum_{i=1}^{2} p_i \log_2 p_i$$
Figure 2 shows a comparison between $S_T^*$ and S* for the distribution $P_2^\lambda = (1 - \lambda/2, \lambda/2)$ (upper graph) and for $P_2 = (1 - p, p)$ (lower graph), with the latter form of the distribution typically being used for depicting binary entropies (e.g., [4,24]). The dashed lines represent the entropy requirement for value validity in Equation (13), which, for the upper and lower graphs, becomes, respectively:
$$E^*(P_2^\lambda) = \lambda, \qquad E^*(1 - p, p) = 1 - |1 - 2p| \tag{31}$$
Note that, while the derivative of E*(1 − p, p) with respect to p in Equation (31) does not exist at p = 0.5, E* (1−p, p) is continuous at p = 0.5 (Property P1).
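For the binary case, the overstatement by S* and the near value validity of $S_T^*$ in Equation (30) can be seen numerically; the following sketch (mine) also evaluates the dashed-line target of Equation (31), written here as 1 − |1 − 2p|.

```python
import numpy as np

def S_star_binary(p):
    """Relative entropy S* for P2 = (1 - p, p), i.e., Equation (2) with n = 2 (base-2 logs)."""
    q = np.array([1.0 - p, p])
    q = q[q > 0]
    return float(-np.sum(q * np.log2(q)))

def S_T_star_binary(p):
    """Transformed relative entropy for n = 2 (Equation (30))."""
    return 1.0 - np.sqrt(1.0 - S_star_binary(p) ** (4.0 / 3.0))

for p in [0.05, 0.15, 0.25, 0.5]:
    target = 1.0 - abs(1.0 - 2.0 * p)      # value-validity requirement of Equation (31)
    print(p, round(S_star_binary(p), 2), round(S_T_star_binary(p), 2), round(target, 2))
```

For p = 0.25, for instance, S* ≈ 0.81 while $S_T^*$ ≈ 0.51, close to the target value of 0.50.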
It may perhaps be tempting to use $S_T^*$ in Equation (28) to propose the following entropies:
$$S_T' = (\log n)\, S_T^*, \qquad S_T'' = (n - 1)\, S_T^* \tag{32}$$
which would, respectively, comply with Equations (15) and (16), at least to a high degree of approximation. If, instead of n, the $n_+$ in Equation (29) is used in Equation (32) and for $S_T^*$ in Equation (28), then those two potential entropies $S_T'$ and $S_T''$ would also be zero-indifferent (Property P3). However, neither $S_T'$ nor $S_T''$ can be an acceptable entropy, as exemplified by the two distributions $P_4 = (0.40, 0.35, 0.24, 0.01)$ and $Q_4 = (0.40, 0.35, 0.24, 0)$, for which $S(P_4) = 1.12$ and $S(Q_4) = 1.08$, whereas, from Equations (32) and (28) using $n_+$, $S_T'(P_4) = 0.76$, $S_T'(Q_4) = 0.94$, $S_T''(P_4) = 1.50$, and $S_T''(Q_4) = 1.72$. That is, in spite of the majorization $P_4 \prec Q_4$, when any reasonable entropy should be greater for $P_4$ than for $Q_4$ (Property P6), both $S_T'$ and $S_T''$ give the opposite result. It is easy to find other examples with the same outcome.

4. Assessment of Entropy Families

For a parameterized family of entropies $S_i$, such as those defined in Table 1, to be viable beyond being an interesting mathematical exercise or a generalization for its own sake, one could certainly argue that $S_i$ would need to meet some conditions that S in Equation (1) does not. First, $S_i$ should have some properties that may be considered important or desirable and that S lacks. Second, the flexibility provided by the incorporation of one or more parameters into the formulation of $S_i$ should be justified by the parameter(s) having some meaning or interpretation relative to the characteristic (attribute) that $S_i$ is supposed to measure.
With respect to the first condition, it is rather obvious from the expressions in Table 1 that none of those entropy families would be favored over S in Equation (1) in terms of their properties. In fact, some of those entropies even lack the essential Schur-concavity property (Property P6 in Section 2.1). The entropy S3 in Table 1, which is a particular subset of S8 with β = δ = 1 and λ = k/(1 − α), and which was defined for all real α, is strictly Schur-concave only for α ≥ 0. This follows immediately from the fact that, with the $p_i$ ordered as in Equation (6), the partial derivative $\partial S_3/\partial p_i = k[\alpha/(1-\alpha)]p_i^{\alpha-1}$ is increasing in $i = 1, \ldots, n$ only if α > 0, and strictly so if the inequalities in Equation (6) are all strict [26] (p. 84). In the limiting case when α → 1, S3 reduces to Equation (1), which is strictly Schur-concave [26] (p. 101). Similarly, S10 was defined by Good [23] for non-negative integer values of α and β, but is not Schur-concave for all such α and β values. Baczkowski et al. [32] extended S10 to permit α and β to take on real values and determined the rather restrictive (α, β) regions for the Schur-concavity of S10.
A brief comment is warranted about the potential case when the probability distribution $P_n = (p_1, \ldots, p_n)$ is possibly incomplete, i.e., when $\sum_{i=1}^{n} p_i \le 1$ [10,11]. Then, setting λ = k/(1 − α) for some constant k and β = δ = 1, the S8 in Table 1 becomes:
$$S_{8,\alpha} = \frac{k}{1-\alpha}\left(\frac{\sum_{i=1}^{n} p_i^\alpha}{\sum_{i=1}^{n} p_i} - 1\right), \quad \alpha > 0 \tag{33}$$
In the limiting case when α → 1, and using L'Hôpital's rule, Equation (33) reduces to:
$$S_{8,0} = -k\left(\sum_{i=1}^{n} p_i\right)^{-1}\sum_{i=1}^{n} p_i \log p_i \tag{34}$$
The entropy in Equation (34) was first proposed by Rényi [11] for k = 1/log 2 or, equivalently, for k = 1 and the base-2 logarithm in Equation (34). In particular, when the probability distribution consists of a single probability p ∈ (0, 1), Equations (33) and (34) become:
$$S_{8,\alpha} = k(1-\alpha)^{-1}\left(p^{\alpha-1} - 1\right), \qquad S_{8,0} = -k\log p$$
It is rather apparent from the expressions in Table 1 that none of those entropy families or individual members, including those in Equations (33) and (34), meet the validity conditions in Section 2.2. Clearly, none of them satisfies Equation (15) or (16) or the weaker condition in Equation (12). There appears to be no reason to prefer any of those entropies, or their relative (normed) forms, over S or S* in Equations (1) and (2) on the grounds of any substantial superiority with respect to value validity.
With respect to the flexibility provided by such generalized entropies, one could argue that the entropy parameters may potentially be selected to best fit some given situation or problem [2] (p. 185) [33] (pp. 298–301). However, any parameter selection has to have some meaningful basis or explanation, which is sorely lacking in the published literature. Of the various families of entropies in Table 1, Rényi’s entropy S1 has attracted the most attention in information theory and in physics where it is being used, for example, as a generalized measure of fractal dimension in chaos theory [34] (pp. 686–688) [35] (pp. 203–223).
Furthermore, such flexibility can alternatively be achieved by simply considering strictly increasing functions of S in Equation (1). As an example, consider Rényi's entropy S1 in Table 1 with α = 2, i.e., $-\log\sum_{i=1}^{n} p_i^2$. For the lambda distribution $P_n^\lambda = \{p_i^\lambda\}$ in Equation (8) and the values of n and λ in Table 2, regression analysis yields the following model:
$$-\log\sum_{i=1}^{n}\left(p_i^\lambda\right)^2 = 0.58\left[S(P_n^\lambda)\right]^{1.16}, \quad R^2 = 0.84 \tag{35}$$
It then follows from Equation (10) that the same type of relationship as in Equation (35) should hold approximately for any probability distribution $P_n = (p_1, \ldots, p_n)$.
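For readers who wish to experiment with the families in Table 1, here is a minimal sketch (my own, with k = 1 assumed for S3) of Rényi's S1 and the Tsallis form S3; both reduce to S of Equation (1) in the limit α → 1.

```python
import numpy as np

def renyi_S1(p, alpha):
    """Rényi entropy S1 from Table 1; reduces to S of Equation (1) as alpha -> 1."""
    p = np.asarray(p, float); p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))            # limiting case (L'Hopital)
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def tsallis_S3(p, alpha, k=1.0):
    """Tsallis entropy S3 from Table 1 with constant k; reduces to k*S as alpha -> 1."""
    p = np.asarray(p, float); p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return float(-k * np.sum(p * np.log(p)))
    return float(k * (np.sum(p ** alpha) - 1.0) / (1.0 - alpha))

p = [0.6, 0.2, 0.14, 0.06]
print(round(renyi_S1(p, 2.0), 3))     # -log(sum p_i^2), the alpha = 2 member used in Equation (35)
print(round(tsallis_S3(p, 0.999), 3), round(tsallis_S3(p, 1.0), 3))  # near the Shannon limit
```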

5. The Euclidean Entropy

Since neither S in Equation (1) nor any of the entropies in Table 1 meets the validity condition in Equation (12) or those in Equations (15) and (16), we shall search for an entropy that does. The most logical starting point is clearly the Euclidean distance relationship in Equation (11). Thus, for any distribution $P_n = (p_1, \ldots, p_n)$, we can define:
$$S_E^*(P_n) = 1 - \frac{d(P_n, P_n^1)}{d(P_n^0, P_n^1)} \in [0, 1] \tag{36}$$
where $P_n^0$ and $P_n^1$ are the distributions in Equation (4). With $P_n = P_n^\lambda$ in Equation (8), it is immediately apparent that this $S_E^*$ satisfies the validity condition in Equation (13). Then, an entropy that satisfies the condition in Equation (16) can be defined in terms of Equation (36) as:
$$S_E = (n_+ - 1)\, S_E^* \in [0,\, n_+ - 1] \tag{37}$$
where $n_+$ is defined in Equation (29). It seems appropriate to call $S_E$ the Euclidean entropy, since it is based purely on Euclidean distances. The $n_+$ is used instead of n in the definition of $S_E$ to ensure that it is zero-indifferent (Property P3 in Section 2.1).
The $S_E$ can be expressed as:
$$S_E = (n_+ - 1)\left\{1 - \left[1 - \frac{n_+}{n_+ - 1}\left(1 - \sum_{i=1}^{n} p_i^2\right)\right]^{1/2}\right\} = n_+ - 1 - \left[(n_+ - 1)\left(n_+\sum_{i=1}^{n} p_i^2 - 1\right)\right]^{1/2} = (n_+ - 1)\left(1 - \sqrt{n_+}\, s_{n_+-1}\right) \tag{38}$$
where $s_{n_+-1}$ is the standard deviation of the $n_+$ positive probabilities using $n_+ - 1$ instead of $n_+$ as a divisor. From the first expression in Equation (38), we see that, for any given $n_+$, $S_E$ is also a strictly increasing function of the so-called quadratic entropy $1 - \sum_{i=1}^{n} p_i^2$ studied in [36]. Note also that $S_E^*$ in Equations (36) and (37) is the coefficient of nominal variation introduced in [37] as a measure of variation for nominal categorical data. Also, from the Lagrange identity (e.g., [38] (p. 3)) and the second expression in Equation (38), $S_E$ and $S_E^*$ can be expressed in terms of pairwise differences between probabilities as:
$$S_E = n_+ - 1 - \left[(n_+ - 1)\sum_{1 \le i < j \le n_+}(p_i - p_j)^2\right]^{1/2}, \qquad S_E^* = 1 - \left[\frac{\sum_{1 \le i < j \le n_+}(p_i - p_j)^2}{n_+ - 1}\right]^{1/2}$$
The $S_E$ can be seen to have all of the properties of S in Equation (1) as outlined in Section 2.1 except for the additivity Property P7. It is strictly Schur-concave (Property P6) since (a) $\sum_{1 \le i < j \le n_+}(p_i - p_j)^2$ is strictly Schur-convex and (b) $S_E$ is a strictly decreasing function of $\sum_{i=1}^{n} p_i^2$ for any given (fixed) $n_+$ from Equation (38) [26] (Chapter 3). The $S_E$ also avoids the limitation pointed out for the potential entropies $S_T'$ and $S_T''$ in Equation (32): the implication under Property P6 holds even when some of the elements of $P_n$ or $Q_n$ are zero. For example, for $P_4 = (0.40, 0.35, 0.24, 0.01)$ and $Q_4 = (0.40, 0.35, 0.24, 0)$, $S_E(P_4) = 1.96 > S_E(Q_4) = 1.74$, which is an appropriate result since $P_4 \prec Q_4$, but for which $S_T'$ and $S_T''$ gave the opposite and unacceptable result.
To prove this last property of $S_E$, it is sufficient to show that, for a distribution $P_n = (p_1, \ldots, p_{n_+}, 0, \ldots, 0)$, the value of $S_E$ computed with n in place of $n_+$ in the formula in Equation (38), denoted by $S_E(P_n; n)$, is strictly increasing in n for given (fixed) $n_+$. Treating n as a continuous variable (for mathematical purposes), we obtain from Equation (38) the following partial derivative:
$$\frac{\partial S_E(P_n; n)}{\partial n} = 1 - \frac{1}{2}\left(\frac{n\sum_{i=1}^{n_+} p_i^2 - 1}{n - 1}\right)^{1/2} - \frac{1}{2}\frac{(n-1)^{1/2}\sum_{i=1}^{n_+} p_i^2}{\left(n\sum_{i=1}^{n_+} p_i^2 - 1\right)^{1/2}} = 1 - A - B \tag{39}$$
The first term $A \le 1/2$ since $\sum_{i=1}^{n_+} p_i^2 \le 1$. The term $B \le 1/2$ if $(n-1)\left(\sum_{i=1}^{n_+} p_i^2\right)^2 \le n\sum_{i=1}^{n_+} p_i^2 - 1$, i.e., if $(n-1)\sum_{i=1}^{n_+} p_i^2 - 1 \ge 0$, which holds since $\sum_{i=1}^{n_+} p_i^2 \ge 1/n_+$. For $\sum_{i=1}^{n_+} p_i^2 = 1/n_+$, when B = 1/2 for $n = n_+ + 1$, A < 1/2, so that $\partial S_E(P_n; n)/\partial n > 0$ in Equation (39) for all $n \ge n_+ + 1$, which completes the proof. Thus, if $Q_n = (q_1, \ldots, q_n)$ with all $q_i > 0$ is majorized by $P_n = (p_1, \ldots, p_{n_+}, 0, \ldots, 0)$, then $S_E(Q_n) > S_E(p_1, \ldots, p_{n_+})$.
Most importantly, the reason for introducing $S_E$ and $S_E^*$ is that they satisfy the validity requirements in Equations (16) and (13), respectively. For $P_n^\lambda$ in Equation (8), the expressions for $S_E$ and $S_E^*$ in Equations (37) and (38) become $S_E(P_n^\lambda) = (n-1)\lambda$ and $S_E^*(P_n^\lambda) = \lambda$. The $S_E^*$ in Equation (36) also has an appealing interpretation: it is the relative extent to which the distance between $P_n$ and $P_n^1$ is less than that between $P_n^0$ and $P_n^1$. Such an interpretation can also be made in terms of $\max_{P_n} d(P_n, P_n^1)$, which equals $d(P_n^0, P_n^1)$ since $d(P_n, P_n^1)$ is strictly Schur-convex in $P_n$ and $P_n \prec P_n^0$.
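As a sanity check on Equations (36)–(38), the sketch below (my own illustration) computes $S_E$ both from the distance definition and from the closed form; zero components are dropped first, which is how I read the paper's use of $n_+$ for zero-indifference. It also confirms $S_E(P_n^\lambda) = (n - 1)\lambda$.

```python
import numpy as np

def euclid_entropy_distance_form(p):
    """S_E via Equations (36)-(37), applied to the positive probabilities."""
    p = np.asarray(p, float)
    p = p[p > 0]                                     # drop zero components; n is replaced by n_+
    n = len(p)
    uniform = np.full(n, 1.0 / n)                    # P_n^1 of Equation (4)
    degenerate = np.zeros(n); degenerate[0] = 1.0    # P_n^0 of Equation (4)
    se_star = 1.0 - np.linalg.norm(p - uniform) / np.linalg.norm(degenerate - uniform)
    return (n - 1) * se_star

def euclid_entropy_closed_form(p):
    """S_E via the second expression in Equation (38)."""
    p = np.asarray(p, float)
    q = p[p > 0]
    n_pos = len(q)
    return n_pos - 1 - np.sqrt((n_pos - 1) * (n_pos * np.sum(q ** 2) - 1))

def lam_dist(n, lam):
    return np.array([1 - lam + lam / n] + [lam / n] * (n - 1))

p4 = [0.40, 0.35, 0.24, 0.01]
print(round(euclid_entropy_distance_form(p4), 2), round(euclid_entropy_closed_form(p4), 2))  # ~1.96 both
print(round(euclid_entropy_closed_form(lam_dist(5, 0.4)), 2))  # (n - 1) * lambda = 1.6
```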

6. Statistical Inferences

We shall also consider the situation in which the probability distribution $P_n = (p_1, \ldots, p_n)$ consists of multinomial sample estimates $p_i = n_i/N$ for $i = 1, \ldots, n$ and sample size $N = \sum_{i=1}^{n} n_i$, with the corresponding population distribution being $\Pi_n = (\pi_1, \ldots, \pi_n)$. For a generic entropy E, our interest may then be in making statistical inferences, especially confidence-interval construction, about the unknown population entropy $E(\Pi_n)$ based on the sample distribution $P_n$ and the sample size N. From the delta method of large-sample theory ([39], Chapter 14), the following convergence to the normal distribution holds:
$$\sqrt{N}\left[E(P_n) - E(\Pi_n)\right] \overset{d}{\to} \mathrm{Normal}(0, \sigma^2) \tag{40}$$
In other words, for large N, $E(P_n)$ is approximately normally distributed with mean $E(\Pi_n)$ and variance $\mathrm{Var}[E(P_n)] = \sigma^2/N$, or standard error $SE = \sigma/\sqrt{N}$, where $\sigma^2$ is given by:
$$\sigma^2 = \sum_{i=1}^{n} \pi_i\left(\frac{\partial E(\Pi_n)}{\partial \pi_i}\right)^2 - \left[\sum_{i=1}^{n} \pi_i\frac{\partial E(\Pi_n)}{\partial \pi_i}\right]^2 \tag{41}$$
The limiting normal distribution in Equation (40) still holds when, as is necessary in practice, the estimated variance $\hat\sigma^2$ is substituted for $\sigma^2$ by replacing the population probabilities $\pi_i$ in Equation (41) with their sample estimates $p_i$, $i = 1, \ldots, n$, yielding the estimated standard error $\widehat{SE} = \hat\sigma/\sqrt{N}$.
In the case of S in Equation (1) with k = 1, it is easily found from this procedure, starting with Equation (41), that the estimated standard error of S is given by:
$$\widehat{SE}(S) = \left\{\frac{1}{N}\left[\sum_{i=1}^{n} p_i(\log p_i)^2 - \left(\sum_{i=1}^{n} p_i\log p_i\right)^2\right]\right\}^{1/2} \tag{42}$$
(see, e.g., [40] (p. 100)). The estimated standard error of the transformed $S_T$ in Equation (26) is then derived from $\widehat{SE}(S)$ in Equation (42) as:
$$\widehat{SE}(S_T) = \left(\frac{dS_T}{dS}\right)\widehat{SE}(S) = \frac{5}{6}\left[\frac{2}{3}S\right]^{1/4}\widehat{SE}(S) \tag{43}$$
Similarly, for $S_T^*$ in Equation (28):
$$\widehat{SE}(S_T^*) = \left(\frac{dS_T^*}{dS}\right)\widehat{SE}(S) = \frac{4\alpha}{3\log n}\left[1 - (S^*)^{4/3}\right]^{\alpha-1}(S^*)^{1/3}\,\widehat{SE}(S) \tag{44}$$
where α is defined in Equation (28).
In the case of $S_E$ in Equation (38), and assuming $n_+ = n$, by (a) taking the partial derivatives $\partial S_E(\Pi_n)/\partial\pi_i$ for $i = 1, \ldots, n$; (b) inserting those partial derivatives into Equation (41); and (c) substituting the sample $p_i$ for the population $\pi_i$ ($i = 1, \ldots, n$), the following estimated standard error is obtained:
$$\widehat{SE}(S_E) = \left\{\frac{n^2(n-1)}{N\left(n\sum_{i=1}^{n} p_i^2 - 1\right)}\left[\sum_{i=1}^{n} p_i^3 - \left(\sum_{i=1}^{n} p_i^2\right)^2\right]\right\}^{1/2} \tag{45}$$
As a simple illustrative example of the potential use of these statistical results, consider the sample distribution $P_4 = (0.60, 0.20, 0.15, 0.05)$ based on a multinomial sample of size N = 100. The entropy values from Equations (1) (with k = 1), (26), (28), and (38), together with their corresponding standard errors from Equations (42)–(45), are computed for this $P_4$-distribution as: S = 1.06, $\widehat{SE}(S) = 0.07$; $S_T = 0.65$, $\widehat{SE}(S_T) = 0.06$; $S_T^* = 0.50$, $\widehat{SE}(S_T^*) = 0.06$; $S_E = 1.55$, $\widehat{SE}(S_E) = 0.18$. While these standard errors do provide some indication of how accurately the entropy estimates reflect the corresponding unknown population entropies, such information is more appropriately provided in terms of confidence intervals because of the limiting distribution in Equation (40). Therefore, in this example, an approximate 95% confidence interval for $S(\Pi_4)$ is obtained as 1.06 ± 1.96(0.07), or [0.92, 1.20]. Similarly, an approximate 95% confidence interval for the population entropy $S_E(\Pi_4)$ becomes 1.55 ± 1.96(0.18), or [1.20, 1.90]. For $S_T(\Pi_4)$ and $S_T^*(\Pi_4)$, approximate 95% confidence intervals become [0.53, 0.77] and [0.38, 0.62], respectively.
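The worked example above can be reproduced with a few lines of code; the sketch below (my own, not the author's) implements Equations (42) and (45) from multinomial counts and forms the approximate 95% confidence intervals for S and $S_E$.

```python
import numpy as np

def entropy_and_stderr(counts):
    """S (Equation (1), k = 1) and its estimated standard error (Equation (42))."""
    counts = np.asarray(counts, float)
    N = counts.sum()
    p = counts / N
    logp = np.log(p)
    S = -np.sum(p * logp)
    se = np.sqrt((np.sum(p * logp ** 2) - np.sum(p * logp) ** 2) / N)
    return S, se

def euclid_entropy_and_stderr(counts):
    """S_E (Equation (38), assuming n_+ = n) and its standard error (Equation (45))."""
    counts = np.asarray(counts, float)
    N = counts.sum()
    p = counts / N
    n = len(p)
    q2, q3 = np.sum(p ** 2), np.sum(p ** 3)
    SE_val = n - 1 - np.sqrt((n - 1) * (n * q2 - 1))
    se = np.sqrt(n ** 2 * (n - 1) * (q3 - q2 ** 2) / (N * (n * q2 - 1)))
    return SE_val, se

counts = [60, 20, 15, 5]   # the example: P4 = (0.60, 0.20, 0.15, 0.05), N = 100
for val, se in (entropy_and_stderr(counts), euclid_entropy_and_stderr(counts)):
    print(round(val, 2), round(se, 2), [round(val - 1.96 * se, 2), round(val + 1.96 * se, 2)])
```

Running this reproduces S = 1.06 with interval [0.92, 1.20] and $S_E$ = 1.55 with interval [1.20, 1.90].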

7. Concluding Comments

A number of conclusions can be drawn from this analysis, which uses the concept of value validity of an entropy based on the lambda distribution and on criteria involving Euclidean distances and simple functional equations. Equations (12)–(16) provide the additional conditions that an entropy E has to meet in order to have the value-validity property, so that difference comparisons as in Equations (3b) and (3c) are permissible. While neither the Boltzmann-Shannon entropy in Equation (1) nor any of the proposed entropy families in Table 1 satisfies those conditions, the transformed entropy $S_T$ in Equation (26) does so for $S(P_n)/\log n \le 0.8$, and the transformed relative entropy $S_T^*$ in Equation (28) does so to a reasonable degree of approximation.
Since no member of the generalized entropies in Table 1 has the advantage of value validity over S, and some may lack other properties of S as outlined in Section 2.1, one may question the need for what seems to have become almost an embarrassment of riches of entropies. One justifiable exception would be if the parameter(s) of a generalized entropy could be shown to have some particular meaning or interpretation useful for explaining some phenomenon or result. However, the flexibility that may be provided by a parameterized family of entropies can also potentially be achieved by considering functions of S in Equation (1), as exemplified by Equation (35).
Whether an entropy E is used as a measure of the disorder of a system in physics, of the uncertainty (information content) of a set of events in information theory, or of some other attribute or characteristic, the concern is with what types of comparisons can be made between values of E. If we argue that an E such as S in Equation (1) should only be used for size ("greater than") comparisons as in Equation (3a), such advice will not always be heeded, as demonstrated in the published literature, resulting in invalid and misleading conclusions and interpretations. Such misuse is avoided, and more informative results can be obtained, if E has the value-validity property permitting the difference comparisons in Equations (3b) and (3c). The Euclidean entropy $S_E$ in Equation (38) is proposed as one such more informative entropy.
As with any measure that summarizes a set of data into a single number, it is advisable to use and interpret the results with some caution, and an entropy is no exception. Even though $S_E$ in Equation (38) has the value-validity property and a number of other desirable properties, so that it can be used for all the comparisons in Equations (3a)–(3c) as reasonable indications of the attribute (characteristic) being measured, this does not necessarily imply that another entropy with all the same properties would produce exactly the same results. Even S in Equation (1) and a member of Rényi's family S1 in Table 1, such as that with α = 2, which both have the same Properties P1–P7 (Section 2.1), do not necessarily order their values in the same way for all probability distributions unless the distributions are comparable with respect to majorization.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27.
2. Aczél, J.; Daróczy, Z. On Measures of Information and Their Characterizations; Academic Press: New York, NY, USA, 1975.
3. Landsberg, P.T. Thermodynamics and Statistical Mechanics; Dover: New York, NY, USA, 1990.
4. Reza, F.M. An Introduction to Information Theory; McGraw-Hill: New York, NY, USA, 1961.
5. Boltzmann, L. Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. In Kaiserliche Akademie der Wissenschaften [Vienna], Sitzungsberichte 1872; K. k. Hof- und Staatsdruckerei in Commission bei F. Tempsky: Wien, Austria, 1872; pp. 275–370. (In German)
6. Tribus, M. Thirty years of information theory. In The Study of Information; Machlup, F., Mansfield, U., Eds.; Wiley: New York, NY, USA, 1983; pp. 475–484.
7. Magurran, A.E. Measuring Biological Diversity; Blackwell: Oxford, UK, 2004.
8. Norwich, K.H. Information, Sensation, and Perception; Academic Press: San Diego, CA, USA, 1993.
9. Cho, A. A fresh take on disorder, or disorderly science. Science 2002, 297, 1268–1269.
10. Rényi, A. Probability Theory; North-Holland: Amsterdam, The Netherlands, 1970.
11. Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1961; University of California Press: Berkeley–Los Angeles, CA, USA, 1961; Volume I, pp. 547–561.
12. Havrda, J.; Charvát, F. Quantification method of classification processes, concept of structural α-entropy. Kybernetika 1967, 3, 30–35.
13. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
14. Kapur, J.N. Generalized entropy of order α and type β. Math. Semin. 1967, 4, 78–94.
15. Aczél, J.; Daróczy, Z. Über verallgemeinerte quasilineare Mittelwerte, die mit Gewichtsfunktionen gebildet sind. Publ. Math. Debr. 1963, 10, 171–190. (In German)
16. Arimoto, S. Information theoretical considerations on estimation problems. Inform. Control 1971, 19, 181–194.
17. Sharma, B.D.; Mittal, D.P. New non-additive measures of entropy for discrete probability distributions. J. Math. Sci. 1975, 10, 28–40.
18. Rathie, P.N. Generalization of the non-additive measures of uncertainty and information and their axiomatic characterizations. Kybernetika 1971, 7, 125–131.
19. Kvålseth, T.O. On generalized information measures of human performance. Percept. Mot. Skills 1991, 72, 1059–1063.
20. Kvålseth, T.O. Correction of a generalized information measure. Percept. Mot. Skills 1994, 79, 348–350.
21. Kvålseth, T.O. Entropy. In International Encyclopedia of Statistical Science; Lovric, M., Ed.; Springer-Verlag: Heidelberg, Germany, 2011; Part 5, pp. 436–439.
22. Morales, D.; Pardo, L.; Vajda, I. Uncertainty of discrete stochastic systems: General theory and statistical inference. IEEE Trans. Syst. Man Cybern. 1996, 26, 681–697.
23. Good, I.J. The population frequencies of species and the estimation of population parameters. Biometrika 1953, 40, 237–264.
24. Klir, G.J. Uncertainty and Information; Wiley: Hoboken, NJ, USA, 2006.
25. Aczél, J. Measuring information beyond communication theory. Inf. Proc. Manag. 1984, 20, 383–395.
26. Marshall, A.W.; Olkin, I.; Arnold, B.C. Inequalities: Theory of Majorization and Its Applications, 2nd ed.; Springer: New York, NY, USA, 2011.
27. Hand, D.J. Measurement Theory and Practice; Wiley: Chichester, UK, 2004.
28. Kvålseth, T.O. The lambda distribution and its applications to categorical summary measures. Adv. Appl. Stat. 2011, 24, 83–106.
29. Aczél, J. Lectures on Functional Equations and Their Applications; Academic Press: New York, NY, USA, 1966.
30. Kvålseth, T.O. Cautionary note about R². Am. Stat. 1985, 39, 279–285.
31. Hartley, R.V. Transmission of information. Bell Syst. Tech. J. 1928, 7, 535–563.
32. Baczkowski, S.J.; Joanes, D.N.; Shamia, G.M. Range of validity of α and β for a generalized diversity index H(α, β) due to Good. Math. Biosci. 1998, 148, 115–128.
33. Kapur, J.N.; Kesavan, H.K. Entropy Optimization Principles with Applications; Academic Press: Boston, MA, USA, 1992.
34. Peitgen, H.-O.; Jürgens, H.; Saupe, D. Chaos and Fractals: New Frontiers of Science, 2nd ed.; Springer-Verlag: New York, NY, USA, 2004.
35. Schroeder, M. Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise; W.H. Freeman: New York, NY, USA, 1991.
36. Vajda, I. Bounds on the minimal error probability on checking a finite or countable number of hypotheses. Probl. Pereda. Inf. 1968, 4, 9–19.
37. Kvålseth, T.O. Coefficients of variation for nominal and ordinal categorical data. Percept. Mot. Skills 1995, 80, 843–847.
38. Beckenbach, E.F.; Bellman, R. Inequalities; Springer-Verlag: Berlin, Germany, 1971.
39. Bishop, Y.M.M.; Fienberg, S.E.; Holland, P.W. Discrete Multivariate Analysis; MIT Press: Cambridge, MA, USA, 1975.
40. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall: Boca Raton, FL, USA, 2006.
Figure 1. Relationships between $S(P_n^\lambda)$, with S being the entropy in Equation (1) with k = 1 and $P_n^\lambda$ being the probability distribution in Equation (8), and $\lambda \log n$ for n = 2 (lowest curve), n = 5, n = 20, and n = 100. The diagonal line corresponds to an entropy E that would satisfy the value-validity condition in Equation (15).
Figure 2. Upper graph: relative entropy values $S^*(P_2^\lambda)$ in Equation (2) (upper curve) and $S_T^*(P_2^\lambda)$ in Equation (30) (lower curve) as functions of λ. Lower graph: $S^*(p, 1-p)$ and $S_T^*(p, 1-p)$ as functions of p. The dashed lines in the two graphs represent Equation (31).
Table 1. Parameterized families of entropies.

| Formulation | Parameter Restrictions | Source |
| $S_1 = \frac{1}{1-\alpha}\log\sum_{i=1}^{n} p_i^\alpha$ | $\alpha > 0$ | Rényi [10,11] |
| $S_2 = \frac{1}{2^{1-\alpha}-1}\left(\sum_{i=1}^{n} p_i^\alpha - 1\right)$ | $\alpha > 0$ | Havrda and Charvát [12] |
| $S_3 = \frac{k}{1-\alpha}\left(\sum_{i=1}^{n} p_i^\alpha - 1\right)$ | $-\infty < \alpha < \infty$, k constant | Tsallis [13] |
| $S_4 = \frac{1}{\delta-\alpha}\log\left(\frac{\sum_{i=1}^{n} p_i^\alpha}{\sum_{i=1}^{n} p_i^\delta}\right)$ | $\alpha, \delta > 0$ | Kapur [14], Aczél and Daróczy [15] |
| $S_5 = \frac{\alpha}{1-\alpha}\left[\left(\sum_{i=1}^{n} p_i^\alpha\right)^{1/\alpha} - 1\right]$ | $\alpha > 0$ | Arimoto [16] |
| $S_6 = \frac{1}{2^{1-\beta}-1}\left[\left(\sum_{i=1}^{n} p_i^\alpha\right)^{(\beta-1)/(\alpha-1)} - 1\right]$ | $\alpha, \beta > 0$ | Sharma and Mittal [17] |
| $S_7 = \frac{1}{2^{1-\alpha}-1}\left[\frac{\sum_{i=1}^{n} p_i^{\alpha+\delta-1}}{\sum_{i=1}^{n} p_i^\delta} - 1\right]$ | $\alpha > 0$, $\alpha + \delta - 1 > 0$ | Rathie [18] |
| $S_8 = \lambda\left[\left(\frac{\sum_{i=1}^{n} p_i^\alpha}{\sum_{i=1}^{n} p_i^\delta}\right)^\beta - 1\right]$ | $0 < \alpha < 1 \le \delta$, $\beta\lambda > 0$; or $0 \le \delta \le 1 < \alpha$, $\beta\lambda < 0$ | Kvålseth [19–21] |
| $S_9 = \frac{\alpha}{2^\beta - 1}\left(2^{\beta S_i} - 1\right)$ | $\alpha, \beta > 0$ | Morales et al. [22] |
| $S_{10} = \sum_{i=1}^{n} p_i^\alpha(-\log p_i)^\beta$ | $\alpha, \beta$ positive integers | Good [23] |
| $S_{11} = -\frac{1}{\sum_{i=1}^{n} p_i^\alpha}\sum_{i=1}^{n} p_i^\alpha \log p_i$ |  | Aczél and Daróczy [15] |

Notes: The Greek letters used for the parameters differ from some of those used by the original authors. When indeterminate forms 0/0 occur for certain parameter values (e.g., α = 1 for S1 or β = 1 for S6), the entropies are defined by their limits (e.g., as α → 1 or β → 1) using L'Hôpital's rule.
Table 2. Values of S in Equation (1) (with k = 1), S* in Equation (2), $S_T$ in Equations (25) and (26), and $S_T^*$ in Equation (28) for the lambda distribution $P_n^\lambda$ in Equation (8) with varying n and λ.

| n | λ | λ log n | S | S* | $S_T$ | $S_T^*$ |
| 2 | 0.1 | 0.07 | 0.20 | 0.29 | 0.08 | 0.10 |
| 2 | 0.3 | 0.21 | 0.42 | 0.61 | 0.20 | 0.31 |
| 2 | 0.5 | 0.35 | 0.56 | 0.81 | – | 0.51 |
| 2 | 0.7 | 0.49 | 0.65 | 0.94 | – | 0.72 |
| 2 | 0.9 | 0.62 | 0.69 | 0.99 | – | 0.88 |
| 5 | 0.1 | 0.16 | 0.39 | 0.24 | 0.19 | 0.09 |
| 5 | 0.3 | 0.48 | 0.88 | 0.55 | 0.51 | 0.29 |
| 5 | 0.5 | 0.80 | 1.23 | 0.76 | 0.78 | 0.50 |
| 5 | 0.7 | 1.13 | 1.46 | 0.91 | – | 0.71 |
| 5 | 0.9 | 1.45 | 1.59 | 0.99 | – | 0.92 |
| 10 | 0.1 | 0.23 | 0.50 | 0.22 | 0.25 | 0.09 |
| 10 | 0.3 | 0.69 | 1.18 | 0.51 | 0.74 | 0.28 |
| 10 | 0.5 | 1.15 | 1.68 | 0.73 | 1.15 | 0.50 |
| 10 | 0.7 | 1.61 | 2.04 | 0.89 | – | 0.71 |
| 10 | 0.9 | 2.07 | 2.27 | 0.98 | – | 0.90 |
| 20 | 0.1 | 0.30 | 0.59 | 0.20 | 0.31 | 0.08 |
| 20 | 0.3 | 0.90 | 1.44 | 0.48 | 0.95 | 0.28 |
| 20 | 0.5 | 1.50 | 2.09 | 0.70 | 1.51 | 0.49 |
| 20 | 0.7 | 2.10 | 2.60 | 0.87 | – | 0.71 |
| 20 | 0.9 | 2.70 | 2.93 | 0.98 | – | 0.92 |
| 50 | 0.1 | 0.39 | 0.70 | 0.18 | 0.39 | 0.08 |
| 50 | 0.3 | 1.17 | 1.75 | 0.45 | 1.21 | 0.28 |
| 50 | 0.5 | 1.96 | 2.60 | 0.66 | 1.99 | 0.48 |
| 50 | 0.7 | 2.74 | 3.29 | 0.84 | – | 0.70 |
| 50 | 0.9 | 3.52 | 3.80 | 0.97 | – | 0.92 |
| 100 | 0.1 | 0.46 | 0.78 | 0.17 | 0.44 | 0.08 |
| 100 | 0.3 | 1.38 | 1.97 | 0.43 | 1.41 | 0.28 |
| 100 | 0.5 | 2.30 | 2.97 | 0.64 | 2.35 | 0.49 |
| 100 | 0.7 | 3.22 | 3.80 | 0.83 | – | 0.72 |
| 100 | 0.9 | 4.14 | 4.44 | 0.96 | – | 0.91 |
