
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. BME-25, NO. 5, SEPTEMBER 1978

A Method for Clinical Data Reduction Based on "Weighted Entropy"

MASAHIKO OKADA, MEMBER, IEEE

Manuscript received February 10, 1977. The author is with the Department of Neurophysiology, Brain Research Institute, Niigata University, Niigata, Japan.

Abstract-In a given medical diagnostic system, the clinical data items are not stochastically independent of one another, and each item has its own relative significance as information for diagnosis. A method was developed to reduce the amount of clinical data by eliminating relatively insignificant items. The data reduction procedure consists of two steps: 1) identify the least significant item; 2) test the correctness of the diagnosis without the item identified in the first step. A function was derived to measure this relative significance; in step 1 it is evaluated for the data items to determine their order of significance. The computed value is defined as "the weighted entropy." The weight is assigned according to the degree of similarity among the data items; that is, the order of significance is attributed to the degree of similarity. The two steps are repeated until the test in step 2 fails. At that point we obtain a minimal set of items with which the diagnosis is assured to be as reliable as with the original set.

The method was applied to an actual differential diagnosis system. As a result, only 28 out of 72 symptoms were found to be enough to make a reliable differential diagnosis among four diseases.

Note: 1) "A diagnosis" here covers not only diagnosis itself but also the selection of therapy, the assessment of prognosis, and other medical decision making. 2) We include as clinical data such items as patient history, physical examination, laboratory findings, symptoms and signs, etc. These items will be termed simply symptoms in the following sections.

INTRODUCTION

IN the development of medical data processing and automated diagnosis by digital computer, clinical data reduction is an essential problem to be solved [1]. It is almost impossible for doctors to process a huge amount of data perfectly. Not only in doctor diagnosis but also in computer diagnosis, processing more than enough data may even decrease the probability of correct diagnosis. Generally, data items are not totally independent of each other in a given medical diagnostic system. From the study on computer diagnosis, it has become clear that the data which have been recorded often contain redundant items for making a differential diagnosis [2], [3].

This article describes a method to reduce the amount of data which has to be recorded for diagnosis by introducing an order on the clinical data items according to their significance relative to each other. As the criterion for determining significance, we employed the degree of similarity of a data item to the others; that is, we considered an item not significant if it is similar (in the sense defined in this paper) to some others. To quantify the degree of similarity, we derived a function based on information theory [4]-[6]. The quantity given by the function is defined as "entropy weighted with similarity." Experimentally, the function was verified as a useful tool in the clinical data reduction procedure.

THEORY FOR COMPUTATION

Computation of Entropy

Let a mutually exclusive set of diseases be {D_1, D_2, ..., D_n} and the symptoms which bear on the diagnosis be {S_1, S_2, ..., S_m}. Let P(D_α) be the probability that D_α occurs, P(S_i) the probability that S_i occurs, and P(D_α|S_i) the probability that D_α occurs under the condition that S_i has been observed. Then the entropy of the set {P(D_1), P(D_2), ..., P(D_n)} is computed by

$$-\sum_{\alpha=1}^{n} P(D_\alpha) \log_2 P(D_\alpha). \qquad (1)$$

The conditional entropy of the set {P(D_α|S_i) | α = 1, 2, ..., n; i = 1, 2, ..., m} is computed by

$$-\sum_{i=1}^{m} \sum_{\alpha=1}^{n} P(S_i)\, P(D_\alpha|S_i) \log_2 P(D_\alpha|S_i). \qquad (2)$$

The difference between (1) and (2),

$$-\sum_{\alpha=1}^{n} P(D_\alpha) \log_2 P(D_\alpha) + \sum_{i=1}^{m} \sum_{\alpha=1}^{n} P(S_i)\, P(D_\alpha|S_i) \log_2 P(D_\alpha|S_i), \qquad (3)$$

represents how much the uncertainty, or entropy, of the set {P(D_1), P(D_2), ..., P(D_n)} decreases by the observation of the symptoms S_1, S_2, ..., S_m. In other words, the quantity given by (3) is the average amount of information of the symptoms {S_1, S_2, ..., S_m} for the differential diagnosis of the diseases {D_1, D_2, ..., D_n}. This quantity is defined as the entropy of the set {S_1, S_2, ..., S_m}.
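For concreteness, the following minimal Python sketch (not from the paper) evaluates (1)-(3) from a hypothetical case-count matrix f. The estimate P(D_α) = Σ_i P(S_i) P(D_α|S_i) is an assumption of the sketch; the paper treats P(D_α) as a separate quantity.

```python
import numpy as np

def xlog2x(p):
    """Elementwise p * log2(p), using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    nz = p > 0
    out[nz] = p[nz] * np.log2(p[nz])
    return out

def symptom_information(f):
    """f[i, a] = count of cases showing symptom S_i diagnosed as D_a.
    Returns (H_prior, H_cond, info): Eqs. (1), (2), and (3) = (1) - (2)."""
    f = np.asarray(f, dtype=float)
    p_s = f.sum(axis=1) / f.sum()              # P(S_i), relative frequency
    p_ds = f / f.sum(axis=1, keepdims=True)    # P(D_a | S_i), row-normalized
    p_d = p_s @ p_ds                           # P(D_a): mixture (assumption)
    H_prior = -xlog2x(p_d).sum()                         # Eq. (1)
    H_cond = -(p_s[:, None] * xlog2x(p_ds)).sum()        # Eq. (2)
    return H_prior, H_cond, H_prior - H_cond             # Eq. (3)

# Hypothetical 3-symptom, 2-disease example (illustration only).
print(symptom_information([[9, 1], [1, 9], [5, 5]]))
```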


In applying (3) to actual clinical data, two questionable points arise. The first concerns the "dependency" among the occurrences of symptoms; this will be discussed in the next section. The second is how to obtain the probability of occurrence of each disease, P(D_α). There is, however, no problem on this point: in (3) the first term is fixed for a given set of diseases, so only the second term, i.e., (2), needs to be considered.

The quantity given by (2) is the "uncertainty," or "insignificance," of a given combination of symptoms for diagnosis. It is zero when each symptom points to a single disease, that is, when for each i there is one disease D_α(i) with P(D_α(i)|S_i) ≠ 0 and P(D_α|S_i) = 0 for α ≠ α(i); it is maximum when P(D_α|S_i) is the same for all α = 1, 2, ..., n and i = 1, 2, ..., m. It follows that a given set of symptoms {S_1, S_2, ..., S_m} contains less significant information when the value given by (2) is larger. Note that since (1) is fixed, (3) decreases exactly as (2) increases.

P(S_i) and P(D_α|S_i) can be obtained from a random sample of cases which have already been diagnosed:

$$P(S_i) = \sum_{\alpha=1}^{n} f_{i\alpha} \Big/ \sum_{k=1}^{m} \sum_{\alpha=1}^{n} f_{k\alpha}, \qquad P(D_\alpha|S_i) = f_{i\alpha} \Big/ \sum_{\beta=1}^{n} f_{i\beta} \qquad (4)$$

where f_iα (i = 1, 2, ..., m; α = 1, 2, ..., n) is the percentage of the number of cases diagnosed as D_α with symptom S_i to the total number of cases diagnosed as D_α. Fig. 1 shows the m by n matrix with f_iα as an entry.

Fig. 1. Symptom by disease matrix.

Entropy Weighted with Similarity

Returning to the first question raised in the previous section, let us consider the "dependency" among the symptoms. In (2) it is assumed that the occurrence of each symptom is stochastically independent of the others. This assumption, however, does not hold in the actual situation. In this sort of analysis the problem of "dependency" could be handled by multivariate methods such as factor analysis or principal component analysis [7], but these methods require a complicated procedure for obtaining the entropy. Theoretically, the conditional entropy can be obtained for data in which there is correlation between any two items; to do so, however, the correlation must be computed for every possible permutation of the symptoms, that is, for m symptoms and n diseases the computation must be performed (n · m)! times. This is not a practical procedure. Instead of directly computing the conditional entropy of the data, let us therefore pay attention to the correlation between the probabilities of occurrence of the symptoms.

First, we define the meaning of the similarity of symptoms. Let {D_1, D_2, ..., D_n} be a given set of diseases. For two arbitrary symptoms S_i and S_j, we define S_i to be similar to S_j if and only if there is correlation between {P(D_1|S_i), P(D_2|S_i), ..., P(D_n|S_i)} and {P(D_1|S_j), P(D_2|S_j), ..., P(D_n|S_j)}. The higher the correlation, the more similar S_i is to S_j. For differential diagnosis, a symptom S_i is almost useless if there is a symptom S_j (j ≠ i) to which S_i is very similar.

Next, let us quantify the degree of similarity of the symptoms. Define an m by n matrix P = [p_iα], whose m rows correspond to the m symptoms and whose n columns correspond to the n diseases, as follows: the entry p_iα = P(D_α|S_i) is the probability of occurrence of disease D_α under the condition that symptom S_i has been observed, computed by (4). In matrix P, consider two row vectors S_i1 = (P(D_1|S_i1), P(D_2|S_i1), ..., P(D_n|S_i1)) and S_i2 = (P(D_1|S_i2), P(D_2|S_i2), ..., P(D_n|S_i2)). Our definition states that the two symptoms S_i1 and S_i2 are similar if there is correlation between these two vectors. Based on this definition, let us introduce a weight function W_α, defined as

$$W_\alpha = -\sum_{i=1}^{m} w_{i\alpha} \log_2 w_{i\alpha} \qquad (\alpha = 1, 2, \cdots, n)$$

where

$$w_{i\alpha} = P(D_\alpha|S_i) \Big/ \sum_{k=1}^{m} P(D_\alpha|S_k). \qquad (5)$$

Let S represent the set of symptoms {S_1, S_2, ..., S_m}. Then the function H(S_i) of S_i and the function H*(S) of S are defined as follows:

$$H(S_i) = -\frac{1}{\log_2 m} \sum_{\alpha=1}^{n} W_\alpha\, P(D_\alpha|S_i) \log_2 P(D_\alpha|S_i) \qquad (6)$$

$$H^*(S) = -\frac{1}{\log_2 m} \sum_{i=1}^{m} \sum_{\alpha=1}^{n} P(S_i)\, W_\alpha\, P(D_\alpha|S_i) \log_2 P(D_\alpha|S_i). \qquad (7)$$

H(S_i) is the entropy of the symptom S_i weighted according to its degree of similarity to the other symptoms. (Hereafter, this is termed simply "the weighted entropy of S_i.") H(S_i) gives the relative significance of a symptom S_i in the given diagnostic system. H*(S) is the average weighted entropy of all the S_i's.
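A sketch of (5)-(7) in the same vein (again not the author's code; f is a hypothetical count matrix, and the xlog2x helper is as in the previous sketch):

```python
import numpy as np

def xlog2x(p):
    """Elementwise p * log2(p) with 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    nz = p > 0
    out[nz] = p[nz] * np.log2(p[nz])
    return out

def weighted_entropy(f):
    """f[i, a] = count of cases with symptom S_i diagnosed as D_a.
    Returns (H, H_star): H[i] = H(S_i) of Eq. (6), H_star = H*(S) of Eq. (7)."""
    f = np.asarray(f, dtype=float)
    m = f.shape[0]                                   # number of symptoms
    p_s = f.sum(axis=1) / f.sum()                    # P(S_i)
    p = f / f.sum(axis=1, keepdims=True)             # p_ia = P(D_a | S_i)
    w = p / p.sum(axis=0, keepdims=True)             # w_ia, Eq. (5)
    W = -xlog2x(w).sum(axis=0)                       # weight W_a per disease
    H = -(W * xlog2x(p)).sum(axis=1) / np.log2(m)    # Eq. (6)
    return H, (p_s * H).sum()                        # Eq. (7): H* = E_i[H(S_i)]

H, H_star = weighted_entropy([[9, 1], [1, 9], [5, 5]])
```

Note that (7) reduces to the P(S_i)-weighted average of the H(S_i)'s, which is what the last line of the function exploits.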


In (6) and (7), the denominator log_2 m is the maximum of W_α, which is obtained by replacing w_iα (i = 1, ..., m; α = 1, ..., n) with 1/m in (5):

$$\max W_\alpha = -\sum_{i=1}^{m} \frac{1}{m} \log_2 \frac{1}{m} = \log_2 m.$$

H*(S) is the measurement of the usefulness of the set of symptoms S: the smaller H*(S) is, the more significant the symptoms in S are. The maximum of H*(S) is obtained by replacing P(D_α|S_i) with 1/n and W_α with log_2 m (i = 1, ..., m; α = 1, ..., n) in (7):

$$\max H^*(S) = -\frac{1}{\log_2 m} \sum_{i=1}^{m} \sum_{\alpha=1}^{n} P(S_i)\,(\log_2 m)\,\frac{1}{n} \log_2 \frac{1}{n} = \log_2 n.$$

The maximum of H*(S) is thus independent of the number of symptoms m.

Now we are ready to select the symptoms which are significant as diagnostic information. The procedure of the selection is as follows (a sketch in code is given after the list):

step 0. Set k = 0.
step 1. Set k = k + 1. Choose S_ik with H(S_ik) the largest in S, i.e., the least significant symptom.
step 2. Test the probability of correct diagnosis with the set of symptoms S - {S_ik}. If the correctness is satisfactory, let S = S - {S_ik} and repeat step 1. Otherwise, S is the desired set of symptoms.
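The loop can be sketched as follows (a hypothetical rendering, not the paper's code: weighted_entropies stands for Eq. (6) evaluated over the current set, and accuracy for the correctness test; both callables are assumptions):

```python
def select_symptoms(symptoms, weighted_entropies, accuracy, threshold):
    """Greedy elimination per steps 0-2: repeatedly drop the symptom with
    the largest weighted entropy H(S_i), i.e., the least significant one,
    as long as the probability of correct diagnosis stays satisfactory.

    weighted_entropies(S) -> {symptom: H(S_i)} over the current set S (Eq. 6);
    accuracy(S) -> probability of correct diagnosis with symptom set S.
    Both callables are assumed helpers standing in for the paper's tests.
    """
    S = set(symptoms)
    while len(S) > 1:
        H = weighted_entropies(S)
        s_k = max(S, key=H.get)              # step 1: least significant symptom
        if accuracy(S - {s_k}) >= threshold:
            S = S - {s_k}                    # step 2: keep the reduced set
        else:
            break                            # test failed: S is the desired set
    return S
```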

The actual procedure will be described later.

SIMULATION

To test the validity of H(S_i) as the weighted entropy, H(S_i) was computed for several sets of test data. The results are summarized in Table I. Column (A) shows the (5 × 4) matrices (i), (ii), (iii), and (iv). An entry in the i'th row and α'th column (i = 1, 2, 3, 4, 5; α = 1, 2, 3, 4) represents the number of cases with symptom S_i which have been diagnosed D_α. Column (B) shows the computed H(S_i)'s.

TABLE I
RESULTS OF SIMULATION

        (A)            (B)
(i)     1 1 1 1    H(S1) = 2.0000
        3 3 3 3    H(S2) = 2.0000
        4 4 4 4    H(S3) = 2.0000
        6 6 6 6    H(S4) = 2.0000
        9 9 9 9    H(S5) = 2.0000

(ii)    1 1 1 9    H(S1) = 1.2075
        1 1 1 9    H(S2) = 1.2075
        1 1 1 9    H(S3) = 1.2075
        1 1 1 9    H(S4) = 1.2075
        1 1 1 9    H(S5) = 1.2075

(iii)   9 1 1 1    H(S1) = 0.8049
        1 9 1 1    H(S2) = 0.8049
        1 1 9 1    H(S3) = 0.8049
        1 1 1 9    H(S4) = 0.8058
        1 1 1 9    H(S5) = 0.8058

(iv)    1 9 9 9    H(S1) = 1.5528
        9 1 9 9    H(S2) = 1.5528
        9 9 1 9    H(S3) = 1.5528
        9 9 9 1    H(S4) = 1.5862
        9 9 9 1    H(S5) = 1.5862

In (i), every column vector is identical to the others; no symptom here can give any information for making a differential diagnosis. In (ii), every row vector is identical to the others; that is, only one symptom is significant as information to identify D4. In (iii), the row vectors S1-S4 are independent of each other; each of the symptoms S1-S4, therefore, carries different information in this diagnostic system. This combination of symptoms is naturally considered more useful as diagnostic information than that of (ii), which in turn is more useful than that of (i). The weighted entropies H(S_i) are smaller in (iii) than in (ii), although the classical entropies of (ii) and (iii) computed by (2) turn out to be the same. In (iii), H(S4) and H(S5) are larger than H(S1), H(S2), and H(S3). This reflects the fact that the two row vectors S4 and S5 are identical, and hence are less significant when they exist together in S.

The layouts of the row vectors of (iii) and (iv) are alike. But in (iv), each symptom S_i has a high probability for all diseases but one; hence it helps only to eliminate one disease from the candidates. On the other hand, the S_i's in (iii) are capable of choosing the most probable disease. It is then reasonable that the H(S_i)'s are larger in (iv) than in (iii). For the same reason as in (iii), H(S4) and H(S5) are larger than the rest in (iv).

APPLICATION

Data Acquisition

The data base is currently being developed for neurological disease data processing. For the experiment, we selected 80 cases from this data base. Each patient filed in the data base has been diagnosed either by anatomical findings or by the doctor's long-term observation. Table II shows the symptom by disease array. As for the diseases, D1 through D4 represent 1) multiple sclerosis, 2) myelitis, 3) brain stem encephalitis, and 4) myeloradiculoneuritis, respectively. As for the symptoms, S(1) through S(72) were selected only from the routine clinical data; the list of these symptoms is given in Table III. The sequence of symptoms, which is an important clue in doctor diagnosis, is not taken into consideration here. Since the number of cases is not fixed for each disease in the data base, we employed f_iα (i = 1, 2, ..., 72; α = 1, 2, 3, 4) as an entry of the array, which is the rounded ratio of the occurrences of S_i to 20 cases of D_α (Table II). The rightmost column shows

$$\sum_{\alpha=1}^{4} f_{i\alpha} \qquad (i = 1, 2, \cdots, 72).$$
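As a check (not part of the paper), the Table I values can be reproduced with the weighted-entropy sketch given earlier:

```python
import numpy as np

def xlog2x(p):
    """Elementwise p * log2(p) with 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    nz = p > 0
    out[nz] = p[nz] * np.log2(p[nz])
    return out

def H_weighted(f):
    """H(S_i) of Eq. (6) for a symptom-by-disease count matrix f."""
    f = np.asarray(f, dtype=float)
    p = f / f.sum(axis=1, keepdims=True)          # P(D_a | S_i)
    W = -xlog2x(p / p.sum(axis=0)).sum(axis=0)    # Eq. (5) weights W_a
    return -(W * xlog2x(p)).sum(axis=1) / np.log2(f.shape[0])

m1 = np.outer([1, 3, 4, 6, 9], np.ones(4))        # (i): identical columns
m2 = np.tile([1, 1, 1, 9], (5, 1))                # (ii): identical rows
m3 = np.ones((5, 4))
m3[[0, 1, 2, 3, 4], [0, 1, 2, 3, 3]] = 9          # (iii): S5 duplicates S4
m4 = np.where(m3 == 9, 1, 9)                      # (iv): complement of (iii)

for m in (m1, m2, m3, m4):
    print(np.round(H_weighted(m), 4))
# -> [2. 2. 2. 2. 2.]
#    [1.2075 1.2075 1.2075 1.2075 1.2075]
#    [0.8049 0.8049 0.8049 0.8058 0.8058]
#    [1.5528 1.5528 1.5528 1.5862 1.5862]
```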


TABLE II
OBSERVED FREQUENCY f_iα

[The 72 × 4 array of entries f_iα for symptoms S(1)-S(72) against diseases D1-D4, with the rightmost column giving the total over α for each symptom; the individual entries are not reliably recoverable from this copy and are not reproduced here.]

D1 = Multiple Sclerosis, D2 = Myelitis, D3 = Brain Stem Encephalitis, D4 = Myeloradiculoneuritis.

TABLE III
THE LIST OF THE SYMPTOMS

1. Course
2. Familial incidence
3. Remission & recurrence
4. Multifocal lesions
5. Cerebral signs
6. Optic signs
7. Spinal cord signs
8. Brain stem signs
9. Cerebellar signs
10. Laterality of signs

Past history
11. Measles
12. Allergic diseases
13. Trauma
14. Appendicitis
15. Operation
16. Tuberculosis
17. Gastro-intestinal diseases
18. Liver diseases
19. Nephritis
20. Miscellaneous

Precipitating factor
21. Overwork
22. Trauma
23. Vaccination
24. Operation
25. Pregnancy
26. Delivery
27. Miscellaneous

Prodromal symptoms
28. Fever
29. Headache
30. Common cold
31. Nausea & vomiting
32. Exanthema
33. Dizziness & vertigo
34. Pain
35. Miscellaneous

Initial symptoms
36. Mode of onset
37. Impairment of visual acuity
38. Double vision
39. Paralysis
40. Speech disturbance
41. Gait disturbance
42. Numbness
43. Hypesthesia
44. Ophthalmic pain
45. Miscellaneous

Main neurological signs
46. Mental disorder
47. Impairment of visual acuity
48. Optic nerve atrophy
49. Ophthalmoplegia
50. Nystagmus
51. Dysarthria
52. Dysphagia
53. Paralysis
54.   Half of the body
55.   Lower half of the body
56. Hyperreflexia
57. Hyporeflexia
58. Pathological reflex
59. Ataxia or intention tremor
60. Sensory disturbance
61.   Half of the body
62.   Lower half of the body
63.   Glove & stocking
64. Bladder & rectal disturbance
65. Convulsion
66. Painful cramp
67. Wassermann reaction
68. C.S.F.
69.   Cell count
70.   Protein content
71.   Miscellaneous
72. Effect of steroid therapy

The Procedure and the Result

At the beginning, H(S_i) and H(S_i)/H*(S) were computed for each S_i (Table IV). The procedure for the reduction of the symptoms is as follows:

step 1. Obtain the probability of correct diagnosis with the original set S0 = {S1, S2, ..., S72}.
step 2. Set S1 = S0 - {S_i | H(S_i)/H*(S) > 1.2}. Test the probability of correct diagnosis.
step 3. Set S2 = S1 - {S_i | H(S_i)/H*(S) > ...}. Test the probability of correct diagnosis.
step 4. Set S3 = S2 - {S_i | H(S_i)/H*(S) > 1.0}. Test the probability of correct diagnosis.
step 5. Set S4 = S3 - {S_i | H(S_i)/H*(S) > 0.8}. Test the probability of correct diagnosis.
step 6. Set S5 = S4 - {S_i | H(S_i)/H*(S) > 0.6}. Test the probability of correct diagnosis.
step 7. Set S6 = S5 - {S_i | H(S_i)/H*(S) > 0.4}. Test the probability of correct diagnosis.

The degree of correctness of the differential diagnosis with the reduced set of symptoms was examined in two ways. In the first, computer diagnosis based on the "maximum likelihood method" was used to see the changes in the percentage of correct diagnosis. Table V shows the result at each step. Up to step 4, the percentage of correct diagnosis is kept over 90%; at this point, 28 symptoms out of 72 are left. These are listed in Table VI.

In the second, the discriminating capability of the remaining symptoms was tested using the variance ratio. For this test, seven cases diagnosed as multiple sclerosis were selected randomly from the cases used in the first test. At each step, the likeli-
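The batch reduction above can be sketched as follows (hypothetical h_ratio and accuracy helpers, not code from the paper; the step 3 threshold is not recoverable from this copy and is omitted from the defaults):

```python
def stepwise_reduction(symptoms, h_ratio, accuracy,
                       thresholds=(1.2, 1.0, 0.8, 0.6, 0.4)):
    """Threshold-based variant used in the application: H(S_i)/H*(S) is
    computed once for the original set (Table IV), then all symptoms whose
    ratio exceeds each successive threshold are dropped, and the probability
    of correct diagnosis is re-tested after each step (cf. Table V).

    h_ratio(S) -> {symptom: H(S_i)/H*(S)} and accuracy(S) -> fraction of
    correctly diagnosed cases are assumed helpers.
    """
    ratio = h_ratio(set(symptoms))          # computed once, at the beginning
    S = set(symptoms)
    history = [(len(S), accuracy(S))]       # step 1: the original set
    for t in thresholds:
        S = {s for s in S if ratio[s] <= t}
        history.append((len(S), accuracy(S)))
    return S, history
```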


TABLE IV
LIST OF THE COMPUTED H(S_i) AND H(S_i)/H*(S)

[Values of H(S_i) and H(S_i)/H*(S) for each of S(1)-S(72); for example, S(1): H(S_1) = 1.6646, H(S_1)/H*(S) = 1.2566. The remaining entries are not reliably recoverable from this copy and are not reproduced here.]

TABLE V
CHANGES IN THE PERCENTAGE OF THE CORRECT DIAGNOSIS

Step   Removed: H(S_i)/H*(S) >   Symptoms remaining   Percentage of the
                                                      correct diagnosis
1      (original set)            72                   96%
2      1.2000                    51                   96%
3      ...                       38                   92%
4      1.0000                    28                   92%
5      0.8000                    14                   84%
6      0.6000                    9                    84%
7      0.4000                    4                    80%

TABLE VI
THE SYMPTOMS REMAINING AFTER STEP 4

3. Remission & recurrence
4. Multifocal lesions
5. Cerebral signs
6. Optic signs
8. Brain stem signs
9. Cerebellar signs
10. Laterality of signs
19. Nephritis
24. Operation
33. Dizziness & vertigo
34. Pain
37. Impairment of visual acuity
39. Paralysis
41. Gait disturbance
45. Miscellaneous
46. Mental disorder
47. Impairment of visual acuity
48. Optic nerve atrophy
49. Ophthalmoplegia
50. Nystagmus
51. Dysarthria
52. Dysphagia
54. Half of the body (paralysis)
55. Lower half of the body (paralysis)
58. Pathological reflex
61. Half of the body (sensory disturbance)
62. Lower half of the body (sensory disturbance)
63. Glove & stocking (sensory disturbance)
