Cutaneous and Ocular Toxicology

ISSN: 1556-9527 (Print) 1556-9535 (Online) Journal homepage: http://www.tandfonline.com/loi/icot20

Revisit the 21-day cumulative irritation test – statistical considerations
Paul Zhang & Qing Li
To cite this article: Paul Zhang & Qing Li (2016): Revisit the 21-day cumulative irritation test – statistical considerations, Cutaneous and Ocular Toxicology, DOI: 10.3109/15569527.2016.1141418
To link to this article: http://dx.doi.org/10.3109/15569527.2016.1141418

Published online: 24 Feb 2016.


Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=icot20 Download by: [Orta Dogu Teknik Universitesi]

Date: 14 March 2016, At: 15:34

http://informahealthcare.com/cot
ISSN: 1556-9527 (print), 1556-9535 (electronic)
Cutan Ocul Toxicol, Early Online: 1–6
© 2016 Taylor & Francis. DOI: 10.3109/15569527.2016.1141418

RESEARCH ARTICLE

Revisit the 21-day cumulative irritation test – statistical considerations
Paul Zhang¹ and Qing Li²
¹Johnson and Johnson Consumer Inc., Morris Plains, NJ, USA and ²Xian Janssen Pharmaceutical Ltd, Johnson and Johnson, China


Abstract

The 21-day cumulative irritation test is widely used to evaluate the irritation potential of topical skin-care products. The test consists of a clinician's assessment of the skin reaction at the patch sites and a classification system that categorizes the test product's irritation potential. We propose a new classification system that controls the estimation error and provides statistical confidence in the repeatability of the classification.

Keywords

Evaluation method, irritation, skin-care products

Introduction

Skin irritancy potential of a topically applied product is a key component of its safety profile, since the majority of complaints related to product use concern local skin irritation. Appropriate identification of the degree of irritation potential is crucial for making a go/no-go decision during product development. Minimal or mild irritation potential is desired for most topical products, especially those for cosmetic use. The 21-day cumulative irritation patch test (CIT) is the most widely used method for evaluating skin irritation in the general population. The method consists of two components: an evaluation procedure for the skin reaction and a classification procedure for identifying the degree of irritation potential. The evaluation procedure was originally developed by Lanman et al.¹, in which a clinical evaluator scores the skin reaction at the patch site and the reaction of the superficial layer of the skin. In 1982, Berger and Bowman² modified this procedure and proposed a classification system to identify the degree of irritation potential of a test product. Since then, the modified evaluation procedure and classification system have been broadly adopted by the industry during product development. This classification system, however, was based on empirical experience with previously tested products; statistical principles were not fully taken into consideration in terms of controlling the estimation error and providing statistical confidence in the classified results.

In this paper, a new classification system is proposed. In the section “Assessment of irritation”, the currently used irritation evaluation procedure and classification system are reviewed and discussed. In the section “Proposed method”, a new classification system is proposed and an appropriate sample size is recommended. An alternative to the 21-day CIT, the 14-day CIT, is discussed in the section “14-day cumulative irritation test”. The last section presents a discussion.

Address for correspondence: Paul Zhang, Johnson and Johnson Consumer Inc., 185 Tabor Road, Morris Plains, NJ 07950, USA. Email: [email protected]

History: Received 3 December 2015; Revised 22 December 2015; Accepted 9 January 2016; Published online 15 February 2016

Assessment of irritation

The skin irritation potential of a product under development is evaluated by the procedure developed by Lanman et al.¹ in 1968 and modified by Philips et al.³ in 1972. One patch per test product is typically applied on the back or the upper arm of the study subject. The patch is kept in contact with the skin for 23 ± 1 h per application and is then removed by the subject and discarded. Subjects are instructed to take a bath or shower after patch removal and then report to the study site, where the irritation reaction is evaluated using a scoring system. This procedure is repeated daily for 21 consecutive days. The scoring system consists of an assessment of the local skin reaction at the patch site and an assessment of the reaction of the superficial layers of the skin.

A set of numerical grades is used to assess the local skin reaction at the patch site:
(0) No evidence of irritation
(1) Minimal erythema, barely perceptible
(2) Definite erythema, readily visible; or minimal edema; or minimal papular response
(3) Erythema and papules
(4) Definite edema
(5) Erythema, edema, and papules
(6) Vesicular eruption
(7) Strong reaction spreading beyond test site

A set of alphabetic grades is used to assess the reaction of the superficial layers of the skin:
(A) Slight glazed appearance
(B) Marked glazing
(C) Glazing with peeling and cracking
(D) Glazing with fissures
(E) Film of dried serous exudate covering all or a portion of the patch site
(F) Small petechial erosions and/or scabs

The alphabetic grade is then converted to a numerical grade: A = 0, B = 1, C = 2, D = E = F = 3. The final irritation score is the sum of the two grades, e.g. 2C = 2 + 2 = 4.

In clinical practice, once a subject receives a score of 3 or higher at a study visit, it is not ethical to ask the subject to continue patch applications on the same site for the remaining days. Opinions differ on how the remaining days should be scored. Berger and Bowman² suggested truncating the observed score on that day at 3 and carrying this score forward for the remaining days, arguing that the method is intended for relatively mild cosmetic products, so scores higher than 3 would be meaningless. This truncation, however, may underestimate the irritation potential of a test product. To avoid this, we propose that if a subject reaches a score of 3 or higher at a visit, patch application is stopped and the actual score observed at that visit is carried forward for the remaining days. For example, if a subject has an irritation score of 5 on day 10, the score of 5 is carried forward for days 11 through 21.

Berger and Bowman² developed an empirically derived classification system to identify the degree of irritation potential of a test product. First, the total score of 10 subjects is calculated by the following formula:

$$\mathrm{TS}(10) = \frac{10}{N}\sum_{j=1}^{N}\sum_{i=1}^{21} S_{ij},$$

where $S_{ij}$ is the irritation score of the jth subject on the ith day and N is the total number of subjects who received the test product. The irritation potential of the test product is then classified according to the observed value of TS(10) (Table 1). This classification system gives the categories unequal weights: measured as the share of the 631 possible integer values of TS(10) (0 to 630; the maximum corresponds to 10 subjects with a truncated score of 3 on all 21 days) that fall in each range, the weights for categories I, II, III, IV, and V are 0.0792, 0.2377, 0.3962, 0.2076, and 0.0792, respectively. Therefore, without any prior knowledge about the test product and under the same test conditions, a product is five times as likely to be classified as category III as category I. The choice of these weights appears arbitrary and based on empirical experience. Without a justification for the unequal weights, the system is questionable, especially when it is applied to a broad range of skin-care products.

Table 1. Empirical classification system.

| TS(10) | Category | Description |
|---|---|---|
| 0–49 | I. Mild product – no experimental irritation | Essentially no evidence of cumulative irritation under the conditions of test (i.e. continuous at concentration specified). |
| 50–199 | II. Probable mild in normal use | Evidence of a slight potential for very mild cumulative irritation under the conditions of test. |
| 200–449 | III. Possible mild in normal use | Evidence of a moderate potential for mild cumulative irritation under the conditions of test. |
| 450–580 | IV. Experimental cumulative irritation | Evidence of a strong potential for mild-to-moderate cumulative irritation under the conditions of test. |
| 581–630 | V. Experimental primary irritant | Evidence of potential for primary irritation under the conditions of test. |

Another limitation of this classification system is the clinical interpretation of TS(10). If the investigator is interested in the typical irritation reaction of subjects to the test product, or wants to predict the irritation reaction of a subject in the general population, this score does not serve that purpose: the irritation characteristics of the test product are described by the clinical scoring system, but no correspondence between TS(10) and that scoring system has been established. A third limitation is that the classification is based on a point estimate and does not incorporate the variation of the irritation scores, so it provides no statistical assurance for the classified results. For example, if the same product were tested 100 times under the same conditions, in what percentage of tests would it be classified into the same category, and in what percentage into a different one? If the percentage of classifying the same product into different categories is high, the classification system is unreliable and its validity is questionable. A literature review did not reveal any publication evaluating the reliability of this empirically derived classification system. Because the system does not account for the variation of the irritation score, it also cannot determine the sample size needed to keep the estimation error within a given level; its authors suggested a randomly selected group of more than 10 subjects.

Proposed method

To avoid the potential underestimation of the irritation reaction caused by truncating the irritation score, we propose to take the last observed score before patch application is discontinued because of a score of 3 or higher, and to carry this score forward for the remaining days.

Table 2. Irritation classification system based on the point estimate.

| Category | Classification | Description |
|---|---|---|
| I | $\bar{S} \le 0.5$ | None or slight irritation potential |
| II | $0.5 < \bar{S} \le 1.0$ | Mild irritation potential |
| III | $1.0 < \bar{S} \le 1.5$ | Mild to moderate irritation potential |
| IV | $1.5 < \bar{S} \le 2.0$ | Moderate irritation potential |
| V | $2.0 < \bar{S} \le 2.5$ | Moderate to severe irritation potential |
| VI | $2.5 < \bar{S}$ | Severe irritation potential |

The mean irritation score of the jth subject is calculated as

$$\bar{S}_j = \frac{1}{21}\sum_{i=1}^{21} S_{ij}.$$
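A minimal Python sketch of the per-subject bookkeeping may help make the rules concrete. It converts a recorded grade such as "2C" into the summed score, applies the proposed carry-forward rule (the observed score, not a truncated 3, is carried forward once a score of 3 or higher ends patch application), applies the missing-visit imputation rule stated later in this section, and returns the subject mean $\bar{S}_j$. The function names, the use of `None` for a missed visit, the assumption of at most one letter per grade, and the handling of a gap touching the first or last day (filled from the single available neighbour) are hypothetical choices of this sketch, not part of the published procedure.

```python
from statistics import fmean

# Alphabetic (superficial-layer) grades converted to points, as in the text.
LETTER_POINTS = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 3, "F": 3}
STOP_SCORE = 3  # a score of 3 or higher ends patch application at that site
MAX_GAP = 3     # at most 3 consecutive missed visits may be imputed


def combined_score(grade):
    """Convert a recorded grade such as '2', '2C' or '3A' to its summed score."""
    grade = grade.strip().upper()
    digits = grade.rstrip("ABCDEF")
    letter = grade[len(digits):]  # assumption: at most one letter per grade
    return int(digits) + (LETTER_POINTS[letter] if letter else 0)


def fill_subject_scores(scores):
    """Complete one subject's daily scores (None marks a missed visit).

    Returns the completed list, or None if no subject mean may be computed.
    """
    filled = list(scores)
    n_days = len(filled)

    # Carry-forward: after the first score >= 3, that observed score is
    # carried forward (not truncated at 3) to every remaining day.
    for day, s in enumerate(filled):
        if s is not None and s >= STOP_SCORE:
            filled[day + 1:] = [s] * (n_days - day - 1)
            break

    # Missing visits: a run of <= 3 missed days takes the higher of the
    # scores on the days immediately before and after the run.
    day = 0
    while day < n_days:
        if filled[day] is not None:
            day += 1
            continue
        end = day
        while end < n_days and filled[end] is None:
            end += 1
        if end - day > MAX_GAP:
            return None  # more than 3 consecutive misses: no subject mean
        # Hypothetical: a run touching the first or last day uses whichever
        # neighbouring day exists.
        neighbours = [filled[i] for i in (day - 1, end) if 0 <= i < n_days]
        filled[day:end] = [max(neighbours)] * (end - day)
        day = end
    return filled


def subject_mean(scores):
    """Subject mean score S-bar_j, or None if the subject is excluded."""
    filled = fill_subject_scores(scores)
    return None if filled is None else fmean(filled)


# Example from the text: days 7-9 missed, day 6 scored 2 and day 10 scored 1.
gap_example = [0, 0, 1, 1, 1, 2, None, None, None, 1] + [0] * 11
print(fill_subject_scores(gap_example)[6:9])  # days 7-9 -> [2, 2, 2]

# Example from the text: a score of 5 on day 10 is carried to days 11-21.
stop_example = [0] * 9 + [5] + [None] * 11
print(subject_mean(stop_example))  # 12 days at 5, averaged over 21 days = 60/21
```

Processing the carry-forward before the gap filling means days after a discontinuation are never treated as ordinary missed visits, which matches the separate treatment the text gives to the two situations.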

The subject mean score $\bar{S}_j$ is the jth subject's average irritation reaction over the 21 days and can be viewed as that subject's typical irritation reaction on any day of the 21-day period. Differences among the subject means reflect the variation among subjects in their response to the same product. The group mean score of all subjects tested with the same product is calculated as


$$\bar{S} = \frac{1}{N}\sum_{j=1}^{N} \bar{S}_j.$$

The group mean $\bar{S}$ is the average daily irritation reaction over all subjects and all 21 days, and it is an unbiased point estimate of the true irritation score. Because it can be viewed as a prediction of the irritation reaction to the test product in the general population after the patch is applied to the skin, the irritation potential of the test product can be identified from this score, as specified in Table 2. This classification system assigns equal weights to each category except category VI, which combines all values above category V. Although $\bar{S}$ can take a value of 3 or higher, there is no need for finer categories above category V, since in that case the study would most likely have been terminated early because a high proportion of subjects experienced severe irritation.

Although $\bar{S}$ is an unbiased point estimate of the true irritation score, a point estimate provides no statistical assurance about the reliability of the classification, that is, about the chance of classifying the same product into the same category if it is evaluated repeatedly under the same conditions in the same or a different group of subjects. We therefore propose to use a confidence interval to identify the irritation potential of the test product. The 95% confidence interval of the true irritation score is

$$\left(\bar{S} - t_{0.975,\,N-1}\,\frac{\sigma}{\sqrt{N}},\ \ \bar{S} + t_{0.975,\,N-1}\,\frac{\sigma}{\sqrt{N}}\right),$$

where $\sigma$ is the standard deviation of the subject mean score $\bar{S}_j$. This quantity is usually unknown but can be estimated by the sample standard deviation

$$\hat{\sigma} = \left[\frac{1}{N-1}\sum_{j=1}^{N}\left(\bar{S}_j - \bar{S}\right)^2\right]^{1/2}.$$

The quantity $t_{0.975,\,N-1}$ is the 97.5th percentile of Student's t-distribution with N − 1 degrees of freedom.
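The interval above needs a Student t quantile. As a sketch that depends only on the Python standard library, the code below approximates $t_{p,\,\mathrm{df}}$ with the classical Cornish–Fisher-type expansion in powers of 1/df around the normal quantile (the asymptotic expansion listed in Abramowitz and Stegun's handbook). For the degrees of freedom used in this paper (about 30 or more) it agrees with tabulated values to roughly five decimal places; it is less accurate for very small df. Where SciPy is available, `scipy.stats.t.ppf(p, df)` would be the usual choice instead. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def t_quantile(p, df):
    """Approximate the p-quantile of Student's t with df degrees of freedom.

    Cornish-Fisher-type expansion in 1/df around the normal quantile; very
    accurate for df >= 10, rougher for very small df.
    """
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def group_mean_ci(subject_means, conf=0.95):
    """Return (S-bar, lower, upper) for the two-sided t interval on the group mean."""
    n = len(subject_means)
    s_bar = fmean(subject_means)
    sigma_hat = stdev(subject_means)  # sample SD with divisor N - 1
    half_width = t_quantile(1 - (1 - conf) / 2, n - 1) * sigma_hat / math.sqrt(n)
    return s_bar, s_bar - half_width, s_bar + half_width


print(round(t_quantile(0.975, 29), 4))  # tabulated t(0.975, 29) is 2.0452
s_bar, lower, upper = group_mean_ci([0.1 * (k % 5) for k in range(30)])
print(s_bar, lower, upper)
```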
The confidence interval is an interval estimate of the true irritation score: if the same product were tested 100 times under the same conditions, about 95 of the 100 intervals constructed in this way would contain the true irritation score. The irritation potential of the test product is then classified according to the criterion in Table 3, where $\theta_{\mathrm{upp}}$ denotes the upper limit of the 95% confidence interval, i.e. $\theta_{\mathrm{upp}} = \bar{S} + t_{0.975,\,N-1}\,\hat{\sigma}/\sqrt{N}$.


Table 3. Irritation classification system based on the interval estimate.

| Category | Classification | Description |
|---|---|---|
| I | $\theta_{\mathrm{upp}} \le 0.625$ | None or slight irritation potential |
| II | $0.625 < \theta_{\mathrm{upp}} \le 1.125$ | Mild irritation potential |
| III | $1.125 < \theta_{\mathrm{upp}} \le 1.625$ | Mild to moderate irritation potential |
| IV | $1.625 < \theta_{\mathrm{upp}} \le 2.125$ | Moderate irritation potential |
| V | $2.125 < \theta_{\mathrm{upp}} \le 2.625$ | Moderate to severe irritation potential |
| VI | $2.625 < \theta_{\mathrm{upp}}$ | Severe irritation potential |
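Tables 2 and 3 share the same five cut-points, with Table 3 shifting each one up by the 0.125 error allowance explained in the next paragraph. A small Python sketch of both look-ups follows; it uses `bisect_left` so that each upper bound is inclusive, as in the tables. The names are hypothetical.

```python
from bisect import bisect_left

CATEGORIES = ["I", "II", "III", "IV", "V", "VI"]
# Table 2: inclusive upper bounds of categories I-V on the point estimate S-bar.
POINT_BOUNDS = [0.5, 1.0, 1.5, 2.0, 2.5]
# Table 3: the same bounds shifted by 0.125, applied to the upper limit theta_upp.
UPPER_BOUNDS = [b + 0.125 for b in POINT_BOUNDS]


def classify(value, bounds):
    """Return the category whose interval (previous bound, bound] contains value."""
    # bisect_left counts the bounds strictly below value, so a value equal
    # to a bound stays in the lower category, matching the "<=" in the tables.
    return CATEGORIES[bisect_left(bounds, value)]


def classify_point(s_bar):
    return classify(s_bar, POINT_BOUNDS)


def classify_upper(theta_upp):
    return classify(theta_upp, UPPER_BOUNDS)


print(classify_upper(0.3381))  # upper limit reported for the Table 4 example -> I
print(classify_upper(0.70))    # 0.625 < 0.70 <= 1.125 -> II
```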

The distance from the point estimate $\bar{S}$ to the upper limit of the confidence interval, $\theta_{\mathrm{upp}}$, is $L = t_{0.975,\,N-1}\,\hat{\sigma}/\sqrt{N}$. This quantity measures the estimation error. By choosing an appropriate sample size, we can keep this estimation error within 0.125, a quarter of the width (0.5) of each category interval. Therefore, if a test product is classified into a category, say category II, meaning that $0.625 < \theta_{\mathrm{upp}} \le 1.125$, then 95% of the time the point estimate of the true irritation score, $\bar{S}$, will be greater than 0.5 but less than or equal to 1.0. That is, if the test product is evaluated repeatedly under the same conditions, the chance of classifying it into the same category is greater than 95%, and the chance of classifying it into a different category is less than 5%.

To compare the reliability of the Berger and Bowman method and the proposed method, the sample dataset in Table 4 is used for illustration: the skin reactions of 30 subjects to a test product were evaluated over 21 days. Applying Berger and Bowman's method, the total score of 10 subjects, TS(10), is 58, and the test product is classified as category II, “Probable mild in normal use”. Applying the proposed method, the upper limit of the 95% confidence interval is 0.3381, and the test product is classified as category I, “None or slight irritation potential”. To repeat this classification process, the bootstrap resampling technique was used to generate 1000 bootstrap samples, each containing 30 subjects, and both methods were applied to each bootstrap sample. The Berger and Bowman method reproduced the original classification (category II) 85.5% of the time, while 14.5% of the time it classified the test product into category I.
The proposed method reproduced the original classification 100% of the time. Hence, for this dataset, the Berger and Bowman method yielded 85.5% reliability and a 14.5% misclassification rate, while the proposed method yielded 100% reliability. This example shows that the proposed method improves the reliability of the classification by controlling the inference error, whereas the Berger and Bowman method has no control of the inference error.

To control the length of the confidence interval, knowledge of the standard deviation $\sigma$ is required. Simulations were done to study the properties of this standard deviation, ranging from the mildest situation (most irritation scores are 0 or 1) to the most severe situation (most irritation scores are 2 or 3). The simulation results showed that the standard deviation ranges from about 0.15 in the most extreme situations, either mildest or most severe, to about 0.35 for evenly distributed irritation scores. Therefore, a sample size of 32 is suggested to keep the length of the confidence interval within 0.25. Allowing for a 10–15% attrition rate in a 21-day cumulative trial, we suggest enrolling 40 subjects to ensure that at least 32 subjects have irritation assessment scores for each of the 21 days.

In practice, subjects may not be able to come to the clinic site for the irritation assessment and patch application every day during the 21-day test period. If a subject occasionally misses one site visit for a reason other than an irritation score of 3 or higher, current practice allows one make-up visit, so that each subject completes 21 assessment scores and the total score TS(10) can be calculated. A subject who misses more than one site visit has fewer than 21 assessment scores and is excluded from the calculation of TS(10). The proposed method calculates the subject mean score $\bar{S}_j$ instead of the total score. Since $\bar{S}_j$ is much less affected by missing data than the total score, the following rules are suggested. No make-up visits are scheduled. If a subject misses 3 or fewer consecutive visits, the missed irritation assessment scores are imputed by the higher of the scores on the days immediately before and after the missed days. For example, if a subject missed days 7, 8, and 9, and the irritation scores were 2 and 1 on days 6 and 10, respectively, then the scores for days 7, 8, and 9 are imputed as 2. If more than 3 consecutive visits are missed, no mean score is calculated for that subject. For subjects who miss the irritation assessment and patch application because of an irritation score of 3 or higher, the last observed score is carried forward for the remaining days.

Table 4. Observed 30 subjects' cumulative irritation scores. Each row is one evaluation day; the 30 values are the scores of subjects 1–30, in subject-ID order.

Subject ID: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Day 1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 1 0 0 1 0 0
Day 2: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 2 0 1 1 2
Day 3: 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0
Day 4: 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 1 1 1 0 0 0 0 1
Day 5: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0 1 0 1 0 0 2 0 0 0 0
Day 6: 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 1 2 1 0 1 1 0 0 1
Day 7: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 2 1 0 1 0
Day 8: 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 1 2 1 0 1 1 1 0 0
Day 9: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0
Day 10: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 1 1 1 0 0 1 0 0 0
Day 11: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1
Day 12: 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 1 0 0 1 0 1 0 1 0 2 1
Day 13: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 2 2 0 0 2 0 1 0 1 1 0
Day 14: 0 1 0 1 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 2 0 0 2 2 0
Day 15: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1
Day 16: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 2 1 0 1 0 1 1 0 1
Day 17: 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 2 0 1 0 1 0 0 0 0 1 0
Day 18: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 1 0 1 0 0 0 0 0 2 1
Day 19: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0
Day 20: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 2 0 1 0 0 1 0 0 0 0
Day 21: 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 1 0 0 1 1 0 0 0 0 1 0
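The Table 4 example and the bootstrap comparison described in this section can be reproduced approximately with the sketch below, which uses only the Python standard library. Each string is one evaluation day transcribed from Table 4; because every subject has all 21 scores, no carry-forward or imputation is needed. The t quantile uses the same Cornish–Fisher-type approximation as elsewhere. The Table 1 look-up assigns a non-integer TS(10) that falls between two ranges (e.g. 49.5) to the lower category; that rule, the seed, and all helper names are hypothetical choices of this sketch.

```python
import math
import random
from bisect import bisect_left, bisect_right
from statistics import NormalDist, fmean, stdev

# Table 4 transcribed: one line per evaluation day (1-21), scores of subjects 1-30.
TABLE4_DAYS = """\
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 1 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 2 0 1 1 2
0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 1 1 1 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0 1 0 1 0 0 2 0 0 0 0
0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 1 2 1 0 1 1 0 0 1
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 2 1 0 1 0
0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 1 2 1 0 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 1 1 1 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 1 0 0 1 0 1 0 1 0 2 1
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 2 2 0 0 2 0 1 0 1 1 0
0 1 0 1 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 2 0 0 2 2 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 0 0 1
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 2 1 0 1 0 1 1 0 1
0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 2 0 1 0 1 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 1 0 1 0 0 0 0 0 2 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 2 0 1 0 0 1 0 0 0 0
1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 1 0 0 1 1 0 0 0 0 1 0""".splitlines()
DAYS = [[int(v) for v in line.split()] for line in TABLE4_DAYS]
# 21-day total score of each subject.
TOTALS = [sum(day[j] for day in DAYS) for j in range(len(DAYS[0]))]
CATEGORIES = ["I", "II", "III", "IV", "V", "VI"]


def t_quantile(p, df):
    """Approximate Student t quantile (Cornish-Fisher-type expansion)."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def ts10(totals):
    """Berger-Bowman TS(10) = (10 / N) * sum of all daily scores."""
    return 10 * sum(totals) / len(totals)


def theta_upp(totals, n_days=21, conf=0.95):
    """Upper limit of the t interval for the group mean of the subject means."""
    means = [t / n_days for t in totals]
    n = len(means)
    half = t_quantile(1 - (1 - conf) / 2, n - 1) * stdev(means) / math.sqrt(n)
    return fmean(means) + half


def classify_ts10(ts):
    # Table 1: lower bounds of categories II-V; in-between values go to the lower one.
    return CATEGORIES[bisect_right([50, 200, 450, 581], ts)]


def classify_upper(theta):
    # Table 3: inclusive upper bounds of categories I-V.
    return CATEGORIES[bisect_left([0.625, 1.125, 1.625, 2.125, 2.625], theta)]


def bootstrap_agreement(totals, n_boot=1000, seed=2016):
    """Share of bootstrap samples reproducing each method's original category."""
    rng = random.Random(seed)
    base_bb = classify_ts10(ts10(totals))
    base_new = classify_upper(theta_upp(totals))
    same_bb = same_new = 0
    for _ in range(n_boot):
        sample = rng.choices(totals, k=len(totals))  # resample subjects with replacement
        same_bb += classify_ts10(ts10(sample)) == base_bb
        same_new += classify_upper(theta_upp(sample)) == base_new
    return same_bb / n_boot, same_new / n_boot


print(ts10(TOTALS), classify_ts10(ts10(TOTALS)))  # 58.0 II, as reported in the text
print(round(theta_upp(TOTALS), 4), classify_upper(theta_upp(TOTALS)))
print(bootstrap_agreement(TOTALS))
```

On this transcription the sketch reproduces TS(10) = 58. Its upper confidence limit need not coincide exactly with the reported 0.3381, since not every computational detail is stated, but it lies well below 0.625, so the proposed method gives category I either way. The Berger and Bowman agreement rate varies with the seed and should land somewhere in the 80–90% range, while the proposed method agrees in essentially every resample.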

14-Day cumulative irritation test

Although the 21-day CIT is widely used for evaluating the irritation potential of topical skin-care products, a 14-day CIT has also been proposed and used in clinical practice. Bowman et al.⁴ concluded that, because of a high correlation between the findings of the 21-day and 14-day CITs, the 14-day CIT is predictive of the 21-day CIT for some cosmetic products, but not for all product categories. The 14-day CIT can therefore be used as an alternative for assessing the irritation potential of products under development, to save time and cost. The 14-day CIT follows the same procedure as the 21-day CIT, and the subject mean score $\bar{S}_j$ and the group mean score $\bar{S}$ are calculated as for the proposed 21-day CIT. If the irritation reactions can reasonably be assumed to occur evenly over 21 days, the mean scores from the 21-day and 14-day CITs are both unbiased estimates of the true irritation reaction, but the standard deviation of the 14-day mean score is larger than that of the 21-day mean score. If the irritation reactions most likely occur in the early days, the 14-day mean score is recommended, since the 21-day mean score would likely underestimate the true irritation reaction. If the irritation reactions likely occur in the later stage, the 21-day mean score is recommended, since the 14-day mean score would likely underestimate the true irritation reaction. Based on historical data, irritation reactions to cosmetic products tend to occur in the early days; the 14-day CIT therefore tends to be conservative and can be used to speed up product development and reduce cost.

Since the standard deviation of the 14-day mean score is about 22.5% larger than that of the 21-day mean score, the sample size required to control the length of the confidence interval is larger than for the 21-day CIT. We suggest 48 subjects for whom the mean score can be calculated. Allowing for a 10% attrition rate, 55 subjects are recommended for enrollment, to ensure that at least 48 subjects have irritation scores with no more than 3 consecutive days of missing scores during the 14-day test period.

If there is no prior knowledge about the test product, it is recommended to use the 21-day test procedure with 40 subjects. The 14-day and 21-day mean scores and the ratio of these two scores are calculated for each subject; the group mean and standard deviation of the ratios are then calculated, and a 90% confidence interval for the ratio is constructed from Student's t-distribution with N − 1 degrees of freedom. If the 90% confidence interval of the ratio lies within (0.80, 1.25), estimation bias is not a concern, and both the 14-day and 21-day mean scores can be used as unbiased estimates of the true irritation reaction. For identifying the irritation potential, a conservative approach is recommended: the confidence interval of the 14-day mean score should be used to classify the irritation potential of the test product, since it is usually wider than that of the 21-day mean score. If the 90% confidence interval of the ratio falls outside this range, the larger of the 14-day and 21-day group mean scores is used as the conservative estimate of the true irritation reaction, and the confidence interval of that larger score is used to classify the irritation potential of the test product.
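The sample-size figures in this and the previous section can be checked with a short search for the smallest N whose half-width $t_{0.975,\,N-1}\,\sigma/\sqrt{N}$ is at most 0.125. If daily scores behaved like independent draws, the standard deviation of a 14-day mean would exceed that of a 21-day mean by a factor of $\sqrt{21/14} \approx 1.225$, which is consistent with the 22.5% quoted above. With σ = 0.35 the search returns 33 for the 21-day test, one more than the suggested 32 (the result is sensitive to the exact σ assumed), and 48 for the 14-day test. The second function sketches the 14-day versus 21-day ratio rule; dropping subjects whose 21-day mean is 0, whose ratio is undefined, is a hypothetical choice since the text does not say how to handle them. The function names are also hypothetical, and the t quantile is the same standard-library approximation used earlier.

```python
import math
from statistics import NormalDist, fmean, stdev


def t_quantile(p, df):
    """Approximate Student t quantile (Cornish-Fisher-type expansion; rough for tiny df)."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def min_sample_size(sigma, half_width=0.125, conf=0.95, n_max=10_000):
    """Smallest N with t_{p, N-1} * sigma / sqrt(N) <= half_width."""
    p = 1 - (1 - conf) / 2
    for n in range(2, n_max + 1):
        if t_quantile(p, n - 1) * sigma / math.sqrt(n) <= half_width:
            return n
    raise ValueError("half_width not reachable within n_max subjects")


def choose_mean_score(means14, means21, conf=0.90, limits=(0.80, 1.25)):
    """Apply the 14-day vs 21-day ratio rule; returns (choice, ratio CI)."""
    # Hypothetical: subjects with a 21-day mean of 0 are dropped (ratio undefined).
    ratios = [a / b for a, b in zip(means14, means21) if b > 0]
    n = len(ratios)
    half = t_quantile(1 - (1 - conf) / 2, n - 1) * stdev(ratios) / math.sqrt(n)
    lo, hi = fmean(ratios) - half, fmean(ratios) + half
    if limits[0] < lo and hi < limits[1]:
        return "14-day", (lo, hi)  # no bias concern: use the wider 14-day interval
    # Otherwise the larger group mean is the conservative estimate.
    return ("14-day" if fmean(means14) >= fmean(means21) else "21-day"), (lo, hi)


print(min_sample_size(0.35))                       # 21-day, sigma = 0.35 -> 33
print(min_sample_size(0.35 * math.sqrt(21 / 14)))  # 14-day, sigma inflated ~22.5% -> 48
```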

Discussion

The 21-day cumulative irritation design is the most sensitive design for discriminating between very mild and mild products of a similar nature. Proper and reliable testing and identification of the irritation potential of topical skin-care products under development is crucial in assessing their overall safety profile. Minimal or low irritation potential is desired for a “go” decision for most topical formulations. The currently used 21-day and 14-day CITs have existed for decades and are widely adopted as the “standard” in topical product development. They have limitations, however, such as the lack of control of the estimation error and of statistical confidence, because statistical principles were not fully applied in designing the studies and analyzing the results. With the proposed method, the potential underestimation of the irritation reaction is avoided, and the irritation reaction is expressed as the average daily irritation score, which is more straightforward and directly related to the clinical interpretation. The study sample size is determined from the required estimation accuracy, and the method provides statistical assurance about the reliability of the classification. With this added statistical rigor, the testing results are more reliable and the chances of misinterpretation are reduced.

The proposed method is intended to identify the irritation potential of a test product on its own; that is, the irritation potential is identified from the irritation reactions to the test product without comparing them with the reactions to a reference product. In FDA's Guidance for Industry, Skin Irritation and Sensitization Testing of Generic Transdermal Drug Products⁵, the irritation potential of a test product is instead identified by establishing equivalence to a reference product. In the guidance, the design of the CIT and the scoring of the irritation reaction are the same as in the 21-day CIT, but the evaluation of irritation potential is based not on the test product alone but on its comparison with a reference product. If the upper limit of the 90% confidence interval for the quantity $\mu_T - 1.25\,\mu_R$ is less than or equal to zero, the test product is considered equivalent to or better than the reference product in terms of irritation potential, where $\mu_T$ and $\mu_R$ are the true irritation reactions of the test product and the reference product, respectively. These two approaches to identifying a test product's irritation potential serve different purposes. For generic transdermal products, in addition to demonstrating efficacy, the sponsor company must provide evidence that the irritation and sensitization potentials of the generic product are “equivalent or better than” those of the reference product. With the proposed classification system, the degree of irritation of a test product is evaluated in a predefined, empirical classification system. It is therefore possible for a test result to lead to a no-go decision even though the irritation potential of the test product is not worse than that of a reference product.
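As a rough illustration of the reference-based criterion above, the sketch below assumes that every subject is patched with both the test and the reference product, so per-subject mean scores can be paired, and it uses a paired t-interval on the differences $T_j - 1.25R_j$. The guidance itself may prescribe a different statistical model, so the pairing, the analysis, and the function name are all hypothetical; the t quantile is the same standard-library approximation used earlier.

```python
import math
from statistics import NormalDist, fmean, stdev


def t_quantile(p, df):
    """Approximate Student t quantile (Cornish-Fisher-type expansion)."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def reference_comparison(test_means, ref_means, margin=1.25, conf=0.90):
    """Return (passes, upper): passes if the upper conf-level limit of T - margin*R <= 0.

    Hypothetical paired analysis: one (test, reference) pair of mean scores per subject.
    """
    diffs = [t - margin * r for t, r in zip(test_means, ref_means)]
    n = len(diffs)
    half = t_quantile(1 - (1 - conf) / 2, n - 1) * stdev(diffs) / math.sqrt(n)
    upper = fmean(diffs) + half
    return upper <= 0, upper


ref = [0.2 + 0.01 * k for k in range(30)]
print(reference_comparison([0.9 * r for r in ref], ref))  # test milder than 1.25 x reference
print(reference_comparison([1.5 * r for r in ref], ref))  # test clearly more irritating
```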
In irritation studies, since the degree of occlusion is an important determinant of percutaneous penetration, the choice of patch material may influence the sensitivity of the test. Negative and positive controls, and reference product(s) with known effects in humans, are therefore sometimes included in the test to ensure its validity. If the majority of the sampled subjects experience no irritation from the negative control and strong irritation reactions to the positive control, the test is considered sensitive, or valid. If the sampled subjects show unusual reactions to the controls, for example strong irritation reactions to the negative control and/or mild or no reactions to the positive control, the results of that test are not considered valid. The irritation classification system presented in this paper can also be used to judge the validity of the test: the negative control needs to be classified as category I and the positive control as category III or higher for the test to be valid.

In summary, the currently used 21-day and 14-day CITs are revisited, their statistical aspects are discussed, and modifications are recommended. Shorter CITs, for example 7-day or even shorter, are seen in practice as a way to further improve efficiency and reduce cost. In this case, enough knowledge about the irritation of the test product must be acquired to ensure that such a shorter test will not seriously bias the assessment of the irritation reaction. The sample sizes of such short tests also need to be increased accordingly, since, as discussed earlier, the standard deviation of the mean score for a shorter test is larger than that of the 21-day test.

Acknowledgements

The authors thank Dr. Thomas J. Stephens for his valuable comments and suggestions.

Declaration of interest


The authors report no conflicts of interest. The authors alone are responsible for the content and writing of this article.


References

1. Lanman BM, Elverse EB, Howard CS. The role of human patch testing in a product development program. In: Proceedings, Joint Conference on Cosmetic Sciences. Washington, DC: The Toilet Goods Association, Inc.; 1968:135–145.
2. Berger RS, Bowman JP. A reappraisal of the 21-day cumulative irritation test in man. J Toxicol Cut Ocular Toxicol 1982;1:109–115.
3. Philips L, Steinberg M, Maibach HI, Akers WA. A comparison of rabbit and human skin response to certain irritants. Toxicol Appl Pharmacol 1972;21:369–382.
4. Bowman JP, Berger RS, Mills OH, et al. The 21-day human cumulative irritation test can be reduced to 14 days without loss of sensitivity. J Cosmet Sci 2003;54:443–449.
5. FDA Guidance for Industry. Skin irritation and sensitization testing of generic transdermal drug products. Washington, DC: CDER/FDA; 1999.
