Introduction
The varied upper extremity capabilities of prehension, reach, object transport, and sensing can challenge the measurement of outcomes in therapy. Growth and development in children, along with differential opportunities for play and education, can further complicate functional upper extremity measurement. Multifaceted assessment by therapists of children with upper extremity impairments and limitations involves standardized observation and assessment of impairment (manual muscle testing, sensation, lower motor neuron deficits, spasticity, etc.), performance-based measures, child-reported outcomes, and parent-reported outcomes. Therapy-related objectives for assessment include screening; diagnosing functional limitations to establish benchmarks for reimbursement, evaluation, and goal setting; prioritizing treatment goals; monitoring change over time; building evidence to support treatment and individualized data-driven decision making; and comparing outcomes among treatments. This chapter updates a prior publication that described outcome measures and their psychometric properties. It provides an overview of different types of tests; details the psychometric properties of outcome instruments and their interpretation; updates therapists on the literature on outcome instruments, with particular focus on functional pediatric upper extremity instruments; and discusses scoring and interpretation of traditional and modern item response theory (IRT)-based measures. Finally, the chapter provides resources for therapists to maintain ongoing competency in assessment of the pediatric upper extremity.
The scope of this chapter includes functional assessments of the upper extremity. Existing classification systems, such as the Manual Ability Classification System for children with cerebral palsy (CP), the Mallet Classification and Active Movement Scale for children with brachial plexus birth palsy (BPBP), the International Standards for Neurological Classification of Spinal Cord Injury, and the Classification of the Upper Extremity in Tetraplegia for children with spinal cord injury (SCI), will not be discussed in this chapter because they are classification systems rather than outcome measures. The scope of this chapter also excludes impairment-based measures for muscle strength, sensibility, joint range of motion, pain, and spasticity. Information on these methods can be found in other excellent resources. Impairment-based methods have traditionally been used during therapy and do not suffice as functional endpoints for outcomes assessment in children. The scope of this chapter also excludes assessments using mobile health technology and activity trackers, as the literature on these technology-based measures is in its early stages with no consensus; however, preliminary evidence can be found in some recent studies. Of note, the terms tests, tools, scales, instruments, measures, and assessments will be used interchangeably in this chapter, as is the case in much of the literature.
Types of Outcome Instruments
The selection of outcome instruments is guided by determining whether scores need to be compared to a level of performance or criterion, or to the typical population. Thus, there are two types of instruments that differ in how scores are interpreted: criterion-referenced and norm-referenced tests. Criterion-referenced tests enable interpretation of test scores in relation to a defined benchmark level of performance. For example, the Shriners Hospitals Upper Extremity Evaluation (SHUEE) is a criterion-referenced test in which the performance of children with cerebral palsy is assessed against predefined criteria. In contrast, norm-referenced tests interpret an individual’s performance relative to the performance of a known typical or normative group. Commonly used norm-referenced tests include developmental motor scales, wherein scores are interpreted against “normal” development. For norm-referenced instruments, the mean score of the reference sample provides the standard, and its variability is used to determine how an individual performs relative to that sample. Norm-referenced tests are usually used for diagnosing, while criterion-referenced tests are used to examine proficiency of performance along a continuum. An example of a criterion-referenced continuum is the range from inability to perform a task to ability to complete it; this approach is considered more useful for developing and evaluating rehabilitation outcomes.
There are two types of assessments based on the use of scores: formative and summative assessments. An assessment that provides information to guide ongoing planning of treatment is called a formative assessment. Criterion-referenced instruments provide the ability to examine ongoing performance and are used as formative assessments. In contrast, a summative assessment enables initial or discharge assessment of function and is typically a norm-referenced test. Knowledge of the original purpose of the test as described by the test author can help identify the recommended use of the instrument as a formative or summative assessment. Therapists need to be aware of the author-intended use of the instrument to avoid inaccurate representation of the scores.
The assessments can also be categorized as generic versus disease specific. Generic measures are used across diagnostic conditions but need to be validated for use in the population of interest. Disease-specific measures are used only with certain diagnoses, with items customized for the symptoms, features, or functional implications of the diagnosis. For example, the Jebsen Test of Hand Function and the Box and Block Test are generic performance measures for the upper extremity that assess function and dexterity, respectively. Although these measures were originally developed for adults, their measurement properties have been studied in children. In contrast, the Prosthetic Upper Extremity Functional Index (PUFI) and the Child Amputee Prosthetics Project-Functional Status Inventory (CAPP-FSI) are disease-specific instruments developed for children with limb deficiency. The selection between generic and disease-specific instruments is determined by the purpose of assessment. Generic measures enable comparison of outcomes across diagnostic conditions, whereas specialized therapy centers often prefer disease-specific instruments because these are typically suitable for most patients seen within the clinic. Disease-specific instruments have the advantage of items and response scales that are highly relevant to the diagnostic population and may also detect change better than a generic measure. For example, the PUFI has items probing the usefulness of the prosthesis for the activity, with response options that take into consideration active and passive use of the prosthetic hand.
The International Classification of Functioning, Disability, and Health (ICF) has also been used to categorize outcome instruments into body structure and function, activity, or participation levels of measurement. Although measurement of performance in each of these domains is important for a comprehensive assessment of functioning, the currently available measures lack adequate coverage of all three domains, particularly those related to participation. Hao et al. described the expert consensus on musculoskeletal pediatric upper extremity outcome instruments according to ICF domains. For the activity domain, bilateral tasks were highly valued by experts, along with the Assisting Hand Assessment (AHA), Pediatric Evaluation of Disability Inventory (PEDI), SHUEE, and Jebsen Hand Function Test. For the participation domain, the Canadian Occupational Performance Measure (COPM), Pediatric Outcomes Data Collection Instrument (PODCI), and Disabilities of the Arm, Shoulder, and Hand (DASH) were highly valued by experts. Other sources have provided detailed ICF classification of outcome instruments for children with CP and linking of individual outcome measures such as the PEDI.
There are different types of assessments based on the individual completing the items: performance-based measures, parent/teacher/proxy-reported measures, and child-reported measures. Therapists typically use performance-based measures, in which items are scored based on observed performance of the test tasks; this has traditionally been the preferred data collection method in the clinical setting. However, in the changing healthcare environment, adherence to patient-centered clinical practice also requires collecting patient-reported outcomes, and a battery of outcome measures may be needed to fully assess the client’s functioning. Child-reported outcomes are equally important to collect within pediatric therapy practice, and children as young as 3 years old can participate in reporting their experiences on appropriately designed measures.
Psychometric Properties of Outcome Instruments
The measurement or psychometric properties of an instrument provide the therapist with evidence on the accuracy and appropriateness of the tool for the intended purpose. These properties should be considered during selection of the outcome measure, particularly when decisions related to treatment and reimbursement are based on the scores. They include evidence of reliability, validity, and responsiveness. The definitions of these properties and their interpretation guidelines are provided in Table 4.1. Reliability evidence for a tool is the most basic of the three measurement properties and needs to be examined carefully. Reliability of total scores is usually determined using the intraclass correlation coefficient (ICC), while reliability of individual items can be examined using kappa coefficients. When a measure is used repeatedly with a patient, strong evidence of test–retest and intrarater reliability is desired. When multiple therapists are involved in the assessment of a patient, a measure with strong evidence of interrater reliability is preferred. When different formats of the same instrument are used, alternate-forms reliability should be examined to justify the use of the differing formats. Internal consistency is a form of reliability assessed using Cronbach’s α coefficient for each dimension and is usually the most commonly reported reliability evidence for a tool. However, internal consistency alone is not sufficient evidence of reliability.
Measurement Property | Definition | Interpretation Guidelines | References |
---|---|---|---|
Reliability | |||
Test–retest | Stability of scores free from measurement error across the specified condition (e.g., across time, within one rater, among different raters, and varied forms) | Intraclass correlation coefficient for total scores: values between 0.7 and 0.9 are recommended for outcome instruments; values higher than 0.9 are preferred | Andresen (2000), Fitzpatrick et al., Portney and Watkins, Portney and Watkins, Streiner and Norman |
Intrarater | |||
Interrater | |||
Alternate forms | |||
Internal consistency | The degree of interrelatedness among items in a measure | Cronbach’s α: >0.9 strong; 0.70–0.80 moderate; <0.70 weak | Portney and Watkins, Portney and Watkins |
Measurement error | The systematic or random error not related to a change in function | Scores/points | Portney and Watkins |
Responsiveness | |||
Effect size | The degree to which the score on an instrument is capable of detecting change in function. Responsiveness can be determined in a longitudinal study | Cohen’s d: <0.2 weak effect; 0.2–0.5 moderate effect; 0.5–0.8 strong effect | Andresen (2000) |
Minimal detectable change | The degree to which the score on an instrument is capable of detecting important changes in function beyond measurement error | Scores/points greater than measurement error | Guyatt et al. |
Minimal clinically important difference | The degree to which the score on an instrument is capable of detecting clinically important changes in function. Also called minimal important difference, minimally important change, or clinically meaningful difference | |
Floor effect | The degree to which the score on an instrument is not capable of detecting changes at the lower end of function | 15%–20% of individuals achieved the lowest (floor) or highest (ceiling) score | Andresen (2000) |
Ceiling effect | The degree to which the score on an instrument is not capable of detecting changes at the higher end of function | ||
Validity | |||
Validity | The degree to which the instrument measures what it is supposed to measure | There are multiple ways in which validity can be reported. The unified concept of validity considers all psychometric properties as contributing evidence of validity | Messick |
Face | The degree to which the instrument appears to measure what it intends to measure | Qualitative and content expert agreement | Portney and Watkins, Portney and Watkins |
Content | The degree to which the items in the instrument reflect the construct to be measured based on theory and expert opinion | Qualitative and content expert agreement | Portney and Watkins, Portney and Watkins |
Criterion—concurrent | Degree to which the scores on the instrument are related to scores on a gold standard measure of the same construct | Correlation coefficient r or ρ: <0.30 weak; 0.30–0.60 moderate; >0.60 strong | Andresen (2000), Portney and Watkins, Portney and Watkins |
Construct—convergent | The degree to which the scores on the instrument are related to scores on another instrument with the same construct | ||
Construct—divergent/discriminant | The degree to which the scores on the instrument are not related to scores on another instrument with a different construct | ||
Factorial | The degree to which the items on the instrument represent factors within the construct | Confirmatory factor analysis indicating how much variance in scores is explained by the factors. Measures can have one or more dimensions | Andresen (2000), Mokkink et al. |
Known groups | The degree to which the scores on the instrument are able to differentiate or discriminate between known groups | Statistically significant difference between groups | Andresen (2000) |
Cross-cultural | The degree to which the scores on the instrument are able to measure the same construct in a new cultural cohort | Cultural adaptation, front and back translation, and adaptation of the measure with establishment of relevant measurement properties | Andresen (2000) |
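As a practical illustration of the interpretation guidelines in Table 4.1, the following Python sketch computes Cronbach’s α from a small, hypothetical item-score matrix using the standard formula; the data and scale are invented for illustration only and are not drawn from any instrument discussed here.

```python
# Minimal sketch (hypothetical data): Cronbach's alpha for a 4-item scale
# scored for 5 children. Formula: alpha = k/(k-1) * (1 - sum(item variances) / variance(total)).
import statistics

scores = [            # rows = children, columns = items (invented scores)
    [3, 4, 3, 5],
    [2, 2, 3, 3],
    [4, 5, 4, 5],
    [1, 2, 2, 2],
    [3, 3, 4, 4],
]

k = len(scores[0])                                   # number of items
item_columns = list(zip(*scores))                    # item-wise score lists
sum_item_var = sum(statistics.pvariance(col) for col in item_columns)
total_var = statistics.pvariance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - sum_item_var / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")             # ~0.95 here: 'strong' per Table 4.1
```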
The evidence for construct validity encompasses all measurement properties. Face and content validity evidence is the most basic form of validity. Content validity evidence for patient-reported outcomes is gathered during the development and field-testing of the items with experts and users in an iterative cognitive testing methodology. Evidence for construct validity can be evaluated using hypothesis testing with a correlation coefficient for convergence with scores from instruments measuring similar constructs and divergence with scores from instruments measuring dissimilar constructs. It can also be informed by inferential tests for differences in scores among known groups, such as those with and without upper extremity impairments. Evidence from exploratory or confirmatory factor analysis can further inform validity. Language adaptations of a patient-reported instrument should be accompanied by cultural adaptation, backward and forward translations, and measurement properties established for the translated instrument. For the evidence of validity to be applicable to a diagnostic condition, measurement studies need to be conducted in the population of interest, designed specifically to measure psychometric properties, report missing data accurately, and be adequately powered to be of good quality.
The responsiveness of the scores of a measure can help the therapist determine its use as an outcome measure for the clinic. Multiple methods are used to report responsiveness, such as effect size, standardized response mean, minimal detectable change (MDC), and minimal clinically important difference (MCID). The MDC and MCID values should be greater than the measurement error of the instrument. Floor and ceiling effects provide the therapist with information regarding the range of deficit for which the measure will not be useful. The clinical utility of a measure, although not a measurement property, includes the appropriateness of the measure based on construct and psychometrics, acceptability, practicability, and accessibility. Outcome measures that are lengthy, difficult to administer, or expensive, and those lacking accommodations for disabilities, lose clinical utility for reasons beyond psychometrics, and therapists must consider these pragmatic aspects during selection of outcome measures.
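To make these indices concrete, the sketch below uses commonly cited formulas (standard error of measurement derived from test–retest reliability, MDC at the 95% confidence level, and an effect size) with invented values; it is illustrative only and not a computation prescribed by this chapter or by any specific instrument.

```python
# Minimal sketch (assumed values, standard formulas): linking reliability,
# measurement error, and responsiveness indices.
import math

sd_baseline = 8.0    # hypothetical SD of baseline scores
icc = 0.90           # hypothetical test-retest reliability (ICC)

sem = sd_baseline * math.sqrt(1 - icc)        # standard error of measurement
mdc95 = 1.96 * math.sqrt(2) * sem             # minimal detectable change (95% level)

mean_pre, mean_post = 42.0, 48.0              # hypothetical group means before/after therapy
effect_size = (mean_post - mean_pre) / sd_baseline   # Cohen's d using baseline SD

print(f"SEM = {sem:.1f}, MDC95 = {mdc95:.1f} points, d = {effect_size:.2f}")
# A child's change score should exceed the MDC95 to be considered real change
# beyond measurement error; a reported MCID should likewise exceed this value.
```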
The measurement properties discussed thus far and described in Table 4.1 provide descriptions and interpretation guidelines using a traditional measurement theory approach. The newer methods of item response theory employ modern statistical tools to analyze items calibrated on a continuum from low to high levels of the construct or trait. The properties of reliability and validity have different interpretation guidelines for studies using item response theory. For example, the National Institutes of Health Patient-Reported Outcomes Measurement Information System (PROMIS) measures are developed using these modern measurement techniques. More information on applications of item response theory to patient-reported outcome measures can be found in other useful sources ( Table 4.1 ).
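As a simple illustration of how IRT places items and children on one continuum, the sketch below implements the one-parameter (Rasch) model; it is a generic illustration only, and the specific models and calibrations used by individual instruments differ.

```python
# Minimal sketch: one-parameter (Rasch) IRT model. Person ability (theta) and
# item difficulty are expressed on the same logit continuum; the model gives
# the probability that the child succeeds on the item.
import math

def rasch_probability(theta: float, difficulty: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# A child of average ability (theta = 0) attempting easy, matched, and hard items
for difficulty in (-2.0, 0.0, 2.0):
    p = rasch_probability(0.0, difficulty)
    print(f"item difficulty {difficulty:+.1f} logits -> P(success) = {p:.2f}")
# Calibrating items from easy to hard in this way is what allows IRT-based
# measures (e.g., PROMIS item banks) to report comparable scores from
# different subsets of items.
```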
Considerations in Selection of Outcome Instruments
Selection of instruments in the clinic can be a time-intensive task, and a systematic approach can enhance efficiency. The UK Patient Reported Outcomes Measurement Group has set forth eight criteria for selection of instruments. These include appropriateness of the instrument to the construct to be assessed; acceptability of the instrument to patients; feasibility of administering and scoring the instrument; interpretability and precision of the scores; reliability of the instrument’s results; reproducibility and internal consistency; and responsiveness of the instrument to detect changes over time that matter to patients. The Pediatric Section of the American Physical Therapy Association has made this information easily accessible to therapists, categorized by the ICF, to facilitate the selection of measures. A systematic approach in the clinical setting can involve listing the characteristics of the population served by the clinic (e.g., age, diagnosis, and severity); searching the literature thoroughly for instruments available for the population at the impairment, activity, and participation levels; examining the eight selection criteria; obtaining the instruments; and periodically reviewing the needs of the patient population to update the inventory. The outcomes used to track groups of patients at the clinic can also dictate the selection of instruments. These criteria can be applied within a program or department as part of a systematic process of staff training using an educational framework for competence. The iterative process can involve conducting a pretraining needs assessment; establishing preliminary face validity of the measures; analyzing the needs of the learners, such as time and learning preferences; assessing initial competency; developing a posttraining competency assessment; developing training modules; delivering learning modules that involve problem-based, real-life learning opportunities; completing the posttraining competency assessment; sharing and discussing results with learners; and evaluating the learning experience and updating the training and assessment based on feedback.
The guidelines for selection of instruments set forth by national organizations over the past decade are presented as core data sets or common data elements. Common data elements are essential or highly recommended elements of data for a particular diagnostic condition or for the general population, as determined by scientific task forces convened to develop these guidelines. Outcome measures are only a subset of the common data elements, which also include essential data to be reported during clinical trials, such as height and weight. Table 4.3 provides a description of and link to the U.S. National Institutes of Health Common Data Elements, National Quality Forum, International Spinal Cord Injury Core Data Sets, Core Outcome Measures in Effectiveness Trials, American Physical Therapy Association’s EDGE Task Force Recommendations, and the ICF Core Sets. Although common data elements were developed for uniform data collection in research studies, their translation to clinical practice can lead to standardized measurement variables collected across various diagnoses, enabling the creation of larger data repositories. This work is currently being done within the specialized model systems in the United States, for example, the Spinal Cord Injury Model Systems. The field of rehabilitation can greatly benefit from adoption of common data elements within clinical practice.
Outcome Measure | Description | Number of Items | Supplier/Source | References |
---|---|---|---|---|
Functional Performance Measures | ||||
Assisting Hand Assessment (AHA) | Assesses the use of the assisting hand while performing bimanual play in usual environments; uses Rasch measurement model | 22 | http://www.ahanetwork.se/ | Chang et al., Bialocerkowski et al., Krumlinde-Sundholm and Eliasson, Gordon, Hoare et al., Krumlinde-Sundholm et al., Holmefur et al. Mini-AHA in CP: Greaves et al. |
Box and Block Test (BB) | Performance measure that requires picking up and placing blocks; norms available | 1 | Sammons Preston | Mathiowetz et al., Chen et al., Desrosiers et al., Lin et al., Platz et al., Jongbloed-Pereboom, Mulcahey et al., Ekblom et al. |
Capabilities of the Upper Extremity Test (CUE-T) | Performance measure with unilateral and bilateral tasks scored on repetitive actions, progressive actions, and timed tasks | 17 | Dent et al. | Dent et al. |
Graded Refined Assessment of Strength, Sensation, and Prehension (GRASSP) | Strength, dorsal and palmar sensation, prehension ability and prehension performance are assessed | 25 items per hand | https://www.grassptest.com/ | Mulcahey et al. |
Jebsen Test of Hand Function | Requires manipulation of objects that reflect everyday tasks and one writing task; norms available | 7 | Sammons Preston | Bovend’Eerdt et al., Jebsen et al., Taylor et al., Noronha et al., Mulcahey et al., Aliu et al., Klingels et al., Netscher et al., Lee et al., Shingade et al., Hiller and Wade, Brandao et al., Staines et al. |
Melbourne Assessment of Unilateral Upper Limb Function (MUUL) | Assesses unilateral upper extremity quality of movement in children with neurological impairments | 14 | https://www.rch.org.au/melbourneassessment/how-to-order/ | Randall et al., Spiritos et al., Klingels et al. |
Quality of Upper Extremity Skills Test (QUEST) | Criterion-referenced measurement tool developed to evaluate upper extremity quality of movement in children with CP | 36 | https://www.canchild.ca/en/shop/19-quality-of-upper-extremity-skills-test-quest | Klingels et al., DeMatteo et al. |
Shriners Hospitals Upper Extremity Evaluation (SHUEE) | Video-based tool for the assessment of upper extremity function in children with hemiplegic cerebral palsy | Tone, range of motion, and 16 manual function tasks | http://shrinerschildrens.org/shuee-test-scoring-and-interpretation/ | Klingels et al., Sakzewski et al., Bard et al., Randall et al., Klingels et al., Lee et al., Thorley et al., Davidson et al., Gilmore et al. |
Child and Parent-Reported Outcome Measures | ||||
ABILHAND-Kids | Manual ability in children; uses Rasch measurement model | 21 | http://www.rehab-scales.org/abilhand-kids-downloads.html | Arnould et al., Penta et al., Vandervelde et al., Aarts et al., Klingels et al., Sgandurra et al., Foy et al., Spaargaren et al., Buffart et al., Kumar and Phillips |
Activities Scale for Kids (ASK) | Assesses physical function with a capability and performance version | 30 | http://www.activitiesscaleforkids.com/ | Young et al., Plint et al. |
Canadian Occupational Performance Measure (COPM) | Semistructured interviews for parents and children to identify performance activities that are perceived as important by the parent, child, and/or society | 5 | http://www.thecopm.ca/ | Law et al., Cup et al., Eyssen et al., Cusick et al., Carswell et al., McColl et al., Mulcahey et al., Davis et al., Pollock et al., Brandao et al. |
Cerebral Palsy Profile of Health and Function (CP-PRO) |
Child Health Questionnaire (CHQ) | Assesses health related quality of life | 50 or 28 | https://www.healthactchq.com/survey/chq | Landgraf et al. |
Child Amputee Prosthetics Project-Functional Status Inventory (CAPP-FSI) | Assesses functional status in children with limb deficiency, including preschool children and toddlers | 40 | Pruitt et al. | Pruitt et al. |
Children’s Hand-use Experience Questionnaire (CHEQ) | Assesses the experience of children and adolescents in using the affected hand in activities where usually two hands are needed. New version uses Rasch model | 29 Mini-CHEQ for ages 3–8 years: 21 | http://www.cheq.se/ | Sköld et al. |
Disabilities of the Arm, Shoulder and Hand (DASH) and QuickDASH | Physical function and symptoms in patients with several musculoskeletal disorders of the upper limb | 30; QuickDASH: 11 | http://www.dash.iwh.on.ca/ | Quatman-Yates et al. |
Goal Attainment Scaling | Evaluates performance on each goal using specified possible outcomes and rates the extent of goal attainment | Variable | https://www.kcl.ac.uk/nursing/departments/cicelysaunders/attachments/Tools-GAS-Practical-Guide.pdf | Bovend’Eerdt et al., Kiresuk et al., Mailloux et al., Ten Berge et al., Wesdock et al., Lowe et al., Steenbeek et al., Bovend’Eerdt et al. |
Motor Activity Log: Pediatric and Infant | Structured interview to examine how often and how well a child uses his/her involved upper extremity in their natural environment outside the therapy setting | 22 | https://www.uab.edu/citherapy/images/pdf_files/CIT_PMAL_Manual.pdf | Uswatte et al., Wallen et al. |
Pediatric Evaluation of Disability Inventory (PEDI) | Functional skills, level of independence, and modifications required for functional activities are assessed | 197 | https://www.pearsonclinical.com/childhood/products/100000505/pediatric-evaluation-of-disability-inventory-pedi.html | Haley et al. |
Pediatric Outcomes Data Collection Instrument (PODCI) | Upper extremity function as well as physical function, activity and sports, mobility, pain, happiness, and satisfaction with treatment | 86 | https://www.aaos.org/research/outcomes/Pediatric.pdf | Daltroy et al., Hunsaker, Amor et al., Kunkel et al., Matsumoto et al., Lee et al. (2010), Nath et al., Dedini et al., Huffman et al. |
Pediatric Measure of Participation (PMoP) | Assesses the child’s self-participation relative to how much the child’s friends participate | 51 self-participation 53 friends-participation | Mulcahey et al. | Mulcahey et al. , Mulcahey et al. (2016) |
Pediatric Quality of Life Inventory (PedsQL) | Assesses quality of life in children with chronic illnesses | 23 | https://www.pedsql.org/ | Varni et al. |
Piers-Harris Children’s Self-Concept Scale | Assesses behavior, intellectual and school status, physical appearance and attributes, anxiety, popularity, and happiness and satisfaction | 80 | https://www.mhs.com/MHS-Assessment?prodname=piersharris2 | Piers et al. |
Prosthetic Upper Extremity Functional Index (PUFI) | Evaluates the extent to which a child actually uses a prosthetic limb for daily activities, the comparative ease of task performance with and without the prosthesis, and its perceived usefulness | 38 | Wright et al. | Buffart et al., van Dijk-Koot et al., Wright et al. |
Computer Adaptive Tests and Short Forms | ||||
Pediatric Evaluation of Disability Inventory Computer Adaptive Test (PEDI-CAT) | Assesses self-care mobility and social function in children across different diagnoses | Variable from 197 item pool | https://www.pedicat.com/ | Haley et al., Coster et al., Allen et al., Dumas et al., Mulcahey et al. |
CP-PRO | Parent-reported assessment of physical functioning with an upper extremity skills subscale | Can be limited to a 5-, 10-, or 15-item stopping rule | University of Utah | Grampurohit et al. |
The advanced user of assessments in the clinic can go beyond the evidence of measurement properties and the utility of the instrument to consider the consequences of testing. The consequences of testing are experienced not only by the therapist, organization, and payers, but also by the child, parents, and other team members. The therapist must pay close attention to undesired consequences of testing such as value judgements and social desirability. For example, the value assigned to a developmental test in determining the services a child receives can place undesired burden on both the tester and the test-taker and deter unbiased performance from both. Such high-stakes use of an instrument may not match the original purpose of the test and can also affect the validity of the instrument’s scores. The advanced user needs to consider the positive and negative impacts of testing as contributing to the validity evidence of the instrument.
Administration, Scoring, and Interpretation
To engage the child and parent in the process of assessment, it is important to share the rationale, purpose, and interpretation of scores in easy-to-understand language. Consent for testing should be routinely obtained, and the participant’s feedback on the process and results should be incorporated into future testing situations. Structured and standardized administration of a measure includes acquiring the necessary training in setup, instructions to the patient, recording, scoring, and continued competency. Many of the upper extremity performance measures require extensive setup, standard workspaces for reaching and placing objects, and objects of differing sizes and weights. They are prone to observer bias introduced by fatigue, lack of interest, or accumulation of compensatory movements. For patient-reported outcome measures, the framing of questions can change the responses obtained. For example, task performance of children is indicated by asking if they “did” something, whereas capacity is indicated by asking if they “can do” a task. A difference of 18% has been detected between performance and capacity in children with physical disabilities.
Instruments should be accompanied by instructional manuals that provide detailed information on setup, ideal testing conditions, administration, methods for calculating total and subscale scores, handling of missing items, and interpretation of scores. Raw scores are obtained by manual calculations. Raw scores often need to be converted to standard scores, which allow for comparison across individuals and variables with different normal distributions. Calculating a standard score requires the raw score, the mean score, and the standard deviation. For example, IQ scores are traditionally expressed with a mean of 100 and a standard deviation of 15 ( Fig. 4.1 ). A conversion to a Z-score yields a mean of 0 and a standard deviation of 1; the Z-score is the distance from the mean in standard deviation units. A T-score, on the other hand, is a transformed score with a mean of 50 and a standard deviation of 10. For example, the Pediatric Evaluation of Disability Inventory’s standard score is a T-score. A high T-score needs to be interpreted based on the desirability of the skill assessed; a high T-score for an undesirable trait indicates worse performance. For example, in the Children’s Depression Inventory, a higher T-score indicates more depression. In contrast, the PROMIS Upper Extremity Item Bank, developed using item response theory methods, is scored on a T-score metric where higher scores indicate better function.
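The conversions described above follow directly from the raw score, the normative mean, and the standard deviation; the sketch below shows them with an invented raw score and invented norms, purely for illustration.

```python
# Minimal sketch (hypothetical raw score and norms): converting a raw score
# to a Z-score and a T-score.
def standard_scores(raw: float, norm_mean: float, norm_sd: float) -> tuple[float, float]:
    z = (raw - norm_mean) / norm_sd   # distance from the mean in SD units
    t = 50 + 10 * z                   # T-score metric: mean 50, SD 10
    return z, t

z, t = standard_scores(raw=62, norm_mean=50, norm_sd=8)   # invented values
print(f"Z = {z:.1f}, T = {t:.0f}")                         # Z = 1.5, T = 65
# On an IQ-type metric (mean 100, SD 15) the same Z-score would be 100 + 15 * 1.5 = 122.5.
# Whether a high score is desirable depends on the trait measured (e.g., function vs. depression).
```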
For norm-referenced tests, percentile scores indicate how the child ranked relative to the normative sample. A percentile rank of 95 indicates that a child scored at or above 95% of the children in the normative sample. Age equivalencies, on the other hand, provide an equivalent age based on the performance of the child on the measure. The equivalent age provided by such measures should be interpreted in light of the other scaled or standard scores. A scaled score reflects performance on a subtest that assesses a particular skill and is combined with other scaled scores to form standard scores. A standard error of measurement is provided based on the normative sample, and two standard errors on either side of the score provide the confidence interval. Thus, the confidence interval is the hypothetical range of scores predicted if a child were given the test multiple times. Criterion-referenced test scores are interpreted with a cut-score or percent correct; however, standard and scaled scores may also be used. Interpretation of criterion-referenced test scores is based on how much of the skill the child can perform and is beneficial for children with moderate-to-severe disability, for whom comparisons with a normative sample do not serve the intended purpose of testing.
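The confidence interval described above, and the percentile rank corresponding to a Z-score, can be worked out as in the sketch below; the scores and standard error are invented values used only to illustrate the arithmetic.

```python
# Minimal sketch (hypothetical values): confidence interval around an observed
# standard score using the standard error of measurement (SEM), plus the
# percentile rank for a Z-score under a normal distribution.
import math

observed, sem = 85.0, 3.0                      # hypothetical standard score and SEM
ci_low, ci_high = observed - 2 * sem, observed + 2 * sem
print(f"~95% confidence interval: {ci_low:.0f} to {ci_high:.0f}")   # 79 to 91

z = -1.0                                       # hypothetical Z-score: 1 SD below the mean
percentile = 50 * (1 + math.erf(z / math.sqrt(2)))   # cumulative normal, as a percent
print(f"percentile rank ≈ {percentile:.0f}")   # ≈ 16: scored at or above ~16% of the sample
```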
Functional Performance Measures
Functional performance measures involve demonstration of functional arm and hand tasks that are rated by observation of quality, completion, assistance, or speed of performance. These measures require standard setup, administration, scoring, and interpretation guidelines, typically provided in a manual. Table 4.2 lists some of the available functional performance measures that can be used with children with pediatric upper extremity disorders, and the sources listed in Table 4.3 provide detailed listings of the psychometric properties of these measures. Therapists should note that many of these performance measures were field-tested with adults. The Jebsen Test of Hand Function is a norm-referenced, timed test of hand function that was originally established for adults and subsequently field-tested in children. It requires manipulation of everyday objects and one writing task. Although it has been used with children with varying diagnoses, sound psychometric studies in samples of pediatric populations with upper extremity impairments are lacking. The Box and Block Test is another generic performance measure that evaluates unilateral hand function as assessed by the number of blocks acquired, carried, and released in 1 min. Although most psychometric studies have been conducted with adults with neurologic and orthopedic impairments, studies have also been done with children 3 years and older and with conditions such as brachial plexus birth palsy and limb deficiency. The AHA is an upper extremity performance measure that assesses assisting hand use during bimanual play in the child’s typical environment. It has been studied in children with spastic hemiplegia, cerebral palsy, and other orthopedic conditions. The Mini-AHA has been established for babies with CP between 8 and 18 months of age, and further psychometric testing is needed. The Melbourne Assessment of Unilateral Upper Limb Function (MUUL), the Quality of Upper Extremity Skills Test (QUEST), and the SHUEE are performance measures for children with cerebral palsy that assess upper extremity function at the impairment or body structure level. All three instruments have strong psychometric properties when used with children with CP, provide important information about upper limb function, and have been used in treatment effectiveness studies. Based on a systematic review of psychometric studies of children with CP and upper limb involvement, the MUUL is recommended for assessment of unilateral performance and, when used with the AHA, is most effective at measuring change in unilateral and bimanual hand function over time or following treatment.
Resource | Website ∗ | Details |
---|---|---|
Databases | ||
PROQOLID | https://eprovide.mapi-trust.org/about/about-proqolid | A database of patient-centered outcomes that assists selection of clinical outcome assessments based on recommended sources such as the U.S. Food and Drug Administration, European Medicines Agency, and the research community |
Rehab Measures Database | https://www.sralab.org/rehabilitation-measures | A database of over 400 existing measures in rehabilitation. The psychometric properties are provided within each instrument summary for the various populations |
Spinal Cord Injury Research Evidence (SCIRE) Outcome Measures | https://scireproject.com/outcome-measures/ | A database that provides information on common outcome measures used in spinal cord injury clinical practice. Also provides resources for selection of measures such as the Outcome Measures Toolkit with 33 tests |
Measurement Systems | ||
CanChild | https://www.canchild.ca/ | An accessible resource for children, families, and healthcare providers regarding outcome measures and other aspects of care for autism spectrum disorder, brain injury, concussion, cerebral palsy, developmental coordination disorder, Down syndrome, fetal alcohol spectrum disorder, and spina bifida |
Health Measures: The U.S. National Institutes of Health (NIH) distribution center for Neuro-QoL, PROMIS, NIH Toolbox, and ASCQ-Me | http://www.healthmeasures.net | Quality of Life in Neurological Disorders (Neuro-QoL): A measurement system developed and validated for common neurological conditions to evaluate physical, mental, and social domains across the lifespan. Patient-Reported Outcomes Measurement Information System (PROMIS): Self-reported and parent-reported measures developed and validated for global, physical, mental, and social health across the lifespan for the general population and chronic conditions. NIH Toolbox: Performance tests developed and validated for cognitive, motor, and sensory function, and self-reported measures developed and validated for emotional function across the lifespan. ASCQ-Me: Self-reported measures developed for physical, mental, and social health in sickle cell disease |
Common Data Elements or Core Sets | ||
American Physical Therapy Association (APTA) | http://www.neuropt.org/professional-resources/neurology-section-outcome-measures-recommendations | Academy of Neurologic Physical Therapy EDGE Recommendations: Details of outcome measures for clinical practice, research and education are provided for many conditions including stroke, brain injury, and spinal cord injury |
https://pediatricapta.org/includes/fact-sheets/pdfs/13%20Assessment&screening%20tools.pdf | Pediatric APTA: List of outcome measures in pediatrics organized by International Classification of Functioning, Disability, and Health Framework | |
Core Outcome Measures in Effectiveness Trials (COMET) | http://www.comet-initiative.org/ | Core Outcome Sets recommended by scientific panels for reporting outcomes within clinical trials |
International Classification of Functioning, Disability, and Health (ICF) Framework | http://www.who.int/classifications/icf/en/ | The ICF framework recommends core sets for various ICF domains that can be populated into a documentation form with a response scale that can be completed online to generate a functional profile |
The International Spinal Cord Society (ISCOS) Core Data Sets | https://www.iscos.org.uk/international-sci-core-data-sets | International spinal cord injury datasets integrated with NINDS CDE resources |
U.S. National Institutes of Health (NIH) Common Data Elements | https://www.nlm.nih.gov/cde/ | NIH repository of common data elements with outcome measures recommended by scientific advisory panels, including those set up by the National Institute of Neurological Disorders and Stroke (NINDS) and the National Cancer Institute (NCI) |
U.S. National Quality Forum (NQF) | http://www.qualityforum.org/Setting_Priorities/Improving_Healthcare_Quality.aspx | NQF invites authors to submit their measure for thorough evaluation by its standing committee, or to be guided by a measure incubator, to become part of the portfolio of endorsed measures through a rigorous consensus development process |