Deformable image registration is increasingly used in image-guided interventions and other applications. [...] increasing difficulty. Our examples are based on clinical data collected during MRI-guided prostate biopsy, registered using a publicly available deformable registration tool. The results indicate that the proposed methodology can be used to generate concise graphical summaries of the experiments, as well as a probabilistic estimate of the registration outcome for a future sample. Its use may facilitate improved objective assessment and retrospective stress-testing of deformable registration [...].
[...] performance of an algorithm. In practice, average performance is of limited utility. A much more practical measure is one that describes, based on the experimental data, the probability of the method producing a meaningful result the next time it is used, together with the associated uncertainty of this estimate. Here we investigate the use of tolerance limits to provide such estimates. Compared to the commonly used summary statistics that aim to capture the average or extreme results observed in the experimental evaluation, tolerance limits establish confidence bounds on a proportion of the experiments, thereby characterizing the expected performance on new subjects.
A taxonomy for reference-based validation of image processing tools has been proposed by Jannin. Briefly, validation typically involves comparing the results produced by a method under investigation with those of a reference. A reference can be obtained using a computational method that has been validated earlier, or using the knowledge of a domain expert. Given the results produced by these two methods, a comparison function is used to measure the discrepancy, or the "distance", to the reference. In image registration, Target Registration Error (TRE) and Landmark Registration Error (LRE) are the commonly used distances.
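
As an illustration of such a comparison function, the following minimal sketch computes a per-landmark error as the Euclidean distance between landmark positions produced by the registration and the corresponding reference positions. The function name, the landmark coordinates, and the millimetre units are assumptions made for this example; they are not taken from the evaluation described here.

```python
import math


def target_registration_error(registered_points, reference_points):
    """Return the Euclidean distance between each registered landmark
    and its corresponding reference landmark."""
    if len(registered_points) != len(reference_points):
        raise ValueError("landmark lists must have the same length")
    return [math.dist(p, q) for p, q in zip(registered_points, reference_points)]


# Hypothetical landmark coordinates (assumed to be in mm), for illustration only.
reference = [(10.0, 20.0, 30.0), (15.0, 25.0, 35.0)]
registered = [(10.0, 20.0, 33.0), (18.0, 29.0, 35.0)]

tre = target_registration_error(registered, reference)
print(tre)
```

Each entry of `tre` is a local discrepancy for one landmark; how these values are aggregated across datasets and parameter settings is a separate choice.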
The errors are computed for the different datasets and parameter values used in the validation and are summarized by a quality index. The quality index captures statistical properties of the distribution of the local discrepancies at the intrinsic level (input dataset and fixed parameters) or at the global level (evaluation carried out using different parameters and validation datasets). The most commonly reported quality index summarizes the average error observed in the evaluation. As an example, we examined the manuscripts presented in the Registration I and II sections of the MICCAI 2013 conference and found that most of the articles concerned with the evaluation of a registration method report the mean and standard deviation of the error measure as the summary statistics in the validation section. Although useful, the characterization of average behavior is not sufficient to describe the performance of an algorithm on a typical pair of images.
Another commonly reported summary statistic is the proportion of successful experiments. In our earlier work we presented an evaluation of a deformable registration algorithm developed for image-guided prostate biopsy. The success rate (the proportion of experiments that were deemed successful based on the defined criteria) was reported separately for each of the datasets used in the evaluation. A similar approach was used in [...] and [...], where the capture range of the method was defined as the initial misalignment that led to a fixed success rate. This approach to reporting results does not directly account for the variability observed across the datasets used in the evaluation, does not include uncertainty in the estimate, and does not allow inference of the expected performance of the algorithm under related experimental conditions. In summary, none of the measures commonly used to summarize the results of registration validation studies allow inference of the typical (expected) performance of the method.
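
To make the contrast between these summaries concrete, the sketch below computes the mean and standard deviation of a set of error values alongside a distribution-free one-sided upper tolerance limit based on order statistics. This is a generic textbook construction chosen for illustration, not necessarily the procedure used in this work; the function names, the error values, and the choice of beta = 0.90 and gamma = 0.95 are all assumptions.

```python
import math
import statistics


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_tolerance_limit(errors, beta=0.90, gamma=0.95):
    """Distribution-free one-sided upper tolerance limit.

    Returns the smallest order statistic that, with confidence at least
    gamma, is not exceeded by at least a proportion beta of the population.
    Returns None when the sample is too small to support such a claim.
    """
    xs = sorted(errors)
    n = len(xs)
    for k in range(1, n + 1):
        # The k-th order statistic covers a proportion beta with
        # confidence P(Binomial(n, beta) <= k - 1).
        if binom_cdf(k - 1, n, beta) >= gamma:
            return xs[k - 1]
    return None


# Hypothetical error values (assumed to be in mm), for illustration only.
errors = [0.5 + 0.1 * i for i in range(29)]

print("mean:", statistics.mean(errors))
print("std:", statistics.stdev(errors))
print("(0.90, 0.95) upper tolerance limit:", upper_tolerance_limit(errors))
```

With beta = 0.90 and gamma = 0.95, this construction needs at least 29 samples before even the largest observed error qualifies as a limit, which illustrates how the sample size enters the statement explicitly, unlike a mean and standard deviation.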
A fundamental difference between the typical-case and average-behavior scenarios is that behavior in a typical case must be regarded as a random variable, not an unknown constant. For example, the accuracy of a registration will vary from sample to sample, and when applied to a particular pair of images, subject to a given failure criterion, an algorithm will either succeed or fail. Two types.