This paper presents the results of different methods for assessing reliability when instructor pilots rate pilots’ non-technical skills (NOTECHS). In preparation for a major inter-rater reliability study, this pretest analyzes the rating behavior of two instructor pilots during a full-flight simulator mission. Besides inter-rater reliability and test-retest reliability, the pilots’ self-ratings (n = 12) and the instructors’ point of view are analyzed. Results indicate a wide spread from poor to excellent reliability, depending on the rating dimension. Regarding inter-rater reliability, non-technical skills are rated more reliably under high-workload conditions than under low-workload conditions, and social aspects of non-technical skills are rated more reliably than cognitive aspects. Test-retest reliability is .6 on average, whereas the reliability between self-ratings and instructor ratings is .5 on average. Based on these findings, implications for the major inter-rater reliability study will be derived and incorporated.