Introduction: To evaluate complex skills such as research, valid and reliable instruments must be constructed.
Purpose: To report the construction, validity, and reliability of an instrument to assess the critical reading of medical papers.
Methodology: An electronic search of published medical papers was performed. After several revisions, six research designs were selected: instrument validation, surveys, case-control studies, diagnostic tests, clinical trials, and cohort studies. A summary of each paper was written in Spanish, emphasizing methodological issues (design, instruments used, sample size, blinded evaluation, bias, choice of statistical methods, and others). Items were constructed from each summary and grouped under headings corresponding to these methodological issues and to the three indicators of critical reading explored: interpretation (recognizing hidden aspects), judgment (evaluating which methods are best), and proposal (suggesting better methods). Each item was to be answered true or false. An initial version of 157 items was constructed.
Validation: Five experts evaluated, in two independent rounds, whether the instrument had theoretical, construct, and content validity, and the changes they suggested were made in each round. The experts also answered the items as true or false; an item's answer was accepted only when at least four of the five experts (4/5 or 5/5) agreed on it. A pilot application with students was used for further adjustments. The final version of the instrument had 108 items: 36 for each indicator and 18 for each design, with half of the correct answers true and half false.
Application: The instrument was administered to three groups with different levels of experience in critical reading: G1 (n = 7), professors of medical specialties; G2 (n = 23), medical interns enrolled in a research course; and G3 (n = 24), medical students who had not taken any research course. Each item could be answered True, False, or Don't Know.
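The expert-acceptance rule and the item blueprint described under Validation can be expressed as a short Python sketch. It assumes each expert's answer is recorded as True or False and that items are stored as hypothetical (indicator, design, key) tuples; the even split of 6 items per indicator-by-design cell in the usage example is an assumption, since the abstract gives only the marginal totals (36 per indicator, 18 per design).

```python
from collections import Counter

INDICATORS = ("interpretation", "judgment", "proposal")
DESIGNS = ("instrument validation", "survey", "case-control",
           "diagnostic test", "clinical trial", "cohort")


def consensus_key(expert_answers, min_agreement=4):
    """Return the answer (True/False) chosen by at least `min_agreement`
    experts, or None if agreement is weaker (item rejected or revised)."""
    answer, votes = Counter(expert_answers).most_common(1)[0]
    return answer if votes >= min_agreement else None


def blueprint_counts(items):
    """Tally items per indicator, per design, and per keyed answer.

    `items` is a list of (indicator, design, key) tuples (hypothetical format).
    """
    return (Counter(ind for ind, _, _ in items),
            Counter(des for _, des, _ in items),
            Counter(key for _, _, key in items))


# Hypothetical balanced item bank: 6 items per indicator x design cell,
# alternating True/False keys (the per-cell split is an assumption).
items = [(ind, des, k % 2 == 0)
         for ind in INDICATORS for des in DESIGNS for k in range(6)]
per_ind, per_des, per_key = blueprint_counts(items)
print(len(items), set(per_ind.values()), set(per_des.values()), dict(per_key))
# 108 {36} {18} {True: 54, False: 54}
```

With five experts and binary answers a tie cannot occur, so `consensus_key` returns either a keyed answer backed by 4/5 or 5/5 agreement or None.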
Each correct answer (True or False) added one point, each incorrect answer subtracted one point, and Don't Know answers neither added nor subtracted points; this determined each participant's final score, and scores were expressed as group medians. Grading was carried out blind, by staff unrelated to the research, through an electronic system created specifically to minimize data-capture errors. Data were analyzed with SPSS version 15.
Statistics: Intra- and inter-rater agreement were obtained. Reliability was calculated with the Kuder-Richardson formula 20 (KR-20). The groups were compared with the Kruskal-Wallis test, and Spearman's correlation between school grade average and global score was calculated. The random level, i.e., the score explainable by chance alone, was also determined in the groups.
Results: The experts agreed on the validity of the instrument. Overall reliability was 0.75. Inter-rater agreement was 0.82 and intra-rater agreement was 0.80. The global medians were 62 for G1, 28 for G2, and 11 for G3. The calculated random level was 17. All differences in favor of G1 were significant (p ≤ 0.01) for the global score, the three indicators, and the six designs. The proportion of participants with scores explainable by chance was 0% in G1, 13% in G2, and 83% in G3.
Comments: The validity and reliability of the instrument are good for evaluating critical reading. The contrasting results among groups with different experience and the random-level values support its reliability. Instruments used to assess this complex skill should be valid and reliable.
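The scoring rule, the KR-20 coefficient, and a chance threshold can be sketched in Python. This is an illustration under stated assumptions, not the authors' procedure: responses are encoded as True, False, or None (Don't Know); KR-20 is computed on correct/not-correct item scores; and because the abstract does not say how the random level was calculated, `chance_threshold` uses a simple normal approximation for a pure guesser who never answers Don't Know. All function names are hypothetical.

```python
import math
from statistics import pvariance


def score(responses, key):
    """Score one examinee: +1 per correct answer, -1 per incorrect answer,
    0 for "Don't Know" (encoded here as None)."""
    total = 0
    for given, correct in zip(responses, key):
        if given is None:  # "Don't Know": adds or subtracts nothing
            continue
        total += 1 if given == correct else -1
    return total


def kr20(item_matrix):
    """Kuder-Richardson formula 20 on a 0/1 matrix (rows = examinees).

    Assumption: an item counts as 1 only if answered correctly; incorrect
    and "Don't Know" answers both count as 0."""
    n_people = len(item_matrix)
    k = len(item_matrix[0])
    var_total = pvariance([sum(row) for row in item_matrix])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n_people  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def chance_threshold(n_items, z=1.645):
    """One-sided upper bound for scores produced by blind guessing.

    Assumption: a guesser answers every item True/False with probability 0.5,
    so score = 2C - n with C ~ Binomial(n, 0.5): mean 0, SD sqrt(n)
    (normal approximation)."""
    return z * math.sqrt(n_items)


def share_at_chance(scores, threshold):
    """Proportion of examinees whose score does not exceed the threshold."""
    return sum(s <= threshold for s in scores) / len(scores)


key = [True, False, True, True]
print(score([True, None, False, True], key))  # 1  (+1, 0, -1, +1)
print(round(chance_threshold(108), 1))        # 17.1
```

For 108 items this approximation gives about 17 points. The Kruskal-Wallis comparison and the Spearman correlation are not reimplemented here; SciPy provides them as `scipy.stats.kruskal` and `scipy.stats.spearmanr`.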