This issue considers social experiments in practice and how recent advances improve their value and potential applications. Experimental evaluations have stretched to address more diverse policy questions, no longer simply providing a treatment–control contrast but adding multiarm, multistage, and multidimensional (factorial) designs and analytic extensions to expose more about what works best for whom. Social experiments are also putting programs under the microscope when they are most ready for testing, enhancing the policy value of their findings.
Using an experimental design, researchers from our lab assess the impact of a 10-week coding program implemented by an education NGO (funded by Google.org). Outcomes such as spatial-temporal reasoning, problem solving, and planning skills are measured with developmental batteries and tasks.
Our lab, using cognitive tasks and measures, serves as an independent evaluator of an intervention (funded by İŞGYO) to boost creativity in 8- to 12-year-old children from disadvantaged neighborhoods, and aims to contribute to the dissemination of an improved, evidence-based version of the intervention.

Balci, F., Baykal, E., Goksun, T., Kisbu-Sakarya, Y., & Yantac, E. (in press). Improving creative thinking in elementary school children from disadvantaged families: A randomized trial. Journal of Creativity Research.

Randomization
This paper discusses ways of improving the quality of randomized field experiments.
This piece examines the benefits and costs of randomization, using the example of the Women Affirming Motherhood program, which provides prenatal and postnatal care to underserved women.
Using examples from the New Orleans Homeless Substance Abusers Program, this article discusses and documents the difficulties that may be faced when randomization is used and how to overcome them.
This article describes a method that uses the participants’ compliance to program instructions as a means of classifying participants and, thereby, obtains a randomized control group for a subset of participants. A large smoking intervention project is used to illustrate two variations of this method.
Attrition
In this paper, the authors discuss whether an attrition standard based on information from education research is appropriate for use in other contexts such as home visits. The paper also provides an example of how to assess whether the attrition standard for one systematic evidence review fits other systematic reviews, along with considerations for adopting or modifying the standard for alternative contexts.
This brief describes what attrition is, why it matters, and how it factors into the study ratings in the Home Visiting Evidence of Effectiveness (HomVEE) review.
This report examines sample loss in the population surveys with an eye toward advising Social Security officials on how to address it. Successive chapters document the growth in sample loss due to nonresponse and nonmatching; provide estimates of match bias and attrition bias; examine discontinuities between consecutive population survey panels in estimates of beneficiary characteristics as well as poverty rates for the broader population; and examine the comparative strengths of the population surveys in describing the economic well-being of the population in general and of elderly and lower-income persons in particular.
Sampling
In this paper, the author provides a framework for sample selection in the multiple-population case, including three compromise allocations. The methods are situated in an example, and the implications for the design of randomized evaluations more generally are discussed.
The article introduces a model-based, balanced-sampling framework for improving generalization, addressing whether the findings of an experiment generalize to a larger population. In this framework, units in the inference population are first divided into relatively homogeneous strata using cluster analysis, and the sample is then selected within strata using distance rankings.
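The stratify-then-rank idea can be illustrated with a short sketch. The following Python code is a hypothetical illustration using scikit-learn with made-up covariates, not the authors' software:

```python
# Sketch of model-based balanced sampling for generalization (assumes
# population covariates are available in a pandas DataFrame).
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def stratified_distance_sample(population, covariates, n_strata=5, per_stratum=4, seed=0):
    """Divide units into homogeneous strata, then rank units within each
    stratum by distance to the stratum centroid and take the closest ones."""
    X = StandardScaler().fit_transform(population[covariates])
    km = KMeans(n_clusters=n_strata, n_init=10, random_state=seed).fit(X)
    selected = []
    for s in range(n_strata):
        idx = np.where(km.labels_ == s)[0]
        dist = np.linalg.norm(X[idx] - km.cluster_centers_[s], axis=1)
        ranked = idx[np.argsort(dist)]             # most "typical" units first
        selected.append(population.iloc[ranked[:per_stratum]])
    return pd.concat(selected)

# Usage with hypothetical data: schools with poverty rate, enrollment, urbanicity.
rng = np.random.default_rng(1)
population = pd.DataFrame({
    "poverty_rate": rng.uniform(0, 1, 500),
    "enrollment": rng.integers(100, 2000, 500),
    "pct_urban": rng.uniform(0, 1, 500),
})
sample = stratified_distance_sample(population, ["poverty_rate", "enrollment", "pct_urban"])
print(sample.shape)
```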
This article investigates whether propensity score methods can help remove selection bias in studies with small treated groups and a large number of observed covariates. To confirm the usefulness of the simulation results for practice, the authors carry out an empirical evaluation with real data. The study provides insights for practice and directions for future research.
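As background, the basic propensity score workflow (estimate scores, then match on them) can be sketched as follows; this is a generic illustration with simulated data, not the authors' procedure:

```python
# Minimal propensity-score matching sketch: estimate scores with logistic
# regression, then match each treated unit to its nearest control on the
# logit of the estimated score.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def ps_match(df, treat_col, covariates):
    X, t = df[covariates].to_numpy(), df[treat_col].to_numpy()
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    logit = np.log(ps / (1 - ps))
    treated, control = np.where(t == 1)[0], np.where(t == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(logit[control].reshape(-1, 1))
    _, j = nn.kneighbors(logit[treated].reshape(-1, 1))
    return df.iloc[treated], df.iloc[control[j.ravel()]]

# Usage with simulated data in which treatment assignment depends on x1 and x2.
rng = np.random.default_rng(0)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x1 - 0.5 * x2))))
df = pd.DataFrame({"x1": x1, "x2": x2, "treat": t})
treated, matched_controls = ps_match(df, "treat", ["x1", "x2"])
print(treated[["x1", "x2"]].mean(), matched_controls[["x1", "x2"]].mean())
```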
This paper examines the effects of misclassification (a discrepancy between classifications based on two different sources, e.g., administrative data and reported values), specifically misclassification of race/ethnicity used in sampling stratification, on survey estimates. The authors suggest several potential improvements that could be made to address this misclassification problem.
The purpose of this paper is to present the survey design for an evaluation of a social program that provides health insurance for children. The authors present sampling examples for different designs and discuss difficulties in implementation, together with suggestions for overcoming them.
This paper explores the potential implications of recruitment source for response quality, using a web survey fielded in 2013 to compare data quality indicators across two recruitment platforms (Google and Facebook advertisements).
Effect size – Power – Sample size
This paper discusses the concept of statistical power and its application to psychological research.
In this paper, the authors suggest rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. For five commonly used experimental designs, they describe how to estimate statistical power and provide power analysis results based on a reasonable set of default parameter values. They also introduce a simple and flexible web-based power application to aid researchers in planning studies with samples of stimuli.
This paper introduces a flexible, free statistical power analysis program for the social, behavioral, and biomedical sciences and illustrates its use.
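For readers who script their analyses, the same kind of a priori calculation can be run in general-purpose software. A minimal sketch in Python using statsmodels (an illustrative assumption on our part, not the program described in the paper):

```python
# A priori power analysis for a two-group comparison: how many participants
# per group are needed to detect a medium effect (Cohen's d = 0.5) with
# 80% power at alpha = .05?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                    ratio=1.0, alternative="two-sided")
print(round(n_per_group))   # roughly 64 per group

# Achieved power for a fixed sample of 40 per group and the same effect size.
power = analysis.solve_power(effect_size=0.5, nobs1=40, alpha=0.05, ratio=1.0)
print(round(power, 2))      # roughly 0.60
```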
This paper provides simple guidelines for calculating efficient sample sizes in cluster randomized trials with unknown intraclass correlation (ICC) and varying cluster sizes.
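The core adjustment underlying such guidelines is the standard design effect for clustering, DEFF = 1 + (m - 1) x ICC for equal cluster sizes. A minimal Python sketch of this textbook calculation (not the paper's full procedure for unknown ICC and varying cluster sizes):

```python
# Sample size for a cluster randomized trial via the standard design effect.
# Equal cluster sizes are assumed; the paper's guidelines extend this to
# unknown ICC and varying cluster sizes.
import math
from scipy.stats import norm

def clusters_per_arm(d, icc, m, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_individual = 2 * (z / d) ** 2            # per-arm n if individuals were randomized
    deff = 1 + (m - 1) * icc                   # inflation due to clustering
    return math.ceil(n_individual * deff / m)  # clusters needed per arm

# Example: effect size d = 0.3, ICC = 0.05, 20 participants per cluster.
print(clusters_per_arm(d=0.3, icc=0.05, m=20))
```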
In this study from our lab, IEL, we compare the analysis of covariance (ANCOVA), difference score, and residual change score methods in testing the group effect for pretest–posttest data in terms of statistical power and Type I error rates using a Monte Carlo simulation.
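The logic of such a simulation can be sketched compactly. The following Python code is a simplified illustration with arbitrary parameter values, not the simulation conditions or code used in the study:

```python
# Simplified Monte Carlo comparison of ANCOVA vs. difference scores for
# pretest-posttest group designs (illustration only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_per_group, rho, effect, reps = 50, 0.6, 0.4, 1000
hits_ancova = hits_change = 0

for _ in range(reps):
    group = np.repeat([0, 1], n_per_group)
    pre = rng.normal(size=2 * n_per_group)
    post = rho * pre + np.sqrt(1 - rho**2) * rng.normal(size=2 * n_per_group) + effect * group
    df = pd.DataFrame({"group": group, "pre": pre, "post": post, "change": post - pre})
    hits_ancova += smf.ols("post ~ group + pre", df).fit().pvalues["group"] < 0.05
    hits_change += smf.ols("change ~ group", df).fit().pvalues["group"] < 0.05

print(f"ANCOVA power: {hits_ancova / reps:.2f}")            # typically the larger value
print(f"Difference-score power: {hits_change / reps:.2f}")
```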
Multistage design
The purpose of this study was to evaluate the Multi-Stage Model (MSM). The MSM proposes eight different stages distinguished by cognitions and behavior, as well as by concepts of habituation. The MSM stages were assessed in 835 rehabilitation patients with a questionnaire. Results, implications for further investigations, and stage-specific interventions are discussed.
The Multiphase Optimization Strategy (MOST) is an approach to systematically and efficiently developing a behavioral intervention using a sequence of experiments to prepare and optimize the intervention. The authors outline the design and results of each experiment conducted in the MOST-based process, then discuss the challenges faced and decisions made at each stage to move the treatment development and optimization process forward, using an intervention to promote physical activity after an acute coronary syndrome as an example.
Multi-armed design
This article examines why multi-arm designs were attempted, how they were structured, why public administrators cooperated, what various actors sought to learn, and how the researchers determined what strategies the different experimental arms actually came to represent. The article concludes that these designs provide convincing evidence and can be inserted into operating programs if the studies address questions that are of keen and immediate interest to state or local program administrators and researchers.
This paper’s key methodological question is: How do the estimators for the two-group design need to be adjusted for multi-armed trials? The critical insight is that in multi-armed trials where the goal is to identify the most effective treatments, the samples for each pairwise contrast are representative of the full set of randomized units, not just of themselves.
Staggered Introduction Designs
Multiple baseline designs (MBDs) have been suggested as alternatives to group-randomized trials (GRTs). The paper reviews the structural features of MBDs and considers their potential effectiveness in public health research, including the effect of staggered starts on statistical power.
This paper describes a new intervention, “mentalizing positive affect,” and evaluates its effect on positive affect, negative affect, Borderline Personality Disorder (BPD) severity, ego-resiliency, and quality of life during Mentalization Based Treatment (MBT) for BPD. In a single-case multiple-baseline design, 4 female BPD patients received 6 months of individual MBT, after which they were followed up for 2 months.
Program Logic model & Intervention
This paper presents a potential mechanism for evaluations of the impact of local human services delivery systems and describes the experiences of a large, urban United Way. The mechanism focuses on a tool called the logic model. A description of the local United Way planning and evaluation process and how this process was enhanced through the utilization of the logic model is provided.
This case study uses a logic model to plan and evaluate an intervention project targeting high-risk, middle-school, African-American youth for drug and crime prevention. The logic model is used to provide explanations and justifications for selecting various programmatic components and evaluation measures. It consists of five components: assumptions, program activities, immediate outcomes/evaluative activities, intermediate outcomes/objectives, and final outcomes/goals.
This paper describes the logic model process for developing an intervention intended to improve parents’ miscarriage experience in the emergency department. The six steps of the W. K. Kellogg Foundation (2004) theory logic model were used to 1) describe the problem; 2) conduct a needs assessment; and to identify 3) expected results, 4) influential factors, 5) intervention strategies, and 6) assumptions related to change strategies.
The use of the program logic model as an integrative framework for analysis is illustrated in a multimethod evaluation of Project TEAMS, a middle school curriculum delivery program. One of the anticipated outcomes, computer skills, is examined in detail to illustrate the utility of program logic models as an analysis framework.
This paper describes and illustrates, via a logic model, what processes work for adolescents in residential treatment facilities and how to make improvements, highlighting one Adolescent Therapeutic Residential Care program’s journey to develop and implement a working logic model.
Reporting & Interpretation
This chapter shares early experiences and emerging lessons from developing the Climate Investment Funds’ (CIF) Pilot Program for Climate Resilience (PPCR) participatory results-based monitoring and reporting system. The chapter also highlights opportunities to complement monitoring and reporting with evidence-based learning.
This study draws up a set of proposed guidelines for reporting evaluations based on observational methodology.
In this paper, the authors examine the state of journal article reporting standards as they apply to qualitative research and generate recommendations for standards that would be appropriate for a wide range of methods within the discipline of psychology. These standards describe what should be included in a research report to enable and facilitate the review process.
Validity
In this article, authors consider how a balanced sample strategic site selection method might be implemented in a welfare policy evaluation to improve external validity.
This study addresses validity issues; by analyzing the intersection of evaluation type and validity type, the authors explore the status of House’s standards of Truth, Beauty, and Justice in evaluation practice.
Causal Inference
This book provides guidelines and discussions about experimental and quasi-experimental designs for generalized causal inference.
This paper analyzes 12 recent within-study comparisons contrasting causal estimates from a randomized experiment with those from an observational study sharing the same treatment group. The aim is to test whether different causal estimates result when a counterfactual group is formed, either with or without random assignment, and when statistical adjustments for selection are made in the group from which random assignment is absent.
This chapter aims to introduce the reader to the topic of causal mechanisms and synthesize significant findings on this special issue.
This study describes methods to estimate causal direct and indirect effects and reports the results of a large Monte Carlo simulation study on the performance of the ordinary regression and modern causal mediation analysis methods, including a previously untested doubly robust sequential g-estimation method, when there are confounders of the mediator-to-outcome relation.
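For orientation, the ordinary-regression (product-of-coefficients) baseline against which such methods are compared can be sketched briefly. The following Python code is a generic illustration with simulated data, not the doubly robust sequential g-estimation method evaluated in the study:

```python
# Product-of-coefficients mediation with a percentile bootstrap CI
# (ordinary-regression baseline; data simulated for illustration).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
x = rng.binomial(1, 0.5, n)                  # randomized treatment
m = 0.5 * x + rng.normal(size=n)             # mediator
y = 0.4 * m + 0.2 * x + rng.normal(size=n)   # outcome

def indirect_effect(x, m, y):
    a = sm.OLS(m, sm.add_constant(x)).fit().params[1]                         # X -> M path
    b = sm.OLS(y, sm.add_constant(np.column_stack([m, x]))).fit().params[1]   # M -> Y path, given X
    return a * b

boot = [indirect_effect(x[i], m[i], y[i])
        for i in (rng.integers(0, n, n) for _ in range(2000))]
print("indirect effect:", round(indirect_effect(x, m, y), 3))
print("95% bootstrap CI:", np.percentile(boot, [2.5, 97.5]).round(3))
```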