Randomised Control Trials (RCTs) are the next big thing for measuring impact and evaluation in charities. They are portrayed as the gold standard - or so the propaganda goes. In reality, they represent a kind of evaluation fantasy: presenting the exceptional and atypical as if it were within the grasp of all charities.
Here are five key challenges I’ve identified:
1. Nobody mentions double-blind
The best trials of an intervention are those where neither the test subject nor the person administering the test knows who gets which intervention. This is to stop either the subject or the evaluator from adding their own bias to the observations. I have yet to see any mention of double-blinding in the charity context. So when people talk about RCTs being the gold standard, they are clearly overstating the case. At best, RCTs are the silver standard unless they are double-blind.
2. Is random feasible or ethical?
How can a charity create a randomised control trial ethically? What are the ethics of giving one group of beneficiaries nothing and the other group an intervention? If an intervention were based in a community, would that mean only half the community might get support? And how would they divide the group randomly?
At nfpSynergy, we did a five-year study into the impact of outdoor education on a south London school. In one year, the vast majority of students went on three outdoor residential visits, but about 15-20 pupils each year didn’t. It would be easy to see this as the control group, but of course, they weren’t random. Those who didn’t go stayed behind for financial, cultural, personality or domestic reasons - they weren’t like those who went. Getting a genuinely random control group is very hard.
3. Charity interventions are rarely single variables, and often hard to measure.
In the world of drug trials, a single variable is relatively easy to achieve. In the world of charities, a single variable is much harder to achieve, and deciding what to measure is equally difficult. Even in the education setting, comparing a different approach to teaching in class A and class B would require that the two classes had been chosen at random (e.g. not streamed on ability), and that the pupils in the classes didn’t talk to each other about what they were doing (as might happen in sex or sports education, or other topics).
More complex interventions such as in mental health would require both a control group (as opposed to a pre- and post-intervention study) and a method that allowed objective measurement of benefit.
Many charities rely on self-reporting of results. There is plenty of evidence of subjects in studies like this wanting to be better, or to please the researcher. And if the people who set up and run the interventions also do the testing, further bias is even more likely to creep in.
4. External researchers are usually needed.
For an RCT to be taken seriously, it probably needs to be carried out by independent researchers - not by people employed by the body carrying out the intervention. This is to ensure that those who run the interventions do not consciously or unconsciously bias the results. A researcher employed by the same organisation would have a strong conflict of interest if they knew their employer would get significant extra funding if the results of an RCT were positive. Indeed, even an external researcher would feel a degree of pressure to find positive results if they knew that this might result in extra evaluation work or even just a happy client. Nonetheless, independent research is the best (if not perfect) way to carry out an RCT. However, it does considerably increase the costs.
5. The sample size needs to be pretty big.
Imagine that you are carrying out an intervention that improves exam results from 20% of pupils getting an A* to 24%. Even if the test group and the control group each had 250 pupils in them, this result would fall well short of statistical significance at the conventional 5% level: each group would need more than 800 pupils for a difference of that size to register. The Warrior Programme RCT had under 30 in each group. To conduct an RCT of the size required while ensuring that the issue being tested was the only variable and the groups were truly random would be a mammoth undertaking.
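For anyone who wants to check the arithmetic on an example like this, here is a minimal sketch in Python. It assumes a standard two-proportion z-test with equal group sizes and a 5% two-sided threshold (z of 1.96); the function names are my own, purely for illustration. It only asks when an observed gap of this size would reach significance, and ignores statistical power, so a properly designed trial would need larger groups still.

```python
import math

def two_proportion_z_test(p_control, p_test, n_per_group):
    """Return (z, two-sided p-value) for two equal-sized groups."""
    # With equal group sizes the pooled proportion is a simple average
    pooled = (p_control + p_test) / 2
    std_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p_test - p_control) / std_error
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

def min_group_size(p_control, p_test, z_crit=1.96):
    """Smallest equal group size at which the observed gap reaches z_crit."""
    pooled = (p_control + p_test) / 2
    n = 2 * pooled * (1 - pooled) * z_crit ** 2 / (p_test - p_control) ** 2
    return math.ceil(n)

# The exam example: 20% versus 24% getting an A*, 250 pupils per group
z, p = two_proportion_z_test(0.20, 0.24, 250)
print(f"250 per group: z = {z:.2f}, p = {p:.2f}")
print("pupils needed per group:", min_group_size(0.20, 0.24))
```

Under these assumptions, 250 pupils per group gives a z value of about 1.08, some way short of 1.96, and the group size needed comes out at a little over 800.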
All these factors mean that carrying out a valid RCT for a charity or other non-profit is a pretty tall order. Statistical significance is another challenge. For a drug trial, it would be mandatory to show that any benefit of a drug is statistically significant because of the cost of the drug and the risk to human health. But if a new intervention (say a way of teaching) showed a benefit to the participants, with no additional costs and no potential downside, then is statistical significance important? If all other things are equal, then going with the intervention that appears to work better can be a legitimate approach, even if the results are not statistically significant.
RCTs and volunteering
The government is currently funding some RCTs on volunteering for the over 50s. The criteria for the trials demonstrate just some of the points I have been talking about. They want volunteering events which bring in around 1000 people (or at least 500+), who are then split into groups of 500 to have the different interventions inflicted on them. This type of trial is the tail wagging the dog in almost every sense. Very few volunteer programmes look to recruit 500+ volunteers ‘in a single day’ (to quote the criteria). Most volunteer programmes work on a drip, drip, drip of volunteers. A volunteering event of this size could only be run by a large organisation (so small organisations are penalised yet again). And even if these trials all deliver, how transferable will the results be to ordinary organisations recruiting ordinary volunteers?
Fundraisers have effectively used RCTs for many years.
The irony is that fundraisers have used RCTs for years, and never bragged about it. Any direct marketer worth their salary will be continuously testing different interventions on split tests or samples of their database. Many years ago, I helped the RSPCA test whether a donation prompt of £8, £10, £12 or £15 raised the most money (£8 generated the highest response rate, but around £12 the highest income). Perhaps the most bizarre test result I ever got was that a reply envelope with a window showing the reply address on the donation form generated a third more income than one with no window. Don’t ask me why.
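The random part of a split test is simple to do. Here is a rough sketch in Python of how a database might be dealt at random into one arm per ask amount. The function name, the seed and the donor IDs are all hypothetical, made up for illustration; this is not a description of how the RSPCA test was actually run.

```python
import random

def random_split(records, arm_names, seed=None):
    """Shuffle records and deal them into arms of near-equal size."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    arms = {name: [] for name in arm_names}
    # Deal the shuffled records round-robin so arm sizes differ by at most one
    for i, record in enumerate(shuffled):
        arms[arm_names[i % len(arm_names)]].append(record)
    return arms

# Hypothetical donor IDs, one arm per donation prompt
donor_ids = range(1, 1001)
arms = random_split(donor_ids, ["£8", "£10", "£12", "£15"], seed=42)
for name, members in arms.items():
    print(name, len(members))
```

Because every record has the same chance of landing in any arm, differences in response between the arms can be put down to the prompt rather than to who happened to receive it.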
In summary, my concern about the current vogue for RCTs is threefold:
1. There are few situations where a genuine RCT will work given all the necessary criteria I have spelt out. Why create scenarios purely for the benefit of having an RCT? Why have an evaluation standard that is applicable to very few of the interventions that charities make?
2. It makes evaluation even harder and more expensive, and it is outside the price bracket for small charities. It is, in effect, a way that makes it even harder for small charities to compete in a big charity world.
3. It may mean that good interventions don’t happen. If we set the bar for a successful intervention as being a statistically significant result in an RCT, then some successful, powerful interventions will not get over that bar - and that would be the greatest tragedy of all.