How will you analyse and report EDI characteristics?
In any research project you will have a main analysis which will be carried out on your research sample. Statistics on participant characteristics (demographics), e.g. age, sex, gender and ethnicity, are obtained from your sample to describe your target population. Your research sample should reflect the population to which you want to apply the research findings. This might not necessarily reflect the whole of the UK: for example, your disease area or social care need might be more common in older people, as with Chronic Obstructive Pulmonary Disease (COPD) or Parkinson’s Disease, so your sample would be older.
When it comes to research analysis, it’s very common to see participants’ demographic data reported only in ‘Table 1’ and not analysed further. This may be because questions of demographic differences are not considered central to the study, and it may also reflect some of the statistical debates related to undertaking subgroup analyses, and the potential misuse of these.
Within your sample, though, there may be effects that depend on demographics and that can be examined. We can test these formally through interaction effects in statistical analysis, but such tests are often underpowered because the sample size was not calculated with these analyses in mind. This means that minoritised groups typically become indistinguishable from the majority in statistical analysis, and the experiences, attitudes and/or results of majority groups conceal the nuances between groups. An example relates to domestic violence research and sexuality, where subgroup analyses indicate that bisexual women are much more likely to have experienced domestic violence than heterosexual women (Donovan and Barnes 2019).
We can, however, examine the results through a sensitivity analysis within a demographic grouping, whatever the number of people in it, to see whether estimates and their confidence intervals are at least comparable to those for the whole sample, without introducing formal testing which would be underpowered. These analyses should be interpreted with caution, seen as hypothesis-generating, and included in the Statistical Analysis Plan. If using large datasets, such as data from large cohort studies or routine electronic health or social care data, planned subgroup analyses that are sufficiently powered could be introduced if there is a valid rationale for using the demographic subgroup.
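The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended analysis method: it assumes a binary outcome, uses the Wilson score interval (our choice for the example, because it behaves reasonably in small groups), and the function names, field names and data are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Return (proportion, lower, upper) using the Wilson score interval."""
    if n <= 0:
        raise ValueError("group must contain at least one participant")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half_width, centre + half_width


def describe_by_group(records, group_key, outcome_key):
    """Estimate a binary outcome for the whole sample and for each subgroup."""
    outcomes_by_group = {}
    for record in records:
        outcomes_by_group.setdefault(record[group_key], []).append(record[outcome_key])
    all_outcomes = [record[outcome_key] for record in records]
    summary = {"whole sample": wilson_interval(sum(all_outcomes), len(all_outcomes))}
    for name, outcomes in outcomes_by_group.items():
        summary[name] = wilson_interval(sum(outcomes), len(outcomes))
    return summary


# Made-up illustrative data: 1 = outcome observed, 0 = not observed.
records = (
    [{"group": "X", "improved": 1}] * 12
    + [{"group": "X", "improved": 0}] * 8
    + [{"group": "Y", "improved": 1}] * 2
    + [{"group": "Y", "improved": 0}] * 2
)

for name, (estimate, lower, upper) in describe_by_group(records, "group", "improved").items():
    print(f"{name}: {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The small group's interval is much wider than the whole-sample interval, which is why such results are best read as descriptive and exploratory, with no p-value attached.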
As of 27 November 2024, the NIHR will require all applicants for domestic programme awards to detail how they will ensure inclusion is considered and built into the whole research lifecycle including how they are collecting demographic data (NIHR 2024). Differences within groups can be starker than differences between groups, however, and intersectionality (Crenshaw 2016) draws our attention to how being, for example, Black African and gay, or Pakistani, older and female, may lead to very different experiences of health and social care compared to studying those identities in isolation (Riggs and das Nair 2012).
With qualitative data analysis, it is important to reflect on your codes and themes as they emerge and consider whether these resonate with the different groups in your sample, or whether some speak more to particular groups than others. Acknowledging negative cases which contradict dominant findings is a key aspect of robust qualitative data analysis and writing up (Patton 1999), but it is also important to interrogate whether there are any patterns in the data that are pertinent to demographic characteristics or specific groups. For example, it might be that most participants spoke positively about their experiences of a new health or social care service, but what about those who were negative or ambivalent? Did they have any demographic characteristics that need to be acknowledged and that might highlight an intervention that is not sufficiently inclusive of that group, or the need for further research about that group’s needs? Particularly with thematic analysis, it is important not to be so focused on specific themes that you become detached from wider contextual factors such as participants’ age, sex, gender, ethnicity or sexual orientation and their related experiences of structural and health inequalities.
Have diverse perspectives been incorporated to support the interpretation of results?
It is also valuable to ensure not only that more than one person is involved in the coding and generation of themes, but also that these researchers occupy different social positions and therefore will interpret the data from a range of vantage points. Practising reflexivity (Probst 2015) is key to recognising how your own identity, lived experience and values might affect your analytical decisions and interpretations.
It is beneficial to involve your public contributors, who could bring diversity of lived experience and identities, in your data analysis. In some studies, public contributors could conduct some of the analysis; for example, identifying themes and coding qualitative interviews. At the very least, results should be presented to your public involvement group to verify that the narrative around the findings and the conclusions drawn are not influenced by the research team’s unconscious bias, but reflect the data being seen. Consideration should be given to how the results are presented to public contributors to ensure they are accessible, and that public contributors are able to use the information to engage in meaningful feedback and discussion.
How will you describe participants’ EDI characteristics?
People often describe other people in relation to themselves, and this can result in people who are often the objects of research being described in ways which may not accurately reflect their identities and experiences. When reporting participants’ demographic characteristics, what language will you use?  Who came up with these names/categories? Did the people you’re describing have a say in what they are called?
Terms to describe people’s identities are constantly evolving, meaning that words which may at one point have been acceptable or analytically convenient may now be considered offensive. The term ‘BAME’, standing for Black, Asian or Minority Ethnic, is increasingly being rejected by those to whom it is applied (Fakim et al. 2020; Inc Arts UK 2020). This is because of its tendency to clump together people primarily because they are ‘non-white’. This kind of conflation is sometimes done to gain statistical power by merging existing smaller groups into one big group, or to summarise results more concisely. However, the communities which ‘BAME’ refers to are far from homogeneous, and sub-categories of characteristics should, where possible, not be clumped together. Government guidance on how to collect, analyse and report ethnicity data is available and may be helpful for you to consider.
Disabled people or people with impairments may subscribe to the social model of disability which describes disabled people as being disabled by an inaccessible and discriminatory society. This is in contrast to the biomedical model of disability which focuses on disability being caused by a person’s impairment (Inclusion London 2022). The biomedical model continues to dominate medicine and medical research, but campaigners are advocating for society – including access to health and social care services and research – to be less disabling.
We should ask ourselves what we are hoping to determine when presenting demographic data or differences between demographic groups. Demographic data might be used to highlight:
- Differences in social determinants of health, e.g. institutional racism’s effect on health outcomes.
- Genetic differences contributing to health differences, e.g. the impact of different genes on blood pressure (although genetic differences are usually greater within race categories than between them (Egede 2006), so it might be better to use actual genetic information rather than race categories).
- Physiological differences that have different risk factors for diseases, e.g. the relationship between skin colour and skin cancer.
- Different attitudes to health care, e.g. some religious attitudes to divine protection from disease, or fatalism.
- Different behaviours that could lead to different health outcomes, e.g. different sexual practices within protected characteristics and risk of HIV infection.
- Differences in cultures rather than differences in abilities (e.g. the cultural model of Deafness).
- Variable outcomes for different demographic groups, e.g. health promotion literature may be less effective in improving nutritional balance for those in socio-economically deprived areas.
You should check with a diverse group of people with the characteristics being researched to determine how to name categories, whether these relate to ethnicity, gender, sexuality, disability or living with a health condition. However, you should also be prepared for challenges in arriving at a consensus: views among populations who share a characteristic can be varied, and what may be acceptable to some individuals may be considered offensive by others (see e.g. Vincent 2018 in relation to transgender and non-binary gender identities). Asking participants to self-identify might be the most empowering option, but there are ethical tensions when these data then need to be coded and aggregated for statistical analysis.
Maintaining anonymity
As well as the considerations above to prevent alienating people from participating in or engaging with research, we also need to take care to protect participants’ anonymity when presenting their demographic data. These risks are amplified when we have low numbers in certain subgroups and when we report participants’ intersecting identities. For example, if we quote from an interview with someone who identifies as a Black, disabled, lesbian woman, this might only describe one or two participants in the sample. Recognising intersectionality is important as different intersections of characteristics have different experiences (Crenshaw 2016, TEDTalk), but it is clear to see that if this data were made public, it could identify people in the data. This risk is heightened with studies of a small geographical area.
With quantitative data, a good rule of thumb is that if a table entry has fewer than five people in it, you should state ≤5 for the number and not the actual count. For a list of possible identifying variables see Hrynaszkiewicz et al. (2010). For good practice in sharing data see the UKRI guidance on best practice in the management of research data. With qualitative data, although detailed ('thick') description adds to the richness of qualitative reporting, careful anonymisation or removal of identifying details may be needed.
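For quantitative tables, the rule of thumb above could be applied along the following lines before a table is shared. This is a minimal sketch: the function name, the default threshold of five and the "≤5" label follow the rule as stated here, and your own organisation's disclosure control policy may differ.

```python
def suppress_small_counts(counts, threshold=5):
    """Return a display copy of a frequency table with small cells masked.

    `counts` maps a category label to the number of participants in it.
    Any cell with fewer than `threshold` people is shown as "≤<threshold>".
    """
    masked = {}
    for category, n in counts.items():
        if n < threshold:
            masked[category] = f"≤{threshold}"  # hide the exact small count
        else:
            masked[category] = str(n)
    return masked


# Hypothetical category labels and counts, for illustration only.
table = {"Group A": 12, "Group B": 3, "Group C": 5}
print(suppress_small_counts(table))
```

A masked cell can sometimes be recovered by subtracting the other cells from a published total, so row and column totals may need the same care.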
Key messages
- It is good practice to consult with groups of people concerning how you will label them in your research. Diverse patient and public involvement is essential to inform these decisions.
- It is important to build inclusion into quantitative and qualitative data analysis. With quantitative data, subgroup analysis should be considered for demographic characteristics where appropriate, but formal statistical testing should be avoided where the analysis would be underpowered.
- When reporting data which includes participants’ demographic data, ensure disclosure control measures are in place to prevent making individual participants identifiable.