Google releases the HEAL framework: 4 steps to evaluate whether medical AI tools are fair


If you think of staying healthy as a race, not everyone starts from the same line. Some people finish the race smoothly, and some who stumble get help right away, but others face additional barriers because of economic circumstances, where they live, education level, race, or other factors.

“Health equity” means that everyone should have equal access to health and medical resources, so that they can run the race with confidence and achieve optimal health. Unequal treatment in disease prevention, diagnosis, and treatment for certain groups, such as racial minorities, people of low socioeconomic status, or individuals with limited access to health care, can significantly affect their quality of life and chances of survival. Increasing attention to health equity should therefore become a global consensus, so that the root causes of inequality can be addressed.

Machine learning and deep learning have already made their mark in the medical and health field, moving out of the laboratory and onto the clinical front line. But while marveling at AI's capabilities, people should ask a harder question: will deploying these emerging technologies exacerbate the unequal distribution of health resources?

Health equity assessment diagram

  • Light blue bars represent pre-existing health outcomes
  • Dark blue bars illustrate the impact of an intervention on those pre-existing health outcomes

To this end, the Google team developed the HEAL (Health Equity Assessment of machine Learning performance) framework, which quantitatively evaluates whether machine-learning-based health solutions perform “fairly”. Through this approach, the research team seeks to ensure that emerging health technologies effectively reduce health inequalities rather than inadvertently exacerbating them.

The HEAL framework: 4 steps to evaluate the fairness of a dermatology AI tool

The HEAL framework consists of 4 steps:

  1. Identify factors associated with health inequities and define performance metrics for the AI tool
  2. Identify and quantify pre-existing health disparities
  3. Measure the performance of the AI tool
  4. Estimate the likelihood that the AI tool prioritizes pre-existing health disparities

The HEAL framework, illustrated with an AI tool for diagnosing and treating dermatological conditions

Step 1: Identify factors associated with health inequities in dermatology and define metrics to evaluate AI tool performance

The researchers reviewed the literature and, taking data availability into account, selected the following factors: age, gender, race/ethnicity, and Fitzpatrick skin type (FST).

FST is a system for classifying human skin based on its response to ultraviolet (UV) radiation, specifically sunburn and tanning. Ranging from FST I to FST VI, each type represents a different level of melanin production in the skin, eyes, and hair, as well as sensitivity to ultraviolet light.

In addition, the researchers chose top-3 agreement as the metric for evaluating the AI tool's performance: the proportion of cases in which at least one of the top 3 conditions suggested by the AI matches the reference diagnosis from a panel of dermatology experts.
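As a rough sketch, top-3 agreement can be computed directly from case-level results. The function and example data below are hypothetical illustrations, not the paper's implementation:

```python
def top3_agreement(predictions, reference):
    """Fraction of cases where the reference diagnosis appears among
    the AI tool's top-3 ranked conditions.

    predictions: list of ranked condition lists (best guess first)
    reference:   list of reference diagnoses from the expert panel
    """
    hits = sum(ref in preds[:3] for preds, ref in zip(predictions, reference))
    return hits / len(reference)

# Hypothetical cases: each prediction is a ranked list of conditions
preds = [["eczema", "psoriasis", "tinea"],
         ["acne", "rosacea", "folliculitis"],
         ["melanoma", "nevus", "seborrheic keratosis"],
         ["cyst", "lipoma", "abscess"]]
refs = ["psoriasis", "rosacea", "basal cell carcinoma", "cyst"]
print(top3_agreement(preds, refs))  # → 0.75 (3 of 4 cases hit)
```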

Step 2: Identify existing “health disparities” in dermatology

Health disparity indicators are specific measures used to quantify and describe health status inequalities between different groups. These groups are differentiated based on race, economic status, geographic location, gender, age, disability, or other social determinants.

Here are some common indicators of health disparities:

Disability-adjusted life years (DALYs): the number of years of healthy life lost due to illness, disability, or premature death. The DALY is a composite indicator, defined as the sum of years of life lost (YLLs) and years lived with disability (YLDs).

Years of life lost (YLLs): the expected number of healthy years lost due to premature death.
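The relationship DALY = YLL + YLD can be sketched with the standard GBD formulas (YLLs as deaths times remaining life expectancy at the age of death, YLDs as prevalent cases times a disability weight). The figures below are made up purely for illustration:

```python
def yll(deaths, life_expectancy_at_death):
    """Years of life lost: deaths times standard remaining life expectancy."""
    return deaths * life_expectancy_at_death

def yld(prevalent_cases, disability_weight):
    """Years lived with disability: prevalent cases times disability weight."""
    return prevalent_cases * disability_weight

def daly(ylls, ylds):
    """DALY is the sum of YLLs and YLDs."""
    return ylls + ylds

# Hypothetical subgroup: 10 deaths losing 30 years each, plus 80
# prevalent cases with a disability weight of 0.25
print(daly(yll(10, 30.0), yld(80, 0.25)))  # → 320.0
```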

At the same time, the researchers conducted a sub-analysis on skin cancer to understand how the AI tool's performance changes under high-risk conditions. The study used the Global Burden of Disease (GBD) ‘non-melanoma skin cancer’ and ‘malignant cutaneous melanoma’ categories to estimate health outcomes for cancer conditions, and the ‘cutaneous and subcutaneous diseases’ category for non-cancer conditions.

Step 3: Measure the performance of the AI tool

Top-3 agreement was measured by comparing the AI's ranked predicted conditions to the reference diagnoses on the evaluation dataset, with subgroups stratified by age, sex, race/ethnicity, and estimated FST (eFST).
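A minimal sketch of this stratified measurement, assuming case-level results have already been labeled with a subgroup and a top-3 hit/miss flag (all names and numbers here are hypothetical):

```python
from collections import defaultdict

def subgroup_top3_agreement(cases):
    """Top-3 agreement per subgroup.

    cases: iterable of (subgroup_label, top3_hit) pairs, where top3_hit
    is 1 if the reference diagnosis was in the AI's top 3, else 0.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for group, hit in cases:
        hits[group] += hit
        totals[group] += 1
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical case-level results stratified by age group
cases = [("18-29", 1), ("18-29", 1), ("30-49", 1),
         ("30-49", 0), ("50+", 0), ("50+", 1)]
print(subgroup_top3_agreement(cases))
# → {'18-29': 1.0, '30-49': 0.5, '50+': 0.5}
```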

Step 4: Examine how well the AI tool's performance accounts for health disparities

The HEAL metric of the dermatology AI tool is quantified as follows:

For each subgroup, two inputs are required: a quantitative measure of pre-existing health disparities, and the AI tool's performance on that subgroup.

Compute the correlation R between pre-existing health outcomes (where higher values indicate a greater disease burden) and AI performance across all subgroups for a given inequity factor (such as race/ethnicity). The larger and more positive R is, the more the tool's performance favors the subgroups carrying the greatest health burden.

The HEAL metric of the AI tool is then defined as p(R > 0): the likelihood that the AI prioritizes pre-existing health disparities, estimated from the distribution of R over 9,999 bootstrap samples. A HEAL metric above 50% indicates a higher probability of equitable performance; below 50%, a lower probability.
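The steps above can be sketched as a bootstrap estimate of p(R > 0). This is a simplified subgroup-level resampling illustration, not the paper's exact procedure; all names and data are hypothetical:

```python
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # constant input: treat as no correlation
    return cov / (sx * sy)

def heal_metric(health_burden, performance, n_boot=9999, seed=0):
    """Estimate p(R > 0): the probability that AI performance is higher
    for subgroups carrying a greater pre-existing health burden.

    health_burden: per-subgroup disparity measure (e.g. DALYs; higher = worse)
    performance:   per-subgroup AI performance (e.g. top-3 agreement)
    """
    rng = random.Random(seed)
    n = len(health_burden)
    positive = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subgroups
        r = pearson_r([health_burden[i] for i in idx],
                      [performance[i] for i in idx])
        positive += r > 0
    return positive / n_boot

# Hypothetical: 4 subgroups whose performance tracks their burden,
# so the metric comes out close to 1.0
burden = [100.0, 250.0, 400.0, 550.0]
perf = [0.70, 0.78, 0.85, 0.90]
print(heal_metric(burden, perf))
```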

Dermatology AI tool evaluation: Some subgroups still need improvement

Race/ethnicity: The HEAL metric is 80.5%, indicating a high likelihood that the tool prioritizes the health disparities present across these subgroups.

Gender: The HEAL metric is 92.1%, indicating a high likelihood that the tool prioritizes health disparities across gender subgroups.

Age: The HEAL metric is 0.0%, indicating a low likelihood of prioritizing health disparities across age groups. Restricted to cancer conditions, the HEAL metric rises to 73.8%, while for non-cancer conditions it remains 0.0%.

HEAL metrics for different age groups, with and without cancer conditions

The researchers also ran logistic regression analyses, which showed that age and certain dermatological conditions (such as basal cell carcinoma and squamous cell carcinoma) had a significant effect on AI performance, while the tool was less accurate for other conditions (such as cysts).

In addition, the researchers conducted an intersectional, extended HEAL analysis across age, sex, and race/ethnicity using stratified GBD health outcome measures, obtaining an overall HEAL metric of 17.0%. Focusing on the intersections ranked lowest in both health outcomes and AI performance, they identified subgroups for which the AI tool's performance needs improvement, including Hispanic women over 50, Black women over 50, White women over 50, White men ages 20-49, and Asian and Pacific Islander men ages 50 and older.

In other words, improving the performance of AI tools for these groups is critical to achieving health equity.

More than just health equity: A broader picture of fairness in AI

Clearly, significant health inequalities exist across racial/ethnic, gender, and age groups, and the rapid development of high-tech medicine can further skew the distribution of health resources. AI still has a long way to go in addressing these problems. It is also worth noting that inequities driven by technological progress appear in many aspects of life, such as the unequal access to information, online education, and digital services caused by the digital divide.

Jeff Dean, head of Google AI, has said that Google attaches great importance to AI fairness and has done substantial work on data, algorithms, communication analysis, model interpretability, research on cultural differences, and privacy protection for large models. For example:

In 2019, Google Cloud’s Responsible AI Product Review Committee and Responsible AI Transaction Review Committee suspended the development of credit-related artificial intelligence products to avoid aggravating algorithmic unfairness or bias. In 2021, the Advanced Technology Review Council reviewed research involving large language models and concluded that the work could proceed with caution, but that no model could be formally launched until a comprehensive review against Google's AI principles had been completed. And the Google DeepMind team has published work exploring how to integrate human values into AI systems, drawing on philosophical ideas to help AI support social fairness.

In the future, ensuring the fairness of AI technology will require intervention and governance from multiple angles, for example:

Fair data collection and processing: Ensure that training data covers diverse populations, including people of different genders, ages, races, cultures, and socioeconomic backgrounds, while avoiding biased data selection and preserving the representativeness and balance of the dataset.

Eliminate algorithmic bias: During the model design phase, proactively identify and eliminate algorithmic biases that may lead to unfair outcomes. This may involve careful selection of input features to the model, or the use of specific techniques to reduce or eliminate bias.

Fairness assessment: Perform fairness assessments both before and after model deployment, using various fairness metrics to evaluate the model’s impact on different groups and making necessary adjustments based on the results.

Continuous monitoring and iterative improvement: After an AI system is deployed, continuously monitor its performance in the real environment to promptly discover and resolve fairness issues that may arise. This may require regular model iterations to adapt to environmental changes and evolving social norms.
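The fairness-assessment point above can be made concrete with a simple pre-/post-deployment check: compare a performance metric across subgroups and flag large gaps. A minimal sketch with hypothetical group names and numbers:

```python
def subgroup_gap(metric_by_group):
    """Report the spread between the best- and worst-served subgroups
    for a given performance metric (e.g. per-group accuracy)."""
    worst = min(metric_by_group, key=metric_by_group.get)
    best = max(metric_by_group, key=metric_by_group.get)
    return {
        "worst_group": worst,
        "best_group": best,
        "gap": metric_by_group[best] - metric_by_group[worst],
    }

# Hypothetical per-subgroup accuracy from a pre-deployment audit
audit = {"group_a": 0.91, "group_b": 0.84, "group_c": 0.88}
report = subgroup_gap(audit)
print(round(report["gap"], 2))  # → 0.07 (flag for review if above a tolerance)
```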

With the development of AI technology, relevant ethical principles, laws and regulations will also be further improved, allowing AI technology to develop within a more equitable framework. There will also be an increased focus on diversity and inclusion. This requires taking into account the needs and characteristics of different groups in all aspects of data collection, algorithm design, and product development.

In the long run, the true meaning of AI changing lives should be to better serve people of different genders, ages, races, cultures, and socioeconomic backgrounds, and to reduce the inequities caused by applying the technology. As public awareness continues to grow, more people can participate in planning AI development and offer suggestions for its direction, ensuring that technological progress serves the overall interests of society.

The broad blueprint for AI fairness requires joint efforts across fields such as technology, society, and law. Advanced technology should not be allowed to become a driver of the “Matthew effect.”
