Today was a better start, 8:30. And it’s easier to study in the rain
Carrying on with the AI papers
- Reporting on deep learning algorithms in healthcare. An interesting topic but pretty hard to understand. However, I re-read this morning and used some extra sources and it made sense. It is about how inconsistencies exist in the literature on how predicted results from deep learning are reported. It showed that the Esteva et al. paper didn’t report CIs for AUROC, nor accuracy, PPV, NPV, or F1 score. It then spoke about Bland-Altman plots, which I didn’t really know, but I found a 2020 paper by Ioannidis on the topic: “In 1986, Bland and Altman published an article on methods of measurement comparison. Their purpose was to discuss methods of comparing two measures of the same thing, with the goal of assessing the extent to which one method could be used in place of the other.” OK! I can see how that relates to AI. And “in clinical measurement comparison of a new measurement technique with an established one is often needed to see whether they agree sufficiently for the new to replace the old. Such investigations are often analyzed inappropriately, notably by using correlation coefficients. The use of correlation is misleading.” The plot is “the differences between two methods plotted against the mean of the two methods/values, and then using those data to derive the limits of agreement (LOA; = mean difference ± 2 standard deviations), between which 95% of the differences would be expected to fall.” B+A said “how far apart measurements can be without causing difficulties will be a question of judgment.
Ideally, it should be defined in advance to help in the interpretation of the method comparison and to choose the sample size.” However, “despite its simplicity, [Bland–Altman analysis] appears not to be completely understood by researchers, reviewers and editors of journals.” So going back to the original paper, it now makes sense when it says “in the case of using mean absolute error for evaluation, proportional bias cannot be detected, which is important when evaluating the agreement between predicted results”, as proportional bias means “that one method gives values that are higher (or lower) than those from the other by an amount that is proportional to the level of the measured variable.” Bland-Altman plots allow evaluation of fixed and proportional biases together with the presentation of limits of agreement (the confidence limits of the difference between the deep learning-predicted and the ground truth measurements). Root mean square error (RMSE) is an alternative to BA, and more sensitive to outliers. That was all about continuous outcomes; derm is usually binary (disease vs no disease). It says “few studies evaluate accuracy (i.e. total number of correct classifications divided by total number of cases), PPV, or NPV.” “Although AUROC provides a general classification of the deep learning-based classification performance, for further clinical information, a specific classification threshold must be chosen”. “Youden’s index is a balanced maximisation of sensitivity and specificity. When presented with imbalanced datasets, evaluation of an algorithm based solely on AUROC, sensitivity, and specificity has limitations and should be interpreted with care.”
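To make sure I actually understood the limits-of-agreement formula (bias ± 2 SD of the differences) and Youden’s index (sensitivity + specificity − 1), I sketched them in Python. The numbers are made up, nothing from the paper:

```python
import numpy as np

def limits_of_agreement(predicted, ground_truth):
    """Bland-Altman style summary: mean difference (bias) and
    limits of agreement = bias +/- 2 SD of the differences."""
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    diffs = predicted - ground_truth
    bias = diffs.mean()
    sd = diffs.std(ddof=1)  # sample SD of the differences
    return bias, bias - 2 * sd, bias + 2 * sd

def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1

# Toy example with invented values (deep-learning predictions vs ground truth)
pred = [10.2, 11.1, 9.8, 10.5, 12.0]
truth = [10.0, 11.4, 9.5, 10.9, 11.6]
bias, lower, upper = limits_of_agreement(pred, truth)
print(f"bias={bias:.2f}, LOA=({lower:.2f}, {upper:.2f})")
print(f"Youden J = {youden_index(0.85, 0.80):.2f}")
```

A proper Bland-Altman plot would then scatter the differences against the per-pair means and draw horizontal lines at the bias and the two LOA values; a trend in that scatter is the proportional bias the paper says MAE can’t detect.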
- An awakening in medicine: the partnership of humanity and intelligent machines. Can only use AI when we fully understand the disease. Need more data.
- Developing specific reporting guidelines for diagnostic accuracy studies assessing AI interventions: The STARD-AI Steering Group. Approximately 25% of all manuscript submissions now centre on the diagnostic accuracy of AI algorithms. Without reporting guidelines, key stakeholders are poorly placed to appraise quality and compare diagnostic accuracy between studies.
- The future of digital health with federated learning. Federated learning trains algorithms collaboratively without exchanging the data itself, i.e. the ML process occurs locally at each participating institution and only model characteristics are transferred. Sounds good, but I guess it’s not really needed for lesional photos and dermoscopy compared to, say, MRIs of the face, where the face can be reconstructed?
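To get my head around the mechanics, a toy sketch of the federated-averaging idea: each site takes a training step on its own private data, and only the updated weights travel back to be averaged. This is my own illustration with invented data, not anything from the paper:

```python
import numpy as np

def local_step(weights, X, y, lr=0.1):
    """One gradient step of linear regression at a single institution.
    The raw data (X, y) never leaves this function."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(weights, institutions):
    """Each site trains locally; only updated weights are sent back
    and averaged, weighted by local sample count (the FedAvg idea)."""
    updates, sizes = [], []
    for X, y in institutions:
        updates.append(local_step(weights.copy(), X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

# Two hypothetical institutions, each holding private data
rng = np.random.default_rng(0)
sites = [(rng.normal(size=(20, 2)), rng.normal(size=20)) for _ in range(2)]
w = np.zeros(2)
for _ in range(5):
    w = federated_round(w, sites)
print("global weights after 5 rounds:", w)
```

The point is structural: the `institutions` data stays inside each call, and the only thing crossing the boundary is the weight vector.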
- Best practices for authors of healthcare-related artificial intelligence manuscripts. Quite good. Lead author is a dermatologist and editor of nature digital medicine.
- The state of AI-based FDA-approved medical devices and algorithms. No dermatology ones.
- A governance model for the application of AI in health care. Fairly common stuff. Used a quote about needing to engage with patient representatives, clinical experts, and people with relevant AI, ethical, and legal expertise.
- Characterising the role of dermatologists in developing AI for assessment of skin cancer. In JAAD. Quite good. Doesn’t say dermatologists are going to be first against the wall when AI comes in. “A robust understanding of the clinical context is absolutely essential for effective implementation of these technologies”. “Performing ML will become increasingly routine as the next generation of machine learning products are used to develop models without any coding from the user”. “We observed a lack of standardization in the reporting of accuracy metrics, and we believed we were unable to assess potential distortion of model efficacy caused by overfitting or data leakage.”
- There are too many papers to count! Still going!
That photo was my lunch break
I attended St John’s, which was a real amateur-hour performance in terms of unmuted mics. The cases were interesting though.
And then I worked on my AI stuff. I’m going to look at transparency first and see what other people have done, and then have a go at making a CNN.