PythonOER

An open collection of Python information, developed and curated as a resource for healthcare, data science, and Python. Scope creep abounds. Suggestions welcome. Thanks for visiting.


General References/Resources:

  • Forbes Best Free online data science courses in 2019 - Link

  • Data Camp - Link

    • Python plus many other languages

  • Data Science Plus - Link

    • Impressive review of Stats Methods (Linear regression)

  • Kaggle - Kaggle.com

    • Data science community with full and short courses, public data sets, public examples of data projects, and competitions; a great resource for introductory learners

  • SUNY OLI - Link

    • Open learning resources: Principles of Computation with Python, a course with content and quizzes that covers programming, data, encryption, and cellular automata. Free to view; a free account is required for full features

  • Artificial Intelligence in Healthcare, Stanford Certificate program - Link

    • Four-course series covering clinical data, fundamentals of ML for healthcare, evaluation of AI in healthcare, and a capstone

  • OurCodingClub - Python Tutorial Link

    • GitHub-hosted course with easy-to-follow tutorials, originally meant for ecosciences

  • Learning Python: basic level Link

    • Kevin Dunn's fantastic review of concepts. From the site: The topics listed below give an idea of what is covered. Within each notebook are a series of simple or more challenging problems. The problems are designed to build on the topics just learned, as well as the topics from earlier notebooks.

  • Lab in Cognition and Perception online textbook by Todd Gureckis, Brenden M. Lake and others Link

    • While thematically based on cognition/perception, has a hands-on approach to Jupyter and Python. Great content for concepts like Linear Regression and Python resources

Coursera

  • Coursera - Python Genomic Data Science Specialization - Link

    • Machine Learning with Python - Link

    • Data Analysis with Python - Link

    • Python for Everybody Specialization - Link

    • Applied Data Science with Python Specialization - Link

YouTube Content

  • Google Cloud Platform - Link

    • Searching "Google Cloud Tech" for "python" yields lots of information about Python as a language, as a data tool, and as a tool in technical (computer-science styled) applications

  • Python Tutorial for Beginners - Link

    • User "Telusko"; 100+ lessons

    • Download/install Python, using Python, getting started with Python, variables, functions, objects, lists in Python, and more by Navin Reddy

    • Videos run approximately 10-15 minutes each

  • Python Beginner Course - Link

    • Basics of syntax; some specific exercises like building a random color generator

  • “Data Professor” - Python search playlist - Link

    • Vast breadth and depth of data science topics from a single creator, including language-specific content, conceptual videos, and interviews with others in data science

  • "Healthcare Data Analytics" - Link

    • Offered as part of the "Health IT Curriculum Resources for Educators" from HealthIT.gov's Workforce Development program

  • Working with Medication Data - Link

Other Locations of Information:

  • OHSU - Wiki that reviews other resources - Main page, Article list

    • Mainly a review of clinical informatics (how a clinician informaticist thinks about systems and data)

  • “The Ultimate Data Science Prerequisite Learning List” - Link

    • List of lists

    • Broad definition and application examples of math concepts in data science and high level review of programming in data science

  • ML YouTube Courses - Link

    • Hosted on GitHub by DAIR.AI; includes a brief outline of each course below the full list

  • Best of ML with Python - Link

    • List of open source machine learning projects hosted on GitHub

  • Johns Hopkins Data Science Lab - https://jhudatascience.org/index.html

    • Great source of MOOCs for R, open textbooks (e.g., Leadership in Cancer Informatics), and other resources.

Statistics Materials:

  • YouTube “StatQuest” - Josh Starmer - Link

  • UTHealth - Biostatistics for the Clinician - Link

  • YouTube - Brandon Foltz - Link

  • BMJ Statistics - Link

  • Kenyon - Biology - Link

  • Health Knowledge UK, Public Health Textbook, statistical methods section - Link

  • StatR Analysis - Which test to choose - Link

  • Medium, Towards Data Science - “Everything You Need To Know about Hypothesis Testing — Part II” - Link

  • Open UMich Introduction to Statistics - Link

  • Seeing Theory, online book - Link

    • Fantastic review of statistics in an interactive online format

  • Statology - Link

    • Good review of tests; the "software tutorials" section is great for explaining how to conduct tests in Excel, Google Sheets, Python, R, etc.

Books:

  • Clinical Data as the Basic Staple of Health Learning: Creating and Protecting a Public Good: Workshop Summary - Link

    • Part of the "Learning Health System" series

  • Neural Data Science : A Primer with MATLAB® and Python - Link (Available via Pitt HSLS)

  • Python for Bioinformatics Link (Available via Pitt HSLS)

  • Python for Everybody - Link

    • Trinket Open book

    • Very easy to read

  • Hands-on exploratory data analysis with python : perform EDA techniques to understand, summarize, and investigate your data - Link (Pitt ULS)

  • Hands-On Machine Learning with Python and Scikit-Learn - Link (Available via Pitt HSLS, 2 hours of video)

  • Hands-On PySpark for Big Data Analysis - Link (Available via Pitt HSLS)

  • Become a Python Data Analyst: Perform Exploratory Data Analysis and Gain Insight into Scientific Computing Using Python - Link (Available via Pitt HSLS)

  • Learn Data Analysis with Python: Lessons in Coding - Link (Available via Pitt HSLS)

  • Hands-On Data Analysis with Pandas: Efficiently Perform Data Collection, Wrangling, Analysis, and Visualization Using Python - Link (Available via Pitt HSLS)

  • Python for data analysis : data wrangling with pandas, NumPy, and IPython - Link (Available via Pitt HSLS)

  • Hands-On Data Visualization: Interactive Storytelling from Spreadsheets to Code, by Jack Dougherty and Ilya Ilyankou - Open book available via GitHub - Link

  • Codeless Deep Learning with KNIME: Build, train, and deploy various deep neural network architectures using KNIME Analytics Platform - Link

Datasets:

  • Mendeley Data Sets - Link

    • Mendeley Data is a free and secure cloud-based communal repository where you can store your data, ensuring it is easy to share, access and cite, wherever you are.

  • Synthetic Cardiovascular Risk Dataset, Github hosted - Link

    • CVD risk data

  • National Sleep Research Resource, Sleep data sets - Link

  • Teaching Statistics in the Health Sciences - Link

    • A great repository that pairs data sets (mostly CSV) with the papers that use them

  • UC Irvine Machine Learning Repository - Link

    • Popular reference for machine learning data sets

  • Institute for Social Research at University of Michigan - Link

  • Pediatric Intensive Care Data - Link

    • Children's Hospital Zhejiang University School of Medicine

    • Github example of prediction - Link

  • NYU

    • Health Science Library pre-publication data sets - Link

    • Curated set of public data sets - Link

  • Berkeley Library, University of California, Health Statistics & Data: Datasets/Raw Data - Link

  • EMRBots - https://github.com/kartoun/emrbots

    • Experimental artificially generated electronic medical records (EMRs), Wiki articles

  • Harvard Dataverse, Med/Health/Life Science tag - Link

  • Synthetic EMR Data Set - https://synthea.mitre.org/ ; MITRE SyntheticMass

    • SyntheticMass contains realistic but fictional residents of the state of Massachusetts. The synthetic population aims to statistically mirror the real population in terms of demographics, disease burden, vaccinations, medical visits, and social determinants.

  • NIMH Data Archive - https://nda.nih.gov/

    • National Institute of Mental Health Data Archive (NDA) is a single infrastructure that was initially created through the integration of a set of research data repositories

    • Summary information on the data shared in NDA is available in the NDA Query Tool without the need for an NDA user account. To request access to record-level human subject data, you must submit a Data Access Request.

  • Integrated Public Use Microdata Series - Link

    • IPUMS International is dedicated to collecting and distributing census microdata from around the world.

    • Harmonized International Census Data for Social Science and Health Research

  • Clinical Practice Research Datalink (CPRD) - Link

    • Proprietary, Limited Access

    • Clinical Practice Research Datalink (CPRD) is a real-world research service supporting retrospective and prospective public health and clinical studies. CPRD research data services are delivered by the Medicines and Healthcare products Regulatory Agency with support from the National Institute for Health and Care Research (NIHR), as part of the Department of Health and Social Care. (United Kingdom)

  • Medical Imaging, Osteoarthritis Initiative (OAI) - https://nda.nih.gov/oai/

    • This website contains the permanent archive of the clinical data, patient reported outcomes, biospecimen analyses, quantitative image analyses, radiographs (X-Rays) and magnetic resonance images (MRIs) acquired during this study. There are longitudinal assessments and measurements from 4,796 subjects, with data from over 431,000 clinical and imaging visits, and almost 26,626,000 images in this archive. More than 400 research manuscripts have already been generated based on this data.

  • NHS, Business Services Authority, Prescribing data - Link

    • We provide prescribing, dispensing and organization data to help NHS stakeholders track trends and to inform decisions. Using a wide range of information based on prescribing and dispensing, we create reports specific to user needs and requirements.

  • Opendata NHS, Scotland - Link

    • Prescriptions in the Community

    • The Scottish Health and Social Care open data platform gives access to statistics and reference data for information and re-use.

  • AHRQ, Synthetic Healthcare Database for Research (SyH-DR) - Link

    • The Synthetic Healthcare Database for Research (SyH-DR) is an all-payer, nationally representative claims database. The database consists of a sample of inpatient, outpatient, and prescription drug claims, including utilization, payment, and enrollment data, for people insured by Medicare, Medicaid, or commercial health insurance in 2016. AHRQ created SyH-DR, in part, as a resource to facilitate improvements to price and quality transparency in healthcare.

  • Synapse - Data repository for publication data. Includes a subsection for digital health and biomarkers

  • FDA Github

    • Great starting point for learning about the variety of information available from the FDA via Github. Includes documentation for FDA APIs.

  • Mimi Labs - Data Catalog, great references from a great group
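Many of the data sets above are distributed as CSV files. As a minimal sketch, assuming nothing beyond the Python standard library, the snippet below reads CSV content and averages one numeric column. The sample rows, the column names (`patient_id`, `age`, `sbp`), and the helper `column_mean` are made up for illustration and do not come from any of the listed data sets.

```python
import csv
import io
import statistics

# Hypothetical CSV content used only for illustration. In practice,
# open a downloaded file instead, e.g. open("my_dataset.csv", newline="").
SAMPLE_CSV = """patient_id,age,sbp
1,54,132
2,61,145
3,47,118
"""


def column_mean(csv_file, column):
    """Return the mean of a numeric column, skipping blank cells."""
    reader = csv.DictReader(csv_file)
    values = [float(row[column]) for row in reader if row[column].strip()]
    return statistics.mean(values)


mean_age = column_mean(io.StringIO(SAMPLE_CSV), "age")
print(f"Mean age: {mean_age:.1f}")
```

For larger files or more involved analysis, a library such as pandas is the more common choice; the standard library version is shown here only because it runs without any installation.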

Fun Reads / Videos:

  • Machine Learning in Medicine - Link

    • Virtual Seminar series from Carnegie Mellon University

  • O’Reilly Training - YouTube playlist Link

  • Bringing AI to the Underserved Billions - Link (TED Talk, 12 minutes)

  • How to keep human bias out of AI - Link (TED Talk, 12 minutes)

  • ONC Overview of HealthCare Data Analytics - Link (20 minutes)

  • Data Science in Healthcare, PyData NYC 2018 - Link

  • BD2K - Exploratory Data Analysis - Link (60 minutes)

  • University of Virginia, “Exploratory Data Analysis” - Link (20 minutes)

  • Harvard Business Review

    • Articles tagged with “Data” - Link

    • “Use This Framework to Predict the Success of Your Big Data Project” - Link

    • Building a data science team - Link

    • When Machine Learning Goes Off the Rails - Link

    • “The Kinds of Data Scientist” - Link

  • Python for Industry Pharmaceuticals and Healthcare - Link (4 minutes)

  • Python vs R vs SAS, Simplilearn - Link (20 minutes)

  • Jill Cates - How to Build a Clinical Diagnostic Model in Python - PyCon 2019 Link (25 minutes), Great video

    • “Feature Engineering of Electronic Medical Records”, Medium Article - Link

  • Machine Learning Crash Course (Anaconda, 60 min) - Link

  • What is Machine Learning (Google Cloud Platform, 5 min) - Link

  • Machine learning without code in a browser (Google Cloud Platform, 10 min) - Link

  • All My Pharmacy Students Learn to Code - Link

    • Article by David Berkowitz about the value of coding skills for clinical students

  • Python: Go From Rookie To Rockstar, by Abhishek Verma, Nov-2021 - Link

    • Simple article that covers many basics of Python

  • Data Science Solutions for Digital Healthcare - Link

    • Collection of low/no code workflows built with the KNIME Platform. Examples: vancomycin dosing in obesity, predicting patient glucose levels, and natural language processing for disease tagging in literature.

  • Data Literacy for the Busy Librarian - Link

    • 10 videos from the National Library of Medicine (shout out to Nancy Shin from UWashington) that cover the data life cycle, documentation, standards, SNOMED, RxNorm, UMLS, data security, data sharing, and visualizations. Videos are 5-20 minutes, easy to understand, and great introductions.

  • An introduction to Python for R Users - Link

    • Blog post with a walkthrough of basics such as loading libraries.

Learning through Application / Cases:

Example applications or tests of knowledge/skills

  • Public vs. Private payer data sets Link

    • Compares patient records in a sample data set from public vs. private payers

    • Considerations of medication and classification of opioids

    • Created by Dominic DiSanto as a part of University of Pittsburgh Office of the Provost Open Education Resources Grants

  • No Code Machine Learning, Google Creative Lab

    • "Experiments with Google" is a collection of open, accessible applications of machine learning, artificial intelligence, and development cycles related to various data.

    • Machine learning without code in the browser (Link YouTube, Google, 10 minutes) - Helpful overview/walkthrough of the website and how it correlates to the steps in machine learning

    • "Experiment with Google" lab: Teachable Machine (Link) - Google Creative Lab, no coding required, this launches the "webcam"-based model (others include Audio based)

Tools:

Literature

  • Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) - https://www.tripod-statement.org/

    • TRIPOD Checklist

    • Method to report study results with a TRIPOD checklist

Literature:

Journals, Collections

  • Email/Updates

    • PLOS, arXiv, UConn Center for mHealth and Social Media (Mailing List)

    • Method to learn via updates and inspire further investigation

  • Nature Journal - Scientific Data library - Link

  • “Acquiring and Using Electronic Health Record Data” - Link

    • From the NIH Pragmatic Trial Collaborative - RethinkingClinicalTrials

Articles

  • How to Read Articles That Use Machine Learning Users’ Guides to the Medical Literature - Link

    • Helpful review of concepts related to machine learning in clinical contexts

  • A Machine Learning Approach for the Detection and Characterization of Illicit Drug Dealers on Instagram: Model Evaluation Study - Link - Python for language analysis

  • A validation of machine learning-based risk scores in the prehospital setting - Link

  • CHIME: COVID-19 Hospital Impact Model for Epidemics - Link

  • A machine learning approach predicts future risk to suicidal ideation from social media data - Link

  • “A dataset quantifying polypharmacy in the United States” - Link

  • “Coding Errors in Study of Meta-analyses With Falsified Data in the Results” - Link

  • “Developing a Natural Language Processing tool to identify perinatal self-harm in electronic healthcare records” - Link

  • “Artificial intelligence-assisted clinical decision support for childhood asthma management: A randomized clinical trial” - Link

  • Prescriptive analytics for reducing 30-day hospital readmissions after general surgery - Link

    • Data from national group, available if a member (UPMC)

  • Forecasting outbound student mobility: A machine learning approach - Link

    • Data from Taiwan

  • Use of social media big data as a novel HIV surveillance tool in South Africa - Link

    • Consider recreating the Python web-scraping code as an exercise (library: tweepy)

  • Forecasting seasonal influenza-like illness in South Korea after 2 and 30 weeks using Google Trends and influenza data from Argentina - Link

    • Interesting methods; the analysis could be recreated, but navigation is in Korean

  • Characterizing electronic health record usage patterns of inpatient medicine residents using event log data - Link

    • Has a GitHub link, but the original link in the article is not working; a general case for the use of Python

  • Deep neural network models for identifying incident dementia using claims and EHR datasets - Link

    • Very technical but relevant outline, Pitt has access to Optum

  • Subtypes in patients with opioid misuse: A prognostic enrichment strategy using electronic health record data in hospitalized patients - Link

    • Focus on latent class analysis (LCA); not much relevant code or ML information, but does discuss NLP

  • Predicting adverse drug reactions of combined medication from heterogeneous pharmacologic databases - Link

    • Available data, implemented SVM and k-NN using sklearn (with downloadable code). Would just require some re-organizing of materials

  • Performance and clinical utility of supervised machine-learning approaches in detecting familial hypercholesterolaemia in primary care - Link

    • Article with a link to R code on GitHub

  • Personalized prediction of early childhood asthma persistence: A machine learning approach - Link

    • GitHub link to Python code; the article also gives some pseudocode and sample CSV data files

  • Machine-learning-based prediction models for high-need high-cost patients using nationwide clinical and claims data - Link

    • UCLA+Japan, No data/code in the article

  • Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19 - Link

    • Great figure explaining the process

  • Teaching data science fundamentals through realistic synthetic clinical cardiovascular data - Link

    • Interesting review of how to teach data science

  • Applications of machine learning to undifferentiated chest pain in the emergency department: A systematic review - Link

    • Relevant conclusion - Machine learning can be better than clinicians but is rarely incorporated into practice

  • Beyond performance metrics: modeling outcomes and cost for clinical machine learning - Link

    • Editorial

  • Clinical impact and quality of randomized controlled trials involving interventions evaluating artificial intelligence prediction tools: a systematic review - Link

  • Using machine learning to study the effect of medication adherence in Opioid Use Disorder - Link

    • Comparison of machine learning approaches (XGBoost, Logistic Regression) for outcomes

  • Adverse drug event detection using natural language processing: A scoping review of supervised learning methods - Link

    • Discussion of how NLP concepts apply to detection in clinical notes/narratives

  • Long-term Effect of Machine Learning-Triggered Behavioral Nudges on Serious Illness Conversations and End-of-Life Outcomes Among Patients With Cancer: A Randomized Clinical Trial - Link

    • Interesting application of machine learning for behavioral nudges of prompting serious illness conversations

  • Predicting physician departure with machine learning on EHR use patterns: A longitudinal cohort from a large multi-specialty ambulatory practice Link

    • Uses XGBoost for prediction, with Shapley Additive Explanations (SHAP) used for feature contribution. The neat component is the granular detail (though without the data) from the cleaning process available on GitHub

  • Machine learning to improve frequent emergency department use prediction: a retrospective cohort study Link

    • Tested models against each other: Gradient boosting machines (GBM); Naïve Bayes (NB); Neural networks (NN); Random forests (RF). Similar to other studies, no model clearly outperformed the others.

  • Explainable Data-Driven Hypertension Identification Using Inpatient EMR Clinical Notes Link

    • Interesting comparison of approaches to classifying hypertension (e.g. ICD vs. Measurements). Code available via Github

  • Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports Link

    • Review of a natural language processing approach to free-text radiology reports. The GitHub repo has a great sample file that describes the search process (no data available though), iteration (great example of how to run multiple tests), and viewing results.

  • Evidence from ClinicalTrials.gov on the growth of Digital Health Technologies in neurology trials Link

    • Content is of interest for digital health, but the really neat part is the open sharing of the notes. The GitHub repository shares the data and the Python code related to the data visualization.

  • Predictive models in emergency medicine and their missing data strategies: a systematic review Link

    • Good review for missing data in healthcare, good figure for comparing dropping vs. Mean/Mode vs. Imputation

  • Machine learning in medicine: Performance calculation of dementia prediction by support vector machines (SVM) Link

    • Application of SVM for predicting Dementia, data available via Mendeley Data for 149 observations (non-binary outcomes: Non-Demented, Demented, Converted)

  • Classification of lapses in smokers attempting to stop: A supervised machine learning approach using data from a popular smoking cessation smartphone app Link

    • Okay article for reviewing use of classification across Random Forest, Support Vector Machine, Penalized Logistic Regression, and Extreme Gradient Boosting.
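The missing-data strategies compared in the emergency medicine review listed above (dropping records vs. mean/mode fill) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the method of any listed article; the records, field names, and helper functions are hypothetical, and real projects would more likely use pandas or scikit-learn, which also offer model-based imputation.

```python
import statistics

# Hypothetical records for illustration; None marks a missing value.
records = [
    {"age": 40, "smoker": "no"},
    {"age": None, "smoker": "yes"},
    {"age": 60, "smoker": "no"},
    {"age": 50, "smoker": None},
]


def drop_incomplete(rows):
    """Strategy 1: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_mean_mode(rows):
    """Strategy 2: fill numeric gaps with the mean, categorical gaps with the mode."""
    ages = [r["age"] for r in rows if r["age"] is not None]
    smokers = [r["smoker"] for r in rows if r["smoker"] is not None]
    mean_age = statistics.mean(ages)
    mode_smoker = statistics.mode(smokers)
    return [
        {
            "age": r["age"] if r["age"] is not None else mean_age,
            "smoker": r["smoker"] if r["smoker"] is not None else mode_smoker,
        }
        for r in rows
    ]


complete = drop_incomplete(records)
filled = fill_mean_mode(records)
print(len(complete), "of", len(records), "rows survive dropping")
print("Filled age:", filled[1]["age"], "| filled smoker:", filled[3]["smoker"])
```

Dropping is simple but discards half of this toy data set, while mean/mode fill keeps every row at the cost of shrinking the variability of the filled columns.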

Edit Notes - This is a living list with updates and edits, last updated: October 2024
