Health Data Open Education Resources

PythonOER

An open resource for Python information, developed and curated as a resource related to healthcare, data science, and Python. Scope creep abounds. Suggestions welcome. Thanks for visiting.


General References/Resources:

    • List of lists: an approachable review of job titles, programs that teach data science, and online courses; owned by 2U (edX)

    • Great teaching and practice of fundamentals; lowers the barrier to entry and enables data-driven research

    • Pitt resource from Dr. Peter Brusilovsky

    • Defunct as of 2022

Online Courses Related to Python:

    • Python plus many other languages

    • Impressive review of statistical methods (e.g., linear regression)

    • Data science community with full and short courses, public data sets, public examples of data projects, and competitions; a great resource for introductory learners

    • Open learning resources: Principles of Computation with Python, a course with content and quizzes covering programming, data, encryption, and cellular automata; free to view, with a free account required for full features

    • Four-course series covering clinical data, fundamentals of ML for healthcare, evaluation of AI in healthcare, and a capstone

    • GitHub-hosted course with easy-to-follow tutorials, originally designed for ecosciences

    • Kevin Dunn's fantastic review of concepts. From the site: "The topics listed below give an idea of what is covered. Within each notebook are a series of simple or more challenging problems. The problems are designed to build on the topics just learned, as well as the topics from earlier notebooks."

    • While thematically based on cognition and perception, it takes a hands-on approach to Jupyter and Python. Great content for concepts like linear regression, plus Python resources

Coursera

YouTube Content

    • Searching "Google Cloud Tech" for "python" yields lots of information about Python as a language, as a data tool, and as a tool in technical (computer-science-style) applications

    • User "Telusko"; 100+ lessons

    • Download/install Python, using Python, getting started with Python, variables, functions, objects, lists in Python, and more, by Navin Reddy

    • Videos of approximately 10-15 minutes

    • Basics of syntax; some specific exercises, like building a random color generator

    • Vast array and breadth/depth of data science topics from a single user, including language-specific content, conceptual videos, and interviews with others in data science

    • Tools as a resource: "Health IT Curriculum Resources for Educators" from HealthIT.gov's Workflow Development program

Other Locations of Information:

    • Mainly a review of clinical informatics (how a clinical informaticist thinks about systems and data)

    • List of lists

    • Broad definitions and application examples of math concepts in data science, plus a high-level review of programming in data science

    • Hosted on GitHub by DAIR.AI; includes a brief outline of each course below the full list

    • List of open source machine learning projects hosted on GitHub

    • Great resource of MOOCs for R, open textbooks (Leadership in Cancer Informatics), and other resources

Statistics Materials:

    • Fantastic review of statistics in an interactive online format

    • Good review of tests, with a great "software tutorials" section explaining how to conduct tests in Excel, Google Sheets, Python, R, etc.
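As a companion to the test-selection resources above, here is a minimal standard-library sketch of Welch's two-sample t statistic (the unequal-variance t-test). The two samples are hypothetical. Turning t into a p-value needs the t distribution, for example `scipy.stats.ttest_ind(a, b, equal_var=False)`, which is outside the standard library, so that step is left out here.

```python
# Minimal sketch of Welch's two-sample t-test statistic (unequal variances),
# using only the standard library. The p-value step is intentionally omitted.
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each sample mean (sample variance uses n - 1).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical length-of-stay values (days) for two units.
unit_a = [1, 2, 3, 4, 5]
unit_b = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(unit_a, unit_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Welch's version is a common default when the two groups may have different spreads, as in this made-up example.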

Books:

    • Part of the "Learning Health System" series

    • Trinket Open book

    • Very easy to read

Datasets:

    • Mendeley Data is a free and secure cloud-based communal repository where you can store your data, ensuring it is easy to share, access and cite, wherever you are.

    • CVD risk data

    • Teaching Statistics in the Health Sciences, a great repository that matches Data sets (mostly CSV) and papers that use them

    • Popular reference for machine learning data sets

    • Provides Goals/Concept of dataset with sample questions, data sets, and data dictionary

      • Data Archive of the Robert Wood Johnson Foundation

    • Children's Hospital Zhejiang University School of Medicine

  • NYU

    • SyntheticMass contains realistic but fictional residents of the state of Massachusetts. The synthetic population aims to statistically mirror the real population in terms of demographics, disease burden, vaccinations, medical visits, and social determinants.

    • National Institute of Mental Health Data Archive (NDA) is a single infrastructure that was initially created through the integration of a set of research data repositories

    • Summary information on the data shared in NDA is available in the NDA Query Tool without the need for an NDA user account. To request access to record-level human subject data, you must submit a Data Access Request.

    • IPUMS International is dedicated to collecting and distributing census microdata from around the world.

    • Harmonized International Census Data for Social Science and Health Research

    • Proprietary, Limited Access

    • Clinical Practice Research Datalink (CPRD) is a real-world research service supporting retrospective and prospective public health and clinical studies. CPRD research data services are delivered by the Medicines and Healthcare products Regulatory Agency with support from the National Institute for Health and Care Research (NIHR), as part of the Department of Health and Social Care. (United Kingdom)

    • This website contains the permanent archive of the clinical data, patient reported outcomes, biospecimen analyses, quantitative image analyses, radiographs (X-Rays) and magnetic resonance images (MRIs) acquired during this study. There are longitudinal assessments and measurements from 4,796 subjects, with data from over 431,000 clinical and imaging visits, and almost 26,626,000 images in this archive. More than 400 research manuscripts have already been generated based on this data.

    • We provide prescribing, dispensing and organization data to help NHS stakeholders track trends and to inform decisions. Using a wide range of information based on prescribing and dispensing, we create reports specific to user needs and requirements.

    • Prescriptions in the Community

    • The Scottish Health and Social Care open data platform gives access to statistics and reference data for information and re-use.

    • The Synthetic Healthcare Database for Research (SyH-DR) is an all-payer, nationally representative claims database. The database consists of a sample of inpatient, outpatient, and prescription drug claims, including utilization, payment, and enrollment data, for people insured by Medicare, Medicaid, or commercial health insurance in 2016. AHRQ created SyH-DR, in part, as a resource to facilitate improvements to price and quality transparency in healthcare.

  • Synapse - Data repository for publication data. Includes a subsection for digital health and biomarkers

  • FDA GitHub

    • Great starting point for learning about the variety of information available from the FDA via GitHub. Includes documentation for FDA APIs.

  • Mimi Labs - Data Catalog, great references from a great group

  • Sage Data Planet

    • List of open data sets related to physiologic systems

      • Great data set documentation about data dictionaries from this CGM data set

      • Also includes a Jupyter notebook with scripts to parse the data

  • Stanford University Human-Centered Artificial Intelligence (HAI) - open-source data sets

    • EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models

    • INSPECT: A Multimodal Dataset for Patient Outcome Prediction of Pulmonary Embolisms

    • MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
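For the FDA entry above, here is a minimal sketch of querying the openFDA API with only the standard library. The endpoint (`https://api.fda.gov/drug/event.json`), the `search`/`count`/`limit` parameters, and the field names follow openFDA's public documentation as we understand it; treat them as assumptions and check the current docs before relying on them. The example drug is arbitrary, and the network call is left commented out.

```python
# Minimal sketch: build (and optionally send) an openFDA query with the standard library.
# Assumption: endpoint and field names follow the public openFDA docs; verify before use.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.fda.gov/drug/event.json"


def build_query(generic_name: str, count_field: str, limit: int = 10) -> str:
    """Return a URL that counts adverse-event reaction terms reported for one drug."""
    params = {
        "search": f'patient.drug.openfda.generic_name:"{generic_name}"',
        "count": count_field,
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch(url: str) -> list[dict]:
    """Send the request and return the 'results' list (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)["results"]


if __name__ == "__main__":
    url = build_query("metformin", "patient.reaction.reactionmeddrapt.exact")
    print(url)
    # for row in fetch(url):
    #     print(row["term"], row["count"])
```

Keeping URL construction separate from the request makes the query easy to inspect (or paste into a browser) before any data is downloaded.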

All of Us Workbench:

  • University of Pittsburgh, Health Sciences Library System

Fun Reads / Videos:

    • Virtual Seminar series from Carnegie Mellon University

  • Harvard Business Review

    • Article by David Berkowitz about the role of learning skills for clinical students

    • Simple article that covers many basics of Python

    • Collection of low-/no-code workflows for the KNIME Analytics Platform, e.g., vancomycin dosing in obesity, predicting patient glucose levels, and natural language processing for disease tagging in literature.

    • 10 videos from the National Library of Medicine (shout out to Nancy Shin from UWashington) covering the data life cycle, documentation, standards, SNOMED, RxNorm, UMLS, data security, data sharing, and visualizations. Videos are 5-20 minutes, easy to understand, and great introductions.

    • Blog post with a basic walkthrough of fundamentals such as libraries.

Learning through Application / Cases:

Example applications or tests of knowledge/skills

    • Comparing patient records in a sample data set from public vs. private payers

    • Considerations of medications and the classification of opioids

  • No-Code Machine Learning, Google Creative Lab

    • "Experiments with Google" is a collection of open, accessible applications of machine learning, artificial intelligence, and development cycles related to various data.

Tools:

Literature

    • Method to report study results with a TRIPOD checklist

Literature:

Journals, Collections

  • Email/Updates

    • Method to learn via updates and inspire further investigation

    • From the NIH Pragmatic Trial Collaborative - RethinkingClinicalTrials

Articles

    • Helpful review of concepts related to machine learning in clinical contexts

    • Data from national group, available if a member (UPMC)

    • Data from Taiwan

    • Consider recreating the Python web-scraping code as an exercise (library: tweepy)

    • Interesting methods; the analysis could be recreated, but navigation is in Korean

    • Very technical but relevant outline; Pitt has access to Optum

    • Focus on latent class analysis (LCA); little relevant code or ML information, but does discuss NLP

    • Available data; implemented SVM and k-NN using sklearn (with downloadable code). Would just require some reorganizing of materials

    • Article with a GitHub link to R code

    • GitHub link to Python code; the article gives some pseudocode and some sample CSV data files

    • UCLA+Japan, No data/code in the article

    • Great figure explaining the process

    • Interesting review of how to teach data science

    • Relevant conclusion - Machine learning can be better than clinician but is rarely incorporated into practice

    • Editorial

    • Comparison of machine learning approaches (XGBoost, Logistic Regression) for outcomes

    • Discussion of how NLP concepts apply to detection in clinical notes/narratives

    • Interesting application of machine learning for behavioral nudges prompting serious illness conversations

    • Tested models against each other: gradient boosting machines (GBM), naïve Bayes (NB), neural networks (NN), and random forests (RF). Similar to other studies, no model clearly outperformed the others.

    • Good review of missing data in healthcare, with a good figure comparing dropping vs. mean/mode vs. imputation

    • Okay article for reviewing classification with random forest, support vector machine, penalized logistic regression, and extreme gradient boosting.

  • A procedural overview of why, when, and how to use machine learning for psychiatry

    • Limited access (not open access); a Jupyter Notebook in the supplemental data shows a great arc of loading data, reviewing it, exploratory data analysis, analysis, and conclusion. Some challenges in running it, but overall valuable to view for the explanations.
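Several articles above compare classifiers (for example, SVM and k-NN in sklearn). To show the idea behind one of them, here is a minimal from-scratch k-nearest-neighbours sketch using only the standard library. The two-feature records are hypothetical, and real work would standardize features, hold out proper test sets, and use a library implementation such as scikit-learn's `KNeighborsClassifier`.

```python
# Minimal from-scratch k-nearest-neighbours classifier (standard library only).
# The toy records are hypothetical; features are assumed to be already scaled to 0-1.
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Share of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical features: (scaled age, scaled biomarker); labels 0 = no event, 1 = event.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(0.15, 0.25), (0.85, 0.8)]
test_y = [0, 1]

print(knn_predict(train_x, train_y, (0.2, 0.2)))  # → 0
print(accuracy(train_x, train_y, test_x, test_y))  # → 1.0
```

An odd `k` avoids tied votes in a two-class problem; comparing models as the articles do means running several such classifiers on the same held-out data.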

Edit Notes - This is a living list with updates and edits, last updated: October 2024

Forbes Best Free online data science courses in 2019

DataCamp

Data Science Plus

Kaggle

SUNY OLI

Artificial Intelligence in Healthcare, Stanford Certificate program

OurCodingClub - Python Tutorial

Learning Python: basic level

Lab in Cognition and Perception online textbook by Todd Gureckis, Brenden M. Lake and others

Coursera - Python Genomic Data Science Specialization

Machine Learning with Python

Data Analysis with Python

Python for Everybody Specialization

Applied Data Science with Python Specialization

Google Cloud Platform

Python Tutorial for Beginners

Python Beginner Course

“Data Professor” - Python search playlist

"Healthcare Data Analytics"

Working with Medication Data

OHSU - Wiki that reviews other resources

“The Ultimate Data Science Prerequisite Learning List”

ML YouTube Courses

Best of ML with Python

Johns Hopkins Data Science Lab

Course on Infodemiology

YouTube “StatQuest” - Josh Starmer

Excellent review of topics in breadth and depth. Videos like "Gentle Introduction to Machine Learning" are immensely valuable to new learners.

UTHealth - Biostatistics for the Clinician

YouTube - Brandon Foltz

BMJ Statistics

Kenyon - Biology

Health Knowledge UK, Public Health Textbook, statistical methods section

StatR Analysis - Which test to choose

Medium, Towards Data Science - “Everything You Need To Know about Hypothesis Testing — Part II”

Open UMich Introduction to Statistics

Seeing Theory, online book

Statology

Clinical Data as the Basic Staple of Health Learning: Creating and Protecting a Public Good: Workshop Summary

Neural Data Science: A Primer with MATLAB® and Python (Available via Pitt HSLS)

Python for Bioinformatics (Available via Pitt HSLS)

Python for Everybody

Hands-On Exploratory Data Analysis with Python: Perform EDA Techniques to Understand, Summarize, and Investigate Your Data (Pitt ULS)

Hands-On Machine Learning with Python and Scikit-Learn (Available via Pitt HSLS, 2 hours of video)

Hands-On PySpark for Big Data Analysis (Available via Pitt HSLS)

Become a Python Data Analyst: Perform Exploratory Data Analysis and Gain Insight into Scientific Computing Using Python (Available via Pitt HSLS)

Learn Data Analysis with Python: Lessons in Coding (Available via Pitt HSLS)

Hands-On Data Analysis with Pandas: Efficiently Perform Data Collection, Wrangling, Analysis, and Visualization Using Python (Available via Pitt HSLS)

Python for Data Analysis: Data Wrangling with pandas, NumPy, and IPython (Available via Pitt HSLS)

Hands-On Data Visualization: Interactive Storytelling from Spreadsheets to Code, Jack Dougherty and Ilya Ilyankou - Open book available via GitHub

Codeless Deep Learning with KNIME: Build, train, and deploy various deep neural network architectures using KNIME Analytics Platform

Mendeley Data Sets

Synthetic Cardiovascular Risk Dataset, GitHub hosted

National Sleep Research Resource, Sleep data sets

Teaching Statistics in the Health Sciences

UC Irvine Machine Learning Repository

Institute for Social Research at University of Michigan

Data sets:

Health and Medical Care Archive

Pediatric Intensive Care Data

GitHub example of prediction

Health Science Library pre-publication data sets

Curated set of public data sets

Berkeley Library, University of California, Health Statistics & Data: Datasets/Raw Data

EMRBots

Experimental artificially generated electronic medical records (EMRs)

Harvard Dataverse, Med/Health/Life Science tag

Synthetic EMR Data Set; MITRE SyntheticMass

NIMH Data Archive

Integrated Public Use Microdata Series

Clinical Practice Research Datalink (CPRD)

Medical Imaging, Osteoarthritis Initiative (OAI)

NHS, Business Services Authority, Prescribing data

Opendata NHS, Scotland

AHRQ, Synthetic Healthcare Database for Research (SyH-DR)

PhysioNet

Data Browser:

GitHub-hosted Python (and R) templates (Pitt hosted/access):

Machine Learning in Medicine

O’Reilly Training - YouTube playlist

Bringing AI to the Underserved Billions - TED Talk, 12 minutes

How to keep human bias out of AI - TED Talk, 12 minutes

ONC Overview of HealthCare Data Analytics - 20 minutes

Data Science in Healthcare, PyData NYC 2018

BD2K - Exploratory Data Analysis - 60 minutes

University of Virginia, “Exploratory Data Analysis” - 20 minutes

Articles tagged with “Data”

“Use This Framework to Predict the Success of Your Big Data Project”

Building a data science team

When Machine Learning Goes Off the Rails

“The Kinds of Data Scientist”

Python for Industry Pharmaceuticals and Healthcare - (4 minutes)

Python vs R vs SAS, Simplilearn - (20 minutes)

Jill Cates - How to Build a Clinical Diagnostic Model in Python - PyCon 2019 (25 minutes), Great video

“Feature Engineering of Electronic Medical Records”, Medium Article

Machine Learning Crash Course (Anaconda, 60 min)

What is Machine Learning (Google Cloud Platform, 5 min)

Machine learning without code in a browser (Google Cloud Platform, 10 min)

All My Pharmacy Students Learn to Code

Python: Go From Rookie To Rockstar, by Abhishek Verma, Nov-2021

Data Science Solutions for Digital Healthcare

Data Literacy for the Busy Librarian

An introduction to Python for R Users

Public vs. Private payer data sets

Created by Dominic DiSanto as part of the University of Pittsburgh Office of the Provost Open Education Resources Grants

Machine learning without code in the browser (YouTube, Google, 10 minutes) - Helpful overview/walkthrough of the website that maps onto the steps in machine learning

"Experiments with Google" lab: Teachable Machine - Google Creative Lab; no coding required. This launches the webcam-based model (others include audio-based models)

Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD)

TRIPOD Checklist

PLOS, arXiv, UConn Center for mHealth and Social Media (mailing list)

Nature Journal - Scientific Data library

“Acquiring and Using Electronic Health Record Data”

How to Read Articles That Use Machine Learning: Users’ Guides to the Medical Literature

A Machine Learning Approach for the Detection and Characterization of Illicit Drug Dealers on Instagram: Model Evaluation Study - Python for language analysis

A validation of machine learning-based risk scores in the prehospital setting

CHIME: COVID-19 Hospital Impact Model for Epidemics

A machine learning approach predicts future risk to suicidal ideation from social media data

“A dataset quantifying polypharmacy in the United States”

“Coding Errors in Study of Meta-analyses With Falsified Data in the Results”

“Developing a Natural Language Processing tool to identify perinatal self-harm in electronic healthcare records”

“Artificial intelligence-assisted clinical decision support for childhood asthma management: A randomized clinical trial”

Prescriptive analytics for reducing 30-day hospital readmissions after general surgery

Forecasting outbound student mobility: A machine learning approach

Use of social media big data as a novel HIV surveillance tool in South Africa

Forecasting seasonal influenza-like illness in South Korea after 2 and 30 weeks using Google Trends and influenza data from Argentina

Characterizing electronic health record usage patterns of inpatient medicine residents using event log data

Has a GitHub link, but the original link in the article is not working; a general case for the use of Python

Deep neural network models for identifying incident dementia using claims and EHR datasets

Subtypes in patients with opioid misuse: A prognostic enrichment strategy using electronic health record data in hospitalized patients

Predicting adverse drug reactions of combined medication from heterogeneous pharmacologic databases

Performance and clinical utility of supervised machine-learning approaches in detecting familial hypercholesterolaemia in primary care

Personalized prediction of early childhood asthma persistence: A machine learning approach

Machine-learning-based prediction models for high-need high-cost patients using nationwide clinical and claims data

Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19

Teaching data science fundamentals through realistic synthetic clinical cardiovascular data

Applications of machine learning to undifferentiated chest pain in the emergency department: A systematic review

Beyond performance metrics: modeling outcomes and cost for clinical machine learning

Clinical impact and quality of randomized controlled trials involving interventions evaluating artificial intelligence prediction tools: a systematic review

Using machine learning to study the effect of medication adherence in Opioid Use Disorder

Adverse drug event detection using natural language processing: A scoping review of supervised learning methods

Long-term Effect of Machine Learning-Triggered Behavioral Nudges on Serious Illness Conversations and End-of-Life Outcomes Among Patients With Cancer: A Randomized Clinical Trial

Predicting physician departure with machine learning on EHR use patterns: A longitudinal cohort from a large multi-specialty ambulatory practice

Uses XGBoost for prediction and Shapley Additive Explanations (SHAP) for feature contribution. The neat component is the granular detail of the cleaning process (though without the data), available on GitHub.

Machine learning to improve frequent emergency department use prediction: a retrospective cohort study

Explainable Data-Driven Hypertension Identification Using Inpatient EMR Clinical Notes

Interesting comparison of approaches to classifying hypertension (e.g., ICD codes vs. measurements). Code available via GitHub.
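The hypertension article above compares labeling approaches (e.g., ICD codes vs. measurements). A minimal sketch of that kind of comparison in plain Python: label each hypothetical patient once from diagnosis codes and once from blood-pressure readings, then measure agreement with Cohen's kappa. The ICD-10 prefixes, the 140/90 mmHg cutoff, the two-reading rule, and the patient records are all illustrative assumptions, not the article's method.

```python
# Minimal sketch: compare two ways of labeling hypertension in the same patients
# and measure their agreement. Code prefixes, cutoffs, and records are assumptions.

HTN_ICD10_PREFIXES = ("I10", "I11", "I12", "I13", "I15")  # assumed code set


def label_by_codes(codes):
    """1 if any diagnosis code starts with an assumed hypertension prefix."""
    return int(any(code.startswith(HTN_ICD10_PREFIXES) for code in codes))


def label_by_measurements(readings, sys_cut=140, dia_cut=90, min_high=2):
    """1 if at least `min_high` (systolic, diastolic) readings meet either cutoff."""
    high = sum(s >= sys_cut or d >= dia_cut for s, d in readings)
    return int(high >= min_high)


def cohens_kappa(a, b):
    """Agreement between two binary label lists, corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


patients = [
    {"codes": ["I10"], "bp": [(150, 95), (145, 88)]},
    {"codes": ["E11.9"], "bp": [(152, 92), (148, 94)]},  # uncoded but high readings
    {"codes": ["I10", "E78.5"], "bp": [(128, 82), (132, 84)]},  # coded, controlled
    {"codes": ["J45.909"], "bp": [(118, 76), (121, 79)]},
]

by_codes = [label_by_codes(p["codes"]) for p in patients]
by_bp = [label_by_measurements(p["bp"]) for p in patients]
print(by_codes, by_bp, round(cohens_kappa(by_codes, by_bp), 3))
```

The disagreements in the toy data (an uncoded patient with high readings, a coded patient whose pressure is controlled) are exactly the kinds of cases that make code-based and measurement-based labels diverge.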

Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports

Review of a natural language processing approach to free-text radiology reports; a great sample file in the report describes the search process (no data available, though), iteration (a great example of how to run multiple tests), and viewing results.

Evidence from ClinicalTrials.gov on the growth of Digital Health Technologies in neurology trials

The content is of interest for digital health, but the really neat part is the open sharing of the notes: the linked repository shares the data and the Python code related to the data visualization.

Predictive models in emergency medicine and their missing data strategies: a systematic review
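Missing-data strategies come up here and in the review noted earlier (dropping vs. mean/mode vs. imputation). A minimal standard-library sketch of the three simplest options on a hypothetical table (`None` marks a missing value): complete-case analysis, mean imputation for a numeric column, and mode imputation for a categorical column. Model-based or multiple imputation is often preferred in practice and is not shown.

```python
# Minimal sketch of three simple missing-data strategies on a toy table.
# Records are hypothetical; None marks a missing value.
from statistics import mean, mode, stdev

records = [
    {"age": 54, "sodium": 138, "smoker": "no"},
    {"age": 61, "sodium": None, "smoker": "yes"},
    {"age": 47, "sodium": 141, "smoker": None},
    {"age": 70, "sodium": 133, "smoker": "no"},
    {"age": 66, "sodium": None, "smoker": "no"},
]

# 1) Complete-case analysis: keep only rows with no missing values.
complete = [r for r in records if all(v is not None for v in r.values())]

# 2) Mean imputation: replace missing sodium with the mean of observed values.
observed_sodium = [r["sodium"] for r in records if r["sodium"] is not None]
sodium_mean = mean(observed_sodium)

# 3) Mode imputation: replace missing smoker status with the most frequent category.
smoker_mode = mode(r["smoker"] for r in records if r["smoker"] is not None)

imputed = [
    {
        **r,
        "sodium": sodium_mean if r["sodium"] is None else r["sodium"],
        "smoker": smoker_mode if r["smoker"] is None else r["smoker"],
    }
    for r in records
]

print(len(complete), "of", len(records), "rows kept by complete-case analysis")
print("imputed sodium value:", round(sodium_mean, 2))
# Mean imputation keeps every row but shrinks the spread of the imputed column.
print("SD observed:", round(stdev(observed_sodium), 2),
      "SD after imputation:", round(stdev([r["sodium"] for r in imputed]), 2))
```

The trade-off is visible even at this size: dropping rows discards more than half the table, while mean imputation keeps every row but understates variability.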

Machine learning in medicine: Performance calculation of dementia prediction by support vector machines (SVM)

Application of SVM for predicting dementia; data for 149 observations is available online (non-binary outcomes: Non-Demented, Demented, Converted)

Classification of lapses in smokers attempting to stop: A supervised machine learning approach using data from a popular smoking cessation smartphone app

Predicting diabetes mellitus metabolic goals and chronic complications transitions—analysis based on natural language processing and machine learning models

Python/Notebook code is available for the machine learning content.

https://www.freecodecamp.org/learn/machine-learning-with-python/
https://jhudatascience.org/index.html
https://training.infodemiology.com/healthcare
https://www.icpsr.umich.edu/web/instructors/biblio/resources
https://github.com/kartoun/emrbots
https://synthea.mitre.org/Synthethic
https://nda.nih.gov/
https://nda.nih.gov/oai/
https://www.icpsr.umich.edu/web/pages/HMCA/index.html
https://dhealth.synapse.org/
https://github.com/fda
https://www.mimilabs.ai/datacatalog#state-government-databases-mimi-ws-1-stategov
https://data.sagepub.com/
https://physionet.org/about/database/
https://physionet.org/content/cgmacros/1.0.0/
https://databrowser.researchallofus.org
https://github.com/AlexisCenname/HSLSCodeTemplates/blob/main/ConditionAssociation/Python%20Templates/Python-Association-SyntheticData.ipynb
https://www.tripod-statement.org/
https://www.mastersindatascience.org/
https://datacarpentry.org/
Knowledge Tree
https://pythonhealthcare.org