PythonOER
An open resource for Python information. This was developed and curated as a resource related to healthcare, data science, and Python. Scope creep abounds. Suggestions welcome. Thanks for visiting.
Last updated
List of lists, approachable review of job titles, programs that teach data, and courses online; owned by 2U (edx)
Great teaching and practice of fundamentals ; lowers the barrier to entry, enables data-driven research
Pitt resource from Dr. Peter Brusilovsky
defunct as of 2022
Python plus many other languages
Impressive review of Stats Methods (Linear regression)
Data science community with full and short courses ; public data sets ; public examples of data projects ; competitions, great resource for introductory learners
Open learning resources, Principles of Computation with Python, course with content/quizzes, includes programming/data/encryption/cellular automaton, has free access to view and requires free account for full features
Four course series related to clinical data, fundamentals of ML for healthcare, Evaluation of AI in Healthcare, and Capstone
Github site hosted course with easy to follow tutorials, originally meant for Ecosciences
Kevin Dunn's fantastic review of concepts. From the site: The topics listed below give an idea of what is covered. Within each notebook are a series of simple or more challenging problems. The problems are designed to build on the topics just learned, as well as the topics from earlier notebooks.
While thematically based on cognition/perception, has a hands-on approach to Jupyter and Python. Great content for concepts like Linear Regression and Python resources
Search of "Google Cloud Tech" for "python" yields lots of information about Python as a language, as a data tool, and as a tool in technical (computer-science styled) applications
User "Telusko" ; 100 + lessons
Download/Install Python, Using Python, Getting started with Python, Variables, Functions, Objects, Lists in Python and more by Navin Reddy
Approx. 10-15 minute videos
Basics of syntax ; Some specific exercises like building a random color generator
Vast array and breadth/depth of topics in data science from a single user, including language-specific content, conceptual videos, and interviews with others in data science
Tools as a resource "Health IT Curriculum Resources for Educators" from HealthIT.gov's Workflow Development program
Mainly a review of clinical informatics (how a clinician informaticist thinks about systems and data)
List of lists
Broad definition and application examples of math concepts in data science and high level review of programming in data science
Hosted on Github by DAIR.AI, includes brief outline of each below whole list
List of open source machine learning projects hosted on Github
Great resource of MOOC for R, open textbooks (Leadership in Cancer Informatics) and other resources.
Fantastic review of statistics in an interactive online format
Good review of tests, great section in "software tutorials" to explain how to conduct tests in Excel, Google Sheets, Python, R, etc
Part of the "Learning Health System" series
Trinket Open book
Very easy to read
Mendeley Data is a free and secure cloud-based communal repository where you can store your data, ensuring it is easy to share, access and cite, wherever you are.
CVD risk data
Teaching Statistics in the Health Sciences, a great repository that matches Data sets (mostly CSV) and papers that use them
Popular reference for machine learning data sets
Provides Goals/Concept of dataset with sample questions, data sets, and data dictionary
Data Archive of the Robert Wood Johnson Foundation
Children's Hospital Zhejiang University School of Medicine
NYU
SyntheticMass contains realistic but fictional residents of the state of Massachusetts. The synthetic population aims to statistically mirror the real population in terms of demographics, disease burden, vaccinations, medical visits, and social determinants.
National Institute of Mental Health Data Archive (NDA) is a single infrastructure that was initially created through the integration of a set of research data repositories
Summary information on the data shared in NDA is available in the NDA Query Tool without the need for an NDA user account. To request access to record-level human subject data, you must submit a Data Access Request.
IPUMS International is dedicated to collecting and distributing census microdata from around the world.
Harmonized International Census Data for Social Science and Health Research
Proprietary, Limited Access
Clinical Practice Research Datalink (CPRD) is a real-world research service supporting retrospective and prospective public health and clinical studies. CPRD research data services are delivered by the Medicines and Healthcare products Regulatory Agency with support from the National Institute for Health and Care Research (NIHR), as part of the Department of Health and Social Care. (United Kingdom)
This website contains the permanent archive of the clinical data, patient reported outcomes, biospecimen analyses, quantitative image analyses, radiographs (X-Rays) and magnetic resonance images (MRIs) acquired during this study. There are longitudinal assessments and measurements from 4,796 subjects, with data from over 431,000 clinical and imaging visits, and almost 26,626,000 images in this archive. More than 400 research manuscripts have already been generated based on this data.
We provide prescribing, dispensing and organization data to help NHS stakeholders track trends and to inform decisions. Using a wide range of information based on prescribing and dispensing, we create reports specific to user needs and requirements.
Prescriptions in the Community
The Scottish Health and Social Care open data platform gives access to statistics and reference data for information and re-use.
The Synthetic Healthcare Database for Research (SyH-DR) is an all-payer, nationally representative claims database. The database consists of a sample of inpatient, outpatient, and prescription drug claims, including utilization, payment, and enrollment data, for people insured by Medicare, Medicaid, or commercial health insurance in 2016. AHRQ created SyH-DR, in part, as a resource to facilitate improvements to price and quality transparency in healthcare.
Synapse - Data repository for publication data. Includes a subsection for digital health and biomarkers
FDA Github
Great starting point for learning about the variety of information available from the FDA via Github. Includes documentation for FDA APIs.
Mimi Labs - Data Catalog, great references from a great group
Sage Data Planet
List of open data sets related to physiologic systems
Great data set documentation about data dictionaries from this CGM data set
Also includes a Jupyter notebook with scripts to parse the data
Stanford University Human-Centered Artificial Intelligence (HAI) - open-source data sets
EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models
INSPECT: A Multimodal Dataset for Patient Outcome Prediction of Pulmonary Embolisms
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
University of Pittsburgh, Health Sciences Library System
Virtual Seminar series from Carnegie Mellon University
Harvard Business Review
Article by David Berkowitz about the role of learning skills for clinical students
Simple article that covers many basics of Python
Collection of low/no code workflows when working with KNIME Platform. ex. Vanco dosing in obesity, Predicting patient glucose levels, Natural Language Processing for disease tagging in literature.
10 videos from the National Library of Medicine (Shout out Nancy Shin from UWashington) that cover Data life cycle, documentation, standards, SNOMED, RxNorm, UMLS, Data security, Data sharing, Visualizations. Videos are 5-20 minutes. Easy to understand and great introductions.
Blog post with a walkthrough of basics such as libraries.
Comparing patient records in a sample data set from public vs. private payers
Considerations of medication and classification of opioids
No Code Machine Learning, Google Creative Labs -
"Experiments with Google" is a collection of open, accessible applications of machine learning, artificial intelligence, and development cycles related to various data.
Method to report study results with a TRIPOD checklist
Email/Updates
Method to learn via updates and inspire further investigation
From the NIH Pragmatic Trial Collaborative - RethinkingClinicalTrials
Helpful review of concepts related to machine learning in clinical contexts
Data from national group, available if a member (UPMC)
Data from Taiwan
Consider recreating python code from web scraping as an exercise (library - tweepy)
Interesting methods, would be able to recreate analysis but navigation is in Korean
Very technical but relevant outline, Pitt has access to Optum
Focus on Latent class analysis (LCA), not much relevant code or ML info, does discuss NLP
Available data, implemented SVM and k-NN using sklearn (with downloadable code). Would just require some re-organizing of materials
Article with Github code link with R
Github link to python code, in the article some pseudo code given, and some sample csv data files
UCLA+Japan, No data/code in the article
Great figure explaining the process
Interesting review of how to teach data science
Relevant conclusion - Machine learning can be better than clinicians but is rarely incorporated into practice
Editorial
Comparison of machine learning approaches (XGBoost, Logistic Regression) for outcomes
Discussion of how NLP concepts apply to detection in clinical notes/narratives
Interesting application of machine learning for behavioral nudges of prompting serious illness conversations
Tested models against each other: Gradient boosting machines (GBM); Naïve Bayes (NB); Neural networks (NN); Random forests (RF). Similar to other studies, no model clearly outperformed the others.
Good review for missing data in healthcare, good figure for comparing dropping vs. Mean/Mode vs. Imputation
Okay article for reviewing use of classification across Random Forest, Support Vector Machine, Penalized Logistic Regression, and Extreme Gradient Boosting.
A procedural overview of why, when and how to use machine learning for psychiatry
Limited access (not open access), Jupyter Notebook available in Supplemental data with great arc of loading data, reviewing, exploratory data analysis, analysis, and conclusion. Some challenges in running but overall valuable to view and see explanations.
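One of the reviews noted above compares dropping records against mean/mode filling for missing healthcare data. As a rough illustration only, here is a minimal sketch using the Python standard library. The toy records, column names, and helper functions are hypothetical and are not taken from any of the listed articles. Model-based imputation, which the review also covers, is not shown.

```python
from statistics import mean, mode

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 54, "smoker": "no"},
    {"age": None, "smoker": "yes"},
    {"age": 61, "smoker": "no"},
    {"age": 47, "smoker": None},
]


def drop_missing(rows):
    """Listwise deletion: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_missing(rows, column, statistic):
    """Fill missing values in one column with a summary of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill_value = statistic(observed)
    # Build new dicts so the original records are left unchanged.
    return [
        {**r, column: fill_value if r[column] is None else r[column]}
        for r in rows
    ]


dropped = drop_missing(records)

# Mean for the numeric column, mode for the categorical column.
filled = fill_missing(fill_missing(records, "age", mean), "smoker", mode)

print(len(dropped), "rows kept after dropping")
print(len(filled), "rows kept after filling")
```

Dropping shrinks the data set, while filling keeps every row at the cost of pulling values toward the center of the observed distribution. Which trade-off is acceptable depends on the study.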
Edit Notes - This is a living list with updates and edits, last updated: October 2024
Forbes Best Free online data science courses in 2019 -
Data Camp -
Data Science Plus -
Kaggle -
SUNY OLI -
Artificial Intelligence in Healthcare, Stanford Certificate program -
OurCodingClub - Python Tutorial
Learning Python: basic level
Lab in Cognition and Perception online textbook by Todd Gureckis, Brenden M. Lake and others
Coursera - Python Genomic Data Science Specialization -
Machine Learning with Python -
Data Analysis with Python -
Python for Everybody Specialization -
Applied Data Science with Python Specialization -
Google Cloud Platform -
Python Tutorial for Beginners -
Python Beginner Course -
“Data Professor” - Python search playlist -
"Healthcare Data Analytics" -
Working with Medication Data -
OHSU - Wiki that reviews other resources - ,
“The Ultimate Data Science Prerequisite Learning List” -
ML YouTube Courses -
Best of ML with Python -
Johns Hopkins Data Science Lab -
Course on Infodemiology:
YouTube “Stats Quest” - Josh Starmer -
Excellent review in breadth and depth of topics. Videos like are immensely valuable to new learners.
UTHealth - Biostatistics for the Clinician -
YouTube - Brandon Fultz -
BMJ Statistics -
Kenyon - Biology -
Health Knowledge UK, Public Health Textbook, statistical methods section -
StatR Analysis - Which test to choose -
Medium, Towards Data Science - “Everything You Need To Know about Hypothesis Testing — Part II” -
Open UMich Introduction to Statistics -
Seeing Theory, online book -
Statology -
Clinical Data as the Basic Staple of Health Learning: Creating and Protecting a Public Good: Workshop Summary -
Neural Data Science : A Primer with MATLAB® and Python - (Available via Pitt HSLS)
Python for Bioinformatics (Available via Pitt HSLS)
Python for Everybody -
Hands-on exploratory data analysis with python : perform EDA techniques to understand, summarize, and investigate your data - (Pitt ULS)
Hands-On Machine Learning with Python and Scikit-Learn - (Available via Pitt HSLS, 2 hours of video)
Hands-On PySpark for Big Data Analysis - (Available via Pitt HSLS)
Become a Python Data Analyst: Perform Exploratory Data Analysis and Gain Insight into Scientific Computing Using Python - (Available via Pitt HSLS)
Learn Data Analysis with Python: Lessons in Coding - (Available via Pitt HSLS)
Hands-On Data Analysis with Pandas: Efficiently Perform Data Collection, Wrangling, Analysis, and Visualization Using Python - (Available via Pitt HSLS)
Python for data analysis : data wrangling with pandas, NumPy, and IPython - (Available via Pitt HSLS)
Hands-On Data Visualization: Interactive Storytelling from Spreadsheets to Code, Jack Dougherty Ilya Ilyankou - Open book available via GitHub -
Codeless Deep Learning with KNIME: Build, train, and deploy various deep neural network architectures using KNIME Analytics Platform -
Mendeley Data Sets -
Synthetic Cardiovascular Risk Dataset, Github hosted -
National Sleep Research Resource, Sleep data sets -
Teaching Statistics in the Health Sciences -
UC Irvine Machine Learning Repository -
Institute for Social Research at University of Michigan -
Data sets:
Health and Medical Care Archive -
Pediatric Intensive Care Data -
Github example of prediction -
Health Science Library pre-publication data sets -
Curated set of public data sets -
Berkeley Library, University of California, Health Statistics & Data: Datasets/Raw Data -
EMRBots -
Experimental artificially generated electronic medical records (EMRs),
Harvard Dataverse, Med/Health/Life Science tag -
Synthetic EMR Data Set - ; Mitre Mass
NIMH Data Archive -
Integrated Public Use Microdata Series -
Clinical Practice Research Datalink (CPRD) -
Medical Imaging, Osteoarthritis Initiative (OAI) -
NHS, Business Services Authority, Prescribing data -
Opendata NHS, Scotland -
AHRQ, Synthetic Healthcare Database for Research (SyH-DR) -
Phyio -
Data Browser:
Github hosted Python (and R) templates:
- Pitt hosted/access
Machine Learning in Medicine -
O’Reilly Training - Youtube playlist
Bringing AI to the Underserved Billions - , Ted Talk 12 minutes
How to keep human bias out of AI - , Ted Talk 12 minutes
ONC Overview of HealthCare Data Analytics - 20 minutes
Data Science in Healthcare, PyData NYC 2018 -
BD2K - Exploratory Data Analysis - 60 minutes
University of Virginia, “Exploratory Data Analysis” - 20 minutes
Articles tagged with “Data” -
“Use This Framework to Predict the Success of Your Big Data Project” -
Building a data science team -
When Machine Learning Goes Off the Rails -
“The Kinds of Data Scientist” -
Python for Industry Pharmaceuticals and Healthcare - (4 minutes)
Python vs R vs SAS, Simplilearn - (20 minutes)
Jill Cates - How to Build a Clinical Diagnostic Model in Python - PyCon 2019 (25 minutes), Great video
“Feature Engineering of Electronic Medical Records”, Medium Article -
Machine Learning Crash Course (Anaconda, 60 min) -
What is Machine Learning (Google Cloud Platform, 5 min) -
Machine learning without code in a browser (Google Cloud Platform, 10 min) -
All My Pharmacy Students Learn to Code -
Python: Go From Rookie To Rockstar, by Abhishek Verma, Nov-2021 -
Data Science Solutions for Digital Healthcare -
Data Literacy for the Busy Librarian -
An introduction to Python for R Users -
Public vs. Private payer data sets
Created by as a part of University of Pittsburgh Office of the Provost
Machine learning without code in the browser ( YouTube, Google, 10 minutes) - Helpful overview/walkthrough of the website and correlates to steps in machine learning
"Experiment with Google" lab: Teachable Machine () - Google Creative Lab, no coding required, this launches the "webcam"-based model (others include Audio based)
Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) -
TRIPOD
PLOS, arXiv, UConn Center for mHealth and Social Media ()
Nature Journal - Scientific Data library -
“Acquiring and Using Electronic Health Record Data” -
How to Read Articles That Use Machine Learning Users’ Guides to the Medical Literature -
A Machine Learning Approach for the Detection and Characterization of Illicit Drug Dealers on Instagram: Model Evaluation Study - -Python for language analysis
A validation of machine learning-based risk scores in the prehospital setting -
CHIME: COVID-19 Hospital Impact Model for Epidemics -
A machine learning approach predicts future risk to suicidal ideation from social media data -
“A dataset quantifying polypharmacy in the United States” -
“Coding Errors in Study of Meta-analyses With Falsified Data in the Results” -
“Developing a Natural Language Processing tool to identify perinatal self-harm in electronic healthcare records” -
“Artificial intelligence-assisted clinical decision support for childhood asthma management: A randomized clinical trial” -
Prescriptive analytics for reducing 30-day hospital readmissions after general surgery -
Forecasting outbound student mobility: A machine learning approach -
Use of social media big data as a novel HIV surveillance tool in South Africa -
Forecasting seasonal influenza-like illness in South Korea after 2 and 30 weeks using Google Trends and influenza data from Argentina -
Characterizing electronic health record usage patterns of inpatient medicine residents using event log data -
Has but original in article not working, general case for use of python
Deep neural network models for identifying incident dementia using claims and EHR datasets -
Subtypes in patients with opioid misuse: A prognostic enrichment strategy using electronic health record data in hospitalized patients -
Predicting adverse drug reactions of combined medication from heterogeneous pharmacologic databases -
Performance and clinical utility of supervised machine-learning approaches in detecting familial hypercholesterolaemia in primary care -
Personalized prediction of early childhood asthma persistence: A machine learning approach -
Machine-learning-based prediction models for high-need high-cost patients using nationwide clinical and claims data -
Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19 -
Teaching data science fundamentals through realistic synthetic clinical cardiovascular data -
Applications of machine learning to undifferentiated chest pain in the emergency department: A systematic review -
Beyond performance metrics: modeling outcomes and cost for clinical machine learning -
Clinical impact and quality of randomized controlled trials involving interventions evaluating artificial intelligence prediction tools: a systematic review -
Using machine learning to study the effect of medication adherence in Opioid Use Disorder -
Adverse drug event detection using natural language processing: A scoping review of supervised learning methods -
Long-term Effect of Machine Learning-Triggered Behavioral Nudges on Serious Illness Conversations and End-of-Life Outcomes Among Patients With Cancer: A Randomized Clinical Trial -
Predicting physician departure with machine learning on EHR use patterns: A longitudinal cohort from a large multi-specialty ambulatory practice
Uses XGBoost for prediction, Shapley Additive Explanations (SHAP) used for feature contribution. The neat component is the granular detail (though without the data) from the cleaning process available on
Machine learning to improve frequent emergency department use prediction: a retrospective cohort study
Explainable Data-Driven Hypertension Identification Using Inpatient EMR Clinical Notes
Interesting comparison of approaches to classifying hypertension (e.g. ICD vs. Measurements). Code available via
Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports
Review of a Natural language text approach to free-text radiology reports, a great sample file in the report that describes the search process (no data available though), iteration (great example of how to run multiple tests), and viewing results.
Evidence from ClinicalTrials.gov on the growth of Digital Health Technologies in neurology trials
Content is of interest for digital health but the real neat part is the open sharing of the notes. The shares the data and the Python code related to the data visualization.
Predictive models in emergency medicine and their missing data strategies: a systematic review
Machine learning in medicine: Performance calculation of dementia prediction by support vector machines (SVM)
Application of SVM for predicting Dementia, data available via for 149 observations (non-binary outcomes: Non-Demented, Demented, Converted)
Classification of lapses in smokers attempting to stop: A supervised machine learning approach using data from a popular smoking cessation smartphone app
Predicting diabetes mellitus metabolic goals and chronic complications transitions—analysis based on natural language processing and machine learning models -
Python/Notebook code available for the machine learning/content ()