This repo contains transformer-based encoder-decoder architectures applied to the task of ICD-9 coding on the MIMIC-III-50 dataset. It is the code for the master's thesis "Classification of ICD-9 Codes from Unstructured Clinical Notes using Transformer-Based Neural Networks" by Malte Feucht at the Chair of Computer Aided Medical Procedures …

maltefeucht/TBM_DBLAC_ICD9_mimic3

Latest commit: 11fe1cd by Malte Feucht, Sep 19, 2021 (4 commits)


TBM_ICD9_mimic3

This repository contains the code for the master's thesis "Classification of ICD-9 Codes from Unstructured Clinical Notes using Transformer-Based Neural Networks".

Setup

  • This project was developed with Python 3.8.
  • Install the dependencies from the provided requirements.txt. Other package versions might work as well, but installing the versions specified in requirements.txt is recommended.
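
To see whether an existing environment matches the recommended versions, a small check along the following lines may help. It is only a sketch: it assumes that requirements.txt pins packages as `name==version` (the file's actual format is not shown here), and `parse_pins` and `find_mismatches` are hypothetical helper names, not part of this repo.

```python
from importlib import metadata  # part of the standard library since Python 3.8


def parse_pins(requirements_text):
    """Return {package: version} for lines of the form `package==version`.

    Assumption: exact pins only; any other requirement syntax is skipped.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # blank line or a requirement that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed)} for every deviation.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Calling `find_mismatches(parse_pins(open("requirements.txt").read()))` from the project root would then list every package whose installed version differs from the pinned one. An empty result means the environment matches the pins.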

Preprocessing

  • The preprocessing is based on that of the CAML model architecture proposed by Mullenbach et al.
  • First, edit the local and remote DATA_DIR, MIMIC_3_DIR and PROJECT_DIR in constants_mimic3.py so that they point to your own data directories.
  • Organize the data with the following structure:

mimicdata
|      D_ICD_DIAGNOSES.csv
|      D_ICD_PROCEDURES.csv
|      ICD9_descriptions (already in repo)
|–––mimic3
|      |      NOTEEVENTS.csv
|      |      DIAGNOSES_ICD.csv
|      |      PROCEDURES_ICD.csv
|      |      *_hadm_ids.csv (already in repo)

Obtain the MIMIC-III files here: https://physionet.org/content/mimiciii/1.4/
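
Before starting the preprocessing notebook, it can be useful to confirm that the files are where the layout above expects them. The following is a minimal sketch, not code from this repo: `missing_inputs` is a hypothetical helper, and it assumes that the `mimic3` folder sits directly inside the directory you pass in, as the tree above shows.

```python
from pathlib import Path

# File names taken from the directory layout above.
TOP_LEVEL = ["D_ICD_DIAGNOSES.csv", "D_ICD_PROCEDURES.csv", "ICD9_descriptions"]
MIMIC3_LEVEL = ["NOTEEVENTS.csv", "DIAGNOSES_ICD.csv", "PROCEDURES_ICD.csv"]


def missing_inputs(data_dir):
    """Return the expected input paths that do not exist under `data_dir`.

    `data_dir` plays the role of the `mimicdata` folder (hypothetical helper;
    point it at whatever DATA_DIR is set to in constants_mimic3.py).
    """
    data_dir = Path(data_dir)
    mimic3_dir = data_dir / "mimic3"
    missing = [str(data_dir / name) for name in TOP_LEVEL
               if not (data_dir / name).exists()]
    missing += [str(mimic3_dir / name) for name in MIMIC3_LEVEL
                if not (mimic3_dir / name).exists()]
    # At least one *_hadm_ids.csv split file should be present in mimic3.
    if not (mimic3_dir.is_dir() and any(mimic3_dir.glob("*_hadm_ids.csv"))):
        missing.append(str(mimic3_dir / "*_hadm_ids.csv"))
    return missing
```

An empty list from `missing_inputs("/path/to/mimicdata")` suggests the layout is complete. Otherwise the returned paths show which files still need to be downloaded or moved.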

  • Run dataproc_mimic_III.ipynb. This might take a while.
  • Optionally, run data_visualization.ipynb afterwards to get plots and statistics on the MIMIC-III and MIMIC-III-50 datasets.

Training

  • To train one of the provided models, run sh train_<model_name>.sh in the scripts directory of the respective model directory.

Testing

  • To test one of the models and reproduce its results, run sh test_<model_name>.sh in the scripts directory of the respective model directory. This loads the best-performing model obtained over k-fold split training for testing.

Inference

  • To run inference and get predictions for one of the models, run sh inference_<model_name>.sh in the scripts directory of the respective model directory. This loads the best-performing model obtained over k-fold split training for inference. The predictions are stored in the results directory.
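
The training, testing and inference scripts all follow the same naming pattern, so they can also be launched from Python. This is a sketch under stated assumptions: `script_command` and `run_stage` are hypothetical helpers, and the model directory and model name must be replaced with the ones that actually exist in this repo.

```python
import subprocess
from pathlib import Path

# The three script prefixes described above.
STAGES = ("train", "test", "inference")


def script_command(stage, model_name):
    """Build the `sh <stage>_<model_name>.sh` command as an argument list."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}, expected one of {STAGES}")
    return ["sh", f"{stage}_{model_name}.sh"]


def run_stage(stage, model_dir, model_name):
    """Run the script from the `scripts` directory of the given model directory.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    scripts_dir = Path(model_dir) / "scripts"
    return subprocess.run(script_command(stage, model_name),
                          cwd=scripts_dir, check=True)
```

For example, `run_stage("train", "<model_dir>", "<model_name>")` corresponds to changing into `<model_dir>/scripts` and running `sh train_<model_name>.sh`.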
