Consumer Medical Information Retrieval Relevance Judgments

We collected relevance judgments for pairs of consumer medical questions and technical medical documents.

Creative Commons License
Consumer Medical Information Retrieval Relevance Judgments by Steven P. Crain and Hongyuan Zha is licensed under a Creative Commons Attribution 3.0 Unported License.

If you use this data, please cite:

  • Steven P. Crain, Shuang-Hong Yang, Hongyuan Zha and Yu Jiao, "Dialect topic modeling for improved consumer medical search," in Proceedings of the AMIA Annual Symposium 2010, Washington, D.C.: American Medical Informatics Association, 2010.

BibTeX format:

@INPROCEEDINGS{Crain2010b,
      author = {Steven P. Crain and Shuang-Hong Yang and Hongyuan Zha and Yu Jiao},
      title = {Dialect topic modeling for improved consumer medical search},
      booktitle = {Proceedings of the AMIA Annual Symposium 2010},
      year = {2010},
      address = {Washington, D.C.},
      organization = {American Medical Informatics Association}
}

Our documents come from five sources:

  • Yahoo! Answers: questions and corresponding answers collected from the Yahoo! Answers health categories.
  • Centers for Disease Control and Prevention
  • WebMD
  • Medical Subject Headings
  • PubMed Central Open Access Subset

The following data files are currently available:

all_train_1000.dat.gz
Training data, consisting of 1000 documents from each source.
all_validation_2500.dat.gz
Validation data, consisting of 2500 documents from each source.
all_test_2500.dat.gz
Test data, consisting of 2500 documents from each source.
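
The record layout inside these files is not described on this page, so a reasonable first step after downloading one is simply to look at its first few lines. The following is a minimal sketch in Python, assuming the files are gzip-compressed text; the helper name peek_gzip_lines and the UTF-8 encoding are illustrative assumptions, not part of the dataset.

```python
import gzip
from itertools import islice


def peek_gzip_lines(path, n=5, encoding="utf-8"):
    """Return the first n lines of a gzip-compressed text file.

    Hypothetical helper: it assumes the file holds line-oriented text,
    which this page does not confirm.
    """
    # errors="replace" keeps the peek from failing on unexpected bytes,
    # since the encoding of the data files is not documented here.
    with gzip.open(path, mode="rt", encoding=encoding, errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Example call, once all_train_1000.dat.gz has been downloaded:
#     for line in peek_gzip_lines("all_train_1000.dat.gz"):
#         print(line[:200])
```

Reading lazily with islice avoids decompressing a whole file just to inspect its format. If the output looks binary rather than textual, the text assumption does not hold and the file needs a different reader.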