We collected relevance judgments for pairs of consumer medical questions and technical medical documents.

Consumer Medical Information Retrieval Relevance Judgments by Steven P. Crain and Hongyuan Zha is licensed under a Creative Commons Attribution 3.0 Unported License.
If you use this data, please cite:
- Steven P. Crain, Shuang-Hong Yang, Hongyuan Zha and Yu Jiao, "Dialect topic modeling for improved consumer medical search," in Proceedings of the AMIA Annual Symposium 2010, Washington, D.C.: American Medical Informatics Association, 2010.
BibTeX format:
@INPROCEEDINGS{Crain2010b,
author = {Steven P. Crain and Shuang-Hong Yang and Hongyuan Zha and Yu Jiao},
title = {Dialect topic modeling for improved consumer medical search},
booktitle = {Proceedings of the AMIA Annual Symposium 2010},
year = {2010},
address = {Washington, D.C.},
organization = {American Medical Informatics Association}
}
Our documents come from five sources:
- Yahoo! Answers
  - Questions and corresponding answers collected from Yahoo! Answers health categories.
- Centers for Disease Control and Prevention
- WebMD
- Medical Subject Headings
- PubMed Central Open Access Subset
The following data files are currently available:
- all_train_1000.dat.gz
  - Training data, consisting of 1000 documents from each source.
- all_validation_2500.dat.gz
  - Validation data, consisting of 2500 documents from each source.
- all_test_2500.dat.gz
  - Test data, consisting of 2500 documents from each source.
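
The files are gzip-compressed, so they can be inspected without unpacking them first. The following is a minimal Python sketch for peeking at the first few lines of one file. The internal layout of the .dat files is not described here, so the sketch only assumes that the content is line-oriented text; the helper name peek_gzip_lines, the UTF-8 encoding, and the local file path are all assumptions made for illustration.

```python
import gzip
import itertools
import os


def peek_gzip_lines(path, n=5, encoding="utf-8"):
    """Return the first n lines of a gzip-compressed text file.

    Hypothetical helper: it assumes line-oriented text and makes no
    claim about the actual record format of the .dat files.
    """
    # errors="replace" keeps the read from failing if the encoding guess is wrong.
    with gzip.open(path, mode="rt", encoding=encoding, errors="replace") as handle:
        return [line.rstrip("\n") for line in itertools.islice(handle, n)]


if __name__ == "__main__":
    # Assumed location: the file has been downloaded to the current directory.
    path = "all_train_1000.dat.gz"
    if os.path.exists(path):
        for line in peek_gzip_lines(path):
            # Truncate long lines so the preview stays readable.
            print(line[:200])
```

Reading lazily with itertools.islice avoids decompressing the whole file just to look at its first records.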