We collected relevance judgments for a number of pairs of consumer medical questions and technical medical documents.
Consumer Medical Information Retrieval Relevance Judgments by Steven P. Crain and Hongyuan Zha is licensed under a Creative Commons Attribution 3.0 Unported License.
If you use this data, please cite:
- Steven P. Crain, Shuang-Hong Yang, Hongyuan Zha and Yu Jiao, "Dialect topic modeling for improved consumer medical search," in Proceedings of the AMIA Annual Symposium 2010, Washington, D.C.: American Medical Informatics Association, 2010.
BibTeX format:
@INPROCEEDINGS{Crain2010b,
  author = {Steven P. Crain and Shuang-Hong Yang and Hongyuan Zha and Yu Jiao},
  title = {Dialect topic modeling for improved consumer medical search},
  booktitle = {Proceedings of the AMIA Annual Symposium 2010},
  year = {2010},
  address = {Washington, D.C.},
  organization = {American Medical Informatics Association}
}
Our documents come from five sources:
- Yahoo! Answers
  - Questions and their corresponding answers, collected from the Yahoo! Answers health categories.
- Centers for Disease Control and Prevention
- WebMD
- Medical Subject Headings
- PubMed Central Open Access Subset
The following data files are currently available:
- all_train_1000.dat.gz
  - Training data, consisting of 1000 documents from each source.
- all_validation_2500.dat.gz
  - Validation data, consisting of 1000 documents from each source.
- all_test_2500.dat.gz
  - Test data, consisting of 1000 documents from each source.
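The record layout inside the .dat files is not described here. As a minimal sketch, assuming only that each file is gzip-compressed text (the UTF-8 encoding is also an assumption), the Python below previews the first few lines of each file and counts its lines so you can inspect the format before writing a parser. The helpers `preview` and `count_lines` are hypothetical names, and a line count is not necessarily a document count.

```python
import gzip
import itertools
import os

# File names as listed above; the helpers below assume nothing about record structure.
FILES = ("all_train_1000.dat.gz", "all_validation_2500.dat.gz", "all_test_2500.dat.gz")


def preview(path, n=5):
    """Return the first n lines of a gzip-compressed text file (assumed UTF-8)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


def count_lines(path):
    """Count lines in a gzip-compressed text file without loading it all into memory."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    for name in FILES:
        if not os.path.exists(name):
            print(f"{name}: not found, skipping")
            continue
        print(f"{name}: {count_lines(name)} lines")
        for line in preview(name, 3):
            print("   ", line[:80])  # truncate long lines for display
```

Using `errors="replace"` keeps the preview from failing if the files turn out not to be UTF-8; once the actual encoding and layout are known, a stricter reader is preferable.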