InchWorm Crawling Project: Personalized Social Health Recommendation


InchWorm is a crawler used by researchers in the Data Mining and Information Retrieval Laboratory at Georgia Institute of Technology. The crawler is collecting data for the Personalized Social Health Recommendation research project.


This project has been approved by the Georgia Institute of Technology Institutional Review Board as protocol H11049. It has also been approved by the Diabetes Hands Foundation. Changes to the site are crawled routinely every Thursday.


This research is being conducted by Dr. Steven P. Crain. If you have concerns about the manner in which this study is conducted, you may also contact Ms. Melanie Clark, Georgia Institute of Technology Office of Research Compliance, at (404) 894-6941.


The purpose of this study is to discover ways that a computer can help people find useful health discussion groups, interesting discussions and possible friends or mentors.

What will be crawled

We will crawl the member profiles, groups and discussion threads in the on-line community. Only data that is publicly available under each member's privacy settings will be crawled.

What will be extracted

We will not store all of the data that is crawled. Groups with fewer than 25 members will be excluded entirely. For each user, we will record only the week in which the user's first public activity occurred, the user's group memberships, and the identifiers of the discussions in which the user participated.
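
The extraction rules above can be illustrated with a minimal sketch. It is not the project's actual code: the names UserRecord, week_start and extract_user are hypothetical, and two details are assumptions the text does not settle, namely that a week starts on Monday and that discussion identifiers are kept regardless of which group they belong to.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MIN_GROUP_MEMBERS = 25  # threshold stated in the text


@dataclass(frozen=True)
class UserRecord:
    """The only per-user fields the text says are recorded (hypothetical layout)."""
    first_activity_week: date   # first day of the week of first public activity
    groups: frozenset           # memberships in groups that were not excluded
    discussion_ids: frozenset   # discussions in which the user participated


def week_start(day: date) -> date:
    """Reduce an exact date to its week (assumption: weeks start on Monday)."""
    return day - timedelta(days=day.weekday())


def extract_user(first_public_activity, memberships, group_sizes, discussion_ids):
    """Keep only the coarse fields; drop groups with fewer than 25 members."""
    kept_groups = frozenset(
        g for g in memberships
        if group_sizes.get(g, 0) >= MIN_GROUP_MEMBERS
    )
    return UserRecord(
        first_activity_week=week_start(first_public_activity),
        groups=kept_groups,
        discussion_ids=frozenset(discussion_ids),
    )
```

Storing the week rather than the exact date means that a stored record cannot be matched to a single dated post on the public site.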


Usernames will be replaced with identifiers that are cryptographically derived from them. Because the same username always yields the same identifier, we can merge in additional data at later stages of the project, while user privacy remains protected from attackers should they gain access to the data. The data will be used only for research aimed at enhancing on-line social networking.
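
One common way to build identifiers with these properties is a keyed hash. The sketch below is an illustration under stated assumptions, not the scheme the project uses: the text does not name an algorithm, so HMAC-SHA256 is an assumption, and the function name pseudonymize and the secret_key parameter are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(username: str, secret_key: bytes) -> str:
    """Map a username to a stable identifier using a keyed hash.

    Assumption: HMAC-SHA256 with a key stored separately from the data.
    The same username and key always give the same identifier, so later
    crawls can be merged; without the key, identifiers cannot be
    recomputed from a list of candidate usernames.
    """
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

An unkeyed hash would be weaker here, since anyone with the data could hash the public usernames and match them against the stored identifiers. Keeping the key apart from the stored data is what limits the damage if the data alone is exposed.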