Information about clinic experts is an important reference for residents who need hospital care. Such information usually resides in the deep web and cannot be indexed directly by search engines. To extract clinic expert information from the deep web, the first challenge is to judge whether a web form is a relevant search interface. This paper proposes a novel method based on a domain model, a tree structure constructed from the attributes of search interfaces. With this model, search interfaces can be classified into a domain and filled in with domain keywords. The second challenge is to extract information from the web pages returned through the search interfaces. To filter out noise on a web page, a block importance model is proposed. The experimental results indicated that the domain model yielded a precision 10.83% higher than that of the rule-based method, whereas the block importance model yielded an F1 measure 10.5% higher than that of the XPath method.
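As an illustration of how a tree-structured domain model might be used to classify a search form, the following is a minimal Python sketch. It is not the method evaluated in the paper: the class `AttributeNode`, the functions `domain_score` and `is_domain_form`, the example attribute labels, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeNode:
    """One node of a hypothetical domain tree: an attribute name plus aliases."""
    name: str
    aliases: set[str] = field(default_factory=set)
    children: list["AttributeNode"] = field(default_factory=list)

    def labels(self) -> set[str]:
        """Collect every label (name and aliases) in this subtree, lower-cased."""
        found = {self.name.lower()} | {a.lower() for a in self.aliases}
        for child in self.children:
            found |= child.labels()
        return found


def domain_score(form_labels: list[str], root: AttributeNode) -> float:
    """Fraction of the form's field labels that match a label in the domain tree."""
    if not form_labels:
        return 0.0
    known = root.labels()
    hits = sum(1 for label in form_labels if label.strip().lower() in known)
    return hits / len(form_labels)


def is_domain_form(form_labels: list[str], root: AttributeNode,
                   threshold: float = 0.5) -> bool:
    """Assumed decision rule: accept the form if enough of its labels match."""
    return domain_score(form_labels, root) >= threshold


# Illustrative domain tree; the attribute names are assumptions, not from the paper.
clinic_expert = AttributeNode(
    "clinic expert",
    children=[
        AttributeNode("department", {"dept", "specialty"}),
        AttributeNode("expert name", {"doctor", "physician"}),
        AttributeNode("title", {"professional title"}),
    ],
)

search_form = ["Department", "Doctor", "Title"]
login_form = ["Username", "Password"]

print(is_domain_form(search_form, clinic_expert))
print(is_domain_form(login_form, clinic_expert))
```

In this sketch a form whose field labels mostly match the tree is treated as a search interface of the domain, while an unrelated form such as a login form is rejected. A real system would likely need fuzzier label matching and a tuned threshold.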
Citation:
ZHANG Yuanpeng, DONG Jiancheng, QIAN Danmin, GENG Xingyun, WU Huiqun, WANG Li. Study on Information Extraction of Clinic Expert Information from Hospital Portals. Journal of Biomedical Engineering, 2015, 32(6): 1249-1254. doi: 10.7507/1001-5515.20150222
Copyright © the editorial department of Journal of Biomedical Engineering of West China Medical Publisher. All rights reserved