I. Preparing the plugin
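Some guides online say the plugin can be installed directly with plugin -install medcl/elasticsearch-analysis-ik. When I ran that, it only downloaded the plugin's source code, and elasticsearch treated it as a _site plugin.
So the only option is to build it with maven and copy the packaged jar file to the parent directory. (Otherwise, defining an analyzer in a mapping fails with a class-not-found error.)
Because IK is a dictionary-based tokenizer, you also need the IK dictionary files. They are included in medcl's elasticsearch-RTF and can also be downloaded from:
http://github.com/downloads/medcl/elasticsearch-analysis-ik/ik.zip
After downloading, unzip the archive into the config directory. At this point you may need to restart elasticsearch so that the analyzer defined in the next step takes effect immediately.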
II. Defining the analyzer
Once the plugin is in place, you can define (declare) this analyzer type in elasticsearch. (The built-in types, such as standard, need no special definition.) As with other settings, an analyzer can be defined at the system level (global to elasticsearch) or at the index level (visible only inside the current index). A system-level definition goes in the elasticsearch.yml file under the config directory, roughly as follows:
index:
  analysis:
    analyzer:
      ikAnalyzer:
        alias: [ik]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
or:
index.analysis.analyzer.ik.type : "ik"
Out of personal preference I did not do it this way. Instead I defined the analyzer on the index that needs Chinese word segmentation, which is more flexible and does not affect other indices.
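Before defining the analyzer, close the index. Closing is not strictly required for the setting to take effect, but I close it first for the sake of data consistency. (On a production system, think twice before doing this.)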
curl -XPOST http://localhost:9400/application/_close
(Here, application is the name of one of my indices.)
Then run:
curl -XPUT localhost:9400/application/_settings -d '
{
  "analysis": {
    "analyzer": {
      "ikAnalyzer": {
        "type": "org.elasticsearch.index.analysis.IkAnalyzerProvider",
        "alias": "ik"
      }
    }
  }
}
'
Reopen the index:
curl -XPOST http://localhost:9400/application/_open
At this point a new analyzer type is defined. The next question is how to use it.
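The close, update, reopen sequence can also be wrapped in a small shell function. This is a minimal sketch and not part of the original setup: the names `run`, `apply_analysis_settings`, `ES_URL`, and `DRY_RUN` are illustrative assumptions. With `DRY_RUN` set, the commands are printed instead of executed, which is a cheap way to review them before touching a live index.

```shell
# Base URL of the node; the port matches the examples in this post.
ES_URL="${ES_URL:-http://localhost:9400}"

# Hypothetical helper: run a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Close the index, apply the analysis settings, then reopen it.
# $1: index name, $2: JSON settings body.
apply_analysis_settings() {
    index="$1"
    settings="$2"
    run curl -s -XPOST "$ES_URL/$index/_close" &&
        run curl -s -XPUT "$ES_URL/$index/_settings" -d "$settings"
    rc=$?
    # Reopen in every case so a failed update does not leave the index closed.
    run curl -s -XPOST "$ES_URL/$index/_open"
    # rc reflects curl's exit status (for example a refused connection);
    # curl does not turn HTTP error responses into failures unless told to.
    return $rc
}
```

Calling `DRY_RUN=1 apply_analysis_settings application "$body"`, with `$body` holding the JSON settings, prints the three curl commands in order without sending anything.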
Alternatively, define the analyzer and the mapping together when creating the index:
curl -XPUT localhost:9200/indexname -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "ik" : {
          "tokenizer" : "ik"
        }
      }
    }
  },
  "mappings" : {
    "article" : {
      "dynamic" : true,
      "properties" : {
        "title" : {
          "type" : "string",
          "analyzer" : "ik"
        }
      }
    }
  }
}'
To get the finest-grained segmentation results (the ik_max_word analyzer, with use_smart set to false), configure elasticsearch.yml as follows:
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_smart:
        type: ik
        use_smart: true
      ik_max_word:
        type: ik
        use_smart: false
III. Using the analyzer
Before applying the analyzer to real data, you can check how it segments text:
http://localhost:9400/application/_analyze?analyzer=ik&text=中文分词
The result is:
{
  "tokens" : [ {
    "token" : "中文",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "CN_WORD",
    "position" : 1
  }, {
    "token" : "分词",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 2
  } ]
}
This is more reasonable than what the standard analyzer produces:
{
  "tokens" : [ {
    "token" : "中",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "<IDEOGRAPHIC>",
    "position" : 1
  }, {
    "token" : "文",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "<IDEOGRAPHIC>",
    "position" : 2
  }, {
    "token" : "分",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "<IDEOGRAPHIC>",
    "position" : 3
  }, {
    "token" : "词",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "<IDEOGRAPHIC>",
    "position" : 4
  } ]
}
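To compare the two analyzers side by side, it can help to reduce each response to its list of tokens. The function below is a text-level sketch built on grep and sed rather than a real JSON parser; `list_tokens` is a hypothetical name, and it assumes token values contain no escaped double quotes.

```shell
# Print one token per line from an _analyze JSON response read on stdin.
# Matches every "token" : "value" pair and strips everything but the value.
list_tokens() {
    grep -o '"token" *: *"[^"]*"' | sed 's/^"token" *: *"//; s/"$//'
}
```

Piping the output of either `_analyze` request above into `list_tokens` prints the tokens one per line, which makes the difference between the ik and standard analyzers easy to see.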
Once the new analyzer is defined and working, it can be referenced in a mapping definition. For example, I define a type like this:
curl localhost:9400/application/article/_mapping -d '
{
  "article": {
    "properties": {
      "description": {
        "type": "string",
        "indexAnalyzer": "ikAnalyzer",
        "searchAnalyzer": "ikAnalyzer"
      },
      "title": {
        "type": "string",
        "indexAnalyzer": "ik",
        "searchAnalyzer": "ik"
      }
    }
  }
}
'
Unfortunately, for an index that already exists, changing a string field from the standard analyzer to another analyzer usually fails:
{
  "error": "MergeMappingException[Merge failed with failures {[mapper [description] has different index_analyzer, mapper [description] has different search_analyzer]}]",
  "status": 400
}
There is no way to resolve this conflict. The only option is to create a new index and give it a mapping that uses the new analyzer (do this before inserting any data, otherwise elasticsearch uses its default analyzer):
curl -XPUT localhost:9400/application/article/_mapping -d '
{
  "article" : {
    "properties" : {
      "description": {
        "type": "string",
        "indexAnalyzer": "ikAnalyzer",
        "searchAnalyzer": "ikAnalyzer"
      },
      "title": {
        "type": "string",
        "indexAnalyzer": "ik",
        "searchAnalyzer": "ik"
      }
    }
  }
}
'
With that, an elasticsearch setup with Chinese word segmentation is complete. If you want a shortcut, download medcl's elasticsearch-RTF and use it directly; most of the required plugins and configuration are already set up in it.
------------
The standard tokenizer (standard) is configured as follows:
curl -XPUT localhost:9200/local -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "stem" : {
          "tokenizer" : "standard",
          "filter" : ["standard", "lowercase", "stop", "porter_stem"]
        }
      }
    }
  },
  "mappings" : {
    "article" : {
      "dynamic" : true,
      "properties" : {
        "title" : {
          "type" : "string",
          "analyzer" : "stem"
        }
      }
    }
  }
}'
index:local
type:article
analyzer:stem (filters: lowercase, stop words, etc.)
field:title
Test:
# Sample Analysis
curl -XGET localhost:9200/local/_analyze?analyzer=stem -d '{Fight for your life}'
curl -XGET localhost:9200/local/_analyze?analyzer=stem -d '{Bruno fights Tyson tomorrow}'
# Index Data
curl -XPUT localhost:9200/local/article/1 -d'{"title": "Fight for your life"}'
curl -XPUT localhost:9200/local/article/2 -d'{"title": "Fighting for your life"}'
curl -XPUT localhost:9200/local/article/3 -d'{"title": "My dad fought a dog"}'
curl -XPUT localhost:9200/local/article/4 -d'{"title": "Bruno fights Tyson tomorrow"}'
# search on the title field, which is stemmed on index and search
curl -XGET localhost:9200/local/_search?q=title:fight
# searching on _all will not do any stemming, unless also configured on the mapping to be stemmed...
curl -XGET localhost:9200/local/_search?q=fight
For example:
Fight for your life
is tokenized as follows:
{"tokens":[
{"token":"fight","start_offset":1,"end_offset":6,"type":"<ALPHANUM>","position":1},
{"token":"your","start_offset":11,"end_offset":15,"type":"<ALPHANUM>","position":3},
{"token":"life","start_offset":16,"end_offset":20,"type":"<ALPHANUM>","position":4}
]}
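Note that `for` is missing from this output and the positions jump from 1 to 3: the `stop` filter removed it. The following is a rough local approximation of the `lowercase` and `stop` steps only, written to make that effect visible. It does not call Elasticsearch, it does not implement Porter stemming, and `STOP_WORDS` is a small illustrative subset rather than the real default list; `lower_and_stop` is a hypothetical name.

```shell
# Illustrative subset of English stop words (assumption, not the full default set).
STOP_WORDS="a an and for in of the to"

# Lowercase the input, split it on spaces, and drop every stop word.
# $1: the text to process. Prints one remaining word per line.
lower_and_stop() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '\n' |
    while read -r word; do
        [ -n "$word" ] || continue
        case " $STOP_WORDS " in
            *" $word "*) ;;                   # stop word: skip it
            *) printf '%s\n' "$word" ;;
        esac
    done
}
```

Running `lower_and_stop 'Fight for your life'` prints fight, your, and life on separate lines, the same three tokens as above. Unlike the real `stem` analyzer, this sketch leaves a word such as fights unchanged, because it has no stemming step.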
-------------------Another article--------------------
Installing the ik analysis plugin for ElasticSearch
I. Introduction to IK
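IK Analyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java. Since version 1.0 was released in December 2006, IKAnalyzer has gone through four major versions. It started as a Chinese segmentation component built around the open-source project Lucene, combining dictionary-based segmentation with grammar analysis algorithms. Starting with version 3.0, IK became a general-purpose segmentation component for Java, independent of the Lucene project, while still providing a default optimized implementation for Lucene. The 2012 version implements a simple disambiguation algorithm for segmentation, marking IK's evolution from pure dictionary-based segmentation toward simulated semantic segmentation.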
IK Analyzer 2012 features:
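1. Uses its own "forward iterative finest-granularity segmentation algorithm" and supports two segmentation modes: fine-grained and smart.
2. Tested in an ordinary PC environment (Core2 i7 3.4G dual core, 4G RAM, Windows 7 64-bit, Sun JDK 1.6_29 64-bit), IK2012 processes 1.6 million characters per second (3000KB/s).
3. The smart segmentation mode of the 2012 version supports simple disambiguation and merged output of numerals and quantifiers.
4. Uses a multi-sub-processor analysis model that supports segmenting English letters, digits, and Chinese words, and is compatible with Korean and Japanese characters.
5. Optimized dictionary storage with a smaller memory footprint. User-defined dictionary extensions are supported. Notably, in the 2012 version the dictionary supports words that mix Chinese, English, and digits.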
II. Installing the IK analysis plugin
This assumes you already have ES installed; if not, see "ElasticSearch Getting Started: Cluster Setup". The resources needed to install the IK analyzer are available for download, and the whole installation takes three steps:
1. Get the analyzer's dependency package
Download the analyzer source with git clone https://github.com/medcl/elasticsearch-analysis-ik, enter the downloaded directory, and run mvn clean package to build elasticsearch-analysis-ik-1.2.5.jar. Copy this jar into ES_HOME/plugins/analysis-ik; if that directory does not exist, create it first.
2. Copy the ik directory
Copy the ik directory from the downloaded directory into ES_HOME/config.
3. Configure the analyzer
Open ES_HOME/config/elasticsearch.yml and append the following at the end of the file:
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
or:
index.analysis.analyzer.default.type: ik
OK, the plugin installation is complete. Restart ES, then test how ik segments text.
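The three installation steps can be collected into one shell function. This is a sketch under stated assumptions, not the plugin's official installer: `install_ik` is a hypothetical name, the jar and dictionary locations are passed in by the caller because they depend on how the source was built, and the appended setting is the single-line form `index.analysis.analyzer.ik.type: "ik"` rather than the full YAML block, so that an existing `index:` section in the file is not duplicated.

```shell
# Sketch of the three installation steps as one function.
# $1: ES_HOME, $2: path to the built jar, $3: path to the ik dictionary directory.
install_ik() {
    es_home="$1"
    jar="$2"
    ik_dir="$3"
    yml="$es_home/config/elasticsearch.yml"
    line='index.analysis.analyzer.ik.type: "ik"'

    # Step 1: copy the jar, creating the plugin directory if it is missing.
    mkdir -p "$es_home/plugins/analysis-ik" "$es_home/config" || return 1
    cp "$jar" "$es_home/plugins/analysis-ik/" || return 1

    # Step 2: copy the dictionary directory into ES_HOME/config.
    cp -R "$ik_dir" "$es_home/config/" || return 1

    # Step 3: append the analyzer setting once; skip it if already present.
    if ! grep -qF "$line" "$yml" 2>/dev/null; then
        printf '%s\n' "$line" >> "$yml" || return 1
    fi
}
```

A call would look like `install_ik "$ES_HOME" path/to/elasticsearch-analysis-ik-1.2.5.jar path/to/ik`, with both paths adjusted to your checkout. The function can be run more than once without adding the setting twice, and ES still has to be restarted afterwards.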
III. Testing ik segmentation
1. Create an index named index.
curl -XPUT http://localhost:9200/index
2. Create a mapping for the index.
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
  "fulltext": {
    "_all": {
      "analyzer": "ik"
    },
    "properties": {
      "content": {
        "type" : "string",
        "boost" : 8.0,
        "term_vector" : "with_positions_offsets",
        "analyzer" : "ik",
        "include_in_all" : true
      }
    }
  }
}'
3. Test
curl 'http://localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d '
{
"text":"世界如此之大"
}'
The output is as follows:
{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 4,
    "end_offset" : 8,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "世界",
    "start_offset" : 11,
    "end_offset" : 13,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "如此",
    "start_offset" : 13,
    "end_offset" : 15,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "之大",
    "start_offset" : 15,
    "end_offset" : 17,
    "type" : "CN_WORD",
    "position" : 4
  } ]
}
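The first token in this output, `text`, is not part of the sentence. The offsets suggest why: the request body, JSON key and punctuation included, was analyzed as plain text, so `text` sits at offsets 4 to 8 and the sentence only starts at offset 11. Below is a minimal sketch of a request that sends the raw sentence as the body instead, in the same way as the `{Fight for your life}` example earlier. `run`, `analyze_text`, `ES_URL`, and `DRY_RUN` are hypothetical names, and the behaviour should be checked against your Elasticsearch version.

```shell
# Base URL of the node; the port matches the test commands in this article.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: run a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Analyze text with a given analyzer, sending the text itself as the request body.
# $1: index name, $2: analyzer name, $3: text to analyze.
analyze_text() {
    run curl -s "$ES_URL/$1/_analyze?analyzer=$2&pretty=true" -d "$3"
}
```

With this, `analyze_text index ik '世界如此之大'` sends only the sentence, so the token list should no longer start with `text` and the offsets should start at 0.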