
Installing Elasticsearch with the ik and pinyin analyzers in Docker via a Dockerfile



## Overview

This post installs the ik and pinyin analysis plugins for Elasticsearch. The plugin version must match the Elasticsearch version.

ik analyzer: https://github.com/medcl/elasticsearch-analysis-ik/

pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin/

This post uses Elasticsearch 5.6.9.

## Getting started

Pull the image:

```shell
docker pull elasticsearch:5.6.9
```

Download the plugin packages:

```shell
mkdir docker   # create a working directory first
cd docker      # work inside it so the plugins and the Dockerfile share one build context
# Download the ik plugin
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.9/elasticsearch-analysis-ik-5.6.9.zip
# Unpack it
unzip elasticsearch-analysis-ik-5.6.9.zip -d analysis-ik
# Download the pinyin plugin
wget https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v5.6.9/elasticsearch-analysis-pinyin-5.6.9.zip
# Unpack it
unzip elasticsearch-analysis-pinyin-5.6.9.zip -d analysis-pinyin
```

Create the Dockerfile:

```dockerfile
FROM elasticsearch:5.6.9
ADD analysis-ik /usr/share/elasticsearch/plugins/analysis-ik
ADD analysis-pinyin /usr/share/elasticsearch/plugins/analysis-pinyin
```

Build the image:

```shell
docker build -f Dockerfile -t elasticsearch-ik-pinyin:5.6.9 .
```

A successful build prints:

```shell
root@Alone88-Uos:~/docker/els6# docker build -f Dockerfile -t elasticsearch-ik-pinyin:5.6.9 .
Sending build context to Docker daemon 18.01MB
Step 1/3 : FROM elasticsearch:5.6.9
 ---> 5c1e1ecfe33a
Step 2/3 : ADD analysis-ik /usr/share/elasticsearch/plugins/analysis-ik
 ---> 883cd55df8a8
Step 3/3 : ADD analysis-pinyin /usr/share/elasticsearch/plugins/analysis-pinyin
 ---> 8c9220f304be
Successfully built 8c9220f304be
Successfully tagged elasticsearch-ik-pinyin:5.6.9
```

## Creating the container

```shell
docker run -e ES_JAVA_OPTS="-Xms256m -Xmx256m" -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name elasticsearch_test elasticsearch-ik-pinyin:5.6.9
```

`-e ES_JAVA_OPTS="-Xms256m -Xmx256m"` sets the JVM heap size Elasticsearch starts with. Without it, the heap falls back to the value in `jvm.options`, which is 2 GB in a stock Elasticsearch 5.x install.

`-e "discovery.type=single-node"` runs Elasticsearch as a single node.

`elasticsearch-ik-pinyin:5.6.9` is the name and tag of the image built above.

## Testing the analyzers

### Testing pinyin

Request URL: http://127.0.0.1:9200/_analyze

Method: POST

Request body:

```json
{
  "text": "中华人民共和国国徽",
  "analyzer": "pinyin"
}
```

Response:

```json
{
  "tokens": [
    { "token": "zhong", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0 },
    { "token": "zhrmghggh", "start_offset": 0, "end_offset": 9, "type": "word", "position": 0 },
    { "token": "hua", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1 },
    { "token": "ren", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2 },
    { "token": "min", "start_offset": 3, "end_offset": 4, "type": "word", "position": 3 },
    { "token": "gong", "start_offset": 4, "end_offset": 5, "type": "word", "position": 4 },
    { "token": "he", "start_offset": 5, "end_offset": 6, "type": "word", "position": 5 },
    { "token": "guo", "start_offset": 6, "end_offset": 7, "type": "word", "position": 6 },
    { "token": "guo", "start_offset": 7, "end_offset": 8, "type": "word", "position": 7 },
    { "token": "hui", "start_offset": 8, "end_offset": 9, "type": "word", "position": 8 }
  ]
}
```

### Testing ik

The `analyzer` field accepts `chinese`, `ik_max_word`, or `ik_smart`. `chinese` is the option that ships with ES, while `ik_max_word` (finest-grained segmentation) and `ik_smart` (coarsest segmentation) are provided by the ik Chinese analyzer.

Request URL: http://127.0.0.1:9200/_analyze

Method: POST

Request body:

```json
{
  "text": "中华人民共和国国徽",
  "analyzer": "ik_max_word"
}
```

Response:

```json
{
  "tokens": [
    { "token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0 },
    { "token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2 },
    { "token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3 },
    { "token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4 },
    { "token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5 },
    { "token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6 },
    { "token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7 },
    { "token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8 },
    { "token": "国徽", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9 }
  ]
}
```

**Note: with either the pinyin analyzer or the IK analyzer, a document can only be found by a search if its data was processed by that analyzer when it was indexed; otherwise the search will not match it.**

## Combining the IK and pinyin analyzers

```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_smart_pinyin": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin", "word_delimiter"]
        },
        "ik_max_word_pinyin": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": true,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  }
}
```

When you create a type, set the `analyzer` property of each text field to the custom analyzer you defined:

```json
PUT /my_index/my_type/_mapping
{
  "my_type": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_smart_pinyin"
      }
    }
  }
}
```
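
To check that the custom analyzers are wired up, you can run the `_analyze` API against the index itself from the command line. The following is a minimal sketch, not part of the original setup: the helper names `analyze_body` and `analyze_index` and the `ES_URL` variable are made up for illustration, and it assumes `curl` is installed, the container from this post is listening on port 9200, and the text contains no double quotes or backslashes (they are not escaped).

```shell
# Base URL of the test container; override ES_URL if yours differs (assumption).
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Hypothetical helper: build the JSON body for the _analyze API.
# $1 = analyzer name, $2 = text (must not contain " or \, nothing is escaped here).
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# Hypothetical helper: POST the body to <index>/_analyze so that analyzers
# defined in that index's settings (e.g. ik_smart_pinyin) can be resolved.
# $1 = index name, $2 = analyzer name, $3 = text.
analyze_index() {
  curl -s -X POST "$ES_URL/$1/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$2" "$3")"
}
```

For example, `analyze_index my_index ik_smart_pinyin "中华人民共和国国徽"` should return a `tokens` array if the index settings were applied. If the request fails with an unknown-analyzer error, `curl -s "$ES_URL/_cat/plugins"` lists the plugins the node actually loaded.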
