Author: CodingGorit
Date: October 22, 2020
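Note: study notes from the Bilibili course by 狂神说 (KuangShen): learning ElasticSearch
Use ES for search-related features, especially when the data volume is large.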
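Lucene is an information-retrieval toolkit: a JAR library, not a complete search-engine system. (Solr, like ElasticSearch, is a search engine built on top of it.)
It contains index structures, tools for reading and writing indexes, sorting and search rules, and other utility classes.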
Relationship between Lucene and ElasticSearch:
ElasticSearch wraps and enhances Lucene.
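ElasticSearch, ES for short, is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time. ES is written in Java and uses Lucene at its core to implement all of its indexing and search features. Its goal is to hide Lucene's complexity behind a simple RESTful API, which makes full-text search easy.

Installation: download and unzip.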
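Directory layout:

- bin: startup scripts
- config: configuration files
  - log4j2.properties: logging configuration
  - jvm.options: JVM-related settings
  - elasticsearch.yml: the ElasticSearch configuration file
- lib: related JAR files
- logs: log files
- modules: feature modules
- plugins: plugins (e.g. ik)

Start ES with bin/elasticsearch; it listens on port 9200. Test access at localhost:9200.

Install the elasticsearch-head visualization plugin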
Download: https://github.com/mobz/elasticsearch-head/

Start it with:

```
npm install
npm run start
```

Then enable CORS in elasticsearch.yml so that the head plugin can reach ES:

```yaml
http.cors.enabled: true
http.cors.allow-origin: "*"
```

Install Kibana
Download and unzip. For the Chinese UI (internationalization), open config/kibana.yml and change the last line to i18n.locale: "zh-CN".
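What are clusters, nodes, indices, types, documents, shards, and mappings?

ElasticSearch is document-oriented. Below is an objective comparison of a relational database and ElasticSearch. In ES, everything is JSON.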
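Term mapping

| ElasticSearch | Relational DB |
|---|---|
| index (indices) | database |
| types | tables |
| documents | rows |
| fields | columns |

An ElasticSearch cluster can contain multiple indices (databases); each index can contain multiple types (tables); each type contains multiple documents (rows); and each document contains multiple fields (columns). Types are deprecated in ES 7.x, which is why the responses below carry a "types removal" deprecation warning, and they are removed in 8.x.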
A single ElasticSearch node already forms a cluster.
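Documents are the individual records, e.g. user documents such as zs (age 15) and ls (age 22).
Field types are detected automatically (e.g. string).
An index is the equivalent of a database.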
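IK analyzer: put the downloaded plugin into the plugins directory.
(Skipped here; see episode 8 of the course.)
The elasticsearch-plugin command shows which plugins are loaded, e.g. elasticsearch-plugin list.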
ik_smart: coarsest segmentation (fewest tokens); ik_max_word: finest-grained segmentation (most tokens).
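To see why the two modes return different numbers of tokens, here is a toy sketch of dictionary-based segmentation. It is not the IK analyzer's actual algorithm: the hypothetical `ToySegmenter` uses greedy forward maximum matching for the coarse mode and emits every dictionary word at every position for the fine mode. Chinese text has no spaces between words, which is why IK relies on a dictionary; the example uses a run-together English string and a made-up three-word dictionary as a stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy dictionary-based segmenter (hypothetical; NOT the IK analyzer's real algorithm).
class ToySegmenter {
    private final Set<String> dict;
    private final int maxLen; // length of the longest dictionary word

    ToySegmenter(Set<String> dict) {
        this.dict = dict;
        int m = 1;
        for (String w : dict) {
            m = Math.max(m, w.length());
        }
        this.maxLen = m;
    }

    // Coarse mode, in the spirit of ik_smart: greedy forward maximum matching.
    // At each position take the longest dictionary word; if none matches,
    // emit a single character and move on.
    List<String> smart(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            String token = text.substring(i, i + 1);
            for (int j = Math.min(text.length(), i + maxLen); j > i + 1; j--) {
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    token = candidate;
                    break;
                }
            }
            out.add(token);
            i += token.length();
        }
        return out;
    }

    // Fine-grained mode, in the spirit of ik_max_word: emit every dictionary word
    // that starts at every position, so tokens may overlap.
    List<String> maxWord(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= Math.min(text.length(), i + maxLen); j++) {
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }
}
```

With the dictionary {elastic, search, elasticsearch}, `smart("elasticsearch")` yields [elasticsearch], while `maxWord("elasticsearch")` yields [elastic, elasticsearch, search]: more, overlapping tokens, which improves recall at the cost of a larger index.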
Testing in Kibana
Custom dictionary words
Basic REST commands
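| Method | URL | Description |
|---|---|---|
| PUT | localhost:9200/{index}/{type}/{id} | Create a document (with the given id) |
| POST | localhost:9200/{index}/{type} | Create a document (random id) |
| POST | localhost:9200/{index}/{type}/{id}/_update | Update a document |
| DELETE | localhost:9200/{index}/{type}/{id} | Delete a document |
| GET | localhost:9200/{index}/{type}/{id} | Get a document by id |
| POST | localhost:9200/{index}/{type}/_search | Query all data |

Basic test

Response: the data was added successfully.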
```
#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 1,        // version: incremented on every change
  "result" : "created",  // operation status
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
```

Creating an index with field rules (a mapping):

```
PUT /test1/
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "age": { "type": "long" },
      "birthday": { "type": "date" }
    }
  }
}
```

Response:
```
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test1"
}
```

If a field's type is not specified, ES assigns a default type automatically.
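A simplified sketch of how dynamic mapping picks a type for each field of a flat JSON document, following ES's documented dynamic field mapping rules: booleans become `boolean`, whole numbers `long`, floating-point numbers `float`, and strings `text` with a `keyword` sub-field. The class and the `"text+keyword"` label are hypothetical, and date detection, numeric detection, objects and arrays are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of ES dynamic field mapping for a flat document.
// Assumptions: date/numeric detection, objects and arrays are not modelled.
class DynamicMappingSketch {

    // Returns the type ES would pick for a JSON value of this Java type.
    static String inferType(Object value) {
        if (value instanceof Boolean) {
            return "boolean";
        }
        if (value instanceof Byte || value instanceof Short
                || value instanceof Integer || value instanceof Long) {
            return "long";   // whole numbers are mapped as long
        }
        if (value instanceof Float || value instanceof Double) {
            return "float";  // floating-point numbers are mapped as float
        }
        if (value instanceof String) {
            return "text+keyword"; // a text field plus a .keyword sub-field
        }
        throw new IllegalArgumentException("not covered by this sketch: " + value);
    }

    // Builds a field -> type map; null values are skipped because ES adds no field for them.
    static Map<String, String> infer(Map<String, Object> doc) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (e.getValue() == null) {
                continue;
            }
            mapping.put(e.getKey(), inferType(e.getValue()));
        }
        return mapping;
    }
}
```

For a user document with "name": "Gorit" and "age": 18, `infer` maps name to text+keyword and age to long.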
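Extension: the _cat/ APIs expose a lot of information about the current state of ES:

```
GET _cat/health
GET _cat/indices?v
```

To modify a document, submit a PUT again; the new body overwrites the existing document.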
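PUT overwrites the whole stored document, while POST .../_update with a "doc" body merges only the listed fields into it. The hypothetical in-memory sketch below models just that difference with plain maps; it is not ES client code, and it handles top-level fields only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the two update styles (not the ES client API).
// Assumption: documents are flat maps; only top-level fields are handled.
class UpdateSemantics {

    // PUT /index/_doc/id: the request body replaces the stored document entirely,
    // so any field missing from the body is gone afterwards.
    static Map<String, Object> put(Map<String, Object> stored, Map<String, Object> body) {
        return new HashMap<>(body); // the stored version is discarded
    }

    // POST /index/_update/id with {"doc": {...}}: the listed fields are merged
    // into the stored document and every other field is kept.
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }
}
```

Starting from {name=Gorit, age=18, gender=male}, `put` with {name=coco} leaves only the name field, whereas `partialUpdate` keeps age and gender.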
Updating data

```
PUT /test/type1/1
{
  "name": "Gorit111",
  "age": 18,
  "gender": "male"
}
```

Result of the update:

```
#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}
```

The newer way: update with a POST command

```
POST /test/_doc/1/_update
{
  "doc": {
    "name": "张三"
  }
}
// In ES 7.x the typeless form of this request is POST /test/_update/1

// Result (the document read back after the update)
{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "张三",
    "age" : 18,
    "gender" : "male"
  }
}
```

Deleting an index
```
DELETE test
```

DELETE performs the deletion; whether an index or a document is deleted is determined by the request URL.
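Whether a request hits a whole index or a single document is decided purely by its path. A small hypothetical helper that builds the typeless ES 7.x paths suggested by the deprecation warning makes the difference explicit; it only builds strings and assumes ids are plain URL-safe ASCII (no encoding is done).

```java
// Hypothetical helper building the typeless ES 7.x paths named in the deprecation
// warning. It only builds strings; no HTTP call is made.
// Assumption: index names and ids are already URL-safe, so nothing is encoded.
class EsPaths {

    // PUT creates, DELETE drops the whole index.
    static String index(String index) {
        return "/" + index;
    }

    // PUT (create/overwrite), GET or DELETE a single document.
    static String doc(String index, String id) {
        return "/" + index + "/_doc/" + id;
    }

    // POST creates a document with a random id.
    static String docAutoId(String index) {
        return "/" + index + "/_doc";
    }

    // PUT creates a document only if the id does not exist yet.
    static String create(String index, String id) {
        return "/" + index + "/_create/" + id;
    }

    // POST with {"doc": {...}} performs a partial update.
    static String update(String index, String id) {
        return "/" + index + "/_update/" + id;
    }

    // GET or POST a search on the index.
    static String search(String index) {
        return "/" + index + "/_search";
    }
}
```

DELETE on `EsPaths.index("test")` drops the whole test index, while DELETE on `EsPaths.doc("test", "1")` removes one document; `EsPaths.update("test", "1")` is the typeless form of the POST /test/_doc/1/_update call used earlier.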
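Result of creating document 1 in gorit/user:

```
#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "gorit",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
```

Getting data

```
GET /gorit/user/_search   # query all documents
GET /gorit/user/1         # query a single document
```

Updating data with PUT

```
PUT /gorit/user/3
{
  "name": "李四222",
  "age": 20,
  "desc": "Java开发工程师",
  "tags": ["Python", "Java"]
}
# PUT replaces the whole document: any field missing from the body is lost
```

Updating with POST _update (the recommended way)

```
# Without _update this behaves like PUT: the document is replaced by one whose
# only field is "doc", so the other fields are lost
POST /gorit/user/1
{
  "doc": { "name": "coco" }
}

# With _update only the listed fields change and the rest are kept, which is also more efficient
POST /gorit/user/1/_update
{
  "doc": { "name": "coco" }
}
```

Simple search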
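```
# Get one document
GET /gorit/user/1
# Get all documents
GET /gorit/user/_search
# Conditional query
GET /gorit/user/_search?q=name:coco
```

With dynamic mapping, a string field such as name is mapped as text with a keyword sub-field (name.keyword). A keyword field only matches the whole value exactly, while a text field is analyzed first, so matching on individual words works.

Use the from and size fields for paginated queries; this works just like MySQL's LIMIT offset, pageSize.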
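from is the offset of the first hit to return (not a page number) and size is how many hits to return, so page n starts at from = (n - 1) * size:

```
GET /gorit/user/_search
{
  "query": {
    "match": { "name": "李四" }
  },
  "sort": [
    { "age": { "order": "desc" } }
  ],
  "from": 0,
  "size": 1
}
```

must works like AND: every condition has to match, as in WHERE id = 1 AND name = xxx.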
```
# Boolean query
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "gorit" } },
        { "match": { "age": "16" } }
      ]
    }
  }
}
```

A document is returned only if it matches both conditions at the same time.
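```
# Separate multiple terms with spaces: a document matching any one of them is returned,
# and the score shows how well it matches
GET /gorit/user/_search
{
  "query": {
    "match": { "tags": "Java Python" }
  }
}
```

A term query looks up the exact term directly in the inverted index.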
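To make the term/match/must difference concrete, here is a hypothetical in-memory inverted index that maps each term to a sorted set of document ids. Its "analyzer" is an assumption (it only lowercases and splits on whitespace), and the per-document hit count returned by `match` is a crude stand-in for ES's relevance score (ES 7 scores with BM25 by default).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory inverted index: term -> sorted set of document ids.
// Assumption: the "analyzer" only lowercases and splits on whitespace.
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Indexing: analyze the field value and record the document id under each term.
    void add(int docId, String text) {
        for (String t : analyze(text)) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    // term query: the query string is looked up as-is, without analysis.
    Set<Integer> term(String exactTerm) {
        return postings.getOrDefault(exactTerm, Collections.emptySet());
    }

    // match query: analyze the query, OR the terms together and count how many
    // query terms each document contains (a crude stand-in for the score).
    Map<Integer, Integer> match(String query) {
        Map<Integer, Integer> hits = new TreeMap<>();
        for (String t : new LinkedHashSet<>(analyze(query))) {
            for (int id : term(t)) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        return hits;
    }

    // bool/must: every analyzed query term has to be present (set intersection).
    Set<Integer> mustAll(String query) {
        Set<Integer> result = null;
        for (String t : analyze(query)) {
            if (result == null) {
                result = new TreeSet<>(term(t));
            } else {
                result.retainAll(term(t));
            }
        }
        return result == null ? Collections.<Integer>emptySet() : result;
    }
}
```

After indexing 1 as "Java Python", 2 as "Python" and 3 as "Go", `term("java")` finds {1} but `term("Java")` finds nothing, because the term query is not analyzed while the indexed terms were lowercased. `match("Java Python")` returns {1=2, 2=1}, so document 1 ranks first, and `mustAll("Java Python")` returns only {1}, like bool/must.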
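About analysis (tokenization)
term: an exact query; the query string is not analyzed.
match: uses the analyzer; the query text is first analyzed into terms, and those terms are then looked up.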
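Conclusion:

- text fields are analyzed, i.e. split into terms;
- keyword fields are never split.

MySQL can also do the following, only less efficiently:

- matching
- matching by conditions
- exact matching
- range matching
- filtering the returned fields
- multi-condition queries
- highlighted queries
- inverted index
- for details, consult the official documentation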
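A hypothetical sketch of the text/keyword conclusion above: the same value yields several lowercase terms when indexed as text but exactly one verbatim term as keyword, and a term query only hits when the query string equals one of those terms. The tokenizer (lowercase, split on anything that is not a letter or digit) is an assumed approximation of the standard analyzer for plain English input, not its real implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: how a "text" field and a "keyword" field index the same value.
// Assumption: the text tokenizer only approximates ES's standard analyzer for English input.
class FieldTypeSketch {

    // text: lowercase, then split on runs of characters that are not letters or digits.
    static List<String> indexAsText(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // keyword: the whole value becomes a single term, stored verbatim.
    static List<String> indexAsKeyword(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    // A term query is not analyzed: it matches only if the exact string is an indexed term.
    static boolean termMatches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }
}
```

Indexed as text, "Java-Developer Gorit" becomes [java, developer, gorit], so a term query for "Gorit" misses while "gorit" hits; indexed as keyword, only the full string "Java-Developer Gorit" matches. This is also why `termQuery("title", keyword)` in the service code below only finds documents when the keyword equals a single indexed term.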
Hands-on tests
Operations covered: create an index, check whether an index exists, delete an index, create documents, and operate on documents.

Maven dependency:

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
```

Core code:

```java
package cn.gorit;

import cn.gorit.pojo.User;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * ES 7.6.2 API tests
 */
@SpringBootTest
class DemoApplicationTests {

    // Inject the client by bean name
    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;

    @Test
    void contextLoads() {
    }

    // Create an index
    @Test
    void testCreateIndex() throws IOException {
        // 1. Build the create-index request, equivalent to PUT /gorit_index
        CreateIndexRequest request = new CreateIndexRequest("gorit_index");
        // 2. Execute it through the IndicesClient and get the response
        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(response);
    }

    // Get an index to check whether it exists
    @Test
    void testGetIndexExist() throws IOException {
        GetIndexRequest request = new GetIndexRequest("gorit_index");
        boolean exist = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exist);
    }

    // Delete an index
    @Test
    void testDeleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("gorit_index");
        AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(delete.isAcknowledged());
    }

    // Add a document
    @Test
    void testAddDocument() throws IOException {
        // The object to store
        User u = new User("Gorit", 3);
        // The request; rule: PUT /gorit_index/_doc/1
        IndexRequest request = new IndexRequest("gorit_index");
        request.id("1");
        request.timeout(TimeValue.timeValueSeconds(1)); // same as request.timeout("1s")
        // Put the data into the request as JSON
        request.source(JSON.toJSONString(u), XContentType.JSON);
        // Send the request from the client
        IndexResponse response = client.index(request, RequestOptions.DEFAULT);
        System.out.println(response.toString());
        System.out.println(response.status()); // CREATED on the first insert
    }

    // Check whether a document exists: GET /gorit_index/_doc/1
    @Test
    void testIsExists() throws IOException {
        GetRequest getRequest = new GetRequest("gorit_index", "1");
        // Skip fetching _source: only existence matters here
        getRequest.fetchSourceContext(new FetchSourceContext(false));
        getRequest.storedFields("_none_");
        boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

    // Get a document
    @Test
    void testGetDocument() throws IOException {
        GetRequest getRequest = new GetRequest("gorit_index", "1");
        GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
        System.out.println(getResponse.getSourceAsString()); // the document's _source
        System.out.println(getResponse); // the full response, same as the REST command's output
    }

    // Update a document
    @Test
    void testUpdateDocument() throws IOException {
        UpdateRequest updateRequest = new UpdateRequest("gorit_index", "1");
        updateRequest.timeout("1s");
        User user = new User("CodingGorit", 18);
        updateRequest.doc(JSON.toJSONString(user), XContentType.JSON);
        UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
        System.out.println(updateResponse.status());
        System.out.println(updateResponse); // the full response, same as the REST command's output
    }

    // Delete a document
    @Test
    void testDeleteDocument() throws IOException {
        DeleteRequest deleteRequest = new DeleteRequest("gorit_index", "1");
        deleteRequest.timeout("1s");
        DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
        System.out.println(deleteResponse.status());
        System.out.println(deleteResponse); // the full response, same as the REST command's output
    }

    // Bulk insert: the common case in real projects
    @Test
    void testBulkRequest() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s");

        List<User> userList = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            userList.add(new User("张三" + i, i));
        }

        // For bulk updates or deletes, add UpdateRequest/DeleteRequest objects here instead
        for (int i = 0; i < userList.size(); i++) {
            bulkRequest.add(new IndexRequest("gorit_index")
                    .id(String.valueOf(i + 1))
                    .source(JSON.toJSONString(userList.get(i)), XContentType.JSON));
        }
        BulkResponse bulkItemResponses = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        System.out.println(bulkItemResponses.hasFailures()); // false means every item succeeded
        System.out.println(bulkItemResponses.status());
    }

    // Search
    // SearchRequest: the search request
    // SearchSourceBuilder: builds the search conditions
    // HighlightBuilder: builds highlighting
    // TermQueryBuilder: exact query
    // MatchAllQueryBuilder: match all documents
    // xxxQueryBuilder: one builder per query type
    @Test
    void testSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest("gorit_index");
        // Build the search conditions
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        /*
         * Query conditions are created with the QueryBuilders utility class:
         * QueryBuilders.termQuery      exact (not analyzed) match
         * QueryBuilders.matchAllQuery  match all documents
         */
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "gorit1"); // exact query
        // MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        sourceBuilder.query(termQueryBuilder);
        // Pagination: the no-argument from()/size() are getters, so pass values explicitly
        sourceBuilder.from(0);
        sourceBuilder.size(10);
        // Highlighting is configured with sourceBuilder.highlighter(new HighlightBuilder()...)
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Attach the conditions and execute the search
        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(JSON.toJSONString(searchResponse.getHits()));
        System.out.println("==========================================");
        for (SearchHit documentFields : searchResponse.getHits().getHits()) {
            System.out.println(documentFields.getSourceAsMap());
        }
    }
}
```

Configuration class
```java
package cn.gorit.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Spring steps:
 * 1. Find the object
 * 2. Put it into Spring so it can be used
 * 3. Analyze the source code
 *
 * @Classname ElasticSearchConfig
 * @Description TODO
 * @Date 2020/10/21 17:20
 * @Created by CodingGorit
 * @Version 1.0
 */
@Configuration // plays the role of an XML <bean> definition file
public class ElasticSearchConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));
    }
}
```

Crawling JD.com search results
Utility class (HtmlParseUtil, which needs the jsoup dependency):
```java
package cn.gorit.util;

import cn.gorit.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/**
 * @Classname HtmlParseUtil
 * @Description TODO
 * @Date 2020/10/21 23:17
 * @Created by CodingGorit
 * @Version 1.0
 */
@Component
public class HtmlParseUtil {

    // public static void main(String[] args) throws Exception {
    //     new HtmlParseUtil().parseJD("英语").forEach(System.out::println);
    // }

    public List<Content> parseJD(String keyword) throws Exception {
        // Request URL; needs network access, and content loaded via Ajax cannot be fetched this way.
        // The keyword is URL-encoded so that Chinese keywords work.
        String url = "https://search.jd.com/Search?keyword="
                + URLEncoder.encode(keyword, "UTF-8") + "&enc=utf-8";
        // Parse the page into a Document object (30 s timeout)
        Document document = Jsoup.parse(new URL(url), 30000);
        // The container holding the product list
        Element element = document.getElementById("J_goodsList");
        List<Content> goodsList = new ArrayList<>();
        if (element == null) {
            // Layout changed or access was blocked: nothing to parse
            return goodsList;
        }
        // All li elements, one per product
        Elements elements = element.getElementsByTag("li");
        // Extract the content of each product
        for (Element e : elements) {
            String img = e.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = e.getElementsByClass("p-price").eq(0).text();
            String title = e.getElementsByClass("p-name").eq(0).text();
            goodsList.add(new Content(title, img, price));
        }
        return goodsList;
    }
}
```

Service
```java
package cn.gorit.service;

import cn.gorit.pojo.Content;
import cn.gorit.util.HtmlParseUtil;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * @Classname ContentService
 * @Description TODO
 * @Date 2020/10/22 18:44
 * @Created by CodingGorit
 * @Version 1.0
 */
@Service
public class ContentService {

    // Injected by Spring: obtain this service from the Spring container.
    // Creating it with `new ContentService()` leaves the client null.
    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // 1. Parse the crawled data and put it into the ES index
    public Boolean parseContent(String keywords) throws Exception {
        // Fetch the product list
        List<Content> contents = new HtmlParseUtil().parseJD(keywords);
        if (contents.isEmpty()) {
            return false; // a bulk request with no items would be rejected
        }
        // Put the data into ES with a single bulk request
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");
        for (Content content : contents) {
            bulkRequest.add(new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(content), XContentType.JSON));
        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulkResponse.hasFailures();
    }

    // 2. Search the data and highlight the keyword in the title
    public List<Map<String, Object>> searchPagehighLight(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1) pageNo = 1;
        // Search conditions
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Pagination: from is an offset, not a page number
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact match: term queries are not analyzed, so the keyword must equal an indexed term
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false);
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        builder.highlighter(highlightBuilder);
        // Execute the search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // Parse the results
        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = hit.getSourceAsMap(); // the original result
            // Replace the original title with the highlighted fragments
            if (title != null) {
                StringBuilder nTitle = new StringBuilder();
                for (Text text : title.fragments()) {
                    nTitle.append(text);
                }
                sourceAsMap.put("title", nTitle.toString());
            }
            list.add(sourceAsMap);
        }
        return list;
    }

    // 3. Search the data without highlighting
    public List<Map<String, Object>> searchPage(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1) pageNo = 1;
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Pagination: from is an offset, not a page number
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact match (not analyzed)
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Execute the search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        // Parse the results
        List<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            list.add(hit.getSourceAsMap());
        }
        return list;
    }
}
```

Controller
```java
package cn.gorit.controller;

import cn.gorit.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;

/**
 * @Classname ContentController
 * @Description TODO
 * @Date 2020/10/22 18:45
 * @Created by CodingGorit
 * @Version 1.0
 */
@RestController
public class ContentController {

    @Autowired
    private ContentService service;

    /**
     * Crawl data for the keyword and add it to ES
     * @param keyword search keyword
     * @return true if every document was indexed
     * @throws Exception if crawling or indexing fails
     */
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword") String keyword) throws Exception {
        return service.parseContent(keyword);
    }

    /**
     * Query the data in ES
     * @param keyword search keyword
     * @param pageNo page number, starting at 1
     * @param pageSize number of hits per page
     * @return the matching documents
     * @throws IOException if the search request fails
     */
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                                            @PathVariable("pageNo") int pageNo,
                                            @PathVariable("pageSize") int pageSize) throws IOException {
        if (pageNo == 0) {
            pageNo = 1;
        }
        return service.searchPage(keyword, pageNo, pageSize);
    }
}
```

Testing with Postman
One project, usable from multiple clients.
My personal open-source project: Coding-With-Java. Likes and stars are welcome!