Elasticsearch - Clusters and Operations
Elasticsearch clusters, configuration, and operations
CAT operations APIs
Compact and aligned text (CAT) APIs
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html
JSON-formatted APIs are convenient for programs to process, but to human eyes the familiar row-and-column output of Linux command-line tools is friendlier; Elasticsearch's CAT (compact and aligned text) APIs return results in that command-line format.
CAT API output is intended only for human reading; if a program needs to process the results, use the JSON-format APIs instead.
Common parameters
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html#common-parameters
?help: list available columns
All CAT APIs accept a help query-string parameter that lists the columns the API can return, with explanations, e.g. http://localhost:8200/_cat/nodes?help
The three columns returned are, from left to right: field name, alias(es), and description.
curl "http://10.92.54.76:8200/_cat/indices?help"
health | h | current health status
status | s | open/close status
index | i,idx | index name
uuid | id,uuid | index uuid
pri | p,shards.primary,shardsPrimary | number of primary shards
rep | r,shards.replica,shardsReplica | number of replica shards
docs.count | dc,docsCount | available docs
docs.deleted | dd,docsDeleted | deleted docs
store.size | ss,storeSize | store size of primaries & replicas
pri.store.size | | store size of primaries
?v: verbose output
All CAT APIs accept a v query-string parameter that turns on verbose output (column headers), e.g. localhost:9200/_cat/indices?v
?h: select output columns
All CAT APIs accept an h query-string parameter that selects which columns to output, e.g. _cat/nodes?h=ip,port,heapPercent,name
h supports wildcards, e.g. /_cat/thread_pool?h=ip,queue*
curl "http://localhost:8200/_cat/nodes?h=ip,port,heapPercent,name"
127.0.0.97 9300 17 es-7-master-2
127.0.0.95 9300 50 es-7-master-1
127.0.0.96 9300 38 es-7-master-0
?s: sort columns
CAT APIs accept an s query-string parameter that sets the sort columns, specified by column name or alias; separate multiple sort columns with commas. Sorting is ascending by default; append :desc to a column for descending order, e.g. s=column1,column2:desc,column3
GET _cat/templates?v=true&s=order:desc,index_patterns
?format: response format
CAT APIs accept a format query-string parameter that selects the response format. The default is text; supported formats are text, json, smile, yaml, and cbor.
curl "http://10.92.54.76:8200/_cat/indices"
green open user_0124 fRqr86C7QDWp2q1JzNL9DQ 1 1 104772294 2540905 760.7gb 379.7gb
curl "http://10.92.54.76:8200/_cat/indices?format=json"
[{"health":"green","status":"open","index":"user_0124","uuid":"fRqr86C7QDWp2q1JzNL9DQ","pri":"1","rep":"1","docs.count":"104772294","docs.deleted":"2540905","store.size":"760.7gb","pri.store.size":"379.7gb"}]
curl "http://10.92.54.76:8200/_cat/indices?format=yaml"
---
- health: "green"
status: "open"
index: "user_0124"
uuid: "fRqr86C7QDWp2q1JzNL9DQ"
pri: "1"
rep: "1"
docs.count: "104772294"
docs.deleted: "2540905"
store.size: "760.7gb"
pri.store.size: "379.7gb"
/_cat: list all CAT APIs
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates
/_cat/ml/anomaly_detectors
/_cat/ml/anomaly_detectors/{job_id}
/_cat/ml/trained_models
/_cat/ml/trained_models/{model_id}
/_cat/ml/datafeeds
/_cat/ml/datafeeds/{datafeed_id}
/_cat/ml/data_frame/analytics
/_cat/ml/data_frame/analytics/{id}
/_cat/transforms
/_cat/transforms/{transform_id}
/_cat/health?v: cluster health
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-health.html
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1643354450 07:20:50 es-7 green 3 3 2 1 0 0 0 0 - 100.0%
/_cat/nodes?v: list all nodes
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-nodes.html
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.1.1 61 56 1 1.71 1.52 1.38 cdfhilmrstw * es-7-master-1
192.168.1.2 28 80 0 1.71 1.52 1.38 cdfhilmrstw - es-7-master-2
192.168.1.3 49 80 1 1.71 1.52 1.38 cdfhilmrstw - es-7-master-0
/_cat/indices?v: list all indices
curl -X GET "localhost:9200/_cat/indices?v"
Returns:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open article 8wczi0SdTfqTjrOwqM5FOg 1 1 1 0 5.1kb 5.1kb
green open my_3shards Ox5GfotcSjikFg08MHv-lQ 3 1 141527 36972 9.1gb 4.5gb
store.size is the total data size across all shards and replicas; for example, for an index with 3 primary shards and 1 replica, it is the combined size of all 3 primaries plus each primary's replica.
/_cat/count/<index>?v: index document count
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-count.html
/_cat/count - total document count across all indices
/_cat/count/<target> - document count of the specified index
epoch timestamp count
1643355132 07:32:12 104772294
/_cat/segments/<index>?v: index segment data
Elasticsearch Guide [7.17] » REST APIs » Compact and aligned text (CAT) APIs » cat segments API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/cat-segments.html
GET /_cat/segments - segments of all indices
GET /_cat/segments/<index> - segments of the specified index
size.memory is the heap memory occupied by a segment.
Column descriptions:
GET /_cat/segments?help
index | i,idx | index name
shard | s,sh | shard name
prirep | p,pr,primaryOrReplica | primary or replica
ip | | ip of node where it lives
id | | unique id of node where it lives
segment | seg | segment name
generation | g,gen | segment generation
docs.count | dc,docsCount | number of docs in segment
docs.deleted | dd,docsDeleted | number of deleted docs in segment
size | si | segment size in bytes
size.memory | sm,sizeMemory | segment memory in bytes
committed | ic,isCommitted | is segment committed
searchable | is,isSearchable | is segment searched
version | v,ver | version
compound | ico,isCompound | is segment compound
Example
GET /_cat/segments/<index>
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
my_blog_3shards 0 p 192.168.1.1 _73 255 1185600 0 4.2gb 18244 true true 8.10.1 false
my_blog_3shards 0 p 192.168.1.1 _da 478 1124265 0 4gb 18020 true true 8.10.1 false
my_blog_3shards 0 p 192.168.1.1 _j9 693 1272105 0 4.6gb 18884 true true 8.10.1 false
my_blog_3shards 0 p 192.168.1.1 _o8 872 1002084 0 3.6gb 17540 true true 8.10.1 false
my_blog_3shards 0 p 192.168.1.1 _ug 1096 1126064 0 4gb 18404 true true 8.10.1 false
my_blog_3shards 0 p 192.168.1.1 _zq 1286 1176128 0 4.2gb 18372 true true 8.10.1 false
my_blog_3shards 0 p 192.168.1.1 _15i 1494 904718 0 3.2gb 17188 true true 8.10.1 false
my_blog_3shards 0 p 192.168.1.1 _1b8 1700 1081184 0 3.9gb 18148 true true 8.10.1 false
my_blog_3shards 0 p 192.168.1.1 _1hq 1934 915554 0 3.3gb 17012 true true 8.10.1 true
/_cat/shards?v: shard status
https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-shards.html
/_cat/shards - information on all shards
/_cat/shards/<target> - shard information for the specified index
index shard prirep state docs store ip node
my_app_article_inf_0124 0 p STARTED 104772294 379.7gb 10.233.64.96 es-7-master-0
my_app_article_inf_0124 0 r STARTED 104772294 380.9gb 10.233.64.97 es-7-master-2
my_3shards_new 2 r STARTED 47169 1.5gb 10.233.65.152 es-7-master-2
my_3shards_new 2 p STARTED 47169 1.4gb 10.233.67.98 es-7-master-1
my_3shards_new 1 r STARTED 47211 1.5gb 10.233.67.98 es-7-master-1
my_3shards_new 1 p STARTED 47211 1.5gb 10.233.66.2 es-7-master-0
my_3shards_new 0 p STARTED 47147 1.5gb 10.233.65.152 es-7-master-2
my_3shards_new 0 r STARTED 47147 1.5gb 10.233.66.2 es-7-master-0
Elasticsearch data migration
Elastic data migration methods and caveats
https://www.cnblogs.com/zhengchunyuan/p/9957851.html
Cluster APIs
Elasticsearch Guide [7.17] » REST APIs » Cluster APIs
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/cluster.html
PUT /_cluster/settings: update dynamic settings
Elasticsearch Guide [7.17] » REST APIs » Cluster APIs » Cluster update settings API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/cluster-update-settings.html
PUT /_cluster/settings
For example:
PUT /_cluster/settings
{
"persistent" : {
"indices.recovery.max_bytes_per_sec" : "50mb"
}
}
GET /_cluster/settings: get cluster settings
Elasticsearch Guide [7.17] » REST APIs » Cluster APIs » Cluster get settings API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/cluster-get-settings.html
GET /_cluster/settings
By default only settings that have been explicitly changed are returned; add the include_defaults=true parameter to also return default settings.
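For example (host and port are illustrative):
curl "localhost:9200/_cluster/settings?include_defaults=true&pretty"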
GET /_nodes/stats: node statistics
Elasticsearch Guide [7.17] » REST APIs » Cluster APIs » Nodes stats API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/cluster-nodes-stats.html
GET /_nodes/stats - all statistics for all nodes
GET /_nodes/<node_id>/stats - all statistics for the specified node
GET /_nodes/stats/<metric> - the specified statistics for all nodes, e.g. /_nodes/stats/jvm for the JVM info of all nodes
GET /_nodes/<node_id>/stats/<metric> - the specified statistics for the specified node
GET /_nodes/stats/jvm: node JVM information
{
"_nodes": {
"total": 3,
"successful": 3,
"failed": 0
},
"cluster_name": "es-7",
"nodes": {
"JadPfrWATmu3br_YrOogIA": {
"timestamp": 1648638934312,
"name": "master-2",
"transport_address": "127.0.0.1:9300",
"host": "127.0.0.1",
"ip": "127.0.0.1:9300",
"roles": [
"data",
"data_cold",
"data_content",
"data_frozen",
"data_hot",
"data_warm",
"ingest",
"master",
"ml",
"remote_cluster_client",
"transform"
],
"attributes": {
"ml.machine_memory": "17179869184",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "8589934592",
"transform.node": "true"
},
"jvm": {
"timestamp": 1648638934312,
"uptime_in_millis": 3454100628,
"mem": {
"heap_used_in_bytes": 4971640320,
"heap_used_percent": 57,
"heap_committed_in_bytes": 8589934592,
"heap_max_in_bytes": 8589934592,
"non_heap_used_in_bytes": 185095632,
"non_heap_committed_in_bytes": 188481536,
"pools": {
"young": {
"used_in_bytes": 4160749568,
"max_in_bytes": 0,
"peak_used_in_bytes": 5146411008,
"peak_max_in_bytes": 0
},
"old": {
"used_in_bytes": 806977024,
"max_in_bytes": 8589934592,
"peak_used_in_bytes": 4854326784,
"peak_max_in_bytes": 8589934592
},
"survivor": {
"used_in_bytes": 3913728,
"max_in_bytes": 0,
"peak_used_in_bytes": 469762048,
"peak_max_in_bytes": 0
}
}
},
"threads": {
"count": 65,
"peak_count": 80
},
"gc": {
"collectors": {
"young": {
"collection_count": 24232,
"collection_time_in_millis": 698748
},
"old": {
"collection_count": 0,
"collection_time_in_millis": 0
}
}
},
"buffer_pools": {
"mapped": {
"count": 2658,
"used_in_bytes": 2496772173825,
"total_capacity_in_bytes": 2496772173825
},
"direct": {
"count": 52,
"used_in_bytes": 9262228,
"total_capacity_in_bytes": 9262227
},
"mapped - 'non-volatile memory'": {
"count": 0,
"used_in_bytes": 0,
"total_capacity_in_bytes": 0
}
},
"classes": {
"current_loaded_count": 24047,
"total_loaded_count": 24085,
"total_unloaded_count": 38
}
}
}
}
}
GET /_nodes/stats/indices: node index statistics
GET /_nodes/stats/indices - index statistics for all nodes
GET /_nodes/JadPfrWATmu3br_YrOogIA/stats/indices - index statistics for the specified node
GET /_nodes/JadPfrWATmu3br_YrOogIA/stats/indices
{
"_nodes": {
"total": 1,
"successful": 1,
"failed": 0
},
"cluster_name": "es-7",
"nodes": {
"JadPfrWATmu3br_YrOogIA": {
"timestamp": 1649384639759,
"name": "es-7-master-2",
"transport_address": "127.0.0.1:9300",
"host": "127.0.0.1",
"ip": "127.0.0.1:9300",
"roles": [
"data",
"data_cold",
"data_content",
"data_frozen",
"data_hot",
"data_warm",
"ingest",
"master",
"ml",
"remote_cluster_client",
"transform"
],
"attributes": {
"ml.machine_memory": "17179869184",
"ml.max_open_jobs": "512",
"xpack.installed": "true",
"ml.max_jvm_size": "8589934592",
"transform.node": "true"
},
"indices": {
"docs": {
"count": 672675191,
"deleted": 0
},
"shard_stats": {
"total_count": 2
},
"store": {
"size_in_bytes": 2555649098412,
"total_data_set_size_in_bytes": 2555649098412,
"reserved_in_bytes": 0
},
"indexing": {
"index_total": 672675191,
"index_time_in_millis": 859673193,
"index_current": 0,
"index_failed": 0,
"delete_total": 0,
"delete_time_in_millis": 0,
"delete_current": 0,
"noop_update_total": 0,
"is_throttled": false,
"throttle_time_in_millis": 8124403
},
"get": {
"total": 0,
"time_in_millis": 0,
"exists_total": 0,
"exists_time_in_millis": 0,
"missing_total": 0,
"missing_time_in_millis": 0,
"current": 0
},
"search": {
"open_contexts": 0,
"query_total": 557,
"query_time_in_millis": 913940,
"query_current": 0,
"fetch_total": 147,
"fetch_time_in_millis": 1528,
"fetch_current": 0,
"scroll_total": 0,
"scroll_time_in_millis": 0,
"scroll_current": 0,
"suggest_total": 0,
"suggest_time_in_millis": 0,
"suggest_current": 0
},
"merges": {
"current": 0,
"current_docs": 0,
"current_size_in_bytes": 0,
"total": 11405,
"total_time_in_millis": 2334432062,
"total_docs": 1457464133,
"total_size_in_bytes": 5626849591443,
"total_stopped_time_in_millis": 3821957,
"total_throttled_time_in_millis": 848917760,
"total_auto_throttle_in_bytes": 31632531
},
"refresh": {
"total": 9288,
"total_time_in_millis": 45898273,
"external_total": 643,
"external_total_time_in_millis": 8947460,
"listeners": 0
},
"flush": {
"total": 8442,
"periodic": 8161,
"total_time_in_millis": 373382635
},
"warmer": {
"current": 0,
"total": 641,
"total_time_in_millis": 64
},
"query_cache": {
"memory_size_in_bytes": 0,
"total_count": 0,
"hit_count": 0,
"miss_count": 0,
"cache_size": 0,
"cache_count": 0,
"evictions": 0
},
"fielddata": {
"memory_size_in_bytes": 0,
"evictions": 0
},
"completion": {
"size_in_bytes": 0
},
"segments": {
"count": 662,
"memory_in_bytes": 12082184,
"terms_memory_in_bytes": 7308480,
"stored_fields_memory_in_bytes": 4225008,
"term_vectors_memory_in_bytes": 0,
"norms_memory_in_bytes": 0,
"points_memory_in_bytes": 0,
"doc_values_memory_in_bytes": 548696,
"index_writer_memory_in_bytes": 0,
"version_map_memory_in_bytes": 0,
"fixed_bit_set_memory_in_bytes": 0,
"max_unsafe_auto_id_timestamp": -1,
"file_sizes": {}
},
"translog": {
"operations": 0,
"size_in_bytes": 110,
"uncommitted_operations": 0,
"uncommitted_size_in_bytes": 110,
"earliest_last_modified_age": 2840440564
},
"request_cache": {
"memory_size_in_bytes": 71040,
"evictions": 0,
"hit_count": 39,
"miss_count": 242
},
"recovery": {
"current_as_source": 0,
"current_as_target": 0,
"throttle_time_in_millis": 0
}
}
}
}
}
GET /_cluster/stats: cluster statistics
Elasticsearch Guide [7.17] » REST APIs » Cluster APIs » Cluster stats API
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/cluster-stats.html
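For example, querying it with human-readable units (host and port are illustrative):
curl "localhost:9200/_cluster/stats?human&pretty"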
Configuring Elasticsearch
Elasticsearch Guide [7.17] » Set up Elasticsearch » Configuring Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/settings.html#dynamic-cluster-setting
Configuration file locations
Elasticsearch has three configuration files:
elasticsearch.yml - configures Elasticsearch itself
jvm.options - configures the Elasticsearch JVM
log4j2.properties - configures Elasticsearch logging
The elasticsearch.yml I use to start the ES container:
# enable CORS
http.cors.enabled: true
# allow access from any origin
http.cors.allow-origin: "*"
# node name
node.name: "node-1"
# cluster name
cluster.name: "docker-es"
# node IP; defaults to the loopback address on a single machine, a cluster must bind a real IP
network.host: 0.0.0.0
By default, Elasticsearch only allows access from the local machine. For remote access, edit config/elasticsearch.yml under the Elastic install directory, uncomment network.host and set it to 0.0.0.0, then restart Elastic.
Configuration file format
ES configuration files are in YAML format.
Environment variable substitution
Configuration files can reference environment variables with ${...}.
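For example, in elasticsearch.yml (assuming these variables are exported in the environment):
node.name: ${HOSTNAME}
network.host: ${ES_NETWORK_HOST}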
Cluster and node settings
- Static settings: can only be set in elasticsearch.yml before the cluster starts
- Dynamic settings: can be changed at runtime via the cluster settings API PUT /_cluster/settings, and can also be set in elasticsearch.yml
Dynamic settings
Dynamic settings come in two kinds:
- transient: lost after a cluster restart
- persistent: survive a cluster restart
Assigning null to a setting via the settings API resets a persistent or transient setting to its default.
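For example, to reset the recovery throttle configured earlier back to its default:
PUT /_cluster/settings
{
  "persistent" : {
    "indices.recovery.max_bytes_per_sec" : null
  }
}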
If the same setting is configured in several places, the precedence is:
- transient settings
- persistent settings
- settings in elasticsearch.yml
- default values
transient settings have the highest precedence and can override persistent settings or settings in elasticsearch.yml.
ES advises against using transient settings, because they may silently disappear when the cluster is unstable, leading to subtle problems.
Static settings
Static settings can only be set in elasticsearch.yml before the cluster starts.
Static settings must be configured on every node of the cluster.
Cluster-level shard allocation and routing settings
Elasticsearch Guide [7.17] » Set up Elasticsearch » Configuring Elasticsearch » Cluster-level shard allocation and routing settings
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-cluster.html#disk-based-shard-allocation
Disk-based shard allocation settings
The disk-based shard allocator controls shard placement with a high watermark and a low watermark. Its main goal is to ensure that no node's disk usage exceeds the high watermark, or only exceeds it temporarily. If a node's disk usage exceeds the high watermark, ES moves shard data from that node to other nodes in the cluster.
Note: it is normal for a node's disk usage to temporarily exceed the high watermark.
Once a node's disk usage exceeds the low watermark, the allocator stops placing new shards on it, keeping the node away from the high watermark. If every node exceeds the low watermark, ES can no longer allocate shard data, nor move shards between nodes, so always keep several nodes in the cluster below the low watermark.
If a node's disk fills very quickly, ES may not manage to move shards off it in time and the disk could fill up completely. To prevent this, once disk usage exceeds the flood-stage watermark, ES blocks writes to every index that has a shard on the node. ES keeps moving shards off the node, and lifts the write block once usage drops below the high watermark.
cluster.routing.allocation.disk.threshold_enabled - whether the disk watermark checks are enabled; default true, set to false to disable.
cluster.routing.allocation.disk.watermark.low - low disk watermark, default 85%: once usage exceeds 85%, ES stops placing new shard data on the node. Can also be an absolute value, e.g. 500mb: allocation stops once free space falls below 500mb.
cluster.routing.allocation.disk.watermark.high - high disk watermark, default 90%: once usage exceeds 90%, ES tries to move shard data off the node. Can also be absolute, e.g. 500mb: shard data starts moving out once free space falls below 500mb.
cluster.routing.allocation.disk.watermark.flood_stage - flood-stage watermark, default 95%: once usage exceeds 95%, ES sets index.blocks.read_only_allow_delete on every index with a shard on the node. This is the last resort against a full disk; the block is lifted automatically once usage falls below the high watermark.
Note: do not mix percentages and absolute values across these settings, because ES validates that the low watermark is below the high watermark, and the high watermark below the flood stage.
For example, remove the read-only-allow-delete block from a specific index via the API:
PUT /my-index-000001/_settings
{
"index.blocks.read_only_allow_delete": null
}
cluster.info.update.interval - how often ES checks disk usage, default 30s.
429 disk usage exceeded flood-stage watermark
Problem:
ES returns an error on inserting data:
{
"error":{
"root_cause":[
{
"type":"cluster_block_exception",
"reason":"index [my_index] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];"
}
],
"type":"cluster_block_exception",
"reason":"index [my_index] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];"
},
"status":429
}
Cause:
Disk usage exceeded the 95% flood-stage watermark, so ES blocked writes to the index.
Fix:
1. Disable the disk watermark check, either dynamically via the API without downtime, or in elasticsearch.yml after shutting down:
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.disk.threshold_enabled": false
}
}
Or raise the watermark thresholds, likewise via the API or the config file. For example, set the low watermark to 100gb, the high watermark to 50gb, and the flood stage to 10gb (absolute values refer to remaining free space), and refresh disk info every minute:
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "100gb",
"cluster.routing.allocation.disk.watermark.high": "50gb",
"cluster.routing.allocation.disk.watermark.flood_stage": "10gb",
"cluster.info.update.interval": "1m"
}
}
2. If writes still fail after disabling the check or raising the watermarks, it is because the indices already carry the read-only-allow-delete block, which must be removed manually. The API below clears the block from all indices via _all; a specific index can be targeted instead:
PUT _all/_settings
{
"index.blocks.read_only_allow_delete": null
}
Elasticsearch logging configuration
Elasticsearch Guide [7.17] » Set up Elasticsearch » Configuring Elasticsearch » Logging
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/logging.html
Elasticsearch exposes three variables for use in log4j2.properties:
${sys:es.logs.base_path} - the path.logs directory from elasticsearch.yml
${sys:es.logs.cluster_name} - the cluster name
${sys:es.logs.node_name} - the node name
Elasticsearch slow logs
ES has two kinds of slow logs:
index slow logs: elasticsearch_index_indexing_slowlog.log
search slow logs: elasticsearch_index_search_slowlog.log
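Slow log thresholds are index-level dynamic settings; a minimal sketch (index name and threshold values are illustrative):
PUT /my-index-000001/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}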
Logging configuration
https://www.elastic.co/guide/en/elasticsearch/reference/current/logging.html
Circuit breaker settings
Elasticsearch Guide [7.17] » Set up Elasticsearch » Configuring Elasticsearch » Circuit breaker settings
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/circuit-breaker.html
Elasticsearch has many circuit breakers that prevent operations from causing an OOM. Each breaker has a limit on how much memory it may use; in addition, a parent breaker caps the total memory usable across all breakers.
Request circuit breaker
The request breaker limits the memory needed to execute a single request; for example, an aggregation request may use JVM memory for intermediate summaries.
indices.breaker.request.limit - dynamic setting; the maximum memory a request may use, default 60% of the JVM heap.
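Being dynamic, it can be adjusted at runtime; a sketch (the 40% value is illustrative):
PUT /_cluster/settings
{
  "persistent" : {
    "indices.breaker.request.limit" : "40%"
  }
}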
429 circuit_breaking_exception Data too large
{
"error": {
"root_cause": [
{
"type": "circuit_breaking_exception",
"reason": "[parent] Data too large, data for [<http_request>] would be [128107988/122.1mb], which is larger than the limit of [123273216/117.5mb], real usage: [128107696/122.1mb], new bytes reserved: [292/292b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=292/292b, accounting=2309/2.2kb]",
"bytes_wanted": 128107988,
"bytes_limit": 123273216,
"durability": "PERMANENT"
}
],
"type": "circuit_breaking_exception",
"reason": "[parent] Data too large, data for [<http_request>] would be [128107988/122.1mb], which is larger than the limit of [123273216/117.5mb], real usage: [128107696/122.1mb], new bytes reserved: [292/292b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=292/292b, accounting=2309/2.2kb]",
"bytes_wanted": 128107988,
"bytes_limit": 123273216,
"durability": "PERMANENT"
},
"status": 429
}
Cause:
The JVM heap is too small for the data the current request needs to load, so the request is tripped with "Data too large"; indices.breaker.request.limit defaults to 60% of the JVM heap.
My ES heap was set to 128M; with just one index containing two one-sentence documents, this error already appeared, so ES clearly needs more memory allocated.
Field data circuit breaker
The field data breaker estimates how much memory loading a field into the field data cache would need; if the load would push memory usage past the configured limit, an error is returned instead.
indices.breaker.fielddata.limit - dynamic setting; the maximum memory the field data cache may use, default 40% of the JVM heap.
Node query cache settings
Elasticsearch Guide [7.17] » Set up Elasticsearch » Configuring Elasticsearch » Node query cache settings
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-cache.html
Results of filter-type queries are cached in the node query cache. Each node has one query cache shared by all its indices, with LRU eviction: when the cache is full, the oldest query results are removed. The contents of this cache cannot be inspected.
The query cache is node-level and shared by all shards on the node.
Since 5.1.1, term filter queries are no longer cached: the inverted index is itself a cache from term to documents and is already fast, and caching term queries would evict results that genuinely benefit from caching out of the LRU.
https://www.elastic.co/blog/elasticsearch-5-1-1-released
Term queries are no longer cached. The reason for this is twofold: term queries are almost always fast, and queries for thousands of terms can trash the query cache history, preventing more expensive queries from being cached.
By default the node query cache stores at most 10,000 query results and uses at most 10% of the JVM heap.
indices.queries.cache.size - static, node-level setting; maximum size of the filter query cache, default 10% of the JVM heap. Accepts a percentage of the heap such as 10% or an absolute value such as 512mb.
index.queries.cache.enabled - static, index-level setting; whether the index's query cache is enabled, default true.
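For example, the node-level size is set in elasticsearch.yml, while the index-level switch is set when creating the index (index name and the 5% value are illustrative):
# elasticsearch.yml
indices.queries.cache.size: 5%
# disable the query cache for one index at creation time
PUT /my-index-000001
{
  "settings": {
    "index.queries.cache.enabled": false
  }
}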
Shard request cache settings
Elasticsearch Guide [7.17] » Set up Elasticsearch » Configuring Elasticsearch » Shard request cache settings
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/shard-request-cache.html
When a search runs against one or more indices, each involved shard executes the search locally and returns its local results to the coordinating node, which combines them into the complete global result.
ES caches each shard's local results so that frequent search requests can return immediately. This is also an LRU cache: when full, the oldest results are evicted.
The cache key is the query's DSL, so to hit the cache the generated DSL must be identical, meaning identical as a string, byte for byte.
The cache is invalidated automatically when documents or the mapping are updated.
indices.requests.cache.size - maximum size of the cache, default 2% of the JVM heap.
Enabling/disabling the shard request cache
The shard request cache is enabled by default; it can be disabled when creating an index:
PUT /my-index-000001
{
"settings": {
"index.requests.cache.enable": false
}
}
It can also be enabled/disabled dynamically on an existing index:
PUT /my-index-000001/_settings
{ "index.requests.cache.enable": true }
Checking request cache usage
/index/_stats
"request_cache": {
"memory_size_in_bytes": 168128,
"evictions": 0,
"hit_count": 64,
"miss_count": 466
}
Field data cache settings
Elasticsearch Guide [7.17] » Set up Elasticsearch » Configuring Elasticsearch » Field data cache settings
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-fielddata.html
The fielddata cache holds field data and global ordinals, which support aggregations on fields; it lives on the JVM heap.
indices.fielddata.cache.size - static setting; maximum size of the field data cache, unbounded by default. Accepts a heap percentage such as 38% or an absolute value such as 12GB; it should be smaller than indices.breaker.fielddata.limit.
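Being static, it goes in elasticsearch.yml and takes effect after a restart (the 30% value is illustrative):
indices.fielddata.cache.size: 30%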
Elasticsearch thread pool settings
Elasticsearch Guide [7.17] » Set up Elasticsearch » Configuring Elasticsearch » Thread pools
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-threadpool.html
Node stats include the thread pool data:
curl -XGET 'http://localhost:9200/_nodes/stats?pretty'
From 2.0 until 5.0, thread pool sizes could be changed dynamically via the HTTP API without a restart; since 5.0 they can no longer be changed dynamically and a restart is required.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
"transient": {
"threadpool.index.type": "fixed",
"threadpool.index.size": 100,
"threadpool.index.queue_size": 500
}
}'
Elasticsearch's three thread pool types
Judging from the source code, Elasticsearch thread pools come in three types: fixed (fixed size), fixed_auto_queue_size (fixed size with an auto-resizing blocking queue), and scaling (variable size); fixed_auto_queue_size is experimental and may be removed in a later version.
fixed: fixed-size thread pool
fixed_auto_queue_size: fixed-size thread pool with an auto-resizing queue
scaling: variable-size thread pool
search thread pool
Used for count/search/suggest operations. The pool type is fixed_auto_queue_size; the default size is int((# of available_processors * 3) / 2) + 1 and the default queue_size is 1000.
Configuration example:
thread_pool:
search:
size: 30
queue_size: 500
min_queue_size: 10
max_queue_size: 1000
auto_queue_frame_size: 2000
target_response_time: 1s
write thread pool
Used for index/delete/update and bulk operations. The pool type is fixed; the default size is # of available processors, the maximum allowed value is 1 + # of available processors, and the default queue_size is 200.
Configuration example:
thread_pool:
write:
size: 30
queue_size: 1000
processors setting
The # of available processors in the thread pool settings is the auto-detected number of logical processors, which equals:
# count logical CPUs
cat /proc/cpuinfo| grep "processor"| wc -l
For example, thread_pool.write.size allows at most 1 + # of available processors; with 4 logical CPUs the pool size may be at most 5, and specifying anything larger in the settings raises an error:
java.lang.IllegalArgumentException: Failed to parse value [30] for setting [thread_pool.write.size] must be <= 5
If you are sure you want a larger value, set the processor count manually in elasticsearch.yml, e.g.:
processors: 2
429 es_rejected_execution_exception
429/Too Many Requests
Writes fail with es_rejected_execution_exception:
{
"error":{
"root_cause":[
{
"type":"remote_transport_exception",
"reason":"[ZKjMEXP][127.0.0.1:9300][indices:data/write/bulk[s][p]]"
}
],
"type":"es_rejected_execution_exception",
"reason":"rejected execution of processing of [2026943][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[user_profile_indicator_data][0]] containing [index {[user_profile_indicator_data][indicator_base_info][5725976], source[n/a, actual length: [2.4kb], max length: 2kb]}], target allocation id: IbC5nk5CSOO9ReABdDvcvA, primary term: 1 on EsThreadPoolExecutor[name = ZKjMEXP/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@1f44c59d[Running, pool size = 4, active threads = 4, queued tasks = 200, completed tasks = 1442357]]"
},
"status":429
}
Searches fail with EsRejectedExecutionException:
[2020-05-18T15:48:31,645][DEBUG][o.e.a.s.TransportSearchAction] [ZKjMEXP] All shards failed for phase: [query]
org.elasticsearch.ElasticsearchException$1: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@4475dcce on QueueResizingEsThreadPoolExecutor[name = ZKjMEXP/search, queue capacity = 100, min queue capacity = 100, max queue capacity = 1000, frame size = 1000, targeted response rate = 1s, task execution EWMA = 21.5ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@1328861b[Running, pool size = 30, active threads = 30, queued tasks = 384, completed tasks = 19350]]
at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:657) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:131) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:259) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.action.search.InitialSearchPhase.onShardFailure(InitialSearchPhase.java:100) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.action.search.InitialSearchPhase.access$100(InitialSearchPhase.java:48) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.action.search.InitialSearchPhase$2.lambda$onFailure$1(InitialSearchPhase.java:220) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.action.search.InitialSearchPhase$1.doRun(InitialSearchPhase.java:187) [elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) [elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.7.jar:6.8.7]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@4475dcce on QueueResizingEsThreadPoolExecutor[name = ZKjMEXP/search, queue capacity = 100, min queue capacity = 100, max queue capacity = 1000, frame size = 1000, targeted response rate = 1s, task execution EWMA = 21.5ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@1328861b[Running, pool size = 30, active threads = 30, queued tasks = 384, completed tasks = 19350]]
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.8.7.jar:6.8.7]
es_rejected_execution_exception[bulk] is a bulk queue error. It occurs when the number of requests to the Elasticsearch cluster exceeds the bulk queue size (threadpool.bulk.queue_size). Depending on the Elasticsearch version, the bulk queue on each node holds 50 to 200 requests; once the queue is full, new requests are rejected.
Elasticsearch actually provides a separate thread pool per operation type (e.g. index, bulk, get), each with a configured thread count and queue limit; these can be inspected in the settings of the node holding the index.
Two pool types appear here: fixed and scaling. A fixed pool has a fixed size, which can be specified explicitly; a scaling pool resizes dynamically between a configurable minimum and maximum.
Fix:
Without adding nodes, enlarge the node's thread pools and raise the queue limits to handle more requests. This requires changing the cluster configuration and restarting, which is generally risky: the node's hardware (memory, CPU) is unchanged, and merely enlarging the thread pools puts more pressure on the node and may crash it, so use with caution. Reference configuration:
# edit the elasticsearch.yml config file
threadpool.bulk.type: fixed
threadpool.bulk.size: 64
threadpool.bulk.queue_size: 1500
The 429 error es_rejected_execution_exception in Elasticsearch
https://www.playpi.org/2017042601.html
Elasticsearch advanced configuration (JVM settings)
Elasticsearch Guide [7.17] » Set up Elasticsearch » Configuring Elasticsearch » Advanced configuration
https://www.elastic.co/guide/en/elasticsearch/reference/current/advanced-configuration.html
Do not edit /usr/share/elasticsearch/config/jvm.options directly; put custom JVM options in the /usr/share/elasticsearch/config/jvm.options.d/ directory instead.
ES sizes the JVM heap automatically from the node's roles and total memory; using the defaults is recommended.
-Xms and -Xmx must be equal
-Xms and -Xmx must be set to the same size to avoid heap resizing, otherwise ES fails its bootstrap checks at startup:
ERROR: [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
bootstrap check failure [1] of [1]: initial heap size [8589934592] not equal to maximum heap size [17179869184]; this can cause resize pauses
-Xms and -Xmx should not exceed 50% of total memory, because ES has memory consumers besides the JVM.
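For example, a custom heap size can live in a file such as /usr/share/elasticsearch/config/jvm.options.d/heap.options (file name illustrative):
-Xms4g
-Xmx4g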
Frequent young GC
{"type": "server", "timestamp": "2022-03-05T18:28:59,824Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "es-7", "node.name": "es-7-master-0", "message": "[gc][1347730] overhead, spent [311ms] collecting in the last [1s]", "cluster.uuid": "-YDlZAJIQxKModujOTof2g", "node.id": "K_g4Ids0Rz-SHHG2jhp9dQ" }
{"type": "server", "timestamp": "2022-03-05T18:48:00,164Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "es-7", "node.name": "es-7-master-0", "message": "[gc][1348869] overhead, spent [324ms] collecting in the last [1s]", "cluster.uuid": "-YDlZAJIQxKModujOTof2g", "node.id": "K_g4Ids0Rz-SHHG2jhp9dQ" }
Discovery and cluster formation
Elasticsearch Guide [7.17] » Set up Elasticsearch » Discovery and cluster formation
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-discovery.html
Covers node discovery, master election, cluster formation, and publishing the cluster state.
Before ES 7.x: the Bully algorithm (the Zen Discovery cluster coordination subsystem) decides the leader by node ID ordering; simple and crude, it could leave the cluster temporarily unavailable during an election, or fail to elect a master at all.
ES 7.x switched to a new Raft-like election algorithm.
ElasticSearch - old vs. new master election algorithms
https://yemilice.com/2021/06/16/elasticsearch-%E6%96%B0%E8%80%81%E9%80%89%E4%B8%BB%E7%AE%97%E6%B3%95%E5%AF%B9%E6%AF%94/
Quorum-based (majority) decision making
Elasticsearch Guide [7.17] » Set up Elasticsearch » Discovery and cluster formation » Quorum-based decision making
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/modules-discovery-quorums.html
MasterNotDiscoveredException
ES fails at startup with:
"type": "server", "timestamp": "2022-03-30T02:08:52,263Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "es-7", "node.name": "es-7-master-0", "message": "path: /_cluster/health, params: {wait_for_status=green, timeout=1s}",
"stacktrace": ["org.elasticsearch.discovery.MasterNotDiscoveredException: null"
Cause:
After upgrading from 6.x to 7.x, cluster.initial_master_nodes must be set, via environment variable or elasticsearch.yml, to specify the initial master-eligible nodes.
For example:
# must be identical on all three instances
cluster.name: my-cluster
# set to the instance's own ${HOSTNAME}
node.name: es-01
# set to the three instances' ${HOSTNAME}s
discovery.seed_hosts: ["es-01", "es-02", "es-03"]
cluster.initial_master_nodes: ["es-01", "es-02", "es-03"]