OpenSearch
- OpenSearch
- Introduction to OpenSearch
-
  OpenSearch | Database
  index | table
  document (JSON) | row
  shard (each shard stores a subset of all documents in an index) |
- Index configured with:
- mapping:
a collection of fields and the types of those fields
- settings:
include index data like the index name, creation
date, and number of shards
- ...
- Getting
started
- Installation quickstart (using Docker)
- system setup
sudo -i
swapoff -a
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
sysctl -p
- download compose file
mkdir ~/opensearch
cd ~/opensearch
curl -O https://raw.githubusercontent.com/opensearch-project/documentation-website/2.18/assets/examples/docker-compose.yml
- set admin password
cd ~/opensearch
echo "OPENSEARCH_INITIAL_ADMIN_PASSWORD=xxxxxx" > .env
- start
docker compose up -d
- verify (3 lines should appear)
docker compose ps
- if the three lines do not appear, you need to perform the actions of the first step
- dashboards:
- Experiment with sample data
  - mapping:
    - generate your own from an existing index:
      elasticdump --debug --input=https://master:xxx@<my_cluster_host>/myindex --output=myindex_mappings.json --type=mapping
    - or download a sample: ecommerce-field_mappings.json
    - apply:
      curl -H "Content-Type: application/json" -X PUT "https://localhost:9200/ecommerce" -ku admin:<custom-admin-password> --data-binary "@ecommerce-field_mappings.json"
      curl -H "Content-Type: application/json" -X PUT "https://localhost:9200/myindex" -ku admin:<custom-admin-password> --data-binary "@myindex_mappings.json"
  - data:
    - generate your own from an existing index:
      elasticdump --debug --input=https://master:xxx@<my_cluster_host>/myindex --output=myindex.ndjson
    - or download a sample: ecommerce.ndjson
    - apply:
      curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce/_bulk" -ku admin:<custom-admin-password> --data-binary "@ecommerce.ndjson"
      curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/myindex/_bulk" -ku admin:<custom-admin-password> --data-binary "@myindex.ndjson"
- Import / Export
- Managing
indexes
- CRUD
  - template
    - create template index
    - create template data stream:
      PUT _index_template/<datastream_template_name>
      {
        "index_patterns": "logs-nginx",
        "data_stream": {
          "timestamp_field": {
            "name": "request_time"
          }
        },
        "priority": 200,
        "template": {
          "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0
          }
        }
      }
  - index
    - create index (only needed if parameters are non-default):
      PUT <index>
      { "settings": { "number_of_shards": 6, "number_of_replicas": 2 } }
    - rollover index or datastream (can be automated with ISM):
      POST <index_or_datastream>/_rollover
  - data stream
    - create data stream
      - create explicit data stream (will use matching datastream template, if any; error if no matching datastream template):
        PUT _data_stream/<datastream_name>
      - create implicit data stream by creating a document in a new index
    - retrieve data stream
      - retrieve info about all datastreams:
        GET _data_stream
      - retrieve info about a datastream:
        GET _data_stream/<datastream_name>
      - retrieve stats about a datastream:
        GET _data_stream/<datastream_name>/_stats
    - delete data stream
      - delete a data stream:
        DELETE _data_stream/<name_of_data_stream>
  - document
    - create documents
      - if index:
        - exists:
          - a document will be added to the existing index
        - (order?) matches an index template:
          - the specified index will be created, with settings from the template
        - (order?) matches a data stream template:
          - a data stream will be created: <index>
          - a backing index will be created ( .ds-<index>-000001 )
        - does not match a template:
          - the specified index will be created, with default settings
      - specifying id:
        PUT <index>/_doc/<id>
        { "A JSON": "document" }
      - without specifying id:
        POST <index>/_doc
        { "A JSON": "document" }
      - bulk (using ndjson):
        POST _bulk
        { "index": { "_index": "<index>", "_id": "<id>" } }
        { "A JSON": "document" }
    - retrieve documents
      - multiple documents with all fields:
        GET _mget
        {
          "docs": [
            { "_index": "<index>", "_id": "<id>" },
            { "_index": "<index>", "_id": "<id>" }
          ]
        }
      - multiple documents with selected fields:
        GET _mget
        {
          "docs": [
            { "_index": "<index>", "_id": "<id>", "_source": "field1" },
            { "_index": "<index>", "_id": "<id>", "_source": "field2" }
          ]
        }
    - search documents
      - search documents:
        GET <index>/_search
        {
          "query": {
            "match": {
              "message": "login"
            }
          }
        }
    - check documents
      - verify whether a document exists:
        HEAD <index>/_doc/<id>
    - update documents
      - total update (replace), specifying id (same as creating a new document with the same id):
        PUT <index>/_doc/<id>
        { "A JSON": "document" }
      - partial update, specifying id:
        POST <index>/_update/<id>
        { "doc": { "A JSON": "document" } }
      - conditional update ( upsert ), specifying id (if it exists: update its info with doc; if it does not exist: create a document with upsert):
        POST movies/_update/2
        {
          "doc": {
            "title": "Castle in the Sky"
          },
          "upsert": {
            "title": "Only Yesterday",
            "genre": ["Animation", "Fantasy"],
            "date": 1993
          }
        }
    - delete documents
      - delete a document, specifying id:
        DELETE <index>/_doc/<id>
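The bulk request body is NDJSON: an action line followed by a document line for each document, every line terminated by a newline. A minimal Python sketch (function and index names are hypothetical) of how such a payload is assembled:

```python
import json

def build_bulk_payload(index, docs):
    """Build an NDJSON _bulk body: for each (id, doc) pair, emit an
    action line and then the document line; _bulk requires a trailing
    newline after the last line."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

payload = build_bulk_payload("movies", [
    ("1", {"title": "Only Yesterday"}),
    ("2", {"title": "Castle in the Sky"}),
])
print(payload)
```

The resulting string can be sent as the body of POST _bulk with Content-Type application/x-ndjson.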
- Templates
- when an index or a data stream is created (explicitly, or implicitly when a
document is created), OpenSearch checks whether the name matches any
template. If it matches, it will create the index or the data stream with
the configuration specified in the template
- Types
  - Index template: useful, for example, when AWS Firehose automatically
    creates indexes with rotation (daily, weekly, monthly...)
  - Data stream template: configures a set of indexes as a data stream
- Data
streams
- "A data stream is internally composed of multiple
backing indexes. Search requests are routed to all the
backing indexes, while indexing requests are routed to
the latest write index. ISM policies let you
automatically handle index rollovers or deletions."
- one of the fields must be "
@timestamp "
- Info
- steps
- create a data stream template
- create a data stream
- ingest data into data stream
- search documents
- rollover a data stream
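The steps above map to a fixed sequence of REST calls. A sketch of that order as (method, path) pairs, using placeholder names (logs_template and logs-nginx are illustrative, not from the original):

```python
# Hypothetical data stream lifecycle, expressed as (method, path) pairs
# in the order the steps above prescribe.
STEPS = [
    ("PUT", "_index_template/logs_template"),  # 1. create a data stream template
    ("PUT", "_data_stream/logs-nginx"),        # 2. create a data stream
    ("POST", "logs-nginx/_doc"),               # 3. ingest data into the data stream
    ("GET", "logs-nginx/_search"),             # 4. search documents
    ("POST", "logs-nginx/_rollover"),          # 5. rollover the data stream
]

for method, path in STEPS:
    print(method, path)
```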
- Cluster
- Optimal sizes
-
  - minimum storage (see "Calculating storage requirements"):
    - formula:
      minimum_storage = Source_data * (1 + number_of_replicas) * (1 + indexing_overhead) / (1 - Linux_reserved_space) / (1 - OpenSearch_service_overhead)
      minimum_storage = Source_data * (1 + number_of_replicas) * 1.45
    - real usage:
      _cat/indices?v
      _cat/allocation?v
      - Dashboards: Index Management > Indexes
        - Total size (= Size of primaries * (1 + number_of_replicas) + overhead)
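As a sanity check on the formula, a small Python helper. The default percentages (10% indexing overhead, 5% Linux reserved space, 20% OpenSearch service overhead) are assumptions that reproduce the 1.45 shortcut factor, since 1.1 / (0.95 * 0.8) ≈ 1.45:

```python
def minimum_storage(source_data_gb, number_of_replicas,
                    indexing_overhead=0.10,
                    linux_reserved_space=0.05,
                    opensearch_service_overhead=0.20):
    """minimum_storage = Source_data * (1 + number_of_replicas)
       * (1 + indexing_overhead) / (1 - Linux_reserved_space)
       / (1 - OpenSearch_service_overhead)"""
    return (source_data_gb * (1 + number_of_replicas)
            * (1 + indexing_overhead)
            / (1 - linux_reserved_space)
            / (1 - opensearch_service_overhead))

# 100 GB of source data with 1 replica: about Source_data * 2 * 1.45 = 290 GB
print(minimum_storage(100, 1))
```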
  - number of shards (see "Choosing the number of shards"):
    - the number of primary shards cannot be changed for an existing index
    - default:
      - AWS OpenSearch Service: 5 primary shards + 1 replica = 10 shards
      - open source OpenSearch: 1 primary shard + 1 replica = 2 shards
    - optimal size of a shard:
      - where search latency is a key performance objective: 10-30 GiB / shard
      - for write-heavy workloads such as log analytics: 30-50 GiB / shard
    - formula:
      number_of_primary_shards = (Source_data + room_to_grow) * (1 + indexing_overhead) / desired_shard_size
    - maximum shards per node
      - default: 1000 shards / node
        cluster.max_shards_per_node
    - ...
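The shard-count formula, as a Python helper that rounds up to a whole number of shards (the 10% indexing overhead default and the example figures are assumptions, not from the original):

```python
import math

def number_of_primary_shards(source_data_gb, room_to_grow_gb,
                             desired_shard_size_gb,
                             indexing_overhead=0.10):
    """number_of_primary_shards = (Source_data + room_to_grow)
       * (1 + indexing_overhead) / desired_shard_size,
       rounded up since shards come in whole units."""
    return math.ceil((source_data_gb + room_to_grow_gb)
                     * (1 + indexing_overhead)
                     / desired_shard_size_gb)

# 200 GB of logs expected to grow by 100 GB, targeting 30 GiB shards
# (the log-analytics range above): (200 + 100) * 1.1 / 30 = 11
print(number_of_primary_shards(200, 100, 30))
```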
  - Choosing instance types and testing
- Sizing
Amazon OpenSearch Service domains
- Configuring
a multi-AZ domain in Amazon OpenSearch Service
- Shard distribution
- each AZ must hold, summing over all the data nodes of that AZ,
all the shards, whether primaries or replicas
- standby
  - with standby: one of the AZs is on standby
  - without standby: all AZs are active, but the user must properly
    manage the number of primaries and replicas (at least 1 replica)
- Availability zone disruptions
- Index
State Management
- ...
- Shards and nodes
- Each shard stores a subset of all documents in an index
- Shards are used for even distribution across nodes in a
cluster. A good rule of thumb is to limit shard size to
10–50 GB. (index 1: split into 2 shards; index 2: split into
4 shards)
- Primary and replica shards (index 1: 2 primary shards + 2
replica shards; index 2: 4 primary shards + 4 replica
shards). Default: 5 primary shards + 1 replica = 10 shards
- ...
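The shard counts quoted above follow from every replica being a full copy of every primary: total shards = primaries * (1 + number_of_replicas). A quick check of the three examples:

```python
def total_shards(primaries, number_of_replicas):
    # each replica setting adds a complete copy of every primary shard
    return primaries * (1 + number_of_replicas)

print(total_shards(2, 1))  # index 1: 2 primaries + 2 replicas = 4
print(total_shards(4, 1))  # index 2: 4 primaries + 4 replicas = 8
print(total_shards(5, 1))  # default: 5 primaries, 1 replica = 10
```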
- Clients
- OpenSearch Dashboards
- Self-hosted
- Amazon OpenSearch Service
- Go to details and click on url
- Problems
  {"Message":"User: anonymous is not
  authorized to perform: es:ESHttpGet with an
  explicit deny in a resource-based policy"}
  - Solution
    - Amazon OpenSearch Service / Domains
      - select your domain and go to tab
        "Security configuration"
      - Access policy: in the policy,
        ...
        change "Effect": "Deny"
        to "Effect": "Allow"
- OpenSearch
Dashboards quickstart guide
- Dark mode
- Management / Dashboards Management / Advanced
settings / Appearance / Dark mode
- Dev
Tools console
- Discover
- ...
- REST API
- Python
- ...
-
  - health
    - REST API (curl -X ...):
      GET "https://localhost:9200/_cluster/health"
    - Dev Tools (OpenSearch Dashboards):
      GET _cluster/health
  - index a document (add an entry to an index; index students is automatically created)
    - REST API:
      PUT https://<host>:<port>/<index-name>/_doc/<document-id>
    - Dev Tools:
      PUT /students/_doc/1
      {
        "name": "John Doe",
        "gpa": 3.89,
        "grad_year": 2022
      }
  - dynamic mapping
    - Dev Tools:
      GET /students/_mapping
  - Search your data
    - Dev Tools:
      GET /students/_search
      GET /students/_search
      {
        "query": {
          "match_all": {}
        }
      }
  - Updating documents (total update)
    - Dev Tools:
      PUT /students/_doc/1
      {
        "name": "John Doe",
        "gpa": 3.91,
        "grad_year": 2022,
        "address": "123 Main St."
      }
  - Updating documents (partial update)
    - Dev Tools:
      POST /students/_update/1
      {
        "doc": {
          "gpa": 3.74,
          "address": "123 Main St."
        }
      }
  - Delete a document
    - Dev Tools:
      DELETE /students/_doc/1
  - Delete index
    - Dev Tools:
      DELETE /students
  - Index mapping and settings
    - Dev Tools:
      PUT /students
      {
        "settings": {
          "index.number_of_shards": 1
        },
        "mappings": {
          "properties": {
            "name": {
              "type": "text"
            },
            "grad_year": {
              "type": "date"
            }
          }
        }
      }
      GET /students/_mapping
  - Bulk ingestion
    - REST API:
      POST "https://localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
      { "create": { "_index": "students", "_id": "2" } }
      { "name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025 }
      { "create": { "_index": "students", "_id": "3" } }
      { "name": "Jane Doe", "gpa": 3.52, "grad_year": 2024 }
      '
    - Dev Tools:
      POST _bulk
      { "create": { "_index": "students", "_id": "2" } }
      { "name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025 }
      { "create": { "_index": "students", "_id": "3" } }
      { "name": "Jane Doe", "gpa": 3.52, "grad_year": 2024 }
  - Ingest from local json files (sample mapping)
    - REST API:
      curl -H "Content-Type: application/json" -X PUT "https://localhost:9200/ecommerce" -ku admin:<custom-admin-password> --data-binary "@ecommerce-field_mappings.json"
  - Ingest from local json files (sample data)
    - REST API:
      curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce/_bulk" -ku admin:<custom-admin-password> --data-binary "@ecommerce.ndjson"
  - Query
    - Dev Tools:
      GET /ecommerce/_search
      {
        "query": {
          "match": {
            "customer_first_name": "Sonya"
          }
        }
      }
  - Query string queries
    - Dev Tools:
      GET /students/_search?q=name:john
- Ingest your data into OpenSearch
- Search your data
- ...
- Mapping
parameters
- Analyzer:
- source text -> 1. char_filter -> 2. tokenizer -> 3. token filter -> terms
- Classification:
- Testing
an analyzer
- Examples
- url
- Analyze
URL paths to search individual elements in Amazon
Elasticsearch Service
PUT scratch_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_clean": {
          "type": "mapping",
          "mappings": ["/ => \\u0020", "s3: => \\u0020"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "simple_pattern",
          "pattern": "[a-zA-Z0-9\\.\\-]*"
        }
      },
      "analyzer": {
        "s3_path_analyzer": {
          "char_filter": ["my_clean"],
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "s3_key": {
        "type": "text",
        "analyzer": "s3_path_analyzer"
      }
    }
  }
}
PUT scratch_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "url_clean": {
          "type": "mapping",
          "mappings": ["/ => \\u0020", "https: => \\u0020"]
        }
      },
      "tokenizer": {
        "url_tokenizer": {
          "type": "simple_pattern",
          "pattern": "[a-zA-Z0-9\\.\\-]*"
        }
      },
      "analyzer": {
        "url_path_analyzer": {
          "char_filter": ["url_clean"],
          "tokenizer": "url_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_url_field": {
        "type": "text",
        "analyzer": "url_path_analyzer"
      }
    }
  }
}
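To see what the s3_path_analyzer above would produce, here is a rough Python simulation of the three stages (char filter, tokenizer, token filter). The real analysis chain runs inside OpenSearch, so this is only an illustration; note the regex uses `+` because the simple_pattern tokenizer only emits non-empty tokens:

```python
import re

def s3_path_analyze(text):
    # 1. char_filter "my_clean": map "s3:" and "/" to spaces
    text = text.replace("s3:", " ").replace("/", " ")
    # 2. tokenizer "my_tokenizer": keep runs matching [a-zA-Z0-9.\-]
    tokens = re.findall(r"[a-zA-Z0-9.\-]+", text)
    # 3. token filter: lowercase
    return [t.lower() for t in tokens]

print(s3_path_analyze("s3://My-Bucket/Logs/2024/App-01.log"))
# ['my-bucket', 'logs', '2024', 'app-01.log']
```

Searching the s3_key field for any individual element (bucket, folder, or file name) then matches documents containing that path.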
- Normalizer
-
  - set:
    PUT my_index
    {
      "settings": {
        "analysis": {
          "char_filter": {
            "my_char_filter": {}
          },
          "tokenizer": {
            "my_tokenizer": {}
          },
          "filter": {
            "my_filter": {}
          },
          "analyzer": {
            "my_analyzer": {
              "type": "custom",
              "char_filter": ["my_char_filter"],
              "tokenizer": "my_tokenizer",
              "filter": ["my_filter"]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "my_field": {
            "analyzer": "my_analyzer"
          }
        }
      }
    }
  - get:
    GET my_index/_settings
    GET my_index/_mapping
- ...
http://www.francescpinyol.cat/opensearch.html
Primera versió: / First version: 9.XI.2024
Darrera modificació: 19 de gener de 2025 / Last update: 19th January 2025