OpenSearch

General

  • OpenSearch
    • Introduction to OpenSearch
      • OpenSearch term and its database equivalent:
        • index: table
        • document (JSON): row
        • shard: fragment

      • Index configured with:
        • mapping: a collection of fields and the types of those fields
        • settings: index-level data such as the index name, creation date, and number of shards
    • ...

Server

  • Getting started
    • Installation quickstart (using Docker)
      1. system setup
        • sudo -i
        • swapoff -a
        • echo "vm.max_map_count=262144" >>/etc/sysctl.conf
        • sysctl -p
      2. download compose file
        • mkdir ~/opensearch
        • cd ~/opensearch
        • curl -O https://raw.githubusercontent.com/opensearch-project/documentation-website/2.18/assets/examples/docker-compose.yml
      3. set admin password
        • cd ~/opensearch
        • echo "OPENSEARCH_INITIAL_ADMIN_PASSWORD=xxxxxx" >.env
      4. start
        • docker compose up -d
      5. verify (three containers should be listed)
        • docker compose ps
      6. Experiment with sample data

        • apply mapping
          • generate your own from an existing index:
            • elasticdump --debug --input=https://master:xxx@<my_cluster_host>/myindex --output=myindex_mappings.json --type=mapping
            • curl -H "Content-Type: application/json" -X PUT "https://localhost:9200/myindex" -ku admin:<custom-admin-password> --data-binary "@myindex_mappings.json"
          • or download a sample (ecommerce-field_mappings.json):
            • curl -H "Content-Type: application/json" -X PUT "https://localhost:9200/ecommerce" -ku admin:<custom-admin-password> --data-binary "@ecommerce-field_mappings.json"
        • load data
          • generate your own from an existing index:
            • elasticdump --debug --input=https://master:xxx@<my_cluster_host>/myindex --output=myindex.ndjson
            • curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/myindex/_bulk" -ku admin:<custom-admin-password> --data-binary "@myindex.ndjson"
          • or download a sample (ecommerce.ndjson):
            • curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce/_bulk" -ku admin:<custom-admin-password> --data-binary "@ecommerce.ndjson"
  • Import / Export
  • ...

Security

Clients

  • Clients
    • OpenSearch Dashboards
      • Self-hosted
      • Amazon OpenSearch Service
        • Go to the domain details and click the URL
        • Problems
          • {"Message":"User: anonymous is not authorized to perform: es:ESHttpGet with an explicit deny in a resource-based policy"}
            • Solution
              • Amazon OpenSearch Service / Domains
                • select your domain and go to the "Security configuration" tab
                • Access policy:
                  • ...
                    change "Effect": "Deny" to "Effect": "Allow"
      • OpenSearch Dashboards quickstart guide
      • Dark mode
        • Management / Dashboards Management / Advanced settings / Appearance / Dark mode
      • Dev Tools console
      • Discover
      • ...
    • REST API
    • Python
    • ...

  • REST API (curl -X ...) and the equivalent Dev Tools (OpenSearch Dashboards) request
    Health
    • REST API: GET "https://localhost:9200/_cluster/health"
    • Dev Tools: GET _cluster/health
    Index a document
    (adds an entry to an index; the students index is created automatically)
    • REST API: PUT https://<host>:<port>/<index-name>/_doc/<document-id>
    • Dev Tools: PUT /students/_doc/1
    {
      "name": "John Doe",
      "gpa": 3.89,
      "grad_year": 2022
    }
    Dynamic mapping

    GET /students/_mapping
    Search your data

    GET /students/_search
    or, equivalently, with an explicit match_all query:
    GET /students/_search
    {
      "query": {
        "match_all": {}
      }
    }

    Updating documents (full replacement)

    PUT /students/_doc/1
    {
      "name": "John Doe",
      "gpa": 3.91,
      "grad_year": 2022,
      "address": "123 Main St."
    }

    Updating documents (partial update)
    POST /students/_update/1/
    {
      "doc": {
        "gpa": 3.74,
        "address": "123 Main St."
      }
    }
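
The difference between the two update styles above can be sketched with a toy in-memory model. This is only an illustration of the semantics, not how OpenSearch stores documents; the function names and the dictionary store are hypothetical, and the merge here is shallow, whereas a real partial update also merges nested objects.

```python
import copy

def replace_document(store, doc_id, body):
    """Model PUT /<index>/_doc/<id>: the stored document becomes exactly `body`."""
    store[doc_id] = copy.deepcopy(body)

def partial_update(store, doc_id, partial):
    """Model POST /<index>/_update/<id> with a "doc" body: merge fields into the stored document."""
    store[doc_id].update(copy.deepcopy(partial))

store = {}  # hypothetical stand-in for the students index

# Initial indexing, then a full replacement that must resend every field.
replace_document(store, "1", {"name": "John Doe", "gpa": 3.89, "grad_year": 2022})
replace_document(store, "1", {"name": "John Doe", "gpa": 3.91, "grad_year": 2022,
                              "address": "123 Main St."})

# Partial update: only the listed fields change, the rest are kept.
partial_update(store, "1", {"gpa": 3.74, "address": "123 Main St."})
print(store["1"])
```

With a full replacement, any field left out of the new body is gone afterwards; with a partial update, fields that are not mentioned keep their previous values.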
    Delete a document
    DELETE /students/_doc/1
    Delete index

    DELETE /students
    Index mapping and settings
    PUT /students
    {
      "settings": {
        "index.number_of_shards": 1
      },
      "mappings": {
        "properties": {
          "name": {
            "type": "text"
          },
          "grad_year": {
            "type": "date"
          }
        }
      }
    }

    GET /students/_mapping
    Bulk ingestion
    • REST API: POST "https://localhost:9200/_bulk" -H 'Content-Type: application/json' -d'
    { "create": { "_index": "students", "_id": "2" } }
    { "name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025 }
    { "create": { "_index": "students", "_id": "3" } }
    { "name": "Jane Doe", "gpa": 3.52, "grad_year": 2024 }
    '
    • Dev Tools: POST _bulk
    { "create": { "_index": "students", "_id": "2" } }
    { "name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025 }
    { "create": { "_index": "students", "_id": "3" } }
    { "name": "Jane Doe", "gpa": 3.52, "grad_year": 2024 }
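
A bulk body is newline-delimited JSON: one action line, then one source line, per document, and the body has to end with a newline. A minimal sketch of building such a payload in Python follows; the helper name to_bulk_ndjson is hypothetical, and it only builds the text, it does not send it.

```python
import json

def to_bulk_ndjson(index, docs, action="create"):
    """Build a _bulk request body from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        # Action line: which operation, on which index and id.
        lines.append(json.dumps({action: {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"

payload = to_bulk_ndjson("students", [
    ("2", {"name": "Jonathan Powers", "gpa": 3.85, "grad_year": 2025}),
    ("3", {"name": "Jane Doe", "gpa": 3.52, "grad_year": 2024}),
])
print(payload, end="")
```

The resulting text can be saved to a file and sent with curl --data-binary, in the same way as the sample .ndjson files.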
    Ingest from local JSON files (sample mapping)
    • REST API: curl -H "Content-Type: application/json" -X PUT "https://localhost:9200/ecommerce" -ku admin:<custom-admin-password> --data-binary "@ecommerce-field_mappings.json"
    Ingest from local JSON files (sample data)
    • REST API: curl -H "Content-Type: application/x-ndjson" -X PUT "https://localhost:9200/ecommerce/_bulk" -ku admin:<custom-admin-password> --data-binary "@ecommerce.ndjson"
    Query
    GET /ecommerce/_search
    {
      "query": {
        "match": {
          "customer_first_name": "Sonya"
        }
      }
    }
    Query string queries
    GET /students/_search?q=name:john
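
The REST calls in this section can also be prepared from Python with only the standard library. The sketch below builds the request object without sending it; build_request is a hypothetical helper, and the host, user and password are the placeholders used in the curl examples.

```python
import base64
import json
import urllib.request

def build_request(method, base_url, path, user, password, body=None):
    """Prepare an HTTP request with basic auth and an optional JSON body."""
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers["Authorization"] = f"Basic {token}"
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, data=data, headers=headers, method=method)

doc = {"name": "John Doe", "gpa": 3.89, "grad_year": 2022}
req = build_request("PUT", "https://localhost:9200", "/students/_doc/1",
                    "admin", "<custom-admin-password>", doc)
print(req.get_method(), req.full_url)

# To actually send it (not executed here), something like:
#   import ssl
#   ctx = ssl.create_default_context()
#   ctx.check_hostname = False       # same effect as curl -k:
#   ctx.verify_mode = ssl.CERT_NONE  # accept the self-signed certificate
#   with urllib.request.urlopen(req, context=ctx) as resp:
#       print(resp.read().decode("utf-8"))
```

Disabling certificate verification is only reasonable against the local quickstart cluster with its demo certificates.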



  • Ingest your data into OpenSearch
  • Search your data
  • ...

Indexes

Query DSL

  • query
    Boolean query
    • must: AND
    • must_not: NOT
    • should: OR
    • filter: AND
    GET _search
    {
      "query": {
        "bool": {
          "must": [
            {}
          ],
          "must_not": [
            {}
          ],
          "should": [
            {}
          ],
          "filter": {}
        }
      }
    }
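
As an illustration of how the four clauses combine, here is a minimal sketch that assembles a Boolean query body in Python. The helper bool_query is hypothetical; the field names come from the students example, and the choice of clauses is only an example.

```python
import json

def bool_query(must=None, must_not=None, should=None, filter_=None):
    """Build a bool query body, leaving out clauses that were not given."""
    clauses = {"must": must, "must_not": must_not, "should": should, "filter": filter_}
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}

body = bool_query(
    must=[{"match": {"name": "doe"}}],            # AND, contributes to relevance
    must_not=[{"term": {"grad_year": 2022}}],     # NOT
    filter_=[{"range": {"gpa": {"gte": 3.5}}}],   # AND, no relevance scoring
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET /students/_search in the Dev Tools console.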

    Filter context / Term-level queries
    • no relevance
    • cached
    • exact matches
    • not for text (except keyword)

    Query context / Full-text queries
    • relevance
    • not cached
    • non-exact matches
    • for text

    term
    • value
    • boost
    • case_insensitive

    terms

    terms_set
    • terms
    • minimum_should_match_field
    • minimum_should_match_script
    • boost

    ids
    • values
    • boost

    range
    • operators
      • gte
      • gt
      • lte
      • lt
    • format
    • relation
    • boost
    • time_zone

    prefix
    • value
    • boost
    • case_insensitive
    • rewrite

    exists
    • boost

    fuzzy
    • value
    • boost
    • fuzziness
    • max_expansions
    • prefix_length
    • rewrite
    • transpositions

    wildcard
    • value
    • boost
    • case_insensitive
    • rewrite

    regexp
    • value
    • boost
    • case_insensitive
    • flags
    • max_determinized_states
    • rewrite

    intervals
    • rules and their parameters
      • match
        • query
        • analyzer
        • filter
        • max_gaps
        • ordered
        • use_field
      • prefix
        • ...
      • wildcard
      • fuzzy
      • all_of
      • any_of

    match
    • query
    • auto_generate_synonyms_phrase_query
    • analyzer
    • boost
    • enable_position_increments
    • fuzziness
    • fuzzy_rewrite
    • fuzzy_transpositions
    • lenient
    • max_expansions
    • minimum_should_match
    • operator
    • prefix_length
    • zero_terms_query

    match_bool_prefix
    • query
    • analyzer
    • fuzziness
    • fuzzy_rewrite
    • fuzzy_transpositions
    • max_expansions
    • minimum_should_match
    • operator
    • prefix_length

    match_phrase
    • query
    • analyzer
    • slop
    • zero_terms_query

    match_phrase_prefix
    • query
    • analyzer
    • max_expansions
    • slop

    multi_match
    • query
    • auto_generate_synonyms_phrase_query
    • analyzer
    • boost
    • fields
    • fuzziness
    • fuzzy_rewrite
    • fuzzy_transpositions
    • lenient
    • max_expansions
    • minimum_should_match
    • operator
    • prefix_length
    • slop
    • tie_breaker
    • type
    • zero_terms_query

    query_string
    • query
    • allow_leading_wildcard
    • analyze_wildcard
    • analyzer
    • auto_generate_synonyms_phrase_query
    • boost
    • default_field
    • default_operator
    • enable_position_increments
    • fields
    • fuzziness
    • fuzzy_max_expansions
    • fuzzy_transpositions
    • max_determinized_states
    • minimum_should_match
    • phrase_slop
    • quote_analyzer
    • quote_field_suffix
    • rewrite
    • time_zone

    simple_query_string

    aggs


  • ...

Text analysis

  • Mapping parameters
  • Analyzer:
    • source text -> 1. char_filter -> 2. tokenizer -> 3. token filter -> terms
    • Classification:
    • Testing an analyzer
    • Examples
      • url
        • Analyze URL paths to search individual elements in Amazon Elasticsearch Service
        • PUT scratch_index
          {
            "settings": {
              "analysis": {
                "char_filter": {
                  "my_clean": {
                    "type": "mapping",
                    "mappings": ["/ => \\u0020",
                                 "s3: => \\u0020"]
                  }
                },
                "tokenizer": {
                  "my_tokenizer": {
                    "type": "simple_pattern",
                    "pattern": "[a-zA-Z0-9\\.\\-]*"
                  }
                },
                "analyzer": {
                  "s3_path_analyzer": {
                    "char_filter": ["my_clean"],
                    "tokenizer": "my_tokenizer",
                    "filter": ["lowercase"]
                  }
                }
              }
            },
            "mappings": {
              "properties": {
                "s3_key": {
                  "type": "text",
                  "analyzer": "s3_path_analyzer"
                }
              }
            }
          }
        • PUT scratch_index
          {
            "settings": {
              "analysis": {
                "char_filter": {
                  "url_clean": {
                    "type": "mapping",
                    "mappings": ["/ => \\u0020",
                                 "https: => \\u0020"]
                  }
                },
                "tokenizer": {
                  "url_tokenizer": {
                    "type": "simple_pattern",
                    "pattern": "[a-zA-Z0-9\\.\\-]*"
                  }
                },
                "analyzer": {
                  "url_path_analyzer": {
                    "char_filter": ["url_clean"],
                    "tokenizer": "url_tokenizer",
                    "filter": ["lowercase"]
                  }
                }
              }
            },
            "mappings": {
              "properties": {
                "my_url_field": {
                  "type": "text",
                  "analyzer": "url_path_analyzer"
                }
              }
            }
          }
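
To see what terms the URL analyzer above is meant to produce, the three stages (char_filter, tokenizer, token filter) can be approximated in plain Python. This is only a rough sketch, not the engine's implementation: it uses + instead of * in the pattern so that empty matches are skipped, and the sample URL is made up. The authoritative check is the _analyze API on the index itself.

```python
import re

# 1. char_filter "url_clean": replace "/" and "https:" with a space.
CHAR_MAPPINGS = {"/": " ", "https:": " "}
# 2. tokenizer "url_tokenizer": runs of letters, digits, dots and hyphens.
TOKEN_PATTERN = re.compile(r"[a-zA-Z0-9.\-]+")

def analyze(text):
    """Approximate url_path_analyzer: char filter -> tokenizer -> lowercase."""
    for source, target in CHAR_MAPPINGS.items():
        text = text.replace(source, target)
    tokens = TOKEN_PATTERN.findall(text)
    # 3. token filter "lowercase".
    return [token.lower() for token in tokens]

print(analyze("https://www.Example.com/path/To/file-1.txt"))
# → ['www.example.com', 'path', 'to', 'file-1.txt']
```

Each element of the path becomes its own term, so a search for a single path element can match the whole URL field.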
  • Normalizer
  • Set (PUT) and get (GET) analysis settings
    PUT my_index
    {
      "settings": {
        "analysis": {
          "char_filter": {
            "my_char_filter": {}
          },
          "tokenizer": {
            "my_tokenizer": {}
          },
          "filter": {
            "my_filter": {}
          },
          "analyzer": {
            "my_analyzer": {
              "type": "custom",
              "char_filter": ["my_char_filter"],
              "tokenizer": "my_tokenizer",
              "filter": ["my_filter"]
            }
          }
        }
      },
      "mappings": {
        "properties": {
          "my_field": {
            "type": "text",
            "analyzer": "my_analyzer"
          }
        }
      }
    }
    GET my_index/_settings
    GET my_index/_mapping
  • ...

http://www.francescpinyol.cat/opensearch.html
First version: 9 November 2024
Last update: 1 December 2024
