search
See the ElasticSearch documentation for configuration and Salt installation.
Index
The search app has a SearchIndex class in search/search.py. This class does the hard work of searching.
As a developer, you just need to create a new class for each of your indexes. Sample index classes are here:
ContactSearch in https://gitlab.com/kb/contact/blob/master/contact/search.py
TicketSearch in https://gitlab.com/kb/crm/blob/master/crm/search.py
ElasticSearch 8.x
Changes required for this version (e.g. for ContactSearch) include:
Remove the document_type property (look in the Mixin class).
Rename _settings to settings.
Rename configuration to mappings and make it a method (not a property).
Update mappings to return just the properties i.e. remove the settings, mappings and document_type keys.
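As a rough illustration only of the resulting shape (the class below is a simplified assumption, not the project's actual code, and the real class also uses the project's search mixin):

# Hypothetical sketch of the ElasticSearch 8.x shape described above; field
# names and analyzer settings are assumptions.
class ContactSearch:

    @property
    def settings(self):
        # was '_settings'; analyzer configuration only
        return {
            "analysis": {
                "analyzer": {
                    "autocomplete": {
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    },
                },
            },
        }

    def mappings(self):
        # was the 'configuration' property; now a method returning just the
        # 'properties' (no 'settings', 'mappings' or 'document_type' keys)
        return {
            "name": {"type": "text", "analyzer": "autocomplete"},
        }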
For sample code, see:
Delete
We have updated our strategy to include deleted rows in the index, with the option to include them in a search.
For more information on our delete strategy, see the following links:
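As a minimal illustration of the idea (the deleted field and the query shape below are assumptions, not the project's actual code):

# Hypothetical sketch: exclude soft-deleted rows from a query unless the
# caller asks for them.
def build_query(criteria, include_deleted=False):
    query = {"bool": {"must": [{"match": {"name": criteria}}]}}
    if not include_deleted:
        # 'deleted' is an assumed boolean field on each indexed document
        query["bool"]["filter"] = [{"term": {"deleted": False}}]
    return {"query": query}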
View
The search app also has a SearchViewMixin which can be used to search the indexes e.g. dash.SearchView in https://gitlab.com/kb/kbsoftware_couk/blob/master/dash/views.py#L66 which uses ContactSearch and TicketSearch (defined above).
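For illustration only, a view using the mixin might look something like this; the search_classes attribute is an assumption, so check dash.SearchView for the actual hooks:

# Hypothetical sketch only - see dash.SearchView (linked above) for the real
# usage of SearchViewMixin.
from django.views.generic import TemplateView

from contact.search import ContactSearch
from crm.search import TicketSearch
from search.views import SearchViewMixin


class SearchView(SearchViewMixin, TemplateView):
    template_name = "dash/search.html"
    # assumed attribute listing the indexes to query
    search_classes = [ContactSearch, TicketSearch]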
Rebuild and Refresh
To rebuild your index, create a task e.g. rebuild_contact_index in https://gitlab.com/kb/contact/blob/master/contact/tasks.py
Create a management command to call the rebuild_contact_index task e.g. rebuild_contact_index.py
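A minimal sketch of such a command, assuming the task name above (the real command lives in the contact app):

# contact/management/commands/rebuild_contact_index.py (hypothetical sketch)
from django.core.management.base import BaseCommand

from contact.tasks import rebuild_contact_index


class Command(BaseCommand):
    help = "Queue a rebuild of the contact search index"

    def handle(self, *args, **options):
        # queue the Celery task created above
        rebuild_contact_index.delay()
        self.stdout.write("Queued 'rebuild_contact_index'")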
To refresh your index, use the same idea, but call refresh instead e.g:
from contact.search import ContactIndex
from search.search import SearchIndex

index = SearchIndex(ContactIndex())
count = index.refresh()
Search
Issues are much easier to diagnose if you can run a simple management command to perform a search query:
Create a management command called search_contact_index e.g. search_contact_index.py
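A minimal sketch, assuming the SearchIndex.search call shown later in this document:

# contact/management/commands/search_contact_index.py (hypothetical sketch)
from django.core.management.base import BaseCommand

from contact.search import ContactIndex
from search.search import SearchIndex


class Command(BaseCommand):
    help = "Run a simple query against the contact search index"

    def add_arguments(self, parser):
        parser.add_argument("criteria", help="search text e.g. a name or postcode")

    def handle(self, *args, **options):
        index = SearchIndex(ContactIndex())
        result = index.search(options["criteria"])
        self.stdout.write(str(result))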
Update
Create an index update function e.g. update_contact_index in https://gitlab.com/kb/contact/blob/master/contact/tasks.py
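A rough sketch of the task; the Contact model import and the SearchIndex method used to re-index a single record are assumptions, so check the real task for the correct calls:

# contact/tasks.py (hypothetical sketch)
from celery import shared_task

from contact.models import Contact
from contact.search import ContactIndex
from search.search import SearchIndex


@shared_task
def update_contact_index(pk):
    # re-index a single contact after create/update
    contact = Contact.objects.get(pk=pk)
    index = SearchIndex(ContactIndex())
    index.update(contact)  # assumed method name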
In your create and update views, call the update task e.g:
from django.db import transaction
from contact.tasks import update_contact_index
transaction.on_commit(lambda: update_contact_index.delay(self.object.pk))
For a simple UpdateView, the minimum viable code is as follows:
def form_valid(self, form):
result = super().form_valid(form)
transaction.on_commit(lambda: update_contact_index.delay(self.object.pk))
return result
ElasticSearch
Prerequisites
Set up Celery (using Redis)…
Install
Follow the ElasticSearch - Getting Started instructions…
In your requirements/base.txt, add the following:
elasticsearch
Tip
Find the version number in Requirements
In settings/production.py (after CELERY_DEFAULT_QUEUE):
from celery.schedules import crontab
CELERYBEAT_SCHEDULE = {
'rebuild_contact_index': {
'task': 'contact.tasks.rebuild_contact_index',
'schedule': crontab(minute='10', hour='1'),
},
}
Note
Remember to use the correct pattern for transactions when queuing search index updates. For details, see Transactions
Diagnostics
Analyze
To understand how your field is being analyzed (this example is from the Contact app):
from search.search import SearchIndex
from contact.search import ContactIndex

# drop and re-create the index
index = SearchIndex(ContactIndex())
index.drop_create()

# show how each analyzer tokenizes the sample text
import json
print(json.dumps(index.analyze('autocomplete', 'EX2 2AB'), indent=4))
print(json.dumps(index.analyze('autocomplete_search', 'EX2 2AB'), indent=4))
Note
This example uses the ContactSearch index and the autocomplete analyzers:
https://gitlab.com/kb/contact/blob/master/contact/search.py#L61
For other diagnostics, see Diagnostics…
Explain
To understand the score for search results:
Make sure DEBUG is set to True in your settings.
Add explain to your call to the search method.
e.g:
result = search_index.search(
criteria,
explain=True,
)
Tip
You could add this to the SearchViewMixin class in search/views.py.
The _explain method in search/search.py will write a time-stamped file containing the results e.g. elastic-explain-2019-01-07-13-20-46.json.
Maintenance
To manually update the index, run the management command created earlier (see rebuild_contact_index.py).
The flush process frees memory by writing any in-memory index data to storage and clearing the internal transaction log:
curl localhost:9200/_flush
Test
To check the install:
curl -X GET 'http://localhost:9200/?pretty'
Query
Note
Replace hatherleigh_info with your site name.
Install httpie:
pip install httpie
Create a json file containing your query e.g. query.json:
{
"query": {
"match": {
"part": "B020"
}
}
}
In this example, we are searching for B020.
Run the query:
http GET http://localhost:9200/hatherleigh_info/_search < query.json
Explain
To explain the query above:
http GET "http://localhost:9200/hatherleigh_info/part/_validate/query?explain" < query.json
The part is the document type (DOC_TYPE in the index mappings below):
es.indices.create(
    SEARCH_INDEX,
    {
        'mappings': {
            self.DOC_TYPE: {
                "properties": {
                    "part": {
                        "type": "string",
                        "analyzer": "autocomplete",
                    },
                },
            },
        },
    },
)
Analyze
See Diagnostics above… or:
Create a json file containing your query e.g. analyze.json:
{
"analyzer": "autocomplete",
"text": "quick brown"
}
Run the analysis:
http GET http://localhost:9200/hatherleigh_info/_analyze < analyze.json