-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL - Allow full text functions disjunctions for non-full text functions #120291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL - Allow full text functions disjunctions for non-full text functions #120291
Conversation
...l/compute/src/main/java/org/elasticsearch/compute/lucene/LuceneQueryExpressionEvaluator.java
Show resolved
Hide resolved
| FoldContext foldCtx, | ||
| Expression exp, | ||
| Layout layout, | ||
| List<ShardContext> shardContexts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ShardContexts are provided to the EvalMapper in order to include them on the ToEvaluator interface.
|
|
||
| FoldContext foldCtx(); | ||
|
|
||
| default List<EsPhysicalOperationProviders.ShardContext> shardContexts() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now creating an evaluator can use the shardContexts. This way the signature does not change for the EvaluatorMapper interface.
| for (EsPhysicalOperationProviders.ShardContext shardContext : shardContexts) { | ||
| shardConfigs[i++] = new ShardConfig(shardContext.toQuery(queryBuilder()), shardContext.searcher()); | ||
| } | ||
| return new LuceneQueryExpressionEvaluator.Factory(shardConfigs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create the LuceneQueryExpressionEvaluator with the ShardContexts now that ToEvaluator include them
...asticsearch/xpack/esql/expression/predicate/operator/comparison/InsensitiveEqualsMapper.java
Show resolved
Hide resolved
| * | ||
| * @param condition condition to check for disjunctions of full text searches | ||
| */ | ||
| private static boolean checkPushableFullTextSearchFunctions(Expression condition) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the disjunction detection logic that references FullTextFunction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this also handle column operator (:) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! The operator uses Match function under the hood - it's just syntactic sugar
| LookupFromIndexService lookupFromIndexService, | ||
| PhysicalOperationProviders physicalOperationProviders | ||
| PhysicalOperationProviders physicalOperationProviders, | ||
| List<ShardContext> shardContexts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the ShardContexts so we can use them in the call to EvalMapper.toEvaluator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be great if we can have some comments in the code about using ShardContext in ComputeService, LocalExecutionPlanner and EvalMapper, it took me a while to understand it :-).
|
@nik9000 , I'd like an early review from you on the overall approach. Thanks! |
…unctions-disjunctions' into non-issue/esql-full-text-functions-disjunctions
|
Hi @carlosdelest, I've created a changelog YAML for you. |
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/FullTextFunction.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushFiltersToSource.java
|
@jimczi I'm adding you to this one to check if this solution for implementing disjunctions in full text functions matches your expectations. Scoring is not being handled yet, will do in a follow up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tested this - works great!
I have one comment that I'd like to be addressed before merging - otherwise 🚀 !
|
|
||
| @Override | ||
| public boolean translatable(LucenePushdownPredicates pushdownPredicates) { | ||
| return super.translatable(pushdownPredicates) && FullTextFunction.checkDisjunctionPushable(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this introduce a side effect where we might not pushdown a disjunction that could be pushed because it uses a FTF? I haven't tested this, but assuming we have:
FROM my-index
| WHERE match(foo, "bar") or x == "foobar"
Normally assuming that x == "foobar" can be pushed, we should be able to push match(foo, "bar") or x == "foobar" but I wonder if this still happens.
We added this method because we need it for the check where disjunctions are used with scoring.
I'd actually be in favour of not overriding this method here and just have the check done only in FullTextFunction if possible. It's also awkward how this method is actually called in FullTextFunction and then here we end up calling FullTextFunction.checkDisjunctionPushable(this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this introduce a side effect where we might not pushdown a disjunction that could be pushed because it uses a FTF?
You are correct! There is no longer a need to check disjunctions when scoring is not there. That's much better done in FullTextFunction class.
I corrected this in d23998d and added a plan optimizer test to check that a disjunction with pushable functions is effectively pushed down.
Thank you!
| * All functions used in the OR clauses are full-text functions themselves, or | ||
| * Scoring is not used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the meantime scoring has been put out of snapshot, hence it might be useful to reflect it here too.
| // Lucene based operators retrieve DocVectors as first block | ||
| DocVector docs = page.<DocBlock>getBlock(0).asVector(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this TBH, what about a guard / assertion that makes sure that the expected block is a DocBlock, to avoid CCE in case of implementation changes in Lucene operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I added an assertion in 8e7a835
…ext-functions-disjunctions # Conflicts: # x-pack/plugin/build.gradle # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
…unctions-disjunctions' into non-issue/esql-full-text-functions-disjunctions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @carlosdelest, LGTM! I added some thoughts about disjunctions with scores, it does not block this PR to merge, you may decide the best way to handle it.
Having LuceneQueryExpressionEvaluator is a great step forward to allow queries with full text functions and disjunction run into completion, I wonder if it is a good idea to allow disjunction with _score run to completion as well with warnings instead of VerificationException?
| LookupFromIndexService lookupFromIndexService, | ||
| PhysicalOperationProviders physicalOperationProviders | ||
| PhysicalOperationProviders physicalOperationProviders, | ||
| List<ShardContext> shardContexts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be great if we can have some comments in the code about using ShardContext in ComputeService, LocalExecutionPlanner and EvalMapper, it took me a while to understand it :-).
| .stream() | ||
| .anyMatch(attr -> attr instanceof MetadataAttribute ma && ma.name().equals(MetadataAttribute.SCORE)); | ||
| if (usesScore) { | ||
| checkFullTextSearchDisjunctions(condition, failures); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually can get scores for some disjunctions if both sides of the OR can be pushed down, however it is decided at data nodes, the coordinator node doesn't have this information.
For example, if the checkFullTextSearchDisjunctions is commented out, this query below can get score.
+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
"query": "from sample_data metadata _score | where match(message, \"Connected\") or cidr_match(client.ip, \"172.21.3.15\") | drop @timestamp, event.duration, test_date"
}
'
client.ip | message | _score
---------------+---------------------+------------------
172.21.2.162 |Connected to 10.1.0.3|0.7329744100570679
172.21.2.113 |Connected to 10.1.0.2|0.7329744100570679
172.21.3.15 |Connection error |0.0
172.21.3.15 |Connection error |0.0
172.21.3.15 |Connection error |0.0
172.21.3.15 |Connected to 10.1.0.1|0.7329744100570679
I'm wondering if it is a good idea to not fail queries with disjunctions and metadata _score? One option that I can think of is that if data nodes can pushdown the OR tree to Lucene, _score is populated by the one returned by Lucene, otherwise if we end up with running full text function with LuceneQueryExpressionEvaluator, we can set _score to null with warnings that mention score is not available for this situation. This is not a new behavior in ES|QL, for the other ES|QL functions, when there are exceptions happen at data nodes, null + warnings will be returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it is a good idea to not fail queries with disjunctions and metadata _score?
I wondered about this as well. I think it's better to fail:
- Score would be useless in this situation for relevance scoring
- Users have a workaround: Either disable scoring or use full text functions for the query
- This is a temporary limitation that we're aiming to fix
I'd prefer to fail early rather than provide a broken score, even with warnings.
…ext-functions-disjunctions # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/match-function.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@fang-xing-esql I added some comments to |
…ext-functions-disjunctions
…unctions-disjunctions' into non-issue/esql-full-text-functions-disjunctions
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…ions (elastic#120291) (cherry picked from commit a87bd7a) # Conflicts: # docs/reference/esql/esql-limitations.asciidoc
Enables disjunctions in full text functions with non-pushable functions when not using score.
Full text functions create a
LuceneQueryExpressionEvaluatorthat provides the evaluation of a Lucene query over the retrieved documents in the compute engine. This way, full text functions can be evaluated when they are not pushed down.In order to do that, it is necessary to provide
ShardContexts to theEvaluatorMapperinterface so the functionQueryBuildercan be transformed into aQuery, and theLuceneQueryExpressionEvaluatorcan retrieve theShardContexts to execute the Query on the corresponding shards.Pushing down filters now take into account whether a condition contains a disjunction with full text functions. The previous validation has been removed.
This doesn't take into account scoring, as we need to re-compute scores on the compute engine side. For now, we're disabling the use of disjunctions when
METADATA _scoreis active.Inspired by #113938
Pending work: