Skip to content

Conversation

@carlosdelest
Copy link
Member

@carlosdelest carlosdelest commented Jan 16, 2025

Enables disjunctions in full text functions with non-pushable functions when not using score.

Full text functions create a LuceneQueryExpressionEvaluator that provides the evaluation of a Lucene query over the retrieved documents in the compute engine. This way, full text functions can be evaluated when they are not pushed down.

In order to do that, it is necessary to provide ShardContexts to the EvaluatorMapper interface so the function QueryBuilder can be transformed into a Query, and the LuceneQueryExpressionEvaluator can retrieve the ShardContexts to execute the Query on the corresponding shards.

Pushing down filters now take into account whether a condition contains a disjunction with full text functions. The previous validation has been removed.

This doesn't take into account scoring, as we need to re-compute scores on the compute engine side. For now, we're disabling the use of disjunctions when METADATA _score is active.

Inspired by #113938

Pending work:

  • Add tests

FoldContext foldCtx,
Expression exp,
Layout layout,
List<ShardContext> shardContexts
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShardContexts are provided to the EvalMapper in order to include them on the ToEvaluator interface.


FoldContext foldCtx();

default List<EsPhysicalOperationProviders.ShardContext> shardContexts() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now creating an evaluator can use the shardContexts. This way the signature does not change for the EvaluatorMapper interface.

for (EsPhysicalOperationProviders.ShardContext shardContext : shardContexts) {
shardConfigs[i++] = new ShardConfig(shardContext.toQuery(queryBuilder()), shardContext.searcher());
}
return new LuceneQueryExpressionEvaluator.Factory(shardConfigs);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create the LuceneQueryExpressionEvaluator with the ShardContexts now that ToEvaluator include them

*
* @param condition condition to check for disjunctions of full text searches
*/
private static boolean checkPushableFullTextSearchFunctions(Expression condition) {
Copy link
Member Author

@carlosdelest carlosdelest Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the disjunction detection logic that references FullTextFunction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also handle column operator (:) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! The operator uses Match function under the hood - it's just syntactic sugar

LookupFromIndexService lookupFromIndexService,
PhysicalOperationProviders physicalOperationProviders
PhysicalOperationProviders physicalOperationProviders,
List<ShardContext> shardContexts
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the ShardContexts so we can use them in the call to EvalMapper.toEvaluator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be great if we can have some comments in the code about using ShardContext in ComputeService, LocalExecutionPlanner and EvalMapper, it took me a while to understand it :-).

@carlosdelest carlosdelest requested a review from nik9000 January 16, 2025 17:03
@carlosdelest
Copy link
Member Author

@nik9000 , I'd like an early review from you on the overall approach. Thanks!

@carlosdelest carlosdelest added >feature auto-backport Automatically create backport pull requests when merged Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.18.0 Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Jan 17, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/FullTextFunction.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/physical/local/PushFiltersToSource.java
@carlosdelest carlosdelest requested a review from jimczi January 22, 2025 16:21
@carlosdelest
Copy link
Member Author

@jimczi I'm adding you to this one to check if this solution for implementing disjunctions in full text functions matches your expectations. Scoring is not being handled yet, will do in a follow up.

Copy link
Contributor

@ioanatia ioanatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested this - works great!
I have one comment that I'd like to be addressed before merging - otherwise 🚀 !


@Override
public boolean translatable(LucenePushdownPredicates pushdownPredicates) {
return super.translatable(pushdownPredicates) && FullTextFunction.checkDisjunctionPushable(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this introduce a side effect where we might not pushdown a disjunction that could be pushed because it uses a FTF? I haven't tested this, but assuming we have:

FROM my-index
| WHERE match(foo, "bar") or x == "foobar"

Normally assuming that x == "foobar" can be pushed, we should be able to push match(foo, "bar") or x == "foobar" but I wonder if this still happens.
We added this method because we need it for the check where disjunctions are used with scoring.
I'd actually be in favour of not overriding this method here and just have the check done only in FullTextFunction if possible. It's also awkward how this method is actually called in FullTextFunction and then here we end up calling FullTextFunction.checkDisjunctionPushable(this).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this introduce a side effect where we might not pushdown a disjunction that could be pushed because it uses a FTF?

You are correct! There is no longer a need to check disjunctions when scoring is not there. That's much better done in FullTextFunction class.

I corrected this in d23998d and added a plan optimizer test to check that a disjunction with pushable functions is effectively pushed down.

Thank you!

Comment on lines 117 to 118
* All functions used in the OR clauses are full-text functions themselves, or
* Scoring is not used
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the meantime scoring has been put out of snapshot, hence it might be useful to reflect it here too.

Comment on lines 58 to 59
// Lucene based operators retrieve DocVectors as first block
DocVector docs = page.<DocBlock>getBlock(0).asVector();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this TBH, what about a guard / assertion that makes sure that the expected block is a DocBlock, to avoid CCE in case of implementation changes in Lucene operators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I added an assertion in 8e7a835

…ext-functions-disjunctions

# Conflicts:
#	x-pack/plugin/build.gradle
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
…unctions-disjunctions' into non-issue/esql-full-text-functions-disjunctions
Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @carlosdelest, LGTM! I added some thoughts about disjunctions with scores, it does not block this PR to merge, you may decide the best way to handle it.

Having LuceneQueryExpressionEvaluator is a great step forward to allow queries with full text functions and disjunction run into completion, I wonder if it is a good idea to allow disjunction with _score run to completion as well with warnings instead of VerificationException?

LookupFromIndexService lookupFromIndexService,
PhysicalOperationProviders physicalOperationProviders
PhysicalOperationProviders physicalOperationProviders,
List<ShardContext> shardContexts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be great if we can have some comments in the code about using ShardContext in ComputeService, LocalExecutionPlanner and EvalMapper, it took me a while to understand it :-).

.stream()
.anyMatch(attr -> attr instanceof MetadataAttribute ma && ma.name().equals(MetadataAttribute.SCORE));
if (usesScore) {
checkFullTextSearchDisjunctions(condition, failures);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually can get scores for some disjunctions if both sides of the OR can be pushed down, however it is decided at data nodes, the coordinator node doesn't have this information.

For example, if the checkFullTextSearchDisjunctions is commented out, this query below can get score.

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "from sample_data metadata _score | where match(message, \"Connected\") or cidr_match(client.ip, \"172.21.3.15\") | drop @timestamp, event.duration, test_date"
}
'
   client.ip   |       message       |      _score      
---------------+---------------------+------------------
172.21.2.162   |Connected to 10.1.0.3|0.7329744100570679
172.21.2.113   |Connected to 10.1.0.2|0.7329744100570679
172.21.3.15    |Connection error     |0.0               
172.21.3.15    |Connection error     |0.0               
172.21.3.15    |Connection error     |0.0               
172.21.3.15    |Connected to 10.1.0.1|0.7329744100570679   

I'm wondering if it is a good idea to not fail queries with disjunctions and metadata _score? One option that I can think of is that if data nodes can pushdown the OR tree to Lucene, _score is populated by the one returned by Lucene, otherwise if we end up with running full text function with LuceneQueryExpressionEvaluator, we can set _score to null with warnings that mention score is not available for this situation. This is not a new behavior in ES|QL, for the other ES|QL functions, when there are exceptions happen at data nodes, null + warnings will be returned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it is a good idea to not fail queries with disjunctions and metadata _score?

I wondered about this as well. I think it's better to fail:

  • Score would be useless in this situation for relevance scoring
  • Users have a workaround: Either disable scoring or use full text functions for the query
  • This is a temporary limitation that we're aiming to fix

I'd prefer to fail early rather than provide a broken score, even with warnings.

carlosdelest and others added 2 commits January 28, 2025 11:38
…ext-functions-disjunctions

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/match-function.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
@carlosdelest carlosdelest enabled auto-merge (squash) January 28, 2025 10:41
@carlosdelest
Copy link
Member Author

It will be great if we can have some comments in the code about using ShardContext in ComputeService, LocalExecutionPlanner and EvalMapper, it took me a while to understand it :-).

@fang-xing-esql I added some comments to EvalMapper.toEvaluator in 5df79d4 - the rest of the changes imply adding ShardContext to constructors, so I don't think there's anything to comment there.

…unctions-disjunctions' into non-issue/esql-full-text-functions-disjunctions
@carlosdelest carlosdelest merged commit a87bd7a into elastic:main Jan 28, 2025
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 120291

@carlosdelest
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
8.x

Questions ?

Please refer to the Backport tool documentation

carlosdelest added a commit to carlosdelest/elasticsearch that referenced this pull request Jan 28, 2025
…ions (elastic#120291)

(cherry picked from commit a87bd7a)

# Conflicts:
#	docs/reference/esql/esql-limitations.asciidoc
elasticsearchmachine pushed a commit that referenced this pull request Jan 28, 2025
…ions (#120291) (#121026)

(cherry picked from commit a87bd7a)

# Conflicts:
#	docs/reference/esql/esql-limitations.asciidoc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged backport pending >feature :Search/Search Search-related issues that do not fall into other categories Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team v8.18.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants