CPS-1272 Support for contains in cps-path
We would like to add support for a contains operator in cps-path.
contains() is a function in XPath expressions. It matches on part of a value, which is useful when the value of an attribute changes dynamically. Examples are given below.
Reference
Issues & decisions
Native query for Contains Operator using Like Keyword :
# | Issue | Notes | Decisions |
---|---|---|---|
1 | Which keyword to use? Do we want case sensitivity or not? Do we follow the XPath contains or do we become specific? | As per discussion with Toine Siebelink (community call), the like keyword would be consistent, as it supports case-sensitive attribute values. Need to discuss with stakeholders. | As per discussion in the community call, decided to go with the like keyword: it is more consistent and supports case sensitivity. |
# | Json Data | CPS-PATH Syntax | Output |
---|---|---|---|
1 | Below is the sample data. Here are ways to use the contains keyword: | <cps-path>(contains'[@leafname,'<string-value>']') | |
Native Query for contains keyword
1. Using LIKE Keyword:
The LIKE operator matches a value against a specified pattern. It has two wildcard signs:
% : Matches any sequence of characters; the sequence may contain 0 or more characters.
_ : Matches any single character.
# | Query | Output |
---|---|---|
1 | cpsdb=# SELECT * FROM FRAGMENT WHERE anchor_id = 4 and attributes->>'lang' like '%en%'; | |
2 | cpsdb=# SELECT * FROM FRAGMENT WHERE anchor_id = 4 and attributes->>'lang' ilike '%En%'; | |
3 | cpsdb=# SELECT * FROM FRAGMENT WHERE anchor_id = 4 and attributes->>'lang' like 'en'; | |
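The wildcard behaviour of the three queries above can be tried without a database. The following is a minimal Python sketch that emulates LIKE and ILIKE by translating a pattern into a regular expression. The helper names (like_to_regex, like, ilike) are hypothetical and not part of CPS or PostgreSQL, and escape characters are deliberately not modelled.

```python
import re


def like_to_regex(pattern, case_insensitive=False):
    """Translate a SQL LIKE pattern into a compiled Python regex.

    Assumption: no escape character is supported in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")            # % : any sequence of 0 or more characters
        elif ch == "_":
            parts.append(".")             # _ : exactly one character
        else:
            parts.append(re.escape(ch))   # every other character is literal
    flags = re.DOTALL | (re.IGNORECASE if case_insensitive else 0)
    return re.compile("".join(parts), flags)


def like(value, pattern):
    """Case-sensitive match of the whole value, as with LIKE."""
    return like_to_regex(pattern).fullmatch(value) is not None


def ilike(value, pattern):
    """Case-insensitive match of the whole value, as with ILIKE."""
    return like_to_regex(pattern, case_insensitive=True).fullmatch(value) is not None
```

Note that a pattern without wildcards, as in query 3, only matches when the whole value equals the pattern, so like 'en' behaves as an equality check rather than a contains check.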
2. Using SIMILAR TO Regular Expression Keyword:
The only difference between like and similar to is how the pattern is matched against the given string. SIMILAR TO works like LIKE, except that it interprets the pattern using the SQL standard's definition of a regular expression.
SIMILAR TO supports these pattern-matching metacharacters borrowed from POSIX regular expressions:
| : denotes alternation (either of two alternatives).
* : denotes repetition of the previous item zero or more times.
+ : denotes repetition of the previous item one or more times.
? : denotes repetition of the previous item zero or one time.
{m} : denotes repetition of the previous item exactly m times.
{m,} : denotes repetition of the previous item m or more times.
{m,n} : denotes repetition of the previous item at least m and not more than n times.
() : Parentheses can be used to group items into a single logical item.
[...] : A bracket expression specifies a character class, just as in POSIX regular expressions.
# | Query | Output |
---|---|---|
1 | cpsdb=# SELECT * FROM FRAGMENT WHERE anchor_id = 3 and attributes->>'pub_year' similar to '%(94|95)%'; | |
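To get a feel for how a SIMILAR TO pattern such as '%(94|95)%' behaves, the sketch below approximates it in Python. It relies on the assumption that the metacharacters listed above mean the same in Python's re module, so only % and _ are translated and the characters . ^ $ are treated as literals. The function names are hypothetical, and escape characters as well as % or _ inside bracket expressions are not modelled.

```python
import re


def similar_to_regex(pattern):
    """Approximate a SQL SIMILAR TO pattern with a Python regex (sketch only)."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")        # % : any sequence of 0 or more characters
        elif ch == "_":
            out.append(".")         # _ : exactly one character
        elif ch in ".^$":
            out.append("\\" + ch)   # literal in SIMILAR TO, special in Python regex
        else:
            out.append(ch)          # | * + ? {m,n} () [...] are passed through
    return re.compile("".join(out), re.DOTALL)


def similar_to(value, pattern):
    """Like LIKE, SIMILAR TO only succeeds if the pattern matches the whole value."""
    return similar_to_regex(pattern).fullmatch(value) is not None
```

With the pattern from the query above, a pub_year of 1994 or 1995 matches, while 1996 does not.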
Performance: as we are not making significant changes to the query, performance should be similar to that of the existing query and is not expected to be affected much.
Implementation of Contains Operator
1. Update ANTLR parser to recognize this pattern
2. Implement required (native) query
3. Add integration tests for
a. filter on string leaf-value
b. filter on integer leaf-value
4. Update documentation
5. Demo to team
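Step 2 above could be sketched as follows. This is an illustration only, not the CPS implementation: the function names, the :name placeholder style and the parameter names are hypothetical, and the actual placeholder syntax depends on the database driver in use. Escaping % and _ in the search value is an assumption added here so that those characters match literally; it relies on backslash being the default LIKE escape character in PostgreSQL.

```python
def escape_like(value):
    """Escape LIKE wildcards so they match literally (backslash is escaped first)."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_contains_query(anchor_id, leaf_name, value):
    """Return (sql, params) for a contains-style LIKE query on one leaf.

    Values are bound as parameters rather than concatenated into the SQL text.
    """
    sql = (
        "SELECT * FROM fragment "
        "WHERE anchor_id = :anchorId "
        "AND attributes ->> :leafName LIKE :pattern"
    )
    params = {
        "anchorId": anchor_id,
        "leafName": leaf_name,
        # integer values are compared as text, since ->> returns text
        "pattern": "%" + escape_like(str(value)) + "%",
    }
    return sql, params
```

For example, build_contains_query(4, "lang", "en") yields the pattern %en%, which corresponds to the first LIKE query shown earlier.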
Limitations
1. The contains condition is case sensitive.
2. Only leaves can be used; leaf-lists are not supported.
3. Only string and integer values are supported; boolean and float values are not supported.
4. When an empty value is passed to contains, it returns all nodes that have the given leaf element.
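The limitations above can be illustrated with a small in-memory model. The sample nodes and the contains_filter helper are hypothetical, and the sketch assumes that "not supported" means such leaves simply never match; the real behaviour for unsupported types may differ.

```python
def contains_filter(nodes, leaf_name, value):
    """Return the nodes whose leaf contains the given value (case sensitive)."""
    needle = str(value)
    matches = []
    for node in nodes:
        leaf = node.get(leaf_name)
        # Only string and integer leaves are considered. bool is a subclass
        # of int in Python, so it is excluded explicitly; floats, leaf-lists
        # and missing leaves are skipped as well.
        if isinstance(leaf, bool) or not isinstance(leaf, (str, int)):
            continue
        if needle in str(leaf):   # an empty needle is contained in every string
            matches.append(node)
    return matches


# Hypothetical sample nodes, not taken from the CPS test data.
nodes = [
    {"title": "A", "lang": "English", "pub_year": 1994},
    {"title": "B", "lang": "french", "pub_year": 2001},
    {"title": "C", "pub_year": 1995, "authors": ["x", "y"]},
]
```

Here contains_filter(nodes, "lang", "En") returns only node A and contains_filter(nodes, "lang", "en") returns only node B, which shows the case sensitivity. Passing an empty value for lang returns nodes A and B, the two nodes that have that leaf.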