...
Repository queries taking collection parameters are limited to 32K collection items.
For example, given a FragmentRepository method:
Code Block |
---|
|
@Query(value = "SELECT * FROM fragment WHERE anchor_id = :anchorId AND xpath IN :xpaths", nativeQuery = true)
List<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> xpathxpaths); |
This would throw an exception if supplied with a collection of 32,767 xpaths.
...
Code Block |
---|
|
SELECT * FROM fragment WHERE anchor_id = ? AND xpath IN (?,?,?,?,?,?...,?) |
Each collection item (xpath) is supplied as an SQL query parameter. Hibernate with Postgres has a limit of 32,767 query parameters per SQL statement, so an exception will be thrown with too many.
CPS-1574 is to remove those limits (and write tests to verify).
Tests
As part of CPS-1574, tests have been added to confirm the presence of the limit:
Code Block |
---|
|
def 'Delete datanodes from multiple anchors limit exceeded: 32,766 (~ 2^15) anchors.'() {
given: 'more than 32,766 anchor names'
def anchorNames = (0..32_766).collect { "size-of-this-name-does-not-matter-for-limit-" + it }
when: 'single operation is executed to delete all datanodes in given anchors'
objectUnderTest.deleteDataNodes(CPS_PERFORMANCE_TEST_DATASPACE, anchorNames, OffsetDateTime.now())
then: 'a database exception is thrown'
thrown(DataAccessResourceFailureException.class)
} |
Once the solution is implemented, these tests will be updated to show that no exception is thrown, with even more items e.g. 40,000.
Proposal #1: List partitioning
This solution adds a wrapper method that partitions the input collection into lists of 32,000 and calls the original repository method , collecting the resultson each batch:
Code Block |
---|
|
@Query(value = "SELECT * FROM fragment WHERE anchor_id = :anchorId AND xpath IN :xpaths", nativeQuery = true)
List<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> xpaths);
default List<FragmentEntity> findAllByAnchorIdAndXpathInBatch(final int anchorId, final List<String> xpaths) {
List<FragmentEntity> results = new ArrayList<>(xpaths.size());
for (final Collection<String> xpathsBatch : Lists.partition(xpaths, 32000)) {
results.addAll(findAllByAnchorIdAndXpathIn(anchorId, xpathsBatch));
}
return results;
} |
In addition, if we wish to avoid the overhead of adding the results to a collection when not needed, we may directly call the original when the collection size is less than 32,000:
Code Block |
---|
|
@Query(value = "SELECT * FROM fragment WHERE anchor_id = :anchorId AND xpath IN :xpaths", nativeQuery = true)
List<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> xpaths);
default List<FragmentEntity> findAllByAnchorIdAndXpathInBatch(final int anchorId, final List<String> xpaths) {
if (xpaths.size() < 32000) {
return findAllByAnchorIdAndXpathIn(anchorId, xpathsBatch);
}
List<FragmentEntity> results = new ArrayList<>(xpaths.size());
for (final Collection<String> xpathsBatch : Lists.partition(xpaths, 32000)) {
results.addAll(findAllByAnchorIdAndXpathIn(anchorId, xpathsBatch));
}
return results;
} |
...
This solution uses a third-party library to add PostGres array support to queries. Postgres arrays
Adding the dependency to cps-ri/pom.xml:
Code Block |
---|
|
<dependency>
<groupId>io.hypersistence</groupId>
<artifactId>hypersistence-utils-hibernate-52</artifactId>
</dependency> |
This solution involves creating two versions of each repository method taking a collection parameter - one that uses an array, and one using a Collection:
Code Block |
---|
|
@Query(value = "SELECT * FROM fragment WHERE anchor_id = :anchorId AND xpath = ANY(:xpaths)", nativeQuery = true)
List<FragmentEntity> findAllByAnchorIdAndXpathIn(@Param("anchorId") int anchorId, @Param("xpaths") String[] xpaths);
default List<FragmentEntity> findAllByAnchorIdAndXpathIn(final int anchorId, final Collection<String> xpaths) {
return findAllByAnchorIdAndXpathIn(anchorId, xpaths.toArray(new String[0]));
} |
This would result in the following SQL, given any number of xpaths:
Code Block |
---|
|
SELECT * FROM fragment WHERE anchor_id = ? AND xpath = ANY(?) |
As seen, only two SQL query parameters will be ever used. This is because the array is binary-encoded and sent as a single parameter.
Using Postgres arrays can simplify existing code
The current code for has code using Postgres arrays to support "LIKE ANY" logic (using LIKE with a collection). However, it uses query parameters, and is thus limited to 32K parameters.
Code Block |
---|
|
@Override
public void deleteByAnchorIdAndXpathLikeAny(final int anchorId, final Collection<String> xpathPatterns) {
final String queryString = "DELETE FROM fragment f WHERE f.anchor_id = ? AND xpath LIKE ANY (array[:parameterPlaceholders])";
executeUpdateWithAnchorIdAndCollection(queryString, anchorId, xpathPatterns);
}
// Accept security hotspot as placeholders in SQL query are created internally, not from user input.
@SuppressWarnings("squid:S2077")
private void executeUpdateWithAnchorIdAndCollection(final String sqlTemplate, final int anchorId, final Collection<String> collection) {
if (!collection.isEmpty()) {
final String parameterPlaceholders = String.join(",", Collections.nCopies(collection.size(), "?"));
final String queryStringWithParameterPlaceholders =
sqlTemplate.replaceFirst(":parameterPlaceholders\\b", parameterPlaceholders);
final Query query = entityManager.createNativeQuery(queryStringWithParameterPlaceholders);
query.setParameter(1, anchorId);
int parameterIndex = 2;
for (final String parameterValue : collection) {
query.setParameter(parameterIndex++, parameterValue);
}
query.executeUpdate();
}
} |
Using Postgres arrays, this code can be written directly in the FragmentRepository interface using Query annotation:
Code Block |
---|
|
@Modifying
@Query(value = "DELETE FROM fragment f WHERE anchor_id = :anchorId AND xpath LIKE ANY (:xpathPatterns)", nativeQuery = true)
void deleteByAnchorIdAndXpathLikeAny(@Param("anchorId") int anchorId, @Param("xpathPatterns") String[] xpathPatterns);
default void deleteByAnchorIdAndXpathLikeAny(int anchorId, Collection<String> xpathPatterns) {
deleteByAnchorIdAndXpathLikeAny(anchorId, xpathPatterns.toArray(new String[0]));
} |
Comparison
- Both solutions have same performance (as determined using Groovy performance tests modified to take 100,000 items)
- Solution #1 is explicit, and easy to understand
- Solution #2 is shorter, but may be not as clear to the reader what it is used for
- Solution #2 is more flexible, adding array support unlocks more Postgres features.