...

Repository queries taking collection parameters are limited to 32K collection items.
For example, given a FragmentRepository method:

Code Block

language	java

@Query(value = "SELECT * FROM fragment WHERE anchor_id = :anchorId AND xpath IN :xpaths", nativeQuery = true)
List<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> xpaths);

...

Code Block

language	java

@Query(value = "SELECT * FROM fragment WHERE anchor_id = :anchorId AND xpath IN :xpaths", nativeQuery = true)
List<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> xpaths);

default List<FragmentEntity> findAllByAnchorIdAndXpathInBatch(final int anchorId, final List<String> xpaths) {
    List<FragmentEntity> results = new ArrayList<>(xpaths.size());
    for (final Collection<String> xpathsBatch : Lists.partition(xpaths, 32000)) {
        results.addAll(findAllByAnchorIdAndXpathIn(anchorId, xpathsBatch));
    }
    return results;
}

...

This solution is largely self-documenting, as the 32K limit is clearly seen in the code.

This solution still involves having a separate query parameter for each collection item - we are avoiding the limit by simply splitting into multiple SQL queries:

@Query(value = "

Code Block

language	java

sql

SELECT * FROM fragment WHERE anchor_id = :anchorId? AND xpath IN :xpaths", nativeQuery = true)
List<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> xpaths);

default List<FragmentEntity> findAllByAnchorIdAndXpathInBatch(final int anchorId, final List<String> xpaths) {
    if (xpaths.size() < 32000) {
        return findAllByAnchorIdAndXpathIn(anchorId, xpathsBatch);
    }
    List<FragmentEntity> results = new ArrayList<>(xpaths.size());
    for (final Collection<String> xpathsBatch : Lists.partition(xpaths, 32000)) {
        results.addAll(findAllByAnchorIdAndXpathIn(anchorId, xpathsBatch));
    }
    return results;
}(?,?,?,?,?,?...,?)
SELECT * FROM fragment WHERE anchor_id = ? AND xpath IN (?,?,?,?,?,?...,?)
SELECT * FROM fragment WHERE anchor_id = ? AND xpath IN (?,?,?,?)

Proposal #2: Postgres arrays

...

Code Block

language	java

@Query(value = "SELECT * FROM fragment WHERE anchor_id = :anchorId AND xpath = ANY(:xpaths)", nativeQuery = true)
List<FragmentEntity> findAllByAnchorIdAndXpathIn(@Param("anchorId") int anchorId, @Param("xpaths") String[] xpaths);

default List<FragmentEntity> findAllByAnchorIdAndXpathIn(final int anchorId, final Collection<String> xpaths) {
    return findAllByAnchorIdAndXpathIn(anchorId, xpaths.toArray(new String[0]));
}

This would result in the following a single SQL query, given any number of xpaths:

...

As seen, only two SQL query parameters will be ever used. This is because the array is binary-encoded and sent as a single parameter.

...

It is unclear if this has better performance in practice (limited testing shows both solutions have similar performance, but the test scenarios may be unrealistic).

Observations

Disadvantage of arrays: Limited array type support

Postgres array support is limited to primitive data types and String, that is int[], long[], float[], double[], boolean[], String[]

This means given a method such as:

Code Block

language	java

void deleteByAnchorIn(Collection<AnchorEntity> anchorEntities);

Changing to an array of AnchorEntity will not work. The following will not compile:

Code Block

language	java

void deleteByAnchorIn(AnchorEntity[] anchorEntities);

Rather, the method would have to be changed to use anchor ID's:

Code Block

language	java

void deleteByAnchorIn(int[] anchorIds);

This in turn limits the use to native queries, and not JPQL queries.

Advantage of arrays: Postgres arrays can simplify existing code

The current FragmentRepository has code already using Postgres arrays to support "LIKE ANY" logic (using LIKE with a collection). However, it uses query parameters, and is thus limited to 32K parameters.

...

Using native Postgres array support, this code can be written directly in the FragmentRepository interface using Query annotation, without the 32K limitation:

Code Block

language	java

    @Modifying
    @Query(value = "DELETE FROM fragment f WHERE anchor_id = :anchorId AND xpath LIKE ANY (:xpathPatterns)", nativeQuery = true)
    void deleteByAnchorIdAndXpathLikeAny(@Param("anchorId") int anchorId, @Param("xpathPatterns") String[] xpathPatterns);

    default void deleteByAnchorIdAndXpathLikeAny(int anchorId, Collection<String> xpathPatterns) {
        deleteByAnchorIdAndXpathLikeAny(anchorId, xpathPatterns.toArray(new String[0]));
    }

Comparison

Both Limited testing shows both solutions have same similar performance (as determined using Groovy performance tests modified to take 100,000 items)
Solution #1 is explicit using 32K batches, and so easy to understand what it does
Solution #2 is has shorter code, but may be not as clearSolution #2 is more flexible, adding array support unlocks more Postgres featuresnot very obvious to the reader what it does
Solution #2 simplifies code in the cases when we need Postgres array support.
It's possible to combine use both solutions, where most appropriate.

Version	Old Version 4	New Version 5
Changes made by	Daniel Hanrahan	Daniel Hanrahan
Saved on	Apr 19, 2023	Apr 19, 2023

Versions Compared

Key

Proposal #2: Postgres arrays

Observations

Disadvantage of arrays: Limited array type support

Advantage of arrays: Postgres arrays can simplify existing code

Comparison

Content Comparison

Versions Compared

Key

Proposal #2: Postgres arrays

Observations

Disadvantage of arrays: Limited array type support

Advantage of arrays: Postgres arrays can simplify existing code

Comparison