Background
Repository queries taking collection parameters are limited to 32K collection items.
For example, given a FragmentRepository method:
List<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> xpath);
This would throw an exception if supplied with a collection of 32,767 xpaths.
The reason is that JPA/Hibernate queries would translate the command to the following SQL:
SELECT * FROM fragment WHERE anchor_id = ? AND xpath IN (?,?,?,?...,?)
Each collection item (xpath) is supplied as an SQL query parameter. Hibernate with Postgres has a limit of 32,767 query parameters per SQL statement, so an exception will be thrown with too many.
CPS-1574 is to remove those limits (and write tests to verify).
Proposal #1: List partitioning
This solution partitions the input collection into lists of 32,000 and calls the original repository method, collecting the results:
List<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> xpaths); default List<FragmentEntity> findAllByAnchorIdAndXpathInBatch(final int anchorId, final List<String> xpaths) { List<FragmentEntity> results = new ArrayList<>(xpaths.size()); for (final Collection<String> xpathsBatch : Lists.partition(xpaths, 32000)) { results.addAll(findAllByAnchorIdAndXpathIn(anchorId, xpathsBatch)); } return results; }
In addition, if we wish to avoid the overhead of adding the results to a collection when not needed, we may directly call the original when the collection size is less than 32,000:
List<FragmentEntity> findAllByAnchorIdAndXpathIn(int anchorId, Collection<String> xpaths); default List<FragmentEntity> findAllByAnchorIdAndXpathInBatch(final int anchorId, final List<String> xpaths) { if (xpaths.size() < 32000) { return findAllByAnchorIdAndXpathIn(anchorId, xpathsBatch); } List<FragmentEntity> results = new ArrayList<>(xpaths.size()); for (final Collection<String> xpathsBatch : Lists.partition(xpaths, 32000)) { results.addAll(findAllByAnchorIdAndXpathIn(anchorId, xpathsBatch)); } return results; }
Proposal #2: Postgres arrays
This solution uses a third-party library to add PostGres array support to queries.