Home Programming Utilizing JDK Collectors to De-duplicate father or mother/little one nested collections – Java, SQL and jOOQ.

Utilizing JDK Collectors to De-duplicate father or mother/little one nested collections – Java, SQL and jOOQ.

0
Utilizing JDK Collectors to De-duplicate father or mother/little one nested collections – Java, SQL and jOOQ.

[ad_1]

In basic SQL (i.e. earlier than jOOQ’s superior MULTISET operator), nested collections had been fetched utilizing atypical (outer) joins. An instance of such a question can be a question operating towards the sakila database to fetch actors and their movies. Utilizing jOOQ:

End result<?> consequence =
ctx.choose(
        ACTOR.ACTOR_ID,
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME,
        FILM.FILM_ID,
        FILM.TITLE)
   .from(ACTOR)
   .leftJoin(FILM_ACTOR).on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .leftJoin(FILM).on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .orderBy(
       ACTOR.ACTOR_ID,
       FILM.FILM_ID)
   .fetch();

The consequence from the jOOQ debug log would look one thing like this:

+--------+----------+---------+-------+---------------------+
|actor_id|first_name|last_name|film_id|title                |
+--------+----------+---------+-------+---------------------+
|       1|PENELOPE  |GUINESS  |      1|ACADEMY DINOSAUR     |
|       1|PENELOPE  |GUINESS  |     23|ANACONDA CONFESSIONS |
|       1|PENELOPE  |GUINESS  |     25|ANGELS LIFE          |
|       1|PENELOPE  |GUINESS  |    106|BULWORTH COMMANDMENTS|
|       1|PENELOPE  |GUINESS  |    140|CHEAPER CLYDE        |
+--------+----------+---------+-------+---------------------+

As anticipated for a SQL be a part of operation, this denormalises the information, resulting in duplicate entries for every actor, or for those who type in another way, e.g. by FILM_ID, you then’d additionally see the duplicate entries per movie:

+--------+----------+---------+-------+----------------+
|actor_id|first_name|last_name|film_id|title           |
+--------+----------+---------+-------+----------------+
|       1|PENELOPE  |GUINESS  |      1|ACADEMY DINOSAUR|
|      10|CHRISTIAN |GABLE    |      1|ACADEMY DINOSAUR|
|      20|LUCILLE   |TRACY    |      1|ACADEMY DINOSAUR|
|      30|SANDRA    |PECK     |      1|ACADEMY DINOSAUR|
|      40|JOHNNY    |CAGE     |      1|ACADEMY DINOSAUR|
+--------+----------+---------+-------+----------------+

That is simply how a be a part of operation works. It creates a cartesian product after which filters by main key / overseas key matches, which JPA customers know underneath the title MultipleBagFetchException.

De-duplicating and nesting the collections with fetchGroups()

What we normally need is a few form of nested information construction, e.g. fetch the movies per actor. A easy utility in jOOQ is to only use fetchGroups():

Map<ActorRecord, End result<FilmRecord>> consequence =
ctx.choose(
        ACTOR.ACTOR_ID,
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME,
        FILM.FILM_ID,
        FILM.TITLE)
   .from(ACTOR)
   .leftJoin(FILM_ACTOR).on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .leftJoin(FILM).on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .orderBy(
       ACTOR.ACTOR_ID,
       FILM.FILM_ID)
   .fetchGroups(ACTOR, FILM);

That is tremendous handy, it’s order preserving, however it has a flaw. It isn’t “good” sufficient to recollect the LEFT JOIN semantics and do The Proper Factor, which is produce an empty checklist of FilmRecord in case an actor doesn’t have any movies. If that’s the case, then there’s a NULL movie within the consequence set in SQL:

+--------+----------+---------+-------+---------------------+
|actor_id|first_name|last_name|film_id|title                |
+--------+----------+---------+-------+---------------------+
|       1|PENELOPE  |GUINESS  |      1|ACADEMY DINOSAUR     |
|       1|PENELOPE  |GUINESS  |     23|ANACONDA CONFESSIONS |
|       1|PENELOPE  |GUINESS  |     25|ANGELS LIFE          |
|       1|PENELOPE  |GUINESS  |    106|BULWORTH COMMANDMENTS|
|       1|PENELOPE  |GUINESS  |    140|CHEAPER CLYDE        |
|     ...|...       |...      |    ...|...                  |
|     201|UNKNOWN   |ACTOR    | {null}|{null}               |
+--------+----------+---------+-------+---------------------+

So, we gained’t get an empty checklist, however an inventory containing an empty FilmRecord, identical to once you GROUP BY actor and COUNT(*) the variety of movies:

var r =
ctx.choose(
        ACTOR.ACTOR_ID,
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME,
        rely(),
        rely(FILM.FILM_ID))
   .from(ACTOR)
   .leftJoin(FILM_ACTOR).on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .leftJoin(FILM).on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .groupBy(
        ACTOR.ACTOR_ID,
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME)
   .orderBy(
       ACTOR.ACTOR_ID)
   .fetch();

The results of this question may seem like this:

+--------+----------+------------+-----+-----+
|actor_id|first_name|last_name   |rely|rely|
+--------+----------+------------+-----+-----+
|       1|PENELOPE  |GUINESS     |   19|   19|
|       2|NICK      |WAHLBERG    |   25|   25|
|       3|ED        |CHASE       |   22|   22|
|       4|JENNIFER  |DAVIS       |   22|   22|
|       5|JOHNNY    |LOLLOBRIGIDA|   29|   29|
|     ...|...       |...         |  ...|  ...|
|     201|UNKNOWN   |ACTOR       |    1|    0|
+--------+----------+------------+-----+-----+

Observe how the specified rely worth of 0 can solely be achieved once we cross the nullable FILM.FILM_ID column as an argument. So, what can be the equal fetchGroups() name that exposes this behaviour?

Deduplicating and nesting the collections with JDK Collectors

A really a lot underestimated JDK characteristic are Collectors. Whereas they had been launched particularly for utilization with the Stream API, they may very well be used with any sort of Iterable, in precept, and I’m nonetheless hoping {that a} future JDK will supply Iterable.gather(), amongst different issues.

With jOOQ, you possibly can gather the outcomes of any question by calling ResultQuery.gather(). To translate the above fetchGroups() instance, we will write this, producing virtually the identical consequence:

Map<ActorRecord, Checklist<FilmRecord>> consequence =
ctx.choose(
        ACTOR.ACTOR_ID,
        ACTOR.FIRST_NAME,
        ACTOR.LAST_NAME,
        FILM.FILM_ID,
        FILM.TITLE)
   .from(ACTOR)
   .leftJoin(FILM_ACTOR).on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .leftJoin(FILM).on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .orderBy(
       ACTOR.ACTOR_ID,
       FILM.FILM_ID)
   .gather(groupingBy(
       r -> r.into(ACTOR), filtering(
           r -> r.get(FILM.FILM_ID) != null, mapping(
               r -> r.into(FILM), toList()
           )
       )
   ));

The above gather() name nests these operations:

  • It teams by actor (identical to fetchGroups())
  • It filters group contents by these movies whose ID isn’t NULL (this could’t be achieved with fetchGroups()).
  • It maps the group contents to include solely FILM content material, not your entire projection.

So, clearly extra verbose, but additionally far more poweful and simple to introduce customized consumer facet aggregation behaviour, in case you possibly can’t transfer the aggregation logic to your SQL assertion.

Extra on highly effective collectors on this article right here:

Nesting collections instantly in SQL

No article on this weblog can be full with out plugging the superior MULTISET various to nest collections. In spite of everything, the above deduplication algorithm solely actually works for those who’re becoming a member of down a single parent-child relationship path, and it’s fairly inefficient when there are numerous duplicate information units.

Assuming these auxiliary information varieties:

file Movie(String title) {}
file Actor(String firstName, String lastName) {}
file Class(String title) {}

You’ll be able to write a question like this:

// We're importing the brand new Information::mapping technique for comfort
import static org.jooq.Information.mapping;

End result<Record3<Movie, Checklist<Actor>, Checklist<Class>>> consequence = ctx
    .choose(
        FILM.TITLE.convertFrom(Movie::new),
        multiset(
            choose(
                FILM_ACTOR.actor().FIRST_NAME,
                FILM_ACTOR.actor().LAST_NAME
            )
            .from(FILM_ACTOR)
            .the place(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
        ).convertFrom(r -> r.map(mapping(Actor::new))),
        multiset(
            choose(FILM_CATEGORY.class().NAME)
            .from(FILM_CATEGORY)
            .the place(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
        ).convertFrom(r -> r.map(mapping(Class::new)))
    )
    .from(FILM)
    .orderBy(FILM.TITLE)
    .fetch();

Discover that we’re now getting a consequence set containing

  • All of the movies
  • The actors per movie as a nested assortment
  • The classes per movie as one other nested assortment

The instance is utilizing implicit joins to keep away from a few of the extra verbose be a part of syntaxes inside the MULTISET expressions, however that’s not strictly related for this instance.

The consequence being one thing like:

+----------------------------+--------------------------------------------------+----------------------------+
|title                       |multiset                                          |multiset                    |
+----------------------------+--------------------------------------------------+----------------------------+
|Movie[title=ACADEMY DINOSAUR]|[Actor[firstName=PENELOPE, lastName=GUINESS], A...|[Category[name=Documentary]]|
|Movie[title=ACE GOLDFINGER]  |[Actor[firstName=BOB, lastName=FAWCETT], Actor[...|[Category[name=Horror]]     |
|Movie[title=ADAPTATION HOLES]|[Actor[firstName=NICK, lastName=WAHLBERG], Acto...|[Category[name=Documentary]]|
|Movie[title=AFFAIR PREJUDICE]|[Actor[firstName=JODIE, lastName=DEGENERES], Ac...|[Category[name=Horror]]     |
|Movie[title=AFRICAN EGG]     |[Actor[firstName=GARY, lastName=PHOENIX], Actor...|[Category[name=Family]]     |
+----------------------------+--------------------------------------------------+----------------------------+

Conclusion

There are lots of ways in which result in Rome. Basic SQL primarily based approaches to nest collections used some form of deduplication method within the consumer. In jOOQ, you would at all times do that with fetchGroups(), and since just lately additionally with gather() instantly (gather() was at all times accessible by way of an middleman stream() name).

For actually highly effective nesting of collections, nonetheless, our advice is at all times to maneuver your logic into SQL instantly, utilizing native array performance, or utilizing MULTISET

[ad_2]

Supply hyperlink

LEAVE A REPLY

Please enter your comment!
Please enter your name here