I'm struggling doing aggregations on a JSONB field in a PostgreSQL database. This is probably easier explained with an example so if create and populate a table called analysis with 2 columns (id and analysis) as follows: -
create table analysis (
id serial primary key,
analysis jsonb
);
insert into analysis
(id, analysis) values
(1, '{"category" : "news", "results" : [1, 2, 3, 4, 5 , 6, 7, 8, 9, 10, 11, 12, 13, 14, null, null]}'),
(2, '{"category" : "news", "results" : [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, null, 26]}'),
(3, '{"category" : "news", "results" : [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46]}'),
(4, '{"category" : "sport", "results" : [51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66]}'),
(5, '{"category" : "sport", "results" : [71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86]}'),
(6, '{"category" : "weather", "results" : [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106]}');
As you can see the analysis JSONB field always contains 2 attributes category and results. The results attribute will always contain an fixed length array of size 16. I've used various functions such as jsonb_array_elements but what I'm trying to do is the following: -
- Group by analysis->'category'
- Average of each array element
When I want is a statement to return 3 rows grouped by category (i.e. news, sport and weather) and a 16 fixed length array containing averages. To further complicate things, if there are nulls in the array then we should ignore them (i.e. we are not simply summing and averaging by the number of rows). The result should look something like the following: -
category | analysis_average
-----------+--------------------------------------------------------------------------------------------------------------
"news" | [14.33, 15.33, 16.33, 17.33, 18.33, 19.33, 20.33, 21.33, 22.33, 23.33, 24.33, 25.33, 26.33, 27.33, 45, 36]
"sport" | [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76]
"weather" | [91, 92, 93, 94, 95, 96, 97, 98, 99, 00, 101, 102, 103, 104, 105, 106]
NOTE: Notice the 45 and 36 in the last 2 array itmes on the 1st row which illustrates ignoring the nullss.
I had considered creating a view which exploded the array into 16 columns i.e.
create view analysis_view as
select a.*,
(a.analysis->'results'->>0)::int as result0,
(a.analysis->'results'->>1)::int as result1
/* ... etc for all 16 array entries .. */
from analysis a;
This seems extremely inelegant to me and removes the advantages of using an array in the first place but could probably hack something together using that approach.
Any pointers or tips will be most appreciated!
Also performance is really important here so the higher the performance the better!