I’m not sure I’ve understood your question correctly, but here is one good way to process the data:
Filter out the records that have foreground_time_ms = 0, keeping only those with non-zero values. Then, for each participant and each app, pick the records covering the largest disjoint intervals based on start_time and end_time. For example, given three records like the following:
{
"start_time": "2019-10-11T19:10:36.534Z",
"end_time": "2019-10-12T18:23:00.618Z",
"app_name": "org.telegram.messenger",
"foreground_time_ms": 4582164,
...
}
{
"start_time": "2019-10-11T19:10:36.534Z",
"end_time": "2019-10-12T16:20:00.618Z",
"app_name": "org.telegram.messenger",
"foreground_time_ms": 4582164,
...
}
{
"start_time": "2019-10-11T19:10:36.534Z",
"end_time": "2019-10-12T19:00:00.618Z",
"app_name": "org.telegram.messenger",
"foreground_time_ms": 4582164,
...
}
pick the one from 2019-10-11T19:10:36.534Z to 2019-10-12T19:00:00.618Z, as it covers the largest period of time of the three.
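The filtering and selection steps above can be sketched in Python roughly as follows. This is just a sketch under some assumptions: I'm assuming each record carries a participant identifier (called participant_id here, which may not be the actual field name in your data), and that records covering the same usage period share the same start_time, so keeping the one with the longest span per (participant, app, start_time) picks the largest interval.

```python
from datetime import datetime

def parse_ts(ts):
    # Parse timestamps like "2019-10-11T19:10:36.534Z" into datetime objects.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

def pick_largest_intervals(records):
    """Drop zero-usage records, then for each participant/app/start_time
    keep only the record covering the largest period of time."""
    best = {}
    for rec in records:
        if rec["foreground_time_ms"] == 0:
            continue  # filter out records with zero foreground time
        # "participant_id" is an assumed field name; adjust to your schema.
        key = (rec["participant_id"], rec["app_name"], rec["start_time"])
        span = parse_ts(rec["end_time"]) - parse_ts(rec["start_time"])
        if key not in best or span > best[key][0]:
            best[key] = (span, rec)  # keep the longest interval seen so far
    return [rec for _, rec in best.values()]
```

Applied to the three example records above, this keeps only the one ending at 2019-10-12T19:00:00.618Z.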
If you filter the data as above, you will be left with a set of records showing which app was used on which day, and for how long.
Note that you could also write your algorithm so that it uses the duplicated data to estimate at what time of day the app was used, but that is a bit more complicated, so I would start with the approach described above.
Also, I am assuming here that you are comfortable writing this in a programming/scripting language such as R or Python.
Hope this helps,
Mohammad