Interpretation of app usage data

Hello everyone,

I have a question about tracking participants’ app use. When I download the data, I have a variable “fg_time_ms”, which is the participant’s app usage in the foreground of their mobile phone. When I inspect the data, the same values recur many times. For instance, when I sort “fg_time_ms” there seem to be fixed values - e.g., people spent 0, 1,1559ms, 5,421ms, and so forth, on WhatsApp. My question: how is it possible that these values repeat so often? Do they need to be aggregated over time?

[Image: Ethica forum - Data example]

Hey @timverbeij

I assume this data belongs to your own account, correct? Can you tell me the time range in which you have seen this behaviour?

Thanks,
Mohammad

Hey @m.hashemian,

That is correct.

This repetition occurs on every day for which data was tracked. If I understand the basics of tracking app usage correctly, “fg_time_ms” is the app usage in the foreground between “start_time” and “end_time”. I added another example, this time with headings to make it clearer.

[Image: Ethica forum - Data example 2]

I see. I believe your interpretation of the data is incorrect. You should read each data record as follows. Assume you have a data record like this:

{
    "app_name": "org.telegram.messenger",
    "device_id": "60f5b2111f66cd12",
    "end_time": "2019-10-12T18:23:00.618Z",
    "foreground_time_ms": 4582164,
    "last_used": "2019-10-12T17:14:50.062Z",
    "record_time": "2019-10-12T18:27:42.911Z",
    "rel_record_time": "1970-01-10T05:15:20.391Z",
    "start_time": "2019-10-11T19:10:36.534Z",
    "study_id": YYY,
    "user_id": XXX
}

You should read this record as:

User XXX in Study YYY, from “2019-10-11 19:10:36.534” to “2019-10-12 18:23:00.618”, used the “org.telegram.messenger” app for 4,582,164 milliseconds (~76 minutes), and the last time it was used was “2019-10-12 17:14:50.062”.
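To make that reading concrete, here is a minimal Python sketch that turns the record above into the same summary. The helper names `parse_ts` and `summarize` are my own, not part of any Ethica tooling, and I left out the placeholder `study_id` / `user_id` fields.

```python
from datetime import datetime

# The sample record from this post, minus the placeholder IDs.
record = {
    "app_name": "org.telegram.messenger",
    "end_time": "2019-10-12T18:23:00.618Z",
    "foreground_time_ms": 4582164,
    "last_used": "2019-10-12T17:14:50.062Z",
    "start_time": "2019-10-11T19:10:36.534Z",
}

def parse_ts(value):
    # Timestamps end in "Z" (UTC); swap it for an explicit offset so that
    # datetime.fromisoformat also accepts it on older Python 3 versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def summarize(rec):
    start = parse_ts(rec["start_time"])
    end = parse_ts(rec["end_time"])
    return {
        "app": rec["app_name"],
        # Length of the reporting window, not the usage itself.
        "window_hours": (end - start).total_seconds() / 3600,
        # Cumulative foreground time inside that window.
        "foreground_minutes": rec["foreground_time_ms"] / 60_000,
        "last_used": parse_ts(rec["last_used"]),
    }

summary = summarize(record)
print(summary)
```

The point to notice is that the window covers roughly a day, while the foreground time is the total usage accumulated somewhere inside it.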

Note that, because of the way the data is currently collected, there are a lot of duplicate records. They are stored in the database but do not provide any additional information. For example, one record says:

from “2019-10-11 19:10:36.534” to “2019-10-12 18:23:00.618” Telegram was used for 76 minutes

and the next record says

from “2019-10-11 19:10:36.534” to “2019-10-12 18:26:00.618” Telegram was used for 76 minutes.

The second record does not add much extra information, other than saying that for the 3 minutes from “2019-10-12 18:23:00.618” to “2019-10-12 18:26:00.618” Telegram was not used at all. You can of course drop these duplicates in your data processing.

I suggest you check out this video. It goes into detail on how to interpret the App Usage data.

Hope it helps,
Mohammad

@m.hashemian Many thanks, that answers my question!

Edit: so [last_used] - [fg_time_ms] should correspond to the time point where I opened the app, right?

Also, is it possible to give me a bit more insight into the duplicates? Why does Android collect these duplicates?

Thanks in advance.

so [last_used] - [fg_time_ms] should correspond to the time point where I opened the app, right?

Well, no. last_used is the last time the app was used. fg_time_ms is the cumulative time the app has been in the foreground between start_time and end_time. Between start_time and end_time the user might have opened and closed the app multiple times, not just once. So you cannot conclude that last_used - fg_time_ms gives the time the app was opened.
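A small worked example may help here. The two sessions below are made up for illustration, and I am assuming that last_used marks the end of the final session, which the data itself does not guarantee.

```python
from datetime import datetime, timedelta

# Hypothetical sessions (time opened, duration) inside one
# start_time..end_time window.
sessions = [
    (datetime(2019, 10, 12, 9, 0), timedelta(minutes=10)),
    (datetime(2019, 10, 12, 17, 0), timedelta(minutes=15)),
]

# fg_time_ms is cumulative, so it covers both sessions: 25 minutes.
fg_time = sum((duration for _, duration in sessions), timedelta())

# Assumption: last_used is the end of the most recent session.
last_used = max(opened + duration for opened, duration in sessions)

# The subtraction proposed in the question.
naive_open_time = last_used - fg_time
actual_open_times = [opened for opened, _ in sessions]

print("naive estimate:", naive_open_time)
print("actual open times:", actual_open_times)
```

Here the subtraction lands on 16:50, which is neither of the two times the app was actually opened (09:00 and 17:00). It would only be correct if the whole window contained a single uninterrupted session.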

is it possible to give me a bit more insight into the duplicates? Why does Android collect these duplicates?

It’s related to the way Android provides such data. The system does not provide individual app usage events; rather, it provides aggregate usage data over a period of time. To increase accuracy, the Ethica app queries the shortest range possible (1 day) every 5 minutes and sends all available data. This leads to lots of duplicates.
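To illustrate why repeated polling of an aggregate produces duplicates, here is a toy simulation in Python. It is not the Ethica app's code and it does not call any Android API; `poll_cumulative` and the session times are invented for the example.

```python
MIN = 60_000  # milliseconds per minute

def poll_cumulative(sessions, window_start, poll_times):
    """Return one (start, end, fg_ms) snapshot per poll.

    sessions: list of (begin_ms, end_ms) foreground intervals.
    Each snapshot reports the total foreground time accumulated
    between window_start and the moment of the poll.
    """
    snapshots = []
    for now in poll_times:
        fg = 0
        for begin, end in sessions:
            overlap = min(end, now) - max(begin, window_start)
            if overlap > 0:
                fg += overlap
        snapshots.append((window_start, now, fg))
    return snapshots

# One 2-minute session, then three polls 5 minutes apart.
sessions = [(2 * MIN, 4 * MIN)]
snapshots = poll_cumulative(sessions, 0, [5 * MIN, 10 * MIN, 15 * MIN])

for start, end, fg in snapshots:
    print(f"from {start} to {end}: {fg} ms in foreground")
```

Every poll after the session ends reports the same cumulative value with a later end time, which is the pattern you see when sorting fg_time_ms.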

You can read more on how such data is provided in Android phones here (particularly check the queryUsageStats function).

Hope it helps,
Mohammad

Thanks a lot for your help @m.hashemian! Do you know the most straightforward way to retrieve the absolute fg_time_ms values (other than [fg_time_use/relevant index] - [fg_time_use/index prior to relevant index])?

I’m not sure if I understood your question, but I suggest a good way to process the data is the following:

Filter out the records that have fg_time_ms = 0 and keep only the ones with non-zero values. Then, for each participant and each app, keep the record covering the longest period, based on start_time <-> end_time, so that the remaining periods do not overlap. For example, if there are three records like the ones below:

{
    "start_time": "2019-10-11T19:10:36.534Z",
    "end_time":   "2019-10-12T18:23:00.618Z",
    "app_name": "org.telegram.messenger",
    "foreground_time_ms": 4582164,
    ...
}
{
    "start_time": "2019-10-11T19:10:36.534Z",
    "end_time":   "2019-10-12T16:20:00.618Z",
    "app_name": "org.telegram.messenger",
    "foreground_time_ms": 4582164,
    ...
}
{
    "start_time": "2019-10-11T19:10:36.534Z",
    "end_time":   "2019-10-12T19:00:00.618Z",
    "app_name": "org.telegram.messenger",
    "foreground_time_ms": 4582164,
    ...
}

Pick the one from 2019-10-11T19:10:36.534Z to 2019-10-12T19:00:00.618Z as that represents the largest period of time in comparison with the other two records.

If you filter the data as above, you will be left with a set of values that shows which app was used on which day, and for how long.
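As a rough sketch of that filtering in Python (standard library only): it assumes that records belonging to the same period share an identical start_time, as in the three records above, and that the field is named foreground_time_ms as in the JSON (rename it to fg_time_ms if that is what your export uses). The function name `keep_longest_windows` is my own.

```python
from datetime import datetime

def parse_ts(value):
    # "Z" means UTC; make the offset explicit for datetime.fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def keep_longest_windows(records):
    """Drop zero-usage records, then keep, per participant, app and
    start_time, only the record with the latest end_time."""
    best = {}
    for rec in records:
        if rec["foreground_time_ms"] == 0:
            continue
        key = (rec.get("user_id"), rec["app_name"], rec["start_time"])
        current = best.get(key)
        if current is None or parse_ts(rec["end_time"]) > parse_ts(current["end_time"]):
            best[key] = rec
    return list(best.values())

# The three records from the example, plus one zero-usage record.
records = [
    {"start_time": "2019-10-11T19:10:36.534Z", "end_time": "2019-10-12T18:23:00.618Z",
     "app_name": "org.telegram.messenger", "foreground_time_ms": 4582164},
    {"start_time": "2019-10-11T19:10:36.534Z", "end_time": "2019-10-12T16:20:00.618Z",
     "app_name": "org.telegram.messenger", "foreground_time_ms": 4582164},
    {"start_time": "2019-10-11T19:10:36.534Z", "end_time": "2019-10-12T19:00:00.618Z",
     "app_name": "org.telegram.messenger", "foreground_time_ms": 4582164},
    {"start_time": "2019-10-11T19:10:36.534Z", "end_time": "2019-10-12T19:00:00.618Z",
     "app_name": "com.example.unused", "foreground_time_ms": 0},
]

kept = keep_longest_windows(records)
print(kept)
```

With the sample input this leaves a single Telegram record, the one ending at 2019-10-12T19:00:00.618Z. The `com.example.unused` entry is a made-up app name that only demonstrates the zero filter.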

Note that you can write your algorithm such that it uses the duplicated data to estimate at what time during the day the app was used, but that’s a bit more complicated. So I would start from what I described above.
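If you do want to try that, one possible approach is to sort the snapshots of a single period by end_time and look at where the cumulative value increases. This is only a sketch under the assumption that the value never decreases within one start_time; the snapshot values below are invented, and `usage_intervals` is a hypothetical helper name.

```python
def usage_intervals(snapshots):
    """snapshots: records for ONE participant, ONE app and ONE start_time.

    Returns (interval_start, interval_end, added_ms) tuples for every
    pair of consecutive snapshots between which usage increased.
    """
    if not snapshots:
        return []
    # Fixed-width ISO-8601 UTC strings sort chronologically as text.
    ordered = sorted(snapshots, key=lambda r: r["end_time"])
    intervals = []
    prev_end, prev_fg = ordered[0]["start_time"], 0
    for rec in ordered:
        delta = rec["foreground_time_ms"] - prev_fg
        if delta > 0:
            # The app was in the foreground for delta ms at some point
            # between the previous snapshot and this one.
            intervals.append((prev_end, rec["end_time"], delta))
        prev_end, prev_fg = rec["end_time"], rec["foreground_time_ms"]
    return intervals

start = "2019-10-12T00:00:00.000Z"
snapshots = [
    {"start_time": start, "end_time": "2019-10-12T10:00:00.000Z", "foreground_time_ms": 0},
    {"start_time": start, "end_time": "2019-10-12T10:05:00.000Z", "foreground_time_ms": 120000},
    {"start_time": start, "end_time": "2019-10-12T10:10:00.000Z", "foreground_time_ms": 120000},
    {"start_time": start, "end_time": "2019-10-12T10:15:00.000Z", "foreground_time_ms": 300000},
]

intervals = usage_intervals(snapshots)
for interval in intervals:
    print(interval)
```

The resolution is limited by how often snapshots were recorded, so this tells you in which gap between two snapshots the app was used, not the exact moment it was opened.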

Also, here I assume you are comfortable with writing this in some programming/scripting language like R or Python.

Hope this helps,
Mohammad