Extending mhealthtools

If you need functionality that isn’t already included with mhealthtools, it will be necessary to incorporate your own code within the existing mhealthtools architecture. Fortunately, mhealthtools is written with modularity in mind, and using (or excluding) any part of the mhealthtools pipeline is easy – at least once you understand the underlying structure of the package. Whether it’s including a new statistical measure to be computed, a new activity, or even a completely new mobile sensor, it can be done without ever modifying the underlying codebase of mhealthtools.

The mhealthtools architecture

There are three distinct levels of abstraction within mhealthtools: activity modules, sensor modules, and utility functions. The functions that do all the “real” work – taking a numerical input and producing a numerical output – exist as utilities. These utility functions are called by sensor modules (e.g., accelerometer, gyroscope, …). Sensor module functions are in turn called by activity modules. Thus, a heirarchy is established where higher level modules will call functions from lower level modules, but lower level modules will never call a function from a module that is more abstracted than itself.

In brief summary: Activity > Sensor > Utility

Examples of Activity modules:
- get_tremor_features
- get_tapping_features

Examples of Sensor modules:
- accelerometer_features
- gyroscope_features

Examples of Utility functions:
- mutate_detrend
- detrend
- extract_features

Utility functions

Although utilities exist at the bottom of the functional totem pole, they often serve very different purposes from one another. Functions like map_groups or extract_features are functional and abstract, and could exist in just about any codebase, whereas functions like integral simply call a base R function in a way that is more interpretable for this package’s use cases. Some utility functions are hierarchical – mutate_integral makes use of integral, but accepts as input and outputs a dataframe with a schema optimized for kinematic sensor measurements, like those from accelerometer or gyroscope sensors. In short, utility functions are just a grab bag of functions that aren’t specific to any one activity or sensor module.

Sensor modules

Sensor modules are where things start to get interesting. The implementation of sensor modules is based on a bare-bones feature extraction paradigm/algorithm:

Input: raw sensor data, in a standardized format.
Transform: raw sensor data (by, e.g., tidying, computing rates of change, windowing).
Extract: features by computing statistics upon individual columns, usually grouped on an index.
Return: statistics/features for each group.

You can write your own feature extraction functions to include in the “Extract” step of the above process. The package expects feature extraction functions to accept a numeric vector as input and output a one-row dataframe with features as columns. Here’s a simple example:

my_feature_extraction_function <- function(x) {
  features <- data.frame(
    mean = mean(x),
    sd = sd(x),
    max_value = max(x))
  return(features)
}

accelerometer_features(
  sensor_data = accelerometer_data,
  detrend = TRUE,
  funs = my_feature_extraction_function)
#> Joining, by = "axis"
#> $extracted_features
#> # A tibble: 3 x 5
#> # Groups:   axis [3]
#>   measurementType axis         mean      sd max_value
#>   <chr>           <chr>       <dbl>   <dbl>     <dbl>
#> 1 acceleration    x     -0.00000481 0.00590    0.0751
#> 2 acceleration    y      0.0000179  0.00181    0.0132
#> 3 acceleration    z     -0.0000102  0.00471    0.0173
#> 
#> $model_features
#> NULL
#> 
#> $error
#> NULL

Notice that we used a transform function (detrend) included with the package, but were also able to pass our custom feature extraction function. That’s the power of computing your custom features with mhealthtools!

In some ways this paradigm is quite flexible – any numerical transformation can be applied to the raw sensor data, and any statistic can be computed on columns of the transformed result. But not all features you could plausibly want to compute fit well into this paradigm. Statistics that rely on complex combinations of their input variables, such as most machine learning models, are overly cumbersome to fit into the transform -> extract model. If you’d like to use these types of complex statistics in your feature extraction process, you can circumvent the extract portion of the transform -> extract pipeline by passing your own function(s) to the models parameter of a sensor level feature function. The models parameter consists of one or more functions which accept as input sensor_data after the specified transformations have been applied to it and outputs whatever it wants.

my_model_one <- function(transformed_sensor_data) {
  loess(acceleration ~ jerk, transformed_sensor_data)
}

my_model_two <- function(transformed_sensor_data) {
  aov(acceleration ~ displacement, transformed_sensor_data)
}

accelerometer_features(
  sensor_data = accelerometer_data,
  detrend = TRUE,
  derived_kinematics = TRUE,
  models = list(loess = my_model_one, aov = my_model_two))
#> $extracted_features
#> NULL
#> 
#> $model_features
#> $model_features$loess
#> Call:
#> loess(formula = acceleration ~ jerk, data = transformed_sensor_data)
#> 
#> Number of Observations: 2970 
#> Equivalent Number of Parameters: 6.54 
#> Residual Standard Error: 0.004002 
#> 
#> $model_features$aov
#> Call:
#>    aov(formula = acceleration ~ displacement, data = transformed_sensor_data)
#> 
#> Terms:
#>                 displacement  Residuals
#> Sum of Squares    0.00001729 0.05954186
#> Deg. of Freedom            1       2968
#> 
#> Residual standard error: 0.004478981
#> Estimated effects may be unbalanced
#> 
#> 
#> $error
#> NULL

The above example is not particularly realistic, but you can imagine a neural net model trained on kinematic sensor data to output a dense feature embedding for downstream analysis might be more useful.

Activity modules

Activity modules are meant to group together all sensor modules that relate to a given activity. For example, the activity module get_walk_features takes accelerometer, gyroscope, and gravity data as input, but is otherwise parameterized similarly to the sensor modules accelerometer_features and gyroscope_features. Within the function body, we perform some input validation, a call to accelerometer_features and gyroscope_features, the combining of their results into a single list object, and – if gravity sensor data was provided – the tagging of potential outlier windows. Activity modules allow us to extract features from all relevant sensors with a single function call and argument set.

Advanced sensor functionality

Recall the feature extraction paradigm:

Input: raw sensor data, in a standardized format.
Transform: raw sensor data (by, e.g., tidying, computing rates of change, windowing).
Extract: features by computing statistics upon individual columns, usually grouped on an index.
Return: statistics/features for each group.

So far we’ve talked about two ways to modify the behavior of the Extract step – using either funs or models parameters. But what if we want to modify the Transform step?

For kinematic sensors, most users should be satisfied with the included transform options (detrending, time filtering, frequency filtering, windowing, IMF+windowing, derived kinematics). But for those still not satisfied, we need to throw away the user-friendly wrappers accelerometer_features and gyroscope_features and look under the hood at sensor_features.

By design, there’s not much to see. Any functions passed to the transform parameter of sensor_features will be applied sequentially to sensor_data. What makes this parameter useful is if (nearly) all functions informally agree to a standardized type of input and output. If this is the case, we can apply (nearly all) our transform functions in any order we want.

The “nearly” arises because – in the case of the transform options included with accelerometer_features and gyroscope_features – our transform functions accept input data in a tidy format, but these two user-friendly wrappers accept their input with a schema that’s more likely to conform with what is recorded by a mobile device (columns t, x, y, z, rather than a tidy schema t, axis and value). And so one of our transform functions (tidy_sensor_data) violates the tidy in / tidy out principle in order to put the data in a tidy format in the first place.

`transform` parameter case study

As a case study, let’s step through how accelerometer_features creates a list of transform functions to pass to sensor_features.

When we make this function call:

accelerometer_features(
  sensor_data = accelerometer_data,
  time_filter = c(2, 8),
  detrend = TRUE,
  frequency_filter = c(1,25))

An intermediary function (whose details are not important) will produce this list of functions (whose nitty gritty details are also not important) – which is passed to the transform parameter of sensor_features.

[[1]] # tidy_sensor_data
function (sensor_data) 
{
    if (has_error(sensor_data)) 
        return(sensor_data)
    if (any(is.na(sensor_data$t))) 
        stop("NA values present in column t.")
    tidy_sensor_data <- tryCatch({
        t0 <- sensor_data$t[1]
        normalized_sensor_data <- sensor_data %>% dplyr::mutate(t = t - 
            t0)
        index <- order(sensor_data$t)
        tidy_sensor_data <- normalized_sensor_data[index, ] %>% 
            tidyr::gather(axis, value, -t) %>% dplyr::group_by(axis)
    }, error = function(e) {
        dplyr::tibble(error = "Could not put sensor data in tidy format by gathering the axes.")
    })
    return(tidy_sensor_data)
}
<environment: namespace:mhealthtools>

[[2]] # filter_time
<partialised>
function (...) 
filter_time(t1 = time_filter[[1]], t2 = time_filter[[2]], ...)

[[3]] # mutate_detrend
function (sensor_data) 
{
    if (has_error(sensor_data)) 
        return(sensor_data)
    detrended_sensor_data <- tryCatch({
        detrended_sensor_data <- sensor_data %>% dplyr::mutate(value = detrend(t, 
            value))
    }, error = function(e) {
        dplyr::tibble(error = "Detrend error")
    })
    return(detrended_sensor_data)
}
<environment: namespace:mhealthtools>

[[4]] # mutate_bandpass
<partialised>
function (...) 
mutate_bandpass(window_length = 256, sampling_rate = sampling_rate, 
    frequency_range = frequency_filter, ...)

[[5]] # not a repurposeable function, we simply drop the `t` column
function (sensor_data) 
{
    if (has_error(sensor_data)) 
        return(sensor_data)
    sensor_data %>% dplyr::select(-t)
}
<bytecode: 0x1032f3e40>
<environment: 0x10def1950>

[[6]] # not a repurposable function, we rename the metric to match the sensor (acceleration, in this case)
function (transformed_sensor_data) 
{
    if (has_error(transformed_sensor_data)) 
        return(transformed_sensor_data)
    transformed_sensor_data <- transformed_sensor_data %>% dplyr::rename(`:=`(!(!metric), 
        value))
    return(transformed_sensor_data)
}
<bytecode: 0x10b6cc808>
<environment: 0x10def1950>

What you should take away from this list is that sensor_features will pass its sensor_data argument through six functions to complete the transform and prepare our data for feature extraction.

In this case, these functions are:

list_index	function_name	triggered_by_parameter	input	output
1	tidy_sensor_data	NA	dataframe with columns t,x,y,z	tidy dataframe
2	filter_time	time_filter = c(2,8)	tidy dataframe	tidy dataframe
3	mutate_detrend	detrend = TRUE	tidy dataframe	tidy dataframe
4	mutate_bandpass	frequency_filter = c(1,25)	tidy dataframe	tidy dataframe
5	NA	NA	dataframe with column `t`	tidy dataframe with columns axis, metric
6	NA	NA	dataframe with column `metric`	tidy dataframe with columns axis, acceleration

Here, a “tidy dataframe” means a dataframe with columns t, axis, and metric, since we are dealing with kinematic sensor data.

Notice that the first function tidy_sensor_data serves to translate the raw accelerometer data (dataframe with columns t,x,y,z) into a more general tidy format.

The next three functions can then be applied in any order, or can have other functions that also input and output a tidy dataframe applied in between them. Because of the interchangeable input/output of these functions, you can include your own transform functions into a preexisting pipeline of transform functions – at least, as long as they agree to the informal rules that transform functions always input/output a tidy dataframe with the same schema.

The final two unnamed functions are purely procedural. We drop the t column because we didn’t specify any windowing transformation (which normally drops the t column) and we don’t need it to extract features. The very last function renames the column metric, which we’ve used here to now so that functions like mutate_detrend and mutate_bandpass can be used agnostically on both accelerometer and gyroscope data.

Let’s look at a similar case, where we also window the sensor data.

accelerometer_features(
  sensor_data = accelerometer_data,
  time_filter = c(2, 8),
  detrend = TRUE,
  frequency_filter = c(1,25),
  window_length = 256,
  window_overlap = 0.5)

In this case, the list of functions passed to transform looks like:

list_index	function_name	triggered_by_parameter	input	output
1	tidy_sensor_data	NA	dataframe with columns t,x,y,z	tidy dataframe
2	filter_time	time_filter = c(2,8)	tidy dataframe	tidy dataframe
3	mutate_detrend	detrend = TRUE	tidy dataframe	tidy dataframe
4	mutate_bandpass	frequency_filter = c(1,25)	tidy dataframe	tidy dataframe
5	transformation_window	window_length = 256, window_overlap = 0.5	tidy dataframe	tidy dataframe with columns axis, window, …, metric
6	NA	NA	dataframe with column `metric`	tidy dataframe with columns axis, window, …, acceleration

Here, transformation_window actually violates the tidy in / tidy out principle. The data is technically still tidy, but the function transformation_window cannot be used interchangeably with the other transform functions because the columns have changed. The distinction here is that transformation_window outputs its input data into a different metric space than it originally came from. Our input data came in indexed by t and axis, but came out indexed by axis and window.

[Aside: The function name transformation_window (transformation_imf_window behaves similarly) is a bit misleading, as a transformation in the strictest mathematical sense outputs its input into the original vector space. Within this package we use the “transformation” prefix for functions that take their inputs into a new vector space, behaving more as a mathematical function.]

The only other general rule to follow when writing transform functions is to keep your data grouped by its index. This way, when you apply an operation like detrend, you will apply the function to the relevant data. You wouldn’t want to detrend acceleration values measured along the x-axis by computing an average across all three axes, for example. Standard feature extraction functions will also extract features along these groups. So if you are performing a transform operation like windowing, you want to keep your data grouped by axis as well as window, otherwise features will be computed across windows.

Error handling within transform functions

What if something goes wrong in the feature extraction process? Maybe some input data is malformed or a transform function accidentally takes the logarithm of zero? Standard practice within mhealthtools is to do error-checking at the dataframe level – meaning if something goes wrong, it will be caught by a function that is meant to return a dataframe. Instead of returning a feature dataframe, or some transformation of its input, the function will return a dataframe with an “error” column that normally contains a string giving a description of the error. Consistency with this error handling behavior is important because these dataframe-level functions are often piped together, and if anything goes awry upstream, downstream functions know to act as identity functions until the result is returned to its original caller – where the result will be error-checked and returned in an appropriate fashion.

error
Detrend Error

For example, if we are sequentially applying the functions in transform to sensor_data within sensor_features and something goes wrong while detrending the data, detrend will throw an exception, which gets passed to mutate_detrend, our dataframe-level function. mutate_detrend will catch the exception and return a dataframe with an error column containing the string “Detrend Error”. If the next function applied is transformation_window, or mutate_bandpass, or any other transform function, that function should first check the input for errors (using the function has_error) and, upon finding one, immediately return its input. When the error dataframe is eventually returned to sensor_features, it will be stored in a list with the same names you would expect from a non-errored process and returned all the way to the original caller – whether that be a sensor or activity module. This is especially important when batch processing large amounts of sensor data, where data processing errors are practically inevitable. Rather than crashing your workflow, mhealthtools will return something that can be handled upstream by the batch script.

In summary

We use the following feature extraction paradigm:

Input: raw sensor data, in a standardized format.
Transform: raw sensor data (by, e.g., tidying, computing rates of change, windowing).
Extract: features by computing statistics upon individual columns, usually grouped on an index.
Return: statistics/features for each group.

If you want to change the statistics computed by Extract, pass functions to the funs parameter of a sensor or activity level module or the extract parameter of sensor_features. If you want to get rid of the default behavior of Extract entirely, pass one or more functions to the models parameter of a sensor or activity level module.

If you want to change the behavior of Transform, either make use of the preexisting parameters and functions within the sensor and activity level modules, or pass a list of functions to the transform parameter of sensor features.

If you pass a list of functions to the transform parameter of sensor_features, each function should generally follow a few rules to make it as flexible and reusable as possible:

Before doing anything else, check if the input is an error dataframe and react appropriately (see the above section Error handling within transform functions)
Output the same data schema that was input. In the case of kinematics sensors, that schema contains columns t, axis, and metric. This will allow you to use your transform functions in any order you like, or omit them entirely.
Violate rule #2 whenever you need to perform an operation that takes your data into a new metric space, like transformation_window.
Always keep your data grouped by its index.

Phil Snyder

2020-02-26

The mhealthtools architecture

Utility functions

Sensor modules

Activity modules

Advanced sensor functionality

`transform` parameter case study

Error handling within transform functions

In summary

Contents

Extending mhealthtools

Phil Snyder

2020-02-26

The mhealthtools architecture

Utility functions

Sensor modules

Activity modules

Advanced sensor functionality

transform parameter case study

Error handling within transform functions

In summary

Contents

`transform` parameter case study