If you need functionality that isn’t already included with mhealthtools, it will be necessary to incorporate your own code within the existing mhealthtools architecture. Fortunately, mhealthtools is written with modularity in mind, and using (or excluding) any part of the mhealthtools pipeline is easy – at least once you understand the underlying structure of the package. Whether it’s including a new statistical measure to be computed, a new activity, or even a completely new mobile sensor, it can be done without ever modifying the underlying codebase of mhealthtools.
There are three distinct levels of abstraction within mhealthtools: activity modules, sensor modules, and utility functions. The functions that do all the “real” work – taking a numerical input and producing a numerical output – exist as utilities. These utility functions are called by sensor modules (e.g., accelerometer, gyroscope, …). Sensor module functions are in turn called by activity modules. Thus, a heirarchy is established where higher level modules will call functions from lower level modules, but lower level modules will never call a function from a module that is more abstracted than itself.
In brief summary: Activity > Sensor > Utility
Examples of Activity modules:
- get_tremor_features
- get_tapping_features
Examples of Sensor modules:
- accelerometer_features
- gyroscope_features
Examples of Utility functions:
- mutate_detrend
- detrend
- extract_features
Although utilities exist at the bottom of the functional totem pole, they often serve very different purposes from one another. Functions like map_groups
or extract_features
are functional and abstract, and could exist in just about any codebase, whereas functions like integral
simply call a base R function in a way that is more interpretable for this package’s use cases. Some utility functions are hierarchical – mutate_integral
makes use of integral
, but accepts as input and outputs a dataframe with a schema optimized for kinematic sensor measurements, like those from accelerometer or gyroscope sensors. In short, utility functions are just a grab bag of functions that aren’t specific to any one activity or sensor module.
Sensor modules are where things start to get interesting. The implementation of sensor modules is based on a bare-bones feature extraction paradigm/algorithm:
You can write your own feature extraction functions to include in the “Extract” step of the above process. The package expects feature extraction functions to accept a numeric vector as input and output a one-row dataframe with features as columns. Here’s a simple example:
my_feature_extraction_function <- function(x) {
features <- data.frame(
mean = mean(x),
sd = sd(x),
max_value = max(x))
return(features)
}
accelerometer_features(
sensor_data = accelerometer_data,
detrend = TRUE,
funs = my_feature_extraction_function)
#> Joining, by = "axis"
#> $extracted_features
#> # A tibble: 3 x 5
#> # Groups: axis [3]
#> measurementType axis mean sd max_value
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 acceleration x -0.00000481 0.00590 0.0751
#> 2 acceleration y 0.0000179 0.00181 0.0132
#> 3 acceleration z -0.0000102 0.00471 0.0173
#>
#> $model_features
#> NULL
#>
#> $error
#> NULL
Notice that we used a transform function (detrend
) included with the package, but were also able to pass our custom feature extraction function. That’s the power of computing your custom features with mhealthtools
!
In some ways this paradigm is quite flexible – any numerical transformation can be applied to the raw sensor data, and any statistic can be computed on columns of the transformed result. But not all features you could plausibly want to compute fit well into this paradigm. Statistics that rely on complex combinations of their input variables, such as most machine learning models, are overly cumbersome to fit into the transform -> extract model. If you’d like to use these types of complex statistics in your feature extraction process, you can circumvent the extract portion of the transform -> extract pipeline by passing your own function(s) to the models
parameter of a sensor level feature function. The models
parameter consists of one or more functions which accept as input sensor_data
after the specified transformations have been applied to it and outputs whatever it wants.
my_model_one <- function(transformed_sensor_data) {
loess(acceleration ~ jerk, transformed_sensor_data)
}
my_model_two <- function(transformed_sensor_data) {
aov(acceleration ~ displacement, transformed_sensor_data)
}
accelerometer_features(
sensor_data = accelerometer_data,
detrend = TRUE,
derived_kinematics = TRUE,
models = list(loess = my_model_one, aov = my_model_two))
#> $extracted_features
#> NULL
#>
#> $model_features
#> $model_features$loess
#> Call:
#> loess(formula = acceleration ~ jerk, data = transformed_sensor_data)
#>
#> Number of Observations: 2970
#> Equivalent Number of Parameters: 6.54
#> Residual Standard Error: 0.004002
#>
#> $model_features$aov
#> Call:
#> aov(formula = acceleration ~ displacement, data = transformed_sensor_data)
#>
#> Terms:
#> displacement Residuals
#> Sum of Squares 0.00001729 0.05954186
#> Deg. of Freedom 1 2968
#>
#> Residual standard error: 0.004478981
#> Estimated effects may be unbalanced
#>
#>
#> $error
#> NULL
The above example is not particularly realistic, but you can imagine a neural net model trained on kinematic sensor data to output a dense feature embedding for downstream analysis might be more useful.
Activity modules are meant to group together all sensor modules that relate to a given activity. For example, the activity module get_walk_features
takes accelerometer, gyroscope, and gravity data as input, but is otherwise parameterized similarly to the sensor modules accelerometer_features
and gyroscope_features
. Within the function body, we perform some input validation, a call to accelerometer_features
and gyroscope_features
, the combining of their results into a single list object, and – if gravity sensor data was provided – the tagging of potential outlier windows. Activity modules allow us to extract features from all relevant sensors with a single function call and argument set.
Recall the feature extraction paradigm:
So far we’ve talked about two ways to modify the behavior of the Extract step – using either funs
or models
parameters. But what if we want to modify the Transform step?
For kinematic sensors, most users should be satisfied with the included transform options (detrending, time filtering, frequency filtering, windowing, IMF+windowing, derived kinematics). But for those still not satisfied, we need to throw away the user-friendly wrappers accelerometer_features
and gyroscope_features
and look under the hood at sensor_features
.
By design, there’s not much to see. Any functions passed to the transform
parameter of sensor_features
will be applied sequentially to sensor_data
. What makes this parameter useful is if (nearly) all functions informally agree to a standardized type of input and output. If this is the case, we can apply (nearly all) our transform functions in any order we want.
The “nearly” arises because – in the case of the transform options included with accelerometer_features
and gyroscope_features
– our transform functions accept input data in a tidy format, but these two user-friendly wrappers accept their input with a schema that’s more likely to conform with what is recorded by a mobile device (columns t
, x
, y
, z
, rather than a tidy schema t
, axis
and value
). And so one of our transform functions (tidy_sensor_data
) violates the tidy in / tidy out principle in order to put the data in a tidy format in the first place.
transform
parameter case studyAs a case study, let’s step through how accelerometer_features
creates a list of transform functions to pass to sensor_features
.
When we make this function call:
accelerometer_features(
sensor_data = accelerometer_data,
time_filter = c(2, 8),
detrend = TRUE,
frequency_filter = c(1,25))
An intermediary function (whose details are not important) will produce this list of functions (whose nitty gritty details are also not important) – which is passed to the transform
parameter of sensor_features
.
[[1]] # tidy_sensor_data
function (sensor_data)
{
if (has_error(sensor_data))
return(sensor_data)
if (any(is.na(sensor_data$t)))
stop("NA values present in column t.")
tidy_sensor_data <- tryCatch({
t0 <- sensor_data$t[1]
normalized_sensor_data <- sensor_data %>% dplyr::mutate(t = t -
t0)
index <- order(sensor_data$t)
tidy_sensor_data <- normalized_sensor_data[index, ] %>%
tidyr::gather(axis, value, -t) %>% dplyr::group_by(axis)
}, error = function(e) {
dplyr::tibble(error = "Could not put sensor data in tidy format by gathering the axes.")
})
return(tidy_sensor_data)
}
<environment: namespace:mhealthtools>
[[2]] # filter_time
<partialised>
function (...)
filter_time(t1 = time_filter[[1]], t2 = time_filter[[2]], ...)
[[3]] # mutate_detrend
function (sensor_data)
{
if (has_error(sensor_data))
return(sensor_data)
detrended_sensor_data <- tryCatch({
detrended_sensor_data <- sensor_data %>% dplyr::mutate(value = detrend(t,
value))
}, error = function(e) {
dplyr::tibble(error = "Detrend error")
})
return(detrended_sensor_data)
}
<environment: namespace:mhealthtools>
[[4]] # mutate_bandpass
<partialised>
function (...)
mutate_bandpass(window_length = 256, sampling_rate = sampling_rate,
frequency_range = frequency_filter, ...)
[[5]] # not a repurposeable function, we simply drop the `t` column
function (sensor_data)
{
if (has_error(sensor_data))
return(sensor_data)
sensor_data %>% dplyr::select(-t)
}
<bytecode: 0x1032f3e40>
<environment: 0x10def1950>
[[6]] # not a repurposable function, we rename the metric to match the sensor (acceleration, in this case)
function (transformed_sensor_data)
{
if (has_error(transformed_sensor_data))
return(transformed_sensor_data)
transformed_sensor_data <- transformed_sensor_data %>% dplyr::rename(`:=`(!(!metric),
value))
return(transformed_sensor_data)
}
<bytecode: 0x10b6cc808>
<environment: 0x10def1950>
What you should take away from this list is that sensor_features
will pass its sensor_data
argument through six functions to complete the transform and prepare our data for feature extraction.
In this case, these functions are:
list_index | function_name | triggered_by_parameter | input | output |
---|---|---|---|---|
1 | tidy_sensor_data | NA | dataframe with columns t,x,y,z | tidy dataframe |
2 | filter_time | time_filter = c(2,8) | tidy dataframe | tidy dataframe |
3 | mutate_detrend | detrend = TRUE | tidy dataframe | tidy dataframe |
4 | mutate_bandpass | frequency_filter = c(1,25) | tidy dataframe | tidy dataframe |
5 | NA | NA | dataframe with column t
|
tidy dataframe with columns axis, metric |
6 | NA | NA | dataframe with column metric
|
tidy dataframe with columns axis, acceleration |
Here, a “tidy dataframe” means a dataframe with columns t
, axis
, and metric
, since we are dealing with kinematic sensor data.
Notice that the first function tidy_sensor_data
serves to translate the raw accelerometer data (dataframe with columns t,x,y,z) into a more general tidy format.
The next three functions can then be applied in any order, or can have other functions that also input and output a tidy dataframe applied in between them. Because of the interchangeable input/output of these functions, you can include your own transform functions into a preexisting pipeline of transform functions – at least, as long as they agree to the informal rules that transform functions always input/output a tidy dataframe with the same schema.
The final two unnamed functions are purely procedural. We drop the t
column because we didn’t specify any windowing transformation (which normally drops the t
column) and we don’t need it to extract features. The very last function renames the column metric
, which we’ve used here to now so that functions like mutate_detrend
and mutate_bandpass
can be used agnostically on both accelerometer and gyroscope data.
Let’s look at a similar case, where we also window the sensor data.
accelerometer_features(
sensor_data = accelerometer_data,
time_filter = c(2, 8),
detrend = TRUE,
frequency_filter = c(1,25),
window_length = 256,
window_overlap = 0.5)
In this case, the list of functions passed to transform
looks like:
list_index | function_name | triggered_by_parameter | input | output |
---|---|---|---|---|
1 | tidy_sensor_data | NA | dataframe with columns t,x,y,z | tidy dataframe |
2 | filter_time | time_filter = c(2,8) | tidy dataframe | tidy dataframe |
3 | mutate_detrend | detrend = TRUE | tidy dataframe | tidy dataframe |
4 | mutate_bandpass | frequency_filter = c(1,25) | tidy dataframe | tidy dataframe |
5 | transformation_window | window_length = 256, window_overlap = 0.5 | tidy dataframe | tidy dataframe with columns axis, window, …, metric |
6 | NA | NA | dataframe with column metric
|
tidy dataframe with columns axis, window, …, acceleration |
Here, transformation_window
actually violates the tidy in / tidy out principle. The data is technically still tidy, but the function transformation_window
cannot be used interchangeably with the other transform functions because the columns have changed. The distinction here is that transformation_window
outputs its input data into a different metric space than it originally came from. Our input data came in indexed by t
and axis
, but came out indexed by axis
and window
.
[Aside: The function name transformation_window
(transformation_imf_window
behaves similarly) is a bit misleading, as a transformation in the strictest mathematical sense outputs its input into the original vector space. Within this package we use the “transformation” prefix for functions that take their inputs into a new vector space, behaving more as a mathematical function.]
The only other general rule to follow when writing transform functions is to keep your data grouped by its index. This way, when you apply an operation like detrend, you will apply the function to the relevant data. You wouldn’t want to detrend acceleration values measured along the x-axis by computing an average across all three axes, for example. Standard feature extraction functions will also extract features along these groups. So if you are performing a transform operation like windowing, you want to keep your data grouped by axis as well as window, otherwise features will be computed across windows.
What if something goes wrong in the feature extraction process? Maybe some input data is malformed or a transform function accidentally takes the logarithm of zero? Standard practice within mhealthtools is to do error-checking at the dataframe level – meaning if something goes wrong, it will be caught by a function that is meant to return a dataframe. Instead of returning a feature dataframe, or some transformation of its input, the function will return a dataframe with an “error” column that normally contains a string giving a description of the error. Consistency with this error handling behavior is important because these dataframe-level functions are often piped together, and if anything goes awry upstream, downstream functions know to act as identity functions until the result is returned to its original caller – where the result will be error-checked and returned in an appropriate fashion.
error |
---|
Detrend Error |
For example, if we are sequentially applying the functions in transform
to sensor_data
within sensor_features
and something goes wrong while detrending the data, detrend
will throw an exception, which gets passed to mutate_detrend
, our dataframe-level function. mutate_detrend
will catch the exception and return a dataframe with an error column containing the string “Detrend Error”. If the next function applied is transformation_window
, or mutate_bandpass
, or any other transform function, that function should first check the input for errors (using the function has_error
) and, upon finding one, immediately return its input. When the error dataframe is eventually returned to sensor_features
, it will be stored in a list with the same names you would expect from a non-errored process and returned all the way to the original caller – whether that be a sensor or activity module. This is especially important when batch processing large amounts of sensor data, where data processing errors are practically inevitable. Rather than crashing your workflow, mhealthtools will return something that can be handled upstream by the batch script.
We use the following feature extraction paradigm:
If you want to change the statistics computed by Extract, pass functions to the funs
parameter of a sensor or activity level module or the extract
parameter of sensor_features
. If you want to get rid of the default behavior of Extract entirely, pass one or more functions to the models
parameter of a sensor or activity level module.
If you want to change the behavior of Transform, either make use of the preexisting parameters and functions within the sensor and activity level modules, or pass a list of functions to the transform
parameter of sensor features
.
If you pass a list of functions to the transform
parameter of sensor_features
, each function should generally follow a few rules to make it as flexible and reusable as possible:
t
, axis
, and metric
. This will allow you to use your transform functions in any order you like, or omit them entirely.transformation_window
.