Classifying physical activity from smartphone data

Introduction

In this post we'll show how to use smartphone accelerometer and gyroscope data to predict the physical activity of the person carrying the phone. The data comes from the Smartphone-Based Recognition of Human Activities and Postural Transitions data set, distributed by the University of California, Irvine. Thirty individuals were tasked with performing various basic activities while wearing an attached smartphone that recorded movement using its accelerometer and gyroscope.

Before we begin, let's load the various libraries we'll use in the analysis:


library(keras)     # Neural Networks
library(tidyverse) # Data cleaning / Visualization
library(knitr)     # Table printing
library(rmarkdown) # Misc. output utilities 
library(ggridges)  # Visualization

Activities dataset

The data used in this post comes from the Smartphone-Based Recognition of Human Activities and Postural Transitions data set (Reyes-Ortiz et al. 2016), distributed by the University of California, Irvine.

When downloaded from the link above, the data contains two different 'parts': one that has been pre-processed using various feature extraction techniques such as the Fast Fourier Transform, and a RawData section that simply supplies the raw X, Y, Z readings of the accelerometer and gyroscope, with none of the standard noise filtering or feature extraction usually applied to accelerometer data. This is the data set we will use.

The motivation for working with the raw data in this post is to help the code and concepts transfer to less well-characterized time-series domains. While a more accurate model could likely be built on the filtered and cleaned data provided, that filtering and transformation can vary greatly from task to task, requiring lots of manual effort and domain knowledge. One of the beautiful things about deep learning is that feature extraction is learned from the data, not supplied from outside knowledge.

Activity labels

The data set contains integer encodings for the activities. These aren't strictly needed by the model, but are helpful for visualization, so let's load them first.


activityLabels <- read.table("data/activity_labels.txt", 
                             col.names = c("number", "label")) 

activityLabels %>% kable(align = c("c", "l"))
number  label
1       WALKING
2       WALKING_UPSTAIRS
3       WALKING_DOWNSTAIRS
4       SITTING
5       STANDING
6       LAYING
7       STAND_TO_SIT
8       SIT_TO_STAND
9       SIT_TO_LIE
10      LIE_TO_SIT
11      STAND_TO_LIE
12      LIE_TO_STAND

Next, we load in the key for the labels in RawData/labels.txt. This file is a list of all the observations, or individual activity recordings, contained in the data set. The key for its columns is taken from the data's README.txt:


Column 1: experiment number ID
Column 2: user number ID
Column 3: activity number ID
Column 4: label start point
Column 5: label end point

The start and end points are given in number of samples of the signal log (recorded at 50 Hz).

Let's take a look at the first 50 rows:


labels <- read.table(
  "data/RawData/labels.txt",
  col.names = c("experiment", "userId", "activity", "startPos", "endPos")
)

labels %>% 
  head(50) %>% 
  paged_table()
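
Since the start and end points are sample indices at 50 Hz, they translate directly into durations in seconds. A quick sketch, using the labels data frame we just loaded:


# convert sample positions into approximate durations in seconds (50 Hz sampling)
labels %>% 
  mutate(duration_s = (endPos - startPos) / 50) %>% 
  head()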

File Names

Next, let's look at the actual files of user data provided to us in RawData/:


dataFiles <- list.files("data/RawData")
dataFiles %>% head()

[1] "acc_exp01_user01.txt" "acc_exp02_user01.txt"
[3] "acc_exp03_user02.txt" "acc_exp04_user02.txt"
[5] "acc_exp05_user03.txt" "acc_exp06_user03.txt"

There is a three-part filename scheme. The first part is the type of data the file contains: either acc for accelerometer or gyro for gyroscope. Next is the experiment number, and last is the user ID for the recording. Let's load these into a data frame for ease of use later.


fileInfo <- data_frame(
  filePath = dataFiles
) %>%
  filter(filePath != "labels.txt") %>% 
  separate(filePath, sep = '_', 
           into = c("type", "experiment", "userId"), 
           remove = FALSE) %>% 
  mutate(
    experiment = str_remove(experiment, "exp"),
    userId = str_remove_all(userId, "user|\\.txt")
  ) %>% 
  spread(type, filePath)

fileInfo %>% head() %>% kable()
experiment  userId  acc                   gyro
01          01      acc_exp01_user01.txt  gyro_exp01_user01.txt
02          01      acc_exp02_user01.txt  gyro_exp02_user01.txt
03          02      acc_exp03_user02.txt  gyro_exp03_user02.txt
04          02      acc_exp04_user02.txt  gyro_exp04_user02.txt
05          03      acc_exp05_user03.txt  gyro_exp05_user03.txt
06          03      acc_exp06_user03.txt  gyro_exp06_user03.txt

Reading and collecting data

Before we can do anything with the supplied data we need to get it into a model-friendly format. This means we want a list of observations, their class (or activity label), and the data corresponding to the recording.

To get this, we will scan each of the recording files in dataFiles, look up which observations are contained in each recording, extract those recordings, and return everything in a single data frame that is easy to model with.


# Read contents of single file to a dataframe with accelerometer and gyro data.
readInData <- function(experiment, userId){
  genFilePath = function(type) {
    paste0("data/RawData/", type, "_exp",experiment, "_user", userId, ".txt")
  }  
  
  bind_cols(
    read.table(genFilePath("acc"), col.names = c("a_x", "a_y", "a_z")),
    read.table(genFilePath("gyro"), col.names = c("g_x", "g_y", "g_z"))
  )
}

# Function to read a given file and get the observations contained along
# with their classes.

loadFileData <- function(curExperiment, curUserId) {
  
  # load sensor data from file into dataframe
  allData <- readInData(curExperiment, curUserId)

  extractObservation <- function(startPos, endPos){
    allData[startPos:endPos,]
  }
  
  # get observation locations in this file from labels dataframe
  dataLabels <- labels %>% 
    filter(userId == as.integer(curUserId), 
           experiment == as.integer(curExperiment))
  

  # extract observations as dataframes and save as a column in dataframe.
  dataLabels %>% 
    mutate(
      data = map2(startPos, endPos, extractObservation)
    ) %>% 
    select(-startPos, -endPos)
}

# scan through all experiment and userId combos and gather data into a dataframe. 
allObservations <- map2_df(fileInfo$experiment, fileInfo$userId, loadFileData) %>% 
  right_join(activityLabels, by = c("activity" = "number")) %>% 
  rename(activityName = label)

# cache work. 
write_rds(allObservations, "allObservations.rds")
allObservations %>% dim()

Exploring the data

Now that we have all the data loaded, along with the experiment, userId, and activity labels, we can explore the data set.

Recording length

Let's first look at recording length by activity.


allObservations %>% 
  mutate(recording_length = map_int(data,nrow)) %>% 
  ggplot(aes(x = recording_length, y = activityName)) +
  geom_density_ridges(alpha = 0.8)

[Figure: ridgeline densities of recording length by activity]

Immediately we see that there is enough variation in recording length between the different activity types that we need to be a bit careful about how we proceed. If we trained the model on every class at once, we would have to pad every observation to the length of the longest, leaving the vast majority of observations with a large proportion of their data being nothing but zero-padding. Because of this, we will fit our model to only the largest 'group' of activities by observation duration: STAND_TO_SIT, STAND_TO_LIE, SIT_TO_STAND, SIT_TO_LIE, LIE_TO_STAND, and LIE_TO_SIT. The summary below makes the length differences concrete.
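
A quick sketch summarizing recording lengths (in samples) per activity, using the allObservations data frame built above:


# summarize recording lengths (in samples) for each activity
allObservations %>% 
  mutate(recording_length = map_int(data, nrow)) %>% 
  group_by(activityName) %>% 
  summarise(
    shortest = min(recording_length),
    median   = median(recording_length),
    longest  = max(recording_length)
  )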

An interesting future direction would be to try another architecture, such as an RNN, that can handle variable-length inputs and train it on all of the data. However, such a model would run the risk of learning simply that long observations belong to the longest classes, which would not generalize to a scenario where the model was run on a real-time stream of data.

Filtering activities

Based on our work from above, let's subset the data to only the activities of interest.


desiredActivities <- c(
  "STAND_TO_SIT", "SIT_TO_STAND", "SIT_TO_LIE", 
  "LIE_TO_SIT", "STAND_TO_LIE", "LIE_TO_STAND"  
)

filteredObservations <- allObservations %>% 
  filter(activityName %in% desiredActivities) %>% 
  mutate(observationId = 1:n())

filteredObservations %>% paged_table()

So after our aggressive data culling we will be left with a respectable amount of data for our model to learn on.
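
As a sanity check, we can count how many observations remain in each class (a quick sketch):


# count remaining observations per activity class
filteredObservations %>% 
  count(activityName)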

Training/testing distribution

Before we go any further into exploring the data for our model, in an effort to be as fair as possible with our performance measures, we need to split the data into a train and test set. Since each user performed all activities just once (with the exception of one who performed only 10 of the 12 activities), by splitting on userId we ensure that our model sees entirely new people when we test it.


# get all users
userIds <- allObservations$userId %>% unique()

# randomly choose 24 (80% of 30 individuals) for training
set.seed(42) # seed for reproducibility
trainIds <- sample(userIds, size = 24)

# set the rest of the users to the testing set
testIds <- setdiff(userIds,trainIds)

# filter data. 
trainData <- filteredObservations %>% 
  filter(userId %in% trainIds)

testData <- filteredObservations %>% 
  filter(userId %in% testIds)

Visualizing activities

Now that we've trimmed our data by removing activities and splitting off a test set, we can actually look at the data for each class to see if there's any immediately discernible shape that our model may be able to pick up on.

First, let's unpack our data from its one-row-per-observation format into a tidy version of all the observations.


unpackedObs <- 1:nrow(trainData) %>% 
  map_df(function(rowNum){
    dataRow <- trainData[rowNum, ]
    dataRow$data[[1]] %>% 
      mutate(
        activityName = dataRow$activityName, 
        observationId = dataRow$observationId,
        time = 1:n() )
  }) %>% 
  gather(reading, value, -time, -activityName, -observationId) %>% 
  separate(reading, into = c("type", "direction"), sep = "_") %>% 
  mutate(type = ifelse(type == "a", "acceleration", "gyro"))

Now that we have an unpacked collection of our observations, let's visualize them!


unpackedObs %>% 
  ggplot(aes(x = time, y = value, color = direction)) +
  geom_line(alpha = 0.2) +
  geom_smooth(se = FALSE, alpha = 0.7, size = 0.5) +
  facet_grid(type ~ activityName, scales = "free_y") +
  theme_minimal() +
  theme( axis.text.x = element_blank() )

[Figure: sensor readings over time, faceted by sensor type and activity]

So at least in the accelerometer data, patterns definitely emerge. One would suspect that the model may have trouble with the differences between LIE_TO_SIT and LIE_TO_STAND, as they have similar average profiles. The same goes for SIT_TO_STAND and STAND_TO_SIT.

Pre-processing

Before we can train the neural network, we need to take a few steps to pre-process the data.

Padding observations

First we'll decide what length to pad (and truncate) our sequences to by finding the 98th percentile of the observation lengths. By not using the very longest observation length, we avoid letting extra-long outlier recordings blow up the padding.


padSize <- trainData$data %>% 
  map_int(nrow) %>% 
  quantile(p = 0.98) %>% 
  ceiling()
padSize

98% 
334 

Now we simply need to convert our list of observations into matrices, then use the super-handy pad_sequences() function in Keras to pad all the observations and turn them into a 3D tensor for us.


convertToTensor <- . %>% 
  map(as.matrix) %>% 
  pad_sequences(maxlen = padSize)

trainObs <- trainData$data %>% convertToTensor()
testObs <- testData$data %>% convertToTensor()
  
dim(trainObs)

[1] 286 334   6

Wonderful! We now have our data in a nice neural-network-friendly format: a 3D tensor with dimensions (observations, time steps, channels).
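
As a quick check that each dimension is what we think it is, we can label the output of dim() (the names here are just for readability):


# label the tensor dimensions for readability
setNames(dim(trainObs), c("observations", "time_steps", "channels"))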

One-hot encoding

There's one last thing we need to do before we can train our model: convert the observation classes from integers into one-hot, or dummy-encoded, vectors. Luckily, once again Keras supplies us with a very helpful function to do just this.


oneHotClasses <- . %>% 
  {. - 7} %>%        # bring integers down to 0-5 from 7-12
  to_categorical()   # one-hot encode

trainY <- trainData$activity %>% oneHotClasses()
testY <- testData$activity %>% oneHotClasses()

Modeling

Architecture

Since we have temporally dense time-series data, we will make use of 1D convolutional layers. With temporally dense data, an RNN would have to learn very long dependencies in order to pick up on patterns, whereas a CNN can simply stack a few convolutional layers to represent patterns of sufficient length. Since we are also only looking for a single classification of activity for each observation, we can just use pooling to 'summarize' the CNN's view of the data into a dense layer.
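
As a rough sketch of why a shallow stack suffices: with stride-1 convolutions, the receptive field of n stacked layers is kernel_size + (n - 1) * (kernel_size - 1). With the kernel size of 8 used below, two layers already see 15 time steps, or 0.3 seconds of movement at 50 Hz:


# receptive field of n stacked stride-1 conv layers
kernel_size <- 8
n_layers <- 2
receptive_field <- kernel_size + (n_layers - 1) * (kernel_size - 1)
receptive_field      # 15 time steps
receptive_field / 50 # 0.3 seconds at 50 Hz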

In addition to the two stacked layer_conv_1d() layers, we will use batch normalization and dropout (the spatial variant (Tompson et al. 2014) on the convolutional layers and standard dropout on the dense layer) to regularize the network.


input_shape <- dim(trainObs)[-1]
num_classes <- dim(trainY)[2]

filters <- 24     # number of convolutional filters to learn
kernel_size <- 8  # how many time-steps each conv layer sees.
dense_size <- 48  # size of our penultimate dense layer. 

# Initialize model
model <- keras_model_sequential()
model %>% 
  layer_conv_1d(
    filters = filters,
    kernel_size = kernel_size, 
    input_shape = input_shape,
    padding = "valid", 
    activation = "relu"
  ) %>%
  layer_batch_normalization() %>%
  layer_spatial_dropout_1d(0.15) %>% 
  layer_conv_1d(
    filters = filters/2,
    kernel_size = kernel_size,
    activation = "relu",
  ) %>%
  # Apply average pooling:
  layer_global_average_pooling_1d() %>% 
  layer_batch_normalization() %>%
  layer_dropout(0.2) %>% 
  layer_dense(
    dense_size,
    activation = "relu"
  ) %>% 
  layer_batch_normalization() %>%
  layer_dropout(0.25) %>% 
  layer_dense(
    num_classes, 
    activation = "softmax",
    name = "dense_output"
  ) 

summary(model)

______________________________________________________________________
Layer (type)                   Output Shape                Param #    
======================================================================
conv1d_1 (Conv1D)              (None, 327, 24)             1176       
______________________________________________________________________
batch_normalization_1 (BatchNo (None, 327, 24)             96         
______________________________________________________________________
spatial_dropout1d_1 (SpatialDr (None, 327, 24)             0          
______________________________________________________________________
conv1d_2 (Conv1D)              (None, 320, 12)             2316       
______________________________________________________________________
global_average_pooling1d_1 (Gl (None, 12)                  0          
______________________________________________________________________
batch_normalization_2 (BatchNo (None, 12)                  48         
______________________________________________________________________
dropout_1 (Dropout)            (None, 12)                  0          
______________________________________________________________________
dense_1 (Dense)                (None, 48)                  624        
______________________________________________________________________
batch_normalization_3 (BatchNo (None, 48)                  192        
______________________________________________________________________
dropout_2 (Dropout)            (None, 48)                  0          
______________________________________________________________________
dense_output (Dense)           (None, 6)                   294        
======================================================================
Total params: 4,746
Trainable params: 4,578
Non-trainable params: 168
______________________________________________________________________

Training

Now we can train the model using our training and testing data. Note that we use callback_model_checkpoint() to ensure that we save only the best variation of the model (desirable since at some point in training the model may begin to overfit or otherwise stop improving).


# Compile model
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop",
  metrics = "accuracy"
)

trainHistory <- model %>%
  fit(
    x = trainObs, y = trainY,
    epochs = 350,
    validation_data = list(testObs, testY),
    callbacks = list(
      callback_model_checkpoint("best_model.h5", 
                                save_best_only = TRUE)
    )
  )

[Figure: training and validation loss/accuracy over the training epochs]

The model is learning something! We get a respectable 94.4% accuracy on the validation data; not bad with six possible classes to choose from. Let's look into the validation performance a little deeper to see where the model is messing up.
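
That figure can be pulled from the training history. A small sketch; note that the metric name may be val_acc or val_accuracy depending on your Keras version:


# best validation accuracy observed across all epochs
# (metric may be named "val_acc" or "val_accuracy" depending on Keras version)
max(trainHistory$metrics$val_acc)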

Evaluation

Now that we have a trained model, let's investigate the errors it made on our testing data. We can load the best model from training based on validation accuracy, then look at each observation, what the model predicted, how high a probability it assigned, and the true activity label.


# dataframe to get labels onto one-hot encoded prediction columns
oneHotToLabel <- activityLabels %>% 
  mutate(number = number - 7) %>% 
  filter(number >= 0) %>% 
  mutate(class = paste0("V",number + 1)) %>% 
  select(-number)

# Load our best model checkpoint
bestModel <- load_model_hdf5("best_model.h5")

tidyPredictionProbs <- bestModel %>% 
  predict(testObs) %>% 
  as_data_frame() %>% 
  mutate(obs = 1:n()) %>% 
  gather(class, prob, -obs) %>% 
  right_join(oneHotToLabel, by = "class")

predictionPerformance <- tidyPredictionProbs %>% 
  group_by(obs) %>% 
  summarise(
    highestProb = max(prob),
    predicted = label[prob == highestProb]
  ) %>% 
  mutate(
    truth = testData$activityName,
    correct = truth == predicted
  ) 

predictionPerformance %>% paged_table()

First, let's look at how 'confident' the model was depending on whether the prediction was correct or not.


predictionPerformance %>% 
  mutate(result = ifelse(correct, 'Correct', 'Incorrect')) %>% 
  ggplot(aes(highestProb)) +
  geom_histogram(binwidth = 0.01) +
  geom_rug(alpha = 0.5) +
  facet_grid(result~.) +
  ggtitle("Probabilities associated with prediction by correctness")

[Figure: histogram of highest predicted probability, faceted by prediction correctness]

Reassuringly, it seems the model was, on average, less confident about its classifications for the incorrect results than the correct ones. (Although the sample size is too small to say anything definitive.)
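
We can put a rough number on this with a quick summary of the average assigned probability, split by correctness (a sketch using the predictionPerformance data frame from above):


# average highest assigned probability, split by prediction correctness
predictionPerformance %>% 
  group_by(correct) %>% 
  summarise(
    mean_prob = mean(highestProb),
    n = n()
  )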

Let's see which activities the model had the hardest time with, using a confusion matrix.


predictionPerformance %>% 
  group_by(truth, predicted) %>% 
  summarise(count = n()) %>% 
  mutate(good = truth == predicted) %>% 
  ggplot(aes(x = truth,  y = predicted)) +
  geom_point(aes(size = count, color = good)) +
  geom_text(aes(label = count), 
            hjust = 0, vjust = 0, 
            nudge_x = 0.1, nudge_y = 0.1) + 
  guides(color = FALSE, size = FALSE) +
  theme_minimal()

[Figure: confusion matrix of true vs. predicted activities]

We see that, as the preliminary visualization suggested, the model had a bit of trouble distinguishing between the LIE_TO_SIT and LIE_TO_STAND classes, along with SIT_TO_LIE and STAND_TO_LIE, which also have similar visual profiles.

Future directions

The most obvious future direction for this analysis would be to attempt to make the model more general by working with more of the supplied activity types. Another interesting direction would be to not separate the recordings into distinct 'observations', but instead to keep them as one streaming set of data, much as a real-world deployment of the model would work, and see how well the model could classify streaming data and detect changes in activity.

Gal, Yarin, and Zoubin Ghahramani. 2016. “Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.” In International Conference on Machine Learning, 1050–59.

Graves, Alex. 2012. “Supervised Sequence Labelling.” In Supervised Sequence Labelling with Recurrent Neural Networks, 5–13. Springer.

Kononenko, Igor. 1989. “Bayesian Neural Networks.” Biological Cybernetics 61 (5). Springer: 361–70.

LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553). Nature Publishing Group: 436.

Reyes-Ortiz, Jorge-L, Luca Oneto, Albert Samà, Xavier Parra, and Davide Anguita. 2016. “Transition-Aware Human Activity Recognition Using Smartphones.” Neurocomputing 171. Elsevier: 754–67.

Tompson, Jonathan, Ross Goroshin, Arjun Jain, Yann LeCun, and Christoph Bregler. 2014. “Efficient Object Localization Using Convolutional Networks.” CoRR abs/1411.4280. http://arxiv.org/abs/1411.4280.
