This notebook illustrates frequency-domain features in the context of activity recognition from accelerometer data. We'll focus on two key features: dominant frequency and signal energy.
We'll generate synthetic data for different activities (walking, running, cycling) and analyze their frequency domain characteristics. This approach can help in distinguishing between different types of periodic activities.
Let's start by generating some synthetic walking data and visualizing it in both time and frequency domains.
# Frequency Domain Features Illustration
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.fft import fft, fftfreq
from scipy.signal import find_peaks
# Set plot styles
plt.style.use('default') # Changed from 'seaborn' to 'default'
plt.rcParams['figure.figsize'] = [12, 8]
plt.rcParams.update({'font.size': 12})
def generate_walking_data(duration=10, sampling_rate=100):
    """
    Generates synthetic accelerometer data for walking.
    This function simulates walking data by combining sine waves at the fundamental walking frequency
    and its harmonics, then adding some random noise to make it more realistic.
    Parameters:
    - duration: Length of the data in seconds
    - sampling_rate: Number of samples per second
    Returns:
    - t: Time array
    - walking_data: Synthetic accelerometer data
    """
    t = np.linspace(0, duration, int(duration * sampling_rate), endpoint=False)
    # Simulate walking frequency (approx. 2 Hz) and its harmonics
    walking_freq = 2
    signal = (np.sin(2 * np.pi * walking_freq * t) +
              0.5 * np.sin(2 * np.pi * 2 * walking_freq * t) +
              0.3 * np.sin(2 * np.pi * 3 * walking_freq * t))
    # Add some noise
    noise = np.random.normal(0, 0.2, len(t))
    return t, signal + noise
t, walking_data = generate_walking_data()
# Plot the synthetic walking data
plt.figure(figsize=(12, 4))
plt.plot(t, walking_data)
plt.title('Synthetic Accelerometer Data for Walking')
plt.xlabel('Time (s)')
plt.ylabel('Acceleration')
plt.show()
The Fast Fourier Transform is a crucial algorithm that allows us to convert our time-domain signal into its frequency-domain representation. This transformation reveals the periodic components within our data, which are often not easily visible in the time domain.
This section demonstrates the symmetry of the Fast Fourier Transform (FFT) output and explains why we typically use only the positive-frequency components. For real-valued signals (like our walking data), the FFT magnitude is symmetric about zero frequency: the negative frequencies mirror the positive ones. We typically keep only the positive half because the negative half carries no additional information for a real signal, and a one-sided spectrum is easier to read.
By focusing on the positive frequencies, we can more easily identify the dominant rhythmic components of our walking signal.
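To make the mirror symmetry concrete, here is a minimal sketch that checks it numerically. It uses `numpy.fft`, which exposes the same `fft` and `rfft` names as the `scipy.fft` module used in this notebook; the 2 Hz test signal and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical test signal: a 2 Hz sine sampled at 100 Hz for 1 second
fs = 100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 2 * t)

X = np.fft.fft(x)
N = len(x)

# For real input, bin N - k holds the complex conjugate of bin k,
# so both bins have the same magnitude
k = np.arange(1, N // 2)
is_conjugate_symmetric = np.allclose(X[N - k], np.conj(X[k]))

# rfft computes only the non-negative frequencies (N // 2 + 1 bins for even N)
X_half = np.fft.rfft(x)
matches_full_fft = np.allclose(X_half, X[:N // 2 + 1])

print(is_conjugate_symmetric, matches_full_fft)
```

If only the one-sided spectrum is needed, `rfft` avoids computing the redundant half in the first place.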
# 2.1 Perform FFT on the walking data (show mirror symmetry and why we knock off the negative)
N = len(t)
T = t[1] - t[0] # sampling interval
yf = fft(walking_data)
# Generate frequency array
xf = fftfreq(N, T)
# Plot
plt.figure(figsize=(12, 6))
plt.plot(xf, np.abs(yf)/N)
plt.title('FFT of Signal')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Magnitude')
# Highlight symmetry
plt.axvline(x=0, color='r', linestyle='--', label='Symmetry Axis')
plt.annotate('Positive Frequencies\n(Used for Analysis)', xy=(25, 0.3), xytext=(25, 0.4),
             arrowprops=dict(facecolor='green', shrink=0.05))
plt.annotate('Negative Frequencies\n(Mirror of Positive)', xy=(-25, 0.3), xytext=(-25, 0.4),
             arrowprops=dict(facecolor='blue', shrink=0.05))
# Add arrow to show mirroring
plt.annotate('', xy=(20, 0.3), xytext=(-20, 0.3),
             arrowprops=dict(arrowstyle='<->', color='purple'))
plt.text(0, 0.32, 'Mirror', color='purple', ha='center')
plt.legend()
plt.grid(True)
plt.show()
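This section illustrates why we put actual frequency values on the x-axis instead of FFT bin numbers. A bin number has no physical meaning on its own: the frequency it represents depends on both the sampling rate and the number of samples, so the same bin index can correspond to different frequencies in recordings of different lengths. Plotting in hertz makes the spectrum directly interpretable (a peak at 2 Hz means two cycles per second) and comparable across recordings.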
The actual frequency values are calculated using the fftfreq function, which takes into account the sampling rate of our original signal.
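As a small sketch of that mapping: for the positive half of the spectrum, bin `k` corresponds to `k * fs / N` Hz, where `fs` is the sampling rate and `N` the number of samples. The helper `bin_to_hz` below is a hypothetical name introduced for illustration, and `numpy.fft.fftfreq` is used in place of the `scipy.fft` version (same arguments).

```python
import numpy as np

fs = 100.0  # sampling rate in Hz, as in this notebook


def bin_to_hz(k, n_samples, fs):
    """Frequency in Hz of positive-half FFT bin k (hypothetical helper)."""
    return k * fs / n_samples


# The same bin index maps to different frequencies for different signal lengths
freqs_long = np.fft.fftfreq(1000, d=1.0 / fs)   # 10 s of data
freqs_short = np.fft.fftfreq(500, d=1.0 / fs)   # 5 s of data

freq_bin_20_long = freqs_long[20]    # about 2 Hz
freq_bin_20_short = freqs_short[20]  # about 4 Hz

# Spacing between adjacent bins (frequency resolution) for the 10 s signal
resolution = fs / 1000

print(freq_bin_20_long, freq_bin_20_short, resolution)
```

This is why a plot against bin numbers cannot be read without also knowing `fs` and `N`.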
# 2.2 Perform FFT on the walking data (show why we plot actual frequency rather than bin value)
N = len(t)
T = t[1] - t[0] # sampling interval
yf = fft(walking_data)
xf = fftfreq(N, T)[:N//2]
# Plot using bin numbers
plt.figure(figsize=(12, 4))
plt.subplot(121)
plt.plot(np.abs(yf[:N//2]))
plt.title('FFT with Bin Numbers')
plt.xlabel('Bin Number')
plt.ylabel('Magnitude')
# Plot using actual frequencies
plt.subplot(122)
plt.plot(xf, np.abs(yf[:N//2]))
plt.title('FFT with Actual Frequencies')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Magnitude')
plt.tight_layout()
plt.show()
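This section shows how we normalize the FFT output. Normalization is needed because raw FFT magnitudes grow with the number of samples, N, so spectra from recordings of different lengths cannot be compared directly. Dividing by N removes that dependence, and multiplying by 2 compensates for discarding the negative-frequency half, so that the height of a peak approximates the amplitude of the corresponding sinusoid.
The choice of normalization method depends on the specific analysis goals. Here we scale by 2/N; normalizing to the maximum value is also sometimes useful for comparing the relative strengths of frequency components.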
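The effect of the 2/N scaling can be checked with a short sketch. The amplitude of 0.7 and the helper name `peak_magnitudes` are assumptions made for illustration, and the result holds cleanly only when the sine frequency falls exactly on an FFT bin (an integer number of cycles in the window), as it does here.

```python
import numpy as np

fs = 100          # sampling rate in Hz
amplitude = 0.7   # hypothetical sine amplitude
f0 = 2.0          # sine frequency in Hz


def peak_magnitudes(duration):
    """Return (raw, 2/N-scaled) peak magnitude for a sine of the given duration."""
    t = np.arange(int(duration * fs)) / fs
    x = amplitude * np.sin(2 * np.pi * f0 * t)
    n = len(x)
    mags = np.abs(np.fft.rfft(x))
    return mags.max(), 2.0 / n * mags.max()


raw_5s, scaled_5s = peak_magnitudes(5)
raw_10s, scaled_10s = peak_magnitudes(10)

# The raw peak doubles when the recording doubles in length,
# while the scaled peak stays at the sine amplitude
print(raw_5s, raw_10s)
print(scaled_5s, scaled_10s)
```

With real data the peak frequency rarely sits exactly on a bin, so the scaled height should be read as an approximation of the amplitude.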
# 2.3 Perform FFT on the walking data (show why we normalize)
N = len(t)
T = t[1] - t[0] # sampling interval
yf = fft(walking_data)
xf = fftfreq(N, T)[:N//2]
# Plot the frequency spectrum
plt.figure(figsize=(12, 4))
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.title('Frequency Spectrum of Walking Data')
plt.xlabel('Frequency (Hz)')
plt.ylabel('Normalized Magnitude')
plt.xlim(0, 10) # Limit x-axis to 0-10 Hz for better visibility
plt.show()
Now that we have our data in the frequency domain, we can extract useful features. We'll focus on two key features: Dominant Frequency and Signal Energy.
# 3. Extract Dominant Frequency
def get_dominant_frequency(xf, yf):
    """
    Extracts the dominant frequency from the FFT results.
    This function finds the frequency with the highest magnitude in the frequency spectrum,
    which represents the most significant rhythmic component of the activity.
    Parameters:
    - xf: Positive-frequency array (first half of the fftfreq output)
    - yf: Full FFT result array
    Returns:
    - dominant_freq: The frequency with the highest magnitude
    """
    # Keep only the positive-frequency half; use len(xf) rather than a global N
    magnitudes = np.abs(yf[:len(xf)])
    peaks, _ = find_peaks(magnitudes, height=0)
    dominant_freq = xf[peaks[np.argmax(magnitudes[peaks])]]
    return dominant_freq
dominant_freq = get_dominant_frequency(xf, yf)
print(f"Dominant Frequency: {dominant_freq:.2f} Hz")
Dominant Frequency: 2.00 Hz
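Real accelerometer recordings usually include a constant offset (for example from gravity), which shows up as a large value in the zero-frequency bin. The following is a minimal alternative sketch, not the method used above: it subtracts the mean and then takes the largest bin of the one-sided spectrum. The function name `dominant_frequency`, the 9.8 offset, and the 3 Hz test signal are assumptions chosen for illustration.

```python
import numpy as np


def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest spectral bin after removing the mean."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                             # remove the constant (DC) offset
    mags = np.abs(np.fft.rfft(x))                # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # matching frequencies in Hz
    return freqs[np.argmax(mags)]


fs = 100
t = np.arange(10 * fs) / fs
# Hypothetical 3 Hz motion riding on a constant offset
x = 9.8 + np.sin(2 * np.pi * 3 * t)

f_dom = dominant_frequency(x, fs)
print(f"{f_dom:.2f} Hz")
```

A plain `argmax` is simpler than peak finding but relies on the offset being removed first.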
# 4. Calculate Signal Energy
def calculate_signal_energy(yf):
    """
    Signal energy gives an overall measure of the intensity of variations in the accelerometer data.
    It's calculated as the sum of the squared magnitudes of the FFT results divided by the
    number of samples, which by Parseval's theorem equals the sum of the squared time-domain samples.
    Parameters:
    - yf: FFT result array
    Returns:
    - signal_energy: The calculated signal energy
    """
    return np.sum(np.abs(yf)**2) / len(yf)
signal_energy = calculate_signal_energy(yf)
print(f"Signal Energy: {signal_energy:.2f}")
Signal Energy: 713.15
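As a sanity check on this definition, Parseval's theorem says the frequency-domain sum divided by N should match the sum of squared samples in the time domain. Here is a minimal sketch under assumed inputs: a 2 Hz sine with noise, and a fixed random seed so the run is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption for reproducibility
fs = 100
t = np.arange(10 * fs) / fs
x = np.sin(2 * np.pi * 2 * t) + rng.normal(0, 0.2, len(t))

X = np.fft.fft(x)

# Energy computed from the spectrum, as in calculate_signal_energy
energy_freq = np.sum(np.abs(X) ** 2) / len(X)

# Energy computed directly from the samples
energy_time = np.sum(x ** 2)

# Mean power removes the dependence on recording length
mean_power = energy_time / len(x)

print(np.isclose(energy_freq, energy_time), mean_power)
```

Because this energy grows with the number of samples, comparing recordings of different lengths is easier with the mean power.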
Now that we have methods to generate data and extract frequency domain features, let's apply them to different activities and compare the results.
# 5. Generate and analyze data for different activities
def generate_activity_data(duration=10, sampling_rate=100, freq=2, noise_level=0.2):
    """
    Generates synthetic data for different activities.
    This function creates simple sinusoidal data with added noise to simulate different activities.
    The frequency and noise level are adjusted to represent different types of activities.
    Parameters:
    - duration: Length of the data in seconds
    - sampling_rate: Number of samples per second
    - freq: The main frequency of the activity
    - noise_level: The amount of random noise to add
    Returns:
    - t: Time array
    - data: Synthetic activity data
    """
    t = np.linspace(0, duration, int(duration * sampling_rate), endpoint=False)
    signal = np.sin(2 * np.pi * freq * t)
    noise = np.random.normal(0, noise_level, len(t))
    return t, signal + noise
activities = {
    'Walking': {'freq': 2, 'noise': 0.2},
    'Running': {'freq': 3, 'noise': 0.3},
    'Cycling': {'freq': 1.5, 'noise': 0.1}
}
results = {}
for activity, params in activities.items():
    t, data = generate_activity_data(freq=params['freq'], noise_level=params['noise'])
    yf = fft(data)
    xf = fftfreq(len(t), t[1] - t[0])[:len(t)//2]
    dominant_freq = get_dominant_frequency(xf, yf)
    signal_energy = calculate_signal_energy(yf)
    results[activity] = {
        'Dominant Frequency': dominant_freq,
        'Signal Energy': signal_energy
    }
# Display results
results_df = pd.DataFrame(results).T
print(results_df)
# Plot frequency spectra for different activities
plt.figure(figsize=(12, 8))
for i, (activity, params) in enumerate(activities.items(), 1):
    t, data = generate_activity_data(freq=params['freq'], noise_level=params['noise'])
    yf = fft(data)
    xf = fftfreq(len(t), t[1] - t[0])[:len(t)//2]
    plt.subplot(3, 1, i)
    plt.plot(xf, 2.0/len(t) * np.abs(yf[0:len(t)//2]))
    plt.title(f'Frequency Spectrum of {activity}')
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Magnitude')
    plt.xlim(0, 10)
plt.tight_layout()
plt.show()
         Dominant Frequency  Signal Energy
Walking                 2.0     535.403622
Running                 3.0     620.173365
Cycling                 1.5     510.165708