Sunday, December 03, 2023

SSA attempt: Duplicating a MATLAB example with Python

 


 The following codes refer to the MATLAB codes displayed in https://uk.mathworks.com/matlabcentral/fileexchange/58967-singular-spectrum-analysis-beginners-guide

 There are several algorithm mismatches between Python functions and MATLAB functions which cause the malfunctioning of generating the components as the same as what have been generated in MATLAB.

It has to admit that this is incomplete. However, this attempt provides the pathway to experiment of implementing Singular Spectrum Analysis (SSA) which is a relatively newly introduced time series analysis with Python.

SSA itself has so many variations that there does not seem to be any one solid method. There are several varying methods of SSA for each different programming language and each platform.

The following simulation is based on my synthetic data imitating the business cycle: 

# importing necessary tools
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import networkx as nx
import numpy as np
import pandas as pd
import random
import statistics
import scipy.linalg
from numpy import linalg
from scipy import stats
import statsmodels.formula.api as sm
import math
import cmath
from scipy.linalg import hankel

# Showing the plots bigger
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (20,10)

# Defining the function for the synthetic complex exponential  
def syn_exp(amp,freq,x):
    t=len(amp)
    y=[]
    for omg in range(len(x)):
        yi=0
        for ap in range(len(amp)):
            yi+=amp[ap]*cmath.exp(freq[ap]*x[omg])
        y.append(yi)
    return y

# Defining complex number
i=complex(0,1)
i

# Creating the synthetic data imitating the business cycle
phi = [0.03, 3*i, 10*i, -1+30*i, -1-20*i, 0.01+5*i];
alpha = [4, 3, 2, 2, 2, 1];

# sampling rate
S = 100

# number of exponentials in the signal
n = 600

# f_syn: samples used by Prony's methhod
_2n_1=(2*n-1)+1
omega = [2*math.pi/S*(_0__2n_1) for _0__2n_1 in range(_2n_1)]

# Defininig the synthetic time series
f_syn = syn_exp(alpha, phi, omega)

# Defining the noise
noise=[np.random.normal(0) for k in range(len(f_syn))]
noise=noise/linalg.norm(noise)

# eps: noise level, also try 10^(-0)
eps =  10^(-1);
#eps = 0

# f = f_syn added with random noise
f = f_syn + eps*noise;
f

# Extract only the real values of f
X=f.real
# Plotting x (real values of the synthetic time series)
plt.plot(X)
plt.title('Read value of the synthetic data')
plt.xlabel('Time')
plt.ylabel('Magnitude')
plt.show
# Redefining the time series length
N = n*2;

# The following codes refer to the MATLAB codes displayed in https://uk.mathworks.com/matlabcentral/fileexchange/58967-singular-spectrum-analysis-beginners-guide
print('The following codes refer to the MATLAB codes displayed in https://uk.mathworks.com/matlabcentral/fileexchange/58967-singular-spectrum-analysis-beginners-guide')

## Set general Parameters
M = 30;    # window length = embedding dimension

## Calculate covariance matrix (trajectory approach)
# An alternative approach is to determine C directly from the scalar product of Y,
# the time-delayed embedding of X. Although this estimation of C does not give a Toeplitz structure,
# with the eigenvectors not being symmetric or antisymmetric, it ensures a positive semi-definite
# covariance matrix.

Y=[]
for k in range(M):
    Y.append(X[k:N-M+k].tolist())
Y=np.array(Y)

# Creating the embedded covariance matrix
Cemb = np.dot(Y,(Y.transpose()))/(N-M+1)
img = plt.imshow(Cemb, origin='lower', cmap='jet', interpolation='nearest', aspect='auto')
plt.show()

## Choose covariance estimation
# Choose between Toeplitz approach (cf. Vautard & Ghil) and trajectory approach (cf. Broomhead & King).
C=Cemb

# Finding the eigenvalues (Lambda) and the eigenvectors (RHO)
[LAMBDA,RHO]=scipy.linalg.eig(C)
# Extracting the real parts of them
LAMBDA=LAMBDA.real
RHO=RHO.real

## Calculate principal components PC
# The principal components are given as the scalar product between Y,
# the time-delayed embedding of X, and the eigenvectors RHO

PC=[]
for j in range(M):
    pc=[]
    for k in range(len(Y[0])):
        pc.append(np.dot(Y[:,k],RHO[:,j]))
    PC.append(pc)
PC=np.array(PC)

# Plotting a graph comparison
RawSSA=PC[0]
for k in range(M-1):
    RawSSA=np.add(RawSSA,PC[k+1])
SmoothedSSA=PC[0]
for k in range(3-1):
    SmoothedSSA=np.add(SmoothedSSA,PC[k+1])
plt.subplot(2,1,1)
plt.title("Original vs. Reconstructed time series")
plt.plot(X[20:])
plt.plot(RawSSA/5)
plt.subplot(2,1,2)
plt.title("Reconstructed raw vs. smoothed time series")
plt.plot(RawSSA/5)
plt.plot(SmoothedSSA/5)