Wednesday, August 14, 2024

Matrix Pencil (MP) method for financial time series estimate (Prony method for finance): EURO-DOLLAR from ECB

Update record

1. Originally posted on 23rd July 2024
2. Title change from "Estimates based on the Matrix Pencil (MP) method for financial dataset (Prony method for finance)" to "Matrix Pencil (MP) method for financial time series estimate (Prony method for finance)" on 14th August 2024
3. Amended "N=len(f)" to "N=len(samples)" and "y = f" to "y = samples" in the function "MP(samples,nterms)" on 14th August 2024
4. Added the mathematical statement sheet on 14th August 2024 
5. The dimension setting of Hankel matrix in the function plotting singular values (SVs) modified on 21 August 2024


This analysis is continued from Matrix Pencil (MP) methods for financial dataset (Prony method for finance). Last time, it demonstrated the full reconstruction. However, I have realised that it is not supposed to include the terms for the noisy terms (non-structural components). Matrix Pencil (MP) is essentially designed to only model the physical/structured components by filtering the rest noisy non-structured components.

The mathematical statement is as follows:



In order to detect the number of the terms representing the physical terms, Singular Value Decomposition (SVD) of the rectangular Hankel matrix is used. The singular values of the structural components are significantly higher than the non-structural counterparts. Furthermore, one set of the components is represented by the relatively identical singular values. Therefore, when the group changes from one to the other, there is a significant change from the previous values. 

The number of the terms (NTERMS) is determined with reference to this SVD process. These parameters retrieved by MP successfully reconstructs a smoothed time series of Euro-Dollar FX rate as shown in the line plot. In addition, limiting the NTERMS to only the physical terms enables to stably estimate future values. This example exhibits 156 data points of the future.

The following is my code implemented:

## Importing libraries and setting up graphs and complex number
# Execute this cell to have access to the API of Frankfurter App
import requests
import json
import ast

# importing necessary tools
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
import random
import statistics
import scipy.linalg
from numpy import linalg as LA
from scipy.stats import qmc
from scipy.optimize import newton
from scipy import stats
import statsmodels.formula.api as sm

import math
import cmath

from scipy.linalg import hankel

# Showing the plots bigger
plt.rcParams["figure.figsize"] = (20,10)

# defining imaginary number
i=complex(0,1)



## Downloading the dataset: Euro-Dollar weekly data from European Central Bank (ECB)

# Connecting to the API https://www.frankfurter.app/docs/ to get the EUR/USD rates
url = "http://api.frankfurter.app/1999-01-01.."
resp = requests.get(url,{'to':'USD'})
#print(resp.url)
#print(resp.status_code)
#print("Type:", type(resp.content))
#print(resp.content)

print("Type:", type(resp))

# Preparing for the right encoding
import ast
byte_str = resp.content
dict_str = byte_str.decode("UTF-8")
mydata = ast.literal_eval(dict_str)
print("Type:", type(mydata))

#mydata #show the accessed data
EURUSD=(mydata["rates"])
#EURUSD

# Sorting the data into the Python dictionary form
Dates_EURUSD=[]
Rates_EURUSD=[]

for key, value in EURUSD.items():
    Dates_EURUSD.append(key)
    for k,v in value.items():
        Rates_EURUSD.append(v)

EURUSD=dict(zip(Dates_EURUSD,Rates_EURUSD))
#EURUSD

# # Show the dataset plot
# Xticks=list(EURUSD.keys())
# f=list(EURUSD.values()) # Define the time series f
# xi = list(range(len(Xticks)))
# # plot the index for the x-values
# plt.plot(xi, f)
# plt.xlabel('Weeks')
# plt.ylabel('EUR/USD')
# plt.xticks(xi, Xticks, rotation=90)
# plt.title('Weekly EUR/USD FX rate')
# plt.legend()
# plt.show()



# Define basic parameters
f=list(EURUSD.values()) # Define the time series f
f_org=f.copy() # Saving the original data
n=len(f)



# Prony method
## Defininig the functions for the exponential analysis and reconstruction

# Definining the function of Matrix Pencil (MP) method,
# one of Prony-like methods.
def MP(samples,nterms):
    # Firstly creating a rectangular hankel matrix of the direct imput of time series
    n_samples=len(samples)
    ncols_value=1/3
    ncols=int(n_samples*ncols_value)
    nrows=int(n_samples+1-ncols)
    if nterms>math.floor(n_samples*ncols_value):
        if min(ncols,nrows)==ncols: #Increase the number of columns
            d=nterms-ncols
            ncols=ncols+d
            nrows=nrows-d
        elif min(ncols,nrows)==nrows: #Increase the number of rows
            d=nterms-nrows
            ncols=ncols-d
            nrows=nrows+d
    H=hankel(samples)[0:nrows,0:ncols]

    # Secondly, extracting the unitary matrix (left eigenvectors) of the hankel matrix
    SVD_H=LA.svd(H, full_matrices=True, compute_uv=True, hermitian=False)
    U=SVD_H[0]
    U0=U[0:len(U)-1,0:nterms]
    U1=U[1:len(U),0:nterms]

    # Generating (snall) omega, the periodic time-steps.
    n=math.floor(len(samples)/2)
    _2n_1=(2*n-1)+1
    M=1 # Sampling rate:
    #According to Shannon-Nyquist sampling theorem, the sampling rate must be twice of the maximum signal length.
    #However, M=1 the financial data not based on a particular signalling cycle model.
    #omega = [2*math.pi/M*(_0__2n_1) for _0__2n_1 in range(_2n_1)] # Signal processing time steps
    omega = np.arange(_2n_1) # using a simple discrete time steps   Updated 17 07 2024

    # Obtaining (big) Omega containing the frequencies and the time step
    # Generalised engenvalues are decomposed by eig(U0\U1)
    # https://fr.mathworks.com/help/matlab/ref/eig.html
    OMEGA= scipy.linalg.eig(np.dot(np.linalg.pinv(U0),(U1)))[0]
    # frequencies and damping factors

    # phi_rec = [cmath.log(OMEGA[k])*M/(2*math.pi) for k in range(len(OMEGA))] # if using the signal process time step
    phi_rec = [cmath.log(OMEGA[k]) for k in range(len(OMEGA))]   # if using a simple discrete time steps   Updated 17 07 2024

    # Retrieving alpha
    V=np.transpose(np.vander(OMEGA, N=len(samples), increasing=True))
    # https://stackoverflow.com/questions/11479064/multiple-linear-regression-in-python
    y = samples
    X = V
    coeffs = np.linalg.lstsq(X, y, rcond=None)[0]
    alpha_rec=coeffs
    alpha_rec=alpha_rec.tolist()

    #shorting alphas and phs according to absolute values of alphas
    abs_alpha=[abs(alpha_rec[k]) for k in range(len(alpha_rec))]
    alpha_rec = [abs_alpha for _,abs_alpha in sorted(zip(abs_alpha,alpha_rec), reverse=True)]
    phi_rec = [abs_alpha for _,abs_alpha in sorted(zip(abs_alpha,phi_rec), reverse=True)]

    return alpha_rec,phi_rec

# Function for generating time series based on alphas and phis
def Exp(amp,freq,samples):
    n=math.floor(len(samples)/2)
    y=[]
    # Generating x, the periodic time-steps.
    _2n_1=(2*n-1)+1
    omega = np.arange(_2n_1) # using a simple discrete time steps
    for omg in range(len(omega)):
        yi=0
        for ap in range(len(amp)):
            yi+=amp[ap]*cmath.exp(freq[ap]*omega[omg])
        y.append(yi)
    return y

# Displaying alphas and phis in a table
def PametersDisplay(AlphasPhis,nterms):
    Header=['alphas','phis']
    List=AlphasPhis
    Table=dict(zip(Header,List))
    panda_df = pd.DataFrame(data = Table)
    pd.set_option('display.max_rows',nterms)
    panda_df.head(nterms)
    display(panda_df)

# Plotting two lines
def plot2(f,f_hat,Title,Xticks):
    # generating the reconstructed data given alphas and phis
    #print(Title)
    xi = list(range(len(Xticks)))
    plt.plot(f, label="original")
    plt.plot(f_hat, label="reconstructed")
    plt.legend(loc="upper right")
    plt.title(Title)
    plt.xlabel('Weeks')
    plt.ylabel('EUR/USD')
    plt.xticks(xi, Xticks, rotation=90)
    plt.show

# Plotting two lines
def plot2_future(amp,freq,samples,NoWeeks,Title):
    # Function for generating time series based on alphas and phis
    N=len(samples)
    N_new=N+NoWeeks
    n=math.floor(N_new/2)
    y=[]
    # Generating x, the periodic time-steps.
    _2n_1=(2*n-1)+1
    omega = np.arange(_2n_1) # using a simple discrete time steps
    for omg in range(len(omega)):
        yi=0
        for ap in range(len(amp)):
            yi+=amp[ap]*cmath.exp(freq[ap]*omega[omg])
        y.append(yi)
    # generating the reconstructed data given alphas and phis
    #print(Title)
    plt.plot(samples, label="Original")
    plt.plot(y, label="Reconstructed")
    plt.legend(loc="upper right")
    plt.title(Title)
    plt.xlabel('Weeks')
    plt.ylabel('EUR/USD')
    plt.show
    return y



# plot logged Singular Value Decomposition (SVD) and the log difference
# and returns the odd number indices of the terms representing the physical components
# for a reference to check how many exponential components can be grouped together.
# ref. https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.hankel.html
def plot_lnSVD_logdiff(samples):
    # calculating logged SVs and their  log difference
N=math.floor(len(samples))
H=hankel(samples)[0:math.floor(N*2/3),0:math.floor(N*1/3)]
    SVD=LA.svd(H, full_matrices=True, compute_uv=True, hermitian=False)[1]
    ln_SVD=[math.log(SVD[k]) for k in range(len(SVD)) ]
    d_ln_SVD=[abs(ln_SVD[k+1]-ln_SVD[k]) for k in range(len(SVD)-1) ]
   
    # Plotting two line plots with each different scale
    fig, ax1 = plt.subplots()
   
    color = 'tab:blue'
    ax1.set_xlabel('time (s)')
    ax1.set_ylabel('exp', color=color)
    ax1.plot(ln_SVD, color=color)
    ax1.tick_params(axis='y', labelcolor=color)
   
    ax2 = ax1.twinx()  # instantiate a second Axes that shares the same x-axis
   
    color = 'tab:red'
    ax2.set_ylabel('sin', color=color)  # we already handled the x-label with ax1
    ax2.plot(d_ln_SVD, color=color)
    ax2.tick_params(axis='y', labelcolor=color)
   
    fig.tight_layout()  # otherwise the right y-label is slightly clipped
    plt.show()
   
    # Selecting the indices of the significant terms representing the physical components
    # The number of indices has to be odd as the first term with its index no 0 is the real term constant
    # and the rest complex terms are the pairs
    EnumSVD = [list(enumerate(d_ln_SVD))[k] for k in range(len(d_ln_SVD))]
    TermIndices=[]
    for k in range(math.floor(len(samples)/3*2/3)):
        if EnumSVD[k][1]>statistics.mean(d_ln_SVD)*3: # !! ** Select the threshold of selecting terms ** !!
            if EnumSVD[k][0]%2==0:
                TermIndices.append(EnumSVD[k][0]+1)
            else:
                TermIndices.append(EnumSVD[k][0])
    return list(set(TermIndices))
PhysicalTerms=plot_lnSVD_logdiff(f)
PhysicalTerms


## Set the number of terms (NTERMS)  to reconstruct
NTERMS=PhysicalTerms[-1] # <-- Set up NTERMS by refering the SVD

# the number of the reconstruction terms i.e. the amplitude (alpha) + the frequency (phi)
# Adjust it until actually working for the reconstruction
# The fully fitting reconstruction requires the integer slightly smaller than a half of the time series length
# You may set up any integer for NTERMS e.g. small enough to extract only the trend
# Set the lag parameter e.g.4 to make the number of the reconstruction

## Output
# Showing the MP analysis and reconstruction result
print(f'NTERMS={NTERMS}')
Parameters=MP(f,NTERMS)
# generating the reconstructed data given alphas and phis
alpha_rec, phi_rec = Parameters
f_hat=Exp(alpha_rec,phi_rec,f)
f_hat=np.array(f_hat).real # Extracting only the real number
f_hat=f_hat.tolist() # Converting back to list from numpy
EDkeys=list(EURUSD.keys())
Tittle=(f'Estimate based on Euro vs. US-Dollar FX rate from {EDkeys[0]} to {EDkeys[-1]}  Number of terms used {NTERMS}')
alpha_rec, phi_rec = Parameters
Future_Weeks=52*3 # ! ** Adjust **
f_future=plot2_future(alpha_rec, phi_rec,f,Future_Weeks,Tittle)



No comments: