Statistical analysis tool using the R language

Applications for statistical analysis have been developed using the R language, RStudio and the Shiny web application framework. One such application allows users to perform fundamental statistical analysis online. Another user-friendly application, also developed in R, provides detailed information about a company. In "Big data predictive analysis: Using R analytical tool", different algorithms for predictive modelling are compared in R to assess their relative accuracy. Security analysis is essential for detecting and preventing attacks, both now and in the future, and R is applied to this task in "Big data security analysis approach using Computational Intelligence techniques in R for desktop users".

Linear Models

The most rudimentary approach to modelling financial time series (FTS) is to assume that they follow the random walk model. The random walk model can be expressed simply as the sum of a series of independent random variables. Mathematically:

$$Z = \sum_{i=1}^{n} X_i \qquad (1)$$

Where: $X_i$ is the $i$-th random variable and $n$ is the number of variables.
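As a minimal sketch of equation (1), the running sums of a list of increments give the path of a random walk. The choice of independent +1/-1 steps and the name `random_walk` are illustrative assumptions, not part of the original text.

```python
import random
from itertools import accumulate


def random_walk(steps):
    """Return the running sums Z_1, ..., Z_n of the increments X_1, ..., X_n."""
    return list(accumulate(steps))


# Illustrative increments: independent +1/-1 steps (an assumption for this sketch).
rng = random.Random(0)
steps = [rng.choice((-1, 1)) for _ in range(10)]
path = random_walk(steps)

# The final position equals Z, the sum of all increments in equation (1).
print(path[-1] == sum(steps))
```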

The model shown in equation (1) can be viewed as a particular case of a more general model. Indeed, the random walk model is equivalent to a first-order linear Auto-Regressive (AR) model with a coefficient of 1. The general model, denoted $AR(p)$, can be expressed mathematically as:

$$\hat{x}[t] = \sum_{i=1}^{p} \alpha_i x_{t-i} + \epsilon_t \qquad (2)$$

Where: $\hat{x}[t]$ is the predicted value of the variable $x$ at time $t$, $p$ is the model order parameter, $\alpha_i$ are the model coefficients, $x_{t-i}$ is the value of $x$ at time $t-i$, and $\epsilon_t$ is additive white Gaussian noise (AWGN).

Equation (2) describes how the predicted value is derived from a regression using coefficients applied to past values summed with AWGN. Hence, (2) shows how the response variable of a previous period becomes a predictor for the next.
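The regression in (2) can be sketched as a one-step prediction from past values. The function name `ar_predict` and the example numbers are hypothetical. The noise term is left out because its expected value is zero.

```python
def ar_predict(history, alphas):
    """One-step AR(p) prediction: sum over i of alpha_i * x_{t-i}.

    history[-1] is x_{t-1}, history[-2] is x_{t-2}, and so on.
    The noise term of equation (2) is omitted (its expected value is 0).
    """
    p = len(alphas)
    if len(history) < p:
        raise ValueError("need at least p past values")
    return sum(a * history[-i] for i, a in enumerate(alphas, start=1))


history = [1.0, 2.0, 3.0]
# AR(2) with illustrative coefficients: 0.5 * 3.0 + 0.25 * 2.0
print(ar_predict(history, [0.5, 0.25]))
# AR(1) with coefficient 1 is the random walk: the prediction is the last value.
print(ar_predict(history, [1.0]))
```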

A set of equations was developed by G. U. Yule and G. Walker to provide a quantitative method for estimating the parameters of AR models. This work was subsequently expanded upon by J. P. Burg, who provided an alternative approach, albeit with different stability properties.
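As a hedged illustration of the Yule-Walker approach, the sketch below solves the equations $r_m = \sum_{i=1}^{p} \alpha_i r_{m-i}$ for $m = 1, \ldots, p$, where $r_m$ is the autocovariance at lag $m$. The Levinson-Durbin recursion is used as the solver. That choice, the biased autocovariance estimator and the function names are assumptions made for this example rather than details given in the text.

```python
def autocovariance(x, lag):
    """Biased sample autocovariance of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n)) / n


def yule_walker(r, p):
    """Solve the Yule-Walker equations for the AR(p) coefficients.

    r[0], ..., r[p] are autocovariances. The solver is the Levinson-Durbin
    recursion. Returns [alpha_1, ..., alpha_p].
    """
    alphas = []
    err = r[0]  # prediction error variance of the order-0 model
    for k in range(1, p + 1):
        # Reflection coefficient for order k.
        acc = r[k] - sum(alphas[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / err
        # Update the lower-order coefficients, then append the new one.
        alphas = [alphas[j] - kappa * alphas[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return alphas


# Autocovariances implied by an AR(2) process with alpha = (0.5, 0.2), scaled so r[0] = 1.
r = [1.0, 0.625, 0.5125]
print(yule_walker(r, 2))  # approximately [0.5, 0.2]
```

In practice the autocovariances would be estimated from data, for example with `[autocovariance(series, k) for k in range(p + 1)]`.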

These AR models are often accompanied by another type of linear model, the Moving Average (MA) model. This model, denoted $MA(q)$, can be written as:

$$\hat{x}[t] = \mu + \sum_{i=1}^{q} \beta_i \epsilon_{t-i} + \epsilon_t \qquad (3)$$

Where: $\mu$ is the mean, $q$ is the model order parameter, and $\beta_i$ are the model coefficients.

In (3), as in (2), the predicted value of $x$ is modelled from past data, effectively by applying an impulse response filter. However, unlike (2), (3) does not use past values directly: it uses the mean (often assumed to be 0) and coefficients applied to past noise terms.
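A minimal sketch of the prediction in (3) follows, assuming the past noise terms are already known. The name `ma_predict` and the example values are hypothetical. The current shock is again omitted because its expected value is zero.

```python
def ma_predict(past_noise, betas, mu=0.0):
    """One-step MA(q) prediction: mu + sum over i of beta_i * eps_{t-i}.

    past_noise[-1] is eps_{t-1}, past_noise[-2] is eps_{t-2}, and so on.
    The current shock eps_t of equation (3) is omitted (its expected value is 0).
    """
    if len(past_noise) < len(betas):
        raise ValueError("need at least q past noise terms")
    return mu + sum(b * past_noise[-i] for i, b in enumerate(betas, start=1))


# MA(2) with illustrative values: 1.0 + 0.5 * (-0.1) + 0.4 * 0.2
print(ma_predict([0.2, -0.1], [0.5, 0.4], mu=1.0))
```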

These two linear models can be combined into the Autoregressive Moving Average (ARMA) model. This new model, denoted $ARMA(p,q)$, can be written as:

$$\hat{x}[t] = \sum_{i=1}^{p} \alpha_i x_{t-i} + \sum_{i=1}^{q} \beta_i \epsilon_{t-i} + \epsilon_t \qquad (4)$$

Equation (4) can be easily derived from the linear sum of (2) and (3). By combining (2) and (3), the ARMA model can use both auto-regression and moving averages to provide its predicted value.
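The combination in (4) can be sketched as a filter that runs over a series and produces one-step predictions. Since the noise terms are not observed directly, this sketch estimates each one as the residual between the observed value and its prediction, and treats values before the start of the series as zero. Both choices, and the name `arma_one_step`, are assumptions made for illustration.

```python
def arma_one_step(x, alphas, betas):
    """One-step ARMA(p, q) predictions over the series x.

    Returns (predictions, residuals). Each residual x[t] - prediction[t] stands in
    for the noise term eps_t. Pre-sample values and residuals are treated as zero.
    """
    preds, resid = [], []
    for t in range(len(x)):
        # Autoregressive part: coefficients applied to past values.
        ar_part = sum(a * x[t - i] for i, a in enumerate(alphas, start=1) if t - i >= 0)
        # Moving average part: coefficients applied to past residuals.
        ma_part = sum(b * resid[t - i] for i, b in enumerate(betas, start=1) if t - i >= 0)
        pred = ar_part + ma_part
        preds.append(pred)
        resid.append(x[t] - pred)
    return preds, resid


preds, resid = arma_one_step([1.0, 2.0, 3.0], alphas=[0.5], betas=[0.5])
print(preds)
print(resid)
```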

However, a fundamental limitation of AR, MA and ARMA models is that they all assume the process being modelled is stationary. Stationarity is a property of processes whereby the probability distribution remains constant over time; thus, the variance also remains constant. This assumption is significant because FTS are often non-stationary processes. This can be seen by observing that the variance changes from one period to the next, as will be demonstrated in detail in later sections. The accuracy of these models will therefore suffer, highlighting the need to address the problem of non-stationarity.

This is done by generalising the ARMA model into the Autoregressive Integrated Moving Average (ARIMA) model [11]. The ARIMA model solves the issue of non-stationarity by exploiting the concept of returns (or degrees of differencing): a non-stationary series can be made stationary by differencing. Returns ($r$) and, consequently, the $ARIMA(p,d,q)$ model are given by:

$$r = x_t - x_{t-1} \qquad (5)$$

$$\therefore \; \hat{x}[t] = \mu + \sum_{i=1}^{p} \alpha_i x_{t-i} - \sum_{i=1}^{q} \beta_i \epsilon_{t-i} + \epsilon_t \qquad (6)$$

Where: $\beta$ is defined to have negative signs, following the Box and Jenkins convention.

The $ARIMA(p,d,q)$ model in (6) differs from the $ARMA(p,q)$ model through its added differencing parameter $d$. This parameter specifies how many times the data have been differenced, as shown in (5), before the ARMA terms are applied.
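The differencing in (5), repeated $d$ times, can be sketched as follows. The function name `difference` and the sample series are hypothetical.

```python
def difference(x, d=1):
    """Apply first differencing x_t - x_{t-1} to the series d times."""
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x


series = [1, 3, 6, 10]
print(difference(series))       # first differences: [2, 3, 4]
print(difference(series, d=2))  # second differences: [1, 1]
```

Each pass shortens the series by one value, so a series of length $n$ differenced $d$ times has $n - d$ values.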

It is worth noting that stationary ARMA processes are, by construction, ergodic. Ergodicity is a property of processes whereby given sufficient time, samples are statistically representative of the whole.

 

Nonlinear Models

The aforementioned linear models all suffer from the assumption that FTS are homoscedastic processes. This means that these linear models assume that the variance of FTS remains constant over time.

This is often a poor assumption to make, as shown by R. F. Engle. Engle states that the homoscedastic assumption can be avoided by using a more sophisticated model such as the Auto-Regressive Conditional Heteroscedasticity (ARCH) model. This $ARCH(q)$ model describes the variance of a time series as follows:

$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 \qquad (7)$$

Where: $\omega$ is a constant term and $\alpha_i$ are the ARCH model parameters.

Equation (7) reflects Engle's realisation that squared returns are highly correlated. Indeed, here the ARCH model expresses the variance as a linear function of past squared disturbances. For this reason, ARCH is often likened to a finite impulse response (FIR) filter.
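To make the dependence on past squared disturbances concrete, the sketch below simulates a first-order ARCH process. It assumes each disturbance is generated as $\epsilon_t = \sigma_t z_t$ with $z_t$ drawn from a standard normal distribution, and that the disturbance before the first step is zero. Neither assumption comes from the text, and the parameter values are illustrative only.

```python
import random


def simulate_arch1(n, omega, alpha, seed=0):
    """Simulate n steps of an ARCH(1) process.

    Variance follows equation (7) with a single lag:
        sigma_t^2 = omega + alpha * eps_{t-1}^2
    Disturbances are drawn as eps_t = sigma_t * z_t with z_t ~ N(0, 1)
    (an assumption of this sketch). The pre-sample disturbance is 0.
    """
    rng = random.Random(seed)
    eps, sigma2 = [], []
    prev_eps = 0.0
    for _ in range(n):
        var = omega + alpha * prev_eps ** 2
        prev_eps = (var ** 0.5) * rng.gauss(0.0, 1.0)
        sigma2.append(var)
        eps.append(prev_eps)
    return eps, sigma2


eps, sigma2 = simulate_arch1(100, omega=0.2, alpha=0.5)
# A large shock at one step raises the variance at the next step.
print(max(sigma2), min(sigma2))
```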

This ARCH model was later described by Bollerslev as a special case of a more general model called the Generalised Auto-Regressive Conditional Heteroscedasticity (GARCH) model. This $GARCH(p,q)$ model describes variance as follows:

$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 \qquad (8)$$

Where: $\alpha_i$ and $\beta_i$ are the GARCH model parameters.

By comparing equations (4) and (8), it can be noted that GARCH uses the ARMA model structure to express the error variance of the time series. For this reason, GARCH is often likened to an infinite impulse response (IIR) filter. Many more variants of the GARCH model have appeared since its original publication in 1986. These include NAGARCH (nonlinear asymmetric GARCH) [16], EGARCH (exponential GARCH), GJR-GARCH (Glosten-Jagannathan-Runkle GARCH) and many others. These GARCH derivatives are often nested under Hentschel's fGARCH (Family GARCH) model [19], but they all lie outside the scope of this report.
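The recursion in (8) can be sketched as below. Because the variance at each step feeds back into later steps, a starting value is needed: here the pre-sample squared disturbances and variances are all set to a caller-supplied `initial_variance`, which is an assumption of this sketch. The function name and example values are hypothetical, and only the basic GARCH form is covered, not the variants listed above.

```python
def garch_variances(eps, omega, alphas, betas, initial_variance):
    """Conditional variances sigma_t^2 for t = 0, ..., len(eps) - 1, following (8).

    alphas weight past squared disturbances (q terms).
    betas weight past variances (p terms).
    Pre-sample squared disturbances and variances are set to initial_variance
    (an assumption of this sketch).
    """
    sigma2 = []
    for t in range(len(eps)):
        arch_part = sum(
            a * (eps[t - i] ** 2 if t - i >= 0 else initial_variance)
            for i, a in enumerate(alphas, start=1)
        )
        garch_part = sum(
            b * (sigma2[t - i] if t - i >= 0 else initial_variance)
            for i, b in enumerate(betas, start=1)
        )
        sigma2.append(omega + arch_part + garch_part)
    return sigma2


# GARCH(1,1) with illustrative parameters.
print(garch_variances([1.0, 0.0], omega=0.1, alphas=[0.2], betas=[0.5], initial_variance=1.0))
# With no beta terms the recursion reduces to the ARCH form.
print(garch_variances([1.0, 2.0], omega=0.1, alphas=[0.5], betas=[], initial_variance=1.0))
```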

Around the same time as the ARCH and GARCH models, I. J. Leontaritis and S. A. Billings published an alternative known as the Nonlinear Autoregressive Moving Average model with exogenous inputs (NARMAX). This work, building on their own previous work on ARMAX models, demonstrated that NARMAX models could successfully be applied to model complex time series. This model can be expressed mathematically as:

$$y(k) = F[y(k-1), \ldots, y(k-n_y), u(k-1), \ldots, u(k-n_u), e(k-1), \ldots, e(k-n_e)] + e(k) \qquad (9)$$

Where: $y(k)$ is the system output prediction, $u(k)$ is the system input, $F[\cdot]$ is some nonlinear function, $e$ is noise, and $n_y$, $n_u$ and $n_e$ are the maximum lags for the system output, input and noise respectively.

From (9) we can see that NARMAX is, at its core, a function of lags of the system input, output and noise (modelled explicitly). While equation (9) does not contain all the detail of NARMAX models, it is about as far as one can go in terms of specifying a general finite nonlinear system. NARMAX models are often implemented in various types of networks, but further mathematical detail lies beyond the scope of this report.
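The structure of (9) can be sketched by collecting the lagged outputs, inputs and noise terms into one list and passing it to a nonlinear function. The function `example_F` below, with its cross term between input and output, is a purely hypothetical choice of $F$. The text does not specify one, and the current noise term $e(k)$ is omitted from the prediction.

```python
def narmax_predict(F, y, u, e, k, n_y, n_u, n_e):
    """Evaluate F on the lagged outputs, inputs and noise terms of equation (9).

    y, u and e are lists indexed by time step. k is the step being predicted.
    The current noise term e(k) is omitted (its expected value is taken as 0).
    """
    if k < max(n_y, n_u, n_e) or k > min(len(y), len(u), len(e)):
        raise ValueError("not enough history for the requested lags")
    lagged = (
        [y[k - i] for i in range(1, n_y + 1)]
        + [u[k - i] for i in range(1, n_u + 1)]
        + [e[k - i] for i in range(1, n_e + 1)]
    )
    return F(lagged)


def example_F(lagged):
    """Hypothetical nonlinear map for one lag each of output, input and noise."""
    y1, u1, e1 = lagged
    return 0.5 * y1 + 0.1 * u1 * y1 + 0.2 * e1


y = [1.0, 2.0]
u = [0.0, 3.0]
e = [0.0, 0.0]
# Predict step k = 2 from step 1: 0.5 * 2.0 + 0.1 * 3.0 * 2.0 + 0.2 * 0.0
print(narmax_predict(example_F, y, u, e, k=2, n_y=1, n_u=1, n_e=1))
```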

Blockchain-based

A major advantage of cryptocurrencies is that blockchains contain a huge amount of publicly downloadable data for use in trading algorithms. In addition, unlike traditional exchange markets with opening and closing times, cryptocurrency exchanges do not close, so the data collected have the advantage of being truly continuous. One of the largest of these exchanges by trading volume is Poloniex. Large trading volumes are preferable for FTS modelling because prices change constantly even at very high frequency, which enables the deployment of high-frequency algorithms. This significant advantage, combined with the exchange's excellent API capabilities, constituted the primary motivation for choosing Poloniex for this thesis.

The application of deep learning algorithms to cryptocurrency forecasting has thus far been minimal. Indeed, there is currently no direct competitor who has attempted prediction of DASH using deep learning, and most other competitors focus on BTC forecasting. Amongst these BTC competitors, most attempts are not published in academic journals [34]–[36], with only two found to be peer-reviewed and published in academia [37], [38]. Of the unpublished competitors cited, [34] achieves the best performance, which ranges from 50% to 55%. All three attempts conclude that their findings warrant further investigation into applications of deep learning to blockchain FTS. Amongst the published competitors, Greaves and Au in [37] use a 2-hidden-layer DNN and achieve a 55% prediction accuracy without heavy tuning. Similarly, Jang and Lee in [38] use a Bayesian Neural Network (BNN) and achieve accuracies.

 

  Remember! This is just a sample.

Save time and get your custom paper from our expert writers

 Get started in just 3 minutes
 Sit back relax and leave the writing to us
 Sources and citations are provided
 100% Plagiarism free
error: Content is protected !!
×
Hi, my name is Jenn πŸ‘‹

In case you can’t find a sample example, our professional writers are ready to help you with writing your own paper. All you need to do is fill out a short form and submit an order

Check Out the Form
Need Help?
Dont be shy to ask