Business analytics and intelligence
Abstract
Applying Business Intelligence in any kind of business brings a pool of advantages that drives significant returns on investment. It simplifies converting raw data into meaningful business intelligence, relieving many physically complex tasks by giving organizations the ability to transform data from several sources into accurate, usable information that can be shared securely throughout the organization. In addition, it enables managers to make sound business decisions quickly by providing the query and reporting tools they need, as well as the means to share results. The paper also aims at describing the process of building Business Intelligence (BI) systems.
Introduction
Business intelligence (BI) refers to a collection of applications and software used to analyze various aspects of data sets and present them in forms that enhance decision making. BI has evolved from basic reporting and historical query tools to include components such as forecasting, online analytical processing (OLAP), predictive modeling, data management, data mining, and optimization. When BI is combined with the essential tools, business organizations and companies can identify what is or is not working at present, determine which historical factors made it so, and recognize future trends to maximize their potential.
Clustering algorithms
Clustering refers to grouping data into k subgroups called clusters. During clustering, k should be less than or equal to n (the population size). In multivariate data sets from fields such as marketing, clustering methods are used to identify groups of similar objects. The main types of clustering algorithms are hierarchical clustering algorithms, k-means, the self-organizing maps (SOM) algorithm, and the expectation maximization (EM) clustering algorithm.
K-means algorithm
K-means is a partitioning method in which each object is assigned to one of k groups. The number of clusters k is chosen in advance. The multidimensional mean of each cluster is calculated, and each object is assigned to the group with the nearest centroid. The method reduces the overall within-cluster dispersion by iteratively rearranging cluster members (Abbas, 2008). The algorithm takes a set of objects S and an integer k as input and yields k subsets of S (that is, S1, S2, …, Sk), using the sum-of-squares optimization criterion.
Steps of clustering
Choose k, the number of clusters.
Calculate the multidimensional mean (centroid) of each cluster.
Allocate each object to the nearest centroid, and repeat the previous two steps until the assignments stop changing.
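The steps above can be sketched in a few lines of Python. This is a minimal illustration on made-up one-dimensional data (the sample values and k = 2 are assumptions for the example), not a production implementation:

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Minimal k-means on 1-D data: pick k initial centroids from the
    data, then alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # step 1: choose k initial centroids
    for _ in range(iterations):
        # step 3: allocate each object to the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # step 2: recompute each cluster's mean (centroid)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two well-separated groups of values
data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centroids, clusters = k_means(data, k=2)
```

With well-separated data like this, the centroids settle on the means of the two groups after a couple of iterations, regardless of which points are drawn as the initial centroids.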
Benefits
Its modest time and space complexity makes it a suitable clustering technique. It is also order-independent: for a given set of cluster centroids, "it produces the same group of data, regardless of the order in which the patterns are presented to the algorithm" (Abbas, 2008). This clustering algorithm is applied in market analysis, where the primary goal is to minimize the total cost.
Hierarchical clustering algorithm
Here, each object initially forms a separate cluster. Existing groups are then iteratively combined or divided, depending on their similarities or differences, to create a hierarchical construction that records the order in which groups are merged or split. The objects initially belong to singleton subsets S1, S2, …, Sn; a cost function ranks candidate pairs (say Si and Sj), and the cheapest pair on the list is merged. After merging, Si and Sj are eliminated from the catalog and replaced by the union of Si and Sj (Abbas, 2008). The iteration continues until all objects are in a single group.
Steps for clustering
Identify the two clusters that are closest to each other.
Merge the two clusters.
Continue with iterations until all objects are merged to form a single group.
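A minimal sketch of this merge loop, assuming single-linkage (closest-pair) distance as the cost function and made-up one-dimensional data; it records each merge so the hierarchy can be read back:

```python
def agglomerative(points):
    """Minimal agglomerative clustering on 1-D data: start with one
    cluster per object, repeatedly merge the cheapest (closest) pair,
    and record the merge order."""
    clusters = [[p] for p in points]            # every object is its own cluster
    merges = []
    while len(clusters) > 1:
        # find the pair Si, Sj with the smallest single-linkage distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]      # union of Si and Sj
        # remove the pair from the list and add their union
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((d, sorted(merged)))
    return merges

history = agglomerative([1.0, 1.1, 5.0, 5.2])
```

The recorded merge distances grow monotonically here (0.1, 0.2, then 3.9), which is exactly the dendrogram structure a hierarchical method exposes.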
Benefits
It is efficient for specific tasks because the algorithm is versatile and can handle any form of similarity or distance measure. The hierarchical clustering algorithm is used in environments where the objects exhibit varying levels of resemblance.
Self-organizing maps (SOM) algorithm
The self-organizing map algorithm is a training scheme for a neural network that reduces input dimensionality in order to represent the data's distribution as a map. This type of algorithm is used to obtain solutions for large, complex data. It has a "substantial deposition for visual cluster analysis as it provides the data reduction and spatialization of cluster prototypes, forming a baseline for visualization and interaction with data" (Schreck, 2010). The algorithm can be applied to a broad range of data types. Clusters are obtained from the primary data, and map nodes compete to represent each data sample. The weight vectors are initialized, sample vectors are selected randomly, similar samples are mapped close together, and iterations are performed.
Steps of clustering
The weight of each node is initialized.
A vector is selected randomly from the set of training data.
Each node is examined to find the one whose weight is most similar to the input vector. The winning node is referred to as the Best Matching Unit (BMU).
The neighborhood of the BMU is calculated; its radius shrinks over time, reducing the number of neighbors.
The weights of the BMU and its neighbors are adjusted toward the sample vector, and iterations continue.
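A toy sketch of these steps for scalar inputs and a one-dimensional map. The node count, epoch count, and decay schedules are illustrative assumptions; real SOMs typically use two-dimensional grids and tuned schedules:

```python
import random

def train_som(data, n_nodes=4, epochs=200, seed=1):
    """Minimal one-dimensional SOM on scalar data. Nodes sit on a line;
    the BMU and its neighbours are pulled toward each sample, with the
    learning rate and neighbourhood radius shrinking over time."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_nodes)]   # step 1: initialise weights
    for t in range(epochs):
        x = rng.choice(data)                           # step 2: random sample
        # step 3: the node nearest the input is the Best Matching Unit
        bmu = min(range(n_nodes), key=lambda i: abs(x - weights[i]))
        # step 4: the neighbourhood radius shrinks over time
        radius = max(1, n_nodes // 2 - (t * n_nodes) // (2 * epochs))
        rate = 0.5 * (1 - t / epochs)                  # decaying learning rate
        # step 5: pull the BMU and its neighbours toward the sample
        for i in range(n_nodes):
            if abs(i - bmu) <= radius:
                weights[i] += rate * (x - weights[i])
    return weights

# Two clumps of scalar readings; node weights drift toward the data
weights = train_som([0.1, 0.12, 0.9, 0.95])
```

After training, neighbouring nodes hold similar weights, which is the "map" property that makes SOMs useful for visual cluster analysis.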
Benefits
It is useful for vector quantization and speech recognition.
The method utilizes different kinds of distance measures and joining criteria (Abbas, 2008).
Even when each map unit's region is convex, combining several map units allows the creation of non-convex clusters (Abbas, 2008).
However, if the initial weights are not appropriately selected, the SOM generates a sub-optimal partition.
Application
This algorithm is applied in fields that deal with big data, such as business domains and multimedia. Also, "SOM clustering has often been successfully applied in the Text (Nuernberger & Detyniecki (2006); Honkela et al. (1997)) and Multimedia Retrieval (Laaksonen et al. (2000); Pampalk et al. (2002); Bustos et al. (2004)) domains" (Schreck, 2010). The algorithm is therefore suitable for fields that handle large, complex data.
The expectation maximization (EM) clustering algorithm
Expectation maximization (EM) is an algorithm that models a data set as a linear combination (mixture) of multivariate normal distributions. As the name suggests, it alternates between an expectation step and a maximization step. The algorithm determines the parameters that maximize the log-likelihood of the data, a measure of model quality.
Steps of clustering
This technique has only two steps per iteration. First, estimate the clusters' membership values (expectation); second, optimize the model parameters (maximization) (Brownlee, 2019). The two steps are repeated for N iterations, until convergence occurs.
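The two steps can be sketched for the simplest case, a mixture of two one-dimensional normal distributions; the initialization scheme and the sample data are assumptions made for the illustration:

```python
import math

def em_two_gaussians(data, iters=50):
    """Minimal EM for a mixture of two 1-D normal distributions.
    E-step: estimate each point's responsibility for component 1.
    M-step: re-optimise the means, variances, and mixing weight."""
    mu1, mu2 = min(data), max(data)             # crude initialisation
    var1 = var2 = 1.0
    w1 = 0.5                                    # mixing weight of component 1
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        resp = []
        for x in data:
            p1 = w1 * math.exp(-(x - mu1) ** 2 / (2 * var1)) / math.sqrt(2 * math.pi * var1)
            p2 = (1 - w1) * math.exp(-(x - mu2) ** 2 / (2 * var2)) / math.sqrt(2 * math.pi * var2)
            resp.append(p1 / (p1 + p2))
        # M-step: maximise the likelihood given the responsibilities
        n1 = sum(resp)
        n2 = len(data) - n1
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / n2
        var1 = max(1e-6, sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1)
        var2 = max(1e-6, sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / n2)
        w1 = n1 / len(data)
    return mu1, mu2

mu1, mu2 = em_two_gaussians([1.0, 1.2, 0.9, 6.0, 6.3, 5.8])
```

On data with two clear groups, the estimated means converge to the group averages within a few iterations; the variance floor (1e-6) is a small numerical safeguard.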
Benefits
The EM algorithm has a strong statistical basis. Given a proper initialization it converges quickly, and its update equations are simple, so the method does not involve many complications. It can also handle high dimensionality, accepts the desired number of clusters as an input, and is robust to noisy data. Considering that noisy data are bound to be encountered once clustering has begun, EM's capacity to deal with big data makes it a more effective tool than other methods such as hierarchical clustering algorithms.
Application
The expectation-maximization clustering algorithm is applicable in areas dealing with big data, as it provides maximum likelihood estimates of the cluster parameters. It is also used by machine learning students to solve problems such as probability density estimation and clustering (Brownlee, 2019).
To conclude this review of clustering algorithms, the application criteria are broadly the same: in each, k is the number of clusters. The hierarchical algorithm differs in that the number of clusters k is not specified in advance. All of these algorithms can be used for big data; however, k-means and the expectation maximization clustering algorithms are less complicated and therefore more convenient for dealing with big data.
However, for an analyst to perform cluster analysis effectively, they must be able to deal with all data types and scale the variables properly to avoid bias. The analyst should also be equipped with techniques for handling noisy data, since noise inevitably appears during clustering. The structure of the data should also be considered when determining which algorithm is suitable, based on the algorithm's flexibility, ability to handle dimensionality, applicability, and accessibility (Abbas, 2008).
Task 2: Bicycle manufacturing company.
The company's goal is to maximize profit on the new bike.
Table 1: Estimated demand for a new bicycle based on a price level.
Price (£) | Demand (in Thousand) |
1100 | 170 |
1125 | 166 |
1150 | 150 |
1175 | 138 |
1200 | 133 |
1225 | 131 |
1250 | 120 |
1275 | 105 |
1300 | 95 |
1325 | 90 |
1350 | 80 |
1375 | 75 |
1400 | 72 |
1425 | 70 |
1450 | 67 |
1475 | 64 |
1500 | 62 |
1525 | 61.5 |
1550 | 60 |
1575 | 59.5 |
Assumption: the unit production and supply cost of each bike is £900.
Profit per unit = sale price − unit production cost
The demand function fitted to the table is linear: D(p) = −0.2447p + 425.77 (demand in thousands)
Total profit, (p − 900) × D(p), is therefore quadratic in price; setting its derivative to zero gives the optimal price p* = (425.77/0.2447 + 900)/2 ≈ £1,320
The optimal demand will be D(p*) ≈ 102.8 (thousand)
The firm's optimal profit will be
optimal profit = (price − unit cost) × demand ≈ (1320 − 900) × 102.77 ≈ £43,162 (in thousands of pounds)
Therefore, the company's optimal profit will be approximately £43,162 thousand.
From the price and demand data for the new bike, the fitted demand line is D(p) = −0.2447p + 425.77. The firm's decision variable is the selling price p; the unit production cost is c = £900. Since the firm's profit is (p − c)D(p), that is, (sales price − unit production cost) times demand, the firm solves the unconstrained maximization problem max (p − c)D(p). The optimal selling price and the resulting optimal demand give the maximum profit. This is the theoretical view behind the solution to finding the firm's optimum (part d).
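This maximization can be checked numerically. The sketch below introduces nothing new: it evaluates the quadratic profit implied by the fitted demand line and locates its vertex in closed form:

```python
# Maximise profit (p - c) * D(p) for the fitted linear demand
# D(p) = -0.2447p + 425.77 (demand in thousands), unit cost c = 900.
# Profit therefore comes out in thousands of pounds.
def demand(p):
    return -0.2447 * p + 425.77

def profit(p, unit_cost=900):
    return (p - unit_cost) * demand(p)

# The profit is quadratic in p, so the vertex gives the optimum:
# d/dp [(p - c)(a - b p)] = 0  =>  p* = (a / b + c) / 2
a, b, c = 425.77, 0.2447, 900
p_star = (a / b + c) / 2
best = profit(p_star)
```

The vertex formula comes straight from differentiating (p − c)(a − bp); a grid search over the table's price range gives the same neighbourhood.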
Optimal prices can also be obtained from the firm's profit and loss by subtracting fixed costs from the contribution margin. If a table is used for the analysis, the optimal price is found in the row with the highest profit. The contribution margin is obtained by subtracting total variable costs from revenue (total sales), and revenue is the product of price and quantity, that is, R = q × p.
Optimal profit is directly affected by the optimal price and optimal demand, and the optimal price in turn depends on the unit cost. If the unit supply cost rises from the initial £900 to £1,000, £1,100, or £1,150, the margin at any given price falls by £100, £200, or £250, respectively. For a linear demand D(p) = a − bp, the profit-maximizing price is p* = (a/b + c)/2, so a higher unit cost pushes the optimal price up while pushing both the optimal demand and the optimal profit down. The lower the cost of production, the more units the firm can profitably sell, which is why higher demand combined with lower cost maximizes profit.
Task 3
Compute the forecasted sale using an alpha value of 0.3 [5 marks]
(The forecasts below use the smoothing recursion F(t+1) = F(t) + α·(A(t) − F(t)), seeded with F(2) = A(1) = 137.)
Period | Sale (in Thousands) | Forecast (α=0.3) | Difference |
1 | 137 | #N/A | |
2 | 138 | 137.0000 | 1.0000 |
3 | 134 | 137.3000 | -3.3000 |
4 | 133 | 136.3100 | -3.3100 |
5 | 136 | 135.3170 | 0.6830 |
6 | 140 | 135.5219 | 4.4781 |
7 | 137 | 136.8653 | 0.1347 |
8 | 141 | 136.9057 | 4.0943 |
9 | 136 | 138.1340 | -2.1340 |
10 | 142 | 137.4938 | 4.5062 |
11 | 145 | 138.8457 | 6.1543 |
12 | 143 | 140.6920 | 2.3080 |
13 | 145 | 141.3844 | 3.6156 |
14 | 142 | 142.4691 | -0.4691 |
15 | 150 | 142.3283 | 7.6717 |
Plot the data for the actual demand and the forecasted figures. Describe the main features of the series. [5 marks]
The differences between the actual values and the forecast values are small, which supports the accuracy of this prediction model. The series shows short irregular up-and-down swings around a gradually rising level, so the sales exhibit a mild upward trend rather than strong seasonality.
Calculate the error, Mean Absolute Deviation (MAD) error, Mean Square Error (MSE) and Mean Absolute Percentage Error (MAPE). [10 marks]
The following output is a summary of what is expected in this section.
α | 0.3 |
Period | Sale (in Thousands) | Exponential forecast (0.3) | error | ABS(error) | error^2 | %error | ABS%error |
1 | 137 | | | | | | |
2 | 138 | 137.00 | 1.00 | 1.00 | 1.00 | 0.7% | 0.7% |
3 | 134 | 137.30 | -3.30 | 3.30 | 10.89 | -2.5% | 2.5% |
4 | 133 | 136.31 | -3.31 | 3.31 | 10.96 | -2.5% | 2.5% |
5 | 136 | 135.32 | 0.68 | 0.68 | 0.47 | 0.5% | 0.5% |
6 | 140 | 135.52 | 4.48 | 4.48 | 20.05 | 3.2% | 3.2% |
7 | 137 | 136.87 | 0.13 | 0.13 | 0.02 | 0.1% | 0.1% |
8 | 141 | 136.91 | 4.09 | 4.09 | 16.76 | 2.9% | 2.9% |
9 | 136 | 138.13 | -2.13 | 2.13 | 4.55 | -1.6% | 1.6% |
10 | 142 | 137.49 | 4.51 | 4.51 | 20.31 | 3.2% | 3.2% |
11 | 145 | 138.85 | 6.15 | 6.15 | 37.88 | 4.2% | 4.2% |
12 | 143 | 140.69 | 2.31 | 2.31 | 5.33 | 1.6% | 1.6% |
13 | 145 | 141.38 | 3.62 | 3.62 | 13.07 | 2.5% | 2.5% |
14 | 142 | 142.47 | -0.47 | 0.47 | 0.22 | -0.3% | 0.3% |
15 | 150 | 142.33 | 7.67 | 7.67 | 58.85 | 5.1% | 5.1% |
Summary | | | | MAD = 3.13 | MSE = 14.31 | | MAPE = 2.2% |
Compute the forecasted values with the smoothing constants of 0.4 and 0.5. [10 marks]
Compute the Error, Mean Absolute Deviation (MAD) error, Mean Square Error (MSE), and Mean Absolute Percentage Error (MAPE) for the smoothing constants of 0.4 and 0.5. [10 marks]
α | 0.4 |
Period | Sale (in Thousands) | Exponential forecast (0.4) | error | ABS(error) | error^2 | %error | ABS%error |
1 | 137 | | | | | | |
2 | 138 | 137.00 | 1.00 | 1.00 | 1.00 | 0.7% | 0.7% |
3 | 134 | 137.40 | -3.40 | 3.40 | 11.56 | -2.5% | 2.5% |
4 | 133 | 136.04 | -3.04 | 3.04 | 9.24 | -2.3% | 2.3% |
5 | 136 | 134.82 | 1.18 | 1.18 | 1.38 | 0.9% | 0.9% |
6 | 140 | 135.29 | 4.71 | 4.71 | 22.14 | 3.4% | 3.4% |
7 | 137 | 137.18 | -0.18 | 0.18 | 0.03 | -0.1% | 0.1% |
8 | 141 | 137.11 | 3.89 | 3.89 | 15.16 | 2.8% | 2.8% |
9 | 136 | 138.66 | -2.66 | 2.66 | 7.09 | -2.0% | 2.0% |
10 | 142 | 137.60 | 4.40 | 4.40 | 19.38 | 3.1% | 3.1% |
11 | 145 | 139.36 | 5.64 | 5.64 | 31.82 | 3.9% | 3.9% |
12 | 143 | 141.62 | 1.38 | 1.38 | 1.92 | 1.0% | 1.0% |
13 | 145 | 142.17 | 2.83 | 2.83 | 8.01 | 2.0% | 2.0% |
14 | 142 | 143.30 | -1.30 | 1.30 | 1.69 | -0.9% | 0.9% |
15 | 150 | 142.78 | 7.22 | 7.22 | 52.12 | 4.8% | 4.8% |
Summary | | | | MAD = 3.06 | MSE = 13.04 | | MAPE = 2.2% |
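The forecasts and all four error measures can be reproduced for any smoothing constant with a short script. It assumes the common convention of seeding the forecast with the first actual value; a different seed changes the figures slightly:

```python
def ses(sales, alpha):
    """Simple exponential smoothing, F(t+1) = F(t) + alpha*(A(t) - F(t)),
    seeded with the first actual value. Returns forecasts for periods
    2 .. n+1; the last entry is the out-of-sample forecast."""
    forecasts = [sales[0]]
    for actual in sales[1:]:
        prev = forecasts[-1]
        forecasts.append(prev + alpha * (actual - prev))
    return forecasts

def error_metrics(sales, forecasts):
    """MAD, MSE and MAPE over the in-sample periods 2..n."""
    pairs = list(zip(sales[1:], forecasts))
    errors = [a - f for a, f in pairs]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100 * sum(abs(e) / a for e, (a, _) in zip(errors, pairs)) / n
    return mad, mse, mape

sales = [137, 138, 134, 133, 136, 140, 137, 141, 136, 142, 145, 143, 145, 142, 150]
results = {}
for alpha in (0.3, 0.4, 0.5):
    f = ses(sales, alpha)
    results[alpha] = (*error_metrics(sales, f[:-1]), f[-1])  # (MAD, MSE, MAPE, F16)
```

Looping over candidate α values and keeping the one with the lowest MAD or MSE is exactly the selection criterion discussed in the conclusion.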
Compare extracted measures for different smoothing constants and select the proper one. [5 marks]
The extracted measures differ only slightly across the smoothing constants; the constant that yields the lowest MAD, MSE, and MAPE should be selected as the proper one.
Forecast the demand for the 16th period and discuss the outcome of the forecasted sale amount. [5 marks]
Carrying the α = 0.3 recursion one step beyond period 15 gives an expected period-16 demand of about 145 (thousand). The forecast lies below the latest actual sale of 150 because exponential smoothing lags behind a rising series.
Conclusion
The exponential smoothing technique is widely used by organizations to predict future events. A challenge, however, is assigning the value of the exponential smoothing constant. In this BI study, the problem was addressed by determining the optimal value of the constant: the mean square error and the mean absolute deviation are minimized to obtain it, using Excel formulas for the minimum mean square error and mean absolute deviation, respectively. With this criterion, any organization can adopt the optimal value of the exponential smoothing constant to enhance the accuracy of forecasting. The study could be extended to minimize other forecast errors, such as the mean absolute percent error (MAPE) and the cumulative forecast error (CFE).
References
Abbas, A. (2008). Comparisons between data clustering algorithms. The International Arab Journal of Information Technology, 5(3), 320-325.
Ahn, J. (2005). Self-organizing map tutorial system. Interactive system design. http://www.pitt.edu/~is2470pb/Spring05/FinalProjects/Group1a/tutorial/som.html
Brownlee, J. (2019). A gentle introduction to Expectation-maximization (EM) algorithm. Retrieved July 25, 2020, from http://machinelearningmastery.com/expectation-maximization-em-algorithm
Schreck, T. (2010). Visual-interactive Analysis with self-organizing maps- advances and research challenges. Intech-open. http://doi.org/10.5772/9171