ITECH1103-Big Data and Analytics


TABLE OF CONTENTS

Executive Summary

Background of the Project

SAS Visual Analytics

Results and Discussion

Conclusion

References

TABLE OF FIGURES

Fig. 1: Product-CLEAN Dataset (the latest with over 1 GB size and updated on 10/03/210019) from SAS Visual Analytics

Fig 2. Product category and their frequencies

Fig 3: Countries with respect to product frequency

Table 1: Top 10 countries concerning product frequency

Fig 4: Frequency of product name in Australia

Table 2: Days concerning most numbers of orders placed by the customers

Fig 5: Product line in terms of frequency and quantity ordered

Fig 6: Countries concerning each product line

Fig. 7 The trend of orders placed over the year

Fig 8: Decision Tree for quantity ordered

Fig. 9: Visualisation of the Decision tree

Fig. 9: Geomap of the supplier’s countries

Fig 10: Cluster analysis on the products related data

Fig 11: The correlation and correlation matrix between quantity ordered, cost, retail price and discount in per cent

 

 

Executive Summary

This assignment is a report based on data analysis. SAS is used to investigate and visualise the data and to understand what it contains; the SAS analytics presented here are SAS BI analyses. Because businesses record an enormous amount of data every day, this unsorted data needs to be organised with data analytics tools in order to extract information that is useful for marketing. Analysing the data helps an organisation to interpret the state of its business, develop a strategy and put forward a workable plan. It also highlights the organisation's weaknesses and strengths. If the data are analysed in an organised way, the business can develop new strategies and plans.

A total of 12 product categories are analysed, namely Clothes, Assorted Sports Articles, Outdoors, Shoes, Children Sports, Running – Jogging, Team Sports, Golf, Winter Sports, Swim Sports, Racket Sports and Indoor Sports. Petanque Balls Chromium 8-pack, Bulls Eye Stuart/Tungsten 24 Gram, Hurricane 4, Lucky Tech Integral Wp/B Rain Pants, Comfort Shelter, Family Holiday 4, Pacific 95% 23 Gram, Expedition10 – Medium-Right-Blue Ribbon, Rain Jacket and Big Guy Men’s Air Tuned Sirocco Shoes are reported to be the top 10 products irrespective of country, and the United States, Germany, France, the United Kingdom, Italy, Spain, the Netherlands, Australia, Belgium and Denmark are the top countries for product sales. The orders placed over the year show a linear trend with a low correlation coefficient. The correlation analysis between quantity ordered, cost, retail price and discount in per cent shows a significant correlation between cost and retail price.


Background of the Project

A collection of information gathered in an organised way is called a dataset. A dataset may contain business information or personal information. A dataset can also be described as a database, as it comprises many large files that relate to a common subject. The term is considered to have originated at IBM: in IBM mainframe operating systems, a dataset means a collection of data containing individual data units organised in a specific way. Datasets are collected for a specific purpose. There are various ways to collect data, for example surveys, interviews, collections and meetings.

Data analysis is a procedure for shaping and evaluating data sources. The facts obtained from experiment and observation are called data. Data may be categorical or numerical. Categorical data represents a type or class, such as the colour of a ball, while numerical data consists of numbers; keeping such data can help us keep track of revenue, that is, the flow of money. Data analysis is an important task in daily life and not only a task for business organisations. For instance, we can view a monthly or yearly financial summary by keeping a record of total expenses and profits during that period and visualising it. A well-organised plan is best accomplished by carrying it out as a project. A large number of organisations have a vast number of projects in progress around the world. Projects are a gateway to success. They may differ in their importance, their investment and the total number of people involved. Building bridges, roads, supermarkets and factories are some examples of projects carried out around us. A collection of files containing a database’s metadata is a data dictionary. It is the record of the objects in the database and an important part of every relational database.

 

SAS Visual Analytics

As the name suggests, big data is a collection of a huge amount of data that needs to be processed, analysed and presented in order to extract useful information or view the results. Here we analyse big data using SAS Visual Analytics. SAS is an online product that is used to process a large amount of data at once. SAS has many distinctive functions that allow you to join, split, summarise and filter data, among others. SAS provides a GUI, which is easy to use even for a non-specialist user. Visual analytics offers several advantages for analysing data. First, it improves data evaluation while limiting overall costs. It becomes relatively easy to process vast amounts of data, because many tools and functions are available for digging deeply into the data. Visual data analysis is also an excellent presentation tool: it can automatically produce the most suitable chart or graph for representing the data. The wide variety of tools and calculation methods makes it even more effective and powerful, and the simplicity of the software makes it more appealing to use. While visualising the data, we use different graphical representations, for example visual charts, scatter plots, heat maps, crosstabs and line charts. The main reason for using these tools is that charts and graphs make the data clearer, so that it is evident to everyone even at a glance. A newcomer or general reader will also be able to understand the data clearly, so these visualisation tools are used throughout the report.

 

Results and Discussion

For the present study, we analysed the Product-CLEAN Dataset (the latest with over 1 GB size and updated on 10/03/210019) from SAS Visual Analytics. The figure below depicts the dataset visualisation.

Fig. 1: Product-CLEAN Dataset (the latest with over 1 GB size and updated on 10/03/210019) from SAS Visual Analytics

The dataset contains a total of 12 product categories, namely Clothes, Assorted Sports Articles, Outdoors, Shoes, Children Sports, Running – Jogging, Team Sports, Golf, Winter Sports, Swim Sports, Racket Sports and Indoor Sports. The frequencies of these categories are shown in the following bar graph.

Fig 2. Product category and their frequencies

 

An analysis of the product names is made to find the top 10 products across all countries. The top 10 products are found to be Petanque Balls Chromium 8-pack, Bulls Eye Stuart/Tungsten 24 Gram, Hurricane 4, Lucky Tech Integral Wp/B Rain Pants, Comfort Shelter, Family Holiday 4, Pacific 95% 23 Gram, Expedition10 – Medium-Right-Blue Ribbon, Rain Jacket and Big Guy Men’s Air Tuned Sirocco Shoes, with frequencies of 4898, 4413, 30313, 1193, 1453, 1983, 2783, 5173, 7753 respectively.

Fig 3: Countries concerning product frequency

An analysis by country is made to find the top 10 countries in terms of product frequency. The top 10 countries are the United States, Germany, France, the United Kingdom, Italy, Spain, the Netherlands, Australia, Belgium and Denmark, with the product frequencies shown in the following table.

Table 1: Top 10 countries concerning product frequency

 

Fig 4: Frequency of product name in Australia

A similar analysis of the product names is made to find the top 10 products in Australia alone, as shown in the above figure. The top 10 products are found to be Bulls Eye Stuart/Tungsten 24 Gram, Pacific 95% 23 Gram, Aim4it 16 Gram Softtip Pil, Aim4it 18 Gram Softtip Pil, Aim4it 80% Tungsten 22 Gram, Bulls Eye 15 Gram 80% Tungsten Soft, Lucky Tech Integral Wp/B Rain Pants, Hurricane 4, Comfort Shelter and Expedition10 – Medium-Right-Blue Ribbon, with frequencies of 761, 585, 526, 525, 426, 409, 402, 390, 372 respectively.

Table 2: Days concerning most numbers of orders placed by the customers

An analysis of the days on which customers placed the most orders shows that the top 5 days are 6th December 2016, 19th December 2016, 26th December 2016, 13th December 2016 and 27th December 2016, as shown in the table above.

An analysis of the suppliers in terms of quantity ordered shows that the top 3 suppliers are Eclipse inc., 3Top Sports and Luna sastreria S.A., and the bottom 3 suppliers are Dutchman Bike, A Pereir a Sports and SportPharma Inc.

Fig 5: Product line in terms of frequency and quantity ordered

The above figure depicts the product lines in terms of frequency and quantity ordered. Sports is the most popular product line in terms of both frequency and quantity ordered, whereas Children is the least popular product line on both measures.

Fig 6: Countries with respect to each product line

An analysis of the top 5 countries for each product line shows that the United States, France, Germany, Spain and the United Kingdom are the top 5 countries for the product line Sports.

France, the United States, the United Kingdom, Spain and the Netherlands are the top 5 for the product line Children.

The United States, Italy, the United Kingdom, Germany and France are the top 5 for the product line Clothes and Shoes.

The United States, Germany, Italy, the Netherlands and Australia are the top 5 for the product line Outdoors.

Fig. 7 The trend of orders placed over the year

The above figure shows a linear trend in the orders placed over the year, with a low correlation coefficient.

Fig 8: Decision Tree for quantity ordered

A decision tree is built for the quantity ordered. The important variables are product-related variables, e.g. product group, product category and product line. Decision theory deals with methods for determining the optimal course of action when a number of alternatives are available and their consequences cannot be forecast with certainty. It is difficult to imagine a situation that does not involve such decision problems, but we limit ourselves mainly to problems occurring in business, with consequences that can be described in dollars of profit or revenue, cost or loss. For these problems, it may be reasonable to consider as the best alternative the one that results in the highest profit or revenue, or the lowest cost, on average in the long run. This criterion of optimality is not without shortcomings; however, it should serve as a useful guide to action in repetitive situations where the consequences are not critical.

Fig. 9: Visualisation of the Decision tree

The simplest decision problems can be resolved by listing the possible monetary consequences and the associated probabilities for each alternative, calculating the expected monetary value of every alternative, and selecting the alternative with the highest expected monetary value. Determining the optimal alternative becomes somewhat more complicated when the problem involves sequences of decisions, which is what a decision tree lays out.
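The expected-monetary-value rule described above can be sketched in a few lines of Python. This is only an illustration and is not part of the SAS analysis: the `Outcome` class, the function names and the two alternatives with their probabilities and payoffs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    probability: float  # chance that this outcome occurs
    payoff: float       # profit (positive) or loss (negative) in dollars


def expected_monetary_value(outcomes):
    """Probability-weighted average payoff of one alternative."""
    total_probability = sum(o.probability for o in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities of an alternative must sum to 1")
    return sum(o.probability * o.payoff for o in outcomes)


def best_alternative(alternatives):
    """Return (name, EMV) of the alternative with the highest EMV."""
    scored = {name: expected_monetary_value(outs)
              for name, outs in alternatives.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical alternatives, invented purely for illustration.
alternatives = {
    "stock more": [Outcome(0.5, 1000.0), Outcome(0.5, -400.0)],
    "stock less": [Outcome(0.5, 500.0), Outcome(0.5, 0.0)],
}
name, emv = best_alternative(alternatives)
print(name, emv)
```

With these made-up figures, "stock more" has the higher expected value even though it carries the risk of a loss, which is the shortcoming of the criterion noted earlier: it looks only at the long-run average.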

Fig. 9: Geomap of the supplier’s countries

A geomap of the suppliers’ countries is shown in the figure above. A geomap is a map of a country, continent or region, with colours and values assigned to specific regions. Values are displayed as a colour scale, and optional hover text can be specified for regions. The map is rendered in the browser using an embedded Flash player. Note that the map is not draggable, but it can be configured to allow zooming.

Fig 10: Cluster analysis on the products related data

Cluster analysis is carried out on the product-related data, as shown in the above figure. Cluster analysis is the process of grouping observations of similar kinds into smaller groups within the larger population. It has widespread application in business analytics. One of the questions facing organisations is how to organise the huge amounts of available data into meaningful structures, or how to break a large heterogeneous population into smaller homogeneous groups. Cluster analysis is an exploratory data analysis tool that aims to sort different objects into clusters such that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise (Abonyi, J., & Feil, B. 2007).

Cluster analysis has been used in marketing for various purposes. Consumers can be segmented with cluster analysis on the basis of the benefits they seek from purchasing a product, and it can be used to identify homogeneous groups of buyers (Anderberg, M. R. 1973).

The variables on which the cluster analysis is based should be chosen with past research in mind. They should also be selected on the basis of theory, the hypotheses being tested and the judgment of the researcher. An appropriate measure of distance or similarity should be selected; the most commonly used measure is the Euclidean distance or its square.
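As an illustration of the distance measure mentioned above, the following Python sketch computes the Euclidean distance and its square, and assigns each observation to its nearest cluster centre. It is not the procedure SAS Visual Analytics runs internally; the function names, the (cost, retail price) observations and the cluster centres are hypothetical.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared differences between two observations."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two observations."""
    return math.sqrt(squared_euclidean(a, b))


def nearest_cluster(observation, centroids):
    """Index of the centroid closest to the observation."""
    return min(range(len(centroids)),
               key=lambda i: squared_euclidean(observation, centroids[i]))


# Hypothetical (cost, retail price) observations and cluster centres.
centroids = [(10.0, 20.0), (100.0, 220.0)]
observations = [(13.0, 24.0), (90.0, 200.0), (12.0, 18.0)]
labels = [nearest_cluster(obs, centroids) for obs in observations]

print(euclidean((13.0, 24.0), (10.0, 20.0)))  # 5.0
print(labels)                                 # [0, 1, 0]
```

Using the squared distance for the comparison avoids a square root and gives the same nearest centre, which is one reason the squared form is often used in practice.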

Fig 11: The correlation and correlation matrix between quantity ordered, cost, retail price and discount in per cent

In SAS Visual Analytics, correlations are calculated using Pearson’s product-moment correlation coefficient. This calculation takes two measures and determines how strongly they are linearly related. The correlation value can range from -1 to 1. A value between -1 and 0 indicates a negative correlation, which means that as one measure increases the other decreases. A correlation of 0 indicates no linear relationship at all. Values between 0 and 1 indicate a positive correlation, which means that as one measure increases so does the other. SAS classifies the strength of these correlations as Weak, Moderate or Strong.
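For readers who want to see the calculation itself, here is a minimal Python sketch of Pearson’s coefficient. It follows the standard textbook formula rather than SAS code, and the `cost`, `retail_price` and `discount` values are made up so that the two extreme cases are easy to check.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two measures."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant measure")
    return covariation / math.sqrt(var_x * var_y)


# Hypothetical values, chosen only to show the two extremes.
cost = [10.0, 20.0, 30.0, 40.0]
retail_price = [25.0, 45.0, 65.0, 85.0]  # rises in step with cost
discount = [40.0, 30.0, 20.0, 10.0]      # falls as cost rises

print(pearson_r(cost, retail_price))  # perfectly positive
print(pearson_r(cost, discount))      # perfectly negative
```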

For a correlation matrix, there are two options in the Roles tab under correlations for showing the relationships between the measures of interest. The option that works within one set of measures takes several measures and displays them in a matrix against themselves so that, in a triangular layout, every measure’s correlation with each of the others is visible.
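The triangular layout can be sketched as follows: each pair of measures is computed once, because the correlation of A with B equals that of B with A. This is an illustrative sketch and not the SAS implementation; `correlation_triangle` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two measures."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(var_x * var_y)


def correlation_triangle(measures):
    """Map each unordered pair of measure names to its Pearson r."""
    names = list(measures)
    return {
        (names[i], names[j]): pearson_r(measures[names[i]], measures[names[j]])
        for i in range(len(names))
        for j in range(i)  # only pairs below the diagonal
    }


# Hypothetical values for the four measures discussed in the report.
measures = {
    "quantity": [1.0, 2.0, 1.0, 2.0],
    "cost": [10.0, 20.0, 30.0, 40.0],
    "retail_price": [25.0, 45.0, 65.0, 85.0],
    "discount": [40.0, 30.0, 20.0, 10.0],
}
triangle = correlation_triangle(measures)
for (row, column), r in triangle.items():
    print(f"{row:>12} vs {column:<12} {r:+.2f}")
```

Four measures give six pairs, which is the number of cells in the triangle below the diagonal.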

The figure above presents the correlation analysis and the correlation matrix between quantity ordered, cost, retail price and discount in per cent. A significant correlation is found between cost and retail price.

 

Conclusion:

The main objective of this assignment is to provide students with practical experience of working in teams to write a data analytics report that offers useful insights, patterns and trends from the given big dataset. This activity gave students the opportunity to show innovation and creativity in applying SAS Analytics and designing useful visualisations and predictive solutions for various analytics problems.

The orders placed over the year show a linear trend with a low correlation coefficient. The correlation analysis between quantity ordered, cost, retail price and discount per cent shows a significant correlation between cost and retail price.


References:

Bridgewater, Adrian (2010). “JMP Genomics 5: Data Visualization & Exploration”. Dr. Dobb’s Journal. Retrieved May 31, 2012.

Der, G.; B. S. Everitt (2009). “Basic Statistics using SAS Enterprise Guide”. Journal of the Royal Statistical Society, Series A. 172 (2): 530. doi:10.1111/j.1467-985X.2009.00588_2.x

Adams, M.N. (2010) Perspectives on Data Mining. International Journal of Market Research 52(1), 11–19

Bakshi, K. (2012) Considerations for Big Data: Architecture and Approaches. In: Proceedings of the IEEE Aerospace Conference, pp. 1–7

Cebr: Data equity, Unlocking the value of big data. in: SAS Reports, pp. 1–44 (2012)

Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J.M., Welton, C. (2009) MAD Skills: New Analysis Practices for Big Data. Proceedings of the ACM VLDB Endowment 2(2), 1481–1492

Abonyi, J., & Feil, B. (2007). Cluster analysis for data mining and system identification. Boston, MA: Birkhäuser Basel.

Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis. Newbury Park, CA: Sage Publications.

Anderberg, M. R. (1973). Cluster analysis for applications. New York: Academic Press.

Arabie, P., Carroll, J. D., & DeSarbo, W. S. (1987). Three-way scaling and clustering. Newbury Park, CA: Sage Publications.

Everitt, B. S. (1980). Cluster analysis. Quality and Quantity, 14(1), 75-100.

Everitt, B. S., Landau, S., & Leese, M. (2001). Cluster analysis (4th ed.). London: Arnold.
