Define data mining as an enabling technology for business analytics
Data mining is the process of analyzing data from various sources, which may not be uniform, and summarizing the relevant information so that it can guide a company's decisions towards policies that decrease costs and increase revenue. It is carried out on huge data sets containing millions of instances. Data mining finds patterns in the data, and these patterns yield information.
Objectives and benefits
The main aim of data mining is to identify structures in unstructured data. Because the data sets on which mining is carried out are so large, most of the data is unstructured. Unstructured data contains many irrelevant variables that take many forms, such as strings and numeric values of various types. Finding structures in this data gives a business better knowledge of its customer base. With this knowledge, the business can come up with better strategies to increase revenues and decrease costs. The benefits of data mining vary according to the field in which mining is done. In marketing, for example, building models from mined data allows marketers to design strategies. In finance, data mining allows lending institutions to determine the creditworthiness of people before issuing loans.
Standardized data mining processes
The standardized data mining process contains five steps, which are grouped into two phases. The first phase is data preparation, which includes cleaning the data, selecting relevant variables and transforming the data into a form suitable for analysis. The second phase is the mining itself, in which algorithms search for patterns and the knowledge extracted from the data is represented.
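The two phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names and the choice of "items bought together" as the pattern are all invented for the example and are not part of any standard.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records; every field name and value is invented.
raw_records = [
    {"customer": "A", "age": "34", "basket": "bread,milk", "notes": "n/a"},
    {"customer": "B", "age": "", "basket": "bread,butter", "notes": ""},
    {"customer": "C", "age": "51", "basket": "bread,milk,eggs", "notes": "vip"},
    {"customer": "D", "age": "29", "basket": "milk,eggs", "notes": ""},
]

# ---- Phase 1: data preparation ----
def clean(records):
    """Step 1: drop records with missing values in the fields we rely on."""
    return [r for r in records if r["age"] and r["basket"]]

def select(records, fields=("age", "basket")):
    """Step 2: keep only the variables considered relevant."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Step 3: convert strings into types suitable for analysis."""
    return [
        {"age": int(r["age"]), "basket": frozenset(r["basket"].split(","))}
        for r in records
    ]

# ---- Phase 2: mining ----
def find_patterns(records):
    """Step 4: count how often each pair of items is bought together."""
    pairs = Counter()
    for r in records:
        for pair in combinations(sorted(r["basket"]), 2):
            pairs[pair] += 1
    return pairs

def represent(patterns, total):
    """Step 5: present the patterns as readable statements."""
    return [
        f"{a} & {b}: bought together in {n}/{total} baskets"
        for (a, b), n in patterns.most_common()
    ]

prepared = transform(select(clean(raw_records)))
patterns = find_patterns(prepared)
summary = represent(patterns, len(prepared))
for line in summary:
    print(line)
```

Customer B is dropped during cleaning because the age field is empty, so the patterns are computed on three baskets.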
Methods and algorithms for mining
Data mining techniques are mainly based on machine learning algorithms and statistical models such as correlations. The methods and algorithms for mining include correlation analysis, classification (for example Bayes classification and decision tree induction), sequential pattern mining, outlier detection and cluster analysis, among others.
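Two of the methods named above, correlation analysis and outlier detection, are simple enough to sketch with the Python standard library. The data values and the z-score threshold of 2.0 are assumptions chosen for illustration; real projects would normally rely on a statistics or machine learning package.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical data: sales rise linearly with advertising spend.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]
r = pearson(ad_spend, sales)

# Hypothetical order values with one suspicious entry.
order_values = [52, 48, 50, 51, 49, 50, 47, 53, 500]
outliers = zscore_outliers(order_values)

print(f"correlation: {r:.3f}")
print(f"outliers: {outliers}")
```

Because the sales figures are an exact linear function of the spend, the correlation comes out at 1 (up to floating-point rounding), and only the order of 500 is flagged as an outlier.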
Existing data mining software tools.
There is specialized software in the market whose main purpose is data mining. Mining can also be carried out with statistical packages such as SPSS and with packages for programming languages such as R and Python. Some of the software in the market, such as MEPX, mlpack and Massive Online Analysis, is free. Examples of proprietary software include Vertica, STATISTICA and pSeven.
Privacy issues, pitfalls and myths of data mining.
The growth of technology has brought with it privacy concerns about the data collected. Data mining requires huge data sets, which means that a lot of personal data ends up in databases. There is fear that this information may be used to harm or target individuals, especially with the emergence of hackers. The information acquired through data mining is meant to be used ethically, but this may be overlooked in some instances. There is also a myth that information from data mining is extremely accurate. This is not the case: wrong data may lead to bad decisions, which may affect the business negatively.
Text analytics and the need for text mining.
Text analytics is the automated process of converting large volumes of text that have no defined structure into quantitative data. Data collected as string values cannot be analyzed quantitatively in its raw form. Mining text data for text analytics helps to gain insights and to uncover patterns and trends.
Differentiate among text analytics, text mining, and data mining.
The difference between text mining and text analytics is minimal, because both processes have the end goal of gaining useful information from unstructured text. However, results from text mining are qualitative because of the string nature of the responses, while the results from text analytics can be quantified to provide information. Data mining is related to the two processes as it has the same end goal. However, data mining differs from the two in that it combines both qualitative and quantitative processes. Data used in mining has variables of different forms such as integers, short strings and long strings. Text mining and text analytics can be considered subsets of data mining techniques.
The process of carrying out a text mining project.
The text mining process involves a series of steps. The first step is cleaning the text to remove unwanted content. The text data then undergoes tokenization, which splits the text on punctuation marks and white space; at this stage abbreviations are not recognized, so they may be split apart. Each token is then assigned a word class and labelled accordingly. The processes of attribute generation, attribute selection, data mining and evaluation are then carried out on the data. Attribute (feature) selection is the most important step in the text mining process, because it picks out the important variables. These variables are the basis of any model. If significant variables are eliminated, any model developed will be significantly less accurate.
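The first steps of this process can be sketched as follows. This is a minimal illustration, not a complete pipeline: the sample documents, the stop-word list and the rule "keep terms that occur in at least two documents" are assumptions, and word-class tagging, mining and evaluation are left out.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real lists are much longer.
STOPWORDS = {"the", "is", "a", "and", "was", "it", "to", "of"}

def clean(text):
    """Cleaning: lower-case the text and strip simple markup tags."""
    return re.sub(r"<[^>]+>", " ", text.lower())

def tokenize(text):
    """Tokenization: split on punctuation and white space.
    Abbreviations are not recognized, so "e.g." becomes "e" and "g"."""
    return re.findall(r"[a-z]+", text)

def generate_attributes(tokens):
    """Attribute generation: count each non-stop-word term."""
    return Counter(t for t in tokens if t not in STOPWORDS)

def select_attributes(doc_counts, min_docs=2):
    """Attribute selection: keep terms found in at least `min_docs` documents."""
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)

# Hypothetical customer comments.
docs = [
    "The delivery was fast and the packaging was great.",
    "Delivery was slow, e.g. two weeks late.",
    "Great price and fast <b>delivery</b>!",
]

doc_counts = [generate_attributes(tokenize(clean(d))) for d in docs]
selected = select_attributes(doc_counts)

# Each document becomes a numeric vector over the selected attributes.
vectors = [[counts[term] for term in selected] for counts in doc_counts]

print(selected)
print(vectors)
```

The resulting vectors are the quantitative form of the text on which a mining algorithm could then be run.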
Describe sentiment analysis
Sentiment analysis measures the attitude of a customer towards a brand or a certain product through the use of computer algorithms, such as those of natural language processing. It is based on how negative or positive the customer's comments are.
Applications of sentiment analysis, methods for sentiment analysis, speech analytics and relation to sentiment analysis
Sentiment analysis can be applied in various fields. In business, it is mainly used to analyze customer feedback. Another application is in the stock markets. Stock prices are highly volatile and are strongly influenced by the image of a company, so analyzing people's sentiments about a company is likely to yield opportunities for stock market traders. Positive reviews suggest that a company's share price is likely to go up, while negative reviews suggest that it is likely to go down. The methods used in sentiment analysis can be grouped into three categories: knowledge-based techniques, statistical methods and hybrid approaches. The knowledge-based method classifies text according to the presence of words that indicate perspective, such as bad and good. The statistical method uses machine learning techniques to test for the presence of sentiments. The hybrid method combines statistical techniques, machine learning and the representation of knowledge. Data collected for mining sometimes includes audio, for example from customer care centres that record their calls. This data is important as it shows the sentiments of the customer. Speech analytics is applied to this data to gauge the attitude of the customer while he or she is receiving customer care services. Speech analytics can be described as a subgroup of sentiment analysis.
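The knowledge-based method is the easiest of the three to illustrate. The sketch below classifies a comment by counting words from two small word lists; the lists and the sample comments are assumptions made up for this example, and the approach deliberately ignores negation, so "not good" would still count as positive.

```python
import re

# Hypothetical sentiment lexicons; real ones contain thousands of entries.
POSITIVE = {"good", "great", "excellent", "happy", "love"}
NEGATIVE = {"bad", "poor", "terrible", "unhappy", "hate"}

def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives

def classify(text):
    """Label a comment as positive, negative or neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

comments = [
    "The service was good and the staff were great",
    "Terrible delay and bad support",
    "The parcel arrived on Tuesday",
]
for comment in comments:
    print(classify(comment), "-", comment)
```

A statistical method would replace the fixed word lists with a model trained on labelled comments, and a hybrid method would combine the two.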
Applications of prescriptive analytics techniques
Prescriptive analytics uses data and machine learning techniques to help businesses make decisions for the near future. In journalism, it can be used to determine which story is likely to attract the most readers and which one should not be included. In health care, prescriptive analytics can be used to determine which patients are likely to be readmitted to hospital. Models built on prescriptive analytics can also be used to set the value of commodities that have no fixed value or whose prices are influenced by several factors. In the tourism industry, for example, hotel prices can be adjusted automatically by such models based on season and demand, among other factors.
The basic concept of analytical decision modelling and understand the concepts of analytical models for selected decision problems.
Decision analysis applies quantitative methods explicitly to analyze decisions made under uncertain conditions. Analytical decision modelling allows experts to model the expected consequences of adopting a strategy while considering all the factors that will affect it. These factors are assigned varying weights in the model depending on criteria such as correlation. Take the example of the medical industry, where there are different procedures for treating the same ailment, with outcomes that differ depending on the patient's body. By taking all the characteristics of a patient into consideration, it is possible to use probabilities to determine the consequences of adopting any procedure.
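One common way to turn such probabilities into a decision is to compare expected values. The sketch below assumes two unnamed procedures whose outcomes are scored on an invented 0 to 100 scale; all probabilities and scores are hypothetical and carry no medical meaning.

```python
# Each option maps to a list of (probability, outcome score) pairs.
# All numbers are invented for illustration only.
options = {
    "procedure_a": [(0.70, 90), (0.20, 50), (0.10, 0)],
    "procedure_b": [(0.50, 80), (0.45, 60), (0.05, 20)],
}

def expected_value(outcomes):
    """Probability-weighted average of the outcome scores."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * score for p, score in outcomes)

expected = {name: expected_value(outcomes) for name, outcomes in options.items()}
best = max(expected, key=expected.get)

for name, value in expected.items():
    print(f"{name}: expected score {value:.1f}")
print("preferred option:", best)
```

In a fuller model the probabilities would themselves depend on the characteristics of the patient rather than being fixed constants.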
Describe how spreadsheets can be used for analytical modelling and solutions. Explain the basic concepts of optimization and when they should be used.
Spreadsheets act as decision-support systems in analytical modelling. Data is stored in tabular form in spreadsheets, which provide user-friendly business modelling tools. It is possible to build models that automatically help determine the weights of different business variables and thus calculate outcomes for given scenarios. Solutions can be developed from the forecasts of these models. However, a model usually yields several possible solutions. To find the best one, optimization concepts should be applied: the optimal solution is found by finding the right balance among the weights.
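The idea of choosing the best among several feasible solutions can be sketched as a small product-mix problem. The profit figures and resource limits below are hypothetical, and the exhaustive search is used only because it is easy to read; a spreadsheet solver would typically use a more efficient algorithm.

```python
from itertools import product

# Hypothetical model: units of two products, x and y.
def profit(x, y):
    """Objective: profit of 30 per unit of x and 50 per unit of y."""
    return 30 * x + 50 * y

def feasible(x, y):
    """Constraints: limited labour hours and limited material."""
    labour_ok = 2 * x + 4 * y <= 100
    material_ok = 3 * x + 2 * y <= 90
    return labour_ok and material_ok

# Enumerate every whole-number plan within generous bounds
# and keep only the plans that satisfy the constraints.
candidates = [
    (x, y)
    for x, y in product(range(0, 31), range(0, 26))
    if feasible(x, y)
]

# The optimal solution is the feasible plan with the highest profit.
best_plan = max(candidates, key=lambda plan: profit(*plan))
best_profit = profit(*best_plan)

print("feasible plans:", len(candidates))
print("best plan (x, y):", best_plan)
print("best profit:", best_profit)
```

Many plans are feasible, but only one gives the highest profit, which is what distinguishes an optimal solution from a merely workable one.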
Explain sensitivity analysis, what-if analysis, goal-seeking. Concepts and various applications of simulation. Potential applications of discrete event simulation.
Spreadsheets can be used in sensitivity analysis, what-if analysis, goal-seeking and simulation. Sensitivity analysis measures how much the output of a system varies when its inputs change. What-if analysis is the changing of values in a spreadsheet to see how the outcomes of a model change. Goal seeking allows experts to derive the initial condition needed to reach a desired outcome of the model. Simulation can be applied in many fields, such as in medical education to allow students to learn about the human brain. Simulation is also used in gaming, in designing queuing systems and in the transport industry to design roads. Discrete event simulation can be used in the health care industry to determine the best procedures for patients.
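The first three techniques can be illustrated on a single profit model written as a Python function instead of a spreadsheet. The model, its input values, the 10% bump and the bisection search used for goal seeking are all assumptions made for this sketch.

```python
def profit(price=10.0, units=1000.0, unit_cost=6.0, fixed_cost=3000.0):
    """Hypothetical model: margin on every unit sold minus fixed costs."""
    return units * (price - unit_cost) - fixed_cost

baseline = {"price": 10.0, "units": 1000.0, "unit_cost": 6.0, "fixed_cost": 3000.0}

# What-if analysis: change one input and look at the new outcome.
base_profit = profit(**baseline)
what_if_profit = profit(**{**baseline, "price": 12.0})

def sensitivity(model, inputs, bump=0.10):
    """Sensitivity analysis: change in output when each input rises by `bump`."""
    base_out = model(**inputs)
    return {
        name: model(**{**inputs, name: value * (1 + bump)}) - base_out
        for name, value in inputs.items()
    }

def goal_seek(model, inputs, name, target, low, high, tol=1e-6):
    """Goal seeking: find the value of input `name` that makes the model
    output equal `target`, by bisection between `low` and `high`."""
    def gap(value):
        return model(**{**inputs, name: value}) - target
    if gap(low) * gap(high) > 0:
        raise ValueError("target is not bracketed by low and high")
    while high - low > tol:
        mid = (low + high) / 2
        if gap(low) * gap(mid) <= 0:
            high = mid
        else:
            low = mid
    return (low + high) / 2

effects = sensitivity(profit, baseline)
break_even_units = goal_seek(profit, baseline, "units", 0.0, 0.0, 5000.0)

print("baseline profit:", base_profit)
print("profit if price is 12:", what_if_profit)
print("effect of +10% on each input:", effects)
print("units needed to break even:", round(break_even_units))
```

In this model the output is most sensitive to price, and goal seeking shows how many units must be sold before the profit reaches zero.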