All posts by oddur.bjarnason@gmail.com

Data Mining

Dr. Saed Sayad has published a fine data mining map called “An Introduction to Data Mining” at http://www.saedsayad.com.

The map shows the stages and substages of the data mining process, with a wealth of information about relevant methods.

Dr. Sayad supplies the following information about himself:

“I have more than 20 years of experience in data mining, statistics and artificial intelligence and designed, developed and deployed many business and scientific applications of predictive modeling. I am a pioneer researcher in real time data mining, an adjunct Professor at the University of Toronto, and have been presenting a popular graduate data mining course since 2001.”

Dr. Sayad has written an excellent book called “Real Time Data Mining”. His description of the book's content is an apt characterization of data mining:

“Data mining is about explaining the past and predicting the future by exploring and analyzing data. Data mining is a multi-disciplinary field which combines statistics, machine learning, artificial intelligence and database technology. Although data mining algorithms are widely used in extremely diverse situations, in practice, one or more major limitations almost invariably appear and significantly constrain successful data mining applications. Frequently, these problems are associated with large increases in the rate of generation of data, the quantity of data and the number of attributes (variables) to be processed: Increasingly, the data situation is now beyond the capabilities of conventional data mining methods. The term Real Time is used to describe how well a data mining algorithm can accommodate an ever increasing data load instantaneously. Upgrading conventional data mining to real time data mining is through the use of a method termed the Real Time Learning Machine or RTLM. The use of the RTLM with conventional data mining methods enables Real Time Data Mining. The future of predictive modeling belongs to real time data mining and the main motivation in authoring this book is to help you to understand the method and to implement it for your applications.”
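The quotation does not say how the RTLM works, and the sketch below should not be read as an implementation of it. It only illustrates the general idea of accommodating a growing data load without reprocessing the history: running sums for two variables are updated in constant time per new observation, sums from separate data partitions are merged by addition, and a least-squares line can be refitted from the sums at any moment. The class name `RunningXY` and the example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningXY:
    """Hypothetical running sums for two variables x and y; each update is O(1)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, x: float, y: float) -> None:
        # Absorb one new observation without revisiting earlier data.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def merge(self, other: "RunningXY") -> "RunningXY":
        # Sums computed on two data partitions combine by simple addition.
        return RunningXY(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                         self.sxx + other.sxx, self.sxy + other.sxy)

    def fit(self) -> tuple[float, float]:
        # Least-squares intercept a and slope b for y = a + b*x, from the sums alone.
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b


stats = RunningXY()
for x, y in [(1, 3), (2, 5), (3, 7)]:
    stats.update(x, y)
print(stats.fit())  # (1.0, 2.0)
```

Plain sums of squares can lose precision when values are large or nearly constant; a production version would more likely use a numerically stabler update such as Welford's method.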

The image below illustrates the extraction of data from a bottomless black hole of big data.

http://www.proscoutleadgeneration.com/wp-content/uploads/2014/04/datamining.jpg

Searching for Gold.

http://www.grtcorp.com/content/big-data-blues-dangers-data-mining

Data Mining – Page

Decision Making Methods

A large number of decision making methods have been developed. A few of them are listed below:

Decision Trees
Influence Diagrams
Multicriteria Decision Analysis
Analytic Hierarchy Process
Analytic Network Process
Hierarchical Influence Diagrams
PAPRIKA Method: “potentially all pairwise rankings of possible alternatives”

These methods are implemented by various software packages, for example:

TreeAge Pro
AgenaRisk
1000minds
Analytica Free
PriEsT

 Decision Making Methods – Page

Decision Analysis Software

Decision making software is a tool intended to support the decision making process but not to replace it. The software frees decision makers from the technical details of the decision-making method employed and makes it possible for them to focus on fundamental value judgments.

A large number of software packages are available. Their quality and price vary widely. Some of the packages are exorbitantly expensive; others are inexpensive or even free but nevertheless of high quality. I have selected the following packages for my own use:

TreeAge Pro
AgenaRisk
1000minds
Analytica Free
PriEsT

These packages may be briefly characterized as follows:

TreeAge Pro is a visual modeling tool for building and analyzing decision trees, influence diagrams and Markov models. It employs Bayes analysis and multicriteria decision making.
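Decision trees of the kind TreeAge Pro builds are evaluated by “rolling back”: a chance node takes the probability-weighted average of its branches and a decision node takes its best branch. The following is a minimal sketch of that calculation, not TreeAge's own engine; the node classes, the treat/wait example and its payoffs (in arbitrary utility units) are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Terminal:
    payoff: float                      # value of reaching this outcome


@dataclass
class Chance:
    branches: list                     # (probability, subtree) pairs


@dataclass
class Decision:
    options: dict                      # option label -> subtree


Node = Union[Terminal, Chance, Decision]


def rollback(node: Node) -> float:
    """Expected value of a node, folding back from the leaves."""
    if isinstance(node, Terminal):
        return node.payoff
    if isinstance(node, Chance):
        total = sum(p for p, _ in node.branches)
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"probabilities sum to {total}, not 1")
        return sum(p * rollback(child) for p, child in node.branches)
    if isinstance(node, Decision):
        # An expected-value maximizer picks the best option.
        return max(rollback(child) for child in node.options.values())
    raise TypeError(f"unknown node type: {type(node).__name__}")


def best_option(decision: Decision) -> str:
    return max(decision.options, key=lambda k: rollback(decision.options[k]))


# Hypothetical choice between treating now and waiting.
tree = Decision({
    "treat": Chance([(0.75, Terminal(100)), (0.25, Terminal(-40))]),  # EV 65
    "wait": Terminal(50),
})
print(best_option(tree), rollback(tree))  # treat 65.0
```

Real decision models usually add costs along the branches and sensitivity analysis on the probabilities; the roll-back rule itself stays the same.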

AgenaRisk uses the latest developments from the field of artificial intelligence and visualisation to solve complex, risky problems. AgenaRisk enables decision-makers to measure and compare different risks in a way that is repeatable and auditable. The AgenaRisk solution includes predictive analytics and scales up to organisational-level risk monitoring and assessment. It is ideal for risk scenario planning.

1000minds is an online decision-making software package for multi-criteria decision making. The software implements the “potentially all pairwise rankings of possible alternatives” (PAPRIKA) method.

Analytica is visual decision-making software. It combines hierarchical influence diagrams for visually creating and viewing models, arrays for working with multidimensional data, Monte Carlo simulation for analyzing risk and uncertainty, and optimization, including linear and nonlinear programming.
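Monte Carlo simulation, one of the techniques listed for Analytica, propagates uncertainty by sampling the uncertain inputs many times and studying the distribution of the result. The sketch below is a plain-Python illustration, not Analytica's engine; the profit model and every number in it (price, costs, distributions) are hypothetical.

```python
import random
import statistics


def simulate_profit(n_trials: int = 10_000, seed: int = 1) -> dict:
    """Monte Carlo estimate of profit under uncertain demand and unit cost.

    All inputs are hypothetical: price 20, fixed cost 6000,
    demand ~ Normal(1000, 200) truncated at 0,
    unit cost ~ Triangular(low=8, high=14, mode=10).
    """
    rng = random.Random(seed)          # seeded so runs are reproducible
    price, fixed_cost = 20.0, 6000.0
    profits = []
    for _ in range(n_trials):
        demand = max(0.0, rng.gauss(1000, 200))
        unit_cost = rng.triangular(8, 14, 10)
        profits.append((price - unit_cost) * demand - fixed_cost)
    return {
        "mean": statistics.mean(profits),
        "p05": statistics.quantiles(profits, n=20)[0],   # 5th percentile
        "p_loss": sum(p < 0 for p in profits) / n_trials,
    }


print(simulate_profit())
```

With these assumptions the analytic mean is (20 - 32/3) * 1000 - 6000, roughly 3333, so the simulated mean should land close to it; the 5th percentile and the probability of a loss are what the simulation adds beyond a single point estimate.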

PriEsT is open-source decision-making software that implements the analytic hierarchy process (AHP) method.
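The core AHP calculation that a tool such as PriEsT automates is turning a reciprocal matrix of pairwise judgments into priority weights and checking how consistent the judgments are. The sketch below uses Saaty's principal-eigenvector approach, computed by power iteration, together with his consistency ratio; it is one illustrative choice and not necessarily how PriEsT computes its priorities. The criteria and judgments are hypothetical.

```python
# Saaty's commonly cited random consistency indices, keyed by matrix size n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_priorities(matrix, iterations=1000, tol=1e-12):
    """Return (weights, lambda_max, consistency_ratio) for a reciprocal matrix."""
    n = len(matrix)
    if n not in RANDOM_INDEX:
        raise ValueError("this sketch only has random indices for n <= 10")
    w = [1.0 / n] * n
    for _ in range(iterations):
        # One power-iteration step: multiply by the matrix, then normalize.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        w = v
        if converged:
            break
    # Estimate the principal eigenvalue from A*w = lambda_max*w.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri else 0.0
    return w, lambda_max, cr


# Hypothetical judgments: cost vs quality vs service on Saaty's 1-9 scale.
judgments = [
    [1,     3,     5],
    [1 / 3, 1,     3],
    [1 / 5, 1 / 3, 1],
]
weights, lam, cr = ahp_priorities(judgments)
print([round(x, 3) for x in weights], round(cr, 3))
```

A consistency ratio below about 0.10 is conventionally taken as acceptable; larger values suggest revisiting the judgments.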

Decision Analysis Software – Page

Establishing a Blog about Data Analysis and Associated Decision Analysis

I have long been interested in data analysis and decision analysis, partly in connection with my work as a physician and psychiatrist and partly in connection with various other interests.

I have a large amount of material related to data analysis and decision analysis on my computer or accessible through it, and I often work on data analytic and decision analytic projects.

I have now decided to establish a WordPress blog about data analysis and associated decision analysis. I realize that the likelihood of anybody reading the blog is minimal. In February 2014 there were 75.8 million WordPress blogs in existence, in addition to hundreds of millions on other blogging services. The likelihood of anybody finding the blog, and finding it interesting enough to actually read it, is less than the likelihood of finding a message in a bottle or the proverbial needle in a haystack. Search engines would, of course, function analogously to loupes or magnets.

Message in a bottle buried in sand
Needle in the haystack

 

 

Nevertheless, writing the blog will be of value to me. It will help me organize the material and thoughts that I have about data analysis and associated decision analysis, and the possibility that somebody might read the blog with critical eyes will make me strive to improve the quality of my thinking and writing. It will discipline me.

Establishing a Blog …-Page

The Rising Data Wave

A large and rapidly increasing amount of data is being generated and collected in connection with every human activity. Much of this data is stored and can be accessed on the Internet, sometimes in very large datasets. The ability to analyze these datasets, converting their information content into the knowledge, understanding and insight needed to make decisions and to implement them through corresponding actions, is becoming increasingly important. Until recently this ability was limited to large organizations and institutions using large computers, even supercomputers. The development of relatively inexpensive personal computers with increasing computing power, together with a number of relatively inexpensive and effective software packages, has made it possible for ordinary people to analyze large datasets.

The Wave - Hokusai

The availability of a large amount of data and effective software is not enough to derive reliable knowledge from the information in the data, to make reliable decisions and to take effective actions. To do this, it is necessary to adhere to the stages of an orderly data analytic process and to execute the process competently.

The Rising Data Wave – Page

 

Data Analysis Software

The recent tidal wave of data has given rise to the development of a large number of software programs relevant to the analysis of the data. From a long list of programs I have chosen the following for my own use:

  1. Tableau
  2. DataDesk
  3. StatCrunch
  4. BestView – add-on to Mathematica
  5. Mathematica
  6. R
  7. ParallAX
  8. NeuroSolutions
  9. Gephi
  10. Ayasdi

Some of these programs are preexisting programs that have been adapted to the requirements of big data; others, for example Tableau and Ayasdi, are new. The programs I have chosen are not necessarily the best for everyone, but they are the best for my present needs.

Data Analysis Software – Page

Data and Decision Analytic Process

The data and decision analytic process is a path leading from the larva of data to the butterfly of knowledge, understanding, and insight.

Before starting to work on data analytic and associated decision analytic projects it is necessary, in order to ensure the quality of the results, to

  1. define an orderly data analytic and decision analytic process
  2. select methods for executing the process
  3. select software packages for implementing the methods

In order to ensure the reliability of the answers/solutions and the quality of the decisions made and actions taken, it is necessary to adhere to the analytic process in an orderly manner and to apply the methods and software packages competently.

Before starting an analytic process it is necessary to state the question/problem under consideration and ask the following preliminary questions:

  1. Is the answer/solution considered to be known?
  2. Is the answer/solution based on sufficiently recent/reliable data?
  3. Was the analysis performed in a competent/reliable manner?
  4. Are the results of the analysis presented/visualized in a way that sufficiently increases the understanding and insight of the target group?
  5. Do the results of the analysis, their presentation/visualization, and the resulting understanding and insight form a sufficiently firm basis for decision making and action?

If the answer to any of these questions is no, there may be reason to go ahead with the data analytic and decision analytic process. If all the answers are yes, it is unnecessary to go ahead with the process unless you are confident that you can improve the results materially or introduce your particular results to a new or wider audience. But beware of hubris.
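The go/no-go rule for the preliminary questions can be written down as a small function, which makes its two branches explicit. This is only a sketch: the function name and the question keys are hypothetical, and the function flags a project as worth considering rather than deciding it, since a single “no” only means there may be reason to proceed.

```python
# Hypothetical short keys for the five preliminary questions, in order.
PRELIMINARY_QUESTIONS = (
    "answer_known",
    "recent_reliable_data",
    "competent_analysis",
    "effective_presentation",
    "firm_basis_for_action",
)


def worth_considering(answers: dict, can_improve_materially: bool = False,
                      new_or_wider_audience: bool = False) -> bool:
    """Apply the preliminary-question rule to yes/no (True/False) answers."""
    missing = [q for q in PRELIMINARY_QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    if not all(answers[q] for q in PRELIMINARY_QUESTIONS):
        return True      # at least one "no": there may be reason to go ahead
    # All "yes": proceed only for a material improvement or a new audience.
    return can_improve_materially or new_or_wider_audience


all_yes = {q: True for q in PRELIMINARY_QUESTIONS}
print(worth_considering(all_yes))                                     # False
print(worth_considering({**all_yes, "recent_reliable_data": False}))  # True
```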

The main stages of a combined data analytic and decision analytic process

  1. State an important question/problem
  2. Data analysis
    1. Select data relevant to answering the questions or solving the problems
    2. Prepare the data for analysis. Employ visualization during preparation
    3. Analyze the data – Increase knowledge about the past, present, and future state of the system generating the data – Increase knowledge about individual variables and the relationship between variables. Employ visualization extensively during analysis
      1. Descriptive data analysis
      2. Exploratory data analysis
      3. Confirmatory data analysis
      4. Predictive data analysis
    4. Present/visualize the results of the analysis
    5. Evaluate the results of the analysis – Have the original questions been answered?
  3. Decision analysis
    1. Make decisions based on the results of the analysis
    2. Implement decisions – Act
    3. Present/visualize the results of the actions
    4. Evaluate the results of the actions – Have the problems originally posed been solved?
  4. Reiterate the process or its individual stages as necessary
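To make the ordering of the stages concrete, here is a compressed sketch of one pass through the process on a toy question. The question, the records, the field names and the decision threshold are all hypothetical, and the presentation/visualization stages and the evaluation of actions are left out to keep it short.

```python
from statistics import mean

# Hypothetical question (stage 1): "Is the average clinic waiting time above 30 minutes?"
RECORDS = [
    {"patient": 1, "wait_min": 25, "ward": "A"},
    {"patient": 2, "wait_min": None, "ward": "A"},   # missing measurement
    {"patient": 3, "wait_min": 41, "ward": "B"},
    {"patient": 4, "wait_min": 38, "ward": "B"},
]


def select_data(records, fields):
    """2.1 Keep only the variables relevant to the question."""
    return [{f: r.get(f) for f in fields} for r in records]


def prepare_data(rows):
    """2.2 Drop incomplete rows (one simple cleaning choice among many)."""
    return [r for r in rows if all(v is not None for v in r.values())]


def analyze_data(rows):
    """2.3 Descriptive summary only; other kinds of analysis would go here too."""
    waits = [r["wait_min"] for r in rows]
    return {"n": len(waits), "mean_wait": mean(waits) if waits else None}


def evaluate_analysis(result, min_n=3):
    """2.5 Has the question been answered on enough data?"""
    return result["n"] >= min_n


def decide(result, threshold=30):
    """3.1 Turn the answer into a decision."""
    return "add staff" if result["mean_wait"] > threshold else "no change"


def run_process(records):
    rows = prepare_data(select_data(records, ["wait_min"]))
    result = analyze_data(rows)
    if not evaluate_analysis(result):
        return result, "reiterate: collect more data"   # stage 4
    return result, decide(result)


print(run_process(RECORDS))   # mean of 25, 41, 38 is about 34.7 -> 'add staff'
```

Keeping each stage in its own function makes it easy to rerun a single stage, which is what stage 4 asks for when the evaluation fails.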

Data and Decision Analytic Process – Page

Decision Analysis

Decision Analysis is a systematic, quantitative and visual approach to addressing and evaluating important choices confronted by decision makers. Decision analysis utilizes a variety of tools to evaluate all relevant information to aid in the decision making process.

From <http://www.investopedia.com/terms/d/decision-analysis.asp>

After all of the alternatives have been analyzed and a final decision has been reached, there are steps that should be taken during the implementation process for that decision. Three essential actions to implementing a decision include creating an implementation plan, informing stakeholders, and finally, adjusting the decision to make compromises as necessary.

From <https://www.boundless.com/management/decision-making/decision-making-process/implement-the-course/>

Decision Analysis – Page

Data Analysis

Data analysis proper is a process consisting of a sequence of stages beginning with data and ending in knowledge derived from the information in the data.

This knowledge may then be used in making and implementing decisions.

Lens

There are many different kinds of data analysis:

  • Descriptive data analysis – describe the features of the data
  • Exploratory data analysis – discover new features in the data
  • Confirmatory data analysis – confirm or disconfirm/falsify existing hypotheses
  • Predictive data analysis – apply statistical or structural models for predictive forecasting or classification

Data Analysis – Page

Books about Data Analysis

In order to become proficient in data analysis, it is necessary to study its theory and practice.

A large number of books have been written about data analysis, especially after the data deluge in recent years and the appearance of relatively inexpensive and effective data analysis software.

I have read, or at least skimmed through, a considerable number of these books and keep them on hand as reference books when I work on data analysis projects.

A list of these books can be found on the page Data Analysis Books.

From these books I have learned that data analysis is a process consisting of a sequence of stages beginning with data and ending in knowledge derived from the information in the data.

This knowledge may then be used in making decisions and implementing them by corresponding actions.

The books place different emphasis on the different steps in the data analytic process. Some emphasize the data end, some the analytic middle, some the visualization of the data and the results of the analysis.

The quality of the books varies considerably. All of them contain something of value and can be used for reference. Some I find of special interest and have selected for thorough reading. These are:

      1. Data Just Right – Manoochehri
      2. Making Sense of Data – Myatt
      3. The Visual Display of Quantitative Information – Tufte
      4. Visual Statistics – Seeing Data with Dynamic Interactive Graphics – Young et alia
      5. Tableau Your Data – Murray
      6. DataDesk Manual
      7. Parallel Coordinates – Inselberg
      8. Modeling Techniques in Predictive Analytics – Miller
      9. Predictive Analytics for Dummies – Bari, Chaouchi, Jung
      10. R for Dummies – Meys, de Vries

These books are not necessarily the best for everyone, but they are the best for fulfilling my present needs.

Data Analysis Books – Page