The Rising Data Wave

A large and rapidly increasing amount of data is being generated and collected in connection with every human activity. Much of this data is stored and can be accessed on the Internet, and the data stores may contain very large datasets. The ability to analyze these datasets, converting their information content into the knowledge, understanding, and insight necessary to make decisions and to implement those decisions through corresponding actions, is becoming increasingly important. Until recently this ability was limited to large organizations and institutions using large computers, even supercomputers. The development of relatively inexpensive personal computers with increasing computing power, together with a number of relatively inexpensive and effective software packages, has made it possible for ordinary people to analyze large datasets.

The Wave - Hokusai

The availability of a large amount of data and effective software is not enough to derive reliable knowledge from the information in the data and to make reliable decisions as well as to take effective actions. In order to do this it is necessary to adhere to the stages in an orderly data analytic process and execute the process in a competent manner.


Data Analysis Software

The recent tidal wave of data has given rise to the development of a large number of software programs relevant to the analysis of the data. From a long list of programs I have chosen the following for my own use:

  1. Tableau
  2. DataDesk
  3. StatCrunch
  4. BestView – Addon to Mathematica
  5. Mathematica
  6. R
  7. ParallAX
  8. NeuroSolutions
  9. Gephi
  10. Ayasdi

Some of these programs are preexisting programs that have been adapted to the requirements of big data; others, such as Tableau and Ayasdi, are new. The programs I have chosen are not necessarily the best for everyone, but they are the best for my present needs.


Data and Decision Analytic Process

The data and decision analytic process is a path leading from the larva of data to the butterfly of knowledge, understanding, and insight.

Before starting to work on data analytic and associated decision analytic projects it is necessary, in order to ensure the quality of the results, to

  1. define an orderly data analytic and decision analytic process
  2. select methods for executing the process
  3. select software packages for implementing the methods

In order to ensure the reliability of the answers/solutions and the quality of the decisions made and actions taken, it is necessary to adhere to the analytic process in an orderly manner and to apply the methods and the software packages in a competent manner.

Before starting an analytic process it is necessary to state the question/problem under consideration and ask the following preliminary questions:

  1. Is the answer/solution considered known?
  2. Is the answer/solution based on sufficiently recent/reliable data?
  3. Was the analysis performed in a competent/reliable manner?
  4. Are the results of the analysis presented/visualized in such a way that they sufficiently increase the understanding and insight of the target group?
  5. Do the results of the analysis, their presentation/visualization, and the resulting understanding and insight form a sufficiently firm basis for decision making and action?

If any of the answers is no, there may be a reason to go ahead with the data analytic and decision analytic process. If all the answers are yes, it is unnecessary to go ahead with the process unless you are confident that you can improve the results materially or introduce your particular results to a new or wider audience. But beware of hubris.

The main stages of a combined data analytic and decision analytic process

  1. State an important question/problem
  2. Data analysis
    1. Select data relevant to answering the questions or solving the problems
    2. Prepare the data for analysis. Employ visualization during preparation
    3. Analyze the data – Increase knowledge about the past, present, and future state of the system generating the data – Increase knowledge about individual variables and the relationship between variables. Employ visualization extensively during analysis
      1. Descriptive data analysis
      2. Exploratory data analysis
      3. Confirmatory data analysis
      4. Predictive data analysis
    4. Present/visualize the results of the analysis
    5. Evaluate the results of the analysis – Have the original questions been answered?
  3. Decision analysis
    1. Make decisions based on the results of the analysis
    2. Implement decisions – Act
    3. Present/visualize the results of the actions
    4. Evaluate the results of the actions – Have the problems originally posed been solved?
  4. Reiterate the process or its individual stages as necessary
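
To make the stages above more concrete, here is a minimal Python sketch of one pass through the process. Everything in it is an assumption made for illustration: the sales figures, the target of 130, and the function names prepare, analyze, and decide are hypothetical, and a real project would use far richer methods and visualization at each stage.

```python
from statistics import mean

# Hypothetical raw records (stage 2.1): daily sales figures containing
# one missing value and one obviously invalid negative entry.
raw = [120.0, 135.0, None, 128.0, -1.0, 142.0, 150.0]

def prepare(values):
    """Stage 2.2: drop missing and clearly invalid (negative) observations."""
    return [v for v in values if v is not None and v >= 0]

def analyze(values):
    """Stage 2.3: a deliberately simple descriptive summary."""
    return {"n": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}

def decide(summary, target):
    """Stage 3.1: a toy decision rule comparing the mean with a target."""
    return "keep current plan" if summary["mean"] >= target else "revise plan"

clean = prepare(raw)
summary = analyze(clean)
decision = decide(summary, target=130.0)

# Stage 2.4 and 3.3 reduced to plain printing for this sketch.
print(summary)
print(decision)
```

In practice each function would be revisited as the process is reiterated, for example when the evaluation shows that the original question has not been answered.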


Decision Analysis

Decision Analysis is a systematic, quantitative and visual approach to addressing and evaluating important choices confronted by decision makers. Decision analysis utilizes a variety of tools to evaluate all relevant information to aid in the decision making process.

From <http://www.investopedia.com/terms/d/decision-analysis.asp>
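
One common quantitative tool of this kind is a payoff table evaluated by expected value. The Python sketch below is only an illustration of that single technique: the alternatives, the states, the probabilities, and the payoffs are all invented, and expected value is just one of several possible decision criteria.

```python
# Hypothetical probabilities of three market states (assumed; they sum to 1).
probabilities = {"weak": 0.3, "normal": 0.5, "strong": 0.2}

# Hypothetical payoffs of each alternative in each state.
payoffs = {
    "expand": {"weak": -40, "normal": 30, "strong": 90},
    "hold": {"weak": 5, "normal": 10, "strong": 15},
    "outsource": {"weak": -10, "normal": 20, "strong": 40},
}

def expected_value(outcomes, probs):
    """Probability-weighted average payoff of one alternative."""
    return sum(probs[state] * value for state, value in outcomes.items())

# Evaluate every alternative and pick the one with the highest expected value.
expected = {name: expected_value(o, probabilities) for name, o in payoffs.items()}
best = max(expected, key=expected.get)

for name, value in expected.items():
    print(f"{name}: {value:.1f}")
print("preferred alternative:", best)
```

A decision maker who is averse to risk might prefer a different criterion, since the alternative with the highest expected value here is also the only one with a large possible loss.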

After all of the alternatives have been analyzed and a final decision has been reached, there are steps that should be taken during the implementation process for that decision. Three essential actions to implementing a decision include creating an implementation plan, informing stakeholders, and finally, adjusting the decision to make compromises as necessary.

From <https://www.boundless.com/management/decision-making/decision-making-process/implement-the-course/>


Data Analysis

Data analysis proper is a process consisting of a sequence of stages beginning with data and ending in knowledge derived from the information in the data.

This knowledge may then be used in making and implementing decisions.


There are many different kinds of data analysis:

  • Descriptive data analysis – describe the features of the data
  • Exploratory data analysis – discover new features in the data
  • Confirmatory data analysis – confirm or disconfirm/falsify existing hypotheses
  • Predictive data analysis – apply statistical or structural models for predictive forecasting or classification
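
The four kinds of analysis listed above can be illustrated on a single toy dataset. The following Python sketch uses only the standard library. The data (hours studied and test scores) are invented, and the specific methods chosen here (Pearson correlation, an exact permutation test, and a least-squares line) are illustrative assumptions rather than the only way to carry out each kind of analysis.

```python
from itertools import permutations
from statistics import mean, stdev

# Hypothetical toy data: hours studied (x) and test score (y).
x = [1, 2, 3, 4, 5, 6]
y = [52, 57, 59, 66, 68, 73]

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

# Descriptive: summarize the observed scores.
description = {"mean": mean(y), "stdev": stdev(y)}

# Exploratory: look for a relationship between the two variables.
r = pearson(x, y)

# Confirmatory: exact one-sided permutation test of "no association".
# The p-value is the share of reorderings of y whose correlation with x
# is at least as large as the observed one (small tolerance for rounding).
perms = list(permutations(y))
p_value = sum(pearson(x, p) >= r - 1e-12 for p in perms) / len(perms)

# Predictive: fit a least-squares line and forecast the score for x = 7.
mx, my = mean(x), mean(y)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
forecast = intercept + slope * 7

print(description, r, p_value, forecast)
```

With only six observations the permutation test can enumerate all 720 reorderings exactly; for larger datasets a random sample of permutations would be the usual substitute.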


Books about Data Analysis

In order to become proficient in data analysis it is necessary to study the theory and practice of data analysis.

A large number of books have been written about data analysis, especially after the data deluge in recent years and the appearance of relatively inexpensive and effective data analysis software.

I have read or at least skimmed through a considerable number of these books and keep them on hand as reference books when I work on data analysis projects.

A list of these books can be found on the page Data Analysis Books.

From these books I have learned that data analysis is a process consisting of a sequence of stages beginning with data and ending in knowledge derived from the information in the data.

This knowledge may then be used in making decisions and implementing them by corresponding actions.

The books place different emphasis on the different steps in the data analytic process. Some emphasize the data end, some the analytic middle, some the visualization of the data and the results of the analysis.

The quality of the books is quite varied. All of them contain something of value and can be used for reference. Some of them are of special interest to me, and I have selected these for thorough reading:

      1. Data Just Right – Manoochehri
      2. Making Sense of Data – Myatt
      3. The Visual Display of Quantitative Information – Tufte
      4. Visual Statistics – Seeing Data with Dynamic Interactive Graphics – Young et alia
      5. Tableau Your Data – Murray
      6. DataDesk Manual
      7. Parallel Coordinates – Inselberg
      8. Modeling Techniques in Predictive Analytics – Miller
      9. Predictive Analytics for Dummies – Bari, Chaouchi, Jung
      10. R for Dummies – Meys, de Vries

These books are not necessarily the best for all but they are the best for fulfilling my present needs.
