Tim Chartier, co-author with Anne Greenbaum of Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms, explains how to make sense of big data with numerical analysis.
You submit a query to Google or watch football bowl games as we enter a new year. In either case, you benefit from mathematical methods that can garner meaningful information from large amounts of data. Such techniques fall in the field of data mining.
Massive datasets are generated with every passing minute. For example, during the Oscars in February, the Cirque du Soleil performance resulted in 18,718 tweets in one minute according to TweetReachBlog. While tweets cannot exceed 140 characters in length, their average length is 81.9 characters according to MediaFuturist. So, in one minute, approximately 1.5 million characters zoomed through Twitter. From Wikipedia, we’ll take the average length of a word (in English) to be 5.1 characters. Assuming these Oscar tweets are written in English and conform to the standard length of words, roughly 300,000 words were tweeted in one minute. This is approximately the number of words contained in the entire Hunger Games Trilogy!
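The estimate above is two lines of arithmetic. Here is a minimal sketch in Python that uses only the figures quoted in the paragraph; the variable names are illustrative, and the word count is a rough estimate because it ignores spaces and punctuation.

```python
# Figures quoted above (TweetReachBlog, MediaFuturist, Wikipedia).
tweets_per_minute = 18_718
avg_tweet_length = 81.9   # characters per tweet
avg_word_length = 5.1     # characters per English word

# Total characters sent in the minute, then the implied number of words.
characters = tweets_per_minute * avg_tweet_length
words = characters / avg_word_length

print(f"characters: {characters:,.0f}")
print(f"words:      {words:,.0f}")
```

This gives about 1.53 million characters and just over 300,000 words, in line with the rounded figures in the text.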
Mathematical models and numerical analysis play important roles in data mining. For example, a foundational part of Google’s search engine algorithm is a method called PageRank. In our book, Numerical Methods: Design, Analysis, and Computer Implementation of Algorithms, published by Princeton University Press, Anne Greenbaum and I discuss the PageRank method: both its underlying mathematical model and how it is computed in practice.
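To give a flavor of the idea, here is a minimal sketch of PageRank computed by repeated averaging (power iteration) on a tiny, made-up web of four pages. It is an illustration under stated assumptions, not Google's production algorithm and not necessarily the presentation used in the book: the damping factor of 0.85, the handling of pages without outgoing links, the function name `pagerank`, and the four-page example are all assumptions chosen for this sketch.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Every linked page must also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess

    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes its rank in equal shares to the pages it links to.
        for p in pages:
            for q in links[p]:
                new_rank[q] += damping * rank[p] / len(links[p])

        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # stop once the ranks have settled
            break
    return rank


# Hypothetical four-page web: A links to B and C, and so on.
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy example, page C comes out on top because three of the four pages link to it, while page D, which nobody links to, receives only the baseline share.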
In an exercise in the text, you can develop a system of linear equations in a manner similar to that used by the Bowl Championship Series to rank college football teams (editor: or college basketball teams for March Madness). An important part of this problem is developing the linear system. Our text also discusses the computational challenges of such problems and which numerical methods yield the most accurate results.
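As one concrete illustration of ranking by linear system, the sketch below uses the Colley method, one of the computer rankings that fed into the Bowl Championship Series. The exercise in the text may set up its system differently; the team names, game results, and function names here are hypothetical. In the Colley system C r = b, each diagonal entry is 2 plus the number of games a team played, each off-diagonal entry is minus the number of games between two teams, and each right-hand-side entry is 1 + (wins - losses) / 2.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def colley_ratings(teams, games):
    """Build and solve the Colley system; games is a list of (winner, loser)."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    C = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for winner, loser in games:
        w, l = idx[winner], idx[loser]
        C[w][w] += 1.0   # one more game played by each team
        C[l][l] += 1.0
        C[w][l] -= 1.0   # one more meeting between the two teams
        C[l][w] -= 1.0
        b[w] += 0.5      # a win adds 1/2, a loss subtracts 1/2
        b[l] -= 0.5
    return dict(zip(teams, solve(C, b)))


# Hypothetical three-team season.
teams = ["Aggies", "Bears", "Cougars"]
games = [("Aggies", "Bears"), ("Aggies", "Cougars"), ("Bears", "Cougars")]
ratings = colley_ratings(teams, games)
for team in sorted(ratings, key=ratings.get, reverse=True):
    print(team, round(ratings[team], 3))
```

For this small season the ratings work out to 0.7, 0.5, and 0.3. A real season involves over a hundred teams, which is where the choice of solver and its accuracy start to matter.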
Many techniques used to solve the large linear systems of data mining are also used in engineering and science. The book discusses how large linear systems (containing millions of rows) can arise from problems involving partial differential equations. Again, the book analyzes the efficiency and accuracy of the methods used to solve such systems. Such techniques led to the computer-animated figures we enjoy in modern movies and aid in simulating the aerodynamics of a car created with computer-aided design.
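To see how a differential equation turns into a linear system, consider a one-dimensional model problem standing in for a full partial differential equation: -u''(x) = f(x) on (0, 1) with u(0) = u(1) = 0. Replacing the second derivative with a finite difference on n interior grid points gives a tridiagonal system with n unknowns. The sketch below is a minimal illustration under those assumptions; the function name, the choice of f, and the grid size are made up for this example, and problems in two or three dimensions produce much larger systems that call for more sophisticated solvers.

```python
import math


def solve_poisson_1d(f, n):
    """Solve -u'' = f on (0, 1), u(0) = u(1) = 0, on n interior grid points."""
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]

    # Finite differences give, at each interior point,
    #   -u[i-1] + 2*u[i] - u[i+1] = h^2 * f(x[i]),
    # a tridiagonal system with 2 on the diagonal and -1 beside it.
    diag = [2.0] * n
    off = -1.0
    rhs = [h * h * f(xi) for xi in x]

    # Forward elimination (the Thomas algorithm for tridiagonal systems).
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]

    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]
    return x, u


# Test problem with a known solution: f = pi^2 sin(pi x) gives u = sin(pi x).
x, u = solve_poisson_1d(lambda t: math.pi ** 2 * math.sin(math.pi * t), n=99)
max_error = max(abs(ui - math.sin(math.pi * xi)) for xi, ui in zip(x, u))
print(f"largest error on the grid: {max_error:.2e}")
```

Because the matrix is tridiagonal, the work grows only linearly with the number of unknowns, which is one reason exploiting structure matters when systems reach millions of rows.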
As stated at the opening of Chapter 1 of the text, “Numerical methods play an important role in modern science. Scientific exploration is often conducted on computers rather than laboratory equipment. While it is rarely meant to completely replace work in the scientific laboratory, computer simulation often complements this work.” As such, modern science demands the use and understanding of numerical methods.