SELECTING BASEBALL’S BEST BATTER: A MATHEMATICAL ANALYSIS!

 

CHRISTINA WALL

 

Latta High School, GRADE 10

 

ABSTRACT

 

 This project employed a mathematical rating and weighting system to determine an “all time best batter” throughout history.  The purpose of this project was to determine whether a weighted statistical analysis of batters’ statistics differs from an unweighted analysis in the selection of the best batter throughout history.

The hypothesis was “When comparing baseball-batting statistics using a mathematical analysis which weights each desirable characteristic, a significantly different result will be evident in the selection of the ‘best baseball batter ever’ rather than the one selected using an unweighted mathematical analysis which assigns all characteristics an equal importance value.”  A survey was developed to collect coaches’ opinions and create a rating scale for batting characteristics. Statistics on the players were collected from the Internet, and the top fifteen players in each of the eighteen different statistics were ranked from one to fifteen and assigned point values. The points each player received in each category were totaled.

Three different ranking methods were employed:

1.      unweighted, where all batting statistics were weighted equally,

2.      a weighted version which weighted only the desirable characteristics chosen by the coaches’ surveys, and

3.      another weighted version which weighted the top four desirable characteristics along with the other unweighted statistics. 

The top batter in the unweighted statistical version was Lou Gehrig, and the top batter in both the weighted only version and the weighted version plus other unweighted statistics was Babe Ruth.

The top fifteen batters in each method were determined and assigned numerical values.  Points from all three versions were totaled, and a final overall best batter was decided.

The project’s hypothesis was only partially supported.  There was a difference in the three versions but only a very slight one.  Lou Gehrig was the best batter statistically overall after all three versions had been combined and points totaled up.

 

 INTRODUCTION

                What makes a great batter?  Is it his on base percentage?  Perhaps it is how many home runs he hits during his career.  Coaches and sports fans all around the world have their own opinions on what makes a great batter.  Most coaches and sports fans base their opinions on only one or two statistics when, in reality, there are many more statistics that people should take into consideration.

“Hitting a baseball is one of the most challenging skills in sports- an art in which 30% success rate is considered above average.” (1) Batters have to deal with many things, including the pitcher’s ability, the round surfaces of both the bat and the ball, batting stance, positioning, attacking the ball, and thinking outside the box.

Baseball is the game best suited for statistical analysis.  To have a batting average of .300 means that for every ten times at bat, the hitter got a hit three times.  Statistics in baseball can be figured in many different ways.  For example, a player can determine his career home run average, or mean, by dividing the total number of home runs in his career by the total number of seasons played.  By ranking his own statistics from highest to lowest, a player can determine which characteristic he needs to work on more.  Not only are statistics useful in determining a batter’s own percentages, but they can also enable a batter to evaluate where he stands in relation to other batters.
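As a small worked example of the two calculations just described, the sketch below computes a batting average and a mean number of home runs per season. The function names and the 300 home runs over 15 seasons are illustrative assumptions, not data from this project.

```python
def batting_average(hits, at_bats):
    """Hits divided by at bats."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def mean_per_season(career_total, seasons_played):
    """Career total of a statistic divided by the seasons played."""
    if seasons_played <= 0:
        raise ValueError("seasons_played must be positive")
    return career_total / seasons_played


# Three hits in ten at bats, as in the example in the text.
print(f"{batting_average(3, 10):.3f}")

# Hypothetical career: 300 home runs over 15 seasons.
print(f"{mean_per_season(300, 15):.1f}")
```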

            Many characteristics go into the production of a great batter.  Some people say that the size of the batter influences the way a batter hits.   “Current baseball scouts generally focus their attention on larger prospects, particularly pitchers.”(2)   However, if Babe Ruth were magically transported into the 21st century, he would not stand out in any team picture.  Although he was a legendary giant, if he were alive today he would stand taller than only 48% of the players who were on major-league 40-man rosters at the start of spring training.

               There are many statistics that should go into judging a great batter, but frequently people consider only one or two batting statistics to be important.  Many fans feel that the number of home runs is the ultimate statistic that makes a great batter.  Home runs are the most exciting play of the game.   “Whoever called the triple the most exciting play in baseball was either narrow minded or nuts.  We all know what baseball’s most exciting play is: the long ball. That’s what puts derrieres in the seats.  Chicks don’t dig three-baggers.  Come on, how many of you knew that Christian Guzman led the major leagues with 20 triples last season?” (3) Home runs can be very important during a ballgame, but home runs don’t count for everything.  Other statistics come into play, such as on base percentage, runs batted in, and slugging percentage, yet even these are not all of the important statistics that should be considered when deciding on the best batter.

            Just as the world has changed over the past years, baseball has also changed.   “The ‘National Pastime,’ is a whole new ballgame compared to what it was like in 1942.” (4) That may wound the sensibilities of those so entranced by nostalgia and a romantic view of the game’s mythology as to contend baseball is immutable.  But to insist that Barry Bonds, Sammy Sosa, Jason Giambi and Randy Johnson are playing the same game as their great predecessors of the 1940’s, like Joe DiMaggio, Ted Williams, Stan Musial, Bob Feller, Mort Cooper and the rest, is sheer self-delusion.

The purpose of this project is to determine the best batter throughout history by using a statistical analysis over three versions: an unweighted version, a weighted only version, and a weighted version plus other unweighted statistics.  The unweighted version weights each of eighteen batting statistics equally, therefore giving all statistics equal importance.  The weighted only version uses only the top four desirable characteristics chosen by the coaches’ survey, therefore giving only certain characteristics importance.  The weighted version plus the other unweighted statistics uses both the unweighted and the weighted statistics, therefore giving everything importance, but giving some statistics more importance than others.  The hypothesis for this experiment was “When comparing baseball-batting statistics using a mathematical analysis which weights each desirable characteristic, a significantly different result will be evident in the selection of the ‘best baseball batter ever’ rather than the one selected using an unweighted mathematical analysis which assigns all characteristics an equal importance value.”

 

METHODOLOGY

 

MATERIALS

                                           

PROCEDURE

STEP 1: Prepare survey to be sent to coaches asking their opinion on the top 10 baseball batters old and new, and their opinions on characteristics of what makes the best batter. (See survey on following page.)

 

STEP 2: Consult Coach Eddie Collins (Oklahoma Baseball Coach Hall of Famer) to revise survey.

 

STEP 3: Fax or e-mail the survey to 30 coaches.

 

STEP 4: After surveys have been returned, tabulate coaches’ responses to the surveys to find top four characteristics they felt were most desirable and record using Microsoft Excel. Also from the survey’s results, determine the top 50 players.

 

STEP 5: Gather data on 18 batting statistics for those top 50 players from the Internet using www.baseballreference.com and record the data in a Microsoft Excel database. (See the list of 18 statistics on the following pages.)

 

STEP 6: Rank the players for each batting statistic, numbering them from 1 (the best) to 50 (the worst).

 

STEP 7: Assign the top 15 players in each statistical category points by giving the number 1 in each category 15 points, the number 2 in each category 14 points, and so forth until the top 15 players have been given points in each category.
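Steps 6 and 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the project itself used Microsoft Excel, the function name and sample numbers are hypothetical, and ties are broken by input order because the procedure does not say how ties were handled. Treating a lower strikeout total as better follows the survey wording ("very few strikeouts").

```python
def category_points(values, top_n=15, higher_is_better=True):
    """Return {player: points} for one statistic.

    The best player gets top_n points, the next gets top_n - 1, and so on.
    Players ranked below top_n get 0 points. Ties keep their input order
    (an assumption; the procedure does not describe tie handling).
    """
    ranked = sorted(values, key=values.get, reverse=higher_is_better)
    return {
        player: max(top_n - position, 0)
        for position, player in enumerate(ranked)
    }


# Made-up career totals for three hypothetical players.
career_home_runs = {"Player A": 500, "Player B": 650, "Player C": 410}
career_strikeouts = {"Player A": 900, "Player B": 1300, "Player C": 700}

print(category_points(career_home_runs))
# For strikeouts, the lowest total is ranked first.
print(category_points(career_strikeouts, higher_is_better=False))
```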

 

VERSION 1: UNWEIGHTED STATISTICAL VERSION:

 

STEP 8: Rank batters by their total points and determine who is the best batter by using an unweighted statistical version.
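The unweighted total in Step 8 is a plain sum of each player's points over all categories. The sketch below shows one way to do this; the category labels and point values are hypothetical placeholders rather than the project's data.

```python
from collections import Counter


def total_points(points_by_category):
    """Add up each player's points across all categories, equally weighted."""
    totals = Counter()
    for player_points in points_by_category.values():
        totals.update(player_points)  # Counter.update adds to existing counts
    return totals


# Hypothetical points for two of the eighteen categories.
points_by_category = {
    "HR career": {"Player A": 15, "Player B": 14},
    "BA career": {"Player B": 15, "Player C": 14, "Player A": 13},
}

totals = total_points(points_by_category)
for player, points in totals.most_common():
    print(player, points)
```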

 

VERSION 2: FOR WEIGHTED USING ONLY COACHES’ SELECTED CHARACTERISTICS:

 

STEP 9: Weight the top 4 statistics chosen by the survey by multiplying the most desirable batting statistic by 5, the next most desirable statistic by 4, the next most desirable characteristic by 3, and the last most desirable characteristic by 2. 

 

 

STEP 10: Use only the four most desirable characteristics, rank the batters by their total points, and determine who the best batter is by using the weighted statistics only.
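Steps 9 and 10 can be sketched as follows. The multipliers 5, 4, 3, and 2 come from Step 9, and the order of the four statistics follows the Results section (batting average, slugging percentage, home runs, strikeouts). The report does not state whether the career figure, the single season figure, or both were weighted for each statistic, so the sketch assumes one category per statistic. All names and point values are hypothetical.

```python
# Multipliers from Step 9, in the order reported in the Results section.
WEIGHTS = {"BA": 5, "SLG": 4, "HR": 3, "SO": 2}


def weighted_only_totals(points_by_category, weights=WEIGHTS):
    """Sum weight * points over the weighted categories only."""
    totals = {}
    for category, weight in weights.items():
        for player, points in points_by_category.get(category, {}).items():
            totals[player] = totals.get(player, 0) + weight * points
    return totals


# Hypothetical category points. "3B" is ignored because it has no weight.
points_by_category = {
    "BA": {"Player A": 15, "Player B": 14},
    "SLG": {"Player B": 15},
    "HR": {"Player A": 15},
    "SO": {"Player B": 15},
    "3B": {"Player C": 15},
}

print(weighted_only_totals(points_by_category))
```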

 

VERSION 3: COMBINATION OF WEIGHTED COACH PREFERRED STATISTICS AND THE OTHER UNWEIGHTED STATISTICS:

 

STEP 11: Copy the four weighted statistics used in Version 2 for all the batters, add the other unweighted statistics from Version 1, and record everything in a new database.

 

STEP 12: Rank the batters by their total number of points and determine who the best batter is by using the unweighted statistics and the weighted statistics.

 

STEP 13: Assign points to the top fifteen batters in each of the three versions by giving the number one batter fifteen points, the number two batter fourteen points, and so forth until all fifteen in each version have been given points.

 

STEP 14: Place each player’s new points into a new spreadsheet.

 

STEP 15: Total each player’s points, rank, and determine the overall best batter. 

 

STEP 16: Analyze the difference between the three versions, if any, and decide whether the project supported the hypothesis or not.
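The Version 3 totals from Steps 11 and 12 can be sketched with a single function in which the four coach-preferred statistics carry their multipliers and every other statistic counts once. The category labels and point values are hypothetical, and the sketch assumes one weighted category per statistic, which the report does not confirm.

```python
def combined_totals(points_by_category, weights):
    """Sum points over all categories, multiplying the weighted ones."""
    totals = {}
    for category, player_points in points_by_category.items():
        weight = weights.get(category, 1)  # unweighted statistics count once
        for player, points in player_points.items():
            totals[player] = totals.get(player, 0) + weight * points
    return totals


weights = {"BA": 5, "SLG": 4, "HR": 3, "SO": 2}

# Hypothetical category points for three players.
points_by_category = {
    "BA": {"Player A": 15, "Player B": 14},
    "SLG": {"Player B": 15},
    "HR": {"Player A": 15},
    "SO": {"Player B": 15},
    "3B": {"Player C": 15},
}

print(combined_totals(points_by_category, weights))
```

Passing an empty dictionary for weights gives every category a multiplier of 1, which reproduces the unweighted total of Version 1.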

 

THE 18 STATISTICS USED (ABBREVIATION/STATISTIC)

 

H-                    Hits – career

H-                    Hits – single season

2B-                  Doubles – career

2B-                  Doubles – single season

3B-                  Triples – career

3B-                  Triples – single season

HR-                 Home Runs – career

HR-                 Home Runs – single season

RBI-                Runs Batted In – career

RBI-                Runs Batted In – single season

SO-                  Strikeouts – career

SO-                  Strikeouts – single season

BA-                 Batting average – career (hits / at bats)

BA-                 Batting average – single season

OBP-               On base percentage – career ((hits + bases on balls + hit by pitch) / (at bats + bases on balls + sacrifice flies + hit by pitch))

OBP-               On base percentage – single season

SLG-               Slugging percentage – career (total bases / at bats)

SLG-               Slugging percentage – single season
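The on base percentage and slugging percentage formulas in the list above can be written out as short functions. The season line used here is invented for illustration. Total bases are counted in the standard way: one for a single, two for a double, three for a triple, and four for a home run.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """(H + BB + HBP) / (AB + BB + SF + HBP)."""
    denominator = at_bats + walks + sacrifice_flies + hit_by_pitch
    if denominator <= 0:
        raise ValueError("no plate appearances to divide by")
    return (hits + walks + hit_by_pitch) / denominator


def total_bases(singles, doubles, triples, home_runs):
    """One base per single, two per double, three per triple, four per home run."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging_percentage(bases, at_bats):
    """Total bases divided by at bats."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return bases / at_bats


# Invented season: 500 at bats, 150 hits (100 singles, 30 doubles,
# 5 triples, 15 home runs), 60 walks, 5 hit by pitch, 5 sacrifice flies.
obp = on_base_percentage(hits=150, walks=60, hit_by_pitch=5,
                         at_bats=500, sacrifice_flies=5)
slg = slugging_percentage(total_bases(100, 30, 5, 15), at_bats=500)
print(f"OBP {obp:.3f}  SLG {slg:.3f}")
```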

 

 

 

 

 

 

 

 

 

COACHES’ SURVEY

 

When it comes down to the final statistics, what makes a great batter?

Please give your opinion about the following questions.

•     What is the most desirable characteristic of a hitter?

•     Please rank the following list (1 being the best and 5 being the worst)

SINGLE SEASON

_______A batter who has a great on-base percentage

_______A batter who has many home runs

_______A batter who has several home runs and a high average on-base percentage (.325 batting average)

_______A batter who not only has a good on-base percentage but also gets multiple bases (doubles and triples)

_______A batter who has many RBI’s

CAREER

_______A batter who has a consistently high batting average over many seasons

_______A batter who has a consistently high number of home runs each season

_______A batter whose career lasts many years

_______A batter who has one record breaking season

_______A batter who has very few strikeouts in their career

PLEASE LIST YOUR TOP 10 FAVORITE OLD AND NEW BASEBALL BATTERS

RESULTS

Five sets of data resulted after the data were analyzed: the coaches’ survey, an unweighted statistical version, a version using only the weighted statistics, a weighted plus unweighted statistical version, and an overall total combining all three versions.  From the coaches’ survey, the top four desirable characteristics chosen by the coaches were used in the experiment. These characteristics were batting average (BA), home runs (HR), slugging percentage (SLG), and strikeouts (SO).  Batting average, home runs, and strikeouts were all chosen from the career section of the survey, while slugging percentage was taken from the cumulative ranking of the single season section.  After the surveys had been sent to the coaches, it was realized that the survey should have been written more clearly so that the coaches could rate the choices properly.  Slugging percentage was therefore used as an overall statistic covering the top three single season choices: great on base percentage, several home runs and a high average on base percentage, and a high number of RBI’s.  Eighty percent of the coaches selected batting average as the most desirable characteristic.  Ten percent of the coaches chose home runs as their most desirable characteristic, and ten percent chose fewest strikeouts.  Though home runs and strikeouts received the same number of first-place selections, home runs received more second and third choices than strikeouts.  Since slugging percentage was an overall desirable trait formed by the combination of three statistics, it was placed second among the top four desirable characteristics.
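The percentages above can be checked against the counts in the survey results table. The sketch below copies the rows for choices 6, 7, and 10 from that table. Adding the second-place and third-place counts together to break the tie is one reading of the comparison described above, not a rule stated in the procedure, and the function names are illustrative.

```python
# Counts of 1's through 5's for three career choices, copied from the
# survey results table (choice numbers refer to that table).
VOTES = {
    "batting average (choice 6)": [8, 2, 0, 0, 0],
    "home runs (choice 7)": [1, 4, 3, 2, 0],
    "few strikeouts (choice 10)": [1, 3, 3, 2, 1],
}


def first_place_share(counts):
    """Percentage of responding coaches who ranked the choice first."""
    return 100 * counts[0] / sum(counts)


def second_and_third(counts):
    """Number of second-place plus third-place rankings (assumed tie-break)."""
    return counts[1] + counts[2]


for name, counts in VOTES.items():
    print(f"{name}: {first_place_share(counts):.0f}% first, "
          f"{second_and_third(counts)} second or third")
```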

            The second set of results is the unweighted statistical version.  The top five in this category are Lou Gehrig with 143 points, Ty Cobb with 142 points, Rogers Hornsby with 139 points, Tris Speaker with 134 points, and Babe Ruth with 127 points. 

The third set of results is the weighted only statistical version.  The top five in this category are Babe Ruth with 152 points, Ted Williams with 150 points, Lou Gehrig with 122 points, Todd Helton with 110 points, and Rogers Hornsby with 104 points.

            The fourth set of results is the weighted version plus the other unweighted statistics.  The top five in this category are Babe Ruth with 240 points, Lou Gehrig with 234 points, Ted Williams with 223 points, Rogers Hornsby with 216 points, and Ty Cobb with 210 points.

            The fifth and final set of results is the complete total or overall best batter of all three versions combined.  The top five in this category are Lou Gehrig with 42 points, Babe Ruth with 41 points, Rogers Hornsby with 36 points, Ted Williams with 36 points, and Ty Cobb with 35 points.
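The overall totals for Lou Gehrig, Babe Ruth, and Rogers Hornsby can be reproduced from the top five lists reported above, using the placement points of Step 13 (fifteen for first place, fourteen for second, and so on). Ted Williams and Ty Cobb each finished outside the reported top five in one version, so their totals cannot be rebuilt from these lists alone. The code is only a sketch of that check, and the function name is illustrative.

```python
# Top five in each version, in order, as reported in the results above.
TOP_FIVE = {
    "unweighted": ["Lou Gehrig", "Ty Cobb", "Rogers Hornsby",
                   "Tris Speaker", "Babe Ruth"],
    "weighted only": ["Babe Ruth", "Ted Williams", "Lou Gehrig",
                      "Todd Helton", "Rogers Hornsby"],
    "weighted plus unweighted": ["Babe Ruth", "Lou Gehrig", "Ted Williams",
                                 "Rogers Hornsby", "Ty Cobb"],
}


def placement_points(place, top_n=15):
    """First place earns top_n points, second earns top_n - 1, and so on."""
    return top_n - place + 1


def overall_total(player):
    """Sum a player's placement points over the three versions."""
    return sum(placement_points(ranking.index(player) + 1)
               for ranking in TOP_FIVE.values())


# Only players who appear in all three top five lists can be checked.
for player in ["Lou Gehrig", "Babe Ruth", "Rogers Hornsby"]:
    print(player, overall_total(player))
```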


SURVEY RESULTS

PLEASE RANK THE FOLLOWING, 1 BEING THE BEST AND 5 BEING THE WORST

SINGLE SEASON

___1___A BATTER WHO HAS A GREAT ON-BASE PERCENTAGE

___2___A BATTER WHO HAS MANY HOME RUNS

___3___A BATTER WHO HAS SEVERAL HOME RUNS AND A HIGH AVERAGE ON-BASE % (.325)

___4___A BATTER WHO NOT ONLY HAS A GOOD ON-BASE % BUT ALSO GETS MULTIPLE BASES

___5___A BATTER WHO HAS MANY RBI'S

CAREER

___6___A BATTER WHO HAS A CONSISTENTLY HIGH BATTING AVERAGE OVER MANY SEASONS

___7___A BATTER WHO HAS A CONSISTENTLY HIGH NUMBER OF HOME RUNS EACH SEASON

___8___A BATTER WHOSE CAREER LASTS MANY SEASONS

___9___A BATTER WHO HAS ONE RECORD BREAKING SEASON

__10___A BATTER WHO HAS VERY FEW STRIKEOUTS IN THEIR CAREER

CHOICE #    # OF 1'S    # OF 2'S    # OF 3'S    # OF 4'S    # OF 5'S

1           5           0           2           2           1
2           0           0           2           3           5
3           4           3           1           2           0
4           4           4           2           0           0
5           1           3           3           1           2
6           8           2           0           0           0
7           1           4           3           2           0
8           0           0           5           4           1
9           0           0           1           3           6
10          1           3           3           2           1

ANALYSIS AND CONCLUSION

            From the survey’s results, it was seen that coaches do indeed have their own favorite characteristics for batters.  The hypothesis of this experiment was only partially supported in the sense that there was a slight difference at the top of the three versions but not a major one.  Lou Gehrig placed first in the unweighted version and Babe Ruth placed first in the weighted only version and the weighted plus unweighted statistics version. 

The majority of the top batters placed near the top in all three methods, but there were exceptions.  For example, Todd Helton placed fourth and received 12 points in the weighted only version, but received a total of only twelve points in the other two versions combined.  Among the top five players, there was a difference of only seven points, and each of the top players placed at least third in one of the three versions. Lou Gehrig, the player with the most points after all three versions were completed, placed 1st in the unweighted version, 2nd in the weighted version plus the unweighted statistics category, and 3rd in the weighted only version.

It is interesting to note that all of the top five players were from the older years.  Perhaps they were more dedicated to the sport instead of the dollars.  If this project were completed again, the survey should be written with more precise directions so that the choices would be more specific and the wording would be clearer.  This project could be used to show fans around the world that there are many characteristics of a batter, other than just home runs, that should be considered while watching baseball games. The project could also show people that more credit should be given to those batters who excel in the other categories, instead of focusing on only a couple of high profile batters.

 

ACKNOWLEDGEMENTS

                I would like to thank my computer teacher, Mrs. Dansby, for teaching me Microsoft Excel, without which this project would have been impossible.

 

            I would like to thank my science teacher, Mrs. Stevens, for all the time and assistance she has given me.

REFERENCES

1.      Bonavita, M., HITTING BASICS, SPORTING NEWS, JULY 17, 2000

2.      Schmuck, P., DOES SIZE REALLY MATTER?, BASEBALL DIGEST, JULY, 2001

3.      Deveney, S., DEEP THOUGHTS. (BASEBALL PLAYERS GIVE THEIR OPINION ON HITTING HOME RUNS), SPORTING NEWS, APRIL 23, 2001

4.      Vass, G., HOW GAME HAS CHANGED IN 68 YEARS, BASEBALL DIGEST, FEBRUARY, 2002

5.      Johnston, R., ELEMENTARY STATISTICS, seventh edition

6.      COACH EDDIE COLLINS, OKLAHOMA BASEBALL COACH HALL OF FAMER

7.      THE BASEBALL ENCYCLOPEDIA, tenth edition

8.      Baseball statistic website, www.baseballreference.com

9.      Albert, J. and Bennett, J., CURVE BALL, JULY, 2001