Ken's Blog: December 2008

Wednesday, December 03, 2008

Groovy Best Fit Line

An engineer on my team at work asked if I could help with coding an algorithm for calculating a line that best fits a given set of data points. I thought, "what a great excuse to practice some Groovy", and got started right away.

A few Google searches led me to this 10-yr-old page on Regression Functions by Stefan Waner from Hofstra University. Stefan outlined the exact algorithm I was looking for.

Here is my implementation in Groovy:

// Prints the start/end points of a line that best fits 4 sample data points
pts = [[1, 1.5], [2, 1.6], [3, 2.1], [4, 3.0]]
println bestFit(pts)

/**
* Given a set of points, uses a linear regression algorithm to find the start and end points
* of a line that best fits the set of points.
* Returns the two points as [[x1, y1], [x2, y2]].
* The algorithm is from
* http://people.hofstra.edu/stefan_waner/realworld/calctopic1/regression.html.
*/
def bestFit(pts) {
  // Find sums of x, y, xy, x^2
  n = pts.size()
  xSum = pts.collect() {p -> p[0]}.sum()
  ySum = pts.collect() {p -> p[1]}.sum()
  xySum = pts.collect() {p -> p[0]*p[1]}.sum()
  xSqSum = pts.collect() {p -> p[0]*p[0]}.sum()

  // Find m and b such that y = mx + b
  // m is the slope of the line and b is the y-intercept
  m = (n*xySum - xSum*ySum) / (n*xSqSum - xSum*xSum)
  b = (ySum - m*xSum) / n

  // Find start and end points based on the left-most and right-most points
  x1 = pts.collect() {p -> p[0]}.min()
  y1 = m*x1 + b
  x2 = pts.collect() {p -> p[0]}.max()
  y2 = m*x2 + b

  [[x1, y1], [x2, y2]]
}

Running this script prints the following:

[[1, 1.3], [4, 2.8]]

You gotta love Groovy!