## Wednesday, December 03, 2008

### Groovy Best Fit Line

An engineer on my team at work asked if I could help with coding an algorithm for calculating a line that best fits a given set of data points. I thought, "what a great excuse to practice some Groovy", and got started right away.

A few Google searches led me to this 10-yr-old page on Regression Functions by Stefan Waner from Hofstra University. Stefan outlined the exact algorithm I was looking for.

Here is my implementation in Groovy:
``// Prints the start/end points of a line that best fits 4 sample data pointspts = [[1, 1.5], [2, 1.6], [3, 2.1], [4, 3.0]]println bestFit(pts)/*** Given a set of points, uses a linear regression algorithm to find the start and end points* of a line that best fits the set of points.* Returns the two points as [[x1, y1], [x2, y2]].* The algorithm is from* http://people.hofstra.edu/stefan_waner/realworld/calctopic1/regression.html.*/def bestFit(pts) {  // Find sums of x, y, xy, x^2  n = pts.size()  xSum = pts.collect() {p -> p}.sum()  ySum = pts.collect() {p -> p}.sum()  xySum = pts.collect() {p -> p*p}.sum()  xSqSum = pts.collect() {p -> p*p}.sum()  // Find m and b such that y = mx + b  // m is the slope of the line and b is the y-intercept  m = (n*xySum - xSum*ySum) / (n*xSqSum - xSum*xSum)  b = (ySum - m*xSum) / n  // Find start and end points based on the left-most and right-most points  x1 = pts.collect() {p -> p}.min()  y1 = m*x1 + b  x2 = pts.collect() {p -> p}.max()  y2 = m*x2 + b  [[x1, y1], [x2, y2]]}``

Running this script prints the following:
``[[1, 1.3], [4, 2.8]]``

You gotta love Groovy!