Linear Regression

Explore how a line of best fit is calculated to model the relationship between variables. Adjust data points and watch the regression line update in real time.
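As a rough sketch of how such a line can be calculated, the snippet below uses the closed-form ordinary least squares solution for a single feature. This is an assumption: the page does not say which fitting method the interactive demo uses. The names `fit_line` and `predict` are hypothetical, and the data are only the five rows visible in the training-set table further down the page, so the fitted line will not necessarily match the demo's prediction.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# The five rows visible in the training-set table: size in sq ft, price in $1000's.
sizes = [2104, 1416, 1534, 852, 3210]
prices = [400, 232, 315, 178, 870]

slope, intercept = fit_line(sizes, prices)
print(f"price = {slope:.4f} * size + {intercept:.1f}")
print(f"predicted price for 1250 sq ft: {predict(1250, slope, intercept):.0f}k")
```

Dragging a data point in the demo corresponds to changing one entry of `sizes` or `prices` and calling `fit_line` again.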

Supervised Learning Example

Predicting house prices based on size — drag the point to see predictions

Numbers
241k
1.5
−33.2

House sizes and prices

A 1250 ft² house is predicted to cost $241k

Categories
cat2
dog
disease10

Regression model

Predicts numbers

Infinitely many possible outputs

Umbrella Concept

Supervised learning

Data has "right answers"

Model learns from labeled examples

Classification model

Predicts categories

Small number of possible outputs

Terminology

Training set: Data used to train the model

        \(x\) (size in ft²)    \(y\) (price in $1000's)
(1)     2104                   400
(2)     1416                   232
(3)     1534                   315
(4)     852                    178
...     ...                    ...
(47)    3210                   870

Notation

\(x\) = "input" variable (feature)

\(y\) = "output" variable ("target" variable)

\(m\) = number of training examples

\((x, y)\) = single training example

\((x^{(i)}, y^{(i)})\) = \(i^{\text{th}}\) training example

Examples

\(x^{(1)} = 2104\), \(y^{(1)} = 400\)

\((x^{(1)}, y^{(1)}) = (2104, 400)\)

\(x^{(2)} = 1416\)

In this dataset: \(m = 47\)
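One possible way to mirror this notation in code is sketched below. The dictionary `visible_rows` and the helper `training_example` are hypothetical names chosen for illustration. Only the rows shown in the table are included, since the page elides rows (5) through (46).

```python
# Rows visible in the training-set table, keyed by the page's 1-based index i.
# Rows (5) through (46) are elided on the page, so they are absent here too.
visible_rows = {
    1: (2104, 400),
    2: (1416, 232),
    3: (1534, 315),
    4: (852, 178),
    47: (3210, 870),
}

m = 47  # number of training examples, as stated on the page


def training_example(i):
    """Return (x^(i), y^(i)) for index i; raises KeyError for elided rows."""
    return visible_rows[i]


x_1, y_1 = training_example(1)
print(x_1, y_1)                # 2104 400
print(training_example(2)[0])  # 1416
```

Keying by the page's index keeps the code aligned with the 1-based superscripts, at the cost of not being a contiguous list.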

Important: Superscript notation

\(x^{(2)} \neq x^2\)

The superscript \((i)\) in parentheses denotes the index of the training example, not an exponent.
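The difference between an index and an exponent can be illustrated with a short sketch. The helper `x_at` is a hypothetical name, and the list holds only the first four sizes from the training-set table.

```python
# Sizes x^(1)..x^(4) from the training-set table, in square feet.
sizes = [2104, 1416, 1534, 852]


def x_at(i):
    """Return x^(i) for the page's 1-based index i (Python lists are 0-based)."""
    if not 1 <= i <= len(sizes):
        raise IndexError(f"no training example with index {i}")
    return sizes[i - 1]


second_example = x_at(2)      # x^(2): an index, selects the second row
first_squared = x_at(1) ** 2  # (x^(1))^2: an actual exponent

print(second_example)  # 1416
print(first_squared)   # 4426816
```

Here `x_at(2)` looks up a row, while `** 2` performs arithmetic on a value; the parenthesized superscript corresponds only to the former.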