import tensorflow as tf
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np
This article primarly discusses on implementation of simple linear regression in both sklearn and tensorflow.
We will consider a toy dataset of diabetes, described below, to perform both linear regression using OLS and shallow neural networks, and learn from both approaches.
= load_diabetes()
data print(data['DESCR'])
.. _diabetes_dataset:
Diabetes dataset
----------------
Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.
**Data Set Characteristics:**
:Number of Instances: 442
:Number of Attributes: First 10 columns are numeric predictive values
:Target: Column 11 is a quantitative measure of disease progression one year after baseline
:Attribute Information:
- age age in years
- sex
- bmi body mass index
- bp average blood pressure
- s1 tc, total serum cholesterol
- s2 ldl, low-density lipoproteins
- s3 hdl, high-density lipoproteins
- s4 tch, total cholesterol / HDL
- s5 ltg, possibly log of serum triglycerides level
- s6 glu, blood sugar level
Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times the square root of `n_samples` (i.e. the sum of squares of each column totals 1).
Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
print("Number of Independent features: ", data['data'].shape[1])
print("Number of Training Instances: ", data['target'].shape[0])
Number of Independent features: 10
Number of Training Instances: 442
Let us divide the entire dataset into train and test set
= data['data']
X= data['target']
y
= train_test_split(X, y, test_size=0.2, random_state=42) X_train, X_test, y_train, y_test
Linear Regression Sci-kit Learn
= LinearRegression()
model
model.fit(X_train,y_train)
= model.predict(X_test)
y_pred
= mean_squared_error(y_test, y_pred)
mse print("Mean Squared Error:", mse)
Mean Squared Error: 2900.193628493482
print("Model Coefficients: \n", model.coef_)
Model Coefficients:
[ 37.90402135 -241.96436231 542.42875852 347.70384391 -931.48884588
518.06227698 163.41998299 275.31790158 736.1988589 48.67065743]
print("Model Intercept: \n", model.intercept_)
Model Intercept:
151.34560453985995
Shallow Neural Network - Linear Regression
We can view the shallow neural network for linear regression in below way
= y.reshape(-1, 1)
y_reshaped
# Concatenate y as an additional column to X
= np.concatenate((X, y_reshaped), axis=1)
X_with_y
# Compute the correlation matrix
= np.corrcoef(X_with_y, rowvar=False) correlation_matrix
print(np.round(correlation_matrix,1))
[[ 1. 0.2 0.2 0.3 0.3 0.2 -0.1 0.2 0.3 0.3 0.2]
[ 0.2 1. 0.1 0.2 0. 0.1 -0.4 0.3 0.1 0.2 0. ]
[ 0.2 0.1 1. 0.4 0.2 0.3 -0.4 0.4 0.4 0.4 0.6]
[ 0.3 0.2 0.4 1. 0.2 0.2 -0.2 0.3 0.4 0.4 0.4]
[ 0.3 0. 0.2 0.2 1. 0.9 0.1 0.5 0.5 0.3 0.2]
[ 0.2 0.1 0.3 0.2 0.9 1. -0.2 0.7 0.3 0.3 0.2]
[-0.1 -0.4 -0.4 -0.2 0.1 -0.2 1. -0.7 -0.4 -0.3 -0.4]
[ 0.2 0.3 0.4 0.3 0.5 0.7 -0.7 1. 0.6 0.4 0.4]
[ 0.3 0.1 0.4 0.4 0.5 0.3 -0.4 0.6 1. 0.5 0.6]
[ 0.3 0.2 0.4 0.4 0.3 0.3 -0.3 0.4 0.5 1. 0.4]
[ 0.2 0. 0.6 0.4 0.2 0.2 -0.4 0.4 0.6 0.4 1. ]]
I’m using the above correlation matrix as weight initializer to check whether the weights will be in similar way to that of the regular OLS.
= X.shape[1]
no_of_features
= tf.keras.initializers.Constant(value= [0.2,0.2,0.3,0.3,0.2,-0.1,0.2,0.3,0.3,0.2])
initializer_weights
= tf.keras.initializers.Constant(value = [1.0])
initializer_bias
= tf.keras.Sequential([
model_tf =1,
tf.keras.layers.Dense(units="linear",
activation=(no_of_features,),
input_shape= initializer_weights,
kernel_initializer = initializer_bias)
bias_initializer
])
compile(optimizer='sgd', loss='mean_squared_error',metrics=['mse'])
model_tf.
=1500, verbose=0)
model_tf.fit(X_train, y_train, epochs
= model_tf.predict(X_test) y_pred_test
3/3 [==============================] - 0s 4ms/step
model_tf.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_7 (Dense) (None, 1) 11
=================================================================
Total params: 11 (44.00 Byte)
Trainable params: 11 (44.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
As we can see above there is just one layer connecting the paramters;
11 Paramters are 10 weight paramters and 1 bias parameter that needs to be learned by the neural network
= mean_squared_error(y_test, y_pred_test.flatten())
mse_tf print("Mean Squared Error:", mse_tf)
Mean Squared Error: 2925.001255544474
= model_tf.layers[0].get_weights()
weights, bias print("Weights:")
print(weights)
print("Bias:")
print(bias)
Weights:
[[ 5.9075676e+01]
[-9.0772835e+01]
[ 3.6696533e+02]
[ 2.5461090e+02]
[ 9.4142288e-02]
[-3.7435982e+01]
[-1.8325632e+02]
[ 1.4889064e+02]
[ 2.8909210e+02]
[ 1.5162097e+02]]
Bias:
[153.55777]
Comparision of Parameters
import pandas as pd
pd.concat(=["Model Coefficients of OLS"]),
[pd.DataFrame(model.coef_, columns=["Model Coefficients of NN"])],
pd.DataFrame(weights,columns= 1
axis )
Model Coefficients of OLS | Model Coefficients of NN | |
---|---|---|
0 | 37.904021 | 59.075676 |
1 | -241.964362 | -90.772835 |
2 | 542.428759 | 366.965332 |
3 | 347.703844 | 254.610901 |
4 | -931.488846 | 0.094142 |
5 | 518.062277 | -37.435982 |
6 | 163.419983 | -183.256317 |
7 | 275.317902 | 148.890640 |
8 | 736.198859 | 289.092102 |
9 | 48.670657 | 151.620972 |
At this juncture, it’s evident that the weights in neural networks are derived through iterative optimization methods, notably gradient descent, optimizing model performance. Conversely, OLS regression employs statistical techniques to determine model coefficients. Consequently, the fundamental nature of these weights differs significantly.
At present, it’s feasible to manually interpret the coefficients derived from OLS regression, aiding in understanding the relationships between variables.
However, this interpretability is not readily achievable with neural networks, underscoring their characterization as black box models primarily designed for prediction rather than inference.
It seems there is ongoing research seeks to enhance interpretability in neural networks which i’m not yet aware at this point of time.