
I have a set of measurements of a surface area (in pixels) that grows exponentially over time. The surface consists of cells that divide over time. After doing some modelling, I came up with the following formula:

$$S(t)=S_{initial}2^{t/a_d},$$

where $a_d$ is the age at which the cell divides. $S_{initial}$ is known. I am trying to estimate $a_d$. I simply tried the $\chi^2$ test:

import numpy as np
import matplotlib.pyplot as plt

# cell_area and time_range hold the measured surface areas and time points (not shown).

# Range of candidate division ages.
a_range = np.linspace(1, 500, 100)

# Empty vector to store the chi-squared value for each candidate age.
chi_sq = np.zeros(len(a_range))

# Iterate through the division ages.
for i in range(len(a_range)):
    # Expected surface area at each time point for this division age.
    expect = cell_area[0] * (2**(time_range / a_range[i]))

    # Compute chi squared (here: the sum of squared residuals).
    chi_sq[i] = np.sum((cell_area - expect)**2)

# Plot chi squared against the division age
plt.plot(a_range, chi_sq, '.')
plt.yscale('log')

# Labelling
plt.xlabel('division age [min]')
_ = plt.ylabel(r'$\chi^2$')

but the minimum always lies at the upper bound of the age range, wherever I set that bound. That doesn't seem right, so I linearized the model:

$$\ln S(t) = \ln S_{initial} + \frac{\ln 2}{a_d}\, t,$$

which is now just simple linear regression with a fixed intercept.

Questions:

  1. Why didn't the first method work?
  2. Are there any resources on how to implement the above regression in Python? I'm new to this and everything I found was very simple, but I don't know how to deal with logs and fix the intercept.
a0142204

3 Answers


You can use sklearn to perform this fit (sklearn.linear_model.LinearRegression).

Set fit_intercept=False and use X = t and Y = ln(S(t)) - ln(S(0)); the slope (a in Y = aX) should then equal (1/a_d)·ln(2).
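
A minimal sketch of that fit, assuming time_range and cell_area are the 1-D numpy arrays from the question:

import numpy as np
from sklearn.linear_model import LinearRegression

# Measured data from the question (assumed to be defined elsewhere).
X = time_range.reshape(-1, 1)                 # t as a column vector
Y = np.log(cell_area) - np.log(cell_area[0])  # ln(S(t)) - ln(S_initial)

# Regression through the origin: Y = a*X with a = ln(2)/a_d
reg = LinearRegression(fit_intercept=False).fit(X, Y)
a_d = np.log(2) / reg.coef_[0]
print(f'estimated division age: {a_d:.1f} min')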

About the analytical result: have you taken into account that if cells are dividing on a surface, some of them might no longer be able to divide as it gets more and more crowded?
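
If crowding does matter, one option (not part of the original model, just a hedged sketch) is to fit a saturating, logistic-style curve directly with scipy.optimize.curve_fit; the carrying capacity S_max and the initial guesses below are assumptions:

import numpy as np
from scipy.optimize import curve_fit

def logistic_area(t, a_d, S_max, S0=cell_area[0]):
    # Logistic growth that doubles every a_d minutes while the surface is still sparse
    r = np.log(2) / a_d
    return S_max / (1 + (S_max / S0 - 1) * np.exp(-r * t))

# Initial guesses (a_d ~ 100 min, S_max ~ twice the largest observed area) are assumptions
popt, pcov = curve_fit(logistic_area, time_range, cell_area, p0=(100.0, 2 * cell_area.max()))
print('estimated a_d:', popt[0], 'estimated S_max:', popt[1])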

mprouveur

The chi-squared test does not serve any purpose here. Nonlinear equations/functions can be handled by transforming them into linear functions: once the nonlinear data/relations are put into linear form, a linear model can be used. The chi-squared test checks for variability, whereas you seem to be interested in the total surface (area), i.e. a linear model and not a linear regression.
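
As a rough sketch of that transformation (assuming the time_range and cell_area arrays from the question; note this version also estimates the intercept rather than fixing it):

import numpy as np

# Ordinary least squares on the log-transformed data: ln(S) = b + m*t, so m = ln(2)/a_d
m, b = np.polyfit(time_range, np.log(cell_area), 1)
a_d_estimate = np.log(2) / m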

Subhash C. Davar
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

# Load cryptocurrency price data
btc_price = pd.read_csv('bitcoin_prices.csv', index_col='date')
eth_price = pd.read_csv('ethereum_prices.csv', index_col='date')
ltc_price = pd.read_csv('litecoin_prices.csv', index_col='date')

# Load sentiment data from social media
btc_sentiment = pd.read_csv('bitcoin_sentiment.csv', index_col='date')
eth_sentiment = pd.read_csv('ethereum_sentiment.csv', index_col='date')
ltc_sentiment = pd.read_csv('litecoin_sentiment.csv', index_col='date')

# Preprocess the data
df = pd.concat([btc_price, btc_sentiment, eth_price, eth_sentiment, ltc_price, ltc_sentiment], axis=1)
df = df.ffill()  # forward-fill missing values

# Train an ARIMA model to predict future cryptocurrency prices
btc_arima = ARIMA(btc_price['close'], order=(1, 1, 1)).fit()
btc_forecast = btc_arima.forecast(steps=30)

# Combine sentiment and price data
X = df[['btc_sentiment_score', 'eth_sentiment_score', 'ltc_sentiment_score', 'btc_close', 'eth_close', 'ltc_close']]
y = df['btc_close'].shift(-1)

# Drop the final row, where the shifted target is NaN
X, y = X.iloc[:-1], y.iloc[:-1]

# Train a linear regression model
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

# Evaluate the model
r2 = r2_score(y, y_pred)
mae = mean_absolute_error(y, y_pred)
rmse = np.sqrt(mean_squared_error(y, y_pred))

print(f'R-squared: {r2:.2f}')
print(f'Mean Absolute Error: {mae:.2f}')
print(f'Root Mean Squared Error: {rmse:.2f}')
MMK