# Classifying one variable into two different parts with physical meaning

by Han Zhengzu   Last Updated May 16, 2018 14:19 PM

I'm not very proficient in statistical analysis. I describe the background of a practical problem here in the hope of getting some advice.

## Background

I'm working on measurements of atmospheric pollutants and the identification of their sources. For a specific atmospheric species \$A\$, the origins are complex and can be divided into two categories:

(1) \$A\$ is emitted directly by human activities, e.g., vehicle tailpipes and industrial processes.

(2) \$A\$ is not emitted from the ground, but forms in the atmosphere through the transformation of other species.

Therefore, \$A\$ is the combination of primary-emitted \$A\$ (\$A_p\$) and secondary-formed \$A\$ (\$A_s\$). Estimating the fractions of \$A_p\$ and \$A_s\$ is crucial for air quality management, but current instruments offer no direct way to distinguish them.

## My thought

Since dividing \$A\$ into two parts is hard with chemical analysis, I am considering a statistical approach to the problem:
\$\$A = A_{p}+A_{s}\$\$

I also measured other species: \$P_1, P_2\$, which originate mainly from direct emission, are stable in the atmosphere, and are connected with \$A_p\$; and \$S_1, S_2\$, which come mainly from secondary formation and are connected with \$A_s\$.

For the time series of \$A, P_1, P_2, S_1, S_2\$, I think that predicting \$A_p\$ from \$P_1, P_2\$ and \$A_s\$ from \$S_1, S_2\$ would be a meaningful way to divide \$A\$. From previous work, I know that multiple linear regression (MLR) could be an option:
\$\$A_{p,\ predict} = a + b P_1 + c P_2\$\$
\$\$A_{s,\ predict} = d + e S_1 + f S_2\$\$
However, the sum of \$A_{p,\ predict}\$ and \$A_{s,\ predict}\$ is not guaranteed to equal the measured \$A\$ in the time series.
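One way to make the two parts add up to the measured \$A\$ might be to fit both groups of tracers in a single regression on \$A\$ and then split the fitted value by group. The sketch below is only an illustration under stated assumptions: the data are synthetic, the function name `split_primary_secondary` is hypothetical, and the intercepts \$a\$ and \$d\$ are dropped because a constant term cannot be attributed to either part. Whatever the model does not explain is kept as a residual instead of being forced into \$A_p\$ or \$A_s\$.

```python
import numpy as np


def split_primary_secondary(A, P, S):
    """Jointly fit A ~ P @ b + S @ e (no intercept) by ordinary least squares.

    A: (n,) measured series; P: (n, k_p) primary tracers; S: (n, k_s) secondary tracers.
    Returns the primary part, the secondary part, the residual, and the coefficients.
    """
    A = np.asarray(A, dtype=float)
    P = np.asarray(P, dtype=float)
    S = np.asarray(S, dtype=float)

    # One design matrix, so both groups of tracers compete to explain the same A.
    X = np.hstack([P, S])
    coef, *_ = np.linalg.lstsq(X, A, rcond=None)

    n_p = P.shape[1]
    A_p = P @ coef[:n_p]   # part of the fit carried by the primary tracers
    A_s = S @ coef[n_p:]   # part of the fit carried by the secondary tracers
    residual = A - (A_p + A_s)
    return A_p, A_s, residual, coef


# Synthetic data, made up purely for illustration (not real measurements).
rng = np.random.default_rng(0)
n = 500
P = rng.gamma(2.0, 1.0, size=(n, 2))   # stand-ins for P_1, P_2
S = rng.gamma(2.0, 1.0, size=(n, 2))   # stand-ins for S_1, S_2
true_coef = np.array([1.5, 0.5, 2.0, 1.0])
A = np.hstack([P, S]) @ true_coef + rng.normal(0.0, 0.1, size=n)

A_p, A_s, residual, coef = split_primary_secondary(A, P, S)
print("estimated coefficients:", np.round(coef, 2))
print("mean primary fraction of fitted A:", round(float(np.mean(A_p / (A_p + A_s))), 2))
```

By construction, `A_p + A_s + residual` reproduces the measured series exactly, so the size of the residual shows how much of \$A\$ the tracers leave unexplained. Plain least squares does not keep the coefficients non-negative, so a negative coefficient would be a sign that this simple split is not physically meaningful for the data at hand.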

Are there any suitable and advanced methods to tackle my problem?
