by ozw1z5rd
Last Updated August 14, 2018 14:19 PM

I have a data set where some columns have discrete values like $x_2=('cat','dog','penguin')$, $x_3=('high','low')$, etc. How do I handle these values before running a regression?

Do I have to convert them into integers like $x_2=(0,1,2)$, $x_3=(0,1)$?

Do I have to add more columns $x_{cat}, x_{dog}, x_{penguin}, x_{high}, x_{low}$ and assign each of them a value of 0 or 1?

Converted to an answer from my comments.

Most modern software does this for you. It does something similar to what you outline in your last sentence, for some meaning of similar. I would recommend letting your software do the heavy lifting here; if it does not offer this facility, choose different software. There are some hints in the Q&A "Dummy variables for categories in logistic regression and odd ratio", which is about logistic regression but applies to linear and Poisson regression as well.
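One way to read "similar", assuming the common reference-level (treatment) coding: for a variable with $k$ categories the software typically creates $k-1$ indicator columns rather than $k$, because all $k$ indicators together with an intercept would be perfectly collinear. Taking 'cat' and 'high' as the reference levels (an arbitrary choice for illustration; the actual default depends on the software), a linear model would look like:

$$y = \beta_0 + \beta_{dog}\, x_{dog} + \beta_{penguin}\, x_{penguin} + \beta_{low}\, x_{low} + \varepsilon$$

Under this coding, $\beta_{dog}$ is the expected difference in $y$ between 'dog' and the reference level 'cat' with $x_3$ held fixed, and $\beta_0$ is the expected value for a 'cat' with 'high'.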

Note that your integer coding $x_2=(0,1,2)$ may work as long as you tell the software these are categories, not numerical values. Internally the software will do something like your last suggestion ($x_{cat}$ and so on).
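As a rough illustration of both points, here is a minimal sketch in Python with pandas. The toy data and the column names `x2` and `x3` are made up to mirror the question, and pandas is just one possible choice of software. It marks the columns as categorical and then expands them into 0/1 indicator columns, dropping one reference level per variable.

```python
import pandas as pd

# Hypothetical toy data; the column names x2 and x3 follow the question.
df = pd.DataFrame({
    "x2": ["cat", "dog", "penguin", "dog"],
    "x3": ["high", "low", "low", "high"],
})

# Tell the software these are categories, not numerical values.
df["x2"] = df["x2"].astype("category")
df["x3"] = df["x3"].astype("category")

# Expand each categorical column into 0/1 indicator columns.
# drop_first=True omits the first level of each variable (here 'cat' and
# 'high'), which then acts as the reference level in a model with an intercept.
dummies = pd.get_dummies(df, columns=["x2", "x3"], drop_first=True, dtype=int)
print(dummies)
```

The resulting frame can be passed to a regression routine as ordinary numeric columns. Many modelling libraries perform this expansion themselves once a column is declared categorical, so doing it by hand is mostly useful for seeing what happens internally.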

