How to split a column on a delimiter

by cdac   Last Updated October 20, 2019 05:26 AM

I have a CSV file. I need to split the address column on \n so that the parts are separated by commas.

name,address
711-2880,Mankato\n96522\n(257) 563-7401
971-2880,CA\n965\n(01) 563-7401\nNebraska

My code is below:

import pandas as pd
df = pd.read_csv('test.csv')
df.address = df.address.str.split('\n')

My output:

    name    address
0   711-2880    [Mankato\n96522\n(257) 563-7401]
1   971-2880    [CA\n965\n(01) 563-7401\nNebraska]

Expected output:

    name    address
0   711-2880    [Mankato,96522,(257) 563-7401]
1   971-2880    [CA,965,(01) 563-7401,Nebraska]

I need to apply explode after the values are separated by commas.

Tags : python pandas


Answers 2


Your data in the address column is a list, not a string. You first need to access the first element of this list (which is a string), and then do your split.

# Sample Data:
df = pd.DataFrame({
    "name": ['711-2880', '971-2880'], 
    "address": [['Mankato\n96522\n(257) 563-7401'], ['CA\n965\n(01) 563-7401\nNebraska']]}
)

>>> df['address'].apply(lambda col: col[0].split('\n'))
0      [Mankato, 96522, (257) 563-7401]
1    [CA, 965, (01) 563-7401, Nebraska]
Name: address, dtype: object

In the event that some of the address entries are empty, you can work only on the subset of rows that have at least one list item (any items after the first are ignored if a list holds more than one).

mask = df['address'].apply(len).gt(0)
df.loc[mask, 'address'] = df.loc[mask, 'address'].apply(lambda col: col[0].split('\n'))
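
The same guard can also be sketched as a single apply with a conditional, which may be easier to check on a small frame. This is only an illustration under assumptions: the row named '000-0000' and its empty list are hypothetical, added to show an empty entry passing through unchanged.

```python
import pandas as pd

# Hypothetical sample: the second row has an empty list in "address".
df = pd.DataFrame({
    "name": ["711-2880", "000-0000"],
    "address": [["Mankato\n96522\n(257) 563-7401"], []],
})

# Split the first list item on newlines; leave empty lists as they are.
df["address"] = df["address"].apply(
    lambda cell: cell[0].split("\n") if len(cell) > 0 else cell
)

print(df)
```

Rows with an empty list keep their empty list, so a later explode would turn them into a single missing value rather than raising an error.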
Alexander
October 20, 2019 04:55 AM

I copied and pasted your data into a .csv file, read it in the following way, and then split the address using a lambda like so:

import pandas as pd
df = pd.read_csv('file.csv')
df

       name                           address
0  711-2880    Mankato\n96522\n(257) 563-7401
1  971-2880  CA\n965\n(01) 563-7401\nNebraska

df.address = df.address.apply(lambda x: x.split('\\n'))
df

       name                             address
0  711-2880    [Mankato, 96522, (257) 563-7401]
1  971-2880  [CA, 965, (01) 563-7401, Nebraska]

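You could also do it your way with str.split. The file contains the two literal characters \ and n rather than a real newline, so the pattern has to match a backslash followed by n. Make this change: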

df.address.str.split(r'\\n')
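
Since the question also asks for explode after the split, here is a rough end-to-end sketch. It assumes the file really contains a literal backslash followed by n (as the unsplit output in the question suggests) and uses an in-memory string in place of the CSV file. The variable names csv_text and exploded are made up for this illustration, and DataFrame.explode needs pandas 0.25 or newer.

```python
import io
import pandas as pd

# Stand-in for the CSV file: "\\n" below is backslash + n, not a newline.
csv_text = (
    "name,address\n"
    "711-2880,Mankato\\n96522\\n(257) 563-7401\n"
    "971-2880,CA\\n965\\n(01) 563-7401\\nNebraska\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# A pattern longer than one character is treated as a regex by default,
# so r"\\n" matches a literal backslash followed by "n".
df["address"] = df["address"].str.split(r"\\n")

# One row per address part; the name is repeated for each part.
exploded = df.explode("address").reset_index(drop=True)

print(exploded)
```

Each address part ends up on its own row, with the name repeated alongside it; reset_index(drop=True) just replaces the repeated index labels with a fresh 0..n-1 range.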
Derek Eden
October 20, 2019 05:25 AM
