Page 18 sur 25
Replace, remove, edit with conditions
Replace empty cells
[code]df['speciality'] = df['speciality'].fillna('Other')[/code]
Replace a value (the full value)
[code]df['Your field'] = df['Your field'].replace(['Old value'],'New value')[/code]
Replace a part of a string using regex
[code]df['Your field'] = df['Your field'].replace({'Old value': 'New value'}, regex=True)[/code]
Replace a part of a string using regex and ignoring the case
[code]df['Your field'] = df['Your field'].replace({'OlD valUe': 'New value'}, regex=True, case=False)[/code]
Replace a string if it contains
[code]df.loc[df['speciality'].str.contains('Researcher'), 'speciality'] = 'Research Scientist'[/code]
If not contains
Add ~:
[code]df.loc[~df['speciality'].str.contains('Researcher'), 'speciality'] = 'Research Scientist'[/code]
Replace comma with point
[code]df['myField'] = df['myField'].replace({'(?<=\d),(?=\d)': '.'}, regex=True)[/code]
Localize and remove rows
[code]ind_drop = df[df['Your field'].apply(lambda x: x == ('A value'))].index
df = df.drop(ind_drop)[/code]
Localize and remove rows starting with ...
[code]ind_drop = df[df['Your field'].apply(lambda x: x.startswith('A value'))].index
df = df.drop(ind_drop)[/code]
Localize and remove rows ending with ...
[code]ind_drop = df[df['Your field'].apply(lambda x: x.endswith('A value'))].index
df = df.drop(ind_drop)[/code]
Localize and replace full rows
[code]df.loc[(df['A field'] == 'TARGET')] = [[NewValue1, NewValue2, NewValue3]][/code]
Localize rows according a regex and edit another field
[code]df.loc[df['Field1'].str.contains(pat='^place ', regex=True), 'Field2'] = 'Yes'[/code]
Replace a field with values from another field with a condition
[code]df['Field1'] = np.where(df['Field1']
.apply(lambda x: x.startswith('A string in condition...')),
df['Field2'], df['Field1'])[/code]
Remove some first characters
Here we delete the 2 first characters if the cell starts with a comma then a space.
[code]df['Field'] = df['Field'].apply(lambda x: x[2:] if x.startswith(', ') else x)[/code]
Keep only some first characters
[code]df['Field'] = df['Field'].apply(lambda x: x[:10])[/code]
Remove some last characters
[code]df['DateInvoice'] = df['DateInvoice'].apply(lambda x: x[:-4] if x.endswith(' UTC') else x)[/code]
Remove the content from a field in another field
[code]df['NewField'] = df.apply(lambda x : x['FieldToWork'].replace(str(x['FieldWithStringToRemove']), ''), axis=1)[/code]
Or with a regex, example to remove the content only if it is at the beginning of the field:
[code]df['NewField'] = df.apply(lambda x : re.sub('^'+str(x['StringToRemove']), '', str(x['FieldToWork'])) if str(x['FieldToWork']).startswith(str(x['StringToRemove'])) else str(x['FieldToWork']), axis=1)[/code]
Edit with a condition
Increment a field if another field is empty.
[code]df.loc[df['My field maybe empty'].notna(), 'Field to increment'] += 1[/code]
Fill a field if a field is greater or equal to another field.
[code]df.loc[df['Field A'] >= df['Field B'], 'Field to fill'] = 'Yes'[/code]
Edit several fields in the same time.
[code]df.loc[df['Field A'] >= df['Field B'], ['Field A to fill', 'Field B to fill']] = ['Yes', 'No'][/code]
Edit with several conditions
Condition "AND" (&)
[code]df.loc[(df['My field maybe empty'].notna()) & (df['An integer field'] == 1) & (df['An string field'] != 'OK'), 'Field to increment'] += 1[/code]
Please replace "&" with a simple &.
Condition "OR" (|)
[code]df.loc[(df['My field maybe empty'].notna()) | (df['An integer field'] == 1) | (df['An string field'] != 'OK'), 'Field to fill'] = 'Yes'[/code]
Edit with IN or NOT IN condition (as SQL)
Just use isin:
[code]df.loc[df['Id field'].isin([531733,569732,652626]), 'Filed to edit'] = 'Yes'[/code]
And for NOT IN:
[code]df.loc[df['Id field'].isin([531733,569732,652626]) == False, 'Filed to edit'] = 'No'[/code]
Replace string beginning with
[code]df['id_commune'] = df['id_commune'].str.replace(r'(^75.*$)', '75056', regex=True)[/code]
Not start with
~df[phone].str.startswith('A')
Or:
~df[phone].str.startswith(('A', 'B'))
Remove letters
[code]df['mobile'] = df['mobile'].str.extract('(\d+)', expand=False).fillna('')[/code]
Extract before or after a string
Example if Job='IT: DBA'
[code]df['type'] = df['Job'].str.split(': ').str[0]
df['speciality'] = df['Job'].str.split(': ').str[1][/code]
Remove all after a string
[code]df_Files['new field'] = df_Files['old field'].str.replace("(StringToRemoveWithAfterToo).*","", regex=True)[/code]
Remove all before a string
[code]df_Files['file'] = df_Files['file'].str.replace("^.*?_","_", regex=True)[/code]
Get in title case
[code]df['firstname'] = df['firstname'].str.title()[/code]
Remove if contains less of n character (lenght)
[code]df.loc[df['mobile'].str.len() < 6, 'mobile'] = ''[/code]
Remove potential spaces before and after a string (trim)
Use .str.strip(), example:
[code]df.loc[df['My field'].astype(str).str.isdigit() == False, 'My field'] = df['My field'].astype(str).str.strip()[/code]
Remove with a function (def)
[code]def MyDeletion():
# Eventually if your df does not exist when you create the function
# global df
ind_drop = df[df['My field'].apply(lambda x: x == ('My value'))].index
df = df.drop(ind_drop)
...
MyDeletion()[/code]
Decode HTML special char
[code]import html
df['My field'] = df['My field'].apply(lambda x: html.unescape(x))[/code]