Page 21 sur 25
Filters
Field contains string
[code]df2 = df1[df1['My field'].str.contains('AAA').fillna(False)]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))[/code]
Field not contains a string
Just use ~:
[code]df2 = df1[~df1['My field'].str.contains('AAA').fillna(False)]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))[/code]
Field not empty, not NaN, not null
[code]df_notEmpty= df[
(df['My field'].notnull()) &
(df['My field'].notna()) &
(df['My field'] != '') &
(df['My field'] != 'nan')
][/code]
Filed starts with
[code]df2 = df1[df1['My field'].str.startswith('A')]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))[/code]
Or:
[code]df2 = df1[df1['My field'].str.startswith(('A', 'B'))]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))[/code]
Field is a numeric value
[code]# Function to extract numeric values
def extractNumericValue(my_text):
my_values = re.findall(r'^-?\d+\.\d+|-?\d+$', str(my_text))
if my_values:
return my_values
else:
return None
# Apply
df['SearchNumericValue'] = df['My column'].apply(extractNumericValue)
dfResult = df[
(df['SearchNumericValue'].notna()) &
(~df['My column'].astype(str).str.contains('[a-zA-Z]', regex = True))
]
print(tabulate(dfResult.head(10), headers='keys', tablefmt='psql', showindex=False))[/code]
Field matches regex
Here we search where the field is a number with a comma as decimal separator.
[code]dfTest = df_ImportTxt[df_ImportTxt['My field'].str.match(pat='^\d+,\d+$')==True][/code]
Field contains element from a list
Example searching space, star and parenthesis:
[code]MySearch = ['\*', ' ', '\(', '\)']
df2 = df1[df1['zone'].str.contains('|'.join(MySearch)).fillna(False)]
print(tabulate(df2.head(40), headers='keys', tablefmt='psql', showindex=False))[/code]
Field equals element from a list
[code]MySearch = ['Bbbb', 'XXX']
df2 = df1[df1['zone'].str.fullmatch('|'.join(MySearch)).fillna(False)]
print(tabulate(df2.head(40), headers='keys', tablefmt='psql', showindex=False))[/code]
Or like a SQL IN:
[code]searchIn= ['AAA', 'BBB', 'CCC']
df2 = df1[df1['My field'].isin(searchIn)]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql', showindex=False))[/code]
Field in digit
[code]df2 = df1.loc[df1['zone'].astype(str).str.isdigit()]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql'))[/code]
Several condition
Below we get lines with a field with null values and another field with empty values:
[code]df2 = df1[(df1['My field'].isna()) | (df1['My other field'] == '')]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql', showindex=False))[/code]
Get lines searching if values are IN a list
[code]searchIn= ['AAA', 'BBB', 'CCC']
df2 = df1[df1['My field'].isin(searchIn)]
print(tabulate(df2.head(5), headers='keys', tablefmt='psql', showindex=False))[/code]
Field not finishes with a concatenation from other fields
Suppose you have this dataframe df:
| Language | |
| Python | docA-Python.pdf |
| SQL | docF-SQ.pdf |
| PHP | docS-PHP.pdf |
| CSS | docG-CSS.pdf |
| Javascript | docR-Javascript.pdf |
| HTML | docW-HTML.pdf |
Now you would like extract lines where the PDF field does not finish with [Language field] + '.pdf':
[code]Compare = df[['Language', 'PDF']].apply(lambda x: x['PDF'].endswith(x['Language'] + '.pdf'), axis=1)
df_temp = pd.DataFrame(Compare)
df_temp.columns =['True or false']
df_concat = pd.concat([df, df_temp], axis=1)
df_filtered = df_concat[['Language', 'PDF']][df_concat['True or false'] == False]
print(tabulate(df_filtered.head(5), headers='keys', tablefmt='psql', showindex=False))[/code]
| Language | |
| SQL | docF-SQ.pdf |
Field not contains an email 😀
[code]...
df_pb_Email = df[df['Email field'].str.contains(r'[^@]+@[^@]+\.[^@]+') == False]
...[/code]
Field not equals an email
[code]...
df_pb_Email = df[df['Email field'].str.fullmatch(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b') == False]
...[/code]
Field not contains a number
[code]dfProblem = df[(df['My field'].astype(str).str.match(pat=r'^[-+]?[0-9]+$|^[-+]?[0-9]+\.[0-9]+$') == False)][/code]
Field contains another field
[code]df['A contains B'] = [x[0] in x[1] for x in zip(df['FieldB'], df['FieldA'])]
df= df[df['A contains B'] == False][/code]
Field ends like another field
[code]df['A finish with B'] = [x[1].endswith(x[0]) for x in zip(df['FieldB'], df['FieldA'])]
df= df[df['A finish with B'] == False][/code]
Field with lenght equals a number
[code]dfProbleme = df[
df['Field1'].str.len() != 7
][/code]
Count a string in a field
How many times it appears in a cell:
[code]dfCount = df[(df['Substitutions'].str.count('\\,') > 1)][/code]