Group, count and sort

Détails: Mis à jour : 5 Juin 2024; Clics : 68319

Index de l'article

Page 17 sur 25

Group, count and sort

List unique values

[code]print(df_Deduped.devise.unique())[/code]

Count rows

[code]number = df.shape[0]
print(number)[/code]

Count values (group by)

If you have a field named Country in a data-frame named df:

[code]CountCountry = df.groupby(['Country'], dropna=False).size().sort_values(ascending=False)
print(CountCountry)[/code]

Count values (group by) and get it in a dataframe

[code]CountCommune = pd.DataFrame(df.groupby(['id_commune'], dropna=False).size(), columns=['Total']).sort_values(['Total'], ascending=False).reset_index()

print(tabulate(CountCommune.head(10), headers='keys', tablefmt='psql', showindex=False))[/code]

Count values grouping by several fields to find duplicates

[code]df_Duplicates = pd.DataFrame(df.groupby(['id_commune', 'PAGES'], dropna=False).size(), columns=['Total'])\
.sort_values(['Total'], ascending=False).reset_index()
df_Duplicates = df_Duplicates[(df_Duplicates['Total'] > 1)]

print(tabulate(df_Duplicates.head(10), headers='keys', tablefmt='psql', showindex=False))[/code]

Group concat

[code]df = df.groupby(['country'])['city'].apply(lambda x: ', '.join(x)).reset_index()[/code]

Count a specific value

[code]print (len(df[df['Your field'] == 'United States']))[/code]

Count with condition

[code]myCount = df[df['Field where search the condition'] == 'A value'].count()['Field to count'][/code]

Count empty values

[code]number = df['Your field'].isna().sum()[/code]

Keep empty values in a group by and fill them

[code]newDf = pd.DataFrame(df.groupby(['Your field'], dropna=False).size(), columns=['Total']).sort_values(['Total'], ascending=False).reset_index()
newDf = newDf.fillna('Unknow')[/code]

Count email domains (group by)

You have a Email field with clean emails. Thanks to Pandas it is easy to group and count their DNS.

First create a virtual field to split the emails and recover the 2nd part of the split. Then just group your new field.

[code]df['Domain'] = df['Email'].str.split('@').str[1]
df_Email_DNS_count = pd.DataFrame(df.groupby(['Domain']).size(), columns=['Total'])\
    .sort_values(['Total'], ascending=False).reset_index()[/code]

Calculate percentages

Acording your needs, you can use the sum from a Total field.

[code]df['Percent'] = (df['Total'] / df['Total'].sum()) * 100
df['Percent'] = df['Percent'].round(decimals=2)[/code]

Or the number of rows in your data-frame (useful if you work with a multi-valued field, see above the Analyze data from a multi-valued field chapter) .

[code]df['Percent'] = (df['Total'] / df.shape[0]) * 100
df['Percent'] = df['Percent'].round(decimals=2)[/code]

Sum group by year

[code]df['Year'] = df['DateInvoice'].dt.to_period('Y')

df_AmountYear = df.groupby(['Year'], dropna=False)['AmountEuro'].sum().reset_index()
df_AmountYear = df_AmountYear.sort_values(['Year'], ascending=False)

print('\nAmount per year:')
print(tabulate(df_AmountYear, headers='keys', tablefmt='psql', showindex=False))[/code]

Sort on a field

[code]df.sort_values(by=['Field'], ascending=False, inplace=True)
# Recreate the index if necessary (if you will use the index later)
df = df.reset_index(drop=True)[/code]

Sort on a field containing both text and numerical values

[code]df = df.loc[pd.to_numeric(df['My field'], errors='coerce').sort_values().index][/code]

Sum and count

If you want sum an Amount field per country counting the number of lines:

[code]df_AmountCountries = df.groupby(['Country'], dropna=False)\
    .agg({'ID': 'count', 'AmountEuro': 'sum'})\
    .reset_index()\
    .rename(columns={'ID': 'Number'})[/code]

Menu

Privé

Tests Fabrik

Connexion

Data management with Python, Pandas, Geopandas, Matplotlib and Openpyxl - Group, count and sort

Index de l'article

Group, count and sort

List unique values

Count rows

Count values (group by)

Count values (group by) and get it in a dataframe

Count values grouping by several fields to find duplicates

Group concat

Count a specific value

Count with condition

Count empty values

Keep empty values in a group by and fill them

Count email domains (group by)

Calculate percentages

Sum group by year

Sort on a field

Sort on a field containing both text and numerical values

Sum and count