I imported a CSV file into pandas as shown below:

>>> import pandas
>>> df = pandas.read_csv('file.csv',names=['count', 'province', 'city', 'district', 'region', 'area'])
>>> df.head()
   count province         city           district region area
0   7923     Aceh   Aceh Barat   Arongan Lambalek            
1    628     Aceh   Aceh Barat     Johan Pahlawan            
2    235     Aceh   Aceh Barat        Woyla Timur            
4   3900     Aceh   Banda Aceh                         
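
To try the import step without the original file, here is a minimal sketch that reads a few header-less lines from an in-memory string. The sample text and the `load_sample` helper are my own stand-ins built from the rows shown above, not the real `file.csv`; the column list is the one used in the post.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the rows shown above; the real file.csv
# is not reproduced here.
SAMPLE_CSV = """7923,Aceh,Aceh Barat,Arongan Lambalek,,
628,Aceh,Aceh Barat,Johan Pahlawan,,
235,Aceh,Aceh Barat,Woyla Timur,,
3900,Aceh,Banda Aceh,,,
"""

COLUMNS = ['count', 'province', 'city', 'district', 'region', 'area']


def load_sample(text=SAMPLE_CSV):
    # Passing names= labels the columns; because no header= is given,
    # the first line of the input is treated as data, not as a header.
    return pd.read_csv(io.StringIO(text), names=COLUMNS)


sample = load_sample()
print(sample.dtypes)
```

Empty fields come back as NaN, so it is worth checking the dtypes before aggregating.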

Using SQL, I could do something like this:

    SELECT SUM(count) AS sum, district 
        FROM table WHERE city = 'Aceh Barat' 
        GROUP BY district 
        ORDER BY sum DESC

Using the pandas Python library, I can achieve the same result with:

>>> import pandas, numpy
>>> df = pandas.read_csv('file.csv',names=['count', 'province', 'city', 'district', 'region', 'area'])
>>> df[df['city'] == 'Aceh Barat'].groupby('district')[['count']].sum().sort_values('count', ascending=False)
                  count
district               
Arongan Lambalek   7923
Johan Pahlawan      628
Woyla Timur         235
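
To make the SQL-to-pandas mapping explicit, here is a small self-contained sketch that annotates each step with the SQL clause it stands in for. The tiny DataFrame is rebuilt from the rows shown earlier, and `sum_by_district` is a hypothetical helper name of mine, not part of pandas.

```python
import pandas as pd

# Rows copied from the df.head() output earlier in the post.
df = pd.DataFrame({
    'count': [7923, 628, 235, 3900],
    'city': ['Aceh Barat', 'Aceh Barat', 'Aceh Barat', 'Banda Aceh'],
    'district': ['Arongan Lambalek', 'Johan Pahlawan', 'Woyla Timur', None],
})


def sum_by_district(frame, city):
    # Hypothetical helper: one pandas step per SQL clause.
    return (
        frame[frame['city'] == city]       # WHERE city = ...
        .groupby('district')['count']      # GROUP BY district
        .sum()                             # SUM(count)
        .sort_values(ascending=False)      # ORDER BY sum DESC
        .rename('sum')                     # AS sum
        .reset_index()                     # district back to a column
    )


result = sum_by_district(df, 'Aceh Barat')
print(result)
```

Selecting the `count` column before summing keeps the text columns out of the aggregation, which is the closest match to `SELECT SUM(count), district`.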

>>> df[df['city'] == 'Medan']
        count        province   city            district region area
10340  108769  Sumatera Utara  Medan                 NaN    NaN  NaN
10341     759  Sumatera Utara  Medan        Medan Amplas    NaN  NaN
10342     579  Sumatera Utara  Medan        Medan Amplas    NaN  NaN
10343    1272  Sumatera Utara  Medan        Medan Amplas    NaN  NaN
10344     769  Sumatera Utara  Medan        Medan Amplas    NaN  NaN
10345     379  Sumatera Utara  Medan        Medan Amplas    NaN  NaN
10346     988  Sumatera Utara  Medan        Medan Amplas    NaN  NaN
10347    4395  Sumatera Utara  Medan          Medan Area    NaN  NaN
10348    5598  Sumatera Utara  Medan         Medan Barat    NaN  NaN

>>> df[df['city'] == 'Medan'].groupby('district')[['count']].sum().sort_values('count', ascending=False)
                    count
district                 
Medan Tuntungan      7425
Medan Tembung        6349
Medan Barat          5598
Medan Timur          5378
Medan Amplas         4746
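
One thing to watch: the Medan row with a NaN district (count 108769) does not appear in the grouped result, presumably because `groupby` drops NaN keys by default. The sketch below shows that behaviour on a few rows copied from the Medan output. It assumes pandas 1.1 or newer, where `groupby` accepts `dropna=False`; the variable names are my own.

```python
import pandas as pd

# A few rows copied from the Medan output above; the first row has no district.
medan = pd.DataFrame({
    'count': [108769, 759, 579, 4395, 5598],
    'district': [None, 'Medan Amplas', 'Medan Amplas', 'Medan Area', 'Medan Barat'],
})

# Default behaviour: rows whose group key is NaN are left out.
default = medan.groupby('district')['count'].sum()

# Assumes pandas >= 1.1: keep the NaN key as its own group.
with_nan = medan.groupby('district', dropna=False)['count'].sum()

print(default)
print(with_nan)
```

If that NaN row is a city-level total, dropping it is what you want; if it is real data with a missing district, `dropna=False` keeps it visible.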
