# Pandas in Python for Data Analysis with Examples (Step-by-Step Guide)

### Getting Started with Pandas for Beginners

Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the NumPy package, and its key data structure is the DataFrame. DataFrames let you store and manipulate tabular data in rows of observations and columns of variables.

If you're new to this, first set up your environment by following our previous posts:

- Getting Started with Jupyter [Part 1] http://www.androidxu.com/2017/04/guide-On-Jupyter-Notebook.html
- Getting Started with Jupyter [Part 2] http://www.androidxu.com/2017/04/the-ultimate-guide-on-jupyter-ipython-mardown.html#.WPJOBYVOL4g

pandas is well suited for:

- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure

Key features include:

- Easy handling of **missing data**
- **Size mutability**: columns can be inserted and deleted from DataFrame and higher dimensional objects
- Automatic and explicit **data alignment**: objects can be explicitly aligned to a set of labels, or the data can be aligned automatically
- Powerful, flexible **group by functionality** to perform split-apply-combine operations on data sets
- Intelligent label-based **slicing, fancy indexing, and subsetting** of large data sets
- Intuitive **merging and joining** of data sets
- Flexible **reshaping and pivoting** of data sets
- **Hierarchical labeling** of axes
- Robust **IO tools** for loading data from flat files, Excel files, databases, and HDF5
- **Time series functionality**: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.
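A few of these features can be sketched on a tiny DataFrame. The city names, temperatures, and column names below are made-up sample data for illustration only, not part of the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per observation, one column per variable.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "temp_c": [3.5, np.nan, 19.0, 21.0],
})

# Missing data: NaN is skipped by default in reductions such as mean().
overall_mean = df["temp_c"].mean()

# Size mutability: add a derived column to the existing DataFrame.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# Group by: split the rows by city, apply mean, combine into one Series.
mean_by_city = df.groupby("city")["temp_c"].mean()

print(overall_mean)
print(mean_by_city)
```

The rest of this post focuses on Series, the one-dimensional building block that each DataFrame column is made of.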

documentation: http://pandas.pydata.org/pandas-docs/stable/10min.html

### Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html

In [38]:

```
# Import the pandas and NumPy libraries
import pandas as pd
import numpy as np
```

#### Create series from NumPy array

Create a basic Series from a NumPy array. The number of labels in `index` must equal the number of elements in the array.
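As a minimal sketch of the length requirement above (the variable names are illustrative, not from the original notebook), passing an `index` whose length differs from the data should raise a `ValueError`:

```python
import numpy as np
import pandas as pd

values = np.random.randn(3)

try:
    # Three values but only two labels: the lengths do not match.
    pd.Series(values, index=['a', 'b'])
    mismatch_raised = False
except ValueError as error:
    mismatch_raised = True
    print("ValueError:", error)

# With one label per value, construction succeeds.
ok_series = pd.Series(values, index=['a', 'b', 'c'])
```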

In [39]:

```
my_simple_series = pd.Series(np.random.randn(7), index=['a', 'b', 'c', 'd', 'e','f','g'])
my_simple_series
```

Out[39]:

In [40]:

```
my_simple_series.index
```

Out[40]:

#### Create series from NumPy array, without explicit index

In [41]:

```
my_simple_series = pd.Series(np.random.randn(5))
my_simple_series
```

Out[41]:

Access a series like a NumPy array

In [42]:

```
my_simple_series[:3]
```

Out[42]:
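Beyond the simple slice above, a Series supports other ndarray-style access patterns. The following is a small sketch using made-up integer data so the results are easy to follow:

```python
import pandas as pd

numbers = pd.Series([10, 20, 30, 40, 50])

# Slice by position, like an ndarray; the original index labels are kept.
first_three = numbers[:3]

# Boolean mask: keep only the elements that satisfy a condition.
above_25 = numbers[numbers > 25]

# Explicit positional access to a single element.
last_value = numbers.iloc[-1]
```

Using `.iloc` for single-element positional access avoids any ambiguity between positions and integer labels.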

##### Create series from Python dictionary

In [43]:

```
my_dictionary = {'a' : 45., 'b' : -19.5, 'c' : 4444}
my_second_series = pd.Series(my_dictionary)
my_second_series
```

Out[43]:

Access a series like a dictionary

In [44]:

```
my_second_series['b']
```

Out[44]:

Note that the display order matches the order given in `index`, and that a label with no entry in the dictionary (here `'d'`) gets the value NaN:

In [45]:

```
pd.Series(my_dictionary, index=['b', 'c', 'd', 'a'])
```

Out[45]:
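The missing label can also be detected or dropped programmatically. This is a small sketch that rebuilds the same Series as the cell above; the variable names are illustrative:

```python
import pandas as pd

my_dictionary = {'a': 45., 'b': -19.5, 'c': 4444}
reordered = pd.Series(my_dictionary, index=['b', 'c', 'd', 'a'])

# isna() marks the entries that have no value.
missing_mask = reordered.isna()
missing_labels = reordered[missing_mask].index.tolist()

# dropna() returns a new Series without the missing entries.
complete = reordered.dropna()
```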

In [46]:

```
my_second_series.get('a')
```

Out[46]:

In [47]:

```
unknown = my_second_series.get('f')
type(unknown)
```

Out[47]:
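The two cells above show that `get` returns `None` for an unknown label rather than raising an error. A short sketch of the related options, assuming the same data as `my_second_series`:

```python
import pandas as pd

series = pd.Series({'a': 45., 'b': -19.5, 'c': 4444})

found = series.get('a')          # value stored under label 'a'
absent = series.get('f')         # None, no exception raised
fallback = series.get('f', 0.0)  # custom default instead of None
has_f = 'f' in series            # membership tests the index labels

# Bracket access, by contrast, raises KeyError for an unknown label.
try:
    series['f']
    key_error_raised = False
except KeyError:
    key_error_raised = True
```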

##### Create series from scalar

If the data is a scalar value, an index must be provided. The value is repeated to match the length of the index.

In [48]:

```
pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
```

Out[48]:

### Vectorized Operations

- It is not necessary to write loops for element-by-element operations
- pandas' Series objects can be passed to *most* NumPy functions
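To make the first point concrete, here is a minimal sketch comparing an explicit loop with the equivalent vectorized expression. The data and variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])

# Explicit loop: visit each (label, value) pair by hand.
looped = pd.Series({label: value * 2 for label, value in s.items()})

# Vectorized: one expression, no Python-level loop.
vectorized = s * 2

# A NumPy function applied to a Series returns a Series with the same index.
roots = np.sqrt(s)
```

Both approaches give the same values, but the vectorized form is shorter and is usually much faster on large data.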

In [49]:

```
my_dictionary = {'a' : 45., 'b' : -19.5, 'c' : 4444}
my_series = pd.Series(my_dictionary)
my_series
```

Out[49]:

#### Add Series without loop

In [50]:

```
my_series + my_series
```

Out[50]:

In [51]:

```
my_series
```

Out[51]:

##### Series within arithmetic expression

In [52]:

```
# Add a scalar to every element of the series
my_series + 5
```

Out[52]:

##### Series used as argument to NumPy function

In [53]:

```
np.exp(my_series)
```

Out[53]:

A key difference between Series and ndarray is that operations between Series automatically align the data based on label. Thus, you can write computations without giving consideration to whether the Series involved have the same labels. Labels that appear in only one of the operands produce NaN in the result.

In [54]:

```
my_series[1:]
```

Out[54]:

In [55]:

```
my_series[:-1]
```

Out[55]:

In [56]:

```
my_series[1:] + my_series[:-1]
```

Out[56]:
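The cells above add two slices of the same Series, so only the shared label has a value in both operands. The same alignment behavior can be sketched with two small, made-up Series, along with `Series.add` and its `fill_value` parameter for cases where NaN is not the desired result:

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Values are matched by label, not by position.
# 'a' and 'd' exist in only one operand, so they become NaN.
total = left + right

# add() with fill_value treats a label missing from one side as 0.
filled = left.add(right, fill_value=0)
```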

### Apply Python functions on an element-by-element basis

In [57]:

```
def multiply_by_ten(input_element):
    # Called once per element by Series.map
    return input_element * 10.0
```

In [58]:

```
my_series.map(multiply_by_ten)
```

Out[58]:
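For simple arithmetic like this, `map` and a vectorized expression should give the same result. The sketch below checks that on the same data as `my_series`. It also shows `map` with a dictionary, where the lookup table `grade_names` is a hypothetical example:

```python
import pandas as pd

my_series = pd.Series({'a': 45., 'b': -19.5, 'c': 4444})

def multiply_by_ten(input_element):
    return input_element * 10.0

mapped = my_series.map(multiply_by_ten)
vectorized = my_series * 10.0
same_result = mapped.equals(vectorized)

# map also accepts a dictionary; values without a matching key become NaN.
grades = pd.Series(['a', 'b', 'z'])
grade_names = {'a': 'excellent', 'b': 'good'}  # hypothetical lookup table
named = grades.map(grade_names)
```

`map` is most useful when the per-element logic cannot be expressed as a vectorized operation.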

### Vectorized string methods

Series is equipped with a set of string processing methods that make it easy to operate on each element of the array. Perhaps most importantly, these methods exclude missing/NA values automatically.

In [59]:

```
series_of_strings = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
```

In [60]:

```
series_of_strings.str.lower()
```

Out[60]:
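A few other `.str` methods behave the same way with missing values. The following sketch uses a shortened version of the Series above; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

words = pd.Series(['A', 'Aaba', np.nan, 'dog'])

# The missing entry is skipped and stays missing in the result.
lowered = words.str.lower()
lengths = words.str.len()

# For boolean results, na= chooses what a missing entry should map to.
has_a = words.str.contains('a', na=False)
```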
