Pandas Library
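- The name derives from "panel data" and is also a play on "Python data analysis".
- A library that provides data structures for intuitively manipulating relational or labeled data.
- Pandas is well suited to analyzing the following kinds of data:
  - SQL tables or Excel spreadsheets
  - Ordered or unordered time series data
  - Other forms of observational / statistical data sets
- To make data analysis easier, Pandas provides the following data structures:
  - Series
    - 1-dimensional array
    - Can carry an index (axis labels).
  - DataFrame
    - 2-dimensional array (rows x columns)
    - Each column can have a different data type.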
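The two structures listed above can be sketched as follows. The labels, column names, and values are made up for illustration and do not come from any real data set.

```python
import pandas as pd

# Series: a 1-dimensional array with an index (labels are illustrative)
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# DataFrame: a 2-dimensional table whose columns may hold different dtypes
df = pd.DataFrame(
    {
        "name": ["Kim", "Lee", "Park"],  # strings
        "age": [31, 25, 42],             # integers
        "score": [88.5, 92.0, 79.5],     # floats
    }
)

print(s["b"])     # access by label -> 20
print(df.shape)   # (rows, columns) -> (3, 3)
print(df.dtypes)  # one dtype per column
```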
Installation
Installation Command
# Installing from PyPI
pip install pandas
# Installing with conda (Anaconda / Miniconda)
conda install pandas
# Installing using Linux Distribution's Package Manager
sudo apt-get install python3-pandas # Ubuntu, Debian
sudo yum install python3-pandas # CentOS / RHEL (EPEL repository)
Import
# Recommended import style
import pandas as pd
Version Check
import pandas as pd
print(pd.__version__)  # prints the installed pandas version string
Input/Output Methods
- Pandas provides the following functions for data input and output.
Category | Methods | Description |
Pickling | read_pickle(filepath_or_buffer[, ...]) | Load a pickled pandas object from a file. |
DataFrame.to_pickle(path[, compression, ...]) | Pickle (serialize) the object to a file. | |
Flat file | read_table(filepath_or_buffer[, sep, ...]) | Read a general delimited file into a DataFrame. |
read_csv(filepath_or_buffer[, sep, ...]) | Read a comma-separated values (CSV) file into a DataFrame. | |
DataFrame.to_csv([path_or_buf, sep, na_rep, ...]) | Write the object to a comma-separated values (CSV) file. | |
read_fwf(filepath_or_buffer[, colspecs, ...]) | Read a table of fixed-width formatted lines into a DataFrame. | |
Clipboard | read_clipboard([sep]) | Read text from the clipboard and pass it to read_csv(). |
DataFrame.to_clipboard([excel, sep]) | Copy the object to the system clipboard. | |
Excel | read_excel(io[, sheet_name, header, names, ...]) | Read an Excel file into a pandas DataFrame. |
DataFrame.to_excel(excel_writer[, ...]) | Write the object to an Excel sheet. | |
ExcelFile.parse([sheet_name, header, names, ...]) | Parse the specified Excel sheet(s) into a DataFrame. | |
Styler.to_excel(excel_writer[, sheet_name, ...]) | Write the Styler to an Excel sheet. | |
ExcelWriter(path[, engine, date_format, ...]) | Class for writing DataFrame objects into Excel sheets. | |
JSON | read_json([path_or_buf, orient, typ, dtype, ...]) | Convert a JSON string to a pandas object. |
json_normalize(data[, record_path, meta, ...]) | Normalize semi-structured JSON data into a flat table. | |
DataFrame.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. | |
build_table_schema(data[, index, ...]) | Create a Table schema from the data. | |
HTML | read_html(io[, match, flavor, header, ...]) | Read HTML tables into a list of DataFrame objects. |
DataFrame.to_html([buf, columns, col_space, ...]) | Render a DataFrame as an HTML table. | |
Styler.to_html([buf, table_uuid, ...]) | Write the Styler to a file, buffer, or string in HTML-CSS format. | |
XML | read_xml(path_or_buffer[, xpath, ...]) | Read an XML document into a DataFrame. |
DataFrame.to_xml([path_or_buffer, index, ...]) | Render a DataFrame as an XML document. | |
SQL | read_sql_table(table_name, con[, schema, ...]) | Read a SQL database table into a DataFrame. |
read_sql_query(sql, con[, index_col, ...]) | Read the result of a SQL query into a DataFrame. | |
read_sql(sql, con[, index_col, ...]) | Read a SQL query or database table into a DataFrame. | |
DataFrame.to_sql(name, con[, schema, ...]) | Write the records stored in a DataFrame to a SQL database. | |
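As a minimal sketch of how the reader/writer pairs in the table fit together, the example below round-trips a small DataFrame through CSV and JSON text. It uses in-memory buffers instead of real files so that it runs anywhere; the column names and values are invented for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "city": ["Seoul", "Busan", "Daegu"]})

# With no path argument, to_csv returns the CSV text as a string
csv_text = df.to_csv(index=False)

# read_csv accepts a path or any file-like object
restored = pd.read_csv(io.StringIO(csv_text))

# The same pattern with JSON (one JSON object per row)
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text))

print(csv_text)
print(json_text)
```

In practice a file path such as a hypothetical "data.csv" would usually replace the buffers.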
General Functions
Category | Function | Description |
Data Manipulations |
melt(frame[, id_vars, value_vars, var_name, ...]) | Unpivot a DataFrame from wide to long format, optionally leaving identifiers set. |
pivot(data[, index, columns, values]) | Return reshaped DataFrame organized by given index / column values. | |
pivot_table(data[, values, index, columns, ...]) | Create a spreadsheet-style pivot table as a DataFrame. | |
crosstab(index, columns[, values, rownames, ...]) | Compute a simple cross tabulation of two (or more) factors. | |
cut(x, bins[, right, labels, retbins, ...]) | Bin values into discrete intervals. | |
qcut(x, q[, labels, retbins, precision, ...]) | Quantile-based discretization function. | |
merge(left, right[, how, on, left_on, ...]) | Merge DataFrame or named Series objects with a database-style join. | |
merge_ordered(left, right[, on, left_on, ...]) | Perform a merge for ordered data with optional filling/interpolation. | |
merge_asof(left, right[, on, left_on, ...]) | Perform a merge by key distance. | |
concat(objs[, axis, join, ignore_index, ...]) | Concatenate pandas objects along a particular axis with optional set logic along the other axes. | |
get_dummies(data[, prefix, prefix_sep, ...]) | Convert categorical variable into dummy/indicator variables. | |
factorize(values[, sort, na_sentinel, size_hint]) | Encode the object as an enumerated type or categorical variable. | |
unique(values) | Return unique values based on a hash table. | |
wide_to_long(df, stubnames, i, j[, sep, suffix]) | Unpivot a DataFrame from wide to long format. | |
Top-Level Missing Data |
isna(obj) | Detect missing values for an array-like object. |
isnull(obj) | Detect missing values for an array-like object. | |
notna(obj) | Detect non-missing values for an array-like object. | |
notnull(obj) | Detect non-missing values for an array-like object. | |
Top-Level dealing with Numeric Data |
to_numeric(arg[, errors, downcast]) | Convert argument to a numeric type. |
Top-Level dealing with Datetimelike Data |
to_datetime(arg[, errors, dayfirst, ...]) | Convert argument to datetime. |
to_timedelta(arg[, unit, errors]) | Convert argument to timedelta. | |
date_range([start, end, periods, freq, tz, ...]) | Return a fixed frequency DatetimeIndex. | |
bdate_range([start, end, periods, freq, tz, ...]) | Return a fixed frequency DatetimeIndex, with business day as the default frequency. | |
period_range([start, end, periods, freq, name]) | Return a fixed frequency PeriodIndex. | |
timedelta_range([start, end, periods, freq, ...]) | Return a fixed frequency TimedeltaIndex, with day as the default frequency. | |
infer_freq(index[, warn]) | Infer the most likely frequency given the input index. | |
Top-Level dealing with Interval Data |
interval_range([start, end, periods, freq, ...]) | Return a fixed frequency IntervalIndex. |
Top-Level Evaluation |
eval(expr[, parser, engine, truediv, ...]) | Evaluate a Python expression as a string using various backends. |
Hashing | util.hash_array(vals[, encoding, hash_key, ...]) | Given a 1d array, return an array of deterministic integers. |
util.hash_pandas_object(obj[, index, ...]) | Return a data hash of the Index/Series/DataFrame. | |
Testing | test([extra_args]) | Run the pandas test suite using pytest. |
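A few of the top-level functions from the table can be sketched together as below. All frames, column names, and bin edges are hypothetical and chosen only to keep the results easy to follow.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# merge: database-style join on a shared column (inner join by default)
joined = pd.merge(left, right, on="key")

# concat: stack objects along an axis; ignore_index renumbers the rows
stacked = pd.concat([left, left], ignore_index=True)

# melt: wide -> long, pivot: long -> wide
wide = pd.DataFrame({"id": [1, 2], "math": [90, 80], "art": [70, 60]})
long_df = pd.melt(wide, id_vars="id", var_name="subject", value_name="score")
back = long_df.pivot(index="id", columns="subject", values="score")

# cut: bin values into discrete, labeled intervals
levels = pd.cut(
    pd.Series([5, 15, 25]), bins=[0, 10, 20, 30], labels=["low", "mid", "high"]
)

# to_datetime / date_range: datetime conversion and fixed-frequency ranges
ts = pd.to_datetime("2022-09-05")
days = pd.date_range(start="2022-09-05", periods=3, freq="D")

print(joined)
print(long_df)
print(levels.tolist())
```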
Series
- A one-dimensional array data structure provided by Pandas.
- Supports indexing and slicing; string labels can be assigned as the index and used for access.
- Has a data type (dtype).
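The points above can be illustrated with a short sketch. The labels and values are arbitrary.

```python
import pandas as pd

# A Series with string labels as its index
s = pd.Series([3.5, 1.2, 4.8, 2.0], index=["w", "x", "y", "z"])

by_label = s["y"]           # label-based access
by_position = s.iloc[0]     # position-based access
label_slice = s["x":"y"]    # label slice: both endpoints are included
pos_slice = s.iloc[1:3]     # positional slice: the end is excluded

print(by_label, by_position)
print(label_slice)
print(s.dtype)              # float64
```

Note that a label slice includes its end label, unlike a positional slice.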
Series Attributes
Category | Attribute | Description |
Axes (Axis) |
Series.index | The index (axis labels) of the Series; assigning to it sets a new index. |
Series.array | The ExtensionArray of the data backing this Series or Index. | |
Series.values | Return Series as ndarray or ndarray-like depending on the dtype. | |
Series.dtype | Return the dtype object of the underlying data. | |
Series.shape | Return a tuple of the shape of the underlying data. | |
Series.nbytes | Return the number of bytes in the underlying data. | |
Series.ndim | Number of dimensions of the underlying data, by definition 1. | |
Series.size | Return the number of elements in the underlying data. | |
Series.T | Return the transpose, which is by definition self. | |
Series.hasnans | Return True if there are any NaNs. | |
Series.empty | Indicator whether Series/DataFrame is empty. | |
Series.dtypes | Return the dtype object of the underlying data. | |
Series.name | Return the name of the Series. | |
Series.flags | Get the properties associated with this pandas object. | |
Indexing & Iteration |
Series.at | Access a single value for a row/column label pair. |
Series.iat | Access a single value for a row/column pair by integer position. | |
Series.loc | Access a group of rows and columns by label(s) or a boolean array. | |
Series.iloc | Purely integer-location based indexing for selection by position. |
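A rough sketch of several attributes from the table, using a made-up Series that contains one missing value:

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0], name="reading")

# Descriptive attributes of the underlying data
info = {
    "shape": s.shape,      # tuple with a single entry for a Series
    "size": s.size,        # number of elements
    "ndim": s.ndim,        # always 1 for a Series
    "hasnans": s.hasnans,  # True because of the missing value
    "name": s.name,
}

# Assigning to .index relabels the Series
s.index = ["a", "b", "c"]

first = s.at["a"]  # single value by label
last = s.iat[2]    # single value by integer position

print(info)
print(first, last)
```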
Series Methods
Category | Method | Description |
Constructor | Series([data, index, dtype, name, copy, ...]) | One-dimensional ndarray with axis labels; custom labels can be supplied through index. |
Axes (Axis) |
Series.memory_usage([index, deep]) | Return the memory usage of the Series. |
Series.set_flags(*[, copy, ...]) | Return a new object with updated flags. | |
Conversion |
Series.astype(dtype[, copy, errors]) | Cast a pandas object to a specified dtype. |
Series.convert_dtypes([infer_objects, ...]) | Convert columns to best possible dtypes using dtypes supporting pd.NA. | |
Series.infer_objects() | Attempt to infer better dtypes for object columns. | |
Series.copy([deep]) | Make a copy of this object's indices and data. | |
Series.bool() | Return the bool of a single element Series or DataFrame. | |
Series.to_numpy([dtype, copy, na_value]) | A NumPy ndarray representing the values in this Series or Index. | |
Series.to_period([freq, copy]) | Convert Series from DatetimeIndex to PeriodIndex. | |
Series.to_timestamp([freq, how, copy]) | Cast to DatetimeIndex of Timestamps, at beginning of period. | |
Series.to_list() | Return a list of the values. | |
Series.__array__([dtype]) | Return the values as a NumPy array. | |
Indexing and Iteration |
Series.get(key[, default]) | Get item from object for given key (ex: DataFrame column). |
Series.__iter__() | Return an iterator of the values. | |
Series.items() | Lazily iterate over (index, value) tuples. | |
Series.iteritems() | Lazily iterate over (index, value) tuples. | |
Series.keys() | Return alias for index. | |
Series.pop(item) | Return item and drops from series. | |
Series.item() | Return the first element of the underlying data as a Python scalar. | |
Series.xs(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. |
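Some of the conversion and iteration methods listed above, sketched on a small illustrative Series (the data is invented):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

nums = s.astype("int64")                # cast to another dtype
values = nums.to_list()                 # plain Python list of the values
pairs = list(nums.items())              # (index, value) tuples
missing = nums.get("zzz", default=-1)   # default instead of a KeyError

backup = nums.copy()                    # independent copy of data and index
backup["a"] = 100                       # does not change nums

print(values)
print(pairs)
print(nums.memory_usage())              # bytes used, including the index
```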
DataFrame
DataFrame Attributes
Category | Attribute | Description |
Constructor | DataFrame([data, index, columns, dtype, copy]) | - Two-dimensional, size-mutable, potentially heterogeneous tabular data. |
Axes (Axis) |
DataFrame.index | The index (row labels) of the DataFrame. |
DataFrame.columns | The column labels of the DataFrame. | |
DataFrame.dtypes | Return the dtypes in the DataFrame. | |
DataFrame.values | Return a Numpy representation of the DataFrame. | |
DataFrame.axes | Return a list representing the axes of the DataFrame. | |
DataFrame.ndim | Return an int representing the number of axes / array dimensions. | |
DataFrame.size | Return an int representing the number of elements in this object. | |
DataFrame.shape | Return a tuple representing the dimensionality of the DataFrame. | |
DataFrame.empty | Indicator whether Series/DataFrame is empty. | |
Indexing & Iteration |
DataFrame.at | Access a single value for a row/column label pair. |
DataFrame.iat | Access a single value for a row/column pair by integer position. | |
DataFrame.loc | Access a group of rows and columns by label(s) or a boolean array. | |
DataFrame.iloc | Purely integer-location based indexing for selection by position. | |
Reshaping & Sorting & Transposing |
DataFrame.T | Return the transpose of the DataFrame (rows and columns swapped). |
Metadata | DataFrame.attrs | A dictionary for storing global metadata for this DataFrame. |
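The attributes and indexers above can be tried on a small hypothetical frame; the row labels r1 to r3 and the column names are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]},
    index=["r1", "r2", "r3"],
)

print(df.shape)          # (3, 2)
print(df.size)           # 6
print(df.ndim)           # 2
print(list(df.index))    # row labels
print(list(df.columns))  # column labels
print(df.dtypes)         # dtype of each column

transposed = df.T                   # rows and columns swapped
cell = df.at["r2", "b"]             # single value by labels
corner = df.iat[0, 0]               # single value by positions
by_label = df.loc["r1":"r2", "a"]   # label slice, end label included
by_pos = df.iloc[0:2, 1]            # positional slice, end excluded
```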
DataFrame Methods
Category | Method | Description |
Axes (Axis) |
DataFrame.info([verbose, buf, max_cols, ...]) | Print a concise summary of a DataFrame. |
DataFrame.select_dtypes([include, exclude]) | Return a subset of the DataFrame's columns based on the column dtypes. | |
DataFrame.memory_usage([index, deep]) | Return the memory usage of each column in bytes. | |
DataFrame.set_flags(*[, copy, ...]) | Return a new object with updated flags. | |
Conversion |
DataFrame.astype(dtype[, copy, errors]) | Cast a pandas object to a specified dtype. |
DataFrame.convert_dtypes([infer_objects, ...]) | Convert columns to best possible dtypes using dtypes supporting pd.NA. | |
DataFrame.infer_objects() | Attempt to infer better dtypes for object columns. | |
DataFrame.copy([deep]) | Make a copy of this object's indices and data. | |
DataFrame.bool() | Return the bool of a single element Series or DataFrame. | |
Indexing & Iteration |
DataFrame.head([n]) | Return the first n rows. |
DataFrame.insert(loc, column, value[, ...]) | Insert column into DataFrame at specified location. | |
DataFrame.__iter__() | Iterate over info axis. | |
DataFrame.items() | Iterate over (column name, Series) pairs. | |
DataFrame.iteritems() | Iterate over (column name, Series) pairs. | |
DataFrame.keys() | Get the 'info axis' (see Indexing for more). | |
DataFrame.iterrows() | Iterate over DataFrame rows as (index, Series) pairs. | |
DataFrame.itertuples([index, name]) | Iterate over DataFrame rows as namedtuples. | |
DataFrame.pop(item) | Return item and drop from frame. | |
DataFrame.tail([n]) | Return the last n rows. | |
DataFrame.xs(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. | |
DataFrame.get(key[, default]) | Get item from object for given key (ex: DataFrame column). | |
DataFrame.isin(values) | Whether each element in the DataFrame is contained in values. | |
DataFrame.where(cond[, other, inplace, ...]) | Replace values where the condition is False. | |
DataFrame.mask(cond[, other, inplace, axis, ...]) | Replace values where the condition is True. | |
DataFrame.query(expr[, inplace]) | Query the columns of a DataFrame with a boolean expression. | |
Binary Operator Functions |
DataFrame.add(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator add). |
DataFrame.sub(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator sub). | |
DataFrame.mul(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator mul). | |
DataFrame.div(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). | |
DataFrame.truediv(other[, axis, level, ...]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). | |
DataFrame.floordiv(other[, axis, level, ...]) | Get Integer division of dataframe and other, element-wise (binary operator floordiv). | |
DataFrame.mod(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator mod). | |
DataFrame.pow(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator pow). | |
DataFrame.dot(other) | Compute the matrix multiplication between the DataFrame and other. | |
DataFrame.radd(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator radd). | |
DataFrame.rsub(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator rsub). | |
DataFrame.rmul(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator rmul). | |
DataFrame.rdiv(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). | |
DataFrame.rtruediv(other[, axis, level, ...]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). | |
DataFrame.rfloordiv(other[, axis, level, ...]) | Get Integer division of dataframe and other, element-wise (binary operator rfloordiv). | |
DataFrame.rmod(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator rmod). | |
DataFrame.rpow(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator rpow). | |
DataFrame.lt(other[, axis, level]) | Get Less than of dataframe and other, element-wise (binary operator lt). | |
DataFrame.gt(other[, axis, level]) | Get Greater than of dataframe and other, element-wise (binary operator gt). | |
DataFrame.le(other[, axis, level]) | Get Less than or equal to of dataframe and other, element-wise (binary operator le). | |
DataFrame.ge(other[, axis, level]) | Get Greater than or equal to of dataframe and other, element-wise (binary operator ge). | |
DataFrame.ne(other[, axis, level]) | Get Not equal to of dataframe and other, element-wise (binary operator ne). | |
DataFrame.eq(other[, axis, level]) | Get Equal to of dataframe and other, element-wise (binary operator eq). | |
DataFrame.combine(other, func[, fill_value, ...]) | Perform column-wise combine with another DataFrame. | |
DataFrame.combine_first(other) | Update null elements with value in the same location in other. | |
Function Application & GroupBy & window |
DataFrame.apply(func[, axis, raw, ...]) | Apply a function along an axis of the DataFrame. |
DataFrame.applymap(func[, na_action]) | Apply a function to a Dataframe elementwise. | |
DataFrame.pipe(func, *args, **kwargs) | Apply chainable functions that expect Series or DataFrames. | |
DataFrame.agg([func, axis]) | Aggregate using one or more operations over the specified axis. | |
DataFrame.aggregate([func, axis]) | Aggregate using one or more operations over the specified axis. | |
DataFrame.transform(func[, axis]) | Call func on self producing a DataFrame with the same axis shape as self. | |
DataFrame.groupby([by, axis, level, ...]) | Group DataFrame using a mapper or by a Series of columns. | |
DataFrame.rolling(window[, min_periods, ...]) | Provide rolling window calculations. | |
DataFrame.expanding([min_periods, center, ...]) | Provide expanding window calculations. | |
DataFrame.ewm([com, span, halflife, alpha, ...]) | Provide exponentially weighted (EW) calculations. | |
Computations & Descriptive Stats |
DataFrame.abs() | Return a Series/DataFrame with absolute numeric value of each element. |
DataFrame.all([axis, bool_only, skipna, level]) | Return whether all elements are True, potentially over an axis. | |
DataFrame.any([axis, bool_only, skipna, level]) | Return whether any element is True, potentially over an axis. | |
DataFrame.clip([lower, upper, axis, inplace]) | Trim values at input threshold(s). | |
DataFrame.corr([method, min_periods]) | Compute pairwise correlation of columns, excluding NA/null values. | |
DataFrame.corrwith(other[, axis, drop, method]) | Compute pairwise correlation. | |
DataFrame.count([axis, level, numeric_only]) | Count non-NA cells for each column or row. | |
DataFrame.cov([min_periods, ddof]) | Compute pairwise covariance of columns, excluding NA/null values. | |
DataFrame.cummax([axis, skipna]) | Return cumulative maximum over a DataFrame or Series axis. | |
DataFrame.cummin([axis, skipna]) | Return cumulative minimum over a DataFrame or Series axis. | |
DataFrame.cumprod([axis, skipna]) | Return cumulative product over a DataFrame or Series axis. | |
DataFrame.cumsum([axis, skipna]) | Return cumulative sum over a DataFrame or Series axis. | |
DataFrame.describe([percentiles, include, ...]) | Generate descriptive statistics. | |
DataFrame.diff([periods, axis]) | First discrete difference of element. | |
DataFrame.eval(expr[, inplace]) | Evaluate a string describing operations on DataFrame columns. | |
DataFrame.kurt([axis, skipna, level, ...]) | Return unbiased kurtosis over requested axis. | |
DataFrame.kurtosis([axis, skipna, level, ...]) | Return unbiased kurtosis over requested axis. | |
DataFrame.mad([axis, skipna, level]) | Return the mean absolute deviation of the values over the requested axis. | |
DataFrame.max([axis, skipna, level, ...]) | Return the maximum of the values over the requested axis. | |
DataFrame.mean([axis, skipna, level, ...]) | Return the mean of the values over the requested axis. | |
DataFrame.median([axis, skipna, level, ...]) | Return the median of the values over the requested axis. | |
DataFrame.min([axis, skipna, level, ...]) | Return the minimum of the values over the requested axis. | |
DataFrame.mode([axis, numeric_only, dropna]) | Get the mode(s) of each element along the selected axis. | |
DataFrame.pct_change([periods, fill_method, ...]) | Percentage change between the current and a prior element. | |
DataFrame.prod([axis, skipna, level, ...]) | Return the product of the values over the requested axis. | |
DataFrame.product([axis, skipna, level, ...]) | Return the product of the values over the requested axis. | |
DataFrame.quantile([q, axis, numeric_only, ...]) | Return values at the given quantile over requested axis. | |
DataFrame.rank([axis, method, numeric_only, ...]) | Compute numerical data ranks (1 through n) along axis. | |
DataFrame.round([decimals]) | Round a DataFrame to a variable number of decimal places. | |
DataFrame.sem([axis, skipna, level, ddof, ...]) | Return unbiased standard error of the mean over requested axis. | |
DataFrame.skew([axis, skipna, level, ...]) | Return unbiased skew over requested axis. | |
DataFrame.sum([axis, skipna, level, ...]) | Return the sum of the values over the requested axis. | |
DataFrame.std([axis, skipna, level, ddof, ...]) | Return sample standard deviation over requested axis. | |
DataFrame.var([axis, skipna, level, ddof, ...]) | Return unbiased variance over requested axis. | |
DataFrame.nunique([axis, dropna]) | Count number of distinct elements in specified axis. | |
DataFrame.value_counts([subset, normalize, ...]) | Return a Series containing counts of unique rows in the DataFrame. | |
Reindexing & Selection & Label Manipulation |
DataFrame.add_prefix(prefix) | Prefix labels with string prefix. |
DataFrame.add_suffix(suffix) | Suffix labels with string suffix. | |
DataFrame.align(other[, join, axis, level, ...]) | Align two objects on their axes with the specified join method. | |
DataFrame.at_time(time[, asof, axis]) | Select values at particular time of day (e.g., 9:30AM). | |
DataFrame.between_time(start_time, end_time) | Select values between particular times of the day (e.g., 9:00-9:30 AM). | |
DataFrame.drop([labels, axis, index, ...]) | Drop specified labels from rows or columns. | |
DataFrame.drop_duplicates([subset, keep, ...]) | Return DataFrame with duplicate rows removed. | |
DataFrame.duplicated([subset, keep]) | Return boolean Series denoting duplicate rows. | |
DataFrame.equals(other) | Test whether two objects contain the same elements. | |
DataFrame.filter([items, like, regex, axis]) | Subset the dataframe rows or columns according to the specified index labels. | |
DataFrame.first(offset) | Select initial periods of time series data based on a date offset. | |
DataFrame.head([n]) | Return the first n rows. | |
DataFrame.idxmax([axis, skipna]) | Return index of first occurrence of maximum over requested axis. | |
DataFrame.idxmin([axis, skipna]) | Return index of first occurrence of minimum over requested axis. | |
DataFrame.last(offset) | Select final periods of time series data based on a date offset. | |
DataFrame.reindex([labels, index, columns, ...]) | Conform Series/DataFrame to new index with optional filling logic. | |
DataFrame.reindex_like(other[, method, ...]) | Return an object with matching indices as other object. | |
DataFrame.rename([mapper, index, columns, ...]) | Alter axes labels. | |
DataFrame.rename_axis([mapper, index, ...]) | Set the name of the axis for the index or columns. | |
DataFrame.reset_index([level, drop, ...]) | Reset the index, or a level of it. | |
DataFrame.sample([n, frac, replace, ...]) | Return a random sample of items from an axis of object. | |
DataFrame.set_axis(labels[, axis, inplace]) | Assign desired index to given axis. | |
DataFrame.set_index(keys[, drop, append, ...]) | Set the DataFrame index using existing columns. | |
DataFrame.tail([n]) | Return the last n rows. | |
DataFrame.take(indices[, axis, is_copy]) | Return the elements in the given positional indices along an axis. | |
DataFrame.truncate([before, after, axis, copy]) | Truncate a Series or DataFrame before and after some index value. | |
Missing Data Handling |
DataFrame.backfill([axis, inplace, limit, ...]) | Synonym for DataFrame.fillna() with method='bfill'. |
DataFrame.bfill([axis, inplace, limit, downcast]) | Synonym for DataFrame.fillna() with method='bfill'. | |
DataFrame.dropna([axis, how, thresh, ...]) | Remove missing values. | |
DataFrame.ffill([axis, inplace, limit, downcast]) | Synonym for DataFrame.fillna() with method='ffill'. | |
DataFrame.fillna([value, method, axis, ...]) | Fill NA/NaN values using the specified method. | |
DataFrame.interpolate([method, axis, limit, ...]) | Fill NaN values using an interpolation method. | |
DataFrame.isna() | Detect missing values. | |
DataFrame.isnull() | DataFrame.isnull is an alias for DataFrame.isna. | |
DataFrame.notna() | Detect existing (non-missing) values. | |
DataFrame.notnull() | DataFrame.notnull is an alias for DataFrame.notna. | |
DataFrame.pad([axis, inplace, limit, downcast]) | Synonym for DataFrame.fillna() with method='ffill'. | |
DataFrame.replace([to_replace, value, ...]) | Replace values given in to_replace with value. | |
Reshaping & Sorting & Transposing |
DataFrame.droplevel(level[, axis]) | Return Series/DataFrame with requested index / column level(s) removed. |
DataFrame.pivot([index, columns, values]) | Return reshaped DataFrame organized by given index / column values. | |
DataFrame.pivot_table([values, index, ...]) | Create a spreadsheet-style pivot table as a DataFrame. | |
DataFrame.reorder_levels(order[, axis]) | Rearrange index levels using input order. | |
DataFrame.sort_values(by[, axis, ascending, ...]) | Sort by the values along either axis. | |
DataFrame.sort_index([axis, level, ...]) | Sort object by labels (along an axis). | |
DataFrame.nlargest(n, columns[, keep]) | Return the first n rows ordered by columns in descending order. | |
DataFrame.nsmallest(n, columns[, keep]) | Return the first n rows ordered by columns in ascending order. | |
DataFrame.swaplevel([i, j, axis]) | Swap levels i and j in a MultiIndex. | |
DataFrame.stack([level, dropna]) | Stack the prescribed level(s) from columns to index. | |
DataFrame.unstack([level, fill_value]) | Pivot a level of the (necessarily hierarchical) index labels. | |
DataFrame.swapaxes(axis1, axis2[, copy]) | Interchange axes and swap values axes appropriately. | |
DataFrame.melt([id_vars, value_vars, ...]) | Unpivot a DataFrame from wide to long format, optionally leaving identifiers set. | |
DataFrame.explode(column[, ignore_index]) | Transform each element of a list-like to a row, replicating index values. | |
DataFrame.squeeze([axis]) | Squeeze 1 dimensional axis objects into scalars. | |
DataFrame.to_xarray() | Return an xarray object from the pandas object. | |
DataFrame.transpose(*args[, copy]) | Transpose index and columns. | |
Combining & Comparing & Joining & Merging |
DataFrame.assign(**kwargs) | Assign new columns to a DataFrame. |
DataFrame.compare(other[, align_axis, ...]) | Compare to another DataFrame and show the differences. | |
DataFrame.join(other[, on, how, lsuffix, ...]) | Join columns of another DataFrame. | |
DataFrame.merge(right[, how, on, left_on, ...]) | Merge DataFrame or named Series objects with a database-style join. | |
DataFrame.update(other[, join, overwrite, ...]) | Modify in place using non-NA values from another DataFrame. |
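To show how methods from several of the categories above combine, here is a minimal sketch of a small analysis pipeline. The sales and managers frames are invented for illustration, and the steps are one possible ordering rather than a recommended workflow.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west", "east"],
        "units": [10, None, 7, 3, 5],
        "price": [2.0, 3.0, 2.0, 3.0, 2.0],
    }
)

# Missing data handling: replace the missing unit count with 0
cleaned = sales.fillna({"units": 0})

# Combining: add a derived column
cleaned = cleaned.assign(revenue=cleaned["units"] * cleaned["price"])

# Indexing: filter rows with a boolean expression
big = cleaned.query("revenue > 10")

# GroupBy + descriptive stats: total revenue per region
totals = cleaned.groupby("region")["revenue"].sum()

# Sorting: highest revenue first
ranked = cleaned.sort_values("revenue", ascending=False)

# Merging: database-style left join with a lookup table
managers = pd.DataFrame({"region": ["east", "west"], "manager": ["Kim", "Lee"]})
merged = cleaned.merge(managers, on="region", how="left")

print(totals)
print(merged)
```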
Reference: "pandas", pandas.pydata.org, retrieved 2022.09.05, URL.