{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# merge - 数据连接\n", "\n", "```{admonition} 在线刷题\n", ":class: seealso\n", "\n", "检查 or 强化 `Pandas` 数据分析操作?👉在线体验「Pandas进阶修炼300题」\n", "```\n", "\n", "```{note} \n", "本页面代码可以[在线编辑、执行](../指引/在线执行.md)!\n", "```\n", "\n", "\n", "`merge` 是 `pandas` 中 `SQL` 风格的合并方法,虽然它有多个参数,但大多情况下我们只需要了解如何使用 `how(LEFT、RIGHT、OUTER、INNER)` 参数,下面是图解\n", "\n", "\n", "```{figure} https://pic.liuzaoqi.com/picgo/202112231753690.png\n", ":width: 100%\n", ":align: center\n", "```\n", "\n", "下面是官方文档中的案例,你可以修改相关代码来验证自己的想法!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 按单键连接\n", "\n", "根据 `key` 连接 `left` 和 `right`\n", "\n", "![](https://pandas.pydata.org/pandas-docs/version/0.20/_images/merging_merge_on_key.png)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "hide-input", "thebe-init" ] }, "outputs": [], "source": [ "import pandas as pd\n", "left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n", " 'A': ['A0', 'A1', 'A2', 'A3'],\n", " 'B': ['B0', 'B1', 'B2', 'B3']})\n", "\n", "right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],\n", " 'C': ['C0', 'C1', 'C2', 'C3'],\n", " 'D': ['D0', 'D1', 'D2', 'D3']})" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyABCD
0K0A0B0C0D0
1K1A1B1C1D1
2K2A2B2C2D2
3K3A3B3C3D3
\n", "
" ], "text/plain": [ " key A B C D\n", "0 K0 A0 B0 C0 D0\n", "1 K1 A1 B1 C1 D1\n", "2 K2 A2 B2 C2 D2\n", "3 K3 A3 B3 C3 D3" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, on='key')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 按多键连接\n", "\n", "根据 `key1` 和 `key2` 连接 `left` 和 `right`\n", "\n", "![](https://pandas.pydata.org/pandas-docs/version/0.20/_images/merging_merge_on_key_multiple.png)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],\n", " 'key2': ['K0', 'K1', 'K0', 'K1'],\n", " 'A': ['A0', 'A1', 'A2', 'A3'],\n", " 'B': ['B0', 'B1', 'B2', 'B3']})\n", "\n", "\n", "right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],\n", " 'key2': ['K0', 'K0', 'K0', 'K0'],\n", " 'C': ['C0', 'C1', 'C2', 'C3'],\n", " 'D': ['D0', 'D1', 'D2', 'D3']})\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2ABCD
0K0K0A0B0C0D0
1K1K0A2B2C1D1
2K1K0A2B2C2D2
\n", "
" ], "text/plain": [ " key1 key2 A B C D\n", "0 K0 K0 A0 B0 C0 D0\n", "1 K1 K0 A2 B2 C1 D1\n", "2 K1 K0 A2 B2 C2 D2" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, on=['key1', 'key2'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 左外连接\n", "\n", "\n", "如下图所示的结果连接 left 和 right,保留左表全部键\n", "\n", "![](https://pandas.pydata.org/pandas-docs/version/0.20/_images/merging_merge_on_key_left.png)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2ABCD
0K0K0A0B0C0D0
1K0K1A1B1NaNNaN
2K1K0A2B2C1D1
3K1K0A2B2C2D2
4K2K1A3B3NaNNaN
\n", "
" ], "text/plain": [ " key1 key2 A B C D\n", "0 K0 K0 A0 B0 C0 D0\n", "1 K0 K1 A1 B1 NaN NaN\n", "2 K1 K0 A2 B2 C1 D1\n", "3 K1 K0 A2 B2 C2 D2\n", "4 K2 K1 A3 B3 NaN NaN" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, how='left', on=['key1', 'key2'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 右外连接\n", "\n", "\n", "如下图所示的结果连接 left 和 right,保留右表全部键\n", "\n", "![](https://pandas.pydata.org/pandas-docs/version/0.20/_images/merging_merge_on_key_right.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pd.merge(left, right, how='right', on=['key1', 'key2'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 全外连接\n", "\n", "\n", "如下图所示的结果连接 left 和 right,保留全部键\n", "\n", "![](https://pandas.pydata.org/pandas-docs/version/0.20/_images/merging_merge_on_key_outer.png)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2ABCD
0K0K0A0B0C0D0
1K0K1A1B1NaNNaN
2K1K0A2B2C1D1
3K1K0A2B2C2D2
4K2K1A3B3NaNNaN
5K2K0NaNNaNC3D3
\n", "
" ], "text/plain": [ " key1 key2 A B C D\n", "0 K0 K0 A0 B0 C0 D0\n", "1 K0 K1 A1 B1 NaN NaN\n", "2 K1 K0 A2 B2 C1 D1\n", "3 K1 K0 A2 B2 C2 D2\n", "4 K2 K1 A3 B3 NaN NaN\n", "5 K2 K0 NaN NaN C3 D3" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, how='outer', on=['key1', 'key2'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 内连接\n", "\n", "\n", "如下图所示的结果连接 left 和 right,保留交集\n", "\n", "![](https://pandas.pydata.org/pandas-docs/version/0.20/_images/merging_merge_on_key_inner.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pd.merge(left, right, how='inner', on=['key1', 'key2'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 重复索引\n", "\n", "\n", "按下图所示进行连接\n", "\n", "![](https://pandas.pydata.org/pandas-docs/version/0.20/_images/merging_merge_overlapped_suffix.png)" ] }, { "cell_type": "code", "execution_count": 255, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})\n", "right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pd.merge(left, right, on='k', suffixes=['_l', '_r'])" ] } ], "metadata": { "celltoolbar": "Tags", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "384px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }