{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bokeh Charts Attributes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of Bokeh Charts' main contributions is a flexible interface for applying unique attributes based on the unique values in one or more columns of a DataFrame.\n", "\n", "Internally, a Bokeh chart uses an `AttrSpec` to define the mapping, but the user can pass in their own spec or use a function to produce a customized one." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from bokeh.charts.attributes import AttrSpec, ColorAttr, MarkerAttr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple Examples\n", "\n", "The `AttrSpec` assigns values in `iterable` to values in `items`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "attr = AttrSpec(items=[1, 2, 3], iterable=['a', 'b', 'c'])\n", "attr.attr_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will see that each key in the mapping is a tuple, and it will always be a tuple. The mapping works this way because `AttrSpec`s are often used with the pandas DataFrame `groupby` method, whose group keys are single values for one column and tuples of values for multiple columns. Always using tuples keeps the keys consistent.\n", "\n", "However, you can still access the values in the following way:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "attr[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `ColorAttr` is just a custom `AttrSpec` that uses a default palette as the iterable. The palette can be customized, and `ColorAttr` will likely provide some other color generation functionality. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "color = ColorAttr(items=[1, 2, 3])\n", "color.attr_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's assume that you don't know how many unique items you are working with, but you have defined the things that you want to assign the items to. The `AttrSpec` will automatically cycle the iterable for you. This is important for exploratory analysis." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "color = ColorAttr(items=list(range(0, 10)))\n", "color.attr_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because there are only 6 unique colors in the default palette, the palette repeats starting on the 7th item." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using with Pandas" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from bokeh.sampledata.autompg import autompg as df" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "color_attr = ColorAttr(df=df, columns=['cyl', 'origin'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "color_attr.attr_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will notice that this is similar to a pandas series with a MultiIndex, which is seen below." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "color_attr.series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can think of this as a SQL table with 3 columns, two of which are an index. 
You can imagine how you might join this view data into the original data source to assign these colors to the associated rows." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining with ChartDataSource" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from bokeh.charts.data_source import ChartDataSource" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fill_color = ColorAttr(df=df, columns=['cyl', 'origin'])\n", "\n", "ds = ChartDataSource.from_data(df)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ds.join_attrs(fill_color=fill_color).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multiple Attributes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# add new column\n", "df['large_displ'] = df['displ'] >= 350\n", "\n", "fill_color = ColorAttr(df=df, columns=['cyl', 'origin'])\n", "line_color = ColorAttr(df=df, columns=['large_displ'])\n", "\n", "ds.join_attrs(fill_color=fill_color, line_color=line_color).head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Custom Iterable\n", "\n", "You will see that the output above contains the combined `chart_index` and the columns for both attributes. The values of each are joined in based on the original assignment. For example, `line_color` only has two colors because the `large_displ` column only has two values.\n", "\n", "If we want to change the colors assigned to True and False, we can pass a custom palette to the `ColorAttr`."
 ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "line_color = ColorAttr(df=df, columns=['large_displ'], palette=['Green', 'Red'])\n", "ds.join_attrs(fill_color=fill_color, line_color=line_color).head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Altering Attribute Assignment Order\n", "\n", "You may not want the values assigned in the order in which the items occurred. In that case, you have five options.\n", "\n", "\n", "1. Pre-order the data and tell the attribute not to sort.\n", "2. Make the column a categorical and set the order.\n", "3. Specify the sort options to the `AttrSpec`.\n", "4. Manually specify the items in the order you want them to be assigned.\n", "5. Specify the iterable in the order you want." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Pre-order the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# pandas >= 0.17; older versions use df.sort(columns=...)\n", "df_sorted = df.sort_values(by=['large_displ'], ascending=False)\n", "\n", "line_color = ColorAttr(df=df_sorted, columns=['large_displ'], palette=['Green', 'Red'], sort=False)\n", "\n", "ds.join_attrs(fill_color=fill_color, line_color=line_color).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Make the column a categorical and set the order\n", "\n", "First, here is the default sort order of a boolean column, which is ascending (False before True)."
 ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# pandas >= 0.17; older versions use df.sort(columns=...)\n", "df.sort_values(by='large_displ').head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n", "df_cat = df.copy()\n", "\n", "# create the categorical and set the category order (True before False)\n", "df_cat['large_displ'] = pd.Categorical(df['large_displ'], categories=[True, False])\n", "\n", "# we don't have to sort here, but we do so to show the order that the attr spec will see\n", "df_cat.sort_values(by='large_displ').head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "line_color = ColorAttr(df=df_cat, columns=['large_displ'], palette=['Green', 'Red'])\n", "\n", "ds.join_attrs(fill_color=fill_color, line_color=line_color).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Specify the sort options to the `AttrSpec`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# the items will be sorted descending (uses same sorting options as pandas)\n", "line_color = ColorAttr(df=df, columns=['large_displ'], palette=['Green', 'Red'], sort=True, ascending=False)\n", "\n", "ds.join_attrs(fill_color=fill_color, line_color=line_color).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. 
Manually specify the items in the order you want them" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# remove df so the items aren't auto-calculated\n", "# still need column name for when palette is joined into the dataset\n", "line_color = ColorAttr(columns=['large_displ'], items=[True, False], palette=['Green', 'Red'])\n", "\n", "ds.join_attrs(fill_color=fill_color, line_color=line_color).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Change the order of the iterable" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "line_color = ColorAttr(df=df, columns=['large_displ'], palette=['Red', 'Green'])\n", "\n", "ds.join_attrs(fill_color=fill_color, line_color=line_color).head()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }