
Python in Practice: Data Visualization (Summary)

Date: 2023-05-26
Preface

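  Over seven articles we have walked through how the data visualization project is built. The project comes from "Python Crash Course" by Eric Matthes (published in Chinese as 《Python编程从入门到实践》). I have presented it in my own way, following the order in which I worked through it myself, from setting up the environment to implementing each step. As the project grew, so did the amount of code, so to keep the earlier articles readable we showed only the code for the feature under discussion. Today we give the complete code for the project so that you can see the whole picture. Still, I strongly recommend reading the series from the beginning; you will get much more out of it that way. Simply pasting all the code from this article will do you little good, because you may not even understand the overall structure of the project. One thing to note: this project differs from the previous one, Alien Invasion. There, each module implemented one small part of the project and could only run together with the rest of the code; the ship module we covered earlier, for example, cannot run on its own. The data visualization project is different. Its scripts are largely independent and coupling is very low: the only dependencies are that the dice scripts import die.py and rw_visual.py imports random_walk.py. The focus is on obtaining data and using APIs, analyzing the data we have, and producing attractive charts.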

Project Overview

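  Data visualization means exploring data through visual representations. It is closely related to data mining; more precisely, it is one step within data mining and artificial intelligence work, where data mining means using code to explore the patterns and relationships in a data set. A data set can be a small list of numbers that fits in one line of code, or something as visual as an image. The result looks like this:

[Figure: sample output of the project]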


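  Presenting data well is about more than pretty pictures. When data is presented in a simple, eye-catching way, viewers can see clearly what it means and get a better grasp of the patterns behind it. The project starts with the problem of the data itself, because visualization presupposes that we have data. Three articles cover generating data, that is, producing data to analyze when none is available; two articles cover downloading data; and the final two cover using APIs and analyzing what they return.
  This series covers only some small techniques, such as how to draw bar charts and line charts. Readers who want to learn more will find plenty of tutorials online; the material is fairly simple, widely applicable, and well worth the time. The complete code for the project follows, so that you can refer to it as a whole.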

Complete Code

1. Implementation of dice_visual.py

import pygal

from die import Die

# Create two D6 dice.
die_1 = Die()
die_2 = Die()

# Make some rolls, and store results in a list.
results = []
for roll_num in range(1000):
    result = die_1.roll() + die_2.roll()
    results.append(result)

# Analyze the results.
frequencies = []
max_result = die_1.num_sides + die_2.num_sides
for value in range(2, max_result+1):
    frequency = results.count(value)
    frequencies.append(frequency)

# Visualize the results.
hist = pygal.Bar()
hist.force_uri_protocol = 'http'

hist.title = "Results of rolling two D6 dice 1000 times."
hist.x_labels = ['2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
hist.x_title = "Result"
hist.y_title = "Frequency of Result"

hist.add('D6 + D6', frequencies)
hist.render_to_file('dice_visual.svg')

2. Implementation of die.py

from random import randint


class Die():
    """A class representing a single die."""

    def __init__(self, num_sides=6):
        """Assume a six-sided die."""
        self.num_sides = num_sides

    def roll(self):
        """Return a random value between 1 and number of sides."""
        return randint(1, self.num_sides)
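The dice scripts count frequencies by calling results.count(value) once per possible value, which rescans the whole list each time. As a minimal sketch of an alternative, the same tally can be built in one pass with collections.Counter. The function names roll_sums and count_frequencies are illustrative and do not appear in the project, and the dice are simulated with randint directly so that the block runs without die.py.

```python
from collections import Counter
from random import randint


def roll_sums(num_rolls, sides_1=6, sides_2=6):
    """Roll two dice num_rolls times and return the list of sums."""
    return [randint(1, sides_1) + randint(1, sides_2)
            for _ in range(num_rolls)]


def count_frequencies(results, min_value, max_value):
    """Count how often each possible value occurs, in a single pass."""
    counts = Counter(results)
    # A Counter returns 0 for missing keys, so values that never
    # came up are still reported.
    return [counts[value] for value in range(min_value, max_value + 1)]


results = roll_sums(1000)
frequencies = count_frequencies(results, 2, 12)
print(frequencies)
```

The resulting list has the same shape as the frequencies list in the scripts, so it could be passed to hist.add() unchanged.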

3. Implementation of die_visual.py

import pygal

from die import Die

# Create a D6.
die = Die()

# Make some rolls, and store results in a list.
results = []
for roll_num in range(1000):
    result = die.roll()
    results.append(result)

# Analyze the results.
frequencies = []
for value in range(1, die.num_sides+1):
    frequency = results.count(value)
    frequencies.append(frequency)

# Visualize the results.
hist = pygal.Bar()
hist.force_uri_protocol = 'http'

hist.title = "Results of rolling one D6 1000 times."
hist.x_labels = ['1', '2', '3', '4', '5', '6']
hist.x_title = "Result"
hist.y_title = "Frequency of Result"

hist.add('D6', frequencies)
hist.render_to_file('die_visual.svg')

4. Implementation of different_dice.py

import pygal

from die import Die

# Create a D6 and a D10.
die_1 = Die()
die_2 = Die(10)

# Make some rolls, and store results in a list.
results = []
for roll_num in range(50000):
    result = die_1.roll() + die_2.roll()
    results.append(result)

# Analyze the results.
frequencies = []
max_result = die_1.num_sides + die_2.num_sides
for value in range(2, max_result+1):
    frequency = results.count(value)
    frequencies.append(frequency)

# Visualize the results.
hist = pygal.Bar()
hist.force_uri_protocol = 'http'

hist.title = "Results of rolling a D6 and a D10 50,000 times."
hist.x_labels = ['2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
    '13', '14', '15', '16']
hist.x_title = "Result"
hist.y_title = "Frequency of Result"

# The series label matches the dice actually rolled (a D6 and a D10).
hist.add('D6 + D10', frequencies)
# Note: this writes to the same file name that dice_visual.py uses.
hist.render_to_file('dice_visual.svg')

5. Implementation of mpl_squares.py

import matplotlib.pyplot as plt

input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
plt.plot(input_values, squares, linewidth=5)

# Set chart title and label axes.
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set size of tick labels.
plt.tick_params(axis='both', labelsize=14)

plt.show()

6. Implementation of random_walk.py

from random import choice


class RandomWalk():
    """A class to generate random walks."""

    def __init__(self, num_points=5000):
        """Initialize attributes of a walk."""
        self.num_points = num_points

        # All walks start at (0, 0).
        self.x_values = [0]
        self.y_values = [0]

    def fill_walk(self):
        """Calculate all the points in the walk."""

        # Keep taking steps until the walk reaches the desired length.
        while len(self.x_values) < self.num_points:
            # Decide which direction to go, and how far to go in that direction.
            x_direction = choice([1, -1])
            x_distance = choice([0, 1, 2, 3, 4])
            x_step = x_direction * x_distance

            y_direction = choice([1, -1])
            y_distance = choice([0, 1, 2, 3, 4])
            y_step = y_direction * y_distance

            # Reject moves that go nowhere.
            if x_step == 0 and y_step == 0:
                continue

            # Calculate the next x and y values.
            next_x = self.x_values[-1] + x_step
            next_y = self.y_values[-1] + y_step

            self.x_values.append(next_x)
            self.y_values.append(next_y)
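In fill_walk() the x step and the y step are computed by the same three lines written twice. One possible refactoring, sketched here under the assumption that a small helper is acceptable, moves that logic into a single function. The names get_step and walk are illustrative and are not part of the listing above; walk is a plain-function stand-in for the class so that the block runs on its own.

```python
from random import choice


def get_step(max_distance=4):
    """Return a signed step: a random direction times a random distance."""
    direction = choice([1, -1])
    distance = choice(range(max_distance + 1))
    return direction * distance


def walk(num_points):
    """Build a random walk of num_points points starting at (0, 0)."""
    x_values, y_values = [0], [0]
    while len(x_values) < num_points:
        x_step, y_step = get_step(), get_step()
        # Reject moves that go nowhere, as the class does.
        if x_step == 0 and y_step == 0:
            continue
        x_values.append(x_values[-1] + x_step)
        y_values.append(y_values[-1] + y_step)
    return x_values, y_values


x_values, y_values = walk(1000)
print(len(x_values), x_values[-1], y_values[-1])
```

The behavior is meant to match the original loop; only the duplicated step calculation is factored out.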

7. Implementation of rw_visual.py

import matplotlib.pyplot as plt

from random_walk import RandomWalk

# Keep making new walks, as long as the program is active.
while True:
    # Make a random walk, and plot the points.
    rw = RandomWalk(50000)
    rw.fill_walk()

    # Set the size of the plotting window.
    plt.figure(dpi=128, figsize=(10, 6))

    point_numbers = list(range(rw.num_points))
    plt.scatter(rw.x_values, rw.y_values, c=point_numbers, cmap=plt.cm.Blues,
        edgecolors='none', s=1)

    # Emphasize the first and last points.
    plt.scatter(0, 0, c='green', edgecolors='none', s=100)
    plt.scatter(rw.x_values[-1], rw.y_values[-1], c='red', edgecolors='none',
        s=100)

    # Remove the axes. plt.gca() returns the current axes; a bare plt.axes()
    # call creates a new, empty axes in recent matplotlib versions.
    ax = plt.gca()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    plt.show()

    keep_running = input("Make another walk? (y/n): ")
    if keep_running == 'n':
        break

8. Implementation of scatter_squares.py

import matplotlib.pyplot as plt

x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]

plt.scatter(x_values, y_values, c=(0, 0, 0.8), edgecolor='none', s=40)

# Set chart title, and label axes.
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)

# Set size of tick labels.
plt.tick_params(axis='both', which='major', labelsize=14)

# Set the range for each axis.
plt.axis([0, 1100, 0, 1100000])

plt.show()

9. Implementation of highs_lows.py

import csv
from datetime import datetime

from matplotlib import pyplot as plt

# Get dates, high, and low temperatures from file.
filename = 'death_valley_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)

    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            print(current_date, 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)

# Plot data.
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red', alpha=0.5)
plt.plot(dates, lows, c='blue', alpha=0.5)
plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)

# Format plot.
title = "Daily high and low temperatures - 2014\nDeath Valley, CA"
plt.title(title, fontsize=20)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)

plt.show()
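The try/except/else block is what lets highs_lows.py skip days with missing readings instead of crashing on int(''). The sketch below isolates that pattern so it can be tried without the data file. The rows in SAMPLE are made-up values, not taken from death_valley_2014.csv; they only mimic the column positions the script reads (date in column 0, high in column 1, low in column 3). The function name read_highs_lows is likewise an assumption for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data in the column layout the script expects.
SAMPLE = """date,max_temp,mean_temp,min_temp
2014-01-01,52,41,30
2014-01-02,,,
2014-01-03,58,45,32
"""


def read_highs_lows(lines):
    """Parse dates, highs and lows, skipping rows with missing values."""
    reader = csv.reader(lines)
    next(reader)  # Skip the header row.

    dates, highs, lows, skipped = [], [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # int('') raises ValueError, so incomplete rows end up here.
            skipped.append(row[0])
        else:
            # Runs only when every conversion succeeded.
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
    return dates, highs, lows, skipped


dates, highs, lows, skipped = read_highs_lows(io.StringIO(SAMPLE))
print(highs, lows, skipped)
```

Recording the raw date string of each skipped row, rather than printing current_date, also avoids relying on a value left over from an earlier iteration when the date itself fails to parse.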

10. Implementation of btc_close_2017.py

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

try:
    # Python 2.x
    from urllib2 import urlopen
except ImportError:
    # Python 3.x
    from urllib.request import urlopen
import json
import requests
import pygal
import math
from itertools import groupby

json_url = 'https://raw.githubusercontent.com/muxuezi/btc/master/btc_close_2017.json'
response = urlopen(json_url)
# Read the data.
req = response.read()
# Write the data to a file.
with open('btc_close_2017_urllib.json', 'wb') as f:
    f.write(req)
# Parse the JSON.
file_urllib = json.loads(req.decode('utf8'))
print(file_urllib)

json_url = 'https://raw.githubusercontent.com/muxuezi/btc/master/btc_close_2017.json'
req = requests.get(json_url)
# Write the data to a file.
with open('btc_close_2017_request.json', 'w') as f:
    f.write(req.text)
file_requests = req.json()
print(file_urllib == file_requests)

# Load the data into a list.
filename = 'btc_close_2017.json'
with open(filename) as f:
    btc_data = json.load(f)
# Print the information for each day.
for btc_dict in btc_data:
    date = btc_dict['date']
    month = int(btc_dict['month'])
    week = int(btc_dict['week'])
    weekday = btc_dict['weekday']
    close = int(float(btc_dict['close']))
    print("{} is month {} week {}, {}, the close price is {} RMB".format(
        date, month, week, weekday, close))

# Create five lists to store the dates and closing prices.
dates = []
months = []
weeks = []
weekdays = []
close = []
# Collect the information for each day.
for btc_dict in btc_data:
    dates.append(btc_dict['date'])
    months.append(int(btc_dict['month']))
    weeks.append(int(btc_dict['week']))
    weekdays.append(btc_dict['weekday'])
    close.append(int(float(btc_dict['close'])))

# Line chart of the closing price. The Chinese titles and file names are
# kept as in the original; '收盘价' means "closing price".
line_chart = pygal.Line(x_label_rotation=20, show_minor_x_labels=False)
line_chart.title = '收盘价(¥)'
line_chart.x_labels = dates
N = 20  # Show an x-axis label every 20 days.
line_chart.x_labels_major = dates[::N]
line_chart.add('收盘价', close)
line_chart.render_to_file('收盘价折线图(¥).svg')

# Line chart of the log-transformed closing price.
line_chart = pygal.Line(x_label_rotation=20, show_minor_x_labels=False)
line_chart.title = '收盘价对数变换(¥)'
line_chart.x_labels = dates
N = 20  # Show an x-axis label every 20 days.
line_chart.x_labels_major = dates[::N]
close_log = [math.log10(_) for _ in close]
line_chart.add('log收盘价', close_log)
line_chart.render_to_file('收盘价对数变换折线图(¥).svg')


def draw_line(x_data, y_data, title, y_legend):
    """Plot the mean of y_data for each distinct value in x_data."""
    xy_map = []
    for x, y in groupby(sorted(zip(x_data, y_data)), key=lambda _: _[0]):
        y_list = [v for _, v in y]
        xy_map.append([x, sum(y_list) / len(y_list)])
    x_unique, y_mean = [*zip(*xy_map)]
    line_chart = pygal.Line()
    line_chart.title = title
    line_chart.x_labels = x_unique
    line_chart.add(y_legend, y_mean)
    line_chart.render_to_file(title + '.svg')
    return line_chart


# Daily mean per month (January to November).
idx_month = dates.index('2017-12-01')
line_chart_month = draw_line(
    months[:idx_month], close[:idx_month], '收盘价月日均值(¥)', '月日均值')

# Daily mean per week.
idx_week = dates.index('2017-12-11')
line_chart_week = draw_line(
    weeks[1:idx_week], close[1:idx_week], '收盘价周日均值(¥)', '周日均值')

# Mean per day of the week.
idx_week = dates.index('2017-12-11')
wd = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
      'Saturday', 'Sunday']
weekdays_int = [wd.index(w) + 1 for w in weekdays[1:idx_week]]
line_chart_weekday = draw_line(
    weekdays_int, close[1:idx_week], '收盘价星期均值(¥)', '星期均值')
# Labels read Monday through Sunday.
line_chart_weekday.x_labels = ['周一', '周二', '周三', '周四', '周五', '周六', '周日']
line_chart_weekday.render_to_file('收盘价星期均值(¥).svg')

# Collect the five charts in one HTML dashboard. The HTML tags were lost
# from the original listing; the markup below is a reconstruction that
# embeds each SVG file in a minimal page.
with open('收盘价Dashboard.html', 'w', encoding='utf8') as html_file:
    html_file.write(
        '<html><head><title>收盘价Dashboard</title>'
        '<meta charset="utf-8"></head><body>\n')
    for svg in [
            '收盘价折线图(¥).svg', '收盘价对数变换折线图(¥).svg',
            '收盘价月日均值(¥).svg', '收盘价周日均值(¥).svg',
            '收盘价星期均值(¥).svg'
    ]:
        html_file.write(
            '    <object type="image/svg+xml" data="{0}" height=500></object>\n'.format(svg))
    html_file.write('</body></html>')
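The core of draw_line() is the grouping step: sort the (x, y) pairs, group them by x with itertools.groupby, and average the y values in each group. The sketch below pulls that step out of the charting code so it can be checked on its own. The function name group_mean and the sample numbers are illustrative assumptions, not values from the Bitcoin data set.

```python
from itertools import groupby


def group_mean(x_data, y_data):
    """Return the distinct x values and the mean of y for each of them."""
    # groupby only merges adjacent items, so the pairs must be sorted first.
    pairs = sorted(zip(x_data, y_data))
    xy_map = []
    for x, group in groupby(pairs, key=lambda pair: pair[0]):
        y_list = [y for _, y in group]
        xy_map.append((x, sum(y_list) / len(y_list)))
    # Transpose [(x1, m1), (x2, m2), ...] into two parallel sequences.
    x_unique, y_mean = zip(*xy_map)
    return list(x_unique), list(y_mean)


# Hypothetical data: month numbers and closing prices.
months = [1, 1, 2, 2, 2]
close = [100, 200, 300, 400, 500]
print(group_mean(months, close))  # ([1, 2], [150.0, 400.0])
```

Like draw_line(), this sketch assumes the input is not empty; with no data the final unpacking would fail.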

11. Implementation of bar_descriptions.py

import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)

chart.title = 'Python Projects'
chart.x_labels = ['httpie', 'django', 'flask']
chart.force_uri_protocol = 'http'

plot_dicts = [
    {'value': 16101, 'label': 'Description of httpie.'},
    {'value': 15028, 'label': 'Description of django.'},
    {'value': 14798, 'label': 'Description of flask.'},
    ]

chart.add('', plot_dicts)
chart.render_to_file('bar_descriptions.svg')

12. Implementation of python_repos.py

import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS

# Make an API call, and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)

# Store API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']

names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])

    plot_dict = {
        'value': repo_dict['stargazers_count'],
        # A repository can have no description (None), which the chart
        # label cannot render, so fall back to a placeholder string.
        'label': repo_dict['description'] or 'No description provided.',
        'xlink': repo_dict['html_url'],
        }
    plot_dicts.append(plot_dict)

# Make visualization.
my_style = LS('#333366', base_style=LCS)

my_config = pygal.Config()
my_config.force_uri_protocol = 'http'
my_config.x_label_rotation = 45
my_config.show_legend = False
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.truncate_label = 15
my_config.show_y_guides = False
my_config.width = 1000

chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-Starred Python Projects on GitHub'
chart.x_labels = names

chart.add('', plot_dicts)
chart.render_to_file('python_repos.svg')

13. Implementation of hn_submissions.py

import requests

from operator import itemgetter

# Make an API call, and store the response.
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print("Status code:", r.status_code)

# Process information about each submission.
submission_ids = r.json()
submission_dicts = []
for submission_id in submission_ids[:30]:
    # Make a separate API call for each submission.
    url = ('https://hacker-news.firebaseio.com/v0/item/' +
            str(submission_id) + '.json')
    submission_r = requests.get(url)
    print(submission_r.status_code)
    response_dict = submission_r.json()

    submission_dict = {
        'title': response_dict['title'],
        'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
        'comments': response_dict.get('descendants', 0)
        }
    submission_dicts.append(submission_dict)

submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                            reverse=True)

for submission_dict in submission_dicts:
    print("\nTitle:", submission_dict['title'])
    print("Discussion link:", submission_dict['link'])
    print("Comments:", submission_dict['comments'])
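Two details in hn_submissions.py are easy to miss: dict.get('descendants', 0) supplies a comment count of 0 when an item has no such key, and itemgetter('comments') gives sorted() the field to order by. The sketch below shows both on a few hand-written dictionaries, so it runs without any network calls. The sample items and the function name summarize are assumptions for illustration, not real API responses.

```python
from operator import itemgetter

# Hypothetical item responses; the second one has no 'descendants' key.
responses = [
    {'id': 1, 'title': 'First story', 'descendants': 12},
    {'id': 2, 'title': 'Second story'},
    {'id': 3, 'title': 'Third story', 'descendants': 40},
]


def summarize(responses):
    """Build submission dicts and sort them by comment count, descending."""
    submission_dicts = []
    for response_dict in responses:
        submission_dicts.append({
            'title': response_dict['title'],
            # Fall back to 0 when the item reports no comment count.
            'comments': response_dict.get('descendants', 0),
        })
    return sorted(submission_dicts, key=itemgetter('comments'), reverse=True)


for submission_dict in summarize(responses):
    print(submission_dict['title'], submission_dict['comments'])
```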

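  That is the complete code for this project. I hope that after working through it you have a clear picture of the project and a deeper understanding of how to apply your Python fundamentals.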

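How Each Module Was Implemented

1. Generating Data

[1]. Generating Data (Part 1)
[2]. Generating Data (Part 2)
[3]. Generating Data (Part 3)

2. Downloading Data

[1]. Downloading Data (Part 1)
[2]. Downloading Data (Part 2)

3. Using APIs

[1]. Using APIs (Part 1)
[2]. Using APIs (Part 2)
  These articles cover the implementation of each module in detail. Read them carefully; they should be useful in your future data analysis work.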

Summary

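  This article summarized the Data Visualization project, from requirements analysis to code structure, and gave the complete code for the project. Finally, we listed the articles that cover the implementation of each feature, for easy reading. Python is a hands-on language and one of the easier programming languages to get started with. Once you have learned it, picking up Java, Go, or C becomes easier. Python is also a popular language that is widely used in artificial intelligence work, so it is well worth the time to learn. Keep at it, study a little every day, and keep building your skills; the effort will pay off. Good luck!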
