NumPy Study Notes: Task01 Data Types and Array Creation

2023-07-04


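Contents: I. Constants; II. Data types; III. Dates, times and time deltas; IV. Array creation (1. Creating arrays; 2. Creating structured arrays; 3. Array attributes)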

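I. Constants

1) np.nan represents a missing value ("Not a Number"). The aliases nan = NaN = NAN all refer to it (NaN and NAN were removed in NumPy 2.0, so use np.nan). Two numpy.nan values never compare equal. Example:

```python
import numpy as np

print(np.nan == np.nan)  # False
print(np.nan != np.nan)  # True
```

numpy.isnan(x, *args, **kwargs): test element-wise for NaN and return the result as a boolean array.

```python
import numpy as np

x = np.array([1, 2, 0, 3, np.nan, 8, 0, 0])
print(np.isnan(x))
# [False False False False  True False False False]
```

np.nonzero returns, as a tuple, the indices of the non-zero elements of an array; np.nan counts as non-zero.

```python
import numpy as np

x = np.array([1, 2, 0, 3, np.nan, 8, 0, 0])
print(np.nonzero(x))
# (array([0, 1, 3, 4, 5], dtype=int64),)
```

np.count_nonzero returns the number of non-zero elements in an array, again counting np.nan.

```python
import numpy as np

x = np.array([1, 2, 0, 3, np.nan, 8, 0, 0])
print(np.count_nonzero(x))  # 5
```

2) np.pi is the constant pi = 3.1415926535897932384626433...

3) np.inf represents positive infinity. Aliases: Inf = inf = infty = Infinity = PINF (only np.inf remains in NumPy 2.0).

4) np.e is Euler's number, the base of the natural logarithm: e = 2.71828182845904523536028747135266249775724709369995...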
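Returning to np.nan: because NaN never compares equal to anything, including itself, testing with == cannot find missing values. The sketch below (variable names are illustrative) uses np.isnan as a boolean mask to drop them instead. It also shows np.isinf and np.isfinite for infinities:

```python
import numpy as np

x = np.array([1, 2, 0, 3, np.nan, 8, 0, 0])

# Comparing with == never matches NaN, so this mask is all False
print((x == np.nan).any())  # False

# Keep only the entries that are not NaN
clean = x[~np.isnan(x)]
print(clean)  # [1. 2. 0. 3. 8. 0. 0.]

# Infinity is detected with np.isinf; np.isfinite is False for both inf and NaN
print(np.isinf(np.inf), np.isfinite(np.nan))  # True False
```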

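II. Data types

1) Common data types. Python has relatively few native data types: bool, int, float, str, and so on. That is convenient for applications that don't need to care how data is represented in the machine. Scientific computing, however, usually needs finer control. NumPy therefore provides sized types. Where a name would clash with a Python built-in, NumPy appends an underscore ("_") to it (for example bool_, int_, str_).

| Type | Bits | Description |
|---|---|---|
| bool_ = bool8 | 8 | Boolean |
| int8 = byte | 8 | Integer |
| int16 = short | 16 | Integer |
| int32 = intc | 32 | Integer |
| int_ = int64 = long = int0 = intp | 64 | Integer |
| uint8 = ubyte | 8 | Unsigned integer |
| uint16 = ushort | 16 | Unsigned integer |
| uint32 = uintc | 32 | Unsigned integer |
| uint64 = uintp = uint0 = uint | 64 | Unsigned integer |
| float16 = half | 16 | Floating point |
| float32 = single | 32 | Floating point |
| float_ = float64 = double | 64 | Floating point |
| str_ = unicode_ = str0 = unicode | | Unicode string |
| datetime64 | | Date/time |
| timedelta64 | | Interval between two times |

Note: several of these aliases (bool8, int0, uint0, str0, unicode_, float_) are deprecated or removed in recent NumPy releases. Whether int_, long and intp are 64-bit also depends on the platform: on Windows with NumPy < 2.0, int_ and long are 32-bit.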
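The table can be checked directly. np.dtype(...).itemsize gives the size in bytes, so multiplying it by 8 recovers the bit widths listed above. The C-style aliases such as half, single and double are the very same type objects. A quick sketch:

```python
import numpy as np

# Bit width of a few fixed-size types (itemsize is in bytes)
for t in (np.int8, np.int16, np.uint32, np.float16, np.float32, np.float64):
    print(t.__name__, np.dtype(t).itemsize * 8)

# C-style aliases refer to the same type objects
print(np.half is np.float16, np.single is np.float32, np.double is np.float64)
# True True True
```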

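2) Creating data types

In NumPy, a data type is represented by an instance of the dtype class. A simplified stub of its constructor:

```python
class dtype(object):
    def __init__(self, obj, align=False, copy=False):
        pass
```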

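Each built-in type has a character code that uniquely identifies it:

| Character | Type | Notes |
|---|---|---|
| b | boolean | 'b1' |
| i | signed integer | 'i1', 'i2', 'i4', 'i8' |
| u | unsigned integer | 'u1', 'u2', 'u4', 'u8' |
| f | floating-point | 'f2', 'f4', 'f8' |
| c | complex floating-point | |
| m | timedelta64 | interval between two times |
| M | datetime64 | date/time |
| O | object | |
| S | (byte-)string | 'S3' is a byte string of length 3 |
| U | Unicode | Unicode string |
| V | void | |

For the numeric codes, the digit after the letter is the size in bytes. For S and U it is the number of characters.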
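The single-letter codes in the first column of the table above are exposed as the kind attribute of every dtype. The table can therefore be reproduced from the type objects themselves. A short sketch:

```python
import numpy as np

types = [np.bool_, np.int32, np.uint8, np.float64, np.complex128,
         np.timedelta64, np.datetime64, object, np.bytes_, np.str_, np.void]

# dtype.kind is the one-letter character code
for t in types:
    print(np.dtype(t), np.dtype(t).kind)

# A code plus a byte count names a concrete type, e.g. 'c16' is complex128
print(np.dtype('c16') == np.complex128)  # True
```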

Example:

```python
import numpy as np

a = np.dtype('b1')
print(a.type)  # <class 'numpy.bool_'>
print(a.itemsize)  # 1

a = np.dtype('i1')
print(a.type)  # <class 'numpy.int8'>
print(a.itemsize)  # 1

a = np.dtype('i2')
print(a.type)  # <class 'numpy.int16'>
print(a.itemsize)  # 2

a = np.dtype('i4')
print(a.type)  # <class 'numpy.int32'>
print(a.itemsize)  # 4

a = np.dtype('i8')
print(a.type)  # <class 'numpy.int64'>
print(a.itemsize)  # 8

a = np.dtype('u1')
print(a.type)  # <class 'numpy.uint8'>
print(a.itemsize)  # 1

a = np.dtype('u2')
print(a.type)  # <class 'numpy.uint16'>
print(a.itemsize)  # 2

a = np.dtype('u4')
print(a.type)  # <class 'numpy.uint32'>
print(a.itemsize)  # 4

a = np.dtype('u8')
print(a.type)  # <class 'numpy.uint64'>
print(a.itemsize)  # 8

a = np.dtype('f2')
print(a.type)  # <class 'numpy.float16'>
print(a.itemsize)  # 2

a = np.dtype('f4')
print(a.type)  # <class 'numpy.float32'>
print(a.itemsize)  # 4

a = np.dtype('f8')
print(a.type)  # <class 'numpy.float64'>
print(a.itemsize)  # 8

a = np.dtype('S')
print(a.type)  # <class 'numpy.bytes_'>
print(a.itemsize)  # 0

a = np.dtype('S3')
print(a.type)  # <class 'numpy.bytes_'>
print(a.itemsize)  # 3

a = np.dtype('U3')
print(a.type)  # <class 'numpy.str_'>
print(a.itemsize)  # 12 (4 bytes per Unicode character)
```

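3) Data type information

A Python float is usually a 64-bit floating-point number and is nearly equivalent to np.float64.

NumPy and Python integers behave very differently on overflow. NumPy integers have a fixed width, but Python's int is arbitrary-precision: it grows to hold any integer and never overflows.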
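The sketch below makes the overflow difference concrete. A fixed-width np.int8 array silently wraps around when it exceeds its maximum of 127, because array arithmetic does not raise. A Python int just keeps growing:

```python
import numpy as np

a = np.array([127], dtype=np.int8)  # 127 is the largest int8 value
a += 1                              # wraps around to the smallest int8 value
print(a, a.dtype)  # [-128] int8

big = 2 ** 100                      # Python int: arbitrary precision
print(big > np.iinfo(np.int64).max)  # True
```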

Machine limits for integer types.

```python
class iinfo(object):
    def __init__(self, int_type):
        pass

    def min(self):
        pass

    def max(self):
        pass
```

Example:

```python
import numpy as np

ii16 = np.iinfo(np.int16)
print(ii16.min)  # -32768
print(ii16.max)  # 32767

ii32 = np.iinfo(np.int32)
print(ii32.min)  # -2147483648
print(ii32.max)  # 2147483647
```

Machine limits for floating point types.

```python
class finfo(object):
    def __init__(self, dtype):
        pass
```

Example:

```python
import numpy as np

ff16 = np.finfo(np.float16)
print(ff16.bits)  # 16
print(ff16.min)  # -65500.0
print(ff16.max)  # 65500.0
print(ff16.eps)  # 0.000977

ff32 = np.finfo(np.float32)
print(ff32.bits)  # 32
print(ff32.min)  # -3.4028235e+38
print(ff32.max)  # 3.4028235e+38
print(ff32.eps)  # 1.1920929e-07
```

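III. Dates, times and time deltas

1) datetime64 basics. NumPy makes it easy to convert strings into the date-time type datetime64. The name datetime is already taken by Python's standard date-time library. datetime64 is a date-time type that carries a unit. The available units are:

| Date unit code | Meaning | Time unit code | Meaning |
|---|---|---|---|
| Y | year | h | hour |
| M | month | m | minute |
| W | week | s | second |
| D | day | ms | millisecond |
| - | | us | microsecond |
| - | | ns | nanosecond |
| - | | ps | picosecond |
| - | | fs | femtosecond |
| - | | as | attosecond |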

Note:

1 second = 1000 milliseconds; 1 millisecond = 1000 microseconds.

Example 1: when a datetime64 is created from a string, NumPy by default picks the unit that matches the precision of the string.

```python
import numpy as np

a = np.datetime64('2020-03-01')
print(a, a.dtype)  # 2020-03-01 datetime64[D]

a = np.datetime64('2020-03')
print(a, a.dtype)  # 2020-03 datetime64[M]

a = np.datetime64('2020-03-08 20:00:05')
print(a, a.dtype)  # 2020-03-08T20:00:05 datetime64[s]

a = np.datetime64('2020-03-08 20:00')
print(a, a.dtype)  # 2020-03-08T20:00 datetime64[m]

a = np.datetime64('2020-03-08 20')
print(a, a.dtype)  # 2020-03-08T20 datetime64[h]
```

Example 2: when creating a datetime64 from a string, you can force a particular unit.

```python
import numpy as np

a = np.datetime64('2020-03', 'D')
print(a, a.dtype)  # 2020-03-01 datetime64[D]

a = np.datetime64('2020-03', 'Y')
print(a, a.dtype)  # 2020 datetime64[Y]

print(np.datetime64('2020-03') == np.datetime64('2020-03-01'))  # True
print(np.datetime64('2020-03') == np.datetime64('2020-03-02'))  # False
```

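The example above shows that 2020-03 and 2020-03-01 actually denote the same moment. In general, two datetime64 objects with different units may still represent the same moment. Converting from a larger unit (such as months) to a smaller one (such as days) is safe.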
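The safe direction of unit conversion can be sketched with astype(). Casting a month value to days yields the first day of that month. Casting days back to months truncates to the containing month, so information is lost in that direction:

```python
import numpy as np

month = np.datetime64('2020-03')
day = month.astype('datetime64[D]')     # larger unit -> smaller unit
print(day)  # 2020-03-01

later = np.datetime64('2020-03-17')
print(later.astype('datetime64[M]'))    # 2020-03 (the day is dropped)
print(month == later.astype('datetime64[M]'))  # True
```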

Example 3: when a datetime64 array is created from strings with mixed units, every element is converted to the smallest unit among them.

```python
import numpy as np

a = np.array(['2020-03', '2020-03-08', '2020-03-08 20:00'], dtype='datetime64')
print(a, a.dtype)
# ['2020-03-01T00:00' '2020-03-08T00:00' '2020-03-08T20:00'] datetime64[m]
```

Example 4: use arange() to create a datetime64 array, i.e. a range of dates.

```python
import numpy as np

a = np.arange('2020-08-01', '2020-08-10', dtype=np.datetime64)
print(a)
# ['2020-08-01' '2020-08-02' '2020-08-03' '2020-08-04' '2020-08-05'
#  '2020-08-06' '2020-08-07' '2020-08-08' '2020-08-09']
print(a.dtype)  # datetime64[D]

a = np.arange('2020-08-01 20:00', '2020-08-10', dtype=np.datetime64)
print(a)
# ['2020-08-01T20:00' '2020-08-01T20:01' '2020-08-01T20:02' ...
#  '2020-08-09T23:57' '2020-08-09T23:58' '2020-08-09T23:59']
print(a.dtype)  # datetime64[m]

a = np.arange('2020-05', '2020-12', dtype=np.datetime64)
print(a)
# ['2020-05' '2020-06' '2020-07' '2020-08' '2020-09' '2020-10' '2020-11']
print(a.dtype)  # datetime64[M]
```

2) Arithmetic with datetime64 and timedelta64

Example 1: a timedelta64 is the difference between two datetime64 values. timedelta64 also carries a unit, which matches the smaller of the two datetime64 units in the subtraction.

```python
import numpy as np

a = np.datetime64('2020-03-08') - np.datetime64('2020-03-07')
b = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 08:00')
# Forcing unit 'D' drops the time of day from '2020-03-07 23:00'
c = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 23:00', 'D')

print(a, a.dtype)  # 1 days timedelta64[D]
print(b, b.dtype)  # 960 minutes timedelta64[m]
print(c, c.dtype)  # 1 days timedelta64[D]

a = np.datetime64('2020-03') + np.timedelta64(20, 'D')
b = np.datetime64('2020-06-15 00:00') + np.timedelta64(12, 'h')
print(a, a.dtype)  # 2020-03-21 datetime64[D]
print(b, b.dtype)  # 2020-06-15T12:00 datetime64[m]
```
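A timedelta64 can be turned into a plain number in any compatible unit by dividing it by a one-unit timedelta64. A minimal sketch using the 16-hour gap between 2020-03-07 08:00 and 2020-03-08:

```python
import numpy as np

delta = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 08:00')
print(delta)                            # 960 minutes
print(delta / np.timedelta64(1, 'h'))   # 16.0 (hours, as a float)
print(delta.astype('timedelta64[h]'))   # 16 hours
```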

Example 2: when creating a timedelta64, note that years ('Y') and months ('M') can be converted into each other but not into the other units. The number of days in a year, or of hours in a month, is not fixed.

```python
import numpy as np

a = np.timedelta64(1, 'Y')
b = np.timedelta64(a, 'M')
print(a)  # 1 years
print(b)  # 12 months

c = np.timedelta64(1, 'h')
d = np.timedelta64(c, 'm')
print(c)  # 1 hours
print(d)  # 60 minutes

# Years and months cannot be converted to days: each conversion raises TypeError
for td in (a, b):
    try:
        print(np.timedelta64(td, 'D'))
    except TypeError as err:
        print(err)
# Cannot cast NumPy timedelta64 scalar from metadata [Y] to [D] according to the rule 'same_kind'
# Cannot cast NumPy timedelta64 scalar from metadata [M] to [D] according to the rule 'same_kind'
```
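Month arithmetic is still possible when both operands use month units. The sketch below (variable names are illustrative) first casts a date to month precision and then adds one month. It also computes a whole-month difference:

```python
import numpy as np

start = np.datetime64('2020-01-31')

# Convert to month precision first, then add a month
month_later = start.astype('datetime64[M]') + np.timedelta64(1, 'M')
print(month_later)  # 2020-02

# Difference in whole months between two month-precision dates
print(np.datetime64('2020-12') - np.datetime64('2020-03'))  # 9 months
```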

Example 3: arithmetic on timedelta64.

```python
import numpy as np

a = np.timedelta64(1, 'Y')
b = np.timedelta64(6, 'M')
c = np.timedelta64(1, 'W')
d = np.timedelta64(1, 'D')
e = np.timedelta64(10, 'D')

print(a)  # 1 years
print(b)  # 6 months
print(a + b)  # 18 months
print(a - b)  # 6 months
print(2 * a)  # 2 years
print(a / b)  # 2.0
print(c / d)  # 7.0
print(c % e)  # 7 days
```

Example 4: converting between numpy.datetime64 and datetime.datetime.

```python
import datetime

import numpy as np

dt = datetime.datetime(year=2020, month=6, day=1, hour=20, minute=5, second=30)
dt64 = np.datetime64(dt, 's')
print(dt64, dt64.dtype)
# 2020-06-01T20:05:30 datetime64[s]

dt2 = dt64.astype(datetime.datetime)
print(dt2, type(dt2))
# 2020-06-01 20:05:30 <class 'datetime.datetime'>
```
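One caveat when converting: Python's datetime.datetime only has microsecond resolution. astype(datetime.datetime) therefore returns a datetime object only for units of microseconds or coarser. With nanosecond units NumPy returns a plain integer (nanoseconds since the epoch) instead. A short sketch:

```python
import datetime

import numpy as np

t_s = np.datetime64('2020-06-01T20:05:30', 's')
t_ns = np.datetime64('2020-06-01T20:05:30', 'ns')

print(type(t_s.astype(datetime.datetime)))   # <class 'datetime.datetime'>
print(type(t_ns.astype(datetime.datetime)))  # <class 'int'>

# Converting to microseconds first gives back a datetime object
print(t_ns.astype('datetime64[us]').astype(datetime.datetime))
# 2020-06-01 20:05:30
```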

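IV. Array creation

1. Creating arrays

The most important data structure NumPy provides is the ndarray. It is an N-dimensional array that extends the idea of a Python list.

1) Creating an ndarray from existing data

1.1) Creating with array()

```python
def array(p_object, dtype=None, copy=True, order='K', subok=False, ndmin=0):
    ...
```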

Example:

```python
import numpy as np

# Create a 1-D array
a = np.array([0, 1, 2, 3, 4])
b = np.array((0, 1, 2, 3, 4))
print(a, type(a))
# [0 1 2 3 4] <class 'numpy.ndarray'>
print(b, type(b))
# [0 1 2 3 4] <class 'numpy.ndarray'>

# Create a 2-D array
c = np.array([[11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25],
              [26, 27, 28, 29, 30],
              [31, 32, 33, 34, 35]])
print(c, type(c))
# [[11 12 13 14 15]
#  [16 17 18 19 20]
#  [21 22 23 24 25]
#  [26 27 28 29 30]
#  [31 32 33 34 35]] <class 'numpy.ndarray'>

# Create a 3-D array
d = np.array([[(1.5, 2, 3), (4, 5, 6)],
              [(3, 2, 1), (4, 5, 6)]])
print(d, type(d))
# [[[1.5 2.  3. ]
#   [4.  5.  6. ]]
#
#  [[3.  2.  1. ]
#   [4.  5.  6. ]]] <class 'numpy.ndarray'>
```

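1.2) Creating with asarray()

Both array() and asarray() convert structured data (lists, tuples, nested sequences) into an ndarray. The main difference appears when the input is already an ndarray. array() still makes a copy that occupies new memory. asarray() returns the input itself, as long as the dtype does not change.

```python
def asarray(a, dtype=None, order=None):
    # Simplified stub from older NumPy; in NumPy 2.0, copy=False means
    # "never copy", and the actual equivalent is copy=None
    return array(a, dtype, copy=False, order=order)
```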

Example: both array() and asarray() convert structured data into an ndarray. Here x is a list, so both functions must build a new array. Modifying x afterwards therefore affects neither y nor z.

```python
import numpy as np

x = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
y = np.array(x)
z = np.asarray(x)
x[1][2] = 2
print(x, type(x))
# [[1, 1, 1], [1, 1, 2], [1, 1, 1]] <class 'list'>
print(y, type(y))
# [[1 1 1]
#  [1 1 1]
#  [1 1 1]] <class 'numpy.ndarray'>
print(z, type(z))
# [[1 1 1]
#  [1 1 1]
#  [1 1 1]] <class 'numpy.ndarray'>
```

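Example: the difference between array() and asarray(). When the source is an ndarray, array() still makes a copy that occupies new memory. asarray() does not copy unless the dtype changes.

```python
import numpy as np

x = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
y = np.array(x)
z = np.asarray(x)
# dtype=int is NumPy's default integer type, i.e. the same dtype as x, so no copy
# is made (the original np.int alias was removed in NumPy 1.24)
w = np.asarray(x, dtype=int)
x[1][2] = 2
# int32 below is what Windows with NumPy < 2.0 reports; most other platforms show int64
print(x, type(x), x.dtype)
# [[1 1 1]
#  [1 1 2]
#  [1 1 1]] <class 'numpy.ndarray'> int32
print(y, type(y), y.dtype)
# [[1 1 1]
#  [1 1 1]
#  [1 1 1]] <class 'numpy.ndarray'> int32
print(z, type(z), z.dtype)
# [[1 1 1]
#  [1 1 2]
#  [1 1 1]] <class 'numpy.ndarray'> int32
print(w, type(w), w.dtype)
# [[1 1 1]
#  [1 1 2]
#  [1 1 1]] <class 'numpy.ndarray'> int32
```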

1.3) Creating with fromfunction()

fromfunction() builds an array by calling a function, which can be useful, for instance, when plotting a function.

```python
def fromfunction(function, shape, **kwargs):
    ...
```

Example: construct an array by executing a function over each coordinate.

```python
import numpy as np


def f(x, y):
    return 10 * x + y


x = np.fromfunction(f, (5, 4), dtype=int)
print(x)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]
#  [30 31 32 33]
#  [40 41 42 43]]

x = np.fromfunction(lambda i, j: i == j, (3, 3), dtype=int)
print(x)
# [[ True False False]
#  [False  True False]
#  [False False  True]]

x = np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
print(x)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
```

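2) Filling with ones, zeros and similar values

A common step in machine learning is parameter initialization: creating a fixed-size matrix filled with a constant or with random values.

(a) Arrays of zeros

zeros(): return a new array of the given shape and type, filled with zeros.

zeros_like(): return an array of zeros with the same shape and type as a given array.

```python
def zeros(shape, dtype=None, order='C'): ...
def zeros_like(a, dtype=None, order='K', subok=True, shape=None): ...
```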

Example:

```python
import numpy as np

x = np.zeros(5)
print(x)  # [0. 0. 0. 0. 0.]

x = np.zeros([2, 3])
print(x)
# [[0. 0. 0.]
#  [0. 0. 0.]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.zeros_like(x)
print(y)
# [[0 0 0]
#  [0 0 0]]
```

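(b) Arrays of ones

ones(): return a new array of the given shape and type, filled with ones.

ones_like(): return an array of ones with the same shape and type as a given array.

```python
def ones(shape, dtype=None, order='C'): ...
def ones_like(a, dtype=None, order='K', subok=True, shape=None): ...
```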

Example:

```python
import numpy as np

x = np.ones(5)
print(x)  # [1. 1. 1. 1. 1.]

x = np.ones([2, 3])
print(x)
# [[1. 1. 1.]
#  [1. 1. 1.]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.ones_like(x)
print(y)
# [[1 1 1]
#  [1 1 1]]
```

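(c) Empty (uninitialized) arrays

empty(): return a new array of the given shape and type without initializing its entries. The elements are whatever happened to be in memory: arbitrary values, not meaningful random numbers.

empty_like(): return a new, uninitialized array with the same shape and type as a given array.

```python
def empty(shape, dtype=None, order='C'): ...
def empty_like(prototype, dtype=None, order='K', subok=True, shape=None): ...
```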

Example:

```python
import numpy as np

# The printed values are whatever was already in memory; yours will differ
x = np.empty(5)
print(x)
# [1.95821574e-306 1.60219035e-306 1.37961506e-306
#  9.34609790e-307 1.24610383e-306]

x = np.empty((3, 2))
print(x)
# [[1.60220393e-306 9.34587382e-307]
#  [8.45599367e-307 7.56598449e-307]
#  [1.33509389e-306 3.59412896e-317]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.empty_like(x)
print(y)
# [[  7209029   6422625   6619244]
#  [      100 707539280       504]]
```

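(d) Identity arrays

eye(): return a 2-D array with ones on the diagonal and zeros elsewhere. The array need not be square, and the diagonal can be shifted with k.

identity(): return a square identity array.

```python
def eye(N, M=None, k=0, dtype=float, order='C'): ...
def identity(n, dtype=None): ...
```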

Example:

```python
import numpy as np

x = np.eye(4)
print(x)
# [[1. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]

x = np.eye(2, 3)
print(x)
# [[1. 0. 0.]
#  [0. 1. 0.]]

x = np.identity(4)
print(x)
# [[1. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]
```

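(e) Diagonal arrays

diag(): extract a diagonal from a 2-D array, or construct a diagonal array from a 1-D array.

```python
def diag(v, k=0): ...
```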

Example:

```python
import numpy as np

x = np.arange(9).reshape((3, 3))
print(x)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
print(np.diag(x))  # [0 4 8]
print(np.diag(x, k=1))  # [1 5]
print(np.diag(x, k=-1))  # [3 7]

v = [1, 3, 5, 7]
x = np.diag(v)
print(x)
# [[1 0 0 0]
#  [0 3 0 0]
#  [0 0 5 0]
#  [0 0 0 7]]
```

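(f) Constant arrays

full(): return a new array of the given shape filled with a constant value.

full_like(): return a constant array with the same shape and type as a given array.

```python
def full(shape, fill_value, dtype=None, order='C'): ...
def full_like(a, fill_value, dtype=None, order='K', subok=True, shape=None): ...
```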

Example:

```python
import numpy as np

x = np.full((2,), 7)
print(x)  # [7 7]

x = np.full(2, 7)
print(x)  # [7 7]

x = np.full((2, 7), 7)
print(x)
# [[7 7 7 7 7 7 7]
#  [7 7 7 7 7 7 7]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.full_like(x, 7)
print(y)
# [[7 7 7]
#  [7 7 7]]
```

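3) Creating an ndarray from a numeric range

arange(): return evenly spaced values with a given step within the half-open interval [start, stop), so stop is excluded.

linspace(): return num evenly spaced numbers over [start, stop]; stop is included by default.

logspace(): return numbers spaced evenly on a log scale, from base**start to base**stop.

numpy.random.rand(): return an array of the given shape filled with random numbers from [0, 1). The example below uses numpy.random.random(), which does the same but takes the shape as a single argument.

```python
# Documentation-style signatures (brackets mark optional arguments; not valid Python)
arange([start,] stop[, step,], dtype=None)
linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None, axis=0)
rand(d0, d1, ..., dn)
```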

Example:

```python
import numpy as np

x = np.arange(5)
print(x)  # [0 1 2 3 4]

x = np.arange(3, 7, 2)
print(x)  # [3 5]

x = np.linspace(start=0, stop=2, num=9)
print(x)
# [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]

x = np.logspace(0, 1, 5)
print(np.around(x, 2))
# [ 1.    1.78  3.16  5.62 10.  ]
# np.around(a, decimals=0, out=None) rounds to the given number of decimals:
#   a         input array
#   decimals  number of decimal places to round to (default 0); if negative,
#             rounds to positions to the left of the decimal point

x = np.linspace(start=0, stop=1, num=5)
x = [10 ** i for i in x]
print(np.around(x, 2))
# [ 1.    1.78  3.16  5.62 10.  ]

x = np.random.random(5)
print(x)  # random values, e.g.
# [0.41768753 0.16315577 0.80167915 0.99690199 0.11812291]

x = np.random.random([2, 3])
print(x)  # random values, e.g.
# [[0.41151858 0.93785153 0.57031309]
#  [0.13482333 0.20583516 0.45429181]]
```

2. Creating structured arrays

To create a structured array, first define the structure. Then call np.array() with its dtype argument set to that structure. For the character codes, see Part II (Data types).

(a) Defining the structure with a dictionary

Example:

```python
import numpy as np

personType = np.dtype({
    'names': ['name', 'age', 'weight'],
    'formats': ['U30', 'i8', 'f8']})

a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=personType)
print(a, type(a))
# [('Liming', 24, 63.9) ('Mike', 15, 67. ) ('Jan', 34, 45.8)]
# <class 'numpy.ndarray'>
```

(b) Defining the structure with a list of tuples

Example:

```python
import numpy as np

personType = np.dtype([('name', 'U30'), ('age', 'i8'), ('weight', 'f8')])
a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=personType)
print(a, type(a))
# [('Liming', 24, 63.9) ('Mike', 15, 67. ) ('Jan', 34, 45.8)]
# <class 'numpy.ndarray'>

# Elements of a structured array are accessed by index, much like a regular array
print(a[0])
# ('Liming', 24, 63.9)

print(a[-2:])
# [('Mike', 15, 67. ) ('Jan', 34, 45.8)]

# A field name used as the index returns that field's values
print(a['name'])  # ['Liming' 'Mike' 'Jan']
print(a['age'])  # [24 15 34]
print(a['weight'])  # [63.9 67.  45.8]
```

3. Array attributes

When working with NumPy you will often want information about an array. Conveniently, ndarray provides several attributes for this:

numpy.ndarray.ndim: the number of dimensions (axes), also called the rank. A 1-D array has rank 1, a 2-D array rank 2, and so on.

numpy.ndarray.shape: the dimensions of the array as a tuple. The length of the tuple is the number of dimensions, i.e. ndim.

numpy.ndarray.size: the total number of elements, equal to the product of the entries of shape (for a matrix, rows times columns).

numpy.ndarray.dtype: the element type of the ndarray.

numpy.ndarray.itemsize: the size of each element in bytes.

```python
class ndarray(object):
    shape = property(lambda self: object(), lambda self, v: None, lambda self: None)
    dtype = property(lambda self: object(), lambda self, v: None, lambda self: None)
    size = property(lambda self: object(), lambda self, v: None, lambda self: None)
    ndim = property(lambda self: object(), lambda self, v: None, lambda self: None)
    itemsize = property(lambda self: object(), lambda self, v: None, lambda self: None)
```

Example:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a.shape)  # (5,)
print(a.dtype)  # int32 (Windows with NumPy < 2.0; int64 on most other platforms)
print(a.size)  # 5
print(a.ndim)  # 1
print(a.itemsize)  # 4 (8 if the dtype is int64)

b = np.array([[1, 2, 3], [4, 5, 6.0]])
print(b.shape)  # (2, 3)
print(b.dtype)  # float64
print(b.size)  # 6
print(b.ndim)  # 2
print(b.itemsize)  # 8
```
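These attributes are related to one another. size is the product of shape, ndim is the length of shape, and the related attribute nbytes equals size * itemsize. A short sketch using the 2 x 3 float array from the example:

```python
import math

import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6.0]])

print(b.size == math.prod(b.shape))   # True (2 * 3 = 6)
print(b.ndim == len(b.shape))         # True
print(b.nbytes, b.size * b.itemsize)  # 48 48
```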

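All elements of an ndarray must have the same type. If the inputs are mixed, NumPy automatically converts them to a common, more general type (upcasting), following int -> float -> str.

Example:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a)  # [1 2 3 4 5]

b = np.array([1, 2, 3, 4, '5'])
print(b)  # ['1' '2' '3' '4' '5']

c = np.array([1, 2, 3, 4, 5.0])
print(c)  # [1. 2. 3. 4. 5.]
```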
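The resulting common type can be inspected through dtype. np.result_type() reports it for a set of inputs without building an array. A sketch:

```python
import numpy as np

print(np.array([1, 2, 3, 4, 5.0]).dtype)       # float64: int upcast to float
print(np.array([1, 2, 3, 4, '5']).dtype.kind)  # U: every element became a Unicode string
print(np.array([True, 2]).dtype.kind)          # i: bool upcast to integer
print(np.result_type(np.int32, np.float64))    # float64
```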
