Basic Operations in R

News, 2026-04-21 16:43:54

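Why choose R?

Rich resources

R covers nearly every method used for data analysis across many industries.

Good extensibility

Writing functions and packages is straightforward, R is cross-platform, and it handles complex data analysis and high-quality graphics.

A complete help system

Every function has a help page in a uniform format, with runnable examples.

GNU software

R is free, and the source code of both R itself and its packages is open.

Features of R:

Statistical resources for many fields

At the time of writing, the R website listed about 4000 packages, covering basic statistics, sociology, economics, ecology, spatial analysis, phylogenetic analysis, bioinformatics, and many other areas.

Cross-platform

R runs on many operating systems, including Windows, macOS, and various Linux and UNIX systems.

Command-line driven

R is interpreted on the fly: enter a command and the result is returned immediately.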

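References:

This part follows Chapter 1 of the companion notes. For fuller coverage, see Chapter 1, "R Data Structures", in 【R语言知识点详细总结】 (a detailed summary of R knowledge points).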

In [3]:
x <- 4  # "=" can also be used for assignment
print(x)

In [4]:
typeof(x)

In [5]:
is.vector(x)

[1] 4

'double'

TRUE

In [6]:
y <- c( 88 , 5 , 12 , 13 )
print(y)
print(typeof(y))
print(is.vector(y))

In [7]:

x1 <- c( 1 , 2 , 3 , 4 , 5 )
print(x1)

x2 <- 1 : 5
print(x2)

In [8]:
seq(from = 12 , to = 30 , by = 3 )

In [9]:
seq(from = 1.1, to = 2, length.out = 10)  # length.out spelled out instead of the partial match "length"

[1] 88 5 12 13

[1] "double"
[1] TRUE

[1] 1 2 3 4 5

[1] 1 2 3 4 5

12 15 18 21 24 27 30

1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2

In [10]:

rep( 8 , 4 )

c( 8 , 8 , 8 , 8 )

In [11]:
y <- c(y[ 1 : 3 ], 168 , y[ 4 ])
print(y)

In [12]:
y <- c(y[ 1 : 3 ], c( 56 , 24 , 35 , 10 , 5 , 7 ), y[ 4 ])
print(y)

In [13]:
length(y)

8 8 8 8

8 8 8 8

[1] 88 5 12 168 13

[1] 88 5 12 56 24 35 10 5 7 168

10

In [14]:
c( 1 , 2 , 4 ) + c( 5 , 0 ,-1)
c( 1 , 2 , 4 ) - c( 5 , 0 ,-1)
c( 1 , 2 , 4 ) * c( 5 , 0 ,-1)
c( 1 , 2 , 4 ) / c( 5 , 0 ,-1)

In [15]:
y[ 2 ]

In [16]:
y[ 2 : 4 ]

In [17]:
print(y)
y[ 2 : 4 ] = c( 8 , 14 , 67 )
print(y)

6 2 3

-4 2 5

5 0 -4

0.2 Inf -4

5

5 12 56

[1] 88 5 12 56 24 35 10 5 7 168

[1] 88 8 14 67 24 35 10 5 7 168

In [18]:
print(y)
print(y[-c(1:3)])  # drop elements 1 to 3; equivalently b = 1:3; y[-b]

  • X = c(1,1,1)

  • Y = c(2,2,2)

  • temp = c(14.7,18.5,25.9)

  • RH = c(66,73,41)

  • wind = c(2.7,8.5,3.6)

  • rain = c(0,0,0)

  • area = c(0,0,0)

  • rank = c(1,2,3)

In [19]:
X = c( 1 , 1 , 1 )
Y = c( 2 , 2 , 2 )
temp = c(14.7,18.5,25.9)
RH = c( 66 , 73 , 41 )
wind = c(2.7,8.5,3.6)
rain = c( 0 , 0 , 0 )
area = c( 0 , 0 , 0 )
rank = c( 1 , 2 , 3 )
ForeData = cbind(X,Y,temp,RH,wind,rain,area,rank)
print(ForeData)
print(is.matrix(ForeData))  # check whether the result is a matrix

[1] 88 8 14 67 24 35 10 5 7 168

[1] 67 24 35 10 5 7 168

X Y temp RH wind rain area rank
[1,] 1 2 14.7 66 2.7 0 0 1
[2,] 1 2 18.5 73 8.5 0 0 2
[3,] 1 2 25.9 41 3.6 0 0 3
[1] TRUE

In [20]:
mdat <- matrix(c(1, 2, 3, 11, 12, 13), nrow = 2, ncol = 3, byrow = TRUE,
               dimnames = list(c("row1", "row2"), c("C.1", "C.2", "C.3")))
print(mdat)

In [21]:
x = matrix(nrow = 2, ncol = 2)  # note: the arguments must be named; matrix(2, 3) would not create an empty 2 x 3 matrix
x[ 1 , 1 ] = 1
x[ 2 , 1 ] = 2
x[ 1 , 2 ] = 3
x[ 2 , 2 ] = 4
print(x)

In [22]:
colnames(x) = c('a', 'b')
rownames(x) = c('1', '2')
print(x)

In [23]:
print(ForeData[ 2 , 3 ])

C.1 C.2 C.3

row1 1 2 3
row2 11 12 13

[,1] [,2]

[1,] 1 3

[2,] 2 4

a b
1 1 3
2 2 4
temp
18.5
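
The warning about `matrix(2,3)` in cell 21 can be checked directly. The following is a minimal sketch; the names `m_wrong` and `m_right` are illustrative and not part of the original notebook. Positional arguments are matched to `data` and `nrow`, so the call does not produce an empty 2 x 3 matrix.

```r
# matrix(2, 3) is read as matrix(data = 2, nrow = 3): a 3 x 1 matrix filled with 2
m_wrong <- matrix(2, 3)
print(dim(m_wrong))   # 3 1

# naming the arguments gives the intended empty 2 x 3 matrix (filled with NA)
m_right <- matrix(nrow = 2, ncol = 3)
print(dim(m_right))   # 2 3
```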

In [24]:
print(ForeData[ 1 : 2 , 1 : 3 ])

In [25]:
print(ForeData[ 1 : 2 , c( 1 , 3 )])

X Y temp
[1,] 1 2 14.7
[2,] 1 2 18.5
X temp
[1,] 1 14.7
[2,] 1 18.5

In [26]:
a = c( 1 : 60 )
dim1 = c('R1', 'R2', 'R3', 'R4')
dim2 = c('C1', 'C2', 'C3', 'C4', 'C5')
dim3 = c('T1', 'T2', 'T3')
f = array(a,c( 4 , 5 , 3 ),dimnames = list(dim1,dim2,dim3))
print(f)

  • X = c(1,1,1)

  • Y = c(2,2,2)

  • temp = c(14.7,18.5,25.9)

  • RH = c(66,73,41)

  • wind = c(2.7,8.5,3.6)

  • rain = c(0,0,0)

  • area = c(0,0,0)

  • month = c('aug','aug','aug')

  • day = c('fri','fri','fri')

, , T1

C1 C2 C3 C4 C5

R1 1 5 9 13 17

R2 2 6 10 14 18

R3 3 7 11 15 19

R4 4 8 12 16 20

, , T2

C1 C2 C3 C4 C5

R1 21 25 29 33 37

R2 22 26 30 34 38

R3 23 27 31 35 39

R4 24 28 32 36 40

, , T3

C1 C2 C3 C4 C5

R1 41 45 49 53 57

R2 42 46 50 54 58

R3 43 47 51 55 59

R4 44 48 52 56 60

In [27]:
X = c( 1 , 1 , 1 )
Y = c( 2 , 2 , 2 )
temp = c(14.7,18.5,25.9)
RH = c( 66 , 73 , 41 )
wind = c(2.7,8.5,3.6)
rain = c( 0 , 0 , 0 )
area = c( 0 , 0 , 0 )
month = c('aug', 'aug', 'aug')
day = c('fri', 'fri', 'fri')
ForeDataFrm = data.frame(FX = X, FY = Y, Fmonth = month, Fday = day, Ftemp = temp,
                         FRH = RH, Fwind = wind, Frain = rain, Farea = area)
print(ForeDataFrm)

In [28]:
names(ForeDataFrm)

In [29]:
is.data.frame(ForeDataFrm)

FX FY Fmonth Fday Ftemp FRH Fwind Frain Farea
1 1 2 aug fri 14.7 66 2.7 0 0
2 1 2 aug fri 18.5 73 8.5 0 0
3 1 2 aug fri 25.9 41 3.6 0 0
'FX' 'FY' 'Fmonth' 'Fday' 'Ftemp' 'FRH' 'Fwind' 'Frain' 'Farea'

TRUE

In [30]:

print(ForeDataFrm[,c( 1 , 3 )])

print(ForeDataFrm[, c('FX', 'Fmonth')])

In [31]:

ForeDataFrm$Fwind

ForeDataFrm[['Fwind']]

ForeDataFrm[[ 7 ]]

FX Fmonth
1 1 aug
2 1 aug
3 1 aug
FX Fmonth
1 1 aug
2 1 aug
3 1 aug

2.7 8.5 3.6

2.7 8.5 3.6

2.7 8.5 3.6

In [32]:
a <- 123.4
is.numeric(a)
is.integer(a)
is.character(a)
is.logical(a)

In [33]:
b <- "123.4"
is.numeric(b)
is.integer(b)
is.character(b)
is.logical(b)

In [34]:
typeof(a)
typeof(b)

TRUE

FALSE

FALSE

FALSE

FALSE

FALSE

TRUE

FALSE

'double'
'character'

In [35]:
a <- as.character(a)
b <- as.double(b)
typeof(a)
typeof(b)

In [36]:
e <- c( 1 : 10 )
f <- as.matrix(e)
print(f)

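This part follows Chapter 2 of the companion notes. For fuller coverage, see Chapter 2, "Importing Data", in 【R语言知识点详细总结】. Only importing a txt file is shown here; Chapter 2 of the companion notes covers importing many other file formats.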
'character'
'double'

[,1]

[1,] 1

[2,] 2

[3,] 3

[4,] 4

[5,] 5

[6,] 6

[7,] 7

[8,] 8

[9,] 9

[10,] 10

In [37]:
ReportCard1 = read.table(file = '/home/mw/input/wlong6309/ReportCard1.txt', header = TRUE)
ReportCard2 = read.table(file = '/home/mw/input/wlong6309/ReportCard2.txt', header = TRUE)
names(ReportCard1)
names(ReportCard2)

This part follows Chapter 3 of the companion notes. For fuller coverage, see Chapter 3, "Data Management in R", in 【R语言知识点详细总结】.

In [38]:
ReportCard = merge(ReportCard1, ReportCard2, by = 'xh')
print(head(ReportCard))

In [39]:
Ord = order(ReportCard$math, na.last = TRUE, decreasing = TRUE)
print(Ord)  # Ord is a vector of row positions: row 1 has the highest math score; row 3 comes last because its math score is missing (na.last = TRUE)

'xh' 'sex' 'poli' 'chi' 'math'
'xh' 'fore' 'phy' 'che' 'geo' 'his'
xh sex poli chi math fore phy che geo his
1 92101 2 96 96 87.5 72 93 65 76.0 92
2 92102 1 94 97 86.5 61 93 64 79.5 95
3 92103 2 NA NA NA 66 98 79 89.0 81
4 92104 2 89 97 69.5 86 83 62 83.0 94
5 92105 1 82 85 79.5 60 88 66 72.5 98
6 92106 2 88 88 78.0 60 90 70 81.5 77

[1] 1 33 2 32 34 31 14 5 6 35 10 45 9 12 8 36 38 46 4 7 44 39 13 50 11

[26] 49 41 16 37 43 42 40 17 47 27 19 58 15 18 52 20 57 22 23 24 48 54 21 30 51

[51] 53 55 60 26 25 56 28 59 29 3
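
How `order()` produces a position vector is easier to see on a short vector. This is a small sketch with made-up scores (`scores` and `ord` are illustrative names): the result lists positions, not values, and `na.last = TRUE` pushes missing values to the end.

```r
scores <- c(70, NA, 95, 82)   # hypothetical scores, one missing
ord <- order(scores, na.last = TRUE, decreasing = TRUE)
print(ord)           # 3 4 1 2: position 3 holds the largest value
print(scores[ord])   # 95 82 70 NA
```

Indexing a data frame by such a vector, as in `ReportCard[Ord,]`, reorders its rows in the same way.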

In [40]:

a = ReportCard[Ord,]
print(head(a))

In [41]:
a = is.na(ReportCard$math)
print(ReportCard[a,])

In [42]:
a = complete.cases(ReportCard)
print(ReportCard[!a,])

xh sex poli chi math fore phy che geo his
1 92101 2 96 96 87.5 72 93 65 76.0 92
33 92204 2 88 81 87.5 60 84 63 79.0 92
2 92102 1 94 97 86.5 61 93 64 79.5 95
32 92203 2 74 93 84.5 50 89 72 82.5 92
34 92205 2 81 79 84.0 60 91 64 81.0 92
31 92202 1 78 89 83.5 81 91 77 81.0 93
xh sex poli chi math fore phy che geo his
3 92103 2 NA NA NA 66 98 79 89 81
xh sex poli chi math fore phy che geo his
3 92103 2 NA NA NA 66 98 79 89 81
27 92142 2 NaN 70 59 22 68 26 26 63
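
The two outputs above differ because `is.na()` was applied to one column while `complete.cases()` checks every column, and because `NaN` also counts as missing. A minimal sketch on a made-up data frame (`df` and its values are hypothetical):

```r
df <- data.frame(id   = 1:4,
                 math = c(90, NA, 75, 60),
                 poli = c(88, 70, NaN, 65))

print(is.na(df$math))       # FALSE TRUE FALSE FALSE: only the math column is checked
print(is.nan(df$poli))      # FALSE FALSE TRUE FALSE: NaN is detected separately
print(is.na(df$poli))       # FALSE FALSE TRUE FALSE: is.na() is also TRUE for NaN
print(complete.cases(df))   # TRUE FALSE FALSE TRUE: a row is complete only if no column is missing
```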

In [43]:
install.packages("mice")
library(mice)
Updating HTML index of packages in ‘.Library’
Making ‘packages.html’ … done
Warning message:
“As of rlang 0.4.0, dplyr must be at least version 0.8.0.

  • dplyr 0.7.8 is too old for rlang 0.4.11.
  • Please update dplyr with install.packages("dplyr") and restart R.”
    Attaching package: ‘mice’
    The following object is masked from ‘package:stats’:
    filter
    The following objects are masked from ‘package:base’:
    cbind, rbind

In [44]:
print(md.pattern(ReportCard))

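Note: for more on these functions, see the "Variable computation" part of Chapter 3, "Data Management in R", in 【R语言配套知识点详细总结】, which covers math functions, statistical functions, probability functions, and string functions.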

xh sex fore phy che geo his chi math poli
58 1 1 1 1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1 1 1 0 1
1 1 1 1 1 1 1 1 0 0 0 3
0 0 0 0 0 0 0 1 1 2 4

In [45]:
round(sqrt(log( 10 , 2 )),digits= 3 )

Note: for more functions, see "Statistical functions" in Chapter 3 of 【R语言配套知识点详细总结】.

In [46]:
mean(y)    # mean
median(y)  # median
sd(y)      # standard deviation
var(y)     # variance
max(y)
min(y)

In [47]:
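attach(ReportCard)  # attach the data frame so its columns can be referenced by name
SumScore = poli + chi + math + fore + phy + che + geo + his
detach(ReportCard)
AvScore = SumScore / 8  # average score
ReportCard$sumScore = SumScore
ReportCard$avScore = AvScore
sum(is.na(ReportCard$sumScore))   # number of observations whose total score is missing
mean(complete.cases(ReportCard))  # proportion of complete observations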

1.823

42.6

19

52.

2727.

168

5

2

0.
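
`attach()` puts the data frame's columns on the search path, where a global variable of the same name takes precedence. One hedged alternative is `with()`, which evaluates an expression inside the data frame so its columns are found first. The sketch below uses a made-up two-row data frame (`card`, `total`, and the stray global `math` are illustrative):

```r
card <- data.frame(poli = c(96, 94), chi = c(96, 97), math = c(87.5, 86.5))  # hypothetical rows
math <- c(1, 2, 3)   # a stray global with the same name as a column

# inside with(), the column card$math is used, not the global math
total <- with(card, poli + chi + math)
print(total)   # 279.5 277.5
```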

In [48]:
sum(y)
cumsum(y)
prod(y)

Note: for more functions, see "Probability functions" in Chapter 3 of 【R语言配套知识点详细总结】.

In [49]:
a = is.na(ReportCard$math)
math = ReportCard$math
math = math[!a]  # drop the missing math scores
dnorm(math, mean(math), sd(math))

Note: for more functions, see "String functions" in Chapter 3 of 【R语言配套知识点详细总结】.

426

88 96 110 177 201 236 246 251 258 426

32616105984000

0.00575809401592225 0.00645108245058991 0.0227150564107415 0.

0.0141904617776079 0.0227150564107415 0.0213938476169057 0.

0.01577914953468 0.0251373901050085 0.019951012746438 0.

0.0116313829477201 0.0243359978745361 0.0261470803551543 0.

0.0232497215340507 0.0249506430980311 0.0219931442165863 0.

0.020110646396652 0.0175429182127999 0.0170139044800109 0.

0.0112979964667856 0.0261887891036519 0.00521672261070003 0.

0.0148879840945352 0.00883547768683561 0.00799113070263313

0.00575809401592225 0.00840732757664685 0.0147175850040436 0.

0.0262670193933022 0.0218495019576 0.0231224289399321 0.

0.0256209012015816 0.0264212263047745 0.0263585015720458 0.

0.01577914953468 0.0222903309124429 0.0262996025645777 0.

0.0253919543357831 0.0248580221981635 0.0143598431926238 0.

0.0143598431926238 0.0154185816426542 0.0143598431926238 0.0085439175735311

0.0215421231399613 0.0246549590986037 0.00521672261070003 0.

In [50]:
str = "You like R. So do I"
str_1 = strsplit(str, 'S')[[1]]  # strsplit returns a list; access elements with list$name, list[['name']], or list[[index]]
str_2 = sub(' ', '_', sub(' ', '_', str_1[1]))  # nested because sub() replaces only the first match
str_3 = toupper(str_2)
print(str_3)
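
The nesting in cell 50 works around the fact that `sub()` replaces only the first match. `gsub()` replaces every match, so a single call is enough. A small sketch on the same kind of string (the variable names are illustrative; `trimws()` is used here to drop the trailing space, which the original cell keeps):

```r
s <- "You like R. "

once  <- sub(" ", "_", s)                  # "You_like R. "  (first space only)
twice <- sub(" ", "_", sub(" ", "_", s))   # "You_like_R. "  (what the nested call does)
all_  <- gsub(" ", "_", s)                 # "You_like_R._"  (every space, including the trailing one)
clean <- toupper(gsub(" ", "_", trimws(s)))  # "YOU_LIKE_R."

print(c(once, twice, all_, clean))
```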

Note: items 46 to 50 cover matrix operations. For more, see "Matrix operations" in Chapter 3, "Data Management in R", of 【R语言配套知识点详细总结】.

In [51]:
print(diag( 4 ))

In [52]:
m = matrix( 1 , nrow= 2 , ncol= 2 )
n = matrix( 2 , nrow= 2 , ncol= 2 )
print(m)
print(n)

[1] "YOU_LIKE_R. "

[,1] [,2] [,3] [,4]

[1,] 1 0 0 0

[2,] 0 1 0 0

[3,] 0 0 1 0

[4,] 0 0 0 1

[,1] [,2]

[1,] 1 1

[2,] 1 1

[,1] [,2]

[1,] 2 2

[2,] 2 2

In [53]:
mn = m %*% n
print(mn)

In [54]:
print(diag(mn))  # print the main-diagonal elements

In [55]:
mm = matrix( 1 : 9 , nrow= 3 , ncol= 3 , byrow=TRUE)
print(mm)
print('转置后的矩阵:')  # label: "The transposed matrix:"
print(t(mm))

In [56]:
eigen(mm)

[,1] [,2]

[1,] 4 4

[2,] 4 4

[1] 4 4

[,1] [,2] [,3]

[1,] 1 2 3

[2,] 4 5 6

[3,] 7 8 9

[1] "转置后的矩阵:"

[,1] [,2] [,3]

[1,] 1 4 7

[2,] 2 5 8

[3,] 3 6 9

eigen() decomposition
$values
[1] 1.611684e+01 -1.116844e+00 -1.303678e-15
$vectors
[,1] [,2] [,3]
[1,] -0.2319707 -0.78583024 0.4082483
[2,] -0.5253221 -0.08675134 -0.8164966
[3,] -0.8186735 0.61232756 0.4082483

A: 90 or above;

B: at least 80 and below 90;

C: at least 70 and below 80;

D: at least 60 and below 70;

E: below 60;

In [57]:
attach(ReportCard)  # attach the data frame so its columns can be referenced by name
SumScore = poli + chi + math + fore + phy + che + geo + his
detach(ReportCard)
AvScore = SumScore / 8  # average score
ReportCard$sumScore = SumScore
ReportCard$avScore = AvScore

ReportCard = within(ReportCard,{
avScore[avScore >= 90] = 'A'
avScore[avScore >= 80 & avScore < 90] = 'B'
avScore[avScore >= 70 & avScore < 80] = 'C'
avScore[avScore >= 60 & avScore < 70] = 'D'
avScore[avScore < 60] = 'E'
})

flag = ReportCard$avScore %in% c('A', 'B', 'C', 'D', 'E')

ReportCard$avScore[!flag] = NA

print(ReportCard$avScore)

'M' denotes male;

'F' denotes female;

The following object is masked _by_ .GlobalEnv:
math
Warning message in poli + chi + math:
“longer object length is not a multiple of shorter object length”
[1] "B" "B" NA "B" "C" "C" "C" "C" "C" "C" "C" "C" "C" "D" "C" "D" "D" "C" "D"
[20] "D" "D" "D" "D" "D" "E" "E" NA "E" "E" "E" "B" "B" "C" "C" "C" "C" "C" "C"
[39] "D" "C" "C" "C" "C" "D" "D" "D" "D" "C" "D" "D" "D" "D" "D" "D" "D" "D" "D"
[58] "E" "E" "E"
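
The `within()` recoding above assigns letters into a numeric column, so after the first assignment the column becomes character and the later comparisons are made on strings. A hedged alternative is `cut()`, which bins the numeric values in one step. The sketch below uses made-up averages (`av` and `grade` are illustrative names); `right = FALSE` makes each interval closed on the left, matching "at least 80 and below 90".

```r
av <- c(92.5, 84.7, 79.1, 60, 41.1, NA)   # hypothetical average scores

grade <- cut(av,
             breaks = c(-Inf, 60, 70, 80, 90, Inf),
             labels = c("E", "D", "C", "B", "A"),
             right  = FALSE)              # intervals are [lower, upper)

print(as.character(grade))   # "A" "B" "C" "D" "E" NA
```

Missing averages stay `NA`, so no separate `%in%` clean-up step is needed in this version.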

In [58]:
ReportCard$sex = factor(ReportCard$sex, levels = c(1, 2), labels = c("M", "F"))
str(ReportCard$sex)

In [59]:
print(head(ReportCard))

In [60]:
MaleScore = subset(ReportCard, ReportCard$sex == 'M' & ReportCard$avScore == 'E')
print(MaleScore)

Factor w/ 2 levels "M","F": 2 1 2 2 1 2 2 1 1 2 ...
xh sex poli chi math fore phy che geo his sumScore avScore
1 92101 F 96 96 87.5 72 93 65 76.0 92 677.5 B
2 92102 M 94 97 86.5 61 93 64 79.5 95 670.0 B
3 92103 F NA NA NA 66 98 79 89.0 81 NA <NA>
4 92104 F 89 97 69.5 86 83 62 83.0 94 673.5 B
5 92105 M 82 85 79.5 60 88 66 72.5 98 629.5 C
6 92106 F 88 88 78.0 60 90 70 81.5 77 624.0 C
xh sex poli chi math fore phy che geo his sumScore avScore
28 92144 M 59 79.0 34.0 34 57 37 37 76 409.5 E
29 92145 M 74 84.5 30.5 33 64 34 34 71 439.5 E
30 92146 M 61 69.0 45.0 20 49 32 32 51 397.5 E
58 92234 M 66 79.0 55.5 57 52 57 41 65 451.0 E
59 92236 M 79 76.0 34.0 28 63 36 36 52 414.0 E

In [61]:
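xh = sample(ReportCard$xh, size = 10, replace = FALSE)  # size = 10 is inferred from the 10 rows printed in the output
sample_s = ReportCard[ReportCard$xh %in% xh,]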
print(sample_s)

In [62]:

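i = 6
repeat { print(i); i = i + 6; if (i > 50) break }  # body reconstructed from the printed output (6, 12, ..., 48)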

In [63]:

for(i in seq(from = 6 , to = 50 , by = 6 ))
print(i)

xh sex poli chi math fore phy che geo his sumScore avScore
1 92101 F 96 96 87.5 72 93 65 76.0 92 677.5 B
5 92105 M 82 85 79.5 60 88 66 72.5 98 629.5 C
7 92108 F 84 90 69.5 50 80 60 86.5 94 615.5 C
27 92142 F NaN 70 59.0 22 68 26 26.0 63 NaN <NA>
30 92146 M 61 69 45.0 20 49 32 32.0 51 397.5 E
39 92211 F 71 73 69.0 42 95 61 76.5 76 556.0 D
41 92213 M 82 76 65.0 60 75 60 78.0 76 569.0 C
46 92218 M 87 72 70.0 65 72 49 62.0 68 534.5 D
56 92231 F 83 84 38.5 60 76 46 65.5 49 515.0 D
58 92234 M 66 79 55.5 57 52 57 41.0 65 451.0 E

[1] 6

[1] 12

[1] 18

[1] 24

[1] 30

[1] 36

[1] 42

[1] 48

[1] 6

[1] 12

[1] 18

[1] 24

[1] 30

[1] 36

[1] 42

[1] 48

This part follows Chapter 4 of the companion notes. For fuller coverage, see Chapter 4, "Basic Data Analysis in R", in 【R语言知识点详细总结】.

In [64]:
summary(ReportCard)

In [65]:
Av.Course = sapply(ReportCard[, 3:10], FUN = mean, na.rm = TRUE)  # mean of each course
Sd.Course = sapply(ReportCard[, 3:10], FUN = sd, na.rm = TRUE)    # standard deviation of each course
print(Av.Course)
print(Sd.Course)

xh sex poli chi math
Min. :92101 M:30 Min. :40.00 Min. :63.00 Min. :30.50
1st Qu.:92122 F:30 1st Qu.:74.50 1st Qu.:77.00 1st Qu.:47.25
Median :92174 Median :82.50 Median :84.00 Median :62.50
Mean :92170 Mean :79.64 Mean :83.28 Mean :61.17
3rd Qu.:92217 3rd Qu.:87.00 3rd Qu.:90.00 3rd Qu.:70.75
Max. :92239 Max. :96.00 Max. :97.00 Max. :87.50
NA's :2 NA's :1 NA's :1
fore phy che geo
Min. :20.00 Min. :49.00 Min. :26.00 Min. :26.00
1st Qu.:40.75 1st Qu.:67.75 1st Qu.:45.50 1st Qu.:57.75
Median :50.00 Median :76.50 Median :55.00 Median :66.00
Mean :49.92 Mean :75.20 Mean :54.08 Mean :65.24
3rd Qu.:60.00 3rd Qu.:83.25 3rd Qu.:62.25 3rd Qu.:78.00
Max. :86.00 Max. :98.00 Max. :83.00 Max. :89.00
his sumScore avScore
Min. :49.00 Min. :372.5 Length:60
1st Qu.:71.75 1st Qu.:510.0 Class :character
Median :79.50 Median :554.0 Mode :character
Mean :78.68 Mean :548.7
3rd Qu.:91.00 3rd Qu.:589.2
Max. :98.00 Max. :677.5
NA's :2
poli chi math fore phy che geo his
79.63793 83.27966 61.16949 49.91667 75.20000 54.08333 65.24167 78.68333
poli chi math fore phy che geo his
10.575872 8.127365 15.076417 14.018501 12.351902 12.315474 15.394389 12.735233

In [66]:
Av.Course = colMeans(ReportCard[, 3:10], na.rm = TRUE)    # mean score of each course
Sums.Course = colSums(ReportCard[, 3:10], na.rm = TRUE)   # total score of each course
print(Av.Course)
print(Sums.Course)

In [67]:
Av.Person = rowMeans(ReportCard[, 3 : 10 ],na.rm = TRUE)
Sum.Person = rowSums(ReportCard[, 3 : 10 ],na.rm = TRUE)
print(Av.Person)
print(Sum.Person)

In [68]:
# extract the female students' records
FeMaleCard = subset(ReportCard, ReportCard$sex == "F")
# mean score of the female students in each course
Des.FeMale = sapply(FeMaleCard[3:10], FUN = mean, na.rm = TRUE)
print(Des.FeMale)

poli chi math fore phy che geo his
79.63793 83.27966 61.16949 49.91667 75.20000 54.08333 65.24167 78.68333
poli chi math fore phy che geo his
4619.0 4913.5 3609.0 2995.0 4512.0 3245.0 3914.5 4721.0

[1] 84.68750 83.75000 82.60000 82.93750 78.87500 79.06250 76.75000 79.00000

[9] 71.75000 75.75000 72.87500 72.31250 72.81250 70.06250 71.18750 68.56250

[17] 67.25000 71.68750 67.68750 70.00000 68.06250 67.12500 63.00000 62.87500

[25] 56.25000 56.68750 47.71429 51.62500 53.12500 44.87500 84.18750 79.62500

[33] 79.31250 79.00000 71.81250 74.50000 73.75000 72.75000 70.43750 71.81250

[41] 71.50000 70.81250 69.43750 68.50000 67.43750 68.12500 67.62500 69.06250

[49] 64.25000 63.93750 66.62500 64.68750 63.81250 62.75000 62.81250 62.75000

[57] 60.12500 59.06250 50.50000 41.12500

[1] 677.5 670.0 413.0 663.5 631.0 632.5 614.0 632.0 574.0 606.0 583.0 578.5

[13] 582.5 560.5 569.5 548.5 538.0 573.5 541.5 560.0 544.5 537.0 504.0 503.0

[25] 450.0 453.5 334.0 413.0 425.0 359.0 673.5 637.0 634.5 632.0 574.5 596.0

[37] 590.0 582.0 563.5 574.5 572.0 566.5 555.5 548.0 539.5 545.0 541.0 552.5

[49] 514.0 511.5 533.0 517.5 510.5 502.0 502.5 502.0 481.0 472.5 404.0 329.0

poli chi math fore phy che geo his
80.46429 83.05172 62.34483 48.63333 77.66667 55.80000 67.95000 78.43333

In [69]:
Des.Gender = tapply(ReportCard$poli, INDEX = ReportCard$sex, FUN = summary, na.rm = TRUE)
print(Des.Gender)

In [70]:
Tmp = ReportCard[complete.cases(ReportCard),]
CorMatrix = cor(Tmp[, c(5, 7, 8)], use = "everything", method = "pearson")
print(CorMatrix)

In [71]:
Tmp = ReportCard[complete.cases(ReportCard),]
cor.test(Tmp[, 5], Tmp[, 7], alternative = "two.sided", method = "pearson")

$M

Min. 1st Qu. Median Mean 3rd Qu. Max.
56.00 73.25 82.00 78.87 86.75 94.00
$F
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
40.00 76.00 83.00 80.46 88.00 96.00 2
math phy che
math 1.0000000 0.7535317 0.7171637
phy 0.7535317 1.0000000 0.6207730
che 0.7171637 0.6207730 1.0000000
Pearson's product-moment correlation
data: Tmp[, 5] and Tmp[, 7]
t = 8.5775, df = 56, p-value = 8.753e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.6149204 0.8469769
sample estimates:
cor
0.7535317

In [72]:
CrossTable = table(ReportCard[,c( 2 , 12 )])
chisq.test(CrossTable)

This part follows Chapter 5 of the companion notes. For fuller coverage, see Chapter 5, "Data Visualization in R", in 【R语言知识点详细总结】.

In [4]:
Forest = read.table(file = '/home/mw/input/wlong6309/ForestData.txt', header = TRUE)
print(head(Forest))

Warning message in chisq.test(CrossTable):
“Chi-squared approximation may be incorrect”
Pearson's Chi-squared test
data: CrossTable
X-squared = 0.67532, df = 3, p-value = 0.879
X Y month day temp RH wind rain area
1 1 2 aug fri 14.7 66 2.7 0 0
2 1 2 aug fri 18.5 73 8.5 0 0
3 1 2 aug fri 25.9 41 3.6 0 0
4 1 2 aug sat 25.9 32 3.1 0 0
5 1 2 aug sun 19.5 39 6.3 0 0
6 1 2 aug sun 17.9 44 2.2 0 0

In [74]:
stem(Forest$temp)

The decimal point is at the |
2 | 2
4 | 26666668111112333588
6 | 755
8 | 022337889038
10 | 1112334566690002223345556667888
12 | 223444677899123344777888899
14 | 012222334456677778911222222444444455667788999999
16 | 011222234446666677888888900001111222333444444446666677777888888999
18 | 00001222222334444556666777888999999000111111222233333344444556666666
20 | 11111122233334444445566666667777778888889900011112222333344445555666+3
22 | 11112223344566778888899990001223333344444455677778889999
24 | 0111112222233333566668889901333445679999
26 | 122344444788899234556788899
28 | 002336779236
30 | 2226880
32 | 344613

In [75]:
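Forest$month = factor(Forest$month, levels = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"))
boxplot(temp ~ month, data = Forest, main = "Box plot of temperature by month in the forest area")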

In [76]:
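hist(Forest$temp, xlab = "Temperature in the forest area", ylab = "Frequency", main = "Histogram of temperature in the forest area")  # the trailing cex.* arguments are cut off in the source and omitted here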

In [77]:
NumGrade = tapply(ReportCard$avScore, INDEX = ReportCard$avScore, FUN = length)
barplot(NumGrade, xlab = "Average-score grade", ylab = "Number of students", ylim = c(0, 25))

In [78]:
Pct = round(NumGrade / length(ReportCard$avScore) * 100, 2)
GLabs = paste(c("B", "C", "D", "E"), Pct, "%", sep = "")
pie(NumGrade, labels = GLabs, cex = 0.8, main = "Pie chart of average-score grades", cex.main = 0.8)

In [79]:
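plot(Forest$temp, Forest$RH, main = "Scatter plot of temperature and relative humidity in the forest area", xlab = "Temperature", ylab = "Relative humidity")  # arguments after ylab are cut off in the source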

In [80]:
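plot(Forest$temp, Forest$RH, main = "Scatter plot of temperature and relative humidity in the forest area", xlab = "Temperature", ylab = "Relative humidity")  # arguments after ylab are cut off in the source
M0 = lm(RH ~ temp, data = Forest)
abline(M0$coefficients)  # add the fitted regression line
# the next two lines were lost in extraction; they are reconstructed from the surviving fragments and are an assumption
M.Loess = loess(RH ~ temp, data = Forest)
Ord = order(Forest$temp)
lines(Forest$temp[Ord], M.Loess$fitted[Ord], lwd = 1, lty = 1, col = 2)  # add the loess curve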

In [81]:
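install.packages("scatterplot3d")
library("scatterplot3d")
with(Forest, scatterplot3d(temp, RH, wind, main = "3D scatter plot of temperature, relative humidity, and wind in the forest area"))  # later arguments are cut off in the source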

Updating HTML index of packages in '.Library'
Making 'packages.html' ... done

In [82]:
install.packages("corrgram")
library("corrgram")
corrgram(ReportCard[, 3:10], lower.panel = panel.shade, upper.panel = panel.pie, text.panel = panel.txt)  # the source is cut off at "text.p"; text.panel = panel.txt is assumed

This part follows Chapter 6 of the companion notes. For fuller coverage, see Chapter 6, "Statistical Analysis in R", in 【R语言知识点详细总结】.

Updating HTML index of packages in '.Library'
Making 'packages.html' ... done

In [83]:

x <- c( 95 , 89 , 68 , 90 , 88 , 60 , 81 , 67 , 60 , 60 , 60 , 63 , 60 , 92 , 60 , 88 , 88 , 87 , 60 , 73 , 60 , 97 , 91 , 60
binom.test(min(sum(x> 80 ),sum(x< 80 )),sum(x!= 80 ), 0.75)

310 350 370 377 389 400 415 425 440 295 325 296 250 340 298 365 375 360 385

In [84]:

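spamail <- c(310, 350, 370, 377, 380, 400, 415, 425, 440, 295, 325, 296, 250, 340, 298, 365, 375, 360, 385)  # the last three values are cut off in the source and are taken from the data listing above
# note: 320 is matched to the second argument y, so this call runs a two-sample rank sum test (as the output shows); a one-sample test needs mu = 320
wilcox.test(spamail, 320, alternative = "greater", conf.int = TRUE)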

Exact binomial test
data: min(sum(x > 80), sum(x < 80)) and sum(x != 80)
number of successes = 13, number of trials = 28, p-value = 0.001436
alternative hypothesis: true probability of success is not equal to 0.75
95 percent confidence interval:
0.2751086 0.6613009
sample estimates:
probability of success
0.4642857
Wilcoxon rank sum test
data: spamail and 320
W = 14, p-value = 0.3
alternative hypothesis: true location shift is greater than 0
95 percent confidence interval:
-70 Inf
sample estimates:
difference in location
45
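
The output above is headed "Wilcoxon rank sum test" because the value 320 was passed as a second sample. If the intent is to test whether the location of one sample exceeds 320, the hypothesised value goes in the `mu` argument instead. A minimal sketch with a made-up sample chosen to have no ties (`x_demo` and `res` are illustrative names, not the notebook's data):

```r
x_demo <- c(310, 350, 370, 377, 400, 415, 425, 440, 295, 325)  # hypothetical sample, no tied differences

# one-sample signed rank test of H0: location = 320 against H1: location > 320
res <- wilcox.test(x_demo, mu = 320, alternative = "greater")

print(res$method)            # a "signed rank" test, not a "rank sum" test
print(names(res$statistic))  # "V"
```

The statistic is reported as `V` (signed rank) rather than `W` (rank sum), which is a quick way to tell the two tests apart in printed output.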

In [85]:

x <- c( 24 , 26 , 29 , 34 , 43 , 58 , 63 , 72 , 87 , 101 )
y <- c( 82 , 87 , 97 , 121 , 164 , 208 , 213 )

wilcox.test(x, y, alternative = "less", exact = FALSE, correct = FALSE)

Wilcoxon rank sum test
data: x and y
W = 4.5, p-value = 0.001449
alternative hypothesis: true location shift is less than 0

In [86]:

x <- c( 98 , 67 , 13 , 18 , 38 , 41 , 8 , 12 , 289 , 262 , 57 , 30 )
dim(x)<- c( 4 , 3 )
chisq.test(x)

In [87]:

medicine<-matrix(c( 8 , 7 , 2 , 23 ), 2 , 2 )
fisher.test(medicine)

Pearson's Chi-squared test
data: x
X-squared = 15.073, df = 6, p-value = 0.01969
Fisher's Exact Test for Count Data
data: medicine
p-value = 0.002429
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.856547 143.340082
sample estimates:
odds ratio
12.12648

In [88]:

drug <- c( 80 , 203 , 236 , 252 , 284 , 368 , 457 , 393 , 133 , 180 , 100 , 160 , 156 , 295 , 320 , 448 , 465 , 48
gr.drug<-c( 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 2 , 2 , 2 , 2 , 3 , 3 , 3 , 3 , 3 , 3 , 3 , 4 , 4 , 4 , 4 , 4 , 4 )
kruskal.test(drug,gr.drug)

Kruskal-Wallis rank sum test
data: drug and gr.drug
Kruskal-Wallis chi-squared = 8.0721, df = 3, p-value = 0.04455

In [89]:

beijingfish <- c( 85 , 82 , 82 , 79 , 87 , 75 , 86 , 82 , 90 , 81 , 80 , 76 , 80 , 75 , 81 , 75 )
treat.BF <- c( 1 , 2 , 3 , 4 , 1 , 2 , 3 , 4 , 1 , 2 , 3 , 4 , 1 , 2 , 3 , 4 )
block.BF <- c( 1 , 1 , 1 , 1 , 2 , 2 , 2 , 2 , 3 , 3 , 3 , 3 , 4 , 4 , 4 , 4 )
friedman.test(beijingfish,treat.BF,block.BF)

Friedman rank sum test
data: beijingfish, treat.BF and block.BF
Friedman chi-squared = 8.1316, df = 3, p-value = 0.04337

In [90]:
x <- c( 65 , 79 , 67 , 66 , 89 , 85 , 84 , 73 , 88 , 80 , 86 , 75 )
y <- c( 62 , 66 , 50 , 68 , 88 , 86 , 64 , 62 , 92 , 64 , 81 , 80 )
cor.test(x, y)                       # Pearson correlation test
cor.test(x, y, method = "spearman")  # Spearman rank correlation
cor.test(x, y, method = "kendall")   # Kendall rank correlation

x: 318, 910, 200, 409, 425, 502, 314, 1210, 1022, 1225
y: 524, 1019, 638, 815, 913, 928, 605, 1516, 1219, 1624
Pearson's product-moment correlation
data: x and y
t = 3.4403, df = 10, p-value = 0.006328
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.2811026 0.9209916
sample estimates:
cor
0.7362315
Warning message in cor.test.default(x, y, meth = "spearman"):
“Cannot compute exact p-value with ties”
Spearman's rank correlation rho
data: x and y
S = 65.227, p-value = 0.003265
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.7719346
Warning message in cor.test.default(x, y, meth = "kendall"):
“Cannot compute exact p-value with ties”
Kendall's rank correlation tau
data: x and y
z = 2.6181, p-value = 0.008842
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.5846846

In [91]:
x<-c( 318 , 910 , 200 , 409 , 425 , 502 , 314 , 1210 , 1022 , 1225 )
y<-c( 524 , 1019 , 638 , 815 , 913 , 928 , 605 , 1516 , 1219 , 1624 )
plot(x,y)
lm.reg<-lm(y~ 1 +x)
summary(lm.reg)
op=par(mfrow=c( 2 , 2 ))
plot(lm.reg)  # four diagnostic plots: 1 Residuals vs Fitted; 2 Normal Q-Q; 3 Scale-Location; 4 Residuals vs Leverage
par(op)

Call:
lm(formula = y ~ 1 + x)
Residuals:
Min 1Q Median 3Q Max
-191.52 -86.63 45.26 79.32 138.17
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 393.0431 79.6510 4.935 0.00114 **
x 0.8983 0.1057 8.498 2.82e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 125.4 on 8 degrees of freedom
Multiple R-squared: 0.9003, Adjusted R-squared: 0.8878
F-statistic: 72.21 on 1 and 8 DF, p-value: 2.821e-05

In [92]:

point <- data.frame(x= 425 )
lm.pred <- predict(lm.reg, point, interval = 'prediction', level = 0.95)
print(lm.pred)

fit lwr upr
1 774.8322 466.5557 1083.109

In [93]:
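# sum the integers from 1 to 100
# method 1: for loop
sum1 = 0
for (i in seq(from = 1, to = 100, by = 1)) sum1 = sum1 + i
print(sum1)
# method 2: repeat loop (body reconstructed; the source line is garbled)
i = 0
sum2 = 0
repeat { i = i + 1; sum2 = sum2 + i; if (i >= 100) break }
print(sum2)
# method 3: while loop
sum3 = 0
i = 0
while (i <= 100) { sum3 = sum3 + i; i = i + 1 }
print(sum3)
# method 4: sum function
print(sum(c(1:100)))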

In [94]:
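# sum of squares from 1^2 to 100^2
# method 1: for loop
sum4 = 0
for (i in seq(from = 1, to = 100, by = 1)) sum4 = sum4 + i^2
print(sum4)
# method 2: repeat loop (body reconstructed; the source line is garbled)
i = 0
sum5 = 0
repeat { i = i + 1; sum5 = sum5 + i^2; if (i >= 100) break }
print(sum5)
# method 3: while loop
sum6 = 0
i = 0
while (i <= 100) { sum6 = sum6 + i^2; i = i + 1 }
print(sum6)
# method 4: sum function
print(sum(c((1:100)^2)))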

[1] 5050

[1] 5050

[1] 5050

[1] 5050

[1] 338350

[1] 338350

[1] 338350

[1] 338350
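
Both totals can be cross-checked against the standard closed forms, n(n+1)/2 for the sum and n(n+1)(2n+1)/6 for the sum of squares. A short sketch (the names `n`, `s1`, and `s2` are illustrative):

```r
n  <- 100
s1 <- n * (n + 1) / 2                 # sum of 1..n
s2 <- n * (n + 1) * (2 * n + 1) / 6   # sum of 1^2..n^2

print(s1)   # 5050
print(s2)   # 338350
```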

In [95]:
t = seq(from = 1, to = 100, by = 2)  # sequence from 1 to 100 in steps of 2
print(t)

In [96]:
t = c( 1 : 200 )
t = t[-5]  # delete the 5th element
t = c(t[1:4], 11, 21, t[5:199])  # insert 11 and 21 at the position of the fifth element
print(t)

[1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49

[26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99

[1] 1 2 3 4 11 21 6 7 8 9 10 11 12 13 14 15 16 17

[19] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

[37] 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53

[55] 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71

[73] 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89

[91] 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107

[109] 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125

[127] 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143

[145] 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161

[163] 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179

[181] 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197

[199] 198 199 200
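
The delete-then-insert step in cell 96 can also be written with base R's `append()`, which takes the insertion point as `after`. A minimal sketch on a shorter vector (`v` is an illustrative name):

```r
v <- 1:10
v <- v[-5]                             # delete the 5th element
v <- append(v, c(11, 21), after = 4)   # insert 11 and 21 after the 4th element
print(v)   # 1 2 3 4 11 21 6 7 8 9 10
```

This avoids computing the tail index by hand, which in cell 96 depends on knowing that the vector has 199 elements after the deletion.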

In [97]:
y = c( 1 : 24 )
t = array(y, c(3, 4, 2))  # a 3 x 4 x 2 array; the second slice alone would be t[, , 2]
print(t)

X = c(1,1,1)
Y = c(2,2,2)
temp = c(14.7,18.5,25.9)
RH = c(66,73,41)

In [98]:
X = c( 1 , 1 , 1 )
Y = c( 2 , 2 , 2 )
temp = c(14.7,18.5,25.9)
RH = c( 66 , 73 , 41 )
data = data.frame(X, Y, temp, RH)  # build the data frame
print(data)

In [99]:
print(data[, 'temp'])  # access the temp column; data[, 3] is equivalent

, , 1

[,1] [,2] [,3] [,4]

[1,] 1 4 7 10

[2,] 2 5 8 11

[3,] 3 6 9 12

, , 2

[,1] [,2] [,3] [,4]

[1,] 13 16 19 22

[2,] 14 17 20 23

[3,] 15 18 21 24

X Y temp RH
1 1 2 14.7 66
2 1 2 18.5 73
3 1 2 25.9 41

[1] 14.7 18.5 25.9

In [100]:
set.seed( 100 )
y = rnorm(100, 0, 1)  # draw 100 values from the standard normal distribution
print(y)

In [101]:
y = sort(y)  # sort the values of y in increasing order
print(y)

[1] -0.50219235 0.13153117 -0.07891709 0.88678481 0.11697127 0.31863009

[7] -0.58179068 0.71453271 -0.82525943 -0.35986213 0.08988614 0.09627446

[13] -0.20163395 0.73984050 0.12337950 -0.02931671 -0.38885425 0.51085626

[19] -0.91381419 2.31029682 -0.43808998 0.76406062 0.26196129 0.77340460

[25] -0.81437912 -0.43845057 -0.72022155 0.23094453 -1.15772946 0.24707599

[31] -0.09111356 1.75737562 -0.13792961 -0.11119350 -0.69001432 -0.22179423

[37] 0.18290768 0.41732329 1.06540233 0.97020202 -0.10162924 1.40320349

[43] -1.77677563 0.62286739 -0.52228335 1.32223096 -0.36344033 1.31906574

[49] 0.04377907 -1.87865588 -0.44706218 -1.73859795 0.17886485 1.89746570

[55] -2.27192549 0.98046414 -1.39882562 1.82487242 1.38129873 -0.83885188

[61] -0.26199577 -0.06884403 -0.37888356 2.58195893 0.12983414 -0.71302498

[67] 0.63799424 0.20169159 -0.06991695 -0.09248988 0.44890327 -1.06435567

[73] -1.16241932 1.64852175 -2.06209602 0.01274972 -1.08752835 0.27053949

[79] 1.00845187 -2.07440475 0.89682227 -0.04999577 -1.34534931 -1.93121153

[85] 0.70958158 -0.15790503 0.21636787 0.81736208 1.72717575 -0.10377029

[91] -0.55712229 1.42830143 -0.89295740 -1.15757124 -0.53029645 2.44568276

[97] -0.83249580 0.41351985 -1.17868314 -1.17403476

[1] -2.27192549 -2.07440475 -2.06209602 -1.93121153 -1.87865588 -1.77677563

[7] -1.73859795 -1.39882562 -1.34534931 -1.17868314 -1.17403476 -1.16241932

[13] -1.15772946 -1.15757124 -1.08752835 -1.06435567 -0.91381419 -0.89295740

[19] -0.83885188 -0.83249580 -0.82525943 -0.81437912 -0.72022155 -0.71302498

[25] -0.69001432 -0.58179068 -0.55712229 -0.53029645 -0.52228335 -0.50219235

[31] -0.44706218 -0.43845057 -0.43808998 -0.38885425 -0.37888356 -0.36344033

[37] -0.35986213 -0.26199577 -0.22179423 -0.20163395 -0.15790503 -0.13792961

[43] -0.11119350 -0.10377029 -0.10162924 -0.09248988 -0.09111356 -0.07891709

[49] -0.06991695 -0.06884403 -0.04999577 -0.02931671 0.01274972 0.04377907

[55] 0.08988614 0.09627446 0.11697127 0.12337950 0.12983414 0.13153117

[61] 0.17886485 0.18290768 0.20169159 0.21636787 0.23094453 0.24707599

[67] 0.26196129 0.27053949 0.31863009 0.41351985 0.41732329 0.44890327

[73] 0.51085626 0.62286739 0.63799424 0.70958158 0.71453271 0.73984050

[79] 0.76406062 0.77340460 0.81736208 0.88678481 0.89682227 0.97020202

[85] 0.98046414 1.00845187 1.06540233 1.31906574 1.32223096 1.38129873

[91] 1.40320349 1.42830143 1.64852175 1.72717575 1.75737562 1.82487242

[97] 1.89746570 2.31029682 2.44568276 2.58195893

In [102]:
plot(y, dnorm(y, 0, 1), type = "l", main = "Normal density plot")  # draw the normal density curve

In [103]:
# first, define a function f
f = function(n){
sum = 0  # running total
for(i in 1:n) sum = sum + i^3  # add the cube of each integer from 1 to n
return(sum)  # return the total
}
f(5)  # calling f with n = 5 gives 225
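
The loop in `f` can also be written without an explicit loop. The sketch below is one possible vectorised variant (`f_vec` is an illustrative name, not part of the original notebook). It uses `seq_len(n)` rather than `1:n`, because `1:0` counts down as `c(1, 0)` and would make the loop version return 1 for `n = 0`.

```r
# sum of cubes 1^3 + ... + n^3, vectorised
f_vec <- function(n) sum(seq_len(n)^3)

print(f_vec(5))        # 225
print(f_vec(0))        # 0, since seq_len(0) is empty
print((5 * 6 / 2)^2)   # 225, from the identity (n(n+1)/2)^2
```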

In [104]:
round(abs(exp( 1 )-exp( 2 ))^( 1 / 3 ), 2 )

225

1.67

In [105]:
x=c( 3 : 95 )
mean(x)
median(x)
sd(x)
var(x)
max(x)
min(x)
length(x)
sum(x)

In [106]:
Reportcard1 = read.table("/home/mw/input/wlong6309/ReportCard1.txt", header = TRUE)
Reportcard2 = read.table("/home/mw/input/wlong6309/ReportCard2.txt", header = TRUE)
Reportcard = merge(Reportcard1, Reportcard2, by = 'xh')
print(head(Reportcard))

49

49

26.9907391525316

728.5

95

3

93

4557

xh sex poli chi math fore phy che geo his
1 92101 2 96 96 87.5 72 93 65 76.0 92
2 92102 1 94 97 86.5 61 93 64 79.5 95
3 92103 2 NA NA NA 66 98 79 89.0 81
4 92104 2 89 97 69.5 86 83 62 83.0 94
5 92105 1 82 85 79.5 60 88 66 72.5 98
6 92106 2 88 88 78.0 60 90 70 81.5 77

In [107]:
Reportcard = na.omit(Reportcard)
print(head(Reportcard))

In [108]:
Reportcard$sex = factor(Reportcard$sex, levels = c(1, 2), labels = c("M", "F"))
Reportcard$sex

In [109]:
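SumScore = rowSums(Reportcard[, 3:10], na.rm = TRUE)
AvScore = SumScore / 8  # average over the 8 courses; this line is missing in the source and is reconstructed from the output
Reportcard$SumScore = SumScore
Reportcard$AvScore = AvScore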
print(head(Reportcard))

xh sex poli chi math fore phy che geo his
1 92101 2 96 96 87.5 72 93 65 76.0 92
2 92102 1 94 97 86.5 61 93 64 79.5 95
4 92104 2 89 97 69.5 86 83 62 83.0 94
5 92105 1 82 85 79.5 60 88 66 72.5 98
6 92106 2 88 88 78.0 60 90 70 81.5 77
7 92108 2 84 90 69.5 50 80 60 86.5 94

F M F M F F M M F M M F M M M M F M F F F F M F F

M M M M F F F F F M F F M M M M M F M F F M F M M

M M F F F M M F

Levels :
xh sex poli chi math fore phy che geo his SumScore AvScore
1 92101 F 96 96 87.5 72 93 65 76.0 92 677.5 84.6875
2 92102 M 94 97 86.5 61 93 64 79.5 95 670.0 83.7500
4 92104 F 89 97 69.5 86 83 62 83.0 94 663.5 82.9375
5 92105 M 82 85 79.5 60 88 66 72.5 98 631.0 78.8750
6 92106 F 88 88 78.0 60 90 70 81.5 77 632.5 79.0625
7 92108 F 84 90 69.5 50 80 60 86.5 94 614.0 76.7500

Grades: A for 90 or above; B for at least 80 and below 90; C for at least 70 and below 80; D for at least 60 and below 70; E for below 60.

In [110]:
Reportcard = within(Reportcard,{
AvScore[AvScore >= 90] = 'A'
AvScore[AvScore >= 80 & AvScore < 90] = 'B'
AvScore[AvScore >= 70 & AvScore < 80] = 'C'
AvScore[AvScore >= 60 & AvScore < 70] = 'D'
AvScore[AvScore < 60] = 'E'
})
avScore = Reportcard[, 12]  # store the recoded grades in avScore
print(avScore)
[1] "B" "B" "B" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "D" "D" "C" "D" "C"
[20] "D" "D" "D" "D" "E" "E" "E" "E" "E" "B" "C" "C" "C" "C" "C" "C" "C" "C" "C"
[39] "C" "C" "D" "D" "D" "D" "D" "D" "D" "D" "D" "D" "D" "D" "D" "D" "D" "E" "E"
[58] "E"

In [111]:
n=table(Reportcard$AvScore)
barplot(n, ylim = c(0, 25))  # draw the bar chart

In [112]:
data = matrix(c( 1 , 2 , 3 , 4 , 5 , 6 ), nrow= 2 )
row_max = c()
row_min = c()
col_max = c()
col_min = c()
for(i in 1 :nrow(data))
{
row_max = c(row_max, max(data[i,]))
row_min = c(row_min, min(data[i,]))
}
data = cbind(data, row_max, row_min)
for(j in 1 :ncol(data))
{
col_max = c(col_max, max(data[,j]))
col_min = c(col_min, min(data[,j]))
}
data = rbind(data, col_max, col_min)
print(data)

row_max row_min
1 3 5 5 1
2 4 6 6 2
col_max 2 4 6 6 2
col_min 1 3 5 5 1
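
The row and column extremes computed with explicit loops in cell 112 can also be obtained with `apply()`, where margin 1 means rows and margin 2 means columns. A minimal sketch on the same 2 x 3 matrix (the name `m` is illustrative; unlike the original cell, the results are kept as separate vectors rather than bound onto the matrix):

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

row_max <- apply(m, 1, max)   # 5 6
row_min <- apply(m, 1, min)   # 1 2
col_max <- apply(m, 2, max)   # 2 4 6
col_min <- apply(m, 2, min)   # 1 3 5

print(row_max); print(row_min); print(col_max); print(col_min)
```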

In [3]:
install.packages("rvest")
library(rvest)  # package with web-scraping functions

page_text <- read_html("https://sjz.58.com/xinfang/")  # load the first page
# estate names
estate_name <- page_text %>% html_nodes("span.items-name") %>% html_text()
# estate locations
estate_detail_address <- page_text %>% html_nodes("span.list-map") %>% html_text()
estate_brief_address <- substr(estate_detail_address, 3, 4)  # district or county
# average price
estate_price <- page_text %>% html_nodes("p.price") %>% html_nodes("span") %>% html_text()  # the call after the last pipe is cut off in the source; html_text() is assumed
# data fix: 翰林观天下 shows a nearby-average price, so its value is inserted by hand
estate_price <- c(estate_price[1:16], "15990", estate_price[17:59])
# store the scraped data in a data frame
estate <- data.frame(name = estate_name, address = estate_brief_address, price = estate_price)

print(head(estate))

The code for scraping 58.com new-home listings with R will appear in a complete follow-up project.
【R语言配套知识点详细总结】


Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
name address price
1 紫晶悦和中心 长安 14800
2 天润福庭 藁城 10500
3 美好时光 裕华 12500
4 玖筑翰府 开发 11000
5 绿城诚园 新华 12800
6 东华国樾府 裕华 15500