Truncating a string without cutting its HTML tags in half
When you truncate a string that contains HTML code, the cut often lands in the middle of an HTML tag. For example:
$string = "aaaaaaaaaa<br />bbbbbbbbbb<br />cccccccccc<br />dddddddddd<br />eeeeeeeeee";
Take the first 29 characters:
$len = 29;
echo substr($string,0,$len);
The result is: aaaaaaaaaa<br />bbbbbbbbbb<br (the second <br /> tag has been cut off partway through).
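The function below never cuts inside HTML code. It also counts only the visible text toward the length, not the tags, which better matches what the reader actually sees: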
<?php
$len = 29;
$string = "aaaaaaaaaa<br />bbbbbbbbbb<br />cccccccccc<br />dddddddddd<br />eeeeeeeeee";
echo substr($string, 0, $len);
// Result: aaaaaaaaaa<br />bbbbbbbbbb<br
echo "\n";
echo cut_without_tags($string, $len);
// Result: aaaaaaaaaa<br />bbbbbbbbbb<br />ccccccccc

/** Function Start, faisun@sina.com **/
function cut_without_tags($string, $len) {
    /*
    Split $string into an array of text pieces and tags:
    "aaaaaaaaaa","<br />","bbbbbbbbbb","<br />","cccccccccc","<br />","dddddddddd","<br />","eeeeeeeeee"
    */
    $spchar = chr(1) . chr(2) . chr(4); // a separator assumed never to occur in the input
    $s = str_replace("<", $spchar . "<", $string);
    $s = str_replace(">", ">" . $spchar, $s);
    $str_array = explode($spchar, $s); // split() was removed in PHP 7; explode() splits on a plain string

    $new_str = "";
    $new_str_len = 0; // visible length (tags excluded) already copied into $new_str
    foreach ($str_array as $piece) {
        // A piece containing '<' is treated as an HTML tag
        if (strpos($piece, '<') !== false) {
            $new_str .= $piece; // tags are copied whole and do not count toward $len
            continue;
        }
        $remaining = $len - $new_str_len;
        if (strlen($piece) < $remaining) {
            $new_str .= $piece; // the whole text piece still fits
            $new_str_len += strlen($piece);
        } else {
            $new_str .= substr($piece, 0, $remaining); // cut the text piece that is too long, then stop
            break;
        }
    }
    return $new_str;
}
?>
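The separator trick above (prefixing every "<" and suffixing every ">" with chr(1).chr(2).chr(4), then splitting) can also be done with a single regular-expression split. The sketch below is a swapped-in technique, not the author's code. It uses preg_split with a capturing group so that the tags are kept in the result. The helper name split_tags_and_text is hypothetical, and the sketch assumes every "<" opens a tag that ends at the next ">". Unlike the original, PREG_SPLIT_NO_EMPTY drops the empty pieces that appear between adjacent tags.

```php
<?php
// Hypothetical helper: splits an HTML string into text pieces and tags,
// assuming every '<' starts a tag that ends at the next '>'.
function split_tags_and_text(string $html): array
{
    // The capturing group (<[^>]*>) makes preg_split keep the matched tags;
    // PREG_SPLIT_NO_EMPTY drops empty strings, e.g. between two adjacent tags.
    return preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

$pieces = split_tags_and_text("aaaaaaaaaa<br />bbbbbbbbbb<br />cccccccccc");
// ["aaaaaaaaaa", "<br />", "bbbbbbbbbb", "<br />", "cccccccccc"]
print_r($pieces);
```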
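Chinese text has to be measured and cut differently. strlen() and substr() work on bytes, and a Chinese character takes more than one byte, so substr() can split a character in half and strlen() overstates the visible length. Replace them with the corresponding multibyte functions, such as mb_strlen() and mb_substr().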
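As a minimal sketch of that substitution, the version below counts and cuts by characters instead of bytes. Several things here are assumptions rather than part of the original. The input is assumed to be valid UTF-8 and the mbstring extension is assumed to be enabled. The function name mb_cut_without_tags is hypothetical. The splitting step uses a regular expression in place of the separator trick.

```php
<?php
// Hypothetical multibyte variant of cut_without_tags(). Assumes valid UTF-8 input,
// an enabled mbstring extension, and that every '<' starts a tag ending at '>'.
function mb_cut_without_tags(string $string, int $len, string $encoding = 'UTF-8'): string
{
    // Split into text pieces and tags; the capturing group keeps the tags.
    $pieces = preg_split('/(<[^>]*>)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $result = '';
    $count = 0; // characters (not bytes) of visible text copied so far
    foreach ($pieces as $piece) {
        if ($piece[0] === '<') { // a tag: copy it whole, it does not count toward $len
            $result .= $piece;
            continue;
        }
        $remaining = $len - $count;
        $piece_len = mb_strlen($piece, $encoding);
        if ($piece_len < $remaining) {
            $result .= $piece; // the whole text piece still fits
            $count += $piece_len;
        } else {
            $result .= mb_substr($piece, 0, $remaining, $encoding); // cut by characters, then stop
            break;
        }
    }
    return $result;
}

echo mb_cut_without_tags("中文字符<br />截取测试", 6); // prints 中文字符<br />截取
```

Each Chinese character counts as one unit of $len here, so the cut never lands inside a character's byte sequence.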
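Truncating this way can also leave tags unclosed (for example, a <b> whose </b> fell past the cut). Those tags still have to be closed in a post-processing step.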
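One way to do that post-processing is a stack-based sketch like the one below. The approach is an illustration, not the author's method, and the function name close_open_tags is hypothetical. The helper records each opening tag, removes it again when its closing tag appears, skips void elements such as <br> and self-closing tags, and finally appends closing tags for whatever is still open, innermost first. It is not a full HTML parser. It assumes reasonably well-nested markup with no ">" inside attribute values, and it ignores comments and <script> contents.

```php
<?php
// Hypothetical post-processing helper: closes elements left open after truncation.
// Simple stack-based sketch; assumes well-nested tags and no '>' inside attributes.
function close_open_tags(string $html): string
{
    // Void elements never take a closing tag.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];
    $stack = [];
    // Group 1: '/' for a closing tag; group 2: tag name; group 3: '/' for a self-closing tag.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>#', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $name = strtolower($m[2]);
        if ($m[1] === '/') {
            // Closing tag: remove the innermost open element with this name
            // (and anything opened after it) from the stack.
            for ($i = count($stack) - 1; $i >= 0; $i--) {
                if ($stack[$i] === $name) {
                    array_splice($stack, $i);
                    break;
                }
            }
        } elseif ($m[3] !== '/' && !in_array($name, $void, true)) {
            $stack[] = $name; // an opening tag that still needs a closing tag
        }
    }
    // Close the remaining elements, innermost first.
    while ($stack) {
        $html .= '</' . array_pop($stack) . '>';
    }
    return $html;
}

echo close_open_tags('<p>Hello <b>wor'); // prints <p>Hello <b>wor</b></p>
```

Because cut_without_tags() only ever cuts text and never a tag, its output can be passed straight to a helper like this.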