Noting this down because much of it matches my own thinking.


Reference: Alibaba MySQL rules (with Java)

 - https://github.com/alibaba/Alibaba-Java-Coding-Guidelines#3-mysql-rules




Table Schema Rules

1. [Mandatory] Columns expressing the concept of True or False, must be named as is_xxx, whose data type should be unsigned tinyint (1 is True, 0 is False).

Note: All columns with non-negative values must be unsigned.

2. [Mandatory] Names of tables and columns must consist of lower case letters, digits or underscores. Names starting with digits and names which contain only digits (no other characters) in between two underscores are not allowed. Columns should be named cautiously, as it is costly to change column names and cannot be released in pre-release environment.

Positive example: getter_admin, task_config, level3_name
Counter example: GetterAdmin, taskConfig, level_3_name
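
The naming rule above can be checked mechanically. The following is only a minimal sketch: the class and method names are made up for illustration, and it reads "digits only between two underscores" literally as a `_123_` segment.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Alibaba guidelines.
class SchemaNameCheck {
    // Lower-case letters, digits and underscores only; must not start with a digit.
    private static final Pattern ALLOWED = Pattern.compile("^[a-z_][a-z0-9_]*$");
    // A digits-only segment enclosed by two underscores, e.g. "_3_".
    private static final Pattern DIGITS_BETWEEN_UNDERSCORES = Pattern.compile("_[0-9]+_");

    static boolean isValid(String name) {
        return ALLOWED.matcher(name).matches()
                && !DIGITS_BETWEEN_UNDERSCORES.matcher(name).find();
    }

    public static void main(String[] args) {
        String[] names = {"getter_admin", "task_config", "level3_name",
                          "GetterAdmin", "taskConfig", "level_3_name"};
        for (String name : names) {
            System.out.println(name + " -> " + isValid(name));
        }
    }
}
```

With the examples from the rule, the three positive names pass and the three counter examples fail.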

3. [Mandatory] Plural nouns are not allowed as table names.

4. [Mandatory] Keywords, such as desc, range, match, delayed, etc., should not be used. They can be looked up in the official MySQL documentation.

5. [Mandatory] The name of primary key index should be prefixed with pk_, followed by column name; Unique index should be named by prefixing its column name with uk_; And normal index should be formatted as idx_[column_name].

Note: pk means primary key, uk means unique key, and idx is short for index.

6. [Mandatory] Decimals should be typed as decimal. float and double are not allowed.

Note: It may have precision loss when float and double numbers are stored, which in turn may lead to incorrect data comparison result. It is recommended to store integral and fractional parts separately when data range to be stored is beyond the range covered by decimal type.
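
The same precision problem shows up on the Java side, which is one reason decimal columns are usually mapped to BigDecimal. A small sketch (the class name is only illustrative):

```java
import java.math.BigDecimal;

class DecimalVsDouble {
    // Binary floating point cannot represent 0.1 or 0.2 exactly, so the sum drifts.
    static boolean doubleSumMatches() {
        return 0.1 + 0.2 == 0.3;
    }

    // BigDecimal built from strings keeps the exact decimal digits.
    static boolean decimalSumMatches() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    public static void main(String[] args) {
        System.out.println("double:  " + doubleSumMatches());
        System.out.println("decimal: " + decimalSumMatches());
    }
}
```

The double comparison fails while the BigDecimal comparison succeeds, which is the "incorrect data comparison result" the note warns about.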

7. [Mandatory] Use char if lengths of information to be stored in that column are almost the same.

8. [Mandatory] The length of varchar should not exceed 5000, otherwise it should be defined as text. It is better to store them in a separate table in order to avoid its effect on indexing efficiency of other columns.

9. [Mandatory] A table must include the following three columns: id, gmt_create and gmt_modified.

Note: id is the primary key, which is unsigned bigint and self-incrementing with step length of 1. The type of gmt_create and gmt_modified should be DATETIME.

10. [Recommended] It is recommended to define table name as [table_business_name]_[table_purpose].

Positive example: tiger_task / tiger_reader / mpp_config

11. [Recommended] Try to define database name same with the application name.

12. [Recommended] Update column comments once column meaning is changed or new possible status values are added.

13. [Recommended] Some appropriate columns may be stored in multiple tables redundantly to improve search performance, but consistency must be ensured. Redundant columns should not be:
  1) Columns with frequent modification.
  2) Columns typed with very long varchar or text.

Positive example: Product category names are short, frequently used and with almost never changing/fixed values. They may be stored redundantly in relevant tables to avoid joined queries.

14. [Recommended] Database sharding may only be recommended when there are more than 5 million rows in a single table or table capacity exceeds 2GB.

Note: Please do not shard during table creation if anticipated data quantity is not to reach this grade.

15. [For Reference] Appropriate char column length not only saves database and index storing space, but also improves query efficiency.

Positive example: Unsigned types could avoid storing negative values mistakenly, but also may cover bigger data representative range.

Object | Age | Recommended data type | Range
human | within 150 years old | unsigned tinyint | unsigned integers: 0 to 255
turtle | hundreds of years old | unsigned smallint | unsigned integers: 0 to 65,535
dinosaur fossil | tens of millions of years old | unsigned int | unsigned integers: 0 to around 4.29 billion
sun | around 5 billion years old | unsigned bigint | unsigned integers: 0 to around 10^19
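
The upper bounds listed for the unsigned types follow from their storage size: an unsigned integer of n bytes holds values from 0 to 2^(8n) - 1. A short sketch to reproduce the numbers, assuming MySQL's sizes of 1, 2, 4 and 8 bytes for tinyint, smallint, int and bigint (the class name is made up):

```java
import java.math.BigInteger;

class UnsignedRanges {
    // Largest value of an unsigned integer stored in the given number of bytes.
    static BigInteger maxUnsigned(int bytes) {
        return BigInteger.ONE.shiftLeft(8 * bytes).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println("unsigned tinyint:  " + maxUnsigned(1));
        System.out.println("unsigned smallint: " + maxUnsigned(2));
        System.out.println("unsigned int:      " + maxUnsigned(4));
        System.out.println("unsigned bigint:   " + maxUnsigned(8));
    }
}
```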

Index Rules

1. [Mandatory] Unique index should be used if business logic is applicable.

Note: The negative impact of unique indices on insert efficiency is negligible, but they improve query speed significantly. Additionally, even if a complete check is done at the application layer, as per Murphy's Law, dirty data might still be produced as long as there is no unique index.

2. [Mandatory] JOIN is not allowed if more than three tables are involved. Columns to be joined must have exactly the same data type. Make sure that columns to be joined are indexed.

Note: Indexing and SQL performance should be considered even if only 2 tables are joined.

3. [Mandatory] Index length must be specified when adding index on varchar columns. The index length should be set according to the distribution of data.

Note: Normally for char columns, an index with the length of 20 can distinguish more than 90% of the data, which is calculated by count(distinct left(column_name, index_length)) / count(*).
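
To get a feel for that ratio, the same calculation can be run over an in-memory sample of column values. This is a rough sketch only: the class name is invented, and the sample is assumed to contain no NULLs (SQL's count(distinct ...) would skip them).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PrefixSelectivity {
    // Mirrors count(distinct left(column_name, index_length)) / count(*) for non-null values.
    static double selectivity(List<String> values, int prefixLength) {
        if (values.isEmpty()) {
            return 0.0;
        }
        Set<String> distinctPrefixes = new HashSet<>();
        for (String value : values) {
            distinctPrefixes.add(value.substring(0, Math.min(prefixLength, value.length())));
        }
        return (double) distinctPrefixes.size() / values.size();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("alpha", "alpine", "beta", "bravo");
        for (int length = 1; length <= 4; length++) {
            System.out.println("prefix " + length + " -> " + selectivity(sample, length));
        }
    }
}
```

The index length is then the shortest prefix whose ratio is high enough for the data at hand.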

4. [Mandatory] LIKE '%...' or LIKE '%...%' are not allowed when searching with pagination. Search engine can be used if it is really needed.

Note: Index files have B-Tree's left most prefix matching characteristic. Index cannot be applied if left prefix value is not determined.

5. [Recommended] Make use of the index order when using ORDER BY clauses. The last columns of ORDER BY clauses should be at the end of a composite index. The reason is to avoid the file_sort issue, which affects the query performance.

Positive example: where a=? and b=? order by c; Index is: a_b_c 
Counter example: The index order will not take effect if the query condition contains a range, e.g., where a>10 order by b; the index a_b cannot be used for sorting.

6. [Recommended] Make use of Covering Index for query to avoid additional query after searching index.

Note: If we need to check the title of Chapter 11 of a book, do we need turn to the page where Chapter 11 starts? No, because the table of contents actually includes the title, which serves as a covering index.
Positive example: Index types include primary key index, unique index and common index. Covering index pertains to a query effect. When referring to the EXPLAIN result, "using index" may appear in the Extra column.

7. [Recommended] Use late join or sub-query to optimize scenarios with many pages.

Note: Instead of bypassing offset rows, MySQL retrieves totally offset+N rows, then drops off offset rows and returns N rows. It is very inefficient when offset is very big. The solution is either limiting the number of pages to be returned, or rewriting SQL statement when page number exceeds a predefined threshold.
Positive example: Firstly locate the required id range quickly, then join:
select a.* from table1 a, (select id from table1 where some_condition LIMIT 100000, 20) b where a.id=b.id;

8. [Recommended] The target of SQL performance optimization is that the result type of EXPLAIN reaches REF level, or RANGE at least, or CONST if possible.

Counter example: Pay attention to the type of INDEX in the EXPLAIN result, because a full scan of the database index file is very slow; its performance is nearly equal to a full table scan.
CONST: There is at most one matching row, which is read by the optimizer. It is very fast.
REF: The normal index is used.
RANGE: A given range of index are retrieved, which can be used when a key column is compared to a constant by using any of the =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN, or IN() operators.

9. [Recommended] Put the most discriminative column to the left most when adding a composite index.

Positive example: For the sub-clause where a=? and b=?, if data of column a is nearly unique, adding index idx_a is enough.
Note: When equal and non-equal checks both exist in query conditions, put the column in the equal condition first when adding an index. For example, for where a>? and b=?, b should be put as the 1st column of the index, even if column a is more discriminative.

10. [For Reference] Avoid listed below misunderstandings when adding index:
  1) It is false that each query needs one index.
  2) It is false that index consumes storage space and degrades update and insert operations significantly.
  3) It is false that unique index should all be achieved from application layer by "check and insert".

SQL Rules

1. [Mandatory] Do not use COUNT(column_name) or COUNT(constant_value) in place of COUNT(*). COUNT(*) is SQL92 defined standard syntax to count the number of rows. It is not database specific and has nothing to do with NULL and non-NULL.

Note: COUNT(*) counts NULL row in, while COUNT(column_name) does not take NULL valued row into consideration.

2. [Mandatory] COUNT(distinct column) calculates number of rows with distinct values in this column, excluding NULL values. Please note that COUNT(distinct column1, column2) returns 0 if all values of one of the columns are NULL, even if the other column contains distinct non-NULL values.

3. [Mandatory] When all values of one column are NULL, COUNT(column) returns 0, while SUM(column) returns NULL, so pay attention to NullPointerException issue when using SUM().

Positive example: NPE issue could be avoided in this way:
SELECT IF(ISNULL(SUM(g)), 0, SUM(g)) FROM table;
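
On the Java side the NPE typically appears when the NULL result is unboxed into a primitive. The sketch below imitates that with a hypothetical stand-in for the query (no real database or mapper is involved):

```java
import java.util.Arrays;
import java.util.List;

class NullSafeSum {
    // Hypothetical stand-in for "SELECT SUM(g) FROM table":
    // NULL values are skipped, and the result is null when nothing was summed.
    static Long sumOrNull(List<Long> column) {
        Long total = null;
        for (Long value : column) {
            if (value != null) {
                total = (total == null ? 0L : total) + value;
            }
        }
        return total;
    }

    // Application-side equivalent of IF(ISNULL(SUM(g)), 0, SUM(g)).
    static long sumOrZero(List<Long> column) {
        Long total = sumOrNull(column);
        return total == null ? 0L : total;
    }

    public static void main(String[] args) {
        List<Long> allNull = Arrays.asList((Long) null, null);
        System.out.println(sumOrZero(allNull));
        try {
            long unsafe = sumOrNull(allNull); // unboxing null throws
            System.out.println(unsafe);
        } catch (NullPointerException e) {
            System.out.println("NPE when unboxing a NULL sum");
        }
    }
}
```

Either handle the NULL in SQL as shown in the positive example, or keep the result boxed and check it before use.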

4. [Mandatory] Use ISNULL() to check NULL values. Result will be NULL when comparing NULL with any other values.

Note: 
  1) NULL<>NULL returns NULL, rather than false.
  2) NULL=NULL returns NULL, rather than true.
  3) NULL<>1 returns NULL, rather than true.

5. [Mandatory] When coding on DB query with paging logic, it should return immediately once count is 0, to avoid executing paging query statement followed.
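
A minimal sketch of this early return, assuming the count query and the page query are passed in as callbacks (the class and parameter names are invented for illustration):

```java
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class PagedQuery {
    // Runs the (hypothetical) page query only when the count query reports rows.
    static <T> List<T> fetchPage(LongSupplier countQuery, Supplier<List<T>> pageQuery) {
        if (countQuery.getAsLong() == 0) {
            return Collections.emptyList(); // skip the second round trip entirely
        }
        return pageQuery.get();
    }

    public static void main(String[] args) {
        List<String> page = fetchPage(() -> 0L, () -> {
            throw new IllegalStateException("page query should not run when count is 0");
        });
        System.out.println(page);
    }
}
```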

6. [Mandatory] Foreign key and cascade update are not allowed. All foreign key related logic should be handled in application layer.

Note: e.g. Student table has student_id as primary key, score table has student_id as foreign key. When student.student_id is updated, score.student_id update is also triggered; this is called a cascading update. Foreign keys and cascading updates are suitable for single-machine, low-concurrency systems, not for distributed, high-concurrency cluster systems. Cascading updates are strongly blocking and may lead to a DB update storm. Foreign keys also affect DB insertion efficiency.

7. [Mandatory] Stored procedures are not allowed. They are difficult to debug, extend and not portable.

8. [Mandatory] When correcting data, delete and update DB records, SELECT should be done first to ensure data correctness.

9. [Recommended] IN clause should be avoided. If it cannot be avoided, the record set size of the IN clause should be evaluated carefully and kept within 1000.
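
When a long id list does have to go through IN, one common way to stay under the limit is to split it into batches and query per batch. A sketch with made-up names:

```java
import java.util.ArrayList;
import java.util.List;

class InClauseChunks {
    static final int MAX_IN_SIZE = 1000; // limit taken from the rule above

    // Splits ids into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> partition(List<T> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add(i);
        }
        for (List<Integer> chunk : partition(ids, MAX_IN_SIZE)) {
            System.out.println("chunk of " + chunk.size());
        }
    }
}
```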

10. [For Reference] For globalization needs, characters should be represented and stored with UTF-8, and be cautious of character number counting.

Note: SELECT LENGTH("轻松工作"); returns 12.
SELECT CHARACTER_LENGTH("轻松工作"); returns 4.
Use UTF8MB4 encoding to store emoji if needed, taking into account of its difference from UTF-8.
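
The same byte-versus-character difference can be reproduced in Java. A small sketch (the class name is illustrative; the strings are written as Unicode escapes so the file encoding does not matter):

```java
import java.nio.charset.StandardCharsets;

class LengthVsCharLength {
    // Comparable to MySQL LENGTH() on a UTF-8 column: number of bytes.
    static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Comparable to CHARACTER_LENGTH(): number of Unicode code points.
    static int charLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String text = "\u8f7b\u677e\u5de5\u4f5c"; // the four-character example string from the note
        String emoji = "\uD83D\uDE00";            // one emoji outside the Basic Multilingual Plane
        System.out.println(byteLength(text) + " bytes, " + charLength(text) + " characters");
        System.out.println(byteLength(emoji) + " bytes, " + charLength(emoji) + " character");
    }
}
```

The emoji takes 4 bytes in UTF-8, which is why a 3-byte-per-character encoding cannot store it.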

11. [For Reference] TRUNCATE is not recommended when coding, even if it is faster than DELETE and uses fewer system and transaction log resources. Because TRUNCATE is not transactional and does not fire DB triggers, problems might occur.

Note: In terms of Functionality, TRUNCATE TABLE is similar to DELETE without WHERE sub-clause.

ORM Rules

1. [Mandatory] Specific column names should be specified during query, rather than using *.

Note:

  1. * increases parsing cost.
  2. It may introduce mismatch with resultMap when adding or removing query columns.

2. [Mandatory] Name of Boolean property of POJO classes cannot be prefixed with is, while DB column name should prefix with is. A mapping between properties and columns is required.

Note: Refer to rules of POJO class and DB column definition, mapping is needed in resultMap. Code generated by MyBatis Generator might need to be adjusted.

3. [Mandatory] Do not use resultClass as return parameters, even if all class property names are the same as DB columns, corresponding DO definition is needed.

Note: Mapping configuration is needed, to decouple DO definition and table columns, which in turn facilitates maintenance.

4. [Mandatory] Be cautious with parameters in xml configuration. Do not use ${} in place of #{}, #param#. SQL injection may happen in this way.
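
The difference is that ${} pastes the value into the SQL text, while #{} leaves a placeholder and sends the value separately. The sketch below only imitates the two behaviours with plain string handling; it is not MyBatis code, and the statement and names are made up.

```java
class SubstitutionVsBinding {
    // Imitates ${name}: the raw value becomes part of the SQL text.
    static String substitute(String template, String value) {
        return template.replace("${name}", value);
    }

    // Imitates #{name}: the SQL text only gets a placeholder; the value is bound later.
    static String bind(String template) {
        return template.replace("#{name}", "?");
    }

    public static void main(String[] args) {
        String attack = "' or '1'='1";
        System.out.println(substitute("select id from user where name = '${name}'", attack));
        System.out.println(bind("select id from user where name = #{name}"));
    }
}
```

In the first statement the input has rewritten the WHERE clause; in the second the statement shape stays fixed whatever the input is.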

5. [Mandatory] iBatis built in queryForList(String statementName, int start, int size) is not recommended.

Note: It may lead to OOM issue because its implementation is to retrieve all DB records of statementName's corresponding SQL statement, then start, size subset is applied through subList.
Positive example: Use #start#, #size# in sqlmap.xml.

Map<String, Object> map = new HashMap<String, Object>();  
map.put("start", start);  
map.put("size", size);  

6. [Mandatory] Do not use HashMap or HashTable as DB query result type.

7. [Mandatory] gmt_modified column should be updated with current timestamp simultaneously with DB record update.

8. [Recommended] Do not define a universal table updating interface, which accepts POJO as input parameter, and always update table set c1=value1, c2=value2, c3=value3, ... regardless of intended columns to be updated. It is better not to update unrelated columns, because it is error prone, not efficient, and increases binlog storage.
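
One way to avoid the "update everything" interface is to build the SET clause only from the columns that actually changed. This is a rough sketch with invented names; table and column names are assumed to be trusted constants from the code, never user input, and gmt_modified is refreshed as the previous rule asks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartialUpdate {
    // Builds "update <table> set c = ?, ... , gmt_modified = now() where id = ?"
    // and collects the bind values in order into params.
    static String buildUpdate(String table, Map<String, Object> changedColumns,
                              long id, List<Object> params) {
        if (changedColumns.isEmpty()) {
            throw new IllegalArgumentException("nothing to update");
        }
        StringBuilder sql = new StringBuilder("update ").append(table).append(" set ");
        for (Map.Entry<String, Object> column : changedColumns.entrySet()) {
            sql.append(column.getKey()).append(" = ?, ");
            params.add(column.getValue());
        }
        sql.append("gmt_modified = now() where id = ?");
        params.add(id);
        return sql.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> changed = new LinkedHashMap<>();
        changed.put("c2", "value2");
        List<Object> params = new ArrayList<>();
        System.out.println(buildUpdate("sample_table", changed, 7L, params));
        System.out.println(params);
    }
}
```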

9. [For Reference] Do not overuse @Transactional, because transactions affect the QPS of the DB, and relevant rollbacks may need to be considered, including cache rollback, search engine rollback, message compensation, statistics adjustment, etc.

10. [For Reference] compareValue of <isEqual> is a constant (normally a number) which is compared with the property value. <isNotEmpty> means executing corresponding logic when property is not empty and not null. <isNotNull> means executing related logic when property is not null.


For how to install 1.8.4, see the link below.


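Installation notes

Everything is installed under the apps directory in the user's home (~/apps).

# Create the directory
mkdir ~/apps && cd ~/apps

# Download, extract, and create a symbolic link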
wget 'https://github.com/naver/pinpoint/archive/1.8.1-RC1.tar.gz' && tar -xzf 1.8.1-RC1.tar.gz && rm 1.8.1-RC1.tar.gz && ln -s pinpoint-1.8.1-RC1 pinpoint


# Install (requires Java 6, 7, 8 and 9)
cd ~/apps/pinpoint && ./mvnw install -Dmaven.test.skip=true

# Note: installing Java
* If the wget download does not go through because the authentication has expired, log in on each download page and obtain the download URL with Fiddler.

cd ~/apps

chmod 755 jdk-6u45-linux-x64.bin && ./jdk-6u45-linux-x64.bin && rm jdk-6u45-linux-x64.bin

tar -xzf jdk-7u80-linux-x64.tar.gz && rm jdk-7u80-linux-x64.tar.gz

tar -xzf jdk-8u191-linux-x64.tar.gz && rm jdk-8u191-linux-x64.tar.gz && ln -s ~/apps/jdk1.8.0_191 jdk && ln -s ~/apps/jdk1.8.0_191 jdk_8

tar -xf jdk-9.0.4_linux-x64_bin.tar.gz  && rm jdk-9.0.4_linux-x64_bin.tar.gz && ln -s ~/apps/jdk-9.0.4 jdk_9


# Register environment variables
Run vi ~/.bashrc, then add:

export JAVA_HOME=~/apps/jdk
export JAVA_9_HOME=~/apps/jdk_9
export JAVA_8_HOME=~/apps/jdk_8
export JAVA_6_HOME=~/apps/jdk_6
export JAVA_7_HOME=~/apps/jdk_7
export PATH=$JAVA_HOME/bin:$PATH



# Note: changing the Java DNS TTL
echo 'networkaddress.cache.ttl=60' >> ~/apps/jdk/jre/lib/security/java.security
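
For reference, the same security property can also be set from code instead of editing java.security. A minimal sketch, assuming it runs early at startup before the JVM performs its first host name lookup (the class name is made up):

```java
import java.security.Security;

class DnsTtlSetting {
    // Sets how long successful DNS lookups are cached, in seconds.
    static void apply(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", String.valueOf(seconds));
    }

    public static void main(String[] args) {
        apply(60);
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```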



# Note: if an additional AWS EBS volume (besides the OS disk) is mounted to store data, change the pinpoint data directory (EBS mounted at /data)
cd ~/apps/pinpoint/quickstart && ln -s /data/pinpoint data
sudo su
mkdir /data/pinpoint && chown <user account>:<user account> /data/pinpoint



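Installing and starting HBase
-- Reduce the HBase retention period (1 day = 86400 seconds; adjust to a value that suits the situation)
 Run vi ./quickstart/conf/hbase/init-hbase.txt, then enter the command below (keeps data for 2 days)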
 :%s/5184000/172800/g
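
The two numbers in the substitution are plain day counts expressed in seconds: 5184000 is 60 days and 172800 is 2 days. A quick way to work out another retention value (the class name is only for illustration):

```java
import java.util.concurrent.TimeUnit;

class HbaseTtlSeconds {
    // Converts a retention period in days to the seconds value used in init-hbase.txt.
    static long daysToSeconds(long days) {
        return TimeUnit.DAYS.toSeconds(days);
    }

    public static void main(String[] args) {
        System.out.println("60 days = " + daysToSeconds(60));
        System.out.println(" 2 days = " + daysToSeconds(2));
        System.out.println(" 1 day  = " + daysToSeconds(1));
    }
}
```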

-- Download and start HBase
quickstart/bin/start-hbase.sh

-- Initialize the HBase tables
quickstart/bin/init-hbase.sh



Configuring and starting the Pinpoint daemons

-- Lower the collector log level so that less is logged
vi ./quickstart/collector/src/main/resources/log4j.xml
:%s/DEBUG/INFO/g
:%s/TRACE/INFO/g

-- Adjust the log level of pinpoint web
vi ./quickstart/web/src/main/resources/log4j.xml
:%s/DEBUG/INFO/g

-- Start the collector (the data collection process)
quickstart/bin/start-collector.sh

-- Start the web UI
quickstart/bin/start-web.sh


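Modifying the agent (this agent is the one used when the actual application starts)

Agent home path: ~/apps/pinpoint/agent/target/pinpoint-agent-<version>

cd ~/apps/pinpoint/agent/target/pinpoint-agent-<version>

-- Adjust the agent log level (if left as-is, logs are written at DEBUG level, which degrades the performance of the target program)
Run vi lib/log4j.xml, then run the command below (changes the log level from DEBUG to INFO)
:%s/DEBUG/INFO/g

Delete the following line: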
<appender-ref ref="console" />

-- Change the IP and port of the collector server


vi ./quickstart/agent/src/main/resources/pinpoint.config
cd ~/apps/pinpoint/agent/target/<version> && vi pinpoint.config

-- Compress the agent so it can be copied to the application servers (example)
e.g. tar -czf pinpoint-agent-<version...>.tar.gz <files...>

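After that, copying the file to /home/integtool/apps/pinpoint/quickstart/web/target/deploy makes it downloadable from the web.



Then copy the agent to each server to be monitored with the APM, and configure the process to use the agent at startup.
 - The details will be written up later; search Google if needed.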




package com.sample;
import java.util.concurrent.TimeUnit;
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;
import lombok.extern.slf4j.Slf4j;

/**
 * Spring RestTemplate sample
 *  - tuned version with timeout and connection pool settings
 *
 * @author 
 */
@Slf4j
public class RestTemplateSample {
     private static final String REQ_URL_1 = "https://(first test URL)";
     private static final String REQ_URL_2 = "https://(second test URL)";
    
    private static RestTemplate restTemplate = new RestTemplate();
     
    public static void init() {
          log.info("== start : restTemplate initialization");
          HttpComponentsClientHttpRequestFactory crf = new HttpComponentsClientHttpRequestFactory();
          crf.setReadTimeout(7000); // read timeout (ms)
          crf.setConnectTimeout(7000); // connect timeout (ms)
          // @formatter:off
          HttpClient httpClient = HttpClientBuilder.create()
               .setMaxConnTotal(300) // connection pool: maximum number of open connections
               .setMaxConnPerRoute(50) // connection pool: maximum connections per route (one IP:port pair)
               .evictIdleConnections(2000L, TimeUnit.MILLISECONDS) // periodically evict idle connections, to guard against cases such as the server closing connections left unused past its keep-alive time
               .build();
          crf.setHttpClient(httpClient);
          // @formatter:on
          restTemplate.setRequestFactory(crf);
          log.info("== end : restTemplate initialization\n\n");
     }
     /**
      * Spring RestTemplate sample
      *  - tuned version with timeout and connection pool settings
      *  - when testing calls to an https URL, checking whether SSLConnectionSocketFactory appears in the log shows whether the pool was reused or a new SSL handshake took place
      * 
      * @param args
      * @throws Exception
      */
     public static void main(String[] args) throws Exception {
          init();
          int testReqCnt = 3; // number of test calls
          // The 1st request opens a connection and puts it in the pool; from the 2nd request on the pool is reused (in the log, the SSL handshake part is absent on the 2nd request)
          for (int i = 1; i <= testReqCnt; i++) {
               System.out.println("\n\nStarting request #" + i);
               log.info("\n\nURL:{} result #{}: {}\n\n\n", REQ_URL_1, i, restTemplate.getForObject(REQ_URL_1, String.class));
               // Sleep used to check evictIdleConnections (comment it out to skip the check) => the log shows "Closing connections idle longer than 2000 MILLISECONDS"
               Thread.sleep(3000);
               if (i != testReqCnt) {
                    log.info("\n\nAfter the 3-second sleep the idle connection has been evicted from the pool; requesting the same URL again\n\n");
               }
          }
          System.out.println("\n\n\n\n");
          System.out.println("=== Starting the call test for the 2nd URL ===");
          for (int i = 1; i <= testReqCnt; i++) {
               System.out.println("\n\nStarting request #" + i);
               log.info("\n\nURL:{} result #{}: {}\n\n\n", REQ_URL_2, i, restTemplate.getForObject(REQ_URL_2, String.class));
          }
          System.out.println("\n\n\n\n");
          System.out.println("Calling the 1st URL again");
          log.info("\n\nURL:{} result: {}\n", REQ_URL_1, restTemplate.getForObject(REQ_URL_1, String.class));
     }
}

